Sociolinguistics and Computational Linguistics

John Nerbonne - University of Groningen and University of Freiburg

Computational linguistics (CL) techniques have been used extensively in dialectology, notably edit-distance measures of pronunciation differences recorded in transcriptions, stemming for comparing lexical choices, part-of-speech tagging for comparing syntax, and a range of machine-learning techniques for detecting structure in underlying distributions. These have become part of the dialectometry ‘tool kit’ (Wieling & Nerbonne 2015). Given the Chambers-Trudgill thesis that the study of language variation ought to constitute a single discipline, whether the variation is geographically or socially conditioned, we should expect similar sociolinguistic studies to be fruitful. Nguyen et al. (to appear) survey a wide range of work in CL that has begun to include social (demographic) factors and argue that closer cooperation will benefit both CL and sociolinguistics. They see the time as ripe for the emergence of a subdiscipline, computational sociolinguistics.
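
As a rough illustration of the edit-distance idea mentioned above, the following minimal Python sketch computes a plain Levenshtein distance between two transcriptions represented as lists of segments. The function names, the uniform cost of 1 per operation, the length normalization, and the example transcriptions are assumptions made for illustration only; dialectometric work often uses more refined, phonetically weighted costs.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phonetic segments.

    Assumption: insertions, deletions, and substitutions all cost 1.
    """
    # prev[j] holds the distance between the first i-1 segments of a
    # and the first j segments of b.
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i]  # turning i segments of a into an empty sequence takes i deletions
        for j, seg_b in enumerate(b, start=1):
            substitution_cost = 0 if seg_a == seg_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete seg_a
                curr[j - 1] + 1,                  # insert seg_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer transcription."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Hypothetical transcriptions of one word at two sites (illustrative, not real data).
site_a = ["m", "ɪ", "l", "k"]
site_b = ["m", "ɛ", "l", "ə", "k"]

print(edit_distance(site_a, site_b))        # one substitution plus one insertion
print(normalized_distance(site_a, site_b))
```

Averaging such per-word distances over many words would give one aggregate pronunciation distance per pair of sites or speaker groups, which is the kind of aggregate measure the abstract refers to.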

The dominant theoretical direction in sociolinguistics is Labovian. It focuses on the social meaning attached to language variation and is particularly interested in the progress of change in individual linguistic features, e.g. whether a final /r/ is pronounced as such, or how high the vowel in ‘can’ is pronounced. Most dialectometric work instead examines aggregate differences among varieties, which makes the study of individual changes less central. We note, however, that computational techniques for isolating individual differences have also been developed, and we point to several areas of sociolinguistics where the aggregate perspective seems more promising, including the varied influence of standardization, education, and mobility on variation.

