Language variation and change in social media: a computational perspective

Dong Nguyen - Utrecht University

Social media presents exciting opportunities to study language in a variety of social situations and on a very large scale. At the same time, language in social media presents challenges to the development of NLP tools. In this talk, I will discuss results from two recent studies. In the first study, we use word embeddings (representations of words as dense continuous vectors) to detect semantic change in a large Twitter corpus. In the second study, we look at how popular word embedding methods handle spelling variants, and I’ll ask how such variants should be represented in the embedding space. Finally, I will discuss the emerging interdisciplinary area of computational sociolinguistics and reflect on its challenges and opportunities.

I am an assistant professor at Utrecht University. Previously I was a research fellow at the Alan Turing Institute. I was also affiliated with Edinburgh University. I completed my Ph.D. at the University of Twente. I received a master’s degree from the Language Technologies Institute at Carnegie Mellon University and a bachelor’s degree in Computer Science from the University of Twente. I have interned at Facebook (fall 2011), Microsoft Research (fall 2013), and Google (summer 2014). In fall 2015 I visited Georgia Tech.