B1: Information Density and Scientific Literacy in English – Synchronic and Diachronic Perspectives
Project B1 investigates linguistic densification in the evolution of scientific writing in English from the 17th century to the present. As scientific activity becomes more diversified and specialized, particular meanings become more predictable within a given scientific field. The hypothesis of increasing linguistic densification predicts the emergence of denser, less redundant encodings that make communication more efficient. To evaluate this hypothesis, the empirical part of the project will analyse a synchronic and a diachronic corpus of scientific texts for the forms of linguistic encoding potentially involved in managing information density. In the computational part, we use language models to contrast the information density profiles of whole texts and registers.
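To make the notion of an information density profile concrete, the following is a minimal sketch of average surprisal under a bigram language model with add-one smoothing. It is an illustration only, not the project's actual models or pipeline: the function names (train_bigram, surprisal, average_surprisal), the choice of smoothing, and the toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect context and bigram counts from a list of training tokens."""
    contexts = Counter(tokens[:-1])             # how often each word precedes another
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each word pair occurs
    vocab_size = len(set(tokens)) + 1           # one extra slot for unseen words
    return contexts, bigrams, vocab_size


def surprisal(prev, word, contexts, bigrams, vocab_size):
    """Surprisal in bits, -log2 P(word | prev), with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return -math.log2(p)


def average_surprisal(tokens, contexts, bigrams, vocab_size):
    """Mean surprisal per token over all bigram positions in a text."""
    pairs = list(zip(tokens, tokens[1:]))
    total = sum(surprisal(p, w, contexts, bigrams, vocab_size) for p, w in pairs)
    return total / len(pairs)


# Toy training data (hypothetical, not taken from any corpus).
train = "the air pump was used and the air pump was observed".split()
model = train_bigram(train)

predictable = "the air pump was used".split()   # follows patterns seen in training
novel = "the pump used air observed".split()    # same words, unseen word pairs

print(average_surprisal(predictable, *model))
print(average_surprisal(novel, *model))
```

In this toy setting the text that follows word sequences seen in training receives a lower average surprisal than the text with unseen word pairs. Contrasting whole texts or registers applies the same comparison at scale, with models estimated on much larger corpora.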
Average surprisal of parts-of-speech Inproceedings
Corpus Linguistics 2017, Birmingham, UK, 2017.
An information-theoretic account on the diachronic development of discourse connectors in scientific writing Inproceedings
39th DGfS AG1, Saarbrücken, Germany, 2017.
Information-based modeling of diachronic linguistic change: from typicality to productivity Inproceedings
In Proceedings of Language Technologies for the Socio-Economic Sciences and Humanities (LATECH'16), Association for Computational Linguistics (ACL), Berlin, Germany, 2016.
The Royal Society Corpus. Towards a high-quality resource for studying diachronic variation in scientific writing Inproceedings
In Proceedings of Digital Humanities (DH'16), Krakow, Poland, 2016.
An Information-Theoretic Approach to Modeling Diachronic Change in Scientific English Journal Article
Selected Papers from Varieng - From Data to Evidence (d2e), Helsinki, Finland, 2016.
Topical Diversification over Time in the Royal Society Corpus Inproceedings
Proceedings of Digital Humanities (DH'16), Krakow, Poland, 2016.
The Royal Society Corpus: From Uncharted Data to Corpus Inproceedings
In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'16), Portoroz, Slovenia, 2016.
The taming of the data: Using text mining in building a corpus for diachronic analysis Inproceedings
Varieng - From Data to Evidence (d2e), University of Helsinki, 2015.
Information Density in Scientific Writing: A Diachronic Perspective Inproceedings
"Challenging Boundaries" - 42nd International Systemic Functional Congress (ISFCW2015), RWTH Aachen University, 2015.
A resource for the diachronic study of scientific English: Introducing the Royal Society Corpus Inproceedings
Corpus Linguistics 2015, Lancaster, 2015.
Modeling intra-textual variation with entropy and surprisal: Topical vs. stylistic patterns Inproceedings
pp. 68-77, LaTeCH-CLfL Workshop, ACL, Vancouver, Canada, 2017.
The making of the Royal Society Corpus Inproceedings
pp. 7-11, 21st Nordic Conference on Computational Linguistics (NoDaLiDa) Workshop on Processing Historical Language, Gothenburg, Sweden, 2017.