B6: Aspects of Information Density in Human and Machine Translation
Translated texts often exhibit characteristic features that distinguish them from originally authored texts. In Translation Studies, this phenomenon is sometimes referred to as translationese. Some aspects of translationese are due to the source language from which the translation was prepared, and therefore vary with that source language. Other aspects of translationese are deemed universal, including the tendencies of translated texts to be simpler and more explicit than originally authored texts. A growing body of research has confirmed the existence of different aspects of translationese in human translation, using methodologies from empirical translation studies (Gellerstam 1986, Baker 1993, Laviosa 1998, Hansen 2003, Teich 2003) and, more recently, from machine-learning-based text categorisation and computational stylometry (Baroni and Bernardini 2006, Ilisei et al. 2010, Koppel and Ordan 2011). However, there is to date no common methodological framework for characterising translationese, and its study has concentrated on artefacts of human translation. Our research will explore to what extent information density can serve as a methodological framework that captures important aspects of translationese in both human and machine translation (MT). We will investigate the use of information density measures as additional features in MT models, in MT evaluation (against a reference), and in MT quality estimation (without access to a reference).
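The description above does not commit to particular information density measures. As one possible illustration, the sketch below operationalises information density as mean surprisal in bits per token, −(1/n) Σ log₂ p(wᵢ), under an add-one smoothed unigram model estimated from a background corpus. It also computes the type-token ratio as a rough proxy for the "simplification" tendency. The unigram model is an assumption standing in for whatever language models the project would actually use, and the function names (`unigram_model`, `mean_surprisal`, `type_token_ratio`) are hypothetical, not part of any published toolkit.

```python
import math
from collections import Counter


def unigram_model(background_tokens):
    """Hypothetical helper: add-one (Laplace) smoothed unigram model.

    Returns a function mapping a token to its probability. One extra
    vocabulary slot is reserved so that unseen tokens get non-zero mass.
    """
    counts = Counter(background_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 slot for unseen tokens

    def prob(token):
        return (counts.get(token, 0) + 1) / (total + vocab_size)

    return prob


def mean_surprisal(tokens, prob):
    """Average surprisal in bits per token: -(1/n) * sum(log2 p(w))."""
    if not tokens:
        raise ValueError("need at least one token")
    return -sum(math.log2(prob(t)) for t in tokens) / len(tokens)


def type_token_ratio(tokens):
    """Share of distinct word forms; lower values are often read as simpler text."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


# Toy usage: a background corpus and two short candidate texts.
background = "the cat sat on the mat and the dog sat on the rug".split()
prob = unigram_model(background)
text_a = "the cat sat on the mat".split()
text_b = "a feline reclined upon a woven carpet".split()

for name, toks in [("text_a", text_a), ("text_b", text_b)]:
    print(name, round(mean_surprisal(toks, prob), 3), round(type_token_ratio(toks), 3))
```

In this toy setting, `text_b` consists only of words unseen in the background corpus and so receives higher mean surprisal than `text_a`. Per-text scores of this kind could, under these assumptions, be fed into an MT model, an evaluation metric, or a quality estimation system as additional features.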
Findings of the 2016 Conference on Machine Translation
Proceedings of the First Conference on Machine Translation, pp. 131–198, Association for Computational Linguistics, Berlin, Germany, 2016.
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 960–970, Association for Computational Linguistics, 2016.
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 750–761, The COLING 2016 Organizing Committee, 2016.