B6: Neural Feature and Representation Learning for Information Density Based Translationese Classification

The continuation of project B6 addresses limitations of the information density framework that was implemented and evaluated during the first phase of the project, which relied on hand-crafted features. It uses neural network approaches to capture and explore information density-based textual features, namely surprisal, with applications to translationese identification and to machine translation evaluation and improvement. A systematic comparison between neural and standard count-based textual features will be conducted to explore the information encoded in continuous representations obtained by unsupervised and end-to-end learning methods. Combining hand-crafted and neural information density features will extend the classic surprisal measure. In addition, neural approaches facilitate multi-granularity input representations and various context sizes when calculating surprisal. An important part of project B6 in the context of the CRC is the analysis and visualisation of representations learned by neural networks, so that they can be compared with features inspired by our linguistic intuitions.
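To make the count-based baseline concrete, the following is a minimal sketch of surprisal, the negative log probability of a word given its context, computed here with a bigram model. The function name `bigram_surprisal`, the add-alpha smoothing, and the toy corpus are assumptions chosen for illustration; they are not the project's actual feature extraction pipeline.

```python
import math
from collections import Counter


def bigram_surprisal(tokens, train_tokens, alpha=1.0):
    """Return (word, surprisal in bits) for each token after the first.

    Hypothetical helper: an add-alpha smoothed bigram model estimated
    from train_tokens, with a context size of one preceding word.
    """
    vocab_size = len(set(train_tokens) | set(tokens))
    # Count each word as a context (every position except the last).
    context_counts = Counter(train_tokens[:-1])
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))

    scores = []
    for prev, word in zip(tokens, tokens[1:]):
        prob = (bigram_counts[(prev, word)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )
        scores.append((word, -math.log2(prob)))
    return scores


train = "the cat sat on the mat".split()
seen = bigram_surprisal("the cat sat".split(), train)
unseen = bigram_surprisal("the sat".split(), train)
```

In this toy setting a bigram observed in the training data ("the cat") receives lower surprisal than an unobserved one ("the sat"). A neural language model would replace the count-based probability estimate while keeping the same negative log probability definition, which is what allows larger context sizes and different input granularities.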



Chowdhury, Koel Dutta; España-Bonet, Cristina; van Genabith, Josef

Understanding Translationese in Multi-view Embedding Spaces

Proceedings of the 28th International Conference on Computational Linguistics, pp. 6056-6062, International Committee on Computational Linguistics, Barcelona, Catalonia (Online), 2020.


Bizzoni, Yuri; Juzek, Tom S; España-Bonet, Cristina; Chowdhury, Koel Dutta; van Genabith, Josef; Teich, Elke

How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech

The 17th International Conference on Spoken Language Translation, Seattle, WA, United States, 2020.



Lapshinova-Koltunski, Ekaterina; España-Bonet, Cristina; van Genabith, Josef

Analysing Coreference in Transformer Outputs

Proceedings of the Fourth Workshop on Discourse in Machine Translation (DiscoMT 2019), pp. 1-12, Association for Computational Linguistics, Hong Kong, 2019.



Bojar, Ondřej; Chatterjee, Rajen; Federmann, Christian; Graham, Yvette; Haddow, Barry; Huck, Matthias; Yepes, Antonio Jimeno; Koehn, Philipp; Logacheva, Varvara; Monz, Christof; Negri, Matteo; Névéol, Aurélie; Neves, Mariana; Popel, Martin; Post, Matt; Rubino, Raphael; Scarton, Carolina; Specia, Lucia; Turchi, Marco; Verspoor, Karin; Zampieri, Marcos

Findings of the 2016 Conference on Machine Translation

Proceedings of the First Conference on Machine Translation, pp. 131-198, Association for Computational Linguistics, Berlin, Germany, 2016.


Rubino, Raphael; Lapshinova-Koltunski, Ekaterina; van Genabith, Josef

Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 960-970, Association for Computational Linguistics, 2016.


Rubino, Raphael; Degaetano-Ortlieb, Stefania; Teich, Elke; van Genabith, Josef

Modeling Diachronic Change in Scientific Writing with Information Density

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 750-761, The COLING 2016 Organizing Committee, 2016.


Josef van Genabith



Koel Dutta Chowdhury



Cristina España i Bonet



Daria Pylypenko