B6: Neural Feature and Representation Learning for Information Density Based Translationese Classification
The continuation of project B6 addresses limitations of the information density framework that was implemented and evaluated during the first phase of the project and that relied on hand-crafted features. It uses neural network approaches to capture and explore information density-based textual features, in particular surprisal, with applications to translationese identification and to machine translation evaluation and improvement. Neural features will be compared systematically with standard count-based textual features in order to explore the information encoded in continuous representations obtained by unsupervised and end-to-end learning methods. Combining hand-crafted and neural information density features will extend the classic surprisal measure. In addition, neural approaches make it easier to compute surprisal over input representations of multiple granularities and with varying context sizes. An important part of project B6 in the context of the CRC is the analysis and visualisation of the representations learned by neural networks, so that they can be compared with features inspired by our linguistic intuitions.
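As an illustration of the count-based baseline mentioned above, the sketch below computes per-token surprisal, defined as the negative base-2 logarithm of a token's probability given its context, with a configurable context size. It is a minimal example under stated assumptions and not the project's implementation: the function name `ngram_surprisal`, the add-alpha smoothing, the `<s>` padding symbol, and the toy corpus are all hypothetical choices made for illustration. A neural variant would replace the smoothed count ratio with the probability assigned by a trained language model.

```python
import math
from collections import Counter


def ngram_surprisal(train_tokens, test_tokens, context_size=1, alpha=1.0):
    """Return (token, surprisal in bits) pairs for test_tokens.

    Probabilities come from an add-alpha smoothed n-gram model estimated on
    train_tokens, where each token is conditioned on the preceding
    `context_size` tokens. (Illustrative assumption, not the project's model.)
    """
    vocab_size = len(set(train_tokens) | set(test_tokens))
    pad = ["<s>"] * context_size  # hypothetical start-of-sequence padding

    # Count contexts and context+token n-grams in the training data.
    context_counts = Counter()
    ngram_counts = Counter()
    padded_train = pad + list(train_tokens)
    for i in range(context_size, len(padded_train)):
        ctx = tuple(padded_train[i - context_size:i])
        context_counts[ctx] += 1
        ngram_counts[ctx + (padded_train[i],)] += 1

    # Score each test token: surprisal = -log2 P(token | context).
    scores = []
    padded_test = pad + list(test_tokens)
    for i in range(context_size, len(padded_test)):
        ctx = tuple(padded_test[i - context_size:i])
        tok = padded_test[i]
        prob = (ngram_counts[ctx + (tok,)] + alpha) / (
            context_counts[ctx] + alpha * vocab_size
        )
        scores.append((tok, -math.log2(prob)))
    return scores


if __name__ == "__main__":
    train = "the cat sat on the mat".split()
    for token, bits in ngram_surprisal(train, "the cat".split(), context_size=1):
        print(f"{token}\t{bits:.3f} bits")
```

Changing `context_size` (0 gives a unigram model) shows how the amount of conditioning context alters the surprisal assigned to the same token.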
Findings of the 2016 Conference on Machine Translation
Proceedings of the First Conference on Machine Translation, pp. 131–198, Association for Computational Linguistics, Berlin, Germany, 2016.
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 960–970, Association for Computational Linguistics, 2016.
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 750–761, The COLING 2016 Organizing Committee, 2016.