Relating cognate words directly

Kimmo Koskenniemi
Department of Modern Languages
Faculty of Arts
University of Helsinki

Language relations are studied using cognate words, i.e. pairs of words that have the same or a similar meaning and whose pronunciations correspond in a systematic way. This talk presents some new methods and frameworks for such studies. They are not entirely new, but they are perhaps somewhat more objective and mechanical than mainstream approaches. The proposed framework is supported by readily available open source programs, which should make the approach attractive to other scholars.

The method is based on the same principles as the two-level model of morphology, i.e. parallel rather than sequential rules. Cognate words are related directly, phoneme by phoneme (or symbol by symbol). To match words of different lengths, the phonemes are aligned by inserting zeros where necessary, so that the symbols paired with each other are phonetically similar. The alignment is important because it defines the sound correspondences and hence the rules to be written.
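To make the alignment step concrete, here is a minimal sketch in plain Python. It is a generic weighted edit-distance alignment (dynamic programming in the Needleman-Wunsch style), not the HFST-based aligner mentioned in the talk. The zero symbol `0`, the vowel set, the costs and the function names `pair_cost` and `align` are all hypothetical choices made for illustration: identical symbols cost nothing, symbols of the same class (vowel or consonant) cost a little, and pairing a symbol with a zero costs more.

```python
# Minimal alignment sketch: pairs two cognate words symbol by symbol,
# inserting the zero symbol "0" where one word has no counterpart.
# The cost scheme is a hypothetical stand-in for phonetic similarity.

VOWELS = set("aeiouyäöáéíóúőű")
ZERO = "0"
INDEL = 2  # hypothetical cost of pairing a symbol with a zero


def pair_cost(a: str, b: str) -> int:
    """Hypothetical substitution cost: identical < same class < different class."""
    if a == b:
        return 0
    if (a in VOWELS) == (b in VOWELS):
        return 1  # vowel:vowel or consonant:consonant
    return 3      # vowel:consonant is strongly dispreferred


def align(x: str, y: str) -> list[tuple[str, str]]:
    """Return a minimum-cost alignment of x and y as a list of symbol pairs."""
    n, m = len(x), len(y)
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * INDEL
    for j in range(1, m + 1):
        cost[0][j] = j * INDEL
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + pair_cost(x[i - 1], y[j - 1]),
                cost[i - 1][j] + INDEL,  # x[i-1] : 0
                cost[i][j - 1] + INDEL,  # 0 : y[j-1]
            )
    # Trace back from the bottom-right corner to recover the symbol pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + pair_cost(x[i - 1], y[j - 1]):
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + INDEL:
            pairs.append((x[i - 1], ZERO))
            i -= 1
        else:
            pairs.append((ZERO, y[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


print(align("kala", "hal"))  # Finnish kala : Hungarian hal 'fish'
```

With these costs, Finnish *kala* and Hungarian *hal* align as k:h, a:a, l:l, a:0. The final vowel is paired with a zero rather than with a consonant, because the vowel:consonant penalty outweighs the cost of a zero. Real work would need a finer similarity measure than this two-class scheme.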

In order to relate two languages, an artificial language is built from the symbol pairs implied by the alignment. This artificial language makes it possible to study the regular patterns by which the two languages correspond to each other. The regularities can be described one by one using two-level rules, and each rule can be tested immediately. When all phoneme correspondences have been described, one can build a mapping between the two languages. One may even go further and map the artificial language into a proto-language by tentatively mapping each complex symbol of the artificial language into one or the other component of its pair. The mappings between the languages, the artificial language and the tentative proto-language can be combined using the Helsinki Finite-State Transducer Technology (HFST), which is free software.
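The idea of an artificial language of symbol pairs can be illustrated without any finite-state machinery. The following is a minimal sketch in plain Python that stands in for what the talk does with HFST transducers; it uses no HFST API. Each word of the artificial language is a list of `(upper, lower)` pairs, with `0` as the zero. Projecting onto either side recovers the words of the two languages, and a tentative proto-form is obtained by choosing one component of each pair. The function names, the data (two hand-aligned Finnish:Hungarian cognates, *kala:hal* 'fish' and *käsi:kéz* 'hand') and the toy rule are illustrative assumptions. The rule check only imitates the spirit of a two-level context restriction rule and is not a rule compiler.

```python
# Words of the "artificial language": sequences of (upper, lower) symbol pairs.
# "0" marks a zero. Data are hand-aligned and purely illustrative.
ZERO = "0"

Pair = tuple[str, str]
Word = list[Pair]

ARTIFICIAL: list[Word] = [
    [("k", "h"), ("a", "a"), ("l", "l"), ("a", ZERO)],  # kala : hal
    [("k", "k"), ("ä", "é"), ("s", "z"), ("i", ZERO)],  # käsi : kéz
]


def project(word: Word, side: int) -> str:
    """Project a pair string onto one language (0 = upper, 1 = lower), dropping zeros."""
    return "".join(p[side] for p in word if p[side] != ZERO)


def proto(word: Word, side_for: dict[Pair, int]) -> str:
    """Tentatively map each pair to one of its components (default: the upper one)."""
    chosen = (p[side_for.get(p, 0)] for p in word)
    return "".join(s for s in chosen if s != ZERO)


def restriction_holds(words: list[Word], pair: Pair, right_ok) -> bool:
    """Toy context check: `pair` may occur only word-initially and only
    before a pair for which right_ok(next_pair) is true."""
    for w in words:
        for i, p in enumerate(w):
            if p == pair:
                if i != 0 or i + 1 >= len(w) or not right_ok(w[i + 1]):
                    return False
    return True


BACK_VOWELS = set("aou")
print([project(w, 0) for w in ARTIFICIAL])  # upper side (Finnish)
print([project(w, 1) for w in ARTIFICIAL])  # lower side (Hungarian)
print(restriction_holds(ARTIFICIAL, ("k", "h"), lambda p: p[0] in BACK_VOWELS))
```

Projecting gives `kala, käsi` on the upper side and `hal, kéz` on the lower side. The toy rule "k:h only word-initially before a back vowel" holds for this tiny sample, but two words prove nothing. The point is only that a rule stated over pairs can be tested against the whole artificial language at once, which is what the finite-state tools do at scale.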

Automatic methods for phoneme alignment have been built using the HFST tools. The rules implied by the alignment are fairly simple, so automating rule discovery as well is a serious possibility. It should be kept in mind, however, that the methods presented in the talk rely heavily on carefully selected sets of cognate words; they are not intended for detecting language relations in masses of raw data.
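One simple first step towards automated rule discovery can be sketched in plain Python: tabulate, for every upper-language symbol, which lower-language symbols it corresponds to and in which neighbouring contexts. Symbols with more than one correspondence are the ones that need rules, and their recorded contexts are the raw material for stating them. This is an assumed, deliberately naive procedure, not the method described in the talk. The function names, the boundary marker `#` and the two hand-aligned example words are hypothetical.

```python
from collections import defaultdict

ZERO = "0"
BOUNDARY = ("#", "#")  # hypothetical word-boundary pair

# Hand-aligned illustrative data: kala : hal, käsi : kéz
DATA = [
    [("k", "h"), ("a", "a"), ("l", "l"), ("a", ZERO)],
    [("k", "k"), ("ä", "é"), ("s", "z"), ("i", ZERO)],
]


def correspondences(words):
    """Map each upper symbol to {lower symbol: [(left pair, right pair), ...]}."""
    table = defaultdict(lambda: defaultdict(list))
    for w in words:
        padded = [BOUNDARY] + w + [BOUNDARY]
        for i in range(1, len(padded) - 1):
            upper, lower = padded[i]
            table[upper][lower].append((padded[i - 1], padded[i + 1]))
    return table


def rule_candidates(words):
    """Upper symbols with more than one correspondence need a rule to choose among them."""
    table = correspondences(words)
    return {u: dict(lows) for u, lows in table.items() if len(lows) > 1}


for upper, lowers in rule_candidates(DATA).items():
    for lower, contexts in lowers.items():
        print(f"{upper}:{lower}", contexts)
```

On this sample, only `k` (k:h vs k:k) and `a` (a:a vs a:0) have alternative correspondences, so only they would need rules. Generalizing the listed contexts into rule contexts is the harder part, and that is where a carefully selected cognate set matters.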

Professor Emeritus Kimmo Koskenniemi began his studies in mathematics at the University of Helsinki. He was originally devoted to programming but soon moved to the Faculty of Arts, with the aim of combining general linguistics and computer science. Professor Koskenniemi, who developed the two-level morphological model used for identifying word forms, worked for over two decades as Professor of Computational Linguistics and Language Technology.
