Unsupervised Language Acquisition from Raw Speech
Reinhold Häb-Umbach - University of Paderborn
We consider the problem of segmenting an input sequence of symbols into recurring patterns. This is achieved with nonparametric Bayesian statistical models, in particular the nested Pitman-Yor process. We then consider the case where the input sequence is noisy, i.e., contains errors, and propose an iterative word segmentation algorithm. One application is automatic speech recognition for a language for which neither a pronunciation lexicon nor a language model is available. Results will be presented for an English task and, for the segmentation of noise-free input, for two Austronesian languages, Wooi and Waima’a.
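To make the segmentation task concrete, the following is a minimal sketch of splitting an unsegmented symbol string into words by dynamic programming. It is not the method of the talk: a fixed toy unigram lexicon stands in for the nested Pitman-Yor language model, and the names `segment`, `toy_logprob`, `LEXICON`, and the parameter `max_len` are hypothetical and chosen only for illustration.

```python
import math


def segment(symbols, word_logprob, max_len=6):
    """Return the most probable split of `symbols` into words.

    Assumes a unigram model: the score of a segmentation is the sum of
    the log-probabilities of its words. `max_len` bounds the word length.
    """
    n = len(symbols)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of symbols[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + word_logprob(symbols[start:end])
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow the back-pointers from the end of the sequence.
    words, end = [], n
    while end > 0:
        words.append(symbols[back[end]:end])
        end = back[end]
    return words[::-1]


# Hypothetical toy lexicon; a learned model would replace this table.
LEXICON = {"the": 0.4, "dog": 0.3, "barks": 0.3}


def toy_logprob(word):
    """Log-probability of a word; unknown words pay a per-symbol penalty."""
    if word in LEXICON:
        return math.log(LEXICON[word])
    return len(word) * math.log(1e-4)


print(segment("thedogbarks", toy_logprob))
```

In the unsupervised setting described above the word probabilities are not given in advance. They would have to be inferred jointly with the segmentation, which is the role the nonparametric Bayesian model plays.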
If you would like to meet with the speaker, please contact Dietrich Klakow.