Kielipankissa: Aku Rouhe

University of Helsinki

Kielipankki - The Language Bank of Finland offers a comprehensive set of resources, tools and services in a high-performance environment. Aku Rouhe tells us about his research on speech recognition.

(Image: Jasmine Gustafsson)

His current work includes, among other things, fine-tuning large language models that are optimized for Finnish and Nordic languages. These openly available LLMs have been created through successful academia-enterprise collaboration.

Who are you?

I am Aku Rouhe. For several years, I did research in the Aalto University Speech Recognition research group, and I defended my doctoral thesis there this past February. After Aalto, I moved to Silo AI (now owned by AMD), where I work with large language models (LLMs), so I have moved from speech to text. My interest in language also carries over into my free time, where I do creative writing.

What is your research topic?

In my doctoral thesis, I compared end-to-end models with more traditional multi-model decomposed systems. In recent years, both academia and commercial deployments in speech recognition have largely moved to end-to-end models. However, my work showed that multi-model decomposed systems remain a competitive alternative, for instance in terms of recognition accuracy. Indeed, the main advantage of end-to-end models is probably their simplicity.

End-to-end models often require vast training resources. Thus, it was important for me to study end-to-end models applied to under-resourced languages as well.

My current work at Silo is on fine-tuning large language models such as Poro and Viking, which are optimized for Finnish and Nordic languages. These LLMs were developed in a collaborative research project between Silo and TurkuNLP.

How is your research related to Kielipankki?

End-to-end models hunger for data, so large corpora are needed. I was involved in compiling the Aalto Finnish Parliament ASR Corpus 2008-2020, which consists of Finnish Parliament plenary session recordings, and also in the Lahjoita Puhetta project, where volunteers donated their speech to produce the Puhelahjat corpus. I got to combine both of these large speech corpora in an article that was published when I was finalizing my PhD, at a time when I was involved with the LAREINA project. Nowadays, the Finnish speech recognition resources are respectable for a language spoken by so few.

Recent publications

Rouhe, A., Grósz, T., Kurimo, M. 2024. Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 623-638.

Virkkunen, A., Rouhe, A., Phan, N. et al. 2023. Finnish parliament ASR corpus. Language Resources and Evaluation, vol. 57, pp. 1645-1670.

Moisio, A., Porjazovski, D., Rouhe, A. et al. 2023. Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks. Language Resources and Evaluation, vol. 57, pp. 1295-1327.

Rouhe, A., Virkkunen, A., Leinonen, J., Kurimo, M. 2022. Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0. Proc. Interspeech 2022, 3543-3547.
