Just like ChatGPT and other generative language models train on human texts to create grammatically correct sentences, a new modeling method by researchers at Penn State trains on recordings of birds to create accurate birdsongs. The results could improve understanding of the structure of birdsong and its underlying neurobiology, which could lend insight into the neural mechanisms of human language, the team said. A paper describing the research was recently published in the Journal of Neuroscience.
Much like how humans arrange words in a particular order to form a grammatically correct sentence, birds tend to sing sets of notes called syllables in a limited number of combinations.
"Although much simpler, the sequences of a bird's song syllables are organized in a similar way to human language, so birds provide a good model to explore the neurobiology of language," said Dezhe Jin, associate professor of physics in the Eberly College of Science and lead author of the paper.
For both humans and birds, finishing a sentence or song sequence often depends on what has already been said. For example, the phrase "flies like" could be part of an analogy as in the phrases "time flies like an arrow" or an indication of enjoyment as in "fruit flies like bananas." However, mixing and matching what comes after "flies like" results in "time flies like bananas" or "fruit flies like an arrow," which don't make sense. In this example, the phrase "flies like" has what researchers call context dependence.
"We know from our previous work that the songs of Bengalese finches also have context dependence," Jin said. "In this study, we developed a new statistical method to better quantify context dependence in individual birds and start to understand how it is wired in the brain."
The researchers analyzed previously recorded songs from six Bengalese finches, which sing about 7 to 15 syllables in each sequence. With the new method, the researchers can create the simplest models that accurately reflect the sequences that individual birds actually sing.
The models are similar to large language models in that they describe the probability that a particular word, or in this case syllable, will be followed by another, based on previously analyzed texts or song sequences. They are a type of Markov model, a method for modeling a chain of events, and can be drawn as a sort of flow chart: each syllable points to the syllables that could follow it, and the probability of each transition is indicated on the arrow between them.
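As a rough sketch of the idea, and not the authors' code, such a first-order Markov model can be estimated simply by counting which syllable follows which in a set of recorded songs; the syllable labels and songs below are invented for illustration.

# A minimal sketch (not the authors' code): estimating a first-order Markov
# model of syllable transitions by counting which syllable follows which.
# The syllable labels and songs are invented for illustration.
from collections import defaultdict

songs = [
    ["a", "b", "c", "d"],
    ["a", "b", "c", "c", "d"],
    ["a", "b", "e", "d"],
]

# Count how often each syllable is followed by each other syllable.
counts = defaultdict(lambda: defaultdict(int))
for song in songs:
    for current, nxt in zip(song, song[1:]):
        counts[current][nxt] += 1

# Convert counts to transition probabilities: the "arrows" in the flow chart.
transitions = {
    syllable: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
    for syllable, followers in counts.items()
}

print(transitions["c"])  # roughly {'d': 0.67, 'c': 0.33}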
"Basic Markov models are quite simple, but they tend to overgeneralize, meaning they might result in sequences that don't actually exist," Jin said. "Here, we used a specific type of model called a Partially Observable Markov Model that allowed us to incorporate context dependence, adding more connections to what syllables typically go together. The added complexity allows for more accurate models."
The researchers' new method creates a series of potential models that could describe an individual bird's song based on recorded sequences. They begin with the simplest model, using a statistical test to see if a potential model is accurate or if it overgeneralizes and produces sequences that do not actually exist. They work through more and more complex models until they determine the simplest model that accurately captures what the birds are singing. From this final model, the researchers can see which syllables have context dependence.
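The shape of that selection loop can be sketched as follows. This is not the authors' procedure, which fits partially observable Markov models and applies a proper statistical test; in this stand-in the complexity knob is simply the Markov order, and the "test" is a crude Monte Carlo check of whether the candidate model generates songs that were never recorded.

# Illustrative model-selection loop (not the authors' method): increase model
# complexity until the model no longer produces unattested song sequences.
import random
from collections import defaultdict

songs = [tuple("abcd"), tuple("abccd"), tuple("abed"), tuple("cbd")]  # invented data
observed = set(songs)

def fit_markov(sequences, order):
    """Estimate transition counts conditioned on the previous `order` syllables."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = ("<s>",) * order + seq + ("</s>",)
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts

def sample(model, order):
    """Generate one sequence from the fitted model."""
    context, out = ("<s>",) * order, []
    while True:
        followers = model[context]
        nxt = random.choices(list(followers), list(followers.values()))[0]
        if nxt == "</s>":
            return tuple(out)
        out.append(nxt)
        context = context[1:] + (nxt,)

def overgeneralizes(model, order, n_samples=2000):
    """Crude stand-in for the statistical test: does the model produce unattested songs?"""
    return any(sample(model, order) not in observed for _ in range(n_samples))

order = 1
while True:
    model = fit_markov(songs, order)
    if not overgeneralizes(model, order):
        break
    order += 1
print("Simplest adequate model order:", order)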
"All six birds we studied had context dependent syllable transitions, suggesting this is an important aspect of birdsong," Jin said. "However, the number of syllables with context dependence varied among the individual bids. This could be due to several factors, including aspects of the birds' brains, or, because these songs are learned, this could be related to the amount of context dependence in their tutor's songs."
To begin to understand the neurobiology behind context-dependent syllable transitions, the researchers also analyzed the songs of birds that could not hear.
"In these birds, we see a dramatic decrease in context dependence, which suggests that auditory feedback plays a large role in creating context dependence in the brain," Jin said. "The birds are listening to themselves and adjusting their song based on what they hear, and the related machinery in the brain likely plays a role in context dependence. In the future, we would also like to map neuron states to specific syllables. Our study suggests that, even when a bird is singing the same syllable, different sets of neurons might be active."
The researchers said that their new method provides a more automated and robust way to analyze not only birdsong but also other animal vocalizations and even behavioral sequences.
"We actually used this method with the English language and were able to generate text that is mostly grammatical," Jin said. "Of course, we're not trying to create a new generative language model, but it is interesting that the same kind of model can handle both birdsong and human language. Perhaps the underlying neural mechanism is similar too. Many philosophers describe human language, and especially grammar, as exceptional, but if this model can create language-like sentences, and if the neural mechanisms behind birdsong and human language are indeed similar, you can't help but wonder if our language really is so unique."
In addition to Jin, the research team included Jiali Lu, who earned a doctoral degree in physics at Penn State in 2023; Sumithra Surendralal, who earned a doctoral degree at Penn State in 2016 and is now at Symbiosis International (Deemed University) in India; and Kristofer Bouchard at the Lawrence Berkeley National Laboratory. Funding from the U.S. National Science Foundation supported this research.