Is human language efficient?
- January 21, 2020
- UCI language scientist uses machine learning to put our common word order patterns to the test; findings are published in the Proceedings of the National Academy of Sciences
“I wrote a letter to a friend.” The words in this sentence, whether in English, Spanish, Arabic, French, German or Mandarin, are deliberately ordered to maximize information transfer while minimizing complexity, says UCI language science assistant professor Richard Futrell. “When translated into another language – say Japanese, Korean or Turkish – the word order changes, but still adheres to a pattern that maximizes efficiency,” he says.
A linguistics and data science researcher, he’s spent a great deal of time studying language universals – word order arrangements that occur systematically across languages. Futrell holds that these patterns show up in language after language because of a balancing act between communicative and cognitive pressures that plays out during our everyday conversations and social interactions.
Using a massive dataset of 11.7 million words in 700,000 annotated sentences across 51 languages, collected by the Universal Dependencies project, Futrell and researchers at Stanford University deployed machine learning to test a theory that explains efficient word order universals – something that hasn’t previously been done with this level of precision.
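To make the idea of measuring word-order efficiency concrete, here is a toy sketch of one metric widely used in this literature: total dependency length, the summed distance between each word and its syntactic head in a dependency parse. This is an illustration only, not the authors’ actual model, and the simplified CoNLL-U-style fragment below is hand-made for the example sentence, not taken from the corpus.

```python
# Toy illustration of one word-order efficiency metric: total dependency length.
# Each tab-separated line is a simplified CoNLL-U row: position, word, head position.
# A head of 0 marks the sentence root. This parse is a hand-made example.
conllu = """\
1\tI\t2
2\twrote\t0
3\ta\t4
4\tletter\t2
5\tto\t7
6\ta\t7
7\tfriend\t2
"""

def dependency_length(conllu_text):
    """Sum |word position - head position| over all non-root words."""
    total = 0
    for line in conllu_text.strip().splitlines():
        idx, _form, head = line.split("\t")
        if int(head) != 0:  # skip the root, which has no head
            total += abs(int(idx) - int(head))
    return total

print(dependency_length(conllu))  # prints 12
```

Under this kind of metric, a grammar is more efficient when its word-order rules tend to keep syntactically related words close together; comparing attested orders against alternatives is one way such efficiency claims are tested.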
Their resulting mathematical model, explained in the Jan. 21 issue of the Proceedings of the National Academy of Sciences, successfully predicted eight of the common language universals.
“When you find the optimal grammar for efficiency, it looks like a real human language,” he says. “Establishing a model that works this way is a first in linguistics.”
His next step in this work will be to apply the same method to explain more of the universal patterns found across languages.
“The goal of linguistics is to explain what all languages have in common and how they’re different,” he says. “Understanding the theory behind language helps in fields from philosophy, to second language learning, to human language technologies, because it frames our understanding of what language is and how it can be processed.”
Futrell began his faculty appointment at UCI in 2018 following a postdoctoral fellowship at the Massachusetts Institute of Technology, where he earned his Ph.D. in cognitive sciences. His co-authors on this project include Stanford University graduate student Michael Hahn and professor Dan Jurafsky.