Project Overview
The basic goal of our research is to understand the computational foundations of how humans extract information from language. If humans, especially young humans, are extracting information about the linguistic system underlying the observable language data, we tend to call this process language acquisition. We also tend to marvel at how good young humans are at it. If humans are extracting information about the world from the observable language data, we tend to call this process information extraction (or perhaps comprehension). When we try to make machines extract this same information from language data, we tend to marvel at how good humans are at it.
One reason humans may be so good at doing this is that they may have a proverbial leg up, in the form of biases about how to use the available language data. Computational models give us a way to explore this question precisely by combining discrete hypotheses with probabilistic methods. Via computational modeling, we can examine what biases (both helpful and perhaps not-so-helpful) humans bring to different information extraction tasks, whether these biases are necessary for success, and what the nature of the necessary biases is.
Some specific areas of research
Language Acquisition
Language acquisition (that is, learning the underlying linguistic systems responsible for the observable language data) is a classically difficult problem. This is particularly true when the correct linguistic systems are underdetermined by the available data, a situation often referred to as "poverty of the stimulus" or "the inductive problem of language learning". Research in this area explores how children acquire the knowledge of their language that they do from the data they actually encounter, and what they need in order to do it. Projects include studies of the acquisition of the basic word order of objects and verbs, referential elements, stress contours, syntactic islands, and free relative clauses.
This work has been supported by the National Science Foundation, grant BCS-0843896, in collaboration with Jon Sprouse. A project summary is here. Check out the results of this grant. Also, check out the Input & Syntactic Acquisition 2009 workshop held at UC Irvine, and the Input & Syntactic Acquisition 2012 workshop held at the 2012 annual meeting of the Linguistic Society of America in Portland, Oregon.
Models of Acquirability
A partially intersecting line of research focuses on language learning models that are constrained in the ways that humans are for accomplishing a particular acquisition task. That is, instead of asking, "Can it be learned at all by a model?", these models ask, "Can it be learned by a model that uses the input humans use in the way that humans use that input?" Often, this may involve adapting more general models of learnability so that they are models of "acquirability". Projects include studies of word segmentation and the acquisition of referential elements.
Linguistic Cues to Information about the World
Beyond comprehending the straightforward meaning of language data (which isn't necessarily so straightforward to comprehend), people also seem able to extract information about social relationships (such as power dynamics) and intentions (such as the intent to hide information). Moreover, they seem to be able to do this just by looking at the language itself (such as text in an email), in the absence of other information such as facial expressions and auditory cues. Since the only information available is the language, humans must be using linguistic cues to do so. This area of research focuses on what these cues are, whether there are additional informative cues available, and how machines can learn to be as good as humans are (or perhaps even better one day). Projects include studies of deception detection and social goal indicators.
This work is supported by the UC Irvine Academic Senate Council on Research, Computing, and Libraries, multi-investigator research grant MI 14B-2009-2010, in collaboration with Mark Steyvers and Padhraic Smyth. A short paper talking about this kind of project is here.