Graduate Program in Linguistics at the City University of New York

Abstract for Xuan-Nga Cao Kam's talk

Can simple statistical learners acquire auxiliary inversion?
Xuan-Nga Cao Kam (CUNY Graduate Center)
February 20, 2007 (Tuesday)
6:30 PM - 8:00 PM; Room 7102, The CUNY Graduate Center

Successful outcomes have been reported for the acquisition of aspects of natural language syntax by very simple statistical models. Reali & Christiansen (2005) demonstrated that a bigram model trained on natural child-directed speech can discriminate grammatical from ungrammatical auxiliary inversion in complex sentences (e.g., Is the little boy who is crying hurt? / *Is the little boy who crying is hurt?). Subsequent work by Kam et al. (2005) revealed that the bigram model's success was limited to a very narrow class of examples. Discrimination was poor for other instances of auxiliary inversion, e.g., with object-gap relative clauses (Is the wagon your sister is pushing red?) or with do-support (Does the boy who plays the drum want a cookie?). This paper reports a series of experiments in which the training corpus and the bigram model were incrementally enriched in order to identify the point at which the model attains a more general knowledge of auxiliary inversion. Results for object-gap and do-support questions showed less benefit than expected from upgrading the training corpus to speech directed to an older child, or from a tenfold increase in corpus size. Similarly modest gains were observed when syntactic category information was provided by replacing either content words or all words with part-of-speech labels. A trigram model did not perform better than the bigram model. It is concluded that the search for a lower bound on the resources necessary for natural language syntax acquisition must continue in future research.
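
For readers unfamiliar with the setup, the discrimination task described in the abstract can be illustrated with a minimal sketch: a bigram model is estimated from a corpus, and the version of a question to which it assigns the higher probability is taken as its choice. Everything specific here is an assumption made for illustration and is not drawn from the studies cited: the four-sentence TOY_CORPUS is invented, add-one smoothing stands in for whatever estimation method the original experiments used, and the function names are hypothetical.

```python
import math
from collections import Counter

# Invented toy corpus (assumption): NOT the child-directed speech used in the studies.
TOY_CORPUS = [
    "the boy who is crying wants a cookie",
    "is the little boy hurt",
    "the girl who is crying is sad",
    "is the wagon red",
]

def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence-boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram(sentences):
    """Count unigrams and adjacent word pairs over the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Sentence log-probability under the bigram model.

    Add-one smoothing is an assumption of this sketch, chosen for brevity.
    """
    tokens = tokenize(sentence)
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

def prefers_first(first, second, unigrams, bigrams):
    """True if the model assigns the first sentence the higher probability."""
    return log_prob(first, unigrams, bigrams) > log_prob(second, unigrams, bigrams)

unigrams, bigrams = train_bigram(TOY_CORPUS)
grammatical = "is the little boy who is crying hurt"
ungrammatical = "is the little boy who crying is hurt"
print(prefers_first(grammatical, ungrammatical, unigrams, bigrams))
```

On this toy corpus the grammatical version wins because the pairs "who is" and "is crying" are attested while "who crying" is not. The same comparison can be run on object-gap or do-support question pairs; this sketch makes no claim about how such pairs fare on real child-directed speech.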