Deacon, Pinker, and Parsimony
I.
A child must achieve competence with an infinite language based on a finite number of heard sentences. This is the essence of Noam Chomsky's "poverty of the stimulus" argument. As originally presented, it made a case for nativism, forcing empiricist theories to explain how such competence is achievable. In Language Learnability and Language Development, Steven Pinker uses learnability both as a challenge to theories of language acquisition and as a heuristic for evaluating them. Terrence Deacon, in The Symbolic Species, while dismissive of most of Chomskyan linguistics, still sees the learnability problem as a challenge to any theory that hopes to explain human linguistic knowledge. I will begin by scrutinizing Pinker's and Deacon's similar responses to the learnability problem, and will then examine Pinker's learnability criterion, which counters Deacon's aims.
II.
The views of Deacon and Pinker overlap to a surprising extent. Some of this shared ground, actual and potential, is to be found in their views on learnability. Both feel compelled to respond to the formal version of the poverty of the stimulus argument, as originally stated in a mathematical learning theory paper by E. Mark Gold. Gold proved that without feedback about hypothesized grammars or examples of ungrammatical constructions, it is impossible to inductively learn the unique grammar that produces an infinite language, even with a learning algorithm that remembers all observed data (Gold 1967).
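The flavor of Gold's result can be seen in a toy case. The sketch below is an illustration, not Gold's proof. It assumes a hypothetical one-symbol language family in which L_n contains the strings a, aa, and so on up to length n, together with the infinite language containing all of them, and it represents each string by its length. The learner, the example texts, and all function names are invented for the illustration. A learner that always guesses the smallest language consistent with the data settles on the right answer for any finite L_n, but on a text for the infinite language it must revise its guess every time a longer string arrives.

```python
# Toy illustration of identification in the limit from positive data only.
# Assumption: strings over one symbol are represented by their lengths, and
# the hypothesis "n" stands for the finite language L_n = {a^1, ..., a^n}.

def conservative_learner(seen):
    """Guess the smallest L_n consistent with every example seen so far."""
    return max(seen)

def run(learner, text):
    """Present the examples one at a time and record the guess after each."""
    seen, guesses = [], []
    for example in text:
        seen.append(example)
        guesses.append(learner(seen))
    return guesses

def mind_changes(guesses):
    """Count how often the learner abandons its previous guess."""
    return sum(1 for prev, cur in zip(guesses, guesses[1:]) if prev != cur)

# A text for the finite language L_3: only strings of L_3 ever appear.
finite_text = [1, 3, 2, 3, 1, 2, 3, 3, 1, 2]
# A prefix of a text for the infinite language: strings keep getting longer.
growing_text = list(range(1, 11))

finite_guesses = run(conservative_learner, finite_text)
growing_guesses = run(conservative_learner, growing_text)

print(finite_guesses)                 # the guess stabilizes once "aaa" is seen
print(mind_changes(growing_guesses))  # one revision per new, longer string
```

A learner that instead guessed the infinite language from the start would be right on the growing text, but without negative evidence it could never retreat to L_3. No single learner escapes both failures, which is the dilemma that Gold's theorem formalizes.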
A number of responses to Gold's theorem have been offered. Gold himself proposed a likely solution: that "the class of possible natural languages is much smaller than one would expect from our present model of syntax" (Gold, p. 453). Wexler and Culicover's degree-2 learning theory follows this intuition and, with some motivation from linguistic evidence, constrains transformations to operate on at most two adjacent levels of deep structure (Wexler and Culicover 1980). They also forbid transformations from applying to already-transformed data. These two constraints strongly limit the context-freeness of language. As a result, only a finite number of sentences are allowed at a given level of nesting; after enough time, the learner can conclude that all unseen sentences of that depth are ungrammatical. These strong constraints are not arbitrary: Wexler and Culicover found that most proposed transformations fit the two conditions. Chomsky also concluded independently, from examining linguistic data, that a similar constraint, the "subjacency principle," was operating (Pinker 1979).
Pinker, too, concludes that the scope of possible languages is limited: "Children acquire language by exploiting rich formal and substantive constraints on the types of rules that languages may have...[the child] entertains a small subset of the hypotheses consistent with the data" (1984, p. 358). On his account, constraints on word order and phrase structure geometry (e.g. subjacency) aid the learning of grammars (1984, p. 358).
Deacon's response to Gold's argument does not make any specific claims about the set of possible grammars. However, according to Deacon's theory, one would expect to find a plenitude of simplifying linguistic universals arising from the coevolution of language with the strongly biased learning process. These universals would be responses to "limiting constraints on human attention, working memory, sound production, and automation of functions" (Deacon 120). Although he explains the case of sound production and speech segmentation in detail, he does not elaborate on the biases in the syntactic processes. The models of Wexler and Culicover, and of Pinker, are a rich source of such universals, although they do not account for the evolution of said universals. Deacon also implies that features of the learning process simplify the task:
A critical factor in the argument is the way that learning is understood. Learning is construed in its most generic sense as logical induction...Learning is not one process, but the outcome of many. The efficiency with which one learns something depends in part on the match between the learning process and the structure of the patterns to be learned. (Deacon 127-8)
The problem is that any learning process that does something other than induce (in the strict sense) will reach incorrect conclusions about the grammar being learned. Since most humans are competent far beyond what can be learned by simple induction (given the poverty of the stimulus), such appeals to model-based learning processes require statistical regularities that a biased learning process can exploit. Deacon sounds like he has something else in mind: "inductively deriving grammatical rules isn't the only way to arrive at this competence without relying on preformed hypotheses" (128). I believe that by "preformed hypotheses" he means only something like the Universal Grammar. He apparently does not consider the biases in the learning process to be "preformed hypotheses," although in effect, they are.
Deacon's other response to Gold's theorem is that it has been empirically disproven. As proof, he cites Jeff Elman's connectionist model of grammar learning, which learns to predict the category of the next word in a sentence, thus embodying grammatical regularities:
This simulation thus demonstrated that it was possible to design a device that could learn to predict grammatically correct sentence structure in a purely inductive fashion, given nothing more than a corpus of positive examples of allowable texts-exactly what the UG theorists had said was impossible. The key...was to structure the learning process differently at different stages of learning. (Deacon 134)
Deacon overstates Elman's claim when he says that the learning process needed "nothing more than a corpus of positive examples." Elman's network does not succeed because of some novel learning process; Gold's theorem shows that even the most powerful learning process possible cannot solve the task, and neural network research has shown that no uniformly superior processes exist. Elman's learning process begins training the network with a high level of noise in "working memory." This noise is gradually reduced, which inherently biases the learning process towards larger-scale generalizations and against directly sequential relationships. The process embodies domain assumptions, which are helpful in learning problems on certain error landscapes and harmful on other error landscapes.
Elman's network was also trained on a microdomain, with a limited number of words and grammatical classes. One might therefore question the scalability of his results, but it is reasonable to assume that they will scale unless proved otherwise. And while Elman did show that his network could not learn the task without the noisy learning process, this does not prove that the task is like achieving human grammatical competence. There are many nonlinguistic tasks where having an initially noisy network and then performing an annealing process is useful.
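The kind of annealing at issue can be sketched in a few lines. This is not Elman's implementation: the linear schedule, the Gaussian noise, and the names noise_scale and noisy_context are all assumptions made for the illustration. The sketch shows only how noise added to a network's context ("working memory") units might be reduced to zero over the course of training.

```python
import random

def noise_scale(epoch, total_epochs, start=1.0):
    """Linearly decay the noise level from `start` at epoch 0 to 0 at the end."""
    remaining = 1.0 - epoch / total_epochs
    return start * max(0.0, remaining)

def noisy_context(context, scale, rng):
    """Return a copy of the context activations with Gaussian noise added."""
    return [value + rng.gauss(0.0, scale) for value in context]

rng = random.Random(0)        # fixed seed so the sketch is repeatable
context = [0.2, -0.5, 0.9]    # hypothetical context-unit activations
total_epochs = 10

for epoch in range(total_epochs + 1):
    scale = noise_scale(epoch, total_epochs)
    perturbed = noisy_context(context, scale, rng)
    # A real training loop would feed `perturbed` back into the network here.
```

Early in training the perturbation swamps fine-grained sequential detail; by the final epoch the context is passed through unchanged. Nothing in the schedule is specific to language, which is why its usefulness depends on the error landscape of the task.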
III.
A major difference between Pinker and Deacon is their view of the learning process. This has led some to cast Deacon's book as a reopening of the debate between Chomsky and Piaget, between nativism and empiricism. In these somewhat misleading terms, the issue is the developmental process of language: whether it follows the same innate rules from the beginning (Chomsky, Pinker), or whether it is a staged learning process (Piaget, Deacon). However, the ground of the debate has shifted. Deacon does propose less innate brain complexity, but given the credibility he gives to the learnability argument and his advocacy of biased learning, he is far from being an empiricist. The new question is what kinds of innate knowledge are required for language learning.
To take one example, Pinker argues that "the data on language acquisition provide little support for" the hypothesis of maturational changes in language acquisition, while Deacon argues that "critical periods" are such changes. Pinker, however, turns his judgment into a heuristic for scientific investigation, ruling out Piagetian theories by fiat and insisting that Chomskyan theories are inherently more parsimonious. If Pinker's move is accepted, it becomes problematic for Deacon, since he postulates maturational changes in order to compensate for the comparative lack of innate grammatical knowledge. I will argue that Pinker's heuristic (although it may once have been a useful corrective to misguided developmental linguistics) is not fruitful.
Since we know that language is learnable, Pinker argues that investigations should begin from the perspective of adult competence and work backwards: "the null hypothesis in developmental psychology is that the cognitive mechanisms of children and adults are identical" (1984, p. 7). His "continuity assumption" states:
The most explanatory theory will posit the fewest developmental changes in the mechanisms of the [linguistic] virtual machine, attributing developmental changes, when necessary, to increases in the child's knowledge base, increasing access of computational procedures to the knowledge base, and quantitative changes in parameters like the size of working memory. (1984, pp. 6-7)
He borrows the word "explanatory" from Chomsky's criterion of "explanatory adequacy." Chomsky uses this criterion to reject theories that cannot possibly explain how humans can develop linguistic competence, for example non-compositional semantic theories, which would require infinite knowledge. Pinker's usage is not standard: exactly what does it mean to call one theory "most explanatory"? He means that a theory of language acquisition asserting discontinuities (for example, separate grammars for each stage) unsatisfactorily ignores the question of how a child makes the transition from one stage to another. A continuous theory explains language acquisition better. So "the fewer the mechanisms, the more parsimonious the theory and the more explanatory its accounts" (1984, p. 6).
The first problem with the continuity assumption is that it is bound to the mind-as-computer metaphor. The metaphor is evident in his idea of what changes to permit: he allows changes to the knowledge base, changes in working memory, and changes in access to the knowledge base. The first and second are reasonable, although the computational metaphor is far too vague about how the knowledge base and working memory are realized in the neural substrate. The third change, in access to the knowledge base, is more problematic. If "access" is defined by the speed of lookup, then it is highly unlikely to change in the direction desired: as more information enters the knowledge base, retrieval will be slower, not quicker. If access is the amount of information that can be retrieved at once, that too seems unlikely to increase, since nearly all of the neural pathways within the cortex have been fixed by the time of a child's first words.
The problem is that all of the changes that Pinker permits are "software" changes. Under the good old-fashioned AI (GOFAI) model of cognitive science, the linguistic programming is the "software" running on the brain's "hardware." Connectionist models question this distinction, arguing that the software is the hardware: the neural wiring is self-modifying. In one network that learns a small though non-trivial grammar (Elman et al., p. 342), the rules of grammar are represented by the connection weights. In order to learn the grammar, Elman's network must initially be subjected to noisy connections, preventing it from overfitting the data. From a connectionist viewpoint, postulating a relationship between maturation and perturbing noise is more parsimonious than postulating changes in access to the knowledge base. To call the noisy connections "hardware" or "software" is to miss the point of connectionism, but Pinker ignores the possibility that these changes might be simpler than "software" changes: he calls this argument "possible" but "unparsimonious" (1984, p. 8).
Can the continuity assumption be extricated from the mind-as-computer metaphor? It may be convenient for linguists to see their grammars as explicitly constituting the mechanisms of syntactic processing, but given the structure of the brain, the rules are more likely represented implicitly. And it is difficult to see which grammatical processes are more parsimonious at the neural level. Certainly, a view proposing changes in the neural configuration is less parsimonious than one proposing changes in synaptic connection strengths; but we do not need a heuristic to tell us this.
The second problem with Pinker's continuity assumption is his one-sided use of parsimony. There are real problems with competing parsimony claims, which lead me to believe that the continuity assumption should be rejected. By claiming that theories of the specified explanatory form are more parsimonious, Pinker aims to exclude undesirable speculation about grammar change from developmental psycholinguistics. Finding the minimal theory compatible with observations is a serious problem for the linguistics of adult languages. In the case of child language, grammars are especially underconstrained. Pinker is rightly skeptical of attempts to produce "child grammars" without apparent connection to adult grammars. However, excluding grammar changes by default is an unsound practice, and it clashes with other parsimony arguments. Elman's model shows that a parsimonious assumption in a connectionist model may correspond to an unparsimonious assumption in a GOFAI model, one that amounts to a grammar change as the system shifts from overregularization to correct conjugation. The reverse is also true: using a stack in working memory during parsing operations is parsimonious from a GOFAI point of view, but is not so from a connectionist point of view (because recursive structures are difficult to implement). Which paradigm one leans toward is not the point. Since both the symbolic and sub-symbolic levels of description fit certain sets of data, their incompatibilities must be reconciled, not ignored a priori.
IV.
Pinker and Deacon offer different accounts of language evolution, which have implications for the parsimony debate. Their theories about the origins of language are for the most part compatible: for example, each speculates that nonlinguistic symbolic reference may underlie the symbolic capacities of language (Pinker & Bloom, p. 478). Their rhetoric makes their viewpoints seem further apart: Deacon suggests that talk of a "Language Instinct" is misleading, because it encourages uncritical and unrealistic innatism, such as proposals of Fodorian language modules (in Fodor's strong sense of modularity). The "Language Instinct," according to Deacon, "tends to be interpreted in terms of a false dichotomy that has deeply confused research into the basis for language. It is misleading to imagine that what is innate in our language abilities is anything like foreknowledge of language or its structures" (p. 141). He proposes instead "general learning biases" and drops the word "instinct."
A real difference between the two is that Deacon, apparently unlike Pinker, proposes that apes are capable of symbolic reference, which reduces the sharp discontinuity between human language and ape communication. This is sensible from a cladistic point of view; one would expect one species to share characters with another slightly divergent species. The same type of argument has been used against other anthropocentrisms: by De Waal, who argues for a moral sense in apes, and by Wrangham, who proposes a common innate basis for lethal raiding and group violence in humans and apes. Whether or not the Kanzi data shows real language-like skill, if it is at all persuasive, then Pinker is losing the biological parsimony battle.
Secondly, Deacon postulates fewer genetic changes than Pinker. Because Pinker does not make use of the expediting force of language/brain coevolution, he must propose a much more complex language acquisition device (LAD). It encodes the principles of Chomsky's Universal Grammar, and its task is to acquire the appropriate parameters. Developmental biology cannot yet tell us whether a LAD of such complexity is possible or not. Given what we know about genetic regulation, postulating such a LAD is less parsimonious than Deacon's postulated biases.
Also, since a Pinkerian LAD would require the evolution of a very complex trait over just three million years, one would expect the genetic load of all these changes to be very high. Given that they had to have been selected simultaneously with other human adaptations, such as bipedalism and the vocal system, the chance of so much simultaneously successful selection is low. Therefore, the LAD is not parsimonious from the point of view of theoretical population biology. Pinker offers an explanation in terms of small selective advantages (he ignores genetic drift), and accuses his opponents of believing what Dawkins calls the "Argument from Personal Incredulity." These arguments betray Pinker's lack of a reasonable evolutionary scenario. Since Deacon explains his biases in terms of the standard learning processes and the preexisting symbolic system, he can offer an allometric explanation, one that does not require too much selective force, and which could more conceivably be governed by genetic regulation.
V.
The learnability problem has been productive in stimulating research, as the work of Elman, Wexler and Culicover, Newport, Deacon and others has shown. It is useful because the formal argument can be characterized as a challenge to theories of learning, rather than a disproof of learning's possibility. In contrast, Pinker's continuity assumption is poorly founded, relying on a monolithic definition of parsimony and ignoring competing parsimony claims. It should not be used to prop up models of human linguistic competence which are, to say the least, implausible from the evolutionary and neurobiological points of view.
Bibliography
Deacon, Terrence W. (1997). The Symbolic Species. New York: Norton.
Elman, Jeffrey L., E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi, K. Plunkett (1996). Rethinking Innateness: A Connectionist Perspective on Development. Cambridge, MA: MIT Press.
Gold, E. Mark (1967). Language Identification in the Limit. Information and Control 10: 447-474.
Pinker, Steven (1979). Formal models of language learning. Cognition 7: 217-283.
Pinker, Steven (1984). Language Learnability and Language Development. Cambridge, MA: Harvard UP.
Pinker, Steven and Bloom, Paul (1992). Natural Language and Natural Selection. In The Adapted Mind, eds. J.H. Barkow, L. Cosmides, J. Tooby. Oxford: Oxford UP, pp. 451-493.
Wexler, Kenneth, and Culicover, Peter W. (1980). Formal Principles of Language Acquisition. Cambridge, MA: MIT Press.