1.  FITCH FIRST ROUND


Introduction

Darwin's "Origin of Species" (Darwin, 1859) made little mention of human evolution. This initial avoidance of human evolution was no oversight, but rather a carefully calculated move: Darwin was well aware of the widespread resistance his theory would meet from scientists, clergymen, and the lay public, and mention of human evolution might have generated insuperable opposition. But Darwin's many opponents quickly seized on the human mind, and language in particular, as a potent weapon in the battle against Darwin's new way of thinking. Alfred Wallace, whose independent discovery of the principle of natural selection spurred Darwin into finally publishing his long-developing "outline" of the theory in 1859, didn't help by arguing that natural selection was unable to explain the origins of the human mind. Although Wallace had reservations about all evolutionary approaches to the mind, human language provided the most powerful argument, due to the respectable position of linguistics and philology in Victorian science.

Darwin's most formidable foe on the linguistic front was Friedrich Max Müller, professor of linguistics at Oxford University, a very well-known and well-respected scholar (Stam, 1976). In his "Lectures on the science of language," delivered at the Royal Institution of Great Britain in 1861, and rapidly published thereafter (Müller, 1861), Müller launched a full frontal attack on Darwin and Darwinism, using his credentials in the "science of language" as a powerful bludgeon. Müller's position was uncomplicated: "language is the Rubicon which divides man from beast, and no animal will ever cross it … the science of language will yet enable us to withstand the extreme theories of the Darwinians, and to draw a hard and fast line between man and brute." For Müller, "Language" was the key feature distinguishing humans from all animals. Müller's arguments were seen by many as convincing: his student Noiré dubbed him "the Darwin of the mind" and considered Müller to be "the only equal, not to say superior, antagonist, who has entered the arena against Darwin" (p. 73, Noiré, 1917). Müller's argument about the unbridgeable, qualitative difference between human language and all forms of animal communication, combined with Wallace's opinions, provided arguments that Darwin by necessity took very seriously.

Thus, when Darwin finally broached the subject of human evolution in 1871, in his second great book "The Descent of Man and Selection in Relation to Sex," the need to provide a credible explanation of language evolution was a central concern. Darwin rose to the challenge: his "musical protolanguage" model represents a powerful marriage of comparative data, evolutionary insight, and a biological perspective on language. Darwin's view of language was ahead of its time, and his model and arguments remain surprisingly relevant to contemporary debates. He clearly adopted a "multicomponent" view of language, one that recognized the necessity of several distinct mechanisms to produce the complex product that we now call language, rather than privileging any one factor as the single "key" to Language in a monolithic sense. Among these several components, he presciently recognized the necessity for complex vocal learning, and recognized that this biological capacity, while unusual among mammals, is shared with many birds. The importance of vocal learning has often been forgotten, but also frequently reaffirmed by later scholars (Egnor & Hauser, 2004; Fitch, 2000; Janik & Slater, 1997; Marler, 1976; Nottebohm, 1976).

Darwin also adopted an empirical, data-driven approach to the problem at hand. In particular, Darwin exploited a wide comparative database, drawing not just on his knowledge of nonhuman primate behaviour, but also on insights from many other vertebrates. Finally, and most characteristically, he resisted any special pleading about human evolution. He intended his model of human evolution to fit within, and remain consistent with, a broader theory of evolution that applies to beetles, flowers and birds. Unlike Wallace, who remained a human exceptionalist to his death (Wallace, 1905), Darwin aimed to uncover general principles, like sexual selection and shifts of function, to provide explanations of unusual or unique human traits. While gradualistic, his model does not assume any simple continuity of function between nonhuman primate calls and language, and he clearly recognized the uniqueness of language in our species. In many ways, then, Darwin's model of language evolution finds a natural place in the landscape of contemporary debate concerning language evolution, and it is surprising that his model has received relatively little detailed consideration in the modern literature (for exceptions see Donald, 1991; Fitch, 2006).

In this essay, I aim to redress this neglect by considering Darwin's model of language evolution in detail. After discussing Darwin's main points and arguments, I will briefly review additional data supporting Darwin's model that has appeared since his death. I will also discuss the issue of meaning, about which Darwin had too little to say, but which can be resolved by the addition of a hypothesis due to Jespersen (1922). My conclusion is that, suitably modified in the light of contemporary understanding, Darwin's model of language evolution, based on a "protolanguage" more musical than linguistic, provides one of the most convincing frameworks available for understanding language evolution. The timing of my writing, on the 150th anniversary of the Origin, and the 200th of Darwin's birth, is also appropriate for a revival of interest in Darwin's compelling and well-supported hypothesis.

Language as an "Instinct to Learn"

Chapter Two of the Descent of Man, entitled "Comparison of the mental powers of man and the lower animals," is one of the most remarkable in the entire Darwinian corpus, noteworthy for its concision and its breadth of argument in considering the evolution of the human mind. The first half of the chapter lays the groundwork of modern research in comparative cognition, arguing that animals have emotions, attention, and memory, as well as many other mental traits, in common with humans. However, Darwin's opponents, notably Müller, had already ceded the point that animals have memory, experience emotions, and so on. Language was the key issue, and one can imagine considerable anticipation of both pro- and anti-Darwinian readers as they turned to the section simply titled "Language".

In ten densely-argued pages, Darwin considers some theoretical preliminaries, and then lays out his theory of language evolution. The first stage involved a general increase in intelligence and complex mental abilities, and the second involved a sexually-selected attainment of the specific capacity for complex vocal control: singing. The third stage was the addition of meaning to the "songs" of the second stage, which was both driven by, and in turn fueled, further increases in intelligence.

Theoretically, Darwin makes a number of important observations. First, he recognizes the crucial distinction between the language faculty (the biological capacity which enables humans to acquire language) and particular languages (like Latin or English). The former capacity, which Darwin refers to as "an instinctive tendency to acquire an art" (p 56), is shared by all members of the human species. Darwin neatly bypasses the unproductive nature/nurture debate that has consumed so much scholarly energy by observing that language "is not a true instinct, as every language has to be learnt. It differs, however, from all ordinary arts, for man has an instinctive tendency to speak, as we see in the babble of our young children" (p 55). As ethologist Peter Marler has put it, language is not an instinct, but an "instinct to learn" whose expression entails that both biological and environmental preconditions be fulfilled. It is this "instinct to learn" for which a biological, evolutionary explanation must be sought: a thoroughly modern perspective.

Second, although he was well-aware of the peculiarities of the human vocal tract, Darwin argues that the human capacity for language must be sought in the brain, rather than the peripheral vocal tract. He acknowledges that "articulate speech" (by which he means vocalization augmented by controlled movement of the lips and tongue, p. 59) is "peculiar to man", but he denies that this mere power of articulation suffices to distinguish human language "for as every one knows, parrots can talk." Instead, Darwin states that it is not speech, but humans' "large power of connecting definite sounds with definite ideas" that is definitive of language, and that this capacity "obviously depends on the development of the mental faculties" (p. 54). By locating the language capacity in the human brain, Darwin's viewpoint is again thoroughly modern.

Finally, Darwin recognized the relevance to language evolution of birdsong, which he considered the "nearest analogy to language". Like humans, birds have fully instinctive calls, and an instinct to sing. But the songs themselves are learned. He recognized the parallel between infant babbling and songbird "subsong", and recognized the key fact that cultural transmission ensures the formation of regional dialects in both birdsong and speech. Finally, he recognizes that physiology is not enough for learned song: crows have a syrinx as complex as a nightingale's but use it only in unmusical croaking. All of these parallels have been amply confirmed, and further explored, by modern researchers (Doupe & Kuhl, 1999; Marler, 1970; Nottebohm, 1972, 1975).

Darwin's "Musical Protolanguage" Hypothesis

Darwin's model of the phylogenesis of the language faculty, like most models today, posits that different aspects of language were acquired sequentially, in a particular order, and under the influence of distinguishable selection pressures. The hypothetical systems characterized by each addition can be termed, following (Bickerton, 1990; Hewes, 1973) "protolanguages". Darwin's first hypothetical stage in the procession from an ape-like ancestor to modern humans was a greater development of proto-human cognition: "The mental powers in some early progenitor of man must have been more highly developed than in any existing ape, before even the most imperfect form of speech could have come into use" (p 57). He elsewhere suggests that both social and technological factors may have driven this increase in cognitive power.

Next, Darwin outlines the crucial second step: what I have dubbed "musical protolanguage" (Fitch, 2006). Having noted multiple similarities with birdsong, he argues that the evolution of a key aspect of spoken language, vocal imitation, was driven by sexual selection, and used largely "in producing true musical cadences, that is in singing". He suggests that this musical proto-language would have been used in both courtship and territoriality (as a "challenge to rivals"), as well as in the expression of emotions like love, jealousy, and triumph. Darwin concludes "from a widely-spread analogy" (amply documented with comparative data later in the book) that sexual selection played a crucial role driving this stage of language evolution, in particular suggesting that the capacity to imitate vocally evolved analogously in humans and songbirds.

The crucial remaining question is how emotionally-expressive musical proto-language made the transition to true meaningful language: how, in Humboldt's words, humans became "a singing creature, only associating thoughts with the tones" (p. 76, von Humboldt, 1836). This leap, from non-propositional song to propositionally-meaningful speech, remains the greatest explanatory challenge for all musical protolanguage theories (cf. Mithen, 2005). Darwin, citing the previous writings of Müller and Farrar (1870), suggests that articulate language "owes its origins to the imitation and modification, aided by signs and gestures, of various natural sounds, the voices of other animals, and man's own instinctive cries". Darwin thus embraces all three of the major leading theories of word origins of his contemporaries (cf. Fitch, in press). Once proto-humans had the capacity to imitate vocally, and to combine such signals with meanings, virtually any source of word forms and meanings would suffice, including onomatopoeia (an imitated roar for "lion", or "whoosh" for wind), and controlled imitation of human emotional vocalizations (mock laughter for "play" or "happiness"). The attachment of specific and flexible meanings to vocalizations required only that "some unusually wise ape-like animal should have thought of imitating the growl of a beast of prey … And this would have been a first step in the formation of a language".

Darwin does not suggest that the evolutionary process would stop with the initial acquisition of meaning. For "as the voice was used more and more, the vocal organs would have been strengthened and perfected". Additionally, language would have "reacted on the mind by enabling and encouraging it to carry on long trains of thought" which "can no more be carried on without the aid of words, whether spoken or silent, than a long calculation without the use of figures or algebra". Thus began the interactive evolutionary spiral that led to modern humans.

Signalling Modality: Vocalization or Gesture?

Darwin also explicitly acknowledged the role of gesture in conveying meaning, echoing Condillac's earlier arguments (Condillac, 1971 (1747)) and presaging contemporary discussions (Arbib, 2005; Corballis, 2003; Hewes, 1973; Stokoe, 1974; Tomasello & Call, 2007). Darwin was aware of the power of signed language: he reminds us that using his fingers "a person with practice can report to a deaf man every word of a speech rapidly delivered at a public meeting" (p 58). He also acknowledged the value of gesture in conveying meaning, and allowed that vocal communication would have been "aided by signs and gestures" (p. 56). Nevertheless, he argues against gestural theorists, because the pre-existence in all mammals of "vocal organs, constructed on the same general plan as ours" would lead any further development of communication to target the vocal organs rather than the fingers.

Darwin clearly believes that the power of speech is neural, not peripheral, citing the early aphasia literature as a demonstration of "the intimate connection between the brain, as it is now developed in us, and the faculty of speech". Comparing the vocal organs and brain, he concludes "that the development of the brain has no doubt been far more important". And although he uses a continuity argument to support the early and sustained role of speech, he firmly acknowledges the abrupt modern discontinuity in the linguistic system that has thus evolved. Thus, like many other insightful commentators (e.g., Donald, 1991; Hockett & Ascher, 1964), Darwin recognized that posing phylogenetic continuity and modern discontinuity as in any way opposed is to create a false dichotomy. The tree-like nature of phylogeny guarantees that both are core parts of the evolutionary process.

Darwin Redux: Modern Comparative Data

Summarizing, Darwin suggested that the first step on the road to human language was a general increase in intelligence in the hominid lineage. In a typically pluralistic fashion, he recognized both "social intelligence" ("Machiavellian intelligence" in the modern trope (Byrne & Whiten, 1988)) and technological/ecological intelligence (e.g. for tool use) as playing important selective roles. Given our modern understanding of hominid evolution, this first stage might be provisionally linked to the genus Australopithecus or perhaps early Homo (e.g. Homo habilis).

The second stage is the least intuitive: that before vocalizations were used meaningfully they were used, so to speak, aesthetically, to fulfil many of the same functions for which modern humans use music today (courtship, bonding, territorial advertisement and defense, competitive displays, etc.). This idea that complex vocalizations (and thus some aspects of phonology and syntax) might have preceded the ability of speech to convey propositions and distinct meanings is the most challenging aspect of Darwin's model. But Darwin uses the comparative database, and particularly detailed analogy between learned bird song and human song and speech, to show that this step is not just plausible but well-documented: it has occurred in many other species. Indeed, modern data shows that vocal learning, without propositional meaning, has evolved independently in at least three other clades of mammals (cetaceans, pinnipeds and bats) and three clades of birds (parrots, hummingbirds and oscine songbirds) (Janik & Slater, 1997; Jarvis, 2004). Such convergent evolution, or repeated independent evolutionary developments of a comparable ability, provides our strongest empirical basis for estimating the likelihood of a particular type of evolutionary event (Harvey & Pagel, 1991). Many of the chapters in this book affirm, and extend, the observations of parallels between language learning and birdsong that Darwin offered in 1871. Thus, whether intuitive or not, Darwin's focus on, and hypothesis for, the evolution of vocal learning is consistent with a wealth of evolutionary and comparative data.

Difficulties with Darwin's Model: Evolving Phrasal Semantics

"How did man become, as Humboldt somewhere defined him, 'a singing creature, only associating thoughts with the tones'?" Otto Jespersen 1922 (p. 437)

Despite its many virtues, there remain some important problems with Darwin's model that have impeded its acceptance today. The first and most important is his explanation of the addition of meaning. Darwin's explanation, as typical for his day, was concerned only with word meanings (what today would be termed "lexical semantics"). But from the viewpoint of modern linguistics, his model seems wholly inadequate to deal with large swaths of semantics, particularly those aspects tied in with the interpretation of whole phrases and sentences ("phrasal semantics"). Modern formal semantics has developed rigorous models of this aspect of linguistic meaning (Dowty, Wall, & Peters, 1981; Guttenplan, 1986; Montague, 1974; Portner, 2005), and it is far more complex and difficult to explain than lexical semantics. Although one can hardly blame Darwin for not foreseeing these relatively recent developments in linguistics, they nonetheless raise substantial difficulties for his model. For much of the syntactic "glue" which binds sentences together into large, meaningful wholes (function words, inflection, bound morphemes, word order, and a host of others) cannot be understood as resulting from onomatopoeia or imitation of emotional expressions. Nor can they be readily understood as "inventions" of some uniquely intelligent individual: all evidence suggests that these indispensable linguistic tools develop reliably in individuals of normal intelligence (Bickerton, 1981; Kegl, 2002; Mufwene, 2001; Mühlhäusler, 1997; Senghas, Kita, & Özyürek, 2005). This key aspect of language thus seems to have a biological basis. Darwin does recognize the phenomenon today called "grammaticalization": he states that "conjugations, declensions, &c., originally existed as distinct words, since joined together" (p 61).
But he offers no model for the origin of these distinct words, and it is hard to see how onomatopoeia or similar processes could have generated this original syntactic and semantic "glue". Thus, complex phrasal semantics remains unexplained by Darwin's model.

However, this oversight was remedied long ago by the linguist Otto Jespersen (Jespersen, 1922). Jespersen's basic insight involves recognizing the link, in humans, between musical and linguistic phrases, and working conceptually backward from there. Jespersen suggested a form of protolanguage in which, initially, whole propositional meanings attached to entire sung phrases, but where there was no consistent link between the individual conceptual components of the meaning, and component parts of the musical phrases (syllables and notes). Thus, there were no "words" as we now understand them. From this "holistic" starting point, Jespersen argued that a cognitive process of analysis started, which slowly isolated individual chunks of the musical phrase (syllables, or multi-syllabic "phraselets" — what today we call "words") and associated them with individual components of the meaning (e.g. nouns, verbs and adjectives, whose precursors were already present in the conceptual systems of our pre-linguistic ancestors).

Jespersen's hypothesis of a "holistic protolanguage" has recently been rediscovered and championed by linguist Alison Wray (Wray, 1998, 2000) and neuroscientist Michael Arbib (Arbib, 2005). Both cite considerable additional evidence supporting this "analytic" model, including data from modern adult language, child language acquisition, and cognitive neuroscience. Supporters of the more intuitive "synthetic" model of protolanguage, in which words evolved first followed by syntactic operations for combining them (e.g., Bickerton, 1990), have subjected holistic models to extensive criticisms (Bickerton, 2007; Tallerman, 2007, 2008). However, I argue that most of these critiques miss their mark if the notion of a musical protolanguage is accepted as a starting point (cf. Fitch, in press). Jespersen/Wray's model of holistic protolanguage thus dovetails nicely with the musical protolanguage hypothesis, in ways that I believe resolve many, if not all, of these criticisms (cf. Fitch, 2006; Mithen, 2005).

Sexual Selection

A second problem with Darwin's model remains unresolved at present: his focus on sexual selection as the force driving the evolution of musical protolanguage. Appearing as it did as a few pages of an extensive tome introducing and then extensively documenting the very idea of sexual selection, this aspect of Darwin's theory has the virtue of explaining a core aspect of human evolution using a broad principle abundantly demonstrated in the evolution of other species. As throughout his work, Darwin eschewed "special pleading" for our own species. The central difficulty for this beautiful hypothesis is posed by two ugly facts about modern human language: it is equally developed in males and females, and is expressed very early in ontogeny, essentially at birth (Fitch, 2005a). These aspects of language differentiate it sharply from most sexually-selected traits, which are strongly biased to develop in the more competitive sex (typically males), and only at sexual maturity. If anything, human females have superior language skills when compared to men (Henton, 1992; Kimura, 1983; Maccoby & Jacklin, 1974), and language is remarkable in its very early development, with at least some early tuning to phonology already occurring in utero before birth (DeCasper & Fifer, 1980; Mehler et al., 1988; Spence & Freeman, 1996).

There are several potential answers to the difficulty that these facts pose: one is to argue that during the musical protolanguage stage, sexual selection was the driving force, and song was (as in most bird species) expressed mainly in males at sexual maturity. Then, at a later stage (presumably during the evolution of meaningful language) some other selective force kicked in, so that language became equally (or better) expressed in females, and was pushed to develop early. A candidate selective force is kin communication: selection for information transmission between parents and their offspring, or more generally between adults and their younger kin. I have suggested that kin selection drove this second stage of the evolution of propositional semantic content (Fitch, 2004, 2007). For an exploration and critique of this idea, see (Zawidzki, 2006). This kin-selection scenario neatly explains the early ontogenetic appearance of language in infants (the earlier offspring begin absorbing their elders' knowledge, the better), and its bias towards females (who are primary caregivers in all hominoids). The continued presence of meaningful speech in males is easily explained by the dual facts that immature males must also learn, and that, unusually in humans, adult males play an important role in child rearing (whether the father, or male siblings of the mother, is irrelevant to this fact). Finally, this kin-selection model has the virtue of explaining why language evolved in humans and not in other "musical" lineages. Humans combine an extended childhood, with ample time to acquire knowledge, with very small reproductive output. The facts that ape babies are born singly, and rarely, conspire to make the survival of each individual hominid infant a crucial component of reproductive success in the great ape lineage (cf. Fitch, 2007; Hrdy, 1999, 2004).

An alternative possibility is that sexual selection was, and remains, an important driving force in human cognitive evolution, including language (Miller, 2001), but that human pair-bonding has "changed the rules" in significant ways, so that both sexes are choosy, and both compete for high-quality mates. Some comparative data can be cited in support of this second option. Recent data shows that female bird song is not so uncommon as thought by Darwin, who considered female song to be a simple aberration (Langmore, 2000; Riebel, 2003; Ritchison, 1986). There is some evidence suggesting that sexual selection can indeed drive female bird song, though it seems clear that female song is a secondary derivation of male song in most lineages (Langmore, 1996). While these observations provide some support for the idea that the dual-sex expression of human language could result from sexual selection, it is important to recognize that female song still appears to be numerically speaking exceptional and that any model based on sexual selection will have difficulty explaining the extremely early development, and productive use, of language in human infants.

A final possibility is that sexual selection never played a role in the evolution of music or of language. The popular notion that music evolved for courtship (Miller, 2000, 2001) stands on a surprisingly weak empirical footing compared to a less obvious, but better-documented function of music: mother-infant communication (Trainor, 1996; Trehub, 2003a, 2003b). Mothers sing to their infants all over the world, even those who claim to be unable to sing (Street, Young, Tafuri, & Ilari, 2003), and infants both prefer song to speech, and respond to song in manifestly adaptive ways (e.g. engaging with and getting excited by play songs, and being lulled to sleep by lullabies) (Trehub & Trainor, 1998). These observations suggest that music originally functioned in a childcare context, as it continues to do today. By this model, the use of music in bonding among adults is simply a side-effect of this central function, and its occasional use in courtship is a red herring (Dissanayake, 2000; Falk, 2004; Trehub & Trainor, 1998). This final possibility is clearly compatible with the kin-selection arguments advanced above, but here there would be no intervening stage of language evolution in which sexual selection ever played a dominating role. Even Darwin was occasionally wrong.

Terminological Niceties: Musical or Prosodic Protolanguage?

A final, less crucial difficulty with Darwin's model is terminological. Darwin himself seemed to conceive of his pre-semantic protolanguage in terms directly comparable to modern day music (or at least he provides no indication that this is not the case). He concludes that "musical notes and rhythm" were present in this protolanguage, and that they were deployed "in producing true musical cadences, that is in singing." This is why I term his model "musical protolanguage". However, modern human music consists not just of song, but also instrumental music, so this appellation might immediately have connotations of drumming, whistling or flutes that are not, strictly speaking, relevant to language evolution. More pertinently, if we take the musical protolanguage model seriously, we must acknowledge that modern music may not necessarily preserve the state of this protolanguage precisely, and that both music and language have changed in the interim (cf. Brown, 2000). That is, Darwin's hypothetical communication system was proto-music, not music per se. Adopting the logic of comparative reconstruction, we can then ask which aspects of modern speech, and of song, are shared, and thereby reconstruct this system (Fitch, 2005b). The central shared aspects are prosodic and phonological: the use of a set of primitives (syllables) to produce larger, hierarchically-structured units (phrases) which are discretely distinctive. But two key "musical" aspects are not shared between speech and song: namely discrete-pitched notes, and temporal isochrony (a steady beat). I have used this comparison of modern speech and song to argue for a subtly different model from that of Darwin, which I termed "prosodic" rather than "musical" protolanguage, in which protolanguage consisted of sung syllables, but not of notes that could be arranged in a scale, nor produced with a steady rhythm (Fitch, 2006).
This prosodic protolanguage model thus includes the "sung cadence" aspect of Darwin's model, while rejecting both his "notes" and "rhythm" (at least as normally construed). Both of these aspects of (most) modern song are, by hypothesis, more recent developments in music not present in protolanguage. I see this as an adjustment of Darwin's hypothesis, fully in keeping with its spirit. Furthermore, it is unclear from his writings whether Darwin would have disagreed with this adjustment.

A different reconstruction of the common ancestor of music and language, involving both discrete pitches and isochronic rhythm (as well as tone-based meaning) is given in (Brown, 2000). Brown also argues that his hypothetical protolanguage, which he dubs "musilanguage", could not have evolved by normal neo-Darwinian selection and thus demands a group selection explanation. This remains its clearest, and most dubious, distinction from what is otherwise just a rediscovery of Darwin's basic hypothesis (for critiques see Botha, 2008; Fitch, in press).

Conclusions

I have argued that Darwin's model for language evolution, "musical protolanguage," suitably updated, provides a compelling fit to both the phenomenology of modern music and language, and to a wealth of comparative data. By placing vocal control at the centre of his model, Darwin availed himself of the rich comparative database of other species who have independently evolved complex vocal imitation, and he thus explains two of the features of human language that set it off most sharply from nonhuman primate communication systems: vocal learning and cultural transmission. The biggest missing piece in Darwin's model, as I see it, is a reasonable explanation of phrasal semantics (and the aspects of syntax that go with it), but this gap was filled by Jespersen by 1922. Together, these hypotheses provide one of the leading models of language evolution available today (for an enthusiastic book-length exploration see Mithen, 2005), and one that has been repeatedly re-discovered by later scholars (e.g., Brown, 2000; Livingstone, 1973; Richman, 1993). While many aspects of what has now become a family of models remain to be explored empirically (the issues surrounding sexual, kin and group-selection remain particularly unclear), this is a model worthy of detailed consideration and elaboration today. Most importantly, Darwin's model makes numerous testable empirical predictions (for example about the partially overlapping nature of the brain mechanisms underlying music and spoken language, and their genetic basis) that can be answered in the coming decades.

This year of Charles Darwin's 200th birthday seems an opportune time for Darwin's own model of language evolution to regain the prominence it deserves.


2. BICKERTON FIRST ROUND


I yield to no-one in my admiration of Darwin.  But admiration should not blind us to the fact that in many cases he was, inevitably, limited by the state of knowledge in his time.  Not only Mendelian genetics, but also almost the entire ancestry of humans, was wholly unknown to him; ethology and the study of non-human communication had yet to be systematically developed, and linguistics still lay in the womb of philology.  It is truly amazing, not that he was sometimes wrong, but that he was so often and so stunningly right.

He was right when he saw language as the seed, rather than the fruit, of human intelligence.  But appealing as the notion is, he was wrong in proposing a scenario in which language issued from a "musical protolanguage".   Tecumseh Fitch argues that his own account, developed from Darwin's, is soundly based on principles of evolutionary biology.  It is therefore somewhat surprising that his account pays as little attention to the evolution of humans (and the ways in which this evolution differed from that of other primates) as do those of biologically-naïve linguists or psychologists.

The notion of a terrestrial and heavily-predated primate indulging in any form of vocal activity, especially one that must, in quantity as well as quality, have exceeded that of all other primates barring gibbons, is simply bizarre, as I point out in a chapter of my book Adam's Tongue (out next month) devoted to the "singing ape" hypothesis:

"What could possibly have been the functions of song for a pre-human species in largely treeless grasslands?  Song as a pair-bonding mechanism is highly unlikely.  Human ancestors probably weren't monogamous-great apes aren't, and neither are we, even if we try or pretend to be, so a monogamous interval at any time in the past looks unlikely.  But suppose we did go through a monogamous period.  If two mates don't happen to be out of sight of one another up two different trees, there are countless more effective ways of bonding than yodeling at each other.

"Human ancestors probably weren't territorial, either-at least not in the sense of holding small, well-defined chunks of territory.  Most likely they had a fission-fusion social structure, like that of contemporary apes, that's to say groups would be continually splitting up and reforming, merging with other groups.  In open terrain, where different groups might utilize the same areas at different times without conflict or even contact, what would be the point of noisily-defended frontiers?

"Furthermore, the terrains in which gibbons and human ancestors lived were such that for maintaining contact sound was essential in one and useless, even dangerous, in the other…On the savanna, where there are beasts with keen hearing far larger and more lethal than our ancestors, to sing out with any frequency would have been to write one's own death warrant.  Moreover, the absence of trees and the level or undulating nature of most savannas means that, in contrast with the rain-forest, animals are visible at considerable distances.  To be out of sight is, under those conditions, almost always to be out of earshot–there's little point in yelling and hoping your friends will hear you.

"To assume that, even if our ancestors had sung before, they would go on singing under these conditions is absurd-something you can do only if you think that behavior and environment are completely divorced from one another… Conditions on the savanna were such that while they lived there our ancestors very probably produced less sound than our ape relatives, not more.  If this was indeed the case, a single source for music and language becomes highly unlikely. Unless, of course, someone succeeds in coming up with some function pre-humans had to perform, under those same savanna conditions, that they couldn't have performed by any means other than by singing.  It's unlikely anyone will, but never say never in science."

To persuade us of the "musical protolanguage" theory,  Tecumseh will have to come up with a scenario in which singing (of some kind) somehow increased human fitness.  Here he has proposed mother-child interaction (as already suggested by Dean Falk in a recent article, "Prelinguistic evolution in early hominins: Whence motherese?", Behavioral and Brain Sciences 27(4):491-503, 2004).  The problem with this is that all other primates have mother-child interactions, but only one has picked on this kind.  Why?   Why humans?  And this doesn't end the problems that "musical protolanguage" raises.

Tecumseh recognizes that the severest of these problems ("the greatest explanatory challenge for all musical protolanguage theories") is how sound acquired sense: how a continuously variable medium with no specific reference turned into strings of discrete chunks with individual meanings.  However, he skips nimbly over the solution:

"Supporters of the more intuitive "synthetic" model of protolanguage, in which words evolved first followed by syntactic operations for combining them (e.g., Bickerton, 1990), have subjected holistic models to extensive criticisms (Bickerton, 2007; Tallerman, 2007, 2008). However, I argue that most of these critiques miss their mark if the notion of a musical protolanguage is accepted as a starting point (cf. Fitch, in press).  Jespersen/Wray's model of holistic protolanguage thus dovetails nicely with the musical protolanguage hypothesis, in ways that I believe resolve many, if not all, of these criticisms (cf. Fitch, 2006; Mithen, 2005)."

As I don't have a copy of Fitch (in press), I remain in the dark as to what these ways are.  All I know is that when Dean Falk made the same proposal, I wrote a commentary that, inter alia, pointed out she gave no account of how symbolic meaning — symbolic use of  words or signs to refer to particular classes or individuals — emerged from originally meaningless sounds.  Significantly, she responded to all the points I made… except for that one.

Maggie Tallerman and I have made some very specific and pointed criticisms of the "holistic protolanguage" model, most of which have never been satisfactorily answered by anyone, as far as I know.  If Tecumseh believes he can answer them, he should show how.  He does point out that "Darwin… embraces all three of the major leading theories of word origins of his contemporaries" but he fails to point out that at least two of these are incompatible with one another.  For according to Darwin, "the attachment of specific and flexible meanings to vocalizations required only that 'some unusually wise ape-like animal should have thought of imitating the growl of a beast of prey'" (and of course that some even wiser primates should have understood what was meant: a lion coming, or lions often hang around here, or one was seen here last week, or "Gee, guys, see how well I can imitate a lion!").  But of course this onomatopoeic proposal is incompatible with "musical protolanguage", since it avoids the holistic phase altogether and goes straight to the kind of compositional, already-symbolic protolanguage that Tecumseh rejects.  The "lion's roar" idea needs a good bit of tweaking, but at least it's nearer the mark than a holistic protolanguage.

A major motive behind "musical protolanguage" is Strict Continuism — the belief that language grew seamlessly from animal communication.   Animal calls — if translated into humanese, and that turns out to be a very dodgy business in itself — are, like holophrases, often the equivalents of whole clauses: "Mate with me"; "Stay off my territory"; "Terrestrial predator coming — get up a tree".   Split these into their components and for a few glorious moments it seems that the transition problem has been solved.  But in Adam's Tongue I go more deeply into the transition problem than anyone ever has before.  And it's the transition problem — how any species could get from a standard animal communication system to even the crudest and most basic kind of protolanguage — that lies at the very heart of language evolution, and without which all "explanations" are mere hand-waving, smoke and mirrors.


3. FITCH SECOND ROUND


The point of my essay, written on Darwin's birthday, was to revive interest in Darwin's long-neglected ideas about language evolution, not to offer (or defend) my own model. Derek Bickerton's critique of Darwin's musical protolanguage model suggests that our hominin ancestors lived out their terrified lives on the treeless savannah, cowed into silence by their many predators. This "fact" renders absurd, Bickerton claims, Darwin's notion that our ancestors evolved learned, complex vocalizations ("song", for simplicity, hereafter) before language.

By dubbing Darwin's idea "bizarre" and "absurd," Bickerton reveals his unwillingness to engage in a sympathetic interpretation of the musical protolanguage hypothesis first advanced by Darwin (and others who have followed in his footsteps). But as philosopher Susanne Langer observed, "The chance that the key ideas of any professional scholar's work are pure nonsense is small; much greater the chance that a devastating refutation is based on a superficial reading or even a distorted one, subconsciously twisted by a desire to refute" (p. ix, Langer, 1962).

My goal here is only to show that Bickerton's interpretation is of this superficial and distorted sort, and because of this and some factual errors cited below, that his attempt at a devastating refutation misses its mark.

First, Bickerton misses the central point of Darwin's hypothesis: to explain the origin of vocal learning in the hominid line. This is an indubitable capacity in our species, indubitably lacking in other apes, and there must be an evolutionary explanation for it. Although it is possible that vocal learning is a "spandrel", a by-product of some other evolutionary change (e.g. large brains), it does not seem absurd to suppose that this capacity was selected, for some reason or another. If it was not in the mute Australopithecines of the Bickertonian savanna, and did not play a role in pair- or mother-infant bonding, it must have happened at some other time, for some other reason - but it happened.

Darwin simply, and correctly, observed that the capacity for vocal learning is not uniquely human, but is shared with birds, and it was this central observation upon which he built his theory. This observation, and the deductions Darwin drew from it, have subsequently been supported by additional comparative data from many species, from hummingbirds to seals and whales, of which Darwin was unaware. It is true, obviously, that language has many other critical components besides vocal control — vocal learning is one of several speech- and music-related mechanisms in our species, and language per se involves several others in addition (most notably complex syntax and semantics). But vocal learning did evolve in our species, and Darwin's hypothesis of an evolutionary route through song is a reasonable one, well-supported by abundant data.

Further, the human capacity for music has much in common with language: it is another early-developing trait, found in all human cultures, and Darwin's hypothesis has the virtue of explaining the continued existence of music along with language. This too, like vocal learning, needs to be explained if we are to understand human evolution.

From a historical viewpoint, Bickerton's critique is reminiscent of that of Darwin's nemesis, the linguist Max Müller. It was Müller who began coining the (unfortunately long-lived) nicknames for many models of language evolution, dismissing Darwin's "sing song" theory with the same brief sneer as the older onomatopoeia and interjectional hypotheses, which he nicknamed "bow wow" and "pooh-pooh" (Müller, 1861, 1873). Turnabout being fair play, Müller's own theory of semi-mystical resonance between words and things was dubbed the "ding dong" theory (Noiré, 1917). Bickerton's "yodelling Australopithecines" image has the same absurd comic flair. But sneers and derogatory nicknames, however rhetorically effective, are not scientific arguments.

To close, I will only point out three key factual errors in Bickerton's critique:

1) Maggie Tallerman's critique of holistic protolanguage has been answered convincingly, point by point, by Kenny Smith recently (Smith, 2008). In a journal issue that, if I'm not mistaken, Bickerton himself co-edited…

2) Most paleoanthropologists now agree that the environment in which much of human evolution occurred was not the unitary "savannah" imagined by Bickerton, but an ecologically diverse environment better characterized as "mixed woodlands" (Kingston, Marino, & Hill, 1994). Our ancestors probably had plenty of trees to climb, and they probably did so regularly, as the lasting arboreal adaptations of the Australopithecines attest. Indeed it seems likely that Australopithecines built nests in the trees for sleeping, just as do modern chimpanzees and orangutans (Sabater Pi, Veà, & Serrallonga, 1997). Given that all apes persist in using loud vocalizations to stay in contact, for example chimpanzee pant-hoots, it is surprising that someone with an imagination as fertile as Bickerton's can't conceive of any function for loud nocturnal vocalizations in early hominins. He should consult (Mithen, 2005) for some inspiration.

3) Bickerton finds it "absurd" to suppose that a mostly-terrestrial, moderate-sized primate would be highly vocal, producing loud, repetitive vocalizations. What about Theropithecus gelada, the grassland-living gelada baboon? These are mostly-terrestrial African primates, preyed upon by leopards, dogs, and humans, whose fossil remains are also known from Olduvai Gorge. Geladas are extremely vocal and indeed noted for their vocal complexity (Aich, Moos-Heilen, & Zimmerman, 1990; Richman, 1976), and interestingly are one of the few primates for which claims have been made of rhythmic, synchronized vocalizations (Richman, 1978, 1987). Whether Richman's claims about the musicality of gelada vocalizations hold up or not, there can be no doubt that geladas are highly vocal terrestrial primates that evolved in a grassy, nearly treeless environment, in apparent violation of Bickerton's evolutionary principles.

Most primates, whether arboreal or terrestrial, are highly vocal in at least some circumstances, as are humans today, and I conclude that Bickerton provides no good argument to suppose that hominins have ever been otherwise.

In conclusion, the discipline of language evolution is full of questions, and the field is only likely to make empirical progress if practitioners find it in their hearts and heads to sympathetically read, understand and compare multiple hypotheses, even those that initially seem unintuitive or even "absurd". Far too many unknowns remain about our species' past for Darwin's hypothesis to be dismissed so quickly, on such scant evidence and weak argument.

Especially on Darwin's birthday, and the 150th anniversary of his greatest book.

 

4.  BICKERTON SECOND ROUND 

 

Let me answer the points made by Tecumseh Fitch in his latest posting:

A. I don’t explain the origin of vocal learning.

Well, what would select for vocal learning better than having a few words to learn?   Vocal learning is not the result of some “deep homology” but has arisen spontaneously in a wide variety of species, each of which required vocal learning for its own particular needs.  Fitch’s assumption that I ought to be providing a source for vocal learning preceding the origin of language is based on the strange dogma found in Hauser, Chomsky & Fitch 2002: that everything needed for language (except, perhaps, the ability to create recursive structure) must have been present in the hominid line before language could emerge.  To the best of my knowledge, there’s no hard evidence for this.  Evolution does not need to get all its ducks in a row before it can create novel faculties.  To the contrary, new faculties typically emerge in some very crude and primitive form and then themselves act as selective pressures for traits that will subserve and expand those faculties.

B. Kenny Smith’s 2008 article disproves the arguments against a holistic protolanguage.

Yes, I co-edited the journal issue in question, and I actually did read this article.  It’s one of the better holophrastic papers, makes some good and useful points against Tallerman, and protracts the debate, but is far from the tie-breaker that Fitch claims.  More to the point, it doesn’t even attempt to tackle the strongest argument against a holophrastic protolanguage.

A holophrastic protolanguage consists of things that are basically like animal calls, and it’s true that we can roughly translate many such calls into humanese sentences:  “Come mate with me”, “Stay off my territory”.  The basic idea in a holophrastic protolanguage is that these calls are then fractionated into words.  But the basic assumption of holophrasis is that every holophrastic call is actually the exact equivalent of a particular sentence.  It must be so. How else could you transition from holophrasis to compositionality?  How else could you get agreement on what each of the fractionated parts of the holophrase meant?

In fact, animal calls are not the equivalent of sentences.  They are designed for entirely different purposes and function in entirely different ways.  Take the famous “vervet eagle alarm”.  It could translate as “Look out, here’s an eagle”.  It could equally well translate as “Danger from above” or “Quick, hide in the bushes!”  How would any primate know whether to fractionate this into “look” and “eagle”, or “danger” and “above”, or “hide” and “bushes”?  More on this below, see question 2.

C. Most human evolution did not occur in the savanna.

Did I say it did?   The quote from Adam’s Tongue that I gave did not, naturally, include the pages preceding it where I discussed at some length the mosaic woodland which, as I am of course well aware, human ancestors inhabited for three or four million years after they split from the other apes.  And Fitch himself knew perfectly well that I knew this when he accused me of imagining a “unitary savannah”.  He knew this because on October 30, 2008, I sent him the chapter in which this information is contained.  

But in any case, the careful reader of my original rebuttal will already have noted my real point: “To assume that, even if our ancestors had sung before, they would go on singing under these conditions is absurd.”  In other words, my point was that, whatever australopithecines might have done, hominids would hardly have persisted in doing it once the great drying set in around 2 mya and they really did have to subsist in a savanna environment.

D. Geladas are ground-living savanna primates, and they make a lot of noise, so why shouldn’t human ancestors have done the same?

To me, this is worse than being accused of things I don’t believe.  Fitch is a biologist.  That means he must know something about ecology, realize what a niche is and how it determines how a species behaves.  If the niches of geladas and hominids are totally different, he must surely know that you can’t use geladas as a model for possible hominid behavior.

Well, their niches differ dramatically.  There are three things and only three that geladas and early Homo have in common.  They are primates, they are terrestrial, and they live in savannas.  In just about everything else they are different.  Geladas eat grass and hardly anything but grass.  Human ancestors could eat pretty well anything except grass.  Geladas spend most of their foraging time hunkered down, hunching across the grass a meter or so at a time.  The long-legged Homo erectus wouldn’t have had long legs if he hadn’t had to use them to cover great distances.  The day range of geladas can be measured in meters.  The day range of Homo had to be measured in kilometers, given the scarcity of non-grass comestibles in savannas.  Geladas move around in troops of 300 to 400, indeed groups up to 600 or more have been recorded. Human ancestors could never have foraged in groups even approaching this size—they must have traveled in much smaller units.

Precisely because they’re grass-eaters, geladas can live together in large numbers, which (a) reduces the chances of predation—there’s safety in numbers—and (b) makes irrelevant their use of vocalization—out in the grassland, a huge bunch of them together, any predator is going to see them regardless of whether they vocalize or not.  For hominids it was different.  Small groups at long distances from other small groups, they might get together at night but during the day they’d have to go far and wide to find food, and for much of that—opportunistic stalking of birds and small mammals, ambush hunting and so on—they’d have to keep dead quiet.  What, exactly, would they have needed elaborate vocalizations for?

And that brings us to the meat of the matter.  Fitch’s four points serve to disguise the fact that he has no answers to the two really serious questions involved here.

Question No. 1:  For what function did hominids need complex vocalizations?  (Note: it would have to be a function basic and essential enough to offset the risk from attracting predators—“loud nocturnal vocalizations to stay in contact” won’t cut it, because at night they would have been damn careful to keep in close physical contact!)

Question No. 2: How did meaning get into the vocalizations?  There have been any number of attempts to explain this, but I have yet to see one that is even halfway convincing.

One final word.  Contra Fitch, the “singing ape” is not in any way a “key idea” of Darwin’s, any more than my original article was a “twisted desire to refute.”  In a volume of several hundred pages Darwin devotes a few sentences to it, alongside a couple of other possible language origins—and Fitch himself mentions all three!  So to talk about “Darwin's musical protolanguage model” simply expands and distorts what Darwin actually said.  And to genuflect before every casual remark a writer made is not admiration—it’s idolatry.   Idolatry of Darwin does not increase, but rather detracts from, appreciation of the many great things that Darwin did say.


5.  ????????


Well, folks, that's where it's at as of now.  Fitch hasn't answered my last post.  It's my belief he hasn't answered it because he can't answer it.  So I'm declaring victory and going home until such time as (or IF) he answers my points.  If he does, it will appear in this blog.  Meantime I'll debate anyone, ANYONE who disputes the conclusions reached in Adam's Tongue.  Come one come all, let's see your stuff.