"...look into all things with a searching eye" - Baha'u'llah (Prophet Founder of the Baha'i Faith)


Jan 25, 2015

The world's language total -- 7,000, nine giants: Mandarin, Spanish, English, Arabic, Hindi, Bengali, Portuguese, Russian, and Japanese

The known number of distinct languages still spoken or recently spoken in the modern world is around 7,000. That huge total may astonish many readers, because most of us could name only a few dozen languages, and the vast majority of languages are unfamiliar to us. Most languages are unwritten, spoken by few people, and spoken far from the industrial world. For example, all of Europe west of Russia has fewer than 100 native languages, but the African continent and the Indian subcontinent have over 1,000 native languages each, the African countries of Nigeria and Cameroon 527 and 286 languages respectively, and the small Pacific island nation of Vanuatu (area less than 5,000 square miles) 110 languages. The world's highest language diversity is on the island of New Guinea, with about 1,000 languages and an unknown but apparently large number of distinct language families crammed into an area only slightly larger than Texas.

Of those 7,000 languages, 9 "giants," each the primary language of 100 million or more people, account for over one-third of the world's population. In undoubted first place is Mandarin, the primary language of at least 700 million Chinese, followed by Spanish, English, Arabic, Hindi, Bengali, Portuguese, Russian, and Japanese in approximately that sequence. If we relax our definition of "big languages" to mean the top 70 languages - i.e., the top 1% of all languages - then we have encompassed the primary languages of almost 80% of the world's people.

But most of the world's languages are "little" languages with few speakers. If we divide the world's nearly 7 billion people by 7,000 languages, we obtain 1 million people as the average number of speakers of a language. Because that average is distorted by the 100-million-plus speakers of just 9 giant languages, a better measure of a "typical" language is to talk about the "median" number of speakers - i.e., a language such that half of the world's languages have more speakers, and the other half have fewer speakers. That median number is only a few thousand speakers. Hence half of the world's languages have under a few thousand speakers, and lots of them have between only 60 and 200 speakers.
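As a rough illustration of why the median is the better yardstick here, the short Python sketch below reproduces the 1-million average from the figures in the text and then compares mean and median on a small list of speaker counts. The list is purely hypothetical, made up only to mimic one giant language sitting among many small ones; it is not real survey data.

```python
from statistics import mean, median

# Round figures taken from the text above.
WORLD_POPULATION = 7_000_000_000
LANGUAGE_COUNT = 7_000


def average_speakers(population: int, languages: int) -> float:
    """Mean number of speakers per language."""
    return population / languages


# Hypothetical toy sample (NOT real data): one giant and six small languages.
toy_speaker_counts = [100_000_000, 50_000, 4_000, 3_000, 2_000, 200, 60]

print("World average:", average_speakers(WORLD_POPULATION, LANGUAGE_COUNT))
print("Toy mean:     ", mean(toy_speaker_counts))
print("Toy median:   ", median(toy_speaker_counts))
```

In the toy sample the single giant drags the mean into the millions, while the median stays at a few thousand, which is the same distortion the text describes for the real totals.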

But such discussions of numbers of languages, and numbers of language speakers, force us to confront the question of: what's the difference between a distinct language and a mere dialect of another language?

Speech differences between neighboring populations intergrade completely; neighbors may understand 100%, or 92% or 75%, or 42%, or nothing at all of what each other says. The cut-off between language and dialect is often arbitrarily taken at 70% mutual intelligibility: if neighboring populations with different ways of speaking can understand over 70% of each other's speech, then (by that definition) they're considered just to speak different dialects of the same language, while they are considered as speaking different languages if they understand less than 70%.
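The 70% rule just described can be written down as a very small Python function. This is only a sketch of the rule as stated in the text; the function name and the choice to treat exactly 70% as "different languages" are assumptions, since the text speaks only of "over 70%" and "less than 70%".

```python
CUTOFF_PERCENT = 70  # the arbitrary cut-off mentioned in the text


def classify(mutual_intelligibility_percent: int) -> str:
    """Label two speech forms by how much of each other's speech they understand.

    Assumption: a score of exactly 70 is treated as "languages",
    because the text only defines the cases above and below 70%.
    """
    if not 0 <= mutual_intelligibility_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if mutual_intelligibility_percent > CUTOFF_PERCENT:
        return "dialects"
    return "languages"


# The example percentages quoted in the text.
for score in (100, 92, 75, 42, 0):
    print(score, "->", classify(score))
```

With the percentages quoted above, 100, 92 and 75 come out as dialects of one language, while 42 and 0 come out as separate languages.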

But even that simple, arbitrary, strictly linguistic definition of dialects and languages may encounter ambiguities when we try to apply it in practice. One practical difficulty is posed by dialect chains: in a string of neighboring villages ABCDEFGH, each village may understand both villages on either side, but villages A and H at opposite ends of the chain may not be able to understand each other at all. Another difficulty is that some pairs of speech communities are asymmetrical in their intelligibility: A can understand most of what B says, but B has difficulty understanding A. For instance, my Portuguese-speaking friends tell me that they can understand Spanish-speakers well, but my Spanish-speaking friends have more difficulty understanding Portuguese.
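The dialect-chain difficulty can be made concrete with a small Python sketch. It assumes, purely for illustration, that intelligibility drops by a fixed 15 percentage points for each village one moves along the chain; that figure and the function names are hypothetical and do not come from any measurement.

```python
VILLAGES = "ABCDEFGH"       # the chain of neighboring villages from the text
CUTOFF_PERCENT = 70         # the arbitrary 70% cut-off
DROP_PER_STEP = 15          # hypothetical loss per village along the chain


def intelligibility_percent(a: str, b: str) -> int:
    """Hypothetical intelligibility between two villages, by chain distance."""
    steps = abs(VILLAGES.index(a) - VILLAGES.index(b))
    return max(0, 100 - DROP_PER_STEP * steps)


def same_language(a: str, b: str) -> bool:
    """True if the pair passes the 70% test."""
    return intelligibility_percent(a, b) > CUTOFF_PERCENT


# Every pair of immediate neighbors passes the test...
neighbors_ok = all(same_language(a, b) for a, b in zip(VILLAGES, VILLAGES[1:]))
print("All neighbors share a language:", neighbors_ok)

# ...yet the two ends of the chain do not.
print("A and H share a language:", same_language("A", "H"))
```

Under these assumptions every neighboring pair counts as speaking one language, yet A and H do not, so "speaks the same language" is not a transitive relation and the chain cannot be cleanly divided into a definite number of languages. The asymmetric case (Portuguese and Spanish) would need a separate score for each direction, which this sketch does not model.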

Those are two types of problems in drawing a line between dialects and languages on strictly linguistic grounds. A bigger problem is that languages are defined as separate not just by linguistic differences, but also by political and self-defined ethnic differences. This fact is expressed in a joke that one often hears among linguists: "A language is a dialect backed up by its own army and navy." For instance, Spanish and Italian might not pass the 70% test for being ranked as different languages rather than mere dialects: my Spanish and Italian friends tell me that they can understand most of what each other says, especially after a little practice. But, regardless of what a linguist applying this 70% test might say, every Spaniard and Italian, and everybody else, will unhesitatingly proclaim Spanish and Italian to be different languages - because they have had their own armies and navies, plus largely separate governments and school systems, for over a thousand years.

Conversely, many European languages have strongly differentiated regional forms that the governments of their country emphatically consider mere dialects, even though speakers from the different regions can't understand each other at all. My north German friends can't make heads or tails of the talk of rural Bavarians, and my north Italian friends are equally at a loss in Sicily. But their national governments are adamant that those different regions should not have separate armies and navies, and so their speech forms are labeled as dialects and don't you dare mention a criterion of mutual intelligibility. Those regional differences within European countries were even greater 60 years ago, before television and internal migration began breaking down long-established "dialect" differences. For example, on my first visit to Britain in the year 1950, my parents took my sister Susan and me to visit family friends called the Grantham-Hills in their home in the small town of Beccles in East Anglia. While my parents and their friends were talking, my sister and I became bored with the adult conversation and went outside to walk around the charming old town center. After turning at several right angles that we neglected to count, we realized that we were lost, and we asked a man on the street for directions back to our friends' house. It became obvious that the man didn't understand our American accents, even when we spoke slowly and (we thought) distinctly. But he did recognize that we were children and lost, and he perked up when we repeated the words "Grantham-Hill, Grantham-Hill." He responded with many sentences of directions, of which Susan and I couldn't decipher a single word; we wouldn't have guessed that he considered himself to be speaking English. Fortunately for us, he pointed in one direction, and we set off that way until we recognized a building near the Grantham-Hills' house. 
Those former local "dialects" of Beccles and other English districts have been undergoing homogenization and shifts towards BBC English, as access to television has become universal in Britain in recent decades.

By a strictly linguistic definition of 70% intelligibility - the definition that one has to use in New Guinea, where no tribe has its own army or navy - quite a few Italian "dialects" would rate as languages. That redefinition of some Italian dialects as languages would close the gap in linguistic diversity between Italy and New Guinea slightly, but not by much. If the average number of speakers of an Italian "dialect" had equaled the 4,000 speakers of an average New Guinea language, Italy would have 10,000 languages. Aficionados of the separateness of Italian dialects might credit Italy with dozens of languages, but no one would claim there to be 10,000 different languages in Italy. It really is true that New Guinea is linguistically far more diverse than is Italy. 
(Jared Diamond, 'The World Until Yesterday')