English language ‘originated in Turkey’

Modern Indo-European languages – which include English – originated in Turkey about 9,000 years ago, researchers say. Their findings differ from the conventional theory that these languages originated 5,000 years ago in south-west Russia. The New Zealand researchers used methods developed to study virus epidemics to build family trees of ancient and modern Indo-European tongues, and so pinpoint where and when the language family first arose.

A language family is a group of languages that arose from a common ancestor, known as the proto-language. Linguists identify these families by trawling through modern languages for words of similar sound that often describe the same thing, like water and wasser (German). These shared words – or cognates – represent our language inheritance. According to the Ethnologue database, more than 100 language families exist.

The Indo-European family is one of the largest families – more than 400 languages spoken in at least 60 countries – and its origins are unclear. The Steppes, or Kurgan, theorists hold that the proto-language originated in the Steppes of Russia, north of the Caspian Sea, about 5,000 years ago. The Anatolia hypothesis – first proposed in the late 1980s by Prof Colin Renfrew (now Lord Renfrew) – suggests an origin in the Anatolian region of Turkey about 3,000 years earlier.

To determine which of the competing theories was the more likely, Dr Quentin Atkinson from the University of Auckland and his team interrogated language evolution using phylogenetic analyses – more commonly used to trace virus epidemics.

Fundamentals of life

Phylogenetics reveals relatedness by assessing how much of the information stored in DNA is shared between organisms. Chimpanzees and humans have a common ancestor and share about 98% of their DNA. Because of this shared ancestry, they cluster together on phylogenetic – or family – trees. Like DNA, language is passed down, generation to generation.

Although language changes and evolves, some linguists have argued that cognates describing the fundamentals of life – kinship (mother, father), body parts (eye, hand), the natural world (fire, water) and basic verbs (to walk, to run) – resist change. These conserved cognates are strongly linked to the proto-language of old.

Dr Atkinson and his team built a database containing 207 cognate words present in 103 Indo‐European languages, which included 20 ancient tongues such as Latin and Greek. Using phylogenetic analysis, they were able to reconstruct the evolutionary relatedness of these modern and ancient languages – the more words that are cognate, the more similar the languages are and the closer they group on the tree. The trees could also predict when and where the ancestral language originated. Looking back into the depths of the tree, Dr Atkinson and his colleagues were able to confirm the Anatolian origin.
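
The grouping idea can be illustrated with a toy sketch. It is not the analysis the team ran: it simply counts how many cognate classes two languages share and merges the most similar groups first (plain average-linkage clustering). The language names, meanings and cognate class numbers below are all invented for illustration.

```python
from itertools import combinations

# Hypothetical data: for each meaning, the cognate class of each language's word.
# The same integer means the words are judged cognate. All values are invented.
COGNATES = {
    "LangA": {"water": 1, "fire": 1, "hand": 1, "eye": 1},
    "LangB": {"water": 1, "fire": 1, "hand": 1, "eye": 2},
    "LangC": {"water": 1, "fire": 2, "hand": 2, "eye": 3},
    "LangD": {"water": 2, "fire": 2, "hand": 2, "eye": 3},
}

def distance(a, b):
    """Fraction of shared meanings whose words are NOT cognate."""
    meanings = COGNATES[a].keys() & COGNATES[b].keys()
    differing = sum(COGNATES[a][m] != COGNATES[b][m] for m in meanings)
    return differing / len(meanings)

def cluster_distance(members1, members2):
    """Average pairwise distance between the languages of two clusters."""
    pairs = [(a, b) for a in members1 for b in members2]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

def build_tree(languages):
    """Repeatedly merge the two closest clusters; returns nested tuples."""
    # Each cluster is (subtree, tuple of member languages).
    clusters = [(lang, (lang,)) for lang in languages]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

tree = build_tree(sorted(COGNATES))
print(tree)
```

With this invented data, LangA and LangB share three of four cognates and pair up first, as do LangC and LangD; the two pairs join only at the root. A real analysis would also model rates of change and use dated ancient languages to estimate when the branches split, which this sketch does not attempt.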

To test whether the alternative hypothesis – of a Russian origin several thousand years later – was possible, the team used competing models of evolution to pit the Steppes and Anatolian theories against each other. In repeated tests, the Anatolian theory always came out on top.

Commenting on the paper, Prof Mark Pagel, a Fellow of the Royal Society from the University of Reading who was involved in earlier published phylogenetic studies, said: “This is a superb application of methods taken from evolutionary biology to understand a problem in cultural evolution – the origin and expansion of the Indo-European languages.

“This paper conclusively shows that the Indo-European languages are at least 8-9,500 years old, and arose, as has long been speculated, in the Anatolian region of what is modern-day Turkey and spread outwards from there.”

Commenting on the inclusion of ancient languages in the analyses, he added: “The use of a number of known calibration points from ‘fossil’ languages greatly strengthens the conclusions.”

However, the findings have not won universal acceptance. Prof Petri Kallio from the University of Helsinki points out that several cognate words describing technological inventions – such as the wheel – are evident across different languages. He argues that the Indo-European proto-language therefore diversified after the invention of the wheel, about 5,000 years ago.

On the phylogenetic methods used to date the proto-language, Prof Kallio added: “So why do I still remain sceptical? Unlike archaeological radiocarbon dating based on the fixed rate of decay of the carbon-14 isotope, there is simply no fixed rate of decay of basic vocabulary, which would allow us to date ancestral proto-languages.

“Instead of the quantity of the words, therefore, the trained Indo-Europeanists concentrate on the quality of the words.”

Prof Pagel is not convinced by the counter-argument: “Compared to the Kurgan hypothesis, this new analysis shows the Anatolian hypothesis as the clear winner.”



‘Oldest English words’

Some of the oldest words in English have been identified, scientists say. Reading University researchers claim “I”, “we”, “two” and “three” are among the most ancient, dating back tens of thousands of years. Their computer model analyses the rate of change of words in English and the languages that share a common heritage. The team says it can predict which words are likely to become extinct – citing “squeeze”, “guts”, “stick” and “bad” as probable first casualties.

“We use a computer to fit a range of models that tell us how rapidly these words evolve,” said Mark Pagel, an evolutionary biologist at the University of Reading. “We fit a wide range, so there’s a lot of computation involved; and that range then brackets what the true answer is and we can estimate the rates at which these things are replaced through time.”

Sound and concept

Across the Indo-European languages – which include most of the languages spoken from Europe to the Asian subcontinent – the vocal sound made to express a given concept can be similar. New spoken words for a concept can arise in a given language, utilising different sounds, in turn giving a clue to a word’s relative age in the language.

At the root of the Reading University effort is a lexicon of 200 words that is not specific to culture or technology, and is therefore likely to represent concepts that have not changed across nations or millennia. “We have lists of words that linguists have produced for us that tell us if two words in related languages actually derive from a common ancestral word,” said Professor Pagel. “We have descriptions of the ways we think words change and their ability to change into other words, and those descriptions can be turned into a mathematical language,” he added.

The researchers used the university’s IBM supercomputer to track the known relations between words, in order to develop estimates of how long ago a given ancestral word diverged in two different languages. They have integrated that into an algorithm that will produce a list of words relevant to a given date. “You type in a date in the past or in the future and it will give you a list of words that would have changed going back in time or will change going into the future,” Professor Pagel told BBC News. “From that list you can derive a phrasebook of words you could use if you tried to show up and talk to, for example, William the Conqueror.”

That is, the model provides a list of words that are unlikely to have changed from their common ancestral root by the time of William the Conqueror. Words that have not diverged since then would comprise similar sounds to their modern descendants, whose meanings would therefore probably be recognisable on sound alone. However, the model cannot offer a guess as to what the ancestral words were. It can only estimate the likelihood that the sound from a modern English word might make some sense if called out during the Battle of Hastings.
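
A minimal sketch of how such a list could be produced, assuming each word is replaced at its own constant rate, so that its chance of surviving unchanged decays exponentially with time. This is a simplifying assumption, not the Reading team's published model, and the word labels and half-lives below are hypothetical placeholders rather than their estimates.

```python
import math

# Hypothetical half-lives in years: the time after which a word has a 50%
# chance of having been replaced by an unrelated word. Invented values.
HALF_LIFE_YEARS = {"slow_word": 20000, "medium_word": 5000, "fast_word": 500}

def prob_unchanged(half_life, years):
    """Chance a word survives `years` under a constant replacement rate."""
    rate = math.log(2) / half_life
    return math.exp(-rate * years)

def phrasebook(years, threshold=0.5):
    """Words whose chance of surviving `years` is at least `threshold`."""
    return sorted(
        word
        for word, half_life in HALF_LIFE_YEARS.items()
        if prob_unchanged(half_life, years) >= threshold
    )

print(phrasebook(1000))   # roughly the time since the Battle of Hastings
print(phrasebook(10000))
```

Over 1,000 years the slow and medium placeholder words stay above the 50% threshold while the fast one drops out; over 10,000 years only the slow word remains. Like the model described above, the sketch says nothing about what a replaced word looked like, only how likely it is to have survived.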

Dirty business

What the researchers found was that the frequency with which a word is used relates to how slowly it changes through time, so that the most common words tend to be the oldest ones. For example, the words “I” and “who” are among the oldest, along with the words “two”, “three”, and “five”. The word “one” is only slightly younger. The word “four” experienced a linguistic evolutionary leap that makes it significantly younger in English and different from other Indo-European languages.

Meanwhile, the fastest-changing words are projected to die out and be replaced by other words much sooner. For example, “dirty” is a rapidly changing word; currently there are 46 different ways of saying it in the Indo-European languages, all of them words that are unrelated to each other. As a result, it is likely to die out soon in English, along with “stick” and “guts”. Verbs also tend to change quite quickly, so “push”, “turn”, “wipe” and “stab” appear to be heading for the lexicographer’s chopping block. Again, the model cannot predict what the words may change to; those linguistic changes are, according to Professor Pagel, “anybody’s guess”.

High fidelity

“We think some of these words are as ancient as 40,000 years old. The sound used to make those words would have been used by all speakers of the Indo-European languages throughout history,” Professor Pagel said. “Here’s a sound that has been connected to a meaning – and it’s a mostly arbitrary connection – yet that sound has persisted for those tens of thousands of years.”

The work casts an interesting light on the connection between concepts and language in the human brain, and provides an insight into the evolution of a dynamic set of words. “If you’ve ever played ‘Chinese whispers’, what comes out the end is usually gibberish, and more or less when we speak to each other we’re playing this massive game of Chinese whispers. Yet our language can somehow retain its fidelity.”



