General Notes

The Turkic languages can be difficult to classify for a number of reasons. First, the family is not very old: by my calculations, it began to differentiate between 2000 and 3000 years ago, so its members tend to be very similar. Second, Turkic varieties have been in contact with one another for most of their existence, which makes it difficult to tease apart their origins. Third, a number of Turkic varieties defy the tree (Stammbaum) method of classification; the wave model seems to fit them better, although it does not yield simple diagrams or easy classifications. Finally, speakers of Turkic varieties often use similar terms to refer to themselves and their languages, but these terms do not necessarily correspond to linguistic classification.

Language, Dialect, Variety, etc.

As noted, speakers of the Turkic languages often use similar terms to refer to themselves and their languages. This may reflect a shared identity, common tribal origins, or inheritance from a common ancestor. It may also be that some authority decided that certain groups of people formed a politically convenient ethnic group, which then had to be given a name. Once a group has been named, many people rely on the linguistic ideology that an ethnic group (or tribe, village, etc.) equals a language. However, membership in a group is merely that; it does not necessarily mean that the members of a given group speak the same language.

What this does mean, however, is that linguists working on the Turkic languages need to be cautious when employing terms of identity and when referring to a speech variety as a language or a dialect.

Terms like language and dialect can carry intensely political and personal meanings for many speakers. Even the term Turkic may be seen as controversial, particularly by those who hold that there is a single Turkic/Turkish language with many divergent dialects.

While discussions of identity and of the language/dialect distinction are interesting from a sociolinguistic point of view, other areas of linguistics do not depend on these distinctions. What matters is not whether a given speech variety is a language or a dialect, but what that variety can tell us. By comparing Turkic varieties from many different places, we can draw linguistic conclusions and gain insight into the history of the peoples who speak them.

I have attempted to avoid terms like language and dialect, although this is not always possible or practical.

External Classification

Turkic is often considered a branch of a larger Altaic family. In its smallest form, this family includes the Turkic, Mongolic, and Tungusic languages; expanded versions add Japanese and Korean. The theory is highly controversial, and understanding the arguments for and against it often requires arcane knowledge of ancient manuscripts, Hungarian loanwords, and laws of sound change. Rather than rehash these arguments here, I refer the reader to Georg et al. (1998) for a broad overview, and to their references for further detail.

I am personally not convinced by the Altaic hypothesis, and I consider the resemblances among the languages of this putative family to be the result of borrowing and typological similarity.

Internal Classification

The Turkic languages are usually classified on phonological criteria, and these criteria generally hold up when compared with lexicostatistical data. To build a classification, Turcologists typically use four or five criteria, which yield several branches arranged in a tree structure (see Tekin 2005 for further detail):

  1. Reflexes of proto-Turkic (PT) *z/*š, e.g. Turkish dokuz ~ Chuvash tăxăr 'nine', Uzbek tiš ~ Chuvash šăl 'tooth'
  2. Reflexes of PT *d, e.g. Sakha atax, Turkish ayak 'foot'
  3. Preservation of initial *h, e.g. Khalaj hadaq, Turkish ayak 'foot'
  4. Reflexes of PT *g after high vowels, e.g. Kazakh sarï, Tuvan sarïg, Uyghur seriq 'yellow'
  5. Reflexes of PT *g after low vowels, e.g. Khakas taɣ, Altay tuu, Nogay tav 'mountain'
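Each criterion can be read as a feature whose values are the attested reflexes, and applying it partitions the varieties by reflex. The sketch below is a minimal illustration under loud assumptions: the table `REFLEXES` and the function `group_by_reflex` are hypothetical names, and the table holds only the example forms cited in the list above, so it covers a handful of varieties rather than whole branches.

```python
from collections import defaultdict

# Hypothetical feature table: criterion -> {variety: reflex}.
# Values are read off the example words in the criteria list only.
# "∅" marks a segment that has been lost.
REFLEXES = {
    "*z": {"Turkish": "z", "Chuvash": "r"},                # dokuz ~ tăxăr 'nine'
    "*š": {"Uzbek": "š", "Chuvash": "l"},                  # tiš ~ šăl 'tooth'
    "*d": {"Sakha": "t", "Turkish": "y", "Khalaj": "d"},   # atax, ayak, hadaq 'foot'
    "*h-": {"Khalaj": "h", "Turkish": "∅"},                # hadaq ~ ayak 'foot'
    "*g after high vowel": {"Kazakh": "∅", "Tuvan": "g", "Uyghur": "q"},  # sarï, sarïg, seriq
    "*g after low vowel": {"Khakas": "ɣ", "Altay": "∅", "Nogay": "v"},    # taɣ, tuu, tav
}


def group_by_reflex(criterion):
    """Partition the varieties listed for one criterion by their reflex."""
    groups = defaultdict(list)
    for variety, reflex in REFLEXES[criterion].items():
        groups[reflex].append(variety)
    # Sort each group so the result does not depend on insertion order.
    return {reflex: sorted(vs) for reflex, vs in groups.items()}


if __name__ == "__main__":
    for criterion in REFLEXES:
        print(criterion, group_by_reflex(criterion))
```

With only one or two varieties per value, each partition here is trivial; with a full data set, the groupings come roughly from combining the partitions produced by all five criteria.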

Applying these criteria results in the following groupings:

Bolgar, Lena, Sayan, Yenissei, Khalaj, Northern Altay, Altay, Kyrgyz, Kipchak, Ili Turki, Turki, Oghuz

Although these criteria are commonly used, they are not entirely satisfactory. I am especially dubious about initial *h, as its inclusion in the list seems merely to distinguish Khalaj from the Sayan varieties. Nearly every subgrouping contains exceptions, and it is often difficult to tell whether these are due to borrowing, innovation, or something else. The table above lends support to the idea that the wave model better describes the historical development of Turkic, although we can at least be certain that the Bolgar varieties were the first to branch off.

From the table above, we can establish two major subgroupings. First, all groups except Bolgar show z/š (Lena has undergone further changes); adopting a term from Schönig (1997), we can call this group Common Turkic. Second, the last few groupings all have y from Proto-Turkic *d; again following Schönig, we can call this group Central Turkic.
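The two subgroupings are nested: Central Turkic is a subset of Common Turkic. The sketch below illustrates that logic under stated assumptions: the `Variety` record, its fields, and the `subgroups` function are hypothetical, and the data include only varieties whose reflexes appear in the example words of the criteria list (Chuvash for Bolgar, Sakha for Lena, Khalaj, Turkish for Oghuz). Common Turkic is tested as "z/š rather than Bolgar-type r/l", which sidesteps the further changes in Lena.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variety:
    name: str
    group: str               # grouping label as used in the table
    has_r_l: bool            # True if PT *z/*š surface as r/l (Bolgar type)
    d_reflex: Optional[str]  # reflex of PT *d in 'foot'; None if not cited


# Hypothetical data: only varieties whose reflexes are cited in the criteria list.
VARIETIES = [
    Variety("Chuvash", "Bolgar", True, None),
    Variety("Sakha", "Lena", False, "t"),
    Variety("Khalaj", "Khalaj", False, "d"),
    Variety("Turkish", "Oghuz", False, "y"),
]


def subgroups(v: Variety) -> List[str]:
    """Return the nested subgroup labels a variety falls under."""
    labels = ["Turkic"]
    if not v.has_r_l:              # z/š rather than r/l: Common Turkic
        labels.append("Common Turkic")
        if v.d_reflex == "y":      # PT *d > y: Central Turkic
            labels.append("Central Turkic")
    return labels


if __name__ == "__main__":
    for v in VARIETIES:
        print(f"{v.name} ({v.group}): {' > '.join(subgroups(v))}")
```

Checking *d > y only inside the z/š branch mirrors the claim that Central Turkic lies within Common Turkic; a variety that showed y from *d but r/l from *z/*š would simply stay at the Turkic level rather than being flagged as an exception.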

Working from this point, and incorporating classificatory measures from a number of other authors (see References), we arrive at this tree.


