20000 Entries!

I’ve hit a major milestone with 20000 entries. Entry 20000 is the Yurt Tatar word for eye: күз (küz).

I’ve mostly been working on Khorasani Turkic. This is a very tedious task, as it involves concatenating data from various locations into a single list for each of the five varieties. Doerfer and Hesche use a very narrow transcription, so all of the differences between towns are recorded. I just completed the Northwest dialect, which only has 2 data points and was therefore the easiest.

I added Yurt Tatar data to the latest Khorasani dump to reach a nice even number. I was able to get my hands on a grammatical sketch by Arslanov (1976), which has some great data on this very poorly attested variety.

Khorasani Turkic

This language (group of languages? cluster of relatively unrelated dialects?) is causing me headaches. Doerfer and Hesche (1993) collect word lists from 20-something locations in Khorasan and group the varieties spoken in these locations into 5 dialects (NW, NE, N, SE, SW). This seems like a straightforward matter of gathering the data from all of these points, then dividing it into 5 different varieties: tedious, but doable.
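As a rough sketch of that gathering step, assuming the town-level data is available as (town, gloss, form) records, something like the following Python would do the grouping. The town names, the town-to-dialect mapping, and the forms are all placeholders I made up for illustration; none of them come from Doerfer and Hesche.

```python
from collections import defaultdict

# Hypothetical town-to-dialect assignment (placeholder names, not real data).
TOWN_TO_DIALECT = {
    "town_a": "NW",
    "town_b": "NW",
    "town_c": "NE",
}


def group_by_dialect(records, town_to_dialect):
    """Group (town, gloss, form) records into dialect -> gloss -> [(town, form)].

    Every town-level form is kept alongside its town, so the narrow
    transcription differences between towns are not averaged away.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for town, gloss, form in records:
        dialect = town_to_dialect[town]  # raises KeyError for an unmapped town
        grouped[dialect][gloss].append((town, form))
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {dialect: dict(glosses) for dialect, glosses in grouped.items()}


# Placeholder records; "form_1" etc. stand in for real transcriptions.
records = [
    ("town_a", "eye", "form_1"),
    ("town_b", "eye", "form_2"),
    ("town_c", "eye", "form_3"),
]

by_dialect = group_by_dialect(records, TOWN_TO_DIALECT)
```

Keeping the town next to each form means the per-dialect lists can be regrouped later if a different classification turns out to fit better.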

The problems arise when you start to find the other ways that Doerfer classifies Khorasani. Are there actually only two dialects (north and south)? Are there three, based on which branch of Oghuz Doerfer assigns the varieties to: Central Oghuz (Southwest), Southern Oghuz (Northwest, Southwest), and East Oghuz (North, Northeast)? What about the places where he claims that varieties in Turkmenistan and even Uzbekistan can be considered varieties of Khorasani? This is a historically reasonable proposition, but it ignores the fact that Turkmen and Oghuz Uzbek have had a good century of independent development thanks to Soviet language policy.

Doerfer’s works are based on the speech varieties found in different towns. He (justifiably) makes no attempt to unify his data into an aggregate average that can be treated as a unique language or dialect. This means that there is a ton of data that I could potentially use, but the question is how to use it.