25000 entries!

Quite the milestone. The 25000th entry is the Turkmen word for “strawberry”: ýer tudanasy (yer tūdanasï). It’s a compound, with ýer meaning “ground”. The tudana (note the long vowel) part is somehow related to tut “mulberry” and refers to the fruit of the mulberry tree. It’s one of those words that was likely passed back and forth between languages.

I’ve added a few new references and have gone back to add long vowels into the Turkmen data. Turkmen preserves long vowels in places that other languages don’t, so it’s vital for reconstruction.

I’ve already got a list of 27 potential new glosses to add, like “chalk”, “jaw”, and “knot”, as well as a ton of verbs of motion like “get on”, “enter”, “exit”, and “arrive”. Maybe once I’ve finished with the data from the last 50 glosses I’ll add these new ones.

23000 surpassed, considering even more glosses

I’ve nearly completed adding Karakhanid data from Dīwān Luγāt at-Turk. This has brought me to 23000 entries. This also means that I’m nearly out of sources to consult until I can visit a bigger library (which is still difficult due to COVID).

I have considered adding 50 new glosses, which would bring my total up to 500. I’m considering new body parts/functions (palm, feces, pus, sole, hoof, vein), some plants and animals (cockroach, juniper), directional/positional terms (top, bottom, interior, side), and a few random conceptual and cultural terms (wedding, color, thief). I have 38 terms so far; the Dīwān index was very helpful in choosing these. Once I have decided, I’ll post them here. I’ll also do some background work to make sure I’m not going back to the same sources to look for terms that aren’t in them, which is frustrating.

As a side note, I have really enjoyed reading the following article: Janhunen, Juha. “Issues of Comparative Uralic and Altaic Studies (3): The Turkic Plural in *-s.” Altai Hakpo, 2017. He breaks down a lot of issues relating to the unusual number of paired items ending in /z/, the z~r controversy, and typological issues related to paired/plural items.

20000 Entries!

I’ve hit a major milestone with 20000 entries. Entry 20000 is the Yurt Tatar word for eye: күз (küz).

I’ve mostly been working on Khorasani Turkic. This is a very tedious task, as it involves concatenating data from various locations into a single list for each of the five varieties. Doerfer and Hesche use a very narrow transcription, so all of the differences between towns have been transcribed. I just completed the Northwest dialect, which only has 2 data points and was therefore the easiest.
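For anyone curious what that concatenation amounts to, here is a rough sketch in Python. It is not the actual workflow behind the database, just one way the merge could be done. The function name, the record layout (variety, town, gloss, form), and the placeholder values such as town_A and form_1 are all invented for illustration; none of them are real Khorasani Turkic data.

```python
from collections import defaultdict


def merge_by_variety(records):
    """Collapse per-town records into one list of forms per variety and gloss.

    `records` is an iterable of (variety, town, gloss, form) tuples.
    This layout is an assumption made for the sketch.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for variety, _town, gloss, form in records:
        forms = merged[variety][gloss]
        # Keep each distinct transcription once, in first-seen order.
        if form not in forms:
            forms.append(form)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {variety: dict(glosses) for variety, glosses in merged.items()}


# Placeholder data only: two towns for a single variety.
records = [
    ("Northwest", "town_A", "eye", "form_1"),
    ("Northwest", "town_B", "eye", "form_2"),
    ("Northwest", "town_B", "eye", "form_1"),  # same form recorded again
    ("Northwest", "town_A", "bird", "form_3"),
]

merged = merge_by_variety(records)
print(merged["Northwest"]["eye"])
```

With a narrow transcription, most towns will contribute a distinct form, so the per-gloss lists get long quickly; the de-duplication step only removes exact matches.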

I added Yurt Tatar data to the latest Khorasani dump to reach a nice even number. I was able to get my hands on a grammatical sketch by Arslanov (1976), which has some great data on this very poorly attested variety.

Still here…

I’ve been a bit slower about adding new forms lately. I have reached 14,450 entries.

I’ve added 191 Cuman forms from the Codex Cumanicus. Cuman is challenging because it’s transcribed using medieval Italian conventions, which don’t do well with a lot of Turkic sounds.

I’ve also been working through Urum. This has been incredibly tedious, as its mixed nature means that every gloss has a ton of forms. Occasionally you run across a Greek word, which is exciting.

After I’m done with Urum I’d like to work on some more medieval Turkic languages. It’s harder to find data on these, but I’ve got hopes that I can get some Mamluk or Bolgar data.


Well, I’ve been on a roll. I’ve just added entry no. 14,000. This latest is the Krymchak word for ‘flower’ – čiček.

I’ve got a lot more Krymchak to add, too. Every time I think I’m running out of languages or sources I find more and more. Krymchak is an interesting case because the dictionary my library holds was shelved under PJ; the Library of Congress classification scheme arbitrarily classes ‘Other languages used by Jews’ at the end of the Hebrew range. Our Karaim materials, however, were classed under PL with the rest of the Turkic language material. Weird.

I’m still considering what to do with the glosses I proposed in the previous post – still no decision.

Also, I’d like to set up a page devoted to Crimean Turkic, maybe even incorporating a fancy interactive map. We’ll see…

13,000 forms!

Another month, another milestone. This time, it’s reaching 13,000 forms. Entry 13,000 is Karakalpak for ‘far’ – узақ, алыс, қашық.

I’ve mostly been working on Karakalpak, but have also added some new sources for Western Yugur and Fu-yü Gïrgïs.

12000 forms!

We’ve hit 12000 entries in the database. The latest one is ҡош/qoš – Siberian Tatar for bird. I’ve got about 120 more entries to add for Siberian Tatar.

After I’m done with that, I think I’ll conduct a review of what I’m missing. I know I need to do some serious work to finish Turkish and Tatar (both relatively easy). After that, I’ll likely do some work on the Southern Oghuz languages. These will be a bit more complicated, and working on them might mean I have to reevaluate all of the data I have for these languages.

Something I’d like to research a bit is the Siberian Tatar month names. The dictionary I have lists two forms for each month – one Russian, and one native. I’m not sure where the native forms come from. They don’t look especially Turkic, and as far as I can tell, Tatar uses Russian month names exclusively now.

I’ve found an online Tatar-English dictionary, but haven’t explored it much: https://tt.oxforddictionaries.com/. This is super exciting, because I’ve had little luck finding a Russian-Tatar online dictionary.

Siberian Tatar

I finally got in the Russian-Siberian Tatar dictionary that I requested via interlibrary loan. It treats Siberian Tatar as a single language. Because there’s not much information about any of the Siberian Tatar varieties (Tobol-Irtysh, Baraba, Tomsk), this is a really valuable resource to have. I’ve created a new entry for a literary Siberian Tatar language with links to each of the varieties.

Thanks to this new resource, I’m up to 11,800 entries, and should have 12,000 very soon.

Standard Siberian Tatar looks a lot like standard Tatar, except that /č/ is /ts/ and the voicing distinction is lost at word boundaries. There’s a bit of confusion as to the origin of Siberian Tatar, but the fact that it looks so much like standard Tatar (especially in the vowel shift that occurred among the languages of the Ural-Volga region) suggests that it may represent an eastward migration. The language then would have mixed with local varieties of Turkic, losing the voicing distinction and the /č/ sound, and likely picking up some of the unique vocabulary we see in many of the dialects. Further research is needed to back up this hypothesis.

11000 forms!

I’ve hit 11000 forms in the database. This one’s not too exciting – it’s Turkish for “egg” – yumurta. Whenever I can’t hit an even number but want to insert a batch of forms, I’ll fill in the gaps with one of the easier languages. In this case it’s Turkish. Because Turkish is the easiest language to get data on, I’ve saved it for last.
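The gap-filling arithmetic is simple enough to sketch in Python. This is only an illustration: the function name, the step of 1000, and the totals in the example are assumptions, not figures from the database.

```python
def fillers_needed(current_total, batch_size, step=1000):
    """Return how many extra entries are needed so that the total,
    after inserting a batch, lands on the next multiple of `step`.

    All parameter names and the default step are hypothetical.
    """
    after_batch = current_total + batch_size
    remainder = after_batch % step
    return 0 if remainder == 0 else step - remainder


# Hypothetical numbers: 10850 entries so far, a batch of 120 forms to insert.
gap = fillers_needed(10850, 120)
print(gap)
```

The returned count is the number of forms that would have to come from an easy language such as Turkish to land on a round number.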

In other news, I’ve come across discussion of a language I’d never even heard of: Chanto. Chanto is a Turki variety spoken in Western Mongolia. The speakers identify as Uyghurs, although many sources call them Uzbeks. Their language is distinct enough to have its own entry, so I’ve set that up here. The same sources give some description of Altay Tuvan, and their data doesn’t neatly align with what I’ve already found. I’m not convinced that this new data is very good, but it’s all I’ve got to work with.

10000 Entries

I’ve hit 10000 entries in the lexical database. It would have been nice to have reached this at a nice stopping point, but I’m in the middle of entering Soyot terms. The 10000th entry is the Soyot word for ant: һымысқа. I estimate that I’ll be able to add about 5000 more entries, but more data may surface, so who knows.

Having reached this point I think it’s about time to reorganize the site. I have a better idea of what I want to do and need to make it more accessible.

For now, enjoy this map I’ve begun to mock up. Ideally, it will have overlays corresponding to different features. Once I’ve done a little cleanup this will need a home on the front page. Now on to finishing entering all the available data, figuring out what to do with Khorasani Turkic, working on a grammar template…