25000 Entries!

Quite the milestone. The 25000th entry is the Turkmen word for “strawberry”: ýer tudanasy (yer tūdanasï). It’s a compound, with ýer meaning “ground”. The tudana part (note the long vowel) is somehow related to tut “mulberry” and refers to the fruit of the mulberry tree. It’s one of those words that was likely passed back and forth between languages.

I’ve added a few new references and have gone back and added long vowels to the Turkmen data. Turkmen preserves long vowels in places where other languages don’t, which makes it vital for reconstruction.

I already have a list of 27 potential new glosses, such as “chalk”, “jaw”, and “knot”, as well as a ton of verbs of motion like “get on”, “enter”, “exit”, and “arrive”. Maybe I’ll add these once I’ve finished entering the data for the last 50 glosses.

23000 surpassed, considering even more glosses

I’ve nearly completed adding Karakhanid data from Dīwān Luγāt at-Turk. This has brought me to 23000 entries. This also means that I’m nearly out of sources to consult until I can visit a bigger library (which is still difficult due to COVID).

I’m considering adding 50 new glosses, which would bring my total up to 500. Candidates include new body parts/functions (palm, feces, pus, sole, hoof, vein), some plants and animals (cockroach, juniper), directional/positional terms (top, bottom, interior, side), and a few random conceptual and cultural terms (wedding, color, thief). I have 38 terms so far; the Dīwān index was very helpful in choosing these. Once I have decided, I’ll post them here. I’ll also do some background work to make sure I’m not going back to the same sources to look for terms that aren’t in them, which is frustrating.

As a side note, I have really enjoyed reading the following article: Janhunen, Juha. “Issues of Comparative Uralic and Altaic Studies (3): The Turkic Plural in *-s.” Altai Hakpo, 2017. He breaks down a lot of issues relating to the unusual number of paired items ending in /z/, the z~r controversy, and typological issues related to paired/plural items.

20000 Entries!

I’ve hit a major milestone with 20000 entries. Entry 20000 is the Yurt Tatar word for eye: күз (küz).

I’ve mostly been working on Khorasani Turkic. This is a very tedious task, as it involves concatenating data from various locations into a single list for each of the five varieties. Doerfer and Hesche use very narrow transcription, so all of the differences between towns have been transcribed. I just completed the Northwest dialect, which has only 2 data points and was therefore the easiest.

I added Yurt Tatar data to the latest Khorasani dump to reach a nice even number. I was able to get my hands on a grammatical sketch by Arslanov (1976), which has some great data on this very poorly attested variety.

18000 Entries

I’ve added a bunch of new glosses to the database (e.g. spider, jump, learn, valley, beard). As a result, there are a lot of new entries. Entry number 18000 is һирәә, which is the Soyot word for saw (tool). More to come!

17000 Entries!

I’ve hit entry 17000 today. The most recent is Teleut for “how”: қайында, кааньда. I’m at a point where I have so many data sources that I can’t foresee an end point. I still haven’t done much with Khorasani Turkic, which will add another 1000 or so entries; I have tons more Dolgan and Teleut to input; I haven’t done as much as I would like with Orkhon Turkic; etc. etc.

16000 Forms

I realized that I never entered all of the Mrass Shor data I have access to, which means that the count is growing rapidly. The 16000th form is Mrass Shor for cold – sooq.

I’ve been adding entries for the new glosses (honey, navel, dream, etc.), which has been slow but productive. Now that my gloss count is up to 350, I’m inclined to just keep adding glosses that I find interesting. I have about 50 more that I’ve considered adding: hammer, jump, needle, axe, birch… I’ve tried to be systematic in entering data, simply because it’s annoying to have to revisit a lot of my data sources. I’ll think about it a bit more before deciding how to approach this.

I’ve also made a few adjustments to the website itself. The Missing Page is a bit more functional. I had originally set it up for my own personal use, but figured I may as well share it. Another update on my agenda is to combine the two pages that I use for the Compare feature. This is pretty trivial and I should have thought of it long ago.

New Languages!

I’ve been adding new material left and right. As noted previously, I recently added Ili Salar. I’ve also found some decent Gagauz materials, and have been adding entries as I come across them. My Afghan Uzbek-Turkish dictionary just arrived, so I’ve added Afghan Uzbek (Sar-e Pol). I’m still waiting for the work on the Samangan/Aybak dialect, which I believe is more Kipchak in nature.

I found the most incredible open access journal: Tehlikedeki Diller Dergisi. They occasionally have grammatical sketches, which are really fantastic to have. I’ve added a ton of Dolgan forms, and will be adding a new language, Kalmak. I had previously treated Kalmak as a variety of Tomsk Tatar, but the author has convinced me that it’s distinct enough to warrant its own page. I’m well over 15000 entries and will likely pass 15400 very soon.

While I’m talking about great sources of information, I would be remiss if I didn’t mention CyberLeninka, which is a source for open access Russian journals. Also, the Russian State Library has begun digitizing a lot of its collections, which means that tons of dissertations and other materials are now freely available.

I went for a long time without finding much new material, and now it just won’t stop. Here’s to hitting 16000 soon.


I’ve added a lot more Salar material. The source I used writes it in Pinyin, which is not at all ideal for a Turkic language. I’m beginning to doubt whether it was worth adding this new material at all…

I’ve created a completely new page for the Ili variety of Salar, which is different enough that Dwyer considers it a separate dialect. The data for this dialect is really good. I’ve surpassed 15,000 entries in the spreadsheet I use to collect my data, but I haven’t entered them into the actual database yet. I imagine it will all be up in a day or two.

Done with Urum, on to Afghan Uzbek

I’ve finally finished with Urum. What a pain in the neck. This brings the count of total entries up to 14608. I know it’s not a nice round number, but I couldn’t bear to leave off the last 8 Urum entries.

I have some Afghan Uzbek materials on the way. There’s not much written about Afghan Uzbek, but from what I’ve seen in past research, there is a ton of variation. At least one variety (hopefully the one in the next book I’m getting) is pretty clearly Kipchak.

I redid the main site using some PHP tricks to get everything into a single document. I would love to do this with the Turkic site, but I’m worried that it might get messy… Maybe if I just shove the entire contents of a single page into a database or set them up as variables in separate PHP files… We will see.

Still here…

I’ve been a bit slower about adding new forms lately. I have reached 14,450 entries.

I’ve added 191 Cuman forms from the Codex Cumanicus. Cuman is challenging because it’s transcribed using medieval Italian conventions, which don’t do well with a lot of Turkic sounds.

I’ve also been working through Urum. This has been incredibly tedious, as its mixed nature means that every gloss has a ton of forms. Occasionally you run across a Greek word, which is exciting.

After I’m done with Urum I’d like to work on some more medieval Turkic languages. It’s harder to find data on these, but I’m hopeful that I can get some Mamluk or Bolgar data.