Site updates

I’ve completely overhauled the website’s architecture. This shouldn’t result in any visible changes, but should ensure that there are no more broken links. Please let me know if you encounter any.

New Languages!

I’ve been adding new material left and right. As noted previously, I recently added Ili Salar. I’ve also found some decent Gagauz materials, and have been adding entries as I come across them. My Afghan Uzbek-Turkish dictionary just arrived, so I’ve added Afghan Uzbek (Sar-e Pol). I’m still waiting for the work on the Samangan/Aybak dialect, which I believe is more Kipchak in nature.

I found the most incredible open access journal: Tehlikedeki Diller Dergisi. They occasionally have grammatical sketches, which are really fantastic to have. I’ve added a ton of Dolgan forms, and will be adding a new language, Kalmak. I’ve previously treated Kalmak as a variety of Tomsk Tatar, but the author has convinced me that it’s distinct enough to warrant its own page. I’m well over 15000 entries and will likely pass 15400 very soon.

While I’m talking about great sources of information, I would be remiss if I didn’t mention CyberLeninka, which is a source for open access Russian journals. Also, the Russian State Library has begun digitizing a lot of its collections, which means that tons of dissertations and other materials are now freely available.

I went for a long time without finding much new material, and now it just won’t stop. Here’s to hitting 16000 soon.


I’ve added a lot more Salar material. The source I took from uses Pinyin, which is not at all ideal for writing a Turkic language. I’m beginning to doubt whether it was worth adding this new material at all…

I’ve created a completely new page for the Ili variety of Salar, which is different enough that Dwyer considers it a separate dialect. The data for this dialect is really good. I’ve surpassed 15,000 entries in the spreadsheet I use to collect my data, but I haven’t entered the new material into the actual database yet. I imagine it will all be up in a day or two.


I added some data from Telengit, bringing the count up to an even 14,650. I have a few more forms, but, hey, I like even counts.

Telengit is weird because it’s unclear how distinct it is. It has separate /e/ and /ä/ phonemes, which is a bit unusual for the region. Also, Teleuts refer to themselves as Telengits (or something like that), which makes everything very confusing.

Done with Urum, on to Afghan Uzbek

I’ve finally finished with Urum. What a pain in the neck. This brings the count of total entries up to 14608. I know it’s not a nice round number, but I couldn’t bear to leave off the last 8 Urum entries.

I have some Afghan Uzbek materials on the way. There’s not much written about Afghan Uzbek, but from what I can tell from past research, there is a ton of variation. At least one variety (hopefully the one in the next book I’m getting) is pretty clearly Kipchak.

I re-did the main site using some PHP tricks to get everything into a single document. I would love to do this with the Turkic site, but I’m worried that it might get messy… Maybe if I just shove the entire contents of a single page into a database or set them up as variables in separate PHP files… We will see.

Happy New Year

I’m still here!

I’ve got about 50 entries to add for Urum (what a slog!), and then I can move on. I have been considering condensing everything on this site into a single page that is navigable by PHP. Not sure if that’s wise, but I would like to make some further changes to the basic architecture, and it would be nice not to have to edit 10-15 pages every time I do that.

One big change I’m considering is introducing a Library page. There’s a lot of good material out there in the public domain, and I think visitors would find it useful to be able to access some of it. A little related side project I have going is creating a guide to Radlov’s Опыт словаря тюркских нарѣчій / Versuch eines Wörterbuches der Türk-Dialecte. It isn’t completely done yet, but I’m happy with my progress.

I previously mentioned that I wanted to add some new glosses to the site. I haven’t done so yet, but one reason I would like to is because I have been inspired by a series of posts by Victor Mair over at Language Log. The one on honey is particularly interesting, as Turkic *bal? could easily be related. There are a number of very old Wanderwörter that suggest connections between Sinitic and Indo-European, with various Central Asian languages playing intermediary roles. Words for things like apple, honey, deer, tea, silk, etc. have all traveled throughout Eurasia. Finding Turkic connections would be especially interesting.

I’ve also begun adding etymologies where possible. It’s been interesting reading more about these, especially as I have comparatively little experience with older Turkic languages and even less with the languages early Turkic was in contact with.

Unrelated to all of this, I’ve begun a second blog where I discuss multilingual issues in the library catalog. I have a lot to say, but so little time to say it. So far I’ve just posted a quick hello. I hope to have some posts about the history of character encoding soon.

Expect further changes, likely sometime in February!

Still here…

I’ve been a bit slower about adding new forms lately. I have reached 14,450 entries.

I’ve added 191 Cuman forms from the Codex Cumanicus. Cuman is challenging because it’s transcribed using medieval Italian conventions, which don’t do well with a lot of Turkic sounds.

I’ve also been working through Urum. This has been incredibly tedious, as its mixed nature means that every gloss has a ton of forms. Occasionally you run across a Greek word, which is exciting.

After I’m done with Urum I’d like to work on some more medieval Turkic languages. It’s harder to find data on these, but I’ve got hopes that I can get some Mamluk or Bolgar data.


Well, I’ve been on a roll. I’ve just added entry no. 14,000. This latest is the Krymchak word for ‘flower’ – čiček.

I’ve got a lot more Krymchak to add, too. Every time I think I’m running out of languages or sources I find more and more. Krymchak is an interesting case because the dictionary my library holds was shelved under PJ; the Library of Congress classification scheme arbitrarily classes ‘Other languages used by Jews’ at the end of the Hebrew range. Our Karaim materials, however, were classed under PL with the rest of the Turkic language material. Weird.

I’m still considering what to do with the glosses I proposed in the previous post – still no decision.

Also, I’d like to set up a page devoted to Crimean Turkic, maybe even incorporating a fancy interactive map. We’ll see…

A brief history, new glosses?

This project started as something I began many years ago. When I lived in Turkey, my host mother had a comparative dictionary of the Turkic languages, and I would marvel at how similar yet different they could be. Later, I began my own database of sorts on a series of notecards. I eventually tossed the notecards and moved on, in part because I hadn’t been consistent in my transcription and because they were cumbersome to transport and use.

More recently, I took a class on SQL, and once my instructor recommended I look into PHP, I realized I could publish data to the web. It was then that I began putting this all together – at first on my hard drive, and later on this website.

Thinking back to those original cards, I have come to remember that I once had a number of glosses that do not exist in this database. And looking through the many, many dictionaries, grammars, and field reports in my references, I have come to realize that other authors found some of these glosses to be important as well. Adding new glosses is no small task, as it can be annoying to have to revisit old sources. In some cases, I may have to wait weeks, as I obtained them through interlibrary loan.

For now, here is a preliminary list of the new glosses I have considered adding:

  • butterfly
  • fly
  • walnut
  • hammer
  • ax
  • bee
  • honey
  • wool
  • thread
  • footprint
  • penis
  • vulva
  • urine
  • feces

I was on the fence about the last four, given their taboo nature. However, they do show up in sources with surprising frequency. Even the Codex Cumanicus has them. If it’s good enough for Late Medieval Italians, it’s good enough for me.

Because I’m a bit obsessive (as the existence of this site shows), I might try to add a few more to achieve a nice, round number. However, adding these 14 will bring the total to 365, which is certainly nice.


I’ve been adding tons of forms here and there, finishing up certain languages and working towards finishing others (Uzbek is on the list right now).

I’ve also begun an interesting foray into Cuman, a medieval language spoken by an early Kipchak people in the steppes of Ukraine and Eastern Europe. Even in the 13th and 14th centuries there’s quite a bit of Persian influence.

Unfortunately, the Codex Cumanicus is written in very confusing medieval Latin, so there are bound to be tons of mistakes on my part. My favorite so far is lupi ceruerij. When I saw the Cuman gloss was silausun I thought “Huh, that looks like the Kipchak word for lynx.” Sure enough, a little searching reveals that lupi cervieri was a term that early Italian traders and costumiers used for lynx fur. The term (which literally just means “wolf-deer”) seems to have had other meanings, but finding those out is a bit beyond me now.

The transcription is very inconsistent, which makes figuring out the original form very difficult. The letter x, for example, seems to represent what I transcribe as z, č, and s. Basically, if I don’t have a modern word to check the Cuman form against, I can’t reconstruct anything.
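
The checking process described above can be sketched roughly in code. This is only an illustration under stated assumptions, not the method actually used for this site: the `AMBIGUOUS` table, the function names, and the sample spelling are all hypothetical, and the only detail taken from the text is that x may stand for z, č, or s. The idea is to expand a Codex spelling into every possible reading and keep only those that match a known modern form.

```python
from itertools import product

# Hypothetical table: each ambiguous Codex letter mapped to the sounds
# it may represent. Only the x -> z, č, s entry comes from the text above.
AMBIGUOUS = {"x": ["z", "č", "s"]}


def candidate_readings(form):
    """Return every reading obtained by expanding the ambiguous letters."""
    options = [AMBIGUOUS.get(ch, [ch]) for ch in form]
    return ["".join(combo) for combo in product(*options)]


def match_modern(form, modern_forms):
    """Keep only the readings that appear in a set of modern forms."""
    return [r for r in candidate_readings(form) if r in modern_forms]


# "xax" is an invented spelling for illustration, not a Codex entry.
print(candidate_readings("xax"))
print(match_modern("xax", {"sač"}))
```

With no modern form to compare against, `match_modern` returns an empty list, which mirrors the problem: the candidate readings multiply with every ambiguous letter, and nothing in the spelling alone picks one out.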