No new content for a while…

I’ve been a bit burned out as of late. In addition to the many projects I have my hands in, I’ve been working on job applications and home improvement. I hope to have some new content sometime in the near future, maybe mid-April.

More on site updates…

I’ve gotten sorting and sticky headers to work on all of my tables. What I need to do now is figure out how to give each column its own sort order. On the Doerfer page, for example, we need to sort by English, German, Persian, and Doerfer’s Persian transliteration. Also, Azerbaijani (for example) sorts differently from English, with <x> following <h> and <q> following <k>. And, of course, the various Cyrillic alphabets have their own idiosyncrasies.
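A per-column sort order can be expressed as a comparator built from an explicit alphabet. The sketch below assumes the cells hold Latin-script Azerbaijani and uses the standard 32-letter Azerbaijani alphabet, in which <x> follows <h> and <q> follows <k>. The names `makeAlphabetComparator`, `azFold`, and `compareAzerbaijani` are hypothetical, and how the comparator would be hooked into tablesorter is not shown. (Runtimes that ship full locale data may already do this via `Intl.Collator("az")`; an explicit alphabet just makes the order visible and easy to adapt for Cyrillic alphabets or transcription.)

```typescript
// Azerbaijani Latin alphabet in its conventional order (lowercase).
// Note that x follows h, and q follows k.
const AZ_ALPHABET: string[] = [
  "a", "b", "c", "ç", "d", "e", "ə", "f", "g", "ğ", "h", "x",
  "ı", "i", "j", "k", "q", "l", "m", "n", "o", "ö", "p", "r",
  "s", "ş", "t", "u", "ü", "v", "y", "z",
];

// Build a comparator that orders strings letter by letter using `alphabet`.
// `fold` maps each character to the form used in the alphabet (e.g. lowercase).
function makeAlphabetComparator(
  alphabet: string[],
  fold: (ch: string) => string,
): (a: string, b: string) => number {
  const rank = new Map<string, number>();
  alphabet.forEach((letter, i) => rank.set(letter, i));

  // Characters outside the alphabet sort after all known letters, by code point.
  const keyOf = (ch: string): number => {
    const r = rank.get(ch);
    return r !== undefined ? r : alphabet.length + ch.codePointAt(0)!;
  };

  return (a: string, b: string): number => {
    // NFC so that e.g. "ç" is one precomposed character, not c + cedilla.
    const xs = Array.from(a.normalize("NFC"), fold);
    const ys = Array.from(b.normalize("NFC"), fold);
    const n = Math.min(xs.length, ys.length);
    for (let i = 0; i < n; i++) {
      const d = keyOf(xs[i]) - keyOf(ys[i]);
      if (d !== 0) return d;
    }
    // If one word is a prefix of the other, the shorter one comes first.
    return xs.length - ys.length;
  };
}

// Azerbaijani case folding: capital I is dotless (ı), capital İ is dotted (i).
function azFold(ch: string): string {
  if (ch === "I") return "ı";
  if (ch === "İ") return "i";
  return ch.toLowerCase();
}

const compareAzerbaijani = makeAlphabetComparator(AZ_ALPHABET, azFold);

// Usage: English order would put kitab and qar before xalq.
const words = ["qar", "kitab", "xalq", "hava", "ağac"].sort(compareAzerbaijani);
console.log(words); // [ 'ağac', 'hava', 'xalq', 'kitab', 'qar' ]
```

The same factory could be reused for each column, with a different alphabet and folding function per language.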

I’ll also have to figure out how to sort my transcribed forms. I think I’ll base it on character shape and try to be as consistent as possible. Whatever I figure out will be available on the Transcription page.

After that, I’ll look into introducing filters on the Language Forms page and maybe on the Compare page to enable filtering by lexical class and things like numerals, colors, animals, etc.

Site updates

I’ve been working on making my tables more interactive. This has required a lot of JavaScript, something I’m not completely comfortable with. I’ve learned a ton, though, so it’s been worth it.

I’ve mostly been testing on two of the less vital pages: Doerfer and Missing. So far, I’ve been able to get sorting working with some success. Ideally, each column would have its own sort order so that Azerbaijani, say, would sort differently from English. This is going to require a lot more research. Right now, I’m using the tablesorter and sugar libraries.

Two more things I’d like to do are to make the headers of long tables sticky, so that they scroll down the page with you, and to introduce filtering options on the language-specific pages. It would be nice to be able to filter out nouns or pronouns or numerals or even colors or animal names.

I’m only at 16050 entries right now. Part of the reason for creating the Doerfer page is to make it easier to navigate his materials. I’d like to get more Southern Oghuz data entered, but it can be challenging without a separate index. This page will remedy that.

16000 Forms

I realized that I never entered all of the Mrass Shor data I have access to, which means that the count is growing rapidly. The 16000th form is Mrass Shor for cold – sooq.

I’ve been adding entries for the new glosses (honey, navel, dream, etc.), which has been slow but productive. Now that my gloss count is up to 350, I have a mind to just keep adding glosses that I find interesting. I have about 50 more that I’ve considered adding: hammer, jump, needle, axe, birch… I’ve tried to be systematic in entering data, just because it’s annoying to have to revisit a lot of my data sources. I’ll think about it a bit more before deciding how to approach this.

I’ve also made a few adjustments to the website itself. The Missing page is a bit more functional. I had originally set it up for my own personal use, but figured I may as well share it. Another update on my agenda is to combine the two pages that I use for the Compare feature. This is pretty trivial, and I should have thought of it long ago.

New glosses!

It is not trivial to add new glosses to this database. It basically means I have to go back and consult all of my sources again. Some are only available via interlibrary loan, which means it could be some time before I obtain them.

At any rate, I’ve decided to add 8 more, to bring my total to 350. I had previously had 340, then added cat in memory of my cat, and grape because I received some questions about it. Here are the 8 I have selected, and why:

honey – Language Log has had some interesting discussions about Wanderwörter, and one of them was honey. I don’t expect any surprises here, just forms based on Old Turkic *bal.

wool – I chose this because it’s likely to be found in most Turkic languages as it is a culturally salient material. Also, it’s phonologically interesting. Doerfer reconstructs the Old Turkic form as *yuŋ. This leaves a lot of room for sound changes to occur as the combination of a palatal initial, back rounded vowel, and velar nasal should do interesting things to each other.

dream – This is one of those words that comes up a lot in the reconstruction of Proto-Turkic. It ends in /š/ (PT *tǖš?) and has a long vowel, so it’s got a lot to say about the Bolgar-Common split and the development of long vowels.

copper – Copper is another major Eurasian Wanderwort. This is one where I don’t know what to expect, so I’m excited. For most of the other entries I have some idea of the Proto-Turkic form; here I have no idea. Maybe it’s just a bunch of borrowings…who knows! At least now it can join gold and silver in the pantheon of precious metals.

crane – I chose crane because it’s a culturally salient bird and because I believe it’s got a palatal nasal after a consonant (PT *turńa?). I’m taking a chance on this one…

onion – Mostly because I’m curious. Sometimes it’s exciting when a common word can’t be traced to a common proto-form. I suspect there’s been a lot of borrowing. We’ll see.

most – This is a grammatical particle that I should have had from the beginning. Central Turkic can be reconstructed as *eŋ. I’m curious about Siberian and Bolgar. I don’t think it’s in Doerfer, which is a bummer.

navel – This is another form commonly seen in Turkic reconstructions, as it’s a common two-syllable word.

There are so many more I could add. We’ll see whether I end up adding them. I’m still tempted to add words for genitals and bathroom stuff, as well as needle, dawn, eyelash, learn, frog, hammer, axe, fly, footprint…

I’ll have a few entries for each of these forms later in the day. I expect this will boost my numbers very quickly.


I recently received an inquiry as to the form of ‘grape’ in a certain Turkic language, so I decided to add it to the database.

It has been interesting doing this research, as I cannot find a single native (i.e. not Russian vinograd) form in any Siberian language. You would think that grapes might be able to grow in the Altay region, but apparently not.

It is very difficult to reconstruct the proto-form. Wiktionary gives *jüŕüm, as does Siemieniec-Gołaś. This reconstruction is appropriate for Central Turkic (i.e. non-Siberian and non-Bolgar). Interestingly, Khalaj exhibits an initial /h/ (hüzüm). This might explain why some languages in the Common group have initial /y/ and others do not. I’m still not sure what to make of Khalaj initial /h/…

Where it gets weird is when we look at Chuvash and Western Yugur. Chuvash has iśĕm, which is unexpected: any intervocalic /z/ should change to /r/ in Bolgar (word-final /z/ is another matter for another time…). In Western Yugur, the forms are öǰüm, öčüm, üčüm. The Western Yugur forms neatly match the Chuvash form and point to a proto-form like *ečüm, which is very strange. There’s no way to reconcile Common Turkic *(h/y)izüm with that.

A couple more notes: Mongolian үзэм refers to raisins, while усан үзэм (literally wet raisins…ugh…) refers to grapes. The Western Yugur forms may also apply only to raisins. Russian изюм likewise refers to raisins. It’s clearly derived from Turkic, but it’s unclear which language it came from.

I suspect that all of the Turkic forms are borrowings from some other language, but it is unclear which one. Chinese uses pútáo (or something like that), and the Persian languages all use angur… It is tempting to try to link it to Persian raz, meaning vine, cognate with Greek rháx and Latin racemus (whence raisin). However, the origin of these words is unknown, and we would still have to account for the rounded vowels and the -üm at the end of the word.

Site updates

I’ve completely overhauled the website’s architecture. This shouldn’t result in any visible changes, but should ensure that there are no more broken links. Please let me know if you encounter any.

New Languages!

I’ve been adding new material left and right. As noted previously, I recently added Ili Salar. I’ve also found some decent Gagauz materials, and have been adding entries as I come across them. My Afghan Uzbek-Turkish dictionary just arrived, so I’ve added Afghan Uzbek (Sar-e Pol). I’m still waiting for the work on the Samangan/Aybak dialect, which I believe is more Kipchak in nature.

I found the most incredible open access journal: Tehlikedeki Diller Dergisi. They occasionally have grammatical sketches, which are really fantastic to have. I’ve added a ton of Dolgan forms, and will be adding a new language, Kalmak. I’ve previously treated Kalmak as a variety of Tomsk Tatar, but the author has convinced me that it’s distinct enough to warrant its own page. I’m well over 15000 entries and will likely pass 15400 very soon.

While I’m talking about great sources of information, I would be remiss if I didn’t mention CyberLeninka, which is a source for open access Russian journals. Also, the Russian State Library has begun digitizing a lot of its collections, which means that tons of dissertations and other materials are now freely available.

I went for a long time without finding much new material, and now it just won’t stop. Here’s to hitting 16000 soon.


I’ve added a lot more Salar material. The source I took from uses Pinyin, which is not at all ideal for writing a Turkic language. I’m beginning to doubt whether it was worth adding this new material at all…

I’ve created a completely new page for the Ili variety of Salar, which is different enough that Dwyer considers it a separate dialect. The data for this dialect is really good. I’ve surpassed 15,000 entries in the spreadsheet I use to collect my data, but I haven’t entered them into the actual database yet. I imagine it will all be up in a day or two.


I added some data from Telengit, bringing the count up to an even 14,650. I have a few more forms, but, hey, I like even counts.

Telengit is weird because it’s unclear how distinct it is. It has separate /e/ and /ä/ phonemes, which is a bit unusual for the region. Also, Teleuts refer to themselves as Telengits (or something like that), which makes everything very confusing.