Shirts and such

I’ve had a few ideas for t-shirts, so I’ve created a Redbubble page. There’s more to come, but for now, check out my profile here:

So far I’ve got one for Adelaide Hasse, one from the Irk Bitig, and a random cracker design that I had sitting around.

More to come soon!

New Map, New Language?

In addition to working on point-based maps, I tried my hand at an SVG-based map that can be altered by selecting various criteria. The result is this map of Kazakhstan. Like many of the other maps I’ve created, this map shows ethnic groups rather than languages spoken. It’s the best we can do as a proxy for language. You can display a number of different ethnic groups and choose to display either percentage of total speakers in a district or total number of speakers relative to the district with the highest number of speakers. The shading of the map will change, as will the key if you opt for number of speakers. It’s been fun and educational to create and I may try my hand with other countries. China has been particularly difficult to sort out, so that may be next. Incidentally, if anyone has data for Mongolia, I would love to see it.
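For anyone curious how the two shading modes differ, here is a minimal Python sketch. It is not the code behind the map. The district names, counts, and function names are all made up for illustration, and it assumes that "percentage" means a group's share of each district's own population.

```python
# Hypothetical district records: names and numbers are invented for illustration.
districts = [
    {"name": "District A", "group_count": 1200, "population": 4000},
    {"name": "District B", "group_count": 300, "population": 600},
    {"name": "District C", "group_count": 0, "population": 2500},
]


def shade_by_percentage(rows):
    """Shade value = the group's share of each district's own population (0.0 to 1.0)."""
    return {
        r["name"]: (r["group_count"] / r["population"] if r["population"] else 0.0)
        for r in rows
    }


def shade_by_relative_count(rows):
    """Shade value = the group's count divided by the largest count in any district."""
    largest = max((r["group_count"] for r in rows), default=0)
    return {
        r["name"]: (r["group_count"] / largest if largest else 0.0)
        for r in rows
    }
```

The two modes can rank districts differently. In this made-up data, District B has the higher percentage while District A has the higher relative count. In the second mode the key has to change as well, since the scale depends on the largest district for the selected group.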

I’ve been aware of Romanian Tatar for some time, but haven’t seen much information on it. It’s basically a variety of Crimean Tatar. Some sources call it Kipchak, but I’ll withhold judgement on that classification until I can inspect the data. Omniglot has a great bunch of resources on it. I’ll likely add it to the database once I’ve got some better data.

Maps & Stuff

I’ve been working on a massive overhaul of the maps page. There were some issues with Leaflet.js, the mapping library that I’ve been using. Also, I’ve wanted to create a more granular, more accurate set of maps than I’d previously had.

What’s up now is a map showing census data from Russia, Ukraine, and Romania. The data gathering is slightly different in each country, so it shouldn’t be seen as comparing apples to apples. Russia is complete, Romania is complete, and Ukraine is nearly complete. The biggest surprise so far is the vast range covered by Gagauz speakers in Ukraine. Crimean Tatar is pretty much only in Crimea, while Karaim is very scattered. These are the only three Turkic languages covered in the 2001 Ukrainian census (at the village level), so that’s all I can show.

Once Ukraine is done, I have a lot of work to do:

  • First, I need to edit the code so users can select only a single country or certain languages.
  • Next, I plan to continue to search national census/statistics websites to see if this kind of granular information is available elsewhere.
  • I’d like to include other types of maps where this data is not available. The Georgian census, for example, had province-level data about Azerbaijani speakers that I’d like to include in some other format.
  • Finally, I’d like to restore some of my former maps: Baraba Tatar dialects, Khalaj and Urum villages, etc. Each map needs to have properly cited sources to ensure that anyone looking at this site isn’t comparing very different types of information.
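
The first item on that list, letting users pick a single country or certain languages, boils down to filtering the point records before they are drawn. Here is a rough Python sketch of the idea, not the site's actual code. The records, field names, and the select_points function are placeholders invented for illustration.

```python
from typing import Iterable, Optional

# Hypothetical point records; the field names and values are placeholders.
points = [
    {"country": "Russia", "language": "Tatar", "place": "Village 1"},
    {"country": "Ukraine", "language": "Gagauz", "place": "Village 2"},
    {"country": "Ukraine", "language": "Crimean Tatar", "place": "Village 3"},
    {"country": "Romania", "language": "Tatar", "place": "Village 4"},
]


def select_points(
    rows,
    country: Optional[str] = None,
    languages: Optional[Iterable[str]] = None,
):
    """Keep rows matching the chosen country and/or languages.

    A filter left as None is ignored, so calling with no filters returns everything.
    """
    wanted = set(languages) if languages is not None else None
    return [
        r
        for r in rows
        if (country is None or r["country"] == country)
        and (wanted is None or r["language"] in wanted)
    ]
```

For example, select_points(points, country="Ukraine") keeps only the two Ukrainian placeholder rows, and passing languages=["Tatar"] keeps rows from any country with that language.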

General updates

Some quick updates:

I’ve added some Salar and Ili Salar forms, bringing the total number of entries up to 25,400. However, I’ve been really busy lately and haven’t had much time to put into this website.

One of the things I’ve been working on is Wikipedia. I used to edit it quite a bit, but stopped until this past spring. I worked on a Wikipedia Edit-a-thon in May, and since then I’ve been really active. You can see my profile here.

I’ve also been busy writing a paper in library science, which will be published sometime in the next few months in the journal Library Resources & Technical Services. Exciting stuff!

I’ve also been kept busy proofreading titles published by Language Science Press and, as of yesterday, I’ve been asked to review a manuscript for Archiv Orientální.

I really need to update my CV. You can find that on the main website.

25000 entries!

Quite the milestone. The 25000th entry is the Turkmen word for “strawberry”: ýer tudanasy (yer tūdanasï). It’s a compound, with ýer meaning “ground”. The tudana (note the long vowel) part is somehow related to tut “mulberry” and refers to the fruit of the mulberry tree. It’s one of those words that was likely passed back and forth between languages.

I’ve added a few new references and have gone back to add long vowels into the Turkmen data. Turkmen preserves long vowels in places that other languages don’t, so it’s vital for reconstruction.

I’ve already got a list of 27 potential new glosses to add like “chalk” and “jaw” and “knot”, as well as a ton of verbs of motion like “get on”, “enter”, “exit”, “arrive”. Maybe once I’ve finished with the data from the last 50 glosses I’ll add these new ones.

24000 entries!

While going through all of my sources again, I realized that I hadn’t entered much for Dolgan. This has been pretty quick going, thanks to Stachowski’s Dolganischer Wortschatz. Today I reached 24000 entries. Entry 24000 is the Dolgan word for “glass” – hǟrkälä. This word is pretty interesting – it’s ultimately a borrowing from Russian зеркало, which means mirror. It means mirror, too, in Dolgan, but also means glass. I haven’t been able to find any other forms with that meaning for Dolgan. Russian зеркало was borrowed into Sakha as сиэркилэ, where it also means mirror. I’m not convinced the Dolgan form is descended from the Sakha form, as the vowels are pretty different, but there is likely some relationship. It is likely that early Russian traders traded manufactured goods like mirrors with the locals, who borrowed the word. Apply some vowel harmony and Dolgan’s strong dislike for /s/-sounds, and you get hǟrkälä. As glass was the only unknown component of these traded mirrors, the terms became conflated.

As a side note, I chose the term “glass” because I wanted to see if the early Turks had access to this technology. Most Turkic languages either use terms for manufactured products (such as bottles or mirrors) to mean “glass”, or borrow from other languages. This indicates that glass was unknown to them in ancient times. Also, glassmaking was developed only about 4000 years ago in Mesopotamia and only in the 5th century CE in China. So any glass objects that the oldest Turkic civilizations would have had would have to come from the Middle East or Europe, and would not have been made locally. This may tell us something about their metallurgical practices, as it is believed that glass was discovered as a byproduct of metallurgy, when hot metal came into contact with sand.


Naturally, after adding 50 new glosses to the database I’ve run across a new one that I’d like to add: chalk. Chuvash has пурӑ, пур, and Kazakh and Kumyk have бор.

Wiktionary suggests that the Kazakh form comes from Russian бор “boron”, but this is clearly conflating the boron meaning with the chalk meaning. Fedotov says this is a native Turkic term and ties it to Sakha буор “earth, clay”. (Tuvan has пор and Tòfa has бор for clay as well.) In Bashkir, the form is either бур or аҡбур, suggesting that the original term may have referred to crumbly stone or soil, with color terms used to distinguish between chalk, clay, etc.

I’ve entered forms for the latest 50 glosses for Turkish, Tuvan, Dzhungar Tuvan, Sakha, and Chuvash, and I’m working on Azerbaijani. Once I’ve made my rounds, I may add chalk to the database, plus whatever else I find.

As you can see above, there are a lot of cases where it could be useful to suggest related terms. Knowing that chalk is related to earth and clay could be beneficial. I may work on this in the near future as well.
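
One simple way to store such suggestions would be a symmetric lookup between glosses. The Python sketch below only illustrates the idea and is not how the database works. The link and suggestions functions are hypothetical, and the example links are the ones mentioned in these posts (chalk with earth and clay, glass with mirror).

```python
from collections import defaultdict

# Maps each gloss to the set of glosses it has been linked to.
related = defaultdict(set)


def link(a, b):
    """Record that two glosses are related, in both directions."""
    related[a].add(b)
    related[b].add(a)


def suggestions(gloss):
    """Return related glosses in alphabetical order (empty if none are recorded)."""
    return sorted(related.get(gloss, set()))


# Links drawn from the discussion in these posts.
link("chalk", "clay")
link("chalk", "earth")
link("clay", "earth")
link("glass", "mirror")
```

Because every link is stored in both directions, looking up “mirror” suggests “glass” just as looking up “glass” suggests “mirror”.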

Kinship terminology

Something that has been irking me is the inability (so far) to have kinship terminology in this site. The problem is that English, Russian, German, and French have kinship systems that are a bit more basic than those found in Turkic. Many of the languages I am aware of employ complex systems that distinguish maternal or paternal relationship, relative age, and gender. This means that many grammars and dictionaries will translate a term simply as brother (rather than older or younger brother) or aunt (rather than father’s sister or mother’s brother’s wife).

A second issue is that there is considerable variation both between languages/varieties and within languages. This makes comparison difficult. Also, many terms are borrowed from other languages, such as Turkish hala and teyze (paternal and maternal aunt, respectively), which were borrowed from Persian.

Perhaps I’ll work out a new scheme for more complicated lexemes and morphemes. Some day I’d like to have kinship terms, case morphology, verbal morphology and other forms; for now I’ll focus on more easily defined terms.

23000 surpassed, considering even more glosses

I’ve nearly completed adding Karakhanid data from Dīwān Luγāt at-Turk. This has brought me to 23000 entries. This also means that I’m nearly out of sources to consult until I can visit a bigger library (which is still difficult due to COVID).

I have considered adding 50 new glosses, which would bring my total up to 500. I’m considering new body parts/functions (palm, feces, pus, sole, hoof, vein), some plants and animals (cockroach, juniper), directional/positional terms (top, bottom, interior, side), and a few random conceptual and cultural terms (wedding, color, thief). I have 38 terms so far; the Dīwān index was very helpful in choosing these. Once I have decided, I’ll post them here. I’ll also do some background work to ensure that I’m not going back to the same sources and looking for terms that aren’t in there. It’s frustrating.

As a side note, I have really enjoyed reading the following article: Janhunen, Juha. “Issues of Comparative Uralic and Altaic Studies (3): The Turkic Plural in *-s.” Altai Hakpo, 2017. He breaks down a lot of issues relating to the unusual number of paired items ending in /z/, the z~r controversy, and typological issues related to paired/plural items.

So close to 23000…

I’ve finished adding forms from both Azovian and Georgian Urum, as well as Iraqi Turcoman. I’m so close to 23000 entries, but have hit a bit of a block in terms of finding more sources for data. I don’t have access to the massive library collections that I used to, so I’m unable to get more data for Salar and other languages.

However, I am considering adding about 20-25 new glosses, including palm, glass, moustache, shovel, feces, poison, and coal. This is no small matter, as it means that I have to go back to all of my previously consulted sources and get new data. It’s a lot to keep track of and I’d prefer to have a large chunk to work on rather than a handful of easily mislaid words. Before I do any of this I’ll be updating the missing page to ensure that I’m focusing only on newly added glosses, rather than old ones that I know I can’t get translations for. We shall see. I may just try to come up with 50 so I don’t feel like I’m hopping between sources every day.

Nobody but spam bots seems to ever read this, but if anyone has any leads on Kondoma or Upper Shor, I’d appreciate it. I’ve seen some scholars brush it under the rug as just an endangered dialect, but I think it’s a linchpin that holds together the classification of Turkic. Losing Lower Chulym was devastating, but I think that Kondoma Shor is similar enough that it could fill in the missing insights that I had hoped would come from Lower Chulym.