At some point the version of SQL my database was using was updated, and that deprecated the syntax I was using for regular expressions. This messed up everything. I was able to find a relatively easy fix. This was all after I considered completely redoing the website. I still may do so, but for now I’ll focus on minor fixes.
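For anyone who hits a similar breakage, here is a minimal sketch of one kind of fix. It assumes (this post doesn't confirm it) that the database is MySQL and that the break came from the 8.0 switch to the ICU regex engine, which dropped the old `[[:<:]]` and `[[:>:]]` word-boundary markers in favour of `\b`. The function name `modernize_pattern` is made up for illustration.

```python
import re

# Assumption: the stored patterns use MySQL's pre-8.0 word-boundary markers.
# Both the "start of word" and "end of word" markers map to \b under ICU.
OLD_TO_NEW = {
    "[[:<:]]": r"\b",
    "[[:>:]]": r"\b",
}


def modernize_pattern(pattern: str) -> str:
    """Rewrite legacy word-boundary markers into ICU-style \\b."""
    for old, new in OLD_TO_NEW.items():
        pattern = pattern.replace(old, new)
    return pattern


# Hypothetical usage: a whole-word search for "tut".
legacy = "[[:<:]]tut[[:>:]]"
updated = modernize_pattern(legacy)

# Python's own regex engine also understands \b, so it can sanity-check the result.
assert re.search(updated, "ak tut") is not None
assert re.search(updated, "tutmak") is None
```

If the pattern is written as a string literal inside a MySQL query, the backslash has to be doubled (`'\\btut\\b'`), since MySQL treats a single backslash as an escape character in strings.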
I’ve gotten a complaint that I haven’t been adding much new data. Please keep in mind that I don’t get paid for this. Also, people have been sending me dubious posts on random websites as data sources. Not useful.
I’ve been otherwise very busy with life in general and new endeavors, so please be patient. One potential project is a re-do of the map of the Turkic world I first produced in 2021. There are issues (as there are with all linguistics maps), and I’d like to fix those.
My host recently moved servers, and, as a result, some of the code within the database has become wonky. The data is still all present in the background, but it’s currently inaccessible to the public. I hope to figure out the issue and correct it soon.
I’ve had a few ideas for t-shirts, so I’ve created a Redbubble page. There’s more to come, but for now, check out my profile here:
So far I’ve got one for Adelaide Hasse, one from the Irk Bitig, and a random cracker design that I had sitting around.
More to come soon!
In addition to working on point-based maps, I tried my hand at an SVG-based map that can be altered by selecting various criteria. The result is this map of Kazakhstan. Like many of the other maps I’ve created, this map shows ethnic groups rather than languages spoken. It’s the best we can do as a proxy for language. You can display a number of different ethnic groups and choose to display either the percentage of total speakers in a district or the total number of speakers relative to the district with the highest number of speakers. The shading of the map will change, as will the key if you opt for number of speakers. It’s been fun and educational to create, and I may try my hand with other countries. China has been particularly difficult to sort out, so that may be next. Incidentally, if anyone has data for Mongolia, I would love to see it.
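The two shading modes can be sketched roughly as follows. This is not the code behind the map, just an illustration of the arithmetic. It assumes "percentage" means the group's share of each district's total population, and that each district ends up with a shade intensity between 0 and 1. The function names and the figures for districts "A" and "B" are hypothetical.

```python
def shade_by_share(group_counts, district_totals):
    """Shade each district by the group's share of that district's population."""
    shades = {}
    for district, total in district_totals.items():
        count = group_counts.get(district, 0)
        # Guard against districts with no recorded population.
        shades[district] = count / total if total else 0.0
    return shades


def shade_by_relative_count(group_counts):
    """Shade each district relative to the district with the most members."""
    peak = max(group_counts.values(), default=0)
    return {
        district: (count / peak if peak else 0.0)
        for district, count in group_counts.items()
    }


# Hypothetical figures, for illustration only.
counts = {"A": 50, "B": 200}
totals = {"A": 100, "B": 1000}

by_share = shade_by_share(counts, totals)      # A is darker: half its population
by_count = shade_by_relative_count(counts)     # B is darker: the largest raw count
```

The same data can produce quite different pictures: a small district where the group is a majority stands out in the first mode, while a large district with many members stands out in the second. That is also why the key only needs to change in the second mode, where the scale depends on whichever district has the highest count.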
I’ve been aware of Romanian Tatar for some time, but haven’t seen much information on it. It’s basically a variety of Crimean Tatar. Some sources call it Kipchak, but I’ll withhold judgement on that classification until I can inspect the data. Omniglot has a great bunch of resources on it. I’ll likely add it to the database once I’ve got some better data.
I’ve been working on a massive overhaul of the maps page. There were some issues with Leaflet.js, the mapping library that I’ve been using. Also, I’ve wanted to create a more granular, more accurate set of maps than I’d previously had.
What’s up now is a map showing census data from Russia, Ukraine, and Romania. The data gathering is slightly different in each country, so it shouldn’t be seen as comparing apples to apples. Russia is complete, Romania is complete, and Ukraine is nearly complete. The biggest surprise so far is the vast range covered by Gagauz speakers in Ukraine. Crimean Tatar is pretty much only in Crimea, while Karaim is very scattered. These are the only three Turkic languages covered in the 2001 Ukrainian census (at the village level), so that’s all I can show.
Once Ukraine is done, I have a lot of work to do:
- First, I need to edit the code so users can select only a single country or certain languages.
- Next, I plan to continue to search national census/statistics websites to see if this kind of granular information is available elsewhere.
- I’d like to include other types of maps where this data is not available. The Georgian census, for example, had province-level data about Azerbaijani speakers that I’d like to include in some other format.
- Finally, I’d like to restore some of my former maps: Baraba Tatar dialects, Khalaj and Urum villages, etc. Each map needs to have properly cited sources to ensure that anyone looking at this site isn’t comparing very different types of information.
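The first item on that list, letting users narrow the map to one country or a few languages, might look something like the sketch below. It is only an outline of the filtering logic and not the site's code. The `Settlement` record, the `filter_points` function, and the placeholder village names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settlement:
    """One point on the map (hypothetical record layout)."""
    name: str
    country: str
    language: str


def filter_points(points, country=None, languages=None):
    """Keep points matching the chosen country and/or set of languages.

    Passing None for either argument means "no restriction" on that field.
    """
    wanted = set(languages) if languages else None
    return [
        p for p in points
        if (country is None or p.country == country)
        and (wanted is None or p.language in wanted)
    ]


# Placeholder data, for illustration only.
points = [
    Settlement("Village 1", "Ukraine", "Gagauz"),
    Settlement("Village 2", "Ukraine", "Karaim"),
    Settlement("Village 3", "Romania", "Crimean Tatar"),
]

ukraine_only = filter_points(points, country="Ukraine")
gagauz_in_ukraine = filter_points(points, country="Ukraine", languages=["Gagauz"])
```

Treating "nothing selected" as "show everything" keeps the default view the same as the current map.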
Some quick updates:
I’ve added some Salar and Ili Salar forms, bringing the total number of entries up to 25,400. However, I’ve been really busy lately and haven’t had much time to put into this website.
One of the things I’ve been working on is Wikipedia. I used to edit it quite a bit, but stopped until this past spring. I worked on a Wikipedia Edit-a-thon in May, and since then I’ve been really active. You can see my profile here.
I’ve also been busy writing a paper in library science, which will be published sometime in the next few months in the journal Library Resources & Technical Services. Exciting stuff!
I’ve also been kept busy proofreading titles published by Language Science Press, and, as of yesterday, I’ve been asked to review a manuscript for Archiv Orientální.
I really need to update my CV. You can find that on the main website.
Quite the milestone. The 25000th entry is the Turkmen word for “strawberry”: ýer tudanasy (yer tūdanasï). It’s a compound, with ýer meaning “ground”. The tudana (note the long vowel) part is somehow related to tut “mulberry” and refers to the fruit of the mulberry tree. It’s one of those words that was likely passed back and forth between languages.
I’ve added a few new references and have gone back to add long vowels into the Turkmen data. Turkmen preserves long vowels in places that other languages don’t, so it’s vital for reconstruction.
I’ve already got a list of 27 potential new glosses to add like “chalk” and “jaw” and “knot”, as well as a ton of verbs of motion like “get on”, “enter”, “exit”, “arrive”. Maybe once I’ve finished with the data from the last 50 glosses I’ll add these new ones.
While going through all of my sources again, I realized that I hadn’t entered much for Dolgan. This has been pretty quick going, thanks to Stachowski’s Dolganischer Wortschatz. Today I reached 24000 entries. Entry 24000 is the Dolgan word for “glass” – hǟrkälä. This word is pretty interesting – it’s ultimately a borrowing from Russian зеркало, which means mirror. It means mirror, too, in Dolgan, but also means glass. I haven’t been able to find any other forms with that meaning for Dolgan. Russian зеркало was borrowed into Sakha as сиэркилэ, where it also means mirror. I’m not convinced the Dolgan form is descended from the Sakha form, as the vowels are pretty different, but there is likely some relationship. It is likely that early Russian traders traded manufactured goods like mirrors with the locals, who borrowed the word. Apply some vowel harmony and Dolgan’s strong dislike for /s/-sounds, and you get hǟrkälä. As glass was the only unknown component of these traded mirrors, the terms became conflated.
As a side note, I chose the term “glass” because I wanted to see if the early Turks had access to this technology. Most Turkic languages either use terms for manufactured products (such as bottles or mirrors) to mean “glass”, or borrow from other languages. This indicates that glass was unknown to them in ancient times. Also, glassmaking was developed only about 4000 years ago in Mesopotamia and only in the 5th century CE in China. So any glass objects that the oldest Turkic civilizations had would have come from the Middle East or Europe, and would not have been made locally. This may tell us something about their metallurgical practices, as it is believed that glass was discovered as a byproduct of metallurgy, when hot metal came into contact with sand.
Naturally, after adding 50 new glosses to the database I’ve run across a new one that I’d like to add: chalk. Chuvash has пурӑ, пур, Kazakh and Kumyk have бор…
Wiktionary suggests that the Kazakh form comes from Russian бор “boron”, but this clearly conflates the boron meaning with the chalk meaning. Fedotov says this is a native Turkic term and ties it to Sakha буор “earth, clay”. (Tuvan has пор and Tòfa has бор for clay as well.) In Bashkir, the form is either бур or аҡбур, suggesting that the original term may have referred to crumbly stone or soil, with color terms used to distinguish between chalk, clay, etc.
I’ve entered forms for the latest 50 glosses for Turkish, Tuvan, Dzhungar Tuvan, Sakha, and Chuvash, and I’m working on Azerbaijani. Once I’ve made my rounds, I may add chalk to the database, plus whatever else I find.
As you can see above, there are a lot of cases where it could be useful to suggest related terms. Knowing that chalk is related to earth and clay could be beneficial. I may work on this in the near future as well.
Something that has been irking me is the inability (so far) to have kinship terminology on this site. The problem is that English, Russian, German, and French have kinship systems that are a bit more basic than those found in Turkic. Many of the languages I am aware of employ complex systems that distinguish maternal or paternal relationship, relative age, and gender. This means that many grammars and dictionaries will translate a term simply as brother (rather than older or younger brother) or aunt (rather than father’s sister or mother’s brother’s wife).
A second issue is that there is considerable variation both between languages/varieties and within languages. This makes comparison difficult. Also, many terms are borrowed from other languages, such as Turkish hala and teyze (paternal and maternal aunt, respectively), which were borrowed from Persian.
Perhaps I’ll work out a new scheme for more complicated lexemes and morphemes. Some day I’d like to have kinship terms, case morphology, verbal morphology and other forms; for now I’ll focus on more easily defined terms.