20000 Entries!

I’ve hit a major milestone with 20000 entries. Entry 20000 is the Yurt Tatar word for eye: күз (küz).

I’ve mostly been working on Khorasani Turkic. This is a very tedious task, as it involves concatenating data from various locations into a single list for each of the five varieties. Doerfer and Hesche use a very narrow transcription, so all of the differences between towns have been transcribed. I just completed the Northwest dialect, which has only two data points and was therefore the easiest.

I added Yurt Tatar data to the latest Khorasani dump to reach a nice even number. I was able to get my hands on a grammatical sketch by Arslanov (1976), which has some great data on this very poorly attested variety.

18000 Entries

I’ve added a bunch of new glosses to the database (e.g. spider, jump, learn, valley, beard). As a result, there are a lot of new entries. Entry number 18000 is һирәә, which is the Soyot word for saw (tool). More to come!

17000 Entries!

I’ve hit entry 17000 today. The most recent is Teleut for “how”: қайында, кааньда. I’m at a point where I have so many data sources that I can’t foresee an end point. I still haven’t done much with Khorasani Turkic, which will add another 1000 or so entries. I also have tons more Dolgan and Teleut to input, I haven’t done as much as I would like with Orkhon Turkic, etc., etc.

16000 Forms

I realized that I never entered all of the Mrass Shor data I have access to, which means that the count is growing rapidly. The 16000th form is Mrass Shor for cold – sooq.

I’ve been adding entries for the new glosses (honey, navel, dream, etc.), which has been slow but productive. Now that my gloss count is up to 350, I have a mind to just keep adding glosses that I find interesting. I have about 50 more that I’ve considered adding: hammer, jump, needle, axe, birch… I’ve tried to be systematic in entering data, because it’s annoying to have to revisit a lot of my data sources. I’ll think about it a bit more before deciding how to approach this.

I’ve also made a few adjustments to the website itself. The Missing Page is a bit more functional. I had originally set it up for my own personal use, but figured I may as well share it. Another update on my agenda is to combine the two pages that I use for the Compare feature. This is pretty trivial and I should have thought of it long ago.

New Languages!

I’ve been adding new material left and right. As noted previously, I recently added Ili Salar. I’ve also found some decent Gagauz materials, and have been adding entries as I come across them. My Afghan Uzbek-Turkish dictionary just arrived, so I’ve added Afghan Uzbek (Sar-e Pol). I’m still waiting for the work on the Samangan/Aybak dialect, which I believe is more Kipchak in nature.

I found the most incredible open access journal: Tehlikedeki Diller Dergisi. They occasionally have grammatical sketches, which are really fantastic to have. I’ve added a ton of Dolgan forms, and will be adding a new language, Kalmak. I’ve previously treated Kalmak as a variety of Tomsk Tatar, but the author has convinced me that it’s distinct enough to warrant its own page. I’m well over 15000 entries and will likely pass 15400 very soon.

While I’m talking about great sources of information, I would be remiss if I didn’t mention CyberLeninka, which is a source of open-access Russian journals. Also, the Russian State Library has begun digitizing a lot of its collections, which means that tons of dissertations and other materials are now freely available.

I went for a long time without finding much new material, and now it just won’t stop. Here’s to hitting 16000 soon.

Salar!

I’ve added a lot more Salar material. The source I took it from uses Pinyin, which is not at all ideal for writing a Turkic language. I’m beginning to doubt whether it was worth adding this new material at all…

I’ve created a completely new page for the Ili variety of Salar, which is different enough that Dwyer considers it a separate dialect. The data for this dialect is really good. I’ve surpassed 15,000 entries in the spreadsheet I use to collect my data, but I haven’t entered them into the actual database yet. I imagine it will all be up in a day or two.

Done with Urum, on to Afghan Uzbek

I’ve finally finished with Urum. What a pain in the neck. This brings the count of total entries up to 14608. I know it’s not a nice round number, but I couldn’t bear to leave off the last 8 Urum entries.

I have some Afghan Uzbek materials on the way. There’s not much written about Afghan Uzbek, but from what I can tell from past research, there is a ton of variation. At least one variety (hopefully the one in the next book I’m getting) is pretty clearly Kipchak.

I redid the main site using some PHP tricks to get everything into a single document. I would love to do this with the Turkic site, but I’m worried that it might get messy… Maybe if I just shove the entire contents of each page into a database or set them up as variables in separate PHP files… We will see.

Still here…

I’ve been a bit slower about adding new forms lately. I have reached 14,450 entries.

I’ve added 191 Cuman forms from the Codex Cumanicus. Cuman is challenging because it’s transcribed using medieval Italian conventions, which don’t do well with a lot of Turkic sounds.

I’ve also been working through Urum. This has been incredibly tedious, as its mixed nature means that every gloss has a ton of forms. Occasionally you run across a Greek word, which is exciting.

After I’m done with Urum, I’d like to work on some more medieval Turkic languages. It’s harder to find data on these, but I’m hopeful that I can get some Mamluk or Bolgar data.

14,000

Well, I’ve been on a roll. I’ve just added entry no. 14,000. This latest is the Krymchak word for ‘flower’ – čiček.

I’ve got a lot more Krymchak to add, too. Every time I think I’m running out of languages or sources I find more and more. Krymchak is an interesting case because the dictionary my library holds was shelved under PJ; the Library of Congress classification scheme arbitrarily classes ‘Other languages used by Jews’ at the end of the Hebrew range. Our Karaim materials, however, were classed under PL with the rest of the Turkic language material. Weird.

I’m still considering what to do with the glosses I proposed in the previous post – still no decision.

Also, I’d like to set up a page devoted to Crimean Turkic, maybe even incorporating a fancy interactive map. We’ll see…

A brief history, new glosses?

This project started as something I began many years ago. When I lived in Turkey, my host mother had a comparative dictionary of the Turkic languages, and I would marvel at how similar yet different they could be. Later, I began my own database of sorts on a series of notecards. I eventually tossed the notecards and moved on, in part because I hadn’t been consistent in my transcription and because they were cumbersome to transport and use.

More recently, I took a class on SQL, and once my instructor recommended I look into PHP, I realized I could publish data to the web. It was then that I began putting this all together – at first on my hard drive, and later on this website.

Thinking back to those original cards, I have come to remember that I once had a number of glosses that do not exist in this database. And looking through the many, many dictionaries, grammars, and field reports in my references, I have come to realize that other authors found some of these glosses to be important as well. Adding new glosses is no small task, as it can be annoying to have to revisit old sources. In some cases, I may have to wait weeks, since I obtained them through interlibrary loan.

For now, here is a preliminary list of the new glosses I have considered adding:

  • butterfly
  • fly
  • walnut
  • hammer
  • ax
  • bee
  • honey
  • wool
  • thread
  • footprint
  • penis
  • vulva
  • urine
  • feces

I was on the fence about the last four, given their taboo nature. However, they do show up in sources with surprising frequency. Even the Codex Cumanicus has them. If it’s good enough for Late Medieval Italians, it’s good enough for me.

Because I’m a bit obsessive (as the existence of this site shows), I might try to add a few more to achieve a nice, round number. However, adding these 14 will bring the total to 365, which is certainly nice.