I’ve hit 11000 forms in the database. This one’s not too exciting – it’s Turkish for “egg” – yumurta. Whenever I can’t hit an even number but want to insert a batch of forms, I’ll fill in the gaps with one of the easier languages. In this case it’s Turkish. Because Turkish is the easiest language to get data on, I’ve saved it for last.
In other news, I’ve come across discussion of a language I’d never even heard of: Chanto. Chanto is a Turki variety spoken in Western Mongolia. The speakers identify as Uyghurs, although many sources call them Uzbeks. Their language is distinct enough to have its own entry, so I’ve set that up here. The same sources give some description of Altay Tuvan, and their data doesn’t neatly align with what I’ve already found. I’m not convinced that this new data is very good, but it’s all I’ve got to work with.
I’ve added the new site template to every page. It should make it easier to navigate. The old design required clicking around to find certain potentially useful pages. They are now available via the expanding menu.
I’ve finalized the new site template. Now I just need to update every page with the new template (it’s really just a new side menu). The references page already has it, so you can see what everything will look like.
Because of the reorganization, I’m going to have to create a few new pages and fix a few others. I plan on splitting the classification page from the tree contained within it, and the about page needs some serious cleanup.
I wish there was an easier way to work with site templates, but I can’t find one…
This language (group of languages? cluster of relatively unrelated dialects?) is causing me headaches. Doerfer and Hesche (1993) collect word lists from 20-something locations in Khorasan and group the varieties spoken in these locations into 5 dialects (NW, NE, N, SE, SW). This seems like a straightforward matter of gathering the data from all of these points, then dividing it into 5 different varieties – tedious, but doable.
The problems arise when you start to find the other ways that Doerfer classifies Khorasani. Are there actually only two dialects (north and south)? Are there three, based on which branch of Oghuz Doerfer assigns the varieties to: Central Oghuz (Southwest), Southern Oghuz (Northwest, Southwest), and East Oghuz (North, Northeast)? What about the places where he claims that varieties in Turkmenistan and even Uzbekistan can be considered varieties of Khorasani? This is a historically reasonable proposition, but it ignores the fact that Turkmen and Oghuz Uzbek have had a good century of independent development thanks to Soviet language policy.
Doerfer’s works are based on the speech varieties found in different towns. He (justifiably) makes no attempt to unify his data into an aggregate average that can be treated as a unique language or dialect. This means that there is a ton of data that I could potentially use, but the question is how do I use it?
I’ve hit 10000 entries in the lexical database. It would have been nice to have reached this at a nice stopping point, but I’m in the middle of entering Soyot terms. The 10000th entry is the Soyot word for “ant”: һымысқа. I estimate that I’ll be able to add about 5000 more entries, but more data may surface, so who knows.
Having reached this point I think it’s about time to reorganize the site. I have a better idea of what I want to do and need to make it more accessible.
For now, enjoy this map I’ve begun to mock up. Ideally, it will have overlays corresponding to different features. Once I’ve done a little cleanup this will need a home on the front page. Now on to finishing entering all the available data, figuring out what to do with Khorasani Turkic, working on a grammar template…
I’m continuing to re-tool the side menu. I’ve got some nifty drop-downs I would like to implement.
I’ve reached 9000 entries. Lately I’ve been working on some unfinished business with the Northern Altay varieties and have started work on the Oghuz varieties in Iran. The 9000th entry, however, is from Ös. It’s the word for “to open”: ač-.
I’ve added a count of entries to the list of languages. I’m getting better with PHP – to do this I had to embed an if statement inside of a foreach statement and it actually worked! I’m still having trouble getting variables to bind, which is frustrating. Sometimes they do, sometimes they don’t, sometimes I have to refer to the row cell, sometimes I have to use a variable. I’m sure there’s a good reason for all of this…
Update… I’ve done a similar thing to the list of glosses. This was more straightforward because there’s nothing in the glosses table with 0 entries. (I’ve removed “at” from the list, even though it’s in the Swadesh list. We can assume it’s cognate across most languages.)
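For anyone curious what the if-inside-a-loop looks like, here is a minimal sketch in Python rather than the site's actual PHP. The table name `entries`, its columns, and the helper `language_counts` are all made up for illustration, and the zero count for Chanto is a hypothetical example, not a real figure. The point is only that a grouped count returns no row at all for a language with nothing entered, so the loop needs a check to fall back to 0.

```python
import sqlite3

# Hypothetical schema; the real database layout is not shown on this site.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (language TEXT, gloss TEXT, form TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",  # values are bound to the ? placeholders
    [
        ("Altay", "which", "кажы"),
        ("Ös", "to open", "ač-"),
        ("Soyot", "ant", "һымысқа"),
        ("Turkish", "egg", "yumurta"),
    ],
)

# One grouped query; languages with no entries simply do not appear in the result.
counts = dict(
    conn.execute("SELECT language, COUNT(*) FROM entries GROUP BY language")
)


def language_counts(languages, counts):
    """Pair every language with its entry count, using 0 when no row came back."""
    rows = []
    for name in languages:      # the "foreach"
        if name in counts:      # the embedded "if"
            rows.append((name, counts[name]))
        else:
            rows.append((name, 0))
    return rows


# "Chanto" stands in for a language that has a page but no forms yet (assumption).
for name, n in language_counts(["Altay", "Chanto", "Ös", "Soyot", "Turkish"], counts):
    print(f"{name} ({n})")
```

The glosses list would not need the fallback branch, since every gloss has at least one entry.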
I hope this is useful. It will be great to be able to more easily do lexicostatistic glottochronology.
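As a rough illustration of the kind of calculation this would feed, here is a small Python sketch. It assumes each form has been tagged with a cognate-class label, which the database does not necessarily have; the function names and the toy labels are hypothetical. The dating step uses the classic glottochronological formula t = ln(c) / (2 ln(r)), with r = 0.86 as the retention rate per millennium conventionally quoted for the 100-item list. Treat the result as a back-of-the-envelope figure at best.

```python
import math


def shared_cognate_fraction(a, b):
    """Fraction of glosses attested in both lists whose cognate labels match."""
    common = [gloss for gloss in a if gloss in b]
    if not common:
        raise ValueError("the two lists have no glosses in common")
    same = sum(1 for gloss in common if a[gloss] == b[gloss])
    return same / len(common)


def separation_millennia(c, r=0.86):
    """Classic estimate t = ln(c) / (2 ln(r)), in millennia, for 0 < c <= 1."""
    if not 0 < c <= 1:
        raise ValueError("c must be in the interval (0, 1]")
    return math.log(c) / (2 * math.log(r))


# Toy data: gloss -> cognate-class label (labels invented for illustration).
variety_a = {"egg": 1, "ant": 2, "crow": 3, "shoulder": 4}
variety_b = {"egg": 1, "ant": 2, "crow": 5}

c = shared_cognate_fraction(variety_a, variety_b)
print(f"shared: {c:.2f}, estimated separation: {separation_millennia(c):.2f} millennia")
```

Only glosses attested in both varieties are compared, so gaps like the missing Altay entries would shrink the sample rather than count as mismatches.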
We’ve hit 8000 entries. The latest is Altay for ‘which’ – кажы / kažï. There are still 196 glosses without Altay entries, so there’s a lot of work to do still.
I’ve added two new glosses – shoulder and crow. I dislike adding new glosses to the database because it means I have to go back over all of my sources. In this case, I’d thought a long time about adding these two, and I’ve got my source data in some kind of order.
I’ve also set up this blog to track changes. The about page was getting cluttered, and setting this up is the first step in re-doing the entire site architecture. We’re approaching 8000 entries, which is exciting. I’ve located a PDF Russian-Altay dictionary that I can actually cut and paste from (!!!), so entry 8000 will likely be Altay. At some point I’ll actually finish adding Turkish, which really should be the easiest of all languages, since it’s the best documented.