I’m creating a corpus for public distribution, but I’m stuck on a crucial point in the project: the corpus needs a name!
Some corpora have better names than others. There are corpora where the name is simply the initials: the British National Corpus is just the BNC, pronounced (thankfully!) just as “bee en see.” But some corpora have really nice names, like COCA (Corpus of Contemporary American English), ICE (International Corpus of English), VOICE (Vienna-Oxford International Corpus of English), and COLT (the Bergen Corpus of London Teenage Language… which you might think would be BCLT, but COLT is sooo much better). The most clever corpus names, I think, are CHILDES (Child Language Data Exchange System; actually much more than a single corpus, but close enough) and SCOTS (Scottish Corpus of Texts and Speech), which both describe the thing they stand for (there must be a term for that kind of acronym…). There are also corpus names that are not acronyms, and which only describe what they are: Switchboard is a corpus of collected telephone calls, CallFriend is a corpus of calls specifically between friends, and the Buckeye Corpus is the speech of Buckeyes (speakers in Columbus, Ohio)!
My corpus, based on my PhD research, consists of sociolinguistic interviews (in the form of anonymized sound files, transcripts, and phonetic alignments) among San Franciscans who grew up in the western neighborhood known as the Sunset District. The interviews (and reading passages and wordlists) are all in English, which gives me the following letters to play with for an acronym for my corpus: S(an), F(rancisco), S(unset), D(istrict), E(nglish), I(nterviews), and C(orpus). But coming up with a cute acronym has proven difficult. The most obvious one is the Corpus of San Francisco English: COSFE. But “COSFE” sounds horrible! What does that last vowel even sound like, exactly? Adding “Interviews” to the end makes it “COSFEI,” which disambiguates the vowel sound, but it still doesn’t sound very nice at all. I tried playing around with “SD,” for Sunset District, rather than “SF,” but that didn’t get me any further. So, should I just continue to call it the boring thing I’ve affectionately been calling it from the beginning: the SanFran Corpus? Or how about the Sunset Corpus (which unfortunately makes it seem like it’s the corpus to end all corpora)?
I’ve already given this waaay too much thought, so I leave it to you….