Linguistic Miscellany Thread

Ryusenshi · Post by **Ryusenshi** » Sun Jul 24, 2022 5:47 pm

The same story also appears in Isaiah (chapters 36 to 39).

Post by **zompist** » Sun Jul 24, 2022 5:50 pm

Moose-tache wrote: ↑Sun Jul 24, 2022 5:31 pm 2 Kings.

Ah right, thanks. I should have checked!

Ares Land · Post by **Ares Land** » Mon Jul 25, 2022 2:53 am

I got the story wrong, btw -- I didn't remember it very well.

Hezekiah's emissaries wanted to conduct the negotiations in Aramaic, but the Assyrian envoy insisted on Hebrew -- he wanted to deliver demoralizing propaganda.

MacAnDàil wrote: ↑Sun Jul 24, 2022 10:24 am Well, that Assyrian envoy at least. This may or may not have a bearing on the Chinese government's use of English at the Olympics.

It seems the Assyrians really were assholes. The Persians had many faults, but they did make an effort to be decent rulers. By contrast the Assyrians were more destructive.

Post by **zompist** » Mon Jul 25, 2022 3:11 am

Ares Land wrote: ↑Mon Jul 25, 2022 2:53 am It seems the Assyrians really were assholes. The Persians had many faults, but they did make an effort to be decent rulers. By contrast the Assyrians were more destructive.

Oh, no question about that. They regularly deported whole populations (Israel was only one instance); the kings boasted about how they terrorized everybody; they had no pretense of being benign overlords. (And it wasn't that everyone was like that -- the Kassites, by contrast, were pretty chill.)

Moose-tache · Post by **Moose-tache** » Mon Jul 25, 2022 11:13 am

More like Ass-yrians.

Raphael · Post by **Raphael** » Wed Aug 03, 2022 1:00 pm

Why is there such a long and ignoble tradition of computer systems turning umlauts, letters with accents, and the like into completely different special characters seemingly at random?

Ryusenshi · Post by **Ryusenshi** » Wed Aug 03, 2022 1:30 pm

Is that a rhetorical question? If it's not, I can start a rant on character encodings.

Raphael · Post by **Raphael** » Wed Aug 03, 2022 1:56 pm

It's not a rhetorical question, and feel free to rant about character encodings as much as you like.

Travis B. · Post by **Travis B.** » Wed Aug 03, 2022 2:03 pm

Raphael wrote: ↑Wed Aug 03, 2022 1:00 pm Why is there such a long and ignoble tradition of computer systems turning umlauts, letters with accents, and the like into completely different special characters seemingly at random?

It's because encodings other than UTF-8 have not been completely phased out across the computing world, so programs still get confused about encodings every so often.

Ryusenshi · Post by **Ryusenshi** » Wed Aug 03, 2022 2:26 pm

The short version is: because most computer stuff has been developed in the U.S.A., and people from the U.S.A. tend to forget that there are letters beyond the basic Latin alphabet.

The long version... would take several pages, and references. Maybe some other day.

The middle version

Text always has to be encoded: the letters have to be turned into bits, sequences of zeros and ones. One of the oldest encodings still in active use is ASCII, which covers the basic Latin alphabet, the digits from 0 to 9, parentheses, brackets, and basic punctuation. This was enough for English. ASCII uses 7 bits (it was developed for teletypes, not computers), which allows 128 characters. As computers usually group bits into bytes of 8, this means one bit, the first one, was left unused.
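To see the 7-bit limit in practice, here is a minimal sketch using Python's built-in `ascii` codec. The helper name `fits_in_ascii` and the sample strings are made up for illustration: every ASCII byte has its top bit clear (value below 128), and a letter like "é" simply has no ASCII code.

```python
def fits_in_ascii(text: str) -> bool:
    """True if every character has a 7-bit ASCII code (0-127)."""
    try:
        data = text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character (e.g. "é") has no ASCII code at all.
        return False
    # Each byte's top (8th) bit is 0, i.e. its value is below 128.
    return all(b < 0x80 for b in data)

print(fits_in_ascii("Hello, world! (0-9) [ok]"))  # True
print(fits_in_ascii("café"))                      # False
print(2 ** 7)                                     # 128 possible codes
```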

When non-Anglophones started using computers, they noticed that they needed letters that weren't in ASCII: letters with accents or umlauts, or even whole non-Latin alphabets. Since they wanted to keep compatibility with ASCII, they invented various forms of Extended ASCII. The point is that:

- Bytes with the first bit equal to zero are the same as ASCII.
- The other bytes (128 values with the first bit equal to one) are other characters: letters with accents or umlauts, non-Latin letters, etc.

For example, Latin-1 added letters used in Western Europe like é, ö, ñ, while ISCII added letters for Indic scripts.
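A short sketch with Python's `latin-1` codec makes the split visible (ISCII is left out here; the sample letters are arbitrary): the plain "a" keeps its ASCII byte with a leading 0 bit, while the accented letters all land in the upper half, with a leading 1 bit.

```python
# Latin-1 (ISO 8859-1): bytes 0x00-0x7F are plain ASCII, and the extra
# Western European letters live in 0x80-0xFF, where the first bit is 1.
for ch in "aéöñ":
    byte = ch.encode("latin-1")[0]
    print(ch, hex(byte), format(byte, "08b"))
# a 0x61 01100001
# é 0xe9 11101001
# ö 0xf6 11110110
# ñ 0xf1 11110001
```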

The problem is that everybody went in a different direction. Windows-1252 is almost identical to Latin-1 (it only adds extra characters, such as € and curly quotes, in the 0x80-0x9F range, which Latin-1 leaves to control codes), but Mac OS Roman has about the same characters as Latin-1 in a different order. And there is no reliable way to say "this text uses encoding X" and be sure that the program that decodes your text knows it is in encoding X. So what happens is that someone writes a text with the letter "ö" and encodes it as Windows-1252... but you read it on a Mac, your Mac thinks the text is in Mac OS Roman, and displays "ˆ" (a lone circumflex) instead. Since "normal" letters are part of ASCII and everybody is compatible with ASCII, there is no problem with them: only letters with umlauts get garbled.
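The "ö" scenario can be reproduced with Python's built-in `cp1252` (Windows-1252) and `mac_roman` codecs. This is a small sketch; the word "Köln" is just a sample. It also shows that Windows-1252 and Latin-1 agree on byte 0xF6 and only part ways in the 0x80-0x9F range.

```python
# "ö" is byte 0xF6 in Windows-1252 (cp1252)...
data = "Köln".encode("cp1252")
print(data)                      # b'K\xf6ln'
# ...but byte 0xF6 means a spacing circumflex (U+02C6) in Mac OS Roman.
print(data.decode("mac_roman"))  # Kˆln

# Windows-1252 and Latin-1 agree on 0xA0-0xFF but not on 0x80-0x9F:
print(b"\x80".decode("cp1252"))         # €
print(repr(b"\x80".decode("latin-1")))  # '\x80' (a control code)
```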

Ryusenshi · Post by **Ryusenshi** » Wed Aug 03, 2022 2:42 pm

The solution is Unicode: the One Standard to rule them all. Unicode is a collection of codes for almost every character under the Sun, at least the ones that are used in a real language, or have been used in some computer system.
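Concretely, Unicode's "collection of codes" gives each character a number, its code point, written U+XXXX, independently of how that number is later stored as bytes. A quick sketch with Python's standard `unicodedata` module (the `describe` helper and the sample characters are only for illustration):

```python
import unicodedata

def describe(ch: str) -> str:
    """Show a character's code point in U+XXXX notation, plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"

for ch in "Aöж中":
    print(describe(ch))
# U+0041 LATIN CAPITAL LETTER A
# U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
# U+0436 CYRILLIC SMALL LETTER ZHE
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
```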

The most widely used encoding for Unicode is UTF-8, a variable-length encoding. Some characters use only one byte: they are... well, precisely the characters from ASCII. Backward compatibility is important. Other characters use two, three, or four bytes. With modern computers, the added space doesn't really matter.
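To make the "one to four bytes" rule concrete, here is a hand-rolled UTF-8 encoder for a single code point. It is written only for illustration (in real code you'd just call `str.encode("utf-8")`), and the loop checks it against Python's own codec. Code points below 0x80 come out as a single, unchanged ASCII byte.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustration only)."""
    # Surrogates and values past U+10FFFF are not encodable characters.
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:       # 7 bits -> 1 byte: 0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp < 0x800:      # 11 bits -> 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 16 bits -> 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits -> 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "Aö€\U0001D11E":  # A, o-umlaut, euro sign, musical G clef
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's codec
    print(f"U+{ord(ch):04X}", len(encoded), encoded.hex(" "))
# U+0041 1 41
# U+00F6 2 c3 b6
# U+20AC 3 e2 82 ac
# U+1D11E 4 f0 9d 84 9e
```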

UTF-8 is the future. There are some legitimate criticisms of Unicode, but nothing even comes close to it. UTF-8 allows you to encode everything: even if conlangs aren't supported by default, there are ways to add a conscript locally.
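The post doesn't say how a conscript gets added locally; one common route (an assumption here, not something stated above) is Unicode's Private Use Area, U+E000 to U+F8FF, whose code points will never be given standard characters. In this sketch the choice of U+E000 and the name `CONSCRIPT_A` are hypothetical; a font you install locally has to agree on what glyph it shows. UTF-8 handles such code points like any others.

```python
import unicodedata

# Hypothetical: treat U+E000 as the first letter of a private conscript.
CONSCRIPT_A = "\ue000"

# "Co" is Unicode's general category for private-use characters.
print(unicodedata.category(CONSCRIPT_A))  # Co

# It encodes to ordinary UTF-8 bytes and round-trips cleanly.
encoded = CONSCRIPT_A.encode("utf-8")
print(encoded.hex(" "))                    # ee 80 80
print(encoded.decode("utf-8") == CONSCRIPT_A)  # True
```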

The problem is... not everything is up to speed. Sometimes a program still encodes text in Latin-1; sometimes a program receives text in UTF-8 but thinks it's in Windows-1252. So you get text where the "normal" Latin letters are correct but the umlauts are garbled. And that's a best-case scenario: texts in non-Latin scripts would be completely garbled.
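The "UTF-8 read as Windows-1252" case is easy to reproduce, and it also shows why such text can sometimes be repaired. A minimal sketch (the name "Müller" is just a sample): the ASCII letters survive, "ü" turns into two junk characters, and running the steps backwards recovers the original. The repair only works if nothing was lost along the way; some bytes have no meaning in Windows-1252 and make Python's codec raise an error.

```python
# UTF-8 text misread as Windows-1252: ASCII letters survive, "ü" does not.
original = "Müller"
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)   # MÃ¼ller

# If no bytes were lost, reversing the two steps recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # Müller
```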

For some applications (file names, email addresses and titles), I tend to err on the side of caution and limit myself to strict-ASCII characters. You never know.
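If you want to follow the same "strict ASCII" habit automatically, one common approach is sketched below. The helper name `ascii_fold` is hypothetical, and this is a swapped-in technique rather than anything the post describes: Unicode NFKD normalization splits accented letters into a base letter plus a combining accent, and encoding to ASCII with `errors="ignore"` then drops the accents. It is lossy: anything without an ASCII base letter (like "ß") simply disappears.

```python
import unicodedata

def ascii_fold(name: str) -> str:
    """Best-effort ASCII version of a string, e.g. for a file name.

    NFKD splits "é" into "e" + a combining accent; encoding to ASCII with
    errors="ignore" then drops the accent (and anything else non-ASCII).
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(ascii_fold("Crème brûlée.txt"))  # Creme brulee.txt
print(ascii_fold("Straße"))            # Strae  (ß has no decomposition)
```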

Raphael · Post by **Raphael** » Wed Aug 03, 2022 3:17 pm

Thank you, very informative!

Edit: Is it the norm to write "USA" as "U.S.A." in French?

Travis B. · Post by **Travis B.** » Wed Aug 03, 2022 3:35 pm

I should note that back in the day when people used Teletypes, accented characters were produced by overprinting, i.e. a letter would be printed, followed by a backspace character, followed by an overprinted accent. This is the real reason ASCII includes characters such as backticks and carets: they were for printing grave and circumflex accents. Of course, this quickly became obsolete with the spread of video terminals.
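The overprinting trick can be mimicked in a few lines of Python. This is a minimal sketch: it reads "letter, backspace, accent" triples, swaps the accent for a Unicode combining mark, and lets NFC normalization compose the pair into a single accented letter. The mapping from ASCII punctuation to accents and the name `overstrike_to_unicode` are illustrative assumptions, not taken from any particular Teletype or standard.

```python
import unicodedata

# Illustrative assumption: which ASCII spacing character stood in for
# which accent when overprinted. Values are Unicode combining marks.
OVERSTRIKE_ACCENTS = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis (umlaut)
    ",": "\u0327",  # cedilla
}

def overstrike_to_unicode(text: str) -> str:
    """Replace each 'letter BACKSPACE accent' triple with one accented letter."""
    out = []
    i = 0
    while i < len(text):
        is_triple = (
            i + 2 < len(text)
            and text[i + 1] == "\b"
            and text[i + 2] in OVERSTRIKE_ACCENTS
        )
        if is_triple:
            # Base letter followed by a combining mark, e.g. "e" + U+0302.
            out.append(text[i] + OVERSTRIKE_ACCENTS[text[i + 2]])
            i += 3
        else:
            out.append(text[i])
            i += 1
    # NFC composes "e" + U+0302 into the single character "ê" where one exists.
    return unicodedata.normalize("NFC", "".join(out))

print(overstrike_to_unicode("e\b^te"))     # ête
print(overstrike_to_unicode("fac\b,ade"))  # façade
```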

MacAnDàil · Post by **MacAnDàil** » Wed Aug 03, 2022 4:10 pm

Raphael wrote: ↑Wed Aug 03, 2022 3:17 pm Thank you, very informative!

Edit: Is it the norm to write "USA" as "U.S.A." in French?

No, because it isn't a French word. The French word is États-Unis. It isn't a German word either, but allow me to rant about unnecessary Anglicisms in some modern German.

Ryusenshi · Post by **Ryusenshi** » Wed Aug 03, 2022 4:12 pm

Raphael wrote: ↑Wed Aug 03, 2022 3:17 pmEdit: Is it the norm to write "USA" as "U.S.A." in French?

Eh, not particularly. I can go either way.

Moose-tache · Post by **Moose-tache** » Tue Aug 09, 2022 2:24 am

I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.edu has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?

keenir · Post by **keenir** » Tue Aug 09, 2022 5:20 pm

Moose-tache wrote: ↑Tue Aug 09, 2022 2:24 am I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.edu has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?

I can't say I've heard of either of them, but I'll search through my library's copy of The World's Writing Systems next time I'm there.

Happy hunting!

Post by **zompist** » Tue Aug 09, 2022 7:28 pm

keenir wrote: ↑Tue Aug 09, 2022 5:20 pm I can't say I've heard of either of them, but I'll search through my library's copy of The World's Writing Systems next time I'm there.

I'll save you the trip: there's a single paragraph about pre-Hankul systems.

Ross King in WWS wrote: The Hyangchal system, preserved in lyric texts, is reminiscent in some ways of the Japanese man'yogana, on which it doubtless had a formative influence. The abbreviated characters of the Kwukyel system, a transcription for interpretation and translation of Chinese texts, resemble the Japanese kana in some way, just as the Kwukyel system of annotating Chinese texts resembles Japanese kambun traditions. The Itwu 'clerk readings' were a system of prose transcription used widely in administrative contexts. At the time of the promulgation of the Hwunmin cengum (1446), the Hyangchal system was moribund, but Kwukyel and Itwu were still in use long after the invention of the Korean alphabet.

No details or pictures. And it sure sounds like one of the uses of Kwukyel above was intended to refer to something else.

Moose-tache · Post by **Moose-tache** » Wed Aug 10, 2022 7:07 am

Hyangchal is basically the precursor to Manyogana. Idu is a slightly different set of phonograms, but operates on the same principles. Gugyeol is a notation system based on this Idu character set. The gugyeol set is very limited, since it's just a series of marks added to Chinese texts to allow them to be read with Korean grammatical information (kind of like an ancient Roman adding little case endings to French words so the sentence makes sense). But Hyangchal and Idu had hundreds of phonograms, and there doesn't appear to be a master list anywhere. Introductory texts in English and Korean from the last fifty years all fail to provide one. For writing systems that were in continual use for about 1500 years, this is pretty shocking.

Kuchigakatai · Post by **Kuchigakatai** » Wed Aug 10, 2022 10:46 am

Moose-tache wrote: ↑Tue Aug 09, 2022 2:24 am I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.edu has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?

I asked around about this, and someone from Korea gave me this link, which apparently contains... a lot of stuff about Joseon-period Idu (late Middle Korean ~ early modern Korean).

https://kostma.aks.ac.kr/dic/dicMain.aspx?mT=C

However, they also warned that:

this isnt exactly accurate
like theres spelling errors everywhere with middle korean words
but thats what i usually look up
when i need to translate idu
[...]
i mean its enough for me
like the spelling errors are like
confusing arae a and normal a
which is understandable for modern speakers because they dont have that distinction anymore

They also pointed out that content words were written semantically (with Chinese characters chosen for their meaning), and that it was rather the syllables of grammatical morphemes that were written phonetically with Chinese characters. You couldn't write Middle Korean purely phonetically with Chinese characters, the way the Japanese did with kana.

EDIT: someone else later added:

btw in hyangchal some hanja ARE used phonetically in spelling root words, but only usually to represent the final consonant or syllable
and i don't know of a full list of those, although wikipedia has something on their old korean article iirc
if you can find 향가 해독 자료집 [roughly "Hyangga Decipherment Sourcebook"] it would probably help

Zompist Bboard Again
