Linguistic Miscellany Thread

Natural languages and linguistics
Ryusenshi
Posts: 383
Joined: Sun Jul 08, 2018 1:57 pm
Location: Somewhere in France

Re: Linguistic Miscellany Thread

Post by Ryusenshi »

The same story also appears in Isaiah (chapters 36 to 39).
zompist
Site Admin
Posts: 2680
Joined: Sun Jul 08, 2018 5:46 am
Location: Right here, probably

Post by zompist »

Moose-tache wrote: Sun Jul 24, 2022 5:31 pm 2 Kings.
Ah right, thanks. I should have checked!
Ares Land
Posts: 2814
Joined: Sun Jul 08, 2018 12:35 pm

Post by Ares Land »

I got the story wrong, btw -- I didn't remember it very well.

Hezekiah's emissaries wanted to conduct the negotiations in Aramaic, but the Assyrian envoy insisted on Hebrew -- he wanted to deliver demoralizing propaganda.
MacAnDàil wrote: Sun Jul 24, 2022 10:24 am Well, that Assyrian envoy at least. This may or may not have a bearing on Chinese government use of English at the Olympics.
It seems the Assyrians really were assholes. The Persians had many faults, but they did make an effort to be decent rulers. By contrast the Assyrians were more destructive.
Last edited by Ares Land on Mon Jul 25, 2022 3:16 am, edited 1 time in total.
zompist
Site Admin
Posts: 2680
Joined: Sun Jul 08, 2018 5:46 am
Location: Right here, probably

Post by zompist »

Ares Land wrote: Mon Jul 25, 2022 2:53 am It seems the Assyrians really were assholes. The Persians had many faults, but they did make an effort to be decent rulers. By contrast the Assyrians were more destructive.
Oh, no question about that. They regularly deported whole populations (Israel was only one instance); the kings boasted about how they terrorized everybody; they had no pretense of being benign overlords. (And it wasn't that everyone was like that-- the Kassites, by contrast, were pretty chill.)
Moose-tache
Posts: 1746
Joined: Fri Aug 24, 2018 2:12 am

Post by Moose-tache »

More like Ass-yrians.
I did it. I made the world's worst book review blog.
Raphael
Posts: 4145
Joined: Sun Jul 22, 2018 6:36 am

Post by Raphael »

Why is there such a long and ignoble tradition of computer systems turning umlauts, letters with accents, and the like into completely different special characters seemingly at random?
Ryusenshi
Posts: 383
Joined: Sun Jul 08, 2018 1:57 pm
Location: Somewhere in France

Post by Ryusenshi »

Is that a rhetorical question? If it's not, I can start a rant on character encodings.
Raphael
Posts: 4145
Joined: Sun Jul 22, 2018 6:36 am

Post by Raphael »

It's not a rhetorical question, and feel free to rant about character encodings as much as you like.
Travis B.
Posts: 6237
Joined: Sun Jul 15, 2018 8:52 pm

Post by Travis B. »

Raphael wrote: Wed Aug 03, 2022 1:00 pm Why is there such a long and ignoble tradition of computer systems turning umlauts, letters with accents, and the like into completely different special characters seemingly at random?
It's because encodings other than UTF-8 have never been completely phased out across the computing world, so programs still get confused about encodings every so often.
Yaaludinuya siima d'at yiseka ha wohadetafa gaare.
Ennadinut'a gaare d'ate ha eetatadi siiman.
T'awraa t'awraa t'awraa t'awraa t'awraa t'awraa t'awraa.
Ryusenshi
Posts: 383
Joined: Sun Jul 08, 2018 1:57 pm
Location: Somewhere in France

Post by Ryusenshi »

The short version is: because most computer stuff has been developed in the U.S.A., and people from the U.S.A. tend to forget that there are letters beyond the basic Latin alphabet.

The long version
... would take several pages, and references. Maybe some other day.

The middle version

Text always has to be encoded: the letters have to be turned into bits, sequences of zeros and ones. The oldest encoding still in active use is ASCII, which covers the basic Latin alphabet, digits from 0 to 9, parentheses, brackets, and basic punctuation. This was enough for English. ASCII uses 7 bits (it was developed for teletypes, not computers), which means 128 possible codes. As computers usually handle bits in groups of 8, one bit, the first one, was left unused.
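
To make the "unused first bit" concrete, here is a minimal Python sketch. The sample string and variable names are just illustrations; the only thing it relies on is Python's built-in "ascii" codec.

```python
# Encode a plain English string as ASCII and look at the top bit of each byte.
text = "Hello, world!"
data = text.encode("ascii")

# Shifting a byte right by 7 leaves only its first (most significant) bit.
top_bits = [byte >> 7 for byte in data]
print(top_bits)  # all zeros: ASCII never uses the eighth bit

# The letter "A" as eight binary digits; the leading digit is the unused bit.
letter_a = format(ord("A"), "08b")
print(letter_a)
```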

When non-Anglophones started using computers, they found they needed characters that weren't in ASCII: accented letters, umlauts, or even non-Latin alphabets. Since they wanted to keep compatibility with ASCII, they invented various forms of Extended ASCII. The idea is that:
  • Bytes with the first bit equal to zero are the same as ASCII.
  • The other bytes (128 values with the first bit equal to one) are other characters: letters with accents or umlauts, non-Latin letters, etc.
For example, Latin-1 added letters used in Western Europe like é, ö, ñ, while ISCII added letters for Indic scripts.
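
A small sketch of the two rules above, using a few 8-bit codecs that ship with Python. The choice of encodings and of the byte 0xE9 is mine, purely for illustration; ISCII is not in Python's standard library, so ISO 8859-5 (Cyrillic) stands in as the non-Latin example.

```python
# A handful of "Extended ASCII" encodings available in Python's standard library.
ENCODINGS = ["latin-1", "cp1252", "mac_roman", "iso8859_5"]

ascii_byte = b"A"      # first bit is 0: plain ASCII
high_byte = b"\xe9"    # first bit is 1: meaning depends on the encoding

# Bytes below 128 decode to the same character everywhere.
ascii_readings = {enc: ascii_byte.decode(enc) for enc in ENCODINGS}

# Bytes from 128 upward are where the encodings disagree.
high_readings = {enc: high_byte.decode(enc) for enc in ENCODINGS}

for enc in ENCODINGS:
    print(enc, repr(ascii_readings[enc]), repr(high_readings[enc]))
```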

The problem is that everybody went in a different direction. Windows-1252 agrees with Latin-1 except in the range 0x80 to 0x9F, while Mac OS Roman has about the same characters but in a completely different order. And there is no reliable way to say "this text uses encoding X" and be sure that the program that decodes your text knows it is in encoding X. So what happens is that someone writes a text with the letter "ö" and encodes it as Windows-1252... but you read it on a Mac, your Mac thinks the text is in Mac OS Roman, and displays "ˆ" instead. Since "normal" letters are part of ASCII and everybody is compatible with ASCII, there is no problem with them: only the non-ASCII characters, such as letters with umlauts, get garbled.
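
The Windows-to-Mac mix-up can be reproduced in a few lines of Python. This is only a sketch: the word "schön" is an arbitrary example, and "cp1252" and "mac_roman" are Python's names for the two encodings.

```python
# Someone writes a word with an umlaut and saves it as Windows-1252.
original = "schön"
wire_bytes = original.encode("cp1252")   # "ö" becomes the single byte 0xF6

# The reader's program wrongly assumes Mac OS Roman when decoding.
misread = wire_bytes.decode("mac_roman")

print(wire_bytes)
print(misread)  # the ASCII letters survive, the umlaut turns into a circumflex-like mark
```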
Ryusenshi
Posts: 383
Joined: Sun Jul 08, 2018 1:57 pm
Location: Somewhere in France

Post by Ryusenshi »

The solution is Unicode: the One Standard to rule them all. Unicode is a collection of codes for almost every character under the Sun, at least the ones that are used in a real language, or have been used in some computer system.

The most widely used encoding for Unicode is UTF-8, which is a variable-length encoding. Some characters use only one byte: they are... well, precisely the characters from ASCII. Backward compatibility is important. Other characters use two, three, or four bytes. With modern computers, the added space doesn't really matter.
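
For the curious, a quick Python sketch of the variable length. The four sample characters are my own picks, one for each possible length.

```python
# One sample character for each UTF-8 length: ASCII, Latin accent, Hangul, emoji.
samples = ["A", "é", "한", "😀"]

# Number of bytes each character takes once encoded as UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, length in byte_lengths.items():
    print(ch, length)

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
same_as_ascii = "A".encode("utf-8") == "A".encode("ascii")
print(same_as_ascii)
```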

UTF-8 is the future. There are some legitimate criticisms of Unicode, but nothing even comes close to it. UTF-8 allows you to encode everything: even if conlangs aren't supported by default, there are ways to add a conscript locally.
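
One common way to add a conscript locally, assuming that remark refers to Unicode's Private Use Area (code points reserved for private agreements, starting at U+E000), is to assign your glyphs there in a custom font. A minimal sketch showing that such code points are ordinary characters as far as UTF-8 is concerned:

```python
import unicodedata

# First code point of the Private Use Area; what it looks like depends on your font.
first_pua = "\ue000"

# "Co" is the Unicode general category for private-use characters.
category = unicodedata.category(first_pua)

# It encodes in UTF-8 like any other character in that range (three bytes).
encoded = first_pua.encode("utf-8")
print(category, encoded)
```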

The problem is... not everything is up to speed. Sometimes a program still encodes text in Latin-1; sometimes a program receives text in UTF-8 but thinks it's in Windows-1252. So you get text where the "normal" Latin letters are correct but the umlauts are garbled. And that's a best-case scenario: texts in non-Latin scripts would be completely garbled.
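
Here is a minimal Python sketch of the second case, UTF-8 text misread as Windows-1252. The sample words are arbitrary, and the "repair" step only works when the garbled text hasn't been altered further, so treat it as an illustration rather than a general fix.

```python
original = "schön"

# UTF-8 stores "ö" as two bytes; a Windows-1252 reader shows each byte separately.
misread = original.encode("utf-8").decode("cp1252")
print(misread)

# If nothing else touched the text, reversing the two steps recovers it.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired)

# A non-Latin script has no ASCII letters to fall back on, so nothing survives.
# errors="replace" guards against bytes that Windows-1252 leaves undefined.
cyrillic = "Привет".encode("utf-8").decode("cp1252", errors="replace")
print(cyrillic)
```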

For some applications (file names, email addresses and titles), I tend to err on the side of caution and limit myself to strict-ASCII characters. You never know.
Raphael
Posts: 4145
Joined: Sun Jul 22, 2018 6:36 am

Post by Raphael »

Thank you, very informative!

Edit: Is it the norm to write "USA" as "U.S.A." in French?
Travis B.
Posts: 6237
Joined: Sun Jul 15, 2018 8:52 pm

Post by Travis B. »

I should note that back in the day when people used Teletypes, accented characters were generated by overprinting, i.e. a letter would be printed, followed by a backspace character, followed by an overprinted character. This is the real reason ASCII includes characters such as backticks and carets: they were for printing grave and circumflex accents. Of course this quickly became obsolete with the spread of video terminals.
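
As a rough illustration of what such an overprinting sequence looks like as data (the variable names are mine, and a modern terminal will not render this as an accented letter):

```python
BACKSPACE = "\b"  # ASCII code 8: move the print head back one position

# Letter, backspace, accent mark: the accent is struck over the letter on paper.
e_grave = "e" + BACKSPACE + "`"
o_circumflex = "o" + BACKSPACE + "^"

# The whole sequence is plain 7-bit ASCII.
e_grave_bytes = list(e_grave.encode("ascii"))
o_circumflex_bytes = list(o_circumflex.encode("ascii"))
print(e_grave_bytes, o_circumflex_bytes)
```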
MacAnDàil
Posts: 714
Joined: Thu Aug 09, 2018 4:10 pm

Post by MacAnDàil »

Raphael wrote: Wed Aug 03, 2022 3:17 pm Thank you, very informative!

Edit: Is it the norm to write "USA" as "U.S.A." in French?
No, because it isn't a French word. The French word is États-Unis. It isn't a German word either, but allow me to rant about unnecessary Anglicisms in some modern German.
Ryusenshi
Posts: 383
Joined: Sun Jul 08, 2018 1:57 pm
Location: Somewhere in France

Post by Ryusenshi »

Raphael wrote: Wed Aug 03, 2022 3:17 pm Edit: Is it the norm to write "USA" as "U.S.A." in French?
Eh, not particularly. I can go either way.
Moose-tache
Posts: 1746
Joined: Fri Aug 24, 2018 2:12 am

Post by Moose-tache »

I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.org has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?
keenir
Posts: 774
Joined: Fri Apr 05, 2019 6:14 pm

Post by keenir »

Moose-tache wrote: Tue Aug 09, 2022 2:24 am I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.org has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?
I can't say I've heard of either of them, but I'll search through my library's copy of The World's Writing Systems next time I'm there.

Happy hunting!
zompist
Site Admin
Posts: 2680
Joined: Sun Jul 08, 2018 5:46 am
Location: Right here, probably

Post by zompist »

keenir wrote: Tue Aug 09, 2022 5:20 pm I can't say I've heard of either of them, but I'll search through my library's copy of The World's Writing Systems next time I'm there.
I'll save you the trip: there's a single paragraph about pre-Hankul systems.
Ross King in WWS wrote: The Hyangchal system, preserved in lyric texts, is reminiscent in some ways of the Japanese man'yogana, on which it doubtless had a formative influence. The abbreviated characters of the Kwukyel system, a transcription for interpretation and translation of Chinese texts, resemble the Japanese kana in some way, just as the Kwukyel system of annotating Chinese texts resembles Japanese kambun traditions. The Itwu 'clerk readings' were a system of prose transcription used widely in administrative contexts. At the time of the promulgation of the Hwunmin cengum (1446), the Hyangchal system was moribund, but Kwukyel and Itwu were still in use long after the invention of the Korean alphabet.
No details or pictures. And it sure sounds like one of the uses of Kwukyel above was intended to refer to something else.
Moose-tache
Posts: 1746
Joined: Fri Aug 24, 2018 2:12 am

Post by Moose-tache »

Hyangchal is basically the precursor to Manyogana. Idu is a slightly different set of phonograms, but operates on the same principles. Gugyeol is a notation system based on this Idu character set. The gugyeol set is very limited, since it's just a series of marks made to Chinese texts to allow them to be read with Korean grammatical information (kind of like an ancient Roman adding little case endings to French words so the sentence makes sense). But Hyangchal and Idu had hundreds of phonograms, and there doesn't appear to be a master list anywhere. Introductory texts in English and Korean from the last fifty years all fail to do this. For writing systems that were in continual use for about 1500 years, this is pretty shocking.
Kuchigakatai
Posts: 1307
Joined: Mon Jul 09, 2018 4:19 pm

Post by Kuchigakatai »

Moose-tache wrote: Tue Aug 09, 2022 2:24 am I'm trying to learn about the pre-Hangeul writing systems of Korea, especially Idu and Hyangchal. But I'm struggling to find anything that lays out everything we know about these systems. Wikipedia links go to generic survey textbooks that say nothing. Academia.org has nothing. Google Books has nothing. Somebody somewhere must have written about these systems in detail, in any language, but I can't find it. Any ideas?
I asked this around and someone from Korea gave me this link, which apparently contains... a lot of stuff about Joseon-period Idu (late Middle Korean ~ early modern Korean).

https://kostma.aks.ac.kr/dic/dicMain.aspx?mT=C

However, they would also like to warn that:
this isnt exactly accurate
like theres spelling errors everywhere with middle korean words
but thats what i usually look up
when i need to translate idu
[...]
i mean its enough for me
like the spelling errors are like
confusing arae a and normal a
which is understandable for modern speakers because they dont have that distinction anymore
They'd also like to point out that content words were written semantically, and it's the syllables of grammatical morphemes that were rather written with Chinese characters phonetically. It's not like you could write Middle Korean using Chinese characters phonetically only, the way the Japanese did with kana.

EDIT: someone else later added:
btw in hyangchal some hanja ARE used phonetically in spelling root words, but only usually to represent the final consonant or syllable
and i don't know of a full list of those, although wikipedia has something on their old korean article iirc
if you can find 향가 해독 자료집 it would probably help