Hacker News: panpog

New comment by panpog in "UTF-8 is a brilliant design"

panpog — Sat, 13 Sep 2025 17:20:40 +0000

Why did Unicode want codepointwise round-tripping? One codepoint in a legacy encoding becoming two in Unicode doesn't seem like it should have been a problem. In other words, why include precomposed characters in Unicode?
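
For anyone unfamiliar with the distinction being asked about, here is a minimal sketch using Python's standard `unicodedata` module. It shows one precomposed character (é, U+00E9) decomposing into two codepoints under NFD and recomposing under NFC. The choice of é is only an illustration and is not taken from the thread.

```python
import unicodedata

# U+00E9 is the precomposed form of "e with acute accent".
precomposed = "\u00e9"

# NFD splits it into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
codepoints = [f"U+{ord(c):04X}" for c in decomposed]

# NFC puts the two codepoints back together into one.
recomposed = unicodedata.normalize("NFC", decomposed)

print(codepoints)
print(recomposed == precomposed)
```

So a legacy single-byte é can map to either one codepoint or two, and the two forms are canonically equivalent.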

New comment by panpog in "Corrected UTF-8 (2022)"

panpog — Mon, 07 Jul 2025 06:48:27 +0000

You still get the combinatoric explosion, but you have more bits to work with. Imagine if you could combine any 9 jamo into a single Hangul syllable block. (The real combinatorics is more complicated, and I don't know if it's this bad.) Encoding just the 24 jamo and a control character requires 25 codepoints. Giving each syllable block its own codepoint would require 24^9 > 2^32 codepoints.
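
The arithmetic in that hypothetical can be checked directly. This sketch only uses the numbers from the comment (24 jamo, 9 slots per block); the variable names are made up for illustration, and it says nothing about how real Hangul composition works.

```python
# Hypothetical from the comment: any 9 of 24 jamo form one syllable block.
JAMO = 24
SLOTS = 9

# One codepoint per block: every ordered choice of 9 jamo.
precomposed_blocks = JAMO ** SLOTS

# Size of a 32-bit code space.
code_space_32 = 2 ** 32

# Combining approach: the jamo plus one control character.
combining_codepoints = JAMO + 1

print(precomposed_blocks, code_space_32, combining_codepoints)
print(precomposed_blocks > code_space_32)
```

Under these assumptions the precomposed count is several hundred times larger than a 32-bit space, while the combining approach needs 25 codepoints.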

New comment by panpog in "Corrected UTF-8 (2022)"

panpog — Mon, 07 Jul 2025 02:11:41 +0000

Can you fit everything into 32 bits? I have no idea, but Hangul and Indic scripts seem like they might have a combinatoric explosion of infrequently used characters.

New comment by panpog in "Corrected UTF-8 (2022)"

panpog — Mon, 07 Jul 2025 01:48:21 +0000

Of course you sometimes need tailoring to a particular language. On the other hand, I don't see how encoding untailored casing would make tailored casing harder.

New comment by panpog in "Corrected UTF-8 (2022)"

panpog — Sun, 06 Jul 2025 21:33:05 +0000

It seems plausible that this could be made efficiently doable byte-wise. For example, C3 xx could be made to uppercase to C4 xx. Unicode actually does structure its codespace to make certain properties easier to compute, but those properties are mostly related to legacy encodings, and things are designed with UCS-2 or UTF-32 in mind, not UTF-8.
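
For comparison, here is a rough sketch of how casing looks byte-wise in UTF-8 as it exists today, not in the hypothetical scheme above. The helper `case_bytes` is a made-up name, and the three sample letters were picked only to show that the byte-level relationship between a lowercase letter and its uppercase form is not uniform.

```python
def case_bytes(ch):
    """Return the UTF-8 bytes of ch and of ch.upper() as hex strings."""
    lower_hex = ch.encode("utf-8").hex(" ")
    upper_hex = ch.upper().encode("utf-8").hex(" ")
    return lower_hex, upper_hex

# U+00E0 (a with grave): only the trailing byte changes, by 0x20.
print(case_bytes("\u00e0"))

# U+0101 (a with macron): only the trailing byte changes, by 0x01.
print(case_bytes("\u0101"))

# U+00FF (y with diaeresis): the lead byte changes as well.
print(case_bytes("\u00ff"))
```

Each of these is a simple rule within its own block, but the rule differs from block to block, which is the kind of thing a casing-aware byte layout would presumably try to avoid.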

It’s also not clear to me that the code point is a good abstraction in the design of UTF-8. Usually, what you want is either the byte or the grapheme cluster.

New comment by panpog in "BusyBeaver(6) Is Quite Large"

panpog — Sun, 29 Jun 2025 18:28:20 +0000

There is only one integer k that we can actually write down (given much more paper than could fit in the universe) such that ZFC+ “BB(748)=k” is consistent. However, given that same k, ZFC+ “BB(748)≠k” is also consistent. ZFC+ “BB(748)≠k” has theorems that can be thought of as it being wrong about what “finite” means.