Hacker News: ssokolow

New comment by ssokolow in "Memory safety absolutists"

ssokolow — Sun, 26 Jul 2026 08:11:31 +0000

https://play.rust-lang.org/?version=stable&mode=debug&editio...

New comment by ssokolow in "Memory safety absolutists"

ssokolow — Sun, 26 Jul 2026 08:03:46 +0000

The word "correct" already had a meaning. Memory safety is a subset of correctness in the same way that data races are a subset of race conditions: Rust can't statically prevent race conditions in general, but it can prevent data races.

New comment by ssokolow in "Memory safety absolutists"

ssokolow — Sun, 26 Jul 2026 02:04:43 +0000

Bearing in mind that "there should be no room for a language above assembly and below Rust" (i.e. something like what C or C++ is to Python, Java, C#, etc.) was an intentional design decision for Rust which resulted in `unsafe`.

New comment by ssokolow in "Memory safety absolutists"

ssokolow — Sun, 26 Jul 2026 02:02:10 +0000

True... but if you go by what you're most likely to encounter, I'd argue the biggest unaddressed flaw in Rust is the lack of a current, maintained analogue to tools like Rustig! and findpanics, which would analyze your compiled, optimized binary and report any code paths which lead to panics.

I don't like having to contort my code to fit each unit of work into the API of `std::panic::catch_unwind` so I can responsibly distrust the transitive dependencies beyond the reach of my Clippy lints.

New comment by ssokolow in "Memory safety absolutists"

ssokolow — Sun, 26 Jul 2026 01:57:03 +0000

*nod* They're from before I started bookmarking rigorously, but in the early days of Rust, there was a sentiment in the blog posts from the people developing it that the point of Rust was "to give good ideas a second chance" and that Rust was intentionally boring and un-innovative.

New comment by ssokolow in "U+237C ⍼ Is Azimuth"

ssokolow — Fri, 13 Mar 2026 08:07:20 +0000

*nod* As-is, we're stuck with hacks like custom shortcodes and emoji.

...though, given the inconsistent naming of consistently laid-out buttons, I think anything that makes its way into Unicode should include something that follows the lead of what Batocera Linux does on their Wiki and with custom emojis in their Discord.

See https://wiki.batocera.org/configure_a_controller for an example of how they look inline but the gist is that it's an outline of the SNES-originated diamond of action buttons that pretty much everyone but Nintendo uses these days and which is embodied in XInput and the SDL Gamepad API, with one of the circles filled in to represent the button in question.

New comment by ssokolow in "U+237C ⍼ Is Azimuth"

ssokolow — Thu, 12 Mar 2026 06:43:36 +0000

By "game tutorials", I think they mean modern successors to the role GameFAQs used to play.

There is a combining character that, by its description, sounds like it should be implemented to do the desired thing (U+20DD Combining Enclosing Circle), but my fonts don't render it very well when I stuff geometric characters matching the PlayStation buttons into it.

Without spaces: △⃝□⃝×⃝○⃝

With two spaces between each one so you can see how "enclosing" is getting interpreted: △⃝ □⃝ ×⃝ ○⃝
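For anyone who wants to reproduce those sequences, here's a minimal sketch of how they're built: each base character is followed directly by U+20DD. The four base code points are my assumption about which geometric characters were used above, and the `enclose` helper is just a name made up for illustration.

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE

# Assumed stand-ins for the PlayStation face buttons:
# triangle, square, multiplication sign, circle.
SHAPES = ["\u25B3", "\u25A1", "\u00D7", "\u25CB"]


def enclose(ch: str) -> str:
    """Append U+20DD so a renderer is asked to draw a circle around `ch`."""
    return ch + ENCLOSING_CIRCLE


buttons = [enclose(s) for s in SHAPES]

for b in buttons:
    # Show what each two-code-point sequence is actually made of.
    print(b, [unicodedata.name(c) for c in b])

print("".join(buttons))    # without spaces
print("  ".join(buttons))  # with two spaces between each one
```

Whether the circle actually ends up around the base character is entirely up to the font and the text shaper, which is the rendering problem described above.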

For the Markdown renderer I'm working on to replace WordPress for my blog, I resorted to shortcodes which resolve to CSS styling the `` tag with `title` attributes to clarify and the occasional bit of inline SVG for things where I didn't want to specify a fixed font to get sufficient consistency, like PlayStation button glyphs.

https://imgur.com/a/1EPm7QV

(In all fairness, it's a nerd-snipe made based on the idea that I'll be more willing to blog about things I have nice tools for. I don't currently typeset button presses in any form.)

New comment by ssokolow in "Why Amazon etc. are building servers in Rust but you should probably not"

ssokolow — Wed, 04 Sep 2024 09:49:00 +0000

I know Sylvain Kerkour is a perennial "Rust should be more like Go. I don't care that they're trying to meet different needs" person and has been for many years now, but I do wish we could at least get a little acknowledgement that Rust's design took a great deal of influence from Python, both on what worked and what didn't, and that this was a direct response to how, as Amber Brown put it, Python has batteries included, but they're leaking.

Python is the most infamous example of how putting something in the standard library doesn't automatically mean everyone will use it.

For example, as of the end of the Python 2.x cycle, Python had urllib and urllib2 in the standard library and everyone said to ignore them and use Requests... which contains a urllib3, the maintainers of which refuse to ever add it to the standard library.

Python had/has a bunch of "use Twisted instead" network protocol implementations. Python's standard library XML implementations carry a big warning to use the third-party `defusedxml` package if you are processing untrusted data. etc. etc. etc.

I have next to no Java experience, but I vaguely remember it also having some similar cases of common wisdom being to ignore the standard library-provided solution.

New comment by ssokolow in "Porting Libyaml to Safe Rust: Some Thoughts"

ssokolow — Wed, 14 Feb 2024 04:20:39 +0000

That's fair.

I still think the "because it's not in the fast path" part of "Most software will not see a bottleneck because of bounds checking because it's not in the fast path" is a bit too much of a blanket statement and could detract from the admonition to benchmark very carefully before optimizing but, otherwise, I agree.

New comment by ssokolow in "Porting Libyaml to Safe Rust: Some Thoughts"

ssokolow — Fri, 09 Feb 2024 14:59:10 +0000

*nod* Give https://blog.readyset.io/bounds-checks/ a read.

They tried doing a comparison between ReadySet compiled normally and ReadySet with bounds checking removed so thoroughly that they needed to use a patched toolchain to achieve it and found the difference to be within the noise threshold.

Their conclusion was:

> At the end of the day, it seems like at least for this kind of large-scale, complex application, the cost of pervasive runtime bounds checking is negligible. It’s tough to say precisely why this is, but my intuition is that CPU branch prediction is simply good enough in practice that the cost of the extra couple of instructions and a branch effectively ends up being zero - and compilers like LLVM are good enough at local optimizations to optimize most bounds checks away entirely. Not to mention, it’s likely that quite a few (if not the majority) of the bounds checks we removed are actually necessary, in that they’re validating some kind of user input or other edge conditions where we want to panic on an out of bounds access.

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Wed, 04 Oct 2023 01:25:45 +0000

*nod* ...and stemming is that taken to a greater extreme.

I was just pointing out that Unicode itself has various forms of normalization and normalization-adjacent functionality that people are far too unaware of.

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 04:22:47 +0000

> Before comparing strings or searching for a substring, normalize!

...and learn about the TR39 Skeleton Algorithm for Unicode Confusables. Far too few people writing spam-handling code know about that thing.

(Basically, it generates matching keys from arbitrary strings so that visually similar characters compare identical, so those Disqus/Facebook/etc. spam messages promoting things like BITCO1N pump-and-dumps or using esoteric Unicode characters to advertise work-from-home scams will be wasting their time trying to disguise their words.)

...and since it's based on a tabular plaintext definition file, you can write a simple parser and algorithm to work it in reverse and generate sample spam exploiting that approach if you want.

https://www.unicode.org/Public/security/latest/confusables.t...
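To make the idea concrete, here's a rough sketch of the skeleton approach, not a conforming TR39 implementation. The three sample lines are hand-typed in what I understand to be the file's "source ; target ; type # comment" layout and are only there for illustration; a real run would parse the whole confusables.txt, and the spec has additional details this leaves out. `parse_confusables` and `skeleton` are names I picked for the example.

```python
import unicodedata

# Assumption: hand-typed sample entries, not copied from the real file.
SAMPLE = """\
0031 ; 006C ; MA # DIGIT ONE -> LATIN SMALL LETTER L
0049 ; 006C ; MA # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER L
0030 ; 004F ; MA # DIGIT ZERO -> LATIN CAPITAL LETTER O
"""


def parse_confusables(text: str) -> dict:
    """Build a {source character: prototype string} table from the tabular text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        source = chr(int(fields[0], 16))
        # The target may be several space-separated code points.
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def skeleton(s: str, table: dict) -> str:
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    s = unicodedata.normalize("NFD", s)
    s = "".join(table.get(ch, ch) for ch in s)
    return unicodedata.normalize("NFD", s)


TABLE = parse_confusables(SAMPLE)

# Two strings are treated as confusable when their skeletons match.
print(skeleton("BITCO1N", TABLE) == skeleton("BITCOIN", TABLE))
```

The skeleton itself isn't meant to be shown to anyone; it's only a matching key, so you compare skeletons against skeletons rather than against the original text.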

> and CD-ROM!

I think you mean Microsoft Windows's Joliet extensions to ISO9660 which, by the way, use UCS-2, not UTF-16. (Try generating an ISO on Linux (eg. using K3b) with the Joliet option enabled and watch as filenames with emoji outside the Basic Multilingual Plane cause the process to fail.)

The base ISO9660 filesystem uses bytewise-encoded filenames.
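The UCS-2 versus UTF-16 distinction comes down to whether a character fits in a single 16-bit unit. A small sketch of that check follows; the function names and filenames are made up for the example, and this is not how K3b or any ISO authoring tool actually validates names.

```python
def fits_in_ucs2(name: str) -> bool:
    """True if every character is in the Basic Multilingual Plane (up to U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in name)


def utf16_code_units(name: str) -> int:
    """Number of 16-bit units UTF-16 needs for `name` (2 per non-BMP character)."""
    return len(name.encode("utf-16-be")) // 2


# Hypothetical filenames: one BMP-only, one with an emoji outside the BMP.
for filename in ["Andr\u00E9.txt", "party\U0001F389.txt"]:
    print(
        repr(filename),
        "fits in UCS-2:", fits_in_ucs2(filename),
        "| characters:", len(filename),
        "| UTF-16 units:", utf16_code_units(filename),
    )
```

A character outside the BMP needs a surrogate pair in UTF-16, and UCS-2 has no such mechanism, so there is simply nothing to write for it.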

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 04:14:47 +0000

My anglophone Canadian brother's name is André. Even if you're fine with alienating the ~50% of the world population using non-latin writing systems, probably best to at least stick to the stuff covered by the latin1 legacy encoding.

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 04:11:21 +0000

Technically, a superset would have to somehow Schrödinger's cat around \ in latin1 and ¥ in Shift-JIS being the same codepoint.

Unicode just took it upon themselves to reliably round-trip legacy text... thus the precomposed forms.

Most of the other complexity and technical debt is in the writing systems themselves.
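As a small illustration of what the precomposed forms buy you, here's a sketch using Python's standard `unicodedata` module. The name is just sample data: the precomposed é maps one-to-one onto a latin1 byte, while the decomposed spelling only compares equal after normalization.

```python
import unicodedata

precomposed = "Andr\u00E9"   # é as the single code point U+00E9
decomposed = "Andre\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

# Same text to a human, different code point sequences to a naive comparison.
print(precomposed == decomposed)

# Normalizing to either form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# The precomposed form round-trips through the legacy encoding byte-for-byte.
legacy = precomposed.encode("latin-1")
print(legacy, legacy.decode("latin-1") == precomposed)
```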

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 04:07:39 +0000

According to a sibling to what you replied to, it's because the shapes of the glyphs are still under copyright by known-litigious rightsholders and the Unicode consortium doesn't want to subject font authors to that.

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 04:06:34 +0000

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 04:04:18 +0000

"Extended (Grapheme Cluster)".

The .graphemes() method in Rust's unicode-segmentation crate takes an is_extended boolean as an argument and, if you set it to false, you're iterating legacy grapheme clusters.

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 04:01:44 +0000

Cookie consent is only necessary if you're sharing it with others (eg. ad networks, Google Analytics, etc.) or using it for "non-essential" functions (again, stuff like analytics). Sites just don't want the general public to realize that.

As for the mouse cursors, I don't think they qualify as personal information under the GDPR, but IANAL.

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 03:57:10 +0000

Yes. Give https://www.youtube.com/watch?v=_mZBa3sqTrI a watch... especially the "Oh my God! We've been hacked!" part at 36:20.

TL;DR: They had a transient glitch in their network switch and, because Windows uses UTF-16 when sending remote event logs over the wire, whenever it dropped a single byte, it had the effect of swapping the endianness of the messages, resulting in scary Chinese text in the logs.

You could get the same effect by naively applying byte-wise processing to UTF-16 or UTF-32, or having an off-by-one error.

UTF-8 is self-synchronizing so one-byte errors like that only lose you one character, rather than corrupting the entire stream going forward.
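Here's a quick sketch that simulates the effect. The messages and the `drop_byte` helper are made up for the demonstration; it only shows what losing one byte does to each encoding and doesn't model the actual Windows event log transport.

```python
def drop_byte(data: bytes, index: int) -> bytes:
    """Simulate a transient glitch that loses one byte of a stream."""
    return data[:index] + data[index + 1:]


# UTF-16LE: dropping one byte shifts every later 16-bit unit by half a unit,
# so the rest of the message decodes as unrelated (mostly CJK) characters.
message = "login failed"
utf16_damaged = drop_byte(message.encode("utf-16-le"), 10)
utf16_result = utf16_damaged.decode("utf-16-le", errors="replace")

# UTF-8: dropping the lead byte of the two-byte "ï" leaves a stray
# continuation byte. The decoder flags it and resynchronizes at the next
# character boundary.
text = "na\u00EFve login failed"
utf8_damaged = drop_byte(text.encode("utf-8"), 2)
utf8_result = utf8_damaged.decode("utf-8", errors="replace")

print(utf16_result)  # intact up to the glitch, garbage from there on
print(utf8_result)   # one replacement character, everything else intact
```

In the UTF-16 case the text before the dropped byte survives and everything after it is misread; in the UTF-8 case the damage stays confined to the one character that lost a byte.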

New comment by ssokolow in "What every software developer must know about Unicode in 2023"

ssokolow — Tue, 03 Oct 2023 03:51:06 +0000

Give the "Indic scripts" section of https://manishearth.github.io/blog/2017/01/15/breaking-our-l... a read.

TL;DR: Unicode is complicated because some non-Latin writing systems are complicated and those non-Latin writing systems account for over a quarter of the world's population. (They're either majority or present in India, Indonesia, Pakistan, Bangladesh, the Philippines, etc.)