<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ssokolow</title><link>https://news.ycombinator.com/user?id=ssokolow</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 15 May 2026 21:05:05 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ssokolow" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ssokolow in "U+237C ⍼ Is Azimuth"]]></title><description><![CDATA[
<p>*nod* As-is, we're stuck with hacks like custom shortcodes and emoji.<p>...though, given the inconsistent naming of consistently laid-out buttons, I think anything that makes its way into Unicode should include something that follows the lead of what Batocera Linux does on their Wiki and with custom emojis in their Discord.<p>See <a href="https://wiki.batocera.org/configure_a_controller" rel="nofollow">https://wiki.batocera.org/configure_a_controller</a> for an example of how they look inline but the gist is that it's an outline of the SNES-originated diamond of action buttons that pretty much everyone but Nintendo uses these days and which is embodied in XInput and the SDL Gamepad API, with one of the circles filled in to represent the button in question.</p>
]]></description><pubDate>Fri, 13 Mar 2026 08:07:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47361774</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=47361774</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47361774</guid></item><item><title><![CDATA[New comment by ssokolow in "U+237C ⍼ Is Azimuth"]]></title><description><![CDATA[
<p>By "game tutorials", I think they mean modern successors to the role GameFAQs used to play.<p>There <i>is</i> a combining character that, by its description, sounds like it <i>should</i> be implemented to do the desired thing (U+20DD Combining Enclosing Circle), but my fonts don't render it very well when I stuff geometric characters matching the PlayStation buttons into it.<p>Without spaces:
△⃝□⃝×⃝○⃝<p>With two spaces between each one so you can see how "enclosing" is getting interpreted:
△⃝  □⃝  ×⃝  ○⃝<p>For the Markdown renderer I'm working on to replace WordPress for my blog, I resorted to shortcodes which resolve to CSS styling the `&lt;kbd&gt;` tag with `title` attributes to clarify and the occasional bit of inline SVG for things where I didn't want to specify a fixed font to get sufficient consistency, like PlayStation button glyphs.<p><a href="https://imgur.com/a/1EPm7QV" rel="nofollow">https://imgur.com/a/1EPm7QV</a><p>(In all fairness, it's a nerd-snipe made based on the idea that I'll be more willing to blog about things I have nice tools for. I don't currently typeset button presses in any form.)</p>
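<p>For anyone who wants to reproduce the experiment above, here is a minimal sketch of how such sequences can be built. The four base characters are an assumption about which geometric shapes were used, and the dictionary keys are made up for illustration.</p>

```python
import unicodedata

ENCLOSING_CIRCLE = "\u20DD"  # COMBINING ENCLOSING CIRCLE

# Assumed base characters for the four PlayStation face buttons.
BASES = {
    "triangle": "\u25B3",  # WHITE UP-POINTING TRIANGLE
    "square": "\u25A1",    # WHITE SQUARE
    "cross": "\u00D7",     # MULTIPLICATION SIGN
    "circle": "\u25CB",    # WHITE CIRCLE
}

def enclosed(base: str) -> str:
    """Return the base character followed by the enclosing-circle mark."""
    return base + ENCLOSING_CIRCLE

glyphs = {name: enclosed(ch) for name, ch in BASES.items()}

mark_name = unicodedata.name(ENCLOSING_CIRCLE)
mark_category = unicodedata.category(ENCLOSING_CIRCLE)  # "Me" = enclosing mark

print(mark_name, mark_category)
print("  ".join(glyphs.values()))
```

<p>Each result is two code points, so how well the circle actually wraps the shape is left entirely to the font and the text shaper.</p>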
]]></description><pubDate>Thu, 12 Mar 2026 06:43:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347281</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=47347281</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347281</guid></item><item><title><![CDATA[New comment by ssokolow in "Why Amazon etc. are building servers in Rust but you should probably not"]]></title><description><![CDATA[
<p>I know Sylvain Kerkour is a perennial "Rust should be more like Go. I don't care that they're trying to meet different needs" person and has been for many years now, but I do wish we could at <i>least</i> get a little acknowledgement that Rust's design took a great deal of influence from Python, both on what worked and what didn't, and that this was a direct response to how, as Amber Brown put it, Python has batteries included, but they're leaking.<p>Python is the most infamous example of how putting something in the standard library doesn't automatically mean everyone will use it.<p>For example, as of the end of the Python 2.x cycle, Python had urllib and urllib2 in the standard library and everyone said to ignore them and use Requests... which contains a urllib3, whose maintainers refuse to ever add it to the standard library.<p>Python had/has a bunch of "use Twisted instead" network protocol implementations. Python's standard library XML implementations carry a big warning to use the third-party `defusedxml` package if you are processing untrusted data, etc. etc. etc.<p>I have next to no Java experience, but I vaguely remember it also having some similar cases of common wisdom being to ignore the standard library-provided solution.</p>
]]></description><pubDate>Wed, 04 Sep 2024 09:49:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=41443750</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=41443750</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41443750</guid></item><item><title><![CDATA[New comment by ssokolow in "Porting Libyaml to Safe Rust: Some Thoughts"]]></title><description><![CDATA[
<p>That's fair.<p>I still think the "because it's not in the fast path" part of "Most software will not see a bottleneck because of bounds checking because it's not in the fast path" is a bit too much of a blanket statement and could detract from the admonition to benchmark very carefully before optimizing but, otherwise, I agree.</p>
]]></description><pubDate>Wed, 14 Feb 2024 04:20:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=39366446</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=39366446</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39366446</guid></item><item><title><![CDATA[New comment by ssokolow in "Porting Libyaml to Safe Rust: Some Thoughts"]]></title><description><![CDATA[
<p>*nod* Give <a href="https://blog.readyset.io/bounds-checks/" rel="nofollow">https://blog.readyset.io/bounds-checks/</a> a read.<p>They tried doing a comparison between ReadySet compiled normally and ReadySet with bounds checking removed so thoroughly that they needed to use a patched toolchain to achieve it and found the difference to be within the noise threshold.<p>Their conclusion was:<p>> At the end of the day, it seems like at least for this kind of large-scale, complex application, the cost of pervasive runtime bounds checking is negligible. It’s tough to say precisely why this is, but my intuition is that CPU branch prediction is simply good enough in practice that the cost of the extra couple of instructions and a branch effectively ends up being zero - and compilers like LLVM are good enough at local optimizations to optimize most bounds checks away entirely. Not to mention, it’s likely that quite a few (if not the majority) of the bounds checks we removed are actually <i>necessary</i>, in that they’re validating some kind of user input or other edge conditions where we want to panic on an out of bounds access.</p>
]]></description><pubDate>Fri, 09 Feb 2024 14:59:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=39315424</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=39315424</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39315424</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>*nod* ...and stemming is that taken to a greater extreme.<p>I was just pointing out that Unicode itself has various forms of normalization and normalization-adjacent functionality that people are far too unaware of.</p>
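<p>A rough sketch of the kind of functionality meant here, using only Python's standard library. The sample strings are arbitrary choices for illustration.</p>

```python
import unicodedata

precomposed = "Andr\u00e9"   # e-acute as one code point
decomposed = "Andre\u0301"   # e followed by COMBINING ACUTE ACCENT

# The two spellings render alike but compare unequal until normalized.
raw_equal = precomposed == decomposed
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

# Compatibility normalization also folds presentation variants,
# e.g. the "fi" ligature U+FB01 becomes the two letters "fi".
nfkc_ligature = unicodedata.normalize("NFKC", "\ufb01le")

# Case folding is a normalization-adjacent step for caseless matching:
# German sharp s folds to "ss".
folded = "Stra\u00dfe".casefold()

print(raw_equal, nfc_equal, nfkc_ligature, folded)
```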
]]></description><pubDate>Wed, 04 Oct 2023 01:25:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=37759806</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37759806</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37759806</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>> Before comparing strings or searching for a substring, normalize!<p>...and learn about the TR39 Skeleton Algorithm for Unicode Confusables. Far too few people writing spam-handling code know about that thing.<p>(Basically, it generates matching keys from arbitrary strings so that visually similar characters compare identical, so those Disqus/Facebook/etc. spam messages promoting things like BITCO1N pump-and-dumps or using esoteric Unicode characters to advertise work-from-home scams will be wasting their time trying to disguise their words.)<p>...and since it's based on a tabular plaintext definition file, you can write a simple parser and algorithm to work it in reverse and generate sample spam exploiting that approach if you want.<p><a href="https://www.unicode.org/Public/security/latest/confusables.txt" rel="nofollow noreferrer">https://www.unicode.org/Public/security/latest/confusables.t...</a><p>> and CD-ROM!<p>I think you mean Microsoft Windows's Joliet extensions to ISO9660 which, by the way, use UCS-2, not UTF-16. (Try generating an ISO on Linux (eg. using K3b) with the Joliet option enabled and watch as filenames with emoji outside the Basic Multilingual Plane cause the process to fail.)<p>The base ISO9660 filesystem uses bytewise-encoded filenames.</p>
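<p>To give a feel for how simple the parsing side is, here is a simplified sketch of the skeleton idea (decompose, map each character to its prototype, decompose again). The embedded rows are a tiny hand-written sample in the data file's "source ; target ; type" layout rather than the real file, the function names are made up, and the full UTS #39 definition has details this does not reproduce.</p>

```python
import unicodedata

# Illustrative rows only; a real run would read confusables.txt instead.
SAMPLE = """\
0031 ; 006C ; MA  # DIGIT ONE -> LATIN SMALL LETTER L
0049 ; 006C ; MA  # LATIN CAPITAL LETTER I -> LATIN SMALL LETTER L
043E ; 006F ; MA  # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
"""

def parse_confusables(text: str) -> dict:
    """Build a source-to-prototype mapping from confusables-style rows."""
    table = {}
    for line in text.splitlines():
        line = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in line.split(";")]
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table

def skeleton(text: str, table: dict) -> str:
    """Simplified skeleton: NFD, map to prototypes, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

table = parse_confusables(SAMPLE)
print(skeleton("BITCO1N", table) == skeleton("BITCOIN", table))
```

<p>Skeletons are only meant to be compared against other skeletons, never displayed, which is why "BITCOIN" itself comes out looking odd.</p>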
]]></description><pubDate>Tue, 03 Oct 2023 04:22:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747957</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747957</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747957</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>My anglophone Canadian brother's name is André. Even if you're fine with alienating the ~50% of the world population using non-Latin writing systems, it's probably best to at <i>least</i> stick to the stuff covered by the Latin-1 legacy encoding.</p>
]]></description><pubDate>Tue, 03 Oct 2023 04:14:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747900</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747900</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747900</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>Technically, a superset would have to somehow Schrödinger's cat around \ in latin1 and ¥ in Shift-JIS being the same codepoint.<p>Unicode just took it upon themselves to reliably round-trip legacy text... thus the precomposed forms.<p>Most of the other complexity and technical debt is in the writing systems themselves.</p>
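<p>A small sketch of what the round-tripping point looks like in practice, using Python's built-in latin-1 codec. The sample bytes are an arbitrary choice.</p>

```python
import unicodedata

legacy_bytes = b"Andr\xe9"              # Latin-1 encoded text
text = legacy_bytes.decode("latin-1")   # e-acute arrives as U+00E9

# The precomposed code point lets the text go back out byte-for-byte.
round_tripped = text.encode("latin-1")

# The decomposed spelling is canonically equivalent, but its combining
# accent (U+0301) has no Latin-1 byte, so it cannot be written back.
decomposed = unicodedata.normalize("NFD", text)
try:
    decomposed.encode("latin-1")
    decomposed_encodable = True
except UnicodeEncodeError:
    decomposed_encodable = False

print(round_tripped == legacy_bytes, decomposed_encodable)
```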
]]></description><pubDate>Tue, 03 Oct 2023 04:11:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747880</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747880</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747880</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>According to a sibling to what you replied to, it's because the shapes of the glyphs are still under copyright by known-litigious rightsholders and the Unicode consortium doesn't want to subject font authors to that.</p>
]]></description><pubDate>Tue, 03 Oct 2023 04:07:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747851</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747851</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>According to a sibling to what you replied to, it's because the shapes of the glyphs are still under copyright by known-litigious rightsholders and the Unicode consortium doesn't want to subject font authors to that.</p>
]]></description><pubDate>Tue, 03 Oct 2023 04:06:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747840</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747840</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747840</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>"Extended (Grapheme Cluster)".<p>The .graphemes() method in Rust's unicode-segmentation crate takes an is_extended boolean as an argument and, if you set it to false, you're iterating legacy grapheme clusters.</p>
]]></description><pubDate>Tue, 03 Oct 2023 04:04:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747830</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747830</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747830</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>Cookie consent is only necessary if you're sharing the data with others (eg. ad networks, Google Analytics, etc.) or using it for "non-essential" functions (again, stuff like analytics). Sites just don't want the general public to realize that.<p>As for the mouse cursors, I don't think they qualify as personal information under the GDPR, but IANAL.</p>
]]></description><pubDate>Tue, 03 Oct 2023 04:01:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747820</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747820</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747820</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>Yes. Give <a href="https://www.youtube.com/watch?v=_mZBa3sqTrI">https://www.youtube.com/watch?v=_mZBa3sqTrI</a> a watch... especially the "Oh my God! We've been hacked!" part at 36:20.<p>TL;DR: They had a transient glitch in their network switch and, because Windows uses UTF-16 when sending remote event logs over the wire, whenever it dropped a single byte, it had the effect of swapping the endianness of the messages, resulting in scary Chinese text in the logs.<p>You could get the same effect by naively applying byte-wise processing to UTF-16 or UTF-32, or having an off-by-one error.<p>UTF-8 is self-synchronizing so one-byte errors like that only lose you one character, rather than corrupting the entire stream going forward.</p>
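<p>The effect is easy to simulate. This is only a sketch of the failure mode, not a reconstruction of the incident in the talk: the messages and the helper name are made up, and little-endian UTF-16 is assumed.</p>

```python
def drop_byte(data: bytes, index: int) -> bytes:
    """Simulate a transport glitch that loses one byte."""
    return data[:index] + data[index + 1:]

message = "login ok"

# UTF-16: every code unit after the gap is read with its bytes swapped.
utf16 = drop_byte(message.encode("utf-16-le"), 4)
utf16 = utf16[: len(utf16) // 2 * 2]  # ignore the dangling final byte
utf16_result = utf16.decode("utf-16-le")

# UTF-8: the decoder resynchronizes at the next lead byte.
utf8 = drop_byte("na\u00efve caf\u00e9".encode("utf-8"), 2)
utf8_result = utf8.decode("utf-8", errors="replace")

print(utf16_result)
print(utf8_result)
```

<p>In the UTF-16 case, only the two characters before the gap survive and the rest mostly lands in the CJK range. In the UTF-8 case, one character becomes U+FFFD and everything after it is intact.</p>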
]]></description><pubDate>Tue, 03 Oct 2023 03:57:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747801</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747801</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747801</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>Give the "Indic scripts" section of <a href="https://manishearth.github.io/blog/2017/01/15/breaking-our-latin-1-assumptions/" rel="nofollow noreferrer">https://manishearth.github.io/blog/2017/01/15/breaking-our-l...</a> a read.<p>TL;DR: Unicode is complicated because some non-Latin writing systems are complicated and those non-Latin writing systems account for over a quarter of the world's population. (They're either majority or present in India, Indonesia, Pakistan, Bangladesh, the Philippines, etc.)</p>
]]></description><pubDate>Tue, 03 Oct 2023 03:51:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747761</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747761</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747761</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>Give <a href="https://manishearth.github.io/blog/2017/01/15/breaking-our-latin-1-assumptions/" rel="nofollow noreferrer">https://manishearth.github.io/blog/2017/01/15/breaking-our-l...</a> a read and, ideally, the other things it mentions like <a href="https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/" rel="nofollow noreferrer">https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/</a> and <a href="https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/" rel="nofollow noreferrer">https://manishearth.github.io/blog/2017/01/14/stop-ascribing...</a>.<p>(Among other things, it points out that doing that to non-Latin text is liable to change pronunciations and meanings in other languages. For example, some languages use diacritics for voiced/unvoiced indication where your "normalization" could do things like "tick→dick" or "did→tit".)<p>(Did you ever notice that? B/P, D/T, V/F, G/K, J/CH, and Z/S form voiced/unvoiced pairs that could have been indicated with a single letter and a diacritic. Same mouth behaviour. It's just a question of whether you engage your vocal cords.)</p>
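<p>Japanese kana make a handy real-world illustration, since the dakuten is exactly that kind of voicing mark. The sketch below assumes the "normalization" in question amounts to decomposing and dropping combining marks; the function name is made up.</p>

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Naive 'normalization': decompose, then drop nonspacing marks."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    return unicodedata.normalize("NFC", "".join(kept))

latin = strip_marks("caf\u00e9")   # merely loses an accent
kana = strip_marks("\u3067")       # hiragana "de" turns into "te"

print(latin, kana)
```

<p>The Latin result is still recognizable, but the kana result is a different syllable, the same kind of change as the D/T pair above.</p>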
]]></description><pubDate>Tue, 03 Oct 2023 03:33:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747661</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747661</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747661</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>> I didn't know that unicode changes the definition of grapheme in backwards incompatible fashion annually, so software which works by grapheme count is probably inconsistent with other software using a different version of the standard anyway.<p>This is EXACTLY why Rust's standard library is blind to graphemes. Support for the case where your company requires a specially certified toolchain that lags five years behind Rust upstream is an explicit goal that they address by breaking the stuff that changes quickly out into minimal crates that can be audited, updated at a quicker pace without requiring toolchain updates, and which have the option to continue to support older compiler versions indefinitely.</p>
]]></description><pubDate>Tue, 03 Oct 2023 03:28:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747637</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747637</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747637</guid></item><item><title><![CDATA[New comment by ssokolow in "What every software developer must know about Unicode in 2023"]]></title><description><![CDATA[
<p>*nod*<p>Rust was given as one of the examples and Rust's .len() behaviour is chosen based on three very reasonable concerns:<p>1. They want the String type to be available to embedded use-cases, where it's not reasonable to require the embedding of the quite large Unicode tables needed to identify grapheme boundaries. (String is defined in the `alloc` module, which you can use in addition to `core` if your target has a heap allocator. It's just re-exported via `std`.)<p>2. They have a policy of not baking stuff that is defined by politics/fiat (eg. Unicode codepoint assignments) into stuff that requires a compiler update to change. (Which is also why the standard library has no timezone handling.)<p>3. People need a convenient way to know how much memory/disk space to allocate to store a string verbatim. (Rust's `String` is just a newtype wrapper around `Vec&lt;u8&gt;` with restricted construction and added helper functions.)<p>That's why .len() counts bytes in Rust.<p>Just like with timezone definitions, Rust has a de facto standard place to find a grapheme-wise iterator... the unicode-segmentation crate.</p>
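<p>To make the different "lengths" concrete, here is a dependency-free sketch in Python rather than Rust. The UTF-8 byte count is the figure Rust's .len() would report for the same text; the sample strings are arbitrary.</p>

```python
s = "Andre\u0301"      # "André" spelled with a combining acute accent
emoji = "\U0001F600"   # one emoji outside the Basic Multilingual Plane

code_points = len(s)                            # Python counts code points
utf8_bytes = len(s.encode("utf-8"))             # storage size as UTF-8
utf16_units = len(s.encode("utf-16-le")) // 2   # UTF-16 code units

emoji_code_points = len(emoji)
emoji_utf8_bytes = len(emoji.encode("utf-8"))
emoji_utf16_units = len(emoji.encode("utf-16-le")) // 2

print(code_points, utf8_bytes, utf16_units)
print(emoji_code_points, emoji_utf8_bytes, emoji_utf16_units)
```

<p>A reader sees five characters in the first string and one in the second, and none of the three counts gives that answer for both. That's the job grapheme segmentation exists for.</p>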
]]></description><pubDate>Tue, 03 Oct 2023 03:26:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=37747625</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37747625</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37747625</guid></item><item><title><![CDATA[New comment by ssokolow in "Rust vs. Go in 2023"]]></title><description><![CDATA[
<p>I agree... you need discipline either way. It's just that, with Rust, it's easier to wind up not realizing that you don't <i>need</i> to spend so much time on up-front optimization to produce a perfectly serviceable program. (Especially if you haven't internalized the whole "fearless refactoring" angle.)</p>
]]></description><pubDate>Tue, 15 Aug 2023 00:48:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=37128753</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37128753</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37128753</guid></item><item><title><![CDATA[New comment by ssokolow in "Rust vs. Go in 2023"]]></title><description><![CDATA[
<p>While it's not a study, <a href="https://blog.polybdenum.com/2023/03/05/fixing-the-next-10-000-aliasing-bugs.html" rel="nofollow noreferrer">https://blog.polybdenum.com/2023/03/05/fixing-the-next-10-00...</a> gives a real-world example of a bug in Go that Rust would have prevented.<p>Also, bear in mind that it's not just Rust's type system directly that prevents bugs, but also the patterns it enables. For example, the typestate pattern and its use in things like turning misuse of the HTTP protocol into a compile-time error in Hyper.<p><a href="https://cliffle.com/blog/rust-typestate/" rel="nofollow noreferrer">https://cliffle.com/blog/rust-typestate/</a></p>
]]></description><pubDate>Mon, 14 Aug 2023 22:56:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=37127830</link><dc:creator>ssokolow</dc:creator><comments>https://news.ycombinator.com/item?id=37127830</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37127830</guid></item></channel></rss>