<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dzaima</title><link>https://news.ycombinator.com/user?id=dzaima</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 30 Jun 2026 21:19:04 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dzaima" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dzaima in "The end of my AArch64 desktop experiment"]]></title><description><![CDATA[
<p>Ampere Altra is for cloud/datacenters/servers where multithreaded throughput is approximately all that matters. Apple M series is for consumers.</p>
]]></description><pubDate>Tue, 30 Jun 2026 10:27:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=48730674</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48730674</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48730674</guid></item><item><title><![CDATA[New comment by dzaima in "Memory Safe Context Switching"]]></title><description><![CDATA[
<p>That'd only help for one object per address space. Main thing needing relocation - shared libraries - needs arbitrarily-many segment bases.<p>And when you're not a library, relocation is just a mild probabilistic security improvement (...that'd be massively-more bypassable than it already is if the program was littered full of gadgets of "read register as unrelocated offset and use it with its correct base" instructions).</p>
]]></description><pubDate>Tue, 30 Jun 2026 07:40:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48729637</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48729637</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48729637</guid></item><item><title><![CDATA[New comment by dzaima in "POSIX Is Not a Shell"]]></title><description><![CDATA[
<p>OP links you to POSIX explicitly denoting it being implementation-defined - <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_05" rel="nofollow">https://pubs.opengroup.org/onlinepubs/9699919799/utilities/e...</a>, and <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16" rel="nofollow">https://pubs.opengroup.org/onlinepubs/9699919799/utilities/e...</a> literally says "It is not possible to use echo portably across all POSIX systems unless [specific setup]"<p>Interestingly, that doesn't allow implementation-defined behavior for "echo -e", for which bash does have special behavior.</p>
]]></description><pubDate>Mon, 29 Jun 2026 19:39:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=48724070</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48724070</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48724070</guid></item><item><title><![CDATA[New comment by dzaima in "POSIX Is Not a Shell"]]></title><description><![CDATA[
<p>ECMAScript has a pretty massive amount of fully-specified behavior though; the things that differ between those implementations are nearly-entirely limited to fresh additions like `require` or whatever.<p>The echo thing would be like if ECMAScript allowed stuff like `"123" == 123` to give either false or true; and then indeed many things would probably break if moved across implementations.<p>C is the closer comparison, and indeed much software that could easily be portable (and might claim it is) often depends on implementation-specific things like 8-bit bytes, 32-bit int, assuming int8_t/etc in stdint.h exist, two's complement (before C23 at least), arithmetic shift right, etc.</p>
]]></description><pubDate>Mon, 29 Jun 2026 08:44:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48716515</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48716515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48716515</guid></item><item><title><![CDATA[New comment by dzaima in "POSIX Is Not a Shell"]]></title><description><![CDATA[
<p>That seems to be an entirely-different question - `echo "c:\\new"` still differs in behavior between bash and dash - dash parses backslashes in the double-quoted string, and then echo does another backslash parsing pass, still printing a newline; whereas bash prints a backslash + n.</p>
]]></description><pubDate>Mon, 29 Jun 2026 08:07:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48716241</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48716241</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48716241</guid></item><item><title><![CDATA[New comment by dzaima in "Cessation of public development of Kefir C compiler"]]></title><description><![CDATA[
<p>> But this has never been a condition in the FOSS world, as far as I'm aware. I've only ever seen attribution requirements attach to redistribution of source, not usage of the software.<p>AGPL requires that even users using the software across a network must be provided with a way to get the license (i.e. attribution) and source. Never mind that LLMs consume the source code instead of "using" the software anyway. (and of course things go more downhill for LLMs for licenses more restrictive than AGPL)<p>Otherwise, I'd say that, for many, the ideal condition for (copyleft) FOSS would be that anything that utilizes source code in any form also provides said source code and license/attribution. Sometimes that can even extend to outputs of software (and e.g. gcc takes time to explicitly state that its compiled code output does <i>not</i> count as being derived from gcc's code).<p>> whether training an LLM is redistribution of the underlying code<p>There's a funky side-note of whether LLM training can even be done on material with improperly-followed licensing; if you don't even have the permission to modify the material (as properly following MIT/GPL/etc would give you), it might be illegal to even tokenize it, never mind use it for training.<p>> That's literally all LLMs do. That's what tokenization is.<p>It's clearly not that simple, otherwise "split source into 10-char chunks, reverse that list, reverse it back, join this fun list we've gotten" would be enough to circumvent copyright.<p>> all you'll see on the LLM side is probability matrices representing correlations between decomposed units of knowledge aggregated across the entire dataset as an integrated whole.<p>Yeah, you need at least that, tokenization is irrelevant. 
But jury's out on this one - of course a good chunk is some form of "abstract knowledge", but other parts could be just encoding material in some compressed form (and surely gzipping a source code file doesn't circumvent copyright) that at the very least can apply to weights.<p>> The only intent ever in play is that of the user. LLMs are just software.<p>So my split-into-words-and-join-back is valid circumvention of copyright, if the user of some software doing that isn't informed that it's just effectively directly copying material. (I'll grant that perhaps, in such a case, the accidental-infringer might get a smaller penalty and/or get to defer punishment to whoever mismarketed the software to them,...but that wouldn't apply to anyone who knows that LLMs are very much just directly trained on copyrighted material. Don't know about legally derived, but surely mathematically derived)<p>Never mind that, for some things, learning some specific copyrighted code <i>is</i> the desired thing (humans <i>do</i> do this after all!), at which point at the very least the weights of the model are as copyright-infused as a gzipped source code file is.<p>If intent determination is on the user, and the user is aware that LLMs are very much technically capable of producing copyrighted works to some extent (which they better be), it would be on the user to ensure that any specific code they end up using is not, which is...a rather non-trivial task (a human that writes code can also reasonably-reason about whether they're infringing on whatever they learned from, but splitting into LLM writing + human checking fundamentally makes that basically infeasible).</p>
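<p>A small Python sketch of the chunk-and-rejoin and gzip points above, using only the standard library. The name `round_trip` and the sample text are made up for illustration. Both transformations hand back the input unchanged, which is the sense in which they are "just copying".</p>

```python
import gzip


def round_trip(source: str, chunk: int = 10) -> str:
    # Split into fixed-size chunks, reverse the list twice, and join again.
    pieces = [source[i:i + chunk] for i in range(0, len(source), chunk)]
    pieces.reverse()
    pieces.reverse()
    return "".join(pieces)


# Hypothetical sample input; any text behaves the same way.
text = "int main(void) { return 0; }\n" * 3

same_after_chunks = round_trip(text) == text
same_after_gzip = gzip.decompress(gzip.compress(text.encode())).decode() == text
print(same_after_chunks, same_after_gzip)
```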
]]></description><pubDate>Tue, 02 Jun 2026 20:57:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=48376174</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48376174</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48376174</guid></item><item><title><![CDATA[New comment by dzaima in "Cessation of public development of Kefir C compiler"]]></title><description><![CDATA[
<p>> That's simply not correct within the applicable meaning of "derives" as understood in copyright law.<p>Would be rather hard to write a definition that handles it properly back when LLMs didn't exist; not that laws particularly have anything to do with intent/desires behind FOSS anyway - intent is clearly there: you get code, under the condition that if you use it for anything, I get credited; else, you get nothing.<p>> In fact, data per se is not even within the scope of copyright protection in the first place: specific published works are copyrighted, but the underlying ideas and facts that they convey are not.<p>Luckily, FOSS is specific published works, and unless LLMs actually reasonably-provably do such decomposing into ideas/facts (good luck reasoning about that), that part is also irrelevant.<p>> If you applied the principle you're proposing here to human developers, you'd conclude that any code written by someone who learned to program by studying techniques used in FOSS software would in turn be a derivative work of that software. No one has ever regarded this to be the case.<p>Depending on intent, that very much can happen, it's called plagiarism. Good luck proving an LLM's intent. (not to mention the obvious differentiating factor of LLMs having arbitrarily-good memory unlike humans)</p>
]]></description><pubDate>Mon, 01 Jun 2026 23:45:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48364056</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48364056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48364056</guid></item><item><title><![CDATA[New comment by dzaima in "Bijou64: A variable-length integer encoding"]]></title><description><![CDATA[
<p>Kinda surprised that there's no discussion of the fact that this basically just does not solve the non-canonicality problem.<p>Forgetting to do the range check on the first_byte==255 case and just letting it do 64-bit wraparound is exactly as much of a plausible bug as missing range checks on LEB128. Any test suite with the goal of covering canonicality will trivially cover both properly; and a programmer that implements things by reading 7 words into the spec, saying "oh yeah I got this" and goes to implement what seems simple, will write a broken version of both.<p>Perhaps the biggest benefit is just not being associated with a format that tolerates non-canonicality in other places (though, if bijou64 gains traction, it'll only be a matter of time for wraparound-check-less versions to start appearing in places where the wraparound is fine); and I guess also it being less annoying to implement the canonicality check, though hopefully people writing security-sensitive software aren't ones to skip out on correctness checks due to annoyingness.<p>In a sense, bijou64 could perhaps even be more problematic - it invites not doing any range checks for the smaller inputs because they obviously don't need it, and so you can just forget to special-case the max length case; whereas LEB128 makes you already care about it at the first point it is actually LEB128.<p>(of course, the format does still have other benefits; enforced canonicality is just...not one of them)</p>
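<p>For concreteness, a minimal Python sketch of the kind of range and canonicality checks being discussed, shown for unsigned LEB128 only (bijou64's own layout is not reproduced here). The function name `decode_uleb128_canonical` and the `max_bits` parameter are hypothetical, and arithmetic is used in place of bit operators purely to keep the snippet free of characters that need HTML escaping.</p>

```python
def decode_uleb128_canonical(data: bytes, max_bits: int = 64) -> tuple[int, int]:
    """Decode one unsigned LEB128 integer, rejecting non-canonical encodings.

    Returns (value, bytes_consumed); raises ValueError on bad input.
    """
    value = 0
    for i, byte in enumerate(data):
        if i * 7 >= max_bits:
            # More groups than max_bits can ever need.
            raise ValueError("encoding too long")
        value += (byte % 128) * 128 ** i  # low 7 bits carry the payload
        if byte >= 128:  # continuation bit set, keep reading
            continue
        if i > 0 and byte == 0:
            # A zero final group only pads a value that fits in fewer bytes.
            raise ValueError("non-canonical: redundant zero group")
        if value >= 2 ** max_bits:
            # This is the check that is easy to forget; skipping it is what
            # lets a fixed-width implementation silently wrap around.
            raise ValueError("value out of range")
        return value, i + 1
    raise ValueError("truncated input")


print(decode_uleb128_canonical(b"\xe5\x8e\x26"))
```

<p>Dropping either of the two final checks still decodes every well-formed input correctly, which is why only a test suite that targets canonicality would notice.</p>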
]]></description><pubDate>Fri, 29 May 2026 23:50:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48330847</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48330847</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48330847</guid></item><item><title><![CDATA[New comment by dzaima in "Defeating Git Rigour Fatigue with Jujutsu"]]></title><description><![CDATA[
<p>How different people approach workflows is fascinating.<p>For example, your "not all that different from looking at all (recent) heads" implies that the number of (recent) heads isn't far off from number of (would-be-)branches (i.e. no random offshoot experiments, stashed-away debug sessions; whereas I make many of such continuously (were stashes on git (with occasional grumbles about not being able to stack stashes), regular commits now on jj (maybe with a special-format description, if I bother)));<p>and that you (even if subconsciously) try to ensure that the head of a branch is always identifiably-representative of the branch (i.e. don't put some random unrelated change at the tip with the idea of "I'll put this in a more proper place when I get back to this").<p>Effectively, using the full commit graph not as a place where anything potentially-useful can stay, but rather by itself a complete picture, with things not fitting into it going into.. idk, just being abandoned, to be found by looking through the op log? commit IDs saved in an external file? wading through evolog / scanning through `jj show -r xyz/2` etc?</p>
]]></description><pubDate>Mon, 25 May 2026 15:52:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=48268278</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48268278</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48268278</guid></item><item><title><![CDATA[New comment by dzaima in "Defeating Git Rigour Fatigue with Jujutsu"]]></title><description><![CDATA[
<p>> It is like writing out a plan for what I want to do.<p>I usually don't have a plan for the end; certainly not what any specific commit would be; sure, I could make one (and either make my future self have to do extra work to figure out what commits with lies in their descriptions actually do, or continuously update the commit message marking what actually exists), but as I said that's basically a waste of time. (if you like comparing with past thoughts, sure, but that's definitely not a necessity for a workflow to be reasonable)<p>"is/isn't an ancestor of the bookmark" is also just a pretty damn good short-hand for denoting a separation between what's been considered the best attempt at the goal, vs things with known problems or just unrelated to the task.<p>At the core, this is all of course just a question of workflow; if you go into a thing with a plan, meaningful outlook of a non-vague destination, and without expecting continuous switching back&forth between a dozen other things over the time span the branch is alive, caring less about branches or branch names can perhaps work.<p>> The first line of a commit message is already a summary of the work done.<p>But you can't (sanely) use it to reference the branch in a revset, can't find it anywhere other than the full log (that's interleaved and mixed with a bunch of other things that you won't ever need to search for), and actual English just gets in the way for finding it, remembering it, and identifying it in a list.<p>This alone means that, even if I found interest in massively-ahead-of-time-describing commits, having a sane branch reference is still simply just necessary.</p>
]]></description><pubDate>Mon, 25 May 2026 02:40:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48262945</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48262945</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48262945</guid></item><item><title><![CDATA[New comment by dzaima in "Defeating Git Rigour Fatigue with Jujutsu"]]></title><description><![CDATA[
<p>But you probably <i>haven't</i> spent time writing commit messages before a branch is finished. Or, if you have, you've quite potentially just wasted time writing something that will be rewritten anyway as things change; replacing a chore with a much bigger chore.<p>Restricted and summarized is good - easier to find/remember, less fluff in a list. And easier to recognize a short identifier from a list of the 2-3 most recent branches, than scanning through 50 commits, when trying to remember where some work last was, and which is the proper end-point instead of some failed attempt or unrelated change.<p>Unnamed branches are quite neat - I certainly have a lot more of such than named ones in jj - but, as such, named branches are, if anything, <i>more</i> important as a result, for separating sequences of changes striving towards a goal, from the sea of smaller experiments.</p>
]]></description><pubDate>Mon, 25 May 2026 01:43:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48262715</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48262715</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48262715</guid></item><item><title><![CDATA[New comment by dzaima in "Thinking in an array language (2022)"]]></title><description><![CDATA[
<p>In BQN, I've made <a href="https://codeberg.org/dzaima/bqn-smt/" rel="nofollow">https://codeberg.org/dzaima/bqn-smt/</a> (SMT engine bindings, plus various utilities, and a RISC-V & x86 superoptimizer of varying amounts of completeness); ~4KLoC (+1KLoC of tests). Might not fit your "real-world" as I am mostly its only user (and many things are undocumented), but it does have a good amount of non-trivial things that aren't necessarily particularly array-y.<p>Can also look through APL github repos: <a href="http://github.com/search?q=language%3Aapl&type=repositories" rel="nofollow">http://github.com/search?q=language%3Aapl&type=repositories</a></p>
]]></description><pubDate>Sat, 23 May 2026 12:48:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=48247216</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48247216</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48247216</guid></item><item><title><![CDATA[New comment by dzaima in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Even if you forbid "time travel", you can still <i>technically</i> optimize many things as if time travel happened anyway - e.g. want to time-travel back to before some memory store? just pretend that the store happened, but then afterwards the previous value was stored back (and no other threads happen to see the intermediate value)!<p>Only things you need to worry about then are things with actual observable side-effects - volatile, printf and similar - and C23 does note that all observable behavior should happen even if UB follows, and compilers can't generally optimize function calls anyway (e.g. on systems on which you can define custom printf callbacks, you could put an exit(0) in such, and thus make it incorrect to optimize out a printf ever).</p>
]]></description><pubDate>Wed, 20 May 2026 10:45:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205686</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48205686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205686</guid></item><item><title><![CDATA[New comment by dzaima in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>C does allow unconditional infinite loops (e.g. "while (1) { }" isn't UB) but still is UB if the controlling expression isn't constant (e.g. "while (two < 10) { }" is UB if two is a variable less than 10)</p>
]]></description><pubDate>Wed, 20 May 2026 10:21:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205531</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48205531</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205531</guid></item><item><title><![CDATA[New comment by dzaima in "Saying Goodbye to one line of APL"]]></title><description><![CDATA[
<p>Some alternative spellings:<p><pre><code>    (¬∘∧⟜«' '=⊢)⊸/
    (¬·«⊸∧' '=⊢)⊸/
    {¬«⊸∧' '=x}⊸/ # should have double-struck x here (U+1D569), but hn removes it</code></pre></p>
]]></description><pubDate>Thu, 14 May 2026 16:15:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=48137508</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48137508</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48137508</guid></item><item><title><![CDATA[New comment by dzaima in "Deterministic Fully-Static Whole-Binary Translation Without Heuristics"]]></title><description><![CDATA[
<p>Presumably just set to a canonical crash in the lookup table of address-to-code; which'd still get you a crash, just not that of the directly-run invalid code.</p>
]]></description><pubDate>Wed, 13 May 2026 09:32:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48119698</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48119698</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48119698</guid></item><item><title><![CDATA[New comment by dzaima in "Deterministic Fully-Static Whole-Binary Translation Without Heuristics"]]></title><description><![CDATA[
<p>Only if it is all actually used at runtime; and presumably the vast majority of possible decoding starting points won't be.</p>
]]></description><pubDate>Wed, 13 May 2026 09:27:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=48119664</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48119664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48119664</guid></item><item><title><![CDATA[New comment by dzaima in "Dirtyfrag: Universal Linux LPE"]]></title><description><![CDATA[
<p>That's specific libraries, when using the default linker. You could construct that same behavior on desktop linux too. And you can avoid it equally well on Android - you can statically-link things just fine, you can use libraries you actually control, and presumably use a custom linker if desired. It's utterly non-surprising that "you run code you don't control" results in "said code...can do arbitrary things for unsupported use". (Never mind that, instead of a "sherif", they could've just renamed all private symbols, or just naturally replaced them over time, breaking your code all the same, just in a more confusing way)<p>Also some obligatory Linux vs GNU/Linux comment. (and it's not like GNU/Linux doesn't ever change under your feet - see the glibc DT_HASH debacle)</p>
]]></description><pubDate>Thu, 07 May 2026 20:53:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48054828</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=48054828</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48054828</guid></item><item><title><![CDATA[New comment by dzaima in "Jujutsu megamerges for fun and profit"]]></title><description><![CDATA[
<p>eh, there have been a good amount of breaking changes. `-d`/`--destination` → `-o`/`--onto` (the former isn't yet deprecated though); deprecated `--allow-new` on push (or, forcibly making it the default for `--bookmark`); deprecated `jj bookmark track foo@bar` (and `jj bookmark track foo` having a really-weird system (I personally just call it broken, even though the behavior is intentional) of sometimes tracking the bookmark on <i>all</i> remotes; really I'd call jj's entire system of bookmark tracking/pulling/pushing quite incomplete outside of the trivial cases); various changed revset functions over time that break configs; and a really-annoying thing of `jj git fetch` sometimes abandoning ascendants of `@` leaving you in a confusing state (if not one with conflicts), with the solution being a future `jj git sync`.<p>It's certainly very usable despite all that, and the changes are simple enough to adapt to, but it's a pretty new thing.</p>
]]></description><pubDate>Tue, 21 Apr 2026 10:22:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47846879</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47846879</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47846879</guid></item><item><title><![CDATA[New comment by dzaima in "Jujutsu megamerges for fun and profit"]]></title><description><![CDATA[
<p>Don't necessarily need heavy parallel work, or even anything parallel, to make use of jj; it's very nice for even just manipulating one local sequence of commits (splitting commits up, reordering them, moving files/hunks/lines between them or into/out of the working copy, without needing to checkout anything).<p>Won't get you much if you don't like to mutate commits in general, of course; at that point it's just a different committing workflow, which some may like and some dislike. (I for one am so extremely-happy with the history-rewriting capabilities that I've written some scripts for reinventing back a staging area as a commit, and am fine to struggle along with all the things I don't like about jj's auto-tracking)<p>As a fun note, git 2.54 released yesterday, adding `git history reword` and `git history split` in the style of jj (except less powerful because of git limitations) because a git dev discovered jj.</p>
]]></description><pubDate>Tue, 21 Apr 2026 08:44:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47846246</link><dc:creator>dzaima</dc:creator><comments>https://news.ycombinator.com/item?id=47846246</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47846246</guid></item></channel></rss>