<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: thomashabets2</title><link>https://news.ycombinator.com/user?id=thomashabets2</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 20 May 2026 18:56:58 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=thomashabets2" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by thomashabets2 in "No way to parse integers in C (2022)"]]></title><description><![CDATA[
<p>Fair enough.<p>For strtoul and friends, maybe? 7.24.1 is pretty dense, but the key parts are "the expected form of the subject sequence is a sequence of letters and digits representing an integer with the radix specified by base, optionally preceded by a plus or minus sign […] If the correct value is outside the range of representable values […] ULONG_MAX […] is returned".<p>So the "expected form" allows a minus sign, but then it's clearly "outside the range of representable values" for strtoul to try parsing a negative value. So maybe it should return ULONG_MAX on those.<p>So arguably a minus sign present could already be treated as an error, and still be standard compliant. Unless I'm misreading.</p>
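<p>A minimal sketch of that stricter reading, assuming a hypothetical wrapper called parse_ulong_strict (not a standard function, and decimal-only by choice). It rejects a leading sign or whitespace before strtoul() ever sees the string, and treats ERANGE and trailing characters as errors:</p>

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not a standard function.
 * Returns 0 and stores the value on success, -1 on any input that is not
 * exactly a non-negative decimal integer that fits in unsigned long. */
int parse_ulong_strict(const char *s, unsigned long *out)
{
    /* strtoul() itself would skip whitespace and accept a sign, so reject
     * anything that does not start with a digit before calling it. */
    if (s == NULL || !isdigit((unsigned char)s[0]))
        return -1;

    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (errno == ERANGE)   /* value did not fit in unsigned long */
        return -1;
    if (*end != '\0')      /* trailing characters such as "12abc" */
        return -1;
    *out = v;
    return 0;
}
```

<p>With this wrapper, "-1" is an error rather than a very large number, and ULONG_MAX itself remains parseable because success is reported separately from the value.</p>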
]]></description><pubDate>Wed, 20 May 2026 17:15:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=48210943</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48210943</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48210943</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>I think I would defer to someone more of a language lawyer than we, but I'm not sure what you're describing can be expressed in the C abstract machine. If a pointer is invalid, not pointing to an object, then I'm not sure it means anything to "read from there".<p>I know what you mean, but I'm just not sure you're describing something that fits what C "is". We program C to the abstract machine specified in the standard (5.1.2), and the compiler's job is to translate that into something with identical behavior on particular hardware. Piercing the layers down to actual hardware or assembly isn't really done.<p>Even "volatile" just says (basically) "touching this object has side effects". It <i>implies</i> no double-loading, speculative store, etc, but doesn't say "don't emit assembly instructions to load this unless the program logic path takes the route where the C program does load it".<p>The standard is not using ancient language when it refers to "objects with static storage duration" instead of "heap" or ".data segment". It is the true class of objects in the abstract machine.</p>
]]></description><pubDate>Wed, 20 May 2026 16:49:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48210563</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48210563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48210563</guid></item><item><title><![CDATA[New comment by thomashabets2 in "No way to parse integers in C (2022)"]]></title><description><![CDATA[
<p>The point of this post, though, is that even something as simple as "give me this string as an integer" doesn't have an answer that doesn't come with "are you OK with this best effort parse under these edge cases? Oh and we use this number as an error, so you can't parse that".<p>Like… edge cases? It's <i>parsing a number</i>! We're not talking about I/O on hard vs soft intr NFS mounts, here. There's a right answer.<p>strlen(), on valid null-terminated strings, doesn't come with caveats like "oh we can't measure strings of length 99".<p>But sure, C is Turing complete. It is possible to solve any problem a Turing machine can solve.<p>> understand the target platform and the target compiler’s behavior.<p>This is neither. This is purely the language.</p>
]]></description><pubDate>Wed, 20 May 2026 16:23:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48210210</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48210210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48210210</guid></item><item><title><![CDATA[New comment by thomashabets2 in "No way to parse integers in C (2022)"]]></title><description><![CDATA[
<p>Only literally. 7.24.1 in the C programming language spec has these poor parsers.</p>
]]></description><pubDate>Wed, 20 May 2026 16:11:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48210050</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48210050</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48210050</guid></item><item><title><![CDATA[New comment by thomashabets2 in "No way to parse integers in C (2022)"]]></title><description><![CDATA[
<p>While snprintf() is better than sprintf(), I find that it's easy for people to not check if the return value is bigger than the provided size. Sure, it prevents a buffer overflow, but there could still be a string truncation problem.<p>Similar to how strlcpy() is not a slam dunk fix to the strcpy() problem.</p>
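<p>A small sketch of the check that tends to get skipped, assuming a hypothetical helper called format_pair. snprintf() returns the length the full output would have had, so a return value greater than or equal to the buffer size means the result was truncated:</p>

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper. Returns 0 if "name=value" fit in dst, -1 on an
 * encoding error or if the output had to be truncated. */
int format_pair(char *dst, size_t dstsize, const char *name, const char *value)
{
    int n = snprintf(dst, dstsize, "%s=%s", name, value);
    if (n < 0)
        return -1;                 /* encoding error */
    if ((size_t)n >= dstsize)
        return -1;                 /* no overflow, but the string was cut */
    return 0;
}
```

<p>Whether truncation should be a hard error is up to the caller. The point is only that it has to be detected before it can be decided.</p>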
]]></description><pubDate>Wed, 20 May 2026 16:03:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=48209933</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48209933</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48209933</guid></item><item><title><![CDATA[New comment by thomashabets2 in "No way to parse integers in C (2022)"]]></title><description><![CDATA[
<p>For unsigned that could work, but signed overflow is UB.</p>
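<p>For the signed case, one option is to test before the arithmetic instead of after it. A sketch, assuming a hypothetical helper called append_digit that builds up a non-negative int one decimal digit at a time:</p>

```c
#include <limits.h>

/* Hypothetical helper. Computes *acc = *acc * 10 + digit for a non-negative
 * accumulator, returning -1 instead of performing an overflowing operation. */
int append_digit(int *acc, int digit)
{
    if (*acc < 0 || digit < 0 || digit > 9)
        return -1;
    if (*acc > (INT_MAX - digit) / 10)
        return -1;   /* *acc * 10 + digit would exceed INT_MAX */
    *acc = *acc * 10 + digit;
    return 0;
}
```

<p>The overflowing multiplication is never evaluated, so there is nothing for the compiler to reason away.</p>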
]]></description><pubDate>Wed, 20 May 2026 15:59:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48209873</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48209873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48209873</guid></item><item><title><![CDATA[New comment by thomashabets2 in "No way to parse integers in C (2022)"]]></title><description><![CDATA[
<p>> they don't like seem to like that unsigned parsing will accept negative numbers and then automatically wrap them to their unsigned equivalents, nor do they like that C number parsing often bails with best effort on non-numeric trailing data rather than flagging it an error, nor do they like that ULONG_MAX is used as a sentinel value by sscanf.<p>That's right. I don't like asking it to parse the number contained inside a string, and getting <i>a different number</i> as a result.<p>That's just simply not the right answer.<p>> I'm not sure what they mean by "output raw" vs "output"<p>I can see how that's very unclear. Changed now to "Readable".</p>
]]></description><pubDate>Wed, 20 May 2026 15:54:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48209807</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48209807</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48209807</guid></item><item><title><![CDATA[New comment by thomashabets2 in "No way to parse integers in C (2022)"]]></title><description><![CDATA[
<p>Yup. Sorry about that.</p>
]]></description><pubDate>Wed, 20 May 2026 15:49:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48209720</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48209720</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48209720</guid></item><item><title><![CDATA[New comment by thomashabets2 in "OpenBSD 7.9"]]></title><description><![CDATA[
<p>As I mentioned in the post, I only did a brief exploration of OpenBSD in order to cheer myself up. I took some findings, confirmed that they were true bugs, and ended there.<p>As I said in the out of bounds null termination write patch, I don't believe it's exploitable. I would have gotten a CVE, website, and logo then (kidding!). But it was UB. And one-byte overflows have in the past been exploitable by better sploit authors than me.<p>In any case, I reported that since I felt it was clear that OpenBSD folks would obviously care about it, exploitable or not.<p>Confirming these findings takes time, even though I found GPT to almost always be correct. I will NOT report upstream until I understand the bug. I ain't no slop reporter. As I said in the post, OpenBSD (and all other code bases) need a larger effort. The Mythos/Glasswing effort focusing on actually exploitable ones may be a good method for getting them fixed, without overwhelming projects with patches, even when the patches are correct.<p>I did confirm at least one more UB, and did consider whether to report that OpenBSD `find` reads `status` via `WIFEXITED(status)` without checking `waitpid()` for errors. This is UB since `status` is uninitialized.
(<a href="https://github.com/openbsd/src/blob/ae684bfaed6cae797cd90e2768b1b092b2eeb2ae/usr.bin/find/function.c#L518" rel="nofollow">https://github.com/openbsd/src/blob/ae684bfaed6cae797cd90e27...</a>)<p>The reason is my previous experience with OpenBSD where the reply may be "<some standard> is wrong in this regard", and because they control their whole system, they don't care. E.g. in this case they may go "we build with GCC x.y.z exactly, and we know what actually happens in this controlled domain". This may be a bit unfair to them, but not by much.<p>GPT also flagged the <i>extremely</i> surprising behavior of running `cat -n file1 file2` if file1 doesn't end with a newline. And that `find /etc/passwd -execdir[…]` doesn't run the command. But maybe that's how they want it? I don't want to go through the whole thing for them to go "yeah we won't do that" again. So I think this project is for them. GPT is as available to them as it is to me.<p>Tangent: in running GPT against `cat` I learned that not only is `cat -n` not standardized, but it also behaves COMPLETELY differently than on Linux, if you provide more than one file.</p>
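<p>For reference, a sketch of the ordering I mean for the `waitpid()` case. This is not the OpenBSD code; child_exit_code is a hypothetical helper, and the EINTR retry is my own assumption about what a caller would want:</p>

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper. Returns 0 and stores the exit code if the child
 * exited normally, -1 if waitpid() failed or the child did not exit
 * normally. status is only inspected after waitpid() has succeeded. */
int child_exit_code(pid_t pid, int *code)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1;             /* status was never written; do not read it */
    if (!WIFEXITED(status))
        return -1;             /* terminated by a signal, for example */
    *code = WEXITSTATUS(status);
    return 0;
}
```

<p>The only difference from the pattern I flagged is that the return value of `waitpid()` is checked before `status` is looked at.</p>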
]]></description><pubDate>Wed, 20 May 2026 14:14:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48208184</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48208184</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48208184</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>> "Just don't do that" is the correct approach to errors<p>We have 54 years of empirical data that literally nobody can follow this approach and reach UB-freeness. To stick to the plan is more like the in-debt gambler who just needs to work their system for a little longer, and they'll become rich.<p>By this logic we don't need any traffic rules other than "just don't crash or hit anyone". And we can aspire to an absolute dictatorship, all we need to do is "just" choose the benevolent one.<p>Of course we should always try to not make mistakes. But given more than half a century of empirical data that nobody has been able to avoid UB, ever, it takes quite some hubris to say "but it might work for us".<p>> you seem to underestimate how wrong placing negative values in a signed char is<p>Shrug. You don't make that mistake. There are thousands of mistakes like it, especially in C or C++.<p>Of course "don't do that". That is not the same as "So just don't do that!". The former is good advice. The latter is one of a million rules, and to expect even experts (see OpenBSD) to never make a mistake is unrealistic to say the least.<p>You may even have spotted the UB in <a href="https://pooladkhay.com/posts/first-kernel-patch/" rel="nofollow">https://pooladkhay.com/posts/first-kernel-patch/</a>. But you would not spot all of them. Nobody in history has.</p>
]]></description><pubDate>Wed, 20 May 2026 13:55:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207860</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48207860</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207860</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Are you talking about creating a pointer (more than one item) past an array, or dereferencing that pointer? Both are currently UB.<p>For the former, I kinda get it. It may need to be there for cases like with segmented address space where p+10 could actually be a value less than p, for the eventually generated assembly. Maybe it should be fine to create such a pointer, but have it be "indeterminate value" or whatever, if you try to compare that pointer to anything? I don't know enough about compiler internals to say one way or the other.<p>Dereferencing, though, can only be UB. There may not be a "value" behind that address. There may be a motor that's been I/O mapped, or a self destruct button.</p>
]]></description><pubDate>Wed, 20 May 2026 13:38:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207557</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48207557</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207557</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>> It's honestly not that difficult to be rigorous.<p>Ok, let's try it. I pointed GPT 5.5 at the smallest part of cosmopolitan I could find in two seconds, net/finger. 299 lines.<p>describesyn.c:66: q + 13 constructs a pointer that can point well beyond the array plus one element.<p>C23 6.5.6p9:<p>>  If the pointer operand and the result do not point to elements of the same array object or one past the last element of the array object, the behavior is undefined<p>Now… you may be trolling, but I do feel like this disproves your assertion. Not you, not me, not Theo de Raadt, can avoid UB.<p>> the compiler generating code that checks for pointer overflow.<p>Do you need to check for that specifically? What pointer are you constructing that is not either pointing at a valid object correctly aligned (not UB), or exactly one past the last element of an array?<p>Do you mean for the latter, in case you have an array that ends on the maximum expressible pointer address?<p>I'm a bit unclear on what you mean by "pointer overflow". From mentioning 56-bit address spaces I'm guessing you mean the pointer wrapping, not what I pointed to in cosmopolitan, above?<p>Ok, to be clear that it's not just that one type, if you'll forgive that one:<p>net/http/base32.c:64: reads sc[0] even if sl=0. I assume this is never called with sl=0, so could be fine.<p>net/http/ssh.c:355: pointer address underflow? Should that be `e - lp`?<p>net/http/ssh.c:209/229: double destroy of key. Can this code path have non-null members, meaning double free? 
Looks like it, since line 207 does the parsing and checks that parse worked.<p>net/http/ssh.c:123: uses memset, which assumes that it sets member variable pointers to NULL (per my post, depending on that means depending on UB), and later these pointers are given to free(), so that's UB.<p>I won't look deeper into net/http, but presenting just the possibly incorrect remaining comments from jippity:<p><pre><code>  - ssh.c:211 and parsecidr.c:44: length-taking APIs use unbounded strstr() / strchr(), so explicit n with non-NUL-terminated input can read beyond the buffer.

  - tokenbucket.c:77 and tokenbucket.c:92: x >> (32 - c) is UB for c == 0 and for out-of-range c.

  - isacceptablehost.c:68: long numeric host labels can overflow signed int b before the function eventually rejects/accepts the host.</code></pre></p>
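<p>To illustrate the shift item only: a sketch of a guarded right shift, assuming a hypothetical helper called shr32_checked. I have not checked what the code in question intends for c == 0, so "shifting everything out yields 0" is my assumption, not a proposed patch:</p>

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the code under discussion.
 * Shifts x right by n bits. A shift count of 32 or more is undefined for a
 * 32-bit operand in C, so it is handled explicitly and yields 0 here. */
uint32_t shr32_checked(uint32_t x, unsigned n)
{
    return n < 32 ? x >> n : 0;
}
```

<p>Called as shr32_checked(x, 32u - c) with an unsigned c, the c == 0 case gives a count of 32 and takes the explicit branch. For c greater than 32 the subtraction wraps to a large unsigned value, which also takes the explicit branch.</p>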
]]></description><pubDate>Wed, 20 May 2026 12:05:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206377</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48206377</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206377</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Author here.<p>> It barely scratches the surface.<p>I agree. The point of the post is not to enumerate and explain the implications of all 283 uses of the word "undefined" in the standard. Nor enumerate all the things that are undefined by omission.<p>The point of the post is to say it's not possible to avoid them. Or at least, no human since the invention of C in 1972 has.<p>And if it's not succeeded for 54 years, "try harder", or "just never make a mistake", is at least not the solution.<p>The (one!) exploitable flaw found by Mythos in OpenBSD was an impressive endorsement of the OpenBSD developers, and yet as the post says, I pointed it at the simplest of their code and found a heap of UB.<p>Now, is it exploitable that `find` also reads the uninitialized auto variable `status` (UB) from a `waitpid(&status)` before checking if `waitpid()` returned error? (not reported) I can't imagine an architecture or compiler where it would be, no.<p>FTA:<p>> The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C and C++ code has UB.</p>
]]></description><pubDate>Wed, 20 May 2026 10:56:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205770</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205770</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205770</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Author here.<p>> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned71) for the referenced type, the behavior is undefined.<p>C23 6.3.2.3p7.</p>
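<p>A sketch of the usual way to stay clear of that paragraph, assuming a hypothetical helper called load_u32. Copying the bytes with memcpy() means no misaligned uint32_t pointer is ever created:</p>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. Reads a uint32_t in native byte order from any
 * byte address, aligned or not. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* the source is only ever accessed as bytes */
    return v;
}
```

<p>The tempting alternative, casting p to a uint32_t pointer and dereferencing it, is exactly the conversion that 6.3.2.3p7 makes undefined when p is not suitably aligned.</p>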
]]></description><pubDate>Wed, 20 May 2026 09:52:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205340</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205340</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>That is a typo, that I think I introduced when I went back to clarify that it applies to C++ too.<p>Will fix it.</p>
]]></description><pubDate>Wed, 20 May 2026 09:43:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205268</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205268</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205268</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Author here.<p>In the context of UB discussion, the arguments apply equally to C and C++.<p>How would you write that?<p>I entirely agree with all your points that C and C++ are completely different languages at this point. And yet I wanted to write this post about something that is true for both.</p>
]]></description><pubDate>Wed, 20 May 2026 09:42:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205255</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205255</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205255</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Author here.<p>I touched on this in the "it's not about optimizations" section. It's not that the compiler is out to get you. It's that you told it to do something it cannot express.<p>It's like if you slipped in a word in French, and the compiler, not being programmed for French, misheard the word as a false friend in English. The compiler had no way to represent the French word in its parse tree.<p>So no, it's not overly legalistic. Like if the compiler knows that this hardware can do unaligned memory access, but not atomic unaligned access, should it check for alignment in std::atomic<int> ptr but not in int ptr? Probably not, right?</p>
]]></description><pubDate>Wed, 20 May 2026 09:38:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205235</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205235</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205235</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Author here.<p>> The part about hardware is wrong BTW<p>Could you be more specific? I think by "wrong" you may mean "not actually relevant to UB", and you're right about that. If that's what you mean then that part is not for you. It's for the "but it's demonstrably fine" crowd.<p>> the hardware semantics are clearly defined<p>Yup. The article means to dive from the C abstract machine to illustrate how your defined intentions (in your head), written as UB C, get translated into defined hardware behavior that you did not intend.<p>I'm not saying the CPU has UB, and I wonder what part made you think I did.<p>That's what I mean by game of telephone. The UB parts get interpreted as real instructions by the hardware, and it will definitely do those things. But what are those things? It's not the things you intended, and any "common sense" reading of the C code is irrelevant, because the C representation of your intentions was UB.</p>
]]></description><pubDate>Wed, 20 May 2026 09:32:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205187</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205187</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205187</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Author here.<p>> The article suggests using LLMs to identify and fix UB. However as per the above, I think the issue is that we need more expert humans.<p>Yup. But the point of the article is that even expert humans cannot do this alone. And as I wrote, LLM+junior won't suffice either. We need LLM+senior experts.<p>And it's a problem that we have way more existing UB than expert capacity.<p>Now, will LLMs and experts <i>both</i> miss UB in some cases? Of course. There's no 100% solution. But LLMs, I claim, will <i>find</i> orders of magnitude more, with a low false positive rate, than any expert. Even if these expert humans (like in the OpenBSD case for the two bugs I found, one of which was UB) are given more than three decades to do it.<p>I didn't even use the best model, a complex code target, or much time. I just wanted to choose a target that has a high chance of having already been audited by very good experts.</p>
]]></description><pubDate>Wed, 20 May 2026 09:25:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205138</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205138</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205138</guid></item><item><title><![CDATA[New comment by thomashabets2 in "Everything in C is undefined behavior"]]></title><description><![CDATA[
<p>Author here.<p>As I stated:<p>> The following is not an attempt at enumerating all the UB in the world. It’s merely making the case that UB is everywhere, and if nobody can do it right, how is it even fair to blame the programmer? My point is that ALL nontrivial C/C++ code has UB.<p>It's about that point, not about how to avoid it. Because you can't.</p>
]]></description><pubDate>Wed, 20 May 2026 09:20:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48205110</link><dc:creator>thomashabets2</dc:creator><comments>https://news.ycombinator.com/item?id=48205110</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48205110</guid></item></channel></rss>