Hacker News: nlewycky

New comment by nlewycky in "Gecode is an open source C++ toolkit for developing constraint-based systems (2019)"

nlewycky — Sat, 05 Jul 2025 17:26:38 +0000

Gecode won the MiniZinc Challenge from its inception in 2008 until 2012, but OR-Tools has won gold every year from 2013 to 2024, and in 2024 it swept gold in all categories.

Why is Gecode interesting? Why use it over OR-Tools?

https://www.minizinc.org/challenge/

New comment by nlewycky in "Optimizers need a rethink"

nlewycky — Mon, 28 Oct 2024 18:08:47 +0000

Set e-graphs apart from e-matchers. An e-graph is a vector-of-sets, where each set contains equal representations of the same thing. What makes it space efficient is that your representation refers to things by index number in the vector, so if you have "3 * (x + 1)" as an expression, you store "x + 1" in vector entry #1 and "3 * #1" in entry #0. That way if "x + 1" is also known to be equivalent to "load ptr" then you add that to #1 and don't need to add anything to #0. In e-graphs, #0 and #1 are e-classes, the sets holding equivalence classes. There's no limit to what transforms you can use; the e-graph holds their results efficiently.
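
To make the vector-of-sets picture concrete, here is a minimal C++ sketch of that layout. The names `EGraph`, `add_class`, `add_equal` and `build_example` are made up for illustration, and expressions are stored as plain strings to keep it short. A production e-graph would also need a way to merge two e-classes, which this sketch leaves out.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One e-class: a set of expressions known to be equal to each other.
// Expressions are plain strings here; operands name other e-classes as "#N".
using EClass = std::set<std::string>;

struct EGraph {
    std::vector<EClass> classes;

    // Create a new e-class holding a single expression; return its index.
    std::size_t add_class(const std::string& expr) {
        classes.push_back(EClass{expr});
        return classes.size() - 1;
    }

    // Record that `expr` is another representation of e-class `id`.
    void add_equal(std::size_t id, const std::string& expr) {
        assert(id < classes.size());
        classes[id].insert(expr);
    }
};

// Build the example from the comment: #0 = "3 * #1", #1 = "x + 1".
inline EGraph build_example() {
    EGraph g;
    g.add_class("3 * #1");       // e-class #0
    g.add_class("x + 1");        // e-class #1
    g.add_equal(1, "load ptr");  // learned later: x + 1 == load ptr
    return g;
}
```

After `build_example()`, e-class #1 holds both "x + 1" and "load ptr", while #0 still holds only "3 * #1": the new equality cost one set insertion and nothing in #0 changed.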

E-matchers are algorithms over e-graphs that efficiently search for a pattern in an e-graph. For instance, if you want to find "(load ptr) * (load ptr)" then you have to find "#x * #x" for all x in all e-classes, then check whether "load ptr" is a member of e-class #x. This is where you get limits on what you can match. "Pattern matching" style transforms are easy and fast using e-matchers, things like "x * 2" -> "x << 1", but beyond that they don't help.
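
As a rough illustration of that search, the sketch below does the naive version in C++: scan every e-class for a "#x * #x" node, then check whether "load ptr" is in e-class #x. `ENode`, `contains_leaf` and `match_square_of` are hypothetical names, and "load ptr" is treated as an opaque leaf for brevity. Real e-matchers are considerably smarter about avoiding the full scan.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// An e-node: an operator plus operand e-class indices (empty for leaves).
struct ENode {
    std::string op;
    std::vector<std::size_t> args;
};

// An e-class is a list of e-nodes known to be equal; the e-graph is a vector of them.
using EGraph = std::vector<std::vector<ENode>>;

// True if e-class `id` contains a leaf node spelled `leaf`.
bool contains_leaf(const EGraph& g, std::size_t id, const std::string& leaf) {
    assert(id < g.size());
    for (const ENode& n : g[id])
        if (n.args.empty() && n.op == leaf) return true;
    return false;
}

// Find every e-class containing "#x * #x" where e-class #x also contains `leaf`.
std::vector<std::size_t> match_square_of(const EGraph& g, const std::string& leaf) {
    std::vector<std::size_t> hits;
    for (std::size_t id = 0; id < g.size(); ++id) {
        for (const ENode& n : g[id]) {
            bool is_square =
                n.op == "*" && n.args.size() == 2 && n.args[0] == n.args[1];
            if (is_square && contains_leaf(g, n.args[0], leaf)) {
                hits.push_back(id);
                break;  // one hit per e-class is enough
            }
        }
    }
    return hits;
}
```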

There's an optimizer problem where you have "load ptr" and you solve ptr and figure out it's pointing to a constant value and you replace the load instruction with the constant value. Later, you get to code emission and you realize that you can't encode your constant in the CPU instruction because there aren't enough bits. You now need to take the constant and stuff it into a constant pool and emit a load for it. If you had stored them in an e-graph, you could have chosen to use the load you already had.

Suppose you wanted to do sparse conditional constant/constant-range propagation but your IR uses an e-graph. You could analyze each expression in the e-class, intersect them, and annotate the resulting constant-range on the whole e-class. Then do SCCP as normal looking up the e-class for each instruction as you go.
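
The intersection step could look something like the following C++ sketch. `Range` and `intersect_all` are illustrative names, and the per-expression ranges are assumed to have been computed already by whatever analysis you run on each member of the e-class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Closed integer interval [lo, hi].
struct Range {
    std::int64_t lo;
    std::int64_t hi;
};

// Intersect the ranges derived from each member of one e-class.
// Every member computes the same value, so the value lies in all of them.
// Returns nullopt when the intersection is empty, meaning the inputs
// contradict each other or the code is unreachable.
std::optional<Range> intersect_all(const std::vector<Range>& members) {
    assert(!members.empty());
    Range out = members.front();
    for (const Range& r : members) {
        out.lo = std::max(out.lo, r.lo);
        out.hi = std::min(out.hi, r.hi);
    }
    if (out.lo > out.hi) return std::nullopt;
    return out;
}
```

For example, if three members of an e-class give [0, 100], [50, 200] and [-10, 60], the whole e-class gets annotated with [50, 60], which is tighter than any single member.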

New comment by nlewycky in "Optimizers need a rethink"

nlewycky — Sun, 27 Oct 2024 22:20:02 +0000

> I was bitten by one that injected timing attacks into certain integer operations by branching on the integer data in order to optimize 32-bit multiplications on 8-bit microcontrollers.

FWIW, I think this should be considered a language design problem rather than an optimizer design problem. Black box optimizer behaviour is good for enabling language designs that have little connection to hardware behaviour, and good for portability including to different extensions within an ISA.

C doesn't offer a way to express any timing guarantees. The compiler, OS, CPU designer, etc. can't even do the right thing if they wanted to because the necessary information isn't being received from the programmer.

New comment by nlewycky in ""We took on Google and they were forced to pay out £2B""

nlewycky — Sun, 27 Oct 2024 00:23:38 +0000

> No, literally nobody has ever wanted to be directed to "foundem", which is a scam.

Could you defend the statement that "foundem" is a scam? If it really was then that adds important context to the ruling, but I'll need something more concrete than hyperbole.

New comment by nlewycky in "96% of climate policy since 1998 failed"

nlewycky — Mon, 30 Sep 2024 06:57:54 +0000

2020 was one outlier data point caused by COVID?

New comment by nlewycky in "Undefined behavior in C is a reading error (2021)"

nlewycky — Sun, 08 Sep 2024 09:03:36 +0000

> > the epilogue is only correct if the function has void return type

> That's a lie.

All C functions return via a return statement with expression (only for non-void functions), a return statement without an expression (only for void functions) or by the closing of function scope (only for void functions). True?

The simple "spit out a block of assembly for each thing in the C code" compiler spits out the epilogue that works for void-returning functions, because we reach the end of the function with no return statement. That epilogue might happen to work for non-void functions too, but unless we specify an ABI and examine that case, it isn't guaranteed to work for them. So it's not correct to emit it. True?

Where's the lie?

> > Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?

> Don't come onto HN with the intent of engaging in bad faith.

Always! You too!

The text you quoted was referring to how real compilers handle falling off the end of a non-void function today with -fsanitize=return from UBSan. If I understand you correctly, in your reading a compiler with UBSan enabled is non-conforming because it fails to ignore the situation. That's not an argument as to whether your reading is right or wrong, but I do think UBSan compilation ought to be standard conforming, even if that means we need to add it to the Standard.

To the larger point, because the Standard doesn't define what "ignore" means, the user and implementer can't use it to pin down whether a given UB was ignored or not, and thus whether a given program was miscompiled or not. A compiler rewrites the code into its IR -- could be Z3 SMT solver or raw Turing Machine or anything -- then writes code back out. Can ignoring be done at any stage in the middle? Once the code has been converted and processed, how can you tell from the assembly output what's been ignored and what hasn't? If you demand certain assembly or semantics, isn't that just defining undefined behaviour? If you don't demand them, and leave the interpretation of "ignore" to the particular implementation of a compiler, yet any output could be valid for some potential design of compiler, why not allow any compiler to emit whatever it wants?

New comment by nlewycky in "Undefined behavior in C is a reading error (2021)"

nlewycky — Thu, 05 Sep 2024 18:39:00 +0000

I'm a huge proponent of UBSan and ASan. Genuine curiosity, what don't you like about them?

FWIW, there once was a real good-faith effort to clean up the problems, Friendly C by Prof Regehr, https://blog.regehr.org/archives/1180 and https://blog.regehr.org/archives/1287 .

It turns out it's really hard. Let's take an easy-to-understand example, signed integer overflow. C has unsigned types with guaranteed wraparound (modulo 2^N) rules, and signed types with UB on overflow, which leaves the compiler free to rewrite the expression using the field axioms, if it wants to. "a = b * c / c;" may emit the multiply and divide, or it can eliminate the pair and replace the expression with "a = b;".
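
Here is that expression written out for both kinds of type, as a small C++ sketch. The function names are made up, and the `assert` on `c` is only there to keep division by zero out of the picture.

```cpp
#include <cassert>
#include <climits>

// Signed: overflow in b * c is undefined, so a compiler may assume it never
// happens and simplify the whole expression to `b`.
int scale_signed(int b, int c) {
    assert(c != 0);
    return b * c / c;
}

// Unsigned: b * c wraps modulo 2^N, so the multiply and divide must be kept
// because the result is not always b.
unsigned scale_unsigned(unsigned b, unsigned c) {
    assert(c != 0);
    return b * c / c;
}
```

With inputs that don't overflow, both return `b`. But `scale_unsigned(UINT_MAX, 2)` returns `UINT_MAX / 2`, because the product wraps before the division, so the unsigned version cannot be folded to `b`. The equivalent signed call would be UB, which is exactly what permits the fold.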

Why do we connect interpreting the top bit as a sign bit with whether field axiom based rewriting should be allowed? It would make sense to have a language which splits those two choices apart, but if you do that, either the result isn't backwards compatible with C anyways or it is but doesn't add any safety to old C code even as it permits you to write new safe C code.

Sometimes the best way to rewrite an expression is not what you'd consider "simplified form" from school because of the availability of CPU instructions that don't match simple operations, and also because of register pressure limiting the number of temporaries. There's real world code out there that has UB in simple integer expressions and relies on it being run in the correct environment, either x86-64 CPU or ARM CPU. If you define one specific interpretation for the same expression, you are guaranteed to break somebody's real world "working" code.

I claim without evidence that trying to fix up C's underlying issues is all decisions like this. That leads to UBSan as the next best idea, or at least, something we can do right now. If nothing else it has pedagogical value in teaching what the existing rules are.

New comment by nlewycky in "Undefined behavior in C is a reading error (2021)"

nlewycky — Thu, 05 Sep 2024 06:54:40 +0000

This is the most normal case though, isn't it? Suppose a very simple compiler, one that sees a function so it writes out the prologue, it sees the switch so it writes out the jump tables, it sees each return statement so it writes out the code that returns the values, then it sees the function closing brace and writes out a function epilogue. The problem is that the epilogue is wrong because there is no return statement returning a value, the epilogue is only correct if the function has void return type. Depending on ABI, the function returns to a random address.

Most of the time people accuse compilers of finding and exploiting UB and say they wish it would just emit the straightforward code, as close to writing out assembly matching the input C code expression by expression as possible. Here you have an example where the compiler never checked for UB, let alone proved the presence of UB in any sense. It trusted the user, it acted like a high-level assembler, yet this compiler is still not ignoring UB for you? What does it take? Adding runtime checks for the UB case is ignoring? Having the compiler find the UB paths to insert safety code is ignoring?

New comment by nlewycky in "Undefined behavior in C is a reading error (2021)"

nlewycky — Thu, 05 Sep 2024 03:31:50 +0000

This construction is called a "false range" in English.

https://www.chicagomanualofstyle.org/qanda/data/faq/topics/C... https://www.cjr.org/language_corner/out_of_range.php

The wording change from Permissible to Possible and making it non-normative was an attempt to clarify that the list of behaviors that follows is a false range and not an exhaustive list.

It's a submarine change because in the eyes of the committee, this is not a change, merely a clarification of what it already said, to guard against ongoing misinterpretation.

New comment by nlewycky in "Undefined behavior in C is a reading error (2021)"

nlewycky — Thu, 05 Sep 2024 03:14:45 +0000

Why isn't that fine? The compiler ignored the undefined behavior it didn't detect.

New comment by nlewycky in "Non-computability of solutions of certain equations on digital computers (2022)"

nlewycky — Sun, 04 Aug 2024 17:53:45 +0000

> The continuity of physical processes is not something that can be proved.

I agree with you, but is there any peer-reviewed publication that can be cited? The idea makes sense to me, firstly the Reals \ Inaccessible Reals = Computable Reals, secondly you can't ever input an inaccessible real to an experiment nor retrieve one out of an experiment -- but then I'm not completely certain in making the conclusion that no experiment can be devised which shows that inaccessible reals exist in physical space.

I am concerned about this in the field of complexity analysis of quantum computers too, I think that the use of reals in physics is leading to mathematically correct but non-physical results about complexity theory of quantum computers. Having a paper to point at and say "look, stop assuming your Bloch spheres are backed by uncountable sets, it's leaking non-computable assumptions into your analysis of computation" would be helpful.

New comment by nlewycky in "Non-computability of solutions of certain equations on digital computers (2022)"

nlewycky — Sun, 04 Aug 2024 02:38:26 +0000

From the abstract, "simulations of such systems are usually done on digital computers, which are able to compute with finite sequences of rational numbers only."

Not at all! Digital computers can use computable reals, which are defined as any function from a rational value (the error bound) to a rational value which is within the supplied error bound of the true value. Do not mistake this for computing in the rationals: these functions which perform the described task are the computable real numbers. There are countably infinitely many of these functions, one for each computable real. For instance, note that you can always compare two rationals for equality, but you can't always compare two computable reals for equality, just like reals.
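
A minimal C++ sketch of the idea, with two simplifying assumptions of mine: the error bound is restricted to powers of two (you pass k and get an answer within 2^-k), and the rationals returned have power-of-two denominators. `Dyadic`, `Real`, `sqrt2` and `add` are illustrative names and are not taken from the paper linked below.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// A rational with a power-of-two denominator: value = num / 2^k.
struct Dyadic {
    std::int64_t num;
    int k;
};

// A computable real, restricted in this sketch to error bounds of the form
// 2^-k: given k, return a rational within 2^-k of the true value.
using Real = std::function<Dyadic(int)>;

// sqrt(2): the largest n with n^2 <= 2 * 4^k satisfies
// n / 2^k <= sqrt(2) < (n + 1) / 2^k.
Real sqrt2() {
    return [](int k) {
        assert(k >= 0 && k <= 30);  // keep n * n inside 64 bits
        std::uint64_t target = std::uint64_t{2} << (2 * k);  // 2 * 4^k
        std::uint64_t lo = 0, hi = std::uint64_t{1} << (k + 1);
        while (lo < hi) {  // binary search for the largest n with n * n <= target
            std::uint64_t mid = lo + (hi - lo + 1) / 2;
            if (mid * mid <= target) lo = mid; else hi = mid - 1;
        }
        return Dyadic{static_cast<std::int64_t>(lo), k};
    };
}

// Sum: ask each operand for one extra bit so the two errors total under 2^-k.
Real add(Real a, Real b) {
    return [a, b](int k) {
        Dyadic x = a(k + 1), y = b(k + 1);
        return Dyadic{x.num + y.num, k + 1};
    };
}
```

Note what is missing: there is no `equals`. You can ask `sqrt2()` and `add(sqrt2(), sqrt2())` for as many bits as you like, but no finite number of queries can confirm that two such functions denote the same real.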

Hans Boehm (of Boehm garbage collector fame) has been working on this for a long time, here is a recent paper on it: https://dl.acm.org/doi/pdf/10.1145/3385412.3386037

New comment by nlewycky in "The difference between undefined behavior and ill-formed C++ programs"

nlewycky — Sat, 03 Aug 2024 22:47:19 +0000

The compiler may remove the nullptr check in:

  ptr->foo = 1;
  if (ptr == nullptr)
     return;

but it may not remove the nullptr check in:

  if (ptr == nullptr)
     return;
  ptr->foo = 1;

New comment by nlewycky in "How GCC and Clang handle statically known undefined behaviour"

nlewycky — Tue, 25 Jun 2024 15:23:28 +0000

> If the print blocks indefinitely then that division will never execute, and GCC must compile a binary that behaves correctly in that case.

Is `printf` allowed to loop infinitely? Its behaviour is defined in the language standard and GCC does recognize it as not being a user-defined function.

New comment by nlewycky in "Microsoft Chose Profit over Security, Whistleblower Says"

nlewycky — Wed, 19 Jun 2024 05:52:15 +0000

> You're right, most times organizations need a fire lit underneath them to change, for Google, it probably was the NSA annotation "SSL added and removed here :^)" on a slide showing Google's architecture from the Snowden leaks.

As an insider, it was not. The move to zero-trust started with "A new approach to China": https://googleblog.blogspot.com/2010/01/new-approach-to-chin...

New comment by nlewycky in "TCP connection timeout mystery"

nlewycky — Mon, 25 Mar 2024 21:51:59 +0000

The symptoms match my experience with a mid-network firewall/router that is not aware of TCP window scaling stripping out the scaling factor while leaving the window scaling feature enabled. See https://lwn.net/Articles/92727/
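
To show why that stalls a connection, here is the window arithmetic as a small C++ sketch. The function names and the numbers in the usage note are mine and are not taken from the linked article.

```cpp
#include <cassert>
#include <cstdint>

// Window the peer believes it may send into: the 16-bit header field shifted
// left by the scale factor the peer recorded during the handshake.
std::uint32_t effective_window(std::uint16_t field, unsigned shift) {
    assert(shift <= 14);  // RFC 7323 caps the shift count at 14
    return static_cast<std::uint32_t>(field) << shift;
}

// Header field a receiver writes when it believes `shift` was negotiated.
std::uint16_t advertised_field(std::uint32_t window_bytes, unsigned shift) {
    assert(shift <= 14);
    return static_cast<std::uint16_t>(window_bytes >> shift);
}
```

A receiver with a 1 MiB window and a shift of 7 writes 8192 into the header. A sender that saw the real shift reads that as 1048576 bytes. A sender whose copy of the shift was rewritten to 0 reads it as 8192 bytes, and throughput collapses while both ends believe scaling is working.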

New comment by nlewycky in "The return of the frame pointers"

nlewycky — Sun, 17 Mar 2024 05:24:14 +0000

It caused a problem when building inline assembly heavy code that tried to use all the registers, frame pointer register included.

New comment by nlewycky in "Undefined Behavior in C and C++"

nlewycky — Fri, 08 Mar 2024 03:21:54 +0000

> If I put in a dereference, I expect a dereference to happen. Not dereferencing the pointer when I wrote a dereference operator seems like going too far.

Surely not? I mean, you probably didn't intend to include unevaluated contexts like "sizeof(*ptr)" where putting in a memory access is forbidden, but I think nearly all programmers fully expect the compiler to delete the dead store in "ptr = a; ptr = b;" or "ptr = x; free(ptr);" and would get annoyed if it didn't. They would be especially annoyed if we couldn't take a scalar computation in a loop, move the memory access to a register, then store it to memory only once when we're done.

I once did a cleanup of undefined behaviour dereferencing NULL pointers (-fsanitize=null) and I got a lot of pushback from people complaining about "&ptr" where the ptr is NULL, because the compiler doesn't emit any assembly for that, so their code is just fine as is.

The rule for memory is that all memory you've stored to has an effective type -- same as the static types but for addresses at runtime -- and a pointer has to point to an object with the effective type matching the pointer's static type. Further details aside (uninitialized pointers, pointers to data you just freed, freshly malloc'd memory which has no effective type yet, unions), when you think of it in this model, the fact you can't have an int* and a float* pointing to the same memory feels natural.

New comment by nlewycky in "Why do we need for an Undefined Behavior Annex to C++"

nlewycky — Fri, 01 Mar 2024 18:51:38 +0000

The hard part of UB isn't where it's mentioned in the standard. The problem is when the standard says "in case A, X occurs, in case B, Y occurs" then somebody invents a situation where neither A nor B apply.

New comment by nlewycky in "Undefined Behavior in C and C++"

nlewycky — Sun, 25 Feb 2024 08:38:20 +0000

> Deleting code that isn’t dead, it just doesn’t have a universally defined behavior, is the issue.

Can I delete "if ((x & 3) == 16)" without a warning? There is no 'x' which makes that expression true, so I can safely fold it to false without a warning?

Can I delete "if (x + 1 < x)" without a warning? There is no signed 'x' which makes that expression true, so I can safely fold it to false without a warning?
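
For reference, here are those two tests as compilable C++, plus the unsigned counterpart where the same comparison is not a tautology. The function names are mine. The signed "x + 1 < x" is deliberately not given a function of its own, since evaluating it at INT_MAX is the UB in question; `next_would_overflow` is one well-defined way to ask what that test was presumably after.

```cpp
#include <cassert>
#include <climits>

// (x & 3) can only be 0, 1, 2 or 3, so this is false for every x.
bool low_bits_equal_16(int x) { return (x & 3) == 16; }

// Unsigned arithmetic wraps, so this is true exactly when x == UINT_MAX.
bool wraps_unsigned(unsigned x) { return x + 1 < x; }

// A well-defined way to ask the signed question without overflowing.
bool next_would_overflow(int x) { return x == INT_MAX; }
```

Note the parentheses in `(x & 3) == 16`: in C and C++ `==` binds tighter than `&`, so the unparenthesized form parses as `x & (3 == 16)`, which is a different tautology.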

How about this:

    int x = 7;
    call_function_outside_this_file();
    if (x != 7) { /* dead */ }

Does deleting the code require a warning or no?

Or this:

    void f(int *x, float *y) {
      *x = 1;
      *y = 2;
      if (*x != 1) { /* dead */ }
    }

A float cannot alias an int, so '*x' cannot have changed. Warning or no?

The problem with UB is that you can use it to set up impossible situations, like creating an 'x' where (x & 3) == 16 is true, or modifying through a pointer a variable whose address was never taken, and so on. If you account for UB then "code that doesn't have a universally defined behaviour" becomes all code.

Ideally I think the first two examples should have warnings, though not because we delete the code, and the last two shouldn't? The warning should be because it's a tautology so the human likely didn't mean to write that (for instance if the human wrote it indirectly through macros, then we shouldn't warn on it).