<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: boomanaiden154</title><link>https://news.ycombinator.com/user?id=boomanaiden154</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 01 Jul 2026 03:18:51 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=boomanaiden154" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by boomanaiden154 in "Analyzing Geekbench 6 under Intel's BOT"]]></title><description><![CDATA[
<p>Found it. It was <a href="https://www.phoronix.com/news/Intel-Thin-Layout-Optimizer" rel="nofollow">https://www.phoronix.com/news/Intel-Thin-Layout-Optimizer</a>.<p>It was open source, but has since been deprecated.</p>
]]></description><pubDate>Wed, 01 Apr 2026 05:08:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47597048</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=47597048</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47597048</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Analyzing Geekbench 6 under Intel's BOT"]]></title><description><![CDATA[
<p>Propeller can’t really do many instruction level modifications due to how it works (constructs a layout file that then gets passed to the linker).<p>BOLT could do this, but does not as far as I’m aware.<p>Most of vectorization like this is also probably better done in a compiler middle end. At least in LLVM, the loop vectorizer and especially the SLP Vectorizer do a decent job of picking up most of the gains.<p>You might be able to pick up some gains by doing it post-link at the MC level, but writing an IR level SLP Vectorizer is already quite difficult.</p>
]]></description><pubDate>Wed, 01 Apr 2026 05:05:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47597037</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=47597037</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47597037</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Analyzing Geekbench 6 under Intel's BOT"]]></title><description><![CDATA[
<p>I might be thinking of a different project then...<p>I swore Intel had their own PLO tool, but I can only find <a href="https://github.com/clearlinux/distribution/issues/2996" rel="nofollow">https://github.com/clearlinux/distribution/issues/2996</a>.</p>
]]></description><pubDate>Wed, 01 Apr 2026 05:01:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47597021</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=47597021</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47597021</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Analyzing Geekbench 6 under Intel's BOT"]]></title><description><![CDATA[
<p>Post-link optimization (PLO) tools have been around for quite a while. In particular, Meta’s BOLT (fully upstream in LLVM) and Google’s Propeller (somewhat upstream in LLVM, but fully open source) have been around for 5+ years at this point.<p>It doesn’t seem like Intel’s BOT delivers more performance gains, and it is closed source.</p>
]]></description><pubDate>Wed, 01 Apr 2026 04:40:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47596901</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=47596901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47596901</guid></item><item><title><![CDATA[LLVM AI Policy and Automatic Bazel Fixes]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.phoronix.com/news/LLVM-AI-Tool-Policy-RFC">https://www.phoronix.com/news/LLVM-AI-Tool-Policy-RFC</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46369415">https://news.ycombinator.com/item?id=46369415</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 23 Dec 2025 21:01:46 +0000</pubDate><link>https://www.phoronix.com/news/LLVM-AI-Tool-Policy-RFC</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=46369415</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46369415</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Making the Clang AST Leaner and Faster"]]></title><description><![CDATA[
<p>Quite a few patches have landed. A couple features using this have already shipped in Apple’s downstream clang.</p>
]]></description><pubDate>Thu, 13 Nov 2025 04:01:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=45910461</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=45910461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45910461</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>You can make the synthetic benchmarks relatively accurate, it just takes effort. The compile-time hit and additional effort is often worth it for the extra couple percent for important applications.<p>Performance is also pretty different on the scales that performance engineers are interested in for these sorts of production codes, but without the build system scalability problems that LTO has. The original AutoFDO paper shows an improvement of 10.5%->12.5% going from AutoFDO to instrumented PGO. That is pretty big. It's probably even bigger with newer instrumentation based techniques like CSPGO.<p>They also mention the exact reasons that AutoFDO will not perform as well, with issues in debug info and losing profile accuracy due to sampling inaccuracy.<p>I couldn't find any numbers for Chrome, but I am reasonably certain that they have tried both and continue to use instrumented PGO for the extra couple percent. There are other pieces of the Chrome ecosystem (specifically the ChromeOS kernel) that are already optimized using sampling-based profiling. It's been a while since I last talked to the Chromium toolchain people about this though. I also remember hearing them benchmark FEPGO vs IRPGO at some point and concluding that IRPGO was better.</p>
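<p>A rough sketch of the arithmetic behind those figures, assuming the 10.5% and 12.5% numbers are both speedups over the same non-profiled baseline (the function names are illustrative, not from the paper):</p>

```python
def fraction_of_gain(autofdo_gain: float, pgo_gain: float) -> float:
    """Share of the instrumented-PGO gain that AutoFDO recovers."""
    return autofdo_gain / pgo_gain


def extra_speedup(autofdo_gain: float, pgo_gain: float) -> float:
    """Speedup of the instrumented-PGO build relative to the AutoFDO build.

    Assumes both gains are throughput improvements over one shared baseline.
    """
    return (1.0 + pgo_gain) / (1.0 + autofdo_gain) - 1.0


if __name__ == "__main__":
    autofdo, pgo = 0.105, 0.125  # figures quoted above
    print(f"AutoFDO recovers {fraction_of_gain(autofdo, pgo):.0%} of the gain")
    print(f"Instrumented PGO is {extra_speedup(autofdo, pgo):.1%} faster than AutoFDO")
```

<p>Under that assumption, AutoFDO recovers about 84% of the instrumented gain, and the instrumented build ends up roughly 1.8% faster than the AutoFDO build.</p>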
]]></description><pubDate>Sat, 29 Jun 2024 16:43:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=40831783</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40831783</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40831783</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>I'm not sure where you're getting your information from.<p>Chrome (and many other performance-critical workloads) is using instrumented PGO because it gives better performance gains, not because it's a more mature path. AutoFDO is only used in situations where collecting data with an instrumented build is difficult.</p>
]]></description><pubDate>Sat, 29 Jun 2024 01:45:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=40827257</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40827257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40827257</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>It's not called AutoFDO. AutoFDO refers to a specific sampling-based profile technique out of Google (<a href="https://dl.acm.org/doi/abs/10.1145/2854038.2854044" rel="nofollow">https://dl.acm.org/doi/abs/10.1145/2854038.2854044</a>). Sometimes people will refer to that as PGO though (with PGO and FDO being somewhat synonymous, but PGO seeming to be the preferred term in the open source LLVM world). Chrome specifically uses instrumented PGO which is very much not AutoFDO.<p>PGO works just fine in Rust and has support built into the compiler (<a href="https://doc.rust-lang.org/rustc/profile-guided-optimization.html" rel="nofollow">https://doc.rust-lang.org/rustc/profile-guided-optimization....</a>).</p>
]]></description><pubDate>Fri, 28 Jun 2024 23:25:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=40826417</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40826417</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40826417</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>Do you have more information on how the dataset was constructed?<p>It seems like somehow build systems were invoked given the different targets present in the final version?<p>Was it mostly C/C++ (if so, how did you resolve missing includes/build flags), or something else?</p>
]]></description><pubDate>Fri, 28 Jun 2024 22:38:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=40826071</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40826071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40826071</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>I'm not sure it's likely that the LLM here learned from gcc. The size optimization work here is focused on learning phase orderings for LLVM passes/the LLVM pipeline, which wouldn't be at all applicable to gcc.<p>Additionally, they train approximately half on assembly and half on LLVM-IR. They don't talk much about how they generate the dataset other than that they generated it from the CodeLlama dataset, but I would guess they compile as much code as they can into LLVM-IR and then just lower that into assembly, leaving gcc out of the loop completely for the vast majority of the compiler specific training.</p>
]]></description><pubDate>Fri, 28 Jun 2024 20:53:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=40825182</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40825182</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40825182</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>This would be difficult to deploy as-is in production.<p>There are correctness issues mentioned in the paper regarding adjusting phase orderings away from the well-trodden O0/O1/O2/O3/Os/Oz path. Their methodology works quite well for a research project, but I personally wouldn't trust it in production. While some obvious issues can be caught by a small test suite and unit tests, there are others that won't be, and that's really risky in production scenarios.<p>There are also some practical software engineering concerns, like deployment in the compiler. There is actually tooling in upstream LLVM to do this (<a href="https://www.youtube.com/watch?v=mQu1CLZ3uWs" rel="nofollow">https://www.youtube.com/watch?v=mQu1CLZ3uWs</a>), but running models on a GPU would be difficult and I would expect CPU inference to massively blow up compile times.</p>
]]></description><pubDate>Fri, 28 Jun 2024 20:46:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=40825118</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40825118</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40825118</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>Sure, performance is more interesting, but it's significantly harder.<p>With code size, you just need to run the code through the compiler and you have a deterministic measurement for evaluation.<p>Performance has no such metric. Benchmarks are expensive and noisy. Cost models seem like a promising direction, but they aren't really there yet.</p>
]]></description><pubDate>Fri, 28 Jun 2024 20:42:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=40825074</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40825074</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40825074</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>I'm reasonably certain the authors are aware of alive2.<p>The problem with using alive2 to verify LLM based compilation is that alive2 isn't really designed for that. It's an amazing tool for catching correctness issues in LLVM, but it's expensive to run and will time out reasonably often, especially on cases involving floating point. It's explicitly designed to minimize the rate of false-positive correctness issues to serve the primary purpose of alerting compiler developers to correctness issues that need to be fixed.</p>
]]></description><pubDate>Fri, 28 Jun 2024 20:40:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=40825046</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40825046</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40825046</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>PGO can be used in such situations, but the profile needs to be checked in. Same code + same profile -> same binary (assuming the compiler is deterministic, which is tested quite extensively).<p>There are several big projects that use PGO (like Chrome), and you can get a deterministic build at whatever revision using PGO as the profiles are checked in to the repository.</p>
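<p>A minimal sketch of how one might check that claim, assuming you already have the bytes of two binaries built from the same revision with the same checked-in profile (the helper names are hypothetical and not part of any real build system):</p>

```python
import hashlib


def digest(binary: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact's contents."""
    return hashlib.sha256(binary).hexdigest()


def builds_match(first: bytes, second: bytes) -> bool:
    """True when two build outputs are byte-for-byte identical."""
    return digest(first) == digest(second)


if __name__ == "__main__":
    # Stand-in bytes; in practice these would be read from the two output files.
    build_a = b"same code + same profile"
    build_b = b"same code + same profile"
    print("reproducible" if builds_match(build_a, build_b) else "outputs differ")
```

<p>If the digests differ for identical inputs, something in the build (or the compiler) is non-deterministic.</p>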
]]></description><pubDate>Fri, 28 Jun 2024 20:37:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=40825015</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40825015</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40825015</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>I would not say we are anywhere close to perfect in compilation.<p>Even just looking at inlining for size, there are multiple recent studies showing ~10+% improvement (<a href="https://dl.acm.org/doi/abs/10.1145/3503222.3507744" rel="nofollow">https://dl.acm.org/doi/abs/10.1145/3503222.3507744</a>, <a href="https://arxiv.org/abs/2101.04808" rel="nofollow">https://arxiv.org/abs/2101.04808</a>).<p>There is a massive amount of headroom, and even tiny bits still matter as ~0.5% gains on code size, or especially performance, can be huge.</p>
]]></description><pubDate>Fri, 28 Jun 2024 20:34:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=40824997</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40824997</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40824997</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Meta LLM Compiler: neural optimizer and disassembler"]]></title><description><![CDATA[
<p>Right, it's only solving phase ordering.<p>In practice though, correctness even over ordering of hand-written passes is difficult. Within the paper they describe a methodology to evaluate phase orderings against a small test set as a smoke test for correctness (PassListEval) and observe that ~10% of the phase orderings result in assertion failures/compiler crashes/correctness issues.<p>You will end up with a lot more correctness issues adjusting phase orderings like this than you would using one of the more battle-tested default optimization pipelines.<p>Correctness in a production compiler is a pretty hard problem.</p>
]]></description><pubDate>Fri, 28 Jun 2024 20:32:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=40824971</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=40824971</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40824971</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Cranelift code generation comes to Rust"]]></title><description><![CDATA[
<p>LLVM doesn’t really spend any runtime solving the phase ordering problem, since the pass pipelines are static. There have been proposals to dynamically adjust the pipeline based on various factors, but those are a ways out from happening.</p>
]]></description><pubDate>Mon, 18 Mar 2024 15:58:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=39746245</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=39746245</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39746245</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "High-Throughput, Formal-Methods-Assisted Fuzzing for LLVM [pdf]"]]></title><description><![CDATA[
<p>Pretty much this. It's called Alive2.<p><a href="https://dl.acm.org/doi/abs/10.1145/3453483.3454030" rel="nofollow">https://dl.acm.org/doi/abs/10.1145/3453483.3454030</a></p>
]]></description><pubDate>Fri, 12 Jan 2024 09:04:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=38965749</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=38965749</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38965749</guid></item><item><title><![CDATA[New comment by boomanaiden154 in "Large Language Models for Compiler Optimization"]]></title><description><![CDATA[
<p>ML for phase ordering is just one problem that ML could solve within compilers.<p>Heuristic replacement (like loop unrolling) is another big one. For the specific case of loop unrolling, I would think lower-level details like how much iCache pressure the unrolling creates/whether or not the loop could fit in the DSB would matter more.<p>For your point about existing IRs being too low-level, there has been a large push to try and work on that. MLIR has been used pretty extensively for that problem in ML applications, and languages like Rust have multiple higher-level IRs. There's also a preliminary implementation of a Clang-IR for C/C++, and there's even been some work on higher-level representations within LLVM-IR itself.</p>
]]></description><pubDate>Mon, 18 Sep 2023 06:24:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=37552890</link><dc:creator>boomanaiden154</dc:creator><comments>https://news.ycombinator.com/item?id=37552890</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37552890</guid></item></channel></rss>