<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mikemike</title><link>https://news.ycombinator.com/user?id=mikemike</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 06 Apr 2026 07:42:46 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mikemike" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mikemike in "A Walk with LuaJIT"]]></title><description><![CDATA[
<p>A good read if you want to learn (more than you ever wanted) about stack frame unwinding in conjunction with a JIT compiler.<p>The only correction I have: LuaJIT _does_ have 64-bit integers, e.g. 0x0123456789abcdefLL.</p>
]]></description><pubDate>Wed, 13 Nov 2024 19:59:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=42129318</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=42129318</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42129318</guid></item><item><title><![CDATA[A Walk with LuaJIT]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.polarsignals.com/blog/posts/2024/11/13/lua-unwinding">https://www.polarsignals.com/blog/posts/2024/11/13/lua-unwinding</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42129277">https://news.ycombinator.com/item?id=42129277</a></p>
<p>Points: 20</p>
<p># Comments: 2</p>
]]></description><pubDate>Wed, 13 Nov 2024 19:54:27 +0000</pubDate><link>https://www.polarsignals.com/blog/posts/2024/11/13/lua-unwinding</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=42129277</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42129277</guid></item><item><title><![CDATA[New comment by mikemike in "Modernizing compiler design for Carbon's toolchain [video]"]]></title><description><![CDATA[
<p>This is conjecture. OTOH I measured while I designed the LuaJIT IR.<p>1. An array index is just as suitable as a pointer for dereferencing. 
2. What matters is how many dereferences are needed and their locality. 
3. Data structure density is important to get high cache utilization.<p>References show a lot of locality: 40% of all IR operands reference the previous node. 70% reference the previous 10 nodes. A linear IR is the best cache-optimized data structure for this.<p>That said, dereferencing of an operand happens less often than one might think. Most of the time, one really needs the operand index itself, e.g. for hashes or comparisons. Again, indexes have many advantages over pointers here.<p>What paid off the most was to use a fixed size IR instruction format (only 64 bits!) with 2 operands and 16 bit indexes. The restriction to 2 operands is actually beneficial, since it helps with commoning (CSE) and makes you think about IR design. The 16 bit index range is not a limitation in practice (split IR chunks, if you need to). The high orthogonality of the IR avoids many iterations and unpredictable branches in the compiler itself.<p>The 16 bit indexes also enable the use of tagged references in the compiler code (not in the IR). The tag caches node properties: type, flags, constness. This avoids even more dereferences. LuaJIT uses this in the front pipeline for fast type checks and on-the-fly folding.</p>
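<p>The fixed-size encoding described above can be sketched like this (a toy Python model for illustration only; the field layout here is hypothetical and is not LuaJIT's actual IRIns layout, which is defined in C):<p><pre><code>

```python
import struct

# Hypothetical layout, loosely modeled on the description above:
# one IR instruction = 64 bits total, with two 16-bit operand indexes,
# an 8-bit opcode, an 8-bit type tag, and a 16-bit spare/link field.
def encode_ins(op1, op2, opcode, irtype, link=0):
    assert 0 <= op1 < 1 << 16 and 0 <= op2 < 1 << 16  # 16-bit index range
    return struct.pack("<HHBBH", op1, op2, opcode, irtype, link)

def decode_ins(ins):
    return struct.unpack("<HHBBH", ins)

ins = encode_ins(op1=5, op2=6, opcode=0x29, irtype=0x13)
assert len(ins) == 8                  # fixed size: exactly 64 bits
assert decode_ins(ins)[:2] == (5, 6)  # operands are plain array indexes
```

</code></pre>Because every instruction is the same size, the whole IR is one dense array and "operand N" is just an index into it, which is what makes the locality numbers above pay off.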
]]></description><pubDate>Sun, 27 Aug 2023 10:38:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=37281329</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=37281329</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37281329</guid></item><item><title><![CDATA[New comment by mikemike in "The Solid-State Register Allocator"]]></title><description><![CDATA[
<p>I had already changed the title after your reply. The objection is about the naming, which implies an invention claim without further explanation. It's not about the code.</p>
]]></description><pubDate>Wed, 05 Oct 2022 13:35:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=33095373</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=33095373</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33095373</guid></item><item><title><![CDATA[New comment by mikemike in "The Solid-State Register Allocator"]]></title><description><![CDATA[
<p>You may want to clarify that in the GitHub repo, too. See my issue there.<p>If you want to go the didactic route, then consider documenting the improvements over the naive implementation: register hinting, register priorities (PHI), two-headed register picking, fixed register picking, optimized register picking for 2-operand instructions (x86/x64), register pair picking, ABI calling conventions, weak allocations, cost heuristics, eviction heuristics, lazy/eager spill/restore, rematerialization, register shuffling (PHI) with cycle breaking, register renaming, etc. That's all in ~2000 lines of lj_asm.c.</p>
]]></description><pubDate>Wed, 05 Oct 2022 12:22:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=33094484</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=33094484</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33094484</guid></item><item><title><![CDATA[New comment by mikemike in "The Solid-State Register Allocator"]]></title><description><![CDATA[
<p>Uh? This *is* the LuaJIT register allocator. Period.<p>Code published 2009. Description published here: <a href="https://lua-users.org/lists/lua-l/2009-11/msg00089.html" rel="nofollow">https://lua-users.org/lists/lua-l/2009-11/msg00089.html</a> (ignore the TLS cert error).<p>Coming up with a silly marketing name, writing a naive implementation and then claiming it's their invention is impertinent. Especially since they mention LuaJIT itself in the text ...</p>
]]></description><pubDate>Wed, 05 Oct 2022 09:59:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=33093358</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=33093358</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33093358</guid></item><item><title><![CDATA[New comment by mikemike in "An unexpected Redis sandbox escape affecting Debian-based distros"]]></title><description><![CDATA[
<p>That's what I'm wondering, too, right now.<p>It's trivial to DoS-hang redis with the script feature (and SCRIPT KILL won't help).<p>And I found at least 3 DoS-crashes, because redis hasn't backported fixes to its copy of Lua 5.1.5 (but Debian's liblua 5.1 might have -- I haven't checked).<p>And that's without even exploring the really problematic builtins it still has available.<p>Maybe they should instead clarify their security guarantee for redis scripting (e.g. "none").</p>
]]></description><pubDate>Wed, 09 Mar 2022 21:47:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=30620445</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=30620445</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30620445</guid></item><item><title><![CDATA[New comment by mikemike in "An unexpected Redis sandbox escape affecting Debian-based distros"]]></title><description><![CDATA[
<p>Yes, of course it's vulnerable, verified with Docker debian:sid. That was my first reaction when I read this, but I wanted to verify it first. You beat me to it with this post.<p>Since you've already let the cat out of the bag (which is not ideal), please file the bugs at Debian and Ubuntu.<p>Test command:<p><pre><code>    redis-cli eval 'return select(2, loadstring("\027")):match("binary") and "VULNERABLE" or "OK"' 0
</code></pre>
While we're at it, redis has ignored the advice at: <a href="http://lua-users.org/wiki/SandBoxes" rel="nofollow">http://lua-users.org/wiki/SandBoxes</a>
Almost all of the critical functions (loadstring, load, getmetatable, getfenv, ...) are present and unprotected in the redis 'SandBox' (which isn't one).<p>Which means: disable scripting NOW, or shut down any redis instances that do not run with the same privileges as the clients that can access them. Scripting can be disabled by renaming the EVAL and EVALSHA commands to unguessable names.</p>
]]></description><pubDate>Wed, 09 Mar 2022 18:49:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=30618480</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=30618480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30618480</guid></item><item><title><![CDATA[New comment by mikemike in "Malloc broke Serenity's JPGLoader, or: how to win the lottery"]]></title><description><![CDATA[
<p>One year ago I hardened LuaJIT's VM against this kind of attack. Since then, there has been a constant influx of complaints and issues filed, all bitterly complaining that their code, which mistakenly assumed a fixed hash table iteration order, is now broken.<p>Even when told that the Lua manual has clearly stated for 20 years that the order is undefined, they do not cease to complain. They do not realize this change helped them discover a serious bug in their code (the order could differ even before that change). Sigh.<p>You can now guess what one of the less enlightened forks of LuaJIT did ...</p>
]]></description><pubDate>Thu, 03 Jun 2021 11:22:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=27379555</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=27379555</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27379555</guid></item><item><title><![CDATA[New comment by mikemike in "Compiler Optimizations are Awesome"]]></title><description><![CDATA[
<p>Actually, LuaJIT 1.x is just that: a translator from a register-based bytecode to machine code using templates (small assembler snippets) with fixed register assignment. There's only a little bit more magic to it, like template variants depending on the inferred type etc.<p>You can compare the performance of LuaJIT 1.x and 2.0 yourself on the benchmark page (for x86). The LuaJIT 1.x JIT-compiled code is only slightly faster than the heavily tuned LuaJIT 2.x VM plus the 2.x interpreter written in assembly language by hand. Sometimes the 2.x interpreter even beats the 1.x compiler.<p>A lot of this is due to the better design of the 2.x VM (object layout, stack layout, calling conventions, builtins etc.). But from the perspective of the CPU, a heavily optimized interpreter does not look that different from simplistic, template-generated code. The interpreter dispatch overhead can be moved onto independent dependency chains by the CPU, if you're doing this right.<p>Of course, the LuaJIT 2.x JIT compiler handily beats both the 2.x interpreter and the 1.x compiler.</p>
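<p>The template approach can be sketched as a toy (a Python model for illustration; the bytecode format and names here are invented, and a real template JIT like LuaJIT 1.x pastes together pre-written machine-code snippets instead of calling functions):<p><pre><code>

```python
# Toy register-based bytecode: (opcode, dst, src1, src2).
# In a template JIT each opcode selects a pre-assembled snippet with a
# fixed register assignment; here each opcode selects a Python function,
# purely to show the one-template-per-opcode structure.
TEMPLATES = {
    "MOV": lambda regs, a, b, c: regs.__setitem__(a, regs[b]),
    "ADD": lambda regs, a, b, c: regs.__setitem__(a, regs[b] + regs[c]),
    "MUL": lambda regs, a, b, c: regs.__setitem__(a, regs[b] * regs[c]),
}

def translate_and_run(code, regs):
    # "Translation" degenerates to looking up one template per
    # instruction -- no analysis, no register allocation.
    for op, a, b, c in code:
        TEMPLATES[op](regs, a, b, c)
    return regs

regs = translate_and_run([("ADD", 0, 1, 2), ("MUL", 0, 0, 0)], [0, 3, 4])
assert regs[0] == 49  # (3 + 4) ** 2
```

</code></pre>The point of the comparison above: the emitted code has exactly this shape, one fixed snippet per bytecode, which is why a well-tuned interpreter can get surprisingly close to it.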
]]></description><pubDate>Thu, 01 Jun 2017 13:26:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=14460027</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=14460027</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=14460027</guid></item><item><title><![CDATA[New comment by mikemike in "DynASM"]]></title><description><![CDATA[
<p>Notable recent use of DynASM: Zend is using it to write a JIT compiler for PHP 8.0. <a href="http://externals.io/thread/268#email-12706-body" rel="nofollow">http://externals.io/thread/268#email-12706-body</a></p>
]]></description><pubDate>Sat, 03 Dec 2016 21:13:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=13097518</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=13097518</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=13097518</guid></item><item><title><![CDATA[New comment by mikemike in "My love-hate relationship with LuaJIT (2015)"]]></title><description><![CDATA[
<p>No. You DO need a good understanding of a computer language and of JIT compilers to understand the code base for any just-in-time compiler for that computer language.<p>LuaJIT is not a toy compiler from a textbook. There's a lot of inherent complexity in a production compiler that employs advanced optimizations and needs to work on various CPU architectures and operating systems. This is reflected in the code.</p>
]]></description><pubDate>Mon, 26 Sep 2016 06:16:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=12579622</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=12579622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12579622</guid></item><item><title><![CDATA[New comment by mikemike in "My love-hate relationship with LuaJIT (2015)"]]></title><description><![CDATA[
<p>This is a wrong perception. There is/was no shortage of sponsorships. I had to turn down most of these offers, due to time constraints.</p>
]]></description><pubDate>Mon, 26 Sep 2016 05:47:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=12579501</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=12579501</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12579501</guid></item><item><title><![CDATA[New comment by mikemike in "LuaJIT 2.0 intellectual property disclosure (2009)"]]></title><description><![CDATA[
<p>Just in case anyone has somehow come to the conclusion that Lua's semantics are 'simple', they should closely inspect this example and try to figure out through which contortions the VM has to go to make this work:<p><pre><code>    local t = setmetatable({}, {
      __index = pcall, __newindex = rawset,
      __call = function(t, i) t[i] = 42 end,
    })
    for i=1,100 do assert(t[i] == true and rawget(t, i) == 42) end

</code></pre>
[LuaJIT has no problems with this code and turns it into 8 machine code instructions for the actual loop.]<p>Anyway ...<p>This permanent excuse of JavaScript proponents that it has more complex semantics, which somehow prevents it from being made fast, is getting old. There are no insurmountable obstacles to making JavaScript fast -- it just takes more effort!<p>And they dug this hole themselves, by not cleaning up the language and by allowing new complicated features into it. Well ...</p>
]]></description><pubDate>Mon, 21 Mar 2016 10:44:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=11327201</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=11327201</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=11327201</guid></item><item><title><![CDATA[New comment by mikemike in "Tracing JITs and modern CPUs part 3: A bad case"]]></title><description><![CDATA[
<p>By definition, a trace doesn't have internal branches.<p>The solution is to use Hyperblock Scheduling. This is an extra pass that merges multiple traces, e.g. the described root trace and its side trace. The result is a single trace with a predicated IR. This is amenable to most linear optimizations, with only minor limitations.<p>A predicated IR is the ideal representation to apply branch-free optimizations, using bit operations or SIMD tricks. If there are any predicates left in the IR, the compiler backend will either turn it into predicated machine code (on CPUs which support that to some extent, e.g. ARM32) or generate machine code with internal branches.</p>
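<p>The core predication idea -- turning a control-flow branch into a data dependency -- can be sketched branch-free with bit operations (a minimal Python illustration of the select idiom, not LuaJIT's actual IR):<p><pre><code>

```python
MASK64 = (1 << 64) - 1  # model 64-bit machine words

def select_branchy(p, a, b):
    # the branching form a trace compiler wants to eliminate
    return a if p else b

def select_predicated(p, a, b):
    # branch-free: expand the 1-bit predicate into an all-ones or
    # all-zeros mask, then blend the two values with bit operations
    m = (-int(bool(p))) & MASK64
    return (a & m) | (b & ~m & MASK64)

for p in (0, 1):
    assert select_predicated(p, 7, 9) == select_branchy(p, 7, 9)
```

</code></pre>Once both arms of a merged trace are expressed this way, the side-trace path becomes ordinary straight-line IR that the linear optimizations can see.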
]]></description><pubDate>Mon, 10 Aug 2015 07:37:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=10033174</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=10033174</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=10033174</guid></item><item><title><![CDATA[New comment by mikemike in "The death of optimizing compilers [pdf]"]]></title><description><![CDATA[
<p>Thank you for taking the time to perform these tests!<p>One thing that people advocating FDO often forget: this is statically tuning the code for a specific use case. Which is not what you want for an interpreter that has many, many code paths and is supposed to run a wide variety of code.<p>You won't get a 30% FDO speedup in any practical scenario. It does little for most other benchmarks and it'll pessimize quite a few of them, for sure.<p>Ok, so feed it with a huge mix of benchmarks that simulate typical usage. But then the profile gets flatter and FDO becomes much less effective.<p>Anyway, my point still stands: a factor of 1.1x - 1.3x is doable. Fine. But we're talking about a 3x speedup for my hand-written machine code vs. what the C compiler produces. And that's only a comparatively tiny speedup you get from applying domain-specific knowledge. Just ask the people writing video codecs about their opinion on C vector intrinsics sometime.<p>I write machine code, so you don't have to. The fact that I have to do it at all is disappointing. Especially from my perspective as a compiler writer.<p>But DJB is of course right: the key problem is not the compiler. We don't have a source language that's at the right level to express our domain-specific knowledge while leaving the implementation details to the compiler (or the hardware).<p>And I'd like to add: we probably don't have the CPU architectures that would fit that hypothetical language.<p>See my past ramblings about preserving programmer intent: <a href="http://www.freelists.org/post/luajit/Ramblings-on-languages-and-architectures-was-Re-any-benefit-to-throwing-off-lua51-constraints" rel="nofollow">http://www.freelists.org/post/luajit/Ramblings-on-languages-...</a></p>
]]></description><pubDate>Sat, 18 Apr 2015 12:31:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=9399206</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=9399206</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=9399206</guid></item><item><title><![CDATA[New comment by mikemike in "How does LuaJIT's trace compiler work?"]]></title><description><![CDATA[
<p>Oh, great! Thank you very much!</p>
]]></description><pubDate>Tue, 03 Dec 2013 21:43:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=6843405</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=6843405</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=6843405</guid></item><item><title><![CDATA[New comment by mikemike in "How does LuaJIT's trace compiler work?"]]></title><description><![CDATA[
<p>Oh, well ... pasting my standard rant on this:<p>This is a common misinterpretation of the Dynamo paper: they compiled their C code at the <i>lowest</i> optimization level and then ran the (suboptimal) machine code through Dynamo. So there was actually something left to optimize.<p>Think about it this way: a 20% difference isn't unrealistic if you compare -O1 vs. -O3.<p>But it's completely unrealistic to expect a 20% improvement if you'd try this with the machine code generated by a modern C compiler at the highest optimization level.<p>Claiming that JIT compilers outperform static compilers, solely based on this paper, is an untenable position.<p>But, yes, JIT compilers <i>can</i> outperform static compilers under specific circumstances. This has more to do with e.g. better profiling feedback or extra specialization opportunities. But this is not what this paper demonstrates.<p>Many compiler optimizations have strong non-linear costs in terms of the number of control flow edges. A static compiler has to punt at a certain complexity. OTOH a JIT compiler is free to ignore many edges, since it may fall back to an interpreter for cold paths or attach new code anytime later if some edges become hot.<p>One interesting example is auto-vectorization (SIMDization) where static compilers <i>have to</i> generate code for all possible combinations of vector alignments in case the underlying alignment of the participating vectors is not statically known. This quickly gets very expensive in terms of code space. OTOH a JIT compiler can simply specialize to the observed vector alignment(s) at runtime, which show almost no variation in practice.</p>
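<p>The alignment example can be made concrete with a back-of-the-envelope count (hypothetical numbers for illustration; a real static compiler has further options, e.g. runtime dispatch or loop peeling, each with its own code-size cost):<p><pre><code>

```python
def variants_static(num_ptrs, alignments=(0, 4, 8, 12)):
    # A static compiler that cannot prove the alignments must cover
    # every combination of possible alignments -- exponential in the
    # number of participating vectors.
    return len(alignments) ** num_ptrs

def variants_jit(observed):
    # A JIT specializes to the alignment combination(s) actually
    # observed at runtime (guarded, so it can re-specialize if they
    # ever change) -- in practice almost always a single combination.
    return len(set(observed))

assert variants_static(2) == 16   # two unknown pointers, 16 code variants
assert variants_jit([(0, 8), (0, 8), (0, 8)]) == 1
```

</code></pre>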
]]></description><pubDate>Fri, 29 Nov 2013 21:00:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=6821001</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=6821001</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=6821001</guid></item><item><title><![CDATA[New comment by mikemike in "How does LuaJIT's trace compiler work?"]]></title><description><![CDATA[
<p>To give credit, where credit is due: the original work on trace compilation is much, much older. The paper you cited is an application.<p>The fundamental papers to hunt for are Joseph A. Fisher's publications on trace scheduling (sadly, his PhD thesis from the '70s is nowhere to be found online) and the Multiflow reports from the '90s. The Dynamo paper built upon that foundation ten years later in '99 (get the full HP report, not the short summary). A related research area is about trace caches for use in CPUs, with various papers from the '90s.<p>AFAIK there's no up-to-date comprehensive summary of the state of research on trace compilers. Most papers don't even scratch the surface of the challenges you'll face when building a production-quality trace compiler.</p>
]]></description><pubDate>Fri, 29 Nov 2013 17:28:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=6820215</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=6820215</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=6820215</guid></item><item><title><![CDATA[New comment by mikemike in "How does LuaJIT's trace compiler work?"]]></title><description><![CDATA[
<p>Err, the title of this item is a bit misleading, although it's the subject of the posting on the mailing list.<p>One cannot explain it all in a single post. I just answered some specific questions, omitting most details.</p>
]]></description><pubDate>Fri, 29 Nov 2013 15:54:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=6819832</link><dc:creator>mikemike</dc:creator><comments>https://news.ycombinator.com/item?id=6819832</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=6819832</guid></item></channel></rss>