<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: vimarsh6739</title><link>https://news.ycombinator.com/user?id=vimarsh6739</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 20 Jun 2026 01:53:05 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=vimarsh6739" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by vimarsh6739 in "CS 6120: Advanced Compilers: The Self-Guided Online Course (2020)"]]></title><description><![CDATA[
<p>I disagree with several parts here. But hopefully, this leads to a fun discussion!<p>> no need for a lexer with a recursive descent parser<p>I'd argue that teaching how to write a lexer + recursive descent parser is <i>more</i> relevant in the context of production compilers: many major production compilers out there use hand-written recursive descent parsers (for C++, Java, Rust, Go, JavaScript...). Recursive descent parsers are also really nice for emitting error messages.<p>> It is better for a first compiler to compile to a higher level language in which neither register assignment nor memory management are necessary.<p>Compiling to a high-level target can be a reasonable first project (e.g., you can emit LLVM IR), but imo it's a different objective from learning the full stack. Emitting actual ISA instructions (even sub-optimally, after all it's a university course) forces you to learn calling conventions, instruction selection, register pressure, stack layouts etc. Building a compiler, at least for me, is probably one of the easiest ways to understand how all of it works together.<p>> optimize a specific function by source level rewrite<p>I don't think replacing optimizations with a per-function source-level rewrite works as a general model. Many optimizations are not local to a single function (for example, inlining function calls can lead to new constant-propagation opportunities). If your argument rests on the fact that not all functions are hot, a lot of general-purpose JIT compilers out there already use runtime info to decide when to optimize hot functions, so part of what you're proposing already exists.<p>> implementation is human readable and not buried in a binary<p>Is this really a requirement for your program? 
In most cases, I think the optimization story is more like: "code you want to write" != "code you want to run"<p>> Moreover, and perhaps even more importantly, by not doing optimizations in the compiler, compilation times can be much faster, easily 100-1000x than state of the art optimizing compiler, while generating equivalent or even better runtime performance<p>I think the actual answer here is "it depends". For long-running programs, one tradeoff is build time vs future execution time. Also, many optimizations cannot be expressed in source code itself. For example, in C++, you can do stuff like whole-program devirtualization only at link time, which is why LTO exists.<p>Aside: I personally work on source-to-source automatic differentiation inside compilers, and I can give examples of missed optimizations in generated derivative code if you don't run existing optimization passes like LICM/CSE before differentiating a function.</p>
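<p>To make the lexer + recursive descent point above a bit more concrete, here is a minimal sketch in Python. It assumes a toy grammar (integers, +, -, *, unary minus, parentheses) that I made up for illustration; the names lex, Parser and evaluate are hypothetical and not taken from any real compiler. The thing to notice is that each grammar rule is one method, so the parser always knows what it expected and at which column.</p>

```python
DIGITS = "0123456789"


def lex(src):
    """Split the source into (kind, text, column) tokens, ending with END."""
    tokens, i = [], 0
    while i != len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch in DIGITS:
            j = i
            while j != len(src) and src[j] in DIGITS:
                j += 1
            tokens.append(("NUM", src[i:j], i))
            i = j
        elif ch in "+-*()":
            tokens.append(("OP", ch, i))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at column {i}")
    tokens.append(("END", "", len(src)))
    return tokens


class Parser:
    """One method per grammar rule; this toy version evaluates while parsing."""

    def __init__(self, src):
        self.tokens = lex(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, text):
        kind, got, col = self.peek()
        if got != text:
            found = got or "end of input"
            raise SyntaxError(f"expected {text!r} at column {col}, found {found!r}")
        return self.advance()

    # expr := term (('+' | '-') term)*
    def expr(self):
        value = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.advance()[1]
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := factor ('*' factor)*
    def term(self):
        value = self.factor()
        while self.peek()[1] == "*":
            self.advance()
            value = value * self.factor()
        return value

    # factor := NUM | '(' expr ')' | '-' factor
    def factor(self):
        kind, text, col = self.advance()
        if kind == "NUM":
            return int(text)
        if text == "(":
            value = self.expr()
            self.expect(")")
            return value
        if text == "-":
            return -self.factor()
        found = text or "end of input"
        raise SyntaxError(f"expected a number or '(' at column {col}, found {found!r}")


def evaluate(src):
    parser = Parser(src)
    value = parser.expr()
    kind, text, col = parser.peek()
    if kind != "END":
        raise SyntaxError(f"unexpected {text!r} at column {col}")
    return value
```

<p>For example, evaluate("1 + 2 * (3 - 1)") walks expr, term and factor in precedence order, while an incomplete input such as "2 * (3 + " fails inside factor with a message that names the column. A real front end would build an AST instead of evaluating directly, and would try to recover after an error rather than stop at the first one.</p>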
]]></description><pubDate>Fri, 19 Jun 2026 06:28:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=48595469</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=48595469</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48595469</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "CUDA-oxide: Nvidia's official Rust to CUDA compiler"]]></title><description><![CDATA[
<p>It's really hard to find alternatives to Julia that treat AD as a first-class citizen.</p>
]]></description><pubDate>Mon, 11 May 2026 17:35:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48098026</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=48098026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48098026</guid></item><item><title><![CDATA[TileIR Internals]]></title><description><![CDATA[
<p>Article URL: <a href="https://maknee.github.io/blog/2026/NVIDIA-TileIR-Internals-from-CuTile-to-MLIR-LLVM-to-SASS/">https://maknee.github.io/blog/2026/NVIDIA-TileIR-Internals-from-CuTile-to-MLIR-LLVM-to-SASS/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46827090">https://news.ycombinator.com/item?id=46827090</a></p>
<p>Points: 10</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 30 Jan 2026 17:20:25 +0000</pubDate><link>https://maknee.github.io/blog/2026/NVIDIA-TileIR-Internals-from-CuTile-to-MLIR-LLVM-to-SASS/</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=46827090</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46827090</guid></item><item><title><![CDATA[Updated Practice for Review Articles and Position Papers in ArXiv CS Category]]></title><description><![CDATA[
<p>Article URL: <a href="https://blog.arxiv.org/2025/10/31/attention-authors-updated-practice-for-review-articles-and-position-papers-in-arxiv-cs-category/">https://blog.arxiv.org/2025/10/31/attention-authors-updated-practice-for-review-articles-and-position-papers-in-arxiv-cs-category/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45779541">https://news.ycombinator.com/item?id=45779541</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 01 Nov 2025 05:57:34 +0000</pubDate><link>https://blog.arxiv.org/2025/10/31/attention-authors-updated-practice-for-review-articles-and-position-papers-in-arxiv-cs-category/</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=45779541</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45779541</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Carefully but Purposefully Oxidising Ubuntu"]]></title><description><![CDATA[
<p>To me, this feels less about Rust and more about moving away from copyleft.</p>
]]></description><pubDate>Thu, 13 Mar 2025 18:19:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=43355938</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=43355938</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43355938</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Finite Field Assembly: A Language for Emulating GPUs on CPU"]]></title><description><![CDATA[
<p>Yes, but in practice, I believe people spam __syncthreads() in GPU kernels just to ensure correctness. There is value in statically proving that you don't need a synchronization instruction at a certain point. Doubly so in the transpilation case, where a naive __syncthreads() ends up being called multiple times because it is present in the CUDA code (or MLIR in this case).<p>An interesting add-on to me would be the handling of conditionals. Because newer GPUs have independent thread scheduling, which is not present in the older ones, you have to wonder what the desired behaviour is if you are using CPU execution as a debugger of sorts (or are just GPU poor). It'd be super cool to expose those semantics as a compiler flag for your transpiler, allowing me to debug some code locally as if it ran on an ancient GPU like a K80.<p>But the ambitious question here is this: if you take existing GPU code, run it through a transpiler and generate better code than handwritten OpenMP, do you need to maintain an OpenMP backend for the CPU in the first place? It'd be better to express everything in a richer parallel model with support for nested synchronization, right? And let the compiler handle the job of converting between parallelism models. It's like saying if PyTorch 2.0 generates good Triton code, we could just transpile that to CPUs and get rid of the CPU backend. (Of course, Triton doesn't support all patterns, so you would fall back to ATen, and this kind of goes for a toss.)</p>
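<p>As a toy illustration of "statically proving you don't need a synchronization instruction", here is a deliberately conservative sketch in Python. It assumes straight-line code modelled as a list of made-up op names ("barrier", "load", "store", "compute"); the function name and the op encoding are hypothetical. The only rule it applies is that a barrier with no memory access since the previous barrier has nothing left to order.</p>

```python
def drop_redundant_barriers(ops):
    """Remove barriers that cannot order anything in straight-line code.

    `ops` is a list of strings: "barrier", "load", "store" or "compute".
    A barrier is kept only if some memory access happened since the last
    barrier that was kept, so back-to-back barriers collapse into one.
    """
    kept = []
    memory_touched = True  # conservative: code before `ops` may touch memory
    for op in ops:
        if op == "barrier":
            if not memory_touched:
                continue  # nothing to order since the previous barrier
            memory_touched = False
        elif op in ("load", "store"):
            memory_touched = True
        kept.append(op)
    return kept
```

<p>A real pass would need much more than this: dataflow over the control-flow graph, and reasoning about which threads touch which addresses, so that barriers between non-conflicting accesses can also go. This only shows the shape of the argument.</p>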
]]></description><pubDate>Sat, 18 Jan 2025 05:34:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=42746125</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=42746125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42746125</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Finite Field Assembly: A Language for Emulating GPUs on CPU"]]></title><description><![CDATA[
<p>One of the more subtle aspects of retargeting GPU code to run on the CPU is that GPUs expose fine-grained (read: block-level and warp-level) explicit synchronization mechanisms. The CPU side has no direct equivalent of these primitives, so additional care has to be taken to handle this. One example of work which tries this is <a href="https://arxiv.org/pdf/2207.00257" rel="nofollow">https://arxiv.org/pdf/2207.00257</a> .<p>Interestingly, in the same work, contrary to what you’d expect, transpiling GPU code to run on CPU gives ~76% speedups in HPC workloads compared to a hand-optimized multi-core CPU implementation on Fugaku (a CPU-only supercomputer), after accounting for these differences in synchronization.</p>
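<p>For a feel of why block-level synchronization needs care on the CPU, here is a small Python sketch of one well-known approach: split the kernel body at every barrier and run each piece as its own loop over the logical threads. This is my own simplified illustration, not necessarily how the linked paper does it; run_block and make_reduction_phases are hypothetical names, and the sketch assumes the thread count is a power of two.</p>

```python
def run_block(phases, num_threads, shared):
    """Emulate one GPU thread block on a single CPU thread.

    `phases` is the kernel body pre-split at each barrier: every logical
    thread finishes phase k before any thread starts phase k + 1, which is
    the guarantee __syncthreads() gives inside a block.
    """
    for phase in phases:
        for tid in range(num_threads):
            phase(tid, shared)
    return shared


def make_reduction_phases(num_threads):
    """Tree reduction: one phase per stride, with a barrier between strides."""
    phases = []
    stride = num_threads // 2
    while stride != 0:
        # Bind the current stride as a default argument so each phase keeps
        # its own value instead of sharing the loop variable.
        def phase(tid, shared, stride=stride):
            if stride > tid:
                shared[tid] += shared[tid + stride]
        phases.append(phase)
        stride //= 2
    return phases
```

<p>Calling run_block(make_reduction_phases(8), 8, list(range(8))) leaves the sum of the inputs in element 0. Running the threads one after another inside a phase is safe here because each phase only writes indices below the stride and only reads indices at or above it. Barriers inside conditionals or loops are where this simple splitting stops being enough.</p>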
]]></description><pubDate>Sat, 18 Jan 2025 01:19:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=42744915</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=42744915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42744915</guid></item><item><title><![CDATA[Perspectives on Floating Point]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.eigentales.com/Floating-Point/">https://www.eigentales.com/Floating-Point/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41845884">https://news.ycombinator.com/item?id=41845884</a></p>
<p>Points: 54</p>
<p># Comments: 19</p>
]]></description><pubDate>Tue, 15 Oct 2024 07:37:37 +0000</pubDate><link>https://www.eigentales.com/Floating-Point/</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=41845884</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41845884</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Groq runs Mixtral 8x7B-32k with 500 T/s"]]></title><description><![CDATA[
<p>Thanks for the quick reply! About hardware support, I was wondering if the LPU has a hardware instruction to compute the attention matrix similar to the MatrixMultiply/Convolve instruction in the TPU ISA. (Maybe a hardware instruction which fuses a softmax on the matmul epilogue?)</p>
]]></description><pubDate>Mon, 19 Feb 2024 19:29:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=39433652</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=39433652</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39433652</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Groq runs Mixtral 8x7B-32k with 500 T/s"]]></title><description><![CDATA[
<p>Might be a bit out of context, but isn't the TPU also optimized for low latency inference? (Judging by reading the original TPU architecture paper here - <a href="https://arxiv.org/abs/1704.04760" rel="nofollow">https://arxiv.org/abs/1704.04760</a>). If so, does Groq actually provide hardware support for LLM inference?</p>
]]></description><pubDate>Mon, 19 Feb 2024 19:07:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=39433426</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=39433426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39433426</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Apple pulls plug on Goldman credit-card partnership"]]></title><description><![CDATA[
<p>The card is pretty useful to me as a first card since it has no foreign transaction fee.</p>
]]></description><pubDate>Wed, 29 Nov 2023 00:51:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=38453935</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=38453935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38453935</guid></item><item><title><![CDATA[Debugging a Bit-Flip Error (2023)]]></title><description><![CDATA[
<p>Article URL: <a href="https://sillycross.github.io/2023/06/11/2023-06-11/">https://sillycross.github.io/2023/06/11/2023-06-11/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37699331">https://news.ycombinator.com/item?id=37699331</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 29 Sep 2023 04:39:33 +0000</pubDate><link>https://sillycross.github.io/2023/06/11/2023-06-11/</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=37699331</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37699331</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Ask HN: What's the Situation with YouTube-Dl?"]]></title><description><![CDATA[
<p>It isn't a real download, because you don't have access to the raw file.</p>
]]></description><pubDate>Sat, 26 Aug 2023 12:21:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=37272163</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=37272163</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37272163</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "AI Boyfriend [video]"]]></title><description><![CDATA[
<p>KRAZAM is really underrated. It is one of those few YouTube channels which produce really good content and stories relative to their size.</p>
]]></description><pubDate>Tue, 08 Aug 2023 21:29:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=37055717</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=37055717</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37055717</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Intel Arc A580 could be the next great affordable GPU"]]></title><description><![CDATA[
<p>How does oneAPI/SYCL compare to CUDA? We certainly need an alternative to OpenCL, but every day, I can't help but notice the widening gulf between CUDA and any other GPGPU API out there.</p>
]]></description><pubDate>Sat, 05 Aug 2023 08:53:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=37010235</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=37010235</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37010235</guid></item><item><title><![CDATA[LLeaves: A LLVM-based compiler for LightGBM decision trees]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/siboehm/lleaves">https://github.com/siboehm/lleaves</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=36646087">https://news.ycombinator.com/item?id=36646087</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 08 Jul 2023 16:40:14 +0000</pubDate><link>https://github.com/siboehm/lleaves</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=36646087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36646087</guid></item><item><title><![CDATA[Screen Sizes and Breakpoints for Responsive Design]]></title><description><![CDATA[
<p>Article URL: <a href="https://learn.microsoft.com/en-us/windows/apps/design/layout/screen-sizes-and-breakpoints-for-responsive-design">https://learn.microsoft.com/en-us/windows/apps/design/layout/screen-sizes-and-breakpoints-for-responsive-design</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=35299622">https://news.ycombinator.com/item?id=35299622</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 25 Mar 2023 05:04:29 +0000</pubDate><link>https://learn.microsoft.com/en-us/windows/apps/design/layout/screen-sizes-and-breakpoints-for-responsive-design</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=35299622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35299622</guid></item><item><title><![CDATA[New comment by vimarsh6739 in "Ask HN: If I get locked out of everything, please try to help me"]]></title><description><![CDATA[
<p>I really don't have anything to say to the OP, but thinking about a similar situation, I wonder whether the recent push towards eSIM will make SMS-based 2FA more problematic.<p>If a phone with an eSIM dies and you need some kind of OTP, I wonder how you'll receive it. You can't exactly 'transplant' the SIM into another phone.</p>
]]></description><pubDate>Tue, 13 Dec 2022 09:33:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=33966541</link><dc:creator>vimarsh6739</dc:creator><comments>https://news.ycombinator.com/item?id=33966541</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33966541</guid></item></channel></rss>