<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: shihab</title><link>https://news.ycombinator.com/user?id=shihab</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 20 Apr 2026 17:41:09 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=shihab" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by shihab in "The United States and Israel have launched a major attack on Iran"]]></title><description><![CDATA[
<p>I brought up Israeli-American donors because that’s what is relevant in the <i>context of the story we’re discussing.</i> We are talking about a war many right-wing Israelis have wanted for decades. If it were a general discussion about Citizens United and I focused on lobbying from only this group, perhaps your argument would have held water.<p>Anyway, here’s Trump himself detailing the extraordinary access to the White House this lobbying bought the Adelsons:<p><a href="https://www.reuters.com/world/us/trump-salutes-mega-donor-miriam-adelson-help-shaping-us-decisions-israel-2025-10-13/" rel="nofollow">https://www.reuters.com/world/us/trump-salutes-mega-donor-mi...</a></p>
]]></description><pubDate>Sat, 28 Feb 2026 17:27:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47197913</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=47197913</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47197913</guid></item><item><title><![CDATA[New comment by shihab in "The United States and Israel have launched a major attack on Iran"]]></title><description><![CDATA[
<p>Exactly what part of my statement was dog whistling? Can you stop throwing around this serious accusation of antisemitism without any attempt to substantiate your claim?</p>
]]></description><pubDate>Sat, 28 Feb 2026 10:34:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47193411</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=47193411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47193411</guid></item><item><title><![CDATA[New comment by shihab in "The United States and Israel have launched a major attack on Iran"]]></title><description><![CDATA[
<p>Citizens United is an existential threat to the USA. You cannot have Israeli-American dual citizens pouring $200 million into elections, and that’s just her alone. This is simply not sustainable.</p>
]]></description><pubDate>Sat, 28 Feb 2026 08:50:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47192462</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=47192462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47192462</guid></item><item><title><![CDATA[New comment by shihab in "The United States and Israel have launched a major attack on Iran"]]></title><description><![CDATA[
<p>Another Mideast war entirely on Israel’s behalf, another war Americans will pay taxes for and die for, just so Israel can keep grabbing a few parcels of land from Palestine.</p>
]]></description><pubDate>Sat, 28 Feb 2026 08:35:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47192332</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=47192332</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47192332</guid></item><item><title><![CDATA[New comment by shihab in "The Waymo World Model"]]></title><description><![CDATA[
<p>I think there are two steps here: converting video to sensor data input, and using that sensor data to drive. Only the second step is handled by cars on the road; the first one is purely for training.</p>
]]></description><pubDate>Fri, 06 Feb 2026 17:04:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46915327</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46915327</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46915327</guid></item><item><title><![CDATA[New comment by shihab in "Jeffrey Epstein's Money Mingled with Silicon Valley Startups"]]></title><description><![CDATA[
<p>The article strictly talks about people who were pals with him _after_ his pedophilia conviction. And please don't use this "evil person eating babies" strawman; nobody sane is claiming that.</p>
]]></description><pubDate>Thu, 05 Feb 2026 17:20:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46901995</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46901995</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46901995</guid></item><item><title><![CDATA[New comment by shihab in "Rust’s Standard Library on the GPU"]]></title><description><![CDATA[
<p>I work with GPUs and I'm also trying to understand the motivations here.<p>Side note & a hot take: that sort of abstraction never really existed for GPUs, and it's going to be even harder now as Nvidia et al. race to put more & more specialized hardware bits inside GPUs.</p>
]]></description><pubDate>Wed, 28 Jan 2026 06:11:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=46791687</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46791687</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46791687</guid></item><item><title><![CDATA[New comment by shihab in "Rust’s Standard Library on the GPU"]]></title><description><![CDATA[
<p>To the author (or anyone from the VectorWare team), can you please give me, admittedly a skeptic, a motivating example of a "GPU-native" application?<p>That is, where does it truly make a difference to dispatch non-parallel work, syscalls, etc. from the GPU to the CPU instead of dispatching the parallel part of the code from the CPU to the GPU?<p>From the "Announcing VectorWare" page:<p>> Even after opting in, the CPU is in control and orchestrates work on the GPU.<p>Isn't it better to let CPUs be in control and orchestrate things, as GPUs have much smaller, dumber cores?<p>> Furthermore, if you look at the software kernels that run on the GPU they are simplistic with low cyclomatic complexity.<p>Again, there's an obvious reason why people don't put branch-y code on the GPU.<p>Genuinely curious what I'm missing.</p>
]]></description><pubDate>Wed, 28 Jan 2026 05:55:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46791591</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46791591</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46791591</guid></item><item><title><![CDATA[New comment by shihab in "SIMD programming in pure Rust"]]></title><description><![CDATA[
<p>> For example, NEON ... can hold up to 32 128-bit vectors to perform your operations without having to touch the "slow" memory.<p>Something I recently learnt: the actual number of physical registers in modern x86 CPUs is significantly larger, even for 512-bit SIMD. Zen 5 CPUs actually have 384 vector registers, 384*512b = 24KB!</p>
]]></description><pubDate>Wed, 21 Jan 2026 23:30:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=46713179</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46713179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46713179</guid></item><item><title><![CDATA[New comment by shihab in "Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks"]]></title><description><![CDATA[
<p>I'm not asking an academic program first published 8 years ago (e3nn) to beat the actively developed cuEquivariance library. An academic proposing new algorithms doesn't need to worry too much about performance. But any new work which <i>focuses on performance</i>, which includes this blog and a huge number of academic papers published every year, should absolutely use the latest vendor libraries as a baseline.</p>
]]></description><pubDate>Wed, 21 Jan 2026 14:54:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46706520</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46706520</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46706520</guid></item><item><title><![CDATA[New comment by shihab in "Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks"]]></title><description><![CDATA[
<p>I should note PETSc is a big piece of software that does a lot of things. It also wraps many libraries, and those might ultimately dictate actual performance depending on what you plan on doing.</p>
]]></description><pubDate>Wed, 21 Jan 2026 14:24:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46706133</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46706133</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46706133</guid></item><item><title><![CDATA[New comment by shihab in "Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks"]]></title><description><![CDATA[
<p>To be practically useful, we don't need to beat vendors; just getting close would be enough, by virtue of being open-source (and often portable). But I found, as an example, PETSc to be ~10x slower than MKL on CPU and CUDA on GPU; it still doesn't have native shared-memory parallelism support on CPU, etc.</p>
]]></description><pubDate>Wed, 21 Jan 2026 14:10:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=46705935</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46705935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46705935</guid></item><item><title><![CDATA[New comment by shihab in "Batmobile: 10-20x Faster CUDA Kernels for Equivariant Graph Neural Networks"]]></title><description><![CDATA[
<p>Hi, I just wanted to note that e3nn is more of an academic package that's a bit high-level by design. A better baseline for comparison would be Nvidia's cuEquivariance, which does pretty much the same thing as you did: take e3nn and optimize it for GPUs.<p>As an HPC developer, it breaks my heart how much worse academic software performance is compared to vendor libraries (from Intel or Nvidia). We need to start aiming much higher.</p>
]]></description><pubDate>Wed, 21 Jan 2026 13:00:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46705109</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46705109</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46705109</guid></item><item><title><![CDATA[New comment by shihab in "AVX-512: First Impressions on Performance and Programmability"]]></title><description><![CDATA[
<p>Hi, I actually mentioned ISPC several times there. And although I strenuously avoided crowning one approach "better" over the other, it is worth pointing out that 1) many of these benefits of ISPC can be had from explicit SIMD libraries like Google's Highway, and 2) ISPC (or any SIMT model) is a departure from how the underlying hardware works, and as the AI community is discovering with GPUs, this abstraction can sometimes be a lot more headache than it's worth.</p>
]]></description><pubDate>Mon, 19 Jan 2026 17:42:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46682013</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46682013</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46682013</guid></item><item><title><![CDATA[New comment by shihab in "AVX-512: First Impressions on Performance and Programmability"]]></title><description><![CDATA[
<p>No. Assuming `k` is small enough, which in practice it often is, the arithmetic intensity of this kernel is 25-90 Flops/Byte, way above the roofline knee of any modern CPU.</p>
]]></description><pubDate>Mon, 19 Jan 2026 17:20:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46681706</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46681706</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46681706</guid></item><item><title><![CDATA[New comment by shihab in "AVX-512: First Impressions on Performance and Programmability"]]></title><description><![CDATA[
<p>Hi, thanks for reading.<p>Re (b): I'm curious what that middle ground is. Is there any simple refactor to help GCC get rid of this `if`? (Note, ISPC did fine here.)<p>(c) Just to be clear, all the code in the benchmark figures (baseline and SIMD) was compiled with fast-math flags.<p>Regarding (a), one of the points I wanted to get across was that, in the end, it didn't <i>feel</i> as complicated to program as I had thought. Porting to AVX-512 felt mechanical (hence the success of LLMs in one-shotting the whole thing).<p>This is a subjective opinion that depends on the programmer's experience, etc., so I won't dwell on it. I just wish more CPU programmers gave it a try.</p>
]]></description><pubDate>Mon, 19 Jan 2026 17:17:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46681674</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46681674</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46681674</guid></item><item><title><![CDATA[New comment by shihab in "AVX-512: First Impressions on Performance and Programmability"]]></title><description><![CDATA[
<p>Yeah, N is big enough that the entire data set doesn't fit in the cache, but the memory access pattern here is the next best thing: totally linear, predictable access. I remember seeing around 94%+ L1d cache hit rate.</p>
]]></description><pubDate>Wed, 14 Jan 2026 12:01:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46615095</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46615095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46615095</guid></item><item><title><![CDATA[AVX-512: First Impressions on Performance and Programmability]]></title><description><![CDATA[
<p>Article URL: <a href="https://shihab-shahriar.github.io//blog/2026/AVX-512-First-Impressions-on-Performance-and-Programmability/">https://shihab-shahriar.github.io//blog/2026/AVX-512-First-Impressions-on-Performance-and-Programmability/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46610800">https://news.ycombinator.com/item?id=46610800</a></p>
<p>Points: 125</p>
<p># Comments: 53</p>
]]></description><pubDate>Wed, 14 Jan 2026 00:43:36 +0000</pubDate><link>https://shihab-shahriar.github.io//blog/2026/AVX-512-First-Impressions-on-Performance-and-Programmability/</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46610800</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46610800</guid></item><item><title><![CDATA[New comment by shihab in "A Couple 3D AABB Tricks"]]></title><description><![CDATA[
<p>For SIMD at least, the {mins[3], maxs[3]} representation aligns more naturally with actual instructions on x86. To compute a new bounding box (assuming mins and maxs are each padded to four floats and held in an __m128):<p>new_box.mins = _mm_min_ps(a.mins, b.mins);</p>
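<p>A minimal sketch of that idea, not taken from the linked article: the struct name Aabb, the helper aabb_union, and the padding of each corner to four floats are all assumptions made for illustration. On x86 it uses the SSE intrinsics _mm_min_ps and _mm_max_ps; elsewhere it falls back to a scalar loop so the example still compiles.</p>

```c
#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define AABB_HAVE_SSE 1
#else
#define AABB_HAVE_SSE 0
#endif

/* Hypothetical layout: x, y, z plus one unused padding lane per corner,
   so each corner fills exactly one 128-bit SSE register. */
typedef struct {
    float mins[4];
    float maxs[4];
} Aabb;

/* Smallest box enclosing both a and b: a per-lane min of the lower
   corners and a per-lane max of the upper corners. */
static Aabb aabb_union(const Aabb *a, const Aabb *b) {
    Aabb out;
#if AABB_HAVE_SSE
    /* Unaligned loads/stores keep the sketch independent of alignment. */
    _mm_storeu_ps(out.mins,
                  _mm_min_ps(_mm_loadu_ps(a->mins), _mm_loadu_ps(b->mins)));
    _mm_storeu_ps(out.maxs,
                  _mm_max_ps(_mm_loadu_ps(a->maxs), _mm_loadu_ps(b->maxs)));
#else
    /* Scalar fallback for non-x86 targets (same result for finite inputs). */
    for (int i = 0; i < 4; ++i) {
        out.mins[i] = a->mins[i] < b->mins[i] ? a->mins[i] : b->mins[i];
        out.maxs[i] = a->maxs[i] > b->maxs[i] ? a->maxs[i] : b->maxs[i];
    }
#endif
    return out;
}
```

<p>With this layout the whole union is two loads, one min or max, and one store per corner; the fourth lane is padding and its value is ignored.</p>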
]]></description><pubDate>Mon, 12 Jan 2026 19:57:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=46593391</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46593391</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46593391</guid></item><item><title><![CDATA[New comment by shihab in "The unreasonable effectiveness of the Fourier transform"]]></title><description><![CDATA[
<p>If you are from the ML/data science world, the analogy that finally unlocked FFT for me is dimensionality reduction using Principal Component Analysis. In both cases, you project data to a new "better" coordinate system ("time to frequency domain"), filter out the basis vectors that have low variance ("ignore high-frequency waves"), and project data back to real space from those truncated dimensions ("IFFT: inverse transform to time domain").<p>Of course some differences exist (e.g. basis vectors are fixed in FFT, unlike PCA).</p>
]]></description><pubDate>Fri, 09 Jan 2026 01:03:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=46548826</link><dc:creator>shihab</dc:creator><comments>https://news.ycombinator.com/item?id=46548826</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46548826</guid></item></channel></rss>