<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: korbip</title><link>https://news.ycombinator.com/user?id=korbip</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 28 Apr 2026 08:18:29 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=korbip" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by korbip in "Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation"]]></title><description><![CDATA[
<p>This was already done here as well: <a href="https://arxiv.org/abs/2507.04239" rel="nofollow">https://arxiv.org/abs/2507.04239</a></p>
]]></description><pubDate>Thu, 05 Feb 2026 04:25:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46895660</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=46895660</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46895660</guid></item><item><title><![CDATA[Show HN: CompoConf – modular configuration for modular systems]]></title><description><![CDATA[
<p>Article URL: <a href="https://korbi.ai/blog/compoconf/">https://korbi.ai/blog/compoconf/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44322873">https://news.ycombinator.com/item?id=44322873</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 19 Jun 2025 21:54:43 +0000</pubDate><link>https://korbi.ai/blog/compoconf/</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=44322873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44322873</guid></item><item><title><![CDATA[New comment by korbip in "Ask HN: How to learn CUDA to professional level"]]></title><description><![CDATA[
<p>I can share a similar PhD story (the result being visible here: <a href="https://github.com/NX-AI/flashrnn">https://github.com/NX-AI/flashrnn</a>). Back then I didn't find any tutorials that covered anything beyond the basics (which are still important).
Once you have understood the basic working principles and architecture of a GPU, I would recommend the following workflow:
1. First create an environment in which you can actually test your kernels against baselines written in a higher-level language (a minimal sketch of such a harness is at the end of this comment).
2. If you don't already have an urgent project, try to improve or re-implement existing problems (MatMul being the first example). Don't get caught up in wanting to handle all size cases. If the goal is learning, pick an example that teaches a specific piece of functionality rather than solving the whole problem.
3. Build up the functionality you want in increasing complexity. Write loops first, then parallelize these loops over the grid. Use global memory first, then put things into shared memory and registers. Use plain matrix multiplication first, then use mma (TensorCore) primitives to speed things up.
4. Iterate over the CUDA C Programming Guide. It covers all (or most) of the functionality you want to learn - but it can't just be read and memorized. You learn it by applying it.
5. Depending on your use case, also consider higher-level abstractions like CUTLASS or ThunderKittens. And if your environment is JAX/PyTorch, use Triton first before going down to the CUDA level.<p>Overall, it will involve some pain for sure, and mastering it (including PTX etc.) will take a lot of time.</p>
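<p>A minimal Python/PyTorch sketch of the test harness from step 1; my_matmul_kernel is a hypothetical stand-in for whatever kernel you are developing, not an existing function:</p>
<pre><code>import torch

def my_matmul_kernel(a, b):
    # Placeholder: swap in a call into your compiled CUDA extension here
    # (e.g. one loaded via torch.utils.cpp_extension.load).
    return a @ b

def check_against_baseline(m=256, n=256, k=256, dtype=torch.float32):
    a = torch.randn(m, k, dtype=dtype)
    b = torch.randn(k, n, dtype=dtype)
    ref = a @ b                      # higher-level baseline
    out = my_matmul_kernel(a, b)     # candidate kernel
    torch.testing.assert_close(out, ref, rtol=1e-4, atol=1e-4)
    print("kernel matches baseline")

check_against_baseline()
</code></pre>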
]]></description><pubDate>Sun, 08 Jun 2025 13:09:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44216793</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=44216793</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44216793</guid></item><item><title><![CDATA[New comment by korbip in "Ask HN: Is anybody building an alternative transformer?"]]></title><description><![CDATA[
<p>Test it out here:<p><a href="https://github.com/NX-AI/mlstm_kernels">https://github.com/NX-AI/mlstm_kernels</a><p><a href="https://huggingface.co/NX-AI/xLSTM-7b" rel="nofollow">https://huggingface.co/NX-AI/xLSTM-7b</a></p>
]]></description><pubDate>Sat, 15 Feb 2025 03:14:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43055573</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=43055573</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43055573</guid></item><item><title><![CDATA[New comment by korbip in "Ask HN: Is anybody building an alternative transformer?"]]></title><description><![CDATA[
<p>There is a LOT of effort in the research community currently:<p>1. Improving the Self-Attention in the Transformer as is, keeping the quadratic complexity, which has some theoretical advantage in principle[1]: the most hyped one is probably DeepSeek's Multi-head Latent Attention[15], which is still kind of Attention - but also somewhat different.<p>2. Linear RNNs: this starts from Linear Attention[2], DeltaNet[3], RWKV[4], Retention[5], Mamba[6], Gated Linear Attention[7], Griffin[8], Based[9], xLSTM[10], TTT[11], Gated DeltaNet[12], Titans[13].<p>They all have an update like:
C_{t} = F_{t} C_{t-1} + i_{t} k_{t} v_{t}^T
with a cell state C and output h_{t} = C_{t}^T q_{t}. There are a few tricks that made these work; they are now very strong competitors to Transformers. The key here is combining a linear associative memory (aka Hopfield Network, aka Fast Weight Programmer, aka State Expansion...) with gating along the sequence similar to the original LSTM (input, forget, output gate) - except that here the gates depend only on the current input, not the previous state, to keep things linear. The linearity is needed to make it parallelizable over the sequence; there are now efforts to add non-linearities again, but let's see.
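<p>As a minimal NumPy sketch of the generic update above (illustrative names, scalar gates, not any specific paper's implementation):</p>
<pre><code>import numpy as np

def linear_rnn_step(C, q, k, v, f, i):
    # C_t = f_t * C_{t-1} + i_t * k_t v_t^T ; h_t = C_t^T q_t
    C = f * C + i * np.outer(k, v)   # fixed-size matrix state (d x d)
    h = C.T @ q                      # read out against the query
    return C, h

d = 4
C = np.zeros((d, d))
rng = np.random.default_rng(0)
for t in range(8):
    q, k, v = rng.standard_normal((3, d))
    C, h = linear_rnn_step(C, q, k, v, f=0.9, i=0.1)  # gates are usually input-dependent
</code></pre>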
Their main benefit and downside are one and the same: they have a fixed-size state, and therefore linear (vs. the Transformer's quadratic) time complexity.<p>At larger scales they have become popular in hybrids with Transformer (Attention) blocks, as there are problems with long-context tasks [14]. A cool thing is that they can also be distilled from pre-trained Transformers without too much performance drop [16].<p>3. Along the sequence dimension, most things can be categorized into these two. Attention and Linear (Associative-Memory-Enhanced) RNNs make heavy use of matrix multiplications, and anything else would be a waste of FLOPs on current GPUs. The essence is how to store information and how to interact with it; there might still be interesting directions, as other comments show.
Other important topics that concern the depth/width of the model are Mixture of Experts and iteration (RNNs) in depth[17].<p>Disclaimer: I'm an author of xLSTM, and we recently released a 7B model [18] trained at NXAI, currently the fastest linear RNN at this scale and performance. Happy to answer more questions on this or the current state of this field of research.<p>[1] <a href="https://arxiv.org/abs/2008.02217" rel="nofollow">https://arxiv.org/abs/2008.02217</a><p>[2] <a href="https://arxiv.org/abs/2006.16236" rel="nofollow">https://arxiv.org/abs/2006.16236</a><p>[3] <a href="https://arxiv.org/pdf/2102.11174" rel="nofollow">https://arxiv.org/pdf/2102.11174</a><p>[4] <a href="https://github.com/BlinkDL/RWKV">https://github.com/BlinkDL/RWKV</a><p>[5] <a href="https://arxiv.org/abs/2307.08621" rel="nofollow">https://arxiv.org/abs/2307.08621</a><p>[6] <a href="https://arxiv.org/pdf/2312.00752" rel="nofollow">https://arxiv.org/pdf/2312.00752</a><p>[7] <a href="https://arxiv.org/abs/2312.06635" rel="nofollow">https://arxiv.org/abs/2312.06635</a><p>[8] <a href="https://arxiv.org/pdf/2402.19427" rel="nofollow">https://arxiv.org/pdf/2402.19427</a><p>[9] <a href="https://arxiv.org/abs/2402.18668" rel="nofollow">https://arxiv.org/abs/2402.18668</a><p>[10] <a href="https://arxiv.org/abs/2405.04517" rel="nofollow">https://arxiv.org/abs/2405.04517</a><p>[11] <a href="https://arxiv.org/abs/2407.04620" rel="nofollow">https://arxiv.org/abs/2407.04620</a><p>[12] <a href="https://arxiv.org/abs/2412.06464" rel="nofollow">https://arxiv.org/abs/2412.06464</a><p>[13] <a href="https://arxiv.org/abs/2501.00663" rel="nofollow">https://arxiv.org/abs/2501.00663</a><p>[14] <a href="https://arxiv.org/abs/2406.07887" rel="nofollow">https://arxiv.org/abs/2406.07887</a><p>[15] <a href="https://arxiv.org/abs/2405.04434" rel="nofollow">https://arxiv.org/abs/2405.04434</a><p>[16] <a href="https://arxiv.org/abs/2410.10254" rel="nofollow">https://arxiv.org/abs/2410.10254</a><p>[17] <a href="http://arxiv.org/abs/2502.05171" rel="nofollow">http://arxiv.org/abs/2502.05171</a><p>[18] <a href="https://huggingface.co/NX-AI/xLSTM-7b" rel="nofollow">https://huggingface.co/NX-AI/xLSTM-7b</a></p>
]]></description><pubDate>Sat, 15 Feb 2025 03:04:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=43055517</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=43055517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43055517</guid></item><item><title><![CDATA[New comment by korbip in "xLSTM: Extended Long Short-Term Memory"]]></title><description><![CDATA[
<p>Thanks! 
I don't see any implementation there.
In any case, we are planning a code release soon.</p>
]]></description><pubDate>Sun, 12 May 2024 16:16:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=40335554</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=40335554</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40335554</guid></item><item><title><![CDATA[New comment by korbip in "xLSTM: Extended Long Short-Term Memory"]]></title><description><![CDATA[
<p>You mainly got it right. Usually one has many scalar 'c' cells that talk to each other via memory mixing. For the sLSTM, you group them into heads, so cells talk only to cells within the same head. The reason we referred to scalar cells here is that they are the fundamental building block. Many of them can be (and usually are) combined, and vector notation is useful in that case.<p>For the matrix 'C' state, there are also heads/cells in the sense that you have multiple of them, but they don't talk to each other. So yes, you can view that as a 3D tensor (see the shape sketch below). And here, the matrix is the fundamental building block / concept.</p>
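<p>A shape-only sketch of that view in NumPy (not actual xLSTM code; num_heads and head_dim are illustrative):</p>
<pre><code>import numpy as np

num_heads, head_dim = 4, 16

# sLSTM: many scalar c cells grouped into heads; memory mixing happens
# only among the cells within one head.
c_slstm = np.zeros((num_heads, head_dim))

# mLSTM: one matrix C per head; heads don't talk to each other, so the
# full state can be viewed as a 3D tensor.
C_mlstm = np.zeros((num_heads, head_dim, head_dim))
</code></pre>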
]]></description><pubDate>Sun, 12 May 2024 16:13:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=40335533</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=40335533</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40335533</guid></item><item><title><![CDATA[New comment by korbip in "xLSTM: Extended Long Short-Term Memory"]]></title><description><![CDATA[
<p>This was formulated a bit unclearly. It is not possible to parallelize over the sequence dimension for training the way it is for Transformers. Over the batch dimension you can always parallelize.</p>
]]></description><pubDate>Wed, 08 May 2024 13:56:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=40298173</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=40298173</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40298173</guid></item><item><title><![CDATA[New comment by korbip in "xLSTM: Extended Long Short-Term Memory"]]></title><description><![CDATA[
<p>For language in general it seems fine, but there might indeed be specific tasks where it is necessary.</p>
]]></description><pubDate>Wed, 08 May 2024 13:54:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=40298149</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=40298149</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40298149</guid></item><item><title><![CDATA[New comment by korbip in "xLSTM: Extended Long Short-Term Memory"]]></title><description><![CDATA[
<p>Thank you! I can say that it is not really a limiting factor at the scales reported in the paper. So, xLSTM[7:1] is pretty much on par with xLSTM[1:0] in speed.
We show that it is helpful on toy tasks, and it shows even better sequence extrapolation performance, so yes.</p>
]]></description><pubDate>Wed, 08 May 2024 08:45:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40295734</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=40295734</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40295734</guid></item><item><title><![CDATA[New comment by korbip in "xLSTM: Extended Long Short-Term Memory"]]></title><description><![CDATA[
<p>Disclaimer: I'm a shared first author of this paper.<p>As a clarification: the training speed will be on par with FlashAttention-2 when fully optimized and only including the mLSTM. For decoding/inference, both are very close to Mamba, as xLSTM is a recurrent architecture. The sLSTM has memory mixing, i.e. state-tracking capabilities, for problems that Transformers and State Space Models (and any other sequence-parallelizable architecture) fundamentally cannot solve.</p>
]]></description><pubDate>Wed, 08 May 2024 08:30:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=40295657</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=40295657</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40295657</guid></item><item><title><![CDATA[Show HN: OpenPV – Photovoltaic Potential in Bavaria (and Beyond?)]]></title><description><![CDATA[
<p>Photovoltaic potential, calculated live and privacy-friendly in your browser via WebGL, using open LoD2 and laser point data from the Bavarian Public Agency for Digitization, High-Speed Internet and Surveying, including laser points for tree occlusions and other non-LoD2 objects. Try it with the Bavarian Hall of Fame next to the Oktoberfest grounds:<p>Hall of Fame, Munich, Bavaria<p>Note that the point clouds can be a lot of data to download if enabled (~50 MB+). Currently limited to Bavaria, Germany.<p>Thanks for your feedback!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37853754">https://news.ycombinator.com/item?id=37853754</a></p>
<p>Points: 6</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 12 Oct 2023 05:32:07 +0000</pubDate><link>https://www.openpv.de</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=37853754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37853754</guid></item><item><title><![CDATA[Self-Expanding Neural Networks]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2307.04526">https://arxiv.org/abs/2307.04526</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=36707110">https://news.ycombinator.com/item?id=36707110</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 13 Jul 2023 10:28:11 +0000</pubDate><link>https://arxiv.org/abs/2307.04526</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=36707110</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36707110</guid></item><item><title><![CDATA[DeepMind AlphaCode: AI learns to solve programming competition tasks [pdf]]]></title><description><![CDATA[
<p>Article URL: <a href="https://storage.googleapis.com/deepmind-media/AlphaCode/competition_level_code_generation_with_alphacode.pdf">https://storage.googleapis.com/deepmind-media/AlphaCode/competition_level_code_generation_with_alphacode.pdf</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=30198672">https://news.ycombinator.com/item?id=30198672</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 03 Feb 2022 20:54:21 +0000</pubDate><link>https://storage.googleapis.com/deepmind-media/AlphaCode/competition_level_code_generation_with_alphacode.pdf</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=30198672</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30198672</guid></item><item><title><![CDATA[Transformers Connected to Hopfield Networks]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2008.02217">https://arxiv.org/abs/2008.02217</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=27531948">https://news.ycombinator.com/item?id=27531948</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 16 Jun 2021 18:39:54 +0000</pubDate><link>https://arxiv.org/abs/2008.02217</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=27531948</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27531948</guid></item><item><title><![CDATA[Norona – We make rapid Covid tests visible]]></title><description><![CDATA[
<p>Article URL: <a href="https://noronatest.me">https://noronatest.me</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=25650759">https://news.ycombinator.com/item?id=25650759</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 05 Jan 2021 20:44:14 +0000</pubDate><link>https://noronatest.me</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=25650759</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=25650759</guid></item><item><title><![CDATA[I Can. We Want]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.korbip.de/2020/06/06/i-can/">https://www.korbip.de/2020/06/06/i-can/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=24217233">https://news.ycombinator.com/item?id=24217233</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 19 Aug 2020 22:01:26 +0000</pubDate><link>https://www.korbip.de/2020/06/06/i-can/</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=24217233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24217233</guid></item><item><title><![CDATA[New comment by korbip in "PyTorch-ProbGraph – Hierarchical Probabilistic Graphical Models in PyTorch"]]></title><description><![CDATA[
<p>Feel free to add them. :)</p>
]]></description><pubDate>Mon, 10 Aug 2020 21:06:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=24113995</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=24113995</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24113995</guid></item><item><title><![CDATA[AutoRail – Autonomous, Simple, Energy-Efficient Mobility – A Vision]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.korbip.de/AutoRail/">https://www.korbip.de/AutoRail/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=24103235">https://news.ycombinator.com/item?id=24103235</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 09 Aug 2020 21:50:28 +0000</pubDate><link>https://www.korbip.de/AutoRail/</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=24103235</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24103235</guid></item><item><title><![CDATA[PyTorch-ProbGraph – Hierarchical Probabilistic Graphical Models in PyTorch]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/kpoeppel/pytorch_probgraph">https://github.com/kpoeppel/pytorch_probgraph</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=24102396">https://news.ycombinator.com/item?id=24102396</a></p>
<p>Points: 25</p>
<p># Comments: 4</p>
]]></description><pubDate>Sun, 09 Aug 2020 20:01:07 +0000</pubDate><link>https://github.com/kpoeppel/pytorch_probgraph</link><dc:creator>korbip</dc:creator><comments>https://news.ycombinator.com/item?id=24102396</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24102396</guid></item></channel></rss>