<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: vforno</title><link>https://news.ycombinator.com/user?id=vforno</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 30 Jun 2026 20:24:10 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=vforno" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Thank you! I will keep optimizing the model over time, but you will find everything you need in the README.</p>
]]></description><pubDate>Mon, 29 Jun 2026 20:08:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=48724458</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48724458</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48724458</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Absolutely! With nanoeuler I learned a great deal by testing every detail of the project. Every part you see has been tested several times, so that I could understand it and confirm that it works.</p>
]]></description><pubDate>Mon, 29 Jun 2026 13:35:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48719122</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48719122</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48719122</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Yes, because it runs many separate kernels instead of fusing them aggressively, as PyTorch does with torch.compile. Each pass (norm, matmul, residual, RoPE, etc.) launches its own kernel, which increases launch overhead and memory traffic. cuBLAS helps, but not enough to compensate.</p>
]]></description><pubDate>Mon, 29 Jun 2026 13:11:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48718864</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48718864</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48718864</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Hi, in nanoeuler I use cuBLAS (NVIDIA's heavily optimized library) for all matrix multiplications, with the tensor cores in TF32 mode. It's the same library PyTorch uses underneath, so it's very fast.
What I've written by hand and optimized (and will keep improving) are the kernels for the other parts, such as FlashAttention, which gave a nice 3x speedup, while the large matrix multiplications are delegated to cuBLAS.
Training the 116M model on a 4070 runs well and in reasonable time. Compared to PyTorch it's a bit slower (probably 1.5-2.5x), but nothing dramatic, especially considering that it's all written from scratch without a framework and does not yet include other optimizations that would make it faster. I'm working on it.</p>
]]></description><pubDate>Mon, 29 Jun 2026 12:00:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48718120</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48718120</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48718120</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Thank you very much! If you need any help or have any questions, I'm here.</p>
]]></description><pubDate>Mon, 29 Jun 2026 10:52:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=48717530</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48717530</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48717530</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Hi, thanks for the comment. Nanoeuler started as a study and research project, and it will improve over time. I'll do my best to make the README and the rest more readable. Thank you very much.</p>
]]></description><pubDate>Mon, 29 Jun 2026 05:23:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=48715139</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48715139</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48715139</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Most of the transformer and the SFT!</p>
]]></description><pubDate>Mon, 29 Jun 2026 05:20:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=48715119</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48715119</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48715119</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Hi, the uploads landed one right after another because this was a long, step-by-step research project, and I tested the code on another machine. I admit that I'm slowly catching up on commits across all my projects. As for Gutenberg and Shakespeare, they were the best tests I could run, but I'll keep improving!</p>
]]></description><pubDate>Sun, 28 Jun 2026 22:09:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48712227</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48712227</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48712227</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Hi, a couple of hours, not too long, and that includes SFT!</p>
]]></description><pubDate>Sun, 28 Jun 2026 22:06:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=48712196</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48712196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48712196</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Yes, tested on a 4070 Ti 16GB, and everything worked without problems!</p>
]]></description><pubDate>Sun, 28 Jun 2026 20:45:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48711472</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48711472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48711472</guid></item><item><title><![CDATA[Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch]]></title><description><![CDATA[
<p>Hi everyone,<p>I started working on nanoeuler after the ban of Anthropic's fable, because my ambition and dream is to work in the AI field at Anthropic. Two reasons led me to create nanoeuler: (1) interfacing with LLMs does not mean understanding how they are built, and (2) working on an LLM at a very low level lets me understand how parameters, data, and model growth relate, how the GPU works, and how some layers can be optimized.<p>So I approached it as research, growing nanoeuler one step at a time, starting from Shakespeare.txt and studying what a text generation model understands at 23 million parameters. For example, at that size nanoeuler had learned that Name: starts a line, and it wrote the rest of that line sensibly.<p>I wrote everything in CUDA because I did not want any intermediary between the model, in training and inference, and what it had to do. SFT and much more, even at small scale, were really useful for understanding the various steps needed to turn an LLM into a chatbot.<p>Any feedback, help, or suggestions are absolutely welcome!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48710778">https://news.ycombinator.com/item?id=48710778</a></p>
<p>Points: 53</p>
<p># Comments: 25</p>
]]></description><pubDate>Sun, 28 Jun 2026 19:38:14 +0000</pubDate><link>https://github.com/JustVugg/nanoeuler</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48710778</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48710778</guid></item><item><title><![CDATA[Show HN: Loomabase – Column-level CRDT sync for SQLite + Postgres]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/JustVugg/loomabase">https://github.com/JustVugg/loomabase</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48685116">https://news.ycombinator.com/item?id=48685116</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 26 Jun 2026 10:57:07 +0000</pubDate><link>https://github.com/JustVugg/loomabase</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48685116</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48685116</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>E-mail sent!</p>
]]></description><pubDate>Mon, 22 Jun 2026 19:24:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48634821</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48634821</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48634821</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch"]]></title><description><![CDATA[
<p>Of course, I'll write to you right away! Thank you so much!</p>
]]></description><pubDate>Mon, 22 Jun 2026 19:17:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48634729</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48634729</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48634729</guid></item><item><title><![CDATA[Show HN: NanoEuler – GPT-2 scale model in pure C/CUDA from scratch]]></title><description><![CDATA[
<p>Hi everyone,<p>I started working on nanoeuler after the ban of Anthropic's fable, because my ambition and dream is to work in the AI field at Anthropic. Two reasons led me to create nanoeuler: first, interfacing with LLMs does not mean understanding how they are built, and second, working on an LLM at a very low level lets me understand how parameters, data, and model growth relate, how the GPU works, and how some layers can be optimized. So I approached it as research, growing nanoeuler one step at a time, starting from Shakespeare.txt and studying what a text generation model understands at 23 million parameters. For example, at that size nanoeuler had learned that Name: starts a line, and it wrote the rest of that line sensibly. I wrote everything in CUDA because I did not want any intermediary between the model, in training and inference, and what it had to do. SFT and much more, even at small scale, were really useful for understanding the various steps needed to turn an LLM into a chatbot. Any feedback, help, or suggestions are absolutely welcome!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48601472">https://news.ycombinator.com/item?id=48601472</a></p>
<p>Points: 6</p>
<p># Comments: 3</p>
]]></description><pubDate>Fri, 19 Jun 2026 18:18:35 +0000</pubDate><link>https://github.com/JustVugg/nanoeuler</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48601472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48601472</guid></item><item><title><![CDATA[Show HN: Loomabase – Offline-first column-level CRDT sync for SQLite + Postgres]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/JustVugg/loomabase">https://github.com/JustVugg/loomabase</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48585258">https://news.ycombinator.com/item?id=48585258</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 18 Jun 2026 13:47:28 +0000</pubDate><link>https://github.com/JustVugg/loomabase</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48585258</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48585258</guid></item><item><title><![CDATA[Show HN: Loomabase – Offline-first sync for SQLite and PostgreSQL]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/JustVugg/loomabase">https://github.com/JustVugg/loomabase</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48509082">https://news.ycombinator.com/item?id=48509082</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 12 Jun 2026 20:26:47 +0000</pubDate><link>https://github.com/JustVugg/loomabase</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48509082</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48509082</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: PolyCSS – A 3D engine for the DOM (without WebGL)"]]></title><description><![CDATA[
<p>Very interesting! It also seems very lightweight. Do you have any thoughts on whether PolyCSS could be used for 3D games?</p>
]]></description><pubDate>Wed, 27 May 2026 14:21:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48294827</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48294827</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48294827</guid></item><item><title><![CDATA[New comment by vforno in "Show HN: Lavern: an open-source multi-agent legal system (Apache 2.0)"]]></title><description><![CDATA[
<p>Hi guys, really cool project!
I created judicex <a href="https://github.com/JustVugg/judicex" rel="nofollow">https://github.com/JustVugg/judicex</a>; maybe we can help each other build one big open-source software project for the legal space!</p>
]]></description><pubDate>Tue, 26 May 2026 13:04:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=48279261</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48279261</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48279261</guid></item><item><title><![CDATA[Show HN: Judicex – Open-source legal AI that abstains instead of hallucinating]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/JustVugg/judicex">https://github.com/JustVugg/judicex</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48277539">https://news.ycombinator.com/item?id=48277539</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 26 May 2026 10:09:09 +0000</pubDate><link>https://github.com/JustVugg/judicex</link><dc:creator>vforno</dc:creator><comments>https://news.ycombinator.com/item?id=48277539</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48277539</guid></item></channel></rss>