<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: d3m0t3p</title><link>https://news.ycombinator.com/user?id=d3m0t3p</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 29 Apr 2026 08:09:47 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=d3m0t3p" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by d3m0t3p in "AGENTS.md outperforms skills in our agent evals"]]></title><description><![CDATA[
<p>Yeah, but the goal is not to bloat the context space.
Here you "waste" context by providing non-useful information.
What they did instead is put an index of the documentation into the context; the LLM can then fetch the documentation. It is the same idea as skills, but it apparently works better without the agentic part of skills.
Furthermore, instead of having a nice index pointing to the docs, they compressed it.</p>
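The idea above can be sketched in a few lines (everything here is hypothetical, not the article's actual setup): only a lightweight index enters the context window, and the full document is pulled via a tool call on demand.

```python
# Toy sketch of "index in context, fetch on demand".
# Doc names, summaries, and contents are all made up.
DOCS = {
    "auth.md": ("token-based auth", "Full text of the auth guide ..."),
    "deploy.md": ("deploy pipeline", "Full text of the deploy guide ..."),
}

def index_prompt():
    # Only the lightweight index (name + one-line summary) goes into context.
    return "\n".join(f"- {name}: {summary}" for name, (summary, _) in DOCS.items())

def read_doc(name):
    # Tool call: the model fetches the full document only when it needs it.
    return DOCS[name][1]

print(index_prompt())
```

The context cost is then proportional to the number of docs, not their total size.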
]]></description><pubDate>Thu, 29 Jan 2026 23:49:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46818607</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46818607</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46818607</guid></item><item><title><![CDATA[New comment by d3m0t3p in "I asked AI researchers and economists about SWE career strategies given AI"]]></title><description><![CDATA[
<p>Same, Firefox iOS</p>
]]></description><pubDate>Mon, 29 Dec 2025 12:30:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=46420008</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46420008</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46420008</guid></item><item><title><![CDATA[New comment by d3m0t3p in "History LLMs: Models trained exclusively on pre-1913 texts"]]></title><description><![CDATA[
<p>The model is fine-tuned for chat behavior. So the style might be due to:
- fine-tuning
- more stylised text in the corpus; English evolved a lot in the last century.</p>
]]></description><pubDate>Thu, 18 Dec 2025 23:51:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46320494</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46320494</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46320494</guid></item><item><title><![CDATA[New comment by d3m0t3p in "F-35 Fighter Jet's C++ Coding Standards [pdf]"]]></title><description><![CDATA[
<p>Is that really the only thing you managed to remember?</p>
]]></description><pubDate>Sun, 07 Dec 2025 22:10:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46185732</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46185732</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46185732</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Nvidia DGX Spark: When benchmark numbers meet production reality"]]></title><description><![CDATA[
<p>Because the ML ecosystem is more mature on the Nvidia side. Software-wise, the CUDA platform is more advanced, and it will be hard for AMD to catch up. It is good to see competition, though.</p>
]]></description><pubDate>Sun, 26 Oct 2025 20:59:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45715162</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45715162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45715162</guid></item><item><title><![CDATA[New comment by d3m0t3p in "The Missing Semester of Your CS Education (2020)"]]></title><description><![CDATA[
<p>In my own studies, software engineering was mostly about structuring code and coding patterns such as visitor, singleton, etc., i.e. how to create a maintainable codebase</p>
]]></description><pubDate>Sat, 25 Oct 2025 10:57:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=45702827</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45702827</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45702827</guid></item><item><title><![CDATA[New comment by d3m0t3p in "How does gradient descent work?"]]></title><description><![CDATA[
<p>Would you have some literature about that?</p>
]]></description><pubDate>Tue, 07 Oct 2025 21:42:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=45509287</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45509287</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45509287</guid></item><item><title><![CDATA[New comment by d3m0t3p in "How does gradient descent work?"]]></title><description><![CDATA[
<p>This sounds a lot like what the Muon / Shampoo optimizers do.</p>
]]></description><pubDate>Tue, 07 Oct 2025 20:11:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=45508205</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45508205</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45508205</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Apertus: An open, transparent, multilingual language model"]]></title><description><![CDATA[
<p>Interesting to see that they enforce a retroactive opt-out for data collection.
I wonder how they do that:
what if the model has already been trained on your data when you opt out?</p>
]]></description><pubDate>Tue, 02 Sep 2025 11:02:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=45101385</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45101385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45101385</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Open models by OpenAI"]]></title><description><![CDATA[
<p>You can batch only if you have distinct chats in parallel.</p>
]]></description><pubDate>Tue, 05 Aug 2025 20:38:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44803946</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44803946</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44803946</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Adding lookbehinds to rust-lang/regex"]]></title><description><![CDATA[
<p>Nice to see a master's thesis highlighted on the research group's page</p>
]]></description><pubDate>Tue, 15 Jul 2025 16:50:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44573220</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44573220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44573220</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Cognition (Devin AI) to Acquire Windsurf"]]></title><description><![CDATA[
<p>Your first link is (in my opinion) highly biased in its sample: they hired maintainers of open-source repos, i.e. people with many years of experience on their specific repo.<p>So indeed, IF you are in that case (many years of experience on the same project), it is not useful; otherwise it might be.
This means it might be useful for juniors and for experienced devs who are switching projects.
It is a tool like any other; indeed, if you have a workflow that you have optimized through years of use, it won't help.</p>
]]></description><pubDate>Mon, 14 Jul 2025 20:48:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44565070</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44565070</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44565070</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Show HN: Refine – A Local Alternative to Grammarly"]]></title><description><![CDATA[
<p>It is Gemma 3n. I can't give feedback on the battery hit yet, but I would not expect anything bad, as these models were developed for much smaller devices (phones)</p>
]]></description><pubDate>Mon, 14 Jul 2025 08:08:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=44557561</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44557561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44557561</guid></item><item><title><![CDATA[New comment by d3m0t3p in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>Hey, really cool project, I’m excited to see the outcome.
Is there a blog post / paper summarizing how you are doing it?
Also, which research group at ETH is currently working on it?</p>
]]></description><pubDate>Sat, 12 Jul 2025 10:13:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44540873</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44540873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44540873</guid></item><item><title><![CDATA[New comment by d3m0t3p in "A non-anthropomorphized view of LLMs"]]></title><description><![CDATA[
<p>Do they?
LLMs embed the token sequence N^{L} into R^{LxD}; after some attention the output is also R^{LxD}, then we apply a projection to the vocabulary and get R^{LxV}, i.e. for each token a likelihood over the vocabulary.
In the attention you can have multi-head attention (or whatever variant is fashionable: GQA, MLA) and therefore multiple representations, but each is always tied to a token. I would argue that there is no hidden state independent of a token.<p>Whereas LSTMs, or structured state-space models for example, have a state that is updated and not tied to a specific item in the sequence.<p>I would argue that his text is easily understandable except for the function notation; explaining that you can compute a probability based on the previous words is understandable by everyone, without resorting to anthropomorphic terminology</p>
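The shapes described above can be checked with a toy single-head causal attention in numpy (made-up sizes, random weights; a sketch of the shape bookkeeping, not any particular model):

```python
import numpy as np

L, D, V = 4, 8, 10  # sequence length, model dim, vocab size (made up)
rng = np.random.default_rng(0)

X = rng.normal(size=(L, D))            # embedded tokens: R^{LxD}
W_qkv = rng.normal(size=(3, D, D))
Q, K, Vm = (X @ W for W in W_qkv)      # each R^{LxD}

scores = Q @ K.T / np.sqrt(D)          # R^{LxL}
mask = np.triu(np.ones((L, L)), k=1).astype(bool)
scores[mask] = -np.inf                 # causal mask: no looking ahead
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)     # softmax over keys, rows sum to 1

H = A @ Vm                             # still R^{LxD}: one vector per token
W_out = rng.normal(size=(D, V))
logits = H @ W_out                     # R^{LxV}: per-token scores over the vocab

assert H.shape == (L, D) and logits.shape == (L, V)
```

Every intermediate tensor keeps a leading L axis, which is the point: each representation stays tied to a token position, unlike an LSTM's carried state.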
]]></description><pubDate>Sun, 06 Jul 2025 23:46:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44485223</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44485223</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44485223</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Occurences of swearing in the Linux kernel source code over time"]]></title><description><![CDATA[
<p>You can check company names too! It's interesting to see that by default the graph shows Google and Apple.
But adding Meta and IBM really changes the plot.<p>Meta went from 2K to 10K+ between 2018 and 2025, while IBM seems to have stopped contributing in 2008. Since the merger with Red Hat, I would have expected to see them increase again, but neither Red Hat nor IBM seems to have increased.
<a href="https://www.vidarholen.net/contents/wordcount/#redhat,oracle,google,apple,microsoft,IBM,facebook,meta" rel="nofollow">https://www.vidarholen.net/contents/wordcount/#redhat,oracle...</a>
Not sure their name appearing means they are contributing, though.<p>Really cool project,</p>
]]></description><pubDate>Mon, 16 Jun 2025 08:56:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=44287731</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44287731</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44287731</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Show HN: WhatsApp MCP Server"]]></title><description><![CDATA[
<p>Apparently this is the case: <a href="https://github.com/tulir/whatsmeow/discussions/199" rel="nofollow">https://github.com/tulir/whatsmeow/discussions/199</a></p>
]]></description><pubDate>Mon, 31 Mar 2025 11:23:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=43533679</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=43533679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43533679</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Ask HN: Who is hiring? (March 2025)"]]></title><description><![CDATA[
<p>Hi, do you offer visa sponsorship / allow remote work from the EU (GMT+2)?</p>
]]></description><pubDate>Mon, 03 Mar 2025 23:30:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43248134</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=43248134</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43248134</guid></item><item><title><![CDATA[New comment by d3m0t3p in "CTO / cofounder exit deal after 1.5y at 600k revenue without SHA"]]></title><description><![CDATA[
<p>Would you mind sharing your company name?
I'm a master's student in AI, and after finishing my master's thesis at IBM this summer, I'll be looking for jobs.</p>
]]></description><pubDate>Sun, 05 Jan 2025 13:07:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42601499</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=42601499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42601499</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Beyond Gradient Averaging in Parallel Optimization"]]></title><description><![CDATA[
<p>Well, it's just like stochastic gradient descent, if you think about it. Normal gradient descent is computed on the whole training set; the stochastic gradient is computed on a batch (a subset of the training set), and in the distributed case we process two batches at once by taking the gradient of each in parallel.
The intuition holds IMO, but indeed, applying the first batch's update and then the second is not the same as applying the mean update.<p>This is indeed super cool!</p>
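That last point can be checked numerically with a toy 1-D least-squares problem (all numbers made up): one step with the averaged gradient differs from two sequential per-batch steps, because the second sequential gradient is evaluated at already-updated weights.

```python
import numpy as np

# Toy 1-D least squares: loss_i(w) = 0.5 * (w*x_i - y_i)^2
# so grad_i(w) = x_i * (w*x_i - y_i). Two "batches" of one point each.
x = np.array([1.0, 2.0])
y = np.array([2.0, 2.0])
lr, w0 = 0.1, 0.0

def grad(w, i):
    return x[i] * (w * x[i] - y[i])

# Parallel case: one step with the mean of the two batch gradients.
w_avg = w0 - lr * 0.5 * (grad(w0, 0) + grad(w0, 1))

# Sequential case: batch 0's update, then batch 1's, each at the latest weights.
w_seq = w0 - lr * grad(w0, 0)
w_seq = w_seq - lr * grad(w_seq, 1)

print(w_avg, w_seq)  # the two schemes land at different weights
```

The gap shrinks as the learning rate does, which is why the averaged-gradient intuition still works well in practice.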
]]></description><pubDate>Tue, 31 Dec 2024 01:16:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=42555421</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=42555421</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42555421</guid></item></channel></rss>