<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: nielstron</title><link>https://news.ycombinator.com/user?id=nielstron</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 08 Jun 2026 16:15:35 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=nielstron" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Coding Agents Are "Fixing" Correct Code]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.sri.inf.ethz.ch/blog/fixedcode">https://www.sri.inf.ethz.ch/blog/fixedcode</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47504519">https://news.ycombinator.com/item?id=47504519</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 24 Mar 2026 15:52:57 +0000</pubDate><link>https://www.sri.inf.ethz.ch/blog/fixedcode</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47504519</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47504519</guid></item><item><title><![CDATA[New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"]]></title><description><![CDATA[
<p>Yes, that's a great summary and I broadly agree.<p>Note that by different prompt types I mean different types of meta-prompts used to generate the AGENTS.md. All of these are quite useless. Some additional experiments not in the paper showed that other automated approaches are also useless ("memory"-creating methods, broadly speaking).</p>
]]></description><pubDate>Tue, 17 Feb 2026 17:27:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47050174</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47050174</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47050174</guid></item><item><title><![CDATA[New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"]]></title><description><![CDATA[
<p>It could... but as pointed out by others, the significance is unclear and per-model results have even fewer samples than the benchmark average. So: maybe :)</p>
]]></description><pubDate>Tue, 17 Feb 2026 17:26:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47050160</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47050160</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47050160</guid></item><item><title><![CDATA[New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"]]></title><description><![CDATA[
<p>Hey, thanks for your review, a paper author here.<p>Regarding the 4% improvement for human-written AGENTS.md: this would be huge indeed if it were a _consistent_ improvement. However, on Sonnet 4.5, for example, performance _drops_ by over 2%. Qwen3 benefits most and GPT-5.2 improves by 1-2%.<p>The LLM-generated prompts follow the coding agent recommendations. We also show an ablation over different prompt types, and none have consistently better performance.<p>But ultimately I agree with your post. In fact we do recommend writing good AGENTS.md files, manually and in a targeted way. This is emphasized, for example, at the end of our abstract and conclusion.</p>
]]></description><pubDate>Tue, 17 Feb 2026 09:55:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47045565</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47045565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47045565</guid></item><item><title><![CDATA[New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"]]></title><description><![CDATA[
<p>This is the life of an LLM researcher. We literally ran the last experiments only a month ago on what were the latest models back then...</p>
]]></description><pubDate>Tue, 17 Feb 2026 07:06:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044555</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47044555</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044555</guid></item><item><title><![CDATA[New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"]]></title><description><![CDATA[
<p>Exactly my thoughts... the model should just auto-ingest README and CONTRIBUTING when started.</p>
]]></description><pubDate>Tue, 17 Feb 2026 07:04:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044540</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47044540</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044540</guid></item><item><title><![CDATA[New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"]]></title><description><![CDATA[
<p>Hey, paper author here.
We did try to get an even sample - we include both SWE-bench repos (which are large, popular and mostly human-written) and a sample of smaller, more recent repositories with existing AGENTS.md (these tend to contain LLM-written code of course). Our findings generalize across both these samples. What is arguably missing are small repositories of completely human-written code, but these are quite difficult to obtain nowadays.</p>
]]></description><pubDate>Tue, 17 Feb 2026 07:00:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044517</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47044517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044517</guid></item><item><title><![CDATA[New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"]]></title><description><![CDATA[
<p>Hey, a paper author here :)
I agree, if you know LLMs well, it shouldn't be too surprising that autogenerated context files do not help - yet this is the default recommendation by major AI companies, which we wanted to scrutinize.<p>> Their definition of context excludes prescriptive specs/requirements files.<p>Can you explain a bit what you mean here? If the context file specifies a desired behavior, we do check whether the LLM follows it, and this generally seems to work (Section 4.3).</p>
]]></description><pubDate>Tue, 17 Feb 2026 06:58:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044493</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=47044493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044493</guid></item><item><title><![CDATA[Transcribe your aunts post cards with Gemini 3 Pro]]></title><description><![CDATA[
<p>Article URL: <a href="https://leserli.ch/ocr/">https://leserli.ch/ocr/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46926549">https://news.ycombinator.com/item?id=46926549</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 07 Feb 2026 19:03:35 +0000</pubDate><link>https://leserli.ch/ocr/</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=46926549</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46926549</guid></item><item><title><![CDATA[New comment by nielstron in "K2-Think: A Parameter-Efficient Reasoning System"]]></title><description><![CDATA[
<p>Debunking the Claims of K2-Think <a href="https://www.sri.inf.ethz.ch/blog/k2think" rel="nofollow">https://www.sri.inf.ethz.ch/blog/k2think</a></p>
]]></description><pubDate>Sat, 13 Sep 2025 12:30:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=45231514</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=45231514</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45231514</guid></item><item><title><![CDATA[New comment by nielstron in "K2-Think: A Parameter-Efficient Reasoning System"]]></title><description><![CDATA[
<p>Debunking the Claims of K2-Think <a href="https://www.sri.inf.ethz.ch/blog/k2think" rel="nofollow">https://www.sri.inf.ethz.ch/blog/k2think</a></p>
]]></description><pubDate>Sat, 13 Sep 2025 12:30:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=45231512</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=45231512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45231512</guid></item><item><title><![CDATA[Debunking the Claims of K2-Think]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.sri.inf.ethz.ch/blog/k2think">https://www.sri.inf.ethz.ch/blog/k2think</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45221629">https://news.ycombinator.com/item?id=45221629</a></p>
<p>Points: 6</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 12 Sep 2025 12:58:28 +0000</pubDate><link>https://www.sri.inf.ethz.ch/blog/k2think</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=45221629</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45221629</guid></item><item><title><![CDATA[New comment by nielstron in "Type-constrained code generation with language models"]]></title><description><![CDATA[
<p>noted. we'll make sure to criticize Turing-complete type systems more thoroughly next time :))</p>
]]></description><pubDate>Wed, 14 May 2025 12:01:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=43983472</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=43983472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43983472</guid></item><item><title><![CDATA[New comment by nielstron in "Type-constrained code generation with language models"]]></title><description><![CDATA[
<p>Yes this work is super cool too! Note that LSPs can not guarantee resolving the necessary types that we use to ensure the prefix property, which we leverage to avoid backtracking and generation loops.</p>
]]></description><pubDate>Wed, 14 May 2025 08:34:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=43982257</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=43982257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43982257</guid></item><item><title><![CDATA[New comment by nielstron in "Type-constrained code generation with language models"]]></title><description><![CDATA[
<p>thank you!</p>
]]></description><pubDate>Wed, 14 May 2025 06:16:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43981409</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=43981409</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43981409</guid></item><item><title><![CDATA[New comment by nielstron in "Type-constrained code generation with language models"]]></title><description><![CDATA[
<p>re detecting and switching language: you could run several constraint systems in parallel and switch as soon as one of them rejects the input and another accepts it<p>re backtracking: a core part of this paper is ensuring a prefix property. that is, there is always a legitimate completion and the model <i>can not</i> "corner" itself!<p>research needs to be done on which kinds of languages and language features this prefix property can be ensured for</p>
]]></description><pubDate>Wed, 14 May 2025 06:14:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=43981397</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=43981397</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43981397</guid></item><item><title><![CDATA[New comment by nielstron in "Type-constrained code generation with language models"]]></title><description><![CDATA[
<p>the problem with LSPs is that they don't guarantee generating a type annotation that we can use for constraints, i.e. we can not ensure the prefix property using LSPs. so we had to roll our own :)<p>Pulling in more features to help the system is definitely worth looking into!</p>
]]></description><pubDate>Wed, 14 May 2025 06:11:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=43981374</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=43981374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43981374</guid></item><item><title><![CDATA[New comment by nielstron in "Type-constrained code generation with language models"]]></title><description><![CDATA[
<p>The downside is that you need to properly preprocess code, have less non-code training data, and cannot adapt easily to new programming languages</p>
]]></description><pubDate>Wed, 14 May 2025 06:10:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43981362</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=43981362</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43981362</guid></item><item><title><![CDATA[New comment by nielstron in "Type-constrained code generation with language models"]]></title><description><![CDATA[
<p>we were thinking about doing exactly this, the closest current work is probably the amazing "Learning Formal Mathematics from Intrinsic Motivation" by Poesia et al. (they use constraints to increase the likelihood of generating correct theorems/proofs during RL)<p><a href="https://arxiv.org/abs/2407.00695" rel="nofollow">https://arxiv.org/abs/2407.00695</a></p>
]]></description><pubDate>Wed, 14 May 2025 06:08:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43981352</link><dc:creator>nielstron</dc:creator><comments>https://news.ycombinator.com/item?id=43981352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43981352</guid></item></channel></rss>