Hacker News: nielstron

Coding Agents Are "Fixing" Correct Code

nielstron — Tue, 24 Mar 2026 15:52:57 +0000

Article URL: https://www.sri.inf.ethz.ch/blog/fixedcode

Comments URL: https://news.ycombinator.com/item?id=47504519

Points: 3

# Comments: 1

New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"

nielstron — Tue, 17 Feb 2026 17:27:03 +0000

Yes that's a great summary and I agree broadly.

Note that by "different prompt types" I mean the different meta-prompts used to generate the AGENTS.md. All of them are quite useless. Some additional experiments not in the paper showed that other automated approaches (broadly speaking, "memory"-creating methods) are also useless.

New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"

nielstron — Tue, 17 Feb 2026 17:26:14 +0000

It could... but as others have pointed out, the significance is unclear, and the per-model results have even fewer samples than the benchmark average. So: maybe :)

New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"

nielstron — Tue, 17 Feb 2026 09:55:27 +0000

Hey, thanks for your review; paper author here.

Regarding the 4% improvement for human-written AGENTS.md: this would indeed be huge if it were a _consistent_ improvement. However, on Sonnet 4.5, for example, performance _drops_ by over 2%. Qwen3 benefits the most, and GPT-5.2 improves by 1-2%.

The LLM-generated prompts follow the coding agent recommendations. We also show an ablation over different prompt types, and none have consistently better performance.

But ultimately I agree with your post. In fact, we do recommend writing good AGENTS.md files manually and in a targeted way. This is emphasized, for example, at the end of our abstract and conclusion.

New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"

nielstron — Tue, 17 Feb 2026 07:06:30 +0000

This is the life of an LLM researcher. We literally ran the last experiments only a month ago, on what were the latest models back then...

New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"

nielstron — Tue, 17 Feb 2026 07:04:07 +0000

Exactly my thoughts... the model should just auto-ingest README and CONTRIBUTING when started.

New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"

nielstron — Tue, 17 Feb 2026 07:00:55 +0000

Hey, paper author here. We did try to get an even sample - we include both SWE-bench repos (which are large, popular and mostly human-written) and a sample of smaller, more recent repositories with existing AGENTS.md (these tend to contain LLM written code of course). Our findings generalize across both these samples. What is arguably missing are small repositories of completely human-written code, but this is quite difficult to obtain nowadays.

New comment by nielstron in "Evaluating AGENTS.md: are they helpful for coding agents?"

nielstron — Tue, 17 Feb 2026 06:58:15 +0000

Hey, a paper author here :) I agree: if you know LLMs well, it shouldn't be too surprising that autogenerated context files don't help. Yet this is the default recommendation by major AI companies, which is what we wanted to scrutinize.

> Their definition of context excludes prescriptive specs/requirements files.

Can you explain a bit what you mean here? If the context file specifies a desired behavior, we do check whether the LLM follows it, and this generally seems to work (Section 4.3).

Transcribe your aunt's postcards with Gemini 3 Pro

nielstron — Sat, 07 Feb 2026 19:03:35 +0000

Article URL: https://leserli.ch/ocr/

Comments URL: https://news.ycombinator.com/item?id=46926549

Points: 1

# Comments: 0

New comment by nielstron in "K2-Think: A Parameter-Efficient Reasoning System"

nielstron — Sat, 13 Sep 2025 12:30:20 +0000

Debunking the Claims of K2-Think https://www.sri.inf.ethz.ch/blog/k2think

New comment by nielstron in "K2-Think: A Parameter-Efficient Reasoning System"

nielstron — Sat, 13 Sep 2025 12:30:10 +0000

Debunking the Claims of K2-Think https://www.sri.inf.ethz.ch/blog/k2think

Debunking the Claims of K2-Think

nielstron — Fri, 12 Sep 2025 12:58:28 +0000

Article URL: https://www.sri.inf.ethz.ch/blog/k2think

Comments URL: https://news.ycombinator.com/item?id=45221629

Points: 6

# Comments: 0

New comment by nielstron in "Type-constrained code generation with language models"

nielstron — Wed, 14 May 2025 12:01:53 +0000

noted. we'll make sure to criticize Turing-complete type systems more thoroughly next time :))

New comment by nielstron in "Type-constrained code generation with language models"

nielstron — Wed, 14 May 2025 08:34:27 +0000

Yes, this work is super cool too! Note that LSPs cannot guarantee resolving the types we need to ensure the prefix property, which we leverage to avoid backtracking and generation loops.

New comment by nielstron in "Type-constrained code generation with language models"

nielstron — Wed, 14 May 2025 06:16:20 +0000

thank you!

New comment by nielstron in "Type-constrained code generation with language models"

nielstron — Wed, 14 May 2025 06:14:22 +0000

re detecting and switching language: you could run several constraint systems in parallel and switch as soon as one of them rejects the input and another accepts it
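A minimal sketch of the "run several constraint systems in parallel" idea, under loudly stated assumptions: each constraint system is modeled as a hypothetical prefix checker (a function saying whether a string can still be completed in its language), and the two toy "languages" below (binary and decimal digit strings) stand in for real programming languages. The active system is kept until it rejects the input, at which point we switch to another system that still accepts it. This is an illustration, not the paper's implementation.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a constraint system: returns True iff the
# string is a valid prefix of some program in that system's language.
PrefixChecker = Callable[[str], bool]


def alphabet_checker(alphabet: str) -> PrefixChecker:
    """Toy constraint system: any string over `alphabet` is a valid prefix."""
    allowed = set(alphabet)
    return lambda prefix: all(ch in allowed for ch in prefix)


class ParallelConstraints:
    """Feed the same growing prefix to several constraint systems at once."""

    def __init__(self, systems: Dict[str, PrefixChecker]) -> None:
        self.systems = systems
        # Start with the first system (dicts keep insertion order).
        self.active: Optional[str] = next(iter(systems), None)
        self.prefix = ""

    def feed(self, chunk: str) -> Optional[str]:
        self.prefix += chunk
        accepting = [name for name, ok in self.systems.items() if ok(self.prefix)]
        if self.active not in accepting:
            # The active system rejected the input: switch to another one
            # that still accepts it, or None if no system can continue.
            self.active = accepting[0] if accepting else None
        return self.active


pc = ParallelConstraints({
    "binary": alphabet_checker("01"),
    "decimal": alphabet_checker("0123456789"),
})
print(pc.feed("0110"))  # both accept, stay on the first system
print(pc.feed("2"))     # binary rejects, decimal accepts: switch
```

With these toy checkers the first call prints `binary` and the second prints `decimal`; feeding a character neither system accepts would make `feed` return `None`.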

re backtracking: a core part of this paper is ensuring a prefix property. that is, there is always a legitimate completion, and the model cannot "corner" itself!
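To illustrate why a prefix property removes the need for backtracking, here is a toy sketch. The "language" is balanced parentheses (a stand-in assumption, far simpler than a type system): every valid prefix can be completed by closing the open parentheses, so the prefix property holds. At each decoding step, tokens whose extension would not be a valid prefix are masked out. The scores come from a hypothetical fake model, not a real LLM.

```python
from typing import Dict


def is_valid_prefix(s: str) -> bool:
    """Toy language: balanced parentheses. A string is a valid prefix iff it
    only contains parens and never closes more than it opened. Any valid
    prefix can be completed by appending ')' * depth (prefix property)."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return True


def constrained_step(prefix: str, scores: Dict[str, float]) -> str:
    """Pick the highest-scoring token that keeps the output a valid prefix."""
    allowed = [tok for tok in scores if is_valid_prefix(prefix + tok)]
    # For this vocabulary '(' is always allowed, so `allowed` is never empty:
    # the decoder can never corner itself and never needs to backtrack.
    return max(allowed, key=lambda tok: scores[tok])


# Hypothetical "model" that always prefers ')' regardless of context.
vocab_scores = {")": 0.9, "(": 0.1}
out = ""
for _ in range(4):
    out += constrained_step(out, vocab_scores)
print(out)  # ()()
```

Without the mask, this fake model would emit `)` first and produce an invalid string immediately; with it, every intermediate output stays completable.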

more research is needed on which kinds of languages and language features allow this prefix property to be ensured

New comment by nielstron in "Type-constrained code generation with language models"

nielstron — Wed, 14 May 2025 06:11:58 +0000

the problem with LSPs is that they don't guarantee producing a type annotation that we can use for constraints, i.e. we cannot ensure the prefix property using LSPs. so we had to roll our own :)

Pulling in more features to help the system is definitely worth looking into!

New comment by nielstron in "Type-constrained code generation with language models"

nielstron — Wed, 14 May 2025 06:10:13 +0000

The downside is that you need to preprocess code properly, you have less non-code training data, and you cannot easily adapt to new programming languages.

New comment by nielstron in "Type-constrained code generation with language models"

nielstron — Wed, 14 May 2025 06:08:47 +0000

we were thinking about doing exactly this. the closest current work is probably the amazing "Learning Formal Mathematics from Intrinsic Motivation" by Poesia et al. (they use constraints to increase the likelihood of generating correct theorems/proofs during RL)

https://arxiv.org/abs/2407.00695