Hacker News: willj

New comment by willj in "LLMs as the new high level language"

willj — Sun, 08 Feb 2026 14:28:23 +0000

If you’re using a model from a provider (not one that you’re hosting locally), greedy decoding via temperature = 0 does not guarantee determinism. Responses can still differ from run to run, in part due to floating-point precision and in part due to a lack of batch invariance [1].

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
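
To make the floating-point part concrete, here is a minimal sketch in plain Python (not anything a provider actually runs). It only shows that floating-point addition is not associative, so combining the same numbers in a different order can give a different result. The variable names are just for illustration.

```python
# Floating-point addition is not associative: grouping the same three
# numbers differently changes the rounded result.
a, b, c = 0.1, 0.2, 0.3

left_grouped = (a + b) + c   # add a and b first, then c
right_grouped = a + (b + c)  # add b and c first, then a

print(left_grouped)                    # 0.6000000000000001
print(right_grouped)                   # 0.6
print(left_grouped == right_grouped)   # False
```

If a tiny difference like this lands on two tokens with nearly equal logits, the argmax can flip, and everything generated after that token diverges.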

New comment by willj in "LLMs as the new high level language"

willj — Sun, 08 Feb 2026 14:25:23 +0000

A temperature of 0 doesn’t result in the same responses every time, in part due to floating-point precision and in part due to a lack of batch invariance [1].

[1] https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

New comment by willj in "Ask HN: What are the metrics for "AI-generated technical debt"?"

willj — Fri, 30 Jan 2026 12:07:47 +0000

Thanks! That makes sense. I suppose this requires commit messages or PRs to indicate code was AI-generated vs. not, or to assume that commits after a certain time period were all from AI coding. It’d be an interesting analysis. Maybe there’s already a study out there.

In any case, thank you again!

New comment by willj in "AI code and software craft"

willj — Tue, 27 Jan 2026 13:14:20 +0000

100%. This is what I posted about on Hacker News [1] (where it got no traction) and Reddit [2] (where it led to a discussion but then got deleted by a mod).

[1] https://news.ycombinator.com/item?id=46705588

[2] https://www.reddit.com/r/ExperiencedDevs/comments/1qj03gq/wh...

Ask HN: What are the metrics for "AI-generated technical debt"?

willj — Wed, 21 Jan 2026 13:41:37 +0000

Here’s one place where I think proponents and skeptics of agentic coding tools (Claude Code, Codex, etc.) tend to talk past each other:

Proponents say things like:

- “I shipped feature X in days instead of weeks.”

- “I could build this despite not knowing Rust / the framework / the codebase.”

- “This unblocked work that would never have been prioritized.”

Skeptics say things like:

- “This might work for solo projects, but it won’t scale to large codebases with many developers.”

- “You’re trading short-term velocity for long-term maintainability, security, and operability.”

- “You’re creating tons of technical debt that will surface later.”

I’m sympathetic to both sides. But the asymmetry is interesting: The pro side has quantifiable metrics (time-to-ship, features delivered, scope unlocked). The con side often relies on qualitative warnings (maintainability, architectural erosion, future cost).

In most organizations, leadership is structurally biased toward what can be measured: velocity, throughput, roadmap progress. “This codebase is a mess” or “This will be a problem in two years” is a much harder sell than “we shipped this in a week.”

My question: Are there concrete, quantitative ways to measure the quality and long-term cost side of agentic coding? In other words: if agentic coding optimizes for speed, what are the best metrics that can represent the other side of the tradeoff, so this isn’t just a qualitative craftsmanship argument versus a quantitative velocity argument?

Comments URL: https://news.ycombinator.com/item?id=46705588

Points: 3

# Comments: 2

New comment by willj in "Ask HN: What are you working on? (January 2026)"

willj — Wed, 14 Jan 2026 01:25:45 +0000

Can you say more about the approach you take for summarization? Are the papers short enough that you just put the whole thing in the context window of the model you’re using, or do you do anything fancy? I’ve tried out various summarization approaches (hierarchical, aspect-based, incremental refinement), and am curious what you found works best for your use case.

New comment by willj in "Ask HN: What are you working on? (January 2026)"

willj — Mon, 12 Jan 2026 11:50:22 +0000

Thanks! I got some initial ideas from Nano Banana, actually, but then spent a while iterating on different layouts myself.

New comment by willj in "Ask HN: What are you working on? (January 2026)"

willj — Mon, 12 Jan 2026 02:08:50 +0000

This is something I built over the holidays to support people having a hard time with the short days and early sunsets: https://sunshineoptimist.com.

For the past several years I would look up the day lengths and sunset times for my location and identify milestones like “first 5pm sunset”, “1 hour of daylight gained since the winter solstice”, etc. But that manual process also meant I was limited to sharing updates on just my location, and my friends only benefitted when I made a post. I wanted to make a site anyone could come to at any time to get an optimistic message and a milestone to look forward to.

Some features this has:

- Calculation of several possible optimistic headlines. No LLMs used here.

- Offers comparisons to the earliest sunset of the year and shortest day

- Careful consideration of optimistic messaging at all times of year, including after the summer solstice when daylight is being lost

- Static-only site, no ads or tracking. All calculations happen in the browser.
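
For anyone curious how a milestone like “1 hour of daylight gained since the winter solstice” could be computed without an API, here is a rough sketch. It is not the site's actual code: the function names are made up for illustration, and it uses a common textbook approximation (a cosine model of solar declination plus the sunrise equation) that ignores atmospheric refraction and the size of the solar disc, so it can be off by several minutes.

```python
import math

# Assumption: day 355 (December 21) stands in for the northern winter solstice.
WINTER_SOLSTICE_DAY = 355


def day_length_hours(latitude_deg: float, day_of_year: int) -> float:
    """Approximate hours of daylight for a latitude and day of year (1-365)."""
    # Approximate solar declination in degrees (cosine model).
    declination_deg = -23.44 * math.cos(
        math.radians(360.0 / 365.0 * (day_of_year + 10))
    )
    lat = math.radians(latitude_deg)
    decl = math.radians(declination_deg)
    # Sunrise equation: cos(hour angle) = -tan(latitude) * tan(declination).
    cos_hour_angle = -math.tan(lat) * math.tan(decl)
    # Clamp to handle polar day (24 h) and polar night (0 h).
    cos_hour_angle = max(-1.0, min(1.0, cos_hour_angle))
    hour_angle_deg = math.degrees(math.acos(cos_hour_angle))
    # The sun moves 15 degrees per hour; daylight spans twice the hour angle.
    return 2.0 * hour_angle_deg / 15.0


def minutes_gained_since_solstice(latitude_deg: float, day_of_year: int) -> float:
    """Daylight gained, in minutes, relative to the assumed solstice day."""
    today = day_length_hours(latitude_deg, day_of_year)
    shortest = day_length_hours(latitude_deg, WINTER_SOLSTICE_DAY)
    return (today - shortest) * 60.0


# Example: latitude 40 N on January 11 (day 11 of the year).
print(round(minutes_gained_since_solstice(40.0, 11), 1))
```

Everything here is pure arithmetic, so a version of it in JavaScript would fit a static site where all calculations happen in the browser.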

Show HN: Sunshine Optimist: Optimistic takes on daylight and sunset times

willj — Sun, 11 Jan 2026 20:00:25 +0000

This is something I built over the holidays to support people having a hard time with the short days and early sunsets. For the past several years I would look up the day lengths and sunset times for my location and identify milestones like “first 5pm sunset”, “1 hour of daylight gained since the winter solstice”, etc. But that manual process also meant I was limited to sharing updates on just my location, and my friends only benefitted when I made a post. I wanted to make a site anyone could come to at any time to get an optimistic message and a milestone to look forward to.

Some features this has:

- Calculation of several possible optimistic headlines. No LLMs used here.

- Offers comparisons to the earliest sunset of the year and shortest day

- Careful consideration of optimistic messaging at all times of year, including after the summer solstice when daylight is being lost

- Static-only site, no ads or tracking. All calculations happen in the browser.

Comments URL: https://news.ycombinator.com/item?id=46579370

Points: 1

# Comments: 0

New comment by willj in "AI coding assistants are getting worse?"

willj — Thu, 08 Jan 2026 15:52:02 +0000

I think the models are so big that they can’t keep many old versions around because they would take away from the available GPUs they use to serve the latest models, and thereby reduce overall throughput. So they phase out older models over time. However, the major providers usually provide a time snapshot for each model, and keep the latest 2-3 available.

New comment by willj in "Avoid Mini-Frameworks"

willj — Wed, 24 Dec 2025 22:06:47 +0000

This reminds me a bit of using LLM frameworks like LangChain, Haystack, etc., especially if you’re only using them for the chat completions or responses APIs and not doing anything fancy.

New comment by willj in "Ask HN: What are some impressive vibe coding projects?"

willj — Mon, 20 Oct 2025 11:49:46 +0000

DOOMscroll[1] for sure! I still play it since hearing about it on HN.

[1] https://ironicsans.ghost.io/doomscrolling-the-game/

New comment by willj in "Judge mulls sanctions over Google's destruction of internal chats"

willj — Sun, 05 May 2024 12:59:19 +0000

I think this ignores that the monopolies have the power to buy up any new competitors, or to drive them out of business using monopoly power. Regulatory hurdles are only one tool that (can) benefit monopolies.

New comment by willj in "OpenAI transcribed over a million hours of YouTube videos to train GPT-4"

willj — Tue, 09 Apr 2024 10:47:44 +0000

I think that’s different. AlphaGo is using reinforcement learning in a context in which there is a clear evaluation function: did a strategy lead to a win or a loss?

New comment by willj in "Show HN: Beyond text splitting – improved file parsing for LLMs"

willj — Mon, 08 Apr 2024 11:18:22 +0000

Relatedly, the OCR component relies on PyMuPDF, which has a license that requires releasing source code, which isn’t possible for most commercial applications. Is there any plan to move away from PyMuPDF, or is there a way to use an alternative?

New comment by willj in "The Obscene Energy Demands of A.I"

willj — Sat, 09 Mar 2024 15:29:59 +0000

I’d argue Bitcoin is Obscene Energy Demand.

New comment by willj in "Google's once happy offices feel the chill of layoffs"

willj — Mon, 05 Feb 2024 14:09:58 +0000

Where are the happy offices these days? Which companies are the new “Google” who people are very eager to work for?

New comment by willj in "Microsoft Teams outage causes connection issues, message delays"

willj — Sat, 27 Jan 2024 04:59:25 +0000

That makes me think of this: https://nohello.net/en/

New comment by willj in "How to prioritize tasks?"

willj — Wed, 25 Jan 2023 13:46:53 +0000

I like this idea, but isn’t this a recipe for only ever doing the urgent stuff, not the not-urgent-but-important stuff? For example, if your list had “read 1 chapter of SICP” on it, you might never get to it.

New comment by willj in "PyTorch 1.10"

willj — Fri, 22 Oct 2021 11:43:46 +0000

One thing with fastai that annoyed me when I built a project with it was that the v1 and v2 APIs are totally different, and when I googled or searched Stack Overflow for help with something, I was more likely to stumble on answers for the v1 API than the v2 API. I also didn’t find their documentation super helpful for more than the most basic things (though not all documentation can be as amazing as scikit-learn’s).