Hacker News: djhn

New comment by djhn in "Bullshit Machines"

djhn — Wed, 13 May 2026 07:05:56 +0000

I think I know the examples you’re talking about. They don’t show much in terms of reasoning.

The Erdős problems have turned out to be largely brute force or finding older results.

The Feb 2026 GPT-5.2 theoretical physics paper was a result of “dialogue between physicists and LLMs”, called “grad student level” by experts in the field, used a “custom harnessed” “internal OpenAI” model with “20 hours of reasoning”. Quotes from OpenAI blog.

The Matthew Schwartz physics paper with Claude this March involved “51,248 messages across 270 sessions, producing over 110 draft versions and consuming 36 million tokens”, and the actual contribution was Schwartz finding an error in Claude’s solution.

New comment by djhn in "Where Are All the Data Centers?"

djhn — Wed, 13 May 2026 06:05:35 +0000

I’m afraid your numbers, all over 99%, are anchoring the conversation to an unreasonably high quality level.

I would have personally gone for 75%, 85% and 95%, which are all still best case scenario answers.

Had I acted on chatbot advice on electronics or chemistry I’d have died every couple of weeks (doing some hands-on real-world R&D in my basement as a distraction from software).

New comment by djhn in "Bullshit Machines"

djhn — Wed, 13 May 2026 05:56:37 +0000

I’m genuinely interested in someone countering the following evidence that supports the authors.

Plane of words: broadly correct. Everything is flattened to tokens and token sequences, and the training data is dominated by text tokens.

Reasoning: CoT tokens are mostly just tokens, more appropriately called intermediate tokens, and are largely disconnected from the end result. Including them improves the end result (user satisfaction), but does not imply reasoning. See for example Turpin 2023, Mirzadeh 2024, Pournemat 2025, Palod 2025.

Synthesising evidence: You can achieve SOTA summaries with LLMs, but this involves, for example, using a harness to generate dozens of summaries with different models, separately using some kind of vector embedding model to compare results to the original, and selecting the best match. This is not how most people are using LLMs for summaries. While this is being slowly RLVR’d in post-training, a one-shot naive summary underperforms more complex methods significantly.
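
A minimal sketch of the select-the-best-match step described above, under loud assumptions: `embed` here is a toy bag-of-words stand-in for a real vector embedding model, the candidate summaries are hard-coded rather than generated by different models, and the names `embed`, `cosine` and `select_best_summary` are hypothetical. It only illustrates the shape of the harness, not any particular production setup.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_best_summary(source, candidates):
    """Return the candidate whose embedding is closest to the source text."""
    source_vec = embed(source)
    return max(candidates, key=lambda c: cosine(source_vec, embed(c)))


# Hypothetical inputs: in a real harness each candidate would come from a
# separate model call, and embed() would be a proper embedding model.
source = "the cat sat on the mat while the dog slept by the door"
candidates = [
    "the cat sat on the mat and the dog slept",
    "stock prices rose sharply on tuesday",
    "the dog barked",
]
best = select_best_summary(source, candidates)
print(best)
```

A one-shot naive summary corresponds to taking a single candidate without the comparison step.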

New comment by djhn in "I'm going back to writing code by hand"

djhn — Mon, 11 May 2026 20:30:09 +0000

That is certifiably insane if that code touches anything that’s exposed to the internet or any PII. What kind of industries is that acceptable in?

New comment by djhn in "Vibe coding and agentic engineering are getting closer than I'd like"

djhn — Thu, 07 May 2026 10:24:56 +0000

Ok, fair. I incorrectly assumed you meant resizing static images to create a lower resolution preview image.

Video thumbnails are a different beast altogether. And you might want to double check your assumptions about security considerations. If any of your ffmpeg, opencv, pyscenedetect code is running on your server, it might well be exploitable.

New comment by djhn in "Vibe coding and agentic engineering are getting closer than I'd like"

djhn — Thu, 07 May 2026 04:53:15 +0000

But a thumbnail generator is a 1 hour task at best if you’re on a solo greenfield project and it’ll still be a 6 week project at an enterprise, even with AI.

New comment by djhn in "Agent Skills"

djhn — Tue, 05 May 2026 17:10:41 +0000

I think I should also clarify, I work in the training of encoder-decoder transformer models. Before the ChatGPT era I worked on encoder-only transformer models. I'm not unfamiliar with the literature and general discourse. I just do not use LLMs for programming.

New comment by djhn in "Agent Skills"

djhn — Tue, 05 May 2026 09:31:04 +0000

I can take on a slightly weaker form in good faith: professionally it’s a non-starter until private, open source inference can be self-hosted and the ROI is clear enough to invest in that.

And on the ROI side, trying things out regularly, I haven’t seen the positive ROI in the limited time I’ve dedicated to exploring the tools. I’ve restricted experimenting to 4 hours per month, because spending more than 2.5% of the month chasing productivity improvements that realistically seem to be 10-20% will quickly eat into those gains. After accounting for token costs, it ends up being a wash.

New comment by djhn in "Agent Skills"

djhn — Tue, 05 May 2026 09:21:01 +0000

What kind of code is infrastructure in this context? Devops in a software company? Internal tooling in a software org?

New comment by djhn in "OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors"

djhn — Mon, 04 May 2026 05:15:47 +0000

At many (otherwise) world-leading facilities even just reviewing the patient history is a slog. There is rarely any ability to keyword search the records or even filter the records by location, title and occupation of the healthcare professional who made them, etc. Especially very ill people will have hundreds and hundreds of recent entries.

And stepping through those entries isn’t like browsing a modern local-first app [1], where you will just scroll through dozens of entries in milliseconds. It’s not like the slightly older and slightly slower Gmail interface. You’re clicking on each record and waiting 400ms-3s for it to load, as if instead of a 25Gb fiber connection you’re on dialup requesting the record from Epic’s headquarters in the US and proxying it via Australia.

[1] https://bugs.rocicorp.dev/p/roci
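
To put the per-record wait in perspective, here is a rough back-of-the-envelope sketch. The figure of 300 records is an assumption standing in for "hundreds and hundreds" of entries. The 400 ms and 3 s load times are the ones quoted above. The function name is hypothetical.

```python
def total_wait_ms(num_records, load_time_ms):
    """Total time spent waiting if each record is opened one at a time."""
    return num_records * load_time_ms


NUM_RECORDS = 300  # assumption: an illustrative value for "hundreds" of entries

best_case_ms = total_wait_ms(NUM_RECORDS, 400)    # 400 ms per record
worst_case_ms = total_wait_ms(NUM_RECORDS, 3000)  # 3 s per record

best_case_min = best_case_ms // 60_000
worst_case_min = worst_case_ms // 60_000
print(f"{best_case_min} to {worst_case_min} minutes of pure waiting")
```

That is waiting time only, before any reading happens.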

New comment by djhn in "This Month in Ladybird – April 2026"

djhn — Sun, 03 May 2026 08:20:49 +0000

Evidence? Harsh accusation.

New comment by djhn in "Becoming a father shrinks your cerebrum (2022)"

djhn — Sun, 03 May 2026 07:45:07 +0000

I might have too French of an attitude towards parenting for American taste, but as long as the crying and screaming isn’t based on anything real (and as long as you’ve childproofed the children’s room well enough it shouldn’t be) the child will be fine and what y’all need is sufficient distance between the bedrooms, some nice, solid brick walls in between the rooms and some earplugs.

New comment by djhn in "Becoming a father shrinks your cerebrum (2022)"

djhn — Sat, 02 May 2026 18:06:26 +0000

What could a 6-24 month old possibly do from their bed in their room, to disturb your sleep in your bed in your room? Bring a trumpet to bed and badly play Miles Davis?

What happened to lights off, door closed, do whatever you want in complete darkness in the bed that you aren’t able to climb out of?

New comment by djhn in "Spotify adds 'Verified' badges to distinguish human artists from AI"

djhn — Sat, 02 May 2026 17:27:19 +0000

I agree with the sentiment. And I think we’ve accidentally stumbled upon how the prompt-writer should be viewed: the buyer, or sponsor of the output. A punter, if you will, would be even more appropriate. The financial commitment is minor and the process is largely a gamble.

New comment by djhn in "Spotify adds 'Verified' badges to distinguish human artists from AI"

djhn — Sat, 02 May 2026 12:07:36 +0000

This is a fun analogy, even if it’s just novel to me.

With any kind of creative work for hire, from architecture to advertising, from jingles to commissioned sculptures, the client’s taste and budget, more than almost anything else, determine the outcome.

Take Cannes Lions as an example of a competition and awards ceremony that essentially exists to define what ’good taste’ means within that industry. The client’s team is prominently credited alongside the creative agency. They get to climb onto the stage for the speech and they have a voice on whatever video clip is made about the project.

Partly this is to encourage more ambitious and spendy work for the industry at large. But everyone involved certainly knows that the same creative team, with the same creative idea, could have ended up making something much worse working with a different client team.

I can’t stand AI slop, yet I think I’ve unintentionally argued in favour of people creating it, as long as it’s… good by some measure?

New comment by djhn in "Where the goblins came from"

djhn — Thu, 30 Apr 2026 21:14:23 +0000

Do these… ”groups”, which would rather you not lump chewing tobacco together with other tobacco products… manufacture and sell chewing tobacco?

New comment by djhn in "Talkie: a 13B vintage language model from 1930"

djhn — Wed, 29 Apr 2026 04:32:16 +0000

Is there a book for a lay person you could recommend on this? Something a bit more rigorous than Yuval Harari, Bill Bryson and the like but not aimed at fellow historians only.

New comment by djhn in "Microsoft and OpenAI end their exclusive and revenue-sharing deal"

djhn — Tue, 28 Apr 2026 04:29:47 +0000

Yet another reason to doubt claims that ”software is solved”.

Anthropic did retire an interview take-home assignment involving optimising inference on exotic hardware, because Claude could one shot a solution, but that was clearly a whiteboard hypothetical instead of a real system with warts, issues and nuance.

New comment by djhn in "An AI agent deleted our production database. The agent's confession is below"

djhn — Sun, 26 Apr 2026 21:01:03 +0000

Throughout history people have taken precautions against ceilings disintegrating. One might even say, ”strong engineering controls”.

Some of the best known laws from the ~1700BC Babylonian legal text, The Code of Hammurabi, are laws 228-233, which deal with building regulations.

229. If a builder builds a house for a man and does not make its construction firm, and the house which he has built collapses and causes the death of the owner of the house, that builder shall be put to death.

230. If it causes the death of the son of the owner of the house, they shall put to death a son of that builder.

233. If a builder constructs a house for a man but does not make it conform to specifications so that a wall then buckles, that builder shall make that wall sound using his silver (at his own expense).

That doesn’t sound like ceilings never disintegrated!

New comment by djhn in "Amateur armed with ChatGPT solves an Erdős problem"

djhn — Sun, 26 Apr 2026 20:25:06 +0000

I feel compelled to concur with fwip, dpark and breezybottom. LLMs and the chatbot interfaces built for these text generating models are very good at writing fiction, including writing fictional roles and acting out those roles. Don’t get too carried away by this fiction.