Hacker News: avereveard

New comment by avereveard in "If AI writes your code, why use Python?"

avereveard — Mon, 11 May 2026 21:16:33 +0000

tl;dr: Rust loses 2% on average compared to Python, the gap varies by model, and Go has a better upper bound, but Opus had it 3% below Python.

the benchmark is a bit old, but the research on why is there; the article is just vibes

New comment by avereveard in "AWS North Virginia data center outage – resolved"

avereveard — Sat, 09 May 2026 05:16:30 +0000

All of the control plane. The data plane is distributed, and roles using IAM to access resources can still do so during a control plane outage.

New comment by avereveard in "The bottleneck was never the code"

avereveard — Wed, 06 May 2026 23:37:21 +0000

You don't wait. You run multiple independent incremental features in parallel, while also running a code review, which will create the next set of tasks while you or the LLM think up the feature to add after.

New comment by avereveard in "Microsoft Edge stores all passwords in memory in clear text, even when unused"

avereveard — Mon, 04 May 2026 22:36:12 +0000

It's surprisingly hard to do: the compiler or CPU may see a write without a read and optimize it away. Windows has SecureZeroMemory and a few other barrier primitives, but not all languages reach for it.
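A rough sketch of what "reaching for it" can look like from a higher-level language, here Python. This is an illustration under assumptions, not what Edge or any browser does: `wipe` is a hypothetical helper, and it only works because the secret is held in a mutable `bytearray` (a `str` or `bytes` value is immutable and cannot be overwritten in place). The overwrite goes through `ctypes.memset`, a real library call made at runtime, so there is no dead-store pass between the code and the write. It does nothing about copies of the secret made elsewhere.

```python
import ctypes


def wipe(buf: bytearray) -> None:
    """Overwrite a mutable buffer with zeros in place (hypothetical helper)."""
    if not buf:
        return
    # from_buffer shares memory with the bytearray, it does not copy it.
    view = (ctypes.c_char * len(buf)).from_buffer(buf)
    # memset runs inside the C library and writes directly to that memory.
    ctypes.memset(ctypes.addressof(view), 0, len(buf))
    # Drop the ctypes view so the bytearray is no longer pinned by it.
    del view


password = bytearray(b"hunter2")
# ... use the password ...
wipe(password)
```

After `wipe(password)` the buffer keeps its length but holds only zero bytes.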

New comment by avereveard in "OpenAI models coming to Amazon Bedrock: Interview with OpenAI and AWS CEOs"

avereveard — Wed, 29 Apr 2026 12:44:02 +0000

For better and worse tho.

New comment by avereveard in "A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all"

avereveard — Tue, 28 Apr 2026 20:36:35 +0000

eh, good programmers are goal oriented; today's SOTA models still need step-by-step guidance for the most part, so there's still a gap.

the AGENTS.md pieces that pin specific tool-call shapes or force chain-of-thought before action are a coping mechanism that ages out, the same lifecycle as the retry-with-different-prompt loops or chain-of-thought prompts most stacks shipped in 2024 to compensate for brittle instruction-following.

not quite there yet, but it's nice to see them getting shorter and shorter with each model release, until all the basics are peeled away by the march of progress and one day only the invariants are left

New comment by avereveard in "Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview"

avereveard — Mon, 27 Apr 2026 14:42:44 +0000

"astounding how much the harness matters" is the right read and it should be the lasting one. the model is rentable, the prompts are rentable, the benchmark numbers are mostly a function of the harness around them. swapping Gemini for Sonnet underneath the same harness has a smaller bench delta than swapping the harness around the model. the cheating-agents post you linked is the same observation through a different lens, the harness is what's being measured, the model is just the substrate.

that said, context management seems to be solving today's model problems more than being a universal property, and it will probably be obsolete a few model generations down the road, the same way tool use made RAG context injection from question embeddings obsolete.

New comment by avereveard in "Show HN: Browser Harness – Gives LLM freedom to complete any browser task"

avereveard — Mon, 27 Apr 2026 09:09:47 +0000

"Minimal infrastructure, max LLM freedom" works great for personal automation. The same shape under enterprise security review collapses on the question of what cannot happen, which is exactly what the prompts-and-vibes school doesn't have a structural answer for. Direct CDP hands the model the keys; the harness around it is what should decide which doors the keys open.

Most agent stacks at AI startups have that layer as LLM-driven glue rather than an owned surface, and it shows up as a re-architecture cost on every model release. The model should be replaceable; the integrations and guardrails specific to the customer's environment should not.
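To make "which doors the keys open" concrete, here is a minimal sketch of an owned policy layer that sits between the model and the browser. Everything in it is an assumption for illustration: `CdpPolicy` and its fields are hypothetical names, the host is a placeholder, and a real deployment would need far more than a method and host allowlist. The CDP method names themselves (`Page.navigate`, `DOM.getDocument`, `Runtime.evaluate`) are real protocol methods.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CdpPolicy:
    """Hypothetical gate deciding which CDP commands the model may send."""

    allowed_methods: frozenset
    allowed_hosts: frozenset

    def check(self, method: str, params: dict) -> bool:
        # Deny by default: anything not explicitly listed cannot happen.
        if method not in self.allowed_methods:
            return False
        # Navigation is additionally restricted to known hosts.
        if method == "Page.navigate":
            host = urlparse(params.get("url", "")).hostname
            return host in self.allowed_hosts
        return True


# Example policy: read the DOM and navigate, but only inside one host.
POLICY = CdpPolicy(
    allowed_methods=frozenset({"Page.navigate", "DOM.getDocument"}),
    allowed_hosts=frozenset({"intranet.example.com"}),
)
```

The point of the shape is that the deny-by-default answer lives in code the customer owns, so swapping the model underneath does not change what cannot happen. With this policy, `POLICY.check("Runtime.evaluate", {...})` is rejected regardless of what the prompt says.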

New comment by avereveard in "If you stop hiring juniors, your senior engineers own you"

avereveard — Mon, 27 Apr 2026 09:03:57 +0000

as knowledge is commoditized the bar for juniors rises: what was advanced math research a century ago is now undergrad homework. I don't see why code is so special in that regard that it cannot progress beyond artisanship.

New comment by avereveard in "SWE-bench Verified no longer measures frontier coding capabilities"

avereveard — Mon, 27 Apr 2026 08:54:30 +0000

the ecosystem's obsession with public benchmarks comes from the fact that running benchmarks costs money, and labs don't test on any given private benchmark

but yeah, you're correct: anyone optimizing for public-bench rank instead of their own task-distribution eval has been pointing at the wrong thing for a while

still, I guess it's a useful signal for deciding which models to consider. negative signal is still signal: assuming everyone is gaming benchmarks in certain ways, a lack of performance does translate into a real workload effect

New comment by avereveard in "The West forgot how to make things, now it’s forgetting how to code"

avereveard — Sun, 26 Apr 2026 15:15:07 +0000

Why do millions need to code? When we industrialized, millions lost the knowledge of how to sew a garment. Why is a few specialized workers producing self-building solutions so bad?

I mean beyond the obvious hacker news bias.

If you like it, nobody will take it away from you as a hobby. But the artisanal aspect of coding as a production mechanism is dying, and it was about time.

New comment by avereveard in "Using coding assistance tools to revive projects you never were going to finish"

avereveard — Sat, 25 Apr 2026 22:12:49 +0000

Same. I purposefully keep a number of overambitious projects that are entirely out of distribution to test failure modes, mostly games. When one works, well, I've gained a new game. Can't wait for my 10 player battleship game on a 100x100 grid to be functional.

New comment by avereveard in "DeepSeek v4"

avereveard — Fri, 24 Apr 2026 09:44:54 +0000

Yeah, but Gemini has a hard time discussing solutions; it just jumps to implementation, which is great if it gets it right and not so great if it goes down the wrong path.

Not saying it is better or worse, but what I personally prefer is to design in chat, to make sure all unknown unknowns are addressed.

New comment by avereveard in "Familiarity is the enemy: On why Enterprise systems have failed for 60 years"

avereveard — Fri, 24 Apr 2026 07:18:40 +0000

Eh, "the enemy" section skips an important bit that was spelled out by the buyer in the intro and wasn't listened to: if the small vendor goes bust, who maintains the system afterwards? If you plan in 10-year cycles, greenfield buys look scary.

That's why VCs look favorably on startups that go through the motions of setting up a partner-led sales channel. An established partner taking on maintenance contracts bridges the lifecycle gap between the two realities.

But no, corporate is bad, I guess.

New comment by avereveard in "DeepSeek v4"

avereveard — Fri, 24 Apr 2026 06:03:09 +0000

eh idk. until yesterday opus was the one that got spatial reasoning right (had to do some head pose stuff, neither glm 5.1 nor codex 5.3 could "get" it) and codex 5.3 was my champion at making UX work.

So while I agree mixed model is the way to go, opus is still my workhorse.

New comment by avereveard in "Our newsroom AI policy"

avereveard — Thu, 23 Apr 2026 10:59:51 +0000

Depends on the topic. Often what they consider important isn't what is important, and essential details drop out of view. I'm having good success with YouTube videos, not as much with technical docs.

New comment by avereveard in "Show HN submissions tripled and now mostly have the same vibe-coded look"

avereveard — Wed, 22 Apr 2026 15:40:28 +0000

Ask an LLM for a code review with code duplication, encapsulation, and sequential coupling as quality axes, and the difference should show up readily.

New comment by avereveard in "Anthropic takes $5B from Amazon and pledges $100B in cloud spending in return"

avereveard — Tue, 21 Apr 2026 15:16:09 +0000

Cannot get Trainium anywhere else, and NVIDIA commands a super high premium.

New comment by avereveard in "Deezer says 44% of songs uploaded to its platform daily are AI-generated"

avereveard — Tue, 21 Apr 2026 06:50:35 +0000

this is all circular, with the why being the claim itself. I still see no why, just the claim phrased slightly differently

New comment by avereveard in "Deezer says 44% of songs uploaded to its platform daily are AI-generated"

avereveard — Mon, 20 Apr 2026 21:52:54 +0000

Why?