Hacker News: numeri

New comment by numeri in "AI agent bankrupted their operator while trying to scan DN42"

numeri — Mon, 15 Jun 2026 02:38:56 +0000

One context I could imagine is a young person with a shaky grasp of English trying to come up with an interesting school/university project via conversations with an LLM set up as an OpenClaw agent.

It's got the right combinations of inexperience, cluelessness, panic, expectations that Westerners are rich, and hopes of others being willing to fix their mistake.

New comment by numeri in "Can I Buy Your KV Cache?"

numeri — Fri, 12 Jun 2026 22:47:14 +0000

especially because this is the most painfully glaring flaw in their plan. Their solution is for an inference provider to... store the KV cache (which they can compute!) on-premise, on their own disks, but pay some third party for it?

New comment by numeri in "Claude Fable is relentlessly proactive"

numeri — Fri, 12 Jun 2026 17:16:11 +0000

I've had it happen. I ran an experiment, taking a couple hours and producing ~2 GiB of files. One of the results looked good, so I told Claude Opus 4.5 (at the time) to commit the code changes, upload the important file to cloud storage, then clean up the rest.

I then saw it run `rm -r results/`, before messaging me: "Now all that's left is for you to upload the successful results, then I'll delete the rest!"

Why did it not upload the files itself, when it had been using the cloud storage CLI during that session? No clue. I do accept that I could have and should have just uploaded the file myself. It would have taken 3 seconds to type.

New comment by numeri in "Claude Fable 5: mid-tier results on coding tasks"

numeri — Thu, 11 Jun 2026 20:36:59 +0000

To be fair, it is good to know that it disobeys simple instructions like "don't examine my git history" far more than other models. (It should of course be a different benchmark, so as not to conflate things.)

It's not a great sign for alignment.

New comment by numeri in "Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes"

numeri — Fri, 05 Jun 2026 05:28:09 +0000

I would just warn that you may not be able to recognize what is worth learning at your stage.

Intuition for library design and the architecture of software packages/external APIs is something you can only learn by doing.

New comment by numeri in "Good sleep, good learning, good life (2012)"

numeri — Wed, 15 Apr 2026 20:51:40 +0000

I have DSPD as well, and was pleasantly surprised to see how much of the article discussed DSPD.

That being said, I do think a lot of what the author is saying flies right in the face of traditional advice, esp. the suggestion that we should all just free-sleep and rotate around the clock. I personally find myself happiest when I'm entrained to the 24-hour cycle, but at my own natural offset. Whenever I've been cycling the day it's felt miserable, uncontrollable and exhausting.

To be fair, the author did claim that you can fully solve this by completely cutting out after-dark electronics, but I've tried pretty intensely to do exactly that for extended periods in the past, and didn't see any progress. I do sleep amazingly when camping, though, and the delay is smaller than normal (still definitely there).

New comment by numeri in "Agent Reading Test"

numeri — Mon, 06 Apr 2026 23:36:53 +0000

11/20 for qwen/qwen3.5-flash-02-23 in Claude Code, with effort set to low.

New comment by numeri in "Owner of ICE detention facility sees big opportunity in AI man camps"

numeri — Mon, 09 Mar 2026 13:53:22 +0000

No, that's what the headline implies, and the body of the article doesn't support it at all. It's (currently, and with no indication of intent to change this) two separate branches of their business.

New comment by numeri in "Mercury 2: Fast reasoning LLM powered by diffusion"

numeri — Wed, 25 Feb 2026 15:24:01 +0000

but Taalas had to quantize Llama 3.1 8B to death to get it to fit. It can't produce coherent non-English text at all.

New comment by numeri in "Ask HN: What explains the recent surge in LLM coding capabilities?"

numeri — Mon, 16 Feb 2026 17:08:20 +0000

and if I was to guess, the latest generation of models (Claude Opus 4.6, GPT-5.3-codex, etc.) differ from Opus 4.5, GPT 5.2 primarily in the addition of deeper, more difficult (most likely agentic and coding-based, like Terminal Bench) tasks to their RLVR training.

I could be completely off, as my intuition here is fully based on public research papers, but it seems to explain the current state of things fairly well.

Petition for Recognition of Work on Open-Source as Volunteering in Germany

numeri — Wed, 04 Feb 2026 04:46:15 +0000

Article URL: https://www.openpetition.de/petition/online/recognition-of-work-on-open-source-as-volunteering-in-germany

Comments URL: https://news.ycombinator.com/item?id=46881568

Points: 213

# Comments: 50

Exploration Posteriors for Generative Modeling Using Only Negative Rewards

numeri — Tue, 03 Feb 2026 23:47:14 +0000

Article URL: https://arxiv.org/abs/2510.09596

Comments URL: https://news.ycombinator.com/item?id=46879151

Points: 1

# Comments: 0

New comment by numeri in "Ask HN: Do you still use physical calculators?"

numeri — Sun, 01 Feb 2026 23:10:55 +0000

No, Python or units[1] is always a better choice if I'm near a computer (and I nearly always am these days, unfortunately, I suppose). I do have three wonderful slide rules, though.

[1]: https://www.gnu.org/software/units/

New comment by numeri in "Finland looks to introduce Australia-style ban on social media"

numeri — Sun, 01 Feb 2026 23:00:13 +0000

Introducing a solid zero-knowledge age verification option is the opposite direction from ending anonymity on the Internet, which other parts of the same governments are also working on.

So yeah, I'll gladly trust and cheer on the part working in the right direction.

Underrated reasons to be thankful V

numeri — Thu, 27 Nov 2025 20:37:51 +0000

Article URL: https://dynomight.net/thanks-5/

Comments URL: https://news.ycombinator.com/item?id=46073033

Points: 226

# Comments: 98

New comment by numeri in "It's OpenAI's world, we're just living in it"

numeri — Sat, 11 Oct 2025 00:15:40 +0000

I'll just throw in support for gaming on Linux – it feels pretty nice these days! I still have the occasional (once every 5–8 months?) update cause a short-lived bug, but it's a very justifiable trade-off to avoid Windows.

New comment by numeri in "GPT-5-Codex is a better AI researcher than me"

numeri — Tue, 07 Oct 2025 15:27:18 +0000

This is written by someone who's not an AI researcher, working with tiny models on toy datasets. It's at the level of a motivated undergraduate student in their first NLP course, but not much more.

New comment by numeri in "How to be a leader when the vibes are off"

numeri — Thu, 25 Sep 2025 10:09:08 +0000

One sign would be occasionally changing course in response to overwhelming employee feedback. If that never or almost never happens, the feedback is being ignored, not taken constructively and not followed.

New comment by numeri in "Why language models hallucinate"

numeri — Sun, 07 Sep 2025 00:09:32 +0000

This isn't right – calibration (informally, the degree to which certainty in the model's logits correlates with its chance of getting an answer correct) is well studied in LLMs of all sizes. LLMs are not (generally) well calibrated.
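To make "calibration" concrete, here is a minimal sketch of one common way to measure it, expected calibration error (ECE). This is an illustration, not something from the paper under discussion: the function name, the equal-width binning, and the idea of taking each answer's confidence from the softmax probability of the chosen answer are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy, per bin.

    confidences: floats in [0, 1], e.g. the model's probability for its answer
                 (assumed here to come from a softmax over the logits).
    correct:     booleans, whether each answer was actually right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # A well-calibrated model has avg_conf close to accuracy in every bin.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that says 90% and is right 75% of the time scores about 0.15 here; a perfectly calibrated one scores 0. The result depends on the number of bins, so treat it as a rough diagnostic rather than an exact quantity.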

New comment by numeri in "Grok: Searching X for "From:Elonmusk (Israel or Palestine or Hamas or Gaza)""

numeri — Fri, 11 Jul 2025 14:05:20 +0000

I really like your posts, and they're generally very clearly written. Maybe this one's just the odd duck out, as it's hard for me to find what you actually meant (as clarified in your comment here) in this paragraph:

> This suggests that Grok may have a weird sense of identity—if asked for its own opinions it turns to search to find previous indications of opinions expressed by itself or by its ultimate owner. I think there is a good chance this behavior is unintended!

I'd say it's far more likely that:

1. Elon ordered his research scientists to "fix it" – make it agree with him

2. They did RL (probably just basic tool use training) to encourage checking for Elon's opinions

3. They did not update the UI (for whatever reason – most likely just because research scientists aren't responsible for front-end, so they forgot)

4. Elon is likely now upset that this is shown so obviously

The key difference is that I think it's incredibly unlikely that this is emergent behavior due to a "sense of identity", as opposed to direct efforts of the xAI research team. It's likely also a case of https://en.wiktionary.org/wiki/anticipatory_obedience.