Hacker News: robkop

New comment by robkop in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"

robkop — Wed, 20 May 2026 11:36:08 +0000

Just saying you’re not alone, very surprised by the reception given how brutally sloppified the OP is.

Interesting problem space, but I hope the author just gives dot points next time rather than bloating it and losing most of its meaning.

New comment by robkop in "Show HN: Actual Claude Tokenizer"

robkop — Sun, 03 May 2026 15:37:41 +0000

Could you please elaborate a bit more for my understanding?

What in particular about this method breaks correct token boundaries?

On my first read I read your comment as there are special tokens that require multiple tokens to emit, hence you can't get certain tokens emitted alone - but I don't think that's what you're getting at on a second read?

Interesting that you've found similarities between "d" and the hidden tokens for opening an xml tag, pressing caps lock and the other hidden tokens of note. I haven't run into any trouble extracting "d" tokens, is it a particular model that you see create that pattern?

New comment by robkop in "Claude.ai unavailable and elevated errors on the API"

robkop — Tue, 28 Apr 2026 20:58:51 +0000

I use Bedrock with 1M context every day. Not sure this is right.

New comment by robkop in "SpaceX says it has agreement to acquire Cursor for $60B"

robkop — Wed, 22 Apr 2026 11:11:53 +0000

A lot of enterprises were doing that, but now they hit the 150 user limit on Claude and are paying seat + API rates.

Codex is still going strong but it’s hard to imagine they won’t do similar eventually.

So now I'm honestly hearing a lot more folk sticking it out with Cursor while waiting for the dust to settle.

Show HN: Actual Claude Tokenizer

robkop — Mon, 20 Apr 2026 13:51:25 +0000

I've seen a few "Claude tokenizers" floating around lately with all the 4.7 chatter, but most of them just hit the count_tokens endpoint and hand you back a number. You don't actually see how your text gets split or understand the changes from 4.6 to 4.7.

I built this a while back for doing some mech interp research. It faithfully represents Claude token splitting - showing hidden tokens, real boundaries and so on. It is not cheap to run - essentially n^2 cost - you could optimise for longer sequences but you are not guaranteed a faithful representation if so.
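To illustrate where a quadratic cost can come from when all you have is a count-only oracle, here is a toy sketch. It is not necessarily how the tool works: `toy_tokenize`, `boundaries_from_counts`, and the vocabulary are hypothetical stand-ins, and the prefix-counting trick is only exact for simple tokenizers like the greedy one used here.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match tokenizer. A stand-in, NOT Claude's tokenizer."""
    tokens, i = [], 0
    max_len = max(len(v) for v in vocab)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character if nothing in the vocab matches.
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def boundaries_from_counts(text, count_tokens):
    """Recover split points using only a token-count oracle on prefixes.

    Every prefix is re-counted from scratch, so the characters sent to the
    oracle sum to 1 + 2 + ... + n = n(n+1)/2, i.e. quadratic in the length.
    """
    pieces, start, prev, chars_sent = [], 0, 0, 0
    for end in range(1, len(text) + 1):
        n = count_tokens(text[:end])
        chars_sent += end
        # The count went up: the last character started a new token.
        if n > prev and end - 1 > start:
            pieces.append(text[start:end - 1])
            start = end - 1
        prev = n
    pieces.append(text[start:])
    return pieces, chars_sent


vocab = {"a", "b", "c", "ab", "abc"}  # hypothetical vocabulary
text = "abcab"
pieces, chars_sent = boundaries_from_counts(
    text, lambda s: len(toy_tokenize(s, vocab))
)
print(pieces, chars_sent)
```

With a real BPE tokenizer, adding a character can change earlier merges, so a naive prefix scan like this is not guaranteed to be faithful.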

Open Source: https://github.com/R0bk/claude-tokenizer

Feedback welcome, let me know if there are any edge cases that look wrong.

P.S. I'd expect this to face the same fate as the streaming chunk and prefill based token extraction methods did. I do worry about the ability to do independent research once it's fully closed off, and would love it if there were more public frontier tokenizers.

Comments URL: https://news.ycombinator.com/item?id=47834347

Points: 3

# Comments: 4

New comment by robkop in "Are the costs of AI agents also rising exponentially? (2025)"

robkop — Sun, 19 Apr 2026 12:58:38 +0000

There are a lot of tradeoffs to play with. Those inference ASICs may not carry the gradient, but they are still optimised for larger batches and to run any model. They need enough memory for the weights, wide batch inference, and ideally leftovers for kv cache efficiency.

For personal inference you’re given a lot more room to play in, much of it poorly explored today, and enough to challenge the argument that the cost advantages evaporate.

New comment by robkop in "Are the costs of AI agents also rising exponentially? (2025)"

robkop — Sun, 19 Apr 2026 12:37:39 +0000

You can ablate surprisingly large chunks of a model with near to no effect. You can try this easily: download an open-weight model in torch.

Obviously it’s not ideal but you could likely have single digit % of all weights affected and still have a useful model (many caveats here: e.g. locality of damaged weights matters, distribution of errors matters, fail high/low matters, …)
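A minimal sketch of the mechanics, in plain Python rather than torch so it runs anywhere. The `ablate` helper and the random "layer" are hypothetical; this only shows how to knock out a fixed fraction of weights, it does not demonstrate how a real model degrades.

```python
import random


def ablate(weights, fraction, seed=0):
    """Return a copy of `weights` with a random `fraction` of entries zeroed."""
    rng = random.Random(seed)
    k = round(fraction * len(weights))
    dead = set(rng.sample(range(len(weights)), k))  # indices to knock out
    return [0.0 if i in dead else w for i, w in enumerate(weights)]


def dot(weights, inputs):
    """Output of a single toy linear unit."""
    return sum(w * x for w, x in zip(weights, inputs))


rng = random.Random(1)
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in for one layer
inputs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

damaged = ablate(weights, fraction=0.05)  # 5% of weights affected
n_zeroed = sum(1 for w, d in zip(weights, damaged) if d == 0.0 and w != 0.0)
print(n_zeroed, dot(weights, inputs), dot(damaged, inputs))
```

On a real open-weight model you would apply the same kind of mask to each parameter tensor and compare eval scores before and after. Choosing indices uniformly at random ignores the locality caveat above.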

New comment by robkop in "Why do we tell ourselves scary stories about AI?"

robkop — Sat, 11 Apr 2026 05:01:36 +0000

I can’t speak for the states, but in AU I clearly see a massive displacement of undergrad and junior roles (only in AI exposed domains).

I say this as both someone who works with many execs, hearing their musings, and someone who no longer can justify hiring junior roles themselves.

Irrespective of that, if we take the strategy of only acting once it is visible to the layman, our scope of available actions will be invariably and significantly diminished.

Even if you are not convinced it is guaranteed, and do not believe what I and others see, I would ask: is your probability of it happening now really that close to 0? If not, would it not be prudent to take the risk seriously?

New comment by robkop in "Sam Altman's response to Molotov cocktail incident"

robkop — Sat, 11 Apr 2026 03:21:42 +0000

one of their highlights with mythos was its ability to generate new puns

I took a look and honestly they're the first AI puns that aren't bad

Times are changing

New comment by robkop in "LLM=True"

robkop — Wed, 25 Feb 2026 10:20:31 +0000

We’ve got a long way to go in optimising our environments for these models. Our perception of a terminal is much closer to feeding a video into Gemini than reading a textbook of logs. But we don’t make that AX affordance at the moment.

I wrote a small game for my dev team to experience what it’s like interacting through these painful interfaces over the summer www.youareanagent.app

Jump to the agentic coding level or the mcp level to experience true frustration (call it empathy). I also wrote up a lot more thinking here www.robkopel.me/field-notes/ax-agent-experience/

New comment by robkop in "Qwen3.5: Towards Native Multimodal Agents"

robkop — Mon, 16 Feb 2026 12:40:43 +0000

Rumours say you do something like:

  Download every github repo
    -> Classify if it could be used as an env, and what types
      -> Issues and PRs are great for coding rl envs
      -> If the software has a UI, awesome, UI env
      -> If the software is a game, awesome, game env
      -> If the software has xyz, awesome, ...
    -> Do more detailed run checks
      -> Can it build
      -> Is it complex and/or distinct enough
      -> Can you verify if it reached some generated goal
      -> Can generated goals even be achieved
      -> Maybe some human review - maybe not
    -> Generate goals
      -> For a coding env you can imagine you may have an LLM introduce a new bug and can see that test cases now fail. Goal for the model is now to fix it
    ... Do the rest of the normal RL env stuff
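
The funnel above could be sketched roughly like this. Everything here is hypothetical (the `Repo` fields, the helper names, the goal strings); it only mirrors the classify -> run checks -> generate goals order from the rumour, not any lab's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    has_issues_and_prs: bool = False
    has_ui: bool = False
    is_game: bool = False
    builds: bool = False
    goal_verifiable: bool = False


def classify(repo):
    """Step 1: which env types could this repo become?"""
    kinds = []
    if repo.has_issues_and_prs:
        kinds.append("coding")
    if repo.has_ui:
        kinds.append("ui")
    if repo.is_game:
        kinds.append("game")
    return kinds


def passes_run_checks(repo):
    """Step 2: does it build, and can we verify that a goal was reached?"""
    return repo.builds and repo.goal_verifiable


def make_goal(repo, kind):
    """Step 3: generate a goal; for coding, a bug is injected and tests fail."""
    if kind == "coding":
        return f"{repo.name}: fix the injected bug so the failing tests pass"
    return f"{repo.name}: reach a generated {kind} goal"


def build_envs(repos):
    envs = []
    for repo in repos:
        kinds = classify(repo)
        if not kinds or not passes_run_checks(repo):
            continue
        for kind in kinds:
            envs.append((repo.name, kind, make_goal(repo, kind)))
    return envs


repos = [
    Repo("webapp", has_issues_and_prs=True, has_ui=True, builds=True, goal_verifiable=True),
    Repo("broken-game", is_game=True, builds=False, goal_verifiable=True),
    Repo("notes", builds=True, goal_verifiable=True),
]
envs = build_envs(repos)
for env in envs:
    print(env)
```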

New comment by robkop in "Dario Amodei – "We are near the end of the exponential" [video]"

robkop — Sat, 14 Feb 2026 01:00:35 +0000

I get this at least once a week. And then once you have to dig in and understand the full mental model it’s not really giving you any uplift anyway.

I will say that doing this for enough months has made me much quicker at picking up the mental model and at scoping how much I need to absorb. It seems possible that with another year you’d become very rapid at this.

Show HN: You Are an Agent

robkop — Sun, 01 Feb 2026 20:59:12 +0000

After adding "Human" as an LLM provider to OpenCode a few months ago as a joke, it turns out that acting as an LLM is quite painful. But it was surprisingly useful for understanding real agent harness dev.

So I thought I wouldn't leave anyone out! I made a small oss game - You Are An Agent - youareanagent.app - to share in the (useful?) frustration

It's a bit ridiculous. To tell you about some entirely necessary features, we've got:
- A full WASM arch-linux vm that runs in your browser for the agent coding level
- A bad desktop simulation with a beautiful excel simulation for our computer use level
- A lovely WebGL CRT simulation (I think the first one that supports proper DOM 2d barrel warp distortion on safari? honestly wanted to leverage one rather than write my own, but I couldn't find one I was happy with)
- An MCP server simulator with full simulation of off-brand Jira/Confluence/... connected
- And of course, a full WebGL oscilloscope music simulator for the intro sequence

Let me know what you think!

Code (If you'd like to add a level): https://github.com/R0bk/you-are-an-agent

(And if you want to waste 20 minutes - I spent way too long writing up my messy thinking about agent harness dev): http://robkopel.me/field-notes/ax-agent-experience/

Comments URL: https://news.ycombinator.com/item?id=46849318

Points: 14

# Comments: 0

New comment by robkop in "You Are an Agent – Try Being a Human LLM"

robkop — Sun, 25 Jan 2026 16:38:36 +0000

I added a "Human" LLM provider to my local OpenCode a few months ago as a joke, and it turns out acting as an LLM is quite painful. But it massively improved my agent harness dev skills.

So I thought I wouldn't leave anyone out! I made a small oss game - You Are An Agent - youareanagent.app - to share in the (useful?) frustration

Let me know what you think!

Code (If you'd like to add a level): https://github.com/R0bk/you-are-an-agent

(And if you want to waste 20 minutes - I spent way too long writing up my messy thinking about agent harness dev): http://robkopel.me/field-notes/ax-agent-experience/

You Are an Agent – Try Being a Human LLM

robkop — Sun, 25 Jan 2026 16:38:36 +0000

Article URL: https://youareanagent.app/

Comments URL: https://news.ycombinator.com/item?id=46755568

Points: 3

# Comments: 1

New comment by robkop in "Ax Not UX"

robkop — Thu, 22 Jan 2026 16:37:45 +0000

It's a fair question - I think the fact that they hold abilities (read 200k tokens instantly, can clone themselves, ...) that we don't would suggest they will have quirks and differences.

What downstream implications that will have in an AX sense is certainly arguable, but I would put forward that we're already seeing it with effective harnesses such as Claude Code. The experience the agent has there is quite different to how you'd build an IDE for a human.

Ax Not UX

robkop — Thu, 22 Jan 2026 16:30:57 +0000

Article URL: https://www.robkopel.me/field-notes/ax-agent-experience/

Comments URL: https://news.ycombinator.com/item?id=46721473

Points: 3

# Comments: 2

New comment by robkop in "Ask HN: Share your personal website"

robkop — Wed, 14 Jan 2026 20:17:54 +0000

https://robkopel.me

New comment by robkop in "2025 Letter"

robkop — Fri, 02 Jan 2026 00:40:54 +0000

Can you elaborate? I would have thought the main driver for the price of a service is the labor?

New comment by robkop in "OpenAI's cash burn will be one of the big bubble questions of 2026"

robkop — Wed, 31 Dec 2025 00:35:28 +0000

Does that cost-to-serve multiple stay the same when conventional sites are forced to shovel AI into each request? e.g. the new Google search