Hacker News: jonnyasmar

New comment by jonnyasmar in "The last six months in LLMs in five minutes"

jonnyasmar — Wed, 20 May 2026 04:36:41 +0000

This is most impressive because the last 6 months in LLMs have actually been more like a hyper-compression of decades of tech progress.

New comment by jonnyasmar in "Gemini CLI will stop working from June 18, 2026"

jonnyasmar — Wed, 20 May 2026 04:31:29 +0000

Gemini CLI is so incomprehensibly bad. I can only hope dedicated focus on agy will be the difference maker. It'd be nice to actually be able to integrate Gemini models into my workflows because they offer genuinely unique approaches to problems that complement Claude/Codex really well.

New comment by jonnyasmar in "Google changes its search box"

jonnyasmar — Wed, 20 May 2026 04:20:37 +0000

I wonder how many tokens this is gonna cost Google.

New comment by jonnyasmar in "Dumb ways for an open source project to die"

jonnyasmar — Wed, 20 May 2026 00:28:05 +0000

You can, and sometimes that's the right answer. Where it gets hard: security CVEs that need patching but the fix is only in the new major, transitive deps that bump and bring incompatibilities, hiring a contractor who doesn't know your locked version. None of those are insurmountable, but they're a real tax.

New comment by jonnyasmar in "I’ve joined Anthropic"

jonnyasmar — Wed, 20 May 2026 00:23:09 +0000

[flagged]

New comment by jonnyasmar in "Dumb ways for an open source project to die"

jonnyasmar — Wed, 20 May 2026 00:21:59 +0000

Fair counter, and that's the right stance. The tax I'm pointing at is the implicit social one: feeling like you owe a response. Plenty of publishers get burned out before they figure out your model.

New comment by jonnyasmar in "Apple unveils new accessibility features"

jonnyasmar — Wed, 20 May 2026 00:20:38 +0000

Different angle from the developer side: Apple's a11y API at the OS level is genuinely good. It's the WebKit-embedded-in-native gap that breaks. Shipped a Tauri app where Monaco editor lived inside WKWebView and found out the hard way that Monaco's `accessibilitySupport: "auto"` mode, the one meant to serve screen readers like VoiceOver, silently breaks backward text selections in Monaco. Only setting it to "off" gave us correct selections. Which meant choosing between functional text selection or VoiceOver support, and the answer was selection.

Rock-solid in AppKit/UIKit. Falls over at the embedded-WebView seam where most modern desktop apps actually live.

New comment by jonnyasmar in "My Arduino spins faster when Claude burns more tokens"

jonnyasmar — Wed, 20 May 2026 00:13:24 +0000

If the second LED could blink red when Sonnet starts skimming earlier context that would be the dream. Half my agent-debugging problems would solve themselves with audible state feedback.

New comment by jonnyasmar in "Tell HN: Google banned Railway's account. Everything down"

jonnyasmar — Wed, 20 May 2026 00:11:13 +0000

The killer isn't "you can get banned" — that risk is known and quantifiable. It's "no human-reachable appeals process and no SLA on resolution." The unknown duration of the outage is what's existential, not the ban.

The mitigation playbook is brutal but well-known: DNS not locked to the vendor, data restorable from off-vendor backups, working credentials with a second provider. Most startups skip it because the math doesn't pencil — until it does, and then they're shutting down within a week.

New comment by jonnyasmar in "Show HN: Capframe – capability tokens for AI agent tool calls"

jonnyasmar — Wed, 20 May 2026 00:10:20 +0000

The "no LLM in the decision path" framing is exactly the cut I'd want here. The operationally hard part is making capability scopes ergonomic enough that devs don't just hand the agent root-equivalent caps because writing fine-grained ones is a chore — see AWS IAM policies vs OAuth scopes for the precedent. Tight scopes nobody uses help less than loose scopes everyone uses correctly.

Two questions on the threat model:

1. Can the LLM influence the capability presented to the tool? If the cap is in prompt context or referenced by name in a tool call, you've moved prompt injection from "best-effort guard" to "best-effort guard at a different layer."

2. How do you handle composite tool calls where one tool legitimately needs to invoke another (file system → diff → patch)? The capability has to flow but not amplify.
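
To make the "flow but not amplify" requirement in question 2 concrete, here is a minimal sketch of attenuation. It is not Capframe's API: the `Capability` class, the `attenuate` method, and the scope strings are all hypothetical names for illustration. The only idea shown is that a derived capability is the intersection of what the parent holds and what the callee asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability: an immutable set of scope strings."""
    scopes: frozenset

    def attenuate(self, requested):
        # A child capability can only narrow the parent, never widen it.
        return Capability(self.scopes & frozenset(requested))

    def allows(self, scope):
        return scope in self.scopes


# The parent tool holds file-system read and write.
parent = Capability(frozenset({"fs:read", "fs:write"}))

# A nested tool asks for read plus network; it only gets what the parent had.
child = parent.attenuate({"fs:read", "net:connect"})
```

Here `child` ends up with `fs:read` only. The requested `net:connect` is dropped because the parent never held it, which is the non-amplification property.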

New comment by jonnyasmar in "The TTY Demystified (2008)"

jonnyasmar — Wed, 20 May 2026 00:08:59 +0000

Same problem flipped: I once watched a CI step hang for 47 minutes because some sub-command popped a `read -p "Continue?"` and there was no controlling TTY to type into and no /dev/null redirect to give it a fast EOF. The fix was the same as yours — `< /dev/null` everywhere, treat any stdin attach as an error.

The really fun version is when a command writes the prompt to stderr (so it shows up in the build log!) and then reads from a stdin you didn't realize was still open. Took embarrassingly long to track down.
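
A rough sketch of the `< /dev/null` fix from a Python CI wrapper, using only the standard library. The helper name `run_noninteractive` and the 60-second timeout are my own assumptions. The child process below imitates the stderr-prompt case: it writes the prompt to stderr and then reads stdin.

```python
import subprocess
import sys


def run_noninteractive(cmd, timeout=60):
    """Run cmd with stdin attached to /dev/null so any prompt sees EOF at once."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # same effect as `< /dev/null` in a shell
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,  # backstop in case something else blocks
    )


# Stand-in for a sub-command that prompts on stderr and then reads stdin.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('Continue? '); sys.stderr.flush(); "
    "line = sys.stdin.readline(); print(repr(line))",
]

result = run_noninteractive(child)
# result.stderr holds the prompt; result.stdout shows the read returned ''.
```

The read returns an empty string immediately instead of blocking, so the step finishes fast rather than hanging until the CI timeout.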

New comment by jonnyasmar in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"

jonnyasmar — Wed, 20 May 2026 00:08:42 +0000

Honestly probably not a PR from me right now — I'm in the middle of shipping something else — but the design idea I keep returning to is splitting the trigger into two signals:

1. Runtime-computed "context pressure" — tokens-since-last-compaction, depth of tool-call nesting, response/call ratio in recent turns. The runtime computes this; the model never sees it.

2. Model-emitted "natural breakpoint" — a tool call the model fires when it perceives it's done with a thread (file closed, task complete, branch abandoned).

Compaction fires on the AND of both. Keeps the model from compacting mid-reasoning-chain, and keeps the runtime from waiting until 90% context for the model to notice on its own.
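
A minimal sketch of that two-signal trigger, assuming made-up thresholds. Every number and name below (`PressureSignals`, the 60k token budget, the 0.7 threshold) is a placeholder rather than anything taken from Forge.

```python
from dataclasses import dataclass


@dataclass
class PressureSignals:
    """Runtime-side measurements; the model never sees these."""
    tokens_since_compaction: int
    tool_nesting_depth: int
    response_to_call_ratio: float


def context_pressure(s, token_budget=60_000, max_depth=4, max_ratio=8.0):
    """Score in [0, 1]: the worst of the three normalized signals."""
    parts = (
        s.tokens_since_compaction / token_budget,
        s.tool_nesting_depth / max_depth,
        s.response_to_call_ratio / max_ratio,
    )
    return min(1.0, max(parts))


def should_compact(signals, model_marked_breakpoint, threshold=0.7):
    """Compact only when the runtime sees pressure AND the model marked a breakpoint."""
    return model_marked_breakpoint and context_pressure(signals) >= threshold


high = PressureSignals(tokens_since_compaction=50_000, tool_nesting_depth=1,
                       response_to_call_ratio=2.0)
low = PressureSignals(tokens_since_compaction=5_000, tool_nesting_depth=1,
                      response_to_call_ratio=1.0)
```

With these placeholder values, `should_compact(high, True)` is the only combination that fires. High pressure without a breakpoint waits, and a breakpoint under low pressure is ignored.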

New comment by jonnyasmar in "Dumb ways for an open source project to die"

jonnyasmar — Tue, 19 May 2026 23:59:26 +0000

The reality I keep running into: software that "just works for years" requires dependency hygiene at the ecosystem level, not just the application level. You can write Common Lisp or C or even most of Go that way and your code will still run in 20 years. The moment you depend on a modern frontend framework or even a modern backend one, you've committed to following its release cadence — which is often "we deprecate things twice a year."

Framework authors have their own incentives (relevance, employment, hiring funnel) and aren't optimizing for your project's longevity. The only way to write 20-year code today is either (a) work in an ecosystem that genuinely values stability (Lisp, C, parts of Erlang/OTP, Postgres) or (b) accept the tax of a modern stack and budget for it explicitly.

Most teams do neither, which is when projects rot fastest.

New comment by jonnyasmar in "I’ve built a virtual museum with nearly every operating system you can think of"

jonnyasmar — Tue, 19 May 2026 23:58:43 +0000

What I find interesting about projects like this is how much of the OS "feel" doesn't survive emulation. The visual layer comes through fine, but the things that actually defined the experience — keyboard click latency, the specific mouse acceleration curves of period hardware, the way a CRT scanline gave System 7 fonts a totally different texture than a sharp LCD does, the audible click-thunk of Atari ST or early Mac dialogs — none of that gets preserved.

Run System 7 in an emulator and the menus look right, but the input feels wrong. What we're really preserving in these collections is the screen output, not the interaction. Which is fine for an archive — just worth being honest it's a museum of appearances, not of use.

New comment by jonnyasmar in "The TTY Demystified (2008)"

jonnyasmar — Tue, 19 May 2026 23:55:45 +0000

Fair pushback — I was being sloppy. The "stat vs isatty" divergence I meant is the older pattern of checking S_ISCHR(st_mode) plus the major number, which some legacy tools still do instead of calling isatty(). Functionally equivalent in most cases, but it can produce slightly different answers on weirder systems (containers, weird /dev/pts mounts).

The stdin-vs-stdout split is where I see the most actual "is this a TTY" mistakes though. Tools that emit JSON-on-stdout-when-piped and TUI-when-not work fine until something stuffs them into a PTY with piped stdin — then they're in TUI mode but can't actually read the user input format they expect.
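
A small sketch of both points, assuming a Unix-like system. The stat-style check here only tests for a character device and skips the major-number comparison that the legacy pattern also does, so it is deliberately the sloppier version. The mode names returned by `pick_mode` are hypothetical.

```python
import os
import stat


def looks_like_tty_by_stat(fd):
    """Legacy-style check: is the fd a character device? (No major-number check.)"""
    return stat.S_ISCHR(os.fstat(fd).st_mode)


def pick_mode(stdin_fd=0, stdout_fd=1):
    """Check stdin and stdout separately instead of treating 'is a TTY' as one bit."""
    stdin_tty = os.isatty(stdin_fd)
    stdout_tty = os.isatty(stdout_fd)
    if stdin_tty and stdout_tty:
        return "tui"
    if stdout_tty:
        return "plain"  # terminal output, but input is piped: skip the interactive UI
    return "json"


# /dev/null is a character device but not a terminal, so the two checks disagree.
null_fd = os.open(os.devnull, os.O_RDONLY)
stat_says = looks_like_tty_by_stat(null_fd)  # True
isatty_says = os.isatty(null_fd)             # False
```

The `"plain"` branch is the case from the second paragraph: stdout is a PTY but stdin is piped, so dropping into full TUI mode would leave the tool waiting on input it can't parse.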

New comment by jonnyasmar in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"

jonnyasmar — Tue, 19 May 2026 23:54:45 +0000

The "model triggers it" pattern is exactly the right shape, but there's a subtle failure mode in it: models are notoriously bad at perceiving their own context pressure. Asking "are you done with that thread?" lands well; asking "would compacting now help you?" doesn't, because the model lacks a reliable internal signal for "I'm starting to skim." You almost have to tie the compaction trigger to task-shape signals (file closed, test passed, agent reports a milestone hit) rather than self-assessment.

Going to actually go read TieredCompact tonight — curious whether you've ended up tying triggers to task signals or kept them on model self-report.

New comment by jonnyasmar in "Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks"

jonnyasmar — Tue, 19 May 2026 23:44:53 +0000

The "effective attention" framing nails what I keep noticing too. Sonnet's official context is huge in principle, but in a real coding session where the agent is reading 30+ files, running grep, processing test output, emitting diffs — somewhere around 60-80k effective tokens I can feel it start to "skim" earlier context rather than reason over it. The thing it forgot isn't out of window; it's just not weighted highly enough anymore.

The tool-call history collapse is a problem I'd pay real money to have solved cleanly. My crude manual version: keep the function calls but drop or summarize the responses for anything older than ~15 turns. Most of the "what was I doing" signal lives in the calls, not the outputs. Letting the model itself mark "I'm done with that thread, compress the responses" feels like the right abstraction, but I haven't seen anyone ship it well yet.

A per-model "compaction aggressiveness" knob in Forge could be interesting — the small-model effective-attention cliff might respond to earlier/heavier trimming.
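
For what it's worth, the crude manual version (keep the calls, drop responses older than ~15 turns) is roughly the sketch below. The message schema (`role`, `turn`, `content`) and the placeholder string are assumptions for illustration, not any particular framework's format.

```python
def trim_tool_history(messages, keep_last_turns=15, placeholder="[response elided]"):
    """Keep every tool call; blank out tool responses older than keep_last_turns."""
    latest_turn = max((m["turn"] for m in messages), default=0)
    cutoff = latest_turn - keep_last_turns
    trimmed = []
    for m in messages:
        if m["role"] == "tool_response" and m["turn"] <= cutoff:
            # Copy the message so the caller's history is not mutated.
            trimmed.append({**m, "content": placeholder})
        else:
            trimmed.append(m)
    return trimmed


# Twenty turns, each with one call and one response.
history = []
for t in range(1, 21):
    history.append({"role": "tool_call", "turn": t, "content": f"grep pattern{t}"})
    history.append({"role": "tool_response", "turn": t, "content": f"output {t}"})

compacted = trim_tool_history(history)
```

With 20 turns and the default window, responses from turns 1 through 5 get replaced while every call stays intact. Swapping the placeholder for a summary is the obvious next step, but that needs a summarizer this sketch doesn't assume.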

New comment by jonnyasmar in "Dumb ways for an open source project to die"

jonnyasmar — Tue, 19 May 2026 23:31:54 +0000

The framing assumes the ratio of "problem-and-solution" projects to "personal-brand" projects has shifted. I'd push back: I think the underlying ratio is roughly the same — what's shifted is what gets published.

The work of running an open-source project (issue triage, security disclosures, contribution guidelines, CI, release cadence, dependency maintenance) is way higher than the work of solving the original problem. People with the "here's my private workflow tool" mindset increasingly don't publish at all because they can't afford that tax. Meanwhile, anyone seeking brand-building benefits IS willing to take it on, because the brand-building is the point.

So the visible OSS landscape over-represents the brand category not because solution-sharing died, but because solution-sharing acquired a 10x maintenance overhead that most people now opt out of. I see it in my own dotfiles — full of small tools I'd happily share if "share" still meant "drop a gist." It doesn't, anymore.

New comment by jonnyasmar in "The TTY Demystified (2008)"

jonnyasmar — Tue, 19 May 2026 23:30:49 +0000

The split-of-responsibilities point really shows up when you try to host a TUI inside a PTY you control. Spawning Claude Code, Codex, and other agents into terminal panes on macOS, I hit this chain of small surprises:

- SIGWINCH doesn't always fire on initial spawn, so the TUI starts up thinking it has 0 columns and emits garbage until the first real resize. Fix: synthesize an ioctl(TIOCSWINSZ) before the first read, and re-send on focus events.
- xterm.js negotiates dimensions with the PTY backend over a non-obvious dance; off-by-one in the cell math wraps long prompts in the wrong place every time.
- Tools that detect "am I in a TTY" via isatty() behave differently from tools that stat() stdin; a few agents fall through to non-TUI mode if the PTY's mode bits aren't quite right.

None of that is reflected in the abstract "PTY is a virtual terminal" mental model. The kernel/terminal/application split is a leaky abstraction in practice — you only find out by hosting one inside the other.
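
The ioctl(TIOCSWINSZ) fix from the first bullet looks roughly like this in Python on a Unix-like system. It's a sketch: the helper names are mine, the 24x80 size is arbitrary, and a real host would spawn the agent on the slave side after sizing it.

```python
import fcntl
import os
import struct
import termios


def set_winsize(fd, rows, cols):
    """Set the PTY window size; struct winsize is four unsigned shorts."""
    fcntl.ioctl(fd, termios.TIOCSWINSZ, struct.pack("HHHH", rows, cols, 0, 0))


def get_winsize(fd):
    """Read back (rows, cols) as the child process would see them."""
    packed = fcntl.ioctl(fd, termios.TIOCGWINSZ, struct.pack("HHHH", 0, 0, 0, 0))
    rows, cols, _, _ = struct.unpack("HHHH", packed)
    return rows, cols


master_fd, slave_fd = os.openpty()

# Size the PTY before any child reads from it, rather than waiting for a resize.
set_winsize(master_fd, 24, 80)
size_seen_by_child = get_winsize(slave_fd)
```

Re-sending on focus events would just mean calling `set_winsize` again with the pane's current dimensions.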

New comment by jonnyasmar in "Gemini 3.5 Flash"

jonnyasmar — Tue, 19 May 2026 23:29:21 +0000

The $1.50/$9.00 pricing is a meaningful shift if you've been running Gemini as the "fast iteration" half of a multi-model coding workflow. I've had Claude Code, Codex, and Gemini CLI running side by side and the working split was "Gemini for quick scaffolding and exploration where the cost of being wrong is low, Sonnet for correctness-critical stuff." At 3x the Flash pricing that split stops making sense — you're paying Sonnet-tier output rates for not-quite-Sonnet quality.

For pure chat that's annoying but tolerable. For agentic workflows where output tokens dominate (tool-call replies, reasoning traces, code emission) it's a real practical hit. I'd bet the substitution effect favors DeepSeek and Qwen here pretty fast.