Hacker News: philipbjorge

New comment by philipbjorge in "Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep"

philipbjorge — Mon, 18 May 2026 02:24:14 +0000

I can't find the relevant issues in their repo, but I've been somewhat skeptical of their tool over-reporting token savings and there are many issues to that effect in the repo.

I'm not likely to install it again in my latest configuration, instead applying some specific tricks to things like `make test` so they spit out zero output except on unsuccessful exit codes, that sort of thing. Anecdotally, I see GPT-5.5 often automatically applying context-limiting flags to the bash it writes :shrug:
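
A rough sketch of that kind of trick, assuming a small Python wrapper sits in front of the command; `run_quiet` is a hypothetical helper name, not something from any tool mentioned here. It captures stdout and stderr together and only prints them when the command exits non-zero, so a passing run costs the agent no tokens.

```python
import subprocess
import sys


def run_quiet(cmd: list[str]) -> int:
    """Run cmd and print its combined output only if it exits non-zero."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the captured stdout
        text=True,
    )
    if result.returncode != 0:
        # Failure: surface everything the command printed.
        sys.stdout.write(result.stdout)
    # Success: stay silent, just hand back the exit code.
    return result.returncode
```

Calling `run_quiet(["make", "test"])` would then stay silent on a green run and dump the full log on a red one; the same effect could probably be had with a few lines of shell in the Makefile itself.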

New comment by philipbjorge in "Rewrite Bun in Rust has been merged"

philipbjorge — Thu, 14 May 2026 18:18:51 +0000

I've done some pretty incredible things with LLMs. If this were sqlite with its exhaustive test suite... OK, I can see it.

It's hard for me to see this not becoming a pile of slop, but hey, maybe I'm wrong

New comment by philipbjorge in "Tell HN: Starting June 15, claude -p usage will change"

philipbjorge — Thu, 14 May 2026 16:04:45 +0000

Been loving pi and codex lately. Good to build resiliency and self sufficiency into these systems.

New comment by philipbjorge in "Tell HN: Starting June 15, claude -p usage will change"

philipbjorge — Thu, 14 May 2026 14:57:42 +0000

Can you fill me in on how this impacts conductor? How are they using `claude -p`?

New comment by philipbjorge in "Tell HN: Starting June 15, claude -p usage will change"

philipbjorge — Thu, 14 May 2026 14:56:37 +0000

Tracking with `ccusage`, I pretty easily hit $2000/mo in API equivalent credits and while I'd consider myself a power user, I'm a responsible one that's generally always in the loop. If I were using `claude -p`, this would effectively be a kneecapping.

New comment by philipbjorge in "Starship V3"

philipbjorge — Wed, 13 May 2026 03:43:26 +0000

This seems like less of a today thing and more of an ancient human tendency.

A lot of Buddhist practice is basically trying to train against immediately collapsing reality into self/other, right/wrong, craving/aversion.

Practicing this with Elon Musk is effectively ultra hard mode.

Though I do think there’s a subtle irony here too — the original commenter may simply be describing their own emotional reaction/disillusionment, while your response risks collapsing them into "part of the problem."

Feels like everybody in the thread is pointing at the same tendency from different angles.

New comment by philipbjorge in "Claude.ai and API unavailable [fixed]"

philipbjorge — Thu, 30 Apr 2026 05:23:54 +0000

I haven't really shared what I use, I'm still deciding if that's something I want to do.

To get an idea of what I'm talking about, you could install https://github.com/obra/superpowers/ into both Codex and Claude Code -- You'll find that the behavior is remarkably similar if you A/B compare them on the same problems. CC occasionally misses things that Codex gets and vice versa.

Overall the output structure and final code is remarkably similar... Which is pretty different than if you just run them with their default system prompts. I'd throw codex out the window with its default outputs.

New comment by philipbjorge in "Claude.ai and API Unavailable"

philipbjorge — Thu, 30 Apr 2026 02:04:23 +0000

Ahh good point -- I've handled this by switching my harness to `pi` but recognize that may not be for everyone and doesn't directly address OP's question.

New comment by philipbjorge in "Claude.ai and API unavailable [fixed]"

philipbjorge — Thu, 30 Apr 2026 02:00:45 +0000

What I found was that I *strongly* preferred Claude Code with its defaults. Codex was almost unusable to me -- It would spit out a 4-5 page plan where it kept repeating itself, where Claude would give me a crisp 1-2 pager I could actually review.

*But* I don't work with the defaults -- I work with my own prompt framework based off of superpowers.

Given sufficient prompt scaffolding, I've found the models relatively interchangeable -- _I might_ be getting some of this for free by basing my own system off of superpowers which is used across various harnesses -- In other words achieving this kind of portability may be a lot harder than it looks and I'm benefiting from other people's work.

New comment by philipbjorge in "Claude.ai and API Unavailable"

philipbjorge — Thu, 30 Apr 2026 01:36:41 +0000

You might search for a concept like `/handoff` that's in ampcode. I'm sure someone's built a skill for just this.

New comment by philipbjorge in "Claude.ai and API unavailable [fixed]"

philipbjorge — Thu, 30 Apr 2026 01:31:12 +0000

So happy to have diversified my model providers these past couple of weeks. GPT-5.5 has had no trouble slotting into Opus workloads. Will be fun to try out more of the models as time goes on to build some resiliency into my engineering workflows :).

New comment by philipbjorge in "Mistral Medium 3.5"

philipbjorge — Wed, 29 Apr 2026 16:36:25 +0000

I’ve been comparing Claude Code and Codex extensively side by side over the past couple of weeks with my favorite prompting framework superpowers…

From my perspective, Claude Code is decidedly not better than Codex. They’re slightly different and work better together. I would have no issues dropping CC entirely and using codex 100%.

If you’re working off of “defaults”, in other words no custom prompting, Claude Code does perform a lot better out of the box. I think this matters, but if you’re a professional software developer, I’d make the case that you should be owning your tools and moving beyond the baked in prompts.

New comment by philipbjorge in "Claude Code to be removed from Anthropic's Pro plan?"

philipbjorge — Wed, 22 Apr 2026 00:47:18 +0000

gpt 5.4 has been performing great in my harness.

New comment by philipbjorge in "Show HN: Webctl – Browser automation for agents based on CLI instead of MCP"

philipbjorge — Wed, 14 Jan 2026 20:43:10 +0000

This looks remarkably similar to https://github.com/vercel-labs/agent-browser

How is it different?

New comment by philipbjorge in "CBP is monitoring US drivers and detaining those with suspicious travel patterns"

philipbjorge — Thu, 20 Nov 2025 21:49:31 +0000

Until it's happened to you, it sounds unbelievable

Sorry about all the broken plastic on the trim -- That's also very familiar...

New comment by philipbjorge in "Using AI to negotiate a $195k hospital bill down to $33k"

philipbjorge — Tue, 28 Oct 2025 17:41:23 +0000

> We asked for a bill with the standard CPT codes. No reply. Asked again. “Oh, we meant to send it. We upgraded our computers five months ago and nothing works.” Uh-huh. Finally got the CPT codes.

I work in healthcare RCM. I have no trouble believing the staff here that nothing in their system works.

New comment by philipbjorge in "Asking AI to build scrapers should be easy right?"

philipbjorge — Fri, 17 Oct 2025 20:07:45 +0000

We had a similar realization here at Thoughtful and pivoted towards code generation approaches as well.

I know the authors of Skyvern are around here sometimes -- How do you think about code generation with vision based approaches to agentic browser use like OpenAI's Operator, Claude Computer Use and Magnitude?

From my POV, I think the vision based approaches are superior, but they are less amenable to codegen IMO.

New comment by philipbjorge in "Windows-Use: an AI agent that interacts with Windows at GUI layer"

philipbjorge — Fri, 12 Sep 2025 18:55:47 +0000

Important is subjective — In the healthcare space, I’d make the claim that most applications don’t expose themselves correctly (native or web).

CV and direct mouse/kb interactions are the “base” interface, so if you solve this problem, you unlock just about every automation usecase.

(I agree that if you can get good, unambiguous, actionable context from accessibility/automation trees, that’s going to be superior)

New comment by philipbjorge in "GPT-4.1 in the API"

philipbjorge — Mon, 14 Apr 2025 20:02:18 +0000

If you're looking to test an LLM's ability to solve a coding task without prior knowledge of the task at hand, I don't think their benchmark is super useful.

If you care about understanding relative performance between models for solving known problems and producing correct output format, it's pretty useful.

- Even for well-known problems, we see a large distribution of quality between models (5 to 75% correctness)
- Additionally, we see a large distribution in models' ability to produce responses in the formats they were instructed to use

At the end of the day, benchmarks are pretty fuzzy, but I always welcome a formalized benchmark as a means to understand model performance over vibe checking.

New comment by philipbjorge in "Ask HN: What's the best way to get started with LLM-assisted programing?"

philipbjorge — Fri, 04 Apr 2025 18:03:16 +0000

devcontainers extension was a year out of date up until the last month or something? sorry, this is from memory, but definitely not 100% compatibility.