<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jtbetz22</title><link>https://news.ycombinator.com/user?id=jtbetz22</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 11:21:28 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jtbetz22" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jtbetz22 in "Ask HN: What Are You Working On? (April 2026)"]]></title><description><![CDATA[
<p>I believe that AI-powered software development means we need to fundamentally rethink how we preserve code quality.<p>The volume of model output means that code review as a single final check before merge is far too late, and far too burdensome. Using AI to review AI-generated code is a band-aid, not a cure.<p>That's why I built Caliper (<a href="http://getcaliper.dev" rel="nofollow">http://getcaliper.dev</a>). It's a system that institutes multiple layers of code quality checks throughout the dev cycle. The lightest-weight checks run after every agent turn, and increasingly thorough checks run pre-commit and pre-merge.<p>Early users love it, and the data demonstrates the need - 40% of agent turns produce code that violates a project's own conventions (as defined in CLAUDE.md). Caliper catches those violations immediately and gets the model to make corrections before small issues become costly to unwind.<p>Still very early, and all feedback is welcome! <a href="http://getcaliper.dev" rel="nofollow">http://getcaliper.dev</a></p>
]]></description><pubDate>Sun, 12 Apr 2026 19:28:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47743448</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47743448</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47743448</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Show HN: AgentLint – ESLint for your coding agents"]]></title><description><![CDATA[
<p>Oh this is really useful. There's definitely a problem to be solved here: agent guidance files, like all forms of documentation, can quickly grow stale.<p>I've tried to tackle a similar problem with a couple of different approaches.<p>One is a command I call "/retro", which goes through all recent history on a project - commits, PRs, PR comments, etc. - and analyzes the existing documentation to identify improvements that would prevent any observed issues from recurring. This is less about adding structure to the docs (as AgentLint does) and more about identifying coverage gaps.<p>The other is a set of tooling I've built to introduce multiple layers of checks on the outputs of agentic coding. The initial observation was that many directives in CLAUDE.md files can be implemented as deterministic checks (e.g., "never use 'as any'" becomes a grep for 'as any'); by creating those deterministic checks and running them after every agent turn, I can effectively force the agent to retain the project's directives in context.<p>The results are pretty astounding - among early users, 40% of agent turns produce code that doesn't comply with the project's own conventions.<p>The system then layers on a sequence of increasingly AI-driven reviews at later checkpoints.<p>I'd love feedback: <a href="http://getcaliper.dev" rel="nofollow">http://getcaliper.dev</a></p>
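To make the directive-to-check idea concrete, here is a minimal sketch of the 'as any' example. The function name and invocation are illustrative assumptions, not the actual tooling; the real system would generate and schedule checks like this from the CLAUDE.md directives.

```shell
#!/bin/sh
# Minimal sketch: the CLAUDE.md directive "never use 'as any'" compiled
# into a deterministic check. Function name is hypothetical; prints any
# offending lines as file:line:match and fails if one is found.
check_no_as_any() {
  [ "$#" -eq 0 ] && return 0   # nothing to scan
  if grep -Hn "as any" "$@"; then
    return 1                   # violation found: caller re-prompts the agent
  fi
  return 0
}
```

Run it against the files the agent touched in its last turn; a non-zero exit becomes the signal to feed the printed violations back to the model.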
]]></description><pubDate>Tue, 07 Apr 2026 14:59:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47676460</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47676460</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47676460</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Reinventing the pull request"]]></title><description><![CDATA[
<p>> So even when you have a nicely structured commit history, you end up realizing that some things need to be changed and start appending a bunch of "fix" and "actual fix" commits at the end.<p>I have found that this no longer needs to be an issue with agentic coding tools.<p>Once I am happy with the end state of a branch, I tell Claude to rebuild the change from scratch as a set of atomic, incremental commits. It adds about two minutes to the dev process but produces a PR that is infinitely easier to review.<p>The overall thrust of the article is great, though. The tooling around PRs needs a ton of attention.</p>
]]></description><pubDate>Fri, 03 Apr 2026 10:22:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47624985</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47624985</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47624985</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Show HN: Forcing Claude Code to Write Maintainable TypeScript"]]></title><description><![CDATA[
<p>Maintaining (and, ideally, improving!) code quality is, IMO, the biggest open problem in agentic development.<p>I've been coming at this from a slightly different angle, building a system that encodes checks that run against code authored by Claude, with escalating levels of enforcement.<p>It's still really early, but, among early users, we see that about 40% of agent turns produce code that violates the guidance in the project's own CLAUDE.md. We catch the violations at the end of the turn and direct Claude to correct them.<p>Still looking for early users and feedback. <a href="http://getcaliper.dev" rel="nofollow">http://getcaliper.dev</a></p>
]]></description><pubDate>Thu, 02 Apr 2026 23:15:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47621430</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47621430</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47621430</guid></item><item><title><![CDATA[New comment by jtbetz22 in "I built IDE-layer policy enforcement for Claude Code/Cursor agents"]]></title><description><![CDATA[
<p>This is really cool.<p>In general I think there has been (a) too little use of the hooks agents provide for deterministic checks and follow-ups to agentic coding, and (b) too little investment in reusable, org-wide monitoring controls and reporting on convention enforcement.<p>Oculi looks like a strong advance on both fronts.</p>
]]></description><pubDate>Mon, 30 Mar 2026 10:20:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47572528</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47572528</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47572528</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Ask HN: Leaving Notion, Codebase as a Wiki?"]]></title><description><![CDATA[
<p>I find it is now 1000x more effective to draft documents in collaboration with Claude, working with a markdown file as the output format.<p>However, git doesn't work for my team as a document collaboration mechanism - commenting and discussion on a file has far more friction than in Google Docs or Notion.<p>What I've ended up with is a workflow where I iterate on the doc until I am satisfied, then export it to Notion (via MCP) and publish that for feedback. Once there is consensus on the Notion doc, that becomes the artifact of record.<p>It's a suboptimal solution, though - Claude makes a number of poor design choices when exporting .md -> Notion.</p>
]]></description><pubDate>Fri, 27 Mar 2026 10:52:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47541110</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47541110</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47541110</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Show HN: Hopsule – Persistent Memory Layer for AI Engineering"]]></title><description><![CDATA[
<p>Earlier discussion: <a href="https://news.ycombinator.com/item?id=47415402">https://news.ycombinator.com/item?id=47415402</a></p>
]]></description><pubDate>Thu, 19 Mar 2026 20:28:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47445522</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47445522</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47445522</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Are developers trusting AI-generated code too much?"]]></title><description><![CDATA[
<p>Interesting. I don't know if people are trusting AI-generated code too much, but AI is generating far more code than humans can review, and "looks fine" is the bar the AI gets held to.<p>I strongly agree with you that the solution likely involves pushing the correction mechanism much closer to the point of code generation. You want to put the AI back on track as soon as it starts to stray; you can't let it build a lot on top of a mistake.<p>My own attempt at resolving this involves running a set of deterministic checks on agent-generated code at the end of every agent turn, along with a lightweight AI-powered review on every commit and a deep AI review on PRs before merge. I am pretty happy with the results so far.</p>
]]></description><pubDate>Wed, 18 Mar 2026 16:24:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47427746</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47427746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47427746</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Ask HN: Is vibe coding a new mandatory job requirement?"]]></title><description><![CDATA[
<p>At my company, we ask everyone in the hiring process how they have used agentic coding tools.<p>We're not concerned with hiring for the 'skill' of using these things; it's more of a culture check - we are a very AI-forward company, and we are looking for people who are excited to incorporate AI into their workflow. The best evidence of that excitement is that they have already adopted these tools.<p>On the team, the expectation is that most code is produced with AI, but there is no micromanager checking how much everyone uses the AI coding tools.</p>
]]></description><pubDate>Wed, 18 Mar 2026 09:54:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47423612</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47423612</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47423612</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Get Shit Done: A meta-prompting, context engineering and spec-driven dev system"]]></title><description><![CDATA[
<p>I have been ~obsessed~ with exactly this problem lately.<p>We built AI code generation tools, and suddenly the bottleneck became code review. People built AI code reviewers, but none of the ones I've tried are all that useful - usually, by the time the code hits a PR, the issues are so large that an AI reviewer is too late.<p>I think the solution is to push review closer to the point of code generation, catch any issues early, and course-correct appropriately, rather than waiting until an entire change has been vibe-coded.</p>
]]></description><pubDate>Wed, 18 Mar 2026 01:07:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47420473</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47420473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47420473</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Show HN: Hopsule – Persistent memory and decision layer for AI development"]]></title><description><![CDATA[
<p>This is a real problem. Hopsule looks interesting.<p>IIUC, the core of your approach is that decisions get locked as immutable constraints and then served to agents via MCP when they query for context — is that right? Is there anything you can do to ensure that the MCP actually gets called?<p>I've been experimenting with an approach that triggers lightweight, deterministic checks on the Claude Code stop hook, then does a quick AI-powered review as a pre-commit hook, with a mechanism to ensure that the relevant parts of the repo docs get loaded into context during the review.<p>One thing I've had to put a good bit of work into is orchestrating what happens when the existing rules don't cover some particular case. How do you think about that?</p>
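For anyone curious what the stop-hook side of this can look like, here is a minimal sketch. It assumes the function below is wired up as the command for Claude Code's Stop hook event and that a failing exit with messages on stderr gets surfaced back to the agent - check the hook docs for your version. The specific rule (no console.log in changed TypeScript files) is just an illustrative example.

```shell
#!/bin/sh
# Sketch of a deterministic stop-hook check: scan files changed since HEAD
# for a banned pattern. Assumes this runs after each agent turn via a
# Claude Code Stop hook; the no-console.log rule is illustrative, and the
# unquoted file list means paths with spaces are not handled.
stop_hook_checks() {
  files=$(git diff --name-only HEAD -- '*.ts' '*.tsx')
  [ -z "$files" ] && return 0          # nothing changed, nothing to check
  if grep -Hn "console\.log" $files >&2; then
    echo "Deterministic check failed: remove console.log calls." >&2
    return 2                           # block and re-prompt the agent
  fi
  return 0
}
```

Because it is a plain git-diff-plus-grep scan, it stays sub-second, which is what makes running it after every turn tolerable.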
]]></description><pubDate>Tue, 17 Mar 2026 21:06:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47418320</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47418320</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47418320</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Show HN: Scryer – Visual architecture modeling for AI agents"]]></title><description><![CDATA[
<p>The architectural drift problem you're describing is exactly what I see with AI-assisted development — agents move fast but conventions slip. I built Caliper to catch that at the hook level: it reads your CLAUDE.md (or existing .cursor/rules) and runs deterministic convention checks after every Claude Code turn, completely free and sub-second.<p>For the messier stuff — logic bugs, security issues, spec drift against your project's actual policy — there's an optional AI review layer that evaluates changes before commit. It learns from feedback and runs entirely in your environment. You bring your own API key, and select the model to use, so you control the costs.<p>The two feel complementary to me: Scryer gives you the visual understanding of what changed architecturally, Caliper enforces that it stays compliant. If you're exploring this workflow, we're actively looking for alpha testers.<p><a href="https://getcaliper.dev" rel="nofollow">https://getcaliper.dev</a> has docs and the install is just `npm install --save-dev @caliperai/caliper && npx caliper init`.</p>
]]></description><pubDate>Mon, 16 Mar 2026 14:40:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47399678</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47399678</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47399678</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Show HN: Vibecheck – lint for AI-generated code smells (JS/TS/Python)"]]></title><description><![CDATA[
<p>Yeah, this is a common problem, I have been thinking about it a fair bit.<p>I ran into a related problem at scale though: deterministic linters catch syntax and obvious anti-patterns, but they miss the harder stuff. Logic bugs that pass all the rules. Spec drift. Security issues hiding in otherwise "clean" code.<p>So I built Caliper to layer AI review on top of deterministic checks. It reads your coding conventions (CLAUDE.md, .cursor/rules, whatever you use) and compiles them into checks that run after each AI turn—free and instant. Then optionally an AI layer evaluates changes against your project's actual policy. Catches what linters structurally can't.<p>Very different approach from Vibecheck but complementary. Actively looking for alpha testers if you want to try it — <a href="https://getcaliper.dev" rel="nofollow">https://getcaliper.dev</a></p>
]]></description><pubDate>Mon, 16 Mar 2026 14:38:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47399650</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47399650</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47399650</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Ask HN: What is it like being in a CS major program these days?"]]></title><description><![CDATA[
<p>I am not in a CS program myself, but I guest lecture for CS students at CMU about 2x/year, and I'm in a regular happy hour that includes CS professors from other high-tier CS schools.<p>Two points of anecdata from that experience:<p>- The students believe that the path to a role in big tech has evaporated. They do not see Google, Meta, Amazon, etc, recruiting on campus. Jane Street and Two Sigma are sucking up all the talent.<p>- The professors do not know how to adapt their capstone / project-level courses. Core CS is obviously still the same, but for courses where the goal is to build a 'complex system', no one knows what qualifies as 'complex' anymore. The professors use AI themselves and expect their students to use it, but do not have a gauge for what kinds of problems make for an appropriately difficult assignment in the modern era. The capabilities are also advancing so quickly that any answer they arrive at today could be stale in a month.<p>FWIW.</p>
]]></description><pubDate>Mon, 16 Mar 2026 10:58:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47397411</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47397411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47397411</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Ask HN: What Are You Working On? (March 2026)"]]></title><description><![CDATA[
<p>Agentic code construction has broken the traditional model of code review - the volume is simply too high for humans to keep up with.<p>There are some good tools out there for automating PR review; IMO, they don't catch enough, and they catch it too late.<p>I've been experimenting with ideas for a very opinionated AI code reviewer, one that makes an ideal tradeoff between cost and immediacy (i.e., how soon after composition the code gets feedback).<p>Currently in an invite-only alpha, but check out the landing page and lmk if you'd like to be a trial user!<p><a href="https://getcaliper.dev/" rel="nofollow">https://getcaliper.dev/</a></p>
]]></description><pubDate>Mon, 09 Mar 2026 11:17:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47307557</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=47307557</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47307557</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Great ideas in theoretical computer science"]]></title><description><![CDATA[
<p>I am old enough to remember when it was 15-199, taught by Steven Rudich, titled "How to think like a computer scientist".<p>They had to run it for a few years before they realized CS kids who did poorly in the class dropped the major - the implicit signal being "you don't know how to think like a computer scientist".</p>
]]></description><pubDate>Fri, 19 Dec 2025 11:59:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46324865</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=46324865</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46324865</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Persuasion methods for engineering managers"]]></title><description><![CDATA[
<p>Oh wow, I appreciate this post as a vivid reminder of how I grew to loathe being an EM at Google. Folks like this are pervasive in the org these days.</p>
]]></description><pubDate>Tue, 13 May 2025 09:55:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43971278</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=43971278</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43971278</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Ask HN: What are you working on? (February 2025)"]]></title><description><![CDATA[
<p><a href="https://www.tabomagic.com" rel="nofollow">https://www.tabomagic.com</a><p>I've been obsessed with making it easier to handle tab overload in the browser without requiring any sort of active "tab management".<p>I have a working extension that replaces the "new tab" page with a clean view of all open tabs, along with simple ways to search and select which tab to switch to, including search over bookmarks and history. There are also some simple tools to allow for creating and reorganizing tab groups.<p>For a small group of people, it revolutionizes the browser experience. I'm still trying to decide if there is a widely-useful product there, or if it's just a niche use case.<p>Any and all feedback welcome!</p>
]]></description><pubDate>Mon, 24 Feb 2025 13:07:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=43159107</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=43159107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43159107</guid></item><item><title><![CDATA[New comment by jtbetz22 in "Undergraduate shows that searches within hash tables can be much faster"]]></title><description><![CDATA[
<p>Krapivin's work was a result of his study of the Tiny Pointers paper; his paper has already been linked in another response.</p>
]]></description><pubDate>Tue, 11 Feb 2025 11:00:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43011393</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=43011393</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43011393</guid></item><item><title><![CDATA[New comment by jtbetz22 in "100 years of Bell Labs [pdf]"]]></title><description><![CDATA[
<p>This is a fun read, and, unlike a lot of material on the Labs, goes back a lot further than 1947.<p>As a software engineer, my favorite book on the topic is "Unix: A History and a Memoir" by Brian Kernighan. It tells many of the same stories that are told elsewhere about the modern history of the Labs, but through the first-person perspective of someone who was there, for whom the characters are not just historical figures, but friends.</p>
]]></description><pubDate>Sun, 26 Jan 2025 18:49:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=42832722</link><dc:creator>jtbetz22</dc:creator><comments>https://news.ycombinator.com/item?id=42832722</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42832722</guid></item></channel></rss>