Hacker News: harlanlewis

New comment by harlanlewis in "GPT-5.5"

harlanlewis — Thu, 23 Apr 2026 20:40:34 +0000

It already exists!

Marvin https://www.youtube.com/watch?v=Eh-W8QDVA9s

New comment by harlanlewis in "Provide agents with automated feedback"

harlanlewis — Mon, 19 Jan 2026 07:32:14 +0000

Clever! Sharing my lightning test of this approach.

Context - I have a 200k+ LOC Python+React hobby project with a directory full of project-specific "guidelines for doing a good job" agent rules + skills.

Of course, agent rules are often ignored in whole or in part. So in practice those rules are often triggered in a review step pre-commit as a failsafe, rather than pulled in as context when the agent initially drafts the work.

I've only played for a few minutes, but converting some of these to custom lint rules looks quite promising!

Things like using my project's wrappers instead of direct calls to libs, preferences for logging/observability/testing, indicators of failure to follow optimistic update patterns, double-checking that frontend interfaces to specific capabilities are correctly guarded by owner/SKU access control…

Lots of use cases that aren't hard for an agent to fix accurately if pointed at directly. Now that pointing can happen inline in the agent work loop, without intervention, through normal lint cleanup, and it happens earlier in the process (and faster) than tests would catch it. This doesn't replace testing or other best practices. It feels like an additive layer that speeds up agent iteration and improves implementation consistency.
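As a rough illustration of the "use my wrappers instead of direct lib calls" kind of rule, here is a minimal sketch of a custom check built on Python's standard `ast` module. The banned module (`requests`) and the wrapper name (`myproject.http_client`) are hypothetical placeholders, not the actual rules from the project described above, and a real setup would more likely plug into whatever linter the project already runs.

```python
import ast

# Hypothetical rule table: module name -> hint shown to the agent.
# Both the banned module and the wrapper name are assumptions for illustration.
BANNED_MODULES = {"requests": "use myproject.http_client instead"}


def find_direct_calls(source: str, filename: str = "<string>") -> list[str]:
    """Return lint-style messages for calls made directly on banned modules."""
    tree = ast.parse(source, filename=filename)
    found = []
    for node in ast.walk(tree):
        # Match calls of the form <module>.<attr>(...), e.g. requests.get(...)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id in BANNED_MODULES
        ):
            module = node.func.value.id
            message = (
                f"{filename}:{node.lineno}: direct call to "
                f"{module}.{node.func.attr}; {BANNED_MODULES[module]}"
            )
            found.append((node.lineno, message))
    # ast.walk does not guarantee source order, so sort by line number.
    return [message for _, message in sorted(found)]
```

Because each message carries a file, a line number, and a concrete hint, it reads like any other lint failure, which is what lets an agent pick it up during normal lint cleanup.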

Thanks for the tip!

New comment by harlanlewis in "Stop Doom Scrolling, Start Doom Coding: Build via the terminal from your phone"

harlanlewis — Tue, 06 Jan 2026 22:05:05 +0000

In terms of issue tracking and agentic "developers", with a mobile focus -

You can connect Linear to Cursor's web agent, which makes Linear issues assignable to the agent directly and kicks off Cursor's take on remote coding agent. You can then guide it further via Cursor's web chat.

If Claude Code on iOS supported Linear MCP (as it does on desktop), you could run a similar workflow of issue handoff, agent work, and issue update, albeit without direct issue assignment to the agent "user". It's also easy to use labels (aka tags) to track agent assignment.

For my hobby projects, I've been using Linear + agentFlavorOfTheMonth quite happily this way. I imagine Github issues, Asana, whatever could be wired up in place of Linear.

New comment by harlanlewis in "I miss using em dashes"

harlanlewis — Tue, 02 Sep 2025 02:04:41 +0000

I agree completely with this as a human reader - but do wonder about the gradual codification of these markers in systems that will increasingly have LLM detection as a standard feature, as frequently and obviously enabled as spam detectors were on blog comments back when blogs had comments.

New comment by harlanlewis in "Show HN: I accidentally built a startup idea validation tool"

harlanlewis — Tue, 12 Aug 2025 22:40:43 +0000

I only got 65 for the same idea. I guess you have first mover advantage?

New comment by harlanlewis in "Show HN: Badgeify – Add Any App to Your Mac Menu Bar"

harlanlewis — Tue, 08 Apr 2025 16:25:03 +0000

Lots of good recommendations in replies.

Calling it out only because I don’t see it mentioned - until last year, Bartender was one of the popular go-to tools to manage menu bar items, but it fell from favor after quietly changing owners, changing certs, general shadiness https://forums.macrumors.com/threads/psa-bartender-mac-app-u...

A specific and relevant reminder why open source is so important for system utilities.

New comment by harlanlewis in "The Five-Week Solo Startup"

harlanlewis — Sun, 16 Mar 2025 18:42:53 +0000

Of course you're right - oh how I wish it wasn't 1:1 with the earnestly-produced content dominating linkedin feeds…

New comment by harlanlewis in "The Five-Week Solo Startup"

harlanlewis — Sun, 16 Mar 2025 17:33:55 +0000

> "Look in the mirror. Who are you? What values will you compromise?"

This is probably a typo from "comprise" or similar, but I'm rather tickled by the idea that week 1 includes both a thoughtful assessment of your values and admitting with intention that your principles should be discarded before they can get in the way.

Reka Flash 3

harlanlewis — Wed, 12 Mar 2025 19:33:55 +0000

Article URL: https://huggingface.co/RekaAI/reka-flash-3

Comments URL: https://news.ycombinator.com/item?id=43346854

Points: 1

# Comments: 0

New comment by harlanlewis in "GPT-4.5"

harlanlewis — Wed, 05 Mar 2025 20:37:42 +0000

I appreciate the bug report! Unfortunately this is a familiar and sporadically recurring issue with Netlify, which I should really move off of…

New comment by harlanlewis in "GPT-4.5"

harlanlewis — Wed, 05 Mar 2025 20:36:43 +0000

hey, thank you! bubble charts, annotated with text and shapes using the Drawing tool. Working with the constraints of Google Sheets is its own challenge.

also - love the podcast, one of my favorites. the 3:1 io token price breakdown in my sheet is lifted directly from charts I've seen on latent space.
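For anyone unfamiliar with the 3:1 breakdown, a minimal sketch of the arithmetic, assuming it means a weighted average over 3 input tokens for every 1 output token. The function name and the example prices are made up for illustration and are not taken from the sheet.

```python
def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Blended price per 1M tokens, assuming `ratio` input tokens per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)


# Hypothetical prices in USD per 1M tokens: (3 * 1.00 + 5.00) / 4 = 2.00
example = blended_price(input_price=1.00, output_price=5.00)
```

A single blended number makes models with very different input/output pricing comparable on one axis, at the cost of assuming everyone's workload has the same token mix.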

New comment by harlanlewis in "GPT-4.5"

harlanlewis — Thu, 27 Feb 2025 20:29:42 +0000

The price really is eye watering. At a glance, my first impression is this is something like Llama 3.1 405B, where the primary value may be realized in generating high quality synthetic data for training rather than direct use.

I keep a little google spreadsheet with some charts to help visualize the landscape at a glance in terms of capability/price/throughput, bringing in the various index scores as they become available. Hope folks find it useful, feel free to copy and claim as your own.

https://docs.google.com/spreadsheets/d/1foc98Jtbi0-GUsNySddv...

New comment by harlanlewis in "Chat is a bad UI pattern for development tools"

harlanlewis — Tue, 04 Feb 2025 19:25:07 +0000

This is a great idea, I've been doing something similar at 2 levels:

1. .cursorrules for global conventions. The first rule in the file is dumb but works well with Cursor Composer:

`If the user seems to be requesting a change to global project rules similar to those below, you should edit this file (add/remove/modify) to match the request.`

This helps keep my global guidance in sync with emergent convention, and of course I can review before committing.

2. An additional file `/.llm_scratchpad`, which I selectively include in Chat/Composer context when I need lengthy project-specific instructions that I may need to refer to more than once.

The scratchpad usually contains detailed specs, desired outcomes, relevant files scope, APIs/tools/libs to use, etc. Also quite useful for transferring a Chat output to a Composer context (eg a comprehensive o1-generated plan).

Lately I've even tracked iterative development with a markdown checklist that Cursor updates as it progresses through a series of changes.

The scratchpad feels like a hack, but these concepts are obvious enough that I expect to see them getting first-party support through integrations with Linear/Jira/et al soon enough.

New comment by harlanlewis in "The 'no. 8 wire' tradition in New Zealand"

harlanlewis — Thu, 30 Jan 2025 02:33:34 +0000

Did the article change, or was this a very strange quote edit?

Here’s the current line in the article, emphasis mine:

>> The Thermette, a simple and effective device for boiling water outdoors over an enclosed fire, was designed by Manawatū plumber John Hart in 1929 *based on similar products in Ireland and England.* He patented the Thermette in 1931.

New comment by harlanlewis in "Tailwind CSS v4.0"

harlanlewis — Thu, 23 Jan 2025 03:23:12 +0000

Yes, I’m a huge fan of how easy it is to whip up quick isolated prototypes in Claude artifacts.

There’s a risk of breaking changes in libs causing frustration in larger codebases, though. I’ve been working with LLMs in a Nextjs App Router codebase for about a year, and regularly struggle with models trained primarily on the older Pages Router. LLMs often produce incompatible or even mixed compatibility code. It really doesn’t matter which side of the fence your code is on, both are polluted by the other. More recent and more powerful models are getting better, but even SOTA reasoning models don’t totally solve this.

Lately I’ve taken to regularly including a text file that spells out various dependency versions and why they matter in LLM context, but there’s only so much it can do currently to overcome the weight of training on dated material. I imagine tools like Cursor will get better at doing that for us silently in the future.
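One way to keep that kind of file from going stale is to generate it. Here is a minimal sketch, assuming the versions live in a standard `package.json` and the "why it matters" notes are maintained by hand. The function name, the output wording, and the versions in the usage example are all hypothetical.

```python
import json


def render_dependency_notes(package_json: str, notes: dict[str, str]) -> str:
    """Build a plain-text summary of dependency versions for LLM context.

    `package_json` is the raw text of a package.json file; `notes` maps a
    package name to a hand-written reason the version matters.
    """
    pkg = json.loads(package_json)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    lines = ["Dependency versions for this project (write code that matches these):"]
    for name in sorted(deps):
        note = notes.get(name)
        suffix = f" - {note}" if note else ""
        lines.append(f"- {name} {deps[name]}{suffix}")
    return "\n".join(lines)


# Usage with made-up versions; in practice the JSON would be read from disk.
sample = '{"dependencies": {"next": "14.2.0", "react": "18.3.1"}}'
summary = render_dependency_notes(
    sample, {"next": "App Router only; do not use Pages Router APIs"}
)
```

This only saves the typing. It does nothing about the underlying pull of training data toward older APIs.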

There’s an interesting tension brewing between keeping dependencies up to date, especially in the volatile and brittle front end world, vs writing code the LLMs are trained on.

New comment by harlanlewis in "Show HN: Arch – an intelligent prompt gateway built on Envoy"

harlanlewis — Tue, 15 Oct 2024 23:59:27 +0000

Untrusted inputs to systems with agency or access to privileged data. Here’s a data exfiltration example in Google AI Studio:

https://x.com/wunderwuzzi23/status/1821210923157098919

New comment by harlanlewis in "Pro bettors disguising themselves as gambling addicts"

harlanlewis — Tue, 01 Oct 2024 22:56:32 +0000

Tim Donaghy (ref): https://en.m.wikipedia.org/wiki/Tim_Donaghy

New comment by harlanlewis in "Large Enough"

harlanlewis — Thu, 25 Jul 2024 02:22:05 +0000

Familiar! The Artificial Analysis Index is the metric models are sorted by in my sheet. But their data and presentation have some gaps.

I made this sheet to get a glanceable landscape view comparing the three key dimensions I care about, and to fill in the missing evals. AA only lists scores for a few increasingly dated and problematic eval benchmarks. This isn't just my opinion: none of their listed metrics are in HuggingFace Leaderboard 2 (June 2024).

That said I love the AA Index score because it provides a single normalized score that blends vibe-check qual (chatbot elo) with widely reported quant (MMLU, MT Bench). I wish it composed more contemporary evals, but don't have the rigor/attention to make my own score and am not aware of a better substitute.

New comment by harlanlewis in "Large Enough"

harlanlewis — Wed, 24 Jul 2024 18:00:59 +0000

To help keep track of the race, I put together a simple dashboard to visualize model/provider leaders in capability, throughput, and cost. Hope someone finds it useful!

Google Sheet: https://docs.google.com/spreadsheets/d/1foc98Jtbi0-GUsNySddv...

New comment by harlanlewis in "Venezuela is first Andean country to lose all of its glaciers"

harlanlewis — Mon, 01 Jul 2024 23:41:16 +0000

Apologies for misreading your intent, I hear you and am in full agreement.