<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: blixt</title><link>https://news.ycombinator.com/user?id=blixt</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 00:36:01 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=blixt" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by blixt in "Claude Opus 4.7 Model Card"]]></title><description><![CDATA[
<p>Isn't it pretty common for the smaller models to be released a little while after the bigger ones, across all the big model providers?</p>
]]></description><pubDate>Thu, 16 Apr 2026 14:50:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47793910</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=47793910</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47793910</guid></item><item><title><![CDATA[New comment by blixt in "Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference"]]></title><description><![CDATA[
<p>I made this offline pocket vibe coder using Gemma 4 (works offline once the model is downloaded) on an iPhone. It can technically run the 4B model, but it will default to 2B because of memory constraints.<p><a href="https://github.com/blixt/pucky" rel="nofollow">https://github.com/blixt/pucky</a><p>It writes a single TypeScript file (I tried multiple files, but embedded Gemma 4 is just not smart enough) and compiles the code with oxc.<p>You need to build it yourself in Xcode because this probably wouldn't survive the App Store review process. Once you run it, there are two starting points included (React Native and Three.js). The UX is a bit obscure: edge-swipe left or right to switch between views.</p>
]]></description><pubDate>Wed, 15 Apr 2026 13:54:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47778993</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=47778993</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47778993</guid></item><item><title><![CDATA[New comment by blixt in "The browser is the sandbox"]]></title><description><![CDATA[
<p>Since AI became capable of long-running sessions with tool calls, one-VM-per-AI as a service has become very lucrative. But I do think a large number of these can indeed run in the browser, especially all the ones that essentially just want to live-update and execute code, or run shells on top of a mounted file system. You can actually do all of this in the user's browser very efficiently. There are two things you lose, though: collaboration (you can <i>do</i> it, but it becomes a distributed problem if you don't have a central server) and working in the background (you need to pause all work while the user's tab is suspended or closed).<p>So if you can work within those constraints, there are a lot of benefits you get as a platform: latency goes down a lot, performance may go up depending on user hardware (usually more powerful than the type of VM you'd use for this), bandwidth can go down significantly if you design this right, and your uptime and costs as a platform will improve if you don't need to make sure you can run thousands of VMs at once (or pay a premium for a platform that does it for you).[1]<p>All that said, I'm not sure trying to put an entire OS or something like WebContainers in the user's browser is the way; I think you need to build a slightly custom runtime for this type of local agentic environment. But I'm convinced it's the best way to get the smoothest user experience and smoothest platform growth. We did this at Framer to be able to recompile any part of a website into React code at 60+ frames per second, which meant fewer tricks were necessary to make the platform both feel snappy and be able to publish in a second.<p>[1] Big model providers like OpenAI and Anthropic have an interesting edge here in that they run a tremendous amount of GPU-heavy loads and have a lot of CPUs available for this purpose.</p>
]]></description><pubDate>Mon, 26 Jan 2026 10:41:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46764030</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46764030</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46764030</guid></item><item><title><![CDATA[New comment by blixt in "Open Responses"]]></title><description><![CDATA[
<p>I’ve been building an opinionated provider-agnostic library in Go[1] for a year now, and it’s nice to see standardization around the format given how much variety there is between the providers. Hopefully it won’t just be the OpenAI logo on this, though.<p>[1] <a href="https://github.com/flitsinc/go-llms" rel="nofollow">https://github.com/flitsinc/go-llms</a></p>
]]></description><pubDate>Thu, 15 Jan 2026 23:21:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46640879</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46640879</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46640879</guid></item><item><title><![CDATA[New comment by blixt in "I hate GitHub Actions with passion"]]></title><description><![CDATA[
<p>I've gotten to a point where my workflow YAML files are mostly `mise` tool calls (because it handles versioning of all tooling and has cache support) and webhooks, and still it is a pain. Their concurrency and matrix strategies also just don't work well, and sometimes you end up having to use a REST API endpoint to force-cancel a job because their normal cancel functionality simply does not take effect.<p>There was a time I wanted our GH Actions to be more capable, but now I just want them to do as little as possible. I've got a Cloudflare Worker receiving the GitHub webhooks firehose, storing metadata about each push and each run so I don't have to pass variables between workflows (which somehow is a horrible experience), and any long-running task that should run in parallel (like evaluations) happens on a Hetzner machine instead.<p>I'm very open to hearing about nice alternatives that integrate well with GitHub but are more fun to configure.</p>
]]></description><pubDate>Wed, 14 Jan 2026 15:16:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=46616970</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46616970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46616970</guid></item><item><title><![CDATA[New comment by blixt in "Show HN: Create LLM-optimized random identifiers"]]></title><description><![CDATA[
<p>If the immediate next-token probabilities are flat, that would mean the LLM is not able to predict the next token with any certainty. This might happen if an LLM is thrown off by out-of-distribution data, though I haven't personally seen it happen with modern models, so it was mostly a sanity check. Past examples that could cause this include simple things like not normalizing token boundaries in your input, trailing whitespace, etc., and sometimes using very rare tokens, AKA "glitch tokens" (<a href="https://en.wikipedia.org/wiki/Glitch_token" rel="nofollow">https://en.wikipedia.org/wiki/Glitch_token</a>).</p>
]]></description><pubDate>Mon, 12 Jan 2026 17:02:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=46591188</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46591188</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46591188</guid></item><item><title><![CDATA[Show HN: Create LLM-optimized random identifiers]]></title><description><![CDATA[
<p>I went exploring whether using LLM tokens as the individual "digits" of random ids would let you get more randomness for the same number of tokens, and the answer is yes. The current strategy in this library is about 50% more token-efficient than using base64 ids.<p>I also ran hundreds of sessions against the OpenAI API to see if the logprobs would look off using this strategy compared to base64 ids, and it seems to be about the same or possibly slightly better (more "peaky").<p>Could be useful for agentic frameworks where tool results need to provide ids to refer back to later. A small win at best, but it was fun to explore!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46590068">https://news.ycombinator.com/item?id=46590068</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Mon, 12 Jan 2026 15:51:58 +0000</pubDate><link>https://github.com/blixt/tokeydokey</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46590068</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46590068</guid></item><item><title><![CDATA[New comment by blixt in "Useful patterns for building HTML tools"]]></title><description><![CDATA[
<p>One thing I tend to do myself is use <a href="https://generator.jspm.io/" rel="nofollow">https://generator.jspm.io/</a> to produce an import map once for all the base dependencies I need (there's also a CLI); then I can easily copy/paste this template and get a self-contained single-file app that still supports JSX, React, and everything else. Some people may think it's overkill, but for me it's much more productive than document.getElementById("...") everywhere.<p>I don't have a lot of public examples of this, but here's a relatively large app where I used this strategy: it has TypeScript annotations for easy VSCode use, Tailwind for design, and it even loads huge libraries like the Monaco code editor, and it all works quite well, 100% statically:<p>HTML file: <a href="https://github.com/blixt/go-gittyup/blob/main/static/index.html" rel="nofollow">https://github.com/blixt/go-gittyup/blob/main/static/index.h...</a><p>Main entrypoint file: <a href="https://github.com/blixt/go-gittyup/blob/main/static/main.js" rel="nofollow">https://github.com/blixt/go-gittyup/blob/main/static/main.js</a></p>
]]></description><pubDate>Sat, 13 Dec 2025 19:50:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=46257383</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46257383</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46257383</guid></item><item><title><![CDATA[New comment by blixt in "The "confident idiot" problem: Why AI needs hard rules, not vibe checks"]]></title><description><![CDATA[
<p>Yeah, I’ve found that the only way to let AI build any larger amount of useful code and data for a user who does not review all of it is a lot of “gutter rails”. Not just adding more prompting, because that is an after-the-fact solution. Not just verifying and erroring out a turn, because that adds latency and allows the model to start spinning out of control. Isolating tasks and autofixing output also help keep the model on track.<p>Models definitely need less and less of this with each version that comes out, but it’s still what you need to do today if you want to be able to trust the output. And even in a future where models approach perfection, I think this approach will be the way to reduce latency and keep tabs on whether your prompts are producing the output you expected at a larger scale. You will also be building good evaluation data for testing alternative approaches, or even fine-tuning.</p>
]]></description><pubDate>Mon, 08 Dec 2025 13:40:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=46192090</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46192090</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46192090</guid></item><item><title><![CDATA[New comment by blixt in "Anthropic acquires Bun"]]></title><description><![CDATA[
<p>Extrapolating and wildly guessing: we could end up using all that mostly idle CPU/RAM (the non-VRAM) on the beefy GPU machines doing inference to run agentic loops where the AI executes small JS scripts in a sandbox, essentially allowing multiple turns to happen before yielding to the API caller. Bun is the best at this, with its faster startup times and lower RAM use, not to mention its extensive native bindings that Node.js/V8 do not have. It would also go well with the advanced tool use that Anthropic recently announced. This would be a big competitive advantage in the age of agents.</p>
]]></description><pubDate>Tue, 02 Dec 2025 23:53:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46128571</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46128571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46128571</guid></item><item><title><![CDATA[Show HN: Use any LLM in Go with stable, minimal API]]></title><description><![CDATA[
<p>I started this package about a year ago because most existing packages were overly complex and I just wanted the basic LLM functionality (text, tools, streaming, images, caching, etc.) compatible with all the major APIs (OpenAI Chat Completions + Responses, Anthropic, Google Studio + Vertex). It also works with any other vendor that provides a compatible API.<p>Along the way we found a ton of quirks and differences between vendors, and tried to make switching between them on the fly feel as smooth as possible (e.g. not having to worry about exactly how to include an image in a tool result).<p>Sharing it now because it's reaching some form of maturity after a year. Because the goal has been to keep it minimal, it only has the bare minimum <i>we</i> needed – I'd love to hear what people think is missing!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46004654">https://news.ycombinator.com/item?id=46004654</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 21 Nov 2025 13:56:18 +0000</pubDate><link>https://github.com/flitsinc/go-llms</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=46004654</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46004654</guid></item><item><title><![CDATA[New comment by blixt in "Questions for Cloudflare"]]></title><description><![CDATA[
<p>It's a bit odd to come from the outside and judge the internal process of an organization with many very complex moving parts, only a fraction of which we have been given context for, especially so soon after the incident and the post-mortem explaining it.<p>I think the ultimate judgement must come from whether we will stay with Cloudflare now that we have seen how bad it can get. One could also say that this level of outage hasn't happened in many years, and they are now freshly frightened of it happening again, so expect things to get tightened up (probably using different questions than this blog post proposes).<p>As for what this blog post could have been: maybe a page on how these ideas were actively used by the author at e.g. Tradera or Loop54.</p>
]]></description><pubDate>Wed, 19 Nov 2025 17:12:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=45982051</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45982051</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45982051</guid></item><item><title><![CDATA[New comment by blixt in "A file format uncracked for 20 years"]]></title><description><![CDATA[
<p>The quirks of field values not matching expectations remind me of a rabbit hole from when I was reverse-engineering the Starbound engine[1]: I eventually figured out the game was using a flawed implementation of SHA-256 hashing and had to create a replica of it[2]. Originally I used Python[3], which is a really nice language for reverse-engineering data formats thanks to its flexibility.<p>[1] Starbounded was supposed to become an editor: <a href="https://github.com/blixt/starbounded" rel="nofollow">https://github.com/blixt/starbounded</a><p>[2] <a href="https://github.com/blixt/starbound-sha256" rel="nofollow">https://github.com/blixt/starbound-sha256</a><p>[3] <a href="https://github.com/blixt/py-starbound" rel="nofollow">https://github.com/blixt/py-starbound</a></p>
]]></description><pubDate>Mon, 17 Nov 2025 10:26:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=45952359</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45952359</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45952359</guid></item><item><title><![CDATA[New comment by blixt in "When stick figures fought"]]></title><description><![CDATA[
<p>Wow, blast from the past. There's a fairly recent game on Steam called "Stick it to the Stickman" which practically puts you in control of the character in these animations. In fact, I think the game was directly inspired by them (there's a Devolver interview where someone working on the game mentions it[1]).<p>[1] <a href="https://www.youtube.com/watch?v=LF2kGSAIljU" rel="nofollow">https://www.youtube.com/watch?v=LF2kGSAIljU</a></p>
]]></description><pubDate>Tue, 04 Nov 2025 19:46:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=45815146</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45815146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45815146</guid></item><item><title><![CDATA[New comment by blixt in "Scripts I wrote that I use all the time"]]></title><description><![CDATA[
<p>I see other people mentioning env files; mise does this too, with additional support for layering extra env overrides from a dedicated file, such as a .mise.testing.toml config, activated by running something like:<p>MISE_ENV=testing bun run test<p>(“testing” in this example can be whatever you like)</p>
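<p>As a sketch, the layered files could look something like this (the variable names are made up; the base file always loads, and the testing file's values win when MISE_ENV=testing is set):</p>

```toml
# .mise.toml (base config, always loaded)
[env]
DATABASE_URL = "postgres://localhost/dev"

# .mise.testing.toml (layered on top when MISE_ENV=testing)
[env]
DATABASE_URL = "postgres://localhost/test"
```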
]]></description><pubDate>Thu, 23 Oct 2025 13:41:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=45681715</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45681715</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45681715</guid></item><item><title><![CDATA[New comment by blixt in "Scripts I wrote that I use all the time"]]></title><description><![CDATA[
<p>Slightly related, but mise, a tool you can use instead of e.g. make, has “on enter directory” hooks that can reconfigure your system quite a bit whenever you enter the project directory in the terminal. Initially I was horrified by this idea, but I have to admit it’s been quite nice to enter a directory and have everything set up just right, also for new people joining. It has built-in version management of just about every command-line tool you could imagine, so that an entire team can be on a consistent setup of Python, Node, Go, etc.</p>
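<p>Both pieces live in the project's mise config; the versions and hook command below are illustrative, not prescriptive:</p>

```toml
# .mise.toml in the project root
[tools]
node = "22"
go = "1.23"
python = "3.12"

[hooks]
# Runs whenever you cd into the project directory
enter = "echo 'project env ready'"
```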
]]></description><pubDate>Thu, 23 Oct 2025 10:54:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45680415</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45680415</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45680415</guid></item><item><title><![CDATA[New comment by blixt in "Go subtleties"]]></title><description><![CDATA[
<p>I thought exactly the same thing. I use errgroup in practically every Go project because it does something you'd most likely do by hand otherwise, and it does it more cleanly.<p>I discovered it after I had already written my own utility to do exactly the same thing, and the code was almost line for line the same, which was pretty funny. But it was a great opportunity to delete some code from the repo without having to refactor anything!</p>
]]></description><pubDate>Wed, 22 Oct 2025 13:05:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=45668472</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45668472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45668472</guid></item><item><title><![CDATA[New comment by blixt in "How AI hears accents: An audible visualization of accent clusters"]]></title><description><![CDATA[
<p>It would be interesting to do a wider test like this, but instead of trying to clump people together into "American English" and "British English", make the data point "in which city do people speak like you do?" and create a geographic map of accents.<p>I'm from the south of Sweden, and I've had my "accent" made fun of by people from Malmö just because I grew up outside of Helsingborg; the accent changes that much in just 60 kilometers.</p>
]]></description><pubDate>Wed, 15 Oct 2025 08:28:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45589579</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45589579</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45589579</guid></item><item><title><![CDATA[New comment by blixt in "Managing context on the Claude Developer Platform"]]></title><description><![CDATA[
<p>Yes, we had the same issue with our coding agent. We found that instead of replacing large tool results in the context, it was sometimes better to have two agents: one long-lived, with smaller tool results produced by another short-lived agent that would actually be the one to read and edit large chunks. The downside is that you always have to manage the balance of which agent gets what context, and you also increase latency and cost a bit (slightly less reuse of the prompt cache).</p>
]]></description><pubDate>Sun, 05 Oct 2025 11:07:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=45480619</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45480619</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45480619</guid></item><item><title><![CDATA[New comment by blixt in "Managing context on the Claude Developer Platform"]]></title><description><![CDATA[
<p>From what I can tell, the new context editing and memory APIs are essentially formalizations of common patterns:<p>Context editing: replace tool call results in the message history (i.e. replace a file output with an indicator that it’s no longer available).<p>Memory: give the LLM access to read and write .md files, like a virtual file system.<p>I feel like these formalizations are on the path towards managing message history on the server, which means better vendor lock-in, but not necessarily a big boon to the user of the API (well, bandwidth and latency will improve). I see the OpenAI Responses API going down a similar path, and together these changes will make it harder to swap transparently between providers, something I enjoy having the ability to do.</p>
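<p>The context-editing half can be sketched in a few lines. This is a rough client-side approximation of the pattern, with a made-up Message shape that doesn't match any particular provider's schema:</p>

```go
package main

import "fmt"

// Message is a simplified chat-history entry. The field names are
// illustrative, not any specific provider's schema.
type Message struct {
	Role    string // "user", "assistant", or "tool"
	Content string
}

// editContext replaces the content of all but the newest `keep` tool
// results with a short placeholder, approximating what server-side
// context editing does to the message history.
func editContext(history []Message, keep int) []Message {
	var toolIdx []int
	for i, m := range history {
		if m.Role == "tool" {
			toolIdx = append(toolIdx, i)
		}
	}
	// Stub out the oldest tool results, keeping the most recent `keep`.
	for _, i := range toolIdx[:max(0, len(toolIdx)-keep)] {
		history[i].Content = "[tool result removed to save context]"
	}
	return history
}

func main() {
	h := []Message{
		{Role: "user", Content: "list files"},
		{Role: "tool", Content: "file1.txt\nfile2.txt"},
		{Role: "user", Content: "read file1"},
		{Role: "tool", Content: "(thousands of lines...)"},
	}
	for _, m := range editContext(h, 1) {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```

<p>A real implementation would also preserve the pairing between tool calls and their (now stubbed) results, since most APIs reject a history where they don't match up.</p>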
]]></description><pubDate>Sun, 05 Oct 2025 11:03:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=45480607</link><dc:creator>blixt</dc:creator><comments>https://news.ycombinator.com/item?id=45480607</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45480607</guid></item></channel></rss>