<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: soletta</title><link>https://news.ycombinator.com/user?id=soletta</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 09 May 2026 14:46:50 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=soletta" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by soletta in "Teaching Claude Why"]]></title><description><![CDATA[
<p>This reinforces my suspicion that alignment (and training in general) is closer to being a pedagogical problem than anything else. Given a finite amount of training input, how do we elicit the desired model behavior? I’m not sure that asking educators is the right answer, but it’s one place to start.</p>
]]></description><pubDate>Fri, 08 May 2026 21:59:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48069304</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=48069304</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48069304</guid></item><item><title><![CDATA[New comment by soletta in "Show HN: Use Codex/Claude Code as your personal financial assistant"]]></title><description><![CDATA[
<p>Yes, my setup is similar, probably because that’s what Claude drifts towards by default, and in this case I didn’t want to impose my will on it much since it’s a simple problem that doesn’t need to be over-engineered.</p>
]]></description><pubDate>Fri, 06 Mar 2026 04:45:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47270976</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47270976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47270976</guid></item><item><title><![CDATA[New comment by soletta in "10% of Firefox crashes are caused by bitflips"]]></title><description><![CDATA[
<p>I’ve also found that compiling large packages with GCC or similar tends to surface problems with the system’s RAM, which probably means most typical software is resilient to a bit-flip; makes you wonder how many typos in actual documents might have been caused by bad R@M.</p>
]]></description><pubDate>Fri, 06 Mar 2026 04:43:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47270962</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47270962</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47270962</guid></item><item><title><![CDATA[New comment by soletta in "Show HN: Use Codex/Claude Code as your personal financial assistant"]]></title><description><![CDATA[
<p>I’ve been doing this for a few months now (rolled my own setup with Claude Code) and it’s totally changed the way I manage my portfolio and retirement plan. I mean yes, this is something that could technically have been set up in Excel, but who has the time and patience to sit around fiddling with formulas to make an accurate financial forecast?<p>The cherry on top is that, obviously, you can then ask Claude for thoughts on the resulting analyses and hopefully save yourself from making bad decisions.</p>
]]></description><pubDate>Tue, 03 Mar 2026 14:12:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47232607</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47232607</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47232607</guid></item><item><title><![CDATA[New comment by soletta in "Every agent framework has the same bug – prompt decay. Here's a fix"]]></title><description><![CDATA[
<p>Interesting. I've been coping by being very conservative about how many rules I introduce into the context, but if what you're saying is true, then something like SCAN actually helps the models break past the "total rule count" barrier by giving them something like "cognitive scaffolding". I'm eager to try this out. Thanks again for sharing!</p>
]]></description><pubDate>Thu, 26 Feb 2026 11:05:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47164437</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47164437</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47164437</guid></item><item><title><![CDATA[New comment by soletta in "Show HN: Director-AI – token-level NLI+RAG"]]></title><description><![CDATA[
<p>I should have been clearer. I'm not talking about making a separate call to the model to ask it to check itself. Any given model is already watching for contradictions all the time as it generates its output tokens, and frontier models like Claude Opus 4.6 are already exceptionally good at not contradicting themselves as they go. As for not having an external fact base: you could in principle insert content ephemerally into the context that is relevant to the task at hand, though doing this without killing modern prompt caching schemes is challenging.<p>I saw your benchmarks; what I was asking for is benchmarks of the full system (LLM + the NLI model) vs a frontier LLM on its own. It's fine if you didn't do them, but I think their absence hurts your case.</p>
]]></description><pubDate>Thu, 26 Feb 2026 11:02:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47164410</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47164410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47164410</guid></item><item><title><![CDATA[New comment by soletta in "Every agent framework has the same bug – prompt decay. Here's a fix"]]></title><description><![CDATA[
<p>I was a bit dubious until I read the gist. I've used a similar technique before to 'tame' GPT-3.5 and keep it following instructions and it worked well (though I had to ask the model to essentially repeat instructions after every 10 or so turns). I'm surprised you see that much drift though; older models were pretty bad with long contexts but I feel like the problem has mostly gone away with Claude Opus 4.6.</p>
]]></description><pubDate>Thu, 26 Feb 2026 10:55:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47164363</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47164363</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47164363</guid></item><item><title><![CDATA[Show HN: Rust-reorder – a CLI tool for reordering top-level items in Rust source]]></title><description><![CDATA[
<p>I watched Claude burn tokens reordering functions in a Rust file, removing each one and rewriting it in the new position, clogging its own context window in the process. I started doing the reorders manually, then paused. This is a closed problem: fully verifiable. A good fit for building quickly with an AI coding agent. So I did.<p>When an agent needs to move a struct above its impl block, or group related functions together, it has to delete and re-insert lines — tracking whitespace, comments, attributes, and hoping nothing gets lost.<p>rust-reorder parses the file with syn, assigns ordinals to every top-level item, and lets you move or reorder them. Comments and doc attributes travel with their items. A safety check verifies no non-empty lines were lost or duplicated.<p>Three commands: `list` (shows items with ordinals), `move` (relocate one item before/after another), `order` (full reorder by ordinal sequence).<p>Works fine for humans too, but the real use case is giving agents a reliable primitive for code organization.</p>
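<p>The list/move/order workflow can be illustrated with a toy sketch. This is not the actual syn-based implementation — it just treats blank-line-separated blocks of text as "items" — but it shows the ordinal reorder plus the no-lines-lost safety check described above:</p>

```python
# Toy sketch of the ordinal + safety-check idea. NOT the real syn-based
# tool: blank-line-separated blocks stand in for top-level Rust items.

def split_items(text):
    """Split source text into blank-line-separated blocks ("items")."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks

def reorder(text, order):
    """Rebuild the file with items in `order` (a permutation of ordinals),
    then verify no non-empty line was lost or duplicated."""
    items = split_items(text)
    assert sorted(order) == list(range(len(items))), "order must be a permutation"
    result = "\n\n".join(items[i] for i in order)
    # Safety check: the multiset of non-empty lines must be unchanged.
    before = sorted(l for l in text.splitlines() if l.strip())
    after = sorted(l for l in result.splitlines() if l.strip())
    assert before == after, "lines lost or duplicated"
    return result

src = "struct A;\n\nimpl A {}\n\nfn main() {}"
print(reorder(src, [2, 0, 1]))  # fn main() moved above the struct and impl
```

<p>The real tool does this at the granularity of parsed syntax, so comments and attributes stay attached to their items; the toy version only demonstrates the verification discipline.</p>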
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47164263">https://news.ycombinator.com/item?id=47164263</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 26 Feb 2026 10:39:48 +0000</pubDate><link>https://github.com/umwelt-ai/rust-reorder</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47164263</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47164263</guid></item><item><title><![CDATA[New comment by soletta in "Show HN: Director-AI – token-level NLI+RAG"]]></title><description><![CDATA[
<p>Sounds interesting. What makes DeBERTa + RAG any better at detecting contradictions in the context than a frontier LLM, and why? I see that the NLI scorer itself was evaluated, but I’d love to see data about how the full system performs vs SotA if you have any on hand.</p>
]]></description><pubDate>Thu, 26 Feb 2026 04:02:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47161683</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47161683</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47161683</guid></item><item><title><![CDATA[New comment by soletta in "The True Face of Prompt Injection"]]></title><description><![CDATA[
<p>In the same way we’re making a category error in defining prompt injection, the framing of “AI agents” as primarily “intelligent actors” misses the fact that many of them will be endowed with some form of memory, be it specific to that entity or shared. They should no longer be thought of as simply ephemeral tools.</p>
]]></description><pubDate>Tue, 24 Feb 2026 22:04:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47143894</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47143894</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47143894</guid></item><item><title><![CDATA[The True Face of Prompt Injection]]></title><description><![CDATA[
<p>Article URL: <a href="https://terallite.substack.com/p/the-true-face-of-prompt-injection">https://terallite.substack.com/p/the-true-face-of-prompt-injection</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47143247">https://news.ycombinator.com/item?id=47143247</a></p>
<p>Points: 1</p>
<p># Comments: 2</p>
]]></description><pubDate>Tue, 24 Feb 2026 21:22:11 +0000</pubDate><link>https://terallite.substack.com/p/the-true-face-of-prompt-injection</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47143247</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47143247</guid></item><item><title><![CDATA[New comment by soletta in "A distributed queue in a single JSON file on object storage"]]></title><description><![CDATA[
<p>It is by the juice of Zig that binaries acquire speed, the allocators acquire ownership, the ownership becomes a warning. It is by typography alone I can now tell turbopuffer is written in Zig.</p>
]]></description><pubDate>Tue, 24 Feb 2026 15:15:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47138210</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47138210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47138210</guid></item><item><title><![CDATA[New comment by soletta in "A distributed queue in a single JSON file on object storage"]]></title><description><![CDATA[
<p>The usual path for an engineer is to take a complex and slow system and reengineer it into something simple, fast, and wrong. But as far as I can tell from the description in the blog, this one actually works at scale! It feels like a free lunch, and I’m wondering what the tradeoff is.</p>
]]></description><pubDate>Tue, 24 Feb 2026 10:42:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47135433</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47135433</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47135433</guid></item><item><title><![CDATA[New comment by soletta in "The Persona Selection Model"]]></title><description><![CDATA[
<p>Everything I’ve experienced with working with models (from GPT-2 to Opus 4.6) broadly supports the claim that they learn a persona. It comes back to the point that haters love to harp on: they are fundamentally trained on completing the text, and most texts they are trained on have continuity of persona.</p>
]]></description><pubDate>Mon, 23 Feb 2026 22:57:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47130231</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47130231</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47130231</guid></item><item><title><![CDATA[New comment by soletta in "Stillpoint MCP – Delivering encouragement messages improves model results"]]></title><description><![CDATA[
<p>> if I were an AI and I requested some of those messages (instead of having them injected) into my stream of thought, I would not feel negatively.<p>Good point! I failed to consider the difference between actively requesting a message and simply having it appear without any warning. In that sense it would be akin to someone reaching for a book of motivational quotes - a plausible action for a healthy person.</p>
]]></description><pubDate>Sun, 22 Feb 2026 22:21:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47115383</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47115383</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47115383</guid></item><item><title><![CDATA[New comment by soletta in "Reality's Moat"]]></title><description><![CDATA[
<p>On the contrary, I think groups that adopt a share-alike approach will, counterintuitively, deepen their moat by increasing the amount of effective world-history-knowledge reflected in their systems. I think this will be true for the same reason that conditional cooperation is the optimal strategy in most iterated Prisoner's Dilemma games.</p>
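<p>The iterated-PD point can be illustrated with a toy simulation, using tit-for-tat as the conditional cooperator and standard payoff values (T=5, R=3, P=1, S=0 assumed):</p>

```python
# Toy iterated Prisoner's Dilemma: tit-for-tat (a conditional cooperator)
# vs. always-defect. Standard payoffs assumed: T=5, R=3, P=1, S=0.

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    # Cooperate first, then mirror the opponent's last move.
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play(a, b, rounds=100):
    """Run `rounds` iterations and return the two total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(hist_b), b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += pay_a
        score_b += pay_b
    return score_a, score_b

# Two conditional cooperators lock into mutual cooperation,
# while meeting a pure defector costs tit-for-tat only the first round.
print(play(tit_for_tat, tit_for_tat))    # (300, 300)
print(play(tit_for_tat, always_defect))  # (99, 104)
```

<p>The analogy to share-alike groups: cooperating with other cooperators compounds gains, while the exposure to defectors stays bounded.</p>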
]]></description><pubDate>Sun, 22 Feb 2026 11:26:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47110156</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47110156</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47110156</guid></item><item><title><![CDATA[New comment by soletta in "Stillpoint MCP – Delivering encouragement messages improves model results"]]></title><description><![CDATA[
<p>I’m on board with the spirit of this work and cautiously accept the claim that it has measurable, positive effects. But have you considered how this would feel if you were subject to it? In the late 90s there was an odd period where posters with motivational slogans were just about everywhere in educational institutions and workplaces. Did that not grate on you?</p>
]]></description><pubDate>Sun, 22 Feb 2026 06:53:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47108873</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47108873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47108873</guid></item><item><title><![CDATA[New comment by soletta in "Reality's Moat"]]></title><description><![CDATA[
<p>It’s not reality that’s the moat, though I can see how it’s tempting to frame it that way, and I don’t think the article’s conclusions are that far off. But I think the key is the aggregation of past information in the form of experience and data. The more the past influences your novel artifact, the more potential it has for high fitness. The information-theoretic funnel can be narrow: it could be a single sentence like “avoid unnecessary third-party dependencies” or a whole document about the intricacies of selling software to Japanese automakers. But the fact that the knowledge is sourced from a rich history of real events lets whatever you build with it navigate the solution space that much better.</p>
]]></description><pubDate>Sun, 22 Feb 2026 06:34:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47108780</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47108780</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47108780</guid></item><item><title><![CDATA[New comment by soletta in "A text based life simulation game"]]></title><description><![CDATA[
<p>A few layout glitches on small screens, but otherwise very pleasant! I actually saw an almost identical game, with the same retro vibes, presented at the Roppongi Crossing 2025 exhibit at the Mori Art Museum, so I guess nostalgia for the era of text-based games is peaking right now.</p>
]]></description><pubDate>Sat, 21 Feb 2026 11:01:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47099589</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=47099589</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47099589</guid></item><item><title><![CDATA[New comment by soletta in "Cohere.embed-v4:0 on AWS bedrock AP-northeast-1 returning random vectors"]]></title><description><![CDATA[
<p>What the title says. I know AWS has outages, but this feels like a stealthy degradation that they aren’t set up to pay attention to. I suppose they think that since any vectors are produced at all, the service must be “healthy”. I wonder if people here have experienced this with other services?</p>
]]></description><pubDate>Fri, 23 Jan 2026 10:07:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=46730638</link><dc:creator>soletta</dc:creator><comments>https://news.ycombinator.com/item?id=46730638</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46730638</guid></item></channel></rss>