Hacker News: bitexploder

New comment by bitexploder in "Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers"

bitexploder — Tue, 16 Jun 2026 20:47:49 +0000

I call it "Manhattan Projecting" them. The amusing thing is I had Fable review my harness (which I have been building for some time) and it helped improve it. It is just kind of funny that it enthusiastically helped build a harness whose sole purpose was to divide agents up and compartmentalize security sensitive vulnerability research.

New comment by bitexploder in "Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers"

bitexploder — Tue, 16 Jun 2026 13:06:58 +0000

I also have a 100% success rate jailbreaking them by breaking the work down into small pieces and stripping all security-related language. Smaller tasks, test engineering and normal programming language. Fable found a few bugs in my harness for me before they pulled it. I was testing it vs ChatGPT, Gemini, and Opus. It was doing well at bug hunting.

New comment by bitexploder in "Statement on US government directive to suspend access to Fable 5 and Mythos 5"

bitexploder — Sat, 13 Jun 2026 02:37:53 +0000

What if I told you there are no safety guardrails? I used GLM 5.1 and had Fable literally build a harness to avoid triggering guardrails. I built skills carefully and had Fable doing vuln research and exploit repro in a few hours. I called the project manhattan. The GLM models are down for almost anything so I named it Oppenheimer. It orchestrated the Fable CLI agents via tmux. This whole Fable/Mythos thing is such a fucking joke. It is all PR and theatre and they know it.

New comment by bitexploder in "I built a vulnerable app and spent $1,500 seeing if LLMs could hack it"

bitexploder — Thu, 04 Jun 2026 22:36:57 +0000

But a good harness lowers the model floor and accessibility and makes stronger models that much better.

New comment by bitexploder in "I built a vulnerable app and spent $1,500 seeing if LLMs could hack it"

bitexploder — Thu, 04 Jun 2026 14:58:39 +0000

I don't even care. It is the same problem Advent of Code had as a public challenge with a leaderboard. I now mostly just think either embrace the LLM or keep it to a more in-person or vetted audience. But, again, if you create a competition in the spirit of humans without LLMs and that is in the rules and someone uses an LLM, that is on them IMO. I am sad Advent of Code decided to end their competition. LLMs are here to stay, let's embrace that and see what the new universe of competitions with LLMs can be. There will always be a place for human-only competition, but for public-facing ones LLM accepted is the only tenable position.

This does bring "Pay to compete" concerns and create incentive structures that encourage more LLM use. I don't know what to do about it.

New comment by bitexploder in "I built a vulnerable app and spent $1,500 seeing if LLMs could hack it"

bitexploder — Thu, 04 Jun 2026 14:36:29 +0000

Anthropic has a vested interest in downplaying the harness relevance. In my experience harness really matters. More capable models are great, but current models are enough if you put some engineering effort into the harness.

New comment by bitexploder in "I built a vulnerable app and spent $1,500 seeing if LLMs could hack it"

bitexploder — Thu, 04 Jun 2026 14:34:15 +0000

You have to do what I call "Manhattan Project" them. You can almost always evade the controls by carefully prompting them. It just wastes effort and time you should be spending doing other things in an LLM workflow. Essentially, there is almost no single discrete piece of a reverse engineering or CTF process that you can't get Claude to do. You just have to isolate it adequately and avoid letting it use names that steer it towards "this is an exploit" or "this is reverse engineering". I have not found a task I could not convince Claude to do. You can also fill the context window up with badgering it and eventually it is likely to simply let you through if you are careful. Most of the safeguards are not deterministic.

New comment by bitexploder in "I built a vulnerable app and spent $1,500 seeing if LLMs could hack it"

bitexploder — Thu, 04 Jun 2026 14:32:02 +0000

Anthropic made their models very averse to reverse engineering and vulnerability research chores. It is a difficult problem, but attackers will use models like GLM and defenders will be stuck with security engineering averse models.

New comment by bitexploder in "Gemma 4 12B: A unified, encoder-free multimodal model"

bitexploder — Wed, 03 Jun 2026 22:26:26 +0000

Google is also a cloud provider. Cloud is now ~18% of Google. While it is an advertising juggernaut, Cloud is also rapidly growing, so the local models simply fit as AI research and dev and getting more people on Gemini models. They /are/ advertising, effectively :)

New comment by bitexploder in "Gemma 4 12B: A unified, encoder-free multimodal model"

bitexploder — Wed, 03 Jun 2026 22:23:51 +0000

Don't LLMs work on attention though? The closer in their hyperdimensional space you can land your problem to their inherent understanding, the better they are at understanding your problem domain. RAG loops can be very slow and agents may simply lack the knowledge to use them correctly.

New comment by bitexploder in "MAI-Code-1-Flash"

bitexploder — Wed, 03 Jun 2026 06:44:02 +0000

3 Flash is likely rather underrated here. It continues to impress me on few-shot tasks.

New comment by bitexploder in "macOS needs its grid back"

bitexploder — Tue, 02 Jun 2026 17:24:11 +0000

On the other hand, just working at big tech doesn't mean you are especially great. Conformance and criteria other than raw skill matter. As you say, promotion games, etc... I would just lump all of that under conformance. So, you aren't wrong.

However, why startups outperform big companies isn't just the skill gap. Even if you have the most amazing leadership in big tech it is monumentally difficult to move the needle on some problems purely because of size not because of incompetence. All I am saying is don't overindex on perceived intelligence. A big org can start looking pretty dumb even though it is still far right of the bell curve compared to even a startup (hypothetically). Org size and the constraints that brings are a significant factor.

New comment by bitexploder in "macOS needs its grid back"

bitexploder — Tue, 02 Jun 2026 15:21:16 +0000

This assumes that every designer is on the bell curve at the big tech firms in the roles that can influence this. I am not defending modern UI/UX, but that is quite an assumption.

New comment by bitexploder in "Adafruit receives demand letter from Fenwick legal counsel on behalf of Flux.ai"

bitexploder — Tue, 02 Jun 2026 13:37:48 +0000

Adafruit sure has a lot of stories they are eager to tell lately.

New comment by bitexploder in "SQLite is all you need for durable workflows"

bitexploder — Tue, 02 Jun 2026 13:25:10 +0000

Have not. For my workflows this was fine. Good to keep in mind though. I don’t plan to manage a truly distributed system with it. Plus my only reason to do so is professional, and we rolled our own system here; at our size, solutions like DBOS or Temporal would not work well.

New comment by bitexploder in "Vibe Coding Is Not Engineering"

bitexploder — Sat, 30 May 2026 17:35:58 +0000

LLMs are not deterministic. They do typically behave within pretty reasonable boundaries. Humans are not deterministic. They also typically behave within pretty reasonable boundaries. Engineering with LLMs and humans means understanding those boundaries and designing for them. This is a legitimate engineering problem like any other. I think the main misalignment I see is the expected productivity gain. When you are using real engineering discipline it is still very productive to use AI for coding, but not nearly so productive as many people claim once you factor in the fragility of their systems.

There is no correct use. There is no “correct” way to build systems. There is principled and disciplined use.

New comment by bitexploder in "SQLite is all you need for durable workflows"

bitexploder — Sat, 30 May 2026 16:00:41 +0000

That is the real truth people are voicing when they say Temporal is heavy. They are really saying: Durable, reliable, distributed workloads are hard and it takes effort to manage! And that is true. I know of no systems that make that genuinely easy. It is a hard discipline. Maybe Temporal makes that harder than it should be, but I have no experience there.

There are no free lunches in this space. I have no idea how good or bad Temporal is at scale since my usage is pretty small and isolated, but software rarely just works and impresses me, and Temporal genuinely did for orchestration on my local machine. I think Netflix's Conductor is another cool option, but I ended up with Temporal because of the license.

New comment by bitexploder in "SQLite is all you need for durable workflows"

bitexploder — Sat, 30 May 2026 05:02:22 +0000

Autonomous C to Rust. Automated penetration testing and vuln validation.

New comment by bitexploder in "SQLite is all you need for durable workflows"

bitexploder — Sat, 30 May 2026 05:01:50 +0000

Yep. Individual systems with yolo agents doing stuff in isolation. I could see how it can get complex. Most distributed systems are. No free lunches I guess. I am not sure what the alternatives are at scale.

New comment by bitexploder in "SQLite is all you need for durable workflows"

bitexploder — Fri, 29 May 2026 22:41:47 +0000

Well, just my experience. I installed it, had my agents configure it and it immediately solved problems I had with very little friction. I am dealing with long-running, long-horizon agentic tasks that need very high reliability so I don’t have to babysit. I vibed the first version, realized I was reinventing reliable distributed systems. Stopped vibing and started surveying for something that fit :)
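
For readers wondering what "reinventing reliable distributed systems" looks like at its smallest, here is a rough sketch of the core idea behind durable steps: record each step's result so a restarted workflow skips work it already finished. This is not how Temporal or DBOS work internally, and it is not the harness described above. The table layout and the names `open_store` and `run_step` are made up for illustration, and results are assumed to be JSON-serializable.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the checkpoint store. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        "workflow_id TEXT, step TEXT, result TEXT, "
        "PRIMARY KEY (workflow_id, step))"
    )
    return conn


def run_step(conn, workflow_id, step, fn):
    """Run fn once per (workflow_id, step); replay the stored result afterwards."""
    row = conn.execute(
        "SELECT result FROM steps WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        # Step already completed in an earlier run: return the saved result.
        return json.loads(row[0])

    result = fn()
    # Commit the result so a crash after this point does not redo the step.
    with conn:
        conn.execute(
            "INSERT INTO steps (workflow_id, step, result) VALUES (?, ?, ?)",
            (workflow_id, step, json.dumps(result)),
        )
    return result
```

A crash between `fn()` returning and the commit still reruns the step, so steps need to be idempotent. Retries, timeouts, and multiple workers are where the real complexity starts, which is the point at which an existing system becomes attractive.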