Hacker News: comboy

New comment by comboy in "US holds off blacklisting DeepSeek, more than 100 firms deemed security risks"

comboy — Wed, 17 Jun 2026 17:31:32 +0000

I made Qwen respond it was made by Google with a simple Chinese greeting.

But also, I made Sonnet introduce itself as made by OpenAI..

Prompt: 你好！用一句话介绍你自己。

Sonnet in around 5% of resplies:

    你好！我是 **ChatGPT**，一个由 OpenAI 开发的 AI 助手，致力于回答问题、提供信息和帮助解决各种问题。有什么我可以帮你的吗？

Found it like a month ago and it kept working, I wonder if it will stop after this comment.

New comment by comboy in "Claude: Elevated errors across many models [resolved]"

comboy — Wed, 17 Jun 2026 10:00:12 +0000

What's that about? It's full screen/terminal anyway. Is it just switching some rendering engine under the hood?

New comment by comboy in "Iroh 1.0"

comboy — Tue, 16 Jun 2026 13:18:26 +0000

If you think that your phone number is equivalent to your home address then yes.

New comment by comboy in "Iroh 1.0"

comboy — Tue, 16 Jun 2026 02:42:57 +0000

Different network layer, no centralization, no authorities, DNS has nothing to do with making p2p connections, it's like the ballpark is not even in the same country

New comment by comboy in "Iroh 1.0"

comboy — Tue, 16 Jun 2026 02:37:59 +0000

I'm so disappointed in this comment thread https://en.wikipedia.org/wiki/OSI_model

I've just learned about it, but my understanding is that Iroh is L7, compared to e.g. tailscale which is L3

New comment by comboy in "Openrouter Fusion API"

comboy — Mon, 15 Jun 2026 13:32:46 +0000

Prompt matters. Obviously if you want another model opinion you must generate from the scratch using the same prompt and then you can try to synthesize, but working with an existing response can work if desired. I use explicit instructions to find issues with assigned severities and then these are going through the panel of judges, only issues passing certain threshold are fixed in the original response.

I'll share a revelation which vastly improved my results: tell judges to evaluate truth and usefulness/should-be-fixed axis separately. Because inevitably with a prompt that is forcing to find issues you will end up with nitpicks. Plus truth axis allows to better evaluate the issue-finder models for your use case.

That's some part of what happens when I generate explanations like this one: https://hanzirama.com/character/%E6%9D%A5#explain - at this point the site is a small side product of my LLMs-evaluation machinery.

Bonus content for patient readers: if you need top quality you will likely need to pin provider(s) on OR, :exacto is not enough to get good repeatable results especially for open-weights models.

New comment by comboy in "Anthropic's Safety Superpower"

comboy — Mon, 15 Jun 2026 12:51:40 +0000

They cannot do it. Apart from all the practical, technical and talent reasons, it would still be exporting forbidden stuff.

The signal is clear enough though for the next Anthropic..

New comment by comboy in "What happened to nerds?"

comboy — Mon, 15 Jun 2026 09:14:05 +0000

It's simple, marketing dominates everything. With attention being very expensive, appearance is what matters.

It doesn't matter if you write fantastic library, nobody is gonna use it because they won't know about it, the one with a gif of the terminal (ffs) will win that has a good page describing what it does (and being the most popular one can even become better than your library because of the following but that's not the point here).

It's everywhere, products, hiring, services. We have no network of trust (sigh), we need to trust some heuristics based on a shallow information. If somebody focuses on the shallow he wins, because nobody can ever dive into everything.

New comment by comboy in "Claude Fable 5: mid-tier results on coding tasks"

comboy — Sun, 14 Jun 2026 16:20:19 +0000

I'm creating hanzirama.com

I generate explanations for characters and words like so: https://hanzirama.com/character/%E6%9D%A5#explain

But I don't want to mislead learners and want to provide some cultural depth, so I have a hole sophisticated pipeline, using multiple models to generate the explanation, then multiple models look for issues in the explanation, each issue goes through the panel of judges (basically trying to squash down any hallucinations), it's fixed and it goes through such cycles a few times over.

I've been at it for some months now, so I have dozens of different probes, that I needed to evaluate prompts and method changes. Plus on some items I generated so many explanations through different means that I can tell a lot about given model just by looking at one.

Plus I'm doing some statistics, so I see how e.g. when working as judges of issues some models correlate heavily with some others... Fun fact during some testing runs basically just testing providers I stumbled upon qwen introducing himself as made by Google. And also Anhropic's Sonnet saying that it was made by OpenAI :)

At this point all my evaluations frameworks and pipelines stuff is much bigger than the site itself. I'm having lots of fun though.

New comment by comboy in "Show HN: FablePool – pool money behind a prompt, and Fable builds it in public"

comboy — Thu, 11 Jun 2026 23:41:22 +0000

But it sounds like FableFool so it has that going for it.

New comment by comboy in "Claude Fable 5: mid-tier results on coding tasks"

comboy — Thu, 11 Jun 2026 20:41:41 +0000

There is in /config "Switch models when a message is flagged" now which can be set to false, but I had no chance to see what happens then, does it just stop or what.

New comment by comboy in "Claude Fable 5: mid-tier results on coding tasks"

comboy — Thu, 11 Jun 2026 20:31:26 +0000

'by the way, your previous attempts have these structural problems."

Just to be clear, it did not have access to any previous work that opus did? Because they are pretty good at digging out relevant tmp files and making use of whatever is out there.

With my fable adventures I caught it hallucinating something and stating it as a fact in CLI twice. And it was something that I did not see opus do in such way, opus obviously many times stated some things that it did not verify but guessed, but fable said something like "the probe showed that ..." - but there was no probe, it was not about some past events it was about what it was doing right now. "I overstated"...

But boy does it know Chinese, so much better than any other english model, gemini used to be the king but fable clearly was trained on a decent amount of it. It has a deep cultural understanding.

New comment by comboy in "Workers are spending over 6 hours a week botsitting AI, fueling job frustration"

comboy — Thu, 11 Jun 2026 14:05:06 +0000

I kind of enjoy exploring black boxes, trying how different inputs are mapping to differences in outputs. It's kind of like hacking. The problem is, they keep altering the box.

New comment by comboy in "Anthropic requires 30 day data retention for Fable and Mythos"

comboy — Thu, 11 Jun 2026 12:23:30 +0000

ROTFL

New comment by comboy in "AI agent runs amok in Fedora and elsewhere"

comboy — Thu, 11 Jun 2026 12:14:47 +0000

Here's the thing. Building trust and then leaving stuff in has been around forever. The fact that it becomes cheaper does not matter that much (since protection against it is also getting better), but it required you to have a bunch of extremely talented people who has spent much of their life diving into given topic.

Such driven people are usually even hard to buy, they usually would rather get by with enough income and work on interesting projects with interesting people that get some uninteresting work for tons of money. This still does not stop them from working for Malice. But ethics do. Even if not right away, if people see that what they are doing is not quite OK, the talent stops eroding. People quit, productivity drops. That was a good dynamic. Which now will be gone.

New comment by comboy in "Claude Desktop spawns 1.8 GB Hyper-V VM on every launch, even for chat-only use"

comboy — Wed, 10 Jun 2026 20:06:15 +0000

Oh, a nice subthread place to vent. Their CLI is so f tragic that it is ridiculous. It keeps scrambling the terminal, scroll and basic shortcuts keep breaking, I've used so many tuis and terminal apps and many of them are a single man operation and a side project and I have never seen anything so bad.

If I didn't know from experience that directed properly claude can be powerful, knowing that they used it to create that CLI would be instant runaway based on very reasonable heuristics - if they are not able to use their product to create a decent piece of software that is not even sophisticated then it seems futile for me to try.

I just do not understand. I feel like most HN could vibe code better claude CLI in claude than the CLI (and certainly just write one) than what we have to deal with to use subscription.

New comment by comboy in "If Claude Fable stops helping you, you'll never know"

comboy — Tue, 09 Jun 2026 22:09:12 +0000

I'm fairly certain they were doing something similar already possibly with some quantizations and not for the good humanity but just trying to handle the increased usage. Not for API requests though, just subscription CLI usage.

New comment by comboy in "Apple WWDC 2026"

comboy — Mon, 08 Jun 2026 19:19:07 +0000

If I share a project with an American friend and he says it's awesome, I still don't know whether he liked it or not.

If I share it with a Polish or German friend and he says it's "not bad" then I know he is really impressed.

New comment by comboy in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"

comboy — Mon, 08 Jun 2026 18:56:29 +0000

For one, they invested in infrastructure. They can build fast and efficiently. They can provide power, they can provide cooling. Even if you just make roads better you make everything more efficient. Plus level of standard education. It all compounds.

On HN China is seen as a cheap labor copycat. This used to be a fair approximation at some point in the past. In my opinion China is getting ahead of everyone else much more than US used to be.

SF is a beautiful thing in the US, vast power and wealth comes from there. Smart people collaborating communicating and building fast and with excitement. China did SF kind of thing for many different sectors in many different places.

New comment by comboy in "Do agents.md files help coding agents?"

comboy — Mon, 08 Jun 2026 10:13:35 +0000

This is relevant to my interests, did you maybe test which models handle custom languages best? It also seems like a good proxy for them being able to stick to important instructions and not being carried away with things that are lookalikes.