Hacker News: Hfuffzehn

New comment by Hfuffzehn in "German ruling declares Google liable for false answers in AI Overviews"

Hfuffzehn — Wed, 10 Jun 2026 08:40:07 +0000

Yes, the monopoly is not relevant for the court.

It is relevant for Google though, because they want to transfer it to another product.

And the court is saying that whatever that new product is, Google is not allowed to mislead the public by pretending it is search.

New comment by Hfuffzehn in "German ruling declares Google liable for false answers in AI Overviews"

Hfuffzehn — Wed, 10 Jun 2026 06:53:26 +0000

If I understand it correctly, I like the ruling.

So Google has established a product called Search. Rules have been established for that product. Google has monopolized it.

Now Google is replacing that product with a new product, but they keep calling it the same thing, because they want to keep their monopoly.

That is what has been deemed illegal. Gemini is not illegal. Pretending the worst version of Gemini is Search is illegal, because it breaks the rules established for Search.

But IANAL.

New comment by Hfuffzehn in "MAI-Code-1-Flash"

Hfuffzehn — Wed, 03 Jun 2026 13:01:44 +0000

And as I am on holiday today, I will try to help them out:

                    GPT-5.4 mini  Haiku 4.5     MAI-Code
SWE-Bench Pro       54.4%         35.2%         51.2%
Terminal-Bench 2.0  60.0%         41.6%         54.8%

Source: https://openai.com/index/introducing-gpt-5-4-mini-and-nano/

New comment by Hfuffzehn in "MAI-Code-1-Flash"

Hfuffzehn — Wed, 03 Jun 2026 12:43:04 +0000

So I guess the important link the marketing department forgot is this one: https://docs.github.com/en/copilot/reference/copilot-billing...

Model              Input   Cached input   Output
MAI-Code-1-Flash   $0.75   $0.075         $4.50

Comparing to

Claude Haiku 4.5 $1.00 $0.10 $5.00

looks fine.

But they also forgot to include the benchmarks comparing to

GPT-5.4 mini $0.75 $0.075 $4.50

Those would have been helpful.
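To make the point concrete, here is a rough cost comparison for one made-up workload. It assumes the listed prices are USD per 1M tokens and invents the workload (2M fresh input, 8M cached input, 1M output tokens); the `Pricing` class and `cost_usd` helper are hypothetical, not part of any billing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens (assumed unit, taken from the listing above)."""
    input: float
    cached_input: float
    output: float


PRICES = {
    "MAI-Code-1-Flash": Pricing(0.75, 0.075, 4.50),
    "Claude Haiku 4.5": Pricing(1.00, 0.10, 5.00),
    "GPT-5.4 mini": Pricing(0.75, 0.075, 4.50),
}


def cost_usd(p: Pricing, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Cost of a workload; input_tokens counts only the non-cached input."""
    tokens_per_price_unit = 1_000_000  # assumption: prices are per 1M tokens
    return (
        input_tokens * p.input
        + cached_tokens * p.cached_input
        + output_tokens * p.output
    ) / tokens_per_price_unit


# Hypothetical workload: 2M fresh input, 8M cached input, 1M output tokens.
for name, p in PRICES.items():
    print(f"{name:18} ${cost_usd(p, 2_000_000, 8_000_000, 1_000_000):.2f}")
```

This prints $6.60 for both MAI-Code-1-Flash and GPT-5.4 mini and $7.80 for Haiku 4.5. With identical rates the first two cost the same for any workload, so only benchmarks can tell them apart.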

New comment by Hfuffzehn in "AI outperforms law professors in Stanford Law study"

Hfuffzehn — Wed, 03 Jun 2026 12:28:58 +0000

This is a very good comment. But notice how even in software engineering there is still disagreement about these structural safeguards.

So yes, we can say the LLM created bad code when it does not compile or fails prewritten tests.

But experts might disagree on what good comments, good cohesion, appropriate use of design patterns, appropriate test coverage or clear variable names are.

So what are we supposed to train the LLMs towards? Somebody still has to decide what "good" is.

New comment by Hfuffzehn in "AI outperforms law professors in Stanford Law study"

Hfuffzehn — Wed, 03 Jun 2026 11:52:28 +0000

I agree. But notice that you assume there is a metric with which you can measure improvement, which is fine if you are measuring against your personal taste.

But it might be that the optimization target itself has a ceiling. If you're training toward human approval ratings from a broad population, you converge toward what median preference selects for. The plateau is baked into what you're measuring against.

New comment by Hfuffzehn in "MAI-Code-1-Flash"

Hfuffzehn — Wed, 03 Jun 2026 11:07:49 +0000

https://docs.github.com/en/copilot/reference/copilot-billing...

Model              Input   Cached input   Output
MAI-Code-1-Flash   $0.75   $0.075         $4.50

New comment by Hfuffzehn in "MAI-Code-1-Flash"

Hfuffzehn — Wed, 03 Jun 2026 11:02:20 +0000

The first time I was impressed by AI coding was when I pointed it at some monster switch-case code and told it to replace it with the strategy pattern.

And it did just fine.

So no matter what you think about vibe coding, using AI for these slightly more complicated use cases is genuinely useful.
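For readers who have not done this refactoring, here is a minimal before/after sketch of the idea. The shipping-cost domain, the class names and the prices are invented for illustration and have nothing to do with the code the comment refers to; an if/elif chain stands in for the switch statement.

```python
from typing import Protocol


# Before: one function that branches on a type code (the "switch-case monster").
def shipping_cost_switch(method: str, weight_kg: float) -> float:
    if method == "standard":
        return 5.0 + 0.5 * weight_kg
    elif method == "express":
        return 10.0 + 1.0 * weight_kg
    elif method == "pickup":
        return 0.0
    else:
        raise ValueError(f"unknown method: {method}")


# After: each branch becomes its own strategy object with a common interface.
class ShippingStrategy(Protocol):
    def cost(self, weight_kg: float) -> float: ...


class Standard:
    def cost(self, weight_kg: float) -> float:
        return 5.0 + 0.5 * weight_kg


class Express:
    def cost(self, weight_kg: float) -> float:
        return 10.0 + 1.0 * weight_kg


class Pickup:
    def cost(self, weight_kg: float) -> float:
        return 0.0


# The lookup table replaces the branching; new methods are added here, not in an if chain.
STRATEGIES: dict[str, ShippingStrategy] = {
    "standard": Standard(),
    "express": Express(),
    "pickup": Pickup(),
}


def shipping_cost(method: str, weight_kg: float) -> float:
    try:
        strategy = STRATEGIES[method]
    except KeyError:
        raise ValueError(f"unknown method: {method}") from None
    return strategy.cost(weight_kg)


print(shipping_cost("express", 2.0))  # 12.0
```

The behavior is unchanged; the gain is that each case can be read, tested and extended in isolation, which is exactly the kind of mechanical but fiddly restructuring an LLM handles well.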

New comment by Hfuffzehn in "MAI-Code-1-Flash"

Hfuffzehn — Wed, 03 Jun 2026 09:57:45 +0000

Agreed. It seems like this could have been a nice model if we were still in the old GitHub Copilot free-request/premium-multiplier mode. It could have been a good compromise to rein in the costs for Microsoft.

But with Copilot now simply charging per-token prices, I don't see how this is competitive with Chinese models.

It is probably telling that you can't find the costs in the announcement, because Input $0.75, Cached input $0.075, Output $4.50 might be competitive with Haiku, but nobody in their right mind uses Haiku, and Anthropic has abandoned it to chase the tokenmaxers who aren't thinking about budgets.

So I guess they are aiming at corporate customers who are bound to Microsoft through compliance approval, who will soon see their budgets explode, and who will have to find some corporate compromise.

New comment by Hfuffzehn in "Real-time LLM Inference on Standard GPUs: 3k tokens/s per request"

Hfuffzehn — Fri, 29 May 2026 12:56:16 +0000

That's really nice of them.

That means Jensen can claim another "30 times faster" when comparing Rubin to Blackwell without having to actually do anything.

Hopefully that means he won't have any problem making another 150 billion in profit next year.

Sorry for the sarcasm. Looks like interesting work.

New comment by Hfuffzehn in "Various LLM Smells"

Hfuffzehn — Thu, 28 May 2026 22:02:58 +0000

The interesting thing for me is that I do not feel like the writing of LLMs has improved very much stylistically lately. They reached a "good" level some time ago, but the newer models haven't brought improvements big enough that you would prefer them to an expert human writer.

It will be interesting to see whether that holds in other areas when chasing superintelligence.

New comment by Hfuffzehn in "Using AI to write better code more slowly"

Hfuffzehn — Tue, 26 May 2026 06:48:52 +0000

The main insight here I think is that LLMs are great tools for iterative development and iterative problem solving in general.

You can iterate very effectively alone, using the LLM as a mirror that rephrases what you put in and adds a bit.

You can use LLMs to quickly create prototypes to give to other human beings to help you with the next iteration.

If you get something from someone else to iterate on, you can use the LLM to help you understand it by rephrasing things in a way that suits you better.

But instead, everybody seems to be talking about one-shotting things and AI iterating with other AI.

The big problem here is that the one thing AI does not have is agency. The name "AI agent" is wishful thinking and marketing.

New comment by Hfuffzehn in "Does anybody like React?"

Hfuffzehn — Tue, 26 May 2026 05:04:34 +0000

I haven't really deeply thought about frontend JS for many years.

Back then the question we were looking at was whether it would be a good idea to move away from SAP UI5. The alternatives back then were React, Angular and Vue.

The conclusion we came to was that it was definitely worth migrating, but what to migrate to was not so easy to agree on.

Right now I am working with a legacy Java codebase that was based on RxJava. And every single day I am cursing the people that made that decision. It seems so obviously a bad idea. And the only thing that lets me keep my sanity is remembering that every decision only becomes obvious with hindsight.

So I guess the only thing I can contribute is that it could always be worse and sometimes making the bold and seemingly innovative decision comes back many years later to bite other people.

New comment by Hfuffzehn in "DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost"

Hfuffzehn — Sun, 24 May 2026 16:50:29 +0000

This is really tickling the conspiracy theorist part of my brain.

"Independent open-source project · not affiliated with DeepSeek" "Reasonix only targets DeepSeek because..." "Why DeepSeek only? Can I swap to Claude / GPT? It's a design choice, not a limitation"

The lady doth protest too much, methinks?

Nicely timed, shortly after the announcement that the rebate would be made permanent.

Could just be Chinese devs trying to help western devs with some software and a western facing marketing campaign to raise awareness. Could be DeepSeek astroturfing. Could be "someone" in China trying to get more access to western data.

Who knows?

New comment by Hfuffzehn in "BambuStudio has been violating PrusaSlicer AGPL license since their fork"

Hfuffzehn — Sat, 23 May 2026 13:36:02 +0000

With DeepSeek making their price rebates permanent, we now have some data on what China values data access at.

Western providers of the open-weight models are currently 3 times or more as expensive as DeepSeek itself.

Of course the data access for the Chinese is not the only part valued in there, but I am pretty sure it is one.

New comment by Hfuffzehn in "I Miss Terry Pratchett"

Hfuffzehn — Sat, 23 May 2026 13:24:42 +0000

I miss him too.

Even though I had the experiences he describes with Douglas Adams first, before discovering Terry Pratchett.

New comment by Hfuffzehn in "The last six months in LLMs in five minutes"

Hfuffzehn — Tue, 19 May 2026 06:47:54 +0000

You have correctly identified that getting a "high-quality harness (ie preloaded instructions from md files, including custom skills)" is the (or at least a) hard part.

Because you have to adjust the harness to your problem space and provide that so you can say it is high-quality.

Many people will stop that discussion at the claude code vs. codex vs. opencode level and then merge that with discussing model performance.

And that is also why "Generate an SVG of a pelican riding a bicycle" is still a benchmark worth discussing. Because at least it is a defined problem space.

New comment by Hfuffzehn in "Maybe you shouldn't install new software for a bit"

Hfuffzehn — Fri, 08 May 2026 06:26:16 +0000

Isn't blaming AI for that similar to blaming C for buffer overflows?

More people are producing more code because of easier tools. Most code is bad. But that's not the tools' fault.

And in the end it is a problem of processes and culture.

New comment by Hfuffzehn in "Grok 4.3"

Hfuffzehn — Fri, 01 May 2026 12:43:32 +0000

Yes, but that market is not b2b, less commercialized, more end consumer focused and more bring your own key.

That's why I find it interesting. Anthropic is not interested in building a moat there and OpenAI has given up on their announcement of exploring it.

So you can see end users making decisions.

New comment by Hfuffzehn in "Grok 4.3"

Hfuffzehn — Fri, 01 May 2026 11:59:01 +0000

Sure, but the best statistics about what models people are actually using when they can choose is probably from openrouter: https://openrouter.ai/apps/category/entertainment/roleplay