Hacker News: nodja

New comment by nodja in "The US is winning the AI race where it matters most: commercialization"

nodja — Wed, 13 May 2026 21:02:08 +0000

When a cyclist is leading a pack and pushing themselves against the air resistance for half the race, do you expect that cyclist to win, or one of the ones behind that's been taking it easy in the slipstream?

It's a race metaphor not a football metaphor.

New comment by nodja in "The US is winning the AI race where it matters most: commercialization"

nodja — Wed, 13 May 2026 20:59:08 +0000

GP here, leading and winning are different things in the race context/metaphor.

In foot/cycling races there's often a pack leader, that leader is often not the winner of the race, all they're doing is taking the brunt of the air resistance while everyone else slipstreams behind. For a casual observer it seems that the pack leader will win, but everyone knows that it's gonna be someone that paced themselves that's going to overtake the first spot at the tail end of the race.

New comment by nodja in "The US is winning the AI race where it matters most: commercialization"

nodja — Wed, 13 May 2026 19:30:48 +0000

If they got there by tiring themselves out more than the other team, yes.

New comment by nodja in "The US is winning the AI race where it matters most: commercialization"

nodja — Wed, 13 May 2026 14:35:51 +0000

No, the US is _leading_ the AI race, but the race isn't over.

What's the point of leading the race for 90% of it, if they're gonna slip on their own sweat and fall down by the end? In non metaphorical terms, what's the point of spending billions of dollars rushing to get the best AI tech at all costs, when the competition can distil your progress and catch up in 6-12 months while only spending 1% of what you spent.

Even in the aspect the article cares about, commercialization, the US is starting to lose marketshare, I've seen people move from cc/codex plans to use glm/opencode plans due to the recent squeeze the US companies put on plan usage, the US companies are screwed if that sticks, not everyone needs the bleeding edge models, they just want to pay $20/month and have the models be decently capable.

New comment by nodja in "Even 'uncensored' models can't say what they want"

nodja — Tue, 21 Apr 2026 00:09:30 +0000

If I'm understanding this right, this presupposes that the models were pre-trained on unfiltered data like with the "floor" models, so when comparing between the "retail" and uncensored models they will obviously not match the floor because they were not trained on the same data in the first place.

To me it stands to reason that a model that has only seen a limited amount of smut, hate speech, etc. can't just start writing that stuff at the same level just because it not longer refuses to do it.

The reason uncensored models are popular is because the uncensored models treat the user as an adult, nobody wants to ask the model some question and have it refuse because it deemed the situation too dangerous or whatever. Example being if you're using a gemma model on a plane or a place without internet and ask for medical advice and it refuses to answer because it insists on you seeking professional medical assistance.

New comment by nodja in "We got 207 tok/s with Qwen3.5-27B on an RTX 3090"

nodja — Mon, 20 Apr 2026 21:01:32 +0000

> speculative decoding which, generally speaking, is not the same quality as serving the model without it.

I've never heard of ANY speculative decoding that wasn't lossless. If it was lossy it'd be called something else.

This page is just a port of DFLASH to gguf format, it only implements greedy decoding like you said so the outputs will be inferior, but not inferior to greedy decoding on the original model. Tho that's just a matter of implementing temperature, top_k, etc.

New comment by nodja in "Introspective Diffusion Language Models"

nodja — Thu, 16 Apr 2026 00:15:58 +0000

That will depend on the model, but they'll hit compute limits before a typical GPU in almost all cases. Macs will still benefit a speedup from this, just not one as big as the one reported.

New comment by nodja in "Introspective Diffusion Language Models"

nodja — Tue, 14 Apr 2026 20:57:27 +0000

Same reason why prompt processing is faster than text generation.

When you already know the tokens ahead of time you can calculate the probabilities of all tokens batched together, incurring significant bandwidth savings. This won't work if you're already compute bound so people with macs/etc. won't get as much benefits from this.

New comment by nodja in "Spain to expand internet blocks to tennis, golf, movies broadcasting times"

nodja — Tue, 14 Apr 2026 19:14:48 +0000

Official sites make things worse on purpose after getting any sort of traction because they can't stop chasing profits.

I don't watch sports, but my father watches soccer. He really only cares about 1 team and the national games from our home country. He was spending over $100/month to be able to watch the games, and they werent even in his native language. Now he pays $80/year for a pirate IPTV service and not only can he watch the games anywhere he wants, he also gets native language commentary for the games, national tv channels like news, etc.

When pirates can charge you money and offer a superior service, it absolutely is a service problem. You can claim that the realities of licensing and whatnot don't allow official channels to provide the best service they can, but that's not true in this case. When the same provider is splitting game broadcast from one team into different packages you know they're just trying to extract the most amount of money possible.

IDK the deal with scanlator sites nowadays, but I assume the official sites can provide more timely translations for manga since they can access the source material before anyone has seen it. I know most popular manga gets translated within hours of release, but if you're following some more niche stuff it can be several days. I also know a lot of scanlators have patreon pages so it's not like the demand from paying customers for translated media isn't there.

New comment by nodja in "Pro Max 5x quota exhausted in 1.5 hours despite moderate usage"

nodja — Sun, 12 Apr 2026 15:57:10 +0000

Not parent but I can guess from watching mostly from the sidelines.

They introduced a 1M context model semi-transparently without realizing the effects it would have, then refused to "make it right' to the customer which is a trait most people expect from a business when they spend money on it, specially in the US, and specially when the money spent is often in the thousands of dollars.

Unless anthropic has some secret sauce, I refuse to believe that their models perform anywhere near the same on >300k context sizes than they do on 100k. People don't realize but even a small drop in success rate becomes very noticeable if you're used to have near 100%, i.e. 99% -> 95% is more noticeable than 55% -> 50%.

I got my first claude sub last month (it expires in 4 days) and I've used it on some bigish projects with opencode, it went from compacting after 5-10 questions to just expanding the context window, I personally notice it deteriorating somewhere between 200-300k tokens and I either just fork a previous context or start a new one after that because at that size even compacting seems to generate subpar summaries. It currently no longer works with opencode so I can't attest to how it well it worked the past week or so.

If the 1M model introduction is at fault for this mass user perception that the models are getting worse, then it's anthropics fault for introducing confusion into the ecosystem. Even if there was zero problems introduced and the 1M model was perfect, if your response when the users complain is to blame it on the user, then don't expect the user will be happy. Nobody wants to hear "you're holding it wrong", but it seems that anthropic is trying to be apple of LLMs in all the wrong ways as well.

New comment by nodja in "Small models also found the vulnerabilities that Mythos found"

nodja — Sat, 11 Apr 2026 22:55:16 +0000

You misunderstood.

Instead of asking the model: "Here's this codebase, report any vulnerability." you ask. "Here's this codebase, report any vulnerability in module\main.c".

The model can still explore references and other files inside the codebase, but you start over a new context/session for each file in the codebase.

New comment by nodja in "Claude mixes up who said what and that's not OK"

nodja — Thu, 09 Apr 2026 12:58:30 +0000

Anyone familiar with the literature knows if anyone tried figuring out why we don't add "speaker" embeddings? So we'd have an embedding purely for system/assistant/user/tool, maybe even turn number if i.e. multiple tools are called in a row. Surely it would perform better than expecting the attention matrix to look for special tokens no?

New comment by nodja in "Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw"

nodja — Sat, 04 Apr 2026 03:00:25 +0000

You can charge $10 on the account and get unlimited requests. I abused this last week with the nemotron super to test out some stuff and made probably over 10000 requests over a couple of days and didn't get blocked or anything, expect 5xx errors and slowdowns tho.

New comment by nodja in "Cohere Transcribe: Speech Recognition"

nodja — Tue, 31 Mar 2026 23:27:27 +0000

It's probably another ASR model that focuses on benchmarks and simple uses instead of more challenging real use cases.

I upload edited gameplay vods of twitch streams on youtube, and use whisper-large-v3 to provide subtitles for accessibility reasons (youtube's own auto-subtitles suck, tho they've been getting better).

My checklist for a good ASR model for my use case is:

1. Have timestamp support.

2. Support overlapping speakers.

3. Accurate transcripts that don't coalesce half words/interrupted sentences.

4. Support non verbal stuff like [coughs], [groans], [laughs], [sighs], etc.

5. Allow context injection of non-trivial sizes (10k+ words)

1 is obvious because without it we can't have subtitles. Force alignment fails too often.

2 is crucial for real world scenarios because in the real world people talk over each other all the time, in my case it's a streamer talking over gameplay audio, or when the streamer has guests over. When 2 people speak the transcript either ignores one of them, or in the worst case, both of them.

3 and 4 are an accessibility thing, if you're deaf or hard of hearing having a more literal transcript of what's being said conveys better how the speaker is speaking. If all subtitles are properly "spell-checked" then it's clear your model is overfit to the benchmarks.

5 Is not a requirement per se, but more of a nice to have. In my use cause the streamer is often reading stream chat so feeding the model the list of users that recently talked, recent chat messages, text on screen, etc. Would make for more accurate transcripts.

I've tried many models, and the closest that fulfill my needs are LLM style models on top of forced alignment. It's too slow, so I've been sticky with whisper because with whisperx I can get a transcript in 5 minutes with just a single command.

One thing all these models do (including whisper) is just omit full sentences, it's the worst thing a model can do.

New comment by nodja in "The Claude Code Source Leak: fake tools, frustration regexes, undercover mode"

nodja — Tue, 31 Mar 2026 22:46:23 +0000

This blog post looks to be partially AI generated as well...

New comment by nodja in "Claude Code's source code has been leaked via a map file in their NPM registry"

nodja — Tue, 31 Mar 2026 10:56:51 +0000

If anyone at anthropic is reading this and wants more logs from me add jfc.

New comment by nodja in "Nvidia greenboost: transparently extend GPU VRAM using system RAM/NVMe"

nodja — Thu, 19 Mar 2026 02:59:07 +0000

NVIDIA's GPU drivers on windows 100% do this

https://i.imgur.com/c0a3vUy.png

New comment by nodja in "Unsloth Studio"

nodja — Wed, 18 Mar 2026 00:47:48 +0000

IDK how it did but it detected my LM studio downloaded models I have on a spinning drive (they're not in the default location).

New comment by nodja in "Intel XeSS 3: expanded support for Core Ultra/Core Ultra 2 and Arc A, B series"

nodja — Tue, 24 Feb 2026 11:26:52 +0000

> It's like people let their hate of AI and LLM bubble blind them, and their brains can't compartmentalize good from bad news anymore.

DLSS is also AI and people like it.

People don't like framegen because the manufacturers are not being honest about it and using it for deceptive hype marketing. Anyone with a brain knows that it introduces latency and is only useful if you're already 40+ FPS, we also know that companies will use it to pad benchmarks. NVIDIA themselves said that the 5070 had 4090 performance because it supports framegen.

New comment by nodja in "Claude Opus 4.6"

nodja — Thu, 05 Feb 2026 21:06:57 +0000

Doing some math in my head, buying the GPUs at retail price, it would take probably around half a year to make the money back, probably more depending how expensive electricity is in the area you're serving from. So I don't know where this "losing money" rhetoric is coming from. It's probably harder to source the actual GPUs than making money off them.