Hacker News: wolttam

New comment by wolttam in "Previewing GPT‑5.6 Sol: a next-generation model"

wolttam — Fri, 26 Jun 2026 21:21:13 +0000

I found Flash to be a bit shaky as well until I started using it at xhigh/max thinking effort, then it became my daily driver. It runs quite well on a couple of DGX Sparks.

I still wish it was a little better, but there's hope for another model checkpoint (maybe with some of GLM 5.2's goodness distilled into it, that would be nice).

New comment by wolttam in "Previewing GPT‑5.6 Sol: a next-generation model"

wolttam — Fri, 26 Jun 2026 17:26:12 +0000

If you have no need for Anthropic/OpenAI's frontier model capability, you may be better served with an open-weight model that can't be taken away.

Edit:

> GPT-5 does the job.

I bring up DeepSeek V4 Flash a lot on HN, but I want to mention that according to Artificial Analysis, it trades blows with GPT-5 (high) (from August, 2025) [0]

[0]: https://artificialanalysis.ai/models/comparisons/deepseek-v4...

New comment by wolttam in "GLM-5.2 is a step change for open agents"

wolttam — Thu, 25 Jun 2026 15:52:24 +0000

I am one of those ecstatic folk :)

New comment by wolttam in "Hey Nico, you didn't vibe code your data room but stole it from Papermark"

wolttam — Thu, 25 Jun 2026 13:16:02 +0000

Folks... read the actual tweet. They literally didn't vibe code it - they copy-pasted another project.

New comment by wolttam in "Hey Nico, you didn't vibe code your data room but stole it from Papermark"

wolttam — Thu, 25 Jun 2026 13:15:20 +0000

This doesn't appear to be AI posturing. Did you read the tweet? It is about one product blatantly, directly ripping off another.

New comment by wolttam in "Claude Tag"

wolttam — Tue, 23 Jun 2026 22:55:05 +0000

Baseline for “modern” apps, what? We’re talking about a terminal application here. There are definitely, most assuredly, ways to write something that does exactly what Claude Code does with a teeny fraction of the resource requirements.

The trick is not bringing React into the terminal.

(FWIW, I have a link to a TUI harness in my profile that uses 50MB of RAM and about 1% CPU while streaming, even in giant contexts)

New comment by wolttam in "Prompt Injection as Role Confusion"

wolttam — Tue, 23 Jun 2026 22:43:15 +0000

They’re valid things to be concerned about IMO.

I think you’re looking for an answer you’re not going to get, unfortunately. I think there actually is a higher than average risk of data leakage with the insane optimizations that go into model serving - GLM5.1 had an issue of going into gibberish when their infra was under high load, and it turned out to be a cross-request KV cache contamination issue.[1]

Personally, my effort has been to use local models only as of late, and it’s gone pretty well!

[1]: https://z.ai/blog/scaling-pain

New comment by wolttam in "Prompt Injection as Role Confusion"

wolttam — Tue, 23 Jun 2026 19:43:56 +0000

In other words: controlling for that kind of potential data-mixing is the same as in any other application where customer data is co-located within the same running process/memory/storage space.
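A minimal sketch of what that control can look like, assuming a shared in-process cache: every lookup key includes the tenant, so an identical prompt from another customer can never hit someone else's entry. `TenantScopedCache` and its methods are hypothetical names for illustration, not any serving stack's real API, and this says nothing about how any particular provider's cache is built.

```python
import hashlib
from typing import Dict, Optional, Tuple


class TenantScopedCache:
    """Toy shared cache whose keys always include the tenant ID."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], bytes] = {}

    @staticmethod
    def _key(tenant_id: str, prompt_prefix: str) -> Tuple[str, str]:
        # Hash the prefix so the key has a fixed size; the tenant ID is kept
        # as a separate component so it can never be dropped from the lookup.
        digest = hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()
        return (tenant_id, digest)

    def put(self, tenant_id: str, prompt_prefix: str, state: bytes) -> None:
        self._entries[self._key(tenant_id, prompt_prefix)] = state

    def get(self, tenant_id: str, prompt_prefix: str) -> Optional[bytes]:
        # Returns None on a miss, including when only another tenant
        # has cached this exact prefix.
        return self._entries.get(self._key(tenant_id, prompt_prefix))


cache = TenantScopedCache()
cache.put("tenant-a", "shared system prompt", b"cached-state")
same_tenant_hit = cache.get("tenant-a", "shared system prompt")
cross_tenant_hit = cache.get("tenant-b", "shared system prompt")
```

The trade-off is a lower hit rate, since identical prefixes are no longer shared across customers.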

New comment by wolttam in "The 100k whys of AI"

wolttam — Tue, 23 Jun 2026 17:31:58 +0000

And chances are those 3-5 LLMs are more alike than they are different, because there is only one internet to pre-train on.

New comment by wolttam in "The Coming Loop"

wolttam — Tue, 23 Jun 2026 14:41:48 +0000

I am 100% for fully agentic loops... for tasks other than engineering.

I'm not willing to outsource the understanding-how-things-work part of myself. That part of myself is what got me into computing in the first place.

If this work becomes simply a matter of describing intent to a machine (probably through an Issue, like a user), and going to check on the result when you get the 'done' notification: I'm done.

It's possible to use the tools to do awesome things without letting go of full system understanding of the parts that you look after.

New comment by wolttam in "Nearly half of LG smart TV apps contain residential proxy SDKs"

wolttam — Tue, 23 Jun 2026 01:35:25 +0000

> A better solution would be to root the damn TV and neuter its spyware/adware crap.

That sounds like a lot of work. I don't want to sign up to this much work for every product I own that I want an iota of control over.

So I would dispute that this is "better" by any stretch of the word.

New comment by wolttam in "Prompt Injection as Role Confusion"

wolttam — Mon, 22 Jun 2026 22:38:44 +0000

I think the key to making "useful" things is to sandbox the agent and give it read/write access to strictly the data needed for the function. The agent can only talk to preordained services and its input to those services will be treated as untrusted user input.

To be clear: I agree fundamentally that there is no safe way to have agents connected to the world in a way that allows them to take irreversible actions. Deployments where agents can take destructive actions are deployments where the agent will, eventually, take destructive action.
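A rough sketch of the "preordained services, untrusted input" idea, assuming the agent can only act through a single gateway object. `ToolGateway`, `ToolCall` and `lookup_note` are hypothetical names made up for this example, and the validation shown is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ToolCall:
    service: str
    payload: str


class ToolGateway:
    """Only dispatches to allowlisted services and validates every payload."""

    def __init__(self, handlers: Dict[str, Callable[[str], str]],
                 max_payload_chars: int = 2000) -> None:
        self._handlers = dict(handlers)  # the allowlist is fixed at construction
        self._max_payload_chars = max_payload_chars

    def dispatch(self, call: ToolCall) -> str:
        handler = self._handlers.get(call.service)
        if handler is None:
            raise PermissionError(f"service not allowed: {call.service!r}")
        # Treat the agent's output like any other untrusted user input:
        # bound its size and reject control characters (including newlines).
        if len(call.payload) > self._max_payload_chars or not call.payload.isprintable():
            raise ValueError("payload rejected")
        return handler(call.payload)


NOTES = {"groceries": "milk, eggs"}


def lookup_note(name: str) -> str:
    # Read-only handler: the agent gets no write or delete path here.
    return NOTES.get(name, "")


gateway = ToolGateway({"lookup_note": lookup_note})
result = gateway.dispatch(ToolCall("lookup_note", "groceries"))
```

A call to any service outside the allowlist raises `PermissionError` before the agent's text reaches a handler.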

New comment by wolttam in "GLM 5.2 vs. Opus"

wolttam — Mon, 22 Jun 2026 12:47:31 +0000

Would have run it with GLM on max/xhigh effort. Just for fun.

New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"

wolttam — Mon, 22 Jun 2026 01:27:05 +0000

https://developer.nvidia.com/blog/an-introduction-to-specula...

You draft n tokens, and you verify them in a single forward pass.

Here's the vLLM flag:

    --speculative-config '{"method":"mtp","num_speculative_tokens":2}'

They may have only trained at a depth of 1, but boy-howdy, does that little MTP head do a pretty good job of successfully predicting that second token about 60-80% of the time.
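For anyone who wants the draft-then-verify idea spelled out, here is a toy sketch of one greedy speculative decoding step. It is not vLLM's implementation: the "models" are hypothetical stand-in functions (`toy_target`, `toy_draft`), and the verification loop below calls the target once per position, where a real engine would score all drafted positions in a single batched forward pass.

```python
from typing import Callable, List, Sequence

# A toy "model": maps a token sequence to its greedy next token.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], n: int) -> List[int]:
    """Draft n tokens, verify them against the target, return the tokens kept."""
    # 1. Draft n tokens autoregressively with the cheap model.
    drafted: List[int] = []
    for _ in range(n):
        drafted.append(draft(context + drafted))

    # 2. Verify: compare each drafted token with the target's own greedy choice.
    accepted: List[int] = []
    for i in range(n + 1):
        expected = target(context + drafted[:i])
        # Always keep the target's token. It is either a confirmed draft token,
        # a correction of the first wrong draft token, or a bonus token after
        # every draft token was accepted.
        accepted.append(expected)
        if i == n or drafted[i] != expected:
            break
    return accepted


def toy_target(seq: Sequence[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: Sequence[int]) -> int:
    # Agrees with the target except right after token 4, where it guesses wrong.
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


all_accepted = speculative_step(toy_target, toy_draft, [1], 2)   # both drafts kept, plus a bonus token
one_rejected = speculative_step(toy_target, toy_draft, [3], 2)   # second draft replaced by the target's token
```

The output always matches what the target would have produced on its own; the draft only changes how many tokens come out of each verification step.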

It works great. I'll keep my increased performance, and

> so i don't know why you are punching these documents into the chatbot, and asking it questions about them, and then it gives you the wrong answers

you keep whatever this is. I posted direct quotes from their papers which say "it speeds up inference" (paraphrasing). I don't feel there is anything I can do to turn this into a good-faith discussion. Beep boop.

New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"

wolttam — Sun, 21 Jun 2026 19:07:59 +0000

> MTP in Inference. Our MTP strategy mainly aims to improve the performance of the main model, so during inference, we can directly discard the MTP modules and the main model can function independently and normally. *Additionally, we can also repurpose these MTP modules for speculative decoding to further improve the generation latency.*[1]

(emphasis mine)

> Instead of predicting just the next single token, DeepSeek-V3 predicts the next 2 tokens through the MTP technique. Combined with the framework of speculative decoding (Leviathan et al., 2023; Xia et al., 2023), it can significantly accelerate the decoding speed of the model.[2]

> As DeepSeek-V3, DeepSeek-V4 series also set MTP modules and objectives. Given that the MTP strategy has been validated in DeepSeek-V3, we adopt the same strategy for DeepSeek-V4 series without modification.[3]

[1]: https://arxiv.org/pdf/2412.19437#subsection.2.2

[2]: https://arxiv.org/pdf/2412.19437#subsubsection.5.4.3

[3]: https://arxiv.org/pdf/2606.19348v1#subsection.2.1

Side comment: I feel you may be too cynical towards your fellow commenters.

New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"

wolttam — Sun, 21 Jun 2026 15:22:51 +0000

Yep, those are the numbers I'm getting with DSv4 Flash on vLLM across 2 sparks.

New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"

wolttam — Sun, 21 Jun 2026 15:10:35 +0000

I suspect DwarfStar could probably squeeze more performance out of the single spark, maybe up closer to 20tok/s.

Moving to 2 sparks meant switching to vLLM with 2-way tensor parallelism and working multi-token prediction. The parallelism and MTP on top of better-tuned kernels[1] gave an extremely nice boost! I was quite pleased. I've seen bursts up to 60tok/s at ~150k context - sometimes the MTP seems to really kick in (i.e. high acceptance rate on its tokens).

Currently running a custom vLLM build put together by some folks on the Nvidia forums[2], which speaks to how early support for the model is.

[1]: https://github.com/lukealonso/b12x

[2]: https://forums.developer.nvidia.com/t/372268

New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"

wolttam — Sun, 21 Jun 2026 14:46:40 +0000

I started with antirez' DwarfStar[1] on one spark and that (~11-14tok/s generation, ~300-400 tok/s prompt processing) was enough of a taste for me to jump into 2 sparks, running the native quant of DSv4 Flash.

Now at 40-50tok/s generation and ~2000 tok/s prefill with a model that I've seen reason through race conditions, trivially pull off any straightforward coding task, and remain coherent at 500k context. With a preview checkpoint of the weights!

I'm excited for the future of local LLMs. There is some buy-in, but apparently not an extreme amount, to get access to models that can stand in for the giants on all but the most challenging and/or hands-off coding tasks.

[1]: https://github.com/antirez/ds4

New comment by wolttam in "GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2"

wolttam — Sat, 20 Jun 2026 12:32:04 +0000

> it is clear that actual intelligence has plateaued significantly.

> Moving forward, the industry cannot continue to train bigger and bigger models since their intelligence not only plateaus but often will get worse

These are wild claims - why are we concluding that bigger models and more data = more hallucination? That’s actually the opposite of what’s been happening over the last couple years. Some models may still hallucinate more but they all hallucinate much less than the original 175B ChatGPT which was smaller and trained on (much) less data than anything current.

Edit: My mention of data comes from this quote:

> A shift is happening among major AI labs, who are becoming increasingly skeptical of endless parameter count and training data scaling

My take on the current situation: it seems clear that the industry has seen that there is still a lot left to squeeze out of sub-1T models. But for that you do need more, high-quality data in the distribution which you want to unlock capabilities for.

New comment by wolttam in "Is AI ruining our skills? Early results are in – and they're not good"

wolttam — Fri, 19 Jun 2026 18:33:23 +0000

Humans will become individually and independently less skilled while having access to tools that allow them to do far more than even the most skilled human could before these tools were available.

I'm not sure if we'll become less intelligent. I think our sacks of neurons are gonna keep on making associations, just across a different set of topics.