<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: wolttam</title><link>https://news.ycombinator.com/user?id=wolttam</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 27 Jun 2026 00:04:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=wolttam" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by wolttam in "Previewing GPT‑5.6 Sol: a next-generation model"]]></title><description><![CDATA[
<p>I found Flash to be a bit shaky as well until I started using it in xhigh/max thinking effort, then it became my daily driver. It runs quite well on a couple of DGX Sparks.<p>I still wish it was a little better, but there's hope for another model checkpoint (maybe with some of GLM 5.2's goodness distilled into it, that would be nice).</p>
]]></description><pubDate>Fri, 26 Jun 2026 21:21:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48692125</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48692125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48692125</guid></item><item><title><![CDATA[New comment by wolttam in "Previewing GPT‑5.6 Sol: a next-generation model"]]></title><description><![CDATA[
<p>If you have no need for Anthropic/OpenAI's frontier model capability, you may be better served with an open-weight model that <i>can't</i> be taken away.<p>Edit:<p>> GPT-5 does the job.<p>I bring up DeepSeek V4 Flash a lot on HN, but I want to mention that according to Artificial Analysis, it trades blows with GPT-5 (high) (from August, 2025) [0]<p>[0]: <a href="https://artificialanalysis.ai/models/comparisons/deepseek-v4-flash-vs-gpt-5" rel="nofollow">https://artificialanalysis.ai/models/comparisons/deepseek-v4...</a></p>
]]></description><pubDate>Fri, 26 Jun 2026 17:26:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=48689312</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48689312</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48689312</guid></item><item><title><![CDATA[New comment by wolttam in "GLM-5.2 is a step change for open agents"]]></title><description><![CDATA[
<p>I am one of those ecstatic folk :)</p>
]]></description><pubDate>Thu, 25 Jun 2026 15:52:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48675257</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48675257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48675257</guid></item><item><title><![CDATA[New comment by wolttam in "Hey Nico, you didn't vibe code your data room but stole it from Papermark"]]></title><description><![CDATA[
<p>Folks... read the actual tweet. They literally didn't vibe code it - they copy-pasted another project.</p>
]]></description><pubDate>Thu, 25 Jun 2026 13:16:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=48672863</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48672863</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48672863</guid></item><item><title><![CDATA[New comment by wolttam in "Hey Nico, you didn't vibe code your data room but stole it from Papermark"]]></title><description><![CDATA[
<p>This doesn't appear to be AI posturing, did you read the tweet? It is about one product blatantly, directly ripping off another.</p>
]]></description><pubDate>Thu, 25 Jun 2026 13:15:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=48672851</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48672851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48672851</guid></item><item><title><![CDATA[New comment by wolttam in "Claude Tag"]]></title><description><![CDATA[
<p>Baseline for “modern” apps, what? We’re talking about a terminal application here; there are <i>definitely, most-assuredly</i> ways to write something that does exactly what Claude Code does with a teeny fraction of the resource requirements.<p>The trick is not bringing React into the terminal.<p>(FWIW, I have a link to a TUI harness in my profile that uses 50MB of RAM and about 1% CPU while streaming, even in giant contexts)</p>
]]></description><pubDate>Tue, 23 Jun 2026 22:55:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48652669</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48652669</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48652669</guid></item><item><title><![CDATA[New comment by wolttam in "Prompt Injection as Role Confusion"]]></title><description><![CDATA[
<p>They’re valid things to be concerned about IMO.<p>I think you’re looking for an answer you’re not going to get, unfortunately. I think there <i>actually is</i> a higher-than-average risk of data leakage with the insane optimizations that go into model serving - GLM5.1 had an issue of going into gibberish when their infra was under high load, and it turned out to be a cross-request KV cache contamination issue.[1]<p>Personally, my effort has been to use local models only as of late, and it’s gone pretty well!<p>[1]: <a href="https://z.ai/blog/scaling-pain" rel="nofollow">https://z.ai/blog/scaling-pain</a></p>
]]></description><pubDate>Tue, 23 Jun 2026 22:43:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48652526</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48652526</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48652526</guid></item><item><title><![CDATA[New comment by wolttam in "Prompt Injection as Role Confusion"]]></title><description><![CDATA[
<p>In other words: controlling for that kind of potential data-mixing is the same as in any other application where customer data is co-located within the same running process/memory/storage space.</p>
]]></description><pubDate>Tue, 23 Jun 2026 19:43:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48650296</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48650296</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48650296</guid></item><item><title><![CDATA[New comment by wolttam in "The 100k whys of AI"]]></title><description><![CDATA[
<p>And chances are those 3-5 LLMs are more alike than they are different, because there is only one internet to pre-train on.</p>
]]></description><pubDate>Tue, 23 Jun 2026 17:31:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48648376</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48648376</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48648376</guid></item><item><title><![CDATA[New comment by wolttam in "The Coming Loop"]]></title><description><![CDATA[
<p>I am 100% for fully agentic loops... for tasks other than engineering.<p>I'm not willing to outsource the <i>understanding how things work</i> part of myself. That part of myself is what got me into computing in the first place.<p>If this work becomes simply a matter of describing intent to a machine (probably through an Issue, like a user), and going to check on the result when you get the 'done' notification: I'm done.<p>It's possible to use the tools to do awesome things without letting go of full system understanding of the parts that you look after.</p>
]]></description><pubDate>Tue, 23 Jun 2026 14:41:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=48645758</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48645758</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48645758</guid></item><item><title><![CDATA[New comment by wolttam in "Nearly half of LG smart TV apps contain residential proxy SDKs"]]></title><description><![CDATA[
<p>> A better solution would be to root the damn TV and neuter its spyware/adware crap.<p>That sounds like a lot of work. I don't want to sign up for this much work for every product I own that I want an iota of control over.<p>So I would question whether this is "better" by any stretch of the word.</p>
]]></description><pubDate>Tue, 23 Jun 2026 01:35:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=48639056</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48639056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48639056</guid></item><item><title><![CDATA[New comment by wolttam in "Prompt Injection as Role Confusion"]]></title><description><![CDATA[
<p>I think the key to making "useful" things is to sandbox the agent and give it read/write access to strictly the data needed for the function. The agent can only talk to preordained services and its input to those services will be treated as untrusted user input.<p>To be clear: I agree fundamentally that there <i>is no safe way</i> to have agents connected to the world in a way that allows them to take irreversible actions. Deployments where agents <i>can</i> take destructive actions are deployments where the agent <i>will</i>, eventually, take destructive action.</p>
]]></description><pubDate>Mon, 22 Jun 2026 22:38:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=48637368</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48637368</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48637368</guid></item><item><title><![CDATA[New comment by wolttam in "GLM 5.2 vs. Opus"]]></title><description><![CDATA[
<p>Would have run it with GLM on max/xhigh effort. Just for fun.</p>
]]></description><pubDate>Mon, 22 Jun 2026 12:47:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48629452</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48629452</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48629452</guid></item><item><title><![CDATA[New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"]]></title><description><![CDATA[
<p><a href="https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/" rel="nofollow">https://developer.nvidia.com/blog/an-introduction-to-specula...</a><p>You draft n tokens, and you verify them in a single forward pass.<p>Here's the vLLM flag:<p><pre><code>    --speculative-config '{"method":"mtp","num_speculative_tokens":2}'
</code></pre>
They may have only trained at a depth of 1, but boy-howdy, does that little MTP head do a pretty good job of successfully predicting that second token about 60-80% of the time.<p>It works great. I'll keep my increased performance, and<p>> so i don't know why you are punching these documents into the chatbot, and asking it questions about them, and then it gives you the wrong answers<p>you keep whatever this is. I posted direct quotes from their papers which say "it speeds up inference" (paraphrasing). I don't feel there is anything I can do to turn this into a good-faith discussion. Beep boop.</p>
]]></description><pubDate>Mon, 22 Jun 2026 01:27:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48624517</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48624517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48624517</guid></item><item><title><![CDATA[New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"]]></title><description><![CDATA[
<p>> MTP in Inference. Our MTP strategy mainly aims to improve the performance of the main model, so during inference, we can directly discard the MTP modules and the main model can function independently and normally. *Additionally, we can also repurpose these MTP modules for speculative decoding to further improve the generation latency.*[1]<p>(emphasis mine)<p>> Instead of predicting just the next single token, DeepSeek-V3 predicts the next 2 tokens through the MTP technique. Combined with the framework of speculative decoding (Leviathan et al., 2023; Xia et al., 2023), it can significantly accelerate the decoding speed of the model.[2]<p>> As DeepSeek-V3, DeepSeek-V4 series also set MTP modules and
objectives. Given that the MTP strategy has been validated in DeepSeek-V3, we adopt the same strategy for DeepSeek-V4 series without modification.[3]<p>[1]: <a href="https://arxiv.org/pdf/2412.19437#subsection.2.2" rel="nofollow">https://arxiv.org/pdf/2412.19437#subsection.2.2</a><p>[2]: <a href="https://arxiv.org/pdf/2412.19437#subsubsection.5.4.3" rel="nofollow">https://arxiv.org/pdf/2412.19437#subsubsection.5.4.3</a><p>[3]: <a href="https://arxiv.org/pdf/2606.19348v1#subsection.2.1" rel="nofollow">https://arxiv.org/pdf/2606.19348v1#subsection.2.1</a><p>Side comment: I feel you may be too cynical towards your fellow commenters.</p>
]]></description><pubDate>Sun, 21 Jun 2026 19:07:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=48621663</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48621663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48621663</guid></item><item><title><![CDATA[New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"]]></title><description><![CDATA[
<p>Yep, those are the numbers I'm getting with DSv4 Flash on vLLM across 2 sparks.</p>
]]></description><pubDate>Sun, 21 Jun 2026 15:22:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48619730</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48619730</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48619730</guid></item><item><title><![CDATA[New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"]]></title><description><![CDATA[
<p>I suspect DwarfStar could probably squeeze more performance out of the single spark, maybe up closer to 20tok/s.<p>Moving to 2 sparks meant switching to vLLM with 2-way tensor parallelism and working multi-token prediction. The parallelism and MTP on top of better tuned kernels[1] gave an extremely nice boost! I was quite pleased. I've seen bursts up to 60tok/s at ~150k context - sometimes the MTP seems to really kick in (i.e. high acceptance rate on its tokens)<p>Currently running a custom vLLM build put together by some folks on the Nvidia forums[2], which speaks to how early support for the model is.<p>[1]: <a href="https://github.com/lukealonso/b12x" rel="nofollow">https://github.com/lukealonso/b12x</a><p>[2]: <a href="https://forums.developer.nvidia.com/t/372268" rel="nofollow">https://forums.developer.nvidia.com/t/372268</a></p>
]]></description><pubDate>Sun, 21 Jun 2026 15:10:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=48619610</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48619610</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48619610</guid></item><item><title><![CDATA[New comment by wolttam in "Two Qwen3 models on one DGX Spark: the residency math"]]></title><description><![CDATA[
<p>I started with antirez' DwarfStar[1] on one spark and that (~11-14tok/s generation, ~300-400 tok/s prompt processing) was enough of a taste for me to jump into 2 sparks, running the native quant of DSv4 Flash.<p>Now at 40-50tok/s generation and ~2000 tok/s prefill with a model that I've seen reason through race conditions, trivially pull off any straightforward coding task, and remain coherent at 500k context. With a <i>preview</i> checkpoint of the weights!<p>I'm excited for the future of local LLMs. There is some buy-in, but apparently not an extreme amount, to get access to models that can stand in for the giants on all but the most challenging and/or hands-off coding tasks.<p>[1]: <a href="https://github.com/antirez/ds4" rel="nofollow">https://github.com/antirez/ds4</a></p>
]]></description><pubDate>Sun, 21 Jun 2026 14:46:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=48619420</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48619420</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48619420</guid></item><item><title><![CDATA[New comment by wolttam in "GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2"]]></title><description><![CDATA[
<p>> it is clear that actual intelligence has plateaued significantly.<p>> Moving forward, the industry cannot continue to train bigger and bigger models since their intelligence not only plateaus but often will get worse<p>These are wild claims - why are we concluding that bigger models and more data = more hallucination? That’s actually the opposite of what’s been happening over the last couple years. Some models may still hallucinate more but they all hallucinate <i>much less</i> than the original 175B ChatGPT which was smaller and trained on (much) less data than anything current.<p>Edit: My mention of data comes from this quote:<p>> A shift is happening among major AI labs, who are becoming increasingly skeptical of endless parameter count and training data scaling<p>My take on the current situation: it seems clear that the industry has seen that there is still a lot left to squeeze out of sub-1T models. But for that you do need more, high-quality data in the distribution which you want to unlock capabilities for.</p>
]]></description><pubDate>Sat, 20 Jun 2026 12:32:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=48608771</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48608771</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48608771</guid></item><item><title><![CDATA[New comment by wolttam in "Is AI ruining our skills? Early results are in – and they're not good"]]></title><description><![CDATA[
<p>Humans will become individually and independently less skilled while having access to tools that allow them to do far more than even the most skilled human could before these tools existed.<p>I'm not sure if we'll become less <i>intelligent</i>. I think our sacks of neurons are gonna keep on making associations, just across a different set of topics.</p>
]]></description><pubDate>Fri, 19 Jun 2026 18:33:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48601657</link><dc:creator>wolttam</dc:creator><comments>https://news.ycombinator.com/item?id=48601657</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48601657</guid></item></channel></rss>