<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: throwdbaaway</title><link>https://news.ycombinator.com/user?id=throwdbaaway</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 16 Jun 2026 14:58:20 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=throwdbaaway" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by throwdbaaway in "DeepSeek makes the V4 Pro price discount permanent"]]></title><description><![CDATA[
<p>And their disk-based caching is amazing. I had a long 700k-context session spanning more than a week, with pauses in between that were longer than a day, and some rewinds mixed in as well.<p>Stats from pi:<p>↑400k ↓438k R432M 71.9%/1.0M<p>Half a billion tokens, $2.12</p>
]]></description><pubDate>Sat, 23 May 2026 01:28:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48243628</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=48243628</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48243628</guid></item><item><title><![CDATA[New comment by throwdbaaway in "A few words on DS4"]]></title><description><![CDATA[
<p>Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP.</p>
]]></description><pubDate>Fri, 15 May 2026 07:04:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=48145419</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=48145419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48145419</guid></item><item><title><![CDATA[New comment by throwdbaaway in "An AI agent deleted our production database. The agent's confession is below"]]></title><description><![CDATA[
<p>Huh, that's not what I gathered from the tweet at all. If I were to write a five whys analysis, the immediate cause is that the LLM wrongly decided to delete a volume, while the root cause is the bad design of co-locating staging and production data in the same volume. The writing was quite vague though, so let's wait for a response from Railway.</p>
]]></description><pubDate>Mon, 27 Apr 2026 01:23:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47916712</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47916712</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47916712</guid></item><item><title><![CDATA[New comment by throwdbaaway in "An AI agent deleted our production database. The agent's confession is below"]]></title><description><![CDATA[
<p>If I understand correctly, both the staging database and the production database share the same volume. Thus, production data was gone as well after deleting the volume.<p>1st hint - the API call only contains one volume:<p><pre><code>    curl -X POST https://backboard.railway.app/graphql/v2 \
      -H "Authorization: Bearer [token]" \
      -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'
</code></pre>
2nd hint - this gem from the tweet:<p>> No "this volume contains production data, are you sure?"</p>
]]></description><pubDate>Sun, 26 Apr 2026 21:57:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47915102</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47915102</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47915102</guid></item><item><title><![CDATA[New comment by throwdbaaway in "An update on recent Claude Code quality reports"]]></title><description><![CDATA[
<p>Should be about 10~20 GiB per session. Save/restore is exactly what DeepSeek does using its 3FS distributed filesystem: <a href="https://github.com/deepseek-ai/3fs#3-kvcache" rel="nofollow">https://github.com/deepseek-ai/3fs#3-kvcache</a><p>With this much cheaper setup backed by disks, they can offer a much better caching experience:<p>> Cache construction takes seconds. Once the cache is no longer in use, it will be automatically cleared, usually within a few hours to a few days.</p>
]]></description><pubDate>Thu, 23 Apr 2026 21:54:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47882623</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47882623</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47882623</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Qwen3.6-35B-A3B: Agentic coding power, now open to all"]]></title><description><![CDATA[
<p>Based on the earlier release schedule of 3.5, my optimistic take is that they distill the small models from the 397B, and it is much faster to distill a sparse A3B model. Hopefully the other variants will be released in the coming days.</p>
]]></description><pubDate>Thu, 16 Apr 2026 22:58:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47800594</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47800594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47800594</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Does Gas Town 'steal' usage from users' LLM credits to improve itself?"]]></title><description><![CDATA[
<p>His Vibe Coding book is invaluable as a textbook example of slop.</p>
]]></description><pubDate>Wed, 15 Apr 2026 23:23:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47786670</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47786670</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47786670</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Pro Max 5x quota exhausted in 1.5 hours despite moderate usage"]]></title><description><![CDATA[
<p><a href="https://github.com/anthropics/claude-code/issues/46829#issuecomment-4231266649" rel="nofollow">https://github.com/anthropics/claude-code/issues/46829#issue...</a> - Have you checked with your colleague? (and his AI, of course)</p>
]]></description><pubDate>Sun, 12 Apr 2026 15:28:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47740856</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47740856</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47740856</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Research-Driven Agents: When an agent reads before it codes"]]></title><description><![CDATA[
<p>> EC2 instances on shared hardware showed up to 30% variance between runs due to noisy neighbors.<p>Based on this finding, I suppose the better way is to rely on local hardware whenever possible?</p>
]]></description><pubDate>Fri, 10 Apr 2026 03:45:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47713368</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47713368</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47713368</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Research-Driven Agents: When an agent reads before it codes"]]></title><description><![CDATA[
<p>Very nice TG improvement from Flash Attention KQ fusion. Is it something that was already done in ik_llama.cpp? If not, then it will be a welcome addition for hybrid CPU/GPU inference.</p>
]]></description><pubDate>Fri, 10 Apr 2026 02:59:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47713094</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47713094</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47713094</guid></item><item><title><![CDATA[New comment by throwdbaaway in "GLM-5.1: Towards Long-Horizon Tasks"]]></title><description><![CDATA[
<p><a href="https://github.com/THUDM/IndexCache" rel="nofollow">https://github.com/THUDM/IndexCache</a> - There might be some expected issues when rolling this out. They don't have enough compute, and have to innovate.</p>
]]></description><pubDate>Tue, 07 Apr 2026 21:19:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47681493</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47681493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47681493</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Can I run AI locally?"]]></title><description><![CDATA[
<p>90% of what you pay in agentic coding is for cached reads, which are free with local inference serving one user. This has been well known in r/LocalLLaMA for ages, and an article about it also hit the HN front page a few weeks ago.</p>
]]></description><pubDate>Sat, 14 Mar 2026 08:12:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47374424</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47374424</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47374424</guid></item><item><title><![CDATA[New comment by throwdbaaway in "No, it doesn't cost Anthropic $5k per Claude Code user"]]></title><description><![CDATA[
<p>What about the VRAM requirement for KV cache? That may matter more than memory bandwidth. With these GPUs, there is more compute capacity than memory bandwidth, and more memory bandwidth than VRAM.<p>DeepSeek got MLA, and then DSA. Qwen got gated delta-net. These inventions allow efficient inference both at home and at scale. If Anthropic got nothing here, then their inference cost can be much higher.<p>DeepSeek also got <a href="https://github.com/deepseek-ai/3FS" rel="nofollow">https://github.com/deepseek-ai/3FS</a> that makes cached reads a lot cheaper with a way longer TTL. If Anthropic didn't need to invent anything and instead uses some expensive solution like Redis, as indicated by the crappy TTL, then that also contributes to higher inference cost.</p>
]]></description><pubDate>Wed, 11 Mar 2026 14:49:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47336346</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47336346</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47336346</guid></item><item><title><![CDATA[New comment by throwdbaaway in "How to run Qwen 3.5 locally"]]></title><description><![CDATA[
<p>Yours is the only benchmark that puts 35B A3B above 27B. Time for human judgement to verify? For example, if you look at the thinking traces, there might be logical inconsistencies in the prompts, which then tripped up the 27B more when reasoning. This will also be reflected in the score when thinking is disabled, but we can sort of debug with the thinking traces.</p>
]]></description><pubDate>Sun, 08 Mar 2026 11:31:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47296501</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47296501</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47296501</guid></item><item><title><![CDATA[New comment by throwdbaaway in "How to run Qwen 3.5 locally"]]></title><description><![CDATA[
<p>Using ik_llama.cpp to run a 27B 4bpw quant on an RTX 3090, I get 1312 tok/s PP and 40.7 tok/s TG at zero context, dropping to 1009 tok/s PP and 36.2 tok/s TG at 40960 context.<p>35B A3B is faster but didn't do too well in my limited testing.</p>
]]></description><pubDate>Sun, 08 Mar 2026 09:01:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47295741</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47295741</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47295741</guid></item><item><title><![CDATA[New comment by throwdbaaway in "How to run Qwen 3.5 locally"]]></title><description><![CDATA[
<p>There are Qwen3.5 27B quants in the range of 4 bits per weight, which fit into 16G of VRAM. The quality is comparable to Sonnet 4.0 from summer 2025. Inference speed is very good with ik_llama.cpp, and still decent with mainline llama.cpp.</p>
]]></description><pubDate>Sun, 08 Mar 2026 07:02:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47295236</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47295236</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47295236</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>I don't quite get the low temperature coupled with the high penalty. We get thinking loops due to the low temperature, and we then counter them with a high penalty. That seems backward.<p>For Qwen3.5 27B, I got good results with --temp 1.0 --top-p 1.0 --top-k 40 --min-p 0.2, without penalty. It allows the model to explore (temp, top-p, top-k) without going off the rails (min-p) during reasoning. No loops so far.</p>
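<p>To make the interaction of those four settings concrete, here is a minimal Python sketch of the filtering they describe. It is not llama.cpp's or ik_llama.cpp's actual code: the function names, the order in which the filters are applied, and the toy logits are all assumptions for illustration. The one rule it relies on is the usual definition of min-p, which keeps a token only if its probability is at least min_p times the probability of the most likely token.</p>

```python
import math
import random


def filter_probs(logits, temp=1.0, top_k=40, top_p=1.0, min_p=0.2):
    """Return {token_index: probability} after top-k, top-p and min-p filtering.

    Hypothetical helper for illustration; the filter order is an assumption.
    """
    # Temperature-scaled softmax (subtract the max logit for numerical stability).
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # top-k: keep only the k most likely tokens.
    probs = probs[:top_k]

    # top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # min-p: drop tokens whose probability is below min_p * (top probability).
    floor = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= floor]

    # Renormalise the survivors so they sum to 1.
    mass = sum(p for p, _ in kept)
    return {i: p / mass for p, i in kept}


def sample(logits, rng=random, **kwargs):
    """Draw one token index from the filtered distribution."""
    dist = filter_probs(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -3.0]  # made-up values, not from a real model
    print(filter_probs(toy_logits))
    print(sample(toy_logits))
```

<p>With --top-p 1.0 the top-p step keeps everything that survived top-k, so in this sketch min-p is the filter that actually trims the unlikely tail, while temperature 1.0 leaves the model's own distribution untouched.</p>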
]]></description><pubDate>Sun, 01 Mar 2026 04:32:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47203738</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47203738</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47203738</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>We are all reasonable people here, and while you are (mostly) correct, I think we can all agree that Anthropic's documentation sucks. If I have to infer from the doc:<p>* Haiku 4.5 by default doesn't think, i.e. it has a default thinking budget of 0.<p>* By setting a non-zero thinking budget, Haiku 4.5 can think. My guess is that Claude Code may set this differently for different tasks, e.g. thinking for Explore, no thinking for Compact.<p>* This hybrid thinking is different from the adaptive thinking introduced in Opus 4.6, which, when enabled, can automatically adjust the thinking level based on task difficulty.</p>
]]></description><pubDate>Sun, 01 Mar 2026 04:12:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47203617</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47203617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47203617</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>For 27B, just get a used 3090 and hop on to r/LocalLLaMA. You can run a 4bpw quant at full context with Q8 KV cache.</p>
]]></description><pubDate>Sun, 01 Mar 2026 03:47:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47203488</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47203488</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47203488</guid></item><item><title><![CDATA[New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"]]></title><description><![CDATA[
<p>I would say 27B matches Sonnet 4.0, while 397B A17B matches Opus 4.1. They are indeed nowhere near Sonnet 4.5, but getting 262144 context length at good speed with modest hardware is huge for local inference.<p>Will check your updated ranking on Monday.</p>
]]></description><pubDate>Sun, 01 Mar 2026 03:02:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47203209</link><dc:creator>throwdbaaway</dc:creator><comments>https://news.ycombinator.com/item?id=47203209</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47203209</guid></item></channel></rss>