Hacker News: throwdbaaway

New comment by throwdbaaway in "DeepSeek makes the V4 Pro price discount permanent"

throwdbaaway — Sat, 23 May 2026 01:28:45 +0000

And their disk-based caching is amazing. I got a long 700k context session spanning more than a week, with pauses in between that was longer than a day, and some rewinds mixed in as well.

Stats from pi:

↑400k ↓438k R432M 71.9%/1.0M

Half a billion tokens, $2.12

New comment by throwdbaaway in "A few words on DS4"

throwdbaaway — Fri, 15 May 2026 07:04:12 +0000

Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP.

New comment by throwdbaaway in "An AI agent deleted our production database. The agent's confession is below"

throwdbaaway — Mon, 27 Apr 2026 01:23:11 +0000

Huh that's not what I gathered from the tweet at all. If I am going to write a five why's analysis, the immediate cause is the LLM wrongly decided to delete a volume, while the root cause is the bad design to co-locate staging and production data in the same volume. The writing was quite vague though, let's wait for a response from railway.

New comment by throwdbaaway in "An AI agent deleted our production database. The agent's confession is below"

throwdbaaway — Sun, 26 Apr 2026 21:57:55 +0000

If I understand correctly, both the staging database and the production database share the same volume. Thus, production data was gone as well after deleting the volume.

1st hint - the API call only contains one volume:

    curl -X POST https://backboard.railway.app/graphql/v2 \
      -H "Authorization: Bearer [token]" \
      -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'

2nd hint - this gem from the tweet:

> No "this volume contains production data, are you sure?"

New comment by throwdbaaway in "An update on recent Claude Code quality reports"

throwdbaaway — Thu, 23 Apr 2026 21:54:16 +0000

Should be about 10~20 GiB per session. Save/restore is exactly what DeepSeek does using its 3FS distributed filesystem: https://github.com/deepseek-ai/3fs#3-kvcache

With this much cheaper setup backed by disks, they can offer much better caching experience:

> Cache construction takes seconds. Once the cache is no longer in use, it will be automatically cleared, usually within a few hours to a few days.

New comment by throwdbaaway in "Qwen3.6-35B-A3B: Agentic coding power, now open to all"

throwdbaaway — Thu, 16 Apr 2026 22:58:20 +0000

Based on the release schedule of 3.5 previously, my optimistic take is that they distill the small models from the 397B, and it is much faster to distill a sparse A3B model. Hopefully the other variants will be released in the coming days.

New comment by throwdbaaway in "Does Gas Town 'steal' usage from users' LLM credits to improve itself?"

throwdbaaway — Wed, 15 Apr 2026 23:23:09 +0000

His Vibe Coding book is invaluable as a textbook example of slop.

New comment by throwdbaaway in "Pro Max 5x quota exhausted in 1.5 hours despite moderate usage"

throwdbaaway — Sun, 12 Apr 2026 15:28:10 +0000

https://github.com/anthropics/claude-code/issues/46829#issue... - Have you checked with your colleague? (and his AI, of course)

New comment by throwdbaaway in "Research-Driven Agents: When an agent reads before it codes"

throwdbaaway — Fri, 10 Apr 2026 03:45:10 +0000

> EC2 instances on shared hardware showed up to 30% variance between runs due to noisy neighbors.

Based on this finding, I suppose the better way is to rely on local hardware whenever possible?

New comment by throwdbaaway in "Research-Driven Agents: When an agent reads before it codes"

throwdbaaway — Fri, 10 Apr 2026 02:59:11 +0000

Very nice TG improvement from Flash Attention KQ fusion. Is it something that was already done in ik_llama.cpp? If not, then it will be a welcomed addition for hybrid CPU/GPU inference.

New comment by throwdbaaway in "GLM-5.1: Towards Long-Horizon Tasks"

throwdbaaway — Tue, 07 Apr 2026 21:19:02 +0000

https://github.com/THUDM/IndexCache - Might be some expected issue when rolling out this. They don't have enough compute, and have to innovate.

New comment by throwdbaaway in "Can I run AI locally?"

throwdbaaway — Sat, 14 Mar 2026 08:12:23 +0000

90% of what you pay in agentic coding is for cached reads, which are free with local inference serving one user. This is well known in r/LocalLLaMA for ages, and an article about this also hit HN front page few weeks ago.

New comment by throwdbaaway in "No, it doesn't cost Anthropic $5k per Claude Code user"

throwdbaaway — Wed, 11 Mar 2026 14:49:57 +0000

What about the VRAM requirement for KV cache? That may matter more than memory bandwidth. With these GPUs, there are more compute capacity than memory bandwidth than VRAM.

DeepSeek got MLA, and then DSA. Qwen got gated delta-net. These inventions allow efficient inference both at home and at scale. If Anthropic got nothing here, then their inference cost can be much higher.

DeepSeek also got https://github.com/deepseek-ai/3FS that makes cached reads a lot cheaper with way longer TTL. If Anthropic didn't need to invent and uses some expensive solution like Redis, as indicated by the crappy TTL, then that also contributes to higher inference cost.

New comment by throwdbaaway in "How to run Qwen 3.5 locally"

throwdbaaway — Sun, 08 Mar 2026 11:31:58 +0000

Yours is the only benchmark that puts 35B A3B above 27B. Time for human judgement to verify? For example, if you look at the thinking traces, there might be logical inconsistencies in the prompts, which then tripped up the 27B more when reasoning. This will also be reflected in the score when thinking is disabled, but we can sort of debug with the thinking traces.

New comment by throwdbaaway in "How to run Qwen 3.5 locally"

throwdbaaway — Sun, 08 Mar 2026 09:01:04 +0000

Using ik_llama.cpp to run a 27B 4bpw quant on a RTX 3090, I get 1312 tok/s PP and 40.7 tok/s TG at zero context, dropping to 1009 tok/s PP and 36.2 tok/s TG at 40960 context.

35B A3B is faster but didn't do too well in my limited testing.

New comment by throwdbaaway in "How to run Qwen 3.5 locally"

throwdbaaway — Sun, 08 Mar 2026 07:02:01 +0000

There are Qwen3.5 27B quants in the range of 4 bits per weight, which fits into 16G of VRAM. The quality is comparable to Sonnet 4.0 from summer 2025. Inference speed is very good with ik_llama.cpp, and still decent with mainline llama.cpp.

New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"

throwdbaaway — Sun, 01 Mar 2026 04:32:04 +0000

I don't quite get the low temperature coupled with the high penalty. We get thinking loop due to low temperature, and we then counter it with high penalty. That seems backward.

For Qwen3.5 27B, I got good result with --temp 1.0 --top-p 1.0 --top-k 40 --min-p 0.2, without penalty. It allows the model to explore (temp, top-p, top-k) without going off the rail (min-p) during reasoning. No loop so far.

New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"

throwdbaaway — Sun, 01 Mar 2026 04:12:08 +0000

We are all reasonable people here, and while you are (mostly) correct, I think we can all agree that Anthropic documentation sucks. If I have to infer from the doc:

* Haiku 4.5 by default doesn't think, i.e. it has a default thinking budget of 0.

* By setting a non-zero thinking budget, Haiku 4.5 can think. My guess is that Claude Code may set this differently for different tasks, e.g. thinking for Explore, no thinking for Compact.

* This hybrid thinking is different from the adaptive thinking introduced in Opus 4.6, which when enabled, can automatically adjust the thinking level based on task difficulty.

New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"

throwdbaaway — Sun, 01 Mar 2026 03:47:23 +0000

For 27B, just get a used 3090 and hop on to r/LocalLLaMA. You can run a 4bpw quant at full context with Q8 KV cache.

New comment by throwdbaaway in "Qwen3.5 122B and 35B models offer Sonnet 4.5 performance on local computers"

throwdbaaway — Sun, 01 Mar 2026 03:02:21 +0000

I would say 27B matches with Sonnet 4.0, while 397B A17B matches with Opus 4.1. They are indeed nowhere near Sonnet 4.5, but getting 262144 context length at good speed with modest hardware is huge for local inference.

Will check your updated ranking on Monday.