Hacker News: Palmik

New comment by Palmik in "DeepSeek makes the V4 Pro price discount permanent"

Palmik — Sat, 23 May 2026 08:25:06 +0000

DeepSeek V4's KV cache is very efficient due to its heavily compressed and sparse attention architecture.

DeepSeek V3.2, which uses only DSA (sparse attention, without the compression from HCA and CSA), is a smaller model but uses 10x more memory at a 1M context window than DS V4 Pro.

Also, I have to say, DeepSeek's API has a very good cache hit rate. With the same workload, I see ~80% KV cache hit rate with the DS API vs ~50% with the major western inference providers for open weight models.
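To put the hit-rate difference in cost terms, here is a rough sketch of the blended input price at 80% vs 50% cache hits. The two prices are hypothetical placeholders, not any provider's actual list prices, and `effective_input_price` is just an illustrative helper name.

```python
def effective_input_price(hit_rate, price_hit, price_miss):
    """Blended price per 1M input tokens for a given KV cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cached tokens are billed at the hit price, the rest at the miss price.
    return hit_rate * price_hit + (1.0 - hit_rate) * price_miss


# Hypothetical prices in USD per 1M input tokens (assumptions, not real quotes).
PRICE_MISS = 1.00
PRICE_HIT = 0.10

high = effective_input_price(0.80, PRICE_HIT, PRICE_MISS)
low = effective_input_price(0.50, PRICE_HIT, PRICE_MISS)

print(f"80% hit rate: ${high:.2f} per 1M input tokens")
print(f"50% hit rate: ${low:.2f} per 1M input tokens")
```

With a 10x gap between hit and miss pricing, the same workload costs roughly half as much on input at 80% hits as at 50%. The exact ratio depends on the real price gap.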

New comment by Palmik in "DeepSeek makes the V4 Pro price discount permanent"

Palmik — Sat, 23 May 2026 08:18:51 +0000

I really hope Huawei ramps up Ascend production and DeepSeek open sources their optimized inference engine (they already open source a lot of their kernels -- kudos to them). This could shake things up.

New comment by Palmik in "DeepSeek makes the V4 Pro price discount permanent"

Palmik — Sat, 23 May 2026 08:17:16 +0000

There are several things at play:

Inference stack efficiency: Many of these providers take off-the-shelf sglang / vllm / trtllm and hope for the best. Meanwhile, the DeepSeek team is known for pushing the boundary of optimizations.

Now, sglang and vllm are great pieces of software, but take DeepSeek's Sparse Attention (DSA). Introduced 1.5 years ago (https://arxiv.org/abs/2512.02556), used by DeepSeek V3.2, GLM 5, DeepSeek V4. Only now is it slowly starting to get optimized in the major inference engines: (https://github.com/sgl-project/sglang/issues/19380 https://github.com/sgl-project/sglang/pull/22851 etc.). Of course, DS V4 adds extra optimizations to the model architecture on top of DSA, and it will take more time for the open source inference engines to take full advantage of those.

Privacy: Providers are betting that people will pay extra for inference hosted outside China. This is especially true with DeepSeek, because DeepSeek is transparent about using API data for model improvements.

And a few other things: scale (matters a lot for MoEs), reliability, soft enterprise lock-in, etc.

---

There is also, likely, tacit collusion at play here. Look at GLM 5 and GLM 5.1 prices. GLM 5 and 5.1 cost the same to run, but providers decided to charge much more for 5.1 because it is a much better model, and because Z.AI raised their price as well.

House Committees Probe Cursor Parent, Airbnb over Chinese AI

Palmik — Wed, 06 May 2026 08:20:09 +0000

Article URL: https://www.semafor.com/article/04/29/2026/house-committee-probes-cursor-parent-airbnb-over-chinese-ai

Comments URL: https://news.ycombinator.com/item?id=48033664

Points: 2

# Comments: 0

New comment by Palmik in "DeepSeek V4 – almost on the frontier"

Palmik — Sun, 03 May 2026 05:50:27 +0000

Why was the title changed from "DeepSeek V4—almost on the frontier, a fraction of the price" to "DeepSeek V4—almost on the frontier"?

New comment by Palmik in "Anthropic Joins the Blender Development Fund as Corporate Patron"

Palmik — Tue, 28 Apr 2026 19:55:27 +0000

Surely art also exists in the textual realm.

Anthropic Claude Code HERMES.md billing flaw

Palmik — Tue, 28 Apr 2026 07:39:14 +0000

Article URL: https://consumerrights.wiki/w/Anthropic_Claude_Code_HERMES.md_billing_flaw

Comments URL: https://news.ycombinator.com/item?id=47931492

Points: 1

# Comments: 0

New comment by Palmik in "DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles"

Palmik — Sun, 26 Apr 2026 05:40:42 +0000

I don't think "friendly" and "publishing benchmarks" are at odds with each other.

Model makers (both open and closed weight) typically publish benchmarks against other models and when they do not, people rightfully call them out.

Including a comparison against "other OSS engine" is just not helpful (what if it's a sandbagged baseline like HF Transformers?).

New comment by Palmik in "DeepSeek-V4 on Day 0: From Fast Inference to Verified RL with SGLang and Miles"

Palmik — Sun, 26 Apr 2026 04:48:34 +0000

Benchmarks from InferenceX (they do not have apples-to-apples setups to compare the different engines for whatever reason): https://inferencex.semianalysis.com/inference?i_hc=1&g_model...

I find it odd that sglang, vLLM, TRTLLM don't seem to want to publish benchmarks comparing themselves against each other. They used to, but now there seems to be some unspoken rule against it.

At least we get a comparison against "other OSS engine" this time, but that could be HF's Transformers as well :)

DeepSeek V4 in vLLM: Efficient Long-Context Attention

Palmik — Sat, 25 Apr 2026 15:02:53 +0000

Article URL: https://vllm-website-pdzeaspbm-inferact-inc.vercel.app/blog/deepseek-v4

Comments URL: https://news.ycombinator.com/item?id=47902025

Points: 2

# Comments: 0

New comment by Palmik in "DeepSeek v4"

Palmik — Fri, 24 Apr 2026 09:58:26 +0000

Or there will be DSv4.1/2/3 ;)

New comment by Palmik in "ChatGPT Images 2.0"

Palmik — Wed, 22 Apr 2026 20:47:01 +0000

Misleading conclusion.

This model is 8 times cheaper than Gemini for 1K images. Gemini is extremely overpriced.

A 1K image is roughly $0.08 with Gemini and only $0.01 with GPT Image.
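The arithmetic behind the "8 times cheaper" figure, as a quick sketch using the rough per-image prices quoted above (the 1,000-image batch size is just an illustrative assumption):

```python
# Approximate per-image prices for a 1K image, as quoted in the comment (USD).
gemini_price = 0.08
gpt_image_price = 0.01

# How many times more expensive Gemini is per image.
ratio = gemini_price / gpt_image_price

# Cost of a hypothetical batch of 1,000 images on each service.
batch = 1000
gemini_batch_cost = gemini_price * batch
gpt_image_batch_cost = gpt_image_price * batch

print(f"Gemini costs about {ratio:.0f}x as much per 1K image")
print(f"{batch} images: ${gemini_batch_cost:.2f} vs ${gpt_image_batch_cost:.2f}")
```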

New comment by Palmik in "ChatGPT Images 2.0"

Palmik — Wed, 22 Apr 2026 16:38:52 +0000

Did you enable thinking for your experiment? Are you sure you were on version 2.0 rather than 1.5?

New comment by Palmik in "ChatGPT Images 2.0"

Palmik — Wed, 22 Apr 2026 08:47:54 +0000

I do not think this is a good prompt or useful benchmark, but nonetheless, it seems to work better for me: https://chatgpt.com/share/69e88a94-ded8-8395-b5dc-abceb2f44d...