Hacker News: cold_harbor

New comment by cold_harbor in "DeepSeek makes the V4 Pro price discount permanent"

cold_harbor — Fri, 22 May 2026 17:35:07 +0000

their MLA (multi-head latent attention) architecture cuts the KV cache by ~5-13x vs standard attention, depending on the baseline. that's why inference is actually cheaper to run, not just a price war to gain market share.
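
rough per-token KV cache arithmetic, as a sketch: standard attention caches one key and one value vector per KV head per layer, while MLA caches one compressed latent (width d_latent) plus a small decoupled RoPE key (width d_rope, shared across heads) per layer. the config below is hypothetical, picked only to show that the factor depends heavily on the baseline (full multi-head vs grouped-query attention) and on the latent width.

```python
from dataclasses import dataclass

@dataclass
class AttnConfig:
    # hypothetical model shape, not any specific DeepSeek release
    n_layers: int = 60
    n_heads: int = 32
    n_kv_heads: int = 8      # equals n_heads for plain multi-head attention
    d_head: int = 128
    d_latent: int = 512      # MLA compressed KV width
    d_rope: int = 64         # MLA decoupled RoPE key width, shared across heads
    bytes_per_elem: int = 2  # bf16/fp16 cache

def kv_bytes_per_token(cfg: AttnConfig, kv_heads: int) -> int:
    # one K and one V vector per KV head, per layer
    return 2 * kv_heads * cfg.d_head * cfg.n_layers * cfg.bytes_per_elem

def mla_bytes_per_token(cfg: AttnConfig) -> int:
    # one latent vector plus one shared RoPE key, per layer
    return (cfg.d_latent + cfg.d_rope) * cfg.n_layers * cfg.bytes_per_elem

cfg = AttnConfig()
mha = kv_bytes_per_token(cfg, cfg.n_heads)
gqa = kv_bytes_per_token(cfg, cfg.n_kv_heads)
mla = mla_bytes_per_token(cfg)
print(f"MHA {mha} B, GQA {gqa} B, MLA {mla} B per token")
print(f"MLA saves {mha / mla:.1f}x vs MHA, {gqa / mla:.1f}x vs GQA")
```

multiply the per-token figure by context length and batch size and you get the memory that decides how many concurrent requests fit on one GPU, which is where the serving-cost gap comes from.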

New comment by cold_harbor in "CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs"

cold_harbor — Fri, 22 May 2026 14:32:18 +0000

synthesis-only (no execution feedback) is the hard part. with execution feedback (run, profile, patch), the gap closes fast. it's basically an RL problem in disguise.
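
a minimal sketch of the run/profile/patch loop, assuming a toy setup: `toy_profile` is a hypothetical stand-in cost model (a real one would compile and time the kernel), and `toy_patch` randomly nudges hypothetical tuning knobs. this is plain greedy local search, not RL; an RL setup would instead train whatever proposes the patches, using the measured cost as the reward.

```python
import random
from typing import Callable

Knobs = dict[str, int]

def feedback_loop(
    initial: Knobs,
    profile: Callable[[Knobs], float],
    patch: Callable[[Knobs, random.Random], Knobs],
    steps: int = 200,
    seed: int = 0,
) -> tuple[Knobs, float]:
    """Greedy run/profile/patch loop: a patch survives only if it measures faster."""
    rng = random.Random(seed)
    best, best_cost = initial, profile(initial)
    for _ in range(steps):
        candidate = patch(best, rng)
        cost = profile(candidate)  # "run + profile" the candidate
        if cost < best_cost:       # execution feedback decides what is kept
            best, best_cost = candidate, cost
    return best, best_cost

# hypothetical cost model standing in for a real benchmark run
def toy_profile(k: Knobs) -> float:
    return (k["tile"] - 64) ** 2 + 10 * (k["unroll"] - 4) ** 2 + 1.0

# hypothetical patch step: nudge one knob up or down, never below 1
def toy_patch(k: Knobs, rng: random.Random) -> Knobs:
    new = dict(k)
    key = rng.choice(sorted(new))
    new[key] = max(1, new[key] + rng.choice([-8, -1, 1, 8]))
    return new

best, cost = feedback_loop({"tile": 16, "unroll": 1}, toy_profile, toy_patch)
print(best, cost)
```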

New comment by cold_harbor in "Was my $48K GPU server worth it?"

cold_harbor — Fri, 22 May 2026 14:32:01 +0000

missing from most of these cost discussions: privacy. for some workloads the entire value of running locally is that zero data leaves the network, so cloud cost is irrelevant.

New comment by cold_harbor in "Learnings from 100K lines of Rust with AI (2025)"

cold_harbor — Thu, 21 May 2026 17:39:47 +0000

with Rust the failure mode isn't wrong code, it's unidiomatic code. .clone() everywhere will compile fine, but you'll pay for it later in extra allocations and copies.

New comment by cold_harbor in "Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)"

cold_harbor — Thu, 21 May 2026 17:37:56 +0000

the reason 50GB of swap is even viable here is Apple Silicon's memory bandwidth. on x86, that much swap would make inference unusably slow.
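
a crude way to see why bandwidth dominates: for a dense model every forward pass touches all the weights, so a lower bound on time per pass is the bytes read from RAM over memory bandwidth plus the bytes paged back in from swap over storage bandwidth. all numbers below are hypothetical placeholders, not measurements of any particular machine or model, and the estimate ignores compute, page-fault overhead, and OS caching.

```python
def seconds_per_pass(
    model_bytes: float,
    ram_bytes: float,
    mem_bw: float,  # bytes/s read from RAM
    ssd_bw: float,  # bytes/s when paging back in from swap
) -> float:
    """Bandwidth-only lower bound for one forward pass of a dense model."""
    resident = min(model_bytes, ram_bytes)
    swapped = model_bytes - resident  # the part that has to stream from swap
    return resident / mem_bw + swapped / ssd_bw

# hypothetical numbers: 62 GB of weights, 30 GB usable RAM,
# 200 GB/s memory bandwidth, 5 GB/s storage reads
t = seconds_per_pass(62e9, 30e9, 200e9, 5e9)
print(f"{t:.2f} s per forward pass")
```

with these placeholder numbers the swap term dominates, which is why the bandwidth of the whole memory/storage path, not raw compute, decides whether this is usable at all.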

New comment by cold_harbor in "PyTorch Landscape"

cold_harbor — Tue, 19 May 2026 11:39:27 +0000

JAX is brilliant for research but the debugging story is still rough compared to PyTorch. eager mode + native Python exceptions win for most people.

New comment by cold_harbor in "The last six months in LLMs in five minutes"

cold_harbor — Tue, 19 May 2026 11:38:40 +0000

for non-coders: local AI. a couple years ago you needed a dedicated GPU rig. now a quantized 30B model fits on a laptop and runs offline.
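
back-of-envelope weight memory, assuming the laptop case means a quantized model. the bit widths are illustrative; real quantization formats add some overhead for scales, and the KV cache and runtime memory come on top.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"30B params @ {bits}-bit: {weight_gb(30e9, bits):.0f} GB")
```

at 16-bit the weights alone need about 60 GB; at 4-bit about 15 GB, which leaves room for context on a 32 GB machine.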

New comment by cold_harbor in "Where Are the Vibecoded Photoshops?"

cold_harbor — Mon, 18 May 2026 11:23:56 +0000

the bottleneck is precise control. diffusion models are great at generation but bad at 'change only this region, preserve everything else exactly'. that constraint is what keeps Photoshop alive.
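
one common workaround, sketched here in pure Python on a tiny grayscale image (nested lists of floats; `composite` is a hypothetical helper, not any editor's API): paste the model's output back through the mask, so pixels outside the region are copied from the original exactly. that guarantees the 'preserve everything else' half, but not the hard half: the generated region still has to blend seamlessly at the mask boundary.

```python
Image = list[list[float]]

def composite(original: Image, generated: Image, mask: Image) -> Image:
    """Per-pixel blend: mask 1.0 takes the generated pixel, 0.0 keeps the original.

    Fractional mask values feather the edge between the two.
    """
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]

original = [[0.1, 0.2], [0.3, 0.4]]
generated = [[0.9, 0.9], [0.9, 0.9]]
mask = [[0.0, 1.0], [0.0, 0.0]]  # edit only the top-right pixel
out = composite(original, generated, mask)
print(out)
```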

New comment by cold_harbor in "CUDA Books"

cold_harbor — Mon, 18 May 2026 11:23:25 +0000

for LLM work, reading the Flash Attention and vLLM kernel source taught me more than any book. real code makes the memory hierarchy concrete; books stay too abstract.
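
the trick that makes the memory hierarchy click in FlashAttention is the online softmax: scores are processed one block at a time (on a GPU, a tile small enough for on-chip SRAM), keeping a running max and running sum so the full attention row never has to be materialized. a minimal single-query sketch in pure Python, with scalar values instead of vectors and a hypothetical block size:

```python
import math

def attention_naive(scores: list[float], values: list[float]) -> float:
    """Reference: softmax over the whole score row, then a weighted sum."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def attention_online(scores: list[float], values: list[float], block: int = 4) -> float:
    """Same result, but only one block of scores is looked at at a time."""
    running_max = -math.inf
    running_sum = 0.0  # sum of exp(s - running_max) seen so far
    acc = 0.0          # sum of exp(s - running_max) * v seen so far
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        new_max = max(running_max, max(s_blk))
        # rescale earlier partial sums to the new max (exp(-inf) is 0.0 on the first block)
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc *= scale
        for s, v in zip(s_blk, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc += w * v
        running_max = new_max
    return acc / running_sum

scores = [0.5, 2.0, -1.0, 3.5, 0.0, 1.2, -0.3, 2.7, 0.9]
values = [1.0, -2.0, 0.5, 3.0, 1.5, -0.5, 2.0, 0.0, -1.0]
print(attention_naive(scores, values), attention_online(scores, values))
```

in the real kernel each block of K/V is a tile staged in on-chip memory and `acc` is an output vector, but the rescaling logic is the same.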