Hacker News: jchandra

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Tue, 21 Apr 2026 17:27:32 +0000

Yeah, that's consistent. Top-K keeps the obvious tokens, but subtle context gets eroded over time rather than dropped all at once.

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Tue, 21 Apr 2026 17:25:27 +0000

Fair point; the gap isn't huge in that plot, and both degrade at low ratios. The difference is more in how they degrade: Top-K can have sharper, localized failures, while HAE tends to degrade more smoothly. That doesn't always show up strongly in average MSE.
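A toy illustration of why average MSE can hide this, using made-up per-query errors rather than results from the prototype: two error profiles with the same mean but very different worst cases.

```python
from statistics import mean

# Made-up per-query squared errors (illustrative only, not measured data).
profiles = {
    "sharp": [0.0, 0.0, 0.0, 0.4],   # fine on most queries, one large miss
    "smooth": [0.1, 0.1, 0.1, 0.1],  # uniformly slightly off
}

for name, errors in profiles.items():
    # Same mean, different worst case, so report both.
    print(f"{name}: mean={mean(errors):.2f} max={max(errors):.2f}")
```

Both profiles print mean=0.10, but the worst case is 0.40 vs 0.10; reporting a max or a high percentile next to the mean would surface the difference.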

That said, the gains are modest right now. This is still a research prototype exploring the tradeoff, and there's clearly more work to be done.

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Tue, 21 Apr 2026 17:16:15 +0000

Thanks, really appreciate the pointer. Will dig into it.

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Tue, 21 Apr 2026 16:11:16 +0000

Haha, that’s a very fair reading :)

Yeah, the latency hit is definitely real. That said, most of what I've run so far is CPU-bound, which likely exaggerates it quite a bit, so I didn't want to draw strong conclusions from that.

Would need proper GPU implementations to really understand where it lands.

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Tue, 21 Apr 2026 16:08:23 +0000

I completely agree. Right now this is all on a synthetic setup to isolate the behavior and understand the reconstruction vs. memory tradeoff. Real models will definitely behave differently.

I've started trying this out with actual models, but I'm currently running things CPU-bound, so it's pretty slow. I'd ideally want to try this properly on a GPU, but that gets expensive quickly.

So yeah, still very much a research prototype — but validating this on real models/data is definitely the next step.

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Tue, 21 Apr 2026 15:57:16 +0000

That's a great point, and yeah, I'd agree SVD itself isn't new at all.

On downsides: definitely a few. The biggest one is latency: SVD is fairly heavy, so even though it's amortized (it runs periodically, not per token), it still adds noticeable overhead. It's also more complex than simple pruning, and I haven't yet validated how well this holds up on real downstream tasks.

This is very much a research prototype right now, more about exploring a different tradeoff space than building something ready for production.

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Sun, 19 Apr 2026 11:57:37 +0000

In this prototype, OLS + SVD isn't per-token; it runs only when the recycle bin fills, so its cost is amortized over multiple tokens.
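A minimal sketch of the amortization idea, assuming evicted tokens are buffered until a fixed capacity is reached and then handed to one expensive summarization call. `RecycleBin`, `capacity`, and `summarize` are hypothetical names, not the prototype's code, and the summarizer here only records batch sizes.

```python
from typing import Callable, List


class RecycleBin:
    """Buffers evicted KV entries and runs a costly summarizer only when full."""

    def __init__(self, capacity: int, summarize: Callable[[List[int]], None]):
        self.capacity = capacity
        self.summarize = summarize  # hypothetical stand-in for the heavy OLS + SVD step
        self.items: List[int] = []
        self.flushes = 0

    def add(self, token_id: int) -> None:
        self.items.append(token_id)
        if len(self.items) >= self.capacity:
            self.summarize(self.items)  # one heavy call per `capacity` evictions
            self.items = []             # start a fresh buffer
            self.flushes += 1


batch_sizes: List[int] = []
recycle_bin = RecycleBin(capacity=64, summarize=lambda batch: batch_sizes.append(len(batch)))
for token_id in range(1000):  # simulate 1,000 evictions
    recycle_bin.add(token_id)
print(recycle_bin.flushes, len(recycle_bin.items))  # 15 40
```

With capacity B the heavy step runs once per B evictions, so its average per-token cost is roughly cost(summarize)/B, but each flush is still a latency spike rather than a steady per-token cost.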

That said, it’s still heavier than Top-K. I haven’t benchmarked end-to-end latency yet; this is mainly exploring the accuracy vs memory tradeoff.

New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

jchandra — Sun, 19 Apr 2026 11:36:37 +0000

I’ve been exploring KV cache optimization for LLM inference.

Most methods (Top-K, sliding window) prune tokens. This works on average, but fails selectively — a few tokens cause large errors when removed.
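A minimal Top-K pruning sketch for reference, assuming tokens are ranked by the total attention they receive from a batch of queries (one common heuristic; real methods differ in how they score). `topk_prune` is a hypothetical helper and the data is random.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def topk_prune(Q, K, V, k):
    """Keep the k cached tokens that receive the most total attention from Q."""
    P = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (queries, tokens)
    mass = P.sum(axis=0)                        # attention each token received
    keep = np.sort(np.argsort(mass)[-k:])       # top-k indices, in original order
    return K[keep], V[keep], keep


rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, 16)) for n in (8, 128, 128))
K_k, V_k, keep = topk_prune(Q, K, V, k=32)
full = softmax(Q @ K.T / 4.0) @ V          # sqrt(16) = 4
pruned = softmax(Q @ K_k.T / 4.0) @ V_k
print(keep.shape, float(np.mean((full - pruned) ** 2)))
```

A token with small total mass can still dominate one particular query; dropping it gives a large error for that query even when the average looks fine.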

I tried reframing the problem as approximating the attention function: Attn(Q, K, V)
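For reference, the standard scaled dot-product attention (with $d_k$ the key dimension), and one way to state the reframed goal in my own phrasing, not necessarily the article's: find a smaller cache $\tilde K, \tilde V$ with at most $B$ rows ($B$ being a hypothetical memory budget) whose attention output stays close to the full one for the queries $Q$.

```latex
\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad
\min_{\tilde K, \tilde V} \;
\bigl\lVert \mathrm{Attn}(Q, K, V) - \mathrm{Attn}(Q, \tilde K, \tilde V) \bigr\rVert_F^2
\quad \text{s.t. } \operatorname{rows}(\tilde K) = \operatorname{rows}(\tilde V) \le B
```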

Prototype:
- entropy → identify weak tokens
- OLS → reconstruct their contribution
- SVD → compress them
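The three steps can be sketched in NumPy under assumptions of my own, as one possible reading rather than the prototype's implementation. Tokens are scored by their mean contribution -p log p to the attention entropy over a batch of probe queries, and the lowest-scoring ones are treated as weak. SVD of the weak keys supplies r summary keys (rescaled to the weak keys' mean norm, an arbitrary choice). OLS then picks the summary values so that, on the probe queries, the compressed cache reproduces the part of the full attention output that the kept tokens no longer supply. SVD runs before OLS here because the summary keys must exist before their values can be fitted. `summarize_cache`, `keep`, and `r` are hypothetical names.

```python
import numpy as np


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attn(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V


def summarize_cache(Q, K, V, keep, r):
    """Toy entropy / SVD / OLS summarizer returning a cache of keep + r slots.

    Q holds probe queries; requires 0 < keep < len(K) and r <= min(len(K) - keep, d).
    """
    d = K.shape[1]
    P = softmax(Q @ K.T / np.sqrt(d))  # (queries, tokens)

    # 1) Entropy: mean per-token term -p*log(p) of each query's attention entropy.
    score = -(P * np.log(P + 1e-12)).mean(axis=0)
    order = np.argsort(score)  # ascending: weakest tokens first
    weak, strong = np.sort(order[:-keep]), np.sort(order[-keep:])

    # 2) SVD: top-r right singular vectors of the weak keys become summary keys,
    #    rescaled to the weak keys' mean norm (an arbitrary choice).
    _, _, Vt = np.linalg.svd(K[weak], full_matrices=False)
    K_sum = Vt[:r] * np.linalg.norm(K[weak], axis=1).mean()

    # 3) OLS: fit summary values so the new cache matches the full output
    #    on the probe queries, with the strong tokens kept as-is.
    K_new = np.vstack([K[strong], K_sum])
    P_new = softmax(Q @ K_new.T / np.sqrt(d))
    residual = attn(Q, K, V) - P_new[:, :keep] @ V[strong]
    V_sum, *_ = np.linalg.lstsq(P_new[:, keep:], residual, rcond=None)
    return K_new, np.vstack([V[strong], V_sum])


rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, 16)) for n in (64, 256, 256))
K_new, V_new = summarize_cache(Q, K, V, keep=24, r=8)  # 32 slots instead of 256
mse = float(np.mean((attn(Q, K, V) - attn(Q, K_new, V_new)) ** 2))
print(K_new.shape, mse)
```

Because the OLS fit uses the same probe queries it is evaluated on, this in-sample error is optimistic; a fair comparison with Top-K would hold out fresh queries at the same slot budget (keep + r).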

Early results show lower error than Top-K at low memory budgets, and in some cases lower total memory as well.

This is still a small research prototype; I'd appreciate feedback or pointers to related work.

High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction

jchandra — Sun, 19 Apr 2026 11:35:23 +0000

Article URL: https://jchandra.com/posts/hae-ols/

Comments URL: https://news.ycombinator.com/item?id=47823549

Points: 64

# Comments: 17

New comment by jchandra in "Hyperparameter Tuning Is a Resource Scheduling Problem"

jchandra — Sun, 04 May 2025 14:22:10 +0000

Totally fair point — at the end of the day, it's all about getting the best model performance. I was mostly trying to highlight how, under the hood, a lot of modern HPO algos really boil down to smart scheduling decisions.
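One concrete instance of the scheduling view is successive halving, the subroutine inside Hyperband: it hands a small budget to every configuration, keeps the best 1/eta, and gives the survivors eta times more budget. This sketch uses a made-up `toy_loss` (learning rate vs. epochs) instead of real training.

```python
from typing import Callable, List, Sequence, Tuple


def successive_halving(
    configs: Sequence[float],
    evaluate: Callable[[float, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> Tuple[float, int]:
    """Return (best config, total budget spent); lower evaluate() is better."""
    survivors: List[float] = list(configs)
    budget, spent = min_budget, 0
    while len(survivors) > 1:
        # Evaluate every survivor at the current budget and rank them.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        spent += budget * len(survivors)  # resources handed out on this rung
        # Keep the best 1/eta and give them eta times more budget next rung.
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0], spent


def toy_loss(lr: float, epochs: int) -> float:
    """Made-up objective: best lr is 0.1; more epochs lowers the loss."""
    return (lr - 0.1) ** 2 + 1.0 / epochs


print(successive_halving([0.001, 0.01, 0.1, 1.0], toy_loss))  # (0.1, 8)
```

The run spends 4 budget units on the first rung (4 configs at budget 1) and 4 on the second (2 configs at budget 2), so deciding how much to hand out and when to cut is exactly the scheduling decision.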

Hyperparameter Tuning Is a Resource Scheduling Problem

jchandra — Sun, 04 May 2025 14:14:28 +0000

Article URL: https://jchandra.com/posts/hyperparameter-optimisation/

Comments URL: https://news.ycombinator.com/item?id=43886817

Points: 2

# Comments: 3

New comment by jchandra in "AI Supply Chain Attack: How Malicious Pickle Files Backdoor Models"

jchandra — Fri, 21 Mar 2025 15:41:06 +0000

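Pickle is still good for custom objects (JSON loses the class, and with it the methods), graphs and circular references (JSON breaks on them), and functions (essential for ML and distributed systems, though standard pickle stores functions by reference, so lambdas need an extension such as cloudpickle or dill). It also comes out of the box.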
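A quick standard-library check of these points. One caveat worth seeing in code: standard pickle serializes functions by reference (module plus qualified name), so a lambda fails to pickle; cloudpickle or dill are the usual tools when functions must travel by value. The helper names are hypothetical.

```python
import json
import pickle
from fractions import Fraction


def pickle_keeps_cycle() -> bool:
    """A node that points at itself survives a pickle round trip, identity intact."""
    node = {"name": "a", "neighbors": []}
    node["neighbors"].append(node)  # circular reference
    restored = pickle.loads(pickle.dumps(node))
    return restored["neighbors"][0] is restored


def json_rejects_cycle() -> bool:
    node = {"name": "a", "neighbors": []}
    node["neighbors"].append(node)
    try:
        json.dumps(node)
    except ValueError:  # "Circular reference detected"
        return True
    return False


def pickle_keeps_type() -> bool:
    """pickle restores a Fraction; json.dumps refuses it with TypeError."""
    value = Fraction(1, 3)
    restored = pickle.loads(pickle.dumps(value))
    try:
        json.dumps(value)
        json_ok = True
    except TypeError:
        json_ok = False
    return type(restored) is Fraction and restored == value and not json_ok


def pickle_rejects_lambda() -> bool:
    """Standard pickle stores functions by qualified name, so lambdas fail."""
    try:
        pickle.dumps(lambda x: x + 1)
    except (pickle.PicklingError, AttributeError):  # exact type varies by version
        return True
    return False


print(pickle_keeps_cycle(), json_rejects_cycle(), pickle_keeps_type(), pickle_rejects_lambda())
```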

AI Supply Chain Attack: How Malicious Pickle Files Backdoor Models

jchandra — Thu, 20 Mar 2025 17:55:46 +0000

Article URL: https://jchandra.com/posts/python-pickle/

Comments URL: https://news.ycombinator.com/item?id=43426583

Points: 4

# Comments: 7

New comment by jchandra in "How Pickle Files Backdoor AI Models"

jchandra — Sat, 15 Mar 2025 17:20:59 +0000

PyTorch's save/load are still pickle-based. That's fine for trusted sources, but once you start loading models from untrusted sources there is always a risk of arbitrary code execution (ACE). If you need to load such a model, I'd suggest doing it in a sandboxed environment such as Docker, a VM, or an online notebook environment; the other option is to inspect the model file first.
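A sketch of the "inspect the model file" option using only the standard library's `pickletools`, which walks the opcode stream without unpickling anything. It assumes a raw pickle byte string; a checkpoint written by `torch.save` is normally a zip archive with a `data.pkl` entry inside, which you would extract first (and where PyTorch is available, `torch.load(..., weights_only=True)` restricts what can be unpickled). `scan_pickle` and `Evil` are hypothetical names, and this is a heuristic: ordinary class instances also import globals and use call opcodes, so a hit means "review it", and strings re-read from the memo are not tracked, so imports can be missed.

```python
import os
import pickle
import pickletools
from typing import List, Tuple

# Opcodes that call a callable while unpickling (benign objects use them too).
CALL_OPCODES = {"REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}
# Opcodes that push a str; STACK_GLOBAL consumes the last two as module and name.
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def scan_pickle(data: bytes) -> Tuple[List[str], bool]:
    """List globals a pickle would import and whether it calls anything."""
    imports: List[str] = []
    strings: List[str] = []
    calls = False
    for opcode, arg, _pos in pickletools.genops(data):  # static walk, nothing runs
        if opcode.name in STRING_OPCODES:
            strings.append(arg)
        elif opcode.name == "GLOBAL":  # protocols 0-3: arg is "module name"
            imports.append(arg.replace(" ", "."))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:  # protocol 4+
            imports.append(f"{strings[-2]}.{strings[-1]}")
        if opcode.name in CALL_OPCODES:
            calls = True
    return imports, calls


class Evil:  # hypothetical payload: unpickling it would run a shell command
    def __reduce__(self):
        return (os.system, ("echo pwned",))


print(scan_pickle(pickle.dumps({"weights": [0.1, 0.2]})))  # ([], False)
print(scan_pickle(pickle.dumps(Evil())))  # lists the os.system import and flags a call
```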

As open-source AI booms, the risk of supply-chain attacks also increases.

New comment by jchandra in "How Pickle Files Backdoor AI Models"

jchandra — Sat, 15 Mar 2025 17:07:14 +0000

joblib is not fully secure either, because it still relies on pickle internally. Importing joblib itself doesn't execute anything, but joblib.load runs the same unpickling machinery as pickle.load, so a malicious file executes its payload as soon as it is loaded with either one.

How Pickle Files Backdoor AI Models

jchandra — Sat, 15 Mar 2025 16:57:37 +0000

Article URL: https://jchandra.com/posts/python-pickle/

Comments URL: https://news.ycombinator.com/item?id=43373711

Points: 6

# Comments: 6

New comment by jchandra in "We built a modern data stack from scratch and reduced our bill by 70%"

jchandra — Mon, 10 Mar 2025 05:06:34 +0000

New comment by jchandra in "We built a modern data stack from scratch and reduced our bill by 70%"

jchandra — Mon, 10 Mar 2025 05:03:32 +0000

Our approach wasn't about over-engineering; we were trying to leverage our existing investments (like Confluent BYOC) while optimizing for flexibility, cost, and performance. We wanted to stay loosely coupled so we could adapt to cloud restrictions across multiple geographic deployments.

New comment by jchandra in "We built a Modern Data Stack from scratch and reduced our bill by 70%"

jchandra — Sun, 09 Mar 2025 18:55:33 +0000

We did have a discussion on self-managed vs. managed services and the TCO associated with each:

1. We have a multi-regional setup, so data sovereignty requirements came up.
2. Vendor lock-in: a few of the services were not available in that geographic region.
3. With managed services, you often pay for capacity you might not always use. Our workloads were often consistent and predictable, so self-managed solutions helped us fine-tune our resources.
4. One of the goals was to keep our storage and compute loosely coupled while staying Iceberg-compatible for flexibility. Whether it's Trino today or Snowflake/Databricks tomorrow, we aren't locked in.

New comment by jchandra in "We built a modern data stack from scratch and reduced our bill by 70%"

jchandra — Sun, 09 Mar 2025 18:44:41 +0000

As for BigQuery, while it's a great tool, we faced challenges with high-volume, small queries, where costs became unpredictable because it is priced by the volume of data scanned. Clustered tables and materialized views helped to some extent, but they didn't fully mitigate the overhead for our specific workloads. There are certainly ways to work around and optimize this, so I wouldn't exactly blame GBQ or call it a limitation.
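A rough cost model for the small-query problem, assuming on-demand billing by bytes scanned, a list price of $6.25 per TiB, and a 10 MiB minimum billed per query. All three are assumptions to check against current BigQuery pricing (the real minimum applies per table referenced, and prices vary by region); `on_demand_cost` is a hypothetical helper and the workload is made up.

```python
TIB = 2 ** 40  # bytes per TiB


def on_demand_cost(
    bytes_scanned: int,
    price_per_tib: float = 6.25,    # assumed on-demand list price (USD); check current pricing
    min_bytes: int = 10 * 2 ** 20,  # assumed minimum billed per query (10 MiB)
) -> float:
    """Estimate one query's on-demand cost when billing is by bytes scanned."""
    billed = max(bytes_scanned, min_bytes)  # small scans are billed at the minimum
    return billed / TIB * price_per_tib


# Hypothetical workload: 200,000 small queries a day, each scanning about 1 MiB.
monthly = 30 * 200_000 * on_demand_cost(2 ** 20)
print(f"estimated monthly cost: ${monthly:,.2f}")
```

Under these assumptions a query that scans 1 MiB is billed as if it scanned 10 MiB, so for many tiny queries the minimum rather than the data drives the bill, and clustering or materialized views can only shrink scans down to that floor.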

It's always a trade-off, and we made the call that best fit our scale, workloads, and long-term plans.