<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jchandra</title><link>https://news.ycombinator.com/user?id=jchandra</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 26 Jun 2026 08:23:34 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jchandra" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>Yeah, that’s consistent. Top-K keeps the obvious tokens, but subtle context gets eroded over time rather than dropped all at once.</p>
]]></description><pubDate>Tue, 21 Apr 2026 17:27:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47851811</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47851811</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47851811</guid></item><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>Fair point, the gap isn’t huge in that plot, and both degrade at low ratios.
The difference is more in how they degrade: Top-K can have sharper, localized failures, while HAE tends to degrade more smoothly. That doesn’t always show up strongly in average MSE.<p>That said, the gains are modest right now. This is still a research prototype exploring the tradeoff, and there’s clearly more work to be done.</p>
]]></description><pubDate>Tue, 21 Apr 2026 17:25:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47851789</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47851789</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47851789</guid></item><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>Thanks, really appreciate the pointer. Will dig into it.</p>
]]></description><pubDate>Tue, 21 Apr 2026 17:16:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47851656</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47851656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47851656</guid></item><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>Haha, that’s a very fair reading :)<p>Yeah, the latency hit is definitely real. That said, most of what I’ve run so far is CPU-bound, which likely exaggerates it quite a bit, so I didn’t want to draw strong conclusions from that.<p>It would need proper GPU implementations to really understand where it lands.</p>
]]></description><pubDate>Tue, 21 Apr 2026 16:11:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47850823</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47850823</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47850823</guid></item><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>I completely agree. Right now this is all on a synthetic setup to isolate the behavior and understand the reconstruction vs memory tradeoff. Real models will definitely behave differently.<p>I’ve started trying this out with actual models, but I’m currently running everything CPU-bound, so it’s pretty slow. Ideally I’d run this properly on GPU, but that gets expensive quickly.<p>So yeah, this is still very much a research prototype, but validating it on real models/data is definitely the next step.</p>
]]></description><pubDate>Tue, 21 Apr 2026 16:08:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47850776</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47850776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47850776</guid></item><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>That’s a great point, and yeah, I’d agree SVD itself isn’t new at all.<p>On downsides: definitely a few. The biggest one is latency: SVD is fairly heavy, so even though it’s amortized (it runs periodically, not per token), it still adds noticeable overhead. It’s also more complex than simple pruning, and I haven’t validated how well this holds up on real downstream tasks yet.<p>This is very much a research prototype right now, more about exploring a different tradeoff space than something ready for production.</p>
]]></description><pubDate>Tue, 21 Apr 2026 15:57:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47850603</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47850603</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47850603</guid></item><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>In this prototype, OLS + SVD isn’t per-token; it runs only when the recycle bin fills (amortized over multiple tokens).<p>That said, it’s still heavier than Top-K. I haven’t benchmarked end-to-end latency yet; this is mainly exploring the accuracy vs memory tradeoff.</p>
]]></description><pubDate>Sun, 19 Apr 2026 11:57:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47823676</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47823676</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47823676</guid></item><item><title><![CDATA[New comment by jchandra in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>I’ve been exploring KV cache optimization for LLM inference.<p>Most methods (Top-K, sliding window) prune tokens. This works on average, but fails selectively — a few tokens cause large errors when removed.<p>I tried reframing the problem as approximating the attention function: Attn(Q, K, V)<p>Prototype:
- entropy → identify weak tokens  
- OLS → reconstruct their contribution  
- SVD → compress them<p>Early results show lower error than Top-K at low memory budgets, sometimes with lower memory use overall.<p>This is still a small research prototype; I would appreciate feedback or pointers to related work.</p>
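<p>To make the comparison concrete, here is a minimal NumPy sketch, not the prototype itself. It computes Attn(Q, K, V), applies Top-K pruning, and shows how a truncated SVD of the pruned tokens' value block trades rank for reconstruction error. The scoring rule (total attention mass per token), the helper names, and the random data are assumptions made for illustration; the entropy scoring and OLS fit from the post are not reproduced.</p>

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Standard scaled dot-product attention.
    return softmax(Q @ K.T / np.sqrt(K.shape[1])) @ V

def topk_prune(Q, K, V, keep):
    # Assumption: score each cached token by the attention mass it receives.
    weights = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    score = weights.sum(axis=0)
    idx = np.sort(np.argsort(score)[-keep:])
    return K[idx], V[idx], idx

def low_rank(M, rank):
    # Best rank-r approximation of M via truncated SVD.
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 16))    # 8 queries, head dimension 16
K = rng.normal(size=(64, 16))   # 64 cached tokens
V = rng.normal(size=(64, 16))

full = attention(Q, K, V)
Kk, Vk, idx = topk_prune(Q, K, V, keep=16)
topk_mse = float(np.mean((full - attention(Q, Kk, Vk)) ** 2))

# Tokens that Top-K would drop; compress their values instead of discarding.
weak = np.setdiff1d(np.arange(K.shape[0]), idx)
V_weak = V[weak]
err_by_rank = [float(np.linalg.norm(V_weak - low_rank(V_weak, r)))
               for r in (2, 8, 16)]
print(topk_mse, err_by_rank)
```

<p>The reconstruction error of the dropped block shrinks as the rank grows, which is the memory vs accuracy knob that plain pruning does not have.</p>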
]]></description><pubDate>Sun, 19 Apr 2026 11:36:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47823556</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47823556</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47823556</guid></item><item><title><![CDATA[High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction]]></title><description><![CDATA[
<p>Article URL: <a href="https://jchandra.com/posts/hae-ols/">https://jchandra.com/posts/hae-ols/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47823549">https://news.ycombinator.com/item?id=47823549</a></p>
<p>Points: 64</p>
<p># Comments: 17</p>
]]></description><pubDate>Sun, 19 Apr 2026 11:35:23 +0000</pubDate><link>https://jchandra.com/posts/hae-ols/</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=47823549</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47823549</guid></item><item><title><![CDATA[New comment by jchandra in "Hyperparameter Tuning Is a Resource Scheduling Problem"]]></title><description><![CDATA[
<p>Totally fair point — at the end of the day, it's all about getting the best model performance. I was mostly trying to highlight how, under the hood, a lot of modern HPO algos really boil down to smart scheduling decisions.</p>
]]></description><pubDate>Sun, 04 May 2025 14:22:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=43886876</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43886876</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43886876</guid></item><item><title><![CDATA[Hyperparameter Tuning Is a Resource Scheduling Problem]]></title><description><![CDATA[
<p>Article URL: <a href="https://jchandra.com/posts/hyperparameter-optimisation/">https://jchandra.com/posts/hyperparameter-optimisation/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43886817">https://news.ycombinator.com/item?id=43886817</a></p>
<p>Points: 2</p>
<p># Comments: 3</p>
]]></description><pubDate>Sun, 04 May 2025 14:14:28 +0000</pubDate><link>https://jchandra.com/posts/hyperparameter-optimisation/</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43886817</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43886817</guid></item><item><title><![CDATA[New comment by jchandra in "AI Supply Chain Attack: How Malicious Pickle Files Backdoor Models"]]></title><description><![CDATA[
<p>Pickle is still good for custom objects (JSON loses methods and types), graphs and circular references (JSON breaks), and functions (pickled by reference; lambdas need cloudpickle or dill), which are essential for ML and distributed systems. It is also provided out of the box.</p>
]]></description><pubDate>Fri, 21 Mar 2025 15:41:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=43436973</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43436973</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43436973</guid></item><item><title><![CDATA[AI Supply Chain Attack: How Malicious Pickle Files Backdoor Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://jchandra.com/posts/python-pickle/">https://jchandra.com/posts/python-pickle/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43426583">https://news.ycombinator.com/item?id=43426583</a></p>
<p>Points: 4</p>
<p># Comments: 7</p>
]]></description><pubDate>Thu, 20 Mar 2025 17:55:46 +0000</pubDate><link>https://jchandra.com/posts/python-pickle/</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43426583</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43426583</guid></item><item><title><![CDATA[New comment by jchandra in "How Pickle Files Backdoor AI Models"]]></title><description><![CDATA[
<p>PyTorch save/load is still pickle-based. It’s fine for trusted sources, but once you start loading models from untrusted sources there is always a risk of arbitrary code execution (ACE).
If you want to execute one, I’d suggest trying it in a sandboxed environment such as Docker, a VM, or an online notebook. The other option is to inspect the model file first.<p>As open-source AI booms, the risk of supply chain attacks also increases.</p>
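<p>For anyone who hasn’t seen the mechanism, here is a small, harmless sketch using only the standard library. The class names are made up for illustration, and the payload calls sorted() where a real attack would call something like os.system. The second half follows the "restricting globals" pattern from the Python pickle documentation and refuses to resolve any global at all.</p>

```python
import io
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object: "call this callable
    # with these arguments at load time". An attacker controls both.
    def __reduce__(self):
        return (sorted, ([3, 1, 2],))

blob = pickle.dumps(Payload())

# Loading the bytes calls sorted([3, 1, 2]); no Payload object comes back.
result = pickle.loads(blob)

class NoGlobalsUnpickler(pickle.Unpickler):
    # Hypothetical strict policy: block every global lookup.
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

try:
    NoGlobalsUnpickler(io.BytesIO(blob)).load()
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(result, blocked)
```

<p>A blanket block like this also rejects legitimate custom classes, so in practice it would need an allow-list, and it is not a substitute for sandboxing.</p>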
]]></description><pubDate>Sat, 15 Mar 2025 17:20:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=43373871</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43373871</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43373871</guid></item><item><title><![CDATA[New comment by jchandra in "How Pickle Files Backdoor AI Models"]]></title><description><![CDATA[
<p>joblib is not fully secure because it still relies on pickle internally. The reason it is considered slightly better than pickle is that a pickle file’s payload gets executed as soon as the file is loaded, whereas joblib doesn’t execute code just by being imported.</p>
]]></description><pubDate>Sat, 15 Mar 2025 17:07:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=43373777</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43373777</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43373777</guid></item><item><title><![CDATA[How Pickle Files Backdoor AI Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://jchandra.com/posts/python-pickle/">https://jchandra.com/posts/python-pickle/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43373711">https://news.ycombinator.com/item?id=43373711</a></p>
<p>Points: 6</p>
<p># Comments: 6</p>
]]></description><pubDate>Sat, 15 Mar 2025 16:57:37 +0000</pubDate><link>https://jchandra.com/posts/python-pickle/</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43373711</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43373711</guid></item><item><title><![CDATA[New comment by jchandra in "We built a modern data stack from scratch and reduced our bill by 70%"]]></title><description><![CDATA[
<p>2</p>
]]></description><pubDate>Mon, 10 Mar 2025 05:06:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=43317171</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43317171</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43317171</guid></item><item><title><![CDATA[New comment by jchandra in "We built a modern data stack from scratch and reduced our bill by 70%"]]></title><description><![CDATA[
<p>Our approach wasn’t about over-engineering; we were trying to leverage our existing investments (like Confluent BYOC) while optimizing for flexibility, cost, and performance. We wanted to stay loosely coupled so we could adapt to cloud restrictions across multiple geographic deployments.</p>
]]></description><pubDate>Mon, 10 Mar 2025 05:03:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43317158</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43317158</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43317158</guid></item><item><title><![CDATA[New comment by jchandra in "We built a Modern Data Stack from scratch and reduced our bill by 70%"]]></title><description><![CDATA[
<p>We did have a discussion on self-managed vs managed and the TCO associated with each.
1> We have a multi-regional setup, so it came with data sovereignty requirements.
2> Vendor lock-in: a few of the services were not available in that geographic region.
3> With managed services, you often pay for capacity you might not always use. Our workloads were often consistent and predictable, so self-managed solutions helped us fine-tune our resources.
4> One of the goals was to keep our storage and compute loosely coupled while staying Iceberg-compatible for flexibility. Whether it’s Trino today or Snowflake/Databricks tomorrow, we aren’t locked in.</p>
]]></description><pubDate>Sun, 09 Mar 2025 18:55:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43312396</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43312396</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43312396</guid></item><item><title><![CDATA[New comment by jchandra in "We built a modern data stack from scratch and reduced our bill by 70%"]]></title><description><![CDATA[
<p>As for BigQuery, while it's a great tool, we faced challenges with high-volume, small queries, where costs became unpredictable because it is priced by the volume of data scanned. Clustered tables and materialized views helped to some extent, but they didn’t fully mitigate the overhead for our specific workloads. There are certainly ways to work around and optimize this, so I wouldn't exactly blame BigQuery or its limitations.<p>It’s always a trade-off, and we made the call that best fit our scale, workloads, and long-term plans.</p>
]]></description><pubDate>Sun, 09 Mar 2025 18:44:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=43312290</link><dc:creator>jchandra</dc:creator><comments>https://news.ycombinator.com/item?id=43312290</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43312290</guid></item></channel></rss>