<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: LuxBennu</title><link>https://news.ycombinator.com/user?id=LuxBennu</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 02 Jul 2026 03:33:04 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=LuxBennu" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Show HN: CtxGov – see what instructions your AI agent inherits before it runs]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/ctxgov/ctxgov">https://github.com/ctxgov/ctxgov</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48678976">https://news.ycombinator.com/item?id=48678976</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 25 Jun 2026 20:47:38 +0000</pubDate><link>https://github.com/ctxgov/ctxgov</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=48678976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48678976</guid></item><item><title><![CDATA[Show HN: CtxGov – drop in AI memory files, get an influence-boundary report]]></title><description><![CDATA[
<p>Article URL: <a href="https://ctxgov.github.io/ctxgov/memory-state-influence-boundary-try-in-5-minutes.html">https://ctxgov.github.io/ctxgov/memory-state-influence-boundary-try-in-5-minutes.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48573815">https://news.ycombinator.com/item?id=48573815</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 17 Jun 2026 17:37:46 +0000</pubDate><link>https://ctxgov.github.io/ctxgov/memory-state-influence-boundary-try-in-5-minutes.html</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=48573815</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48573815</guid></item><item><title><![CDATA[Show HN: CtxGov – a local claim firewall for AI memory claims]]></title><description><![CDATA[
<p>Article URL: <a href="https://ctxgov.github.io/ctxgov/try-in-5-minutes.html">https://ctxgov.github.io/ctxgov/try-in-5-minutes.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48482262">https://news.ycombinator.com/item?id=48482262</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 10 Jun 2026 20:33:14 +0000</pubDate><link>https://ctxgov.github.io/ctxgov/try-in-5-minutes.html</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=48482262</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48482262</guid></item><item><title><![CDATA[Show HN: CtxVault – receipts for AI context, not another memory store]]></title><description><![CDATA[
<p>Article URL: <a href="https://ctxvault.github.io/ctxvault/">https://ctxvault.github.io/ctxvault/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48149113">https://news.ycombinator.com/item?id=48149113</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 15 May 2026 14:30:03 +0000</pubDate><link>https://ctxvault.github.io/ctxvault/</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=48149113</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48149113</guid></item><item><title><![CDATA[Show HN: CtxVault – local receipts for AI context handoffs]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/ctxvault/ctxvault">https://github.com/ctxvault/ctxvault</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48050048">https://news.ycombinator.com/item?id=48050048</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 07 May 2026 14:40:36 +0000</pubDate><link>https://github.com/ctxvault/ctxvault</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=48050048</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48050048</guid></item><item><title><![CDATA[New comment by LuxBennu in "ChatGPT for Excel"]]></title><description><![CDATA[
<p>ChatGPT for Excel is still an Office add-in running in the same sandbox, though. strongpigeon described the exact bottleneck upthread: process boundary crossings and context.sync() roundtrips that take seconds on web. That's a platform limitation, not a model limitation.
Swapping the AI behind the add-in doesn't fix the fundamental constraint that third-party add-ins can't integrate with Excel's runtime as deeply as a native feature can. If Copilot is bad despite having more access to Excel internals (I don't like how Copilot is designed or implemented, though), an add-in with less access is unlikely to be better.</p>
]]></description><pubDate>Wed, 15 Apr 2026 23:28:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47786718</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47786718</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47786718</guid></item><item><title><![CDATA[Making prompts longer did not help. Making the task contract explicit did]]></title><description><![CDATA[
<p>Article URL: <a href="https://signaldepth.ai/">https://signaldepth.ai/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47778720">https://news.ycombinator.com/item?id=47778720</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 15 Apr 2026 13:31:03 +0000</pubDate><link>https://signaldepth.ai/</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47778720</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47778720</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon"]]></title><description><![CDATA[
<p>Yeah sorry that was unclear on my part. I chunk at the endpoint level, whisper itself obviously processes 30s windows. The memory/latency thing I was referring to is more about processing longer files end to end through the pipeline, not a single whisper pass. My fastapi wrapper just splits the audio and runs chunks sequentially so total wall time scales linearly with file length, nothing fancy.</p>
]]></description><pubDate>Wed, 08 Apr 2026 15:55:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47691954</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47691954</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47691954</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS"]]></title><description><![CDATA[
<p>Oh nice, the pyannote coreml port is interesting. Last time I looked at pyannote it was pytorch only so getting it to run efficiently on apple silicon was kind of a pain. Does the coreml version handle diarization or just activity detection?</p>
]]></description><pubDate>Tue, 07 Apr 2026 22:36:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47682234</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47682234</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47682234</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon"]]></title><description><![CDATA[
<p>Ah that makes sense, quadratic scaling is brutal. So with 96gb i'd probably get somewhere around 4-5k total sequence length before hitting the wall, which is still pretty limiting for anything multimodal. Do you do any gradient checkpointing or is that not worth the speed tradeoff at these sizes?</p>
]]></description><pubDate>Tue, 07 Apr 2026 22:35:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47682225</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47682225</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47682225</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon"]]></title><description><![CDATA[
<p>I run whisper large-v3 on an m2 max 96gb and even with just inference the memory gets tight on longer audio, can only imagine what fine-tuning looks like. Does the 64gb vs 96gb make a meaningful difference for gemma 4 fine-tuning or does it just push the oom wall back a bit? Been wanting to try local fine-tuning on apple silicon but the tooling gap has kept me on inference only so far.</p>
]]></description><pubDate>Tue, 07 Apr 2026 20:28:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680929</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47680929</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680929</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Ghost Pepper – Local hold-to-talk speech-to-text for macOS"]]></title><description><![CDATA[
<p>Yeah that makes sense, chunking on silence would sidestep the latency issue pretty cleanly. I've been running it through a basic fastapi wrapper so it just takes whatever audio blob gets thrown at it, no chunking logic on the server side. Might be worth adding a vad pass before sending to whisper though, would cut down on processing dead air too.</p>
]]></description><pubDate>Mon, 06 Apr 2026 21:47:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47667650</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47667650</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47667650</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Ghost Pepper – 100% local hold-to-talk speech-to-text for macOS"]]></title><description><![CDATA[
<p>I've been running whisper large-v3 on an m2 max through a self-hosted endpoint and honestly the accuracy is good enough that i stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks, like anything over 30 seconds starts feeling sluggish even with metal acceleration. Haven't tried whisperkit specifically but curious how it handles longer audio compared to the full model.</p>
]]></description><pubDate>Mon, 06 Apr 2026 20:48:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47666857</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47666857</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47666857</guid></item><item><title><![CDATA[New comment by LuxBennu in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>that tracks with what i've noticed practically. shorter prompts feel basically the same between llama.cpp metal and what i'd expect from native mlx, but once context gets longer the overhead starts showing up. would be interesting to see if ollama's mlx path actually handles kv cache differently under the hood or if it just skips the buffer sync layer</p>
]]></description><pubDate>Wed, 01 Apr 2026 07:25:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47597922</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47597922</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47597922</guid></item><item><title><![CDATA[New comment by LuxBennu in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>Roughly 8-12 token/s on generation depending on context length. Prompt processing is faster obviously. Haven't benchmarked it super carefully though, just eyeballing the llama.cpp output.</p>
]]></description><pubDate>Wed, 01 Apr 2026 07:21:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47597902</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47597902</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47597902</guid></item><item><title><![CDATA[New comment by LuxBennu in "From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem"]]></title><description><![CDATA[
<p>yeah fair point, it's definitely model dependent. i've had good results with qwen but tried it on a smaller mistral variant once and the output quality dropped noticeably even at q8 for both. the speed hit from mixed types hasn't been bad on apple silicon in my experience but i can see it mattering more on cuda.</p>
]]></description><pubDate>Wed, 01 Apr 2026 07:20:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47597894</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47597894</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47597894</guid></item><item><title><![CDATA[New comment by LuxBennu in "From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem"]]></title><description><![CDATA[
<p>good overview of the architecture side but worth mentioning there's another axis that stacks on top of all of this: you can quantize the kv cache itself at inference time. in llama.cpp you can run q8 for keys and q4 for values and it cuts cache memory roughly in half again on top of whatever gqa or mla already saves you. i run qwen 70b 4-bit on m2 max 96gb and the kv quant is what actually made longer contexts fit without running out of unified memory. keys need more precision because they drive attention scores but values are way more tolerant of lossy compression, so the asymmetry works out.</p>
]]></description><pubDate>Tue, 31 Mar 2026 19:29:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47592257</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47592257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47592257</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Reprompt – Analyze what you type into AI tools, not what they output"]]></title><description><![CDATA[
<p>Thanks! Turns out structural signals get you surprisingly far. An LLM catches more, but speed is the feature.</p>
]]></description><pubDate>Tue, 31 Mar 2026 18:23:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47591461</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47591461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47591461</guid></item><item><title><![CDATA[New comment by LuxBennu in "Show HN: Reprompt – Analyze what you type into AI tools, not what they output"]]></title><description><![CDATA[
<p>I ran this on my own prompt history and three things surprised me. I found 3 API keys buried in copy-pasted stack traces (`reprompt privacy`). 35% of my agent sessions had error loops -- the agent retrying the same failing approach 3+ times (`reprompt agent`). And 50-70% of my conversation turns were filler like "ok try that" (`reprompt distill`).<p><pre><code>    pip install reprompt-cli
    reprompt scan && reprompt
</code></pre>
Everything runs locally -- zero network calls, zero telemetry. Also works as an MCP server and GitHub Action.</p>
]]></description><pubDate>Tue, 31 Mar 2026 15:53:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47589246</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47589246</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47589246</guid></item><item><title><![CDATA[Show HN: Reprompt – Analyze what you type into AI tools, not what they output]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/reprompt-dev/reprompt">https://github.com/reprompt-dev/reprompt</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47589133">https://news.ycombinator.com/item?id=47589133</a></p>
<p>Points: 3</p>
<p># Comments: 3</p>
]]></description><pubDate>Tue, 31 Mar 2026 15:46:25 +0000</pubDate><link>https://github.com/reprompt-dev/reprompt</link><dc:creator>LuxBennu</dc:creator><comments>https://news.ycombinator.com/item?id=47589133</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47589133</guid></item></channel></rss>