<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News - Newest: &#34;LLM&#34;</title><link>https://news.ycombinator.com/newest</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 20 Jun 2026 21:30:13 +0000</lastBuildDate><atom:link href="https://hnrss.org/newest?q=LLM" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[My self-hosted local LLM server setup]]></title><description><![CDATA[
<p>Article URL: <a href="https://old.reddit.com/r/LocalLLM/comments/1ub1iu2/my_selfhosted_llm_server_setup_to_access_open">https://old.reddit.com/r/LocalLLM/comments/1ub1iu2/my_selfhosted_llm_server_setup_to_access_open</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48611821">https://news.ycombinator.com/item?id=48611821</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Sat, 20 Jun 2026 18:48:49 +0000</pubDate><link>https://old.reddit.com/r/LocalLLM/comments/1ub1iu2/my_selfhosted_llm_server_setup_to_access_open</link><dc:creator>onthemarkdata</dc:creator><comments>https://news.ycombinator.com/item?id=48611821</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48611821</guid></item><item><title><![CDATA[Hand-powered LLM (YouTube) [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=HSapdLYpmWY">https://www.youtube.com/watch?v=HSapdLYpmWY</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48610738">https://news.ycombinator.com/item?id=48610738</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 20 Jun 2026 16:52:17 +0000</pubDate><link>https://www.youtube.com/watch?v=HSapdLYpmWY</link><dc:creator>mcchen51</dc:creator><comments>https://news.ycombinator.com/item?id=48610738</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48610738</guid></item><item><title><![CDATA[Changes that cut our LLM pipeline costs more than model-switching did]]></title><description><![CDATA[
<p>I have been building multiple LLM systems and for our Organization biggest cost savings weren't from prompt-wordsmithing or model switchings.
Sharing useful to anyone watching their token bill :<p>1) JSON → TOON for structured output:
JSON was not made for LLMs. well you can implement your own verison that fits for your needs that reduce tokens usage but what worked for us was TOON. TOON cut output our tokens by ~30% same information, way less syntax tax.<p>2) Full markdown/HTML → condensed markdown:
Using markdown for writing your prompts, getting intermediate results or communication between your Agents eats a lot of tokens. We swithced to condesed markdown and short system prompts that replicate Caveman. this alone cut just on input token costs ~50% on calls that pass prior context forward which can be implemented between Agent Calls.<p>3) Long Do/Don't instruction lists → 2-3 multi-shot examples:
Counterintuitive one - replacing a large lists of DO's and Don'ts for agents rules don't help. rather couple of concrete examples that convers major and all cases actually improved output quality more reliably and it's usually fewer tokens once the instruction list gets long enough to cover real edge cases.<p>I have seen most people on this sub reddit talk about using open-source or cheaper models. Like we were spending thousands of dollar's but this all changes alone helped reduce cost by 60%.<p>edit: Open to Discussion, anyone whether something similar would help their setup.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48608978">https://news.ycombinator.com/item?id=48608978</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 20 Jun 2026 13:05:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48608978</link><dc:creator>Abbas_Maka</dc:creator><comments>https://news.ycombinator.com/item?id=48608978</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48608978</guid></item><item><title><![CDATA[Show HN: Local automation runner with built-in LLM steps – YAML pipelines]]></title><description><![CDATA[
<p>Article URL: <a href="https://rorlikowski.github.io/stepyard/">https://rorlikowski.github.io/stepyard/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48607721">https://news.ycombinator.com/item?id=48607721</a></p>
<p>Points: 5</p>
<p># Comments: 2</p>
]]></description><pubDate>Sat, 20 Jun 2026 09:21:46 +0000</pubDate><link>https://rorlikowski.github.io/stepyard/</link><dc:creator>rorlikowski</dc:creator><comments>https://news.ycombinator.com/item?id=48607721</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48607721</guid></item><item><title><![CDATA[Show HN: FERNme – agent memory that updates with ~zero LLM calls]]></title><description><![CDATA[
<p>Hi HN,
FERNme is a memory layer for AI agents. Most systems (Mem0, etc.) call an LLM on every turn to extract/summarize memory, or dump everything into a vector DB. I wanted to see how far a cheaper, more brain-like approach goes.
Each user is a node in a fuzzy preference graph; edges (0–9, plus explicit negative edges for dislikes) update with a Hebbian co-occurrence rule — no LLM on the write path — decay ACT-R-style, and compile to a flat, ~40-token "memory card." Retrieval is spreading activation, not vector search.
Some early results (all reproducible in the repo): ingesting 86 free-form diary entries about one person ran with 0 write-time LLM calls, kept a flat ~40-token card, and on a LoCoMo-style QA set the context-seeded retrieval answered ~4× more questions than frequency/recency baselines at equal budget — and was the only LLM-free method that handled preference drift.
Honest about what this is not: the Hebbian + spreading-activation mechanism isn't novel — HippoRAG, Ori-Mnemos, and HeLa-Mem all use versions of it. My bet is on the combination: near-zero-cost writes, multi-tenant + a privacy-preserving population prior, user-owned cross-surface memory, and evaluating on actions rather than QA. Benchmarks so far are synthetic or single-person; a real LLM-memory (Mem0) head-to-head needs API keys and isn't run yet. It's MCP-compatible, and the repo has a paper draft (PAPER.md) plus a reproducible demo (demo/elena/ — builds memory from 86 free-form diary entries and runs the QA benchmark).
Code: github.com/mirkofr/FERNme
I'd love feedback on: the memory representation, whether the action-coupled eval is meaningful, failure modes you'd expect, and benchmarks/competitors I should test against — especially a real Mem0 head-to-head. Criticism very welcome.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48607574">https://news.ycombinator.com/item?id=48607574</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 20 Jun 2026 08:55:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48607574</link><dc:creator>mirkofr</dc:creator><comments>https://news.ycombinator.com/item?id=48607574</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48607574</guid></item><item><title><![CDATA[Ask HN: Will we start seeing tools for LLM use?]]></title><description><![CDATA[
<p>Many projects and addons exists to reduce the verbosity on standard bash / git /npm etc. Commands that agents pass regularly as tools to LLMs.  (eg. rtk, headroom, lean-ctx).  Tool output compression does yield good token savings. Though it can lead to increased turns - effectively nullifying the per turn token savings. This is the current topology.  Are we going to see a class of products and libraries that structure o/p to what models want to see?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48606997">https://news.ycombinator.com/item?id=48606997</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Sat, 20 Jun 2026 06:58:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=48606997</link><dc:creator>bonigv</dc:creator><comments>https://news.ycombinator.com/item?id=48606997</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48606997</guid></item><item><title><![CDATA[Compress tool outputs, logs, files, RAG chunks before LLM for 60-95% less tokens]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/chopratejas/headroom">https://github.com/chopratejas/headroom</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48606411">https://news.ycombinator.com/item?id=48606411</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 20 Jun 2026 04:45:42 +0000</pubDate><link>https://github.com/chopratejas/headroom</link><dc:creator>gmays</dc:creator><comments>https://news.ycombinator.com/item?id=48606411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48606411</guid></item><item><title><![CDATA[I am dreading our LLM-written incident report future]]></title><description><![CDATA[
<p>Article URL: <a href="https://surfingcomplexity.blog/2026/06/19/i-am-dreading-our-llm-written-incident-report-future/">https://surfingcomplexity.blog/2026/06/19/i-am-dreading-our-llm-written-incident-report-future/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48605136">https://news.ycombinator.com/item?id=48605136</a></p>
<p>Points: 6</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 20 Jun 2026 00:50:52 +0000</pubDate><link>https://surfingcomplexity.blog/2026/06/19/i-am-dreading-our-llm-written-incident-report-future/</link><dc:creator>azhenley</dc:creator><comments>https://news.ycombinator.com/item?id=48605136</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48605136</guid></item><item><title><![CDATA[Show HN: slash-agent – Native LLM copilot for your terminal]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/akatzmann/slash-agent">https://github.com/akatzmann/slash-agent</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48602690">https://news.ycombinator.com/item?id=48602690</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 20:07:55 +0000</pubDate><link>https://github.com/akatzmann/slash-agent</link><dc:creator>akatzmann</dc:creator><comments>https://news.ycombinator.com/item?id=48602690</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48602690</guid></item><item><title><![CDATA[Pipeline-parallel LLM inference across GPUs on separate machines]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/leyten/shard">https://github.com/leyten/shard</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48602121">https://news.ycombinator.com/item?id=48602121</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 19:14:34 +0000</pubDate><link>https://github.com/leyten/shard</link><dc:creator>ngaut</dc:creator><comments>https://news.ycombinator.com/item?id=48602121</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48602121</guid></item><item><title><![CDATA[LLM Quantization Project Part 1: What Even Is an LLM?]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.lttlabs.com/articles/2026/06/19/llm-quantization-part-1-what-even-is-an-llm">https://www.lttlabs.com/articles/2026/06/19/llm-quantization-part-1-what-even-is-an-llm</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48600968">https://news.ycombinator.com/item?id=48600968</a></p>
<p>Points: 4</p>
<p># Comments: 2</p>
]]></description><pubDate>Fri, 19 Jun 2026 17:36:17 +0000</pubDate><link>https://www.lttlabs.com/articles/2026/06/19/llm-quantization-part-1-what-even-is-an-llm</link><dc:creator>LabsLucas</dc:creator><comments>https://news.ycombinator.com/item?id=48600968</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48600968</guid></item><item><title><![CDATA[AI-gateway product that cuts LLM API TOKEN costs by 40-70%]]></title><description><![CDATA[

<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48600462">https://news.ycombinator.com/item?id=48600462</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Fri, 19 Jun 2026 16:44:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=48600462</link><dc:creator>arnab777</dc:creator><comments>https://news.ycombinator.com/item?id=48600462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48600462</guid></item><item><title><![CDATA[Show HN: Wyolet Relay – high throughput, open source LLM router]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/wyolet/relay">https://github.com/wyolet/relay</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48598419">https://news.ycombinator.com/item?id=48598419</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 13:36:00 +0000</pubDate><link>https://github.com/wyolet/relay</link><dc:creator>aaliboyev</dc:creator><comments>https://news.ycombinator.com/item?id=48598419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48598419</guid></item><item><title><![CDATA[The LLM industry must keep the RAM prices at absurd levels]]></title><description><![CDATA[
<p>Article URL: <a href="https://infosec.exchange/@masek/116775772309957886">https://infosec.exchange/@masek/116775772309957886</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48597189">https://news.ycombinator.com/item?id=48597189</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 11:02:04 +0000</pubDate><link>https://infosec.exchange/@masek/116775772309957886</link><dc:creator>ndr42</dc:creator><comments>https://news.ycombinator.com/item?id=48597189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48597189</guid></item><item><title><![CDATA[Self-adapting and mutating LLM based viruses/worms]]></title><description><![CDATA[
<p>I am thinking about a future of malware and cyber worms. I bet it's gonna be self-mutating and adapting to local environment using local models (once they are built-in to all devices and performant enough in future years). Basically almost a real organism resembling real biological viruses. In this case the non-determinism of LLMs is a feature. Every infection could take its own development path - and half might die, half might survive. Think genetic programming but autonomous and on steroids. For some non tech (even tech?) people this reminds Skynet and it's fascinating that we are in a trajectory that this suddenly imaginable and theoretically soon possible.<p>Why is not happening now? Inference is still expensive and local models are not there yet, so there's no ROI in making this at scale. But once inference is local and cheap as electricity or running water, this is the natural development. How do we stop the spreading then?<p>Are there already some documented experiments?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48597160">https://news.ycombinator.com/item?id=48597160</a></p>
<p>Points: 3</p>
<p># Comments: 4</p>
]]></description><pubDate>Fri, 19 Jun 2026 10:58:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48597160</link><dc:creator>rozumbrada</dc:creator><comments>https://news.ycombinator.com/item?id=48597160</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48597160</guid></item><item><title><![CDATA[Show HN: I built an 11-LLM consensus engine to detect AI hallucination]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/jaquelinejaque/quorum-saas-starter">https://github.com/jaquelinejaque/quorum-saas-starter</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48596771">https://news.ycombinator.com/item?id=48596771</a></p>
<p>Points: 6</p>
<p># Comments: 5</p>
]]></description><pubDate>Fri, 19 Jun 2026 09:53:50 +0000</pubDate><link>https://github.com/jaquelinejaque/quorum-saas-starter</link><dc:creator>jaquelinejaque</dc:creator><comments>https://news.ycombinator.com/item?id=48596771</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48596771</guid></item><item><title><![CDATA[How to Drive an LLM]]></title><description><![CDATA[
<p>Article URL: <a href="https://home.robusta.dev/blog/how-to-drive-an-llm">https://home.robusta.dev/blog/how-to-drive-an-llm</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48596318">https://news.ycombinator.com/item?id=48596318</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 08:40:45 +0000</pubDate><link>https://home.robusta.dev/blog/how-to-drive-an-llm</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=48596318</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48596318</guid></item><item><title><![CDATA[What 'Getting Your Hands Dirty' Means at LLM-Era]]></title><description><![CDATA[
<p>Article URL: <a href="https://carette.xyz/posts/the_mud_and_the_mind/">https://carette.xyz/posts/the_mud_and_the_mind/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48596273">https://news.ycombinator.com/item?id=48596273</a></p>
<p>Points: 8</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 19 Jun 2026 08:33:50 +0000</pubDate><link>https://carette.xyz/posts/the_mud_and_the_mind/</link><dc:creator>maarcel93</dc:creator><comments>https://news.ycombinator.com/item?id=48596273</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48596273</guid></item><item><title><![CDATA[My suitcase robot gets high off a real gas sensor wired into the LLM sampler]]></title><description><![CDATA[
<p>Article URL: <a href="https://old.reddit.com/r/LocalLLaMA/comments/1u9a17y/my_suitcase_robot_gets_high_now_off_a_real_gas/">https://old.reddit.com/r/LocalLLaMA/comments/1u9a17y/my_suitcase_robot_gets_high_now_off_a_real_gas/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48595528">https://news.ycombinator.com/item?id=48595528</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 06:39:48 +0000</pubDate><link>https://old.reddit.com/r/LocalLLaMA/comments/1u9a17y/my_suitcase_robot_gets_high_now_off_a_real_gas/</link><dc:creator>thunderbong</dc:creator><comments>https://news.ycombinator.com/item?id=48595528</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48595528</guid></item><item><title><![CDATA[The comfortable slow boil of LLM assisted coding]]></title><description><![CDATA[
<p>Article URL: <a href="https://01max.io/blog/a-comfortable-slow-boil/">https://01max.io/blog/a-comfortable-slow-boil/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48595253">https://news.ycombinator.com/item?id=48595253</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 05:52:02 +0000</pubDate><link>https://01max.io/blog/a-comfortable-slow-boil/</link><dc:creator>maxime_</dc:creator><comments>https://news.ycombinator.com/item?id=48595253</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48595253</guid></item></channel></rss>