<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: LavaDMan</title><link>https://news.ycombinator.com/user?id=LavaDMan</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 13 Apr 2026 12:04:31 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=LavaDMan" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by LavaDMan in "Ask HN: What Are You Working On? (March 2026)"]]></title><description><![CDATA[
<p>Building a self-hosted agentic OS I call AEGIS — Adaptive Execution & Generative Intelligence System. Running on a single workstation with a consumer GPU.
The core idea is a three-tier model cascade: a cloud model handles architecture and review, a local 32B model handles execution and code generation, smaller local models handle evaluation. The cloud model never executes directly — it reviews diffs and approves before anything gets committed.
The interesting problems so far: GPU arbitration across competing inference services using a distributed lock, giving local models read-only access to institutional memory before task execution so they're not flying blind, and autonomous fleet provisioning — last night the system brought up a new server node end to end; after the install USB went in, I never touched it.
Next phase is adding department queues so the system understands context — infrastructure work vs. client consulting work vs. internal tooling — and idle-time priority advisory so it starts anticipating what I need rather than waiting to be asked.
Goal is something closer to Jarvis than a chatbot. Early days but the bones are solid.</p>
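<p>The GPU arbitration piece can be illustrated host-locally with an OS advisory lock. This is a minimal sketch, not the actual AEGIS implementation; the lock-file path and class name are made up for illustration:</p>

```python
import fcntl
import os
import tempfile

# Hypothetical lock file shared by every inference service on this host
LOCK_PATH = os.path.join(tempfile.gettempdir(), "aegis-gpu.lock")

class GPULock:
    """Exclusive advisory lock serializing GPU access across local processes."""

    def __init__(self, path: str = LOCK_PATH):
        # Each service opens its own descriptor on the shared lock file
        self._fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)

    def try_acquire(self) -> bool:
        """Non-blocking attempt; False means another service holds the GPU."""
        try:
            fcntl.flock(self._fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            return False

    def release(self) -> None:
        fcntl.flock(self._fd, fcntl.LOCK_UN)

# Two services contending for the same GPU
a, b = GPULock(), GPULock()
got_a = a.try_acquire()        # GPU was free
got_b = b.try_acquire()        # contended: a still holds it
a.release()
got_b_retry = b.try_acquire()  # succeeds once a releases
```

<p>Note flock only arbitrates within one machine; a truly distributed lock (Postgres advisory locks, Redis, etc.) would be needed once inference services span hosts.</p>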
]]></description><pubDate>Mon, 09 Mar 2026 16:04:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47310860</link><dc:creator>LavaDMan</dc:creator><comments>https://news.ycombinator.com/item?id=47310860</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47310860</guid></item><item><title><![CDATA[Giving local LLMs read-only institutional memory before task execution]]></title><description><![CDATA[
<p>I run a three-tier agentic system: cloud LLM for architecture/review, local 32B model for code generation and execution, smaller models for evaluation. The local model (Qwen2.5-Coder 32B) kept making avoidable mistakes — suggesting approaches that had already failed, ignoring active project context, reinventing solutions we'd already discarded.
The problem isn't capability. It's that each delegation is stateless. The local model gets a task string and nothing else.
The fix: before every local-model call, run a parallel enrichment pipeline and inject the results into the system prompt.
<pre><code>import asyncio

async def fetch_qdrant_hits():
    # Embed the task, search execution_memory for similar prior operations
    # Returns: operation_type, outcome_score, last_action, result summary
    ...

async def fetch_active_mandates():
    # Pull IN_PROGRESS and APPROVED work from Postgres
    ...

async def fetch_pending_horizon():
    # Pull top PENDING items — awareness only, not authorization
    ...

qdrant_hits, mandates, horizon = await asyncio.gather(
    fetch_qdrant_hits(),
    fetch_active_mandates(),
    fetch_pending_horizon(),
    return_exceptions=True,  # enrichment failure never blocks execution
)</code></pre>
The injected block looks like:
<pre><code>---
INSTITUTIONAL MEMORY (read-only, do not modify):

Prior relevant operations:
- [SECURITY] Score:9/10 | Action:detonate_package
  Result: PERC H710 Mini does not support JBOD/Non-RAID on iDRAC 7.
  Used single-drive RAID-0 as workaround.

Active mandates:
- [Phase 15] R720xd Provisioning — APPROVED (priority 8)

Upcoming pipeline (awareness only — not yet authorized):
- [priority 6] Visual & Strategy Audit — PENDING

CONSTRAINTS: Read this context to inform your work. You may NOT update
mandates, write to memory, or modify fleet state. All outputs are
returned as string payloads to the L3 Architect for review and commit.
---</code></pre>
The hard constraint block matters. Without it, a capable local model will try to act on context it shouldn't touch. The read-only boundary is enforced in the prompt, not mechanically — but in practice it holds, because the model's only output channel is a string payload returned to the L3 Architect for review.
Results so far: The hardware-specific mistake that prompted this (local model looping on invalid RAID commands for 20 minutes) wouldn't happen now — the correct workaround is in execution_memory and would surface on the next similar task.
The open question: how do you keep the context window from getting polluted as execution_memory grows? Right now I'm using score_threshold=0.5 and limit=3 on the semantic search. Curious whether others have found better filtering strategies for long-running agentic systems.
Code is self-hosted, stack is Qdrant + Postgres + Neo4j + Ollama. Happy to share more details on any piece.</p>
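<p>One plausible answer to the filtering question is to decay similarity scores by age before taking the top-k, so stale memories fade without a hard cutoff. A minimal sketch in plain Python; the exponential decay scheme and the field names are illustrative assumptions, not part of the system described above:</p>

```python
from dataclasses import dataclass

@dataclass
class MemoryHit:
    summary: str
    score: float     # cosine similarity from the vector search
    age_days: float  # how long ago the operation ran

def select_hits(hits, score_threshold=0.5, limit=3, half_life_days=30.0):
    """Keep hits above the similarity threshold, decay by age, take top-k.

    Halving a hit's effective score every half_life_days lets old but
    still-relevant memories survive while pushing stale ones out.
    """
    ranked = sorted(
        (h for h in hits if h.score >= score_threshold),
        key=lambda h: h.score * 0.5 ** (h.age_days / half_life_days),
        reverse=True,
    )
    return ranked[:limit]

hits = [
    MemoryHit("RAID-0 workaround for PERC H710", score=0.82, age_days=3),
    MemoryHit("old apt mirror fix", score=0.55, age_days=200),
    MemoryHit("unrelated DNS note", score=0.31, age_days=1),
]
top = select_hits(hits)
# The low-similarity hit is filtered out; the fresh, relevant hit ranks first.
```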
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47309028">https://news.ycombinator.com/item?id=47309028</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 09 Mar 2026 13:47:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47309028</link><dc:creator>LavaDMan</dc:creator><comments>https://news.ycombinator.com/item?id=47309028</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47309028</guid></item></channel></rss>