<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: d4rkp4ttern</title><link>https://news.ycombinator.com/user?id=d4rkp4ttern</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 20 May 2026 19:24:57 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=d4rkp4ttern" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by d4rkp4ttern in "Click (2016)"]]></title><description><![CDATA[
<p>Another one like this -<p><a href="https://sinceyouarrived.world/taken" rel="nofollow">https://sinceyouarrived.world/taken</a></p>
]]></description><pubDate>Mon, 18 May 2026 23:57:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48187533</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=48187533</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48187533</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Eric Schmidt speech about AI booed during graduation"]]></title><description><![CDATA[
<p>This should probably be merged with this:<p><a href="https://news.ycombinator.com/item?id=48177107">https://news.ycombinator.com/item?id=48177107</a></p>
]]></description><pubDate>Mon, 18 May 2026 13:58:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48180004</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=48180004</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48180004</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Multiple commencement speakers booed for AI comments during graduation speeches"]]></title><description><![CDATA[
<p>I agree with this (and the earlier comment about perceived expertise vs actual expertise), and I think it goes beyond hiring managers.<p>The core demoralizing fact is that when people perceive that AI can give results at least as good as human experts, they choose AI, because it is faster and/or cheaper.</p>
]]></description><pubDate>Mon, 18 May 2026 13:24:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=48179530</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=48179530</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48179530</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Multiple commencement speakers booed for AI comments during graduation speeches"]]></title><description><![CDATA[
<p>I can see where it's coming from. Putting it starkly, at a high level, the broad effect of AI is this:<p><pre><code>    devaluation of expertise,
</code></pre>
whether in coding, or drawing, or music composition, or writing, or translation, or so many other areas.<p>College students working hard to gain expertise in specific areas are faced with the prospect that this very expertise is being "democratized" by AI, putting it in the hands of literally anyone. Sure, true expertise is still needed to "validate" (and train) the AI, etc, etc, but that's a small consolation.<p>Relatedly, a year ago I was excited to learn the Rust language. Now I don't see the point (And I'm building tools with Rust). I'm sure this sentiment extends across fields.</p>
]]></description><pubDate>Mon, 18 May 2026 11:49:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48178355</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=48178355</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48178355</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "AI is making me dumb"]]></title><description><![CDATA[
<p>In the context of coding agents, one thing I find surprisingly useful is this - rather than have the code-agent explain to me what it did and why, I have it quiz me Socratic-style: it presents a scenario or problem, and asks me why a certain idea would not work. It forces me put effort into thinking it through, and even if I answer wrong, it avoids giving away the answer and persists in drilling down with further questions, until I eventually arrive at the answer. This type of effortful thinking likely helps both learning and retention. I find the frontier LLMs very good at this type of quizzing. I made this Socratic Quiz as part of my suite of plugins for Claude-Code or Codex:<p><a href="https://pchalasani.github.io/claude-code-tools/plugins-detail/workflow/#socratic-quiz" rel="nofollow">https://pchalasani.github.io/claude-code-tools/plugins-detai...</a></p>
]]></description><pubDate>Fri, 15 May 2026 14:18:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=48148944</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=48148944</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48148944</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "VibeVoice: Open-source frontier voice AI"]]></title><description><![CDATA[
<p>I guess you mean for STT. For my usecase of talking to AI's or coding agents, pure STT accuracy is less important than transcription speed. Transcription needs to be near-instant, and accuracy "good enough" so that the AI's can "read between the lines". Parakeet-V3 gives exactly this.</p>
]]></description><pubDate>Wed, 29 Apr 2026 19:57:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47953684</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47953684</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47953684</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "VibeVoice: Open-source frontier voice AI"]]></title><description><![CDATA[
<p>This was on HN 7 months ago:<p><a href="https://news.ycombinator.com/item?id=45114245">https://news.ycombinator.com/item?id=45114245</a><p>Every time a STT/TTS model is posted I wonder if it will change my current workflow on MacOS, which is:<p>STT with Parakeet-V3 via Hex [1] app for near-instant good-enough transcription for talking to AI agents.<p>TTS using KyutAI’s Pocket-TTS, an amazing 100M-param model. I used this to make a "voice" plugin [2]  for Claude Code<p>So far I haven’t seen anything that replaces these for me, or haven't been persuaded enough to spend time testing an alternative (explore/exploit and all that).<p>[1] Hex STT app - <a href="https://github.com/kitlangton/Hex" rel="nofollow">https://github.com/kitlangton/Hex</a>, which is macOS-only.
(also good free/OSS alternatives: Handy, VoiceInk. No need for Wispr, Superwhisper etc)<p>[2] Claude Code Voice Plugin - <a href="https://pchalasani.github.io/claude-code-tools/plugins-detail/voice/" rel="nofollow">https://pchalasani.github.io/claude-code-tools/plugins-detai...</a></p>
]]></description><pubDate>Wed, 29 Apr 2026 13:41:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47948352</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47948352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47948352</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Show HN: A lightweight way to make agents talk without paying for API usage"]]></title><description><![CDATA[
<p>My regular workflow is to run code agents in Tmux panes and often I have  Claude Code consult/collaborate with Codex, using my tmux-cli [1] tool, which is a wrapper around Tmux that provides good defaults (delay etc) for robust sending of messages, and waiting for completion etc.<p>[1] <a href="https://pchalasani.github.io/claude-code-tools/tools/tmux-cli/" rel="nofollow">https://pchalasani.github.io/claude-code-tools/tools/tmux-cl...</a></p>
]]></description><pubDate>Mon, 20 Apr 2026 10:32:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47832410</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47832410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47832410</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Show HN: Prompt-to-Excalidraw demo with Gemma 4 E2B in the browser (3.1GB)"]]></title><description><![CDATA[
<p>I now often have CC make technical/architecture diagrams with tikz, the results look much better than mermaid but still requires multiple iterations to fix bad arrows, bad layouts etc.<p>Diagrams are still far from solved. We need a good non-gameable diagrams benchmark.</p>
]]></description><pubDate>Mon, 20 Apr 2026 10:26:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47832367</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47832367</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47832367</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code"]]></title><description><![CDATA[
<p>you can set it in the .omlx/settings.json - ask a code-agent to figure it out by pointing it at the omlx repo</p>
]]></description><pubDate>Mon, 06 Apr 2026 23:39:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668860</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47668860</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668860</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code"]]></title><description><![CDATA[
<p>At least for the Gemma4-26B-A4B, Token-gen speed with OMLX is far worse on my M1 Max 64GB Macbook, compared to llama-server:<p><pre><code>  Quick benchmark on M1 Max 64GB, Gemma 4 26B-A4B (MoE), comparing matched dynamic 4-bit quants. Workload
  was Claude Code, which sends ~35K tokens of input context per request (system prompt + tools + user
  message):

  llama.cpp (unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL, llama-server -fa on -c 131072 --jinja --temp 1.0
  --top-p 0.95 --top-k 64):
  - pp ≈ 395 tok/s
  - tg ≈ 40 tok/s

  oMLX (unsloth/gemma-4-26b-a4b-it-UD-MLX-4bit, omlx serve --model-dir ~/models/omlx, with
  sampling.max_context_window and max_tokens bumped to 131072 in ~/.omlx/settings.json):
  - pp ≈ 350 tok/s
  - tg ≈ 5–13 tok/s

  Same model family and quant tier. Prompt processing is comparable, but oMLX's token generation is 3–7x
  slower than llama.cpp's Metal backend. Counter-intuitive given MLX is Apple's native ML framework.</code></pre></p>
]]></description><pubDate>Mon, 06 Apr 2026 23:37:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668849</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47668849</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668849</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code"]]></title><description><![CDATA[
<p>Correct, no issues because since at least a few months, llama.cpp/server exposes an Anthropic messages API at v1/messages, in addition to the OpenAI-compatible API at v1/chat/completions. Claude Code uses the former.</p>
]]></description><pubDate>Mon, 06 Apr 2026 15:29:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47662207</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47662207</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47662207</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code"]]></title><description><![CDATA[
<p>You can use llama.cpp server directly to serve local LLMs and use them in Claude Code or other CLI agents. I’ve collected full setup instructions for  Gemma4 and other recent open-weight LLMs here, tested on my M1 Max 64 GB MacBook:<p><a href="https://pchalasani.github.io/claude-code-tools/integrations/local-llms/" rel="nofollow">https://pchalasani.github.io/claude-code-tools/integrations/...</a><p>The 26BA4B is the most interesting to run on such hardware, and I get nearly double the token-gen speed (40 tok/s) compared to Qwen3.5 35BA3B. However the tau2 bench results[1] for this Gemma4 variant lag far behind the Qwen  variant (68% vs 81%), so I don’t expect the former to do well on heavy agentic tool-heavy tasks:<p>[1] <a href="https://news.ycombinator.com/item?id=47616761">https://news.ycombinator.com/item?id=47616761</a></p>
]]></description><pubDate>Mon, 06 Apr 2026 11:41:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47659626</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47659626</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47659626</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>Thanks, fixed</p>
]]></description><pubDate>Thu, 02 Apr 2026 22:11:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47620843</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47620843</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47620843</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>For token-generation speed, a challenging test is to see how it performs in a code-agent harness like Claude Code, which has anywhere between 15-40K tokens  from the system prompt itself (+ tools/skills etc).<p>Here the 26B-A4B variant is head and shoulders above recent open-weight models, at least on my trusty M1 Max 64GB MacBook.<p>I set up Claude Code to use this variant via llama-server, with 37K tokens initial context, and it performs very well: ~40 tokens/sec, far better than Qwen3.5-35B-A3B, though I don't know yet about the intelligence or tool-calling consistency. Prompt processing speed is comparable to the Qwen variant at ~400 tok/s.<p>My informal tests, all with roughly 30K-37K tokens initial context:<p><pre><code>    ┌────────────────────┬───────────────┬────────────┐
    │       Model        │ Active Params │ tg (tok/s) │
    ├────────────────────┼───────────────┼────────────┤
    │ Gemma-4-26B-A4B    │ 4B            │ ~40        │
    ├────────────────────┼───────────────┼────────────┤
    │ GPT-OSS-20B        │ 3.6B          │ ~17-38     │
    ├────────────────────┼───────────────┼────────────┤
    │ Qwen3-30B-A3B      │ 3B            │ ~15-27     │
    ├────────────────────┼───────────────┼────────────┤
    │ GLM-4.7-Flash      │ 3B            │ ~12-13     │
    ├────────────────────┼───────────────┼────────────┤
    │ Qwen3.5-35B-A3B    │ 3B            │ ~12        │
    ├────────────────────┼───────────────┼────────────┤
    │ Qwen3-Next-80B-A3B │ 3B            │ ~3-5       │
    └────────────────────┴───────────────┴────────────┘

</code></pre>
Full instructions for running this and other open-weight models with Claude Code are here:<p><a href="https://pchalasani.github.io/claude-code-tools/integrations/local-llms/#gemma-4-26b-a4b--google-moe-with-vision" rel="nofollow">https://pchalasani.github.io/claude-code-tools/integrations/...</a></p>
]]></description><pubDate>Thu, 02 Apr 2026 21:55:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47620689</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47620689</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47620689</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "The Claude Code Source Leak: fake tools, frustration regexes, undercover mode"]]></title><description><![CDATA[
<p>I dug into this more. It's disabled by default, and it's a cost/token-usage optimization.<p><pre><code>  The logic is:

  1. Anthropic's API has a server-side prompt cache with a 1-hour TTL
  2. When you're actively using a session, each API call reuses the cached prefix — you only pay
  for new tokens
  3. After 1 hour idle, that cache is guaranteed expired
  4. Your next message will re-send and re-process the entire conversation from scratch — every
  token, full price
  5. So if you have 150K tokens of old Grep/Read/Bash outputs sitting in the conversation, you're
  paying to re-ingest all of that even though it's stale context the model probably doesn't need

  The microcompact says: "since we're paying full price anyway, let's shrink the bill by clearing
  the bulky stuff."

  What's preserved vs lost:
  - The tool_use blocks (what tool was called, with what arguments) — kept
  - The tool_result content (the actual output) — replaced with [Old tool result content cleared]
  - The most recent 5 tool results — kept

  So Claude can still see "I ran Grep for foo in src/" but not the 500-line grep output from 2
  hours ago.

  Does it affect quality? Yes, somewhat — but the tradeoff is that without it, you're paying
  potentially tens of thousands of tokens to re-ingest stale tool outputs that the model already
  acted on. And remember, if the conversation is long enough, full compaction would have summarized
   those messages anyway.

  And critically: this is disabled by default (enabled: false in timeBasedMCConfig.ts:31). It's
  behind a GrowthBook feature flag that Anthropic controls server-side. So unless they've flipped
  it on for your account, it's not happening to you.</code></pre></p>
]]></description><pubDate>Thu, 02 Apr 2026 13:17:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47614091</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47614091</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47614091</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "The Claude Code Source Leak: fake tools, frustration regexes, undercover mode"]]></title><description><![CDATA[
<p>For me one of the most interesting aspects is how compaction works. It turns out compaction still preserves the full original pre-compaction conversation in the session jsonl file, and those are marked as "not to be sent to the API". Which means, even after compaction, if you think something was lost, you can tell CC to "look in the session log files to find details about what we did with XYZ". I knew this before the leak since it can be seen from the session logs. Some more details:<p><pre><code>  The full conversation is preserved in the JSONL file, and messages
  are filtered before being sent to the API.

  Key mechanisms:

  1. JSONL is append-only — old pre-compaction messages are never deleted. New messages (boundary
  marker, summary, attachments) are appended after compaction.
  2. Messages have flags controlling API visibility:
    - isCompactSummary: true — marks the AI-generated summary message
    - isVisibleInTranscriptOnly: true — prevents a message from being sent to the API
    - isMeta — another filter for non-API messages
    - getMessagesAfterCompactBoundary() returns only post-compaction messages for API calls
  3. After compaction, the API sees only:
    - The compact boundary marker
    - The summary message
    - Attachments (file refs, plan, skills)
    - Any new messages after compaction
  4. Three compaction types exist:
    - Full compaction — API summarizes all old messages
    - Session memory compaction — uses extracted session memory as summary (cheaper)
    - Microcompaction — clears old tool result content when cache is cold (>1h idle)</code></pre></p>
]]></description><pubDate>Wed, 01 Apr 2026 14:41:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47601622</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47601622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47601622</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "The Claude Code Source Leak: fake tools, frustration regexes, undercover mode"]]></title><description><![CDATA[
<p>that frustration regex is missing "idiot", which is the most common frustration word I use with code-agents</p>
]]></description><pubDate>Wed, 01 Apr 2026 14:38:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47601581</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47601581</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47601581</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "I use excalidraw to manage my diagrams for my blog"]]></title><description><![CDATA[
<p>Another option I use open is to ask the  code-agent to make a diagram using Tikz (as a .tex file), which can then be converted to pdf/png.<p>But in general AI-diagramming is still unsolved; needs several iterations to get rid of wonky/wrong arrows, misplaced boxes, misplaced text etc.</p>
]]></description><pubDate>Mon, 30 Mar 2026 11:10:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47572813</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47572813</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47572813</guid></item><item><title><![CDATA[New comment by d4rkp4ttern in "Anatomy of the .claude/ folder"]]></title><description><![CDATA[
<p>Agree. For what it’s worth, in interviews Cherny (Claude Code creator) and Steinberger (OpenClaw creator) say they keep things simple and use none of the workflow frameworks. The latter even said he doesn’t even use plan mode, but I find that very useful: exiting plan mode starts clean with compressed context.</p>
]]></description><pubDate>Sat, 28 Mar 2026 11:32:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47553611</link><dc:creator>d4rkp4ttern</dc:creator><comments>https://news.ycombinator.com/item?id=47553611</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47553611</guid></item></channel></rss>