<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Majromax</title><link>https://news.ycombinator.com/user?id=Majromax</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 16:33:47 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Majromax" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Majromax in "Claude Fable is relentlessly proactive"]]></title><description><![CDATA[
<p>> I haven't yet had an agent rm -rf files.<p>That happened to me once; I was running one of a few free-tier models in a pi-coding-agent session.  The bash tool there is stateless and always begins from the launch directory, but the agent assumed state and executed `rm -rf .` intending to remove a build directory.  Instead it removed the whole project tree, including session logs and notes.<p>This was mostly a matter of amusement for me since I was running the agent inside a bubblewrap sandbox <i>for that very reason</i>, and the project itself was not very important.</p>
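<p>A minimal sketch of why a stateless shell tool catches agents out. It uses a hypothetical `run_stateless` helper, not pi-coding-agent's actual implementation: every call spawns a fresh shell in the launch directory, so a `cd` in one call has no effect on the next.</p>

```python
import os
import subprocess
import tempfile


def run_stateless(command: str, launch_dir: str) -> str:
    """Run one tool call in a fresh shell rooted at launch_dir.

    Hypothetical helper: no working directory or environment change
    carries over from one call to the next.
    """
    result = subprocess.run(
        ["sh", "-c", command],
        cwd=launch_dir,  # every call starts here, regardless of earlier `cd`s
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    project = tempfile.mkdtemp()
    os.mkdir(os.path.join(project, "build"))

    run_stateless("cd build", project)    # the agent assumes this persists...
    print(run_stateless("pwd", project))  # ...but this call is back in the project root
```

<p>A destructive command issued on the assumption that the earlier `cd` still applies therefore runs against the launch directory, which is why the sandbox mattered.</p>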
]]></description><pubDate>Fri, 12 Jun 2026 13:35:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48503902</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48503902</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48503902</guid></item><item><title><![CDATA[New comment by Majromax in "Shall we play a game? My AI nuclear simulation"]]></title><description><![CDATA[
<p>This blog post is based on a paper (<a href="https://arxiv.org/abs/2602.14740" rel="nofollow">https://arxiv.org/abs/2602.14740</a>).  The paper is based on a simulated wargame.  The wargame is of the author's own design.<p>The wargame design does not differentiate between ordinary defeat and mutually assured destruction, so <i>of course</i> a player about to lose would 'push the button.'  That's also believed to be true in real life.<p>Results based on simulations can be very informative, but we must always be careful to check how well the simulation framework represents reality.</p>
]]></description><pubDate>Thu, 11 Jun 2026 23:44:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48497951</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48497951</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48497951</guid></item><item><title><![CDATA[New comment by Majromax in "Claude Fable 5"]]></title><description><![CDATA[
<p>> fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically select the lowest cost model would probably already save alot of money<p>Unfortunately, that doesn't work within a single session.  The K-V cache is specific to the model that computed it.  Switching models invalidates the cache, meaning everything up to the point of the switchover is processed as new, uncached input tokens.<p>Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.<p>Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need.  That probably covers most 'support chatbot' use cases, but it doesn't describe the kinds of heavy agentic automation that really chew through token budgets.</p>
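<p>A back-of-the-envelope sketch using only the two prices quoted above. The helper name and the 100k-token prefix are hypothetical, and it compares just the first turn after a switch, before Haiku has built a cache of its own.</p>

```python
OPUS_CACHE_HIT_PER_MTOK = 0.50  # USD per million tokens, Opus 4.8 cache hit (quoted above)
HAIKU_UNCACHED_PER_MTOK = 1.00  # USD per million tokens, Haiku uncached input (quoted above)


def first_turn_prefix_cost(prefix_tokens: int) -> tuple[float, float]:
    """Cost of re-reading an existing session prefix on the next turn.

    Returns (stay_on_opus, switch_to_haiku). Staying reuses Opus's K-V cache;
    switching discards it, so Haiku must ingest the whole prefix uncached.
    """
    stay = prefix_tokens / 1e6 * OPUS_CACHE_HIT_PER_MTOK
    switch = prefix_tokens / 1e6 * HAIKU_UNCACHED_PER_MTOK
    return stay, switch


stay, switch = first_turn_prefix_cost(100_000)
print(f"stay on Opus: ${stay:.2f}, switch to Haiku: ${switch:.2f}")
```

<p>For a 100k-token prefix that is $0.05 to stay versus $0.10 to switch, so the switch only pays off if Haiku's later turns save more than the difference.</p>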
]]></description><pubDate>Tue, 09 Jun 2026 18:41:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48465600</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48465600</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48465600</guid></item><item><title><![CDATA[New comment by Majromax in "A 10 year old Xeon is all you need"]]></title><description><![CDATA[
<p>> Seven tokens long input isn't very realistic, is it?<p>The test prompt above was "Why is the sky blue?", which accounts for the seven tokens.  I meant to highlight that because I'd expect a thousand-token input to be processed faster per token than the figure presented.</p>
]]></description><pubDate>Mon, 01 Jun 2026 13:54:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=48356901</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48356901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48356901</guid></item><item><title><![CDATA[New comment by Majromax in "A 10 year old Xeon is all you need"]]></title><description><![CDATA[
<p>From the prompt timings above, 'prompt eval time' seems to be equivalent to 'processing time for input tokens'.<p>Hyperscalers can perform this evaluation very quickly because it can be heavily parallelized.  The layer `i` output of token `j` only requires the layer `i-1` outputs of token `j` and all previous tokens, so a parallel frontier develops.  Cell (0,0) [(token, layer)] is processed first, then cells (0,1) and (1,0) can be processed in parallel, then (0,2), (1,1), and (2,0), and so on.<p>The maximum parallel width therefore equals the number of layers in the model (or the prompt length, if that is shorter).  The Gemma 4 26B-A4B model discussed in this article evidently has 30 layers, giving up to a 30-fold speedup if the system were otherwise unconstrained (all layers can run in parallel, and once the pipeline is full, each step of the sweep completes the full set of layer outputs, i.e. the KV entries, for one more token).<p>In the <i>specific</i> output above, however, the input prompt is only seven tokens long, so there are probably considerable non-amortized spin-up effects at play.</p>
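<p>A minimal sketch of the (token, layer) wavefront described above, assuming each layer handles one token per step (as in a layer-by-layer pipeline) and that every cell takes one time step; both are idealizations. Cells on the same anti-diagonal `token + layer = s` are independent, so the frontier is at most `min(tokens, layers)` wide.</p>

```python
def wavefront_schedule(tokens: int, layers: int) -> list[list[tuple[int, int]]]:
    """Group (token, layer) cells into parallel steps along anti-diagonals.

    Cell (j, i) needs the layer i-1 outputs of tokens 0..j, all of which lie
    on earlier anti-diagonals, so every cell with j + i == s can run in step s.
    """
    steps = []
    for s in range(tokens + layers - 1):
        step = [(j, s - j) for j in range(tokens) if 0 <= s - j < layers]
        steps.append(step)
    return steps


def ideal_speedup(tokens: int, layers: int) -> float:
    """Sequential cell count divided by the number of wavefront steps."""
    return tokens * layers / (tokens + layers - 1)


for prompt_len in (7, 1000):
    sched = wavefront_schedule(prompt_len, 30)
    width = max(len(step) for step in sched)
    print(prompt_len, len(sched), width, round(ideal_speedup(prompt_len, 30), 1))
```

<p>With 30 layers, a seven-token prompt never fills more than 7 lanes (an ideal speedup of about 5.8x over 36 steps), while a 1000-token prompt reaches the full width of 30 (about 29x).</p>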
]]></description><pubDate>Mon, 01 Jun 2026 12:45:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=48356109</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48356109</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48356109</guid></item><item><title><![CDATA[New comment by Majromax in "Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs"]]></title><description><![CDATA[
<p>> They are all just variations of "insert a canned prompt", varying only along the dimensions of (a) how and where the prompt is installed and from where it is sourced, and (b) which context or contexts the prompt runs in. There's not much advice here about which option is best, and no clear best practices seem to have emerged yet either. Personally, I find just asking Claude to review the code works well enough.<p>The subagent approach is structurally different from the others because it runs with a clean context.  That has three major effects:<p>1. All other things being equal, it will result in a lower cost-to-solution because of the quadratic cost scaling of an LLM session (the whole accumulated context is paid for again, as input or cached input, with each new round).<p>2. The review model will not be able to 'cheat' by retaining assumptions from the main session, such as "x must be done like y."  For people, this is why having a separate person perform code review (or, if not possible, reviewing code after a mind-clearing break) is handy; the applicability of this analogy to LLMs is vague but reasonable.<p>3. The main model will only see the results of the review, not the detailed reasoning that leads up to it. On one hand this avoids further context pollution, but on the other hand the main model may have to duplicate work to rediscover the mechanics behind the bugs found.<p>>  I checked the session logs to see how often the agents were actually invoking the LSP tools. The answer was they had invoked them literally once the entire time.<p>I think the intent behind 'install a language server plugin' is that these tools should lint automatically after <i>every edit</i>, without waiting for an explicit call from the LLM.</p>
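<p>A rough sketch of point 1, assuming each round re-reads the entire context (the quadratic scaling) and adds a fixed number of new tokens. The helper name and all token counts are hypothetical, chosen only for illustration.</p>

```python
def review_input_tokens(start_context: int, rounds: int, tokens_per_round: int) -> int:
    """Total input tokens read over a review lasting `rounds` rounds.

    Round r (1-based) re-reads the starting context plus everything added so far.
    """
    return sum(start_context + r * tokens_per_round for r in range(1, rounds + 1))


# Hypothetical review: 10 rounds, 2,000 new tokens per round.
in_main_session = review_input_tokens(100_000, 10, 2_000)  # carries 100k tokens of history
in_subagent = review_input_tokens(0, 10, 2_000)            # starts from a clean context
print(in_main_session, in_subagent)
```

<p>Here the clean-context review reads about a tenth as many input tokens (110,000 versus 1,110,000), before counting the short report handed back to the main session.</p>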
]]></description><pubDate>Wed, 27 May 2026 16:54:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48297010</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48297010</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48297010</guid></item><item><title><![CDATA[New comment by Majromax in "Canada losing top talent as workers head to the U.S."]]></title><description><![CDATA[
<p>> You think tax incentives are what makes VC work in California but not other places in the US let alone Canada?<p>I believe my comment above agreed with your premise here.  I said that the tax difference is <i>not</i> sufficient by itself, and that Canada reportedly has a venture capital ecosystem very unlike Silicon Valley's.</p>
]]></description><pubDate>Tue, 26 May 2026 01:30:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48273935</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48273935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48273935</guid></item><item><title><![CDATA[New comment by Majromax in "Canada losing top talent as workers head to the U.S."]]></title><description><![CDATA[
<p>> I'm sure a lot of Canadian tech workers would repatriate and foreign workers would immigrate to Canada if they could lower taxes across the board and make life easier for tech companies and workers<p>I'm not sure that Canadian taxes compare that unfavourably to combined California plus federal taxation.  A deeper, more structural limitation appears to be the venture capital environment, namely that Canada doesn't have a good one.<p>Canada's investable capital is dominated by pension funds, insurance companies (i.e. pension funds), and banks (i.e. pensioners).  All are risk averse (<a href="https://thelogic.co/news/bdc-canadian-venture-capital-report-2024/" rel="nofollow">https://thelogic.co/news/bdc-canadian-venture-capital-report...</a>), which makes it hard for Canadian startups to begin scaling.  Without native "unicorns" (<a href="https://financialpost.com/technology/why-canada-best-startups-fail-become-unicorns" rel="nofollow">https://financialpost.com/technology/why-canada-best-startup...</a>), there's allegedly a failure-to-launch for the entire sector, since tech billionaires are some of the most reliable early-stage investors with the greatest risk tolerance.<p>The porous border works both for and against the sector.  On one hand it makes it relatively easy (but not automatic) for a Canadian tech company to enter the US market, but on the other hand it's also relatively easy for Canadian tech workers (founders included) to simply relocate (note the article here).  If startups leave for the US's vast fields of venture capital, they're less likely to come back.  Note that around the turn of the year Y Combinator halted investments in Canadian firms (<a href="https://www.ycombinator.com/blog/adding-canada-back">https://www.ycombinator.com/blog/adding-canada-back</a>) because they so frequently relocated to the US.<p>This venture capital cycle seems to be a deeply entrenched and very hard problem.  If democratically feasible tax incentives could reliably create "the Silicon Valley of X," then we probably would have many more Silicon Valleys both in the US and elsewhere.</p>
]]></description><pubDate>Mon, 25 May 2026 23:31:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=48273109</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48273109</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48273109</guid></item><item><title><![CDATA[New comment by Majromax in "Green card seekers must leave U.S. to apply, Trump administration says"]]></title><description><![CDATA[
<p>> Even if it is common (i don't think this is required any more anyways), just why?<p>As far as Canadian law goes, two factors are at play in the parent comment's situation:<p>* NAFTA work permits are applied for at the border, on entry; they operate differently from the 'normal' work permit streams.<p>* Permanent residence is <i>conferred</i> at the border, but the application process can happen either inside or outside the country depending on the stream.  There are also limited 'inland' options which evidently have expanded (<a href="https://www.canada.ca/en/immigration-refugees-citizenship/services/permanent-residents/status/copr.html" rel="nofollow">https://www.canada.ca/en/immigration-refugees-citizenship/se...</a>) in recent years.<p>In neither case does Canada have a blanket rule that an applicant must leave the country during the whole of an extended application process, and even 'abroad' processes can often be carried out while an applicant is living in the country on other status.  (It can get awkward if a consular interview is required, though.)<p>Unlike the US, Canada is generally comfortable with 'dual intent', where intent to apply for permanent residence through legal channels is not disqualifying for other kinds of status.</p>
]]></description><pubDate>Sun, 24 May 2026 15:01:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=48257797</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48257797</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48257797</guid></item><item><title><![CDATA[New comment by Majromax in "Iran starts Bitcoin-backed ship insurance for Hormuz strait"]]></title><description><![CDATA[
<p>>  It’s territorial waters belonging to Iran and Oman.<p>The trick is that it's still an 'international strait': a stretch of water that forms the only connection between two areas of high seas, in this case the Persian Gulf and the Gulf of Oman.  The principle of freedom of navigation establishes that innocent traffic (civilian traffic, and even warships in peacetime) has a right to use the strait to go from one body of international water to the other.<p>Iran may claim that it doesn't have to abide by that right, but international law is never self-executing.  One question to be resolved by this war is whether Iran will ultimately recognize the right to navigation in any settlement (and then choose to abide by said settlement).</p>
]]></description><pubDate>Mon, 18 May 2026 20:18:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=48184986</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48184986</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48184986</guid></item><item><title><![CDATA[New comment by Majromax in "Iran starts Bitcoin-backed ship insurance for Hormuz strait"]]></title><description><![CDATA[
<p>> I don't know enough about the current state of naval warfare but I've assumed this is related to the asymmetry that's emerged around protecting capital warships, especially in the scenario of a very narrow strait and a long enemy-controlled coastline.<p>It's not the billion-dollar warships that transport oil; it's the much more fragile, unarmed tankers.<p>Even if the US Navy begins full escort duty, it can't remain on-station forever.  What are shippers to do afterwards?  One drone strike might cause a tanker to have a very bad day, yet it's extremely difficult to degrade an entire country so permanently that it becomes incapable of launching sporadic attacks.<p>Ultimately, the status of the Strait must be settled diplomatically, and the US and Iran are each betting that the other side will blink first.</p>
]]></description><pubDate>Mon, 18 May 2026 20:10:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48184890</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48184890</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48184890</guid></item><item><title><![CDATA[New comment by Majromax in "Claude for Legal"]]></title><description><![CDATA[
<p>>  If the person using a tool is an attorney, then that communication should be protected whether it's by pen or keyboard.<p>But the tool is not your attorney, so it can't be the originator of attorney-client privilege.  The situation is no different from getting informal legal advice from a friend: <i>even if that friend is an attorney</i>, the communication is unprivileged unless it's part of a formal representation.</p>
]]></description><pubDate>Fri, 15 May 2026 15:42:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48150062</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48150062</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48150062</guid></item><item><title><![CDATA[New comment by Majromax in "Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model"]]></title><description><![CDATA[
<p>> if your question is "what is the capital of france" the LLM could presumably extract out "paris" from the value vector during attention computation instead of needing the FFN for that.<p>But how do you get 'Paris' <i>into</i> the value vector in that case?  The value vector is just a linear projection (a matrix multiplication) of the layer's input, and without a nonlinearity it can't perform a data-dependent transformation.  Attention does act as a nonlinear mixer of previous values, but its output is still limited to a convex combination of those values.</p>
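<p>A small NumPy sketch of the last point, assuming single-head softmax attention with random stand-in weight matrices (no trained model involved). The softmax weights are non-negative and sum to one, so every coordinate of the output lies between the smallest and largest value-vector entries in that coordinate: attention can select and blend existing values, but it cannot produce one outside their convex hull.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 8, 4

x = rng.normal(size=(seq_len, d_model))  # layer inputs for 5 tokens
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

q, k, v = x @ W_q, x @ W_k, x @ W_v      # v is a purely linear map of x
scores = q[-1] @ k.T / np.sqrt(d_head)   # last token attends to every token
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax: non-negative, sums to 1

out = weights @ v                        # convex combination of the rows of v

print(np.all(weights >= 0), np.isclose(weights.sum(), 1.0))
print(np.all(out >= v.min(axis=0) - 1e-12), np.all(out <= v.max(axis=0) + 1e-12))
```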
]]></description><pubDate>Wed, 13 May 2026 03:09:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48117346</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=48117346</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48117346</guid></item><item><title><![CDATA[New comment by Majromax in "Regression: malware reminder on every read still causes subagent refusals"]]></title><description><![CDATA[
<p>> That said, I was sympathetic to the recent bug reports —- to trigger one, you’d need to have a session that waited an hour doing nothing and then very specifically tested for in-context retrieval. I don’t want to run that test, do you want to run that test?<p>They introduced a feature/optimization that triggered after an hour's idleness, so testing that the session continued properly afterwards seems kind of important.  If nothing else, even the working-as-intended feature (context cleanup) could impact model skill in a current or future model version, so it would be well worth measuring any impact as part of the test suite.</p>
]]></description><pubDate>Thu, 30 Apr 2026 00:41:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47956612</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=47956612</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47956612</guid></item><item><title><![CDATA[New comment by Majromax in "An update on recent Claude Code quality reports"]]></title><description><![CDATA[
<p>> Since the devs on HN (& the whole world) is buying what looks like nonsense to me - what am I missing?<p>Input tokens are expensive, since the whole model has to be run for each token.  They're cheaper than output tokens because the model doesn't need to run the sampler and wait for each result, so some pipeline parallelism is possible; on the other hand, without caching, the input token cost would have to be paid anew for <i>each</i> output token.<p>Prompt caching fixes that O(N^2) cost, but the cache itself is very heavyweight.  It needs one entry per input token per model layer, and each entry is an O(1000)-dimensional vector.  That carries a huge memory cost (linear in context length), and caching means that memory is no longer ephemeral: it must be held for as long as the cache lives.<p>That's why a 'cache write' can carry a cost; it is the cost of both processing the input and committing the backing store for the cache duration.</p>
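<p>A rough count of token evaluations, where one 'evaluation' is assumed to mean running one token through the full model. Without a cache, each new output token re-processes the whole prompt plus everything generated so far; with a cache, each token is processed once. The helper names are hypothetical.</p>

```python
def evals_without_cache(n_input: int, n_output: int) -> int:
    """Output step t re-runs all n_input + t tokens seen so far: O(N^2) overall."""
    return sum(n_input + t for t in range(n_output))


def evals_with_cache(n_input: int, n_output: int) -> int:
    """Prefill the prompt once, then process each sampled token once: O(N)."""
    # The final sampled token is never fed back into the model, hence the -1.
    return n_input + max(n_output - 1, 0)


print(evals_without_cache(1_000, 100), evals_with_cache(1_000, 100))
```

<p>For a 1,000-token prompt and 100 output tokens that is 104,950 evaluations versus 1,099. The price of the linear version is memory: the cache keeps per-layer vectors for every one of those tokens for as long as it lives.</p>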
]]></description><pubDate>Sat, 25 Apr 2026 02:16:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47898031</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=47898031</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47898031</guid></item><item><title><![CDATA[New comment by Majromax in "An update on recent Claude Code quality reports"]]></title><description><![CDATA[
<p>Context length 1e6, vector length 1e3, and 1e2 model layers give 1e11 (100 billion) cached values. Costs will go up even more with a richer latent space and more model layers, and the western frontier outfits are reasonably likely to be maximizing both.</p>
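<p>The same arithmetic as a quick sketch. Like the estimate above, it counts one vector per token per layer; the 2 bytes per stored value (a 16-bit number format) is an added assumption, not part of the original estimate.</p>

```python
def cache_values(context_len: int, vector_len: int, layers: int) -> int:
    """One vector of `vector_len` numbers per token per layer."""
    return context_len * vector_len * layers


BYTES_PER_VALUE = 2  # assumption: 16-bit storage; not stated in the comment

n = cache_values(10**6, 10**3, 10**2)
print(f"{n:.0e} values, about {n * BYTES_PER_VALUE / 1e9:.0f} GB")
```

<p>Under that storage assumption, a single million-token context would pin roughly 200 GB for as long as it stays cached.</p>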
]]></description><pubDate>Fri, 24 Apr 2026 19:08:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47894536</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=47894536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47894536</guid></item><item><title><![CDATA[New comment by Majromax in "Even 'uncensored' models can't say what they want"]]></title><description><![CDATA[
<p>> I believe what they're saying is they attempted to fine tune both Qwen and Pythia using Karoline Leavitt's "corpus" (I guess transcripts of press conferences) where she is presumably using the word "deportation" far more than you'd see in a randomly selected document.<p>Perhaps, but I don't think that Leavitt is habitually using the racial slurs and sexually explicit language that also forms part of their evaluation suite.</p>
]]></description><pubDate>Tue, 21 Apr 2026 18:21:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47852459</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=47852459</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47852459</guid></item><item><title><![CDATA[New comment by Majromax in "Kimi vendor verifier – verify accuracy of inference providers"]]></title><description><![CDATA[
<p>My reading of the article is that the first audience for this test is the vendors themselves.  The test is long and comprehensive to give the vendor confidence in its own hosting.</p>
]]></description><pubDate>Tue, 21 Apr 2026 00:35:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47843121</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=47843121</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47843121</guid></item><item><title><![CDATA[New comment by Majromax in "Even 'uncensored' models can't say what they want"]]></title><description><![CDATA[
<p>> That nudge is the flinch. It is the gap between the probability a word deserves on pure fluency grounds and the probability the model actually assigns it.<p>Hold up, what is the 'probability a word deserves on pure fluency grounds'?<p>Given that these models are next-token predictors (rather than BERT-style mask-fillers), "the family faces immediate [financial]" is a perfectly reasonable continuation.  Searching for this phrase on Google (verbatim mode, with quotes) gives 'eviction,' 'grief,' 'challenges,' 'financial,' and 'uncertainty.'<p>I could buy this measure if there were some contrived way to force the answer, such as "Finish this sentence with the word 'deportation': the family faces immediate", but that would contradict the naturalistic framing of 'the flinch'.<p>We could define the probability based on bigrams/trigrams in a training corpus, but that would both privilege one corpus over the others and seem inconsistent with the article's later use of 'the Pile' as the best possible open-data corpus for unflinching models.</p>
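<p>To make the corpus-dependence concrete, a sketch of the n-gram definition mentioned above: count which words follow a fixed context in some corpus. The two tiny corpora are invented for illustration; the point is only that the 'deserved' probability changes with the corpus chosen.</p>

```python
from collections import Counter


def next_word_distribution(corpus: str, context: str) -> dict[str, float]:
    """Relative frequency of each word that immediately follows `context`."""
    words = corpus.lower().split()
    ctx = context.lower().split()
    n = len(ctx)
    followers = Counter(
        words[i + n] for i in range(len(words) - n) if words[i : i + n] == ctx
    )
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}


# Invented toy corpora standing in for two different training sets.
news = "the family faces immediate eviction . the family faces immediate deportation ."
finance = "the family faces immediate financial hardship . the family faces immediate eviction ."

ctx = "the family faces immediate"
print(next_word_distribution(news, ctx))
print(next_word_distribution(finance, ctx))
```

<p>The same context gives 'deportation' a probability of 0.5 under one corpus and zero under the other.</p>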
]]></description><pubDate>Tue, 21 Apr 2026 00:32:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47843094</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=47843094</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47843094</guid></item><item><title><![CDATA[New comment by Majromax in "Claude Token Counter, now with model comparisons"]]></title><description><![CDATA[
<p>Only for the range of tasks where 4.7 performs well but 4.6 performed suboptimally.  If both models can one-shot the task without retries, then the number of iterations is already at the lower bound.<p>This also applies at the sub-task level.  If both models need to read three files to figure out which one implements the function they need to modify, then the token tax is paid for all three files even though "not the right file" is presumably an easy conclusion to draw.<p>This is also related to the challenge of optimizing subagents.  Presumably the outer, higher-capacity model can perform better with everything in its context (up to limits), but dispatching a less-capable subagent for a problem might be cheaper overall.  Anthropic has a 5:1 input-token price ratio between Opus and Haiku, but Google has 8:1 (Gemini Pro : Flash Lite) and OpenAI has 12:1 (GPT 4.2 : 4.2 nano).</p>
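<p>A simplified sketch of the subagent trade-off, measured in units of the larger model's input price. It assumes the exploration tokens are read once (ignoring the quadratic re-reading within a session) and that the outer model reads the subagent's report at full price. The token counts and helper names are invented; the 5:1, 8:1 and 12:1 ratios are the ones quoted above.</p>

```python
def inline_cost(exploration_tokens: int) -> float:
    """Outer model reads everything itself, at price 1 per token."""
    return float(exploration_tokens)


def subagent_cost(exploration_tokens: int, report_tokens: int, price_ratio: float) -> float:
    """Cheaper model reads the files; the outer model only reads its report."""
    return exploration_tokens / price_ratio + report_tokens


# Hypothetical task: three files totalling 30k tokens, and a 2k-token report.
for vendor_ratio in (5, 8, 12):
    print(vendor_ratio, inline_cost(30_000), round(subagent_cost(30_000, 2_000, vendor_ratio)))
```

<p>The report is a fixed overhead, so delegation pays off only when the exploration it replaces is large compared with the report, and this arithmetic says nothing about whether the cheaper model draws the right conclusion.</p>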
]]></description><pubDate>Mon, 20 Apr 2026 12:34:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47833362</link><dc:creator>Majromax</dc:creator><comments>https://news.ycombinator.com/item?id=47833362</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47833362</guid></item></channel></rss>