<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mnicky</title><link>https://news.ycombinator.com/user?id=mnicky</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 09:56:29 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mnicky" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mnicky in "Small models also found the vulnerabilities that Mythos found"]]></title><description><![CDATA[
<p>Also, what costs $20,000 today can cost $2,000 next year. Or $20...<p>See e.g. <a href="https://epoch.ai/data-insights/llm-inference-price-trends/" rel="nofollow">https://epoch.ai/data-insights/llm-inference-price-trends/</a></p>
]]></description><pubDate>Sat, 11 Apr 2026 20:20:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47733674</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=47733674</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47733674</guid></item><item><title><![CDATA[New comment by mnicky in "Gold overtakes U.S. Treasuries as the largest foreign reserve asset"]]></title><description><![CDATA[
<p>This might be unconstitutional?</p>
]]></description><pubDate>Sat, 04 Apr 2026 23:02:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47644449</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=47644449</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47644449</guid></item><item><title><![CDATA[New comment by mnicky in "Gold overtakes U.S. Treasuries as the largest foreign reserve asset"]]></title><description><![CDATA[
<p>Averages tell you nothing about the average citizen.<p>Also, there are other measures, like inequality, healthcare costs, social security...</p>
]]></description><pubDate>Sat, 04 Apr 2026 22:46:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47644327</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=47644327</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47644327</guid></item><item><title><![CDATA[New comment by mnicky in "GPT-5.4"]]></title><description><![CDATA[
<p>This observation makes sense, because all current models probably use some kind of sparse attention architecture.<p>So the closer two related pieces of information are to each other in the input context, the greater the chance that their relationship will be preserved.</p>
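The locality effect can be sketched with a generic sliding-window attention mask (a toy illustration of one common sparse scheme, not any particular model's actual architecture):

```python
# Sketch of a causal sliding-window (sparse) attention mask: each query
# token may only attend to keys within `window` positions behind it.
# Tokens farther apart than the window never interact directly in a layer,
# so relationships between nearby tokens are more reliably preserved.
def sliding_window_mask(seq_len, window):
    return [
        [abs(q - k) <= window and k <= q for k in range(seq_len)]
        for q in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=8, window=2)
print(mask[5][3])  # True: token 5 can attend to token 3 (distance 2)
print(mask[5][0])  # False: token 0 is outside the window
```

Real sparse schemes usually mix local windows with global or strided heads, but the locality bias is the same: pairs inside the window get a direct attention path, distant pairs don't.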
]]></description><pubDate>Fri, 06 Mar 2026 08:40:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47272526</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=47272526</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47272526</guid></item><item><title><![CDATA[New comment by mnicky in "Cancel ChatGPT AI boycott surges after OpenAI pentagon military deal"]]></title><description><![CDATA[
<p>He's trying to make it sound that way, but in the legal domain, the devil lies in the details.<p>It seems the government wanted to use Claude for mass analysis of commercially obtained data on Americans, and Anthropic wouldn't let them (source: <a href="https://www.theatlantic.com/technology/2026/03/inside-anthropics-killer-robot-dispute-with-the-pentagon/686200/?gift=2iIN4YrefPjuvZ5d2Kh30zpPxOtZj8TuGGLnTN11Z-s" rel="nofollow">https://www.theatlantic.com/technology/2026/03/inside-anthro...</a> ).<p>The DoD kept asking for contract changes that would make the legalese at least somewhat more permissive, but Anthropic stood their ground.<p>Sam Altman will probably let them do it, while using language like "we have technical means of oversight and the same red lines as Anthropic". But in reality they will allow the DoD to do what Anthropic didn't.<p>See this for more information: <a href="https://www.lesswrong.com/posts/PBrggrw4mhgbksoYY/a-tale-of-three-contracts" rel="nofollow">https://www.lesswrong.com/posts/PBrggrw4mhgbksoYY/a-tale-of-...</a></p>
]]></description><pubDate>Wed, 04 Mar 2026 07:09:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47244154</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=47244154</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47244154</guid></item><item><title><![CDATA[New comment by mnicky in "How I use Claude Code: Separation of planning and execution"]]></title><description><![CDATA[
<p>> Very often, after a correction, it will focus a lot on the correction itself making for weird-sounding/confusing statements in commit messages and comments.<p>I've experienced that too. Usually when I request a correction, I add something like "Include only production-level comments (not change notes)". Recently I also added a special instruction for this to CLAUDE.md.</p>
]]></description><pubDate>Sun, 22 Feb 2026 15:58:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47112030</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=47112030</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47112030</guid></item><item><title><![CDATA[New comment by mnicky in "How I use Claude Code: Separation of planning and execution"]]></title><description><![CDATA[
<p>For a while now, Claude Code's plan mode has also written the plan to a file that you can presumably edit, etc. For me it's located in ~/.claude/plans/ and there's actually a whole history of plans there.<p>I sometimes reference some of them to build context, e.g. after a few unsuccessful attempts to implement something, so that Claude doesn't try the same thing again.</p>
]]></description><pubDate>Sun, 22 Feb 2026 15:53:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47111983</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=47111983</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47111983</guid></item><item><title><![CDATA[New comment by mnicky in "GPT‑5.3‑Codex‑Spark"]]></title><description><![CDATA[
<p>Can you compare it to Opus 4.6 with thinking disabled? It seems to have very impressive benchmark scores. Could also be pretty fast.</p>
]]></description><pubDate>Thu, 12 Feb 2026 20:29:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=46994670</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46994670</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46994670</guid></item><item><title><![CDATA[New comment by mnicky in "GPT‑5.3‑Codex‑Spark"]]></title><description><![CDATA[
<p>> What am I missing?<p>Largest production capacity maybe?<p>Also, market demand will be so high that every player's chips will be sold out.</p>
]]></description><pubDate>Thu, 12 Feb 2026 20:26:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=46994621</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46994621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46994621</guid></item><item><title><![CDATA[New comment by mnicky in "Gemini 3 Deep Think"]]></title><description><![CDATA[
<p>Well, a fair comparison would be with GPT-5.x Pro, which is the same class of model as Gemini Deep Think.</p>
]]></description><pubDate>Thu, 12 Feb 2026 17:53:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46992340</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46992340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46992340</guid></item><item><title><![CDATA[New comment by mnicky in "Gemini 3 Deep Think"]]></title><description><![CDATA[
<p>> can a sufficiently large non thinking model perform the same as a smaller thinking?<p>Models from Anthropic have always been excellent at this. See e.g. <a href="https://imgur.com/a/EwW9H6q" rel="nofollow">https://imgur.com/a/EwW9H6q</a> (top-left Opus 4.6 is without thinking).</p>
]]></description><pubDate>Thu, 12 Feb 2026 17:50:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46992275</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46992275</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46992275</guid></item><item><title><![CDATA[New comment by mnicky in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>You can think of tokens as a rough proxy for thinking space; at least reasoning tokens work like this.<p>Dollar and watt figures are not public, and time has confounders like hardware.</p>
]]></description><pubDate>Wed, 11 Feb 2026 19:50:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=46979935</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46979935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46979935</guid></item><item><title><![CDATA[New comment by mnicky in "Claude Code is being dumbed down?"]]></title><description><![CDATA[
<p>At least now we also have a tracker: <a href="https://marginlab.ai/trackers/claude-code/" rel="nofollow">https://marginlab.ai/trackers/claude-code/</a></p>
]]></description><pubDate>Wed, 11 Feb 2026 18:47:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46979044</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46979044</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46979044</guid></item><item><title><![CDATA[New comment by mnicky in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>What I haven't seen discussed anywhere so far is how big a lead Anthropic seems to have in intelligence per output token, e.g. if you look at [1].<p>We already know that intelligence scales with the log of tokens used for reasoning, but Anthropic seems to have much more powerful non-reasoning models than its competitors.<p>I read somewhere that they have a policy of not advancing capabilities too much, so could it be that they are sandbagging and releasing models with artificially capped reasoning to be at a similar level to their competitors?<p>How do you read this?<p>[1] <a href="https://imgur.com/a/EwW9H6q" rel="nofollow">https://imgur.com/a/EwW9H6q</a></p>
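For what it's worth, the log-scaling claim above can be made concrete with a toy curve (the constants are made up for illustration, not fitted to any real benchmark):

```python
import math

# Toy model of "intelligence scales with the log of reasoning tokens":
# score = a + b * log2(tokens), with illustrative constants.
def score(tokens, a=40.0, b=5.0):
    return a + b * math.log2(tokens)

# Every doubling of tokens buys the same +b points, so gains get
# exponentially more expensive: the first doubling below costs 1k extra
# tokens, the second costs 16k, for the identical score improvement.
gain_1k_to_2k = score(2048) - score(1024)
gain_16k_to_32k = score(32768) - score(16384)
print(gain_1k_to_2k, gain_16k_to_32k)  # 5.0 5.0
```

Under such a curve, a model with a higher non-reasoning baseline (a larger `a`) matches a competitor's reasoning-heavy output at a fraction of the token spend, which is one way to read the per-token lead in [1].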
]]></description><pubDate>Wed, 11 Feb 2026 18:44:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46978980</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46978980</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46978980</guid></item><item><title><![CDATA[New comment by mnicky in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>> I think GPT-5.3-Codex was a disappointment<p>Care to elaborate more?</p>
]]></description><pubDate>Wed, 11 Feb 2026 17:33:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46977998</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46977998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46977998</guid></item><item><title><![CDATA[New comment by mnicky in "Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs"]]></title><description><![CDATA[
<p>Evaluation then depends on your specific cost-benefit tradeoff between accuracy and hallucinations.<p>For some tasks, where detecting hallucinations is easy, I can see it being beneficial.<p>In the general case, not so much...</p>
]]></description><pubDate>Wed, 11 Feb 2026 09:03:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46972608</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46972608</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46972608</guid></item><item><title><![CDATA[New comment by mnicky in "Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs"]]></title><description><![CDATA[
<p>If you recall the context/situation at the time it was released, that might be close to the truth. Google desperately needed to show competency in improving Gemini's capabilities, and other considerations could have been assigned lower priority.<p>So they could have paid a price in “model welfare” and released an LLM very eager to deliver.<p>It also shows in the AA-Omniscience Hallucination Rate benchmark, where Gemini scores 88%, the worst among frontier models.</p>
]]></description><pubDate>Wed, 11 Feb 2026 08:49:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46972516</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46972516</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46972516</guid></item><item><title><![CDATA[New comment by mnicky in "Coding agents have replaced every framework I used"]]></title><description><![CDATA[
<p>Critically, they will also enable faster future migration to a framework in case it proves useful.</p>
]]></description><pubDate>Sat, 07 Feb 2026 16:19:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46924998</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46924998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46924998</guid></item><item><title><![CDATA[New comment by mnicky in "Claude Opus 4.6"]]></title><description><![CDATA[
<p>On my tasks (mostly data science), Opus has a significantly lower probability of making stupid mistakes than Sonnet.<p>I'd still appreciate more intelligence than Opus 4.5 offers, so I'm looking forward to trying 4.6.</p>
]]></description><pubDate>Thu, 05 Feb 2026 22:49:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46906533</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46906533</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46906533</guid></item><item><title><![CDATA[New comment by mnicky in "AISLE’s autonomous analyzer found all CVEs in the January OpenSSL release"]]></title><description><![CDATA[
<p>To your second point - why would you need this? There are _plenty_ of previously found CVEs to train on.<p>Also, I don't think the three letter agencies would share one of the most prized assets they have...</p>
]]></description><pubDate>Wed, 28 Jan 2026 08:44:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46792663</link><dc:creator>mnicky</dc:creator><comments>https://news.ycombinator.com/item?id=46792663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46792663</guid></item></channel></rss>