<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: data_maan</title><link>https://news.ycombinator.com/user?id=data_maan</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 07 Jun 2026 22:17:59 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=data_maan" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by data_maan in "Harness engineering: Leveraging Codex in an agent-first world"]]></title><description><![CDATA[
<p>Was this in the GPT2 paper?</p>
]]></description><pubDate>Sun, 07 Jun 2026 11:38:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48433882</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=48433882</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48433882</guid></item><item><title><![CDATA[New comment by data_maan in "ML promises to be profoundly weird"]]></title><description><![CDATA[
<p>If LLMs lie as much as the OP claims in the article, why can they consistently solve Olympiad math problems they never saw during training?<p>The AIMO Prize (aimoprize.com) on Kaggle, for example, shows this.</p>
]]></description><pubDate>Thu, 09 Apr 2026 15:11:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47704793</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47704793</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47704793</guid></item><item><title><![CDATA[New comment by data_maan in "Why the US Navy won't blast the Iranians and 'open' Strait of Hormuz"]]></title><description><![CDATA[
<p>More "American minds": <a href="https://en.wikipedia.org/wiki/Hartmut_Esslinger" rel="nofollow">https://en.wikipedia.org/wiki/Hartmut_Esslinger</a><p>The chief designer at Apple was German.</p>
]]></description><pubDate>Thu, 02 Apr 2026 04:35:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47610036</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47610036</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47610036</guid></item><item><title><![CDATA[New comment by data_maan in "Why the US Navy won't blast the Iranians and 'open' Strait of Hormuz"]]></title><description><![CDATA[
<p>> built with American capital and mostly American minds.<p>I would say "built with American agency and commercial spirit", not minds.<p>Most of the things we have were first built elsewhere (Germany being a prime supplier here, with the MP3 or Zuse's computers), but turning them into commercial products was the contribution that came from America.</p>
]]></description><pubDate>Wed, 01 Apr 2026 11:47:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47599585</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47599585</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47599585</guid></item><item><title><![CDATA[New comment by data_maan in "Why the US Navy won't blast the Iranians and 'open' Strait of Hormuz"]]></title><description><![CDATA[
<p>To be fair, Iran is not pretentious either, killing a few thousand people because they dared to protest.<p>There are no good guys in this conflict.</p>
]]></description><pubDate>Wed, 01 Apr 2026 11:37:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47599501</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47599501</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47599501</guid></item><item><title><![CDATA[New comment by data_maan in "Why the US Navy won't blast the Iranians and 'open' Strait of Hormuz"]]></title><description><![CDATA[
<p><a href="https://www.worldatlas.com/us-history/wars-the-united-states-didn-t-win.html" rel="nofollow">https://www.worldatlas.com/us-history/wars-the-united-states...</a></p>
]]></description><pubDate>Wed, 01 Apr 2026 11:34:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47599468</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47599468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47599468</guid></item><item><title><![CDATA[New comment by data_maan in "Epoch confirms GPT5.4 Pro solved a frontier math open problem"]]></title><description><![CDATA[
<p>A model whose internals we can't access solved a problem we don't know wasn't in its training data. Great, I'm impressed.</p>
]]></description><pubDate>Tue, 24 Mar 2026 04:40:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47498660</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47498660</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47498660</guid></item><item><title><![CDATA[New comment by data_maan in "Israel is running critically low on interceptors, US officials say"]]></title><description><![CDATA[
<p>Strategic? Yes.<p>Moral? Hm. From a moral POV, this would be about who has the right to terrorize the Iranian population: the Iranian government or the US/Israeli governments.</p>
]]></description><pubDate>Sun, 15 Mar 2026 11:43:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47386452</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47386452</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47386452</guid></item><item><title><![CDATA[New comment by data_maan in "Tell HN: I'm 60 years old. Claude Code has re-ignited a passion"]]></title><description><![CDATA[
<p>Opinions differ: hobby coders love it, but domain experts secretly despise it because it narrows the gap between the skills they spent years honing and the average Claude, I mean Joe, who just uses this mental exoskeleton.</p>
]]></description><pubDate>Sat, 07 Mar 2026 12:13:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47286935</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47286935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47286935</guid></item><item><title><![CDATA[New comment by data_maan in "Tech employment now significantly worse than the 2008 or 2020 recessions"]]></title><description><![CDATA[
<p>> The people getting pushed out are the intermediates and seniors who aren't high performers.<p>Also the people who can't market themselves. There are very average programmers with a large following on X who seem to do very well.</p>
]]></description><pubDate>Sat, 07 Mar 2026 11:10:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47286551</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47286551</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47286551</guid></item><item><title><![CDATA[New comment by data_maan in "Danish government agency to ditch Microsoft software (2025)"]]></title><description><![CDATA[
<p>I love these posts that are so close to the edge that I can't tell if they're sarcastic or serious :)</p>
]]></description><pubDate>Wed, 25 Feb 2026 11:33:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47150192</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=47150192</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47150192</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>> What do you mean ? These are top-notch mathematicians<p>Yes. I didn't dispute that. My point is that they are NOT top-notch ML specialists, and they have made one of the worst benchmarks of 2025-2026. Benchmarks like this might have worked in early 2024 at the latest. The field has moved on significantly since.<p>And yes, many, many other benchmarks don't use toy problems; their names are just a prompt away.<p>> You are kidding right ? FrontierMath benchmark [1] is produced by a startup whose incentives are dubious to say the least.<p>They 1) open-sourced some of their data points (on a similar order of magnitude) and 2) carried out detailed evals. There is much to learn from their blog posts, much more than from the current dataset.<p>But fair. If you don't like them, have a look at IMProofBench. Have a look at the AIMO competition. Have a look at HardMath. There's quite a landscape of datasets already.<p>> Unlike the AI hypesters, these are real mathematicians trying to inject some realism and really test the boundaries of these tools<p>As mentioned above, bigger and better realistic benchmarks exist. Unfortunately, from a benchmarking POV, these mathematicians are the hypesters, with a preprint that wouldn't even make it into the AI&Math workshops at ICML or NeurIPS.</p>
]]></description><pubDate>Tue, 10 Feb 2026 19:24:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46965410</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46965410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46965410</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>If it's the latter case (which it has to be), it seems that attention credit (via, e.g., articles in the NY Times) is very unfairly distributed.<p>None of the people who advanced the state of benchmarking and did the hard work on much bigger benchmarks got any, but a ridiculous benchmark of 10 questions scored big.</p>
]]></description><pubDate>Tue, 10 Feb 2026 19:17:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46965317</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46965317</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46965317</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>> We will learn if the magical capabilities attributed to these tools are really true or not.<p>They're not. We already know that: FrontierMath, Yu Tsumura's 554th problem, the RealMath benchmark. The list goes on. As I've said many times in this thread, there is nothing novel in this benchmark.<p>The fact that this benchmark is so hyped shows that the community knows nothing, NOTHING, about prior work in this space, which makes me sad.</p>
]]></description><pubDate>Tue, 10 Feb 2026 19:15:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46965280</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46965280</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46965280</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>> These problems are representative of the types of subproblems research mathematicians have to solve to get a “research result”. They are finding that LLMs aren’t that useful for mathematical research because they can’t crush these problems along the way. And I assume they put this doc together because they want that to change :)<p>The same holds true for IMProofBench problems. This dataset shows nothing new.</p>
]]></description><pubDate>Tue, 10 Feb 2026 19:12:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46965240</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46965240</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46965240</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>But everything has already been explored in other datasets.<p>If only a bunch of mathematicians learn something, why are so many people talking about this, and why is the NY Times writing about it?<p>This is the attention economy at its worst.</p>
]]></description><pubDate>Tue, 10 Feb 2026 19:11:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46965221</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46965221</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46965221</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>It's not angst. It's intense frustration that 1) they are not doing the science correctly, and 2) others (e.g. FrontierMath) have already done everything they claim to be doing, so we won't learn anything new here, yet somehow 1stproof gets all the credit.</p>
]]></description><pubDate>Sun, 08 Feb 2026 08:28:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46932460</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46932460</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46932460</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>If you want to do this rigorously, you should run it as a competition, like the people behind the AIMO Prize are doing on Kaggle.<p>That way you get all the necessary data.<p>I still think this is bro science.</p>
]]></description><pubDate>Sun, 08 Feb 2026 08:26:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=46932443</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46932443</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46932443</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>> There are some experiments which cannot be carried out more than once<p>Yes, in which case a very detailed methodology is required: hardware, runtimes, token counts, etc.<p>This does none of that.</p>
]]></description><pubDate>Sun, 08 Feb 2026 08:24:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=46932433</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46932433</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46932433</guid></item><item><title><![CDATA[New comment by data_maan in "First Proof"]]></title><description><![CDATA[
<p>It wasn't like this in any way.<p>CASP relies on a robust benchmark (not just 10 random proteins) and has clear participation criteria, objective metrics for how the eval plays out, etc.<p>So I stand by my claim: this isn't scientific. If CASP is Japan, a highly organized and civilized society, this is a banana republic.</p>
]]></description><pubDate>Sun, 08 Feb 2026 08:23:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46932427</link><dc:creator>data_maan</dc:creator><comments>https://news.ycombinator.com/item?id=46932427</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46932427</guid></item></channel></rss>