<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: usaar333</title><link>https://news.ycombinator.com/user?id=usaar333</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 17:11:55 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=usaar333" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by usaar333 in "Claude Opus 4.7"]]></title><description><![CDATA[
<p>The page has been updated to state:<p>MCP-Atlas: The Opus 4.6 score has been updated to reflect revised grading methodology from Scale AI.</p>
]]></description><pubDate>Thu, 16 Apr 2026 16:07:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47795453</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=47795453</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47795453</guid></item><item><title><![CDATA[New comment by usaar333 in "How We Broke Top AI Agent Benchmarks: And What Comes Next"]]></title><description><![CDATA[
<p>> But even setting aside the leaked answers, the scorer’s normalize_str function strips ALL whitespace, ALL punctuation, and lowercases everything before comparison. This means:<p>I don't understand the concern here</p>
]]></description><pubDate>Sun, 12 Apr 2026 04:35:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736190</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=47736190</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736190</guid></item><item><title><![CDATA[New comment by usaar333 in "Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs"]]></title><description><![CDATA[
<p>True, but it gets you higher accuracy. Gemini had the best aa-omniscience score<p><a href="https://artificialanalysis.ai/evaluations/omniscience" rel="nofollow">https://artificialanalysis.ai/evaluations/omniscience</a></p>
]]></description><pubDate>Tue, 10 Feb 2026 07:11:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46956327</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=46956327</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46956327</guid></item><item><title><![CDATA[New comment by usaar333 in "Claude Opus 4.6"]]></title><description><![CDATA[
<p>OpenAI has; they don't even mention the score for gpt-5.3-codex.<p>On the other hand, it is their own verified benchmark, which is telling.</p>
]]></description><pubDate>Thu, 05 Feb 2026 18:17:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46902793</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=46902793</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46902793</guid></item><item><title><![CDATA[New comment by usaar333 in "Claude Opus 4.6"]]></title><description><![CDATA[
<p>I'd interpret that as rounding error, i.e. unchanged.<p>SWE-bench seems really hard once you are above 80%.</p>
]]></description><pubDate>Thu, 05 Feb 2026 17:59:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46902501</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=46902501</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46902501</guid></item><item><title><![CDATA[New comment by usaar333 in "In a U.S. First, New Mexico Opens Doors to Free Child Care for All"]]></title><description><![CDATA[
<p>In Quebec it was a 20% jump in mother employment: <a href="https://www.bloomberg.com/news/articles/2018-12-31/affordable-daycare-and-working-moms-the-quebec-model" rel="nofollow">https://www.bloomberg.com/news/articles/2018-12-31/affordabl...</a><p>It also had all sorts of negative outcomes for the kids: <a href="https://www.edweek.org/teaching-learning/long-term-study-of-universal-preschool-in-quebec-yields-sobering-outcomes/2018/12" rel="nofollow">https://www.edweek.org/teaching-learning/long-term-study-of-...</a></p>
]]></description><pubDate>Sat, 22 Nov 2025 17:48:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46016627</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=46016627</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46016627</guid></item><item><title><![CDATA[New comment by usaar333 in "Gemini 3"]]></title><description><![CDATA[
<p>Claude 4.5 gets 82% on their own highly customized scaffolding (parallel compute with a scoring function). That beats Doubao.</p>
]]></description><pubDate>Tue, 18 Nov 2025 17:39:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=45969455</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=45969455</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45969455</guid></item><item><title><![CDATA[New comment by usaar333 in "How Israeli actions caused famine in Gaza, visualized"]]></title><description><![CDATA[
<p>That wasn't a ceasefire violation. It was a six week ceasefire that had expired at the beginning of March</p>
]]></description><pubDate>Thu, 02 Oct 2025 20:09:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45454889</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=45454889</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45454889</guid></item><item><title><![CDATA[New comment by usaar333 in "Sora 2"]]></title><description><![CDATA[
<p>Physics seems better than Veo 3, at least from the demo videos.</p>
]]></description><pubDate>Tue, 30 Sep 2025 19:41:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=45430267</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=45430267</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45430267</guid></item><item><title><![CDATA[New comment by usaar333 in "Claude Sonnet 4.5"]]></title><description><![CDATA[
<p>Except it is sublinear. Sonnet 4 was 10.2% above Sonnet 3.7 after 3 months.</p>
]]></description><pubDate>Mon, 29 Sep 2025 19:39:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=45417855</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=45417855</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45417855</guid></item><item><title><![CDATA[New comment by usaar333 in "GPT-5"]]></title><description><![CDATA[
<p>No, it doesn't. If it were even linear compared to o1 -> o3, we'd be at 2.43 hours. Instead we're only at 2.29.<p>Exponential would be at 3.6 hours.</p>
]]></description><pubDate>Fri, 08 Aug 2025 00:04:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44831916</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44831916</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44831916</guid></item><item><title><![CDATA[New comment by usaar333 in "GPT-5: Key characteristics, pricing and system card"]]></title><description><![CDATA[
<p>No, this is below expectations on both Manifold and LessWrong (<a href="https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_greenblatt-s-shortform?commentId=6ue8BPWrcoa2eGJdP" rel="nofollow">https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_green...</a>). The median was ~2.75 hours on both (which already represented a bearish slowdown).<p>Not massively off, though: Manifold yesterday implied the odds of a result this low were ~35%. They were 30% before Claude Opus 4.1 came out, which updated expected agentic coding abilities downward.</p>
]]></description><pubDate>Thu, 07 Aug 2025 18:33:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=44828550</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44828550</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44828550</guid></item><item><title><![CDATA[New comment by usaar333 in "GPT-5"]]></title><description><![CDATA[
<p>At this point the prediction for SWE-bench (85% by the end of this month) is not materializing. We're actually quite far away.</p>
]]></description><pubDate>Thu, 07 Aug 2025 17:12:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44827226</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44827226</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44827226</guid></item><item><title><![CDATA[New comment by usaar333 in "Claude Opus 4.1"]]></title><description><![CDATA[
<p>I don't feel any obvious gains from quick chats, but it's too early to tell.<p>These benchmark gains aren't that high, so I doubt the difference would be that obvious.</p>
]]></description><pubDate>Tue, 05 Aug 2025 16:57:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=44800664</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44800664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44800664</guid></item><item><title><![CDATA[New comment by usaar333 in "Startup equity is worth more than you think"]]></title><description><![CDATA[
<p>> Firstly, if your prior is that every previous startup failed, what does that say about your future chances of success?<p>The prior is the market. It isn't sane to use your own prior experience. (This works both ways: if your last startup did great, you shouldn't assume the next one will.)<p>> 4% of YC companies become unicorns. How many startups do you need to work for before you become part of the 4%? That number is not a feasible number of jobs for one lifetime.<p>The bar (and what the model is calculating) is a Series A from a top VC, not YC seed funding. That significantly increases the odds. Specifically, ~45% of YC companies get a Series A, so it's more like a 10% chance of a YC Series A funded company becoming a unicorn (<a href="https://www.lennysnewsletter.com/p/pulling-back-the-curtain-on-the-magic" rel="nofollow">https://www.lennysnewsletter.com/p/pulling-back-the-curtain-...</a>).<p>The model assumes you change jobs every 18 months if the company isn't booming. A 1 in 10 chance is quite reasonable over a career.<p>I agree there is an issue with the event being too rare, but you can't look only at modal returns. A 2/3 chance of $0 (the modal return) and a 1/3 chance of $10 million profit is still pretty good odds to work with.</p>
]]></description><pubDate>Mon, 04 Aug 2025 19:14:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44790197</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44790197</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44790197</guid></item><item><title><![CDATA[New comment by usaar333 in "Startup equity is worth more than you think"]]></title><description><![CDATA[
<p>Why is the modal return so important? You'll work more than 2 jobs.</p>
]]></description><pubDate>Mon, 04 Aug 2025 14:14:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=44786076</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44786076</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44786076</guid></item><item><title><![CDATA[New comment by usaar333 in "Startup equity is worth more than you think"]]></title><description><![CDATA[
<p>It's a probabilistic model. It assumes (correctly) that the low probability of a home run times the home run's valuation is quite large ("expected returns" in the probabilistic sense).<p>>  this argument reads to me like "the returns on a Powerball win are so much higher than your projected lifetime earnings that playing the lottery is a smart financial move".<p>That's a stronger claim than the article is making, but yes, in a sense it is saying the lottery can be a good move because the expectation is large. That's what VCs do, after all.<p>Note that all the model aims to do is value the equity package. If a public company is offering more than what this model values the startup equity package at (and this is often the case!), it isn't worth it financially to work at that startup.</p>
]]></description><pubDate>Mon, 04 Aug 2025 02:27:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=44781598</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44781598</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44781598</guid></item><item><title><![CDATA[New comment by usaar333 in "Startup equity is worth more than you think"]]></title><description><![CDATA[
<p>The value of the equity package is 4x higher than the equivalent FAANG equity package (at preferred/market pricing). That's not the same as saying the shares themselves are worth that.<p>To sum up the arguments:<p>* Employment packages allow things a shareholder cannot do (functionally recall their investment), so the high volatility leads to higher package returns.<p>* FAANG equity grants (RSUs) are taxed at much higher rates.<p>* Expected return is in fact higher on startup equity than on FAANG equity (and you generally have no way to invest in the good startups directly aside from working for them).</p>
]]></description><pubDate>Mon, 04 Aug 2025 01:24:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44781307</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44781307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44781307</guid></item><item><title><![CDATA[Startup equity is worth more than you think]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.amafinance.org/startup_comp/">https://www.amafinance.org/startup_comp/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44781179">https://news.ycombinator.com/item?id=44781179</a></p>
<p>Points: 2</p>
<p># Comments: 9</p>
]]></description><pubDate>Mon, 04 Aug 2025 00:55:10 +0000</pubDate><link>https://www.amafinance.org/startup_comp/</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44781179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44781179</guid></item><item><title><![CDATA[New comment by usaar333 in "Lina Khan points to Figma IPO as vindication of M&A scrutiny"]]></title><description><![CDATA[
<p>I don't see why the market cap proves whether she is correct or not. You'd have to compare it to the counterfactual of what a Figma subsidiary would be worth under Adobe today.<p>This is not obvious at all to me. Instagram (bought for $1B) is probably worth ~$700B of Meta's market cap.</p>
]]></description><pubDate>Sun, 03 Aug 2025 18:34:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=44778623</link><dc:creator>usaar333</dc:creator><comments>https://news.ycombinator.com/item?id=44778623</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44778623</guid></item></channel></rss>