Hacker News: usaar333

New comment by usaar333 in "Samsung chip workers will get an average $340k bonus as AI profits soar"

usaar333 — Fri, 22 May 2026 04:06:23 +0000

Tech workers get paid in equity, and many in the semiconductor industry are making far, far more than this a year thanks to equity appreciation.

New comment by usaar333 in "“Too dangerous to release” or just too expensive?"

usaar333 — Fri, 15 May 2026 15:30:24 +0000

How does delaying the release not solve anything? It puts everyone on notice to fix all security vulnerabilities now.

New comment by usaar333 in "Sierra Raises $950M at $15B Valuation"

usaar333 — Mon, 04 May 2026 23:53:33 +0000

I hate waiting on hold for 30 minutes even more.

New comment by usaar333 in "Sierra Raises $950M at $15B Valuation"

usaar333 — Mon, 04 May 2026 23:53:15 +0000

There's literally a link on the blog post to an article noting they hit $150M ARR.

New comment by usaar333 in "Sierra Raises $950M at $15B Valuation"

usaar333 — Mon, 04 May 2026 23:51:25 +0000

Voice agents have both the capability and the policy authority to alter customer state. Just the other day I called a credit card company and the AI waived an interest charge.

New comment by usaar333 in "Claude Opus 4.7"

usaar333 — Thu, 16 Apr 2026 16:07:09 +0000

The page has been updated to state:

MCP-Atlas: The Opus 4.6 score has been updated to reflect revised grading methodology from Scale AI.

New comment by usaar333 in "How We Broke Top AI Agent Benchmarks: And What Comes Next"

usaar333 — Sun, 12 Apr 2026 04:35:28 +0000

> But even setting aside the leaked answers, the scorer’s normalize_str function strips ALL whitespace, ALL punctuation, and lowercases everything before comparison. This means:

I don't understand the concern here
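For reference, the normalization described in the quote amounts to something like the sketch below. It is a hypothetical reimplementation based only on the quoted description (lowercase, strip all whitespace, strip all punctuation); the benchmark's actual normalize_str may differ, and this version assumes ASCII punctuation via Python's string.punctuation.

```python
import string

def normalize_str(s: str) -> str:
    # Hypothetical reimplementation from the quoted description only:
    # lowercase, then drop every whitespace and (ASCII) punctuation character.
    return "".join(
        ch for ch in s.lower()
        if not ch.isspace() and ch not in string.punctuation
    )

def answers_match(predicted: str, expected: str) -> bool:
    # Two answers count as equal if they agree after normalization.
    return normalize_str(predicted) == normalize_str(expected)
```

Under this sketch, "Paris." matches "paris" and "New York, NY" becomes "newyorkny"; it also means "3.14" and "314" compare equal, since any distinction carried only by punctuation or spacing is discarded.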

New comment by usaar333 in "Frontier AI agents violate ethical constraints 30–50% of time, pressured by KPIs"

usaar333 — Tue, 10 Feb 2026 07:11:12 +0000

True, but it gets you higher accuracy. Gemini had the best AA-Omniscience score.

https://artificialanalysis.ai/evaluations/omniscience

New comment by usaar333 in "Claude Opus 4.6"

usaar333 — Thu, 05 Feb 2026 18:17:27 +0000

OpenAI has; they don't even mention a score for gpt-5.3-codex.

On the other hand, it is their own verified benchmark, which is telling.

New comment by usaar333 in "Claude Opus 4.6"

usaar333 — Thu, 05 Feb 2026 17:59:27 +0000

I'd interpret that as rounding error; that is unchanged.

SWE-bench seems really hard once you are above 80%.

New comment by usaar333 in "In a U.S. First, New Mexico Opens Doors to Free Child Care for All"

usaar333 — Sat, 22 Nov 2025 17:48:39 +0000

In Quebec, there was a 20% jump in maternal employment: https://www.bloomberg.com/news/articles/2018-12-31/affordabl...

And had all sorts of negative outcomes for the kids: https://www.edweek.org/teaching-learning/long-term-study-of-...

New comment by usaar333 in "Gemini 3"

usaar333 — Tue, 18 Nov 2025 17:39:44 +0000

Claude 4.5 gets 82% on their own highly customized scaffolding (parallel compute with a scoring function). That beats Doubao.
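"Parallel compute with a scoring function" generally means sampling several candidate solutions independently and keeping the one a scorer rates highest. The sketch below shows only that generic best-of-n pattern; `generate` and `score` are hypothetical stand-ins, not the actual scaffolding behind the 82% figure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")

def best_of_n(generate: Callable[[int], T], score: Callable[[T], float], n: int) -> T:
    # Sample n candidates in parallel; `generate` (hypothetical) receives the
    # attempt index, e.g. to use as a seed, and returns one candidate solution.
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(generate, range(n)))
    # Keep the candidate the (hypothetical) scoring function rates highest.
    return max(candidates, key=score)
```

With a toy generator, `best_of_n(lambda i: i * 3 % 7, float, 5)` draws candidates 0, 3, 6, 2, 5 and returns 6. The point is that the reported score reflects the selection step as well as the underlying model.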

New comment by usaar333 in "How Israeli actions caused famine in Gaza, visualized"

usaar333 — Thu, 02 Oct 2025 20:09:25 +0000

That wasn't a ceasefire violation. It was a six-week ceasefire that had expired at the beginning of March.

New comment by usaar333 in "Sora 2"

usaar333 — Tue, 30 Sep 2025 19:41:50 +0000

Physics seems better than Veo 3, at least from the demo videos.

New comment by usaar333 in "Claude Sonnet 4.5"

usaar333 — Mon, 29 Sep 2025 19:39:59 +0000

Except it is sublinear. Sonnet 4 was 10.2% above Sonnet 3.7 after 3 months.

New comment by usaar333 in "GPT-5"

usaar333 — Fri, 08 Aug 2025 00:04:23 +0000

No, it doesn't. If progress were even linear relative to the o1 -> o3 trend, we'd be at 2.43 hours. Instead we're only at 2.29.

Exponential would be at 3.6 hours.
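The linear-versus-exponential comparison can be made concrete with a small helper that extends the o1 -> o3 trend to a later release date. The comment does not give the underlying dates or the o1 and o3 time horizons, so the usage below uses placeholder numbers, not the real figures; only the method is illustrated.

```python
def extrapolate(t1: float, h1: float, t2: float, h2: float, t3: float) -> tuple[float, float]:
    """Extend the trend through (t1, h1) and (t2, h2) to time t3.

    Returns (linear, exponential) predictions. Times can be in any unit
    (e.g. months); horizons must be positive for the exponential fit.
    """
    frac = (t3 - t2) / (t2 - t1)          # how many t1 -> t2 intervals have elapsed
    linear = h2 + (h2 - h1) * frac         # same absolute gain per unit time
    exponential = h2 * (h2 / h1) ** frac   # same growth ratio per unit time
    return linear, exponential
```

With placeholder inputs, a horizon that doubles from 1.0 to 2.0 hours over 4 months extrapolates 4 months further to 3.0 hours (linear) or 4.0 hours (exponential). An observed value below even the linear prediction is what the comment calls slower than linear.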

New comment by usaar333 in "GPT-5: Key characteristics, pricing and system card"

usaar333 — Thu, 07 Aug 2025 18:33:28 +0000

No, this is below expectations on both Manifold and LessWrong (https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_green...). The median was ~2.75 hours on both (which already represented a bearish slowdown).

Not massively off: yesterday Manifold implied ~35% odds of a result this low, versus ~30% before Claude Opus 4.1 came out, which updated expected agentic coding abilities downward.

New comment by usaar333 in "GPT-5"

usaar333 — Thu, 07 Aug 2025 17:12:40 +0000

At this point the prediction for SWE-bench (85% by the end of this month) is not materializing. We're actually quite far away.

New comment by usaar333 in "Claude Opus 4.1"

usaar333 — Tue, 05 Aug 2025 16:57:10 +0000

I don't feel any obvious gains from quick chats, but it's too early to tell.

These benchmark gains aren't that high, so I doubt the difference would be obvious.

New comment by usaar333 in "Startup equity is worth more than you think"

usaar333 — Mon, 04 Aug 2025 19:14:08 +0000

> Firstly, if your prior is that every previous startup failed, what does that say about your future chances of success?

The prior is the market. It isn't sane to use your own prior experience. (This works both ways: if your last startup did great, you shouldn't assume the next one will.)

> 4% of YC companies become unicorns. How many startups do you need to work for before you become part of the 4%? That number is not a feasible number of jobs for one lifetime.

The bar (and what the model is calculating) is a Series A from a top VC, not YC seed funding. That significantly increases the odds. Specifically, ~45% of YC companies raise a Series A, so a Series A-funded YC company has more like a 10% chance of becoming a unicorn (https://www.lennysnewsletter.com/p/pulling-back-the-curtain-...).
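The ~10% figure is a conditional probability: divide the unconditional unicorn rate by the Series A rate. This sketch assumes essentially every YC unicorn raised a Series A along the way, so P(Series A | unicorn) is taken to be 1.

```python
def p_unicorn_given_series_a(p_unicorn: float, p_series_a: float) -> float:
    # Bayes' rule with P(Series A | unicorn) assumed to be 1:
    # P(unicorn | Series A) = P(unicorn) / P(Series A)
    return p_unicorn / p_series_a

print(round(p_unicorn_given_series_a(0.04, 0.45), 3))  # 0.089
```

That is about 9%, i.e. roughly the 1 in 10 odds used above.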

The model is to change jobs every 18 months if the company isn't booming. A 1 in 10 chance is quite reasonable over a career.
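One way to see why per-job odds of ~10% add up over a career: if each ~18-month stint were an independent draw (an assumption; real outcomes are correlated, and not every job will be at a Series A company), the chance of at least one unicorn across k jobs is 1 - 0.9^k.

```python
def p_at_least_one_hit(p_per_job: float, jobs: int) -> float:
    # Assumes outcomes are independent across jobs.
    return 1 - (1 - p_per_job) ** jobs

for years in (6, 12, 18):
    jobs = years * 12 // 18  # one job every 18 months if nothing takes off
    print(years, jobs, round(p_at_least_one_hit(0.10, jobs), 2))
```

Under these assumptions that works out to about 34% after 4 jobs (~6 years), 57% after 8, and 72% after 12.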

I agree there is an issue with the event being too rare, but you can't just look only at modal returns. 2/3 chance of $0 (the modal return) and 1/3 chance of $10 million profit is still pretty good odds to work with.
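The modal-versus-expected distinction in that example is easy to check directly; this uses exactly the two outcomes stated above.

```python
from fractions import Fraction

outcomes = {0: Fraction(2, 3), 10_000_000: Fraction(1, 3)}  # payoff ($) -> probability

mode = max(outcomes, key=outcomes.get)                        # most likely payoff
mean = sum(payoff * p for payoff, p in outcomes.items())      # expected payoff
print(mode, float(mean))
```

The most likely outcome is $0, but the expected payoff is about $3.33 million, which is why judging only by the modal return understates the bet.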