Hacker News: alextheparrot

New comment by alextheparrot in "ChatGPT Images 2.0"

alextheparrot — Tue, 21 Apr 2026 23:23:32 +0000

The paper they published last year goes over some of these transformations: https://arxiv.org/pdf/2510.09263

New comment by alextheparrot in "ChatGPT Images 2.0"

alextheparrot — Tue, 21 Apr 2026 19:57:10 +0000

> Integrating an imperceptible, robust, and content-specific watermark

From the system card someone linked elsewhere in the discussion

New comment by alextheparrot in "Ask ChatGPT to pick a number from 1-10000, it generally selects from 7200-7500"

alextheparrot — Sat, 21 Mar 2026 07:26:02 +0000

No LLMs are calibrated?

New comment by alextheparrot in "Court orders restart of all US offshore wind power construction"

alextheparrot — Tue, 03 Feb 2026 03:53:53 +0000

I mean, you could also frame this as an issue the electorate could actually prioritize instead of just hoping the courts work it out

New comment by alextheparrot in "Recursive Language Models"

alextheparrot — Sun, 04 Jan 2026 01:57:15 +0000

The derivative being a grad(ient) student sampling scaffolds against evals + qualitative observations: most prompt-based LLM papers.

New comment by alextheparrot in "New research reveals longevity gains slowing, life expectancy of 100 unlikely"

alextheparrot — Sat, 30 Aug 2025 18:06:05 +0000

Cancer is a parasitism that kills the host (or the host dies from other causes, and the cancer is not self-sufficient). Just because something is defined by uncontrolled self-replication doesn’t mean it is stable enough to live forever (which is as much a comment on homeostasis as on self-renewal).

New comment by alextheparrot in "AI agent benchmarks are broken"

alextheparrot — Fri, 11 Jul 2025 20:25:58 +0000

It isn’t actually very wrong. Your example is tangential as graders in school have multiple roles — teaching the content and grading. That’s an implementation detail, not a counter to the premise.

I don’t think we should assume answering a test would be easy for a Scantron machine just because it is very good at grading them, either.

New comment by alextheparrot in "AI agent benchmarks are broken"

alextheparrot — Fri, 11 Jul 2025 20:20:33 +0000

> Good evaluations write test sets for the discriminators to show when this is or isn’t true.

If they can’t write an evaluation for the discriminator, I agree. All the input data issues you highlight also apply to generators.

New comment by alextheparrot in "AI agent benchmarks are broken"

alextheparrot — Fri, 11 Jul 2025 20:16:07 +0000

I wish the other replies, and this one, would engage with the sentence right after it, which says you should test this premise empirically.

New comment by alextheparrot in "AI agent benchmarks are broken"

alextheparrot — Fri, 11 Jul 2025 15:37:59 +0000

If this sort of error isn’t acceptable, it should be part of an evaluation set for your discriminator

Fundamentally I’m not disagreeing with the article, but I also think most people who care take the above approach, because if you do care, you read samples, find the issues, and patch them to hill-climb better.

New comment by alextheparrot in "AI agent benchmarks are broken"

alextheparrot — Fri, 11 Jul 2025 14:14:45 +0000

LLMs evaluating LLM outputs really isn’t that dire…

Discriminating good answers is easier than generating them. Good evaluations write test sets for the discriminators to show when this is or isn’t true. Evaluating the outputs as the user might see them is more representative than having your generator do multiple tasks (e.g. solve a math query and format the output as a multiple-choice answer).

Also, human labels are good but have problems of their own; it isn’t like by using a “different intelligence architecture” we elide all the possible errors. Good instructions to the evaluation model often translate directly into better human results, showing a correlation between these two sources of sampling intelligence.

New comment by alextheparrot in "How we’re responding to The NYT’s data demands in order to protect user privacy"

alextheparrot — Fri, 06 Jun 2025 16:21:47 +0000

in the app: Settings ~> Data Controls ~> Improve the model for everyone

New comment by alextheparrot in "Retailers will soon have only about 7 weeks of full inventories left"

alextheparrot — Wed, 30 Apr 2025 15:19:24 +0000

That’s a premise that would make me question the wisdom of my actions.

New comment by alextheparrot in "Lawmakers are skeptical of Zuckerberg's commitment to free speech"

alextheparrot — Thu, 10 Apr 2025 14:32:50 +0000

Quippy, but off the cuff:

- I don’t go to my present town square(s) socially because they are full of antisocial behavior. It’s the same reason people avoid certain bars or clubs, prefer certain parks, or are wary of public transit.

- I don’t feel I have a right to decide the vibe of how a business curates its space. My bakery, coffee shop, local library, etc. all curate a space with an opinion. I don’t feel I have standing to assert that my preferences should dominate their choices.

As an aside, businesses are also an extension of people; the best ones just tend not to be mode-collapsed.

New comment by alextheparrot in "The hacking of culture and the creation of socio-technical debt"

alextheparrot — Wed, 19 Jun 2024 19:37:29 +0000

Really enjoyed the piece.

A passing thought: the ethos of individuals in the 70s and 80s is important because of the people it informed in subsequent years. While many people still like to hack, code, etc., the share of people working in tech who do this continues to shrink as the sector grows in popularity and importance. I wonder whether debt without values, or without a more coherent zeitgeist, is better or worse.

New comment by alextheparrot in "Safe Superintelligence Inc."

alextheparrot — Wed, 19 Jun 2024 17:51:36 +0000

Glibly, I’d also love your definition of the education system writ large.

New comment by alextheparrot in "OpenAI and Apple Announce Partnership"

alextheparrot — Mon, 10 Jun 2024 20:08:42 +0000

Bit of a detail, but where are you deriving “with hundreds of terabytes of unified GPU memory” from?

New comment by alextheparrot in "σ-GPTs: A new approach to autoregressive models"

alextheparrot — Fri, 07 Jun 2024 19:32:47 +0000

No, but it makes more conceptual sense, given that the model can consider what was said before it.

New comment by alextheparrot in "I should have loved biology (2020)"

alextheparrot — Mon, 22 Apr 2024 03:33:36 +0000

I love the romance of this piece, but in my experience he’s just describing the difference in expectations of learning biology at a high school versus advanced undergraduate to graduate level.

Romance is for those who care, and most don’t. But it is so, so beautiful once you do.

New comment by alextheparrot in "Gemma: New Open Models"

alextheparrot — Wed, 21 Feb 2024 15:06:43 +0000

> “Our best shot at making the quarter is if we get an injection of at least [redacted]%, queries ASAP from Chrome.” (Google Exec)

Isn’t there a whole antitrust case going on around this?

[0] https://www.nytimes.com/interactive/2023/10/24/business/goog...