<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mikeknoop</title><link>https://news.ycombinator.com/user?id=mikeknoop</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 03 May 2026 20:24:44 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mikeknoop" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mikeknoop in "Ti-84 Evo"]]></title><description><![CDATA[
<p>Fun memory trip. Learned assembly on those old Z80s in middle school. I had to go re-dig up SafeGuard, a program I made by reverse engineering TI's TestGuard, to stop admins from wiping your calculator memory and all your games! <a href="https://mikeknoop.com/upload/safeguard/" rel="nofollow">https://mikeknoop.com/upload/safeguard/</a></p>
]]></description><pubDate>Fri, 01 May 2026 23:22:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47981659</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=47981659</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47981659</guid></item><item><title><![CDATA[New comment by mikeknoop in "Recent results show that LLMs struggle with compositional tasks"]]></title><description><![CDATA[
<p>One must now ask whether research results are analyzing pure LLMs (e.g. the gpt-series) or LLM synthesis engines (e.g. the o-series, r-series). In this case, the headline summarizes a paper originally published in 2023 and does not necessarily have bearing on the new synthesis engines. In fact, the evidence strongly suggests the opposite, given o3's significant performance on ARC-AGI-1, which requires on-the-fly composition capability.</p>
]]></description><pubDate>Sun, 02 Feb 2025 06:13:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=42906402</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42906402</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42906402</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>I think we agree; to clarify, sharp messaging isn't inaccurate messaging. And I believe the story is not overhyped given the evidence: the benchmark resisted a $1M prize pool for ~6 months. But I concede we did obsess over the story to give it the best chance of survival in the marketplace of ideas against the incumbent AI research meme (LLM scaling). Now that the AI research field is coming around to the idea that something beyond deep learning is needed, the story matters less, and the benchmark and its future versions can stand on their utility as a compass towards AGI.</p>
]]></description><pubDate>Fri, 06 Dec 2024 22:55:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=42345467</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42345467</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42345467</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>Correct, fine-tuning is not new. It's long been used to augment foundational LLMs with private data, e.g. private enterprise data. We do this at Zapier, for instance.<p>The new and surprising thing about test-time training (TTT) is how effective an approach it is for dealing with novel abstract reasoning problems like ARC-AGI.<p>TTT was pioneered by Jack Cole last year and popularized this year by several teams, including this winning paper: <a href="https://ekinakyurek.github.io/papers/ttt.pdf" rel="nofollow">https://ekinakyurek.github.io/papers/ttt.pdf</a></p>
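<p>To make the TTT idea concrete, here is a toy sketch (everything in it is illustrative, not the papers' actual setup): instead of freezing the model after pretraining, you run a few gradient steps on the handful of demonstration pairs that ship with the test task itself, then predict. A one-parameter linear model stands in for the LLM.</p>

```python
# Toy illustration of test-time training (TTT): adapt the model to the
# test task's own demonstration pairs before predicting on the test input.
# A one-parameter linear model y = w*x stands in for the LLM; the "rule"
# hidden in the demos below is y = 3x.

def fit_at_test_time(demo_pairs, w=0.0, lr=0.05, steps=200):
    """Gradient descent on mean squared error over the task's own demos."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in demo_pairs) / len(demo_pairs)
        w -= lr * grad
    return w

demos = [(1, 3), (2, 6), (4, 12)]      # demonstration pairs for this task
w = fit_at_test_time(demos)            # adapt at test time
print(round(w * 10))                   # apply adapted model to test input 10 -> 30
```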
]]></description><pubDate>Fri, 06 Dec 2024 22:42:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=42345368</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42345368</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42345368</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>> I'd heartily recommend maybe taking down the marketing vibrance down a notch and keep things a bit more measured, it's not entirely a meme, though some of the more-serious researchers don't take it as seriously as a result.<p>This is a fair critique. ARC Prize's 2024 messaging was sharp to break through the noise floor -- ARC has been around since 2019 but most only learned about it this summer. Now that it has garnered awareness, the sharp messaging is no longer useful, and in some cases it is hurting progress, as you point out. The messaging needs to evolve and mature next year to be more neutral/academic.</p>
]]></description><pubDate>Fri, 06 Dec 2024 22:03:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42345021</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42345021</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42345021</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>Author here -- six months ago we launched ARC Prize, a huge $1M experiment, to test whether we need new ideas for AGI. The ARC-AGI benchmark remains unbeaten, and I think we can now definitively say "yes".<p>One big update since June is that progress is no longer stalled. Coming into 2024, the public consensus vibe was that pure deep learning / LLMs would continue scaling to AGI. The fundamental architecture of these systems hasn't changed since ~2019.<p>But this flipped late summer. AlphaProof and o1 are evidence of this new reality. All frontier AI systems are now incorporating components beyond pure deep learning, like program synthesis and program search.<p>I believe ARC Prize played a role here too. All the winners this year are leveraging new AGI reasoning approaches like deep-learning-guided program synthesis and test-time training/fine-tuning. We'll be seeing a lot more of these in frontier AI systems in the coming years.<p>And I'm proud to say that all the code and papers from this year's winners are now open source!<p>We're going to keep running this thing annually until it's defeated. And we've got ARC-AGI-2 in the works to improve on several of the v1 flaws (more here: <a href="https://arcprize.org/blog/arc-prize-2024-winners-technical-report" rel="nofollow">https://arcprize.org/blog/arc-prize-2024-winners-technical-r...</a>).<p>The ARC-AGI community keeps surprising me: from the initial launch, through o1 testing, to the final 48 hours when the winning team jumped 10% and both winning papers dropped out of nowhere. I'm incredibly grateful to everyone, and we will do our best to steward this attention towards AGI.<p>We'll be back in 2025!</p>
]]></description><pubDate>Fri, 06 Dec 2024 19:52:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=42343621</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42343621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42343621</guid></item><item><title><![CDATA[New comment by mikeknoop in "The surprising effectiveness of test-time training for abstract reasoning [pdf]"]]></title><description><![CDATA[
<p>Context: ARC Prize 2024 just wrapped up yesterday. ARC Prize's goal is to be a north star towards AGI. The two major categories of this year's progress seem to fall into "program synthesis" and "test-time fine-tuning". Both of these techniques are adopted by DeepMind's impressive AlphaProof system [1]. And I'm personally excited to finally see an actual code implementation of these ideas [2]!<p>We still have a long way to go for the grand prize -- we'll be back next year. We've also got some new stuff in the works for 2025.<p>Watch for the official ARC Prize 2024 paper coming Dec 6. We'll be overviewing all the new AI reasoning code and approaches open sourced via the competition [3].<p>[1] <a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/" rel="nofollow">https://deepmind.google/discover/blog/ai-solves-imo-problems...</a><p>[2] <a href="https://github.com/ekinakyurek/marc">https://github.com/ekinakyurek/marc</a><p>[3] <a href="https://x.com/arcprize" rel="nofollow">https://x.com/arcprize</a></p>
]]></description><pubDate>Mon, 11 Nov 2024 17:54:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42109018</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42109018</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42109018</guid></item><item><title><![CDATA[New comment by mikeknoop in "Show HN: Meet.hn – Meet the Hacker News community in your city"]]></title><description><![CDATA[
<p>I met my Zapier co-founder bryanh through HN 15 years ago when someone made a similar service to OP called "hacker newsers". We were the only two people in Missouri at the time which led to a meetup. <a href="https://news.ycombinator.com/item?id=1520916">https://news.ycombinator.com/item?id=1520916</a></p>
]]></description><pubDate>Sun, 15 Sep 2024 00:14:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=41544105</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41544105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41544105</guid></item><item><title><![CDATA[New comment by mikeknoop in "OpenAI o1 Results on ARC-AGI-Pub"]]></title><description><![CDATA[
<p>I personally am slightly surprised at o1's modest performance on ARC-AGI given the large leaps in performance on other objectively hard benchmarks like IOI and AIME.<p>Curiosity is the first step towards new ideas.<p>ARC Prize's whole goal is to inspire curiosity like this and to encourage more AI researchers to explore and openly share new approaches towards AGI.</p>
]]></description><pubDate>Sat, 14 Sep 2024 05:00:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41537580</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41537580</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41537580</guid></item><item><title><![CDATA[New comment by mikeknoop in "OpenAI o1 Results on ARC-AGI-Pub"]]></title><description><![CDATA[
<p>I bet pretty well! Someone should try this. It's likely expensive but sampling could give you confidence to keep going. Ryan's approach costs about $10k to run the full 400 public eval set at current 4o prices -- which is the arbitrary limit we set for the public leaderboard.</p>
]]></description><pubDate>Sat, 14 Sep 2024 04:13:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=41537419</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41537419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41537419</guid></item><item><title><![CDATA[New comment by mikeknoop in "OpenAI o1 Results on ARC-AGI-Pub"]]></title><description><![CDATA[
<p>Author here. Which aspects are misleading? How can it be improved?</p>
]]></description><pubDate>Sat, 14 Sep 2024 04:09:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41537409</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41537409</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41537409</guid></item><item><title><![CDATA[New comment by mikeknoop in "AI solves International Math Olympiad problems at silver medal level"]]></title><description><![CDATA[
<p>High-efficiency "search" is necessary to reach AGI. For example, humans don't search millions of potential answers to beat ARC Prize puzzles. Instead, humans use our core experience to shrink the search space "intuitively" and deterministically check only a handful of ideas. I think deep-learning guided search is an incredibly promising research direction.</p>
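<p>A minimal sketch of that "shrink the search space" idea, where a hand-written scoring function stands in for a learned deep-learning guide (all names here are illustrative): rather than enumerating every candidate program, a guided search expands candidates in priority order and typically checks only a handful.</p>

```python
import heapq

# Sketch of guidance-shrunk search: instead of brute-forcing every
# candidate program, expand candidates best-first by a prior score.
# A hand-written heuristic stands in for a learned (deep-learning) guide.

PRIMITIVES = {
    "flip": lambda g: g[::-1],                          # reverse row order
    "transpose": lambda g: [list(r) for r in zip(*g)],  # swap rows/columns
}

def guide_score(name):
    # A learned model would score primitives from the task; we fake it here.
    return {"flip": 0.9, "transpose": 0.4}.get(name, 0.1)

def guided_search(train_pairs, max_expansions=10):
    """Best-first search over primitives, highest-scored candidate first.
    Returns (program_name, number_of_expansions_used)."""
    frontier = [(-guide_score(n), n, f) for n, f in PRIMITIVES.items()]
    heapq.heapify(frontier)
    expansions = 0
    while frontier and expansions < max_expansions:
        _, name, prog = heapq.heappop(frontier)
        expansions += 1
        if all(prog(i) == o for i, o in train_pairs):
            return name, expansions
    return None, expansions

task = [([[1, 2], [3, 4]], [[3, 4], [1, 2]])]  # hidden rule: flip rows
print(guided_search(task))  # finds "flip" on the very first expansion
```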
]]></description><pubDate>Thu, 25 Jul 2024 19:30:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=41072438</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41072438</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41072438</guid></item><item><title><![CDATA[New comment by mikeknoop in "Getting 50% (SoTA) on Arc-AGI with GPT-4o"]]></title><description><![CDATA[
<p>ARC isn't perfect and I hope ARC is not the last AGI benchmark. I've spoken with a few other benchmark creators looking to emulate ARC's novelty in other domains, so I think we'll see more. AGI benchmarks will likely need to evolve alongside the tech -- humans have to design these tasks today to ensure novelty, but we should expect that to shift.<p>One core idea we've been advocating with ARC is that pure LLM scaling (parameters...) is insufficient to achieve AGI. Something new is needed. And OP's approach using a novel outer loop is one cool demonstration of this.</p>
]]></description><pubDate>Tue, 18 Jun 2024 05:36:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40714361</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40714361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40714361</guid></item><item><title><![CDATA[New comment by mikeknoop in "Getting 50% (SoTA) on Arc-AGI with GPT-4o"]]></title><description><![CDATA[
<p>(ARC Prize co-founder here.)<p>Ryan's work is legitimately interesting and novel "LLM reasoning" research! The core idea:<p>> get GPT-4o to generate around 8,000 python programs which attempt to implement the transformation, select a program which is right on all the examples (usually there are 3 examples), and then submit the output this function produces when applied to the additional test input(s)<p>Roughly, he's implemented an outer loop, using 4o to sample reasoning traces/programs from the training data and then testing them. Hybrid DL + program synthesis approaches are solutions we'd love to see more of.<p>A couple of important notes:<p>1. This result is on the public eval set vs the private set (ARC Prize $).<p>2. The current private-set SOTA ~35% solution also performed ~50% on the public set. So this new result <i>might</i> be SOTA but hasn't been validated or scrutinized yet.<p>All said, I do expect verified public-set results to flow down to the private set over time. We'll be publishing all the SOTA scores and open source reproductions here once available: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a><p>EDIT: also, congrats and kudos to Ryan for achieving this and putting the effort in to document and share his approach. We hope to inspire more frontier AI research sharing like this.</p>
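<p>A minimal sketch of that sample-then-filter outer loop (in the real approach the candidate programs are Python sampled from GPT-4o; here a tiny hand-written pool stands in so the selection logic is visible):</p>

```python
# Sketch of the outer loop described above: generate candidate programs,
# keep one that reproduces every train example, apply it to the test input.
# A hand-written candidate pool stands in for GPT-4o-sampled programs.

def candidate_identity(grid):
    return grid

def candidate_transpose(grid):
    return [list(row) for row in zip(*grid)]

def candidate_flip_rows(grid):
    return grid[::-1]

CANDIDATES = [candidate_identity, candidate_transpose, candidate_flip_rows]

def solve(train_pairs, test_input):
    """Select the first candidate program that is right on all the
    train examples, then submit its output on the test input
    (None if no candidate fits)."""
    for program in CANDIDATES:
        if all(program(inp) == out for inp, out in train_pairs):
            return program(test_input)
    return None

train = [([[1, 2], [3, 4]], [[3, 4], [1, 2]])]  # output = rows reversed
print(solve(train, [[5, 6], [7, 8]]))  # -> [[7, 8], [5, 6]]
```

<p>The filter step is the whole trick: with ~3 train pairs per task, a program that fits all of them by accident is rare, so passing the filter is strong evidence the sampled program captured the true transformation.</p>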
]]></description><pubDate>Mon, 17 Jun 2024 23:14:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40712282</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40712282</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40712282</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>Yes there is a secondary leaderboard called ARC-AGI-Pub (in beta) with no limitations: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a></p>
]]></description><pubDate>Wed, 12 Jun 2024 15:19:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=40659100</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40659100</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40659100</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>(You can direct link to a task like this: <a href="https://arcprize.org/play?task=009d5c81" rel="nofollow">https://arcprize.org/play?task=009d5c81</a> in case you want to share!)</p>
]]></description><pubDate>Tue, 11 Jun 2024 23:41:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652941</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652941</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652941</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>Here is some published research on the human difficulty of ARC-AGI: <a href="https://cims.nyu.edu/~brenden/papers/JohnsonEtAl2021CogSci.pdf" rel="nofollow">https://cims.nyu.edu/~brenden/papers/JohnsonEtAl2021CogSci.p...</a><p>> We found that humans were able to infer the underlying program
and generate the correct test output for a novel test input example, with an average of 84% of tasks solved per participant</p>
]]></description><pubDate>Tue, 11 Jun 2024 22:53:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652600</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652600</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652600</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>That is correct for ARC Prize: limited Kaggle compute (to target efficiency) and no internet (to reduce cheating).<p>We are also trialing a secondary leaderboard called ARC-AGI-Pub that imposes no limits or constraints. Not part of the prize today but could be in the future: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a></p>
]]></description><pubDate>Tue, 11 Jun 2024 22:48:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652563</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652563</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>I agree, $1M is ~trivial in AI. The primary goal with the prize is to raise public awareness about how close (or far, today) we are from AGI: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a> and we hope that understanding will shift more would-be AI researchers to working on new ideas.</p>
]]></description><pubDate>Tue, 11 Jun 2024 22:46:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652544</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652544</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652544</guid></item><item><title><![CDATA[ARC Prize – a $1M+ competition towards open AGI progress]]></title><description><![CDATA[
<p>Hey folks! Mike here. Francois Chollet and I are launching ARC Prize, a public competition to beat and open-source the solution to the ARC-AGI eval.<p>ARC-AGI is (to our knowledge) the only eval which measures AGI: a system that can efficiently acquire new skills and solve novel, open-ended problems. Most AI evals measure skill directly vs the acquisition of new skill.<p>Francois created the eval in 2019; SOTA was 20% at inception, and SOTA today is only 34%. Humans score 85-100%. 300 teams attempted ARC-AGI last year, and several bigger labs have attempted it.<p>While most other skill-based evals have rapidly saturated to human level, ARC-AGI was designed to resist “memorization” techniques (e.g. LLMs).<p>Solving ARC-AGI tasks is quite easy for humans (even children) but impossible for modern AI. You can try ARC-AGI tasks yourself here: <a href="https://arcprize.org/play" rel="nofollow">https://arcprize.org/play</a><p>ARC-AGI consists of 400 public training tasks, 400 public test tasks, and 100 secret test tasks. Every task is novel. SOTA is measured against the secret test set, which adds to the robustness of the eval.<p>Solving ARC-AGI tasks requires no world knowledge and no understanding of language. Instead, each puzzle requires a small set of “core knowledge priors” (goal directedness, objectness, symmetry, rotation, etc.)<p>At minimum, a solution to ARC-AGI opens up a completely new programming paradigm where programs can perfectly and reliably generalize from an arbitrary set of priors. At maximum, it unlocks the tech tree towards AGI.<p>Our goals with this competition are:<p>1. Increase the number of researchers working on frontier AGI research (vs tinkering with LLMs). We need new ideas, and the solution is likely to come from an outsider!
2. Establish a popular, objective measure of AGI progress that the public can use to understand how close we are to AGI (or not). Every new SOTA score will be published here: <a href="https://x.com/arcprize" rel="nofollow">https://x.com/arcprize</a>
3. Beat ARC-AGI and learn something new about the nature of intelligence.<p>Happy to answer questions!</p>
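<p>For a feel of what solving a task involves, here is a toy sketch (the grids and the hypothesized rule below are made up, but the shape matches the public tasks, which are JSON files of train/test input-output grids of integers 0-9):</p>

```python
import json

# ARC-AGI tasks are small JSON files: a few train input/output grid pairs
# plus test inputs. This toy task's hidden rule is "reverse the row order";
# a solver must infer that from the train pairs alone.

task_json = """{
  "train": [
    {"input": [[0, 1], [2, 0]], "output": [[2, 0], [0, 1]]},
    {"input": [[3, 3], [0, 4]], "output": [[0, 4], [3, 3]]}
  ],
  "test": [{"input": [[5, 0], [0, 6]]}]
}"""

task = json.loads(task_json)

def reflect_vertically(grid):
    return grid[::-1]

# Verify the hypothesized rule against every train pair before trusting it.
assert all(reflect_vertically(p["input"]) == p["output"]
           for p in task["train"])
print(reflect_vertically(task["test"][0]["input"]))  # -> [[0, 6], [5, 0]]
```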
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40648960">https://news.ycombinator.com/item?id=40648960</a></p>
<p>Points: 588</p>
<p># Comments: 337</p>
]]></description><pubDate>Tue, 11 Jun 2024 17:19:41 +0000</pubDate><link>https://arcprize.org/blog/launch</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40648960</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40648960</guid></item></channel></rss>