<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mikeknoop</title><link>https://news.ycombinator.com/user?id=mikeknoop</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 03 May 2026 20:24:44 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mikeknoop" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mikeknoop in "Ti-84 Evo"]]></title><description><![CDATA[
<p>Fun memory trip. Learned assembly on those old Z80s in middle school. I had to go re-dig up SafeGuard, a program I made by reverse engineering TI's TestGuard, to stop admins from wiping your calculator memory and all your games! <a href="https://mikeknoop.com/upload/safeguard/" rel="nofollow">https://mikeknoop.com/upload/safeguard/</a></p>
]]></description><pubDate>Fri, 01 May 2026 23:22:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47981659</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=47981659</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47981659</guid></item><item><title><![CDATA[New comment by mikeknoop in "Recent results show that LLMs struggle with compositional tasks"]]></title><description><![CDATA[
<p>One must now ask whether research results are analyzing pure LLMs (e.g. the gpt-series) or LLM synthesis engines (e.g. the o-series, r-series). In this case, the headline summarizes a paper originally published in 2023 and does not necessarily have bearing on the new synthesis engines. In fact, the evidence strongly suggests the opposite, given o3's significant performance on ARC-AGI-1, which requires on-the-fly composition capability.</p>
]]></description><pubDate>Sun, 02 Feb 2025 06:13:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=42906402</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42906402</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42906402</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>I think we agree; to clarify, sharp messaging isn't inaccurate messaging. And I believe the story is not overhyped given the evidence: the benchmark resisted a $1M prize pool for ~6 months. But I concede we did obsess over the story to give it the best chance of survival in the marketplace of ideas against the incumbent AI research meme (LLM scaling). Now that the AI research field is coming around to the idea that something beyond deep learning is needed, the story matters less, and the benchmark and its future versions can stand on their utility as a compass towards AGI.</p>
]]></description><pubDate>Fri, 06 Dec 2024 22:55:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=42345467</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42345467</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42345467</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>Correct, fine-tuning is not new. It's long been used to augment foundational LLMs with private data, e.g. private enterprise data. We do this at Zapier, for instance.<p>The new and surprising thing about test-time training (TTT) is how effective an approach it is for dealing with novel abstract reasoning problems like ARC-AGI.<p>TTT was pioneered by Jack Cole last year and popularized this year by several teams, including this winning paper: <a href="https://ekinakyurek.github.io/papers/ttt.pdf" rel="nofollow">https://ekinakyurek.github.io/papers/ttt.pdf</a></p>
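<p>To make the TTT idea concrete, here is a toy sketch (everything in it is illustrative, not the papers' actual setup): instead of freezing the model after pretraining, you run a few gradient steps on the handful of demonstration pairs that ship with the test task itself, then predict. A one-parameter linear model stands in for the LLM.</p>

```python
# Toy illustration of test-time training (TTT): adapt the model to the
# test task's own demonstration pairs before predicting on the test input.
# A one-parameter linear model y = w*x stands in for the LLM; the "rule"
# hidden in the demos below is y = 3x.

def fit_at_test_time(demo_pairs, w=0.0, lr=0.05, steps=200):
    """Gradient descent on mean squared error over the task's own demos."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in demo_pairs) / len(demo_pairs)
        w -= lr * grad
    return w

demos = [(1, 3), (2, 6), (4, 12)]      # demonstration pairs for this task
w = fit_at_test_time(demos)            # adapt at test time
print(round(w * 10))                   # apply adapted model to test input 10 -> 30
```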
]]></description><pubDate>Fri, 06 Dec 2024 22:42:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=42345368</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42345368</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42345368</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>> I'd heartily recommend maybe taking down the marketing vibrance down a notch and keep things a bit more measured, it's not entirely a meme, though some of the more-serious researchers don't take it as seriously as a result.<p>This is a fair critique. ARC Prize's 2024 messaging was sharp to break through the noise floor -- ARC has been around since 2019 but most only learned about it this summer. Now that it has garnered awareness, the sharp messaging is no longer useful, and in some cases it is hurting progress, as you point out. The messaging needs to evolve and mature next year to be more neutral/academic.</p>
]]></description><pubDate>Fri, 06 Dec 2024 22:03:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42345021</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42345021</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42345021</guid></item><item><title><![CDATA[New comment by mikeknoop in "Arc Prize 2024 Winners and Technical Report"]]></title><description><![CDATA[
<p>Author here -- six months ago we launched ARC Prize, a huge $1M experiment, to test whether we need new ideas for AGI. The ARC-AGI benchmark remains unbeaten, and I think we can now definitively say "yes".<p>One big update since June is that progress is no longer stalled. Coming into 2024, the public consensus vibe was that pure deep learning / LLMs would continue scaling to AGI. The fundamental architecture of these systems hasn't changed since ~2019.<p>But this flipped late summer. AlphaProof and o1 are evidence of this new reality. All frontier AI systems are now incorporating components beyond pure deep learning, like program synthesis and program search.<p>I believe ARC Prize played a role here too. All the winners this year are leveraging new AGI reasoning approaches like deep-learning-guided program synthesis and test-time training/fine-tuning. We'll be seeing a lot more of these in frontier AI systems in the coming years.<p>And I'm proud to say that all the code and papers from this year's winners are now open source!<p>We're going to keep running this thing annually until it's defeated. And we've got ARC-AGI-2 in the works to improve on several of the v1 flaws (more here: <a href="https://arcprize.org/blog/arc-prize-2024-winners-technical-report" rel="nofollow">https://arcprize.org/blog/arc-prize-2024-winners-technical-r...</a>).<p>The ARC-AGI community keeps surprising me: from the initial launch, through o1 testing, to the final 48 hours when the winning team jumped 10% and both winning papers dropped out of nowhere. I'm incredibly grateful to everyone, and we will do our best to steward this attention towards AGI.<p>We'll be back in 2025!</p>
]]></description><pubDate>Fri, 06 Dec 2024 19:52:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=42343621</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42343621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42343621</guid></item><item><title><![CDATA[New comment by mikeknoop in "The surprising effectiveness of test-time training for abstract reasoning [pdf]"]]></title><description><![CDATA[
<p>Context: ARC Prize 2024 just wrapped up yesterday. ARC Prize's goal is to be a north star towards AGI. The two major categories of this year's progress seem to fall into "program synthesis" and "test-time fine-tuning". Both of these techniques are adopted by DeepMind's impressive AlphaProof system [1]. And I'm personally excited to finally see an actual code implementation of these ideas [2]!<p>We still have a long way to go for the grand prize -- we'll be back next year. We've also got some new stuff in the works for 2025.<p>Watch for the official ARC Prize 2024 paper coming Dec 6. We'll be overviewing all the new AI reasoning code and approaches open sourced via the competition [3].<p>[1] <a href="https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/" rel="nofollow">https://deepmind.google/discover/blog/ai-solves-imo-problems...</a><p>[2] <a href="https://github.com/ekinakyurek/marc">https://github.com/ekinakyurek/marc</a><p>[3] <a href="https://x.com/arcprize" rel="nofollow">https://x.com/arcprize</a></p>
]]></description><pubDate>Mon, 11 Nov 2024 17:54:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42109018</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=42109018</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42109018</guid></item><item><title><![CDATA[New comment by mikeknoop in "Show HN: Meet.hn – Meet the Hacker News community in your city"]]></title><description><![CDATA[
<p>I met my Zapier co-founder bryanh through HN 15 years ago when someone made a similar service to OP called "hacker newsers". We were the only two people in Missouri at the time which led to a meetup. <a href="https://news.ycombinator.com/item?id=1520916">https://news.ycombinator.com/item?id=1520916</a></p>
]]></description><pubDate>Sun, 15 Sep 2024 00:14:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=41544105</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41544105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41544105</guid></item><item><title><![CDATA[New comment by mikeknoop in "OpenAI o1 Results on ARC-AGI-Pub"]]></title><description><![CDATA[
<p>I personally am slightly surprised at o1's modest performance on ARC-AGI given the large leaps in performance on other objectively hard benchmarks like IOI and AIME.<p>Curiosity is the first step towards new ideas.<p>ARC Prize's whole goal is to inspire curiosity like this and to encourage more AI researchers to explore and openly share new approaches towards AGI.</p>
]]></description><pubDate>Sat, 14 Sep 2024 05:00:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41537580</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41537580</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41537580</guid></item><item><title><![CDATA[New comment by mikeknoop in "OpenAI o1 Results on ARC-AGI-Pub"]]></title><description><![CDATA[
<p>I bet pretty well! Someone should try this. It's likely expensive but sampling could give you confidence to keep going. Ryan's approach costs about $10k to run the full 400 public eval set at current 4o prices -- which is the arbitrary limit we set for the public leaderboard.</p>
]]></description><pubDate>Sat, 14 Sep 2024 04:13:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=41537419</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41537419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41537419</guid></item><item><title><![CDATA[New comment by mikeknoop in "OpenAI o1 Results on ARC-AGI-Pub"]]></title><description><![CDATA[
<p>Author here. Which aspects are misleading? How can it be improved?</p>
]]></description><pubDate>Sat, 14 Sep 2024 04:09:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41537409</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41537409</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41537409</guid></item><item><title><![CDATA[New comment by mikeknoop in "AI solves International Math Olympiad problems at silver medal level"]]></title><description><![CDATA[
<p>High-efficiency "search" is necessary to reach AGI. For example, humans don't search millions of potential answers to beat ARC Prize puzzles. Instead, humans use our core experience to shrink the search space "intuitively" and deterministically check only a handful of ideas. I think deep-learning guided search is an incredibly promising research direction.</p>
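<p>A minimal sketch of that "shrink the search space" idea, where a hand-written scoring function stands in for a learned deep-learning guide (all names here are illustrative): rather than enumerating every candidate program, a guided search expands candidates in priority order and typically checks only a handful.</p>

```python
import heapq

# Sketch of guidance-shrunk search: instead of brute-forcing every
# candidate program, expand candidates best-first by a prior score.
# A hand-written heuristic stands in for a learned (deep-learning) guide.

PRIMITIVES = {
    "flip": lambda g: g[::-1],                          # reverse row order
    "transpose": lambda g: [list(r) for r in zip(*g)],  # swap rows/columns
}

def guide_score(name):
    # A learned model would score primitives from the task; we fake it here.
    return {"flip": 0.9, "transpose": 0.4}.get(name, 0.1)

def guided_search(train_pairs, max_expansions=10):
    """Best-first search over primitives, highest-scored candidate first.
    Returns (program_name, number_of_expansions_used)."""
    frontier = [(-guide_score(n), n, f) for n, f in PRIMITIVES.items()]
    heapq.heapify(frontier)
    expansions = 0
    while frontier and expansions < max_expansions:
        _, name, prog = heapq.heappop(frontier)
        expansions += 1
        if all(prog(i) == o for i, o in train_pairs):
            return name, expansions
    return None, expansions

task = [([[1, 2], [3, 4]], [[3, 4], [1, 2]])]  # hidden rule: flip rows
print(guided_search(task))  # finds "flip" on the very first expansion
```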
]]></description><pubDate>Thu, 25 Jul 2024 19:30:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=41072438</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=41072438</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41072438</guid></item><item><title><![CDATA[New comment by mikeknoop in "Getting 50% (SoTA) on Arc-AGI with GPT-4o"]]></title><description><![CDATA[
<p>ARC isn't perfect and I hope ARC is not the last AGI benchmark. I've spoken with a few other benchmark creators looking to emulate ARC's novelty in other domains, so I think we'll see more. AGI benchmarks will likely need to evolve alongside the tech -- humans have to design these tasks today to ensure novelty, but we should expect that to shift.<p>One core idea we've been advocating with ARC is that pure LLM scaling (parameters...) is insufficient to achieve AGI. Something new is needed. And OP's approach using a novel outer loop is one cool demonstration of this.</p>
]]></description><pubDate>Tue, 18 Jun 2024 05:36:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40714361</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40714361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40714361</guid></item><item><title><![CDATA[New comment by mikeknoop in "Getting 50% (SoTA) on Arc-AGI with GPT-4o"]]></title><description><![CDATA[
<p>(ARC Prize co-founder here.)<p>Ryan's work is legitimately interesting and novel "LLM reasoning" research! The core idea:<p>> get GPT-4o to generate around 8,000 python programs which attempt to implement the transformation, select a program which is right on all the examples (usually there are 3 examples), and then submit the output this function produces when applied to the additional test input(s)<p>Roughly, he's implemented an outer loop, using 4o to sample reasoning traces/programs from the training data and then testing them. Hybrid DL + program synthesis approaches are solutions we'd love to see more of.<p>A couple of important notes:<p>1. This result is on the public eval set vs the private set (ARC Prize $).<p>2. The current private-set SOTA ~35% solution also performed ~50% on the public set. So this new result <i>might</i> be SOTA but hasn't been validated or scrutinized yet.<p>All said, I do expect verified public-set results to flow down to the private set over time. We'll be publishing all the SOTA scores and open source reproductions here once available: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a><p>EDIT: also, congrats and kudos to Ryan for achieving this and putting the effort in to document and share his approach. We hope to inspire more frontier AI research sharing like this.</p>
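<p>A minimal sketch of that sample-then-filter outer loop (in the real approach the candidate programs are Python sampled from GPT-4o; here a tiny hand-written pool stands in so the selection logic is visible):</p>

```python
# Sketch of the outer loop described above: generate candidate programs,
# keep one that reproduces every train example, apply it to the test input.
# A hand-written candidate pool stands in for GPT-4o-sampled programs.

def candidate_identity(grid):
    return grid

def candidate_transpose(grid):
    return [list(row) for row in zip(*grid)]

def candidate_flip_rows(grid):
    return grid[::-1]

CANDIDATES = [candidate_identity, candidate_transpose, candidate_flip_rows]

def solve(train_pairs, test_input):
    """Select the first candidate program that is right on all the
    train examples, then submit its output on the test input
    (None if no candidate fits)."""
    for program in CANDIDATES:
        if all(program(inp) == out for inp, out in train_pairs):
            return program(test_input)
    return None

train = [([[1, 2], [3, 4]], [[3, 4], [1, 2]])]  # output = rows reversed
print(solve(train, [[5, 6], [7, 8]]))  # -> [[7, 8], [5, 6]]
```

<p>The filter step is the whole trick: with ~3 train pairs per task, a program that fits all of them by accident is rare, so passing the filter is strong evidence the sampled program captured the true transformation.</p>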
]]></description><pubDate>Mon, 17 Jun 2024 23:14:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40712282</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40712282</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40712282</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>Yes there is a secondary leaderboard called ARC-AGI-Pub (in beta) with no limitations: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a></p>
]]></description><pubDate>Wed, 12 Jun 2024 15:19:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=40659100</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40659100</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40659100</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>(You can direct link to a task like this: <a href="https://arcprize.org/play?task=009d5c81" rel="nofollow">https://arcprize.org/play?task=009d5c81</a> in case you want to share!)</p>
]]></description><pubDate>Tue, 11 Jun 2024 23:41:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652941</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652941</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652941</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>Here is some published research on the human difficulty of ARC-AGI: <a href="https://cims.nyu.edu/~brenden/papers/JohnsonEtAl2021CogSci.pdf" rel="nofollow">https://cims.nyu.edu/~brenden/papers/JohnsonEtAl2021CogSci.p...</a><p>> We found that humans were able to infer the underlying program
and generate the correct test output for a novel test input example, with an average of 84% of tasks solved per participant</p>
]]></description><pubDate>Tue, 11 Jun 2024 22:53:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652600</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652600</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652600</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>That is correct for ARC Prize: limited Kaggle compute (to target efficiency) and no internet (to reduce cheating).<p>We are also trialing a secondary leaderboard called ARC-AGI-Pub that imposes no limits or constraints. Not part of the prize today but could be in the future: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a></p>
]]></description><pubDate>Tue, 11 Jun 2024 22:48:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652563</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652563</guid></item><item><title><![CDATA[New comment by mikeknoop in "ARC Prize – a $1M+ competition towards open AGI progress"]]></title><description><![CDATA[
<p>I agree, $1M is ~trivial in AI. The primary goal with the prize is to raise public awareness about how close (or far, today) we are from AGI: <a href="https://arcprize.org/leaderboard" rel="nofollow">https://arcprize.org/leaderboard</a> and we hope that understanding will shift more would-be AI researchers to working on new ideas.</p>
]]></description><pubDate>Tue, 11 Jun 2024 22:46:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=40652544</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40652544</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40652544</guid></item><item><title><![CDATA[ARC Prize – a $1M+ competition towards open AGI progress]]></title><description><![CDATA[
<p>Hey folks! Mike here. Francois Chollet and I are launching ARC Prize, a public competition to beat and open-source the solution to the ARC-AGI eval.<p>ARC-AGI is (to our knowledge) the only eval which measures AGI: a system that can efficiently acquire new skills and solve novel, open-ended problems. Most AI evals measure skill directly vs the acquisition of new skill.<p>Francois created the eval in 2019; SOTA was 20% at inception, and SOTA today is only 34%. Humans score 85-100%. 300 teams attempted ARC-AGI last year, and several bigger labs have attempted it.<p>While most other skill-based evals have rapidly saturated to human level, ARC-AGI was designed to resist “memorization” techniques (e.g. LLMs).<p>Solving ARC-AGI tasks is quite easy for humans (even children) but impossible for modern AI. You can try ARC-AGI tasks yourself here: <a href="https://arcprize.org/play" rel="nofollow">https://arcprize.org/play</a><p>ARC-AGI consists of 400 public training tasks, 400 public test tasks, and 100 secret test tasks. Every task is novel. SOTA is measured against the secret test set, which adds to the robustness of the eval.<p>Solving ARC-AGI tasks requires no world knowledge and no understanding of language. Instead, each puzzle requires a small set of “core knowledge priors” (goal directedness, objectness, symmetry, rotation, etc.)<p>At minimum, a solution to ARC-AGI opens up a completely new programming paradigm where programs can perfectly and reliably generalize from an arbitrary set of priors. At maximum, it unlocks the tech tree towards AGI.<p>Our goals with this competition are:<p>1. Increase the number of researchers working on frontier AGI research (vs tinkering with LLMs). We need new ideas, and the solution is likely to come from an outsider!
2. Establish a popular, objective measure of AGI progress that the public can use to understand how close we are to AGI (or not). Every new SOTA score will be published here: <a href="https://x.com/arcprize" rel="nofollow">https://x.com/arcprize</a>
3. Beat ARC-AGI and learn something new about the nature of intelligence.<p>Happy to answer questions!</p>
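<p>For a feel of what solving a task involves, here is a toy sketch (the grids and the hypothesized rule below are made up, but the shape matches the public tasks, which are JSON files of train/test input-output grids of integers 0-9):</p>

```python
import json

# ARC-AGI tasks are small JSON files: a few train input/output grid pairs
# plus test inputs. This toy task's hidden rule is "reverse the row order";
# a solver must infer that from the train pairs alone.

task_json = """{
  "train": [
    {"input": [[0, 1], [2, 0]], "output": [[2, 0], [0, 1]]},
    {"input": [[3, 3], [0, 4]], "output": [[0, 4], [3, 3]]}
  ],
  "test": [{"input": [[5, 0], [0, 6]]}]
}"""

task = json.loads(task_json)

def reflect_vertically(grid):
    return grid[::-1]

# Verify the hypothesized rule against every train pair before trusting it.
assert all(reflect_vertically(p["input"]) == p["output"]
           for p in task["train"])
print(reflect_vertically(task["test"][0]["input"]))  # -> [[0, 6], [5, 0]]
```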
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40648960">https://news.ycombinator.com/item?id=40648960</a></p>
<p>Points: 588</p>
<p># Comments: 337</p>
]]></description><pubDate>Tue, 11 Jun 2024 17:19:41 +0000</pubDate><link>https://arcprize.org/blog/launch</link><dc:creator>mikeknoop</dc:creator><comments>https://news.ycombinator.com/item?id=40648960</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40648960</guid></item></channel></rss>