Hacker News: gkamradt

New comment by gkamradt in "Ask HN: What are you working on? (June 2026)"

gkamradt — Sun, 14 Jun 2026 21:15:43 +0000

Two projects

1. Share a single Google Doc with your agent (w/o OAuth mess)

I needed a way to share a single google doc/sheet with my agent

I didn’t want to go through the heavy OAuth GCP project setup, so I’m using disposable email addresses as the workaround

2. Agents.sh

I get so many cold emails that could be better if I tell the bots how to talk to and reach me. What’s top of mind for me, how I like to be pitched, etc.

So I made a mini platform to put up text/md files. Then added all the perms fun - pw support, expiration, every url has an inbox. Aimed at agents only.

Ex: https://agnts.sh/greg

Show HN: ARC-AGI-3 Toolkit

gkamradt — Thu, 29 Jan 2026 21:01:27 +0000

Article URL: https://docs.arcprize.org

Comments URL: https://news.ycombinator.com/item?id=46816529

Points: 9

# Comments: 1

New comment by gkamradt in "OpenAI o3-pro"

gkamradt — Tue, 10 Jun 2025 21:26:51 +0000

o3-pro is not the same as the o3-preview that was shown in Dec '24. OpenAI confirmed this for us. More on that here: https://x.com/arcprize/status/1932535380865347585

New comment by gkamradt in "Arc-AGI-2 and ARC Prize 2025"

gkamradt — Mon, 24 Mar 2025 23:58:58 +0000

Ah yes, two things

1. We had a no-data retention agreement with them. We were assured by the highest level of their company + security division that the box our test was run on would be wiped after testing

2. We only tested o3 against the semi-private set. We didn't test it with the private eval.

New comment by gkamradt in "Arc-AGI-2 and ARC Prize 2025"

gkamradt — Mon, 24 Mar 2025 23:47:17 +0000

#4 (private test set) doesn't get used for any public model testing. It is only used on the Kaggle leaderboard where no internet access is allowed.

New comment by gkamradt in "Arc-AGI-2 and ARC Prize 2025"

gkamradt — Mon, 24 Mar 2025 23:46:06 +0000

Good question! This was one of the main motivations of our "Paper Prize" track. We wanted to reward conceptual progress vs leaderboard chasing. In fact, when we increased the prizes mid year we awarded more money towards the paper track vs top score.

We had 40 papers submitted last year and 8 were awarded prizes. [1]

One of the main teams, MindsAI, just published their paper on their novel test-time fine-tuning approach. [2]

Jan/Daniel (1st place winners last year) talk all about their progress and journey building out here [3]. Stories like theirs help push the field forward.

[1] https://arcprize.org/blog/arc-prize-2024-winners-technical-r...

[2] https://github.com/MohamedOsman1998/deep-learning-for-arc/bl...

[3] https://www.youtube.com/watch?v=mTX_sAq--zY

New comment by gkamradt in "Arc-AGI-2 and ARC Prize 2025"

gkamradt — Mon, 24 Mar 2025 21:00:52 +0000

We have a few sets:

1. Public Train - 1,000 tasks that are public

2. Public Eval - 120 tasks that are public

So for those two we don't have protections.

3. Semi Private Eval - 120 tasks that are exposed to 3rd parties. We sign data agreements where we can, but we understand this is exposed and not 100% secure. It's a risk we are open to in order to keep testing velocity. In theory it is very difficult to secure this 100%. The cost to create a new semi-private test set is lower than the effort needed to secure it 100%.

4. Private Eval - Only on Kaggle, not exposed to any 3rd parties at all. Very few people have access to this. Our trust vectors are with Kaggle and the internal team only.

New comment by gkamradt in "Arc-AGI-2 and ARC Prize 2025"

gkamradt — Mon, 24 Mar 2025 20:37:02 +0000

Hey HN, Greg from ARC Prize Foundation here.

Alongside Mike Knoop and François Chollet, we’re launching ARC-AGI-2, a frontier AI benchmark that measures a model’s ability to generalize on tasks it hasn’t seen before, and the ARC Prize 2025 competition to beat it.

In Dec ‘24, ARC-AGI-1 (2019) pinpointed the moment AI moved beyond pure memorization as seen by OpenAI's o3.

ARC-AGI-2 targets test-time reasoning.

My view is that good AI benchmarks don't just measure progress, they inspire it. Our mission is to guide research towards general systems.

Base LLMs (no reasoning) are currently scoring 0% on ARC-AGI-2. Specialized AI reasoning systems (like R1 or o3-mini) are <4%.

Every ARC-AGI-2 task (100%), however, has been solved by at least two humans, quickly and easily. We know this because we tested 400 people live.

Our belief is that once we can no longer come up with quantifiable problems that are "feasible for humans and hard for AI" then we effectively have AGI. ARC-AGI-2 proves that we do not have AGI.

Change log from ARC-AGI-1 to ARC-AGI-2:

* The two main evaluation sets (semi-private, private eval) have increased to 120 tasks

* Solving tasks requires more reasoning vs pure intuition

* Each task has been confirmed to have been solved by at least 2 people (many more) out of an average of 7 test takers in 2 attempts or less

* Non-training task sets are now difficulty-calibrated
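To make the human-validation rule in the change log concrete, here is a minimal sketch of how such a filter could be expressed. The data layout (one entry per test taker, holding the attempt number on which they solved the task, or None if they never did) and all names are assumptions for illustration, not the foundation's actual tooling.

```python
from typing import List, Optional

MIN_SOLVERS = 2   # "solved by at least 2 people"
MAX_ATTEMPTS = 2  # "in 2 attempts or less"


def is_human_validated(attempts_to_solve: List[Optional[int]]) -> bool:
    """Return True if enough test takers solved the task within the attempt limit.

    attempts_to_solve is a hypothetical record: for each test taker, the
    attempt number on which they solved the task, or None if unsolved.
    """
    solvers = sum(
        1 for attempt in attempts_to_solve
        if attempt is not None and attempt <= MAX_ATTEMPTS
    )
    return solvers >= MIN_SOLVERS


# Seven hypothetical test takers; two of them solved it within two attempts.
print(is_human_validated([1, None, 2, None, None, 3, None]))
```

A task where only one person solved it within two attempts, such as `[1, 3, None]`, would be rejected under this rule.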

The 2025 Prize ($1M, open-source required) is designed to drive progress on this specific gap. Last year's competition (also launched on HN) had 1.5K teams participate and had 40+ research papers published.

The Kaggle competition goes live later this week and you can sign up here: https://arcprize.org/competition

We're in an idea-constrained environment. The next AGI breakthrough might come from you, not a giant lab.

Happy to answer questions.

Arc-AGI-2 and ARC Prize 2025

gkamradt — Mon, 24 Mar 2025 20:35:30 +0000

Article URL: https://arcprize.org/blog/announcing-arc-agi-2-and-arc-prize-2025

Comments URL: https://news.ycombinator.com/item?id=43465147

Points: 188

# Comments: 101

How the cofounder of Zapier recruited me to run a $1M AI competition

gkamradt — Wed, 23 Oct 2024 15:11:35 +0000

Article URL: https://gregkamradt.com/writing/arc_prize

Comments URL: https://news.ycombinator.com/item?id=41925844

Points: 2

# Comments: 0

Scaling LLMs apps via accuracy, latency, cost

gkamradt — Thu, 03 Oct 2024 14:03:48 +0000

Article URL: https://www.leverage.to/learn/dev/posts/scaling_llm_apps

Comments URL: https://news.ycombinator.com/item?id=41730915

Points: 1

# Comments: 0

New comment by gkamradt in "ARC Prize – a $1M+ competition towards open AGI progress"

gkamradt — Tue, 11 Jun 2024 22:41:52 +0000

Check out the SOTA resources on the guide

https://arcprize.org/guide

Happy to answer any questions you have along the way

(I'm helping run ARC Prize)

New comment by gkamradt in "ARC Prize – a $1M+ competition towards open AGI progress"

gkamradt — Tue, 11 Jun 2024 22:41:27 +0000

We put a bunch of detail on how to get started in the guide: https://arcprize.org/guide

Happy to answer any questions you have along the way

(I'm helping run ARC Prize)

Show HN: I Built a Semantic De-Duplicator

gkamradt — Wed, 01 Nov 2023 16:59:17 +0000

Hey HN Crew!

We all have lists...and they can be annoying to de-duplicate.

* User feedback

* Groceries

* Employee Surveys

* Bug reports

* You name it

Most ways to consolidate like-items work off of keywords or, worse, exact phrases (Sheets/Excel).

But LLMs are much better at understanding an item's semantic meaning and determining if two items should be combined or not.

I decided to build my first python package, The Semantic Deduplicator, to help me consolidate items based on their meaning, not keywords.

For example, on groceries:

['We need more berries', 'I want more more milk', 'Can we get more carbonated water please?', 'We need more sparkling water']

...deduplicated...

['Berries', 'Milk', 'Sparkling Water']

How it works:

1. Start with an empty list ready to populate

2. The first item you add will get 1) transformed into a clean name (user feedback > product request) and 2) added to the list

3. While you're adding more items

* Check to see if your new item's embedding is close to any existing item

* If so, ask the LLM to compare your two items to see if they should be combined

* If so, combine them
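The steps above can be sketched roughly as follows. This is not the package's code: the bag-of-words `embed` function stands in for a real embedding model, `toy_clean_name` and `toy_should_combine` stand in for the two LLM calls, and the stop-word list and 0.3 threshold are assumptions chosen so the example runs without any API keys.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical stop-word list, used only by the toy stand-ins below.
STOP_WORDS = {"i", "we", "can", "get", "need", "want", "more", "please"}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_words(text: str) -> List[str]:
    """Lower-cased words of the text with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def toy_clean_name(item: str) -> str:
    """Stand-in for the LLM 'clean name' step: title-case the content words."""
    return " ".join(w.title() for w in content_words(item))


def toy_should_combine(existing: str, new: str) -> bool:
    """Stand-in for the LLM comparison: combine if content words overlap."""
    return bool(set(content_words(existing)) & set(content_words(new)))


def dedupe(
    items: List[str],
    clean_name: Callable[[str], str],
    should_combine: Callable[[str, str], bool],
    threshold: float = 0.3,
) -> List[str]:
    # Step 1: start with an empty list of (clean name, original text, embedding).
    kept: List[Tuple[str, str, Counter]] = []
    for item in items:
        vec = embed(item)
        # Step 3: only ask the (expensive) judge about items whose embeddings are close.
        is_duplicate = any(
            cosine(vec, kept_vec) >= threshold and should_combine(kept_text, item)
            for _, kept_text, kept_vec in kept
        )
        # Step 2: anything that is not a duplicate gets a clean name and is added.
        if not is_duplicate:
            kept.append((clean_name(item), item, vec))
    return [name for name, _, _ in kept]


groceries = [
    "We need more berries",
    "I want more more milk",
    "Can we get more carbonated water please?",
    "We need more sparkling water",
]
result = dedupe(groceries, toy_clean_name, toy_should_combine)
print(result)  # ['Berries', 'Milk', 'Carbonated Water']
```

In this sketch "combining" simply means dropping the new item and keeping the first-seen name, which is why it reports 'Carbonated Water' rather than 'Sparkling Water'. The embedding check acts as a cheap pre-filter so the comparison step only runs on plausible matches.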

This package is more of an exploration and POC so be careful with it. I'd love to hear any feedback.

All the links:

* YT Explainer Video: https://www.youtube.com/watch?v=etLsNgkGbeM

* Twitter Thread: https://twitter.com/GregKamradt/status/1719760658936545336

* Pypi: https://pypi.org/project/semantic-deduplicator/

* Github: https://github.com/gkamradt/SemanticDeduplicator

Comments URL: https://news.ycombinator.com/item?id=38101201

Points: 2

# Comments: 2

New comment by gkamradt in "QGIS is the mapping software you didn't know you needed"

gkamradt — Sat, 11 Feb 2023 17:17:13 +0000

Thank you

New comment by gkamradt in "QGIS is the mapping software you didn't know you needed"

gkamradt — Sat, 11 Feb 2023 17:17:06 +0000

Ha that would be sweet.

Do you have a video link of what you're referring to?

I once tried to use the molds to make chocolate representations of the mountains ha! I learned the hard way that tempering is difficult for a novice

New comment by gkamradt in "QGIS is the mapping software you didn't know you needed"

gkamradt — Sat, 11 Feb 2023 17:15:29 +0000

Thank you!

New comment by gkamradt in "QGIS is the mapping software you didn't know you needed"

gkamradt — Sat, 11 Feb 2023 17:15:23 +0000

I actually use DEMto3D. It's touchy, but I do post-work on the .stl/3d model in blender so it works out ok for me.

If you have weird artifacts, I'm guessing that is due to the underlying data vs QGIS itself. Have you looked at their documentation (https://demto3d.com/en/)?

I outline how the whole process works here https://www.gregkamradt.com/gregkamradt/2020/2/29/manufactur...

New comment by gkamradt in "QGIS is the mapping software you didn't know you needed"

gkamradt — Sat, 11 Feb 2023 17:09:01 +0000

Here's the process on custom orders. I'll put the link right on the site to try and avoid confusion.

https://docs.google.com/document/d/1IkiHG_Z5JS03mWYHv-KNAhi8...

edit: whoops added link

New comment by gkamradt in "QGIS is the mapping software you didn't know you needed"

gkamradt — Sat, 11 Feb 2023 17:08:39 +0000

The upfront costs are pretty expensive.

For every new location you do, the process looks like:

1. Get the data and prep it for print (fixed)

2. 3D print it (fixed)

3. Rubber Mold (fixed)

4. Wax Model (variable)

5. Bronze (variable)

Steps 1-3 are 40-60% of the costs. So I haven't put the money out of pocket yet to put up new locations. I've let customers ask first and then do them.

Surprisingly, most of our orders have been custom

Here's my info packet on the custom process https://docs.google.com/document/d/1IkiHG_Z5JS03mWYHv-KNAhi8...