Hacker News: narush

New comment by narush in "AI slows down open source developers. Peter Naur can teach us why"

narush — Mon, 14 Jul 2025 17:08:53 +0000

We call this over-generalization out specifically in the "We do not provide evidence that:" table in the blog post and paper - I agree there are tasks these developers are likely sped up on with early-2025 tools.

New comment by narush in "AI slows down open source developers. Peter Naur can teach us why"

narush — Mon, 14 Jul 2025 16:48:46 +0000

Hey, thanks for linking this! I'm a study author, and I greatly appreciate that this author dug into the appendix and provided feedback so that other folks can read it as well.

A few notes if it's helpful:

1. This post is primarily worried about ordering considerations -- I think this is a valid concern. We explicitly call this out in the paper [1] as a factor we can't rule out -- see "Bias from issue completion order (C.2.4)". We have no evidence this occurred, but we also don't have evidence it didn't.

2. "I mean, rather than boring us with these robustness checks, METR could just release a CSV with three columns (developer ID, task condition, time)." Seconded :) We're planning on open-sourcing pretty much this data (and some core analysis code) later this week here: https://github.com/METR/Measuring-Early-2025-AI-on-Exp-OSS-D... - star if you want to dig in when it comes out.

3. As I said in my comment on the post, the takeaway at the end of the post is that "What we can glean from this study is that even expert developers aren’t great at predicting how long tasks will take. And despite the new coding tools being incredibly useful, people are certainly far too optimistic about the dramatic gains in productivity they will bring." I think this is a reasonable takeaway from the study overall. As we say in the "We do not provide evidence that:" section of the paper (Page 17), we don't provide evidence across all developers (or even most developers) -- and ofc, this is just a point-in-time measurement that could totally be different by now (from tooling and model improvements in the past month alone).

Thanks again for linking, and to the original author for their detailed review. It's greatly appreciated!

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf
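
For point 2 above, here is a minimal sketch of the kind of first-pass look a three-column CSV (developer ID, task condition, time) would allow. This is not the study's analysis code. The column names, the condition labels, and the toy rows are all hypothetical, and the ratio of geometric mean times is just one simple summary, assumed here for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import geometric_mean

# Hypothetical toy rows for illustration only; NOT data from the study.
TOY_CSV = """developer_id,condition,minutes
dev_a,ai_allowed,60
dev_a,ai_disallowed,50
dev_b,ai_allowed,120
dev_b,ai_disallowed,100
"""


def time_ratio(csv_text):
    """Geometric mean time with AI allowed divided by that with AI disallowed.

    A value above 1 means tasks took longer when AI was allowed.
    """
    times = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        times[row["condition"]].append(float(row["minutes"]))
    return geometric_mean(times["ai_allowed"]) / geometric_mean(times["ai_disallowed"])


ratio = time_ratio(TOY_CSV)
print(f"time ratio (AI allowed / AI disallowed): {ratio:.2f}")
```

A pooled ratio like this ignores per-developer differences and task ordering, so it would not address the ordering concern in point 1.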

New comment by narush in "AI slows down open source developers. Peter Naur can teach us why"

narush — Mon, 14 Jul 2025 16:33:59 +0000

Hey, thanks for digging into the details here! Copying a relevant comment (https://news.ycombinator.com/item?id=44523638) from the other thread on the paper, in case it's helpful on this point.

1. Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.

2. Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, the only concern that most external reviewers had about experience was about prompting -- as prompting was considered the primary skill. In general, the standard wisdom was/is that Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.

3. Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (not because AI was better, but just because without AI is much worse). In other words, we're sorta between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!

4. We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.

5. As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.

In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).

I'll also note that one really important takeaway -- that developer self-reports after using AI are overoptimistic to the point of being on the wrong side of speedup/slowdown -- isn't a function of which tool they use. The need for robust, on-the-ground measurements to accurately judge productivity gains is a key takeaway here for me!

(You can see a lot more detail in section C.2.7 of the paper ("Below-average use of AI tools") -- where we explore the points here in more detail.)

New comment by narush in "AI slows down open source developers. Peter Naur can teach us why"

narush — Mon, 14 Jul 2025 16:29:27 +0000

Thanks for the feedback! I strongly agree this is not the only measure of developer productivity -- but it's certainly one of them. I think this measure speaks very directly to how _many_ developers (myself included) understand the impact of AI tools on their own work currently (e.g. just speeding up implementation).

(The SPACE [1] framework is a pretty good overview of considerations here; I agree with a lot of it, although I'll note that METR [2] has different motivations for studying developer productivity than Microsoft does.)

[1] https://dl.acm.org/doi/10.1145/3454122.3454124

[2] https://metr.org/about

New comment by narush in "AI slows down open source developers. Peter Naur can teach us why"

narush — Mon, 14 Jul 2025 16:23:33 +0000

Hey HN -- study author here! (See previous thread on the paper here [1].)

I think this blog post is an interesting take on one specific factor that is likely contributing to slowdown. We discuss this in the paper [2] in the section "Implicit repository context (C.1.5)" -- check it out if you want to see some developer quotes about this factor.

> This is why AI coding tools, as they exist today, will generally slow someone down if they know what they are doing, and are working on a project that they understand.

I made this point in the other thread discussing the study, but in general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the full factors table on page 11).

> If there are no takers then I might try experimenting on myself.

This sounds super cool! I'd be very excited to see how you set this up + how it turns out... please do shoot me an email (in the paper) if you do this!

> AI slows down open source developers. Peter Naur can teach us why

Nit: I appreciate how hard it is to write short titles summarizing the paper (the graph title is the best I was able to do after a lot of trying) -- but I might have written this "Early-2025 AI slows down experienced open-source developers. Peter Naur can give us more context about one specific factor." It's admittedly a less catchy title, but I think getting the qualifications right is really important!

Thanks again for the sweet write-up! I'll hang around in the comments today as well.

[1] https://news.ycombinator.com/item?id=44522772

[2] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Fri, 11 Jul 2025 05:10:35 +0000

Check out the section "AI increasing issue scope (C.2.3)" in the paper -- https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

We speak (the best we can) to changes in amount of code -- I'll note that this metric is quite messy and hard to reason about!

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Fri, 11 Jul 2025 05:00:27 +0000

How these results transfer to other settings is an excellent question. Previous literature would suggest speedup -- but I'd be excited to run a very similar methodology in those settings. It's already challenging as models + tools have changed!

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 21:30:56 +0000

Sorry, this is the first 8 issues per-developer!

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 20:14:18 +0000

Thank you!

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 20:14:08 +0000

We attempted to! We explore this more in the section "Trading speed for ease (C.2.5)" in the paper (https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf).

TLDR: mixed evidence, from quantitative and qualitative reports, that developers use AI to make the work less effortful. Unclear effect.

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 20:04:10 +0000

There's additional breakdown per-minute in the appendix -- see appendix E.4!

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 19:21:34 +0000

You can see a list of repositories with participating developers in the appendix! Section G.7.

Paper is here: https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 19:12:02 +0000

You can see this analysis in the factor analysis of "Below-average use of AI tools" (C.2.7) in the paper [1], which we mark as an unclear effect.

TLDR: over the first 8 issues, developers do not appear to get majorly less slowed down.

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 18:57:33 +0000

Thanks for the kind words!

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 18:57:08 +0000

The instructions given to developers were not just "implement with AI" - but rather that they could use AI if they deemed it would be helpful, but indeed did _not need to use AI if they didn't think it would be helpful_. In ~16% of labeled screen recordings where developers were allowed to use AI, they chose to use no AI at all!

That being said, we can't rule out that the experiment drove them to use more AI than they would have outside of the experiment (in a way that made them less productive). You can see more in section "Experimentally driven overuse of AI (C.2.1)" [1]

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 18:39:48 +0000

Honestly, this is a fair point -- and speaks to the difficulty of figuring out the right baseline to measure against here!

If we studied folks with _no_ AI experience, then we might underestimate speedup, as these folks are learning tools (see a discussion of learning effects in section (C.2.7) - Below-average use of AI tools - in the paper). If we studied folks with _only_ AI experience, then we might overestimate speedup, as perhaps these folks can't really program without AI at all.

In some sense, these are just two separate and interesting questions - I'm excited for future work to really dig in on both!
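
The baseline problem above can be made concrete with a small arithmetic sketch. All numbers are hypothetical and chosen only to show the direction of each bias; none come from the study, and `speedup` is an illustrative helper, not a metric defined in the paper.

```python
def speedup(minutes_without_ai, minutes_with_ai):
    """Ratio above 1 means AI made the task faster; below 1 means slower."""
    return minutes_without_ai / minutes_with_ai


# Hypothetical developer equally fast with and without AI.
balanced = speedup(100, 100)

# Hypothetical developer with no AI experience: time spent learning the
# tool inflates the with-AI time, so measured speedup is pulled down.
no_ai_experience = speedup(100, 125)

# Hypothetical developer with only AI experience: weaker unaided skills
# inflate the without-AI time, so measured speedup is pulled up even
# though the with-AI time is unchanged.
only_ai_experience = speedup(150, 100)

print(no_ai_experience, balanced, only_ai_experience)
```

The same task and the same tool give three different measured speedups, purely from who is being measured.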

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 18:33:26 +0000

Yep, sorry, meant to post this somewhere but forgot in final-paper-polishing-sprint yesterday!

We'll be releasing anonymized data and some basic analysis code to replicate core results within the next few weeks (probably next week, depending).

Our GitHub is here (http://github.com/METR/) -- or you can follow us (https://x.com/metr_evals) and we'll probably tweet about it.

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 18:24:43 +0000

> which feels like it is easier and hence faster.

We explore this factor in section (C.2.5) - "Trading speed for ease" - in the paper [1]. It's labeled as a factor with an unclear effect: some developers seem to think so, and others don't!

> like the developers deliberately picked "easy" tasks that they already knew how to do

We explore this factor in (C.2.2) - "Unrepresentative task distribution." I think the effect here is unclear; these are certainly real tasks, but they are sampled from the smaller end of tasks developers would work on. I think the relative effect on AI vs. human performance is not super clear...

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 18:18:27 +0000

The graphs are all matplotlib. The methodology figure is built in Figma! (Source: I'm a paper author :)).

New comment by narush in "Measuring the impact of AI on experienced open-source developer productivity"

narush — Thu, 10 Jul 2025 18:16:08 +0000

Yeah, I'll note that this study does _not_ capture the entire OS dev workflow -- you're totally right that reviewing PRs is a big portion of the time that many maintainers spend on their projects (and thanks to them for doing this [often hard] work). In the paper [1], we explore this factor in more detail -- see section (C.2.2) - Unrepresentative task distribution.

There's some existing lit about increased contributions to OS repositories after the introduction of AI -- I've also personally heard a few anecdotes about an increase in the number of low-quality PRs from first-time contributors, seemingly as a result of AI making it easier to get started -- ofc, the tradeoff is that making it easier to get started has pros to it too!

[1] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf