Hacker News: peterbell_nyc

New comment by peterbell_nyc in "AI outperforms law professors in Stanford Law study"

peterbell_nyc — Wed, 03 Jun 2026 17:38:57 +0000

There's a huge difference between one shot and few shot versus building a robust harness with deterministic and adversarial quality gates. And I'm finding that agents can actually do a pretty good job of a surprising number of things if you are very clear about your dimensions of quality and the rubrics that you get agents to research and then use to validate against those dimensions of quality.

Make sure to use a deterministic pipeline or harness to go step by step so agents aren't checking their own work. I sometimes get alpha from having Codex check the work of Claude, but I am seeing pretty good output across multiple domains when I have three independent quality gates and a loop which only spits it out to a human if it doesn't converge at a reasonable cost.
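A minimal sketch of the kind of loop described above, assuming each gate is just a function that returns pass/fail plus feedback. Every name here (`run_with_gates`, `Result`, the toy gates and `toy_revise`) is hypothetical and stands in for real agent or tool calls; it is an illustration, not the actual harness.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A gate inspects a draft and returns (passed, feedback). Hypothetical interface.
Gate = Callable[[str], Tuple[bool, str]]


@dataclass
class Result:
    draft: str
    converged: bool    # True if every gate passed within budget
    cost: float        # total spend on revision rounds
    needs_human: bool  # True if the loop gave up and escalated


def run_with_gates(
    draft: str,
    gates: List[Gate],
    revise: Callable[[str, List[str]], str],
    cost_per_round: float = 1.0,
    budget: float = 5.0,
) -> Result:
    """Check every gate, revise on failure, escalate once the budget is spent."""
    cost = 0.0
    while True:
        feedback = [msg for ok, msg in (gate(draft) for gate in gates) if not ok]
        if not feedback:
            return Result(draft, True, cost, False)
        if cost + cost_per_round > budget:
            # Did not converge at a reasonable cost: hand it to a human.
            return Result(draft, False, cost, True)
        draft = revise(draft, feedback)
        cost += cost_per_round


# Three independent toy gates (assumptions, chosen only to make the loop runnable).
def has_period(d: str) -> Tuple[bool, str]:
    return d.endswith("."), "must end with a period"


def short_enough(d: str) -> Tuple[bool, str]:
    return len(d) <= 20, "must be 20 characters or fewer"


def no_todo(d: str) -> Tuple[bool, str]:
    return "TODO" not in d, "must not contain TODO"


def toy_revise(d: str, feedback: List[str]) -> str:
    # Stand-in for the agent that reworks the draft using the gate feedback.
    d = d.replace("TODO", "").strip()[:19]
    return d if d.endswith(".") else d + "."


result = run_with_gates(
    "TODO ship the weekly video", [has_period, short_enough, no_todo], toy_revise
)
```

The point of the shape is that the loop and the gates are ordinary deterministic code; only `revise` would call a model, and the model never gets to decide whether its own output passed.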

New comment by peterbell_nyc in "DaVinci Resolve 21"

peterbell_nyc — Wed, 03 Jun 2026 17:17:21 +0000

Anyone using this headlessly got a read on how much of this an agent could do without human intervention? Would love to have a gut check on "sure, spend the $295 and you'll get some benefits for free if you have an agent run your videos through this before shipping them"

To be clear, my use case is making weekly online videos suck a little less - not grading feature films :)

New comment by peterbell_nyc in "AI is just unauthorised plagiarism at a bigger scale"

peterbell_nyc — Thu, 21 May 2026 14:20:29 +0000

Re: the higher ranking plagiarism, that stings and makes sense. AEO and SEO are a thing. We need better mechanisms for identifying "root sources" of content - it's something I find myself working on personally. As I ingest sources for my book I need to be able to build a classifier that incrementally moves towards finding origin sources. That said, it's in my interest to do that because there is a differentiated value in having access to the sources that regularly provide novel, valuable content.

To be fair there is also value (at least for now) in sites that aggregate quality content and republish as a secondary level of discovery if my agents don't go far enough down the search results, but I'd expect that value to diminish over time as I better tune my research and build my lists of originating authors.

And to be clear, I don't like the idea of people stealing someone else's content and republishing without attribution (although it has been going on since long before ChatGPT), but I think now that we can all run agentic research teams the "bad actors" will slowly get filtered out of the ecosystem.

New comment by peterbell_nyc in "AI is just unauthorised plagiarism at a bigger scale"

peterbell_nyc — Thu, 21 May 2026 14:15:32 +0000

I do just want to highlight that this is also what humans do. We read a bunch of content online and then use it in our work product. The vast majority of the value that I provide comes from copyrighted information that I have ingested - either directly with a payment to the creator (bought and read the book, paid for and attended the seminar) or indirectly via third party blog posts or summaries where I did not then pay the originator of the materials.

I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials).

I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.

New comment by peterbell_nyc in "I'm going back to writing code by hand"

peterbell_nyc — Mon, 11 May 2026 18:06:24 +0000

Love the DDD callout. I have explicit steps to review and rate deltas to the ubiquitous language, and one of my architectural reviewers will often engage with me about where the bounded contexts should be and, probably, the translation layers.

I find the more good practices I add to my envision/scope/spec/build/test/deploy loops the happier I am with the outcomes.

I will say that I am finding the actual code to be somewhat ephemeral for me - the more precise the specifications are and generally the tighter and more elegant the design is, the less the code matters as a long term artifact.

I'm not at the "code is assembler" point yet - but I could see that with more, richer specs I could end up there. Of course the specs are then substantial, but declarative specs can be robust and unambiguous (with sufficient red teaming review) and - like domain specific languages - reduce the accidental complexity of the syntax when compared to an implementation in a given language.

There are exceptions to all of this, but it's fascinating to see how it's evolving!

New comment by peterbell_nyc in "I'm going back to writing code by hand"

peterbell_nyc — Mon, 11 May 2026 17:59:52 +0000

I'm generally in agreement with everyone here.

- Some code is ephemeral - it's generated to do the thing and thrown away at the end of the session once the CSV has been imported successfully (or whatever). Make sure you have at least some testing of the output or you may find the email is in the last name field for some rows. If possible, have an API your agent uses with rich domain types and validations that force it to do things right or do them again (and that it can't rewrite to relax the constraints!)
- You can one or few shot a real app - for a few users, for a small set of use cases. The scope of this will improve with models, but at least today it's "spelling bee app for my kids", not "salesforce replacement for millions of workers".
- You can add rich validation steps for all the types of quality that you care about which (assuming they converge) can deliver high performance, well designed and functionally correct code mostly autonomously.
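To make the "rich domain types and validations" idea concrete, here is a rough sketch of an import API that refuses rows where an email has landed in a name field. The `Contact` type, its field names and `import_rows` are all assumptions made up for illustration; a real API would live outside anything the agent is allowed to edit.

```python
import re
from dataclasses import dataclass

# Deliberately loose email pattern; an assumption, not a full RFC 5322 check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass(frozen=True)
class Contact:
    first_name: str
    last_name: str
    email: str

    def __post_init__(self):
        # Name fields must be non-empty and must not look like an email address.
        for field_name in ("first_name", "last_name"):
            value = getattr(self, field_name)
            if not value.strip():
                raise ValueError(f"{field_name} is empty")
            if "@" in value:
                raise ValueError(f"{field_name} looks like an email: {value!r}")
        if not EMAIL_RE.match(self.email):
            raise ValueError(f"email is malformed: {self.email!r}")


def import_rows(rows):
    """Return (accepted contacts, rejected rows paired with the reason)."""
    accepted, rejected = [], []
    for row in rows:
        try:
            accepted.append(Contact(**row))
        except (ValueError, TypeError) as exc:  # TypeError covers missing or extra keys
            rejected.append((row, str(exc)))
    return accepted, rejected


accepted, rejected = import_rows([
    {"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com"},
    {"first_name": "Bob", "last_name": "bob@example.com", "email": "Smith"},
])
```

The rejected list is what gets handed back to the agent, so a bad row forces another attempt instead of quietly reaching the database.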

I'm building an orchestrator (who isn't). Haven't looked at the code yet, but it appears to work. But man have I spent hours in loops between Claude, Codex and myself all on the highest thinking levels to figure out what interface portability means for the employee, how best to handle "remote" sessions and the appropriate semantics for pipelines/recipes.

I've also been very opinionated about who does what. I'll let the agent write a script to sync with GitHub and reload workers, but I decided to "waste" the 5 minutes to manually do all of the config steps on Render for my server when Claude told me that I couldn't just give it read only scope to pull the logs. Bad news, I'm cutting and pasting for my computer overlord. Good news? Claude can't blow away the prod db if it happens to get in the way of whatever interpretation it makes of the instructions I give it.

A chainsaw requires very different skills than an axe. It has different failure modes. Some experience as a lumberjack probably helps using either/both.

No difference (at least now) with agents.

New comment by peterbell_nyc in "Vibe coding and agentic engineering are getting closer than I'd like"

peterbell_nyc — Wed, 06 May 2026 17:53:52 +0000

For me the distinction is the quality and rigor of your pipeline.

Vibe coding: one shot or few shot, smoke test the output, use it until it breaks (or doesn't). Ideal for lightweight PoC and low stakes individual, family or small team apps.

Agentic engineering:

- You care about a larger subset of concerns such as functional correctness, performance, infrastructure, resilience/availability, scalability and maintainability.
- You have a multi-step pipeline for managing the flow of work.
- Stages might be project intake, project selection, project specification, epic decomposition, story decomposition, coding, documentation and deployment.
- Each stage will have some combination of deterministic quality gates (tests must pass, performance must hit a benchmark) and adversarial reviews (business value of proposed project, comprehensiveness of spec, elegance of code, rigor and simplicity of ubiquitous language, etc).

And it's a slider. Sometimes I throw a ticket into my system because I don't want to have to do an interview and burn tokens on three rounds of adversarial reviews, estimating potential value and then detailed specification and adversarial reviews just to ship a feature.
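One way to picture the stages and the slider together is a small declarative table plus a rigor setting that decides which checks run. This is only a sketch: the `Stage` type, the `rigor` levels and the "smoke test" gate are assumptions of mine, while the stage names and the other checks are taken from the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    deterministic_gates: List[str] = field(default_factory=list)
    adversarial_reviews: List[str] = field(default_factory=list)


# A shortened pipeline; a fuller one would also list intake, selection,
# epic and story decomposition, and documentation.
PIPELINE = [
    Stage("project specification", adversarial_reviews=["comprehensiveness of spec"]),
    Stage(
        "coding",
        deterministic_gates=["tests must pass", "performance must hit a benchmark"],
        adversarial_reviews=["elegance of code"],
    ),
    Stage("deployment", deterministic_gates=["smoke test"]),  # hypothetical gate
]


def checks_for(stage: Stage, rigor: int) -> List[str]:
    """rigor 0: no checks (vibe), 1: deterministic gates only, 2: gates plus reviews."""
    if rigor <= 0:
        return []
    checks = list(stage.deterministic_gates)
    if rigor >= 2:
        checks += stage.adversarial_reviews
    return checks


coding = PIPELINE[1]
quick_ticket = checks_for(coding, rigor=1)
full_rigor = checks_for(coding, rigor=2)
```

Keeping the table as data means the same ticket can be run cheaply or rigorously without touching the stage definitions.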

New comment by peterbell_nyc in "Agent Skills"

peterbell_nyc — Tue, 05 May 2026 18:29:51 +0000

It helps if you hand it both to the original agent as strong guidance and then to an adversarial agent as a quality reviewer. The adversarial agent is more likely to loop the work back if it fails the validation criteria.

I do find that just asking the same agent to do and check its own work is not particularly reliable.

New comment by peterbell_nyc in "Cybersecurity looks like proof of work now"

peterbell_nyc — Thu, 16 Apr 2026 00:53:03 +0000

Why crack one website when you can crack all of them? For a well funded (especially nation state) attacker, if $1 in compute and effort returns $2 in ransoms, when it's possible to access another n x $1 of compute and if you don't hit diminishing returns or cashflow limitations, why wouldn't you just keep spending $'s until you p0wned all the systems?

If there is only one bear, you just need to run faster than your friends. If there's a pack of them, you need to start training much harder!

New comment by peterbell_nyc in "Multi-Agentic Software Development Is a Distributed Systems Problem"

peterbell_nyc — Tue, 14 Apr 2026 17:24:29 +0000

Exactly this. I'm writing my own little orchestrator and memory system and because I have a modest number of workflows, I'm taking the time to specify them deterministically, describe them as a DAG (with gotos for the inevitable loops) and generate deterministic orchestration code. I'm trying to make most of the tool calls as clear and comprehensive as possible (don't make Opus convert a PDF, have a script do that and give it the text instead) and I'm putting in all the things you'd expect to track state, and assuming a ~20% task failure rate so I can simply wipe and repeat failed tasks.
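A rough sketch of that shape, under the assumption that each step is a plain function returning the name of the next step (so naming an earlier step is the "goto"). `run_workflow`, `TaskFailed` and the toy steps are hypothetical, and the flaky wrapper fails every fifth call purely to mimic a ~20% failure rate deterministically.

```python
from typing import Callable, Dict, Optional

# A step mutates its state dict and returns the next step name, or None when done.
Step = Callable[[dict], Optional[str]]


class TaskFailed(Exception):
    """Raised by a step (or the model call inside it) when an attempt fails."""


def run_workflow(steps: Dict[str, Step], start: str, state: dict,
                 max_attempts: int = 5, max_steps: int = 100) -> dict:
    current: Optional[str] = start
    executed = 0
    while current is not None:
        executed += 1
        if executed > max_steps:
            raise RuntimeError("workflow did not terminate")
        for _attempt in range(max_attempts):
            # Shallow copy: a failed attempt is wiped by discarding the copy.
            # Nested values would need copy.deepcopy.
            scratch = dict(state)
            try:
                nxt = steps[current](scratch)
            except TaskFailed:
                continue  # wipe and repeat
            state = scratch
            break
        else:
            raise RuntimeError(f"step {current!r} failed {max_attempts} times")
        current = nxt
    return state


calls = {"n": 0}


def flaky(fn: Step) -> Step:
    def wrapped(state: dict) -> Optional[str]:
        calls["n"] += 1
        if calls["n"] % 5 == 0:  # every fifth call fails: a stand-in for ~20%
            raise TaskFailed()
        return fn(state)
    return wrapped


def extract(state: dict) -> Optional[str]:
    state["text"] = "raw text"  # e.g. a script converts the PDF, not the model
    return "draft"


def draft(state: dict) -> Optional[str]:
    state["drafts"] = state.get("drafts", 0) + 1
    return "review"


def review(state: dict) -> Optional[str]:
    return None if state["drafts"] >= 2 else "draft"  # goto: loop back to draft


final = run_workflow(
    {"extract": flaky(extract), "draft": flaky(draft), "review": flaky(review)},
    start="extract",
    state={},
)
```

Because the routing lives in ordinary code, a failed model call costs one retry of one step and nothing else.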

Small model and (where still required) human in the loop steps for deterministic workflows can solve a surprisingly large number of problems and don't depend on the models to be consistent or not to fail.

Just invest heavily in adversarial agents and quality gates and apply transforms on intermediate artifacts that can be validated for some dimensions of quality to minimize drift.

New comment by peterbell_nyc in "The future of everything is lies, I guess: Work"

peterbell_nyc — Tue, 14 Apr 2026 17:18:34 +0000

Seeing plenty of this. The quality of agentic code is a function of the quantity and quality of adversarial quality gates. I have seen no proof that an agentic system is incapable of delivering code that is as functional, performant and maintainable as code from a great team of developers, and enough anecdotes in the other direction to suggest that AI "slop" is going to be a problem that teams with great harnesses will be solving fairly soon if they haven't already.

New comment by peterbell_nyc in "The future of everything is lies, I guess: Work"

peterbell_nyc — Tue, 14 Apr 2026 17:14:28 +0000

I model this as "stacked sigmoid curves". I have no reason to believe that any specific technological implementation will be exponential in impact vs sigmoidal.

However, if we throw enough money and smart people at the problems and get enough value from the early sigmoid curves, the effective impact of a large number of stacked sigmoids could theoretically average to a linear impact. But if the sigmoids stay of a similar magnitude (on average) and appear at a higher velocity over time, you end up with an exponential made up of sigmoids.*

* To be fair, it has been so long since I have done math that this may be completely incorrect mathematically - I'm not sure how to model it. However I think in practice more and more sigmoids coming faster and faster with a similar median amplitude is gonna feel very fast to humans very soon - whether or not it's a true exponential.
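One way to sanity check the intuition numerically is to add up logistic curves of equal amplitude and compare evenly spaced arrivals against arrivals that keep speeding up. This is only a sketch under my own assumptions: the logistic form, the spacing of 5 time units, and the choice of placing the k-th curve at ln(k)/r (so that the count of curves started by time t grows like e^(rt)) are illustrative, not a model of anything real.

```python
import math


def sigmoid(t: float, center: float, amplitude: float = 1.0) -> float:
    """One logistic S-curve with its midpoint at `center`."""
    return amplitude / (1.0 + math.exp(-(t - center)))


def stacked(t: float, centers) -> float:
    """Total impact at time t: the sum of every curve, each with amplitude 1."""
    return sum(sigmoid(t, c) for c in centers)


# Case 1: a new sigmoid every 5 time units (constant velocity).
even = [5.0 * k for k in range(1, 41)]

# Case 2: the k-th sigmoid arrives at ln(k) / r, so arrivals keep accelerating.
r = 0.5
accel = [math.log(k) / r for k in range(1, 5001)]

# Constant velocity: equal steps in time add equal amounts (roughly linear).
even_steps = [stacked(t + 10, even) - stacked(t, even) for t in (50, 60, 70)]

# Accelerating arrivals: equal steps in time multiply the total by a similar factor.
accel_ratios = [stacked(t + 2, accel) / stacked(t, accel) for t in (4, 6, 8)]
```

In this toy setup the evenly spaced case adds the same amount per step, while the accelerating case grows by a roughly constant factor per step, which is the signature of an exponential even though every individual curve saturates.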

I'm honestly having a very hard time thinking through the likely implications of what's currently happening over the next 2-10 years. Anyone who has the answers, please do share. I'm assuming from Cynefin that it's a perturbed complex adaptive system so I can just OODA or experiment, sense and respond to what happens - not what I think might happen.

New comment by peterbell_nyc in "AI bug reports went from junk to legit overnight, says Linux kernel czar"

peterbell_nyc — Wed, 01 Apr 2026 16:06:18 +0000

4.5 and 5.2. Transformative. I know dozens of CTOs who were piloting AI in the fall, took a day to do something real over Xmas and then came back to their orgs with a mandate to double down and experiment with software factories once they saw what the November drops enabled.

New comment by peterbell_nyc in "Nearly half Dell's US workforce has rejected RTO. Rather WFH than get promoted (2024)"

peterbell_nyc — Sun, 05 Jan 2025 01:47:40 +0000

There's a reason there are IC and manager tracks at most companies employing devs. You can keep refining your craft as an IC and become a Staff/Principal/Distinguished. No need to become a manager to get promotions, more money and/or more responsibility.

New comment by peterbell_nyc in "Tech companies Palantir and Anduril form fellowship for AI adventures"

peterbell_nyc — Mon, 09 Dec 2024 18:43:33 +0000

You just made my day - whatever the answer :)

New comment by peterbell_nyc in "What spreadsheets need? LLMs, says Microsoft"

peterbell_nyc — Mon, 22 Jul 2024 16:38:33 +0000

A big chunk of the knowledge contained in enterprises is in spreadsheets. It's a no brainer to make that more easily accessible to LLMs. Wrap a thoughtfully designed agentic framework around this and you presumably get a cheap junior data analyst that you can ask arbitrary questions of to better optimize value delivered through the knowledge you have. Anything from "free" drill down reports by territory on sales or ops to potentially running Monte Carlo simulations based on identified correlations to get a sense of the best classes of improvements to invest in to reduce shipping costs, improve sales conversions in specific verticals, etc.

I don't know if this is the framework, but this is one of the problems that needs to be effectively solved for large spreadsheets to unlock access to the data more efficiently.

New comment by peterbell_nyc in "Diving into Domain-Specific Languages: A Practical Guide for Developers"

peterbell_nyc — Mon, 03 Jun 2024 19:31:52 +0000

I will also note that while it's possible to build a parser generator for external DSLs, for many use cases concrete syntaxes with existing parsers work fine. Back in 2000 I created a set of DSLs for generating web applications and I used the concrete, serializable syntax of XML as I got the capacity to describe types fairly clearly and I didn't need to generate a parser (or more importantly, an IDE plugin with auto complete and type validation). I figured I could always write a lightweight interface to display the info from the concrete DSLs to save business users from the angle brackets. JSON would allow you to do the same these days.
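As a small illustration of leaning on an existing parser, here is a sketch of an object description written in JSON and read with Python's standard `json` module. The spec layout (`object`, `properties`, `type`, `required`) is a made-up example format, not the one from the original XML DSLs.

```python
import json

# A tiny declarative description of a domain object (hypothetical format).
SPEC_TEXT = """
{
  "object": "Subscriber",
  "properties": [
    {"name": "email", "type": "string", "required": true},
    {"name": "age", "type": "integer", "required": false}
  ]
}
"""

# Map the DSL's type names onto Python types.
TYPES = {"string": str, "integer": int}


def load_spec(text: str) -> dict:
    """Parse with the stock JSON parser, then check the DSL's own rules."""
    spec = json.loads(text)
    for prop in spec["properties"]:
        if prop["type"] not in TYPES:
            raise ValueError(f"unknown type {prop['type']!r} for {prop['name']!r}")
    return spec


def validate(spec: dict, record: dict) -> list:
    """Return a list of human-readable errors for one record."""
    errors = []
    for prop in spec["properties"]:
        name = prop["name"]
        if name not in record:
            if prop.get("required"):
                errors.append(f"{name} is required")
            continue
        if not isinstance(record[name], TYPES[prop["type"]]):
            errors.append(f"{name} must be {prop['type']}")
    return errors


spec = load_spec(SPEC_TEXT)
problems = validate(spec, {"email": "a@example.com", "age": "ten"})
```

All of the parsing work is done by `json.loads`; the only code you own is the part that gives the structure its domain meaning.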

New comment by peterbell_nyc in "Diving into Domain-Specific Languages: A Practical Guide for Developers"

peterbell_nyc — Mon, 03 Jun 2024 19:31:37 +0000

Loads. Back in 1999 I built an entire suite of DSLs for generating web applications - you described objects, properties, validation rules, newsletter settings, commerce functionality, forum rules, etc and it used some combination of compile time code generation and run time parsing to deliver the functionality, substantially decreasing the dev overhead in building "good enough" web apps.

New comment by peterbell_nyc in "The CAP theorem. The Bad, the Bad, & the Ugly"

peterbell_nyc — Fri, 08 Mar 2024 13:51:57 +0000

It seems like people make this really complicated.

There will occasionally be network partitions. When there are, a given node can either respond (potentially inconsistently) or not. So you pick some balance between consistency and availability. Latency is really just a proxy for availability - as latency tends towards infinite, availability tends towards zero. Of course you can wait until the network partition is resolved and the nodes are caught up - but I think it's simpler to consider that as not being available for a period of time rather than as having a high latency at that time.

Consistency or Availability - you don't have to pick one, but the more consistent you want to be (in the case of network partitions) the less available you'll be.
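The "balance, not a binary choice" point can be sketched as a replica with a staleness budget: during a partition it answers only if its data is fresh enough, and otherwise counts as unavailable. The `Replica` class and its `max_staleness` knob are assumptions invented for this toy example, not a description of any real datastore.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """The node declines to answer during a partition."""


@dataclass
class Replica:
    value: str
    seconds_since_sync: float
    # 0 means never serve unconfirmed data during a partition (most consistent);
    # float("inf") means always answer (most available).
    max_staleness: float

    def read(self, partitioned: bool) -> str:
        if partitioned and self.seconds_since_sync > self.max_staleness:
            raise Unavailable("data too stale to serve during a partition")
        return self.value  # possibly inconsistent, but within the chosen budget


lenient = Replica("v1", seconds_since_sync=30, max_staleness=60)
strict = Replica("v1", seconds_since_sync=30, max_staleness=0)

served = lenient.read(partitioned=True)
try:
    strict.read(partitioned=True)
    strict_answered = True
except Unavailable:
    strict_answered = False
```

Turning `max_staleness` down buys consistency at the cost of more refused reads during partitions, which is the trade described above.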

New comment by peterbell_nyc in "Practical ways to increase product velocity"

peterbell_nyc — Fri, 08 Dec 2023 17:37:39 +0000

Lean software development has a lot to say about this too - check out anything on value stream mapping and if you haven't already, try "Principles of Product Development Flow". It's dry but good.