Hacker News: px1999

New comment by px1999 in "I built a vulnerable app and spent $1,500 seeing if LLMs could hack it"

px1999 — Thu, 04 Jun 2026 02:33:04 +0000

My org now sends some portion of our requests to non-Anthropic models because refusals from Claude have become common. The requests themselves aren't dangerous; we find that benign requests in biological science wind up being blocked semi-frequently.

If it gets worse in future releases, we'd likely move away from Claude entirely, toward models that are more useful (for us) even if they're less capable.

New comment by px1999 in "Verification debt: the hidden cost of AI-generated code"

px1999 — Sat, 07 Mar 2026 23:39:52 +0000

Very well said.

I think that "deciding what types of code can be reliably handed off to AI" might be missing from the list. It's orders of magnitude easier to nail 80% all the time than 100% all the time. I could even see standalone products developing in this space.

New comment by px1999 in "Fix your tools"

px1999 — Mon, 23 Feb 2026 03:58:01 +0000

Tools exist to be an energy/effort multiplier, so it's pretty intuitive that increasing that multiplier will make it easier to get more done.

In practice it's pretty difficult to find the balance between yak shaving and piling on unnecessary manual labour by just trying to do the work with existing (possibly poorly fitting) tools.

If you're planning to stick with your current tools for a long time, each 1% improvement compounds massively over time, so that balance is probably much closer to yak shaving than most people might realise.
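As an illustrative worked example of the compounding (the 1% figure and the count of 70 improvements are assumptions for illustration, not from the comment), treat each improvement as a multiplier on output rather than an additive gain:

```latex
\underbrace{(1.01)^{70}}_{\text{compounded}} = e^{70 \ln 1.01} \approx e^{0.697} \approx 2.01
\qquad \text{vs.} \qquad
\underbrace{1 + 70 \times 0.01}_{\text{additive}} = 1.70
```

Under that assumption, seventy small 1% improvements roughly double throughput, noticeably more than the 1.70 you'd get if the gains simply added up.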

New comment by px1999 in "Agent orchestration for the timid"

px1999 — Sat, 24 Jan 2026 22:55:21 +0000

Imo there's a huge blind spot forming between 6 and 8 when talking to people and in reading posts by various agent evangelists - few people seem to be focussing on building "high quality" changes vs maximising throughput of low quality work items.

My (boring b2b/b2e) org has scripts that wrap a small handful of agent calls to handle/automate our workflow. These have been incredibly valuable.

We still 'yolo' into PRs, use agents to improve code quality, and do initial checks via gating. We're trying to get docs working through the same approach. We see huge value in automation and lightweight orchestration of agents, but other parts of the whole system are the bottleneck, so there's no real point in running more than a couple of agents concurrently. Claude could already build a low-quality version of our entire backlog in a week.

Is anyone exploring the (imo more practically useful today) space of using agents to put together better changes vs "more commits"?

New comment by px1999 in "Ask HN: How are you LLM-coding in an established code base?"

px1999 — Fri, 19 Dec 2025 23:41:02 +0000

My org has built internal tooling that approximates this. It's incredibly valuable for manual testing, though we haven't managed to get the agent part working well: app startup times (10+ min) make iterating hard.

Do you have customers who have faced or solved this problem? If so, how did they do it? It seems like a killer for the approach.

New comment by px1999 in "Show HN: Picknplace.js, an alternative to drag-and-drop"

px1999 — Fri, 19 Dec 2025 07:40:00 +0000

This is really nice and a very original take. It feels good on mobile / other touch devices.

I'd love to see it feel a bit more polished on desktop (maybe I'll give that a shot if I find a bit of spare time!). A few simple things, like adding up/down arrows to the picked item and wiring them to the up and down arrow keys, could go a long way toward making it work really well there too.

Genuinely, thank you for sharing this, it's something different and interesting.
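The arrow-key suggestion boils down to a small reorder step that a keydown handler would call. Below is a minimal, language-neutral sketch of that logic written in Python; `move_picked` and `KEY_TO_DIRECTION` are hypothetical names, not part of Picknplace.js. Only the `"ArrowUp"`/`"ArrowDown"` strings mirror the browser's `KeyboardEvent.key` values.

```python
# Hypothetical mapping from KeyboardEvent.key values to a move direction.
KEY_TO_DIRECTION = {"ArrowUp": -1, "ArrowDown": 1}


def move_picked(items: list[str], picked: int, key: str) -> tuple[list[str], int]:
    """Move the picked item one slot up or down.

    Returns the (possibly reordered) list and the picked item's new index.
    Unrecognised keys and moves past either end leave everything unchanged.
    """
    direction = KEY_TO_DIRECTION.get(key)
    if direction is None:
        return items, picked
    target = picked + direction
    if not 0 <= target < len(items):
        return items, picked
    # Copy so the caller's list is not mutated, then swap with the neighbour.
    reordered = items.copy()
    reordered[picked], reordered[target] = reordered[target], reordered[picked]
    return reordered, target
```

Keeping the reorder step pure means the same function can back both the on-screen up/down buttons and the keyboard listener.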

New comment by px1999 in "Low-background Steel: content without AI contamination"

px1999 — Wed, 11 Jun 2025 01:03:05 +0000

Following this logic, why write anything at all? Shakespeare's sonnets are arrangements of existing words that were possible before he wrote them. Every mathematical proof, novel, piece of journalism is simply a configuration of symbols that existed in the space of all possible configurations. The fact that something could be generated doesn't negate its value when it is generated for a specific purpose, context, and audience.

New comment by px1999 in "Ask HN: Go deep into AI/LLMs or just use them as tools?"

px1999 — Sat, 24 May 2025 10:54:36 +0000

Consider this (possibly very bad) take:

RAG could largely be replaced with tool use to a search engine. You could keep some of the approach around indexing/embeddings/semantic search, but it just becomes another tool call to a separate system.
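A minimal sketch of the idea, assuming a generic tool-calling loop: retrieval stops being a fixed pre-prompt step and becomes one entry in a tool registry that the model may choose to call. `CORPUS`, `search`, `TOOLS`, `dispatch`, and the `{"name": ..., "arguments": {...}}` call shape are all hypothetical and not any particular vendor's API; the keyword scorer stands in for whatever index, embeddings, or semantic search the separate system uses.

```python
from typing import Callable

# Toy in-memory corpus standing in for a real search backend (hypothetical).
CORPUS = {
    "doc1": "Umbraco used XSLT templates for rendering",
    "doc2": "Playwright codegen records browser interactions",
    "doc3": "Testcontainers starts throwaway databases for tests",
}


def search(query: str, top_k: int = 2) -> list[str]:
    """Naive keyword-overlap search; a real system might use BM25 or embeddings."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, doc_id))
    # Highest overlap first, ties broken by id for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Retrieval is just one more callable the model is allowed to invoke.
TOOLS: dict[str, Callable[..., object]] = {"search": search}


def dispatch(tool_call: dict) -> object:
    """Execute a model-emitted tool call of the assumed form {"name", "arguments"}."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])
```

For example, `dispatch({"name": "search", "arguments": {"query": "playwright codegen"}})` returns `["doc2"]`, and the result would be fed back to the model as the tool's output. Swapping the toy scorer for a hosted search engine changes only the body of `search`.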

How would you feel about becoming an expert in something that is so in flux and might disappear? That might help give you your answer.

That said, there's a lot of comparatively low-hanging fruit in LLM-adjacent areas atm.

New comment by px1999 in "Why can't HTML alone do includes?"

px1999 — Sat, 03 May 2025 23:57:04 +0000

The Umbraco CMS was amazing during the time that it used and supported XSLT.

Although it evaluated the XSLT server-side, it was a really neat and simple approach.

New comment by px1999 in "Migrating away from Rust"

px1999 — Mon, 28 Apr 2025 21:13:15 +0000

I expect it will wind up like search engines, where you either submit URLs for indexing/inclusion or wait for a crawl to pick your information up.

Until the tech catches up, it will have a stifling effect on progress toward and adoption of new things (which imo is pretty common with new/immature tech, e.g. how culture has more generally kind of stagnated since the early 2000s).

New comment by px1999 in "I genuinely don't understand why some people are still bullish about LLMs"

px1999 — Fri, 28 Mar 2025 22:34:28 +0000

Except value isn't polarised like that.

In a research context, it provides pointers and keywords for further investigation. In a report-writing context, it provides textual content.

Neither these nor the thousand other uses are worthless. It's when you expect working, complete work product that it's (subjectively, maybe) worthless, but frankly, aiming for that with current-gen technology is a fool's errand.

New comment by px1999 in "DOGE employees ordered to stop using Slack"

px1999 — Wed, 05 Feb 2025 21:54:16 +0000

devoir de désobéissance is _duty_ of disobedience.

If they choose to follow orders they know are illegal they can be personally liable.

New comment by px1999 in "ROCm Device Support Wishlist"

px1999 — Tue, 21 Jan 2025 00:13:38 +0000

AMD's offer was more than fair. Hotz was throwing a tantrum.

New comment by px1999 in "Gemini 2.0: our new AI model for the agentic era"

px1999 — Wed, 11 Dec 2024 23:11:04 +0000

The business model doesn't matter.

I can write something with Microsoft tech and expect it with reasonable likelihood to work in 10 years (even their service-based stuff), but can't say the same about anything from Google.

That alone stops me/my org buying stuff from Google.

New comment by px1999 in "ChatGPT Pro"

px1999 — Thu, 05 Dec 2024 20:33:54 +0000

Imo the con is picking the metric that makes others look artificially bad when it doesn't seem to be all that different (at least on the surface).

> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one

This surely makes the other models post smaller numbers. I'd be curious how it stacks up if doing eg 1/1 attempt or 1/4 attempts.
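A quick way to see why a 4/4 metric shrinks (and spreads out) everyone's numbers: assume each attempt on a question succeeds independently with the same probability `p`. That is an idealisation, since real attempts are correlated. Under it, 4/4 scores `p**4` while 1/4 scores `1 - (1 - p)**4`. The function names below are illustrative only.

```python
def all_k_correct(p: float, k: int) -> float:
    """P(all k independent attempts are correct), i.e. "k/k reliability"."""
    return p ** k


def any_k_correct(p: float, k: int) -> float:
    """P(at least one of k independent attempts is correct), i.e. "1/k"."""
    return 1 - (1 - p) ** k


# Compare the three scoring rules for a few per-attempt success rates.
for p in (0.5, 0.8, 0.9, 0.95):
    print(
        f"p={p}: 1/1={p:.3f}  "
        f"4/4={all_k_correct(p, 4):.3f}  "
        f"1/4={any_k_correct(p, 4):.3f}"
    )
```

Under this assumption, a model that is right 80% of the time scores 0.8 on 1/1, about 0.41 on 4/4, and about 0.998 on 1/4. For two models at p = 0.8 and p = 0.9, the 1/1 gap of 0.10 widens to roughly 0.41 vs 0.66 under 4/4, which is how a stricter rule can make a small per-attempt difference look large.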

New comment by px1999 in "Terence Tao on O1"

px1999 — Sat, 14 Sep 2024 23:21:28 +0000

Specifically within the last week, I have used Claude, both directly and via Cursor, to:

- write some moderately complex PowerShell to perform a one-off process

- add TypeScript annotations to a random file in my org's codebase

- land a minor feature quickly in another codebase

- suggest libraries and write sample(ish) code to see what their rough use would look like to help choose between them for a future feature design

- provide text to fill out an extensive sales RFT spreadsheet based on notes and some RAG

- generate some very domain-specific, realistic-sounding test data (just naming)

- scaffold out some PowerPoint slides for a training session

There are likely others (LLMs have helped with research and in my personal life too).

All of these are things that I could do (and probably do better) but I have a young baby at the moment and the situation means that my focus windows are small and I'm time poor. With this workflow I'm achieving more than I was when I had fully uninterrupted time.

New comment by px1999 in "No "Hello", No "Quick Call", and No Meetings Without an Agenda"

px1999 — Fri, 23 Aug 2024 01:16:40 +0000

This is great, but I wish there was a shorter and more to the point version for me to link folks to.

Each of the ideas in here is solid, but there's too much writing around the core idea -- a sentence or two for each point and then a tldr like "put in some basic level of effort if you're going to ask for others' valuable time." would do it for me personally.

New comment by px1999 in "Parents outraged at Snoo after smart bassinet company charges fee to rock crib"

px1999 — Sun, 18 Aug 2024 22:48:36 +0000

The aftermarket for these things means that the cost winds up being split between multiple parties in a lot of cases.

Anecdotally, most parents within my circle bought their Snoo used and sold it after use. I bought an unopened Snoo from Facebook Marketplace for $X and sold it after 6 months for $X-200.

I was a little annoyed that Happiest Baby is meddling with the resale value (because I was expecting to be able to sell it on after a few months of use).

IMO even though the product is overpriced, I'd have happily paid 5k for the extra sleep I believe it gave me.

New comment by px1999 in "Playwright Test Generator"

px1999 — Thu, 14 Mar 2024 21:58:21 +0000

My org uses codegen as a starting point for one of our test layers.

It works for us probably because we sidestep the pain points you list - the environments we run in are pristine complete copies of known datasets, we remove as many sources of randomness as possible, and our environment flakiness level is very low.

They still break, but usually because the locators in use have been chosen poorly (or we've made planned changes to a page/component).

We're a web-based B2B SaaS that runs an instance of the entire environment for each of our customers. Our non-prod setup consists of a bajillion static test environments, but more importantly we use Testcontainers to spin up transient test environments from database snapshots. Using the recorder on the static environments (before the transient ones existed) _was_ a pain.

New comment by px1999 in "Goody-2, the world's most responsible AI model"

px1999 — Fri, 09 Feb 2024 20:57:38 +0000

A level of fear allows the introduction of regulatory moats that protect the organisations currently building and deploying these models at scale.

"It's dangerous" is a beneficial lie for e.g. OpenAI to push, because they can afford any compliance/certification process that's introduced (hell, they'd probably be heavily involved in designing the process).