<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: px1999</title><link>https://news.ycombinator.com/user?id=px1999</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 05 Jun 2026 07:00:53 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=px1999" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by px1999 in "I built a vulnerable app and spent $1,500 seeing if LLMs could hack it"]]></title><description><![CDATA[
<p>My org now sends some portion of our requests to non-Anthropic models because refusals from Claude have become common. The requests themselves aren't dangerous; we find that benign requests in biological science are blocked semi-frequently.<p>If it gets worse in future releases, we'd likely move fully to models that are more useful (for us), even if they're less capable.</p>
]]></description><pubDate>Thu, 04 Jun 2026 02:33:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=48392950</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=48392950</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48392950</guid></item><item><title><![CDATA[New comment by px1999 in "Verification debt: the hidden cost of AI-generated code"]]></title><description><![CDATA[
<p>Very well said.<p>I think "deciding what types of code can be reliably handed off to AI" might be missing from the list. It's orders of magnitude easier to nail 80% all the time than 100% all the time. I could even see standalone products developing in this space.</p>
]]></description><pubDate>Sat, 07 Mar 2026 23:39:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47292581</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=47292581</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47292581</guid></item><item><title><![CDATA[New comment by px1999 in "Fix your tools"]]></title><description><![CDATA[
<p>Tools exist to be an energy/effort multiplier, so it's pretty intuitive that increasing that multiplier will make it easier to get more done.<p>In practice it's pretty difficult to find the balance between yak shaving and piling on unnecessary manual labour by just trying to do the work with existing (possibly poorly fitting) tools.<p>If you're planning to stick with your current tools for a long time, each 1% improvement compounds massively, so that balance is probably much closer to yak shaving than most people realise.</p>
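<p>A rough back-of-the-envelope sketch of the compounding claim, assuming (hypothetically) that each 1% improvement is independent and multiplies every other one. Real gains rarely stack this cleanly, so treat this as an optimistic illustration rather than a measurement:</p>

```python
# Illustrative only: assumes each 1% tooling improvement is independent and
# multiplies the effect of every other improvement (an idealisation).
def compounded_multiplier(improvements: int, gain: float = 0.01) -> float:
    """Total productivity multiplier after `improvements` stacked gains."""
    return (1.0 + gain) ** improvements


def additive_multiplier(improvements: int, gain: float = 0.01) -> float:
    """What the same gains would give if they merely added up."""
    return 1.0 + gain * improvements


for n in (10, 50, 100):
    print(f"{n} x 1%: compounded {compounded_multiplier(n):.2f}x, "
          f"additive {additive_multiplier(n):.2f}x")
```

<p>Under that assumption, 100 stacked 1% gains come to roughly 2.7x rather than the 2x you'd get by simply adding them.</p>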
]]></description><pubDate>Mon, 23 Feb 2026 03:58:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47117910</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=47117910</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47117910</guid></item><item><title><![CDATA[New comment by px1999 in "Agent orchestration for the timid"]]></title><description><![CDATA[
<p>Imo there's a huge blind spot forming between 6 and 8, both when talking to people and when reading posts by various agent evangelists: few people seem to be focussing on building "high quality" changes vs maximising throughput of low-quality work items.<p>My (boring b2b/b2e) org has scripts that wrap a small handful of agent calls to handle/automate our workflow. These have been <i>incredibly</i> valuable.<p>We still 'yolo' into PRs, use agents to improve code quality, and do initial checks via gating. We're trying to get docs working through the same approach. We see huge value in automation and lightweight orchestration of agents, but other parts of the whole system are the bottleneck, so there's no real point in running more than a couple of agents concurrently - Claude could already build a low-quality version of our entire backlog in a week.<p>Is anyone exploring the (imo more practically useful today) space of using agents to put together better changes vs "more commits"?</p>
]]></description><pubDate>Sat, 24 Jan 2026 22:55:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46748579</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=46748579</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46748579</guid></item><item><title><![CDATA[New comment by px1999 in "Ask HN: How are you LLM-coding in an established code base?"]]></title><description><![CDATA[
<p>My org has built internal tooling that approximates this. It's incredibly valuable from a manual testing perspective, though we haven't managed to get the agent part working well: app startup times (10+ min) make iterating hard.<p>Do you have customers who have faced/solved this problem? If so, how did they do it? It seems like a killer for the approach.</p>
]]></description><pubDate>Fri, 19 Dec 2025 23:41:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46332270</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=46332270</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46332270</guid></item><item><title><![CDATA[New comment by px1999 in "Show HN: Picknplace.js, an alternative to drag-and-drop"]]></title><description><![CDATA[
<p>This is really nice and a very original take. It feels good on mobile and other touch devices.<p>I'd love to see it feel a bit more polished on desktop (maybe I'll give that a shot if I find a bit of spare time!). A few simple things, like adding up/down arrows to the picked item and wiring up the up and down arrow keys, could go a long way toward making it work really well there too.<p>Genuinely, thank you for sharing this; it's something different and interesting.</p>
]]></description><pubDate>Fri, 19 Dec 2025 07:40:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46323233</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=46323233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46323233</guid></item><item><title><![CDATA[New comment by px1999 in "Low-background Steel: content without AI contamination"]]></title><description><![CDATA[
<p>Following this logic, why write anything at all? Shakespeare's sonnets are arrangements of existing words that were possible before he wrote them. Every mathematical proof, novel, piece of journalism is simply a configuration of symbols that existed in the space of all possible configurations. The fact that something <i>could</i> be generated doesn't negate its value when it <i>is</i> generated for a specific purpose, context, and audience.</p>
]]></description><pubDate>Wed, 11 Jun 2025 01:03:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=44243214</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=44243214</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44243214</guid></item><item><title><![CDATA[New comment by px1999 in "Ask HN: Go deep into AI/LLMs or just use them as tools?"]]></title><description><![CDATA[
<p>Consider this (possibly very bad) take:<p>RAG could largely be replaced with tool calls to a search engine. You could keep some of the approach around indexing/embeddings/semantic search, but it just becomes another tool call to a separate system.<p>How would you feel about becoming an expert in something that is so in flux and might disappear? That might help give you your answer.<p>That said, there's a lot of comparatively low-hanging fruit in LLM-adjacent areas atm.</p>
]]></description><pubDate>Sat, 24 May 2025 10:54:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=44080197</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=44080197</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44080197</guid></item><item><title><![CDATA[New comment by px1999 in "Why can't HTML alone do includes?"]]></title><description><![CDATA[
<p>The Umbraco CMS was amazing back when it used and supported XSLT.<p>Although it evaluated the XSLT server-side, it was a really neat and simple approach.</p>
]]></description><pubDate>Sat, 03 May 2025 23:57:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=43883336</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=43883336</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43883336</guid></item><item><title><![CDATA[New comment by px1999 in "Migrating away from Rust"]]></title><description><![CDATA[
<p>I expect it will wind up like search engines, where you either submit URLs for indexing/inclusion or wait for a crawl to pick your information up.<p>Until the tech catches up, it will have a stifling effect on progress toward and adoption of new things (which imo is pretty common with new/immature tech, e.g. how culture has more generally kind of stagnated since the early 2000s).</p>
]]></description><pubDate>Mon, 28 Apr 2025 21:13:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=43826174</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=43826174</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43826174</guid></item><item><title><![CDATA[New comment by px1999 in "I genuinely don't understand why some people are still bullish about LLMs"]]></title><description><![CDATA[
<p>Except value isn't polarised like that.<p>In a research context, it provides pointers and keywords for further investigation. In a report-writing context, it provides textual content.<p>Neither of these, nor the thousand other uses, is worthless. It's when you expect working, complete work product that it's (subjectively, maybe) worthless, but frankly, aiming for that with current-gen technology <i>is</i> a fool's errand.</p>
]]></description><pubDate>Fri, 28 Mar 2025 22:34:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=43510579</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=43510579</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43510579</guid></item><item><title><![CDATA[New comment by px1999 in "DOGE employees ordered to stop using Slack"]]></title><description><![CDATA[
<p><i>Devoir de désobéissance</i> is a <i>duty</i> of disobedience.<p>If they choose to follow orders they know are illegal, they can be personally liable.</p>
]]></description><pubDate>Wed, 05 Feb 2025 21:54:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=42955791</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=42955791</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42955791</guid></item><item><title><![CDATA[New comment by px1999 in "ROCm Device Support Wishlist"]]></title><description><![CDATA[
<p>AMD's offer was more than fair. Hotz was throwing a tantrum.</p>
]]></description><pubDate>Tue, 21 Jan 2025 00:13:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=42774746</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=42774746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42774746</guid></item><item><title><![CDATA[New comment by px1999 in "Gemini 2.0: our new AI model for the agentic era"]]></title><description><![CDATA[
<p>The business model doesn't matter.<p>I can write something with Microsoft tech and reasonably expect it to still work in 10 years (even their service-based stuff), but I can't say the same about anything from Google.<p>That alone stops me/my org from buying stuff from Google.</p>
]]></description><pubDate>Wed, 11 Dec 2024 23:11:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=42394101</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=42394101</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42394101</guid></item><item><title><![CDATA[New comment by px1999 in "ChatGPT Pro"]]></title><description><![CDATA[
<p>Imo the con is picking the metric that makes others look artificially bad when they don't seem to be all that different (at least on the surface).<p>&gt; we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one<p>This surely makes the other models post smaller numbers. I'd be curious how it stacks up under, e.g., 1/1 or 1/4 attempts.</p>
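<p>To make the effect concrete, here is a small sketch assuming a model answers a given question correctly with an independent probability p on each attempt (a simplification; repeated attempts on the same question are usually correlated). Under that assumption, "4/4 reliability" scores p^4, while "at least 1 of 4 attempts" scores 1 - (1 - p)^4:</p>

```python
# Illustrative only: assumes each attempt succeeds independently with probability p.
def all_k_correct(p: float, k: int = 4) -> float:
    """Probability that all k attempts are correct (the "4/4 reliability" setting)."""
    return p ** k


def any_k_correct(p: float, k: int = 4) -> float:
    """Probability that at least one of k attempts is correct."""
    return 1.0 - (1.0 - p) ** k


for p in (0.6, 0.9):
    print(f"p={p}: 1/1={p:.3f}  4/4={all_k_correct(p):.3f}  "
          f"1-of-4={any_k_correct(p):.3f}")
```

<p>With these hypothetical numbers, a 0.6 vs 0.9 per-attempt model differs by 1.5x on a single attempt but by roughly 5x under 4/4 (0.1296 vs 0.6561), so the stricter metric widens the visible gap considerably.</p>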
]]></description><pubDate>Thu, 05 Dec 2024 20:33:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=42332458</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=42332458</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42332458</guid></item><item><title><![CDATA[New comment by px1999 in "Terence Tao on O1"]]></title><description><![CDATA[
<p>Specifically within the last week, I have used Claude and Claude via Cursor to:<p>- write some moderately complex PowerShell to perform a one-off process<p>- add TypeScript annotations to a random file in my org's codebase<p>- land a minor feature quickly in another codebase<p>- suggest libraries and write sample(ish) code showing their rough usage, to help choose between them for a future feature design<p>- provide text to fill out an extensive sales RFT spreadsheet based on notes and some RAG<p>- generate some very domain-specific, realistic-sounding test data (just naming)<p>- scaffold out some PowerPoint slides for a training session<p>There are likely others (LLMs have helped with research and in my personal life too).<p>All of these are things that I could do (and probably do better), but I have a young baby at the moment, so my focus windows are small and I'm time-poor. With this workflow I'm achieving more than I was when I had fully uninterrupted time.</p>
]]></description><pubDate>Sat, 14 Sep 2024 23:21:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41543848</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=41543848</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41543848</guid></item><item><title><![CDATA[New comment by px1999 in "No "Hello", No "Quick Call", and No Meetings Without an Agenda"]]></title><description><![CDATA[
<p>This is great, but I wish there were a shorter, more to-the-point version for me to link folks to.<p>Each of the ideas in here is solid, but there's too much writing around the core idea. A sentence or two for each point and then a tl;dr like "put in some basic level of effort if you're going to ask for others' valuable time" would do it for me personally.</p>
]]></description><pubDate>Fri, 23 Aug 2024 01:16:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=41325538</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=41325538</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41325538</guid></item><item><title><![CDATA[New comment by px1999 in "Parents outraged at Snoo after smart bassinet company charges fee to rock crib"]]></title><description><![CDATA[
<p>The aftermarket for these things means that the cost winds up being split between multiple parties in a lot of cases.<p>Anecdotally, most parents within my circle bought their Snoo used and sold it after use. I bought an unopened Snoo from Facebook Marketplace for $X and sold it after 6 months for $X-200.<p>I was a little annoyed that Happiest Baby is meddling with the resale value (because I was expecting to be able to sell it on after a few months of use).<p>IMO, even though the product is overpriced, I'd have happily paid 5k for the extra sleep I believe it gave me.</p>
]]></description><pubDate>Sun, 18 Aug 2024 22:48:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=41286188</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=41286188</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41286188</guid></item><item><title><![CDATA[New comment by px1999 in "Playwright Test Generator"]]></title><description><![CDATA[
<p>My org uses codegen as a starting point for one of our test layers.<p>It works for us probably because we sidestep the pain points you list: the environments we run in are pristine, complete copies of known datasets, we remove as many sources of randomness as possible, and our environment flakiness level is very low.<p>They still break, but usually because the locators in use have been chosen poorly (or we've made planned changes to a page/component).<p>We're a web-based B2B SaaS that runs an instance of the entire environment for each of our customers. Our non-prod setup consists of a bajillion static test environments, but more importantly we use Testcontainers to spin up the transient test environments from database snapshots. Using the recorder on the static environments (before the transient ones existed) <i>was</i> a pain.</p>
]]></description><pubDate>Thu, 14 Mar 2024 21:58:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=39709443</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=39709443</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39709443</guid></item><item><title><![CDATA[New comment by px1999 in "Goody-2, the world's most responsible AI model"]]></title><description><![CDATA[
<p>A level of fear allows the introduction of regulatory moats that protect the organisations currently building and deploying these models at scale.<p>"It's dangerous" is a beneficial lie for e.g. OpenAI to push, because they can afford any compliance/certification process that's introduced (hell, they'd probably be heavily involved in designing the process).</p>
]]></description><pubDate>Fri, 09 Feb 2024 20:57:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=39320457</link><dc:creator>px1999</dc:creator><comments>https://news.ycombinator.com/item?id=39320457</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39320457</guid></item></channel></rss>