<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: cowartc</title><link>https://news.ycombinator.com/user?id=cowartc</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 18 Jun 2026 13:47:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=cowartc" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by cowartc in "SWE-bench Verified no longer measures frontier coding capabilities"]]></title><description><![CDATA[
<p>The headline leads with contamination, but the buried finding is that 59% of audited failures had test design defects.  That's a measurement system that was never validated against ground truth before being adopted industry-wide as a score that mattered.  They reported on it for two years, but the gauge was broken the entire time.</p>
]]></description><pubDate>Sun, 26 Apr 2026 16:32:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47911561</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47911561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47911561</guid></item><item><title><![CDATA[New comment by cowartc in "AI Might Be Lying to Your Boss"]]></title><description><![CDATA[
<p>PCW clustering around 85-95% regardless of usage is a measurement bias, not a real signal.  In manufacturing, this would fail measurement system analysis because the measurement variation is larger than the effect you're trying to detect.  Companies trying to make headcount and copyright decisions on that are doing the AI version of measuring with a broken ruler.</p>
]]></description><pubDate>Sun, 26 Apr 2026 16:28:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47911527</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47911527</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47911527</guid></item><item><title><![CDATA[New comment by cowartc in "Kimi vendor verifier – verify accuracy of inference providers"]]></title><description><![CDATA[
<p>The verifier isn't just a fraud detector.  It's an admission that open weights alone aren't a shippable contract.  Without a standardized verifier, a buyer has no way to know which case they're in.  The weights are the easy part.  The verification isn't.</p>
]]></description><pubDate>Tue, 21 Apr 2026 14:21:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47849251</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47849251</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47849251</guid></item><item><title><![CDATA[New comment by cowartc in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"]]></title><description><![CDATA[
<p>Interesting direction.  One question: how does this hold up outside the synthetic transformer on a real downstream task?  Reconstruction error is the right measure, but it's one step removed from the end task.  I'm curious whether HAE would show a similar gap on a downstream benchmark.</p>
]]></description><pubDate>Tue, 21 Apr 2026 14:17:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47849178</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47849178</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47849178</guid></item><item><title><![CDATA[New comment by cowartc in "Scientific datasets are riddled with copy-paste errors"]]></title><description><![CDATA[
<p>The real rate is certainly higher because this only catches the laziest form of error.  The harder problem is the same one we see in production ML.  Your pipeline can produce confident results on garbage data and nothing in the system tells you.  The first step isn't better models or better tools; it's profiling the input before you trust anything downstream of it.</p>
]]></description><pubDate>Mon, 20 Apr 2026 15:25:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47835689</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47835689</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47835689</guid></item><item><title><![CDATA[New comment by cowartc in "Amazon's AI boom is creating mess of duplicate tools and data inside the company"]]></title><description><![CDATA[
<p>This is a symptom of the problem. The real issue is that everyone is running off and building their own thing without tying back to a north star and coordinating. I've seen this play out before at an F200.  Tooling proliferation resolves itself once everyone is driving towards the same goal and owns it. Without that, you're just duplicating symptoms.</p>
]]></description><pubDate>Mon, 20 Apr 2026 15:22:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47835657</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47835657</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47835657</guid></item><item><title><![CDATA[Probabilistic Record Linkage Using Pretrained Text Embeddings]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.cambridge.org/core/journals/political-analysis/article/probabilistic-record-linkage-using-pretrained-text-embeddings/0414DDE200A0305EEDD7B31EA8849EB9">https://www.cambridge.org/core/journals/political-analysis/article/probabilistic-record-linkage-using-pretrained-text-embeddings/0414DDE200A0305EEDD7B31EA8849EB9</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47807603">https://news.ycombinator.com/item?id=47807603</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 17 Apr 2026 16:21:34 +0000</pubDate><link>https://www.cambridge.org/core/journals/political-analysis/article/probabilistic-record-linkage-using-pretrained-text-embeddings/0414DDE200A0305EEDD7B31EA8849EB9</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47807603</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47807603</guid></item><item><title><![CDATA[New comment by cowartc in "AI cybersecurity is not proof of work"]]></title><description><![CDATA[
<p>The hallucination vs. real finding distinction is the core problem, and it doesn't get solved by a better model alone.  It gets solved by what you do with the output.  The verification layer is what makes the system production-grade.</p>
]]></description><pubDate>Fri, 17 Apr 2026 15:10:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47806793</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47806793</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47806793</guid></item><item><title><![CDATA[New comment by cowartc in "The beginning of scarcity in AI"]]></title><description><![CDATA[
<p>The scarcity framing assumes compute is the bottleneck.  For most production deployments I've seen, the actual bottleneck is evaluation and knowing what to trust.<p>You can throw cheaper models at a problem all day, but if you can't measure where the model fails on your data, you're just making mistakes faster at a lower cost.<p>Compute gets cheaper.  Reliable evaluation doesn't.</p>
]]></description><pubDate>Fri, 17 Apr 2026 15:07:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47806755</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47806755</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47806755</guid></item><item><title><![CDATA[New comment by cowartc in "Show HN: Libretto – Making AI browser automations deterministic"]]></title><description><![CDATA[
<p>This is what I found doing Playwright-based extraction against anti-bot defenses.  Runtime agents were brittle.  It felt like trying to debug/audit a black box.<p>We used to deal with RPA stuff at work.  Always fragile.  Good to see evolution in the space.</p>
]]></description><pubDate>Thu, 16 Apr 2026 12:48:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47792235</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47792235</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47792235</guid></item><item><title><![CDATA[New comment by cowartc in "The next evolution of the Agents SDK"]]></title><description><![CDATA[
<p>The separation of harness from compute is the right architectural move.   The part that's still missing from most agent frameworks is the verification layer between steps.  Sandbox execution solves the safety problem.  It doesn't solve the accuracy problem.  Those are different failure modes that need different infrastructure.</p>
]]></description><pubDate>Wed, 15 Apr 2026 18:18:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47783045</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47783045</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47783045</guid></item><item><title><![CDATA[New comment by cowartc in "The Three Enterprise Layers Are Collapsing into One"]]></title><description><![CDATA[
<p>There's truth in the accountability angle, but the architectural driver is cost. Three vendor layers with human queues at every handoff are expensive.  A confidence gate that routes 70% of decisions to automation and escalates only on uncertainty costs less and produces a measurable audit trail, which is actually more accountable than an approval chain where nobody tracks whether the approvals were correct.</p>
]]></description><pubDate>Sun, 12 Apr 2026 21:45:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47744889</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47744889</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47744889</guid></item><item><title><![CDATA[The Three Enterprise Layers Are Collapsing into One]]></title><description><![CDATA[
<p>Article URL: <a href="https://walsenburgtech.com/blog/hub-and-spoke-architecture-production-ai">https://walsenburgtech.com/blog/hub-and-spoke-architecture-production-ai</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47743972">https://news.ycombinator.com/item?id=47743972</a></p>
<p>Points: 3</p>
<p># Comments: 2</p>
]]></description><pubDate>Sun, 12 Apr 2026 20:15:20 +0000</pubDate><link>https://walsenburgtech.com/blog/hub-and-spoke-architecture-production-ai</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47743972</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47743972</guid></item><item><title><![CDATA[Quantization, LoRA, and the 8% Problem Benchmarking Local LLMs for Production AI]]></title><description><![CDATA[
<p>Article URL: <a href="https://walsenburgtech.com/blog/quantization-lora-benchmarking-local-llms">https://walsenburgtech.com/blog/quantization-lora-benchmarking-local-llms</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47731333">https://news.ycombinator.com/item?id=47731333</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 11 Apr 2026 15:15:42 +0000</pubDate><link>https://walsenburgtech.com/blog/quantization-lora-benchmarking-local-llms</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47731333</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47731333</guid></item><item><title><![CDATA[KYB Engine at 3 Quantization Levels: Accuracy Held. Cost Dropped 6x]]></title><description><![CDATA[
<p>Article URL: <a href="https://walsenburgtech.com/blog/quantization-benchmark-kyb-verification">https://walsenburgtech.com/blog/quantization-benchmark-kyb-verification</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47706279">https://news.ycombinator.com/item?id=47706279</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 09 Apr 2026 17:09:22 +0000</pubDate><link>https://walsenburgtech.com/blog/quantization-benchmark-kyb-verification</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47706279</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47706279</guid></item><item><title><![CDATA[Show HN: Same agentic pipeline, two implementations – custom async vs. LangGraph]]></title><description><![CDATA[
<p>Article URL: <a href="https://walsenburgtech.com/blog/from-custom-orchestration-to-langgraph">https://walsenburgtech.com/blog/from-custom-orchestration-to-langgraph</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47628609">https://news.ycombinator.com/item?id=47628609</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 03 Apr 2026 16:21:36 +0000</pubDate><link>https://walsenburgtech.com/blog/from-custom-orchestration-to-langgraph</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47628609</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47628609</guid></item><item><title><![CDATA[New comment by cowartc in "Ask HN: Who wants to be hired? (April 2026)"]]></title><description><![CDATA[
<p>Location: Walsenburg, CO<p>Remote: Yes (US only)<p>Willing to relocate: No<p>Technologies: Python, Java, TypeScript, Kafka, Spark, PostgreSQL, OpenSearch, AWS (Lambda, ECS, S3, Step Functions), Kubernetes, Terraform, MLflow, PyTorch, FastAPI, BentoML, LLM orchestration (Ollama/Llama 3), RAG architectures<p>Portfolio: <a href="https://walsenburgtech.com/blog" rel="nofollow">https://walsenburgtech.com/blog</a><p>Resume/CV: <a href="https://drive.google.com/file/d/180FozwS-NM4EV4Dhhop2t8nlsHpVTVa9/view?usp=drive_link" rel="nofollow">https://drive.google.com/file/d/180FozwS-NM4EV4Dhhop2t8nlsHp...</a><p>chriscowart18[at]gmail[dot]com<p>Staff Engineer / Engineering Manager | 18 years | Fintech, AI/ML, Platform Engineering<p>I build production AI systems in regulated financial environments. Previous: 60M+ ML classification system and high-throughput data pipelines at an auto lender. Managed engineering teams up to 10 and directed 35 developers through a CoE model.  Recently shipped a KYB and a tiered fraud detection pipeline with ML model serving.</p>
]]></description><pubDate>Wed, 01 Apr 2026 16:19:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47602912</link><dc:creator>cowartc</dc:creator><comments>https://news.ycombinator.com/item?id=47602912</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47602912</guid></item></channel></rss>