Hacker News: cowartc

New comment by cowartc in "SWE-bench Verified no longer measures frontier coding capabilities"

cowartc — Sun, 26 Apr 2026 16:32:52 +0000

The headline leads with contamination, but the buried finding is that 59% of audited failures had test design defects. That's a measurement system that was never validated against ground truth before being adopted industry-wide as a score that mattered. They reported on it for two years, but the gauge was broken the entire time.

New comment by cowartc in "AI Might Be Lying to Your Boss"

cowartc — Sun, 26 Apr 2026 16:28:00 +0000

PCW clustering around 85-95% regardless of usage is measurement bias, not a real signal. In manufacturing, this would fail measurement system analysis because the measurement variation is larger than the effect you're trying to detect. Companies trying to make headcount and copyright decisions on that are doing the AI version of measuring with a broken ruler.
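A rough way to make the "variation larger than what you're trying to detect" point concrete is a crude signal-to-noise check. This is a minimal sketch, not a full gauge R&R study; the function name and the two score lists are hypothetical illustrations, not data from the article.

```python
from statistics import mean, stdev

def signal_to_noise(low_usage, high_usage):
    """Difference in group means divided by the pooled within-group std dev.

    A value well below 1 suggests the metric's own scatter swamps any
    difference between the groups it is supposed to distinguish.
    """
    diff = abs(mean(high_usage) - mean(low_usage))
    pooled = ((stdev(low_usage) ** 2 + stdev(high_usage) ** 2) / 2) ** 0.5
    return diff / pooled

# Hypothetical scores: both groups cluster in the same 85-95 band.
low = [85, 95, 90, 88, 92]
high = [86, 94, 91, 89, 93]
ratio = signal_to_noise(low, high)
```

If the ratio stays far below 1 no matter how the groups are split by usage, the metric is not resolving the thing it is being used to decide.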

New comment by cowartc in "Kimi vendor verifier – verify accuracy of inference providers"

cowartc — Tue, 21 Apr 2026 14:21:39 +0000

The verifier isn't just a fraud detector. It's an admission that open weights alone aren't a shippable contract. Without a standardized verifier, a buyer has no way to know which case they're in. The weights are the easy part. The verification isn't.

New comment by cowartc in "High-Fidelity KV Cache Summarization Using Entropy and Low-Rank Reconstruction"

cowartc — Tue, 21 Apr 2026 14:17:29 +0000

Interesting direction. One question: how does this hold up outside the synthetic transformer, on a real downstream task? Reconstruction error is the right measure, but it's one step removed from the end task. I'm curious whether HAE would show a similar gap on a downstream benchmark.

New comment by cowartc in "Scientific datasets are riddled with copy-paste errors"

cowartc — Mon, 20 Apr 2026 15:25:14 +0000

The real rate is certainly higher because this only catches the laziest form of error. The harder problem is the same one we see in production ML: your pipeline can produce confident results on garbage data and nothing in the system tells you. The first step isn't better models or better tools, it's profiling the input before you trust anything downstream of it.
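As a minimal sketch of what "profiling the input" could look like before anything downstream runs, the following uses only the standard library. The function name, the chosen checks, and the sample records are assumptions for illustration; real profiling would cover far more (ranges, types, near-duplicates).

```python
from collections import Counter

def profile_rows(rows):
    """Return basic red flags for a list of dict records (values must be hashable)."""
    keys = sorted({k for r in rows for k in r})
    as_tuples = [tuple(r.get(k) for k in keys) for r in rows]
    counts = Counter(as_tuples)
    # Count every repeat beyond the first occurrence of an identical row.
    duplicate_rows = sum(c - 1 for c in counts.values() if c > 1)
    # Count absent or None values per column.
    missing = {k: sum(1 for r in rows if r.get(k) is None) for k in keys}
    # Columns with a single distinct value carry no information.
    constant = [k for k in keys if len({r.get(k) for r in rows}) <= 1]
    return {
        "rows": len(rows),
        "duplicate_rows": duplicate_rows,
        "missing": missing,
        "constant_columns": constant,
    }

# Hypothetical records with one copy-pasted row and one missing value.
sample = [
    {"id": 1, "x": 2.0, "unit": "mg"},
    {"id": 2, "x": 2.0, "unit": "mg"},
    {"id": 2, "x": 2.0, "unit": "mg"},
    {"id": 3, "x": None, "unit": "mg"},
]
report = profile_rows(sample)
```

Exact-duplicate detection like this only catches the laziest errors, which is the point of the comment above: it is a floor, not a ceiling.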

New comment by cowartc in "Amazon's AI boom is creating mess of duplicate tools and data inside the company"

cowartc — Mon, 20 Apr 2026 15:22:48 +0000

This is a symptom of the problem. The real issue is that everyone is running off and building their own thing without tying back to a north star and coordinating. I've seen this play out before in an F200. Tooling proliferation resolves itself once everyone is driving toward the same goal and owns it. Without that, you're just duplicating symptoms.

Probabilistic Record Linkage Using Pretrained Text Embeddings

cowartc — Fri, 17 Apr 2026 16:21:34 +0000

Article URL: https://www.cambridge.org/core/journals/political-analysis/article/probabilistic-record-linkage-using-pretrained-text-embeddings/0414DDE200A0305EEDD7B31EA8849EB9

Comments URL: https://news.ycombinator.com/item?id=47807603

Points: 1

# Comments: 0

New comment by cowartc in "AI cybersecurity is not proof of work"

cowartc — Fri, 17 Apr 2026 15:10:27 +0000

The hallucination vs. real finding distinction is the core problem, and it doesn't get solved by a better model alone. It gets solved by what you do with the output. The verification layer is what makes the system production grade.

New comment by cowartc in "The beginning of scarcity in AI"

cowartc — Fri, 17 Apr 2026 15:07:13 +0000

The scarcity framing assumes compute is the bottleneck. For most production deployments I've seen, the actual bottleneck is evaluation and knowing what to trust.

You can throw cheaper models at a problem all day, but if you can't measure where the model fails on your data, you're just making mistakes faster at a lower cost.

Compute gets cheaper. Reliable evaluation doesn't.

New comment by cowartc in "Show HN: Libretto – Making AI browser automations deterministic"

cowartc — Thu, 16 Apr 2026 12:48:53 +0000

This is what I found doing Playwright-based extraction against anti-bot defenses. Runtime agents were brittle. It felt like trying to debug/audit a black box.

We used to deal with RPA stuff at work. Always fragile. Good to see evolution in the space.

New comment by cowartc in "The next evolution of the Agents SDK"

cowartc — Wed, 15 Apr 2026 18:18:47 +0000

The separation of harness from compute is the right architectural move. The part that's still missing from most agent frameworks is the verification layer between steps. Sandbox execution solves the safety problem. It doesn't solve the accuracy problem. Those are different failure modes that need different infrastructure.
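One way to picture a verification layer between steps is to pair every step with a check that must pass before the next step runs. This is a sketch under assumptions: the function name, the (step, verify) pairing, and the toy steps are hypothetical and not part of any Agents SDK.

```python
from typing import Any, Callable, List, Tuple

# Each entry pairs a step with a predicate that validates that step's output.
Step = Tuple[Callable[[Any], Any], Callable[[Any], bool]]

def run_with_verification(steps: List[Step], value: Any) -> Any:
    """Run steps in order, halting as soon as an output fails its check."""
    for index, (step, verify) in enumerate(steps):
        value = step(value)
        if not verify(value):
            raise ValueError(f"step {index} failed verification: {value!r}")
    return value

# Toy pipeline: increment, then double, with a check after each step.
toy_steps: List[Step] = [
    (lambda x: x + 1, lambda y: y > 0),
    (lambda x: x * 2, lambda y: y % 2 == 0),
]
result = run_with_verification(toy_steps, 1)
```

A sandbox would constrain what each step is allowed to do; the predicates here address the separate question of whether what it produced is right.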

New comment by cowartc in "The Three Enterprise Layers Are Collapsing into One"

cowartc — Sun, 12 Apr 2026 21:45:58 +0000

There's truth in the accountability angle, but the architectural driver is cost. Three vendor layers with human queues at every handoff are expensive. A confidence gate that routes 70% of decisions to automation and escalates only on uncertainty costs less and produces a measurable audit trail, which is actually more accountable than an approval chain where nobody tracks whether the approvals were correct.
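A confidence gate of this kind can be sketched in a few lines. The class name, the 0.9 threshold, and the log format below are assumptions for illustration; the share of decisions that ends up automated depends entirely on the confidence distribution of the real workload.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ConfidenceGate:
    threshold: float = 0.9  # hypothetical cutoff, tune against labeled outcomes
    audit_log: List[Tuple[str, float, str]] = field(default_factory=list)

    def route(self, decision_id: str, confidence: float) -> str:
        """Automate confident decisions, escalate the rest, and log both."""
        route = "automate" if confidence >= self.threshold else "escalate"
        self.audit_log.append((decision_id, confidence, route))
        return route

    def automation_rate(self) -> float:
        """Fraction of logged decisions that were routed to automation."""
        if not self.audit_log:
            return 0.0
        automated = sum(1 for _, _, r in self.audit_log if r == "automate")
        return automated / len(self.audit_log)

gate = ConfidenceGate(threshold=0.9)
routes = [gate.route(f"d{i}", c) for i, c in enumerate([0.95, 0.5, 0.99])]
```

Because every routing decision lands in the log with its confidence, the log can later be joined against actual outcomes to check whether the automated decisions were correct, which is the audit trail the comment refers to.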

The Three Enterprise Layers Are Collapsing into One

cowartc — Sun, 12 Apr 2026 20:15:20 +0000

Article URL: https://walsenburgtech.com/blog/hub-and-spoke-architecture-production-ai

Comments URL: https://news.ycombinator.com/item?id=47743972

Points: 3

# Comments: 2

Quantization, LoRA, and the 8% Problem Benchmarking Local LLMs for Production AI

cowartc — Sat, 11 Apr 2026 15:15:42 +0000

Article URL: https://walsenburgtech.com/blog/quantization-lora-benchmarking-local-llms

Comments URL: https://news.ycombinator.com/item?id=47731333

Points: 3

# Comments: 0

KYB Engine at 3 Quantization Levels: Accuracy Held. Cost Dropped 6x

cowartc — Thu, 09 Apr 2026 17:09:22 +0000

Article URL: https://walsenburgtech.com/blog/quantization-benchmark-kyb-verification

Comments URL: https://news.ycombinator.com/item?id=47706279

Points: 1

# Comments: 0

Show HN: Same agentic pipeline, two implementations – custom async vs. LangGraph

cowartc — Fri, 03 Apr 2026 16:21:36 +0000

Article URL: https://walsenburgtech.com/blog/from-custom-orchestration-to-langgraph

Comments URL: https://news.ycombinator.com/item?id=47628609

Points: 2

# Comments: 0

New comment by cowartc in "Ask HN: Who wants to be hired? (April 2026)"

cowartc — Wed, 01 Apr 2026 16:19:47 +0000

Location: Walsenburg, CO

Remote: Yes (US only)

Willing to relocate: No

Technologies: Python, Java, TypeScript, Kafka, Spark, PostgreSQL, OpenSearch, AWS (Lambda, ECS, S3, Step Functions), Kubernetes, Terraform, MLflow, PyTorch, FastAPI, BentoML, LLM orchestration (Ollama/Llama 3), RAG architectures

Portfolio: https://walsenburgtech.com/blog

Resume/CV: https://drive.google.com/file/d/180FozwS-NM4EV4Dhhop2t8nlsHp...

chriscowart18[at]gmail[dot]com

Staff Engineer / Engineering Manager | 18 years | Fintech, AI/ML, Platform Engineering

I build production AI systems in regulated financial environments. Previous: 60M+ ML classification system and high-throughput data pipelines at an auto lender. Managed engineering teams up to 10 and directed 35 developers through a CoE model. Recently shipped a KYB and a tiered fraud detection pipeline with ML model serving.