<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: fryz</title><link>https://news.ycombinator.com/user?id=fryz</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 06:14:43 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=fryz" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by fryz in "Show HN: Chess on a Donut/Torus and Deep-Dive"]]></title><description><![CDATA[
<p>Looks like the white king/queen aren't on the right colors (queen goes on her color) - confused me a bit when trying to map the space to a 2D board</p>
]]></description><pubDate>Thu, 04 Dec 2025 21:12:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46153143</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=46153143</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46153143</guid></item><item><title><![CDATA[How we helped a YC company (Upsolve) catch a GPT-5 regression]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.arthur.ai/blog/how-upsolve-built-trusted-agentic-ai-with-arthur">https://www.arthur.ai/blog/how-upsolve-built-trusted-agentic-ai-with-arthur</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45928131">https://news.ycombinator.com/item?id=45928131</a></p>
<p>Points: 17</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 14 Nov 2025 16:01:06 +0000</pubDate><link>https://www.arthur.ai/blog/how-upsolve-built-trusted-agentic-ai-with-arthur</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=45928131</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45928131</guid></item><item><title><![CDATA[New comment by fryz in "You should write an agent"]]></title><description><![CDATA[
<p>The "magic" is done via the JSON schemas that are passed in along with the definition of the tool.<p>Structured Output APIs (including the Tool API) take the schema and build a context-free grammar, which is then used during generation to mask which tokens can be emitted.<p>I found <a href="https://openai.com/index/introducing-structured-outputs-in-the-api/" rel="nofollow">https://openai.com/index/introducing-structured-outputs-in-t...</a> (have to scroll down a bit to the "under the hood" section) and <a href="https://www.leewayhertz.com/structured-outputs-in-llms/#constrained-sampling-cfg" rel="nofollow">https://www.leewayhertz.com/structured-outputs-in-llms/#cons...</a> to be pretty good resources</p>
]]></description><pubDate>Fri, 07 Nov 2025 17:38:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=45848819</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=45848819</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45848819</guid></item><item><title><![CDATA[New comment by fryz in "How to Catch a Wily Poacher in a Sting: A Thermal Robotic Deer"]]></title><description><![CDATA[
<p>yeah - it's one of the best success stories of wildlife conservation in the modern era.</p>
]]></description><pubDate>Sat, 26 Jul 2025 17:11:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=44695540</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=44695540</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44695540</guid></item><item><title><![CDATA[New comment by fryz in "How to Catch a Wily Poacher in a Sting: A Thermal Robotic Deer"]]></title><description><![CDATA[
<p>FWIW, not saying it's right (as a hunter I wouldn't ever do this myself), but most of the biologists that build the population models, including the ones used to set the number of hunting licenses or tags sold, build a certain amount of poaching into their models.<p>It's a particularly hard problem to solve - the hobby is usually spread through traditional means (you do it if your parents did it), and going all the way back in certain communities this was the main way to get meat, even before it became regulated. It's difficult to stop something that not only puts food on the table for your family, but has been done that way for generations.<p>This was one of the main contributors to the decline of the turkey population in the lower 48. In the early 1900s, a lot of folks thought turkeys were extinct because of overhunting and poaching, and the National Wild Turkey Federation led efforts to restore the population for hunting.</p>
]]></description><pubDate>Fri, 25 Jul 2025 20:29:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=44688039</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=44688039</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44688039</guid></item><item><title><![CDATA[New comment by fryz in "Lasagna Battery Cell"]]></title><description><![CDATA[
<p>This is one of those joyful concepts you learn about as a homeowner, especially in older homes.<p>If you have plumbing done in dissimilar metals (copper, steel, lead, etc.) and any of your pipes touch, you have to perform regular maintenance and apply a dielectric grease (another one of those single-use materials that you have to buy and store away) or your pipes could corrode and cause a ton of damage.</p>
]]></description><pubDate>Mon, 14 Jul 2025 13:59:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=44560357</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=44560357</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44560357</guid></item><item><title><![CDATA[Ask HN: Are there other language/framework-specific LLMs?]]></title><description><![CDATA[
<p>We've been using V0 (https://v0.dev/) for a while now, and relative to other LLMs it definitely seems to be a level up in terms of the quality and readability of the code.<p>I've watched a few talks by the Vercel engineers and they mention that they've done a lot of work specifically for this by leaning into a strategy of:<p>* Training the model for specific languages + frameworks (TypeScript, Next.js, React, and shadcn)<p>* Collecting and using high-quality code samples for training<p>I'm wondering if anyone knows of other model providers with similar offerings, where they're building LLMs to be better within the context of a specific language and/or framework (eg: Python + FastAPI, etc.).</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44492694">https://news.ycombinator.com/item?id=44492694</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 07 Jul 2025 17:32:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=44492694</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=44492694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44492694</guid></item><item><title><![CDATA[New comment by fryz in "Show HN: High-performance GenAI engine now open source"]]></title><description><![CDATA[
<p>Yeah, thanks for the feedback.<p>We think we stand out from our competitors in the space because we built first for the enterprise case, with consideration for things like data governance, acceptable use, data privacy, and information security, and we can be deployed easily and reliably in customer-managed environments.<p>A lot of the products today have similar evaluations and metrics, but they either offer a SaaS solution or require some onerous integration into your application stack.<p>Because we started with the enterprise first, our goal was to get to value as quickly and as easily as possible (to avoid shoulder-surfing over Zoom calls because we don't have access to the service), and we think this plays out well in our product.</p>
]]></description><pubDate>Thu, 24 Apr 2025 18:38:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43785998</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=43785998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43785998</guid></item><item><title><![CDATA[New comment by fryz in "Show HN: High-performance GenAI engine now open source"]]></title><description><![CDATA[
<p>Yeah, great question.<p>We based our hallucination detection on "groundedness", evaluated on a claim-by-claim basis: whether the LLM response can be supported by the provided context (eg: message history, tool calls, retrieved context from a vector DB, etc.).<p>We split the response into multiple claims, determine if a claim needs to be evaluated (eg: and isn't just some boilerplate), and then check to see if the claim is referenced in the context.</p>
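<p>The shape of that pipeline can be sketched in toy form - to be clear, this is my hand-rolled illustration, not Arthur's actual implementation; real systems use learned models for claim splitting and entailment rather than the naive sentence split and word-overlap check below:</p>

```python
# Toy claim-level groundedness check: split the response into claims,
# skip boilerplate, flag any claim with no support in the context.
BOILERPLATE = {"sure, here is what i found."}

def split_claims(response):
    # Naive claim splitter: one claim per sentence.
    return [s.strip() for s in response.split('.') if s.strip()]

def needs_check(claim):
    # Skip conversational filler that asserts nothing.
    return claim.lower() + '.' not in BOILERPLATE

def is_grounded(claim, context, threshold=0.5):
    # Crude lexical stand-in for an entailment model: a claim counts
    # as grounded if enough of its words appear in the context.
    words = set(claim.lower().split())
    ctx = set(context.lower().split())
    return len(words & ctx) / len(words) >= threshold

def ungrounded_claims(response, context):
    return [c for c in split_claims(response)
            if needs_check(c) and not is_grounded(c, context)]

context = "the order shipped on march 3 via ups ground"
response = ("Sure, here is what I found. "
            "the order shipped on march 3. it will arrive tomorrow")
print(ungrounded_claims(response, context))  # -> ['it will arrive tomorrow']
```

<p>The invented arrival date gets flagged because nothing in the retrieved context supports it, while the boilerplate opener is never evaluated at all.</p>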
]]></description><pubDate>Thu, 24 Apr 2025 18:35:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=43785973</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=43785973</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43785973</guid></item><item><title><![CDATA[Show HN: High-performance GenAI engine now open source]]></title><description><![CDATA[
<p>Hey HN,<p>After one too many customer fire drills regarding hallucinating or insecure AI models, we built a system to catch these issues before they reached production. The Arthur Engine has been running everywhere from Fortune 100 companies to AI-native start-ups over the past two years, putting security controls around more than 10 billion tokens in production every month. We're now opening up this service to developers, enabling you to leverage enterprise-grade guardrails and evals as a service, all for free.<p>Get it on GitHub (<a href="https://github.com/arthur-ai/arthur-engine">https://github.com/arthur-ai/arthur-engine</a>) to start evaluating your models today.<p>Highlights of the Arthur Engine include:<p>* Built for speed and scale: It delivers sub-second p90 latencies at well over 100 RPS.<p>* Made for full lifecycle support: Ideal for pre-production validation, real-time guardrails, and post-production monitoring.<p>* Ease of use: It is designed to be easy for anyone to run and deploy, whether you're working on it locally during development or deploying it within a horizontally-scaling architecture for large-scale workloads.<p>* Unification of generative and traditional AI: The Arthur Engine can be used to evaluate a diverse range of models, from LLMs and agentic AI systems to binary classifiers, regression models, recommender systems, forecasting models, and more.<p>* Content-specific guardrail and detection features: Ranging from toxicity and hallucination detection to sensitive data (like PII, keyword/regex, and custom rules) and prompt injection.<p>* Customizability: Plug in your own models or integrate with other model or guardrail providers with ease, and tailor the system to match your specific needs.<p>Having been first-hand witnesses to the lack of adequate AI monitoring tools and the general underdelivery of GenAI systems in production, we believe that such a capability shouldn't be exclusive to big-budget organizations. Our mission is to make AI better, for everyone, and we believe that by opening up this tool we can help more people get to that goal.<p>Check out our GitHub repo for examples and directions on how to use the Arthur Engine for various purposes, such as validation during development, real-time guardrails, or performance troubleshooting using enriched logging data. (<a href="https://github.com/arthur-ai/engine-examples">https://github.com/arthur-ai/engine-examples</a>)<p>We can’t wait to see what you build!<p>— Zach and Team Arthur</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43782869">https://news.ycombinator.com/item?id=43782869</a></p>
<p>Points: 22</p>
<p># Comments: 12</p>
]]></description><pubDate>Thu, 24 Apr 2025 13:55:19 +0000</pubDate><link>https://github.com/arthur-ai/arthur-engine</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=43782869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43782869</guid></item><item><title><![CDATA[New comment by fryz in "Googler... ex-Googler"]]></title><description><![CDATA[
<p>You're not wrong, but suffering isn't comparative. Just because it's easier for someone to bounce back, or they have support in the transition, doesn't mean it doesn't still suck.</p>
]]></description><pubDate>Mon, 14 Apr 2025 14:34:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=43681778</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=43681778</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43681778</guid></item><item><title><![CDATA[New comment by fryz in "Show HN: Mastra – Open-source JS agent framework, by the developers of Gatsby"]]></title><description><![CDATA[
<p>To add some color to this:<p>Anthropic does a good job of breaking down some common architectures around using these components [1] (good outline of this if you prefer video [2]).<p>"Agent" is definitely an overloaded term - the best framing I've seen aligns most closely with the Anthropic definition. Specifically, an "agent" is a GenAI system that dynamically identifies the tasks ("steps" from the parent comment) without having to be instructed that those are the steps. There are obvious parallels to the reasoning capabilities that we've seen released in the latest cut of the foundation models.<p>So for example, the "agent" would first build a plan for how to address the query, dynamically farm out the steps in that plan to other LLM calls, and then evaluate execution for correctness/success.<p>[1] <a href="https://www.anthropic.com/research/building-effective-agents" rel="nofollow">https://www.anthropic.com/research/building-effective-agents</a>
[2] <a href="https://www.youtube.com/watch?v=pGdZ2SnrKFU" rel="nofollow">https://www.youtube.com/watch?v=pGdZ2SnrKFU</a></p>
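<p>That plan/execute/evaluate loop can be sketched in a few lines - the llm() stub and its canned responses below are made up purely for illustration:</p>

```python
# Minimal sketch of the plan -> execute -> evaluate loop: the agent
# decides its own steps rather than being handed them.
def llm(prompt):
    # Hypothetical model call; canned responses keep the sketch runnable.
    canned = {
        "plan": ["look up the user's order", "draft a status update"],
        "look up the user's order": "order #123: shipped",
        "draft a status update": "Your order #123 has shipped.",
    }
    return canned.get(prompt, "done")

def run_agent(query):
    # 1. The agent builds its own plan for the query.
    steps = llm("plan")
    # 2. Each step is farmed out as its own LLM call.
    results = [llm(step) for step in steps]
    # 3. A real agent would add an evaluation pass over the results
    #    here; this sketch just returns the final step's output.
    return results[-1]

print(run_agent("where is my order?"))
```

<p>The contrast with a fixed workflow is step 1: in a workflow the step list is hard-coded by the developer, while here it comes back from the model itself.</p>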
]]></description><pubDate>Wed, 19 Feb 2025 21:41:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=43108095</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=43108095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43108095</guid></item><item><title><![CDATA[New comment by fryz in "Splitting engineering teams into defense and offense"]]></title><description><![CDATA[
<p>Neat article - I know the author mentioned this in the post, but I only see this working as long as a few assumptions hold:<p>* avg tenure / skill level of the team is relatively uniform<p>* team is small with high-touch comms (eg: same/near timezone)<p>* most importantly - everyone feels accountable and has agency for work others do (eg: codebase is small, relatively simple, etc.)<p>Where I would expect to see this fall apart is when these assumptions drift and holding accountability becomes harder. When folks start to specialize, something becomes complex, or work quality is sacrificed for short-term deliverables, the folks that feel the pain are the defense folks, and they don't have agency to drive the improvements.<p>The incentives for folks on defense are completely different from those for folks on offense, which can make conversations about what to prioritize difficult in the long term.</p>
]]></description><pubDate>Mon, 14 Oct 2024 20:47:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41841806</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=41841806</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41841806</guid></item><item><title><![CDATA[Guide to LLM Experimentation and Development in 2024]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.arthur.ai/blog/the-ultimate-guide-to-llm-experimentation-and-development-in-2024">https://www.arthur.ai/blog/the-ultimate-guide-to-llm-experimentation-and-development-in-2024</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40634623">https://news.ycombinator.com/item?id=40634623</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 10 Jun 2024 15:28:34 +0000</pubDate><link>https://www.arthur.ai/blog/the-ultimate-guide-to-llm-experimentation-and-development-in-2024</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=40634623</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40634623</guid></item><item><title><![CDATA[New comment by fryz in "Why I'm Leaving New York City [video]"]]></title><description><![CDATA[
<p>shhhhhh - don't give it up :)<p>Part of what's nice about it is that there aren't a ton of people, and the NJ stereotypes work out in our favor.</p>
]]></description><pubDate>Sun, 21 Apr 2024 12:04:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=40105027</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=40105027</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40105027</guid></item><item><title><![CDATA[New comment by fryz in "After AI beat them, professional Go players got better and more creative"]]></title><description><![CDATA[
<p>FWIW, I find the classical chess tournaments with the super GMs to be fairly interesting, if only because the focus of the games is more about the metagame than about the game itself.<p>The article linked at the bottom of the source is a WSJ piece about how Magnus beats the best players because of the "human element".<p>A lot of the games today are about opening preparation, where the goal is to out-prepare and surprise your opponent by studying opening lines and esoteric responses (an area where computer play has drastically opened up new territory). Similarly, during the middlegame/endgame, the best players will try to force uncomfortable decisions on their opponents, knowing which positions their opponents tend not to prefer. For example, in round 1 of the Candidates, Fabiano took Hikaru into a position that had very little in the way of aggressive counter-play, effectively taking away a big advantage that Hikaru would otherwise have had.<p>Watching these games feels somewhat akin to watching generals develop strategies trying to outmaneuver their counterparts on the other side, taking into consideration their strengths and weaknesses as much as the tactics/deployment of troops/etc.</p>
]]></description><pubDate>Mon, 08 Apr 2024 20:32:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=39973405</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=39973405</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39973405</guid></item><item><title><![CDATA[New comment by fryz in "Grabbing Dinner"]]></title><description><![CDATA[
<p>From the article:<p>> When I asked Jody how much of his family’s meat is wild game, he initially said “about half.” Upon reflection, he bumped the number to 70 percent.<p>Doesn't sound like this is a justification for "culture" or "tradition". Certainly seems a lot more responsible than the average "tradition" of "I got it at the grocery store".<p>When you hunt for your own food, you are forced to consider the sacrifice of the animal and have to put in the work of preparing for the hunt and cleaning the animal - things that anyone who's not done this takes for granted when they eat meat.</p>
]]></description><pubDate>Sun, 17 Sep 2023 19:29:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=37548517</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=37548517</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37548517</guid></item><item><title><![CDATA[How to Think About Production Performance of Generative Text]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.arthur.ai/blog/how-to-think-about-production-performance-of-generative-text">https://www.arthur.ai/blog/how-to-think-about-production-performance-of-generative-text</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=35713868">https://news.ycombinator.com/item?id=35713868</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 26 Apr 2023 14:07:20 +0000</pubDate><link>https://www.arthur.ai/blog/how-to-think-about-production-performance-of-generative-text</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=35713868</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35713868</guid></item><item><title><![CDATA[New comment by fryz in "Netflix's New Chapter"]]></title><description><![CDATA[
<p>Maybe we've not gotten there yet (kids are 2 and 1), but they can watch the same thing thousands of times and it will still glue them to the TV.</p>
]]></description><pubDate>Tue, 24 Jan 2023 17:31:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=34506918</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=34506918</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34506918</guid></item><item><title><![CDATA[New comment by fryz in "Netflix's New Chapter"]]></title><description><![CDATA[
<p>You might not be the right market (or at least, the marketplace might be different for your demographic).<p>I'm a parent, and for me and all my parent friends, Disney+ is the streaming service that generates the most value in our households. Along with all the old/nostalgic Disney animated films, they generate and acquire a lot of the "in" content for kids (Bluey, Mickey Mouse Clubhouse, etc.)<p>Before my kids, Disney+ would have been the first streaming service to get cut. But now, it'll be the last.</p>
]]></description><pubDate>Mon, 23 Jan 2023 21:16:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=34495178</link><dc:creator>fryz</dc:creator><comments>https://news.ycombinator.com/item?id=34495178</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34495178</guid></item></channel></rss>