Hacker News: GabrielBianconi

New comment by GabrielBianconi in "Even (very) noisy LLM evaluators are useful for improving AI agents"

GabrielBianconi — Sat, 30 May 2026 07:49:58 +0000

An evaluator is any function that can score (i.e. "evaluate") your LLM system (e.g. your agent).

For example:

- You write a heuristic (regex, code, etc.) that assigns a score to an output

- You make another LLM score the output from your system (aka "LLM-as-a-judge")

- You have an automated system that can verify the generated outputs (e.g. does generated code compile or pass tests?)

People often talk about "LLM evals" (evaluations), which typically include a set of evaluators, i.e. scoring functions.
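As a rough illustration of the idea above, here is a minimal sketch of two evaluators as plain Python functions: a regex heuristic and a "does the generated code compile?" check. The function names, the `Evaluator` alias, and the `run_evaluators` helper are all hypothetical and only meant to show the shape of a scoring function; an LLM-as-a-judge evaluator could share the same signature but would need a model call, so it is left out.

```python
import re
from typing import Callable

# Assumption for this sketch: an evaluator maps one output string to a numeric score.
Evaluator = Callable[[str], float]


def contains_iso_date(output: str) -> float:
    """Heuristic evaluator: 1.0 if the output contains a YYYY-MM-DD date, else 0.0."""
    return 1.0 if re.search(r"\b\d{4}-\d{2}-\d{2}\b", output) else 0.0


def python_compiles(output: str) -> float:
    """Verifier-style evaluator: 1.0 if the output parses as Python source, else 0.0."""
    try:
        compile(output, "<generated>", "exec")
    except (SyntaxError, ValueError):
        return 0.0
    return 1.0


def run_evaluators(output: str, evaluators: dict[str, Evaluator]) -> dict[str, float]:
    """Apply every evaluator to a single output and collect the scores by name."""
    return {name: fn(output) for name, fn in evaluators.items()}


# Hypothetical output from the system under evaluation.
scores = run_evaluators(
    "x = 1  # set on 2026-05-30",
    {"has_date": contains_iso_date, "compiles": python_compiles},
)
print(scores)
```

An "eval" in this sense would then be a dataset of inputs plus a set of such functions applied to each output.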

We'll make this clearer next time!

Even (very) noisy LLM evaluators are useful for improving AI agents

GabrielBianconi — Wed, 27 May 2026 07:49:56 +0000

Article URL: https://www.tensorzero.com/blog/even-very-noisy-llm-evaluators-are-useful-for-improving-ai-agents/

Comments URL: https://news.ycombinator.com/item?id=48291016

Points: 35

# Comments: 10

Designing for Agents

GabrielBianconi — Tue, 12 May 2026 15:41:26 +0000

Article URL: https://twitter.com/teddy_riker/status/2047312986696454584

Comments URL: https://news.ycombinator.com/item?id=48109916

Points: 1

# Comments: 0

New comment by GabrielBianconi in "Stop comparing price per million tokens: the hidden LLM API costs"

GabrielBianconi — Thu, 16 Apr 2026 20:03:37 +0000

It's getting more and more challenging to keep track!

New comment by GabrielBianconi in "If DSPy is so great, why isn't anyone using it?"

GabrielBianconi — Mon, 23 Mar 2026 16:58:40 +0000

TensorZero works with the OpenAI SDK out of the box:

```
from openai import OpenAI

# Point the client to the TensorZero Gateway
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    # Call any model provider (or TensorZero function)
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": "Share a fun fact about TensorZero.",
        }
    ],
)
```

You can layer additional features only as needed (fallbacks, templates, A/B testing, etc.).

We're building an automated AI engineer, and it works

GabrielBianconi — Mon, 23 Mar 2026 16:20:03 +0000

Article URL: https://www.tensorzero.com/blog/automated-ai-engineer/

Comments URL: https://news.ycombinator.com/item?id=47491580

Points: 3

# Comments: 0

Mitchell Hashimoto on Feature Design [video]

GabrielBianconi — Fri, 19 Dec 2025 01:36:01 +0000

Article URL: https://twitter.com/mitchellh/status/2001810354096214059

Comments URL: https://news.ycombinator.com/item?id=46321305

Points: 3

# Comments: 0

Bandits in Your LLM Gateway

GabrielBianconi — Tue, 11 Nov 2025 15:32:13 +0000

Article URL: https://www.tensorzero.com/blog/bandits-in-your-llm-gateway/

Comments URL: https://news.ycombinator.com/item?id=45888437

Points: 3

# Comments: 0

Claude Plays Catan [video]

GabrielBianconi — Mon, 29 Sep 2025 18:31:35 +0000

Article URL: https://www.youtube.com/watch?v=BER3EhUIyz0

Comments URL: https://news.ycombinator.com/item?id=45417147

Points: 3

# Comments: 0

Is OpenAI's Reinforcement Fine-Tuning (RFT) Worth It?

GabrielBianconi — Thu, 25 Sep 2025 17:27:02 +0000

Article URL: https://www.tensorzero.com/blog/is-openai-reinforcement-fine-tuning-rft-worth-it/

Comments URL: https://news.ycombinator.com/item?id=45375954

Points: 4

# Comments: 0

How Kimi K2 achieves efficient RL parameter updates

GabrielBianconi — Tue, 16 Sep 2025 14:57:01 +0000

Article URL: https://moonshotai.github.io/checkpoint-engine/

Comments URL: https://news.ycombinator.com/item?id=45263188

Points: 3

# Comments: 0

Ask HN: How much would it cost to own and operate a personal gTLD?

GabrielBianconi — Fri, 29 Aug 2025 20:59:35 +0000

I've briefly looked into it before, but the discussion here [1] earlier today made me curious again:

How much would it cost to own and operate a personal gTLD? Say, `.gabriel`.

ChatGPT claims $250k to start then $100k per year. Is this reasonable? Completely off?

[1] https://news.ycombinator.com/item?id=45068215

Comments URL: https://news.ycombinator.com/item?id=45069326

Points: 3

# Comments: 1

Deploying DeepSeek on 96 H100 GPUs

GabrielBianconi — Fri, 29 Aug 2025 14:07:28 +0000

Article URL: https://lmsys.org/blog/2025-05-05-large-scale-ep/

Comments URL: https://news.ycombinator.com/item?id=45064329

Points: 285

# Comments: 80

Sporks of AGI: why the Real Thing is better than the Next Best Thing

GabrielBianconi — Thu, 28 Aug 2025 16:27:13 +0000

Article URL: https://sergeylevine.substack.com/p/sporks-of-agi

Comments URL: https://news.ycombinator.com/item?id=45054098

Points: 3

# Comments: 0

We raised $7.3M to build an open-source stack for industrial-grade LLM apps

GabrielBianconi — Tue, 19 Aug 2025 06:13:13 +0000

Article URL: https://www.tensorzero.com/blog/tensorzero-raises-7-3m-seed-round-to-build-an-open-source-stack-for-industrial-grade-llm-applications/

Comments URL: https://news.ycombinator.com/item?id=44948735

Points: 1

# Comments: 0

New comment by GabrielBianconi in "Fine-tuned small LLMs can beat large ones with programmatic data curation"

GabrielBianconi — Tue, 05 Aug 2025 14:48:11 +0000

We set up dataset splits and the usual best practices. Of course, if you overdo things, you can still hack benchmarks; our goal isn't to publish SOTA numbers but rather to illustrate results from our methodology. We didn't even tune hyperparameters; we just used the defaults. Definitely a valid concern for teams chasing SOTA though.

Thanks!

New comment by GabrielBianconi in "Ask HN: How does the Postgres ecosystem compare to Vitess at 1PB+?"

GabrielBianconi — Mon, 04 Aug 2025 22:56:16 +0000

Thanks, Sam! I'm excited to see what you guys come up with.

Ask HN: How does the Postgres ecosystem compare to Vitess at 1PB+?

GabrielBianconi — Mon, 04 Aug 2025 22:43:44 +0000

I'm not an expert in databases.

The MySQL ecosystem has a mature open-source solution for scaling horizontally with Vitess.

The Postgres ecosystem seems to have alternatives like Citus, CockroachDB, etc.

Are they similarly mature? How do they compare for massive-scale deployments (1PB+ of data, insert-heavy workload)?

Comments URL: https://news.ycombinator.com/item?id=44792168

Points: 4

# Comments: 2

New comment by GabrielBianconi in "Fine-tuned small LLMs can beat large ones with programmatic data curation"

GabrielBianconi — Mon, 04 Aug 2025 19:57:59 +0000

With supervised fine-tuning (SFT), you'll often see good results with 100-1000+ datapoints (they can be variations of the same prompt template). If you have more limited data, reinforcement fine-tuning (RFT) can work well in the 10-100 range.

Good luck!

New comment by GabrielBianconi in "Fine-tuned small LLMs can beat large ones with programmatic data curation"

GabrielBianconi — Mon, 04 Aug 2025 19:42:25 +0000

AFAIK, distillation typically refers to tuning on the logits of the larger model, so you wouldn't be able to do that with fine-tuning APIs (OpenAI + Google in our blog post). We fine-tune on the outputs themselves.

But broadly speaking, yes, we generate data using a large model, curate the best samples using metrics from the environment, and fine-tune on that data. This isn't a novel technique from an academic perspective; our focus is on applying it to different use cases (e.g. agentic RAG, agentic tool use) and models (OpenAI, Google, Qwen).
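The generate, curate, fine-tune loop described above could be sketched roughly as follows. This is not the code from the blog post: the `Sample` class, the `curate` and `to_chat_jsonl` helpers, and the score threshold are hypothetical, and the output assumes the chat-style JSONL layout (`{"messages": [...]}` per line) that hosted fine-tuning APIs commonly accept.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    messages: list[dict]  # conversation that was sent to the large model
    completion: str       # the large model's output
    score: float          # metric reported by the environment (higher is better)


def curate(samples: list[Sample], min_score: float,
           max_samples: Optional[int] = None) -> list[Sample]:
    """Keep samples scoring at least min_score, best first, optionally capped."""
    kept = sorted(
        (s for s in samples if s.score >= min_score),
        key=lambda s: s.score,
        reverse=True,
    )
    return kept if max_samples is None else kept[:max_samples]


def to_chat_jsonl(samples: list[Sample]) -> str:
    """Serialize curated samples as one chat-format JSON record per line."""
    lines = []
    for s in samples:
        record = {
            "messages": s.messages + [{"role": "assistant", "content": s.completion}]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Hypothetical generated data with environment scores attached.
prompt = [{"role": "user", "content": "Find the capital of France."}]
samples = [
    Sample(prompt, "Paris", 0.9),
    Sample(prompt, "Lyon", 0.2),
    Sample(prompt, "The capital is Paris.", 0.7),
]

curated = curate(samples, min_score=0.5)
print(to_chat_jsonl(curated))
```

Since the fine-tuning data here is the model's outputs rather than its logits, the same file could in principle be used with any provider that accepts this format.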

Thanks!