<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Barathkanna</title><link>https://news.ycombinator.com/user?id=Barathkanna</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 25 Apr 2026 12:36:16 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Barathkanna" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Sounds like a plan, but what if you could just pay a fixed cost every month and not worry about anything?</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:21:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347146</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347146</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>That’s true, but AI is interesting because consumption-based pricing introduces a lot more variance than typical SaaS infrastructure. One user action can trigger dozens of model calls in an agent workflow. That’s partly why we started experimenting with models like <a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a> where the pricing flips back to a fixed subscription and we absorb the usage spikes.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:19:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347134</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347134</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347134</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Local models help remove token cost uncertainty, but they shift the problem to infrastructure and ops. GPUs, scaling, maintenance, and latency can add up quickly depending on the workload. For many builders it ends up being a tradeoff between predictable infra cost and flexible API usage.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:14:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347111</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347111</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347111</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>That’s great. Real-time tracking is a big step already. The tricky part we kept running into was the variance itself, especially with retries and agent loops. That’s partly why we started experimenting with Oxlo.ai (<a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a>) where the pricing model absorbs that variance so builders don’t have to constantly model token risk.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:13:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347105</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347105</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>One overlooked source of variance is retries from formatting failures. In many agent systems the loops dominate the cost, not the raw token length.<p>We ran into the same issue building agent workflows, which is why we started building <a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a>, experimenting with a flat subscription model where we absorb the token variance so builders don’t have to constantly model token risk.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:11:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347087</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347087</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Agreed. The real cost unit becomes the whole agent workflow, not a single LLM call. One user action can trigger dozens of calls.<p>We ran into the same issue and ended up building <a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a> to make the cost side more predictable for agent workloads.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:08:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347073</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347073</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347073</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Exactly. That’s actually why we started building Oxlo.ai. Early-stage builders usually just want to experiment without worrying too much about token cost spikes.</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:58:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332432</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332432</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332432</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>True, but for early-stage builders it’s harder to design those guardrails upfront. A lot of the time you only discover the retry patterns and cost spikes once real users start hitting the system.</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:50:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332377</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332377</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332377</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Local models solve the marginal cost problem, but they move the complexity into infrastructure and throughput planning instead.</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:46:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332355</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332355</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332355</guid></item><item><title><![CDATA[Ask HN: How are people forecasting AI API costs for agent workflows?]]></title><description><![CDATA[
<p>I’ve been experimenting with agent-based features and one thing that surprised me is how hard it is to estimate API costs.<p>A single user action can trigger anywhere from a few to dozens of LLM calls (tool use, retries, reasoning steps), and with token-based pricing the cost can vary a lot.<p>How are builders here planning for this when pricing their SaaS?<p>Are you just padding margins, limiting usage, or building internal cost tracking?
<p>Also curious: would a service that offers predictable pricing for AI APIs (like a fixed subscription cost) actually be useful for people building agentic workflows?</p>
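<p>For a rough sense of why per-action cost is hard to pin down, here is a minimal Monte-Carlo sketch. Every number in it (call counts, token counts, the per-token rate) is an illustrative assumption, not real pricing:</p>

```python
import random

def simulate_action_cost(n_actions=10_000, seed=0):
    """Monte-Carlo sketch of per-user-action API cost for an agent workflow.

    Assumptions (illustrative only): each user action triggers a random
    number of LLM calls (tool use, retries, reasoning steps), each call
    consumes a random number of tokens, billed at a flat per-token rate.
    """
    rng = random.Random(seed)
    price_per_1k_tokens = 0.01          # assumed blended $/1k tokens
    costs = []
    for _ in range(n_actions):
        calls = rng.randint(3, 30)      # a few to dozens of calls per action
        tokens = sum(rng.randint(500, 4000) for _ in range(calls))
        costs.append(tokens / 1000 * price_per_1k_tokens)
    costs.sort()
    mean = sum(costs) / len(costs)
    p95 = costs[int(0.95 * len(costs))]
    return mean, p95

mean, p95 = simulate_action_cost()
print(f"mean cost/action ~= ${mean:.3f}, p95 ~= ${p95:.3f}")
```

<p>The gap between the mean and the tail is the part that makes margin planning awkward: padding for the average still leaves the p95 actions underwater.</p>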
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47332177">https://news.ycombinator.com/item?id=47332177</a></p>
<p>Points: 5</p>
<p># Comments: 23</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:06:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332177</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332177</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332177</guid></item><item><title><![CDATA[Show HN: Oxlo.ai – AI APIs with unlimited tokens and request based pricing]]></title><description><![CDATA[
<p>Hi HN,<p>I’m one of the founders of Oxlo.ai. We’re building a developer-first AI API platform focused on simplifying how small teams integrate AI into production.<p>Most AI APIs charge per token, which can make costs unpredictable as usage grows. Oxlo.ai takes a different approach: request-based pricing with unlimited token output per request.<p>We provide unified API access to curated open models across:
• Text generation
• Coding
• Embeddings
• Image generation
• Audio & speech
• Computer vision<p>The goal isn’t to replace large API providers. If you’re already using a major API in production, Oxlo.ai can act as a complementary layer.<p>For example, teams can:
• Route simpler or lower-priority workloads to Oxlo.ai under predictable pricing
• Keep higher-complexity or overflow workloads with their existing provider
• Implement fallback routing when one endpoint is busy<p>This hybrid approach can improve cost control while maintaining production reliability.<p>We’re still early (<3k users) and actively looking for feedback, especially from teams running AI features in production.<p>Happy to answer questions.<p>— Barath Kanna - Founder, Oxlo.ai</p>
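<p>A minimal sketch of the fallback-routing idea above, with plain callables standing in for real HTTP clients. The exception name and routing logic here are illustrative assumptions, not Oxlo.ai’s or any provider’s actual API:</p>

```python
class EndpointBusy(Exception):
    """Raised by a client when its endpoint is rate-limited or overloaded."""

def route(call_primary, call_fallback, payload):
    """Send payload to the primary provider; fall back when it is busy.

    call_primary / call_fallback are any callables that return a response
    or raise EndpointBusy -- hypothetical stand-ins for real HTTP clients.
    """
    try:
        return call_primary(payload)
    except EndpointBusy:
        return call_fallback(payload)

# Usage with fake clients: the primary is "busy", so the request is re-routed.
def primary(payload):
    raise EndpointBusy("rate limited")

def fallback(payload):
    return "handled: " + payload

print(route(primary, fallback, "summarize this doc"))  # handled: summarize this doc
```

<p>In a real client the same shape applies, with EndpointBusy mapped from HTTP 429/503 responses and timeouts.</p>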
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47128001">https://news.ycombinator.com/item?id=47128001</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 23 Feb 2026 20:06:20 +0000</pubDate><link>https://www.oxlo.ai/</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47128001</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47128001</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How do you budget for token based AI APIs?"]]></title><description><![CDATA[
<p>Agreed. Self-hosting gives the cleanest fixed cost, but you pay for it in ops and capacity planning. I’m mainly curious whether there’s a middle ground that gives early teams more predictable spend without immediately taking on full infra overhead.</p>
]]></description><pubDate>Tue, 27 Jan 2026 15:35:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46781360</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46781360</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46781360</guid></item><item><title><![CDATA[Ask HN: How do you budget for token based AI APIs?]]></title><description><![CDATA[
<p>The default today for using AI models via APIs is token-based pricing, where you pay based on how much you use.<p>While this isn’t hard to understand, in practice it makes costs harder to predict, especially for small teams moving from experiments to early production. This feels less like a technical problem and more like a budgeting and planning problem.<p>I’m curious about alternative pricing abstractions, for example a subscription with unlimited tokens but a capped number of requests, aimed at making monthly spend easier to reason about while building.<p>For people running AI in production today, does token-based billing give you enough predictability, or would a model like this actually reduce friction? What tradeoffs would matter most to you?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46781011">https://news.ycombinator.com/item?id=46781011</a></p>
<p>Points: 1</p>
<p># Comments: 4</p>
]]></description><pubDate>Tue, 27 Jan 2026 15:10:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46781011</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46781011</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46781011</guid></item><item><title><![CDATA[New comment by Barathkanna in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"]]></title><description><![CDATA[
<p>I asked GPT for a rough estimate to benchmark prompt prefill on an 8,192 token input.
 • 16× H100: 8,192 / (20k to 80k tokens/sec) ≈ 0.10 to 0.41s
 • 2× Mac Studio (M3 Max): 8,192 / (150 to 700 tokens/sec) ≈ 12 to 55s<p>These are order-of-magnitude numbers, but the takeaway is that multi-H100 boxes are plausibly ~100× faster than workstation Macs for this class of model, especially for long-context prefill.</p>
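<p>The same back-of-envelope arithmetic as a runnable sketch. The throughput ranges are the assumptions quoted above, not measured numbers:</p>

```python
# Reproduce the rough prefill-time estimates for an 8,192-token input.
tokens = 8192

def prefill_seconds(tokens, low_tps, high_tps):
    """Best/worst-case time to prefill `tokens` given a throughput range."""
    return tokens / high_tps, tokens / low_tps

h100 = prefill_seconds(tokens, 20_000, 80_000)   # assumed 16x H100 throughput
mac = prefill_seconds(tokens, 150, 700)          # assumed 2x Mac Studio throughput

print(f"16x H100:   {h100[0]:.2f}s to {h100[1]:.2f}s")
print(f"Mac Studio: {mac[0]:.0f}s to {mac[1]:.0f}s")
```

<p>Dividing the ranges gives the roughly two orders of magnitude quoted above.</p>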
]]></description><pubDate>Tue, 27 Jan 2026 10:20:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777976</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777976</guid></item><item><title><![CDATA[New comment by Barathkanna in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"]]></title><description><![CDATA[
<p>That won’t realistically work for this model. Even with only ~32B active params, a 1T-scale MoE still needs the full expert set available for fast routing, which means hundreds of GB to TBs of weights resident. Mac Studios don’t share unified memory across machines, Thunderbolt isn’t remotely comparable to NVLink for expert exchange, and bandwidth becomes the bottleneck immediately. You could maybe load fragments experimentally, but inference would be impractically slow and brittle. It’s a very different class of workload than private coding models.</p>
]]></description><pubDate>Tue, 27 Jan 2026 10:14:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777911</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777911</guid></item><item><title><![CDATA[New comment by Barathkanna in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"]]></title><description><![CDATA[
<p>A realistic setup for this would be a 16× H100 80GB with NVLink. That comfortably handles the active 32B experts plus KV cache without extreme quantization. Cost-wise we are looking at roughly $500k–$700k upfront or $40–60/hr on-demand, which makes it clear this model is aimed at serious infra teams, not casual single-GPU deployments. I’m curious how API providers will price tokens on top of that hardware reality.</p>
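<p>As a rough buy-vs-rent comparison using those ballpark figures (the dollar amounts are the estimates above, not quotes):</p>

```python
# Rough buy-vs-rent breakeven for a 16x H100 box, using the comment's ranges.
upfront_usd = (500_000, 700_000)   # estimated purchase price range
hourly_usd = (40, 60)              # estimated on-demand price range

# Optimistic case: cheap purchase amortized against expensive rental.
best_case_hours = upfront_usd[0] / hourly_usd[1]
# Pessimistic case: expensive purchase against cheap rental.
worst_case_hours = upfront_usd[1] / hourly_usd[0]

print(f"breakeven: {best_case_hours:,.0f} to {worst_case_hours:,.0f} hours")
print(f"(~{best_case_hours / 8760:.1f} to {worst_case_hours / 8760:.1f} years of 24/7 use)")
```

<p>So buying only beats renting after roughly one to two years of continuous use, which is the hardware reality any per-token API price has to be built on top of.</p>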
]]></description><pubDate>Tue, 27 Jan 2026 09:55:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777766</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777766</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777766</guid></item><item><title><![CDATA[New comment by Barathkanna in "I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor"]]></title><description><![CDATA[
<p>TLDR: AI didn’t diagnose anything; it turned years of messy health data into clear trends. That helped the author ask better questions and have a more useful conversation with their doctor, which is the real value here.</p>
]]></description><pubDate>Tue, 27 Jan 2026 09:47:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777714</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777714</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777714</guid></item><item><title><![CDATA[New comment by Barathkanna in "IP Addresses Through 2025"]]></title><description><![CDATA[
<p>TLDR: IPv4 is fully exhausted and no longer growing. Internet growth now depends on IPv6 adoption and address sharing, but IPv6 rollout is still uneven across regions.</p>
]]></description><pubDate>Wed, 21 Jan 2026 09:11:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46703029</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46703029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46703029</guid></item><item><title><![CDATA[New comment by Barathkanna in "Our approach to age prediction"]]></title><description><![CDATA[
<p>I get why this exists and appreciate the transparency, but it still feels like a slippery middle ground. Age prediction avoids hard ID checks, which is good for privacy, yet it also normalizes behavioral inference about users that can be wrong in subtle ways. I’m supportive of the safety goal, but long term I’m more comfortable with systems that rely on explicit user choice and clear guardrails rather than probabilistic profiling, even if that’s messier to implement.</p>
]]></description><pubDate>Wed, 21 Jan 2026 09:06:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46702995</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46702995</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46702995</guid></item><item><title><![CDATA[New comment by Barathkanna in "Proof of Concept to Test Humanoid Robots"]]></title><description><![CDATA[
<p>What’s interesting here isn’t the humanoid form factor, it’s the systems integration. Plugging robots into Siemens’ industrial stack means they’re being treated like first-class nodes in existing logistics workflows, not special demos. If humanoids can reuse current automation software, safety models, and ops tooling, that lowers adoption friction a lot. The real question is whether reliability and MTBF get good enough to compete with simpler, non-humanoid automation at scale.</p>
]]></description><pubDate>Wed, 21 Jan 2026 09:03:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46702976</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46702976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46702976</guid></item></channel></rss>