<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Barathkanna</title><link>https://news.ycombinator.com/user?id=Barathkanna</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 25 Apr 2026 12:36:16 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Barathkanna" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Sounds like a plan, but what if you could just pay a fixed cost every month and not worry about anything?</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:21:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347146</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347146</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>That’s true, but AI is interesting because consumption-based pricing introduces a lot more variance than typical SaaS infrastructure. One user action can trigger dozens of model calls in an agent workflow. That’s partly why we started experimenting with models like <a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a> where the pricing flips back to a fixed subscription and we absorb the usage spikes.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:19:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347134</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347134</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347134</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Local models help remove token cost uncertainty, but they shift the problem to infrastructure and ops. GPUs, scaling, maintenance, and latency can add up quickly depending on the workload. For many builders it ends up being a tradeoff between predictable infra cost and flexible API usage.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:14:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347111</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347111</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347111</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>That’s great. Real-time tracking is a big step already. The tricky part we kept running into was the variance itself, especially with retries and agent loops. That’s partly why we started experimenting with Oxlo.ai (<a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a>) where the pricing model absorbs that variance so builders don’t have to constantly model token risk.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:13:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347105</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347105</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>One overlooked source of variance is retries from formatting failures. In many agent systems the loops dominate the cost, not the raw token length.<p>We ran into the same issue building agent workflows, which is why we started building <a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a>, experimenting with a flat subscription model where we absorb the token variance so builders don’t have to constantly model token risk.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:11:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347087</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347087</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Agreed. The real cost unit becomes the whole agent workflow, not a single LLM call. One user action can trigger dozens of calls.<p>We ran into the same issue and ended up building <a href="https://oxlo.ai" rel="nofollow">https://oxlo.ai</a> to make the cost side more predictable for agent workloads.</p>
]]></description><pubDate>Thu, 12 Mar 2026 06:08:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47347073</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47347073</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47347073</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Exactly. That’s actually why we started building Oxlo.ai. Early-stage builders usually just want to experiment without worrying too much about token cost spikes.</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:58:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332432</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332432</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332432</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>True, but for early-stage builders it’s harder to design those guardrails upfront. A lot of the time you only discover the retry patterns and cost spikes once real users start hitting the system.</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:50:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332377</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332377</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332377</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How are people forecasting AI API costs for agent workflows?"]]></title><description><![CDATA[
<p>Local models solve the marginal cost problem, but they move the complexity into infrastructure and throughput planning instead.</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:46:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332355</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332355</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332355</guid></item><item><title><![CDATA[Ask HN: How are people forecasting AI API costs for agent workflows?]]></title><description><![CDATA[
<p>I’ve been experimenting with agent-based features and one thing that surprised me is how hard it is to estimate API costs.<p>A single user action can trigger anywhere from a few to dozens of LLM calls (tool use, retries, reasoning steps), and with token-based pricing the cost can vary a lot.<p>How are builders here planning for this when pricing their SaaS?<p>Are you just padding margins, limiting usage, or building internal cost tracking?
<p>Also curious: would a service that offers predictable pricing for AI APIs (like a fixed subscription cost) actually be useful for people building agentic workflows?</p>
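<p>For a rough sense of why per-action cost is hard to pin down, here is a minimal Monte-Carlo sketch. Every number in it (call counts, token counts, the per-token rate) is an illustrative assumption, not real pricing:</p>

```python
import random

def simulate_action_cost(n_actions=10_000, seed=0):
    """Monte-Carlo sketch of per-user-action API cost for an agent workflow.

    Assumptions (illustrative only): each user action triggers a random
    number of LLM calls (tool use, retries, reasoning steps), each call
    consumes a random number of tokens, billed at a flat per-token rate.
    """
    rng = random.Random(seed)
    price_per_1k_tokens = 0.01          # assumed blended $/1k tokens
    costs = []
    for _ in range(n_actions):
        calls = rng.randint(3, 30)      # a few to dozens of calls per action
        tokens = sum(rng.randint(500, 4000) for _ in range(calls))
        costs.append(tokens / 1000 * price_per_1k_tokens)
    costs.sort()
    mean = sum(costs) / len(costs)
    p95 = costs[int(0.95 * len(costs))]
    return mean, p95

mean, p95 = simulate_action_cost()
print(f"mean cost/action ~= ${mean:.3f}, p95 ~= ${p95:.3f}")
```

<p>The gap between the mean and the tail is the part that makes margin planning awkward: padding for the average still leaves the p95 actions underwater.</p>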
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47332177">https://news.ycombinator.com/item?id=47332177</a></p>
<p>Points: 5</p>
<p># Comments: 23</p>
]]></description><pubDate>Wed, 11 Mar 2026 06:06:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47332177</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47332177</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47332177</guid></item><item><title><![CDATA[Show HN: Oxlo.ai – AI APIs with unlimited tokens and request based pricing]]></title><description><![CDATA[
<p>Hi HN,<p>I’m one of the founders of Oxlo.ai. We’re building a developer-first AI API platform focused on simplifying how small teams integrate AI into production.<p>Most AI APIs charge per token, which can make costs unpredictable as usage grows. Oxlo.ai takes a different approach: request-based pricing with unlimited token output per request.<p>We provide unified API access to curated open models across:
• Text generation
• Coding
• Embeddings
• Image generation
• Audio & speech
• Computer vision<p>The goal isn’t to replace large API providers. If you’re already using a major API in production, Oxlo.ai can act as a complementary layer.<p>For example, teams can:
• Route simpler or lower-priority workloads to Oxlo.ai under predictable pricing
• Keep higher-complexity or overflow workloads with their existing provider
• Implement fallback routing when one endpoint is busy<p>This hybrid approach can improve cost control while maintaining production reliability.<p>We’re still early (<3k users) and actively looking for feedback, especially from teams running AI features in production.<p>Happy to answer questions.<p>— Barath Kanna - Founder, Oxlo.ai</p>
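<p>A minimal sketch of the fallback-routing idea above, with plain callables standing in for real HTTP clients. The exception name and routing logic here are illustrative assumptions, not Oxlo.ai’s or any provider’s actual API:</p>

```python
class EndpointBusy(Exception):
    """Raised by a client when its endpoint is rate-limited or overloaded."""

def route(call_primary, call_fallback, payload):
    """Send payload to the primary provider; fall back when it is busy.

    call_primary / call_fallback are any callables that return a response
    or raise EndpointBusy -- hypothetical stand-ins for real HTTP clients.
    """
    try:
        return call_primary(payload)
    except EndpointBusy:
        return call_fallback(payload)

# Usage with fake clients: the primary is "busy", so the request is re-routed.
def primary(payload):
    raise EndpointBusy("rate limited")

def fallback(payload):
    return "handled: " + payload

print(route(primary, fallback, "summarize this doc"))  # handled: summarize this doc
```

<p>In a real client the same shape applies, with EndpointBusy mapped from HTTP 429/503 responses and timeouts.</p>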
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47128001">https://news.ycombinator.com/item?id=47128001</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 23 Feb 2026 20:06:20 +0000</pubDate><link>https://www.oxlo.ai/</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=47128001</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47128001</guid></item><item><title><![CDATA[New comment by Barathkanna in "Ask HN: How do you budget for token based AI APIs?"]]></title><description><![CDATA[
<p>Agreed. Self-hosting gives the cleanest fixed cost, but you pay for it in ops and capacity planning. I’m mainly curious whether there’s a middle ground that gives early teams more predictable spend without immediately taking on full infra overhead.</p>
]]></description><pubDate>Tue, 27 Jan 2026 15:35:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46781360</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46781360</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46781360</guid></item><item><title><![CDATA[Ask HN: How do you budget for token based AI APIs?]]></title><description><![CDATA[
<p>The default today for using AI models via APIs is token-based pricing, where you pay based on how much you use.<p>While this isn’t hard to understand, in practice it makes costs harder to predict, especially for small teams moving from experiments to early production. This feels less like a technical problem and more like a budgeting and planning problem.<p>I’m curious about alternative pricing abstractions, for example a subscription with unlimited tokens but a capped number of requests, aimed at making monthly spend easier to reason about while building.<p>For people running AI in production today, does token-based billing give you enough predictability, or would a model like this actually reduce friction? What tradeoffs would matter most to you?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46781011">https://news.ycombinator.com/item?id=46781011</a></p>
<p>Points: 1</p>
<p># Comments: 4</p>
]]></description><pubDate>Tue, 27 Jan 2026 15:10:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46781011</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46781011</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46781011</guid></item><item><title><![CDATA[New comment by Barathkanna in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"]]></title><description><![CDATA[
<p>I asked GPT for a rough estimate to benchmark prompt prefill on an 8,192 token input.
 • 16× H100: 8,192 / (20k to 80k tokens/sec) ≈ 0.10 to 0.41s
 • 2× Mac Studio (M3 Max): 8,192 / (150 to 700 tokens/sec) ≈ 12 to 55s<p>These are order-of-magnitude numbers, but the takeaway is that multi-H100 boxes are plausibly ~100× faster than workstation Macs for this class of model, especially for long-context prefill.</p>
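<p>The same back-of-envelope arithmetic as a runnable sketch. The throughput ranges are the assumptions quoted above, not measured numbers:</p>

```python
# Reproduce the rough prefill-time estimates for an 8,192-token input.
tokens = 8192

def prefill_seconds(tokens, low_tps, high_tps):
    """Best/worst-case time to prefill `tokens` given a throughput range."""
    return tokens / high_tps, tokens / low_tps

h100 = prefill_seconds(tokens, 20_000, 80_000)   # assumed 16x H100 throughput
mac = prefill_seconds(tokens, 150, 700)          # assumed 2x Mac Studio throughput

print(f"16x H100:   {h100[0]:.2f}s to {h100[1]:.2f}s")
print(f"Mac Studio: {mac[0]:.0f}s to {mac[1]:.0f}s")
```

<p>Dividing the ranges gives the roughly two orders of magnitude quoted above.</p>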
]]></description><pubDate>Tue, 27 Jan 2026 10:20:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777976</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777976</guid></item><item><title><![CDATA[New comment by Barathkanna in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"]]></title><description><![CDATA[
<p>That won’t realistically work for this model. Even with only ~32B active params, a 1T-scale MoE still needs the full expert set available for fast routing, which means hundreds of GB to TBs of weights resident. Mac Studios don’t share unified memory across machines, Thunderbolt isn’t remotely comparable to NVLink for expert exchange, and bandwidth becomes the bottleneck immediately. You could maybe load fragments experimentally, but inference would be impractically slow and brittle. It’s a very different class of workload than private coding models.</p>
]]></description><pubDate>Tue, 27 Jan 2026 10:14:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777911</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777911</guid></item><item><title><![CDATA[New comment by Barathkanna in "Kimi Released Kimi K2.5, Open-Source Visual SOTA-Agentic Model"]]></title><description><![CDATA[
<p>A realistic setup for this would be a 16× H100 80GB with NVLink. That comfortably handles the active 32B experts plus KV cache without extreme quantization. Cost-wise we are looking at roughly $500k–$700k upfront or $40–60/hr on-demand, which makes it clear this model is aimed at serious infra teams, not casual single-GPU deployments. I’m curious how API providers will price tokens on top of that hardware reality.</p>
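<p>As a rough buy-vs-rent comparison using those ballpark figures (the dollar amounts are the estimates above, not quotes):</p>

```python
# Rough buy-vs-rent breakeven for a 16x H100 box, using the comment's ranges.
upfront_usd = (500_000, 700_000)   # estimated purchase price range
hourly_usd = (40, 60)              # estimated on-demand price range

# Optimistic case: cheap purchase amortized against expensive rental.
best_case_hours = upfront_usd[0] / hourly_usd[1]
# Pessimistic case: expensive purchase against cheap rental.
worst_case_hours = upfront_usd[1] / hourly_usd[0]

print(f"breakeven: {best_case_hours:,.0f} to {worst_case_hours:,.0f} hours")
print(f"(~{best_case_hours / 8760:.1f} to {worst_case_hours / 8760:.1f} years of 24/7 use)")
```

<p>So buying only beats renting after roughly one to two years of continuous use, which is the hardware reality any per-token API price has to be built on top of.</p>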
]]></description><pubDate>Tue, 27 Jan 2026 09:55:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777766</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777766</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777766</guid></item><item><title><![CDATA[New comment by Barathkanna in "I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor"]]></title><description><![CDATA[
<p>TLDR: AI didn’t diagnose anything; it turned years of messy health data into clear trends. That helped the author ask better questions and have a more useful conversation with their doctor, which is the real value here.</p>
]]></description><pubDate>Tue, 27 Jan 2026 09:47:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46777714</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46777714</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46777714</guid></item><item><title><![CDATA[New comment by Barathkanna in "IP Addresses Through 2025"]]></title><description><![CDATA[
<p>TLDR: IPv4 is fully exhausted and no longer growing. Internet growth now depends on IPv6 adoption and address sharing, but IPv6 rollout is still uneven across regions.</p>
]]></description><pubDate>Wed, 21 Jan 2026 09:11:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46703029</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46703029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46703029</guid></item><item><title><![CDATA[New comment by Barathkanna in "Our approach to age prediction"]]></title><description><![CDATA[
<p>I get why this exists and appreciate the transparency, but it still feels like a slippery middle ground. Age prediction avoids hard ID checks, which is good for privacy, yet it also normalizes behavioral inference about users that can be wrong in subtle ways. I’m supportive of the safety goal, but long term I’m more comfortable with systems that rely on explicit user choice and clear guardrails rather than probabilistic profiling, even if that’s messier to implement.</p>
]]></description><pubDate>Wed, 21 Jan 2026 09:06:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46702995</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46702995</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46702995</guid></item><item><title><![CDATA[New comment by Barathkanna in "Proof of Concept to Test Humanoid Robots"]]></title><description><![CDATA[
<p>What’s interesting here isn’t the humanoid form factor, it’s the systems integration. Plugging robots into Siemens’ industrial stack means they’re being treated like first-class nodes in existing logistics workflows, not special demos. If humanoids can reuse current automation software, safety models, and ops tooling, that lowers adoption friction a lot. The real question is whether reliability and MTBF get good enough to compete with simpler, non-humanoid automation at scale.</p>
]]></description><pubDate>Wed, 21 Jan 2026 09:03:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46702976</link><dc:creator>Barathkanna</dc:creator><comments>https://news.ycombinator.com/item?id=46702976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46702976</guid></item></channel></rss>