Hacker News: rohansood15

New comment by rohansood15 in "Fable situation update from David Sacks"

rohansood15 — Sat, 13 Jun 2026 18:03:15 +0000

Cyber-attacks to start, and real-world terrorist attacks/bombings (inc. chemical/biological weapons) later.

Fable situation update from David Sacks

rohansood15 — Sat, 13 Jun 2026 17:54:53 +0000

Article URL: https://twitter.com/DavidSacks/status/2065853007619588171

Comments URL: https://news.ycombinator.com/item?id=48519695

Points: 10

# Comments: 7

New comment by rohansood15 in "Statement on US government directive to suspend access to Fable 5 and Mythos 5"

rohansood15 — Sat, 13 Jun 2026 01:37:27 +0000

I mean, we all pay via CC, so it's not like they couldn't find out who you are if they wanted to.

New comment by rohansood15 in "AWS Bedrock to require sharing data with Anthropic for Mythos and future models"

rohansood15 — Wed, 10 Jun 2026 12:34:38 +0000

It is only abuse-flagged data, and even there, for OpenAI models they're not sharing that data with OpenAI. But for Anthropic they are.

New comment by rohansood15 in "AWS Bedrock to require sharing data with Anthropic for Mythos and future models"

rohansood15 — Wed, 10 Jun 2026 08:55:20 +0000

Pretty sure this doesn't work for any regulated enterprise or government client. But AWS knows this, so I am curious why they'd agree to it.

New comment by rohansood15 in "Launch HN: General Instinct (YC P26) – Frontier models on edge devices"

rohansood15 — Fri, 05 Jun 2026 17:58:55 +0000

I can't find it. Can you state your performance versus comparable 3-bit quantization from Unsloth/Bartowski? Edit: I appreciate that you seem to have open-sourced the quantization pipeline. This is not to question your work, but to understand where the outputs stand relative to the SoTA for quantization.

New comment by rohansood15 in "Launch HN: General Instinct (YC P26) – Frontier models on edge devices"

rohansood15 — Fri, 05 Jun 2026 17:40:34 +0000

Have you benchmarked against other 3-bit dynamic quants like Unsloth? I am sorry but this framing against a full precision, newer, smaller MoE just seems misleading. Also, Gemma-4-26B-A4B is not the SOTA for edge. Even at launch, that would be the 31B.

New comment by rohansood15 in "OpenAI frontier models and Codex are now available on AWS"

rohansood15 — Tue, 02 Jun 2026 02:31:05 +0000

Anthropic better get that IPO out soon. Their incredible revenue run-up was basically a result of botched Gemini releases and OpenAI having their hands tied behind their Azure backs.

Anthropic models were quite literally the only viable serverless API (i.e. Bedrock) models on AWS. They didn't even bother releasing the recent Qwen 3.5/3.6 series. Combined with the token efficiency/ROI focus, I would really like to see how Anthropic ends Q3.

New comment by rohansood15 in "I think Anthropic and OpenAI have found product-market fit"

rohansood15 — Thu, 28 May 2026 01:57:30 +0000

Are you comparing single-user requests or multiple concurrent requests when you say comparable to rented GPU? Most of the cost efficiencies kick in with concurrent/batch requests. A single H100 node can provide like 5k input + 2k output tok/s on a model like Qwen 3.6 35B-A3B with 30+ concurrent requests.
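To make the concurrency point concrete, here is a minimal sketch of how aggregate throughput changes the cost per million tokens on a rented GPU. The hourly rate and the single-stream throughput are placeholder assumptions, not quoted prices or measurements; only the 5k input + 2k output tok/s figure comes from the comment above.

```python
def usd_per_million_tokens(hourly_usd: float, tok_per_s: float) -> float:
    """Cost of one million tokens when a machine billed per hour
    sustains `tok_per_s` tokens per second in aggregate."""
    tokens_per_hour = tok_per_s * 3600
    return hourly_usd / tokens_per_hour * 1_000_000


# Placeholder rental price (assumption, not a real quote).
HOURLY_USD = 3.0

# Aggregate throughput with 30+ concurrent requests, from the comment:
# roughly 5k input + 2k output tok/s.
batched_tok_per_s = 5000 + 2000

# Hypothetical single-user throughput (assumption for illustration only).
single_tok_per_s = 300

batched_cost = usd_per_million_tokens(HOURLY_USD, batched_tok_per_s)
single_cost = usd_per_million_tokens(HOURLY_USD, single_tok_per_s)

print(f"batched: ${batched_cost:.3f}/M tokens")
print(f"single:  ${single_cost:.3f}/M tokens")
```

The hourly price is fixed, so the per-token cost falls in direct proportion to the aggregate throughput the node sustains.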

New comment by rohansood15 in "Green card seekers must leave U.S. to apply, Trump administration says"

rohansood15 — Sun, 24 May 2026 01:47:16 +0000

So ask for it. Seems like your issue isn't immigration, it is abuse. The recent changes don't do much to fix that, imo.

New comment by rohansood15 in "Green card seekers must leave U.S. to apply, Trump administration says"

rohansood15 — Sun, 24 May 2026 00:33:58 +0000

Let's say the government can't care for 100M people because of lack of doctors. Now they could train one over 10 years, or you could have one of the smartest doctors in the world come be 100M+1. Would you take that?

Now expand that across the socio-economic spectrum (not enough plumbers, teachers, AI experts, researchers etc). That is what legal immigration is meant for.

New comment by rohansood15 in "Gemini 3.5 Flash"

rohansood15 — Wed, 20 May 2026 01:08:32 +0000

Subjective, but if we compare this to compute: not everyone needs the most expensive laptop or a supercomputer for their work.

I think frontier models will be invaluable for scientific research, defense, financial analysis and such. But the average person probably would be reasonably well-served with a local model.

If you're in sales, customer service, product management and such - the leading open models at the 30B mark are already good enough.

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 08:11:44 +0000

Which part of this is a 'prediction'?

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 06:47:13 +0000

I don't think I follow?

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 05:37:03 +0000

Thanks for the info. Daniel fixed it - and no it wasn't an LLM error. :P

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 05:22:55 +0000

I used the same assumptions as the original HN post https://news.ycombinator.com/item?id=48168198

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 05:16:43 +0000

The title auto-corrected, my post was 'less' not 'more'.

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 05:09:50 +0000

Nope, HN changed the title.

https://imgur.com/a/UgJqWEh

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 05:05:57 +0000

The title is Apple Silicon costs LESS than OpenRouter. Not sure why it got updated to this - maybe because I referenced the original HN post?

Here's the full post:

TL;DR: When you consider batching, caching and input tokens, together with the residual value of the hardware, a MacBook Pro is actually 14% cheaper than OpenRouter. This becomes a whopping 3x (i.e. 65%) cheaper if you consider MoE models like Gemma 4 26B.

There was a well-meaning post yesterday by @DataDrivenAngel comparing costs of self-hosting LLMs v/s using OpenRouter (HN link). The analysis however had a few flaws as pointed out by the HN community, and I ran benchmarks on my M4 Max 128GB to adjust for those.

1. The estimate was based entirely on output tokens, instead of a real-world input-output token mix. The numbers look very different if you consider a 4:1 or 5:1 input to output token ratio.

2. Batching/concurrency/caching improves token throughput, and if you're running multiple coding agents/work trees the performance gain can be significant.

3. A MacBook Pro is an asset purchase, and retains significant residual value through its life. Probably not unreasonable to expect ~1.5-2.5k resale value after 3-5 years of use.

I ran vllm bench using a reasonable approximation for a coding agent workload with concurrency 4 for Gemma 4 31B (same as the original post), and got the following results:

-----------------------------------

Serving Benchmark: Gemma 4 31B
Successful requests: 20
Maximum request concurrency: 4
Benchmark duration (s): 263.19
Total input tokens: 35000
Total generated tokens: 6400
Request throughput (req/s): 0.08
Output token throughput (tok/s): 24.32
Peak output token throughput (tok/s): 36
Peak concurrent requests: 8
Total token throughput (tok/s): 157.3

Scenario:
3 years: $0.15, Local cheaper (~6%)
5 years: $0.14, Local cheaper (~13%)
7 years: $0.13, Local cheaper (~19%)

-----------------------------------

Once you work out the math (using original assumptions on power costs and 5 year timeline), you get to a blended cost of ~$0.14 per million tokens for local, v/s ~$0.16 for OpenRouter. That is not a massive win. But it is close enough to flip the narrative from local being more expensive to 'it depends'.
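For anyone who wants to redo the math with their own numbers, here is a rough sketch of the amortization, assuming the machine runs inference continuously. The purchase price, power draw and electricity rate below are placeholders I made up for illustration, NOT the assumptions from the original post; only the 157.3 tok/s throughput comes from the benchmark above, and the residual value is taken from the ~1.5-2.5k range mentioned earlier.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def local_cost_per_million(
    purchase_usd: float,
    residual_usd: float,
    power_watts: float,
    usd_per_kwh: float,
    tok_per_s: float,
    years: float,
    utilization: float = 1.0,
) -> float:
    """Blended cost per million tokens: (depreciation + electricity)
    divided by the tokens processed over the ownership period."""
    active_seconds = years * SECONDS_PER_YEAR * utilization
    tokens_millions = tok_per_s * active_seconds / 1_000_000
    depreciation = purchase_usd - residual_usd
    energy_kwh = (power_watts / 1000) * (active_seconds / 3600)
    return (depreciation + energy_kwh * usd_per_kwh) / tokens_millions


# Placeholder inputs (assumptions), except tok_per_s from the benchmark.
example = local_cost_per_million(
    purchase_usd=5000,   # hypothetical purchase price
    residual_usd=2000,   # midpoint of the ~1.5-2.5k resale range
    power_watts=100,     # hypothetical average draw under load
    usd_per_kwh=0.15,    # hypothetical electricity rate
    tok_per_s=157.3,     # total token throughput, Gemma 4 31B run
    years=5,
)
print(f"${example:.3f} per million tokens")
```

The `utilization` parameter is there because nobody runs a laptop at full load 24/7; dropping it below 1.0 raises the per-token cost since depreciation is spread over fewer tokens.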

But it doesn't end there. If you used an MoE model like Gemma 4 26B, the blended cost drops to $0.038 per million tokens, v/s OpenRouter's $0.1 per million. That is a ~3x difference.
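A quick sketch of how the "x times cheaper" and "% cheaper" figures relate, using the blended costs quoted in this post. The function name is just for illustration.

```python
def compare(local_usd: float, hosted_usd: float) -> tuple[float, float]:
    """Return (multiplier, savings_pct) for a local cost against a hosted cost.

    multiplier  = hosted / local          (e.g. 2.6 means hosted costs 2.6x local)
    savings_pct = (1 - local / hosted) * 100
    """
    multiplier = hosted_usd / local_usd
    savings_pct = (1 - local_usd / hosted_usd) * 100
    return multiplier, savings_pct


# Blended costs per million tokens quoted above.
dense_mult, dense_savings = compare(0.14, 0.16)    # Gemma 4 31B, 5 years
moe_mult, moe_savings = compare(0.038, 0.10)       # Gemma 4 26B (MoE), 5 years

print(f"31B: {dense_mult:.2f}x, {dense_savings:.1f}% cheaper")
print(f"26B MoE: {moe_mult:.2f}x, {moe_savings:.1f}% cheaper")
```

With these inputs the MoE case works out to a bit under 3x, which is why the percentage sits in the low 60s rather than at the ~67% an exact 3x would imply.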

-----------------------------------

Serving Benchmark: Gemma 4 26B (MoE)
Successful requests: 20
Maximum request concurrency: 4
Benchmark duration (s): 60.05
Total input tokens: 30002
Total generated tokens: 4870
Request throughput (req/s): 0.33
Output token throughput (tok/s): 81.1
Peak output token throughput (tok/s): 128
Peak concurrent requests: 8
Total token throughput (tok/s): 580.72

Scenario:
3 years: $0.040, Local cheaper (~60%)
5 years: $0.038, Local cheaper (~62%)
7 years: $0.035, Local cheaper (~65%)

-----------------------------------
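As a sanity check, the throughput lines in both benchmarks can be recomputed from the totals and the duration. This is a small sketch using only the numbers reported above; the helper name is mine.

```python
def throughput(input_tokens: int, output_tokens: int,
               requests: int, duration_s: float) -> dict[str, float]:
    """Derive the throughput figures from benchmark totals."""
    return {
        "req_per_s": requests / duration_s,
        "output_tok_per_s": output_tokens / duration_s,
        "total_tok_per_s": (input_tokens + output_tokens) / duration_s,
        "input_output_ratio": input_tokens / output_tokens,
    }


# Totals reported for the two runs above.
dense = throughput(35000, 6400, 20, 263.19)   # Gemma 4 31B
moe = throughput(30002, 4870, 20, 60.05)      # Gemma 4 26B (MoE)

for name, run in (("31B", dense), ("26B MoE", moe)):
    print(name, {k: round(v, 2) for k, v in run.items()})
```

The input to output ratio in both runs lands between 5:1 and roughly 6:1, so the synthetic workload is slightly more input-heavy than the 4:1 or 5:1 mix mentioned earlier.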

This is not meant as an attack on the original analysis - I am sure the synthetic bench I used has a few holes, plus buying price/residual value varies a fair bit. Plus, I don't think anybody will run their MBP for inference for 5 years straight. But with worsening GPU supply and the inevitable price/access squeeze, I think local LLMs have a huge role to play. And this is on top of the privacy benefits. A misperceived price differential should not be the reason that slows down adoption.

New comment by rohansood15 in "Apple Silicon costs less than OpenRouter"

rohansood15 — Tue, 19 May 2026 04:59:53 +0000

That's some HN shenanigans, I swear I copy pasted my original title. https://imgur.com/a/UgJqWEh