Hacker News: reissbaker

New comment by reissbaker in "Was my $48K GPU server worth it?"

reissbaker — Fri, 22 May 2026 02:15:33 +0000

Qwen 3.6 27B is fine but it's not in the same ballpark as GLM-5.1 or Kimi K2.6.

If you truly want to scale up, you should get the 8xH200 with NVLink.

New comment by reissbaker in "Apple Silicon costs more than OpenRouter"

reissbaker — Mon, 18 May 2026 02:30:14 +0000

Most of these folks aren't running on spot/preemptible instances, they're doing 1-2 year reserved rentals. There isn't that much cheap compute floating around and you can be quite profitable on reservations (if you can get them — the compute shortage is real).

I think the question in terms of throwing money away isn't the inference layer: it's whether the companies training open models will be able to financially keep doing so. How long will Moonshot keep releasing future Kimi models? I think there's an interesting wedge they're exploring with being basically a base-model-trainer-as-a-service, i.e. selling rights to Fireworks to sell finetuning services to the Cursors of the world, but it's entirely possible it doesn't pan out.

That being said, Nvidia seems willing to step up to being the base model trainer of last resort via the Nemotron family of open models, since it helps sell more of their hardware — similar to their investments in the CUDA stack to sell hardware (unsurprisingly, Nemotron is designed to run most efficiently on Nvidia hardware, e.g. native NVFP4). So I suspect there will continue to be a pretty good market here.

New comment by reissbaker in "Local AI needs to be the norm"

reissbaker — Thu, 14 May 2026 19:24:22 +0000

LLM subscriptions are not "all you can eat": they have rate limits, and fundamentally there is no difference between subscription-with-rate-limit and typical usage-based business practices. Subscriptions are simply usage-based pricing with volume discounts in exchange for upfront payment; every single usage-based provider of pretty much anything offers the same kind of discounting for buying volume commitments upfront. From a business perspective, though, subscriptions are even better than volume discounts, because they're recurring, whereas reserved volume might not recur.

Subscriptions aren't gonna go away. They're great for businesses. Rate limits or pricing might change but the underlying business model is very good.

The reason usage-based is so much more expensive than subscription isn't that usage-based is the "true" cost and subscription is a loss leader. Likewise, 30 consecutive day passes to a gym costing more than a monthly membership isn't a result of memberships being a loss leader. Memberships are the business model! The day passes are overpriced to steer you into buying the membership.

New comment by reissbaker in "Local AI needs to be the norm"

reissbaker — Thu, 14 May 2026 19:13:26 +0000

> Sometimes it really is free though, because the hardware was bought to serve some other existing needs and that capital expense was fully depreciated quite some time ago.

No one has 8xRTX Pro 6000s that have depreciated to zero "quite some time ago."

> But using that 8xB200 setup to run inference in cheap, non-frontier models is plain waste

From whose perspective? If someone wants to run an open-source model — and plenty do — someone buying or renting an 8xB200 to serve it cheaply at scale is much better than everyone buying huge amounts of pointless, wasted hardware such as 8xRTX Pro 6000s for $80,000 per person.

New comment by reissbaker in "Local AI needs to be the norm"

reissbaker — Mon, 11 May 2026 20:15:10 +0000

The local rig is not free and requires very large capital expenditures while producing very low token throughput for large models. Within any time budget, you can get many orders of magnitude more large-model tokens off an 8xB200 than off a local rig. Therefore cloud tokens have a huge capital efficiency advantage over local rigs. That will continue basically forever, since there will always be large cloud companies willing to spend millions of dollars for more capital-efficient hardware, so Nvidia and friends will continue to spare no expense producing it, meaning the cloud hardware will be way too expensive if you're not a large inference company. You can also buy local rigs, but they will be less capital efficient per token, not more.

(This is a generous argument: it also ignores the massive software stack optimization the cloud companies do that doesn't trickle down to local-rig-sized deployments; for example, prefill/decode disaggregation, which would double the VRAM requirements for a local rig — if you could even do it on a local rig, which you can't, because local rigs don't have Infiniband. But at scale, prefill/decode disaggregation improves capital efficiency, since you can tune the compute-bound prefill node differently than the memory-bound decode node.)

The advantage of local rigs is not capital-efficient tokens. It's privacy. But then again, you can get zero-data-retention options from many inference companies, so for many use cases it may not matter unless you need strict guarantees the data never leaves the building...

New comment by reissbaker in "Local AI needs to be the norm"

reissbaker — Mon, 11 May 2026 20:07:36 +0000

There already are many subscriptions for LLM tokens: OpenAI, Claude, Synthetic (shameless plug), Zai...

I'm not sure what you mean by "There will not ever be a monthly subscription for LLM tokens." That already exists!

New comment by reissbaker in "Local AI needs to be the norm"

reissbaker — Mon, 11 May 2026 02:53:28 +0000

Users do not have an existing $80k of hardware, are not going to buy $80k of hardware for worse performance than paying $100/month, and models are continuing to grow in size while memory grows in price.
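
To put rough numbers on that comparison, here is a minimal sketch using only the $80k and $100/month figures above. It ignores electricity, resale value, and future price changes (all simplifying assumptions), and the function name is made up for illustration.

```python
def breakeven_months(hardware_cost_usd: float, subscription_usd_per_month: float) -> float:
    """Months of subscription fees that add up to the upfront hardware cost."""
    return hardware_cost_usd / subscription_usd_per_month


months = breakeven_months(80_000, 100)
print(f"{months:.0f} months (~{months / 12:.0f} years)")  # 800 months (~67 years)
```

Under those assumptions the hardware only pays for itself after several decades of subscription fees, and that is before accounting for the performance gap.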

New comment by reissbaker in "Local AI needs to be the norm"

reissbaker — Sun, 10 May 2026 22:32:08 +0000

The RTX Pro 6000 retails for $10k, so an 8x setup is $80k before anything else in the computer, and long-context will have... pretty bad performance (20+ seconds of waiting before any tokens come out), but it's true it technically works.

I don't think cloud models are going away; the hardware for good perf is expensive and higher param count models will remain smarter for a looong time. Even if the hardware cost for kind-of-usable perf fell to only $10k, cloud ones will be way faster and you'd need a lot of tokens to break even.

New comment by reissbaker in "Bun is being ported from Zig to Rust"

reissbaker — Tue, 05 May 2026 01:32:56 +0000

Probably an experiment due to Bun's PRs to Zig being rejected (Zig does not allow AI use). If Rust works well enough, and the alternative is maintaining a fork of Zig, I'd guess they'd go with Rust.

New comment by reissbaker in "The text mode lie: why modern TUIs are a nightmare for accessibility"

reissbaker — Mon, 04 May 2026 05:59:07 +0000

For the Claude Code / OpenCode / Crush / etc new wave TUIs, it's not about composability or text streaming. It's basically a combination of a few tailwinds:

1. There's already a large-ish community of engineers who live in the terminal e.g. Vim/Neovim/tmux/zellij/etc users. Lots of engineering tasks are accomplished by running scripts in a terminal, so it makes sense for some people to just move as much of their work there as possible. This means there's a set of users you can address with dev tools that run in a terminal.

2. Cross-platform distribution among the platforms most of those people care about — macOS and Linux — is largely a solved problem via package managers. Distributing cross-platform native apps is fragmented at best.

3. Building modern TUIs has become a lot easier thanks to the demand+distribution wins above: there's a lot of appetite for building blocks, and so lots of good options have flourished like Ink for React, Bubble Tea for Go, etc.

4. General developer distaste for the most straightforward analogue to all of this for desktop GUIs: Electron. Deservedly or not it's associated with slow, bloated applications. And if you don't use Electron, doing cross-platform anything is going to be a much harder problem than just pushing out a quick TUI app.

Successful products seem to eventually jump the gap, like Claude Code spawning Claude Cowork and OpenCode adding OpenCode Web. But it's easier and faster to test product-market fit for dev tools with a TUI. And plenty of your users will stay there, even after you launch something else.

New comment by reissbaker in "Uber torches 2026 AI budget on Claude Code in four months"

reissbaker — Fri, 01 May 2026 23:48:58 +0000

Caching is pretty simple. If it's a prefix match, it's cacheable. Very long context windows will be much more expensive than shorter ones, even with caching, assuming you're using Claude Code or some similar harness for both. You'll get caching in both, but you'll pay more for the longer context. The cost of occasional compaction is more or less negligible compared to the massive cost of the input tokens that are getting charged repeatedly for every single request.
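
A toy illustration of the prefix-match rule, assuming the cache holds the exact token sequence of a previous request. Real providers typically add details such as minimum cacheable lengths and expiry, none of which is modeled here, and the function name and token strings are hypothetical.

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Number of leading tokens of `current` that match `previous` exactly."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


turn1 = ["system", "user:hi", "assistant:hello"]
# Appending to the conversation keeps the whole old request as a prefix.
turn2 = turn1 + ["user:fix the bug"]
# Editing an early message breaks the match from that point onward.
edited = ["system", "user:hey", "assistant:hello", "user:fix the bug"]

print(cached_prefix_len(turn1, turn2))   # 3: all of turn1 is reusable
print(cached_prefix_len(turn1, edited))  # 1: only "system" is reusable
```

This is why an append-only harness caches well: each request is the previous one plus a suffix.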

If you have 500k context, three turns will burn ~1.5MM tokens. If you have 250k context, three turns will burn ~750k tokens. If you have 125k context, three turns will burn ~375k tokens. Claude can at most generate 32k output tokens per turn in Claude Code (and it rarely does so), so despite the higher price of output tokens, almost all costs are dominated by input token costs. Even at cached input prices, cost scales near-linearly with context length: if you 2x your context length, you'll roughly 2x your cost.
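
The arithmetic above as a sketch, assuming every turn resends the full context as input. It ignores output tokens and the cached-versus-uncached price split, so these are token counts rather than dollar figures, and the function name is just for illustration.

```python
def input_tokens_burned(context_tokens: int, turns: int) -> int:
    """Total input tokens billed when each turn resends the whole context."""
    return context_tokens * turns


for context in (500_000, 250_000, 125_000):
    total = input_tokens_burned(context, 3)
    print(f"{context:>7,} context x 3 turns = {total:>9,} input tokens")

# Doubling the context doubles the input tokens for the same number of turns.
print(input_tokens_burned(500_000, 3) / input_tokens_burned(250_000, 3))  # 2.0
```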

Now, it might be the case that longer context windows allow Claude to complete the task better — although I'd be surprised if there were many tasks requiring >200k tokens just to get the job done (that's nearly ten full copies of Shakespeare's "A Midsummer Night's Dream"). And they're definitely convenient, in the sense that you don't need to think about context management as much and worry about a sudden, unexpected autocompact wrecking things if you aren't carefully manually compacting at logical points. But they're definitely more expensive on a near-linear basis and you're paying for that convenience.

New comment by reissbaker in "Grok 4.3"

reissbaker — Fri, 01 May 2026 18:24:08 +0000

Wow, I'm surprised. Grok 4.3 actually is noticeably better than the other two for the close-friend variant. Surprisingly I found Claude the cringiest of the three!

New comment by reissbaker in "China blocks Meta's acquisition of AI startup Manus"

reissbaker — Wed, 29 Apr 2026 22:19:34 +0000

The U.S. restrictions on Nvidia compute were clear and extremely widely reported on for years prior to Supermicro's sales. That's why Nvidia created special Chinese-market versions of their GPUs, e.g. the H800 and H20, which Supermicro was free to sell at the time (although eventually those were banned as well, which was similarly widely reported on and well known). The rest of your post is, similarly, basically nonsense.

New comment by reissbaker in "Mistral Medium 3.5"

reissbaker — Wed, 29 Apr 2026 20:11:11 +0000

Good models vs bad models are relative: if this had been released in 2020 it would have been earth-shattering. But releasing a model today that's only on par with open-source dense models a quarter of the size and soundly beaten by open-source MoEs with active param counts a quarter of the size is kind of a flop. The niche for this is basically no one. It'll run at near-zero TPS for the few local model aficionados with enough hardware to try it out, and is lower throughput and lower quality for people trying to use it at scale.

I'm rooting for Mistral, I want them to release good models. This just isn't one. It's a little sad since they once were so prominent for open-source.

Who knows — if they have the compute to train this, they have the compute to train an MoE that's 3-4T total params with 128B active. Maybe they'll make a comeback (although using Llama 2 attention is... not promising). I hope they do.

New comment by reissbaker in "China blocks Meta's acquisition of AI startup Manus"

reissbaker — Tue, 28 Apr 2026 03:16:56 +0000

Naive or not, plenty of investors believed that running a company out of Singapore would shield them from the aggressively un-free Chinese market controls. Manus is proving them wrong, which will hurt Chinese companies that try to run out of Singapore for access to Western capital markets.

New comment by reissbaker in "China blocks Meta's acquisition of AI startup Manus"

reissbaker — Tue, 28 Apr 2026 03:05:56 +0000

What are you talking about? Here are the concrete differences:

1. The U.S. has had a long-standing, extremely public policy that you Cannot Sell Nvidia Chips to China since 2022. Supermicro is an American company (based in San Jose, California), and they sold chips to China from 2024-2025, and they got caught, so they were arrested.

2. Manus founders created... an agent harness? And their company was incorporated in Singapore, not in China. And after they sold their Singaporean company to Meta, China decided that selling Singaporean agentic software "violated export controls" (and even the CCP representative couldn't list which supposed control it violated), and detained them all in China and is attempting to force the Singaporean company to unwind the sale.

These are not really comparable. The Supermicro folks are running a company in America and knew ahead of time, for years, that what they were doing violated American export controls. In the case of Manus, they weren't a Chinese company, no one knew they were supposedly violating unwritten export controls, and China decided post-hoc to detain them all and attempt to force the (Singaporean!) company to unwind the sale.

Quite simply this has never happened in corporate America. America is very friendly to corporations and you'd have to be wildly, knowingly in the wrong to get arrested for an M&A deal.

New comment by reissbaker in "China blocks Meta's acquisition of AI startup Manus"

reissbaker — Mon, 27 Apr 2026 19:44:01 +0000

You've been all over this thread responding with the same whataboutist comments claiming America does the same thing. And yet, I'm pretty sure America hasn't held American citizens hostage in order to force them to unwind a sale of a foreign company they founded to a different foreign company.

New comment by reissbaker in "China blocks Meta's acquisition of AI startup Manus"

reissbaker — Mon, 27 Apr 2026 19:40:51 +0000

I think it's precisely the fact that the founders live in China. The CCP can make them... kick rocks... for the rest of their lives.

Generally speaking this seems bad for Chinese companies, though. They were able to raise capital from the West by running out of "Singapore"; I think basically every investor will have significant pause investing in Chinese-national-owned startups after this, "Singapore-based" or not.

New comment by reissbaker in "I am building a cloud"

reissbaker — Thu, 23 Apr 2026 23:30:41 +0000

We do both: managed Kubernetes when it's available (AWS, Nebius, others), but for some hardware vendors they just give us raw machines and we self-host K3s on their nodes. We're an open-source LLM inference company so we're basically always scrambling for GPUs wherever we can get them, which means we need to be fairly scrappy with what we support while still having a semi-sane interface for ourselves internally. Kubernetes makes that pretty easy: onboarding a new vendor takes ~minutes, and then everything Just Works and we can interact with the pool of compute the same way we do every other pool since the K8s API is standard, with all of our built-in prod monitoring tools immediately set up and running.

That being said I love exe.dev and have been a happy customer since launch. It's a different use case but they do an amazing job at it. Very, very easy personal cloud dev box. But K8s is very very good too, just for production workloads rather than personal ones!

New comment by reissbaker in "I am building a cloud"

reissbaker — Thu, 23 Apr 2026 21:51:25 +0000

K8s isn't even hard! My team of three manages everything on K8s and we spend ~0 minutes per week on it. Write a script to generate some YAML files, stick it in a CI pipeline, and it's basically fire-and-forget.

You're going to want most of what K8s has anyway: blue-green deployments, some way to specify how many replicas you want, health checks, etc.
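
For a sense of what "a script to generate some YAML files" can look like, here is a minimal sketch, not the actual script described above. The service name, image, port, and health path are all hypothetical. It emits JSON rather than YAML to stay within the standard library; kubectl accepts JSON manifests as well.

```python
import json


def deployment(name: str, image: str, replicas: int, port: int,
               health_path: str = "/healthz") -> dict:
    """Build a Deployment manifest with a replica count and HTTP health checks."""
    probe = {"httpGet": {"path": health_path, "port": port}}
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        "readinessProbe": probe,  # gate traffic until healthy
                        "livenessProbe": probe,   # restart if it stops responding
                    }]
                },
            },
        },
    }


# Hypothetical service; a CI job could write this to a file and apply it.
manifest = deployment("api", "registry.example.com/api:1.2.3", replicas=3, port=8080)
print(json.dumps(manifest, indent=2))
```

This covers replicas and health checks only; a blue-green rollout would need additional objects that this sketch does not attempt.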

The initial setup cost is annoying if you've never done it before, but in terms of maintenance it's very very easy.