Hacker News: greyskull

New comment by greyskull in "Notes from the Mistral AI Now Summit"

greyskull — Fri, 29 May 2026 19:13:15 +0000

Interesting. I'd have guessed there would be meaningful opex benefits to serving smaller models.

New comment by greyskull in "Notes from the Mistral AI Now Summit in Paris"

greyskull — Fri, 29 May 2026 18:37:25 +0000

> task focused small models

This is tangential, and forgive my ignorance here, but is there an inherent reason why there aren't smaller, focused models from the frontier model providers?

I'm thinking something like a software-specific subset of Opus that is the default for use in Claude Code. Smaller, cheaper to deploy and consume, maybe faster.

New comment by greyskull in "Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving"

greyskull — Tue, 21 Apr 2026 05:13:43 +0000

Thanks, I'll have to continue experimenting. I just ran this model, Qwen3.6-35B-A3B-GGUF:UD-Q4_K_XL, and it works, but if Gemini is to be believed it uses too much VRAM to leave room for chat context.

How did you land on that model? Hard to tell if I should be a) going to 3.5, b) going to fewer parameters, c) going to a different quantization/variant.
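For anyone weighing the same tradeoff, here is a rough sketch of the usual back-of-the-envelope estimate for how much VRAM the context (KV cache) takes on top of the model file. It assumes a standard transformer with grouped-query attention and a 16-bit cache; the layer count, head count, head size, and 20 GiB model size below are placeholders, not the real numbers for any Qwen model, and models with other attention layouts will differ.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_tokens, bytes_per_elem=2):
    """Approximate KV cache size: one K and one V vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_tokens * bytes_per_elem


def fits_in_vram(model_gib, kv_gib, vram_gib=24.0, overhead_gib=1.5):
    """Crude check: model weights + KV cache + assumed runtime overhead vs. VRAM."""
    return model_gib + kv_gib + overhead_gib <= vram_gib


GIB = 2**30

# Placeholder architecture (NOT real model values): 48 layers, 4 KV heads of size 128.
kv_gib = kv_cache_bytes(48, 4, 128, ctx_tokens=32768) / GIB
print(f"KV cache at 32k context: {kv_gib:.1f} GiB")

# Hypothetical 20 GiB quantized model file on a 24 GiB card (e.g. an RTX 4090).
print("fits:", fits_in_vram(model_gib=20.0, kv_gib=kv_gib))
```

Under these assumptions the cache scales linearly with context length, so halving the context or picking a smaller quantization are the two obvious levers.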

I didn't consider those other flags either, cool.

Are you having good luck with any particular harnesses or other tooling?

New comment by greyskull in "Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving"

greyskull — Tue, 21 Apr 2026 02:49:59 +0000

Thank you for all this, I'll give it a shot. Out of curiosity, are there any resources that sort of spell this out already? i.e., not requiring a comment like this to navigate.

> nothing you can run locally, on that machine anyways, is going to compare with Opus

Definitely not expecting that. I just wanted to find a setup that individuals are content with: a coding harness plus a model that is usable locally.

What does your setup look like? Model, harness, etc.

New comment by greyskull in "Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving"

greyskull — Tue, 21 Apr 2026 00:48:32 +0000

It's certainly part of the problem. Thanks, I'll give that a shot.

New comment by greyskull in "Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving"

greyskull — Tue, 21 Apr 2026 00:48:10 +0000

Thanks! Are these things you're mentioning, like "You may be able to offload some layers to GPU..." and "You can keep the KV cache on GPU...", configured as part of llama.cpp? I wouldn't know what to prompt with or how to evaluate "correctness" (outside of literally feeding your comment into Claude and seeing what happens).
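As far as I can tell, these are command-line options on llama.cpp's `llama-server`. A minimal sketch of how the pieces might fit together, assuming the commonly documented flags `-m`, `-c`, `-ngl`, `--port`, and `--no-kv-offload` (worth checking against `llama-server --help` for your build); the model path and the numbers are hypothetical:

```python
import shlex


def llama_server_cmd(model_path, ctx_size=16384, n_gpu_layers=99,
                     kv_on_gpu=True, port=8080):
    """Build an argv list for llama-server (flags assumed from llama.cpp docs)."""
    cmd = [
        "llama-server",
        "-m", model_path,           # path to the GGUF model file
        "-c", str(ctx_size),        # context window in tokens
        "-ngl", str(n_gpu_layers),  # number of layers to offload to the GPU
        "--port", str(port),
    ]
    if not kv_on_gpu:
        # Keep the KV cache in system RAM instead of VRAM.
        cmd.append("--no-kv-offload")
    return cmd


# Hypothetical path; lower n_gpu_layers if the model does not fit in VRAM.
print(shlex.join(llama_server_cmd("models/example.gguf", n_gpu_layers=30)))
```

The idea is that a lower `-ngl` leaves more layers on the CPU and frees VRAM for context, at the cost of speed.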

Aside: what is your tooling setup? Which harness are you using (if any), what's running the inference and where, what runs in WSL vs Windows, etc.?

I struggle to even ask the right questions about the workflow and environment.

New comment by greyskull in "Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving"

greyskull — Tue, 21 Apr 2026 00:00:52 +0000

I've been using Claude Code regularly at work for several months, and I successfully used it for a small personal project (a website) not long ago. Last weekend, I explored self-hosting for the first time.

Does anyone have a similar experience of having thoroughly used CC/Codex/whatever and also have an analogous self-hosted setup that they're somewhat happy with? I'm struggling a bit.

I have 32GB of DDR5 (seems inadequate nowadays), an AMD 7800X3D, and an RTX 4090. I'm using Windows but I have WSL enabled.

I tried a few combinations of ollama, docker desktop model runner, pi-coding-agent and opencode; and for models, I think I tried a few variants each of Gemma 4, Qwen, GLM-5.1. My "baseline" RAM usage was so high from the handful of regular applications that IIRC it wasn't enough to use the best models; e.g., I couldn't run Gemma4-31B.

Things work okay in a Windows-only setup, though the agent struggled to get file paths correct. I did have some success running pi/opencode in WSL and running ollama and the model via docker desktop.

In terms of actual performance, it was painfully slow compared to the throughput I'm used to from CC, and the tooling didn't feel as good as the CC harness. Admittedly I didn't spend long enough actually using it after fiddling with setup for so long, but it was at least a fun experiment.

New comment by greyskull in "Nvidia with unusually fast coding model on plate-sized chips"

greyskull — Tue, 17 Feb 2026 00:12:20 +0000

Missing "OpenAI sidesteps" from the beginning of the article title

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Fri, 22 Nov 2024 19:54:57 +0000

I agree.

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Thu, 21 Nov 2024 03:51:32 +0000

Didn't get far enough along to understand the motivations and considered alternatives.

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Thu, 21 Nov 2024 03:45:37 +0000

OpenNext does model [0] incremental static regeneration, but beyond that I actually don't know, or at least don't recall. OpenNext doesn't do per-route lambdas like vercel does, so it's not like you get any behavior differences there.

I _think_ you can get scale to zero on Lambda by deploying a docker container, too.

[0] https://opennext.js.org/aws/v2/advanced/architecture

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Thu, 21 Nov 2024 03:31:11 +0000

1) I don't think it's related to Next, per se, but there may be behavior I didn't build the expertise to comment on. I also know that there were major inefficiencies in the application, so, for example, our P90 latency was (imo) terrible.

2) We'd have to define what constitutes low traffic vs any other arbitrary measure, so it's moot to discuss like this; all I said was that it wasn't high traffic. You could run it for cheaper, but there wasn't much expertise for self-hosting, for example.

3) For all I remember it may have been half that in daily cost. In any case, minuscule compared to engineer time. What was worse was the prior decision to use serverless Aurora RDS, which dwarfed everything else in AWS cost. I know this is only tangentially related; I'm just saying that optimizing that a bit more was not the highest priority, though we could have done it for cheaper.

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Thu, 21 Nov 2024 00:00:48 +0000

It offers packaging for deploying to a serverless environment (e.g. Lambda) analogous to how Vercel does it.

The last question is salient, and it's possible for OpenNext to break and have to catch up to changes in Next.js, though I believe there's some more direct collaboration. I'd say that's the biggest downside - it's not guaranteed compatibility.

I did a migration recently (comments elsewhere in this post), and I don't recall the specific issue, but I _do_ recall running into at least one scenario where OpenNext had made a decision that impacted - in a way that was visible to me and undesirable - how Next.js functioned. That's not a criticism, there are tradeoffs.

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Wed, 20 Nov 2024 23:53:53 +0000

OpenNext is just for packaging the Next.js build artifacts. The infrastructure is defined by projects that deploy those artifacts, examples here: https://opennext.js.org/aws/get_started

Some of them are, for example, Terraform projects that list the specific infra. I have experience with the SST deployment, whose website unfortunately doesn't do a great job of listing the infra architecture.

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Wed, 20 Nov 2024 23:49:40 +0000

The biggest cost for us on Vercel (several hundred dollars a month) was Image Optimization, and that was because the app was being majorly inefficient with images, in part due to some default behavior in Next.js that we found unfriendly [0], and in part due to negligence. That being said, it wasn't "cheap" by any means outside of that, still hundreds a month for something that I would not consider a high traffic application (I wish I could remember more specific numbers).

Migrating to OpenNext using SST, I think we got the bills for compute and asset serving down to like $15/day or something (granted, we spent expensive engineer time on the migration).

[0] https://nextjs.org/docs/app/api-reference/components/image#s...
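To put the daily figure on the same footing as a monthly bill, here is a quick conversion sketch. The $15/day input is the rough number recalled above, not a measured value, and the function name is just for illustration:

```python
def monthly_from_daily(daily_usd, days_per_year=365):
    """Average monthly cost implied by a flat daily cost."""
    return daily_usd * days_per_year / 12


print(round(monthly_from_daily(15.0), 2))  # 456.25
```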

New comment by greyskull in "Show HN: Self-Host Next.js in Production"

greyskull — Wed, 20 Nov 2024 23:40:14 +0000

In the company I just left, I actually went through the process two or so months ago of migrating their Vercel deployment to AWS. I evaluated several options that are listed on the website and on GitHub, and we landed on using OpenNext via SST. It was a low-pain effort, especially given the CTO's desire to also migrate off of Next.js.

As other commenters have touched on - my understanding is the purpose of OpenNext is to package the output artifacts of a Next build in a way that can be deployed to a serverless environment, analogous to how Vercel does it. The supporting projects like SST and the other links in the repo are to take those OpenNext artifacts and deploy them to infrastructure generally in an opinionated way - additionally supporting some of the "extra" features described in the repository.

The last project I was working on was to then migrate from SST to Fargate, as a persistent process (serverful?) deployment was preferable for various reasons. In that scenario, we would just be running the built-in server using the Next.js standalone deployment mode (effectively a `node server.js`). We didn't need the extra functionality covered by OpenNext.

New comment by greyskull in "Show HN: I Wrote a Book on Java"

greyskull — Mon, 23 Sep 2024 19:15:11 +0000

Congratulations!

I see that the book is incomplete. I didn't know that early access for books was a thing, very neat. It might be pertinent to note in your post that it's still being written, with an estimated release window of Spring 2025.

I'm very much a "consume it when it's ready" person, so I'll keep this on my watch list.

New comment by greyskull in "Cloudflare misidentifies Hetzner IPs as being located in Iran"

greyskull — Wed, 18 Sep 2024 21:07:13 +0000

Might be pertinent to suffix this with (2023), though I see there are still recent replies

New comment by greyskull in "Riot Games laying off 11%"

greyskull — Tue, 23 Jan 2024 01:46:33 +0000

It's opening in Q2. Source: I'm one of the subjects of the article :^)