<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ekojs</title><link>https://news.ycombinator.com/user?id=ekojs</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 01:25:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ekojs" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>> You cannot run these models at 8-bit on a 32GB card because you need space for context<p>You probably can, actually. I'm not saying it would be ideal, but it can fit entirely in VRAM (if you make sure to quantize the attention layers). KV-cache quantization and not loading the vision tower would also help quite a bit. It's not great for long context, but it should very much be possible.<p>I addressed the lossless claim in another reply, but I guess it really depends on what the model is used for. For my use cases, I'd say it's nearly lossless.</p>
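<p>For anyone who wants to sanity-check the fit, here's a back-of-envelope sketch. The layer/head counts and the overhead figure are illustrative guesses, not the model's actual config:

```python
# Rough VRAM budget for an 8-bit 27B dense model on a 32 GB card.
# All architecture numbers below are hypothetical, for illustration only.
GiB = 1024**3

params = 27e9
weights_bytes = params * 1                     # 8-bit weights: ~1 byte/param

n_layers, n_kv_heads, head_dim = 48, 8, 128    # assumed GQA config
kv_bytes_per_tok = 2 * n_layers * n_kv_heads * head_dim * 1  # K+V, FP8 KV cache

vram = 32 * GiB
overhead = 1.5 * GiB                           # activations, CUDA context (guess)
budget = vram - weights_bytes - overhead
max_ctx = int(budget // kv_bytes_per_tok)
print(f"KV budget: {budget / GiB:.1f} GiB -> ~{max_ctx} tokens of context")
```

Under these assumptions you're left with roughly 5 GiB for the KV cache, which is tens of thousands of tokens of context, so "it fits, but not for very long context" checks out.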
]]></description><pubDate>Wed, 22 Apr 2026 16:04:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865572</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865572</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865572</guid></item><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Yeah, I figured the 'nearly lossless' claim would be the most controversial part. In my defense, ~97% recovery on benchmarks is what I consider 'nearly lossless'. When quantized with calibration data for a specialized domain, the difference on my internal benchmark is pretty much indistinguishable. But for agentic work, 4-bit quants can indeed fall a bit short in long-context use cases, especially if you quantize the attention layers.</p>
]]></description><pubDate>Wed, 22 Apr 2026 15:55:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865464</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865464</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865464</guid></item><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Not at all. I actually run ~30B dense models in production and have tested the 5090/3090 for that. There are gotchas, of course, but the speed/quality claims should be roughly right.</p>
]]></description><pubDate>Wed, 22 Apr 2026 15:46:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865281</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865281</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865281</guid></item><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Since this is a dense model and it's pretty sizable, 4-bit quantization can be nearly lossless. With that, you can run it on a 3090/4090/5090. You could probably even go FP8 on a 5090 (though there will be tradeoffs). Expect ~70 tok/s on a 5090 and roughly half that on a 4090/3090. With speculative decoding, you can go even faster (2-3x, I'd say). Pretty amazing what you can get locally.</p>
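<p>The speed numbers follow from a simple memory-bandwidth roofline: single-stream decoding is memory-bound, so tok/s is capped by bandwidth divided by the bytes of weights read per token. A rough sketch (bandwidth figures are approximate public specs; the ~55% efficiency factor is a guess from experience):

```python
# Roofline-style decode estimate: tok/s <= memory bandwidth / weight bytes.
params = 27e9
bytes_per_param = 0.5               # 4-bit weights
model_bytes = params * bytes_per_param

gpus = {"RTX 5090": 1792e9, "RTX 4090": 1008e9, "RTX 3090": 936e9}  # B/s, approx
for name, bw in gpus.items():
    ceiling = bw / model_bytes
    print(f"{name}: <= {ceiling:.0f} tok/s peak, ~{0.55 * ceiling:.0f} realistic")
```

That gives a ~130 tok/s ceiling on a 5090 (so ~70 realistic) and roughly half that on a 4090/3090, which lines up with the figures above.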
]]></description><pubDate>Wed, 22 Apr 2026 15:41:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865195</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865195</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865195</guid></item><item><title><![CDATA[Show HN: Linux Nvidia GPU V/F Curve Editor for Undervolting/OC]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/ekojsalim/nvcurve/tree/main">https://github.com/ekojsalim/nvcurve/tree/main</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47454289">https://news.ycombinator.com/item?id=47454289</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 20 Mar 2026 13:35:09 +0000</pubDate><link>https://github.com/ekojsalim/nvcurve/tree/main</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47454289</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47454289</guid></item><item><title><![CDATA[Gemini API Down]]></title><description><![CDATA[
<p>Article URL: <a href="https://twitter.com/OfficialLoganK/status/1972729571868086327">https://twitter.com/OfficialLoganK/status/1972729571868086327</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45417196">https://news.ycombinator.com/item?id=45417196</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 29 Sep 2025 18:36:14 +0000</pubDate><link>https://twitter.com/OfficialLoganK/status/1972729571868086327</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=45417196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45417196</guid></item><item><title><![CDATA[New comment by ekojs in "Gemini API Billing Bug Causing Erroneous Charge for 'Image Generation'"]]></title><description><![CDATA[
<p>Seems pretty widespread. We were mistakenly charged ~$800 over the weekend.<p>Other sources:<p>[0]: <a href="https://aistudio.google.com/status" rel="nofollow">https://aistudio.google.com/status</a><p>[1]: <a href="https://www.reddit.com/r/GeminiAI/comments/1mycmtk/google_cloud_charged_me_1000_for_image_generation/" rel="nofollow">https://www.reddit.com/r/GeminiAI/comments/1mycmtk/google_cl...</a><p>[2]: <a href="https://www.reddit.com/r/GeminiAI/comments/1myg04q/gemini_25_flash_native_image_generation/" rel="nofollow">https://www.reddit.com/r/GeminiAI/comments/1myg04q/gemini_25...</a></p>
]]></description><pubDate>Mon, 25 Aug 2025 10:58:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45012504</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=45012504</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45012504</guid></item><item><title><![CDATA[Gemini API Billing Bug Causing Erroneous Charge for 'Image Generation']]></title><description><![CDATA[
<p>Article URL: <a href="https://discuss.ai.google.dev/t/gemini-api-cost-suddenly-skyrocketed/99479">https://discuss.ai.google.dev/t/gemini-api-cost-suddenly-skyrocketed/99479</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45012503">https://news.ycombinator.com/item?id=45012503</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 25 Aug 2025 10:58:58 +0000</pubDate><link>https://discuss.ai.google.dev/t/gemini-api-cost-suddenly-skyrocketed/99479</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=45012503</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45012503</guid></item><item><title><![CDATA[New comment by ekojs in "Gemini with Deep Think achieves gold-medal standard at the IMO"]]></title><description><![CDATA[
<p>> Btw as an aside, we didn’t announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved<p>> We've now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!<p>From <a href="https://x.com/demishassabis/status/1947337620226240803" rel="nofollow">https://x.com/demishassabis/status/1947337620226240803</a><p>Was OpenAI simply not coordinating with the IMO Board then?</p>
]]></description><pubDate>Mon, 21 Jul 2025 17:28:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=44637877</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44637877</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44637877</guid></item><item><title><![CDATA[New comment by ekojs in "How I Use Kagi"]]></title><description><![CDATA[
<p>Maybe not a popular sentiment here on HN, but I cancelled my Kagi subscription (9+ months) just recently. Increasingly, most of my queries/searches go through LLMs, and Google Search is just fine for the rest (and even better for restaurants, places, and the like). I don't think the improved search experience is worth the subscription anymore.</p>
]]></description><pubDate>Thu, 17 Jul 2025 17:05:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=44595532</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44595532</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44595532</guid></item><item><title><![CDATA[New comment by ekojs in "GCP Outage"]]></title><description><![CDATA[
<p><a href="https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW" rel="nofollow">https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...</a><p>> Multiple GCP products are experiencing impact due to Identity and Access Management Service Issue<p>IAM issue huh. The post-mortem should be interesting at least.</p>
]]></description><pubDate>Thu, 12 Jun 2025 18:53:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44261594</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44261594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44261594</guid></item><item><title><![CDATA[New comment by ekojs in "GCP Outage"]]></title><description><![CDATA[
<p>Super frustrating that the status page is still green. Why can't Google do this properly?</p>
]]></description><pubDate>Thu, 12 Jun 2025 18:29:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44261170</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44261170</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44261170</guid></item><item><title><![CDATA[New comment by ekojs in "Next.js 15.1 is unusable outside of Vercel"]]></title><description><![CDATA[
<p>I share the sentiment. I think we'll only be using Next.js for static sites/prebuilt SPAs in the future.</p>
]]></description><pubDate>Thu, 12 Jun 2025 10:50:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=44256118</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44256118</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44256118</guid></item><item><title><![CDATA[New comment by ekojs in "Meta got caught gaming AI benchmarks"]]></title><description><![CDATA[
<p>I think it's most illustrative to look at the sample battles (H2H) that LMArena released [1]. The outputs of Meta's model are too verbose and too 'yappy', IMO. And looking at the verdicts, it's no wonder people are discounting LMArena rankings.<p>[1]: <a href="https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03-26-Experimental_battles" rel="nofollow">https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03...</a></p>
]]></description><pubDate>Tue, 08 Apr 2025 16:58:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=43623925</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43623925</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43623925</guid></item><item><title><![CDATA[New comment by ekojs in "Gemini 2.5"]]></title><description><![CDATA[
<p>> This will mark the first experimental model with higher rate limits + billing. Excited for this to land and for folks to really put the model through the paces!<p>From <a href="https://x.com/OfficialLoganK/status/1904583353954882046" rel="nofollow">https://x.com/OfficialLoganK/status/1904583353954882046</a><p>The low rate-limit really hampered my usage of 2.0 Pro and the like. Interesting to see how this plays out.</p>
]]></description><pubDate>Tue, 25 Mar 2025 17:19:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=43473644</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43473644</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43473644</guid></item><item><title><![CDATA[New comment by ekojs in "Fine-tune Google's Gemma 3"]]></title><description><![CDATA[
<p>> The bottleneck then becomes how to self-host the finetuned model in a way that's cost-effective and scalable<p>It's actually not that expensive or hard. For narrow use cases, you can produce 4-bit quantized fine-tunes that perform as well as the full model, and hosting the 4-bit quantized version is relatively cheap. You can use an A40 or RTX 3090 on Runpod for ~$300/month.</p>
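<p>Quick arithmetic behind the ~$300/month figure, assuming an hourly rate around $0.40/hr for a 24/7 instance (the rate is illustrative; actual Runpod pricing varies by GPU and availability):

```python
# Monthly cost of a single always-on GPU at an assumed hourly rate.
hourly_rate = 0.40          # USD/hr, illustrative
hours_per_month = 24 * 30.4 # average month length
monthly = hourly_rate * hours_per_month
print(f"~${monthly:.0f}/month")
```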
]]></description><pubDate>Wed, 19 Mar 2025 20:58:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=43417256</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43417256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43417256</guid></item><item><title><![CDATA[New comment by ekojs in "How much traffic can a pre-rendered Next.js site handle?"]]></title><description><![CDATA[
<p>Normally, yes. But there are a couple of rendering modes with these frameworks. In this case, the rendering is most likely 'hybrid': some routes are statically pre-rendered, and some are served via SSR. You'd need a JS server for the SSR, of course.</p>
]]></description><pubDate>Sun, 09 Mar 2025 07:18:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=43306968</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43306968</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43306968</guid></item><item><title><![CDATA[New comment by ekojs in "How much traffic can a pre-rendered Next.js site handle?"]]></title><description><![CDATA[
<p>Interesting. My hunch is that Next.js is not optimized for the dockerized Node-server deployment. I'd say you could get much better prerendering performance from Next.js by fronting the static assets directly with Caddy/Nginx.</p>
]]></description><pubDate>Sun, 09 Mar 2025 05:23:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43306524</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43306524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43306524</guid></item><item><title><![CDATA[New comment by ekojs in "GPT-4.5"]]></title><description><![CDATA[
<p>> Because of this, we’re evaluating whether to continue serving it in the API long-term as we balance supporting current capabilities with building future models.<p>Seems like it's not going to be deployed for long.<p>$75.00 / 1M input tokens<p>$150.00 / 1M output tokens<p>Those are crazy prices.</p>
]]></description><pubDate>Thu, 27 Feb 2025 20:14:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=43197977</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43197977</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43197977</guid></item><item><title><![CDATA[Multilingual MMLU Dataset from OpenAI (OpenAI/Mmmlu)]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/datasets/openai/MMMLU">https://huggingface.co/datasets/openai/MMMLU</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41628043">https://news.ycombinator.com/item?id=41628043</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 23 Sep 2024 16:50:04 +0000</pubDate><link>https://huggingface.co/datasets/openai/MMMLU</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=41628043</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41628043</guid></item></channel></rss>