<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ekojs</title><link>https://news.ycombinator.com/user?id=ekojs</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 01:25:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ekojs" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>> You cannot run these models at 8-bit on a 32GB card because you need space for context<p>You probably can, actually. I'm not saying it would be ideal, but it can fit entirely in VRAM (if you make sure to quantize the attention layers). KV-cache quantization and not loading the vision tower would also help quite a bit. It's not great for long context, but it should very much be possible.<p>I addressed the lossless claim in another reply, but I guess it really depends on what the model is used for. For my use cases, I'd say it's nearly lossless.</p>
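<p>For anyone who wants to sanity-check the fit, here's a back-of-envelope sketch. The layer/head counts and the overhead figure are illustrative guesses, not the model's actual config:

```python
# Rough VRAM budget for an 8-bit 27B dense model on a 32 GB card.
# All architecture numbers below are hypothetical, for illustration only.
GiB = 1024**3

params = 27e9
weights_bytes = params * 1                     # 8-bit weights: ~1 byte/param

n_layers, n_kv_heads, head_dim = 48, 8, 128    # assumed GQA config
kv_bytes_per_tok = 2 * n_layers * n_kv_heads * head_dim * 1  # K+V, FP8 KV cache

vram = 32 * GiB
overhead = 1.5 * GiB                           # activations, CUDA context (guess)
budget = vram - weights_bytes - overhead
max_ctx = int(budget // kv_bytes_per_tok)
print(f"KV budget: {budget / GiB:.1f} GiB -> ~{max_ctx} tokens of context")
```

Under these assumptions you're left with roughly 5 GiB for the KV cache, which is tens of thousands of tokens of context, so "it fits, but not for very long context" checks out.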
]]></description><pubDate>Wed, 22 Apr 2026 16:04:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865572</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865572</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865572</guid></item><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Yeah, I figured the 'nearly lossless' claim would be the most controversial part. In my defense, ~97% recovery on benchmarks is what I consider 'nearly lossless'. When quantized with calibration data for a specialized domain, the difference on my internal benchmark is pretty much indistinguishable. But for agentic work, 4-bit quants can indeed fall a bit short in long-context use cases, especially if you quantize the attention layers.</p>
]]></description><pubDate>Wed, 22 Apr 2026 15:55:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865464</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865464</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865464</guid></item><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Not at all. I actually run ~30B dense models in production and have tested the 5090/3090 for that. There are gotchas, of course, but the speed/quality claims should be roughly right.</p>
]]></description><pubDate>Wed, 22 Apr 2026 15:46:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865281</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865281</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865281</guid></item><item><title><![CDATA[New comment by ekojs in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Since this is a dense model and it's pretty sizable, 4-bit quantization can be nearly lossless. With that, you can run it on a 3090/4090/5090. You could probably even go FP8 on a 5090 (though there will be tradeoffs). Expect ~70 tok/s on a 5090 and roughly half that on a 4090/3090. With speculative decoding, you can go even faster (2-3x, I'd say). Pretty amazing what you can get locally.</p>
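<p>The speed numbers follow from a simple memory-bandwidth roofline: single-stream decoding is memory-bound, so tok/s is capped by bandwidth divided by the bytes of weights read per token. A rough sketch (bandwidth figures are approximate public specs; the ~55% efficiency factor is a guess from experience):

```python
# Roofline-style decode estimate: tok/s <= memory bandwidth / weight bytes.
params = 27e9
bytes_per_param = 0.5               # 4-bit weights
model_bytes = params * bytes_per_param

gpus = {"RTX 5090": 1792e9, "RTX 4090": 1008e9, "RTX 3090": 936e9}  # B/s, approx
for name, bw in gpus.items():
    ceiling = bw / model_bytes
    print(f"{name}: <= {ceiling:.0f} tok/s peak, ~{0.55 * ceiling:.0f} realistic")
```

That gives a ~130 tok/s ceiling on a 5090 (so ~70 realistic) and roughly half that on a 4090/3090, which lines up with the figures above.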
]]></description><pubDate>Wed, 22 Apr 2026 15:41:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47865195</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47865195</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47865195</guid></item><item><title><![CDATA[Show HN: Linux Nvidia GPU V/F Curve Editor for Undervolting/OC]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/ekojsalim/nvcurve/tree/main">https://github.com/ekojsalim/nvcurve/tree/main</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47454289">https://news.ycombinator.com/item?id=47454289</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 20 Mar 2026 13:35:09 +0000</pubDate><link>https://github.com/ekojsalim/nvcurve/tree/main</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=47454289</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47454289</guid></item><item><title><![CDATA[Gemini API Down]]></title><description><![CDATA[
<p>Article URL: <a href="https://twitter.com/OfficialLoganK/status/1972729571868086327">https://twitter.com/OfficialLoganK/status/1972729571868086327</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45417196">https://news.ycombinator.com/item?id=45417196</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 29 Sep 2025 18:36:14 +0000</pubDate><link>https://twitter.com/OfficialLoganK/status/1972729571868086327</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=45417196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45417196</guid></item><item><title><![CDATA[New comment by ekojs in "Gemini API Billing Bug Causing Erroneous Charge for 'Image Generation'"]]></title><description><![CDATA[
<p>Seems pretty widespread. We were mistakenly charged ~$800 over the weekend.<p>Other sources:<p>[0]: <a href="https://aistudio.google.com/status" rel="nofollow">https://aistudio.google.com/status</a><p>[1]: <a href="https://www.reddit.com/r/GeminiAI/comments/1mycmtk/google_cloud_charged_me_1000_for_image_generation/" rel="nofollow">https://www.reddit.com/r/GeminiAI/comments/1mycmtk/google_cl...</a><p>[2]: <a href="https://www.reddit.com/r/GeminiAI/comments/1myg04q/gemini_25_flash_native_image_generation/" rel="nofollow">https://www.reddit.com/r/GeminiAI/comments/1myg04q/gemini_25...</a></p>
]]></description><pubDate>Mon, 25 Aug 2025 10:58:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45012504</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=45012504</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45012504</guid></item><item><title><![CDATA[Gemini API Billing Bug Causing Erroneous Charge for 'Image Generation']]></title><description><![CDATA[
<p>Article URL: <a href="https://discuss.ai.google.dev/t/gemini-api-cost-suddenly-skyrocketed/99479">https://discuss.ai.google.dev/t/gemini-api-cost-suddenly-skyrocketed/99479</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45012503">https://news.ycombinator.com/item?id=45012503</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 25 Aug 2025 10:58:58 +0000</pubDate><link>https://discuss.ai.google.dev/t/gemini-api-cost-suddenly-skyrocketed/99479</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=45012503</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45012503</guid></item><item><title><![CDATA[New comment by ekojs in "Gemini with Deep Think achieves gold-medal standard at the IMO"]]></title><description><![CDATA[
<p>> Btw as an aside, we didn’t announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved<p>> We've now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!<p>From <a href="https://x.com/demishassabis/status/1947337620226240803" rel="nofollow">https://x.com/demishassabis/status/1947337620226240803</a><p>Was OpenAI simply not coordinating with the IMO Board then?</p>
]]></description><pubDate>Mon, 21 Jul 2025 17:28:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=44637877</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44637877</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44637877</guid></item><item><title><![CDATA[New comment by ekojs in "How I Use Kagi"]]></title><description><![CDATA[
<p>Maybe not a popular sentiment here on HN, but I cancelled my Kagi subscription (9+ months) just recently. Increasingly, most of my queries/searches go through LLMs, and Google Search is just fine for the rest (and even better for restaurants, places, and the like). I don't think the improved search experience is worth the subscription anymore.</p>
]]></description><pubDate>Thu, 17 Jul 2025 17:05:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=44595532</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44595532</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44595532</guid></item><item><title><![CDATA[New comment by ekojs in "GCP Outage"]]></title><description><![CDATA[
<p><a href="https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW" rel="nofollow">https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...</a><p>> Multiple GCP products are experiencing impact due to Identity and Access Management Service Issue<p>IAM issue huh. The post-mortem should be interesting at least.</p>
]]></description><pubDate>Thu, 12 Jun 2025 18:53:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44261594</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44261594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44261594</guid></item><item><title><![CDATA[New comment by ekojs in "GCP Outage"]]></title><description><![CDATA[
<p>Super frustrating that the status page is still green. Why can't Google do this properly?</p>
]]></description><pubDate>Thu, 12 Jun 2025 18:29:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44261170</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44261170</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44261170</guid></item><item><title><![CDATA[New comment by ekojs in "Next.js 15.1 is unusable outside of Vercel"]]></title><description><![CDATA[
<p>I share the sentiment. I think we'll only be using Next.js for static sites/prebuilt SPAs in the future.</p>
]]></description><pubDate>Thu, 12 Jun 2025 10:50:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=44256118</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=44256118</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44256118</guid></item><item><title><![CDATA[New comment by ekojs in "Meta got caught gaming AI benchmarks"]]></title><description><![CDATA[
<p>I think it's most illustrative to look at the sample battles (H2H) that LMArena released [1]. The outputs of Meta's model are too verbose and too 'yappy', IMO. And looking at the verdicts, it's no wonder people are discounting LMArena rankings.<p>[1]: <a href="https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03-26-Experimental_battles" rel="nofollow">https://huggingface.co/spaces/lmarena-ai/Llama-4-Maverick-03...</a></p>
]]></description><pubDate>Tue, 08 Apr 2025 16:58:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=43623925</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43623925</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43623925</guid></item><item><title><![CDATA[New comment by ekojs in "Gemini 2.5"]]></title><description><![CDATA[
<p>> This will mark the first experimental model with higher rate limits + billing. Excited for this to land and for folks to really put the model through the paces!<p>From <a href="https://x.com/OfficialLoganK/status/1904583353954882046" rel="nofollow">https://x.com/OfficialLoganK/status/1904583353954882046</a><p>The low rate-limit really hampered my usage of 2.0 Pro and the like. Interesting to see how this plays out.</p>
]]></description><pubDate>Tue, 25 Mar 2025 17:19:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=43473644</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43473644</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43473644</guid></item><item><title><![CDATA[New comment by ekojs in "Fine-tune Google's Gemma 3"]]></title><description><![CDATA[
<p>> The bottleneck then becomes how to self-host the finetuned model in a way that's cost-effective and scalable<p>It's actually not that expensive or hard. For narrow use cases, you can produce 4-bit quantized fine-tunes that perform as well as the full model, and hosting the 4-bit quantized version is relatively cheap. You can use an A40 or RTX 3090 on Runpod for ~$300/month.</p>
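<p>Quick arithmetic behind the ~$300/month figure, assuming an hourly rate around $0.40/hr for a 24/7 instance (the rate is illustrative; actual Runpod pricing varies by GPU and availability):

```python
# Monthly cost of a single always-on GPU at an assumed hourly rate.
hourly_rate = 0.40          # USD/hr, illustrative
hours_per_month = 24 * 30.4 # average month length
monthly = hourly_rate * hours_per_month
print(f"~${monthly:.0f}/month")
```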
]]></description><pubDate>Wed, 19 Mar 2025 20:58:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=43417256</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43417256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43417256</guid></item><item><title><![CDATA[New comment by ekojs in "How much traffic can a pre-rendered Next.js site handle?"]]></title><description><![CDATA[
<p>Normally, yes. But there are a couple of rendering modes with these frameworks. In this case, the rendering is most likely 'hybrid': some routes are statically pre-rendered, and some are served via SSR. You'd need a JS server for the SSR, of course.</p>
]]></description><pubDate>Sun, 09 Mar 2025 07:18:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=43306968</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43306968</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43306968</guid></item><item><title><![CDATA[New comment by ekojs in "How much traffic can a pre-rendered Next.js site handle?"]]></title><description><![CDATA[
<p>Interesting. My hunch is that Next.js is not optimized for the dockerized Node-server deployment. I'd say you could get much better prerendering performance from Next.js by fronting the static assets directly with Caddy/Nginx.</p>
]]></description><pubDate>Sun, 09 Mar 2025 05:23:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43306524</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43306524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43306524</guid></item><item><title><![CDATA[New comment by ekojs in "GPT-4.5"]]></title><description><![CDATA[
<p>> Because of this, we’re evaluating whether to continue serving it in the API long-term as we balance supporting current capabilities with building future models.<p>Seems like it's not going to be deployed for long.<p>$75.00 / 1M input tokens<p>$150.00 / 1M output tokens<p>Those are crazy prices.</p>
]]></description><pubDate>Thu, 27 Feb 2025 20:14:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=43197977</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=43197977</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43197977</guid></item><item><title><![CDATA[Multilingual MMLU Dataset from OpenAI (OpenAI/Mmmlu)]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/datasets/openai/MMMLU">https://huggingface.co/datasets/openai/MMMLU</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41628043">https://news.ycombinator.com/item?id=41628043</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 23 Sep 2024 16:50:04 +0000</pubDate><link>https://huggingface.co/datasets/openai/MMMLU</link><dc:creator>ekojs</dc:creator><comments>https://news.ycombinator.com/item?id=41628043</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41628043</guid></item></channel></rss>