<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: veselin</title><link>https://news.ycombinator.com/user?id=veselin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 06 Apr 2026 02:45:30 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=veselin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by veselin in "Caveman: Why use many token when few token do trick"]]></title><description><![CDATA[
<p>This is an experiment that, although not to this extreme, OpenAI has already tested. Their Responses API allows you to control verbosity:<p><a href="https://developers.openai.com/api/reference/resources/responses/methods/create#(resource)%20responses%20%3E%20(method)%20create%20%3E%20(params)%200.non_streaming%20%3E%20(param)%20text%20%3E%20(schema)%20%2B%20(resource)%20responses%20%3E%20(model)%20response_text_config%20%3E%20(schema)%20%3E%20(property)%20verbosity" rel="nofollow">https://developers.openai.com/api/reference/resources/respon...</a><p>I don't know their internal evals, but I have heard this neither hurts nor improves performance. At the very least, the parameter may affect how many comments end up in the code.</p>
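<p>For concreteness, here is a minimal sketch of what such a request payload might look like; the model name and prompt are placeholders, and the exact field path is documented at the link above:</p>

```python
# Hypothetical sketch of a Responses API request using the verbosity
# control; "text.verbosity" accepts "low", "medium", or "high".
payload = {
    "model": "gpt-5",  # placeholder model name
    "input": "Write a function that merges two sorted lists.",
    "text": {"verbosity": "low"},  # ask for fewer explanatory tokens
}

# With the official Python SDK this would be sent roughly as
# client.responses.create(**payload); here we only inspect the shape.
print(payload["text"]["verbosity"])
```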
]]></description><pubDate>Sun, 05 Apr 2026 11:48:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47648409</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47648409</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47648409</guid></item><item><title><![CDATA[New comment by veselin in "AutoKernel: Autoresearch for GPU Kernels"]]></title><description><![CDATA[
<p>I guess we would see a lot more benefit if we could get this to work on something like llama.cpp - it really has a lot of kernels for different quantizations, a lot of home users, and high hardware diversity, so it is likely the place with the highest bang for the buck.<p>I guess they could become contributors there.</p>
]]></description><pubDate>Wed, 11 Mar 2026 10:09:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333705</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47333705</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333705</guid></item><item><title><![CDATA[New comment by veselin in "Anthropic announces proof of distillation at scale by MiniMax, DeepSeek,Moonshot"]]></title><description><![CDATA[
<p>I think they are signaling two things:<p>* Likely they will seek regulation that would ban some models. I am not sure this can work, but they will certainly try.<p>* Likely they will not release some of their next models in the API.</p>
]]></description><pubDate>Mon, 23 Feb 2026 19:23:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47127400</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47127400</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47127400</guid></item><item><title><![CDATA[New comment by veselin in "Gemini 3.1 Pro"]]></title><description><![CDATA[
<p>I am actually going to complain about this: neither of the Gemini models is a non-preview one.<p>Anthropic seems best at this; everything is in the API on day one. OpenAI tends to push a subscription first, but the API follows a week or a few later. Now, Gemini 3 is not for production use, and it is already the previous iteration. So, does Google even intend to release this model?</p>
]]></description><pubDate>Thu, 19 Feb 2026 18:16:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47077009</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47077009</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47077009</guid></item><item><title><![CDATA[New comment by veselin in "GLM-4.7-Flash"]]></title><description><![CDATA[
<p>What is the state of using quants? For chat models, a few errors or some lost intelligence may matter little. But what happens to tool calling in coding agents? Does it fail catastrophically after a few steps in the agent loop?<p>I am interested in whether I can run it on a 24GB RTX 4090.<p>Also, would vllm be a good option?</p>
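<p>As a rough way to frame the 24GB question, a back-of-the-envelope estimate of weight memory at different quantization levels; the 30B parameter count is a placeholder, and KV cache and activations are ignored:</p>

```python
# Approximate memory footprint of model weights alone at different
# bits-per-weight. Substitute the real parameter count for n below.
GiB = 1024**3

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Weight memory in GiB (ignores KV cache, activations, overhead)."""
    return n_params * bits_per_weight / 8 / GiB

n = 30e9  # hypothetical parameter count
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gib(n, bits):.1f} GiB")
```

<p>By this estimate, a model of that size only fits in 24GB at around 4-bit quantization, which is exactly the regime where tool-calling reliability becomes the question.</p>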
]]></description><pubDate>Mon, 19 Jan 2026 20:48:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46684304</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=46684304</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46684304</guid></item><item><title><![CDATA[New comment by veselin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>I am talking about SWE-bench-style problems, where the Todo tool doesn't help except for enabling more parallelism.</p>
]]></description><pubDate>Sat, 10 Jan 2026 15:43:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46566630</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=46566630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46566630</guid></item><item><title><![CDATA[New comment by veselin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>I run evals, and the Todo tool doesn't help most of the time. Models on high thinking usually maintain the todo list/state in their thinking tokens anyway. Where Todo does help is in getting models like Anthropic's to run more parallel tool calls: after a Todo list call, some of the subsequent actions are more efficient.<p>What you need to do is match the distribution the models were RL-ed on. So you are right to say that "do X in 200 lines" is a very small part of the job to be done.</p>
]]></description><pubDate>Fri, 09 Jan 2026 06:48:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46550802</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=46550802</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46550802</guid></item><item><title><![CDATA[New comment by veselin in "Gemini 3 Pro Model Card [pdf]"]]></title><description><![CDATA[
<p>I work a lot on testing, including SWE-bench Verified. In my opinion, this benchmark is now mainly good for catching regressions on the agent side.<p>Above 75%, however, agents are likely all about the same. The remaining instances are probably underspecified, despite the effort the authors put into making the benchmark "verified". From what I have seen, these are often cases where the problem statement says to implement X for Y, but the agent simply has to guess whether to also implement it for another case Y', and that guess decides whether the instance is won or lost.</p>
]]></description><pubDate>Tue, 18 Nov 2025 14:38:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=45966723</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=45966723</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45966723</guid></item><item><title><![CDATA[New comment by veselin in "Qwen3-Coder: Agentic coding in the world"]]></title><description><![CDATA[
<p>Does anybody know of an inference provider that offers input token caching? It should be almost required for agentic use - first for speed, but also because almost every conversation starts where the previous one ended, so costs can end up much higher without caching.<p>I would have expected good providers like Together, Fireworks, etc. to support it, but I can't find it, short of running vllm myself on self-hosted instances.</p>
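<p>A toy cost model of why this matters, under the assumption that each agent turn re-sends the full history and that cached input tokens are billed at a fraction of full price; the 90% discount and token counts below are placeholders, since actual pricing varies by provider:</p>

```python
# Each turn re-sends the whole conversation, so without caching the
# billed input tokens grow quadratically with the number of turns.
def input_tokens_billed(turns: int, tokens_per_turn: int,
                        cached_discount: float) -> float:
    """Total billed input tokens over `turns` agent steps.

    cached_discount: fraction of full price charged for cached tokens
    (1.0 = no caching; e.g. 0.1 = cached tokens cost 10% of full price).
    """
    total = 0.0
    for t in range(1, turns + 1):
        new = tokens_per_turn               # this turn's tokens: full price
        cached = (t - 1) * tokens_per_turn  # prior history: discounted
        total += new + cached * cached_discount
    return total

full = input_tokens_billed(50, 2000, 1.0)  # no caching
disc = input_tokens_billed(50, 2000, 0.1)  # 90% discount on cached prefix
print(f"cost ratio without/with caching: {full / disc:.1f}x")
```

<p>With these placeholder numbers, a 50-turn agent run costs several times more without caching, which is the gap I mean.</p>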
]]></description><pubDate>Wed, 23 Jul 2025 10:59:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=44657768</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=44657768</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44657768</guid></item><item><title><![CDATA[New comment by veselin in "I'm dialing back my LLM usage"]]></title><description><![CDATA[
<p>I think people are just too quick to assume this is amazing before it is actually there. Which doesn't mean it won't get there.<p>Even with the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at maybe 75-80%. Not 95. Assuming agents just solve most problems is incorrect, even though they are really good at first prototypes.<p>Also, in my experience agents are great up to a point and then fall off a cliff - not gradually. The types of errors you get past that point are so diverse that one cannot even characterize them.</p>
]]></description><pubDate>Wed, 02 Jul 2025 16:07:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=44445424</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=44445424</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44445424</guid></item><item><title><![CDATA[New comment by veselin in "Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison"]]></title><description><![CDATA[
<p>I noticed a similar trend in selling on X. Make a claim, peg it to some product A with good sales - Cursor, Claude, Gemini, etc. Then say the best way to use A is with your own product: a guide, an MCP server, or something else.<p>For some of these I see something like 15k followers on X, but then no LinkedIn page, for example. The website is always a company you cannot contact that claims to do everything.</p>
]]></description><pubDate>Mon, 31 Mar 2025 12:53:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=43534458</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=43534458</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43534458</guid></item><item><title><![CDATA[New comment by veselin in "AMD 3D V-Cache teardown shows majority of the Ryzen 7 9800X3D is dummy silicon"]]></title><description><![CDATA[
<p>Yes. The article is clickbait. With such a title I would have expected the majority of the area to be dummy silicon, but it is just structurally more silicon - much as a framed picture may be mostly wood by mass.</p>
]]></description><pubDate>Wed, 18 Dec 2024 16:58:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=42452290</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=42452290</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42452290</guid></item><item><title><![CDATA[New comment by veselin in "The lifecycle of a code AI completion"]]></title><description><![CDATA[
<p>I used them both.<p>I ended up disabling Copilot. The reason is that its completions do not always integrate with the rest of the code, in particular producing non-matching brackets; often it just repeats some other part of the code. I had far fewer such cases with Cody, though arguably the difference is not huge. And then add the choice of models on top of that.</p>
]]></description><pubDate>Mon, 08 Apr 2024 05:58:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=39966631</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39966631</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39966631</guid></item><item><title><![CDATA[New comment by veselin in "Why AWS Supports Valkey"]]></title><description><![CDATA[
<p>Recent years have given us a lot of licenses like this, first for core infra software and now for LLMs. In very legalese terms they all basically say: these top 5-10 tech companies will not compete fairly with us, thus they are banned from using the software; everyone else is welcome to use everything.<p>I wonder: if US monopoly regulation actually starts to work well, which I see some signs of happening, will all these licenses revert back to fully open source?</p>
]]></description><pubDate>Sat, 06 Apr 2024 09:40:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=39951233</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39951233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39951233</guid></item><item><title><![CDATA[New comment by veselin in "Jpegli: A new JPEG coding library"]]></title><description><![CDATA[
<p>When I saw the name, I knew immediately this is Jyrki's work.</p>
]]></description><pubDate>Wed, 03 Apr 2024 18:42:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=39921299</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39921299</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39921299</guid></item><item><title><![CDATA[New comment by veselin in "Models all the way down"]]></title><description><![CDATA[
<p>Exactly. The whole thing reads like propaganda. It puts interesting topics up front, then moves on to push an agenda that sounds quite political to me.<p>Yes, some languages are underrepresented and there are thresholds involved. But it is well known that putting a threshold just slightly higher or lower will probably not materially affect the model.</p>
]]></description><pubDate>Sun, 31 Mar 2024 08:02:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=39882365</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39882365</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39882365</guid></item><item><title><![CDATA[New comment by veselin in "EagleX 1.7T: Soaring past LLaMA 7B 2T in both English and Multi-lang evals"]]></title><description><![CDATA[
<p>I think this is simply the default of lm-evaluation-harness. They said they ran every single benchmark they could out of the box.</p>
]]></description><pubDate>Mon, 18 Mar 2024 06:56:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=39741063</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39741063</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39741063</guid></item><item><title><![CDATA[New comment by veselin in "Culture Change at Google"]]></title><description><![CDATA[
<p>The product they often presented as having started in 20% time is Google News. I don't know the actual details; this is just what I remember from my time at Google (2006-2012).</p>
]]></description><pubDate>Fri, 19 Jan 2024 13:59:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=39055492</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39055492</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39055492</guid></item><item><title><![CDATA[New comment by veselin in "Apple M2 Ultra SoC isn’t faster than AMD and Intel last year desktop CPUs"]]></title><description><![CDATA[
<p>It is true that nobody competes in the low-power, high-efficiency workstation market - or maybe such a market does not exist yet and Apple is creating it.<p>But as users, some of us expected the M series to be so good it would take many markets by storm. And that does not seem to be happening.</p>
]]></description><pubDate>Sun, 11 Jun 2023 05:58:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=36278608</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=36278608</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36278608</guid></item><item><title><![CDATA[New comment by veselin in "Apple M2 Ultra SoC isn’t faster than AMD and Intel last year desktop CPUs"]]></title><description><![CDATA[
<p>I find the article quite informative. Yes, M2 and the other chips are completely different products with different goals; claiming that one completely trumps the other would be wrong.<p>But here is what is visible:<p>The M2 core is probably in the same ballpark as a Zen 4 core, likely a tiny bit below. That gap may become very small if the Zen 4 core runs at a lower frequency to equalize power, and this doesn't account for Zen 4's AVX-512.<p>24 M2 cores manage to beat 16 Zen 4 cores, also at lower power, but again these are different products. Zen 4 scales to far more cores - 96 in an EPYC chip - and AMD and Intel have far more investment in interconnects and multi-die chips to do these things.<p>The M2 GPU is in the same league as a $300 mid-range Nvidia card. It is not competitive at all: Apple produces the largest chip it can manufacture to go against a smaller, high-margin chip that Nvidia orders.<p>Again, none of this means either product is not good on its own.</p>
]]></description><pubDate>Sat, 10 Jun 2023 19:02:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=36273810</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=36273810</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36273810</guid></item></channel></rss>