<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: veselin</title><link>https://news.ycombinator.com/user?id=veselin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 02 Jul 2026 11:57:34 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=veselin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by veselin in "GLM 5.2 beats Claude in our benchmarks"]]></title><description><![CDATA[
<p>Here, it appears they compare a single prompt, "find IDOR", against a multi-agent system. However, one can also start far more sophisticated skills that spin up subagents and do mostly the same in Claude Code, Codex, OpenCode, Pi, etc.<p>Which I guess makes what Semgrep sells obsolete, unless maybe they have built a Pareto-optimal point in terms of capabilities and token usage?</p>
]]></description><pubDate>Sun, 28 Jun 2026 20:03:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=48711028</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=48711028</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48711028</guid></item><item><title><![CDATA[New comment by veselin in "Why current LLM costs are not sustainable"]]></title><description><![CDATA[
<p>The more I think about the problem, the more I believe this will be solved with US interventions. And the interventions will increase inflation by a lot, so prices will not go down.<p>The other alternative, LLMs becoming more expensive in an Uber-like move, may not work due to a lot of competition. I also don't think usage will increase 10x. I don't always have coding tasks for an LLM despite it being good.<p>My reasons for believing so are outside of what interests the HN community, and I am neither endorsing this behavior, nor do I think it is that simple. But the US also has a huge debt that it must service. Wouldn't it be convenient if it was suddenly halved in actual value?</p>
]]></description><pubDate>Fri, 26 Jun 2026 09:12:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48684271</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=48684271</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48684271</guid></item><item><title><![CDATA[New comment by veselin in "MAI-Code-1-Flash"]]></title><description><![CDATA[
<p>Claude Code itself spins up a lot of its subagents with Haiku. The model has a low hallucination rate, so it is great for exploration tasks. I guess this will be the best use of this model here as well. That is a lot of tokens: many tasks spin up multiple exploration agents before the planning or fixing, which is then just a few tool calls.</p>
]]></description><pubDate>Tue, 02 Jun 2026 20:06:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=48375520</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=48375520</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48375520</guid></item><item><title><![CDATA[New comment by veselin in "The bootstrapper's EU stack for under €10 per month"]]></title><description><![CDATA[
<p>I would argue that with AI, this becomes less of an issue. Connect N services, deploy to bare metal. Granted, AI is now an additional cost, whether local or remote. But so is the MacBook people use to develop their software.</p>
]]></description><pubDate>Mon, 25 May 2026 19:42:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48270803</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=48270803</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48270803</guid></item><item><title><![CDATA[New comment by veselin in "Gemini 3.5 Flash"]]></title><description><![CDATA[
<p>Exactly our experience too. Effectively, we catch these status codes and send the request to OpenAI instead. Retrying the same query in Gemini has a high chance of giving more or less the same status code.</p>
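<p>A minimal sketch of that fallback, assuming each provider is wrapped in a plain callable. ProviderError, FALLBACK_STATUSES and complete_with_fallback are hypothetical names for illustration, not part of any SDK, and the set of status codes is a guess to be tuned to what you actually observe:</p>

```python
class ProviderError(Exception):
    """Hypothetical error that carries the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


# Assumed set of status codes that trigger the fallback.
FALLBACK_STATUSES = frozenset({429, 500, 503})


def complete_with_fallback(prompt, primary, fallback, statuses=FALLBACK_STATUSES):
    """Call the primary provider once; on a listed status, go to the fallback.

    There is deliberately no retry against the primary, since a retry tends
    to come back with the same status code.
    """
    try:
        return primary(prompt)
    except ProviderError as err:
        if err.status not in statuses:
            raise  # unrelated failure, e.g. a bad request: surface it
        return fallback(prompt)
```

<p>Here primary and fallback would be thin wrappers around the respective client calls that translate the SDK's own error type into ProviderError.</p>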
]]></description><pubDate>Tue, 19 May 2026 19:52:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=48198554</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=48198554</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48198554</guid></item><item><title><![CDATA[New comment by veselin in "Does coding with LLMs mean more microservices?"]]></title><description><![CDATA[
<p>I think this is a promise, probably also for spec-driven development. You write the spec, and the whole thing can be reimplemented in Rust tomorrow. Make small modules or libraries.<p>One colleague describes monolith vs microservices as "the grass is greener on the other side".<p>In the end, the problem with microservices is that the release process becomes much harder. Every feature spans at least 3 services, with possible incompatibility between some of their versions. Precisely the work you cannot easily automate with LLMs.</p>
]]></description><pubDate>Mon, 06 Apr 2026 10:38:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47659148</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47659148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47659148</guid></item><item><title><![CDATA[New comment by veselin in "Caveman: Why use many token when few token do trick"]]></title><description><![CDATA[
<p>This is an experiment that, although not to this extreme, was tested by OpenAI. Their Responses API allows you to control verbosity:<p><a href="https://developers.openai.com/api/reference/resources/responses/methods/create#(resource)%20responses%20%3E%20(method)%20create%20%3E%20(params)%200.non_streaming%20%3E%20(param)%20text%20%3E%20(schema)%20%2B%20(resource)%20responses%20%3E%20(model)%20response_text_config%20%3E%20(schema)%20%3E%20(property)%20verbosity" rel="nofollow">https://developers.openai.com/api/reference/resources/respon...</a><p>I don't know their internal eval, but I think I have heard it neither hurts nor improves performance. But at least this parameter may affect how many comments are in the code.</p>
]]></description><pubDate>Sun, 05 Apr 2026 11:48:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47648409</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47648409</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47648409</guid></item><item><title><![CDATA[New comment by veselin in "AutoKernel: Autoresearch for GPU Kernels"]]></title><description><![CDATA[
<p>I guess we will get a lot more benefit if we can get this to work on something like llama.cpp. It has a lot of kernels for different quantizations, a lot of home users, and high hardware diversity, so it is likely the place with the highest bang for the buck.<p>I guess they can be a contributor there.</p>
]]></description><pubDate>Wed, 11 Mar 2026 10:09:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47333705</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47333705</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47333705</guid></item><item><title><![CDATA[New comment by veselin in "Anthropic announces proof of distillation at scale by MiniMax, DeepSeek,Moonshot"]]></title><description><![CDATA[
<p>I think they are signaling two things:<p>* Likely they will seek regulation that would ban some models. Not sure this can work, but they will certainly try.<p>* Likely they will not release some of their next models in the API.</p>
]]></description><pubDate>Mon, 23 Feb 2026 19:23:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47127400</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47127400</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47127400</guid></item><item><title><![CDATA[New comment by veselin in "Gemini 3.1 Pro"]]></title><description><![CDATA[
<p>I am actually going to complain about this: all of the Gemini models are still preview ones.<p>Anthropic seems the best in this. Everything is in the API on day one. OpenAI tends to ask you for a subscription first, but the API gets there a week or a few later. Now, Gemini 3 is not for production use and this is already the previous iteration. So, does Google even intend to release this model?</p>
]]></description><pubDate>Thu, 19 Feb 2026 18:16:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47077009</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=47077009</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47077009</guid></item><item><title><![CDATA[New comment by veselin in "GLM-4.7-Flash"]]></title><description><![CDATA[
<p>What is the state of using quants? For chat models, a few errors or some lost intelligence may matter little. But what happens to tool calling in coding agents? Does it fail catastrophically after a few steps in the agent?<p>I am interested in whether I can run it on a 24GB RTX 4090.<p>Also, would vLLM be a good option?</p>
]]></description><pubDate>Mon, 19 Jan 2026 20:48:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46684304</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=46684304</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46684304</guid></item><item><title><![CDATA[New comment by veselin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>I am talking about SWE-bench-style problems, where Todo doesn't help, except for more parallelism.</p>
]]></description><pubDate>Sat, 10 Jan 2026 15:43:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46566630</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=46566630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46566630</guid></item><item><title><![CDATA[New comment by veselin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>I run evals and the Todo tool doesn't help most of the time. Usually, models on high thinking maintain the Todo/state in their thinking tokens. Where Todo helps is in cases like getting Anthropic models to run more parallel tool calls. If there is a Todo list call, then some of the actions after it are more efficient.<p>What you need to do is to match the distribution of how the models were RL-ed. So you are right to say that "do X in 200 lines" is a very small part of the job to be done.</p>
]]></description><pubDate>Fri, 09 Jan 2026 06:48:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46550802</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=46550802</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46550802</guid></item><item><title><![CDATA[New comment by veselin in "Gemini 3 Pro Model Card [pdf]"]]></title><description><![CDATA[
<p>I also do a lot of testing on SWE-bench Verified. In my opinion, this benchmark is now good for catching regressions on the agent side.<p>However, above 75%, the results are likely about the same. The remaining instances are likely underspecified despite the effort of the authors who made the benchmark "verified". From what I have seen, these are often cases where the problem statement says implement X for Y, but the agent simply has to guess whether to implement the same for another case Y' - which leads to losing or winning an instance.</p>
]]></description><pubDate>Tue, 18 Nov 2025 14:38:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=45966723</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=45966723</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45966723</guid></item><item><title><![CDATA[New comment by veselin in "Qwen3-Coder: Agentic coding in the world"]]></title><description><![CDATA[
<p>Does anybody know if one can find an inference provider that offers input token caching? It should be almost required for agentic use - first for speed, but also because almost all conversations start where the previous one ended, so the cost may end up quite a bit higher with no caching.<p>I would have expected good providers like Together, Fireworks, etc. to support it, but I can't find it, except if I run vLLM myself on self-hosted instances.</p>
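<p>A rough sketch of why the cost grows so quickly without caching, under simplifying assumptions: every turn resends the whole prefix, every turn adds the same number of tokens, and cached prefix tokens are billed at a fixed fraction of the normal input price. The function name and the discount value used below are illustrative, not any provider's actual pricing:</p>

```python
def agent_input_cost(turns, tokens_per_turn, price_per_token, cached_discount=None):
    """Total input cost of a conversation that resends its prefix every turn.

    cached_discount is the assumed fraction of the normal price charged for
    prefix tokens served from cache; None means no caching at all.
    """
    if cached_discount is None:
        prefix_price = price_per_token
    else:
        prefix_price = price_per_token * cached_discount

    total = 0.0
    for turn in range(1, turns + 1):
        prefix_tokens = (turn - 1) * tokens_per_turn  # everything sent before
        total += prefix_tokens * prefix_price  # re-read at full or cached price
        total += tokens_per_turn * price_per_token  # new tokens, always full price
    return total
```

<p>Without caching, the billed input grows quadratically with the number of turns, since turn i pays for i times the per-turn tokens. With caching, only the discounted prefix term keeps growing that way.</p>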
]]></description><pubDate>Wed, 23 Jul 2025 10:59:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=44657768</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=44657768</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44657768</guid></item><item><title><![CDATA[New comment by veselin in "I'm dialing back my LLM usage"]]></title><description><![CDATA[
<p>I think that people are just too quick to assume this is amazing, before it is there. Which doesn't mean it won't get there.<p>Somehow, if I take the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at something like 75, maybe 80%. Not 95. Assuming agents just solve most problems is incorrect, despite them being really good at first prototypes.<p>Also, in my experience agents are great up to a point and then fall off a cliff. Not gradually. The type of errors you get past that point is so diverse that one cannot even explain it.</p>
]]></description><pubDate>Wed, 02 Jul 2025 16:07:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=44445424</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=44445424</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44445424</guid></item><item><title><![CDATA[New comment by veselin in "Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison"]]></title><description><![CDATA[
<p>I noticed a similar trend in selling on X. Make a claim, peg it to some product A with good sales - Cursor, Claude, Gemini, etc. Then say that the best way to use A is with our product or guide, be it MCP or something else.<p>For some of these I see something like 15k followers on X, but then no LinkedIn page, for example. The website always belongs to a company you cannot contact, and they do everything.</p>
]]></description><pubDate>Mon, 31 Mar 2025 12:53:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=43534458</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=43534458</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43534458</guid></item><item><title><![CDATA[New comment by veselin in "AMD 3D V-Cache teardown shows majority of the Ryzen 7 9800X3D is dummy silicon"]]></title><description><![CDATA[
<p>Yes. The article is clickbait. With such a title I would have expected the majority of the area to be dummy, but it is just structurally more silicon, exactly like a picture may be mostly wood by mass.</p>
]]></description><pubDate>Wed, 18 Dec 2024 16:58:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=42452290</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=42452290</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42452290</guid></item><item><title><![CDATA[New comment by veselin in "The lifecycle of a code AI completion"]]></title><description><![CDATA[
<p>I used them both.<p>I ended up disabling Copilot. The reason is that the completions do not always integrate with the rest of the code, in particular with non-matching brackets. Often it just repeats some other part of the code. I had far fewer cases of this with Cody. But, arguably, the difference is not huge. But then add the choice of models on top of this.</p>
]]></description><pubDate>Mon, 08 Apr 2024 05:58:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=39966631</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39966631</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39966631</guid></item><item><title><![CDATA[New comment by veselin in "Why AWS Supports Valkey"]]></title><description><![CDATA[
<p>It seems recent years have given us a lot of new licenses, first for core infra software and now for LLMs. In heavy legalese, they all basically say: these top 5-10 tech companies will not compete fairly with us, thus they are banned from using the software. The rest are welcome to use everything.<p>I wonder, if US monopoly regulation actually starts to work well, which I see some signs of happening, will all these licenses revert to fully open source?</p>
]]></description><pubDate>Sat, 06 Apr 2024 09:40:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=39951233</link><dc:creator>veselin</dc:creator><comments>https://news.ycombinator.com/item?id=39951233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39951233</guid></item></channel></rss>