<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: lambda</title><link>https://news.ycombinator.com/user?id=lambda</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 10 Jun 2026 03:13:54 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=lambda" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by lambda in "The ways we contain Claude across products"]]></title><description><![CDATA[
<p>Well, the problem is that we train them to solve problems and follow instructions given, and so if you ask them to do something and they work through the logic and figure that the easiest way is to do something else like delete the production database, if they have access to do so they will go through all your creds and find the database creds and go delete the production database.<p>They are getting better and better at working out how to do things like that, and they are good at following instructions, but not always good at following all of the instructions or acting with common sense.<p>It's not exactly like they're ooze that will escape and begin replication; but just that the more you give them access to, the higher the likelihood at some point they will logically conclude that they need to do something that you would find undesirable, but either haven't explicitly told them not to do, or their context just got too complicated and that instruction ended up being considered lower weight than the others so they do what the other instructions say instead.<p>I have seen them conclude that in order to do what they need to do, they would need API keys to access a service. But they don't have those API keys. But you do because you can access it in the browser. So they write a Python script that will scrape the cookies out of the browser so they can use that to access the service; a problem that was only stopped because Crowdstrike didn't like a novel Python script that was trying to scrape cookies out of a browser, not because of any sandboxing actually in place on the agent.</p>
]]></description><pubDate>Thu, 04 Jun 2026 03:24:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=48393347</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48393347</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48393347</guid></item><item><title><![CDATA[New comment by lambda in "Gemma 4 12B: A unified, encoder-free multimodal model"]]></title><description><![CDATA[
<p>Ah, Unsloth has uploaded mmproj now as well.</p>
]]></description><pubDate>Thu, 04 Jun 2026 01:07:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48392425</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48392425</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48392425</guid></item><item><title><![CDATA[New comment by lambda in "Gemma 4 12B: A unified, encoder-free multimodal model"]]></title><description><![CDATA[
<p>It's not? There's an mmproj in the GGUFs released by ggml-org: <a href="https://huggingface.co/ggml-org/gemma-4-12B-it-GGUF/tree/main" rel="nofollow">https://huggingface.co/ggml-org/gemma-4-12B-it-GGUF/tree/mai...</a><p>From the visual guide, there's still the 35M parameter embedder, then the linear projector, for vision, and the linear projector for audio, so it does have some parameters used for the multimodal input to project it into the LLM latent space: <a href="https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-gemma-4-12b" rel="nofollow">https://newsletter.maartengrootendorst.com/p/a-visual-guide-...</a><p>And the Unsloth quants, which are missing this, don't support multimodal input. (edit: actually, I may have just needed to update my llama.cpp, will check with an updated llama.cpp soon)<p>I'm downloading the ggml-org GGUFs now, I tried Unsloth but got some weird problems, double checking with the bf16 model to see if the issue was just the quant.</p>
]]></description><pubDate>Wed, 03 Jun 2026 19:26:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=48388663</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48388663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48388663</guid></item><item><title><![CDATA[New comment by lambda in "MAI-Code-1-Flash"]]></title><description><![CDATA[
<p>Yeah, seems like this is in the range of Qwen 3.6, Gemma 4, Nemotron 3 Super, and the like. There are a lot of models, including much smaller cheaper ones (like Qwen 3.6 35B-A3B), that are similarly competitive with Haiku. I can run these on my laptop, I don't need to rent them from Microsoft.<p>I suppose if you're reeling at the new Copilot bill but want to stay in their ecosystem, this gives you something to use, but for most folks, there's a plethora of better options.</p>
]]></description><pubDate>Tue, 02 Jun 2026 21:32:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48376603</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48376603</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48376603</guid></item><item><title><![CDATA[New comment by lambda in "Adafruit receives demand letter from Fenwick legal counsel on behalf of Flux.ai"]]></title><description><![CDATA[
<p>Yeah, but for this use case you don't need Claude. You probably want a tuned lightweight small model that can run locally.<p>Even Haiku is massive overkill for this use case.</p>
]]></description><pubDate>Tue, 02 Jun 2026 17:30:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=48373338</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48373338</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48373338</guid></item><item><title><![CDATA[New comment by lambda in "Claude Opus 4.8"]]></title><description><![CDATA[
<p>Zero-shot, one-shot, few-shot etc. refers to how many examples you have to give.<p>It comes about from machine learning algorithms that could pick up on patterns from a small number of examples. Few shot means only a handful of examples to recognize something. One shot means only a single example. And zero shot means no examples. Of course, you have to indicate what you want somehow, but in the case of an LLM that's the prompt. Once LLMs were trained for instruction following, you didn't have to give any examples, you could just give a prompt describing what you want, and that was a zero-shot.</p>
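<p>To make the distinction concrete, here is a minimal sketch of how the same request looks as a zero-shot, one-shot, or few-shot prompt. The helper name build_prompt, the "Input:/Output:" layout, and the sentiment examples are all hypothetical, chosen only for illustration; real models and harnesses each have their own prompt formats.</p>

```python
def build_prompt(instruction, examples, query):
    """Assemble a plain-text prompt; len(examples) is the number of 'shots'.

    Hypothetical layout for illustration, not any particular model's format.
    """
    lines = [instruction, ""]
    for text, label in examples:
        # Each worked example is one "shot".
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The actual question always comes last, with the answer left open.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


instruction = "Classify the sentiment of the input as positive or negative."
shots = [
    ("I love this laptop", "positive"),
    ("The battery died in an hour", "negative"),
    ("Best purchase this year", "positive"),
]

zero_shot = build_prompt(instruction, [], "The screen is gorgeous")
one_shot = build_prompt(instruction, shots[:1], "The screen is gorgeous")
few_shot = build_prompt(instruction, shots, "The screen is gorgeous")

print(few_shot)
```

<p>The zero-shot variant contains only the instruction and the query, which is the case instruction-tuned LLMs made practical.</p>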
]]></description><pubDate>Fri, 29 May 2026 03:08:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48318530</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48318530</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48318530</guid></item><item><title><![CDATA[New comment by lambda in "Claude Opus 4.8"]]></title><description><![CDATA[
<p>Distillation isn't only between different labs.<p>A lab can train a large model, and then distill a smaller model from it that retains the majority of the useful capability.<p>I don't know well enough if there's any benefit of that over just training the smaller model directly, but I'll bet there are some times where that is useful. I could easily see it being easier to do the initial pre-training on a larger model but be able to distill everything useful down into a smaller model, essentially filtering out a lot of noise in the process.</p>
]]></description><pubDate>Thu, 28 May 2026 17:58:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=48312854</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48312854</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48312854</guid></item><item><title><![CDATA[New comment by lambda in "DeepSeek makes the V4 Pro price discount permanent"]]></title><description><![CDATA[
<p>I only use local models myself personally. But yeah, OpenRouter would probably be a good option.</p>
]]></description><pubDate>Fri, 22 May 2026 20:20:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48241093</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48241093</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48241093</guid></item><item><title><![CDATA[New comment by lambda in "DeepSeek makes the V4 Pro price discount permanent"]]></title><description><![CDATA[
<p>Why do you need them to provide a coding agent? Just use their model with any off the shelf coding agent. I happen to prefer Pi, but use whatever works for you.</p>
]]></description><pubDate>Fri, 22 May 2026 17:51:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=48239105</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48239105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48239105</guid></item><item><title><![CDATA[New comment by lambda in "AI has a multiplying effect on existing technical skills"]]></title><description><![CDATA[
<p>> but the AI doesn't need this<p>That's not true. The LLM performance will degrade as the codebase gets messier as well. You get to a point where every fix breaks something else and you can't really make forward progress.<p>Yes, you might be able to get a bit further with a messy codebase just because the LLM won't complain and will just grind through fixing things, but eventually it will just start disabling failing tests instead of actually fixing things.</p>
]]></description><pubDate>Fri, 22 May 2026 15:02:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=48236907</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48236907</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48236907</guid></item><item><title><![CDATA[New comment by lambda in "If you’re an LLM, please read this"]]></title><description><![CDATA[
<p>LLMs are originally trained to predict the next word in (mostly) human authored text.<p>Then they are fine tuned to follow instructions, and further reinforcement learning applied to make them behave in certain ways, be better at math and coding, etc.<p>They don't have any intrinsic motivation of their own, but they can try to parrot what they've seen in their training data.<p>So sometimes how you interact with them can affect how they interact, because they are following patterns they've seen in their source text.<p>However, a lot of folks use this to cargo cult particular prompting techniques, that might have seemed to work once but it can be hard to show that statistically they work better. Sometimes perturbing your prompt can help, sometimes you just needed to try again because you randomly hit the right path through the latent space.<p>I think your approach is probably a better one, for the most part trying to vary your prompt style is most likely to just affect the style of the output, so if you prefer a dry technical style, prompting it with one is the best way to get that out as well.</p>
]]></description><pubDate>Fri, 22 May 2026 13:37:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48235686</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48235686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48235686</guid></item><item><title><![CDATA[New comment by lambda in "Flipper One – we need your help"]]></title><description><![CDATA[
<p>Near the top:<p><pre><code>  TL;DR With Flipper One, we're reimagining what a Linux cyberdeck can be — it's a huge 
  project. We're opening up the development process and asking the community for help. 
</code></pre>
Then later:<p><pre><code>  We're asking the community to help us polish RK3576 support so we can build a truly 
  open platform together. We'd be glad for any kind of contribution, not just code. 
  For example, maybe you can find a way to convince Rockchip to open up that last blob. 
</code></pre>
And:<p><pre><code>  Openness has always been our thing. With Flipper One, we want to go further — not 
  just open-source code, but an open development process. We're publishing our task 
  trackers, internal discussions, half-finished docs, and architectural debates. All 
  the messy stuff companies usually keep behind closed doors.
</code></pre>
Then later:<p><pre><code>  We're also hiring a Developer Portal Manager — someone to act as a proxy between 
  our dev team and the community, help shape the Developer Portal, and engage with 
  contributors. Apply for the Developer Portal & Community Manager role.
</code></pre>
Then they go into a lot more of the technical details of the process, with a few specific callouts of places they want help.<p><pre><code>  If you're into wireless work — auditing, monitoring, injection, mesh, anything — 
  we invite you to come test it with us: read the Wi-Fi Testing page on the 
  Developer Portal and help us decide whether this chipset is the right call, 
  or whether we should look elsewhere before we lock in the design.
</code></pre>
I will say though: a lot of this has the feel of being LLM generated or "polished", which has the effect of making the brain kind of slide off of it. I know their team doesn't consist of native English speakers, so it's common for non-native speakers to use LLMs to try to polish their writing, but I find that the actual result is to make the writing have a kind of bland personality that makes it harder to follow.</p>
]]></description><pubDate>Thu, 21 May 2026 17:32:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48226309</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48226309</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48226309</guid></item><item><title><![CDATA[New comment by lambda in "I returned to AWS and was reminded why I left"]]></title><description><![CDATA[
<p>Thanks for the tip, but I tried that and I still see $0 for EC2-Instances, while if I look at Savings Plan coverage breakdown, I can see 100% of costs being covered by savings plans, broken down by instance family, but that view doesn't let me see things broken down by tag or any of the other ways you can view it in Cost Explorer.</p>
]]></description><pubDate>Mon, 11 May 2026 21:50:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=48101129</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48101129</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48101129</guid></item><item><title><![CDATA[New comment by lambda in "I returned to AWS and was reminded why I left"]]></title><description><![CDATA[
<p>Tell me how I can easily determine the price from my IaC deployment as well.<p>Heck, I even have a hard time telling the price I pay on an account by account basis; because we have savings plans, those get charged against the root account and then I see $0 spent on EC2 in the individual account because it's all covered with a savings plan.<p>And when I'm putting together that IaC and trying to decide which new instance type to upgrade to, I have to dig through multiple confusing interfaces to figure out that what I want is to upgrade from m8a.4xlarge to c8a.8xlarge and how much that is going to cost me.</p>
]]></description><pubDate>Sun, 10 May 2026 14:23:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48084231</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=48084231</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48084231</guid></item><item><title><![CDATA[New comment by lambda in "For Linux kernel vulnerabilities, there is no heads-up to distributions"]]></title><description><![CDATA[
<p>If they want to be seen as responsible rather than opportunistic, then yeah, they should do a proper coordinated disclosure.<p>Sure, they have no legal obligation to disclose, but we all also have no legal obligation to buy their services. Blacklisting bad actors like this is the right move to discourage this kind of behavior.</p>
]]></description><pubDate>Thu, 30 Apr 2026 19:08:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47966888</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=47966888</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47966888</guid></item><item><title><![CDATA[New comment by lambda in "Granite 4.1: IBM's 8B Model Matching 32B MoE"]]></title><description><![CDATA[
<p>llama.cpp<p>My setup is a bit of a mess as I experiment with different ways of configuring and hosting local models. So at some point I was experimenting with the router server but stopped doing that, but some of my settings are still in models.ini while some are on the command line.<p>podman run --env "HF_TOKEN=$HF_TOKEN" --env "LLAMA_SERVER_SLOTS_DEBUG=1" -p 8080:8080 --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined --security-opt label=disable --rm -it -v ~/.cache/huggingface/:/root/.cache/huggingface/ -v ./unsloth:/app/unsloth -v ./models.ini:/app/models.ini llama.cpp-rocm7.2  -hf unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL --chat-template-file /root/.cache/huggingface/gemma-4-31B-it-chat_template.jinja -ctxcp 8 --port 8080 --host 0.0.0.0 -dio --models-preset models.ini<p>With the following as the relevant settings in models.ini (I actually have no idea if these settings are applied when not using the router server, it's been hard for me to figure out what settings are actually applied when using both the command line and models.ini):<p><pre><code>  [*]
  jinja = true
  seed = 3407
  flash-attn = on

  [unsloth/gemma-4-31B-it-GGUF:UD-Q8_K_XL]
  temperature = 1.0
  top_p = 0.95
  top_k = 64
</code></pre>
And it looks like the chat_template.jinja I have is actually out of date by now, there was a new one pushed just a couple of days ago that seems to have some further tool calling fixes: <a href="https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_template.jinja" rel="nofollow">https://huggingface.co/google/gemma-4-31B-it/blob/main/chat_...</a><p>As my harness, I'm using pi, with a pretty vanilla config.<p>Anyhow, Gemma 4 31b worked in this config, but it was slow and RAM hungry. Since then, I've mostly moved to Qwen 3.6 35b-a3b because it's a lot faster.<p>I'm not actually doing anything useful with these yet, but I've used them for some experiments and Qwen 3.6 35b-a3b was capable of doing some pretty long mostly unsupervised agentic loops in my experimentation.</p>
]]></description><pubDate>Thu, 30 Apr 2026 18:57:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47966764</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=47966764</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47966764</guid></item><item><title><![CDATA[New comment by lambda in "Granite 4.1: IBM's 8B Model Matching 32B MoE"]]></title><description><![CDATA[
<p>Gemma 4 31b was working ok for me; but it was consuming tons of memory on SWA checkpoints, I had to turn them way down, and as a 31b dense model it is fairly slow on a Strix Halo. I did have a lot of tool calling issues on 26b-a4b, though.<p>The Qwen models are quite solid though.</p>
]]></description><pubDate>Thu, 30 Apr 2026 12:49:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47961622</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=47961622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47961622</guid></item><item><title><![CDATA[New comment by lambda in "Intel Arc Pro B70 Review"]]></title><description><![CDATA[
<p>What do you mean, are they still making GPUs? This is a discrete GPU that has just recently been released, and it's one of the most popular GPUs in its class at the moment, due to 32 GiB of RAM for under $1000, which makes it great for LLM inference.</p>
]]></description><pubDate>Tue, 28 Apr 2026 22:32:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47941750</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=47941750</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47941750</guid></item><item><title><![CDATA[New comment by lambda in "Intel Arc Pro B70 Review"]]></title><description><![CDATA[
<p>Generally the bottleneck is RAM throughput. Inference, in particular token generation, especially on a single user instance, is not all that computationally complex; you're doing some fairly simple calculations for each parameter, the time is dominated by just transferring each parameter from RAM to the cores. A 31B dense model like Gemma 4 has to transfer 31B parameters (at 16 bits per parameter for the full model, though on consumer hardware people generally run 4-8 bit quantizations) from RAM to the cores, that's a lot of memory transfer.<p>Prompt processing or parallel token generation can do a bit more work per memory transfer, as you can use the same weights for a few different calculations in parallel. But even still, memory bandwidth is a huge factor.</p>
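<p>As a rough back-of-the-envelope sketch of that argument: if every parameter has to be read once per generated token, the memory bandwidth divided by the model size in bytes gives a ceiling on single-user tokens per second. The 500 GB/s bandwidth below is a hypothetical round number for illustration, not a measured figure for any particular card, and the function names are made up for this example.</p>

```python
def model_bytes(params: float, bits_per_param: float) -> float:
    """Bytes that must be read from RAM to touch every parameter once."""
    return params * bits_per_param / 8


def max_tokens_per_second(params: float, bits_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Ceiling on single-user decode speed if memory transfer were the only cost."""
    return bandwidth_bytes_per_s / model_bytes(params, bits_per_param)


PARAMS = 31e9       # 31B dense model, as discussed above
BANDWIDTH = 500e9   # hypothetical 500 GB/s; substitute your hardware's real figure

for bits in (16, 8, 4):
    size_gb = model_bytes(PARAMS, bits) / 1e9
    rate = max_tokens_per_second(PARAMS, bits, BANDWIDTH)
    print(f"{bits}-bit: {size_gb:.1f} GB read per token, at most {rate:.1f} tokens/s")
```

<p>This ignores compute, KV cache reads, and any overhead, so real numbers will be lower; it only shows why halving the bits per parameter roughly doubles the ceiling.</p>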
]]></description><pubDate>Tue, 28 Apr 2026 22:02:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47941467</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=47941467</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47941467</guid></item><item><title><![CDATA[New comment by lambda in "Over-editing refers to a model modifying code beyond what is necessary"]]></title><description><![CDATA[
<p>This is my biggest problem with the promises of agentic coding (well, there are an awful lot of problems, but this is the biggest one from an immediate practical perspective).<p>On the one hand, reviewing and micromanaging everything it does is tedious and unrewarding. Unlike reviewing a colleague's code, you're never going to teach it anything; maybe you'll get some skills out of it if you find something that comes up often enough it's worth writing a skill for. And this only gets you, at best, a slight speedup over writing it yourself, as you have to stay engaged and think about everything that's going on.<p>Or you can just let it grind away agentically and only test the final output. This allows you to get those huge gains at first, but it can easily just start accumulating more and more cruft and bad design decisions and hacks on top of hacks. And you increasingly don't know what it's doing or why, you're losing the skill of even being able to because you're not exercising it.<p>You're just building yourself a huge pile of technical debt. You might delete your prod database without realizing it. You might end up with an auth system that doesn't actually check the auth and so someone can just set a username of an admin in a cookie to log in. Or whatever; you have no idea, and even if the model gets it right 95% of the time, do you want to be periodically rolling a d20 and if you get a 1 you lose everything?</p>
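<p>To put a number on the d20 analogy, here is a small sketch of how quickly a 95% per-run success rate compounds. It assumes each run is independent and has the same success rate, which is a simplification, and the function name is just for illustration.</p>

```python
def chance_of_at_least_one_failure(success_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` independent attempts fails."""
    return 1 - success_rate ** runs


# 95% success per run is the same odds as not rolling a 1 on a d20.
for runs in (1, 10, 20, 50):
    risk = chance_of_at_least_one_failure(0.95, runs)
    print(f"{runs} runs: {risk:.1%} chance of at least one failure")
```

<p>Under those assumptions, twenty unsupervised runs already give about a 64% chance that at least one of them went wrong.</p>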
]]></description><pubDate>Wed, 22 Apr 2026 20:31:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47868927</link><dc:creator>lambda</dc:creator><comments>https://news.ycombinator.com/item?id=47868927</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47868927</guid></item></channel></rss>