<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: twotwotwo</title><link>https://news.ycombinator.com/user?id=twotwotwo</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 21 Apr 2026 12:22:53 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=twotwotwo" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by twotwotwo in "Kimi K2.6: Advancing open-source coding"]]></title><description><![CDATA[
<p>Kagi has it as an option in its Assistant thing, where there is naturally a lot of searching and summarizing results. I've liked its output there and in general when asked for prose that isn't in the list/Markdown-heavy "LLM style." It's hard to do a confident comparison, but it's seemed bold in arranging the output to flow well, even when that took surgery on the original doc(s). Sometimes the surgery's needed e.g. to connect related ideas the inputs treated as separate, or to ensure it really replies to the request instead of just dumping info that's somehow related to it.</p>
]]></description><pubDate>Mon, 20 Apr 2026 17:02:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47837202</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47837202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47837202</guid></item><item><title><![CDATA[New comment by twotwotwo in "Are the costs of AI agents also rising exponentially? (2025)"]]></title><description><![CDATA[
<p>You could model more of the process: the dev's work as well as the model's, and the cost of catching a bug later or deploying it live. Those costs push me further towards smaller tasks in general. (And they make the Gas Town type stuff seem more baffling.)<p>- Smaller chunks make review much easier and more effective at finding bugs, as we've known since long before LLMs.<p>- Greater certainty provides a better development experience. I've heard people talk about how LLM development can be tiring. One way that happens, I think, is the win-or-lose drama of feeding in huge tasks with a substantial chance of failure. I think if you're succeeding 95% of the time instead of 70%, and the 5% are easier to deal with (smaller chunks to debug), it's a better experience.<p>- Everything is harder about real-world tasks because they aren't clean verifiable-reward benchmarks. Developers have context that models don't, so it's common that a problem traces to a detail not in the spec where the model guessed wrong. For real-world tasks "failures" are also sometimes "that UI is bad" or "that way of coding it is hard to maintain." And it's possible to have problems the dev simply doesn't notice. The benchmarks' fully computer-checkable outcomes are 'easy mode' compared to the real world.<p>- Fixing agents' mess becomes more work as task sizes increase. (Like the certainty thing, but about cost in hours rather than the experience.) Again, if the model has spat out 1000 lines and stumped itself debugging a failure, it'll take you some time to figure out: more time than debugging a 250-line patch, <i>and</i> the larger patch is more likely to have bugs. And if a bug makes it out to peer review, you can add communication and context-switching cost (point out bug, fix, re-review) on top of that.<p>- Bugs that reach prod are <i>really</i> expensive. More of a problem when a prod bug can lose you customers vs., say, on most hobby things.
Ord's post gestures at it: there are "cases where failure is much worse than not having tried at all." That magnifies how important it is that review be good, and how much of a problem bugs that sneak through are, which points towards doing smaller chunks.<p>How significant each factor is depends on details: how easy the task is to verify, how well-specified it is (and more generally how much it's in the models' wheelhouse, and how much in mine), how bad a bug would be (fun thing? internal tool? user facing? can lose data?).<p>I think the dynamics above apply across a range of model strengths, but that doesn't mean the changes from, say, Sonnet 3.7 to Opus 4.5 didn't mean anything; the machine getting better at getting the info it needs and checking itself still helps at shorter task lengths. Harness improvements can help too, e.g. keeping models out of the 'too much context, model got silly' zone (it may be less severe than it once was, but I suspect it will remain a thing), building better context, and cleaning up code as well as spitting results out.<p>Besides taking more of your time up front, involving yourself more also tends to drift towards you making more of the lower-level decisions about how the code will look, which I find double-edged. You have better broad context, and you know what you find maintainable. But the implementer, model or another person, is closer to the code, which helps it make some mid-to-low-level decisions well.<p>Plan modes and Spec-Kit type things can help with the balance of getting involved but letting the model do its thing. I've liked asking the LLM to ask a lot of questions and surface doubts. A colleague messed with Spec-Kit so it would pick one change on its fine-grained to-do list at a time, which is a neat hack I'd like to try sometime.</p>
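The kind of modeling the first paragraph gestures at can be put as a toy expected-cost calculation. Every rate and hour figure below is a made-up assumption for illustration, not a measurement:

```python
# Toy expected-cost model for chunking work into LLM-sized tasks.
# All numbers are illustrative assumptions, not measurements.

def expected_hours(n_chunks, success_rate, review_h, debug_h,
                   prod_bug_rate, prod_bug_h):
    """Expected human hours to land a feature split into n_chunks tasks."""
    # Per chunk: review time, plus debugging time weighted by failure odds.
    per_chunk = review_h + (1 - success_rate) * debug_h
    # Bugs that sneak past review into prod are rare but expensive.
    escaped = n_chunks * (1 - success_rate) * prod_bug_rate * prod_bug_h
    return n_chunks * per_chunk + escaped

# One big task: lower success rate, costlier review and debugging.
big = expected_hours(n_chunks=1, success_rate=0.70, review_h=2.0,
                     debug_h=4.0, prod_bug_rate=0.10, prod_bug_h=8.0)
# Four small tasks: higher success rate, cheap review/debug each.
small = expected_hours(n_chunks=4, success_rate=0.95, review_h=0.4,
                       debug_h=0.8, prod_bug_rate=0.02, prod_bug_h=8.0)
print(f"big: {big:.2f}h  small: {small:.2f}h")
```

With these particular made-up numbers the small-chunk strategy wins; the point is only that review cost, failure rate, and escaped-bug cost all enter the total, so they can be traded off explicitly.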
]]></description><pubDate>Sat, 18 Apr 2026 18:06:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47818053</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47818053</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47818053</guid></item><item><title><![CDATA[New comment by twotwotwo in "Dependency cooldowns turn you into a free-rider"]]></title><description><![CDATA[
<p>The topic of cooldowns just shifting the problem around got some discussion on an earlier post about them -- what I said there is at <a href="https://lobste.rs/s/rygog1/we_should_all_be_using_dependency#c_crmfmf" rel="nofollow">https://lobste.rs/s/rygog1/we_should_all_be_using_dependency...</a> and here's something similar:<p>- One idea is for projects not to update each dep just X hours after release, but on their own cycles, every N weeks or such. Someone still gets bit first, of course, but not everyone at once, and for those doing it, any upgrade-related testing or other work also ends up conveniently batched.<p>- Developers legitimately vary in how much they value getting the newest and greatest vs. minimizing risk. Similar logic to some people taking beta versions of software. A brand new or hobby project might take the latest version of something; a big project might upgrade occasionally and apply a strict cooldown. For users' sake, there's value in the projects that do get bit not being the widely-used ones!<p>- Time (independent of usage) does catch <i>some</i> problems. A developer realizes they were phished and reports it, for example, or the issue is caught by someone looking at a repo or commit stream.<p>As I lamented in the other post, it's unfortunate that merely using an upgraded package for a test run often exposes a bunch of a project's keys and so on. There are more angles to attack this from than solely when to upgrade packages.</p>
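The "own cycles, every N weeks" policy can be sketched as a small predicate: take a release only once it has aged past a cooldown and your scheduled upgrade window has opened. The function name and the specific cooldown/cycle values are illustrative, not any real tool's behavior:

```python
# Sketch of an "upgrade on your own cycle" dependency policy.
# Policy parameters and function name are illustrative assumptions.
from datetime import date, timedelta

def should_upgrade(release_date, today, last_upgrade,
                   cooldown_days=14, cycle_weeks=6):
    """Take a new release only if it has aged AND our window is open."""
    aged = (today - release_date) >= timedelta(days=cooldown_days)
    window_open = (today - last_upgrade) >= timedelta(weeks=cycle_weeks)
    return aged and window_open

# Release is 19 days old and it's been 50 days since our last upgrade:
print(should_upgrade(date(2026, 4, 1), date(2026, 4, 20), date(2026, 3, 1)))
# Release is only 5 days old, so the cooldown blocks it:
print(should_upgrade(date(2026, 4, 15), date(2026, 4, 20), date(2026, 3, 1)))
```

The batching benefit falls out of the second condition: every dependency that has cleared its cooldown gets picked up in the same window, so upgrade testing happens once per cycle rather than per release.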
]]></description><pubDate>Wed, 15 Apr 2026 04:38:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47774739</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47774739</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47774739</guid></item><item><title><![CDATA[New comment by twotwotwo in "Claude mixes up who said what"]]></title><description><![CDATA[
<p>There is nothing specific to the role-switching here (as opposed to other mistakes), but I also notice them sometimes 1) realizing mistakes with "-- wait, that won't work" even mid-tool-call and 2) torquing a sentence around to maintain continuity after saying something wrong (amusingly blaming "the OOM killer's cousin" for a process dying, probably after outputting "the OOM killer" then recognizing it was ruled out).<p>Especially when thinking's off they can sometimes start with a wrong answer then talk their way around to the right one, but never quite acknowledge the initial answer as wrong, trying to finesse the correction as a 'well, technically' or refinement.<p>Anyhow, there are subtleties, but I wonder about giving these things a "restart sentence/line" mechanism. It'd make the '--wait,' or doomed tool-call situations more graceful, and provide a 'face-saving' out after a reply starts off incorrect. (It also potentially creates a sort of backdoor thinking mechanism in the middle of non-thinking replies, but maybe that's a feature.) Of course, we'd also need to get it to recognize "wait, I'm the assistant, not the user" for it to help here!</p>
]]></description><pubDate>Thu, 09 Apr 2026 18:43:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47707867</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47707867</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47707867</guid></item><item><title><![CDATA[New comment by twotwotwo in "Claude mixes up who said what"]]></title><description><![CDATA[
<p>I agree with the addition at the end -- I think this is a model limitation not a harness bug. I've seen recent Claudes act confused about who they are when deep in context, like accidentally switching to the voice of the authors of a paper it's summarizing without any quotes or an indication it's a paraphrase ("We find..."), or amusingly referring to "my laptop" (as in, Claude's laptop).<p>I've also seen it with older or more...chaotic? models. Older Claude got confused about who suggested an idea later in the chat. Gemini put a question 'from me' in the middle of its response and went on to answer, and once decided to answer a factual social-science question in the form of an imaginary news story with dateline and everything. It's a tiny bit like it forgets its grounding and goes base-model-y.<p>Something that might add to the challenge: models are already supposed to produce user-like messages to subagents. They've always been expected to be able to switch personas to some extent, but now even within a coding session, "always write like an assistant, never a user" is not necessarily a heuristic that's always right.</p>
]]></description><pubDate>Thu, 09 Apr 2026 18:32:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47707661</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47707661</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47707661</guid></item><item><title><![CDATA[New comment by twotwotwo in "What makes Intel Optane stand out (2023)"]]></title><description><![CDATA[
<p>One potential application I briefly had hope for was really good power loss protection in front of a conventional Flash SSD. You only need a little compared to the overall SSD capacity to be able to correctly report the write was persisted, and it's always running, so there's less of a 'will PLP work when we really need it?' question. (Maybe there's some use as a read cache too? Host RAM's probably better for that, though.) It's going to be rewritten lots of times, but it's supposed to be ready for that.<p>It seems like there's a very small window, commercially, for new persistent memories. Flash throughput scales really cost-efficiently, and a lot is already built around dealing with the tens-of-microseconds latencies (or worse--networked block storage!). Read latencies you can cache your way out of, and writers can either accept commit latency or play it a little fast and loose (count a replicated write as safe enough or...just not be safe). You have to improve on Flash by enough to make it worth the leap while remaining cheaper than other approaches to the same problem, and you have to be confident enough in pulling it off to invest a ton up front. Not easy!</p>
]]></description><pubDate>Sun, 15 Mar 2026 17:16:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47389498</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47389498</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47389498</guid></item><item><title><![CDATA[New comment by twotwotwo in "Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs"]]></title><description><![CDATA[
<p>This is fascinating, and makes me wonder what other things that 'should' be impossible might just be waiting for the right configuration to be tried.<p>For example, we take for granted that the append-only context model of LLMs is necessary: all you can do is append, and anything that changes the beginning requires recomputing whatever comes after it. And that does match how training works.<p>But all <i>sorts</i> of things would become possible if you could shift things in and out of context without recomputing it all; conservatively you could avoid compaction, optimistically it might be a way to get info to the model that's both more deeply integrated than search and more efficient than training larger and larger models.</p>
]]></description><pubDate>Wed, 11 Mar 2026 05:13:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47331935</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47331935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47331935</guid></item><item><title><![CDATA[New comment by twotwotwo in "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"]]></title><description><![CDATA[
<p>For folks that like this kind of question, SimpleBench (<a href="https://simple-bench.com/" rel="nofollow">https://simple-bench.com/</a> ) is sort of neat. From the sample questions (<a href="https://github.com/simple-bench/SimpleBench/blob/main/simple_bench_public.json" rel="nofollow">https://github.com/simple-bench/SimpleBench/blob/main/simple...</a> ), a common pattern seems to be for the prompt to 'look like' a familiar/textbook problem (maybe with detail you'd need to solve a physics problem, etc.) but to get the actually-correct answer you have to ignore what the format appears to be hinting at and (sometimes) pull in some piece of human common sense.<p>I'm not sure how effectively it isolates a single dimension of failure or (in)capacity--it seems like it's at least two distinct skills to 1) ignore false cues from question format when there's in fact a crucial difference from the template and 2) to reach for relevant common sense at the right times--but it's sort of fun because that <i>is</i> a genre of prompt that seems straightforward to search for (and, as here, people stumble on organically!).</p>
]]></description><pubDate>Mon, 16 Feb 2026 18:39:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47038514</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=47038514</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47038514</guid></item><item><title><![CDATA[New comment by twotwotwo in "Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation"]]></title><description><![CDATA[
<p>Yeah, this(-ish): there are shipping models that don't eliminate N^2 (if a model can repeat your code back with edits, it needs to reference everything <i>somehow</i>), but still change the picture a lot when you're thinking about, say, how resource-intensive a long-context coding session is.<p>There are other experiments where model designers mix full-attention layers with limited-memory ones. (Which still doesn't avoid N^2, but if e.g. 3/4 of layers use 'light' attention, it still improves efficiency a lot.) The idea is the model can still pull information from far back in context, just not in every layer. Use so far is limited to smaller models (maybe it costs too much model capability to use at the high end?) but it seems like another interesting angle on this stuff.</p>
]]></description><pubDate>Thu, 05 Feb 2026 06:58:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46896582</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46896582</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46896582</guid></item><item><title><![CDATA[New comment by twotwotwo in "Exe.dev"]]></title><description><![CDATA[
<p>FWIW, here are (mostly) their agent's tips for other agents from exploring a mostly-new system including tidbits like how to get recent Node: <a href="https://s3.us-east-1.amazonaws.com/1FV6XMQKP2T0D9M8FF82-cache/exedev-AGENTS.md" rel="nofollow">https://s3.us-east-1.amazonaws.com/1FV6XMQKP2T0D9M8FF82-cach...</a><p>It's very much a snapshot of what happens to come on a new VM today, and I put a little disclaimer in it to try to help tools get unstuck if anything there proves to be outdated or a flat-out (accidental) lie.</p>
]]></description><pubDate>Sat, 27 Dec 2025 21:30:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46405433</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46405433</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46405433</guid></item><item><title><![CDATA[New comment by twotwotwo in "Exe.dev"]]></title><description><![CDATA[
<p>I have played with it and it's so easy to get started with that now I <i>want</i> a quick-project idea as an excuse to use it!<p>I'm sure you've thought of this, but: lots of people have some amount of 'free' (or really: zero incremental cost to users) access to some coding chat tool through a subscription or free allowance like Google's.<p>If you wanted to let those programs access your custom tools (browser!) and docs about the environment, a low-fuss way might be to drop a skills/ dir of info and executables that call your tools into new installs' homedirs, and/or a default AGENTS.md with the basic info and links to more.<p>And this seems like more fuss, but if you wanted to be able to expose to the Web whatever coding tool people 'bring', similar to how you expose your built-in chat, there's apparently an Agent Client Protocol used as a sort of cross-vendor SDK by projects like <a href="https://willmcgugan.github.io/toad-released/" rel="nofollow">https://willmcgugan.github.io/toad-released/</a> that try to put a nice interface on top of everything. Not saying this'd be easy at all, but you could imagine the choice between a few coding tools and auth info for them as profile-level settings pushed to new VMs. Or maybe no special settings, and bringing your own tools is just a special case of bringing your own image or setup script.<p>But, as y'all note, it's a VM. You can install whatever and use it through the terminal (or VSCode remoting or something else). "It's a computer" is quite a good open standard to build on.<p>Is the chat descended from Sketch?</p>
]]></description><pubDate>Sat, 27 Dec 2025 01:50:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=46398369</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46398369</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46398369</guid></item><item><title><![CDATA[New comment by twotwotwo in "Exe.dev"]]></title><description><![CDATA[
<p>So I tried this the other day after Filippo Valsorda, another Go person, posted about it. My reaction was 'whoa, this <i>really</i> makes it easier to start a quick project', and it took a minute to figure out <i>why</i> I felt that way when, I mean, I have a laptop and could spin up cloud stuff--arguably I <i>already</i> had what I needed.<p>I think it's the combination of 1) <i>really</i> quick to get going, 2) isolated and disposable environments and 3) can be persistent and out there on the Internet.<p>Often to get element 3, persistent and public, I had to jump through hoops in a cloud console and/or mess with my 'main' resources (install things or do other sysadmin work on a laptop or server, etc.), resources I use for other stuff and would prefer not to clutter up with every experiment I attempt.<p>Here I can make a thing and if I'm done, I'm done, nothing else impacted, <i>or</i> if it's useful it can stick around and become shared or public. Some other environments also have 'quick to start, isolated, and disposable' down, but are ephemeral only, limited, or don't have great publishing or sharing, and this avoids that trough too. And VMs go well with building general-purpose software you could fling onto any machine, not tied to a proprietary thing.<p>This is good stuff. I hope they get a sustainable paid thing going. I'd sign up.<p>Also, though I realize in a sense it'd be competition to a business I just said I like: some parts of the design could work elsewhere too. You could have an open-source "click here to start a thing! and click here to archive it." layer above a VM, machine, or whatever sort of cloud account; could be a lot of fun. (I imagine someone will think "have you looked at X?" here, and yes, chime in, interested in all sorts of potential values of X.)</p>
]]></description><pubDate>Sat, 27 Dec 2025 01:17:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46398209</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46398209</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46398209</guid></item><item><title><![CDATA[New comment by twotwotwo in "Measuring AI Ability to Complete Long Tasks"]]></title><description><![CDATA[
<p>Yeah--I wanted a short way to gesture at the subsequent "tasks that are fast for someone but not for you are interesting," and did not mean it as a gotcha on METR, but I should've taken a second longer and pasted what they said rather than doing the "presumably a human competent at the task" handwave that I did.</p>
]]></description><pubDate>Sun, 21 Dec 2025 08:08:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46343108</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46343108</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46343108</guid></item><item><title><![CDATA[New comment by twotwotwo in "Measuring AI Ability to Complete Long Tasks"]]></title><description><![CDATA[
<p>Yeah--it's difficult to go from a benchmark involving the model attempting things alone to the effect of assisting people on real tasks because, well, <i>ideally</i> you'd measure that with real people doing real tasks. Last time METR tried that (in early '25) they found a net slowdown rather than any speedup at all. Go figure!</p>
]]></description><pubDate>Sun, 21 Dec 2025 07:23:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46342930</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46342930</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46342930</guid></item><item><title><![CDATA[New comment by twotwotwo in "Measuring AI Ability to Complete Long Tasks"]]></title><description><![CDATA[
<p>I'm conflicted about opining on models: no individual has actually done a large sample of real-world tasks with a lot of models to be able to speak with authority, but I kinda think we should each share our dubiously-informed opinions anyway because benchmarks aren't necessarily representative of real-world use and many can clearly be gamed.<p>Anyhow, I noticed more of a difference trying Opus 4.5 compared to Sonnet 4.5 than I'd noticed from, for example, the last couple Sonnet bumps. Objectively, at 1.66x Sonnet's price instead of the old 5x, it's much more often practical to consider reaching for than past Opus models. Anthropic's basic monthly thing also covers a fair amount of futzing with it in CC.<p>At the other extreme, another surprise of this family is that Haiku 4.5 with reasoning on is usable: better than Sonnet with thinking off according to some benchmarks, and in any case subjectively decent for point edits, single-page thingies, and small tools.</p>
]]></description><pubDate>Sun, 21 Dec 2025 06:13:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46342686</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46342686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46342686</guid></item><item><title><![CDATA[New comment by twotwotwo in "Measuring AI Ability to Complete Long Tasks"]]></title><description><![CDATA[
<p>METR is using hours of equivalent human effort, not actual hours the agent itself spends, so by their methodology, your task might qualify as one where it pulls off much more than 4h of human work.<p>"Human hours equivalent" itself is an interesting metric, because: which human? Or rather, I'm sure they had a coherent definition in mind: presumably a human reasonably competent at whatever the specific task is. But hours the abstract human standard would spend is different from the hours any specific person, say you or I, would spend.<p>In particular, some of the appeal (and risk!!) of these things is precisely that you can ask for help with things that would be quick work for <i>someone</i> (who knows jq, or a certain corner of the PyPI library ecosystem, or modern CSS, or TypeScript annotations, or something else) but not for you.</p>
]]></description><pubDate>Sun, 21 Dec 2025 05:44:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46342537</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46342537</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46342537</guid></item><item><title><![CDATA[New comment by twotwotwo in "Performance Hints"]]></title><description><![CDATA[
<p>Your version only describes what happens if you do the operations serially, though. For example, a consumer SSD can do a million (or more) operations in a second, not 50K, and you can send a lot more than 7 total packets between CA and the Netherlands in a second, but to do either of those you need to take advantage of parallelism.<p>If the reciprocal numbers are more intuitive for you, you can still say an L1 cache reference takes 1/2,000,000,000 sec. It's "ops/sec" that makes it look like a throughput.<p>An interesting thing about the latency numbers is they mostly don't vary with scale, whereas something like the total throughput of your SSDs or the Internet depends on the size of your storage or network setups, respectively. And aggregate CPU throughput varies with core count, for example.<p>I do think it's still interesting to think about throughputs (and other things like capacities) of a "reference deployment": that can affect architectural things like "can I do this in RAM?", "can I do this on one box?", "what optimizations do I need to fix potential bottlenecks in XYZ?", "is resource X or Y scarcer?" and so on. That was kind of done in "The Datacenter as a Computer" (<a href="https://pages.cs.wisc.edu/~shivaram/cs744-readings/dc-computer-v3.pdf" rel="nofollow">https://pages.cs.wisc.edu/~shivaram/cs744-readings/dc-comput...</a> and <a href="https://books.google.com/books?id=Td51DwAAQBAJ&pg=PA72#v=onepage&q&f=false" rel="nofollow">https://books.google.com/books?id=Td51DwAAQBAJ&pg=PA72#v=one...</a> ) with a machine, rack, and cluster as the units. That diagram is about the storage hierarchy and doesn't mention compute, and a lot has improved since 2018, but an expanded table like that still seems like an interesting tool for engineering a system.</p>
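The parallelism point above is just Little's law: achievable throughput equals in-flight operations divided by per-op latency, so a fixed-latency device only hits its big ops/sec numbers with enough concurrency. A minimal sketch, with an illustrative latency figure:

```python
# Little's law sketch: throughput = in-flight ops / per-op latency.
# The latency value is illustrative, not a spec for any real device.

def ops_per_sec(latency_s, in_flight):
    """Steady-state throughput with `in_flight` concurrent operations."""
    return in_flight / latency_s

ssd_latency = 50e-6  # assume ~50 microseconds per NVMe read

print(ops_per_sec(ssd_latency, 1))    # serial: ~20K ops/s
print(ops_per_sec(ssd_latency, 64))   # 64 in flight: ~1.28M ops/s
```

Same per-op latency in both calls; only the queue depth changes, which is why "1/latency" reads like a throughput but badly understates what the device can actually sustain.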
]]></description><pubDate>Fri, 19 Dec 2025 21:21:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=46331083</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46331083</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46331083</guid></item><item><title><![CDATA[New comment by twotwotwo in "Zebra-Llama – Towards efficient hybrid models"]]></title><description><![CDATA[
<p>These are potentially complementary approaches. Various innovations have shrunk the KV cache size or (with DSA) how much work you have to do in each attention step. This paper is about hybrid models where some layers' state needs don't grow with context size at all.<p>SSMs have a fixed-size state space, so on their own they'll never be able to recite a whole file of your code in a code-editing session, for example. But if much of what an LLM is doing <i>isn't</i> long-distance recall, you might be able to get away with only giving some layers full recall capability, with other layers manipulating the info already retrieved (plus whatever's in their own more limited memory).<p>I think Kimi Linear Attention and Qwen3-next are both doing things a little like this: most layers' attention/memory doesn't grow with context size. Another approach, used in Google's small open Gemma models, is to give some layers only 'local' attention (the most recent N tokens) and give a few 'full' (whole context window) attention. I guess we're seeing how those approaches play out and how different tricks can be cobbled together.<p>There can potentially be a moneyball aspect to good model architecture. Even if <i>on its own</i> using space-saving attention mechanisms in some layers of big models costs something in performance, their efficiency could let you 'spend' more elsewhere (more layers or more params or such) to end up with overall better performance at a given level of resources. Seems like it's good to have experiments with many different approaches going on.</p>
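The local-vs-full split above comes down to which (query, key) positions a layer is allowed to attend over. A minimal sketch of the two causal masks, in pure Python with illustrative sizes:

```python
# Sketch of "full" vs "local" causal attention masks, the kind of mix
# hybrid architectures use where only some layers see the whole context.
# Window size and sequence length are illustrative.

def full_causal_allowed(n):
    # Position i may attend to every earlier position j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def local_causal_allowed(n, window):
    # Position i may attend only to the last `window` positions:
    # i - window < j <= i.
    return [[i - window < j <= i for j in range(n)] for i in range(n)]

n = 8
full_pairs = sum(map(sum, full_causal_allowed(n)))       # n*(n+1)/2 pairs
local_pairs = sum(map(sum, local_causal_allowed(n, 3)))  # capped per row
print(full_pairs, local_pairs)
```

The full mask's allowed-pair count grows quadratically with sequence length while the local mask's grows linearly (at most `window` per row), which is where the efficiency win for the 'light' layers comes from.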
]]></description><pubDate>Sun, 07 Dec 2025 06:29:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46179630</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46179630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46179630</guid></item><item><title><![CDATA[New comment by twotwotwo in "Chrome Jpegxl Issue Reopened"]]></title><description><![CDATA[
<p>Wanted to note <a href="https://issues.chromium.org/issues/40141863" rel="nofollow">https://issues.chromium.org/issues/40141863</a> on making the lossless JPEG recompression a Content-Encoding, which provides a way that, say, a CDN could deploy it in a way that's fully transparent to end users (if the user clicks Save it would save a .jpg).<p>(And: this is great! I think JPEG XL has a chance of being adopted with the recompression "bridge" and fast decoding options, and things like progressive decoding for its VarDCT mode are practical advantages too.)</p>
]]></description><pubDate>Mon, 24 Nov 2025 16:12:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46035634</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=46035634</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46035634</guid></item><item><title><![CDATA[New comment by twotwotwo in "[dead]"]]></title><description><![CDATA[
<p>"What are the consequences of recent, controversial changes in policy?" does not become an irrelevant question simply because you can also think of hypothetical policies.</p>
]]></description><pubDate>Mon, 17 Nov 2025 04:24:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45950834</link><dc:creator>twotwotwo</dc:creator><comments>https://news.ycombinator.com/item?id=45950834</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45950834</guid></item></channel></rss>