<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: porridgeraisin</title><link>https://news.ycombinator.com/user?id=porridgeraisin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 16 Apr 2026 16:34:02 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=porridgeraisin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by porridgeraisin in "Introspective Diffusion Language Models"]]></title><description><![CDATA[
<p>Yeah, I think it's a super neat way to do MTP. Conceptually much more pleasing and simpler than existing methods, especially since this way scaling `k` as models get better will be easier. Wish it had been presented as such.</p>
]]></description><pubDate>Thu, 16 Apr 2026 09:27:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47790683</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47790683</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47790683</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Ask HN: Who is using OpenClaw?"]]></title><description><![CDATA[
<p>Didn't know about qmd.<p>I use a mix of markdown notes, an sqlite database, and my image store searchable by text. I use immich.<p>For now I do it manually by giving it skills for each data store I wanna access.<p>My use cases are all ad hoc; I am not a "pro" user by any means, so I don't mind some manual work.</p>
]]></description><pubDate>Thu, 16 Apr 2026 09:25:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47790673</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47790673</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47790673</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Moving a large-scale metrics pipeline from StatsD to OpenTelemetry / Prometheus"]]></title><description><![CDATA[
<p>Yeah, at previous work we used both as well. The transition from prom to vm was "ongoing" and from the time I joined to the time I left we did parallel writes to both. Never faced issues with either. If I remember correctly, we wrote from services to a kafka queue first, and then a consumer took that and pushed it to (both) the metrics endpoint(s).</p>
]]></description><pubDate>Thu, 16 Apr 2026 09:20:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47790619</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47790619</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47790619</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Ask HN: Who is using OpenClaw?"]]></title><description><![CDATA[
<p>Yep. I had posted a comment earlier detailing my use cases. But I too replaced it with my own system that does those same things.<p>It's way too bloated; it felt like operating the Windows Start menu search.<p>But you might have missed some of the ideas they have, so it's useful to try it out, see what combination of features you use in particular, and then just set those up for yourself with claude code or whatever as the LLM harness. Telegram integration is dead easy.</p>
]]></description><pubDate>Wed, 15 Apr 2026 22:21:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47786097</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47786097</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47786097</guid></item><item><title><![CDATA[New comment by porridgeraisin in "CRISPR takes important step toward silencing Down syndrome’s extra chromosome"]]></title><description><![CDATA[
<p>Sure, but given the choice to not have Down syndrome, I'm sure they would take it. Were they given the choice? Not as a hypothetical, but in front of their eyes.</p>
]]></description><pubDate>Wed, 15 Apr 2026 17:37:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47782463</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47782463</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47782463</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Study: Back-to-basics approach can match or outperform AI in language analysis"]]></title><description><![CDATA[
<p>Tbh, the accuracy of translation, while much better than prior methods, is not that great yet. For Tamil at least.</p>
]]></description><pubDate>Wed, 15 Apr 2026 17:36:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47782443</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47782443</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47782443</guid></item><item><title><![CDATA[Microsoft Takes over Norway Stargate Data Center from OpenAI]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.bloomberg.com/news/articles/2026-04-14/microsoft-takes-over-norway-openai-data-center-capacity">https://www.bloomberg.com/news/articles/2026-04-14/microsoft-takes-over-norway-openai-data-center-capacity</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47779021">https://news.ycombinator.com/item?id=47779021</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 15 Apr 2026 13:56:57 +0000</pubDate><link>https://www.bloomberg.com/news/articles/2026-04-14/microsoft-takes-over-norway-openai-data-center-capacity</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47779021</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47779021</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Introspective Diffusion Language Models"]]></title><description><![CDATA[
<p>Eh. There is nothing diffusion about this. Nothing to do with denoising. This setup is still purely causal, making it quite a dishonest framing IMO. There is no more introspection here than what happens in MTP + SD setups.<p>Let me explain what is going on here. This is basically a form of multi-token prediction, plus speculative decoding at inference. See my earlier post[1] to understand what that is. TL;DR, in multi-token prediction you train separate LM heads to predict the next token, the next-to-next token, and so on, up to a chosen kth next token. Training multiple LM heads is expensive and can be unnecessary, so what people typically do is have a common base for all the k heads, explained further in [1]. These guys do another variant.<p>Here is what they do mechanically, given a sequence p consisting of five tokens PE([p1, p2, p3, p4, p5]), where PE(.) adds relative position info to each token.<p>1. Create an augmented sequence PE([p1 MASK MASK MASK MASK]). Do a training pass on that, with the ground truth sequence p1..5. Here it is trained, for example, to predict p3 given p1+pos=-2 MASK+pos=-1 MASK+pos=0, loosely notating.<p>2. Then separately[2], train it <i>as usual</i> on PE([p1 p2 p3 p4 p5]).<p>Step (1) teaches it to do multi-token prediction: essentially, the single LM head will (very very loosely speaking) condition on the position `k` of the special MASK token and "route" it to the "implicit" kth LM head.<p>Step (2) teaches it to be a usual LLM and predict the next token. No MASK tokens involved.<p>So far, you have trained a multi-token predictor.<p>Now during inference<p>You use this for speculative decoding. You generate 5 tokens ahead at once with MASK tokens, and then you run that sequence through the LLM again. This has the same benefits as usual speculative decoding, namely that you can do matrix-matrix multiplication as opposed to matrix-vector. The former is more memory-bandwidth efficient due to higher arithmetic intensity.<p>Here is an example,<p>query = ["what", "is", "2+2"]
prompt = PE([...query, MASK*5])
You run output = LLM(prompt). Say output is ["what", "is", "2+2", "it", "is", "4"]. Note that the NN is trained to predict the kth next token when faced with positionally encoded MASK tokens, so you get all 5 in one go. To be precise, it learns to predict "4" given ["what", "is", "2+2", MASK, MASK]. Since it does not need the "it" and "is" explicitly, you can do it in parallel with generating the "it" and the "is". "is" is predicted given ["what", "is", "2+2", MASK], for example, and that also doesn't depend on the explicit "it" being there, and thus can also be done in parallel with generating "it", which is just normal generation of the next token given the query. And then you use this as a draft in your speculative decoding setup.<p>Their claim is that using a multi-token predictor this way as a draft model works really well. To be clear, this is still causal; the reason diffusion models have hype is because they are capable of global refinement. This is not. In the same thread as [1], I explain how increasing the number of MASK tokens, i.e. increasing `k`, i.e. the number of tokens you predict at once in your multi-token prediction setup, quickly leads to poor quality. This paper agrees with that. They try out k=2,3,4,8 and see a drop in quality at 8 itself. So finally, this is 4-token prediction with self-speculative decoding (sans LayerSkip or such), removing seemingly no existing limitation of such setups. It is definitely an interesting way to train MTP though.<p>[1] <a href="https://news.ycombinator.com/item?id=45221692">https://news.ycombinator.com/item?id=45221692</a><p>[2] Note that it is computationally a single forward pass. Attention masks help you fuse steps 1 and 2 into a single operation. However, you still have 2 separate loss values.</p>
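The draft-then-verify pattern described above can be sketched in a few lines. This is a hypothetical toy: `toy_llm` is a stand-in oracle (it just reads off a fixed ground-truth sequence), not the paper's model, but the control flow matches the description.

```python
# Toy sketch of MASK-token MTP drafting + speculative verification.
# `toy_llm` is a hypothetical stand-in oracle, not the paper's model.
MASK = "<mask>"
EOS = "<eos>"
TRUE_SEQ = ["what", "is", "2+2", "it", "is", "4"]  # toy ground truth

def toy_llm(tokens):
    """One forward pass: next-token prediction at every position.
    The toy ignores MASK content, mimicking a model trained to predict
    the kth next token from positional info alone."""
    return [TRUE_SEQ[i + 1] if i + 1 < len(TRUE_SEQ) else EOS
            for i in range(len(tokens))]

def draft_and_verify(query, k=3):
    # Draft: a single pass over query + k MASKs emits k tokens at once.
    out = toy_llm(query + [MASK] * k)
    draft = out[len(query) - 1 : len(query) - 1 + k]
    # Verify: rerun the drafted sequence normally (matrix-matrix, one pass)
    # and accept the longest prefix the model itself would have generated.
    full = toy_llm(query + draft)
    accepted = []
    for j, tok in enumerate(draft):
        if full[len(query) - 1 + j] != tok:
            break
        accepted.append(tok)
    return accepted

print(draft_and_verify(["what", "is", "2+2"]))  # ['it', 'is', '4']
```

In a real setup the verify pass also yields the next token beyond the accepted prefix, so each iteration makes progress even when the entire draft is rejected.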
]]></description><pubDate>Tue, 14 Apr 2026 19:09:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47769992</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47769992</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47769992</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Missouri town fires half its city council over data center deal"]]></title><description><![CDATA[
<p>I guess their point is that of all possible industrial use cases, data centers are the least obnoxious one. I live in one of the countries that actually manufactures things, unlike the US, and I find it hard to argue with that. Any noise pollution caused by data centers is far, far less than most industrial setups. It's the same with every other resource: water, electricity, effect on local shared infrastructure like roads and commerce, etc. Other industries are an order of magnitude worse.<p>Given that you _have_ to have some industrial setup unless you want to import everything (tokens, in this case), datacenters are far and away the best choice.<p>I'll add a qualifier to the above, modifying it to say that of all industrial setups generating at least X dollars of economic value, datacenters are far and away the best in terms of impact on the neighbourhood.<p>The jobs argument also falls apart when you consider that it's essentially 100 jobs in return for just an office building worth of space. If you want a thousand-job plant, just build that as well next town over; it will take way, way more space and other resources though. The reason that didn't happen even before this datacenter boom is that most manufacturing setups are fairly infeasible in rich countries like the US. I can't imagine the response to a textile plant or a steel plant if this is the response to datacenters.<p>I agree, however, that if you colocate a gigantic power plant, then you get the worst of both worlds: fewer jobs and the hindrance of a big power plant near residential areas. Grid expansion being slow in developed areas like most of the US is not surprising though.<p>But this is pretty much the best-case scenario. Tolerating the power plant until the grid expands is the way to go, I suppose.</p>
]]></description><pubDate>Mon, 13 Apr 2026 18:13:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47755861</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47755861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47755861</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Microsoft isn't removing Copilot from Windows 11, it's just renaming it"]]></title><description><![CDATA[
<p>The Copilot executable and the Edge executable are actually the same! It looks at argv[0] to decide which to show you. You can rename mscopilot.exe to msedge.exe and it still opens Edge, and vice versa.</p>
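The argv[0] trick can be sketched like this (a hypothetical Python illustration; the real binaries are native code, and the dispatch logic here is just a guess at the shape of the mechanism, the same trick BusyBox uses for its applets):

```python
# Hypothetical sketch of argv[0]-based dispatch: one binary, several
# personalities depending on the name it was invoked under.
import os
import sys

def main(argv0):
    # Dispatch on the program's own file name, not on any flag.
    name = os.path.basename(argv0).lower()
    if name.startswith("mscopilot"):
        return "copilot ui"
    return "edge browser"  # default personality

if __name__ == "__main__":
    print(main(sys.argv[0]))
```

Renaming (or symlinking) the file changes `argv[0]`, and with it the behavior, without touching the binary.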
]]></description><pubDate>Mon, 13 Apr 2026 16:38:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47754614</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47754614</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47754614</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Pro Max 5x quota exhausted in 1.5 hours despite moderate usage"]]></title><description><![CDATA[
<p>I wanted this as well. Even asked about it at an openai talk. Basically a way to get the KV cache to the client (they can encrypt it if they care about me REing it, make a compressed latent if they don't wanna egress 20GB, whatever, I'm fine with a black box) so that I can load it later and avoid these cache misses.<p>I think the primary reason they cannot do this is that they change the memory and communication layouts in their serving stack rather aggressively, and naturally, keeping the KV cache portable across all such layouts is a very difficult task. So you'd have to version the cache down to a specific deployment, and invalidate it the moment anything even slightly changes. So giving the user a handle to the cache sort of prevents you from making large changes to memory layout, which is I suppose not that enticing. Also, client-side KV caches are only meaningful in today's 1M contexts. A few years back it wasn't necessary, since just recomputing would be better for everybody.<p>To be clear, I don't mean they send it along with every request. Rather, they do their current TTL cache, and then when I'm at the end of a session, I request it in one shot and then close the session. And it doesn't have to come to the literal client; they can egress it to a storage service that we pay for, whatever. But ya, the compat problem makes it all a non-starter.</p>
]]></description><pubDate>Mon, 13 Apr 2026 14:34:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47752602</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47752602</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47752602</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Many African families spend fortunes burying their dead"]]></title><description><![CDATA[
<p>Funerals are huge in India too. They run for 13 days in some communities. To be clear, the actual cremation happens immediately, but the funeral ceremonies continue for 13 days after that.<p>Most of the expenses are days of one-meal-a-day for guests, and the general extra expenses of having a lot of relatives over at your house. The ceremonies themselves are fairly cheap - it's mostly prayers.<p>However, there is no insurance and so on, since the aforementioned expenses scale with usual QoL.</p>
]]></description><pubDate>Fri, 10 Apr 2026 08:15:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47715102</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47715102</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47715102</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Six (and a half) intuitions for KL divergence"]]></title><description><![CDATA[
<p>> So minimising the cross entropy over theta is the same as maximising KL(P,Q)<p>Minimising*</p>
]]></description><pubDate>Fri, 10 Apr 2026 06:27:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47714329</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47714329</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47714329</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"]]></title><description><![CDATA[
<p>IMO, it's an expectations vs reality thing.<p>The marketing still goes on about continuous inherent improvement due to the model itself, whereas most improvements today are due to better scaffolding. The key now is to build tooling around these LLMs to make them reliably productive - whatever level that may be at.<p>While claude code is one such tool, after a point the tooling is going to become company-specific. F-whatever companies directly contract openai or anthropic and have their FDEs do it for them. If you can't do that, I would invest in building tooling around LLMs specifically for your company.<p>Note that LLMs are approximate retrieval machines. You still need a planner* and a verifier around them. Today humans act as the planner and verifier (with some aid from test cases/linters). Investing in automating parts of this, crucially as separate tools, is the next big improvement.<p>* By planning, I mean trying out solutions, rolling them back[1], and using what you learned to do better next time. The solution search process. Context management also falls under this.<p>[1] and no, LLMs going "wait no..." doesn't count.</p>
]]></description><pubDate>Tue, 07 Apr 2026 05:52:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47671177</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47671177</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47671177</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Got kicked out of uni and had the cops called for a social media website I made"]]></title><description><![CDATA[
<p>The law is pretty much redundant here. Even if he had been connectionless, reserved, etc., the same problem would have happened.<p>See <a href="https://news.ycombinator.com/item?id=47668009">https://news.ycombinator.com/item?id=47668009</a></p>
]]></description><pubDate>Tue, 07 Apr 2026 05:39:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47671108</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47671108</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47671108</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Got kicked out of uni and had the cops called for a social media website I made"]]></title><description><![CDATA[
<p>The admin behaviour is expected in the Indian context. See my other comment.<p><a href="https://news.ycombinator.com/item?id=47668009">https://news.ycombinator.com/item?id=47668009</a></p>
]]></description><pubDate>Tue, 07 Apr 2026 05:38:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47671101</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47671101</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47671101</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Got kicked out of uni and had the cops called for a social media website I made"]]></title><description><![CDATA[
<p>Quite the personality this kid, I must say.<p>The admin behaviour is expected in an Indian context, provided you behave the way this guy did. I am not saying it's good to snatch the guy's phone, but it's expected.<p>Let me explain the core issue here.<p>The issue is that if the platform ever devolves into something that can be construed as cyberbullying, then the admin is suddenly in trouble.<p>In the Indian context, elite public colleges like IITD have some students from quite poor, non-urban backgrounds. These colleges are cheap, have a strict entrance exam (JEE), and there's no money requirement, so you have people from all financial strata. As such, the social dynamic is that the parents "entrust" the college with "taking care" of their kid, especially among first-generation-educated families. In contrast, in private colleges with homogeneous, richer families the social dynamic puts more responsibility on the student. The age of 18 is completely irrelevant in this dynamic.<p>The point is, the admin in this college is also somewhat of a caretaker of the students, and will face social liability for cyberbullying "happening under their nose". This is true even if it happens on reddit, by the way (and the bully is in the same college). Essentially, if there is a way for the dean to intervene and he doesn't, he has failed in his job as a caretaker. That's the dynamic here. Obviously he has deniability if some random American bullies an IITD kid on, say, HN. But if an IITD kid bullies another IITD kid on any social platform, they will come down on it heavily.<p>Thus, the platform was never going to work, and it's problematic before the law even comes into play. Talking about "tell me what rule I broke" without considering the above social dynamics is fairly immature. If they had done the same thing at, say, Ashoka University (an expensive private college), they would have faced none of these issues by contrast. If I'm allowed a swipe at the author, this situation is entirely expected given their privileged background.</p>
]]></description><pubDate>Mon, 06 Apr 2026 22:13:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668009</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47668009</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668009</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Nvim-treesitter (13K+ Stars) is Archived"]]></title><description><![CDATA[
<p>This is why I build nvim from source, and git pull plugins into the pack directory. I think it's even a static binary. Whatever changes I need, I git pull. After they added LSP I have not wished for anything else really, so I stopped pulling. I think I pulled the LSP completion API in the 0.11 era but that's it.<p>Hate it when people break backwards compatibility. For me it's sacrosanct, more important than absolutely anything else.<p>I only have a handful of plugins so the system works well. And I have a 500-line init.vim (and no other config).<p>Some ecosystems like golang share this principle, so I can freely update packages without worrying about breakages. But in other ecosystems (nvim, python, etc.) I'm a lone warrior.</p>
]]></description><pubDate>Sun, 05 Apr 2026 05:30:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47646374</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47646374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47646374</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Embarrassingly simple self-distillation improves code generation"]]></title><description><![CDATA[
<p>There's an obvious baseline which seems missing.<p>If you sample from the base model with T=1.6, top_k=20, top_p=0.8, i.e., the decode settings used for the distillation's ground truth, does it match the SSD'd model + some decoding, performance-wise?<p>Their sweep is missing this, and only covers "standard" decoding settings.</p>
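For reference, those decode knobs compose like this. A generic temperature/top-k/top-p sampler sketch, not the paper's code; the defaults are just the numbers quoted above:

```python
# Generic sketch of temperature + top-k + top-p (nucleus) sampling
# over raw logits; defaults taken from the settings quoted above.
import numpy as np

def sample(logits, temperature=1.6, top_k=20, top_p=0.8, rng=None):
    rng = rng or np.random.default_rng(0)
    logits = np.asarray(logits, dtype=float) / temperature  # flatten/sharpen
    # top-k: drop everything below the k-th highest logit
    kth = np.sort(logits)[-top_k] if top_k < len(logits) else -np.inf
    logits = np.where(logits >= kth, logits, -np.inf)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # top-p: keep the smallest high-probability set covering mass top_p
    order = np.argsort(probs)[::-1]
    csum = np.cumsum(probs[order])
    cutoff = np.searchsorted(csum, top_p) + 1
    keep = order[:cutoff]
    p = np.zeros_like(probs)
    p[keep] = probs[keep]
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```

T=1.6 flattens the distribution considerably, which is presumably why the paper pairs it with fairly aggressive top_k/top_p truncation for generating the distillation targets.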
]]></description><pubDate>Sat, 04 Apr 2026 16:18:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47640408</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47640408</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47640408</guid></item><item><title><![CDATA[New comment by porridgeraisin in "Understanding young news audiences at a time of rapid change"]]></title><description><![CDATA[
<p>I do the same in my 20s :)<p>We actually had a newspaper at home and I used to sneak a peek, at least at the sports page. But that stopped during Covid and we never renewed it... No one in the house is missing it so far. And we don't really watch news channels (our TV is just a few streaming subscriptions). At most I see a few headlines in social media recommendations. And I don't use twitter. Meaning no news. It's great, highly recommended.</p>
]]></description><pubDate>Fri, 03 Apr 2026 20:20:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47631721</link><dc:creator>porridgeraisin</dc:creator><comments>https://news.ycombinator.com/item?id=47631721</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47631721</guid></item></channel></rss>