<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: hackpert</title><link>https://news.ycombinator.com/user?id=hackpert</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 10:22:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=hackpert" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by hackpert in "Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training"]]></title><description><![CDATA[
<p>We found evidence of specific layer-localized "reasoning" circuits in a few models last year too! A very much work-in-progress paper is here: <a href="https://openreview.net/forum?id=mTjGBrkdtz" rel="nofollow">https://openreview.net/forum?id=mTjGBrkdtz</a></p>
]]></description><pubDate>Thu, 19 Mar 2026 12:46:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47438449</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=47438449</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47438449</guid></item><item><title><![CDATA[New comment by hackpert in "Perfectly Replicating Coca Cola [video]"]]></title><description><![CDATA[
<p>Huh, there is that much limonene in Coca Cola?! Limonene works as a very good…pesticide and herbicide! I did a research project on limonene like 10 years ago with my mentor, and it outperformed most commercial pesticides in controlled settings. It really can't be that great to ingest.</p>
]]></description><pubDate>Mon, 12 Jan 2026 02:46:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=46583361</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=46583361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46583361</guid></item><item><title><![CDATA[New comment by hackpert in "The Q, K, V Matrices"]]></title><description><![CDATA[
<p>These metaphorical database analogies bug me, and it seems like they bug a lot of other people in the comments too! So far, some of the most reasonable explanations I have found that take training dynamics into account are from Lenka Zdeborova's lab (albeit in toy, linear-attention settings, but it's easy to see why they generalize to practical ones). For instance, this is a lovely paper: <a href="https://arxiv.org/abs/2509.24914" rel="nofollow">https://arxiv.org/abs/2509.24914</a></p>
]]></description><pubDate>Thu, 08 Jan 2026 12:30:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46540285</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=46540285</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46540285</guid></item><item><title><![CDATA[New comment by hackpert in "DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning [pdf]"]]></title><description><![CDATA[
<p>Hi! Did you ever end up running this reproduction? If yes, could you also check if the Putnam/IMO problems are in the training data perhaps by trying to have it complete the problems n times? I would totally do this myself if I weren’t GPU poor!</p>
]]></description><pubDate>Mon, 01 Dec 2025 11:20:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46106088</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=46106088</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46106088</guid></item><item><title><![CDATA[New comment by hackpert in "Meta Ray-Ban Display"]]></title><description><![CDATA[
<p>O</p>
]]></description><pubDate>Fri, 19 Sep 2025 14:01:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45301791</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=45301791</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45301791</guid></item><item><title><![CDATA[New comment by hackpert in "OpenAI O3 breakthrough high score on ARC-AGI-PUB"]]></title><description><![CDATA[
<p>If anyone else is curious about which ARC-AGI public eval puzzles o3 got right vs wrong (and its attempts at the ones it did get right), here's a quick visualization: <a href="https://arcagi-o3-viz.netlify.app" rel="nofollow">https://arcagi-o3-viz.netlify.app</a></p>
]]></description><pubDate>Sat, 21 Dec 2024 04:36:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=42477578</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=42477578</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42477578</guid></item><item><title><![CDATA[New comment by hackpert in "Orion, our first true augmented reality glasses"]]></title><description><![CDATA[
<p>Sorry, never mind! I wasn't thinking at all when I wrote that thought, but it obviously doesn't make sense :)</p>
]]></description><pubDate>Sat, 28 Sep 2024 03:17:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=41677590</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=41677590</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41677590</guid></item><item><title><![CDATA[New comment by hackpert in "Orion, our first true augmented reality glasses"]]></title><description><![CDATA[
<p>That's fair, but what if you could estimate the direction of incoming light with other sensors? Using inverse diffraction, etc. Just a thought.</p>
]]></description><pubDate>Thu, 26 Sep 2024 17:35:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41661054</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=41661054</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41661054</guid></item><item><title><![CDATA[New comment by hackpert in "Qwen2-Math"]]></title><description><![CDATA[
<p>Hi! I've been working on theorem-proving systems for some time now. I would love to help out with an AlphaProof reproduction, but I can't reach you on Discord for some reason!</p>
]]></description><pubDate>Fri, 09 Aug 2024 06:09:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=41199176</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=41199176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41199176</guid></item><item><title><![CDATA[New comment by hackpert in "The 10,000x YOLO Researcher Metagame – With Yi Tay of Reka"]]></title><description><![CDATA[
<p>Thank you, those insights are invaluable! This is a specific and potentially dumb question, and I completely understand if you can't answer it!<p>The practical motivation for MoEs is very clear, but I do worry about the loss of compositional abilities (which I think just emerge from superposed representations?) that some tasks may require, especially with the many-experts phenomenon we're seeing. One observation from smaller MoE models (with top-k gating etc.), which may or may not scale, is that denser models trained to the same loss tend to perform complex tasks "better".<p>Intuitively, do you think MoEs are just another stopgap trick we're using while we figure out more compute and better optimizers, or could there be enough theoretical motivation to justify their continued use? If there isn't, perhaps we need to at least figure out "expert scaling laws" :)</p>
]]></description><pubDate>Fri, 05 Jul 2024 23:47:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=40887071</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=40887071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40887071</guid></item><item><title><![CDATA[New comment by hackpert in "Getting 50% (SoTA) on Arc-AGI with GPT-4o"]]></title><description><![CDATA[
<p>I'm not sure how to quantify how quickly or how well humans learn in-context (if you know of any work on this, I'd love to read it!)<p>In general, there is too much fluff and confusion floating around about what these models are and are not capable of (regardless of the training mechanism). I think more people need to read Song Mei's lovely slides[1] and related work by others. These slides are the best exposition I've found of neat ideas around ICL that researchers have been aware of for a while.<p>[1] <a href="https://www.stat.berkeley.edu/~songmei/Presentation/Algorithm_approx_BSJC.pdf" rel="nofollow">https://www.stat.berkeley.edu/~songmei/Presentation/Algorith...</a></p>
]]></description><pubDate>Tue, 18 Jun 2024 04:42:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=40714144</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=40714144</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40714144</guid></item><item><title><![CDATA[New comment by hackpert in "Stable Diffusion 3: Research Paper"]]></title><description><![CDATA[
<p>There has been some interesting work on distributed training; for example, DiLoCo (<a href="https://arxiv.org/abs/2311.08105" rel="nofollow">https://arxiv.org/abs/2311.08105</a>). I also know that Bittensor and nousresearch collaborated on some kind of competitive distributed model frankensteining-training thingy that seems to be going well: <a href="https://bittensor.org/bittensor-and-nous-research/" rel="nofollow">https://bittensor.org/bittensor-and-nous-research/</a><p>Of course it gets harder as models get larger, but distributed training doesn't seem totally infeasible. For example, with MoE transformer models, perhaps separate slices of the model could be trained asynchronously and then combined with some retraining. You could have minimal regular communication about, say, the mean and variance for each layer, and a new loss term dependent on these statistics to keep each contributor's "expertise" distinct.</p>
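<p>To sketch that statistics idea: a minimal, hypothetical auxiliary term (the names `layer_stats` and `divergence_penalty`, the hinge form, and the `margin` parameter are all my own illustrative assumptions, not from DiLoCo or any published method). Each worker would broadcast only a per-layer mean/variance, and the penalty discourages two workers' statistics from collapsing together:</p>

```python
def layer_stats(activations):
    # per-layer mean and variance: the only thing each worker broadcasts
    mu = sum(activations) / len(activations)
    var = sum((x - mu) ** 2 for x in activations) / len(activations)
    return mu, var

def divergence_penalty(local_stats, peer_stats, margin=1.0):
    # hinge-style term: penalize a worker whose activation statistics
    # drift too close to a peer's, nudging the "experts" to stay distinct
    mu, var = local_stats
    penalty = 0.0
    for peer_mu, peer_var in peer_stats:
        dist = (mu - peer_mu) ** 2 + (var - peer_var) ** 2
        penalty += max(0.0, margin - dist)
    return penalty
```

<p>Each worker would add this penalty (scaled by some coefficient) to its task loss; the communication cost is two scalars per layer per sync, which is tiny compared to shipping gradients.</p>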
]]></description><pubDate>Wed, 06 Mar 2024 01:38:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=39611271</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=39611271</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39611271</guid></item><item><title><![CDATA[New comment by hackpert in "GeneGPT, a tool-augmented LLM for bioinformatics"]]></title><description><![CDATA[
<p>I tried to do this ages ago in 2018 by adapting OpenAI's flow architecture and it sort of seemed to work (was at least promising). With today's models with a significantly more disentangled latent space it should be much easier to do! I saw a transformer trained on the UK Biobank recently, excited for this space!</p>
]]></description><pubDate>Tue, 13 Feb 2024 02:53:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=39353742</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=39353742</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39353742</guid></item><item><title><![CDATA[New comment by hackpert in "Bard is much worse at puzzle solving than ChatGPT"]]></title><description><![CDATA[
<p>Wow, I had hoped for a more productive discussion than these 1-1 comparisons of Bard vs ChatGPT that I'm seeing everywhere. The model deployed with this version of Bard is clearly smaller than the biggest LaMDA/PaLM models Google has been working on for ages, which, according to their publications, show unprecedented results on _proof writing_ of all things (see Minerva). While their strategic decisions may be questionable (or they're just trying to quantize the model for mass deployment without burning billions per month in compute costs), it's almost silly to question Google's ability to build useful LLMs.</p>
]]></description><pubDate>Wed, 22 Mar 2023 05:56:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=35257338</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=35257338</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35257338</guid></item><item><title><![CDATA[New comment by hackpert in "Metaphor Systems: A search engine based on generative AI"]]></title><description><![CDATA[
<p>I've been using Metaphor for a few weeks now and have almost entirely switched from Google and other search engines. Keyword-based search simply doesn't come close when it comes to getting the _right_ results. While I have to sift through a few pages of results on Google and then maybe find what I'm looking for, on Metaphor there's almost no SEO spam or Wikipedia-style links dominating the top results. It directs you to sources that are relevant to your search query. I don't know how they did this (probably a lot of very specific and targeted tricks), but Alex and team have created a marvelous product and I'm excited to see where this goes! Congrats on the launch!</p>
]]></description><pubDate>Fri, 11 Nov 2022 08:44:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=33558497</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=33558497</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33558497</guid></item><item><title><![CDATA[New comment by hackpert in "On AlphaTensor’s new matrix multiplication algorithms"]]></title><description><![CDATA[
<p>Right, but doesn't that mean it could potentially be used to design algorithms with componentwise numerical stability under some floating-point standard, whereas this result, being over finite fields by definition, should already be numerically stable?<p>(Apologies if I misunderstood; I wasn't calling you out specifically, but rather a generalized misconception I've noticed in a lot of other discussions so far.)</p>
]]></description><pubDate>Fri, 07 Oct 2022 22:19:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=33127544</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=33127544</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33127544</guid></item><item><title><![CDATA[New comment by hackpert in "On AlphaTensor’s new matrix multiplication algorithms"]]></title><description><![CDATA[
<p>While your point about numerical stability is correct in general, there are no numerical stability issues here, and I think this misconception, which I've seen in more than one place now, stems from a fundamental misunderstanding of the paper's results. While they _did_ come up with a faster TPU/GPU algorithm too, the primary result is not a fast matmul approximation: it is an exact algorithm comprising stepwise addition and multiplication operations, and hence is numerically stable and should work for any ring (<a href="https://ncatlab.org/nlab/show/ring" rel="nofollow">https://ncatlab.org/nlab/show/ring</a>). AlphaTensor itself does not do the matrix multiplication; it was used to perform an (efficiently pruned) tree search over the space of operations to find an efficient, stable algorithm.</p>
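<p>For intuition, here is the classic example of such an exact bilinear algorithm: Strassen's 7-multiplication scheme for 2x2 matrices (my illustration, not one of AlphaTensor's discovered algorithms). Every step is a ring addition, subtraction, or multiplication, so over the integers the result is exact, with no rounding anywhere:</p>

```python
def strassen_2x2(A, B):
    # Strassen's 7-multiplication scheme for 2x2 matrices. Only ring
    # operations (add, subtract, multiply) appear, so the output is
    # exact over the integers: no floating point, no approximation.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    # recombine the 7 products into the 4 entries of A @ B
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]
```

<p>AlphaTensor's contribution is finding recombination schemes like this with fewer multiplications for larger block sizes; the resulting algorithms have the same exactness property by construction.</p>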
]]></description><pubDate>Fri, 07 Oct 2022 16:27:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=33123489</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=33123489</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33123489</guid></item><item><title><![CDATA[New comment by hackpert in "Why peer to peer digital payment system UPI should remain free in India"]]></title><description><![CDATA[
<p>UPI's penetration in India (urban and semi-urban, anyway) is honestly incredible. I worked with/on the tech when it was very nascent in 2016, and on visiting India in 2022 after a long time away, I was stunned to see that _everyone_ has a little PayTM QR-code card: vegetable vendors, taxi drivers, roadside hawkers, small business owners. It's brilliant because the system is banking the traditionally unbanked and generating tremendous amounts of data that can hopefully be put to good use by economists, honestly unlike any other system in the world. Even MPesa in Africa has stupidly high withdrawal and transaction fees, which UPI doesn't need at all.</p>
]]></description><pubDate>Tue, 13 Sep 2022 14:11:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=32824506</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=32824506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32824506</guid></item><item><title><![CDATA[New comment by hackpert in "The Follower: Using open cameras and AI to find how an Instagram photo is taken"]]></title><description><![CDATA[
<p>That is actually pretty freaking cool. Not hard to do by any means, just cool.</p>
]]></description><pubDate>Mon, 12 Sep 2022 13:18:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=32809700</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=32809700</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32809700</guid></item><item><title><![CDATA[New comment by hackpert in "Ask HN: Why is there no performant remote desktop for Mac/Linux?"]]></title><description><![CDATA[
<p>x2go works quite well, actually. It's definitely not like local, but it does the job.</p>
]]></description><pubDate>Sat, 20 Aug 2022 13:49:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=32532046</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=32532046</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32532046</guid></item></channel></rss>