<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: hackpert</title><link>https://news.ycombinator.com/user?id=hackpert</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 10:22:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=hackpert" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by hackpert in "Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training"]]></title><description><![CDATA[
<p>We found evidence of specific layer-localized "reasoning" circuits in a few models last year too! A very much work-in-progress paper is here: <a href="https://openreview.net/forum?id=mTjGBrkdtz" rel="nofollow">https://openreview.net/forum?id=mTjGBrkdtz</a></p>
]]></description><pubDate>Thu, 19 Mar 2026 12:46:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47438449</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=47438449</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47438449</guid></item><item><title><![CDATA[New comment by hackpert in "Perfectly Replicating Coca Cola [video]"]]></title><description><![CDATA[
<p>Huh, there is that much limonene in Coca Cola?! Limonene works as a very good…pesticide and herbicide! I did a research project on limonene like 10 years ago with my mentor, and it outperformed most commercial pesticides in controlled settings. It really can't be that great to ingest.</p>
]]></description><pubDate>Mon, 12 Jan 2026 02:46:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=46583361</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=46583361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46583361</guid></item><item><title><![CDATA[New comment by hackpert in "The Q, K, V Matrices"]]></title><description><![CDATA[
<p>These metaphorical database analogies bug me, and it seems like they bug a lot of other people in the comments too! So far, some of the most reasonable explanations I have found that take training dynamics into account are from Lenka Zdeborova's lab (albeit in toy, linear-attention settings, but it's easy to see why they generalize to practical ones). For instance, this is a lovely paper: <a href="https://arxiv.org/abs/2509.24914" rel="nofollow">https://arxiv.org/abs/2509.24914</a></p>
]]></description><pubDate>Thu, 08 Jan 2026 12:30:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46540285</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=46540285</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46540285</guid></item><item><title><![CDATA[New comment by hackpert in "DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning [pdf]"]]></title><description><![CDATA[
<p>Hi! Did you ever end up running this reproduction? If yes, could you also check if the Putnam/IMO problems are in the training data perhaps by trying to have it complete the problems n times? I would totally do this myself if I weren’t GPU poor!</p>
]]></description><pubDate>Mon, 01 Dec 2025 11:20:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46106088</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=46106088</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46106088</guid></item><item><title><![CDATA[New comment by hackpert in "Meta Ray-Ban Display"]]></title><description><![CDATA[
<p>O</p>
]]></description><pubDate>Fri, 19 Sep 2025 14:01:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45301791</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=45301791</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45301791</guid></item><item><title><![CDATA[New comment by hackpert in "OpenAI O3 breakthrough high score on ARC-AGI-PUB"]]></title><description><![CDATA[
<p>If anyone else is curious about which ARC-AGI public eval puzzles o3 got right vs wrong (and its attempts at the ones it did get right), here's a quick visualization: <a href="https://arcagi-o3-viz.netlify.app" rel="nofollow">https://arcagi-o3-viz.netlify.app</a></p>
]]></description><pubDate>Sat, 21 Dec 2024 04:36:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=42477578</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=42477578</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42477578</guid></item><item><title><![CDATA[New comment by hackpert in "Orion, our first true augmented reality glasses"]]></title><description><![CDATA[
<p>Sorry, never mind! I wasn't thinking at all when I wrote that thought, but it obviously doesn't make sense :)</p>
]]></description><pubDate>Sat, 28 Sep 2024 03:17:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=41677590</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=41677590</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41677590</guid></item><item><title><![CDATA[New comment by hackpert in "Orion, our first true augmented reality glasses"]]></title><description><![CDATA[
<p>That's fair, but what if you could estimate the direction of incoming light with other sensors? Using inverse diffraction, etc. Just a thought.</p>
]]></description><pubDate>Thu, 26 Sep 2024 17:35:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41661054</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=41661054</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41661054</guid></item><item><title><![CDATA[New comment by hackpert in "Qwen2-Math"]]></title><description><![CDATA[
<p>Hi! I've been working on theorem-proving systems for some time now. I would love to help out with an AlphaProof reproduction, but I can't reach you on Discord for some reason!</p>
]]></description><pubDate>Fri, 09 Aug 2024 06:09:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=41199176</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=41199176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41199176</guid></item><item><title><![CDATA[New comment by hackpert in "The 10,000x YOLO Researcher Metagame – With Yi Tay of Reka"]]></title><description><![CDATA[
<p>Thank you, those insights are invaluable! This is a specific and potentially dumb question, and I completely understand if you can't answer it!<p>The practical motivation for MoEs is very clear, but I do worry about the loss of compositional abilities (which I think just emerge from superposed representations?) that some tasks may require, especially with the many-experts phenomenon we're seeing. One observation from smaller MoE models (with top-k gating etc.), which may or may not scale, is that denser models trained to the same loss tend to perform complex tasks "better".<p>Intuitively, do you think MoEs are just another stopgap trick we're using while we figure out more compute and better optimizers, or could there be enough theoretical motivation to justify their continued use? If there isn't, perhaps we need to at least figure out "expert scaling laws" :)</p>
]]></description><pubDate>Fri, 05 Jul 2024 23:47:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=40887071</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=40887071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40887071</guid></item><item><title><![CDATA[New comment by hackpert in "Getting 50% (SoTA) on Arc-AGI with GPT-4o"]]></title><description><![CDATA[
<p>I'm not sure how to quantify how quickly or how well humans learn in-context (if you know of any work on this, I'd love to read it!)<p>In general, there is too much fluff and confusion floating around about what these models are and are not capable of (regardless of the training mechanism). I think more people need to read Song Mei's lovely slides[1] and related work by others. These slides are the best exposition I've found of neat ideas around ICL that researchers have been aware of for a while.<p>[1] <a href="https://www.stat.berkeley.edu/~songmei/Presentation/Algorithm_approx_BSJC.pdf" rel="nofollow">https://www.stat.berkeley.edu/~songmei/Presentation/Algorith...</a></p>
]]></description><pubDate>Tue, 18 Jun 2024 04:42:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=40714144</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=40714144</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40714144</guid></item><item><title><![CDATA[New comment by hackpert in "Stable Diffusion 3: Research Paper"]]></title><description><![CDATA[
<p>There has been some interesting work on distributed training; for example, DiLoCo (<a href="https://arxiv.org/abs/2311.08105" rel="nofollow">https://arxiv.org/abs/2311.08105</a>). I also know that Bittensor and nousresearch collaborated on some kind of competitive distributed model frankensteining-training thingy that seems to be going well: <a href="https://bittensor.org/bittensor-and-nous-research/" rel="nofollow">https://bittensor.org/bittensor-and-nous-research/</a><p>Of course it gets harder as models get larger, but distributed training doesn't seem totally infeasible. For example, with MoE transformer models, perhaps separate slices of the model could be trained asynchronously and then combined with some retraining. You could have minimal regular communication about, say, the mean and variance for each layer, and a new loss term dependent on these statistics to keep each contributor's "expertise" distinct.</p>
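<p>To sketch that statistics idea: a minimal, hypothetical auxiliary term (the names `layer_stats` and `divergence_penalty`, the hinge form, and the `margin` parameter are all my own illustrative assumptions, not from DiLoCo or any published method). Each worker would broadcast only a per-layer mean/variance, and the penalty discourages two workers' statistics from collapsing together:</p>

```python
def layer_stats(activations):
    # per-layer mean and variance: the only thing each worker broadcasts
    mu = sum(activations) / len(activations)
    var = sum((x - mu) ** 2 for x in activations) / len(activations)
    return mu, var

def divergence_penalty(local_stats, peer_stats, margin=1.0):
    # hinge-style term: penalize a worker whose activation statistics
    # drift too close to a peer's, nudging the "experts" to stay distinct
    mu, var = local_stats
    penalty = 0.0
    for peer_mu, peer_var in peer_stats:
        dist = (mu - peer_mu) ** 2 + (var - peer_var) ** 2
        penalty += max(0.0, margin - dist)
    return penalty
```

<p>Each worker would add this penalty (scaled by some coefficient) to its task loss; the communication cost is two scalars per layer per sync, which is tiny compared to shipping gradients.</p>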
]]></description><pubDate>Wed, 06 Mar 2024 01:38:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=39611271</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=39611271</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39611271</guid></item><item><title><![CDATA[New comment by hackpert in "GeneGPT, a tool-augmented LLM for bioinformatics"]]></title><description><![CDATA[
<p>I tried to do this ages ago in 2018 by adapting OpenAI's flow architecture and it sort of seemed to work (was at least promising). With today's models with a significantly more disentangled latent space it should be much easier to do! I saw a transformer trained on the UK Biobank recently, excited for this space!</p>
]]></description><pubDate>Tue, 13 Feb 2024 02:53:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=39353742</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=39353742</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39353742</guid></item><item><title><![CDATA[New comment by hackpert in "Bard is much worse at puzzle solving than ChatGPT"]]></title><description><![CDATA[
<p>Wow, I had hoped for a more productive discussion than these 1-1 comparisons of Bard vs ChatGPT that I'm seeing everywhere. The model deployed with this version of Bard is clearly smaller than the biggest LaMDA/PaLM models Google has been working on for ages, which, according to their publications, show unprecedented results on _proof writing_ of all things (see Minerva). While their strategic decisions may be questionable (or they're just trying to quantize the model for mass deployment without burning billions per month in compute costs), it's almost silly to question Google's ability to build useful LLMs.</p>
]]></description><pubDate>Wed, 22 Mar 2023 05:56:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=35257338</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=35257338</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35257338</guid></item><item><title><![CDATA[New comment by hackpert in "Metaphor Systems: A search engine based on generative AI"]]></title><description><![CDATA[
<p>I've been using Metaphor for a few weeks now and have almost entirely switched from Google and other search engines. Keyword-based search simply doesn't come close when it comes to getting the _right_ results. While I have to sift through a few pages of results on Google and then maybe find what I'm looking for, on Metaphor there's almost no SEO spam or Wikipedia-style links dominating the top results. It directs you to sources that are relevant to your search query. I don't know how they did this (probably a lot of very specific and targeted tricks), but Alex and team have created a marvelous product and I'm excited to see where this goes! Congrats on the launch!</p>
]]></description><pubDate>Fri, 11 Nov 2022 08:44:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=33558497</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=33558497</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33558497</guid></item><item><title><![CDATA[New comment by hackpert in "On AlphaTensor’s new matrix multiplication algorithms"]]></title><description><![CDATA[
<p>Right, but doesn't that mean it could potentially be used to design algorithms with componentwise numerical stability under some floating-point standard, whereas this result, being over finite fields by definition, should already be numerically stable?<p>(Apologies if I misunderstood; I wasn't calling you out specifically, but rather a generalized misconception I've noticed in a lot of other discussions so far.)</p>
]]></description><pubDate>Fri, 07 Oct 2022 22:19:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=33127544</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=33127544</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33127544</guid></item><item><title><![CDATA[New comment by hackpert in "On AlphaTensor’s new matrix multiplication algorithms"]]></title><description><![CDATA[
<p>While your point about numerical stability is correct in general, there are no numerical stability issues here, and I think this misconception, which I've seen in more than one place now, stems from a fundamental misunderstanding of the paper's results. While they _did_ come up with a faster TPU/GPU algorithm too, the primary result is not a fast matmul approximation: it is an exact algorithm comprising stepwise addition and multiplication operations, and hence is numerically stable and should work for any ring (<a href="https://ncatlab.org/nlab/show/ring" rel="nofollow">https://ncatlab.org/nlab/show/ring</a>). AlphaTensor itself does not do the matrix multiplication; it was used to perform an (efficiently pruned) tree search over the space of operations to find an efficient, stable algorithm.</p>
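<p>For intuition, here is the classic example of such an exact bilinear algorithm: Strassen's 7-multiplication scheme for 2x2 matrices (my illustration, not one of AlphaTensor's discovered algorithms). Every step is a ring addition, subtraction, or multiplication, so over the integers the result is exact, with no rounding anywhere:</p>

```python
def strassen_2x2(A, B):
    # Strassen's 7-multiplication scheme for 2x2 matrices. Only ring
    # operations (add, subtract, multiply) appear, so the output is
    # exact over the integers: no floating point, no approximation.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    # recombine the 7 products into the 4 entries of A @ B
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4, p1 + p5 - p3 - p7]]
```

<p>AlphaTensor's contribution is finding recombination schemes like this with fewer multiplications for larger block sizes; the resulting algorithms have the same exactness property by construction.</p>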
]]></description><pubDate>Fri, 07 Oct 2022 16:27:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=33123489</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=33123489</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33123489</guid></item><item><title><![CDATA[New comment by hackpert in "Why peer to peer digital payment system UPI should remain free in India"]]></title><description><![CDATA[
<p>UPI's penetration in India (urban and semi-urban, anyway) is honestly incredible. I worked with/on the tech when it was very nascent in 2016, and on visiting India in 2022 after a long time away, I was stunned to see that _everyone_ has a little PayTM QR-code card: vegetable vendors, taxi drivers, roadside hawkers, small business owners. It's brilliant because the system is banking the traditionally unbanked and generating tremendous amounts of data that can hopefully be put to good use by economists, honestly unlike any other system in the world. Even MPesa in Africa has stupidly high withdrawal and transaction fees, which UPI doesn't need at all.</p>
]]></description><pubDate>Tue, 13 Sep 2022 14:11:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=32824506</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=32824506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32824506</guid></item><item><title><![CDATA[New comment by hackpert in "The Follower: Using open cameras and AI to find how an Instagram photo is taken"]]></title><description><![CDATA[
<p>That is actually pretty freaking cool. Not hard to do by any means, just cool.</p>
]]></description><pubDate>Mon, 12 Sep 2022 13:18:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=32809700</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=32809700</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32809700</guid></item><item><title><![CDATA[New comment by hackpert in "Ask HN: Why is there no performant remote desktop for Mac/Linux?"]]></title><description><![CDATA[
<p>x2go works quite well, actually. It's definitely not like local, but it does the job.</p>
]]></description><pubDate>Sat, 20 Aug 2022 13:49:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=32532046</link><dc:creator>hackpert</dc:creator><comments>https://news.ycombinator.com/item?id=32532046</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32532046</guid></item></channel></rss>