Hacker News: mlpro

New comment by mlpro in "Ideas are cheap, execution is cheaper"

mlpro — Fri, 16 Jan 2026 03:09:19 +0000

Novel ideas are never cheap, lol.

New comment by mlpro in "Universal Reasoning Model (53.8% pass 1 ARC1 and 16.0% ARC 2)"

mlpro — Tue, 23 Dec 2025 07:31:33 +0000

Lol. Trying to copy the Universal Weight Subspace paper's naming to get famous.

New comment by mlpro in "Coarse is better"

mlpro — Sun, 21 Dec 2025 23:07:36 +0000

Lol, yeah.

New comment by mlpro in "TRELLIS.2: state-of-the-art large 3D generative model (4B)"

mlpro — Sun, 21 Dec 2025 23:03:31 +0000

Oh, look - a new 3D model with a new idea - more data.

New comment by mlpro in "You have reached the end of the internet (2006)"

mlpro — Sun, 21 Dec 2025 23:01:03 +0000

I don't understand.

New comment by mlpro in "Waymo halts service during S.F. blackout after causing traffic jams"

mlpro — Sun, 21 Dec 2025 23:00:18 +0000

Waymo should do a bit more research on the reliability and explainability of their AI models.

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Thu, 11 Dec 2025 02:00:30 +0000

Read the paper end to end today. I think it's one of the most outrageous ideas of 2025, at least among the papers I've read. So counterintuitive initially, and yet so intuitive. Personally, I kinda hate the implications. But a paper like this was definitely needed.

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Tue, 09 Dec 2025 16:04:48 +0000

They are not trained on the same data. Even a skim of the paper shows very disjoint data.

The LLMs are finetuned on very disjoint data. I checked: some are finetuned on Chinese and others on math. The pretrained model provides a good initialization. I'm convinced.

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Tue, 09 Dec 2025 16:00:09 +0000

I think it's very surprising, although I would like the paper to show more experiments (they already have a lot, I know).

The ViT models are never really trained from scratch - they are always finetuned, as they require large amounts of data to converge nicely. The pretraining just provides a nice initialization. Why would one expect two ViTs finetuned on two different things - image and text classification - to end up in the same subspace, as they show? I think this is groundbreaking.

I don't really agree with the idea that the models don't drift far from the parent model. I think they drift pretty far in terms of their norms. Even the small LoRA adapters drift pretty far from the base model.
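
One way to make "drift in terms of norms" concrete is to compare the Frobenius norm of a low-rank LoRA update with the norm of the base weight matrix. The sketch below is only an illustration under stated assumptions: the matrices are toy values, and `relative_drift`, `lora_b`, and `lora_a` are hypothetical names, not anything taken from the paper or from a LoRA library.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def frobenius_norm(m):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))

def relative_drift(base, lora_b, lora_a):
    """Norm of the low-rank update B @ A relative to the base weight norm."""
    delta = matmul(lora_b, lora_a)
    return frobenius_norm(delta) / frobenius_norm(base)

# Toy values (assumed for illustration): a 2x2 base weight and a rank-1 update.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[0.5], [0.0]]   # shape 2x1
lora_a = [[1.0, 1.0]]     # shape 1x2

drift = relative_drift(base, lora_b, lora_a)
print(f"relative drift: {drift:.3f}")
```

A ratio near zero would mean the adapter barely moves the weights; what counts as "pretty far" is a judgment call that this sketch does not settle.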

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Tue, 09 Dec 2025 05:50:53 +0000

Why would they be similar if they are trained on very different data? Also, models trained from scratch are analyzed too, imo.

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Tue, 09 Dec 2025 05:38:08 +0000

It's about weights/parameters, not representations.

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Tue, 09 Dec 2025 03:13:44 +0000

The analysis is on image classification, LLMs, Diffusion models, etc.

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Tue, 09 Dec 2025 03:12:48 +0000

It does seem to be working for novel tasks.

New comment by mlpro in "The universal weight subspace hypothesis"

mlpro — Tue, 09 Dec 2025 03:11:56 +0000

Not really. If the models are trained on different datasets - like one ViT trained on satellite images and another on medical X-rays - one would expect their parameters, which were randomly initialized, to be completely different or even orthogonal.
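
The intuition that independently initialized parameters are close to orthogonal can be checked with a small experiment. This is a minimal sketch, assuming Gaussian initialization and treating each model's weights as one flat vector; `random_weights` and the dimension are hypothetical choices for illustration, and the sketch says nothing about what happens after training.

```python
import math
import random

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def random_weights(dim, rng):
    """Flat weight vector drawn i.i.d. from a standard Gaussian (assumed init)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)   # fixed seed so the run is repeatable
dim = 10_000             # toy size; real models have far more parameters

w_a = random_weights(dim, rng)  # stands in for "model A" at initialization
w_b = random_weights(dim, rng)  # stands in for "model B" at initialization

cos = cosine_similarity(w_a, w_b)
print(f"cosine similarity between independent inits: {cos:.4f}")
```

For independent Gaussian vectors the cosine similarity concentrates around zero with spread on the order of 1/sqrt(dim), which is why a shared subspace after training would be surprising under this view.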