<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: hashta</title><link>https://news.ycombinator.com/user?id=hashta</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 07 May 2026 14:28:56 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=hashta" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by hashta in "A Theory of Deep Learning"]]></title><description><![CDATA[
<p>Interesting read. I remember the grokking paper when it came out, but I don't think I've ever seen that classic grokking loss curve myself on real data. Curious whether others have seen it more often in practice.</p>
]]></description><pubDate>Wed, 06 May 2026 21:04:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=48041815</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=48041815</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48041815</guid></item><item><title><![CDATA[New comment by hashta in "Apple picks Gemini to power Siri"]]></title><description><![CDATA[
<p>This also addresses something else ...<p>Apple, to some users: "Are you leaving for Android because of their AI assistant? Don’t leave, we’re bringing it to the iPhone."</p>
]]></description><pubDate>Mon, 12 Jan 2026 20:43:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=46594051</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=46594051</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46594051</guid></item><item><title><![CDATA[New comment by hashta in "Apple picks Gemini to power Siri"]]></title><description><![CDATA[
<p>I’m a long-time Android user and almost switched to iPhone last year, mostly because I use macOS and wanted better integration, and also just wanted to try it. Another big factor was the AI assistant. I stayed with Android because I think Google will win here. Apple will probably avoid losing users to its biggest competitor by reaching rough parity using the same models.</p>
]]></description><pubDate>Mon, 12 Jan 2026 19:55:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46593362</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=46593362</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46593362</guid></item><item><title><![CDATA[New comment by hashta in "SimpleFold: Folding proteins is simpler than you think"]]></title><description><![CDATA[
<p>It’s literally called "SimpleFold". But that’s not really my point: from your earlier comment (".. go through all the complexities first to find the generalized and simpler formulations"), I got the impression you thought the simplicity came purely from architectural insights. My point was just that, to compare apples to apples, a model claiming "simpler but just as good" should ideally train on the same kind of data as AF, or at least acknowledge very clearly that a substantial amount of its training data comes from AF.<p>I’m not trying to knock the work; I think it’s genuinely cool and a great engineering result. I just wanted to flag that nuance for readers who might not have the time or background to spot it, and I get that part of the "simple/simpler" messaging is also about attracting attention, which clearly worked!</p>
]]></description><pubDate>Fri, 26 Sep 2025 23:50:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45392202</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=45392202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45392202</guid></item><item><title><![CDATA[New comment by hashta in "SimpleFold: Folding proteins is simpler than you think"]]></title><description><![CDATA[
<p>To people outside the field, the title/abstract can make it sound like folding is just inherently simple now, but this model wouldn’t exist without the large synthetic dataset produced by the more complex AF. The "simple" architecture is still using the complex model indirectly, through distillation. We didn’t really extract new tricks to design a simpler model from scratch; we shifted the complexity from the model space into the data space (think GPT-5 => GPT-5-mini: there’s no GPT-5-mini without GPT-5).</p>
]]></description><pubDate>Fri, 26 Sep 2025 20:41:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=45390814</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=45390814</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45390814</guid></item><item><title><![CDATA[New comment by hashta in "SimpleFold: Folding proteins is simpler than you think"]]></title><description><![CDATA[
<p>I’m not sure AF3’s performance would hold up if it hadn’t been trained on data from AF2, which itself bakes in a lot of inductive bias, like equivariance.</p>
]]></description><pubDate>Fri, 26 Sep 2025 19:40:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=45390263</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=45390263</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45390263</guid></item><item><title><![CDATA[New comment by hashta in "SimpleFold: Folding proteins is simpler than you think"]]></title><description><![CDATA[
<p>One caveat that’s easy to miss: the "simple" model here didn’t just learn folding from raw experimental structures. Most of its training data comes from AlphaFold-style predictions: millions of protein structures that were themselves generated by big, highly engineered MSA-based models.<p>It’s not like we can throw away all the inductive biases and MSA machinery; someone upstream still had to build and run those models to create the training corpus.</p>
]]></description><pubDate>Fri, 26 Sep 2025 19:32:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=45390170</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=45390170</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45390170</guid></item><item><title><![CDATA[New comment by hashta in "Replace OCR with Vision Language Models"]]></title><description><![CDATA[
<p>An effective way to increase accuracy is to use an ensemble of capable models that were trained independently (e.g., gemini, gpt-4o, qwen). If >x% of them produce the same output, accept it; otherwise, reject it and review manually.</p>
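<p>A minimal sketch of that accept/reject rule (the default threshold and the exact-string-match criterion are assumptions; in practice you might normalize whitespace or casing before comparing):</p>

```python
from collections import Counter

def ensemble_accept(outputs, threshold=0.5):
    """Accept a transcription only if more than `threshold` of the
    independently trained models produced the exact same output;
    otherwise flag the item for manual review.

    `outputs`: one string per model (e.g. gemini, gpt-4o, qwen).
    Returns (accepted, value_or_None).
    """
    value, votes = Counter(outputs).most_common(1)[0]
    if votes / len(outputs) > threshold:
        return True, value
    return False, None
```

<p>For example, with three models and the default threshold, two matching outputs are enough to accept; three mutually different outputs get routed to manual review.</p>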
]]></description><pubDate>Thu, 27 Feb 2025 00:48:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=43190115</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=43190115</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43190115</guid></item><item><title><![CDATA[New comment by hashta in "Why do tree-based models still outperform deep learning on tabular data? (2022)"]]></title><description><![CDATA[
<p>To both questions above: simple averaging of the logits (classification) or raw outputs (regression) usually works well. If I had to guess why people don't use this approach more often in Kaggle competitions, I'd say it's the relative difficulty of training an ensemble of NNs. Also, NNs are a bit more sensitive than decision trees (DTs) to the type of features used and their distribution.<p>Ensemble models work well because they reduce both bias and variance errors. Like DTs, NNs have low bias error and high variance error when used individually. The variance error drops as you use more learners (DTs/NNs) in the ensemble, and the more diverse the learners, the lower the overall error.<p>Simple ways to promote diversity among the NNs in the ensemble are to initialize their weights from different random seeds and to train each one on a random sample of the overall training set (say 70-80%, without replacement).</p>
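<p>The averaging and subsampling steps above can be sketched like this (numpy only; the models here are stand-in callables that return per-class logits, not a real training loop):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_indices(n, frac=0.8):
    # each network gets its own ~80% draw of the training set,
    # sampled without replacement, to promote diversity
    return rng.choice(n, size=int(frac * n), replace=False)

def ensemble_predict(models, X):
    # average the raw logits across the networks, then argmax;
    # for regression you would average the raw outputs instead
    logits = np.mean([m(X) for m in models], axis=0)
    return logits.argmax(axis=1)
```

<p>Different random seeds per network would go into each model's weight initializer; only the subsampling half of that recipe is shown here.</p>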
]]></description><pubDate>Wed, 06 Mar 2024 02:54:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=39611738</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=39611738</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39611738</guid></item><item><title><![CDATA[New comment by hashta in "Why do tree-based models still outperform deep learning on tabular data? (2022)"]]></title><description><![CDATA[
<p>I have a lot of experience working with both families of models. If you use an ensemble of 10 NNs, it outperforms well-optimized tree-based models such as XGBoost & RFs.</p>
]]></description><pubDate>Tue, 05 Mar 2024 21:51:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=39609576</link><dc:creator>hashta</dc:creator><comments>https://news.ycombinator.com/item?id=39609576</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39609576</guid></item></channel></rss>