<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: pleshkov</title><link>https://news.ycombinator.com/user?id=pleshkov</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 10 May 2026 08:42:40 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=pleshkov" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by pleshkov in "A polynomial autoencoder beats PCA on transformer embeddings"]]></title><description><![CDATA[
<p>I agree that we don't want to reconstruct the whole vector during retrieval, and that keeps poly-AE toy-like in its current state, not production ready. My main interest here is just squeezing out more recall pp in closed form, and then thinking about how to make it fast. Across these threads I've gotten good intermediate ideas that may help me bring it closer to a production form.</p>
]]></description><pubDate>Sat, 09 May 2026 23:06:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48079159</link><dc:creator>pleshkov</dc:creator><comments>https://news.ycombinator.com/item?id=48079159</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48079159</guid></item><item><title><![CDATA[New comment by pleshkov in "A polynomial autoencoder beats PCA on transformer embeddings"]]></title><description><![CDATA[
<p>Just checked the normalization point. You were partially right: sqrt-normalization cuts the difference roughly in half. I'm updating the numbers in the post.
One interesting detail: I did a smoke test of poly-AE without whitening, and the result didn't change. I won't mention it in the post because right now I'm not sure whether it's a random effect or whether the polynomial lift really compensates for normalization.</p>
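<p>For anyone reproducing the smoke test, here is a minimal sketch of the kind of PCA-whitening step being ablated. This assumes standard PCA whitening; it is not necessarily the exact code from the post, and the function name is illustrative.</p><pre><code>import numpy as np

def whiten(X, eps=1e-8):
    # PCA whitening: center, rotate onto principal axes, and rescale
    # each axis to unit variance so the cloud becomes isotropic.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    std = S / np.sqrt(len(X) - 1)   # per-axis std in the rotated basis
    return (Xc @ Vt.T) / (std + eps)
</code></pre>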
]]></description><pubDate>Sat, 09 May 2026 22:29:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48078897</link><dc:creator>pleshkov</dc:creator><comments>https://news.ycombinator.com/item?id=48078897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48078897</guid></item><item><title><![CDATA[New comment by pleshkov in "A polynomial autoencoder beats PCA on transformer embeddings"]]></title><description><![CDATA[
<p>The polynomial lift in this post originally came out of an unsuccessful experiment with hyperbolic embeddings. The idea was to embed corpora into a hyperbolic ball (anisotropic embeddings have a tree-like structure that hyperbolic space could exploit). The lift was a tool to map the hyperbolic latent back to Euclidean space for retrieval. The hyperbolic part didn't work, but the lift, evaluated standalone, kept showing real signal, and that became this post.</p>
]]></description><pubDate>Sat, 09 May 2026 10:18:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=48073715</link><dc:creator>pleshkov</dc:creator><comments>https://news.ycombinator.com/item?id=48073715</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48073715</guid></item><item><title><![CDATA[New comment by pleshkov in "A polynomial autoencoder beats PCA on transformer embeddings"]]></title><description><![CDATA[
<p>Fair point — lam is technically a hyperparameter. In practice I used lam=1e-3 (the default in the code) across all four models without tuning, and the gap to PCA is robust enough that small variations don't change the conclusion. So more accurately: "one hyperparameter with a benign default" rather than "no hyperparameters" — you're right, I overstated.</p>
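<p>For reference, a minimal sketch of the closed-form fit where lam enters. The names here are illustrative rather than the post's actual code; the only assumption is a standard ridge solve for the decoder weights over the lifted features.</p><pre><code>import numpy as np

def fit_ridge_decoder(Phi, X, lam=1e-3):
    # Closed-form ridge regression: W = (Phi^T Phi + lam*I)^{-1} Phi^T X.
    # lam is the single hyperparameter discussed above; 1e-3 is the default.
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ X)
</code></pre>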
]]></description><pubDate>Fri, 08 May 2026 15:33:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=48064610</link><dc:creator>pleshkov</dc:creator><comments>https://news.ycombinator.com/item?id=48064610</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48064610</guid></item><item><title><![CDATA[New comment by pleshkov in "A polynomial autoencoder beats PCA on transformer embeddings"]]></title><description><![CDATA[
<p>Good catch; this is the obvious ablation I should have included. I'll re-run with per-axis normalized PCA as a separate baseline and post numbers in this thread tomorrow.
My prior: I expect some of the gap to come from normalization, but not all of it. The no-improvement results on isotropic datasets (§4) suggest there's signal the polynomial cross-terms catch that a linear projection structurally can't. But that's a prediction; let me actually run it.</p>
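<p>For concreteness, a sketch of what I mean by that baseline, assuming per-axis normalization means standardizing each dimension before the PCA fit. The function name is illustrative.</p><pre><code>import numpy as np
from sklearn.decomposition import PCA

def per_axis_normalized_pca(X, k):
    # Standardize each axis (zero mean, unit variance), then fit PCA.
    # If this closes the gap, the win was normalization, not the lift.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12
    return PCA(n_components=k).fit((X - mu) / sigma), mu, sigma
</code></pre>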
]]></description><pubDate>Fri, 08 May 2026 15:23:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48064486</link><dc:creator>pleshkov</dc:creator><comments>https://news.ycombinator.com/item?id=48064486</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48064486</guid></item><item><title><![CDATA[New comment by pleshkov in "A polynomial autoencoder beats PCA on transformer embeddings"]]></title><description><![CDATA[
<p>Author here. Fair characterization, and a fair critique of the geometric story.
A few clarifications. I don't claim {x_i, x_i·x_j} is the right lift specifically — the post itself shows datasets where the quadratic decoder gives essentially no improvement over PCA. The contribution is empirical: "second-order is the simplest nonlinear decoder you can fit in closed form, and on anisotropic embeddings it picks up real signal that linear decoders miss."
Whether degree 3 would help further is an open question. The feature count blows up fast: at d=100 that's roughly 175K features, and the Ridge solve at that scale starts memorizing the corpus rather than generalizing (§7 in the post discusses this trap already at d=256). So degree 2 is partly a choice, partly a practical ceiling for the closed-form route.</p>
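<p>The blow-up is simple counting (stars and bars). A quick sketch, with the caveat that it counts the full lift with all cross-terms, which comes out slightly above the ~175K figure I quoted:</p><pre><code>from math import comb

def lift_size(d, degree):
    # Monomials of exact degree k in d variables: C(d + k - 1, k).
    # Sum over k = 1..degree, no constant term.
    return sum(comb(d + k - 1, k) for k in range(1, degree + 1))

print(lift_size(100, 2))  # 5150   -> the degree-2 lift {x_i, x_i*x_j}
print(lift_size(100, 3))  # 176850 -> roughly the 175K quoted above
</code></pre>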
]]></description><pubDate>Fri, 08 May 2026 15:16:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=48064372</link><dc:creator>pleshkov</dc:creator><comments>https://news.ycombinator.com/item?id=48064372</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48064372</guid></item><item><title><![CDATA[New comment by pleshkov in "A polynomial autoencoder beats PCA on transformer embeddings"]]></title><description><![CDATA[
<p>Author here — questions and pushback both welcome.</p>
]]></description><pubDate>Tue, 05 May 2026 11:32:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48021033</link><dc:creator>pleshkov</dc:creator><comments>https://news.ycombinator.com/item?id=48021033</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48021033</guid></item></channel></rss>