<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: noahho</title><link>https://news.ycombinator.com/user?id=noahho</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 12:38:57 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=noahho" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (April 2026)"]]></title><description><![CDATA[
<p>Prior Labs | Berlin / Freiburg / NYC | ONSITE | Full-time | Multiple Roles<p>Tables power every clinical trial, financial model, and scientific experiment, but deep learning has mostly ignored them. No natural sequence, no spatial structure, no shared vocabulary across datasets. LLM architectures don't transfer. We built TabPFN, the first foundation model that actually understands tabular data (published in Nature, 3M+ downloads, new SOTA for tabular ML). The hardest problems are still open.<p>The model is half the product. The other half - training infrastructure, real-time serving, developer platform, reliability - is what turns a research breakthrough into something enterprises trust in production. We're hiring across both.<p>- ML Engineer, Training Infrastructure — Own GPU infrastructure, distributed training performance, and the developer productivity layer (CI, experiment tracking, model registry) that keeps research moving fast.<p>- Full Stack Engineer, ML Platform — Build the product that puts tabular foundation models in users' hands, from data upload through inference and results. You'll work across frontend, backend, and directly with the research team to turn new model capabilities into production features.<p>- Research Engineer, Foundation Model — Design experiments, run ablations, build training infrastructure, contribute to papers. Research engineers here aren't supporting scientists — they are the science team.<p>- ML Engineer, Cloud Platform — Design and scale the core infrastructure for serving and finetuning foundation models in production. Early enough that you're making the architecture decisions, not inheriting them.<p>Also hiring: Research Scientist, Applied Scientist, Forward Deployed ML Engineer, Developer Relations Engineer, AE, BDR.<p>20-person team selected from thousands of applicants. Backgrounds from Jane Street, Google, CERN, G-Research. Led by Frank Hutter, advised by Yann LeCun and Bernhard Schölkopf. 
With backing from XTX Ventures, Balderton, and leaders at Hugging Face, DeepMind, and Black Forest Labs.<p>All roles: <a href="https://jobs.priorlabs.ai" rel="nofollow">https://jobs.priorlabs.ai</a></p>
]]></description><pubDate>Wed, 01 Apr 2026 19:17:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47605247</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=47605247</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47605247</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (March 2026)"]]></title><description><![CDATA[
<p>Prior Labs | Berlin / Freiburg / NYC | ONSITE & REMOTE (EU) | Full-time<p>Tables power every clinical trial, financial model, and scientific experiment, but deep learning has mostly ignored them. No natural sequence, no spatial structure, no shared vocabulary across datasets. LLM architectures don't transfer. We built TabPFN, the first foundation model that actually understands tabular data (published in Nature, 3M+ downloads, new SOTA for tabular ML). The hardest problems are still open.<p>The model is half the product. The other half - training infrastructure, real-time serving, developer platform, reliability - is what turns a research breakthrough into something enterprises trust in production. We're hiring across both.<p>ML Engineer, Cloud Platform — Design and scale the core infrastructure for serving and finetuning foundation models in production. Early enough that you're making the architecture decisions, not inheriting them.<p>ML Engineer, Training Infrastructure — Own GPU infrastructure, distributed training performance, and the developer productivity layer (CI, experiment tracking, model registry) that keeps research moving fast.<p>Full Stack Engineer, ML Platform — Build the product that puts tabular foundation models in users' hands, from data upload through inference and results. You'll work across frontend, backend, and directly with the research team to turn new model capabilities into production features.<p>Research Engineer, Foundation Model — Design experiments, run ablations, build training infrastructure, contribute to papers. Research engineers here aren't supporting scientists — they are the science team.<p>Also hiring: Research Scientist, Applied Scientist, Forward Deployed ML Engineer, Developer Relations Engineer, AE, BDR.<p>20-person team selected from thousands of applicants. Backgrounds from Jane Street, Google, CERN, G-Research. Led by Frank Hutter, advised by Yann LeCun and Bernhard Schölkopf. 
With backing from Balderton, XTX Ventures and leaders at Hugging Face, DeepMind, and Black Forest Labs. Comp competitive with top AI labs, meaningful equity.<p>Apply at: <a href="https://jobs.priorlabs.ai" rel="nofollow">https://jobs.priorlabs.ai</a></p>
]]></description><pubDate>Mon, 02 Mar 2026 16:08:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47219823</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=47219823</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47219823</guid></item><item><title><![CDATA[TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2511.08667">https://arxiv.org/abs/2511.08667</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45952746">https://news.ycombinator.com/item?id=45952746</a></p>
<p>Points: 7</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 17 Nov 2025 11:38:20 +0000</pubDate><link>https://arxiv.org/abs/2511.08667</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45952746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45952746</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>Yes exactly, the API is the best way to handle text features. The actual semantics often matter a lot. Is the API an option for you, or would you need this to run locally?</p>
]]></description><pubDate>Thu, 06 Nov 2025 20:34:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45840059</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45840059</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45840059</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>Less feature engineering is definitely something we are aiming for. The current version is actually based only on statistics; the real-world connections between features are something we're working on right now and hope to show results for soon. That's the next step.</p>
]]></description><pubDate>Thu, 06 Nov 2025 20:09:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839776</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45839776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839776</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>When we released TabPFNv1 over three years ago, I didn’t expect at all the hundreds of comments and reposts we would see. Tabular data had been a field getting little love from AI research—but we immediately felt that this was a topic that data scientists, scientists, financial analysts, and enterprise users deeply cared about. Glad it's useful to people!</p>
]]></description><pubDate>Thu, 06 Nov 2025 20:07:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839747</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45839747</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839747</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>TabPFN-2.5 default (one forward pass) matches AutoGluon 1.4 tuned for four hours. AutoGluon is the strongest AutoML system, including stacked ensembles of XGBoost and CatBoost, and it even includes the previous TabPFNv2.</p>
]]></description><pubDate>Thu, 06 Nov 2025 19:39:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839362</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45839362</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839362</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (May 2025)"]]></title><description><![CDATA[
<p>Prior Labs | Founding-Team: Software Engineer, Data Scientist, ML Engineer, Product Manager, Developer Relations | On-Site (Berlin, Freiburg) | Full-time<p>Prior Labs is building foundation models for structured/tabular data – AI's biggest blind spot. While LLMs handle text/images, tables (numbers, categories, text mix) need native understanding. Our approach, TabPFN (published in Nature, 1M+ downloads, 3k+ GitHub stars), uses transformers pre-trained on synthetic data to achieve state-of-the-art results on small datasets zero-shot.<p>We're tackling a $100B+ opportunity to transform data science, finance, healthcare, and science. Backed by €9M pre-seed from Balderton, XTX Ventures, and leaders from Hugging Face, DeepMind, etc.<p>We're hiring a founding team to build this new category:<p>Software Engineer: Build scalable infra & APIs. Data Scientist / ML Engineer: Optimize & scale our TFMs. Product Manager: Define the vision for AI-native tabular tools. Developer Relations: Grow our community & drive adoption. Location: On-site in Berlin or Freiburg (we build together). Offer: Competitive salary + significant equity. Shape the future of AI for structured data from day one.<p>Apply: <a href="https://jobs.ashbyhq.com/prior-labs" rel="nofollow">https://jobs.ashbyhq.com/prior-labs</a><p>Questions? Reach out at noah@priorlabs.ai</p>
]]></description><pubDate>Mon, 05 May 2025 16:41:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43896957</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43896957</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43896957</guid></item><item><title><![CDATA[AI's Blind Spot: The Structured Data Challenge]]></title><description><![CDATA[
<p>Article URL: <a href="https://priorlabs.ai/vision">https://priorlabs.ai/vision</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43783345">https://news.ycombinator.com/item?id=43783345</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 24 Apr 2025 14:36:36 +0000</pubDate><link>https://priorlabs.ai/vision</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43783345</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43783345</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (April 2025)"]]></title><description><![CDATA[
<p>Prior Labs | Founding-Team: Software Engineer, Data Scientist, ML Engineer, Product Manager, Developer Relations | On-Site (Berlin, Freiburg) | Full-time<p>Prior Labs is building foundation models for structured/tabular data – AI's biggest blind spot. While LLMs handle text/images, tables (numbers, categories, text mix) need native understanding. Our approach, TabPFN (published in Nature, 1M+ downloads, 3k+ GitHub stars), uses transformers pre-trained on synthetic data to achieve state-of-the-art results on small datasets zero-shot.<p>We're tackling a $100B+ opportunity to transform data science, finance, healthcare, and science. Backed by €9M pre-seed from Balderton, XTX Ventures, and leaders from Hugging Face, DeepMind, etc.<p>We're hiring a founding team to build this new category:<p>Software Engineer: Build scalable infra & APIs.
Data Scientist / ML Engineer: Optimize & scale our TFMs.
Product Manager: Define the vision for AI-native tabular tools.
Developer Relations: Grow our community & drive adoption.
Location: On-site in Berlin or Freiburg (we build together).
Offer: Competitive salary + significant equity. Shape the future of AI for structured data from day one.<p>Apply: <a href="https://jobs.ashbyhq.com/prior-labs" rel="nofollow">https://jobs.ashbyhq.com/prior-labs</a><p>Questions? Reach out at noah@priorlabs.ai</p>
]]></description><pubDate>Fri, 11 Apr 2025 11:59:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=43652851</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43652851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43652851</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (March 2025)"]]></title><description><![CDATA[
<p>Prior Labs | Software Engineer, Product Manager, Developer Relations, ML Engineer | On-Site (Berlin, Freiburg) | Full-time<p>AI has transformed text, images, and code—but structured data remains overlooked. Prior Labs is an early-stage startup building Foundation Models for tabular data, unlocking a new AI modality with the potential to transform science, finance, healthcare, and data science itself. Our model, published in Nature, is already state-of-the-art for small datasets, and we’re scaling this into a step-change for data science.<p>We’re backed by Balderton, XTX Ventures, and leaders from Hugging Face, Black Forest Labs, DeepMind, DataRobot, and others. Our team includes world-class researchers and engineers from top AI labs, and we’re growing fast.<p>We're hiring founding engineers & builders to help define this new category:<p>Software Engineer – Build scalable infrastructure and APIs to integrate our models into real-world applications.<p>Product Manager – Define and execute the vision for AI-native tools for structured data.<p>Developer Relations – Grow the developer community, drive adoption, and showcase use cases.<p>ML Engineer – Optimize and scale our foundation models for structured data.<p>Location: On-site in Berlin or Freiburg (we believe in building together).
Why Join? Competitive salary + equity, shape the future of foundation models for structured data from day one.<p>Apply now: <a href="https://jobs.ashbyhq.com/prior-labs" rel="nofollow">https://jobs.ashbyhq.com/prior-labs</a><p>Questions? Reach out at noah@priorlabs.ai</p>
]]></description><pubDate>Tue, 04 Mar 2025 09:52:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=43252573</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43252573</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43252573</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Thanks a lot! We currently have an open issue on documenting how to use it with more samples at <a href="https://github.com/PriorLabs/TabPFN/issues/129">https://github.com/PriorLabs/TabPFN/issues/129</a>. Will do this soon; give it an upvote there if it matters to you.</p>
]]></description><pubDate>Mon, 13 Jan 2025 18:40:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686915</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686915</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Yes! This makes sense from a learning perspective: more samples add additional evidence that the datapoint is actually what you observed - based on one sample, the model stays closer to a mean regression (which would translate to more balanced class probabilities in classification).
Transformers have trouble counting repeated entries (there was a famous ChatGPT failure case: asking it to count the number of 1s and 0s in a string). This model has some tricks to solve this.</p>
]]></description><pubDate>Mon, 13 Jan 2025 18:39:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686895</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686895</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686895</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>If you're predicting on text data: our public models don't handle text natively, they would encode it as classes. Our API (<a href="https://github.com/PriorLabs/tabpfn-client/">https://github.com/PriorLabs/tabpfn-client/</a>) has experimental support.</p>
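To illustrate what "encode as classes" means in practice, here is a minimal, hypothetical sketch (not the actual TabPFN preprocessing): each distinct string becomes an opaque integer id, and the semantics of the text are lost.

```python
# Hypothetical sketch: map each distinct string in a text column to an
# integer class id, discarding what the strings actually mean.
def encode_as_classes(column):
    mapping = {}
    codes = []
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)  # ids assigned in order of first appearance
        codes.append(mapping[value])
    return codes, mapping

reviews = ["great product", "terrible", "great product", "okay"]
codes, mapping = encode_as_classes(reviews)
print(codes)  # [0, 1, 0, 2] -- "great product" and "terrible" are just ids 0 and 1
```

This is why the API's text support matters: under class encoding, two semantically opposite strings are no more different than any two category labels.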
]]></description><pubDate>Mon, 13 Jan 2025 18:30:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686775</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686775</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686775</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Thanks a lot! We don't see clear artifacts from the synthetic data. Part of the "trick" is to keep the capacity of our model low - it has only about 11M parameters. That forces the model to "learn an in-context learning algorithm", or in other words, to "do in-context learning rather than in-weights learning".
Adding real data on top will help, agreed! The synthetic data is very broad: we started with a prior that was just BNN samples of differing sizes, and thus super broad. Our new prior samples functions that are simpler to explain more densely, but it can still sample almost any function (with the constraint that our networks aren't infinitely complex).</p>
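For intuition, a BNN-style prior can be caricatured in a few lines. This is a hedged sketch under strong simplifications, not the actual TabPFN prior: draw one random small network, push random inputs through it, and label points by thresholding the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_samples=128, n_features=5, hidden=16):
    """Draw one random MLP (a crude stand-in for a BNN sample) and use it
    to generate a labeled classification dataset. Illustrative only."""
    X = rng.normal(size=(n_samples, n_features))
    W1 = rng.normal(size=(n_features, hidden))
    W2 = rng.normal(size=(hidden, 1))
    logits = np.tanh(X @ W1) @ W2
    # threshold at the median so the binary labels come out balanced
    y = (logits.ravel() > np.median(logits)).astype(int)
    return X, y

X, y = sample_synthetic_dataset()  # each call yields a dataset from a fresh random function
```

Pre-training on millions of such draws (with varying sizes, architectures, and noise) is what pushes the model toward learning a general in-context learning algorithm rather than memorizing any one dataset.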
]]></description><pubDate>Mon, 13 Jan 2025 18:29:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686759</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686759</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686759</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Thanks a ton! If it's public, please share it in the Discord <a href="https://discord.com/channels/1285598202732482621/" rel="nofollow">https://discord.com/channels/1285598202732482621/</a> > #use-cases (just created!); if not, mail me at noah@priorlabs.ai</p>
]]></description><pubDate>Fri, 10 Jan 2025 13:29:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42655429</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42655429</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42655429</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Looks like a great use case! We have a method specifically for imputation in the tabpfn-extensions package (<a href="https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da25821135602fdc4d95cc8c217afbc3b0/src/tabpfn_extensions/unsupervised/unsupervised.py#L282">https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da...</a>). It needs some cleaning up before I want to highlight it in the notebooks and docs.</p>
]]></description><pubDate>Fri, 10 Jan 2025 10:28:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654431</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654431</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654431</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Up to 4 hrs of tuning per dataset / split (10-fold CV)</p>
]]></description><pubDate>Fri, 10 Jan 2025 10:25:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654422</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654422</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Author here! The fundamental challenge is that LLMs like o1 and Claude 3.5 simply aren't built for the unique structure of tabular data. When processing tables through LLMs, the inefficiencies quickly become apparent: tokenizing a 10,000 x 100 table as a sequence, with numerical values split into tokens, blows up the input enormously.<p>There's some interesting work on using LLMs for tabular data (TabLLM: <a href="https://proceedings.mlr.press/v206/hegselmann23a.html" rel="nofollow">https://proceedings.mlr.press/v206/hegselmann23a.html</a>), but this only works for datasets with tens of samples rather than the thousands of rows needed in real-world applications.<p>What o1 and other LLMs typically do is wrap around existing tabular tools like XGBoost or scikit-learn. While this works, they're ultimately constrained by those tools' limitations. We're taking a fundamentally different approach: building foundation models that natively understand tabular relationships and patterns, combining the benefits of foundation models with architectures specifically designed for tabular data structures.</p>
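To make the serialization cost concrete, a back-of-the-envelope count. The ~3 tokens per cell is an assumption (a few digits plus a delimiter); real tokenizers vary, but the order of magnitude holds.

```python
# Rough cost of feeding a table to an LLM as text.
# tokens_per_cell = 3 is an assumed figure, not a measured one.
rows, cols = 10_000, 100
tokens_per_cell = 3

total_tokens = rows * cols * tokens_per_cell
print(f"{total_tokens:,} tokens")  # 3,000,000 tokens before the model sees a single label
```

Three million tokens for one modest table exceeds most LLM context windows, which is why wrapping an LLM around the raw text of a table doesn't scale to real datasets.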
]]></description><pubDate>Fri, 10 Jan 2025 10:20:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654393</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654393</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654393</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Author here! The breast cancer dataset is simple and heavily saturated, so small differences between methods are expected. As you say, single-run examples can be noisy because of randomness in how the data is split into training and testing sets, especially for a saturated dataset like this one. Cross-validation reduces this variance by averaging over multiple splits. I just ran this below:<p><pre><code>  TabPFN mean ROC AUC: 0.9973

  SVM mean ROC AUC: 0.9903

  TabPFN per split: [0.99737963 0.99639699 0.99966931 0.99338624 0.99966465]

  SVM per split: [0.99312152 0.98788077 0.99603175 0.98313492 0.99128102]

  from sklearn.model_selection import cross_val_score
  from tabpfn import TabPFNClassifier
  from sklearn.datasets import load_breast_cancer
  from sklearn.svm import LinearSVC
  import numpy as np

  data = load_breast_cancer()
  X, y = data.data, data.target

  # TabPFN
  tabpfn_clf = TabPFNClassifier()
  tabpfn_scores = cross_val_score(tabpfn_clf, X, y, cv=5, scoring='roc_auc')
  print("TabPFN per split:", tabpfn_scores)
  print("TabPFN mean ROC AUC:", np.mean(tabpfn_scores))

  # SVM
  svm_clf = LinearSVC(C=0.01)
  svm_scores = cross_val_score(svm_clf, X, y, cv=5, scoring='roc_auc')
  print("SVM per split:", svm_scores)
  print("SVM mean ROC AUC:", np.mean(svm_scores))
</code></pre>
It's hard to communicate this properly; we should probably keep a favourable example ready, but we just included the simplest one!</p>
]]></description><pubDate>Fri, 10 Jan 2025 10:02:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654273</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654273</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654273</guid></item></channel></rss>