<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: noahho</title><link>https://news.ycombinator.com/user?id=noahho</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 12:38:57 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=noahho" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (April 2026)"]]></title><description><![CDATA[
<p>Prior Labs | Berlin / Freiburg / NYC | ONSITE | Full-time | Multiple Roles<p>Tables power every clinical trial, financial model, and scientific experiment, but deep learning has mostly ignored them. No natural sequence, no spatial structure, no shared vocabulary across datasets. LLM architectures don't transfer. We built TabPFN, the first foundation model that actually understands tabular data (published in Nature, 3M+ downloads, new SOTA for tabular ML). The hardest problems are still open.<p>The model is half the product. The other half - training infrastructure, real-time serving, developer platform, reliability - is what turns a research breakthrough into something enterprises trust in production. We're hiring across both.<p>- ML Engineer, Training Infrastructure — Own GPU infrastructure, distributed training performance, and the developer productivity layer (CI, experiment tracking, model registry) that keeps research moving fast.<p>- Full Stack Engineer, ML Platform — Build the product that puts tabular foundation models in users' hands, from data upload through inference and results. You'll work across frontend, backend, and directly with the research team to turn new model capabilities into production features.<p>- Research Engineer, Foundation Model — Design experiments, run ablations, build training infrastructure, contribute to papers. Research engineers here aren't supporting scientists — they are the science team.<p>- ML Engineer, Cloud Platform — Design and scale the core infrastructure for serving and finetuning foundation models in production. Early enough that you're making the architecture decisions, not inheriting them.<p>Also hiring: Research Scientist, Applied Scientist, Forward Deployed ML Engineer, Developer Relations Engineer, AE, BDR.<p>20-person team selected from thousands of applicants. Backgrounds from Jane Street, Google, CERN, G-Research. Led by Frank Hutter, advised by Yann LeCun and Bernhard Schölkopf. 
With backing from XTX Ventures, Balderton, and leaders at Hugging Face, DeepMind, and Black Forest Labs.<p>All roles: <a href="https://jobs.priorlabs.ai" rel="nofollow">https://jobs.priorlabs.ai</a></p>
]]></description><pubDate>Wed, 01 Apr 2026 19:17:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47605247</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=47605247</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47605247</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (March 2026)"]]></title><description><![CDATA[
<p>Prior Labs | Berlin / Freiburg / NYC | ONSITE & REMOTE (EU) | Full-time<p>Tables power every clinical trial, financial model, and scientific experiment, but deep learning has mostly ignored them. No natural sequence, no spatial structure, no shared vocabulary across datasets. LLM architectures don't transfer. We built TabPFN, the first foundation model that actually understands tabular data (published in Nature, 3M+ downloads, new SOTA for tabular ML). The hardest problems are still open.<p>The model is half the product. The other half - training infrastructure, real-time serving, developer platform, reliability - is what turns a research breakthrough into something enterprises trust in production. We're hiring across both.<p>ML Engineer, Cloud Platform — Design and scale the core infrastructure for serving and finetuning foundation models in production. Early enough that you're making the architecture decisions, not inheriting them.<p>ML Engineer, Training Infrastructure — Own GPU infrastructure, distributed training performance, and the developer productivity layer (CI, experiment tracking, model registry) that keeps research moving fast.<p>Full Stack Engineer, ML Platform — Build the product that puts tabular foundation models in users' hands, from data upload through inference and results. You'll work across frontend, backend, and directly with the research team to turn new model capabilities into production features.<p>Research Engineer, Foundation Model — Design experiments, run ablations, build training infrastructure, contribute to papers. Research engineers here aren't supporting scientists — they are the science team.<p>Also hiring: Research Scientist, Applied Scientist, Forward Deployed ML Engineer, Developer Relations Engineer, AE, BDR.<p>20-person team selected from thousands of applicants. Backgrounds from Jane Street, Google, CERN, G-Research. Led by Frank Hutter, advised by Yann LeCun and Bernhard Schölkopf. 
With backing from Balderton, XTX Ventures and leaders at Hugging Face, DeepMind, and Black Forest Labs. Comp competitive with top AI labs, meaningful equity.<p>Apply at: <a href="https://jobs.priorlabs.ai" rel="nofollow">https://jobs.priorlabs.ai</a></p>
]]></description><pubDate>Mon, 02 Mar 2026 16:08:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47219823</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=47219823</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47219823</guid></item><item><title><![CDATA[TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2511.08667">https://arxiv.org/abs/2511.08667</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45952746">https://news.ycombinator.com/item?id=45952746</a></p>
<p>Points: 7</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 17 Nov 2025 11:38:20 +0000</pubDate><link>https://arxiv.org/abs/2511.08667</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45952746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45952746</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>Yes exactly, the API is the best way to handle text features. The actual semantics often matter a lot. Is the API an option for you, or would you need this to run locally?</p>
]]></description><pubDate>Thu, 06 Nov 2025 20:34:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45840059</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45840059</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45840059</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>Less feature engineering is definitely something we are aiming for. The current version is actually based only on statistics; the real-world connections between features are something we're working on right now and hope to show results for soon. That's the next step.</p>
]]></description><pubDate>Thu, 06 Nov 2025 20:09:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839776</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45839776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839776</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>When we released TabPFNv1 over three years ago, I didn’t expect at all the hundreds of comments and reposts we would see. Tabular data had been a field getting little love from AI research—but we immediately felt that this was a topic that data scientists, scientists, financial analysts, and enterprise users deeply cared about. Glad it's useful to people!</p>
]]></description><pubDate>Thu, 06 Nov 2025 20:07:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839747</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45839747</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839747</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN-2.5 – SOTA foundation model for tabular data"]]></title><description><![CDATA[
<p>TabPFN-2.5 default (one forward pass) matches AutoGluon 1.4 tuned for four hours. AutoGluon is the strongest AutoML system, including stacked ensembles of XGBoost and CatBoost, and it even includes the previous TabPFNv2.</p>
]]></description><pubDate>Thu, 06 Nov 2025 19:39:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839362</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=45839362</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839362</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (May 2025)"]]></title><description><![CDATA[
<p>Prior Labs | Founding-Team: Software Engineer, Data Scientist, ML Engineer, Product Manager, Developer Relations | On-Site (Berlin, Freiburg) | Full-time<p>Prior Labs is building foundation models for structured/tabular data – AI's biggest blind spot. While LLMs handle text/images, tables (numbers, categories, text mix) need native understanding. Our approach, TabPFN (published in Nature, 1M+ downloads, 3k+ GitHub stars), uses transformers pre-trained on synthetic data to achieve state-of-the-art results on small datasets zero-shot.<p>We're tackling a $100B+ opportunity to transform data science, finance, healthcare, and science. Backed by €9M pre-seed from Balderton, XTX Ventures, and leaders from Hugging Face, DeepMind, etc.<p>We're hiring a founding team to build this new category:<p>Software Engineer: Build scalable infra & APIs. Data Scientist / ML Engineer: Optimize & scale our TFMs. Product Manager: Define the vision for AI-native tabular tools. Developer Relations: Grow our community & drive adoption. Location: On-site in Berlin or Freiburg (we build together). Offer: Competitive salary + significant equity. Shape the future of AI for structured data from day one.<p>Apply: <a href="https://jobs.ashbyhq.com/prior-labs" rel="nofollow">https://jobs.ashbyhq.com/prior-labs</a><p>Questions? Reach out at noah@priorlabs.ai</p>
]]></description><pubDate>Mon, 05 May 2025 16:41:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43896957</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43896957</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43896957</guid></item><item><title><![CDATA[AI's Blind Spot: The Structured Data Challenge]]></title><description><![CDATA[
<p>Article URL: <a href="https://priorlabs.ai/vision">https://priorlabs.ai/vision</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43783345">https://news.ycombinator.com/item?id=43783345</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 24 Apr 2025 14:36:36 +0000</pubDate><link>https://priorlabs.ai/vision</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43783345</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43783345</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (April 2025)"]]></title><description><![CDATA[
<p>Prior Labs | Founding-Team: Software Engineer, Data Scientist, ML Engineer, Product Manager, Developer Relations | On-Site (Berlin, Freiburg) | Full-time<p>Prior Labs is building foundation models for structured/tabular data – AI's biggest blind spot. While LLMs handle text/images, tables (numbers, categories, text mix) need native understanding. Our approach, TabPFN (published in Nature, 1M+ downloads, 3k+ GitHub stars), uses transformers pre-trained on synthetic data to achieve state-of-the-art results on small datasets zero-shot.<p>We're tackling a $100B+ opportunity to transform data science, finance, healthcare, and science. Backed by €9M pre-seed from Balderton, XTX Ventures, and leaders from Hugging Face, DeepMind, etc.<p>We're hiring a founding team to build this new category:<p>Software Engineer: Build scalable infra & APIs.
Data Scientist / ML Engineer: Optimize & scale our TFMs.
Product Manager: Define the vision for AI-native tabular tools.
Developer Relations: Grow our community & drive adoption.
Location: On-site in Berlin or Freiburg (we build together).
Offer: Competitive salary + significant equity. Shape the future of AI for structured data from day one.<p>Apply: <a href="https://jobs.ashbyhq.com/prior-labs" rel="nofollow">https://jobs.ashbyhq.com/prior-labs</a><p>Questions? Reach out at noah@priorlabs.ai</p>
]]></description><pubDate>Fri, 11 Apr 2025 11:59:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=43652851</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43652851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43652851</guid></item><item><title><![CDATA[New comment by noahho in "Ask HN: Who is hiring? (March 2025)"]]></title><description><![CDATA[
<p>Prior Labs | Software Engineer, Product Manager, Developer Relations, ML Engineer | On-Site (Berlin, Freiburg) | Full-time<p>AI has transformed text, images, and code—but structured data remains overlooked. Prior Labs is an early-stage startup building Foundation Models for tabular data, unlocking a new AI modality with the potential to transform science, finance, healthcare, and data science itself. Our model, published in Nature, is already state-of-the-art for small datasets, and we’re scaling this into a step-change for data science.<p>We’re backed by Balderton, XTX Ventures, and leaders from Hugging Face, Black Forest Labs, DeepMind, DataRobot, and others. Our team includes world-class researchers and engineers from top AI labs, and we’re growing fast.<p>We're hiring founding engineers & builders to help define this new category:<p>Software Engineer – Build scalable infrastructure and APIs to integrate our models into real-world applications.<p>Product Manager – Define and execute the vision for AI-native tools for structured data.<p>Developer Relations – Grow the developer community, drive adoption, and showcase use cases.<p>ML Engineer – Optimize and scale our foundation models for structured data.<p>Location: On-site in Berlin or Freiburg (we believe in building together).
Why Join? Competitive salary + equity, shape the future of foundation models for structured data from day one.<p>Apply now: <a href="https://jobs.ashbyhq.com/prior-labs" rel="nofollow">https://jobs.ashbyhq.com/prior-labs</a><p>Questions? Reach out at noah@priorlabs.ai</p>
]]></description><pubDate>Tue, 04 Mar 2025 09:52:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=43252573</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=43252573</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43252573</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Thanks a lot! We currently have an open issue on documenting how to use it with more samples at <a href="https://github.com/PriorLabs/TabPFN/issues/129">https://github.com/PriorLabs/TabPFN/issues/129</a>. Will do this soon; give it an upvote there if it matters to you.</p>
]]></description><pubDate>Mon, 13 Jan 2025 18:40:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686915</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686915</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Yes! This makes sense from a learning perspective: more samples add additional evidence that the datapoint is actually what you observed - based on one sample, the model stays closer to a mean regression (which would translate to more balanced class probabilities in classification).
Transformers have trouble counting repeated entries (there was a famous ChatGPT failure case: asking it to count the number of 1s and 0s in a string). This model has some tricks to solve this.</p>
]]></description><pubDate>Mon, 13 Jan 2025 18:39:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686895</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686895</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686895</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>If you're predicting on text data: our public models don't handle text natively, they would encode it as classes. Our API (<a href="https://github.com/PriorLabs/tabpfn-client/">https://github.com/PriorLabs/tabpfn-client/</a>) has experimental support.</p>
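To illustrate what "encode as classes" means in practice, here is a minimal, hypothetical sketch (not the actual TabPFN preprocessing): each distinct string becomes an opaque integer id, and the semantics of the text are lost.

```python
# Hypothetical sketch: map each distinct string in a text column to an
# integer class id, discarding what the strings actually mean.
def encode_as_classes(column):
    mapping = {}
    codes = []
    for value in column:
        if value not in mapping:
            mapping[value] = len(mapping)  # ids assigned in order of first appearance
        codes.append(mapping[value])
    return codes, mapping

reviews = ["great product", "terrible", "great product", "okay"]
codes, mapping = encode_as_classes(reviews)
print(codes)  # [0, 1, 0, 2] -- "great product" and "terrible" are just ids 0 and 1
```

This is why the API's text support matters: under class encoding, two semantically opposite strings are no more different than any two category labels.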
]]></description><pubDate>Mon, 13 Jan 2025 18:30:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686775</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686775</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686775</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Thanks a lot! We don't see clear artifacts from the synthetic data. Part of the "trick" is to keep the capacity of our model low - it has only about 11M parameters. That forces the model to "learn an in-context learning algorithm", or in other words, to "do in-context learning rather than in-weights learning".
Adding real data on top will help, agreed! The synthetic data is very broad: we started with a prior that was just BNN samples of differing sizes, and thus super broad. Our new prior samples functions that are simpler to explain more densely, but it can still sample almost any function (with the constraint that our networks aren't infinitely complex).</p>
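For intuition, a BNN-style prior can be caricatured in a few lines. This is a hedged sketch under strong simplifications, not the actual TabPFN prior: draw one random small network, push random inputs through it, and label points by thresholding the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_dataset(n_samples=128, n_features=5, hidden=16):
    """Draw one random MLP (a crude stand-in for a BNN sample) and use it
    to generate a labeled classification dataset. Illustrative only."""
    X = rng.normal(size=(n_samples, n_features))
    W1 = rng.normal(size=(n_features, hidden))
    W2 = rng.normal(size=(hidden, 1))
    logits = np.tanh(X @ W1) @ W2
    # threshold at the median so the binary labels come out balanced
    y = (logits.ravel() > np.median(logits)).astype(int)
    return X, y

X, y = sample_synthetic_dataset()  # each call yields a dataset from a fresh random function
```

Pre-training on millions of such draws (with varying sizes, architectures, and noise) is what pushes the model toward learning a general in-context learning algorithm rather than memorizing any one dataset.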
]]></description><pubDate>Mon, 13 Jan 2025 18:29:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42686759</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42686759</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42686759</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Thanks a ton! If it's public, please share it in the Discord <a href="https://discord.com/channels/1285598202732482621/" rel="nofollow">https://discord.com/channels/1285598202732482621/</a> > #use-cases (just created!); if not, mail me at noah@priorlabs.ai</p>
]]></description><pubDate>Fri, 10 Jan 2025 13:29:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42655429</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42655429</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42655429</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Looks like a great use case! We have a method specifically for imputation in the tabpfn-extensions package (<a href="https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da25821135602fdc4d95cc8c217afbc3b0/src/tabpfn_extensions/unsupervised/unsupervised.py#L282">https://github.com/PriorLabs/tabpfn-extensions/blob/dbc3f5da...</a>). It needs some cleaning up before I want to highlight it in the notebooks and docs.</p>
]]></description><pubDate>Fri, 10 Jan 2025 10:28:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654431</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654431</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654431</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Up to 4 hrs of tuning per dataset / split (10-fold CV)</p>
]]></description><pubDate>Fri, 10 Jan 2025 10:25:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654422</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654422</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Author here! The fundamental challenge is that LLMs like o1 and Claude 3.5 simply aren't built for the unique structure of tabular data. When processing tables through LLMs, the inefficiencies quickly become apparent: tokenizing a 10,000 x 100 table as a sequence, with numerical values split into tokens, blows up the input enormously.<p>There's some interesting work on using LLMs for tabular data (TabLLM: <a href="https://proceedings.mlr.press/v206/hegselmann23a.html" rel="nofollow">https://proceedings.mlr.press/v206/hegselmann23a.html</a>), but this only works for datasets with tens of samples rather than the thousands of rows needed in real-world applications.<p>What o1 and other LLMs typically do is wrap around existing tabular tools like XGBoost or scikit-learn. While this works, they're ultimately constrained by those tools' limitations. We're taking a fundamentally different approach: building foundation models that natively understand tabular relationships and patterns, combining the benefits of foundation models with architectures specifically designed for tabular data structures.</p>
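To make the serialization cost concrete, a back-of-the-envelope count. The ~3 tokens per cell is an assumption (a few digits plus a delimiter); real tokenizers vary, but the order of magnitude holds.

```python
# Rough cost of feeding a table to an LLM as text.
# tokens_per_cell = 3 is an assumed figure, not a measured one.
rows, cols = 10_000, 100
tokens_per_cell = 3

total_tokens = rows * cols * tokens_per_cell
print(f"{total_tokens:,} tokens")  # 3,000,000 tokens before the model sees a single label
```

Three million tokens for one modest table exceeds most LLM context windows, which is why wrapping an LLM around the raw text of a table doesn't scale to real datasets.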
]]></description><pubDate>Fri, 10 Jan 2025 10:20:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654393</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654393</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654393</guid></item><item><title><![CDATA[New comment by noahho in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>Author here! The breast cancer dataset is simple and heavily saturated, so small differences between methods are expected. As you say, single-run examples can be noisy because of randomness in how the data is split into training and testing sets, especially for a saturated dataset like this one. Cross-validation reduces this variance by averaging over multiple splits. I just ran this below:<p><pre><code>  TabPFN mean ROC AUC: 0.9973

  SVM mean ROC AUC: 0.9903

  TabPFN per split: [0.99737963 0.99639699 0.99966931 0.99338624 0.99966465]

  SVM per split: [0.99312152 0.98788077 0.99603175 0.98313492 0.99128102]

  from sklearn.model_selection import cross_val_score
  from tabpfn import TabPFNClassifier
  from sklearn.datasets import load_breast_cancer
  from sklearn.svm import LinearSVC
  import numpy as np

  data = load_breast_cancer()
  X, y = data.data, data.target

  # TabPFN
  tabpfn_clf = TabPFNClassifier()
  tabpfn_scores = cross_val_score(tabpfn_clf, X, y, cv=5, scoring='roc_auc')
  print("TabPFN per split:", tabpfn_scores)
  print("TabPFN mean ROC AUC:", np.mean(tabpfn_scores))

  # SVM
  svm_clf = LinearSVC(C=0.01)
  svm_scores = cross_val_score(svm_clf, X, y, cv=5, scoring='roc_auc')
  print("SVM per split:", svm_scores)
  print("SVM mean ROC AUC:", np.mean(svm_scores))
</code></pre>
It's hard to communicate this properly; we should probably keep a favourable example ready, but we just included the simplest one!</p>
]]></description><pubDate>Fri, 10 Jan 2025 10:02:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42654273</link><dc:creator>noahho</dc:creator><comments>https://news.ycombinator.com/item?id=42654273</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42654273</guid></item></channel></rss>