Representation Learning for Tabular Data: A Comprehensive Survey

Awesome Tabular Deep Learning resources for "Representation Learning for Tabular Data: A Comprehensive Survey". If you use any content of this repo in your work, please cite the following bib entry:

```
@article{jiang2025tabularsurvey,
  title   = {Representation Learning for Tabular Data: A Comprehensive Survey},
  author  = {Jun-Peng Jiang and Si-Yang Liu and Hao-Run Cai and Qile Zhou and Han-Jia Ye},
  journal = {arXiv preprint arXiv:2504.16109},
  year    = {2025}
}
```

Feel free to create new issues or drop us an email if you find any interesting papers missing from our survey; we will include them in the next version.

Updates

[04/2025] The arXiv paper has been released.

[04/2025] The repository has been released.

Introduction

Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks (DNNs) recently demonstrating promising results through their capability for representation learning. In this survey, we systematically introduce the field of tabular representation learning, covering the background, challenges, and benchmarks, along with the pros and cons of using DNNs.

We organize existing methods into three main categories according to their generalization capabilities: specialized, transferable, and general models. Specialized models focus on tasks where training and evaluation occur within the same data distribution. We introduce a hierarchical taxonomy for specialized models based on the key aspects of tabular data (features, samples, and objectives) and delve into detailed strategies for obtaining high-quality feature- and sample-level representations. Transferable models are pre-trained on one or more datasets and subsequently fine-tuned on downstream tasks, leveraging knowledge acquired from homogeneous or heterogeneous sources, or even from other modalities such as vision and language. General models, also known as tabular foundation models, extend this concept further, allowing direct application to downstream tasks without additional fine-tuning; we group these general models by the strategies they use to adapt across heterogeneous datasets.

Additionally, we explore ensemble methods, which integrate the strengths of multiple tabular models. Finally, we discuss representative extensions of tabular learning, including open-environment tabular machine learning, multimodal learning with tabular data, and tabular understanding tasks.

Some Basic Resources

Benchmarks

| Date | Name | Paper | Publication | Code |
|------|------|-------|-------------|------|
| 2025 | TabArena | TabArena: A Living Benchmark for Machine Learning on Tabular Data | CoRR | Code |
| 2025 | MLE-Bench | MLE-Bench: Evaluating Machine Learning Agents on Machine Learning Engineering | ICLR | Code |
| 2025 | TabReD | TabReD: Analyzing Pitfalls and Filling the Gaps in Tabular Deep Learning Benchmarks | ICLR | Code |
| 2024 | Data-Centric Benchmark | A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data | NeurIPS | Code |
| 2024 | Better-by-Default | Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data | NeurIPS | Code |
| 2024 | LAMDA-Tabular-Bench | A Closer Look at Deep Learning Methods on Tabular Datasets | CoRR | Code |
| 2024 | DMLR-ICLR24-Datasets-for-Benchmarking | Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning | DMLR | Code |
| 2023 | TableShift | Benchmarking Distribution Shift in Tabular Data with TableShift | NeurIPS | Code |
| 2023 | TabZilla | When Do Neural Nets Outperform Boosted Trees on Tabular Data? | NeurIPS | Code |
| 2023 | EncoderBenchmarking | A benchmark of categorical encoders for binary classification | NeurIPS | Code |
| 2022 | Grinsztajn et al. Benchmark | Why do tree-based models still outperform deep learning on tabular data? | NeurIPS | Code |
| 2021 | RTDL | Revisiting Deep Learning Models for Tabular Data | NeurIPS | Code |
| 2021 | WellTunedSimpleNets | Well-tuned Simple Nets Excel on Tabular Datasets | NeurIPS | Code |

Awesome Deep Tabular Toolboxes

  • RTDL: A collection of papers and packages on deep learning for tabular data.
  • TALENT: A comprehensive toolkit and benchmark for tabular data learning, featuring 30 deep methods, more than 10 classical methods, and 300 diverse tabular datasets.
  • pytorch_tabular: A standard framework for building deep learning models on tabular data.
  • pytorch-frame: A modular deep learning framework for building neural network models on heterogeneous tabular data.
  • DeepTables: An easy-to-use toolkit that brings the power of deep learning to tabular data.
  • AutoGluon: A toolbox that automates machine learning tasks and makes it easy to achieve strong predictive performance (a minimal usage sketch follows this list).
  • ...
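As a quick taste of these toolboxes, below is a minimal AutoGluon sketch. It uses AutoGluon's standard tabular API (`TabularDataset`, `TabularPredictor`), but the file names and the `target` column are hypothetical placeholders, and defaults vary across versions, so treat this as a sketch rather than a reference:

```python
# pip install autogluon.tabular
from autogluon.tabular import TabularDataset, TabularPredictor

# Hypothetical CSVs: any table with a label column works.
train_data = TabularDataset("train.csv")
predictor = TabularPredictor(label="target").fit(train_data)

test_data = TabularDataset("test.csv")
predictions = predictor.predict(test_data)          # per-row predictions
print(predictor.leaderboard(test_data))             # scores of each fitted model
```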

Other Awesome Repositories

  • TabPFN and its extensions
  • Some summary repositories

Specialized Methods

| Date | Name | Paper | Publication | Code |
|------|------|-------|-------------|------|
| 2025 | ModernNCA | Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later | ICLR | Code |
| 2025 | TabM | TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling | ICLR | Code |
| 2024 | ExcelFormer | Can a deep learning model be a sure bet for tabular prediction? | KDD | Code |
| 2024 | AMFormer | Arithmetic feature interaction is necessary for deep tabular learning | AAAI | Code |
| 2024 | GRANDE | GRANDE: Gradient-based decision tree ensembles for tabular data | ICLR | Code |
| 2024 | DOFEN | DOFEN: Deep Oblivious Forest ENsemble | NeurIPS | Code |
| 2024 | RealMLP | Better by default: Strong pre-tuned MLPs and boosted trees on tabular data | NeurIPS | Code |
| 2024 | BiSHop | BiSHop: Bi-directional cellular learning for tabular data with generalized sparse modern Hopfield model | ICML | Code |
| 2024 | SwitchTab | SwitchTab: Switched autoencoders are effective tabular learners | AAAI | |
| 2024 | PTaRL | PTaRL: Prototype-based tabular representation learning via space calibration | ICLR | Code |
| 2024 | TabR | TabR: Tabular deep learning meets nearest neighbors in 2023 | ICLR | Code |
| 2023 | | An inductive bias for tabular deep learning | NeurIPS | |
| 2023 | TabRet | TabRet: Pre-training transformer-based tabular models for unseen columns | CoRR | Code |
| 2023 | Trompt | Trompt: Towards a better deep neural network for tabular data | ICML | |
| 2023 | TANGOS | TANGOS: Regularizing tabular neural networks through gradient orthogonalization and specialization | ICLR | Code |
| 2022 | MLP-PLR | On embeddings for numerical features in tabular deep learning | NeurIPS | Code |
| 2022 | SAINT | SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training | NeurIPS WS | Code |
| 2022 | DANets | DANets: Deep abstract networks for tabular data classification and regression | AAAI | Code |
| 2022 | DNNR | DNNR: Differential nearest neighbors regression | ICML | Code |
| 2022 | Hopular | Hopular: Modern Hopfield networks for tabular data | CoRR | Code |
| 2022 | LSPIN | Locally Sparse Neural Networks for Tabular Biomedical Data | ICML | Code |
| 2021 | Net-DNF | Net-DNF: Effective Deep Modeling of Tabular Data | ICLR | |
| 2021 | FT-Transformer | Revisiting deep learning models for tabular data | NeurIPS | Code |
| 2021 | TabNet | TabNet: Attentive interpretable tabular learning | AAAI | Code |
| 2021 | DCNv2 | DCN V2: Improved deep & cross network and practical lessons for web-scale learning to rank systems | WWW | Code |
| 2021 | | Well-tuned simple nets excel on tabular datasets | NeurIPS | Code |
| 2021 | NPT | Self-attention between datapoints: Going beyond individual input-output pairs in deep learning | NeurIPS | Code |
| 2020 | | Survey on categorical data for neural networks | Journal of Big Data | |
| 2020 | TabTransformer | TabTransformer: Tabular data modeling using contextual embeddings | CoRR | Code |
| 2020 | GrowNet | Gradient boosting neural networks: GrowNet | CoRR | Code |
| 2020 | NODE | Neural oblivious decision ensembles for deep learning on tabular data | ICLR | Code |
| 2020 | STG | Feature Selection using Stochastic Gates | ICML | Code |
| 2019 | AutoInt | AutoInt: Automatic feature interaction learning via self-attentive neural networks | CIKM | Code |
| 2018 | RLNs | Regularization learning networks: Deep learning for tabular datasets | NeurIPS | Code |
| 2017 | SNN | Self-normalizing neural networks | NIPS | Code |

Transferable Methods

| Date | Name | Paper | Publication | Code |
|------|------|-------|-------------|------|
| 2025 | | A survey on self-supervised learning for non-sequential tabular data | Machine Learning | Code |
| 2025 | Tab2Visual | Tab2Visual: Overcoming Limited Data in Tabular Data Classification Using Deep Learning with Visual Representations | CoRR | |
| 2024 | LFR | Self-supervised representation learning from random data projectors | ICLR | |
| 2024 | UniTabE | UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science | ICLR | |
| 2024 | CM2 | Towards cross-table masked pretraining for web data mining | WWW | Code |
| 2024 | TP-BERTa | Making pre-trained language models great on tabular prediction | ICLR | Code |
| 2024 | CARTE | CARTE: Pretraining and transfer for tabular learning | ICML | Code |
| 2024 | FeatLLM | Large language models can automatically engineer features for few-shot tabular learning | ICML | Code |
| 2024 | LM-IGTD | LM-IGTD: A 2D image generator for low-dimensional and mixed-type tabular data to leverage the potential of convolutional neural networks | CoRR | |
| 2023 | DoRA | DoRA: Domain-based self-supervised learning framework for low-resource real estate appraisal | CIKM | Code |
| 2023 | | Transfer learning with deep tabular models | ICLR | |
| 2023 | ReConTab | ReConTab: Regularized contrastive representation learning for tabular data | CoRR | |
| 2023 | TabRet | TabRet: Pre-training transformer-based tabular models for unseen columns | CoRR | Code |
| 2023 | ORCA | Cross-modal fine-tuning: Align then refine | ICML | Code |
| 2023 | TabToken | Unlocking the transferability of tokens in deep models for tabular data | CoRR | |
| 2023 | XTab | XTab: Cross-table pretraining for tabular transformers | ICML | Code |
| 2023 | Meta-Transformer | Meta-Transformer: A unified framework for multimodal learning | CoRR | Code |
| 2023 | Binder | Binding language models in symbolic languages | ICLR | Code |
| 2023 | CAAFE | Large language models for automated data science: Introducing CAAFE for context-aware automated feature engineering | NeurIPS | Code |
| 2023 | TaPTaP | Generative table pre-training empowers models for tabular prediction | EMNLP | Code |
| 2023 | TabLLM | TabLLM: Few-shot classification of tabular data with large language models | AISTATS | Code |
| 2023 | UniPredict | UniPredict: Large language models are universal tabular predictors | CoRR | |
| 2023 | TablEye | TablEye: Seeing small tables through the lens of images | CoRR | |
| 2022 | | Revisiting pretraining objectives for tabular deep learning | CoRR | Code |
| 2022 | SEFS | Self-supervision enhanced feature selection with correlated gates | ICLR | Code |
| 2022 | MET | MET: Masked encoding for tabular data | CoRR | |
| 2022 | SAINT | SAINT: Improved neural networks for tabular data via row attention and contrastive pre-training | NeurIPS WS | Code |
| 2022 | SCARF | SCARF: Self-supervised contrastive learning using random feature corruption | ICLR | |
| 2022 | STab | STab: Self-supervised learning for tabular data | NeurIPS WS | |
| 2022 | DEN | Distribution embedding networks for generalization from a diverse set of classification tasks | | |
| 2022 | TransTab | TransTab: Learning transferable tabular transformers across tables | NeurIPS | Code |
| 2022 | PTab | PTab: Using the pre-trained language model for modeling tabular data | CoRR | |
| 2022 | LIFT | LIFT: Language-interfaced fine-tuning for non-language machine learning tasks | NeurIPS | Code |
| 2021 | SubTab | SubTab: Subsetting features of tabular data for self-supervised representation learning | NeurIPS | Code |
| 2021 | DACL | Towards domain-agnostic contrastive learning | ICML | |
| 2021 | IGTD | Converting tabular data into images for deep learning with convolutional neural networks | Scientific Reports | Code |
| 2020 | VIME | VIME: Extending the success of self- and semi-supervised learning to tabular domain | NeurIPS | Code |
| 2020 | | Meta-learning from tasks with heterogeneous attribute spaces | NeurIPS | |
| 2020 | TAC | A novel method for classification of tabular data using convolutional neural networks | bioRxiv | |
| 2019 | Super-TML | SuperTML: Two-dimensional word embedding for the precognition on structured tabular data | CVPR WS | |

General Methods

| Date | Name | Paper | Publication | Code |
|------|------|-------|-------------|------|
| 2025 | Beta* | TabPFN unleashed: A scalable and effective solution to tabular classification problems | ICML | |
| 2025 | MotherNet | MotherNet: Fast Training and Inference via Hyper-Network Transformers | ICLR | Code |
| 2025 | TabPFN v2 | Accurate predictions on small data with a tabular foundation model | Nature | Code |
| 2025 | TabForestPFN* | Fine-tuned in-context learning transformers are excellent tabular data classifiers | CoRR | |
| 2025 | APT* | Zero-shot meta-learning for tabular prediction tasks with adversarially pre-trained transformer | CoRR | Code |
| 2025 | TabICL* | TabICL: A tabular foundation model for in-context learning on large data | ICML | Code |
| 2025 | EquiTabPFN* | EquiTabPFN: Target permutation equivariant prior fitted networks | CoRR | |
| 2025 | * | Scalable in-context learning on tabular data via retrieval-augmented large language models | CoRR | |
| 2024 | HyperFast | HyperFast: Instant classification for tabular data | AAAI | Code |
| 2024 | TabDPT* | TabDPT: Scaling tabular foundation models | CoRR | Code |
| 2024 | MIXTUREPFN* | Mixture of in-context prompters for tabular PFNs | CoRR | |
| 2024 | LoCalPFN* | Retrieval & fine-tuning for in-context tabular models | NeurIPS | Code |
| 2024 | LE-TabPFN* | Towards localization via data embedding for TabPFN | NeurIPS WS | |
| 2024 | TabFlex* | TabFlex: Scaling tabular learning to millions with linear attention | NeurIPS WS | |
| 2024 | * | Exploration of autoregressive models for in-context learning on tabular data | NeurIPS WS | |
| 2024 | TabuLa-8B | Large scale transfer learning for tabular data via language modeling | NeurIPS | Code |
| 2024 | GTL | From supervised to generative: A novel paradigm for tabular deep learning with large language models | KDD | Code |
| 2024 | MediTab | MediTab: Scaling medical tabular data predictors via data consolidation, enrichment, and refinement | IJCAI | |
| 2023 | TabPTM | Training-free generalization on heterogeneous tabular data via meta-representation | CoRR | |
| 2023 | TabPFN | TabPFN: A transformer that solves small tabular classification problems in a second | ICLR | Code |

* denotes that the method is a variant of TabPFN; some of these require fine-tuning for downstream tasks.
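To make the "no additional fine-tuning" idea concrete, here is a minimal sketch using the `tabpfn` package's scikit-learn-style interface: `fit()` only stores the training set as context, and prediction is a single forward pass of the pre-trained transformer. Constructor arguments and dataset-size limits vary across versions, so treat this as a sketch rather than a reference:

```python
# pip install tabpfn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()          # pre-trained model; no gradient updates below
clf.fit(X_train, y_train)         # stores the training rows as in-context examples
print(clf.predict(X_test)[:10])   # one forward pass over the test rows
```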

Ensemble Methods

| Date | Name | Paper | Publication | Code |
|------|------|-------|-------------|------|
| 2025 | TabM | TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling | ICLR | Code |
| 2025 | TabPFN v2 | Accurate predictions on small data with a tabular foundation model | Nature | Code |
| 2025 | Beta | TabPFN unleashed: A scalable and effective solution to tabular classification problems | CoRR | |
| 2025 | LLM-Boost, PFN-Boost | Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes | CoRR | Code |
| 2024 | HyperFast | HyperFast: Instant classification for tabular data | AAAI | Code |
| 2024 | GRANDE | GRANDE: Gradient-based decision tree ensembles for tabular data | ICLR | Code |
| 2023 | TabPTM | Training-free generalization on heterogeneous tabular data via meta-representation | CoRR | |
| 2023 | TabPFN | TabPFN: A transformer that solves small tabular classification problems in a second | ICLR | Code |
| 2020 | TabTransformer | TabTransformer: Tabular data modeling using contextual embeddings | CoRR | Code |
| 2020 | GrowNet | Gradient boosting neural networks: GrowNet | CoRR | Code |
| 2020 | NODE | Neural oblivious decision ensembles for deep learning on tabular data | ICLR | Code |

Extensions

Clustering

Anomaly Detection

Tabular Generation

Interpretability

Open-Environment Tabular Machine Learning

Multi-modal Learning with Tabular Data

Tabular Understanding

Please refer to Awesome-Tabular-LLMs for more information.

Workshops

Acknowledgment

This repo is modified from TALENT.

Correspondence

This repo is developed and maintained by Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, and Han-Jia Ye. If you have any questions, please feel free to contact us by opening a new issue or by email.
