Nabeel Seedat

Hi 👋 I am a final-year PhD student in Machine Learning at the University of Cambridge, supervised by Prof. Mihaela van der Schaar. My research interests span Data-Centric AI, Large Language Models (LLMs), Responsible AI, Synthetic Data and Uncertainty Quantification.

Since data is the fuel for ML, my research aims to develop systematic data-centric approaches applicable to different data modalities (tabular, image & text) to make ML systems more reliable & trustworthy 🦾, whilst also improving model performance & training efficiency 🚀. Most recently, my focus has been on LLMs!

I hold a Master’s degree from Cornell University, where I worked on Bayesian Deep Learning, as well as a Master’s from the University of the Witwatersrand (South Africa), where I worked on Signal Processing & ML for Parkinson’s Disease. I also hold a dual bachelor’s degree in Information Engineering & Biomedical Engineering from the University of the Witwatersrand (South Africa).

My industry experience includes time as an ML Researcher at AstraZeneca working on LLM verification, test-time scaling and uncertainty quantification. Before my PhD, I worked on production ML systems as a Data Scientist in Computer Vision at Shutterstock (USA) and as an ML Engineer in NLP at Multichoice (Africa’s largest multimedia company).

Note: I am finishing my PhD in the Summer of 2025 and am looking for full-time industry ML Research opportunities.

Please reach out if you think I am a good fit: ns741@cam.ac.uk

[twitter] [scholar] [github] [linkedin]


🗞️ News 🗞️

Nov 2024 → Four papers accepted to NeurIPS 2024, covering different dimensions of Large Language Models (LLMs)! Looking forward to presenting with my co-authors in Vancouver 🥳 Camera-ready versions of our papers coming soon!
Oct 2024 → 🤩 Really enjoyed giving a tutorial at MICCAI 2024 on Clinical AI in the real-world: From Data-centric AI to Dynamic Learning. It was an honor to have the opportunity to share my research during the tutorial at the first MICCAI to take place in Africa! 🇿🇦🌍🎉
Summer 2024 → Excited to be interning as an ML Researcher at AstraZeneca, where I’ll be working on LLMs for clinical trials 💊💉. Looking forward to AI research around LLM verification and uncertainty quantification to advance healthcare 🩺!
July 2024 → Two papers accepted to ICML 2024 - topics include LLMs for synthetic data generation and uncertainty estimation! 🥳 Looking forward to presenting these with my co-authors!
June 2024 → Our paper improving pseudo-labeling (semi-supervised learning) from a data-centric perspective has been accepted to the new Journal of Data-centric Machine Learning Research (DMLR) 🥳. Really excited to be one of the early contributors to this premier data-centric ML research venue — part of the JMLR family.
June 2024 → Gave a talk at Stanford on An Uncertainty Estimation lens on Data-centric AI 🤔
May 2024 → Our paper on Generalization as a key challenge for Responsible AI has been accepted to Nature Digital Medicine! Really awesome collab with GSK 😎. We were invited to talk about it on the Nature podcast.
May 2024 → Had a really fun time at the ICTP Advanced ML Summer School in Trieste giving a talk on Data-centric AI for healthcare. Thanks to the organizers 😊
April 2024 → Gave a talk at Apple on Data characterization & Synthetic data. Thanks for hosting me and the super fun session! 😊
Jan 2024 → Three papers accepted! 🥳 One paper at AISTATS2024 and two papers at ICLR2024 — a first time :) Looking forward to presenting these with my co-authors!
Dec 2023 → DC-Check accepted to IEEE Transactions on AI! Interested in Data-Centric AI? Then check out our paper Navigating Data-Centric Artificial Intelligence with DC-Check: Advances, Challenges, and Opportunities 😊
Nov 2023 → Gave a talk at Microsoft Research Cambridge on Data-Centric AI. Thanks for hosting me! 😊
Oct 2023 → Data-Centric AI Tutorial accepted to NeurIPS2023! w/ Mihaela van der Schaar and Isabelle Guyon (Google Research). See you in New Orleans 🐊🎺🌶️
Oct 2023 → Four papers accepted to NeurIPS2023! Three papers on the main track and one on the D&B track. Camera-ready versions of our papers coming soon!
Sept 2023 → On September 11 I gave a talk on Data-Centric AI at the AI and Machine Learning in Healthcare Summer School organised by the Cambridge Centre for AI in Medicine (CCAIM). Have a look at the fantastic program here: https://ccaim.cam.ac.uk/program/.
Aug 2023 → Presented a tutorial on Data-Centric AI at IJCAI 2023, together w/ Mihaela van der Schaar! It was a fantastic experience to engage with the community about this important research area!
July 2023 → Selected by the Mail & Guardian as one of the Top 200 Young South Africans for 2023! 🇿🇦
June 2023 → Awarded the best research poster presentation at the Future of Data-Centric AI conference
May 2023 → Paper accepted to ICML2023 on transportable structure learning [paper].
March 2023 → Our Data-Centric AI checklist called DC-Check was featured by MarkTechPost and the Montreal AI Ethics Institute (see the DC-Check paper).
Jan 2023 → New paper accepted to AISTATS2023 on improving conformal prediction w/ self-supervised learning [paper].
Oct 2022 → Excited to be giving talks on Data-Centric AI at AstraZeneca, Queen Mary University of London and the University of Cape Town!
Sept 2022 → New paper accepted to NeurIPS2022 on data-centric AI to audit training datasets for tabular, image and text data [paper]. Looking forward to presenting together with my co-authors!
May 2022 → Two papers accepted to ICML 2022! 🥳 One on data-centric AI for reliable deployment [paper] and one on treatment effect estimation in continuous time [paper].
Oct 2021 → I have officially started a PhD in Machine Learning at the University of Cambridge under the supervision of Mihaela van der Schaar!

Publications

Please find some of my publications below (a more up-to-date list can be found on Google Scholar).

“*” denotes equal contribution.

Top ML/AI venues

P.Rauba*, N.Seedat*, M.Luyten, M. van der Schaar. ``Context-aware testing: A new paradigm for testing with Large Language Models.’’ NeurIPS 2024 [paper]
N.Seedat, M. van der Schaar. ``Matchmaker: Self-Improving Compositional LLM Programs for Schema Matching.’’ NeurIPS 2024 GenAI for Health [paper]
P.Rauba, N.Seedat, M. van der Schaar. ``Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments.’’ NeurIPS 2024 [paper]
N.Astorga, T.Liu, N.Seedat, M. van der Schaar. ``POCA: Partially Observable Cost-Aware Active-Learning.’’ NeurIPS 2024
N.Seedat*, N.Huynh*, B.van Breugel, M. van der Schaar. ``Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes.’’ ICML 2024 [paper]
T.Pouplin, A.Jeffares, N.Seedat, M. van der Schaar. ``Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise.’’ ICML 2024 [paper]
N.Seedat*, N.Huynh*, F.Imrie, M. van der Schaar. ``You can’t handle the (dirty) truth: Data-Centric Insights Improve Pseudo-Labeling’’ Journal of Data-centric Machine Learning Research (DMLR) [paper]
N.Seedat, F.Imrie, M. van der Schaar. ``Dissecting sample hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI.’’ ICLR 2024 [paper]
H.Sun, A.Chan, N.Seedat, A.Huyuk, M. van der Schaar. ``When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective.’’ Journal of Data-centric Machine Learning Research (DMLR) [paper]
T.Liu, N.Astorga, N.Seedat, M. van der Schaar. ``Large Language Models to Enhance Bayesian Optimization.’’ ICLR 2024 [paper]
N.Huynh, J.Berrevoets, N.Seedat, J.Crabbe, Z.Qian, M. van der Schaar. ``DAGnosis: Localized Identification of Data Inconsistencies using Structures.’’ AISTATS 2024 [paper]
N.Seedat, J.Crabbe, Z.Qian, M. van der Schaar. ``TRIAGE: Characterizing and auditing training data for improved regression.’’ NeurIPS 2023 [paper]
N.Seedat*, B.van Breugel*, F.Imrie, M. van der Schaar. ``Can you rely on your model evaluation? Improving model evaluation with synthetic test data.’’ NeurIPS 2023 [paper]
L.Hansen*, N.Seedat*, M. van der Schaar, A.Petrovic. ``Reimagining Synthetic Data Generation through Data-Centric AI: A Comprehensive Benchmark.’’ NeurIPS 2023 (D&B) [paper]
H.Sun, B.van Breugel, J.Crabbe, N.Seedat, M. van der Schaar. ``What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization.’’ NeurIPS 2023 [paper]
N.Seedat*, A.Jeffares*, F.Imrie, M. van der Schaar. ``Improving adaptive conformal prediction using self-supervised learning.’’ AISTATS 2023 [paper]
J.Berrevoets, N.Seedat, F.Imrie, M. van der Schaar. ``Differentiable and transportable structure learning.’’ ICML 2023 [paper]
N.Seedat, J.Crabbe, I.Bica, M. van der Schaar. ``Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data.’’ NeurIPS 2022 [paper]
N.Seedat, J.Crabbe, M. van der Schaar. ``Data-SUITE: Data-Centric identification of in-distribution incongruous examples.’’ ICML 2022 (Spotlight) [paper]
N.Seedat*, F.Imrie*, A.Bellot, Z.Qian, M. van der Schaar. ``Continuous-time modeling of counterfactual outcomes using neural controlled differential equations.’’ ICML 2022 [paper]
N.Seedat. ``MCU-Net: A framework towards uncertainty representations for decision support system patient referrals in healthcare contexts.’’ KDD 2020, Spotlight Presentation: Workshop on Applied Data Science for Healthcare & ICML 2020: Uncertainty & Robustness in Deep Learning Workshop. [paper]
N.Seedat and C.Kanan. ``Towards calibrated and scalable uncertainty representations for neural networks.’’ NeurIPS 2019 - 4th Workshop on Bayesian Deep Learning. [paper]
N.Seedat and V.Aharonson. ``Machine learning discrimination of Parkinson’s Disease stages from walker-mounted sensors data.’’ AAAI 2020 - International Workshop on Health Intelligence and Studies in Computational Intelligence (Springer), 2020. [paper]

Journals and other conferences

L.Goetz*,N.Seedat*, R.Vandersluis, M. van der Schaar. ``Generalization—a key challenge for responsible AI in patient-facing clinical applications.’’ Nature Digital Medicine [paper]
N.Seedat, F.Imrie, M. van der Schaar. ``Navigating Data-Centric Artificial Intelligence with DC-Check: Advances, Challenges, and Opportunities.’’ IEEE Transactions on Artificial Intelligence, 2024. [paper]
E.Heremans, N.Seedat, B.Buyse, D.Testelmans, M. van der Schaar, & M. De Vos. ``U-PASS: an Uncertainty-guided deep learning Pipeline for Automated Sleep Staging.’’ Computers in Biology and Medicine, Vol 171, 2024. [paper]
H.Liu*, N.Seedat* and J.Ive. ``Modelling Disagreement in Automatic Data Labelling for Semi-Supervised Learning in Clinical Natural Language Processing.’’ Frontiers in Artificial Intelligence, 2024. [paper]
N.Seedat and V.Aharonson. ``Automated Machine Vision Enabled Detection of Movement Disorders from hand drawn spirals.’’ IEEE International Conference on Health Informatics (IEEE ICHI), 2020. [paper]
N.Seedat, V.Aharonson and Y.Hamzany. ``Automated and interpretable m-health discrimination of vocal cord pathology enabled by machine learning.’’ IEEE Conference on Computer Science and Data Engineering, 2020. [paper]
V.Aharonson, N.Seedat, S.Korn, S.Baer, M.Postema, G.Yahalom. ``Automated stage discrimination of Parkinson’s Disease.’’ BIO Integration Journal, 2020. [paper]
N.Seedat, N.Sen, N.Naicker, K.Sharma, A. Almeida, G.Kalyansundaram, B.Mkwanazi, M.Velayudan. ``PEMS: Custom Neural Machine Translation System-Making subtitling of Portuguese TV shows and movies on the African continent work.’’ IEEE ICECET, 2021. [paper]
N.Seedat, D.Beder, V.Aharonson and S.Dubowsky. ``A comparison of footfall detection algorithms from walker mounted sensors data.’’ IEEE EBBT, 2018. [paper]
V.Aharonson, N.Seedat, I.Schlesinger, A.McDonald, S.Dubowsky and A.Korczyn. ``Feasibility of an instrumented walker to quantify treatment effects on Parkinson’s patient gait.’’ IEEE EBBT, 2018. [paper]
N.Seedat, I.Mohamed and AK.Mohamed. ``Custom Force Sensor and Sensory Feedback System to Enable Grip Control of a Robotic Prosthetic Hand.’’ IEEE BioRob, 2018. [paper]
N.Seedat and A.van Wyk. ``Quadcopter Control using Intelligent Control.’’ Deep Learning Indaba. [paper]

Tutorials at top AI/ML conferences

N.Seedat, C.Gonzalez, M. van der Schaar. ``Clinical AI in the real-world: From Data-centric AI to Dynamic Learning’’ MICCAI 2024 Tutorial.
N.Seedat, I.Guyon, M. van der Schaar. ``Data-Centric AI for reliable and responsible AI: from theory to practice’’ NeurIPS 2023 Tutorial.
N.Seedat and M. van der Schaar. ``Data-Centric AI: Foundation, Frontiers and Applications.’’ IJCAI 2023 Tutorial.

Invited Talks

GE Healthcare (Topic: Synthetic data) (Oct 2024)
Nature Digital Medicine (Topic: Generalization as a key challenge for Responsible AI) (June 2024)
Stanford (Topic: An uncertainty estimation lens on Data-centric AI) (June 2024)
ICTP Advanced ML summer school (Topic: Data-Centric AI) (May 2024)
Apple (Topic: Data-Centric AI - Data characterization & Synthetic Data) (April 2024)
Microsoft Research Cambridge (Topic: Data-Centric AI) (November 2023)
Future of Data-Centric AI Conference Talk (Topic: Data-IQ) (June 2023)
Discovery Limited Invited Talk (Topic: Data-Centric AI) (Feb 2023)
AstraZeneca AI Journal Club Invited Talk (Topic: Data-Centric AI) (Nov 2022)
Queen Mary University London CogSci Invited Talk (Topic: Data-Centric AI) (Oct 2022)
University of Cape Town Invited Talk (Topic: Data-Centric AI) (Oct 2022)