Nabeel Seedat
Hi! I am a final-year PhD student in Machine Learning at the University of Cambridge, supervised by Prof. Mihaela van der Schaar. My research interests span Data-Centric AI, Large Language Models (LLMs), Responsible AI, Synthetic Data and Uncertainty Quantification.
Since data is the fuel for ML, my research aims to develop systematic data-centric approaches applicable to different data modalities (tabular, image & text) that make ML systems more reliable & trustworthy, whilst also improving model performance & training efficiency. Most recently, my work has focused on LLMs!
I hold a Master’s degree from Cornell University, where I worked on Bayesian Deep Learning, as well as a Master’s from the University of the Witwatersrand (South Africa), where I worked on Signal Processing & ML for Parkinson’s Disease. I also hold dual bachelor’s degrees in Information Engineering & Biomedical Engineering from the University of the Witwatersrand (South Africa).
My industry experience includes time as an ML Researcher at AstraZeneca, working on LLM verification, test-time scaling and uncertainty quantification. Before my PhD, I worked on production ML systems as a Data Scientist focused on Computer Vision at Shutterstock (USA) and as an ML Engineer focused on NLP at Multichoice (Africa’s largest multimedia company).
Note: I am finishing my PhD in the Summer of 2025 and am looking for full-time industry ML Research opportunities.
Please reach out if you think I am a good fit: ns741@cam.ac.uk
News
Nov 2024 – Four papers accepted to NeurIPS 2024, covering different dimensions of Large Language Models (LLMs)! Looking forward to presenting with my co-authors in Vancouver. Camera-ready versions of our papers coming soon!
Oct 2024 – Really enjoyed giving a tutorial at MICCAI 2024 on Clinical AI in the real-world: From Data-centric AI to Dynamic Learning. It was an honor to have the opportunity to share my research during the tutorial at the first MICCAI to take place in Africa!
Summer 2024 – Excited to be interning as an ML Researcher at AstraZeneca, where I’ll be working on LLMs for clinical trials. Looking forward to AI research around LLM verification and uncertainty quantification to advance healthcare!
July 2024 – Two papers accepted to ICML 2024, on LLMs for synthetic data generation and on uncertainty estimation! Looking forward to presenting these with my co-authors!
June 2024 – Our paper improving pseudo-labeling (semi-supervised learning) from a data-centric perspective has been accepted to the new Journal of Data-centric Machine Learning Research (DMLR). Really excited to be one of the early contributors to this premier data-centric ML research venue, part of the JMLR family.
June 2024 – Gave a talk at Stanford on An Uncertainty Estimation lens on Data-centric AI.
May 2024 – Our paper on Generalization as a key challenge for Responsible AI has been accepted to Nature Digital Medicine! Really awesome collaboration with GSK. We were invited to talk about it on the Nature podcast.
May 2024 – Had a really fun time at the ICTP Advanced ML Summer School in Trieste giving a talk on Data-centric AI for healthcare. Thanks to the organizers!
April 2024 – Gave a talk at Apple on Data characterization & Synthetic data. Thanks for hosting me and for the super fun session!
Jan 2024 – Three papers accepted! One paper at AISTATS 2024 and two papers at ICLR 2024, a first time :) Looking forward to presenting these with my co-authors!
Dec 2023 – DC-Check accepted to IEEE Transactions on AI! If you are interested in Data-Centric AI, check out our paper Navigating Data-Centric Artificial Intelligence with DC-Check: Advances, Challenges, and Opportunities.
Nov 2023 – Gave a talk at Microsoft Research Cambridge on Data-Centric AI. Thanks for hosting me!
Oct 2023 – Data-Centric AI Tutorial accepted to NeurIPS 2023! w/ Mihaela van der Schaar and Isabelle Guyon (Google Research). See you in New Orleans!
Oct 2023 – Four papers accepted to NeurIPS 2023! Three papers on the main track and one on the D&B track. Camera-ready versions of our papers coming soon!
Sept 2023 – On September 11 I gave a talk on Data-Centric AI at the AI and Machine Learning in Healthcare Summer School organised by the Cambridge Centre for AI in Medicine (CCAIM). Have a look at the fantastic program here: https://ccaim.cam.ac.uk/program/.
Aug 2023 – Presented a tutorial on Data-Centric AI at IJCAI 2023, together with Mihaela van der Schaar. It was a fantastic experience to engage with the community about this important research area!
July 2023 – Selected by the Mail & Guardian as one of the Top 200 Young South Africans for 2023!
June 2023 – Awarded the best research poster presentation at the Future of Data-Centric AI conference.
May 2023 – Paper accepted to ICML 2023 on transportable structure learning [paper].
March 2023 – Our Data-Centric AI checklist called DC-Check was featured by MarkTechPost and the Montreal AI Ethics Institute (see the DC-Check paper).
Jan 2023 – New paper accepted to AISTATS 2023 on improving conformal prediction w/ self-supervised learning [paper].
Oct 2022 – Excited to be giving talks on Data-Centric AI at AstraZeneca, Queen Mary University of London and the University of Cape Town!
Sept 2022 – New paper accepted to NeurIPS 2022 on data-centric AI to audit training datasets for tabular, image and text data [paper]. Looking forward to presenting together with my co-authors!
May 2022 – Two papers accepted at ICML 2022, on data-centric AI for reliable deployment [paper] and on treatment effect estimation in continuous time [paper].
Oct 2021 – I have officially started a PhD in Machine Learning at the University of Cambridge under the supervision of Mihaela van der Schaar!
Publications
Please find some of my publications below (a more up-to-date list can be found on Google Scholar).
“*” denotes equal contribution.
Top ML/AI venues
- P.Rauba*, N.Seedat*, M.Luyten, M. van der Schaar. ``Context-aware testing: A new paradigm for testing with Large Language Models.’’ NeurIPS 2024 [paper]
- N.Seedat, M. van der Schaar. ``Matchmaker: Self-Improving Compositional LLM Programs for Schema Matching.’’ NeurIPS 2024 GenAI for Health [paper]
- P.Rauba, N.Seedat, M. van der Schaar. ``Self-Healing Machine Learning: A Framework for Autonomous Adaptation in Real-World Environments.’’ NeurIPS 2024 [paper]
- N.Astorga, T.Liu, N.Seedat, M. van der Schaar. ``POCA: Partially Observable Cost-Aware Active-Learning.’’ NeurIPS 2024
- N.Seedat*, N.Huynh*, B.van Breugel, M.van der Schaar ``Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes.’’ ICML 2024 [paper]
- T.Pouplin, A.Jeffares, N.Seedat, M.van der Schaar ``Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise.’’ ICML 2024 [paper]
- N.Seedat*, N.Huynh*, F.Imrie, M. van der Schaar. ``You can’t handle the (dirty) truth: Data-Centric Insights Improve Pseudo-Labeling’’ Journal of Data-centric Machine Learning Research (DMLR) [paper]
- N.Seedat, F.Imrie, M. van der Schaar. ``Dissecting sample hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI.’’ ICLR 2024 [paper]
- H.Sun, A.Chan, N.Seedat, A.Huyuk, M. van der Schaar. ``When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective.’’ Journal of Data-centric Machine Learning Research (DMLR) [paper]
- T.Liu, N.Astorga, N.Seedat, M. van der Schaar. ``Large Language Models to Enhance Bayesian Optimization.’’ ICLR 2024 [paper]
- N.Huynh, J.Berrevoets, N.Seedat, J.Crabbe, Z.Qian, M. van der Schaar. ``DAGnosis: Localized Identification of Data Inconsistencies using Structures.’’ AISTATS 2024 [paper]
- N.Seedat, J.Crabbe, Z.Qian, M. van der Schaar. ``TRIAGE: Characterizing and auditing training data for improved regression.’’ NeurIPS 2023 [paper]
- N.Seedat*, B.van Breugel*, F.Imrie, M. van der Schaar. ``Can you rely on your model evaluation? Improving model evaluation with synthetic test data.’’ NeurIPS 2023 [paper]
- L.Hansen*, N.Seedat*, M. van der Schaar, A.Petrovic. ``Reimagining Synthetic Data Generation through Data-Centric AI: A Comprehensive Benchmark.’’ NeurIPS 2023 (D&B) [paper]
- H.Sun, B.van Breugel, J.Crabbé, N.Seedat, M. van der Schaar. ``What is Flagged in Uncertainty Quantification? Latent Density Models for Uncertainty Categorization.’’ NeurIPS 2023 [paper]
- N.Seedat*, A.Jeffares*, F.Imrie, M. van der Schaar. ``Improving adaptive conformal prediction using self-supervised learning.’’ AISTATS 2023 [paper]
- J.Berrevoets, N.Seedat, F.Imrie, M. van der Schaar. ``Differentiable and transportable structure learning.’’ ICML 2023 [paper]
- N.Seedat, J.Crabbe, I.Bica, M. van der Schaar. ``Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data.’’ NeurIPS 2022 [paper]
- N.Seedat, J.Crabbe, M. van der Schaar. ``Data-SUITE: Data-Centric identification of in-distribution incongruous examples.’’ ICML 2022 (Spotlight) [paper]
- N.Seedat*, F.Imrie*, A.Bellot, Z.Qian, M. van der Schaar. ``Continuous-time modeling of counterfactual outcomes using neural controlled differential equations.’’ ICML 2022 [paper]
- N.Seedat. ``MCU-Net: A framework towards uncertainty representations for decision support system patient referrals in healthcare contexts.’’ KDD 2020, Spotlight Presentation: Workshop on Applied Data Science for Healthcare & ICML 2020: Uncertainty & Robustness in Deep Learning Workshop. [paper]
- N.Seedat and C.Kanan. ``Towards calibrated and scalable uncertainty representations for neural networks.’’ NeurIPS 2019 - 4th Workshop on Bayesian Deep Learning. [paper]
- N.Seedat and V.Aharonson. ``Machine learning discrimination of Parkinson’s Disease stages from walker-mounted sensors data.’’ AAAI 2020 - International Workshop on Health Intelligence and Studies in Computational Intelligence (Springer), 2020. [paper]
Journals and other conferences
- L.Goetz*, N.Seedat*, R.Vandersluis, M. van der Schaar. ``Generalization – a key challenge for responsible AI in patient-facing clinical applications.’’ Nature Digital Medicine [paper]
- N.Seedat, F.Imrie, M. van der Schaar. ``Navigating Data-Centric Artificial Intelligence with DC-Check: Advances, Challenges, and Opportunities.’’ IEEE Transactions on Artificial Intelligence, 2024. [paper]
- E.Heremans, N.Seedat, B.Buyse, D.Testelmans, M. van der Schaar, & M. De Vos. ``U-PASS: an Uncertainty-guided deep learning Pipeline for Automated Sleep Staging.’’ Computers in Biology and Medicine, Vol 171, 2024. [paper]
- H.Liu*, N.Seedat* and J.Ive. ``Modelling Disagreement in Automatic Data Labelling for Semi-Supervised Learning in Clinical Natural Language Processing.’’ Frontiers in Artificial Intelligence, 2024. [paper]
- N.Seedat and V.Aharonson. ``Automated Machine Vision Enabled Detection of Movement Disorders from hand drawn spirals.’’ IEEE International Conference on Health Informatics (IEEE ICHI), 2020. [paper]
- N.Seedat, V.Aharonson and Y.Hamzany. ``Automated and interpretable m-health discrimination of vocal cord pathology enabled by machine learning.’’ IEEE Conference on Computer Science and Data Engineering, 2020. [paper]
- V.Aharonson, N.Seedat, S.Korn, S.Baer, M.Postema, G.Yahalom. ``Automated stage discrimination of Parkinson’s Disease.’’ BIO Integration Journal, 2020. [paper]
- N.Seedat, N.Sen, N.Naicker, K.Sharma, A. Almeida, G.Kalyansundaram, B.Mkwanazi, M.Velayudan. ``PEMS: Custom Neural Machine Translation System-Making subtitling of Portuguese TV shows and movies on the African continent work.’’ IEEE ICECET, 2021. [paper]
- N.Seedat, D.Beder, V.Aharonson and S.Dubowsky. ``A comparison of footfall detection algorithms from walker mounted sensors data.’’ IEEE EBBT, 2018. [paper]
- V.Aharonson, N.Seedat, I.Schlesinger, A.McDonald, S.Dubowsky and A.Korczyn. ``Feasibility of an instrumented walker to quantify treatment effects on Parkinson’s patient gait.’’ IEEE EBBT, 2018. [paper]
- N.Seedat, I.Mohamed and AK.Mohamed. ``Custom Force Sensor and Sensory Feedback System to Enable Grip Control of a Robotic Prosthetic Hand.’’ IEEE BioRob, 2018. [paper]
- N.Seedat and A.van Wyk. ``Quadcopter Control using Intelligent Control.’’ Deep Learning Indaba. [paper]
Tutorials at top AI/ML conferences
N.Seedat, C.Gonzalez, M. van der Schaar. ``Clinical AI in the real-world: From Data-centric AI to Dynamic Learning’’ MICCAI 2024 Tutorial.
N.Seedat, I.Guyon, M. van der Schaar. ``Data-Centric AI for reliable and responsible AI: from theory to practice’’ NeurIPS 2023 Tutorial.
N.Seedat and M. van der Schaar. ``Data-Centric AI: Foundation, Frontiers and Applications.’’ IJCAI 2023 Tutorial.
Invited Talks
GE Healthcare (Topic: Synthetic data) (Oct 2024)
Nature Digital Medicine (Topic: Generalization as a key challenge for Responsible AI) (June 2024)
Stanford (Topic: An uncertainty estimation lens on Data-centric AI) (June 2024)
ICTP Advanced ML summer school (Topic: Data-Centric AI) (May 2024)
Apple (Topic: Data-Centric AI - Data characterization & Synthetic Data) (April 2024)
Microsoft Research Cambridge (Topic: Data-Centric AI) (November 2023)
Future of Data-Centric AI Conference Talk (Topic: Data-IQ) (June 2023)
Discovery Limited Invited Talk (Topic: Data-Centric AI) (Feb 2023)
AstraZeneca AI Journal Club Invited Talk (Topic: Data-Centric AI) (Nov 2022)
Queen Mary University of London CogSci Invited Talk (Topic: Data-Centric AI) (Oct 2022)
University of Cape Town Invited Talk (Topic: Data-Centric AI) (Oct 2022)