
D3M Project

The Data-Driven Discovery of Models (D3M) program aims to develop automated model discovery systems that enable users with subject matter expertise but no data science background to create empirical models of real, complex processes. At the NYU VIDA Center we address two important challenges in automating machine learning: i) pipeline synthesis and model understanding, and ii) dataset search and discovery.

AutoML and Model Explainability


Alpha-AutoML is an extensible, open-source AutoML system. It leverages the reinforcement learning and neural network components of AlphaD3M but relies on standard, open-source infrastructure to specify and run pipelines. Because it builds on the Sklearn pipeline infrastructure, Alpha-AutoML is compatible with state-of-the-art ML techniques and fully compatible with other standard libraries such as XGBoost, Hugging Face, Keras, and PyTorch. In addition, primitives can be added on the fly through Sklearn's standard fit/predict API, which lets Alpha-AutoML leverage new developments in ML and keep up with this fast-moving field.

Repository: github.com/VIDA-NYU/alpha-automl
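Because Alpha-AutoML builds on the Sklearn pipeline infrastructure, a new primitive can, in the simplest case, be any class that follows Sklearn's estimator conventions (hyperparameters stored in `__init__`, `fit` returning `self`, plus `transform` or `predict`). The sketch below shows such a class and plugs it into a plain `sklearn.pipeline.Pipeline`. The `ClipOutliers` primitive and its behavior are hypothetical, and the sketch does not show how Alpha-AutoML itself registers or searches over primitives; see the repository for that.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class ClipOutliers(BaseEstimator, TransformerMixin):
    """Hypothetical primitive: clips each feature to percentile bounds learned in fit()."""

    def __init__(self, lower=5.0, upper=95.0):
        # Store hyperparameters unchanged so get_params/set_params/clone work.
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Learn per-column bounds from the training data only.
        self.low_ = np.percentile(X, self.lower, axis=0)
        self.high_ = np.percentile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Bounds broadcast across rows, one pair of bounds per column.
        return np.clip(X, self.low_, self.high_)


# The custom primitive composes with a stock Sklearn estimator.
pipe = Pipeline([
    ("clip", ClipOutliers(lower=10.0, upper=90.0)),
    ("model", LogisticRegression()),
])

# Toy data with one extreme value in the single feature column.
X = np.array([[0.0], [1.0], [2.0], [3.0], [100.0], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Inheriting from `BaseEstimator` supplies `get_params`/`set_params`, and `TransformerMixin` supplies `fit_transform`, which is what `Pipeline` calls on intermediate steps during `fit`.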


AlphaD3M is an AutoML system for multiple ML tasks that automatically searches for models and derives end-to-end pipelines that read and pre-process the data and train the model. AlphaD3M leverages recent advances in deep reinforcement learning and can adapt to different application domains and problems through incremental learning.

Repository: gitlab.com/ViDA-NYU/d3m/alphad3m
Video demo: youtube.com/watch?v=9qJvOUOh2zM
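Pipeline synthesis can be viewed as a search over pipelines assembled from primitives according to a grammar. As a much simpler stand-in for AlphaD3M's reinforcement-learning search, the sketch below exhaustively enumerates every pipeline allowed by a tiny grammar (one imputer, one encoder, one estimator) and keeps the best one under a placeholder score. The grammar, primitive names, and scoring function are hypothetical and for illustration only; they are not AlphaD3M's grammar or API.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

PipelineSpec = Tuple[str, ...]

# Hypothetical toy grammar: a pipeline picks exactly one primitive per stage,
# in the order given by STAGES (pre-process the data, then train a model).
GRAMMAR: Dict[str, List[str]] = {
    "impute": ["mean_imputer", "median_imputer"],
    "encode": ["one_hot_encoder", "ordinal_encoder"],
    "estimator": ["decision_tree", "logistic_regression", "random_forest"],
}
STAGES = ["impute", "encode", "estimator"]


def enumerate_pipelines(grammar: Dict[str, List[str]], stages: List[str]) -> Iterator[PipelineSpec]:
    """Yield every pipeline the grammar allows (Cartesian product of stage choices)."""
    yield from product(*(grammar[stage] for stage in stages))


def search(
    grammar: Dict[str, List[str]],
    stages: List[str],
    score: Callable[[PipelineSpec], float],
) -> Tuple[PipelineSpec, float]:
    """Score every pipeline and return the best one with its score.

    A guided search (e.g. reinforcement learning) would instead decide which
    pipelines to build and evaluate, rather than trying all of them."""
    best = max(enumerate_pipelines(grammar, stages), key=score)
    return best, score(best)


def placeholder_score(pipeline: PipelineSpec) -> float:
    """Hypothetical scorer; a real system would fit the pipeline and return a validation metric."""
    bonus = {"median_imputer": 0.02, "one_hot_encoder": 0.03, "random_forest": 0.10}
    return 0.7 + sum(bonus.get(step, 0.0) for step in pipeline)


best, best_score = search(GRAMMAR, STAGES, placeholder_score)
print(best, round(best_score, 2))
```

Exhaustive enumeration grows multiplicatively with the options at each stage (here 2 × 2 × 3 = 12 pipelines), which quickly becomes impractical for realistic grammars; that is one reason a learned, guided search is attractive.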


Data scientists can interact with AlphaD3M through d3m-interface, a Python library that enables data scientists to use D3M AutoML systems. It integrates D3M AutoML systems with Jupyter Notebooks and provides a familiar interface that makes it easier for people to adopt D3M tools.

Repository: gitlab.com/ViDA-NYU/d3m/d3m_interface


PipelineProfiler is an interactive visualization tool that allows the exploration and comparison of the solution space of ML pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be used together with common data science tools to enable a rich set of analyses of the ML pipelines.

Repository: github.com/VIDA-NYU/PipelineVis
Video demo: youtube.com/watch?v=2WSYoaxLLJ8


Visus is a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. Visus also integrates visual analytics techniques and allows users to perform interactive data augmentation and visual model selection.

Repository: gitlab.com/ViDA-NYU/d3m/ta3
Video demo: youtube.com/watch?v=EUn1qwXVFHs

Dataset Search and Discovery


Auctus automatically discovers datasets on the Web and, unlike existing dataset search engines, infers consistent metadata for indexing and supports join and union search queries. Auctus is already being used in a real-world deployment to improve the performance of machine learning models.

Repository: gitlab.com/ViDA-NYU/auctus/auctus
Video demo: youtube.com/watch?v=lZQbh3ctq6Q
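Join and union queries correspond to two ways of augmenting a dataset: a join adds columns (features) to the user's table by matching a key column, while a union adds rows from a dataset with the same columns. The sketch below illustrates both operations on toy, in-memory tables; the function names and data are hypothetical, and this is not Auctus's API or implementation.

```python
from typing import Dict, List

Row = Dict[str, object]


def join_augment(base: List[Row], other: List[Row], key: str) -> List[Row]:
    """Left join: add the non-key columns of `other` to each row of `base`.

    Rows of `base` with no match get None for the new columns. Assumes `key`
    values are unique in `other` (a later duplicate would overwrite an earlier one)."""
    other_cols = [col for col in (other[0] if other else {}) if col != key]
    index = {row[key]: row for row in other}
    result = []
    for row in base:
        match = index.get(row[key], {})
        result.append({**row, **{col: match.get(col) for col in other_cols}})
    return result


def union_augment(base: List[Row], other: List[Row]) -> List[Row]:
    """Union: append the rows of `other`, which must have the same columns as `base`."""
    if base and other and set(base[0]) != set(other[0]):
        raise ValueError("union requires matching columns")
    return base + other


# Toy tables (hypothetical values): sales per zone, rainfall per zone.
sales = [{"zone": "A", "sales": 10}, {"zone": "B", "sales": 7}]
rainfall = [{"zone": "A", "rain_mm": 3.5}]
more_sales = [{"zone": "C", "sales": 4}]

joined = join_augment(sales, rainfall, key="zone")   # more columns
unioned = union_augment(sales, more_sales)           # more rows
print(joined)
print(unioned)
```

A join is useful when a discovered dataset carries new attributes for entities already in the table; a union is useful when it carries more examples of the same kind.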

News & Events

09/2020 AlphaD3M+PipelineProfiler was selected as one of the finalists in a machine-learning competition organized by Wells Fargo.
08/2020 Paper “PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines” has been accepted at VIS2020.
09/2019 Visus system has been accepted to the Demo Expo at NYC Media Lab’s Annual Summit.
09/2019 AlphaD3M and Visus have been selected for a talk in the special track Automated Machine Learning and AI at IBM AI Systems Day.
05/2019 Paper “Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar” accepted at AutoML’19.
05/2019 Paper “Debugging Machine Learning Pipelines” accepted at DEEM’19.
04/2019 Paper “Visus: An Interactive System for Automatic Machine Learning Model Building and Curation” accepted at HILDA’19 (co-located with SIGMOD).
06/2018 Paper “AlphaD3M: Machine Learning Pipeline Synthesis” accepted at AutoML’18.