D3M Project

The Data-Driven Discovery of Models (D3M) program aims to develop automated model discovery systems that enable users with subject matter expertise but no data science background to create empirical models of real, complex processes. At the NYU VIDA Center we address two important challenges in automating machine learning: i) pipeline synthesis and model understanding, and ii) dataset search and discovery.

AutoML and Model Explainability

AlphaD3M

AlphaD3M is an AutoML system that automatically searches for models and derives end-to-end pipelines that read and pre-process the data and train the model. AlphaD3M uses deep learning to learn how to construct these pipelines incrementally. The search proceeds by self-play with iterative self-improvement.

Repository: gitlab.com/ViDA-NYU/d3m/alphad3m
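
As a rough illustration of what an end-to-end pipeline of the kind described above looks like, the sketch below chains a read step, a pre-processing step, and a training step. It is not AlphaD3M code and performs no pipeline search. The data, the function names (read_step, impute_step, train_step, run_pipeline), and the choice of mean imputation with a nearest-centroid classifier are all assumptions made for this example.

```python
import csv
import io
from statistics import mean

# Toy data, made up for illustration; the empty cell is a missing value.
RAW_CSV = """height,weight,label
1.0,2.0,a
1.2,,a
3.0,4.0,b
3.2,4.4,b
"""


def read_step(text):
    """Read: parse CSV text into feature rows (None for missing) and labels."""
    rows = list(csv.DictReader(io.StringIO(text)))
    features = [
        [float(r[c]) if r[c] else None for c in ("height", "weight")]
        for r in rows
    ]
    labels = [r["label"] for r in rows]
    return features, labels


def impute_step(features):
    """Pre-process: replace each missing value with its column mean."""
    columns = list(zip(*features))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, means)] for row in features]


def train_step(features, labels):
    """Train: fit a nearest-centroid classifier and return a predict function."""
    centroids = {}
    for label in set(labels):
        members = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(point):
        # Pick the label whose centroid has the smallest squared distance.
        return min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )

    return predict


def run_pipeline(text):
    """Run the three steps in order: read, pre-process, train."""
    features, labels = read_step(text)
    return train_step(impute_step(features), labels)


model = run_pipeline(RAW_CSV)
```

An AutoML system would choose and order steps like these automatically instead of having them written by hand; here the sequence is fixed only to show the shape of a pipeline.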

D3M-interface

Data scientists can interact with AlphaD3M through d3m-interface, a Python library that enables data scientists to use D3M AutoML systems. It includes an implementation that integrates D3M AutoML systems with Jupyter Notebooks, and it provides a familiar interface that makes it easier for people to adopt D3M tools.

Repository: gitlab.com/ViDA-NYU/d3m/d3m_interface

PipelineProfiler

PipelineProfiler is an interactive visualization tool that allows the exploration and comparison of the solution space of ML pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be used together with common data science tools to enable a rich set of analyses of the ML pipelines.

Repository: github.com/VIDA-NYU/PipelineVis
Video demo: youtube.com/watch?v=2WSYoaxLLJ8

Visus

Visus is a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. Visus also integrates visual analytics techniques and allows users to perform interactive data augmentation and visual model selection.

Repository: gitlab.com/ViDA-NYU/d3m/ta3
Video demo: youtube.com/watch?v=EUn1qwXVFHs

Dataset Search and Discovery

Auctus

Auctus automatically discovers datasets on the Web and, unlike existing dataset search engines, infers consistent metadata for indexing and supports join and union search queries. Auctus is already used in a real deployment environment to improve the performance of machine learning models.

Repository: gitlab.com/ViDA-NYU/auctus
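
To make the difference between the join and union queries mentioned above concrete, the sketch below augments a small table in both ways: a join adds new columns to existing rows, while a union adds new rows with the same columns. It does not use the Auctus API. The tables and the helper names (join_augment, union_augment) are hypothetical and exist only for this example.

```python
# Hypothetical toy tables, made up for illustration.
taxi_trips = [
    {"zip": "10001", "trips": 120},
    {"zip": "10002", "trips": 80},
]
weather_by_zip = [
    {"zip": "10001", "rain_mm": 3.5},
    {"zip": "10003", "rain_mm": 0.0},
]
more_trips = [
    {"zip": "10003", "trips": 45},
]


def join_augment(base, other, key):
    """Left join: add the columns of `other` to the matching rows of `base`."""
    lookup = {row[key]: row for row in other}
    extra_cols = sorted({c for row in other for c in row if c != key})
    joined = []
    for row in base:
        match = lookup.get(row[key], {})
        new_row = dict(row)
        for col in extra_cols:
            new_row[col] = match.get(col)  # None when `other` has no matching key
        joined.append(new_row)
    return joined


def union_augment(base, other):
    """Union: append the rows of `other`, projected onto the columns of `base`."""
    columns = list(base[0].keys())
    return [dict(r) for r in base] + [{c: r.get(c) for c in columns} for r in other]


with_weather = join_augment(taxi_trips, weather_by_zip, "zip")
with_more_rows = union_augment(taxi_trips, more_trips)
```

In this sketch, with_weather keeps the two original rows and gains a rain_mm column, whereas with_more_rows keeps the original columns and gains a third row.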

News & Events

09/2020 AlphaD3M+PipelineProfiler was selected as one of the finalists in a machine-learning competition organized by Wells Fargo.
08/2020 Paper “PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines” has been accepted at VIS2020.
09/2019 The Visus system has been accepted to the Demo Expo at NYC Media Lab’s Annual Summit.
09/2019 AlphaD3M and Visus have been selected for a talk in the special track Automated Machine Learning and AI at IBM AI Systems Day.
05/2019 Paper “Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar” accepted at AutoML’19.
05/2019 A paper accepted at DEEM’19: “Debugging Machine Learning Pipelines”.
04/2019 New publication accepted at HILDA’19 (co-located with SIGMOD): “Visus: An Interactive System for Automatic Machine Learning Model Building and Curation”.
06/2018 Paper “AlphaD3M: Machine Learning Pipeline Synthesis” accepted at AutoML’18.