The Data-Driven Discovery of Models (D3M) program aims to develop automated model discovery systems that enable users with subject matter expertise but no data science background to create empirical models of real, complex processes. At the NYU VIDA Center, we address two important challenges in automating machine learning: i) pipeline synthesis and model understanding, and ii) dataset search and discovery.
AutoML and Model Explainability
AlphaD3M is an AutoML system that automatically searches for models and derives end-to-end pipelines that read and pre-process the data and train the model. AlphaD3M uses deep learning to learn how to construct these pipelines incrementally. The process progresses by self-play with iterative self-improvement.
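To make the idea of building a pipeline one step at a time concrete, here is a minimal sketch in plain Python. It is not AlphaD3M's algorithm: it swaps the learned, self-play-driven search for a simple greedy search, and the primitives (`impute_mean`, `min_max_scale`), the `score` function, and the sample data are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical primitives: each maps a list of floats (None = missing) to a new list.
def impute_mean(values):
    known = [v for v in values if v is not None]
    fill = mean(known) if known else 0.0
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    if any(v is None for v in values):
        raise ValueError("scaling needs data without missing values")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

PRIMITIVES = {"impute_mean": impute_mean, "min_max_scale": min_max_scale}

def run_pipeline(steps, values):
    """Apply the named primitives in order."""
    for name in steps:
        values = PRIMITIVES[name](values)
    return values

def score(steps, values):
    """Toy score (an assumption): 0.0 if the pipeline fails or leaves missing
    values, 0.5 if the output is complete, 1.0 if it also lies in [0, 1]."""
    try:
        out = run_pipeline(steps, values)
    except ValueError:
        return 0.0
    if any(v is None for v in out):
        return 0.0
    return 1.0 if all(0.0 <= v <= 1.0 for v in out) else 0.5

def greedy_search(values, max_steps=3):
    """Grow the pipeline one primitive at a time while the score improves."""
    pipeline, best = [], score([], values)
    for _ in range(max_steps):
        candidates = [(score(pipeline + [name], values), name) for name in PRIMITIVES]
        top_score, top_name = max(candidates)
        if top_score <= best:
            break  # no single added step improves the pipeline
        pipeline, best = pipeline + [top_name], top_score
    return pipeline, best

data = [2.0, None, 4.0, 10.0]
pipeline, best = greedy_search(data)
print(pipeline, best)
```

In this sketch each "move" appends one primitive to the pipeline. A system such as AlphaD3M would instead use a learned model to decide which edit to try next, rather than scoring every candidate exhaustively.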
Data scientists can interact with AlphaD3M through d3m-interface, a Python library that enables data scientists to use D3M AutoML systems. It includes an integration of D3M AutoML systems with Jupyter Notebooks and provides a familiar interface that makes it easier for people to adopt D3M tools.
PipelineProfiler is an interactive visualization tool that allows the exploration and comparison of the solution space of ML pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be used together with common data science tools to enable a rich set of analyses of the ML pipelines.
Visus is a system designed to support the model building process and curation of ML data processing pipelines generated by AutoML systems. Visus also integrates visual analytics techniques and allows users to perform interactive data augmentation and visual model selection.
Dataset Search and Discovery
Auctus automatically discovers datasets on the Web and, unlike existing dataset search engines, infers consistent metadata for indexing and supports join and union search queries. Auctus is already being used in a real deployment environment to improve the performance of machine learning models.
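The difference between a join and a union augmentation can be sketched in a few lines of plain Python. This is an illustration of the two operations only, not the Auctus API: the function names `join_augment` and `union_augment` and the sample tables are hypothetical.

```python
def join_augment(base, candidate, key):
    """Left join: add the candidate's columns to base rows that match on `key`.
    Base rows without a match get None for the new columns."""
    lookup = {row[key]: row for row in candidate}
    extra_cols = [c for c in (candidate[0] if candidate else {}) if c != key]
    joined = []
    for row in base:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined

def union_augment(base, candidate):
    """Union: append candidate rows that provide every column of the base table."""
    if not base:
        return []
    columns = list(base[0])
    compatible = [r for r in candidate if all(c in r for c in columns)]
    return base + [{c: r[c] for c in columns} for r in compatible]

# Hypothetical sample tables.
trips = [
    {"date": "2020-01-01", "trips": 120},
    {"date": "2020-01-02", "trips": 95},
]
weather = [{"date": "2020-01-01", "temp_c": 3.5}]   # new column, same key
more_trips = [{"date": "2020-01-03", "trips": 110}]  # new rows, same schema

wider = join_augment(trips, weather, key="date")
longer = union_augment(trips, more_trips)
print(wider)
print(longer)
```

A join query looks for datasets that add new columns (features) to the rows a user already has, while a union query looks for datasets that add new rows with a compatible schema. Either can give a machine learning model more to learn from.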
News & Events
09/2020 AlphaD3M+PipelineProfiler was selected as one of the finalists in a machine-learning competition organized by Wells Fargo.
08/2020 Paper “PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines” has been accepted at VIS2020.
09/2019 Visus system has been accepted to the Demo Expo at NYC Media Lab’s Annual Summit.
09/2019 AlphaD3M and Visus .
05/2019 Paper “Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar” accepted at AutoML’19.
05/2019 A paper accepted at DEEM’19: “Debugging Machine Learning Pipelines”.
04/2019 New publication accepted at HILDA’19 (co-located with SIGMOD): “Visus: An Interactive System for Automatic Machine Learning Model Building and Curation”.
06/2018 Paper “AlphaD3M: Machine Learning Pipeline Synthesis” accepted at AutoML’18.