The Data-Driven Discovery of Models (D3M) program aims to develop automated model discovery systems that enable users with subject matter expertise but no data science background to create empirical models of real, complex processes.
Related Papers
- Roque Lopez, Raoni Lourenco, Remi Rampin, Sonia Castelo, Aécio Santos, Jorge Ono, Claudio Silva, Juliana Freire. AlphaD3M: An Open-Source AutoML Library for Multiple ML Tasks. AutoML Conference, 2023.
- Jorge Ono, Sonia Castelo, Roque Lopez, Enrico Bertini, Juliana Freire, Claudio Silva. PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines. IEEE Visualization and Computer Graphics, 2020.
- Iddo Drori, Yamuna Krishnamurthy, Raoni Lourenco, Remi Rampin, Kyunghyun Cho, Claudio Silva, Juliana Freire. Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar. AutoML Workshop at ICML, 2019.
- Raoni Lourenco, Juliana Freire, Dennis Shasha. Debugging Machine Learning Pipelines. Workshop on Data Management for End-to-End Machine Learning, 2019.
- Iddo Drori, Yamuna Krishnamurthy, Remi Rampin, Raoni Lourenco, Jorge Ono, Kyunghyun Cho, Claudio Silva, Juliana Freire. AlphaD3M: Machine learning pipeline synthesis. AutoML Workshop at ICML, 2018.
Vizier is an open-source tool that helps analysts build and refine data pipelines. It combines the flexibility of notebooks with the easy-to-use data manipulation interface of spreadsheets. Together with advanced provenance tracking for both data and computational steps, this enables reproducibility, versioning, and streamlined data exploration.
Related Papers
- Mike Brachmann, William Spoth, Oliver Kennedy, Boris Glavic, Heiko Mueller, Sonia Castelo, Carlos Bautista, Juliana Freire. Your notebook is not crumby enough, REPLace it. CIDR, 2020.
- Mike Brachmann, Carlos Bautista, Sonia Castelo, Su Feng, Juliana Freire, Boris Glavic, Oliver Kennedy, Heiko Mueller, Rémi Rampin, William Spoth, Ying Yang. Data Debugging and Exploration with Vizier. SIGMOD Conference, 2019.
ARIES (ARt Image Exploration Space) is an interactive image manipulation system that enables the exploration and organization of fine digital art. The system allows images to be compared in multiple ways, offering dynamic overlays analogous to a physical light box and supporting advanced image comparisons and feature-matching functions made available through computational image processing.
Related Papers
- Lhaylla Crissaff, Louisa Ruby, Samantha Deutch, Luke DuBois, Jean-Daniel Fekete, Juliana Freire and Cláudio T. Silva. ARIES: Enabling Visual Exploration and Organization of Art Image Collections. IEEE Computer Graphics and Applications, 38 (1), 2018, 91-108.
The project – which involves large-scale noise monitoring – leverages the latest in machine learning technology, big data analysis, and citizen science reporting to more effectively monitor, analyze, and mitigate urban noise pollution.
Related Papers
- C Mydlarz, M Sharma, Y Lockerman, B Steers, CT Silva, JP Bello. The life of a New York City noise sensor network. Sensors, 2019.
- Lostanlen, V., Salamon, J., Cartwright, M., McFee, B., Farnsworth, A., Kelling, S. and Bello, J. P. Per-Channel Energy Normalization: Why and how. In IEEE Signal Processing Letters, 26(1), pp. 39–43, 2019 (Jan).
A web-based dataflow framework for visual data exploration that focuses on interactivity, flexibility, and simplicity.
Related Papers
- Bowen Yu and Claudio T. Silva. VisFlow – Web-based Visualization Framework for Tabular Data with a Subset Flow Model. In IEEE Transactions on Visualization and Computer Graphics (Proc. VAST), 2017. IEEE Xplore: https://ieeexplore.ieee.org/document/7536189
Generating community-sourced disease data
Related Papers
- B. Ray, R. Chunara. Predicting Acute Respiratory Infections from Participatory Data. 2016. International Society for Disease Surveillance Conference. Atlanta, USA.
- B. Ray, E. Ghedin, R. Chunara. Network Inference from Multimodal Data: A Review of Approaches from Infectious Disease Transmission. 2016. JBI.
- B. Ray, R. Chunara. Integrating Genomic Data in a Community-based Multi-modal Viral Transmission Model for Network Inference. 2015. NIPS Workshop on Machine Learning for Healthcare. Montréal, Canada
Our goal in this project is to develop a scalable infrastructure that automates, to a large extent, the process of discovering, organizing, and extracting data from hidden-Web sources.
Related Papers
- Kien Pham, Aécio Santos, and Juliana Freire. Learning to Discover Domain-specific Web Content. WSDM 2018
- Kien Pham, Aécio Santos, Juliana Freire. Understanding website behavior based on user agent. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016
An open-source scientific workflow and provenance management system that supports data exploration and visualization.
Related Papers
- VisMashup: Streamlining the Creation of Custom Visualization Applications (by Emanuele Santos, Lauro Lins, James Ahrens, Juliana Freire and Claudio Silva). IEEE Trans. Vis. Comp. Graph (Proceedings of IEEE Vis 2009), 15(6), pp. 1539-1546, 2009. (video)
- Using Workflow Medleys to Streamline Exploratory Tasks (by Emanuele Santos, David Koop, Huy T. Vo, Erik Anderson, Juliana Freire, and Claudio T. Silva). In 21st International Conference on Scientific and Statistical Database Management (SSDBM), pp. 292-301, 2009.
- A First Study on Strategies for Generating Workflow Snippets (by Tommy Ellkvist, Lena Stromback, Lauro Lins, and Juliana Freire). In ACM SIGMOD International Workshop on Keyword Search on Structured Data (KEYS), 2009.
- Using Mediation to Achieve Provenance Interoperability (by Tommy Ellkvist, David Koop, Juliana Freire, Claudio T. Silva, and Lena Stromback). In IEEE International Workshop on Scientific Workflows, 2009.
Urbane
Related Papers
- Urbane: A 3D Framework to Support Data Driven Decision Making in Urban Development. Nivan Ferreira, Marcos Lage, Harish Doraiswamy, Huy T. Vo, Luc Wilson, Heidi Werner, Muchan Park, Claudio Silva. VAST ’15: Proc. IEEE Conf. on Visual Analytics Science and Technology, 2015, 97-104
- Topology-based Catalogue Exploration Framework for Identifying View-Enhanced Tower Designs. Harish Doraiswamy, Nivan Ferreira, Marcos Lage, Huy T. Vo, Luc Wilson, Heidi Werner, Muchan Park, Claudio Silva. ACM Transactions on Graphics (SIGGRAPH Asia ’15), 34(6), 2015, 230:1-230:13
- Urban Pulse: Capturing the Rhythm of Cities. Fabio Miranda, Harish Doraiswamy, Marcos Lage, Kai Zhao, Bruno Gonçalves, Luc Wilson, Mondrian Hsieh, and Claudio Silva. IEEE Transactions on Visualization and Computer Graphics (IEEE SciVis ’16), 23(1), 2016, 791-800
Spatial Query Processing
Related Papers
- A GPU-Based Index to Support Interactive Spatio-Temporal Queries over Historical Data. Harish Doraiswamy, Huy T. Vo, Claudio Silva, and Juliana Freire. ICDE ’16: Proc. Intl. Conf. on Data Engineering, 2016, 1086-1097
- A Unified Index for Spatio-Temporal Keyword Queries. Tuan-Anh Hoang-Vu, Huy T. Vo, and Juliana Freire. CIKM ’16: Proc. ACM Intl. Conf. on Information and Knowledge Management, 2016, 135-144
- GPU Rasterization for Real-Time Spatial Aggregation over Arbitrary Polygons. Eleni Tzirita Zacharatou, Harish Doraiswamy, Anastasia Ailamaki, Claudio T. Silva, and Juliana Freire. PVLDB, 2017, to appear
Urban Data Analysis
Related Papers
- Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips. Nivan Ferreira, Jorge Poco, Huy T. Vo, Juliana Freire, and Claudio T. Silva. IEEE Transactions on Visualization and Computer Graphics (IEEE VAST ’13), 19(12), 2013, 2149-2158
- Using Topological Analysis to Support Event-Guided Exploration in Urban Data. Harish Doraiswamy, Nivan Ferreira, Theodoros Damoulas, Juliana Freire, and Claudio Silva. IEEE Transactions on Visualization and Computer Graphics (IEEE SciVis ’14), 20(12), 2014, 2634-2643
- Riding from Urban Data to Insight Using New York City Taxis. Juliana Freire, Claudio Silva, Huy Vo, Harish Doraiswamy, Nivan Ferreira, and Jorge Poco. IEEE Data Engineering Bulletin, 37(4), 2014, 43-55
- Exploring Traffic Dynamics in Urban Environments Using Vector-Valued Functions. Jorge Poco, Harish Doraiswamy, Huy T. Vo, Joao L. D. Comba, Juliana Freire, and Claudio Silva. Computer Graphics Forum (EuroVis ’15), 34(3), 2015, 161-170
- Anonymizing NYC Taxi Data: Does It Matter? Marie Douriez, Harish Doraiswamy, Claudio Silva, and Juliana Freire. DSAA ’16: Proc. IEEE Intl. Conf. on Data Science and Advanced Analytics, 2016, 140-148
- Data Polygamy: The Many-Many Relationships among Urban Spatio-Temporal Data Sets. Fernando Chirigati, Harish Doraiswamy, Theodoros Damoulas, Juliana Freire. SIGMOD ’16: Proc. Intl. Conf. on Management of Data, 2016, 1011-1025
- Querying and Exploring Polygamous Relationships in Urban Spatio-Temporal Data Sets. Yeuk-Yin Chan, Fernando Chirigati, Harish Doraiswamy, Claudio Silva and Juliana Freire. SIGMOD ’17: Proc. Intl. Conf. on Management of Data, 2017, 1643–1646
Related Papers
- ReproZip: The Reproducibility Packer, R. Rampin, F. Chirigati, D. Shasha, J. Freire, and V. Steeves. In Journal of Open Source Software (JOSS), 2016
- ReproZip: Computational Reproducibility With Ease, F. Chirigati, R. Rampin, D. Shasha, and J. Freire. In Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 2085-2088, 2016
- Packing Experiments for Sharing and Publication, F. Chirigati, D. Shasha, and J. Freire. In Proceedings of the 2013 International Conference on Management of Data (SIGMOD), pp. 977-980, 2013
- ReproZip: Using Provenance to Support Computational Reproducibility, F. Chirigati, D. Shasha, and J. Freire. In Proceedings of the 5th USENIX conference on Theory and Practice of Provenance (TaPP), 2013