The Data Science Environments Partnership includes New York University, the University of California, Berkeley, the University of Washington, the Gordon and Betty Moore Foundation, and the Alfred P. Sloan Foundation. The goal of this partnership is to dramatically advance data-intensive scientific discovery, empowering researchers to be vastly more effective by using new methods, new tools, new partnerships, and new career paths. We are accomplishing this by creating “Data Science Environments” at the three universities through a five-year, $37.8 million cross-institutional effort supported by the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation. These new Data Science Environments will demonstrate how an institution-wide commitment to data science can deliver dramatic gains in scientific productivity and lead to significant new discoveries.
Why Data Science Environments?
Technological advances are generating staggering amounts of scientific data. The scale and complexity of these data sources continue to grow. Efficiently harnessing the data being generated has the potential to revolutionize every field within the natural, mathematical, computational, and social sciences. However, the volume, variety, and velocity of data are overwhelming current tools and practices, while the researchers (data scientists) who have the computational, statistical, and mathematical skills needed to handle or effectively analyze the data are not sufficiently supported within academia.
Changes in academia are required to more fully support data scientists who enable new data-driven discoveries. We aim to create new types of institutional environments in which these discoveries can take place.
Data scientists are in high demand in industry but, because their work is inherently interdisciplinary, lack similar support or a single home in academia. Further, institutions are not optimally structured to bring together and reward people with both the scientific domain expertise and the computational, statistical, and mathematical skills needed to drive data science forward.
Challenges of Supporting Data-Driven Science
There are significant cultural mismatches between the entrenched structures of universities and the needs of data-intensive scientific discovery. The following are some of the challenges we have encountered at our institutions and seek to address:
- Supporting the careers of data scientists
- Promoting education and training in data-intensive discovery
- Building an ecosystem of tools and software
- Supporting reproducibility and open science
- Creating physical and intellectual space for growing the data science community
- Understanding the culture of data science through ethnography and evaluation
Our work is organized around these six challenges, which we believe provide useful themes for discussing the future of academic data science. Learn more:
- Education and training
- Tools and software
- Reproducibility and open science
- Physical and intellectual space
- Ethnography and evaluation
Our Long-Term Goal
By the end of this five-year experiment, our goal at the three universities is to understand how best to bring together people from different disciplines within institutional environments and provide them with the resources, freedom, and interconnected networks necessary for science to flourish. This will allow research teams to focus on what is most meaningful: developing better methodologies to analyze data and advancing scientific discovery.