September 29, 2016

A team of Berkeley faculty has been awarded a $150,000 Google grant and $30,000 of in-kind credits to further develop the computing environment used in Foundations of Data Science, Berkeley’s pioneering course for lower-division students of all majors.

In the Foundations course, as well as the associated “connector” courses, students become proficient with data analysis and computing by working hands-on with real data, addressing real-life issues ranging from water usage to crime rates.  

The Google award will enable a team of Berkeley students, staff, and faculty to continue developing this cloud-based platform to support other Berkeley courses and to extend it to other institutions that want to offer data science courses. Among the universities that have approached Berkeley are Harvard, Stanford, Yale, UC San Diego, University of Pennsylvania, Cornell, and Carnegie Mellon. The project seeks to make it easy and safe to use the course’s computational infrastructure in conjunction with the local campus’s approach to authentication and storage, regardless of cloud provider. It also seeks to allow resources to be deployed flexibly to support student work.

“UC Berkeley is a strong partner to Google and we’re happy to be supporting the Foundations of Data Science course, as courses like this have an expanding reach to other institutions that can take advantage of its high-quality content,” said Maggie Johnson, Google’s Director of Education and University Relations.

Computer Science Professor John DeNero, the faculty team lead, emphasized that the Foundations course and the online computing environment are designed to make data science accessible to a broad array of students, many without previous programming or statistics experience.

“This course is really focused on how we can study the world through the lens of computing,” he said. “Even students with no background in computer science are able to do this by learning to program, then learning to use computation to perform statistical analysis. Students carry out the whole process of data science themselves – visualizing data sets, asking interesting questions about them, and answering those questions through computation.”

In the class, students learn to use the Python programming language and complete assignments in Jupyter Notebooks, which enable browser-based computation in the cloud, avoiding the need to install software, transfer files, or update libraries. Using Jupyter Notebooks, educators, scientists, and researchers can combine multiple kinds of content – live code, equations, narrative text, and rich media – into a single, interactive document. Educators can write instructions in the Notebook, include a coding exercise after the instructions, and then ask for students’ interpretation of the results immediately after that. Used by more than 1 million academics and professionals in fields ranging from finance to astrophysics, Jupyter Notebooks have emerged as the gold standard application for data science.

Berkeley has emphasized an open-source approach to developing its data science curriculum. The textbook and other course materials for Foundations of Data Science (including lab exercises, homework assignments, exams, and associated datasets) are already available to the academic community online at data8.org. With the next iteration of the course’s cloud-based computing environment, the suite of materials will be complete.

As Foundations of Data Science begins to reach a wider audience, the developers have not lost sight of the main purpose of the course: to bring data science to all students, regardless of background. This semester, a laptop loan program has been added for students who have limited access to personal computers. But simply making computers available is not enough.   

“Without the software technology tools that we’re using, the Foundations course wouldn’t exist,” said Statistics Professor Ani Adhikari, who teaches and co-developed the course. “There would have been too many obstacles for students to easily gain access to the tools and data that they need.”

For detailed course information about Foundations of Data Science, please visit data8.org.