A data science course introduced this semester gives students the chance to delve into concepts and research that are rarely, if ever, offered at the undergraduate level.
Data 102: Data, Inference, and Decisions, taught by EECS Professor Michael Jordan and Statistics Professor Fernando Perez, is the latest addition to a series of core data science courses that include Data 8: Foundations of Data Science, Stat 140: Probability for Data Science, and Data 100: Principles and Techniques of Data Science.
Like the other courses in the series, Data 102 was designed with input from Berkeley faculty across several disciplines who aimed to integrate key areas of data science. Building on the earlier courses, Data 102 goes beyond the “how to” of Data 100 and the “finding patterns” of Data 8 to focus on applications, specifically decision making in the presence of other decision makers and across sequences of decisions. Students learn to use data to make decisions even in the face of uncertainty.
“There is no other class that brings statistics, computing, and real-world problems together in such an embrace,” Jordan said.
Jordan and Perez believe that this class marks the first time some of this content has been taught at the undergraduate level. One of the first topics covered is online false discovery control (keeping the expected proportion of false positives among all discoveries in check as a series of decisions is made over time), which was first described in a research paper just a few years ago.
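To make the idea of online false discovery control concrete, here is a minimal Python sketch of one published online rule, LOND (Javanmard and Montanari). It is an illustration chosen for this article, not necessarily the method taught in Data 102. Hypotheses arrive one at a time, each with a p-value, and each must be accepted or rejected before the next arrives; the test level at step t is alpha times a weight gamma_t times one plus the number of discoveries made so far. The function name `lond`, the weight sequence gamma_t = 6 / (π² t²), and the assumption that the p-values are independent are all choices made for this sketch.

```python
import math
from typing import Iterable, List


def lond(p_values: Iterable[float], alpha: float = 0.05) -> List[bool]:
    """Hypothetical sketch of the LOND online FDR rule.

    Tests each p-value as it arrives and returns one reject/accept
    decision per hypothesis. Assumes independent p-values.
    """
    decisions: List[bool] = []
    discoveries = 0  # number of rejections made so far
    for t, p in enumerate(p_values, start=1):
        # Weights gamma_t = 6 / (pi^2 t^2) are nonnegative and sum to 1.
        gamma_t = 6.0 / (math.pi ** 2 * t ** 2)
        # The level rises as more discoveries accumulate.
        level = alpha * gamma_t * (discoveries + 1)
        reject = p <= level
        decisions.append(reject)
        if reject:
            discoveries += 1
    return decisions


print(lond([0.001, 0.2, 0.0005, 0.04]))  # [True, False, True, False]
```

In the example, the first p-value is tested at a level of about 0.030 and rejected; that discovery raises the budget for later tests, so the third p-value (0.0005) is rejected at a level of about 0.0068, while 0.04 at step four exceeds its level of about 0.0057.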
The professors are confident that after completing the course, students will be ready to pursue their own data science problems and to work beyond college. Currently, about half of the 100 students in the class are interested in further academic work and the other half in moving into industry, and the course team is working hard to create a curriculum that serves both groups. Drawing on data and problems from public policy, cognitive science, genetics, and other fields, the class is designed to accommodate students of all majors and interests who meet the prerequisites.
“These tools and the issues that come with these volumes of data are changing how all research disciplines are conducted and how all industrial practice is conducted,” Perez said.
Because it is currently a pilot class, Data 102 is limited to 100 seats, which are filled by current data science and statistics majors. With prerequisites that include Linear Algebra, Data 100, and an advanced Statistics course, the course has a challenging curriculum that Perez and Jordan hope to open to a larger group of students from all majors soon. They hope that students who were drawn to data science by Data 8 will prepare by completing the prerequisites so that they can take Data 102 during their time at Berkeley.
“It started with Data 8,” Jordan said. “In some ways it just returns to the Data 8 ideas and does them in another level of sophistication. So with Data 8, it was really more to get people excited, show them a few ideas that they would then start to deepen out and pursue. This [class] is to see that they’ve really deepened those ideas out, and they’re ready to go to industry or to government or to academia and really start to work on these ideas in real life.”