This spring, Berkeley’s growing Data Science curriculum will take another step forward with the introduction of a new lower-division course that allows students of all majors to learn relatively advanced statistical methods without extensive prerequisites.
“This course is really building on the excitement and energy of the current Data Science momentum at Berkeley,” said Statistics Professor Elizabeth Purdom. “We feel we’re meeting the internal needs of our [Statistics] department but also the larger goals of the campus in terms of building a data science curriculum.”
Vice Chancellor for Undergraduate Education Cathy Koshland announced this week that Statistics Professors Purdom and Adityanand Guntuboyina will receive a 2016-2017 Presidential Chair Fellows Curriculum Enrichment Grant to develop the new course, Statistical Methods for Data Science (STAT 28). The grants are awarded to faculty teams to transform core areas of the undergraduate curriculum.
The new course is designed to make relatively advanced statistical methods accessible to students in various disciplines. It will build on the unique pedagogical approach developed in Foundations of Data Science (CS C8/INFO 8/STAT C8, familiarly called Data 8) and add to Berkeley’s current Data Science offerings. Like the Foundations course, STAT 28 will offer opportunities for students to work with real data and use it to ask questions and draw meaningful conclusions. Using the statistical programming language R, students will also gain new tools to build on the concepts taught in Foundations of Data Science, which is the only prerequisite for the course.
“We’re trying to reach an audience who want to understand how to use data in their field of study, but they’re not Statistics or Computer Science majors,” Purdom said. “The goal is to introduce statistical methods that go beyond what you can get in any introductory class, and without having to take four or five math or Stats prerequisites first.”
The course will expand on the statistical foundations offered in Data 8, with more coverage of continuous distributions and likelihood, while maintaining the emphasis on modern resampling techniques. It will also introduce commonly used methods for analyzing multivariate data, such as Principal Components Analysis (PCA), hierarchical clustering, multiple regression modeling, and random forests.
The class is expected to have a capacity of about 60 students this spring.