Jennifer Chayes is Associate Provost of the Division of Computing, Data Science, and Society (CDSS) at UC Berkeley, which comprises EECS, Statistics, BIDS, the Data Science Education Program, the Center for Computational Biology, and the School of Information, for which she is also Dean. Chayes is Professor in four departments and schools: EECS, Information, Mathematics, and Statistics. For 23 years, she was at Microsoft, most recently as a Technical Fellow, where she co-founded and led three interdisciplinary labs in Cambridge, Mass., NYC, and Montreal.
She is a member of the National Academy of Sciences and the American Academy of Arts and Sciences. Chayes has received numerous awards and honors, including the 2012 Anita Borg Institute Women of Vision Leadership Award, the 2015 John von Neumann Award of the Society for Industrial and Applied Mathematics (the highest honor of SIAM), and an honorary doctorate from Leiden University in 2016. Chayes is deeply committed to diversity in STEM; she has participated in numerous activities and served on many committees for gender and racial diversity.
In this Q&A she talks about the need for an organization like CDSS and how data science can help tackle some of the world's most pressing problems.
Q: You certainly picked an interesting time to start a new job pulling together a new division at UC Berkeley. How are things going after your first nine months?
A: People have said to me, “Oh my, you arrived in January and the world collapsed in March.” But I think that times of great disruption are also times of great opportunity. Disruption allows us to rethink everything. Could we use the disruption of online education to think through new ways to make education more inclusive? Can we identify some of the sources of inequities that played out so devastatingly in COVID, and come up with ways to address these going forward? We are looking to push some conventional boundaries in order to have bigger impacts.
There is such depth and dedication among the people at Berkeley. But we need to make sure that the whole is greater than the sum of the parts. As I have virtually met more and more of the CDSS staff, I’m impressed by the breadth of expertise they have and their commitment to the organization. We continue to hire people for critical positions--many of them from campus. We are also striving to ensure we have a culture reflecting how we value diversity and inclusion. If we just talk about it but don’t demonstrate that commitment, those are just empty words.
Among our key hires are members of our executive staff. Kathy Yelick is our Associate Dean for Research and Deborah Nolan is Associate Dean for Undergraduate Education; we’ve just been joined by Oliver O'Reilly as Associate Dean for Graduate Education and Nathan Sayre as Associate Dean for Faculty. Rebecca Miller has been our Chief Administrative Officer since February, and Cynthia LuBien joined us in May as our Chief Development Officer.
In many ways, CDSS is like a startup. We’re moving fast and adapting quickly. It’s both exhausting and exhilarating and I am having the time of my life.
Q: When colleagues ask about your new position, what's your "elevator pitch" reply?
A: Our job is to weave together the riches of the university to solve societal problems. We are focusing on climate and sustainability, biomedicine and health, and social welfare and social justice. We’re not here just to advance our core capabilities (though we will do this too), but to integrate expertise from around the campus to advance new research agendas and to ensure that students come out of Berkeley thinking masterfully and ethically about data so they can transform whatever field they choose to enter.
Q: You mentioned COVID, which is constraining so many things, but it also presents an incredible opportunity to show how the science of data can help slow the virus and help us create a more just, more resilient future. Can you give a few examples of how CDSS is helping battle the pandemic?
A: We have many examples, but here are a few. In mid-March when the outbreaks began, Prof. Bin Yu of the Statistics Department worked with many students and outside collaborators to create predictions of hospital demand seven days out. She and her team used 20 databases from counties and hospitals to come up with five models to accurately learn where to ship PPE and ventilators. They then created visualizations so that non-experts could understand the results. Over 1 million face shields were delivered around the world as one result of this effort.
In another project, a team from Lawrence Berkeley National Laboratory and campus used Natural Language Processing to search existing databases of scientific literature to look for information relevant to treating COVID-19. Now, that information isn’t labeled as such because COVID-19 didn’t exist until recently. By adapting a method used in materials science, they uncovered 80,000 papers, patents and other info and are adding about 1,500 more each week. At the same time, about 1,000 unique visitors are using the site each week to learn about proteins, genes, symptoms and possible drugs that already exist.
In the area of drug design, Jennifer Listgarten of EECS has been using machine learning to research the use of small molecules in drug design. Once the pandemic hit, she started looking for small molecules that could prevent the binding of the COVID-19 virus to human cells.
One other project sits closer to our home. Maya Petersen and Art Reingold from the Department of Epidemiology and Biostatistics in the School of Public Health and their collaborators are working on a project to help Berkeley and other campuses figure out when it will be safe to reopen. They want to take in many sources of data, such as symptoms, exposure, mobility, place-based data (for example, from air filters), and aggregate public health data, to create a machine learning-based risk prediction of who is most likely to contract the disease.
Most epidemiologists agree that the best way to slow a pandemic is to test not just symptomatic people, but asymptomatic people too. Should we test everybody all the time? Some models say every two or three days, while others say once a month. But testing is a limited resource, so how do we use testing and surveillance when there are so many people to test? They are doing an experiment using two matched cohorts. The first cohort is of students, with 1,000 who are living in dorms and another 1,000 who are living elsewhere. The second cohort comprises faculty and staff, with one group who have gone back into their labs and a second group that is sheltering in place. The goal is to use machine learning to find a smarter way of identifying who to test.
That’s just a subset, but I think it shows the breadth of our efforts and collaborations.
Q: In both external and campus presentations you've talked about how data that are skewed -- either intentionally or not -- can have devastating effects on people in areas such as health care, economics and social divisions. How does this problem intersect with the recent surge in protests for equal treatment for all across social and economic levels? How is CDSS working to address these issues?
A: As I mentioned, structural inequities in our health care systems played out in devastating ways during the pandemic. But this was in many ways predicted in a paper written by Ziad Obermeyer of Berkeley’s School of Public Health and his co-authors and published last October in Science magazine. Using machine learning, the authors found racial inequality in how health care is allocated. They found racial bias in one widely used algorithm because it uses health costs to measure health needs. Because less money is spent on Black patients, the algorithm wrongly concludes that they are healthier than equally sick white patients on whom more money is spent. This cut the number of Black patients flagged for extra care by more than half. We saw similar things happen when the pandemic struck. That paper predicted it to a large extent.
The hideous murder of George Floyd is also proving to be very disruptive and, again, is an opportunity to make significant changes in society. I originally considered having “fairness” be one of CDSS’ main foci, but in talking with Linda Burton, Dean of the School of Social Welfare at Berkeley, we decided to go beyond striving for fairness and commit to improving human welfare and increasing social justice.
I see us reaching out to people who can effect change. Public defenders, social workers, child welfare workers, policy experts, and K-12 educators have rich experiences that inform the way we can look at the relevant public data. These people are our Berkeley alumni. If we work with the people on the ground who see and live with the effects of racial and economic injustice, we can do proper causal inference, not just correlation. We want to understand the effects of interventions. I think we can define a new field around the concept of human welfare and social justice.
We can try out this approach by working with our alumni and hopefully it will then go farther and ultimately benefit everyone in California. I want this to be what we’re known for in five to ten years.
Q: In an interview before you joined campus, you talked about your commitment to getting more women interested in STEM careers and in getting more students overall to take an introductory class in data science. About 6,000 students take such a class at Berkeley each year and more than half of them are women, which is a good start. What do you think is motivating these students and why do you think such a class is important for students, whatever their major?
A: One descriptor of CDSS is “leading in a data-driven world.” Students are increasingly aware of the role of data in their daily lives. But awareness only goes so far -- we want to touch every student who comes through Berkeley so that when they leave, they can think confidently, critically, and ethically about data. I like to think we are inoculating them against misinformation.
It’s not just about Berkeley. Our Data Science Education Program is being used as a model by other colleges and universities, including community colleges and Historically Black Colleges and Universities. We want all students to think about and question data because it comes into play whatever their field of study. The more you have this ability in your toolbox, the more you will be able to learn using data and transform whatever field you are in.
When I was at Microsoft, I would meet with groups of young women in middle and high schools through our DigiGirlz program. We would talk about how knowledge of data and computer science could enhance whatever they wanted to do with their lives; not replace what they were thinking of doing, but enhance it.
Q: CDSS recently received a $252 million gift toward a new building, the aptly named Data Hub. It is the largest gift in the history of UC Berkeley. What does a gift of this magnitude say about the importance of CDSS and its research and education missions?
A: First, it indicates there is an extremely generous donor who shares our vision. It’s clearly a strong endorsement of the multidisciplinary and inclusive vision we have laid out for CDSS. We’re calling it a Data Hub and not the Center for Data Science for a reason. A hub is a central entity with spokes or branches radiating out in all directions. In the case of CDSS, these arms will link us with many other parts of campus, helping us build partnerships with social scientists, economists, mathematicians, sustainable business experts, computer scientists, public health and public policy experts, and others. The new building will both house and further those partnerships. And it will enable education in all of these areas, helping us to foster the growth of future leaders.
Q: It sounds like there has been significant progress. CDSS is drawing in expertise from a number of colleges and schools, which gives it a unique flavor on campus. After aligning these pieces, what will the resulting organization look like in a research sense? Where do you see yourself and CDSS in five years?
A: I think people should be able to move fluidly, to move where their passion takes them, where the problems take them. For example, in CDSS we are bringing together faculty and staff from computer science, from statistics, and from the School of Information, and we hope that the rest of the university will be able to feel that same fluidity.
We are building something at Berkeley called the Data Sciences Commons. It will house those who are willing to take the risk of being not “just” a computer scientist or not “just” a biologist. It’s my job to set up the structures that de-risk this, which allow people to be fluid in their careers. I think that’s the way for people to be most fulfilled and have the most impactful careers.
There’s a tug between two needs: we need to run things, but we also need to make it easy to blur things so people can follow both their passions and the passions created by the world’s problems.
I hope we will begin to see the effects of this culture very soon. To go back to an earlier point, I believe that having a diverse staff, in which each of us understands how we contribute to this larger goal and has a stake in our collective success, will be a key driver in this effort.
Q: Is there a question I didn't ask that you'd like to ask and answer?
A: Ok, here’s one: “Jennifer, what convinced you to leave an established company like Microsoft after 23 years to create what amounts to a startup organization at Berkeley?”
That’s a great question and one I’ve been asked many times. It’s because when you look at the world, you see so much potential with data and how we interact with data. We have the opportunity to integrate data science across campus to develop end-to-end solutions to some of the world’s most pressing problems. How could I say no to such an amazing opportunity? For me, being here makes me feel like a kid in a candy shop.