F Perez

Fernando Pérez makes the case for collaboration at TEDxBerkeley Infle(x)ion.

March 11, 2019

In a few years, if we walk down the campus, and we ask a random student about analyzing and making inferences from data...that will be as natural as using email, Facebook, or a web browser because data science is becoming part of the educational and intellectual fabric of UC Berkeley.

- Fernando Pérez

Presenting at the TEDxBerkeley Infle(x)ion 10th anniversary event, Fernando Pérez, UC Berkeley professor, IPython creator, and Project Jupyter co-founder, made an enthusiastic case for an open, collaborative atmosphere of discovery to help connect students and professionals to scientific research.

Pérez started his academic career studying physics in his hometown of Medellin, Colombia. In 1996, he left to pursue a PhD in theoretical physics in Boulder, Colorado, in hopes of finding an atmosphere that celebrated openness and collaboration. The move eventually led to the development of IPython, an interactive computing interface that promotes both those ideals by making it easier to visualize and share analyses with others in a browser format, and Project Jupyter, which supports the open-source exchange. While he initially figured IPython would take only an afternoon to complete, it has instead become a work in progress that has continued over the past 18 years.

IPython helped him use the computer as a “conversation partner,” ultimately enabling him to share his research around the world, an aim he soon discovered he had in common with many other researchers. One example was his good friend, the late John D. Hunter, a neuroscientist at the University of Chicago who researched how severe epilepsy worked in children and built tools for making this and other research more broadly accessible, including Matplotlib, a library of visualization tools. The two friends became part of a larger community of scientists and researchers who were driven by an interest in tech but who also wanted to work collaboratively instead of tearing each other down. In this community, they found many like-minded individuals who were interested in building together in a space of open-source scientific development.

Professor Pérez described traveling with Hunter and others from this community to a conference on scientific computing to teach a workshop for students from impoverished rural areas in India. The students were able to learn about the different tools the team used in their research and then contribute back to IPython. This kind of collaborative and empowering dynamic would have been impossible if it weren’t for these new open-source, inclusive tool sets, he said.

“Jupyter is a product of this incredible team and community,” he said. “It is important to go against this narrative of solo heroes in science, which is something I think the media likes to use a lot, but is unfortunately very toxic and counterproductive.”

Open-source computing tools such as IPython and Jupyter Notebooks, he said, “weave together the languages of humans, English and mathematics, with the language of the computer in a single narrative.” Without these tools, it would not have been possible to communicate and share research with such a large audience.

He also spoke about the importance of data and the ways it can bring computation to life in real-world contexts. To illustrate his point, he displayed a Jupyter Notebook from the course Data 8 that uses data from jury pools in Alameda County to study racial disparity in juries. He offered another example, this one in the context of physics, in which scientists were able to use data to detect the collision of two black holes in September 2015. A Jupyter Notebook contains the entire narrative on this research, enabling anyone to play with the data set, replicate the detection of the collision, and even reproduce the noise it created.

In 2013, Professor Pérez had the opportunity to be involved in the creation of the Berkeley Institute for Data Science (BIDS), and ever since, he has been working to bring together disciplines like biology, business, geography, and many others in the context of data science. Courses like Data 8 and Data 100 include data sets from many disciplines, and data science Connector courses and Modules enable students to further explore data in a variety of contexts. He believes that data science will continue to be a fast-growing field at Berkeley and that, increasingly, students from different fields will be proficient in data science.

He closed his talk with a reminder that none of these profound advances in data science would have been possible without the introduction of the open, collaborative technologies that have been developed over the past 20 years. In his eyes, “this data revolution is here to stay.”

Watch the talk here:

https://www.youtube.com/watch?v=FH0uXTJlVRc&t=1935s

Hour 6:00:00 to 6:13:50