The Data Science Ethos tool is a package of research-backed guidance and case studies that show students and practitioners how to embed ethics into the lifecycle of data-driven projects. (Photo/Keira Burton, Pexels)

Data science has unlocked new potential to help understand and address challenges like human health and climate change. But it can be difficult to know how to work ethically with data, especially when that work has major impacts on society.

Enter the Data Science Ethos tool by the Academic Data Science Alliance (ADSA). The alliance’s package of research-backed guidance and case studies shows how to embed ethics into the lifecycle of data-driven projects. Students and experts can follow this framework to assess the effects of their choices at each stage of their work, from framing their research questions to interpreting and sharing their results.

“It brings a distilled version of humanities and social science insights directly to practicing data scientists and data science students, so that they learn just how regular and simple it can be to reflect about ethics and bring it into their workflow,” said Cathryn Carson, a contributor to the tool. She also co-designed UC Berkeley’s Human Contexts and Ethics (HCE) program, which is part of the Data Science Undergraduate Studies program.

Lenses, stages of the lifecycle

The tool was developed by a team of social scientists, data scientists and humanists from ADSA, including Berkeley faculty. It encourages practitioners and others to consider their work through four lenses during each of the six stages of the research process.

The four lenses, which Carson said are based on Berkeley’s HCE curriculum, are positionality, sociotechnical systems, power and narratives. These should prompt social and ethical questions around the production and context of the data that anchor a project.

The lenses should be considered individually and together during each research stage. The six stages are question formulation, data discovery, analysis, modeling, interpreting and sharing. The lifecycle is laid out in a paper, “Data Science Ethos Lifecycle: Interplay of Ethical Thinking and Data Science Practice,” published in the Journal of Statistics and Data Science Education. Its authors are Margo Boenig-Liptsin and Ari Edmundson of UC Berkeley and Anissa Tanweer of the University of Washington.

Carson and the HCE team are interested in incorporating the lifecycle tool, which includes case studies, into Berkeley’s Data Science Discovery program and data science classes. ADSA intends for it to be used outside of Berkeley, too, including in K-12 classes, Carson said.

“We hope that embedding the tool in training programs will give future data science practitioners this hands-on experience at bringing ethics into their work,” said Carson, chair of Berkeley’s Department of History. “So that it just becomes part of thinking about how you do data science.”