DS-UA 202 Responsible Data Science

Term: Spring 2023
Instructor: Dr. Elisha Cohen
Level: Undergraduate

Topics

Fairness: algorithmic fairness (classification and set selection, risk assessment), fairness and causality, equality of opportunity, justice;

Data Science Lifecycle: data profiling and data cleaning, taming technical bias;

Data Protection: limits of anonymization, differential privacy, data protection, ethical frameworks, principles-based approach;

Transparency and Interpretability: auditing black-box models, explainable machine learning, experiments in ad targeting and delivery, discrimination in online ad delivery, interpretability and legal frameworks;

Description

The first wave of data science focused on accuracy and efficiency: on what we can do with data. The second wave is about responsibility: what we should and should not do. Accordingly, this technical course tackles the issues of ethics and responsibility in data science, including legal compliance, data quality, algorithmic fairness and diversity, transparency of data and algorithms, privacy, and data protection.

Data science promises to improve people’s lives, accelerate scientific discovery and innovation, and bring about positive societal change. Yet if it is not used responsibly, in accordance with ethical and moral norms and with legal and policy considerations, this same technology can cause harm on an unprecedented scale. Algorithmic changes in search engines can sway elections and incite violence; irreproducible results can influence global economic policy; models based on biased data can legitimize and amplify racist policies in the criminal justice system; and algorithmic hiring practices can silently and scalably violate equal opportunity laws, exposing companies to lawsuits and reinforcing the feedback loops that lead to a lack of diversity, which is both socially undesirable and potentially harmful to organizational performance. These strategic issues become more important as the economy globalizes. Therefore, as we develop and deploy data science methods, we are compelled to think about the effects these methods have on individuals, on population groups, and on society at large.

The European Union recently enacted the General Data Protection Regulation (GDPR), which requires government entities and companies that use algorithms and data to make decisions to provide legal protections for data subjects. The US is following suit with a plethora of local efforts, including a recently passed algorithmic transparency law in New York City that applies to City agencies. These legal frameworks, and many others that will soon follow, compel us to develop skills and acquire methodologies for operationalizing responsibility.