Data Studies

Data Studies courses prepare students to take their critical thinking and analysis skills and apply them to data-oriented environments, including industry, government, and non-profit organizations.

Science and Technology Studies and the Data Sciences Initiative have developed a new program in Data Studies with courses open to students from all majors. These courses are designed to offer a critical understanding of the social life of data, as well as important aspects of data analysis. They prepare students to apply their critical thinking and analytical skills to data-oriented environments in industry, government, and non-profit organizations.

Learn more about the program by visiting these websites:

Division of Social Sciences page on Data Studies

College of Letters and Science page on Data Studies

Data Studies Courses

STS 101: Introduction to Data Studies

STS 101 incorporates methods derived from STS and concepts from the social sciences and humanities, focusing on critical approaches to data. The course involves four hours of lecture/discussion per week as well as homework involving critical reading, data manipulation using Microsoft Excel, and presentation of work. There will be a final exam. The key topics and subtopics covered in the course are:

Caring for Data (structuring and naming data files informatively, backing up original data, ability to reproduce cleaned data, sharing data, documenting data content and provenance)

The Data Science Process in the World (defining a problem, clarifying a problem, learning how and where to ask questions, learning to be comfortable not knowing, learning to experiment and explore data, teamwork on assignments)

Data Exploration and Manipulation with Excel (indexes, lookup tables, pivot tables, graphing and charting)

Understanding Data (archaeology of data, metadata concepts, formats, columns, features, data integrity, cleaning data)

Stakeholders (analysis, constraints and benefits, interviewing, roleplaying, redefining questions)

Big Data and Pre-Machine Learning (brainstorming features, selecting good data, big data strengths and limitations)

Presenting/Visualizing Rhetoric (graph and chart approaches, structuring a presentation, structuring a report)


STS 115: Data Sense and Exploration

Data Sense and Exploration: Critical Storytelling and Analysis

This course introduces students to data science analysis through case studies of working with data to develop and tell meaningful stories about interesting questions. Students work with real questions and real-world (messy) data, learn to think critically about how to quantify and measure concepts, and learn to visualize data both for exploratory data analysis (EDA) and for communicating final results to different types of audiences. Students develop data literacy, intuition about sampling variability, and skepticism about quantitative claims; they also learn best practices in data visualization and receive an introduction to programming.

We use case studies to explore problems with real data that require students to think about the context of the problem and the data. The case studies also encourage students to bring in other sources of data that may offer improved insights into the problem. They also raise issues with “found” data that are not randomly sampled and show how naive inferences from such data can be highly misleading. Students will learn to clean data, considering how the data were collected and how errors may have been introduced.
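To illustrate why naive inferences from “found” data can mislead, here is a minimal sketch; it is not course material. It is written in Python purely for illustration, whereas the course itself uses R. The population of commute times, the function name `compare_samples`, and the rule that longer commutes are more likely to be recorded are all hypothetical assumptions.

```python
import random
import statistics


def compare_samples(seed=0, population_size=10_000, sample_size=500):
    """Compare a simple random sample with a self-selected ("found") sample.

    Everything here is a made-up illustration: the population is a set of
    hypothetical commute times in minutes.
    """
    rng = random.Random(seed)

    # Hypothetical population: commute times centered on 30 minutes.
    population = [rng.gauss(30, 10) for _ in range(population_size)]
    true_mean = statistics.fmean(population)

    # Simple random sample: every individual is equally likely to be chosen.
    random_sample = rng.sample(population, sample_size)

    # "Found" data (assumption): people with longer commutes are more likely
    # to show up in the data, e.g. by answering a survey about traffic.
    # Selection weight grows with the square of the commute time.
    weights = [max(x, 0.0) ** 2 for x in population]
    found_sample = rng.choices(population, weights=weights, k=sample_size)

    return (
        true_mean,
        statistics.fmean(random_sample),
        statistics.fmean(found_sample),
    )


if __name__ == "__main__":
    true_mean, random_mean, found_mean = compare_samples()
    print(f"population mean:     {true_mean:.1f}")
    print(f"random-sample mean:  {random_mean:.1f}")
    print(f"found-data mean:     {found_mean:.1f}")
```

Under these assumptions, the random-sample mean stays close to the population mean, while the found-data mean lands well above it. Nothing in the found data alone reveals the problem; only knowledge of how the data were collected does.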

The course also introduces elementary aspects of the R computing environment for data analysis. The primary purpose is to let students use R in subsequent courses, especially the capstone course (the fifth course in the forthcoming “Data Studies” minor).

The course involves: 1) using data to reason about and answer questions, 2) skeptically framing and evaluating questions and answers, 3) presenting results to different audiences, and 4) manipulating data with a high-level programming language.

Learning Objectives: Reason about how the data were obtained; understand limitations of the sampling mechanism; identify the population from which the data were sampled and what relevant inferences we might be able to draw.

Visualization covers many types of plots: single-variable plots; two-variable plots; multivariable plots; choice of glyphs and colors; conditional plots (panels); geospatial maps; and visualizing “big data”.

This course uses a high-level programming language (R) rather than Excel and so requires explaining fundamental concepts in that language.