Machine Behavior

Spring 2025 — COS 598B

Machine learning models are everywhere, and their role in society may increase as they become more popular and influential. At the same time, recent work has shown that LLMs can simulate and predict human behavior remarkably well. Thus, understanding and steering the behavior of such systems can amplify their benefits, mitigate their harms, and increase our understanding of human behavior. This seminar course aims to facilitate publishable student research on these broad topics. Coursework is a mix of readings and a research project.

Basic Information
Schedule
About the course
- 3.1 Reading activities
- 3.2 Research project
- 3.3 Guest Lectures
- 3.4 Grading
- 3.5 Expectations
Detailed Schedule
Acknowledgments

1. Basic information

Email: manoel@cs.princeton.edu.
Office Hours: on demand.
Hours: Tue & Thu, 10 am — 11:20 am.
TA: Ryan Liu (rl5886@cs.princeton.edu).
Location: Friend Center 016.
Office hours: Use this Google Sheet

2. Schedule

Date	Readings	Slides	Project
Jan 28	Intro	(link)	Ideation
Jan 30	Intro	(link)	Ideation
Feb 04	Simulation	(link)	Ideation
Feb 06	Simulation	(link)	Ideation
Feb 11	LLMology	(link)	Ideation
Feb 13	LLMology	(link)	Ideation
Feb 18	Serina Chang		❌
Feb 20	❌		Proposal
Feb 25	Social Media	(link)	Prior work
Feb 27	Social Media	(link)	Prior work
Mar 04	Hum. & Machines	(link)	❌
Mar 06	Hum. & Machines	(link)	❌
Mar 18	Mor Naaman		❌
Mar 20	Michael Bernstein		❌
Mar 25	❌		Clinic
Mar 27	❌		Clinic
Apr 01	❌		1-on-1 meetings
Apr 03	Sunnie Kim		❌
Apr 08	Dylan Thurgood		❌
Apr 10	Christopher Barrie		❌
Apr 15	Paul Röttger		❌
Apr 17	Micah Caroll		❌
Apr 22	❌		Presentations
Apr 24	❌		Presentations

3. About the course

This course has three central components: 1) reading activities, 2) a research project, 3) guest lectures

3.1 Reading activities

TL;DR: For each class, all students write a short commentary (“reading response”) to assigned readings, and some students take turns as “discussants,” summarizing their peers’ reading responses.

3.1.1 Reading response

Each day, we will have 1 to 2 assigned readings;
Everyone should write a short response for each.
- This should be done as a slide to be added to a collective deck.
- This should be done at least 22 hours before the class.
Your response should reflect on a couple of the following questions:
- What are the contributions of the paper?
- How would you extend this work?
- Do you disagree with any of the authors’ methodological decisions?
- What connections did you find between this work and your own?
- Did you gain any insights (directly or indirectly) by reading this paper?
- Do you agree with all the assumptions made in the paper?

3.1.2 Discussants

We will have one or two discussants per assigned reading. They should:
- Read the assigned readings like everyone else.
- Synthesize the paper and the reading responses in 3 to 5 slides.
- Present (~20 min) the synthesis you prepare.
- Co-host (~30 min) a discussion in the class. Discussants should jointly prepare prompts to spur a discussion in the class.

3.2 Research Project

TL;DR: Along with a team, you will conduct original research and write a paper summarizing your project.

Scope. A central component of the seminar is a research project. For example, your project might 1) use LLMs to simulate human behavior, 2) examine the behavior of machine learning models, or 3) analyze the interaction between humans and machine learning models. Above all, the project should be about a topic that interests you, e.g., something you find useful or that may contribute to your dissertation.
Teams. Team formation will be flexible, and the project scope will be commensurate with the team size. The final paper will include a “credits” section describing how each group member contributed to the project.
Process. We will have milestones, informal presentations, and feedback throughout the semester. I will also meet with you outside the class to help with your project.
Outcome. Your team should write a research paper summarizing the projects with the typical sections, e.g., Intro, Related Work, Methods, Results, and Discussion. It should have around 5,000 to 7,000 words. You may tailor the paper to a specific venue you want to target for publication (e.g., ACL venues, ICLR, CoLM), and the instructor can help students think about whether and where to submit their project.

3.2.1 Deliverables

The project has five different deliverables, all to be done in groups:
1. Project proposal.
  - Description: Two-pager on: 1) what your intended project is; 2) why is it relevant, and 3) how you are going to do it. Be prepared to talk about your proposal (no slides needed) in class!
  - Deadline: Feb 20.
  - Submission link
2. Brief synthesis of relevant related work.
  - Description: Two-pager. Select ~3 papers relevant to your project and 1) summarize and their contributions; 2) discuss their limitations; 3) describe how your envisioned work differs from/expands prior work.
  - Deadline: Mar 06.
  - Submission link
3. Project clinic presentation.
  - Description: Short presentation discussing what your group has accomplished and, most important, the roadbloacks you are currently facing.
  - Deadline: Mar 25.
  - Submission: Add slide to this drive folder.
4. Project final presentation.
  - Description: Short presentation of your project.
  - Deadline: Apr 22nd.
  - Submission: Add slide to this drive folder.
5. Final project report
  - Description: Final report in the format of a paper. The report should have around 8 pages and 4,000-8,000 words, and should be structured like a paper. You must include a `contributions’ section outlining what group participant did what. I encourage you to link a github repo with the code you used for the project within the manuscript.
  - Deadline: Apr 24nd.
  - Submission link.

3.2.2 IRB

Additionally, you may also need to apply for an IRB. I will help you with this!

3.3 Guest Lectures

After spring break, we will have a series of guest lectures. Students are expected to attend and meaningfully engage with guest speakers.

3.4 Grading

10% In-class active participation.
10% Discussant presentation.
20% Reading responses.
60% Research project:
- 35% Final write-up.
- 15% Final presentation.
- 10% Other deliverables.

3.5 Expectations

I expect you to:
- Attend and actively participate in class.
- Be respectful and collegial to your classmates and guests.
- Complete readings early and submit responses on time to help discussants.
- Present your work when the time comes and serve as a discussant when necessary.
Deadlines. Deadlines exist to help the class run smoothly. However, if you have any extenuating circumstances, please contact me about whether and how you can receive an extension. You must be proactive in letting me know so that we can plan together and others are not disrupted.
A note on diversity and respectful conduct. This course welcomes all students of all backgrounds. You should expect and demand to be treated by your classmates and myself respectfully. If any incident challenges this commitment to a supportive, diverse, inclusive, and equitable environment, please let me know so the issue can be addressed.
Disability, Religious, and family accommodations. If you have any questions about disability or religious accommodations, please refer to university policies. Feel free also to contact me for any reason.
Academic integrity. We will follow the University’s rules and responsibilities guide. Also, if you need IRB approval, we can work together to apply for it early!

4. Detailed Schedule

Week #1 — Introduction (Pre-read)

Jan 28

General plan:
- Introductions.
- Go over the plans for the seminar.
- Quick overview of readings.
- Set expectations: things might change based on your feedback.
Project:
- Brainstorming areas of interest.

Jan 30

Readings:
- Rahwan et al. “Machine behaviour.” Nature (2019) (link)
- Wagner, Claudia, et al. “Measuring algorithmically infused societies.” Nature (2021) (link)
Project:
- Brainstorming areas of interest.

Week #2 — Simulations (Pre-read)

Feb 04

Readings:
- Argyle, Lisa P., et al. “Out of one, many: Using language models to simulate human samples.” Political Analysis (2023). (link)
- Messeri, Lisa, and M. J. Crockett. “Artificial intelligence and illusions of understanding in scientific research.” Nature 627.8002 (2024): 49-58. (link)
Project:
- Brainstorming areas of interest.

Feb 06

Readings:
- Hu, Tiancheng, and Nigel Collier. “Quantifying the persona effect in LLM simulations.” ACL 2024. (link)
- Wang, Angelina, Jamie Morgenstern, and John P. Dickerson. “Large language models should not replace human participants because they can misportray and flatten identity groups.” ArXiv preprint 2024. (link)
For fun:
- “I strongly feel that this is an insult to life itself,” Munger (blogpost)
- John Horton’s take on Kevin’s blogpost.
Project:
- Brainstorming areas of interest.

Week #3 — LLMology (Pre-read)

Feb 11

Readings:
- Binz, Marcel, and Eric Schulz. “Using cognitive psychology to understand GPT-3.” Proceedings of the National Academy of Sciences 120.6 (2023): e2218523120. (link)
Project:
- Brainstorming areas of interest.

Feb 13

Readings:
- Santurkar, Shibani, et al. “Whose opinions do language models reflect?.” International Conference on Machine Learning. PMLR, 2023. (link)
Project:
- Brainstorming areas of interest.

Week #4 — Interlude

Feb 18

Guest Lecture: Serina Chang
- Title: “Inferring and simulating human behaviors with machine learning”
- Abstract: “Understanding human behaviors is crucial for high-stakes decision making, such as pandemic response, yet fine-grained behaviors are often difficult to observe (e.g., for privacy reasons or data collection constraints). In this talk, I’ll discuss two approaches to addressing this challenge: (1) inferring behaviors from novel data sources, (2) simulating behaviors with LLMs. In the first part, I’ll discuss our work to infer fine-grained mobility networks from aggregated location data, which enabled us to model the spread of COVID-19 and inform public health decision-making. In the second part, I’ll discuss our recent work on generating social networks with LLMs, showing that, while these models can capture structural characteristics of real-world networks, they substantially overestimate political homophily.”

Feb 20

Project:
- Project proposal: Students present project proposals and receive feedback.

Feb 25

Readings:
- Guess, Andrew M., et al. “How do social media feed algorithms affect attitudes and behavior in an election campaign?.” Science 2023. (link)
- Wagner, Michael W. “Independence by permission.” Science 2023. (link)
Project:
- Review of prior work.

Feb 27

Readings:
- Haroon, Muhammad, et al. “Auditing YouTube’s recommendation system for ideologically congenial, extreme, and problematic recommendations.” PNAS (2024) (link)
- Hosseinmardi, Homa, et al. “Causally estimating the effect of YouTube’s recommender system using counterfactual bots.” PNAS (2024). (link)
Project:
- Review of prior work.

Week #6 — Humans and Machines (Pre-read)

Mar 04

Readings:
- Costello, Thomas H., Gordon Pennycook, and David G. Rand. “Durably reducing conspiracy beliefs through dialogues with AI.” Science 385.6714 (2024). (link)
Project:
- Review of prior work.

Mar 06

Readings:
- Krügel, Sebastian, Andreas Ostermaier, and Matthias Uhl. “ChatGPT’s inconsistent moral advice influences users’ judgment.” Scientific Reports (2023). (link)
Project:
- Review of prior work.

Week #7 — Guest Lectures

Mar 18

Guest Lecture: Mor Naaman
- Title: From Autocomplete to Auto-Everything: The Consequences of AI-Mediated Communication
- Abstract: Talk: From autocomplete and smart replies to video filters and deepfakes, we increasingly live in a world where communication between humans is augmented by artificial intelligence. AI often operates on behalf of a human communicator by recommending, suggesting, modifying, or generating messages to accomplish communication goals. We call this phenomenon AI-Mediated Communication (or AI-MC). While AI-MC has the potential of making human communication more efficient, it impacts other aspects of our communication in ways that are not yet well understood. Over the last six years, my collaborators and I have been documenting the impact of AI-MC on communication outcomes, language use, interpersonal trust, and more. The talk will outline experimental findings from this work. For example, the research shows that AI-MC involvement can impact the evaluation of others; change the extent to which we take ownership over our messages; and shift not only what we write, but even our expressed attitudes. AI-MC may also have a disparate effect on different demographic groups, for example in how different groups are evaluated and suspected of using AI. Overall, AI-MC raises significant practical and ethical concerns as it stands to reshape human communication, calling for new approaches to the development of these technologies.
  Mar 20
Guest Lecture: Michael Bernstein
- Title: Generative Agents: Interactive Simulacra of Human Behavior
- Abstract: Talk: Effective models of human attitudes and behavior can empower applications ranging from immersive environments to social policy simulation. However, traditional simulations have struggled to capture the complexity and contingency of human behavior. The argument is that modern artificial intelligence models allow for a re-examination of this limitation. The case is made through generative agents: computational software agents that simulate human behavior. By enabling generative agents to remember, reflect, and plan, an interactive sandbox town of twenty-five agents inspired by The Sims is populated. Then, by anchoring agents’ memories in qualitative interviews of over 1,000 Americans, it is described how generative agents are able to replicate participants’ responses on the General Social Survey 85% as accurately as participants replicate their own answers. Finally, it is explored how these human behavioral models can help design more effective online social spaces, understand the societal disagreement underlying modern AI models, and better embed societal values into algorithms.

Week #8 — Project Clinic

Mar 25

Project:
- Project clinic: Students present projects and get feedback.

Mar 27

Project:
- Project clinic: Students present projects and get feedback.

Week #9 — Alignment

Apr 01

Project:
- 1-on-1 meetiings: Students get individualized feedback.

Apr 03

Guest Lecture: Sunnie Kim
- Title: Advancing Responsible AI with Human-Centered Evaluation
- Abstract: As AI technologies are increasingly transforming how we live, work, and communicate, AI evaluation must take a human-centered approach to realistically reflect real-world performance and impact. In this talk, I will discuss how to advance human-centered evaluation, and subsequently, responsible development of AI, by integrating knowledge and methods from AI and HCI. First, using explainable AI as an example, I will highlight the challenges and necessity of human (as opposed to automatic) evaluation. Second, I will illustrate the importance of contextualized evaluation with real users, revisiting key assumptions in explainable AI research. Finally, I will present empirical insights into human-AI interaction, demonstrating how users perceive and act upon common AI behaviors (e.g., LLMs providing explanations and sources). I will conclude by discussing the implications of these findings and future directions for responsible AI development.

Week #10 — Guest Lectures

Apr 08

Guest Lecture: Dylan Thurgood
- Title: Assessing the role of emotions in the persuasive capabilities of GPT-4: An analysis of articles about climate change
- Abstract: A growing body of empirical evidence suggests that large language models (LLMs) can generate persuasive political content that rivals or even outperforms the persuasiveness of human-generated messages. While it is well-established in the communication literatures that emotions play a significant role in the persuasiveness of a message, how well LLMs can reproduce emotion frames in textual content and how their persuasiveness depends on the emotions they convey has received little attention. In this talk, I will discuss experimental work on AI persuasion in a context where emotions have been identified as a key drivers of mobilisation: climate change. Comparing the persuasiveness of AI-written articles about a topic related to climate change with a human-generated benchmark reveals that while GPT-4 can reproduce the emotions conveyed in the human-generated articles and enhance persuasiveness, the role of emotions appears more ambiguous than previous literature suggests.

Apr 10

Guest Lecture: Christopher Barrie
- Title: Replication for Language Models
- Abstract: Excitement about Large Language Models (LMs) abounds. These tools require minimal researcher input and yet make it possible to annotate and generate large quantities of data. While LMs are promising, there has been almost no systematic research into the reproducibility of research using them. This is a potential problem for scientific integrity. We give a theoretical framework for replication in the discipline and show that much LM work is wanting. We demonstrate the problem empirically using a rolling iterated replication design in which we compare crowdsourcing and LMs on multiple repeated tasks, over many months. We find that LMs can be (very) accurate, but the observed variance in performance is often unacceptably high. In many cases the LM findings cannot be re-run, let alone replicated. This affects “downstream” results. We conclude with recommendations for best practice, including the use of locally versioned ‘open source’ LMs.

Week #11 — Guest Lectures

Apr 15

Guest Lecture: Paul Röttger
- Title: Measuring Political Bias in Large Language Models
- Abstract: Large language models (LLMs) are helping millions of users to learn and write about a diversity of issues. In doing so, LLMs may expose users to new ideas and perspectives, or reinforce existing knowledge and user opinions. This creates concerns about political bias in LLMs, and how these biases might influence LLM users and society. In his talk, Röttger will first discuss why measuring political biases in LLMs is difficult, and why most evidence so far should be approached with skepticism. Using the Political Compass Test as a case study, he will demonstrate critical issues of robustness and ecological validity when applying such tests to LLMs. Second, he will present his team’s approach to building a more meaningful evaluation dataset called IssueBench, to measure biases in how LLMs write about political issues. He will describe the steps they took to make IssueBench realistic and robust. Then, he will outline their results from testing state-of-the-art LLMs with IssueBench, including clear evidence for issue bias, striking similarities in biases across models, and strong alignment with Democrat over Republican voter positions on a subset of issues.

Apr 17

Guest Lecture: Micah Caroll
- Title: What We Want Changes! Problems with Optimizing Feedback from Influenceable Humans
- Abstract: Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. I’ll first discuss a formalism designed to account for preference changes (DR-MDPs). Through the lens of DR-MDPs, one can better analyze the consequences of optimization objectives used under standard static-preference assumptions: in particular, how they can lead to undesirable AI behavior aimed at influencing humans (in settings such as LLM chatbots or recommender systems). Finally, I’ll discuss the core challenges in the way of fully resolving or avoiding issues of preference change – and why this problem is here to stay.

Week #12 - Final Presentations

Apr 22

Project:
- Final presentations: students present their project and submit their report.

Apr 24

Project:
- Final presentations: students present their project and submit their report.

5. Acknowledgments

I used the following seminar courses as references to help structure this:

Manoel Horta Ribeiro