Hi! I'm Aspen.

I'm a PhD student at CSAIL in the Madry Lab and the Center for Deployable Machine Learning. Check out my random ramblings here.

research

* * * responsible ml * * *

ai supply chains

The AI systems you interact with are typically the product of many AI components glued together. These AI supply chains challenge many of the basic expectations we have about ML development and deployment, including for fairness and explainability.
w/ Sarah Cen, Andrew Ilyas, Isabella Struckman, Aleksander Madry, & Luis Videgaray

on ai deployment

Our series, On AI Deployment, discusses the economic and regulatory implications of AI supply chains.

open foundation models

What are the benefits of open models? What are the risks? Led by Sayash Kapoor and Rishi Bommasani, this work collects the thoughts of 25 authors to start answering these questions.
w/ Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Kevin Bankston, Stella Biderman, Miranda Bogen, Rumman Chowdhury, Alex Engler, Peter Henderson, Yacine Jernite, Seth Lazar, Stefano Maffulli, Alondra Nelson, Joelle Pineau, Aviya Skowron, Dawn Song, Victor Storchan, Daniel Zhang, Daniel E. Ho, Percy Liang & Arvind Narayanan

sampling with llms

People have begun using large language models (LLMs) to induce sample distributions (e.g., for generating synthetic training data), but there are no guarantees about the distributions they produce. We evaluate LLMs as distribution samplers across multiple modalities, finding that they struggle to produce reasonable distributions.
w/ Alex Renda & Michael Carbin
[1] paper [2] git
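
For a concrete feel of what this evaluation involves, here's a minimal sketch (not our actual benchmark; sample_llm stands in for a real model call): prompt the model for one draw at a time, then compare the empirical distribution against the requested one, e.g. via total variation distance.

    import random
    from collections import Counter

    def total_variation(p, q, support):
        # Total variation distance between two distributions on a finite support.
        return 0.5 * sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support)

    def sample_llm(prompt):
        # Stand-in for a real model call that returns one integer sample;
        # here, a deliberately biased sampler for illustration.
        return random.choices(range(1, 11), weights=[3, 1, 1, 1, 1, 1, 1, 1, 1, 3])[0]

    target = {x: 0.1 for x in range(1, 11)}  # the requested uniform over 1..10
    n = 1000
    draws = (sample_llm("Pick a number from 1 to 10 uniformly at random.") for _ in range(n))
    empirical = {x: c / n for x, c in Counter(draws).items()}
    print(f"TV distance from uniform: {total_variation(empirical, target, range(1, 11)):.3f}")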

designing data for ml

The ML pipeline includes data collection and iteration. But what data should you collect, how should you collect it, and how do you evaluate what a model has learned prior to deployment?
w/Fred Hohman, Luca Zappella, Xavier Suau Cuadros, & Dominik Moritz
[1] paper [2] poster

ml practices outside big tech

Support for the democratization of machine learning is growing rapidly, but responsible ML development outside Big Tech is poorly understood. As more organizations turn to ML, what challenges do they face in creating fair and ethical ML?

We explore these challenges, highlighting future research directions for the ML community in our AIES spotlight paper.
w/ Serena Booth

* * * ml interpretability * * *

emergent world representations

Do complex language models memorize surface statistics, or do they develop internal representations of the underlying processes generating their sequences? We explored this question using a synthetic board game task (Othello), uncovering nonlinear internal representations of the board state. By intervening on the model's layer activations during its computation, we show that these representations are causal. Finally, we leverage these techniques to create latent saliency maps that explain what influenced the model's output.
w/Kenneth Li, David Bau, Fernanda Viegas, Hanspeter Pfister, & Martin Wattenberg
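
For a flavor of the probing methodology, here's a minimal sketch (random stand-in data, not the released code) that trains a small nonlinear probe to read one board square's state out of cached layer activations--the kind of probe where MLPs succeed while linear probes fall short.

    import torch
    import torch.nn as nn

    # Stand-ins for real cached activations and board-state labels
    # (0 = empty, 1 = mine, 2 = theirs, for one board square).
    n, d_model = 4096, 512
    acts = torch.randn(n, d_model)
    board = torch.randint(0, 3, (n,))

    # A nonlinear (one-hidden-layer) probe over the activations.
    probe = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, 3))
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(100):
        opt.zero_grad()
        loss_fn(probe(acts), board).backward()
        opt.step()

    acc = (probe(acts).argmax(-1) == board).float().mean()
    # On real activations, high probe accuracy means the board state is decodable.
    print(f"probe accuracy: {acc:.2f}")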

* * * uncertainty in ml * * *

uncertainty in ml systems

ML systems are the product of complex sociotechnical processes, each of which introduces distinct forms of uncertainty into the final output. Communicating this uncertainty is critical for building appropriate trust, but it is often achieved through simple, cumulative encodings that can obfuscate uncertainty's underlying complexity. Our work explores which uncertainty measures should be presented to different stakeholders, and how.
w/Harini Suresh
[1] video [2] paper

socializing data

Labeled datasets are historically treated as authoritative sources of ground truth. But how is that ground truth determined, and how can we build historical contexts for these systems? This project focuses on collaborative sensemaking and label provenance.
[1] paper

* * * visualizations * * *

misleading visualizations

Misinformation comes in many forms--including in charts and graphs. So we built a spell-check equivalent for visualizations! We hope that by flagging ineffective design choices, we can encourage best practices in visualization design and increase data literacy. Just as importantly, we can promote accuracy and critique in public domains.

So what is the red wavy line analogue for graphs?
w/Michael Correll & Arvind Satyanarayan
[1] paper [2] video
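
As a toy illustration of the spell-check idea, one lint rule might look like the sketch below (the dict-based chart spec is made up for illustration, not our system's or any particular library's format):

    def lint_truncated_axis(spec):
        # One rule: bar charts whose quantitative y-axis doesn't start at zero
        # exaggerate differences. `spec` is a hypothetical dict-based chart spec.
        warnings = []
        if spec.get("mark") == "bar":
            lo, _ = spec.get("y_domain", (0, None))
            if lo is not None and lo > 0:
                warnings.append(f"y-axis starts at {lo}, not 0: bar lengths mislead")
        return warnings

    print(lint_truncated_axis({"mark": "bar", "y_domain": (40, 100)}))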

visualizations for the public

Air quality, like many environmental and health considerations, is important to communicate to the public.

But how do you effectively communicate important information to lay readers, particularly in the context of uncertainty and statistical model outputs?
w/Pascal Goffin & Miriah Meyer

teaching

data crafting

While the value of play is scientifically grounded, its benefits in the context of data are underexplored. Using crafting materials and techniques, our workshop encouraged novices to explore data and 'made new' mundane data practices for experienced practitioners.
[1] blog 1 [2] blog 2 [3] paper [4] Please reach out for slides.

wanna chat?

Aspen

{

  • twitter
  • medium
  • cv