research
* * * ml interpretability * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
emergent world representations
Do complex language models merely memorize surface statistics, or do they develop internal representations of the underlying processes that generate their sequences? We explored this question using a synthetic board game task (Othello), uncovering nonlinear internal representations of the board state. By intervening on the model's layer activations during its computation, we show that these representations are causal. Finally, we leverage these techniques to create latent saliency maps that help explain what influenced the model's output. Read more in our paper here.
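For a flavor of the technique, here is a minimal sketch in the spirit of the paper (not its actual code): an MLP probe decodes board state from a layer's activations, and an intervention nudges an activation toward a counterfactual board by gradient descent against the probe. The dimensions and names are illustrative placeholders.

```python
import torch
import torch.nn as nn

# A nonlinear (MLP) probe that decodes board state from a layer's
# activations. `hidden_dim` and the 64-square, 3-state Othello
# encoding (empty / mine / theirs) are illustrative placeholders.
class BoardProbe(nn.Module):
    def __init__(self, hidden_dim=512, n_squares=64, n_states=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_squares * n_states),
        )
        self.n_squares, self.n_states = n_squares, n_states

    def forward(self, acts):  # acts: (batch, hidden_dim)
        return self.net(acts).view(-1, self.n_squares, self.n_states)

# Intervention sketch: optimize an activation so the probe reads out an
# edited (counterfactual) board, then resume the forward pass from it.
def intervene(probe, activation, target_board, steps=100, lr=0.1):
    x = activation.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(
            probe(x).view(-1, probe.n_states),  # (batch*64, 3) logits
            target_board.view(-1),              # (batch*64,) square states
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```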
* * * responsible ml * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
designing data for ml
The ML pipeline includes data collection and iteration. But what data should you collect, how should you collect it, and how do you evaluate what a model has learned prior to deployment? Read more in our paper here. In submission to CHI 2023.
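As a hedged illustration of one such pre-deployment check, the sketch below evaluates a trained model per data slice rather than on aggregate accuracy alone; the data frame, its group column, and the model are hypothetical stand-ins, not artifacts from the paper.

```python
from sklearn.metrics import accuracy_score

# Evaluate a trained model on each slice of a pandas DataFrame `df`
# (split by a hypothetical `group_col`) instead of reporting a single
# aggregate number. Gaps between slices flag what the model has and
# has not actually learned before deployment.
def evaluate_by_slice(model, df, feature_cols, label_col, group_col):
    results = {}
    for group, slice_df in df.groupby(group_col):
        preds = model.predict(slice_df[feature_cols])
        results[group] = accuracy_score(slice_df[label_col], preds)
    return results  # e.g. {"urban": 0.91, "rural": 0.78} reveals a gap
```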
ml practices outside big tech
Support for the democratization of machine learning is growing rapidly, but responsible ML development outside Big Tech is poorly understood. As more organizations turn to ML, what challenges do they face in creating fair and ethical ML?
We explore these challenges, highlighting future research directions for the ML community, in our AIES 2021 paper!
disparities in ml responsibilities
Deploying an ML model often involves many stakeholders and disparate entities. So who is responsible for what in this fragmented, often opaque supply chain?
* * * uncertainty in ml * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
socializing data
Labeled datasets are historically treated as authoritative sources of ground truth. But how is that ground truth determined, and how can we build historical contexts for these systems? This project focuses on collaborative sensemaking and label provenance. Soon to be submitted!
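To make the idea of label provenance concrete, here is an illustrative sketch of a record that keeps every annotation, its annotator, and its context instead of a single authoritative label; the schema is an assumption for exposition, not the project's actual format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical provenance schema: store how a label was determined,
# not just what it is, so later users can reconstruct its history.
@dataclass
class Annotation:
    annotator_id: str
    label: str
    guidelines_version: str  # which instructions the annotator saw
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

@dataclass
class LabeledExample:
    example_id: str
    annotations: list[Annotation] = field(default_factory=list)

    def agreement(self) -> float:
        """Fraction of annotators agreeing with the majority label."""
        if not self.annotations:
            return 0.0
        labels = [a.label for a in self.annotations]
        top = max(set(labels), key=labels.count)
        return labels.count(top) / len(labels)
```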
uncertainty in ml systems
ML systems are the product of complex sociotechnical processes, each of which introduces distinct forms of uncertainty into the final output. Communicating this uncertainty is critical for building appropriate trust, but it is often reduced to simple, cumulative encodings that can obscure uncertainty's underlying complexity. Our work explores which uncertainty measures should be presented to different stakeholders, and how. Learn more here!
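As one concrete example of what "more than a single cumulative number" can mean, the sketch below uses a standard deep-ensemble-style decomposition to separate data (aleatoric) from model (epistemic) uncertainty; the model list and its `predict_proba` interface are hypothetical, and this is not necessarily the presentation our work proposes.

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy along the class axis.
    return -(p * np.log(p + eps)).sum(axis=-1)

def decomposed_uncertainty(models, x):
    # Each (hypothetical) model returns per-class probabilities for batch x.
    probs = np.stack([m.predict_proba(x) for m in models])  # (M, N, C)
    mean_probs = probs.mean(axis=0)                         # (N, C)
    total = entropy(mean_probs)              # total predictive uncertainty
    aleatoric = entropy(probs).mean(axis=0)  # average per-model entropy
    epistemic = total - aleatoric            # disagreement between models
    return {"total": total, "aleatoric": aleatoric, "epistemic": epistemic}
```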
* * * visualizations * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
misleading visualizations
Misinformation comes in many forms, including charts and graphs. So we built a spell-check equivalent for visualizations! We hope that by pointing out ineffective choices in visualizations, we can promote best practices in design and increase data literacy. Just as importantly, we can encourage accuracy and critique in public domains.
So what is the red wavy line analogue for graphs? Read more in our EuroVis 2020 paper here!
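To make the spell-check analogy concrete, here is a toy sketch of one linter rule, flagging a truncated y-axis on a bar chart; the chart-spec format is invented for illustration and is not the checker from the paper.

```python
# One rule from a hypothetical chart linter: bar lengths are only
# honest when the axis starts at zero, so a truncated y-axis on a
# bar chart gets the "red wavy underline" treatment.
def lint_truncated_axis(chart):
    warnings = []
    if chart.get("mark") == "bar":
        y_min = chart.get("y_axis", {}).get("min", 0)
        if y_min > 0:
            warnings.append(
                f"Bar chart y-axis starts at {y_min}, not 0: "
                "differences between bars will look exaggerated."
            )
    return warnings

# Example: this spec would be flagged.
chart = {"mark": "bar", "y_axis": {"min": 50, "max": 100}}
print(lint_truncated_axis(chart))
```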
communicating air quality
Air quality, like many environmental and health concerns, is important to communicate to the public. But how do you effectively communicate this information to lay readers, particularly in the context of uncertainty and statistical model outputs? We discuss challenges here, and design considerations here. See the outcome and explore air quality here.
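For a concrete flavor of one design consideration, the sketch below maps a model's estimate and its uncertainty into a lay-readable category plus a hedge, rather than a bare point estimate; the breakpoints follow the US EPA AQI categories, but the interval logic is illustrative, not the deployed system's.

```python
# US EPA AQI category breakpoints (upper bound of each band).
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]

def category(aqi: float) -> str:
    for upper, name in AQI_CATEGORIES:
        if aqi <= upper:
            return name
    return "Hazardous"

def describe(estimate: float, stderr: float) -> str:
    # A rough ~95% interval around the model's estimate.
    lo, hi = estimate - 2 * stderr, estimate + 2 * stderr
    low_cat, high_cat = category(lo), category(hi)
    if low_cat == high_cat:
        return f"Air quality is {low_cat}."
    # The interval spans categories, so surface the uncertainty plainly.
    return (f"Air quality is likely {category(estimate)}, "
            f"but could range from {low_cat} to {high_cat}.")

print(describe(estimate=95, stderr=10))
# -> "Air quality is likely Moderate, but could range from Moderate
#     to Unhealthy for Sensitive Groups."
```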