I’m a researcher at the Allen Institute for AI on the Semantic Scholar Research team, where I work on NLP for scientific literature. Before that, I spent a couple of years working as a data scientist in Seattle, and a year as a researcher in the Applied Probability group at Academia Sinica in Taiwan. I graduated in 2015 with an MS in Statistics from the University of Washington.

# My research interests

It’s important yet tough for scientists to keep up with the rapid pace of publication. It’d be great if NLP models could improve access to & understanding of the valuable knowledge contained in academic literature. Yet, NLP models that work well on news or Wikipedia articles often perform poorly when applied to scientific text. What makes scientific text challenging? Why do existing models do so poorly on it? How can we overcome these limitations?

### Adapting language models for science

One of the best ways to improve performance across many scientific NLP tasks is to adapt large language models to the scientific domain:

### Scientific NLP tasks & datasets

It’s hard to make progress without challenging tasks & datasets for evaluating our models:

### Resources for scientific NLP research

Scientific papers can be difficult to access (paywalls, copyright 😤). We need large, machine-readable, open-access corpora to support scientific NLP research:

### Helping researchers do research

• What does \gamma mean again? Hate flipping back to page 2 to find the definition? Our ScholarPhi tool provides just-in-time definitions of terms & math symbols right on the PDF (arXiv preprint)

• LIME gives you post-hoc explanations of arbitrary model predictions. But what if a user says “Show me more/less for that explanation”? Tuning a linear model for this is easy, but for neural models, our solution is LIMEADE (arXiv preprint)

• Prototype recommender system for arXiv papers (demo). Now adopted into production on Semantic Scholar (link)

### Science of science

I’m interested in (and concerned about) bias in scientific research. How can NLP help us identify & quantify these biases?

# Professional organizations

It’d be great if more researchers in the NLP & text mining communities worked on scientific text. To promote this, I’ve co-organized workshops & shared tasks: