ACM-W NA Profiles: Lillian Petersen

Lillian Petersen is a young researcher and winner of the 2019-20 ACM/CSTA Cutler Bell Prize for creating a tool to help aid organizations increase food security in sub-Saharan Africa. She has done a research project every year since 7th grade, on topics ranging from predicting weather patterns to studying cancer.

What was the “spark” that made you know you wanted to be involved in computing?

I started learning how to program in 5th grade. I think this was mostly because I was bored in school and I was looking for other things to do. So for Christmas, I got the book Hello World!: Computer Programming for Kids and Other Beginners. I worked through that book and learned from it how to do data analysis and how to work with Python. I found it extremely fun and interesting to learn loops and if statements. And then I decided to apply those skills to some actual data. 

I was a big skier and I had heard that the amount of snow we got in my town depended on the El Niño Index. I downloaded two very small data sets, just the snowfall in Los Alamos, which is my hometown, and the El Niño index. I ran correlations and made best fit lines. I was actually able to get about a four-month lead time on predicting the snowfall in my hometown. That was really exciting. That’s what made me realize the power of big data and computing to answer real-world questions that people are really interested in, and that could make a difference–even though I was only in middle school.
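As a rough illustration of what "running correlations and making best fit lines" with a lead time can look like, here is a minimal sketch in Python. It is not Lillian's actual code: the function name `lagged_fit`, the synthetic data, and the built-in four-month lag are all assumptions made for the example.

```python
import numpy as np

def lagged_fit(index, target, lag):
    """Correlate index[t] with target[t + lag] and fit a straight line.

    Returns (r, slope, intercept), where r is the Pearson correlation.
    """
    index = np.asarray(index, dtype=float)
    target = np.asarray(target, dtype=float)
    if lag < 0 or lag >= len(index):
        raise ValueError("lag must be between 0 and len(index) - 1")
    x = index[:len(index) - lag]   # predictor, observed `lag` months earlier
    y = target[lag:]               # quantity we want to predict
    r = np.corrcoef(x, y)[0, 1]
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit = best fit line
    return r, slope, intercept

# Synthetic (made-up) monthly data: snowfall follows the index 4 months later.
rng = np.random.default_rng(0)
months = 240
enso = rng.normal(size=months)
snow = np.full(months, 10.0)
snow[4:] = 10.0 + 3.0 * enso[:-4] + rng.normal(scale=0.5, size=months - 4)

r4, slope4, intercept4 = lagged_fit(enso, snow, lag=4)
print(f"lag 4 months: r = {r4:.2f}, slope = {slope4:.2f}")
```

Trying several values of `lag` and comparing the correlations is one simple way to see how far ahead an index carries useful information.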

What are you most excited about right now?

I’ve switched fields a lot since I started. I’ve worked with everything from weather data, to climate change, to food shortages, but then I took a biology class my junior year of high school. I became fascinated with molecular biology and the vast potential for lifesaving drugs and therapies.

The summer after my junior year, I switched research topics. I got an internship at the Salk Institute for Biological Studies in San Diego. I’m working on a project to better understand the mechanisms of gene regulation and using that to study leukemia. If everything goes really well, this could be the first step towards a new drug in the future to treat leukemia and other cancers.

After starting with the El Niño snowfall predictions, what inspired you to work on bigger problems in the world?

My dad is an ocean modeler so I’ve grown up hearing about climate change like every day. It was something that was important to me to raise awareness about. When I was in 7th grade and did my project with El Niño, I was working with weather station data. And I was wondering, are [these places] getting warmer? Are they getting drier or wetter? Can we see it in the daily data or do we need these big models?

The next year, in 8th grade, I downloaded weather station data for every weather station in the world. At first it was more a thing for myself because I was curious: can I look at the daily temperature and see if it was warming? The answer was a resounding yes. Then I realized I could use this project to raise awareness. 

I created an interactive map that’s available to the public where anyone can zoom in on their hometown or anywhere around the world and see how the mean and extreme temperatures at that station are changing. That was the first project I did that I think had a larger impact beyond me predicting snowfall just for fun. It made me realize my ability to raise awareness and effect large-scale change.

What was the most interesting or most difficult part of your research into food instability?

I need to start that question by giving background. Current early warning systems that predict food shortages are run on extensive budgets and are not easily scalable.

That’s because predicting crop yields in Africa is extremely difficult: the landscape is highly heterogeneous, with very small farm fields–much smaller than one satellite pixel–and many different crops, climates and growing seasons across the continent. In the US or Europe, you have very large farm fields and only a handful of crops, maybe three or four. Then there’s so much data [in the US] on where everything is grown exactly, how much of it is grown and the yield for every single crop field. You can train your model to be very specific and predict crop yields very well in the US.

When you try to apply these very specialized models to Africa, they all break down, because the farm fields are so small, there are so many different crops and the climates aren’t the same. I had to create a model that was much more versatile. I didn’t use any crop masks and I didn’t use any sub-national ground truth data, so that gets rid of a lot of problems that make it hard to work in Africa. My crop yield prediction uses satellite imagery and can be applied anywhere in the world to predict the yield of any crop type.

What is a crop mask?

Imagine you have satellite imagery of all the ground and you only want to use the ground that has crops, so you use a crop mask. The problem is the crop masks in Africa are not very good. Basically, you don’t know which land is growing crops and which land is not. Data for that is very hard to come by. What I did instead was use all the satellite imagery over a certain area and take the monthly anomaly.

This will mostly capture the signal of the crops because if you have urban land, the anomaly will be zero because it will be the same all the time. If you have forested land, the anomaly will also likely be zero because it will be forested all the time. You’ll end up getting mostly just the signal from the crop fields without having to work with complicated methods. 
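The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model described in the interview: the function name `monthly_anomaly`, the array layout (years, months, pixels), and the made-up "greenness" values for the urban, forest and crop pixels are all hypothetical.

```python
import numpy as np

def monthly_anomaly(values):
    """Subtract each pixel's long-term mean for each calendar month.

    values has shape (n_years, 12, n_pixels).
    """
    values = np.asarray(values, dtype=float)
    climatology = values.mean(axis=0, keepdims=True)  # mean per month, per pixel
    return values - climatology

n_years = 5
seasonal = 0.3 + 0.2 * np.sin(2 * np.pi * np.arange(12) / 12)

urban = np.full((n_years, 12), 0.1)              # same value all the time
forest = np.tile(0.5 + seasonal, (n_years, 1))   # same seasonal cycle every year
good_or_bad_year = np.array([1.0, 0.6, 1.2, 0.8, 1.1])
crop = good_or_bad_year[:, None] * seasonal[None, :]  # varies from year to year

pixels = np.stack([urban, forest, crop], axis=-1)  # shape (5, 12, 3)
anom = monthly_anomaly(pixels)

# Averaging over the whole area, with no crop mask applied.
area_mean = anom.mean(axis=-1)
print(np.abs(anom).max(axis=(0, 1)))  # largest anomaly per pixel type
```

In this toy setup the urban and forest pixels contribute zero anomaly, so the unmasked area average is just a scaled copy of the crop pixel's signal. Real land cover is noisier, which is why the interview says the anomaly will "likely" be zero for forest.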

What is your research and development process like?

I start doing research over the summer. That’s really the time when you can get a lot done without having to worry about school work. I spend the first few weeks of the summer brainstorming a topic I can work on that’s doable–that’s one of the first criteria. You can’t reach for some pie in the sky idea that would take 20 researchers three years to do. You have to be able to do it as a high school student. So I look for something that’s doable, but also high impact and something that I’m passionate about.

After deciding on the project I’m going to work on, I read a lot of papers in the literature. I look at their methods and results and I figure out the method I’m going to use. Then I download data, write programs, and try to find correlations, linear regressions or whatever else I’m using.

What do you hope to do from here?

Next year I’m going to Harvard. Right now I plan to major in applied math–applied to molecular biology–with a minor in computer science, but that could still change. I definitely want to continue doing research. I’m interested in molecular biology, especially genetics. Those fields, together with math and computer science, have the largest potential for impact out of every field I know of.

It’s the next wave of the future. We’re going to have drugs and therapies that extend human life and improve living conditions. All these things are just within our reach and I want to be on the leading edge with that.

What advice would you give to students who want to make a difference?

My biggest piece of advice would be to learn computer programming. Computer programming, in my experience, is the best way to make a large impact and do a real research project without having to have a PhD or a ton of resources. There’s so much data [online] that you couldn’t even find all of it. 

If you know computer programming, you can research any topic you’re interested in by downloading data and running correlations or linear regressions, or, if you want something more complicated, machine learning algorithms. You can make a difference as a high schooler or a young researcher.

To learn more about Lillian and her work, visit her website: lillianpetersen.github.io