On winning DataFest with the Data Minions
Mount Holyoke College junior Sophie Kahn ’27 and her team, the Data Minions, won this year’s ASA Five College DataFest. How did they do it? By not writing a line of code until the second night, she writes.
Somewhere around hour 14, exhausted and overcaffeinated, I started to think we might actually have something.
My team, the Data Minions — three juniors with majors spanning mathematics, data science and statistics — walked away from this year’s ASA Five College DataFest with Best in Group and Best in Show awards. While I’m incredibly proud of what we pulled off, the process was humbling in ways I hadn’t fully expected — even going into it for the second time.
If you haven’t heard of DataFest, here’s the quick version: Teams of students receive a massive, real-world dataset and have a weekend to analyze it and present findings to a panel of judges. And when I say massive, I mean it. Nothing in your coursework prepares you for the size and messiness of a DataFest dataset. The datasets you use in class are clean, manageable and structured for learning. This one was not. There were multiple datasets to cross-reference, documentation to read through and a whole lot of head-scratching before any of it started making sense. Ours was healthcare data: sprawling and full of the kind of complexity you don’t encounter until you’re working with something real.
The first night was the hardest part of the entire weekend. We didn’t write a single line of code. We just sat with the data — reading through documentation, figuring out how the pieces connected and asking ourselves what we actually wanted to know. That might sound frustrating, but I think it was the most important thing we did.
Here’s what makes that restraint so difficult: Teams have only 19 hours to complete the entire project. From the moment the dataset drops to the moment you present, the clock is running. That kind of pressure makes you want to move, to produce something, to feel like you’re making progress. Sitting still and just thinking can feel almost reckless when you’re watching the hours disappear. But that urgency is exactly what sends teams off course: They chase too many ideas at once, build visualizations that don’t connect and arrive at the presentation with a lot of work and not much to say.
At my first DataFest, I made the classic mistake of jumping straight into graphs. You get excited and start pulling things apart, and suddenly your project is going in six different directions at once. It looks busy but doesn’t really say anything.
This time, I was determined to do things differently. We spent the entire first night and most of the second day working on one thing: finding a research question we were genuinely curious about. Eventually, we landed on one that felt urgent and grounded: We wanted to simulate the impact of Medicaid expansion on emergency room overcrowding and tie it to a real bill currently being debated in the Kansas Legislature. To do that, we had to build our own estimates of what Medicaid coverage would actually look like under expansion, since that data didn’t exist yet. That process of going from a massive, unwieldy healthcare dataset to a precise, policy-relevant question is what I’m most proud of.
The coding, once we got there, was honestly the easy part. When you know exactly what you’re trying to find out, the technical work follows naturally. The hard part is the unglamorous process of figuring out what you’re even trying to say.
The other thing that made a real difference was how our team worked together. I’ve seen groups at these competitions fracture. One person takes over, or the team quietly splits into two separate projects running in parallel. That’s a recipe for a disjointed presentation. The three of us stayed in it together the whole time. Every decision was a conversation. When one of us got stuck, the other two were right there. I think the judges could feel that in our presentation — there was a coherence to it and a sense that we were all telling the same story.
Would I do it again? Absolutely. And I wouldn’t change a thing.
If you’re thinking about competing in DataFest, my only advice is this: Resist the urge to start making things before you know what you’re making. The dataset will feel overwhelming, and the time pressure will feel even more so. But the teams that do well are the ones who ask the sharpest question. Producing a lot of work means nothing if the underlying question is fuzzy. Give yourself permission to spend time just thinking. That patience is what wins competitions.