Gil Elbaz, Datagen’s co-founder and CTO, sat down with Anthony Goldbloom, the co-founder and CEO of Kaggle. Kaggle hosts machine learning competitions, where data scientists download data and upload solutions to difficult problems. Before Kaggle, Anthony worked as an econometrician at the Reserve Bank of Australia, and before that at the Australian Treasury. He holds a first-class honours degree in Econometrics from the University of Melbourne.
This interview has been edited for length and clarity.
Q: There’s so much amazing talent on Kaggle. What was your original vision for Kaggle? Did you have something else in mind?
Anthony Goldbloom: I was a journalism intern at the Economist magazine, and I wrote an article about predictive analytics, and machine learning applied to business problems. I was interviewing companies, and was thinking, I would love to work on some of the problems I was interviewing people about. The idea behind Kaggle was to give companies access to people like me, and people like me access to the interesting problems that companies post.
And I think it’s largely fulfilled that goal. It’s interesting and exciting to try a problem, get to a certain level of performance, then see what the winners did that you didn’t. It’s a unique learning opportunity. With each challenge you participate in, your performance gets better and better. I think Kaggle has largely fulfilled what I originally hoped it would do.
Q: How has Kaggle evolved over the years from where it started, and what is the future of Kaggle?
Anthony Goldbloom: One of the biggest changes we’ve made to Kaggle over the years has been the introduction of a hosted notebook product. We introduced it because we noticed that when people were competing in competitions, they were sharing code. The hosted notebook gives people in our community a place to share that code, and this was a huge change. Over time, it has evolved into a really beautiful hosted notebook environment: very stable, very powerful, a very nice environment.
The second big part is what we call our public data platform. We allowed anybody in our community to share any dataset with each other without a challenge. The public data platform has been really powerful, and allows our users to actually create their own competitions.
We look at ourselves as really trying to provide learning by doing.
Q: How do you see Kaggle evolving with the need for very large compute in order to train, let’s say generative methods or various reinforcement learning methods? And the need for access to data where we’re only scratching the surface of what’s possible?
Anthony Goldbloom: One of the very exciting things happening with massive datasets is in solving pragmatic, real-world problems.
We consider Kaggle not the place where new machine learning gets invented, but the place where it gets sifted and sorted. We figure out what’s real and what isn’t on real, pragmatic problems. And so I think, in most cases, being able to fine-tune an existing model, and not having to spend money on training from scratch, ends up being the dominant strategy.
Q: How would you describe the community as a whole and its ability to solve problems together?
Anthony Goldbloom: It’s open to anybody, right? If you have an internet connection, Kaggle is accessible to you as a learning opportunity and as a way to get credentialed. The fact that everybody is on the same playing field is another really nice feature of this community.
Q: Do you see Kaggle as a community that can be put together for positive impacts?
Anthony Goldbloom: We work with the radiology industry in North America on challenges ranging from using chest X-rays to diagnose COVID, to using CT scans to diagnose lung cancer, to a wide range of other medical challenges.
Raising awareness of public-good challenges is definitely an area where Kaggle has done good work in the past, and I expect us to continue.
Q: What would you recommend to new people starting out in the machine learning space or the computer vision space?
Anthony Goldbloom: My answer is probably somewhat predictable, but get on Kaggle. I’m very much a learning-by-doing type. I think it’s important to learn some basic Python as a starting point. Kaggle has some really nice courses where we try to teach you the basics of Python and the basics of supervised machine learning. They’re not supposed to be a really rigorous grounding in any of these topics, but they’re supposed to teach you just enough that you can start rolling up your sleeves and playing by yourself.
And challenges are a really good way to learn. You probably don’t want to spend more than half an hour or an hour a day on a challenge. And maybe one idea didn’t work, but then something you think of later makes an improvement. It’s a very nice way in my view to learn.