Decoding Site Reliability Engineering
DraftKings is transforming a global industry with our Engineering team. We’re excited to spotlight our talented engineers through our Q&A series. Today we are proud to feature Todd, Director of Site Reliability, who’s been with DraftKings for over two years. Get to know Todd’s story below.
Tell us a bit about yourself and how you found yourself in Site Reliability Engineering?
I grew up always taking things apart and putting them back together to figure out how they worked. I never really grew out of that, and I have tended to gravitate towards careers that indulge the behavior. I found myself in Site Reliability Engineering (SRE) through roles in systems engineering, release engineering, production support, and software development. Each of those other roles has prepared me for my current role on the SRE team, and they have all built on each other in many ways. The throughline is a curiosity about how things work and a willingness to inspect and investigate.
How did you end up at DraftKings? What drew you to the company?
I had first heard of DraftKings through ads on a fantasy football radio channel I listened to on my daily commute. Fast forward a few years, and I am working for a company that, among other things, has a fantasy football platform. I was in attendance at Datadog’s Dash conference with my team, where I saw Travis Dunn speak about engineering at DraftKings. I was really impressed by what I heard. After the talk, I quickly checked on my phone to see whether any openings might fit my experience. There were! I applied that night when I got home, and here I am. I was drawn to the engineering org culture, the problems to solve, and the industry that DraftKings helped create. I was attracted to working in an environment where things were evolving and to helping solve the challenges that came with that. A few years in, things are still evolving, and there are always new challenges to solve.
Can you describe what Site Reliability Engineering is?
I can try. I get asked this a lot. Site Reliability Engineering is applying software engineering principles to infrastructure and operations. That is fairly textbook, but it applies to the role here at DraftKings. We strive to automate and codify as much of the infrastructure as possible. We then enable engineering teams to use that automation to own their infrastructure safely. There’s some more to it, but at a high level, that’s the role.
What is your role on the SRE team? How does your team contribute to DraftKings’ mission/goals?
I am a Director on the SRE team. Our team is responsible for all the cloud and on-prem infrastructure for our DFS, Sportsbook, and Casino products. We’re a highly collaborative team. We partner closely with engineering, security, architecture, and other teams to ensure DraftKings’ services stay available and performant. During big events like the start of the NFL season or the Super Bowl, we’re online proactively monitoring the critical infrastructure and fixing problems before they become apparent to the fans using our products.
What does a typical day look like for you?
I typically log on at or a little before 9:00 am and immediately jump into my Slack channels. I try to get caught up on anything that happened overnight and any pages my team received before I get my day started. I will usually have a meeting or two before attending one of my team’s stand-ups at 10:30 am. I will also typically have at least one 1:1 with a direct report and some informal syncs with my leads or product owner throughout the day. I like to work time into each day to read up on project statuses and on the investigations and POCs that my team is doing. I also try to clear as many of the accumulated action items from the day as possible and then queue up the plan for the next day. I usually shut it down by 5:30 pm and make the long commute to my living room.
Please walk us through the most interesting/challenging site issue you’ve faced at DraftKings. How did your team solve this issue?
This might not be the most challenging issue, but it highlights an important lesson. Early during one NFL game day a couple of years ago, we encountered an issue in our scaling process. This led to a small backup, and our scaling process got a little out of order when we cleared it. That allowed a critical service to completely scale out before one of its dependencies even started scaling out. We completely overwhelmed the dependency with calls and caused a user-facing issue. We declared an incident and immediately diagnosed the problem. The prevailing thought was to scale out more instances of the dependency, but that was going to take time. We spent a few minutes discussing how we could add more instances more quickly, but then we realized we could scale in the service making the calls. That worked! And it was much faster than scaling out more instances of the dependency to handle all the calls. We had several other lessons and clean-ups from this incident. One of the key takeaways for me was that more scale isn’t always better or the correct course of action.
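As a rough illustration of the two ideas in this story (scale dependencies out before the services that call them, and cap the caller when a dependency is overwhelmed), here is a minimal Python sketch. It is not DraftKings' actual tooling: the service names, the capacity figures, and both helper functions are hypothetical and exist only to make the reasoning concrete.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def scale_out_order(depends_on):
    """Return an order in which every dependency comes before its callers.

    `depends_on` maps a service name to the set of services it calls.
    TopologicalSorter expects node -> predecessors, which matches that shape.
    """
    return list(TopologicalSorter(depends_on).static_order())


def max_safe_callers(dep_instances, dep_capacity_per_instance, calls_per_caller):
    """Largest number of caller instances the dependency can currently absorb.

    All three inputs are assumed to be in the same unit (for example,
    calls per second); the result is rounded down.
    """
    return (dep_instances * dep_capacity_per_instance) // calls_per_caller


# Hypothetical services, for illustration only.
depends_on = {
    "lobby-api": {"pricing"},
    "pricing": {"accounts-db"},
    "accounts-db": set(),
}

order = scale_out_order(depends_on)

# With 4 dependency instances that each handle 500 calls/s, and callers
# that each send 100 calls/s, this is the cap on caller instances.
caller_cap = max_safe_callers(4, 500, 100)
```

Scaling the caller in to a cap like `caller_cap` takes effect as soon as instances are removed, while adding dependency instances has to wait for them to start, which is in the spirit of the trade-off described in the answer above.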
What’s your favorite code editor?
Controversial! Call me old-fashioned, but if I need to edit something quickly, I still reach for Vim. You can be very efficient if you take the time to learn and configure it. For almost everything else, I have been using VS Code for a couple of years now. Prior to that, I really enjoyed Atom.
What advice do you have for those seeking a job at DraftKings?
Get comfortable with talking through what you are thinking. There are few things worse than an interview where a candidate sits there quietly thinking. I want to hear it! What do you think about the problem? Let’s talk through it together. Pretend like you’re already on the team, and we are working to solve a problem together. This is even more important if you are unfamiliar with the subject matter.
What advice would you give to your younger self?
Don’t be afraid to be terrible at something. Most people don’t start off being great at what they are doing. It takes work and time to develop a new skill.
What’s the best and worst thing about remote work?
I’ve really gotten used to not having to take the train or drive to get to work. I have been putting that extra time to use for everything from additional time working in the morning to getting in an after-work workout and spending time with my family. It has added a lot of flexibility to my schedule. The worst part and the challenge of remote work is knowing when to put the work down for the night. There isn’t that natural break and decompression on the commute home anymore. Initially, I had a difficult time making the transition, but at this point, I’ve established a balance that works for me. Also, I really miss my podcasts. No commute. No podcasts. I just can’t get into them the same at home.
If you're interested in joining our team of incredible engineers and believe you can help us make an impact, please check out our open roles and apply to be part of this amazing community.
Want to take a deep dive into our Engineering world at DraftKings? Learn more on our DraftKings Tech blog.