
Engineering at the Edge of Impossibility: How We Delivered 973 Million Requests Per Minute During a Global Sporting Event

Stephen Blum on Jul 22, 2025

Imagine the world tuning in for a global sports event with millions of fans spanning 200+ countries, and every second packed with drama. But this event wasn’t just about watching the match. Thanks to PubNub, fans could join and chat with friends, send real-time reactions as goals hit the net, and track up-to-the-second game stats, all perfectly synced with the on-screen video stream.

Behind the scenes, our team watched as engagement soared: 57.9 billion requests in a single hour, with peaks of 973 million requests per minute. It was the ultimate test of our real-time infrastructure. Here’s how we made sure every message, reaction, and stat reached fans instantly, without any downtime, no matter where they were in the world.

The Challenge: When the World Connects at Once

This wasn't just about streaming content. It was about creating an interactive experience at unprecedented scale. Think about what happens when millions of people around the world want to engage with the same content simultaneously. They're not just watching; they're reacting, chatting, and connecting in real-time. Traditional infrastructure simply isn't built for these kinds of traffic spikes.

But for us at PubNub, this represents exactly the challenge we've been engineering toward. We've invested heavily in autoscaling our infrastructure, building it on Kubernetes running on Amazon EKS across availability zones around the world. Why? Because we want users to connect to the closest region and have the best possible experience.

The numbers from this event tell an incredible story of what's possible when you engineer for the edge cases:

  • 57.9 billion requests handled in a single hour during peak traffic
  • 973 million requests processed per minute at the absolute peak
  • 200+ countries served simultaneously with consistent performance
  • Zero downtime during critical moments
  • Real-time interactions perfectly synced with the live stream

The Engineering Behind the Achievement

When you're handling nearly a billion requests per minute, every millisecond counts. Our approach to load distribution might sound simple, but it's the result of years of optimization. We use geo-based routing that connects users to the closest zone through GeoDNS.

This routing is smart, but it's also minimal from a compute perspective. We want you to connect as fast as possible, without extra processes calculating where you should connect; the decision has to happen in milliseconds. The beauty of this approach is that it spreads load across Amazon's data centers around the world while ensuring the fastest possible connection for each user.
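
To make the idea concrete, here's a minimal sketch in Rust of what a geo-based lookup boils down to: map the continent GeoDNS resolves for a client to a nearby region. The continent codes and region names are illustrative, not our actual topology, and in production the decision happens inside DNS rather than in application code.

```rust
// Minimal sketch of geo-based routing: map the continent GeoDNS resolves for a
// client to a nearby region. Region names and continent codes are illustrative,
// not PubNub's actual topology; in production the decision lives in DNS itself.
fn nearest_region(continent: &str) -> &'static str {
    // A static mapping keeps the routing decision trivially cheap: no
    // per-request computation beyond a lookup.
    match continent {
        "NA" => "us-east-1",
        "SA" => "sa-east-1",
        "EU" => "eu-central-1",
        "AF" => "eu-west-1",
        "AS" => "ap-southeast-1",
        "OC" => "ap-southeast-2",
        _ => "us-east-1", // sensible default when the lookup fails
    }
}

fn main() {
    for continent in ["EU", "AS", "AQ"] {
        println!("{continent} -> {}", nearest_region(continent));
    }
}
```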

Real-Time Monitoring and Drift Detection

Managing 200+ countries simultaneously requires sophisticated monitoring. We have drift detection built into our network, with monitoring tools that alert us automatically whenever a situation demands intervention faster than our automated systems can respond.

What does drift detection mean in practice? You want messages to reach end users as fast as possible, because latency is key, especially when you're participating in a chat while watching a video. But you also want the experience to stream in a linear fashion and flow properly. Users shouldn't jump around between different moments in the event, or see a spoiler in the chat before the livestream has reached that moment.
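
To make drift detection concrete, here's a minimal sketch in Rust: track a rolling latency baseline and flag any sample that wanders too far above nominal. The smoothing factor and tolerance are illustrative values, not our production settings.

```rust
// Hedged sketch of latency drift detection: keep an exponentially weighted
// moving average (EWMA) as the "nominal" baseline and flag samples that drift
// beyond a tolerance. Thresholds and the response to an alert are illustrative.
struct DriftDetector {
    baseline_ms: f64, // EWMA of observed delivery latency
    alpha: f64,       // smoothing factor for the EWMA
    tolerance: f64,   // how far above baseline counts as drift (e.g. 1.5x)
}

impl DriftDetector {
    fn new(initial_ms: f64) -> Self {
        Self { baseline_ms: initial_ms, alpha: 0.1, tolerance: 1.5 }
    }

    // Returns true when the new sample has drifted past the tolerance band,
    // i.e. when automation should scale out or a human should be paged.
    fn observe(&mut self, sample_ms: f64) -> bool {
        let drifted = sample_ms > self.baseline_ms * self.tolerance;
        self.baseline_ms = self.alpha * sample_ms + (1.0 - self.alpha) * self.baseline_ms;
        drifted
    }
}

fn main() {
    let mut detector = DriftDetector::new(40.0);
    for sample in [42.0, 39.0, 41.0, 95.0] {
        if detector.observe(sample) {
            println!("latency drift detected: {sample} ms");
        }
    }
}
```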

We hold many granted patents and claims that allow us to operate our algorithms in a cloud environment. This isn't just theoretical; it's battle-tested technology. We implement a store-and-forward mechanism in our network that lets us offer customers guarantees on message delivery. Even when users hit network blips, we ensure all data is delivered to them, and with billions of users connecting to our network over the years, we've experienced just about every possible network condition.
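
Here's a minimal sketch of the store-and-forward idea in Rust: keep a short replay buffer keyed by a monotonically increasing token, and let a reconnecting client catch up from the last token it saw. The buffer size and token scheme are simplified for illustration and aren't our actual implementation.

```rust
// Minimal store-and-forward sketch: the server keeps a short replay buffer
// keyed by a monotonically increasing token, and a reconnecting client asks
// for everything after the last token it saw. The buffer size and token type
// are assumptions for illustration, not PubNub's actual implementation.
use std::collections::VecDeque;

struct ReplayBuffer {
    messages: VecDeque<(u64, String)>, // (token, payload), oldest first
    capacity: usize,
    next_token: u64,
}

impl ReplayBuffer {
    fn new(capacity: usize) -> Self {
        Self { messages: VecDeque::new(), capacity, next_token: 1 }
    }

    // Publish assigns the next token and evicts the oldest entry when full.
    fn publish(&mut self, payload: &str) -> u64 {
        let token = self.next_token;
        self.next_token += 1;
        if self.messages.len() == self.capacity {
            self.messages.pop_front();
        }
        self.messages.push_back((token, payload.to_string()));
        token
    }

    // After a network blip the client replays everything newer than its cursor.
    fn catch_up(&self, last_seen: u64) -> Vec<&(u64, String)> {
        self.messages.iter().filter(|(t, _)| *t > last_seen).collect()
    }
}

fn main() {
    let mut buffer = ReplayBuffer::new(1000);
    for n in 1..=5 {
        buffer.publish(&format!("goal reaction {n}"));
    }
    // Client disconnected after token 2; deliver the messages it missed.
    for (token, payload) in buffer.catch_up(2) {
        println!("{token}: {payload}");
    }
}
```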

The Science of Capacity Planning

There's always a balance between over-provisioning (expensive) and under-provisioning (risky). Our approach is to target 50% utilization of the resources we purchase. This allows our auto-scaling algorithm to maintain that level continuously, giving us enough time to add more resources as more users come online.

This 50% target gives us plenty of time and headroom, and it's actually another key selling point: it delivers excellent latencies without one user's traffic affecting another user's experience. The system also uses horizontal pod autoscaling (HPA) that takes latency into account, automatically scaling out and adding more hardware when latency starts to drift from nominal levels.
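
For the scaling math itself, the standard Kubernetes HPA rule is simple arithmetic: desired replicas = ceil(current replicas × current metric ÷ target metric). A quick sketch against a 50% utilization target, with purely illustrative numbers:

```rust
// The standard Kubernetes HPA scaling rule, shown as plain arithmetic:
// desired replicas = ceil(current replicas * current metric / target metric).
// Here the metric is utilization against a 50% target; numbers are illustrative.
fn desired_replicas(current_replicas: u32, current_utilization: f64, target_utilization: f64) -> u32 {
    ((current_replicas as f64) * current_utilization / target_utilization).ceil() as u32
}

fn main() {
    // A traffic spike pushes average utilization from 50% to 80%:
    // 100 pods * 0.80 / 0.50 = 160 pods requested by the autoscaler.
    println!("{}", desired_replicas(100, 0.80, 0.50));
}
```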

Solving the 'Thundering Herd' Problem

One key issue when building real-time apps is that success can be your failure: what happens when millions of people all react at the same time? This thundering herd problem has challenged companies across the industry for years, and it remains an expensive one to solve.

Our background comes from the social web, which is where thundering herds first appeared: situations where a huge number of users show up all at once. We took those lessons and built our network in compiled languages. Our entire network is built on what the internet itself is mostly built on: C. Your routers, switches, and all the infrastructure that sends network packets around the internet are written in C because it's very fast. That's what our PubSub core is written in.

Over the years, we've also been upgrading to Rust, because it gives you the same level of performance as C with additional guarantees like memory safety and concurrency safety. We're even porting our C code over to Rust and seeing fantastic results.
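
To illustrate one common way of taming a thundering herd (a general technique, not necessarily our exact mechanism), here's a small Rust sketch that coalesces a burst of identical reactions into per-type counts over a short window, so a single aggregate message fans out instead of millions of individual ones.

```rust
// One common mitigation for a thundering herd of identical events (not
// necessarily PubNub's exact mechanism): coalesce a burst of reactions into
// per-type counts over a short window, then fan out one aggregate message
// instead of millions of individual ones.
use std::collections::HashMap;

fn coalesce(reactions: &[&str]) -> HashMap<String, u64> {
    let mut counts = HashMap::new();
    for r in reactions {
        *counts.entry(r.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // A 100 ms window of raw reactions collapses into a handful of counters.
    let window = ["goal", "goal", "wow", "goal", "wow", "clap"];
    for (reaction, count) in coalesce(&window) {
        println!("{reaction}: {count}");
    }
}
```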

Beyond the Technical: Unlocking Data Insights

Because we can process data in real time, from messages to reactions and status updates, PubNub Illuminate provides powerful insights into user engagement patterns during these massive events. During this sporting event, we were powering not just the chat and reactions but also live stats, so users could see possession per team and the current state of the game right there on their device.

But the possibilities go much further. You could derive insights about why activity increased during specific moments. Was it a positive experience, or were users frustrated? You can extract that information and use it to understand what really caused engagement spikes.

You could even take a full event like a soccer match, create engagement graphs over time, use reaction peaks to build highlight reels, or compare one team against another to gauge the relative passion of different fan bases, and then correlate all of it with purchase intent, likelihood of monetization, and user retention.
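
As a rough sketch of how that analysis might look, here's a small Rust example that buckets reaction timestamps into per-minute counts and flags buckets well above the average as candidate highlight moments. The 2x-average threshold and the sample data are purely illustrative.

```rust
// Hedged sketch of turning raw reaction timestamps into an engagement graph:
// bucket events per minute, then flag buckets well above the average as
// candidate highlight moments. The threshold and data are illustrative.
use std::collections::BTreeMap;

fn engagement_per_minute(timestamps_secs: &[u64]) -> BTreeMap<u64, u64> {
    let mut buckets = BTreeMap::new();
    for t in timestamps_secs {
        *buckets.entry(t / 60).or_insert(0) += 1;
    }
    buckets
}

fn highlight_minutes(buckets: &BTreeMap<u64, u64>) -> Vec<u64> {
    let total: u64 = buckets.values().sum();
    let average = total as f64 / buckets.len() as f64;
    buckets
        .iter()
        .filter(|(_, &count)| count as f64 > 2.0 * average) // peak = 2x average
        .map(|(&minute, _)| minute)
        .collect()
}

fn main() {
    // Reaction timestamps (seconds into the match); minute 1 is the goal.
    let reactions = [5, 10, 62, 63, 64, 65, 66, 67, 68, 130];
    let buckets = engagement_per_minute(&reactions);
    println!("highlight minutes: {:?}", highlight_minutes(&buckets));
}
```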

The Future of Interactive Experiences

This event demonstrated what I see as a fundamental shift in user expectations. Interactive experiences are becoming commonplace; they're an expected feature these days. Look at the App Store right now: the top 10 apps all have built-in user communication. That kind of experience is now expected on all devices.

With this event's scale and level of user participation, we can see that interactive experiences are catching up. More people are getting this experience, and it's becoming a feature you need to include if you expect to build a successful app.

We're also investing in the future with AI-powered capabilities. When you have users communicating, especially in public settings during emotional events like sports, you want to help manage those interactions. We have automatic moderation features that we've built and released to several customers, and we're expanding that capability across sports, education, healthcare, and other use cases where you want to maintain appropriate conversation tone while preserving authentic sentiment.

Engineering for What's Next

This global sporting event proved what's possible when you have the right real-time infrastructure, but this technology isn't just for massive one-time events. Whether you're building a platform that needs to handle traffic spikes, planning a global launch, or creating any application that requires reliable real-time performance, the same engineering principles apply.

Your users expect seamless experiences, even when millions of others are doing the same thing simultaneously. The question isn't whether you'll need to handle massive scale, it's whether you'll be ready when that moment comes.

Get started with PubNub today to learn how you can ensure your platform performs flawlessly, even when the whole world is watching.