2x Performance Improvement from Migrating Python 2 to Python 3

We are pleased to announce the results of migrating our shared event pipeline service from Python 2 to Python 3. The first and most significant result is a drastic reduction in memory utilization. The old version of the shared event pipeline on Python 2 used gigabytes of memory, whereas the new version on Python 3 uses under one hundred megabytes of RAM. This reduction gives us more operational flexibility.

The success of the migration did not end there. One of the responsibilities of the shared event pipeline is feeding subscribe events into the Presence system. The Presence system tracks the online/offline activity of users and devices. Presence has a metric that indicates the age of an event in the system. This is the difference between the time the Presence application sees the event and the timestamp on the event itself.
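As a rough illustration of the event age metric described above, here is a minimal sketch. The function name and the sample timestamps are hypothetical and are not taken from the Presence codebase.

```python
def event_age_seconds(event_timestamp: float, seen_at: float) -> float:
    """Age of an event: the time the consumer saw it minus the event's own timestamp."""
    return seen_at - event_timestamp


# Hypothetical event stamped at t=1000.0s and seen by the consumer at t=1001.4s.
age = event_age_seconds(event_timestamp=1000.0, seen_at=1001.4)
print(f"event age: {age:.1f}s")
```

A lower value means events reach the consumer sooner after they are created.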

Before the upgrade, event age at our daily peaks was high, spiking to over two seconds on average. After the upgrade, the latency is much lower, roughly 600 milliseconds below the previous daily peaks. Our typical daily values also improved by a similar margin of about 500 to 600 milliseconds. This improvement occurred in every region where the shared event pipeline is deployed.

The migration of the shared event pipeline from Python 2 to Python 3 has been a success. It has resulted in a drastic reduction in memory utilization, as well as a significant improvement in the performance of the presence system and shared event pipeline feeding many of our downstream systems.

Positive Downstream System Impacts

Internally, we track many system metrics. The user-join-event time metric is the amount of time between when a user subscribes to a PubNub channel and when they are counted as having joined it. A lower value indicates that users are joining channels more quickly. This metric is important because it tells us how quickly the system registers a device as online.
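To make the user-join-event time metric concrete, the sketch below pairs subscribe and join timestamps per user and summarizes the result. The dictionary layout, user IDs, and timestamps are assumptions chosen for illustration, not our internal data model.

```python
from statistics import median


def join_latencies(subscribed_at: dict, joined_at: dict) -> list:
    """Seconds between subscribe and join for every user present in both maps."""
    return [
        joined_at[user] - subscribed_at[user]
        for user in subscribed_at
        if user in joined_at
    ]


# Hypothetical timestamps in seconds; user-c has subscribed but is not yet counted as joined.
subscribed_at = {"user-a": 10.0, "user-b": 12.0, "user-c": 15.0}
joined_at = {"user-a": 10.5, "user-b": 13.0}

latencies = join_latencies(subscribed_at, joined_at)
print("median join latency (s):", median(latencies))
```

Users that have not yet been counted as joined are skipped here; a production metric would also need to decide how to report them.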

The implicit subscribe presence event pipeline is one of the most important and slowest components of the system. We process over three trillion API calls per month, amounting to tens of petabytes of data, so this system is heavily loaded. Optimizing this component means events are processed more quickly, which is a major advantage for the customer. The graphs show that Python 3 provides better memory management than Python 2, with minimal code changes required. Python 3 can handle more data while using less memory, which, in our case, has led to faster processing times.

The improvements in Python 3’s version of the gzip library and the 0MQ library have been a positive development for us. We have noticed that the new versions have removed some of the bottlenecks that existed in the shared event pipeline. This is significant because we rely heavily on these two libraries. The improvements have made our code more efficient and reliable. We are grateful for the work that has been done to improve these libraries.
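For readers unfamiliar with this pattern, here is a minimal sketch of gzip-compressing a batch of events into bytes before handing it to a message transport. The event shape and function names are hypothetical, the 0MQ send and receive steps are left out so the example needs only the standard library, and this is not our actual wire format.

```python
import gzip
import json


def encode_batch(events: list) -> bytes:
    """Serialize a batch of events to JSON and gzip it into a bytes payload."""
    raw = json.dumps(events).encode("utf-8")
    return gzip.compress(raw)


def decode_batch(payload: bytes) -> list:
    """Reverse of encode_batch: gunzip the payload and parse the JSON."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))


# Hypothetical subscribe events; the field names are illustrative only.
events = [
    {"channel": "demo", "user": f"user-{i}", "action": "subscribe"}
    for i in range(1000)
]

payload = encode_batch(events)
raw_size = len(json.dumps(events).encode("utf-8"))
print(raw_size, "bytes raw ->", len(payload), "bytes gzipped")
```

In Python 3 the payload is an explicit `bytes` object, which is what a socket-style transport expects to send.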

We were interested in the potential effect of this change on the shared event pipeline and all downstream processes, including Presence, Events and Actions, and Storage. We observed only minor improvements for some of these downstream systems. This is because they do not consume from the gzip 0MQ interface; instead, they use another generalized queue interface.

What's next for Python 3 at PubNub?

While we are also moving components to Rust, the success of this Py2 to Py3 upgrade points to some potential quick wins. We have almost finished upgrading our fleet of API servers to the latest Python runtime, with only a few items remaining.

Even though Python 3 has delivered proven results, we are still moving to Rust, and we are actively replacing systems with Rust implementations. Moving to Rust would provide a number of benefits, including improved performance, security, and stability. That is a bigger task, as it requires a complete rewrite. Meanwhile, we are looking to upgrade the older Python runtimes to the latest versions, which will let us capture quicker operational wins. It was great to share this story with you, as we've seen an excellent outcome from the Py2 to Py3 move.