Solving the Histogram Bug

November 30, 2021 4-minute read

What is Tempo?

Tempo is a free-to-play rhythm platformer I published with my friends over at Aestronauts. A major facet of the game is mastering each level to claim a spot on one of the global leaderboards, where players are ranked by their 'Beats Per Minute', or BPM. As a result, we felt it was important to give our players useful metrics to compare themselves with the competition. One such metric is the post-level histogram, intended to show the player a full distribution of player scores from the leaderboard, with their current score and personal record marked for reference.

Leaderboard Screenshot — Leaderboard screenshot from Tempo level 7, 'Into the Light'.

The Issue

This histogram is generated at runtime, right as the player enters the post-game interface. During the transition, we pull the top 100 entries from the Steam Leaderboard to populate the local leaderboard. Our oversight lay in using those same scores to populate the histogram as well. Because these are only the top 100 scores, our own score indicator could be thrown far off the plot, resulting in a blatant visual bug and confusion for the player.

One of our core pillars while developing Tempo was to release a game as bug-free as possible, given the small scope of the game. As such, we knew this bug had to be addressed right away.

Approach #1

Initially to resolve this, we considered simply pulling the entire Steam Leaderboard down to recreate the full histogram plot every time the player completes a level. However, as our team anticipated, this turned out to be extremely costly, and on levels with over 10,000 entries the game would freeze while downloading the entries, and the FPS would tank upon completion, likely due to the intense memory usage of the several thousand LeaderBoardEntry structs.

🪓 Axed! 🪓

Approach #2

From here, we began considering several routes to address the histogram display bug in a performant manner. One consideration was to set up a middle-man server using Amazon AWS to act as a buffer between clients and the Steam Leaderboards. This server would constantly run a script intermittently polling the Steam Leaderboards for each level, and constructing information to define the histogram plot of the level's leaderboard data. The necessary information would simply be the min score, max score, bucket width, and number of entries in each bucket. Each client could then query for this small struct of data in order to populate a local copy of the histogram in real time.

While possible, this solution was definitely not preferable. Not only did it require a significant upfront technical investment to set up a remote server on a separate platform, but there was also no guarantee that the necessary application would fit nicely within the processing and memory capacity of a Free Tier Ubuntu Micro-Instance. On top of that, as the Tempo user-base continues to grow, there is the potential to overwhelm the single lone server, and setting up a load balancing system to handle the potential stress was outside the particular skill set of our principle engineer (myself). Ultimately, this solution was discarded as too costly and a poor return on investment.

🪓 Axed! 🪓

Approach #3

The next solution our team discussed was to locally cache all the leaderboard data one time on the players machine, and then update it intermittently. We determined this to be a poor direction, as we can only run code on a client machine while the game is active, and optimally we should avoid downloading massive amounts of data while the user is playing. Furthermore, there is now a tradeoff between performance and keeping the cache up to date.

The real nail in the coffin, however, is that the Steam Leaderboard API doesn't seem to offer a method of retrieving entries based on the time of creation, meaning this approach is fundamentally flawed. In the end, we deemed this system to be needlessly complex and potentially non-performant.

🪓 Axed! 🪓

The Final Solution

The final solution was to choose a number N to represent how many leaderboard entries a client is willing to download. Now, we download every nth entry, where n = TOTAL_ENTRIES / min(TOTAL_ENTRIES, N), and insert it into the histogram.

This gives us a nice even distribution of the data throughout the plot, with the illusion of a massive amount of sampled data. Even better, with N as a sliding value, we can conveniently profile and choose an optimal number of entries to maximize both performance and plot quality. N could even be chosen dynamically such that it is performant on lower end devices!

🔥 Success! 🔥