Hey guys, today’s post will address threading in video games.
Now that multicore CPUs are pretty common, multithreading looks like an obvious choice for performance improvements in video games. But multithreading isn’t always the answer. This post deals with two common issues I have noticed again and again over the last few years.
The first one: Increasing (render) thread counts in .ini files of video games to tweak performance.
That one is a bit complicated, so I decided to address it first. When I read forum threads where people complain about badly performing video games, someone often comes up with a list of .ini tweaks to improve performance. Basically there is nothing wrong with that, but as you might guess already, not all of these tweaks are really helpful. Increasing the “render thread count”, for instance, is not worth it (in most cases). More threads do not automatically mean better performance. Quite the contrary, it will often make things worse.
DirectX up to version 11 and OpenGL aren’t capable of multithreaded GPU/CPU communication (or at least not very good at it). And that’s one of the core issues. Yes, multithreading in games makes total sense, but only at some very specific points and in a “very limited” range. Throwing 32 threads at a game and thinking it will run better is the wrong approach, and it usually comes from people who have little or no experience with multithreading and/or game development.
Ok, back to our DirectX/OpenGL description. I will stick with DirectX in this post since I am more familiar with it, but most of the points apply to OpenGL in a very similar way.
As I already mentioned: You won’t be able to pass render tasks to your graphics card from multiple threads at the same time. This leads us to a more or less annoying issue.
In a beautiful world, filled with rainbows, unicorns and a multithreaded D3D(11)DeviceContext, rendering would work like this:
Sadly our context isn’t threadsafe. What does that mean? Well, it’s actually not very complex. As you can see in the picture above, if the world were a better place, we could draw completely in parallel, but in reality things behave a bit differently. If we want to access the D3D11Context from several threads, we have to hide it behind a lock to serialize the access, otherwise our game would crash or at least begin to behave in a strange way. That means our rendering would look like this:
You might recognize that this keeps us from taking advantage of multithreading in our game. Even worse, we still pay the threading overhead (yes, spawning tasks and assigning them to threads also takes time) without getting any performance benefit, so our performance decreases. And for the DirectX pros who are laughing at me right now: Yes, I am aware of Deferred Context and CommandList, but this image is meant to simplify the whole thing for those who have no experience with DirectX. Putting tasks into a threadsafe container and executing them all in one thread on the immediate context is basically the same principle as serializing the draw calls with a mutex. The immediate context will execute them one by one anyway. (Remember? We are still in DirectX 11.)
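To make the lock idea a bit more concrete, here is a minimal sketch in plain C++. It does not use the real Direct3D API: `FakeContext`, `Draw()` and `RenderWithLock()` are names I made up for illustration, and the "draw call" just increments a counter. The point is only that every worker has to wait for the same mutex, so the draw calls still happen one after another no matter how many threads you spawn.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a context that is not threadsafe.
// This is NOT Direct3D; Draw() only counts how often it was called.
struct FakeContext {
    std::size_t drawCalls = 0;
    void Draw() { ++drawCalls; }  // must never run on two threads at once
};

// Spawns threadCount workers that each issue drawsPerThread draw calls.
// Every call takes the same lock first, so the draws are serialized.
std::size_t RenderWithLock(unsigned threadCount, std::size_t drawsPerThread) {
    assert(threadCount > 0);

    FakeContext context;
    std::mutex contextLock;
    std::vector<std::thread> workers;
    workers.reserve(threadCount);

    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&context, &contextLock, drawsPerThread] {
            for (std::size_t i = 0; i < drawsPerThread; ++i) {
                // Only one worker at a time gets past this line.
                std::lock_guard<std::mutex> guard(contextLock);
                context.Draw();
            }
        });
    }

    for (std::thread& worker : workers) {
        worker.join();
    }
    return context.drawCalls;
}
```

Calling `RenderWithLock(4, 1000)` issues the same 4000 draw calls that a single thread would issue in a simple loop, just with thread creation and lock traffic added on top.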
This might sound like: Multithreading in games is bullshit…. It isn’t. There are other ways to bypass this issue.
Things like animation updates, collision, position updates, sound, physics or even preparing draw calls can be parallelized very well (that’s why people might have multiple threads for rendering). This is an example of how a parallelized frame could be prepared/executed:
This pretty much shows how I am processing my frames (sometimes with more, sometimes with fewer threads… it depends on the complexity of my update/collision functions). And yes, sometimes I am doing my collision check twice, once at the beginning of my ->Move() function and once after it. (One may argue about efficiency, but that’s not the topic of this post.)
Okay, I think this gave you a little insight into how multithreading could be implemented in a video game and which problems you might run into, but that’s not all, folks.
As a little conclusion you might take away the following. There are developers who use multiple threads for graphics tasks (as I already mentioned, to prepare their draw calls), but that’s not the rule (at least as far as I know). And even if there is a render thread counter inside of an .ini file, leave it as it is. In the best case you gain 1-5 FPS. In the worst case, the engine can’t handle the increased parallelization workload and your game loses FPS. Developers do not choose these numbers for fun. They know how their engine works (at least I hope so), so they will probably know better how many render threads are appropriate.
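Here is a rough C++ sketch of that frame layout, under a few assumptions: `Entity`, `UpdateParallel()`, `DrawSerial()` and `RunFrame()` are hypothetical names, the update is a trivial position step, and the collision passes are left out to keep it short. It is not the code of my engine, just the general shape: the update work is split across threads, while the draw submission stays on one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical game object with a one-dimensional position.
struct Entity {
    float x = 0.0f;
    float velocity = 1.0f;
};

// Phase 1: every worker moves its own slice of the entities.
// The slices do not overlap, so no lock is needed here.
void UpdateParallel(std::vector<Entity>& entities, float dt, unsigned threadCount) {
    assert(threadCount > 0);
    const std::size_t chunk = (entities.size() + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(entities.size(), t * chunk);
        const std::size_t end = std::min(entities.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&entities, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i) {
                entities[i].x += entities[i].velocity * dt;
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}

// Phase 2: one thread submits the draw calls in order.
// A real renderer would talk to the immediate context here;
// this sketch only counts the submissions.
std::size_t DrawSerial(const std::vector<Entity>& entities) {
    std::size_t submitted = 0;
    for (const Entity& entity : entities) {
        (void)entity;  // placeholder for the actual draw call
        ++submitted;
    }
    return submitted;
}

// One frame: parallel update first, then serial drawing.
std::size_t RunFrame(std::vector<Entity>& entities, float dt, unsigned threadCount) {
    UpdateParallel(entities, dt, threadCount);
    return DrawSerial(entities);
}
```

The join at the end of `UpdateParallel()` acts as the barrier between the two phases, so drawing never sees a half-updated entity.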
And this leads me to the second issue (I promised you two issues 😉 ):
It’s not always a good idea to throw threads at a game. There are beginners out there who feel forced to use multithreading in games because multicore CPUs exist. That’s not always a good idea. If you have a game with simple physics, simple collision and low graphics (that does not mean bad graphics), you don’t want to throw multiple threads at it. Take a very basic Tetris game for instance. It could look like this one:
Why the hell should you throw multithreading at this? You would actually put a lot of work into getting the same, if not worse, performance out of your engine just to be able to say: Well, it’s multithreaded. I know, this is a really trivial example, but it suits my needs. Most likely you won’t need multiple threads in your game until physics enters the field. And that’s another point. I can’t say it often enough… Don’t use too many threads! Maybe you remember the little physics framework I introduced in my last post. It’s back.
In the upper right corner you see the FPS. (Yes, it’s running at a higher FPS than the gif I uploaded.) At the moment the engine is running with 4 threads and everything is fine. And this is what happens if you add another 2 threads to the pool:
Again, the FPS counter is in the upper corner. You might recognize that the FPS decreased a little bit (by about 1 FPS, and the average is lower as well). One FPS might sound like absolutely no problem (and that’s actually right). But it is performance you could have kept for free. And if I increased the thread count even further, the FPS would drop even further. The impact in this example might seem negligible, but don’t forget that this is a really simple example. The framework is very basic and so far there have never been performance issues (unless you triple the number of squares). If the workload increased, the FPS loss might increase as well. That’s the reason I recommend finding the sweet spot where you get the best FPS with as few threads as possible.
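If you want to search for that sweet spot in a systematic way, something like the following C++ sketch could be a starting point. Everything in it is an assumption for illustration: `TimeWorkload()` runs a made-up number-crunching loop instead of a real physics update, and `FindSweetSpot()` simply tries every thread count from 1 to `maxThreads` and keeps the fastest one. Timings are noisy, so in practice you would repeat each measurement several times and measure your real frame instead.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Runs a hypothetical workload split across threadCount threads and
// returns the elapsed wall-clock time in milliseconds.
double TimeWorkload(unsigned threadCount, std::size_t items) {
    assert(threadCount > 0);
    std::vector<double> data(items, 1.0);
    const std::size_t chunk = (items + threadCount - 1) / threadCount;

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(items, t * chunk);
        const std::size_t end = std::min(items, begin + chunk);
        if (begin == end) break;  // more threads than work items
        workers.emplace_back([&data, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                // Stand-in for a per-object update step.
                for (int step = 0; step < 100; ++step) {
                    data[i] = data[i] * 1.0001 + 0.5;
                }
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Tries 1..maxThreads and returns the thread count with the shortest time.
// The strict comparison means a tie goes to the smaller thread count.
unsigned FindSweetSpot(unsigned maxThreads, std::size_t items) {
    assert(maxThreads > 0);
    unsigned best = 1;
    double bestMs = TimeWorkload(1, items);
    for (unsigned n = 2; n <= maxThreads; ++n) {
        const double ms = TimeWorkload(n, items);
        if (ms < bestMs) {
            bestMs = ms;
            best = n;
        }
    }
    return best;
}
```

The result depends on the machine and on the workload, so treat the number it returns as a hint for your own setup and not as a general rule.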
That’s all for now, I hope some of you may be smarter than you were before reading this post.