Why is multithreading your best friend?
Imagine you're at a buffet (yes, our journey starts with food). You've got one plate and a hundred dishes to try. That's your game without multithreading - one thing at a time, a queue longer than Black Friday. Enter Task Graph, your ticket to grabbing multiple plates and conquering that buffet like a pro.
Threading is a way of saying, "Let's do all the things, all at once, without tripping over ourselves." It's a fancy job manager that lets you throw small, digestible tasks onto various threads, handling them like a pro chef juggling flaming knives. The beauty of it? It abstracts away the nitty-gritty of thread management, letting you focus on what matters - making your game awesome.
Unreal’s multithreading approach
- FRunnable: The OG of threading in Unreal. You create a class that inherits from FRunnable, cook up some tasks, and run them on a separate thread. It's like hiring a sous-chef to take care of the side dishes while you focus on the steak.
- Task Graph System: Unreal's way of saying, "Let's get organized." It allows you to queue up tasks that can run concurrently, managing dependencies like a pro project manager. It's the backbone of Unreal's concurrency and a real game-changer for complex operations.
- Async Tasks: The quick and dirty way to fire off a task without getting bogged down in the nitty-gritty of thread management. Perfect for when you need to fetch data or perform a calculation without stalling the main thread.
- ParallelFor: Ever wanted to speed up a loop by running iterations in parallel? ParallelFor is your friend. It slices up your loop and serves it to multiple threads, speeding up processing like a culinary ninja chopping vegetables.
Beyond Unreal: Multithreading in the Wild
The game industry at large has embraced multithreading with open arms, recognizing it as critical for leveraging modern hardware. Here are a few approaches seen across the board:
- Entity Component Systems (ECS): ECS architectures are the new kids on the block, promoting data-oriented design for maximum performance. By decoupling data from logic, ECS facilitates easy multithreading, allowing operations on entities to run in parallel without a hitch. I will cover this in a future post.
- Job Systems: Popularized by Unity, job systems let developers define work units (jobs) that can run concurrently, handling dependencies and synchronization behind the scenes. It's a bit like having an automated kitchen where robots prepare dishes simultaneously, supervised by a master chef.
Threading classes in Unreal
FRunnable
FRunnable is Unreal Engine's base class for creating threads. It's like drafting your very own digital worker; you tell it what job to do, and it goes off to work in the background, leaving the main thread unburdened and your game running smoother than a jazz saxophone solo.
How Does FRunnable Work? (The Stylish Code Edition)
Here's how you would write the script for your backstage hero:
#pragma once
#include "CoreMinimal.h"
#include "HAL/Runnable.h"
#include "HAL/RunnableThread.h"
#include "HAL/ThreadSafeBool.h"
class FMyWorker : public FRunnable {
public:
FMyWorker() {
WorkerThread = FRunnableThread::Create(this, TEXT("MyWorker"));
}
/** Destructor */
virtual ~FMyWorker() {
if (WorkerThread != nullptr) {
WorkerThread->Kill(true); // Wait for the thread to finish
delete WorkerThread;
}
}
virtual bool Init() override {
return true; // Return false to abort the thread before Run() is called
}
virtual uint32 Run() override {
while (!bWantsToStop) {
// Perform intensive tasks here
// Warning: must be thread-safe -- don't touch game objects directly
}
return 0;
}
virtual void Stop() override {
bWantsToStop = true;
}
protected:
FRunnableThread* WorkerThread = nullptr;
FThreadSafeBool bWantsToStop = false; // Thread-safe flag, readable from Run() while set from Stop()
};
FMyWorker* MyWorker = new FMyWorker();
// And just like that, the thread is in action
Pros and Cons of Using FRunnable
Pros:
- Precision Control: Like a puppet master, you have full control over the thread's lifecycle.
- Power: It's Unreal Engine's most direct and potent way to handle heavy lifting in the background.
- Flexibility: Whether it's data processing, loading content, or performing calculations, FRunnable is up for the task.
Cons:
- Complexity: With great power comes... a bit more complexity. You'll need to manage the thread's lifecycle carefully to avoid crashes or unexpected behavior.
- Responsibility: You're in charge of ensuring thread safety and managing how your thread interacts with the rest of your game, which can be a daunting task.
- Overhead: Each FRunnable thread is a full-fledged system thread, which might be overkill for smaller tasks.
https://docs.unrealengine.com/4.26/en-US/API/Runtime/Core/HAL/FRunnable/
Async Task
Think of Async Tasks as the quick spellcasters of the Unreal Engine multithreading world. They're perfect for when you need to perform a small, well-defined task asynchronously, like fetching data, doing light computations, or processing input without blocking the main game thread.
How Does an Async Task Work? Imagine you're in a kitchen, and you need to whip up a quick side dish while also keeping an eye on the main course. An Async Task is like calling over a kitchen assistant to take care of the side dish swiftly.
AsyncTask(ENamedThreads::AnyBackgroundThreadNormalTask, []() {
// Your code here, e.g., fetching data from a server
});
// Thread Types
namespace ENamedThreads
{
enum Type
{
UnusedAnchor = -1,
RHIThread,
GameThread,
ActualRenderingThread = GameThread + 1,
AnyThread = 0xff,
MainQueue = 0x000,
LocalQueue = 0x100,
NumQueues = 2,
ThreadIndexMask = 0xff,
QueueIndexMask = 0x100,
QueueIndexShift = 8,
NormalTaskPriority = 0x000,
HighTaskPriority = 0x200,
NumTaskPriorities = 2,
TaskPriorityMask = 0x200,
TaskPriorityShift = 9,
NormalThreadPriority = 0x000,
HighThreadPriority = 0x400,
BackgroundThreadPriority = 0x800,
NumThreadPriorities = 3,
ThreadPriorityMask = 0xC00,
ThreadPriorityShift = 10,
GameThread_Local = GameThread | LocalQueue,
ActualRenderingThread_Local = ActualRenderingThread | LocalQueue,
AnyHiPriThreadNormalTask = AnyThread | HighThreadPriority | NormalTaskPriority,
AnyHiPriThreadHiPriTask = AnyThread | HighThreadPriority | HighTaskPriority,
AnyNormalThreadNormalTask = AnyThread | NormalThreadPriority | NormalTaskPriority,
AnyNormalThreadHiPriTask = AnyThread | NormalThreadPriority | HighTaskPriority,
AnyBackgroundThreadNormalTask = AnyThread | BackgroundThreadPriority | NormalTaskPriority,
AnyBackgroundHiPriTask = AnyThread | BackgroundThreadPriority | HighTaskPriority,
}
}
Pros:
- Simplicity: Easy to use, with minimal boilerplate code.
- Flexibility: Choose from various named threads based on priority and nature of the task.
- Convenience: Ideal for quick, one-off tasks without the need for extensive thread lifecycle management.
Cons:
- Limited Control: Less control over the thread's lifecycle and execution details.
- Overhead: While minimal, creating tasks involves some overhead that might be noticeable with a large number of small tasks.
- Suitability: Not ideal for long-running or complex tasks requiring detailed control over threading.
Understanding Lambda Captures: [this], [=], [&]
In C++, a lambda function is a compact way to define an anonymous function. The bracketed part, e.g. [this], is called the capture list, and it dictates what from the surrounding scope is available inside the lambda:
- [=]: Captures all visible variables in the surrounding scope by value. The copies are safe snapshots, but they won't reflect later changes to the originals.
- [&]: Captures all visible variables by reference. Efficient, but potentially dangerous if the lambda outlives the variables it references.
- [var1, &var2]: Captures var1 by value and var2 by reference. Mix and match based on needs.
- [this]: Captures the enclosing object's pointer, giving the lambda access to member variables and functions of the class it's defined in.
Why use [this]? Imagine your class has a private variable score that you want to update within the lambda. Capturing [this] allows the lambda to access score directly, as if it were inside a regular member function:
class GameSession {
private:
int score = 0;
public:
void UpdateScoreAsync() {
AsyncTask(ENamedThreads::AnyBackgroundThreadNormalTask, [this]() {
this->score += 10; // Accessing the class member variable `score`
});
}
};
Parallel For
When you have a hefty task, like processing a large dataset or performing complex calculations on multiple game entities, Parallel For is your go-to spell. It breaks down your loop into multiple chunks, each running in parallel on separate threads, dramatically speeding up operations that would otherwise take a significant amount of time on the main thread.
How Does Parallel For Work? Imagine you're hosting a feast, and you need to chop a mountain of vegetables. Parallel For is like summoning several kitchen assistants, each taking a portion of the pile to chop simultaneously.
TArray<int32> Indexes;
Indexes.Init(0, 1000); // Example data
ParallelFor(Indexes.Num() /** Number of iterations */, [&](int32 Index) {
// Your loop code here, processed in parallel
// Warning: requires careful consideration of thread safety, as multiple
// threads may access shared resources concurrently. Writing to a distinct
// element per iteration, as below, is safe.
Indexes[Index] = Index * 2;
}, EParallelForFlags::BackgroundPriority);
Pros:
- Efficiency: Massive speedup for data processing and calculations by leveraging multicore processors.
- Ease of Use: Simple to implement, turning a traditional for loop into a parallelized version with minimal changes.
- Scalability: Automatically scales with the number of processor cores, making your code future-proof.
Cons:
- Thread Safety: Requires careful consideration of thread safety, as multiple threads might access shared resources concurrently.
- Complexity: Debugging and ensuring correctness can be more challenging due to concurrent execution.
- Overhead: There's overhead in distributing the work and synchronizing threads, which might not be beneficial for small datasets or tasks.
Non-abandonable task
For tasks that you can't simply abandon or interrupt, Unreal offers FNonAbandonableTask. This special task ensures that the work gets done, come what may. It's perfect for operations that must reach completion to maintain data integrity or ensure a sequence of actions concludes properly.
To execute an FNonAbandonableTask, you typically wrap it within an FAutoDeleteAsyncTask, allowing the engine to manage its lifecycle automatically. This way, you focus on what the task should do, not on when it should be deleted.
class FMyNonAbandonableTask : public FNonAbandonableTask {
// The async task wrapper needs access to DoWork and GetStatId
friend class FAutoDeleteAsyncTask<FMyNonAbandonableTask>;
public:
void DoWork() {
// Perform your critical task here
}
/** GetStatId plays a crucial role in identifying tasks for performance profiling */
FORCEINLINE TStatId GetStatId() const {
RETURN_QUICK_DECLARE_CYCLE_STAT(FMyNonAbandonableTask, STATGROUP_ThreadPoolAsyncTasks);
}
};
// Launch it and let the engine delete it when finished:
(new FAutoDeleteAsyncTask<FMyNonAbandonableTask>())->StartBackgroundTask();
TFuture and TPromise
For scenarios where you need to perform a task asynchronously and then retrieve a result at some later point, Unreal provides its own take on a familiar C++ Standard Library pattern: TFuture and TPromise (the engine's counterparts to std::future and std::promise). These tools allow you to dispatch work and then "promise" to deliver a result that can later be retrieved through a TFuture.
TPromise<int32> Promise;
TFuture<int32> Future = Promise.GetFuture();
// Note: Promise is captured by reference, so it must outlive the task --
// don't let it go out of scope before SetValue is called.
Async(EAsyncExecution::ThreadPool, [&Promise]() {
int32 Result = 42; // Imagine some heavy computation here
Promise.SetValue(Result);
});
// Later on, you can check if the result is ready
if (Future.IsReady()) {
int32 Result = Future.Get();
// Do something with the result
}
Wrapping up
While we've highlighted some of the more specific threading mechanisms Unreal Engine offers, it's essential to recognize that these tools are part of a broader tapestry designed to empower developers. From managing game state updates to handling complex AI calculations and beyond, understanding when and how to leverage these threading constructs can significantly impact your game's performance and responsiveness.
Each threading approach serves different needs:
- FRunnable and FNonAbandonableTask are about executing standalone tasks, with the latter providing guarantees on task completion.
- AsyncTask simplifies dispatching quick, one-off tasks to various threads.
- ParallelFor accelerates data processing by distributing iterations across multiple threads.
- TFuture and TPromise introduce a way to work with asynchronous results, making your code cleaner and more efficient.
Thread safety
“too many threads, not enough mutexes” 🐶 Thread issues are like having a litter of puppies fighting over a toy (the shared variable). You can bet there's going to be a tussle, and maybe a few yelps, when two pups go for the same toy at the same time. We can synchronize their play by letting puppy A play before puppy B, or we can provide another toy (a snapshot) for puppy B. In the world of multithreading, this situation is what we call a race condition. Here are the classic ways to keep the peace:
- Mutexes (Locks) : Implement a feeding schedule where only one dog can access the bowl at a time.
- Atomic Operations: Just like giving each pup a bite-sized treat at the same time to prevent squabbling, atomic operations ensure that certain computations on data are completed as indivisible steps.
- Thread-Local Storage: Give each puppy its own bowl. This is similar to thread-local storage where each thread has its own copy of a variable, preventing interference from others.
- Immutable Objects: Sometimes, the best toy is the one that can't be destroyed, no matter how much the puppies tug on it. In programming, immutable objects can be safely shared between threads without needing synchronization because they cannot be modified after creation.
Understanding Errors
Common Errors from Lack of Thread Safety
- Race Conditions: When threads race to read or write shared data, the outcome depends on who runs first. It's chaotic, unpredictable, and a surefire way to corrupt your data.
- Deadlocks: Picture two dogs with leashes intertwined, each waiting for the other to move. Deadlocks freeze your threads when they wait indefinitely for resources locked by each other.
- Livelocks: Similar to deadlocks, but here the threads are active, constantly trying to resolve contention without progress—like dogs in a tug-of-war, endlessly pulling with no victor.
- Starvation: Occurs when a thread is perpetually denied access to resources because other threads hog them. It's like a pup missing mealtime because the bigger dogs always get to the bowl first.
- Priority Inversion: A low-priority thread holds a lock needed by a high-priority thread, which then can't proceed, akin to a small dog hogging the toy just out of reach of a more eager, larger pup.
Detecting Thread Safety Violations
Detecting these issues can be tricky, as they often manifest under specific timing conditions or under heavy system load. Tools like thread sanitizers, debuggers with thread analysis capabilities, and logging can help identify these errors. Look for symptoms like unexplained crashes, data inconsistencies, and performance bottlenecks.
Tools for Ensuring Thread Safety
Let's delve into some of the robust tools Unreal Engine offers to protect your code from the chaos of concurrency.
FCriticalSection
The FCriticalSection class is a mutex that ensures only one thread can execute a section of code at a time. It's like having a traffic light at a busy intersection, controlling the flow to prevent accidents.
FCriticalSection Mutex;
// To use it, lock the FCriticalSection before accessing shared resources and unlock it afterward:
Mutex.Lock();
// Safe access to shared resources here
Mutex.Unlock();
// Or, even more conveniently, use the FScopeLock class to automatically manage locking and unlocking:
{
FScopeLock ScopeLock(&Mutex);
// Safe access to shared resources here
} // Lock is released automatically when ScopeLock goes out of scope
TAtomic / std::atomic
TAtomic in Unreal and std::atomic in the C++ Standard Library provide fundamental operations that are performed atomically, such as incrementing a counter or updating a flag. They're the equivalent of giving each dog a separate treat at the same time—no fuss, no muss.
// Unreal's TAtomic wrapper (note: UE5 deprecates TAtomic in favor of std::atomic)
TAtomic<int32> SafeNumber;
SafeNumber = 42; // Atomic set operation
std::atomic<float> SafeFloat;
SafeFloat = 42.0f; // Atomic set operation
FPlatformAtomics
FPlatformAtomics provides a suite of static functions for atomic operations that are platform-agnostic, ensuring that your atomic operations work seamlessly across different hardware.
TQueue
TQueue offers a thread-safe, lock-free FIFO data structure (in single-producer/single-consumer or multi-producer/single-consumer modes), ensuring data is processed in the order it was added, just as if you're training pups to wait their turn for treats.
Spinlock (UE5)
Introduced in Unreal Engine 5, FSpinLock is a lightweight alternative to traditional mutexes that "spins" in a loop while waiting to acquire a lock, useful for short operations where the overhead of putting a thread to sleep is too great. It's like having a quick game of fetch while waiting for dinner to be served.
TSharedPtr (OG of 'em all)
TSharedPtr is a smart pointer that ensures the object it points to is automatically deleted when no longer in use. With ESPMode::ThreadSafe, its reference counting is atomic, so the pointer itself can safely be copied and destroyed across threads—though access to the object it points to still needs its own synchronization. It's the equivalent of an auto-cleaning dog bowl—once the dogs are done eating, the bowl cleans itself up.
FEvent
FEvent is a synchronization primitive that allows threads to signal one another, much like teaching dogs to bark to signal they're ready for dinner. It's useful for coordinating the sequence of operations across threads.
FEvent* Event = FPlatformProcess::GetSynchEventFromPool();
// On the waiting thread:
Event->Wait(); // Blocks until another thread triggers the event
// On the signalling thread:
Event->Trigger(); // Wakes the waiting thread
// When finished, return the event to the pool:
FPlatformProcess::ReturnSynchEventToPool(Event);
Managing Shared Data
Managing shared data is an art form in itself. It involves strategies such as:
- Minimizing shared state: The less shared data, the fewer chances for contention.
- Immutable state: Data that doesn't change is inherently thread-safe.
- Confinement: Keep data local to a thread when possible, avoiding shared access.
- Copy-on-write: Threads work on copies of data, and only synchronize at critical points.
4. Pillars of software architecture - threading
A robust game architecture for multithreading stands on two foundational pillars: data integrity and performance efficiency. To achieve these, one must understand the roles of various threading mechanisms and design the game systems around them.
1. Data-Driven Design
In a multithreaded environment, data management is paramount. A data-driven architecture allows for clear separation of data and logic, minimizing dependencies and enabling easier data access across threads. This approach often leverages Entity Component System (ECS) patterns, which organically fit into a multithreaded paradigm by decoupling state from behavior.
2. Task-Based Systems
Breaking down the game logic into discrete tasks or jobs that can run independently provides a clear pathway for distributing work across threads. This granularity facilitates dynamic load balancing and reduces contention, making it easier to scale performance with the number of available cores.
3. Concurrency Control
Understanding and implementing concurrency control mechanisms is crucial. Mutexes, spinlocks, and atomic operations are the sentinels that guard against race conditions and data corruption. Each has its place:
- Mutexes (FCriticalSection): Ideal for protecting large sections of critical code or complex data structures where the overhead of locking is less significant than the need for data integrity.
- Spinlocks (FSpinLock in UE5): Useful in scenarios where the wait time is expected to be very short, and the overhead of putting a thread to sleep is greater than the cost of busy-waiting.
- Atomics (TAtomic, std::atomic): Perfect for simple data types where operations must be indivisible, such as counters or flags.
4. Event-Driven Synchronization
Events (FEvent) and condition variables allow threads to wait for specific conditions or signals before proceeding. This pattern is useful for orchestrating complex interactions between threads, such as ensuring resources are loaded before the game logic proceeds.
5. Memory Management
In multithreading, the way memory is allocated, accessed, and deallocated can significantly impact performance. Use thread-local storage to prevent contention, and be cautious with memory allocation and deallocation across threads, which can be a source of performance bottlenecks.
5. Lock-Free VS Locking Sync
A lot of people get confused by this one, so I'll try to explain it in the simplest way possible.
LOCKING
🐶 Imagine a puppy, let's call him Buddy, who has a collection of five chew toys. Buddy is quite possessive and insists on playing with each toy himself, not allowing any of the other dogs to touch them while he's playing. In software terms, Buddy has employed a "locking" mechanism or mutex. SORRY FOR THIS METAPHOR. BEST I COULD COME UP WITH.
In locking synchronization, when Buddy plays with a toy (or when a thread accesses a variable), he ensures exclusive access. No other pup (thread) can play with the same toy (modify the variable) until he's done and moves on to the next one. This approach is straightforward and ensures that only one thread modifies a variable at a time, preventing any form of data inconsistency or race conditions.
Pros:
- It's a clear and easy-to-understand system.
- It ensures that only one pup has the bone at a time, preventing any tussles (race conditions).
Cons:
- If Buddy takes a long time, the other pups must wait, which isn't very fun (can lead to performance bottlenecks).
- If Buddy never lets go of the toy, no one else can ever play with it (potential for deadlocks).
Lock-free
🐶 Now, suppose the pups play a different game: anyone can lunge for a toy at any time, and if two pups grab at once, the slower one simply backs off and tries again. This represents "lock-free" synchronization.
Lock-free synchronization avoids exclusive locks altogether. In programming, lock-free algorithms let threads work concurrently, typically using atomic compare-and-swap operations: a thread attempts its update, and if another thread got there first, it retries with the fresh value. The guarantee is system-wide progress—at least one thread completes its operation in a finite number of steps, regardless of what the other threads are doing.
Pros:
- All pups can play simultaneously, making it a joyous and lively scene (improved performance and scalability).
- There's no need for signs or complex rules (reduced overhead and complexity).
Cons:
- Sometimes, two pups might go for the same toy at the exact moment, which requires quick reflexes and rules (can be complex to implement correctly).
- The pups need to be well-trained and aware of each other to avoid confusion (requires careful programming to ensure thread safety).
Locking vs. Lock-Free: Choosing the Right Approach
Both approaches have their place in the software world, and understanding when to use each is key to a well-behaved program:
- Locking (Mutexes): Suitable for complex operations or when working with resources that require strict ordering and safety. It's easy to understand and implement but can lead to performance bottlenecks if not managed carefully.
- Lock-Free: Ideal for high-performance systems where scalability and responsiveness are critical. It avoids pitfalls like deadlocks but can be complex to design and harder to reason about.
6. Custom wrapper class for thread-safe operations
The Atom Class: Synchronization
The Atom class is a meticulously crafted template class that encapsulates a variable, ensuring thread-safe access and modifications.
It's like putting a single shared bowl behind a gate that only one dog (thread) can pass through at a time.
#include <atomic>
#include <thread>
/**
* @class Atom
* @brief Template class representing an atomic variable with support for concurrent access and modification
*
* The Atom class is a thread-safe wrapper around a variable, designed to ensure atomic read and write operations.
* It's particularly useful in concurrent programming scenarios where multiple threads need to access or modify a shared variable without causing race conditions.
*
* @tparam T The type of the value stored in the Atom
*/
template<typename T>
class Atom {
/*
Description:
Purpose: Provides a mechanism for safe concurrent access and modification of a shared variable.
Key features:
- Thread-safe set operations guarded by a spin-lock, so only one thread can modify the value at a time.
- Implicit conversion to T to simplify usage in expressions.
@TODO and improvements:
- Performance: Implement an adaptive backoff strategy to reduce contention and CPU usage under heavy load.
- Functionality: Overload more operators as necessary (e.g., arithmetic or logical operators) to make Atom more versatile.
- Correctness: GetValue() currently reads without synchronization; for types larger than a machine word, a reader
can observe a torn value while a writer is mid-update. Guard reads the same way as writes (or restrict T to
trivially copyable, word-sized types) before relying on this in production.
*/
private:
T value;
std::atomic<bool> isEditing{false};
public:
Atom() = default;
Atom(const T& initialValue) : value(initialValue) {}
Atom(const Atom<T>& other) {
value = other.GetValue(); // Safely get the value from the other Atom
}
Atom<T>& operator=(const Atom<T>& other) {
if (this != &other) { // Prevent self-assignment
SetValue(other.GetValue()); // Safely get and set the value
}
return *this;
}
Atom<T>& operator=(const T& newValue) {
SetValue(newValue);
return *this;
}
operator T() const {
return GetValue();
}
void SetValue(const T& newValue) {
// BeginEdit can give up after its retry budget; never write without the lock
while (!BeginEdit()) {
// TODO: Implement a smarter backoff strategy here
std::this_thread::yield();
}
value = newValue;
EndEdit();
}
T GetValue() const {
// Caveat: unsynchronized read -- see the correctness note in the @TODO above
return value;
}
private:
bool BeginEdit() {
const size_t maxAttempts = 10000; // Maximum number of attempts
size_t attempts = 0;
bool expected = false;
while (!isEditing.compare_exchange_weak(expected, true, std::memory_order_acquire, std::memory_order_relaxed)) {
expected = false; // Reset expected because compare_exchange_weak may alter it
if (++attempts >= maxAttempts) {
return false; // Failed to acquire the lock after max attempts
}
std::this_thread::yield(); // Yield to reduce busy-waiting impact
}
return true; // Successfully acquired the lock
}
void EndEdit() {
isEditing.store(false, std::memory_order_release);
}
};