Multithreading & Unreal
1. Why multithreading is your best friend
Imagine you're at a buffet (yes, our journey starts with food). You've got one plate and a hundred dishes to try. That's your game without multithreading - one thing at a time, a queue longer than Black Friday. Enter Task Graph, your ticket to grabbing multiple plates and conquering that buffet like a pro.
Threading is a way of saying, "Let's do all the things, all at once, without tripping over ourselves." It's a fancy job manager that lets you throw small, digestible tasks onto various threads, handling them like a pro chef juggling flaming knives. The beauty of it? It abstracts away the nitty-gritty of thread management, letting you focus on what matters - making your game awesome.
Unreal’s multithreading approaches
FRunnable: The OG of threading in Unreal. You create a class that inherits from FRunnable, cook up some tasks, and run them on a separate thread. It's like hiring a sous-chef to take care of the side dishes while you focus on the steak.
Task Graph System: Unreal's way of saying, "Let's get organized." It allows you to queue up tasks that can run concurrently, managing dependencies like a pro project manager. It's the backbone of Unreal's concurrency and a real game-changer for complex operations.
Async Tasks: The quick and dirty way to fire off a task without getting bogged down in the nitty-gritty of thread management. Perfect for when you need to fetch data or perform a calculation without stalling the main thread.
ParallelFor: Ever wanted to speed up a loop by running iterations in parallel? ParallelFor is your friend. It slices up your loop and serves it to multiple threads, speeding up processing like a culinary ninja chopping vegetables.
Beyond Unreal: Multithreading in the Wild
The game industry at large has embraced multithreading with open arms, recognizing it as critical for leveraging modern hardware. Here are a few approaches seen across the board:
Entity Component Systems (ECS): ECS architectures are the new kids on the block, promoting data-oriented design for maximum performance. By decoupling data from logic, ECS facilitates easy multithreading, allowing operations on entities to run in parallel without a hitch. I will cover this in a future article.
Job Systems: Popularized by Unity, job systems let developers define work units (jobs) that can run concurrently, handling dependencies and synchronization behind the scenes. It's a bit like having an automated kitchen where robots prepare dishes simultaneously, supervised by a master chef.
2. Threading classes in Unreal
LET’S DIVE DEEPER
FRunnable
FRunnable is Unreal Engine's base class for creating threads. It's like drafting your very own digital worker: you tell it what job to do, and it goes off to work in the background, leaving the main thread unburdened and your game running smoother than a jazz saxophone solo.
How Does FRunnable Work? (The Stylish Code Edition)
Here's how you would write the script for your backstage hero:
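A minimal sketch of what that backstage hero might look like. The class name FPrimeWorker and the work inside Run() are made up for illustration; the Init/Run/Stop overrides are the real FRunnable interface:

```cpp
#include "HAL/Runnable.h"
#include "HAL/RunnableThread.h"
#include "HAL/ThreadSafeBool.h"

class FPrimeWorker : public FRunnable
{
public:
    FPrimeWorker()
    {
        // Spawns the system thread; it starts calling Init() then Run().
        Thread = FRunnableThread::Create(this, TEXT("FPrimeWorker"));
    }

    virtual ~FPrimeWorker() override
    {
        if (Thread)
        {
            Thread->Kill(true);   // calls Stop() and waits for Run() to return
            delete Thread;
        }
    }

    virtual bool Init() override { return true; }   // one-time setup; return false to abort

    virtual uint32 Run() override
    {
        while (!bStop)
        {
            // ... do your background work here ...
        }
        return 0;
    }

    virtual void Stop() override { bStop = true; }  // signal the loop in Run() to exit

private:
    FRunnableThread* Thread = nullptr;
    FThreadSafeBool bStop = false;
};
```

The destructor's Kill(true) is what gives you the "precision control" mentioned below: the owning code decides exactly when the thread winds down.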
Pros and Cons of Using FRunnable
Pros:
Precision Control: Like a puppet master, you have full control over the thread's lifecycle.
Power: It's Unreal Engine's most direct and potent way to handle heavy lifting in the background.
Flexibility: Whether it's data processing, loading content, or performing calculations, FRunnable is up for the task.
Cons:
Complexity: With great power comes... a bit more complexity. You'll need to manage the thread's lifecycle carefully to avoid crashes or unexpected behavior.
Responsibility: You're in charge of ensuring thread safety and managing how your thread interacts with the rest of your game, which can be a daunting task.
Overhead: Each FRunnable thread is a full-fledged system thread, which might be overkill for smaller tasks.
https://docs.unrealengine.com/4.26/en-US/API/Runtime/Core/HAL/FRunnable/
Async Task
Think of Async Tasks as the quick spellcasters of the Unreal Engine multithreading world. They're perfect for when you need to perform a small, well-defined task asynchronously, like fetching data, doing light computations, or processing input without blocking the main game thread.
How Does an Async Task Work? Imagine you're in a kitchen, and you need to whip up a quick side dish while also keeping an eye on the main course. An Async Task is like calling over a kitchen assistant to take care of the side dish swiftly.
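As a sketch, calling over that kitchen assistant with Unreal's AsyncTask might look like this (DoHeavyCalculation is a hypothetical helper, not an engine function):

```cpp
#include "Async/Async.h"

void UMyObject::StartBackgroundWork()
{
    // Fire-and-forget work on a background thread...
    AsyncTask(ENamedThreads::AnyBackgroundThreadNormalTask, []()
    {
        const int32 Result = DoHeavyCalculation();   // hypothetical heavy work

        // ...then hop back to the game thread to use the result safely.
        AsyncTask(ENamedThreads::GameThread, [Result]()
        {
            UE_LOG(LogTemp, Log, TEXT("Result ready: %d"), Result);
        });
    });
}
```

The inner AsyncTask hop is the common pattern for touching game state: anything that reads or writes UObjects should happen back on the game thread.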
Pros:
Simplicity: Easy to use, with minimal boilerplate code.
Flexibility: Choose from various named threads based on priority and nature of the task.
Convenience: Ideal for quick, one-off tasks without the need for extensive thread lifecycle management.
Cons:
Limited Control: Less control over the thread's lifecycle and execution details.
Overhead: While minimal, creating tasks involves some overhead that might be noticeable with a large number of small tasks.
Suitability: Not ideal for long-running or complex tasks requiring detailed control over threading.
Understanding [this] & [&] Lambda Captures
In C++, a lambda function is a compact way to define an anonymous function. The [this] part is called the capture list, and it dictates what from the surrounding scope is available inside the lambda. When you use [this], you're telling the lambda it can access member variables and functions of the class it's defined in, just like any other member function.
Why use [this]? Imagine your class has a private variable score that you want to update within the lambda. Capturing [this] allows the lambda to access score directly, as if it were inside a regular member function:
[=]: Captures all visible variables in the surrounding scope by value. Safe, since the lambda keeps its own copies, but those copies are snapshots taken when the lambda is created.
[&]: Captures all visible variables by reference. Efficient, but potentially dangerous if the lambda outlives the variables it references (dangling references).
[var1, &var2]: Captures var1 by value and var2 by reference. Mix and match based on your needs.
Parallel For
When you have a hefty task, like processing a large dataset or performing complex calculations on multiple game entities, Parallel For is your go-to spell. It breaks down your loop into multiple chunks, each running in parallel on separate threads, dramatically speeding up operations that would otherwise take a significant amount of time on the main thread.
How Does Parallel For Work? Imagine you're hosting a feast, and you need to chop a mountain of vegetables. Parallel For is like summoning several kitchen assistants, each taking a portion of the pile to chop simultaneously.
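A sketch of those kitchen assistants in code. Points, Origin, and Distances are assumed to exist in the surrounding scope; the key point is that each iteration writes only to its own slot, so no lock is needed:

```cpp
#include "Async/ParallelFor.h"

void ComputeDistances(const FVector& Origin, const TArray<FVector>& Points,
                      TArray<float>& OutDistances)
{
    OutDistances.SetNum(Points.Num());

    // The engine slices the index range across worker threads.
    ParallelFor(Points.Num(), [&](int32 Index)
    {
        // Independent per-index writes: each thread touches a different slot.
        OutDistances[Index] = FVector::Dist(Origin, Points[Index]);
    });
}
```

If iterations had to write to a shared accumulator instead, you'd be back in thread-safety territory (see section 3).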
Pros:
Efficiency: Massive speedup for data processing and calculations by leveraging multicore processors.
Ease of Use: Simple to implement, turning a traditional for loop into a parallelized version with minimal changes.
Scalability: Automatically scales with the number of processor cores, making your code future-proof.
Cons:
Thread Safety: Requires careful consideration of thread safety, as multiple threads might access shared resources concurrently.
Complexity: Debugging and ensuring correctness can be more challenging due to concurrent execution.
Overhead: There's overhead in distributing the work and synchronizing threads, which might not be beneficial for small datasets or tasks.
Non-Abandonable Task
For tasks that you can't simply abandon or interrupt, Unreal offers FNonAbandonableTask. This special task ensures that the work gets done, come what may. It's perfect for operations that must reach completion to maintain data integrity or ensure a sequence of actions concludes properly.
To execute an FNonAbandonableTask, you typically wrap it within an FAutoDeleteAsyncTask, allowing the engine to manage its lifecycle automatically. This way, you focus on what the task should do, not on when it should be deleted.
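As a sketch, a must-finish save operation might look like this (FSaveGameTask and its body are invented for illustration; the wrapper pattern is the engine's):

```cpp
#include "Async/AsyncWork.h"

class FSaveGameTask : public FNonAbandonableTask
{
    // FAutoDeleteAsyncTask needs access to the private members below.
    friend class FAutoDeleteAsyncTask<FSaveGameTask>;

    FString SlotName;

    explicit FSaveGameTask(FString InSlotName) : SlotName(MoveTemp(InSlotName)) {}

    void DoWork()
    {
        // ... write the save data; this runs to completion, never abandoned ...
    }

    FORCEINLINE TStatId GetStatId() const
    {
        RETURN_QUICK_DECLARE_CYCLE_STAT(FSaveGameTask, STATGROUP_ThreadPoolAsyncTasks);
    }
};

void StartSave()
{
    // The wrapper deletes itself once DoWork() finishes.
    (new FAutoDeleteAsyncTask<FSaveGameTask>(TEXT("Slot1")))->StartBackgroundTask();
}
```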
TFuture and TPromise
For scenarios where you need to perform a task asynchronously and then retrieve a result at some later point, Unreal provides its own counterpart to a C++ Standard Library feature (std::future and std::promise): TFuture and TPromise. These tools allow you to dispatch work and then "promise" to deliver a result that can be awaited with a TFuture.
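A sketch of the dispatch-then-collect pattern. ComputeExpensiveValue is a hypothetical helper; Async() returning a TFuture is the engine's API:

```cpp
#include "Async/Async.h"

TFuture<int32> PendingResult;

void KickOffWork()
{
    // The lambda's return value fulfils the promise behind the future.
    PendingResult = Async(EAsyncExecution::ThreadPool, []()
    {
        return ComputeExpensiveValue();   // hypothetical heavy computation
    });
}

void PollForResult()   // e.g. called from Tick
{
    // Poll without blocking the game thread; Get() would block until ready.
    if (PendingResult.IsValid() && PendingResult.IsReady())
    {
        const int32 Value = PendingResult.Get();
        UE_LOG(LogTemp, Log, TEXT("Got value: %d"), Value);
    }
}
```

Polling with IsReady() on Tick, rather than calling Get() immediately, is what keeps the main thread from stalling.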
Wrapping up
While we've highlighted some of the more specific threading mechanisms Unreal Engine offers, it's essential to recognize that these tools are part of a broader tapestry designed to empower developers. From managing game state updates to handling complex AI calculations and beyond, understanding when and how to leverage these threading constructs can significantly impact your game's performance and responsiveness.
Each threading approach serves different needs:
FRunnable and FNonAbandonableTask are about executing standalone tasks, with the latter providing guarantees on task completion.
AsyncTask simplifies dispatching quick, one-off tasks to various threads.
ParallelFor accelerates data processing by distributing iterations across multiple threads.
TFuture and TPromise introduce a way to work with asynchronous results, making your code cleaner and more efficient.
3. Thread safety
Understanding Errors
Common Errors from Lack of Thread Safety
Race Conditions: When threads race to read or write shared data, the outcome depends on who runs first. It's chaotic, unpredictable, and a surefire way to corrupt your data.
Deadlocks: Picture two dogs with leashes intertwined, each waiting for the other to move. Deadlocks freeze your threads when they wait indefinitely for resources locked by each other.
Livelocks: Similar to deadlocks, but here the threads are active, constantly trying to resolve contention without progress—like dogs in a tug-of-war, endlessly pulling with no victor.
Starvation: Occurs when a thread is perpetually denied access to resources because other threads hog them. It's like a pup missing mealtime because the bigger dogs always get to the bowl first.
Priority Inversion: A low-priority thread holds a lock needed by a high-priority thread, which then can't proceed, akin to a small dog hogging the toy just out of reach of a more eager, larger pup.
Detecting Thread Safety Violations
Detecting these issues can be tricky, as they often manifest under specific timing conditions or under heavy system load. Tools like thread sanitizers, debuggers with thread analysis capabilities, and logging can help identify these errors. Look for symptoms like unexplained crashes, data inconsistencies, and performance bottlenecks.
Tools for Ensuring Thread Safety
Let's delve into some of the robust tools Unreal Engine offers to protect your code from the chaos of concurrency.
FCriticalSection
The FCriticalSection class is a mutex that ensures only one thread can execute a section of code at a time. It's like having a traffic light at a busy intersection, controlling the flow to prevent accidents.
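In practice you rarely lock and unlock by hand; FScopeLock does it for you via RAII. A sketch (SharedScore and AddScore are invented for illustration):

```cpp
#include "HAL/CriticalSection.h"
#include "Misc/ScopeLock.h"

FCriticalSection ScoreLock;
int32 SharedScore = 0;

void AddScore(int32 Amount)
{
    // Acquires the lock in the constructor, releases it in the destructor,
    // even if the code in between returns early or throws.
    FScopeLock Lock(&ScoreLock);
    SharedScore += Amount;
}
```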
TAtomic / std::atomic
TAtomic in Unreal and std::atomic in the C++ Standard Library provide fundamental operations that are performed atomically, such as incrementing a counter or updating a flag. They're the equivalent of giving each dog a separate treat at the same time—no fuss, no muss.
FPlatformAtomics
FPlatformAtomics provides a suite of static functions for atomic operations that are platform-agnostic, ensuring that your atomic operations work seamlessly across different hardware.
TQueue
TQueue offers a thread-safe FIFO data structure, ensuring data is processed in the order it was added, just as if you're training pups to wait their turn for treats.
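A sketch of the usual worker-to-game-thread handoff with TQueue. By default TQueue is single-producer/single-consumer; EQueueMode::Mpsc allows many producers and one consumer:

```cpp
#include "Containers/Queue.h"

// Many worker threads enqueue, one thread (e.g. the game thread) dequeues.
TQueue<FString, EQueueMode::Mpsc> Messages;

void WorkerThreadBody()
{
    Messages.Enqueue(TEXT("Hello from a worker"));
}

void DrainOnGameThread()   // e.g. called from Tick
{
    FString Msg;
    while (Messages.Dequeue(Msg))   // returns false when the queue is empty
    {
        UE_LOG(LogTemp, Log, TEXT("%s"), *Msg);
    }
}
```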
FSpinLock (UE5)
Introduced in Unreal Engine 5, FSpinLock is a lightweight alternative to traditional mutexes that "spins" in a loop while waiting to acquire a lock. It's useful for short operations where the overhead of putting a thread to sleep is too great. It's like having a quick game of fetch while waiting for dinner to be served.
TSharedPtr (OG OF 'EM ALL)
TSharedPtr is a smart pointer that ensures the object it points to is automatically deleted when no longer in use. For cross-thread use, make sure it's the ESPMode::ThreadSafe variant (the default in UE5), so the reference counting itself is atomic—note this protects the count, not the pointed-to object's data. It's the equivalent of an auto-cleaning dog bowl—once the dogs are done eating, the bowl cleans itself up.
FEvent
FEvent is a synchronization primitive that allows threads to signal one another, much like teaching dogs to bark to signal they're ready for dinner. It's useful for coordinating the sequence of operations across threads.
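A sketch of the signal/wait handshake. Events in Unreal are borrowed from and returned to a pool rather than constructed directly:

```cpp
#include "HAL/Event.h"
#include "HAL/PlatformProcess.h"

FEvent* DataReady = FPlatformProcess::GetSynchEventFromPool();

void ProducerThread()
{
    // ... prepare the data ...
    DataReady->Trigger();   // wake up whoever is waiting
}

void ConsumerThread()
{
    DataReady->Wait();      // blocks until Trigger() is called
    // ... safe to consume the data now ...
    FPlatformProcess::ReturnSynchEventToPool(DataReady);
}
```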
Managing Shared Data
Managing shared data is an art form in itself. It involves strategies such as:
Minimizing shared state: The less shared data, the fewer chances for contention.
Immutable state: Data that doesn't change is inherently thread-safe.
Confinement: Keep data local to a thread when possible, avoiding shared access.
Copy-on-write: Threads work on copies of data, and only synchronize at critical points.
4. Pillars of software architecture - threading
A robust game architecture for multithreading stands on two foundational pillars: data integrity and performance efficiency. To achieve these, one must understand the roles of various threading mechanisms and design the game systems around them.
1. Data-Driven Design
In a multithreaded environment, data management is paramount. A data-driven architecture allows for clear separation of data and logic, minimizing dependencies and enabling easier data access across threads. This approach often leverages Entity Component System (ECS) patterns, which organically fit into a multithreaded paradigm by decoupling state from behavior.
2. Task-Based Systems
Breaking down the game logic into discrete tasks or jobs that can run independently provides a clear pathway for distributing work across threads. This granularity facilitates dynamic load balancing and reduces contention, making it easier to scale performance with the number of available cores.
3. Concurrency Control
Understanding and implementing concurrency control mechanisms is crucial. Mutexes, spinlocks, and atomic operations are the sentinels that guard against race conditions and data corruption. Each has its place:
Mutexes (FCriticalSection): Ideal for protecting large sections of critical code or complex data structures where the overhead of locking is less significant than the need for data integrity.
Spinlocks (FSpinLock in UE5): Useful in scenarios where the wait time is expected to be very short, and the overhead of putting a thread to sleep is greater than the cost of busy-waiting.
Atomics (TAtomic, std::atomic): Perfect for simple data types where operations must be indivisible, such as counters or flags.
4. Event-Driven Synchronization
Events (FEvent) and condition variables allow threads to wait for specific conditions or signals before proceeding. This pattern is useful for orchestrating complex interactions between threads, such as ensuring resources are loaded before the game logic proceeds.
5. Memory Management
In multithreading, the way memory is allocated, accessed, and deallocated can significantly impact performance. Use thread-local storage to prevent contention, and be cautious with memory allocation and deallocation across threads, which can be a source of performance bottlenecks.
5. Lock-Free vs. Locking Sync
A lot of people are confused by this one; I will try to explain it in the simplest way possible.
LOCKING
🐶 Imagine a puppy, let's call him Buddy, who has a collection of five chew toys. Buddy is quite possessive and insists on playing with each toy himself, not allowing any of the other dogs to touch them while he's playing. In software terms, Buddy has employed a "locking" mechanism or mutex. SORRY FOR THIS METAPHOR. BEST I COULD COME UP WITH.
In locking synchronization, when Buddy plays with a toy (or when a thread accesses a variable), he ensures exclusive access. No other pup (thread) can play with the same toy (modify the variable) until he's done and moves on to the next one. This approach is straightforward and ensures that only one thread modifies a variable at a time, preventing any form of data inconsistency or race conditions.
Pros:
It's a clear and easy-to-understand system.
It ensures that only one pup has a toy at a time, preventing any tussles (race conditions).
Cons:
If Buddy takes a long time, the other pups must wait, which isn't very fun (can lead to performance bottlenecks).
If Buddy never lets go of a toy after playing, no one else can ever play with it (potential for deadlocks).
Lock-free
🐶 Now, let's suppose Buddy learns to share. Instead of guarding the toys, every dog simply lunges for one; if two grab the same toy at the same instant, one succeeds and the other immediately tries again. This represents "lock-free" synchronization.
Lock-free synchronization avoids exclusive locks altogether. In programming, lock-free algorithms allow threads to work concurrently, typically using atomic operations (such as compare-and-swap) so that when two threads collide, at least one of them always makes progress—the system as a whole completes work in a finite number of steps, regardless of what any individual thread is doing.
Pros:
All pups can play simultaneously, making it a joyous and lively scene (improved performance and scalability).
There's no need for signs or complex rules (reduced overhead and complexity).
Cons:
Sometimes, two pups might go for the same toy at the exact moment, which requires quick reflexes and rules (can be complex to implement correctly).
The pups need to be well-trained and aware of each other to avoid confusion (requires careful programming to ensure thread safety).
Locking vs. Lock-Free: Choosing the Right Approach
Both approaches have their place in the software world, and understanding when to use each is key to a well-behaved program:
Locking (Mutexes): Suitable for complex operations or when working with resources that require strict ordering and safety. It's easy to understand and implement but can lead to performance bottlenecks if not managed carefully.
Lock-Free: Ideal for high-performance systems where scalability and responsiveness are critical. It avoids pitfalls like deadlocks but can be complex to design and harder to reason about.
6. Custom wrapper class for thread-safe operations
The Atom Class: Synchronization
The Atom class is a meticulously crafted template class that encapsulates a variable, ensuring thread-safe access and modification. It's like providing each dog (thread) with their own bowl (shared variable).