Multithreading & Unreal

1. Why multithreading is your best friend

Imagine you're at a buffet (yes, our journey starts with food). You've got one plate and a hundred dishes to try. That's your game without multithreading - one thing at a time, a queue longer than Black Friday. Enter multithreading, your ticket to grabbing multiple plates and conquering that buffet like a pro.

Threading is a way of saying, "Let's do all the things, all at once, without tripping over ourselves." It's a fancy job manager that lets you throw small, digestible tasks onto various threads, handling them like a pro chef juggling flaming knives. The beauty of it? It abstracts away the nitty-gritty of thread management, letting you focus on what matters - making your game awesome.

Unreal’s multithreading approach

  1. FRunnable: The OG of threading in Unreal. You create a class that inherits from FRunnable, cook up some tasks, and run them on a separate thread. It's like hiring a sous-chef to take care of the side dishes while you focus on the steak.

  2. Task Graph System: Unreal's way of saying, "Let's get organized." It allows you to queue up tasks that can run concurrently, managing dependencies like a pro project manager. It's the backbone of Unreal's concurrency and a real game-changer for complex operations.

  3. Async Tasks: The quick and dirty way to fire off a task without getting bogged down in the nitty-gritty of thread management. Perfect for when you need to fetch data or perform a calculation without stalling the main thread.

  4. ParallelFor: Ever wanted to speed up a loop by running iterations in parallel? ParallelFor is your friend. It slices up your loop and serves it to multiple threads, speeding up processing like a culinary ninja chopping vegetables.

Beyond Unreal: Multithreading in the Wild

The game industry at large has embraced multithreading with open arms, recognizing it as critical for leveraging modern hardware. Here are a few approaches seen across the board:

  1. Entity Component Systems (ECS): ECS architectures are the new kids on the block, promoting data-oriented design for maximum performance. By decoupling data from logic, ECS facilitates easy multithreading, allowing operations on entities to run in parallel without a hitch. I will cover this in a future post.

  2. Job Systems: Popularized by Unity, job systems let developers define work units (jobs) that can run concurrently, handling dependencies and synchronization behind the scenes. It's a bit like having an automated kitchen where robots prepare dishes simultaneously, supervised by a master chef.


2. Threading classes & Unreal

LET’S DIVE DEEPER

FRunnable

FRunnable is Unreal Engine's base class for creating threads. It's like drafting your very own digital worker; you tell it what job to do, and it goes off to work in the background, leaving the main thread unburdened and your game running smoother than a jazz saxophone solo.

How Does FRunnable Work? (The Stylish Code Edition)

Here's how you would write the script for your backstage hero:
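A minimal sketch (the class and variable names are my own, and it needs to live inside an Unreal module to compile):

```cpp
#include "HAL/Runnable.h"
#include "HAL/RunnableThread.h"
#include "HAL/ThreadSafeBool.h"
#include "HAL/PlatformProcess.h"

// Illustrative FRunnable worker: Init() runs on the new thread before Run(),
// and Stop() is your signal to bail out of the work loop.
class FBackgroundWorker : public FRunnable
{
public:
    FBackgroundWorker()
    {
        // Spawn the system thread; it will call Init() and then Run().
        Thread = FRunnableThread::Create(this, TEXT("FBackgroundWorker"));
    }

    virtual ~FBackgroundWorker() override
    {
        if (Thread)
        {
            Thread->Kill(true);   // true = wait for Run() to return first
            delete Thread;
        }
    }

    virtual bool Init() override
    {
        bRunning = true;
        return true;              // returning false would abort the thread
    }

    virtual uint32 Run() override
    {
        while (bRunning)
        {
            // ... do your heavy background work here ...
            FPlatformProcess::Sleep(0.01f);
        }
        return 0;
    }

    virtual void Stop() override
    {
        bRunning = false;         // Run() sees this and exits its loop
    }

private:
    FRunnableThread* Thread = nullptr;
    FThreadSafeBool bRunning = false;
};
```

Stop() flips the flag, Run() exits its loop, and Kill(true) in the destructor waits for the thread to finish before tearing anything down - skip that wait and you're flirting with a crash.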

Pros and Cons of Using FRunnable

Pros:

  • Precision Control: Like a puppet master, you have full control over the thread's lifecycle.

  • Power: It's Unreal Engine's most direct and potent way to handle heavy lifting in the background.

  • Flexibility: Whether it's data processing, loading content, or performing calculations, FRunnable is up for the task.

Cons:

  • Complexity: With great power comes... a bit more complexity. You'll need to manage the thread's lifecycle carefully to avoid crashes or unexpected behavior.

  • Responsibility: You're in charge of ensuring thread safety and managing how your thread interacts with the rest of your game, which can be a daunting task.

  • Overhead: Each FRunnable thread is a full-fledged system thread, which might be overkill for smaller tasks.

https://docs.unrealengine.com/4.26/en-US/API/Runtime/Core/HAL/FRunnable/

Async Task

Think of Async Tasks as the quick spellcasters of the Unreal Engine multithreading world. They're perfect for when you need to perform a small, well-defined task asynchronously, like fetching data, doing light computations, or processing input without blocking the main game thread.

How Does an Async Task Work? Imagine you're in a kitchen, and you need to whip up a quick side dish while also keeping an eye on the main course. An Async Task is like calling over a kitchen assistant to take care of the side dish swiftly.
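A minimal sketch of that kitchen assistant using Unreal's AsyncTask function (the actor class and both helper functions are hypothetical):

```cpp
#include "Async/Async.h"

// Hypothetical actor method: run a calculation off the game thread, then
// hop back to the game thread to publish the result.
void AMyActor::FetchScoreAsync()
{
    AsyncTask(ENamedThreads::AnyBackgroundThreadNormalTask, [this]()
    {
        const int32 Result = DoExpensiveCalculation(); // hypothetical helper

        // Never touch actors, UI, or other UObjects from a worker thread -
        // queue a second task back onto the game thread instead.
        AsyncTask(ENamedThreads::GameThread, [this, Result]()
        {
            OnScoreReady(Result); // hypothetical helper
        });
    });
}
```

In real code you would guard `this` with something like a TWeakObjectPtr, since the actor could be destroyed before the task runs.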

Pros:

  • Simplicity: Easy to use, with minimal boilerplate code.

  • Flexibility: Choose from various named threads based on priority and nature of the task.

  • Convenience: Ideal for quick, one-off tasks without the need for extensive thread lifecycle management.

Cons:

  • Limited Control: Less control over the thread's lifecycle and execution details.

  • Overhead: While minimal, creating tasks involves some overhead that might be noticeable with a large number of small tasks.

  • Suitability: Not ideal for long-running or complex tasks requiring detailed control over threading.

Understanding [this], [=], and [&] Lambda Captures

In C++, a lambda function is a compact way to define an anonymous function. The [this] part is called the capture list, and it dictates what from the surrounding scope is available inside the lambda function. When you use [this], you're telling the lambda it can access member variables and functions of the class it's defined in, just like any other member function.

Why use [this]? Imagine your class has a private variable score that you want to update within the lambda. Capturing [this] allows the lambda to access score directly, as if it were inside a regular member function. The other common capture options are:

  • [=]: Captures all visible variables in the surrounding scope by value. Safe from dangling, since the lambda owns its copies, but those copies are frozen at the moment the lambda is created.

  • [&]: Captures all visible variables by reference. Efficient but potentially dangerous if the lambda outlives the variables it references.

  • [var1, &var2]: Captures var1 by value and var2 by reference. Mix and match based on needs.
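A small, self-contained illustration of these captures in plain C++ (ScoreBoard and the variable names are made up for the example):

```cpp
#include <cassert>

// Illustrative class showing why [this] is needed to touch members.
class ScoreBoard {
public:
    int score = 0;

    void AddPoints(int points) {
        // [this] lets the lambda read and write the member 'score'.
        auto bump = [this, points]() { score += points; };
        bump();
    }
};

// By-value vs by-reference capture side by side.
inline int CaptureDemo() {
    int byValue = 1;
    int byRef = 1;
    // byValue is copied now; byRef stays linked to the original variable.
    auto lambda = [byValue, &byRef]() { return byValue + (byRef += 10); };
    int result = lambda();   // byRef becomes 11, result is 1 + 11 = 12
    return result + byRef;   // 12 + 11 = 23
}
```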

ParallelFor

When you have a hefty task, like processing a large dataset or performing complex calculations on multiple game entities, Parallel For is your go-to spell. It breaks down your loop into multiple chunks, each running in parallel on separate threads, dramatically speeding up operations that would otherwise take a significant amount of time on the main thread.

How Does Parallel For Work? Imagine you're hosting a feast, and you need to chop a mountain of vegetables. Parallel For is like summoning several kitchen assistants, each taking a portion of the pile to chop simultaneously.
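Here's what summoning those assistants looks like with Unreal's ParallelFor (the SquareAll function is illustrative, and it compiles only inside an Unreal module):

```cpp
#include "Async/ParallelFor.h"

// Squares every element in place. Each index is handled by exactly one
// thread, so as long as iterations don't share state, no locks are needed.
void SquareAll(TArray<float>& Values)
{
    ParallelFor(Values.Num(), [&Values](int32 Index)
    {
        Values[Index] = Values[Index] * Values[Index];
    });
}
```

The key safety property: iteration N only ever touches Values[N], so there is no shared state between threads.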

Pros:

  • Efficiency: Massive speedup for data processing and calculations by leveraging multicore processors.

  • Ease of Use: Simple to implement, turning a traditional for loop into a parallelized version with minimal changes.

  • Scalability: Automatically scales with the number of processor cores, making your code future-proof.

Cons:

  • Thread Safety: Requires careful consideration of thread safety, as multiple threads might access shared resources concurrently.

  • Complexity: Debugging and ensuring correctness can be more challenging due to concurrent execution.

  • Overhead: There's overhead in distributing the work and synchronizing threads, which might not be beneficial for small datasets or tasks.

FNonAbandonableTask

For tasks that you can't simply abandon or interrupt, Unreal offers FNonAbandonableTask. This special task ensures that the work gets done, come what may. It's perfect for operations that must reach completion to maintain data integrity or ensure a sequence of actions concludes properly.

To execute an FNonAbandonableTask, you typically wrap it within an FAutoDeleteAsyncTask, allowing the engine to manage its lifecycle automatically. This way, you focus on what the task should do, not on when it should be deleted.
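A sketch of that pattern (the save-game task is a made-up example; this compiles only inside an Unreal module):

```cpp
#include "Async/AsyncWork.h"

// Illustrative task whose DoWork() must run to completion - the thread pool
// will not abandon it once started.
class FSaveGameTask : public FNonAbandonableTask
{
    friend class FAutoDeleteAsyncTask<FSaveGameTask>;

    FString SlotName;

    FSaveGameTask(const FString& InSlotName) : SlotName(InSlotName) {}

    void DoWork()
    {
        // ... write the save data for SlotName ...
    }

    FORCEINLINE TStatId GetStatId() const
    {
        RETURN_QUICK_DECLARE_CYCLE_STAT(FSaveGameTask, STATGROUP_ThreadPoolAsyncTasks);
    }
};

// Fire and forget: FAutoDeleteAsyncTask deletes itself after DoWork() finishes.
void SaveInBackground()
{
    (new FAutoDeleteAsyncTask<FSaveGameTask>(TEXT("Slot1")))->StartBackgroundTask();
}
```

Because FAutoDeleteAsyncTask owns its own lifetime, you never hold a pointer to it after StartBackgroundTask().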

TFuture and TPromise

For scenarios where you need to perform a task asynchronously and then retrieve a result at some later point, Unreal provides its own take on a familiar C++ Standard Library pattern: TFuture and TPromise, the engine's counterparts to std::future and std::promise. These tools allow you to dispatch work and then "promise" to deliver a result that can be awaited with a TFuture.
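Since TFuture and TPromise mirror std::future and std::promise, the handshake is easiest to show in portable standard C++ (ComputeAsync is an illustrative name):

```cpp
#include <cassert>
#include <future>
#include <thread>

// The promise/future handshake: a worker fulfills the promise, and the
// caller blocks on the future until the result arrives.
inline int ComputeAsync(int Input) {
    std::promise<int> Promise;
    std::future<int> Future = Promise.get_future();

    std::thread Worker([&Promise, Input]() {
        Promise.set_value(Input * Input);   // deliver the result
    });

    int Result = Future.get();              // blocks until set_value() runs
    Worker.join();
    return Result;
}
```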

Wrapping up

While we've highlighted some of the more specific threading mechanisms Unreal Engine offers, it's essential to recognize that these tools are part of a broader tapestry designed to empower developers. From managing game state updates to handling complex AI calculations and beyond, understanding when and how to leverage these threading constructs can significantly impact your game's performance and responsiveness.

Each threading approach serves different needs:

  • FRunnable and FNonAbandonableTask are about executing standalone tasks, with the latter providing guarantees on task completion.

  • AsyncTask simplifies dispatching quick, one-off tasks to various threads.

  • ParallelFor accelerates data processing by distributing iterations across multiple threads.

  • TFuture and TPromise introduce a way to work with asynchronous results, making your code cleaner and more efficient.


3. Thread safety

“too many threads, not enough mutexes”

🐶 Thread issues are like having a litter of puppies fighting over a toy (the shared variable). You can bet there's going to be a tussle, and maybe a few yelps, when two pups go for the same toy at the same time. We can synchronize their play by letting puppy A play before puppy B, or we can provide another toy (a snapshot) for puppy B. In the world of multithreading, this situation is what we call a race condition.

  1. Mutexes (Locks): Implement a feeding schedule where only one dog can access the bowl at a time.

  2. Atomic Operations: Just like giving each pup a bite-sized treat at the same time to prevent squabbling, atomic operations ensure that certain computations on data are completed as indivisible steps.

  3. Thread-Local Storage: Give each puppy its own bowl. This is similar to thread-local storage where each thread has its own copy of a variable, preventing interference from others.

  4. Immutable Objects: Sometimes, the best toy is the one that can't be destroyed, no matter how much the puppies tug on it. In programming, immutable objects can be safely shared between threads without needing synchronization because they cannot be modified after creation.

Understanding Errors

Common Errors from Lack of Thread Safety

  • Race Conditions: When threads race to read or write shared data, the outcome depends on who runs first. It's chaotic, unpredictable, and a surefire way to corrupt your data.

  • Deadlocks: Picture two dogs with leashes intertwined, each waiting for the other to move. Deadlocks freeze your threads when they wait indefinitely for resources locked by each other.

  • Livelocks: Similar to deadlocks, but here the threads are active, constantly trying to resolve contention without progress—like dogs in a tug-of-war, endlessly pulling with no victor.

  • Starvation: Occurs when a thread is perpetually denied access to resources because other threads hog them. It's like a pup missing mealtime because the bigger dogs always get to the bowl first.

  • Priority Inversion: A low-priority thread holds a lock needed by a high-priority thread, which then can't proceed, akin to a small dog hogging the toy just out of reach of a more eager, larger pup.

Detecting Thread Safety Violations

Detecting these issues can be tricky, as they often manifest under specific timing conditions or under heavy system load. Tools like thread sanitizers, debuggers with thread analysis capabilities, and logging can help identify these errors. Look for symptoms like unexplained crashes, data inconsistencies, and performance bottlenecks.

Tools for Ensuring Thread Safety

Let's delve into some of the robust tools Unreal Engine offers to protect your code from the chaos of concurrency.

FCriticalSection

The FCriticalSection class is a mutex that ensures that only one thread can execute a section of code at a time. It's like having a traffic light at a busy intersection, controlling the flow to prevent accidents.
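In practice you rarely Lock() and Unlock() by hand; FScopeLock does it for you. A short sketch (FScoreKeeper is a made-up example, compiling only inside an Unreal module):

```cpp
#include "HAL/CriticalSection.h"
#include "Misc/ScopeLock.h"

// FScopeLock acquires the mutex on construction and releases it on
// destruction, so early returns can't leave the section locked.
class FScoreKeeper
{
public:
    void AddScore(int32 Points)
    {
        FScopeLock Lock(&ScoreMutex);   // one thread at a time past this line
        TotalScore += Points;
    }

private:
    FCriticalSection ScoreMutex;
    int32 TotalScore = 0;
};
```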

TAtomic / std::atomic

TAtomic in Unreal and std::atomic in C++ Standard Library provide fundamental operations that are performed atomically, such as incrementing a counter or updating a flag. They're the equivalent of giving each dog a separate treat at the same time—no fuss, no muss.
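A self-contained demonstration in standard C++: two threads hammer one counter, and because each increment is atomic, none are lost (the function name is my own):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// With a plain int, the two threads would race and drop updates.
// std::atomic makes each fetch_add indivisible, so the total is exact.
inline int CountToTenThousand() {
    std::atomic<int> Counter{0};

    auto Work = [&Counter]() {
        for (int i = 0; i < 5000; ++i) {
            Counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread A(Work);
    std::thread B(Work);
    A.join();
    B.join();
    return Counter.load();
}
```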

FPlatformAtomics

FPlatformAtomics provides a suite of static functions for atomic operations that are platform-agnostic, ensuring that your atomic operations work seamlessly across different hardware.

TQueue

TQueue offers a thread-safe FIFO data structure (in single-producer or multiple-producer flavors via EQueueMode), ensuring data is processed in the order it was added, just as if you're training pups to wait their turn for treats.

Spinlock (UE5)

Introduced in Unreal Engine 5, FSpinLock is a lightweight alternative to traditional mutexes that "spins" in a loop while waiting to acquire a lock, useful for short operations where the overhead of putting a thread to sleep is too great. It's like having a quick game of fetch while waiting for dinner to be served.

TSharedPtr (OG OF EM ALL)

TSharedPtr is a smart pointer that ensures the object it points to is automatically deleted when no longer in use. With ESPMode::ThreadSafe (the default in UE5), the reference count itself is updated atomically, so the pointer can be copied and destroyed across threads safely - though the object it points at still needs its own protection. For lifetime management, it's the equivalent of an auto-cleaning dog bowl: once the dogs are done eating, the bowl cleans itself up.

FEvent

FEvent is a synchronization primitive that allows threads to signal one another, much like teaching dogs to bark to signal they're ready for dinner. It's useful for coordinating the sequence of operations across threads.

Managing Shared Data

Managing shared data is an art form in itself. It involves strategies such as:

  • Minimizing shared state: The less shared data, the fewer chances for contention.

  • Immutable state: Data that doesn't change is inherently thread-safe.

  • Confinement: Keep data local to a thread when possible, avoiding shared access.

  • Copy-on-write: Threads work on copies of data, and only synchronize at critical points.

4. Pillars of software architecture - threading

A robust game architecture for multithreading stands on two foundational pillars: data integrity and performance efficiency. To achieve these, one must understand the roles of various threading mechanisms and design the game systems around them.

1. Data-Driven Design

In a multithreaded environment, data management is paramount. A data-driven architecture allows for clear separation of data and logic, minimizing dependencies and enabling easier data access across threads. This approach often leverages Entity Component System (ECS) patterns, which organically fit into a multithreaded paradigm by decoupling state from behavior.

2. Task-Based Systems

Breaking down the game logic into discrete tasks or jobs that can run independently provides a clear pathway for distributing work across threads. This granularity facilitates dynamic load balancing and reduces contention, making it easier to scale performance with the number of available cores.

3. Concurrency Control

Understanding and implementing concurrency control mechanisms is crucial. Mutexes, spinlocks, and atomic operations are the sentinels that guard against race conditions and data corruption. Each has its place:

  • Mutexes (FCriticalSection): Ideal for protecting large sections of critical code or complex data structures where the overhead of locking is less significant than the need for data integrity.

  • Spinlocks (FSpinLock in UE5): Useful in scenarios where the wait time is expected to be very short, and the overhead of putting a thread to sleep is greater than the cost of busy-waiting.

  • Atomics (TAtomic, std::atomic): Perfect for simple data types where operations must be indivisible, such as counters or flags.

4. Event-Driven Synchronization

Events (FEvent) and condition variables allow threads to wait for specific conditions or signals before proceeding. This pattern is useful for orchestrating complex interactions between threads, such as ensuring resources are loaded before the game logic proceeds.

5. Memory Management

In multithreading, the way memory is allocated, accessed, and deallocated can significantly impact performance. Use thread-local storage to prevent contention, and be cautious with memory allocation and deallocation across threads, which can be a source of performance bottlenecks.

5. Lock-Free VS Locking Sync

A lot of people are confused by this one, so I'll try to explain it in the simplest way possible.

LOCKING

🐶 Imagine a puppy, let's call him Buddy, who has a collection of five chew toys. Buddy is quite possessive and insists on playing with each toy himself, not allowing any of the other dogs to touch them while he's playing. In software terms, Buddy has employed a "locking" mechanism or mutex. SORRY FOR THIS METAPHOR. BEST I COULD COME UP WITH.

In locking synchronization, when Buddy plays with a toy (or when a thread accesses a variable), he ensures exclusive access. No other pup (thread) can play with the same toy (modify the variable) until he's done and moves on to the next one. This approach is straightforward and ensures that only one thread modifies a variable at a time, preventing any form of data inconsistency or race conditions.

Pros:

  • It's a clear and easy-to-understand system.

  • It ensures that only one pup has the bone at a time, preventing any tussles (race conditions).

Cons:

  • If Buddy takes a long time with a toy, the other pups must wait, which isn't very fun (can lead to performance bottlenecks).

  • If Buddy forgets to give up a toy after playing, no one else can ever play with it (potential for deadlocks).

Lock-free

🐶 Now, let's suppose Buddy loosens up. Any dog can go for any toy at any time; if two pups grab the same toy at the same instant, one simply lets go and immediately tries again. This represents "lock-free" synchronization.

Lock-free synchronization doesn't enforce turns or exclusive ownership. In programming, lock-free algorithms allow threads to work concurrently without exclusive locks: instead of blocking, they rely on atomic operations (typically compare-and-swap) so that the system as a whole always makes progress in a finite number of steps, regardless of what any individual thread is doing.
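The reflex that lets a pup "let go and try again" is, in code, a compare-and-swap retry loop. A tiny portable example (LockFreeDouble is an illustrative name):

```cpp
#include <atomic>
#include <cassert>

// The core move of lock-free algorithms: read the value, compute the new
// one, then publish it only if nobody changed it in between. On failure,
// compare_exchange_weak reloads Observed and we simply try again.
inline int LockFreeDouble(std::atomic<int>& Value) {
    int Observed = Value.load();
    while (!Value.compare_exchange_weak(Observed, Observed * 2)) {
        // Observed was stale; it has been refreshed, so loop and retry.
    }
    return Observed * 2;   // the value we successfully published
}
```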

Pros:

  • All pups can play simultaneously, making it a joyous and lively scene (improved performance and scalability).

  • There's no need for signs or complex rules (reduced overhead and complexity).

Cons:

  • Sometimes, two pups might go for the same toy at the exact moment, which requires quick reflexes and rules (can be complex to implement correctly).

  • The pups need to be well-trained and aware of each other to avoid confusion (requires careful programming to ensure thread safety).

Locking vs. Lock-Free: Choosing the Right Approach

Both approaches have their place in the software world, and understanding when to use each is key to a well-behaved program:

  • Locking (Mutexes): Suitable for complex operations or when working with resources that require strict ordering and safety. It's easy to understand and implement but can lead to performance bottlenecks if not managed carefully.

  • Lock-Free: Ideal for high-performance systems where scalability and responsiveness are critical. It avoids pitfalls like deadlocks but can be complex to design and harder to reason about.

6. Custom wrapper class for thread-safe operations

The Atom Class: Synchronization

The Atom class is a meticulously crafted template class that encapsulates a variable, ensuring thread-safe access and modifications.
It's like putting a single gate on the shared bowl (the variable) so only one dog (thread) can eat at a time.
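One plausible sketch of such a wrapper in portable C++ (the design details are my guess; in Unreal you would swap std::mutex and std::lock_guard for FCriticalSection and FScopeLock):

```cpp
#include <cassert>
#include <mutex>

// Atom-style wrapper: every read and write goes through a mutex, so callers
// can never race on the raw value.
template <typename T>
class TAtom {
public:
    explicit TAtom(T Initial) : Value(Initial) {}

    T Get() const {
        std::lock_guard<std::mutex> Lock(Mutex);
        return Value;
    }

    void Set(T NewValue) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Value = NewValue;
    }

    // Apply an arbitrary read-modify-write as one indivisible step.
    template <typename F>
    void Modify(F&& Func) {
        std::lock_guard<std::mutex> Lock(Mutex);
        Func(Value);
    }

private:
    mutable std::mutex Mutex;
    T Value;
};
```

The Modify() hook matters: separate Get() and Set() calls would let another thread sneak in between them, while Modify() holds the lock across the whole read-modify-write.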
