Posts on fahersto's blog

Reflecting on game engine development

Fri, 12 Dec 2025 00:00:00 +0000

Game engine tech stack

As some of you may know, I have spent a considerable amount of time developing a game engine in C++. Toward the end of the year, I want to take some time to reflect on where I’m at and how I plan to move forward. Let us begin with a list of libraries that are currently used:

Rendering: OpenGL
Physics: Jolt
Audio: FMOD
GUI: ImGui, ImGuizmo, ImNodes
Text: msdf-gen
Window: GLFW
Serialization: yaml-cpp
Testing: googletest
Scripting: sol3
Navigation: recastnavigation
Asset Importing: assimp, ufbx
Logging: spdlog
Profiling: tracy
Graphics Debugging: RenderDoc

Feature showreel

Building on these libraries, we implemented several features. Here are some of them.

PBR Rendering

Physically based rendering has become the standard for rendering realistic graphics. Therefore, it is the default rendering method in our game engine. We even support editing the PBR shader using a node-based approach, which many developers have come to expect ever since Unreal Engine introduced its blueprints.

Particle Systems

Another fundamental feature of a game engine is a particle system. Even with a relatively small set of configurable parameters, a wide variety of effects can be achieved. Our system allows configuration of texture, size, lifetime, color, velocity, and gravity, all over the lifetime of a particle. Additionally, each parameter can have a variance that is applied to each emitted particle individually. This enables effects such as hearts, which move upward and sideways at different speeds while shrinking over time, as can be seen below.

The same system can also be used to create a fountain-like effect, where each particle is animated over time, affected by gravity, and initialized with a randomized velocity.
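To make the idea of a parameter that is animated over a particle's lifetime and varied per particle more concrete, here is a minimal sketch. The types `AnimatedParam` and `ParticleParam` and the functions `Emit` and `Evaluate` are hypothetical names made up for this example, and applying the variance as a single random offset to both the start and end value is an assumption, not necessarily how the engine does it.

```cpp
#include <cassert>
#include <random>

// Hypothetical description of one animated parameter (e.g. size):
// it moves from `start` to `end` over the particle's lifetime, and every
// emitted particle gets a random offset of up to +/- `variance`.
struct AnimatedParam
{
    float start;
    float end;
    float variance;
};

// Per-particle copy of the parameter with the random offset already applied.
struct ParticleParam
{
    float start;
    float end;
};

// Rolls the random offset once, at emission time (assumption of this sketch).
template <typename Rng>
ParticleParam Emit(const AnimatedParam& param, Rng& rng)
{
    std::uniform_real_distribution<float> offset(-param.variance, param.variance);
    const float o = offset(rng);
    return {param.start + o, param.end + o};
}

// Linearly interpolates the parameter for a particle of the given age.
float Evaluate(const ParticleParam& p, float age, float lifetime)
{
    assert(lifetime > 0.0f);
    float t = age / lifetime;
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return p.start + (p.end - p.start) * t;
}
```

For the shrinking hearts, `size` would be something like `AnimatedParam{1.0f, 0.0f, 0.25f}`. The velocity would get its own parameter with a larger variance so that the particles drift apart.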

Animations

Implementing animation support has been one of the most time-consuming aspects of reaching the current state of the engine. There are many different asset file formats that support animations, and even within a single format there are often multiple ways to represent movement. Combined with additional details such as scale calculations that depend on the node hierarchy, there are many cases to cover before animations are handled properly.

When testing a new model, our first step is to verify whether other tools or websites can import it correctly. There is an overwhelming number of free models with animations that did not work properly in any of the other tools we tested. Based on our testing, Sketchfab provides one of the most robust online model viewers. As a result, our goal was to support any model inside the engine that also works correctly on their platform.

In our opinion, it is always worth creating debug visualizations. The following GIF shows bone weight visualization of a dancing stormtrooper driven by skeletal animation.

Additionally, the engine provides visualization of the currently selected mesh, as well as options to display the skeleton and render the mesh in wireframe mode.


Physics

Our Jolt physics integration is still in its early stages. Determinism issues were one of the main reasons we introduced testing. At the moment, the engine supports several collision shapes. In the following video, a mesh collision shape interacts with the floor in multiple scenarios. If the collision response appears incorrect, this is due to poorly configured physics materials and the fact that the simulation body of the mesh does not match the visual representation particularly well.


Text rendering

Text rendering turned out to be a far more involved process than we initially expected. For that reason, there is a separate blog post that discusses this topic in greater detail. For this post, simply enjoy this text component of our game engine. Text can be freely rotated, translated, and scaled in the editor while remaining crisp at all times thanks to MSDF-based text rendering. You also get an outline essentially for free when using this rendering method.

Limitations

Since this is a solo project, almost none of the engine’s features are fully finished. However, the following are some of the most significant limitations that currently hinder creating and shipping a game with the engine.

Lights
Only a single directional light and up to four point lights are supported. This is mainly because the renderer is a simple forward renderer: for each rendered mesh, every light in the scene has to be iterated over. This does not scale. We do not regret starting this project with such a simple renderer, though, since the learning curve of graphics programming is steep enough even with a simple rendering approach.

Shipping
There is no automated packaging or asset cooking pipeline. When shipping a game, it is currently the responsibility of the developer to copy all required assets manually to the right folder. This does not scale and is error-prone.

Regrets

Our main motivation for writing this post is to force ourselves to reflect on issues we encountered during game engine development, in the hope of helping others and our future selves.

Testing

By far our biggest regret is introducing testing far too late in the development process. But why?

We started this project roughly five years ago. At the time, I was still at university and had little understanding of how to write code that can be maintained and extended over many years. In my opinion, testing is an integral part of writing high-quality software, a mindset that has since been reinforced by my day job as a Java software engineer.
Testing a game engine is not straightforward. Most of my professional experience is with Java Spring applications, which can be tested quickly and easily using tools such as JUnit. Spinning up an entire game engine as part of a test suite is a new task for me. We also introduced many side effects, for example by having too much global state. As a result, tests can behave differently depending on the order in which they run.
Testing is not inherently fun, and this is a hobby project. That said, we have since experienced the satisfaction of being able to make large changes without constantly worrying about breaking core functionality, thanks to a solid test suite. The longer a project lives and grows, the more value and enjoyment you get out of the tests that were previously written.
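One way to keep tests independent of their execution order despite global state is to snapshot that state at the start of every test and restore it at the end. The following is only a sketch of that idea in plain C++ without a test framework. `EngineState`, `GlobalEngineState` and `StateGuard` are hypothetical stand-ins and not part of the engine.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for engine-wide global state.
struct EngineState
{
    std::vector<std::string> loadedScenes;
};

EngineState& GlobalEngineState()
{
    static EngineState state;
    return state;
}

// RAII guard: copies the global state on construction and restores it on
// destruction, so changes made by one test cannot leak into the next one.
class StateGuard
{
public:
    StateGuard() : saved_(GlobalEngineState()) {}
    ~StateGuard() { GlobalEngineState() = saved_; }
    StateGuard(const StateGuard&) = delete;
    StateGuard& operator=(const StateGuard&) = delete;

private:
    EngineState saved_;
};

void TestLoadScene()
{
    StateGuard guard;
    GlobalEngineState().loadedScenes.push_back("level1");
    assert(GlobalEngineState().loadedScenes.size() == 1);
}

void TestStartsEmpty()
{
    StateGuard guard;
    // Without the guard in TestLoadScene this would fail whenever
    // TestLoadScene happens to run first.
    assert(GlobalEngineState().loadedScenes.empty());
}
```

Copying the whole state is only practical while it is small. For anything larger, removing the global state is probably the better long-term fix.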

Moving too quickly

A game engine is a really nice hobby project because there are so many things to implement that you can learn a lot of different things and switch topics whenever you feel like it. There is a downside to working on a big software project like a game engine with this mindset, though. You (or at least I) tend to get things to a ‘working’ state instead of a correct state. The shortcuts you took years ago tend to catch you off guard much later on. Additionally, writing good documentation is crucial. We spent so much time debugging issues that in the end turned out to be just a function that did not properly handle a certain case, had a confusing name or was not implemented at all. Switching topics really encourages such issues to pile up. One thing that could help in such cases is writing tickets with acceptance criteria. At that point it just becomes too close to work for my liking. This is supposed to be a hobby you mostly do for fun, after all.

Assimp

Assimp is great because it supports a large number of file formats that can be used to import assets into a game engine. However, assimp is extremely slow in debug mode. Additionally, almost every time we updated to a newer assimp version and re-imported the same .fbx models, something broke. While this is not entirely assimp’s fault (FBX itself is notoriously complex) it was still very frustrating and time-consuming to fix.

Skeleton animations in particular often broke between versions, and debugging issues with assimp is painful due to its terrible performance in debug builds. At this point, we no longer use assimp at all. Instead, we rely on specialized libraries that focus on a single file format and aim to get that one format right, rather than providing a high-level abstraction that has to support many formats at once. So for FBX files we integrated ufbx.

Future

Graphics API

We really want to move to a newer graphics API. The global state of OpenGL and its single-threaded context have been limiting testing as well as performance. However, we do not regret having started with OpenGL. The learning curve of Vulkan is so much steeper, and it takes much more upfront work to get the first couple of hits of dopamine when you actually see something on screen.

Linux support

We really want to support Linux, since gaming is the chain that forces us to still run Windows. Due to a specific feature of our game engine, Linux support has not been feasible with OpenGL. Since we decided to move to Vulkan in the next iteration of the engine, we should be able to support Linux!

Releasing a game

One of our main goals when creating a game engine is to ship a game that is built with the engine. Sadly, this goal will have to be postponed by a couple of years because of the decision to move to Vulkan alone, since this is a new technology to us.

Game engine 2.0

Over the past couple of years we have learned a lot about game engine development. This allows us to make better design decisions regarding game engine architecture. Therefore, we decided to not only switch from OpenGL to Vulkan but also rewrite the entire game engine from scratch. We will take the lessons learned and reference the code of our first version to create a much more robust game engine that can actually be used to release a game.

Process Doppelgänging - code injection technique

Sun, 03 Dec 2023 00:00:00 +0000

What is Process Doppelgänging?

Process Doppelgänging is a code injection technique that allows loading and executing arbitrary code in the context of a benign process without calling the Windows API functions commonly invoked to achieve code injection. The technique was published by Tal Liberman and Eugene Kogan at Black Hat Europe 2017. The concept is to abuse NTFS transactions to create a process from a malicious section that is seemingly backed by a benign file. From an attacker’s point of view this is desirable, as antivirus software may fail to scan the code of the malicious section and analyze the content of the benign file instead.

The image above illustrates the concept of Process Doppelgänging by creating a malicious notepad process from an overwritten svchost executable. The first step is opening a benign binary file such as svchost.exe in a transaction. In the next step, the file is overwritten by the malicious payload which is then mapped into memory. Subsequently, the transaction is rolled back, which reverts the changes on the file system. The section mapped into memory is not affected by this rollback. This results in a payload section in memory that is linked to a benign file such as svchost.exe. Then, a process based on the payload section is created by calling the low level Windows API function NtCreateProcessEx. Finally, the attacker creates the first thread of the process to start execution at the start address of the payload.

This technique still works on Windows 7 at the time of writing, but has been patched on Windows 10, where a Windows Defender driver blocks process creation from files with pending transactions. Since this method requires direct calls to low level Windows API functions, support for WOW64 must be implemented manually. We could not find an implementation providing WOW64 support. According to malware analyst hasherezade, implementing WOW64 support is feasible but tedious, since it requires manually filling in many parameters that otherwise are handled by the Windows loader.

In 2018 security researchers at Kaspersky Lab identified that the SynAck ransomware employed this technique. Later, security researchers from enSilo uncovered that more than 20 malware families had already employed this technique in 2018. They discovered a loader they named TxHollower to bring Process Doppelgänging to malware families such as FormBook, LokiBot and SmokeLoader. We suspect that malware performs Process Doppelgänging when it invokes the NTFS transaction API since transactions are rarely created. We dump the process when NtCreateThreadEx is called to start execution.
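The detection heuristic from the previous paragraph can be sketched as a tiny state machine on the analysis side. This is only an illustration: `DoppelgangingMonitor` and `OnApiCall` are hypothetical names, and the sketch assumes that some API monitor already reports the names of the functions a process calls. It flags the process once a transaction is created and signals that a dump should be taken when NtCreateThreadEx follows.

```cpp
#include <cassert>
#include <string>

// Hypothetical monitor that is fed the names of the API calls intercepted
// for a single process, in the order in which they occur.
class DoppelgangingMonitor
{
public:
    // Returns true when the process should be dumped.
    bool OnApiCall(const std::string& name)
    {
        assert(!name.empty());

        // Transactions are rarely created, so any use makes the process suspicious.
        if (name == "CreateTransaction" || name == "NtCreateTransaction")
        {
            transactionSeen_ = true;
        }

        // The first thread is about to start executing: dump now.
        return transactionSeen_ && name == "NtCreateThreadEx";
    }

private:
    bool transactionSeen_ = false;
};
```

A real monitor would also have to track which process a call belongs to and which process the new thread is created in. The sketch ignores both.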

Prototype implementation

My Process Doppelgänging implementation is available on my GitHub.

MSDF font rendering in OpenGL

Wed, 06 Jul 2022 00:00:00 +0000

Text rendering

One of the latest additions to my OpenGL game engine is text rendering. It took more effort than I expected, and that’s why this article exists. The OpenGL standard does not define text rendering. This was surprising to me at first, because while OpenGL is a low-level graphics API, rendering text seemed pretty low-level to me. Boy was I wrong. To ensure the correct placement of individual glyphs, a number of variables must be taken into account, as shown in the following figure:

Source: https://learnopengl.com/In-Practice/2D-Game/Render-text
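As a rough sketch of how these variables interact, the following function places the quad of a single glyph relative to the pen position and then advances the pen. `GlyphMetrics`, `Quad` and `PlaceGlyph` are hypothetical names for this example. All metrics are assumed to already be in pixels and the y-axis is assumed to point up. Only left-to-right text is covered.

```cpp
#include <cassert>

// Hypothetical glyph metrics in pixels, named after the figure.
struct GlyphMetrics
{
    float width, height;      // size of the glyph
    float bearingX, bearingY; // offset from the pen position to the left/top edge of the glyph
    float advance;            // horizontal distance to the pen position of the next glyph
};

// Quad to render: x/y is the bottom-left corner.
struct Quad
{
    float x, y, width, height;
};

// Computes the quad for one glyph and moves the pen forward.
Quad PlaceGlyph(const GlyphMetrics& glyph, float& penX, float baselineY, float scale)
{
    assert(scale > 0.0f);

    Quad quad;
    quad.x = penX + glyph.bearingX * scale;
    // Glyphs such as 'g' or 'p' extend below the baseline by (height - bearingY).
    quad.y = baselineY - (glyph.height - glyph.bearingY) * scale;
    quad.width = glyph.width * scale;
    quad.height = glyph.height * scale;

    penX += glyph.advance * scale;
    return quad;
}
```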

As if that wasn’t annoying enough, there is not only text from left to right, but also from right to left and even vertical text. Also there exist quite a few more letters than the ASCII range. If you aren’t convinced yet, get your blood boiling by reading the article Text Rendering Hates You by Aria Beingessner. It seems like there is no implementation that gets everything perfect. So yes, maybe rendering text is not as easy as I thought.

In addition to the computer-unfriendly way written language works, there are also different methods to achieve font rendering.

Legacy font rendering

The basic implementation uses a texture of all characters and cuts out the letters it needs to combine them into a word. This implementation works, but it has the big disadvantage that we can’t really scale the text well. Therefore, our font atlas texture would have to have a huge resolution to make large letters look good on for example a 4k screen. This would end up taking precious memory on our graphics card and decrease performance of our font rendering. Moreover, we might face issues when downscaling our characters in case they only occupy little space on a monitor with lower resolution. That’s why we need something better.

SDF

In a Signed Distance Field (SDF) each point that is part of a defined space is assigned a positive value that corresponds to the smallest distance to any point outside of the space. Each point outside the space is represented by a negative value which corresponds to the smallest distance to any point inside the space. We can write these distances into a channel of a texture (instead of the letter itself) to describe the shape of a character and then use these distances in our shader to construct the letter in an arbitrary size. Since constructing the character in our shader based on the texture can be done very efficiently this only introduces minimal overhead while allowing for a font atlas texture with much smaller resolution. Here is the result of creating the letter “A” from a 16x16 SDF texture:

Source: https://github.com/Chlumsky/msdfgen
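To illustrate how cheap the reconstruction is, here is the per-pixel computation written as plain C++ instead of shader code. It is a sketch under two assumptions: the distance is stored in the texture as a value in [0, 1] with 0.5 lying exactly on the edge of the glyph, and the caller knows how many screen pixels the stored distance range covers at the current text size (`screenPxRange`). The function name `SdfCoverage` is made up for this example.

```cpp
#include <algorithm>
#include <cassert>

// Converts one SDF texture sample to the coverage (alpha) of a pixel in [0, 1].
// sample:        texture value in [0, 1]; values above 0.5 are inside the glyph
// screenPxRange: number of screen pixels spanned by the distance range of the texture
float SdfCoverage(float sample, float screenPxRange)
{
    assert(screenPxRange > 0.0f);

    // Signed distance to the edge of the glyph, measured in screen pixels.
    const float distancePx = (sample - 0.5f) * screenPxRange;

    // A pixel exactly on the edge is half covered; one pixel of transition gives anti-aliasing.
    return std::min(std::max(distancePx + 0.5f, 0.0f), 1.0f);
}
```

Because the threshold is applied after the texture has been interpolated, the edge stays sharp no matter how far the text is scaled up.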

MSDF

A Multi-Channel Signed Distance Field (MSDF) expands the idea of encoding the shape into a texture to all three color channels. This further improves the quality of the rendered text. This method has been published by Viktor Chlumský in a GitHub repository. His master’s thesis on this topic can also be found there and is well worth reading. Here is the result of creating the letter “A” from a 16x16 MSDF texture:

Source: https://github.com/Chlumsky/msdfgen

Observe how much sharper the character construction turned out, even though our font atlas texture has the same resolution!
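The only change on the rendering side compared to a single-channel SDF is that the three channels are combined by taking their median before the distance is turned into coverage. Again as a plain C++ sketch instead of shader code, with the same assumptions as for the SDF (0.5 lies on the edge, `screenPxRange` is supplied by the caller). The names `Median` and `MsdfCoverage` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>

// Median of three values: the channel that is neither the smallest nor the largest.
float Median(float r, float g, float b)
{
    return std::max(std::min(r, g), std::min(std::max(r, g), b));
}

// Converts one MSDF texture sample (three channels in [0, 1]) to pixel coverage in [0, 1].
float MsdfCoverage(float r, float g, float b, float screenPxRange)
{
    assert(screenPxRange > 0.0f);

    const float distancePx = (Median(r, g, b) - 0.5f) * screenPxRange;
    return std::min(std::max(distancePx + 0.5f, 0.0f), 1.0f);
}
```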

Pain

After researching various methods of text rendering, I decided to use MSDF. Luckily, Chlumský also provided a project that generates an entire font atlas instead of individual characters (msdf-atlas-gen). Furthermore, it compresses them into a single texture with all the metadata.

However, in my opinion this project lacks usability. To implement MSDF atlas generation into my game engine I had to dig deep into the code of the CLI tool and other resources. Here are some that helped me:

Viktor Chlumský:

Michael Martz:

Pain relief

I made this very basic sample implementation https://github.com/Fahersto/OpenGL_msdf which I hope may prevent you from suffering as much as I did. As you can see, some parts of it are heavily based on code by Chlumský and Martz. Any improvements to this repository are very much welcome. The intent of this project is to spare other people from spending two solid weeks on implementing MSDF font rendering. Just open a pull request!

Demo inside my 2D engine

Finally, some of that CSI: Miami zoom that MSDF rendering allows:

Parametric spline interpolation

Sun, 20 Mar 2022 00:00:00 +0000

Natural cubic spline

Piecewise spline interpolation fits cubic polynomials through a set of points. In contrast to utilizing a polynomial of a higher degree, this results in a smooth interpolation that stays much closer to the target points. The interpolation is based on a t value. The t value has to be monotonically increasing. This in turn means that a regular spline can only ever go into one direction. However, parametric splines overcome this limitation.

The math

The construction of a natural cubic spline is described in “Numerical Analysis” by Richard L. Burden and J. Douglas Faires. In my implementation the t value increases by one for every segment.

void ComputeCubicSpline(double* values, int valueCount)
{
    // validate the input before allocating anything, so the early return cannot leak memory
    if (valueCount <= 0)
    {
        printf("Error: Trying to compute spline without points\n");
        return;
    }

    double* x = new double[valueCount];
    double* a = new double[valueCount];
    for (int i = 0; i < valueCount; i++)
    {
        // segment t values: [0,1], [1,2], ... This is assumed when finding the segment for a t value
        x[i] = i;
        a[i] = values[i];
    }

    // number of segments is equal to number of points - 1
    int segmentCount = valueCount - 1;

    // Implementation of the book's algorithm
    // allocate memory
    double* b = new double[segmentCount + 1];
    double* c = new double[segmentCount + 1];
    double* d = new double[segmentCount + 1];

    // step 1
    double* h = new double[segmentCount];
    for (int i = 0; i <= segmentCount - 1; i++)
    {
        h[i] = x[i + 1] - x[i];
    }

    // step 2
    double* alpha = new double[segmentCount];
    for (int i = 1; i <= segmentCount - 1; i++)
    {
        alpha[i] = (3 / h[i]) * (a[i + 1] - a[i]) - (3 / h[i - 1]) * (a[i] - a[i - 1]);
    }

    // step 3
    // step 3,4,5 and part of 6 solve tridiagonal system
    double* l = new double[segmentCount + 1];
    double* u = new double[segmentCount + 1];
    double* z = new double[segmentCount + 1];
    l[0] = 1;
    u[0] = 0;
    z[0] = 0;

    // step 4
    for (int i = 1; i <= segmentCount - 1; i++)
    {
        l[i] = 2 * (x[i + 1] - x[i - 1]) - h[i - 1] * u[i - 1];
        u[i] = h[i] / l[i];
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i];
    }

    delete[] x;
    delete[] alpha;

    // step 5
    l[segmentCount] = 1;
    z[segmentCount] = 0;
    c[segmentCount] = 0;

    delete[] l;

    // step 6
    for (int j = segmentCount - 1; j >= 0; j--)
    {
        c[j] = z[j] - u[j] * c[j + 1];
        b[j] = (a[j + 1] - a[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3;
        d[j] = (c[j + 1] - c[j]) / (3 * h[j]);
    }

    delete[] h;
    delete[] u;
    delete[] z;

    for (int i = 0; i < segmentCount; i++)
    {
		// store polynomials
        polynomial_.push_back(Polynomial(d[i], c[i], b[i], a[i]));
    }

    delete[] a;
    delete[] b;
    delete[] c;
    delete[] d;
}

To get a position on the spline, we just enter a t value into the corresponding polynomial. Since we know that t is increased by one for every segment, we can directly use t to look up the corresponding segment/polynomial.
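The following self-contained sketch shows the segment lookup together with a compact version of the construction. It is not the code from my repository: it uses std::vector instead of manual allocation, exploits the fact that every segment has length one (so all h values are 1), and the names `Segment`, `BuildNaturalSpline` and `Evaluate` are made up for this example. Each segment j is assumed to store the polynomial in the local form a + b(t - j) + c(t - j)^2 + d(t - j)^3.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical segment type: S_j(t) = a + b*(t-j) + c*(t-j)^2 + d*(t-j)^3
struct Segment
{
    double a, b, c, d;
};

// Natural cubic spline through values[0..n] with knots at t = 0, 1, ..., n.
std::vector<Segment> BuildNaturalSpline(const std::vector<double>& values)
{
    std::vector<Segment> segments;
    if (values.size() < 2)
    {
        return segments; // at least two points are needed for one segment
    }
    const std::size_t n = values.size() - 1; // number of segments

    // Tridiagonal system of the book's algorithm, simplified for h[i] == 1.
    std::vector<double> alpha(n + 1, 0.0), l(n + 1, 1.0), u(n + 1, 0.0), z(n + 1, 0.0), c(n + 1, 0.0);
    for (std::size_t i = 1; i < n; i++)
    {
        alpha[i] = 3.0 * (values[i + 1] - values[i]) - 3.0 * (values[i] - values[i - 1]);
        l[i] = 4.0 - u[i - 1];
        u[i] = 1.0 / l[i];
        z[i] = (alpha[i] - z[i - 1]) / l[i];
    }

    // Back substitution; c[n] stays 0, which is the natural boundary condition.
    segments.resize(n);
    for (std::size_t j = n; j-- > 0;)
    {
        c[j] = z[j] - u[j] * c[j + 1];
        const double b = (values[j + 1] - values[j]) - (c[j + 1] + 2.0 * c[j]) / 3.0;
        const double d = (c[j + 1] - c[j]) / 3.0;
        segments[j] = {values[j], b, c[j], d};
    }
    return segments;
}

// Looks up the segment directly from t and evaluates its polynomial.
double Evaluate(const std::vector<Segment>& segments, double t)
{
    assert(!segments.empty());

    // floor(t) is the segment index; clamp so that t == n uses the last segment.
    const double last = static_cast<double>(segments.size() - 1);
    const double index = std::min(std::max(std::floor(t), 0.0), last);
    const Segment& s = segments[static_cast<std::size_t>(index)];

    const double local = t - index;
    return s.a + local * (s.b + local * (s.c + local * s.d));
}
```

Values of t outside of [0, n] are extrapolated with the first or last polynomial in this sketch. Whether that is desirable depends on the use case.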

Parametric spline

Parametric splines are based on a cubic spline for each dimension. Therefore, a 2D spline consists of two cubic splines and a 3D spline of three.

void ComputeParametricSpline(double* x, double* y, double* z, int valueCount)
{
    splineX_ = CubicSpline();
    splineY_ = CubicSpline();
    splineZ_ = CubicSpline();

    splineX_.Compute(x, valueCount);
    splineY_.Compute(y, valueCount);
    splineZ_.Compute(z, valueCount);
}

To get the position on a three dimensional parametric spline, we just pass our t value to each of our splines:

Vector3 eval(float t)
{
    return Vector3(splineX_.eval(t), splineY_.eval(t), splineZ_.eval(t));
}

Since we now have a cubic spline for each dimension, we can move freely into any direction and also intersect ourselves.

Demo inside my 2D engine

This demo constructs a 2D parametric spline from points added by clicking. Furthermore, it animates a point on the spline by passing the current time as t value.

Prototype implementation

My parametric spline implementation is available on my GitHub.

GhostWriting - advanced code injection technique

Sat, 19 Mar 2022 00:00:00 +0000

I recently had the chance to study several code injection techniques in depth, specifically Host-Based Code Injection Attacks (HBCIAs). This term was introduced to distinguish code injection attacks that target the local system from ones that target remote systems such as SQL injection. I have implemented 22 HBCIA techniques over the last couple of months and the GhostWriting technique stood out to me in particular.

What is GhostWriting?

GhostWriting is an advanced code injection technique that combines thread hijacking, a write-gadget to write to an arbitrary memory location and an endless loop to stall execution.

Implementation details

The endless loop and write-gadgets in our implementation are located inside ntdll.dll. This ensures that they are available in every process:

; endless loop
jmp 0x0

; 32-bit write-gadget
mov [ecx], edx
ret

; 64-bit write-gadget
mov [rbx], r14
mov rbx, [rsp + 30h]
add rsp, 20h
pop r14
retn

The following discussion of the GhostWriting technique is based on the 32-bit write-gadget. The 64-bit version is implemented analogously, but has to account for side effects caused by the 64-bit write-gadget. Our 64-bit write-gadget contains side effects since we have to avoid registers that are not set reliably when calling SetThreadContext or NtSetContextThread. This happens whenever we suspend our thread during a system call, as Sam Russel describes in his brilliant blog post.

The first step when performing GhostWriting is to write the endless loop onto a fabricated stack. This is achieved with thread hijacking as described in algorithm 1.

When the instruction pointer of the thread equals the address of the endless loop, we know that the thread has written the address of the endless loop to the stack and is now stuck inside the loop. This is the signal that the data has been written and we can perform the next operation. Afterwards, the ROP chain is written by repeating algorithm 1 and modifying the source and destination on every invocation. We can only write 4 bytes in a 32-bit process and 8 bytes in 64-bit process per invocation, since we are limited by the amount of data a single register can hold.
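The consequence of this limit is that every buffer has to be split into register-sized pieces, each of which becomes one invocation of algorithm 1. The splitting itself can be sketched as follows. `RegisterWrite` and `SplitIntoWrites` are hypothetical names for this example, the final partial piece is assumed to be padded with zeros, and the bytes are combined in little-endian order as on x86.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One invocation of the write-gadget: store `value` at `address`.
struct RegisterWrite
{
    std::uint64_t address;
    std::uint64_t value;
};

// Splits a buffer into writes of registerSize bytes (4 for 32-bit, 8 for 64-bit).
std::vector<RegisterWrite> SplitIntoWrites(const std::vector<std::uint8_t>& data,
                                           std::uint64_t destination,
                                           std::size_t registerSize)
{
    assert(registerSize == 4 || registerSize == 8);

    std::vector<RegisterWrite> writes;
    for (std::size_t offset = 0; offset < data.size(); offset += registerSize)
    {
        const std::size_t count = std::min(registerSize, data.size() - offset);

        // Combine the bytes in little-endian order; missing bytes stay zero.
        std::uint64_t value = 0;
        for (std::size_t i = 0; i < count; i++)
        {
            value |= static_cast<std::uint64_t>(data[offset + i]) << (8 * i);
        }
        writes.push_back({destination + offset, value});
    }
    return writes;
}
```

A buffer of 5 bytes therefore needs two invocations in a 32-bit process, but only one in a 64-bit process.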

Now that we have successfully set up our fake stack, we still need to make the stack executable to be able to execute our payload. This is achieved by crafting a ROP chain that calls VirtualProtect to add the execute flag to the corresponding memory pages. Since the calling convention of the function is fastcall, we set up the parameters by writing the corresponding values to RCX, RDX, R8 and R9.

One challenge when implementing GhostWriting is executable memory. I saw many implementations that took shortcuts such as only injecting into the local process or allocating remote executable memory by calling VirtualAllocEx. However, our implementation stays true to the concept and creates executable memory on the stack by executing a ROP chain that calls VirtualProtect to add the execute flag to the stack. A ROP chain allows controlling the execution flow via the stack, as each gadget performs its instructions and finishes with a RET instruction that transfers the control flow to the next gadget in the chain on top of the stack. Our ROP chain only uses gadgets that can be found in ntdll.dll since every user mode process maps ntdll. Here it is:

pop rdi; ret
; VirtualProtectAddress
pop rcx; ret
; targetAddress
pop rdx; pop r11; ret
; size
; trash r11 (gadget side effect)
pop r8; ret
; newProtection (PAGE_EXECUTE_READWRITE)
pop r9; pop r10; pop r11; ret
; oldProtection (just some pointer to writable memory)
; trash r10 (gadget side effect)
; trash r11 (gadget side effect)
push rdi; ret (this gadget calls VirtualProtect since we put its address into rdi earlier)
; Address of the written shellcode on the stack. VirtualProtect will use this address to return to after its call.

Instead of the instructions themselves, we write the virtual addresses of these instructions inside ntdll to our fabricated stack. The commented lines are also filled in at runtime.

Next, the shellcode is written to the stack. The final step is to execute the ROP chain. This is done by suspending the thread and setting its program counter to the first gadget of the ROP chain and its stack pointer to the address of the second gadget on the stack. When the thread is resumed, it first executes the ROP chain which marks the stack as executable followed by execution of the payload on the stack.

Demo

Notice that the hijacked thread will only write our payload, which spawns a MessageBox, when we hover over the notepad window.

Prototype implementation

My GhostWriting implementation (tested on Windows 10 Build 19043) is available on my GitHub.

Register handle operation callbacks from unsigned drivers with this one weird trick

Fri, 18 Mar 2022 00:00:00 +0000

Monitor handle acquisition

Acquiring a handle to a target process is a critical step in many code injection techniques. The Windows operating system exposes a mechanism that allows kernel mode drivers to supply handle operation callbacks. These callbacks can be registered by calling ObRegisterCallbacks. When a handle is created or duplicated the pre-operation callback is invoked before the operation is performed and the post-operation callback after the operation occurred. This mechanism is utilized by many antivirus and Anti-Cheat solutions to protect processes from code injection. For more information about how this mechanism can be employed to protect processes and how to overcome these protections take a look at this excellent article by Daax Rynd.

Since I do not own a valid driver certificate, our first task is to load our unsigned driver. This can be accomplished by enabling TESTSIGNING. Enabled test signing can be detected from user mode by querying the SystemCodeIntegrityInformation through NtQuerySystemInformation. It is therefore reasonable to assume that whatever we want to monitor may check whether test signing is enabled and alter its behavior accordingly. For this reason, we choose the other option, which is loading the driver using an exploit.

Kdmapper

kdmapper exploits a vulnerable Intel driver to manually map unsigned drivers. After we define a custom entry point for our driver, it can be successfully mapped by kdmapper. The downside of loading our driver this way is that we do not have a DRIVER_OBJECT.

ObRegisterCallbacks

ObRegisterCallbacks requires the callbacks to reside in signed kernel images. This is an issue because our manually mapped kernel mode driver does not have a valid DRIVER_OBJECT. The following call chain is invoked when we attempt to register a handle operation callback from our driver:

ObRegisterCallbacks 
	| // 0x20 = LDRP_VALID_SECTION flag
	|--	MmVerifyCallbackFunctionCheckFlags(ourCallbackFunction, 0x20)	
		| // looks up in which module ourCallbackFunction resides
		|--	MiLookupDataTableEntry
			| // checks if our driver has the required flag 										
			|--	DriverObject->DriverSection->flags & 0x20

Since our DRIVER_OBJECT is not valid because we mapped it with kdmapper, we do not have the required flag. However, if you are running Windows 7 you can use DSEFix to map your driver which comes with the added benefit of having a valid DRIVER_OBJECT. Therefore, on Windows 7 the trick we describe in the next section is not required to register handle operation callbacks.

One weird trick

AdrianVPL suggested a trick to bypass this limitation in a forum post. To understand it we take a look at the signature of the pre-operation callback:

OB_PREOP_CALLBACK_STATUS PobPreOperationCallback(
	PVOID RegistrationContext,
	POB_PRE_OPERATION_INFORMATION OperationInformation
)
{...}

The calling convention of PobPreOperationCallback is fastcall. This means that the first parameter (RegistrationContext) is passed in the RCX register and the second parameter (OperationInformation) is passed in RDX. The interpretation of the registration context is driver-defined which in turn means that the driver itself is the only code accessing it.

The idea is to set the registration context to the address of our pre-operation callback and set the pre-operation field which would usually hold the callback to an address containing a JMP RCX instruction inside a valid driver. This results in ObRegisterCallbacks validating whether the driver containing the JMP RCX instruction is valid instead of checking our unsigned driver. When the callback is executed, the JMP RCX instruction jumps to the value in the registration context, which is the real callback inside our unsigned driver.

In his post, AdrianVPL suggested abusing ntoskrnl.exe which is signed and contains multiple JMP RCX instructions. However, abusing ntoskrnl did not work on several Windows 10 versions we tested. We conducted kernel mode debugging and noticed that ntoskrnl is missing a required flag that is checked when attempting to register a handle operation callback. We solved this issue by iterating currently running drivers and searching a signed driver with the required flag and a JMP RCX instruction. On our system one such driver is the DirectX Graphics Kernel which our current implementation abuses.
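Searching for the instruction itself is the simple part: JMP RCX is encoded as the two bytes FF E1, so it comes down to a byte scan over the code of a candidate driver. The following sketch only shows that scan on a plain byte buffer. `FindJmpRcx` is a hypothetical name. Enumerating the loaded drivers, checking the required flag and restricting the scan to executable sections are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the offset of the first JMP RCX instruction (bytes FF E1) in the
// buffer, or -1 if the buffer does not contain one.
std::ptrdiff_t FindJmpRcx(const std::uint8_t* code, std::size_t size)
{
    assert(code != nullptr || size == 0);

    for (std::size_t i = 0; i + 1 < size; i++)
    {
        if (code[i] == 0xFF && code[i + 1] == 0xE1)
        {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

The two bytes do not have to form an instruction in the original code of the driver. They only have to decode as JMP RCX when execution starts at that address.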

Our prototype implementation is available here: https://github.com/Fahersto/kernel_handle_monitoring

Revisiting Age of Empires 2: Definitive Edition

Wed, 05 Jan 2022 00:00:00 +0000

Background

The new year has just begun and it’s been half a year since I published my findings on Age of Empires 2: Definitive Edition, and even longer since these issues were brought to the attention of the game’s developers and Microsoft. Has the issue been fixed?

NO.

Brush off a bad day:

In Soviet Russia projectile dodges you:

As always this works online and in all game modes.

Will it be fixed?

The developers acknowledged the issue when I first reported it several months ago and have “added it to their database”. With the release of Age of Empires IV, I think it is less likely than ever to be fixed. I would love to be surprised though :)

imogui - draw on overlays using imgui

Mon, 27 Sep 2021 00:00:00 +0000

imogui is my GUI library to hook existing overlays and draw on them using imgui. All of imgui’s powerful widgets such as buttons, plots and color pickers can be used to create an interactive extension to any program using one of the supported overlays. This is very convenient as some of the supported overlays (such as Steam) can be added to any third-party application. This project uses hookFTW to hook into the drawing function of the target overlay to achieve its functionality.

Using imogui

In addition to imgui widgets, it is also possible to draw several primitives using the imogui::Renderer that is passed to the provided OnDraw callback. Here is an example using the Steam overlay in a 64-bit DirectX 11 game:

void OnDraw(imogui::Renderer* renderer)
{
	// use the renderer to draw anything
	[...]
	renderer->RenderCircle(bouncyballs[i].pos, bouncyballs[i].radius, bouncyballColor, i % 8, 32);

	static bool openOverlay = true;

	// create an imgui window to interact (choose bouncy ball color in this case)
	if (ImGui::Begin("IMOGUI - Hook", &openOverlay))
	{
		ImGui::Text("Intermediate Mode Overlay GUI");
		ImGui::ColorPicker3("Bouncyball Color", color);
		bouncyballColor = ImGui::ColorConvertFloat4ToU32(ImVec4(color[2], color[1], color[0], 1.f));
	}
	ImGui::End();
}

imogui::Steamoverlay steamOverlay;
steamOverlay.Hook(imogui::renderapi::directx11, OnDraw);

After compiling it into a DLL and loading it into the target process (here: Rocket League) using a DLL injector, this is the result: The purple discs and two open menus are usually not part of the game.

Unhooking imogui lets you modify your project’s code and reinject the modified DLL during initial development (or whenever your use case demands it), and it is just as easy:

steamOverlay.Unhook();

Note that this library is not limited to games. It works with any software employing one of the supported rendering APIs.

Currently supported overlays

32 Bit

Overlay	OpenGL	DirectX 9	DirectX 11	DirectX 12
Steam	❌	✅	❌	❌
Discord	❌	✅	❌	❌
Origin	❌	❌	❌	❌
MSI Afterburner	❌	❌	❌	❌
Overwolf	❌	❌	❌	❌
GeForce Experience	❌	❌	❌	❌
OBS	❌	✅	❌	❌

64 Bit

Overlay	OpenGL	DirectX 9	DirectX 11	DirectX 12
Steam	❌	❌	✅	❌
Discord	❌	❌	✅	❌
Origin	❌	❌	✅	❌
MSI Afterburner	❌	❌	✅	❌
Overwolf	❌	❌	❌	❌
GeForce Experience	❌	❌	❌	❌
OBS	❌	❌	✅	❌

Roadmap

As you can see in the tables above, there are numerous overlays and rendering APIs that are yet to be supported by this project. If anyone is interested in contributing to the project, I’m more than happy to assist. These are the features I’m most excited about currently:

Add support for OpenGL
Add support for DirectX 12
Add support for Vulkan
Add more overlays

hookFTW - hook for the win(dows)

Wed, 15 Sep 2021 00:00:00 +0000

This post is about my C++ hooking library for Windows (GitHub).

What is a hooking library?

A hooking library allows you to change a target program’s control flow. This can be useful for debugging your own applications, but also for changing or extending the functionality of other programs. This can be achieved using different methods. I implemented the following:

Byte patching .text section
Import Address Table (IAT)
Virtual Function Table (VFT)
Vectored Exception Handler (VEH)
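To make the two table-based methods (IAT and VFT) more concrete: both boil down to replacing a function pointer in a table that the target calls through. Here is a minimal sketch using a hand-rolled function table instead of a real compiler-generated one. All names are hypothetical and this is not hookFTW's API.

```cpp
#include <cstddef>

using Fn = int (*)(int);

// Hand-rolled dispatch table. Real IATs and vtables are laid out by the
// linker/compiler and usually live in read-only memory.
struct FunctionTable
{
    Fn entries[1];
};

// Stand-in for the function the target originally calls through the table.
int Original(int x) { return x + 1; }

// Pointer to the original function so the hook can still call it.
Fn g_original = nullptr;

// Replacement that runs instead of Original and forwards to it.
int Hooked(int x) { return g_original(x) * 2; }

// Swaps one table entry and returns the previous pointer so the hook can
// call the original and be removed again later.
Fn HookEntry(FunctionTable& table, std::size_t index, Fn replacement)
{
    Fn previous = table.entries[index];
    table.entries[index] = replacement;
    return previous;
}
```

On a real target, the page holding the table typically has to be made writable first (for example with VirtualProtect) before the entry can be swapped.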

Why write another Windows hooking library?

My motivation to write this library was twofold. First of all, I wanted to really understand how hooking works and get an understanding of some of the low-level details, like relocating ASM instructions. Secondly, I wasn’t really happy with the publicly available hooking libraries. Nearly all of the big names do not offer midfunction hooking (hooking within a function instead of at its prolog). An exception is Frida, but injecting an entire JavaScript engine into the target process and then communicating with it through a Python binding seems like a strange approach to this inherently low-level topic.

Relocation

When byte patching the .text section to hook a program, the overwritten bytes have to be saved and executed to preserve the target process’s functionality (and not crash it). The issue with this is that many assembler instructions are position dependent - meaning that copying them somewhere else and then executing them changes their semantics. Therefore, such instructions have to be detected and modified to preserve their original meaning. To do so, a disassembler is required. I chose Zydis since it offers all the low-level details about the disassembled instructions required to relocate them, has no third-party dependencies, can be built using CMake and is blazing fast - all at the same time!
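A typical position-dependent case is a relative call or jump (for example call rel32, opcode E8). Its displacement is relative to the address of the next instruction, so a copied instruction needs a new displacement to reach the same target. Here is a minimal sketch of just that arithmetic. The function names are hypothetical, and decoding the instruction (which hookFTW leaves to Zydis) is assumed to have happened already.

```cpp
#include <cstdint>
#include <limits>

// Absolute target of a rel32 instruction: the displacement is relative to
// the address of the *next* instruction (address + length).
std::uint64_t AbsoluteTarget(std::uint64_t address, std::uint8_t length, std::int32_t rel32)
{
    return address + length + static_cast<std::uint64_t>(static_cast<std::int64_t>(rel32));
}

// Computes the displacement that lets a copy of the instruction at newAddress
// reach the same target. Returns false if the target is out of rel32 range,
// in which case the instruction has to be rewritten (e.g. as an absolute jump).
bool RelocateRel32(std::uint64_t oldAddress, std::uint64_t newAddress,
                   std::uint8_t length, std::int32_t rel32, std::int32_t& relocated)
{
    const std::uint64_t target = AbsoluteTarget(oldAddress, length, rel32);
    const std::int64_t displacement =
        static_cast<std::int64_t>(target - (newAddress + length));
    if (displacement < std::numeric_limits<std::int32_t>::min() ||
        displacement > std::numeric_limits<std::int32_t>::max())
    {
        return false;
    }
    relocated = static_cast<std::int32_t>(displacement);
    return true;
}
```

For example, a 5-byte call at 0x1000 with displacement 0x20 targets 0x1025. Copied to 0x2000, it needs the displacement 0x1025 - 0x2005 to keep calling the same function.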

Using the library

The easiest way to use hookFTW is by cloning it recursively and building the library using CMake. Hooking at a target address then becomes as easy as:

// use a midfunction hook. In this example we pass the proxy function as a lambda.
hookftw::MidfunctionHook prologHook;
prologHook.Hook(
	targetAddress,
	[](hookftw::context* ctx) {
		printf("inside hooked function\n"); 
		ctx->PrintRegister();
	}
);

Roadmap

There are some limitations on the locations where hooks can be placed. Hooking a location that the target binary jumps to results in undefined behavior and most likely a crash. While I don’t know of any feasible method to detect whether a binary jumps to a given code location, this rarely becomes a problem since hookFTW offers direct register access, which usually makes it possible to hook a couple of bytes earlier or later and then get/set the desired value from the registers. There are some other handy features that I hope to implement in the near future.

This is it for now. In my next blogpost I will demonstrate how I put hookFTW to good use. Stay tuned!

Reversing Age of Empires 2: Definitive Edition

Mon, 14 Jun 2021 00:00:00 +0000

This is a repost. The original article can be found here.

This blog post describes my lockdown project of (partially) reversing the popular 2019 videogame Age of Empires 2: Definitive Edition. My efforts not only educated me about lock-step simulation and 90s coding practices, but also led to various multiplayer hacks.

First I’ll give a brief background on the game’s multiplayer architecture. Then I’ll explain how I interactively explored the game’s internals, until I could do things that should not be possible. A proof of concept that lets you instantly win every online match is provided at the end of the post.

Background

Age of Empires (AoE) is a very popular real-time strategy game series with roots in the late 90s. Numerous expansion packs and remakes have been published since. Exactly twenty years after the original Age of Empires 2 was released, the “Definitive Edition” remake was published in 2019. Like its predecessors, it lets you play online matches against real opponents.

While looking for a way to connect with nature during the corona lockdown (without leaving my basement ofc.), I stumbled upon AoE on Steam. Naturally, I sucked at the game but got interested in how it works internally. I then came across an interesting Gamasutra article about how the first Age of Empires games managed to accomplish what was, at the time, a daunting task: simulating several hundred units for up to 8 (!) players over the internet.

The Gamasutra article also included a curious paragraph:

Because the game's outcome depended on all of the users executing exactly the same simulation, it was extremely difficult to hack a client (or client communication stream) and cheat. [...] but these few leaks were relatively easy to secure in subsequent patches and revisions. Security was a huge win.

This sounds like a fun challenge!

The AoE network architecture was designed in the LAN party era, commonly referred to as the 90s. It is based on a lock-step simulation.

To make an interesting story short:

the game logic runs in “turns”
each client simulates the game on its own, there is no central server holding state
any command a player sends (such as moving a unit) is scheduled to be executed two “turns” later
every command sent needs to be verified and acknowledged by all players
if clients disagree about anything (e.g. the validity of a command or the position of a unit) the simulation is in a “desynced” state and the match gets terminated
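The points above can be sketched in a few lines. The following is a toy model and not the game's code: every name, the integer positions and the checksum are assumptions made for illustration. Each client would run its own LockstepSimulation instance and compare checksums with the others.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Number of turns between issuing a command and executing it.
constexpr std::uint32_t kCommandDelay = 2;

struct Position { int x; int y; };
struct Command { int playerId; int unitId; int x; int y; };

class LockstepSimulation
{
public:
    // A command issued now runs kCommandDelay turns later, which leaves
    // time for it to reach every other client.
    void Issue(const Command& command)
    {
        scheduled_[turn_ + kCommandDelay].push_back(command);
    }

    // Advances the simulation by one turn and applies all commands that are due.
    void Step()
    {
        ++turn_;
        auto due = scheduled_.find(turn_);
        if (due == scheduled_.end()) return;
        for (const Command& command : due->second)
        {
            units_[command.unitId] = Position{command.x, command.y};
        }
        scheduled_.erase(due);
    }

    // FNV-style hash over the turn counter and all unit positions. Clients
    // exchange this value to detect a desync.
    std::uint64_t Checksum() const
    {
        std::uint64_t hash = 1469598103934665603ULL;
        auto mix = [&hash](int value) {
            hash = (hash ^ static_cast<std::uint32_t>(value)) * 1099511628211ULL;
        };
        mix(static_cast<int>(turn_));
        for (const auto& unit : units_)
        {
            mix(unit.first);
            mix(unit.second.x);
            mix(unit.second.y);
        }
        return hash;
    }

    const std::map<int, Position>& Units() const { return units_; }

private:
    std::uint32_t turn_ = 0;
    std::map<std::uint32_t, std::vector<Command>> scheduled_;
    std::map<int, Position> units_;
};

// Two clients are in sync if their simulations hash to the same value.
bool InSync(const LockstepSimulation& a, const LockstepSimulation& b)
{
    return a.Checksum() == b.Checksum();
}
```

If one client applies a command that the other never received, the checksums diverge on the turn the command executes and the match would be terminated.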

This architecture implies that clients have to carefully inspect incoming commands and perform sanity checks on them. Missing or broken sanity checks would make it possible to send invalid commands that alter the game’s state in unintended ways. And that would be devastating, wouldn’t it?

Exploring the game

What does a 90s networking architecture have to do with “Age of Empires II: Definitive Edition” released in November 2019?

Quite a lot actually, since the game is mostly a graphic overhaul. Much of the underlying code is exactly the same as in the original.

First, I wanted to see if I could perform some actions through code. Using CheatEngine I was able to find a pointer to a unit object, just by manually searching for values that change with a unit’s position. Thanks to Runtime Type Information (RTTI), this also gives insight into its inheritance hierarchy. ReClassEx reports this RTTI about our unit object:

AVTRIBE_Combat_Object : AVRGE_Combat_Object : AVRGE_Action_Object : AVRGE_Moving_Object : AVGRGE_Animated_Object : AVRGE_Static_Object

So our unit is an AVTRIBE_Combat_Object which inherits from everything else to the right. Especially interesting is the inherited AVRGE_Moving_Object. Its existence suggests that movement may be a virtual function that is then overridden. To find the move function, I set breakpoints on all functions in our unit’s virtual function table and then removed all breakpoints that triggered for unrelated reasons. By trial and error I got rid of those “random” breakpoints until moving the unit around triggered only one breakpoint. This way I could identify the movement function inside the virtual function table.

To verify that this was indeed the function triggering movement, I wrote a small bot. This bot automatically dodges (my own) catapult attacks by moving all units from the impact location using their virtual move function.

The bot is injected into the Age of Empires process via a dynamic library (.dll), directly calling the movement function.

Unfortunately, there is a tiny issue with this method of moving units through code. It causes online games to desync and immediately terminate. This is expected since I’m just moving a unit, without sending the appropriate move command to other players. To synchronize the state, that 90s networking code must be used…

Digging deeper

Games tend to have huge code bases. Finding and identifying something like a command handler can be tedious work… This is where RTTI comes in handy once again. After some digging and working my way up the callstack of the movement function, I was able to uncover several classes of interest.

Here is a ReClassEx screenshot of the structures:

Thanks to RTTI I could find the command handler (called AVTRIBE_Command in the screenshot). It is now possible to set a breakpoint, which triggers every time the handler is accessed. Just by joining a multiplayer lobby and performing the actions I’m interested in (e.g. move a unit), I can identify the function for sending the corresponding command to the other players.

Using this method it was quite easy to identify the function responsible for sending a move unit command to other players. Here is an IDA screenshot of that exact function:

In order to keep a synced state between players, I have to call this function. As I only want to be able to call this function directly, I’m not terribly interested in its inner workings. Hence I did not reverse engineer it.

To directly call a function, one needs to know the program’s calling convention and which parameters to pass. Since the game is a 64-bit Windows binary, the fastcall calling convention is pretty much a given and IDA Pro happens to agree. IDA also determined that the function has 8 parameters… Luckily, dynamic analysis combined with RTTI makes the process of reverse engineering these parameters quite easy. I could simply set a breakpoint at the start of the function and analyse the parameters when moving units in a multiplayer match.

The reversed parameters with types are:

CommandHandler: AVTRIBE_Command
Unit we want to move: AVTRIBE_Combat_Object : AVRGE_Combat_Object : AVRGE_Action_Object : AVRGE_Moving_Object : AVGRGE_Animated_Object : AVRGE_Static_Object
number of units we want to move: uint64_t
some parameter that always seems to be zero: uint64_t
target x position on the map: float
target y position on the map: float
indicator if movement should be queued as a waypoint: uint8_t
some parameter that always seemed to be one: uint8_t

With this information and the function’s offset within the binary, I can define the function’s prototype and call it directly from my dynamic library injected into the AoE process:

// MSVC accepts and ignores the calling convention keyword on x64; __fastcall is used to match the text above.
typedef int64_t(__fastcall* MoveUnit)(int64_t* command, int64_t* unit, int64_t unitCount, int64_t unknownZero, float x, float y, char asWaypoint, char unknownOne);

// (int64_t)GetModuleHandle(NULL) is used to get the ASLR base address of the Age of Empires process
static MoveUnit moveUnit = (MoveUnit)((int64_t)GetModuleHandle(NULL) + 0xE2CAE0); // this offset is valid as of 12 June 2021.

By calling this function I can move my units from code. That’s pretty cool, but what would happen if I try to move enemy units? Surely that’s where the aforementioned sanity checks would stop me from doing bad things, right?

Losing sanity

Turns out there is no sanity check preventing me from moving my enemies’ units. The game’s user interface only lets you select your own units. So it seems that we can only move our own units.

But since I’m directly calling functions via my injected library, I can just write the pointer to an enemy unit into my own current unit selection. This way I can control the enemy units directly via the user interface. The following GIF shows how I can move my own (blue) and my enemy’s (red) units.

The game does not really visually indicate the movement action of enemy units, as this is not something that should ever happen.

To be honest, I did not really expect this to be possible. But since I had so much fun until now, I wanted to see what else is possible. After all, disallowing a player from moving units owned by other players would be one of the first sanity checks that comes to mind when dealing with a real-time strategy game. Turns out the sky is the limit. There seem to be no sanity checks whatsoever.

We can use the same method to work our way back from breakpoints on the command handler to any action in a multiplayer match we are interested in. One action that seems especially interesting is killing your own units to free up supply (usually done by selecting one of your own units and then pressing the “delete” key). It would be quite severe if we could just instantly kill enemy units with the press of a button. As expected, we can do that, thereby making Age of Empires 2: Definitive Edition my favourite clicker game:

To state the obvious: Enemy units usually don’t come with “kill” buttons attached to them. This is also a good time to remember that this works in multiplayer and also ranked matches. Just for the funsies I also implemented a social distancing functionality that kills all enemy units that get too close to one of my units. After all, what’s life without a little whimsy?

Win the game (proof of concept)

I guess every good security related blog post should end with a proof of concept and demo video.

So here is an “instantly win the game” proof of concept for the latest version (12 June 2021 on Steam) of Age of Empires 2: Definitive Edition. It lets you enter a player’s number, then loops over all of that player’s units and kills them, instantly defeating the target player.

#include 
#include 
#include 

//working as of 12/06/2021.
/* NOT YET PUBLISHED - MAYBE SOON */

(We decided to not publish the PoC at this very moment. However, if the bugs won’t get fixed, we might publish it via our Twitter at a later time. We just want to make sure that fair play stays possible.)

This also works for ranked matches and could come in handy for tournaments (just kidding ;)). Be aware that the function offsets are hard coded and could change once there is a new patch for the game. The PoC would still work but would need updated offsets.

Showcase

Here is a video showing the PoC in action, killing both enemies instantly. In the lower right corner you can see the enemy player’s screen.


Conclusion

Due to hardware and bandwidth limitations of the 90s, Age of Empires uses a distributed state, employing a lock-step approach. Each client simulates the game by itself. There is no central server that holds the ground truth. As is often the case, legacy software lives a long life. In the case of “Age of Empires 2: Definitive Edition”, the networking architecture and code seem to still be heavily based on the original game from the 90s.

While this network architecture comes with benefits and has a certain appeal to it, its security relies purely on sanity checks that each communication partner has to perform. Unfortunately, these sanity checks seem to be missing.

As a rewrite of the game seems unlikely, the way to fix this vulnerability would be to introduce sanity checks for every command received from another player.
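To illustrate what such a check could look like, here is a minimal sketch that validates a received move command against ownership and map bounds. The structures and names are made up for this example and do not reflect the game's actual data layout.

```cpp
#include <map>

// Hypothetical representation of a command received from another player.
struct ReceivedCommand
{
    int senderId;
    int unitId;
    float targetX;
    float targetY;
};

// Hypothetical slice of the game state that is needed for validation.
struct GameState
{
    std::map<int, int> unitOwner; // unit id -> owning player id
    float mapWidth;
    float mapHeight;
};

enum class Verdict { Accepted, UnknownUnit, NotOwner, OutOfBounds };

// Rejects commands for units that do not exist, that the sender does not
// own, or that target a position outside of the map.
Verdict Validate(const GameState& state, const ReceivedCommand& command)
{
    auto unit = state.unitOwner.find(command.unitId);
    if (unit == state.unitOwner.end()) return Verdict::UnknownUnit;
    if (unit->second != command.senderId) return Verdict::NotOwner;
    if (command.targetX < 0.f || command.targetX >= state.mapWidth ||
        command.targetY < 0.f || command.targetY >= state.mapHeight)
    {
        return Verdict::OutOfBounds;
    }
    return Verdict::Accepted;
}
```

In a lock-step simulation every client has to run exactly the same checks and drop rejected commands in the same deterministic way. Otherwise the validation itself would cause a desync.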

While I haven’t looked into other “Age of Empires” games, it seems plausible that other versions may have similar issues.

CodinaColada - game engine playground

Sun, 13 Dec 2020 00:00:00 +0000

Implementation of raymarching.

CodinaColada is my 2D game engine. Its main purpose is to be a sandbox in which I can implement any functionality I’m interested in.

Technologies

OpenGL

Box2D

Tracy

To improve the performance of my engine, I need to know where the biggest potential for improvement lies. This can be determined using a profiler. Tracy is one such profiler. It is able to measure CPU and GPU performance. This is done by placing macros inside the functions to be instrumented, which then create zones as can be seen in the image showing the performance of a single frame below.

From the image we can see that the engine is currently CPU bound, as the red zones inside the OpenGL context only make up a tiny portion of the frame. This is in contrast to the CPU, which is busy for the entire frame looping over the gameobjects and preparing rendering data (such as calculating positions). We can also see that there are 8 threads updating the gameobjects, but the JsonAnalyser object requires far more time than the other >1000 gameobjects together. This, however, is not an issue of the engine but of the application I built using the engine. The JsonAnalyser is responsible for rendering the text of over 1000 nodes like these:

Therefore I refactored my code to have each individual node be a gameobject. This way much of the processing can happen concurrently in GameObject::Update and only the drawing itself has to take place in the drawing thread (GameObject::Draw). Here is the result:

As we can see the giant JsonAnalysis block is gone which in turn reduces the time per frame from ~8ms to ~5ms.
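The split between a concurrent update phase and a single-threaded draw phase could look roughly like the following sketch. The GameObject here is a stripped-down stand-in with made-up members, and the chunking into one std::thread per worker is an assumption. The engine's actual scheduling may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Stripped-down stand-in for the engine's GameObject.
struct GameObject
{
    float position = 0.f;
    float velocity = 1.f;
    bool drawn = false;

    // Only touches its own state, so it is safe to run concurrently.
    void Update(float deltaTime) { position += velocity * deltaTime; }

    // Would issue rendering calls, so it must stay on the drawing thread.
    void Draw() { drawn = true; }
};

// Splits the objects into contiguous chunks and updates each chunk on its own thread.
void UpdateAll(std::vector<GameObject>& objects, float deltaTime, unsigned workerCount)
{
    if (workerCount == 0) workerCount = 1;
    const std::size_t chunk = (objects.size() + workerCount - 1) / workerCount;
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w)
    {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(objects.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&objects, begin, end, deltaTime] {
            for (std::size_t i = begin; i < end; ++i) objects[i].Update(deltaTime);
        });
    }
    for (std::thread& worker : workers) worker.join();
}

// Draws all objects sequentially on the calling (drawing) thread.
void DrawAll(std::vector<GameObject>& objects)
{
    for (GameObject& object : objects) object.Draw();
}
```

Because every object only writes to its own members during Update, no locking is needed. One oversized object (like the old JsonAnalyser) would still stall its worker, which is why splitting it into many small gameobjects helps.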

Now the largest block by far is Renderer::Draw, which sets up the data required for the OpenGLShaders (such as position, time…) for each individual GameObject. My next optimization will probably have to be concurrently creating a command buffer for OpenGL calls and then doing all the OpenGL calls from the OpenGL thread right before drawing.
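A rough sketch of what such a command buffer could look like is shown below. This is a design idea and not existing engine code: the class name and the use of std::function are assumptions, and the recorded lambdas stand in for real OpenGL calls.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// A recorded render command. In the engine this would wrap OpenGL calls.
using RenderCommand = std::function<void()>;

class CommandBuffer
{
public:
    // Stores a command for later execution. Nothing is executed here.
    void Record(RenderCommand command) { commands_.push_back(std::move(command)); }

    // Runs all recorded commands in order and clears the buffer. Meant to
    // be called only from the thread that owns the OpenGL context.
    void Execute()
    {
        for (RenderCommand& command : commands_) command();
        commands_.clear();
    }

    std::size_t Size() const { return commands_.size(); }

private:
    std::vector<RenderCommand> commands_;
};

// Executes one buffer per worker, in worker order.
void Submit(std::vector<CommandBuffer>& perWorkerBuffers)
{
    for (CommandBuffer& buffer : perWorkerBuffers) buffer.Execute();
}
```

Giving every worker thread its own buffer avoids locking while recording. The price is that the execution order is per worker rather than global, so commands that depend on each other would have to be recorded into the same buffer.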