Chapter 20. C++ Analysis
- C++ is an object-oriented programming language.
- C++ has objects which can contain both data and functions.
- Functions in C++ are similar to functions in C, except that they can be associated with an object or class.
- Functions in C++ are often called methods.
- Classes are like structs except they can store function information and data.
- Classes are similar to a blueprint or foundation for creating an object.
- An object is an instance of a class.
- Multiple objects can be created from a class, each with its own data, but they all share the same methods.
- Accessing data and functions requires you to reference an object of a particular type.
The ‘this’ pointer:
- Variables and functions can be accessed using the object name + the variable name.
- e.g. CustomObject.var = 0;
- e.g. CustomObject.function();
- Variables and functions can also be accessed using just the variable name if called from within the definition of a class.
- e.g. var = 0;
- e.g. function();
- When accessed from within the definition of a class, the memory location of the variable varies between objects, as each object has its own memory address.
- The ‘this’ pointer holds the address of the object a function was called on, so the same code can map a variable name to the right memory for each object.
- The ‘this’ pointer is implied whenever a variable is accessed within a function where an object wasn’t specified.
- In assembly generated by Microsoft compilers, the ‘this’ parameter is usually passed in ECX, and sometimes in ESI.
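As a minimal sketch of the points above, the class below shows that an unqualified member access inside a method is shorthand for an access through ‘this’, and that each object therefore updates its own storage. The names `Counter`, `value`, `bump`, and `two_objects_demo` are hypothetical and are not taken from any lab binary.

```cpp
#include <cassert>

// Hypothetical class used only to illustrate the implicit 'this' pointer.
class Counter {
public:
    int value = 0;

    // 'value' here is shorthand for 'this->value'.
    void bump() { value += 1; }

    // Identical behaviour with the pointer written out explicitly.
    void bump_explicit() { this->value += 1; }

    // Returns the address held in 'this' for the current call.
    const Counter* self() const { return this; }
};

// Builds two objects from the same class and modifies only the first.
// Returns first.value * 10 + second.value so both values can be inspected.
int two_objects_demo() {
    Counter first;
    Counter second;
    first.bump();
    first.bump_explicit();
    assert(first.self() == &first);    // 'this' is the object's own address
    assert(second.self() == &second);
    return first.value * 10 + second.value;
}
```

In a 32-bit build from a Microsoft compiler, a call such as `first.bump()` would typically load the address of `first` into ECX just before the ‘call’ instruction.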
Overloading and Mangling:
- ‘Method overloading’ is supported in C++ and means having multiple functions with the same name, but with a different set of parameters being passed.
- C++ uses ‘name mangling’.
- In the PE file format, each function is labelled only by its name.
- Because overloaded functions would cause these labels to clash, the names in a PE file are modified so that each name includes parameter information.
- e.g. A function called ‘Function’ as part of a class called ‘Class’ which includes 2 integers as parameter items would look like ‘?Function@Class@@QAEXHH@Z’
- IDA can demangle this, but only if symbols are present, and these are often removed by malware authors.
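The sketch below illustrates overloading under stated assumptions: the class and method names mirror the example above, while the `last` member and `overload_demo` are hypothetical additions so the effect can be observed. The comment on decorated names applies only to 32-bit Microsoft compilers; other compilers use different mangling schemes.

```cpp
#include <cassert>

// Hypothetical class with two overloads of the same method name.
class Class {
public:
    int last = 0;  // records the result of whichever overload ran

    // Two int parameters, void return. This is the shape of signature that
    // the decorated name ?Function@Class@@QAEXHH@Z describes.
    void Function(int a, int b) { last = a + b; }

    // Same name, different parameter list. The linker sees a different
    // decorated name, so the two labels do not clash.
    void Function(int a) { last = a; }
};

// The compiler picks the overload from the argument list at compile time.
void overload_demo() {
    Class c;
    c.Function(2, 3);
    assert(c.last == 5);
    c.Function(7);
    assert(c.last == 7);
}
```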
Inheritance and Function Overriding:
- Child classes inherit functions and data from parent classes.
- Generally isn’t visible in assembly.
Virtual and Non-Virtual Functions:
- A virtual function can be overridden by a subclass.
- Virtual Function execution is determined at runtime.
- A child class with the same named function overrides the parent function.
- Non-virtual functions instead determine execution at compile time.
- A child class with the same named function does not override the parent function.
- Virtual functions are what make polymorphism possible: objects of different classes share a common interface but perform different functions.
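The difference between the two kinds of function can be sketched as follows. All names here (`Parent`, `Child`, `virt`, `plain`, and the helper functions) are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical parent class with one virtual and one non-virtual method.
class Parent {
public:
    virtual ~Parent() = default;
    virtual std::string virt() const { return "Parent::virt"; }
    std::string plain() const { return "Parent::plain"; }
};

class Child : public Parent {
public:
    // Overrides the parent's virtual function.
    std::string virt() const override { return "Child::virt"; }
    // Same name as the parent's non-virtual function: hides it, does not override it.
    std::string plain() const { return "Child::plain"; }
};

// Both helpers only know they have been given a Parent.
std::string call_virt(const Parent& p) { return p.virt(); }
std::string call_plain(const Parent& p) { return p.plain(); }

void dispatch_demo() {
    Child child;
    assert(call_virt(child) == "Child::virt");     // chosen at runtime
    assert(call_plain(child) == "Parent::plain");  // chosen at compile time
    assert(child.plain() == "Child::plain");       // static type is Child here
}
```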
Use of Vtables:
- When C++ compiles, data structures called Virtual Function Tables or vtables are added to support Virtual Functions.
- These are arrays of function pointers.
- Each class using a Virtual Function has its own vtable.
- Each Virtual Function in a class is included in the vtable.
- The biggest issue for analysis is that a virtual function call doesn’t show its target in the ‘call’ instruction.
- Non-virtual Function call may look like ‘call sub_<address>’
- Virtual function call would look like ‘call eax’
- First 4 bytes of an object are a pointer to the vtable.
- First 4-byte entry of a vtable is a pointer to the code for the first Virtual Function.
- To find the location of a Virtual Function call, you must find where a vtable is accessed and its offset, and then you must find the vtable in memory.
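To make the layout above concrete, here is a hand-rolled model of what the compiler generates automatically. It is a sketch only: real vtables are emitted by the compiler, and every name below (`Object`, `VirtualFn`, `parent_vtable`, `call_slot`, and so on) is hypothetical.

```cpp
#include <cassert>

struct Object;                             // forward declaration
using VirtualFn = int (*)(Object* self);   // each entry receives 'this'

struct Object {
    const VirtualFn* vtable;  // first field: pointer to the class's vtable
    int data;
};

int parent_first(Object* self)  { return self->data; }
int parent_second(Object* self) { return self->data * 2; }
int child_first(Object* self)   { return self->data + 100; }

// One table per class. The child overrides slot 0 and reuses slot 1.
const VirtualFn parent_vtable[] = { parent_first, parent_second };
const VirtualFn child_vtable[]  = { child_first,  parent_second };

// Equivalent of a virtual call: load the table pointer from the start of
// the object, index it by slot, and call through the result. The target is
// only known at runtime, which is why disassembly shows 'call eax'.
int call_slot(Object* obj, int slot) {
    assert(obj != nullptr && obj->vtable != nullptr);
    VirtualFn target = obj->vtable[slot];
    return target(obj);
}
```

Working backwards from a ‘call eax’ therefore means answering the same two questions `call_slot` takes as inputs: which vtable the object points to, and which slot offset was read from it.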
Recognising a Vtable:
- A vtable looks like an array of function pointers.
- Only the first value in a vtable should have a cross-reference, with the others being accessed via their offset from it.
- Virtual functions are not directly called by other parts of the code, so cross-references to them should not contain a ‘call’ instruction, but rather be referenced as an offset.
- e.g. ‘dd offset sub_<address>’
- All functions within a vtable belong to the same class and are somehow related.
- If 2 vtables point to the same function you can infer an inheritance relationship between their classes.
- If one vtable is larger than the other, the larger one belongs to the subclass.
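The last two heuristics can be sketched as a small helper that compares two vtables as they would appear in a disassembly listing, that is, as runs of code addresses. This is an illustration of the rule of thumb rather than a reliable classifier, and the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A vtable as seen in a listing: a sequence of 32-bit function addresses.
using Vtable = std::vector<std::uint32_t>;

// Counts the slots where both tables point at the same function.
std::size_t shared_slots(const Vtable& a, const Vtable& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t shared = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] == b[i]) ++shared;
    }
    return shared;
}

enum class Relation { Unrelated, Related, FirstIsLikelySubclass, SecondIsLikelySubclass };

// Applies the heuristics: a shared entry suggests inheritance, and the
// larger table is assumed to belong to the subclass.
Relation guess_relation(const Vtable& a, const Vtable& b) {
    if (shared_slots(a, b) == 0) return Relation::Unrelated;
    if (a.size() > b.size()) return Relation::FirstIsLikelySubclass;
    if (b.size() > a.size()) return Relation::SecondIsLikelySubclass;
    return Relation::Related;
}

void relation_demo() {
    // Made-up addresses, for illustration only.
    const Vtable parent{0x401000, 0x401020};
    const Vtable child{0x401000, 0x401050, 0x401080};
    assert(guess_relation(parent, child) == Relation::SecondIsLikelySubclass);
}
```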
Creating and Destroying Objects:
- When an object is created the ‘constructor’ function is called.
- This performs initialisation.
- Objects can be stored on the heap or stack.
- When not stored on the stack, memory allocation needs to occur. This happens with the ‘new’ operator which is of interest.
- New operators can often have an unusual function name such as ‘??2@YAPAXI@Z’.
- When an object is destroyed the ‘destructor’ function is called.
- This is automatically called if objects go out of scope.
- Can complicate disassembly due to exception handlers being added.
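A short sketch of object lifetime, using a hypothetical `Tracked` class that counts its own constructor and destructor calls:

```cpp
#include <cassert>

// Hypothetical class that counts constructor and destructor calls.
class Tracked {
public:
    static int constructed;
    static int destroyed;
    Tracked()  { ++constructed; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;

void lifetime_demo() {
    const int destroyed_before = Tracked::destroyed;
    {
        Tracked on_stack;              // constructor runs here
    }                                  // destructor runs automatically at scope exit
    assert(Tracked::destroyed == destroyed_before + 1);

    Tracked* on_heap = new Tracked();  // 'new' allocates memory, then the constructor runs
    assert(Tracked::destroyed == destroyed_before + 1);  // leaving a scope would not destroy it
    delete on_heap;                    // destructor runs, then the memory is released
    assert(Tracked::destroyed == destroyed_before + 2);
}
```

In a disassembly, the heap case is the one that produces the call to the ‘new’ operator immediately before the constructor call.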
The purpose of this first lab is to demonstrate the usage of the this pointer. Analyze the malware in Lab20-01.exe.
Does the function at 0x401040 take any parameters?
If we examine the main function of ‘Lab20-01.exe’ (a C++ executable) in IDA, we see that the function at 0x401040 doesn’t take any explicit parameters; however, it does take a ‘this’ pointer. This is how the function knows which created object it is running against.
One way to identify this is the lack of clear structure being passed, strange duplication of references being stored prior to it, and the result being stored in our ‘ecx’ register. This is in addition to a URL being moved into our newly created object reference.
Which URL is used in the call to URLDownloadToFile?
At a glance we can see the below URL being moved into ‘dword ptr [ecx]’.
Based on this we know that the URL http://www.practicalmalwareanalysis.com/cpp.html is being stored at the start of our newly created object. By examining ‘sub_401040’, we can see that the object passed in our ‘this’ pointer is being stored in [ebp+var_4].
This is then dereferenced, and the value at the start of our object is passed as the LPCSTR URL entry to URLDownloadToFile. Both the URL and the FileName are pushed onto the stack shortly before the call.
This in turn confirms the URL and FileName used by the call to URLDownloadToFile.
What does this program do?
The program is contained solely within what we’ve discussed in the previous 2 questions. From what we’ve seen, this program will download a file from http://www.practicalmalwareanalysis.com/cpp.html and save it on the local machine to a file called c:\tempdownload.exe.
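A rough source-level reconstruction of this behaviour might look like the sketch below. To keep it portable and free of network activity, the real URLDownloadToFile call is replaced by a hypothetical stub, `fake_url_download_to_file`, that only records its arguments. The class name `Downloader` and the other identifiers are likewise assumptions; only the URL, the file name, and the fact that the URL sits at the start of the object come from the analysis above.

```cpp
#include <cassert>
#include <string>

// Records what the stub was asked to download.
struct DownloadRequest {
    std::string url;
    std::string file_name;
};
DownloadRequest last_request;

// Stand-in for URLDownloadToFile: no network access, just bookkeeping.
void fake_url_download_to_file(const char* url, const char* file_name) {
    last_request = DownloadRequest{url, file_name};
}

// Hypothetical reconstruction of the object. The URL is the first member,
// matching the write to 'dword ptr [ecx]' seen in main.
class Downloader {
public:
    const char* url;

    // Plays the role of sub_401040: no explicit parameters, only 'this'.
    void download() {
        assert(this->url != nullptr);
        fake_url_download_to_file(this->url, "c:\\tempdownload.exe");
    }
};

// Plays the role of main: create the object, store the URL, call the method.
void lab01_sketch() {
    Downloader d;
    d.url = "http://www.practicalmalwareanalysis.com/cpp.html";
    d.download();
}
```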
The purpose of this second lab is to demonstrate virtual functions. Analyze the malware in Lab20-02.exe. This program is not dangerous to your computer, but it will try to upload possibly sensitive files from your machine.
What can you learn from the interesting strings in this program?
If we run strings over this executable, we can see a number of interesting entries, including what looks to be evidence this is made using C++, possible imports associated with network connections and FTP operations, and strings that indicate the program likely functions as an FTP client which is looking for .doc and .pdf files to send back to ftp.practicalmalwareanalysis.com.
What do the imports tell you about this program?
Opening this in PE-bear, we can see that this is importing functions from WININET.dll which look to be associated with FTP operations. This leads us to believe the program will function as an FTP client, further backing up our hypothesis from question 1.
Examining the imports from KERNEL32.dll we also see what looks to be API calls associated with finding files which match a certain parameter on a system.
Based on these imports it looks like this program will search for files on a system, and at some stage send them to a remote FTP server.
What is the purpose of the object created at 0x4011D9? Does it have any virtual functions?
If we examine 0x4011D9, we can see that this occurs directly after a comparison which looks to be searching for a .doc file. We can also see checks on one branch which may be looking for a .pdf file.
Of interest in the above is that after an object is created, for what looks to be a .doc file being found, there are 2 sets of ‘mov’ operations occurring directly after one another.
This looks to first create an object and store a reference to it into [ebp+var_15C]. This is then stored in a pointer to [edx] and [eax]. Immediately after this we see what looks to be a virtual function table ‘offset off_4060DC’ being written to the object’s first offset. If we examine cross-references to ‘off_4060DC’, we can see that this looks to be a virtual function given it is only referenced by an offset rather than a ‘call’ instruction.
Based on this it appears that the purpose of this object is to act as a reference to a ‘.doc’ file which has been found. The assembly operations performed prior to these show calls to functions which back up this hypothesis, given the malware would need to find a file before creating an object as a reference to it.
Which functions could possibly be called by the call [edx] instruction at 0x401349?
Taking a look at the call [edx] instruction at ‘0x401349’, we can see that 3 possible objects are being created. The 3 objects being created are for a PDF file, a DOC file, and a file that is neither of these being found on disk. From here we see evidence of Virtual Function Tables being set up, until all the references to a created object merge into a single reference to ‘[ebp+var_148]’.
Taking a step back, what we’re really interested in is the possible virtual functions that different objects would call, which in this case is at ‘off_4060DC’, ‘off_4060D8’, and ‘off_4060E0’ (remember that these all point to the first function in the virtual function table for our created objects).
If we take a look at what these offsets point to, we can see that they point to sub_401380, sub_401440, and the following function:
- ‘??1_Init_locks@std@@QAE@XZ’ (Name mangling has occurred. Demangled, this is the destructor of a class called ‘_Init_locks’ in the ‘std’ namespace. A quick search reveals this is likely a built-in C++ runtime class used for initialising locks, so the label is probably a library signature match rather than code written by the malware author.)
If we examine sub_401380, we can see that this looks to be establishing a new connection to a remote FTP server and attempting to place a found PDF file into a ‘pdfs’ directory.
If we examine sub_401440, we can see that this looks to be establishing a new connection to a remote FTP server and attempting to place a found DOC file into a ‘docs’ directory.
Based on this we know what functions could be called by the call [edx] instruction at 0x401349.
How could you easily set up the server that this malware expects in order to fully analyze the malware without connecting it to the Internet?
Given we know that this is expecting an FTP server to be present at ftp.practicalmalwareanalysis.com for exfiltration, we can set up a local FTP server using software such as XAMPP or FileZilla and then redirect any calls for that domain to our local host like we’ve done in previous labs.
Note: This was designed to run against a Windows XP OS, and as such running this on other operating systems appears to fail.
We know based on what was found in question 4 that this doesn’t look to be authenticating to the FTP server in question. Due to this we will first need to enable an ‘anonymous’ user account on our FTP server and ensure it doesn’t require a password. In addition we will need to configure the home directory where captured files will be sent.
After doing this we can fire up ApateDNS and our FTP server, and log on to the admin interface of our FTP server to track what is being sent to it. By running the program we can see DNS requests being made which are redirected to our own host. From here the program establishes an FTP connection, stores the file found, and then disconnects from the FTP server, repeating this for each file and so causing a number of connections to occur.
This also highlights that the malware is attempting to store each type of file found in a folder called ‘docs’ or ‘pdfs’ depending on the extension being exfiltrated. This backs up what we found in our previous analysis.
By performing these actions, we are able to fully analyse the malware without connecting it to the internet.
What is the purpose of this program?
The purpose of this program is to find .pdf and .doc files on your system and exfiltrate these to a remote FTP server at ftp.practicalmalwareanalysis.com.
What is the purpose of implementing a virtual function call in this program?
By implementing virtual functions the program is able to perform different actions depending on the object file extension found on the host. In this case the different functions were to specify what directory exfiltrated files would be stored in.
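The design described above can be sketched in source form as follows. The class names are hypothetical, since the binary only exposes vtables and ‘sub_’ addresses, and the upload itself is reduced to returning the remote directory name. Treating the third object type as doing nothing is an assumption made for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class for any file found on disk.
class FoundFile {
public:
    explicit FoundFile(std::string path) : path_(std::move(path)) {}
    virtual ~FoundFile() = default;
    // Assumed default for files that are neither .doc nor .pdf: no directory.
    virtual std::string remote_dir() const { return ""; }
    const std::string& path() const { return path_; }
private:
    std::string path_;
};

class PdfFile : public FoundFile {
public:
    using FoundFile::FoundFile;
    std::string remote_dir() const override { return "pdfs"; }
};

class DocFile : public FoundFile {
public:
    using FoundFile::FoundFile;
    std::string remote_dir() const override { return "docs"; }
};

bool ends_with(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Mirrors the three branches that each create a different object.
std::unique_ptr<FoundFile> classify(const std::string& path) {
    if (ends_with(path, ".pdf")) return std::make_unique<PdfFile>(path);
    if (ends_with(path, ".doc")) return std::make_unique<DocFile>(path);
    return std::make_unique<FoundFile>(path);
}

// Single call site, comparable to the 'call [edx]' at 0x401349: the code
// here does not know which subclass it has been handed.
std::string upload_target(const FoundFile& f) { return f.remote_dir(); }

void lab02_sketch() {
    assert(upload_target(*classify("report.pdf")) == "pdfs");
    assert(upload_target(*classify("notes.doc")) == "docs");
}
```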
This third lab is a longer and more realistic piece of malware. This lab comes with a configuration file named config.dat that must be in the same directory as the lab in order to execute properly. Analyze the malware in Lab20-03.exe
What can you learn from the interesting strings in this program?
By running strings against this binary we can begin to infer what it may be used for and what functionality it may have.
First off we see it is likely written in C++ and can present a message popup to the user.
Next up we see what looks to be a number of imported APIs giving this the ability to read files, create files, get access to the user context it is running under, make network connections, terminate itself, understand what process it is running under, and load further libraries.
Finally we can see that this looks to perform some Base64-encoding or decoding functions, potentially using a custom index_string. We also see references to remote URIs, and to original C++ classes labelled ‘BackdoorClient’, in addition to ‘Polling’ and ‘Beacon’ strings. Further to this, the program looks to gather Host/User information, has the ability to upload and download files, can create arbitrary processes, and can make GET/POST requests.
Immediately we begin to believe this is some sort of information gathering remote access tool/trojan which provides the ability to exfiltrate files and run commands on a system.
What do the imports tell you about this program?
Opening this in the latest available version of pestudio (in this case 9.09), we can see that a number of imports are already flagged as ‘blacklisted’, in addition to some deprecated APIs being used by the program.
Of interest is that we can see this has the ability to make network connections, execute processes, and sleep, all of which would be pretty common functions for a remote access tool/trojan which leveraged the sleep API call to allow checking into the C2 periodically.
The function 0x4036F0 is called multiple times and each time it takes the string Config error, followed a few instructions later by a call to CxxThrowException. Does the function take any parameters other than the string? Does the function return anything? What can you tell about this function from the context in which it’s used?
If we first examine cross-references to 0x4036F0, we can see that it is called 5 times throughout this program.
Looking at where these are called we can see they all take place inside of ‘sub_403180’. An example of this is shown below.
To get a bit more of an idea what is being passed to the function, we can examine cross-references to sub_403180 to see if anything is passed to this subroutine. Immediately we see a ‘sub_401EE0’ object being created and the object’s ‘this’ pointer being stored into ecx.
Based on this we know that the function 0x4036F0 doesn’t take any parameters other than the Config error string. Taking a look we can see that this same object (which we’re beginning to believe is part of an exception object) is used as a parameter to the CxxThrowException function.
If we examine what’s contained within ‘sub_4036F0’, we find evidence that this is likely setting up an exception to be raised.
Based on all of this context, and by examining the patterns which occur right before 0x4036F0 is called, we can infer that these are all exception objects which raise an exception if the specified config.dat file doesn’t exist or is invalid.
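In source form, the pattern inferred here would look something like the sketch below: an exception object is built from a string and then thrown. With Microsoft compilers a `throw` statement is compiled into a call to `_CxxThrowException`, which is why that call follows the constructor in the disassembly. The names `ConfigException`, `load_config`, and `try_load` are hypothetical, and the boolean parameter stands in for whatever checks sub_403180 actually performs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type whose constructor takes only a message string
// (plus the implicit 'this' pointer), like the function at 0x4036F0.
class ConfigException : public std::runtime_error {
public:
    explicit ConfigException(const std::string& message)
        : std::runtime_error(message) {}
};

// Stand-in for the config parsing code; the flag replaces the real checks.
void load_config(bool file_is_valid) {
    if (!file_is_valid) {
        // Construct the exception object from the string, then throw it.
        throw ConfigException("Config error");
    }
}

// Returns "ok" on success, or the exception message on failure.
std::string try_load(bool file_is_valid) {
    try {
        load_config(file_is_valid);
        return "ok";
    } catch (const ConfigException& e) {
        return e.what();
    }
}

void config_demo() {
    assert(try_load(true) == "ok");
    assert(try_load(false) == "Config error");
}
```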
What do the six entries in the switch table at 0x4025C8 do?
If we jump to 0x4025C8 we find the six entries in the switch table which are referenced at 0x40252A.
If we follow this reference, we can see that this is triggered by a reference at 0x402500.
If we continue tracking back what kicks this off, we can find at ‘loc_402410’ there’s a cross-reference to an offset within ‘sub_403BE0’. This is ultimately what kicks off any one of these switches to occur.
If we were to look at what calls this, we’d find it is the only call within our _main method which is kicked off shortly after the program ‘start’ method runs. Of interest in the above is that we can see a looping function and a call to ‘sleep’. Right before this happens there’s a call to ‘sub_401F80’ which we’ll examine further.
In the above we find 5 calls to subroutines to examine further.
Based on the above User-Agent string and API calls, we can assume this plays the role of establishing a connection to the C2.
Based on the above strings and API calls, we can assume this plays the role of gathering initial system information to send back to the C2.
This subroutine has a number of subroutines which are called; however, at a glance we can see that this is likely playing the role of posting the gathered data back to the C2, or making a GET request to it.
Based on the above strings and errors being called, we can assume this is receiving the response to our request and checking to see if it matches an expected valid HTTP response.
Based on the above strings and what looks to be Base64 index_strings, we can assume this is Base64-encoding or decoding the response received from the C2 server.
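For reference, the role an index string plays in Base64 can be sketched as below. The encoder takes the 64-character index string as a parameter, so swapping in a custom string changes the output without changing the algorithm. This is a generic illustration, not the sample's actual routine, and the function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The standard Base64 index string.
const std::string kStandardIndex =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes 'input' using whichever 64-character index string is supplied.
std::string base64_encode(const std::string& input, const std::string& index) {
    assert(index.size() == 64);
    std::string out;
    std::size_t i = 0;
    // Full 3-byte groups become four 6-bit lookups into the index string.
    while (i + 2 < input.size()) {
        const unsigned v = (static_cast<unsigned char>(input[i]) << 16) |
                           (static_cast<unsigned char>(input[i + 1]) << 8) |
                           static_cast<unsigned char>(input[i + 2]);
        out += index[(v >> 18) & 0x3F];
        out += index[(v >> 12) & 0x3F];
        out += index[(v >> 6) & 0x3F];
        out += index[v & 0x3F];
        i += 3;
    }
    // One or two leftover bytes are padded with '='.
    const std::size_t rem = input.size() - i;
    if (rem == 1) {
        const unsigned v = static_cast<unsigned char>(input[i]) << 16;
        out += index[(v >> 18) & 0x3F];
        out += index[(v >> 12) & 0x3F];
        out += "==";
    } else if (rem == 2) {
        const unsigned v = (static_cast<unsigned char>(input[i]) << 16) |
                           (static_cast<unsigned char>(input[i + 1]) << 8);
        out += index[(v >> 18) & 0x3F];
        out += index[(v >> 12) & 0x3F];
        out += index[(v >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

If a sample does use a custom index string, standard Base64 tools will produce garbage until the same string is supplied, so it is worth recovering the string from the binary first.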
At this point we have a good idea of what actions the Beacon will take prior to ‘loc_402410’ inevitably calling the switch table at 0x4025C8. We also know that the six entries in this switch table are likely six different actions to take based on the response received from the C2.
To find out what each of the switch entries does we can investigate them further.
This is case 0x61, and from the above we can see that this looks to delete the object which called it, but nothing else.
This is case 0x62, and from the above we can see that this calls ‘sub_4025E0’ before executing case 0x61. Examining sub_4025E0 we can see that this looks to call atoi, which parses a string into a number, before the result is passed to a ‘sleep’ API call.
This tells us that case 0x62 is likely designed to notify the beacon to sleep for a certain amount of time before checking back in for new commands.
This is case 0x63, and from the above we can see that this calls ‘sub_402F80’ before flowing into and executing case 0x61. Examining ‘sub_402F80’ we don’t find much besides another call to ‘sub_402EF0’. By looking at sub_402EF0, we can see that this looks to call CreateProcessA in order to run a command sent to it.
This tells us that case 0x63 is likely designed to start a process sent down from the C2 thus executing a command tasked to the beacon.
This is case 0x64, and from the above we can see that this calls ‘sub_402BA0’ before executing case 0x61. Examining ‘sub_402BA0’ we can see that it calls ‘sub_402A20’ with some parameters including ‘lpFileName’. If we look into what ‘sub_402A20’ is doing we see some familiar calls associated with connecting to the C2 and checking the response is valid.
In addition the above shows us evidence of a file being written to disk from the response received, and an error message associated with downloading a file.
This tells us that case 0x64 is likely designed to download a file from the C2.
This is case 0x65, and from the above we can see that this calls ‘sub_402C70’ before executing case 0x61. Examining ‘sub_402C70’ we can see that it calls ‘sub_4027E0’. If we look into what ‘sub_4027E0’ is doing we can see a call to CreateFileA which in this instance looks to be getting a handle on a file before its bytes are read in a looping function and it is uploaded to the C2.
This tells us that case 0x65 is likely designed to upload a file to the C2.
This is case 0x66, and from the above we can see that this calls ‘sub_402D30’ before executing case 0x61. Examining ‘sub_402D30’ we find that this looks to be gathering information about the machine it is being run on which will be sent back to the C2.
This tells us that case 0x66 is likely designed to profile a system and send the information back to the C2.
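Pulling the six cases together, the command dispatch can be sketched as an ordinary `switch` statement, which compilers commonly lower to a jump table like the one at 0x4025C8 when the case values are dense. The handler descriptions summarise the analysis above. The function names and the "unknown" default are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Maps a command byte from the C2 to a description of the action taken.
// 0x61 through 0x66 are the ASCII codes for 'a' through 'f'.
std::string describe_command(unsigned char command) {
    switch (command) {
        case 0x61: return "delete object only";
        case 0x62: return "sleep";
        case 0x63: return "create process";
        case 0x64: return "download file";
        case 0x65: return "upload file";
        case 0x66: return "profile system";
        default:   return "unknown";  // assumed; the default path was not examined
    }
}

// Case 0x62 converts its string argument with atoi before sleeping.
int parse_sleep_argument(const char* argument) {
    assert(argument != nullptr);
    return std::atoi(argument);
}
```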
What is the purpose of this program?
If we view ‘sub_401EE0’, which is run after taking the config.dat file as a parameter, we can see this is once again creating an exception object, in addition to specifying the resources which are present for commands to be retrieved from the C2.
We also know that this sends a beacon to the C2 and has a number of operations which could occur based on the C2 server response including:
- Notifying the beacon to sleep for a specified number of seconds.
- Notifying the beacon to start an arbitrary process.
- Notifying the beacon to download a file from the C2.
- Notifying the beacon to upload a file to the C2.
- Notifying the beacon to profile a system and send the information back to the C2.
Combining this with the above analysis in questions 1-4, we can conclude that this is a remote access trojan/tool that uses an encoded config file on disk to connect to its associated C2.
This concludes chapter 20, proceed to the next chapter.