Chapter 5. IDA Pro
Interactive Disassembler Professional:
- Options > General > Line Prefixes (with Opcode Bytes set to 6) = shows memory locations and opcode values in Graph Mode.
- Options > General > Auto Comments = adds comments describing instructions, useful when learning assembly.
- L Function flag = Library and can be skipped.
- Functions: Associates flags with each function.
- Names: Every address and name including functions, data, strings, named code.
- Strings: By default, shows only ASCII strings longer than 5 characters.
- Imports: Lists all imports.
- Exports: Lists all exported functions.
- Structures: Create own data structures or list layout of all data structures.
Types of links:
- Sub links: Links to the start of functions
- Loc links: Links to jump destinations
- Offset links: Links to an offset in memory
This lab uses the file Lab05-01.dll. Analyse this using IDA Pro.
What is the address of DllMain?
Upon loading the DLL into IDA we arrive at the DllMain function, and can see the address in the functions window, or by turning on Line Prefixes under Options > General > Line Prefixes.
Use the Imports window to browse to gethostbyname. Where is the import located?
By viewing the imports, searching for gethostbyname, and noting its address, we find that this import is located at 0x100163CC within the .idata section.
How many functions call gethostbyname?
By using the jump to xrefs function while highlighting gethostbyname, we are able to find that it is called 9 times, by 5 different subroutines/functions.
Focusing on the call to gethostbyname located at 0x10001757, can you figure out which DNS request will be made?
By using the jump to address function, we’re able to specify the address to jump to and in this case can specify 0x10001757.
Looking at the instructions before this function call, we can see that an offset is moved into EAX before 0Dh is added to it.
Following this offset, we can see that EAX initially points to address 0x10019194 within the data section, which contains: [This is RDO]pics.praticalmalwareanalysis.com
Converting 0Dh from hex to decimal gives us 13, and the first 13 characters of the string are [This is RDO].
Because the pointer (EAX) is moved along 13 bytes, we now know that the DNS request will be made for pics.praticalmalwareanalysis.com.
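To make the pointer arithmetic concrete, here is a minimal Python sketch. The buffer contents are copied from the string described above; the name skip_prefix is ours and purely illustrative, it is not anything found in the malware.

```python
# Contents of the buffer at 0x10019194 as described in the text, NUL-terminated.
RAW = b"[This is RDO]pics.praticalmalwareanalysis.com\x00"

# The value added to EAX before the call to gethostbyname.
PREFIX_LEN = 0x0D


def skip_prefix(buf: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts `offset` bytes into `buf`.

    Hypothetical helper: it mimics advancing a C string pointer by `offset`.
    """
    tail = buf[offset:]
    end = tail.find(b"\x00")
    if end != -1:
        tail = tail[:end]
    return tail.decode("ascii")


hostname = skip_prefix(RAW, PREFIX_LEN)
print(hostname)
```

Slicing at offset 13 drops exactly the [This is RDO] marker, which is the same effect as the add instruction has on the pointer.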
How many local variables has IDA Pro recognized for the subroutine at 0x10001656?
Jumping to the address at 0x10001656 we can find 20 different variables which IDA Pro Free has identified; however, it is important to note that a paid version of this product may identify more variables.
How many parameters has IDA Pro recognized for the subroutine at 0x10001656?
Looking at the top of the same function, we can see that arg_0 has been identified, which indicates that this subroutine expects one argument, and as a result 1 parameter.
Use the Strings window to locate the string \cmd.exe /c in the disassembly. Where is it located?
Using ALT + T we can search through the strings window for cmd.exe, and we can find where it is located.
What is happening in the area of code that references \cmd.exe /c?
By following the xref to the subroutine which references \cmd.exe /c, we’re able to scroll through the function and see a number of interesting values being pushed to the stack; in this case the values quit, exit, and cd catch our eyes.
Continuing on, we can see entries such as idle, uptime, mmodule, minstall, and inject.
Finally, if we look around this function we find the char array aHiMasterDDDDDD, which mentions a ‘Remote Shell Session’, and as such we can infer we’re looking at a remote shell session function.
In the same area, at 0x100101C8, it looks like dword_1008E5C4 is a global variable that helps decide which path to take. How does the malware set dword_1008E5C4? (Hint: Use dword_1008E5C4’s cross-references.)
Starting at address 0x100101C8, we can see a cmp instruction comparing ebx to dword_1008E5C4. Viewing the cross-references to this variable, we find one that contains the mov instruction that sets its value.
Following this, we can see that the return value of sub_10003695 is moved directly into dword_1008E5C4.
Looking into this routine, we find that it compares dwPlatformId to the value ‘2’.
Microsoft’s documentation for the dwPlatformId field tells us that the value 2 (VER_PLATFORM_WIN32_NT) indicates the operating system is Windows NT or later.
As such we now know the malware will take a different path depending on whether the operating system is Windows NT or later.
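As a rough illustration of that check, the sketch below models the comparison in Python. The three constants are the documented dwPlatformId values; the function name is_windows_nt is our own label, not a name from the DLL, and the real routine obtains the platform ID from the Windows version API rather than taking it as a parameter.

```python
# Documented values for the dwPlatformId field.
VER_PLATFORM_WIN32s = 0
VER_PLATFORM_WIN32_WINDOWS = 1
VER_PLATFORM_WIN32_NT = 2


def is_windows_nt(dw_platform_id: int) -> bool:
    """Hypothetical stand-in for the comparison of dwPlatformId against 2."""
    return dw_platform_id == VER_PLATFORM_WIN32_NT


# The result plays the role of the flag stored in dword_1008E5C4.
for platform_id in (VER_PLATFORM_WIN32_WINDOWS, VER_PLATFORM_WIN32_NT):
    print(platform_id, is_windows_nt(platform_id))
```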
A few hundred lines into the subroutine at 0x1000FF58, a series of comparisons use memcmp to compare strings. What happens if the string comparison to robotwork is successful (when memcmp returns 0)?
Looking into this routine, we find the comparison against “robotwork”, which is followed by a JNZ branch.
JNZ jumps only if the Zero Flag is NOT set. The Zero Flag is set when the result of the preceding operation is zero, and memcmp returns 0 when the two buffers match. So when the return value is tested, a successful comparison sets the Zero Flag, while a failed comparison leaves it clear.
Because of this, if memcmp returns 0 the jump is NOT taken, and we end up running a call to the subroutine sub_100052A2, so let’s take a look into it.
From this we can see that it opens the registry key HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion. The JZ that follows checks whether the key was opened successfully. Registry functions return 0 (ERROR_SUCCESS) on success, so a successful open sets the Zero Flag. Unlike JNZ, JZ jumps when the Zero Flag IS set, so let’s follow loc_10005309.
Here we can see it querying the WorkTime and WorkTimes registry values. If we look back at the start of sub_100052A2, we can see that it takes an argument of type SOCKET named ‘s’. Looking back at the call site from the start of this question, we can see that it pushes ebp+s, which indicates that this information is sent back over the passed network socket.
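The Zero Flag logic is easy to get backwards, so here is a small Python model of it. Everything in it is a simplification: memcmp is re-implemented by hand, the helper names are ours, and we assume the return value is tested against zero immediately before the conditional jump.

```python
def memcmp(a: bytes, b: bytes, n: int) -> int:
    """Simplified memcmp: 0 if the first n bytes match, non-zero otherwise."""
    a, b = a[:n], b[:n]
    if a == b:
        return 0
    return -1 if a < b else 1


def zero_flag(result: int) -> bool:
    """The Zero Flag is set when the tested value is zero."""
    return result == 0


def jnz_taken(zf: bool) -> bool:
    """JNZ jumps when the Zero Flag is clear."""
    return not zf


def jz_taken(zf: bool) -> bool:
    """JZ jumps when the Zero Flag is set."""
    return zf


def next_step(command: bytes) -> str:
    """Describe which way the JNZ after the robotwork comparison goes."""
    zf = zero_flag(memcmp(command, b"robotwork", 9))
    return "jump past the call" if jnz_taken(zf) else "call sub_100052A2"


print(next_step(b"robotwork"))
print(next_step(b"somethingelse"))
```

A matching string yields 0, which sets the Zero Flag, so JNZ falls through to the call. The same reasoning explains the later JZ: a return value of 0 sets the flag and the jump is taken.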
What does the export PSLIST do?
By looking into the exports within this DLL we can find PSLIST. Following this and pressing SPACE leads us to the IDA Graph view.
From here we can see that one of 2 paths will be taken depending on the result of sub_100036C3, so let’s dive a bit deeper there.
Once again we can see this checks whether the operating system is Windows NT or later; however, even if it is, it then checks whether its major version is 5. So let’s look at what this represents in the documentation for the OSVERSIONINFOEXW structure.
Major version 5 corresponds to Windows 2000, Windows XP, and Windows Server 2003, so we now know it is checking whether the OS is one of these versions. Depending on the result it will run either sub_10006518 or sub_1000664C.
Taking a closer look at sub_10006518, we can see from the API call to CreateToolhelp32Snapshot, the strings, and the export name that this retrieves a process listing.
Looking further at sub_1000664C, we can see that it performs the same type of calls as sub_10006518; however, it is also passed a reference to the socket to send the output back over.
Use the graph mode to graph the cross-references from sub_10004E79. Which API functions could be called by entering this function? Based on the API functions alone, what could you rename this function?
By using G to go to the address of interest (in this case sub_10004E79), we can then click View > Graphs > XRefs From to see a number of API functions within this function.
Based on this we can infer that the function most likely sends the system default language identifier over a network socket, and as such we could rename it to LanguageIdentifier_Send.
How many Windows API functions does DllMain call directly? How many at a depth of 2?
By clicking View > Graphs > User xrefs chart, and then adjusting the settings to start and end at the function DllMain with a depth of 1, we’re able to see 4 Windows API functions.
If we expand this to a depth of 2, the chart blows out in size and we’re looking at 33 API functions, including duplicates.
At 0x10001358, there is a call to Sleep (an API function that takes one parameter containing the number of milliseconds to sleep). Looking backward through the code, how long will the program sleep if this code executes?
Moving back from the call to Sleep, we can see that EAX is multiplied by 1000 before being pushed onto the stack as the argument to Sleep. This matches the reference to milliseconds, in that there are 1000 milliseconds in a second.
If we follow the earlier reference to off_10019020, we see that it points to the data at unk_100192AC.
At present this is a bit confusing, as IDA shows it as individual bytes rather than one string, so we go ahead and convert it to the string it is supposed to be.
We can now see that it has the value [This is CTI]30, which is much clearer.
Looking back at the instructions, 0Dh (13) is then added to EAX, which moves the pointer past the text ‘[This is CTI]’ leaving only ‘30’.
The call to atoi then converts this to a number before it is multiplied by 1000, and as such the program will sleep for 30 seconds if this executes.
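The whole calculation fits in a few lines of Python. This is only a sketch of the arithmetic: int() stands in for atoi (the two differ on malformed input), and the name sleep_milliseconds is ours.

```python
# String found at unk_100192AC once converted to ASCII.
CONFIG = "[This is CTI]30"


def sleep_milliseconds(config: str, prefix_len: int = 0x0D) -> int:
    """Skip the 13-character marker, parse the number, convert to ms."""
    seconds = int(config[prefix_len:])  # stands in for atoi
    return seconds * 1000               # the multiplication by 1000


delay_ms = sleep_milliseconds(CONFIG)
print(delay_ms, "ms =", delay_ms // 1000, "seconds")
```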
At 0x10001701 is a call to socket. What are the three parameters?
Looking at this address we can see a call to socket which takes 3 parameters (protocol, type, and af) all of which are pushed to the stack prior to the call.
Using the MSDN page for socket and the named symbolic constants functionality in IDA Pro, can you make the parameters more meaningful? What are the parameters after you apply changes?
By looking at the MSDN documentation for the socket function, we can find what these numbers correspond to.
By right clicking and selecting Use Standard Symbolic Constant, we’re able to quickly change these to accurately reflect their assigned values.
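The lookup IDA performs here can be sketched as a simple table. The tables below hold a few well-known Winsock constant values. Note that the numeric arguments 2, 1, and 6 are an assumption on our part: they are the values usually reported for this call, but they are not shown in the text above. The function name describe_socket_call is also ours.

```python
# A small subset of Winsock constants, keyed by numeric value.
ADDRESS_FAMILIES = {2: "AF_INET", 23: "AF_INET6"}
SOCKET_TYPES = {1: "SOCK_STREAM", 2: "SOCK_DGRAM", 3: "SOCK_RAW"}
PROTOCOLS = {1: "IPPROTO_ICMP", 6: "IPPROTO_TCP", 17: "IPPROTO_UDP"}


def describe_socket_call(af: int, sock_type: int, protocol: int) -> tuple:
    """Map raw socket() arguments to symbolic names, keeping unknowns as hex."""
    return (
        ADDRESS_FAMILIES.get(af, hex(af)),
        SOCKET_TYPES.get(sock_type, hex(sock_type)),
        PROTOCOLS.get(protocol, hex(protocol)),
    )


# Assumed values for af, type, protocol (not taken from the text above).
names = describe_socket_call(2, 1, 6)
print(names)
```

With those assumed values, the call would describe a TCP socket over IPv4.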
Search for usage of the in instruction (opcode 0xED). This instruction is used with a magic string VMXh to perform VMware detection. Is that in use in this malware? Using the cross-references to the function that executes the in instruction, is there further evidence of VMware detection?
By searching for ‘ED’ as a sequence of bytes (ALT+B) we can find only one occurrence of the instruction ‘in’.
Diving into this function we can see it is checking for the value VMXh which indicates this malware is implementing a known anti VM technique.
Looking at the Xrefs to this function we can see a reference to locating a VM in use and cancelling installation.
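The byte search can be imitated outside IDA with a few lines of Python. The sample bytes below are hand-assembled by us to resemble a typical VMware backdoor check; they are not copied from Lab05-01.dll. Bear in mind that a raw search for ED also matches data and operand bytes, which is why each hit still needs to be inspected.

```python
import struct
from typing import List

# 'VMXh' as it appears when used as a 32-bit immediate (little-endian).
VMX_MAGIC = struct.pack("<I", 0x564D5868)
IN_OPCODE = bytes([0xED])  # in eax, dx


def find_all(data: bytes, needle: bytes) -> List[int]:
    """Return every offset at which `needle` occurs in `data`."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


# Hypothetical sample: mov eax, 564D5868h / mov ebx, 0 / mov ecx, 0Ah /
# mov edx, 5658h / in eax, dx
SAMPLE = (
    b"\xB8" + VMX_MAGIC
    + b"\xBB\x00\x00\x00\x00"
    + b"\xB9\x0A\x00\x00\x00"
    + b"\xBA\x58\x56\x00\x00"
    + IN_OPCODE
)

print("in opcode at:", find_all(SAMPLE, IN_OPCODE))
print("VMXh magic at:", find_all(SAMPLE, VMX_MAGIC))
```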
Jump your cursor to 0x1001D988. What do you find?
If we jump here using ‘G’ we find a bunch of seemingly random data.
If you have the IDA Python plug-in installed (included with the commercial version of IDA Pro), run Lab05-01.py, an IDA Pro Python script provided with the malware for this book. (Make sure the cursor is at 0x1001D988.) What happens after you run the script?
At present we’re running this through the free version of IDA Pro, so we’ll be unable to do this; however, let’s open up the script and see what it is doing.
From this we can see it will loop from our current position (0x1001D988) over 0x50 (80) bytes and XOR each of them with 0x55. From this we can infer that the script will de-obfuscate the seemingly random data.
With the cursor in the same location, how do you turn this data into a single ASCII string?
This can be done by pressing A at the start of the data, as we did earlier with [This is CTI]30. Converting the data to ASCII strings still leaves us with gibberish, because each byte still needs to be XORed.
We can also see some bytes rendered as hex, indicated by the ,27h,’ elements. By removing these and running XOR over the concatenated strings using CyberChef (https://gchq.github.io/CyberChef/), we get a hidden message.
Open the script with a text editor. How does it work?
Opening the script in a text editor, we can see that it loops from our cursor position (0x1001D988) over 0x50 (80) bytes and XORs each value individually with 0x55. Our manual decode appears to have lost its capitalisation somewhere along the line; the message should read “xdoor is this backdoor, string decoded for Practical Malware Analysis Lab :)1234”. Avoiding this kind of slip is yet another benefit of running the Python script; however, the conclusion still stands based on how the script works.
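Since we can’t run the IDA script here, the same decoding idea can be sketched in plain Python. This is not the book’s script: it works on a bytes object instead of patching the IDA database, the name xor_bytes is ours, and we assume the loop bound is hexadecimal 0x50 (80 bytes), which matches the 80-character decoded message. The demo obfuscates the known plaintext and then decodes it again, because XOR with the same key is its own inverse.

```python
KEY = 0x55     # single-byte XOR key
LENGTH = 0x50  # assumed number of bytes processed (80)


def xor_bytes(data: bytes, key: int = KEY, length: int = LENGTH) -> bytes:
    """XOR the first `length` bytes of `data` with `key`; leave the rest."""
    return bytes(b ^ key for b in data[:length]) + data[length:]


PLAINTEXT = (
    b"xdoor is this backdoor, string decoded for "
    b"Practical Malware Analysis Lab :)1234"
)

# Round trip: applying the XOR twice restores the original bytes.
obfuscated = xor_bytes(PLAINTEXT)
decoded = xor_bytes(obfuscated)
print(decoded.decode("ascii"))
```

Working on raw bytes like this also keeps the capitalisation intact, which is where our manual CyberChef attempt went wrong.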
This concludes chapter 5, proceed to the next chapter.