Practical Malware Analysis - Chapter 6 Lab Write-up

Chapter 6. Recognizing C Code Constructs in Assembly

Global vs. Local Variables

  • Global variables are referenced by memory addresses (e.g. dword_40CF60)
  • Local variables are referenced by stack addresses (e.g. ebp-4)

Disassembling Arithmetic Operations

The examples below outline basic arithmetic operations and how to recognize them in assembly.

Assigning variables (int a = 0; int b = 1;):

  • mov [ebp+var_4], 0 (int a = 0)
  • mov [ebp+var_8], 1 (int b = 1)

Addition of variables (a = a + 11;):

  • mov eax, [ebp+var_4] (load a into EAX)
  • add eax, 0Bh (a + 11)
  • mov [ebp+var_4], eax (store the result back into a)

Subtraction of variables (a = a - b;):

  • mov ecx, [ebp+var_4] (load a into ECX)
  • sub ecx, [ebp+var_8] (a - b)
  • mov [ebp+var_4], ecx (store the result back into a)

Decrementing variables (a--;):

  • mov edx, [ebp+var_4] (load a into EDX)
  • sub edx, 1 (a - 1)
  • mov [ebp+var_4], edx (store the result back into a)

Incrementing variables (b++;):

  • mov eax, [ebp+var_8] (load b into EAX)
  • add eax, 1 (b + 1)
  • mov [ebp+var_8], eax (store the result back into b)

Modulo variables (b = a % 3;):

  • mov eax, [ebp+var_4] (load a into EAX)
  • cdq (sign-extend EAX into EDX:EAX before dividing)
  • mov ecx, 3 (divisor 3)
  • idiv ecx (EDX:EAX / 3; quotient in EAX, remainder in EDX)
  • mov [ebp+var_8], edx (b = remainder)
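For reference, the listing above corresponds to the C statements below, wrapped in one function so the final values can be checked. This is a sketch: the function and struct names are hypothetical, and the exact registers and stack offsets depend on the compiler. The second function illustrates why cdq/idiv matters: C's % truncates toward zero just as idiv does, so the remainder takes the sign of the dividend.

```c
/* Hypothetical container for returning both variables. */
struct ab_values {
    int a;
    int b;
};

struct ab_values arithmetic_example(void)
{
    int a = 0;      /* mov [ebp+var_4], 0 */
    int b = 1;      /* mov [ebp+var_8], 1 */

    a = a + 11;     /* mov eax, [ebp+var_4] / add eax, 0Bh / mov [ebp+var_4], eax */
    a = a - b;      /* mov ecx, [ebp+var_4] / sub ecx, [ebp+var_8] / mov [ebp+var_4], ecx */
    a--;            /* mov edx, [ebp+var_4] / sub edx, 1 / mov [ebp+var_4], edx */
    b++;            /* mov eax, [ebp+var_8] / add eax, 1 / mov [ebp+var_8], eax */
    b = a % 3;      /* mov eax, [ebp+var_4] / cdq / mov ecx, 3 / idiv ecx / mov [ebp+var_8], edx */

    struct ab_values result = { a, b };
    return result;
}

/* C's % (like idiv) truncates toward zero, so the remainder keeps the
   sign of the dividend: -7 % 3 is -1, not 2. */
int c_remainder(int dividend, int divisor)
{
    return dividend % divisor;
}
```

Tracing the statements by hand gives a = 9 and b = 0 at the end of arithmetic_example.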

Recognizing if statements

An if statement must compile to a conditional jump (e.g. jnz), but not every conditional jump is an if statement.
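A minimal if/else sketch, with comments showing the shape unoptimized compilers commonly produce. One detail worth noticing: the jump condition is usually the inverse of the C condition, because jnz is used to skip the if body when the values differ. The function name and return values are hypothetical.

```c
int if_example(int x, int y)
{
    int result;
    if (x == y) {     /* cmp (x, y), then jnz else_branch */
        result = 1;   /* if body: reached by falling through the jnz */
    } else {          /* the if body ends with a jmp over this block */
        result = 0;   /* else_branch: target of the jnz */
    }
    return result;    /* both paths rejoin here */
}
```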

Recognizing nested if statements

Every if statement corresponds to a conditional jump, so in the nested example, which contains three if statements, there are three compare (cmp) and conditional jump (jnz) pairs close together.
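The sketch below has the same shape as the nested example: three if statements, and therefore three compare/conditional-jump pairs, one for x == y and one for z == 0 in each branch. The function name and the return codes are hypothetical; they just make each path observable.

```c
int nested_if_example(int x, int y, int z)
{
    if (x == y) {          /* if #1: cmp x, y / jnz outer_else */
        if (z == 0)        /* if #2: cmp z, 0 / jnz inner_else_1 */
            return 0;      /* x == y and z == 0 */
        else
            return 1;      /* x == y and z != 0 */
    } else {
        if (z == 0)        /* if #3: cmp z, 0 / jnz inner_else_2 */
            return 2;      /* x != y and z == 0 */
        else
            return 3;      /* x != y and z != 0 */
    }
}
```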

Finding for loops

A for loop has 4 components: initialisation, comparison, execution (the loop body), and increment/decrement.
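The four components map onto a loop like the one below. The comments describe the layout unoptimized compilers commonly emit (initialise, jump to the compare, then increment, compare, body, and a jump back to the increment); treat them as a typical pattern rather than a guarantee. The function name is hypothetical.

```c
int for_loop_example(int n)
{
    int total = 0;
    for (int i = 0;   /* initialise: store 0 to i's stack slot, then jmp to the compare */
         i < n;       /* compare: cmp, then jge out of the loop */
         i++) {       /* increment: load i, add 1, store it back */
        total += i;   /* execute: the body, followed by a jmp back to the increment */
    }
    return total;
}
```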

Finding while loops

A while loop looks similar to a for loop, except that it has no increment section.
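A small while loop for comparison (hypothetical function). There is a compare at the top and a jump back at the bottom, but no separate increment block: the tested variable only changes inside the body.

```c
/* Counts how many times value can be halved before it drops to 1. */
int while_loop_example(int value)
{
    int halvings = 0;
    while (value > 1) {   /* compare at the top, conditional jump out when false */
        value /= 2;       /* the body itself updates the tested variable */
        halvings++;
    }                     /* unconditional jmp back to the compare */
    return halvings;
}
```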

Function call conventions

The calling convention differs based on the compiler and other factors.

The 3 most common calling conventions are:

  • cdecl
  • stdcall
  • fastcall

cdecl convention

One of the most popular conventions. Parameters are pushed onto the stack from right to left, and the caller cleans up the stack after the function returns.

stdcall convention

Similar to cdecl, except that the callee cleans up the stack. This is the standard convention for the Windows API, so when calling API functions you don’t need to clean up the stack, because the DLLs that implement the API do it.

fastcall convention

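This convention varies the most across compilers. The first few arguments (typically two) are passed in registers, most commonly ECX and EDX, and any additional arguments are pushed right to left. The book describes the calling function as usually responsible for cleaning up the stack, although Microsoft’s documented __fastcall has the called function pop any stack arguments. It is generally faster because it relies less on the stack.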
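To make the push order and cleanup difference concrete, here is a simplified model of a 32-bit stack as an array; every name in it is hypothetical. The caller pushes two arguments right to left, a "callee" reads them, and the stack pointer is restored by whichever side the convention makes responsible. The return address pushed by call is not modelled, and neither are fastcall's register arguments.

```c
#include <stdint.h>

/* Simplified 32-bit stack: mem[i] is one 4-byte slot and esp indexes the
   top of the stack. The stack grows downward, so a push decrements esp. */
enum convention { CDECL, STDCALL };

struct stack {
    uint32_t mem[64];
    int esp;
};

static void push(struct stack *s, uint32_t value)
{
    s->esp -= 1;              /* sub esp, 4 */
    s->mem[s->esp] = value;   /* store at the new top */
}

/* Callee: because arguments were pushed right to left, the first
   argument is on top of the stack. */
static uint32_t callee_subtract(struct stack *s, enum convention conv)
{
    uint32_t first = s->mem[s->esp];       /* would be [esp+4] in real code */
    uint32_t second = s->mem[s->esp + 1];  /* would be [esp+8] */
    if (conv == STDCALL)
        s->esp += 2;                       /* like ret 8: callee removes its arguments */
    return first - second;
}

/* Caller: computes first - second through the callee. */
uint32_t call_subtract(struct stack *s, uint32_t first, uint32_t second,
                       enum convention conv)
{
    push(s, second);          /* rightmost argument is pushed first */
    push(s, first);           /* leftmost argument ends up on top */
    uint32_t result = callee_subtract(s, conv);
    if (conv == CDECL)
        s->esp += 2;          /* like add esp, 8: caller removes the arguments */
    return result;
}
```

A non-commutative operation (subtraction) is used so that getting the argument order wrong would change the result; in both conventions esp ends where it started, only the side doing the cleanup differs.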

Push vs Move

Whether parameters are pushed or moved onto the stack generally depends on the compiler or its settings. When they are pushed, there’ll be an additional instruction after the call (e.g. add esp, 8) to restore the stack pointer; when they are moved into stack space the compiler reserved in advance, that instruction won’t be present.

Switch statements

Switch statements are compiled either in an ‘if’ style or using ‘jump tables’. The if style produces multiple compare and jump instructions close to one another, where a ‘false’ result falls through to the next compare.

Jump Table

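A jump table is used when a switch statement has a large number of cases, and it avoids the need for so many compares. The switch variable is bounds-checked with a single compare; if it is in range, it is used as an index into a table of code addresses, and an indirect jump (e.g. jmp ds:off_XXXXXX[edx*4]) dynamically transfers control to the matching case.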
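The two layouts can be compared side by side in C. Both hypothetical functions below return the same values: the first mirrors the if style, the second mirrors a jump table. A real jump table holds code addresses inside one function; an array of function pointers is used here as a portable stand-in.

```c
/* If style: one compare and conditional jump per case; a failed compare
   falls through to the next one. */
int switch_if_style(int n)
{
    if (n == 0) return 10;
    if (n == 1) return 20;
    if (n == 2) return 30;
    if (n == 3) return 40;
    return -1;                          /* default case */
}

/* Hypothetical case bodies. */
static int case_0(void) { return 10; }
static int case_1(void) { return 20; }
static int case_2(void) { return 30; }
static int case_3(void) { return 40; }

/* Jump-table style: one bounds check, then an indexed, indirect transfer. */
int switch_jump_table(int n)
{
    /* Stands in for the table of code addresses IDA shows as off_XXXXXX. */
    static int (*const table[4])(void) = { case_0, case_1, case_2, case_3 };

    /* Casting to unsigned turns negative values into huge ones, so one
       compare (like cmp/ja) sends every out-of-range value to the default. */
    if ((unsigned)n > 3u)
        return -1;
    return table[n]();                  /* like jmp ds:off_XXXXXX[reg*4] */
}
```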

Disassembling Arrays

Globally and locally defined arrays look different in disassembly. A global array may have its base address as ‘dword_XXXXXX’, whereas a local array has its base address at a stack offset such as var_XX. In both cases, accessing an element multiplies the index (held in ECX in this example) by the element size (4 in the case of integers) and adds it to the base address.
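A sketch of the addressing described above, with hypothetical names: element i of either array lives at base + i * sizeof(element). On 32-bit x86 sizeof(int) is 4, which is where the ecx*4 comes from; the test uses sizeof(int) so it stays portable.

```c
#include <stddef.h>

/* Hypothetical global array: IDA would show its base as a dword_XXXXXX label. */
int global_array[5];

/* Copies i through a local array into the global one and returns the byte
   offset of global_array[i] from the base, i.e. index * sizeof(int).
   i must be in the range 0..4. */
ptrdiff_t element_offset(int i)
{
    int local_array[5] = { 0 };        /* base is a stack offset (var_XX) */
    local_array[i] = i;                /* local: something like [ebp+ecx*4+var_XX] */
    global_array[i] = local_array[i];  /* global: something like dword_XXXXXX[ecx*4] */
    return (char *)&global_array[i] - (char *)global_array;
}
```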

Identifying Structures aka Structs

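Structures are similar to arrays but can contain elements of different types, and are generally used to group related information. Their fields are accessed through a base address plus fixed offsets, and nearby data may or may not be part of the struct. ‘fld’ and ‘fstp’ are floating-point instructions, which indicate that the field being accessed is a floating-point value (a double in this example). In IDA Pro, the ‘T’ hotkey can be used to apply a structure to an operand.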
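A hypothetical struct mixing element types shows the fixed-offset pattern. With 4-byte int, y sits at offset 20 (14h), so an access through a base pointer in EAX would look like [eax+14h]. The double is padded to an aligned offset, and the fld/fstp note applies to x87 code generation (common in 32-bit builds); SSE builds use instructions such as movsd instead.

```c
#include <stddef.h>

/* Hypothetical struct with fields of different types. */
struct mixed {
    int x[5];      /* offset 0 */
    char y;        /* offset 20 (14h) when int is 4 bytes */
    double z;      /* padded to an aligned offset after y */
};

size_t offset_of_y(void) { return offsetof(struct mixed, y); }
size_t offset_of_z(void) { return offsetof(struct mixed, z); }

/* With x87 code generation this store is typically an fld/fstp pair. */
void set_z(struct mixed *s, double value)
{
    s->z = value;
}
```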

A linked list is a data structure made of data records, each containing a reference (link) to the next record in the sequence. Self-referencing assignments help to identify one: for example, a loop in which a variable holding a pointer (loaded into EAX) is overwritten with the value at [eax+4] suggests that each record holds a pointer to the next record 4 bytes in.
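A minimal sketch of that pattern, with hypothetical names. On 32-bit x86 (4-byte int and pointers) next sits at offset 4, so the `p = p->next` line compiles to something like mov eax, [ebp+var_4] / mov ecx, [eax+4] / mov [ebp+var_4], ecx, where the variable is overwritten by a value read through itself.

```c
#include <stddef.h>

struct node {
    int value;           /* offset 0 */
    struct node *next;   /* offset 4 on 32-bit x86 */
};

int sum_list(const struct node *p)
{
    int total = 0;
    while (p != NULL) {
        total += p->value;
        p = p->next;     /* load the pointer stored inside the current node */
    }
    return total;
}
```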

Lab 6-1

Analyze Lab06-01.exe

Question 1

What is the major code construct found in the only subroutine called by main?

Answer 1

Starting at the main function, we can see that the subroutine sub_401000 is called.

[Screenshot: Lab06-01.exe]

Moving into this routine, we see a compare followed by a JZ conditional jump, and the graph view confirms that this is an ‘if’ code construct.

[Screenshot: Lab06-01.exe]

Question 2

What is the subroutine located at 0x40105F?

Answer 2

By looking at the values pushed to the stack before this function is called, we can begin to assume that this subroutine prints some sort of message terminated by a line feed (\n).

[Screenshot: Lab06-01.exe]

If we dig slightly into this function, we can see references to ‘__stbuf’ and ‘__ftbuf’. These are Microsoft C runtime helpers that set up and then flush a temporary buffer for a stream such as stdout, and a quick search shows them declared in the header internal.h within the .NET Core runtime source.

[Screenshot: .NET Core Runtime - internal.h]

From this we can see that the function deals with stream input/output, and by comparing this with how the function is used here, we can make an informed decision that it is in fact ‘printf’ (print formatted), which writes a formatted string to stdout.

[Screenshot: Lab06-01.exe]

Question 3

What is the purpose of this program?

Answer 3

Based on the success and error messages pushed to the stack, this program checks whether you are connected to the internet and prints a corresponding message; if you are connected, it also returns the number ‘1’.

Lab 6-2

Analyze Lab06-02.exe

Question 1

What operation does the first subroutine called by main perform?

Answer 1

Taking a look at the main function, we can see that it calls ‘sub_401000’.

[Screenshot: Lab06-02.exe]

This subroutine calls ‘InternetGetConnectedState’, compares its return value with 0, and then branches to one of 2 messages depending on whether an internet connection is present. As such, this operation is an ‘if’ statement that checks for an active internet connection.

[Screenshot: Lab06-02.exe]

Question 2

What is the subroutine located at 0x40117F?

Answer 2

If we look at this subroutine, we notice that it is almost identical to the one we saw in Lab06-01.exe, and once again the string pushed to the stack just before main calls it leads us to believe this is ‘printf’. In this instance, the use of ‘%c’ in that string also helps back up this inference.

[Screenshot: Lab06-02.exe]

Question 3

What does the second subroutine called by main do?

Answer 3

We can see that the second subroutine is ‘sub_401040’. Looking at it, we can see that it attempts to open the URL http://www.practicalmalwareanalysis.com/cc.htm and, if successful, reads the first 0x200 (512) bytes into a buffer.

[Screenshot: Lab06-02.exe]

Question 4

What type of code construct is used in this subroutine?

Answer 4

If we look under the call to ‘InternetReadFile’ we can see 2 different pathways, one where it was able to read the HTML ‘file’, and one where it failed.

[Screenshot: Lab06-02.exe]

If we look at the code under the path where it successfully read the HTML file, we can see four consecutive compares, each checking one character from the buffer. This indicates that an array of characters is being parsed, and converting the compared hex values to ASCII with the ‘R’ hotkey reveals some interesting characters.

[Screenshot: Lab06-02.exe]

In HTML, ‘<!--’ marks the start of a comment, so the program is checking whether the downloaded page begins with an HTML comment.
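The check can be sketched in C as four single-character compares followed by returning the next byte. This is a reconstruction under assumptions: the function name is hypothetical, it takes a NUL-terminated string rather than the raw 0x200-byte buffer, and it returns 0 where the real program prints "Error 2.3: Fail to get command\n".

```c
/* Returns the character that follows "<!--" at the start of the buffer,
   or 0 if the buffer does not start with an HTML comment. */
char parse_command(const char *buffer)
{
    /* Short-circuiting stops at the first mismatch, so a short string's
       terminating NUL ends the checks before reading past it. */
    if (buffer[0] == '<' &&
        buffer[1] == '!' &&
        buffer[2] == '-' &&
        buffer[3] == '-')
        return buffer[4];
    return 0;
}
```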

Question 5

Are there any network-based indicators for this program?

Answer 5

As shown in question 3, there are 2 obvious network indicators in this program: the URL to be opened and the User-Agent.

  • http://www.practicalmalwareanalysis.com/cc.htm
  • Internet Explorer 7.5/pma

Question 6

What is the purpose of this malware?

Answer 6

The purpose of this malware is to check for an active internet connection. If there is one, it tries to open the URL http://www.practicalmalwareanalysis.com/cc.htm using the User-Agent ‘Internet Explorer 7.5/pma’; if that succeeds, it reads the first 0x200 (512) bytes into a buffer, otherwise it terminates. It then checks the buffer for the characters ‘<!--’, and if they don’t exist, it prints the error message “Error 2.3: Fail to get command\n”.

[Screenshot: Lab06-02.exe]

If these do exist, it will print the message “Success: Parsed command is %c\n”, where %c is the character read from the HTML comment buffer. We can also see this will wait 60000 milliseconds (1 minute) before terminating.

Lab 6-3

Analyze Lab06-03.exe

Question 1

Compare the calls in main to lab 6-2’s main method. What is the new function called from main?

Answer 1

Examining the calls in Lab 6-2 and Lab 6-3 shows a number of similarities; however, Lab 6-3 contains 1 new subroutine, called “sub_401130”.

[Screenshot: Lab06-03.exe]

Question 2

What parameters does this new function take?

Answer 2

Looking at the function, we can see that it takes 2 parameters: a ‘char’ value and an ‘LPCSTR’ value (long pointer to a constant string).

[Screenshot: Lab06-03.exe]

If we examine how it is invoked, we can see 2 items pushed to the stack before the call: argv and var_8.

[Screenshot: Lab06-03.exe]

In this instance argv represents argv[0], which points to the name of the program. We can also see that var_8 is set from AL at 0x401228. AL is the lower 8 bits of EAX, which holds the return value of sub_401040, so var_8 receives a single byte (a character in this case). Given that sub_401040 reads bytes into a buffer and then checks for ‘<!--’, the char passed to this function is the byte that follows ‘<!--’.
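The effect of storing AL can be modelled in a couple of lines: only the low 8 bits of EAX survive, so whatever sits in the upper 24 bits is discarded. The function name is hypothetical.

```c
#include <stdint.h>

/* Models mov [ebp+var_8], al: keep only the low byte of the return register. */
char store_al(uint32_t eax)
{
    return (char)(eax & 0xFFu);
}
```

For example, a return value of 0x12345661 in EAX leaves 0x61 (‘a’) in var_8.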

[Screenshot: Lab06-03.exe]

Question 3

What major code construct does this function contain?

Answer 3

Looking at this function, we can see that it contains a switch statement implemented with a jump table. This is indicated by the absence of the repeated compare and jump instructions seen in an ‘if-else’ style switch; instead, there is a single compare and an indirect jump that can lead to multiple locations.

[Screenshot: Lab06-03.exe]

Question 4

What can this function do?

Answer 4

Breaking this function down, we can see that it subtracts ‘a’ (61h) from the character passed to it, so ‘a’ becomes 0 and selects the first jump case, ‘b’ becomes 1 and selects the second, and so forth.

Depending on which letter between ‘a’ and ‘e’ follows ‘<!--’ in the HTML, this function runs a different set of commands.

[Screenshot: Lab06-03.exe]

  • a = The program creates a directory at C:\Temp if it doesn’t exist
  • b = The program copies the file named by lpExistingFileName, which we know is the program itself (argv[0]), to C:\Temp\cc.exe
  • c = The program deletes the file located at C:\Temp\cc.exe if it exists
  • d = The program creates persistence using the subkey Software\Microsoft\Windows\CurrentVersion\Run with the value ‘Malware’ pointing to C:\Temp\cc.exe
  • e = The program sleeps for 100,000 milliseconds (100 seconds)

If the command character is not between ‘a’ and ‘e’, the default case displays an error message:

  • Error 3.2: Not a valid command provided
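The dispatch logic can be sketched as: subtract ‘a’, bounds-check once, then index a table. This is a simplified reconstruction; the names are hypothetical, the table holds short descriptions of the cases listed above instead of code addresses, and no file or registry operations are performed.

```c
/* Descriptions of the five cases, indexed by command - 'a'. */
static const char *const actions[] = {
    "create directory C:\\Temp",                      /* 'a' */
    "copy self to C:\\Temp\\cc.exe",                  /* 'b' */
    "delete C:\\Temp\\cc.exe",                        /* 'c' */
    "set Run value Malware to C:\\Temp\\cc.exe",      /* 'd' */
    "sleep 100000 ms",                                /* 'e' */
};

const char *describe_command(char command)
{
    unsigned slot = (unsigned)(command - 'a');   /* subtract 61h */
    if (slot > 4u)                               /* one compare: out of range goes to default */
        return "Error 3.2: Not a valid command provided";
    return actions[slot];                        /* indirect jump through the table */
}
```

Because the subtraction result is treated as unsigned, characters below ‘a’ wrap around to large values, so a single compare rejects both ends of the range.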

Question 5

Are there any host-based indicators for this malware?

Answer 5

The host based indicators from this are the file the malware will copy itself to, and the registry key used for persistence.

Note: A quick search reveals that 80000002h, which is pushed to the stack as the root key for the registry call, is the value of HKEY_LOCAL_MACHINE, so the Run value is written under HKLM rather than the current user’s registry hive.

  • C:\Temp\cc.exe
  • HKLM\Software\Microsoft\Windows\CurrentVersion\Run /v Malware
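When a registry root key shows up as an immediate value, a small lookup like the one below (hypothetical helper) maps it back to a name. The constants are the predefined handle values from the Windows SDK’s winreg.h.

```c
#include <stdint.h>

const char *hkey_name(uint32_t value)
{
    switch (value) {
    case 0x80000000u: return "HKEY_CLASSES_ROOT";
    case 0x80000001u: return "HKEY_CURRENT_USER";
    case 0x80000002u: return "HKEY_LOCAL_MACHINE";
    case 0x80000003u: return "HKEY_USERS";
    default:          return "unknown";
    }
}
```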

Question 6

What is the purpose of this malware?

Answer 6

The purpose of this malware is to check for an active internet connection. If there is one, it tries to open the URL http://www.practicalmalwareanalysis.com/cc.htm using the User-Agent ‘Internet Explorer 7.5/pma’; if that succeeds, it reads the first 0x200 (512) bytes into a buffer, otherwise it terminates. It then checks the buffer for the characters ‘<!--’, and if they don’t exist, it prints the error message “Error 2.3: Fail to get command\n”. If they do exist, it prints the message “Success: Parsed command is %c\n”, where %c is the character that follows ‘<!--’ in the HTML comment.

Based on the parsed command between the letters a and e, the program will either:

  • Create a directory at C:\Temp if it doesn’t exist
  • Copy the file named by lpExistingFileName, which we know is the program itself, to C:\Temp\cc.exe
  • Delete the file located at C:\Temp\cc.exe if it exists
  • Create persistence using the subkey HKLM\Software\Microsoft\Windows\CurrentVersion\Run with the value ‘Malware’ pointing to C:\Temp\cc.exe
  • Sleep for 100,000 milliseconds (100 seconds)
  • Display the error message “Error 3.2: Not a valid command provided”

Lab 6-4

Analyze Lab06-04.exe

Question 1

What is the difference between the calls made from the main method in Labs 6-3 and 6-4?

Answer 1

In Lab 6-3 the calls directly from the main method consist of:

  • sub_401000
  • sub_401040
  • sub_401271
  • sub_401130
  • Sleep

[Screenshot: Lab06-04.exe]

In Lab 6-4 the calls directly from the main method consist of:

  • sub_401000
  • sub_401040
  • sub_401150 (differs)
  • sub_4012B5 (differs)
  • Sleep

[Screenshot: Lab06-04.exe]

Looking into these methods we find that:

  • sub_401000 = Check for internet connection
  • sub_401040 = HTML C2 parsing function. Note: in 06-04 the User-Agent has changed.

[Screenshot: Lab06-04.exe]

  • sub_401150 (differs) = Jump Table switch statement as previously identified in Lab 06-03 sub_401130 to control actions.

[Screenshot: Lab06-04.exe]

  • sub_4012B5 (differs) = printf as previously identified in Lab 06-03 sub_401271.

[Screenshot: Lab06-04.exe]

Question 2

What new code construct has been added to main?

Answer 2

Looking into the main method, we can see one clear change: the addition of a loop. In this case it is a for loop, indicated by the flow returning to an incremented counter variable that is compared against 5A0h (1440 in decimal).
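A skeleton of the loop, assuming the shape described in this write-up: a counter from 0 up to 5A0h (1440) that is passed to the parse-HTML function, a command dispatch, and a 60,000 ms sleep per iteration. The callbacks and stubs are hypothetical stand-ins so the control flow can run offline, the sleep is only accumulated rather than performed, and stopping when no command is parsed is an assumption. The connection check is omitted.

```c
/* Hypothetical callbacks for the parse-HTML function (which now takes the
   loop counter) and the command dispatcher. */
struct loop_callbacks {
    char (*parse_html)(int counter);
    void (*dispatch)(char command);
};

/* Returns the number of completed iterations; the total sleep time is
   accumulated in *slept_ms. */
int run_main_loop(const struct loop_callbacks *cb, long *slept_ms)
{
    int i;
    *slept_ms = 0;
    for (i = 0; i < 0x5A0; i++) {          /* counter compared against 5A0h */
        char command = cb->parse_html(i);  /* counter feeds the User-Agent's %d */
        if (command == 0)
            break;                         /* assumed: stop when no command is parsed */
        cb->dispatch(command);
        *slept_ms += 60000;                /* Sleep(0EA60h), counted only */
    }
    return i;
}

/* Example stubs for exercising the skeleton offline. */
static int dispatch_count = 0;
static char always_e(int counter) { (void)counter; return 'e'; }
static char never_parses(int counter) { (void)counter; return 0; }
static void count_dispatch(char command) { (void)command; dispatch_count++; }
```

With a stub that always returns a command, the loop completes 1440 iterations and accumulates 86,400,000 ms of sleep.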

[Screenshot: Lab06-04.exe]

Question 3

What is the difference between this lab’s parse HTML function and those of the previous labs?

Answer 3

Looking at this lab’s parse HTML function, the first difference we can see is that it now takes an argument, as shown by the reference to arg_0, and uses a new variable, szAgent.

[Screenshot: Lab06-04.exe]

Looking further, we can see that this argument is used in the User-Agent, which we previously identified as changed: szAgent is populated with the formatted User-Agent value.

[Screenshot: Lab06-04.exe]

This differs from the previous labs, where the function didn’t take any argument and the User-Agent never changed.

Question 4

How long will this program run? (Assume that it is connected to the internet.)

Answer 4

The for loop in this case compares its counter against the value 1440. The loop body itself runs almost instantly; however, the program also sleeps for 0EA60h (60,000) milliseconds in each iteration.

[Screenshot: Lab06-04.exe]

Because of this, we can estimate that the program will run for 1440 * 60,000 milliseconds = 86,400,000 milliseconds.

This number is hard to interpret, so let’s convert it: 60,000 milliseconds is 1 minute, so the program sleeps for 1440 minutes in total, and dividing by 60 shows that it will run for about 24 hours.

Question 5

Are there any new network-based indicators for this malware?

Answer 5

As mentioned in question 3, this malware has a new User-Agent:

Internet Explorer 7.50/pma%d

This is a new indicator, where %d is replaced by var_C, the loop counter, which roughly equals the number of minutes that have passed since the program started. Each request therefore reveals how long the program has been running.
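A sketch of how that User-Agent string is produced for a given counter value, using the format string recovered from the binary. The helper name and buffer size are hypothetical, and snprintf is used here for a bounded write; the sample’s own formatting call may differ.

```c
#include <stdio.h>

/* Writes the User-Agent for a given loop counter into out. */
int build_user_agent(char *out, size_t out_size, int counter)
{
    return snprintf(out, out_size, "Internet Explorer 7.50/pma%d", counter);
}
```

On the first iteration this yields "Internet Explorer 7.50/pma0", and on the last "Internet Explorer 7.50/pma1439", so a network signature should match the prefix rather than a fixed string.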

Question 6

What is the purpose of this malware?

Answer 6

The purpose of this malware is to check for an active internet connection. If there is one, it tries to open the URL http://www.practicalmalwareanalysis.com/cc.htm using the User-Agent ‘Internet Explorer 7.50/pma%d’, where %d is filled in from an incrementing loop counter (used to track how long the program has been running).

If that succeeds, it reads the first 0x200 (512) bytes into a buffer; otherwise it terminates. It then checks the buffer for the characters ‘<!--’, and if they don’t exist, it prints the error message “Error 2.3: Fail to get command\n”.

If they do exist, it prints the message “Success: Parsed command is %c\n”, where %c is the character that follows ‘<!--’ in the HTML comment.

Based on the parsed command between the letters a and e, the program will either:

  • Create a directory at C:\Temp if it doesn’t exist
  • Copy the file named by lpExistingFileName, which we know is the program itself, to C:\Temp\cc.exe
  • Delete the file located at C:\Temp\cc.exe if it exists
  • Create persistence using the subkey HKLM\Software\Microsoft\Windows\CurrentVersion\Run with the value ‘Malware’ pointing to C:\Temp\cc.exe
  • Sleep for 100,000 milliseconds (100 seconds)
  • Display the error message “Error 3.2: Not a valid command provided”

Based on the loop counter, this program will run for about 24 hours before terminating. By renaming these functions and viewing the main method’s flow chart, we can easily see the flow of this malware.

[Screenshot: Lab06-04.exe]

This concludes chapter 6; proceed to the next chapter.