Malware Analysis Tutorial 10: Tricks for Confusing Static Analysis Tools
Learning Goals:

360 Mobile Vision - North & South Carolina Security products and Systems Installations for Commercial and Residential - $55 Hourly Rate. ACCESS CONTROL, INTRUSION ALARM, ACCESS CONTROLLED GATES, INTERCOMS AND CCTV INSTALL OR REPAIR 360 Mobile Vision - is committed to excellence in every aspect of our business. We uphold a standard of integrity bound by fairness, honesty and personal responsibility. Our distinction is the quality of service we bring to our customers. Accurate knowledge of our trade combined with ability is what makes us true professionals. Above all, we are watchful of our customers interests, and make their concerns the basis of our business.
  1. Explore Use of Stack for Supporting Function Calls
  2. Practice Reverse Engineering

Applicable to:

  1. Operating Systems.
  2. Computer Security.
  3. Programming Language Principles.

1. Introduction
This tutorial explores several tricks employed by Max++ for confusing static analysis tools. These tricks effectively prevent static program analysis tools that plot call graph and extract system call invocation information of the malware. By “static” we mean that the tool does not actually execute/run the malware. Most “smart” virus scanners are static analysis tools. Many of them employ heuristics to tell if a binary executable is malicious or not by examining the collection of the system function calls in that binary. For example, if a binary invokes too many operations related to registry, then an alert should be flagged. If such analysis can be blocked, the malware can significantly improve its survival rate. Note that, however, such tricks cannot block “dynamic” tools which actually run the malware (typical examples include CWSandBox and Anubis).

2. Lab Configuration
You can either continue from Tutorial 9, or follow the instructions below to set up the lab. Refer to Tutorial 1 and Tutorial 4 for setting up VBOX instances and WinDbg. We will analyze the function starting at 0x4014F9. 

Figure 1. Function 0x4014F9 to Analyze

(1) In code pane, right click and go to expression “0x40105c”
(2) right click and then “breakpoints -> hardware, on execution”
(3) Press F9 to run to 0x40105c
(4) If you see a lot of DB instructions, select them and right click -> “During next analysis treat them as Command”.
(5) Exit from IMM and restart it again and run to 0x40105c. Select the instructions (about 1 screen) below 0x40105c, right click -> Analysis-> Analyze Code. You should be able to see all loops now identified by IMM.
(6) Now go to 0x401147, you will notice that it’s “CALL 0x4014F9”. Press F4 to run to the point and then Press F7 to step into the function 0x4014F9.

3. Two-Layer Function Return
We now analyze the first trick of a two-layer function return which disrupts call graph generation. In the following we analyze function 0x00401838. Observe the instructions from 0x401502 to 0x401505 (in Figure 2). Our first impression would be that Function 0x401038 takes three parameters: a pointer to string “ntdll.dll”, 0x7C903400, and 0x7C905D40. However, later you will find that it is not the case: function 0x00401838 is simply used to confuse static analysis tools.

Figure 2. Two Layer Function Call at 0x401505

Figure 3 displays the function body of 0x401838. It starts with a call of 0x00413650 and then a bunch of other instructions (later, you will notice that these instructions will never be executed).

Figure 3. Function body of 0x401838

Notice that at 0x0040183B, it calls function 0x00413650, whose function body is displayed in Figure 4. There are only two instructions: POP EAX, and RETN.

Figure 4. Function Body of 0x413650

Now we have the interesting part. Look at the stack contents in Figure 5. First of all, starting from the third computer word, we have the three words pushed by the code earlier (they are pointer to “ntdll.dll”, 0x7c903400, and 0x7c905d40). Then you might notice that the top two words are the RETURN ADDRESSES pushed by the CALL instructions.

Each CALL instruction consists of essentially two steps: push the address of the next instruction to the stack (so that it can return when the function being called is completed) and then jump to the entry address of the function. Thus, it is not difficult to infer that 0x00401840 is pushed by the CALL 0x00413650 at 0x0040183B (see Figure 3), and 0x0040150A is pushed by the CALL 0x00401838 at 0x00401505 (see Figure 2). So the POP EAX will pop off 0x00401840 and save it to EAX. When the RETN instruction is executed, it is directly returning to 0x40150A (i.e., it jumps two layers back)! Clearly, the instructions starting from 0x00401840 are never executed and the two layer jumping can confuse quite a number of static analysis tools when they try to plot call graphs.

Figure 5. Stack Contents at 0x00413650

4. Invoking NTDLL System Calls using Encoded Table
Next we show an interesting technique to invoke ntdll.dll functions without the use of export table. We will analyze the instruction at 0x401557 (as shown in Figure 6), it calls function 0x4136BF. Later, you will find that function 0x4136BF invokes zwAllocateVirtualMemory without exposing the entry address of zwAllocateVirtualMemory explicitly and it does not use export table.

We leave the analysis of  the logic between 0x40150A and 0x401557 (shown in Figure 6) to readers. Basically the code is to establish an encoded translation table in stack from 0x0012D538 to 0x0012D638.

Figure 6. Code Between 0x40150A and 0x401557

Now observe the function body of 0x004136BF in Figure 7. The first instruction is CALL 0x004136DC. You might notice that between 0x004136BF and 0x004136DC, there are some gibberish code. If you read it more carefully, you will find that they are actually the contents of string “zwAllocateVirtualMemory” where the byte at 0x004136C4 is the character “z”.

Think about the CALL 0x004136BF again. It is essentially two stesp:
    PUSH 0x004136C4   # note 0x004136C4 is the beginning of “zwAllocateVirtualMemory” (see Figure 8)
    JUMP 0X004136DC

Figure 7. Function Body of 0x4136BF.

Now you can pretty much guess the point of the code: it is trying to invoke zwAllocateVirtualMemory! But how is it accomplished? Let’s delve deeper. At 0x004136DC it is calling function 0x00401172 and when it returns, at 0x004136E1, it JMP EAX. Can you guess the functionality of 0x00401172?

Figure 8. Stack Contents at 0x4136DC before the Call of 0x00401172

We list some hints below (see Figure 9):
(1) The loop between 0x0040118E and 0x004011A0 is to compute the checksum of the function name
(2) The loop between 0x4011A2 and 0x004011BA is a binary search. The search is performed on the encoded export table (as discussed in Tutorial 9). Each entry has two elements: (1) check sum of the function name, and (2) the entry address of the function.

You could easily infer that function 0x00401172, given the name of a function, returns its entry address. Once it returns, the return value is saved in EAX. Then JMP EAX at 0x004136E1 will invoke the function.

5. Challenge of the Day
(1) What is the checksum of zwAllocateVirtualMemory?
(2) What are the parameters of zwAllocateVirtualMemory? Why does Max++ call this function?
(3) Look at 0x40117B, what is stored in the thread local storage (EAX+2C)? How do you make your conclusion?

By admin