There are a number of schools of thought on how to approach reversing malware. Some people jump right into dynamic analysis in effort to quickly learn what the specimen is doing so they can put rules in place on their network to stop it's functionality or see who else might be infected. Some of these people, will then perform static analysis to see if they may have missed something. Others leave it at the results found from their dynamic analysis.
Other people do static analysis first to fully understand the expected behavior so they know if something is happening with the sample when running it in a dynamic lab other than what is expected. They will then run dynamic analysis on it to see if their findings are correct.
Some people just submit the sample to places like Virus Total, Anubis, or CWSanbox, just to name a few . They then take action based on the results they get back from these tools. Finally there are some that just submit it to their Anti Virus vendor and wait for a signature to be released.
There is nothing wrong with any of these methods. Most are done because of either lack of time, skills or understanding of how to reverse malware. Some may think, why reinvent the wheel? This is all OK.
Here on this blog, we enjoy getting our hands dirty with looking at malware. We are not satisfied by just letting someone else do the work. There is nothing like the feeling you get when you have reversed a sample and know what it does and can stop it's functionality based on what you yourself have found. It is a great feeling we hope everyone reading will experience.
So what do I do? Speaking for myself, I tend to follow the method to do some static analysis first. I will then put the malware into a lab and see what it might do differently than what I have found. The personal reason that I do this is because sometimes you will find evidence that a sample is capable of realizing that it is being run in a virtual environment. I have seen this in the wild. I have seen samples that are capable of this which then either don't run at all, or run with fake results.
One sample I found that did this actually went through motions of downloading a new sample, executing it which would only lead to removing the old sample, putting the new one in place then downloading another sample that did the same thing. I only figured this out after I started to realize the samples it was grabbing started to cycle through the same names. This wasted a good bit of time while I chased that ghost. Had I looked at it statically first, then I possibly could have saved myself the time.
I'm not going to tell anyone which method to follow. Choose the path that you are most comfortable with and stick with it. I can only warn you of the things I have found in my time doing this.
This first post of mine is to get you ready to do static analysis. Jamy will be posting an introduction to dynamic analysis which will get into lab setup. I will refer you to his post to start there. If your doing dynamic or static analysis, it is a very good idea to start with creating a lab of non production systems to run your samples in. This could be virtual or physical or both.
Overall process of static analysis:
I want to mention that I will be linking to tools on this blog. I cannot vouch for the validity of the copy you get. Some of these tools may include malware themselves. Try to get a copy of the tools from the original source. You will probably also want to scan the files and watch what your lab systems do after installing them. We will not be held responsible if you download a tool we mention that is infected with a virus. Be cautious and thorough in your investigation of each of these tools!
The first step in your process should be to start a log or collection of the details you are about to find. Some people do this is a text file, others may just jot things down in a notebook. I personally like to use a mind mapping software such as FreeMind. Lenny Zeltser, a SANS instructor and an overall excellent resource for reversing knowledge, has freely released a template for FreeMind specifically for analyzing malicious code. You can download it here.
If you are looking for very good training in this area, Lenny also offers a few courses with SANS that can be taken online or at a conference. He has specifically created, along with others, the Forensics 610: Reverse Engineering Malware , and the Security 569: Combating Malware in the Enterprise courses. I highly encourage you to take these courses if you are interested in malware analysis.
It doesn't matter which method you choose to write your notes down, but it is very important that you do. Documenting this will help you to keep on track and assist you in writing reports of your analysis if you do this professionally. Additionally if you choose a method that allows you to compile all of your findings centrally, you will be able to see trends or recognize similar behaviors of samples that could help in reversing future samples that exhibit similar characteristics.
This blog does not go into how to identify malicious code on your systems. We may do a post on that at a later date. In the mean time, there are plenty of good resources on the Internet to help you in this area.
With that aside, take note of the system you found the malware on. Take notice of the operating system, patch level, applications installed etc. Write down where you found the code (i.e. C:\windows\system32). Add any information that may be relevant on how the code was found (i.e. the system administrator noticed the system was running slowly, or found the system blue screened).
Next take a hash of the file or files found. A hash, in this case, is a mathematical computation on the bits of a specimen. This will help with identification of other copies of the malware even if the name is changed. You can think of this much like doctors and scientists look for specific characteristics of a virus in the human body. That way, other doctors or scientists can identify the same virus in another person.
It is generally accepted to perform a MD5 hash on the file. Some people will also do a SHA1 or other computations as well. There is also a newer method called fuzzy hashing or piecewise hashing that can be done. This actually hashes portions of a file, rather than the whole thing. This method allows for identification of portions of the code which may be useful in catching malware that has changed just a little in order to avoid detection by a person or Anti Virus application for example.
There are many tools out there to do this. WinMD5 is an easy to use tool which is freely available. SSDEEP can be used for fuzzy hashing. What tool you use is up to you. The mathematical computations such as MD5 or SHA are standard equations. All of the applications that perform these actions then should produce the same result. Some operating systems such as *nix varieties have these tools built right in on the command line such as md5 or shasum on the Macintosh I'm working from right now.
The next step that is good to take is to compare the hash values that you found with sites on the Internet to see if there are any matches or similar hashes found by others. This can go a long way in saving you time and effort if it is a known specimen. You can copy and paste your hash value into sites such as Virus Total, or Threat Expert to name a few. You can also run this through an internal database you might have to see if there are any hits. The purpose of this is to save yourself time and energy if this sample has already been analyzed and identified by someone else.
Next you will want to classify your specimen. This classification will be the file type, format, target architecture, language used, and compiler used. These characteristics will let you know what systems may be vulnerable and which may not. For example, if you find a PE file format, you can assume that *nix systems will most likely not be vulnerable to the sample. This is because the PE format is used on Microsoft Windows systems.
TrIDNet with the latest definition files is a good place to start this classification. Minidumper is another free tool which works well. One thing you will notice is that I generally run files though similar tools and compare the output. Sometimes one tool works better than other and will give you more information which may be helpful. I have also seen malware samples that are coded in such a way to recognize tools and change their behavior if they realize they are being analyzed.
GT2 is another good tool to run the sample through in this stage as well. This application attempts to identify the compiler, DLLs and other requirements for the code being analyzed.
You will want to scan the file next in attempt to see if it hits any known signatures. You can do this with multiple Anti Virus applications if you have multiple on your lab systems. You can also take this time to submit it to online sources which run the sample through a number of anti virus applications in effort to see if it's known. Depending on your organization, you may not want to submit the samples to these sources as they share their information freely with the public and anti virus vendors. If this sample is targeted in any way, this can blow your cover that you have found and are analyzing the sample.
If that is not a concern for you, some good examples of these services are Virus Total, Virscan, or Jotti. Again, this is just to name a few, there are many other good choices out there. Pick the ones you like and are comfortable with.
Another step to take is then to start to dig a little deeper. You will want to start looking for executable type, dll’s called, exports, imports, strings etc. This information will start to reveal characteristics of the sample which may lead you to get a better understanding of what it does. For example, if you run strings on the file you may reveal IP addresses used for call back and download components, you may find information that lets you know the file is packed or encrypted in some way, you may find other files that are needed or used in the infection. etc.
Some of the tools that you can use to see this information are BinText, dumpbinGUI, or DependencyWalker to name a few. Again, there are many more out there. These are just some ideas of tools that I like to use. For example if you are running on a *nix server, the command strings will work on most distributions right out of the box. Just open a terminal and run strings filename (where filename is the name of the file you are looking at)
The next stage is to look for packing or encryption. This is often where the process becomes difficult. Some packers are easy to defeat. Others are not. Some encryption routines are easily seen and reversed. Others are custom and difficult. This is the time when a lot of analysts will enter into dynamic analysis. The reason for this is that they may not be able to unpack or decrypt the file by hand. In order for the file to run on a system, it needs to be unpacked or decrypted in memory to function.
Some tools to use to look for this type of information are PEiD, PE Detective, or Mandiant's Red Curtain. These tools are capable of detecting known packers, encryption or behaviors that are common to malicious files to make them difficult to analyze. You may have found this information out already from the output of previous tools.