Removing Debugging Information from Visual C++/C# Projects

Introduction

It’s often surprising how many malware programmers forget to do the simplest things, mostly because they are so concerned with functionality, stealth and other production concerns that details easily slip their minds, which is a clear advantage for forensics. One of these details is the Program Database (PDB) information added by Visual Studio, which most malware authors use for Windows development. While it may seem innocuous, this string reveals a lot about the operating system used by the author and their username; most notably, it points to symbols that IDA can use to ease understanding of the disassembly. This information makes it possible to link multiple pieces of malware together, for example through the username. Of course, it also allows for the creation of signatures. Thus, removing this information adds a hurdle for analysts.

Contents

The Program Database File

The Program Database (PDB) is a binary file used to store debugging information about DLL and EXE files. The PDB file is created when you build your project and stores a list of symbols and their addresses, along with the name of the file and the line number on which each symbol was declared. PDB files are also used by services that collect crash data and send it to developers for resolution.

Debugging Information

In Visual Studio, you can build your project in Debug or Release mode. In Debug mode, Visual Studio includes debugging information with your executable. In Release mode, no debug information is included by default, although it is sometimes enabled so that, if the program crashes, information can be retrieved and sent to the author for fixing. However, some developers don’t bother to use Release mode and simply ship the executable generated in Debug mode. Generally, you don’t want that if you are making malware (or any program, really!). If debugging information is left in, the executable contains a path to the PDB file, which can be extracted:

[Figure: Path to the Program Database (PDB) file used by Visual Studio for debugging purposes, extracted using the “strings” program.]

From these strings, you can determine that:

  1. The program was developed on Windows Vista or later (because of the C:\Users folder),
  2. The username of the developer is SUPPORT_23e45RT,
  3. The source, or part of it, can be found on GitHub,
  4. The original name of the program is CaitSithTest.
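
The same observations can be pulled out programmatically. The Python sketch below mimics running “strings” and looking for a .pdb path, then picks out the username and project name. The function names, the regular expressions and the sample path used in the usage note are assumptions made for illustration; they are not taken from any particular tool.

```python
import ntpath
import re
import sys

# A drive letter followed by printable ASCII up to the first ".pdb",
# roughly what `strings <file> | grep -i pdb` would surface.
PDB_PATH = re.compile(rb"[A-Za-z]:\\[\x20-\x7e]{1,255}?\.pdb", re.IGNORECASE)
# The profile folder name that follows \Users\ in a Windows path.
USERNAME = re.compile(r"\\Users\\([^\\]+)\\", re.IGNORECASE)


def find_pdb_paths(data):
    """Return every PDB-looking path found in the raw bytes of a binary."""
    return [m.group(0).decode("ascii") for m in PDB_PATH.finditer(data)]


def describe(path):
    """Split a PDB path into the indicators discussed in the list above."""
    user = USERNAME.search(path)
    return {
        "path": path,
        "username": user.group(1) if user else None,
        # The PDB file name usually matches the original project name.
        "project": ntpath.splitext(ntpath.basename(path))[0],
    }


if __name__ == "__main__":
    # Usage: python pdbpath.py sample1.exe sample2.dll
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            for found in find_pdb_paths(handle.read()):
                print(name, describe(found))
```

Run against a hypothetical binary built from C:\Users\alice\Documents\demo\Debug\, this would report the username alice and the project name demo. Matching on a drive letter keeps the pattern short, so UNC paths or relative PDB names would be missed.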

These indicators can be used to link this specific program with others and establish a common link between multiple malware samples. Additionally, the username could potentially be used to conduct open-source research and find linked accounts or forum posts. But wait, there’s more…

If the debugging information is left in, an analyst may be able to restore all the original variable and function names of the source code using IDA. IDA will first detect the debugging information and ask the analyst whether they want to retrieve it, either from Microsoft’s symbol server at http://msdl.microsoft.com/download/symbols (not browsable) or by looking locally.

[Figure: IDA detected that debugging information is available and asks if the user wishes to retrieve it.]
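
What a tool detects here is typically a CodeView record embedded in the executable: the marker RSDS, a 16-byte GUID, a 4-byte age counter and the null-terminated PDB path. The GUID and age are what get compared against a candidate PDB file to decide whether it matches the binary. The Python sketch below reads such a record. It assumes a naive scan for the marker instead of a proper walk of the PE debug directory, and the function name is made up for this example.

```python
import struct
import uuid


def parse_rsds(data):
    """Return (guid_hex, age, pdb_path) for the first RSDS record, or None.

    Layout assumed: b"RSDS" | GUID (16 bytes) | age (uint32 LE) | path | NUL.
    """
    start = data.find(b"RSDS")
    if start < 0 or len(data) < start + 24:
        return None
    # The GUID is stored in Windows mixed-endian form, hence bytes_le.
    guid = uuid.UUID(bytes_le=data[start + 4:start + 20])
    (age,) = struct.unpack_from("<I", data, start + 20)
    end = data.find(b"\x00", start + 24)
    if end < 0:
        end = len(data)
    path = data[start + 24:end].decode("utf-8", errors="replace")
    return guid.hex.upper(), age, path
```

Because the scan accepts the first occurrence of the four marker bytes, it can be fooled by unrelated data; parsing the PE headers first would be the more robust route.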

If the analyst is able to retrieve the information, they will have access to the names of the original symbols, which makes reverse engineering much easier.

[Figure: Since symbol information is available, the original names of the variables are displayed.]

Compare the information from the figure above to the figure below, in which debugging information has been stripped at build time:

[Figure: IDA could not find any debugging information and thus used its own labelling system to identify variables.]

You can see that the variable names in the figure with symbol information, such as ClipboardData, isProcessElevated and isDebugged, have been preserved. With the symbols retained, reverse engineering is much easier than in the stripped build, in which this information about the code is lost.

Disabling Debugging Information

To prevent Visual C++ from including this information in your executable, right-click on your project and go to Project Properties > Linker > Debugging. Set the Generate Debug Info option to No.

[Figure: Removing debugging information in Visual Studio.]

After doing this, rebuild your project and rerun the string extraction program against your binary; the path to the PDB file should no longer be present in the executable.

[Figure: The path to the PDB file is not included in the executable once debugging information has been omitted.]

Doing so makes it a bit more difficult to fingerprint the malware and hides information about the author’s system.

Conclusion

This simple step is often omitted not only by malware authors but also by penetration testers, who are often Google programmers, i.e. they copy-paste code snippets from Stack Overflow or google functions 😉 If you attempt to hide your malware in the System32 folder, an analyst looking for this information in the EXE or DLL files will quickly tell which files are bad, since legitimate files rarely carry this information, or carry a legitimate-looking path. As such, if you want to make sure, create a legitimate, Microsoft-looking user (Bill.Gates) on your machine and put your code into a Microsoft-looking project and path (C:\users\Bill Gates\Documents\HTA\Release\).

Useful T-Shark Commands for Intelligence Gathering from Network Traffic

Introduction

T-Shark is practically the command-line version of Wireshark. It has the same basic capabilities but with the added flexibility offered by using the command line to process outputs and send them to other applications. Furthermore, T-Shark is ideal for large PCAP files which Wireshark may have difficulty digesting, especially since Wireshark has to load the entire contents of the file prior to any kind of filtering. As such, T-Shark is my main tool for analyzing PCAPs. One of my main uses is to extract specific information from network traffic for investigation. Below I’ve enclosed some of the commands that I have found myself reusing over and over again.

Contents

When analyzing PCAPs, I’m mostly concerned with locating anomalies based on intelligence about various current actors. I’m especially interested in analyzing covert channels for specific indicators of malicious traffic. These often include typos in popular URLs, weird-looking domain names, emails with suspicious attachments or certificates with random fields. To extract the data, I’ve used the following T-Shark commands:

Extracting URLs from HTTP Requests

In this example (and the ones below), I’m reading network traffic from an offline PCAP file, which is why the -r lab1.http.pcap parameter is present. The -T fields option tells T-Shark to output the values of the fields specified with the -e parameter; each field you want to output gets its own -e <field> option. The -R <filter> option is the filter to apply to the traffic; in this case, we keep only HTTP traffic. Note that since Wireshark 1.10, -R only works together with two-pass mode (-2), and -Y <filter> is the single-pass equivalent. sort and uniq are standard Unix utilities used to sort the output of a program and remove duplicate entries.
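
The command this paragraph describes is not reproduced here, so the Python sketch below rebuilds an invocation that matches the description. The field names http.host and http.request.uri and the filter http.request are assumptions chosen for illustration, as are the function names; -Y is used in place of -R because current T-Shark releases expect it for single-pass display filters.

```python
import shlex
import subprocess


def tshark_fields_cmd(pcap, display_filter, fields):
    """Build a tshark argv that prints the given fields for matching packets."""
    cmd = ["tshark", "-r", pcap, "-Y", display_filter, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]  # one -e per field to output
    return cmd


def sort_unique(lines):
    """Roughly what `sort | uniq` does; blank lines are dropped as well."""
    return sorted({line for line in lines if line})


def extract(pcap, display_filter, fields):
    """Run tshark (which must be installed) and return sorted, unique lines."""
    cmd = tshark_fields_cmd(pcap, display_filter, fields)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return sort_unique(result.stdout.splitlines())


if __name__ == "__main__":
    # Print the equivalent shell pipeline without running it.
    argv = tshark_fields_cmd(
        "lab1.http.pcap", "http.request", ["http.host", "http.request.uri"]
    )
    print(shlex.join(argv), "| sort | uniq")
```

Passing the arguments as a list keeps filters that contain spaces or quotes intact without any shell escaping.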

Extracting Filenames from FTP uploads

Extracting URLs from DNS Requests

Extracting Recipients’ Email Addresses of Inbound Emails

Extracting Senders’ Email Addresses of Inbound Emails

Extracting Subjects of Inbound Emails

Extracting Source URLs of X509 Certificates

Extracting Information from X509 Certificates
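
The commands that belonged under the headings above are not preserved. As a hedged starting point, the Python sketch below pairs each task with a display filter and an output field and prints the corresponding pipeline. Every filter and field choice here is an assumption, not the author's original command; none of them restricts the results to inbound traffic, and field names differ between Wireshark versions, so check them against the output of tshark -G fields before relying on them.

```python
import shlex

# Hypothetical recipes: task -> (display filter, fields to print).
RECIPES = {
    "ftp-uploads": ('ftp.request.command == "STOR"', ["ftp.request.arg"]),
    "dns-queries": ("dns.flags.response == 0", ["dns.qry.name"]),
    "mail-recipients": ("imf.to", ["imf.to"]),
    "mail-senders": ("imf.from", ["imf.from"]),
    "mail-subjects": ("imf.subject", ["imf.subject"]),
    "x509-uris": (
        "x509ce.uniformResourceIdentifier",
        ["x509ce.uniformResourceIdentifier"],
    ),
    "x509-strings": ("x509sat.printableString", ["x509sat.printableString"]),
}


def recipe_command(name, pcap):
    """Return the shell pipeline for one recipe as a printable string."""
    display_filter, fields = RECIPES[name]
    argv = ["tshark", "-r", pcap, "-Y", display_filter, "-T", "fields"]
    for field in fields:
        argv += ["-e", field]
    return shlex.join(argv) + " | sort | uniq"


if __name__ == "__main__":
    for task in RECIPES:
        print(recipe_command(task, "capture.pcap"))
```

A bare field name used as a filter, such as imf.subject, simply selects packets in which that field is present.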

Netminecraft

Since these proved useful on many occasions, a simple Bash script called Netminecraft was made to automate their usage.

Usage

To split a larger PCAP file into protocol-specific PCAPs, use:

Note that the script only passes the contents of <protocol> to
T-Shark. As such, you can specify any Wireshark filter to extract
even more specific information, for example:

However, avoid filters with spaces, as the current version of the
script does not handle them. The results will be saved in a file
in the current directory as <protocol>.pcap, for example http.pcap.
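
The splitting step can be sketched in Python as follows. This is not the Netminecraft script itself, whose command line is not shown here; the function name is hypothetical, and the only behaviour carried over from the description is that the filter text is handed to T-Shark unchanged and reused as the output file name.

```python
def split_command(pcap, protocol):
    """Build a tshark argv writing packets that match `protocol` to <protocol>.pcap."""
    output = f"{protocol}.pcap"  # e.g. "http" -> "http.pcap"
    return ["tshark", "-r", pcap, "-Y", protocol, "-w", output]


if __name__ == "__main__":
    # The list can be passed to subprocess.run() as-is.
    print(split_command("capture.pcap", "http"))
    print(split_command("capture.pcap", "tcp.port==80"))
```

Since each argument is a separate list element, a filter containing spaces would reach T-Shark intact, although the resulting file name would then contain spaces too.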

To mine for data relating to a specific protocol, use:

The output file will contain text data that has been sorted and from
which duplicates have been removed using 'uniq -i', i.e.
the case of the items is ignored.
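
The case-insensitive deduplication can be illustrated with a few lines of Python. This is a sketch of the behaviour described, with a made-up function name; it is not code taken from the script.

```python
def sort_unique_ignore_case(lines):
    """Sort lines ignoring case, then drop entries equal to the previous one
    when compared case-insensitively (similar to `sort -f | uniq -i`)."""
    result = []
    for line in sorted(lines, key=str.lower):
        if not result or result[-1].lower() != line.lower():
            result.append(line)
    return result


if __name__ == "__main__":
    print(sort_unique_ignore_case(["Example.com", "b.org", "example.com", "a.net"]))
```

Sorting with a case-insensitive key guarantees that variants such as Example.com and example.com end up next to each other. A plain byte-order sort can separate them, in which case uniq -i, which only compares adjacent lines, would keep both.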

Examples

The example above will extract the URLs of all the DNS queries found
in the file dns.pcap and will output a list of URLs in dns.queries.txt:

Conclusion

Learning to use T-Shark has many advantages that can increase efficiency, security and flexibility. It allows you to script the extraction of data and its storage in databases, from which further analysis for anomalies can quickly be done. In this short post, we have listed some examples of how T-Shark can be used, but this barely scratches the surface.

References

T-Shark Manual, The Wireshark Network Analyzer, https://www.wireshark.org/docs/man-pages/tshark.html, accessed on 2015-02-24