Reversing the Trendnet TS-402

Share

The Trendnet TS-S402 is a discontinued network storage enclosure that was sold to individuals for personal data storage. Like every Internet-of-Things (IoT) device, it runs on software programmed and/or configured by the manufacturer before shipping it to the end-user, i.e. the firmware. Firmware versions 2.00.10 and below of this particular device have a serious vulnerability allowing remote root access . This target thus provides an excellent exercise for reverse engineering while providing an example of a vulnerability that is unfortunately way too common in IoT: backdoors by design. In this post, we will introduce Binwalk and provide the background necessary to do the same on a large variety of firmware for consumer-level devices, using the TS-S402 as a practical example. A video of this post is also available.

Trendnet TS-S402
The Trendnet TS-S402 Network Storage Enclosure (from trendnet.com)

The Trendnet TS-S402

Before reversing any device, it’s important to actually understand its functionalities, components and any other piece of information that may help along the analysis of its firmware. The webpage of the product highlights the following features:

  • Access your data from the Internet (FTP) and on your local network
  • Microprocessor: Marvell 88F5182
  • IDE Controller: ITE IT8211F
  • Real Time OS: Embedded Linux (Kernel Version 2.4.25)
  • File Protocols: Microsoft Networks (CIFS / SMB) Internet (HTTP 1.1) FTP, FTPS (SSL FTP)

Why are these facts important? Not all of the will be useful, but some may provide you with an overall idea of what to expects once you start analyzing the firmware file. In many cases, especially with consumer-level devices, reversing firmware is fairly straightforward and common open-source tools will do the heavy lifting for you. But if you move on to industrial firmware, awareness of the device is important as you will be faced with unheard operating systems, libraries and unknown file formats. In this case, we can expect to see a Linux-based Operating System (OS) hosting a HTTP and FTP server, along with Samba compatibility. The manufacturer is even generous enough to provide the underlying microprocessor, which can be helpful when conducting even deeper analysis for vulnerabilities in the binaries.

Reversing the Firmware

The vulnerability we are looking for is present only in versions 2.00.10 and below of the firmware, which you can download from the repository of the company. Unzip the archive and you’ll to obtain the following files:

  • TS-S402_FW_2_00_10.bin
  • readme.txt
  • release_TS-S402.txt
  • REMOTE_PACKAGE_2_20.bin

It’s always a good idea to read the release notes and README files. Doing so may save you time and headaches trying to figure things out. If you’re into bug hunting, the release notes can be useful to list the changes and patches included in this version, providing potential hints to patched vulnerabilities of previous versions.

The two “.bin” files contain the programs and OS of the device. In this case, based on the filename, the TS-S402_FW_2_00_10.bin is the main firmware file and thus, the focus of this post. The first step is always to check if we can determine the file type by using the file command. If we are lucky, it is a known file type and some application exists to extract the relevant files/information out of it.

However we are not so lucky. The file command returns “data”, which means it found the file to be binary data without any specific structure or format. So we will need to use a more powerful tool: binwalk. Binwalk is very useful reverse engineering toolkit which can analyze and extract files from unknown binary files. However note that it can also return quite a few false positives. The only way to recognize them is with experience and trial-and-error. If not already done, install binwalk with apt using  sudo apt-get install binwalk and run the following command:

The command above asks binwalk to check out the TS-S402_FW_2_00_10.bin file and try to find interesting files or structures inside it. We use the “-x lzma” argument to eXclude any findings about LZMA-compressed data: these are false positives in this case. You will obtain the following result:

In other words, there seems to be a 32-bytes header followed by a GZip-compressed file. At this point, we want to carve this gzip file out of the binary file for further investigation. You can use the dd command to do so, but binwalk provides the -e option to Extract files for you.

The carved files will be outputted to a directory labelled _TS-S402_FW_2_00_10.bin.extracted, in which you will find a single file called 20, which is the offset of the file in the larger firmware. Using the file command again, we now get a more interesting result:

This time, the file command clearly recognized a TAR archive, meaning we can simply untar the file with the command below:

This archive contained even more files: a uImage and a filesystem:

  • uImage
  • rootfs.armeb.squashfs

The uImage is the boot loader of the firmware and you will often find this file or something similar in most Linux-based firmware. Analysis of the uImage will be left for another post. For now, we are interested in the filesystem contained in rootfs.armeb.squashfs, which as its name implies, is a SquashFS. To access the files it contains, we can normally use the unsquashfs tool, however in this case doing so won’t work:

Depending on how the file system was created and the development software used, it may have incompatibilities with the way unsquashfs expects the file system to be structured. As a workaround, we will use Sasquatch, which is more flexible when it comes to extracting file from non-standard Squash file systems. Clone the project and build the code by following the instructions n the README.md file, or you can download a pre-compiled binary for Ubuntu and variants. Now let’s try again carving out the files, but with Sasquatch this time by typing  ./sasquatch rootfs.armeb.squashfs . After a while, Sasquatch will extract all files in the squashfs-root directory, at which point you can finally access the files hosted on the targeted device.

Find the Backdoor

There is a fairly obvious backdoor hidden in the file system. Go ahead and explore the files and configuration and see if you can find it. When you are ready read on.

Most commercial devices are accessible remotely via a web interface. The web server and its contents are therefore a good starting point to hunt down potential vulnerabilities. On the TS-S402, the web application is located in the /home/httpd directory. The partial listing of the contents of this directory is included below:

As you can see, one of the web page is named “backdoor.html”; quite an obvious indicator that something is wrong. If you look at the webpage, you’ll notice that it seems to enable the telnetd daemon, thus allowing Telnet connections to the device. Unless specially configured to be blocked in the network firewall, this web page should be accessible to anyone on the network, potentially anyone if facing the Web. All that is missing right now is credentials to access the device. Let’s look at /etc/shadow to see if we can potentially figure out the default password for the root account:

Well, there isn’t any password setup for the root account. So in other words, anyone who can access the backdoor.html page on the device can enabled remote Telnet connections, and then login as root. While I haven’t tested it as I do not own this device. this vulnerability was previously confirmed and reported.

Conclusion

This post provided an example of reversing the firmware of a consumer-level IT appliance to locate vulnerabilities allowing remote access to the device. Such vulnerabilities may seem trivial an unimportant until a botnet such as the Mirai botnet comes along and use tens of thousands of these vulnerable devices – which are rarely updated – to DDoS websites across the web. Hence the need to understand the techniques and skills require to pwn these devices in order to defend them.

See Also

Learn More

 

 

Reversing the Parrot SkyController Firmware

Share

Introduction

The Parrot SkyController is a remote control sold for long range flights with the Parrot Bebop drone, sometimes used in the real estate sector. It enables integration of smartphones and tablets for flight control via Wifi connection, allowing transmission of real-time data from the drone. In other words, the SkyController acts as an intermediate between the flight management software (smartphone/tablet) and the actual drone. In this post, we explain how to reverse the firmware of the SkyController to be able o view the contents of the internal operating system and understand its inner workings. A lot of the information provided is based on [1] and is applied on this specific firmware. Code is provided along this post that will extract files and information from the SkyController firmware.

The Parrot SkyController Device
The Parrot SkyController Device (taken from the Parrot Store)

Contents

The firmware of the Parrot Skycontroller appears to be a well-defined format which is reused across multiple products from the company. The firmware contains 2 types of structures the header and a sequence of “data entries”. The header is located at the beginning of the file and is followed by multiple entries which contains configuration and file system information. We will look at both structures in the following sections.

Header

The firmware header starts with four magic bytes which spells out “PLF!”. These can therefore be used to identify valid Parrot firmware files. A quick Python snippet is included below as an example.

Immediately after the magic bytes, a sequence of thirteen 32-bit unsigned integers provides additional information about the firmware. Defined as a C structure, the header would look something like this:

The header version specify the type of format used for the firmware header. Within the SkyController firmware, the value for this field was “0x0D”, possibly indicating that there are 14 variations of header formats. The header size provides the number of bytes included in the header while the entry header size specifies the number of bytes in headers of entries within the firmware. The next 5 integers are undefined at the moment, as is the before-last integer. One of them is likely a CRC32 checksum. The dwVersionMajor, dwVersionMinor and dwVersionRevision defines the version of the firmware. Lastly, the header contains the size of the file in bytes.

Entries

Each entry has a header of constant size, which is specified within the firmware header by the dwEntrySize value. In the SkyController, each entry is composed of 20 bytes used as follows:

 

The section type specify the type of data contained within the entry. This can be configuration data, partition details or file information. These will be explained in further details in the sections below. The entry size contains the number of bytes within the entry, without consideration for null-byte padding. This field is followed by a CRC32 value. It’s unclear at this point how this value is calculated. The last known integer is the size of the data once uncompressed. If this value is 0, then the contents are not compressed. Each type of entry have further fields defined, and these are explained below.

Firmware Entries

There are multiple types of entries within the firmware. In this section we will describes the one we came across in the file we have analyzed.

Bootloader and Boot Configuration (0x03, 0x07)

Entry 0x03 contains a binary file which appears to be the bootloader of the system based on strings contained within, however more analysis will be required to understand how it actually works. In the SkyController, a PLF file was observed, but within that PLF file, this entry had binary data. Similarly, entry 0x07 also appears to be related to the boot process as shown by the string “ecos-bootloader-p7-start-246-ge30badf”, “ecos” referring to the eCos operating system.

File System Data (0x09)

There is a  “File System Data” entry for each file contained in the firmware. As such, this entry is the most common one. Depending on whether the file is compressed or not, the structure of this entry is slightly different: when its contents are uncompressed, this entry starts with the filename (or directory name), which is a zero-terminated string. The name is then followed by 3 unsigned integers.

The first integer contains flags specifying the type of file, which can be a directory, a normal file or a symbolic link. This is specified by reading the bits 12 to 15. The 12 other bits contains the permissions of the file. Converted into octal form, the last 12 bits will provide the same format as the one used in Linux. For example:

Extracting File Permissions from the Firmware Entry
Extracting File Permissions from the Firmware Entry

When compressed, the flags are within the Gzip-compressed data. The data must therefore be first uncompressed and then, the same procedure applies: the filename will be a null-terminated string right at the beginning, followed 3 unsigned integers, the first containing the type and permissions. The remaining bytes are the content of the file. In the example below, the filename is system/pulsar/etc/boxinit.hosted. The flags are 0x000081A0 (little-endian) and the contents of the file starts at 0x32. 

Partition Data (0x0B)

The 0x0B entry contains information about partitions on the device.The header of this entry is composed of 10 values. The first 4 integers are version information, the next 5 ones are unknown and the last unsigned integer is the number of partitions defined within the entry.

For each partition defined within the entry, a sub entry is available which contains further information about the partition: the device ID, volume data and the mount point on the target device. The defining structure for this entry are:

Volume type can be either RAW, STATIC and DYNAMIC. A raw partition contains no file systems, for example swap partitions. Static partitions are read-only, such as the boot partition and finally, the dynamic partitions are read-write and contains modifiable files.

Installation Data (0x0C)

This entry was included early in the file and contained another PLF file within its data. This PLF file contained a 0x00 entry, which is unknown, a 0x03 entry and a 0x07 entry. The 0x07 entry contained boot options:

Conclusion

I have upload a Python program [2] which will extract information and files from the SkyController firmware here. It also successfully extracted files for a Bebop2 firmware and I suspect it will work for other recent firmware. With additional work, it may reverse older ones. Repacking an unpacked firmware should be doable, but I do not have any of the device to test it out. Before doing so, determining how the CRC value for the header and each entry is calculate is required. Future work would include testing additional firmware files and implementing the program to repack an unpacked and modified firmware. Finally, reading the eCos documentation may clarify many of the unidentified values.

References

[1] “PLF File Format”, E/S and I, https://embedded-software.blogspot.ca/2010/12/plf-file-format.html (accessed 2016-12-13)

[2] Racicot, J. “Vulture”, https://github.com/InfectedPacket/vulture (accessed 2017-01-03)

(Bad) Amazon Phishing Email

Fortunately, my wife is a smart cookie and always suspicious of weird looking email. Maybe its due to the fact she lives with a paranoid guy. In any case, she caught this phishing email, which appears to be from Amazon, and leads to a fake login page.

Share

Introduction

Fortunately, my wife is a smart cookie and always suspicious of weird looking email. Maybe its due to the fact she lives with a paranoid guy. In any case, she caught this phishing email, which appears to be from Amazon, and leads to a fake login page.

Contents

The phishing email comes from “amazon@iservice.co.org.il” with the terribly spelled subject “your accounnt information need to be updated” and the content is a screenshot of an authentic Amazon email, thus bypassing filters. However, the attacker succeed in misspelling the only field he had to fill.

A fake Amazon account confirmation received which contains a single image.
A fake Amazon account confirmation received which contains a single image.

Clicking anywhere on the image will redirect the target to ‘http://bestofferz.biz/service/support/wp-admin/support/support/”, which host a fake login page as shown below:

Fake Amazon Login Page
The attacker is hosting a fake Amazon login page on HostGator

So by looking under the hood, we can see that the entire page is actually a single javascript function call to decrypt a long Base64 encoded string.

The encryption key used is stored in the hea2p variable and the HTML code. The entire code can be analyzed here and using the AES Javascript code here. If the target enters his emails and password, he will then be forwards to a fake account creation page asking for his address.

Fake Amazon Account Creation Page
Fake Amazon account creation page.

And of course, it will then ask you for your credit card information, which is possibly the end goal of the phisher.

Fake Credit Card Information Request Page
Fake Credit Card Information Request Page

All the pages are encrypted using the same key. Only after entering this information to the target get redirected to the real Amazon website.

Successful Phishing Operation Page
Successful Phishing Operation Page

Conclusion

Remember to always check the URL and the from email address !

Repost: Stack-based Buffer Overflow Vulnerabilities in Embedded Systems

The buffer overflow attack vector is well documented in desktop and server class machines using the von Neumann memory model, but little has been published on buffer overflow vulnerabilities in Harvard architectures.

Share

I have not written or contributed to the enclosed research paper. I’m simply reposting it here because it’s interesting and for some reason, appears available only via Google cache. So before it disappear from results, I’m reposting it here.

This paper discusses a technique to conduct buffer overflows on processors using the Harvard architecture. In this architecture, the stack starts at the beginning of the memory and grows up, versus Von Neumann architectures in which it grows down.

Abstract:

Most small embedded devices are built on Harvard class microprocessor architectures
that are tasked with controlling physical events and, subsequently, critical infrastructures. The Harvard architecture separates data and program memory into independent address spaces, as opposed to the von Neumann architecture that uses a unified memory system with a single address space for both data and program code. The buffer overflow attack vector is well documented in desktop and server class machines using the von Neumann memory model, but little has been published on buffer overflow vulnerabilities in Harvard architectures. In this paper we show that stack-based buffer overflow vulnerabilities exist in embedded control devices based on the Harvard class architecture. We demonstrate how a reversal of stack growth direction can greatly simplify the attack and allow for easier access to critical execution controls. We also examine popular defense techniques employed in server and desktop environments and the applicability of those defenses toward Harvard class machines.
Link: Kristopher Watts & Paul Oman, University of Idaho, Stack-based Buffer Overflow Vulnerabilities in Embedded Systems

#TheGreatFTPHunt – 2% to 9% of files scanned potentially containing confidential information

In this post, we continue our data collection and evaluation of files stored on removable medias publicly accessible to the Internet. The collection of filenames from 6,500 hosts is ongoing, therefore we’re going to focus on evaluation of sensitivity of a file based only on its filename. We also present the latest statistics collected in our database.

Share

Introduction

In this post, we continue our data collection and evaluation of files stored on removable medias publicly accessible to the Internet. The collection of filenames from 6,500 hosts is ongoing, therefore we’re going to focus on evaluation of sensitivity of a file based only on its filename. Based on the current result, 2 to 9% of the 3000 files reviewed were sensitive or potentially sensitive. Most of the sensitive files are concentrated on a few hosts. These files often include financial information or project data from businesses. So far, 773 hosts containing around 4.5 million files have been scanned.

Discussion

The amount of filenames collected is quite large and we cannot evaluate manually each filename for its probable sensitivity. As such, we need to devise a procedure to automatically assess its sensitivity. We have some definitions and restrictions to list first to clarify what a sensitive file is and limitation to our evaluation criteria.

In this document, sensitive file refers to user-generated or software-generated files based on user input that contains information that should probably not be publicly accessible and which can be leveraged against an individual or organization. This includes:

  • Personal identification documents; passport, driver’s license, visas, government forms…
  • Personal finance documents; income tax files, insurance forms, credit card statements, mortgage, pay stubs, banking information
  • Personal medical documents; prescriptions, medical records
  • Work-related files; emails, proprietary source code, password lists
  • Business finances; customer lists, sales data, project costs, business deals, investments, payrolls
  • Intellectual property; blueprints, schema, patents, research
  • Network configuration; passwords files, configurations files, network diagrams, user databases
  • Large databases of emails, addresses and other personal information.

Some of the files not included in our analysis that includes;

  • Copyrighted / Illegally downloaded files. However we considered text file containing licensing keys to be sensitive.
  • Inappropriate contents (nude selfies, personal politics, group affiliations etc…)
  • Personal pictures, letters.
  • Addresses and emails were not considered personal, however databases of addresses and emails are considered sensitive

Because of the volume, we cannot download and manually verify each file to confirm its contents, as such our main restriction is that our assessment must be done solely based on the absolute filename recorded. As such, to evaluate the sensitivity, we used three categories; positive, negative and neutral, i.e. either a file is very likely to sensitive, potentially sensitive or clearly not sensitive at all. Of course, there is always a possibility that a file labeled as sensitive may not be. For example, a file called social security numbers.xls may contain only formulas or an empty form. Ideally, files identified as positive or neutral should be manually vetted.

The procedure to automatically assess the sensitivity of a file based on its path and name is first done by assessing a random sample manually. Using the ORDER BY RANDOM (note: there will be a need to review if this function is truly random, which I doubt) function (performance is not an issue in this experiment) of the Postgresql database, multiple  random samples of 100 filenames are retrieved from the database. Each file is shown to the evaluator which based on the path, filename and extension assess the sensitivity of the file as ‘positive‘, ‘neutral‘, ‘negative‘. For each run, we log the count of hits for all categories.

Listing 1 : Example of a run in which a script asks an evaluator to assess the sensitivity of files based on its absolute path.

The evaluator is assessing the filename based on keywords that may indicate the contents of the file. As such, a file containing the word, or as we call it in this document, a token such as sales, passport or passwords will be assume to contain information about sales, a passport scan or a list of passwords. In many cases, the filename is too obscure, but the path and extension may indicate the contents of the file. For example, a path containing the tokens project, finances and a Microsoft Excel extension despite a filename of axe189212_c.xls will be considered as neutral, as the file may contents information about a project. Examples of both scenarios are shown in listings 2 and 3:

Listing 2 : Examples of files that were deemed ‘positive’ hits based on keywords in their absolute path.

Listing 3 : Examples of files that were deemed ‘neutral’ (or ‘unknown’) hits based on keywords in their absolute path.

Filenames in foreign languages are roughly translated using Google Translate, as such, many of them are labeled as unsure.

A Python script then divide the filename in tokens, and each token is stored in the database along with the number of times it was found in a positive, neutral and negative hit. Tokens are created slightly differently based if they are located in the path, the filename or in the extension. For the extension, a single token is created which contains the extension itself. If the file does not have an extension or is not an extension usually associated with known software, no token is created. For the filename, tokens are created by splitting each word using characters usually known to separate words such as the underscore, dash, period or spaces. Lastly, for the path, directories are used as token and unlike filenames, are not split further. An example of this process is shown in listing 4:

Listing 4 : Example of the tokenization of a filename.

Once the tokens are created, the script will either add the token in the database or update its count based on the evaluator choice. After each update, a score is given to the token, which is simply the ratio between positive hits and the total count of appearances: p / hits). Note that tokens are considered different depending their location in the filename. As such, a filename such as /My_Passport/backup/Outlook emails backup.pst, will generate 2 distinct ‘backup’  tokens; the one from the path and the one from the filename. We explain this decision in the next paragraphs.

Listing 5 : Scores of the tokens extracted from the file in listing 4.

By using this procedure, we believe that tokens appearing often in both positive and negative hits will cancel each other, while the tokens strongly associated with positive and negative hits will remain clearly divided. Some sort of mathematical should follow later one (I hope…need to review discrete maths I guess). Some preliminary results  appears to confirm this approach as valid. Extensions strongly associated with sensitive contains higher scores while media files have null scores.

However, there is a need to further refine this process by associating a value, or weight, to the location of the token. Tokens in the path are not as indicative of the sensitivity of the file as much as a token in the filename or extension. Even within the path, the highest level is generally less indicative than the lowest one, i.e. /documents/finances 2012/sales/company sales.xls. Therefore when assessing a new filename, we need to give a score to the path, the filename and the extension. For the path, we will get the score of each token and multiply it with a weight that correspond to its location in the structure. For token that are not found the default value of 0 will be given. Then we will take the average of all token for the score of the path. As for the filename, we will not consider the position. Finally the stored score of the extension will be retrieved from the database. If the extension is not found, then a score of 0 will be used. This will transform a filename into a set of three real values which we can range between 0 and 1. To determine the weights needed for each location, we will used a supervised neural network. More research will be conducted to determine how to use this approach.

Results

As of 16 July 2015, 4,568,738 files have been recorded from 773 hosts.

Country Hosts
United States 258
Russian Federation 91
Sweden 69
Canada 66
Ukraine 27
Norway 24
United Kingdom 24
Australia 19
Netherlands 18
Hong 18
Taiwan 16
Poland 15
Germany 11
Romania 11
Finland 10
Switzerland 8
Korea 8
Singapore 7
Czech Republic 7
Japan 6
Table 1. Location of the 773 hosts scanned as of 16 July 2015 order by country.

Mp3 and JPEG image files remains the most common. As such, we focus our statistics on document-type of files for a change, i.e. Office documents. Adobe PDF files and Microsoft Word documents are the most common file types based on our current data as shown in figure 1.

Most common file types scanned as of 16 July 2015 for office-related documents
Figure 1. Most common file types scanned as of 16 July 2015 for office-related documents

At the moment, around 3000 files have been assessed (30 runs of 100 samples). For each run, we recorded the number of positives, neutral and negative hits and found them overall constant at each run. (see figure 2) However more details about the RANDOM function is needed to insure the randomness of the sample. This part may need to be redone. So far, between 2% and 9% of files scanned are considered sensitive or potentially sensitive (see figure 3). However we need to consider the concentration of these files to put this information into perspective. The 278 files identified as sensitive or potentially sensitive were located on 59 hosts, with one host accounting for 101 of these file. This indicates that files of interests for an attacker are likely to be concentrated on a few hosts.

Chart of assessed sensitivity of randomly selected 30 samples of 100 filenames.
Figure 2. Assessed sensitivity of randomly selected 30 samples of 100 filenames.
Chart of percentage of files according to their sensitivity based on manual assessment of 3000 randomly selected files.
Figure 3. Percentage of files according to their sensitivity based on manual assessment of 3000 randomly selected files.

As for tokens, we will have to consider the entire collection of filenames in order to have sample from multiple sources, as such, we will pursue manually assessing samples of 100 filenames as more data is collected. After which we should have an excellent training set for the neural network. Some high-recurring and high-scoring tokens are shown in tables 2 and 3.

Token Hits Score
attach 7 0.9285714286
txn 7 0.9285714286
planning 6 0.9166666667
archived 6 1
recpt 6 0.9166666667
2010taxreturns 5 1
person~2 4 1
purchase 3 1
order 2 1
Паспорт 2 1
Table 2. Sample of high-scoring tokens sorted on the number of times observed.
Token Hits Score
jpg 938 0.013326226
mp3 460 0
music 448 0
seagate_backup_plus_drive 382 0.1452879581
asusware 348 0
pictures 309 0.0048543689
sda1 285 0.0649122807
bigdaddy 279 0
elements 278 0.0485611511
transcend 247 0.0222672065
my_book 234 0.0106837607
Table 3. Sample of high-recurring tokens sorted on the number of times observed.

Conclusion

While these results are preliminary, they nevertheless seems to provide a solid indication of what one can find on publicly-available removable drives. Additional work and fine tuning of both code and processes is required to provide more accurate data and the next step while the scan is still on going it to develop a methodology to assess the sensitivity of all files, likely using a neural network for classification based on the method presented above.

Useful T-Shark Commands for Intelligence Gathering from Network Traffic

T-Shark is practically the command-line version of Wireshark. It has the same basic capabilities but with the added flexibility offered by using the command-line to process outputs and send them to other applications. Below I’ve enclosed some of the commands that I have found myself reusing over and over again.

Share

Introduction

T-Shark is practically the command-line version of Wireshark. It has the same basic capabilities but with the added flexibility offered by using the command-line to process outputs and send them to other applications.  Furthermore, T-shark is ideal for large PCAP files which Wireshark may have difficulty digesting, especially since it has to load the entire contents of the file prior to any kind of filtering. As such, T-Shark is my main tool for analyzing PCAPs. One of my main use is to extract specific information from the network for investigation. Below I’ve enclosed some of the commands that I have found myself reusing over and over again.

Contents

When analyzing PCAPs, I’m mostly concern to locate anomalies based on intelligence on various current actors. I’m especially interested in analyzing covert channels for specific indicators that are usually indicative of malicious traffic. These often include typos in popular URLs, weird looking domain names, emails with suspicious attachments or certificates with random fields. To extract the data, I’ve used the following T-shark commands:

Extracting URLs from HTTP Requests

In the example above (and the ones below), I’m reading network traffic from a offline PCAP file, which is why the -r lab1.http.pcap parameters is present. The -T fields specify to output the value of the fields specified with the -e parameter. For each field you want to output, you specify it using the -e <field> option. The -R <filter> is the filter to apply to the traffic. For example in this case we filter only HTTP traffic. Sort and Uniq are Linux applications used to sort the output of a program and remove duplicate entries.

Extracting Filenames from FTP uploads

Extracting URLs from DNS Requests

Extracting Recipients’ Email Addresses of Inbound Emails

Extracting Senders’ Email Addresses of Inbound Emails

Extracting Subjects of Inbound Emails

Extracting Source URLs of X509 Certificates

Extracting Information from X509 Certificates

Netminecraft

Since these proved useful on many occasions, a simple Bash script called Netminecraft was made to automate their usage.

Usage

To split a larger PCAP file into protocol-specific PCAPs, use;

Note that the scripts only passes the contents of <protocol> to
t-shark. As such, you can specify any Wireshark filter to extract
even more specific information, for example:

However avoid filters with spaces as the current version of the
script does not manage spaces. The results will be saved in a file
in the current directory as <protocol>.pcap, for example http.pcap.

To mine for data relating to a specific protocol, use;

The output file will contain text data that have been sorted and in
which the doubles will have been removed using ‘uniq -i‘, i.e.
we ignore the case of the items.

Examples

The example above will extract the URLs of all the DNS queries found
in the file dns.pcap and will output a list of URLs in dns.queries.txt:

Conclusion

Learning to use T-shark has many advantages that can increase efficiency, security and flexibility. It allows for scripting the extraction of data and storage into databases from which further analysis can quickly be done for anomalies. In this short post, we have listed some examples of how T-shark can be used, but it barely scraps the surface.

References

T-Shark Manual, The Wireshark Network Analyzer, https://www.wireshark.org/docs/man-pages/tshark.html, accessed on 2015-02-24

Possible Phishing Campaign Against Academic Institutions

On 26 and 27 April 15, multiple colleagues from my department received a phishing email from legitimate, but likely compromised emails from Indian-based academic institutions. The objective was likely an attempt at stealing email credentials from the target.

Share

Introduction

On 26 and 27 April 15, multiple colleagues from my department received a phishing email from legitimate, but likely compromised emails from Indian-based academic institutions. The objective was likely an attempt at stealing email credentials from the target.

Analysis

The details of the one of the email received are included below:

The email was sent with “Low Importance”, probably as a way to attract additional attention as some people often ignore the “Urgent” flag, since they receive so many of them. Emails with the “Low Importance” flag and icon are more uncommon. Because users don’t see low importance emails often, they tend to open it before the urgent ones.

While the message was caught as SPAM by Symantec, this is clearly more than an attempt at SPAM and attempts to lure the target to a ContactMe page in order to receive an upgrade to their webmail applications. The “Contact Me” page requests information about the user and its credentials. Of note, many of the characters on the page are accentuated for some reason – possibly as to bypass restrictions of the service:

Phishing ContactMe webpage send to my email account.
Phishing ContactMe webpage send to my email account.

Once the target filled the information and click “Send Message”, a thank you note is displayed. The use of the ContactMe website prevents the operator from having to stand up a web server and buy/register a domain, which leaves a lot of traces but ultimately may look more convincing, as it will not have any content restrictions.

The sender’s email is pprusti@immt.res.in, which based on the suffix, is an email from the Institute of Minerals and Materials Technology, an advanced research institute in the field of mineralogy to materials engineering, established in Bhubaneswar, Odisha. A contact page on the website links the email address to Ms. Pallishree Prusti, from the Mineral Department. Therefore, this is a legitimate email address, which more than likely has been compromised.

Email of Ms. Pallishree Prusti from the Website of the IMMT
Email of Ms. Pallishree Prusti from the Website of the IMMT

Similar emails from a different sender, systemadmin.net@muhas.ac.tz, was reported as well. This time, this email is associated with the Muhimbili University of Health and Allied Sciences (MUHAS)in Tanzania.

In this operation, emails targeted were academic addresses. It appears that the operator may be targeting academic personnel by sending fake email upgrade notifications. It is unclear were the adversarial operator found the target email addresses. While ths operations does not appears advanced or sophisticated in any kind, it is very targeted. The adversary in this case may only be some student phishing for papers he can sell online or use for his own degree.

Indicators (for signatures)

  • http://www.contactme.com/553d4f488ed40c000300bde2

Conclusion

This phishing email may just be another one amongst many others and does not appear to be from a highly skilled operators. While the campaign appears targeted at academic institutions, it does not appear to target a specific field of activity. If you have encounter this email, please leave a comment with some information that would help determine if this specific operation is targeted or just another large campaign from a criminal group or a botnet.

The Past, Present and Future of Chinese Cyber Operations

China, as one of many alleged actors on the frontier of cyber espionage, is best understood by briefly examining the past century, how it influences contemporary cyber operations attributed to Chinese-based actors, and how they could be used against the Canadian Armed Forces in a potential Southeast Asian conflict.

Share

Out of nowhere, here’s an article I wrote for the Canadian Military Journal. China,  as one of many alleged actors on the frontier of cyber espionage, is best understood by briefly examining the past century, how it influences contemporary cyber operations attributed to Chinese-based actors, and how they could be used against the Canadian Armed Forces in a potential Southeast Asian conflict.

See the full article here: https://www.academia.edu/7633668/The_Past_Present_and_Future_of_Chinese_Cyber_Operations; or

here: http://www.journal.forces.gc.ca/vol14/no3/PDF/CMJ143Ep26.pdf

 

Starting in Exploit Writing – Day 06

Share

Time for tutorial 6 (yeah I skipped #5, as I got the gist of it). After 6 days, I really got the hang of buffer overflows…too bad they aren’t much in use anymore, but everyone has to start somewhere. So below is EIP and SEH being overflowed with 0x0041, indicating that our “A” where translated to Unicode characters.

Buffer Overflow in Trilogic Player
EIP is being overflowed by 0x0041, indicading a buffer overflow with Unicode characters

Mona locates the Metasploit pattern overwriting SEH at 536. So our payload will be something like

And everything falls in place;

Overwriting nSEH and SEH in Trilogic
Both nSEH and SEH are overwritten with the expected values.

This is where it gets tricky. When following the tutorial, I was never able to see that EIP contained 0x00440044. It actually contained 0x00440045 after moving after the Access Violation exception. I can clearly see that SEH has been overwritten with 0x00420042 and 0x00430043. Fortunately according to one comment, I’m not the only one. When I tried to put 534 bytes of junk rather than 536, I end up with 0x0042004D in EIP (difference of 8). I don’t get why it does that.

At least I’m confident I understand the concepts, but that’s pretty much useless if I cannot put it into practice. For this example, mona can find POP-POP-RET addresses that will work when translated to Unicode. From my understanding, [nSEH] will be useless in Unicode-based exploit, simply because there’s no easy way to have a jump matching Unicode encoding. As such, we’ll need to jump to a register which contains a address close to our shell code and either 1) make sure no instruction between the address and shell code will break the exploit or 2) modify the register (add/sub) to have it jump into a NOP slide to our shell code…and then we have to figure out the shell code, but from what I read, talented people have already took care of this for us.

Useful sites (a.k.a shit I need to read):

  1. http://ref.x86asm.net/coder32.html
  2. http://www.blackhat.com/presentations/win-usa-04/bh-win-04-fx.pdf
  3. https://www.corelan.be/index.php/2009/11/06/exploit-writing-tutorial-part-7-unicode-from-0x00410041-to-calc/

Starting in Exploit Writing – Day 05

Share

Today we’ll be studying “egg hunters” as of part 4 of the FuzzySecurity tutorials. Basically this technique is useful when the buffer remaining for your shell code is too small to do anything useful. So far we’ve been lucky and we had huge buffers to inject the shell code, but some time, you may be left with a couple of bytes only. With a “egg hunter”, instead of storing your shell code, you store a small function that will look for “magic bytes” in memory. Your shell code somewhere else in executable memory and tagged with these “magic bytes” so that your egg hunter finds it and executes it. Pretty straight forward.

Exploiting Kolibri Web Server

So my goal today was to test out the egg hunter example from the tutorial. However I got cocky and try to exploit the web server by smashing SEH rather than a normal buffer overflow. Everything went fine until I had to call the address containing the POP-POP-RET.

SEH Exploitation of the Kolibri Web Server
Exploitation of Kolibiri Web Server using the SEH Techique

The problem I’m running into is when looking for a “POP POP RET”, all the results are within Kolibri.exe, and all addresses starts with a null byte, i.e. “\x08\x89\x51\x00”. That means our payload will terminate at the null byte. I thought this wouldn’t be much of a problem, since [SEH] is an address anyway, the only difficulty would be that I may have to put the payload in the buffer prior to the [nSEH], and have a short jmp into the buffer placed into the [nSEH]. Now for some reason, it seems that SEH catches the exception, but never jumps to my POP-POP-RET address; it just terminates the program.

Pop-Pop-Ret Instructions from Kolibri Web Server
All the Pop-pop-ret instructions are located in Kolibri.exe and contains a null byte.

So I got stuck here for today. More to follow tomorrow