Installing WordPress on OpenBSD 6.0 with Httpd


In the previous posts, we setup a minimal but secure web server using OpenBSD 6.0. In this post, we start from a fresh install with httpd, MariaDB and PHP 5.6.23 setup on the host. In most cases, you may now want to install a web application on it. One of the most popular is WordPress. If you have followed all the steps in the previous tutorial, installing WordPress will be fairly easy. However, because the web server is sand boxed in OpenBSD, many issues can arise. Additionally, introduction of new application may also introduce new security concerns. In this tutorial, we go through the basics of setting the database and configuring the application. We’ll also assume that you have the networking aspect configured and working. You can also consult the accompanying video.

Setting Up WordPress 4.7 on OpenBSD 6.0

To install WordPress on OpenBSD 6.0 using the native httpd web server requires quite a few steps, but most are straightforward and requires only some Linux command shell knowledge. It’s a good idea to be well-versed in the Bash scripting language and basic Linux/OpenBSD knowledge. In any case, following the steps below will get you going with your new WordPress blog in no time.

Downloading WordPress

Once validated, unzip and untar the archive into your web root directory, likely /var/www/htdocs using:

This will untar all files into /var/www/htdocs/wordpress. Feel free to rename the wordpress directory to anything you’d like.

Configuring the Database

In previous post, we installed MariaDB and thus this section will assume you have installed this database application. Otherwise, refer to the documentation of your database to use the proper SQL statements to create databases, users and manage permissions.

Log into the MariaDB database using  mysql -u root -p your_password . If you are logging from a remote location, use the  -h host argument. Once logged in, we will conduct 3 steps:

    1. Create a database for the WordPress application:

    1. Create a user for WordPress to use in order to connect to the database by using the following SQL statement:

    1. Grant permissions to the new user in order to edit the database and tables as required:

Now, the WordPress application has a place to store data on our database. Before we proceed thought, I encourage you to look at the ~/.mysql_history for a glimpse of what happened while you were doing the steps above. As you will see, the password for the user has been logged into this file. Remove this file with rm ~/.mysql_history  and let’s disable logging to prevent such leaks by adding this line in your rc.conf.local file:

Installing WordPress

From a remote host, use your favorite browser and go to https://<your_address>/wordpress/ and the installer should popup automatically. The first step is create the configuration file by filling information about the database. So gather the following information, which we have from the previous section and click “Let’s Go“:

  1. Database name: Database name use with the “CREATE DATABASE” SQL statement, i.e. “db_wordpress
  2. Database username: Username enter in the “CREATE USER” SQL statement, i.e. “wp_user
  3. Database password: type in your password;
  4. Database host; Enter or ::1. Do not leave it as “localhost” as we want to use the sockets;
  5. Table prefix; Prefix for each table created. Unless you plan to have multiple WordPress sites, leave the default value.
Wordpress Installer Welcome Page
The WordPress Installer will guide you step-by-step on setting it up.

On the next page, enter the required data and click “Submit“. If every thing is setup right, you will be prompted to continue with the setup of the site. However, you may also get a blank “step2” page, i.e. the URL will be “setup-config.php?step=2” but nothing will show up. This problem can be caused by many different things. First, make sure you have setup PHP to use your MySQL database by enabling the proper extensions in the php-5.6.ini configuration file. See previous post for an explanation on how to do this.

Next issue you may encounter is a warning that WordPress cannot create the wp-config.php file. This is mostly due to permissions issues with /var/www/htdocs/wordpress/. The best option is to manually create the file by copy-pasting its contents. Another alternative is to temporarily change the permissions of the directory to allow write permissions with  chmod 777 /var/www/htdocs/wordpress for the installer to create the file. Doing so allows anyone to write and execute code to your directory and as such, it must be change immediately after you are finished installing and configuring WordPress.

Wordpress Fail to Create Wp-config.php
WordPress warns that it could not create the wp-config.file.

Quick Hardening

Before calling “Mission Accomplished”, take some time to test your new site and set the proper file permissions. Create a test post and try to upload an image to it. You may find that it fails, again because of permission issues. According to [1], you should have the following permissions for your WordPress install:

  • Folder set to 755;and
  • Files set to 644, except wp-config.php should be 440 or 400

This can be done with the following commands;

Furthermore, note the following quote from [1]:

No directories should ever be given 777, even upload directories. Since the php process is running as the owner of the files, it gets the owners permissions and can write to even a 755 directory.

Meaning that you should avoid the temptation to solve your uploads issues, or any other issues by setting full permissions, even the upload folder. Based on [2], all files outside the wp-content directory should be owned by your OpenBSD user account so they cannot be modified. The owner of the wp-content will be set to www and will be writable, allowing uploads of files themes and plugins. Note that once you chose your theme and plugins, you could further harden your blog by restricting the wp-content/themes and wp-content/plugins directories as some attackers hide web shells in those.

Retest to make sure it works.

Upload Failures due to Directory Permissions
Setting the minimal and proper permissions on the Uploads directory is critical.

One last quick thing you may want to do is delete unneeded installation files.  WordPress should have remove them for you, but just double check. You can also remove the readme.html and any release notes that may be present, this way, it will be harder for an attacker to find the version of your WordPress installation.


WordPress becomes insecure when adding plugins, which introduces the majority of new vulnerabilities. As such, attempt to avoid unnecessary plugins and themes and uninstall them once they are unneeded. Also enable auto-updates. There are quite further actions you can take to harden your WordPress install, and I’d recommend reading the reference at [1]. You can also review the database permissions you have granted to the “wp_user” in MariaDB, and possibly restrict them to simply INSERT/UPDATE/SELECT/DELETE instructions. Then test your installation with wp-scan, a great, free and open-source WordPress vulnerability assessment.


[1] Hardening WordPress, Core Directories/Files,, (accessed on 2017-01-09)

[2] Correct File Permissions for WordPress, StackOverflow,, (accessed on 2017-01-16)

See Also

Secure WebServers with OpenBSD 6.0 – Setting Up Httpd, MariaDB and PHP


In this tutorial, we setting up a web server on OpenBSD 6.0 using the native httpd web server, MariabDB and PHP. There can be quite a few issues popping up unlike other systems, mostly due to the fact that the web server is “chroot jailed” during execution. In other words, the web server is sandboxed and cannot access other parts of the operating system, which requires more work than other similar setups on other distributions. However, this greatly decreases the damages if your server gets whacked. In this post, we setup a minimal web server that will allow you to host simple web content. I’ll assume you have an OpenBSD 6.0 VM created with root access to it. From there, we will stand up our web server with HTTPS, install MariaDB and PHP. Please note that this tutorial is not meant for professional/commercial settings, but for personal and educational uses. A video version of this tutorial is also available.

Standing up a Minimal Web Server with Httpd

The strategy we will employ is to create a very minimal web server, test if it works as intended, and then enable additional features as we go along. So first, we’ll start by enabling the httpd daemon. To do so, first copy the httpd.conf file from /etc/examples/httpd.conf to /etc/ by typing  cp /etc/examples/httpd.conf /etc . Open the copied file as root using vi or another text editing tool if you have any install. In the file, you will delete all the examples provided and only keep the “minimal web server” and “types” sections:

Make sure you save your changes and now start your web server with the following command:  /etc/rc.d/httpd -f start . Make sure you include the  -f , other you may get an error message. We will fix this later. If everything goes well, you should get an “httpd(ok)” message. Otherwise, there is likely an error in your configuration file. You can confirm by using  httpd -n .

Let’s confirm everything works so far. Retrieve your IP address using  ifconfig em0 and using another host on your network, browse to http://<your_ip>. If everything works as expected, you will see something similar to the figure below:

OpenBSD WebServer - 403 Forbidden
Receiving a 403 Forbidden error from the OpenBSD web server.

We received a 403 error because we do not have any web pages created yet and by default, httpd prevents directory listing – which is a good thing. So let’s create a quick index.html web page. Use the following command  vi /var/www/htdocs/index.html and type the following:

Save the file and point your browser to http://<your_ip>/index.html. You should see your web page. If not, make sure you created the file in /var/www/htdocs and you haven’t made a typo in your URL. Also note that if you go to http://<your_ip>/, you will also end up on your web page. By default httpd looks for “index.html” and serves this web page when none is specified.

Fantastic. We got ourselves a web server. But not a very secure or useful one unless you want to host Geocities-like webpages. Next, we will enable HTTPS on our web server and redirect all traffic to it. We’ll need to do this in 2 steps:

  1. Create a certificate for your web server; and
  2. Setup httpd to use your certificate and HTTPS

First, we’ll need to generate a SSL private key. This is straightforward by using openssl:

The server.key file is your private key and must be secured! It’s very important that nobody else other than you have access to it. Next, we will use this key to generate a self-signed certificate. This is also done by using openssl:

The command above basically requests OpenSSL to generate a certificate (server.crt) using our private key (server.key) that will be valid for 365 days. Afterwards, you will be asked a couple of questions to craft the certificate. Since this is self-signed, feel free to enter anything. Once done, the first step is completed. Next, we will modify our configuration file again and update out minimal web server to tell it to use HTTPS:

Every time your modify the httpd.conf file, you will need to restart your web server for the changes to take effect. Use  /etc/rc.d/httpd -f restart do so and test your website again, this time using https://<your_ip>. You should be greeted with a warning message fro your browser, warning you that it cannot validate the certificate. That’s because it is self-signed. Click on “Advanced” and add it to the exceptions. Afterwards, you will be serve our web page via an encrypted link.

Web Server Certificate Exception
Receiving a warning from the browser on self-signed certificate.

So at this point, we have a functioning web server over HTTPS. However, unencrypted communications are still enabled. We would like to have ALL users over HTTPS. This can be done by replacing this line in httpd.conf:


Anyone using http://<your_site> will be automatically redirected to https://<your_site>.

Setting Up MariaDB

Installing the database is quite simple in contrast with many other activities we need to do. First, we’ll need to download some packages, so make sure you have the PKG_PATH environment defined with a mirror containing the packages you need. If not, select a mirror on and define your variable:

And as root, install the mariadb-server package:

Once completed, install the database using the included script by typing  mysql_install_db and when completed, start the mysqld daemon:  /etc/rc.d/mysqld -f start . The last step is to configure it by running the  mysql_secure_installation . The script will ask you a couple of questions:

  1. First, it will ask you to set a password for root. Choose a good password. Long simple passwords can be more efficient than short complex one that you won’t remember;
  2. It will then ask if it should remove anonymous users. Select “Y” to remove them;
  3. When asked if it should disallow remote root access, answer by the positive to prevent root access from remote hosts;
  4. Choose to remove all test databases; and
  5. Press “Y” to reload all privileges in the database application.

You are now done with installing the database. Before moving on to the next section, confirm that everything is working by login into MariaDB:

If everything went fine, you will be given access to the database engine. Type  quit; to exit the application.

Setting Up PHP

The last step of this tutorial involve downloading and install PHP. Very few webpages nowadays rely solely on static HTML, and I suspect most will want to install web applications later on, so let’s setup PHP. First, download some of the required packages. Note that additional packages may be needed depending on the web applications you wish to install later on, but for now, let’s setup the core PHP packages:

There are several versions of PHP available on the OpenBSD repository. What is important is that you select the same version for all packages you install. For example, at the time of writing version 5.6.23 and 7.0.8 were available, but php-mysql 7.0.8 was not, thus we select 5.6.23 for all PHP packages to prevent issues later one. Dismiss any packages ending with “-ap2” as these are for the Apache web server. For the purpose of this tutorial, we will select version 5.6.23 every time we are asked.

Before starting up PHP, we have a couple of things to do. First, we need to tell httpd to send PHP pages to the PHP processor. We also need to specify the PHP processor that we have a database it needs to be aware of. So let’s start by modifying our httpd.conf file again by adding a section about .php files. Also, we’ll add a “directory” section to tell the web server to look for “index.php” files first instead of “index.html“:

Next, let’s modify the PHP configuration file to enable MySQL. We do so by adding extensions to the /etc/php5.6.ini file. Open this file as root and add the following lines under the “Dynamic Extensions” section:

Since we modified configuration files, we’ll need to restart the httpd and php-fpm daemons. Do so with  /etc/rc.d/httpd -f restart and  /etc/rc.d/php56-fpm start . Hopefully, you will get “httpd(ok)” and “php56_fpm(ok)”. Otherwise, you may have introduce a typo in your configuration files or some packages may have not downloaded/installed properly.

Wrapping Up

One last thing we will do before calling it quit for today, is to make sure the httpd, php56-fpm and mysqld services are started on bootup of OpenBSD. To do so create a new rc.conf.local file in /etc/ using  vi /etc/rc.conf.local and type the following in it:

At startup, OpenBSD will use this file to initiate the services and the PKG_PATH environment variable. You will not have to use the  -f anymore when restarting the httpd daemon.


So in this post, we have enabled a HTTPS web server, along with a MariaDB and PHP, allowing use to serve dynamic content on a OpenBSD 6.0 machine. At this point, you should be able to host basic dynamic content. However, if you try to install more complex web applications, you will need an extra few steps in many cases. Sometimes, you will need additional packages and extra work to connect to the database via your web application. In the next tutorial, we will install WordPress to show some of the difficulties you may encounter with the chroot jail and file permissions of the web root.

See Also

Reversing the Parrot SkyController Firmware


The Parrot SkyController is a remote control sold for long range flights with the Parrot Bebop drone, sometimes used in the real estate sector. It enables integration of smartphones and tablets for flight control via Wifi connection, allowing transmission of real-time data from the drone. In other words, the SkyController acts as an intermediate between the flight management software (smartphone/tablet) and the actual drone. In this post, we explain how to reverse the firmware of the SkyController to be able o view the contents of the internal operating system and understand its inner workings. A lot of the information provided is based on [1] and is applied on this specific firmware. Code is provided along this post that will extract files and information from the SkyController firmware.

The Parrot SkyController Device
The Parrot SkyController Device (taken from the Parrot Store)


The firmware of the Parrot Skycontroller appears to be a well-defined format which is reused across multiple products from the company. The firmware contains 2 types of structures the header and a sequence of “data entries”. The header is located at the beginning of the file and is followed by multiple entries which contains configuration and file system information. We will look at both structures in the following sections.


The firmware header starts with four magic bytes which spells out “PLF!”. These can therefore be used to identify valid Parrot firmware files. A quick Python snippet is included below as an example.

Immediately after the magic bytes, a sequence of thirteen 32-bit unsigned integers provides additional information about the firmware. Defined as a C structure, the header would look something like this:

The header version specify the type of format used for the firmware header. Within the SkyController firmware, the value for this field was “0x0D”, possibly indicating that there are 14 variations of header formats. The header size provides the number of bytes included in the header while the entry header size specifies the number of bytes in headers of entries within the firmware. The next 5 integers are undefined at the moment, as is the before-last integer. One of them is likely a CRC32 checksum. The dwVersionMajor, dwVersionMinor and dwVersionRevision defines the version of the firmware. Lastly, the header contains the size of the file in bytes.


Each entry has a header of constant size, which is specified within the firmware header by the dwEntrySize value. In the SkyController, each entry is composed of 20 bytes used as follows:


The section type specify the type of data contained within the entry. This can be configuration data, partition details or file information. These will be explained in further details in the sections below. The entry size contains the number of bytes within the entry, without consideration for null-byte padding. This field is followed by a CRC32 value. It’s unclear at this point how this value is calculated. The last known integer is the size of the data once uncompressed. If this value is 0, then the contents are not compressed. Each type of entry have further fields defined, and these are explained below.

Firmware Entries

There are multiple types of entries within the firmware. In this section we will describes the one we came across in the file we have analyzed.

Bootloader and Boot Configuration (0x03, 0x07)

Entry 0x03 contains a binary file which appears to be the bootloader of the system based on strings contained within, however more analysis will be required to understand how it actually works. In the SkyController, a PLF file was observed, but within that PLF file, this entry had binary data. Similarly, entry 0x07 also appears to be related to the boot process as shown by the string “ecos-bootloader-p7-start-246-ge30badf”, “ecos” referring to the eCos operating system.

File System Data (0x09)

There is a  “File System Data” entry for each file contained in the firmware. As such, this entry is the most common one. Depending on whether the file is compressed or not, the structure of this entry is slightly different: when its contents are uncompressed, this entry starts with the filename (or directory name), which is a zero-terminated string. The name is then followed by 3 unsigned integers.

The first integer contains flags specifying the type of file, which can be a directory, a normal file or a symbolic link. This is specified by reading the bits 12 to 15. The 12 other bits contains the permissions of the file. Converted into octal form, the last 12 bits will provide the same format as the one used in Linux. For example:

Extracting File Permissions from the Firmware Entry
Extracting File Permissions from the Firmware Entry

When compressed, the flags are within the Gzip-compressed data. The data must therefore be first uncompressed and then, the same procedure applies: the filename will be a null-terminated string right at the beginning, followed 3 unsigned integers, the first containing the type and permissions. The remaining bytes are the content of the file. In the example below, the filename is system/pulsar/etc/boxinit.hosted. The flags are 0x000081A0 (little-endian) and the contents of the file starts at 0x32. 

Partition Data (0x0B)

The 0x0B entry contains information about partitions on the device.The header of this entry is composed of 10 values. The first 4 integers are version information, the next 5 ones are unknown and the last unsigned integer is the number of partitions defined within the entry.

For each partition defined within the entry, a sub entry is available which contains further information about the partition: the device ID, volume data and the mount point on the target device. The defining structure for this entry are:

Volume type can be either RAW, STATIC and DYNAMIC. A raw partition contains no file systems, for example swap partitions. Static partitions are read-only, such as the boot partition and finally, the dynamic partitions are read-write and contains modifiable files.

Installation Data (0x0C)

This entry was included early in the file and contained another PLF file within its data. This PLF file contained a 0x00 entry, which is unknown, a 0x03 entry and a 0x07 entry. The 0x07 entry contained boot options:


I have upload a Python program [2] which will extract information and files from the SkyController firmware here. It also successfully extracted files for a Bebop2 firmware and I suspect it will work for other recent firmware. With additional work, it may reverse older ones. Repacking an unpacked firmware should be doable, but I do not have any of the device to test it out. Before doing so, determining how the CRC value for the header and each entry is calculate is required. Future work would include testing additional firmware files and implementing the program to repack an unpacked and modified firmware. Finally, reading the eCos documentation may clarify many of the unidentified values.


[1] “PLF File Format”, E/S and I, (accessed 2016-12-13)

[2] Racicot, J. “Vulture”, (accessed 2017-01-03)

Remembering the ‘Stakkato’ Hacks

Philip Gabriel Pettersson, best known by the pseudonym of “Stakkato” can be said to have reached legendary status within the computer security community by his numerous successful breaches of high-level targets between 2003 and 2005. Then a 16 year-old hacker from Uppsala, Sweden, he successfully infiltrated systems of large universities, the United States military, NASA and various companies, forming a worldwide network within which he operated for around 2 years before being caught in 2005 and prosecuted by Swedish authorities. This post revisits the story of Stakkato by reviewing his motivation, techniques and exploits and potentially unearth some lessons learned from these events.

Bored Teenagers

Uppsala is the fourth largest city of Sweden and is situated around 70km north of the capital. In 2003, one of its curious and smart teenager went on to challenge himself by exploring – illegally – the digital environment surrounding the city. Some of us might remember the old definition of a “hacker”, as defined by The Mentor’s manifesto [1]. Back in 2003, owning a computer was still not totally commonplace, although it was a lot more than it in 1995. Only teenagers with a certain sense of interest and curiosity about technology would consider spending most of their time on their machines. In my corner of the world, in the 90s, computer science classes were nothing more than learning to type, using word processors and creating spreadsheets. I am sure I was not the only one in the same situation and some readers may remember the frustration of not being able to pursue their hobby in depth while in school. So we spent most the classes programming VBA games or spamming other students using WinPopup to have them call out the teacher, who would struggle to explain the innocuous messages on the screen. Only at night could we connect to the net, login into our favorite BBS, IRC channels or forums to finally learn more. Virtualization was not a thing back in the early 2000s, internet connections were still slow and owning more than 1 computer was a luxury most couldn’t afford. A solution was dumpster diving around computer shops – which were aplenty compared to nowadays – or browsing eBay for scraps. Another one was to poke around systems connected to the internet. Universities were of course perfect targets – opened, poorly secured (in order to be opened) and rich with systems, software and data.

Why am I rambling about the past? Because in many ways, Stakkato may have been the same teenager than many of us were back then, but his cockiness eventually got the better of him and caused his demise. Some even proposed that by 2005, he may have attempted to venture into criminal activities by selling stolen intellectual property. In any case, let’s explore briefly his story, because I believe many who now heads IT security companies, or experts and researchers in the field all shared the same starting point, but fortunately took a different path at some point.

The Stakkato Hacks

The first suspicions of wrongdoing were noticed in 2004. Berkeley researcher Wren Montgomery started receiving email from Stakkato [2], claiming that not only did he infiltrated her university, but that he also accessed the network of White Sands Missile Range in New Mexico, stole F-18 blueprints from Patuxent River Naval Air Station and infiltrated NASA’s Jet Propulsion Laboratory (JPL) – which to be honest, have been hacked by many in the past decade [3][4][5], almost making it an initial test for debuting hackers. These claims were later confirmed by spokesmen from both organizations. They however downplayed the importance of these breaches, claiming that there were low-level breaches and that only weather information was exfiltrated. Later during the year, several laboratories harboring supercomputers connected via the high-speed network TeraGrid reported breaches. However it was only in 2005, with the intrusion in networking company Cisco Systems, that would trigger alerts from authorities and proved to be a bridge too far. Having established a foothold within Cisco, Stakkato was able to locate and download around 800MB of source code of the Internetwork Operating System (IOS) version 12.3 [6]. IOS runs on every Cisco routers and other networking devices which are often key network component of not only large commercial and governmental organizations, but also of the worldwide telecommunication infrastructure. Samples of the code was released on IRC as proof and reported by a Russian security site. The theft of the code caused a stir, many believing that individuals or groups would comb the code and craft zero-day exploits that could be leveraged on critical systems.

This activity would prove the last Stakkato and his team would be able to brag about as the Federal Bureau of Investigation (FBI) and the Swedish authorities started to investigate the leaks. In 2007, he was convicted for breaching networks Swedish universities and paid 25,000$USD in damages. He was further interviewed by U.S. officials [7] and in May 2009, he was formally inducted in California for intrusions in Cisco Systems, NASA’s Ames Research Center and NASA’s Advanced Supercomputing Division [8]. In 2010 his prosecution was transferred to the Swedish authorities.

The Tactics

The core strategy of Stakkato revolved around a trojanized SSH client he uploaded to systems he compromised. The malicious client would be used to intercept users’ credentials and send them to a third location where Stakkato and his group would retrieve them to access additional systems. Once accessed, Linux kernel exploits were used for privilege escalation on the local system and then repeated their main tactic, creating privileged accounts and eventually building a wide network of proxies to launch their attacks. The attack on the National Supercomputer Centre [9] provides insight on the tactics and size of the compromises. The methodology used was not innovative by any mean, but was applied effectively and certainly leveraged human errors to its full extend. The process can be summarized as follow:

  1. Infiltrate a system via a kernel vulnerability or stolen credentials;
  2. Disable command history, e.g prevent the system from logging your commands;
  3. Attempt privilege escalation;
  4. Setup trojanized SSH clients, backdoors and rootkits;
  5. Extract known hosts from current machine;
  6. Attempt to infiltrate extracted hosts as per step 1.

The analysts of the NSC documented logins from universities the United States, Israel and Sweden and referenced the SuckIt rootkit [10] as being installed on one of the target machine. Unfortunately for the administrators, the rootkit was discovered only after a new root password was assigned to all machines, allowing the attackers to re-infiltrate the newly cleared systems. However this time the Swedish teenager was a lot less subtle and vandalized the systems by attempting a web defacement and modifying logon messages. This time the IT specialists took down the network, inspected and reconfigured every machine before putting the system back online. Despite the defensive operation, recurring login attempts and smaller-scale compromised originating from more than 50 compromised organizations were noted between 2003 and 2005.

Lessons Learned

This story follows the same pattern observed throughout the ages, such as sprawling empires from ancient times in which the rulers’ overconfidence led them to bankruptcy, or growing organizations that stretched into markets that proved more difficult than expected. Stakkato’s network of compromised systems grew too large, he became overconfident and tempted the sleeping bears. In other words, patience may have led him to a very different path. Or maybe his arrest was for the best afterall: there is little news about him past 2010, but coincidently there is a security researcher working in Samsun bearing the same name and credited multiple vulnerabilities in the Linux kernel [11][12]. While I have no idea if this is the same individual, I would be glad to hear that he now uses his skills fruitfully.

Arguably another lesson is how simple tricks can still work if applied efficiently. All things considered, security hasn’t changed dramatically within the past 10-15 years: it has evolved, but in the end, we still rely on usernames and passwords, users’ awareness and administrators properly maintaining their networks and hosts. Humans using these systems haven’t changed much either; we will take the simplest approach to achieve our goals. Hence we select the easiest password passing the complexity filters in place and reuse it [13] so we don’t have to remember 100 variations of the same password. Large database compromises in the past few years appears to prove this behavior. We could have many passwords and store them in password managers, but then the password managers can still be trojanized or exploited [14], allowing similar tactics used by Stakkato. Eventually most people would probably not bother to execute an additional program to retrieve their password in order to login in the service they need; it simply adds an additional step.


Studying the past of computer security is sometimes quickly dismissed, often seen as irrelevant given the change in technologies, but one can easily find inspiration in the stories of hackers, malware writers and the analysts that battled to gain and maintain control of systems. Much like studying the battles of Alexander the Great or Patton, there is much to be learned from studying the techniques used and wargaming their applications in modern organizations. Would the current administrators blindly enter their passwords if a windows suddenly popped up requesting their credential for some update? Users still get fooled by fake login web pages [15] and end up with their bank accounts plundered or their Twitter account spewing nonsense to all their followers. It still works.

Obligatory XKCD


[1]    “Phrack Magazine” [Online]. Available: [Accessed: 05-Nov-2016].

[2]    J. M. L. Bergman, “Internet Attack Called Broad and Long Lasting by Investigators,” The New York Times, 10-May-2005. [Online]. Available: [Accessed: 02-Nov-2016].

[3]    K. Zetter, “Report: Hackers Seized Control of Computers in NASA’s Jet Propulsion Lab,” WIRED, 01-Mar-2012. [Online]. Available: [Accessed: 04-Nov-2016].

[4]    “Hacker Sentenced in New York City for Hacking into Two NASA Jet Propulsion Lab Computers Located in Pasadena, California (September 5, 2001).” [Online]. Available: [Accessed: 04-Nov-2016].

[5]    “Hackers penetrated NASA computers 13 times last year,” USATODAY.COM, 02-Mar-2012. [Online]. Available: [Accessed: 04-Nov-2016].

[6]    “Sweden to prosecute alleged Cisco, NASA hacker.” [Online]. Available: [Accessed: 04-Nov-2016].

[7]    D. Kravets, “Swede Indicted for NASA, Cisco Hacks,” WIRED, 05-May-2009. [Online]. Available: [Accessed: 03-Nov-2016].

[8]    United States of America v. Philip Gabriel Pettersson aka “Stakkato.” 2009.

[9]    L. Nixon, “The Stakkato Intrusions: What happened and what have we learned?,” presented at the CCGrid06, Singapore, Singapore, 17-May-2006.

[10]    D. Sd, “Linux on-the-fly kernel patching wihtout LKM,” Phrack, no. 58, Dec. 2001.

[11]    P. Pettersson, “oss-sec: CVE-2015-1328: incorrect permission checks in overlayfs, ubuntu local root.” [Online]. Available: [Accessed: 05-Nov-2016].

[12]    “Linux Kernel ’crypto/asymmetric_keys/public_key.c ‘ Local Denial of Service Vulnerability.” [Online]. Available: [Accessed: 05-Nov-2016].

[13]    T. Spring and M. Mimoso, “No Simple Fix for Password Reuse,” Threatpost | The first stop for security news, 08-Jun-2016. [Online]. Available: [Accessed: 04-Nov-2016].

[14]    “How I made LastPass give me all your passwords.” [Online]. Available: [Accessed: 05-Nov-2016].

[15]    Bursztein, Elie, Borbala Benko, Daniel Margolis, Tadek Pietraszek, Andy Archer, Allan Aquino, Andreas Pitsillidis, and Stefan Savage, “Handcrafted fraud and extortion: Manual account hijacking in the wild,” in Proceedings of the 2014 Conference on Internet Measurement Conference, Vancouver, Canada, 2014, pp. 347–358.

The Byte-BOS Real-Time Multitasking Operating System

Compared to operating systems in general computing, the world of embedded devices remains a world to be discovered by security analysts and hackers alike, and it offers much to explore. While reverse engineering of many newer appliances is revealing well known operating systems such as Windows or Linux, others remains fairly unexplored, such as TinyOS or μOS. Legacy devices, still in used across many industries, also provide a rich variety of unknown software and architectures. One of them is the Byte-BOS Real-Time Multitasking Operating System (RTMOS).


Compared to operating systems in general computing, the world of embedded devices remains a world to be discovered by security analysts and hackers alike, and it offers much to explore. While reverse engineering of many newer appliances is revealing well known operating systems such as Windows or Linux, others remain fairly unexplored, such as TinyOS or μOS. Legacy devices, still in use across many industries, also provide a rich variety of unknown software and architectures. One of them is the Byte-BOS Real-Time Multitasking Operating System (RTMOS). This short article attempts, based on limited information, to detail this little known system, explore its history, usage and basic composition.


The Byte-BOS RTMOS was initially developed in 1994 by Numerical Services Inc. The company, defunct since 2004, sold the full C source code of the RTMOS to customers for 7495 $USD. Possession of the source code allowed the buyer full customization of the operating system according to the device being designed. The system supported a restricted set of microcontrollers (see table 1) including the Intel x86 and Motorola M68000. Other lesser known microprocessors supported included architectures from Hitachi, Mitsubishi and Texas Instruments, which were used in embedded devices within the medical, industrial and telecommunications sectors. Due to the long life-cycle of these devices, it may still be possible to identify devices leveraging the Byte-BOS RTMOS. Development and support for the RTMOS ceased around 2004. As such, documentation is very scarce, and remnants of information are available only via the Internet Archive [1]. At the time of writing, the domain “” resolves to “Scheumacher Engineering”, owned by the original developer of Byte-BOS. However the website is nothing more than a single page which was last updated in 2013.

Table 1 – Microprocessors supported by the Byte-BOS RTMOS.
Intel 80×86 Coldfire Mitsubishi M16C Texas Instruments
C2x/C5x DSP
Intel 80188/86 Motorola 68000 Mitsubishi M32D Texas Instruments
C3x/C4x DSP
Intel 80×86 (32bit) Motorola 68HC11 Hitachi H8300 ARM/THUMB
Intel 8096 Motorola 68HC16 Hitachi H8300H
Intel i960 Mitsubishi
Hitachi SHX


The Byte-BOS RTMOS is a minimal operating system providing task scheduling management for user applications, along with typical OS activities such as interprocess messaging, memory allocation and prioritization. It performs pre-emptive and non-pre-emptive scheduling of an unlimited number of queued tasks. The system enforces scheduling via interrupt service routines (ISRs) which can suspend and/or resume tasks based on events and their priority. When multiple tasks with a similar priority are requesting resources, the RTMOS assign the resources via a round-robin selection method.

As other operating systems, memory allocation appears to follow a standard process by keeping track of used and free memory chunks via a heap implemented by a double-linked list, which is the data structure used across the RTMOS to store unlimited numbers of objects. Tasks can dynamically allocate and free memory as needed. It also manages events and interaction with devices on behalf of user applications by abstracting the underlying hardware via a Device object (see figure 1). Byte-BOS also provides inter-tasks communications mechanisms for synchronization amongst tasks such as messages and semaphores.  In terms of data structures the double linked list and buffer objects appear to be the main structures used across the RTOS. Memory chunks, tasks and most of the other objects are all managed via the List object.  It also supports the queue structure used to manage tasks. A software stack is also provided and used by tasks to run.

Security-wise, Byte-BOS does not have any authentication or specific security methods for access to memory or special “kernel” functions; all tasks have the same level of authority than the RTOS. Basically, Byte-BOS is completely flat.


The RTMOS is written in C or specific variety of C specification depending on the targeted microprocessor. The 10 main components of the operating systems are represented in figure 1 and provide an excellent overview of the functions accomplished. A simulated “this” pointer to the various “struct”s is made available to the developer to give an Oriented-Object Programming (OOP) feel. This section describes briefly the main components of Byte-BOS, along with some of the features marketed by the original developer.

UML Representation of the different objectsfooun in the Byte-BOS RTMOS
Figure 1 – UML Representation of the different objects defined in the Byte-BOS RTMOS

The Task Object

The main object of the RTMOS is the Task object, which is very similar to the similar concept modern software development. Just as in any C/C++ object, a constructor is initiated prior to executing the body of the task, which is defined by the developer to conduct a specific task. Data of the task is allocated and accessible via the this_data pointer. Tasks also allocate sub-objects, opens required devices, initialize variables and synchronize state machines. Once constructed, the task runs its main function until it returns or is removed. Access to a task is done via the this_task pointer. Both pointers described are always in scope and usable within the task. Once the main task completes, the destructor of the structure is called, which frees memory and pointers of sub-objects. It also terminates any internal state machine.

The Memory Object

This object manages the memory heap and stores the pointer to the heap, the block size and the number of memory blocks. It also exposes various functions for memory management, mainly alloc_fmem and free_fmem. The heap is implemented using a double-linked list in which each node contains metadata about the memory blocks. While details are not available, it can be assumed that the size of the block, its status (free or allocated) and a pointer to the next block is included.

The Semaphore Object

As its name implies, the Semaphore is used for signaling and control between multiple tasks by either being in the Up or Down status. A timeout can be defined as needed.

The MessageBox Object

The MessageBox acts as a central repository for tasks to exchange data. Tasks and ISRs can create messages to initiate other tasks or exchange . The MessageBox basically provides two functions: put and get. When a new message is created, memory is allocated. Similarly, the memory is freed when retrieved from the box.

The Timer Object

Similar to any standard timer in other system, this object is used to create timeouts and schedules of tasks and events.

The Event Object

Events are created by tasks and ISRs to schedule other tasks.

The Device Object

This item abstracts interaction between the user application and the hardware device. It does so by pointing to the memory blocks used by the device and managing input/output to the area.

The List, Queue and Buffer Objects

These objects, as their name implies, are implementation of a linked list, a queue and buffer data structures.


Byte-BOS boasted the following additional features on their website from 2000, providing some extra insight on the internals of the system.

Interrupt Handling

ISRs can make operating systems calls and schedule tasks by including the ENTER_ISR and EXIT_ISR macros at the beginning and end of the routine. Most services are available to ISRs, giving them considerable access over the entire OS.

Critical Section Handling

Byte-BOS provides critical section handling by protecting non-reentrant code sections from both interrupt and task pre-emption. Protection of critical sections appears to be done via the following mechanisms:

  • Disabling interrupts
  • Locking and unlocking code sections
  • Prioritization of tasks

Task Execution Trace

Tasks are provided with a trace buffer configurable at both compile time and run time. The trace buffer contains information about the sequence of calls, the parameters passed, and the return value of the called function. The tracing option is extremely useful for debugging purposes.

System Requirements

The Byte-BOS RTMOS requires very little in terms of resources. While the requirements varies depending on the underlying processor, the size of the kernel is a few kilobytes, while the RAM requirements are less than 100 bytes of globally accessible memory and less than 100 bytes per task and ISR. The detail figures are listed in table 2.

Table 2 – Memory requirements for the Byte-BOS RTMOS according to microprocessor
Microprocessor Minimum Kernel
Size (KB)
Maximum Kernel
Size (KB)
RAM Requirement
for Global Variables
RAM Requirement
per Task (bytes)
Intel 80×86* 4 10 28 30
Intel 80×86
6 15 48 58
Intel 80188/86* 2 10 28 30
Intel 8096 2 8 20 20
Motorola 68000 1.5 12 50 70
Motorola 68HC11 1.2 8 28 20
Motorola 68HC16 2 8 28 20
Mitsubishi M37700 2 10 28 20
Mitsubishi M16C 2 12 64 48
Mitsubishi M32D N/A N/A N/A N/A
Hitachi H8300 1.2 8 28 20
Hitachi H8300H 2 10 38 40
Hitachi SHX 7.8 19.5 72 60
Texas Instruments
C2x/C5x DSP
1.2 8 28 20
Texas Instruments
C3x/C4x DSP
2.3 19.5 28 words 20 words
ARM/THUMB 4 15 50  70

* For Intel-based architecture, the memory footprint reported above is based on usage of the Borland C++ compiler.

Additional volatile memory is required for objects created dynamically. As such, the numbers provided above do not represent the total amount of memory consumed by Byte-BOS.

Compilers Supported

In general, Byte-BOS supports the development tools and compilers provided by the semi-conductor manufacturer. For Intel-based processors, the Borland C/C++ compiler was supported. The complete list of supported compilers is provided in table 3.

Table 3 – Supported compilers by the Byte-BOS RTMOS.
Microprocessor Supported Compilers
Intel 80×86 Microsoft C/C++ (large model)
Borland C/C++
Watcom C/C++
Intel 80×86(32bit) Watcom, Metaware and most of C compilers
Intel 80188/86 Microsoft C/C++ (large model)
Borland C/C++
Intel 8096 IAR
BSO Tasking Compiler
Motorola 68000 Cross-Code
Motorola 68HC11 IAR
Motorola 68HC16 Introl
Mitsubishi M37700 IAR
Mitsubishi M16C IAR
Mitsubishi Compiler
Mitsubishi M32D N/A
Hitachi H8300 IAR
Hitachi H8300H IAR
Hitachi SHX GNU
Green Hills
Texas Instruments
C2x/C5x DSP
Code Composer /
Texas Instrument Compiler
Texas Instruments
C3x/C4x DSP
Code Composer /
Texas Instrument Compiler


Devices using the Byte-BOS RTMOS can be identified by looking at ASCII and Unicode strings in their firmware. In almost all cases, if not all, the firmware will contain a copyright notice identifying it as well as the target microcontroller it was compiled for. For example, by extracting the strings from the firmware for the Seagate Cheetah 10K.6 Disc Drive, either using the simple strings.exe (or strings in Linux) or IDA, the copyright notice string “Byte-BOS 8096 Multitasking Operating System — Copyright 1990 Numerical Services Inc.” can be observed (see figure 2).

Byte-BOS Copyright Notice into the Firmware of the Seagate 10K.6 Disc Drive
Figure 2 – Byte-BOS Copyright Notice into the Firmware of the Seagate 10K.6 Disc Drive

While appliances posterior to 2004 will likely use modern RTMOS such as Linux , Byte-BOS can still be found on legacy devices. One such example is the Baxter IPump pain management medical device, which refers to its usage in the manual [2]. Of note, the manual refers to the detection of stack overflows within the Byte-BOS RTMOS. Industrial controls, aircraft, telecommunications systems and other systems designed prior to the 2000s may still harbour this little known RTMOS.


The Byte-BOS RTMOS will likely disappear as legacy embedded devices are life-cycled for newer systems with more processing power and extended memory. Until that moment, for developers of these systems, or simple hobbyists, information remains scarce and limited. While we provided a brief overview of its architecture and features, details about creating user applications remains obscured by the lack of online information such as an API description, code examples or more importantly, the source code. Such data would greatly ease development and engineering efforts of legacy systems.


[1] “Bytebos Home Page.” Bytebos Home Page. Internet Archive, 20 Oct. 2000. Web. 01 Nov. 2015. <>.

[2] Baxter IPump Pain Management System – Service Manual, Baxter Healthcare Corporation, Chapter 4 – Troubleshooting, p.4-8. 2007. Web. 01 Nov. 2015. <Link>

Repost: Stack-based Buffer Overflow Vulnerabilities in Embedded Systems

The buffer overflow attack vector is well documented in desktop and server class machines using the von Neumann memory model, but little has been published on buffer overflow vulnerabilities in Harvard architectures.

I have not written or contributed to the enclosed research paper. I’m simply reposting it here because it’s interesting and for some reason, appears available only via Google cache. So before it disappear from results, I’m reposting it here.

This paper discusses a technique to conduct buffer overflows on processors using the Harvard architecture. In this architecture, the stack starts at the beginning of the memory and grows up, versus Von Neumann architectures in which it grows down.


Most small embedded devices are built on Harvard class microprocessor architectures
that are tasked with controlling physical events and, subsequently, critical infrastructures. The Harvard architecture separates data and program memory into independent address spaces, as opposed to the von Neumann architecture that uses a unified memory system with a single address space for both data and program code. The buffer overflow attack vector is well documented in desktop and server class machines using the von Neumann memory model, but little has been published on buffer overflow vulnerabilities in Harvard architectures. In this paper we show that stack-based buffer overflow vulnerabilities exist in embedded control devices based on the Harvard class architecture. We demonstrate how a reversal of stack growth direction can greatly simplify the attack and allow for easier access to critical execution controls. We also examine popular defense techniques employed in server and desktop environments and the applicability of those defenses toward Harvard class machines.
Link: Kristopher Watts & Paul Oman, University of Idaho, Stack-based Buffer Overflow Vulnerabilities in Embedded Systems

#TheGreatFTPHunt – 2% to 9% of files scanned potentially containing confidential information

In this post, we continue our data collection and evaluation of files stored on removable medias publicly accessible to the Internet. The collection of filenames from 6,500 hosts is ongoing, therefore we’re going to focus on evaluation of sensitivity of a file based only on its filename. We also present the latest statistics collected in our database.


In this post, we continue our data collection and evaluation of files stored on removable medias publicly accessible to the Internet. The collection of filenames from 6,500 hosts is ongoing, therefore we’re going to focus on evaluation of sensitivity of a file based only on its filename. Based on the current result, 2 to 9% of the 3000 files reviewed were sensitive or potentially sensitive. Most of the sensitive files are concentrated on a few hosts. These files often include financial information or project data from businesses. So far, 773 hosts containing around 4.5 million files have been scanned.


The amount of filenames collected is quite large and we cannot evaluate manually each filename for its probable sensitivity. As such, we need to devise a procedure to automatically assess its sensitivity. We have some definitions and restrictions to list first to clarify what a sensitive file is and limitation to our evaluation criteria.

In this document, sensitive file refers to user-generated or software-generated files based on user input that contains information that should probably not be publicly accessible and which can be leveraged against an individual or organization. This includes:

  • Personal identification documents; passport, driver’s license, visas, government forms…
  • Personal finance documents; income tax files, insurance forms, credit card statements, mortgage, pay stubs, banking information
  • Personal medical documents; prescriptions, medical records
  • Work-related files; emails, proprietary source code, password lists
  • Business finances; customer lists, sales data, project costs, business deals, investments, payrolls
  • Intellectual property; blueprints, schema, patents, research
  • Network configuration; passwords files, configurations files, network diagrams, user databases
  • Large databases of emails, addresses and other personal information.

Some of the files not included in our analysis that includes;

  • Copyrighted / Illegally downloaded files. However we considered text file containing licensing keys to be sensitive.
  • Inappropriate contents (nude selfies, personal politics, group affiliations etc…)
  • Personal pictures, letters.
  • Addresses and emails were not considered personal, however databases of addresses and emails are considered sensitive

Because of the volume, we cannot download and manually verify each file to confirm its contents, as such our main restriction is that our assessment must be done solely based on the absolute filename recorded. As such, to evaluate the sensitivity, we used three categories; positive, negative and neutral, i.e. either a file is very likely to sensitive, potentially sensitive or clearly not sensitive at all. Of course, there is always a possibility that a file labeled as sensitive may not be. For example, a file called social security numbers.xls may contain only formulas or an empty form. Ideally, files identified as positive or neutral should be manually vetted.

The procedure to automatically assess the sensitivity of a file based on its path and name is first done by assessing a random sample manually. Using the ORDER BY RANDOM (note: there will be a need to review if this function is truly random, which I doubt) function (performance is not an issue in this experiment) of the Postgresql database, multiple  random samples of 100 filenames are retrieved from the database. Each file is shown to the evaluator which based on the path, filename and extension assess the sensitivity of the file as ‘positive‘, ‘neutral‘, ‘negative‘. For each run, we log the count of hits for all categories.

Listing 1 : Example of a run in which a script asks an evaluator to assess the sensitivity of files based on its absolute path.

The evaluator is assessing the filename based on keywords that may indicate the contents of the file. As such, a file containing the word, or as we call it in this document, a token such as sales, passport or passwords will be assume to contain information about sales, a passport scan or a list of passwords. In many cases, the filename is too obscure, but the path and extension may indicate the contents of the file. For example, a path containing the tokens project, finances and a Microsoft Excel extension despite a filename of axe189212_c.xls will be considered as neutral, as the file may contents information about a project. Examples of both scenarios are shown in listings 2 and 3:

Listing 2 : Examples of files that were deemed ‘positive’ hits based on keywords in their absolute path.

Listing 3 : Examples of files that were deemed ‘neutral’ (or ‘unknown’) hits based on keywords in their absolute path.

Filenames in foreign languages are roughly translated using Google Translate, as such, many of them are labeled as unsure.

A Python script then divide the filename in tokens, and each token is stored in the database along with the number of times it was found in a positive, neutral and negative hit. Tokens are created slightly differently based if they are located in the path, the filename or in the extension. For the extension, a single token is created which contains the extension itself. If the file does not have an extension or is not an extension usually associated with known software, no token is created. For the filename, tokens are created by splitting each word using characters usually known to separate words such as the underscore, dash, period or spaces. Lastly, for the path, directories are used as token and unlike filenames, are not split further. An example of this process is shown in listing 4:

Listing 4 : Example of the tokenization of a filename.

Once the tokens are created, the script will either add the token in the database or update its count based on the evaluator choice. After each update, a score is given to the token, which is simply the ratio between positive hits and the total count of appearances: p / hits). Note that tokens are considered different depending their location in the filename. As such, a filename such as /My_Passport/backup/Outlook emails backup.pst, will generate 2 distinct ‘backup’  tokens; the one from the path and the one from the filename. We explain this decision in the next paragraphs.

Listing 5 : Scores of the tokens extracted from the file in listing 4.

By using this procedure, we believe that tokens appearing often in both positive and negative hits will cancel each other, while the tokens strongly associated with positive and negative hits will remain clearly divided. Some sort of mathematical should follow later one (I hope…need to review discrete maths I guess). Some preliminary results  appears to confirm this approach as valid. Extensions strongly associated with sensitive contains higher scores while media files have null scores.

However, there is a need to further refine this process by associating a value, or weight, to the location of the token. Tokens in the path are not as indicative of the sensitivity of the file as much as a token in the filename or extension. Even within the path, the highest level is generally less indicative than the lowest one, i.e. /documents/finances 2012/sales/company sales.xls. Therefore when assessing a new filename, we need to give a score to the path, the filename and the extension. For the path, we will get the score of each token and multiply it with a weight that correspond to its location in the structure. For token that are not found the default value of 0 will be given. Then we will take the average of all token for the score of the path. As for the filename, we will not consider the position. Finally the stored score of the extension will be retrieved from the database. If the extension is not found, then a score of 0 will be used. This will transform a filename into a set of three real values which we can range between 0 and 1. To determine the weights needed for each location, we will used a supervised neural network. More research will be conducted to determine how to use this approach.


As of 16 July 2015, 4,568,738 files have been recorded from 773 hosts.

Country Hosts
United States 258
Russian Federation 91
Sweden 69
Canada 66
Ukraine 27
Norway 24
United Kingdom 24
Australia 19
Netherlands 18
Hong 18
Taiwan 16
Poland 15
Germany 11
Romania 11
Finland 10
Switzerland 8
Korea 8
Singapore 7
Czech Republic 7
Japan 6
Table 1. Location of the 773 hosts scanned as of 16 July 2015 order by country.

Mp3 and JPEG image files remains the most common. As such, we focus our statistics on document-type of files for a change, i.e. Office documents. Adobe PDF files and Microsoft Word documents are the most common file types based on our current data as shown in figure 1.

Most common file types scanned as of 16 July 2015 for office-related documents
Figure 1. Most common file types scanned as of 16 July 2015 for office-related documents

At the moment, around 3000 files have been assessed (30 runs of 100 samples). For each run, we recorded the number of positives, neutral and negative hits and found them overall constant at each run. (see figure 2) However more details about the RANDOM function is needed to insure the randomness of the sample. This part may need to be redone. So far, between 2% and 9% of files scanned are considered sensitive or potentially sensitive (see figure 3). However we need to consider the concentration of these files to put this information into perspective. The 278 files identified as sensitive or potentially sensitive were located on 59 hosts, with one host accounting for 101 of these file. This indicates that files of interests for an attacker are likely to be concentrated on a few hosts.

Chart of assessed sensitivity of randomly selected 30 samples of 100 filenames.
Figure 2. Assessed sensitivity of randomly selected 30 samples of 100 filenames.
Chart of percentage of files according to their sensitivity based on manual assessment of 3000 randomly selected files.
Figure 3. Percentage of files according to their sensitivity based on manual assessment of 3000 randomly selected files.

As for tokens, we will have to consider the entire collection of filenames in order to have sample from multiple sources, as such, we will pursue manually assessing samples of 100 filenames as more data is collected. After which we should have an excellent training set for the neural network. Some high-recurring and high-scoring tokens are shown in tables 2 and 3.

Token Hits Score
attach 7 0.9285714286
txn 7 0.9285714286
planning 6 0.9166666667
archived 6 1
recpt 6 0.9166666667
2010taxreturns 5 1
person~2 4 1
purchase 3 1
order 2 1
Паспорт 2 1
Table 2. Sample of high-scoring tokens sorted on the number of times observed.
Token Hits Score
jpg 938 0.013326226
mp3 460 0
music 448 0
seagate_backup_plus_drive 382 0.1452879581
asusware 348 0
pictures 309 0.0048543689
sda1 285 0.0649122807
bigdaddy 279 0
elements 278 0.0485611511
transcend 247 0.0222672065
my_book 234 0.0106837607
Table 3. Sample of high-recurring tokens sorted on the number of times observed.


While these results are preliminary, they nevertheless seems to provide a solid indication of what one can find on publicly-available removable drives. Additional work and fine tuning of both code and processes is required to provide more accurate data and the next step while the scan is still on going it to develop a methodology to assess the sensitivity of all files, likely using a neural network for classification based on the method presented above.

Removing Debugging Information from Visual C++/C# Projects

It’s often surprising how many malware programmers forget to do the simplest things. Mostly because many are so concerned with functionality, stealthiness and other production concerns, that details slip easily of their minds – a clear advantage to forensics. One of these details is the Program DataBase (PDB) information added by Visual Studio, which most malware authors used for Windows development. While it may seem innocuous, this string reveals a lot about the operating system used by the author, its user name and most notably, symbols that can be used by IDA and ease understanding of the disassembly.


It’s often surprising how many malware programmers forget to do the simplest things. Mostly because many are so concerned with functionality, stealthiness and other production concerns, that details slip easily of their minds – a clear advantage to forensics. One of these details is the Program DataBase (PDB) information added by Visual Studio, which most malware authors used for Windows development. While it may seem innocuous, this string reveals a lot about the operating system used by the author, its user name and most notably, symbols that can be used by IDA and ease understanding of the disassembly. This information allows to link multiple pieces of malware together, by using the username for example. Of course, this also allows for the creation of signatures. Thus, removing this information will add a hurdle to the analysts.


The Program Database File

The Program Database (PDB) is a binary file used to store debugging information about DLL and EXE files. The PDB file is created when you build your project and stores a list of symbols  their addresses along with the name of the file and the line number on which the symbol was declared. PDB files is also used for services collecting crash data to send it to developers for resolution.

Debugging Information

In Visual Studio, you can select to build your project in Debug or Release mode. In Debug mode, VS will include debugging information with your executable. In Release mode, no debug information is included by default, but in some cases is enabled so that if the program crashes, information can be retrieved and sent to the author for fixing. However for some reasons, some developers don’t really bother to use the Release mode, and simply use the executable generated by the Debug mode. Generally, you don’t want that if you are making malware (or any program really!). If left within the executable, a path to the PDB file will be included and can be extracted:

Path to the PDB file
Path to the Program Database (PDB) file used by Visual Studio for debugging purposes, extracted using the “strings” program.

Within the strings, you can determine that:

  1. The program was developed on Windows 7+ (because of C:\Users folder),
  2. The username of the developer is SUPPORT_23e45RT
  3. The source, or part of it, can be found on Github
  4. The original name of the program is CaitSithTest

These indicators can be useful to link this specific program with others and provide a common link between multiple malware. Additionally, the username could potentially be used to conduct open source research and find linked accounts or forum posts. But wait, there’s more…

If you leave the debugging information, you may be able to restore all the original names of variables and functions of the source code using IDA. IDA will first detect debugging information and ask the analyst if he wants to retrieve it, either via Microsoft – (not browseable)- or by looking locally.

IDA detected that debugging information is available and ask if the user wishes to retrieve it.
IDA detected that debugging information is available and ask if the user wishes to retrieve it.

If for some reason, the user is able to retrieve the information, he will have access to the names of the original symbols, which will make reverse engineering much more easier.

Since symbol information is available, the original names of the variables are displayed.
Since symbol information is available, the original names of the variables are displayed.

Compare the information from the figure above to the figure below, in which debugging information has been stripped at build time:

IDA could not find any debugging information and thus used its own labelling system to identify variables.
IDA could not find any debugging information and thus used its own labelling system to identify variables.

You can see that the variables defined in the first figure, such as ClipboardData, isProcessElevated and isDebugged have been preserved. By keeping information about the symbols, reverse engineering is much more easier compared to figure 2, in which information about the code is lost.

Disabling Debugging Information

To prevent VC from including this information in your executable, right click on your project, go to Project Properties > Linker > Debugging configuration menu. Select No in the Generate Debug Info option.

Removing debugging information in Visual Studio.
Removing debugging information in Visual Studio.

After doing this, rebuild your project and rerun the string extraction program against your binary, the path to the PDB file should not be present in the executable anymore.

The path to the PDB file is not included in the executable once debugging information has been omitted.

Doing so makes it a bit more difficult to fingerprint the malware and hides information about the author’s system.


This is a simple tactic that is often omitted not only by malware author, but penetration testers, which are often Google programmers, i.e. copy-pasting code snippets from Stack Overflow or googling functions 😉 If you attempt to hide your malware into the System32 folder, looking for this information in the EXE or DLL files will quickly tell you which files are bad, since legitimate files will rarely have this info, or have legitimate looking one. As such, if you want to make sure, create a legitimate-Microsoft-looking user (Bill.Gates) on your machine and put your code into a Microsoft-looking project and path (C:\users\Bill Gates\Documents\HTA\Release\).

Sending mail directly using SMTP

Since this is another thing I keep forgetting, I’ll just note it here. To send send emails by connecting directly to the SMTP port of a mail server with telnet, use these commands;

  1. telnet 25
  2. HELO
  4. RCPT TO:
  5. DATA
  6. Subject:
  7. Cc:
  8. Reply-To:
  9. <message>
  10. MIME-Version: 1.0
  11. Content-Type: <content>; boundary=”<bondary>”
  12. Base64 encoded attachment
  13. <CR-LF>.<CR-LF>
  14. Quit

Since I’ll likely forget again, I’ve script some Python for it:

Starting in Exploit Development – Day 04

Today, instead of following the FuzzySecurity tutorial, I’ve decided to solidify what I have learned so far by exploiting another FTP Server, this way we won’t yet stray far from the tutorial. We’ll exploit the PCMAN FTP 2.07 server.

The exploit is a buffer overflow in about any command send to the FTP server. We’ll attempt to exploit the STOR command. To do so, we basically reconstruct the Python script we’ve used in day 1:

Note that we are using a buffer of 3000 bytes. I’ve first attempted a payload size of 2000, but it failed to crash the server. At 3000, it was successful as you can see below:

Buffer Overflow in PCMAN FTP 2.07
We successfully smashed EIP with a payload of 3000 bytes in the STOR command.

Let’s replace our payload by a Metasploit pattern to find the offsets using !mona findmsp:

Mona showing at which offset EIP is overwritten
Mona found that EIP is being overwritten at offset 2006

Also interesting, is that SEH is not being overwritten here, so we cannot use the technique learned yesterday. The offset found, we can now start shaping our payload:

And we’ll test it to confirm everything is going smoothly:

EIP overwritten with "B"s
Our payload works, now we simply have to put the addresses and shell code needed

Ah ! Perfecto ! Now let’s figure out an address we can use to jump at [ESP]. We’ll do this using !mona jmp -r esp:

Search results for "jmp esp" in PCMAN 2.07
Search results for “jmp esp” instructions in memory for PCMAN FTP Server 2.07

Ideally, I would have like to find a “jmp esp” within the application itself, but all of them contained invalid bytes, so I’ll just use one from the Windows DLLs:

We’ll use the same payload as before, i.e. the windows\shell_bind_tcp as we are only interested in training purposes, so our final code will look like this:

And voila! I sometimes runs into issue when running the shell code on the target machine and it seems due to bad bytes in the shell, so this is something I’ll need to check out, i.e. how to determine which bytes should be avoided in the shell code. I usually fix it by regenerating a new payload in Metasploit. In any case, we have out shell:

Listening on port 4444
The exploit binded a shell on port 4444
Remote Shell from Exploiting PCMAN FTP 2.07
We successfully open a remote shell from the exploit in PCMAN FTP 2.07

All right, so now, we should be able to exploit basic buffer overflows from any simple program. Let’s move on…