Drone Insecurities

Airborne cyber attacks against drones have been conducted by hackers, criminals and state-sponsored actors. Unmanned Aerial Vehicles (UAVs) – also known as “drones” – are gaining popularity in many sectors of society. Pioneered by the military, law enforcement as well as criminals have used them. Hobbyists gather to race while retail stores will soon ship purchases with these. To reach the market quickly, security is often an afterthought. Vendors and operators underestimate the intent and capability of hackers to target their products. Thus researchers took up the challenge of showing their vulnerabilities. Reviewing current security issues and past incidents highlight the issues and their solutions before critical failures in drones causes harm. However before doing so, let’s quickly study the basic inner workings of drones.

Drones: How do they work?

Drones range from the highly complex multi-million dollars military aircraft to the toy models available for a hundred dollars. Despite this disparity, their overall architecture can be divided into 1) the aircraft and 2) the ground station. The aircraft regroups sensors and controllers needed to fly while the ground station includes software and hardware to send commands wirelessly. The ground station includes wireless transceivers, flight planning software, aircraft maintenance applications and the operators. The aircraft hosts the flight controller which processes data from sensors, avionics, communications, autopilot systems and in some cases, weapons.

Drone System Diagram
Drone Command and Control Diagram

Operators send navigation commands via a Remote Controller (RC), which relay information via a Line-of-Sight (LOS) communication channel. In low-end drones, standard wireless networking protocols are used while higher-end ones leverage satellite communications. Data from the ground station is transmitted to receivers or the aircraft and processed by the flight controller. The controller manages the outputs from various sensors including the GPS receiver, camera and propellers. Audiovisual data is relayed to the ground station via a second communication channel or stored on removable medias. Predefined routes can be programmed using flight planning software to make the aircraft autonomous.

Cyberattacks against Drones

Communication channels are the obvious attack vector. Unfortunately calls for encryption are often downplayed and proprietary protocols, despite some may believe, are not safe from reverse engineering and hijacking. In 2009, Iraqi militants found out that unencrypted videos feed from U.S. drones were available by pointing a satellite receiver towards the drone, greatly compromising operational effectiveness. While secure channels are critical, weak encryption is as good as no encryption. It was discovered in 2015 that scrambled video feeds between Israeli UAVs and ground stations were easily decrypted using open-source software. An attacker only required knowledge of the proper frequencies. That being said, even good encryption is no  remedy when employed with vulnerable key management systems. A fact rarely discussed by security vendors pitching solutions.

Lower end drones have been hacked via known vulnerabilities in wireless networking protocols. SkyJack is an air-to-air network attack drone that detects surrounding wireless connections of vulnerable drones and reroutes them to itself, allowing the remote operator to to hijack the target. This deauthentification technique used is similar to the one seen against home wireless networks. Other attacks included cracking WEP networks between the flight planning software and the remote controller, allowing the intruder to issue commands to the remote controller.

Communicational channels are not the only weak points. Drones require information from their environment in order to navigate. The information is captured by the sensors and transmitted to the flight controller for processing. Therefore providing invalid inputs to the sensors will disrupt the aircraft. Invalid GPS data is one of the explanation for the unexpected landing of a U.S. RQ-170 sentinel in Iran. The Iranian military maintain they spoofed GPS data by broadcasting stronger signals than the ones from valid GPS satellites. By doing so they forced the sentinel to land. Such an attack was proven possible on other models. The second theory is that an internal malfunction forced an emergency landing. In either case, this incident highlighted two issues: implicit trust of sensory data sources and unexpected effects of internal errors.

Security is not only about networks; it’s also about robustness of the internal software. MalDrone is a malware compiled for ARM Linux which allows to spawn a remote shell on the target drone. The attacker intercepts an unencrypted channel between the ground station and a vulnerable aircraft to upload the malware, allowing direct access to the operating system (OS). An alternative tactic could aim to modify the firmware of the flight controller or sensors. Users rarely validate the integrity of the firmware downloaded online, allowing backdoors to be injected via man-in-the-middle attacks. Hackers may leverage social engineering to send malicious updates or exploit a vulnerability to execute arbitrary code and gain remote access to the drone. Like many Internet-of-Things (IoT) devices, drones are seldom updated by their owners and could remain vulnerable for a long period of time.


As drones continue to find increased usage in civil society, they will be subjected to further analysis. Engineers and operators should never underestimate both the capability and intent of malicious actors to target their product either for fun, profit or other malicious goals. Lessons learned by the software industry remain mostly unimplemented:; lack of secure communications, no firmware integrity validation, loose or absent security controls and slow patching of critical vulnerabilities. The success of the Mirai botnet is a result of these failings and drones are not exempt. Unlike the distributed denial of service observed, compromised drones may raise safety concerns and must therefore be secured.

Remembering the ‘Stakkato’ Hacks

Philip Gabriel Pettersson, best known by the pseudonym of “Stakkato” can be said to have reached legendary status within the computer security community by his numerous successful breaches of high-level targets between 2003 and 2005. Then a 16 year-old hacker from Uppsala, Sweden, he successfully infiltrated systems of large universities, the United States military, NASA and various companies, forming a worldwide network within which he operated for around 2 years before being caught in 2005 and prosecuted by Swedish authorities. This post revisits the story of Stakkato by reviewing his motivation, techniques and exploits and potentially unearth some lessons learned from these events.

Bored Teenagers

Uppsala is the fourth largest city of Sweden and is situated around 70km north of the capital. In 2003, one of its curious and smart teenager went on to challenge himself by exploring – illegally – the digital environment surrounding the city. Some of us might remember the old definition of a “hacker”, as defined by The Mentor’s manifesto [1]. Back in 2003, owning a computer was still not totally commonplace, although it was a lot more than it in 1995. Only teenagers with a certain sense of interest and curiosity about technology would consider spending most of their time on their machines. In my corner of the world, in the 90s, computer science classes were nothing more than learning to type, using word processors and creating spreadsheets. I am sure I was not the only one in the same situation and some readers may remember the frustration of not being able to pursue their hobby in depth while in school. So we spent most the classes programming VBA games or spamming other students using WinPopup to have them call out the teacher, who would struggle to explain the innocuous messages on the screen. Only at night could we connect to the net, login into our favorite BBS, IRC channels or forums to finally learn more. Virtualization was not a thing back in the early 2000s, internet connections were still slow and owning more than 1 computer was a luxury most couldn’t afford. A solution was dumpster diving around computer shops – which were aplenty compared to nowadays – or browsing eBay for scraps. Another one was to poke around systems connected to the internet. Universities were of course perfect targets – opened, poorly secured (in order to be opened) and rich with systems, software and data.

Why am I rambling about the past? Because in many ways, Stakkato may have been the same teenager than many of us were back then, but his cockiness eventually got the better of him and caused his demise. Some even proposed that by 2005, he may have attempted to venture into criminal activities by selling stolen intellectual property. In any case, let’s explore briefly his story, because I believe many who now heads IT security companies, or experts and researchers in the field all shared the same starting point, but fortunately took a different path at some point.

The Stakkato Hacks

The first suspicions of wrongdoing were noticed in 2004. Berkeley researcher Wren Montgomery started receiving email from Stakkato [2], claiming that not only did he infiltrated her university, but that he also accessed the network of White Sands Missile Range in New Mexico, stole F-18 blueprints from Patuxent River Naval Air Station and infiltrated NASA’s Jet Propulsion Laboratory (JPL) – which to be honest, have been hacked by many in the past decade [3][4][5], almost making it an initial test for debuting hackers. These claims were later confirmed by spokesmen from both organizations. They however downplayed the importance of these breaches, claiming that there were low-level breaches and that only weather information was exfiltrated. Later during the year, several laboratories harboring supercomputers connected via the high-speed network TeraGrid reported breaches. However it was only in 2005, with the intrusion in networking company Cisco Systems, that would trigger alerts from authorities and proved to be a bridge too far. Having established a foothold within Cisco, Stakkato was able to locate and download around 800MB of source code of the Internetwork Operating System (IOS) version 12.3 [6]. IOS runs on every Cisco routers and other networking devices which are often key network component of not only large commercial and governmental organizations, but also of the worldwide telecommunication infrastructure. Samples of the code was released on IRC as proof and reported by a Russian security site. The theft of the code caused a stir, many believing that individuals or groups would comb the code and craft zero-day exploits that could be leveraged on critical systems.

This activity would prove the last Stakkato and his team would be able to brag about as the Federal Bureau of Investigation (FBI) and the Swedish authorities started to investigate the leaks. In 2007, he was convicted for breaching networks Swedish universities and paid 25,000$USD in damages. He was further interviewed by U.S. officials [7] and in May 2009, he was formally inducted in California for intrusions in Cisco Systems, NASA’s Ames Research Center and NASA’s Advanced Supercomputing Division [8]. In 2010 his prosecution was transferred to the Swedish authorities.

The Tactics

The core strategy of Stakkato revolved around a trojanized SSH client he uploaded to systems he compromised. The malicious client would be used to intercept users’ credentials and send them to a third location where Stakkato and his group would retrieve them to access additional systems. Once accessed, Linux kernel exploits were used for privilege escalation on the local system and then repeated their main tactic, creating privileged accounts and eventually building a wide network of proxies to launch their attacks. The attack on the National Supercomputer Centre [9] provides insight on the tactics and size of the compromises. The methodology used was not innovative by any mean, but was applied effectively and certainly leveraged human errors to its full extend. The process can be summarized as follow:

  1. Infiltrate a system via a kernel vulnerability or stolen credentials;
  2. Disable command history, e.g prevent the system from logging your commands;
  3. Attempt privilege escalation;
  4. Setup trojanized SSH clients, backdoors and rootkits;
  5. Extract known hosts from current machine;
  6. Attempt to infiltrate extracted hosts as per step 1.

The analysts of the NSC documented logins from universities the United States, Israel and Sweden and referenced the SuckIt rootkit [10] as being installed on one of the target machine. Unfortunately for the administrators, the rootkit was discovered only after a new root password was assigned to all machines, allowing the attackers to re-infiltrate the newly cleared systems. However this time the Swedish teenager was a lot less subtle and vandalized the systems by attempting a web defacement and modifying logon messages. This time the IT specialists took down the network, inspected and reconfigured every machine before putting the system back online. Despite the defensive operation, recurring login attempts and smaller-scale compromised originating from more than 50 compromised organizations were noted between 2003 and 2005.

Lessons Learned

This story follows the same pattern observed throughout the ages, such as sprawling empires from ancient times in which the rulers’ overconfidence led them to bankruptcy, or growing organizations that stretched into markets that proved more difficult than expected. Stakkato’s network of compromised systems grew too large, he became overconfident and tempted the sleeping bears. In other words, patience may have led him to a very different path. Or maybe his arrest was for the best afterall: there is little news about him past 2010, but coincidently there is a security researcher working in Samsun bearing the same name and credited multiple vulnerabilities in the Linux kernel [11][12]. While I have no idea if this is the same individual, I would be glad to hear that he now uses his skills fruitfully.

Arguably another lesson is how simple tricks can still work if applied efficiently. All things considered, security hasn’t changed dramatically within the past 10-15 years: it has evolved, but in the end, we still rely on usernames and passwords, users’ awareness and administrators properly maintaining their networks and hosts. Humans using these systems haven’t changed much either; we will take the simplest approach to achieve our goals. Hence we select the easiest password passing the complexity filters in place and reuse it [13] so we don’t have to remember 100 variations of the same password. Large database compromises in the past few years appears to prove this behavior. We could have many passwords and store them in password managers, but then the password managers can still be trojanized or exploited [14], allowing similar tactics used by Stakkato. Eventually most people would probably not bother to execute an additional program to retrieve their password in order to login in the service they need; it simply adds an additional step.


Studying the past of computer security is sometimes quickly dismissed, often seen as irrelevant given the change in technologies, but one can easily find inspiration in the stories of hackers, malware writers and the analysts that battled to gain and maintain control of systems. Much like studying the battles of Alexander the Great or Patton, there is much to be learned from studying the techniques used and wargaming their applications in modern organizations. Would the current administrators blindly enter their passwords if a windows suddenly popped up requesting their credential for some update? Users still get fooled by fake login web pages [15] and end up with their bank accounts plundered or their Twitter account spewing nonsense to all their followers. It still works.

Obligatory XKCD


[1]    “Phrack Magazine” [Online]. Available: http://phrack.org/issues/7/3.html. [Accessed: 05-Nov-2016].

[2]    J. M. L. Bergman, “Internet Attack Called Broad and Long Lasting by Investigators,” The New York Times, 10-May-2005. [Online]. Available: http://www.nytimes.com/2005/05/10/technology/internet-attack-called-broad-and-long-lasting-by-investigators.html. [Accessed: 02-Nov-2016].

[3]    K. Zetter, “Report: Hackers Seized Control of Computers in NASA’s Jet Propulsion Lab,” WIRED, 01-Mar-2012. [Online]. Available: https://www.wired.com/2012/03/jet-propulsion-lab-hacked/. [Accessed: 04-Nov-2016].

[4]    “Hacker Sentenced in New York City for Hacking into Two NASA Jet Propulsion Lab Computers Located in Pasadena, California (September 5, 2001).” [Online]. Available: https://www.justice.gov/archive/criminal/cybercrime/press-releases/2005/gascaConviction.htm. [Accessed: 04-Nov-2016].

[5]    “Hackers penetrated NASA computers 13 times last year,” USATODAY.COM, 02-Mar-2012. [Online]. Available: http://content.usatoday.com/communities/ondeadline/post/2012/03/hackers-penetrated-nasa-computers-13-times-last-year/1. [Accessed: 04-Nov-2016].

[6]    “Sweden to prosecute alleged Cisco, NASA hacker.” [Online]. Available: http://www.theregister.co.uk/2010/02/08/swedish_hacker_prosecution/. [Accessed: 04-Nov-2016].

[7]    D. Kravets, “Swede Indicted for NASA, Cisco Hacks,” WIRED, 05-May-2009. [Online]. Available: https://www.wired.com/2009/05/swede-indicted-for-nasa-cisco-hacks/. [Accessed: 03-Nov-2016].

[8]    United States of America v. Philip Gabriel Pettersson aka “Stakkato.” 2009.

[9]    L. Nixon, “The Stakkato Intrusions: What happened and what have we learned?,” presented at the CCGrid06, Singapore, Singapore, 17-May-2006.

[10]    D. Sd, “Linux on-the-fly kernel patching wihtout LKM,” Phrack, no. 58, Dec. 2001.

[11]    P. Pettersson, “oss-sec: CVE-2015-1328: incorrect permission checks in overlayfs, ubuntu local root.” [Online]. Available: http://seclists.org/oss-sec/2015/q2/717. [Accessed: 05-Nov-2016].

[12]    “Linux Kernel ’crypto/asymmetric_keys/public_key.c ‘ Local Denial of Service Vulnerability.” [Online]. Available: http://www.securityfocus.com/bid/81694. [Accessed: 05-Nov-2016].

[13]    T. Spring and M. Mimoso, “No Simple Fix for Password Reuse,” Threatpost | The first stop for security news, 08-Jun-2016. [Online]. Available: https://threatpost.com/no-simple-fix-for-password-reuse/118536/. [Accessed: 04-Nov-2016].

[14]    “How I made LastPass give me all your passwords.” [Online]. Available: https://labs.detectify.com/2016/07/27/how-i-made-lastpass-give-me-all-your-passwords/. [Accessed: 05-Nov-2016].

[15]    Bursztein, Elie, Borbala Benko, Daniel Margolis, Tadek Pietraszek, Andy Archer, Allan Aquino, Andreas Pitsillidis, and Stefan Savage, “Handcrafted fraud and extortion: Manual account hijacking in the wild,” in Proceedings of the 2014 Conference on Internet Measurement Conference, Vancouver, Canada, 2014, pp. 347–358.

(Bad) Amazon Phishing Email

Fortunately, my wife is a smart cookie and always suspicious of weird looking email. Maybe its due to the fact she lives with a paranoid guy. In any case, she caught this phishing email, which appears to be from Amazon, and leads to a fake login page.


Fortunately, my wife is a smart cookie and always suspicious of weird looking email. Maybe its due to the fact she lives with a paranoid guy. In any case, she caught this phishing email, which appears to be from Amazon, and leads to a fake login page.


The phishing email comes from “amazon@iservice.co.org.il” with the terribly spelled subject “your accounnt information need to be updated” and the content is a screenshot of an authentic Amazon email, thus bypassing filters. However, the attacker succeed in misspelling the only field he had to fill.

A fake Amazon account confirmation received which contains a single image.
A fake Amazon account confirmation received which contains a single image.

Clicking anywhere on the image will redirect the target to ‘http://bestofferz.biz/service/support/wp-admin/support/support/”, which host a fake login page as shown below:

Fake Amazon Login Page
The attacker is hosting a fake Amazon login page on HostGator

So by looking under the hood, we can see that the entire page is actually a single javascript function call to decrypt a long Base64 encoded string.

The encryption key used is stored in the hea2p variable and the HTML code. The entire code can be analyzed here and using the AES Javascript code here. If the target enters his emails and password, he will then be forwards to a fake account creation page asking for his address.

Fake Amazon Account Creation Page
Fake Amazon account creation page.

And of course, it will then ask you for your credit card information, which is possibly the end goal of the phisher.

Fake Credit Card Information Request Page
Fake Credit Card Information Request Page

All the pages are encrypted using the same key. Only after entering this information to the target get redirected to the real Amazon website.

Successful Phishing Operation Page
Successful Phishing Operation Page


Remember to always check the URL and the from email address !

Repost: Stack-based Buffer Overflow Vulnerabilities in Embedded Systems

The buffer overflow attack vector is well documented in desktop and server class machines using the von Neumann memory model, but little has been published on buffer overflow vulnerabilities in Harvard architectures.

I have not written or contributed to the enclosed research paper. I’m simply reposting it here because it’s interesting and for some reason, appears available only via Google cache. So before it disappear from results, I’m reposting it here.

This paper discusses a technique to conduct buffer overflows on processors using the Harvard architecture. In this architecture, the stack starts at the beginning of the memory and grows up, versus Von Neumann architectures in which it grows down.


Most small embedded devices are built on Harvard class microprocessor architectures
that are tasked with controlling physical events and, subsequently, critical infrastructures. The Harvard architecture separates data and program memory into independent address spaces, as opposed to the von Neumann architecture that uses a unified memory system with a single address space for both data and program code. The buffer overflow attack vector is well documented in desktop and server class machines using the von Neumann memory model, but little has been published on buffer overflow vulnerabilities in Harvard architectures. In this paper we show that stack-based buffer overflow vulnerabilities exist in embedded control devices based on the Harvard class architecture. We demonstrate how a reversal of stack growth direction can greatly simplify the attack and allow for easier access to critical execution controls. We also examine popular defense techniques employed in server and desktop environments and the applicability of those defenses toward Harvard class machines.
Link: Kristopher Watts & Paul Oman, University of Idaho, Stack-based Buffer Overflow Vulnerabilities in Embedded Systems

#TheGreatFTPHunt – 2% to 9% of files scanned potentially containing confidential information

In this post, we continue our data collection and evaluation of files stored on removable medias publicly accessible to the Internet. The collection of filenames from 6,500 hosts is ongoing, therefore we’re going to focus on evaluation of sensitivity of a file based only on its filename. We also present the latest statistics collected in our database.


In this post, we continue our data collection and evaluation of files stored on removable medias publicly accessible to the Internet. The collection of filenames from 6,500 hosts is ongoing, therefore we’re going to focus on evaluation of sensitivity of a file based only on its filename. Based on the current result, 2 to 9% of the 3000 files reviewed were sensitive or potentially sensitive. Most of the sensitive files are concentrated on a few hosts. These files often include financial information or project data from businesses. So far, 773 hosts containing around 4.5 million files have been scanned.


The amount of filenames collected is quite large and we cannot evaluate manually each filename for its probable sensitivity. As such, we need to devise a procedure to automatically assess its sensitivity. We have some definitions and restrictions to list first to clarify what a sensitive file is and limitation to our evaluation criteria.

In this document, sensitive file refers to user-generated or software-generated files based on user input that contains information that should probably not be publicly accessible and which can be leveraged against an individual or organization. This includes:

  • Personal identification documents; passport, driver’s license, visas, government forms…
  • Personal finance documents; income tax files, insurance forms, credit card statements, mortgage, pay stubs, banking information
  • Personal medical documents; prescriptions, medical records
  • Work-related files; emails, proprietary source code, password lists
  • Business finances; customer lists, sales data, project costs, business deals, investments, payrolls
  • Intellectual property; blueprints, schema, patents, research
  • Network configuration; passwords files, configurations files, network diagrams, user databases
  • Large databases of emails, addresses and other personal information.

Some of the files not included in our analysis that includes;

  • Copyrighted / Illegally downloaded files. However we considered text file containing licensing keys to be sensitive.
  • Inappropriate contents (nude selfies, personal politics, group affiliations etc…)
  • Personal pictures, letters.
  • Addresses and emails were not considered personal, however databases of addresses and emails are considered sensitive

Because of the volume, we cannot download and manually verify each file to confirm its contents, as such our main restriction is that our assessment must be done solely based on the absolute filename recorded. As such, to evaluate the sensitivity, we used three categories; positive, negative and neutral, i.e. either a file is very likely to sensitive, potentially sensitive or clearly not sensitive at all. Of course, there is always a possibility that a file labeled as sensitive may not be. For example, a file called social security numbers.xls may contain only formulas or an empty form. Ideally, files identified as positive or neutral should be manually vetted.

The procedure to automatically assess the sensitivity of a file based on its path and name is first done by assessing a random sample manually. Using the ORDER BY RANDOM (note: there will be a need to review if this function is truly random, which I doubt) function (performance is not an issue in this experiment) of the Postgresql database, multiple  random samples of 100 filenames are retrieved from the database. Each file is shown to the evaluator which based on the path, filename and extension assess the sensitivity of the file as ‘positive‘, ‘neutral‘, ‘negative‘. For each run, we log the count of hits for all categories.

Listing 1 : Example of a run in which a script asks an evaluator to assess the sensitivity of files based on its absolute path.

The evaluator is assessing the filename based on keywords that may indicate the contents of the file. As such, a file containing the word, or as we call it in this document, a token such as sales, passport or passwords will be assume to contain information about sales, a passport scan or a list of passwords. In many cases, the filename is too obscure, but the path and extension may indicate the contents of the file. For example, a path containing the tokens project, finances and a Microsoft Excel extension despite a filename of axe189212_c.xls will be considered as neutral, as the file may contents information about a project. Examples of both scenarios are shown in listings 2 and 3:

Listing 2 : Examples of files that were deemed ‘positive’ hits based on keywords in their absolute path.

Listing 3 : Examples of files that were deemed ‘neutral’ (or ‘unknown’) hits based on keywords in their absolute path.

Filenames in foreign languages are roughly translated using Google Translate, as such, many of them are labeled as unsure.

A Python script then divide the filename in tokens, and each token is stored in the database along with the number of times it was found in a positive, neutral and negative hit. Tokens are created slightly differently based if they are located in the path, the filename or in the extension. For the extension, a single token is created which contains the extension itself. If the file does not have an extension or is not an extension usually associated with known software, no token is created. For the filename, tokens are created by splitting each word using characters usually known to separate words such as the underscore, dash, period or spaces. Lastly, for the path, directories are used as token and unlike filenames, are not split further. An example of this process is shown in listing 4:

Listing 4 : Example of the tokenization of a filename.

Once the tokens are created, the script will either add the token in the database or update its count based on the evaluator choice. After each update, a score is given to the token, which is simply the ratio between positive hits and the total count of appearances: p / hits). Note that tokens are considered different depending their location in the filename. As such, a filename such as /My_Passport/backup/Outlook emails backup.pst, will generate 2 distinct ‘backup’  tokens; the one from the path and the one from the filename. We explain this decision in the next paragraphs.

Listing 5 : Scores of the tokens extracted from the file in listing 4.

By using this procedure, we believe that tokens appearing often in both positive and negative hits will cancel each other, while the tokens strongly associated with positive and negative hits will remain clearly divided. Some sort of mathematical should follow later one (I hope…need to review discrete maths I guess). Some preliminary results  appears to confirm this approach as valid. Extensions strongly associated with sensitive contains higher scores while media files have null scores.

However, there is a need to further refine this process by associating a value, or weight, to the location of the token. Tokens in the path are not as indicative of the sensitivity of the file as much as a token in the filename or extension. Even within the path, the highest level is generally less indicative than the lowest one, i.e. /documents/finances 2012/sales/company sales.xls. Therefore when assessing a new filename, we need to give a score to the path, the filename and the extension. For the path, we will get the score of each token and multiply it with a weight that correspond to its location in the structure. For token that are not found the default value of 0 will be given. Then we will take the average of all token for the score of the path. As for the filename, we will not consider the position. Finally the stored score of the extension will be retrieved from the database. If the extension is not found, then a score of 0 will be used. This will transform a filename into a set of three real values which we can range between 0 and 1. To determine the weights needed for each location, we will used a supervised neural network. More research will be conducted to determine how to use this approach.


As of 16 July 2015, 4,568,738 files have been recorded from 773 hosts.

Country Hosts
United States 258
Russian Federation 91
Sweden 69
Canada 66
Ukraine 27
Norway 24
United Kingdom 24
Australia 19
Netherlands 18
Hong 18
Taiwan 16
Poland 15
Germany 11
Romania 11
Finland 10
Switzerland 8
Korea 8
Singapore 7
Czech Republic 7
Japan 6
Table 1. Location of the 773 hosts scanned as of 16 July 2015 order by country.

Mp3 and JPEG image files remains the most common. As such, we focus our statistics on document-type of files for a change, i.e. Office documents. Adobe PDF files and Microsoft Word documents are the most common file types based on our current data as shown in figure 1.

Most common file types scanned as of 16 July 2015 for office-related documents
Figure 1. Most common file types scanned as of 16 July 2015 for office-related documents

At the moment, around 3000 files have been assessed (30 runs of 100 samples). For each run, we recorded the number of positives, neutral and negative hits and found them overall constant at each run. (see figure 2) However more details about the RANDOM function is needed to insure the randomness of the sample. This part may need to be redone. So far, between 2% and 9% of files scanned are considered sensitive or potentially sensitive (see figure 3). However we need to consider the concentration of these files to put this information into perspective. The 278 files identified as sensitive or potentially sensitive were located on 59 hosts, with one host accounting for 101 of these file. This indicates that files of interests for an attacker are likely to be concentrated on a few hosts.

Chart of assessed sensitivity of randomly selected 30 samples of 100 filenames.
Figure 2. Assessed sensitivity of randomly selected 30 samples of 100 filenames.
Chart of percentage of files according to their sensitivity based on manual assessment of 3000 randomly selected files.
Figure 3. Percentage of files according to their sensitivity based on manual assessment of 3000 randomly selected files.

As for tokens, we will have to consider the entire collection of filenames in order to have sample from multiple sources, as such, we will pursue manually assessing samples of 100 filenames as more data is collected. After which we should have an excellent training set for the neural network. Some high-recurring and high-scoring tokens are shown in tables 2 and 3.

Token Hits Score
attach 7 0.9285714286
txn 7 0.9285714286
planning 6 0.9166666667
archived 6 1
recpt 6 0.9166666667
2010taxreturns 5 1
person~2 4 1
purchase 3 1
order 2 1
Паспорт 2 1
Table 2. Sample of high-scoring tokens sorted on the number of times observed.
Token Hits Score
jpg 938 0.013326226
mp3 460 0
music 448 0
seagate_backup_plus_drive 382 0.1452879581
asusware 348 0
pictures 309 0.0048543689
sda1 285 0.0649122807
bigdaddy 279 0
elements 278 0.0485611511
transcend 247 0.0222672065
my_book 234 0.0106837607
Table 3. Sample of high-recurring tokens sorted on the number of times observed.


While these results are preliminary, they nevertheless seems to provide a solid indication of what one can find on publicly-available removable drives. Additional work and fine tuning of both code and processes is required to provide more accurate data and the next step while the scan is still on going it to develop a methodology to assess the sensitivity of all files, likely using a neural network for classification based on the method presented above.

Useful T-Shark Commands for Intelligence Gathering from Network Traffic

T-Shark is practically the command-line version of Wireshark. It has the same basic capabilities but with the added flexibility offered by using the command-line to process outputs and send them to other applications. Below I’ve enclosed some of the commands that I have found myself reusing over and over again.


T-Shark is practically the command-line version of Wireshark. It has the same basic capabilities but with the added flexibility offered by using the command-line to process outputs and send them to other applications.  Furthermore, T-shark is ideal for large PCAP files which Wireshark may have difficulty digesting, especially since it has to load the entire contents of the file prior to any kind of filtering. As such, T-Shark is my main tool for analyzing PCAPs. One of my main use is to extract specific information from the network for investigation. Below I’ve enclosed some of the commands that I have found myself reusing over and over again.


When analyzing PCAPs, I’m mostly concern to locate anomalies based on intelligence on various current actors. I’m especially interested in analyzing covert channels for specific indicators that are usually indicative of malicious traffic. These often include typos in popular URLs, weird looking domain names, emails with suspicious attachments or certificates with random fields. To extract the data, I’ve used the following T-shark commands:

Extracting URLs from HTTP Requests

In the example above (and the ones below), I’m reading network traffic from a offline PCAP file, which is why the -r lab1.http.pcap parameters is present. The -T fields specify to output the value of the fields specified with the -e parameter. For each field you want to output, you specify it using the -e <field> option. The -R <filter> is the filter to apply to the traffic. For example in this case we filter only HTTP traffic. Sort and Uniq are Linux applications used to sort the output of a program and remove duplicate entries.

Extracting Filenames from FTP uploads

Extracting URLs from DNS Requests

Extracting Recipients’ Email Addresses of Inbound Emails

Extracting Senders’ Email Addresses of Inbound Emails

Extracting Subjects of Inbound Emails

Extracting Source URLs of X509 Certificates

Extracting Information from X509 Certificates


Since these proved useful on many occasions, a simple Bash script called Netminecraft was made to automate their usage.


To split a larger PCAP file into protocol-specific PCAPs, use;

Note that the scripts only passes the contents of <protocol> to
t-shark. As such, you can specify any Wireshark filter to extract
even more specific information, for example:

However avoid filters with spaces as the current version of the
script does not manage spaces. The results will be saved in a file
in the current directory as <protocol>.pcap, for example http.pcap.

To mine for data relating to a specific protocol, use;

The output file will contain text data that have been sorted and in
which the doubles will have been removed using ‘uniq -i‘, i.e.
we ignore the case of the items.


The example above will extract the URLs of all the DNS queries found
in the file dns.pcap and will output a list of URLs in dns.queries.txt:


Learning to use T-shark has many advantages that can increase efficiency, security and flexibility. It allows for scripting the extraction of data and storage into databases from which further analysis can quickly be done for anomalies. In this short post, we have listed some examples of how T-shark can be used, but it barely scraps the surface.


T-Shark Manual, The Wireshark Network Analyzer, https://www.wireshark.org/docs/man-pages/tshark.html, accessed on 2015-02-24