The Solfa Cipher (NSEC17 Write-Up)

Share

Introduction

Between May 19th and 21st, 2017, I’ve participated to the NSEC 17 Capture-the-Flag (CtF) event held annually in Montreal, QC. As usual, the team and I had a blast spending days and nights solving challenges and drinking free beer. Among the challenges was a two-part cryptographic puzzle printed on the first and last pages of the passport of Rao’s Intricate Kingdom – the country part of the story line of the event. The challenge was divided in two parts: a Braille encoded message and the second part was encrypted using the Solfa cipher, which I had never heard of before. As such I decided to learn more about it and complete a write-up for the challenge at the same time. I’ll first quickly cover the Braille part of the challenge, then move on to the Solfa part of it and the decryption process.

The Second Half of the Flag

Upon entrance at the NSEC competition this year, participants received a passport designed to be stamped based on events happening during the CtF. The back of the front cover contains a sequence of dotted symbols which can be recognized quickly as Braille as shown in the figure below:

Braille-encoded message in Rao's Passport
Figure 1: Braille-encoded message in Rao’s Passport at NSEC’17

As most of you probably know, the Braille writing system was developed for the blind and visually impaired individuals to be able to read using touch. Examples of braille can often be found on elevators. The system is based on a matrix of 3×2 dots which can be blank or filled. Each dot is numbered from 1 to 6 as shown below:

Braille Matrix Numbers
Figure 2: A matrix of the 6 dots representing an individual character Braille

Each character of a natural alphabet can then be associated with a specific matrix configuration. For example, a simple Braille-English translation is shown below:

The Braille Alphabet
Figure 3: The Braille Alphabet (linked from pharmabraille.com)

Additional “shortcut” symbols are used for specific sounds, punctuation, symbols and words. The figure below shows some examples of common words in Braille:

Braille for words and abbreviations
Figure 4: Braille for words and abbreviations (from the Tennessee Council of the Blind)

Additional abbreviations can be found in [1]. Going back to the passport, we can obtain the transcription using Unicode:

⠠⠮⠀⠎⠑⠒⠙⠀⠓⠁⠇⠋⠀⠷⠀⠮⠀⠋⠇⠁⠛⠀⠊⠎⠀⠮⠀⠘⠺
⠠⠏⠇⠁⠞⠽⠏⠥⠎⠲⠀⠠⠁⠙⠙⠀⠭⠀⠁⠋⠀⠮⠀⠋⠌⠀⠓⠁⠇⠋⠀⠞⠕
⠕⠃⠞⠁⠔⠀⠁⠀⠉⠕⠍⠏⠇⠑⠞⠑⠀⠋⠇⠁⠛⠲⠀⠠⠛⠇⠕⠗⠽⠀⠞⠕
⠀⠠⠠⠗⠁⠕

We then translate into English and obtain the following translation from Braille to English.:

(Cap) THE S E CON D H A L F OF THE F L A
G I S THE  (Cap) P L A T Y P U S
. (Cap) A D D X A FTER THE F IRST H A L F
T O O B T A  IN  A  C O M P L E T
E F L A G . (Cap) G L O R Y T O
 (Cap) R A O

Table 1: Resulting message from translating from Braille to English.

Putting everything together, we obtain the first part of the flag included in the passport:

The second half of the flag is the word Platypus. Add x after the first half to obtain a complete flag. Glory to Rao

First Half of the Flag

The second part of the flag is much more obscure and less documented than the first one. The inside of cover page contains a small partition holding a total of 4 staves: the first one appears to be a chord while the last 3 are simply a sequence of notes. We noticed that the first staff contains a treble clef, the label “KEY-997” and is shorter than the other staves. Furthermore it contains the type of notes

Music Sheet in the Rao's Passport
Figure 5: Scanned copy of the lullaby on the last page of Rao’s passport.

Clearly, there is something in there, but how do we extract a flag out of this? My knowledge of music theory is extremely low and was basically non-existent prior to this challenge. As such feel free correct me in the comments if I misrepresent a musical concept or term.

The Cipher

Googling for words relating to music and cryptography will return a limited set of relevant sites, the first relating to musical cryptograms, which is not quite was we are looking for at the moment. The second page is about the Solfa Cipher. Once you’ve found the latter website, you’re almost at the solution, but let’s take a better look at it.

The Solfa cipher is a substitution cipher, but rather than using an alphabet to encode keys and cipher text, it uses musical notation. The encryption/decryption key is defined using a clef, a tonic, a mode and a rhythmic unit. The links will provide a much better definition of each different item than I could ever do in this article. However, be aware that the 4 elements mentioned above can have the following values:

Clefs Tonic Mode Rhythmic Unit
Treble C, C# Minor 1/4 (Quarter)
Alto D♭, D Major 1/8 (Eighth)
Bass E♭, E Phrygian 1/16 (Sixteenth)
F, F# Dorian
G♭, G Lydian
A♭, A Mixolydian
B♭, B Locrian

Table 2: Valid values for the properties of the Solfa key.

Like any symmetric cipher, a key is needed to encrypt the message, which will have to be shared with the intended recipients of the message. In this case, the key is in musical notation rather than a sequence of characters or bytes. The first staff of the page represents the key of the cipher, as the label clearly shows. The encryption key is composed of the four elements mentioned above: a treble clef, in C minor, using 1/8 as the rhythmic unit:

Solfa Cipher Key used in NSEC17
Figure 6: Solfa Cipher Key used in the passport at NSEC ’17: treble key in C minor with a 1/8 rhythm is used.

Each note is linked to a the seven pitches of the solfege, i.e. Do (D), Re (R), Mi (M), Fa (F), Sol (S), La (L) and Si (T). The “KEY-554” is only a randomly generated label and has no significance in the algorithm. With the key known, this puzzle becomes a chain of translations from musical notes to a list of tuple of tones and rhythms using the standard matrix below:

Do (D) Re (R) Mi (M) Fa (F) So (S) La (L) Ti (T)
1:  T I A S E N O  :1
2:  K Z X J Å Æ  :2
3: R C H M D L U  :3
4: F Y G P W B V  :4

Table 3: English language translation matrix usually used for the Solfa cipher.

The columns represents the pitch, while the rows represent the duration of the note (1, 2, 3 or 4).

Let’s go through a complete example to better understand the process. Consider the staff below:

Example of a Solfa-encrypted message
Figure 7: The word “SOLFA” encrypted using the Solfa Cipher

In this case, we assume that we are using a 4/4 meter i.e. the length of a single measure.  That means that each measure has a duration of 4 units. The key used to generate this melody was in C major, with a clef of Treble and a rhythmic unit of 1/4 (Quarter). The first note is Fa and starts at the first time unit, i.e. 1. Therefore the first note can be translated to (F, 1). The Fa is 4 units long, meaning that the second note, Si, also starts at time 1, translating to (T, 1). However this time, the note is only 2 time units long and thus the third note – Sol – starts at 3 and lasts only 1 time unit. Thus the third node is translated to (S, 3) while the fourth one will be translated to (D, 4). Finally, the last note is a Mi and starts at 1 and thus is translated to (M, 1). Putting everything together we have (F, 1), (T, 1), (S, 3), (D, 4) and (M, 1). Using the matrix above, we obtain (F, 1) = “S”, (T, 1) = “O”, (S, 3) = “L”, (D, 4) = “F” and (M, 1) = “A” and thus the plain text is the word “SOLFA”. This process is better represented in the figure below:

Decryption of the Solfa-encrypted word "SOLFA"
Figure 8: Decryption of the word “SOLFA” by reading the notes and their duration.

Going back to the NSEC challenge, we have a much larger melody to decrypt. Luckily, we have the key and the same process as the one we used to decrypt the cipher text in figure 8 applies.

Solfa-encrypted Message from the Passport
Figure 9: Solfa-encrypted Message from the Passport in NSEC ’17

Let’s take the first 9 notes listed in the figure above. For each of the note, we first determine its pitch (do, re, mi, …) and then its duration. The key given specify a 1/8 rhythm, as such a Eighth note will be worth 1 time unit, a Quarter note will be worth 2 time units and the half note will be worth 4 time units. Unless specified otherwise, the meter is 4/4, i.e. a measure is 4 time units long.

First 9 notes of Solfa encrypted for NSEC
Figure 10: The first 9 notes of the Solfa-encrypted message and the initial tempo of the melody.

Using the figure above, we can then extract the notes and their time from the partition:

The note is used as the column and the time as the row to read the corresponding value defined in the translation matrix. We can put a quick script what will read these notes and find the corresponding characters. Using table 3, the notes above will be translated to

d, 1 m, 3 s, 1 d, 4 r, 1 d, 3 m, 4 d, 1 m, 3
T H E F I R S T H

Table 4: First 9 characters decrypted from the partition at the back of the passport.

Applying this process to every note in the figure 9, we obtain the following message:

THEFIRSTHALFOFTHEFLAGISTHEWORDSUBDERMALCONCATENATEWITHTHESECONDHALFTOOBTAINACOMPLETEFLAGGLORYTORAO

Or with spaces and punctuation added: “The first half of the flag is the word subdermal. Concatenate with the second half to obtain a complete flag. Glory to Rao“. Mixing the 2 halfs of the flag, we get the string “SUBDERMALxPlatypus” and get 5 points out of it.

In case you are wondering what the melody in figure 9 sounds like, you can download the resulting MIDI file here: A Revolutionary Lullaby.

Conclusion

Braille and Solfa are quick and fun ways to encode/encrypt data in unusual ways. While they obviously should not be used for serious application, they could potentially be used as novel ways to exfiltrate data and bypass some filters. For example, a text file could be Base32 encoded and the padding character (“=”) could be replace with the number 1 for example. Then the resulting string could be encrypted using the Solfa cipher, transformed into a MIDI file and then uploaded to a remote location. I highly suspect that most network security appliances would not pick up on MIDI file being uploaded, although it would probably strike a careful analyst as suspicious. Feel free to experiment with it. A partial Python implementation can be found here.

References

[1] Simpson, C,  The Rules of Unified English Braille, Second Edition, 2013, International Council on English Braille, https://www.pharmabraille.com/wp-content/uploads/2015/11/Rules-of-Unified-English-Braille-2013.pdf, Visited on 2017-06-05

[2] Chambers, John, An ABC primer, http://trillian.mit.edu/~jc/doc/doc/ABCprimer.html, Visited on 2017-06-04

[3] Solfa Cipher, 2013, http://www.wmich.edu/mus-theo/solfa-cipher/index.html, visited on 2017-06-06

Additional Readings

Schneier, Bruce. Applied cryptography: protocols, algorithms, and source code in C. john wiley & sons, 2007.

Carter, Nicholas. Music Theory: From Absolute Beginner to Expert: The Ultimate Step-by-step Guide to Learning Music Theory Effortlessly. United States: CreateSpace Independent Platform, 2016. Print.

Software Exploit Development – Fuzzing with AFL

Share

It’s quite impressive to look back in the past to the early days of software vulnerabilities and observe the ongoing dance between new  mitigation and new exploitation techniques. Powerful fuzzing tools are now common place and operated on a daily basis by IT corporations and security labs; either to find crashes in their software or others’ program, seeking workable exploit out of it. New research is continuously presented to mitigate new procedures while multiple organizations develop new counter-mitigation tricks. In this post, we’ll overview the entire software exploitation process: from fuzzing with American Fuzzy Lop (AFL) to exploit development with gdb-peda and pwntools. For this purpose, we will develop a quick 64-bit program exhibiting a glaring buffer overflow vulnerability. We will then fuzz it to find that vulnerability, analyze the results and development an exploit for it. A video is also available.

The Vuln1 Vulnerable Program

While we could use a known vulnerable program online, we decide to craft our own quick C program so we can understand all its facets. The program below uses two characters buffers of 32 characters; one to hold the username and the other one to hold a password. To manage user input, we used the well-known insecure gets() function, which fails to check buffer boundaries and leads to buffer overflows.

Once executed, the program first asks for a username and a password. The inputs are stored in the login and passwd variables. Their value are then compared with the expected value using strcmp(). If the credentials entered are “root” and “1qazxsw2”, then a “Access Granted.” message is printed out to the console, otherwise “Access Denied.” is shown to the user and the program exits.

To simplify the exploitation process of this exercise, we will compile this program with absolutely no memory protection, i.e. NX will be disabled and no stack canary. NX is usually enabled to prevent code to be executed directly from the stack, which there is no need for other than exploitation purposes. As for stack canaries, they can detect stack overflows by adding an extra value to the software stack. If this value is not found when the function returns, an exception is thrown to prevent further execution. Nowadays, these protection schemes are enabled in a vast majority of cases, but we adopt simplicity rather than realism for this post. Disabling these protection mechanism can be achieved using the following GCC command:

The  -fno-stack-protector will disable the stack canaries while the  -z execstack makes both the heap and stack executable. To verify that these options have not been included, we can use a nifty tool call checksec which is included with pwntools, which will will present later in this post. By executing  checksec vuln1 we confirm that both the stack canaries and NX bit are disabled:

Checksec reports on 2 additional security mechanisms other than the stack canaries and No-eXecute bit. While these concepts are out of scope for this post, we will simply present them

With our target created, we are now ready to start fuzzing it to uncover the buffer overflow it contains.

Fuzzing with AFL

AFL is a popular open-source and free fuzzer that has been leveraged to discover vulnerabilities in a large set of applications and libraries. Before starting AFL, we need to instrumentalize our target using the afl-gcc compiler. The AFL compiler will add code around the source in order to maximize coverage. To compile the source code with AFL, use the same command used above to compile Vuln1 using afl-gcc rather than gcc or use the associated Makefile

The resulting binary is the one that will be used with AFL, but when analyzing the crash later one, we will do it with the gcc compiled binary. Until then, let’s learn how to use AFL to assess the Vuln1 program.

A critical aspect of fuzzing is to craft meaningful test cases, e.g. inputs that will maximize code coverage by exploring all potential paths of the targeted program. The vuln1 program is simple and only has 3 paths:

  1. Username is invalid;
  2. Username is valid, but password in invalid;
  3. Username and password are valid.

In order to reach these 3 paths, we will design our test cases appropriately by creating 3 files. The first file will have 2 lines, none of them containing the appropriate credentials, the second file will have the right username, but an invalid password and the third file will have both correct credentials. AFL will read the contents of each file and feed each line to the stdin of Vuln1. Create a directory called testcases and in it, create 3 files representing these cases. The name of the files does not matter.

test1.txt test2.txt test3.txt
a
a
root
a
root
1qazxsw2

After creating these 3 files, create another directory called results, which will contains the results of the fuzzing run. At this point you’re ready to start AFL using afl-fuzz, the actual fuzzing program. You can do so with the following command:

Where -t ./testcases specifies the directory containing the testcases, -o ./results specifies the output directory and ./vuln1 is that target program. If you run AFL for the first time, you’ll likely be greeted with the following warning:

Core_pattern file warning when running AFL
AFL Warns that the core_pattern file must be changed.

Just follow the instruction given and you’ll get rid of this message. Simply a shell as root using  sudo bash and type the suggested command, i.e.

Retry to start AFL using the same command and you should have no issue this time. A screen will appear and present you with quite a few statistics. This AFL Readme file explains all of these fields very well, and should definitively be read and well understood. For now, let’s focus on the “Overall Results” section.

Fuzzing Vuln1 with AFL - Results
Results of Fuzzing the Vuln1 Program

Two rows of this section are particularly interesting in this example:

  • Total paths; and
  • Unique crashes.

Notice that after a few seconds, the total paths field is 3, which is what we expected based on the code of vuln1. As such, once we reached 3 paths, we can stop AFL by pressing Ctrl-C, as it will not find anything new. In the real world, we have no idea how many paths may be possible. As such AFL provides color codes to help you assess if it’s time to stop. Another field that can help is the last path found. When no new paths have been found after a while, AFL may have cover most of the code it can find and is unlikely to find anything new. Finally, the most interesting field is the unique crashes, which indicates that some of the inputs, stored in the results directory, have successfully crashed the program and should be investigated. We have 2 files in the results/crashes directory:

Each file contains the input that crashed the program so you can reproduce the event and investigate to see if the crash is exploitable.

We can confirm the crash and observe an segmentation fault by piping the contents of the crash to our vuln1 program:

The next step is to analyze the crash data and determine if it can be converted into an exploitable vulnerability. Spoiler alert: it can.

Conclusion

This short post is a simple introduction to AFL, a powerful fuzzer that can be leveraged on source code and binaries to find potential vulnerabilities. This step is usually the first step in exploit development. In the next post, we’ll use PEDA to analyze the results found here and determine it exploitability.

References

See Also

  • Related YouTube Video: Software Exploitation Research: Fuzzing with AFL

Further Reading