NSA Codebreaker Challenge 2015: Task 3

Thu 07 July 2016 by bblough

This is the third post in my series on the NSA Codebreaker Challenge 2015, covering the third task of the challenge. If you haven't read them, you might want to check out the posts on Task 1 and Task 2.

Task 3

    The copy of the program you have is only capable of decoding secret
    messages and lacks the ability to encode new messages to other operatives.
    We need this capability in order to infiltrate the terrorist network and
    send encoded messages. Your task is to develop an encoder tool that will
    produce valid messages (i.e., should decode with an unmodified version of
    the decoder.) This task will require you to reverse-engineer the decoding
    logic in order to understand how to create new encoded messages. To
    complete this task, you will need to encode the following message using the
    provided keys and plaintext file, and upload the resulting message file.
    Also, the message for this task should be destined to the same recipient as
    the message in Task 1 (meaning it should decode in your decoder without any
    modifications being necessary) but using the key file provided below.

    The message to encode is, "Sir, Good news! We have developed a capability
    to encode secret messages to help with OPERATION vq72hp4w2pkw7h28b4gg. We
    are ready for our final tasking. VR, Codebreaker 3473469".

As the task says, in order to build an encoder we need to first understand the decoder logic. This was definitely the most time consuming part of the entire challenge, as it involved reverse engineering a significant portion of the decoder binary. This meant a lot of staring at the program's disassembly and converting it to an easier-to-work-with representation in C.

The Decoder

In order to make sure that I understood the decoder logic correctly, I decided to actually build a working decoder. This allowed me to verify the correctness of the logic by comparing my decoder's behavior with that of the decoder provided in Task 1.

To begin the process, I used objdump to disassemble the binary. This resulted in over three hundred thousand lines of assembly. Fortunately, the binary had not been stripped of debug symbols, so function names were included in the output. This made it easy to ignore the vast majority of the code, and only focus on functions that weren't related to openssl, curl, or the standard library. After weeding out all of the various library functions, there were 945 lines of assembly remaining. It belonged to the following functions:

08049c70 <main>:
0804a120 <get_file_contents>:
0804a1e0 <writefunc>:
0804a270 <init_string>:
0804a2d0 <normal>:
0804a590 <Base64Decode>:
0804a600 <tier2>:

Since the goal of the decoder is to understand the secret message logic, it wasn't necessary to implement the decoder in the exact same way as the original as long as it worked the same way when decoding messages. This meant that I could ignore any code that related to file handling, curl, or the stock price functionality. Reading through the disassembly, it became apparent that I could ignore all of the functions listed above except for tier2 and Base64Decode. However, since base64 is well understood, Base64Decode could be disregarded as well, as it would be straightforward to either reimplement, or find a library which provides the functionality.

To reverse engineer the remaining function, I systematically read chunks of the disassembly and annotated them with the equivalent C code.

...

804a67c:    e8 5f f2 ff ff          call   80498e0 <strtok@plt>
804a681:    85 c0                   test   eax,eax
804a683:    89 c1                   mov    ecx,eax                  ; ecx = pstr
804a685:    0f 84 bd 03 00 00       je     804aa48 <tier2+0x448>    ; if (!pstr) goto no_tokens
804a68b:    31 ed                   xor    ebp,ebp                  ; ebp = 0
804a68d:    31 ff                   xor    edi,edi                  ; edi = 0
804a68f:    be 07 00 00 00          mov    esi,0x7                  ; esi = 7
804a694:    bb 01 00 00 00          mov    ebx,0x1                  ; ebx = 1

;token_found
804a699:    89 0c 24                mov    DWORD PTR [esp],ecx      ; esp = pstr
804a69c:    89 4c 24 18             mov    DWORD PTR [esp+0x18],ecx ; esp_18 = pstr
804a6a0:    e8 bb f0 ff ff          call   8049760 <strlen@plt>     ; strlen(pstr)
804a6a5:    8b 4c 24 18             mov    ecx,DWORD PTR [esp+0x18] ; ecx = pstr
804a6a9:    89 c2                   mov    edx,eax                  ; edx = len(str)
804a6ab:    83 ea 01                sub    edx,0x1                  ; edx = len(str) - 1
804a6ae:    78 24                   js     804a6d4 <tier2+0xd4>     ; if (edx < 0) goto UNKNOWN1
804a6b0:    0f b6 44 01 ff          movzx  eax,BYTE PTR [ecx+eax*1-0x1]  ; eax = pstr+len(str)-1
804a6b5:    3c 20                   cmp    al,0x20                  ; if eax == space
804a6b7:    74 13                   je     804a6cc <tier2+0xcc>     ; if (eax == space) goto loop1
804a6b9:    e9 7d 03 00 00          jmp    804aa3b <tier2+0x43b>    ; goto not_space

...

Once a sizable chunk was completely annotated, I moved the C into the source file for the new decoder

...

    ptok = strtok(buf, "\n");
    if (!ptok) goto no_tokens;
    ebp = 0;
    edi = 0;
    esi = 7;
    ebx = 1;

start_at_end:
    pos = strlen(ptok);
    pos -= 1;
    if (pos < 0) goto end_loop_1;
    c = *(ptok + pos);
    if (c == ' ') goto next_char;
    goto not_space;
    next_char:
    pos -= 1;
    if (pos != -1) goto read_char;

not_tab:
    if (c == ' ') goto next_char;
    goto end_loop_1;

not_space:
    if (c == '\t') goto next_char;
    goto end_loop_1;

//  (pos == -1) || (c != '\t' && c != ' ')
end_loop_1:
    c = *(ptok+pos+1);
    if (c == '\0') goto get_next_token;
    pstr = ptok + pos;
    goto _804a707;

...

and moved on to the next chunk. Once all of the relevant chunks were done, I cleaned up all of the goto statements and labels by rewriting the code based on my understanding of the logic

...

    ptok = strtok(buf, "\n");
    if (!ptok) goto no_more_tokens;

    int max_bit = 7;
    int bit = max_bit;

    do {

        size_t pos = strlen(ptok);
        if (!pos) continue;

        char *p = ptok + pos - 1;
        while ( p >= ptok && (*p == ' ' || *p == '\t')) --p;

        ++p;

        while ( *p != '\0') {

            if (*p == ' ') *pout |= ( 1 << bit);

            if (bit == 0) {
                bit = max_bit;
                ++pout;
            } else {
                --bit;
            }

            ++p;
        }
    } while (ptok = strtok(NULL, "\n"));

...

The result was a decoder that implemented the following (incomplete) pseudocode:

...

    For each line of the message
        Start at the end of the line
        Read backward until a character that is not a tab or space is found
        For each character until the end of the line
            if a tab, append 0 to output
            if a space, append 1 to output

...

In short, the message was binary encoded using tabs to represent zeros and spaces to represent ones. Then the message bits were appended to lines in an innocuous-looking text file.

The output bitstream contained three sections: a 7-byte header section starting with M and containing the message size, the ID of the recipient, and the signature size; a message section containing the secret message in cleartext; and a signature section containing a base64-encoded RSA/SHA224 signature for the message.

After the initial work to decode the hidden message, it was straightforward to add the library calls needed to base64 decode the signature and verify it using OpenSSL

user@host:~/nsa_codebreaker_2015/task_3$ ./decode ../task_1/tier1_key.pem ../task_1/tier1_msg.txt
*****SIGNATURE IS VALID*****
Message: Meet at 22:00 tomorrow at our secure location.  Come alone, and do not tell anyone - this meeting is sensitive, as leadership will be present.  To authenticate yourself, mention the pass code osb4rfmthy5dp22kd7qm at the door.
*****SIGNATURE IS VALID*****

The Encoder

With the decoder logic in hand, creating the encoder was easy. Partial pseudocode for the encoder looked like this:

...

    Create an RSA signature for the message, base64 encode it, and append it to the message
    Create a header block with the leading M, the message size, the
        recipient id, and the signature size, and prepend it to the message.

    For each character of the message
        For each bit of the character
            if the bit is zero, output tab
            if the bit is one, output space

    Loop through encoded message
        append three bits of the message to each line of the template
        if less than three bits remain, append remaining bits
        if no more lines in template, append remaining message to one empty
            line at the end of the template

...

The result was an encoder that could encode messages that could be decoded by the binary given in Task 1.

user@host:~/nsa_codebreaker_2015/task_3$ ./encode tier3_private.pem message_in.txt tier3_template.txt > secret_message.txt

user@host:~/nsa_codebreaker_2015/task_3$ ../task_1/secret-messenger --reveal --symbol tier3_key.pem --action secret_message.txt
*****SIGNATURE IS VALID*****
Message: Sir, Good news! We have developed a capability to encode secret messages to help with OPERATION vq72hp4w2pkw7h28b4gg. We are ready for our final tasking. VR, Codebreaker 3473469
*****SIGNATURE IS VALID*****

Thus, Task 3 was complete.