Embedded software security – are text strings a vulnerability?
As embedded systems become even more ubiquitous and complex, there is an increasing concern about security. The term means different things to different people, but I am thinking of the requirement for systems to be less vulnerable to tampering. Security measures are aimed at preventing, deterring or delaying the work of a hacker, who is trying to change the functionality of the device in some way. This might be to extract data from it or change its operation. In any case, the goal is likely to be theft or malevolence of some kind.
If a system really needs to be bulletproof, industrial-grade encryption is called for. This normally requires specific hardware support, which, whilst readily available, might be considered overkill for an application where such high security is not necessary. In such cases, there are other options …
If a hacker can gain access to a device’s memory content, they can start to figure out what it does and how it does it. This is the first stage in altering its operation. Code may be disassembled and, hence, the logic can be revealed. Without encryption, there is little that can be done to prevent this. The next thing the hacker might do is look at a hex/ASCII dump of the data and see what they can find there that makes sense. They are looking for patterns and recognizable structures. This is where some precautions may be taken. Whilst encryption may not be an option, obfuscation is a possibility.
The goal of data obfuscation is to delay or deter the hacker by simply making the data less recognizable for what it is. Scanning through a memory dump, one of the easy things to spot is text strings. So this is what I will focus on here.
For C/C++ code, text strings are just sequences of bytes containing ASCII codes, terminated by a null byte. That is very easy to spot, so I will change it. First, instead of the null terminator, the first byte of each string will be a length specifier, which limits each string to 255 characters. The characters of the string will then have their data scrambled slightly, to make them less familiar looking; all I will do is swap the two nibbles of each byte. I need a utility program into which I can feed the plain text strings and which generates the declaration of an array with the appropriate initialization. Here is the function at the heart of this utility:
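#include <stdio.h>
#include <string.h>

void scramble(int index, unsigned char *input)
{
    unsigned char *charpointer, character;
    size_t length = strlen((char *)input);

    /* One extra byte for the length specifier that replaces the null terminator. */
    printf("unsigned char string%d[%u] = {0x%02x, ", index, (unsigned)(length + 1), (unsigned)length);

    charpointer = input;
    while (*charpointer)
    {
        character = *charpointer++;
        /* Swap the two nibbles of the byte. */
        character = ((character & 0x0f) << 4) | ((character & 0xf0) >> 4);
        printf("0x%02x", character);
        if (*charpointer)
            printf(", ");
    }
    printf("}; // \"%s\"\n", input);
}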
If I passed this function an index of 4 and a string “Hello world” [original eh?], the output would be:
unsigned char string4[12] = {0x0b, 0x84, 0x56, 0xc6, 0xc6, 0xf6, 0x02, 0x77, 0xf6, 0x27, 0xc6, 0x46}; // "Hello world"
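To get a feel for what this buys, here is a rough sketch of the kind of scan a hacker might run over a memory dump: it measures the longest run of printable ASCII bytes in a buffer. The function name longest_printable_run and the two test arrays are my own illustration, not part of the utility, and real tools apply their own thresholds (commonly runs of four or more characters).

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): returns the length of the
   longest run of printable 7-bit ASCII bytes found in a buffer. */
size_t longest_printable_run(const unsigned char *data, size_t size)
{
    size_t best = 0, current = 0;

    for (size_t i = 0; i < size; i++) {
        if (data[i] < 0x80 && isprint(data[i])) {
            if (++current > best)
                best = current;
        } else {
            current = 0;   /* a non-printable byte ends the run */
        }
    }
    return best;
}

/* The plain string as it would sit in memory, null terminator included. */
static const unsigned char plain[] = "Hello world";

/* The scrambled, length-prefixed form generated above. */
static const unsigned char string4[12] = {0x0b, 0x84, 0x56, 0xc6, 0xc6, 0xf6,
                                          0x02, 0x77, 0xf6, 0x27, 0xc6, 0x46};
```

For the plain array the longest run is the whole 11-character string, whereas in the scrambled array no two printable bytes sit next to each other, so a simple text scan has nothing to latch on to.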
I can copy and paste this into my code, then all I need to do is write a function to unscramble the text when I need to display it. Note that the generated code is somewhat self-documenting, as the comment shows the string in a readable form, but, of course, this only appears in the source code. If the hacker has access to your source code, then you are in sufficient trouble that I am unable to help further!
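The unscrambling function might look something like the following. This is only a minimal sketch: the name unscramble, its parameters and its return convention are my assumptions, and it relies on the format described above (a length byte followed by nibble-swapped characters).

```c
#include <stddef.h>

/* Hypothetical counterpart to scramble(): converts a length-prefixed,
   nibble-swapped array back into a null-terminated C string.
   Returns 0 on success, or -1 if the output buffer is too small. */
int unscramble(const unsigned char *scrambled, char *output, size_t output_size)
{
    size_t length = scrambled[0];   /* first byte is the length specifier */

    if (output_size < length + 1)   /* need room for the null terminator */
        return -1;

    for (size_t i = 0; i < length; i++) {
        unsigned char c = scrambled[i + 1];
        /* Swapping the nibbles again restores the original byte. */
        output[i] = (char)(((c & 0x0f) << 4) | ((c & 0xf0) >> 4));
    }
    output[length] = '\0';
    return 0;
}
```

Calling unscramble(string4, text, sizeof text) with a 32-byte text buffer would leave "Hello world" in text, ready to display. Because swapping nibbles is its own inverse, the same bit manipulation serves in both directions.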
A side-effect of localizing all the text strings is that making different versions of the software for other languages is quite straightforward.
I must reiterate and emphasize that data obfuscation is far from bulletproof and will, at best, slow down the serious hacker. If you really need greater security, you must look at full encryption. And a good place to start is here.