What Is Stack Smashing?
If you work as a quality assurance engineer, you will sooner or later run into the term stack smashing. As a developer, you will likely meet it even earlier, especially if you have introduced a bug that smashes the stack. It is relatively easy for a developer to make such a mistake. As a user, by the time you learn about stack smashing, the damage is likely already done.
Stack smashing can happen unintentionally, for example when a developer introduces a bug that corrupts the stack, or maliciously, when an attacker deliberately tries to overflow or corrupt the stack of a program.
Stack smashing is a somewhat loosely defined term that may point to various issues and can come from a variety of sources. The two most prominent causes are: 1) writing more data to a given part of the stack than was allocated for it, thereby overwriting another part of the stack, and 2) some external source (malicious or not) overwriting another program’s stack, though this is much less common.
So what is a stack? This, too, is a loosely defined term. Generally speaking, it refers to a program’s call stack: the stack of functions currently being executed, as defined in the program’s code.
Start by imagining a stack of bathroom tiles stacked up, ready to be used by a tiler. This is quite a good representation of a computer stack, with a few modifications. If each tile were a little offset from the previous one, it would be a better image, and we will soon see why.
Picture that each stacked tile is a function in the computer program. The most basic function is on the bottom, and could be for example the main() function in a C or C++ program. C and C++ are two programming languages that use the stack extensively.
Each of these functions in the C/C++ program will have a name and likely a set of incoming variables and outgoing variables. In simplified terms, picture one of those variables having room for 10 characters, and some other function accidentally writing 100 characters to it. The extra 90 characters have to go somewhere, so they spill over whatever sits next to the variable on the stack. This may corrupt the whole stack.
In terms of the tiles example above, imagine someone with a hammer hitting the first tile a little too hard and thereby smashing all the other tiles. Et voilà: stack smashing ;)
The analogy works because, just as all the tiles are now broken in our mental image, a smashed stack will result in ‘broken functions’ if you will. The offset of each tile represents a function nested one level deeper. More on broken functions in the next section.
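To make the 10-character example a little more concrete, here is a minimal sketch of the usual defence: checking the length before copying. The function name copy_checked and its return convention are assumptions made for this illustration, not something taken from a particular library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only when it fits,
   including the terminating NUL byte.
   Returns 0 on success, -1 when src is too long (dst is then left
   as an empty string instead of being overrun). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return -1;

    size_t needed = strlen(src) + 1;   /* characters plus the NUL */
    if (needed > dst_size) {
        dst[0] = '\0';                 /* refuse instead of spilling over */
        return -1;
    }

    memcpy(dst, src, needed);
    return 0;
}
```

With a buffer declared as char name[11] (10 characters plus the terminator), copy_checked(name, sizeof name, "0123456789") succeeds, while passing a 100-character string returns -1 and leaves the neighbouring stack memory untouched. An unchecked strcpy() in the same place would simply keep writing past the end of name.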
Debugging Smashed Stack(s)
Technically, the reference to ‘broken functions’ may not be fully correct: there is likely only one broken function, and there may be none at all when the cause is an external attack or another malfunctioning program. Even so, it is a great way to think about a smashed stack.
Suddenly, variable and function names may be mangled, and a backtrace does not make sense anymore. (A backtrace is the flow of functions the computer took to arrive at a given function, in our example the one that crashed and smashed the stack.)
Generally speaking, when we look at a backtrace, it will have a clear flow of functions which were called. While a crashing program cannot immediately be called ‘healthy,’ in terms of backtracing/debugging, this is what a ‘healthy’ backtrace looks like:
When a stack is corrupted, however, debugging becomes much harder. The stack may look like this:
This is an example of a stack smashing issue that happened in MySQL, the database server, in 2008 (see the log.txt attachment to MySQL Bug 37815 for the full output), causing the database server daemon (mysqld) to terminate.
While the operating system’s library libc.so.6, in this case, seems to have handled the stack smashing quite well (using some fortify functionality in the __fortify_fail function), the issue existed somewhere in the code and has since been fixed.
Note also that in this case we do not see resolved function names. We are only shown the binary name, which is mysql, together with a memory address of the function: mysql[0x8051565], mysql[0x80525c7] and mysql(main+0x4f8)[0x8053198]. Interestingly, the issue seems to have been in the client (mysql), causing the server (mysqld) to terminate.
Normally, when we use debug symbols (see the article on GDB referenced below, which explains in detail what debug symbols are), we would see function names with variables, and even with some levels of binary optimization/minification in place, we would at least see function names, just like what we see in the first ‘healthy’ backtrace above.
However, in the case of a smashed stack, the output of function names, variable names, or values is never guaranteed and is often complete mumbo-jumbo :) We may even see different function names, or a completely mangled stack (another piece of lingo often used by IT folk) of function names which do not make much sense and are likely fictitious, as the stack was overwritten somehow.
This makes it harder on both the test engineer (who may end up with many different outcomes for a single bug, which complicates filtering out known bugs) and the developer (who will likely have to use some step-by-step tracing or a reverse execution debugger like RR to discover the bug at hand).
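Frame lines of the kind quoted above, with a binary name, an optional function+offset, and an address, resemble what glibc’s backtrace() and backtrace_symbols() functions produce. The sketch below assumes a Linux system with glibc; the helper print_call_chain and the nested functions inner_step and outer_step are hypothetical names invented for this example.

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols(): glibc-specific */
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32

/* Prints one line per active stack frame to stderr and returns the
   number of frames that were captured. */
int print_call_chain(void)
{
    void *frames[MAX_FRAMES];
    int depth = backtrace(frames, MAX_FRAMES);
    char **lines = backtrace_symbols(frames, depth);

    if (lines == NULL)          /* allocation failed: nothing to print */
        return depth;

    for (int i = 0; i < depth; i++)
        fprintf(stderr, "%s\n", lines[i]);

    free(lines);                /* a single free() releases the whole array */
    return depth;
}

/* Hypothetical nested functions, only here to give the chain some depth. */
int inner_step(void) { return print_call_chain(); }
int outer_step(void) { return inner_step(); }
```

Calling outer_step() from main() prints the chain. When the program is linked with gcc’s -rdynamic option, the lines can include function names; without it, you will typically see only the binary name and an address, much like the mysql[0x8051565] frames in the bug report. The exact frames shown also depend on the optimization level, since the compiler may inline small functions.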
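How can a library notice that a stack was smashed at all, as libc did in the MySQL example? Broadly, protections of this kind place a known guard value (often called a canary) next to the data and check whether it still holds that value later on. The sketch below only simulates the idea in ordinary, well-defined C. The struct, the guard constant, and the function names are all made up for illustration, and real toolchains place and check the guard themselves.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_VALUE 0xDEADC0DEu   /* arbitrary constant chosen for this demo */

/* Stand-in for a stack frame: a small variable followed by a guard. */
struct guarded_frame {
    char     name[10];   /* the 10-character variable */
    uint32_t guard;      /* must still equal GUARD_VALUE afterwards */
};

void frame_init(struct guarded_frame *f)
{
    memset(f, 0, sizeof *f);
    f->guard = GUARD_VALUE;
}

/* Simulates a careless write: copies len bytes starting at name without
   checking them against sizeof name. The length is clamped to the struct
   itself so that the demo never touches memory outside it. */
void careless_write(struct guarded_frame *f, const char *src, size_t len)
{
    size_t start = offsetof(struct guarded_frame, name);
    size_t room  = sizeof *f - start;

    if (len > room)
        len = room;
    memcpy((unsigned char *)f + start, src, len);
}

/* Returns 1 while the guard is untouched, 0 once it has been overwritten. */
int frame_is_intact(const struct guarded_frame *f)
{
    return f->guard == GUARD_VALUE;
}
```

After frame_init(), writing a short string keeps frame_is_intact() at 1. Writing 100 bytes of 'A' runs past name and over the guard, so the check returns 0. That is the moment at which a real protection mechanism would abort the program rather than continue with a corrupted stack.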
What To Do When You Face Stack Smashing?
If you run into stack smashing, the first thing you want to do is understand the issue and the environment a little better, so that you can identify the source. If you have a popular web server exposed on the Internet with lots of gaming users who are trying to win a tournament whilst the server is also mining Bitcoin, you will want to assume the possibility of foul play and figure out if someone is messing with the server.
However, in most instances, the issue will be just an application error. Whilst I say ‘just’, the issue may be very significant, may result in downtime of services, could cost a lot of money, and in the worst case may prove impossible to fix. For example, a database server may crash persistently when being started due to the data being in a given state in combination with a shortcoming or limitation in the code.
If such a situation is compounded by stack smashing, or in other words by not being able to generate a clean backtrace of the problem, debugging will be more involved and at times near impossible. Fear not, however: the basic debugging approach is the same as for any other bug or application error, crash, or issue.
Carefully read every bit of the log files from before, during, and after the issue occurred. Take some backups and then retry the operation. Does it fail again or not? Research the errors and parts of the stack; even the frames (i.e., the individual stack functions shown, like the do_the_maths function in our original ‘healthy’ stack trace) can be put into your favorite search engine.
Concatenating (with a space) the most selective (top) crashing frames and searching for the result online often finds you an existing bug report for the issue you are facing. In the case of stack smashing, however, these frames (function names) have likely become mangled and are therefore no longer usable in the same way. If you see an assertion message of any kind (an assertion is a check the developer placed in the code), search for that too.
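The concatenation step is simple enough to script. As a rough sketch, the following joins the top few frame names with spaces into a search string. The function name build_search_query and its parameters are assumptions made for this example, and in practice the frame names would be copied out of your own backtrace.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins the first `top` frame names with single
   spaces into `out`. Returns 0 on success, or -1 when `out` is too
   small (out then holds whatever fitted so far, NUL-terminated). */
int build_search_query(const char *const frames[], size_t count,
                       size_t top, char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    if (top > count)
        top = count;

    size_t used = 0;
    for (size_t i = 0; i < top; i++) {
        size_t len = strlen(frames[i]);
        size_t sep = (i > 0) ? 1 : 0;      /* a space between names */

        if (used + sep + len + 1 > out_size)
            return -1;
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, frames[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Given the made-up frames { "do_the_maths", "calc_totals", "main" } and top set to 2, the result is the string "do_the_maths calc_totals", ready to paste into a search engine or a bug tracker’s search box.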
Always log a new bug report if the issue does not appear to be logged online yet (you may be helping others who are seeing the same!) and supply as much information about the issue as you can find. Thousands of bug reports against just as many applications are logged online each day. Hopefully, the support team for your stack smashing application is at hand to help quickly.
You may also like to read our Debugging with GDB: Getting Started article next, as it builds further upon how C and C++ programs (and others) can be debugged with the GDB debugger. It also further explains the concepts of a stack in detail.