Open Testware Reviews

Technology Bulletin: System Call Hijacking Tools

Copyright 2003 by Tejas Software Consulting - All rights reserved.

Reviewed:
2003-September-30
Testingfaqs.org category: Test Implementation Tools

My review of Holodeck in August 2003 sparked requests for information about similar tools for Linux. I haven't yet found another tool like this that merits a detailed review, but I thought I'd share with you what I've learned about a few tools in this general category. While this bulletin focuses on Linux, there are some points that are relevant to other systems as well. Note that this bulletin assumes that readers have programming experience.

I investigated what I call "system call hijacking tools." These are potentially very powerful tools that you can use for robustness testing and a wide variety of other tasks. In fact, most of these kinds of tools are presented primarily as security tools, but system call hijacking has many potential applications. Here's the basic idea: a system call hijacking tool can take control of one or more running programs and modify the behavior when the program makes any system call. The tool might change the parameters that are passed to the system call, it might prevent the call into the operating system, and it might fake the results. Before delving into how such a tool can be very useful to a tester, I'll explain a bit more about what a system call is.

What is a system call?

A system call is a function call that executes code contained within the operating system kernel. Most of the operating system's fundamental services are accessed through system calls - opening files, asking for more memory, initiating network connections, rebooting, etc. Other functions may call into system libraries, or other third-party and user-supplied libraries. These other function calls can also be hijacked or stubbed for similar reasons, though the options for doing so are more limited than they are for system calls.

System call tracing

To help draw a picture of what these tools do, let's first consider the read-only equivalent - system call tracing. The name of these tools varies from system to system - look for something like "strace," "ktrace," "trace," or "truss." The most common one on Unix-like systems seems to be strace. I even found strace for Cygwin on Windows, plus Windows has Holodeck and tracing tools that come with some commercial development environments.

Here's the strace output for a simple program that calls malloc on Linux.
$ strace ./foo
execve("./foo", ["./foo"], [/* 32 vars */]) = 0
fcntl64(0, F_GETFD)                     = 0
fcntl64(1, F_GETFD)                     = 0
fcntl64(2, F_GETFD)                     = 0
uname({sys="Linux", node="localhost.localdomain", ...}) = 0
geteuid32()                             = 500
getuid32()                              = 500
getegid32()                             = 500
getgid32()                              = 500
brk(0)                                  = 0x80a39c0
brk(0x80a39e0)                          = 0x80a39e0
brk(0x80a4000)                          = 0x80a4000
fstat64(1, {st_mode=S_IFREG|0664, st_size=591, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40000000
write(1, "malloc returned 134888544\n", 26malloc returned 134888544
) = 26
munmap(0x40000000, 4096)                = 0
_exit(26)                               = ?
Here's the program, foo.c.
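#include <stdio.h>
#include <stdlib.h>
/* No return statement, as in the original run; the exit status of 26 in the traces is presumably printf's return value. */
int main() {
  printf("malloc returned %lu\n", (unsigned long) malloc(1));
}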
If you look closely at the output, you'll see that the stdout output of the program is mixed with the strace output that goes to stderr. You can direct these output streams to two different files to avoid this problem, except that any stderr output from the program will still be jumbled up with the strace output. Running strace with its -o option, which writes the trace to a file of its own, avoids both problems.

Strace tries very hard to decode the system call parameters and show them in a readable form, such as "PROT_READ|PROT_WRITE" above, which would otherwise be a meaningless integer. Different trace tools vary widely in their ability to make the parameters readable.
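To give a feel for what that decoding involves, here is a minimal Python sketch that turns raw mmap protection and flag integers into symbolic names. The bit values are the usual Linux/x86 ones, hard-coded as an assumption rather than read from the system headers, and the helper name decode_flags is made up for this example.

```python
# Bit values as defined on Linux/x86 (assumed here; other systems differ).
PROT_BITS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_BITS = {0x01: "MAP_SHARED", 0x02: "MAP_PRIVATE",
            0x10: "MAP_FIXED", 0x20: "MAP_ANONYMOUS"}


def decode_flags(value, names):
    """Render an integer bit mask as NAME|NAME, the way strace does."""
    parts = []
    remaining = value
    for bit in sorted(names):
        if value & bit:
            parts.append(names[bit])
            remaining &= ~bit
    if remaining:
        # Keep unknown bits visible instead of silently dropping them.
        parts.append(hex(remaining))
    return "|".join(parts) if parts else "0"


print(decode_flags(3, PROT_BITS))   # PROT_READ|PROT_WRITE
print(decode_flags(34, MAP_BITS))   # MAP_PRIVATE|MAP_ANONYMOUS
```

A tracer that skips this step would show the old_mmap call above with the bare integers 3 and 34 in place of the names.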

Now that we can see the details of the system calls, let's move on to modifying their behavior.

How testers can use system call hijacking

If you could modify the system call parameters, the return value, and possibly replace or supplement the system call with your own code, you could easily simulate a huge variety of problems that applications need to deal with. For example, you could fake an out of memory or disk full error, or introduce data corruption on the disk or network, all using the same hijacking technique.
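The following Python sketch shows the idea in miniature. It is only an in-process analogue, not real system call hijacking: it swaps out Python's os.write wrapper with unittest.mock so that every write fails as if the disk were full. The functions save_report and run_with_disk_full are hypothetical names invented for this illustration.

```python
import errno
import os
import tempfile
from unittest import mock


def save_report(path, data):
    """Code under test: write data, reporting a full disk instead of crashing."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        return "saved"
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return "disk full"
        raise
    finally:
        os.close(fd)


def run_with_disk_full(func, *args):
    """Call func while every os.write fails as if the disk were full."""
    fake_error = OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    with mock.patch("os.write", side_effect=fake_error):
        return func(*args)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "report.txt")
    print(save_report(target, b"hello"))                      # saved
    print(run_with_disk_full(save_report, target, b"hello"))  # disk full
```

A real hijacking tool does the same thing from outside the process, so the application under test needs no changes and cannot tell the fault is simulated.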

See the Holodeck review for additional background.

Catching system calls in the kernel

One way to hijack a system call is to hook a tool into the kernel itself. That's what the syscalltrack tool does. Syscalltrack loads a kernel module that injects probes into the kernel system call table. It watches all current and future processes on the system by default, and it has a very flexible mechanism for filtering so you can zero in on the areas you're interested in. The tool seems to focus on logging. Here's the output from its strace clone, sctrace, on my "foo" program.
syscall: 6641["sctrace"]: 6_close(4) (rule 0)
syscall: 6641["sctrace"]: 3_read(3, "g", 1) (rule 0)
syscall: 6641["sctrace"]: 11_execve("/home/test/foo", bffff898, bffff8a0) (rule 0)
syscall: 6641["foo"]: 221_fcntl64(0, 1, 0) (rule 0)
syscall: 6641["foo"]: 221_fcntl64(1, 1, 0) (rule 0)
syscall: 6641["foo"]: 221_fcntl64(2, 1, 0) (rule 0)
syscall: 6641["foo"]: 122_newuname(new_utsname{c5f1fdf8, c5f1fe39, c5f1fe7a, c5f1febb, c5f1fefc, c5f1ff3d}) (rule 0)
syscall: 6641["foo"]: 201_geteuid(void) (rule 0)
syscall: 6641["foo"]: 199_getuid(void) (rule 0)
syscall: 6641["foo"]: 202_getegid(void) (rule 0)
syscall: 6641["foo"]: 200_getgid(void) (rule 0)
syscall: 6641["foo"]: 45_brk(00000000) (rule 0)
syscall: 6641["foo"]: 45_brk(080a39e0) (rule 0)
syscall: 6641["foo"]: 45_brk(080a4000) (rule 0)
syscall: 6641["foo"]: 197_fstat64(1, stat64{6, c5f1ff22, 2, 8592, 1, 500, 5, 34816, c5f1ff42, 0, 1024, 0, 0, 1064964199, 0, 1064964199, 0, 1064942448, 0, 2}, -973996256) (rule 0)
syscall: 6641["foo"]: 90_old_mmap(mmap_arg_struct{0, 4096, 3, 34, 4294967295, 0}) (rule 0)
syscall: 6641["foo"]: 4_write(1, "malloc returned 134888544\10", 26) (rule 0)
syscall: 6641["foo"]: 91_munmap(1073741824, 4096) (rule 0)
syscall: 6641["foo"]: 1_exit(26) (rule 0)
Syscalltrack is not able to modify the parameters sent to the system call or to avoid calling the system call altogether. I'm not sure whether it can modify the return code. It is able to generate some sort of failure. Here's a syscalltrack rule I wrote to cause any program named "foo" to fail every "brk" call:
rule
{
    syscall_name = brk
    rule_name = fail_brk
    filter_expression {
        COMM == "foo"
    }
    action {
        type = FAIL
        error_code = -12
    }
}
I couldn't find any documentation on what the "error_code" means, though it seems to be the negative of an errno code. I believe the -12 will give me an ENOMEM. When I enable this rule, the "foo" program crashes with a segmentation fault when I run it, which is surprising. A debugger shows that the fault comes from the chunk_alloc() function before entering main(), which implies that something in the C runtime is calling brk before my program has a chance to, and it's not able to give a proper error message. This is also a problem I ran into when using Holodeck to inject faults starting from the time the program starts. This is especially an issue when you're using shared libraries, which require a number of extra system calls to start up the program.
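Assuming the value follows the Linux kernel convention of returning negative errno numbers, Python's errno module offers a quick way to look up what a code such as -12 stands for. The helper describe_error_code is a name invented for this sketch, and the message text it returns depends on the C library.

```python
import errno
import os


def describe_error_code(code):
    """Interpret a negative value as -errno, the convention kernels use."""
    number = -code
    name = errno.errorcode.get(number, "unknown")
    return number, name, os.strerror(number)


number, name, message = describe_error_code(-12)
print(number, name, message)
```

On Linux the name reported for 12 is ENOMEM, which matches the guess above.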

To work around the startup issue, I added a 10-second sleep call at the top of foo.c. I turned off the syscalltrack rule and ran foo. After a few seconds, I re-enabled the syscalltrack rule, hoping that my malloc call would be the first to hit the fault. What actually happened is that sometimes the program aborted with no output, and sometimes it seemed to work just fine. I never got a 0 return code from malloc, which is what I wanted.

After looking at the trace output, I realize that I let my past experience on Unix systems blind me to the fact that my malloc call isn't actually causing a call to brk, but old_mmap instead. I presume that old_mmap is an entry point for the mmap system call. So I change my rule to target that instead:
rule
{
    syscall_name = old_mmap
    rule_name = fail_mmap
    filter_expression {
        COMM == "foo"
    }
    action {
        type = FAIL
        error_code = -12
    }
}

But I still don't get malloc to fail. I have to combine the two rules and greatly increase the memory size that I pass to malloc in my test application in order to get the malloc call to trigger a failure. Again, instead of a NULL return from malloc, I get a segmentation fault before the statement after the malloc call starts.
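The trace data allows a quick cross-check of where the malloc pointer actually came from. The sketch below compares the value that foo printed with the address ranges seen in the strace output; the constants are copied from that output, and region_of is a made-up helper. Treat the result as an observation, not a firm diagnosis.

```python
# Values copied from the strace output shown earlier in this bulletin.
BRK_START = 0x80a39c0        # brk(0): heap end before any growth
BRK_END = 0x80a4000          # heap end after the last brk call
MMAP_START = 0x40000000      # address returned by old_mmap
MMAP_LENGTH = 4096           # length passed to old_mmap
MALLOC_RESULT = 134888544    # value printed by foo


def region_of(address):
    """Name the traced region an address falls in, if any."""
    if BRK_START <= address < BRK_END:
        return "brk heap"
    if MMAP_START <= address < MMAP_START + MMAP_LENGTH:
        return "old_mmap block"
    return "neither"


print(hex(MALLOC_RESULT), "->", region_of(MALLOC_RESULT))   # 0x80a3c60 -> brk heap
```

The pointer falls inside the area grown by the brk calls, which suggests that the 4096-byte old_mmap may have served something else in the C library (possibly a stdio buffer, though that is a guess). It would also fit the observation that a much larger request was needed before the mmap rule had any effect, since glibc typically switches to mmap only for large allocations.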

Building syscalltrack was a bit of a challenge. I had to install the kernel sources, retrieve the config file from the /boot directory, and do the first few steps of building a kernel. It didn't work with the instructions that came with the sources, though it did work when I carefully followed the instructions on the web page, which were somewhat different from the documentation in the sources. I was using the same kernel version that the developers did (2.4.18-3, Red Hat 7.3). I have less confidence in how well installation would go on other kernel versions.

Also note that recent kernels (e.g., the one shipped with Red Hat 9) no longer export the system call table to kernel modules, so you would have to patch your kernel sources and replace your kernel before you could use syscalltrack.

Catching system calls using ptrace

Debugging and system call tracing are enabled on Linux and other systems by the ptrace system call. It's possible to exercise great control over a program using ptrace, including modifying the contents of its registers and memory. Modifying program behavior does require using some low-level architecture-specific knowledge of how registers are used and how parameters are passed to system calls.
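As an illustration of the architecture-specific knowledge involved, here is a Python sketch of the Linux i386 convention: the system call number travels in eax (ptrace exposes the entry value as orig_eax), up to six arguments travel in ebx, ecx, edx, esi, edi and ebp, and the result comes back in eax. The sketch only decodes a register snapshot supplied as a dictionary; it does not call ptrace itself, and the function names and tables are invented for the example.

```python
# Argument registers for Linux i386 system calls, in order.
ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")

# A few i386 system call numbers (the same ones sctrace printed above).
SYSCALL_NAMES = {1: "exit", 4: "write", 45: "brk", 91: "munmap"}
ARG_COUNTS = {"exit": 1, "write": 3, "brk": 1, "munmap": 2}


def describe_entry(regs):
    """Format a register snapshot taken at system call entry."""
    number = regs["orig_eax"]
    name = SYSCALL_NAMES.get(number, "syscall_%d" % number)
    count = ARG_COUNTS.get(name, len(ARG_REGISTERS))
    args = [regs[reg] for reg in ARG_REGISTERS[:count]]
    return "%s(%s)" % (name, ", ".join(str(a) for a in args))


def describe_result(eax):
    """Interpret eax at system call exit; -4095..-1 encodes -errno."""
    signed = eax - (1 << 32) if eax >= (1 << 31) else eax
    if -4095 <= signed < 0:
        return "error, errno %d" % -signed
    return "ok, value %d" % eax


snapshot = {"orig_eax": 91, "ebx": 1073741824, "ecx": 4096,
            "edx": 0, "esi": 0, "edi": 0, "ebp": 0}
print(describe_entry(snapshot))       # munmap(1073741824, 4096)
print(describe_result(0xFFFFFFF4))    # error, errno 12
```

A hijacking tool works in the other direction as well: it writes changed values back into these registers before letting the traced program continue.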

This is the approach that the Subterfugue tool uses. (Yes, the odd spelling was intentional.) Subterfugue uses Python snippets called "tricks" to define what it does. Here's one I developed based on a similar trick in the examples, having no prior Python programming experience:
from Trick import Trick

import errno

class MemFail(Trick):
    def usage(self):
        return """
Makes every brk call fail with ENOMEM.
"""

    def __init__(self, options):
        self.options = options

    def callbefore(self, pid, call, args):
        assert call == 'brk'
        return (None, -errno.ENOMEM, None, None)

    def callmask(self):
        return { 'brk' : 1 }
Here I set up a mask that says we're only interested in the "brk" system call. I define a callbefore method that specifies that the brk call should be aborted with an ENOMEM error. I then set a TRICKPATH environment variable to point to the directory containing my trick and run "sf --trick=MemFail ./foo". I get no output, and a 0 exit code. Again, it seems that I'm tripping up the C runtime startup code, and I'm not sure how to delay the injected faults until after the program has successfully started.
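One policy that might sidestep the startup code is to count the matching calls and start failing them only after the first few have gone through. The Python sketch below shows just that counting logic. DelayedFault is a hypothetical class, not part of Subterfugue, and wiring it into a trick is not shown because that depends on details of the trick API not covered here.

```python
import errno


class DelayedFault:
    """Let the first `skip` matching calls through, then fail the rest."""

    def __init__(self, syscall, skip, error=errno.ENOMEM):
        self.syscall = syscall
        self.skip = skip
        self.error = error
        self.seen = 0

    def decide(self, call):
        """Return None to run the call normally, or a negative errno to fake."""
        if call != self.syscall:
            return None
        self.seen += 1
        if self.seen <= self.skip:
            return None
        return -self.error


# Replay a made-up call sequence: three brk calls pass, the fourth fails.
policy = DelayedFault("brk", skip=3)
for call in ["brk", "brk", "brk", "write", "brk"]:
    print(call, policy.decide(call))
```

The number of calls to skip would have to come from a trace of a normal run, and it could change whenever the C library or the program's startup path changes.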

Subterfugue can also do system call tracing. For completeness, here's the output from its "Trace" trick on my foo program:
[7985] fcntl64(0, 1, 0) =
[7985] fcntl64() = 0
[7985] fcntl64(1, 1, 0) =
[7985] fcntl64() = 0
[7985] fcntl64(2, 1, 0) =
[7985] fcntl64() = 0
[7985] uname(-1073744048) =
[7985] uname() = 0
[7985] geteuid() =
[7985] geteuid() = 500
[7985] getuid() =
[7985] getuid() = 500
[7985] getegid() =
[7985] getegid() = 500
[7985] getgid() =
[7985] getgid() = 500
[7985] brk(0) =
[7985] brk() = 134887872
[7985] brk(134887904) =
[7985] brk() = 134887904
[7985] brk(134889472) =
[7985] brk() = 134889472
[7985] fstat64(1, -1073745664, 134878112) =
[7985] fstat64() = 0
[7985] mmap(-1073745696) =
[7985] mmap() = 1073741824
[7985] write(1, 'malloc returned 134888544\012', 26) =
malloc returned 134888544
[7985] write() = 26
[7985] munmap(1073741824, 4096) =
[7985] munmap() = 0
[7985] _exit(26) =
[7985] exited (status = 26)
# all child processes have exited
Subterfugue does not require any messy kernel module installation. However, I had trouble installing Subterfugue on Red Hat 9, perhaps an integration problem with a new-ish version of Python. On Red Hat 7.3, I found that I had to run at least one of the sample tricks as root because of a strange permission problem accessing /proc/<pid>/mem. Note that the Subterfugue web page warned a year and a half ago that the tool hadn't been updated for more than a year. My mail to the maintainer bounced, unable to penetrate an anti-spam system.

Having learned from the Subterfugue implementation a few details about how to use ptrace, I tried to write my own hijacking tool using ptrace. I was able to trace when system calls were entered and exited, but modifying their behavior required much more knowledge of the CPU architecture than I was able to glean from a casual reading of the Subterfugue code.

Other hijacking methods

There are a number of other possible approaches that you could use to hijack system calls. It's possible that you could use that age-old test tool, the debugger. Gdb, for example, has some decent scripting capabilities, if I remember correctly. If your application under test is dynamically linked, you could probably override the system call entry points by putting a library with functions of the same name first in the library search path, or else you could relink the application with the same sort of stub library. Yet another option, if you have the source code, is to instrument the code and recompile, renaming the system calls to the name of a function that you supply.

The Subterfugue web site includes links to a few other projects that might provide similar system call hijacking capabilities.

The bottom line

System call hijacking tools on Linux are not for the faint of heart. Some advanced knowledge may be required to install and configure them, besides the fact that you need to have a thorough grasp of the available system calls so you know what to target. In our example in this article, we had to know that malloc is merely a library routine, and it calls the brk and mmap system calls. Fortunately, a system call tracer gives us hints about where to target our tests. This example illustrates that it can be just as important to hijack a library call as it is to hijack system calls, and the tools I looked at can't do that. It would have been much easier to directly force a NULL return from malloc().