Trace Method Calls in a Python App

schollii

2 Apr 2014

Recipe for reverse engineering function calls in a Python app

Introduction

Here is a snippet of code that you might find useful if you have to reverse engineer some code (such as a legacy app that you have to rewrite or update); specifically, to find out which functions/methods call which other functions/methods. I tried using the trace module with --trackcalls but could not get any output. This recipe uses Python's sys.settrace; the tricky part is getting the name of the class to which the (calling or called) method is bound.

I put this together based on the Python docs for settrace, an article by PyMOTW, and an SO post.
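
To make the mechanism concrete before the full recipe, here is a minimal sketch of a sys.settrace hook that records caller/callee pairs and prefixes method names with their class. It is an illustration, not the recipe itself: the Greeter class, main() and qualified_name() are hypothetical names invented for the demonstration, and the sketch assumes methods name their first argument "self".

```python
import sys

calls = []  # (caller name, callee name) pairs recorded by the tracer


def qualified_name(frame):
    """Return 'Class.method' when the frame's first argument is named self."""
    code = frame.f_code
    name = code.co_name
    if code.co_argcount > 0 and code.co_varnames[0] == "self":
        name = "%s.%s" % (type(frame.f_locals["self"]).__name__, name)
    return name


def tracer(frame, event, arg):
    # The global trace function receives a 'call' event for each new scope.
    if event == "call" and frame.f_back is not None:
        calls.append((qualified_name(frame.f_back), qualified_name(frame)))
    return None  # no per-line tracing inside the called function


class Greeter:  # hypothetical class, used only for this demonstration
    def greet(self):
        return self.make_text()

    def make_text(self):
        return "hello"


def main():
    return Greeter().greet()


sys.settrace(tracer)
try:
    main()
finally:
    sys.settrace(None)  # always switch tracing off again

for caller, callee in calls:
    print("%s -> %s" % (caller, callee))
```

Returning None from the tracer keeps the overhead down, since Python then skips line-by-line tracing inside each called function.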

Using the Code

Merge the following bit of code into your main Python script:

import sys, os, inspect

class TrackCalls:
    def __init__(self):
        self._traceCalls = set()
        sys.settrace(self._trace_calls)

    def _get_func_name(self, frame):
        func_name = frame.f_code.co_name
        arginfo = inspect.getargvalues(frame)
        # Prefix methods with their class name when the first argument is "self"
        if len(arginfo.args) > 0 and arginfo.args[0] == "self":
            func_name = "%s.%s" % (arginfo.locals["self"].__class__.__name__, func_name)
        return func_name

    def _filter_callee(self, func_filename): ###
        #if 'yourmodule.py' in func_filename:
        return True

    def _trace_calls(self, frame, event, arg):
        if event != 'call':
            return

        func_name = self._get_func_name(frame)
        if func_name == 'write':
            # Ignore write() calls from print statements
            return

        func_line_no = frame.f_lineno
        func_filename = frame.f_code.co_filename
        if not self._filter_callee(func_filename):
            return

        caller = frame.f_back
        caller_funcname = self._get_func_name(caller) #.f_code.co_name
        caller_line_no = caller.f_lineno
        caller_filename = caller.f_code.co_filename

        self._traceCalls.add((caller_filename, caller_funcname, func_filename, func_name))

    def _simplify_trace_filename(self, lineItems, modules):
        line = list(lineItems)
        for modName in modules:
            if modName in line[0]:
                line[0] = os.path.basename(line[0])
            if modName in line[2]:
                line[2] = os.path.basename(line[2])
        return tuple(line)

    def _filter_line(self, lineItems): ###
        return True

    def save_data(self):
        sys.settrace(None)
        sys.stdout = sys.__stdout__
        traceOut = open("trace.txt", 'w')
        self._traceCalls = list(self._traceCalls)
        self._traceCalls.sort()
        for line in self._traceCalls:
            if self._filter_line(line):
                simpleLine = line
                ### simpleLine = self._simplify_trace_filename(line, ['mod1.py', 'mod2.py'])
                traceOut.write('%10s %-40s %10s %-40s\n' % simpleLine)
        traceOut.close()

if __name__ == '__main__':
    tracker = TrackCalls()

    ...

    tracker.save_data()

This will create a file called trace.txt in the folder from which you start the script. The parts you can customize are marked with ###:

_filter_callee: Allows you to filter out tracing that is not of interest. Return True to accept a callee (in the code above, all callees are tracked). You could make it return True only if "yourmodule.py" is in func_filename, but keep this function simple because it runs on every function call; it is usually better to filter out lines in the write loop for trace.txt (at the end of the script).
_filter_line: Make this return False for lines that are not of interest. This check can be compute-intensive, such as filtering out a large number of module names, because it runs only in save_data, after the traced part of your script has finished.
simpleLine: Uncomment this line to shorten the caller/callee filenames of the listed modules to their base names.
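
As an illustration of the two filter hooks described above, here is a sketch written as standalone functions so that it runs on its own; in the recipe they would become the bodies of _filter_callee and _filter_line. The module names in MODULES_OF_INTEREST and the rule that drops dunder methods are assumptions made up for the example, not part of the original code.

```python
import os

# Hypothetical file names; substitute the modules of the app you are studying.
MODULES_OF_INTEREST = ("mod1.py", "mod2.py")


def filter_callee(func_filename):
    """Cheap check, suitable for running on every call event."""
    # str.endswith accepts a tuple and tests each suffix in turn.
    return func_filename.endswith(MODULES_OF_INTEREST)


def filter_line(line_items):
    """Costlier check, run once per recorded tuple while writing trace.txt."""
    caller_filename, caller_funcname, func_filename, func_name = line_items
    method_name = func_name.rsplit(".", 1)[-1]
    # Assumed rule: skip special methods such as __init__ or __repr__.
    if method_name.startswith("__") and method_name.endswith("__"):
        return False
    # Keep only calls that originate in the modules being studied.
    return os.path.basename(caller_filename) in MODULES_OF_INTEREST


# Hypothetical records in the same 4-tuple layout the recipe stores.
records = [
    ("/app/mod1.py", "main", "/app/mod2.py", "Loader.__init__"),
    ("/app/mod1.py", "main", "/app/mod2.py", "Loader.load"),
]
kept = [record for record in records if filter_line(record)]
print(kept)
```

The split mirrors the advice above: the per-call check is a single string test, while anything that needs parsing or lookups waits until the trace is written out.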

History

April 2014: First version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)