Friday, April 25, 2014

Debugging library loads in OS X using dtrace

When your OS X app grows bigger and more complicated, you may run into issues with library loading. Getting the correct libraries to load is not always easy. In our case, we develop a Python app and need a different Python version than the one shipped with the system, which adds even more trouble to the mix. Sometimes the system Python library sneaks in and our poor app then crashes with a sad message:

Fatal Python error: Interpreter not initialized (version mismatch?)

or, alternatively, with an even sadder message:

Fatal Python error: PyThreadState_Get: no current thread

These are caused by mixing two different Python library versions in the same process. They step on each other's memory structures and don't play along nicely, so you need to keep them apart.
But how can you quickly figure out which of your modules is referencing the wrong Python library? Use dtrace. I already wrote a post about how useful this tool can be, and we can employ it for this task as well. On OS X, it comes with a nice GUI app called Instruments (a part of the Xcode package). In this app, click the Instrument menu and select Build New Instrument... You will see the following window, where you need to fill in the details as below. In principle, you put a break/tracepoint on the function dlopen in libdyld.dylib and tell the system to record, at that moment, the stack trace as well as the first function argument, which is the path of the library to be loaded.

When you set this up, add the new instrument to your current project and select the program to instrument. You can attach to an already running process or start a new instance. The result will look like the image below. You can see the library load events together with the path argument and for each item you can see the call stack on the right.
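Once you have the list of library paths, checking it by eye gets tedious for a big app. Here is a minimal Python sketch that filters such a list for Python runtime libraries. The helper name find_python_runtimes, the file name pattern and the sample paths are my assumptions for illustration and are not part of Instruments or dtrace. It also assumes you have already copied the recorded paths into a plain list.

```python
import posixpath
import re

# Assumption: a Python runtime is named either "Python" (framework build)
# or "libpythonX.Y[suffix].dylib" / ".so". Adjust the pattern to your setup.
_PYTHON_RUNTIME = re.compile(r"^(Python|libpython\d[\w.]*\.(dylib|so))$")


def find_python_runtimes(loaded_paths):
    """Return the distinct Python runtime libraries among the given paths."""
    runtimes = []
    for path in loaded_paths:
        name = posixpath.basename(path)
        if _PYTHON_RUNTIME.match(name) and path not in runtimes:
            runtimes.append(path)
    return runtimes


# Hypothetical sample of recorded library paths.
recorded = [
    "/Applications/MyApp.app/Contents/Frameworks/libpython2.7.dylib",
    "/usr/lib/libSystem.B.dylib",
    "/System/Library/Frameworks/Python.framework/Versions/2.7/Python",
]

runtimes = find_python_runtimes(recorded)
if len(runtimes) > 1:
    print("More than one Python runtime in the process:")
    for path in runtimes:
        print("  " + path)
```

If more than one path is reported, the call stack shown for that load event in Instruments should point to the module that pulled the unwanted one in.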
Awesome! Now let's get some ice cream.

Monday, April 14, 2014

Dev basics: Purpose explanation

Code reuse is essential to reduce the time spent not only on development (writing the code) but also on maintenance (figuring out what went wrong and where the hell in the huge codebase the problem lies), and it's generally a good thing. However, to do it properly, any code unit with a potential for reuse should also have good documentation. And good documentation does not mean this:

    /**
     * This function returns the background color.
     * @returns the background color
     */
    Color getBackgroundColor() {}

That's actually bad and useless documentation: it only repeats what the function name already says. Instead, documentation should state the intention first and only then any other technical details.

Intention is a very important concept for code documentation. Developers should try to write down the intention they have in mind for a given piece of code (function, class, module) when they are writing it. Any usage of that piece then must be in line with that intention because it wouldn't make sense to use code in conflict with its intention. And when something in a program doesn't make sense, it will break.

You see, code is not just code. It has a meaning. Even when the code implementing a given purpose is very simple and incomplete, it still has a meaning and must be used according to its meaning and purpose. If you use a piece of code based on reading its implementation but ignoring its intent, you are going to get in trouble when it changes. But how do you know the intent when the original developer didn't write it down?

Let's see a very simple example. Consider this function:

def join(path_elements):
    return '/'.join(path_elements)

and this function:

def join(path_elements):
    'Creates a WebDAV HTTP request url from given path elements'
    return '/'.join(path_elements)

With the second one, you immediately know that you cannot use it to build paths for the local filesystem, because that has different rules than HTTP URLs! It would work at the moment on Linux but break as soon as someone made a WebDAV-specific change. And, of course, this distinction should be obvious from the function name. That's the first place where you should attempt to document the meaning. If the meaning is too long or complicated for a name, continue in the function's documentation.
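To make the point concrete, here is a small sketch of what that could look like. The function names webdav_request_path and local_fs_path are made up for this example, and percent-encoding stands in for "a WebDAV-specific change" that someone might add later. It assumes Python 3 for urllib.parse.

```python
import os
from urllib.parse import quote


def webdav_request_path(path_elements):
    'Creates a WebDAV HTTP request path from given path elements'
    # WebDAV-specific change: percent-encode every element for use in a URL.
    return '/'.join(quote(element, safe='') for element in path_elements)


def local_fs_path(path_elements):
    'Creates a path on the local filesystem from given path elements'
    # Follows the rules of the platform the code runs on.
    return os.path.join(*path_elements)


elements = ['docs', 'my file.txt']
print(webdav_request_path(elements))  # docs/my%20file.txt
print(local_fs_path(elements))
```

Anyone who had used the WebDAV function for local files would now be looking for a file literally named my%20file.txt. With the intent in the name and in the docstring, that mistake is much harder to make.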

Document the purpose or intent of the code; the rest can usually be read from the code itself. This is easier said than done. Sometimes you're not really sure what the one purpose of the code is, or you don't know how to explain it in writing. That's probably a signal that this thing is hard to understand, which means the next programmer will have trouble with it as well. It also means you need to spend more time thinking about it, because it will pay off. Help them and write some docs!