Friday, February 10, 2017

Bali Impressions

  • They still have rainforests! It is one of the places where the oil palm tree monoculture has not (yet, gasp!) replaced the original old forest.
  • Local bell music sounds very nice.
  • There seems to be a local tape with flute and bamboo stick music that's supposed to be chill and relaxing. Our hotel played it every morning at breakfast. From 6 am. From a speaker right under my window. It woke me up reliably every morning and I hate it now. But it seems to be pretty popular in Bali anyway, I heard it in Ubud a few times.
  • If you like exotic arts, you want a wood, stone or bone sculpture from Bali.
  • It's not super cheap. If you've read one of those 'personal finance' blogs saying that 'you can afford to live abroad and it'll be cheaper than home' then, well, don't come to tourist centers of Bali.
  • It's touristy. Nusa Dua, Jimbaran and Ubud are all already rather urbanized and tourism is well established. Not for you if you're trying to get away from people. Better stay at home with your computer.
  • The people are friendly and merry. Our Uber driver was joking all the time that his eyes are not good and that he likes funny mushrooms from Lombok. We survived so it was funny in the end.
  • Did I mention the beautiful nature? Rainforest (with authentic rain as well!), sandstone cliffs, deep blue sea, reefs. And I was lucky to see a few manta rays. Majestic.
  • Food is great too. Curry, curry-like seasonings and peanut sauce are what I will remember most as local flavours, and crackers, skewers and sugar peanuts are what I will remember in terms of ... shapes?

There are some downsides too:

  • Taxi mafia. Uber, on the other hand, has been a great user friendly experience.
  • Hawkers trying to push you into buying stuff you don't need. They don't worry about lying about it either.

9 / 10, would visit again

Sunday, January 29, 2017

Reverse Engineering Android APKs

Although I've never actually written an Android app, I've been playing around with Android internals a bit. I own a phone that shipped with CyanogenOS (that's already history) and of course I've rooted it. I also bricked my previous phone during ROM changes.

I also tried reversing some Android apps with various degrees of success. My main project was 'hacking' the Xiaomi YeeLight bedside lamp app to be able to control it programmatically. Xiaomi did not provide any API but if I can modify the APK to accept commands, that's all I need.

Here are the slides for a talk I gave about basic reverse engineering at a Codeaholics Hong Kong meeting. After that you can find some more details about the YeeLight case.


  • obfuscation (great against getting a general view but not if I'm targeting one specific thing)
  • anti-decompilers (can be always bypassed)
  • anti-debuggers (can also be bypassed)
  • time investment (can not be bypassed)

.apk contents

  • Java code compiled to Dalvik bytecode for a register-based VM (smali is its readable assembly form), all stored in classes.dex
  • AndroidManifest.xml in some kind of binary form
  • native machine libraries .so (ARM, x86, ..)
  • resources


  • Icelandic "assembler"
  • register based, as opposed to standard JVM stack-based
  • closer to the CPU, less work for JIT compiler
  • reasonably readable
    const-string v5, "UTF8"
    invoke-static {p0, v3, v4, v5}, Lcom/google/zxing/client/result/optional/NDEFURIResultParser;->bytesToString([BIILjava/lang/String;)Ljava/lang/String;
    move-result-object v2


  • no variable names (unless debug symbols)
  • try/catch blocks often broken
  • usually can't use Java compiler to put it back together
  • obfuscation -> all methods and classes are now named alphabetically (cd.i(a, b, c, d))

BCV front-end

  • Makes it easy to run decompilers on .dex or .jar
  • still not quite there for more in-depth analysis
  • so I use ... a text editor!
  • decompile everything to .java, put in git and write comments

Patching APKs

  • Example: YeeLight
  • write a new class in Android Studio (add YeeLight.jar to project)
  • compile to .smali
  • add the smali to already extracted apk folder/smali
  • modify .smali files to construct and invoke the new class

  • rebuild using apktool
  • sign
  • zipalign
  • Install on your device!

Working on the YeeLight app

This app is obfuscated and, quite honestly, contains a lot of code. It has a screen with a colour gradient where touching a colour changes the light colour accordingly. I started by finding this Activity and trying to find the click handler. I planned to go deeper and eventually end up in the code that's sending Bluetooth commands but I got lost.

Then I tried to watch the logcat while using the app and found that the colour changes are being echoed in the log. One code search for this particular string got me into a class that was fully obfuscated but probably was somewhere on the way to sending the commands. Further reading the decompiled code revealed a consumer for these messages as well as conversion from a colour object to the Bluetooth message.

The next step was to write a network listener class in Java. It would run in its own thread and accept UDP packets sent to the broadcast address. Each colour change requires only 4 bytes of data so UDP is the simplest choice. Broadcast address is used to avoid needing any configuration - I can just send it out on my home network.

This Java code now needs to be converted to a .smali file. There are tools that should be able to convert it directly from a .class or a .jar but at that time, they did not work. So I ended up creating a dummy Android project in Android Studio to achieve the same result:

  1. Create a project in Android Studio.
  2. Convert classes.dex from the YeeLight apk into a YeeLight.jar using dex2jar.
  3. Add the YeeLight.jar to the project as dependency. This will allow you to call methods from the original APK.
  4. Build APK from the project.
  5. Use apktool to disassemble the result, obtaining a .smali file for your class.

Now you can add this new .smali file to the original APK. You also need to actually create an instance and call this new code in an appropriate place. That requires manually editing the existing .smali code of the app. If you can find where, it's not too difficult.

Finally, rebuild the APK using apktool, zip-align and sign it. This process is a bit more complicated than it should be so I have a little script for it right here: My Apk Scripts

Now you can install the app and try it out. If it works, you may want to disable updates for it, otherwise the Play store will overwrite your efforts.

With a custom plugin for Kodi that sends the colour commands over UDP, the result is this:

List of resources and tools

Saturday, November 26, 2016

KeePass Ultimate Setup and Security Guide

1 Introduction

Passwords are our gateway to interacting with the digital world. It's how we show that it's really us because no one else could know our password, right? Passwords are neither perfect nor very convenient to use but they're the only thing we have now. Better options are being researched, one of them could be the U2F token, but for now we're stuck with passwords.

I heard people don't follow the best practices for safe passwords. And who's to blame? We are supposed to have strong passwords containing all kinds of crazy characters and different for each site. And everybody is using at least 10 sites on a regular basis plus around 100 other random sites they already forgot about. Humans can simply never remember 10 or more strong passwords and if they can, it's probably because they've been participating in memorizing competitions.

Let the computer remember things for you and you can forget all your passwords except one. Using a password manager (in this article I'm introducing KeePass 2), you can save all your passwords securely encrypted with a single master password. This master password will be long but you'll be able to remember it easily because you'll use it every day and it's the only one you need.

In this article I'll introduce KeePass 2, the open source password manager, together with a security analysis so you can have concrete arguments explaining why it's secure. The first part of each section explains how to use the password manager securely and is required reading. The second part explains how the security works and you don't have to read it.

1 1 Security analysis

  • It's necessary to use a different password on different sites in case one of them gets breached (it did happen, LinkedIn, Yahoo, ...). If you're a hacker and need a password for a more important website, first try to compromise other services that person is using.
  • What if somebody compromises my computer and steals my unlocked password vault? That could happen but in that case they'll also have access to all your private files and even if you didn't use a password manager, access to websites you're already logged in to. Keeping your devices free of malware is always necessary.
  • "I still don't feel good about centralizing all my passwords in one place", you say. That is generally a sound security attitude but consider that your primary email account already centralizes access to most of your services because it's used for forgotten password reset.
  • For critical sites (such as email), it's best to also use 2 Factor Authentication.

2 Getting started

2 1 Download

The original KeePass 2 application is Windows only. It can be downloaded from this page. Choose the Installer button on the top right and wait a moment for the download to start.

Alternatively, download from, choose "KeePass Installer, Professional Edition" (it's a strange name choice. Don't download the classic edition).

For a Mac, download KeePassX from and install in the usual Mac fashion.

2 2 Installation

When starting the installation on Windows, it should show a security window asking if you really want to install this program. This window MUST show Open Source Developer, Dominik Reichl. If not, do not allow it and delete the downloaded installer as you got a bad copy.

Security Analysis:

  • The project homepage as well as SourceForge mirrors don't have HTTPS. That's a bummer but the application files are digitally signed by the developer and the certificate is recognised by Windows. Therefore checking the digital signature provides stronger security than HTTPS. Furthermore, the FOSShub link is served over HTTPS.
  • The homepage for KeePassX does use HTTPS as well as the download. It does not have digital signatures but it can be downloaded from a website owned by the project's author and not a third-party (as is the case with sourceforge).

2 3 Choosing the master password

After you install the program, you can create a new database. Now is the time to create your master password.

Setting the password

This will be the main password that unlocks your database. It must be strong, stronger than your Facebook or banking password. It must be a new password, not something you were using before on a website. You must remember it well (try to type it a few times and then again the next day).

Your master encryption password needs to be really good. It should be at least 12 characters long but a better way is to pick up a dictionary and randomly choose 5 or 6 totally unrelated words. Maybe you can even combine multiple languages! "pasta blip port Bled nehmen" sounds good.
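To get a feeling for why a handful of random words can compete with a string of random characters, here is a small sketch that computes the strength in bits. The numbers are assumptions for illustration only: 94 printable ASCII characters and a dictionary of 50,000 words. The formula only holds if the characters or words are picked truly at random. Words you choose yourself are worth less.

```cpp
#include <cmath>

// Entropy in bits of a secret made of `count` symbols, each picked
// uniformly at random from `choices` possibilities: count * log2(choices).
double entropy_bits(double choices, int count) {
    return count * std::log2(choices);
}

// 12 random characters from the 94 printable ASCII characters.
double random_chars_12() { return entropy_bits(94.0, 12); }

// 5 or 6 random words from an assumed 50,000-word dictionary.
double random_words_5() { return entropy_bits(50000.0, 5); }
double random_words_6() { return entropy_bits(50000.0, 6); }
```

With these assumptions, 12 random characters come out at about 78.7 bits, 5 random words at about 78.0 bits and 6 random words at about 93.7 bits. The words are a lot easier to remember.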

Setting "encryption difficulty"

After creating your database, you may want to go to File / Database Settings and then Security tab. Here, click the "1 second delay" link to properly set number of key transformation rounds. This is basically something like "encryption difficulty" and it increases the time taken to unlock the vault. A 1 - 5 sec delay is sufficient if you have a good password.

Don't forget to save your password vault file!

Security Analysis:

  • The problem with encryption passwords is that a potential attacker, after stealing your encrypted database, can just keep trying all possible words until they crack it. Actually they'll program a computer to do it while they are having a beer. The computer can try a lot of passwords per second.
  • Because of the danger of cracking the passwords, encryption tools also include a delay to slow it down. You can configure it in KeePass. The bigger the delay and the better the password, the safer you are.
  • It's a good idea to increase the "encryption difficulty" 5 years later because computers will be faster in the future.
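The effect of the delay can be put into rough numbers. This is a back-of-the-envelope sketch and not how KeePass itself measures anything: it assumes the attacker pays the same time per guess as your own machine does, which is optimistic since dedicated cracking hardware may be faster. The function name and parameters are made up for this example.

```cpp
#include <cmath>

// Average time in years to guess a secret with `bits` of entropy when each
// guess costs `seconds_per_guess` and the work is spread evenly over
// `parallel_guessers` machines. On average the right guess turns up after
// trying half of all the possibilities.
double average_crack_years(double bits, double seconds_per_guess,
                           double parallel_guessers) {
    const double guesses = std::pow(2.0, bits - 1.0);
    const double seconds = guesses * seconds_per_guess / parallel_guessers;
    return seconds / (365.25 * 24 * 3600);
}
```

Going from a microsecond per guess to a 1 second delay makes the same password take a million times longer to crack. Each extra bit of password strength doubles the time again, which is why a good password matters more than a huge delay.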

2 4 Settings

These settings are subjective and also depend on who can have access to your machine. This is what I would recommend for normal use. In Tools / Options:

Enable "Lock workspace after global user inactivity" and set it to 360 s or less.

Enable "Clipboard auto-clear time".

Enable "Lock workspace when computer is about to be suspended".

On the Interface tab, I like to enable "Drop to background after copying data to the clipboard".

2 5 Settings for KeePassX

This program is slightly different from the original Windows KeePass 2. Transform rounds ("encryption difficulty") can be set in Database / Database Settings. Again, you can click the Benchmark button to configure it to a recommended value.

Enable automatic locking in KeePassX / Preferences, on the Security tab.

2 6 Plugins

There are many plugins created by the community for KeePass. Currently I'm using none of them. Be careful because plugins can break security of KeePass and even their authors may not realize that. For example a browser integration plugin increases the risk quite a bit.

3 Day to day usage

Besides the security and cryptography, KeePass is a pretty ordinary program from a user perspective. Click Edit / Add Entry ... to add a new password entry. The program will automatically generate a new strong password for you so you only need to enter the site name and address (used by browser integration). Then click OK and File / Save to save the database.

To use a stored password, you have two options. The first one is to copy to clipboard (simply Ctrl+C) and paste in the website. The second option, which is slightly more convenient and slightly more secure is to use Auto-Type. Switch to your browser and place the cursor in the login form, in the user name field. Then switch to KeePass and select Perform Auto Type on the password entry. It will automatically log you in!

You can also create groups and assign icons to your entries but I think it's best to simply search for a site when you need it using the search box on the toolbar.

You can also use KeePass to safely store other pieces of information such as a bank PIN. It's not very suitable for storing files though. For that you may need to look at your OS's disk encryption or VeraCrypt.

4 Syncing the database

It's 2016, you probably have more than one computing device. Maybe you have too many of them. And you need to access your password database on all of them. This is where KeePass lags behind the commercial password vaults because you'll need to set it up by yourself. But don't worry, you can just use Dropbox or Google Drive ... or OneDrive or SpiderOak or any other file sync service you may already be using. Just put your password database in there and you're done.

Sounds insecure? Well the database is encrypted so if your password is good, your data is safe. Still feeling uncomfortable about it? You can add another factor - a keyfile. KeePass allows you to generate a file that is required to decrypt the database. You will then manually (using a USB stick) copy this file to any computer you want to use the password database on. Do not put it in Dropbox! Without the keyfile (and your password) there's no way in hell anyone could crack your encrypted database.

4 1 Step by step

Dropbox and Microsoft OneDrive will automatically sync any file you put in their special folder. Other similar services will probably do the same but I haven't used them.

First, add a keyfile to your password vault. If you have already created a vault, open it in KeePass and choose File / Change Master Key. In the dialog box, enable both Master password and Key File. Type your master password again (you don't need to change it). Then click Create to create a keyfile. Do not put this keyfile in your Dropbox. After finishing this, you can save the password vault to your Dropbox and it will be synchronized to your other computers using Dropbox.

Now you need to transfer the keyfile to your other computers. The best way to do this is offline, without using the internet. Copy the keyfile onto a USB stick and use that to carry it to each computer. Again, do not place the keyfile in the Dropbox folder. You should consider locking this USB stick away safely to keep it as a backup of your keyfile. If not, don't forget to delete the keyfile off the USB stick before using it for something else.

Now you can use your password and the keyfile to open your password vault. The vault will be synchronized by Dropbox.

4 2 Security Analysis:

  • If even a bit worried, use a keyfile.
  • If you lose your keyfile (or your master password), you won't be able to open the password database, ever. So write both on paper and keep them at home, in a safe or something.
  • I'd prefer using a file sync service that supports file versions such as Dropbox or Google Drive. MS OneDrive can't.
  • Really, no one can break the encryption (AES algorithm). And if the NSA can, it'll cost millions of $$. Hacking your computer will be cheaper so that's what you should focus on next.
  • A practical way to delete the keyfile from a USB stick is to completely fill up the USB stick with other data (such as large movie files). Unfortunately it may not guarantee all traces of it disappear since flash chips may over-provision to make up for faulty portions. So the most secure way is to not use a USB stick at all but rather copy the file manually (it's just text and not that long).

5 Other password managers

Before KeePass I was using LastPass. Together with 1Password, these seem to be the most established password managers at this time. Let me share some thoughts about how they compare. Note that the security analysis here focuses on the worst case scenarios and can sound a bit scary.

In terms of price and development model, KeePass is free and open source, LastPass is commercial but free for basic use and 1Password is fully paid. It's easier for security people to check the security of open-source software.

LastPass works as a browser plugin, same with 1Password. That's more risky from a security point of view. For one, malicious websites might find some way to steal a password. KeePass is simple and isolated from the browser. Also, if a commercial password manager company changes management, gets sold or becomes subverted by a government, it could publish an update of its browser plugin that steals your data. That's a risk with all software that you use, including Windows or OS X. Again, KeePass is a slightly smaller risk in this respect if you carefully check each update that you install.

For ease of use, the commercial programs may be more convenient. They take care of synchronization for you and 1Password is beloved for its user interface.

5 1 New password managers

While it's great that people try to innovate in the security area, I'd always be wary of new password managers until it's proven that their developers know what they're doing. Security is not easy and a new product made by people without proper knowledge and experience can be a risk, even when the developers have good intentions.

6 Who am I to write about this?

I've been a software developer (a computer guy) longer than I can remember and in the past few years I've been focusing on cryptography engineering and security, studying and implementing cryptographic things at work. I found a crypto problem with a browser extension for KeePass. So I know enough to realize that I don't actually know enough yet! Also, I'm a level 45 crypto wizard ;)

Have I personally audited KeePass? Nope. But it's trusted by internet people and honestly, there's not that much to screw up since it's a rather simple program. I hope to take a look one day.

Friday, October 21, 2016

Crafting reliable C++

Everybody hates bugs [citation needed]. Why spend time hunting down bugs when you could be shipping code and earning money? This is especially true of low-level memory management bugs in C/C++. If you’re a newbie, words like memory leak, use-after-free or double-delete may not have an effect on you but any experienced developer will recoil at these words in horror or start chanting “asan, asan, asan, …” So, OK, we don’t want these bugs in our code, let them go somewhere else. This is especially true in my field of crypto engineering where bugs in crypto code or protocols can lead to complete failure of the projects. And bugs in ordinary programs can cause security problems just as well.
The weapons I choose for this battle are Modern C++, automated testing, static analysis from the compiler, dynamic analysis tools, fuzzers and patience. Some people don’t like C++ or code C++ like it was plain old C. Sure, those Linux kernel programmers have a quantum computer in their head that lets them simulate all possible program paths at the same time to see where a lock or memory is not released. But I’m just a human and can barely keep track of the socks in my drawer so I have to find another way, one that is more fool-proof and doesn’t require so many socks.
So here are my top n tools and techniques to make my C++ less buggy, crashing less often and more reliable. It’s totally not a complete list but rather an introduction, a base level of tooling that everybody should know about but somehow not everybody does.

1 Deleting objects

C++ doesn’t have a garbage collector so we have to clean up our garbage manually. If we don’t, the program will eventually drown in garbage and run out of memory (but not before trashing the hard drive by swapping). The old and deprecated option is to do a delete obj; manually. This is completely unreliable (because it’s manual). You may forget to do this when having multiple exits from a method. Or when a method throws an exception. Or when another method you call throws an exception. Not to mention returning objects that the caller then has to free.
The modern approach is to use RAII which is a weird name for a simple concept: use the destructor to do all cleanup at the end of scope. If we have a scope with an object on the stack like this:
        VictorTheCleaner v;
then the destructor of v will be called when the scope is exited and it can take care of any deleting, releasing and cleaning that’s necessary. A scope can be exited in several ways:
  • normal program flow reaches the } brace
  • a statement such as return, break or continue causes program flow to jump out
  • an exception is thrown
These are cases where you would have to manually do a delete and where the destructor will do it for you. The most common usage is a smart pointer that “owns” a non-smart (stupid) pointer and is responsible for deleting it. Another usage can be making sure you close a database connection. Or close a file. Or a network socket. Or change the process current directory to what it was before. This is equivalent to the using statement in C# or the with statement in Python. C++ doesn’t have a dedicated keyword, instead we use the destructor.
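Here is a minimal sketch of the three exit paths in action. The class only writes to a log so that the order of events is visible. The work() function and the log are made up for this example, a real cleaner would close a file or release a lock in its destructor.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Records its construction and destruction in a log so that we can see
// when the cleanup runs.
class VictorTheCleaner {
public:
    explicit VictorTheCleaner(std::vector<std::string> &log) : m_log(log) {
        m_log.push_back("acquire");
    }
    ~VictorTheCleaner() { m_log.push_back("cleanup"); }

    // Copying would mean two cleaners for one resource, so forbid it.
    VictorTheCleaner(const VictorTheCleaner &) = delete;
    VictorTheCleaner &operator=(const VictorTheCleaner &) = delete;

private:
    std::vector<std::string> &m_log;
};

// Leaves the scope in one of three ways depending on `how`:
// 0 = reaches the closing brace, 1 = early return, 2 = throws an exception.
void work(int how, std::vector<std::string> &log) {
    VictorTheCleaner v(log);
    if (how == 1) return;
    if (how == 2) throw std::runtime_error("something broke");
    log.push_back("work done");
}
```

Whichever way work() is left, "cleanup" ends up in the log right after the scope is exited, provided the exception is caught somewhere up the stack.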

1 2 unique_ptr

The standard library template class std::unique_ptr<T> is designed to take care of the most common case - when you need to automatically delete an object. Use std::unique_ptr<T>. Use it for function local variables, use it as object members (to be safe if your constructor throws). Use it especially for C land objects that are created with functions like EC_POINT_new() and must be deallocated with EC_POINT_free(). This is how you can set a user defined function as the deleter:
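#include <memory>

// Functor that calls the C-style cleanup function Fn on the pointer.
template<typename T, void (*Fn)(T*)>
class function_deleter {
public:
    void operator()(T *p) const {
        if (p != nullptr) Fn(p);
    }
};

// "Typedef with a parameter": use unique_ptr_ex<T, Fn>::type.
template<typename T, void (*Fn)(T*)>
class unique_ptr_ex {
public:
    typedef std::unique_ptr<T, function_deleter<T, Fn>> type;

    // do not instantiate this class, use unique_ptr_ex<T, Fn>::type
    unique_ptr_ex() = delete;
};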
The first class defines a functor (something that has operator()). This is necessary because the second template parameter of unique_ptr is a type, not a function. The purpose of the second class is to act as a typedef with a parameter. It’s a workaround because not all compilers I’m using support the alias templates (template typedefs) introduced in C++11. These two classes simplify creating unique_ptr with different cleanup functions:
unique_ptr_ex<BIGNUM, BN_free>::type m_privkey;
unique_ptr_ex<EC_POINT, EC_POINT_free>::type m_ec;
Then just initialize m_privkey with a new BIGNUM that belongs to you and deallocation using the openssl-provided function BN_free is taken care of automagically. Same for m_ec!
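To try the idea without linking OpenSSL, here is a self-contained sketch. The widget type with its widget_new() and widget_free() functions is a made-up stand-in for a C library, and the counter exists only to show that the cleanup really happens. It also uses a C++11 alias template (named c_unique_ptr here, my own name) in place of the ::type workaround, for compilers that support it.

```cpp
#include <memory>

// Stand-in for a C library: a hypothetical widget with new/free functions.
struct widget { int value; };
static int g_live_widgets = 0;

widget *widget_new(int value) {
    ++g_live_widgets;
    return new widget{value};
}

void widget_free(widget *w) {
    --g_live_widgets;
    delete w;
}

// Functor that forwards to a plain C cleanup function.
template<typename T, void (*Fn)(T*)>
struct function_deleter {
    void operator()(T *p) const {
        if (p != nullptr) Fn(p);
    }
};

// Alias template: a typedef that takes parameters.
template<typename T, void (*Fn)(T*)>
using c_unique_ptr = std::unique_ptr<T, function_deleter<T, Fn>>;

using widget_ptr = c_unique_ptr<widget, widget_free>;

int use_widget() {
    widget_ptr w(widget_new(42));
    return w->value;   // widget_free() runs when w goes out of scope
}
```

After use_widget() returns, g_live_widgets is back at zero without a single explicit call to widget_free().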
For details on how to use unique_ptr, see the reference. I’ll try to give a few basic guidelines here.
  • Use it for local variables that you created in the function and need to clean in the same place.
  • Use it for member variables of a class that “owns” these variables. Owning them means that the class is responsible for cleaning them which typically happens when the class itself is deleted.
  • You can also use them as return types for functions that create an object and pass its ownership (responsibility for deleting) to the calling function. No more uncertainty on who should do the cleaning, unique_ptr tells you you are the responsible owner and will do it automatically for you too.
  • Do not use it for object pointers that do not transfer ownership. If I call a function that uses an existing object, I use a naked pointer or a reference (if it can’t be null).
  • Do not use it for member variables of a class if the class is not owning that object.
The last case is a little tricky. In some cases, more than one class is responsible for a cleanup job and that’s where a smarter smart pointer such as shared_ptr would come in. I try to keep things simple and have only one owner for each object so that I can use unique_ptr.
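The guidelines above can be sketched in a few lines. All names here (Connection, open_connection, Session and so on) are invented for the illustration. The point is only what each signature says about ownership.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Connection {
    std::string host;
    bool open;
};

// Factory: the return type says that the caller becomes the owner.
std::unique_ptr<Connection> open_connection(const std::string &host) {
    return std::unique_ptr<Connection>(new Connection{host, true});
}

// Only uses the object and cannot accept null, so it takes a reference.
std::string describe(const Connection &c) {
    return c.host + (c.open ? " (open)" : " (closed)");
}

// Uses the object if there is one. A naked pointer means "not an owner".
bool is_usable(const Connection *c) {
    return c != nullptr && c->open;
}

// Owns its connection, which is deleted together with the Session.
class Session {
public:
    explicit Session(std::unique_ptr<Connection> c) : m_conn(std::move(c)) {}
    const Connection &connection() const { return *m_conn; }
private:
    std::unique_ptr<Connection> m_conn;
};
```

Handing the pointer to a Session needs an explicit std::move(), so the transfer of responsibility is visible at the call site and the old variable is left empty.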

1 3 Other cleanup

Sometimes you need to do more things besides deleting an object upon scope exit. Maybe you need to close a database connection, restore the original value or roll back an object to a previous state.
You could create a new class with the appropriate code in destructor for each of these cases (for example PostgreCloser, PwdRestorer, …) but that is a little inconvenient. That’s why there is ScopeGuard, a class where you can redefine the cleanup code in-place.
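To show the idea, here is a bare-bones scope guard. It is my own minimal sketch and not the interface of any particular ScopeGuard library. It stores a callable, runs it on scope exit and lets you cancel the cleanup with dismiss(). The verbosity example is hypothetical.

```cpp
#include <functional>
#include <utility>

// Minimal scope guard: runs the given callable when it goes out of scope,
// unless dismiss() was called first.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> fn) : m_fn(std::move(fn)) {}
    ~ScopeExit() {
        if (m_fn) m_fn();   // the callable must not throw
    }
    void dismiss() { m_fn = nullptr; }

    ScopeExit(const ScopeExit &) = delete;
    ScopeExit &operator=(const ScopeExit &) = delete;

private:
    std::function<void()> m_fn;
};

// Temporarily changes a setting and always restores the old value.
int with_temporary_verbosity(int &verbosity) {
    const int saved = verbosity;
    ScopeExit restore([&verbosity, saved] { verbosity = saved; });
    verbosity = 9;
    return verbosity * 2;   // stands in for real work done at high verbosity
}
```

dismiss() is handy for rollbacks: set up the guard to undo a change, do the risky work and dismiss the guard once everything has succeeded.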

2 Program invariants

A short intermission is necessary to explain the term invariant. The term literally means “something that doesn’t change” and in programming that will be a condition (a logical statement) that mustn’t change and be always true while the program is running because the code relies on it and things would break otherwise. There can be invariants on particular lines in the code (at the start of a function, in a loop) or they are related to a data structure or an OOP class. Invariants are usually not expressed in the programming language itself, it’s just something that we keep in mind and use when thinking about how the code will behave. Or at least should keep in mind. Ideally.
For example a data structure invariant for binary search tree is that each node has at most 2 children. A more interesting invariant requires that nodes under the left child are all less than the current node and all under the right child are more than the current node. If this condition doesn’t hold then efficient search in binary search tree will be broken.
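The ordering invariant can be written down as a checking function. This is a sketch with a made-up Node type. The "at most 2 children" part needs no check here because the struct simply has no room for a third child.

```cpp
#include <climits>

struct Node {
    int key;
    Node *left;
    Node *right;
};

// Checks the ordering invariant: every key in the left subtree is smaller
// and every key in the right subtree is larger than the key of the node.
// `low` and `high` are exclusive bounds inherited from the ancestors.
bool is_search_tree(const Node *n, long long low = LLONG_MIN,
                    long long high = LLONG_MAX) {
    if (n == nullptr) return true;
    if (n->key <= low || n->key >= high) return false;
    return is_search_tree(n->left, low, n->key) &&
           is_search_tree(n->right, n->key, high);
}
```

Note that comparing each node only with its direct children is not enough. A node deep in the left subtree can be fine with respect to its parent and still be larger than the root, which is why the bounds are passed down.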
A string splitting function will have an invariant at the end of the function (called post-condition) that says that the two returned parts actually form the original string if reassembled. A splitting function would not otherwise be very useful. A memory allocator will have an invariant stating that all active allocations are kept track of so that a further allocation cannot occur in a piece of memory already given to someone else.
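The closest thing we have in everyday C++ is an assert. Here is a sketch of a splitting function (split_at is a made-up name) that checks its own post-condition before returning. To keep the post-condition exact, the separator stays at the start of the second part.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits `s` at the first occurrence of `sep`. If `sep` does not occur,
// the whole string ends up in the first part.
std::pair<std::string, std::string> split_at(const std::string &s, char sep) {
    const std::string::size_type pos = s.find(sep);
    std::pair<std::string, std::string> parts;
    if (pos == std::string::npos) {
        parts.first = s;
    } else {
        parts.first = s.substr(0, pos);
        parts.second = s.substr(pos);   // the separator stays in the second part
    }
    // Post-condition: reassembling the parts gives back the original string.
    assert(parts.first + parts.second == s);
    return parts;
}
```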
Invariants help us describe program behaviour and requirements using logic. At this time, mainstream programming languages don’t have any support for working with them. But there are languages that focus on correctness in the academic research community and they let you write those invariants in the code for formal checking.

3 Error handling

I’m coding a Modern C++ interface around some library from C land and using exceptions to signify any errors because I don’t want to deal with manual propagation of error codes up the stack and I always want to know when an error happens (see here). Manual error code propagation clutters the code, makes it harder to grasp and is prone to human errors. Modern C++ / OOP programming encourages proper object initialization in the constructor (as opposed to an additional Init() method) and exceptions are the only way to report errors from there (also note this Boost article).
Now some people don’t like exceptions in C++ but I found that most (not all) of their arguments are based on limitations in the old versions of the language or simply ignorance of how exactly the language works. Be sure to get familiar with recent development in the C++ language, most significantly the RAII pattern. Of course exceptions have some drawbacks too and some properties that need to be kept in mind (or your quantum computer brain):
  • Throwing and catching exceptions is s……l..o…w. If you expect this to happen often, for example in parsing code, you need error codes (or Expected, see below); see comparison.
  • Throwing an exception in a destructor is very destructive. It will probably crash your program (see here for an example).
  • Throwing an exception in the constructor means the object construction was cancelled and destructor won’t be called. That makes sense but also could mean that a delete m_obj; in the destructor may never be invoked (again, example here) even though you have already new’ed it in constructor. That is one more reason to use unique_ptr for member variables since these variables will be protected against a sudden death by exception from constructor.
  • Only one exception can be in flight for a thread at a moment. This means that the exception model cannot support async callback based programming à la Node.js or Python Twisted and you need to store exceptions manually (mentioned in this talk).
  • Exceptions can appear anywhere and there’s nothing in the code that will warn you about them.
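The point about constructors deserves a small demonstration. Both classes below are invented for this example. The counter tracks how many Buffer objects are alive, so a leak shows up as a number that never returns to zero.

```cpp
#include <memory>
#include <stdexcept>

static int g_live_buffers = 0;

struct Buffer {
    Buffer() { ++g_live_buffers; }
    ~Buffer() { --g_live_buffers; }
};

// Raw pointer member: if the constructor throws after the `new`, the
// destructor never runs and the Buffer is leaked.
class LeakyParser {
public:
    explicit LeakyParser(bool fail) : m_buf(new Buffer) {
        if (fail) throw std::runtime_error("bad input");
    }
    ~LeakyParser() { delete m_buf; }
    LeakyParser(const LeakyParser &) = delete;
    LeakyParser &operator=(const LeakyParser &) = delete;
private:
    Buffer *m_buf;
};

// unique_ptr member: members that were already constructed are destroyed
// during unwinding, so the Buffer is freed even though there is no
// destructor call for the SafeParser itself.
class SafeParser {
public:
    explicit SafeParser(bool fail) : m_buf(new Buffer) {
        if (fail) throw std::runtime_error("bad input");
    }
private:
    std::unique_ptr<Buffer> m_buf;
};
```

Constructing SafeParser(true) inside a try block leaves g_live_buffers at zero, while LeakyParser(true) leaves one Buffer behind for good.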
If exceptions are not an option, you’ll be returning error codes. I’d suggest a slightly more sophisticated approach. The Expected type can be returned by functions that are supposed to produce a result but could also fail. It has type parameters for the result value as well as for the error. Then (for example) a parsing library could use the following interface:
struct ParseError {
    int line, col;
    std::string expected;
};

Expected<int, ParseError> parseInt(std::string);
Expected<int, ParseError> parseHex(std::string);
Expected<Url, ParseError> parseUrl(std::string);
The Expected class also seems to support the error monad programming style but that would be for another day :)
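To make the interface less abstract, here is a deliberately simplified stand-in for such a type together with one parser. It is not the real Expected implementation: the success(), failure(), ok(), value() and error() names are my own, and it wastefully keeps both a value and an error around. The parser handles only unsigned decimal numbers on a single line and does not check for overflow.

```cpp
#include <string>
#include <utility>

struct ParseError {
    int line, col;
    std::string expected;
};

// Bare-bones stand-in for an Expected type: holds either a value or an error.
template<typename T, typename E>
class Expected {
public:
    static Expected success(T value) {
        Expected r;
        r.m_ok = true;
        r.m_value = std::move(value);
        return r;
    }
    static Expected failure(E error) {
        Expected r;
        r.m_ok = false;
        r.m_error = std::move(error);
        return r;
    }
    bool ok() const { return m_ok; }
    const T &value() const { return m_value; }   // only meaningful if ok()
    const E &error() const { return m_error; }   // only meaningful if !ok()
private:
    Expected() : m_ok(false), m_value(), m_error() {}
    bool m_ok;
    T m_value;
    E m_error;
};

// Parses an unsigned decimal number and reports the column of the first
// character that is not a digit.
Expected<int, ParseError> parseInt(std::string text) {
    if (text.empty())
        return Expected<int, ParseError>::failure(ParseError{1, 1, "digit"});
    int result = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] < '0' || text[i] > '9')
            return Expected<int, ParseError>::failure(
                ParseError{1, static_cast<int>(i) + 1, "digit"});
        result = result * 10 + (text[i] - '0');
    }
    return Expected<int, ParseError>::success(result);
}
```

The caller has to look at ok() before touching the value, and a failed parse costs no more than a normal return.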

3 1 Error mis-handling

Beginners tend to ignore errors, I still remember I was doing it. I could barely manage to write the code to do what I wanted in the first place. But if we’re talking about reliable software, ignoring errors is unacceptable. Quite the opposite, we need to know each and every error that happened, is happening or may happen.
Errors that have happened need to be logged using an appropriate logging framework and in some cases may even be stored in a database for further analysis or sent to a remote monitoring server. See more in Exception Driven Development.
Errors that are happening right now need to be detected. If calling an external library that doesn’t throw exceptions but returns an error code, check the returned code and throw, log or handle (retry?) the problem! From this point of view, exceptions (as opposed to returned error codes) really help because they are not ignored by default so the risk of forgetting to report an error is lower. But then again some newbies will very carefully put a catch block around each function so that they can ignore the valuable exception object that bears details about the problem.
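A tiny helper makes the "check the returned code and throw" habit cheap enough that there is no excuse to skip it. The legacy_write() function and its error codes are made up to stand in for a C library call.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical C-style function: returns 0 on success, an error code otherwise.
int legacy_write(int fd, const char *data) {
    if (fd < 0) return 9;            // made-up code for "bad descriptor"
    if (data == nullptr) return 22;  // made-up code for "invalid argument"
    return 0;
}

// Turns a non-zero error code into an exception that names the failed call.
void check(int code, const char *what) {
    if (code != 0)
        throw std::runtime_error(std::string(what) + " failed with code " +
                                 std::to_string(code));
}

void save(int fd, const char *data) {
    check(legacy_write(fd, data), "legacy_write");
}
```

Calling save(-1, "x") throws a std::runtime_error with the message "legacy_write failed with code 9", so the failure cannot be silently dropped.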
How about errors that may happen in the future? Try to anticipate possible problems but don’t try to recover, auto-repair or anything like that. Instead, check invariants and assumptions about the consistency of your data structures using assertions that report problems immediately. For example, if you have a class that is not thread-safe and you have designated it to be accessed only from a single thread, assert it (this is a trick I found in the Chrome source code):
void Gui::UpdateBlinkies() {
    assert(GetCurrentThread() == MainThread);

    m_blinkie ++;
}
The point is, again, to discover any problems, inconsistencies and unexpected situations as soon as possible because you are better able to debug and fix the problem. If the program crashes 10 minutes after the problem started, how can you trace the crash 10 minutes back to its original cause?
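If you want to try the single-thread assertion in portable C++, here is one possible sketch using std::thread::id. The ThreadChecker class and its method name are my own invention (loosely modelled on the idea, not copied from Chrome), and Gui is a toy stand-in.

```cpp
#include <cassert>
#include <thread>

// Remembers the thread it was created on and can tell whether the
// current caller is still that thread.
class ThreadChecker {
public:
    ThreadChecker() : m_owner(std::this_thread::get_id()) {}
    bool called_on_owner_thread() const {
        return std::this_thread::get_id() == m_owner;
    }

private:
    const std::thread::id m_owner;
};

class Gui {
public:
    void UpdateBlinkies() {
        // Fails fast (in builds with asserts enabled) when called
        // from any thread other than the one that created the Gui.
        assert(m_thread_checker.called_on_owner_thread());
        ++m_blinkie;
    }
    int blinkie() const { return m_blinkie; }

private:
    ThreadChecker m_thread_checker;
    int m_blinkie = 0;
};
```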
Having some hard-to-debug problem that a client reports but you can never see on your machines? Then you should have a log file that provides additional information. If even that doesn’t help, rather than spending a week trying to reproduce the problem on your machine, you can ship the customer a debug build that collects the information you need, or one that has asserts enabled. If you have used them well, that alone may be able to pinpoint the bug.

3.2 Exception safety

But no matter whether you choose exceptions or error codes to handle those unusual unhappy cases, you still need to be careful about exception (or error) safety. This means that you need to
1) release any resources that were acquired before an error
2) return the application into a consistent state (invariant safety)
Everybody should be pretty familiar with point #1, where doing some new BigObject() must always be followed by a delete even if you get an exception in between.
Point #2 is similar except that it is specific to your application invariants.
For example, if you are keeping some data in two structures and always need to update both of them on inserting, you need to make sure both things happen (or get rolled back) even if an exception is thrown:
void insert_both(string a, string b) {
    m_by_name.insert(a, b);
    // OMG, what if an exception happens here?
    m_by_addr.insert(b, a);
}
Handling this case can still be easy: just add a catch, remove the item and exit:
void insert_both(string a, string b) {
    m_by_name.insert(a, b);
    try {
        m_by_addr.insert(b, a);
    } catch (...) {
        m_by_name.remove(a, b);
        throw;  // exit by re-throwing so the caller still learns about the failure
    }
}
But if you need to do something like this twice in a function, it starts to get complicated. Fortunately, C++ provides an elegant and convenient way to take care of both requirements. Memory safety has already been described (remember unique_ptr). Invariant safety can be achieved with a ScopeGuard, which is a more flexible alternative to unique_ptr. There’s an implementation in Facebook’s folly library. For the above example with two maps, you could use it in the following way:
void insert_both(string a, string b) {
    m_by_name.insert(a, b);
    auto insert_guard = makeGuard([&] { m_by_name.remove(a, b); });

    m_by_addr.insert(b, a);

    // everything went fine, cancel the rollback
    insert_guard.dismiss();
}
If any exception occurs in the code, the insert_guard will execute the remove operation to restore the original state. If everything goes smoothly to the end of the function, the scope guard will be cancelled by the dismiss() call.
This way you can have nice linear code which is easy to understand even if there is more than one rollback. Just imagine the scope guard as “do this cleanup if anything goes wrong down there” as opposed to having nested try-catch clauses with many possible combinations of control flow.
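If you are curious how such a guard works inside, here is a deliberately simplified sketch. It is not folly’s implementation: it uses std::function for brevity, and the Directory struct with its fail_second_insert switch is a made-up test fixture that simulates a failure between the two inserts.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Runs the rollback action on scope exit unless dismiss() was called.
// The rollback itself must not throw because it runs in a destructor.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> rollback)
        : m_rollback(std::move(rollback)) {}
    ~ScopeGuard() {
        if (m_active) m_rollback();
    }
    ScopeGuard(const ScopeGuard &) = delete;
    ScopeGuard &operator=(const ScopeGuard &) = delete;
    void dismiss() { m_active = false; }

private:
    std::function<void()> m_rollback;
    bool m_active = true;
};

struct Directory {
    std::map<std::string, std::string> by_name, by_addr;
    bool fail_second_insert = false;  // hypothetical switch to simulate an error

    void insert_both(const std::string &name, const std::string &addr) {
        by_name[name] = addr;
        ScopeGuard undo_name([&] { by_name.erase(name); });

        if (fail_second_insert) throw std::runtime_error("simulated failure");
        by_addr[addr] = name;

        undo_name.dismiss();  // success: keep both entries
    }
};
```

Note that this toy rollback simply erases the key, so it would also wipe an entry that existed before the call; a real implementation would have to restore the previous value.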

3.2.1 Testing exceptions

When we test our code, we usually focus mostly on the ‘happy path’ where everything goes as planned and the edge cases receive less attention. But if we want to have truly reliable code, even those error or edge cases deserve some attention. If you use code coverage tools, they will keep flashing their red warnings in the exception handlers at you until you add them to the ignore list (err, I mean, fix them).
Proper unit testing (covered later) of course also requires testing the error and edge cases. You should try to come up with possible incorrect inputs (or problematic program states) and know, for each of them, how the program should handle it. And write this down in a unit test. In this way both the “happy” and failure behaviours of the function are well documented and verified to be correct.
This approach for unit-level testing is well established. On a coarser scale, there is another technique where we artificially throw exceptions and check that they are handled appropriately. We can throw exceptions at various places in the program and check general properties such as whether it causes memory leaks, memory corruption, or crashes. This is only relevant in languages such as C++ which have those memory issues by default.
To automate this, you would put instrumentation points at interesting places in your program. Then you run your program or test suite over and over, triggering these instrumentation points in sequence. If each run of your program is deterministic (it takes the same path each time), you will have triggered each of the N points in the end, after running the test suite N times.
Since this is a sort of mass exception injection approach, we cannot test for specific behaviour in specific cases, only for the overall response to exceptions. Memory correctness will be the most typical case. Another one could be ensuring that all those exceptions are properly logged. This method is very useful if you’re creating a binding to another programming language such as Java or Python or even plain old C. Typically you need to catch exceptions in the C++ world and translate them somehow into exceptions or at least error codes in the target language without messing up the memory or exception safety.
You also need to run this under a memory checker such as Valgrind, Asan or PageHeap which will inform you if any memory leak or access violation occurred. If all goes smoothly, you’ll know that exceptions can’t mess with you. It also probably means that you used RAII and unique_ptr correctly because without them it’s hard to make memory management right in the face of exceptions.
This approach has also been described in Exception-Safety in Generic Components.
This is how you may implement it:
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

using namespace std;

// Once placed in code, it can be redefined to do different type
// of instrumentation such as heap consistency checking.
// NOTE: most likely, this should be disabled in release build
#define INSTRUMENTATION_POINT { ExceptionInstrument::instrument(__FILE__, __LINE__); }

class ExceptionInstrument {
public:
    static ExceptionInstrument &instance();
    static void dispose();

    bool should_throw();
    void maybe_throw(const std::string &file, int line);
    static void instrument(const std::string &file, int line);

    void next_run();
    void set_run(int no) { m_run_id = no; }

private:
    ExceptionInstrument() : m_run_id(0), m_counter(0), m_threw_cnt(0) {}

    // singleton
    static ExceptionInstrument *g_instance;

    std::string get_filename_base(const std::string &path) const;

    int m_run_id;     // index of the point that throws in the current run
    int m_counter;    // number of points passed so far in the current run
    int m_threw_cnt;  // number of exceptions thrown in the current run
};

ExceptionInstrument *ExceptionInstrument::g_instance = NULL;

void ExceptionInstrument::dispose() {
    if (g_instance != NULL) {
        delete g_instance;
        g_instance = NULL;
    }
}

void ExceptionInstrument::maybe_throw(const std::string &file, int line) {
    string basename = get_filename_base(file);
    stringstream ss;
    ss << "Instrumentation exception F " << basename << " L " << line;

    if (should_throw()) throw std::runtime_error(ss.str());
}

void ExceptionInstrument::instrument(const std::string &file, int line) {
    instance().maybe_throw(file, line);
}

ExceptionInstrument &ExceptionInstrument::instance() {
    if (g_instance == NULL) {
        g_instance = new ExceptionInstrument();
    }
    return *g_instance;
}

std::string ExceptionInstrument::get_filename_base(const std::string &path) const {
    // handle both Windows and Unix path separators
    string::size_type pos = path.find_last_of("/\\");
    if (pos == string::npos) {
        return path;
    }
    // skip the separator itself
    return path.substr(pos + 1);
}

bool ExceptionInstrument::should_throw() {
    // throw only at the point whose index matches the current run
    bool res = false;
    if (m_counter == m_run_id) {
        res = true;
        m_threw_cnt += 1;
    }
    m_counter += 1;
    return res;
}

void ExceptionInstrument::next_run() {
    // this is called from Java, do nothing if instrumentation is disabled
    if (m_threw_cnt == 0) {
        cerr << "Instrumentation run " << m_run_id << " threw no exceptions." << endl;
    }
    m_counter = 0;
    m_threw_cnt = 0;
    m_run_id += 1;
}
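To show how the runs fit together, here is a self-contained miniature of the same idea with a driver loop. The Injector struct, build_report and run_all_injections are hypothetical names used only for this sketch; the real thing would drive your whole test suite instead of one function.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

// Miniature injector: in run number run_id, the run_id-th point throws.
struct Injector {
    int run_id = 0;   // which injection point should throw in this run
    int counter = 0;  // how many points were passed so far in this run
    void point() {
        if (counter++ == run_id) throw std::runtime_error("injected");
    }
};

// Code under test with two injection points. The allocations are owned
// by unique_ptr, so nothing leaks no matter which point throws.
void build_report(Injector &inj, std::vector<int> &out) {
    std::unique_ptr<int> a(new int(1));
    inj.point();
    std::unique_ptr<int> b(new int(2));
    inj.point();
    out.push_back(*a + *b);
}

// Driver: repeat the deterministic run, moving the throwing point one
// step further each time, until a run finishes without any exception.
// Returns the number of runs that threw.
int run_all_injections() {
    Injector inj;
    int threw = 0;
    for (;; ++inj.run_id) {
        inj.counter = 0;
        std::vector<int> out;
        try {
            build_report(inj, out);
            break;  // no point fired, so every point has been exercised
        } catch (const std::runtime_error &) {
            ++threw;
            // General property to verify: a failed run leaves no partial output.
            if (!out.empty()) throw std::logic_error("partial output after failure");
        }
    }
    return threw;
}
```

Run this under a memory checker and each of the two failing runs gets checked for leaks and corruption for free.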
Another technique is using what I call explosive mocks. They work as ordinary mocks but throw a random exception when called. They may help you test correct handling of exceptions that you didn’t expect when first writing the code. For example, when connecting to a network API, all kinds of things can go wrong, from network and DNS problems to authentication, API changes, invalid parameters, … It’s not a very systematic method but can be useful as exploratory testing to find bugs.
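A tiny sketch of an explosive mock, assuming an imaginary WeatherApi interface (all names here are hypothetical). The seed makes the "random" exception reproducible, which you will appreciate when a run finds a bug.

```cpp
#include <random>
#include <stdexcept>
#include <string>

// Imaginary interface that the code under test depends on.
struct WeatherApi {
    virtual ~WeatherApi() = default;
    virtual std::string forecast(const std::string &city) = 0;
};

// Explosive mock: every call throws one of several exception types,
// picked pseudo-randomly from the given seed.
class ExplosiveWeatherApi : public WeatherApi {
public:
    explicit ExplosiveWeatherApi(unsigned seed) : m_rng(seed) {}
    std::string forecast(const std::string &) override {
        switch (m_rng() % 3) {
            case 0: throw std::runtime_error("DNS lookup failed");
            case 1: throw std::invalid_argument("unknown city");
            default: throw std::out_of_range("response truncated");
        }
    }

private:
    std::mt19937 m_rng;
};

// Code under test: must survive any failure of the API.
std::string forecast_or_unknown(WeatherApi &api, const std::string &city) {
    try {
        return api.forecast(city);
    } catch (const std::exception &) {
        return "unknown";
    }
}

// Exploratory check: try many seeds and see whether anything escapes.
bool survives_explosions(int attempts) {
    for (int seed = 0; seed < attempts; ++seed) {
        ExplosiveWeatherApi api(static_cast<unsigned>(seed));
        if (forecast_or_unknown(api, "Oslo") != "unknown") return false;
    }
    return true;
}
```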
QUESTION: how to handle exceptions in message loops / GUI / … in a way that is debuggable, readable, testable?

4 Automated testing

I use automated unit tests and (more or less) automated integration tests where it makes sense. This is a big and complicated topic and I have an article with a few of my observations in progress. Make sure to keep your interpipes to this blog clean so you don’t miss it ;)

5 Tooling for correctness

In no way complete or sufficient but this is what I use.

5.1 Address Sanitizer (+ more)

If you write in C++ (or, god forbid, C), you’re going to have memory bugs. Well, unless you are using Address Sanitizer, or Asan. These kinds of bugs can cause your program to crash, which is a little annoying to see. But in some cases a crash can lead to a Remote Code Execution exploit. RCE is basically when a snake whisperer (hacker) convinces your program to start executing some code the hacker offered. So yeah, that’s a little less convenient, especially when they use it to steal your money or data.
Asan helps you catch these bugs. Also memory leaks. It can’t detect every problem with your code; it only detects problems in code that you actually execute, so you still need a good test suite. In the code that does run, it catches invalid memory accesses (segfaults, for Unix folks) such as buffer overflows and use-after-free, and prints beautiful coloured output in the console. And since everybody loves coloured console output (and not having segfaults), I hereby endorse using Asan for everything.
Originally developed for the clang compiler, it’s now available for gcc as well (sorry for people stuck with gcc 2.x)(I wonder if that gcc version is written on papyrus?). Detailed usage is here but in short, you will need to create a new build variant for your project that will generate an Asan-instrumented build with the -fsanitize=address flag. This build will be around 2x slower and will abort immediately when an access violation is detected. It does not report false positives, so any abort is something you’ll have to fix.
You may have used Valgrind. For memory access violations, Asan is similar but works better. It does require you to recompile the code with Asan enabled but then it’s much more accurate.

Asan output

5.2 afl-fuzz

So, awesome as it is, Asan won’t catch problems in obscure code branches that don’t get executed. One thing you can be sure of: hackers will try to find them and run them so that they can pwn your machine and steal your candy. Here come fuzzers, tools that are designed specifically to execute code paths that normally never see the light of day. They do it by running your code, like, a million times, each time with a slightly different input and observing whether it caused some different behaviour in your code.
This is mostly suitable for programs that read and parse some input files such as images, videos, PDFs or even antivirus software reading .exe files.
See, it has colours
afl-fuzz is one such program, free, open source and pretty good. It’s not complete magic though. You need to adjust your project to do nothing but read the input data and sometimes you need to help the fuzzing process a bit with a hint. afl-fuzz works by inserting instrumentation during compilation that informs the fuzzer where the control flow is going. Based on this instrumentation, it tries to alter the input data to find new control flow paths. And when it finds a control flow branch that crashes your code, it’ll happily return the bad input. Programmers will take that sample input to go and fix the bug, hackers will take that sample to develop an exploit.
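The "do nothing but read the input data" part could look roughly like the sketch below. parse_record is a made-up toy parser standing in for your real one, and fuzz_one / fuzz_bytes are hypothetical helper names. In the actual harness, main would just do return fuzz_one(std::cin); and you would build it with afl’s compiler wrapper so the instrumentation gets inserted.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Toy parser under test: the first byte is the payload length,
// followed by the payload itself.
std::string parse_record(const std::string &data) {
    if (data.empty()) throw std::invalid_argument("empty input");
    std::size_t length = static_cast<unsigned char>(data[0]);
    if (length > data.size() - 1) throw std::invalid_argument("length exceeds payload");
    return data.substr(1, length);
}

// Harness body: read the whole input, feed it to the parser, nothing else.
int fuzz_one(std::istream &in) {
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    try {
        parse_record(data);
    } catch (const std::invalid_argument &) {
        // Cleanly rejecting malformed input is expected behaviour, not a bug.
        // Real bugs show up as crashes or sanitizer aborts instead.
    }
    return 0;
}

// Convenience wrapper for trying a single input by hand.
int fuzz_bytes(const std::string &bytes) {
    std::istringstream in(bytes);
    return fuzz_one(in);
}
```

Combining the fuzzing build with Asan makes sense, because memory errors that would otherwise go unnoticed turn into aborts that the fuzzer can detect.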

5.3 Catch

As far as automated testing frameworks for C++ are concerned, there’s quite a choice. You’ve got the one from Google, Boost, even Visual Studio comes with one. I can’t compare them but I tried Catch and enjoyed it very much. So besides the #1 requirement of coloured console output, it has the benefit of being very light and easy to include in the project (just one header file!) and very easy to use.
To assert, you would simply write
REQUIRE( factorial(2) == 2 )
and it will automatically deconstruct it into two sides of the == operator, showing expected and actual if it doesn’t match. Testing this way is much more natural than the classic Assert.AreEqual(factorial(2), 2) or even Assert.That(factorial(2)).Equals(2) or whatever the latest fad in fluent interfaces is.
The BDD style for organizing tests has been the most convenient from what I’ve seen so far. You can have a hierarchy of test conditions, delimited using SCENARIO, GIVEN, WHEN, THEN and at each step of the hierarchy, you can set up some objects that will be used in levels below. The test framework will then take this tree and run each path independently. Let me give an example with a completely imaginary API:
SCENARIO("web service test", "[web][http]") {
  WebServiceFake fake;
  RestClient client(fake.url());

  GIVEN("authenticated client") {
    // authentication set-up for the imaginary API would go here

    WHEN("makes request about self") {
      Request r(client.new_request("/user/pete"));

      THEN("gets its data") {
        REQUIRE(r.json().get("salary") == 123);
      }
    }
  }
  GIVEN("guest client") {
    WHEN("makes request about pete") {
      Request r(client.new_request("/user/pete"));

      THEN("gets nothing") {
        REQUIRE(r.json().count() == 0);
      }
      THEN("error is reported") {
        REQUIRE(r.status() == 403);
      }
    }
  }
}
In this example, both “authenticated client” and “guest client” will be run separately, with a fresh instance of client object each time. In all but the most basic unit tests, we have to deal with setting stuff up and this layered structure is really helpful because it helps avoid duplication while putting the code where it’s easy to see.

5.4 PageHeap

While I believe there are plans to make Address Sanitizer available for Windows, at the time of writing that port was not yet ready. PageHeap is a debugging tool built into Windows that can be used to detect buffer overflow errors. It’s not as versatile as Asan but it also helped save my code’s neck a few times (was particularly useful to catch a bug at the boundary of C# and C++ code). It doesn’t require you to recompile the code, you just enable it for a particular program using gflags.exe available with the Windows SDK. It works by putting each allocation at the end of a virtual memory page which allows the OS to catch any access over the page boundary.

5.5 Other tools

  • WinDbg is a very powerful debugger with a bunch of scripts and extensions available. For source code based debugging, Visual C++ is pretty sufficient because you can see everything. But WinDbg sure comes in handy when you don’t have the code and need to debug issues outside your own code or have problems calling closed source or system libraries. On second thought, you probably don’t want to end up digging there unless you enjoy this kind of self-punishment.
  • radare2 looks pretty rad for digging in assembly. Sadly I didn’t have much time to play around with it. Yes I seem to enjoy this kind of self-inflicted pain.
  • rr is a project from Mozilla that lets you record a program run and then debug the bug out of it by running it over and over and over until you find it.
  • F* is the absolute heavy-weight here. It lets you write code, prove that it’s absolutely correct and then translate it to C/C++. Except for the part where you have to be a genius to prove the correctness of any larger program.

6 That’s it?

I’ve tried to compile my approach to not shooting yourself in the foot while coding in C++. Note that while it’s not exactly short, it still doesn’t cover everything, for example how not to shoot yourself in your hand, knee or the back of your neck.
The story is not over though, perhaps you, dear readers, can reveal some tricks you have up your sleeve? Discuss!

Monday, October 17, 2016

Digging into browser CSPRNG

Browsers nowadays support the window.crypto.getRandomValues() API for obtaining cryptographically secure random values suitable for generating private keys and session tokens. And while it’s questionable whether in-browser JavaScript crypto is really secure (it still requires a flawless TLS configuration; forget about encrypting stuff without HTTPS enabled), more and more clients and customers ask me to implement crypto in the browser. Oh well.

Here is a quick review of how the getRandomValues() API is implemented in open source browsers, as of 18 October 2016, so that we can be sure that nothing shady or lame (such as running current time through the Mersenne Twister) is going on.


Here it’s pretty straightforward. The first function file is the implementation of the API and it goes directly to the OS.
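For reference, "going directly to the OS" can be as simple as the following sketch. This is not the browser’s code; it assumes a Unix-like system that exposes the kernel CSPRNG as /dev/urandom, and os_random_bytes is a name I made up.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <vector>

// Reads `count` random bytes straight from the kernel CSPRNG.
// Assumption: /dev/urandom exists (Linux, macOS, BSDs).
std::vector<unsigned char> os_random_bytes(std::size_t count) {
    std::ifstream urandom("/dev/urandom", std::ios::in | std::ios::binary);
    if (!urandom) throw std::runtime_error("cannot open /dev/urandom");

    std::vector<unsigned char> buffer(count);
    urandom.read(reinterpret_cast<char *>(buffer.data()),
                 static_cast<std::streamsize>(count));
    // Never hand out a partially filled buffer as if it were random.
    if (static_cast<std::size_t>(urandom.gcount()) != count) {
        throw std::runtime_error("short read from /dev/urandom");
    }
    return buffer;
}
```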


Chromium has their own fork of WebKit but essentially it’s the same story. It uses a global random number generator in base namespace.


Uh … here it’s, uhm… complicated?

The journey starts in the cpp file responsible for that JS function:

Here it’s invoking the random number generation service, “”. Well, uh, hope it can’t be overridden from chrome JavaScript or something.
The service implementation is here and it calls PK11_GenerateRandomOnSlot:

This one calls C_GenerateRandom here (or possibly other PKCS11 implementations,

There is a deterministic random byte generator here. It calls RNG_SystemRNG once on boot to init its internal state.

The Windows implementation calls RtlGenRandom instead of CryptGenRandom, which is the official CSPRNG API on Windows. Although the docs don’t say it is crypto-safe, it is used by rand_s from the CRT and that is documented to be crypto-secure.

Live Action

Then I attached a debugger, put breakpoints at the critical points and ran the crypto JS API in a loop. Here we can see it goes through the path as expected:

>   freebl3.dll!prng_generateNewBytes(RNGContextStr * rng, unsigned char * returned_bytes, unsigned int no_of_returned_bytes, const unsigned char * additional_input, unsigned int additional_input_len) Line 338   C
    freebl3.dll!prng_GenerateGlobalRandomBytes(RNGContextStr * rng, void * dest, unsigned int len) Line 642 C
    freebl3.dll!RNG_GenerateGlobalRandomBytes(void * dest, unsigned int len) Line 659   C
    nss3.dll!PK11_GenerateRandomOnSlot(PK11SlotInfoStr * slot, unsigned char * data, int len) Line 2247 C
    xul.dll!nsRandomGenerator::GenerateRandomBytes(unsigned int aLength, unsigned char * * aBuffer) Line 37 C++
    xul.dll!mozilla::dom::Crypto::GetRandomValues(JSContext * aCx, const mozilla::dom::ArrayBufferView_base<&js::UnwrapArrayBufferView,&js::GetArrayBufferViewLengthAndData,&JS_GetArrayBufferViewType> & aArray, JS::MutableHandle<JSObject *> aRetval, mozilla::ErrorResult & aRv) Line 105   C++
    xul.dll!mozilla::dom::CryptoBinding::getRandomValues(JSContext * cx, JS::Handle<JSObject *> obj, mozilla::dom::Crypto * self, const JSJitMethodCallArgs & args) Line 70 C++

And seeding the DRBG from the OS once on startup:

>   freebl3.dll!rng_init() Line 419 C
    nss3.dll!PR_CallOnce(PRCallOnceType * once, PRStatus(*)() func) Line 779    C
    freebl3.dll!RNG_RNGInit() Line 495  C
    nss3.dll!secmod_ModuleInit(SECMODModuleStr * mod, SECMODModuleStr * * reload, int * alreadyLoaded) Line 232 C
    nss3.dll!secmod_LoadPKCS11Module(SECMODModuleStr * mod, SECMODModuleStr * * oldModule) Line 480 C
    nss3.dll!SECMOD_LoadModule(char * modulespec, SECMODModuleStr * parent, int recurse) Line 1537  C
    nss3.dll!SECMOD_LoadModule(char * modulespec, SECMODModuleStr * parent, int recurse) Line 1572  C
    nss3.dll!nss_InitModules(const char * configdir, const char * certPrefix, const char * keyPrefix, const char * secmodName, const char * updateDir, const char * updCertPrefix, const char * updKeyPrefix, const char * updateID, const char * updateName, char * configName, char * configStrings, int pwRequired, int readOnly, int noCertDB, int noModDB, int forceOpen, int optimizeSpace, int isContextInit) Line 436   C
    nss3.dll!nss_Init(const char * configdir, const char * certPrefix, const char * keyPrefix, const char * secmodName, const char * updateDir, const char * updCertPrefix, const char * updKeyPrefix, const char * updateID, const char * updateName, NSSInitContextStr * * initContextPtr, NSSInitParametersStr * initParams, int readOnly, int noCertDB, int noModDB, int forceOpen, int noRootInit, int optimizeSpace, int noSingleThreadedModules, int allowAlreadyInitializedModules, int dontFinalizeModules) Line 638   C
    nss3.dll!NSS_Initialize(const char * configdir, const char * certPrefix, const char * keyPrefix, const char * secmodName, unsigned int flags) Line 812  C
    xul.dll!mozilla::psm::InitializeNSS(const char * dir, bool readOnly, bool loadPKCS11Modules) Line 976   C++
    xul.dll!nsNSSComponent::InitializeNSS() Line 1742   C++
    xul.dll!nsNSSComponent::Init() Line 1948    C++
    xul.dll!nsNSSComponentConstructor(nsISupports * aOuter, const nsID & aIID, void * * aResult) Line 174   C++
    xul.dll!nsComponentManagerImpl::CreateInstanceByContractID(const char * aContractID, nsISupports * aDelegate, const nsID & aIID, void * * aResult) Line 1203    C++
    xul.dll!nsComponentManagerImpl::GetServiceByContractID(const char * aContractID, const nsID & aIID, void * * aResult) Line 1561 C++
    xul.dll!nsCOMPtr_base::assign_from_gs_contractid(const nsGetServiceByContractID aGS, const nsID & aIID) Line 103    C++
    xul.dll!nsCOMPtr<nsINSSComponent>::nsCOMPtr<nsINSSComponent>(const nsGetServiceByContractID aGS) Line 541   C++
    xul.dll!EnsureNSSInitialized(EnsureNSSOperator op) Line 196 C++


In summary, the browsers behave as expected, providing random numbers seeded by the OS crypto-safe random number generator.

Thursday, June 2, 2016

My list of high-quality online resources

The internet is a tremendous library of information but finding the signal among all the noise is hard work. I think everybody gradually builds their own go-to list of trusted sites and sources and I think it would be a great idea to share so that we can all benefit.

Health & Nutrition

It is critical that health information is correct, and unfortunately it is one of the areas most likely to be full of bulls**t and charlatans. I'm trying to follow only properly scientifically grounded sources.
Note: never use Google to search for health issues. As you probably know, Google and other big companies are collecting information about all their users, and your health issues are something you really don't want anyone to know about. Instead, use the Privacy / Incognito mode in your browser and search using Duck Duck Go or Disconnect.Me.

Privacy & Security

  • Decent Security: basic tips for Windows users
  • Google Safety Center: while we know that Google is after all your private data, they are also doing a good job of preventing any malicious 3rd parties from getting it from you
  • Signal App: one of the only easy & secure ways to communicate completely privately. Android, iOS and desktop supported

Technology & Science

  • Mozilla Developer Network: the ultimate web technology reference. Forget about w3schools, it's outdated or even incorrect
  • S. Hanselman's Ultimate Tool list: not only for developers
  • Cosmos: A Spacetime Odyssey: a modern documentary TV series which explains a range of scientific topics. Absolutely beautiful visually with a catching commentary from Neil deGrasse Tyson
  • National Geographic: with a long tradition of documenting and protecting the environment. Unfortunately, a majority stake in it was bought by 21st Century Fox (the parent company of Fox News) last year :(
  • Earth Wind Map: A cool map of the weather. I didn't verify whether their data is correct.
  • GoodReads: At first I thought just another Social Network for X is not useful but I was wrong. This is a perfect place to steal ideas on what to read from friends with similar interests. And of course, books are still the ultimate fountains of knowledge.

Language & Writing

  • Corpus Of Contemporary American English: need to check if a certain phrase makes sense in real world English? Just enter it here and avoid making a fool of yourself with made-up expressions
  • no other resources, that's why the writing in this blog sucks so much

Environment & Charity

  • Charity Navigator: want to help change the world but not sure if a certain charity is real and is using your money properly? Charity Navigator can help shine some light on its internal operation.

Friday, March 11, 2016

How to debug neovim python remote plugin

I really really really want to have debugger integration with my Vim setup and while the plugins for old Vim were a little wacky, the new architecture of NeoVim seems promising, so I decided to give lldb.nvim a go.

It didn't work. This is the (epic boring) story of how I debugged and fixed the issues.

Step 1: update

Update your neovim to the latest release to avoid fighting issues that have already been solved. At the time of writing, I used:
  • nvim 0.1.2 from Homebrew
  • OS X 10.10.5
  • Xcode 7.0
  • lldb-340.4.70

Step 2: Diagnose

PyThreadState_get error

If you're on OS X, chances are you have more than one Python version installed and that's where the trouble comes from. If you get this error message

>>> import lldb
Fatal Python error: PyThreadState_Get: no current thread

it's most likely because you're trying to import a module that has been linked with a different version of Python. The lldb module comes with the Xcode developer tools and was linked with the default system version of Python which lives in (remember this)


so this is the python version you should use to run the lldb.nvim remote plugin. On my system, BTW, lldb module lives in


Step 3: Install neovim Python module

The neovim module has probably already been installed with neovim but perhaps not in the correct Python version. You can try to import neovim in the system Python. If it fails, you'll need to install it using easy_install or pip:

sudo /System/Library/Frameworks/Python.framework/Versions/2.7/bin/python -m easy_install neovim

This will install the neovim package into the system Python distribution (needs sudo) using the easy_install tool.

Step 4: Configure neovim to use the system Python

In your neovim config file, add this line:

let g:python_host_prog = '/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python'

This ensures that neovim will start the system Python (which has access to lldb and neovim modules) to host the plugin. After this, you should be all set!

Step 5: Using $PYTHONPATH?

If you do use $PYTHONPATH with your non-system Python, you'll have trouble as well. Before launching the system Python from nvim, you'll need to reset this variable, otherwise the packages on it will interfere with the system Python's packages.

I do that using a small wrapper script ~/syspython2 which gets invoked from nvim as the g:python_host_prog

# running the OS X system python. Required to import the lldb module.
export PYTHONPATH="/Applications/"

# enable these for debugging
#echo "--" >> ~/syspython2.log
#echo "$@" >> ~/syspython2.log
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python "$@"

Step 6: Diagnose

Still having trouble?

  • Don't forget to run :UpdateRemotePlugins
  • Enable logging in the ~/syspython2 script
  • Check using pstree | less if neovim is launching the correct Python binary
  • Double-check you can import neovim and lldb modules from the system Python
  • Make sure lldb.nvim is installed correctly - the file lldb.nvim/rplugin/python/ must exist
  • NeoVim also tries to load Python 3 plugins, you may need to do the same for Python 3
  • Try to debug /usr/local/Cellar/neovim/0.1.2/share/nvim/runtime/autoload/remote/host.vim using Vim's script debugging methods
  • More info about lldb Python module here on StackOverflow