Hash changes in Qt 5

Back in 2003 Crosby and Wallach figured out that certain implementation of hash tables were vulnerable to a family of DOS attacks.

The idea is that an attacker can carefully choose a number of different keys that get hashed in the same bucket of the table, by exploiting the fact that the attacker always knows the exact implementation of both the hash table and the hash function over the keys types.

For obvious efficiency reasons, the hash functions that are generally used are not cryptographically good — it’s quite the opposite: they are linear and it’s computationally feasible to find such bucket collisions.

In the end, the purpose of the attack is to make the hash table degenerate in very long chains in a few buckets, so that the table starts showing the worst-case algorithmic behaviour (O(n) instead of O(1)). That’s usually enough to slow down the application (and the machine) to unacceptable levels — i.e., a denial of service.

Did you say 2003?

Yep. Since the attack was against perl (the implementation, not the language), it got fixed quite early there (in perl 5.8.1, released in Sept. 2003).

What about the other implementations in other languages implementations / libraries? Well…

Yes, everybody forgot about that attack — at least until CCC 2011, where it was shown that most hash table implementations were still vulnerable. Panic and chaos ensued.

The bottom line of the story is that QHash in Qt 4 was and is vulnerable as well.

And the fix is…

A strategy to defeat this attack is breaking the predictability of the bucketing by salting the hash function.

A random seed can be chosen per-process (or even per-hash table), and then used to seed the calculation: the attacker doesn’t know then how to create bucket collisions because she can’t predict in which bucket a certain key will end up in.

This is what’s currently being merged in Qt 5.0.

The actual change is done in two steps:

  1. changing the qHash hashing function signature;
  2. add a per-process random seed that salts the hash function.

In Qt 5, for a type T, it will be possible to use any of these functions for calculating the hash value:

uint qHash(T t);
uint qHash(const T &t);
uint qHash(T t, uint seed)
uint qHash(const T &t, uint seed);

This is going to be fully source compatible with Qt 4: if a two-arguments version exist, it will be picked up and used by QHash. Otherwise, QHash will use the Qt 4-style, one-argument overload. The seed will then be XORed with the result to perturbate it.

Moreover, the overloads for the Qt classes will also have the seed defaulted to 0, so that it’s still possible for your code to simply call qHash(val) without passing a second argument.

NEWS AT 11

Well, no. The fact is, the fix isn’t harmless. And I’ll show you why.

Most of the work I had to do wasn’t about changing QHash; it was about finding and fixing all the places in Qt that relied on a specific iteration order on QHash elements.

Those bugs are subtle to find and usually denote something very wrong in the first place. Even in Qt 4, with a fully deterministic implementation, the order of iteration upon two identical QHashes was ultimately depending on the “history” of those two objects:

#include <QtCore>

int main()
{
   QHash<int, int> h1, h2;
   h2.reserve(100);
   h1[67003] = h2[67003] = 0;
   h1[101449] = h2[101449] = 1;
   QHash<int, int>::const_iterator i;
   for(i = h1.constBegin(); i != h1.constEnd(); ++i)
       qDebug() << i.key() << i.value();
   for(i = h2.constBegin(); i != h2.constEnd(); ++i)
       qDebug() << i.key() << i.value();
}

This prints (in Qt 4.8.0):

67003 0
101449 1
101449 1
67003 0

Qt has never given any guarantee upon QHash iteration order, since that depends on:

  • the qHash implementation (that can change at any time);
  • the QHash implementation (that can change at any time);
  • the history of each QHash object (which methods were called, with which arguments)
  • (in Qt 5) the random seed applied to qHash (changes in every process).

If your code is depending on a specific QHash ordering, or on qHash results to be the same across Qt versions, then — I’m sorry — your code is going to break.

Test that your code does not depend on qHash or QHash to be stable (for instance, recompile your Qt and change the qHash(QString) function in qhash.cpp to bitwise negate its output).

And if you find bugs, please fix them as soon as possible. Qt 5.0 is not going to wait for you!

Read the patient information leaflets

Is that all? Well, no.

As an extra cherry on top, in the last few months Robin played with different hash functions for strings / byte arrays.

The result of his work is that the qHash(QString) and qHash(QByteArray) are going to change in Qt 5.0 in favour of (a variant) of DJB’s hash function (namely, the one used in Java for the hashCode() of java.lang.String).

The result is that now qHash(QString) and qHash(QByteArray) are up to ~33% faster than their Qt 4 implementations. :-)

Advertisements

4 Responses to “Hash changes in Qt 5”

  1. atomopawn Says:

    Good work! I’ve been leery of Qt5 (I’m just getting the hang of Qt4!), but performance and security improvements like these are going to make it easy (and fun?) to switch.

  2. thorbjorn81 Says:

    It would be cool if you could link to some of the fixes you had to make to code relying on specific order. At least I’m having trouble imagining in which situation this would happen since I guess normally code would either not require any order at all, or it would require it to be sorted (in which case a QMap could be used instead).

Comments are closed.