Apache Password Storage

Updating my password on an SVN server, I happened to forget the parameters to htpasswd
and actually looked at the usage message.  It contains this rather interesting line buried at the
bottom (Solaris 11.1, derived from Apache 2.2.22):

The SHA algorithm does not use a salt and is less secure than the MD5 algorithm.

Obviously, by "X algorithm" they mean "the overall process of taking a password and applying our password hashing procedure, which incorporates hash algorithm X".

Indeed, it's true: there's no salt.  Generate an Apache password file entry for user myName, password myPassword:

igb@mail:~$ htpasswd -nbs myName myPassword
myName:{SHA}VBPuJHI7uixaa6LQGWx4s+5GKNE=

and repeat the same task using simple hashing:

igb@mail:~$ echo -n myPassword | openssl dgst -sha1 -binary | openssl enc -base64
VBPuJHI7uixaa6LQGWx4s+5GKNE=

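to see that they're the same.  I wonder how many sites quickly thought "MD5 is a bit broken, SHA1 is better"?  In fact, because the SHA1 entries are unsalted, a dictionary search against a file of 1000 SHA1 hashes is at least 1000 times easier: each candidate word is hashed once and compared against every entry, rather than being hashed afresh for each entry's salt.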
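As a rough illustration of both points, here is a minimal Python sketch using only the standard library. The function names (sha_entry, crack_unsalted) and the sample users and passwords are my own inventions for the example, not anything from Apache. It builds the same kind of {SHA} entry and then shows why the missing salt matters: each candidate word is hashed once, however many entries are in the file.

```python
import base64
import hashlib


def sha_entry(user: str, password: str) -> str:
    """Build an htpasswd-style {SHA} line: base64 of the raw, unsalted SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return f"{user}:{{SHA}}{base64.b64encode(digest).decode('ascii')}"


def crack_unsalted(entries: list[str], wordlist: list[str]) -> dict[str, str]:
    """Hash each candidate once and look it up against every stolen entry."""
    # Index the stolen file by hash; users sharing a password share a hash.
    by_hash: dict[str, list[str]] = {}
    for line in entries:
        user, hashed = line.split(":", 1)
        by_hash.setdefault(hashed, []).append(user)

    found: dict[str, str] = {}
    for word in wordlist:
        candidate = sha_entry("", word).split(":", 1)[1]  # one hash per word
        for user in by_hash.get(candidate, []):
            found[user] = word
    return found


if __name__ == "__main__":
    # Compare this line with the htpasswd -nbs output above.
    print(sha_entry("myName", "myPassword"))

    # Hypothetical stolen file: two users happen to share a weak password.
    stolen = [
        sha_entry("alice", "letmein"),
        sha_entry("bob", "letmein"),
        sha_entry("carol", "x9!kQ"),
    ]
    print(crack_unsalted(stolen, ["password", "letmein"]))
```

With a per-user salt the lookup table trick stops working, because every candidate would have to be re-hashed for every entry.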


The iterated MD5 that is used if you select that option is here: 


It does have good salting, and is clearly the better option.  It at least generates different outputs each time it is called with the same input!

igb@mail:~$ for i in 1 2 3 4; do htpasswd -nbm myName myPassword | head -1; done
myName:$apr1$8wUtj8FR$m2OfIoNqjJVkNYjAhwZ25.
myName:$apr1$3qS0CJOD$ONOeHyqTIUPgnMBKH9XCW0
myName:$apr1$g4igyO/H$XJIx4OHNIDxDD4m3Q5vNj1
myName:$apr1$95GXXu0V$wPM3zi/BLMVgpGJI4KNAC/
igb@mail:~$ 

However, that too makes one wonder.  It uses a sort-of iterated MD5: it doesn't repeat the whole algorithm, complete with finalisation; rather, it iterates over the password, the salt and some fixed strings, calling the hash update function each time.  Unless I'm missing something, the way the code is written means that running apr_md5_update repeatedly is equivalent to building a buffer containing the concatenation of the successive strings and calling it once, which is ripe for hardware acceleration.
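The general property I'm leaning on, that feeding a hash several updates is the same as feeding it one concatenated buffer, is easy to check. This sketch uses Python's hashlib as a stand-in for apr_md5_update, and the byte strings are arbitrary. It only demonstrates the property of the update call itself; it says nothing about where the real Apache code does or doesn't finalise.

```python
import hashlib


def incremental_md5(parts: list[bytes]) -> str:
    """Feed each part to the same MD5 context with a separate update() call."""
    ctx = hashlib.md5()
    for part in parts:
        ctx.update(part)
    return ctx.hexdigest()


def one_shot_md5(parts: list[bytes]) -> str:
    """Concatenate the parts into one buffer and hash it in a single call."""
    return hashlib.md5(b"".join(parts)).hexdigest()


if __name__ == "__main__":
    # Arbitrary illustrative strings, not the actual apr1 input sequence.
    parts = [b"myPassword", b"$apr1$", b"8wUtj8FR"]
    print(incremental_md5(parts) == one_shot_md5(parts))  # prints True
```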
   
It's not parameterised.  By contrast, see the SHA256 and SHA512 based hashes now used for the password file on recent Linuxes and Solarises, described at http://pythonhosted.org/passlib/lib/passlib.hash.sha256_crypt.html: they have a parameter for how many iterations to use, allowing the cost to be scaled up over time.  The comment in the code shows how old the decisions are:

/*
     * And now, just to make sure things don't run too fast..
     * On a 60 Mhz Pentium this takes 34 msec, so you would
     * need 30 seconds to build a 1000 entry dictionary...
     */

My laptop (a three-year-old Air with an i5 processor) can compute 240000 such hashes in under 30s without even going to the effort of writing dedicated code:

ians-macbook-air:~ igb$ time (for i in 1 2 3 4; do head -60000 /usr/share/dict/words | openssl passwd -apr1 -salt ayS1/GqV -stdin > /dev/null & done; wait)

real 0m28.363s
user 1m46.752s
sys 0m0.227s
ians-macbook-air:~ igb$ 

and a more modern i7 Air can do about 400000.  Being able to perform hashes 400x faster than that Pentium using a very naive approach, with presumably much more performance available via GPUs and other hardware tweaks, makes the loss of a password hash pretty serious.  ~10k/sec means that you could run the top 10000 entries from the roku database (say) against 86400 users in a day using a laptop, which would be a pretty devastating attack against a large stolen hash file.
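For anyone who wants to check the arithmetic, here is a small Python sketch. The helper names are made up for the example; the inputs are the figures quoted above (1000 hashes in 30 seconds from the code comment, 4 x 60000 words in 28.363s on the i5, about 400000 in 30s on the i7, and a round 10k hashes/sec for the attack estimate).

```python
def hashes_per_second(hashes: int, seconds: float) -> float:
    """Throughput implied by hashing `hashes` passwords in `seconds`."""
    return hashes / seconds


def attack_seconds(words_per_user: int, users: int, rate: float) -> float:
    """Time to try `words_per_user` candidates against each of `users` salted hashes."""
    return words_per_user * users / rate


if __name__ == "__main__":
    pentium = hashes_per_second(1_000, 30)      # figure from the code comment
    i5 = hashes_per_second(240_000, 28.363)     # 4 x 60000 words, "real" time above
    i7 = hashes_per_second(400_000, 30)         # figure quoted in the text
    print(f"Pentium: {pentium:.0f}/s, i5: {i5:.0f}/s, i7: {i7:.0f}/s")
    print(f"i7 speed-up over Pentium: {i7 / pentium:.0f}x")

    days = attack_seconds(10_000, 86_400, 10_000) / 86_400
    print(f"10000 words x 86400 users at 10k/s: {days:.1f} day(s)")
```

Because the salts differ per user, the attacker has to pay for every word against every user, which is why the work is the product of the two counts.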