Oh yeah, you can click it for *full* size! Free to use, please show credits.
Jeremi Gosney (@jmgosney) has done a terrific job of cracking the LinkedIn hashes. In fact, he has cracked some 90% of them now. So I asked Jeremi if I could have one of my UX colleagues make an infographic of the results, and he happily agreed. The result can be seen in all its glory in the supersize infographic above (print it if you like!).
Some additional information to go with it:
1. The base words are usually what give us the first indication of where a leak originates. However, the LinkedIn story didn't break until several trustworthy people had confirmed finding their own unique SHA1 password hash in the dump. Crying wolf doesn't help much, and can for sure damage the reputation of innocent parties if not handled correctly.
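For the curious, the kind of check those people did can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the dump is a plain text file with one unsalted, hex-encoded SHA1 hash per line, and the names `sha1_hex` and `hash_in_dump` are mine, made up for illustration.

```python
import hashlib


def sha1_hex(password):
    """Return the unsalted SHA1 digest of a password as lowercase hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def hash_in_dump(password, dump_lines):
    """Check whether the SHA1 hash of `password` appears in the dump.

    `dump_lines` is any iterable of strings, for example an open file.
    Assumption: one hex-encoded hash per line, in either letter case.
    """
    wanted = sha1_hex(password)
    return any(line.strip().lower() == wanted for line in dump_lines)
```

With a real file this would be something like `hash_in_dump("my unique password", open("dump.txt"))`, where `dump.txt` is a placeholder name. A hit on a long, unique password is a much stronger signal than a hit on a common one, which is why unique hashes mattered for the confirmation.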
2. The pass phrases shown are good examples, but I think it is important to highlight that these were cracked only after what I would consider massive wordlist attacks, eventually also including elements of logical rules and/or pure brute force.
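To give an idea of what "wordlist plus rules" means, here is a toy sketch in Python. It is not how Jeremi did it, and real cracking tools are far faster and use much larger rule sets; the rules, the suffixes and the function names below are all assumptions I picked for illustration.

```python
import hashlib


def sha1_hex(password):
    """Return the unsalted SHA1 digest of a password as lowercase hex."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def apply_rules(word):
    """Yield candidate passwords derived from one base word.

    Hypothetical rules: the word as-is and capitalized, each with and
    without a handful of common suffixes.
    """
    for base in (word, word.capitalize()):
        yield base
        for suffix in ("1", "123", "2012", "!"):
            yield base + suffix


def wordlist_attack(target_hashes, wordlist):
    """Return a dict mapping each cracked hash to its plaintext."""
    targets = set(target_hashes)
    cracked = {}
    for word in wordlist:
        for candidate in apply_rules(word):
            digest = sha1_hex(candidate)
            if digest in targets:
                cracked[digest] = candidate
    return cracked
```

Every extra rule multiplies the number of candidates per base word, which is why such attacks get "massive" quickly and why they are only practical offline.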
Even more important, and something most media articles have forgotten, is the difference between the risks of online and offline password cracking. In an online attack, the attacker is severely limited both in how many accounts can be tested per second and in how many passwords can be tried per account per second. Rate-limiting algorithms, CAPTCHAs and account lockout policies usually secure most systems to a decent level.
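As an illustration of the online side, a very simple lockout policy could look like the Python sketch below. The class name `LoginThrottle`, the thresholds and the in-memory storage are all assumptions for the example; a real service would persist this state and usually combine it with per-IP limits and CAPTCHAs.

```python
import time
from collections import defaultdict, deque


class LoginThrottle:
    """Lock an account after too many failed logins within a time window."""

    def __init__(self, max_failures=5, window_seconds=300.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self._failures = defaultdict(deque)  # account -> timestamps of recent failures

    def _recent(self, account):
        """Drop failures older than the window and return the rest."""
        cutoff = self.clock() - self.window_seconds
        recent = self._failures[account]
        while recent and recent[0] <= cutoff:
            recent.popleft()
        return recent

    def is_locked(self, account):
        return len(self._recent(account)) >= self.max_failures

    def record_failure(self, account):
        self._recent(account).append(self.clock())

    def record_success(self, account):
        # A successful login clears the failure history for that account.
        self._failures.pop(account, None)
```

With the default values in this sketch, an attacker gets at most five guesses per account every five minutes. None of this protection exists once the hashes have left the building.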
For offline attacks, like what happened here with the LinkedIn hashes, remember that the hashes must first be obtained by some other means. It could be SQL injection (SQLi), or it could be any of a vast number of other vulnerabilities, bugs or flaws being exploited. We don't know how it happened, and depending on LinkedIn, we may never get the truth either.
If an attacker can access stored password hashes, there is a high probability that they can also access most, if not all, of the information of any value at the service being attacked. The primary reason for cracking the password hashes is to extract extra value from the compromised accounts. That applies both to the site that was compromised initially and to any other service where the username (usually an e-mail address) and cracked password have been reused.
As a side note here on e-mail accounts, look at this post from Bruce Schneier:
3. Now for the final (and to me most interesting) part: colors. :-)
Professor Kirsi Helkala (http://www.nordrekalstad.com/kirsi) and others (including me as co-author) have written a new paper which is now in for peer review. Basically it is about creating strong passwords that are easy to remember, backed of course by scientific research.
She made an interesting observation during her trials that caught my interest; quite a few people were using the primary color from service provider X's logo as one component when they created their passphrases for that specific site.
So I quickly asked Jeremi to look for color words in the LinkedIn output and count the number of occurrences. This is an area where we need more research and statistics, but I *think* the infographic really does reveal something never documented before. To me it's just one more factor that will help us improve our logical and rule-based password cracking skills.
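The counting itself is nothing fancy. A rough Python sketch of the idea follows; the color list and the function name are my own assumptions here, not necessarily what Jeremi used for the infographic.

```python
from collections import Counter

# Hypothetical list of color words; extend it as needed.
COLORS = ("red", "blue", "green", "yellow", "black", "white", "orange", "purple", "pink")


def count_color_words(passwords, colors=COLORS):
    """Count how many passwords contain each color word (case-insensitive)."""
    counts = Counter()
    for password in passwords:
        lowered = password.lower()
        for color in colors:
            if color in lowered:
                counts[color] += 1
    return counts
```

Note that plain substring matching overcounts: "red" also matches "hundred" and "tired". That is one of the reasons the numbers should be read as a first indication rather than hard statistics.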
This is good from a risk perspective: it gives us a better understanding of how people create their passwords and how to handle that properly. At the same time, it is also ammunition for the bad guys who take an economic interest in increasing their cracking capabilities.
I'll end this by giving a big THANK YOU to Jeremi, as well as my colleague Tom for doing the infographic. Last but not least a similar THANK YOU to reporter Jonas Bakken at DagensIT.no for taking an interest in this. The public deserves to know about the risks of bad password storage practices.