Friday, October 26, 2012

Analysis of the Leak

That extremely frustrated feeling you get when you cannot crack 50% of a moderately large leak within minutes. When rockyou.txt only nets you 6,124 plains. When 1.2 billion words + 40,000 rules result in a paltry 24,000 plains. Oh, that frustrated feeling.

And let's not forget that "you have to be freaking kidding me" feeling you get when you realize that the dump you have been working with for 26 hours actually contains plaintext passwords for 70% of the hashes -- after you've already busted your ass to crack 81% of them. A mistake easily made when you hastily extract only the hashes from a dump, without bothering to look at the rest of the data.

That's precisely how I felt while working on the recent leak from Peruvian domain registrar Punto, whose database was made public by @LulzSecPeru late last week. The database dump -- provided as a full .sql file -- contains names, telephone numbers, email addresses, mailing addresses, and other identifying information for 99,471 unique customers. Among the leaked information are 137,066 unsalted SHA-1 password hashes (110,669 unique), and -- as all of us missed -- 114,057 plaintext passwords.

While I have recovered 87% of the passwords from the leak, there's very little to get excited about here, as this dump has very little value for those of us whose sole purpose in life is performing statistical analysis of the way users select passwords.

You see, the reason wordlists and rules did not yield the expected results is that very few users selected their own passwords. Rather, the vast majority -- approximately 71% by my count -- were using generated passwords assigned by Punto, and Punto never forced them to change those passwords.

This is not necessarily a bad thing, and I cannot say that I am opposed to this practice. Personally, I feel that letting users keep strong, randomly generated passwords assigned by a system is much better than forcing them to choose their own, as they will surely create weaker ones.

The Punto leak provides some evidence that this type of system can work, as I was only able to recover about 24% of the 110k passwords through conventional, highly probable methods (wordlists, rules, common masks, incremental through length 7, Markov length 8 & 9, digits through length 12). This is in stark contrast to other moderately large leaks, where we can usually recover 50-70% using the same methodology. And as expected, the passwords recovered through these methods were hardly spectacular. As Punto did not enforce any sort of password complexity, the shortest password recovered was one character, and the average length of the recovered passwords was 7 characters.

Having recovered less than a quarter of the passwords through the most probable methods, I was feeling pretty defeated. I was officially bored with this leak after only 16 hours.

But just as their password storage was poorly implemented (raw SHA-1 + plaintext), so was the way they generated users' passwords. Rather than generating random passwords, they instead opted for a predictable pattern. I stumbled onto a few pieces of this pattern through the course of running rule-based attacks against the plains I had already found, and as I continued to loop back through, I kept hitting on that pattern.

Hope. I manually exploited each specific mask, and within 90 minutes I had already recovered 20k more plains. By expanding the masks and doing more loopbacks, I was able to piece together the exact keyspace used by their password generation algorithm. It can be fully exploited with the following three masks:

-1 ?l?u?s ?1?1?1?d?d?d?1?1?1?1
-1 ?l?u?s ?1?1?1?d?d?d?d?1?1?1?1
-1 ?l?u?s ?1?1?1?d?d?d?d?d?1?1?1?1

The entire keyspace would take approximately 8 years to fully exhaust at a rate of 16 G/s using my 8x HD 7970 rig. But thanks to Atom and the new Brute-Force++ engine in oclHashcat-plus, I was able to crush it in about 8 hours. 
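For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch of the keyspace covered by those three masks. It assumes the standard hashcat charset sizes (26 for ?l, 26 for ?u, 10 for ?d, and 33 for ?s, which includes the space) and a 365.25-day year; the function and constant names are made up for the illustration. Under those assumptions the estimate comes out at roughly seven years at 16 G/s, which is in the same ballpark as the figure quoted above.

```python
# Sizes of hashcat's built-in charsets: ?l, ?u, ?d and ?s (?s includes the space).
BUILTIN_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33}

# Custom charset 1 and the three masks, copied from the post.
CUSTOM = {"1": "?l?u?s"}
MASKS = [
    "?1?1?1?d?d?d?1?1?1?1",
    "?1?1?1?d?d?d?d?1?1?1?1",
    "?1?1?1?d?d?d?d?d?1?1?1?1",
]

HASH_RATE = 16e9  # candidates per second, the 16 G/s quoted in the post
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mask_keyspace(mask, custom):
    """Count the candidates produced by a mask made only of ?x tokens."""
    # Size of each custom charset = sum of the built-in charsets it combines.
    custom_sizes = {
        name: sum(BUILTIN_SIZES[c] for c in definition.split("?") if c)
        for name, definition in custom.items()
    }
    total = 1
    for token in mask.split("?"):
        if token:
            if token in custom_sizes:
                total *= custom_sizes[token]
            else:
                total *= BUILTIN_SIZES[token]
    return total


total = sum(mask_keyspace(m, CUSTOM) for m in MASKS)
years = total / HASH_RATE / SECONDS_PER_YEAR
print(f"{total:,} candidates, about {years:.1f} years at 16 G/s")
```

The sketch only understands masks built entirely from ?x tokens, which is all that is needed here.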

By generating an .hcstat file with all of the plains I found through manually exploiting patterns, I was able to run short Brute-Force++ attacks against each mask by selecting thresholds that would allow each attack to run for about 2 hours. I then looped back through the new plains I discovered, generating a new .hcstat file and lowering the threshold on each pass. By the time I felt confident that I had sufficiently exploited the generated password keyspace, the dump was up to 81% recovered.

I was again lean on ideas. That's when I remembered that this dump included email addresses and usernames as well -- things that are rarely included in leaks like this. As I went to extract the usernames and email addresses from the .sql dump to create a new wordlist from them, I noticed some very familiar words in a particular column of the table -- they were passwords that I had already cracked. You have got to be freaking kidding me! Sure enough, after dumping that column and running it through the hash list, 77,538 of the 110,669 passwords were right there in the database this whole time.
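Running a column of candidate plains through a list of unsalted SHA-1 hashes is simple enough to sketch in a few lines. This is only an illustration of the idea, not the tooling actually used: the function names and the toy data are made up, and it assumes the passwords were hashed as UTF-8 bytes, which the dump may or may not bear out.

```python
import hashlib


def sha1_hex(password):
    """Unsalted SHA-1 of the password, as a lowercase hex string."""
    # Assumption: the site hashed the UTF-8 bytes of the password.
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def match_candidates(target_hashes, candidates):
    """Return {hash: plain} for every candidate whose SHA-1 is a target."""
    targets = {h.lower() for h in target_hashes}
    found = {}
    for candidate in candidates:
        digest = sha1_hex(candidate)
        if digest in targets:
            found[digest] = candidate
    return found


# Toy data standing in for the hash list and the plaintext column.
toy_hashes = [sha1_hex("123456"), sha1_hex("Sistemas2"), sha1_hex("not-in-column")]
toy_column = ["123456", "Sistemas2", "dominios"]

found = match_candidates(toy_hashes, toy_column)
print(f"{len(found)} of {len(set(toy_hashes))} hashes matched")
```

Because the hashes are unsalted, each candidate only has to be hashed once and looked up in a set, so even a six-figure column is checked almost instantly.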

I had already cracked most of what was stored as plaintext in the database; there were only about 5k passwords that I had missed. Running the usernames through with rules added another 2k or so that hadn't been found through other means. With 87% of the passwords recovered, I did not feel compelled to continue any further with this leak.

So enough commentary, let's start talking statistics.

  • 96,805 out of 110,669 unique passwords recovered.
  • Approximately 68,500 passwords were automatically generated and never changed.
  • Approximately 28,300 passwords were selected by users.
  • 7,885 of the user-selected passwords were based on the user's username.
  • 43 accounts had no password (possibly never activated?)
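
One way to approximate the generated versus user-selected split is to test each recovered plain against the three generator masks from earlier in the post. The sketch below is an assumption about how such a count could be made, not a record of how these numbers were produced; the names are hypothetical, and the sample passwords in the usage note are invented. It treats ?s as the space plus the 32 ASCII punctuation characters.

```python
import re
import string

# Custom charset 1 from the masks: ?l?u?s (letters plus space and punctuation).
SYMBOLS = " " + string.punctuation
CLASS_1 = "[A-Za-z" + re.escape(SYMBOLS) + "]"

# Three charset-1 characters, three to five digits, four charset-1 characters.
GENERATED = re.compile(CLASS_1 + r"{3}[0-9]{3,5}" + CLASS_1 + r"{4}")


def looks_generated(password):
    """True if the password fits one of the three generator masks."""
    return GENERATED.fullmatch(password) is not None


def split_by_origin(passwords):
    """Partition passwords into (generated-looking, everything else)."""
    generated = [p for p in passwords if looks_generated(p)]
    chosen = [p for p in passwords if not looks_generated(p)]
    return generated, chosen
```

For example, the invented string "a$B123xy!Z" fits the pattern while "Sistemas2" and "123456" do not. A user could of course pick a password that happens to fit the pattern, so a count made this way is an estimate.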

Top Passwords

  1. 123456
  2. Sistemas2
  3. sistemas2
  4. Sistemas1
  5. sistemas1
  6. Password1
  7. 1234567
  8. dominios
  9. francisco2
  10. sebastian3
  11. puntope
  12. sistemas
  13. password1
  14. rosarios
  15. 123456789
  16. Sistemas
  17. sametsis
  18. eduardo2
  19. Francisco

Top Basewords

  1. peru
  2. sistemas
  3. abc
  4. password
  5. carlos
  6. admin
  7. dominio
  8. puntope
  9. jose
  10. lima
  11. master
  12. luis
  13. alianza
  14. internet
  15. jcastilla
  16. nic
  17. eduardo
  18. daniel
  19. fernando
  20. cesar
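
A baseword list like the one above is usually built by trimming the decoration off each password and counting what is left. The sketch below shows one common approach (strip leading and trailing non-letters, then lowercase); whether it matches the exact method used for this list is an assumption, and the function names are made up.

```python
from collections import Counter


def baseword(password):
    """Strip leading and trailing non-letters, then lowercase the rest."""
    start, end = 0, len(password)
    while start < end and not password[start].isalpha():
        start += 1
    while end > start and not password[end - 1].isalpha():
        end -= 1
    return password[start:end].lower()


def top_basewords(passwords, n=20):
    """Return the n most common basewords as (word, count) pairs."""
    counts = Counter(baseword(p) for p in passwords)
    counts.pop("", None)  # passwords with no letters have no baseword
    return counts.most_common(n)
```

With this rule, "Sistemas2" and "sistemas1" both collapse to "sistemas", while an all-digit password such as "123456" contributes nothing.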

Top Masks

  1. ?l?l?l?d?d?d?d?l?l?l?l
  2. ?l?l?s?d?d?d?d?l?l?l?l
  3. ?l?s?u?d?d?d?d?l?l?l?s
  4. ?s?u?l?d?d?d?d?l?l?s?u
  5. ?l?l?l?d?d?d?l?l?l?l
  6. ?l?l?l?l?l?l?l?l
  7. ?d?d?d?d?d?d?d?d
  8. ?u?u?u?d?d?d?d?u?u?u?u
  9. ?l?l?l?l?l?l?l
  10. ?l?l?l?l?l?l?l?l?l
  11. ?l?l?s?d?d?d?l?l?l?l
  12. ?l?l?l?l?l?l?l?l?l?l
  13. ?l?s?u?d?d?d?l?l?l?s
  14. ?u?l?l?d?d?d?d?l?s?u?l
  15. ?s?u?l?d?d?d?l?l?s?u
  16. ?l?l?l?l?l?l?l?l?l?l?l
  17. ?u?s?u?d?d?d?d?u?u?u?s
  18. ?l?l?l?l?l?l?d?d
  19. ?d?d?d?d?d?d?d
  20. ?l?l?l?l?d?d?d?d
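
A mask ranking like this comes from mapping every character of each plain to its hashcat charset and counting the resulting strings. Here is a minimal sketch of that mapping; the function names are hypothetical, and using ?b as a catch-all for anything outside printable ASCII is a simplification made for the example.

```python
import string
from collections import Counter

SYMBOLS = " " + string.punctuation  # the characters covered by ?s


def to_mask(password):
    """Map each character of a password to its hashcat charset token."""
    tokens = []
    for ch in password:
        if ch in string.ascii_lowercase:
            tokens.append("?l")
        elif ch in string.ascii_uppercase:
            tokens.append("?u")
        elif ch in string.digits:
            tokens.append("?d")
        elif ch in SYMBOLS:
            tokens.append("?s")
        else:
            # Simplification: one ?b per character, even though a
            # non-ASCII character may occupy several bytes.
            tokens.append("?b")
    return "".join(tokens)


def top_masks(passwords, n=20):
    """Return the n most common masks as (mask, count) pairs."""
    return Counter(to_mask(p) for p in passwords).most_common(n)
```

For instance, "Sistemas2" maps to ?u?l?l?l?l?l?l?l?d and "123456" maps to ?d?d?d?d?d?d.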


  1. Do you by any chance still have the plaintext password dump?
    I'm working on a password research project and the dump would be very

    thanks in advance..


  2. Sure Matt, shoot me an email (epixoip at and I'll provide you with the plains.

