Wednesday, March 24, 2010

Knock, knock... Who's there, statistically?

The owner of this blog has fled the country for a few days, so I am seizing the opportunity to keep my little musings from being drowned in his figurative firehose of blog posts.

About six months ago, I reinstalled one of my Gentoo Linux servers and left the SSH port open to the world. I did this deliberately, as I tend to access my servers from many different sites, not always knowing in advance what my source address will be. Usually, I install logrotate and a few other packages to keep things tidy, but for some reason I neglected to do so this time.

The other day, while doing some routine maintenance on the server, I discovered that the system log file /var/log/messages had grown to a whopping 12GB. What on earth was going on here?

What is SSH?
Secure Shell, or SSH, is an encrypted network protocol designed to replace earlier, insecure protocols such as telnet, rsh and rlogin.

A quick glance through the first few pages of the file revealed what I thought was the reason for the enormous file size: about 8 hours after the server was brought online, the first attempt at brute-forcing the SSH service started, and from that day on, the server was more or less under constant attack from hosts all over the world.

Now, this didn't really come as a surprise to me. A few years back, while I was working for one of Scandinavia's largest ISPs, I set up an intrusion detection system (IDS) on a few of their network links. It's simply staggering how much malevolent traffic there is out there, including these kinds of automated SSH scans. But as omnipresent as they are, could these SSH scans really generate 12GB worth of log entries?

Well, as it turns out, they didn't. Most of the log data was generated by an extremely chatty local process, and not a very interesting one at that. However, my curiosity was piqued, so I extracted the log entries pertaining to failed SSH login attempts and imported the data into my trusty PostgreSQL database.
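The post doesn't show how those entries were pulled out of /var/log/messages. A minimal Python sketch of that step might look like the following. It assumes OpenSSH's usual syslog message for a failed password ("Failed password for [invalid user] NAME from IP port N ssh2"); the `Attempt` record and `parse_line` function are hypothetical names, and other failure messages (rejected keys, "Invalid user" notices) would need patterns of their own.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Attempt(NamedTuple):
    when: datetime
    username: str
    ip: str


# Assumed format of a failed-password line from OpenSSH via syslog, e.g.
# "Dec 17 03:12:45 myhost sshd[4711]: Failed password for invalid user test from 192.0.2.10 port 40022 ssh2"
FAILED_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ sshd\[\d+\]: "
    r"Failed password for (?:invalid user )?(?P<user>.*?) "
    r"from (?P<ip>[0-9a-fA-F.:]+) port \d+"
)


def parse_line(line: str, year: int) -> Optional[Attempt]:
    """Return an Attempt for a failed SSH password login, or None for any other line."""
    m = FAILED_RE.match(line)
    if m is None:
        return None
    # Classic syslog timestamps carry no year, so the caller has to supply it.
    when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    return Attempt(when, m["user"], m["ip"])
```

Rows like these could then be written out as CSV and bulk-loaded into PostgreSQL with its `COPY` command. That is one option, not necessarily the import path used for this post.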

The results were fairly interesting.

In the 6 months since the server went live, 140416 login attempts had been made from 568 different IP addresses. The attackers had tried 21868 different usernames, most of which made sense, others less so, and some were just plain weird. The idea, I presume, was to try a big set of commonly found usernames paired with common passwords, although some of the attackers seemed to have gotten the usernames and passwords mixed up. I mean, I would hate to work for an organization that gives its employees usernames like these:
  • Sh3I5Lik3P4rtY@v3r (She Is Like Party Over, I guess)
  • 666s1czfarginn\r (password as username and messed up file conversion)
  • 1l0v3y0u (I love you too!)
  • 0p9o8i7u6y5t4r3e2w1q (clever keyboard pattern)
  • 1q2w3e4r5t6y7u8i9o0p (not so clever keyboard pattern)
  • root!@# (if at first you don't get access, start swearing)
At the other end of the spectrum, there were the more logical choices. The clear winner of the popularity contest was test, used 4206 times.
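Headline numbers like these boil down to a few counts. As a sketch in Python rather than the SQL that was presumably run against PostgreSQL, and assuming the attempts are available as (username, ip) pairs, a hypothetical `summarize` helper could produce them:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(
    attempts: Iterable[Tuple[str, str]], top_n: int = 10
) -> Tuple[int, int, int, List[Tuple[str, int]]]:
    """attempts: one (username, ip) pair per failed login.

    Returns (total attempts, distinct IPs, distinct usernames, top_n usernames with counts).
    """
    pairs = list(attempts)
    by_user = Counter(user for user, _ in pairs)
    ips = {ip for _, ip in pairs}
    return len(pairs), len(ips), len(by_user), by_user.most_common(top_n)
```

On the full data set, the first three values would be expected to come out as the 140416 attempts, 568 addresses and 21868 usernames quoted above, with ("test", 4206) heading the list.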

Much to my surprise, root wasn't even in the top 100. Oh well, I guess most script kiddies are Windows users.

At one point, I tried to see if there was any method to the madness, so I graphed the number of login attempts per day. Was there a pattern to be seen? Was the level of activity higher on weekends or on weekdays? How about holidays?
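Bucketing the attempts by day and by weekday is straightforward once each attempt has a timestamp. The sketch below assumes a plain list of `datetime` objects; `per_day` and `per_weekday` are hypothetical helper names.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def per_day(timestamps: Iterable[datetime]) -> Dict[date, int]:
    """Number of failed logins per calendar day, in date order (ready for plotting)."""
    counts = Counter(ts.date() for ts in timestamps)
    return dict(sorted(counts.items()))


def per_weekday(timestamps: Iterable[datetime]) -> Dict[str, int]:
    """Total failed logins per day of the week (datetime.weekday() numbers Monday as 0)."""
    counts = Counter(ts.weekday() for ts in timestamps)
    return {name: counts.get(i, 0) for i, name in enumerate(WEEKDAYS)}
```

Raw weekday totals are only a fair comparison if the observation period covers whole weeks; otherwise each total should be divided by the number of times that weekday occurs in the period.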

I couldn't really spot any trends or significant highs or lows, other than an apparent clustering of activity a week before Christmas, followed by an absence of activity around Christmas Eve. School kids enjoying their vacation before having to attend all the family gatherings? This seemed like a plausible explanation until I took a look at the geographical distribution of the attacks.

And the winner is China, with a whopping 27% of all the login attempts! It turns out China beat the next three countries, the USA, Spain and South Korea, combined, and China was heavily represented around the Christmas holidays. So much for the Christmas theory.
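The "China beat the next three combined" comparison can be checked mechanically once every attempt has a country attached. The sketch assumes the IP-to-country mapping comes from a GeoIP lookup performed beforehand (not shown, and not necessarily the tool behind the original map); `country_shares` and `leader_beats_next_three` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def country_shares(attempt_ips: Iterable[str], country_of: Dict[str, str]) -> List[Tuple[str, float]]:
    """Fraction of all login attempts per country, largest first.

    attempt_ips has one entry per login attempt (not per host), so busy hosts weigh more.
    country_of maps IP -> country and is assumed to come from a GeoIP lookup done elsewhere.
    """
    counts = Counter(country_of.get(ip, "Unknown") for ip in attempt_ips)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(country, n / total) for country, n in counts.most_common()]


def leader_beats_next_three(shares: List[Tuple[str, float]]) -> bool:
    """True if the top country's share is larger than the next three countries' shares combined."""
    return len(shares) >= 4 and shares[0][1] > sum(share for _, share in shares[1:4])
```

Counting per attempt rather than per source address matters here: a handful of very persistent hosts in one country can dominate the shares.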

After digging around in the database, I realized that I could organize the individual login attempts into groups, or attacks, by comparing the time of each login attempt from a given IP address with the time of the next attempt from the same address. If two consecutive attempts were close enough in time, I lumped them together into a single attack. A gap of 5 minutes seemed like a reasonable threshold.
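The grouping rule can be sketched as follows: sort each address's attempts by time and start a new attack whenever the gap to the previous attempt exceeds 5 minutes. This is a Python illustration of the rule as described, not the actual query used; in PostgreSQL the same idea could likely be expressed with the `lag()` window function. `group_into_attacks` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def group_into_attacks(
    attempts: Iterable[Tuple[str, datetime]], gap: timedelta = timedelta(minutes=5)
) -> List[Tuple[str, List[datetime]]]:
    """Group (ip, timestamp) login attempts into attacks.

    Attempts from the same IP that follow each other within `gap` belong to the same attack;
    a longer pause starts a new one.
    """
    by_ip: Dict[str, List[datetime]] = defaultdict(list)
    for ip, ts in attempts:
        by_ip[ip].append(ts)

    attacks: List[Tuple[str, List[datetime]]] = []
    for ip, stamps in by_ip.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] <= gap:
                current.append(ts)      # close enough: same attack
            else:
                attacks.append((ip, current))
                current = [ts]          # pause too long: new attack
        attacks.append((ip, current))
    return attacks
```

Each resulting attack carries its own timestamps, so its duration and attempt count fall out directly, and a "single-attempt attack" is simply a group of length one.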

This allowed me to compute some statistics on the aggressiveness of the attacks.

The most aggressive attack (from Germany) lasted 15 minutes and 48 seconds. During that time, the attacker tried to log in 2624 times, an average of 2.77 attempts per second. There were longer-lasting attacks (some went on for days) that made even more attempts, but none as aggressive as this.
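The rate is simply attempts divided by duration: 15 minutes 48 seconds is 948 seconds, and 2624 / 948 ≈ 2.77. A small sketch, with `attack_rate` as a hypothetical helper working on one attack's timestamps:

```python
from datetime import datetime, timedelta
from typing import List, Optional


def attack_rate(stamps: List[datetime]) -> Optional[float]:
    """Attempts per second for one attack: number of attempts / (last - first).

    Returns None for single-attempt attacks, which have no measurable duration.
    """
    duration = (max(stamps) - min(stamps)).total_seconds()
    return len(stamps) / duration if duration > 0 else None


# The German attack: 2624 attempts in 15 minutes 48 seconds (948 s).
rate = 2624 / timedelta(minutes=15, seconds=48).total_seconds()
print(f"{rate:.2f} attempts per second")  # prints "2.77 attempts per second"
```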

Finding an aggressive attacker is usually fairly easy. Finding an elusive one is a bit harder. After sorting everything into attacks, I discovered a whole lot of "attacks" consisting of a single login attempt each, from various source addresses. These seemingly random occurrences didn't make much sense to me until I sorted them by time of occurrence and included the attempted username.

Suddenly, a very distinctive pattern emerged.

Over a period of 3 weeks, approximately 170 attempts were made from some 50 different source addresses all over the world, using a single, alphabetically sorted list of usernames. That's roughly 8 attempts per day, or about 30000 times slower than the aggressive attack mentioned earlier.
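Both halves of this observation can be sketched in a few lines. The function below is a simple heuristic of my own assumption, not a method taken from the post: among single-attempt attacks in time order, it finds the longest stretch where the usernames never go backwards alphabetically. The arithmetic underneath checks the 30000 figure, taking 3 weeks as 21 days.

```python
from datetime import datetime
from typing import List, Tuple

Single = Tuple[datetime, str, str]  # (timestamp, source_ip, username)


def longest_alphabetical_run(singles: List[Single]) -> List[Single]:
    """Longest time-ordered run of single-attempt attacks whose usernames never go backwards alphabetically."""
    best: List[Single] = []
    current: List[Single] = []
    for item in sorted(singles):
        if current and item[2] < current[-1][2]:
            current = []          # alphabetical order broken: start a new run
        current.append(item)
        if len(current) > len(best):
            best = list(current)
    return best


# Rate comparison from the text: ~170 attempts in 3 weeks vs. 2624 attempts in 948 s.
slow_per_day = 170 / 21                   # about 8.1 attempts per day
fast_per_day = 2624 / 948 * 86400         # the German attack, if sustained for a whole day
ratio = fast_per_day / slow_per_day       # roughly 30,000
```

The heuristic is crude: an unrelated attempt whose username happens to sort between two list entries gets swept into the run, and one that sorts out of order cuts the run in two.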

Well, hello there, Mr. Slow and Meticulous Botnet! Trying to fly under the radar, are we? Well, you almost made it.

In truth, the only reason I discovered this was that I was manually sifting through the data. This kind of distributed, slow-moving behavior is a very efficient, albeit time-consuming, way to avoid detection by automated monitoring tools. Oh, by the way, that single webmaster entry in there probably wasn't part of the distributed attack.

So, what can be done to mitigate the risks?

There are a number of ways to protect your SSH login service, but which one is most suitable depends on how you value your system. However, threat and risk assessment is a topic best left for a different blog post, so in closing, I'll just list a few security measures for your consideration:
  1. Configure your SSH service to require key-based (public key) authentication and reject everything else.
  2. Configure your /etc/hosts.deny and /etc/hosts.allow files to only allow access to the SSH service from predefined source addresses.
  3. Configure your firewall (host based or otherwise) to only allow access to the SSH service from predefined source addresses.
  4. Install an intrusion prevention tool, such as DenyHosts, to block excessive login failures by automatically inserting the offending address into the /etc/hosts.deny file.
  5. Subscribe to a block-list service to have your /etc/hosts.deny file or your firewall ruleset automatically updated.
  6. Use a port knocking service to dynamically open the SSH port in your firewall whenever you come knocking.
  7. Ignore the problem. If your users are using good passwords, and you're diligently keeping your system up to date and generally secure, these scans are just a nuisance anyway.
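Measure 4 (and tools like fail2ban) boils down to counting failures per source address and reacting once a threshold is crossed. The sketch below is a simplified sliding-window version of that idea, not how DenyHosts or fail2ban is actually implemented; the threshold of 5 failures within 10 minutes is an arbitrary assumption, and `ips_to_block` and `hosts_deny_lines` are hypothetical names.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, List, Tuple


def ips_to_block(
    attempts: Iterable[Tuple[datetime, str]],
    threshold: int = 5,
    window: timedelta = timedelta(minutes=10),
) -> List[str]:
    """Flag IPs with at least `threshold` failed logins inside any sliding `window`.

    attempts are (timestamp, ip) pairs; IPs are returned in the order they were flagged.
    """
    recent: Dict[str, Deque[datetime]] = defaultdict(deque)
    blocked: List[str] = []
    for ts, ip in sorted(attempts):
        q = recent[ip]
        q.append(ts)
        # Forget this IP's failures that have slid out of the window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold and ip not in blocked:
            blocked.append(ip)
    return blocked


def hosts_deny_lines(ips: Iterable[str]) -> List[str]:
    """Render entries in the hosts_access "daemon: client" format used by /etc/hosts.deny."""
    return [f"sshd: {ip}" for ip in ips]
```

Note what this cannot catch: the slow, distributed botnet described above never produces more than one failure per address, so a per-IP threshold like this never fires.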
And on that bombshell, it's time to end this lengthy blog post. Happy secure remote shelling!


  1. Just a comment on the claim that this is a botnet trying to attack under the radar. It could also be a botnet attacking a larger number of targets "round robin style". If you are just one of 200k addresses subjected to this, it could look to you as if it's happening very slowly. Since you only see what hits your box, it's impossible for you to know which is the case.

  2. I just wanted to say that this was an amazing post.

    About "test" being the #1 user account: I know someone who had his home server compromised because he forgot to delete a test account he had set up. Admittedly it's a bad idea to derive trends from a single data point, but I can see how that would be effective.

    Is there any chance you could release a sanitized version of your logs? There have been several write-ups on this type of attack over the years, but no publicly available datasets that I know of. Also, I'm too lazy to set up a honeypot on my own network to collect the data myself ;)

    As for previous posts on the subject, here is one from Symantec from 2006:

    Updated info from Symantec responding to a successful attack in December 2008

    And a list of GeoIP locations from Computer Defense, written in August last year:

    Reading your posts and reviewing the posts above it's really interesting to see how this attack is evolving.

  3. It's fun to see the results from some data analysis graphed out like that, and with a nice write-up of the different attack strategies.

    Sound advice at the end as well.

  4. Bjarte: I assumed I wasn't the only one with the honor of being subjected to the botnet attack, but I apparently forgot to mention that in the article.

    Matt: I'll see what I can do about creating a filtered log for public release some time over the Easter holidays.

  5. Nice post, interesting analysis, and I agree with most of your recommendations.

    But does anybody really consider port knocking to be useful? Once your bot is fitted with the ability to try random ports, the port sequence is really just another password written in a 16-bit alphabet, IMO.

    I've been mulling writing up an article about that (likely upcoming topic for, but any qualified comment on the topic is welcome now as well.

  6. If you craft your own knock packets using port numbers, flags, sequence numbers, options and so on, you'll end up with some incredibly complicated "passwords" even with very few packets. Additionally, one might presume that your "knock" is a scripted sequence of packets, negating the need for remembering the "password" yourself, which means you're free to choose your own magic 100+ knock sequence.

    When trying to brute force a port knocking service, you're faced with a few problems: How do you know that a port knocking service is actually in place? How do you know that your attempted knocking sequence has worked? Scan the entire port range between each attempt?

    The game changes ever so slightly if you are able to monitor the traffic as a genuine port knock happens. In that regard, simple port knocking services aren't more secure than any old clear text protocol, like telnet.

    I guess one could devise a knocking scheme using some form of challenge-response or key exchange, but that feels a bit like donning The Complicator's Gloves.

    Would I consider using port knocking to protect my own SSH service? Nah, not worth the hassle.

  7. As you say, port knocking isn't worth the hassle.

    But I've had great success using fail2ban. It doesn't stop the bogus logins, but at least it halts them after a few attempts.

