In late 2010, Sean Brooks received three e-mails over a span of 30 hours warning that his accounts on LinkedIn, Battle.net, and other popular websites were at risk. He was tempted to dismiss them as hoaxes—until he noticed they included specifics that weren’t typical of mass-produced phishing scams. The e-mails said that his login credentials for various Gawker websites had been exposed by hackers who rooted the sites’ servers, then bragged about it online; if Brooks used the same e-mail and password for other accounts, they would be compromised too.
The warnings Brooks and millions of other people received that December weren’t fabrications. Within hours of anonymous hackers penetrating Gawker servers and exposing cryptographically protected passwords for 1.3 million of its users, botnets were cracking the passwords and using them to commandeer Twitter accounts and send spam. Over the next few days, the sites advising or requiring their users to change passwords expanded to include Twitter, Amazon, and Yahoo.
“The danger of weak password habits is becoming increasingly well-recognized,” said Brooks, who at the time blogged about the warnings as the Program Associate for the Center for Democracy and Technology. The warnings, he told me, “show [that] these companies understand how a security breach outside their systems can create a vulnerability within their networks.”
The ancient art of password cracking has advanced further in the past five years than it did in the previous several decades combined. At the same time, the dangerous practice of password reuse has surged. The result: security provided by the average password in 2012 has never been weaker.
A new world
The average Web user maintains 25 separate accounts but uses just 6.5 passwords to protect them, according to a landmark study (PDF) from 2007. As the Gawker breach demonstrated, such password reuse, combined with the frequent use of e-mail addresses as user names, means that once hackers have plucked login credentials from one site, they often have the means to compromise dozens of other accounts, too.
Newer hardware and modern techniques have also helped to contribute to the rise in password cracking. Now used increasingly for computing, graphics processors allow password-cracking programs to work thousands of times faster than they did just a decade ago on similarly priced PCs that used traditional CPUs alone. A PC running a single AMD Radeon HD7970 GPU, for instance, can try on average an astounding 8.2 billion password combinations each second, depending on the algorithm used to scramble them. Only a decade ago, such speeds were possible only when using pricey supercomputers.
The advances don’t stop there. PCs equipped with two or more $500 GPUs can achieve speeds two, three, or more times faster, and free password cracking programs such as oclHashcat-plus will run on many of them with little or no tinkering. Hackers running such gear also work in tandem in online forums, which allow them to pool resources and know-how to crack lists of 100,000 or more passwords in just hours.
Most importantly, a series of leaks over the past few years containing more than 100 million real-world passwords have provided crackers with important new insights about how people in different walks of life choose passwords on different sites or in different settings. The ever-growing list of leaked passwords allows programmers to write rules that make cracking algorithms faster and more accurate; password attacks have become cut-and-paste exercises that even script kiddies can perform with ease.
“It has been night and day, the amount of improvement,” said Rick Redman, a penetration tester for security consultants KoreLogic and organizer of the Crack Me If You Can password contest at the past three Defcon hacker conferences. “It’s been an exciting year for password crackers because of the amount of data. Cracking 16-character passwords is something I could not do four or five years ago, and it’s not because I have more computers now.”
At any given time, Redman is likely to be running thousands of cryptographically hashed passwords though a PC containing four of Nvidia’s GeForce GTX 480 graphics cards. It’s an “older machine,” he conceded, but it still gives him the ability to cycle through as many as 6.2 billion combinations every second. He typically uses a dictionary file containing about 26 million words, combined with programming rules that greatly extend its effectiveness by adding numbers, punctuation, and other characters to each list entry. Depending on the job, he sometimes uses a 60 million-strong word list and something known as “rainbow tables,” which are described later in this article.
As a penetration tester who gets paid to pierce the defenses of Fortune 500 companies, Redman tries to spot weaknesses before criminal hackers exploit them on his customers’ networks. One of the key ways he stays ahead is by downloading hash lists that are dumped almost every day on pastebin.com and other sites to see if any belong to the organizations he is contracted to protect.
Recently, he recovered a 13-character password that he had spent several months trying to crack. To protect the account holder, he declined to reveal the precise combination of characters and instead made up the imaginary passphrase “Sup3rThinkers” (minus the quotation marks) to illustrate his breakthrough. “Sup3rThinkers” follows a number of patterns that have become common: it opens with a common, five-letter word that begins with a capitalized letter and substitutes a 3 for an E, followed by a common, seven-letter word that also begins with a capital letter. While the speed of his system didn’t hurt, cracking the password was largely the result of the collective codebreaking expertise developed online over the past few years.
The most important single contribution to cracking knowledge came in late 2009, when an SQL injection attack against online games service RockYou.com exposed 32 million plaintext passwordsused by its members to log in to their accounts. The passcodes, which came to 14.3 million once duplicates were removed, were posted online; almost overnight, the unprecedented corpus of real-world credentials changed the way whitehat and blackhat hackers alike cracked passwords.
Hashing it out
Like many password breaches, almost none of the 1.3 million Gawker credentials exposed in December 2010 contained human-readable passcodes. Instead, they had been converted into what are known as “hash values” by passing them through a one-way cryptographic function that creates a unique sequence of characters for each plaintext input. When passed through the MD5 algorithm, for instance, the string “password” (minus the quotes) translates into “5f4dcc3b5aa765d61d8327deb882cf99”.
Even minor changes to the plaintext input—say, “password1” or “Password”—result in vastly different hash values (“7c6a180b36896a0a8c02787eeafb0e4c” and “dc647eb65e6711e155375218212b3964” respectively). When processed by the SHA1 algorithm, the inputs “password”, “password1”, and “Password” result in “5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8”, “e38ad214943daad1d64c102faec29de4afe9da3d”, and “8be3c943b1609fffbfc51aad666d0a04adf83c9d” respectively.
In theory, once a string has been converted into a hash value, it’s impossible to revert it to plaintext using cryptographic means. Password cracking, then, is the practice of running plaintext guesses through the same cryptographic function used to generate a compromised hash. When the two hash values match, the password has been identified.
The RockYou dump was a watershed moment, but it turned out to be only the start of what’s become a much larger cracking phenomenon. By putting 14 million of the most common passwords into the public domain, it allowed people attacking cryptographically protected password leaks to almost instantaneously crack the weakest passwords. That made it possible to devote more resources to cracking the stronger ones.
Within days of the Gawker breach, for instance, a large percentage of the password hashes had been converted to plaintext, a feat that gave crackers an even larger corpus of real-world passwords to inform future attacks. That collective body of passwords has only snowballed since then, and it grows ever larger with each passing breach. Just six days after the leak of 6.5 million LinkedIn password hashes in June, more than 90 percent of them were cracked. In the past year alone, Redman said, more than 100 million passwords have been published online, either in plaintext or in ciphertext that can be readily cracked.
“Now, it’s like once a quarter you get another RockYou,” Redman said.
We will RockYou
In the RockYou aftermath, everything changed. Gone were word lists compiled from Webster’s and other dictionaries that were then modified in hopes of mimicking the words people actually used to access their e-mail and other online services. In their place went a single collection of letters, numbers, and symbols—including everything from pet names to cartoon characters—that would seed future password attacks.
“So it’s no longer this theoretical word list of Klingon planets and stuff like that,” Redman said of the RockYou list. “It’s literally ‘dragon’ and ‘princess’ and stuff like that, and [the list] may crack 60 percent of a newly compromised website. Now you have 60 percent of the work done and you haven’t done any thinking at all. You’ve just used your previous knowledge.”
Almost as important as the precise words used to access millions of online accounts, the RockYou breach revealed the strategic thinking people often employed when they chose a passcode. For most people, the goal was to make the password both easy to remember and hard for others to guess. Not surprisingly, the RockYou list confirmed that nearly all capital letters come at the beginning of a password; almost all numbers and punctuation show up at the end. It also revealed a strong tendency to use first names followed by years, such as Julia1984 or Christopher1965.
“Sup3rThinkers” wasn’t included in the list of RockYou passwords, making it part of the 40 percent of hashes that require Redman to apply cracking techniques that go beyond a simple word-list attack. Fortunately for him, the RockYou corpus included both “sup3r” and “thinkers” as separate passwords. That allowed him to recover the password in question by appending each word in his list to every other word in the list. The technique is simple enough to do, although it increases the number of required guesses dramatically—from about 26 million, assuming the dictionary Redman uses most often, to about 676 million.
Other complex passwords require similar manipulations to be cracked. The RockYou list, and the hundred-millions-plus passwords that have collectively been exposed in its aftermath, brought to light a plethora of other techniques people employ to protect simple passcodes from traditional dictionary attacks. One is adding numbers or non-alphanumeric characters such as “!!!” to them, usually at the end, but sometimes at the beginning. Another, known as “mangling,” transforms words such as “super” or “princess” into “sup34” and “prince$$.” Still others append a mirror image of the chosen word, so “book” becomes “bookkoob” and “password” becomes “passworddrowssap.”
Passwords such as “mustacheehcatsum” (that’s “mustache” spelled forward and then backward) may give the appearance of strong security, but they’re easily cracked by isolating their patterns, then writing rules that augment the words contained in the RockYou dump and similar lists. For Redman to crack “Sup3rThinkers”, he employed rules that directed his software to try not just “super” but also “Super”, “sup3r”, “Sup3r”, “super!!!” and similar modifications. It then tried each of those words in combination with “thinkers”, “Thinkers”, “think3rs”, and “Think3rs”.
Such cracking techniques have existed for a decade, but they work far better now that the crackers possess a more intimate understanding of the ways people choose passwords.
“It’s vastly different than it was [before] because of these massive password lists,” said Rob Graham, CEO of penetration testing firm Errata Security. “We never had a really large password list to work from. Now that we do, we’re learning how to remove the entropy from them. The state of the art of cracking is much more subtle in that before we were guessing in the dark.”
A little finesse
That subtlety takes all sorts of forms. One promising technique is to use programs such as the open-source Passpal to reduce cracking time by identifying patterns exhibited in a statistically significant percentage of intercepted passwords. For example, as noted above, many website users have a propensity to append years to proper names, words, or other strings of text that contain a single capital letter at the beginning. Using brute-force techniques to crack the password Julia1984 would require 629 possible combinations, a “keyspace” that’s calculated by the number of possible letters (52) plus the number of numbers (10) and raising the sum to the power of nine (which in this example is the maximum number of password characters a cracker is targeting). Using an AMD Radeon HD7970, it would still take about 19 days to cycle through all the possibilities.
Using features built into password-cracking apps such as Hashcat and Extreme GPU Bruteforcer, the same password can be recovered in about 90 seconds by performing what’s known as a mask attack. It works by intelligently reducing the keyspace to only those guesses likely to match a given pattern. Rather than trying aaaaa0000, ZZZZZ9999, and every possible combination in between, it tries a lower- or upper-case letter only for the first character, and tries only lower-case characters for the next four characters. It then appends all possible four-digit numbers to the end. The result is a drastically reduced keyspace of about 237.6 billion, or 52 * 26 * 26 * 26 * 26 * 10 * 10 * 10 * 10.
An even more powerful technique is a hybrid attack. It combines a word list, like the one used by Redman, with rules to greatly expand the number of passwords those lists can crack. Rather than brute-forcing the five letters in Julia1984, hackers simply compile a list of first names for every single Facebook user and add them to a medium-sized dictionary of, say, 100 million words. While the attack requires more combinations than the mask attack above—specifically about 1 trillion (100 million * 104) possible strings—it’s still a manageable number that takes only about two minutes using the same AMD 7970 card. The payoff, however, is more than worth the additional effort, since it will quickly crack Christopher2000, thomas1964, and scores of others.
“The hybrid is my favorite attack,” said Atom, the pseudonymous developer of Hashcat, whose teamwon this year’s Crack Me if You Can contest at Defcon. “It’s the most efficient. If I get a new hash list, let’s say 500,000 hashes, I can crack 50 percent just with hybrid.”
With half the passwords in a given breach recovered, cracking experts like Atom can use Passpal and other programs to isolate patterns that are unique to the website from which they came. They then write new rules to crack the remaining unknown passwords. More often than not, however, no amount of sophistication and high-end hardware is enough to quickly crack some hashes exposed in a server breach. To ensure they keep up with changing password choices, crackers will regularly brute-force crack some percentage of the unknown passwords, even when they contain as many as nine or more characters.
“It’s very expensive, but you do it to improve your model and keep up with passwords people are choosing,” said Moxie Marlinspike, another cracking expert. “Then, given that knowledge, you can go back and build rules and word lists to effectively crack lists without having to brute force all of them. When you feed your successes back into your process, you just keep learning more and more and more and it does snowball.”
Attack of the dictionaries
This sort of password cracking entered the public consciousness thanks largely to the 1980s hacking thriller The Cuckoo’s Egg, in which author Cliff Stoll chronicles his real-life pursuit of a hacker who breaks into US computer systems and steals sensitive military and security documents on behalf of the Soviet KGB.
The book is packed with people in high places who undermine national security with poor password hygiene—an account on the network of defense contractor SRI Inc. with a user name and password of “SAC”, for example, or a super-user account for Lawrence Berkeley Labs that hadn’t been changed in years.
“When money was stored in vaults, safe-crackers attacked the combination locks,” writes Stoll, who as a displaced astronomer becomes the book’s unlikely hacker-hunting protagonist. “Now that securities are just bits in a computer’s memory, thieves go after the passwords.”
Stoll’s account was one of the first to show how a hacker armed with little more than a dictionary and a Unix computer could crack any password in the English language, even when the passcode was stored as only hash on a hacked machine. At one point, Stoll compares the crypto function—which was then based on the now-antiquated Data Encryption Standard (DES)—to a one-way meat grinder that converts each human-readable word into unique ciphertext.
“Did this hacker have a magic decryption formula?” Stoll asks. “If you turn the crank of a sausage machine backwards, pigs won’t come out the other end.”
Only later would Stoll learn that the hacker was feeding each word of the dictionary—starting with aardvark and ending with zymurgy—into the same DES hash function the hacked Unix systems used. The intruder then compared the output to the ciphertext contained in the intercepted password files.
“This was serious stuff,” Stoll wrote. “It meant that every time I’d seen him copy a password file, he could now figure out legitimate users’ passwords. Bad news.”
Stoll didn’t know it at the time, but even as the intruder was using a dictionary to guess his users’ passwords, cryptographers were fashioning a new type of attack that would ultimately be able to crack orders of magnitude more hashes in a faction of the time.
The rainbow connection
The germ of this new approach originated with Martin E. Hellman. In 1980, Hellman published a paper titled “A Cryptanalytic Time-Memory Trade-off” that proposed what came to be called Hellman tables. These tables were compiled ahead of a password attack and worked by using precalculated data stored on disk. Hellman tables reduced the computing resources required to crack a DES hash from about $5,000 to just $10. In 2003, fellow cryptographer Phillippe Oechslin proposed refinements to Hellman’s technique that vastly improved the effectiveness.
The result is now what’s known as rainbow tables. Almost overnight, they changed the way people went about cracking large numbers of password hashes. Like earlier time-memory tradeoffs proposed by Hellman, the concept was simple. Rather than asking a computer to enumerate each possible password in real-time and compare it against a targeted hash, precalculated data was stored in memory or on disk in a highly compressed form to speed up the process and lower the computing requirements needed to brute force huge numbers of hashes.
While earlier techniques had also tried this approach, they produced tables that were unnecessarily large and therefore unwieldy for cracking passwords. The genius of rainbow tables is a complex mathematical formula that expresses virtually every possible password combination without requiring each one to be stored in memory or on disk. Each table targets a specific algorithm and keyspace, and it contains a collection of chains. Each chain starts with an arbitrary password on one side and ends with a single hash value on the other end. The beginning password is put through the algorithm to generate its hash, and that value is then passed through one of many different “reduction functions” to generate a new password guess. The new password is then hashed.
The process continues until the hash at the end of the chain is reached.
The breakthrough wasn’t just the speed with which the tables could crack passwords; it was also their ability to crack almost every possible password as long as it didn’t fall outside the targeted keyspace. Rainbow tables are believed to get their name because each chain link uses a different reduction function, but all chains follow the same pattern—much as each color in a rainbow is different but all rainbows follow the ROYGBIV pattern.
The space savings alone are huge. Storing a table of every possible 10-character password with only lowercase letters, along with its corresponding MD5 hash, would require about 3,108 terabytes of disk space. A rainbow table expressing 99.9 percent of those combinations, by contrast, requires just 167 gigabytes.
In the era of Windows XP, when Microsoft’s underlying LAN Manager restricted password lengths to no more than 14 characters that at maximum were converted into two seven-character passwords and that converted all letters into uppercase, the results were devastating. In 2003, hackers releasedOphcrack, an open-source program that used rainbow tables to crack most Windows passwords in just minutes. Even more powerful cracking applications quickly followed.
“The fact that you can have this thing that anyone can download that will crack literally any Windows XP password hash was really cool,” said Marlinspike, who has designed CloudCracker, a service that takes about 20 minutes to check a WiFi password against 300 million possible words. “It’s not like I got 20 percent, or 50 percent, or even 80 percent. You got all of them. That was a major thing.”
The huge advances in GPU-assisted password cracking have diminished much of the advantages of rainbow tables, however. Passwords with six or fewer characters can be brute-force cracked with less fuss using GPU-powered computers, while passwords longer than nine or 10 characters require rainbow tables with unwieldy file sizes. That leaves only a small sweet spot of seven or eight characters where rainbow tables are especially useful these days.
Still, the tables maintain their status as a useful, if niche, tool for some hackers. Witness Free Rainbow Tables, a project that allows volunteers to donate spare computer cycles to generate publicly available tables that crack hashes returned by algorithms including SHA1, MD5, and NTLM. Its organizers have already amassed six terabytes worth of data. And with the participation of more than 3,900 volunteer computers, Free Rainbow Tables adds an estimated 36 megabits of table data every second, according to James Nobis, one of the developers behind the project.
Needs more salt
An updated version of LAN Manager known as NTLM was introduced with Windows NT 3.1. It lowered the susceptibility of Windows passwords to rainbow table attacks, but didn’t eliminate the risk. To this day, the authentication system still doesn’t apply cryptographic “salt” to passwords to render such attacks infeasible.
Salting appends several unique characters to each account password before running it though a cryptographic function, a process that blunts the value of rainbow tables and other types of precomputed attacks. A 16-bit salt, for example, requires 65,535—or 216—separate tables to be defeated. A random salt of 32 bits makes rainbow table attacks even more impractical by pushing the number of tables required to more than four billion. (The salt must be saved for each user and is usually stored beside the user name and password hash, so the information is available during each user login. Salt is rarely kept apart from the hash. Even when known, its virtue lies in its uniqueness, which defeats pre-computation of results.)
To illustrate what this looks like in practice, we created a new Linux account for “testuser.” The operating system stored the login data in a single long line of text (kept in /etc/shadow, where Linux stores passwords):
The line is broken up by colons—first comes the username, then the lengthy password section, then data about when the password was last changed, how old it is, when the account expires, and more.
The important bit for our purposes is the password section, which is internally divided by $ symbols. First comes the number that identifies the hashing algorithm used—in this case, 6 corresponds to the SHA-512 algorithm. Next is the salt, 2lvEhpi5. Finally, there’s the hash itself, a long string of letters and symbols.
In addition to making rainbow-table attacks infeasible, salting can also significantly add to the resources required to carry out more traditional cracking attacks, since it ensures that each stored hash is unique even if two users choose the same passcode. That, in turn, requires each hash in a compromised table to be cracked separately, even if they mask one or more identical plaintext passwords.
Despite the benefit of the technique, and the relative ease of implementing it, a surprising number of websites—including LinkedIn, Yahoo, and eHarmony—didn’t use it when they were recently breached. Hashes derived from NTLM, because they never use salting, are among the easiest to crack.
To the detriment of millions of Internet users, going without salt is only one of the many sins that popular websites routinely commit against password security.
No, SHA1 is not a secure hashing algorithm
A large percentage of the sites that fall prey to password breaches commit another error that further diminishes the protection of hashes: they use algorithms that were never designed to protect passwords. That’s because SHA1, DES, and MD5 were designed to convert plaintext into hashes extremely quickly using minimal computing resources, and this is exactly what people running password cracking programs want most. (NTLM, which still uses MD4, is also highly susceptible to cracking.)
To appreciate just how poor a password hashing choice these unsalted algorithms are, consider this: It took independent security researcher Jeremi Gosney about six days to crack more than 90 percent of the 6.5 million SHA1 hashes exposed in the LinkedIn breach. He recovered a fifth of the plaintext passwords in just 30 seconds. In the following two hours, he cracked another one-third of them. One day into the exercise, he had recovered a total of 64 percent, and in the five days that followed he cracked another 26 percent.
A key part of his success—besides his 500-million-strong word list and a computer equipped with four AMD Radeon HD6990 graphics cards—was the decision by LinkedIn engineers to hash passwords using SHA1. The algorithm uses a single cryptographic iteration to convert plaintext, allowing Gosney’s system to cycle through more than 15.5 billion guesses per second.
By contrast, algorithms specifically designed to protect passwords are engineered to require significantly more time and computation to convert plaintext into hashes. For instance, SHA512crypt, which is included in Mac OS X and most Unix-based operating systems, passes text through 5,000 iterations, a hurdle that would have limited Gosney to slightly less than 2,600 guesses per second. The Bcrypt algorithm is even more computationally expensive, in large part because it subjects text to multiple iterations of the Blowfish cipher that was deliberately modified to increase the time required to generate a hash. PBKDF2, a function built into Microsoft’s .Net software developer framework, offers similar benefits.
These computationally expensive functions require increased server processing, of course. This can increase the strain on Web servers and could even open them up to new types of DoS attacks, said Matt Weir, a Florida State University post-doctoral student whose PhD focused on passcodes. But the benefit in improved security largely outweighs the investment, many security experts argue. Had LinkedIn engineers used Bcrypt, for example, Gosney would have been able to make fewer than 1,750 guesses per second.
“If the LinkedIn passwords had been hashed using bcrypt, I never would have been able to crack 90 percent of them,” he told Ars in an e-mail. “The number of attacks I had to run, combined with the sophistication of the attacks I had to run to get many of the passwords [more than] 15 characters, would have taken literally centuries to finish. I would have given up after about a week.”
Hitting the wall
Even powerful computation engines have trouble cracking longer passwords using brute force. Assuming such an attack checks for all combinations of all 95 letters, numbers, and symbols available on a standard English-language keyboard, it takes a matter of hours for a desktop computer with anIntel Core i7 980x processor to brute-force crack any five character password. Increasing the password length by just one character requires about a day; bumping the length by one more character, though, dramatically increases the cracking time to more than 10 days. Rob Graham, the Errata Security CEO who calculated the requirements, refers to this limitation as the “exponential wall of brute-force cracking.”
Adding a GPU card to a system undoubtedly helps, but not as much as many might imagine. An AMD Radeon 6970 still needs more than 10 days to brute force a seven-character passcode. And the wall barely budges even when significantly more powerful resources are brought to bear. Using an Amazon EC2 cloud system that combines the horsepower of more than 1,000 individual GPUs, it still takes about 10 days to brute-force an eight character password.
But with few exceptions, the exponential wall rarely impedes most password crackers. As demonstrated by the RockYou dump, the typical person is notoriously sloppy when choosing a passcode. A full 70 percent of them contained eight characters or less. Only 14 million of the 32 million total were unique, showing that a large percentage of passwords are duplicates. Atom, the Hashcat developer and password-cracking expert, estimates that 66 percent of entries from the typical unsalted hash list can be cracked by a single person in less than two days.
So what can the average person do to pick a passcode that won’t be toppled in a matter of hours? Per Thorsheim, a security advisor who specializes in passwords for a large company headquartered in Norway, said the most important attribute of any passcode is that it be unique to each site.
“For most sites, you have no idea how they store your password,” he explained. “If they get breached, you get breached. If your password at that site is unique, you have much less to worry about.”
It’s also important that a password not already be a part of the corpus of the hundreds of millions of codes already compiled in crackers’ word lists, that it be randomly generated by a computer, and that it have a minimum of nine characters to make brute-force cracks infeasible. Since it’s not uncommon for people to have dozens of accounts these days, the easiest way to put this advice into practice is to use program such as 1Password or PasswordSafe. Both apps allow users to create long, randomly generated passwords and to store them securely in a cryptographically protected file that’s unlocked with a single master password. Using a password manager to change passcodes regularly is also essential.
Given the sophistication of the crackers, anything less simply means your password is trivial to break.
“The whole password-cracking scene has changed drastically in the last couple years,” said Weir, the Florida State University post-doctoral student. “You can look online and you can generally find passwords for just about everyone at some point. I’ve found my own username and passwords on several different sites. If you think every single website you have an account on is secure and has never been hacked, you’re a much more optimistic person than I am.”