Password Hashing

Oct 10 2011

sph writes in Hacker News:

The problem is that it’s relatively easy to brute-force attack even a salted hash today.

On the first day, people stored passwords in plain text. If someone got access to your database, they had all the passwords.

On the second day, people decided to hash the passwords so that people couldn’t unencrypt them. Then the attackers created rainbow tables that correlated each hash with its associated password since the password would always hash to the same value (so you have “109BCA” in your database, but they have a table that has that hash and that “12345” hashes to that value).

On the third day, people decided to salt the hashes to render the rainbow tables ineffective. Now, each password would hash to a different value so they couldn’t just look up the password for a given hash. However, as computing power increased it became easy to just brute-force the password. You have the hash and you have the salt so you can just try every password with the hash until you get a match.

  hashed_password = "ACBDEF1234567890"
  salt = "12345"
  possible_passwords = ["password1", "ilikesheep", "micro$oft"]
  possible_passwords.each do |pass|
    if Digest::SHA1.hexdigest("#{pass}#{salt}") == hashed_password
      real_password = pass
    end
  end

The problem is that code like that has gotten really cheap to run and it’s incredibly parallel (you can have loads of machines each take a piece of the workload – oh, and hopefully no one will make a joke that you’d never write that in Ruby; I just felt that would be easy pseudo-code to demonstrate). You can just try combinations of passwords at random, but there are lists of more common passwords that you can try first making it go even faster. Hashing algorithms are meant to be fast because they’re meant to be usable on a large piece of data. As such, it also becomes fast to brute force check all sorts of combinations.

On the fourth day, people started using things like bcrypt because bcrypt was made to be slow. The fact that bcrypt is slow means that if someone wants to brute force it, it will be very slow going for them. If one can check 10,000 SHA1 hashes in a second, but only 10 bcrypt hashes in a second, it will take 1,000x longer to crack the bcrypt stored password (those are just made-up numbers as an example).

Salting is better than not salting because they have to brute force the password. However, as computing power increases it isn’t so much better because brute forcing is becoming easy. One needs to use a slow algorithm to make sure that cracking it will also be slow (prohibitively slow). Bcrypt also allows you to specify how much work you want it to have to do. This way, as computing power increases, you can increase the amount of computing power needed to compute the bcrypt. By contrast, hashes are meant to go fast and so every day they’re getting less secure.

No responses yet

Is genius simply the product of hard work?

Oct 10 2011

Excerpt from chesser‘s comment in Hacker News:

Bobby Fischer was completely obsessed with chess and played and studied it incessantly.

9 years of manic dedication and he “suddenly” got good.

His mother spoke something like 8 languages and his father was a Hungarian physicist who headed the Theoretical Mechanics section of the Naval Ordnance Laboratory and was an expert in elasticity and fluid dynamics.

One famous story about his memory:

“One day when he was in Iceland, Fischer called Frederick Olaffson, Iceland’s only Grandmaster. Olaffson’s Icelandic-speaking daughter answered the phone and explained her parents were out and would return at suppertime. Fischer understood nothing that was said because he did not know the language. But he listened, apologized and hung up. Later that day Fischer met with another Icelandic player who spoke English. He explained what had happened and repeated every Icelandic word he had heard on the phone, imitating the sounds with perfect inflection. The Icelandic player translated the message word for word for Fischer.”

Despite being prodigiously “intelligent”, it still took years of dedication to get good, and years more to get really good.

Plenty of average people undoubtedly beat him at chess when he was a kid. When you think about it, it should be obvious that he was always that smart. Just because you’re a kid and don’t know anything yet doesn’t mean you aren’t smart. “Smarts” is not the same as skill.

Incidentally, his record as the youngest grandmaster in history lasted for many years until it was broken by Judit Polgar, whose father was explicitly running an experiment with his daughters to prove that prodigies are made, not born. All three daughters became chess experts. He explained that Judit, the youngest, was the most successful because she worked the hardest. She’s the only female ever ranked in the top-10 of the “Men’s” ratings list.

No responses yet

Henry Ford on Entrepreneurship

Oct 10 2011

Whether you think you can, or you think you can’t - you’re right.  - Henry Ford

No responses yet

Programming & writing

Oct 10 2011

wrook writes in slashdot:

I swtiched jobs from being a computer programmer to being an ESL teacher in Japan. Japan is somewhat famous for churning out students who know a lot *about* English, but can’t order a drink at Mac Donald’s. We used to have a name for those kinds of people with regard to programming languages: language laywers. They can answer any question you put to them *about* a programming language, but couldn’t program to save their life. These people often make it past job interviews easily, but then turn out to be huge disappointments when they actually get down to work. I’ve read a lot about this problem, but the more I look at it, the more I realise that these disabled programmers are just like my students. They have a vocabulary of 5000 words, know every grammar rule in the book but just can’t speak.

My current theory is that programming is quite literally writing. The vast majority of programming is not conceptually difficult (contrary to what a lot of people would have you believe). We only make it difficult because we suck at writing. The vast majority of programmers aren’t fluent, and don’t even have a desire to be fluent. They don’t read other people’s code. They don’t recognise or use idioms. They don’t think *in the programming language*. Most code sucks because we have the fluency equivalent of 3 year olds trying to write a novel. And so our programs are needlessly complex.

Those programmers with a “spark” are programmers who have an innate talent for the language. Or they are people who have read and read and read code. Or both. We teach programming wrong. We teach it the way Japanese teachers have been teaching English. We teach about programming and expect that students will spontaneously learn to write from this collection of facts.

In language acquisition there is a hypothesis called the “Input Hypothesis”. It states that *all* language acquisition comes from “comprehensible input”. That is, if you hear or read language that you can understand based on what you already know and from context, you will acquire it. Explanation does not help you acquire language. I believe the same is true of programming. We should be immersing students in good code. We should be burying them in idiom after idiom after idiom, allowing them to acquire the ability to program without explanation.

No responses yet

Which road do I take?

Oct 09 2011

plinkplonk says:

One day Alice came to a fork in the road and saw a Cheshire cat in a tree. ‘Which road do I take?’ she asked. ‘Where do you want to go?’ was his response. ‘I don’t know’, Alice answered. ‘Then’, said the cat, ‘it doesn’t matter’.

A good (non generic) answer to “What should I learn?” (asked by a self educated programmer here) depends on the answer to the question “What (kind of programmer) do you want to be?” Step 1, answer the Cheshire cat! )

No responses yet

Steve Jobs on Media and People

Oct 09 2011

When you’re young, you look at television and think, There’s a conspiracy. The networks have conspired to dumb us down. But when you get a little older, you realize that’s not true. The networks are in business to give people exactly what they want. That’s a far more depressing thought. Conspiracy is optimistic! You can shoot the bastards! We can have a revolution! But the networks are really in business to give people what they want. It’s the truth.

- Steve Jobs in WIRED magazine, February 1996

No responses yet

Why it is hard to break hash functions

Jan 13 2011

I was going through the skein hash function when one friend asked why it is difficult to break a cryptographic hash function. A comprehensive introduction to hash functions is well beyond the scope of this article, so if you want to refresh your memory, please read it up from elsewhere and comeback.

Any good hash function must have the following features:

  1. Preimage resistance: given a hash h it should be hard to find any message m such that h = hash(m)
  2. Second preimage resistance: given an input m1, it should be hard to find another input, m2 (not equal to m1) such that hash(m1) = hash(m2).
  3. Collision resistance: it should be hard to find two different messages m1 and m2 such that hash(m1) = hash(m2).

When somebody tells you that it is impossible to break a hash function, you ask the question: What does it mean to break a hash function? The short answer is that if you are able to find out a message corresponding to the given message digest, the hash function is said to be broken.

If you are a programmer, you may think:

If hash function is implemented using some algorithm, shouldn’t it be possible to reverse the algorithm and get back the message corresponding to the message digest? If the algorithm runs step-by-step in the CPU, shouldn’t it be possible to trace back through the steps that resulted in the output?

The answer is of course: No.

Why?

The simple answer is that all hash functions can be considered to be one way functions. One-way functions are mathematical functions using which you can compute the output for any input, but extremely difficult to find out the input when any random output value is given. ie., one-way functions cannot be inversed/reversed.

For example, take the integer factorization problem. It is easy to multiply the numbers 101 and 103, but it is very difficult to find the factors of 10403 (If you don’t know the answer in advance, of course). Of course you can find the factors using a computer, but integer factorization becomes harder and harder as the magnitude of the numbers go up. For large enough numbers, even computers find it incredibly difficult to find the prime factors.

Cryptographic hash functions employ some kind of one-way function so that the message digest cannot be easily reversed to form the corresponding message.

Let us examine the Skein hash function and see why it cannot be broken. (Skein is an example of  a hash function that uses a block cipher as a basic building block. Many modern hash functions have similar structure.)

The basic structure of the skein function is shown in the following figure:

The basic block in the Skein hash is a function called UBI. As you can see there are several inputs to the hash function. The only variable input is the message. This means that we can pre-compute the output of the first UBI block. So in essence our problem becomes:

You would have assumed by now that to reverse this hash function, we will have to reverse the UBI blocks. Let us take a peek inside the UBI block.

The Skein hash function uses Threefish encryption function in its UBI blocks. The original message is considered as the plain-text input to the first UBI block and it is encrypted using the precomputed value from the config block as the key. This encrypted value is input as the key to the last UBI block.

If we try to reverse the last UBI block, we will understand that what we have to do in essence is to retrieve the key of the Threefish encryption given the plain-text and the cipher text. Now if you know anything about encryption algorithms, you would know that it is impossible to retrieve the key given a {plain text, cipher text} pair for a good enough encryption algorithm. This is a basic property of encryption algorithms.

All this essentially boils down to one insight:

Skein hash function is hard to break because it is hard to find the key for the encryption algorithm used inside the UBI blocks – In case of Skein the block cipher used is ThreeFish. Many popular hash functions use some type of block cipher inside them and that is what gives them their strength.

So now the question becomes:

Why is it hard to find the key of an encryption algorithm?

To find out the answer, we should look inside the specific encryption algorithm used. More on that, later.

5 responses so far

« Newer posts Older posts »