The other day I saw an interesting abuse of hash functions.
In an application that processed strings, there was a part that compared medium-sized strings. Instead of using the built-in string comparison routine, the authors calculated the hash values of the strings and compared the hashes. Clever?
Wrong!
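Computing the hash value (MD5 in this case) of a string is an expensive task: the hash has to process every byte of both strings, whereas a direct comparison can stop at the first character that differs. If you do not retain the hash values for future purposes, just compare the strings directly to know whether they are equal.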
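To make the difference concrete, here is a minimal Python sketch of the two approaches. The function names (equal_direct, equal_by_md5, equal_with_stored_digests) are made up for illustration and are not taken from the application in question, and the timing harness only hints at the relative cost on one machine; it is not a benchmark.

```python
import hashlib
import timeit


def equal_direct(a: str, b: str) -> bool:
    # Built-in comparison: no extra work beyond looking at the strings themselves.
    return a == b


def equal_by_md5(a: str, b: str) -> bool:
    # Hash-based comparison: every byte of BOTH strings is hashed
    # before the two 16-byte digests are compared.
    digest_a = hashlib.md5(a.encode("utf-8")).digest()
    digest_b = hashlib.md5(b.encode("utf-8")).digest()
    return digest_a == digest_b


def equal_with_stored_digests(a: str, digest_a: bytes, b: str, digest_b: bytes) -> bool:
    # Assumption: the digests were computed earlier and kept around.
    # Different digests prove the strings differ; equal digests are
    # confirmed with a direct comparison.
    return digest_a == digest_b and a == b


if __name__ == "__main__":
    # Two medium-sized strings that differ in the very first character.
    s1 = "a" + "x" * 2000
    s2 = "b" + "x" * 2000
    print("direct:", timeit.timeit(lambda: equal_direct(s1, s2), number=100_000))
    print("md5:   ", timeit.timeit(lambda: equal_by_md5(s1, s2), number=100_000))
```

The third function shows the one case where hashing can pay off: when the digests are already stored, comparing them is a cheap first check, and the direct comparison afterwards settles the result.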
And, you know, hash collisions could provide false positives. Even if the performance wasn’t an issue, you should still never do this.
Right.
If you ask me, it sounds like a good idea… if you are storing the string in the DB and comparing large strings, it would be perfect. And how on earth are you going to get false positives?
Um, how about providing some data to back up that claim?