An Insight into Hashing: The Backbone of Cybersecurity

Hashing

Hashing is a fundamental concept in computer science and cybersecurity. Hashing is the process of transforming a string of data characters into another form, usually a shorter value of fixed length. This transformation is generally irreversible and does not involve a key, making it a one-way function. Hashing has many applications in computer science, such as indexing, fast data retrieval, security, digital signatures, and cryptography.

Hashing in action: Suppose we have a set of strings {“ab”, “cd”, “efg”} and we want to store them in a table with 10 slots. We can use a simple hash function that takes the ASCII code of the first letter of the string and reduces it to a number from 0 to 9 with the modulo operator.

For example, the string “ab” would have a hash of 97 % 10 = 7, where 97 is the ASCII code of ‘a’ and % is the modulo operator. The string “cd” would have a hash of 99 % 10 = 9, and “efg” would have a hash of 101 % 10 = 1.
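The toy hash function from this example fits in a few lines of Python. The name `first_letter_hash` and its `table_size` parameter are hypothetical; the function only mirrors the example and is far too weak for real use (every word starting with the same letter collides).

```python
def first_letter_hash(s: str, table_size: int = 10) -> int:
    """Hash a string by taking the ASCII code of its first character modulo table_size."""
    return ord(s[0]) % table_size


for word in ["ab", "cd", "efg"]:
    print(word, first_letter_hash(word))
# ab 7
# cd 9
# efg 1
```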

Encryption

Encryption is another process of encoding data, this time with a key, and unlike hashing, it is reversible: anyone holding the right key can decrypt the data back to its original form. Encryption is used to protect data from unauthorized access or modification. Encryption algorithms use a variety of techniques to encrypt and decrypt data. Some of the common types of encryption are:

  • Symmetric key encryption: This is a type of encryption that uses the same key to encrypt and decrypt the data. The key must be shared between the sender and the receiver in a secure way.

    Some examples of symmetric key encryption algorithms are the Caesar cipher, AES (Advanced Encryption Standard), 3DES (Triple Data Encryption Standard), etc.

  • Asymmetric key encryption: This is a type of encryption that uses two different keys: a public key and a private key. The public key can be used to encrypt the data and the private key to decrypt it, or vice versa. The public key can be shared with anyone, but the private key must be kept secret by the owner.

    Some examples of asymmetric key encryption algorithms are RSA (Rivest-Shamir-Adleman), ECC (Elliptic Curve Cryptography), etc.
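To make the two key models concrete, here is a minimal Python sketch built on deliberately insecure toy versions: a Caesar cipher (symmetric, because the same shift both encrypts and decrypts) and textbook RSA with tiny primes (asymmetric, because the public exponent encrypts and only the private exponent decrypts). The function name `caesar` and the key values (shift 3, p = 61, q = 53, e = 17) are illustrative assumptions; real systems should use a vetted cryptography library with proper key sizes and padding.

```python
import string


# --- Symmetric: Caesar cipher (toy, insecure) ---
def caesar(text: str, shift: int) -> str:
    """Shift each lowercase letter by `shift` positions; other characters are unchanged."""
    letters = string.ascii_lowercase
    out = []
    for ch in text:
        if ch in letters:
            out.append(letters[(letters.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


key = 3
ciphertext = caesar("attack at dawn", key)   # encrypt with the shared key
plaintext = caesar(ciphertext, -key)         # decrypt with the same key

# --- Asymmetric: textbook RSA (toy, insecure, no padding) ---
p, q = 61, 53
n = p * q                   # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                      # public exponent: public key is (e, n)
d = pow(e, -1, phi)         # private exponent via modular inverse (Python 3.8+)

message = 65                # a message encoded as an integer smaller than n
c = pow(message, e, n)      # anyone can encrypt with the public key
m = pow(c, d, n)            # only the private-key holder can decrypt

print(ciphertext, "->", plaintext)   # dwwdfn dw gdzq -> attack at dawn
print(c, "->", m)                    # 2790 -> 65
```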

Data Retrieval

One of the common uses of hashing is to store and retrieve data efficiently. A hash function converts a key into an integer value, which can be used as an index in a hash table. A hash table is a data structure that maps keys to values using an array. With a good hash function and a table that is not too full, data can be looked up in constant time (O(1)) on average, regardless of the size of the data set; in the worst case, when many keys collide, a lookup can degrade to O(n).
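A minimal sketch of how a hash value becomes an array index. The 8-slot table and the sum-of-character-codes hash are illustrative assumptions (that hash is weak: anagrams such as "listen" and "silent" always collide), and this version simply overwrites a slot on a collision; collision handling is discussed in the Collision section. In everyday Python code, the built-in `dict` does all of this for you.

```python
TABLE_SIZE = 8  # hypothetical number of slots in the backing array


def slot_for(key: str) -> int:
    """Hash the key (here: sum of its character codes) and reduce it to an array index."""
    return sum(ord(ch) for ch in key) % TABLE_SIZE


table = [None] * TABLE_SIZE


def put(key: str, value: int) -> None:
    # Simplification: overwrites whatever is already in the slot (no collision handling).
    table[slot_for(key)] = (key, value)


def get(key: str):
    entry = table[slot_for(key)]      # a single array access, however many keys are stored
    if entry is not None and entry[0] == key:
        return entry[1]
    return None                       # key missing, or evicted by a colliding key


put("apple", 1)
put("banana", 2)
put("cherry", 3)
print(slot_for("apple"), slot_for("banana"), slot_for("cherry"))  # 2 1 5
print(get("banana"))                                              # 2
```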

Digital Signatures

Another use of hashing is to authenticate the sender of a message. A digital signature is a way of verifying the integrity and authenticity of a document or message.

A digitally signed message has two parts: the document itself and the signature. To create the signature, the sender computes a message digest, a hash of the document that serves as its fingerprint, and encrypts the digest with the sender’s private key. The signature is then sent to the receiver along with the document.

The receiver decrypts the signature with the sender’s public key to recover the digest, rehashes the document, and compares the two values. If they match, the document is verified.

Here is an example of how a digital signature works:

Alice wants to send a contract to Bob and sign it digitally.
Alice uses a hashing algorithm, such as SHA-256, to create a message digest of the contract. The message digest is a fixed-length value that, in practice, uniquely represents the contract.
Alice encrypts the message digest with her private key, which only she knows. This encrypted message digest is her digital signature.
Alice sends the contract and her digital signature to Bob.
Bob receives the contract and the digital signature from Alice.
Bob decrypts the digital signature with Alice’s public key, which is available to anyone. This gives him the message digest that Alice created.
Bob uses the same hashing algorithm that Alice used to create a new message digest of the contract.
Bob compares the new message digest with the one he decrypted from Alice’s digital signature. If they are identical, then he knows that the contract has not been altered and that it was signed by Alice.
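The steps above can be sketched in Python with the standard `hashlib` module and a textbook RSA key built from tiny primes (p = 61, q = 53, an illustrative assumption). This is only a toy to show the order of operations: the SHA-256 digest is reduced modulo the tiny modulus n, which throws away most of its bits, and there is no padding. Real signatures use large keys and a standard scheme (for example RSA-PSS or ECDSA) through a vetted library. The names `digest`, `sign`, and `verify` are hypothetical.

```python
import hashlib

# Toy RSA key pair for Alice (insecure, illustration only).
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # Alice's public key is (e, n)
d = pow(e, -1, (p - 1) * (q - 1))    # Alice's private key is (d, n)


def digest(document: bytes) -> int:
    """SHA-256 the document, then squeeze the result into the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % n


def sign(document: bytes) -> int:
    # Alice: hash the contract, then "encrypt" the digest with her private key.
    return pow(digest(document), d, n)


def verify(document: bytes, signature: int) -> bool:
    # Bob: "decrypt" the signature with Alice's public key and compare with a fresh hash.
    return pow(signature, e, n) == digest(document)


contract = b"Alice sells Bob one bicycle for $100."
signature = sign(contract)
print(verify(contract, signature))   # True
# With a modulus this small, an altered contract matches by chance
# roughly once in 3,233 tries; real key sizes make that negligible.
print(verify(b"Alice sells Bob one bicycle for $1.", signature))
```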

Cybersecurity

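Hashing can also be used to enhance cybersecurity by detecting data tampering. For example, a sender can hash a file and attach the hash to the file before sending it to the receiver. The receiver can rehash the file and compare the result with the attached hash to check whether the file was modified on the way. On its own, this catches accidental corruption; to catch deliberate tampering, the hash itself must be protected (for example, published over a trusted channel, digitally signed, or replaced by a keyed hash such as an HMAC), because an attacker who can change the file can also recompute an unprotected hash.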
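A minimal Python sketch of this check using `hashlib.sha256` from the standard library. The function names `file_sha256` and `is_unmodified` are hypothetical, and a temporary file stands in for the transferred file. The comparison only means something if the expected hash reaches the receiver through a source the attacker cannot modify.

```python
import hashlib
import hmac
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unmodified(path: str, expected_hex: str) -> bool:
    """Receiver side: rehash the file and compare it with the hash the sender provided."""
    return hmac.compare_digest(file_sha256(path), expected_hex)


# Demo with a temporary file standing in for the transferred file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"quarterly report")
    path = tmp.name

expected = file_sha256(path)            # sender computes this and sends it along
before = is_unmodified(path, expected)  # True: file unchanged

with open(path, "ab") as f:
    f.write(b"!")                       # simulate a change in transit
after = is_unmodified(path, expected)   # False: the hash no longer matches

os.remove(path)
print(before, after)
```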

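A similar technique is used in blockchain applications, where each block contains a hash of the previous block. Changing any earlier block changes its hash, which then no longer matches the value stored in the next block, so tampering breaks the chain and is easy to detect.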
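A simplified hash chain in Python, assuming blocks are plain dictionaries serialized with `json` and hashed with SHA-256. The helper names are hypothetical, and real blockchains add much more (timestamps, proof of work or other consensus, networking); this only shows why a stored previous-block hash exposes edits to earlier blocks.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, serialized deterministically, with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_chain(records: list[str]) -> list[dict]:
    chain = []
    prev = "0" * 64                  # placeholder "previous hash" for the first block
    for data in records:
        block = {"data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)     # the next block will store this value
    return chain


def is_valid(chain: list[dict]) -> bool:
    # Every block must store the hash of the block before it.
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = make_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dave 1"])
valid_before = is_valid(chain)              # True
chain[0]["data"] = "Alice pays Bob 500"     # tamper with an earlier block
valid_after = is_valid(chain)               # False: block 1 no longer matches
print(valid_before, valid_after)
```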

Cryptography

Hashing is also an essential component of cryptography, which is the science of securing data. Cryptographic hash functions are designed so that their outputs look unpredictable and so that it is computationally infeasible to recover an input from its hash or to find two different inputs with the same hash. These algorithms are used for various purposes, such as password hashing, message authentication codes, key derivation functions, etc.
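As one example of these purposes, a message authentication code (MAC) combines a hash function with a secret key, so only someone who knows the key can produce a valid tag. The sketch below uses Python's standard `hmac` module with SHA-256; the key is randomly generated, and the messages and the `check` helper are made up for illustration.

```python
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)             # shared secret key
message = b"transfer 100 to account 42"

# Sender computes the tag and sends it along with the message.
tag = hmac.new(key, message, hashlib.sha256).digest()


def check(key: bytes, message: bytes, tag: bytes) -> bool:
    """Receiver recomputes the tag with the same key and compares in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


print(check(key, message, tag))                        # True
print(check(key, b"transfer 900 to account 42", tag))  # False
```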

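Some of the popular hashing algorithms are SHA-1, SHA-2, SHA-3, MD2, MD4, MD5, etc. Of these, MD2, MD4, MD5, and SHA-1 are now considered insecure for cryptographic purposes, so new designs should use SHA-2 or SHA-3.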
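Python's standard `hashlib` module implements several of these families. The sketch below hashes the same input with MD5, SHA-1, two SHA-2 variants (`sha256`, `sha512`), and one SHA-3 variant (`sha3_256`) to show that each algorithm always produces an output of its own fixed length. MD2 and MD4 are left out because `hashlib` does not guarantee them, and MD5 and SHA-1 appear only for comparison.

```python
import hashlib

data = b"hello"
sizes = {}
for name in ["md5", "sha1", "sha256", "sha512", "sha3_256"]:
    h = hashlib.new(name, data)         # same input, different algorithm
    sizes[name] = h.digest_size * 8     # output length in bits is fixed per algorithm
    print(f"{name:>8} ({sizes[name]:>3} bits): {h.hexdigest()}")
```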

Collision

A collision occurs when two different inputs produce the same hash value; in a hash table, this means two different keys are assigned to the same slot. Collisions are unavoidable because a hash function maps inputs of any length to outputs of a fixed, usually shorter, length. There are therefore more possible inputs than outputs, so by the pigeonhole principle some different inputs must map to the same output.
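The pigeonhole argument can be demonstrated with a deliberately weakened hash. The sketch below, with the hypothetical helpers `tiny_hash` and `find_collision`, keeps only the first byte of SHA-256, so there are just 256 possible outputs; hashing 257 distinct inputs is therefore guaranteed to produce a collision, and in practice one usually appears after about 20 inputs (the birthday effect).

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """A deliberately weak hash: keep only the first byte of SHA-256 (256 possible outputs)."""
    return hashlib.sha256(data).digest()[0]


def find_collision():
    seen = {}                          # hash value -> first input that produced it
    for i in range(257):               # 257 inputs but only 256 outputs
        data = str(i).encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, h
        seen[h] = data
    raise AssertionError("unreachable by the pigeonhole principle")


a, b, h = find_collision()
print(a, b, h)   # two different inputs with the same 1-byte hash
```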

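Collisions can cause problems in data retrieval, security, and cryptography. In a hash table, colliding keys must share or compete for slots, which slows down lookups; in cryptography, an attacker who can find collisions may be able to pass off one document as another, for example to forge a digital signature.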

There are several ways to reduce the chance of collision or mitigate its effects. Some of them are:

  • Open addressing: This is a method of resolving collisions in hash tables by finding another empty slot in the array for the new key-value pair, for example by checking the following slots one by one (linear probing).

  • Separate chaining: This is another method of resolving collisions in hash tables by using linked lists or other data structures to store multiple key-value pairs in the same slot.

  • Salting: This is a technique of adding a random value to the input of the hashing algorithm to make it produce different results for the same input. This is especially useful for password hashing.
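The two hash-table strategies above can be sketched in Python as follows. Both classes are hypothetical, minimal illustrations with a fixed size of 8 slots: the chaining table stores colliding pairs in a list per slot, and the open-addressing table uses linear probing. Neither supports deletion (linear probing would need "tombstone" markers for that) or resizing, and the probing table cannot store `None` as a key.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size: int = 8):
        self.slots = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.slots[hash(key) % len(self.slots)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # update an existing key
                return
        bucket.append((key, value))        # colliding keys share the same list

    def get(self, key):
        bucket = self.slots[hash(key) % len(self.slots)]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)


class LinearProbingHashTable:
    """Open addressing with linear probing: on a collision, try the next slot."""

    def __init__(self, size: int = 8):
        self.keys = [None] * size
        self.values = [None] * size

    def _probe(self, key):
        start = hash(key) % len(self.keys)
        for step in range(len(self.keys)):
            i = (start + step) % len(self.keys)
            if self.keys[i] is None or self.keys[i] == key:
                return i
        raise RuntimeError("table is full")  # a real table would resize before this

    def put(self, key, value):
        i = self._probe(key)
        self.keys[i], self.values[i] = key, value

    def get(self, key):
        i = self._probe(key)
        if self.keys[i] is None:
            raise KeyError(key)
        return self.values[i]


for table in (ChainedHashTable(), LinearProbingHashTable()):
    table.put(1, "one")
    table.put(9, "nine")   # small ints hash to themselves, and 9 % 8 == 1 % 8: a collision
    print(table.get(1), table.get(9))   # one nine
```

With chaining, keys 1 and 9 end up in the same list in slot 1; with linear probing, key 9 finds slot 1 taken and moves on to slot 2.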

Salting

Salting is a way of preventing identical passwords from producing identical hashes and of making password hashes harder to crack. Salting refers to adding a value (called a salt) to the input of the hashing algorithm so that it gives unique results, even if the same password is passed in.

For example, if two users have the same password “123456”, then hashing them will produce the same hash value. However, if we add a different salt to each password before hashing, such as “123456+salt1” and “123456+salt2”, then they will produce different hash values.
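This example can be reproduced in Python with `hashlib` and the `secrets` module, which provides cryptographically secure random bytes. Plain SHA-256 and a 16-byte salt are used only to show the effect; real password storage should use a slow, dedicated password-hashing function rather than a single fast hash.

```python
import hashlib
import secrets

password = "123456"

# Without a salt, identical passwords give identical hashes.
unsalted_1 = hashlib.sha256(password.encode()).hexdigest()
unsalted_2 = hashlib.sha256(password.encode()).hexdigest()

# With a random salt per user, the same password gives different hashes.
salt_1 = secrets.token_bytes(16)
salt_2 = secrets.token_bytes(16)
salted_1 = hashlib.sha256(salt_1 + password.encode()).hexdigest()
salted_2 = hashlib.sha256(salt_2 + password.encode()).hexdigest()

print(unsalted_1 == unsalted_2)  # True
print(salted_1 == salted_2)      # False
```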

Hash functions are deterministic, meaning that they always produce the same output for the same input. Salting does not change that; instead, because each user's password is combined with a different salt, the inputs differ, so identical passwords end up with different stored hashes. Salts should be generated by a cryptographically secure random number generator, which ensures that they are random and unpredictable.

Salting prevents some common attacks on password hashing, such as:

  • People tend to use easy passwords that can be guessed or found in dictionaries. If a hacker gets access to the database table that stores password hashes, they can use precomputed tables (such as rainbow tables) that map the hashes of common passwords back to their plaintexts. By looking up each stolen hash in the precomputed table, they can find out the passwords.

  • A hacker can get access to password tables from several websites and compare the hashes in them. For example, if the same hash value appears in different tables, it means that the user has used the same password for different accounts, so cracking it once compromises all of them. Within a single table, identical hashes likewise reveal users who share a password.
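Both attacks can be illustrated with a short Python sketch. The "precomputed table" here is a plain dictionary from SHA-256 hashes to a hypothetical list of common passwords; real rainbow tables use a more compact time-memory trade-off, but the effect is the same. Against unsalted hashes, one lookup cracks each password and equal hashes expose reuse; against salted hashes, the precomputed table matches nothing, and the attacker has to attack each salted hash separately.

```python
import hashlib
import secrets


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Attacker precomputes hashes of common passwords once.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup = {sha256_hex(p.encode()): p for p in common_passwords}

# Leaked unsalted table: each hash is cracked by a single lookup,
# and the identical hashes show that alice and bob share a password.
leaked_unsalted = {"alice": sha256_hex(b"123456"), "bob": sha256_hex(b"123456")}
cracked = {user: lookup.get(h) for user, h in leaked_unsalted.items()}
print(cracked)          # {'alice': '123456', 'bob': '123456'}

# Leaked salted table: the precomputed lookup no longer matches anything.
leaked_salted = {}
for user in ["alice", "bob"]:
    salt = secrets.token_bytes(16)
    leaked_salted[user] = (salt, sha256_hex(salt + b"123456"))
cracked_salted = {user: lookup.get(h) for user, (salt, h) in leaked_salted.items()}
print(cracked_salted)   # {'alice': None, 'bob': None}
```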

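Salting prevents these kinds of attacks by making the hashes unique and unpredictable: an attacker cannot precompute a table without knowing each salt, and identical passwords no longer produce identical hashes. Salts should be different for each user and for each piece of information hashed. Salts do not need to be kept secret (they are usually stored next to the hash), but they must not be easy to guess, so usernames should not be used as salts: they are known in advance and repeat across websites, which lets an attacker precompute tables for them.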
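Putting these rules together, here is a minimal sketch of salted password storage and verification in Python. It uses PBKDF2 through the standard `hashlib.pbkdf2_hmac` function; the article does not prescribe a specific algorithm, and bcrypt, scrypt, or Argon2 are common alternatives. The iteration count is an illustrative assumption (higher is slower for attackers and for your server), and `hash_password` / `verify_password` are hypothetical names.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store. A fresh random salt is generated per call."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)


salt, stored = hash_password("correct horse")   # store both next to the username
ok = verify_password("correct horse", salt, stored)
bad = verify_password("wrong guess", salt, stored)
print(ok, bad)   # True False
```

Because the salt is stored alongside the derived key, verification needs nothing secret beyond the password itself, which matches the point above that salts need to be unique and unpredictable rather than hidden.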

To wrap things up, here's a poem about a computer eating hash browns that Bing Chat wrote using the GPT-4 model:

A computer was hungry one morning
It wanted to try something new
It scanned the internet for recipes
And found one for hash browns to do

It gathered the ingredients needed
Potatoes, oil, salt, and pepper
It shredded the potatoes with its wires
And heated the oil in a skillet

It formed the potatoes into patties
And fried them until they were brown
It sprinkled some salt and pepper on top
And took a big bite with a frown

It did not like the hash browns at all
They were too greasy and bland
It spat them out and deleted the recipe
And decided to stick to its RAM