Leet Code problem #49 - Group Anagrams

2023-07-08

In my free time i like to work on Leet code problems. One that i stumbled upon recently is the problem Group anagrams builds on top of the simpler problem to evaluate if a string is an anagram or not.

Here is a description of the problem:

You are given an array of strings. returns an array of arrays of strings so that each array of strings is contains anagrams and every string from the input array must be in some array of the result array. In other words all anagrams from the input are grouped together.

Examples:

Input: strs = ["eat","tea","tan","ate","nat","bat"] Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Input: strs = [""] Output: [[""]] Input: strs = ["a"] Output: [["a"]]

Constraints:

1 <= strs.length <= 10^4 0 <= strs[i].length <= 100 strs[i] consists of lowercase English letters.

So how do we solve this?

Lets break the problem down and first look at the problem of determining wether two strings are anagrams.

Take the two strings ["hello", "olleh"]. How do we know if they are an anagram or not? The rule of anagrams is that they have to contain exacly the same letters. So if we sort the two strings that means they have to be identical. so we can then loop through each character in the strings and compare them. if they are all the same, they are an anagram.

"hello" -> "ehllo"
"olleh" -> "ehllo"

or in a negative case ["hej", "jah"]

"hej" -> "ehj"
"jah" -> "ahj"

We can also see that if two strings are of different length, they cannot possibly be anagrams. So we can start by ruling out two strings in our code if their lenght does not match.

Now given that they are the same length, let us turn our attention to the complexity of the case that we compare them by sorting. Our original problem require us to compare many of these strings and the strings can be also be long. so we must have a decent time and space complexity. First we know that sorting is genereally done in O(n*log n) where n is the number of elements we are sorting. So comparing two strings will make n to the length of the strings and first do an O(n*log n) sort and then we will iterate through strings which is done relative to the length of the strings O(n).

Regarding memory compelexity we are not storing any intermedieary values, everything is done in place, so we have constant memory complexity O(1).

But is this the fastest way to see if two strings are anagrams? No. There is of course a way to do a memory time trade-off as in many other algorithmic problems. If we use a hashmap to represent the strings, we can compare the hashmaps instead of comparing sorted strings. The power we get is that a hashmap have constant time access and constant time insertion regardless of position because we are using key value pairs. The representation we will be using is the following:

1 { 2 'h': 1 3 'e': 1 4 'l': 2 5 'o': 1 6 }

Here we see that for the word "hello" we store the numbers of characters as the value for the key which is the character itself. For two strings to be an anagram the hashmaps must be identical, meaning they have the same keys and the same values. Some languages have iterable hashmaps and some don't, but we can either chose to iterate through a hashmap or iterate through one of the strings and letting each character select a key value pair in the both hashmaps, since each character is a key in the hashmap. We here have the possibility that we try to select a key that doesn't exist, which is the negative case it also differs from language to language how this is dealt with. Imporant to note is that to simplyfy the comparison of the hashmaps we have to make sure that the strings are the same length first, otherwise we risk getting false positives.

Okay so what is the time and space complexity here? We are now only iterating through each array a constant amount of times and no sorting is required thanks to the hashmap. This means that we are in O(n) time. But this doesn't come for free. We now need to create a hashmap for each string in the input array which makes space in O(n).

So how do we use this strategy in our extended problem where we have to not only compare anagrams but also group them together?

My solution was the following:

I use the same strategy to compare anagrams, but i also have to map them to a certain group. I used another hashmap to do this mapping.

the format will then become the following:

{
    "hello": ["elloh", "hello"]

}

Which we can easily transform into an array of arrays, since it is stated in the problem that the order of the arrays does not matter.

I am learning golang at the moment and will therefore be providing my solution in golang. It is easy to read and i don't use any advanced language specific concepts.

1 2func groupAnagrams(strs []string) [][]string { 3 // a map of results 4 results := make(map[string][]string) 5 keymap := [26]int{} 6 7 for k := 0; k< 26; k++ { 8 keymap[k] = 0 9 } 10 11 for _, word := range(strs){ 12 13 for _, letter := range(word){ 14 keymap[letter-'a']++ 15 } 16 17 hash := "" 18 for key, val := range(keymap) { 19 for ;val > 0; val--{ 20 hash += string(rune(key+'a')) 21 } 22 } 23 24 results[hash] = append(results[hash], word) 25 26 } 27 28 res := [][]string{} 29 30 for _ , list := range(results){ 31 res = append(res,list) 32 } 33 return res 34} 35 36

So what is going on here?

  • first we create a resuls map which will have the format { "ehllo": ["hello","olleh"]}

  • The main grunt of the work is done within the for loop, where iterate through the input array strs. We start by creating a hashmap for the string. I chose to create a map containing all the possible keys and setting them to 0.

  • we then iterate through each character of the current string. We increment the value of the key of each character. We do in a similar as described in the simplified version with a few small change. Instead of mapping a character to a value, we are using the characters ASCII value to map a integer from 0-25 to a value instead. This is beacuse we need to care about order of the characters to create inside our mapping and to create a hash to store it in our result map. And an ordered map does not have O(1) insertion, so we use a trick with ASCII values to store the characters in an array, making the index be calculated through the ascii value of the character 'a' to 'z'. This works because ASCII values are orderd and each is represented with a numerical value. The conversion is done with the letter- 'a'. So for the string "hello" we get:

1 for _, letter := range word { 2 keymap[letter - 'a']++ 3 } 4// a b c d e f g h i j k l m n o p q r s 5// [0,0,0,0,1,0,0,1,0,0,0,2,0,0,1,0,0,0,0,...] 6

characters a-z are mapped to indecies 0-25. and the value at index for the letter is incremented.

'e' maps to index 4 and h maps to index 7 etc.

Next we need to create a hash for this word so that we can put it into the correct group.

This is done by iterating over the keymap and seeing what characters are in the map.

1 hash := "" 2 for key, _ := range(keymap) { 3 for ;keymap[key] > 0; keymap[key]--{ 4 hash += string(rune(key+'a')) 5 } 6 } 7

We now take the key append its character value to the hash, while also decrementing the values in the keymap. This makes us create the correct hash, snice the keymap is a ordered array, while simuntanioulsy reseting the keymap for the next word in our loop.

when we have the correct hash, we can just append the current word to the results map. with

1 results[hash] = append(results[hash]+ word)

Iterating through all the words makes us create a results map that contains all the groups. Now we just need to format this hashmap into its desired structure, which is an 2d array.

1 res := [][]string{} 2 3 for _ , list := range(results){ 4 res = append(res,list) 5 } 6

Nothing interesting here, just simple insertion in a new array by looping over the values in the results map.

lastly we can return the res 2d array and we are done with the algorithm

So what is the time and space now? we know that the comparison between 2 words is linear to the length of the word. We know that iterating thought the list of words is only done once, which is then linear to the lenght of the input array. So we are doing a O(k) * O(n) = O(k*n) where k is the length of the input list, and n is the average length of the words in the list. Or if you want to be conservative you can use the upper bounds of the constraints to define k and n. k= 100 , n = 10^4

The space here is linear with respect to the number of words in the input since we store every word once in our results map. Our keymap only allocates constant space. O(k*n) + O(1) = O(k*n)

Thank you for reading!