So 15 years later I reopened Introduction To Algorithms By Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, Clifford Stein. And I decided to see if my experience in coding would help me appreciate this book for all the knowledge it could bring me.
So I decided to take a look at the simple heap structure and algorithms:
Let's look at the pseudo code (see here for the whole chapter):
When I see this pseudo code, and think of actual production code, I am dazzled:
- it has meaningless variable;
- API and variable names (like the global heap_size) are quite hard to grok;
- I wonder if recursion is really important knowing it is hard to read, debug, and may reach a recursion limit;
- I have a hard time understanding how it works.
Here is my heapify function (the whole code is here https://gist.github.com/3927135 if you want to check the correctness):
def heapify(_array,index=1, heap_size=SENTINEL): """bubbles down until it is correctly located. heap_size can be used internally for optimization purpose""" if heap_size == SENTINEL: heap_size= len(_array) a_child_is_bigger= True while a_child_is_bigger: largest= index left_pos, right_pos = left_offset(index), right_offset(index) #check left if left_pos < heap_size and _array[left_pos] > _array[index]: largest= left_pos #check right if right_pos < heap_size and _array[right_pos]>_array[largest]: largest= right_pos if largest == index: # we are finally the king of the hill, end of story a_child_is_bigger= False else: #swap with child // bubble down _array[index], _array[largest] = _array[largest], _array[index] index= largest
And by coding it revealed what I feared the most: CS and computer developers don't live in the same world: the way a code is written matters.
- getting read of the asymmetric test line 5 greatly improves the understanding of the logic;
- by using full name for tests conditions you really help people (you included) understanding thus maintaining your code;
- recursion comes in the way of understanding straightforward code;
- one letter variable names are really unhelpful variable names;
- their code typography is even worse than mine.
By the way if you want really efficient heap structure in python please use : http://docs.python.org/library/heapq.html because it is just plain better, superbly documented, tested and the source code is quite nice too.
Now, I am plain lost: when I check CS papers because I want to solve a problem I often stumble on unreadable pseudo code and verbiage. Understanding the API, the variable name ... takes an awful lot of time. When I write in python I have the feeling that it is a Pseudo Code Translator, so if we assume my implementation is close visually and logically from pseudo code, does it costs that much to improve pseudo code readability?
If I can do it knowing that I am one of the most stupid developer among the one I know, why or how can't they do the same. Pseudo code is for sharing knowledge, do we really need to waste time de-ciphering these pseudo codes to be a real or better developer?
When you see equivalent code in industry grade application, you normally hate that. It is plain bad practices. Isn't computer sciences supposed to teach elite grade developers?
And when I see such code, I sometime wonder if the guy understood what he wrote.
Meaningful variables names, test conditions, readability is what matters, because code is meant to be improved and fixed.
With all these questions in my head I came to a conclusion: I don't take for profound what I think is uselessly obscure, so my opinion of CS papers and academics has dramatically dropped.
EDIT: now, I see that bulbbling down and bubbling up can be extracted from the logic, and I guess it can be used to make a heap usable for min/max efficient inserting/extraction.