News from the cold #2



So, 50 cm of snow fell. Let's be honest, and too bad for those it annoyed: it's fun. I missed every photo in the storm, but then again I could see so little that at one point I opened my eyes and found myself in the middle of the road with cars barreling toward me... huhu. So if I couldn't see anything, I suppose the camera couldn't see much either.

After quite a few snowplow passes, by mid-afternoon the street looked like this (there were snow piles of 1.50 m or more in places):


Otherwise, one of the places where I go for coffee is in the Latin Quarter, and the view, which normally looks like this:

Now looks like this:



In that place I met Nicolas. After five years spending his vacations in Montreal, where he had met his sweetheart, he decided to settle here, and as luck would have it, he got more or less dumped on his arrival in Québec City. It hasn't demoralized him too much, and armed with a tourist visa he intends to find a job.


In this picture he is happy: he took advantage of yesterday morning's sales and managed to buy his bedding at 70% off (the sales are on right now), and I am jealous. I made a little use of the sales myself, but only for a sweater and a pair of jeans. Anyway, I'm not worried about him; I think he'll make it.

I haven't looked for an apartment yet, shame on me :) I know that finding one is a matter of three weeks without breaking a sweat, so I still have a bit of time ahead of me.

Otherwise, I'm having fun confusing the camera by taking shots with mixed lighting and infrared; the results are amusing.




Letter to my mom #1

So, yesterday, while I was peeing in an alley, I was thinking that I should reassure my mom, who stayed back home, that Montreal is a city where life is good. It's true: after five weeks here, I can really feel my expertise :)

And as this project was crossing my mind, I told myself it really had to be done, and that's how I took this photo:


Well, you won't believe me: that is the exact moment the cop who had been following me, to check whether I was urinating illegally, showed up.

He saw me taking pictures, and he apologized.

And all at once, I understood it was karma. You get a dumb idea while peeing, the kind of thing you'd never do, and then suddenly, bam, you know you have to do it because the universe is sending you a sign.

So here is the project: to send you, now and then, a photo or two with a slice of life, so you understand that your son is not unhappy in Montreal.

Let's start with Alex:

We know each other from my favorite café-concert of the moment: L'Escogriffe.

He comes from the Lac Saint-Jean region, like one person in six here. They have accents you could cut with a knife (although they claim that around here I'm the one with the accent...).

He was a freelance graphic designer in IT, then he got tired of chasing money, so he decided to become a metalworker.

Right now he is off to see his family for two weeks. After that, he can't wait to get back to class to learn a welding technique used on pipelines, which would let him work on job sites where ça frette (where it's freezing cold, in proper slang).

He is glad he changed careers. Even though he doesn't miss graphic design, he bought himself a screen-printing workshop and would happily make T-shirts for himself and his friends from work.

That's it...

You're worried about what this is doing on the internet?

Hahaha, as long as you don't share the link, nobody will come read it. And anyway, I'm not ashamed of reassuring my poooor tearful mother whose son is so far away.

Auto documenting and validating named arguments

In real life there are companies where keyword arguments (**kw) are prohibited, and others where positional arguments can stack up as high as seven. Let's state the obvious concerning arguments:

Human beings are flawed


Positional arguments are fixed arguments. Some are mandatory, others are optional (with defaults set at function definition).

Named arguments are a way to pass arguments that are known by their name, not their position.

If we humans were not limited by our short-term memory, then positional arguments would be enough. Alas, we are limited to 7 ± 2 items in memory. I strongly advise you to remember that even if you can go up to 9 because you are a genius, when you are woken up at 3 a.m. after a party for a critical bug, your short-term memory may well drop to 5 ± 2 items. So be prepared for the worst, follow my advice, and stick to 3 mandatory positional arguments as much as you can.

Then, there is the case of named arguments.

Named arguments are great for writing readable function calls, especially when there are a lot of optional arguments, or when calls can be augmented on the fly thanks to duck typing and all the funky stuff that makes programming fun.

However, documenting the function may become complex, because you have to do it by hand, as you can see here:
https://github.com/kennethreitz/requests/blob/master/requests/api.py

Plus, the signature of the function is quite ugly:

    
get(url, **kwargs)
        Sends a GET request. Returns :class:`Response` object.
        
        :param url: URL for the new :class:`Request` object.
        :param **kwargs: Optional arguments that ``request`` takes.

So, even if named arguments are great, they are painful to document (and thus a little less maintainable), and they give a function a cryptic signature when used in the form **kwargs.


Having explicit named arguments with default values is therefore «more pythonic», since:

Explicit is better than implicit. 
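A minimal illustration of that principle (connect() is a made-up example, not a real API): with explicit keyword arguments and defaults, the signature documents itself, and introspection tools see it too:

```python
import inspect

def connect(host, port=1024, timeout=30.0, retries=3):
    """Pretend to open a connection (illustrative stub)."""
    return (host, port, timeout, retries)

# The defaults are visible to help(), Sphinx, and inspect alike.
print(inspect.signature(connect))
# (host, port=1024, timeout=30.0, retries=3)

# Calls stay readable: only the overridden option is named.
connect("example.org", port=2000)
```

Compare that with `def connect(host, **kwargs)`: the same information would have to be maintained by hand in the docstring.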

Decorators for easing validation and documentation



Since I am an advocate of optional named arguments and I find them cool, I thought: why not write code...

>>> @set_default_kw_value(port=1026, nawak=123)
... @must_have_key("name")
... @min_positional(2)
... @validate(name=naming_convention(), port=in_range(1024, 1030))
... def toto(*a, **kw):
...     """useless function"""
...     return 1

... that would magically return documentation looking like this:


toto(*a, **kw) useless function

keywords must validate the following rules:
  • key: <port> must belong to [ 1024, 1030 [,
  • key: <name> must begin with underscore
at_least_n_positional :2

keyword_must_contain_key :name

default_keyword_value :
  • params: port is 1026,
  • params: nawak is 123

The idea was just to make a decorator factory with reasonable defaults that would enhance the decorated function's documentation, in the spirit of functools.wraps.


class Sentinel(object):
    pass

SENTINEL = Sentinel()


def default_doc_maker(a_func, *pos, **opt):
    """Render a validator and its parameters as a doc fragment."""
    posd = "%s\n" % ",".join(map(str, pos)) if pos else ""
    named = "\n%s" % ",\n".join(
        "* params: %s is %r" % (k, v) for k, v in opt.items()
    ) if opt else ""
    return """
**%s** :%s
%s""" % (
        a_func.__name__,
        posd,
        named
    )


def valid_and_doc(
            pre_validate=SENTINEL,
            post_validate=SENTINEL,
            doc_maker=default_doc_maker
        ):
    """Turn a validator factory into a documenting decorator factory."""
    def wraps(*pos, **named):
        additional_doc = ""
        if pre_validate is not SENTINEL:
            additional_doc += doc_maker(pre_validate, *pos, **named)
        if post_validate is not SENTINEL:
            additional_doc += doc_maker(post_validate, *pos, **named)

        def wrap(func):
            def rewrapped(*a, **kw):
                if pre_validate is not SENTINEL:
                    pre_validate(*pos, **named)(*a, **kw)
                res = func(*a, **kw)
                if post_validate is not SENTINEL:
                    post_validate(*pos, **named)(*a, **kw)
                return res

            # done by hand in the spirit of functools.wraps, plus our doc
            rewrapped.__module__ = func.__module__
            rewrapped.__doc__ = (func.__doc__ or "") + additional_doc
            rewrapped.__name__ = func.__name__
            return rewrapped
        return wrap
    return wraps



That can be used this way:

def keyword_must_contain_key(*key):
    # the inner function keeps the same name so the generated doc reads well
    def keyword_must_contain_key(*a, **kw):
        if set(key) & set(kw) != set(key):
            raise Exception("missing key %s in %s" % (
                set(key) - set(kw), kw)
            )
    return keyword_must_contain_key


def at_least_n_positional(ceil):
    def at_least_n_positional(*a, **kw):
        if len(a) < ceil:
            raise Exception(
                "Expected at least %s arguments, got %s" % (ceil, len(a)))
    return at_least_n_positional


min_positional = valid_and_doc(at_least_n_positional)
must_have_key = valid_and_doc(keyword_must_contain_key)

Okay, my code might not win an award for its beauty, but you can test it here: https://github.com/jul/check_arg


And at least sphinx.automodule accepts the modified docs, and interactive help works too.

Of course, it relies on people naming their functions correctly and choosing sensible parameter names :P

However, even though it sounds ridiculous, I do think that most of our experience comes from knowing the importance of naming variables, modules, classes and functions correctly.


Conclusion



Since I am not satisfied with the complexity/beauty of the code, I strictly have no idea whether I will package it, or even keep working on it. But at least I hope you got the point: what makes optional named arguments difficult to document is only a lack of imagination. :)



Cross dressing on the internet and gender issues

So the current buzz is gender issues.



http://www.kathrineswitzer.com/written_about.shtml
I dare say it is a non-problem. But first, let me tell you my story as a cross-dresser... on the internet.

Once upon a time I signed up for a famous dating site. A friend advised me to try with a fake feminine account. Well, let me tell you: it feels awkward. You are not in the right place. You notice it immediately, because communication is really different. Communication is sexualized.

At least it gave me tips on how to hit on girls: which techniques worked (thanks, fellow men), and which did not. After 3 hours, I stopped, analyzed what I had learned about gender differences, and scripted a very efficient lightweight bot to help me improve my dating ratios.

Years later, I became a City of Heroes player. I had two very good reasons to cross-dress again:
- if you have ever played an MMORPG («meuporg», as we say in my country), you may have noticed that in order to level up, a support player needs to team up with a party, and men prefer to team up with girls;
- the camera always showed the player in subjective view, and the feminine 3D models were awesome.

That's how I became half man, half woman. My damage dealers/tanks would be men (because they were accepted easily), and my support/control characters were women. I was «love», «tainted love», «true love», and had a perfect body I could enjoy watching... And even women preferred my masculine-looking damage dealers. Everyone is biased...

Well, being proposed to many times in global channels was disturbing. My chat was feminized only with huhu and hihi and nothing else, but talking like a man the rest of the time turned them on. It was weird. So I created a feminine-only guild so that we wouldn't be bothered, and played with women. Needless to say, the guild was 66% male IRL (huhu). And I discovered women are not only common but good players. And we teamed up with feminine guilds too (who knew we were mainly men).

I then decided MMORPGs were really enjoyable but took too much of my time, so I moved on to a more fast-paced game known as Urban Terror, where my nick became [SF]Julie. During pickups, I was favored even though my level was lower (there are very good female players on UrT by the way; I was even below their average).
I had, of course, trouble on public servers: people trying to hit on me or being sexist, but as I was an admin of our public server, they would get kicked/banned easily. And I dare say that by being quite extreme about it, girls would enjoy playing on our server, since they would not be harassed.

Finally, nowadays on IRC I am an androgynous creature named Julie1 (julie + 1 is pronounced like my real first name in French, but oddly enough people only read the feminine part). So I am still cross-dressing, in a way, and let me tell you, it has one advantage: on tech channels I get answers faster than «men» do.

Everybody is upset about the bad behaviors, and not about the «unfair positive bias». And that bugs me; gender issues are like a self-reinforcing feedback loop. How do we break it?

Speaking of women in free software



First, I apologize for having crashed the libroscope server. But as you can see here, we at libroscope were among the first to publicly speak about women in free software. Our method was to let them speak for themselves.

Since I doubt pretty much everything, I was quite dubious about their claims that they were discriminated against. But we let them speak. And I listened. Perline claimed, for instance, that the problem with men is that no project can be achieved in a mixed environment, since men take the lead and do not let women express themselves. And I thought to myself: «what a joke!» And then came the Q&A.

Well, for 15 minutes, a male anarchist, very sensitive to women's problems and an utter activist, explained to everyone his problems as a woman. The actual women present in the room could not get a single word in. It was like a proof by example.

He was trying to shine in his white armor, defending women's pride. And all my years as a cross-dresser came back to me: men talking on behalf of women on gender issues is weird. Women being pushed into conferences is also weird: it is like when I was accepted into a party solely based on my gender. In my opinion, it does not help the women's cause; it reinforces the bias.

So my message to men heralding women would be better understood in song, I guess:

Do I have a solution?


Criticizing is easy. And I have a solution. One of our speakers from the year before, Benjamin Mako Hill, made an awesome speech on the first freedom of free software (which explained, notably, why non-commercial-use clauses are bullshit): the freedom to use, which in its terms is radically non-discriminative.

The ethic of free software/Open Source is based on action and production. And I think that, as regular free software users, we should envision the deep implications it has:
- not a single discrimination, positive or negative, is acceptable;
- when we install/use/modify a piece of software, do we care whether it was made by a French person, a Black person, a woman, an alien? No, we don't...

So the Free/Libre/Open Source Software community is armed to accept women... and all other minorities...

If Perline is right and women-only communities are what it takes to empower feminine presence in Free Software: please do it. Production and quality are the only things that matter, whatever means you have to use. You have my full support to exclude me from your workshop for as long as it takes you to produce enough software to be respected. Even if it is kind of strange.


On the internet though, dear women, you should try cross-dressing. And if you want to fully understand men, maybe you should even try cross-dressing on a dating site, in games, on IRC, to understand the bias we all experience...

In fact, I urge everyone to walk in the other's shoes by cross-dressing on the internet.


One of the things that bugs me, though, is: are we really aiming at the right issues? Behind the gender issue, aren't we missing a broader one? Why was free software shaped the way it was, and what is the invisible barrier that keeps not only women, but also a lot of other minorities, out of free software? And shouldn't we measure diversity (economic, geographic) in an objective way, so that we can measure the impact of our actions?

I will make a wild guess, however... Free Software is probably regressing in terms of diversity, since we are becoming more and more «experts». And we might observe more and more people leaving the way pierre 303 left Stack Exchange.

But as I said before, without a survey we are babbling nonsense: we do not give ourselves the means to measure the impact of our actions and to understand the real nature of the problem.

Using signals as wires?

Having learnt from a book on Unix that signals are software interrupts, I thought: «hey, let's play with signals in python, since they are in the stdlib!»

The proposed example is not what one should really do with signals; it is just for the purpose of studying.

Well, remember this point, it will prove important later: http://docs.python.org/dev/library/signal.html#signal.getsignal
«There is no way to “block” signals temporarily from critical sections (since this is not supported by all Unix flavors).»

The idea: transforming a process into a pseudo hardware component



Signals are like wires that normally carry a rising edge. In a low-level architecture you may use one wire to signal that results are safe to propagate, and another wire to clear the results.
I made a simple component that just sets to 1 the bit at the nth position of a register, according to the position of the wire/signal (this could be used for multiplexing).
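Stripped of all the signal machinery, the decoding rule of this component is just one bit set per wire; a tiny sketch (using the Linux signal numbers wired in the code below):

```python
# Each "wire" is a signal number; receiving it sets one bit of a register.
wires = [10, 12, 7, 30]        # SIGUSR1, SIGUSR2, SIGBUS, SIGPWR on Linux
register = 0
for fired in (12, 10, 7):      # pretend these three signals arrived
    register |= 1 << wires.index(fired)
print(register)                # bits 0, 1 and 2 set -> 7
```

This is exactly the value 7 you will see read back in the shell transcript further down.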


Here is the code:

#!/usr/bin/env python3.3 
import signal as s
from time import sleep
from time import asctime as _asctime
from random import randint
import sys

asctime= lambda : _asctime()[11:19]

class Processor(object):
    def __init__(self,signal_map, 
            slow=False, clear_sig=s.SIGHUP, validate_sig=s.SIGCONT):
        self.cmd=0
        self.slow=slow
        self.signal_map = signal_map
        self.clear_sig = clear_sig
        self.validate_sig = validate_sig
        self.value = 0
        self._help = [ "\nHow signals are wired"]
        self._signal_queue = []
        self.current_signal=None

        if validate_sig in signal_map or clear_sig in signal_map:
            raise Exception("Don't wire a signal twice")

        def top_half(sig_no, frame):
            ## UNPROTECTED CRITICAL SECTION 
            self._signal_queue.append(sig_no)
            ## END OF CRITICAL

        for offset,sig_no in enumerate(signal_map):
            s.signal(sig_no, top_half)
            self._help += [ "sig(%d) sets v[%d]=%d"%(sig_no, offset, 1) ]

        self._help += [ "attaching clearing to %d" % clear_sig]
        s.signal(clear_sig, top_half)
        self._help += [ "attaching validating to %d" % validate_sig ]
        s.signal(validate_sig,top_half)
        self._help = "\n".join( self._help)
        print(self._help)

    def bottom_half(self):
        sig_no = self._signal_queue.pop()
        now = asctime()
        seen = self.cmd
        self.cmd += 1
        treated=False
        self.current_signal = None

        if sig_no in self.signal_map:
            offset=self.signal_map.index(sig_no)
            beauty = randint(3,10) if self.slow else 0
            if self.slow:
                print("[%d]%s:RCV: sig%d => [%d]=1 in (%d)s" % (
                    seen,now,sig_no, offset, beauty
                ))
                sleep(beauty)
            self.value |= 1 << offset 
            now=asctime() 
            print("[%d]%s:ACK: sig%d => [%d]=1 (%d)" % (
                seen,now,sig_no, offset, beauty
            ))
            treated=True

        if sig_no == self.clear_sig:
            print("[%d]%s:ACK clearing value" % (seen,now))
            self.value=0
            treated=True

        if sig_no == self.validate_sig:
            print("[%d]%s:ACK READING val is %d" % (seen,now,self.value))
            treated=True

        if not treated:
            print("unhandled exception %d" % sig_no)
            exit(0)

wired=Processor([ s.SIGUSR1, s.SIGUSR2, s.SIGBUS, s.SIGPWR ])

while True:
    s.pause()
    wired.bottom_half()
    sys.stdout.flush()

Now, let's do a shell experiment:
$ ./signal_as_wire.py&
[4] 9332
660 jul@faith:~/src/signal 19:04:03
$ 
How signals are wired
sig(10) sets v[0]=1
sig(12) sets v[1]=1
sig(7) sets v[2]=1
sig(30) sets v[3]=1
attaching clearing to 1
attaching validating to 18

660 jul@faith:~/src/signal 19:04:04
$ for i in 1 12 10 7 18 1 7 30 18; do sleep 1 && kill -$i %4; done
[0]19:04:31:ACK clearing value
[1]19:04:32:ACK: sig12 => [1]=1 (0)
[2]19:04:33:ACK: sig10 => [0]=1 (0)
[3]19:04:34:ACK: sig7 => [2]=1 (0)
[4]19:04:35:ACK READING val is 7
[5]19:04:36:ACK clearing value
[6]19:04:37:ACK: sig7 => [2]=1 (0)
[7]19:04:38:ACK: sig30 => [3]=1 (0)
[8]19:04:39:ACK READING val is 12

Everything works as it should, no? :) I have brilliantly used signals to transmit data asynchronously to a process. With 1 signal per bit \o/

What about «not being able to block the signals»?



$ for i in 1 12 10 7 18 1 7 30 18; do echo "kill -$i 9455; " ; done | sh
[0]22:27:06:ACK clearing value
[1]22:27:06:ACK clearing value
[2]22:27:06:ACK: sig7 => [2]=1 (0)
[3]22:27:06:ACK: sig30 => [3]=1 (0)
[4]22:27:06:ACK: sig10 => [0]=1 (0)
[5]22:27:06:ACK: sig12 => [1]=1 (0)
[6]22:27:06:ACK READING val is 15

Oh, a race condition. It already appears with the shell launching the kill instructions sequentially: the results are out of order. Plus, you can clearly see that my critical section is not small enough to be atomic. And I lost signals :/

Is python worthless?


Not being able to block signals makes even a top half/bottom half strategy risky. Okay, I should have used only atomic operations in the top half (which makes me wonder which operations are atomic in python), such as only setting one variable and doing the queuing in the while loop, but I fear it would have been worse.

Which means that, with python, you should not play with signals as defined in the stdlib: without blocking, you either have systematic race conditions, or you risk losing signals if you expect them to be reliable.

I am playing with signals as I would play with an m68k interrupt (except that there I would still block signals before entering the critical section). To achieve the blocking and processing of pending signals, I would need POSIX.1 sigaction, sigset, sigprocmask, sigpending.
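In fairness, part of this did land in the stdlib: since CPython 3.3, signal.pthread_sigmask and signal.sigpending are available on POSIX platforms. A minimal sketch of deferring a signal around a critical section (POSIX only):

```python
import os
import signal

received = []
signal.signal(signal.SIGUSR1, lambda sig, frame: received.append(sig))

# Enter the critical section: block SIGUSR1.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
os.kill(os.getpid(), signal.SIGUSR1)          # delivery is deferred...
was_pending = signal.SIGUSR1 in signal.sigpending()
assert received == []                         # ...the handler has not run yet

# Leave the critical section: the pending signal is delivered now.
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
print(was_pending, received)
```

Inside the blocked region the signal stays pending instead of interrupting you; it is delivered as soon as you unblock. Note that a blocked signal is not queued: two SIGUSR1 while blocked still deliver only once.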

Why doesn't python support them all (in the stdlib)?

Well, python runs on multiple operating systems; some support POSIX.1, some don't. Signals are not standardized the same way, except on POSIX-compliant systems with the same POSIX version, therefore this should not be in the st(andar)dlib. And since it is *that* risky, I would advocate not allowing signal handlers in the first place (except maybe for alarm). But take your own risks accordingly :)

If you feel it is a problem, then just remember that binding C code to python is quite easy, and that on POSIX operating systems we have everything we need. This solution given on stackoverflow is funky, but less so than having an unprotected critical section: http://stackoverflow.com/a/3792294/1458574.

My problem with Computer Science pseudo code

I remember opening Computer Science books to learn algorithms when I began. And my first reaction was: it seems unnecessarily complicated, like maths; but I must be an idiot, since I learnt physics.

So, 15 years later, I reopened Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein. And I decided to see if my experience in coding would help me appreciate this book for all the knowledge it could bring me.

So I decided to take a look at the simple heap structure and its algorithms:

Let's look at the pseudo code (see here for the whole chapter):

When I see this pseudo code, and think of actual production code, I am dazzled:

  • it has meaningless variable names;
  • the API and variable names (like the global heap_size) are quite hard to grok;
  • I wonder if recursion is really important, knowing it is hard to read, hard to debug, and may reach a recursion limit;
  • I have a hard time understanding how it works.
So I thought: maybe I am just an idiot. And I tried to see if I could do a better job at writing pseudo code that would help understanding (I used python as a pseudo code checker).

Here is my heapify function (the whole code is here: https://gist.github.com/3927135 if you want to check the correctness):

def heapify(_array, index=1, heap_size=SENTINEL):
    """Bubble the element down until it is correctly located.

    heap_size can be used internally for optimization purposes.
    (SENTINEL, left_offset and right_offset are defined in the gist.)
    """
    if heap_size == SENTINEL:
        heap_size = len(_array)
    a_child_is_bigger = True
    while a_child_is_bigger:
        largest = index
        left_pos, right_pos = left_offset(index), right_offset(index)

        # check left
        if left_pos < heap_size and _array[left_pos] > _array[index]:
            largest = left_pos

        # check right
        if right_pos < heap_size and _array[right_pos] > _array[largest]:
            largest = right_pos

        if largest == index:
            # we are finally the king of the hill, end of story
            a_child_is_bigger = False
        else:
            # swap with the child / bubble down
            _array[index], _array[largest] = _array[largest], _array[index]
            index = largest
And coding it revealed what I feared most: CS academics and computer developers don't live in the same world: the way code is written matters.

  1. getting rid of the asymmetric test on line 5 greatly improves the understanding of the logic;
  2. using full names for test conditions really helps people (you included) understand, and thus maintain, your code;
  3. recursion gets in the way of understanding straightforward code;
  4. one-letter variable names are really unhelpful;
  5. their code typography is even worse than mine.
I have written python code to check the whole logic: writing slightly more readable code does not seem to prevent it from working (I may have a problem on the boundary, because this hidden heap_size variable is a pain to understand).

By the way, if you want a really efficient heap structure in python, please use http://docs.python.org/library/heapq.html: it is just plain better, superbly documented, tested, and the source code is quite nice too.
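For comparison, here is a minimal tour of that stdlib API (note one difference: heapq implements a min-heap, whereas the pseudo code above builds a max-heap):

```python
import heapq

data = [9, 4, 7, 1, -2, 6]
heapq.heapify(data)              # in-place, O(n); smallest item ends up at data[0]
heapq.heappush(data, 3)          # push while keeping the heap invariant
smallest = heapq.heappop(data)   # pops -2, the minimum
print(smallest, data[0])         # -2 1
```

No global heap_size, no 1-based indexing tricks: the module's functions take the list and nothing else.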

Now, I am plain lost: when I check CS papers because I want to solve a problem, I often stumble on unreadable pseudo code and verbiage. Understanding the API, the variable names... takes an awful lot of time. When I write in python, I have the feeling that it is a pseudo code translator; so if we assume my implementation is visually and logically close to pseudo code, does it cost that much to improve pseudo code readability?

If I can do it, knowing that I am one of the most stupid developers among the ones I know, why or how can't they do the same? Pseudo code is for sharing knowledge; do we really need to waste time deciphering it to become real or better developers?

When you see equivalent code in an industry-grade application, you normally hate it. It is plain bad practice. Isn't computer science supposed to train elite-grade developers?

And when I see such code, I sometimes wonder if the author understood what he wrote.

Meaningful variable names, clear test conditions, readability: that is what matters, because code is meant to be improved and fixed.

With all these questions in my head, I came to a conclusion: I don't take for profound what I think is uselessly obscure, so my opinion of CS papers and academics has dramatically dropped.


EDIT: now I see that bubbling down and bubbling up can be extracted from the logic, and I guess this could be used to make a heap supporting efficient min/max insertion/extraction.


Unicode is tough

So today, I dared open a bug on python. Which at some point should make me feel mortified, since it proved that I had misunderstood what a character is.

The point was, in python 3.2:
foo⋅bar=42
#  File "stdin", line 1
#    foo⋅bar=42
#            ^
#SyntaxError: invalid character in identifier
### This is another bug that is not in the scope of the post
### http://bugs.python.org/issue2382
print(ord("foo⋅bar"[3]))
# 8901
foo·bar = 42
print(ord("foo·bar"[3]))
# 183

A dot is a punctuation mark, isn't it? And variable names shouldn't use punctuation.
Plus, they look the same; shouldn't they be considered the same?

So I opened a bug, and I was very nicely pointed to the fact that the unicode character MIDDLE DOT is indeed punctuation, but it also has the unicode property Other_ID_Continue. And as stated in python's rules for identifiers, it is totally legitimate.

That is the point where you actively search for good documentation to understand what malfunctioned in your brain. Then a Perl coder pointed me to Perl Unicode Essentials by Tom Christiansen. Even if the first third is about Perl, it is the best presentation on unicode I have read so far.


And then I understood my mistakes:
  • I (visually) confused a glyph with a character: the same glyph can be used for different characters;
  • unicode is much more than simply extending the usable glyphs (that part I knew, but I had not grasped how little I knew).
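The stdlib's unicodedata module makes the distinction plain; the two look-alike dots from the identifier example above are different characters with different categories:

```python
import unicodedata

for char in "\u00b7\u22c5":    # the two look-alike dots from above
    print("U+%04X %-12s category=%s" % (
        ord(char), unicodedata.name(char), unicodedata.category(char)))
# U+00B7 MIDDLE DOT   category=Po
# U+22C5 DOT OPERATOR category=Sm
```

Po is punctuation (but MIDDLE DOT also carries Other_ID_Continue, hence the valid identifier); Sm is a math symbol, hence the SyntaxError.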

By the way, if you need a reason to switch to the current production version 3.3.0, remember that Py3.3 is still improving its unicode support:

py3.2:
"ß".upper()
# ß  

which is a wrong result, while in py3.3:

"ß".upper()
# SS  


Fun with signal processing and ... matplotlib

Well, I remember overhearing a conversation in a laboratory once, 10 years ago.

Electronic engineers were proud to have helped their team reach a higher CCD resolution with a trick they published: they would use a median filter to separate the photons arriving on the CCD from the thermal noise. They were even very proud of seeing a signal far smaller than the noise.

I thought to myself: such a scam, this is too stupid to work.

So I decided to play with matplotlib to test it.

First we generate white noise; then, with a probability of 1/n, a photon is seen, and its amplitude is 1/A of the signal.

Then I asked #python-fr for advice on the best way to do a moving average (which is the usual filter used for extracting signals), then I tried the median filter.

A median filter is just as silly as taking the median over a moving window on a time series.

It can't work, can it?

Well, it does.
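A minimal sketch of the idea (not the code from the gist): take the median of each full sliding window. An isolated spike, however large, simply vanishes, while a genuine level change survives:

```python
from statistics import median

def moving_median(samples, width):
    """Median of each full sliding window (width assumed odd)."""
    return [median(samples[i:i + width])
            for i in range(len(samples) - width + 1)]

# A lone spike 9x above the flat background is wiped out...
print(moving_median([0, 0, 9, 0, 0], 3))   # [0, 0, 0]
# ...while a real step in the signal passes through.
print(moving_median([0, 0, 9, 9, 9], 3))   # [0, 9, 9]
```

A moving average of the same width would have smeared the spike over its neighbors instead of rejecting it; that is the whole trick.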

It is not hard to code, but beware: your eyes will cry at the PEP8 insanity of matplotlib style mixed with pythonic style (call it art):
(https://gist.github.com/3794506)



Moral of the story? 

 

Well, I still ought to publish the code that shows why 40 for the median and 75 for the moving average are the optimal settings, and how dazzlingly fast matplotlib is.

But the real point is: python + matplotlib are amazing. And sometimes simple ideas give amazing results.

Have fun!

The startup checklist for developer's interviews

As a foreword, I'd say that when interviewing for a job at a startup, be aware of the risks you are taking:
- the cash flow does not exist yet, so you run the risk of not being paid, or of being fired;
- if the product is innovative by nature... you are not sure you will make it work.

The tarpit of startups

 

I have not had much luck with my job interviews these last weeks, so I resorted to startups. Well, startup interviews are quite different from normal job interviews: a developer must also interview the startup to assess the risk he takes.

Job description

Most startups want a developer. When you hear them, everything is done; they just need more manpower with a polyvalent set of skills. The truth is: the more impressive a job description is, the more problems you will have. That is the reason why I usually get pretty inquisitive and make the founders spill the beans about the remaining problems. Usually, only some little details are missing; they have made a prototype that works and only need some industrialization.

Evil lies in the details

Never ever believe a startupper saying it is 95% done and you'll only have to work on the remaining 5%.

As funny as it seems, the little details are often big details. Since I have NDAs to respect, I cannot say how laughable the 95% claim often is.

Let's sum up by saying the 5 remaining percent of the job are 5 huge percent. Like solving an NP-complete problem in polynomial time. Or having software with real-time constraints distributed across more than one computer, on operating systems that are not real-time OSes, with constraints under .1s, using UDP tunneled in TCP, over any network.

Just think of any problem for which seasoned developers know we have no solution, and stuff it, along with a lot of other problems, into the remaining 5%.

I dare say that solving high-bandwidth, low-latency, multi-host, robust, real-time distributed signal processing with no R&D team and no budget in less than 9 months is a 1% detail on a startup scale.


Evil also lies in deadlines and wages


I have no problem trying to be smarter than average. I have a problem trying to solve a problem that more than one brilliant team has broken its teeth on. Not that I am modest; modesty is not my first value, I am pretty arrogant. But when there is no state of the art for a given problem and no hard data to analyse, I pretty much chicken out of any deadline: as far as I am concerned, either my knowledge is outdated (which means I will fail), or I am right that the problem is a tricky one requiring research and development, or it is an outright unachievable problem of computer science.

When you decide to be the developer for a team that has a deadline, no funding, no idea what they are facing, and that is entering the realm of research, you are scared.

The nature of research is that you don't know how to solve the problem, and as long as you have not studied it with hard data and theory, you have a hard time
- knowing whether it is possible to deliver;
- and when!

While researching you may discover that your company is doomed!

So, adding a stressful situation to low wages (usually we are talking about $40k/year) is quite unfair: you feel like Atlas, carrying the weight of the world on your shoulders with a peanut as an incentive.

Why not be rock'n roll and take a risk? 

 

Would you bet on yourself, alone, against some of the most brilliant teams of the computer industry (MS, Apple, Google, Facebook...), given that the odds are against you and that if you win, you will be stripped of everything?

Well, you could, given that you have:

- other incentives;
- trust in your team.

E pluribus unum



Usually, a company looking for the developer who will do the remaining 5% already has at least one CTO and one expert. If you say the 5% belongs to the realm of research and not engineering, you are already facing strong disapproval from two members of the team, especially when you add that research, unlike engineering, is far harder to deliver on time. So you are not even hired yet, and you are already challenging at least one of the most prominent figures of the startup. Believe my experience: these are not good omens.

The other problem it highlights is that the company that wants to hire you believes it is highly versed in bleeding-edge computer science, yet seems ignorant of the very field it claims to dominate. Hence three problems:
- they cannot assess the risk;
- they cannot assess the benefit;
- they cannot assess the cost.

Do you really want to work or contract for a company that has no money and, by all appearances, no idea how to make some?

Well you could...

The last hit ... 

Imagine that you have a solution and can save them.

Do you know that you will earn nothing from your hard work?

Well, it is called IP law. The intellectual work of any contractor or employee of a company belongs to the company. So imagine you find a new solution to a problem (an innovation): not only would you earn nothing from it, but were you to reuse it, you could be sued for counterfeiting your own fucking innovation.

So to sum up, working as a developer for a (small French) startup often means taking all the risks while hoping for none of the benefits.

What is the problem with startups? 



Some say the problem is purely French. In France, more than once have I had this discussion with executives and, strangely, artists: the belief that the ability to produce something matters less than the ability to imagine things.

Let me tell you my truth:
- everyone has ideas;
- few can deliver them by a given deadline.
 

So if you want startup ideas, I'll throw out a bunch:
- itinerant coffee vendors with a light stand, moving from subway exit to subway exit according to foot traffic;
- auto-adaptive real-time delay lines for signal transmission;
- consulting companies on power-efficient computing;
- a printing server that changes the font to save money on printing (a liter of ink costs more than a kilo of gold);
- highly efficient dating sites (based on my experience of automated bots on websites I made);
- an agency for recruiting computer talent;
- women-oriented soft pornographic content.

The problem with ideas is that you need to deliver them. And to deliver them you need capital to sustain you between the moment you have your idea and the moment you deliver it. If you have been ruined once, as I have, you don't have any capital. You just have know-how you can negotiate with.

My point is: ideas are bullshit if you can't deliver them. So here is my factual checklist for deciding whether you can work for a startup:

Don't get mad, get even


  1. what percentage of the problems remains to be solved before launching the product?
  2. what are the few remaining percent made of?
  3. if the product is already launched, what are the recurring problems?
  4. when is the deadline for sustainability?
  5. are all the technical problems understood?
  6. is there a test bench for testing all the parameters of the solution?
  7. do they understand the complexity of the problems they are trying to solve?
  8. do they know the state of the art? (did they even fucking read the manual, or google their problem?)
  9. what is their budget for R&D? For operations? Did they at least figure out that these are not the same problem?
  10. do they have a QA team for answering and solving their customers' problems?
  11. can they reproduce their customers' problems? (a rephrasing of: do they have a lab for experimentation?)
  12. what are their growth predictions, in terms of customers (scalability) and business model (income)?
  13. do they document their code/APIs? Have a bug tracker? An SCM?
  14. do they have hard data from their prototype?
  15. do they understand the cost of computer engineering (effort) vs hardware (peanuts) vs infrastructure (recurring costs)?
  16. do they have a realistic pricing model?
It is stupid, but you can't do R&D based on the assumptions of a self-declared technical guru or expert. You need to test, experiment, try, reproduce and understand. The absence of a lab or of hard data in a supposedly innovative startup is a NO GO. The absence of knowledge of the relevant concepts they deal with (asynchronous programming, real time, distributed programming, b(i|u)g data, data processing) is a clue that they are clueless. And if they are clueless, they won't know what your knowledge is worth.

So my piece of advice is:
- if the wages are indecently excessive, take the money and shut up, and prepare a backup plan because you'll need it;
- else run like hell.

You don't know how fast I am actually running!

EDIT: thx bruno 

When Free Software was a nest of black swans

Listening to this song while reading might help you understand the post:


I guess you wouldn't believe what free software was in the beginning: a nest of freaks.

When I went to my first Libre Software Meeting in 2000, I met a lot of interesting people. And now that I know them, I realize in retrospect that I knew only freaks:
- people suffering from chronic seizures;
- from sclerosis;
- from type II diabetes;
- transgender, bisexual, gay and lesbian people;
- people suffering from psychosis (chronic depression, bipolar disorder, schizophrenia);
- from Asperger's syndrome;
- from ADHD;
- handicapped people;
- social outcasts for miscellaneous reasons;
- and... no women.

All over-represented relative to their actual percentage in society.

And this is just what I know from talking to some people in private, and among them are well-known members of the FLOSS community.

Have you ever attended a talk where a stuttering speaker was not only accepted but given a warm welcome? That is the Free Software I embraced: a radically non-discriminatory community (© Benjamin Mako Hill).

Is this an exception? It took me years to discover the truth, because they did not wear their differences as an excuse for not respecting the rules, and that might have been the secret of Free Software's success: being a nest of freaks who want to be treated like everyone else. Only what you achieve defines you, not appearance, nor social status, nor difference.

And now, as time has passed, I think that not only was this wonderful and an example for society as a whole, but it was also the reason for the rise, and maybe a possible reason for the coming fall, of the free software community.


Status hierarchy vs Competence hierarchy

 

I will now wander down the very risky path of giving my interpretation: I think the 80's were the stage of a nameless counter-reaction to the 70's, and that Free Software might have been a nameless revolution that took place in the 90's. No words were spoken, still outcasts gathered.

Is there any cultural trace of what I say?

Teenage movies in the 1980's were often about bullied nerds who would fight back. Bands such as N.E.R.D, No Doubt, Weezer or 311 would sing about it.

France, where I grew up, is not the U.S.A.: we had our own history, based on elites and a very strong social reproduction built on castes.

To get a socially recognized computer degree, it was better to enter the selective schools. Given the actual data, I believe the 1968 revolution was a revolution in words and a counter-revolution in disguise (the maximum of social diversity falls almost exactly in 1968).

What kind of hobby was cool for the freaks? Computer stuff: whatever your social status, only your skills mattered and social skills were useless. I don't know whether it was a conscious claim or a logical evolution; still, the computer hobbyists counted numerous nerds who could shine through their skills.

Don't think the computer scene was open to everyone: computers at this time were really expensive toys, so it was mainly structured around well-born social outcasts from (white) educated families.



Cargo cult programming vs Digital craftsmanship

Conformism against diversity



I hereby begin one of the controversial parts of my thesis: we did better at coding because we were not elite engineers, and engineering school was not about learning how to code, but how to behave and talk. (Of course my statement is a bit excessive.)

In (my) university we were not as well equipped as the engineering schools. We had cheap stuff (PCs) or obsolete stuff (old Unices and workstations). At that time, I remember, embracing Linux, C, bash, Perl and PHP got you scolded. Real pros would use Java, Design Patterns, UML and Merise, CORBA, VB, C++... The future of networking in France was Token Ring, ATM, Minitel; the Internet was a joke designed by (what a joke) PhDs. We were hobbyists at most, tinkering and betting on the wrong technologies.

Real engineers with a future learnt all the right theory of programming and compiler theory; they could speak the words of IT (commonly known as BS, or buzzwords), while we had to code practical problems, such as plotting our own results taken from our lab equipment. I was graduating in physics; maybe it helped. We had no time to learn any hype coding, we had to deliver on short deadlines.

We were scolded, despised, mocked, but we all delivered in our fields of competence. Some became famous, some lost. Free Software was a much more selective world than any selective school in France: it was based on what you really did, not on how nice you were, how you could juggle concepts, the hype you raised, or how web-2.0-cool your blog was. Shipping stuff was the entry token. And code is not the only thing you can ship: documentation, web magazines, coordination, packaging, socialisation and standardisation are important parts of free software.

 Doing is an aristocracy that needs no excuse

The cool thing, when you feel you suffer an injustice, is to fight back the only way your opponents understand: we, the freaks of a conformist society, beat the pulp out of the arrogant people with nice suits and soft words by delivering stuff. FLOSS success was neither a technological nor a political common fight; it was the sum of individuals fighting separately for acknowledgement.

So when I see articles like «why you should not hire the black swan», I wonder whether the fall of FLOSS is coming. Being different but delivering quality products is, in my opinion, a cornerstone of the Free Software community.


Mega Troll: why do women have so many problems in free software?



In a pretty misogynistic chapter of «The Social Contract», Rousseau stated that women used their supposed weakness as a strength. After stating that the strongest tend to bully the weakest, he added that women's claims of weakness should not be taken at face value, since they tend to side with the strongest.

I would like to point out that if you accept my assumption that the free software pioneers stood well outside conformity, and were rejected when trying to pick up chicks, then rejection by women as a trauma can make sense. Oddly enough, this community was made of men.

You can imagine how strange people get cock-blocked for being uncool, not hype enough, poorly dressed, or just plain different, and get a mental picture of how some of the freaks may have been hurt. You could also read fantasy to get a glimpse into the minds of women: K. Kurtz, L. McMaster Bujold, U. K. Le Guin... all heroes in feminine stories are good, because they are the good deeds of good parents. Parenthood is the key to heroism 9 times out of 10. You don't date monsters unfit to be the father of your children.

Most heroes in male-written stories are, however, background-less: only their actions make them interesting. (Okay, I admit that with my vision of fantasy, I'd be relieved if we discovered that LOTR was influenced by a woman.)

If and only if my assumptions are right, I say deep wounds take time to heal, so there will be no immediate acceptance of women. And I have made up a way to test my theory: take female guinea pigs accepting the following premises:

- women should consider that the masculine gender is in fact a neutral gender on the internet, and should hide any hints pointing to their sex;
- women should consider that they deserve nothing, good or bad, for being a woman, and should embrace the idea that what you do defines who you are, not that who you are defines what you can do (that is radical non-discrimination);
- on the internet, no one knows I am a woman;
- people who were bullied in their youth tend to develop a strong personality and thus are assertive/pushy; so should women be, if they want to blend in and really think they suffered for their difference.

If nice guinea pigs try my experimental behavior, compared to a control set of unbriefed women, then I dare predict they will be accepted in free software communities. I also predict that all hell will break loose when people uncover the women's true identity in real life (either strong attraction or repulsion). And a few, given enough respect earned by their actions, might even be accepted as the person they are.


The conclusion in music too:

The limitations of frameworks

In my young days, when I began coding, the synonym for code reusability was libraries in procedural languages and classes in object-oriented languages; then it became modules or software packages, and afterwards frameworks.

Trends change and the meanings of these words differ, but the goal stays the same: doing more with less code, shared with others.

I could give a very abstract presentation of the limitations of frameworks, but let's talk about code as if it were literature.


What is a language?



I learn computer languages the way I learn foreign languages: I focus on the typical expressions that make the language interesting, and I don't translate word for word. These are the idiomatic expressions. Some languages put the stress on manipulating vectors of data as a single variable, others on being terse, others on letting you express yourself the way you want.
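To make the analogy concrete, here is one such idiomatic expression in Python (the example is mine, not from any code discussed here):

```python
data = [1, 2, 3, 4]

# Word-for-word translation from a C mindset: manual index bookkeeping.
squares = []
i = 0
while i < len(data):
    squares.append(data[i] ** 2)
    i += 1

# The idiomatic expression: say what you mean, not how to loop.
squares_idiomatic = [x ** 2 for x in data]
```

Both produce the same list; only the second one speaks Python.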

To be efficient, a language has to be embraced as a whole, as a mindset. The reward for accepting the language is new ideas and new ways to express yourself. For me, a programmer prefers one language over the others as a question of aesthetics. That's why language wars are so heated and opinionated.


What is a program?



A program is a story you tell in a language, and some stories sound better in some languages. For instance, Erlang or node.js seem better suited to telling asynchronous stories. R seems better suited to telling a story about how a series of data can be interpreted as information...

A program is an essay: your customer sets a stage with characters, one or more places, and some events, and wants you to build a story that leads to the desired conclusion. In between, you are alone with a sort of bible given by the customer containing all that is relevant to set the stage; no one cares how you do it, but you have to tell a consistent story driving the characters to the desired ending.


What is a pattern?


A (design) pattern is like a cliché in a story. When you see a hero saving a nice lady in distress, you expect them to kiss in the end. Well, as a coder I mainly think this way: given an initial configuration of data and a desired outcome, I sometimes have a feeling of déjà vu and will lazily tell my story the obvious way (which may not be obvious to a duchess, for it is sometimes crude). This is not code, since the words may vary: we all have a style, or constraints given by the customer, that rule out a crude cut'n'paste. So patterns are not about actual code; they are about a global direction.

It is like making a gag in a movie: you can rely on mistaken identities, running gags, misdirection... These are mechanisms you can lean on, but you still have to describe the situation and write the dialogue; you cannot just lift a sketch from another movie.

Wrapping up and stating what a framework is



I see programming as writing an essay under constraints set by an editor. The editor wants us to write in a genre, with given elements, a desired conclusion and a deadline. We could write anything in the world if we had enough time, but since our schedules are tight, we have to be terse, fast, and reuse patterns and code.

Frameworks are mainly story generators. 

You know those TV shows with stereotyped characters, stereotyped situations, and stereotyped development according to a «bible»? Do you remember Starsky & Hutch, J.A.G. (and every Bellisario production), Law & Order: Special Victims Unit...
These TV shows are to literature what frameworks are to computer languages: a restriction on what you can do with your imagination. There were enjoyable shows among them (Wild Wild West, for instance), and that is not a bad thing. Of course, some screenwriters made TV shows evolve by not telling the same story over and over for the whole run while still staying consistent: I still have fond memories of Twin Peaks, or Babylon 5.

I don't say frameworks are bad: we need them, because of the deadlines.

I say they limit our options to already-existing stories. As a coder, I think that writing a story that already exists is a waste of time: adapt an existing story (buy software) if you want something close to it. When you make variations to an existing show, you risk altering the conceptual integrity of the story. In programming I call this «disengaging a framework», and as far as I am concerned it is tough. I remember having to add a custom authentication mechanism to TurboGears for a Flash application, and it took me two weeks to get it right.

My experience is that the more constraints a framework imposes, and the less idiomatic it is (with regard to the language it is coded in), the harder it will be to disengage.


Why does disengaging a framework matter?



I remember the day I decided I would never code in Python.

It was in 2004. I was chairing a track at the Libre Software Meeting, and since the organisation was eating its own dog food, the website ran on a CMS framework (Plone?) in Python. I pride myself on understanding languages quickly because they all share common roots (the more languages you know, the easier it is to learn new ones), and I had a problem: the talks were sorted not in alphabetical order but by DB id. I thought it would be easy to find where the list was sorted and introduce the right ordering function. After 6 hours, I was unable to find how it worked, because it was not a language; it was a frame of mind very hard to grok. Frameworks are not a language: they are yet more powerful, but so much less expressive.

But when I told you that customers come with fixed situations, characters and definitive ideas of what they want: I lied.

"Programmers don't burn out on hard work, they burn out on change-with-the-wind directives and not 'shipping'."
Mark Berry 
Customers are the biggest liars: they always want something slightly different from the initial specification, and different from what the framework you have chosen offers. Thus you will have to disengage sooner or later.



Can we improve the situation on the code side?


Well, the situation improved quite a while ago already.

What I described so far as frameworks are jumbo web frameworks like Ruby on Rails, symfony, Django, turbogears, Plone, ASP.NET ...

Thanks to developers such as Blake Mizerany (who, I guess, inspired Armin Ronacher for flask) and numerous others, we already have lightweight frameworks: ramaze (ruby), play (java), flask (python), sinatra (ruby), dancer (perl)... Frameworks are not bad; they are fantastic tools for writing less code, but they freeze you in a mindset, while our added value lies in our agility. Thus, the main quality of a modern framework (in my opinion) is its flexibility.

Furthermore, in recent years I have met developers who have almost always coded with a single framework. Some of them hardly knew the finer points of the language they code in (I like to call them monkey coders). As for the others, I wonder what will become of them when their framework becomes obsolete. Will they still be able to code in Python (Ruby, PHP, Perl or whatever)? Maybe Zed Shaw was wrong: Ruby on Rails was not a ghetto; maybe all jumbo frameworks are ghettos.

Like the dealers in «The Wire», speaking their slang for too long... staying in their culture for too long and hardly able to leave their former life as dealers or junkies.


PS: thanks to bruno for the corrections.
«The Wire» is an amazing show.
If I were a rubyist I would code web sites with ramaze.
If I were a perl coder I would use dancer.
And if I were a coder I would not use PHP, because it is neither consistent nor terse.

Making tests before installation with setuptools

I dream of packages that don't install if the tests are failing. At least I made it happen for mine.

My solution is gory but practical.

in setup.py I added:

import sys
import unittest

def test():
    """Script the command line `python -m unittest discover`."""
    loader = unittest.TestLoader()
    # beware: discover() takes a glob pattern, not a regex
    suite = loader.discover(".", "test_*.py")
    runner = unittest.TextTestRunner()
    result = runner.run(suite)
    if not result.wasSuccessful():
        raise Exception("Tests failed: aborting install")
    print("#### Tests passed")

if "install" in sys.argv or "bdist_egg" in sys.argv or "sdist" in sys.argv:
    test()


And since practicality beats purity...

Still wondering if it is a bad idea



Okay, I test my packages before deploying; okay, there is tox. But no test matrix can cover as many variations as the users' environments. Even though I don't find it pretty, at least it has the authoritarian, rigid behaviour I want.

Just like in Perl, if it does not pass the tests, it should not be installed.

I still wish I could:
  • make an auto-report tool (calling a REST server) to learn how reliable my packages are, and which OSes/python versions have problems;
  • have a tool that lets the user interact with a ticketing system;
  • bypass the tests with a --force flag, and call the test suite in a unified way once the package is installed.
Still dreaming.

OOP Sux


I have seen posts on Hacker News arguing that Object Oriented Programming (OOP) sux. These are stratospheric considerations. OOP does not sux from a theoretical point of view; it sux in real life.

Guh! Don't I like Python? Yes I do. But what I hate is not really OOP; it is monkey developers who care about OOP and not about coding.

Coding is like kicking a problem in the nuts. Some problems get kicked in the nuts by OOP, some don't. People who don't understand this deserve a slow and painful death by enucleation with a plastic spoon.

To make myself clear:
  • I am gonna make a short theoretical statement about what is wrong with some devs;
  • then do a short practical review of a piece of code that pisses me off.

OOP sux as much as the developers who don't know how to code.


It is true that I am old school: the data structures matter more to me than the workflow or the algorithms. And I do hate OOP for mixing both when it is done poorly.

For me, a good object is data with multiple views.

And there is a signature for poor OOP:
  • code that would be better written as a script;
  • code that has strong coupling.
End of theory. Let's practice.

My code sux!



Sometimes I am too lazy to write code, so, all permissions granted, I reuse other people's code. pypi-stat (a package of mine) and I suffered from my laziness.

Rule 1: all code that could be written as a function should be written as such


If you look at https://github.com/jul/pypi-stat/blob/master/pypi_get_stat.py#L206 you'll notice that the borrowed code is called this way:

PyPIDownloadAggregator(pkg.strip()).stats()

When you see this, you don't even need to think: this is a script disguised as an object.

By the way, what is the problem to solve? Given a package name, I want all the stats returned by an RPC call to PyPI, in the form of a dict.

Basically I want to :
  • fetch the stats;
  • transform them;
Fairly easy?

__init__ (the constructor) is straightforward: it is initialisation. So you expect stats to be easy too.

Well, cry with me lads and wenches.
https://github.com/jul/pypi-stat/blob/master/pypi_get_stat.py#L147

def stats(self):
        """Prints a nicely formatted list of statistics about the package"""
        self.downloads # explicitly call, so we have first/last upload data
        ...
Wut?!

Rule 2: when you need to call methods or properties in a defined order so that it works, you have a coupling problem


Why, in your opinion, do I access a property I don't use?

Because the property is processing hidden state, and coupling lies that way.

   

    @property
    def downloads(self, force=False):
        """Calculate the total number of downloads for the package"""

        if len(self._downloads) == 0 or force:
            for release in self.releases:
                urls = self.proxy.release_urls(self.package_name, release)
                self._downloads[release] = 0
                for url in urls:
                    # upload times
                    uptime = datetime.strptime(url['upload_time'].value, "%Y%m%dT%H:%M:%S")
                    if self.first_upload is None or uptime < self.first_upload:
                        self.first_upload = uptime
                        self.first_upload_rel = release

                    if self.last_upload is None or uptime > self.last_upload:
                        self.last_upload = uptime
                        self.last_upload_rel = release

                    self._downloads[release] += url['downloads']

        return self._downloads

downloads is not a property; it is a method doing processing and setting internal attributes along the way. So is releases. Things that by convention are stateless (properties) modify the state of the freaking object. Thank you, man! When reading this kind of code, where inert stuff (properties) is active, I feel like a rabbit trapped in a dense minefield.


What the heck?

So at one point, I decided I was not reading or improving this «easy» code. I just put a rant in the comments about how much I hate stupid OOP developers.

This very straightforward code therefore has coupling: you can't call stats without accessing the downloads property, because it silently mixes up code and processing with some state logic.

That's what makes script coders like me hate OOP:
the workflow is not sequential (it jumps back and forth between parts of the code), and the data are not processed sequentially. It is not code, it is Snakes and Ladders.

That is the reason I hooked my save method on stats: this code was a maze and I knew the entry point was the constructor and the end point was the stats method. I am weak, I know.
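For contrast, rule 1 applied to the same problem could look like this minimal sketch: two plain functions, no hidden state, the proxy passed in explicitly (the XML-RPC method names mirror the ones the original code calls; everything else is hypothetical):

```python
def fetch_stats(proxy, package_name):
    """Fetch the raw per-release URL records through the XML-RPC proxy."""
    return {release: proxy.release_urls(package_name, release)
            for release in proxy.package_releases(package_name, True)}

def transform_stats(raw):
    """Reduce the raw records to {release: total_downloads}, a plain dict."""
    return {release: sum(url["downloads"] for url in urls)
            for release, urls in raw.items()}
```

The order of the calls is now explicit in the caller, and each function can be tested on its own with a fake proxy.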

Conclusion


OOP is not bad if you fear Alzheimer's and want to train your brain on the most astonishing logic.

With poor coders, object-oriented programming is like taking acid with the Mad Hatter: pretty intellectually stimulating, but not very efficient.

Still, I think that whatever the paradigm, this coder would have produced code «looking» academically nice, but definitely wrong. Coding is not about following what is written in a book on OOP or functional or imperative programming.

It is about making a complex problem look simple, not the opposite.

EDIT: Very good Pycon talk on the topic

Packaging in python from a former Perl dev's point of view

It's easy


I use github, readthedocs post-commit hooks and http://guide.python-distribute.org/, and it all works fine: I am delighted, to be honest.

As far as I am concerned packaging in python is freaking easy, a fun and rewarding experience. So here is my point: this is not a rant since I love packaging in python.

However, to be honest, there are quite a few things that trouble me:
  • I don't really know what I am doing, since I mostly follow cookbooks;
  • I miss some of the CPAN features;
  • and I have the (maybe wrong) feeling that the packaging culture in the python community is not as strong as in Perl (this also applies to my own production).

from CPAN import wisdom

Most of Perl's module quality does not come from PEP-like documents; it comes from cultural habits. I dare cut and paste some of a Perl tutorial:


  • Write the documentation for a module first, before writing any code. Discuss the module with other people first, before writing any code. Plan the module first, before writing any code.
  • It's easy to come up with a solution to a problem. It takes planning to come up with a good solution. Remember: the documentation, not the code, defines what a module does.
  • Every module should have a purpose. There's a proliferation of modules with names like "perlutils.pm", "rcs_utils.pm", and "utilUtils.pm" that have no obvious purpose, and it's difficult to know what each does. This leads to confusion and duplication of code.
Well, since pypi is plagued with the infamous «nested list printer» and ports of PHP's file_get_contents, I guess this wisdom has not totally reached python yet.

I think the python tutorials focus too much on the technical part (how to build a package) and not enough on the QA part (how to make a useful and maintainable package).

There is also something in CPAN I love, and don't follow myself because I got lost geeking with sphinx: the straightforward one-page documentation following this plan:
  • Name 
  • Version
  • Synopsis (short code snippet that works)
  • Description (in full english with no code)
  • Methods (with code snippets if useful)
  • Notes (extra information needed)
  • See also (similar packages)
  • Limitations
  • Bugs
  • Author
  • Licence
It is a very informative plan.

This is what the culture provides. The plan is not imposed on the packager; it simply converged because it is efficient.
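For what it's worth, here is a sketch of what this plan could look like applied to a python module's docstring (the `frobnicate` module and all its content are made up for illustration):

```python
"""frobnicate - a made-up module documented following the CPAN plan.

NAME
    frobnicate - double small integers (one line stating the purpose)

VERSION
    0.1.0

SYNOPSIS
    >>> frob(3)
    6

DESCRIPTION
    A plain-English description of what the module does, with no code.

METHODS
    frob(n) -- return n doubled (code snippets where useful).

NOTES
    Extra information needed.

SEE ALSO
    Similar packages.

LIMITATIONS / BUGS / AUTHOR / LICENCE
    ...
"""

def frob(n):
    """Double n (the only method of this toy module)."""
    return 2 * n
```

The point is not the layout, it is that a reader gets the synopsis and the purpose before any implementation detail.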

from CPAN import tools


Now, CPAN also has great tools we miss in python: for instance, prior to installation there are automated tests and, optionally, automated reports.

I do always test a package before pushing it, but I'd rather have enforced tests that prevent installation if they fail on the user's side. I tried distribute's test suite feature, but I fumbled.

You know what, I miss this feature, and the deployment matrix.

I miss being able to see whether a maintainer is active by watching their ticket queue. The ticketing system is built into CPAN.

I also miss the direct link to the source, and the dependency chart.

I also like how they handle «Missing In Action» packagers, and how they can decide to hand a package's maintenance over to another maintainer.

This has very little chance of happening in python in the near future. However, I can see how we can all improve our packages.

from packager import good_will


Good news is we don't need code to solve most of these problems.

Documentation


In the README I provide with my packages I (try to) include:
  • a link to the sources
  • a link to the full documentation (on readthedocs and package.python.org)
  • a link to the ticketing of github
  • a synopsis
  • requirements (in case my dependencies somehow fail to be computed)
  • a changelog
I will try from now on to be terser in my documentation and follow the previously mentioned Perl plan.

I noticed repoze.lru changed its former nice README for a useless one. I am sad.




Testing

 
Always test before pushing. It may seem obvious, but I have noticed some maintainers don't. Before uploading, also build an sdist and install your package in a clean virtualenv containing nothing else. It is nice.

Versioning


Follow http://www.python.org/dev/peps/pep-0386/

Always tag your source code in your repository with the adequate pypi version.


Don't ask GvR what you can achieve by yourself


First, I understand nothing of the mess between distribute, distutils2 and setuptools, so I gave up dreaming of hacking my way to a solution through brute-force coding. And I guess you can't make a donkey drink if it is not thirsty.

My workaround: as a packager, I can improve python packaging with my own levers, practice and culture, by setting an example and hoping people will follow.

In the ecosystem I am not only a producer of packages, I am also a consumer. So I think that as consumers of packages YOU can also improve the packaging ecosystem, by checking that a package follows most of these rules before installing it:

  • is the README on pypi including 
    • a link to the source, 
    • the ticketing system, 
    • a synopsis, 
    • a changelog, 
    • a link to the full documentation (1 point per present item);
  • is the full documentation following the canonical Perl plan (given the complexity of the package, don't be too picky; 5 points if the doc is relevant);
  • does the source code contain a test suite? (5 points)
  • can you reach the maintainer IRL? (2 points)
  • are there outstanding issues in the ticketing system? (2 points if no issue has been open for more than 1 month)
  • is there an auto-reporting tool in the package (that triggers the test suite and submits the result to a ticketing system)? (5 points (I won't have them))
  • versioning the PEP way (3 points)
In my case, I won't use a package scoring less than 20 points on my own scale. If we all do that, packaging has a chance to improve. I guess Perl has made its own mistakes, so I really don't advocate blindly following in their steps; I advocate that we too slowly build our own strong culture of Quality Assurance, the python way.
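As an illustration, the scale can be sketched as a small scoring function (the candidate package and its answers are invented; the weights are the ones from the list above):

```python
# Items a README can earn 1 point each for (from the checklist above).
README_ITEMS = ["source link", "ticketing", "synopsis", "changelog", "doc link"]

def score(package):
    """`package` is a dict describing what a candidate package provides."""
    points = sum(1 for item in README_ITEMS if item in package["readme"])
    points += 5 if package["doc_follows_plan"] else 0
    points += 5 if package["has_test_suite"] else 0
    points += 2 if package["maintainer_reachable"] else 0
    points += 2 if package["issues_fresh"] else 0
    points += 5 if package["auto_reporting"] else 0
    points += 3 if package["pep386_versioning"] else 0
    return points

# An invented candidate: good README (missing the ticketing link), no
# auto-reporting, everything else fine.
candidate = {
    "readme": ["source link", "synopsis", "changelog", "doc link"],
    "doc_follows_plan": True,
    "has_test_suite": True,
    "maintainer_reachable": True,
    "issues_fresh": True,
    "auto_reporting": False,
    "pep386_versioning": True,
}
print(score(candidate), score(candidate) >= 20)  # 21 True: I would install it
```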

Oh, and since I am proud of finalizing this «Book» (perl dev) snippet, here is a polyglot in Perl and python doing i++ and ${A}++:

q = 0 or """ #=;$A=41;sub A { ~-$A+2}; A() && q' """
A=lambda A: -~A #';
print A(41) # python + perl = <3

I can walk the land of illusions

Have you ever wondered if you were crazy? Sometimes sanity lies in assuming you are. I have what some dumbasses call an Asperger syndrome.

I don't have a syndrome; I can travel back and forth to the land of symbols. I can tell equally consistent stories that may be true or false. When walking this land, I am totally engulfed in another reality.

Let me tell you a story that may be true, or false. I don't know myself where the truth is. And as far as I am concerned, I don't tell the truth, I tell you a fictional story.

I do recognize that what is said about Asperger is true: I can't decipher facial expressions; we are crippled when it comes to feeling empathy. This part of the brain seems to be cut off from its natural final destination. Brains rewire. Humans adapt. We have a part of the brain specialized in symmetry/pattern/singularity recognition that is left unused in our early years.

We use this part of the brain for the other things we get attracted to: flows, mechanisms, music, drawing. When this part of the brain activates, we have the same feelings as if we saw expressions. We crave to see, in symbols, forms that evoke positive feelings. Regularities, symmetries, patterns of symbols trigger in our brains their human counterparts.

As young people, we are avid for knowledge, symbols, all the fields that attract us. A world of symbols slowly matures. We have multiple maps slowly building in our imagination.

We also need rituals, immutable yet varying, to bring us our fix of input. We constantly saturate our brain with information, avidly looking for patterns. Sometimes, given certain conditions (sleep deprivation, some emotional pattern), we can fully snap into the world of symbols.

All our circuitry is temporarily rewired for analysis: verbal circuits, facial recognition, emotions, pain, sensations all get rewired. And then, we can travel as if we were in the symbols themselves. We can travel our imagination as if it were the true world.

Causality, facts, information all become intuitively accessible through everyday life experience. We walk the map. We have the very feeling of knowing the truth behind the reality. However, the path is made of orbits with bifurcations, each one leading to a contradicting truth. That's why I call this world the world of illusions. I can lose myself there; that's why I repeat so often the mantra of the map and the territory. I can distinguish both worlds. Both are equally false. And I travel on the fine line of sanity.

My rewired and saturated circuits anarchically generate signals in my brain. As a side effect, this triggers emotion, pleasure and pain in a way I can't control. It is intoxicating. It feels like a trip under strong hallucinogens.

So as not to lose myself, I need, in the real world, a passeur: someone who helps me take the safe way, who helps me deliver my visions so that I don't overload. My expression is so clearly influenced by the intoxication that I need someone to decipher everything I say that doesn't make sense.

I also need him or her to be someone I can trust not to make me take the dark paths. Some experiences are best avoided. Who wants to know the actual feeling of death, for instance?

The bond is one of talking. Language lies halfway between imagination and reality. It is a journey. I feel like a weird computer, semi-directing a calculation operated by someone in communion.

I know it is false, it is only a story. And like any story it ends.

When it ends, the saturation of my brain is followed by an intense relaxation. I can feel the world, the emotions, the wind on my skin as if for the first time. I sometimes have the feeling of being able to distinguish people's emotions by watching their faces. I can cry and laugh. By coming back, I have a short time walking your world. I am normal, for once. Until a new crisis comes. In between, I will be a freak, lost between the two, experiencing glimmers of both worlds, with no joys or emotions. I will be a zombie, but nobody will see me as I am when I walk the forest of symbols.

When it happens, I feel both like a shaman and a plain fool. And I wonder if it is crazy, or true. I don't know. I only know these moments of inspiration frighten normal people. They fear us, or fear (for good reasons) that we may definitely lose our sanity.

The only thing I know is that it may be a delirium or a self-induced hallucination, but it is fucking cool.

I am not like you, I may be crazy, I may see truths that are untold, but I don't care: I can travel to places you will never see. I can walk the paths of my imagination without any drugs.







The color of music: why not discover python's ecosystem the right way?

It all began on a nice sunny summer day on #python-fr



Well, not that sunny, but a story always begins on a cheerful note. Quite a few of the beginners who come to #python-fr (usually from PHP) make a first project to discover python which belongs to one of the following categories:
  • recoding a network client/server without using twisted or tornado;
  • recoding an HTML parser without using an html parser (yes with regexps). 
When we tell them it is wrong, we are at best ignored, and in some exceptional cases flamed for castrating their creativity.

How do we tell them that reinventing the square wheel is not standing on the giants' shoulders?

Well, this is so disheartening that I asked myself: what would I do as a first project? And since I sux at deciphering music, I told myself: how hard is it to make a sonogram?


If you know a little music, you know that a note is associated with a frequency (the fundamental). When you play a note, every instrument produces a strong stationary vibration at the fundamental frequency, plus additional stationary vibrations at k * the fundamental frequency, of lower amplitude, called harmonics. So I want to see the music in a colourful way, to be able to tell which notes are played by finding the lowest frequency with the biggest amplitude. No rocket science: occidental tempered modes are a gift from the Greeks before they were broke. (We never paid them for this gift, but who cares, they don't have IP lawyers.)


How hard is it to make a spectrogram in python?



Let me google it for you:
«spectrogram music python», and choose the stackoverflow link.
I chose audiolab instead of wave because I had read the author's article on planet python a few days ago.

You are then redirected to http://macdevcenter.com/pub/a/python/2001/01/31/numerically.html, which gives you the basis for a first version and some explanations.

The Fast Fourier Transform is (since this year, after years of good service) an outdated way of transforming a time-dependent series into a frequency series. But the Fourier transform itself is still a must.

The truth is you have to make a frequency spectrum, which is not a Fourier transform, but closely related (I'll skip the theory of tempered distributions). As with a radar, you convolve the signal with itself over a time window to get a power representation (autocorrelation). We have a nice trick with Fourier that avoids very costly and boring overlap integrals over time. As I have not opened a signal processing book for 20 years, I strongly advise you, if interested, to double-check what I am saying. I am pretty much a school failure, so I must have made a mistake somewhere.
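The trick above can be sketched in a few lines of numpy: window a frame, take the FFT, and square its magnitude to get the power spectrum (the window choice and sizes here are arbitrary assumptions, double-check them like the rest):

```python
import numpy as np

def power_spectrum(signal, rate, window=np.hanning):
    """Return (freqs, power) for one windowed frame of `signal`."""
    n = len(signal)
    frame = signal * window(n)            # taper the frame to limit leakage
    spectrum = np.fft.rfft(frame)         # FFT of a real-valued signal
    power = np.abs(spectrum) ** 2         # |X(f)|^2: the power spectrum
    freqs = np.fft.rfftfreq(n, d=1.0 / rate)
    return freqs, power

rate = 44100
t = np.arange(rate) / float(rate)         # one second of signal
la = np.sin(2 * np.pi * 440 * t)          # a pure la at 440 Hz
freqs, power = power_spectrum(la, rate)
print(freqs[np.argmax(power)])            # 440.0: the peak sits on the note
```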

I did not like the old code on macdev because it left me more than 20 lines of code to write. And I told myself: «it would be very surprising if there were no ready-made class for spectrograms in python». So I carefully reread the Stackoverflow answer (googling is fine, reading is a must). And I discovered that spectrograms are directly included in matplotlib:
http://matplotlib.sourceforge.net/examples/pylab_examples/specgram_demo.html


Plus, my first result was crappy, so I googled: improve resolution spectrogram python. And there I had the basis for fine code, because the parameters were explained.

I did not even have to code or scratch my head to get functioning code. Did I learn something? Yes: always check that your problem does not already have a solution before coding head first. And code less.

The fun in coding is about adding your salt




Well: spectrograms are fine, but for decoding music, reading raw frequencies sux. So here I came with my improvement, which does not interest physicists: replacing the numerical scale with readable notes.

Step 1: copy-paste from the internet an HTML table of note names vs frequencies into a file.
Step 2: add it to the graph.
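If you'd rather not scrape an HTML table, the same mapping can be computed with the equal-temperament formula (the French note names, the la=440Hz reference and the octave range here are my own choices for illustration, not the original file):

```python
# Equal temperament: each semitone multiplies the frequency by 2**(1/12).
NAMES = ["do", "do#", "ré", "ré#", "mi", "fa", "fa#", "sol", "sol#", "la", "la#", "si"]

def note_table(la=440.0, octaves=range(1, 8)):
    """Map note names (French convention) to frequencies in Hz."""
    table = {}
    for octave in octaves:
        for i, name in enumerate(NAMES):
            # la of octave 4 is the reference; count semitones from it
            semitones = (octave - 4) * 12 + (i - NAMES.index("la"))
            table["%s%d" % (name, octave)] = la * 2 ** (semitones / 12.0)
    return table

notes = note_table()
print(round(notes["la4"]))   # 440
print(round(notes["la3"]))   # 220: one octave down halves the frequency
print(round(notes["mi5"]))   # 659
```

The resulting dict is exactly what you need to feed yticks on the graph.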

Here is my code:




And here is the result:
The Köln Concert, Part II, by Keith Jarrett, from seconds 10 to 20.


It is a piano instrumental: you clearly see the harmonics, the notes (no, I joke, the low frequencies are a mess to read), and the tempo. My note scale is not the same as Keith's, because I loaded a scale based on la=440Hz where it should be la=442Hz. You can guess the arpeggio at offset = 1.8 seconds.

Well, we python devs are sometimes bashed because what we do is «too easy». Well, I find it pathetic when someone wets his pants pissing out thousands of lines of code when you can have the same result with 20 lines.


AC/DC's Back in Black, from 5 to 10 seconds.
A multi-instrumental song makes the notes a pain to read. You may however guess an E3 bass line at a tempo of 60bpm beginning at offset = 6 seconds.


Why didn't I make a logarithmic scale? Because spectrograms are not compatible with log scales in matplotlib. That's the end of it.


So what is the conclusion?



If you are beginning in python, rather than re-walking well-known paths for which you'll invariably code a simple, neat and wrong solution, just explore the bleeding edge and do some fun stuff: you'll discover we need new ideas, and that you can help.

For instance in this case:
  • you could make a matplotlib application where you can zoom on part of the whole song in another window;
  • you could try to code a colormap in matplotlib that supports a logarithmic scale;
  • you could try to add some filtering to erase the harmonics;
  • you could add a bpm detector and play with the bpm to improve the resolution, showing the beat in the xticks for better readability;
  • you could play with the window function and the color scale to improve the resolution;
  • you could make a 3D graph with frequency bins ...

There is always room for improvements as long as you target the right problems.

Well, the computer industry is an amazing field to work in because there are always new problems to solve. Beginning in computing should not be about badly mimicking old boring problems (parsing HTML with regexps, network daemons, parallel computing) but about finding your place at the borders of an imaginary land. And remember, if like me you are beginning in python: always stand on the giants' shoulders; you'll see further than if you stay a dwarf in a crowd of giants.

Lost in cache optimization

Premature (and Micro) optimisation is the root of all evil



We all know this, don't we? And even though I know it, I still fall into the trap.

Let me tell you my story: it all began with profiling a basic web log parser I wrote fairly fast (the core of the project was done in 2 days (demo here)). But before releasing it, bmispelon wanted to check whether the implementation was efficient enough, so he made a benchmark of the algorithms. Thanks to him, I corrected a bug in my library.

You may notice we have had a basic memoizer since the beginning.
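For reference, a basic memoizer is the canonical dict-based decorator (this is the textbook version, not necessarily the project's exact code):

```python
import functools

def memoize(func):
    """The canonical dict-based memoizer: the cache grows without bound."""
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    wrapper.cache = cache   # expose the dict so callers can inspect/clear it
    return wrapper

@memoize
def slow_square(x):
    return x * x

slow_square(12)
slow_square(12)                 # second call is a cache hit
print(len(slow_square.cache))   # 1: only one entry was ever computed
```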

I was quite raging that his code was running faster than mine, so ... I decided to begin optimizations... And all hell broke loose.

Context: I was parsing a 191K-line apache log in 24 seconds with bmispelon's implementation, and in 30 seconds with mine. And I wanted to beat him badly. That's called coopetition (competing in cooperation).

Runsnake 



After running runsnake I had this gorgeous report :

Well, I also discovered map reduce is more costly and not that memory efficient compared to a simple loop. Simple is better than complex.

Also, Hankyu is not the most efficient (it is beaten by defaultdict). The fractal look of __iadd__ and isinstance is the cost of recursion. Remember: Flat is better than nested.

However, to change the shape of his results he needs to change a lot of code, while I only need to change one argument. Costs vs advantages.

I thought that by getting rid of the fat (in the httpagent parser and date formatting) I would win my 6 seconds back.

So I tried to improve the cache by measuring its impact. Here was the result for the cached date formatter:

{
    "date_format": {
        "len": 143142,
        "hit": {
            "hit": 49972, 
            "time": 0.08554697036743164
        }, 
        "miss": {
            "hit": 143142, 
            "time": 4.912769079208374
        }
    }
}

It was a pleasure, and a drawback: my strategy seemed efficient in terms of performance, but explosive in terms of memory. So I decided to give a try to py3 functools.lru_cache and beaker.

functools.lru_cache and beaker were taking 30 seconds instead of 26 seconds! I thought to myself: these guys must be over-engineering. So I made my own crude fixed-size LookupTable. As usual, Baptiste bested my ring-buffer-like implementation with his. I had carefully looked at the big O complexity of everything and tried to choose containers according to my needs. So I was raging.

I also thought: hey genius, dates in logs are monotonically growing but often constant, so why bear the burden of a dict that will blow up your memory, when storing a simple pair of values works? So I implemented it, and «measured» with my tool the impact of a monotonic cache:


{
    "date_format": {
        "hit": {
            "hit": 49972, 
            "time": 0.0680077075958252
        }, 
        "miss": {
            "hit": 143142, 
            "time": 4.508622646331787
        }
    }
}
 
Hurray! I won an award for the best idea of the century (probably already existing in some old FORTRAN code) and rediscovered that caching strategies should take the shape of the data into consideration. Not only was my algorithm more efficient, it was also sparser in terms of memory. So it was time to prepare some nice data to share my discovery.
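A monotonic cache of this kind can be sketched as a one-pair decorator (a simplified reconstruction for illustration, not the project's actual code):

```python
def monotonic_cache(func):
    """One-pair cache: worthwhile only when successive calls mostly repeat
    the previous argument (e.g. timestamps read in order from a log file)."""
    last = [None, None]                # [last_args, last_result]
    def wrapper(*args):
        if args != last[0]:            # miss: the argument just changed
            last[0], last[1] = args, func(*args)
        return last[1]                 # hit: same argument as last call
    return wrapper

calls = []

@monotonic_cache
def fmt(ts):
    calls.append(ts)                   # record real computations
    return "day-%s" % ts

fmt(1); fmt(1); fmt(1); fmt(2)
print(len(calls))   # 2: only the two distinct consecutive values were computed
```

Memory cost: exactly one key and one value, whatever the input size.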

So I decided to add more caching backends and test them all to look serious. I added repoze.lru_cache.

Here were my first data :



  • dict is the basic memoizer (the canonical example given everywhere),
  • repoze is repoze.lru_cache with maxsize=10000,
  • fixed is a memoizer with Baptiste's custom fixed-size lookup table, maxsize=10000,
  • beaker is beaker.cache in memory with expire time = 600.

Wut?



Repoze.lru_cache takes 27.5 seconds where functools.lru_cache took 30 seconds, and is faster than the crude fixed-size lookup table?

So I looked at the source code. Even though it was more complex, it was thread-safe and faster. I did not imagine a brain-dead, close-to-the-metal implementation would be beaten by such complex code (in terms of lines of code and tests; they even recoded/optimized the modulo!).

At least I still had the pride of thinking my messy but efficient monotonic cache would be quite the news. But repoze.lru_cache had seeded some doubt in my brain, so I decided to double-check my results. On a flash of wit, and in contradiction with my measurements, I removed my monotonic cache. So I tested both repoze.lru_cache as a replacement for my monotonic cache, and the basic memoizer with no cache at all on the date formatter ... just in case.



  • dict + nc is the dumb memoizer with no cache on date formatting,
  • repoze wo mono is repoze.lru_cache also caching the date-time formatter.

What happened? 



In my head, I had already anticipated that, unlike in web caching, my function could be insufficiently costly to justify a complex cache. So I made a «close-to-the-metal one-value cache». But I relied too much on my own measurements, and forgot to make the test that matters: real-time analysis with the dumb time command from bash, on real data...

Yes, I won: my monotonic cache is indeed faster than repoze.lru_cache, my archnemesis in this test, but caching this very fast function is plain useless.

How could I have avoided all this loss of time?



By remembering that Premature optimization is the root of all evil. Let me show you the big picture.
Had I begun optimization from a no-cache version, I would have discovered I was doing breadcrumb optimization. Being 4 seconds faster than beaker is impressive (more than 15%).
Winning 4 seconds out of 80 seconds (5%) at the cost of:
  • development time;
  • non thread safe implementation;
  • my code is a battlefield;
  • no unit tests;
  • headaches;
  • close failure
is pathetic. I will probably finalize my API without my cache implementations.

Morals of the story



Heisenbug


My measurement missed the overhead of the decorator's own call time. A decorator adds a call on top of a function call, and my tool had no way to measure that; I over-focused on improving the cache but missed the point that no cache was needed at all. Never dumbly trust your measurements.
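The overhead itself is easy to exhibit with timeit (the toy decorator and function below are chosen for illustration):

```python
import timeit

def identity_decorator(func):
    def wrapper(*args):
        return func(*args)     # one extra python call per invocation
    return wrapper

def fast(x):
    return x                   # so cheap that any wrapper dominates its cost

decorated = identity_decorator(fast)

plain = timeit.timeit("fast(1)", globals=globals(), number=100000)
wrapped = timeit.timeit("decorated(1)", globals=globals(), number=100000)
print(wrapped > plain)         # usually True: the wrapper itself has a cost
```

For a function this cheap, the «cache» decorator is pure overhead, which is exactly what my per-call measurements could not see.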

I also (re)learnt my lessons: measurements can introduce artefacts, and real-world data is often more complex and diverse than what you test in a beaker. There may be some best ways of caching, but caching is still data-dependent: there is no «one size fits all» cache. If you want to use a cache, it is good practice to measure objectively, in situ, that your caching strategy is the right one. There are, however, some pretty safe strategies.

You should always test your new ideas against real data. Even if you carefully chose your algorithms, you might have overlooked a bias (such as the pathological cases of key deletion in a dict). And what your unit tests do not reveal may appear in production, which is the worst place to discover a design bug. Theory is great, and so are experiments. They should not be seen as opposed, but as complementary.


Caching is like cryptography: you'd best use other's framework



Try to follow the general API, so that you can change your backend. There is no such thing as an ultimate cache, since caching really is data-shape and context dependent.



Never lose sight of the big picture



Had I watched my profiler carefully, I would have noticed what I later rediscovered: I was losing time on micro-optimization.


Cache benchmarking results

In my tests, caches are never overfilled, because I begin with an empty cache and allocate 10000 entries per cache by default. Since yahi.speedshoot takes more than one file as input, users might see a slight decrease in performance on big series of files with a fixed-size cache (the cost of deleting keys in MutableMappings). And since caching is context-dependent (do you parse files from the same site, or from different sites?), these results may vary with the data you really parse. I cannot guess the best strategy; however, any cache decorator should be made so that you can clear the caches. repoze.lru_cache misses this feature (and I have it \o/).
 
My monotonic cache is a good idea (ego boosting), but I was wrong about the use case. And as long as I have no case in which it works, it is a dumb idea.

Cache APIs are standard enough (a cache provider handling the implementation of the real cache, providing cache invalidation, and providing a cache decorator) that you should always make an adapter, so that you can change your caching backend.
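Such an adapter can be sketched like this (the class and method names are mine; a real project would plug beaker or repoze.lru behind the same interface):

```python
class DictBackend:
    """A stand-in backend: a plain dict behind the adapter interface."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=None):
        return self._data.get(key, default)
    def set(self, key, value):
        self._data[key] = value
    def clear(self):
        self._data.clear()

class CacheProvider:
    """The three standard operations: decorator, lookup, invalidation."""
    def __init__(self, backend):
        self.backend = backend
    def cached(self, func):
        def wrapper(*args):
            miss = object()                       # sentinel: None is a valid value
            key = (func.__name__,) + args
            value = self.backend.get(key, miss)
            if value is miss:
                value = func(*args)
                self.backend.set(key, value)
            return value
        return wrapper
    def invalidate(self):
        self.backend.clear()

cache = CacheProvider(DictBackend())

@cache.cached
def double(x):
    return 2 * x

print(double(21))   # 42
cache.invalidate()  # only CacheProvider touches the backend: it stays swappable
```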

Beaker.cache is impressive in terms of functionality and could make my code thread-safe and memory-safe. Complexity increases and has a cost, but also some return on investment.

The memoizer strategy is the fastest cache, and also the simplest to code. It is a good arrow to keep in your quiver in most circumstances, but remember: your cache will grow without control.

Read before coding



I learnt a lot about caching from wikipedia, stackoverflow and the pycon slides that are all over the internet. Before coding, it is always a good idea to search the internet, not for code but for studies. Standing on the giants' shoulders is not lame: it is about focusing on the exciting new ideas that have not been coded yet. Sometimes it is fun to test the established wisdom by implementing your own variant. You may win, you may lose, but you will have fun. Maybe I did not find any reference to a monotonic cache because it is lame, maybe I don't have the right keywords, maybe it is a good idea; I'll never know, but I took the chance. You'll never win if you never fight.

Algorithms evolve; nothing guarantees that what you learnt is still up to date (even the FFT algorithm is being renewed).


One pride


I did not ridicule myself!

My teachers used to say: if your results are astounding, you should double-check them, since there is a 99% probability there is a mistake somewhere.

And they said: even if you fail, remember errors are fruitful, since one never learns from one's successes but from one's mistakes. Sharing your mistakes with the community is less rewarding, but it is as important as sharing your successes.


This was my mistake: Premature optimization is the root of all evil.


PS: here is my code for plotting:
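The original snippet was posted as an image; here is a minimal sketch of what such plotting code can look like, with the audiolab loading replaced by a synthetic signal (all parameters below are my guesses, not the original code):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")              # headless backend: just write a png
import matplotlib.pyplot as plt

rate = 8000                        # samples per second
t = np.arange(rate * 2) / float(rate)
signal = np.sin(2 * np.pi * 440 * t)       # two seconds of a pure la (440 Hz)

NFFT = 1024                        # window size: trades time vs frequency resolution
Pxx, freqs, bins, im = plt.specgram(signal, NFFT=NFFT, Fs=rate,
                                    noverlap=NFFT // 2)
plt.xlabel("time (s)")
plt.ylabel("frequency (Hz)")
plt.savefig("spectrogram.png")     # the 440 Hz note shows as a horizontal stripe
```

Replace the synthetic signal with the samples read from an audio file, and feed the yticks with a note table to get readable notes instead of raw frequencies.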