Data Types are taught erroneously.


On IRC #python-fr, we see numerous students struggling with their assignments. Mostly, they have not learnt the difference between an abstraction and an implementation. So I made a sample implementation of Conway's Game of Life on GitHub to illustrate what goes wrong.

https://github.com/jul/game_of_life


http://en.wikipedia.org/wiki/File:Gospers_glider_gun.gif

Back to my first lesson in ... VLSI design. 



I did not graduate in CS but in microelectronics, and I think our first lesson should be taught to CS students, because it deeply impacts the way you code.

In electronics we think at the wire level, with transistors. But a circuit might have millions of transistors, and this complexity is not manageable if you ignore how transistors work, nor if you get lost in the wire schematics. So our first lesson is how not to blow our minds: by mixing the Bottom/Up and the Top/Down approaches.

Top Down or the blackbox approach


Top Down is about writing high level code, as abstract as possible, and about wiring sub-blackboxes into bigger blackboxes. What matters with blackboxes is not how smart you are at what you implement inside the box, but how smart you are at wiring your components, so that you can change your approach in case something goes wrong. In programming, this is mostly the API level.

Top/Down is about designing smart interfaces

Bottom Up approach or knowing your limitations



Bottom Up is about focusing on the limitations of your circuitry (asynchronicity ...) and its strengths. In a language such as python: the basic Data Types, and the functional domain (recursion limited to 1000 frames by default) ... These constrain your development.
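For instance, a minimal demonstration of that recursion limit (python 2, like the rest of this post):

import sys

print sys.getrecursionlimit()   # 1000 by default

def countdown(n):
    return n if n == 0 else countdown(n - 1)

countdown(500)             # fine, fits in the frame budget
try:
    countdown(10 ** 4)     # blows the frame stack
except RuntimeError as e:
    print e                # maximum recursion depth exceeded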

Bottom/Up approach is about building the inside of the blackbox knowing your limitations

The problem with students



Many students confuse the abstraction with the implementation, and their game of life code is a painful struggle to figure out what belongs to the game of life, and what belongs to the implementation.

See here a classical example of such code.

It is unreadable: you suffer along with the coder.

If you want to change the rules, you need to change the data type used. And you cannot change your implementation easily. Everything is mixed up.
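To make the point concrete, here is a condensed, hypothetical sketch of the style I mean (not the actual code linked above): rules, storage and display are all welded together, so changing any one of them means rewriting the whole loop.

# everything hard wired: the Game of Life rules are buried in index arithmetic
g = [[0] * 16 for _ in range(14)]
g[3][2] = g[3][3] = g[3][4] = 1     # an oscillator, by raw indices
while True:
    n = [[0] * 16 for _ in range(14)]
    for i in range(14):
        for j in range(16):
            c = sum(g[(i + a) % 14][(j + b) % 16]
                    for a in (-1, 0, 1) for b in (-1, 0, 1)
                    if (a, b) != (0, 0))
            n[i][j] = 1 if c == 3 or (g[i][j] and c == 2) else 0
    g = n
    print "\n".join("".join("X" if x else "." for x in row) for row in g)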

Do you really think this is a good way of coding?

Top Down : the Game Of Life main code is about stating explicitly the rules of the game.



from time import sleep

from util.matrix import matrix

X = 16
Y = 14
DEAD = 0
ALIVE = 1

grid = matrix(X, Y, [DEAD] * X * Y)
oscillator = [(0, 0), (0, 1), (0, 2)]
still = [(0, 0), (0, 1), (1, 0), (1, 1)]
glider = [(0, 2), (1, 2), (2, 2), (2, 1), (0, 0)]

def at(grid, x, y, pattern):
    """Stamp a pattern of living cells at (x, y)."""
    for dx, dy in pattern:
        grid.set(x + dx, y + dy, ALIVE)

at(grid, 1, 8, glider)
at(grid, 2, 3, oscillator)
at(grid, 6, 5, still)

while True:
    print grid
    sleep(1)
    n_grid = grid.copy()
    for x in range(X):
        for y in range(Y):
            if grid.get(x, y):
                # a living cell survives with 2 or 3 living neighbours
                n_grid.set(x, y, grid.nb_living_around(x, y) in [2, 3])
            else:
                # a dead cell is born with exactly 3 living neighbours
                n_grid.set(x, y, grid.nb_living_around(x, y) == 3)
    grid = n_grid


By importing matrix, I clearly state that the grid is a matrix-like blackbox. I say the blackbox needs: a constructor, a copier, a 2D setter, and a 2D getter.

My code is naive, therefore it is not abstract? How wrong: writing human readable code in a natural language fashion, code that seems idiotic, is the highest level of abstraction.

The more your exposed code seems simple and human readable, the more abstract your code is.

Bottom Up : Flat is Better than nested !



In any programming language / architectural design, flat is better than nested: an array of arrays requires a double allocation and double addressing ... it brings twice the trouble of a flat array.

If you access a framebuffer or a picture, you work on matrices, but in reality it is a 2D array abstraction given to you. The truth is that video cards and computers work with linearly addressed memory, because they are very good with contiguous chunks of data.

So if you have a look at matrix you'll see :


class matrix:
    """A 2D view over any flat 1D Sequence."""

    def __init__(self, size_x, size_y, array_of_mutable):
        self.size_y = size_y
        self.size_x = size_x
        self.matrix = array_of_mutable

    def _oneD_offset(self, ix, iy):
        # the modulo gives a torus topology: the grid wraps on both axes
        x, y = ix % self.size_x, iy % self.size_y
        return y * self.size_x + x

    def get(self, x, y):
        """CBP (periodic boundary conditions) powa"""
        return self.matrix[self._oneD_offset(x, y)]

    def nb_living_around(self, x, y):
        """Count the living cells among the 8 neighbours of (x, y)."""
        around = [-1, 0, 1]
        return sum([
            int(self.get(x + dx, y + dy)) for dx in around for dy in around
                if (dx, dy) != (0, 0)
        ])

    def set(self, x, y, val):
        self.matrix[self._oneD_offset(x, y)] = val

    def copy(self):
        # use the backend's own copy if it has one, else rebuild a flat list
        if hasattr(self.matrix, "copy"):
            copied_matrix = self.matrix.copy()
        else:
            copied_matrix = [x for x in self.matrix]
        return matrix(self.size_x, self.size_y, copied_matrix)

    def __str__(self):
        to_print = " "
        to_print += " ".join(["%2d" % y for y in range(self.size_y)])
        for x in range(self.size_x):
            for y in range(self.size_y):
                if y == 0:
                    to_print += "\n%2d " % x
                to_print += " %2s" % ("X" if self.get(x, y) else ".")
        return to_print


Well: this code states that matrices are an abstraction. We are still using a blackbox approach; I am even agnostic about the backend. My matrices are a view on any 1D Sequence.

Benefits :

  • transposing is just about swapping size_x and size_y (and the index order), thus I don't need costly transposition operations (see the sketch below);
  • copy is flat;
  • tr(A+B) = tr(A) + tr(B), and the same goes for and/or/not/sub. If the Sequence supports element by element addition of two Sequences (like numpy arrays), these operations will be dazzlingly fast;
  • I can change my «backend / data storage» when I want.
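As an illustration of the first benefit, here is what a zero-copy transposition could look like on top of the matrix class above (a hypothetical sketch, not necessarily shipped in the repo): no data moves, the indices are simply swapped on access.

class transposed_view(object):
    """A lazy transpose: same flat backend, swapped axes."""

    def __init__(self, m):
        self._m = m
        self.size_x, self.size_y = m.size_y, m.size_x

    def get(self, x, y):
        return self._m.get(y, x)

    def set(self, x, y, val):
        self._m.set(y, x, val)

# transposed_view(grid).get(2, 5) is grid.get(5, 2): O(1), no copy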

Duck Typing now (using interfaces)



Since I have mixed my approaches, I can now do neat things when my constraints change: my matrix is agnostic about the real data type, so I can use any Sequence :


### All these work!
#grid = matrix(X, Y, bytearray(X*Y))
#from numpy import array, zeros
#grid = matrix(X, Y, zeros(X*Y))
#grid = matrix(X, Y, [DEAD]* X*Y)
#from collections import defaultdict
#grid = matrix(X, Y, defaultdict(int, {}))
from util.weird_array import Bitmap, SparseArray
#grid = matrix(X, Y, Bitmap(DEAD))
grid = matrix(X, Y, SparseArray(set()))
#### end of golf

Bitmap is a Sequence interface wrapping an integer; SparseArray is a sparse implementation based on a set; defaultdict, numpy.array and even bytearray work too (a sketch of such a backend follows the list below). A good coder should require as few constraints as possible on the real implementation, to keep the most escape routes in case something bad happens :

  • if you have very little memory and don't care about speed, Bitmap is the best;
  • if you want speed, the list implementation is slightly behind bytearray in performance, but is more versatile (you can implement a cellular automaton with more than 255 states);
  • if you have a sparse board (far more dead cells than living ones), then the sparse array is for you ...
  • still, the naive list implementation is the optimum regarding all the criteria.
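For reference, a minimal sketch of what a set-backed SparseArray can look like (the real one lives in util/weird_array.py in the repo and may differ; this version only assumes the indexing + copy contract that matrix uses):

class SparseArray(object):
    """1D sequence-like storage keeping only the indices of living cells."""

    def __init__(self, living=None):
        self.living = living if living is not None else set()

    def __getitem__(self, i):
        return 1 if i in self.living else 0

    def __setitem__(self, i, val):
        if val:
            self.living.add(i)
        else:
            self.living.discard(i)

    def copy(self):
        return SparseArray(set(self.living))

# usage, as in the main file: grid = matrix(X, Y, SparseArray(set()))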

Data literacy is not about implementation, it is about properties and interfaces



Most students we meet on IRC think that knowing a data type is knowing the exact big O notation of every concrete class operation. And they never think of encapsulating data as views.

So you painfully have to explain to them that the interface of a data type matters more than its implementation, and that data types are more than concrete implementations: they are abstractions. And they struggle painfully because they use the right abstractions with the wrong implementation, unable to dissociate the two.

I strongly suggest students reason in terms of the abstract base classes defined here (which are interface based definitions of the data types):

http://docs.python.org/library/collections.html#collections-abstract-base-classes

rather than in terms of concrete implementations. But the problem is that they are taught to do the opposite!
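For instance, a short sketch of what reasoning by interface buys you:

from collections import Sequence, MutableSequence

# code written against the interface accepts every implementation of it
print isinstance([], Sequence)          # True
print isinstance((), Sequence)          # True
print isinstance("abc", Sequence)       # True
print isinstance([], MutableSequence)   # True
print isinstance((), MutableSequence)   # False: tuples can't be mutated

def last(seq):
    """Works for ANY Sequence, whatever the concrete implementation."""
    return seq[len(seq) - 1]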


Wrapping it up  : it's all about Interfaces



Always present a human friendly interface in the Top/Down approach; implement computer friendly data types in the Bottom/Up approach.


I present a grid interface in the main file: this is the top down approach, because the real value of the code lies in the fact that it will be used by humans. Still, the implementation is computer friendly.



As a result, the best compliment for a coder is to be despised for writing naive code (at the top level), and hated for not playing the same song at the wire level, because that is the nature of code.

Coders are the mediators between two consistent yet exclusive logics: that of computers, and that of human beings. Coding is about making this mediation transparent.


So please, stop showing me the guts of your code and expecting me to be impressed: I find it gross, and proof of ignorance of what coding is. And please, teachers: teach abstraction to your students, or leave the kids alone.

Dividing is not as easy as it seems

The div inferno


Still coding archery and trying to give a consistent definition of add/mul/div/sub for Mappings, I stumbled upon ... my memories from when I was a student in applied physics: computers are just big abaci. They don't do maths; they do something that looks like maths, but is not maths.

The ring horror movie


How to explain most efficiently when the horror movie begins?
I dare say: when you use the division operation and try to overload it (/, __div__, //, or __truediv__).

Why ? 


If a and b belong to the same type of number (natural, integer, rational, real, complex), then addition and multiplication keep you inside the set: what comes in, stays in.

Addition and multiplication are nice algebraic players.

Subtraction is more tricky, but does not pose a great threat. The truth about «subtraction» is that you only need __neg__: a - b = a + (-b) = a.__add__(b.__neg__())

This normally does not increase the complexity in the behaviour of numbers.

(For the sake of comprehension I'll gloss over the fact that negation takes you from N to Z.) You pretty much don't leave your ring.
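A toy sketch of that remark (python does not derive __sub__ from __add__ and __neg__ for you, but the derivation is mechanical):

class Vec(object):
    """Toy 2D vector: subtraction derived from addition and negation."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vec(self.x + other.x, self.y + other.y)

    def __neg__(self):
        return Vec(-self.x, -self.y)

    def __sub__(self, other):
        # a - b = a + (-b): no new algebra is introduced
        return self + (-other)

    def __repr__(self):
        return "Vec(%r, %r)" % (self.x, self.y)

print Vec(3, 4) - Vec(1, 1)   # Vec(2, 3)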

Div : when all hell breaks loose


Following the sub implementation, one would say it is sufficient to define the unary inversion operator to get the whole division: a/b = a * (1/b). In any case, division must enforce:
b * (1/b) == 1


Python weirdness


In python, of course: int(a) * (1/int(a)) = 0 (or roughly 1.0 with from __future__ import division).
Are python developers stupid?
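A quick interactive session makes the weirdness concrete:

>>> a = 3
>>> 1 / a                # python 2 default: floor division on ints
0
>>> a * (1 / a)          # 0, not 1: we silently left the ring
0
>>> from __future__ import division   # PEP 238 (the REPL accepts it mid-session)
>>> 1 / a
0.3333333333333333
>>> a * (1 / a)          # a float close to 1: here it rounds to exactly 1.0
1.0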

They are not : the computer is



Well, you have a serious problem anyway. Especially knowing that computers are clueless about math: they only know of IEEE754 floats and integers.

If a variable belongs to the natural integers, where does its inverse belong?

It is a rational number, for which the computer and python have no natural representation (the stdlib has a fractions module, but the / operator will never hand you a Fraction). So it becomes a float. Float is a territory, math is a map. It is a map that does not map the territory.


But what if a/b is exact? Then the number belongs to the integers! And the binary representation does not give me the information about which type of number I am dealing with: I only know whether a number belongs to N or Z (integers), or to the rest of the world. And 2.0 is not the same as 2. Therefore, I am stuck.

Why is it important ?


In python, Sequence * 2 is the same as repeating the sequence twice. But Sequence * 2.0 raises an exception, because it is nonsense.
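Concretely:

>>> [1] * 2      # an int makes sense: repeat the sequence
[1, 1]
>>> [1] * 2.0    # a float does not
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'float'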

I want archery to behave this way :
  • { 'a' : [ 1 ] } * (4 / 2) should be { 'a' : [ 1, 1 ] }
  • .5 * { 'a' : 1 } should be { 'a' : 1 } / 2

So I fall into the traps of PEP 238, PEP 228, and PEP 3141.

Python's fault ?


No. The real truth is: since I know how to wire an adder, a multiplier, a subtractor at the metal level, I know that computers are only very very very fast abaci.



The multiplier of the CPU (not to be confused with the Floating Point Unit) used for ints is really built the way it would be on an abacus, with one wire and one bead per bit, plus extra wires used for flagging the results (Overflow, Carry, Zero, Infinite, Sign).

Abaci are great for integers, but every other number we use in math is in fact an abstraction. 0.5 and 1/2 are the same because .5 is an exact sum of fractions of powers of two. But .8 cannot be represented on a binary abacus, because it is not an exact finite sum of fractions of powers of two.

sum([.1] * 10) == 1.0
# False


That's the reason why all developers using floats for financial or commercial transactions should be shot in the head without any trial: they are introducing errors.

This is a must read: http://docs.python.org/tutorial/floatingpoint.html
At this point, one should know of IEEE754: it is the root of all evil underneath floating point.

It is the standard of the electronics industry for floating point numbers. And it is the standard in all GPUs too, so openCL won't fix your problem. You may use extra slow high level libraries, but you have no hard wired, therefore fast, replacement in view. There is no standard, or widely used ASIC or FPGA I am aware of, that solves the problem. Since I have not updated my knowledge on the topic for 15 years, you should not take my word for granted (IEEE854?).

So since my problem is at the wire level, I don't expect python to solve it.

If you don't have a solution, then you don't have a problem.

Possible remedies ? 

PEP3141


Well, in fact I would have preferred the rejected alternative based on Haskell. But they might have had good reasons to reject it.

Symbolic calculus ?


I have played with the HP48 and MATLAB (and seen Mathematica in action). I therefore know symbolic calculus is possible, but it requires dynamic AST parsing inside the language, plus lazy evaluation, and probably some side effects we cannot predict. We have this in python with SAGE, and it has a lot of open tickets.

Plus, if you encapsulate every number in a symbolic object (fractions, irrationals, integers, symbolic expressions ...) you'll lose performance. The actual integer add is dazzlingly fast; the FPU is pretty slow. Imagine that a simple addition needs to resolve a lot of mro (method resolution order) before actually calling the assembly level add that does the addition: your code will be slow.

And I intend my code to rely on stdlib. So I won't use SAGE.


SymPy? Well, I just discovered it while writing this post. I will look at it later.


Smart approximation ?


Imagine you have 0.30000001 in your float: wouldn't it be tempting to say it is .3? Well, as long as you don't mean a = .3000000001 and don't need to subtract .3 from it, you don't have a problem.
Since python is pretty well known for its love of the least surprise principle, and for the refusal of the core developers to guess, there is not a chance this may happen in python. It would de facto result in implicit casting, and if you are not aware of the implicit casting nightmare, just look at PHP for the result of this choice: https://bugs.php.net/bug.php?id=54547

Arbitrary precision ?


Python already provides it natively for integers. We also have the bigfloat package on pypi. It does not solve my implicit cast problems, nor does it solve the fact that floats are an inexact representation of fractions. It may also (not tested) slow things down.


Fixed Point arithmetic ?


Python provides the decimal data type, and you are strongly encouraged to use it when writing code that relies on exact results (commerce, finance...). Hardware manufacturers (IBM being the leader) also scratch their itches on the topic. But I am still unable to know if (a/b) is an integer, or if .5 is exactly 1/2.
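A short sketch of what decimal buys you, and of the limit I am complaining about:

from decimal import Decimal

# the accumulation bug shown earlier disappears
print sum([Decimal('0.1')] * 10) == 1    # True

# an inexact division stays inexact, but the precision is controlled
print Decimal(1) / Decimal(3)            # 0.3333333333333333333333333333

# and nothing tells you whether a result is "really" an integer
print Decimal(4) / Decimal(2)            # 2, but still a Decimal, not an int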

Uncertainty estimates ? 


I sometimes wish operators had an enhanced mode where they give not only the result, but also the uncertainty. But I don't know if it is feasible.

Porting haskell monads ?


Well, I think it would be faster to switch to Haskell than to wait for monads to be ported to python.

Archery will therefore not support div for safe implementations


Since I want to ship a working version, and I don't want to debug impossible problems, I am reluctant to implement division for mappings. The problem is at the wire level, and I think any workaround at the language level is just plain wrong in terms of predictability, consistency, and performance.

I will flag any mapping with div as tainted/unsafe, or gladly accept any proposition that makes it sane. My brain just can't figure out any good solution for now.

Education might be the solution


I don't know what happens in your country, but I have -especially in finance- seen educated monkeys using very sophisticated equations relying on exponentials of matrices (exp(X) = sum(X^n / n!)) to model the economy (with feedback loops), and using plain floats for the sake of «performance». Whenever I hear them, I shiver: feedback plus division implies the amplification of errors. And as far as I can tell, for so called performance reasons (I dare say numerical analysis illiteracy), they don't use error estimations.

Well, truth is, as soon as you work with floats, your intuition is wrong!

(a - b) * c != a * c - b * c

Errors in subtraction and multiplication do not have the same amplitude and do not behave like linear equations. And if you use feedback models which converge (reinjecting small portions of the output values into the input), you may end up with errors greater than the signal.
Don't confuse accuracy and precision: http://en.wikipedia.org/wiki/Propagation_of_uncertainty
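Two reproducible examples of intuition breaking (interactive session):

>>> 0.1 + 0.2 == 0.3     # the decimals you type are not the doubles you get
False
>>> (1e16 + 1) - 1e16    # absorption: the +1 was lost before the subtraction
0.0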

Conclusion : the headache of symbols, the sanity of python


I am pretty much a computer illiterate: I never code relying on what I know of computers, I rely on words, symbols and their meaning in the real world. When I code, I don't have the feeling of relying on scientific knowledge, I have the feeling of writing an English essay. I take a short-cut by taking for granted that people will rely on what we share as common knowledge in the real world :
  • I expect + - / * to behave like in math;
  • I expect words to have their usual meaning;
  • I expect context to give me information;
  • I expect the object to be the subject, the method to be the verb, and the arguments to act as complements.
I stumbled on this div because my rule works 99% of the time, but fails 1% of the time. Computers have their own logic, and people should know it before claiming to be developers. On the other hand, a program that is not readable and modifiable by a human is useless.

So as a conclusion, I'd say: computers are imperfect, humans are too, and the craft of programming is keeping a balance between computer literacy and reliance on human common knowledge. This is a hard path, since balance is about breaking rules wisely, and therefore being caught at fault by the zealots of both chapels.

I also discovered I like python because the core developers are safeguarding the language core against weird ideas, and I like the PEPs because everything has been rationally pondered and the arguments behind every design are stated.

Also, with the scientific community being very active in the python community and giving feedback, I see python as the leader on numerical analysis problems amongst the big 4 (ruby, perl, PHP, .net).

I also discovered I should really peek at haskell monads.



You'll have some matplotlib in your pycon ?

Yesterday, I gave a presentation on matplotlib during the vous reprendrez bien un peu de pycon event, an event dedicated to making people discover python projects. It was a nice moment, thanks to a responsive audience. For those who were there (what snobbery: I publish in English the write-up of a presentation given in French), here is a small summary.


Why (not) choose matplotlib?


After presenting the difference between the nice jquery.flot charts and the ugly matplotlib charts, and how I lost a freelance contract because of matplotlib, I stated that once you need something a little more complicated than histograms, pie charts and plots, you are stuck with flot. Plus, the javascript code is bigger and uglier. And if a last argument was needed: with matplotlib you can do more with less, once you know the «coup du berger» (a safe opening in chess).

Plus -for a former Perl developer like me- matplotlib has a tremendous advantage: a gallery with actual examples you can copy paste, and it works. But in order to reuse them, you need to know the standard opening of a plot.

Foreword

As a former MATLAB programmer I can assure you that matplotlib is a functional port of MATLAB's plotting, old software used in universities for playing with discrete series in signal processing.

I pointed out that it was safer to install matplotlib the «packaged» way on your OS. See here: http://matplotlib.sourceforge.net/users/installing.html

Then, one should use ipython, which gives a nice feeling of immediateness by rendering your plots on the fly (ipython has nice features such as completion, syntax highlighting, and help).

Le coup du berger


There are «historical» coding conventions in matplotlib, making examples easier to read and modify once you know there is nothing to understand :

from matplotlib import pyplot as plt
### importing pyplot is all about initializing the GUI/canvas
fig = plt.figure()
### now you have a canvas
ax = fig.add_subplot(111)
### 1 row, 1 col, plot 1: a canvas made of a single subplot
ax.plot(range(100), label="optional legend for the plot")
### yes, plotting is done here
ax.set_xlabel("x is there")
ax.set_ylabel("and ofc y is there")
plt.title("titles are nice")
### these are optional but so easy
plt.savefig("example.png")
### save the result for this post (do it before show: the canvas may be gone after)
plt.show()
### show the result
Can you say it is not easy?

What is the use of the coup du berger ?



Now you can not only cut and paste examples, but also recognize the pattern and understand what is specific in each one: http://matplotlib.sourceforge.net/examples/ The gallery is the pypi, the CPAN, the CTAN of matplotlib: the place where you gather working examples to enhance your experience of matplotlib. At this point you have the starter kit for using matplotlib.

Extra fun. 

Numpy is great

Numpy is basically bindings to fortran libraries (as a former fortran77 developer, I luv it :) ) that bring a bit of the «array language» paradigm to python. (During the presentation I said it contained numerical recipes: I was wrong. I said it was a high performance library for homogeneous arrays: I was right.) An «array language» is simply a language where a variable is an array, so that for instance 2 * a multiplies all the members of the array by 2, as the snippet below shows.
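A minimal demonstration of that paradigm:

import numpy as np

a = np.array([1, 2, 3, 4])
print 2 * a       # [2 4 6 8]: the loop is implicit (and runs in compiled code)
print a + a       # [2 4 6 8]
print np.cos(a)   # ufuncs also apply element by element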

Plotting a cardioid, add_subplot


from matplotlib import pyplot as plt
import numpy as np
from numpy import cos, sin
### numpy's sin & cos work on whole arrays at once (and in radians, just like math's)
fig = plt.figure()
ax = fig.add_subplot(2, 1, 1)
## I declare a stack of 2 subplots, and I want to play with row 1
a = cos(2.0 * np.array(range(2000)) / 200.0)
## an array of floats
ax.plot(a, '.-')
## '.-' tells the shape of the line I wanna draw (dots joined by a line)
ax = fig.add_subplot(2, 1, 2)
## in the stack of subplots I want another one under the first, on the 2nd row
b = sin(5.0 * np.array(range(2000)) / 200.0)
### quick and dirty implicit cast to float by the 5.0 *
ax.plot(a, b, label="nice cardioid, ne?")
ax.legend()
plt.show()

Playing with dates


Question: how the hell do I plot dates?

Answer: Coup du berger + manual :) 

  1. convert your dates with dates.date2num
  2. use ax.plot_date
  3. if lazy, use fig.autofmt_xdate
or ask stack overflow :)
or look at the gallery (the three steps are sketched right below): http://matplotlib.sourceforge.net/examples/pylab_examples/date_demo1.html
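Here is a sketch of those three steps glued together (written from memory of the gallery example, so treat the details as assumptions):

import datetime
from matplotlib import dates, pyplot as plt

days = [datetime.date(2012, 3, d) for d in range(1, 11)]
values = range(10)

fig = plt.figure()
ax = fig.add_subplot(111)
x = dates.date2num(days)        # 1. convert the dates to matplotlib floats
ax.plot_date(x, values, '.-')   # 2. plot_date knows the x axis holds dates
fig.autofmt_xdate()             # 3. rotate/format the tick labels for me
plt.show()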

Even more fun


This is fun http://scipy-lectures.github.com/advanced/image_processing/index.html 

Et voilà : all is said and done, and now you can fly with your own wings :) 
The manual is great, the gallery is great, this is all you need to know to have fun.  

Tablet devices considered harmful


Stand Tall


The limiting factor in user interfaces, whatever BS is told on the internet, is the human condition. Human beings have two distinguishing traits other animals don't have :
  1. the opposable thumb (which does not concern us here);
  2. the bipedal posture.
And because of our evolution, the bipedal posture puts strain on our spine.

Since work accidents cost the economy a lot, directly and indirectly, we have numerous studies on what might and might not harm humans. They all revolve around the intrinsic weaknesses due to our bipedal posture.

Recommended position for computer use
All the known dangers related to computer use, as listed in wikipedia, are basically linked to :
  • bending your neck;
  • not keeping your spine straight;
  • prolonged use of the hands, wrists, back, neck, etc.;
  • sitting in the same position for a long period of time.
This field of research, known as OSH (occupational safety and health), is closely linked to the workers' struggles of the late 19th century that acknowledged the legal and financial responsibility of companies regarding harmful working conditions: what we commonly refer to as occupational accidents. And all the countries belonging to the International Labour Organization will make your company liable for any harm coming from unfit working conditions.

Have you never wondered why Apple doesn't ship touch screens with their bigger computers?


Liabilities cascade: if your working conditions come from equipment provided by your company, and the company was misinformed by the hardware manufacturer, the liability will cascade to the manufacturer.

As a result, you may have noticed that the first pages of desktop manuals are always about safe working positions. These are codified in regulations that all revolve around conforming to standards (such as this one). These are mainly legal disclaimers: if you don't use the electronic device the «safe» way, then the manufacturer has nothing to do with your problems. People should always read the manual and the contract :)


You may think the touch screen is a hipster technology?

So wrong.

It was one of the numerous technologies researched in the late 80's at the famous Xerox PARC in Palo Alto (famous for having been the source of most of the so called apple innovations). And they had a term for the major defect of tablet devices: the gorilla arm. Plus, as you can notice, when you use a tablet device you do not always think of:
  • your sitting position;
  • how you bend your neck.
Since Apple has very good lawyers, I don't think the absence of touch screens on laptops and desktops is fortuitous.

Science is about causality

And causality is the claim that the same causes produce the same effects, as long as you measure them.

Since the introduction of tablet devices, their use has been marginal. The more they are used, the more people will interact with them, and as a result people will present repetitive strain injuries linked to the bad habits taken. Thus tablets are not harmful yet, since no studies have been made yet, but they will become so, because we have strong knowledge of technologies sharing traits with tablet devices: pocket books and touch screens ...


«We can predict everything, except the future...
But future is in certain circumstances a degraded version of the past»

source geek.com
I can therefore safely predict that :
  • studies will come in the next 10 years proving tablet devices are a health risk;
  • hardware manufacturers will provide desktop-like or laptop-like docks for hand held computers (smartphones, tablets...);
  • tablet PC users with a PhD or a masters in CS or ergonomics are complete morons, since they could have seen it by themselves (this thus includes: geeks, most computer journalists, fixie users, and maybe you, if you needed to read this post to understand the problem).

EDIT : since when do we know the iPad/tablet position for reading is a bad idea?

Since copyists started using the «pupitre» (lectern), which serves the same purpose as a dock.



The joy and headache of naming


Be careful with naming


In a previous post I illustrated how my communication skills are below average. One idea (which I will prove useful with side projects, such as parsing apache web logs in map reduce fashion) was strongly rejected because I communicated awkwardly. Therefore, I am now overly cautious with the words I use.

I never was strong at maths and science; that's the reason why I chose physics. I never was able to use the expected terminology, but since I could see symmetries and singularities, I always was able to give the solutions without resorting to too many equations. As a result, since I could find solutions, I never cared to explain in plain words how I found them. I do the same when I code: I don't bother explaining, since I see the solution.

I thought that people would understand code without explanations... How wrong. Code does not tell anything. 


Telling people I wanted to add addition to dict was a screw up


Well, python is a duck typing language, and I proposed to enhance a «strong» type. A dict is something that quacks like a dict, runs like a dict, flies like a dict. Using the wrong words misled me: something that is a duck-dict is a Mapping. I should have said I wanted Mappings to behave correctly with addition.
Here is the source of Mapping (see here). It is the set of abstract methods that everything wanting to qualify as a dict must have. As a result, it made my code freakishly easier: Mapping is the abstract class for any dict-like class.
It therefore makes isinstance safe to use!
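What «safe» means here, in a short sketch:

from collections import Mapping, OrderedDict, defaultdict

# every dict-like of the stdlib declares itself a Mapping
print isinstance({}, Mapping)                  # True
print isinstance(defaultdict(int), Mapping)    # True
print isinstance(OrderedDict(), Mapping)       # True

# whereas isinstance(x, dict) would wrongly exclude third party
# Mappings that do not inherit from dict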

How to do it ? Maximum overload


There is an obvious way to do it: overloading the addition operators, as described here.

Is the object is_a, or has_a? How to do it: monkeypatching, composition, defining a factory? I decided it was time to break my biggest taboo: multiple inheritance!
Since Mapping doesn't have adders, it is not a problem. For the sake of this article (and since I want to communicate properly now), I discovered this is known as mixins in python. Since I am a (sniff) former Perl coder, I know them as traits; ruby knows them as behaviours.

Since I thought of add/sub/div/mul not as actual mathematical operations but rather as behaviours, and since one of my first devs was a light unit test testing behaviour (associativity, distributivity ...), I said «funny coincidence», and that was it: I began recoding VectorDict as mixins.

I am not speaking of freezing the meaning of addition; I am speaking of defining more than one consistent set of behaviours for the basic arithmetic operations, according to rules that ensure properties, and that make them safe to use according to our mathematical intuition. (I am pretty fond of non-Euclidean geometry, so I have some fun ideas in mind.)


The strength of a word


Words are strong if they are concise enough, but not too broad. My previous experience with VectorDict told me that naming mixins at the method level would make class declarations unusable. I know 99% will need only Adder, and 1% would maybe need the rest.
What do you need to make a Mapping actually add (use the + sign)?
Overloading __add__?
Not enough: you then don't support a += b.
__add__ and __iadd__?
Not enough: you also need __radd__. This is how I do it:

from collections import Mapping   # the ABC, so any dict-like works

class Adder(object):
    """Mixin making any Mapping able to add."""

    def __add__(self, other):
        """a + b: copy, then reuse the in-place addition"""
        copy = self.copy()
        copy += other
        return copy

    def __iinc__(self, number):
        """in place increment of every value by a non-Mapping"""
        for k, v in self.iteritems():
            self[k] += number
        return self

    def __iadd__(self, other):
        """a += b: merge a Mapping key by key, or broadcast anything else"""
        if not isinstance(other, Mapping):
            self.__iinc__(other)
            return self

        for k, v in other.items():
            self[k] = v + self[k] if k in self else v
        return self

    def __radd__(self, other):
        """1 + a: called when the left operand cannot do the addition"""
        copy = self.copy()
        if not isinstance(other, Mapping):
            return copy.__iinc__(other)
        return copy.__iadd__(other)



With this code (plus a Muler mixin written along the same lines) you have the following result. Note that Dad must also inherit from defaultdict to actually be a Mapping:

from collections import defaultdict

class Dad(Adder, Muler, defaultdict):
    pass

e = Dad(int, {"a": 3, "b": Dad(int, dict(a=2))})
e *= 2
print e + 1
# OUT: defaultdict(<type 'int'>, {'a': 7, 'b': defaultdict(<type 'int'>, {'a': 5})})
f = Dad(int, {"a": [], "b": Dad(int, dict(a=[]))})
f += [1, ]
print f
# OUT: defaultdict(<type 'int'>, {'a': [1], 'b': defaultdict(<type 'int'>, {'a': [1]})})

That is the reason why, as a tribute to the two words that helped me (trait, and behaviour), my new package implementing the mixins for addition is now known as archery.
Because I thought it would be fun to import trait (arrows) from archery.


So many more naming problems


There is a coupling that cannot be avoided later on: the Diver trait depends on Muler and Adder. It can seem odd, but you MUST code Adder, then Muler, and only then Subber. As a result, I had in mind that I would have to make consistent sets of traits (since I do not intend to restrict myself to linear algebra). And I needed a name that made sense for bundling the Adder, Subber, Muler, Diver traits in a way that was safe: that is archery.quiver. Because I have found nowhere on the internet how to intuitively name a macro behaviour/trait/mixin of mixins that must be coupled.

A quiver is not planned to be only a set of traits; it is also planned to be tested for consistency.


Archery, trait, quiver and ... bow


And since I was having fun having traits and quivers, I decided it was time to have archery.bow. Bows would assemble traits and quivers into usable classes. The problem is how to name :
  • a defaultdict with addition?
  • a dict with addition, subtraction, division, multiplication?

Java style and being hated ?


Names should help!
But writing DefaultDictWithAddition (which is the easiest) might become boring (even for me).

Try the pun and be misunderstood?


D(efault)D(ict) that adds sounds to me like dad, therefore Daddy.
Who would call their class Daddy? And what to call a defaultdict with Subber? Momma? I realised I would probably face some criticism for being too openly sexist and giving useless names.

When you have no solutions, just be resourceful


I decided that, since I had no choice but to try to respect the minimum contract with code, I would do two things :
  • have names of less than 10 chars;
  • give a hint of the capacity of the class.


Bow naming convention


First, with traits and quivers you may think you don't need bows, but trust me, use them (at least as templates for other mappings).
Second, bows will be classified by their power :
  • short bows,
  • longbows,
  • crossbows.
There is one exception: LongBow (I do have a strong admiration for its use in the Battle of Agincourt), so it is reserved for the full bells and whistles one: the migration of VectorDict.

Since I know nothing of archery, I will have fun finding exotic names (like Daikyu, which is a Japanese longbow) for the other ones.

The naming Nightmare is not over

https://github.com/jul/archery/blob/master/archery/crafter.py has two functions that are clearly misnamed, and I am short on inspiration.

Even crafter smells like a wrong idea.

The Bowyer


Bow(int, { "a" : Bow(int, {"a": 1 }) })

This being annoying to write, I have a function that takes a Mapping and converts all the dicts nested inside a dict. It works this way :
 
make_bow = lambda tree: Bow(int, tree)
Bowyer(make_bow, { "a" : { "a" : 1 } } ) 
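For the curious, a recursive sketch of what Bowyer does (hypothetical reimplementation: the real code is in archery/crafter.py and may differ):

def Bowyer(maker, mapping):
    """Recursively rebuild a nested Mapping with the maker factory."""
    return maker(dict(
        (k, Bowyer(maker, v) if isinstance(v, dict) else v)
        for k, v in mapping.items()
    ))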

The fletcher


This one is clearly misnamed: it generates an iterator over all the paths to keys and values, in the following form:

from archery.crafter import fletcher

print [ x for x in fletcher({ "l1": 2, "l2": { "c1": 1 }, "l3": [1, 2, 3] }) ]
# OUT: [['l1', 2], ['l2', 'c1', 1], ['l3', [1, 2, 3]]]

(I use it to convert dict/JSON to CSV).
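Here too, a recursive generator gives the idea (hypothetical reimplementation of the behaviour shown above, not the package's actual code):

def fletcher(mapping):
    """Yield [key, ..., key, value] paths for every leaf of a nested Mapping."""
    for k, v in mapping.items():
        if isinstance(v, dict):
            for path in fletcher(v):
                yield [k] + path
        else:
            yield [k, v]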
You are welcome to help me on this.


PS : this story is a bit of a shortcut and falsely gives me all the credit for the naming ideas, which in real life involved bruno rohée & baptiste mispelon. And I lied a little: I did not find everything at once, but telling the whole truth might have been confusing.

Don't plan to throw one away, you'll do it anyway


I am pretty convinced it is common wisdom to plan to throw one away, since you'll do it anyway (Fred Brooks). But in real life it is better not to plan it, and instead to struggle hard with your first idea and push it to its limits.

It all began with a strange idea

I may have been quite trollish when proposing this to the python-ideas mailing list. But I soon began to build VectorDict.

My plan was even bigger than that: trying to make computer specialists acknowledge the need for algebraic rules on the common operators.

I can even acknowledge that my plan was (and maybe still is) leading, without me realising it, to reimplementing an inefficient and slow LISP in a subset of python (see Greenspun's tenth rule). Well, follow the idea :
  • algebra plus dicts as (fractal) vectors leads to matrices,
  • matrices apply to vectors and transform vectors into vectors,
  • therefore I am mostly planning to transform trees into trees, thus I am trying to reinvent a poor inefficient LISP as a subset of python.
I packaged it and pushed it on pypi, expecting to see early adopters give feedback; I wrote documentation with sphinx, expecting people to find a use for this package and give me advice.

But none of this happened.

I wrote an actual practical use case, thinking people would look at the code and understand the API.

But none of this happened. 

Road to failure

Believing in your ideas, coding, advocating, thinking you are right.

That seems mean, but it is true: that is the road to failure. The road to failure is not about believing and working; it is all about not being able, or not willing, to communicate. I was thinking at that time that since I don't know how to code, I would present a consistent API with unit tests, and people would understand the idea, steal it, and make a correct implementation. And then I would have something working, because I saw a lot of uses for this.

But I can't code it properly, since I know nothing. I can't communicate, since I don't know the right words. The truth is, most times I have heard «singleton, design patterns, abstract class, OOP, frameworks...» coming from the mouth of a fellow coder, it was just a way of being obscure in order to appear profound. So I have a profound dislike for these words and those who pronounce them. The same goes for ring, and all the mathematical terms. I only have my math books from university and the 13 rules I know linear algebra should enforce (like a+b = b+a, a+neutral = a, a-a = 0 ....). So I thought following the book would be enough (since I have an IQ far below average), and since people are smarter than I am, they would understand. I myself have a background and believe in :
  • not using recursion, since the stack can explode;
  • not using multiple inheritance, since it is «evil»©®.
So I did not plan to throw my VectorDict away: I became an obnoxious salesman of my solution, saying it would be great for map reduce, and I made a word counter in map reduce with multiprocessing. And it worked. But no one cared, even though there were blogs everywhere talking about map reduce for big data.

But no one listened. I was talking the wrong dialect in the wrong place.

Becoming yourself

Well, I am old, and I grew up with the idea that you cannot be smarter and more creative than in your twenties, and I have always been a slacker. At that time, I had been attending IRC for 3 months as the obnoxious salesman I had become.

To get friendlier reactions, my nick was even a girl's. And I lost months, days, hours attending IRC: #python, #python-fr, #freebsd-fr. I was just lost, and an annoying pain in the ass.

But one day, someone asked me what the use of my package would be, and as I was reinstalling my (crashed) server, I decided to give it a go on apache combined log parsing.

And one awesome person (bmispelon) did not let me code my own way. He challenged me to write better, faster, more logical code, and to benchmark it.

What benchmarking taught me was :
  • that my first code (the code of a not really good developer) was not that bad, because I could understand and modify it;
  • that my idea was freakishly efficient;
  • that I should throw it away, because its structure was unclean.

When to throw your code

The moment to throw your code away is not when you know it is dirty, but when, on one hand, you begin to understand why it is cool, and on the other hand, you become limited in your progression by all the things you made wrongly.

Thanks to this experience, I have now become a better coder (I must admit that answering and listening on IRC about others' common mistakes helped me a lot).

Failure is cool : it is the road to learning

One does not learn from one's successes but from one's mistakes.

Without me noticing, by working hard and failing, I learnt from my mistakes. I could only learn from my mistakes because I involved myself so much. Had I thrown VectorDict away too soon, I would not have been able to resurrect it in the new package to come: archery. And I learnt I needed to be part of a community and to interact with it.

To be honest, that is the only raison d'être of this blog.

PS : Merci Baptiste :) 

There is a beginning for everything

# silence the Zen of Python that `import this` prints, by pointing stdout at /dev/null
with open('/dev/null', 'w') as __import__('sys').stdout: from this import s
# `s` is the rot13-encoded Zen: decode one of its lines and write it to stderr
__import__('sys').stderr.write(s.split("\n")[10].encode("rot-13") + "\n")
raise Exception("not implemented yet")