Random thoughts on time symmetry and distributed systems.

On geometry, commutativity and relativity

TL;DR: It all boils down to the definition of time and the rupture of symmetries.


A distributed system is a system in which code can be executed on more than one instance independently and will give the same results wherever it is executed.

On the ideal distributed system as a vectorial system.


For a distributed system to work, you need a minimal property of the functions that are passed: the operations need to be commutative (and distributive).

Let A be a set of data, f and g functions that apply to the data, and A[i] the subset of data on instance i.

f(g(A)) == «sum» of f(g(A[i])) over all instances/partitions.

Distributed systems, to avoid SPOFs, reroute operations to any instance that is available. Thus the results should be identical wherever they are computed.

We can either work iteratively on a vector of data, or in parallel on each element, as long as there is no coupling between elements (which can be expressed as: for k, l with k != l and k, l < i, A[k] · A[l] == 0, i.e. the elements are orthogonal/without relationships, so the set of elements forms a basis of size i).

The map-reduce philosophy states that data in n different locations can be treated independently and then reduced.
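
Here is a minimal sketch of that idea (the partitions and the functions f and g are illustrative assumptions, not taken from any actual framework):

from functools import reduce

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]   # A[i]: the data held by instance i

def g(x):            # a per-element transformation
    return x + 1

def f(x):            # another per-element transformation
    return x * 3

# map: apply f(g(.)) independently on each partition (no coupling between elements)
mapped = [[f(g(x)) for x in part] for part in partitions]

# reduce: the «sum» over all instances/partitions from the formula above
result = reduce(lambda acc, part: acc + sum(part), mapped, 0)

# same answer as applying f(g(.)) to the whole data set and summing
assert result == sum(f(g(x)) for part in partitions for x in part)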


There are 2 kinds of functions (given that you work in the basis):
* Transformations (V) => V: these apply a geometric transformation to the space (rotation, translation, homothety, permutation); they are also called Observables.
* Projectors (V) => Vi: these reduce the number of dimensions of the problem.

Data is a ket |ai> of states.
Transformations are Operators applying to the kets, such that O|ai> = |bi>.
If there exists an operator O^-1 such that O × O^-1 = identity, then O is reversible; it is a Transformation or mapping.

O is called the function
|ai> is the input data
|bi> is called the output



If dim(|bi>) < dim(|ai>) we have a projector.
If dim(|bi>) > dim(|ai>) we have a local increase of information in a closed system.


Given well-known functions that are linear, for a composed function to be a transformation of the significant space of the data we need the property that O × P = P × O, i.e. [P, O] = 0 (the commutator of the two operators is zero); then you can do out-of-order execution.
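
As a minimal sketch (the operators below are my own illustrative choices), checking whether two linear operators commute is just checking that their commutator is zero:

import numpy as np

O = np.array([[0, 1],
              [1, 0]])              # a permutation (swap of the two axes)
P = np.array([[2, 0],
              [0, 2]])              # a homothety (uniform scaling)

print(np.allclose(P @ O - O @ P, 0))   # True: [P, O] = 0, the order of application is free

R = np.array([[1, 1],
              [0, 1]])              # a shear: it does not commute with the swap
print(np.allclose(R @ O - O @ R, 0))   # False: order matters, no out-of-order execution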



But sometimes Projectors and Transformations do commute:

from __future__ import division, print_function
from random import randint
from numpy import array as a

MAX_INT = 30
DATA_PER_SERIE = 10
MAX_SERIE = 100

# MAX_SERIE series of DATA_PER_SERIE random integers
data = a([a([randint(0, MAX_INT) for i in range(DATA_PER_SERIE)])
          for s in range(MAX_SERIE)])

# summing the series then dividing by their number ...
print(sum(data) / len(data))
# ... gives the same result as dividing each series then summing:
# the division commutes with the sum
print(sum(data / len(data)))


In an actual CPU, DIV and ADD are NOT commutative.

time(ADD) != time(DIV), at the very least because the size of the circuits is not the same and because min(time) = distance/c, where c is the speed of propagation of the information carrier. Here the information carrier is the pressure of the electron gas in the substrate (electrons have a mass and travel way slower than light, but pressure is a causal force, so c is the speed of light). What is true inside a CPU is also true when considering a distributed system.

Computers introduce losses of symmetry, and that is the root of all synchronization problems.



It happens when we have fewer degrees of freedom in the studied system than in the space of the input.

When we do this, it means that we are storing too much data.

To store just enough data you need a minimal set of operators O, P ... Z, each commuting with the others. It is called a basis.

Given a set of data expressed in the basis, the minimal operations that commute are also called the symmetries of the system.

Applied to a computer problem, this might puzzle a computer scientist.

I am basically saying that the useful information that lets you make sense of your data is not in the data, nor in the functions, but in the knowledge of which pairs of functions commute when applied to the data.

Knowing whether two dimensions i, j in a set of data projected in the basis are independent is equivalent to saying that i and j are generated by two commuting operators.
I am saying that even if I don't know the basis of the problem and/or the coupling, if I find two operators such that for any input [O, P] = 0 (i.e. OP|a> = PO|a>), THEN I have discovered an element of the absolutely pertinent data.

Given actual data |ai> and |aj> where max(i) = n,
then <ai|aj> = 0 if and only if there exists a projector I that projects |ai> and |aj> onto two different transformations.


The iron rule is that the number of degrees of freedom lost by applying I must never result in having fewer dimensions than the basis.

First question: how do I get the first function?
Second: how do I know the size of the basis of the functions that, combined together, describe the system in its exact independent degrees of freedom (the minimum set of valuable data)?
And last: how do I get all the generators once I know one?

Well, that is where human beings are supposed to do their jobs, that is where our added value is. In fact, you don't search for the first operator of the basis, you search for sets of operators that commute.

Determining the minimum set of information needed to describe a problem exactly, with independent pieces of information, is called compression.

So what is the problem with big data? And time?

Quantum mechanical/vectorial/parallel computation is nice, but it has no clock.

In fact I lie.

If, given n operations [O0 ... On] applied to a set of data, there is one called P such that [On, P] != 0, then we can't choose the order of the operations.

The rupture of symmetry in a chain of observables applied to data introduces a thing called time.

As soon as this appears, we must introduce a scheduler to make sure that the chain of commuting observables is fully applied before chaining the next set of operations. This operation is called reduce.

That is the point where a system MUST absolutely have a transactional part in its operations.

Now let's talk about the real world.

Relativity tells us that time varies for every system. On the other hand our data should be immutable, but data that never change are read-only.

And we want data to change, like our bank account figures.

And we also want not to have to go physically to our bank to withdraw money. And banks don't want you to spend more money than you have.

This property is called transactionality. It makes the system accept no symmetry, thus no factorization.

It requires that a chain of operations MUST not be commutative.

At every turn a non-linear check must be performed:
if bank_account < 0: stop the chain.
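
A minimal sketch of why that check kills commutativity (the function and the figures are mine, purely illustrative):

def apply_chain(balance, operations):
    for op in operations:
        balance += op
        if balance < 0:              # the non-linear check at every turn
            raise RuntimeError("stop chain: insufficient funds")
    return balance

print(apply_chain(100, [+50, -120]))    # 30: deposit first, then withdraw
try:
    apply_chain(100, [-120, +50])       # same operations, different order ...
except RuntimeError as error:
    print(error)                        # ... and the chain is stopped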


This breaks the symmetry, and it requires a central point that acts as an absolute frame of reference (with its clock, for timestamping).

Banks are smart: they don't use fully transactional systems, nor distributed systems; they just use logs and some heuristics. There must be a synchronicity/timing attack possible on such a system.

On the other hand, since operations cannot be chronologically commutative on a computer, and even less so on a set of computers, the main challenge of a distributed system is «time stamping» the events.

We have known since Einstein that clocks cannot agree across a distributed system without a synchronization mechanism.
Everybody thinks NTP is sufficient.

But NTP has discrete drifts. These drifts, which are almost unpredictable (sensitivity to initial conditions), introduce a margin of uncertainty on time.

Thus, for every system, the maximum reliable granularity should be computed, so that we can ensure the information physically has the possibility to be known before/after every change.

The bigger the system, the higher the uncertainty (relativity ++).
Given the reduced set of operations that can commute on the data, the clock granularity should also be computed from the duration of the longest operation.


  

An opinionated versioning system based on mapping version strings to numbers in a weird base

While we have a convention in Python for version numbering:
http://legacy.python.org/dev/peps/pep-0440/

We can mostly say that, thanks to "Windows 9", version numbering has been put under an interesting spotlight regarding version comparison.

There are two schools of version handling:
- the naïve, who consider versions as strings;
- the picky, who consider versions as having a very dark grammar that requires an ABNF-compliant parser.

Well, of course, I don't agree with anyone :) Versions are just monotonically growing numbers written in a weird base, but they must support at least the comparison operators: equal, greater/less than, is_in.

Naïve people are wrong, of course

 

It gives the famous reasoning for why Windows might jump straight to Windows 10.
But is it better than:
https://github.com/goozbach/ansible-playbook-bestpractice/blob/915fce52aa82034cfd61cfbfefad9cf40b1e4f48/global_vars.yml

In this ansible playbook they might have a bug when CentOS 50 comes out.

So, this does not seem to hit only the «clueless» proprietary coders :)


Picky people are much too right for my brain


Yes, Python devs are right that we need a grammar, but we don't all do Python.

Given Perl, FreeBSD, Windows ... our software needs versions not only to interact with modules/libraries within its natural ecosystem (for instance pip), but it should also fit nicely into the version conventions of the upper containers (OS, containers, other languages' conventions when you bind to foreign-language libraries ...). Version numbering needs a standard. And semantic versioning proposes a grammar but no parser. So here I am to help the world.

The problem is we cannot remember one grammar per language/OS/ecosystem, especially if they conflict.

PEP 440, with its weird post/pre special cases, does not look very inspired by the Tao of Python (in the admittedly wrongful opinion of someone who did not take the time to read the whole distutils mailing list because he was too busy fighting software bugs at his job, and doing nothing at home).

So, as usual when there are already a lot of standards you don't understand or can't choose from ... I made mine \o/

Back to basics: versions are monotonically growing numbers that don't support + - / *, just comparisons

 

A version is a monotonically growing number.

Basically, if I publish a new version it should always be seen as greater than the previous one. Which is basically a property of numbers.

In fact a version can almost be seen as a 3- (or n-) figure number in a special numbering, such as

version_number = sum(map(project_number_in_finite_base, "X.Y.Z".split(".")))

The problem is that if we reason in fixed-base logic, we get a problem akin to Intel memory addressing: since each of the X, Y, Z numbers can cover an infinite range of values, monotonic growth can be lost (there can be confusion in ordering).
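
A minimal sketch of that loss of ordering (the encoding below is a hypothetical fixed-base packing, not the author's scheme):

def fixed_base(version, base=100):
    # pack X.Y.Z into a single number, allotting two decimal digits per figure
    x, y, z = (int(figure) for figure in version.split("."))
    return (x * base + y) * base + z

print(fixed_base("1.2.3") < fixed_base("1.2.4"))     # True: fine so far
print(fixed_base("1.99.0") < fixed_base("2.0.0"))    # True: still fine
print(fixed_base("1.100.0") < fixed_base("2.0.0"))   # False: a figure overflowed the base,
                                                     # monotonic growth is broken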

So we can abstract version numbers as figures in an infinite base that are directly comparable.

Luckily, I am using a subset of PEP 440 for my numbering, which is the following: http://vectordict.readthedocs.org/en/latest/roadmap.html

By defining
X = API > Y = improvement > Z = bugfix

I state to the user that: given a choice of my software, I guarantee your version numbers will grow monotonically on the X / Y / Z axes, in such a fashion that you can focus on API compatibility, implementation (if the API stays the same but the code changes without a bug, it is a change of implementation), and correctness.

Like some devs, I also informally use "2a", as in 1.1.2a, to mark a short-term bugfix that does not satisfy me (I thus strongly encourage people to switch from 1.1.2x to 1.1.3 as soon as it comes out). I normally keep the «letter thing» in the last number.

If people are fine with API 1, implementation 1, they should easily be able to pin their versions so as to grow to the next release without pain.

So how do we compare numbers in an infinite-dimensional basis in Python?

Well, we have tuples \o/

Thanks to the comparison arithmetic of tuples, they can be considered to be numbers when it comes to "==" and ">", and these are the only 2 basic operations we should need on versions (all other operations can be derived from the latter).
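
A minimal sketch (my own illustration of the tuple trick):

v1 = (1, 1, 2)      # 1.1.2
v2 = (1, 1, 10)     # 1.1.10

# tuple comparison is figure by figure, from left to right, like a number
print(v2 > v1)                  # True
print("1.1.10" > "1.1.2")       # False: the naïve string comparison gets it wrong

def parse(version):
    """Parse a dotted version string into a comparable tuple of ints."""
    return tuple(int(figure) for figure in version.split("."))

print(parse("2.0.0") == parse("2.0.0"))   # equality works
print(parse("1.9") < parse("1.10"))       # True: monotonic growth is preserved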

A version is a monotonically growing number, but in a non-fixed base.

Next_version != last_version + 1

If a version is a number V, comparison of V1 and V2 makes sense; addition or subtraction cannot.

One of the caveats of version numbering, though, is our confusing jargon:
if we decided versions were X.Y.Z, why do we expect version 2 to be equivalent to 2.0.0 instead of 0.0.2? Because when we say Python version 2 we expect people to hear Python version 2.x, and preferably the latest. Same for Linux 2 (covering 2.x.y ...): it is like writing the number «20» as «2» and expecting people to correct it according to context.

So the exercise of version comparison is having a convention for how to compare numbers along the API, implementation and bugfix dimensions, hierarchically speaking, in spite of the indeterminacy introduced by inconsistent human notation.


Just for fun, I made a parser for my own version strings to a numbering convention, including the later twist where 2 means 2.0 or 2.0.0 when compared to 1.0 or 1.0.0. It addresses the examples to solve given in PEP 440.
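
A minimal sketch of that twist (an illustration of the idea, not the author's actual parser):

def parse(version):
    return tuple(int(figure) for figure in version.split("."))

def compare(v1, v2):
    a, b = parse(v1), parse(v2)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))    # "2" becomes (2, 0, 0) against a 3-figure version
    b += (0,) * (width - len(b))
    return (a > b) - (a < b)        # -1, 0 or 1

print(compare("2", "1.0.0"))    # 1: here 2 means 2.0.0
print(compare("2", "2.0"))      # 0: same number, different notation
print(compare("2", "2.0.1"))    # -1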


It can be seen here.


Wrapping up


For me a version is an abstract representation of a number in an infinite base, whose figures are hierarchically separated by dots and read from left to right.
I am saying the figures are themselves tuples from a two-dimensional space of digits and letters, where digits matter more than letters. (Yes, I am putting a number inside a figure, it is sooo fractal.)

But most important of all, I think a versioning string is a representation of a monotonically growing number.

I am pretty sure PEP 440 is way better than my definition; it has been crafted by a consensus of people I deeply respect.

My problem is that I need to achieve the same goal as they do, with less energy than they have to spend on modeling what a version number is.

That is the reason why I crafted my own deterministic version numbering, which I believe to be a subset of PEP 440.

Conclusion

 

My semantics might be wrong, but at least I have a KISS versioning system that works as announced, is easily portable, and for which I have a simple grammar that does quite a few tricks and an intuitive understanding.

And human beings are wrong too (why is version 2 read as 2.0.0 when compared to 2.1.1, as 2.0 when compared to 2.1, and as 2 when compared to 3?), but who cares? I can simply cope with it.

NB: it works with the "YYYY.MM.AA.number" (SOA) scheme too.

PS: I am thinking of adding the y-rcx stuff by slightly enhancing the definition of a figure.

PPS: I don't normally like talking to people, so I disabled comments, but for this one I am making an effort: http://www.reddit.com/r/programming/comments/2iejnz/an_opinionated_versioning_scheme_based_on_mapping/
because I am quite curious about your opinions.

Perfect unusable code: or how to model code and distributivity

So let's speak of what deterministic and non-deterministic code really are.

I am gonna prove that you can achieve a nearly chaotic series of states with deterministic code \o/

Definitions:

Deterministic: code is deterministic if the same input always yields the same output.

Chaotic: a time series of values is considered chaotic if knowing the past t-n samples does not enable you to predict the t+1 term.

Turing Machine: a computer that is not worth more than a cassette player.

Complex system: a set of simple deterministic objects connected together that can result in non-deterministic behavior.

Lambda function: a stateless function, without internal state.

FSM (finite state machine): something necessary in electronics because time is relativistic (Einstein).

Mapping: a mathematical operation / computer thing that describes a projection from a discrete input dimension A to a discrete output dimension B.


Now let's play at being a real-life Turing machine.

Imagine I give you an old K7 (cassette) player with a 30-minute tape, and at every minute mark I recorded the result of n × 3.
If you go to minute 3, the tape will say 9.
If you go to minute 5, you will hear 15.

This is the most stupid computer you can have. 
My tape is a program. The index (in minutes) is the input, and the output is what is said.

So let's do it in Python. Basically, we did a mapping from the index on the tape (in minutes) to the integers, yielding index (in minutes) × 3.
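
The original snippet is missing here, so this is a minimal reconstruction of the idea (assumptions mine):

TAPE_LENGTH_MINUTES = 30

# "record" the tape: at every minute n, the value n * 3 is stored
tape = [n * 3 for n in range(TAPE_LENGTH_MINUTES)]

def play(minute):
    """The whole computer: seek to a position on the tape and read it."""
    return tape[minute]

print(play(3))    # 9
print(play(5))    # 15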



So what do we learn with this?

That I can turn code into a Turing machine tape that I can use as the code, with a 1:1 relationship: I have a ... mapping \o/

What does compiling do?
It evaluates, for every possible input (an integer belonging to [0:255]), every possible output of a boolean function. It is a projection of 2^8 inputs => 2 outputs.
I projected a discrete space of inputs onto a discrete space of outputs.
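
A minimal sketch of such a «compilation» (my own reconstruction of what the missing snippet presumably did: pack the truth table of a boolean function into the bits of a single integer):

def compile_bool(f, n_inputs=256):
    code = 0
    for x in range(n_inputs):
        if f(x):
            code |= 1 << x          # bit x of the integer stores f(x)
    return code

def run(code, x):
    """Evaluate the «compiled» function: just read bit x of the integer."""
    return bool((code >> x) & 1)

div2 = compile_bool(lambda x: x % 2 == 0)
div3 = compile_bool(lambda x: x % 3 == 0)

print(run(div2, 8), run(div2, 9))     # True False
print(run(div3, 9), run(div3, 10))    # True False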

Let's see why it is great

My code is fully deterministic and is threadsafe because my code is stateless.

It is an index of all the 256 solutions of f(x), one for every possible value.

If I encode a function that tells whether a number can be divided by X, and another one for Y, then to get the function that tells whether a number can be divided by (X × Y) I just have to apply & (the bitwise and operator) to the ints representing the code (at least as long as X and Y are coprime).

An int is a very cool storage format for a function.
With div2 / div3, by applying all the «common bitwise operators», I can create a lot of interesting functions:

div2xor3: a code that indicates numbers that can be divided by 2 or 3 but not by 6
not div2: every odd number
div2or3: multiples of 2 or 3 (including 6)
div2and3: multiples of 6 only
....

I can combine the 16 two-input bitwise operations to directly obtain new functions.

In functional programming you build partial functions that you apply in an execution pipeline; here you can directly combine the code at the «implementation level».
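
Continuing the illustrative compile_bool/run sketch from above (names and codes are mine):

div6     = div2 & div3                    # divisible by 6 only
div2or3  = div2 | div3                    # multiples of 2 or 3 (and thus 6)
div2xor3 = div2 ^ div3                    # divisible by 2 or 3 but not by 6
not_div2 = ~div2 & ((1 << 256) - 1)       # odd numbers (the mask keeps 256 bits)

print(run(div6, 12), run(div6, 4))          # True False
print(run(div2xor3, 9), run(div2xor3, 6))   # True False
print(run(not_div2, 3), run(not_div2, 4))   # True False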


My evaluation always takes the same number of cycles, I don't have to worry about the worst case, and my code will never suffer from indeterminism (neither in execution time nor in results). My code is ultimately threadsafe as long as my code storage and my inputs are immutable.


My functions are commutative, thus I can distribute them.

div2(div3(val)) == div3(div2(val)) (== div6(val))

=> combining the functions is a simple AND of the codes

Why we don't use that in real life

First there is a big problem of size.

To store all the results for all the possible inputs, I have to allocate the cross product of the size of the input space and the size of the output.

A simple multiply-by-3 table for all the 32-bit integers would take 2^32 words of 32 bits each: an array of 4 billion 32-bit words, 16 GB!

Not very efficient.

But if we work on a torus of discrete values, it can work :)

Imagine my FPU is slow and I need cos(x) with an error margin for which working in 1/256ths of a turn is sufficient. I can store my results as an array of precomputed cosine values, indexed by a fraction of 256, modulo 256 :)
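
A minimal sketch (my interpretation: the angle is expressed in 256ths of a full turn, so the index wraps around like a torus):

import math

# precompute cos for the 256 angles covering one full turn
COS_TABLE = [math.cos(2 * math.pi * i / 256) for i in range(256)]

def fast_cos(angle_256ths):
    """cos() of an angle given in 256ths of a turn; the index wraps (torus)."""
    return COS_TABLE[angle_256ths % 256]

print(fast_cos(0))      # 1.0
print(fast_cos(64))     # ~0.0, a quarter turn
print(fast_cos(320))    # same as fast_cos(64): 320 wraps to 64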

A cache with memoization also uses the same principle.
You replace long-running computing code with a lookup in a table.

It might be a little more evolved than reading a bit in an integer, but it is globally the same principle.

So actually, that is one of the biggest uses of the Turing machine: efficient caching of precomputed values.

Another drawback is that the mapping makes you lose information about what the developer meant.

If you just have the integer representing your code, more than one function can yield the same code. The mapping from the space of possible functions to the space of solutions is a surjection.

Thus if you have a bug in this code, you cannot revert back to the algorithm and fix it.

If I consider that I have not one number of n bits as input, but n inputs of 1 bit constituting my state input vector, and that the output is my internal state, then I am modeling a node of a parallel computer. This «code» can be wired (at a cost of a few clock cycles) as a multiplexer that is deterministic in its execution time and dazzlingly fast.


What is the use of this anyway?

Well, it models deterministic code.

I can generate random code and see how the pieces interact.

Conway's Game of Life is a setup of Turing machines interconnected with each other in a massively parallel fashion.

So my next step is to prove I can generate purely random numbers with totally deterministic code.

And I can tell you I can prove that the condition for my modified Game of Life to yield chaotic-like results is that the level of similarity between the codes of the automata is low (the entropy of the patterns is high) AND that 50% of the bits of the code are 0/1 (maximizing the entropy of the code in terms of bit ratio).