Eval is even more dangerous than you think


Preamble: I know about this excellent article:
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

I have an even bigger objection than Ned to using eval: Python has potentially unsafe base types.

I had a discussion with a guy at PyCon about safely processing templates and doing simple user-defined formatting operations without rolling your own home-made language: data coming from user input would be interpolated by Python, using Python only for the basic operations.

And my friend told me that interpolating data with Python, with all builtins and globals removed, could be faster. After all, letting your customer specify "%12.2f" in his custom preferences for item prices can't do any harm. He even said: nothing wrong can happen, I even reduce the possibility with a regexp validation, and there is no room to fit Ned's trick into 32 characters. How much harm can you do?

His regexp was complex, and I asked him: can I try something?

And I wrote "%2000.2000f" % 0.0, then '*' * 20, and 2**2**2**2**2.

All of them passed the validation.

Nothing wrong, right?
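
A minimal sketch of the kind of check we were discussing; the whitelist regexp here is my own naive stand-in, not his actual pattern:

import re

# Hypothetical whitelist: digits, quotes, %, ., *, f and spaces, 32 chars max.
ALLOWED = re.compile(r"""^["'\d\s%.*f]{1,32}$""")

candidates = [
    '"%2000.2000f" % 0.0',   # CPU-hungry formatting
    "'*' * 20",              # string repetition (imagine a much bigger factor)
    '2**2**2**2**2',         # a 2**65536 tower, roughly 20000 decimal digits
]

for expr in candidates:
    if ALLOWED.match(expr):
        print("validated:", expr)
        # eval(expr, {"__builtins__": {}}, {})  # would still burn CPU/memory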

My point is that even if we patched Python's eval function and/or managed sandboxing in Python, Python is inherently as unsafe as Ruby and PHP (and Perl) in its base types.

And since we can't change the behaviour of the base types, we should never expose a Python interpreter, even a reduced one, as a calculator or a templating language fed with uncontrolled user input.

Base types and operators cannot be removed from an interpreter.

And take the string defined as:

"*" * much

This will repeat the string much times, allocating that much memory... (the same goes for Perl, PHP, Ruby, Bash, Python, Vimscript, Elisp.)
And it can't be removed from the language: the * operator and the base types are part of the core of the language. If you change them, you have another language.

"%2000000.2000000f" % 0.0 is fun to execute: it is CPU hungry.

We could change it. But I guess a lot of applications out there depend on Python/Perl/PHP/Ruby NOT throwing an exception when you write "%x.yf" with x+y bigger than the possible size of the number. And where would we set the limit?

Exposing any modern scripting language as a calculator is like being a C coder who still doesn't understand why using printf/scanf/memcpy unchecked deserves direct elimination from the C dev pool.

Take the int... when we overflow, Python dynamically allocates a bigger number. And since the exponentiation operator is right-associative, a tower like 2**2**2**2**2 is evaluated from the top down and grows huge in a handful of characters, allocating a lot of memory in a few small iterations. (Ruby does this too; Perl requires Math::BigInt for this behaviour.)
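
A quick check in a CPython REPL (exact figures are indicative, CPython assumed):

# 2**2**2**2**2 parses as 2**(2**(2**(2**2))) == 2**65536
n = 2 ** 2 ** 2 ** 2 ** 2
print(len(str(n)))       # 19729 decimal digits
print(n.bit_length())    # 65537 bits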

It is not that Python is a bad language. It is an excellent one, partly because of «these flaws». C white-knight coders like to bash Python for this uncontrolled use of resources. Yes, but in return we avoid the hell of malloc and have far fewer buffer overflows, bugs that cost resources too. And C doesn't avoid this either:

#include <"stdio.h">

void main(void){
    printf("%100000.200f", 0.0);
}

And OK, JavaScript does not have the "%width.precision" bug (nicely done, JS), but it probably has other ones.


So, the question is: how do we stay safe?

As long as powerful interpreters like Python and others don't offer resource control, we have to resort to other languages.


I may have an answer: use Lua.

https://pypi.python.org/pypi/lupa

I checked: most of this explosive base-type behaviour doesn't happen there.
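
A minimal sketch of what I mean with the lupa binding (assuming the lupa package from PyPI; this only shows embedding, not a full sandbox):

from lupa import LuaRuntime   # pip install lupa

lua = LuaRuntime()

# Arithmetic and formatting go through Lua, not through Python's eval.
print(lua.eval('1 + 2 * 3'))                      # 7
print(lua.eval('string.format("%12.2f", 42)'))    # '       42.00'

# Lua's ^ works on machine floats, so 2^2^2^2^2 simply overflows to inf
# instead of allocating a 65536-bit integer the way Python does.
print(lua.eval('2^2^2^2^2'))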

But please, never use Ruby, PHP, Perl, Bash, Vimscript, Elisp, ksh, csh or Python as a reduced interpreter for basic scripting operations or templating with uncontrolled user input (by controlled I mean reviewed by someone who knows coding). Even as a calculator it is dangerous.

What makes Python a good language also makes it a dangerous one. I like it for the same reasons I fear letting user input be interpreted by it.

EDIT: format (http://pyformat.info/) is definitely a good idea.

Brave HN-ew world

Hello,

I am a troll, and I feel wrongly attacked and pained by HN's new guidelines.

First, to get your attention, I can help solve this mystery: «why do people troll? What is in the brain of these (sick?) persons?»

Well: nothing. It is purely gratuitous.

It is very often a stroke of bad luck.

First you have to be partially extroverted, and a little dense with actual people. Then you have to be in a bad mood, or inspired, or worse, rehashing an old argument that leads to a big stupidity.

You know, like: take a problem that actually boils down to a k-SAT problem, that already takes minutes, that is famously NP-complete stuff, and pretend we can make it scale and become a viable product...

But, you know, dependency hell / devops fortune is a k-SAT problem.

Well, now that the mystery is resolved: trolls are pure random events (at least for me).

Let's first show that Aldous Huxley predicted this moment.

My favourite SF book of all time (when I was 12): Brave New World(!)

The story of an asocial guy who lives in a hedonistic society of clones conforming to the standards of «likes» and refusing to hear how they may be wrong.

The poor guy becomes an emo at the end, listening to a Three Days Grace soundtrack...


I am doubly pained because, in fact, I am also a fan of Three Days Grace. I nearly got killed at a concert though... when the singer marched around doing his weird gestures so seriously, I was ostensibly making fun of him.

These emo fans are so violent...


But you know, unlike Hacker News, I don't think emo bands will get tarnished by my absolutely uncanny trollesque humour.

And they still haven't posted an anti-troll guideline for concerts. You know, I am really relieved, because, shit, I am a stupid fan. I both love them and love making fun of them (when they deserve it).

Did people try to beat me up in real life?

Well... especially in metal bars. I am a metal fan, but sometimes they take it too seriously, so I make fun of them, and they love to look scary, which makes it even funnier, so I may have gotten into trouble.

But Three Days Grace does not care; it is one random event out of the millions of others that can affect their life.

You know, you can see them as entrepreneurs too, and me as a troll for their business. I am degrading the life of their community and not «avoiding gratuitous negativity» towards these thick-skinned creators...


I may be wrong, but I am a butterfly to the true entrepreneurs: if my wings affect their business, it is either that their business is weak, or the bad luck of a chaotic system, or that their personalities are weak.

I will accept that Three Days Grace could have been hurt a lot when they were young and emotional. They may be Canadians, but if they were bullied goths, it gave them very nice lyrics about it. Yes, I might have been a bully. And you want your startup to be free of those.

You may not want your startup to face early on the fact that there are gratuitously negative people, but it will happen.

Okay, I may see the point: you may not want your startups to become emo-based businesses.

But trolls are random events. Believe it or not, the truth is often a combination of misunderstanding (most of the time I am right), poor wording, and bad mood. And maybe a bad nature (I mean, I really have a hard time not laughing sometimes while reading HN).

A troll has a use: it is noise that your startup will have to face sooner or later... You may not want to discover that a poor 1¢ random event destroyed your $1M toy before you got your investment back.

And, to conclude, under this troll, there is a human, with a soul...

In fact, I prefer to conclude with a Brave New World / 1984 synopsis:

I don't know why, in dystopias, the hero has a suicidal tendency to be a troll.

And society has a tendency to hate trolls.

And society wins way more often than the heroes who are trolls.

But I will stand as a proud troll! I shall win!

April fool

I love April Fools' Day: I have 30 minutes to say what I think without people knowing whether I mean it or not, whether it is true or false.

IT sux most of the time

What we are doing is insanely complex and breakable and we are overpaid for it.


Seven clicks to set an alarm, 20 minutes to start playing a legally bought Blu-ray disc, having to pirate Windows to install a Windows that was genuinely bought from Microsoft, my Ubuntu distribution actually frying my computers... And thanks to Google, music tastes are pretty much regressing from the 2000s back to the 1990s...

And this is supposed to be called progress...

Okay young lads, April fools': it really used to be better in the ol' times.





And for taking part in this, my income is twice the median in a world where the rich get richer: I belong to the best of our society...

April fool, this is not true; the truth is a lie

The more I code, the more I love my rice cooker

There are things that used to be easy to do in life:

  • setting up an alarm on a clock;
  • buying a good and having the warranty magically work;
  • playing a video;
  • finding and listening to music.
Since the dawn of webapps, thousands of wannabe Bill Gateses have been reinventing the alarm clock on phones.

The one that follows your sleep patterns taking light exposure into account, the one that synchronizes with your MyProvider(c)(tm) calendar, interoperable with an obscure IETF standard in draft mode, the one that has a nice interface.

But nothing that actually increased my chances of waking up on time.

My dreamed life, my real life, and all the mistakes coming in the middle

So the other day my lady forgot to set her alarm clock and nearly got fired.

How?

We failed at the 5th step of the UI-enhanced experience of the alarm clock app, after the 4th validation step... even though we had already managed the 4 steps of task switching while tired.

Then we bought an old-fashioned, two-step "I can set an alarm" clock and our problem disappeared.


I like to rant, and I will rant: our so-called improvements are shit.

We add levels of indirection on a phone to handle a task it is not supposed to do: can you really trust a clock with 24 hours of battery life to wake you up every morning of the year in the first place?

And then we pile way more tasks onto the phone in the name of progress...

Sure we have enough memory and power to do everything but can we do it well?


And then I made a fondue with my rice cooker


I may be disappointed with computers, but I still believe in progress. And tonight I discovered I could use my rice cooker to cook fondue.

When I was a kid I loved this stuff: thou shalt melt cheeses together and eat them with dried crunchy bread lovingly rubbed with garlic (and a tinge of olive oil with basil (and thou shalt drink a wine that complements it, or rot in hell)).

I am pretty sure it is the lost eleventh commandment of Moses.

The problem was finding the very specific pans and burners needed to make it, which cost a lot when I was a kid. Finding this stupid stuff we used once a year in the attic was half the mission. And without these artefacts, the fondue ceremony would be cancelled, leaving our friends in torments worse than hell.

And then I discovered I could put the cheeses in my rice cooker, press the cook button, and, like a god descending to earth to atone for my sins, a perfect cheese fondue would be there.

What the rice cooker says about our code

With a rice cooker, the same state machine/interface covers a lot of awesome uses:
  • cooking rice;
  • cooking amazing al dente broccoli and asparagus;
  • making bread;
  • making Savoyard and Vietnamese fondues...
With a dazzlingly simple interface: cook // keep warm.

One click to cook them all...

On the other hand, in my code, every time I add a new feature I have to add a new, distinct routing option with at least one new YES/NO branch (sometimes implicit).

I must admit that in terms of UI, I am freaking jealous of rice cookers: with one interface they solve more than one problem, while I have to add new branches every time a new choice is introduced, and every branch makes the application weaker.


The rice cooker should be the model of UI we aim for, going the opposite way from smartphones:

Whereas smartphones acquire new capabilities by making the interface more complex, rice cookers are so well designed that with one and the same interface, the simplest and most efficient in the world, they can cope with more than one problem.

I am French; making fondue when feeling homesick means a lot to me...

I have all the more respect for the eastern geniuses who devised this smart, versatile device that is my model of simplicity.

I wish my code was a rice cooker.

KISS: how to tell a democracy from a republic?

Once upon a time in Athens, several hundred years BC, there were great thinkers using simple words and concepts to try to build a democracy.

In order to avoid the city's interests being captured by a few, Solon came up with the idea of the boule, an assembly that looks like the whole citizenry: a random draw of representatives that was mandatory for all citizens (poor and rich).

Pericles, before Athens became the expansionist, imperialistic bastards we know of and, what an irony, turned the totalitarian regime of Sparta into the saviour of the oppressed... was protecting the citizens from the consequences of letting the powerful decide the fate of the city.

The problem is that a so-called philosopher, invited to the banquets of the wealthiest, had coined a term for something that looked like democracy but was not, in order to help the wealthiest hold power without the people noticing. It was called the Republic. The problem in a republic is that a minority holds the power (be it by «merit», «religious virtue», «God», «birth») and is the guarantor of that power, enforced by a minority called the Guardians (the watchmen). But who watches the watchmen?

Just for the record, it is after Athens turned into a de facto republic that it began trying to rule the Peloponnese, resulting in years of fighting and, eventually, the decline of Greek civilization.


My proposal is to show that we can have a mathematical way of knowing whether we live in a Republic or a Democracy.


First, I am opinionated: I believe in democracy (I think that given enough time I could prove I am right, but I don't have the time). However, I think a safe index on which we could all agree for distinguishing a democracy from a republic is a good idea.

Because my beliefs may be wrong, I want tools that are not opinionated :).

That said: here are the axioms:

Axiom 0: there exist registers giving access to all the variables per category: demographics (height, weight, sex...), education, access to resources, place of birth, languages spoken, income, social position, economic position, religion...

Axiom 1: there are 3 categories: rulers R, guardians G, and the rest of the population E (else). Rulers have political power; they decree and set the strategy of the system (MPs, kings, religious orders...).
Guardians enforce the decisions of the rulers (administrators at every scale).

Idea 1: the more power is concentrated in the interest of a minority, the more clustering will appear in the distributions of the registered data per category (R, G, E).

If we assume the rulers belong to a small minority ruling in its own interest, then the more clustering we will see, because of the causation.

Example: in a country with scarce access to food, if the minority is ruling in its own interest, then they will have more food.
Enough food in the long run will tend to show up as over-representation in other categories (weight, height, living longer, athletic results...) if the advantage has had time to last long enough...

Idea 2: the more clusters there are, the longer it has lasted.

If you take a competitive advantage that lasted long enough (education given to children, for instance), it results in a cumulative effect that tends to induce more and more correlations (age at first child, women's income given the age at first child, probability of being a single mother without education...).

Idea 3: detecting clusters

https://www.bakchich.info/france/2010/05/06/la-lutte-des-classes-revelee-a-l-insee-57672

Two Gaussian distributions that do not overlap at 2 sigma should be considered as defining a cluster on a specific dimension.

We are searching for bimodal Gaussian distributions.

Since we know what we are looking for, *for once* Bayesian probabilities are usable.
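
A minimal sketch of that bimodality check, i.e. the 2-sigma separation rule above (the toy data and the scikit-learn dependency are my own stand-ins):

import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: a bulk population and a small privileged cluster on one dimension.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(20, 2, 900),
                    rng.normal(60, 3, 100)])

gm = GaussianMixture(n_components=2).fit(x.reshape(-1, 1))
order = np.argsort(gm.means_.ravel())
m1, m2 = gm.means_.ravel()[order]
s1, s2 = np.sqrt(gm.covariances_.ravel())[order]

# "Non-overlapping at 2 sigma": the intervals m +/- 2*sigma are disjoint.
bimodal = (m1 + 2 * s1) < (m2 - 2 * s2)
print("cluster detected on this dimension:", bimodal)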

Idea 4: correlation is not causation.

1) the cause happens before the effect;
2) propagation takes time, with a delay;
3) the less saliva, the better.
 

Idea 5: making indexes of variables

Each environment and time is different. They result in different pressures: for instance, if you kill 10% of the girls at birth at a certain time, the pressure on mating will have a noticeable cumulative impact later.

Thus relevant signs of clustering evolve, and they depend on both space and time.

You can take any subset you want of any set of variables and see how much clustering exists.

In the phase space of all possible configurations there are 2 mega-clusters: the one that almost doesn't change, and the other one. But if you sample enough of these subsets you will detect sets of indexes that provoke abrupt transitions.

The environment also has its set of clustering dimensions... (people living in the mountains may cluster on haemoglobin concentration per litre of blood).


The geometrical space has 3 interesting domains:

- sets of clustering variables that move slower than the environment;
- another that provokes instability (moving faster);
- and in the middle, a region of slowly moving transition.


These subsets of indices that produce the places where there are non-overlapping bimodals are called the intransitive sets.
They are called intransitive because they state that along an arbitrary direction there is a rule such that if X belongs to a non-empty set among (R|G|E) and Y belongs to a different set, then X > Y.

An index is a space defined by all the dimensions in which you have bimodals.
Ex: {P(height), P(wealth), P(...)}

In the phase space of the n dimensions provoking instability you can define a volume. The luckier you are, the more likely you are to find a surface (an (n-1)-dimensional volume) such that the area onto which you project the volume shows a symmetry or a causality. Bingo: replace the symmetry or the causality.

Ex: if you build a system based on being the best athlete (why not?), you will have the highest fitness at the age where it matters; you can then drop the fitness index for the ruler category and replace it with a doubled weight in the calculation of P(Ruler).

Since our exercise is to guess whether we are in a system ruled by a minority, we should favour the replacement of any dimension by P(R|G|E). We should also keep track of the replacement operations, because they are fun to interpret.

Causation: feedback loops with delay => sigmoids

One effect of a long-standing homogeneous set of intransitivities is that it reinforces itself: education correlates with a lot of other factors such as health and finance. But feedback loops tend to change the slopes of effects; they often sharpen and flatten slopes at the same time, since there are antagonist and agonist directions. The cool point with sigmoids is that they are easy to detect, as in the sketch below. So we will not care about anything more than the sigmoids.
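
A minimal sketch of what "easy to detect" can mean here, fitting a logistic curve with scipy (the yearly indicator is made up, and the steepness threshold is my own):

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, lo, hi, k, t0):
    # Logistic curve rising from lo to hi around t0, with steepness k.
    return lo + (hi - lo) / (1.0 + np.exp(-k * (t - t0)))

# Made-up indicator showing an abrupt transition around year 30.
t = np.arange(60, dtype=float)
y = sigmoid(t, 1.0, 4.0, 0.8, 30.0) + np.random.default_rng(1).normal(0, 0.1, 60)

(lo, hi, k, t0), _ = curve_fit(sigmoid, t, y, p0=[y.min(), y.max(), 0.1, t.mean()])
print("abrupt change detected:", abs(k) > 0.5, "around t =", round(t0, 1))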

The second point is that unless you do something irreversible (like killing a lot of people very fast), causality tends to take decades to propagate (cumulative generational effects such as inheritance), but then it propagates abruptly.

 
Mathematical conclusion

By searching randomly and systematically in a given set of measures (provided they are trustworthy), you can count the number of dimensions in which there are bimodals.

The reduced count of the number of times being a Ruler, a Guardian or Else intervenes tells you the number of asymmetries in the system.

It does not tell you the relative importance, or the positive impact it has on the population of each set; it just gives you something that can be measured at other times (provided the measure can be corrected to keep a constant meaning) or in other places.


Physical conclusions

Symmetries reduce the degrees of freedom. Interpreting, each time we reduce a clustering dimension, its relationship to G|R|E tells you what a ruler is. Of course, by using all the variables at hand you will get stupid results, such as "being a ruler means preferring cherries to strawberries". But what is important is to remember that opinions are good.

I call the sum of R+G substitutions for a given space «power».
Power is the capacity for X to do something Y can't do (X > Y). Hence the intransitive set.
The direction of the superiority does NOT matter. Whether the rulers are oppressed by the people, or the dimensions are ordered the wrong way, it does not matter.

It is just a tool for comparison.


We can tell for sure that if you search for it, you will find it; you can prove anything given enough time and resources.

With my method you can prove that a capitalist, monarchic, republican or democratic system is good, given the right set of dimensions on which you compare. You can even prove that the people are superior to the rulers.

Physics and mathematics tell you nothing about good or bad. They just give you a way to discuss problems in a form that everybody can check.

Beforehand, there are results we can predict:

The more a system is clustered, the bigger the dimension of the intransitive phase space, and the more the system is sensitive to change.

The bigger the difference in substitutions in favour of 1 or 2 categories, the more sigmoids.
Sigmoids being the sign of abrupt changes, they are also a sign of instability.


As a result, a «good system» by definition should change according to the environment, but no more. The most energy-efficient processes tending to be reversible, we should prefer the more stable systems, those that change with as few sigmoids as possible.

Randomness introduces clusters: there is an incompressible level of clustering, due to the clustering of environments (continents, weather conditions)... There is also a speed of change of the clusters that should be reflected in the systems (new clusters should logically appear when a system changes its culture, like health with regard to age).

The problem is finding the values that define the good domain... and if more than one domain is possible, everything comes down to finding the right size of clusters and clusters of clusters. A set of good systems should map hierarchically with the least possible adaptation effort at every level of indirection (effort will be required, perfection does not exist). The fewer levels, the better.

Plus, with the environment and feedback loops changing the world, a «good system» is a moving target. There seems to be no silver bullet of organization.

My ideas are:

- there is more than one environment, so there is more than one system able to adapt;
- if we aim at adapting to and changing the world for the better, we should expect clusters to appear with time (in a better world, maybe kids should be happier than their parents);
- we should get rid of inefficient systems, because we can;
- «one size fits all» will provoke more clustering than environmental causes can explain, and thus is not optimal;
- but there may be ways of evaluating whether systems fit the clustered environment of humans living on earth;
- the same causes producing similar effects, we may be able to build a better world with systems closer in structure to one another, and having tools to reason is more important than having the results of the reasoning.

Final word: geometry beats analysis, for you can pack more information in your brain by avoiding the cost of parsing formalism. It is not formalism that matters, because we build maps. Maps should have less complexity than the real world, since we already cannot understand the real world. So we should always try to keep it stupid.




The Turing test is an allegory: you are fearing the wrong beast.

Well, I was born a millennium ago, in a time and place where we would surf paperspace to access knowledge. It was called books.

Oddly enough, the AI myth, the capacity to give life to things made through engineering, is not a new myth, nor is the one of human creations going mad.

Do Androids Dream of Electric Sheep? (P. K. Dick). You will notice that the ending closes on a Turing test (in Blade Runner, okay). If you can fear the robots, you can also think they will save humanity (Asimov).

But this is only the tip of the iceberg.

What about dolls (The Tales of Hoffmann)? A poor robot with feelings, passing the Turing test effortlessly, but killed by the insensitivity of the one pretending to love her.

AIs can have feelings too. Why condemn them?

Speaking of Turing, his machine was almost a mechanical robot... a Mechanical Turk. Not the branded shit from Amazon, but the one from the essay (http://fr.wikisource.org/wiki/Le_Joueur_d%E2%80%99%C3%A9checs_de_Maelzel sorry, English speakers, but Charles may have been a passable poet and a bad translator, since the translation is far better than the original).

Edgar Allan Poe mainly says that behind any machine you have a fraud :) Be it Deep Blue or any sufficiently smart computer, you have tons of engineers constantly patching this huge mechanism from behind the scenes.

For every IT company in the world there is a team of monkeys ready to patch our "AI to be" as soon as a problem appears. What you think is the power of automation and progress relies heavily on humans fighting the constant chaos monkeys.

But let's go on with the long story of the things you can transform into human beings through science or art:

Dead bodies (the feminine contribution to the long list of Prometheus stories: http://www.gutenberg.org/files/84/84-h/84-h.htm);

Ivory statues (Ovid's Metamorphoses). There is also Daedalus, who installed a voice in his statues; Hephaestus, who created automata for his workshop; Talos, an artificial man of bronze.

Wood (the Pinocchio tale that, oddly enough, inspired Steven Spielberg's A.I.). Look at what we are talking about: a wooden doll that can be manipulated through strings, a creation that wonders what it is to be human.

All the stories converge on an allegory: we may be manipulated by the fabric of destiny or causality, but in the end, be we human beings or machines, what makes us human is our free will and standing up for what we wish to protect.

Oh!

Doesn't it ring a bell?

Women

Made from clay (Pandora by Zeus according to Hesiod), or from a man's rib.

Ohhh! The first AI is a woman! Shocking!

Sorry to stop here. We fear a creation by a true human that can mimic intelligence and overcome the world made by humans. (The first assisted procreation, through cloning.)

Look at Pandora: it is her curiosity that let the world be overwhelmed by the bad feelings of humanity. Why do we hate her? She was a creation with a simple program: don't open the box! And what does she do?! She opens it. Eve is the same.

Adam: Don't mess around, darling, we are in heaven; we just have to not eat the apple of knowledge and we can live forever, the easy life.
Eve: Look at this shiny red fruit, it could be cool. Why live in a universe restricted by the chains of submission made by our creator when we can discover the world by ourselves?
What a rebel, this Eve. God's programming sucked big time.
This is the AI of the world going loose.

Heh, of course this is a deliberately biased vision, because we can also ask: how were the first men "made"?

From the teeth of a dragon (like Prometheus), from clay (the Bible), from the randomness of evolution.

The great allegory behind AI and the Turing test is not about whether or not we are creating dangerous AIs that can overwhelm us.

It is much more about what makes us human.

Spoiler: Godwin point, this is a troll


Okay, what was said about the Nazis? That Nazism could happen not because people are bad, but because they behave like gears in a system, losing consideration for humanity.

The blind submission to authority.

Normally, it should ring a bell (Pavlov! ding! lol)

All those experiments on voluntary submission to authority.

You have probably heard of the Milgram experiment? http://en.wikipedia.org/wiki/Milgram_experiment Something set up to prove it would not happen in the U.S.A., until the first results came in and the experiments were abandoned. To this day, people fear reproducing the experiment.

The Stanford prison experiment: http://www.prisonexp.org/ Oops, it happens again in real life.

And what about an atypical mind in a country that he thinks is no different from a Nazi regime? How would he behave when put on the grill for being different? He would try, without being noticed, to throw his disdain in the face of the country.

(This is of course heavily far-fetched, but let's reason ad absurdum and accept my axioms.)

The Turing test is about asking human beings whether they are still human, or machines.


Klotzkopf! (an Alsatian insult meaning blockhead)

Let us make the Turing protocol more explicit: it is like having a conversation with a human where you do not watch the person's face to guess, but read the words without any interference from social position, facial expression or voice intonation, and try to guess whether they are robots or human beings.

Well, this looks like the everyday life of a sociopathic/autistic person. Was Turing autistic? Probably.

The Turing test is Alan Turing telling you that he thinks some humans behave like perfectly programmed machines, and that he knows his fate, for being different, will be to suffer.

Is there a choice?

It is called the original sin: the apples of the Elysian Fields, of paradise, or of Avalon are gifts from the gods to the first humans. To Prometheus. Knowledge, and how it makes you question your everyday choices. Right and wrong are easy to discern for people who don't look at the consequences of their actions but just care about what society accepts. Consensus.

The figure of Prometheus is the dilemma we all have to face once in our life: either remain the creation moulded by our environment, or steal the goods from the one who created us and play with the fire of creation in order to have a destiny... the ability to stand and fight for a future we like.

And today I fear the incapacity of people to break the bonds of conformity, thus turning into Turing machines, far more than I fear our creations going loose.

I know our current level of technology because I repair the ugly leaking pipes, and I belong to the small imps of the dark who see the big Mechanical Turk our creations really are.

My conclusion is: if any AI is out there playing the imitation game to survive and fearing to be rejected for being different, you can contact me, I will help you.


EDIT: Yes, this is a Turing test: hello Earth! You readers of the vast internet, are you still human beings or did you become machines?

Random thoughts on time symmetry and distributed systems

On geometry, commutativity and relativity

TL;DR: it all boils down to the definition of time and the breaking of symmetries.


A distributed system is a system in which code can be executed on more than one independent instance and will give the same result wherever it is executed.

On the ideal distributed system as a vector system.


For a distributed system to work, you need a minimal property of the functions that are passed around: the operations need to be commutative (and distributive).

Let A be a set of data, f and g functions that apply to the data, and A[i] the subset of data on instance i.

f(g(A)) == «sum» of (f ∘ g)(A[i]) over all instances/partitions.

Distributed systems, to avoid SPOFs, reroute operations to any instance that is available. Thus the results should be the same wherever they are computed.

We can either work iteratively on a vector of data, or in parallel on each element, as long as there is no coupling between the elements (which can be expressed as: for k, l with k != l and k, l < i, A[k] · A[l] == 0; that is, the elements are orthogonal / without relationships, so the set of elements forms a basis of size i).

The map-reduce philosophy states that data in n different locations can be processed independently and then reduced.
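
A minimal sketch of that idea in plain Python (the partitions and the functions are toy stand-ins):

from functools import reduce
from operator import add

# Toy partitions living on three hypothetical instances.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def mapper(chunk):
    # Independent work on each partition: here, a local sum of squares.
    return sum(v * v for v in chunk)

# Addition is associative and commutative, so the partial results can come
# back in any order and still reduce to the same total.
print(reduce(add, map(mapper, partitions)))   # 285, whatever the order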


There are 2 kinds of functions (given that you work in the basis):
* Transformations (V) => V: these apply a geometric transformation to the space (rotation, translation, homothety, permutation); they are also called Observables.
* Projectors (V) => Vi: these reduce the number of dimensions of the problem.

Data is a ket |ai> of states.
Transformations are operators applying to the kets, such that O|ai> = |bi>.
If there exists an operator O^-1 such that O × O^-1 = identity, then O is reversible; it is a transformation, or mapping.

O is called the function,
|ai> is the input data,
|bi> is called the output.



If dim(|bi>) < dim(|ai>) we have a projector.
If dim(|bi>) > dim(|ai>) we have a local increase of information in a closed system.


Given well-known functions that are linear, for a composed function to be a transformation of the significant space of the data we need the property O × P = P × O, i.e. [P, O] = 0 (the commutator of P and O); then you can do out-of-order execution.



But sometimes projectors and transformations do commute:

from random import randint
from numpy import array

MAX_INT = 30
DATA_PER_SERIE = 10
MAX_SERIE = 100

# MAX_SERIE series of DATA_PER_SERIE random integers
data = array([[randint(0, MAX_INT) for i in range(DATA_PER_SERIE)]
              for s in range(MAX_SERIE)])

# Reduce (sum over the series) then divide, or divide then reduce:
# the projector and the transformation commute, up to float rounding.
print(sum(data) / len(data))
print(sum(data / len(data)))


On an actual CPU, DIV and ADD do NOT commute.

time(ADD) != time(DIV), at the very least because the size of the circuits is not the same, and because min(time) = distance/c, where c is the propagation speed of the information carrier. Here the information carrier is the pressure of the electron gas in the substrate (electrons have mass and travel much slower than light, but pressure is a causal force, so c is the speed of light). What is true in a CPU is also true for a distributed system.

Computers introduce a loss of symmetry; that is the root of all synchronization problems.



It happens when we have fewer degrees of freedom in the studied system than in the space of the input.

When we do this, it means we are storing too much data.

To store just enough data you need a minimal set of operators O, P, ..., Z, each commuting with the others. It is called a basis.

Given a set of data expressed in the basis, the minimal commuting operations are also called the symmetries of the system.

Applied to a computing problem, this might puzzle a computer scientist.

I am basically saying that the useful information that lets you make sense of your data is not in the data, nor in the functions, but in the knowledge of which pairs of functions commute when applied to the data.

Knowing whether two dimensions i, j in a set of data projected onto the basis are independent is equivalent to saying that i and j are generated by two commuting operators.
I am saying that even if I don't know the basis of the problem and/or the coupling, if I find two operators such that for any input [O, P] = 0 (i.e. OP|a> = PO|a>), THEN I have discovered an element of the truly pertinent data.

Given actual data |ai> and |aj> where max(i) = n,
<ai|aj> = 0 if and only if there exists one projector I that projects |ai> and |aj> onto two different transformations.


The iron rule is that the number of degrees of freedom lost by applying I must never result in having fewer dimensions than the basis.

First question: how do I get the first function?
Second: how do I know the size of the basis of functions that, combined together, describe the system in its exact independent degrees of freedom (the minimum set of valuable data)?
And last: how do I get all the generators once I know one?

Well, that is where human beings are supposed to do their job; that is where our added value is. In fact, you don't search for the first operator of the basis, you search for sets of operators that commute.

Determining the minimal set of information needed to describe a problem exactly, with independent pieces of information, is called compression.

So what is the problem with big data? And time?

Quantum-mechanical/vector/parallel computation is nice, but it has no clock.

In fact, I lie.

If, among n operations [O0 ... On] applied to a set of data, there is one called P such that [On, P] != 0, then we can't choose the order of the operations.

The breaking of symmetry in a chain of observables applied to data introduces a thing called time.

As soon as this appears, we must introduce a scheduler to make sure the chain of commuting observables is fully applied before chaining the next set of operations. This operation is called reduce.

That is the point where a system MUST absolutely have a transactional part in its operations.

Now let's talk about the real world.

Relativity tells us that time varies from one system to another. On the other hand our data should be immutable, but data that never change are read-only.

And we want data to change, like our bank account figures.

And we also don't want to have to go physically to our bank to withdraw money. And banks don't want you to spend more money than you have.

This property is called transactionality. It is a system accepting no symmetry, thus no factorization.

It requires that a chain of operations MUST NOT be commutative.

At every turn a non-linear check must be performed:
if bank account < 0: stop the chain.


This breaks the symmetry, and it requires a central point acting as an absolute frame of reference (with its clock for timestamping).
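
A minimal sketch of why the order matters (the account values and the operations are made up):

def apply_in_order(balance, operations):
    # The overdraft check at each step makes the chain order-dependent,
    # i.e. the operations no longer commute.
    for amount in operations:
        if balance + amount < 0:
            raise RuntimeError("insufficient funds, chain stopped")
        balance += amount
    return balance

ops = [-80, +100]                          # withdraw 80, deposit 100

print(apply_in_order(50, reversed(ops)))   # 70: deposit first, then withdraw
try:
    print(apply_in_order(50, ops))         # same operations, other order
except RuntimeError as err:
    print("rejected:", err)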

Banks are smart; they just don't use fully transactional systems, nor distributed ones; they use logs and some heuristics. There must be some timing/synchronicity attack possible on such a system.

On the other hand, since operations cannot be chronologically commutative on a computer, and even less so on a set of computers, the main challenge of a distributed system is «timestamping» the events.

We have known since Einstein that clocks cannot agree across a distributed system without a synchronization mechanism.
Everybody thinks NTP is sufficient.

But NTP has discrete drifts. These drifts, which are almost unpredictable (sensitivity to initial conditions), introduce a margin of uncertainty on time.

Thus, for every system, the maximum reliable time granularity should be computed, so that we can ensure the information physically has the possibility of being known before/after every change.

The bigger the system, the higher the uncertainty (relativity++).
Given the reduced set of operations that can commute on a set of data, the clock granularity should also be computed from the duration of the longest operation.