Dividing is not as easy at it seems

The div inferno


Still coding archery and trying to give a consistent definition of add/mul/div/sub to Mappings, I stumbled upon ... my memories of when I was a student in applied physics : computers are just big abaci, they dont do maths, they do something that looks like maths, but is not math.

The ring horror movie


How to explain the most efficiently the when the horror movie begins ?
I dare say when you use division operation and try to overload it. (/, __div__, //, or truediv)

Why ? 


if a and b belongs to the same type of number (rational, integer, relative, fractional, real, irrational, complex), addition and multiplication  you have an endomorphism : what comes in, stays in.

Addition and multiplication are nice algebraic players.

substraction, is more tricky, but does not pose a great threat. The truth about «subtraction» is you only need __neg__. a - b = a + (-b) = a.__add__(b.__neg__())

This normally does not increase the complexity in the behaviour of numbers.

(I'll splip for the sake of comprehension on [-N,N] = Z). You pretty much don't leave your ring.

Div : when all hell broke loose 


Following sub implementation one would say it is sufficient to define the unary operator inversion to define the whole division : a/b= a*( 1/b ), anyway division must enforce
b * (1/b) == 1


Python weirdness


In python of course : int(a) * 1/int(a) = 0 or 1.0 (with from __future__ import division).
Are python developers stupids ?

They are not  : computer is



Well, you have anyway a serious problem. Especially knowing that computers are clueless about math : they know only of IEEE754 floats and integers.

If a variables belongs to natural integers where does its inverse belongs ? 

It is a rational number. For which computer and python have no natural representation. So it becomes a float. Float is a territory, math is a map. It is a map that does not map the territory.


But what if a/b is exact ? then the number belongs to integers ! And binary representations does not gives me the informations of the type of number I deal with : I only know if a number belongs to N, Z (integer) or the rest of the world. And 2.0 is not the same as 2. Therefore, I am stuck.

Why is it important ?


In python Sequence * 2 is the same as repeating the sequence twice. But Sequence * 2.0 raises an exception. Because is a nonsense.

I want  archery to behave this way :
  •  a = { 'a' : [ 1 ] } * (4 / 2)  should be { 'a' : [ 1,1 ] } 
  • .5 * { 'a' : 1 } should be { 'a' : 1 } / 2

So I fall in PEP0238, and in PEP0228, and in pep3141 traps.

Python's fault ?


No, the real truth, for I know how to wire at the metal level an adder, a muler a subber, I thus know that computers are only very very very fast abaci.



The muler of the CPU (not to be confused with the Floating Point Unit) used for int is really done the way it would be done with an abacus with one wire and one bead per bit, plus extra wires used for flagging the results. (Overflow, Carry, Zero, Infinite, Sign)

Abaci are greats for integers, but any other number we use in math are in fact abstractions. 0.5 and 1/2 are the same because .5 is a fraction of power of two. But .8 cannot be represented in binary abacus because it is not an exact sum of factions of power of two.

sum( [ .1 ] * 10  ) == 1.0
#False


That's the reason why all developers using float for financial or commercial transaction should be shot in the head without any trial. They are introducing errors.

This is a must read http://docs.python.org/tutorial/floatingpoint.html
 
At this point one should know of IEEE754. This is the root of all evil underneath the floating point.

It is the standard of the electronic industry for floating point numbers. And it is the standard in all GPU too. So openCL won't fix your problem. You may use extra slow library high level libraries, but you have no hard wired therefore fast replacement in view. There is no standards, or widely used ASIC or  FGPA I am aware of to solve the problem. Since I have not updated my knowledge on the topic for 15 years, you should not take my words for granted (IEEE854?).

So since my problem is at the wire level, I don't expect python to solve it.

If you don't have a solution, then you don't have a problem.

Possible remedies ? 

PEP3141


Well, in fact I would have preferred the rejected alternative based on haskell. But they might have had good reason to reject it.

Symbolic calculus ?


I have played with HP48 and MATLAB (and seen mathematica in action). I therefore know symbolic calculus is possible, but, it requires having dynamic AST parsing inside the language plus lazy evaluation and probably some side effects we cannot predict. We have this in python with SAGE, and it has a lot of opened tickets.

Plus, if you encapsulate every number in symbolic object (fractions, irrational,  integers, symbolic expressions ...) you'll have a loss of performance. Actual integer add is dazzling fast. FPU is pretty slow. Imagine that a simple addition needs to resolve a lot of mro (method resolution order) before actually calling the assembly level add to do the addition ? Your code will be slow.

And I intend my code to rely on stdlib. So I won't use SAGE.


Sympi ? Well, I just discovered it while writing this post. I will look at it later.


Smart approximation ?


Imagine you have 0.30000001 in your float, wouldn't it be tempting to say it is .3? Well as long as you don’t mean a= .3000000001 and don't need to subtract it to  .3 you don't have a problem.
Since python is pretty well known for its love of  the least surprise principle and the refusal of the core developers to guess, there is not a chance it may happens in python. This would de facto results in implicit casting, and if you are not aware of implicit casting nightmare, just think of PHP has the result of this choice. https://bugs.php.net/bug.php?id=54547

Arbitrary precision ?


Python already provides it natively for integers. We also have the bigfloat package on pypi. It does not solve my implicit cast problems, nor does it solves the fact that floats are an inexact representations of fractions. It may also (not tested) slow the speed.


Fixed Point arithmetic ?


Python provides the decimal base data type, you are strongly encourage to use it when playing with code that relies on exact results (commerce, finance...). Hardware manufacturer (IBM being the leader) also scratches their itches on the topic. But it am still unable to know if  (a/b) is an integer, or if .5 is exactly 1/2,

Uncertainty estimates ? 


I sometimes wish operator have an enhanced mode where they not only give the result, but also the uncertainty. But, I don't know if it is feasible.

Porting haskell monads ?


Well, if I think it would be faster to switch to haskell than to wait for that to be ported in python.

Archery will therefore not support div for safe implementations


Since I want to ship a working version, and I don't want to debug impossible problems I therefore am reluctant to implement division for mappings. The problem is at the wire level, and I think  any workaround at language level is just plain wrong in terms of predictability, consistency, and performance.

I will flagged any mappings with div as tainted/unsafe, or gladly accept any propositions that makes it sane. My brain just can't figure any good solutions now. 

Education might be the solution


I don't know what happens in your country, but I have -especially in finance- seen educated monkeys using very sophisticated equations relying on exponential of matrices (exp(X) = Sum(pow(X,n)/fact(n)) to model economy (with retrocation) and use plain floats for the sake of «performance». Whenever I hear them, I shiver. Retraction + division implies the amplification of errors. And as far as I am concerned, for so called performance reasons (I dare say numerical analysis illiteracy), they don’t use error estimations.

Well truth is, as soon as you work with float, your intuition is wrong !

(a - b ) * c != a * c - b * c

Errors  in substraction and multiplication have not the same amplitudes and do not behave like linear equations. And if you use feedback models witch converge (reinjecting small portions of the output values in the input), you may have errors being greater than the signal.
Dont confuse exactitude and precision :  http://en.wikipedia.org/wiki/Propagation_of_uncertainty

Conclusion : the headache of symbols, the sanity of python


I am pretty much a computer illiterate,  I never code relying on what I know of computer, I rely on words, symbol and their meaning in the real world. When I code, I don't have the feeling to rely on scientific knowledge, I have the feeling to write an English essay. I take a short-cut by taking for granted that people will rely on what we share as a common knowledge in real world :
  • I expect + - / * to behave like in math ; 
  • I expect words to have the usual meaning;
  • I expect context to give me informations;
  • I expect object to be the subject, method to be the verb, and arguments to act as complements.
I stumbled on this div, because my rule works 99% of the time, but it fails in 1% of the time.  But, computers have their own logic. People should know it before claiming to be developers. On the other hand, a program not readable and modifiable by a human is useless.

So as a conclusion, I'd say, computers are imperfect, human are too, and the craft of programming is staying in a balance between computer literacy, and relying on human common knowledge. This is a hard path, since balance is about breaking rules wisely, and therefore being taken into default by the zealots of both chapels.

I also discovered I like python because the core developers are safeguarding the language core against weird ideas, and I like PEP because everything has been rationally pondered and the arguments behind every designs are stated.

Also, the scientific community being very active amongst python community and giving feedbacks,  I see python as the leader in numerical analysis problematic amongst the big 4 (ruby, perl, PHP, .net).

I also discovered, I shoud really peek at haskell monads.



No comments: