Multiprocessing or multithreading? Build your own distributed system to find the answer.

My (gross) Latin teacher used to say that stuff always sounds more profound when said in Latin.
And then he proceeded to close the toilet door, saying: « fluctuat, nec mergitur » (it barely floats, but it doesn't sink).
Let's talk about distributed systems. Simply said, it is swarm-based computing: you have a pub/sub as a data bus and you send commands to lightweight probes that answer back on the bus.

Almost always using a network socket as a bus.

On these sockets we send tasks and expect results over the wire. Very often, distributed systems are used at the core of ... SRE/measurement systems.
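
To fix ideas, here is a minimal sketch (not the code from the annexe) of what such a bus can look like with pyzmq: a PUB socket publishes commands, a lightweight probe subscribes to them and answers back on a separate PUSH/PULL return channel. The ports and topic names here are made up for the example.

import threading
import time
import zmq

CTX = zmq.Context.instance()

def probe():
    sub = CTX.socket(zmq.SUB)
    sub.connect("tcp://127.0.0.1:5556")
    sub.setsockopt(zmq.SUBSCRIBE, b"cmd")     # listen to the "cmd" topic
    push = CTX.socket(zmq.PUSH)
    push.connect("tcp://127.0.0.1:5557")
    topic, payload = sub.recv_multipart()     # wait for a command on the bus
    push.send_multipart([b"result", payload, str(time.time()).encode()])

def main():
    pub = CTX.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:5556")
    pull = CTX.socket(zmq.PULL)
    pull.bind("tcp://127.0.0.1:5557")
    threading.Thread(target=probe, daemon=True).start()
    time.sleep(0.5)                           # classic PUB/SUB slow joiner: give the probe time to subscribe
    pub.send_multipart([b"cmd", b"ping"])     # send a task on the bus...
    print(pull.recv_multipart())              # ...and expect a result over the wire

if __name__ == "__main__":
    main()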

And the first thing a hard-taught, real-life lesson on distributed systems will always teach you is: you know nothing of time.

This old war story has been deprecated thanks to the work of the Python devs on PEP 418. But there used to be a time when your carefully timestamped task made in Python would come back executed in the past, leading to a mysterious bug in customer invoices where an inexpensive landline call going through a SIP gateway would cost thousands instead of cents.
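
For the record, this is roughly what PEP 418 gives us: time.time() follows the wall clock and can jump backwards (NTP stepping the clock, a VM being resumed...), while time.monotonic() is guaranteed never to go back, so it is the one to use for the durations you intend to bill.

import time

t0_wall, t0_mono = time.time(), time.monotonic()
# ... do the work you want to measure or bill ...
elapsed_wall = time.time() - t0_wall        # can go negative if the clock is stepped backwards meanwhile
elapsed_mono = time.monotonic() - t0_mono   # always >= 0, safe for durations
print(elapsed_wall, elapsed_mono)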

The first thing you do when you code a distributed system is to make it measure time, and its own reaction in the face of congestion.

You see, you, the distributed-system guy, are probably being PAID so that people can observe when an incident OCCURS in production, which is often when part or all of the system is saturated/congested.

So, the first thing you do when building a distributed system is to be the hammer that will burn the CPU and measure that it behaves well. Basically, the torture test is the test itself.
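
A sketch of what I mean, with an arbitrary workload: the task itself is the hammer. It saturates a core for a while and reports how long the system thinks that took, so the torture test and the measurement are one and the same.

import time

def burn(n=200_000):
    """Pointless arithmetic, just to saturate a core, timed with the monotonic clock."""
    t0 = time.monotonic()
    acc = 0
    for i in range(n):
        acc += i * i
    return time.monotonic() - t0

if __name__ == "__main__":
    print([round(burn(), 4) for _ in range(8)])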

Copying from the ZMQ book:

There I found a very minimalistic framework, so I adapted the REQ/ROUTER model into the code in Annexe I.
As you can see, the code is both multithreading and multiprocessing. Hence, you can pull a nice trick: measuring the impact of the measurement of time as a function of multiprocessing vs multithreading.
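
Since the annexe is not reproduced here, here is a stripped-down sketch of the shape of it, under my own assumptions (the port, the number of workers, and the "task" being just "tell me your time"): a ROUTER socket deals with REQ workers, and the very same worker function is launched either as threads or as processes.

import sys
import time
import threading
import multiprocessing
import zmq

ADDR = "tcp://127.0.0.1:5558"
N_WORKERS = 128

def worker():
    ctx = zmq.Context()                      # one context per worker: safe for both threads and processes
    req = ctx.socket(zmq.REQ)
    req.connect(ADDR)
    req.send(b"ready")                       # ask the broker for a task
    req.recv()                               # the task itself does not matter here
    req.send(str(time.time()).encode())      # answer with the time as this worker sees it
    req.recv()                               # wait for the final ack so the socket can close cleanly
    ctx.destroy()

def broker(launcher):
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    router.bind(ADDR)
    workers = [launcher(target=worker) for _ in range(N_WORKERS)]
    for w in workers:
        w.start()
    samples, answered = [], 0
    while answered < N_WORKERS:
        ident, _, payload = router.recv_multipart()        # REQ envelope: [identity, empty, body]
        if payload == b"ready":
            router.send_multipart([ident, b"", b"what time is it?"])
        else:
            samples.append((time.time(), float(payload.decode())))  # (broker clock, worker clock)
            router.send_multipart([ident, b"", b"bye"])
            answered += 1
    for w in workers:
        w.join()
    ctx.destroy()
    return samples

if __name__ == "__main__":
    # pass "proc" on the command line to switch from threads to processes
    launcher = multiprocessing.Process if "proc" in sys.argv else threading.Thread
    for broker_clock, worker_clock in broker(launcher):
        print(broker_clock, worker_clock)

Run it once with threads and once with "proc" on the command line, plot the pairs, and you get the kind of time = f(time) curve the rest of this post talks about.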

It's pretty meta, but it yields results: launching workers 128 at a time on a 4-core Intel Pentium in multiprocessing on Linux gives this result:
TIME IS LINEAR WITH TIME. You don't know what happens on the system, or whose fault it is (your code saturating the cores, for instance), but you want time to stay monotonically shaped and looking like the identity, x = f(x).

However, with multithreading .........

Clearly, the times measured by the system are less desirable.

But, look, I have a virtual machine with FreeBSD 14.0 coming from another post... why not try with FreeBSD too? (hence varying the « nested virtualisation » and OS parameters)

Well, I don't know why:
  • the FreeBSD time() call returns time with a one-second resolution with multiprocessing (see the snippet after this list)
  • the curve for time = f(time) is clearly dilating faster with multithreading on FreeBSD too, and in a non-smooth way.
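
One way to dig into the first point: PEP 418 also exposes what each clock is made of on the current platform, so you can compare what Linux and FreeBSD advertise for time() and monotonic() directly from Python.

import time

for name in ("time", "monotonic"):
    info = time.get_clock_info(name)
    print(name, info.implementation,
          "resolution:", info.resolution,
          "adjustable:", info.adjustable)
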
You see, this is where quarrels about multithreading and multiprocessing should stop: when you can observe whether it matters (or not).

And all of this is in the very peculiar context of using pyzmq, Python, and Linux or FreeBSD, not Mac or Windows. Actually, the APIs of the two modules are so compatible that Python gives us the luxury of not caring about the decision between threading and processing. We can always change our mind... And it's cool.
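
The luxury in question, concretely: Thread and Process expose the same constructor and lifecycle, so switching from one to the other is a one-line decision (the workload here is just a placeholder).

import threading
import multiprocessing

def work(n):
    print(sum(i * i for i in range(n)))

if __name__ == "__main__":
    for launcher in (threading.Thread, multiprocessing.Process):
        w = launcher(target=work, args=(100_000,))   # same call, different concurrency model
        w.start()
        w.join()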

Annexe I
