Socat, netcat, nc, tcpserver and open source "moulé sous les aisselles".

In my last post, I explored Laurent Bercot's premise that, to turn a script that only knows how to talk on stdin/stdout into a pipe-based server, tcpserver is enough : we don't need no systemd.

I experimented with tcpserver, and it worked fine.

I also « stofled » (went my way through Stack Overflow's answers) the topic and rediscovered nc, netcat and socat as possible alternatives to tcpserver.

First of all, all these tools are great. But one conquered my heart : socat.

First, all honest persons should state their bias : I have been a linux user since 1993 (slackware, debian) and, since systemd, I have also got some freeBSD, openBSD, and devuan. I long for systems that are idiot friendly. A special kind of idiot : the ones who are not afraid to read the documentation and go to the web page of the software to get the upstream documentation.

For family reasons, my main battle-station is a 12 yo core i3 with linux mint, so to say debian, hence software rarely conforms to upstream vanilla, since debian political commissars have got weird ideas about how to package software.

But first, before this long digression about a certain idea of free software, let's recall the core of the problem :

Making a shell script act as a server by reading from stdin and writing to stdout, and having some magical stuff turn the script behind the shebang into a multi-connection server.

An echo server would probably look like :
#!/usr/bin/env bash
# echo.sh : repeat every line received on stdin, stop when the client hangs up (EOF)
while read -r a; do
    echo "$a"
    echo $( export ) # maybe the server can see our IP address in the environment variables set ?
done
And if you connect with telnet to the port this server listens on, everything you write is repeated.

Let's review the « There Is More Than One Way To Do It » (Perl's (in)famous TIMTOWTDI vs python « one best way » state of mind).
 tcpserver 127.0.0.1 1234 ./echo.sh 
https://cr.yp.to/ucspi-tcp/tcpserver.html . A tool smelling of BSD's spirit of doing one thing and only one thing well.
Netcat's way. Well. There are 3 netcats !
  • The root of all nc «hobbit» netcat, that is not available anymore ;
  • the openBSD netcat fork (also called nc) providing more protocols ;
  • the NMAP fork providing the -e option that I need (also called ncat) ;
As a result you can only do the same as tcpserver with NMAP's netcat (aka ncat) :
ncat -nlvp 1234 -e ./echo.sh 
Where l means listen, p is the port, n disables DNS resolution (numeric IP addresses only), v is verbose, and e stands for exec. Trust me, explainshell.com does an awesome job at explaining command line arguments in a readable way, better than I do.

And then, there is socat's way :
socat        TCP4-LISTEN:1234,bind=127.0.0.1,fork,reuseaddr  EXEC:"./echo.sh" 
All of these servers are accessed by doing :
  • telnet localhost 1234
  • nc localhost 1234
  • /usr/local/bin/socat READLINE TCP:localhost:1234
  • and in the previous post you can also roll your own client to have history with tcpclient


So far, all these tools DO what I want, and they all transmit the IP/PORT of the client in environment variables, with slightly different conventions : SOCAT_PEERADDR/SOCAT_PEERPORT for socat, NCAT_REMOTE_ADDR/NCAT_REMOTE_PORT for ncat, TCPREMOTEIP/TCPREMOTEPORT for tcpserver (check yourself).
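
If the server script were written in Python rather than bash, picking up the client address whatever the wrapper could look like the sketch below. The variable names are my reading of each tool's documentation, so treat them as assumptions and check them with the `export` trick of the echo server ; `peer_from_env` is a made-up helper name.

```python
import os

# Assumed names of the variables exported by each wrapper for the connected
# client (taken from the docs, verify them on your own versions).
PEER_VARS = [
    ("SOCAT_PEERADDR", "SOCAT_PEERPORT"),      # socat
    ("NCAT_REMOTE_ADDR", "NCAT_REMOTE_PORT"),  # ncat
    ("TCPREMOTEIP", "TCPREMOTEPORT"),          # tcpserver
]


def peer_from_env(env=None):
    """Return (address, port) of the client, or None if no known variable is set."""
    env = os.environ if env is None else env
    for addr_var, port_var in PEER_VARS:
        if addr_var in env:
            return env[addr_var], int(env.get(port_var, 0))
    return None


if __name__ == "__main__":
    # Run under one of the wrappers, this prints the peer ; alone, it prints None.
    print(peer_from_env())
```

Passing a plain dict instead of the real environment makes the function easy to try without starting any server.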

So, all these tools are equal, benchmarking over, choosing the right tool is a question of taste and religion. Over.

Or is it ? ... my religion


I may love concision, but concision sometimes does not help memory. Socat rises above all the others by being user friendly but not idiot friendly.

Socat has an awesome PATTERN :
 socat [opt] SOURCE_FAMILY:args,opts DEST_FAMILY:args,opts
This makes me hate this tool for being overly verbose, but love it for sparing my precious muscle memory. For instance, READLINE is a family of source/dest that wraps stdin/stdout to provide history and line editing out of the box, which is AWESOME when you test a server repeatedly.

Since DEBIAN is still so special, not providing upstream packages in their vanilla flavour, when you try the READLINE source on a debian based distro you get this cryptic error :
$socat    READLINE   TCP:localhost:1234
2024/05/16 10:14:10 socat[5947] E unknown device/address "READLINE"
Requiring you to visit the upstream provider of socat and enjoy a free software « moulé sous les aisselles à l'ancienne ».

I mean, no git, no reactive website, nothing fancy, first the changelog, a tarball with checksums you can download and a man page with examples.

Ah ! It reminds me so much of my early days on linux. And a golden nugget hidden in the not that obvious link to the repos.
Curated examples commented by the author (you also have in the man page)

I think it even beats openBSD amish's style by being feature rich and consistent.

So, because I WANTED my socat with READLINE, I had to compile it, and it was a delight : a ./configure for portability and, even though linux may not be the primary target, very few warnings, few of them being scary.

Oh, and there is a test suite in bash (that I read, of course) and it ... was nice to check the compiled software against the expectations of the author. I have a bug in the interaction between READLINE and openSSL.

And you also find a hidden nugget : a broker + fan-in/fan-out network pattern example in shell that makes me question my usage of ZMQ (I basically use ZMQ only for this).

At this level, I think my will for benchmarking went out of the technical path to become much more fandom.

I mean, it even has CHROOTING out of the box, pf integration, (very) basic ingress/egress control (CDB), SSL tunneling, corkscrewing (very specific options to get through ill-configured firewalls with UDP), STUN-like features (another way of firewall piercing), all explained in a concise but funny way.

It's like christmas before christmas, with nice gifts falling from the sky, and mnemonics strengthening my sysadmin usage of sockets (nodelay, lingering, address reuse and other options that are named like everywhere else).

socat is my new favourite tool because it has a learning curve that is totally worth it. It fits my brain topology nicely, maybe not yours.

/me <3 gerhard at dest-unreach.org

Is systemd bloated ? A pubsub server in 200 lines of code in bash talking on stdout/stdin that does not require systemd

Two days ago I was reading this about a « new vendor lock-in » made by Lennart Poettering (systemd), and it made me want to see how true this assertion of Laurent Bercot's was.
The new vendor lock-in is about an old technique used in inetd.
It took me one day to decide what to code, and I began today, using the ucspi-tcp tool suite.

To be honest, I am a little bit of a fanboy of D. J. Bernstein. He is from the school of keep it simple, stupid, and ROCK HARD idiot proof. And ... I am an idiot of my kind. I just want to do server logic by talking to stdin/stdout, and I am pretty glad to see there exists a correct tool to connect stdin/stdout to a TCP socket.

I decided to add insult to injury for the « I need the latest shibazam* » team and to code in bash.

* shibazam being any of « memory safe », « cryptographically proven », « portable » ...

Doing a pubsub server in bash



A publish/subscribe server is a very simple concept : you have channels on which you write, and concurrent readers who can read them. It was first used for stock exchange tickers.

A channel can be mapped one-to-one to a file, and as long as you handle concurrent writing with an exclusive lock, nothing wrong can happen.
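
To make the idea concrete, here is a minimal sketch in Python (not the bash of the annexe) of a channel backed by a file, with writers serialised by an exclusive lock. `publish`, `read_from` and the one-file-per-channel layout are hypothetical names of mine, and `fcntl.flock` assumes a POSIX system.

```python
import fcntl
import os


def publish(channel_dir, channel, message):
    """Append one message to the file backing `channel`, under an exclusive lock."""
    # Assumption : `channel` is a plain name (no slash), one file per channel.
    path = os.path.join(channel_dir, channel)
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks while another writer holds the lock
        try:
            f.write(message.rstrip("\n") + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def read_from(channel_dir, channel, offset=0):
    """Return (messages, new_offset) : everything written since `offset`."""
    path = os.path.join(channel_dir, channel)
    with open(path, "r", encoding="utf-8") as f:
        f.seek(offset)
        data = f.read()
        return data.splitlines(), f.tell()
```

Each subscriber only has to remember its own offset, so readers never need the lock ; only the writers do.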

A TCP server is basically a REQ/REP pattern : while true, you listen to what the user says, you answer, and you go back to listening.

The perfect job for the DISPATCH TABLE I use to bashize my makefile, hence code I could basically steal from myself.
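
For readers who have never met a dispatch table, here is a minimal sketch of the REQ/REP loop in Python. The commands (`?`, `echo`) are made up for the illustration, they are not the ones of the real server.

```python
import sys


def cmd_help(args):
    """List the known commands."""
    return "commands: " + " ".join(sorted(DISPATCH))


def cmd_echo(args):
    """Send the arguments back."""
    return " ".join(args)


# Hypothetical commands, for illustration only : first word -> handler.
DISPATCH = {
    "?": cmd_help,
    "echo": cmd_echo,
}


def handle(line):
    """One REQ/REP round : parse a request line, return the reply."""
    words = line.split()
    if not words:
        return ""
    action = DISPATCH.get(words[0])
    if action is None:
        return "unknown command, try ?"
    return action(words[1:])


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Listen, answer, go back to listening, until the client hangs up."""
    for line in stdin:
        print(handle(line), file=stdout, flush=True)


if __name__ == "__main__":
    serve()
```

Adding a command is then one function and one line in the table, which is why a skeleton comes together so fast.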

I don't know what to say, because it actually worked as a skeleton (saying what it would do but actually doing nothing) less than 2 hours after I began coding, so I dare say using bash was neat and fun.

Sometimes, I have the feeling we over-engineer our solutions.

Securing the server ?



DJB gives some nice input out of the box, with the possibility to add ingress and egress network rules in a portable way (CDB). And, if you are (rightly) paranoid and want to deploy it on the internet, I strongly advise using the beauty of TCP/IP tunneling. Stunnel will add both private key authentication and encryption without you having to roll your own cryptography, by tunnelling your clear text protocol in an SSL-encrypted socket.

I really have the feeling modern days are about over-complexifying designs to make money and justify our bullshit jobs. I know, from having tried, that this kind of solution doesn't make it to production because « it looks too simple and doesn't give enough guarantees of being safe ».

Appendix

The code with a tad of explanation on how to use it in the "?)" case that took me 6 hours to code, SLOWLY. VERY SLOWLY.

Let's percolate ! Numerical simulations to the rescue of the real world in 200 lines of code.

I am sure you have never wondered how many routers you could savagely unplug on the internet before packets can no longer be routed ? Or what the optimal grind size (the size to which you grind coffee) is to extract the maximum aroma in a French press without blowing up the coffee maker (literally as well as figuratively).

This class of problems is percolation.

It is a field in which exact mathematical solutions are hard to compute and in which running simulations helps. We are going to look at a small simulation of my own making and touch on, along the way : Laue, Boltzmann lattices, Monte Carlo, and Galton.

In the beginning was a simple problem : this feeling that coding is disconnected from reality, so I felt like hijacking the pinnacle of computing wankery into something useful : the game of life.

I have a Python module to play with (gof) and a few gists, including a game of life on a hexagonal lattice.

Why a hexagonal lattice and not a square one ?



The point of a hexagonal lattice is that it is geometrically more regular and less distorted than the square lattice used for the game of life. According to Laue (a physicist famous in crystallography for having studied the impact of lattice symmetries on causality), the more symmetry groups you have, the better.

A square lattice is L2, L4. A hexagonal lattice has symmetries of order 2 (mirror), 3 (a third of a turn) and 6. The more symmetries you have, the less you « diverge » from the real world by introducing moiré.



The state machine



For our simulation we introduce a simple state machine whose pseudo-behaviour is :
For every x from right to left:
    for every y from top to bottom:
        Am I empty ?
            above me (2 choices), is it full ?
                if so, pick at random the content of one of the cells above and put it in mine
It is even simpler than the game of life.
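
A minimal sketch of one pass of such an automaton, under assumptions of mine that are not in the real code : the hexagonal lattice is stored as offset rows (an odd row sits half a cell to the right of an even one), the pass reads from a snapshot so a particle falls at most one row per pass, and x is scanned in increasing order, the way a plain `range` does.

```python
import random

EMPTY, PARTICLE = 0, 1


def upper_neighbours(x, y, width):
    """The (at most two) cells above (x, y) on a hexagonal lattice in offset rows."""
    if y == 0:
        return []
    xs = (x, x + 1) if y % 2 else (x - 1, x)
    return [(nx, y - 1) for nx in xs if 0 <= nx < width]


def step(grid, rng=random):
    """One sequential pass over grid[y][x], modified in place and returned."""
    height, width = len(grid), len(grid[0])
    old = [row[:] for row in grid]  # state at the beginning of the pass
    for x in range(width):
        for y in range(height):
            if old[y][x] != EMPTY:
                continue
            # cells above that held a particle and have not been emptied yet
            full = [(nx, ny) for nx, ny in upper_neighbours(x, y, width)
                    if old[ny][nx] == PARTICLE and grid[ny][nx] == PARTICLE]
            if full:
                nx, ny = rng.choice(full)
                grid[ny][nx] = EMPTY
                grid[y][x] = PARTICLE
    return grid
```

With a single particle in the middle of the top row, the cell scanned first always grabs it : the sequential scan itself decides on which side the particle falls.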

Once this is done, we use python-tk and display pseudo-particles that we inject at the top, and we watch how they fall.

If the pseudo-physics is done right, and we plant virtual nails at the exit of a particle stream and collect the particles « at the bottom » in « bins » (literally the physical bins of Galton's experiment, before being a term of statistical physics used when making histograms), THEN I should see a nice Gaussian take shape.
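
What the bins should look like can be checked without any lattice at all. The sketch below is an idealised Galton board (each nail sends the ball left or right with probability 1/2), not the automaton of this post ; `galton` and its parameters are names I made up.

```python
import random
from collections import Counter


def galton(n_balls, n_rows, rng):
    """Drop n_balls through n_rows of nails ; return a Counter {bin index: count}."""
    bins = Counter()
    for _ in range(n_balls):
        # at each nail the ball goes right (1) or left (0) with equal probability
        position = sum(rng.randint(0, 1) for _ in range(n_rows))
        bins[position] += 1
    return bins


if __name__ == "__main__":
    rng = random.Random(42)
    bins = galton(2000, 12, rng)
    for k in range(13):
        print(f"{k:2d} {'#' * (bins[k] // 10)}")
```

The counts follow a binomial law, which gets close to a Gaussian centred on the middle bin as the number of rows grows : any lean to one side in the real simulation is therefore a bias of the code, not of the physics.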

This is covered in this video :


Result 1 : Bolshevik physics and Monte Carlo



I hope you will forgive me for planting my virtual nails as badly as my real ones, but let's look at THE first result I get, for comparison.


The result is final, I am : CRYPTO-BOLSHEVIK. I code abstractions whose Gaussians lean to the left ! Woe is me, the DGSI will show up at my door if I don't become a republican centrist.

Before fixing it, let's understand the geometry of the canvas : x increases from left to right, y increases from top to bottom.

My simulation is normally biased top/bottom, but by scanning sequentially from right to left, I favour falling to the left.

In physical simulation this problem is well known in laboratories ; it is the raison d'être of the so-called Monte Carlo method. We randomise to break orderings that do not exist.

Physics, fixed

Since Python's random.randrange absolutely does not follow the API of range, let's recode it in a sane way :
from random import shuffle

def randrange(x):
    """Return the integers 0..x-1 in random order."""
    c = list(range(x))
    shuffle(c)
    return c
And let's replace
for i in range(x)
with
for i in randrange(x)
that is, a shuffle that removes the right-left anisotropy, which does not exist in the real world, and let's run the simulation again.
Hey, in « good enough » methodology, we can stop here. OK, the Gaussian is a bit ... too phallic for my taste ; that's my Jul in Shape side, doing core workouts at the boxing gym to remain an eternal Adonis.





LET'S PERCOLATE



At last, we can turn to coffee grinding and the internet.

Some physics problems are a REAL PAIN to compute, so we need
  1. to be able to spare ourselves the computations
  2. to find ways to check the computations simply
And that is where we simulate. This imperfect simulation will let us start to PERCOLATE.

For a fluid, percolating means going through a lattice of random obstacles. For example, water through coffee grounds in a French press.

If your beans are ground (grinded ? grounded ?), in short milled too fine, it blows up in your face. If your grounds are too coarse, the water goes through without extracting the coffee. The art of determining the right grain sizes and the right water speed/pressure is therefore the art of percolating. That is why a coffee machine is called a percolator.

But it also touches on the design of the internet.

Imagine a hexagonal lattice of routers that passes packets randomly from right to left, but deterministically along the shortest path from top to bottom : this simulation would then simulate a peculiar internet.

Nevertheless, it helps build an intuition for questions such as : what percentage of degraded nodes can the network survive, what are the early warning signs of too strong a degradation (latency, throughput, packet loss) ? What is the optimal topology to keep working in the most degraded mode possible ?

This science of percolation pushed the internet to be structured in a way that is heretical for X/Ponts/Télécom engineers, that is, by being loosely sealed (highly connected) and partially stochastic at the borders (read the BGP RFC).

The internet is designed so that resistance to network degradation is built into its topology and into its choice of routing algorithms at the borders. It is claimed that it was designed to survive a global nuclear attack.

This is what a simulation « run » looks like :

What remains to be done (if I were not lazy)



If you are not lazy, you make histograms over thousands of runs. The more simulations, the more the uncertainty shrinks (fast) (Student's t-test).

For each perturbation value you record : the induced latency, the lost packets, the total or partial decrease in flow, and you make histograms.

Normally, you will see an abrupt breaking value appear, where statistically the network goes from an almost 100% probability of letting things through to almost 0%, which is called the percolation threshold. Around this value you weight the metrics and see a nice sigma-shaped curve (a ... sigmoid) called a transition curve. Does it look like the response of a transistor ? Well yes, a transistor is an application of the abrupt transition in percolation, except that instead of macroscopic grains of matter it is electrons that are involved, and the amplification range is that of the transition.
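
For the non-lazy, here is a minimal Monte Carlo sketch of that transition. To keep it short I swapped my hexagonal automaton for the textbook case : site percolation on a square grid, where a run « passes » if open cells link the top row to the bottom row. Function names, grid size and number of runs are arbitrary choices of mine.

```python
import random
from collections import deque


def percolates(width, height, p_blocked, rng):
    """True if a path of open cells links the top row to the bottom row."""
    # each cell is blocked with probability p_blocked, independently
    open_ = [[rng.random() >= p_blocked for _ in range(width)]
             for _ in range(height)]
    queue = deque((x, 0) for x in range(width) if open_[0][x])
    seen = set(queue)
    while queue:  # breadth-first search from the top row
        x, y = queue.popleft()
        if y == height - 1:
            return True
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and open_[ny][nx] and (nx, ny) not in seen):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return False


def passing_probability(p_blocked, runs=200, size=30, seed=0):
    """Fraction of random grids that still let something through."""
    rng = random.Random(seed)
    hits = sum(percolates(size, size, p_blocked, rng) for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    for p in (0.0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8):
        print(f"blocked={p:.1f} passing={passing_probability(p):.2f}")
```

Plotting the printed column against the blocked fraction gives the sigmoid, and the larger the grid, the steeper it gets.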

There, sometimes when you do too much computing you get fed up with no longer being close to the real world, so a small physics simulation is relaxing.

Appendix : final code and screenshots

200 lines of code ! That's next to nothing.

Why make a makefile? Can reproducible build ever be achieved again?

When I code, I like to pride myself on writing code that works everywhere. Unit testing is a good way to go, but the foundation of testing that a tool chain works the same everywhere is checking that the artefacts are the same.

I recently wrote some code for a sociogram and added at the top of my « make » (a bash script that does all the assembling in a supposedly deterministic way) a claim that by using the same input you would have the same output.

Let's see if I lied by taking a snapshot of the last frame of the videos built on 2 different computers :




If the graph is the same, the topology is not. For something that is about building geometrical shapes, this may raise some questions.

First : why can't computer academics build graphviz, while applied physicists can ?

Graphviz is breaking an unspoken standard of academic computer programming : it is based on a probabilistic simulation with a lot of randomness in it. Basically you first lay out the nodes randomly, then you randomly swap two nodes, count the number of edge crossings, and keep the swap if fewer edges cross than before. The kind of dumb algorithm that works but INVOLVES RANDOMNESS.
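
Here is a minimal sketch of that dumb algorithm exactly as I describe it, which is NOT a claim about what graphviz really does internally. All names are mine. The only source of randomness is a generator built from an explicit seed, so two calls with the same seed on the same interpreter give the same layout.

```python
import random
from itertools import combinations


def _ccw(a, b, c):
    """True if the points a, b, c are in counter-clockwise order."""
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])


def _cross(p1, p2, p3, p4):
    """True if segment p1-p2 crosses segment p3-p4."""
    return (_ccw(p1, p3, p4) != _ccw(p2, p3, p4)
            and _ccw(p1, p2, p3) != _ccw(p1, p2, p4))


def crossings(pos, edges):
    """Number of pairs of edges (without a shared node) that cross."""
    total = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) == 4 and _cross(pos[a], pos[b], pos[c], pos[d]):
            total += 1
    return total


def layout(nodes, edges, seed, steps=500):
    """Random layout, then random swaps kept only when they remove crossings."""
    rng = random.Random(seed)  # the seed is the only source of randomness
    pos = {n: (rng.random(), rng.random()) for n in nodes}
    best = crossings(pos, edges)
    for _ in range(steps):
        a, b = rng.sample(nodes, 2)
        pos[a], pos[b] = pos[b], pos[a]
        score = crossings(pos, edges)
        if score < best:
            best = score
        else:
            pos[a], pos[b] = pos[b], pos[a]  # undo the swap
    return pos, best
```

Calling `layout(list("abcdef"), edges, seed=7)` twice returns twice the same positions, while another seed will most likely give another drawing of the very same graph.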

Well, computer scientists DO reproducible builds ! Don't they ? And RANDOMNESS is non reproducible, isn't it ?

Hard scientists (which excludes the literature lovers called computer scientists) use PRNGs when working, so we can reproduce our builds ?




Seems better :D but still not perfect, once we set the SEED of the random generator in the graphviz output (the only obvious source of randomness in the code I control).

So what kind of non-deterministic element did I introduce in my build ?

multi-processing/threading is non-deterministic


Enough experience in coding will make you smell where danger is. But I love danger, so I knowingly wanted to be a GOOD hard scientist and make my CPU BURN at 100%. It is a thing I learned to embrace in physics labs. So I parallelized my code (here is a bash example of how to do it) :
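NB_CORE=$(getconf _NPROCESSORS_ONLN)
# block until fewer than $1 (default : number of cores) background jobs are running
function pwait() {
    local nb_core=${1:-$NB_CORE}
    while [ "$(jobs -rp | wc -l)" -ge "$nb_core" ]; do
        sleep 1
    done
}
for i in *.dot; do
    $DOT -Tjpg "$i" > "$( basename "$i" .dot).jpg" &
    ppids+=( "$!" )
    echo -n .
    pwait
done
# wait for the last background jobs before going on
wait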
It does the same thing as python multiprocessing async apply : fork processes in the background and wait for all of them to finish before going on. And it is clear, by exploring both videos, that I miss frames. Well, ffmpeg (involved in the process) is quite explicit :

[swscaler @ 0x5580a8337840] [swscaler @ 0x5580a8375fc0] deprecated pixel format used, make sure you did set range correctly
[swscaler @ 0x5580a8337840] [swscaler @ 0x5580a8b9dd00] deprecated pixel format used, make sure you did set range correctly
So I put a sleep 10, and it helped ... but not enough, because, well, modern computing on linux is chaotic :
  • signal delivery is unreliable : signals may be missed or can be « acausal », a core can say : I finished while the kernel is still waiting for a file to be written
  • versions of software may differ even on 2 debian based distros (one is debian testing (hence outdated), the other is linuxmint current (less outdated))
So ... I did all I could to have the same result given the same parameters, by fixing everything I controlled, including the seeds of the PRNG, and suffice to say that at the end of the day, EVEN FOR A SIMPLE SCRIPT, deterministic « same results » are impossible.



It is fucking neither the same topology, NOR chronology. For a dynamic picture of a topology it kind of sucks terribly and should not be considered « good enough ».

It is good enough for a « side project », but not for a production ready project.

Famous last words


When I code in python, I have a freeBSD station, because BSDs are more boring than linuxes. However, I can't play my favourite games with wine there (Need for speed, Quake III, Urban Terror, various pinballs), which is why, when I code for fun, it's on my linux computers (check my script to build a tailored freebsd qemu image on linux). But I dare say modern coding is not the « boring » activity I grew up with, hence my manic way of doing the most I can to ensure « maximum reproducibility in my power ». My power is about rationality ; I give up when it comes to the enshittification of linux distributions that clearly went down the path of not caring. After all, you just need to spin up a docker alpine to make stuff reproducible, don't you ?

I'm a one-man army who wants to code, not maintain a kubernetes cluster just for the sake of creating a « snowflake » of reproducibility that negates the purpose of coding. Back in 1982 I could give my basic source code on a floppy and be sure that another C64 would yield the same result given the same input. Nowadays, this is wishful thinking.

The state of modern computing (especially on linux) is BAD. I even think of reconverting to COBOL on OS/360 to gain back a minimum of sanity regarding my expectations.

I strongly advise devs to put more effort into their assembling code (docker build, makefile, your own tool) than into their « muscle code » (unit tests included), because it is not in the code you control that you will find the most surprising loss of control, but in the trust you can no longer reasonably place in your hardware, your OSes and your distributions. And it's a shift of normality, a new NORMAL, not an absolute NORMAL state.



Appendix

Here is the command line involved


SHOW=1 EDGE_SCALE=1 MIN_MAIL=20 \
  PERS_EDGE_SCALE=.2 BY_DAYS=50 SHOW=1 \
  THRESHOLD_ILOT=1 DOT="dot"  ./make very_clean movie
The « make » involved, and the main script. It DOES NOT EVEN MAKE 500 lines of code. It is small by any standard, even that of a « BASIC » program of the 1980s. I AM PISSED ! (╯°Д°)╯彡┻━┻

Addendum : and that's how I discovered that the too silent fan had silently stopped working, without a thing in the logs or the sensors complaining about heat on my laptop. Even hardware is becoming shitty.
Addendum 2 : after some tinkering I discovered I nearly burnt my CPU thanks to debian removing fan control/cpufreqd ... AGAIN (╯°Д°)╯彡┻━┻ Now I must underclock this computer to make it compute anything. Fuck debian !