Imagine you are building a distributed system.

While the idiot looks at the average, the seasoned systems engineer might want histograms.

A histogram often reveals more than meets the eye, and one of the most important things to look for is multimodality.

When everything happens randomly in a distributed system, metrics are spread according to a law that depends on the randomness at play, and this often results in a peak value and a spread in the histogram.

If you are interested in early detection of agents leaving the herd, or in the impact of a modification on the metrics, non-homogeneity in causality will show up in your histogram.

User story

I want to plot an ASCII histogram, because I am on a server without an X server or Wayland, from a CSV in the form "timestamp value". And I would like to use at most gnuplot-lite.
And I still haven't installed anything other than bash, so I would like to parse the file in bash :D

Solution

Well, thanks to Stack Overflow the gnuplot part is easy. The more annoying part is bash and its inability to deal with floats, combined with the fact that I read load ratios (ranging from 0 to ... 4 since I have 4 CPUs).
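To make the float problem concrete, here is a minimal sketch of the workaround: let awk do the floating-point multiplication and hand bash a rounded integer. The helper name to_int is hypothetical and only here for illustration.

```shell
#!/usr/bin/env bash
# bash arithmetic is integer-only: $(( 1.18 * 100 )) is a syntax error.
# Hypothetical helper: awk does the float math, "%.0f" rounds to an integer.
to_int() {
    awk -v v="$1" -v s="$2" 'BEGIN { printf "%.0f\n", v * s }'
}

load=$(to_int 1.18 100)   # a load of 1.18 scaled by 100
echo $(( load + 1 ))      # the result is now safe for bash arithmetic
```

Once the value is an integer it can be used directly as an array index, which is all a histogram needs.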

$ cat plot_histo.sh 
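#!/usr/bin/env bash
# usage: [X=cols] [Y=rows] ./plot_histo.sh FILE [SCALE]
# FILE holds "timestamp value" lines; each value is multiplied by SCALE
# and rounded, so that every integer is one histogram bin.
declare -a HISTO
SCALE=${2:-1}
TOTAL=0
LINE=0
# size of the gnuplot dumb terminal, overridable from the environment
X=${X:-100}
Y=${Y:-40}
while IFS= read -r p; do
    LINE=$(( LINE + 1 ))
    DATA=$( echo "$p" | cut -d " " -f2 )
    # bash has no floats: awk scales the value and rounds it to an integer
    DATA=$( awk -v v="$DATA" -v s="$SCALE" 'BEGIN { printf "%.0f", v * s }' )
    TOTAL=$(( TOTAL + DATA ))
    HISTO[$DATA]=$(( ${HISTO[$DATA]:-0} + 1 ))
done < "$1"
# write one "bin count" line per non-empty bin
for i in "${!HISTO[@]}"; do
    echo "$i ${HISTO[$i]}"
done > histo.data
echo  "total sample:$LINE"
echo -n "average value:"
awk "BEGIN { print $TOTAL/$LINE }"

gnuplot -e "set style histogram rowstacked gap 0 ;
set terminal dumb $X $Y ;
set xtics $SCALE;
bin(x,width)=width*floor(x/width);
plot \"histo.data\"  smooth freq with boxes ;"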

Not very impressive :D Here it is with data ranging from 0 to n:

$ X=100 Y=40 ./plot_histo.sh ./badazz%probe_received.csv
total sample:1211
average value:13.3386

                                                                                                    
  500 +-----------------------------------------------------------------------------------------+   
      |    +   +    *******  +   +    +   +    +   +    +   +    +   +    +   +    +   +    +   |   
      |             *     *                                                "histo.data" ******* |   
      |             *     *                                                                     |   
  450 |-+           *     *                                                                   +-|   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
  400 |-+           *     *                                                                   +-|   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
  350 |-+           *     *                                                                   +-|   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
  300 |-+           *     *                                                                   +-|   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
  250 |-+           *     *                                                                   +-|   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
      |             *     *                                                                     |   
  200 |-+           *     *                                                                   +-|   
      |             *     *                                                                     |   
      |             *     *                                        *****                        |   
  150 |-+           *     *                                        *   *    *****             +-|   
      |             *     *                                        *   *    *   *               |   
      |             *     *                                        *   *    *   ******          |   
      |             *     *                                        *   *    *   *    *          |   
  100 |-+           *     *                                        *   *    *   *    *        +-|   
      |             *     *                                        *   *    *   *    *          |   
      |             *     *                                        *   *    *   *    *          |   
   50 |-******      *     *                                   ******   ******   *    *   *******|   
      | *    *      *     *                                   *    *   *    *   *    *   *    * |   
      | *    *      *     *                                   *    *   *    *   *    *   *    * |   
      | *  + * +    *   + *  +   +    +   +    +   +    +   + *  + * + *  + * + *  + * + *  + * |   
    0 +-----------------------------------------------------------------------------------------+   
      3    4   5    6   7    8   9    10  11   12  13   14  15   16  17   18  19   20  21   22  23

The bimodal shape comes from me launching a new series of probes during the experiment. The number of probes received per measurement campaign shows two modes: either ~7 measures per campaign with almost no loss, or around 20 probes per campaign with more spread.

A useful tool :D

But bash doesn't like floats and, well, CPU load is all about floats. How does it behave? Do we see a bimodal distribution from launching a measure?

$ ./plot_histo.sh ./badazz%load_15.csv 100
total sample:958
average value:118.201

                                                                                                    
  50 +------------------------------------------------------------------------------------------+   
     |                                             +         *                                  |   
     |                                                       *             "histo.data" ******* |   
     |                                     *                 *                                  |   
  45 |-+                                   *                 *                                +-|   
     |                                     *                 *                                  |   
     |                                     *                 *                                  |   
  40 |-+                                   *                 *                                +-|   
     |                                     *                **                                  |   
     |                                     *                **                                  |   
     |                                     *                **                                  |   
  35 |-+                                   *                **                                +-|   
     |                                     *                ***                                 |   
     |                                     *                ***                                 |   
  30 |-+                                 ***                ***                               +-|   
     |                                   ***                ***                                 |   
     |                                   ***                ***                                 |   
     |                                   ***                ***                                 |   
  25 |-+                                 ***                ***                               +-|   
     |                                   ***              *****                                 |   
     |                                   ***              *****     *                           |   
     |                                   ***              *****     *           **              |   
  20 |-+                                 ***              *****     *           **            +-|   
     |                               *   ***              *****     *          ***              |   
     |                               *   ****             *****    **          ***              |   
  15 |-+                             *   ****             ******   **          ***            +-|   
     |                               *   ****             ******   **          ***              |   
     |                           *   *   ****** **        ******   **        *****              |   
     |                           *   *   **** * **        ******** **        *****              |   
  10 |-+                         *   *   **** * **      ********** **        *****            +-|   
     |                           * ***   **** * **      ********** **  *  ** *****              |   
     |                           * ***   **** * ** **** ********** *** *  ** ******* **         |   
   5 |-+                        ****** ****** * ** * ******************** *************       +-|   
     |                          ****** ****** * ** * ******************** *************         |   
     |                          ************* ****** **********************************         |   
     |                          ************* * ** * **********************************         |   
   0 +------------------------------------------------------------------------------------------+   
     0                                            100                                          200

By adding a scale factor of 100, every measure expressed as a ratio of 1 becomes a ratio of 100, which creates bins of 1 percent of the ratio.

Likewise, multiplying by 10 gives bins of one tenth of the ratio, ten times wider than with 100 :D
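As a small worked example of that arithmetic, the sketch below prints, for one sample value, the bin it lands in and the bin width in the original unit (1/SCALE) for a few scale factors. The function name bin_table and the sample value 1.18 are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: the scale factor sets the bin width to 1/scale.
bin_table() {
    local value=$1 scale
    for scale in 1 10 100; do
        awk -v v="$value" -v s="$scale" \
            'BEGIN { printf "scale=%d bin=%.0f width=%g\n", s, v * s, 1 / s }'
    done
}

bin_table 1.18
# scale=1 bin=1 width=1
# scale=10 bin=12 width=0.1
# scale=100 bin=118 width=0.01
```

The bigger the scale, the finer the bins, and the more samples you need before the shape stops looking like noise.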

$ ./plot_histo.sh ./badazz%load_15.csv 10 
total sample:963
average value:11.7985

                                                                                                    
  300 +-----------------------------------------------------------------------------------------+   
      |                                            +                                            |   
      |                                                                    "histo.data" ******* |   
      |                                                                                         |   
      |                                                                                         |   
      |                                                   *****                                 |   
  250 |-+                                                 *   *                               +-|   
      |                                                   *   *                                 |   
      |                                                   *   *                                 |   
      |                                                   *   *                                 |   
      |                                                   *   *                                 |   
      |                                 *****             *   *                                 |   
  200 |-+                               *   *             *   *                               +-|   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
  150 |-+                               *   *             *   *                               +-|   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
      |                                 *   *             *   *                                 |   
  100 |-+                               *   *             *   *    *****                      +-|   
      |                                 *   *             *   *    *   *                        |   
      |                                 *   *             *   *    *   *    *****               |   
      |                                 *   *             *   *    *   *    *   ******          |   
      |                                 *   *             *   *    *   *    *   *    *          |   
      |                                 *   *             *   *    *   *    *   *    *          |   
   50 |-+                          ******   *             *   ******   *    *   *    *        +-|   
      |                            *    *   *             *   *    *   *    *   *    *          |   
      |                        *****    *   *             *   *    *   ******   *    *          |   
      |                        *   *    *   *    **********   *    *   *    *   *    *****      |   
      |                        *   *    *   ******   *    *   *    *   *    *   *    *   *      |   
      |                        *   *    *   *    * + *    *   *    *   *    *   *    *   *      |   
    0 +-----------------------------------------------------------------------------------------+   
      0                                            10                                           20

We do have a bimodal distribution on a computer doing nothing but measurement. When scrutinizing a computer for data, most people think of time series as the most important view, but frequency data are also an important tool, especially for non-linear phenomena: a change in the number of modes of a distribution indicates that somewhere on the system « something brought a perturbation ». It is useful for early detection of something going wrong, or going better, and thus an important tool when making time-oriented measurements.
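If you want to watch the number of modes without eyeballing the plot, here is a crude sketch in bash that counts peaks in the "bin count" format of histo.data. It is not part of plot_histo.sh: the function count_peaks, its threshold argument and the toy data are all assumptions made for illustration, and noisy data needs coarse enough bins before the count means anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of plot_histo.sh: count the peaks in
# "bin count" lines read from stdin (the histo.data format).
# Assumes non-negative integer bins. A peak is a bin with at least MIN
# samples and strictly more samples than both neighbouring bins
# (an absent bin counts as 0), so flat-topped peaks are ignored.
count_peaks() {
    local min=${1:-1} bin count cur prev next peaks=0
    local -a histo=()
    while read -r bin count; do
        histo[bin]=$count
    done
    for bin in "${!histo[@]}"; do
        cur=${histo[bin]}
        prev=0
        if (( bin > 0 )); then prev=${histo[bin-1]:-0}; fi
        next=${histo[bin+1]:-0}
        if (( cur >= min && cur > prev && cur > next )); then
            peaks=$(( peaks + 1 ))
        fi
    done
    echo "$peaks"
}

# toy data: two peaks (bins 1 and 3) with at least 10 samples each
printf '%s\n' "0 5" "1 50" "2 5" "3 40" "4 3" | count_peaks 10
# 2
```

After a run of the script, something like count_peaks 30 < histo.data gives a number you can compare from one run to the next. The threshold is arbitrary and has to be tuned to the sample size.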
