While the fool looks at the average, the seasoned system engineer might want histograms.
A histogram often reveals more than meets the eye; one of the most important things we look for is multimodality.
When everything happens randomly in a distributed system, metrics spread according to a statistical law which, depending on the randomness at play, will often result in one peak value and some spread in the histogram.
If you are interested in early detection of agents leaving the herd, or in the impact of a modification on the metrics, non-homogeneity in causality will show up in your histogram.
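If you want to see what such a multimodal signal looks like before trusting the tool on real data, you can synthesize one; a minimal sketch (the file name synthetic.csv and the two modes at 5 and 15 are made up for illustration), which the script below can then plot:

$ awk 'BEGIN {
    srand()
    for (i = 1; i <= 1000; i++) {
        mode = (rand() < 0.5) ? 5 : 15            # two distinct causes at play
        noise = rand() + rand() + rand() - 1.5    # crude bell-shaped noise
        printf "%d %.2f\n", i, mode + noise       # "timestamp value" lines
    }
}' > synthetic.csv

Fed to the histogram script, the two causes show up as two distinct peaks instead of one misleading average around 10.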
User story
I want to plot an ASCII histogram, because I am on a server without Xserver/Wayland, from a CSV in the form "timestamp value". And I would like to use at most gnuplot-lite.
And I still haven't installed anything other than bash, so I would like to parse the file in bash :D
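For reference, the input lines look like this (timestamps and values invented for illustration):

1700000000 0.37
1700000001 0.42
1700000002 1.93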
Solution
Well, thanks to stackoverflow, the gnuplot part is easy. The more annoying part is bash and its inability to deal with floats, plus the fact that I read ratios of load usage (ranging from 0 to ... 4, since I have 4 CPUs).
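The trick is to hand the float math to awk and round the result with printf so it can serve as an integer bin index; for example (3.7 is just an illustrative value):

$ echo $(( 3.7 * 10 ))                                    # fails: bash arithmetic is integer-only
$ printf "%.0f\n" "$( awk "BEGIN { print 3.7 * 10 }" )"   # awk does the float math, printf rounds
37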
$ cat plot_histo.sh
declare -a HISTO
SCALE=${2:-1}   # optional scale factor, defaults to 1
TOTAL=0
LINE=0
X=${X:-100}     # plot width, overridable from the environment
Y=${Y:-40}      # plot height
while IFS= read p; do
    LINE=$(( LINE + 1 ))
    # take the value field of the "timestamp value" line
    DATA=$( echo $p | cut -d " " -f2 )
    # bash only does integers: awk does the float scaling, printf
    # rounds the result so it can be used as an array index
    DATA=$( printf "%.0f" $( awk "BEGIN { print $DATA * $SCALE }" ) )
    TOTAL=$(( TOTAL + DATA ))
    HISTO[$DATA]=$(( ${HISTO[$DATA]:-0} + 1 ))
done < $1
for i in "${!HISTO[@]}"; do
    echo "$i ${HISTO[$i]}"
done > histo.data
echo "total sample:$LINE"
echo -n "average value:"
awk "BEGIN { print $TOTAL/$LINE }"
gnuplot -e "set style histogram rowstacked gap 0 ; set terminal dumb $X $Y ; set xtics $SCALE ; plot \"histo.data\" smooth freq with boxes ;"

The gnuplot part works because smooth freq sums the counts sharing the same x value, so the pre-binned histo.data plots directly as boxes. Not very impressive :D Here it is with data ranging from 0 to n:
$ X=100 Y=40 ./plot_histo.sh ./badazz%probe_received.csv
total sample:1211
average value:13.3386
[gnuplot dumb-terminal histogram: y up to 500 samples per bin, x from 3 to 23, one tall narrow peak near 7 and smaller, more spread peaks between 15 and 21]

The bimodal comes from me launching a new series of probes on the experiment: the initial data of probes received per measurement campaign shows two modes, either ~7 measures per campaign with almost no loss, or around 20 probes per campaign with more spread.
A useful tool :D
But bash doesn't like floats, and well, CPU load is all about floats, so how does it behave? Do we see a bimodal from launching a measure?
$ ./plot_histo.sh ./badazz%load_15.csv 100
total sample:958
average value:118.201
[gnuplot dumb-terminal histogram: y up to 50 samples per bin, x from 0 to 200, a sharp peak at low load and broader, more spread structure around and past 100]

By adding a scale factor of 100, I turned all the measures, expressed as a ratio of 1, into a ratio of 100, hence creating bins of 1 percent of the ratio.
Hence, by multiplying by 10 instead, I make the bins 1/10th of the ratio in size :D
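To make the bin arithmetic concrete, here is where a single (made-up) load reading of 0.37 lands depending on the scale factor:

$ printf "%.0f\n" "$( awk "BEGIN { print 0.37 * 100 }" )"   # SCALE=100: bin 37 of the 1%-wide bins
37
$ printf "%.0f\n" "$( awk "BEGIN { print 0.37 * 10 }" )"    # SCALE=10: bin 4 of the 10%-wide bins
4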
$ ./plot_histo.sh ./badazz%load_15.csv 10
total sample:963
average value:11.7985
[gnuplot dumb-terminal histogram: y up to 300 samples per bin, x from 0 to 20, two dominant peaks side by side below x=10 plus smaller bumps further right]

We do have a bimodal on a computer doing nothing but measurement. When scrutinizing a computer's data, most people think of time series as the most important view, but the frequency view is also an important tool, especially for non-linear phenomena: a change in the number of modes of a metric indicates that somewhere on the system « something brought a perturbation ». It is basically useful for early detection of something going wrong (or better), and thus an important tool when making time-oriented measurements.
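That detection could even be automated; a crude sketch, reusing the histo.data file the script already writes, that counts local maxima as a rough number of modes (boundary peaks are missed, and noisy bins would need smoothing first):

$ awk 'BEGIN { peaks = 0 }
       NR > 2 && prev > pprev && prev > $2 { peaks++ }   # the previous bin was a local maximum
       { pprev = prev; prev = $2 }
       END { print peaks, "mode(s)" }' histo.data

A jump in the reported mode count between two runs is exactly the « something brought a perturbation » signal described above.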