Too many knobs are not a good idea

Have you ever wired the electrical panel of a house and also developed software?

If so, you will know the basics of not burning your house down: Keep It Simple Stupid.

Electrical panels are built by spreading the load evenly across segments wired in parallel, then chaining smaller sub-segments in series.

The rule is to balance the load on every segment, and to isolate each segment and each sub-segment.

A typical modern house has about 10 sub-segments, each of which can be on or off. That is 2^10 = 1024 possible states, which is already quite a lot of complexity.
 
If you add equipment to the circuit, or if you misjudged your load compared to your input, you may:
- draw more power than the input can deliver (resulting in power loss);
- draw more power than your wires are sized for, overheating some cables and causing a local overload.




If electricians did not know their job, this would happen, resulting in unsafe configurations. One method to find them would be to switch every appliance in the house on and off to explore which configurations are safe and which are not. Happily, in 2016, this almost never happens in correctly built housing.

Something even worse can happen, though: short circuits. If someone, out of laziness, puts an unprotected segment directly on the mains and forgets to limit its power output, a wire can overheat so much that it starts a fire.



Well, electrical panel best practice is a sane example of how to avoid disaster by breaking a complex problem down into simple practices. Still, the «state space» described by the list of all potential configurations is 2^(number of switches). So 10 switches is still 1024 states.

Now let's talk about software. A piece of software like mysql has around 400 parameters. Some of them can take discrete values ranging from 0 to all your memory (in size); others pick from a set of predefined values.



The number of states describing a piece of software is vastly larger than the number of states achievable by the entire electrical circuitry of a whole building.
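
As a rough back-of-the-envelope comparison, here is a minimal sketch of that arithmetic. It wildly understates the software side, since it pretends every mysql parameter is merely on/off, when many actually range from 0 up to all your RAM:

```python
# State-space comparison: a house's electrical panel vs mysql's configuration.

# An electrical panel: ~10 breakers, each either on or off.
panel_states = 2 ** 10
print(f"electrical panel: {panel_states} states")  # 1024

# mysql: ~400 parameters. Even if each one were merely binary
# (a wild understatement), the configuration space dwarfs the panel's.
mysql_lower_bound = 2 ** 400
print(f"mysql (binary lower bound): about 10^{len(str(mysql_lower_bound)) - 1} states")
```

Exhaustively flipping breakers is feasible; exhaustively exploring a configuration space of that size is not.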

Worse: in computing, isolation is like sex for teenagers: everyone talks about it, hardly anyone does it.

Hence, in software that is memory-bound or CPU-bound, we often have parameters that antagonize one another.

In mysql, some memory parameters affect the main engine, others the memory used to serve each request. Some are global, some are per-connection. But memory is fixed in size. The job of a sysadmin is a bit like that of an electrician: using the knobs known as configuration parameters to balance the load on CPU, RAM and IO so that the software works at maximum speed.
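
A crude sketch of that balancing act, using a handful of real mysql memory knobs; the sizing rule below is the classic worst-case estimate (global buffers plus per-connection buffers times max_connections), not an exact accounting of what the server really allocates, and the numbers are made up for the example:

```python
GB = 1024 ** 3
MB = 1024 ** 2

ram = 16 * GB  # physical memory on the box

# Global buffers: allocated once, shared by the whole engine.
global_buffers = {
    "innodb_buffer_pool_size": 10 * GB,
    "key_buffer_size": 256 * MB,
}

# Per-connection buffers: every connection may allocate its own copy.
per_connection = {
    "sort_buffer_size": 2 * MB,
    "join_buffer_size": 2 * MB,
    "read_buffer_size": 1 * MB,
    "read_rnd_buffer_size": 1 * MB,
}
max_connections = 1500

# Classic worst-case estimate: globals + (per-connection total * max_connections).
worst_case = sum(global_buffers.values()) + sum(per_connection.values()) * max_connections
print(f"worst case: {worst_case / GB:.1f} GiB of {ram / GB:.0f} GiB of RAM")
if worst_case > ram:
    print("over budget: under load this box will swap")
```

Turn one knob up (say max_connections) and you silently steal memory from another (the buffer pool), which is exactly the antagonism described above.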

Software does not get underpowered or short-circuit. But wrongly set parameters can degrade its performance in a chaotic, dramatic way.

For instance: a memory access costs around 4 to 100 cycles when it hits the CPU caches, a few hundred cycles when it goes out to main RAM, and orders of magnitude more when it has to go to hard drive, SSD or network.

Your worst nightmare is swapping: running out of RAM and having to freeze/thaw data out of the silicon of the motherboard, down to disk.
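
To put rough numbers on that hierarchy (ballpark figures in the spirit of the classic «latency numbers every programmer should know»; exact values vary with hardware):

```python
# Rough, order-of-magnitude access latencies in nanoseconds.
latency_ns = {
    "L1 cache hit": 1,
    "main memory (RAM)": 100,
    "SSD random read": 150_000,
    "network round trip, same datacenter": 500_000,
    "disk seek (i.e. swapping)": 10_000_000,
}

l1 = latency_ns["L1 cache hit"]
for name, ns in latency_ns.items():
    print(f"{name:38s} ~{ns:>12,} ns  ({ns // l1:,}x an L1 hit)")

ram = latency_ns["main memory (RAM)"]
print(f"one swap to disk costs as much as ~{latency_ns['disk seek (i.e. swapping)'] // ram:,} RAM accesses")
```

One badly sized buffer that pushes the working set out to swap therefore does not slow things down by a few percent; it slows them down by several orders of magnitude.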

So, with 400 parameters that are not isolated from one another, configuration is impossible for a human.

Most wannabe consultants will say: I do not care, memory is so cheap we can always buy more. I will object that the L1/L2/L3 caches are not extensible, and they are what yields up to a 100x improvement in speed. Others will say: as long as it works, even slowly, it still works. I will object that slowing your requests down by a factor of 10,000 may make them time out.

The simple part of scalability/availability (mysql doc)
In fact, performance problems snowball. The more slow queries pile up and fill the buffers, the more your memory is encumbered with relics of past queries whose state is unknown.


Some sites, like mariadb's, will offer you configuration templates. Some companies can send you consultants. But there is another problem here, one that I do wish upon you: you can have success.

A misconception among IT developers, in my opinion, is that a perfect memory is a memory that remembers everything.

My theory is that a perfect IT system keeps a moving window of data, with precision that degrades with age and/or irrelevance. You want your data to stay within a constant amount of memory whatever the future holds. The only data that should be kept are the data that are relevant, and that is what information means.
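
A toy sketch of that moving window, in the spirit of round-robin databases such as RRDtool: recent samples are kept raw, older ones are averaged into coarser buckets, and anything beyond the horizon is dropped, so total storage stays constant no matter how long the system runs (the window sizes are arbitrary choices for the example):

```python
from statistics import mean

def degrade(samples, raw_keep=1000, bucket=60, coarse_keep=500):
    """Keep the newest `raw_keep` samples at full precision; average everything
    older into buckets of `bucket` samples and keep only the newest `coarse_keep`
    of those. Total storage is bounded by raw_keep + coarse_keep, forever."""
    recent, old = samples[-raw_keep:], samples[:-raw_keep]
    averaged = [mean(old[i:i + bucket]) for i in range(0, len(old), bucket)]
    return averaged[-coarse_keep:] + recent

# However much history you feed it, the result never exceeds 1500 slots.
history = list(range(100_000))
print(len(degrade(history)))  # 1500
```

The precision of old data degrades, but the memory footprint does not grow with success.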

Most companies I have worked with let their data grow in size at the same time as they let their database grow in number of entries. The growing number of users can be handled fairly easily (horizontal sharding/partitioning). The growing data requires understanding that databases are relational, which often means being careful to build partitions that limit the hops a request has to make from computer to computer, each of which can burn up to 1,000,000,000 cycles. That is coupling: the exact opposite of isolation.
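
The arithmetic behind that figure, under assumed but typical round-trip times and a ~3 GHz core: a single hop inside a datacenter is already millions of cycles of waiting, and a hop across regions gets you within reach of the billion-cycle mark.

```python
CPU_HZ = 3e9  # assume a ~3 GHz core

# Assumed, typical network round-trip times.
round_trip_s = {
    "same datacenter": 0.0005,                 # ~0.5 ms
    "cross-region / cross-continent": 0.150,   # ~150 ms
}

for hop, seconds in round_trip_s.items():
    print(f"one {hop} hop: ~{seconds * CPU_HZ:,.0f} cycles spent waiting")
# same datacenter:                ~1,500,000 cycles
# cross-region / cross-continent: ~450,000,000 cycles
```

A query that has to touch a few badly placed partitions in another region easily burns a billion cycles doing nothing but waiting.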


On top of this, you probably have web services that require operations to happen in a timely fashion. Most companies simply ignore capacity planning. They prefer «scalability». It is a generic term meaning they think they know how to double the hardware (and thus the operational costs) to gain 40% in throughput, ignoring the extra latency introduced in the process. Still, hardware and electricity remain far cheaper than workforce: a day of a coder's pay is worth a server. So why care?

So modern software has a nasty mathematical property by design: the more success you have, the more you are bound to double your costs for every 40% increase in the user base of your underpowered services.
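
Take that claim literally and the growth law falls out: if cost doubles every time the user base grows by 40%, then cost grows as users^(ln 2 / ln 1.4) ≈ users^2.06, i.e. roughly quadratically. A small illustrative calculation under exactly that assumption:

```python
from math import log

# Assumption taken from the text: doubling the hardware buys a 1.4x gain.
exponent = log(2) / log(1.4)  # cost ~ users ** exponent
print(f"cost grows as users^{exponent:.2f}")  # users^2.06

for growth in (2, 10, 100):
    print(f"{growth:>4}x the users -> ~{growth ** exponent:,.0f}x the cost")
# -> roughly 4x, 115x and 13,000x the cost respectively
```

Each new slice of users costs more to serve than the previous one, which is the diminishing return discussed below.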

Still from the mysql doc: it is in the HA groups that you add your servers for «scalability».


I know: the expenses you are doubling start out low. But it is still super-linear (roughly quadratic) growth ... inducing diminishing returns on the acquisition of new customers. And as Microsoft, google and facebook have proved, the software industry is about network effects: the more people use your software, the more value it has. It is a realm of natural monopoly and of diminishing returns, thus of prices that have to grow the more successful a piece of software is.




In all this I have not even dared to speak of the effect of a failure due to a timeout propagating through a web of interconnected (coupled) services. These local software failures can, by a snowball effect, dramatically trigger the loss of shared resources... like amazon's eastern zone :)



My conclusion is the following: for the simple reason of having too many knobs, software is hard to configure, and there is no silver bullet (distributed databases and nosql suffer from the same flaw). I took the example of mysql because it is a fairly commonly used piece of backend infrastructure. But on top of it you may also have routers, firewalls, load balancers, memcached....

In the computer industry, unlike electricians, we seem to ignore in our models the cost of our lack of isolation, and we create the equivalent of short circuits.

The too-many-knobs effect, coupled with the lack of isolation, triggers leaks of behaviour that allow our software to fail, in the worst case, in dramatic ways that propagate far enough to affect large units of work (datacenters, clouds...).

Hubris is our problem. The real problem is not that there are bad developers out there. It is much more that we believe in our own capacity to handle a lot of complexity. But some levels of complexity are simply not within reach of our brains.

The software industry is ultimately human-bound: bound by our capacity to hold a given amount of complexity/information in our brains.