This is a guest post, originally published by Philippe Humeau, who was kind enough to translate it into English and allow us to publish it here. I found this post awesome and really wanted it to be shared with a much broader, English-speaking audience.
NBS System is a managed hosting company working mainly with Magento so, to put it short, they host a lot of Magento sites (~200). They also maintain a blog about Magento optimization (Wikigento), and this is a translated post from it:
Here is a small collection of the best support and sales phone calls, along with other « real life » experiences. The worst questions, the silliest ideas, the best intentions hybridized with the worst technical solutions: just see for yourselves.
PS: We always laugh with no bad intentions. Every job has its stakes, constraints and problems, and nobody's difficulties are to be laughed at, but our day-to-day life can be brightened by some juicy results! We also make mistakes, who doesn't 🙂
Here are the best contributions to the “Magento Darwin Awards contest”:
Oscar for the most brilliant idea: The S.O.D (“The Slider O’ Death”)
Brilliant, simple and efficient concept. Just imagine the homepage of a fashion brand website.
At the bottom of it, 5 sliders you can move to pick the product, size, color and type you want. E.g.: shoes, blue, suede model, size 8.5, adult male, etc.
Okay, now just imagine: 15 product types, 6 sizes, 6 product models on average, 12 colors, 6 types (Adult M/F – Kids M/F – Teenager M/F): 15 × 6 × 6 × 12 × 6 = 38 880 possibilities… You move one slider, this causes 38 880 SQL requests… Actually a bit less, because you can only move one category at a time.
In order to make it a *real* server killer, the developer had a good intention: Ajaxify all of this, so the database server can die silently in the background while trying to send back the results. You move the size slider from left to right, 6 positions, and you take the 38 880 SQL requests. Of course, one could have added a delay or a cache system to make the thing a bit lighter but, guess what…
Now just imagine 100 people connected at the same time, playing with this awesome feature. Or just Kevin, 12 years old, grabbing daddy’s mouse and dragging a slider quickly from left to right…
Back office talk: Oops, the database server died, any idea what happened? The logs show it died while trying to answer 1 million requests per second…? Are we hosting Amazon and nobody told me???
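For the curious, here is roughly what the skipped delay and cache could look like. This is only a minimal sketch, written in Python rather than the site's actual PHP/JavaScript, and every name in it (`products_for`, `Debouncer`, the 0.3 second delay) is a hypothetical stand-in, not something taken from the real site. The idea: only the slider position that survives a short quiet period reaches the database, and a position that was already queried is answered from the cache.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def products_for(size):
    """Hypothetical stand-in for the expensive SQL query behind the slider.

    lru_cache keeps the result, so asking for the same size again
    does not run the body a second time.
    """
    return f"products in size {size}"


class Debouncer:
    """Forward only the last value of a burst of slider events.

    Timestamps are passed in explicitly (seconds) so the behavior is
    easy to follow and to test without real waiting.
    """

    def __init__(self, delay, action):
        self.delay = delay        # quiet period required before acting
        self.action = action      # called with the last pending value
        self._pending = None
        self._last_event = None

    def event(self, value, now):
        """Record a slider move; nothing is sent to the backend yet."""
        self._pending = value
        self._last_event = now

    def tick(self, now):
        """Run the action if the slider has been quiet for `delay` seconds."""
        if self._pending is not None and now - self._last_event >= self.delay:
            value, self._pending = self._pending, None
            return self.action(value)
        return None
```

With `Debouncer(0.3, products_for)`, Kevin sweeping the size slider across its 6 positions in a quarter of a second would cost one query for the final position instead of six bursts, and coming back to that position later would cost none.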
You liked the S.O.D concept? Okay, here is a real contender: The O.M.O.D (“On Mouse O’ Death”)
It only ranks second since someone else had the original concept first, but it deserves a place in the pantheon since this method is far more efficient at creating a D.O.S situation. No need to click anymore!!!
Take almost the same concept, but with an On Mouse Over to send the SQL requests… You move your mouse around a block of products and “dynamically” (or is it “dramatically”?) this loads the products of the category to populate the main page space…
10 categories, 200 products per category. Make one full sweep and this loads 2000 products from the database; now move quite fast. YESSSS! It works, no cache mechanism, we can D.O.S it without any clicks!
Back office talk: Does anyone understand why the database server, usually asleep with Magento, is trying to commit suicide!? I told you guys before, never feed a database server with +1/4 spin electrons, it makes them sick!
The Big Bad Cron: 2 GB should be enough for everybody, almost…
Crons, our friends, our worst nightmares. The word alone is enough to make a Level 1 support guy pull the plug.
Usually, they run automatic imports of flat CSV files or batch processing of whatever files are needed. These scripts consume more and more RAM and CPU until exhaustion, because they never meet an effective exit condition or just loop insanely over the same lines.
One or two gigs later, the server raises an automated alarm saying something like:
“Please, can you disable this stupid bunch of code before I trigger
some friendly self-defense mechanism like kill -9?
Yours truly, the server.”
Of course, later on, you propose to limit the memory a PHP process can consume to a far more reasonable value, because this limit applies to every PHP process and things can get very, very dirty. As the developer feels it is an unnecessary precaution, he codes the killer cron another way, including some tricky optimization that consumes CPU instead of RAM…
RAM to CPU to Trade Off. I mean, when you see the function name, you already feel this is going to be fun… Well, the server was about to reach a load of 80 (100% load = 8 on this 8-core server).
Back office talk: Guyz, I’m trying to get a shell on the server but it doesn’t answer… Any ideas? Has it gone for lunch?
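For what it's worth, a CSV import does not have to eat RAM or loop forever. The sketch below is a hedged illustration in Python (the real crons were PHP, and `import_rows`, `handle_batch` and the batch size are names and values invented for this example). It reads the file as a stream, keeps at most one batch of rows in memory, and stops as soon as the reader has nothing left to give, which is about as clear as an exit condition gets.

```python
import csv
import io
from itertools import islice


def import_rows(csv_file, handle_batch, batch_size=500):
    """Import a CSV file in bounded batches and return the row count.

    Only `batch_size` rows are held in memory at a time, and the loop
    ends when the reader is exhausted (an empty batch).
    """
    reader = csv.DictReader(csv_file)
    total = 0
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:          # explicit exit condition: no more rows
            break
        handle_batch(batch)    # e.g. insert the rows, then forget them
        total += len(batch)
    return total


# Tiny in-memory example; a real cron would open the CSV file instead.
sample = io.StringIO("sku,qty\nA-1,3\nA-2,5\nA-3,1\n")
imported = import_rows(sample, handle_batch=lambda batch: None, batch_size=2)
```

Since each row is read exactly once, the script cannot "loop insanely on the same lines", and memory use depends on the batch size rather than on the size of the file.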
We also make mistakes!
As we are no more perfect than anyone else, we sometimes make mistakes too.
We decided to use a self-defense mechanism against those « Killer crons ». Sysfence was chosen to watch resource consumption: if a process is about to kill a server by consuming too much, it tries to stop it; if that is not possible, the process gets a “friendly” kill, and if it tries to stay, a quite virile “kill -9” (far less friendly).
The catch was, when the swap space started to fill up, Sysfence considered it a problem (which it is) and tried to restart Apache. Actually, Apache understood that resistance is futile (and punished by a kill -9) and restarted. But the swap doesn’t instantly free the allocated space, so Sysfence saw the result was not good and… restarted Apache again… and again… and again…
Back office talk: Is it normal that our customer’s website is… blinking? I’m there, oops, I’m there, oops…
(Of course, the fix is to leave a bit of time, like 10 seconds, before restarting the process again.)
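The "leave it 10 seconds" fix can be sketched as a small cooldown guard. This is an illustration in Python under stated assumptions, not Sysfence configuration: `RestartGuard` and its `clock` parameter are hypothetical names, and the 10 second value simply echoes the one mentioned above.

```python
import time


class RestartGuard:
    """Allow a restart only if the previous one is old enough."""

    def __init__(self, cooldown_seconds=10.0, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock            # injectable so it can be tested
        self._last_restart = None

    def should_restart(self, unhealthy):
        """Return True if a restart should be triggered right now."""
        if not unhealthy:
            return False
        now = self.clock()
        recently_restarted = (
            self._last_restart is not None
            and now - self._last_restart < self.cooldown
        )
        if recently_restarted:
            # Swap has not had time to drain yet: do not restart again.
            return False
        self._last_restart = now
        return True
```

The watchdog would call `should_restart(swap_is_filling)` on every check and only restart Apache when it returns True, which gives the swap a chance to free itself between two attempts instead of making the site blink.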
My Java is beautiful
This is not directly related to Magento, but one of our customers uses a very user-friendly catalog presentation. You know, the “just like holding a real paper catalog in your hands” kind of thing. This beautiful piece of software comes with a lot of “must_have” features, like the ability to do a 3° counterclockwise rotation. So handy…
The neat feature is that the catalog is able to “auto scale” to the browser height & width, providing an always perfectly scaled catalog. Once again, the intention was good; the implementation was, how to say, not that clever.
Each time someone comes with their own parameters (my browser isn’t full screen, so my usual resolution is 1211*940, which is not yours I bet), the server has to recalculate all the displayed images to make them fit the browser. Just add to this a beautiful Java bug that never triggers the garbage collector, put 10 000 people per day in front of the “paper-like catalog” and you have a server allocating ~200 MB per minute…
These servers are brave ones and only died after 80 GB of swap had been allocated, after 6 and a half hours… No solution here: the product’s vendor has disappeared (which is not really a surprise under Darwin’s laws) and we still have to kill the JVM on an hourly basis.
I’m the supervisor!
This is the story of a programmer whose regular processes (don’t say the word Cron, please) are consuming too many resources. Sounds familiar, no? Well, this one is a bit different. As we don’t want to raise the php_memory_limit, even at night, above a reasonable value (and yes, reasonable is not the same value to programmers and to sysadmins), the guy found a way.
He tests the server load with a PHP daemon (yes, this is starting to be fun) on a regular basis, and when the load gets low, he launches his batch processing. If the processing gets too resource-hungry, he stops it, nicely.
When we say regular basis, we wouldn’t have thought he was planning on testing the load every 35 ms… So every 35 ms, this process wakes up and launches a dramatically “not optimized” free-resource check, consuming a lot of CPU and RAM, like 10 times what a C process would need to do the same. And as the PHP daemon needs more than 35 ms to fully process the task…
This way of proceeding, at this frequency, is quite CPU-intensive itself, to say the least.
Back office talk: WHAT IS THAT ®#¤! “batch_supervisor” which is triggering … oops. Connection reset by peer. 100% packet loss.
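To be fair, the supervisor idea is not absurd, only the frequency is: a check every 35 ms means almost 29 checks per second. Below is a minimal sketch of a calmer version in Python (not the developer's PHP daemon). The function names, the 30 second interval and the "half of the cores" threshold are all assumptions made for this example; `os.getloadavg()` is the standard Unix call for the load average.

```python
import os
import time


def read_system_load():
    """Return the 1-minute load average (Unix only)."""
    return os.getloadavg()[0]


def load_is_low(load, cores, max_ratio=0.5):
    """A load of `cores` means 100%; call it low under `max_ratio` of that."""
    return load / cores < max_ratio


def wait_for_quiet_server(cores, read_load=read_system_load,
                          interval_seconds=30.0, max_checks=120,
                          sleep=time.sleep):
    """Poll the load at a sane interval until it is low.

    Returns True when the batch may start, or False after `max_checks`
    attempts, so the supervisor itself cannot run forever.
    """
    for _ in range(max_checks):
        if load_is_low(read_load(), cores):
            return True
        sleep(interval_seconds)   # sleeping costs nothing, unlike polling
    return False
```

On the 8-core server from the cron story, `wait_for_quiet_server(8)` would check once every 30 seconds and give up after an hour, which is roughly a thousand times fewer wake-ups than the 35 ms version.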
My slave is beautiful
Hello. I’m Mrs. Doe from (a major) company.
Hello, I’m Philippe, what can I do for you?
[…] I’d like to know when you can come to our office to set up our servers? […]
Oh, I’m very sorry, we don’t do that kind of thing. We only optimize our own customers’ architectures, not those of your current managed hosting company, which is a competitor of ours. (That would be like shooting a nuke at our own foot, no?)
But wait, you will be paid for it. *450 € per day!* (emphasis). But you have to be here tomorrow and for the next 5 days in a row.
Oh, my mistake, I think I didn’t make it clear. We don’t do this kind of work, madam.
Even if we did (and we don’t), a Magento expert, whether a developer or an admin, would never charge such a price. What’s more, you can’t call someone one day and ask them to be there the next morning; we’re an expert company, people have schedules.
Listen, I can see you are making no effort, you haven’t listened to our generous proposal and I think we won’t work together!
Damn, I’ll get fired for having refused such an amazing offer…
May I post you my site ?
This is the story of a guy wanting to put his customer’s site online. After trying unsuccessfully for 4 hours, he calls L1 support and announces:
Your system is buggy, it stops the transfer after only 5 MB and I just can’t put this site online.
Okay, can you please provide some details, like which protocol you used to put this site online? FTP, SFTP, SCP or SVN?
Listen, I’m just telling you it stops transferring after 5 MB, this is not about the protocol.
Sorry, this could help us find where the problem is, so would you please just tell me which protocol you used? I swear I won’t bother you with this question anymore afterwards.
I’m using OTRS.
Our Level 1 ticket submission tool???
But it’s not made to publish a website. The upload field is there so you can optionally send us a screenshot of the problem, but you have to use the provided credentials along with a protocol dedicated to file transfer, like SFTP or SVN.
You swore about the protocol!
Grand Jury Special Prize for the tricky-to-find glitch: I SVN, you SVN, I killed the servers.
This time we are talking about a quite famous brand running a special promotion and mailing campaign. 20 days before the campaign, the project leader calls the web agency and says: “Can you please prepare a special home page and make 2 or 3 modifications?”. “No problem, madam”.
The special sale starts:
- 500 connected people, good but slow to load… ?
- 1 000 connected, oops, load is rising fast on all 3 servers
- 1 500 mayday mayday, server down, I repeat server down !
We look for the problem: “Have you changed anything today or recently?” “No, nothing, all is nominal.”
2 hours later, we found the problem. The web agency had overwritten a file named /var/www/eu/app/etc/use_cache.ser with its own local version. Guess what, this file controls the cache behavior, and guess what, developers usually switch the caches off to instantly see the changes when they modify a site (which definitely makes sense). But when you upload this file to the server and disable all of Magento’s cache mechanisms, this is a very bad situation for the servers…
Back office talk: chmod 500 /var/www/eu/app/etc/use_cache.ser. Won’t happen again, believe me.
My home is slow…
« In the developer’s mind »: Okay, I take my 6 000 products, I make a loop, and for each product on my home page, I launch my amazing loop to find among the 6 000 the one I have to load, and I fill the 200 slots on my home page.
Damn, it’s slow, am I hosted on a Pentium 3 ???
Back office talk: Multiplication is quite a tricky concept. I’ll take it slow for you. 6000*200*[nb of users] means that …………
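To take it even slower, here is the multiplication spelled out as a small Python sketch (the original was PHP on Magento; the catalog layout, the `sku` field and both function names are invented for the illustration). The first function does what the developer describes; the second builds an index once and then looks each slot up directly.

```python
def fill_slots_nested(catalog, wanted_skus):
    """The 'amazing loop': scan the whole catalog for every slot."""
    comparisons = 0
    slots = []
    for sku in wanted_skus:
        for product in catalog:
            comparisons += 1
            if product["sku"] == sku:
                slots.append(product)
    return slots, comparisons


def fill_slots_indexed(catalog, wanted_skus):
    """Build a lookup table once, then fetch each slot directly."""
    by_sku = {product["sku"]: product for product in catalog}  # one pass
    return [by_sku[sku] for sku in wanted_skus]                # one lookup per slot


# 6 000 products in the catalog, 200 slots on the home page.
catalog = [{"sku": f"SKU-{i}"} for i in range(6000)]
wanted = [f"SKU-{i * 30}" for i in range(200)]

nested_slots, comparisons = fill_slots_nested(catalog, wanted)
indexed_slots = fill_slots_indexed(catalog, wanted)
```

The nested version performs 6000 × 200 = 1 200 000 comparisons per page view, while the indexed one walks the catalog once (6 000 steps) and then does 200 lookups for the same result. Multiply either figure by [nb of users] and the Pentium 3 is off the hook.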
Published by Inchoo with the authorization of Wikigento, NBS System and the author.