Escape from EAV the Magento way

Featured Image

One of the differences between Magento eCommerce platform and lets say WordPress, when looked from developer point of view, is “avoid raw queries” approach. For a web application, Magento is massive system. His database, although not so massive but surely breathtaking with around 220 tables forces you to use “eye candy” EAV model to do even simple things.

For instance, if I were to tell you that I want you to retrieve 5 simple products from database that have price in the range of 200-600USD you would most likely spent 1-4 hours trying to work out the query. In the end you would be lucky if you even get the query that wont break on next Magento upgrade. Let me just remind you on some of the things you need to be careful in example above.

Your query would have to take in consideration the at least the following states: is my product visibility such that it can be seen in Catalog and search result, is my product enabled or disabled, is is out of stock, which website and store view has it been assigned, is it assigned to only one category and that category is disabled by some chance, does it have special price assigned, does it have promotion rules assigned, does it have content (descriptions) for multilingual site assigned.

All of those and more are questions that one has to take into consideration when doing raw database queries. It makes no sense to create query that fetches product and all its appropriate info (price, quantity, description…) if that product is disabled or assigned to category that is disabled. For instance, if you need “featured” product functionality and you manually create raw query that extracts one “featured” product to lets say home page. There you create nice block holding all the necessary info of the product with a link for lets say “Add to cart” or “View product”. Clicking one of those two link would give you most likely 404 if the product is disable or any other condition that enables the product to be shown is disabled.

So basically, raw SQL queries in Magento are a “no no” for most of the time. Too much work and you never know what they will change in future upgrades. Which brings me to this new trend of theirs which I like to call “Escape from EAV”.

Gates of HellEAV, also know as Entity Attribute Value was suppose to bee (or is) the next big thing in OOP. Anyhow, Magento and his way of EAV have two huge downfalls called LACK OF DOCUMENTATION and WHAT WILL THEY CHANGE IN NEXT UPGRADE. The other day I was working on a Magento shop that had 22 000 products. Almost all of them were simple products, just about 600 were grouped. The store was initially installed on Magento 1.2 version. Site was pretty much 95% complete from both graphic and development perspective. Anyhow, Magento version 1.3.1 was released and on client demand we decide to to upgrade. Important thing to keep in mind is that in version 1.3 Magento introduce “change in philosophy” concerning the EAV model.

Altough EAV sounds great (and really is great for most of the time), it can really slow things down with all those JOINS executed on database. So the Magento team decided to do a little flat tabling. Basically we now have massive data duplication in MySQL where data is taken from various tables and copied int one, the flat table. Flat tables were introduced for both Products and categories, in regards to various website and store views.

Basically Magento has the power to “on the fly” create tables and do the “magical” copying of data from various tables to the flat tables. I assume they were looking for a faster way to “assemble” Product objects in Magento which in turn should boost the speed of collection object i grew so found about. I can live with duplicated data in database, I mean I am not the one writing them down. But let me get back to real world scenario.

As I mentioned, store I have been working on had more than 20 000 products. Due to limitations of both MySQL and PHP failed to do Rebuilt that you can find in System > Configuration > Cache management. This failure caused corrupted data in database. This manifested with “empty” attributes (attributes not showing up in Layered navigation). I resolved the issue by “re saving” all of the 20 000 product with custom script I wrote. After that i noticed the data in flat tables got rebuild as well. This of course took several hours. Magento EAV model seems OK on paper, but with store that hold’s large number of products it simply does not work, at least not in way in which their collection objects are built. Changes introduced in form of flat tables in versions 1.3 should, to some extent, improve work with stores that have large numbers of products. However this improvement seems to come in form of “copy all data from scattered tables to single flat table”.

Solutions like these are nightmare when it comes to upgrading the store that already has large number of products in. System is expected to handle part of the job transparently but it failed, leaving the client with corrupted database.

So, a word of advice for upgrading an existing Magento store: NEVER do it on live site. Let professional developer transfer live site and database to his dev machine, or make a copy on some sub folder on live site. Magento’s extreme out of the box feature rich capabilities, open source philosophy and good marketing have made it extremely popular but be careful, free is such a a loose term.

I am really anxious to see the future path of EAV vs Flat model in Magento.

PS. This topic was discussed on two more following articles from my colleagues. Tomislav Bilic firstly introduced Magento MySQL database diagram back in 2008, while Mladen Lotar expanded this intro and provided more details at his Magento’s database layout and it’s EAV structure article at 2010.

Interested in hiring us?

Have a chat with us. You would be surprised how small changes can make your business even more successful.


22 comments

  1. I see in the post above that no-no raw SQL queries, but if I want to pull only data from Magento DB based on passed product ID of Magento to a script page can I use raw SQL queries in that script? As far I know nobody will upgrade Magento installation or will be aware about upgrade consequences.

    Thanks

  2. I guess they made a bad decision on using EAV. It is too heavy to query data, to heavy to write data. Too heavy to write queries.

    Man! Wonder if you want do to a BI job on it??

    I’m about to not get surprised anymore on how many popular systems I’m checking have weird stuff, because of “hmm.. it is not working well”.

    I can’t stop thinking: how a system that is supposed to be quick by nature (ecommerce) could implement such slow thing? And being such popular, nobody says “hey, stop with it”?

    Note that they are copying data to flat tables. Sounds a workaround for me. Then, why not just make it flat tables for the whole system? Easy and quick for everyone.

    *Sigh*

    A sad user.

  3. They (the Magento developers) have probably not been using the STRAIGHT_JOIN to do their inner joins in Mysql. Had they tried to use that optimization, flattening the tables would be unnecessary. Read the docs on how it replaces an INNER JOIN and then TEST IT. This one change makes Mysql performance much more reasonable with large datasets being self joined.

  4. They should better think about using a document based db like MongoDB to store Magentos monster objects (catalog/products, customers+adresses, categorys+relations) – 1 request and everything is in place…

  5. @t-low
    How good are views at handling a dozen self joins over 8 tables to covert rows into columnar data? Essentially making a relational pivot table spread over 8 tables?

  6. Why they haven’t designed views instead of creating flat tables? It could advoid having a lot of data redundant and really large DB Dumps.

  7. EAV is a simplistic idea that’s been around a long time. Understanding it is not the problem. Getting simplistic single queries out of an EAV database is easy. Getting a listing for reporting on large ranges of product; however, is easy AND horrifically memory intensive bloating time required for data acquisition from seconds to hours.

  8. And the performance of EAV got better in a negative direction when we went to 1.4.1.1. It’s not scaling well. Dataflow operations that took 30 minutes under 1.3.1.1 now take 3 hours. The upgrade adds another 100 tables. I recommend lots of hardware and lots of mysql tinkering.

  9. Calling customer model in a webservice used by a point of sales wa s a big problem and we were just using it for getting some basic info, we had to create few functions in a new model using raw sql queries and everything went smooth again.

  10. Really a good post.

    Yes, my experience with magento database not so good too. For a store, I installed, upgrade, again install, fix … more than 10 times.

  11. I am looking at using Magento to build a multiple stores website and move from IXXO cart. I currently have 47,000+ products and eventually I will have approx. 1.2 million.

    In addition to tools and accessories we sell service parts for tools. Each manufacturer we carry has roughly 20,000 service part products each. With 50 manufacturers we carry that’s over 1 million service parts alone. Along with the service parts we will be adding .pdf breakdowns of the tools and providing a table below the .pdf image of all the parts associated with it and the user will be able to add them to the cart from that page.

    Any idea if Magento can handle this database? Or should I ‘shop’ for another solution? B)

  12. I am working on a plugin that integrates magento with a POS system. The db performance is terrible largely because of their data abstraction model. I agree their Collections objects are not efficient at all from a db standpoint. There is so much more that needs to be done with Magento – I am starting to think this is not ready for large stores (>2500 products).

  13. @Branko Ajzele: Look at http://www.acris-antibodies.com we have many products and filters.
    I update the products with a cron scripts all 5 minutes (only modified products) an rebuild every hour the searchindex (15min run this script :-( and i wrote a extension for this, because when you generate the searchindex the search on frontpage dont find any product *magento*)

  14. @Tomislav – I agree with you all the way, my friend! My use of “ultimate” was meant without the superlative connotation. Rather, I meant to denote that one of the side effects (goals?) of EAV is a maximum degree of normalization. You are spot-on with the assertion that “EAV turns the MySQL in something that it isn’t supposed to be” – it goes against sensibility to have such complexity when trying to represent data at its simplest, doesn’t it?

    I find EAV to be a huge PITA, and so must Varien, given that they have implemented some rudimentary data warehousing with the Flat Catalog (improvements?) in the 1.3 branch.

    I also agree that today’s underlying data storage technologies will evolve into something more intuitive, just as the programming languages have done. It will be much smarter people than I though who will figure out the most graceful way to represent and store non-static objects for the breadth of Web apps out there… For sure we’ll look back in ten years and say, “I can’t believe we used to work that /hard/!”

  15. @Ben: “EAV is the ultimate in data normalization”

    This could be another topic, but EAV is just a way to go over MySQL disadvantages. Relationship databases are simply not created for this purposes. I have a feeling that EAV turns the MySQL in something that it isn’t supposed to be.

    In the years to come, I expect a new database to replace MySQL in open-source world. Amazon SimpleDB maybe?
    http://aws.amazon.com/simpledb/

  16. @inchoo & branko – Another good post, thanks as always for sharing your opinion. I think the downloads area of Magento’s site should use the html blink tag and some sweet animated flame GIFs to tell people to never never never upgrade a live site.

    @pct:
    - I agree that there is a ton of data in Magento, but I disagree completely that EAV is large because data is duplicated; EAV is the ultimate in data normalization.

    - Magento modules have migration files, so running the upgrade should upgrade everything whether by SSH & pear or via MagentoConnect panel (which just runs pear anyway)

    - As far as SSH installs & upgrades go (they really are the best way):
    Setting up a Staging Server: http://www.crucialwebhost.com/blog/setting-up-a-magento-staging-area/ (There are a number of great tips here)

    Miscellaneous info about upgrade permissions and PHP environment:
    http://www.magentocommerce.com/boards/viewreply/61020/

  17. This is not a comment about EAV in Magento, personally I find the EAV db okay… Biggest complaint I have of it is, that the volume of data in it is huge because so many things are duplicated in it. Especially now that flat thing came. Anyway, I’ll comment about the upgrade part.

    Not long ago I upgraded one store with a bit over 30000 products from Magento 1.2.1 to Magento 1.3.1. The upgrade went okay, but only because I knew that it might not be an easy one (I had previously upgraded one site from 1.1.6 to 1.2.1 and that was a rough one).

    There are few steps I always follow while upgrading.
    1. No live upgrades (or at least close the site so that no one except you can view the frontend or backend). If someone else has access to the site and does page loads (in frontend or in backend), the database will most likely be corrupted after update.
    2. Disable cache before doing anything else. Clear it and delete all the files beforehand. Clear all the logs also. And sessions.
    3. Modify php.ini or .htaccess, increase the maximum execution time to something like few days and also increase the memory limits to as high as you dare. You might as well increase all the other related numbers. The default limist in php.ini were way too low to do upgrade or the flat db rebuild thing.
    4. I haven’t found a way to start the upgrade process from shell. Upgrading the files is easy with pear, but I have no idea how to upgrade the db. And I’ve seen one or two failed upgrades (with older versions like 1.1.2 ->1.1.5) if I go first to the frontend and not to the backend. This is why I always log into the backend before doing the upgrade and refresh the page there (db gets upgraded).

    Oh and one last thing. I try to upgrade the Magento sites as often as possible. I think all the upgrades so far have brought some db upgrades and I think that keeping the db version as recent as possible will avoid painful transitions later on. That was one of the reasons 1.1.6 -> 1.2.1 was painful. Another upgrade where I first upgraded to 1.1.8, then to 1.2.0 and after that to 1.2.1 went really well. This is of course only possible if your custom modules don’t touch the db directly but only through the number of layers Magento provides for db abstraction.

  18. Hi Daim,

    Seems like you have really, really large store. Would love to see how this is working on Magento :)

    Wish you all the best in finding the best solution.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <blockquote cite=""> <code> <del datetime=""> <em> <strike> <strong>. You may use following syntax for source code: <pre><code>$current = "Inchoo";</code></pre>.