Working with large Magento collections

Working with large Magento collections

Magento collections represent a very convenient way of grouping models together. We can grab a group of models from database with little to none sql code thanks to special methods which collection model provides. Collections can get really big (new products, customers, categories, lots of attributes etc.) and become really tricky to work with. Let’s see what can be done to make large collections more manageable.

Let’s say we want to change the first name of every customer on our site to uppercase. One would probably do something like this:

$customers = Mage::getModel('customer/customer')->getCollection()->addAttributeToSelect(array('firstname'), 'inner');
 
foreach ($customers as $customer) {
$customer->setFirstname(strtoupper($customer->getFirstname()));
$customer->save();
}

The above code works perfectly well for smaller collections, but with large ones the script will eventually run out of memory and produce the following error:

Fatal error: Allowed memory size of X bytes exhausted (tried to allocate X bytes) in magento_site\lib\Varien\Data\Collection.php on line 550“.

Why is that so?

Well, collection is basically an array of objects, in this case an array of customer objects. The more objects (in this case customers) we have and the more attributes we add to selection, the bigger the resulting array will be, which leads to high memory usage.

Solution to this kind of problem is implemented in Magento in the form of Mage_Core_Model_Resource_Iterator model. It enables us to grab data from database one by one, which is in contrast to loading all results at once like we did in example above. Main part of the iterator is method walk() which needs two mandatory parameters to work, collection query string and callback method(s).

If we look at the method implementation:

public function walk($query, array $callbacks, array $args=array(), $adapter = null)
{
$stmt = $this->_getStatement($query, $adapter);
$args['idx'] = 0;
while ($row = $stmt->fetch()) {
$args['row'] = $row;
foreach ($callbacks as $callback) {
$result = call_user_func($callback, $args);
if (!empty($result)) {
$args = array_merge($args, $result);
}
}
$args['idx']++;
}
 
return $this;
}

we can see that it takes provided query string, executes it, fetches results one by one and sends them to the callback method. Result which is passed to callback method is placed in an array $args, which holds data for each object in collection. In callback function we can access the data with $args[‘row’].

Here is what our example would look like using iterator:

public function uppercaseAction()
{
// get customer collection
$customers = Mage::getModel('customer/customer')->getCollection()->addAttributeToSelect(array('firstname'), 'inner');
// call iterator walk method with collection query string and callback method as parameters
Mage::getSingleton('core/resource_iterator')->walk($customers->getSelect(), array(array($this, 'customerCallback')));
}
 
// callback method
public function customerCallback($args)
{
$customer = Mage::getModel('customer/customer’); // get customer model
$customer->setData($args['row']); // map data to customer model
$customer->setFirstname(strtoupper($customer->getFirstname())); // set value of firstname attribute
$customer->getResource()->saveAttribute($customer, 'firstname'); // save only changed attribute instead of whole object
}

Things are fairly simple here. We define collection and pass its sql query to walk method of iterator. Iterator than executes given query and calls customerCallback() method on each row returned from database. In callback method we instantiate customer model, map data from $args[‘row’] to the model and change its firstname property to uppercase.

One thing to take care about is object saving. Since we are saving lots of models we want our script to be as efficient as possible. Using “classic” save on model is pretty heavy operation which takes time and resources. Since we are changing only one attribute it is best to use saveAttribute() method which is much faster.

Thats it, no more “out of memory” error :).

Happy coding!

You made it all the way down here so you must have enjoyed this post! You may also like:

How To Connect Google Analytics 4 To Magento 2 Bojan Mareljic
Bojan Mareljic, | 36

How To Connect Google Analytics 4 To Magento 2

3 best open-source eCommerce platforms in 2021 Zrinka Antolovic
Zrinka Antolovic, | 8

3 best open-source eCommerce platforms in 2021

Flat tables in Magento and product collection Domagoj Potkoc
Domagoj Potkoc, | 15

Flat tables in Magento and product collection

10 comments

  1. Hey Guys,
    I like this solution BUT I dont prefer to “walk” and query each single row individually.
    Jarrod had a point at 2014 (see below) for the best solution. Use ResourceCollection instead of Collection and then you will able to use pagination
    ->setCurPage($pageNum)
    ->setPageSize($size)
    Your script will be much more efficient if you can optimize the size of one step/page and read 10 or 100 rows each time and loop on that 10 or 100 (or whatever) records.

  2. It’s still slow, because there is only one update per SQL query. I use function insertOnDuplicate to update thousands rows per query – it’s hundreds times faster 🙂

  3. Great.
    I have the same problem with order collection. Anf I solved this way (I had not yet read your post 🙂 ).
    Reading all the information of the all orders my server says: out of memory (16G!!!). But, I select only the fields that I need. -> addFieldToSelect
    With this, no problem.
    Then perhaps you can use this, you want to work with firstname:
    $customers = Mage::getModel(‘customer/customer’)->getCollection()->addAttributeToSelect(array(‘firstname’), ‘inner’)-> addFieldToSelect(‘firstname’);
    Of course you you will need memory, but, in my example it will use only 14% of total memory, not 100% and crash.

  4. I’m trying to confirm my existing customer database (~3500) imported from an older website, this method still times out.

  5. Another good method to keeping Your memory usage low on collections is to make use of ->setCurPage($pageNum) and ->setPageSize($size).
    Then when you iterate through the collection, you wrap your foreach loop in a while loop and increment your page (start at 1). Use get_mem_usage and tweak the page size as necessary to keep your numbers in your desired range.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <blockquote cite=""> <code> <del datetime=""> <em> <s> <strike> <strong>. You may use following syntax for source code: <pre><code>$current = "Inchoo";</code></pre>.

Tell us about your project

Drop us a line. We'd love to know more about your project.