Chunk an Iterator into Arrays

Iterators can take the heat out of memory consumption when processing big datasets. Such datasets are typically processed by workers in manageable chunks of limited size. Because these workers are usually independent of each other, it can be necessary to pack chunks of the source into arrays that are sent to the worker along with the job metadata. This is when you might want to chunk an iterator into arrays.

A generator function is a good fit for this purpose. Unlike a normal function, which returns a single value per call, a generator function yields values one at a time as the caller asks for them. For comparison, take this ordinary dice roller, which returns one roll per call:

function dice() {
    return rand() % 6 + 1;
}
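The roller above is a regular function, so it hands back exactly one value per call. As a rough sketch of the difference, a generator variant could look like the following. The name dice_rolls() and its $count parameter are assumptions made for illustration, and random_int() is swapped in for rand().

```php
<?php
// Hypothetical generator variant of the dice roller (not part of the original code).
// Each roll is produced only when the caller asks for the next value.
function dice_rolls(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield random_int(1, 6);
    }
}

// Consume the generator like any other iterator.
$rolls = [];
foreach (dice_rolls(5) as $roll) {
    $rolls[] = $roll;
}

echo implode(',', $rolls) . "\n";
```

Nothing is rolled up front: the loop body inside dice_rolls() only advances when foreach requests another value.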

Now, let us create a generator function that takes an iterator and a number $n indicating the (maximum) chunk size.

function chunk_iterator(Iterator $it, int $n)
{
    $chunk = [];

    for($i = 0; $it->valid(); $i++){
        $chunk[] = $it->current();
        $it->next();
        if(count($chunk) == $n){
            yield $chunk;
            $chunk = [];
        }
    }

    if(count($chunk)){
        yield $chunk;
    }
}

Starting with an empty $chunk array, the function loops over all entries of the iterator and appends each one to $chunk. As soon as count($chunk) reaches the limit $n, it yields $chunk and resets it to an empty array.

for($i = 0; $it->valid(); $i++){
    $chunk[] = $it->current();
    $it->next();
    if(count($chunk) == $n){
        yield $chunk;
        $chunk = [];
    }
}

After iteration, there might remain some items in a non-full chunk. Output it as well:

if(count($chunk)){
    yield $chunk;
}
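To see when the trailing chunk actually appears, it can help to compare three inputs: one that leaves a remainder, one that divides evenly, and an empty one. The sketch below repeats chunk_iterator() so that it runs on its own. The helper collect_chunks() is a hypothetical name introduced only for this illustration.

```php
<?php
// chunk_iterator() as defined in this article, repeated so the sketch is self-contained.
function chunk_iterator(Iterator $it, int $n)
{
    $chunk = [];

    for ($i = 0; $it->valid(); $i++) {
        $chunk[] = $it->current();
        $it->next();
        if (count($chunk) == $n) {
            yield $chunk;
            $chunk = [];
        }
    }

    if (count($chunk)) {
        yield $chunk;
    }
}

// Hypothetical helper: materialize all chunks into a plain array for inspection.
function collect_chunks(array $items, int $n): array
{
    // Second argument false: ignore the generator's keys and reindex from 0.
    return iterator_to_array(chunk_iterator(new ArrayIterator($items), $n), false);
}

$withRemainder = collect_chunks([1, 2, 3, 4, 5], 2); // last chunk is partial
$exactFit      = collect_chunks([1, 2, 3, 4], 2);    // no trailing chunk
$empty         = collect_chunks([], 2);              // nothing is yielded

echo json_encode($withRemainder) . "\n"; // [[1,2],[3,4],[5]]
echo json_encode($exactFit) . "\n";      // [[1,2],[3,4]]
echo json_encode($empty) . "\n";         // []
```

Because the final check only fires for a non-empty $chunk, an input that divides evenly never produces an empty array at the end.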

Calling the function returns a Generator, which implements the Iterator interface.

// For demonstration, create an iterator from array
$arr = range(20, 40, 1);
$it = new ArrayIterator($arr);

// Iterate over the generator
foreach(chunk_iterator($it, 6) as $c){
    echo implode(',', $c)."\n";
}

… and there we go:

20,21,22,23,24,25
26,27,28,29,30,31
32,33,34,35,36,37
38,39,40

Try it yourself on 3v4l.org/dMjCI
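
The memory benefit mentioned at the start shows best when the source itself is lazy. The following sketch assumes a hypothetical generator, counted_source(), that counts how many items it has produced. chunk_iterator() is repeated so the example runs on its own. Only the first chunk is taken, for instance to hand it to a worker, and then the loop stops.

```php
<?php
// chunk_iterator() as defined in this article, repeated so the sketch is self-contained.
function chunk_iterator(Iterator $it, int $n)
{
    $chunk = [];

    for ($i = 0; $it->valid(); $i++) {
        $chunk[] = $it->current();
        $it->next();
        if (count($chunk) == $n) {
            yield $chunk;
            $chunk = [];
        }
    }

    if (count($chunk)) {
        yield $chunk;
    }
}

// Hypothetical lazy source: counts every item it produces in $stats->produced.
function counted_source(int $limit, stdClass $stats): Generator
{
    for ($i = 1; $i <= $limit; $i++) {
        $stats->produced++;
        yield $i;
    }
}

$stats = new stdClass();
$stats->produced = 0;

$first = null;
foreach (chunk_iterator(counted_source(1000000, $stats), 3) as $chunk) {
    $first = $chunk; // e.g. pass this array to a worker with the job metadata
    break;           // stop after the first chunk
}

echo implode(',', $first) . "\n"; // 1,2,3
echo "items produced so far: {$stats->produced}\n";
```

Although the source could deliver a million items, only a handful are produced before the loop stops, and at most one chunk of $n items is held in memory at any time.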