python-split: split python sequences by size and by predicates

2012-10-11

python-split

I’ve just uploaded a package to PyPI for the first tine in my life.

http://pypi.python.org/pypi/split/0.3

It lazily splits sequences into subsequences. I find that itertools are lacking in this area (especially after using Clojure or Haskell).

Chunks of equal size

To partition a sequence into chunks of equal size, use chop:

>>> from split import chop
>>> list(chop(3, range(10)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

If truncate=True keyword argument is given, then the sequence is truncated to a multiple of chunk size.

Subsequences by a predicate

To split a sequence into two by a given predicate, use partition:

>>> from split import partition
>>> def odd(x): return x%2
>>> map(list, partition(odd, range(5)))
[[1, 3], [0, 2, 4]]

For more general partitioning, use groupby:

>>> [(k, list(i)) for k,i in groupby(lambda x: x%3, range(7))]
[(0, [0, 3, 6]), (1, [1, 4]), (2, [2, 5])]

This function is different from itertools.groupby: it returns only one subsequence iterator per predicate value. Its return value can be converted into dictionary. Obviously, the price is that it may force and keep the entire sequence in memory in the worst case.

When working with very long sequences, consider using predicate_values keyword argument to avoid scanning the entire sequence. For example:

>>> longseq = xrange(int(1e9))
>>> pred = lambda x: x%3
>>> dict(groupby(pred, longseq, predicate_values=(0,1,2)))
{0: <generator object subsequence at 0x301b7d0>,
 1: <generator object subsequence at 0x301b780>,
 2: <generator object subsequence at 0x301b730>}

Breaking on separators

To break a sequence into chunks on some separators, use split.

>>> list(split(0, [1,2,3,0,4,5,0,0,6]))
[[1, 2, 3], [4, 5], [], [6]]

You may use a function as a separator's predicate.

Install

To install run

pip install split

It works with both Python 2 and Python 3.