# python-split: split python sequences by size and by predicates

2012-10-11


I’ve just uploaded a package to PyPI for the first time in my life.

http://pypi.python.org/pypi/split/0.3

It lazily splits sequences into subsequences. I find `itertools` lacking in this area (especially after using Clojure or Haskell).

### Chunks of equal size

To partition a sequence into chunks of equal size, use `chop`:

```python
>>> from split import chop
>>> list(chop(3, range(10)))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

If the `truncate=True` keyword argument is given, the sequence is truncated to a multiple of the chunk size, so an incomplete last chunk is dropped.
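For readers curious how lazy chunking can work, here is a minimal sketch built on `itertools.islice`. The name `chop_sketch` is hypothetical and this is not the package's source; it only assumes the behaviour described above, including `truncate`.

```python
from itertools import islice


def chop_sketch(n, seq, truncate=False):
    """Lazily yield lists of up to n items from seq.

    Hypothetical illustration, not the split package's implementation.
    With truncate=True, a final chunk shorter than n is dropped.
    """
    if n < 1:
        raise ValueError("chunk size must be positive")
    it = iter(seq)
    while True:
        chunk = list(islice(it, n))  # take at most n items
        if not chunk:
            return  # source exhausted
        if truncate and len(chunk) < n:
            return  # drop the incomplete tail
        yield chunk


print(list(chop_sketch(3, range(10))))                 # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(chop_sketch(3, range(10), truncate=True)))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Because each chunk is taken with `islice`, only one chunk is materialized at a time, so the sketch also works on very long or infinite iterators.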

### Subsequences by a predicate

To split a sequence into two by a given predicate, use `partition`:

```python
>>> from split import partition
>>> def odd(x): return x % 2
>>> list(map(list, partition(odd, range(5))))
[[1, 3], [0, 2, 4]]
```
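A lazy two-way partition can be sketched with `itertools.tee`, which duplicates one iterator into two. `partition_sketch` is a hypothetical name, the sketch targets Python 3 (it uses `filterfalse`), and the real `partition` may be implemented differently.

```python
from itertools import filterfalse, tee


def partition_sketch(pred, seq):
    """Return (items where pred is true, items where pred is false), lazily.

    Hypothetical illustration built on itertools.tee; not the split package's code.
    """
    true_src, false_src = tee(seq)  # two independent iterators over seq
    return filter(pred, true_src), filterfalse(pred, false_src)


odds, evens = partition_sketch(lambda x: x % 2, range(5))
print(list(odds))   # [1, 3]
print(list(evens))  # [0, 2, 4]
```

The trade-off: `tee` buffers items that one output has consumed and the other has not, and `pred` is evaluated twice per item.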

For more general partitioning, use `groupby`:

```python
>>> from split import groupby
>>> [(k, list(i)) for k, i in groupby(lambda x: x % 3, range(7))]
[(0, [0, 3, 6]), (1, [1, 4]), (2, [2, 5])]
```

This function differs from `itertools.groupby`: it returns only one subsequence iterator per predicate value, even when items with the same value are not adjacent, so its return value can be converted into a dictionary. The price is that, in the worst case, it may have to force the entire sequence and keep it in memory.
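The difference from `itertools.groupby` is easiest to see on unsorted input. The following standard-library snippet (not part of the `split` package) contrasts the two behaviours; the eager `defaultdict` version only illustrates the one-group-per-key result and its memory cost, not how `split.groupby` is implemented.

```python
from collections import defaultdict
from itertools import groupby as it_groupby


def mod3(x):
    return x % 3


data = range(7)

# itertools.groupby only merges *consecutive* items with equal keys,
# so on unsorted input the same key shows up several times.
consecutive = [(k, list(g)) for k, g in it_groupby(data, mod3)]
print(consecutive)
# [(0, [0]), (1, [1]), (2, [2]), (0, [3]), (1, [4]), (2, [5]), (0, [6])]

# One group per key means remembering items until the input ends;
# this eager version simply keeps everything in memory.
groups = defaultdict(list)
for x in data:
    groups[mod3(x)].append(x)
print(dict(groups))  # {0: [0, 3, 6], 1: [1, 4], 2: [2, 5]}
```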

When working with very long sequences, consider passing the `predicate_values` keyword argument: if the possible predicate values are known in advance, `groupby` does not need to scan the entire sequence to discover them. For example:

```python
>>> from split import groupby
>>> longseq = xrange(int(1e9))  # range() on Python 3
>>> pred = lambda x: x % 3
>>> dict(groupby(pred, longseq, predicate_values=(0, 1, 2)))
{0: <generator object subsequence at 0x301b7d0>,
1: <generator object subsequence at 0x301b780>,
2: <generator object subsequence at 0x301b730>}
```
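Why known predicate values help can be illustrated with a small lazy implementation: each group pulls items from a shared iterator on demand and parks items belonging to other groups in per-group buffers. This is a hypothetical sketch (`groupby_known_keys` is an illustrative name, not the package's API or source), and in this sketch items whose key is not listed in `predicate_values` are simply dropped.

```python
from collections import deque
from itertools import islice


def groupby_known_keys(pred, seq, predicate_values):
    """Return {value: lazy iterator} for each value in predicate_values.

    Hypothetical sketch, not the split package's implementation.
    Items are pulled from seq only when some group asks for more;
    an item whose key belongs to another group is buffered for it,
    and items with unlisted keys are discarded.
    """
    source = iter(seq)
    buffers = {value: deque() for value in predicate_values}

    def subsequence(value):
        queue = buffers[value]
        while True:
            if queue:  # serve items parked by other groups first
                yield queue.popleft()
                continue
            try:
                item = next(source)
            except StopIteration:
                return  # source exhausted and nothing buffered
            key = pred(item)
            if key == value:
                yield item
            elif key in buffers:
                buffers[key].append(item)  # park it for its own group

    return {value: subsequence(value) for value in predicate_values}


groups = groupby_known_keys(lambda x: x % 3, range(10**9), predicate_values=(0, 1, 2))
print(list(islice(groups[2], 3)))  # [2, 5, 8], without scanning the whole range
```

Only items belonging to groups that have not been read yet are buffered, so consuming the groups at a similar pace keeps the buffers small; draining one group first still buffers the others.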

### Breaking on separators

To break a sequence into chunks at separator items, use `split`:

```python
>>> from split import split
>>> list(split(0, [1, 2, 3, 0, 4, 5, 0, 0, 6]))
[[1, 2, 3], [4, 5], [], [6]]
```

Instead of a separator value, you may also pass a function, which is then used as a predicate to recognize separators.
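A minimal generator-based sketch of splitting on separators, accepting either a separator value or a predicate. `split_sketch` is hypothetical, not the package's code; like `str.split`, it keeps an empty chunk between adjacent separators and drops the separators themselves.

```python
def split_sketch(sep, seq):
    """Lazily yield lists of the items between separators.

    Hypothetical illustration, not the split package's implementation.
    If sep is callable it is used as a predicate that recognizes
    separators; otherwise items equal to sep are separators.
    """
    is_sep = sep if callable(sep) else (lambda item: item == sep)
    chunk = []
    for item in seq:
        if is_sep(item):
            yield chunk  # close the current chunk at each separator
            chunk = []
        else:
            chunk.append(item)
    yield chunk  # whatever follows the last separator


print(list(split_sketch(0, [1, 2, 3, 0, 4, 5, 0, 0, 6])))  # [[1, 2, 3], [4, 5], [], [6]]
print(list(split_sketch(str.isspace, "ab cd")))            # [['a', 'b'], ['c', 'd']]
```

Treating any callable as a predicate is a design choice of this sketch: it means a function object cannot itself be used as a plain separator value.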

### Install

To install, run:

```
pip install split
```

It works with both Python 2 and Python 3.