eindex

Concept of multidimensional indexing for tensors

Example of K-means clustering

Plain numpy

python

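import numpy as np
from scipy.spatial.distance import cdist  # assumed source of cdist (pairwise distances)

def kmeans(init_centers, X, n_iterations: int):
    n_clusters, n_dim = init_centers.shape
    n_observations, n_dim = X.shape

    centers = init_centers.copy()
    for _ in range(n_iterations):
        # d[cluster, i]: distance from each center to each observation
        d = cdist(centers, X)
        # index of the nearest center for every observation
        clusters = np.argmin(d, axis=0)
        # accumulate observations into the row of their cluster
        new_centers_sum = np.zeros_like(centers)
        indices_dim = np.arange(n_dim)[None, :]
        np.add.at(new_centers_sum, (clusters[:, None], indices_dim), X)
        # divide sums by cluster sizes to get the new centers
        cluster_counts = np.bincount(clusters, minlength=n_clusters)
        centers = new_centers_sum / cluster_counts[:, None]
    return centers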
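
The least obvious step in the numpy version is the `np.add.at` call. The sketch below, on made-up toy data, shows why it is used instead of a fancy-index `+=`: when a cluster index repeats, `+=` keeps only one of the contributions, while `np.add.at` accumulates all of them. The array values are illustrative assumptions, not taken from the project.

```python
import numpy as np

# Toy data: 4 observations with 2 features, assigned to 2 clusters.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
clusters = np.array([0, 1, 0, 0])  # cluster 0 appears three times

rows = clusters[:, None]       # shape (4, 1)
cols = np.arange(2)[None, :]   # shape (1, 2)

# Buffered fancy-index update: repeated row indices overwrite each other,
# so only one observation per cluster survives.
buffered = np.zeros((2, 2))
buffered[rows, cols] += X

# Unbuffered update: every observation is accumulated into its cluster row.
sums = np.zeros((2, 2))
np.add.at(sums, (rows, cols), X)

# Per-cluster means, as in the last two lines of the kmeans loop.
counts = np.bincount(clusters, minlength=2)
means = sums / counts[:, None]
```

Note that a cluster with no assigned observations has a count of zero, so the division produces NaN for that row.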

With eindex

python

def kmeans_eindex(init_centers, X, n_iterations: int):
    centers = init_centers
    for _ in range(n_iterations):
        d = cdist(centers, X)
        clusters = EX.argmin(d, 'cluster i -> [cluster] i')
        centers = EX.scatter(X, clusters, 'i c, [cluster] i -> cluster c',  
                             agg='mean', cluster=len(centers))
    return centers
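
One way to read the two patterns is as plain loops. The sketch below is an interpretation inferred from comparing the two versions side by side, not eindex's implementation: names in square brackets are read as "the index stored in the tensor". The helpers `argmin_reference` and `scatter_mean_reference` are hypothetical names introduced only for illustration.

```python
import numpy as np

def argmin_reference(d):
    """Loop reading of 'cluster i -> [cluster] i' (assumed semantics):
    for every i, store the index of the cluster that minimizes d[cluster, i]."""
    n_clusters, n_obs = d.shape
    result = np.zeros(n_obs, dtype=int)
    for i in range(n_obs):
        result[i] = min(range(n_clusters), key=lambda k: d[k, i])
    return result

def scatter_mean_reference(X, clusters, n_clusters):
    """Loop reading of 'i c, [cluster] i -> cluster c' with agg='mean'
    (assumed semantics): average the rows X[i] that share clusters[i]."""
    n_obs, n_dim = X.shape
    sums = np.zeros((n_clusters, n_dim))
    counts = np.zeros(n_clusters)
    for i in range(n_obs):
        sums[clusters[i]] += X[i]
        counts[clusters[i]] += 1
    # An empty cluster would give a zero count and a NaN row here.
    return sums / counts[:, None]

# Toy inputs (illustrative values only).
d = np.array([[0.1, 2.0, 0.3],
              [1.0, 0.5, 0.9]])
X = np.array([[0.0, 0.0], [10.0, 10.0], [2.0, 4.0]])

clusters = argmin_reference(d)
centers = scatter_mean_reference(X, clusters, n_clusters=2)
```

Under this reading, the pattern names every axis explicitly, which is what replaces the manual broadcasting (`[:, None]`, `[None, :]`) of the numpy version.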

Tutorial notebook

Goals

Form a helpful 'language' to think about indexing and index-related operations. Tools shape minds.
Cover the most common cases of multidimensional indexing that are hard to implement using the standard API.
The approach should be applicable to most common tensor frameworks, and autograd should work out of the box.
Aim for readable and reliable code.
Allow simple adoption in existing codebases.
The implementation should be based on fairly common tensor operations. No custom kernels allowed.
Complexity should be visible: the execution plan for every operation should form a static graph. The structure of the graph depends on the pattern, but not on the tensor arguments.

Non-goals: there is no goal to develop 'the shortest notation' or 'the most advanced/comprehensive tool for indexing' or 'cover as many operations as possible' or 'completely replace default indexing'.

Examples

Follow the tutorial first to learn about all the operations provided.

Implementation

The repo provides two implementations:

Array API standard implementation. It is based on a standard that multiple frameworks have agreed to follow. The implementation uses only the API from the standard, so every operation works with every framework that follows the standard.

At some point this should become the one and only implementation.

Here is the catch: current support for the array API standard is poor, which is why the second implementation exists.

NumPy implementation.

This independent implementation works right now.

The NumPy implementation is great for testing things out, and is handy for a number of non-DL applications as well.
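
To illustrate what "uses only API from the standard" means in practice, here is a minimal sketch of framework-agnostic code. It is not taken from the eindex sources. It looks up the array namespace through `__array_namespace__`, which the array API standard defines, and falls back to numpy for arrays that do not expose it. The function names `get_namespace` and `nearest_center` are hypothetical.

```python
import numpy as np

def get_namespace(x):
    """Return the array-API namespace of x, falling back to numpy
    when the array type does not implement __array_namespace__."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

def nearest_center(centers, X):
    """Index of the nearest center for each row of X, written only with
    functions that the standard specifies (expand_dims, sum, argmin)."""
    xp = get_namespace(X)
    # diff[cluster, i, c] = centers[cluster, c] - X[i, c]
    diff = xp.expand_dims(centers, axis=1) - xp.expand_dims(X, axis=0)
    d2 = xp.sum(diff * diff, axis=-1)  # squared distances, shape (cluster, i)
    return xp.argmin(d2, axis=0)

# Toy inputs (illustrative values only).
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]])
assignment = nearest_center(centers, X)
```

The same function should accept arrays from any library that follows the standard, since nothing in it is numpy-specific beyond the fallback.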

Development Status

The API looks solid, but breaking changes are still possible, so pin the version in your projects (e.g. eindex==0.1.0).

Related projects

Other projects you likely want to look at:

tullio by Michael Abbott (@mcabbott) provides Julia macros with a high level of flexibility. Resulting operations are then compiled.
torchdim by Zachary DeVito (@zdevito) introduces "dimension objects", which in particular allow convenient multi-dim indexing
einindex is an einops-inspired prototype by Jonathan Malmaud (@malmaud) to develop multi-dim indexing notation. (Also, that's why this package isn't called einindex)

Contributing

We welcome the following contributions:

Next time you deal with multidimensional indexing, do it with eindex. Worked? → great, let us know; didn't work or unclear how to implement → post in discussions.
If you feel you're already fluent in eindex, help others.
Guides, tutorials, and video guides are very welcome and will be linked.
If you want to translate the tutorial into another language and post it somewhere, you are welcome to.

Discussions

Use GitHub discussions for this project: https://github.com/arogozhnikov/eindex/discussions
