MapReduce-MPI Library

A grain of wisdom is worth an ounce of knowledge, which is worth a ton of data. -- Neil Larson
It is a capital mistake to theorize before one has data. -- Arthur Conan Doyle

This is the home page for the MapReduce-MPI (MR-MPI) library, which is an open-source implementation of MapReduce written for distributed-memory parallel machines on top of standard MPI message passing.


MapReduce is a programming paradigm, popularized by Google, that is widely used for processing large data sets in parallel. Its salient feature is that if a task can be formulated as a MapReduce, the user can perform it in parallel without writing any parallel code. Instead, the user writes serial functions (maps and reduces) that operate on portions of the data set independently. The data movement and other necessary parallel operations can be performed in an application-independent fashion, in this case by the MR-MPI library.
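To make the division of labor concrete, here is a minimal serial sketch of a word count written as a map, a grouping step, and a reduce. It is plain Python and does not use the MR-MPI API; the function names (`map_words`, `collate`, `reduce_count`, `word_count`) are hypothetical and chosen only for illustration. The user would supply only the map and reduce functions, while the grouping step stands in for the data movement that a library would perform across processors.

```python
from collections import defaultdict


def map_words(chunk):
    """Serial map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def collate(pairs):
    """Group values by key.

    In a parallel setting this is the application-independent step:
    pairs with the same key are brought together in one place.
    Here it is a plain in-memory grouping (an illustrative stand-in).
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_count(key, values):
    """Serial reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    """Run map on each chunk independently, then group, then reduce each key."""
    pairs = []
    for chunk in chunks:  # each chunk could be mapped on a different processor
        pairs.extend(map_words(chunk))
    grouped = collate(pairs)
    return dict(reduce_count(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    chunks = ["a ton of data", "an ounce of knowledge", "a grain of wisdom"]
    print(word_count(chunks))
```

Neither `map_words` nor `reduce_count` contains any parallel logic: each sees only one chunk or one key at a time, which is what allows the surrounding framework to distribute the work.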

The MR-MPI library was developed at Sandia National Laboratories, a US Department of Energy facility, for use on informatics problems. It includes C++ and C interfaces callable from most high-level languages, as well as a Python wrapper and our own OINK scripting wrapper, which can be used to develop MapReduce operations and chain them together. MR-MPI and OINK are open-source codes, distributed freely under the terms of the modified Berkeley Software Distribution (BSD) License. See this page for more details.

The authors of the library are Steve Plimpton and Karen Devine, who can be contacted at sjplimp at sandia.gov and kddevin at sandia.gov.


These are other software packages that perform MapReduce operations:


Recent MR-MPI News