MACSio  0.9
Multi-purpose, Application-Centric, Scalable I/O Proxy App
MACSio Documentation

MACSio is a Multi-purpose, Application-Centric, Scalable I/O proxy application.

It is designed to support a number of goals with respect to parallel I/O performance benchmarking including the ability to test and compare various I/O libraries and I/O paradigms, to predict scalable performance of real applications and to help identify where improvements in I/O performance can be made.

For an overview of MACSio's design goals and outline of its design, please see this design document.

MACSio is capable of generating a wide variety of mesh and variable data and of amorphous metadata typical of HPC multi-physics applications. Currently, the only supported mesh type in MACSio is a rectilinear, multi-block mesh in 2 or 3 dimensions. However, some of the functions to generate other mesh types such as curvilinear, block-structured AMR, unstructured, unstructured-AMR and arbitrary are already available. In addition, regardless of the particular type of mesh MACSio generates for purposes of I/O performance testing, it stores and marshals all of the resultant data in a single, all-encompassing JSON-C object that is passed around within MACSio and between MACSio and its I/O plugins.

MACSio employs a very simple algorithm to generate and then decompose a mesh in parallel. However, the decomposition is also general enough to create multiple mesh pieces on individual MPI ranks and for the number of mesh pieces to vary between MPI ranks. At present, there is no support for explicitly specifying a particular arrangement of mesh pieces and MPI ranks. However, such an enhancement can easily be made at a later date.

MACSio's command-line arguments are designed to give the user control over the nominal I/O request sizes emitted from MPI ranks for mesh bulk data and for amorphous metadata. The user specifies a size, in bytes, for mesh pieces. MACSio then computes the mesh piece size, in nodes, necessary to hit this target byte count for double precision data. MACSio determines an N dimensional logical size for a mesh piece that is as close to equal in each dimension as possible. In addition, the user specifies an average number of mesh pieces to be assigned to each MPI rank. This does not have to be a whole number. When it is a whole number, each MPI rank has the same number of mesh pieces. When it is not, some ranks have one more mesh piece than others. This is common in HPC multi-physics applications. Together, the total rank count and the average number of mesh pieces per rank give the total number of mesh pieces that comprise the entire mesh. MACSio then finds an N dimensional arrangement (N=[1,2,3]) of the pieces that is as close to equal in each dimension as possible. If the mesh piece size or the total count of pieces winds up being a prime number, MACSio can only factor it into a long, narrow shape in which all but one of the dimensions are of size 1. That makes examination of the resulting data using visualization tools like VisIt a little less convenient but is otherwise harmless from the perspective of driving and assessing I/O performance.
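To make the sizing arithmetic concrete, here is a minimal sketch of how a byte target could be turned into a node count and then factored into near-equal dimensions. The helper names (nodes_for_bytes, factor_2d, factor_3d) are hypothetical and are not taken from MACSio's sources, and the factoring is a simple greedy heuristic rather than MACSio's actual routine.

```c
/* Hypothetical helper: how many double precision nodal values fit in
 * the requested byte count (rounded down, but never less than 1). */
static long nodes_for_bytes(long target_bytes)
{
    long n = target_bytes / (long)sizeof(double);
    return n > 0 ? n : 1;
}

/* Split n into two factors a <= b, with a the largest divisor of n
 * that does not exceed sqrt(n). A prime n yields 1 x n. */
static void factor_2d(long n, long *a, long *b)
{
    long f, best = 1;
    for (f = 1; f * f <= n; f++)
        if (n % f == 0)
            best = f;
    *a = best;
    *b = n / best;
}

/* Greedy 3D split: take the largest divisor not exceeding the cube
 * root of n, then split what remains in 2D. A prime n yields 1 x 1 x n. */
static void factor_3d(long n, long *a, long *b, long *c)
{
    long f, best = 1;
    for (f = 1; f * f * f <= n; f++)
        if (n % f == 0)
            best = f;
    *a = best;
    factor_2d(n / best, b, c);
}
```

With 8-byte doubles, a request of 80000 bytes corresponds to 10000 nodes, which factor_2d splits into 100 x 100. A prime count such as 13 can only be split into 1 x 13, which is the long, narrow case described above.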

Once the global whole mesh shape is determined as a count of total pieces and as counts of pieces in each of the logical dimensions, MACSio uses a very simple algorithm to assign mesh pieces to MPI ranks. The global list of mesh pieces is numbered starting from 0. First, the number of pieces to assign to rank 0 is chosen. When the average piece count is non-integral, it is a value between K and K+1. So, MACSio randomly chooses either K or K+1 pieces, being careful to weight the randomness so that once all pieces are assigned to all ranks, the target average piece count per rank is achieved. MACSio then assigns the next K or K+1 numbered pieces to the next MPI rank. It continues assigning pieces to MPI ranks, in piece number order, until all MPI ranks have been assigned pieces. The algorithm runs identically on all ranks. When the algorithm reaches the piece assignment for the rank on which it is executing, it generates the K or K+1 mesh pieces for that rank. Although the algorithm is essentially sequential, with asymptotic behavior O(#total pieces), it is primarily a simple book-keeping loop that completes in a fraction of a second even for more than one million pieces.
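The book-keeping loop described above can be sketched as follows. This is an illustration only, not MACSio's implementation: the function names, the fixed seed and the small linear congruential generator are assumptions, chosen so that every rank draws the same sequence and therefore arrives at the same assignment without any communication.

```c
/* Tiny linear congruential generator (assumed here for illustration)
 * so that every rank reproduces exactly the same random sequence. */
static unsigned int lcg_next(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Hypothetical helper: returns the number of pieces owned by my_rank
 * and stores the global id of its first piece in *first_piece. */
static int assign_pieces(int total_pieces, int nranks, int my_rank,
                         int *first_piece)
{
    int K = total_pieces / nranks;
    int extra_left = total_pieces % nranks; /* ranks still owed K+1 */
    unsigned int seed = 12345u;             /* same seed on all ranks */
    int next_piece = 0, rank;

    for (rank = 0; rank < nranks; rank++) {
        int ranks_left = nranks - rank;
        /* Choose K+1 with probability extra_left / ranks_left, which
         * guarantees exactly (total_pieces % nranks) ranks get K+1. */
        int take_extra =
            (int)(lcg_next(&seed) % (unsigned int)ranks_left) < extra_left;
        int count = K + (take_extra ? 1 : 0);

        if (take_extra)
            extra_left--;
        if (rank == my_rank) {
            *first_piece = next_piece;
            return count;
        }
        next_piece += count;
    }
    return 0; /* my_rank was outside [0, nranks) */
}
```

For example, with 4 ranks and an average of 2.5 pieces per rank (10 pieces in total), each rank receives either 2 or 3 consecutively numbered pieces, and the counts always sum to 10.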

Each piece of the mesh is a simple rectangular region of space. The spatial bounds of that region are easily determined. Any variables to be placed on the mesh can be easily handled as long as the variable's spatial variation can be described in the global geometric space.
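As a rough illustration of how the bounds of a piece follow from its number, the sketch below assumes that pieces are numbered with the first logical dimension varying fastest and that every piece has the same spatial extent. Both assumptions, along with the bounds_t type and the piece_bounds name, are hypothetical and not taken from MACSio.

```c
/* Axis-aligned bounding box of one mesh piece (hypothetical type). */
typedef struct {
    double lo[3];
    double hi[3];
} bounds_t;

/* Map a global piece id to its spatial bounds, given the number of
 * pieces along each logical dimension and the (uniform) spatial
 * length of a piece along each dimension. */
static bounds_t piece_bounds(int id, const int npieces[3],
                             const double piece_len[3])
{
    bounds_t b;
    int idx[3], d;

    /* Recover the logical (i, j, k) index of the piece */
    idx[0] = id % npieces[0];
    idx[1] = (id / npieces[0]) % npieces[1];
    idx[2] = id / (npieces[0] * npieces[1]);

    for (d = 0; d < 3; d++) {
        b.lo[d] = idx[d] * piece_len[d];
        b.hi[d] = b.lo[d] + piece_len[d];
    }
    return b;
}
```

A variable defined as a function of global coordinates can then be evaluated at the node positions inside these bounds, independently on each rank.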

Building MACSio

MACSio uses GNU Makefiles with conditionally constructed variables and shell functions.

MACSio source code is divided into two key directories: the main MACSio functionality is in the macsio directory, while all plugins are in the plugins directory.

Building MACSio Main

The main bootstrap for building MACSio is the config-site file. This file contains variable definitions for all the key Make variables necessary to control the build of MACSio and any of its plugins. Here is an example config-site file...

SILO_HOME = /Users/miller86/visit/visit/silo/4.10.2-h5par/i386-apple-darwin12_gcc-4.2
HDF5_HOME = /Users/miller86/visit/visit/hdf5/1.8.11-par/i386-apple-darwin12_gcc-4.2
ZFP_HOME = $(HDF5_HOME)
EXODUS_HOME = /Users/miller86/Downloads/exodus-6.09/exodus/myinstall
NETCDF_HOME = /Users/miller86/visit/thirdparty_shared/2.8/netcdf/4.3.2/i386-apple-darwin12_gcc-4.2
CXX = /Users/miller86/installs/openmpi/1.6.4/i386-apple-darwin12_gcc-4.2/bin/mpicxx
CC = /Users/miller86/installs/openmpi/1.6.4/i386-apple-darwin12_gcc-4.2/bin/mpicc
CFLAGS = -DHAVE_MPI -g
LINK = $(CXX)

Note that all package FOO_HOME make variables are treated as specifying a top-level package directory underneath which live the include and lib directories for the package header files and library files respectively. If you have a package that does not install, or is not installed, in this conventional way, a work-around is to use symlinks or explicit copies to create a proxy home directory for the package that is structured in the way MACSio's Makefiles expect.

Ordinarily, we maintain separate config-site files for various hosts upon which MACSio is built. The files are named according to the build host they are associated with. However, it is also perfectly fine to maintain, for example, a config-site file for a generic host such as ubuntu and then just explicitly reference that config-site file when building MACSio on ubuntu systems.

Although MACSio is written in C, at a minimum it must be linked using a C++ linker due to its use of non-constant expressions in static initializers to effect the static plugin behavior. It is also conceivable that some C++'isms have crept into the code, causing warnings or outright errors with some C compilers.

In addition, MACSio sources currently include a large number of #warning statements to help remind developers (namely me) of minor issues to be fixed. When compiling, these produce a lot of spurious output on stderr but are otherwise harmless.

From within the macsio sub-directory, these make targets are defined...

  • make all: will build all of MACSio main + all plugins that have been enabled via setting non-null values for their respective TPL(s) FOO_HOME variables in the config-site file.
  • make CONFIG_SITE_FILE=config-site/foo all: will build all of MACSio main + plugins using the specified config-site file, config-site/foo.
  • make clean: will clean away main and plugin object files.
  • make dataclean: will clean away data files MACSio has produced.
  • make allclean: will clean away all test data files, main and plugin object files, and the macsio executable.

Note that part of building MACSio's main includes building the JSON-C library. The JSON-C library is configured and installed from the Makefile in the macsio sub-directory, but it is actually installed one directory level up in ../json-c/install. Whenever the JSON-C library is modified, it must be re-installed. To do so, manually cd to the ../json-c/build directory and issue the command make install there.

Building MACSio Plugins

By default, the only plugin(s) MACSio builds with automatically are those that depend upon ubiquitous system libraries such as stdio. In the initial release of MACSio, the only plugin that operates directly on system I/O interfaces is the raw-posix (miftmpl) plugin.

Other plugins require associated third party libraries (TPLs). Consequently, before building MACSio, one must have installed the associated TPLs for the desired plugins.

Here are some useful make targets defined for the plugins directory to help with plugin TPL(s).

  • make list: lists all plugins for which source code exists in the plugins directory.
  • make list-tpls-X: lists all TPL(s) required for plugin X as well as their last known URLs.
  • make download-tpls-X: downloads (using either wget or curl) all TPL(s) tarballs needed for plugin X.
  • make install-tpls-X: will attempt to build and install all TPL(s) for a given plugin to path specified in MACSIO_TPLS_PREFIX. (note: this is currently an unreliable option).

All essential make bootstraps can be set in a hostname-specific config-site file or by explicitly specifying the config-site file to use via the CONFIG_SITE_FILE make variable (e.g. make CONFIG_SITE_FILE=config-site/foo will build using the contents of the file foo in the config-site directory).

A given plugin is built only when installations of its needed TPL(s) are specified via its associated FOO_HOME variable. For example, to build the HDF5 plugin, the variable HDF5_HOME must specify a path to an installation of HDF5 where the include and lib sub-directories for HDF5 can be found.

Sometimes it is desirable to build only some of the available plugins. This can be achieved by setting the make variable ENABLE_PLUGINS to a space-separated string of the names of the plugins to include when linking the MACSio main executable. For example, the command make ENABLE_PLUGINS="miftmpl silo" all will build the MACSio executable so that only the miftmpl and Silo plugins are included.

Each plugin is defined by two files, named macsio_foo.make and macsio_foo.c for a plugin named foo. macsio_foo.c implements the MACSIO_IFACE interface for the foo plugin. macsio_foo.make is a makefile fragment, included by the main Makefile in the plugins directory, that manages the creation of the macsio_foo.o object file.

Given the high likelihood that different plugins may depend on common TPL(s), there is a plugin-specific make variable, FOO_BUILD_ORDER (for a fictitious foo plugin), that informs MACSio's make system of the order in which to build the plugin relative to other plugins. The FOO_BUILD_ORDER variable is a floating point number that is used to sort the order in which the plugins' object files appear on the link line when linking MACSio. A higher numerical value for the FOO_BUILD_ORDER variable will result in the foo plugin and its dependent libraries occurring later on the link command-line.
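The ordering rule can be illustrated with a short sketch. This is not how MACSio's Makefiles perform the sort; it only shows the effect of sorting by a floating point key. The plugin_entry_t type, the function names and any build order values used with it are hypothetical.

```c
#include <stdlib.h>

/* One plugin and its build order key (hypothetical type). */
typedef struct {
    const char *name;
    double build_order;
} plugin_entry_t;

/* qsort comparator: ascending by build_order, so that plugins with
 * higher values end up later on the link line. */
static int cmp_build_order(const void *a, const void *b)
{
    double x = ((const plugin_entry_t *)a)->build_order;
    double y = ((const plugin_entry_t *)b)->build_order;
    return (x > y) - (x < y);
}

/* Arrange plugins in the order they would appear on the link line. */
static void sort_for_link_line(plugin_entry_t *plugins, size_t n)
{
    qsort(plugins, n, sizeof plugins[0], cmp_build_order);
}
```

For instance, plugins with made-up keys of 2.0, 1.0 and 1.5 would be linked in the order 1.0, 1.5, 2.0. Because the key is a floating point number, a new plugin can be slotted between two existing ones without renumbering them.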

MACSio does not use dlopen() to manage plugins. Instead, MACSio uses a static approach to managing plugins. The set of plugins available in a macsio executable is determined at the time the executable is linked simply by listing all the plugin object files to be linked into the executable (along with their associated TPL(s)). MACSio exploits a feature in C++ which permits initialization of static variables via non-constant expressions. All symbols in a plugin are defined with static scope. Every plugin defines an int register_this_interface(void) function and initializes a static dummy integer to the result of register_this_interface() like so...

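static int register_this_interface(void)
{
    strcpy(iface.name, iface_name);
    strcpy(iface.ext, iface_ext);

    /* Add this plugin to MACSio's global list of plugins */
    if (!MACSIO_IFACE_Register(&iface))
        MACSIO_LOG_MSG(Die, ("Failed to register interface \"%s\"", iface.name));
    return 0;
}

/* Non-constant static initializer (C++ feature); runs before main() */
static int dummy = register_this_interface();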

When the executable loads, the register_this_interface() function is called. Note that this happens long before main() is even called. The call to MACSIO_IFACE_Register() from within register_this_interface() winds up adding the plugin to MACSio's global list of plugins. This happens for each plugin. The order in which plugins are added to MACSio doesn't matter because plugins are identified by their (unique) names. If MACSio encounters a case where two different plugins have the same name, it will abort and inform the user of the problem. The remedy is to adjust the name of one of the two plugins. MACSio is able to call static functions defined within the plugin via function callback pointers registered with the interface.
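The name-keyed registry and the callback mechanism can be sketched in a few lines. This is a simplified stand-in, not MACSio's MACSIO_IFACE_Register(): the plugin_t type, the registry functions, the table sizes and the demo_dump callback are all hypothetical. Unlike MACSio, which aborts on a duplicate name, this sketch returns 0 and leaves the decision to the caller.

```c
#include <string.h>

#define MAX_PLUGINS 16
#define MAX_NAME    64

/* A registered plugin: a unique name plus a callback pointer through
 * which the main program can reach a static function in the plugin. */
typedef struct {
    char name[MAX_NAME];
    int (*dump)(int step);
} plugin_t;

static plugin_t registry[MAX_PLUGINS];
static int nplugins = 0;

/* Returns 1 on success, 0 if the table is full or the name is taken. */
static int registry_add(const plugin_t *p)
{
    int i;
    if (nplugins >= MAX_PLUGINS)
        return 0;
    for (i = 0; i < nplugins; i++)
        if (strcmp(registry[i].name, p->name) == 0)
            return 0;
    registry[nplugins++] = *p;
    return 1;
}

/* Look a plugin up by name; returns NULL when it is not registered. */
static const plugin_t *registry_find(const char *name)
{
    int i;
    for (i = 0; i < nplugins; i++)
        if (strcmp(registry[i].name, name) == 0)
            return &registry[i];
    return NULL;
}

/* Stand-in for a plugin's static dump function. */
static int demo_dump(int step)
{
    return step * 2;
}
```

Because lookup is by name only, the order of registration has no effect on which plugin is found, which matches the observation above that registration order doesn't matter.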