Mumps MultiDimensional
& Hierarchical Toolkit
Last update: July 30, 2008

Overview of an Open Source/GPL'ed
Mumps Interpreter / Compiler
and
MultiDimensional and Hierarchical Toolkit (MDH)
for Linux, Cygwin and Windows

Mumps (also referred to as M) is a general purpose programming language that supports a unique, hierarchical (or multidimensional) database facility. It was originally developed in the late 1960s and the acronym stands for the Massachusetts General Hospital Utility Multi-programming System. It was (and is) widely used in clinical computing. Its original purpose was to store tree structured medical records.

Over the years a number of commercial versions were developed. Most of these, however, are now extinct, merged or evolved into forms considerably different from the original.

The version described here is an open source, GPL licensed implementation begun in the early 1980s. It has also undergone considerable evolution, and Mumps/II is the current result. The main motivation for its development was to implement tools for information storage and retrieval, text processing and bioinformatics (for example, see the Online I&SR Notes).

This implementation is written entirely in C/C++ and compiles under Linux, Cygwin and Windows (some features are omitted from the Windows version due to differences between the MS VC++ compiler and gcc/g++). The implementation consists of two parts: a compiler, which translates Mumps to C++ and then to binaries, and an interpreter scripting shell, which executes source code directly.

The hierarchical or multidimensional database is perhaps the most interesting feature of Mumps. It permits the construction of arbitrary trees by means of string indexed array references. Data may be stored at any node and there are functions to sequentially access siblings and children. For example, the NLM MeSH codes, a hierarchy of terminology used in the health sciences, consist of text such as the following excerpt:

Cardiovascular System;A07
Blood Vessels;A07.231
Arteries;A07.231.114
Aorta;A07.231.114.056
Aorta, Abdominal;A07.231.114.056.205
Aorta, Thoracic;A07.231.114.056.372
Sinus of Valsalva;A07.231.114.056.847
Arterioles;A07.231.114.060
Axillary Artery;A07.231.114.085
Basilar Artery;A07.231.114.106
Brachial Artery;A07.231.114.139
Brachiocephalic Trunk;A07.231.114.145
Bronchial Arteries;A07.231.114.158
Carotid Arteries;A07.231.114.186
Carotid Artery, Common;A07.231.114.186.200
Carotid Artery, External;A07.231.114.186.200.210
Carotid Artery, Internal;A07.231.114.186.200.230
Carotid Sinus;A07.231.114.186.456
Celiac Artery;A07.231.114.207
Cerebral Arteries;A07.231.114.228
Anterior Cerebral Artery;A07.231.114.228.100
Circle of Willis;A07.231.114.228.351
Middle Cerebral Artery;A07.231.114.228.550
Posterior Cerebral Artery;A07.231.114.228.700
Temporal Arteries;A07.231.114.228.868

where the text on each line prior to the semicolon is the name of the item being described and the codes following the semicolon are the hierarchical codes assigned to the item described.
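
The record layout just described can be illustrated with a short Python sketch. It is not Mumps code and not part of the distribution; the helper name parse_mesh_line is hypothetical, and the sample lines are copied from the excerpt. It only shows how each line separates into a name and a sequence of code levels.

```python
# Illustrative only: parse "name;code" records from the MeSH excerpt.
# parse_mesh_line is a hypothetical helper, not part of Mumps or the MDH.

def parse_mesh_line(line):
    """Split one record into (name, tuple of hierarchical code levels)."""
    # The name may contain commas and spaces, so split on the last
    # semicolon only, then break the code apart at each period.
    name, code = line.rsplit(";", 1)
    return name.strip(), tuple(code.strip().split("."))

SAMPLE = [
    "Cardiovascular System;A07",
    "Blood Vessels;A07.231",
    "Aorta, Abdominal;A07.231.114.056.205",
]

records = [parse_mesh_line(line) for line in SAMPLE]
for name, levels in records:
    # The number of levels is the depth of the item in the hierarchy.
    print(len(levels), levels, name)
```

Each tuple of code levels can then serve as the sequence of indices that locates the item in a tree.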

In Mumps, this can be represented as:

[Figure: the MeSH excerpt stored in the global array ^mesh()]

In the above, a sparse, disk-resident global array named ^mesh() is created to store the items being described. Each array reference consists of a sequence of codes from the MeSH hierarchy, and the text of the item described is stored at the final leaf node.
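
As a rough in-memory analogue of this idea, the following Python sketch stores MeSH names in a tree addressed by sequences of code strings and then lists children and finds the next sibling. It is only an illustration under stated assumptions: the class Tree and its methods set, get, children and next_sibling are hypothetical names, nothing is written to disk, and none of this is the Mumps or MDH interface.

```python
# Illustrative in-memory analogue of a hierarchical, string-indexed array.
# Tree, set, get, children and next_sibling are hypothetical names;
# this is not the Mumps or MDH API and nothing is stored on disk.

class Tree:
    def __init__(self):
        self.value = None   # data stored at this node, if any
        self.kids = {}      # index string -> child Tree

    def _node(self, keys, create=False):
        """Walk down the tree along keys, optionally creating nodes."""
        node = self
        for key in keys:
            if key not in node.kids:
                if not create:
                    return None
                node.kids[key] = Tree()
            node = node.kids[key]
        return node

    def set(self, keys, value):
        self._node(keys, create=True).value = value

    def get(self, keys):
        node = self._node(keys)
        return None if node is None else node.value

    def children(self, keys):
        """Return the child indices below keys in sorted order."""
        node = self._node(keys)
        return [] if node is None else sorted(node.kids)

    def next_sibling(self, keys):
        """Return the next index at the same level, or None at the end."""
        later = [s for s in self.children(keys[:-1]) if s > keys[-1]]
        return later[0] if later else None

mesh = Tree()
mesh.set(("A07",), "Cardiovascular System")
mesh.set(("A07", "231"), "Blood Vessels")
mesh.set(("A07", "231", "114"), "Arteries")
mesh.set(("A07", "231", "114", "056"), "Aorta")
mesh.set(("A07", "231", "114", "060"), "Arterioles")
mesh.set(("A07", "231", "114", "085"), "Axillary Artery")

print(mesh.get(("A07", "231", "114", "056")))
print(mesh.children(("A07", "231", "114")))
print(mesh.next_sibling(("A07", "231", "114", "056")))
```

The tree is sparse in the sense that only the index paths actually stored occupy any space; asking for a path that was never set simply returns None.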

The upper limit on global arrays in this version is 256TB.

To legacy Mumps, Mumps/II adds the following features:

  1. Relational database access. Mumps/II interoperates with PostgreSQL, a widely used, free, open source (Berkeley license) RDBMS. Mumps/II can access PostgreSQL databases as well as store the Mumps/II hierarchical and multidimensional file system in PostgreSQL tables.
  2. Advanced text processing support. Mumps/II adds many functions to the legacy Mumps base, including Smith-Waterman sequence alignment, pattern matching through the Perl Compatible Regular Expression Library, the Cosine, Jaccard, and Dice similarity coefficients, and a number of matrix manipulation routines.
  3. Shell scripting. Mumps/II has facilities to interact fully with the underlying operating system through shell scripts. These permit a full range of system functions to be directly executed from the Mumps/II environment.
  4. Translation to and compatibility with C++. The Mumps/II compiler translates Mumps/II programs to standard C++. Thus, Mumps/II programs can call upon the complete resources of the C++ runtime environment. Mumps/II programs may contain embedded C++ statements and there is a C++ class hierarchy to give user written C++ programs access to all Mumps/II facilities.
Mumps is essentially an interpreted language as Mumps commands can write and execute code.
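
For readers unfamiliar with the similarity coefficients named in item 2 above, the following Python sketch gives their textbook definitions. It is an illustration only: the function names jaccard, dice and cosine and the sample term lists are assumptions made here, and the sketch says nothing about how the Mumps/II functions are named, called or implemented.

```python
import math

# Textbook definitions of three similarity coefficients.
# Function names and sample data are illustrative assumptions,
# not the Mumps/II built-in functions.

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    """Twice the intersection size divided by the sum of the set sizes."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def cosine(u, v):
    """Cosine of the angle between two term -> weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc1 = ["aorta", "artery", "carotid"]
doc2 = ["artery", "carotid", "sinus", "vein"]

print(jaccard(doc1, doc2))   # 2 shared terms out of 5 distinct terms
print(dice(doc1, doc2))      # 2 * 2 shared terms over 3 + 4 terms
print(cosine({"artery": 1.0, "carotid": 1.0}, {"artery": 1.0}))
```

All three return values between 0 and 1 for non-negative weights, with 1 meaning the two term sets or vectors coincide.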

The MDH (Multi-Dimensional and Hierarchical Data Base Toolkit) is a Linux/Cygwin based, open source toolkit of portable software that makes many features of Mumps available to C++ programs. It supports very fast, flexible, multi-dimensional and hierarchical storage, retrieval and manipulation of databases ranging in size up to 256 terabytes. The package is written in C and C++ and is available under the GNU GPL/LGPL licenses in source code form. You must install the Mumps Compiler in order to use the MDH.


Documentation and Installation

See the MDH manual for details of the Multi-Dimensional and Hierarchical Toolkit C++ library.

See the Mumps/II Interpreter/Compiler manual for details of the interpreter and compiler. A file in the distribution named mumpsc/doc/compiler.html has additional details. (--> author makes shameless profit from sales of book <--).

Unpack the distribution with a command such as:

tar xvzf mumpscompiler-11.0.src.tar.gz

which creates a sub-directory named mumpsc. Then, in mumpsc, as root, run:

configure prefix=/usr
make
make install

Use:

configure prefix=/usr --with-cpu64

if you have a 64-bit CPU and are using Linux.

See the PDF for advice if your system hides its libraries somewhere odd.

You may need to install additional software on your Linux or Cygwin system. Please see the documentation for details. Mumps needs:

  1. The Perl Compatible Regular Expression Development Library (required)
  2. The PostgreSQL RDBMS (optional)

If you use this work, please cite: O'Kane, Kevin C. (1999), "An M Compiler for Internet server applications", M Computing, 7(1):11-17.

and/or:

http://www.cs.uni.edu/~okane


License

The Mumps Compiler is distributed under the GNU GPL and GNU LGPL licenses. Please see each source module to determine which license applies. Generally speaking, the compiler itself is distributed under the GNU GPL license and the runtime libraries under the GNU LGPL. Copies of the licenses are included in the distributions along with copyright information. The PCRE code is distributed under its own license.


Kevin C. O'Kane
http://www.cs.uni.edu/~okane

