Mumps/MDH Toolkit
MDH: The Multi-Dimensional and Hierarchical
Database Toolkit Programmer's Guide
Version 2.1

Kevin C. O'Kane, Ph.D.
Computer Science Department
University of Northern Iowa
Cedar Falls, IA 50614
okane@cs.uni.edu
http://www.cs.uni.edu/~okane
March 1, 2007

Except as otherwise noted, this document is Copyright (c) 2004, 2006 Kevin C. O'Kane, Ph.D.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being: Page 1, with the Front-Cover Texts being: Page 1, and with the Back-Cover Texts being: no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".


The software is distributed under one of the following licenses (please see each source code module for specific copyright and license details applicable to that module). In general, the compiler itself is distributed under the GNU GPL license and the run-time support routines are distributed under the GNU LGPL.

  1. GNU General Public License

    This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

  2. GNU Lesser General Public License

    This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Full texts of the licenses appear at the end of this document. Programs may call upon the Perl Compatible Regular Expression Library which, in some cases, is distributed with the Mumps Compiler. The separate license and copyright statement for PCRE appears in Appendix B. You should also read the license provided with the Berkeley Data Base (http://www.sleepycat.com).


Contents

Function and Macro Library
Mumps Related Functions
  1. Ascii()
  2. CleanAllLocks()
    CleanLocks()
  3. cvt()
  4. Data()
  5. Dump()
    Restore()
  6. Eval()
  7. Extract()
  8. Find()
  9. Horolog()
  10. Justify()
  11. Kill()
  12. Length()
  13. Lock()
  14. mcvt()
  15. Merge()
  16. Name()
  17. Order()
  18. Pattern()
  19. Perl()
  20. Piece()
  21. Query()
    Qlength()
    Qsubscript()
  22. ReadLine()
  23. SymGet()
  24. SymPut()
  25. $test
  26. Token()
    TokenInit()
  27. UnLock()
  28. xecute()
    Xecute()
  29. Zseek()
    Ztell()
Vector and Matrix Functions
  1. Avg()
  2. Count()
  3. Max()
  4. Min()
  5. Multiply()
  6. Sum()
  7. Transpose()
Text Processing, Searching and Retrieval Functions
  1. Boyer-Moore-Gosper Functions
  2. Centroid()
  3. Correlation Functions
    TermCorrelate()
    DocCorrelate()
  4. Inverse Document Frequency Function
    IDF()
  5. Shred() functions
  6. ShredQuery() functions
  7. Similarity functions
    Sim1()
    Cosine()
    Jaccard()
    Dice()
  8. stem()
  9. Stop list functions
  10. Synonym functions
  11. ScanAlnum()
  12. Stem()
Bioinformatics Related Functions
  1. Smith-Waterman Alignment Function
Global Array Functions
  1. Global array access functions
  2. Arithmetic operations on global arrays
  3. Assignment operators on global arrays
  4. Closing global arrays: GlobalClose
  5. Interpreter Get, Set, Order and Data functions on global arrays
  6. ostream functions on global arrays
  7. TreePrint()
Pattern Matching Functions
  1. begins()
  2. decorate()
  3. ends()
  4. replace()
Miscellaneous Functions
  1. Btree Access Macro
  2. c_str()
  3. command()
  4. Conversion functions
  5. EncodeHTML()
  6. ErrorMessage()
  7. HitRatio()
  8. Hashing functions
  9. c_str()
  10. s_str()
  11. Perl pattern match function: $perl()
Error Exceptions
  1. ConversionException
  2. GlobalNotFoundException
  3. MumpsSymbolTableException
  4. NumericRangeException

  • Appendix A - Example Code
  • Appendix B - PCRE License
  • Appendix C - Using PERL Expressions
  • Appendix D - Mumps 95 Pattern Matching



    Part I - Programmer's Guide

    Software Distribution

    Source code distributions are available at: http://www.cs.uni.edu/~okane

    Important Notes:

    1. Details on installation of the Toolkit for Linux, Windows XP and Cygwin are contained in the Mumps Compiler Manual (compiler.html), which is part of the distribution package, as well as in the operating-system-specific "INSTALL" files contained in the distribution.

    2. Stack size can be an issue for some functions, most notably the Smith-Waterman alignment procedure. For Windows XP programs, the stack size is set in the file "mumpsc.bat" in the batch variable "STACK", which defaults to 5,000,000. It may be raised (or lowered) as needed. Under Linux, the stack size is controlled by the "ulimit" command and can be increased with:

      ulimit -s unlimited

      (Other options are ulimit -a and ulimit -aH to show limits).
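
    A program can also inspect the limit it is actually running under. The following is a minimal sketch for POSIX systems such as Linux; the function name soft_stack_limit is hypothetical and not part of the toolkit. It uses the standard getrlimit() call, which reports the limit in bytes.

```cpp
#include <sys/resource.h>

// Hypothetical helper (not a toolkit function), POSIX systems only.
// Returns the current soft stack limit in bytes, -1 if the limit is
// unlimited, or -2 if the limit could not be read.
long long soft_stack_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) return -2;
    if (lim.rlim_cur == RLIM_INFINITY) return -1;
    return static_cast<long long>(lim.rlim_cur);
}
```

    A return value of -1 corresponds to the state after "ulimit -s unlimited".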

    Introduction

    The MDH (Multi-Dimensional and Hierarchical) Database Toolkit is a Linux-based, open source toolkit of portable software that supports fast, flexible, multi-dimensional and hierarchical storage, retrieval and manipulation of information in data bases ranging in size up to 256 terabytes. The package is written in C and C++ and is available under the GNU GPL/LGPL licenses in source code form. The distribution kit contains demonstration implementations of network-capable, interactive text and sequence retrieval tools that function with very large genomic data bases and illustrate the toolkit's capability to manipulate massive data sets of genomic information.

    The toolkit is distributed as part of the Mumps Compiler. Versions exist for Linux, Cygwin, the DJGPP port of the GCC compiler for Windows XP and the command line version of the Microsoft Visual C++ Compiler.

    The toolkit is a solution to the problem of manipulating very large, character string indexed, multi-dimensional, sparse matrices. It is based on Mumps (also referred to as M), a general purpose programming language that originated in the mid 60's at the Massachusetts General Hospital. The toolkit supports access to the PostgreSQL relational data base server, the Perl Compatible Regular Expression Library, the Berkeley Data Base, and the Glade GUI builder as well as server-side development of interactive web pages.

    The principal database feature in this project is the global array which permits direct, efficient manipulation of multi-dimensional arrays of effectively unlimited size. A global array is a persistent, sparse, undeclared, multi-dimensional, string indexed, disk based data structure. A global array may appear anywhere an ordinary array reference is permitted and data may be stored at leaf nodes as well as intermediate nodes in the data base array. The number of subscripts in an array reference is limited only by the total length of the array reference with all subscripts expanded to their string values. The toolkit includes several functions to traverse the data base and manipulate the arrays.

    The toolkit makes the data base and function set available as C++ classes and also permits execution of legacy Mumps scripts. To use the toolkit, you install the MDH and Mumps distribution kit and related code.

    Creating Global Arrays

    The class, function and macro libraries primarily operate on global arrays. Global arrays are undimensioned, string indexed, disk resident data structures whose size is limited only by available disk space. They can be viewed either as multi-dimensional sparse matrices or as tree structured hierarchies. Global arrays are a C++ class and must be declared or instantiated in your C++ program as an instance of the global. For example, to create the global named "gbl", do the following:

          #include <mumpsc/libmpscpp.h>
    
          global gbl("gbl");
    
    The instantiation consists of two parts: the name of the global array object and the name of the global array on disk associated with this object. In the above example, these are both "gbl". Note that the disk name of the global is enclosed in a parenthesized character string expression following the object name. The name in the expression need not (but usually does) match the name of the object. The name given in the parenthesized character string is the disk name of the global array. The global array object is associated with the disk name when the object is created. When the object is destroyed, the disk based global array persists.

    Global objects may be created through declarations as shown above or dynamically:

          global *gptr;
          gptr = new global ("gbl_name");
          (*gptr)("1","2","3") = "test";
    
    which is equivalent to:
          global g("gbl_name");
          g("1","2","3") = "test";
    
    The #include <mumpsc/libmpscpp.h> statement brings in the necessary header files for your C++ program. These include, in addition to the header files necessary to access the toolkit, the standard system libraries:

          #include <iostream>
          #include <iomanip>
          #include <string>
          #include <string.h>
          #include <math.h>
          #include <stdlib.h>
    

    These are referenced at the beginning of libmpscpp.h.in and you may modify them if your system uses different naming conventions.

    Each global declaration creates a global array name (gbl) to be an object or instance of the global class. Each global array you use must be first declared to be an object of the global class. Global names can be any valid C/C++ variable name.

    A global array will typically have one or more subscripts as discussed below. These will be of type mstring or null terminated arrays of char. Subscripts of global arrays must consist of printable characters in the range from decimal 32 (space) up to, but not including, tilde (~).

    Note:

    No data types other than mstring, or null terminated array of char (i.e., char *) may be used as subscripts. Numeric data types (int, short, long, float, double, etc.) may not be used as global array subscripts.

    Also, in any given global array reference, all the indices must be of the same data type (mstring or char *).
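
    The character range rule for subscripts can be checked before a string is used as an index. The following is a minimal sketch in standard C++; the helper name is_valid_subscript is hypothetical and is not part of the toolkit. It accepts only characters from decimal 32 (space) up to, but not including, tilde (decimal 126).

```cpp
#include <string>

// Hypothetical helper (not a toolkit function): returns true when every
// character of s lies in the range 32 (space) through 125, i.e. below '~'.
// An empty string passes the check trivially.
bool is_valid_subscript(const std::string &s) {
    for (unsigned char c : s) {
        if (c < 32 || c >= 126) return false;
    }
    return true;
}
```

    For instance, "July 12, 2003" passes the check while a string containing a tab or a tilde does not.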

    mstring is a data type (class) whose behavior is similar to the basic typeless string data type in Mumps. Objects of mstring are stored internally as strings but may contain text, integers and floating point values. Addition, multiplication, subtraction, division, modulo, and concatenation may be performed directly on mstring objects (see details below). Many of the following examples use mstring objects.

    Structure of Global Arrays

    Global arrays may be viewed either as multi-dimensional matrices or as tree structured hierarchies. As matrices, data may be stored not only at fully subscripted matrix elements but also at other levels. For example, given a three dimensional matrix mat1, you could initialize it as follows:

          #include <mumpsc/libmpscpp.h>

          global mat1("mat1");

          int main() {
          mstring i,j,k;
          for (i=0; i<100; i++)
                for (j=0; j<100; j++)
                      for (k=0; k<100; k++) {
                            mat1(i,j,k)=0;
                            }
          GlobalClose;
          return 0;
          }

    Alternatively, the above can be performed with int but the numeric indices must be converted to mstring before use:

          #include <mumpsc/libmpscpp.h>

          global mat1("mat1");

          int main() {
          int i,j,k;
          for (i=0; i<100; i++)
                for (j=0; j<100; j++)
                      for (k=0; k<100; k++) {
                            mat1(mcvt(i),mcvt(j),mcvt(k))=0;
                            }
          GlobalClose;
          return 0;
          }

    In this example, all the elements of a three dimensional matrix of 100 rows, 100 columns and 100 planes are initialized to zero. The function mcvt() converts from int to mstring.

    In the view expressed by the code above, the matrix is a traditional three dimensional structure with data stored at each fully indexed position or node.

    Unlike other programming languages, however, there are additional nodes of the matrix which could have been initialized such as indicated by the following example:

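          #include <mumpsc/libmpscpp.h>

          global mat1("mat1");

          int main() {
          mstring i,j,k;
          for (i=0; i<100; i++) {
                mat1(i)=i;                  // value stored at the first level
                for (j=0; j<100; j++) {
                      mat1(i,j)=j;          // value stored at the second level
                      for (k=0; k<100; k++) {
                            mat1(i,j,k)=0;  // value stored at the third level
                            }
                      }
                }
          GlobalClose;
          return 0;
          }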

    In effect, this means that mat1 can also be a single dimensional vector, a two dimensional matrix and a three dimensional matrix simultaneously.

    Furthermore, not all elements of a matrix need exist. That is, the matrix can be sparse. For example:

          #include <mumpsc/libmpscpp.h>

          global mat1("mat1");

          int main() {
          mstring i,j,k;
          for (i=0; i<100; i=i+10) {
                for (j=0; j<100; j=j+10) {
                      for (k=0; k<100; k=k+10) {
                            mat1(i,j,k)=0;
                            }
                      }
                }
          GlobalClose;
          return 0;
          }
    

    In the above, only index values 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90 are used to create each of the dimensions of the array and only those elements of the matrix are created. The omitted elements do not exist.
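
    To make the sparseness concrete, the sketch below models the array in memory with a std::map keyed by the three subscripts. This is only a stand-in for illustration (an assumption, not how the toolkit stores globals on disk), and the names SparseModel, build_sparse_model and node_exists are hypothetical. It shows that stepping each index by 10 creates 10 x 10 x 10 = 1,000 nodes rather than the 1,000,000 of the fully populated matrix.

```cpp
#include <map>
#include <string>
#include <tuple>

// In-memory stand-in for a three-subscript global array (illustration only).
using Key = std::tuple<std::string, std::string, std::string>;
using SparseModel = std::map<Key, std::string>;

// Stores a value only at every tenth index in each dimension.
SparseModel build_sparse_model() {
    SparseModel m;
    for (int i = 0; i < 100; i += 10)
        for (int j = 0; j < 100; j += 10)
            for (int k = 0; k < 100; k += 10)
                m[Key(std::to_string(i), std::to_string(j), std::to_string(k))] = "0";
    return m;
}

// A node exists only if something was stored at it.
bool node_exists(const SparseModel &m, const std::string &i,
                 const std::string &j, const std::string &k) {
    return m.count(Key(i, j, k)) != 0;
}
```

    In this model, node_exists(m, "10", "20", "30") is true while node_exists(m, "1", "2", "3") is false, because nothing was ever stored at the latter position.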

    For example, if you are running a drug protocol on a number of patients and are dosing with medications M1, M2, M3, ... on patients P1, P2, P3, ... and collecting observations on days D1, D2, D3, ... you could create a three dimensional matrix named protocol in which each plane consisted of the observations for each patient on each medication for a given day:

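    +----+----------------+----------------+----------------+----------------+
    |    |       D1       |       D2       |       D3       |       D4       |
    |    | M1 M2 M3 M4 M5 | M1 M2 M3 M4 M5 | M1 M2 M3 M4 M5 | M1 M2 M3 M4 M5 |
    +----+----------------+----------------+----------------+----------------+
    | P1 |                |                |                |    X           |
    | P2 |                |                |                |                |
    | P3 |                |                |                |                |
    +----+----------------+----------------+----------------+----------------+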

    You could refer to patient P1, medication M2 on day D4 with the reference:

    protocol("P1","M2","D4")="X";

    Alternatively, you can view the same data base as a tree structure with patient id at the root, followed by medication, followed by day of study:

    Note that at each node in the tree, a data box may appear containing information about the node. Addressing a node is accomplished by giving its path description such as:

    protocol("P2","M2","D2")

    Compiling Programs

    To compile programs written in C++ that use the MDH (Multi-Dimensional and Hierarchical) library, use the command:

          mumpsc myprog.cpp
    

    This will invoke the g++ compiler and make available the necessary libraries. The result will be a program named myprog.cgi which is executable. The cgi extension is used as the default because very often these programs may be used in connection with web servers. You may rename the program as you see fit, however. The script mumpsc is part of the Mumps Compiler which must be installed prior to using the toolkit.

    Accessing Global Arrays

    Note: prior to exiting a program that accessed global arrays, you must execute the GlobalClose macro to shut down the global array facility. This flushes the system buffers to disk and ensures that the file system is properly closed. Failure to do this will result in data base errors. This appears in your program as:

    GlobalClose;

    You may assign global array elements to variables of type mstring using the assignment operator (=).

    You may assign values of type int, float, double, mstring, string and char * to global array elements using the assignment operator (=).

    When global array references are passed to a function, no more than one instance of the same global object should be used in the argument list. Each global object maintains a private static string which contains the most recent value fetched from the data base. When a global object is passed to a function, this string value is effectively passed. This means that, in a function reference where two references to the same global object are passed, even though they have differing indices, the value passed will be the value for the second instance of the global. This restriction only applies where there are two or more instances of the same global.
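
    The restriction can be pictured with a simplified model. In the sketch below, the class SharedBufferModel and the functions receive and receive_safely are hypothetical and only imitate the behavior described above (they are not the toolkit's code): every fetch overwrites one shared buffer and hands back a reference to it, so two fetches in a single argument list end up referring to the same value.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical model (not the toolkit's code) of an object that keeps the
// most recently fetched value in a single shared buffer.
class SharedBufferModel {
public:
    void store(const std::string &index, const std::string &value) {
        data_[index] = value;
    }
    // Overwrites the shared buffer and returns a reference to it.
    const std::string &fetch(const std::string &index) {
        buffer_ = data_[index];
        return buffer_;
    }
private:
    std::map<std::string, std::string> data_;
    std::string buffer_;
};

// Receives two references; both alias the same buffer when they come from
// the same SharedBufferModel object.
std::pair<std::string, std::string> receive(const std::string &a,
                                            const std::string &b) {
    return std::make_pair(a, b);
}

// Copying the first value into a local variable avoids the aliasing.
std::pair<std::string, std::string> receive_safely(SharedBufferModel &g) {
    std::string first = g.fetch("1");   // private copy of the first value
    return receive(first, g.fetch("2"));
}
```

    In this model, receive(g.fetch("1"), g.fetch("2")) always sees two identical values, whereas receive_safely() sees both stored values. Copying one element into a separate variable before the call is the corresponding workaround suggested by the model.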

    If you use a reference to a global without a parenthesized list following the name of the global, the reference will be to the most recent referenced global. Effectively, this is similar to the "naked indicator" from Mumps.

    Global Array Indices

    Internally, the indices of global arrays are always stored as character strings. If you initialize a global array with a loop, you must ensure that the indices are represented as either values of type mstring or null terminated arrays of type char. Indices to globals may be either char * or mstring but MUST all be of the same type (i.e. all char * or all mstring). For example:

          mstring A,B,C;
          for (A=0; A<1000; A++)
                for (B=0; B<1000; B++)
                      for (C=0; C<1000; C++) {
                            array1(A,B,C) = "0";
                            }
    

    The above initializes an array of 1 billion elements to zero.

    Navigating Globals

    There are several builtin functions used to navigate the globals. The two most important are the Data() function and the Order() function. The Data() function tells you if a node exists and if it has descendants and the Order() function gives you the next higher (or lower) index at a given level in the global array tree.

    The Data() function returns an integer which indicates whether the global array node is defined:

    1. 0 if the global array node is undefined and has no descendants;
    2. 1 if it is defined and has no descendants;
    3. 10 if it has descendants but no value is stored at the node itself;
    4. 11 if it is defined and has descendants.

    A global array node is defined if data has been stored at it. A "10" is returned for a node at which nothing has been stored but which has descendants. For example, assuming the global array has only the contents created in the example below:

          global array1("array1");
    
          int result;
    
          array1("1","11") = "foo";
          array1("1","11","21") = "bar";
    
          result = array1("1").Data() ;            // yields 10
          result = array1("1","11").Data();        // yields 11
          result = array1("1","11","21").Data();   // yields 1 
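
    The four return codes can be derived from two independent facts: whether a value is stored at the node and whether the node has descendants. The sketch below computes the code over an in-memory std::map keyed by the subscript path. The names Path, TreeModel and data_code are hypothetical and the map is only a stand-in for a global array, not the toolkit's implementation.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using Path = std::vector<std::string>;          // one entry per subscript
using TreeModel = std::map<Path, std::string>;  // stand-in for a global array

// Returns 0, 1, 10 or 11 following the Data() convention:
// +1 when a value is stored at the node, +10 when it has descendants.
int data_code(const TreeModel &t, const Path &node) {
    int code = 0;
    if (t.count(node) != 0) code += 1;
    for (const auto &entry : t) {
        const Path &p = entry.first;
        // p is a descendant if it is longer than node and starts with node.
        if (p.size() > node.size() &&
            std::equal(node.begin(), node.end(), p.begin())) {
            code += 10;
            break;
        }
    }
    return code;
}
```

    With values stored only at ("1","11") and ("1","11","21"), this model yields 10, 11 and 1 for the three references in the example, and 0 for any unrelated path.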
    

    The other major navigation function is the Order() function. This gives you, for a given global array index, the next ascending or descending value for the last index. If the parameter to Order() is 1 or missing, the next ascending index is returned. If the parameter is -1, the next descending index is returned. To get the first (or last if the parameter is -1) value of an index, start with a null (empty) string. For example:

          mstring x, null;
          global array1("array1");
    
          array1("100") = "a";            // initialize the array with three entries
          array1("200") = "b";
          array1("300") = "c";
    
          null = "";
          
          x = array1(null).Order();       // get the first value of the first index: 100
          
          x = array1(x).Order();          // get the second value of the first index: 200
    
          x = array1(x).Order();          // get the third value of the first index: 300
    
          x = array1(x).Order();          // no more indices - returns empty string
    
          x = array1(null).Order(-1);     // get the last value of the first index: 300
          
          x = array1(x).Order(-1);        // get the second value of the first index: 200
    
          x = array1(x).Order(-1);        // get the first value of the first index: 100
    
          x = array1(x).Order(-1);        // no more indices - returns empty string
    
          for ( x = array1(null).Order(); x != null; x = array1(x).Order()) 
    
                   cout << x << endl;     // writes 100 200 300 on separate lines
    
          for ( x = array1(null).Order(-1); x != null; x = array1(x).Order(-1)) 
    
                   cout << x << endl;     // writes 300 200 100 on separate lines
    
          for ( x = 10; x < 100; x = x + 10) array1("200" , x) = x;
    
          for ( x = array1("200", null).Order(); x != null; x = array1("200", x).Order()) 
    
                   cout << x << endl;     // writes 10 20 30 ... 90 on separate lines
          
    
    

    Each call to Order() gives the next value of the last index. The numeric parameter indicates if the direction is ascending (1) or descending (-1). If omitted, 1 is assumed. To get the first index, the empty string is supplied and the function returns the first index of the global array. For subsequent calls, it returns the next ascendant index value until there are no more indices. Then it returns the empty string.
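
    The stepping behavior of Order() at a single index level can be imitated with an ordered container. The sketch below is a model only: the names Level and order_model are hypothetical, and std::map compares keys byte by byte, which may differ from the collating order the toolkit uses for its indices.

```cpp
#include <iterator>
#include <map>
#include <string>

using Level = std::map<std::string, std::string>;  // one level of indices

// Hypothetical model of Order(): returns the index after (direction 1) or
// before (direction -1) the given one, or "" when there are no more.
// An empty index means "start from the beginning" (or from the end).
std::string order_model(const Level &level, const std::string &index,
                        int direction = 1) {
    if (level.empty()) return "";
    if (direction >= 0) {
        Level::const_iterator it =
            index.empty() ? level.begin() : level.upper_bound(index);
        return it == level.end() ? "" : it->first;
    }
    if (index.empty()) return std::prev(level.end())->first;
    Level::const_iterator it = level.lower_bound(index);
    return it == level.begin() ? "" : std::prev(it)->first;
}
```

    With the keys "100", "200" and "300", repeated calls starting from the empty string walk the keys in ascending order and then return the empty string; passing -1 walks them in descending order.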

    In the following example, we build a global array vector from an input file consisting of keywords, one per line, keep a count of the number of times each keyword is used and, at the end, print an alphabetized list of the keywords, each followed by the number of times it occurs:

        #include <mumpsc/libmpscpp.h>
        global key("key"); 
    
        int main() { 
    
        mstring word, null;
        long i; 
    
        null = "";
    
        while (1) {
            if ( ! word.ReadLine(cin)) break;
            if (key(word).Data())      // is word in vector?
                key(word)++;           // yes, increment count
            else key(word) = 1;        // not in vector - add
            } 
    
        word = null;
    
        while ((word = key(word).Order(1)) != null) // next word
    
          cout << word << " " << key(word) << endl; // print word and count
    
        GlobalClose;

        return EXIT_SUCCESS;
        } 
    

    In the above, each line is read into the variable word until the end of file is reached. Each word is tested with the Data() function of the global array to determine if word exists in the key vector. Data() returns zero if the element does not exist and non-zero if it does. If the word is already in the key global array vector, the count stored for it is incremented in place with the ++ operator. If the word does not exist in the vector, it is added and its initial count is set to one.

    When all the words have been read and stored into the vector, the program sequences through the word entries and prints the words and the total number of times each one was present in the input file. Since global arrays are stored in ascending key order, the display of words will be alphabetic.

    Similarly, given a global array of patient lab data organized hierarchically first by patient id, then by lab test, then by date, we can print a table of patient id's, labs, dates and results with the following:

          #include <mumpsc/libmpscpp.h>
    
          global Labs("labs");
    
          int main() {
    
          mstring null, ptid, lab_test, date, rslt;
    
          null = "";
    
          // create dummy example data base
    
          Labs("1000","hct","July 12, 2003")="45";
          Labs("1000","hct","July 13, 2003")="46";
          Labs("1000","hct","July 14, 2003")="47";
          Labs("1000","hct","July 15, 2003")="48";
          Labs("1000","hgb","July 12, 2003")="15";
          Labs("1000","hgb","July 15, 2003")="14";
          Labs("1001","hct","July 12, 2003")="35";
          Labs("1001","hct","July 13, 2003")="36";
          Labs("1001","hct","July 14, 2003")="37";
          Labs("1001","hct","July 15, 2003")="38";
          Labs("1001","hgb","July 13, 2003")="15";
          Labs("1001","hgb","July 14, 2003")="15";
          Labs("1002","hct","Sept 12, 2003")="35";
          Labs("1002","hct","Sept 13, 2003")="36";
          Labs("1002","hct","Sept 14, 2003")="37";
          Labs("1002","hct","Sept 15, 2003")="38";
          Labs("1002","hgb","Sept 13, 2003")="15";
          Labs("1002","hgb","Sept 14, 2003")="15";
    
          ptid = null;
    
          while ( (ptid = Labs(ptid).Order(1)) != null) {
    
              lab_test = null;
    
              while ( (lab_test = Labs(ptid,lab_test).Order(1)) != null) {
    
                  date = null;
    
                      while ( (date = Labs(ptid,lab_test,date).Order(1)) != null) {
    
                          cout << ptid << " " << lab_test << " " << date ;
    
                          cout << " " << Labs(ptid,lab_test,date) << endl;
    
                          }
                      }
                  }
    
          GlobalClose;
    
          return EXIT_SUCCESS;
          }
    

    The above begins with an empty string for patient id ptid. This is used at the outer loop level to cycle through all the patient ids. At the first nested loop, the program cycles through all the lab test names (lab_test), then, at the innermost level, it cycles through all the dates (date). The resulting table is of the form:

          1000 hct July 12, 2003 45
          1000 hct July 13, 2003 46
          1000 hct July 14, 2003 47
          1000 hct July 15, 2003 48
          1000 hgb July 12, 2003 15
          1000 hgb July 15, 2003 14
          1001 hct July 12, 2003 35
          1001 hct July 13, 2003 36
          1001 hct July 14, 2003 37
          1001 hct July 15, 2003 38
          1001 hgb July 13, 2003 15
          1001 hgb July 14, 2003 15
    

    Locking the Data Base

    There are several functions for locking portions of the data base. Following legacy convention, a lock does not prevent access to an element but merely flags the element as locked. Locking views a global array as a tree structure. If an element is locked, its descendants are locked. An attempt to lock a locked element, or an element that has a locked parent or a locked descendant, will fail. The primary locking functions are $lock(), Lock() and UnLock():

          if ($lock(gbl(a,b,c))) cout << "locked" << endl;
          if (gbl(a,b,c).Lock()) cout << "locked" << endl;
          gbl(a,b,c).UnLock();
    

    The $lock() and Lock() functions test whether the node can be locked and lock it if possible. They return true (1) if successful and false (0) otherwise ($test is set accordingly). A node can be locked if it is not itself locked, if it has no locked descendants and if it is not the descendant of a locked node. The UnLock() function releases a lock on a node.
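
    The three conditions for granting a lock amount to a prefix test on subscript paths. The sketch below models them with a set of locked paths; the names LockTable, is_prefix, acquire_lock and release_lock are hypothetical and this is not the toolkit's lock implementation (in particular, it ignores which process holds a lock).

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

using Path = std::vector<std::string>;  // one entry per subscript
using LockTable = std::set<Path>;       // paths that are currently locked

// True when a is a prefix of b (a is b itself or an ancestor of b).
bool is_prefix(const Path &a, const Path &b) {
    return a.size() <= b.size() && std::equal(a.begin(), a.end(), b.begin());
}

// Hypothetical model of the locking rule: the lock is granted only if the
// node, its ancestors and its descendants are all currently unlocked.
bool acquire_lock(LockTable &locks, const Path &node) {
    for (const Path &held : locks) {
        if (is_prefix(held, node) || is_prefix(node, held)) return false;
    }
    locks.insert(node);
    return true;
}

// Releases the lock on exactly this node, if one is held.
void release_lock(LockTable &locks, const Path &node) {
    locks.erase(node);
}
```

    In this model, once ("a","b") is locked, attempts on ("a","b"), ("a") and ("a","b","c") all fail, while the sibling ("a","x") can still be locked.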

    Additionally, there are functions to release all locks for the current process and all locks for all processes:

        CleanLocks();      // release all locks for this process only
        CleanAllLocks();  // release all locks for all processes
    

    Invoking the Mumps Interpreter

    The full facilities of the Mumps interpreter can be invoked from C++ programs. The interpreter reads, parses and executes commands presented to it at run time. It may also read and execute text files containing Mumps programs. The interpreter is invoked by means of the Xecute() macro and xecute() functions:

    int Xecute("command")
    int xecute(mstring command)
    int xecute(string command)
    int xecute(char * command)

    These functions and the macro invoke the Mumps interpreter and execute the text supplied as "command". They return 1 if successful, 0 otherwise. With Xecute(), if the Mumps command contains quotes or other special symbols, they will be automatically prefixed with backslashes (e.g., a quote becomes \").

    Xecute("set i="test"");
    Xecute("for  s i=$order(^a(i)) quit:i=""  set sum=sum+^a(i)");
    

    Details on the Mumps Language are contained in the file compiler.html in the mumpsc/doc subdirectory of the Mumps Compiler distribution. See also: mstring::Eval() for expression interpretation.

    Writing Active Web Server Pages

    C++ programs can be written with the toolkit to be web server active pages. For example:

    Web page HTML code:

          <html>
          <head>
          <title>Your title goes here</title>
          </head>
          <body bgcolor=silver>
          <form method="get" action="quiz2.cgi">
          <center>
          Name: <input type="text" name="name" size=40 value=""> <br>
          </center>
          Class:
          <input type="Radio" name="class" value="freshman" > Freshman
          <input type="Radio" name="class" value="sophomore" > Sophomore
          <input type="Radio" name="class" value="junior" > Junior
          <input type="Radio" name="class" value="senior" checked> Senior
          <input type="Radio" name="class" value="grad" > Grad Student
          <br>
          Major:
          <select name="major" size=1>
          <option value="computer science" >Computer Science
          <option value="mathematics" >Mathematics
          <option value="biology" selected>Biology
          <option value="chemistry" >Chemistry
          <option value="earth science" >Earth Science
          <option value="industrial technology" >Industrial Technology
          <option value="physics" >Physics
          </select>
          <table border>
          <tr>
          <td valign=top> Hobbies: </td>
          <td>
          <input type="Checkbox" name="hobby1" value="stamp collecting" > Stamp Collecting<br>
          <input type="Checkbox" name="hobby2" value="art" > Art<br>
          <input type="Checkbox" checked name="hobby3" value="bird watching" > Bird Watching<br>
          <input type="Checkbox" name="hobby4" value="hang gliding" > Hang Gliding<br>
          <input type="Checkbox" name="hobby5" value="reading" > Reading<br>
          </td></tr>
          </table>
          <input type="submit" value="go for it">
          </form>
          </body>
          </html>

    A C++ program can accept data from the web page, store the data in global arrays and return a summary web page to the browser. When using "get" mode data transmission from HTML forms, the form names and data are concatenated into a string, delimited by ampersands, containing "name=value" tokens. These are passed in an environment variable named QUERY_STRING. The include file mumpsc/cgi.h contains code to extract data from QUERY_STRING and store the data in the runtime symbol table. The function SymGet() can be used to retrieve values from the runtime symbol table.

          #include <mumpsc/libmpscpp.h>

          global T("T");

          int main() {

          mstring name;
          mstring cls;          // "class" is a reserved word in C++
          mstring major;
          mstring hobby1;
          mstring hobby2;
          mstring hobby3;
          mstring hobby4;
          mstring hobby5;

          #include <mumpsc/cgi.h>

          cout << "Content-type: text/html " << endl << endl;

          name = SymGet("name");
          cls = SymGet("class");
          major = SymGet("major");
          hobby1 = SymGet("hobby1");
          hobby2 = SymGet("hobby2");
          hobby3 = SymGet("hobby3");
          hobby4 = SymGet("hobby4");
          hobby5 = SymGet("hobby5");

          cout << "<html><body>";

          if (name == "") {
                cout << "Name not specified <br> ";
                cout << "</body></html>" << endl;
                return EXIT_FAILURE;
                }

          T(name, mcvt("class")) = cls;
          T(name, mcvt("major")) = major;

          if (hobby1.Length() != 0) T(name, mcvt("hobbies"), hobby1) = "";
          if (hobby2.Length() != 0) T(name, mcvt("hobbies"), hobby2) = "";
          if (hobby3.Length() != 0) T(name, mcvt("hobbies"), hobby3) = "";
          if (hobby4.Length() != 0) T(name, mcvt("hobbies"), hobby4) = "";
          if (hobby5.Length() != 0) T(name, mcvt("hobbies"), hobby5) = "";

          cout << "Thank you " << name << " for your input<br>";
          cout << "</body></html>" << endl;

          GlobalClose;
          return EXIT_SUCCESS;
          }

    Note: you can test code by simulating input from a web browser with the following code:
    #!/bin/bash
    QUERY_STRING="abc=xyz&cde=123"
    export QUERY_STRING
    your_program.cgi
    

    The "name=value" pairs (delimited by ampersands) will be passed to the program. Note: the web server CGI protocol requires the value strings to be encoded (see EncodeHTML()).
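
    For illustration only, the sketch below shows how a "get" query string could be split into name/value pairs using just the standard C++ library. It is not the code in mumpsc/cgi.h; the function name parse_query_string is hypothetical, and decoding of plus signs and %xx escapes is deliberately omitted.

```cpp
#include <map>
#include <string>

// Hypothetical helper: split "a=1&b=2" into {a -> 1, b -> 2}.
// Tokens without '=' are stored with an empty value; empty tokens are skipped.
std::map<std::string, std::string> parse_query_string(const std::string& qs) {
    std::map<std::string, std::string> out;
    std::size_t pos = 0;
    while (pos <= qs.size()) {
        std::size_t amp = qs.find('&', pos);
        if (amp == std::string::npos) amp = qs.size();
        std::string token = qs.substr(pos, amp - pos);
        if (!token.empty()) {
            std::size_t eq = token.find('=');
            if (eq == std::string::npos) out[token] = "";
            else out[token.substr(0, eq)] = token.substr(eq + 1);
        }
        pos = amp + 1;   // continue after the ampersand
    }
    return out;
}
```

    In a CGI program the argument would typically come from std::getenv("QUERY_STRING"), which returns a null pointer when the variable is not set.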

    Class mstring

    The mstring class provides Mumps-like strings that can be used to write programs in C++ that treat variables in a manner similar to that of Mumps. This means that mstring objects are essentially strings on which arithmetic operations may be performed. For example:

    #include <mumpsc/libmpscpp.h>

    global x("x");

    int main() {
        mstring a, b, c;

        a = "hello ";
        b = "world";
        cout << (a || b) << endl;                     // concatenation; prints "hello world"

        for (a = 0; a < 10; a++) cout << a << endl;   // prints 0 thru 9
        for (a = 0; a < 10; a++) x(a) = a;            // sets global array elements

        a = "";
        while (1) {
            a = x(a).Order(1);
            if (a == "") break;
            cout << a << endl;                        // prints 0 thru 9
        }

        cout << x(a).Data() << endl;                  // prints 1

        c = "123 elm street";
        c = c + 1;
        cout << c << endl;                            // prints 124
        return EXIT_SUCCESS;
    }

    Note: the code "(a || b)" in the cout expression is parenthesized. If not parenthesized, the C++ compiler precedence will result in an error since the precedence of << is greater than ||.

    Objects of class mstring may:

    1. Contain character strings, integers or floating point values;

    2. Be assigned to from char *, string, mstring, float, int, or double;

    3. Objects of mstring may be not initialized in declaration statements.

    4. Participate in add (+, +=), subtract (-, -=), multiply (*, *=), divide (/, /=), modulo (%, %=) (integer values only), pre/post increment/decrement (++/--), and concatenation (||) operations. The mode of the operation will depend on the mode of the other operand. Available modes are ASCII string, integer and floating point.

    5. Participate in relational expressions >, >=, <, <=. The mode of comparison will depend on the mode of the other operand. Available modes are ASCII string, integer and floating point.

    6. Participate in equality expressions == and !=. The mode of the comparison will depend on the mode of the other operand. Available modes are ASCII string, integer and floating point.

    7. Participate in input and output stream operations >> and <<.

    8. Participate in assignment to objects of mstring and string.

    9. Be declared as arrays or allocated/freed by the new/delete operators. Only numeric subscripts permitted at this time.

    If an object of type mstring is to be used in connection with the interpreter, it must be declared with a string giving its name in the runtime symbol table. For example:

          mstring x("x");
    


    Btree Access

    Programmers may access the btree directly through the builtin BTREE macro. A number of examples can be found in mumpsc/doc/examples/btree in the distribution.

    To access the btree directly from a C++ program:

    You must first install the Mumps compiler and MDH. Include <mumpsc/libmpscpp.h> at the beginning of your program. You can then access the btree directly with the BTREE macro (see the description below). Note: any keys you store in the btree co-exist with Mumps/MDH keys. In rare cases, these can interfere with one another if a key you store lies in the range of a global array key set.

    For example, the following program stores NBR_ITERATIONS keys and data into the btree and then retrieves them (NBR_ITERATIONS is defined in btree.h, which is included by libmpscpp.h, usually with the value 100,000). This is "btest1.cpp" from mumpsc/doc/examples/btree.cpp. See the other examples and the documentation below for further details.

    /*#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    *#+ Mumps Compiler Run-Time Support Functions
    *#+ Copyright (c) A.D. 2001, 2002, 2003, 2004 by Kevin C. O'Kane
    *#+ okane@cs.uni.edu
    *#+
    *#+ This library is free software; you can redistribute it and/or
    *#+ modify it under the terms of the GNU Lesser General Public
    *#+ License as published by the Free Software Foundation; either
    *#+ version 2.1 of the License, or (at your option) any later version.
    *#+
    *#+ This library is distributed in the hope that it will be useful,
    *#+ but WITHOUT ANY WARRANTY; without even the implied warranty of
    *#+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    *#+ Lesser General Public License for more details.
    *#+
    *#+ You should have received a copy of the GNU Lesser General Public
    *#+ License along with this library; if not, write to the Free Software
    *#+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
    *#+
    *#+ http://www.cs.uni.edu/~okane
    *#+
    *#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    *#+
    *#+ Some of this code was originally written in Fortran
    *#+ which will explain the odd array and label usage,
    *#+ especially arrays beginning at index 1.
    *#+
    *#++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    */

    #include <mumpsc/libmpscpp.h>

    int main() {
        long i,j;
        unsigned char key[1024],data[1024];

        // store phase: one key/data pair per iteration
        printf("Store sequentially ascending keys");
        for (i=0; i<NBR_ITERATIONS; i++) {
            sprintf( (char *) key,"key %ld",i);
            sprintf( (char *) data,"%ld%c",i,0);
            if (!BTREE(STORE,key,data)) {
                printf("error\n");
                return 1;
            }
            if (i%60000L==0) {
                printf("\n %ld ",i);
                fflush(stdout);
            }
            if (i%1000==0) {
                putchar('.');
                fflush(stdout);
            }
        }

        // retrieve phase: read each key back and compare with what was stored
        printf("\nretrieve");
        for (i=0; i<NBR_ITERATIONS; i++) {
            sprintf( (char *) key,"key %ld",i);
            if (!BTREE(RETRIEVE,key,data)) {
                printf("error 1\n");
                return 1;
            }
            sscanf( (char *) data,"%ld",&j);
            if (j!=i) {
                printf("error 2\n");
                printf("%ld != %ld\n",i,j);
                return 1;
            }
            if (i%60000L==0) {
                printf("\n %ld ",i);
                fflush(stdout);
            }
            if (i%1000==0) {
                putchar('.');
                fflush(stdout);
            }
        }

        printf("\nlooks good!\n");
        strcpy( (char *) key,"");
        strcpy( (char *) data,"");
        BTREE(CLOSE,key,data);
        return 1;
    }

    Function and Macro Library

    The following gives details on all the MDH functions and macros. Many mimic the underlying legacy functions and have the same or similar syntax. The discussion assumes that "gbl" has been declared as above. The example indices ("a,b,c") are for illustration purposes; your actual global array references will be different. Please note that not all functions accept all possible argument data types. Check the function definition below for details.

    1. int mstring::Ascii()
      int mstring::Ascii(int start)

      Returns the numeric value of an ASCII character. If no "start" is specified, the numeric value of the first character of the invoking mstring is returned. If "start" is specified, the numeric value of the "start"'th character of the invoking mstring is returned. If the invoking mstring is the empty string, -1 is returned. For example:

      mstring a;
      a="ABC";
      a.Ascii() yields 65
      a.Ascii(1) yields 65
      a.Ascii(2) yields 66
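
      The indexing rule above can be sketched on a std::string as follows. The function name ascii_at is hypothetical, and returning -1 for a position outside the string is an assumption of this sketch rather than documented behaviour of mstring::Ascii().

```cpp
#include <string>

// Positions count from one; -1 is returned for the empty string.
// Assumption: an out-of-range position also yields -1.
int ascii_at(const std::string& s, int start = 1) {
    if (s.empty() || start < 1 || start > static_cast<int>(s.size())) return -1;
    return static_cast<unsigned char>(s[start - 1]);
}
```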
      

    2. Global array access functions

      int global::Int();
      double global::Double();
      mstring global::Mstring();
      char * global::Char(char * buf, int max);

      The functions return the content of the invoking global array object converted to the named data type. The Char() function is passed the address of a character array buffer into which the contents of the global array element, null terminated, will be placed, and the address of the buffer is returned. The max argument for Char() limits the length of the string returned to max-1.

      If the global array element does not exist, a GlobalNotFoundException is thrown. If there is an error in converting the contents of the global to the named data type, a ConversionException is thrown.

      Examples:

      #include <mumpsc/libmpscpp.h>

      global t("t");

      int main() {
          int a;
          float b;
          mstring c;
          mstring x;
          char d[100];

          x=50;
          t(x)=99;

          a=t(x).Int();
          cout << a << endl;
          b=t(x).Double();
          cout << b << endl;
          c=t(x).Mstring();
          cout << c << endl;
          t(x).Char(d,100);
          cout << d << endl;

          GlobalClose;
      }

      The above writes 99 four times.
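
      The max-1 rule for Char() is the usual bounded-copy convention for C buffers. A minimal sketch using only the standard library follows; copy_bounded is a hypothetical stand-in and not part of the toolkit.

```cpp
#include <cstring>
#include <string>

// Copies at most max-1 characters of value into buf, always null
// terminates, and returns buf.
char* copy_bounded(const std::string& value, char* buf, int max) {
    if (max <= 0) return buf;                    // no room, buffer left untouched
    std::size_t n = value.size();
    std::size_t limit = static_cast<std::size_t>(max - 1);
    if (n > limit) n = limit;
    std::memcpy(buf, value.data(), n);
    buf[n] = '\0';
    return buf;
}
```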

    3. Arithmetic operations on global arrays

      The operations of add, subtract, multiply, divide, pre/post increment and pre/post decrement are defined (overloaded) for global variables. The operations are defined for mstring, short, unsigned short, int, unsigned int, long, unsigned long, float and double. Note: the contents of the global array node must be compatible with the dominant data type of the operation. If the contents of a global are not compatible with the operation (for example, incrementing a string of text), the value of the global will be interpreted as zero. Examples:

      global gbl("gbl");
      int i, j=10;
      string a = "10", b = "20", c = "30";

      gbl(a,b,c) = 10;
      i = gbl(a,b,c) + 20;  cout << i << endl;                       // prints 30
      i = 20 + gbl(a,b,c);  cout << i << endl;                       // prints 30
      i = gbl(a,b,c) / j;   cout << i << endl;                       // prints 1 (10 / 10)
      i = gbl(a,b,c) * 2;   cout << i << endl;                       // prints 20
      gbl(a,b,c) ++;        cout << gbl(a,b,c) << endl;              // prints 11
      gbl(a,b,c) --;        cout << gbl(a,b,c) << endl;              // prints 10
      i = ++ gbl(a,b,c);    cout << i << " " << gbl(a,b,c) << endl;  // prints 11 11
      i = gbl(a,b,c) ++;    cout << i << " " << gbl(a,b,c) << endl;  // prints 11 12
      gbl(a,b,c) += 10;     cout << gbl(a,b,c) << endl;              // prints 22
      gbl(a,b,c) -= 10;     cout << gbl(a,b,c) << endl;              // prints 12
      gbl(a,b,c) *= 2;      cout << gbl(a,b,c) << endl;              // prints 24
      gbl(a,b,c) /= 2;      cout << gbl(a,b,c) << endl;              // prints 12

    4. Assignment operations on global arrays

      Assignments to global arrays may be accomplished with the assignment operator (=).

      When you access a global array, the access may result in the thrown error exceptions GlobalNotFoundException and/or ConversionException. The first can occur in any context that attempts to retrieve data from a global array where none exists. The second occurs if you attempt to convert the contents of a global to a numeric type where the contents of the global are not valid data for the conversion.

      If uncaught, both exceptions will result in program termination. Both exceptions may be caught, however, with code such as the following:

      #include <mumpsc/libmpscpp.h>

      global a("a");

      int main() {
          long i;

          kill(a());
          a("1") = "now is the time";

          try {
              i = a("1");
          }
          catch ( ConversionException ce) {
              cout << ce.what() << endl;
          }

          try {
              i = a("22");
          }
          catch (GlobalNotFoundException nf) {
              cout << nf.what() << endl;
          }

          return 0;
      }

      You may assign data of the following types directly to global arrays: char *, int, string, mstring, double, global, unsigned int, float, short, unsigned short, long, and unsigned long. You may assign global arrays directly to variables of the following types: int, mstring, double, global, unsigned int, float, short, unsigned short, long, and unsigned long.

    5. double global::Avg()

      Returns the average of the values of data bearing nodes beneath the given global array reference. Example:

      global A("A");
      mstring i,j;

      for (i=0; i<1000; i++)
          for (j=1; j<10; j++) {
              A(i,j) = j;
          }

      cout << A("100").Avg() << endl;   // average of nodes below A("100")
      cout << A().Avg() << endl;        // average of all nodes

      Each row in the example holds the values 1 through 9, so the average of the numeric data bearing nodes beneath A("100"), and of the array as a whole, is 5. If there are non-numeric data elements, they are treated as zero values and contribute to the result.

      The global array object must be specified with indices (i.e., a parenthesized list must follow the name of the global array object). An empty list means the entire array.
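
      The averaging rule can be sketched with an ordinary std::map standing in for a global array. Key, Tree, beneath and average_beneath are hypothetical names, and returning 0 when no nodes lie beneath the reference is an assumption of this sketch, not documented toolkit behaviour.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

using Key = std::vector<std::string>;       // list of indices
using Tree = std::map<Key, std::string>;    // stand-in for a global array

// True when key lies strictly beneath prefix in the index hierarchy.
bool beneath(const Key& prefix, const Key& key) {
    if (key.size() <= prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i)
        if (key[i] != prefix[i]) return false;
    return true;
}

// Average of the values stored beneath prefix. Text with no leading
// number converts to 0 and still counts toward the divisor.
double average_beneath(const Tree& t, const Key& prefix) {
    double sum = 0.0;
    long n = 0;
    for (const auto& kv : t) {
        if (!beneath(prefix, kv.first)) continue;
        sum += std::strtod(kv.second.c_str(), nullptr);
        ++n;
    }
    return n == 0 ? 0.0 : sum / n;   // assumption: empty selection averages to 0
}
```

      An empty prefix selects every node, mirroring the "empty list means the entire array" convention.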

    6. int mstring::begins(mstring pattern);

      Returns an integer which is the starting point in the string of pattern or -1 if the pattern is not found. Throws: PatternException if the pattern is in error.

    7. Boyer-Moore-Gosper Functions

      int bmg_fullsearch(mstring search_string, mstring buffer_base);

      Returns the number of non-overlapping instances of "search_string" in "buffer_base".

      Examples:

      #include <mumpsc/libmpscpp.h>

      int main() {
          mstring a="now is the time for all good men to come to the aid of the party";
          mstring b="to";
          cout << bmg_fullsearch(b,a) << endl;
          return EXIT_SUCCESS;
      }

      yields:

      2
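
      To make "non-overlapping" concrete, the sketch below counts matches with a plain std::string::find loop. It does not use the Boyer-Moore-Gosper algorithm; count_nonoverlapping is a hypothetical name, and treating an empty pattern as matching nothing is an assumption.

```cpp
#include <string>

// Counts non-overlapping occurrences of needle in haystack, left to right.
int count_nonoverlapping(const std::string& needle, const std::string& haystack) {
    if (needle.empty()) return 0;    // assumption: an empty pattern matches nothing
    int count = 0;
    std::size_t pos = 0;
    while ((pos = haystack.find(needle, pos)) != std::string::npos) {
        ++count;
        pos += needle.size();        // resume after the match so matches cannot overlap
    }
    return count;
}
```

      With this rule the pattern "aa" occurs twice in "aaaa", not three times.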

      These functions are publicly available from:

      ftp://ftp.uu.net/usenet/comp.sources.unix/volume5/bmgsubs.Z

      They are believed to be contributed source and unrestricted with respect to use and redistribution. Most, if not all, of the code is believed to have been written by employee(s) of the United States and thus to be in the public domain. The distribution contains, in part, the following notes:

      Here are routines to perform fast string searches using the
      Boyer-Moore-Gosper algorithm; they can be used in any Unix program (and
      should be portable to non-Unix systems).  You can search either a file
      or a buffer in memory.
      
      The code is mostly due to James A. Woods (jaw@ames-aurora.arpa)
      although I have modified it heavily, so all bugs are my fault.  The
      original code is from his sped-up version of egrep, recently posted on
      mod.sources and available via anonymous FTP from ames-aurora.arpa as
      pub/egrep.one and pub/egrep.two.  That code handles regular
      expressions; mine does not.
      
      These have only been tested on 4.2BSD Vax systems.
      
      -Jeff Mogul
      mogul@navajo.stanford.edu
      decwrl!glacier!navajo!mogul
      

      BMGSUBS(3L)							   BMGSUBS(3L)
      
      NAME
             (bmgsubs)   bmg_setup,  bmg_search,  bmg_fsearch	 -  Boyer-Moore-Gosper
             string search routines
      
      SYNOPSIS
             bmg_setup(search_string, case_fold_flag)
             char *search_string;
             int case_fold_flag;
      
             bmg_fsearch(file_des, action_func)
             int file_des;
             int (*action_func)();
      
             bmg_search(buffer_base, buffer_length, action_func)
             char *buffer_base;
             int buffer_length;
             int (*action_func)();
      
      DESCRIPTION
             These routines perform fast searches  for  strings,  using  the	Boyer-
             Moore-Gosper  algorithm.	  No meta-characters (such as `*' or `.')  are
             interpreted, and the search string cannot contain newlines.
      
             Bmg_setup must be called as the first step in performing a search.  The
             search_string   parameter   is	the   string   to   be	searched  for.
             Case_fold_flag should  be  false	 (zero)	 if  characters	 should	 match
             exactly,	 and  true  (non-zero) if case should be ignored when checking
             for matches.
      
             Once a search string has been specified using bmg_setup,	 one  or  more
             searches for that string may be performed.
      
             Bmg_fsearch  searches  a	 file,	open  for  reading  on file descriptor
             file_des (this is not a stdio file.)  For each line that	 contains  the
             search string, bmg_fsearch will call the action_func function specified
             by the caller as action_func(matching_line, byte_offset).   The	match-
             ing_line	 parameter  is	a  (char *) pointer to a temporary copy of the
             line; byte_offset is the offset from the beginning of the file  to  the
             first  occurence of the search string in that line.  Action_func should
             return true (non-zero) if the search should continue, or	 false	(zero)
             if the search should terminate at this point.
      
             Bmg_search  is  like  bmg_fsearch,  except  that instead of searching a
             file, it searches the buffer pointed to by  buffer_base;	 buffer_length
             specifies the number of bytes in the buffer.  The byte_offset parameter
             to action_func gives the offset from the beginning of the buffer.
      
             If the user merely wants the matching lines  printed  on	 the  standard
             output,	the  action_func parameter to bmg_fsearch or bmg_search can be
             NULL.
      
      AUTHOR
             Jeffrey Mogul (Stanford University), based on code written by James  A.
             Woods (NASA Ames)
      
      BUGS
             Might  be  nice	to have a version of this that handles regular expres-
             sions.
      
             There are large, but finite, limits  on	the  length  of	 both  pattern
             strings	and  text lines.  When these limits are exceeded, all bets are
             off.
      
             The string pointer passed to action_func points to a temporary copy  of
             the  matching  line,  and  must	be copied elsewhere before action_func
             returns.
      
             Bmg_search does not permanently modify the buffer in any way, but  dur-
             ing  its execution (and therefore when action_func is called), the last
             byte of the buffer may be temporarily changed.
      
             The Boyer-Moore algorithm cannot find lines that do not contain a given
             pattern	(like  "grep  -v") or count lines ("grep -n").	Although it is
             fast even for short search strings, it gets faster as the search string
             length increases.
      
      				  16 May 1986			   BMGSUBS(3L)
      

    8. int BTREE(int code, unsigned char * key, unsigned char * data)

      BTREE() is a macro permitting direct access to the underlying btree system. The first argument, "code", is an integer indicating the operation to be performed (see below). The second argument is the key, a null-terminated array of printable ASCII characters. The length of the key should be no greater than one quarter of the btree block size, whose default value is 8192 (i.e., the maximum key length is about 2048 bytes in the default case). The third argument is the data to be stored with the key. It is a null-terminated string of printable ASCII characters no longer than the system defined limit STR_MAX (defaults to 4096). An empty string is interpreted as no data to be stored. Note that the second and third arguments must be unsigned char *. The macro returns an integer indicating success. It may also alter "key" or "data" to return values or for other purposes. The contents of "key" and "data" are not preserved across an invocation of BTREE(). Examples of using BTREE() are given in mumpsc/doc/examples/btree.

      Permitted btree operations:

      1. STORE - store a key and data value in the btree; returns zero if successful, non-zero otherwise:
              unsigned char key[]="test key";
              unsigned char data[]="test data";
              if ( BTREE(STORE,key,data) == 0 ) cout << "stored" << endl;
              else cout << "not stored" << endl;
        
      2. RETRIEVE - retrieve data stored with a key; returns zero if successful, non-zero otherwise:
              unsigned char key[]="test key";
              unsigned char data[STR_MAX];
              if ( BTREE(RETRIEVE,key,data) == 0 ) cout << "retrieved: " << data << endl;
              else cout << "not retrieved." << endl;
        
      3. CLOSE - close the btree data base; returns zero:
              unsigned char key[]="";
              unsigned char data[]="";
              BTREE(CLOSE,key,data);
        
      4. XNEXT/PREVIOUS - retrieve the next ascending/descending key; returns one. The values of the second and third arguments become the value of the next ascending/descending key. An initial value of the empty string for the second argument will retrieve the first/last key, and the value of the second argument becomes the empty string when there are no more ascending/descending values.
              unsigned char key[STR_MAX]="";   // sized to receive the returned key
              unsigned char data[STR_MAX];
              printf("\nbegin retrieve...\n");
              while(1) { // retrieve keys in ascending order
                    i=BTREE(XNEXT,key,data);
                    if (strlen( (char *) data)==0) break;
                    cout << key << endl;
                    }
        
        
    9. void global::Centroid(global B)

      A centroid vector B is calculated for the invoking two dimensional global array. The centroid vector holds the average value of each column of the matrix. Any previous contents of the global array named to receive the centroid vector are lost. The invoking global array (A) must contain at least two dimensions. For example:

      #include <mumpsc/libmpscpp.h>

      global A("A");
      global B("B");

      int main() {
          mstring i,j;

          for (i=0; i<10; i++)
              for (j=1; j<10; j++) {
                  A(i,j) = 5;
              }

          A().Centroid(B());

          mstring a="";
          while (1) {
              a=B(a).Order(1);
              if (a=="") break;
              cout << a << " --> " << B(a) << endl;
          }

          return 0;
      }

      Yields:

      1 --> 5
      2 --> 5
      3 --> 5
      4 --> 5
      5 --> 5
      6 --> 5
      7 --> 5
      8 --> 5
      9 --> 5

      The above yields a vector giving the average value of each named column of the matrix "A" (5 in this case since each column is initialized with 5).
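
      The column-average idea can be sketched with nested std::map containers. Matrix and centroid are hypothetical names. Dividing each column sum by the number of rows that actually contain that column is an assumption of this sketch; the guide does not state how absent cells are treated by global::Centroid().

```cpp
#include <map>
#include <string>

using Matrix = std::map<std::string, std::map<std::string, double>>;  // row -> column -> value

// Returns column -> average over the cells present in that column.
std::map<std::string, double> centroid(const Matrix& m) {
    std::map<std::string, double> sum;
    std::map<std::string, long> n;
    for (const auto& row : m)
        for (const auto& cell : row.second) {
            sum[cell.first] += cell.second;
            ++n[cell.first];
        }
    for (auto& s : sum) s.second /= n[s.first];   // turn sums into averages
    return sum;
}
```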

    10. void CleanLocks(void)
      void CleanAllLocks(void);

      "CleanLocks()" removes all locks for the current process. "CleanAllLocks()" removes all locks for all processes for which the current directory is the default directory. Locks are implemented by entries in a file named "Mumps.Locks" created and maintained in the current directory. This file must be read/write enabled for the current process. You may also delete all locks by removing this file. Locks are discussed elsewhere but, in brief, they are used to signal ownership of a portion of a global array. When a lock has been applied to a node, no other process may lock this node, any descendant node or any parent node. Locking does not actually prevent access, it merely marks a resource as locked.

    11. char * mstring::c_str()

      Returns a char * to a NULL terminated character string containing the same value as the mstring variable.

    12. command(string)

      "command()" is a macro that takes a quoted string constant argument. The macro surrounds the string with an extra set of quotes and processes any embedded quotes to backslash-quote. It then invokes a function (__command__()) which strips the extra surrounding quotes. The net effect of this is that you can pass a quoted string containing quotes without the need for "leaning toothpick" notation. Example:

      Normal usage: 
      
      $pattern(source_str, "3n1\"-\"2n1\"-\"4n") 
      strcpy(target, "for i=1:1:10 write \"test \",i,!"); 
      
      with command(): 
      
      $pattern(source_string, command("3n1"-"2n1"-"4n")) 
      xecute(command("for i=1:1:10 "test ",i,!")); 
      strcpy(target, command("for i=1:1:10 write "test ",i,!")); 
      

      The argument must be a character string constant.
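
      As a point of comparison, C++11 and later offer raw string literals, a standard language feature that also avoids backslash-quote sequences. This is a different technique from the command() macro and is shown only as an alternative; the variable names are illustrative.

```cpp
#include <string>

// The same text written with escapes and as a raw string literal.
const std::string escaped = "for i=1:1:10 write \"test \",i,!";
const std::string raw     = R"(for i=1:1:10 write "test ",i,!)";
```

      Both variables hold identical text. A raw literal ends at the first )" sequence, so text containing that sequence needs a custom delimiter such as R"x(...)x".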

      A("1") == x       --> true
      A("2") < 123      --> true: integer comparison
      A("2") < 123.     --> true: double comparison
      A("2") < "123"    --> true: string comparison
      A("2") < 2        --> false: integer comparison
      A("2") < "2"      --> true: string comparison

      Note that the mode of comparison is dependent upon the second operand. In the case of string comparisons, an ASCII comparison takes place; thus "123" is less than "2".

    13. Conversion functions

      char *cvt(long i)
      char *cvt(double i)
      char *cvt(float i)
      char *cvt(int i)

      These functions return a null terminated, varying length character string containing a printable version of the argument. The functions contain short static character arrays and, consequently, are not threadsafe.

    14. GlobalClose;

      This macro closes the global array files. The global arrays must be closed on exit or they will be corrupt. The macro causes the file system to flush all its buffers and cache and close the file system. Normally, a "GlobalClose" is executed automatically when your program ends, except if your program is terminated by SIGKILL or SIGSTOP (which cannot be trapped). If your program is using a large memory based cache (caches can be 1 GB or more on some systems), there may be a noticeable delay in file system shutdown due to the time required to write the cache to disk.

    15. Correlation functions

      void global::TermCorrelate(global B)
      void global::DocCorrelate(global B, mstring fcnname, double threshold)
      void global::DocCorrelate(global B, char * fcnname, double threshold)

      These functions build document indexing correlation matrices. The invoking global is assumed to be a two dimensional document-term matrix whose rows are documents and whose columns represent the occurrence of terms in the documents (either weights or frequencies).

      TermCorrelate() builds a square term-term correlation matrix in B from the invoking document-term matrix.

      DocCorrelate() builds a square document-document correlation matrix from the invoking document-term matrix. The name of the function to be used in calculating the document-document similarity is given in fcnname and may be Cosine, Jaccard, Dice, or Sim1. The minimum correlation threshold is given in threshold, which defaults to 0.80 if omitted.

      TermCorrelate() Example:
      #include <mumpsc/libmpscpp.h>

      global A("A");
      global B("B");

      int main() {
          long i,j;

          A("1","computer")=5;
          A("1","data")=2;
          A("1","program")=6;
          A("1","disk")=3;
          A("1","laptop")=7;
          A("1","monitor")=1;

          A("2","computer")=5;
          A("2","printer")=2;
          A("2","program")=6;
          A("2","memory")=3;
          A("2","laptop")=7;
          A("2","language")=1;

          A("3","computer")=5;
          A("3","printer")=2;
          A("3","disk")=6;
          A("3","memory")=3;
          A("3","laptop")=7;
          A("3","USB")=1;

          A().TermCorrelate(B());

          // walk the term-term matrix: each term, then the terms it co-occurs with
          mstring a;
          mstring b;
          a="";
          while (1) {
              a=B(a).Order(1);
              if (a=="") break;
              cout << a << endl;
              b="";
              while (1) {
                  b=B(a,b).Order(1);
                  if (b=="") break;
                  cout << " " << b << "(" << B(a,b) << ")" << endl;
              }
          }
          return 0;
      }
      Yields:
      USB
       computer(1)
       disk(1)
       laptop(1)
       memory(1)
       printer(1)
      computer
       USB(1)
       data(1)
       disk(2)
       language(1)
       laptop(3)
       memory(2)
       monitor(1)
       printer(2)
       program(2)
      data
       computer(1)
       disk(1)
       laptop(1)
       monitor(1)
       program(1)
      disk
       USB(1)
       computer(2)
       data(1)
       laptop(2)
       memory(1)
       monitor(1)
       printer(1)
       program(1)
      language
       computer(1)
       laptop(1)
       memory(1)
       printer(1)
       program(1)
      laptop
       USB(1)
       computer(3)
       data(1)
       disk(2)
       language(1)
       memory(2)
       monitor(1)
       printer(2)
       program(2)
      memory
       USB(1)
       computer(2)
       disk(1)
       language(1)
       laptop(2)
       printer(2)
       program(1)
      monitor
       computer(1)
       data(1)
       disk(1)
       laptop(1)
       program(1)
      printer
       USB(1)
       computer(2)
       disk(1)
       language(1)
       laptop(2)
       memory(2)
       program(1)
      program
       computer(2)
       data(1)
       disk(1)
       language(1)
       laptop(2)
       memory(1)
       monitor(1)
       printer(1)

      The above gives the number of co-occurrences of each word with each other word. For example, the words "computer" and "memory" co-occur in two vectors (2 and 3) while the words "laptop" and "computer" co-occur in all three vectors. If each vector is thought of as a document, the strength of the co-occurrences between words is a measure of similarity for indexing purposes.
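
      The co-occurrence counts listed above can be reproduced with a short standard-library sketch. DocTerm and cooccurrence are hypothetical names; this illustrates the counting rule only and is not the toolkit's TermCorrelate() implementation.

```cpp
#include <map>
#include <string>
#include <utility>

using DocTerm = std::map<std::string, std::map<std::string, double>>;  // doc -> term -> weight
using TermPair = std::pair<std::string, std::string>;

// For each ordered pair of distinct terms, count the documents that contain both.
std::map<TermPair, int> cooccurrence(const DocTerm& dt) {
    std::map<TermPair, int> out;
    for (const auto& doc : dt)
        for (const auto& t1 : doc.second)
            for (const auto& t2 : doc.second)
                if (t1.first != t2.first) ++out[{t1.first, t2.first}];
    return out;
}
```

      Pairs that never share a document simply have no entry, which matches the listing above where, for example, "USB" has no "data" line.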

      DocCorrelate() Example:
      #include <mumpsc/libmpscpp.h>

      global A("A");
      global B("B");

      int main() {
          long i,j;

          A("1","computer")=5;
          A("1","data")=2;
          A("1","program")=6;
          A("1","disk")=3;
          A("1","laptop")=7;
          A("1","monitor")=1;

          A("2","computer")=5;
          A("2","printer")=2;
          A("2","program")=6;
          A("2","memory")=3;
          A("2","laptop")=7;
          A("2","language")=1;

          A("3","computer")=5;
          A("3","printer")=2;
          A("3","disk")=6;
          A("3","memory")=3;
          A("3","laptop")=7;
          A("3","USB")=1;

          A().DocCorrelate(B(),"Cosine",.5);

          // walk the document-document matrix
          mstring a;
          mstring b;
          a="";
          while (1) {
              a=B(a).Order(1);
              if (a=="") break;
              cout << a << endl;
              b="";
              while (1) {
                  b=B(a,b).Order(1);
                  if (b=="") break;
                  cout << " " << b << "(" << B(a,b) << ")" << endl;
              }
          }
          return 0;
      }
      Yields
      1
       2 0.887096774193548
       3 0.741935483870968
      2
       1 0.887096774193548
       3 0.701612903225806
      3
       1 0.741935483870968
       2 0.701612903225806

      The above program calculates the similarities between the document vectors according to the Cosine method.
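
      For reference, the textbook cosine measure (dot product divided by the product of the vector lengths) reproduces the values listed above; for documents 1 and 2 it gives 110/124, about 0.887097. The sketch below uses hypothetical names (Vec, cosine) and is not taken from the toolkit source.

```cpp
#include <cmath>
#include <map>
#include <string>

using Vec = std::map<std::string, double>;   // term -> weight (sparse vector)

// Cosine similarity of two sparse vectors; 0 if either vector has zero length.
double cosine(const Vec& a, const Vec& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (const auto& x : a) {
        na += x.second * x.second;
        auto it = b.find(x.first);
        if (it != b.end()) dot += x.second * it->second;   // only shared terms contribute
    }
    for (const auto& y : b) nb += y.second * y.second;
    if (na == 0.0 || nb == 0.0) return 0.0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```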

    16. long global::Count()

      Returns the number of data bearing nodes beneath the given global array reference. Example:

      #include <mumpsc/libmpscpp.h>

      global A("A");

      int main() {
          mstring i,j;

          for (i=1; i<11; i++)
              for (j=1; j<11; j++) {
                  A(i,j) = 5;
              }

          cout << "Full count: " << A().Count() << endl;
          cout << "A row count: " << A("5").Count() << endl;
          return EXIT_SUCCESS;
      }

      Yields:

      Full count: 100
      A row count: 10

    17. char * cvt(arg)

      The function converts the argument to a null terminated character string. The arguments may be long, double, float, and int. Do not use this function more than once in an expression as the returned pointer is to a static variable in the function. Multiple calls will point to the same variable.
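
      The static-buffer hazard can be demonstrated with a small stand-in. cvt_sketch and joined are hypothetical names; the real cvt() functions are only assumed to behave similarly, as the description above states.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for cvt(): formats into one shared static buffer,
// so every call returns the same pointer and overwrites the previous text.
const char* cvt_sketch(long v) {
    static char buf[32];
    std::snprintf(buf, sizeof buf, "%ld", v);
    return buf;
}

// Safe pattern: copy each result into its own std::string before the next call.
std::string joined(long a, long b) {
    std::string first = cvt_sketch(a);
    std::string second = cvt_sketch(b);
    return first + "," + second;
}
```

      Passing two such calls as arguments to a single printf() would print the same text twice, because both arguments point at the same buffer.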

    18. int global::Data()

      The function Data() returns an integer which indicates whether the global array node is defined. The value returned is 0 if the global array node is undefined; 1 if it is defined and has no descendants; 10 if it has descendants but no value stored at the node; and 11 if it has both a value and descendants.

      If a global array with no indices is passed to this function, a value of "10" will be returned if the array exists and "0" if the array does not exist. For example:

      Given:

      global gbl("gbl");
      global non("non");
      gbl("1","11")="foo";
      gbl("1","11","21")="bar";

      Then:

      gbl("1").Data()            // returns 10 - node exists, has no data, has descendants
      gbl("1","11").Data()       // returns 11 - node exists, has data and has a descendant
      gbl("1","11","21").Data()  // returns 1 - node exists, has data but no descendants
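
      The 0/1/10/11 convention can be sketched over a std::map whose keys are index lists. Key, Tree and data_code are hypothetical names; the sketch models only the return codes, not the btree storage.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using Key = std::vector<std::string>;       // list of indices
using Tree = std::map<Key, std::string>;    // stand-in for a global array

// Returns 0, 1, 10 or 11: the ones digit says a value is stored at the node,
// the tens digit says at least one descendant exists.
int data_code(const Tree& t, const Key& node) {
    int has_value = t.count(node) ? 1 : 0;
    int has_desc = 0;
    for (const auto& kv : t) {
        const Key& k = kv.first;
        if (k.size() > node.size() &&
            std::equal(node.begin(), node.end(), k.begin())) {
            has_desc = 10;
            break;
        }
    }
    return has_desc + has_value;
}
```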

    19. int mstring::decorate(mstring pattern, mstring left, mstring right);

      Locates the pattern in the invoking mstring and inserts left immediately to the left of the string that matched the pattern and inserts right immediately to the right of the found pattern. Returns 1 if the pattern was found and the insertions were made, -1 if the pattern was not found, and less than -1 for other errors (see PCRE documentation concerning pcre_exec() return codes). Throws: PatternException().

    20. char * mstring EncodeHTML(char * arg) mstring EncodeHTML(mstring arg)

      Encodes the argument string according to HTML rules and returns the result. Alphabetics and numbers are unchanged. Blanks become plus signs and all other characters are replaced by "%xx" where "xx" is the hexadecimal value of the character in the ASCII collating sequence. The function is used mainly in connection with parameters passed with URLs, which may not contain blanks or special characters. The code in cgi.h is used to decode these strings. Example:

      #include <mumpsc/libmpscpp.h>

      int main() {
          char x[]="now is =()$.& the time";
          cout << EncodeHTML(x) << endl;
          return EXIT_SUCCESS;
      }

      Yields:

      now+is+%3D%28%29%24%2E%26+the+time
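
      The encoding rule described above is simple enough to sketch with the standard library. encode_sketch is a hypothetical name, not the toolkit function; uppercase hexadecimal digits are chosen to match the example output.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Letters and digits pass through, blanks become '+', and every other
// character becomes %XX (two uppercase hexadecimal digits).
std::string encode_sketch(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c)) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            char hex[4];   // '%', two digits, terminating null
            std::snprintf(hex, sizeof hex, "%%%02X", static_cast<unsigned>(c));
            out += hex;
        }
    }
    return out;
}
```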

    21. int mstring::ends(mstring pattern)

      Returns an integer giving the character position (relative to zero) immediately following the string that matched pattern. Returns -1 if the string did not match. Throws: PatternException.

    22. void ErrorMessage(char * message, int line_number)

      This function (written in C and part of the underlying legacy library) will print an error message, close the global array files and terminate the program. The integer "line_number" will be printed with the message. The pre-processor predefined macro "__LINE__" can be used here. Example:

      ErrorMessage("Cannot locate patient",__LINE__);

    23. Error Exceptions

      The toolkit generates (throws) exceptions for certain conditions. For example, when you access global arrays with the toolkit, the accesses may result in the thrown error exceptions:

      1. ConversionException.
      2. GlobalNotFoundException
      3. MumpsSymbolTableException.
      4. NumericRangeException.

      A GlobalNotFoundException can occur in any context that attempts to retrieve data from a global array where none exists. A ConversionException occurs if you attempt to convert the contents of a global to a numeric type where the contents of the global are not valid data for the conversion.

      If uncaught, any of these exceptions will result in program termination.

      The following are the exceptions thrown by the toolkit:

      1. ConversionException() - usually occurs when you attempt to store a value from a global array into a numeric variable but the string in the global is not a valid number.
      2. GlobalNotFoundException() - thrown by an attempt to reference non-existent global array data.
      3. MumpsSymbolTableException() - thrown by an attempt to fetch the value of a non-existent variable from the Mumps runtime symbol table.
      4. NumericRangeException() - thrown by attempts to divide by zero or to pass arguments less than or equal to zero to log functions.

      #include <mumpsc/libmpscpp.h>

      global a("a");

      int main() {
          long i;
          a().Kill();
          mstring A;
          a("1") = "now is the time";

          // the stored string is not a valid number
          try { i = a("1"); }
          catch (ConversionException ce) { cout << ce.what() << endl; }

          // node a("22") does not exist
          try { i = a("22"); }
          catch (GlobalNotFoundException nf) { cout << nf.what() << endl; }

          // variable "abc" is not in the symbol table
          try { A=SymGet("abc"); }
          catch (MumpsSymbolTableException st) { cout << st.what() << endl; }

          return 0;
      }

    24. mstring mstring::Extract([int start, [int end]])

      Returns an mstring containing a substring of the invoking mstring. The substring begins at position "start". If "end" is omitted, the substring consists only of the character at position "start". If "end" is present, the substring begins at position "start" and ends at position "end". If no argument is given, the function returns the first character of the string. If "end" specifies a position beyond the end of the source string, the substring ends at the end of the source string. String position counting begins at one (not zero). For example:

      mstring x;
      x="ABCDEF";
      x.Extract(2) yields "B"
      x.Extract(3,5) yields "CDE"
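
      The position rules for Extract() can be modeled on std::string as follows. This is a sketch of the behavior described above, not the toolkit's code; the name extract_sketch and the treatment of positions below one are assumptions.

```cpp
#include <string>

// Hypothetical helper mirroring the one-based rules in the text;
// not the toolkit's mstring::Extract() implementation.
std::string extract_sketch(const std::string &s, int start = 1, int end = -1) {
    if (end < 0) end = start;                 // "end" omitted: a single character
    if (start < 1) start = 1;                 // assumption: clamp positions below one
    int len = static_cast<int>(s.size());
    if (end > len) end = len;                 // clip at the end of the source string
    if (start > end) return "";               // nothing lies in the requested range
    return s.substr(start - 1, end - start + 1);  // convert to a zero-based offset
}
```

      With "ABCDEF" as the source, extract_sketch(s, 2) gives "B" and extract_sketch(s, 3, 5) gives "CDE", matching the examples above.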

    25. int mstring::Find(mstring pattern_string [, int start])
      int mstring::Find(const char * pattern_string [, int start])

      Find() searches the invoking mstring for an occurrence of "pattern_string". If one is found, the value returned is one greater than the position at which the match ends in the invoking mstring. If "start" is specified, the search begins at position "start" of the invoking mstring. If "pattern_string" is not found, the value returned is 0. String position counting begins at position one. For example:

      mstring x;
      x="ABC";
      x.Find("B") yields 3
      x="ABCABC";
      x.Find("A",3) yields 5
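
      The return value of Find() is easy to get wrong by one, so the rule is sketched here on std::string. The name find_sketch is hypothetical, and the behavior for an empty pattern is not covered by the text above and is left unspecified here.

```cpp
#include <string>

// Hypothetical helper mirroring the rule in the text;
// not the toolkit's mstring::Find() implementation.
int find_sketch(const std::string &s, const std::string &pattern, int start = 1) {
    if (start < 1) start = 1;                        // assumption: clamp positions below one
    std::size_t pos = s.find(pattern, start - 1);    // std::string offsets are zero-based
    if (pos == std::string::npos) return 0;          // not found
    // pos + pattern.size() is the one-based position of the last matched
    // character; the result is one position beyond it.
    return static_cast<int>(pos + pattern.size()) + 1;
}
```

      For "ABC" and pattern "B" the match ends at position 2, so the result is 3; for "ABCABC", pattern "A", and start 3, the match is at position 4 and the result is 5.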

    26. Interpreter Get, Set, Order and Data functions on global arrays

      mstring GlobalGet (mstring global_ref)
      char * GlobalGet (char * global_ref)

      mstring GlobalOrder (mstring global_ref, int direction)
      char * GlobalOrder (char * global_ref, int direction)

      int GlobalData (mstring global_ref)
      int GlobalData (char * global_ref)

      int GlobalSet (mstring global_ref, mstring source)
      int GlobalSet (mstring global_ref, char * source)
      int GlobalSet (char * global_ref, mstring source)

      These functions use the interpreter to permit runtime construction of, and access to, global array references. In all cases global_ref is a string containing a global array reference. This string can be dynamically constructed at runtime or may be read from a file or another global. Note: as this facility uses the interpreter, global array references must be preceded by the circumflex character (^).

      In the case of the GlobalGet() functions, the string global array reference is interpreted and the value stored at the reference returned. If the reference is invalid or no data is stored, the value returned is the empty string and $test is set to false (zero). If a value is found, $test is set to true and the value is returned.

      GlobalOrder() gives the next or prior value of the last index of the global array reference depending upon if direction is 1 (next) or -1 (prior). $test is set to 0 in the event of an error and 1 if there is no error. See Order().

      GlobalData() returns a number indicating if the node exists and has descendants (see Data()). $test is set to 0 if there is an error, 1 otherwise.

      In the case of the GlobalSet() functions, the second argument is a string of data to be stored at the global array reference. The runtime routines will interpret the global_ref and assign the source to it. The value returned is one if successful ($test is set to 1), zero if not successful ($test set to 0). Examples:

      mstring a,b;
      a = "^x(\"1\")";
      b = "test string";
      if (GlobalSet(a,b) == 0) cout << "error\n";   // zero means the store failed

      These functions can be used to allow a program to create a text string global array reference and then use the string to address the global. Note that the target must contain either quoted literals or variables previously instantiated to the interpreter environment (see $SymSet() and SymGet()).

      Generally speaking, these functions will be only used for dynamically constructed global array references. Most access to globals will be by overloaded shift or assignment operators.
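
      Building the reference string by hand invites quoting mistakes, so a small helper can do it. The following is a sketch only: make_global_ref is a hypothetical name, not a toolkit function, and it assumes every subscript is passed as a quoted literal with embedded quotes doubled as in Mumps string literals.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the toolkit): builds a reference string
// such as ^x("1","2") from a global name and a list of subscripts.
std::string make_global_ref(const std::string &name,
                            const std::vector<std::string> &subscripts) {
    std::string ref = "^" + name;          // interpreter references need the circumflex
    if (subscripts.empty()) return ref;    // unsubscripted reference
    ref += '(';
    for (std::size_t i = 0; i < subscripts.size(); ++i) {
        if (i > 0) ref += ',';
        ref += '"';
        for (char c : subscripts[i]) {
            ref += c;
            if (c == '"') ref += '"';      // double any embedded quote
        }
        ref += '"';
    }
    ref += ')';
    return ref;
}
```

      The resulting std::string could then be converted to an mstring or char * and passed as the global_ref argument of GlobalSet() or GlobalGet().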

    27. double HitRatio(void)

      Calculates the native global array processor cache hit ratio since the beginning of the program or the last call to HitRatio(). The native global array file processor, as opposed to the Berkeley Data Base, keeps track of how many file I/O requests are satisfied from data already in the file system's cache. This function gives the percentage of cache hits. It only works with the native global array processor.

    28. Hashing functions

      char * hash(char * str)
      long lhash(char * str)

      hash() returns a null terminated character string, up to 10 characters in length, containing a numeric hash code of the string passed as an argument. The argument may be up to STR_MAX characters in length. lhash() returns the hash value as an unsigned long.

    29. mstring Horolog()

      Returns an mstring consisting of two numbers. The first is the number of days since December 31, 1840 and the second is the number of seconds since the most recent midnight. These values are relative to Greenwich Mean Time.
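
      The two numbers can be derived from a Unix timestamp, since January 1, 1970 falls 47117 days after December 31, 1840. The sketch below is not the toolkit's Horolog(); the name horolog_sketch is hypothetical, the comma separator is assumed from the standard Mumps $HOROLOG format, and only non-negative timestamps are handled.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper, not the toolkit's Horolog(): converts a Unix
// timestamp (seconds since 1970-01-01 00:00:00 GMT) to "days,seconds".
std::string horolog_sketch(std::time_t t) {
    const long long kEpochOffsetDays = 47117;  // 1970-01-01 counted from 1840-12-31
    const long long kSecondsPerDay = 86400;
    long long secs = static_cast<long long>(t);          // assumed non-negative
    long long days = secs / kSecondsPerDay + kEpochOffsetDays;
    long long sinceMidnight = secs % kSecondsPerDay;     // seconds since midnight GMT
    return std::to_string(days) + "," + std::to_string(sinceMidnight);
}
```

      Calling horolog_sketch(std::time(nullptr)) gives the current value relative to GMT under these assumptions.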

    30. Inverse Document Frequency function

      void global::IDF(double DocCount)

      The IDF() function calculates, for the global array vector provided, the inverse document frequency weight of each term. The vector should be indexed by words and store, for each word, the number of documents in which it occurs. Each document count is replaced by the calculated IDF value. The IDF is log2(DocCount/Wn)+1, where Wn is the number of documents in which a term appears (the document frequency) and DocCount is the total number of documents in the collection. Example:

      #include <mumpsc/libmpscpp.h>

      global a("a");

      int main() {
          kill(a());
          a("now")=2;
          a("is")=5;
          a("the")=6;
          a("time")=3;
          a().IDF(4);
          a().TreePrint();
          return 0;
      }

      yields:

      is=0.678072
      now=2.000000
      the=0.415037
      time=1.415037
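
      The formula log2(DocCount/Wn)+1 can be checked independently of the toolkit. The sketch below applies it in place to a std::map standing in for the global array vector; idf_sketch is a hypothetical name and the code is not the toolkit's implementation.

```cpp
#include <cmath>
#include <map>
#include <string>

// Hypothetical stand-in for global::IDF(): replaces each stored document
// frequency Wn with log2(doc_count / Wn) + 1.
void idf_sketch(std::map<std::string, double> &vec, double doc_count) {
    for (auto &entry : vec) {
        // entry.second holds Wn on entry and the IDF weight on exit
        entry.second = std::log2(doc_count / entry.second) + 1.0;
    }
}
```

      With the counts from the example above (now=2, is=5, the=6, time=3) and a document count of 4, the map ends up holding the same weights as the printed output, to six decimal places.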

    31. mstring mstring::Justify(int field_width[, int precision])

      Justify() right justifies the invoking mstring in an mstring field whose length is given by the first argument. If the second argument is present and a positive integer, the invoking mstring is right justified in a field whose length is given by the first argument with "precision" decimal places. The two argument form imposes a numeric interpretation upon the invoking mstring. For example:

      x="39";
      x.Justify(3) yields " 39"
      x="TEST";
      x.Justify(7) yields "   TEST"
      x="39";
      x.Justify(4,1) yields "39.0"
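
      Both forms of Justify() can be approximated with the standard library. This is a sketch under assumptions, not the toolkit's code: justify_sketch is a hypothetical name, a value wider than the field is assumed to be returned unchanged, and the numeric interpretation is approximated with strtod().

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical one-argument form: pad on the left to the field width.
std::string justify_sketch(const std::string &s, int width) {
    if (static_cast<int>(s.size()) >= width) return s;   // assumption: never truncates
    return std::string(static_cast<std::size_t>(width) - s.size(), ' ') + s;
}

// Hypothetical two-argument form: interpret the string as a number, format
// it with "precision" decimal places, then right justify the result.
std::string justify_sketch(const std::string &s, int width, int precision) {
    char buf[64];
    double value = std::strtod(s.c_str(), nullptr);      // non-numeric text yields 0
    std::snprintf(buf, sizeof buf, "%.*f", precision, value);
    return justify_sketch(std::string(buf), width);
}
```

      Under these assumptions justify_sketch("39", 3) gives " 39" and justify_sketch("39", 4, 1) gives "39.0", in line with the examples above.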
      

    32. void global::Kill()

      This function deletes a node and all its descendants. Examples:

      gbl().Kill();       // kill entire global array "gbl"
      gbl(a,b,c).Kill();  // kill stated node and all descendants
      

    33. int mstring::Length()
      int mstring::Length(char * pattern_string)
      int mstring::Length(mstring pattern_string)

      The function returns the string length of the invoking mstring. For example:

              x="ABC";
              cout << x.Length() << endl;  // writes 3
              x="abcabcabcabc";
              cout << x.Length("abc") << endl;  // writes 5
      

      If an argument is given, the function returns the number of non-overlapping occurrences of "pattern_string" in the source string plus 1.
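
      The counting rule for the one-argument form can be sketched on std::string. The name length_sketch is hypothetical, and returning 0 for an empty pattern is an assumption that the text above does not address.

```cpp
#include <string>

// Hypothetical helper mirroring the rule in the text: the number of
// non-overlapping occurrences of pattern in s, plus 1.
int length_sketch(const std::string &s, const std::string &pattern) {
    if (pattern.empty()) return 0;       // assumption: empty pattern is not counted
    int count = 0;
    std::size_t pos = 0;
    while ((pos = s.find(pattern, pos)) != std::string::npos) {
        ++count;
        pos += pattern.size();           // skip the match so occurrences do not overlap
    }
    return count + 1;
}
```

      For "abcabcabcabc" and pattern "abc" there are four occurrences, so the result is 5, as in the example above.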

    34. mstring mcvt(arg)

      Converts arg to an mstring. The argument may be an int, char *, float, long, or double.

    35. int global::Lock()

      Creates a lock on the named node. If successful, "$test" will be true (1), false (0) otherwise. Returns a 1 if the lock succeeds and a 0 otherwise.

      The "Lock()" function marks a portion of the data base for exclusive access for an individual user. The "UnLock()" function frees prior locks (see below). The locks are stored in a file named "Mumps.Locks" which is opened for exclusive access by the locking/unlocking job. The contents of the file may be deleted to remove all locks. A lock does not actually prevent access to a global but merely marks it as locked. If another task attempts to place a lock on a locked node, the descendant of a locked node or a direct parent of a locked node, the lock attempt will fail. Examples:

      if (gbl(a,b,c).Lock()) { ..... }    // locks gbl(a,b,c) and all children
      if ($lock(gbl(a,b,c))) { ..... }

      See also: CleanLocks(), CleanAllLocks(), and UnLock().

    36. void Dump(char * filename)
      void Dump(mstring filename)
      void Dump(string filename)
      void Restore(char * filename)
      void Restore(mstring filename)
      void Restore(string filename)

      The global array data base is dumped to filename or read and restored from filename (a null terminated array of chars in the char * form). The two operations must not both be performed in the same program.

    37. mstring mstring::Eval();

      Evaluates the mumps expression in the invoking mstring object and returns the result in an mstring. If an error occurs, an InterpreterException is thrown. The invoking mstring object may contain a valid mumps expression involving calling program mstring variables.

    38. double global::Max()

      Returns the maximum numeric value of the data bearing nodes beneath the given reference. Non-numeric values are treated as zeros. Example:

      #include <mumpsc/libmpscpp.h>

      global A("A");

      int main() {
          mstring i,j;
          for (i=1; i<11; i++)
              for (j=1; j<11; j++) {
                  A(i,j) = rand()%1000;
              }
          cout << "Max value of all: " << A().Max() << endl;
          cout << "Max value of row 10: " << A("10").Max() << endl;
          return EXIT_SUCCESS;
      }
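
      The rule that non-numeric values count as zero affects the result when every numeric value is negative. The sketch below models it over a plain list of stored strings; max_sketch is a hypothetical name, strtod() is assumed to be an adequate stand-in for the toolkit's numeric interpretation, and returning 0 for an empty list is an assumption.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for global::Max() over a flat list of node values.
double max_sketch(const std::vector<std::string> &values) {
    bool first = true;
    double best = 0.0;                                    // assumption: empty input gives 0
    for (const std::string &v : values) {
        double x = std::strtod(v.c_str(), nullptr);       // non-numeric text converts to 0
        if (first || x > best) {
            best = x;
            first = false;
        }
    }
    return best;
}
```

      Given the values "-5" and "abc", this sketch returns 0 rather than -5, because the non-numeric entry is treated as zero.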