081:115G Spring 2010
810:115g Information Storage and Retrieval (3 hours)

Last updated: October 20, 2009
Objective To understand computer based automatic indexing and retrieval of text/web based information.
Books: Experiments in Information Storage and Retrieval Using Mumps

Mumps Book PDF

Course Materials: Collection of data bases, examples and text
Requirements: The requiements will consist of a set of assignments to be performed individually and a project which may be done either individually or in small groups (approx. 3 persons max). As this couse depends heavily on lecture content, attendance is required. Excessive absence will result in a reduced grade for the project. The term "excessive absence" means: don't push your luck.

Term Project (25%)
Assignments (30%) Late assignments charged 5% per class day. Chronically late assignments will be charged at a higher rate.
Tests (2 at 15% each)
Final (15%).

Classes are lecture format. Cell phones, pagers, laptops and PDA's may not be used.

Assignments must include:

  • in a comment block your name
  • the assignment number
  • the due date
  • you must turn in your code and an example of its execution.
  • multiple pages must be stapled.

    Assignments not complying with the above will be returned ungraded. Assignmnents may not be submitted by email.

  • Test 1 TBA
    Test 2 TBA
    Final: Click Here
    Makeup Tests Makeup tests will be given only in cases of demonstrated need for causes such as serious illness, family emergency or University sanctioned schedule conflict. In all cases, written documentation will be required.
    Penalty for
    Hacking & Cheating
    A grade of "F" for the course and possible University disciplinary action. If your work duplicates in whole or part the work of another, both works will receive a grade of F.
    Project Projects will be presented in class with a brief online demonstration. A vote will be taken for the best project and the winner will be exempt from the final (grade of A will be entered for the final exam grade). You may work in teams of from 1 to 3 (hint: have one person do the indexing, one the retrieval code and the other do the web interface). All group members will receive the same grade.
    Final Grades Final grades will not be available via email. If you want your grade mailed to you, bring a stamped, self-addressed envelope to the final.
    Contact Click Here
    Computer You may want to use your own computer to do the assignments and project. If that is not the case, you may have an account on one of my Linux servers. The assignments and project are extreme I/O intensive. If you use your own machine, a desktop with lots of memeory is best.
    Data Base Reuters Unfiltered
    Reuters Filtered
    Reuters Filtered version 2
    Reuters Filtered version 3
    (version 2 has xoxoxo ... XOXOXO, version 3 has title1 ... title2).
    Getting Started
    1. If you use your own machine, get and install Ubuntu either native, under WUBI or Sun VirtualBox.
    2. Get and install Mumps (Check http://cns2.uni.edu/~okane/source/MUMPS-MDH/ for the latest version number). Install G++ (C++ compiler), PCRE (Perl Compatible Regular Expression) development library, and autoconf. In an Ubuntu terminal, as root, type:
      1. wget http://www.cs.uni.edu/~okane/source/MUMPS-MDH/mumpscompiler-12.ALPHA-01.src.tar.gz (use latest version).
      2. tar xvzf mumpscompiler-12.ALPHA-01.src.tar.gz
      3. cd mumpsc
      4. ./configure prefix=/usr
        or, if using a 64 bit version of Ubuntu:
        ./configure prefix=/usr --with-cpu64
      5. make
      6. make install
    3. Learn an adult editor (nano/pico are for children) : vi Tutorial
    4. Learn Mumps.
    HTML BareBones HTML Guide
    Assignments Do not push assignments under the office door - leave them in the CS Office. All pages must be stapled.

    Other Resources Mesh

    Documentation and Availability
    Documentation Descriptor Data Elements
    Descriptor Records
    Documentation Qualifier Data Elements
    Qualifier Records
    Supplementary Records

    Resources:

    Salton, G., Automatic Informatiuon Organization and Retrieval, McGraw Hill (1968)
    Salton, G., Automatic Text Processing, Addison-Wesley (1989)
    Salton, G., ed., The SMART Retrieval System, Prentice-Hall (1971).
    Salton, G., and McGill, M., Introduction to Modern Information Retrieval, McGraw-Hill, (1983)
    Borko, H., Automated Language Processing, (1968)
    The Smart System from Cornell: ftp://cs.cornell.edu/pub/smart


    Data Sets for Machine Learning
    WordNet
    Apache
    Web Archive (you think you have disk capacity problems!)
    PostgreSQL Tutorial
    Rod Library Electronic Resources
    ACM Digital Library
    Lemur Toolkit for Language Modeling
    Cornell Smart System
    SIGIR List and Archives
    UVA Electronic Text Center
    NLM Gateway
    Digital Library Research Laboratory
    Automatic Text Browsing Using the Vector Space Model
    Lawrence Berkeley Lab Science Articles Archive
    The Internet Archive
    Search Engine Features
    Anatomy of a Search Engine (Google)
    Medline (National Library of Medicine)
    Information Retrieval by C. J. van Rijsbergen
    Modern Information Retrieval Chapter 10
    Cystic Fibrosis Reference Collection
    Marti Hearst Site
    Online Papers
    Ed Fox Links
    IIT IR Publications
    Web IR
    WWW 10
    WWW 9
    WWW 8
    WWW 7
    WWW 5
    IRIS Project
    Lots of Links
    Top Ten Issues
    Searching Genomic Databases
    Amazon.com's recommender algorithm
    Huffman Trees
    Knuth Optimal Binary Trees
    Hu-Tucker Trees
    AVL (Balanced) Trees
    B trees and AVL Trees
    B Trees
    IBM Clever Project
    flex documentation
    Homology Searching
    Project Gutenberg

    The following notice is required by the University:

    "The Americans with Disabilities Act of 1990 (ADA) provides protection from illegal discrimination for qualified individuals with disabilities. Students requesting instructional accommodations due to disabilities must arrange for such accommodation through the Office of Disability Services. The ODS is located at: 213 Student Services Center, and the phone number is: 273-2676."

    Because the Office of Disability Services has procedures in place to determine the validity of disability claims as well as the need for instructional accommodations, faculty are reminded that they are to direct all students with accommodation requests to the above listed office.

    UNDER NO CIRCUMSTANCE SHOULD A FACULTY MEMBER MAKE AN ACCOMMODATION INDEPENDENT OF THE OFFICE OF DISABILITY SERVICES.

    Questions may be directed to: Disability Services Coordinator, at 273-2676 or to this office at 273-2846.