The TM Machine Specification


CS 4550
Translation of Programming Languages


Introduction

TM, the Tiny Machine, is a simple target machine that Kenneth Louden created for his textbook, Compiler Construction: Principles and Practice. Its architecture and instruction set are complex enough to illustrate the important issues faced when writing a compiler, yet simple enough not to distract us with unnecessary details.



Architecture

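TM provides two kinds of memory: instruction memory, which holds the program, and data memory, which holds the values the program manipulates.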

Memory addresses are non-negative integers. When the machine is started, all data memory is set to 0, except for the first memory location. That location contains the value of the highest legal address.

We use an extended version of the TM interpreter that accepts command-line arguments to the TM program and stores them in memory locations 1 through n.

TM provides eight registers, numbered 0 through 7. Register 7 is the program counter. The other seven registers are available for program use. When the machine is started, all registers are set to 0.
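The start-up state described above can be sketched in Python. This is only an illustration, not the simulator's code: the name `initial_state` and the default memory size of 1024 locations are assumptions made for the example, and the sketch shows memory before any command-line arguments are stored.

```python
NUM_REGISTERS = 8   # registers 0 through 7
PC_REG = 7          # register 7 is the program counter


def initial_state(dmem_size=1024):
    """Return (registers, data_memory) as they look when TM starts.

    dmem_size is a hypothetical default, not a value taken from the
    TM specification.
    """
    registers = [0] * NUM_REGISTERS
    data_memory = [0] * dmem_size
    # Location 0 holds the highest legal data memory address.
    data_memory[0] = dmem_size - 1
    return registers, data_memory
```

With `initial_state(16)`, every register is 0, location 0 of data memory holds 15, and every other location holds 0.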

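When the machine is started, after memory and registers have been initialized, TM begins executing the program at the first location of instruction memory. The machine follows a standard fetch-execute cycle: it fetches the instruction at the address held in register 7, increments register 7, and executes the fetched instruction.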

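The loop terminates when it reaches a HALT instruction or when an error occurs. TM has three native error conditions: an instruction memory address that is out of bounds (IMEM_ERR), a data memory address that is out of bounds (DMEM_ERR), and division by zero (ZERO_DIV).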
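The fetch-execute loop and its termination conditions can be sketched as follows. This is a minimal illustration under stated assumptions: instructions are represented as Python tuples whose first item is the opcode, the `execute` callable is a hypothetical stand-in for the instruction semantics, and the program counter is assumed to advance after the fetch and before the instruction executes.

```python
PC_REG = 7  # register 7 is the program counter


def run(imem, regs, execute):
    """Run until HALT or an error; return a short status string.

    imem    -- list of instructions, each a tuple starting with the opcode
    regs    -- list of eight integers; regs[7] is the program counter
    execute -- callable(instr, regs) that performs one non-HALT
               instruction and raises an exception on an execution error
    """
    while True:
        pc = regs[PC_REG]
        if not 0 <= pc < len(imem):
            return "error: instruction address out of bounds"
        instr = imem[pc]            # fetch
        regs[PC_REG] = pc + 1       # assumed: PC advances before execution
        if instr[0] == "HALT":
            return "halted"
        try:
            execute(instr, regs)    # execute
        except (IndexError, ZeroDivisionError) as exc:
            return f"error: {exc}"
```

For example, running the two-instruction program `[("LDC", 0, 5, 0), ("HALT", 0, 0, 0)]` with an `execute` that handles LDC returns `"halted"`, leaving 5 in register 0 and 2 in register 7.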



Instruction Set

TM provides two kinds of instructions: register-only and register-memory.

Register-only (RO) instructions are of the form

    opcode r1,r2,r3

where r1, r2, and r3 are legal registers. These are the RO opcodes:

IN read an integer from stdin and place result in r1; ignore operands r2 and r3
OUT write contents of r1 to stdout; ignore operands r2 and r3
ADD add contents of r2 and r3 and place result in r1
SUB subtract contents of r3 from contents of r2 and place result in r1
MUL multiply contents of r2 and contents of r3 and place result in r1
DIV divide contents of r2 by contents of r3 and place result in r1
HALT ignore operands and terminate the machine
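To make the RO semantics concrete, here is a small Python sketch of executing one register-only instruction. It is an illustration rather than the simulator's code. The function name `exec_ro` and the use of lists in place of stdin and stdout are assumptions, as is the choice to truncate division toward zero the way C does. Python integers are unbounded, so the sketch does not model fixed-width overflow.

```python
def exec_ro(op, r1, r2, r3, regs, inputs, outputs):
    """Execute one register-only instruction (HALT is left to the main loop).

    inputs  -- list of integers standing in for stdin
    outputs -- list that collects the values written by OUT
    """
    # Read both source registers first, so that the target register
    # may safely be the same as a source register.
    a, b = regs[r2], regs[r3]
    if op == "IN":
        regs[r1] = inputs.pop(0)
    elif op == "OUT":
        outputs.append(regs[r1])
    elif op == "ADD":
        regs[r1] = a + b
    elif op == "SUB":
        regs[r1] = a - b
    elif op == "MUL":
        regs[r1] = a * b
    elif op == "DIV":
        if b == 0:
            raise ZeroDivisionError("division by zero")
        # Assumed: integer division truncating toward zero, as in C.
        quotient = abs(a) // abs(b)
        regs[r1] = quotient if (a < 0) == (b < 0) else -quotient
    else:
        raise ValueError(f"not an RO opcode: {op}")
```

Note the operand order for SUB and DIV: with 7 in register 1 and 2 in register 2, `SUB 0,1,2` leaves 5 in register 0, not -5.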

Register-memory (RM) instructions are of the form

    opcode r1,offset(r2)

where r1 and r2 are legal registers and offset is an integer, which may be negative. With the exception of the LDC instruction, the expression offset(r2) is used to compute the address of a memory location:

    address = (contents of r2) + offset

There are four RM opcodes for memory manipulation:

LDC place the constant offset in r1; ignore r2
LDA place the value of address in r1; memory is not accessed
LD place the contents of data memory location address in r1
ST store the contents of r1 in data memory location address
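The four memory-manipulation opcodes differ only in what they do with offset and the computed address. The Python sketch below illustrates the difference. The function name `exec_rm` is hypothetical, and raising `IndexError` for an out-of-range address is an assumption about how a simulator might report the error.

```python
def exec_rm(op, r1, offset, r2, regs, dmem):
    """Execute one of LDC, LDA, LD, ST on registers regs and data memory dmem."""
    address = regs[r2] + offset
    if op == "LDC":
        regs[r1] = offset           # constant load; r2 is ignored
    elif op == "LDA":
        regs[r1] = address          # the address itself; no memory access
    elif op in ("LD", "ST"):
        # Check bounds explicitly: a negative Python index would
        # otherwise silently wrap around to the end of the list.
        if not 0 <= address < len(dmem):
            raise IndexError(f"data memory address out of bounds: {address}")
        if op == "LD":
            regs[r1] = dmem[address]
        else:
            dmem[address] = regs[r1]
    else:
        raise ValueError(f"not a memory opcode: {op}")
```

Because LDA only computes with registers and offset, it also serves as an add-constant instruction: `LDA 1,1(1)` increments register 1.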

There are six RM opcodes for branching. If the value of r1 satisfies the opcode's condition, then branch to the instruction at instruction memory location address.

JEQ equal to 0
JNE not equal to 0
JLT less than 0
JLE less than or equal to 0
JGT greater than 0
JGE greater than or equal to 0
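The six branch opcodes share one pattern: test the value in r1 against zero and, if the test succeeds, load address into the program counter. A Python sketch of that pattern follows. The names `JUMP_CONDITIONS` and `exec_jump` are made up for this illustration, and the sketch does not check whether the target address is a legal instruction memory location.

```python
PC_REG = 7  # register 7 is the program counter

# Each opcode maps to the test it applies to the value in r1.
JUMP_CONDITIONS = {
    "JEQ": lambda v: v == 0,
    "JNE": lambda v: v != 0,
    "JLT": lambda v: v < 0,
    "JLE": lambda v: v <= 0,
    "JGT": lambda v: v > 0,
    "JGE": lambda v: v >= 0,
}


def exec_jump(op, r1, offset, r2, regs):
    """Branch to address if regs[r1] satisfies op's condition.

    Returns True if the branch was taken, False otherwise.
    """
    address = regs[r2] + offset
    taken = JUMP_CONDITIONS[op](regs[r1])
    if taken:
        regs[PC_REG] = address
    return taken
```

Using register 7 as r2 gives a branch relative to the program counter. If we assume the program counter already points to the next instruction when the branch executes, then `JLT 1,3(7)` skips the next three instructions whenever register 1 is negative.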

Note:



The TM Simulator

We do not have a hardware realization of the TM architecture. We do have a TM virtual machine, implemented as a C program. This program accepts assembly language programs written for TM and executes them according to the machine's specification.

Input to the VM

The VM accepts programs as text files of the following form:

Interaction with the VM

Invoke the virtual machine with the name of a TM assembly language program as an argument. If the filename does not have an extension, the simulator assumes .tm.
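The default-extension rule can be sketched in a few lines of Python. The function name `program_path` is hypothetical, and the sketch decides what counts as "having an extension" with `os.path.splitext`, which may differ in corner cases from the test the C simulator actually performs.

```python
import os


def program_path(name):
    """Return name unchanged if it has an extension, else append .tm."""
    _root, ext = os.path.splitext(name)
    return name if ext else name + ".tm"
```

Under this sketch, `program_path("factorial")` gives `"factorial.tm"`, while `program_path("factorial-cli.tm")` is returned unchanged.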

The simulator then requests a command. The basic commands for running the program are:

Several other commands accepted by the simulator provide rudimentary debugging capabilities:

Finally, the simulator accepts these commands:

Command-Line Arguments

We use a version of the TM VM that is identical to the machine described in Louden's textbook, with one exception. Our simulator has been extended to accept command-line arguments to assembly-language programs. These arguments are placed by the VM at the base of the data memory.

For example, we can invoke the TM VM as follows:

    office > tm factorial-cli.tm 10
    TM simulation (enter h for help)...
    Enter command: g
    OUT instruction prints: 3628800
    HALT: 0,0,0
    Halted

This instruction loads the command-line argument 10 into register 0:

    2:     LD  0,1(0)    ; loads arg from DMEM location 1

If the user provides multiple command-line arguments, they will be placed in consecutive data memory locations beginning at location 1.

Note: If a TM program expects n command-line arguments, then the program should not place any static data objects in data memory locations 1 through n.
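The placement of arguments, and its consequence for static data, can be sketched in Python. This illustrates the layout only and is not the VM's code: the name `load_args` is hypothetical, and raising `ValueError` when the arguments do not fit is an assumption.

```python
def load_args(dmem, args):
    """Store command-line arguments in dmem[1..n].

    dmem -- data memory, with dmem[0] already holding the highest address
    args -- argument strings as they appear on the command line

    Returns the first location a program can use for static data.
    """
    n = len(args)
    if n > len(dmem) - 1:
        raise ValueError("too many arguments for data memory")
    for location, text in enumerate(args, start=1):
        dmem[location] = int(text)
    return n + 1
```

For a program invoked with the arguments `10 3`, location 1 holds 10, location 2 holds 3, and static data can begin at location 3. Location 0 still holds the highest legal address.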



Eugene Wallingford ..... wallingf@cs.uni.edu ..... August 22, 2016