Sample Final Questions over Material since Test 2

1. Suppose we had a block transfer from an I/O device to memory. The block consists of 1024 words and one word can be transferred at a time. For each of the following, indicate the number of interrupts needed to transfer a block:

a) programmed-I/O - 0 since programmed-I/O does not rely on interrupts

b) interrupt-driven I/O - 1024 interrupts. After each word, the I/O device will issue an interrupt.

c) DMA (direct-memory access) - 1 interrupt. After the entire block has been transferred by the DMA device, an interrupt will be generated.
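The three answers to question 1 can be restated as a small function. This is only an illustrative sketch: the function name and the mode strings are made up for this example and do not come from the course text.

```python
def interrupts_needed(words, mode):
    """Interrupts needed to move `words` words, one word per transfer.

    `mode` is a hypothetical label chosen for this sketch.
    """
    if mode == "programmed":
        return 0        # CPU polls the status register; no interrupts are used
    if mode == "interrupt-driven":
        return words    # the device interrupts after every word
    if mode == "dma":
        return 1        # one interrupt, after the whole block is in memory
    raise ValueError(f"unknown I/O mode: {mode}")


for mode in ("programmed", "interrupt-driven", "dma"):
    print(mode, interrupts_needed(1024, mode))
```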

2. What is the main difference between programmed I/O and interrupt-driven I/O? Programmed I/O relies on the "user" program continually polling the status register of the I/O interface in a busy-wait loop. Interrupt-driven I/O issues the I/O command to the I/O interface and then immediately turns control of the CPU over to another program while the I/O is being performed. When the I/O completes, the I/O interface interrupts whatever the CPU is doing to get some attention.
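A rough way to see the difference is to count what the CPU does while it waits for one I/O operation. The sketch below assumes a hypothetical device that needs a fixed number of cycles to finish; the function name, the mode strings, and the cycle counts are illustrative only.

```python
def run_io(device_delay, mode):
    """Return (cycles_spent_polling, cycles_given_to_other_work) while a
    hypothetical device takes `device_delay` cycles to complete one operation."""
    polling = other_work = 0
    for _ in range(device_delay):
        if mode == "programmed":
            polling += 1      # CPU re-reads the status register (busy-wait)
        elif mode == "interrupt-driven":
            other_work += 1   # CPU runs another program until it is interrupted
        else:
            raise ValueError(f"unknown I/O mode: {mode}")
    return polling, other_work


print(run_io(5, "programmed"))        # all waiting cycles are spent polling
print(run_io(5, "interrupt-driven"))  # all waiting cycles go to other work
```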

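3. What is the main difference between interrupt-driven I/O and DMA? Both rely on interrupts, but the granularity of the DMA transfer is much larger. With interrupt-driven I/O the device interrupts the CPU after every word, whereas a DMA device transfers the entire block to memory and interrupts the CPU only once, when the block is complete.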

4. Assume special I/O instructions are used to fill I/O-interface registers. Why can't a user program use these instructions to communicate with the I/O device directly and "by-pass" the operating system's protection checking? The special I/O instructions would be privileged instructions, i.e., they would be executable only if the CPU was in "system mode."

5. Assume that memory-mapped I/O is used. Since Load and Store instructions are used to communicate with the I/O-interface registers, why can't a user program communicate with the I/O device directly and "by-pass" the operating system's protection checking? Whatever memory-management technique is used to restrict the user program to its own memory space will also prevent access to the I/O-interface registers, since they will be assigned addresses outside of the user program's memory space.

6. Explain how a computer can protect against a user program going into an infinite loop. The operating system sets a CPU timer, using privileged instructions, before the CPU is turned over to the user program. As the user program runs, this CPU timer counts down. If it reaches zero, it generates an interrupt/exception, causing the operating system to regain control of the CPU.
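The timer mechanism can be imitated in a few lines. In this sketch a "user program" is modeled as a Python generator where each step stands for one unit of CPU time; the function names, the time-slice value, and the returned strings are assumptions made for illustration, not part of any real operating system.

```python
def run_with_timer(program, time_slice):
    """Run `program` (a generator; each step is one time unit) until it
    finishes or the countdown timer reaches zero."""
    timer = time_slice          # set by the OS before handing over the CPU
    for _ in program:
        timer -= 1              # the timer counts down as the user program runs
        if timer == 0:
            return "timer interrupt: OS regains control"
    return "program finished"


def infinite_loop():
    while True:                 # never terminates on its own
        yield


def short_program():
    for _ in range(3):          # finishes after three steps
        yield


print(run_with_timer(infinite_loop(), 100))
print(run_with_timer(short_program(), 100))
```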

7. Draw an output transfer timing diagram (similar to Figure 4.25) using multiple clock cycles for a synchronous bus.

8. What are the goals of the memory hierarchy on a computer system? To provide a large virtual address space at a reasonable cost, with an average access time close to that of the fastest level of the hierarchy.

9. What are the advantages and disadvantages of dynamic RAM over static RAM? Advantages: cheaper, since you can get more bits of storage per chip than with static RAM. Disadvantages: the access time of dynamic RAM is slower, since the bit values must be sensed and the cells must be refreshed periodically.

10. Synchronous DRAM design allows for accesses in burst mode with 2 (or more) data transfers from consecutive memory addresses. Burst mode allows for faster access of the second and later data transfers. For example, Figure 5.9 has a 6-1-1-1 timing, which means that the first data transfer requires 6 bus cycles and each of the next three data transfers requires only 1 bus cycle.

Why do the second through fourth data transfers take less time than the first data transfer? For the first data transfer, the Row address must be sent, the Row address must be decoded, the Row must be read from the cell array, the row read must be latched into a "row register" (Static RAM), the Column address must be sent, and the Column address must be decoded. For the second through fourth data transfers, the column address is only incremented and decoded, since the data is read from the latched "row register".
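The benefit of the 6-1-1-1 timing can be checked with a little arithmetic. The comparison below assumes, purely for illustration, that each of four separate (non-burst) accesses would cost the full 6 bus cycles; that assumption is not stated in the text.

```python
def burst_cycles(timing):
    """Total bus cycles for one burst, given per-transfer cycle counts."""
    return sum(timing)


burst = burst_cycles([6, 1, 1, 1])   # 6 + 1 + 1 + 1 = 9 bus cycles
separate = 4 * 6                     # assumed cost of four independent accesses

print("burst:", burst, "bus cycles")
print("four separate accesses (assumed 6 cycles each):", separate, "bus cycles")
```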

11. Suppose we have a 32-bit address machine that is byte addressable, and it has a 128KB (2^17 bytes) cache with 64 (2^6) bytes per block.

a) How many total lines are in the cache? 2^17 / 2^6 = 2^11 cache lines

b) If the cache is direct-mapped, how many cache lines could a specific memory block be mapped to? one

c) If the cache is direct-mapped, what would be the format (tag bits, cache line bits, block offset bits) of the address? (Clearly indicate the number of bits in each) tag = 15 bits, cache line = 11 bits, block offset = 6 bits (15 + 11 + 6 = 32)

d) If the cache is fully-associative, how many cache lines could a specific memory block be mapped to? 2^11

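e) If the cache is fully-associative, what would be the format of the address? tag = 26 bits, block offset = 6 bits (there is no cache line field, since a block can be placed in any line)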

f) If the cache is 4-way set associative, how many cache lines could a specific memory block be mapped to? 4

g) If the cache is 4-way set associative, how many sets would there be? 2^11 / 4 = 2^9 sets of size 4

h) If the cache is 4-way set associative, what would be the format of the address? tag = 17 bits, set = 9 bits, block offset = 6 bits (17 + 9 + 6 = 32)
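The field widths asked for in question 11 follow mechanically from the cache parameters. The sketch below is one way to compute them, assuming every size is a power of two; the function name and the dictionary keys are chosen for this example only.

```python
def address_format(address_bits, cache_bytes, block_bytes, ways):
    """Split an address into tag / index / offset bit counts.

    Assumes cache_bytes, block_bytes and ways are all powers of two.
    """
    lines = cache_bytes // block_bytes            # total cache lines
    sets = lines // ways                          # lines grouped into sets
    offset_bits = block_bytes.bit_length() - 1    # log2(block size)
    index_bits = sets.bit_length() - 1            # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return {"lines": lines, "sets": sets,
            "tag": tag_bits, "index": index_bits, "offset": offset_bits}


CACHE = 128 * 1024   # 2^17 bytes
BLOCK = 64           # 2^6 bytes

direct = address_format(32, CACHE, BLOCK, ways=1)
fully = address_format(32, CACHE, BLOCK, ways=CACHE // BLOCK)
four_way = address_format(32, CACHE, BLOCK, ways=4)

for name, fmt in (("direct-mapped", direct),
                  ("fully-associative", fully),
                  ("4-way set associative", four_way)):
    print(name, fmt)
```

Direct-mapped is treated here as 1-way set associative and fully-associative as a single set containing every line, so one function covers all three organizations.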

12. A pipelined processor has a 128-KB split Level-1 cache (a two-way 64-KB data cache and a two-way 64-KB instruction cache). Why are split Level-1 caches usually used on pipelined processors?

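In a pipelined processor, different instructions are in different stages of execution at the same time. For example, a five-stage pipeline with stages fetch instruction (FI), decode (D), fetch operands (FO), execute (E), and write results (W) would have a timing diagram such as

Cycle:          1   2   3   4   5   6   7
Instruction 1:  FI  D   FO  E   W
Instruction 2:      FI  D   FO  E   W
Instruction 3:          FI  D   FO  E   W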

In a given clock cycle, the "fetch operands" stage of one instruction and the "fetch instruction" stage of another might both need to access memory. Having separate/split caches for data and instructions allows both accesses to proceed in parallel, so their access times overlap.
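The overlap can be found by stepping through an idealized pipeline. This sketch assumes one instruction enters the pipeline per cycle with no stalls, and uses made-up abbreviations (FI, D, FO, E, W) for the five stages named above; it lists the cycles in which one instruction is fetching operands while another is being fetched.

```python
STAGES = ["FI", "D", "FO", "E", "W"]   # abbreviations assumed for this sketch


def stage_of(instr, cycle):
    """Stage occupied by instruction `instr` in `cycle` (both 0-based),
    or None if the instruction is not in the pipeline during that cycle."""
    k = cycle - instr                  # instruction i enters in cycle i
    return STAGES[k] if 0 <= k < len(STAGES) else None


def memory_conflict_cycles(num_instr):
    """Cycles in which both an instruction fetch and an operand fetch occur."""
    conflicts = []
    for cycle in range(num_instr + len(STAGES) - 1):
        active = {stage_of(i, cycle) for i in range(num_instr)}
        if "FI" in active and "FO" in active:
            conflicts.append(cycle)
    return conflicts


print(memory_conflict_cycles(4))
```

With a single unified cache, the two accesses in each listed cycle would have to be serialized; with split caches they can be serviced at the same time.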