#### **Bus-Based Computer Systems**

- CPU bus, I/O devices, and interfacing
- CPU system as a framework
- System-level performance
- Development and debugging
- An alarm clock design

#### **Bus-Based Computer Systems**

- Busses.
- Memory devices.
- I/O devices:
  - serial links;
  - timers and counters;
  - keyboards;
  - displays;
  - analog I/O.

#### System architectures

Architectures and components:

- software;
- hardware.

Some software is very hardware-dependent.

#### Hardware platform architecture

Contains several elements:

- CPU;
- bus;
- memory;
- I/O devices: networking, sensors, actuators, etc.

How big/fast must each one be?

#### Software architecture

Functional description must be broken into pieces:

- division among people;
- conceptual organization;
- performance;
- testability;
- maintenance.

#### HW/SW architectures

Hardware and software are intimately related:

- software doesn't run without hardware;
- how much hardware you need is determined by the software requirements:
  - speed;
  - memory.

#### **Evaluation boards**

- Designed by CPU manufacturer or others.
- Includes CPU, memory, some I/O devices.
- May include prototyping section.
- CPU manufacturer often gives out the evaluation board netlist, which can be used as a starting point for your custom board design.

#### Adding logic to a board

- **Programmable logic devices (PLDs)** provide low/medium density logic.
- **Field-programmable gate arrays (FPGAs)** provide more logic and multi-level logic.
- **Application-specific integrated circuits (ASICs)** are manufactured for a single purpose.

#### The PC as a platform

Advantages:

- cheap and easy to get;
- rich and familiar software environment.

Disadvantages:

- requires a lot of hardware resources;
- not well-adapted to real-time.

#### Typical PC hardware platform



#### **Typical busses**

- PCI: standard for high-speed interfacing.
  - 33 or 66 MHz.
  - PCI Express (PCIe): serial link.
    - 4 data wires per lane.
    - v1.x: 250 MB/s per lane.
    - v2.0: 500 MB/s per lane.
    - v3.0: 1 GB/s per lane.
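PCIe bandwidth scales with the number of lanes in a link. The aggregate rate is therefore the per-lane figure times the link width. Here is a minimal sketch. It assumes the per-lane rates listed above apply to each direction of the link, and `pcie_link_mb_s` is a hypothetical helper name:

```python
# Per-lane rates from the list above, in MB/s (v3.0 rounded to 1 GB/s).
PER_LANE_MB_S = {"1.x": 250, "2.0": 500, "3.0": 1000}


def pcie_link_mb_s(version: str, lanes: int) -> int:
    """Approximate one-direction bandwidth of a PCIe link with `lanes` lanes."""
    if lanes < 1:
        raise ValueError("a link needs at least one lane")
    return PER_LANE_MB_S[version] * lanes


for version in PER_LANE_MB_S:
    print(f"PCIe {version} x4: {pcie_link_mb_s(version, 4)} MB/s")
```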

#### Software elements

IBM PC uses BIOS (Basic I/O System) to implement low-level functions:

- boot-up;
- minimal device drivers.

BIOS has become a generic term for the lowest-level system software.

#### **Example: StrongARM**

StrongARM system includes:

- CPU chip (3.686 MHz clock);
- real-time clock;
- operating system timer;
- general-purpose I/O;
- interrupt controller;
- power manager controller;
- reset controller.

#### **Debugging embedded systems**

Challenges:

- Hard to generate realistic inputs.

#### Host/target design

Use a host system to prepare software for the target system:



#### Host-based tools

- Cross compiler:
  - compiles code on host for target system.
- Cross debugger:
  - displays target state, allows target system to be controlled.

#### Software debuggers

- A monitor program residing on the target provides basic debugger functions.
- Debugger should have a minimal footprint in memory.
- User program must be careful not to destroy the debugger program, but the debugger should be able to recover from some damage caused by user code.

#### **Breakpoints**

A breakpoint allows the user to stop execution, examine system state, and change state.

Replace the breakpointed instruction with a subroutine call to the monitor program.

#### **ARM breakpoints**

| Address | Uninstrumented code | Code with breakpoint |
|---------|---------------------|----------------------|
| 0x400   | MUL r4,r6,r6        | MUL r4,r6,r6         |
| 0x404   | ADD r2,r2,r4        | ADD r2,r2,r4         |
| 0x408   | ADD r0,r0,#1        | ADD r0,r0,#1         |
| 0x40c   | B loop              | BL bkpoint           |

#### **Breakpoint handler actions**

- Save registers.
- Allow user to examine machine.
- Before returning, restore system state.
  - Safest way to execute the instruction is to replace it and execute in place.
  - Put another breakpoint after the replaced breakpoint to allow restoring the original breakpoint.
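The following is a toy model of the replace-and-re-arm scheme described above. It models only instruction memory; saving and restoring registers is left out. It also assumes the breakpointed instruction is not a branch, so the next instruction executed is at `addr + 4`. For that reason the example breaks on the `ADD` at 0x404 rather than on the branch. `Monitor`, `set_breakpoint`, and `on_trap` are hypothetical names.

```python
class Monitor:
    """Toy monitor that manages breakpoints in simulated instruction memory.

    `memory` maps addresses to instruction text; ARM instructions are 4 bytes.
    Assumes the breakpointed instruction is not a branch.
    """

    TRAP = "BL bkpoint"  # subroutine call into the monitor, as in the table above

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}   # address -> original instruction displaced by a trap
        self.rearm = {}   # temporary trap address -> user breakpoint to reinstall

    def set_breakpoint(self, addr):
        """Save the original instruction and replace it with the trap."""
        self.saved[addr] = self.memory[addr]
        self.memory[addr] = self.TRAP

    def on_trap(self, addr):
        """Handle a trap at `addr`; execution then resumes at `addr`."""
        # Put the original instruction back so it executes in place.
        self.memory[addr] = self.saved.pop(addr)
        if addr in self.rearm:
            # Temporary trap after the user breakpoint: reinstall the user breakpoint.
            self.set_breakpoint(self.rearm.pop(addr))
        else:
            # User breakpoint: trap on the next instruction so it can be re-armed.
            self.set_breakpoint(addr + 4)
            self.rearm[addr + 4] = addr
        return addr


code = {0x400: "MUL r4,r6,r6", 0x404: "ADD r2,r2,r4",
        0x408: "ADD r0,r0,#1", 0x40C: "B loop"}
mon = Monitor(code)
mon.set_breakpoint(0x404)   # 0x404 now holds BL bkpoint
mon.on_trap(0x404)          # original ADD restored, temporary trap at 0x408
mon.on_trap(0x408)          # 0x408 restored, breakpoint at 0x404 re-armed
print({hex(a): i for a, i in code.items()})
```

A real monitor must also decode branches to find where to place the temporary breakpoint.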

#### In-circuit emulators (ICE)

- A microprocessor in-circuit emulator is a specially-instrumented microprocessor.
- Allows you to stop execution, examine CPU state, modify registers.

#### Logic analyzers

A logic analyzer is an array of low-grade oscilloscopes:



Computers as Components

#### Logic analyzer architecture



#### **Boundary scan**

Simplifies testing of multiple chips on a board.

- Registers on pins can be configured as a scan chain.
- Used for debuggers, in-circuit emulators.



#### How to exercise code

- Run on host system.
- Run on target system.
- Run in instruction-level simulator.
- Run on cycle-accurate simulator.
- Run in hardware/software co-simulation environment.

#### Debugging real-time code

Bugs in drivers can cause nondeterministic behavior in the foreground program.

Bugs may be timing-dependent.

#### System-level performance analysis

Performance depends on all the elements of the system:

- CPU.
- Cache.
- Bus.
- Main memory.
- I/O device.



#### **Bandwidth as performance**

Bandwidth applies to several components:

- Memory.
- Bus.
- CPU fetches.

Different parts of the system run at different clock rates.

Different components may have different widths (bus, memory).

#### Bandwidth and data transfers

- Per video frame: 320 × 240 × 3 = 230,400 bytes.
- Transfer 1 byte/µs: 0.23 sec per frame.
  - Too slow.
- Increase bandwidth:
  - increase bus width;
  - increase bus clock rate.

#### **Bus bandwidth**

- T: number of bus cycles.
- P: time per bus cycle.
- Total time for transfer: $t = TP$.
- D: data payload length.
- O = O1 + O2: overhead (address, handshaking).
- N: bytes to be transferred.
- W: bus width in bytes.

$$T_{\text{basic}}(N) = \frac{(D+O)N}{W}$$

#### Bus burst transfer bandwidth

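- T: number of bus cycles.
- P: time per bus cycle.
- Total time for transfer: $t = TP$.
- D: data payload length.
- O = O1 + O2: overhead O.
- B: burst length (data transfers per burst).
- N: bytes to be transferred; W: bus width in bytes.

$$T_{\text{burst}}(N) = \frac{(BD+O)N}{BW}$$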
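The two formulas translate directly into code. This is a sketch; the function names and the 1 KB example transfer are hypothetical. The comparison shows that a burst pays the overhead O once per B transfers instead of once per transfer.

```python
def t_basic(n, d, o, w):
    """Bus cycles to move n bytes without bursts: (D + O) * N / W."""
    return (d + o) * n / w


def t_burst(n, b, d, o, w):
    """Bus cycles to move n bytes in bursts of B transfers: (B*D + O) * N / (B*W)."""
    return (b * d + o) * n / (b * w)


def transfer_time(cycles, p):
    """Total transfer time t = T * P, with P the time per bus cycle."""
    return cycles * p


# Hypothetical 1 KB transfer: D = 1, O = 3, W = 2 bytes, 1 MHz bus (P = 1 us).
basic = t_basic(1024, d=1, o=3, w=2)         # 2048 cycles
burst = t_burst(1024, b=4, d=1, o=3, w=2)    # 896 cycles
print(basic, transfer_time(basic, 1e-6))
print(burst, transfer_time(burst, 1e-6))
```

Like the slide formulas, this assumes N is a multiple of W (or of BW for bursts). Otherwise the cycle count must be rounded up.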

#### Memory aspect ratios




#### Memory access times

- Memory component access times come from the chip data sheet.
  - Page modes allow faster access for successive transfers on the same page.
- What if data doesn't fit naturally into physical words?
  - A pixel: RGB, 24 bits.
    - One access for 24-bit-wide memory.
    - 3 accesses for 8-bit-wide memory.
    - 32-bit-wide memory:
      - waste one byte on each access, or
      - pack pixels across words.
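The following sketch shows the packing option for 32-bit-wide memory. Four 24-bit pixels fit exactly in three 32-bit words, so packing saves a quarter of the accesses. The cost is that some pixels straddle word boundaries. The little-endian byte order and the function names are assumptions made for illustration.

```python
def pack_rgb(pixels):
    """Pack (r, g, b) byte triples into 32-bit words, little-endian within each word."""
    stream = bytearray()
    for r, g, b in pixels:
        stream += bytes((r, g, b))
    stream += bytes(-len(stream) % 4)   # zero-pad the last word
    return [int.from_bytes(stream[i:i + 4], "little") for i in range(0, len(stream), 4)]


def unpack_rgb(words, count):
    """Recover `count` pixels from words produced by pack_rgb."""
    stream = b"".join(w.to_bytes(4, "little") for w in words)
    return [tuple(stream[3 * i:3 * i + 3]) for i in range(count)]


def word_accesses(n_pixels, packed):
    """32-bit accesses per frame: one per pixel unpacked, ceil(3n/4) packed."""
    return -(-3 * n_pixels // 4) if packed else n_pixels


frame = 320 * 240
print(word_accesses(frame, packed=False), word_accesses(frame, packed=True))
```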

#### Bus performance bottlenecks

Transfer a 320 × 240 video frame @ 30 frames/sec = 612,000 bytes/sec.

Is the performance bottleneck the bus or the memory?



#### Bus performance bottlenecks, cont'd.

**Bus:** assume 1 MHz bus, D=1, O=3, W=2:

$$T_{\text{basic}} = \frac{(1+3) \times 612{,}000}{2} = 1{,}224{,}000 \text{ cycles} = 1.224 \text{ sec}$$

**Memory:** try burst mode B=4, D=1, O=4, width W=0.5 (assume a 10 ns memory clock period, i.e. 100 MHz):

$$T_{\text{mem}} = \frac{(4 \times 1+4) \times 612{,}000}{4 \times 0.5} = 2{,}448{,}000 \text{ cycles} = 0.02448 \text{ sec}$$

#### **Performance spreadsheet**

| Bus parameter | Value    | Memory parameter | Value    |
|---------------|----------|------------------|----------|
| clock period  | 1.00E-06 | clock period     | 1.00E-08 |
| W             | 2        | W                | 0.5      |
| D             | 1        | D                | 1        |
| O             | 3        | O                | 4        |
|               |          | B                | 4        |
| N             | 612000   | N                | 612000   |
| T_basic       | 1224000  | T_mem            | 2448000  |
| t             | 1.22E+00 | t                | 2.45E-02 |
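The spreadsheet can be reproduced in a few lines. This sketch reuses the slide's N = 612,000 bytes and the parameter values from the table. The helper names are hypothetical.

```python
def t_basic(n, d, o, w):
    """Non-burst bus cycles: (D + O) * N / W."""
    return (d + o) * n / w


def t_burst(n, b, d, o, w):
    """Burst cycles: (B*D + O) * N / (B*W)."""
    return (b * d + o) * n / (b * w)


N = 612_000  # bytes per second of video, as given on the slide

bus_cycles = t_basic(N, d=1, o=3, w=2)
bus_time = bus_cycles * 1e-6           # 1 MHz bus clock

mem_cycles = t_burst(N, b=4, d=1, o=4, w=0.5)
mem_time = mem_cycles * 1e-8           # 10 ns memory clock period

bottleneck = "bus" if bus_time > mem_time else "memory"
print(f"bus:    {bus_cycles:.0f} cycles, {bus_time:.3e} s")
print(f"memory: {mem_cycles:.0f} cycles, {mem_time:.3e} s")
print("bottleneck:", bottleneck)
```

N is one second's worth of video, so any time above 1 s means that component cannot sustain 30 frames/sec. The bus needs about 1.22 s, while the memory needs about 0.024 s.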

#### Parallelism



#### Alarm clock interface



#### Operations

- Set time: hold set time, depress hour, minute.
- Set alarm time: hold set alarm, depress hour, minute.
- Turn alarm on/off: depress alarm on/off.

#### Alarm clock requirements

| name                 | alarm clock                                                                 |
|----------------------|-----------------------------------------------------------------------------|
| purpose              | 24-hour digital clock with one alarm                                        |
| inputs               | set time, set alarm, hour, minute, alarm on/off                             |
| outputs              | four-digit display, PM indicator, alarm ready, buzzer                       |
| functions            | keep time, set time, set alarm, turn alarm on/off, activate buzzer by alarm |
| performance          | hours and minutes, no seconds; not high precision                           |
| manufacturing cost   | consumer product                                                            |
| power                | AC                                                                          |
| physical size/weight | fits on stand                                                               |

#### Alarm clock class diagram



#### Alarm clock physical classes

- **Lights\***: digit-val(), digit-scan(), alarm-on-light()
- **Buttons\***: set-time(): boolean, set-alarm(): boolean, alarm-on(): boolean, minute(): boolean, hour(): boolean
- **Speaker\***: buzz()

#### **Display class**

**Display**

- Attributes: time[4]: integer; alarm-indicator: boolean; PM-indicator: boolean.
- Operations: set-time(), alarm-light-on(), alarm-light-off(), PM-light-on(), PM-light-off().

#### Mechanism class

**Mechanism**

- Attributes: seconds: integer; PM: boolean; tens-hours, ones-hours: integer; tens-minutes, ones-minutes: integer; alarm-ready: boolean; alarm-tens-hours, alarm-ones-hours: integer; alarm-tens-minutes, alarm-ones-minutes: integer.
- Operations: scan-keyboard(), update-time().

#### **Update-time behavior**



#### Scan-keyboard behavior



#### System architecture

Includes:

- periodic behavior (clock);
- aperiodic behavior (buttons, buzzer activation).

Two major software components:

- interrupt-driven routine updates time;
- foreground program deals with buttons, commands.

#### **Interrupt-driven routine**

Timer probably can't handle one-minute interrupt interval.

Use a software variable to convert interrupt frequency to seconds.
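Below is a sketch of the interrupt-driven routine. It assumes a hypothetical 1 kHz timer interrupt (`TICKS_PER_SECOND`), and `ClockState` and `timer_isr` are also hypothetical names. The `ticks` counter is the software variable that divides the interrupt rate down to seconds. `update_time` mirrors the Mechanism class operation.

```python
TICKS_PER_SECOND = 1000  # hypothetical timer interrupt rate (1 kHz)


class ClockState:
    """Time-keeping part of the alarm clock, driven by a periodic timer interrupt."""

    def __init__(self):
        self.ticks = 0     # software variable: interrupts since the last second
        self.seconds = 0
        self.hours = 0     # 0-23
        self.minutes = 0   # 0-59

    def timer_isr(self):
        """Body of the interrupt-driven routine; runs once per timer interrupt."""
        self.ticks += 1
        if self.ticks < TICKS_PER_SECOND:
            return
        self.ticks = 0
        self.seconds += 1
        if self.seconds == 60:
            self.seconds = 0
            self.update_time()

    def update_time(self):
        """Advance the displayed time by one minute."""
        self.minutes += 1
        if self.minutes == 60:
            self.minutes = 0
            self.hours = (self.hours + 1) % 24
```

Keeping the routine this short matters because it runs in interrupt context. The foreground program only reads the resulting time.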

#### **Foreground program**

Operates as a while loop: `while (TRUE) { read_buttons(button_values); process_command(button_values); check_alarm(); }`
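The following sketches one pass of the foreground loop. The `Buttons` and `Clock` data classes and `foreground_step` are hypothetical. The button snapshot stands in for what `read_buttons` would return. For simplicity, this sketch acts on button levels. A real loop would act on presses (edges), so that a held hour button does not advance on every pass.

```python
from dataclasses import dataclass


@dataclass
class Buttons:
    """Snapshot of button states, as read_buttons() might return it."""
    set_time: bool = False
    set_alarm: bool = False
    hour: bool = False
    minute: bool = False
    alarm_on_off: bool = False


@dataclass
class Clock:
    hours: int = 0
    minutes: int = 0
    alarm_hours: int = 0
    alarm_minutes: int = 0
    alarm_ready: bool = False
    buzzing: bool = False


def process_command(clock: Clock, b: Buttons) -> None:
    """Apply the operations: set time, set alarm time, toggle the alarm."""
    if b.set_time:
        if b.hour:
            clock.hours = (clock.hours + 1) % 24
        if b.minute:
            clock.minutes = (clock.minutes + 1) % 60
    elif b.set_alarm:
        if b.hour:
            clock.alarm_hours = (clock.alarm_hours + 1) % 24
        if b.minute:
            clock.alarm_minutes = (clock.alarm_minutes + 1) % 60
    if b.alarm_on_off:
        clock.alarm_ready = not clock.alarm_ready
        if not clock.alarm_ready:
            clock.buzzing = False      # turning the alarm off silences the buzzer


def check_alarm(clock: Clock) -> None:
    """Start the buzzer when the alarm is armed and the times match."""
    now = (clock.hours, clock.minutes)
    if clock.alarm_ready and now == (clock.alarm_hours, clock.alarm_minutes):
        clock.buzzing = True


def foreground_step(clock: Clock, button_values: Buttons) -> None:
    """One iteration of the foreground while loop."""
    process_command(clock, button_values)
    check_alarm(clock)
```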

#### Testing

Component testing:

- Can test foreground program using a mockup.

System testing:

- relatively few components to integrate;
- check clock accuracy;
- check recognition of buttons, buzzer, etc.

#### The CPU bus

Bus allows CPU, memory, devices to communicate.

- Shared communication medium.

A bus is:

- A set of wires.
- A communications protocol.

#### **Bus protocols**

Bus protocol determines how devices communicate.

Protocols are specified by state machines, one state machine per actor in the protocol.

May contain asynchronous logic behavior.

#### Four-cycle handshake




1. Device 1 raises enq.
2. Device 2 responds with ack.
3. Device 2 lowers ack once it has finished.
4. Device 1 lowers enq.
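The handshake can be modeled as two state machines, one per actor, that alternately observe and drive the shared `enq` and `ack` wires. This is a behavioral sketch, not a timing model. The class names, state names, and the use of a shared `data` attribute as the data lines are assumptions.

```python
class Initiator:
    """Device 1: drives enq and the data lines."""

    def __init__(self, data):
        self.state = "idle"
        self.enq = False
        self.data = data

    def step(self, ack):
        if self.state == "idle":
            self.enq = True                  # 1. raise enq
            self.state = "wait_ack"
        elif self.state == "wait_ack" and ack:
            self.state = "wait_ack_low"      # saw device 2's ack
        elif self.state == "wait_ack_low" and not ack:
            self.enq = False                 # 4. lower enq
            self.state = "done"


class Responder:
    """Device 2: drives ack and latches the data."""

    def __init__(self):
        self.state = "idle"
        self.ack = False
        self.received = None

    def step(self, enq, data_lines):
        if self.state == "idle" and enq:
            self.received = data_lines       # accept the data
            self.ack = True                  # 2. respond with ack
            self.state = "acked"
        elif self.state == "acked":
            self.ack = False                 # 3. lower ack once finished
            self.state = "wait_enq_low"
        elif self.state == "wait_enq_low" and not enq:
            self.state = "idle"              # ready for the next transfer


def run_handshake(data, max_steps=10):
    """Alternate the two machines; return received data and (enq, ack) history."""
    d1, d2 = Initiator(data), Responder()
    trace = [(False, False)]

    def record():
        wires = (d1.enq, d2.ack)
        if wires != trace[-1]:
            trace.append(wires)

    for _ in range(max_steps):
        d1.step(d2.ack)
        record()
        d2.step(d1.enq, d1.data)
        record()
        if d1.state == "done" and d2.state == "idle":
            break
    return d2.received, trace


print(run_handshake(0x5A))
```

The recorded wire history passes through exactly four transitions, one per step of the protocol.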

#### **Microprocessor busses**

- Clock provides synchronization.
- R/W is true when reading (R/W' is false when reading).
- Address is an a-bit bundle of address lines.
- Data is an n-bit bundle of data lines.
- Data ready signals when n-bit data is ready.



#### **Timing diagrams**



#### **Bus read**



#### State diagrams for bus read




#### **Bus wait state**



#### **Bus burst read**




#### **Bus multiplexing**



#### DMA

Direct memory access (DMA) performs data transfers without executing instructions.

- CPU sets up transfer.
- DMA engine fetches, writes.
- DMA controller is a separate unit.





#### **Bus mastership**

By default, CPU is bus master and initiates transfers.

- DMA must become bus master to perform its work.
  - CPU can't use bus while DMA operates.

**Bus mastership protocol:**

- Bus request.
- Bus grant.

#### **DMA operation**

- CPU sets DMA registers for start address, length.
- DMA status register controls the unit.
- Once DMA is bus master, it transfers automatically.
  - May run continuously until complete.
  - May use every n<sup>th</sup> bus cycle.
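Here is a toy simulation of DMA operation. It collapses the bus request/grant exchange into a per-cycle check. The register names (`src`, `dst`, `length`), the `busy` flag standing in for the status register, and the `every_nth` policy knob are all hypothetical. The comparison shows how running continuously differs from using every n<sup>th</sup> bus cycle.

```python
class DMAController:
    """Toy DMA engine that copies words inside a simulated memory."""

    def __init__(self, memory, every_nth=1):
        self.memory = memory
        self.every_nth = every_nth   # 1 = run continuously; n = use every n-th bus cycle
        self.src = self.dst = self.length = 0
        self.busy = False            # plays the role of the status register

    def program(self, src, dst, length):
        """CPU sets the start address, destination, and length registers."""
        self.src, self.dst, self.length = src, dst, length
        self.busy = length > 0

    def bus_cycle(self, n):
        """Bus cycle n: return True if the DMA is bus master and moves one word."""
        if not self.busy or n % self.every_nth != 0:
            return False                 # no bus request; the CPU keeps the bus
        # Bus request granted: the DMA is bus master for this cycle.
        self.memory[self.dst] = self.memory[self.src]
        self.src += 1
        self.dst += 1
        self.length -= 1
        self.busy = self.length > 0
        return True


def run(memory, every_nth):
    """Copy memory[0:8] to memory[8:16]; return (bus cycles used, cycles left to CPU)."""
    dma = DMAController(memory, every_nth)
    dma.program(src=0, dst=8, length=8)
    cycle = cpu_cycles = 0
    while dma.busy:
        if not dma.bus_cycle(cycle):
            cpu_cycles += 1
        cycle += 1
    return cycle, cpu_cycles


mem = list(range(8)) + [0] * 8
print(run(mem, every_nth=2), mem[8:])
```

Running continuously finishes the copy in 8 cycles but locks the CPU out the whole time. Using every second cycle takes longer but leaves the CPU the remaining bus cycles.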



#### Bus transfer sequence diagram




#### System bus configurations



#### Bridge state diagram



#### **ARM AMBA bus**

- Two varieties:
  - AHB is high-performance.
  - APB is lower-speed, lower cost.
- AHB supports pipelining, burst transfers, split transactions, multiple bus masters.
- All devices are slaves on APB.



#### **Memory components**

Several different types of memory:

- DRAM.
- SRAM.
- Flash.

Each type of memory comes in varying:

- capacities;
- widths.



#### **Random-access memory**

- Dynamic RAM is dense, requires refresh.
  - Synchronous DRAM is dominant type.
  - SDRAM uses clock to improve performance, pipeline memory accesses.
- Static RAM is faster, less dense, consumes more power.

#### **SDRAM operation**




#### **Read-only memory**

- ROM may be programmed at factory.
- Flash is dominant form of field-programmable ROM.
  - Electrically erasable, must be block erased.
  - Random access, but write/erase is much slower than read.
  - NOR flash is more flexible.
  - NAND flash is more dense.

#### Flash memory

- Non-volatile memory.
- Flash can be programmed in-circuit.
- Random access for read.
- To write:
  - Erase a block to 1.
  - Write bits to 0.
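The erase-to-1/write-to-0 rule means that a write can only clear bits. Changing any 0 back to 1 forces an erase of the whole block. This sketch models a block as a byte array and models programming as a bitwise AND. The class and function names are hypothetical, and `erase_count` is a stand-in for wear.

```python
ERASED = 0xFF  # an erased flash byte reads as all 1s


class FlashBlock:
    """One erase block of simulated flash."""

    def __init__(self, size):
        self.cells = bytearray([ERASED] * size)
        self.erase_count = 0      # each erase adds wear

    def erase(self):
        """Erase the whole block: every bit returns to 1."""
        self.cells[:] = bytes([ERASED] * len(self.cells))
        self.erase_count += 1

    def can_program(self, offset, value):
        """True if writing `value` needs only 1 -> 0 bit changes."""
        return (self.cells[offset] & value) == value

    def program(self, offset, value):
        """Programming ANDs the new value in: it can clear bits but never set them."""
        self.cells[offset] &= value


def rewrite(block, offset, value):
    """Update one byte, erasing and reprogramming the whole block only if needed."""
    if block.can_program(offset, value):
        block.program(offset, value)
        return
    saved = bytearray(block.cells)   # read the block out
    saved[offset] = value
    block.erase()                    # whole block back to 0xFF
    for i, byte in enumerate(saved):
        block.program(i, byte)       # programming 0xFF leaves a cell unchanged
```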

#### **Flash writing**

- Write is much slower than read.
  - 1.6 µs write, 70 ns read.
- Blocks are large (approx. 1 Mb).
- Writing causes wear that eventually destroys the device.
  - Modern lifetime approx. 1 million writes.

#### Types of flash

- NOR:
  - Word-accessible read.
  - Erase by blocks.
- NAND:
  - Read by pages (512-4K bytes).
  - Erase by blocks.

NAND is cheaper, has faster erase, sequential access times.

#### **Timers and counters**

Very similar:

- a timer is incremented by a periodic signal;
- a counter is incremented by an asynchronous, occasional signal.

Rollover causes interrupt.
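One model covers both cases, because timers and counters differ only in what drives the increment. The following sketches an n-bit up-counter whose rollover invokes a callback standing in for the interrupt. Preloading the count so that rollover comes after a chosen number of pulses is one common way to set the interrupt interval. The class and method names are hypothetical.

```python
class TimerCounter:
    """n-bit up-counter that interrupts when it rolls over from its maximum to 0."""

    def __init__(self, bits, on_rollover):
        self.max = (1 << bits) - 1
        self.value = 0
        self.on_rollover = on_rollover   # stands in for the interrupt request

    def load(self, value):
        """Preload the count so the next rollover comes sooner."""
        if not 0 <= value <= self.max:
            raise ValueError("value does not fit in the counter")
        self.value = value

    def pulse(self):
        """One input edge: a periodic clock (timer) or an external event (counter)."""
        if self.value == self.max:
            self.value = 0
            self.on_rollover()
        else:
            self.value += 1


fired = []
t = TimerCounter(bits=8, on_rollover=lambda: fired.append("irq"))
t.load(256 - 10)      # preload so the 10th pulse rolls over
for _ in range(10):
    t.pulse()
print(fired, t.value)
```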

#### Watchdog timer

Watchdog timer is periodically reset by system timer.

If watchdog is not reset, it generates an interrupt to reset the host.
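The watchdog can be sketched as a countdown that healthy software restarts periodically. If the restarts stop, the countdown reaches zero and resets the host. The tick-based timeout and the names `Watchdog`, `kick`, and `reset_host` are hypothetical.

```python
class Watchdog:
    """Countdown that resets the host unless it is restarted in time."""

    def __init__(self, timeout_ticks, reset_host):
        self.timeout = timeout_ticks
        self.remaining = timeout_ticks
        self.reset_host = reset_host   # stands in for the reset interrupt

    def kick(self):
        """Called periodically by the system timer routine to restart the countdown."""
        self.remaining = self.timeout

    def tick(self):
        """Called on every watchdog clock tick."""
        self.remaining -= 1
        if self.remaining == 0:
            self.reset_host()
            self.remaining = self.timeout
```

As long as `kick` runs more often than once per `timeout_ticks`, the reset never fires. A hung program that stops kicking is reset within one timeout period.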



#### Switch debouncing

A switch must be debounced to eliminate multiple contacts caused by mechanical bouncing:
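Debouncing can be done in hardware or software. This sketch uses one common software technique, a counter-based debouncer. The switch is sampled periodically, and a new state is accepted only after it has been seen for several consecutive samples. The sample count and the `Debouncer` name are assumptions, and the technique may differ from the circuit shown on the slide.

```python
class Debouncer:
    """Accepts a new switch state only after `stable_samples` consecutive readings."""

    def __init__(self, stable_samples):
        self.required = stable_samples
        self.state = False   # debounced output (switch released)
        self.count = 0       # consecutive samples that disagree with self.state

    def sample(self, raw):
        """Feed one raw reading (taken at a fixed sample period); return the clean state."""
        if raw == self.state:
            self.count = 0                 # bounce ended or never started
        else:
            self.count += 1
            if self.count >= self.required:
                self.state = raw           # new state has been stable long enough
                self.count = 0
        return self.state


bouncy = [False, True, False, True, True, True, True, False, True, True]
d = Debouncer(stable_samples=3)
print([d.sample(r) for r in bouncy])
```

The bounces at the start of the press produce a single clean transition.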

#### Encoded keyboard

An array of switches is read by an encoder.

**N-key rollover** remembers multiple key depressions.
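This sketch shows the encoder's job as a scan over a switch matrix. Each row is driven in turn and the columns are read back. Every closed switch is reported, so several simultaneous key depressions are all remembered. `read_columns` is a hypothetical hardware-access callback. The sketch ignores ghosting, which real keyboards avoid with extra hardware such as a diode per switch.

```python
def scan_matrix(rows, cols, read_columns):
    """Scan a rows x cols switch matrix.

    read_columns(row) returns a bitmask of the columns that read as closed
    while `row` is driven. Returns the set of (row, col) key codes that are down.
    """
    pressed = set()
    for r in range(rows):
        mask = read_columns(r)
        for c in range(cols):
            if mask & (1 << c):
                pressed.add((r, c))
    return pressed


def new_presses(previous, current):
    """Keys that went down since the last scan; all of them are kept (N-key rollover)."""
    return current - previous
```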




#### LED

Must use resistor to limit current:
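The series resistor drops the difference between the supply voltage and the LED's forward voltage at the desired current, so R = (V<sub>supply</sub> − V<sub>forward</sub>) / I. The 5 V supply, 2 V forward drop, and 10 mA current below are hypothetical values. Real values come from the LED's data sheet.

```python
def led_series_resistor(v_supply, v_forward, i_led):
    """Resistance (ohms) that limits LED current to i_led amps: R = (Vs - Vf) / I."""
    if v_supply <= v_forward:
        raise ValueError("supply voltage must exceed the LED forward voltage")
    return (v_supply - v_forward) / i_led


def resistor_power(r, i_led):
    """Power (watts) dissipated in the resistor: P = I^2 * R."""
    return i_led ** 2 * r


r = led_series_resistor(5.0, 2.0, 0.010)   # hypothetical 5 V rail, 2 V LED, 10 mA
print(round(r), "ohms,", round(resistor_power(r, 0.010), 3), "W")
```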




#### 7-segment LCD display
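Driving a 7-segment display comes down to a table lookup from each digit to the segments that should be on. This sketch uses the usual a-g segment labeling, packed one bit per segment. The bit order, the table name, and the four-digit HH:MM helper are assumptions. The physical drive level that turns a segment on depends on the display and its driver.

```python
# Bit i of each pattern drives segment "abcdefg"[i]. By the usual convention,
# a is the top bar, b..f go clockwise around the digit, and g is the middle bar.
SEGMENTS = {
    0: 0x3F, 1: 0x06, 2: 0x5B, 3: 0x4F, 4: 0x66,
    5: 0x6D, 6: 0x7D, 7: 0x07, 8: 0x7F, 9: 0x6F,
}


def lit_segments(digit):
    """Names of the segments that are on for a decimal digit."""
    pattern = SEGMENTS[digit]
    return "".join(name for bit, name in enumerate("abcdefg") if pattern & (1 << bit))


def time_patterns(hours, minutes):
    """Segment patterns for a four-digit HH:MM display, left to right."""
    digits = (hours // 10, hours % 10, minutes // 10, minutes % 10)
    return [SEGMENTS[d] for d in digits]


print(lit_segments(7), [hex(p) for p in time_patterns(12, 34)])
```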



#### **High-resolution display**

- Liquid crystal display (LCD) is dominant form.
- Plasma, OLED, etc.
- Frame buffer holds current display contents.
  - Written by processor.
  - Read by video.



#### Touchscreen

- Includes input and output device.
- Input device is a two-dimensional voltmeter:



#### **Touchscreen position sensing**



#### **Digital-to-analog conversion**

Use resistor tree:
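In a binary-weighted resistor network, each bit contributes a current proportional to its weight. The output is therefore proportional to the binary code. This sketch models only the ideal output magnitude, V<sub>out</sub> = V<sub>ref</sub> × code / 2<sup>n</sup>. It ignores resistor tolerances and any sign inversion from an op-amp summer. The function names are hypothetical.

```python
def dac_output(bits, v_ref):
    """Ideal binary-weighted DAC output magnitude.

    bits[0] is the MSB; bit i (counting from the MSB) adds v_ref / 2**(i + 1),
    so the output equals v_ref * code / 2**n.
    """
    return sum(v_ref / 2 ** (i + 1) for i, bit in enumerate(bits) if bit)


def code_to_bits(code, n):
    """MSB-first list of the n bits of an integer code."""
    return [(code >> (n - 1 - i)) & 1 for i in range(n)]


for code in (0, 1, 8, 15):
    print(code, dac_output(code_to_bits(code, 4), 5.0))
```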



#### Flash A/D conversion

An n-bit result requires 2<sup>n</sup> − 1 comparators:
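A flash converter compares the input against every threshold at once. The comparators that fire form a thermometer code, and counting them gives the binary result. This ideal sketch uses one comparator for each of the 2<sup>n</sup> − 1 thresholds between adjacent output codes, evenly spaced by a resistor ladder. The function name is hypothetical. The exponential growth in comparators is why flash converters are usually limited to modest resolutions.

```python
def flash_adc(v_in, v_ref, n_bits):
    """Ideal n-bit flash converter.

    A resistor ladder provides 2**n - 1 thresholds, each with its own comparator.
    The comparators that fire form a thermometer code; counting them gives the result.
    """
    levels = 2 ** n_bits
    thresholds = [v_ref * k / levels for k in range(1, levels)]
    thermometer = [v_in >= t for t in thresholds]   # one comparator per threshold
    return sum(thermometer)


print([flash_adc(v, 8.0, 3) for v in (0.0, 1.0, 3.5, 8.0)])
```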




#### **Dual-slope conversion**

- Use counter to time required to charge/discharge capacitor.
- Charging, then discharging eliminates non-linearities.
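This is an idealized dual-slope converter. It integrates the input for a fixed number of clock counts, then discharges at a rate set by the reference and counts until the integrator returns to zero. The count ratio gives n2 = n1 × V<sub>in</sub> / V<sub>ref</sub>. The integrator constant `rc` appears in both phases and cancels. The function name and parameters are hypothetical.

```python
def dual_slope(v_in, v_ref, n1, rc=1.0):
    """Ideal dual-slope conversion; returns the discharge count n2.

    Phase 1 integrates v_in for a fixed n1 clock counts (charging).
    Phase 2 integrates -v_ref, counting clocks until the integrator is back to zero.
    The result n2 = n1 * v_in / v_ref does not depend on rc.
    """
    if not 0 < v_in <= v_ref:
        raise ValueError("expects 0 < v_in <= v_ref")
    integrator = 0.0
    for _ in range(n1):          # fixed-time charge
        integrator += v_in / rc
    n2 = 0
    while integrator > 0:        # discharge at a rate set by v_ref
        integrator -= v_ref / rc
        n2 += 1
    return n2


print(dual_slope(2.5, 5.0, 1000), dual_slope(2.5, 5.0, 1000, rc=2.0))
```

Both calls return the same count even though `rc` differs. The resolution is one clock count.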



#### Sample-and-hold

Samples data:


