Fault Tolerant Computing

(3-0-0-3)

CMPE Degree: This course is Not Applicable for the CMPE degree.

EE Degree: This course is Not Applicable for the EE degree.

Lab Hours: 0 supervised lab hours and 0 unsupervised lab hours.

Technical Interest Group(s) / Course Type(s): VLSI Systems and Digital Design

Course Coordinator:

Prerequisites: ECE 6100

Corequisites: None.

Catalog Description

Key concepts in fault-tolerant computing. Understanding and use of modern
fault-tolerant hardware and software design practices. Case studies.

Course Outcomes

Not Applicable

Student Outcomes

In the parentheses for each Student Outcome:
“P” for primary indicates the outcome is a major focus of the entire course.
“M” for moderate indicates the outcome is the focus of at least one component of the course, but not the majority of the course material.
“LN” for “little to none” indicates that the course does not contribute significantly to this outcome.

1. ( Not Applicable ) An ability to identify, formulate, and solve complex engineering problems by applying principles of engineering, science, and mathematics

2. ( Not Applicable ) An ability to apply engineering design to produce solutions that meet specified needs with consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors

3. ( Not Applicable ) An ability to communicate effectively with a range of audiences

4. ( Not Applicable ) An ability to recognize ethical and professional responsibilities in engineering situations and make informed judgments, which must consider the impact of engineering solutions in global, economic, environmental, and societal contexts

5. ( Not Applicable ) An ability to function effectively on a team whose members together provide leadership, create a collaborative and inclusive environment, establish goals, plan tasks, and meet objectives

6. ( Not Applicable ) An ability to develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions

7. ( Not Applicable ) An ability to acquire and apply new knowledge as needed, using appropriate learning strategies.

Strategic Performance Indicators (SPIs)

Not Applicable

Course Objectives

Topical Outline

Goals and Applications of Fault Tolerant Computing
Reliability, Availability, Safety, Dependability, etc.
Long Life, Critical Computation
High Availability Applications
Fault Tolerance as a Design Objective

Fault Models
Faults, Errors, and Failures
Causes and Characteristics of Faults
Logical and Physical Faults
Error Models

Fault Tolerant Design Techniques Based on Hardware Redundancy
Hardware Redundancy
TMR, N-modular Redundancy
Voting Methods
Duplication, Standby Sparing
Watchdog Timers
Hybrid Hardware Redundancy
N-modular Redundancy with Spares
Sift-out Modular Redundancy
Triple-duplex Architecture
Fault Tolerant Interconnection Networks
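
As an illustration of the TMR and voting topics listed above, the following is a minimal sketch of a software majority voter. It is not course material; the function names (majority_vote, tmr) and the modeling of each redundant module as a Python callable are assumptions made for this example only.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of modules.

    Returns None when no value has a strict majority, which a real
    system would treat as an unmaskable (detected) error.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def tmr(module_a, module_b, module_c, x):
    """Triple modular redundancy: run three replicas and vote on the result."""
    return majority_vote([module_a(x), module_b(x), module_c(x)])


# Hypothetical replicas: two healthy modules and one with a stuck output.
def healthy(x):
    return x + 1


def faulty(x):
    return 0


masked = tmr(healthy, faulty, healthy, 41)  # the single fault is outvoted
```

The same voter extends to N-modular redundancy by passing N outputs. It masks a fault only while a strict majority of the modules agree.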

Fault Tolerant Design Techniques Based on Information Redundancy
Parity, M-of-N, Duplication Codes
Checksums, Cyclic Codes, Arithmetic Codes
Berger Codes, Hamming Error Correcting Codes
Code Selection Issues
Time Redundancy, Recomputing with Shifted Operands (RESO)
Software Redundancy, Checks and N-version Programming
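
As an illustration of the Hamming error correcting codes listed above, the following is a minimal sketch of a (7,4) Hamming encoder and single-error corrector. It assumes the conventional layout with parity bits at codeword positions 1, 2, and 4. The function names are hypothetical and chosen for this example only.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position), where error_position is the
    1-based position of the corrected bit, or 0 if no error was seen.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome, read as a binary number
    if position:
        c[position - 1] ^= 1  # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]], position
```

This sketch corrects any single-bit error. A double-bit error yields a nonzero syndrome that points at the wrong position, so detecting double errors would need an additional overall parity bit.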

Reliability Evaluation Techniques
Failure Rate, Mean Time to Repair, Mean Time Between Failures
Reliability Modeling, Fault Coverage
M-of-N Systems
Markov Models
Safety, Maintainability, Availability
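
As an illustration of the reliability modeling and M-of-N topics listed above, the following is a minimal sketch of the standard M-of-N reliability calculation. It assumes identical, independent modules with a constant failure rate and a perfect voter. These assumptions, and the function names, belong to this example rather than to the course description.

```python
import math


def module_reliability(failure_rate, t):
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def m_of_n_reliability(m, n, r):
    """Probability that at least m of n independent modules, each with
    reliability r, are still working (binomial sum from m to n)."""
    return sum(
        math.comb(n, i) * r**i * (1 - r) ** (n - i)
        for i in range(m, n + 1)
    )


# TMR is the 2-of-3 case, which reduces to 3r^2 - 2r^3.
tmr_high = m_of_n_reliability(2, 3, 0.9)  # approximately 0.972
tmr_low = m_of_n_reliability(2, 3, 0.4)   # approximately 0.352
```

Under these assumptions, TMR improves on a single module only when the module reliability exceeds 0.5. Below that point the redundant system is less reliable than one module alone.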

Fault Tolerance in VLSI Circuits
Failure Models in VLSI
Redundancy Techniques in VLSI
Self-checking Logic
Reconfiguration Array Structures
Effect on Yield

Case Studies
FTSC, FTBBC
Space Shuttle
Tandem 16 NonStop System
Stratus/32 System
ESS

Students will write a term paper on research, a literature review, or a
design in the fault-tolerant computing area. Topics will be chosen in
consultation with the instructor.