Security and Fault-tolerance in Distributed Systems
According to Lamport, a distributed system is one where the crash of a
computer that you've never heard of stops you from getting any work
done. This course presents methods for building dependable and secure
distributed systems. The emphasis is on fault-tolerant and
distributed cryptographic protocols. Topics include group
communication, failure detectors, reliable broadcast protocols,
distributed cryptography, threshold cryptosystems, Byzantine
agreement, quorum systems, and replication. Applications
to cluster computing, Internet services, and storage systems
will be presented.
The course presents principles and fundamental methods, and shows how
they are applied to real-world systems.
Lecturer. Dr. Christian Cachin, IBM Zurich Research Lab.
Teaching Assistant. Georges Baatz.
Dates.
Lecture: Thursday, 14:15-16:00, IFW A32.1, starting 26.10.2006.
Exercise: Thursday, 16:15-17:00, IFW A32.1, starting 26.10.2006.
Web page. http://www.zurich.ibm.com/~cca/sft06/
The course is part of the Master in Computer Science, in the Specialization Track Information Security.
Prerequisites. Knowledge of information security and/or network security, distributed systems, and cryptography.
Topics
- Introduction
- Dependability Concepts
- Quorums
- Registers and Shared Memory
- Consensus and Broadcast
- View-synchronous Group Communication
- Distributed Cryptography
- Byzantine Agreement
- Service Replication
- Data Storage
Books
- Hagit Attiya and Jennifer Welch.
Distributed Computing: Fundamentals, Simulations, and Advanced Topics.
Wiley, 2nd edition, 2004.
- George Coulouris, Jean Dollimore, and Tim Kindberg.
Distributed Systems: Concepts and Design.
Addison-Wesley, 3rd edition, 2001.
- Rachid Guerraoui and Luís Rodrigues.
Introduction to Reliable Distributed Programming.
Springer, 2006.
Recommended articles (in order of topics)
- [opgapa03] David Oppenheimer, Archana Ganapathi, and David A. Patterson.
Why do Internet services fail, and what can be done about it?
In Proc. 4th USENIX Symposium on Internet Technologies and Systems (USITS '03), 2003.
- [patter02] David A. Patterson.
An introduction to dependability.
;login:, 27(4):61-65, August 2002.
- [barspa04] Wendy Bartlett and Lisa Spainhower.
Commercial fault tolerance: A tale of two systems.
IEEE Transactions on Dependable and Secure Computing, 1(1):87-96, 2004.
- [bbdgjk05] David Bernick, Bill Bruckert, Paul Del Vigna, David Garcia, Robert Jardine, Jim Klecka, and Jim Smullen.
NonStop Advanced Architecture.
In Proc. International Conference on Dependable Systems and Networks (DSN-2005), pages 12-21, June 2005.
- [spagre99] Lisa Spainhower and Thomas A. Gregg.
IBM S/390 parallel enterprise server G5 fault tolerance: A historical perspective.
IBM Journal of Research and Development, 43(5/6):863-873, 1999.
- [naowoo98] Moni Naor and Avishai Wool.
The load, capacity and availability of quorum systems.
SIAM Journal on Computing, 27(2):423-447, 1998.
- [chatou96] Tushar Deepak Chandra and Sam Toueg.
Unreliable failure detectors for reliable distributed systems.
Journal of the ACM, 43(2):225-267, 1996.
- [raynal05] Michel Raynal.
A short introduction to failure detectors for asynchronous distributed systems (Distributed Computing Column).
SIGACT News, 36(1):53-70, 2005.
- [hadtou93] Vassos Hadzilacos and Sam Toueg.
Fault-tolerant broadcasts and related problems.
In Sape J. Mullender, editor, Distributed Systems. ACM Press & Addison-Wesley, New York, 1993.
Expanded version appears as Technical Report TR94-1425, Department of Computer Science, Cornell University, Ithaca NY, 1994.
- [schnei90] Fred B. Schneider.
Implementing fault-tolerant services using the state machine approach: A tutorial.
ACM Computing Surveys, 22(4):299-319, December 1990.
- [schzho04] Fred B. Schneider and Lidong Zhou.
Distributed trust: Supporting fault-tolerance and attack-tolerance.
Technical Report TR 2004-1924, Cornell Computer Science Department, January 2004.
- [cacpor02] Christian Cachin and Jonathan A. Poritz.
Secure intrusion-tolerant replication on the Internet.
In Proc. International Conference on Dependable Systems and Networks (DSN-2002), pages 167-176, June 2002.
- [cacsam04] Christian Cachin and Asad Samar.
Secure distributed DNS.
In Proc. International Conference on Dependable Systems and Networks (DSN-2004), pages 423-432, June 2004.
- [schhas02] Frank Schmuck and Roger Haskin.
GPFS: A shared-disk file system for large computing clusters.
In Proc. USENIX Conference on File and Storage Technologies (FAST 2002), 2002.
Last updated Thursday, 10-Jul-2008 10:27:11 CEST, by Christian Cachin.