Fault tolerance problems arise in large-scale distributed systems because application components may eventually fail due to hardware problems, operator mistakes or design faults. Fault tolerance mechanisms must therefore be employed to reduce a system's susceptibility to failure. In this paper, we describe the design of an architecture that overcomes potential application component failures using CORBA, a distributed object middleware specified by the Object Management Group (OMG). Central to this architecture is the OMG's CORBA Object Trading Service, which serves as the mechanism to advertise and manage service offers for fault-tolerant application components. This mechanism enables clients to transparently detect a failed connection to a service object, discover a similar backup service object and reconnect to it. This improves overall system stability and enables scalability.
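The detect, discover and reconnect cycle described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not show the CORBA IDL or the Trading Service calls, so the sketch replaces them with an in-memory stand-in. All names here (`Trader`, `Offer`, `FailoverProxy`, `ServiceUnavailable`) are hypothetical, and a real client would query a CORBA trader and catch CORBA system exceptions instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for a failed connection to a service object."""


@dataclass
class Offer:
    """A service offer: a service type, an offer name and a callable service."""
    service_type: str
    name: str
    invoke: Callable[[str], str]


class Trader:
    """In-memory stand-in for a trading service (assumption, not the CORBA API)."""

    def __init__(self) -> None:
        self._offers: Dict[str, List[Offer]] = {}

    def export(self, offer: Offer) -> None:
        # Advertise an offer under its service type.
        self._offers.setdefault(offer.service_type, []).append(offer)

    def query(self, service_type: str) -> List[Offer]:
        # Return all offers currently advertised for the service type.
        return list(self._offers.get(service_type, []))


class FailoverProxy:
    """Client-side proxy that hides failover behind a single call() method."""

    def __init__(self, trader: Trader, service_type: str) -> None:
        self._trader = trader
        self._service_type = service_type
        self._current: Optional[Offer] = None

    def call(self, request: str) -> str:
        tried: Set[str] = set()
        while True:
            if self._current is None:
                # Discover: ask the trader for an offer not yet tried in this call.
                candidates = [o for o in self._trader.query(self._service_type)
                              if o.name not in tried]
                if not candidates:
                    raise ServiceUnavailable(
                        f"no working offer for {self._service_type}")
                self._current = candidates[0]  # Reconnect to the chosen offer.
            try:
                return self._current.invoke(request)
            except ServiceUnavailable:
                # Detect: remember the failed offer and look for a backup.
                tried.add(self._current.name)
                self._current = None


def failing_primary(request: str) -> str:
    raise ServiceUnavailable("primary is down")


def healthy_backup(request: str) -> str:
    return f"backup handled {request}"


trader = Trader()
trader.export(Offer("Printer", "primary", failing_primary))
trader.export(Offer("Printer", "backup", healthy_backup))

proxy = FailoverProxy(trader, "Printer")
print(proxy.call("job-1"))  # prints "backup handled job-1"
```

The caller only ever sees `proxy.call(...)`, which is the sense in which the failover is transparent in this sketch. Once the proxy has switched to the backup offer it keeps using it for later calls until that offer fails as well.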
|Number of pages||4|
|Publication status||Published - 1999|
|Event||Euro-Par '99 Parallel Processing - Toulouse, France (Duration: 1 Jan 1999 → …)|