Re: fault tolerant processing

Fons Rademakers (Fons.Rademakers@cern.ch)
Thu, 11 Mar 1999 16:34:21 +0000


In principle, errors are trapped by ROOT. However, whether processing
can continue correctly depends a lot on where the error happened. If it
was somewhere in the interpreter, things are currently not reset
correctly to allow a restart.
We expect to work on this problem with Masa during the ROOT workshop
at Fermilab. If the error happens somewhere else, a lot depends on how
well the code handles re-entrancy, etc. The system should support
some global reset so that one can continue with the next event in a clean
state.
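
To illustrate the "global reset, then continue with the next event" idea,
here is a minimal sketch in plain C++. It is not ROOT code: MonitorState,
ProcessEvent, RunLoop and LoopResult are hypothetical names made up for this
example, and the failure is simulated with a C++ exception. Errors that
arrive as signals (e.g. a segmentation fault) or that occur inside the
interpreter are not covered by this pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical per-job state that a failing event may leave half-updated.
struct MonitorState {
    int  partialSum = 0;    // scratch value, only meaningful inside one event
    bool inEvent    = false; // true while an event is being processed

    // The "global reset": bring everything back to a known clean state.
    void Reset() { partialSum = 0; inEvent = false; }
};

// Counters returned by the loop so the caller can see what happened.
struct LoopResult {
    std::size_t processed = 0;
    std::size_t failed    = 0;
};

// Hypothetical event handler. A negative value simulates a failure that
// happens after the state has already been modified.
void ProcessEvent(int event, MonitorState& state) {
    state.inEvent = true;
    state.partialSum += event;
    if (event < 0) throw std::runtime_error("bad event");
    state.partialSum = 0;   // normal end-of-event cleanup
    state.inEvent = false;
}

// Process all events; on failure, reset the state and go on to the next one.
LoopResult RunLoop(const std::vector<int>& events, MonitorState& state) {
    LoopResult result;
    for (int ev : events) {
        try {
            ProcessEvent(ev, state);
            ++result.processed;
        } catch (const std::exception&) {
            state.Reset();  // discard whatever the failed event left behind
            ++result.failed;
        }
        assert(!state.inEvent); // every event must start from a clean state
    }
    return result;
}
```

Calling RunLoop({1, -2, 3}, state) on a fresh MonitorState processes two
events, counts one failure, and leaves the state clean. How well this works
in practice depends on the reset really covering all state the failed code
touched, which is the re-entrancy concern mentioned above.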

Cheers, Fons.

Valeriy Onuchin wrote:
>
> Hi Rooters!
> We are using ROOT for online monitoring
> http://emcal06.rhic.bnl.gov/~onuchin/Sproot/html/USER_Index.html
>
> One of our main problems is providing
> fault-tolerant processing, i.e.
> recovery from system/ROOT/process failure.
>
> Does anybody have solutions or experience in dealing with this?
>
> With best regards, Valery
>
> P.S.
> I suppose similar problems must exist in offline processing too,
> e.g. AtlasFast and Star have a chain of makers;
> what do you do when one of the makers crashes your ROOT session?
>
> ... and a suggestion:
> we are using TMapFiles for local storage of processed data
> http://emcal06.rhic.bnl.gov/~onuchin/Sproot/html/DbManager.html
> After the introduction of TMapRec it became possible to loop over
> the objects in a TMapFile,
>
> but could you (Fons) change TMapFile::AcquireSemaphore()
> and TMapFile::ReleaseSemaphore() from protected to public?

-- 
Org:    CERN, European Laboratory for Particle Physics.
Mail:   1211 Geneve 23, Switzerland
E-Mail: Fons.Rademakers@cern.ch              Phone: +41 22 7679248
WWW:    http://root.cern.ch/~rdm/            Fax:   +41 22 7677910