dc.contributor.advisor: Chaudhary, Sanjay
dc.contributor.author: Mehta, Janki
dc.date.accessioned: 2017-06-10T14:37:12Z
dc.date.available: 2017-06-10T14:37:12Z
dc.date.issued: 2007
dc.identifier.citation: Mehta, Janki (2007). Checkpointing and recovery mechanism in grid. Dhirubhai Ambani Institute of Information and Communication Technology, viii, 54 p. (Acc.No: T00122)
dc.identifier.uri: http://drsr.daiict.ac.in/handle/123456789/159
dc.description.abstract: A grid is a collection of distributed computing resources that perform tasks in coordination to achieve high-end computational capabilities. Grid computing carries out a given task collectively by breaking it into sub-tasks. Each sub-task can be large and may run for several hours or days on a number of grid nodes. If a sub-task fails to complete on even a single site, all the computations need to be done again. In scalable distributed systems, individual component failures usually do not result in failure of the entire system. However, a single failure may crash an entire parallel application. A grid is dynamic in nature. Because the probability of a single component failure rises rapidly with the number of components, an efficient recovery mechanism becomes most important as the system grows in size, particularly for highly parallel, mission-critical, and long-running applications in a grid environment. This thesis addresses a recovery mechanism that uses checkpoints to recover from Grid Service failures that result in task or transaction failure in a Computational Grid and Data Grid, which prevents computations from being restarted from scratch. A Grid Service may fail as a result of a hardware or software fault. A checkpoint is a point-in-time snapshot of a grid node in which its state information is stored. It helps reduce crash recovery time. This work helps preserve two main objectives of a grid, namely optimal resource utilization and speedy computation, by using resources to improve system performance rather than engaging them in tasks such as rollbacks resulting from cascading aborts. The state saved in checkpoints can also be used for job migration through grid job schedulers when critical failures such as an operating system failure occur. The experiments conducted integrate the proposed mechanism with the standard grid Web Services Resource Framework and will aid future development work.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Computational grids
dc.subject: Computer systems
dc.subject: Globus
dc.subject: Electronic resource
dc.subject: Application software - Development
dc.subject: Grid computing
dc.subject: Grid architectures
dc.subject: Grid middleware and toolkits
dc.subject: Computer software - Development
dc.classification.ddc: 004.36 MEH
dc.title: Checkpointing and recovery mechanism in grid
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 200511029
dc.accession.number: T00122