The mobile agents create a new paradigm for data exchange and resource sharing in rapidly growing and continually changing computer networks. In a distributed system, failures can occur in any software or hardware component. A mobile agent can get lost when its hosting server crashes during execution, or it can get dropped in a congested network. Therefore, survivability and fault tolerance are vital issues for deploying mobile-agent systems. This fault tolerance approach deploys three kinds of cooperating agents to detect server and agent failures and recover services in mobile-agent systems. An actual agent is a common mobile agent that performs specific computations for its owner. Witness agents monitor the actual agent and detect whether it's lost. A probe recovers the failed actual agent and the witness agents. A peer-to-peer message-passing mechanism stands between each actual agent and its witness agents to perform failure detection and recovery through time-bounded information exchange; a log records the actual agent's actions. When failures occur, the system performs rollback recovery to abort uncommitted actions. Moreover, our method uses checkpointed data to recover the lost actual agent.