voldemort.cluster.failuredetector
Class AsyncRecoveryFailureDetector

java.lang.Object
  extended by voldemort.cluster.failuredetector.AbstractFailureDetector
      extended by voldemort.cluster.failuredetector.AsyncRecoveryFailureDetector
All Implemented Interfaces:
java.lang.Runnable, FailureDetector
Direct Known Subclasses:
ThresholdFailureDetector

public class AsyncRecoveryFailureDetector
extends AbstractFailureDetector
implements java.lang.Runnable

AsyncRecoveryFailureDetector detects failures and then attempts to contact the failing node's Store to determine whether the node has become available again.

When a node goes down, attempts to access its remote Store may take several seconds. Rather than block the calling thread, this check is performed in a background thread.
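
The background-check pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not the Voldemort implementation: AsyncRecoverySketch, checkNodes, stop, the integer node IDs, and the caller-supplied probe are all hypothetical stand-ins for Node and the Store access.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical stand-in, not the Voldemort class: nodes are plain integer IDs
// and the "contact the Store" step is a caller-supplied probe.
class AsyncRecoverySketch implements Runnable {

    private final Set<Integer> unavailable = ConcurrentHashMap.newKeySet();
    private final Predicate<Integer> probe; // returns true if the node answered
    private final long intervalMs;          // pause between recovery passes
    private volatile boolean running = true;

    AsyncRecoverySketch(Predicate<Integer> probe, long intervalMs) {
        this.probe = probe;
        this.intervalMs = intervalMs;
    }

    // Request path: a cheap lookup that never contacts the node.
    boolean isAvailable(int nodeId) {
        return !unavailable.contains(nodeId);
    }

    // Request path: remember that an access to this node failed.
    void recordException(int nodeId) {
        unavailable.add(nodeId);
    }

    // One recovery pass: probe every node currently marked unavailable.
    void checkNodes() {
        for (Integer nodeId : unavailable) {
            if (probe.test(nodeId)) {
                unavailable.remove(nodeId);
            }
        }
    }

    // Background loop: the potentially slow probes run here, off the request path.
    @Override
    public void run() {
        while (running) {
            checkNodes();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stop() {
        running = false;
    }

    public static void main(String[] args) {
        AsyncRecoverySketch detector = new AsyncRecoverySketch(id -> true, 10);
        detector.recordException(1);
        System.out.println(detector.isAvailable(1)); // prints false
        detector.checkNodes();                       // normally invoked from run()
        System.out.println(detector.isAvailable(1)); // prints true
    }
}
```

In a running system the instance would be handed to a Thread or executor so that run() drives checkNodes(); the demo calls it directly only to keep the output deterministic.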


Field Summary
 
Fields inherited from class voldemort.cluster.failuredetector.AbstractFailureDetector
failureDetectorConfig, idNodeStatusMap, listeners, logger
 
Constructor Summary
AsyncRecoveryFailureDetector(FailureDetectorConfig failureDetectorConfig)
           
 
Method Summary
 void destroy()
          Cleans up any open resources in preparation for shutdown.
 boolean isAvailable(Node node)
          Determines if the node is available or offline.
protected  void nodeRecovered(Node node)
           
 void recordException(Node node, long requestTime, UnreachableStoreException e)
          Allows external callers to provide input to the FailureDetector that an error occurred when trying to access the node.
 void recordSuccess(Node node, long requestTime)
          Allows external callers to provide input to the FailureDetector that an access to the node succeeded.
 void run()
           
 
Methods inherited from class voldemort.cluster.failuredetector.AbstractFailureDetector
addFailureDetectorListener, checkArgs, checkNodeArg, getAvailableNodeCount, getAvailableNodes, getConfig, getLastChecked, getNodeCount, getNodeStatus, getUnavailableNodes, removeFailureDetectorListener, setAvailable, setUnavailable, waitForAvailability
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AsyncRecoveryFailureDetector

public AsyncRecoveryFailureDetector(FailureDetectorConfig failureDetectorConfig)

Method Detail

isAvailable

public boolean isAvailable(Node node)
Description copied from interface: FailureDetector
Determines if the node is available or offline. The isAvailable method is a simple boolean operation to determine if the node in question is available. Because of race conditions, the result of this call is necessarily an approximation. However, the FailureDetector should do its best to determine the then-current state of the cluster so as to minimize false negatives and false positives.

Note: this determination is approximate and differs based upon the algorithm used by the implementation.

Specified by:
isAvailable in interface FailureDetector
Parameters:
node - Node to check
Returns:
true if available, false otherwise
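
As an illustration of how a caller might use this check, the sketch below filters a preference list down to the nodes currently reported as available. AvailableNodeFilter and availableOnly are hypothetical names, node IDs stand in for Node objects, and the IntPredicate stands in for a call to isAvailable. Because the answer is approximate, a caller would still need to handle failures on the nodes that pass the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of Voldemort.
class AvailableNodeFilter {

    // Keeps the original ordering and drops nodes reported as unavailable.
    // The isAvailable predicate stands in for FailureDetector.isAvailable(Node).
    static List<Integer> availableOnly(List<Integer> preferenceList, IntPredicate isAvailable) {
        List<Integer> result = new ArrayList<>();
        for (int nodeId : preferenceList) {
            if (isAvailable.test(nodeId)) {
                result.add(nodeId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> preference = Arrays.asList(3, 1, 2);
        // Pretend node 1 is currently marked unavailable.
        System.out.println(availableOnly(preference, id -> id != 1)); // prints [3, 2]
    }
}
```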

recordException

public void recordException(Node node,
                            long requestTime,
                            UnreachableStoreException e)
Description copied from interface: FailureDetector
Allows external callers to provide input to the FailureDetector that an error occurred when trying to access the node. The implementation is free to use or ignore this input. It can be considered a "hint" to the FailureDetector rather than an absolute truth. For example, it is possible to call recordException for a given node and have an immediate call to isAvailable return true, depending on the implementation.

Specified by:
recordException in interface FailureDetector
Parameters:
node - Node that was being accessed when the error occurred
requestTime - Time (in milliseconds) taken to perform the request
e - Exception that occurred when trying to access the node
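
A caller-side sketch of how these hints might be supplied: time the request, then report either the failure or the success. Everything here is an assumption made for illustration. ReportingCaller, Reporter, call, and elapsedMs are hypothetical; Reporter mirrors only the shape of recordException and recordSuccess, with an integer node ID in place of Node and RuntimeException in place of UnreachableStoreException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical wrapper, not part of Voldemort.
class ReportingCaller {

    // Stand-in mirroring the two reporting methods documented on this page.
    interface Reporter {
        void recordException(int nodeId, long requestTimeMs, RuntimeException e);
        void recordSuccess(int nodeId, long requestTimeMs);
    }

    // Runs the request, measures its duration, and reports the outcome.
    static <T> T call(int nodeId, Supplier<T> request, Reporter reporter) {
        long start = System.nanoTime();
        T result;
        try {
            result = request.get();
        } catch (RuntimeException e) {
            reporter.recordException(nodeId, elapsedMs(start), e);
            throw e; // the caller still sees the original failure
        }
        reporter.recordSuccess(nodeId, elapsedMs(start));
        return result;
    }

    static long elapsedMs(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Reporter reporter = new Reporter() {
            @Override
            public void recordException(int nodeId, long requestTimeMs, RuntimeException e) {
                log.add("exception node=" + nodeId);
            }

            @Override
            public void recordSuccess(int nodeId, long requestTimeMs) {
                log.add("success node=" + nodeId);
            }
        };

        call(1, () -> "value", reporter);
        try {
            call(2, () -> { throw new IllegalStateException("unreachable"); }, reporter);
        } catch (IllegalStateException expected) {
            // already reported to the detector stand-in
        }
        System.out.println(log); // prints [success node=1, exception node=2]
    }
}
```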

recordSuccess

public void recordSuccess(Node node,
                          long requestTime)
Description copied from interface: FailureDetector
Allows external callers to provide input to the FailureDetector that an access to the node succeeded. As with recordException, the implementation is free to use or ignore this input. It can be considered a "hint" to the FailureDetector rather than gospel truth.

Note for implementors: when multiple threads access a node concurrently, some attempts may fail while others succeed. In a classic last-one-in-wins scenario, the failures may be recorded first and the successes afterward. Implementations would be prudent not to assume immediately that the node is then available.

Specified by:
recordSuccess in interface FailureDetector
Parameters:
node - Node that was accessed successfully
requestTime - Time (in milliseconds) taken to perform the request
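
One possible way to heed the note for implementors above is to require several consecutive successes before treating a failed node as available again. The sketch below shows that idea only. It is not claimed to be the strategy used by AsyncRecoveryFailureDetector, and ConsecutiveSuccessSketch, the required-successes threshold, and the integer node IDs are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the Voldemort implementation.
class ConsecutiveSuccessSketch {

    private final int required; // successes needed in a row before recovery
    private final Set<Integer> unavailable = new HashSet<>();
    private final Map<Integer, Integer> streak = new HashMap<>();

    ConsecutiveSuccessSketch(int required) {
        this.required = required;
    }

    // Any failure marks the node unavailable and resets its success streak.
    synchronized void recordException(int nodeId) {
        unavailable.add(nodeId);
        streak.put(nodeId, 0);
    }

    // A success is only a hint: the node recovers after enough of them in a row.
    synchronized void recordSuccess(int nodeId) {
        if (!unavailable.contains(nodeId)) {
            return;
        }
        int current = streak.merge(nodeId, 1, Integer::sum);
        if (current >= required) {
            unavailable.remove(nodeId);
            streak.remove(nodeId);
        }
    }

    synchronized boolean isAvailable(int nodeId) {
        return !unavailable.contains(nodeId);
    }

    public static void main(String[] args) {
        ConsecutiveSuccessSketch detector = new ConsecutiveSuccessSketch(2);
        detector.recordException(7);
        detector.recordSuccess(7);
        System.out.println(detector.isAvailable(7)); // prints false
        detector.recordSuccess(7);
        System.out.println(detector.isAvailable(7)); // prints true
    }
}
```

With this approach a single late success arriving after a burst of failures does not flip the node back to available, at the cost of a slightly slower recovery.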

destroy

public void destroy()
Description copied from interface: FailureDetector
Cleans up any open resources in preparation for shutdown.

Note for implementors: after this method is called, attempts to call the other methods are assumed to either fail silently, throw errors, or return stale information.

Specified by:
destroy in interface FailureDetector
Overrides:
destroy in class AbstractFailureDetector

run

public void run()
Specified by:
run in interface java.lang.Runnable

nodeRecovered

protected void nodeRecovered(Node node)


Jay Kreps, Roshan Sumbaly, Alex Feinberg, Bhupesh Bansal, Lei Gao, Chinmay Soman, Vinoth Chandar, Zhongjie Wu