ThresholdFailureDetector (Voldemort)

voldemort.cluster.failuredetector
Class ThresholdFailureDetector

java.lang.Object
  voldemort.cluster.failuredetector.AbstractFailureDetector
      voldemort.cluster.failuredetector.AsyncRecoveryFailureDetector
          voldemort.cluster.failuredetector.ThresholdFailureDetector

All Implemented Interfaces: java.lang.Runnable, FailureDetector

public class ThresholdFailureDetector
extends AsyncRecoveryFailureDetector

ThresholdFailureDetector builds upon the AsyncRecoveryFailureDetector and provides a more lenient policy for marking nodes as unavailable. Fundamentally, for each node, the ThresholdFailureDetector keeps track of a "success ratio", which is the ratio of successful operations to total operations, and requires that ratio to meet or exceed a threshold. That is, every call to recordException or recordSuccess increments the total count, while only calls to recordSuccess increment the success count. Successes therefore push the success ratio up, while exceptions pull it down.

As long as the success ratio continues to meet or exceed the threshold, the node is considered available. Once the success ratio dips below the threshold, the node is marked as unavailable. As this class extends the AsyncRecoveryFailureDetector, an unavailable node is only marked as available again once a background thread has been able to contact the node asynchronously.

There is also a minimum number of requests that must occur before the success ratio is checked against the threshold. This prevents occurrences like 1 failure out of 1 attempt yielding a success ratio of 0%. There is also a threshold interval, which means that the success ratio for a given node is only "valid" for a certain period of time, after which it is reset. This prevents scenarios like 100,000,000 successful requests (and thus a 100% success ratio) overshadowing a subsequent stream of 10,000,000 failures: those failures are only about 9% of the total, so without the reset the success ratio (about 91%) could still sit above a given threshold.
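The three rules above (threshold, minimum request count, threshold interval) can be sketched as a small per-node counter. This is an illustrative sketch only, not the Voldemort implementation: the class name `SuccessRatioWindow`, its constructor parameters, and the `record` method are hypothetical, and the caller supplies the current time so the behavior is easy to follow.

```java
/**
 * Hypothetical per-node success-ratio tracker (not part of Voldemort).
 * Assumes a percentage threshold, a minimum request count, and a
 * fixed-length interval after which the counters are reset.
 */
class SuccessRatioWindow {

    private final int thresholdPercent;   // e.g. 95 means "at least 95% successes"
    private final long minimumRequests;   // ratio is not checked below this count
    private final long intervalMs;        // counters are only valid for this long

    private long windowStartMs;
    private long success;
    private long total;

    SuccessRatioWindow(int thresholdPercent, long minimumRequests,
                       long intervalMs, long nowMs) {
        this.thresholdPercent = thresholdPercent;
        this.minimumRequests = minimumRequests;
        this.intervalMs = intervalMs;
        this.windowStartMs = nowMs;
    }

    /**
     * Records one outcome and returns true if the node should still be
     * considered available according to the success ratio.
     */
    synchronized boolean record(boolean isSuccess, long nowMs) {
        // Interval expired: discard stale history and start a new window.
        if (nowMs - windowStartMs >= intervalMs) {
            windowStartMs = nowMs;
            success = 0;
            total = 0;
        }

        // Every call counts toward the total; only successes count as successes.
        total++;
        if (isSuccess) {
            success++;
        }

        // Too few samples to judge: do not mark the node unavailable yet.
        if (total < minimumRequests) {
            return true;
        }

        // Integer form of (success / total) >= (thresholdPercent / 100).
        return success * 100 >= (long) thresholdPercent * total;
    }
}
```

With a 95% threshold and a minimum of 10 requests, one failure followed by nine successes leaves the node below the threshold on the tenth call (90%), whereas a single failure on its own does not. Once the interval has elapsed, a long run of earlier successes no longer shields a fresh burst of failures.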

Field Summary

Fields inherited from class voldemort.cluster.failuredetector.AbstractFailureDetector
`failureDetectorConfig, idNodeStatusMap, listeners, logger`

Constructor Summary
`ThresholdFailureDetector(FailureDetectorConfig failureDetectorConfig)`

Method Summary
`protected java.lang.String`	`getCatastrophicError(UnreachableStoreException e)`
`java.lang.String`	`getNodeThresholdStats()`
`protected void`	`nodeRecovered(Node node)` We delegate node recovery detection to the `AsyncRecoveryFailureDetector` class.
`void`	`recordException(Node node, long requestTime, UnreachableStoreException e)` Allows external callers to provide input to the FailureDetector that an error occurred when trying to access the node.
`void`	`recordSuccess(Node node, long requestTime)` Allows external callers to provide input to the FailureDetector that an access to the node succeeded.
`protected void`	`update(Node node, boolean isSuccess, UnreachableStoreException e)`

Methods inherited from class voldemort.cluster.failuredetector.AsyncRecoveryFailureDetector
`destroy, isAvailable, run`

Methods inherited from class voldemort.cluster.failuredetector.AbstractFailureDetector
`addFailureDetectorListener, checkArgs, checkNodeArg, getAvailableNodeCount, getAvailableNodes, getConfig, getLastChecked, getNodeCount, getNodeStatus, getUnavailableNodes, removeFailureDetectorListener, setAvailable, setUnavailable, waitForAvailability`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

ThresholdFailureDetector

public ThresholdFailureDetector(FailureDetectorConfig failureDetectorConfig)

Method Detail

recordException

public void recordException(Node node,
                            long requestTime,
                            UnreachableStoreException e)

Description copied from interface: FailureDetector

Allows external callers to provide input to the FailureDetector that an error occurred when trying to access the node. The implementation is free to use or ignore this input. It can be considered a "hint" to the FailureDetector rather than an absolute truth. For example, it is possible to call recordException for a given node and have an immediate call to isAvailable return true, depending on the implementation.

Specified by: recordException in interface FailureDetector
Overrides: recordException in class AsyncRecoveryFailureDetector

Parameters:
node - Node to check
requestTime - Length of time (in milliseconds) to perform request
e - Exception that occurred when trying to access the node

recordSuccess

public void recordSuccess(Node node,
                          long requestTime)

Description copied from interface: FailureDetector

Allows external callers to provide input to the FailureDetector that an access to the node succeeded. As with recordException, the implementation is free to use or ignore this input. It can be considered a "hint" to the FailureDetector rather than gospel truth.

Note for implementors: because of threading issues it's possible for multiple threads to attempt access to a node and some fail and some succeed. In a classic last-one-in-wins scenario, it's possible for the failures to be recorded first and then the successes. It would be prudent for implementations not to immediately assume that the node is then available.
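One way to follow this advice is to make the unavailable state "sticky": a late-arriving success is treated as a hint and does not by itself flip the node back to available; only an explicit recovery signal does. The sketch below is a deliberately simplified illustration. The class `StickyAvailability` and its `markRecovered` method are hypothetical, and it marks the node unavailable on the first exception, which is stricter than the threshold policy this class describes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical illustration (not part of Voldemort): successes never
 * restore availability on their own; only markRecovered() does.
 */
class StickyAvailability {

    private final AtomicBoolean available = new AtomicBoolean(true);

    /** A failure hint marks the node unavailable (simplified policy). */
    void recordException() {
        available.set(false);
    }

    /**
     * A success hint is accepted but intentionally does not set the node
     * back to available: it may come from a request that raced with the
     * failures and was merely recorded last.
     */
    void recordSuccess() {
        // no state change on purpose
    }

    /** Called by a recovery check (assumed to run elsewhere) once the node responds. */
    void markRecovered() {
        available.set(true);
    }

    boolean isAvailable() {
        return available.get();
    }
}
```

In this arrangement the order in which racing threads report their results does not matter, because no ordering of `recordSuccess` calls can undo a recorded failure.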

Specified by: recordSuccess in interface FailureDetector
Overrides: recordSuccess in class AsyncRecoveryFailureDetector

Parameters:
node - Node to check
requestTime - Length of time (in milliseconds) to perform request

getNodeThresholdStats

public java.lang.String getNodeThresholdStats()

nodeRecovered

protected void nodeRecovered(Node node)

We delegate node recovery detection to the AsyncRecoveryFailureDetector class. When it determines that the node has recovered, this callback is executed with the newly-recovered node.

Overrides: nodeRecovered in class AsyncRecoveryFailureDetector

update

protected void update(Node node,
                      boolean isSuccess,
                      UnreachableStoreException e)

getCatastrophicError

protected java.lang.String getCatastrophicError(UnreachableStoreException e)

Jay Kreps, Roshan Sumbaly, Alex Feinberg, Bhupesh Bansal, Lei Gao, Chinmay Soman, Vinoth Chandar, Zhongjie Wu
