voldemort.server.storage.prunejob
Class VersionedPutPruneJob
java.lang.Object
voldemort.server.storage.DataMaintenanceJob
voldemort.server.storage.prunejob.VersionedPutPruneJob
- All Implemented Interfaces:
- java.lang.Runnable
public class VersionedPutPruneJob
- extends DataMaintenanceJob
Voldemort supports a "versioned" put interface, where the user can provide a
vector clock generated outside of Voldemort. A common practice is to create
a vector clock with an entry for every current replica of a key, using the
timestamp as the value.
For example, if a key replicates to servers A and B, then a put issued at
time t1 will have the clock [A:t1, B:t1].
The problem with this approach is that rebalancing changes the replicas, so
subsequent "versioned" puts are guaranteed to conflict with the old versions,
leading to disk bloat.
For example, if the key now replicates to servers C and D, then a put issued
at time t2 will have the clock [C:t2, D:t2]. This conflicts with
[A:t1, B:t1], and the space occupied by the old version is never reclaimed.
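The conflict described above can be reproduced with a minimal stand-alone
sketch. It models a clock as a plain map from node name to counter and does
not use Voldemort's own clock classes; the class name ClockConflictSketch,
the compare helper, and the sample values for t1 and t2 are all assumptions
made for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model; it does not use Voldemort's clock classes.
final class ClockConflictSketch {

    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Compares two clocks modelled as node -> counter maps.
    // A node missing from a clock is treated as having counter 0.
    static Order compare(Map<String, Long> a, Map<String, Long> b) {
        Set<String> nodes = new HashSet<>(a.keySet());
        nodes.addAll(b.keySet());
        boolean aBigger = false;
        boolean bBigger = false;
        for (String node : nodes) {
            long av = a.getOrDefault(node, 0L);
            long bv = b.getOrDefault(node, 0L);
            if (av > bv) aBigger = true;
            if (bv > av) bBigger = true;
        }
        if (aBigger && bBigger) return Order.CONCURRENT; // neither dominates: a conflict
        if (aBigger) return Order.AFTER;
        if (bBigger) return Order.BEFORE;
        return Order.EQUAL;
    }

    public static void main(String[] args) {
        long t1 = 100L; // assumed sample timestamps
        long t2 = 200L;

        // Same replica set before and after: the later put supersedes the earlier one.
        System.out.println(compare(Map.of("A", t1, "B", t1),
                                   Map.of("A", t2, "B", t2))); // prints BEFORE

        // Replica set changed by rebalancing: the two clocks are concurrent.
        System.out.println(compare(Map.of("A", t1, "B", t1),
                                   Map.of("C", t2, "D", t2))); // prints CONCURRENT
    }
}
```

Because neither clock dominates the other in the second case, a store that
keeps all concurrent versions has to retain both values.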
Run this job IF and ONLY IF you are issuing "versioned" puts as described
above. This job sifts through all the data for a store and fixes multiple
versions by pruning vector clocks to contain only entries for the current
replicas.
This has the following effects:
1. For keys that were hit with some online traffic during rebalancing, we
have multiple versions already. In these cases, it will effectively throw
away the old version.
2. For keys that were untouched since rebalancing, there will be just one
version on disk, and the job will empty out entries in the clock that belong
to old replicas, so that the subsequent write will overwrite this version,
while reads can still read the old value in the meantime.
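The two effects can be sketched with a small stand-alone model. This is not
the job's actual implementation, which this page does not show; clocks are
modelled as maps from node id to counter, and the names PruneSketch, prune,
isBefore and dropObsolete, the node ids 0 to 3 (standing in for A to D) and
the timestamps are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-alone model; it does not use Voldemort's clock classes.
final class PruneSketch {

    // Returns a copy of the clock that keeps only entries for current replicas.
    static Map<Integer, Long> prune(Map<Integer, Long> clock, Set<Integer> replicas) {
        Map<Integer, Long> pruned = new TreeMap<>(clock);
        pruned.keySet().retainAll(replicas);
        return pruned;
    }

    // True if clock a is strictly older than clock b.
    // Assumes non-negative counters; a missing entry counts as 0.
    static boolean isBefore(Map<Integer, Long> a, Map<Integer, Long> b) {
        for (Map.Entry<Integer, Long> e : a.entrySet()) {
            if (e.getValue() > b.getOrDefault(e.getKey(), 0L)) return false;
        }
        for (Map.Entry<Integer, Long> e : b.entrySet()) {
            if (e.getValue() > a.getOrDefault(e.getKey(), 0L)) return true;
        }
        return false; // the clocks are equal
    }

    // Keeps only the clocks that are not strictly older than some other clock.
    static List<Map<Integer, Long>> dropObsolete(List<Map<Integer, Long>> clocks) {
        List<Map<Integer, Long>> kept = new ArrayList<>();
        for (Map<Integer, Long> candidate : clocks) {
            boolean obsolete = false;
            for (Map<Integer, Long> other : clocks) {
                if (isBefore(candidate, other)) { obsolete = true; break; }
            }
            if (!obsolete) kept.add(candidate);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<Integer> currentReplicas = Set.of(2, 3);             // C and D
        Map<Integer, Long> oldClock = Map.of(0, 100L, 1, 100L);  // [A:t1, B:t1]
        Map<Integer, Long> newClock = Map.of(2, 200L, 3, 200L);  // [C:t2, D:t2]

        // Effect 1: two versions on disk; pruning makes the old one obsolete.
        List<Map<Integer, Long>> pruned = new ArrayList<>();
        for (Map<Integer, Long> clock : List.of(oldClock, newClock)) {
            pruned.add(prune(clock, currentReplicas));
        }
        System.out.println(dropObsolete(pruned)); // prints [{2=200, 3=200}]

        // Effect 2: a single old version; its pruned clock is empty, so the
        // next versioned put is strictly newer and can overwrite it.
        System.out.println(isBefore(prune(oldClock, currentReplicas), newClock)); // prints true
    }
}
```

In this model the unpruned old clock is not older than the new one, which is
why both versions would otherwise stay on disk.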
Caveats/Corner cases:
1. While the job is fixing a key, the client could write a value to that key
in the tiny window between the job's read and write. In such a rare event,
the client write could be overwritten by the fixer, resulting in the loss of
that write.
2. The scan over the database is not guaranteed to hit "new" keys inserted
during the run. In this case, however, the new keys will be based on the
current replicas anyway, so no pruning is needed for them.
NOTE: Voldemort uses "sparse" vector clocks for the regular put interface,
which lets Voldemort pick the vector clock. This job is NOT NECESSARY if
you are only doing regular puts. In fact, even if you are using "versioned"
puts with "dense" clocks filled with a monotonically increasing number (like
a timestamp), you are fine.
Fields inherited from class voldemort.server.storage.DataMaintenanceJob |
BLACKLISTED_STORAGE_TYPES, isRunning, iterator, metadataStore, numKeysScannedThisRun, numKeysUpdatedThisRun, scanPermits, STAT_RECORDS_INTERVAL, storeRepo, throttler, totalKeysScanned, totalKeysUpdated |
Method Summary |
protected java.lang.String |
getJobName()
|
long |
getKeysPruned()
|
protected org.apache.log4j.Logger |
getLogger()
|
void |
operate()
|
static java.util.List<Versioned<byte[]>> |
pruneNonReplicaEntries(java.util.List<Versioned<byte[]>> vals,
java.util.List<java.lang.Integer> keyReplicas,
org.apache.commons.lang.mutable.MutableBoolean didPrune)
Remove all non replica clock entries from the list of versioned values
provided |
void |
setStoreName(java.lang.String storeName)
|
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
VersionedPutPruneJob
public VersionedPutPruneJob(StoreRepository storeRepo,
MetadataStore metadataStore,
ScanPermitWrapper repairPermits,
int maxKeysScannedPerSecond)
setStoreName
public void setStoreName(java.lang.String storeName)
operate
public void operate()
throws java.lang.Exception
- Specified by:
operate
in class DataMaintenanceJob
- Throws:
java.lang.Exception
pruneNonReplicaEntries
public static java.util.List<Versioned<byte[]>> pruneNonReplicaEntries(java.util.List<Versioned<byte[]>> vals,
java.util.List<java.lang.Integer> keyReplicas,
org.apache.commons.lang.mutable.MutableBoolean didPrune)
- Remove all non replica clock entries from the list of versioned values
provided
- Parameters:
vals - list of versioned values to prune replicas from
keyReplicas - list of current replicas for the given key
didPrune - flag to mark if we did actually prune something
- Returns:
- pruned list
getLogger
protected org.apache.log4j.Logger getLogger()
- Specified by:
getLogger
in class DataMaintenanceJob
getJobName
protected java.lang.String getJobName()
- Specified by:
getJobName
in class DataMaintenanceJob
getKeysPruned
public long getKeysPruned()
Jay Kreps, Roshan Sumbaly, Alex Feinberg, Bhupesh Bansal, Lei Gao, Chinmay Soman, Vinoth Chandar, Zhongjie Wu