Class VersionedPutPruneJob

  extended by voldemort.server.storage.DataMaintenanceJob
      extended by voldemort.server.storage.prunejob.VersionedPutPruneJob
All Implemented Interfaces:

public class VersionedPutPruneJob
extends DataMaintenanceJob

Voldemort supports a "versioned" put interface, where the user can provide a vector clock generated outside of Voldemort. A common practice is to create a vector clock with entries for all current replicas of a key, with the timestamp as the value. For example, if a key replicates to servers A and B, then a put issued at time t1 will have the clock [A:t1, B:t1].

The problem with this approach is that the replicas change after rebalancing, and subsequent "versioned" puts will then always conflict with the old versions, leading to disk bloat. For example, if the key now replicates to servers C and D, then a put issued at time t2 will have the clock [C:t2, D:t2]. This conflicts with [A:t1, B:t1], and the space occupied by the old version is never reclaimed.

Run this job IF and ONLY IF you are doing something like the above. This job sifts through all the data for a store and fixes multiple versions by pruning vector clocks to contain only entries for the current replicas. This has the following effects:

1. For keys that were hit with online traffic during rebalancing, we have multiple versions already. In these cases, the job effectively throws away the old version.
2. For keys that were untouched since rebalancing, there will be just one version on disk, and the job will empty out the clock entries that belong to old replicas, so that the subsequent write overwrites this version, while reads can still read the old value in the meantime.

Caveats/corner cases:

1. While the job is fixing a key, the client could write a value in that tiny window. In such a rare event, the client write could be overwritten by the fixer, resulting in loss of that write.
2. The scan over the database is not guaranteed to hit "new" keys inserted during the run. In this case, however, the new keys will be based on the current replicas anyway, so no fix is needed.

NOTE: Voldemort uses "sparse" vector clocks for the regular put interface, which lets Voldemort pick the vector clock. This job is NOT NECESSARY if you are only doing regular puts. In fact, even if you are using "versioned" puts with "dense" clocks filled with a monotonically increasing number (like a timestamp), you are fine.
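
The clock conflict described above, and the effect of pruning, can be sketched in standalone Java. This is an illustrative model only, not Voldemort's implementation: clocks are modeled as plain maps from node id to version, and the names ClockPruneSketch, descends, concurrent, and prune are hypothetical. Node ids 0 to 3 stand in for servers A to D, and 100 and 200 stand in for t1 and t2.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClockPruneSketch {

    /** True if clock a is at least as new as clock b on every node (missing entry = 0). */
    public static boolean descends(Map<Integer, Long> a, Map<Integer, Long> b) {
        for (Map.Entry<Integer, Long> e : b.entrySet()) {
            if (a.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    /** True if neither clock descends from the other, i.e. the two versions conflict. */
    public static boolean concurrent(Map<Integer, Long> a, Map<Integer, Long> b) {
        return !descends(a, b) && !descends(b, a);
    }

    /** Returns a copy of the clock that keeps only entries for the current replicas. */
    public static Map<Integer, Long> prune(Map<Integer, Long> clock,
                                           Collection<Integer> currentReplicas) {
        Map<Integer, Long> pruned = new TreeMap<>(clock);
        pruned.keySet().retainAll(currentReplicas);
        return pruned;
    }

    public static void main(String[] args) {
        Map<Integer, Long> oldClock = Map.of(0, 100L, 1, 100L); // [A:t1, B:t1]
        Map<Integer, Long> newClock = Map.of(2, 200L, 3, 200L); // [C:t2, D:t2]
        List<Integer> currentReplicas = List.of(2, 3);          // key now lives on C, D

        // Before pruning, the two clocks conflict, so both versions stay on disk.
        System.out.println(concurrent(oldClock, newClock)); // prints true

        // Pruning removes the entries for the old replicas A and B.
        Map<Integer, Long> pruned = prune(oldClock, currentReplicas);
        System.out.println(pruned); // prints {}

        // The new clock now supersedes the pruned one, so the old version can go.
        System.out.println(descends(newClock, pruned)); // prints true
    }
}
```

In this model the pruned clock ends up empty, which is why any later write based on the current replicas supersedes it instead of conflicting with it.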

Field Summary
Fields inherited from class voldemort.server.storage.DataMaintenanceJob
BLACKLISTED_STORAGE_TYPES, isRunning, iterator, metadataStore, numKeysScannedThisRun, numKeysUpdatedThisRun, scanPermits, STAT_RECORDS_INTERVAL, storeRepo, throttler, totalKeysScanned, totalKeysUpdated
Constructor Summary
VersionedPutPruneJob(StoreRepository storeRepo, MetadataStore metadataStore, ScanPermitWrapper repairPermits, int maxKeysScannedPerSecond)
Method Summary
protected  java.lang.String getJobName()
 long getKeysPruned()
protected  org.apache.log4j.Logger getLogger()
 void operate()
static java.util.List<Versioned<byte[]>> pruneNonReplicaEntries(java.util.List<Versioned<byte[]>> vals, java.util.List<java.lang.Integer> keyReplicas, org.apache.commons.lang.mutable.MutableBoolean didPrune)
          Removes all non-replica clock entries from the provided list of versioned values
 void setStoreName(java.lang.String storeName)
Methods inherited from class voldemort.server.storage.DataMaintenanceJob
closeIterator, getIsRunning, getKeysScanned, isWritableStore, resetStats, run
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public VersionedPutPruneJob(StoreRepository storeRepo,
                            MetadataStore metadataStore,
                            ScanPermitWrapper repairPermits,
                            int maxKeysScannedPerSecond)
Method Detail


public void setStoreName(java.lang.String storeName)


public void operate()
             throws java.lang.Exception
Specified by:
operate in class DataMaintenanceJob


public static java.util.List<Versioned<byte[]>> pruneNonReplicaEntries(java.util.List<Versioned<byte[]>> vals,
                                                                       java.util.List<java.lang.Integer> keyReplicas,
                                                                       org.apache.commons.lang.mutable.MutableBoolean didPrune)
Removes all non-replica clock entries from the provided list of versioned values

Parameters:
vals - list of versioned values to prune replicas from
keyReplicas - list of current replicas for the given key
didPrune - flag that is set if something was actually pruned
Returns:
pruned list
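
The contract of this method (a list in, a pruned list out, plus a caller-supplied flag) can be sketched with a standalone analogue. Everything here is an assumption made for illustration: the Version class stands in for Versioned<byte[]>, java.util.concurrent.atomic.AtomicBoolean stands in for MutableBoolean, clocks are plain maps from node id to version, and the step that drops superseded versions is one plausible reading of "fixes multiple versions", not a confirmed description of the real method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class PruneListSketch {

    /** Hypothetical stand-in for a versioned value: a clock plus a payload. */
    public static final class Version {
        public final Map<Integer, Long> clock;
        public final String value;

        public Version(Map<Integer, Long> clock, String value) {
            this.clock = clock;
            this.value = value;
        }
    }

    /** True if clock a is at least as new as clock b on every node (missing entry = 0). */
    static boolean descends(Map<Integer, Long> a, Map<Integer, Long> b) {
        for (Map.Entry<Integer, Long> e : b.entrySet()) {
            if (a.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    /** Sketch only: prune each clock to the key's replicas, then drop superseded versions. */
    public static List<Version> pruneNonReplicaEntries(List<Version> vals,
                                                       List<Integer> keyReplicas,
                                                       AtomicBoolean didPrune) {
        List<Version> pruned = new ArrayList<>();
        for (Version v : vals) {
            Map<Integer, Long> clock = new TreeMap<>(v.clock);
            // retainAll returns true only if it removed at least one entry.
            if (clock.keySet().retainAll(keyReplicas)) {
                didPrune.set(true);
            }
            pruned.add(new Version(clock, v.value));
        }
        // Keep a version unless some other version strictly supersedes it.
        List<Version> survivors = new ArrayList<>();
        for (Version candidate : pruned) {
            boolean obsolete = false;
            for (Version other : pruned) {
                if (other != candidate
                        && descends(other.clock, candidate.clock)
                        && !descends(candidate.clock, other.clock)) {
                    obsolete = true;
                    break;
                }
            }
            if (!obsolete) {
                survivors.add(candidate);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<Version> vals = List.of(
                new Version(Map.of(0, 100L, 1, 100L), "old"),  // written before rebalancing
                new Version(Map.of(2, 200L, 3, 200L), "new")); // written after rebalancing
        AtomicBoolean didPrune = new AtomicBoolean(false);

        List<Version> result = pruneNonReplicaEntries(vals, List.of(2, 3), didPrune);

        System.out.println(didPrune.get());                               // prints true
        System.out.println(result.size() + " " + result.get(0).value);    // prints 1 new
    }
}
```

A caller would typically check the flag before writing anything back, so that keys whose clocks already contain only current replicas are left untouched.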


protected org.apache.log4j.Logger getLogger()
Specified by:
getLogger in class DataMaintenanceJob


protected java.lang.String getJobName()
Specified by:
getJobName in class DataMaintenanceJob


public long getKeysPruned()

Jay Kreps, Roshan Sumbaly, Alex Feinberg, Bhupesh Bansal, Lei Gao, Chinmay Soman, Vinoth Chandar, Zhongjie Wu