voldemort.server.storage.prunejob
Class VersionedPutPruneJob

java.lang.Object
  extended by voldemort.server.storage.DataMaintenanceJob
      extended by voldemort.server.storage.prunejob.VersionedPutPruneJob
All Implemented Interfaces:
java.lang.Runnable

public class VersionedPutPruneJob
extends DataMaintenanceJob

Voldemort supports a "versioned" put interface, where the user can provide a vector clock generated outside of Voldemort. A common practice is to create a vector clock with entries for all current replicas of a key, using the timestamp as the value. For example, if a key replicates to servers A and B, then a put issued at time t1 will have the clock [A:t1, B:t1].

The problem with this approach is that the replicas change after rebalancing, so subsequent "versioned" puts are guaranteed to conflict with old versions, leading to disk bloat. For example, if the key now replicates to servers C and D, then a put issued at time t2 will have the clock [C:t2, D:t2]. This conflicts with [A:t1, B:t1], because neither clock dominates the other, and the space occupied by the old version is never reclaimed.

Run this job if and only if you are doing something like the above. The job sifts through all the data for a store and fixes multiple versions by pruning vector clocks so that they contain only entries for the current replicas. This has the following effects:

1. For keys that were hit with some online traffic during rebalancing, multiple versions already exist. In these cases, the job effectively throws away the old version.
2. For keys that were untouched since rebalancing, there is just one version on disk. The job empties out the clock entries that belong to old replicas, so that the next write overwrites this version, while reads can still return the old value in the meantime.

Caveats and corner cases:

1. While the job is fixing a key, a client could write a value to that key in a very small window. In such a rare event, the client write could be overwritten by the fixer, resulting in loss of that write.
2. The scan over the database is not guaranteed to hit "new" keys inserted during the run. In this case, however, the clocks of the new keys are based on the current replicas anyway, so no fix is needed.

NOTE: Voldemort uses "sparse" vector clocks for the regular put interface, which lets Voldemort pick the vector clock itself. This job is NOT NECESSARY if you are only doing regular puts. In fact, even if you are using "versioned" puts with "dense" clocks filled with a monotonically increasing number (such as a timestamp), you are fine.
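The conflict described above, and the effect of pruning, can be sketched with a toy clock model. This is an illustration only, not Voldemort code: the class ClockConflictSketch, its methods, and the representation of a clock as a map from node id to counter are all hypothetical, and a missing entry is assumed to count as 0. Nodes A, B, C, D are modeled as ids 0, 1, 2, 3, and t1, t2 as 100 and 200.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of vector clocks; not part of the Voldemort API.
class ClockConflictSketch {

    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Compare two clocks (nodeId -> counter). A missing entry is treated as 0 (assumption).
    static Order compare(Map<Integer, Long> a, Map<Integer, Long> b) {
        boolean aAhead = false;
        boolean bAhead = false;
        for (Map.Entry<Integer, Long> e : a.entrySet()) {
            long other = b.getOrDefault(e.getKey(), 0L);
            if (e.getValue() > other) aAhead = true;
            if (e.getValue() < other) bAhead = true;
        }
        for (Map.Entry<Integer, Long> e : b.entrySet()) {
            if (e.getValue() > a.getOrDefault(e.getKey(), 0L)) bAhead = true;
        }
        if (aAhead && bAhead) return Order.CONCURRENT; // neither dominates: a conflict
        if (aAhead) return Order.AFTER;
        if (bAhead) return Order.BEFORE;
        return Order.EQUAL;
    }

    // Build a "dense" clock with the same timestamp for every listed replica.
    static Map<Integer, Long> clock(long time, int... nodes) {
        Map<Integer, Long> c = new HashMap<>();
        for (int n : nodes) c.put(n, time);
        return c;
    }

    // Keep only the entries that belong to the current replicas.
    static Map<Integer, Long> retainReplicas(Map<Integer, Long> clock, List<Integer> replicas) {
        Map<Integer, Long> kept = new HashMap<>(clock);
        kept.keySet().retainAll(replicas);
        return kept;
    }

    public static void main(String[] args) {
        Map<Integer, Long> oldPut = clock(100L, 0, 1); // [A:t1, B:t1]
        Map<Integer, Long> newPut = clock(200L, 2, 3); // [C:t2, D:t2]
        System.out.println(compare(oldPut, newPut));   // prints CONCURRENT

        // After rebalancing the key replicates to C and D only.
        Map<Integer, Long> pruned = retainReplicas(oldPut, Arrays.asList(2, 3));
        System.out.println(compare(pruned, newPut));   // prints BEFORE
    }
}
```

In this model the pruned old clock is empty, so any later put based on the current replicas dominates it and the old version can be overwritten instead of being kept as a conflicting sibling.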


Field Summary
 
Fields inherited from class voldemort.server.storage.DataMaintenanceJob
BLACKLISTED_STORAGE_TYPES, isRunning, iterator, metadataStore, numKeysScannedThisRun, numKeysUpdatedThisRun, scanPermits, STAT_RECORDS_INTERVAL, storeRepo, throttler, totalKeysScanned, totalKeysUpdated
 
Constructor Summary
VersionedPutPruneJob(StoreRepository storeRepo, MetadataStore metadataStore, ScanPermitWrapper repairPermits, int maxKeysScannedPerSecond)
           
 
Method Summary
protected  java.lang.String getJobName()
           
 long getKeysPruned()
           
protected  org.apache.log4j.Logger getLogger()
           
 void operate()
           
static java.util.List<Versioned<byte[]>> pruneNonReplicaEntries(java.util.List<Versioned<byte[]>> vals, java.util.List<java.lang.Integer> keyReplicas, org.apache.commons.lang.mutable.MutableBoolean didPrune)
          Removes all non-replica clock entries from the provided list of versioned values.
 void setStoreName(java.lang.String storeName)
           
 
Methods inherited from class voldemort.server.storage.DataMaintenanceJob
closeIterator, getIsRunning, getKeysScanned, isWritableStore, resetStats, run
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

VersionedPutPruneJob

public VersionedPutPruneJob(StoreRepository storeRepo,
                            MetadataStore metadataStore,
                            ScanPermitWrapper repairPermits,
                            int maxKeysScannedPerSecond)
Method Detail

setStoreName

public void setStoreName(java.lang.String storeName)

operate

public void operate()
             throws java.lang.Exception
Specified by:
operate in class DataMaintenanceJob
Throws:
java.lang.Exception

pruneNonReplicaEntries

public static java.util.List<Versioned<byte[]>> pruneNonReplicaEntries(java.util.List<Versioned<byte[]>> vals,
                                                                       java.util.List<java.lang.Integer> keyReplicas,
                                                                       org.apache.commons.lang.mutable.MutableBoolean didPrune)
Removes all non-replica clock entries from the provided list of versioned values.

Parameters:
vals - list of versioned values whose clocks are pruned of non-replica entries
keyReplicas - list of current replicas for the given key
didPrune - flag to mark if we did actually prune something
Returns:
pruned list
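
The contract documented above (a list in, a pruned list out, and a flag set when anything was removed) can be sketched as follows. This is a self-contained model under stated assumptions, not the actual implementation: the class PruneSketch and its Value type are hypothetical stand-ins for Versioned<byte[]>, a clock is modeled as a map from node id to counter, and java.util.concurrent.atomic.AtomicBoolean stands in for MutableBoolean so that the example needs no external libraries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the pruning contract; not part of the Voldemort API.
class PruneSketch {

    // Stand-in for Versioned<byte[]>: a value plus its clock (nodeId -> counter).
    static final class Value {
        final byte[] data;
        final Map<Integer, Long> clock;

        Value(byte[] data, Map<Integer, Long> clock) {
            this.data = data;
            this.clock = clock;
        }
    }

    // Returns new values whose clocks keep only entries for the current replicas.
    // didPrune is set to true if at least one clock entry was dropped.
    static List<Value> prune(List<Value> vals, List<Integer> keyReplicas, AtomicBoolean didPrune) {
        List<Value> pruned = new ArrayList<>(vals.size());
        for (Value val : vals) {
            Map<Integer, Long> kept = new LinkedHashMap<>();
            for (Map.Entry<Integer, Long> e : val.clock.entrySet()) {
                if (keyReplicas.contains(e.getKey())) {
                    kept.put(e.getKey(), e.getValue());
                } else {
                    didPrune.set(true); // entry belongs to a node that is no longer a replica
                }
            }
            pruned.add(new Value(val.data, kept)); // the value bytes are left untouched
        }
        return pruned;
    }
}
```

A caller in this model would check didPrune after the call and write the pruned list back only when the flag is true, which avoids rewriting keys whose clocks already contain only current replicas.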

getLogger

protected org.apache.log4j.Logger getLogger()
Specified by:
getLogger in class DataMaintenanceJob

getJobName

protected java.lang.String getJobName()
Specified by:
getJobName in class DataMaintenanceJob

getKeysPruned

public long getKeysPruned()


Jay Kreps, Roshan Sumbaly, Alex Feinberg, Bhupesh Bansal, Lei Gao, Chinmay Soman, Vinoth Chandar, Zhongjie Wu