voldemort.client.rebalance
Class RebalanceBatchPlan

java.lang.Object
  extended by voldemort.client.rebalance.RebalanceBatchPlan

public class RebalanceBatchPlan
extends java.lang.Object

Constructs a batch plan that goes from currentCluster to finalCluster. The partition-stores included in the move are based on those listed in storeDefs. This batch plan is execution-agnostic: the plan is generated first, and whether it is executed stealer-based or donor-based is decided later.

Long term, it's unclear whether a RebalanceBatchPlan separate from RebalancePlan is needed. Batching tends to increase the overall cost of rebalancing and has historically been error-prone (i.e., the transition between batches has had intermittent failures). Its value, if any, lies in allowing long-running (days or weeks) rebalancing jobs to have interim checkpoints, so that single-node failures don't force a restart from the initial state. Batching should be considered for deprecation once zone expansion and zone shrinking have been done successfully as short (less than a day or two), single-batch rebalances.
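Because the plan is execution-agnostic, the same list of tasks can later be grouped either by stealer node or by donor node, depending on which side drives the transfers. The sketch below illustrates that idea only; `PlannedTask` is a hypothetical stand-in for RebalanceTaskInfo, and its fields and the `PlanGrouping` helper are assumptions, not Voldemort's API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for RebalanceTaskInfo: one stealer pulls some
// partition-stores of one store from one donor. Not the real Voldemort class.
record PlannedTask(int stealerId, int donorId, String storeName, List<Integer> partitionIds) {}

final class PlanGrouping {
    private PlanGrouping() {}

    // Stealer-based execution: each stealer node drives the tasks it must pull.
    static Map<Integer, List<PlannedTask>> byStealer(List<PlannedTask> plan) {
        return plan.stream()
                .collect(Collectors.groupingBy(PlannedTask::stealerId, TreeMap::new, Collectors.toList()));
    }

    // Donor-based execution: each donor node drives the tasks it must push.
    static Map<Integer, List<PlannedTask>> byDonor(List<PlannedTask> plan) {
        return plan.stream()
                .collect(Collectors.groupingBy(PlannedTask::donorId, TreeMap::new, Collectors.toList()));
    }
}
```

The plan itself is unchanged by either grouping; only the choice of which node coordinates each transfer differs.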


Field Summary
protected  java.util.List<RebalanceTaskInfo> batchPlan
           
 
Constructor Summary
RebalanceBatchPlan(Cluster currentCluster, Cluster finalCluster, java.util.List<StoreDefinition> commonStoreDefs)
          Develops a batch plan to go from current cluster to final cluster for given stores.
RebalanceBatchPlan(Cluster currentCluster, java.util.List<StoreDefinition> currentStoreDefs, Cluster finalCluster, java.util.List<StoreDefinition> finalStoreDefs)
          Develops a batch plan to go from current cluster/stores to final cluster/stores.
 
Method Summary
 java.util.List<RebalanceTaskInfo> getBatchPlan()
           
 int getCrossZonePartitionStoreMoves()
          Determines total number of partition-stores moved across zones.
 Cluster getCurrentCluster()
           
 java.util.List<StoreDefinition> getCurrentStoreDefs()
           
protected  int getDonorId(StoreRoutingPlan currentSRP, StoreRoutingPlan finalSRP, int stealerZoneId, int stealerNodeId, int stealerPartitionId)
          Decide which donor node to steal from.
 Cluster getFinalCluster()
           
 java.util.List<StoreDefinition> getFinalStoreDefs()
           
 MoveMap getNodeMoveMap()
           
 int getPartitionStoreMoves()
          Returns the total number of partition-store moves.
 RebalanceBatchPlanProgressBar getProgressBar(int batchId)
           
 int getTaskCount()
          Returns the number of rebalance tasks in this batch.
 MoveMap getZoneMoveMap()
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

batchPlan

protected final java.util.List<RebalanceTaskInfo> batchPlan
Constructor Detail

RebalanceBatchPlan

public RebalanceBatchPlan(Cluster currentCluster,
                          java.util.List<StoreDefinition> currentStoreDefs,
                          Cluster finalCluster,
                          java.util.List<StoreDefinition> finalStoreDefs)
Develops a batch plan to go from current cluster/stores to final cluster/stores.

Parameters:
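currentCluster - the cluster before rebalancing
currentStoreDefs - the store definitions for the current cluster
finalCluster - the cluster after rebalancing
finalStoreDefs - the store definitions for the final cluster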

RebalanceBatchPlan

public RebalanceBatchPlan(Cluster currentCluster,
                          Cluster finalCluster,
                          java.util.List<StoreDefinition> commonStoreDefs)
Develops a batch plan to go from current cluster to final cluster for given stores. (The same store definitions apply to both the current and the final cluster.)

Parameters:
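currentCluster - the cluster before rebalancing
finalCluster - the cluster after rebalancing
commonStoreDefs - the store definitions shared by the current and final cluster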
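A convenience constructor like the three-argument one is commonly a thin overload that passes the same store list as both the current and the final store definitions. The sketch below shows that delegation pattern with hypothetical placeholder types (`ClusterStub`, `StoreDefStub`, `BatchPlanSketch`); it assumes, rather than confirms, that the real constructor delegates this way.

```java
import java.util.List;

// Hypothetical placeholders; the real Cluster and StoreDefinition classes are richer.
record ClusterStub(String name) {}
record StoreDefStub(String name) {}

final class BatchPlanSketch {
    private final ClusterStub currentCluster;
    private final List<StoreDefStub> currentStoreDefs;
    private final ClusterStub finalCluster;
    private final List<StoreDefStub> finalStoreDefs;

    // General form: store definitions may differ between current and final cluster.
    BatchPlanSketch(ClusterStub currentCluster, List<StoreDefStub> currentStoreDefs,
                    ClusterStub finalCluster, List<StoreDefStub> finalStoreDefs) {
        this.currentCluster = currentCluster;
        this.currentStoreDefs = List.copyOf(currentStoreDefs);
        this.finalCluster = finalCluster;
        this.finalStoreDefs = List.copyOf(finalStoreDefs);
    }

    // Convenience form: one list of store definitions is used for both clusters.
    BatchPlanSketch(ClusterStub currentCluster, ClusterStub finalCluster,
                    List<StoreDefStub> commonStoreDefs) {
        this(currentCluster, commonStoreDefs, finalCluster, commonStoreDefs);
    }

    ClusterStub getCurrentCluster() { return currentCluster; }
    List<StoreDefStub> getCurrentStoreDefs() { return currentStoreDefs; }
    ClusterStub getFinalCluster() { return finalCluster; }
    List<StoreDefStub> getFinalStoreDefs() { return finalStoreDefs; }
}
```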
Method Detail

getCurrentCluster

public Cluster getCurrentCluster()

getCurrentStoreDefs

public java.util.List<StoreDefinition> getCurrentStoreDefs()

getFinalCluster

public Cluster getFinalCluster()

getFinalStoreDefs

public java.util.List<StoreDefinition> getFinalStoreDefs()

getBatchPlan

public java.util.List<RebalanceTaskInfo> getBatchPlan()

getProgressBar

public RebalanceBatchPlanProgressBar getProgressBar(int batchId)

getZoneMoveMap

public MoveMap getZoneMoveMap()

getNodeMoveMap

public MoveMap getNodeMoveMap()

getCrossZonePartitionStoreMoves

public int getCrossZonePartitionStoreMoves()
Determines total number of partition-stores moved across zones.

Returns:
number of cross zone partition-store moves

getPartitionStoreMoves

public int getPartitionStoreMoves()
Returns the total number of partition-store moves.

Returns:
number of partition-store moves
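The two counters can be illustrated by tallying a list of partition-store moves, where a move is cross-zone when its donor and stealer sit in different zones. `PartitionStoreMove` and `MoveCounts` are hypothetical, and this is not necessarily how getPartitionStoreMoves or getCrossZonePartitionStoreMoves compute their results internally.

```java
import java.util.List;

// Hypothetical record: one partition-store moving from a donor zone to a stealer zone.
record PartitionStoreMove(String storeName, int partitionId, int donorZoneId, int stealerZoneId) {}

final class MoveCounts {
    private MoveCounts() {}

    // Total partition-store moves, regardless of zone.
    static int partitionStoreMoves(List<PartitionStoreMove> moves) {
        return moves.size();
    }

    // Moves whose donor and stealer are in different zones (i.e., data sent over the WAN).
    static int crossZonePartitionStoreMoves(List<PartitionStoreMove> moves) {
        return (int) moves.stream()
                .filter(m -> m.donorZoneId() != m.stealerZoneId())
                .count();
    }
}
```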

getTaskCount

public int getTaskCount()
Returns the number of rebalance tasks in this batch.

Returns:
number of rebalance tasks in this batch

getDonorId

protected int getDonorId(StoreRoutingPlan currentSRP,
                         StoreRoutingPlan finalSRP,
                         int stealerZoneId,
                         int stealerNodeId,
                         int stealerPartitionId)
Decide which donor node to steal from. This is a policy implementation; in the future, additional policies could be considered. At that time, this method should be overridden in a subclass, or a policy object ought to implement this algorithm.

Current policy:
1) If possible, a stealer node that is the zone n-ary in the finalCluster steals from the zone n-ary in the currentCluster in the same zone.
2) If there are no partition-stores to steal in the same zone (i.e., the "zone expansion" use case), a different policy must be used. The stealer node that is the zone n-ary in the finalCluster determines which pre-existing zone in the currentCluster hosts the primary partition id for the partition-store. The stealer then steals the zone n-ary from that pre-existing zone.

This policy avoids unnecessary cross-zone moves and distributes the load of cross-zone moves approximately uniformly across pre-existing zones.

Other policies to consider:
- For zone expansion, steal all partition-stores from one specific pre-existing zone.
- Replace the heuristic that approximately uniformly distributes load among existing zones with something more concrete (i.e., track steals from each pre-existing zone and forcibly balance them).
- Select a single donor for all replicas in a new zone. This would require donor-based rebalancing to be run (at least for this specific part of the plan). It would reduce the number of donor-side scans of data (but would still send replication-factor copies over the WAN). This would require apparatus in the RebalanceController to work.
- Set up some sort of chain replication in which a single stealer in the new zone steals some replica from a pre-existing zone, and then the other n-aries in the new zone steal from that single cross-zone stealer. This would require apparatus in the RebalanceController to work.

Parameters:
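currentSRP - store routing plan for the current cluster
finalSRP - store routing plan for the final cluster
stealerZoneId - zone id of the stealer node
stealerNodeId - node id of the stealer node
stealerPartitionId - partition id the stealer is to receive
Returns:
the node id of the donor for this partition id.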
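The two rules of the current policy can be sketched for a single partition-store as follows. This is a simplification under stated assumptions: `DonorPolicy`, the zone-to-node-list map, and `primaryZoneId` are hypothetical stand-ins for what StoreRoutingPlan provides, and the stealer's zone n-ary is taken as an input rather than derived from finalSRP.

```java
import java.util.List;
import java.util.Map;

final class DonorPolicy {
    private DonorPolicy() {}

    /**
     * Hypothetical simplification of getDonorId for one partition-store.
     *
     * @param currentZoneNaries for each zone in the current cluster, the node ids
     *        hosting this partition-store, ordered by zone n-ary (index 0 = zone primary)
     * @param primaryZoneId current-cluster zone that hosts the primary partition
     * @param stealerZoneId zone of the stealer node in the final cluster
     * @param stealerZoneNary the stealer's zone n-ary in the final cluster
     */
    static int donorId(Map<Integer, List<Integer>> currentZoneNaries,
                       int primaryZoneId,
                       int stealerZoneId,
                       int stealerZoneNary) {
        // Rule 1 (assumed to apply when the stealer's zone already holds that n-ary):
        // steal from the same zone n-ary within the stealer's own zone.
        List<Integer> sameZone = currentZoneNaries.get(stealerZoneId);
        if (sameZone != null && stealerZoneNary < sameZone.size()) {
            return sameZone.get(stealerZoneNary);
        }
        // Rule 2 (zone expansion): steal the same zone n-ary from the zone hosting
        // the primary partition; primaries spread across zones, so load spreads too.
        List<Integer> primaryZone = currentZoneNaries.get(primaryZoneId);
        if (primaryZone == null || stealerZoneNary >= primaryZone.size()) {
            throw new IllegalStateException(
                    "No zone n-ary " + stealerZoneNary + " in zone " + primaryZoneId);
        }
        return primaryZone.get(stealerZoneNary);
    }
}
```

For example, with zones 0 and 1 hosting replicas on nodes [10, 11] and [20, 21], a zone-secondary stealer in zone 0 steals from node 11, while a zone-primary stealer in a new zone 2 steals from the zone-primary of whichever zone hosts the primary partition.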

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object


Jay Kreps, Roshan Sumbaly, Alex Feinberg, Bhupesh Bansal, Lei Gao, Chinmay Soman, Vinoth Chandar, Zhongjie Wu