Voldemort support for Avro schema evolution

Introduction

We are introducing support for evolving schemas for people using Avro in Voldemort! This lets Voldemort users keep reading the data already written to a store while adding new fields to its schema as their application logic changes.

This is done with a new serializer type, "avro-generic-versioned" (see the voldemort.serialization.avro.versioned.AvroVersionedGenericSerializer class).

Important Note:

The "avro-generic-versioned" serializer can only be used with new Voldemort stores, not with existing "avro-generic" stores, because its serialized format is not backwards compatible with the old "avro-generic" format.
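To see why the two formats cannot be mixed, it helps to picture how a versioned value might be laid out. The sketch below assumes that each stored value is a single schema-version byte followed by the ordinary Avro binary encoding, and that values written by the old "avro-generic" serializer are the bare Avro encoding with no prefix. The class VersionFraming and its methods are hypothetical illustrations, not Voldemort's serializer code.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: frames an Avro-encoded payload with a one-byte schema
 * version, assuming the layout [version byte][Avro binary payload].
 * This is an illustration, not Voldemort's actual serializer.
 */
public final class VersionFraming {

    private VersionFraming() {}

    /** Prepends the schema version (0..127) to an already-encoded payload. */
    public static byte[] frame(int version, byte[] payload) {
        if (version < 0 || version > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("version must fit in one byte: " + version);
        }
        byte[] out = new byte[payload.length + 1];
        out[0] = (byte) version;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    /** Reads the schema version from the first byte of a stored value. */
    public static int versionOf(byte[] stored) {
        return stored[0];
    }

    /** Returns the Avro payload that follows the version byte. */
    public static byte[] payloadOf(byte[] stored) {
        return Arrays.copyOfRange(stored, 1, stored.length);
    }

    public static void main(String[] args) {
        // Avro binary encoding of the string "hi": zig-zag length 2 -> 0x04, then 'h', 'i'
        byte[] avroString = {0x04, 'h', 'i'};

        byte[] stored = frame(1, avroString);
        System.out.println(versionOf(stored));                            // 1
        System.out.println(Arrays.equals(payloadOf(stored), avroString)); // true

        // A value written without a version byte has its first payload byte
        // (0x04 here) misread as "schema version 4".
        System.out.println(versionOf(avroString));                        // 4
    }
}
```

Under that assumption, an old "avro-generic" value is not recognized as such: its first data byte is taken as a version number, so reading it either fails or, worse, decodes the bytes with the wrong schema.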

Usage

Define the store's key and/or value serializer with type avro-generic-versioned, and give each revision of the schema its own schema-info element with an increasing version number.

A stores.xml example of this would look like the following:

<value-serializer>
  <type>avro-generic-versioned</type>
  <schema-info version="0">{
    "type": "record",
    "name": "myrec",
    "fields": [{ "name": "original", "type": "string" }]}
  </schema-info>
  <schema-info version="1">{
    "type": "record", 
    "name": "myrec",
    "fields": [
      { "name": "original", "type": "string" },
      { "name": "new-field", "type": "string", "default":"" }
    ]}
  </schema-info>
</value-serializer>
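Giving "new-field" a default is what makes version 1 safe. When Avro reads a record written with version 0 using the version 1 schema, a field that the reader declares but the data lacks is filled from the reader's default, and fields the reader does not declare are skipped. The plain-Java sketch below models that resolution rule on records represented as maps; DefaultFilling, Field, and resolve are hypothetical names, and the sketch ignores field types, nesting, and unions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the Avro record-resolution rule that makes the
 * version 0 -> version 1 change safe. Not Avro's implementation.
 */
public final class DefaultFilling {

    /** A reader-schema field. Simplification: a null default means "no default". */
    public record Field(String name, Object defaultValue) {}

    /**
     * Resolves a record written with an older schema against the reader's fields:
     * values present in the old data are copied, missing fields take their default,
     * and fields the reader does not declare are dropped.
     */
    public static Map<String, Object> resolve(Map<String, Object> written, List<Field> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field f : readerFields) {
            if (written.containsKey(f.name())) {
                result.put(f.name(), written.get(f.name()));
            } else if (f.defaultValue() != null) {
                result.put(f.name(), f.defaultValue());
            } else {
                throw new IllegalStateException("no value or default for field " + f.name());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Version 1 of "myrec" from the stores.xml example
        List<Field> v1 = List.of(new Field("original", null), new Field("new-field", ""));

        // A record stored under version 0 only has "original"
        Map<String, Object> old = Map.of("original", "hello");

        System.out.println(resolve(old, v1)); // {original=hello, new-field=}
    }
}
```

If "new-field" had been added without a default, resolve would have nothing to fill in for old records, which is the situation the admin tool refuses to allow.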

How Do I Query Using a Java Program?

Example:

This is a simple example to get you started.
If you already have an Avro object for the key, you don't need the JsonDecoder step; just pass the object to get() directly. If you only have a JSON string representation of the key, decode it first, as the example does.
The get() call returns a Versioned object; call getValue() on it to get the actual Avro object.

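import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.JsonDecoder;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public void testAvroVersionedGet() {
    // Make sure this is the correct bootstrap URL
    String bootstrapUrl = "tcp://localhost:6666";
    StoreClientFactory factory =
            new SocketStoreClientFactory(new ClientConfig().setBootstrapUrls(bootstrapUrl));
    try {
        // Make sure this is the correct store name
        StoreClient<Object, Object> client = factory.getStoreClient("storeName");

        // Schema.parse expects JSON, so the primitive type name must be quoted
        Schema keySchema = Schema.parse("\"int\"");
        Object key = null;
        try {
            // Decode the JSON string "1234" into an Avro datum matching keySchema
            JsonDecoder decoder = new JsonDecoder(keySchema, "1234");
            GenericDatumReader<Object> datumReader = new GenericDatumReader<Object>(keySchema);
            key = datumReader.read(null, decoder);
        } catch (IOException e) {
            // Probably log this in your code if you copy-paste ;-)
            e.printStackTrace();
        }
        if (key == null) {
            System.err.println("Error parsing key");
            return;
        }

        // get() returns a Versioned wrapper, or null if the key is not in the store
        Versioned<Object> versioned = client.get(key);
        if (versioned == null) {
            System.out.println("No value found for key " + key);
        } else {
            // getValue() unwraps the actual Avro object
            System.out.println("Value - " + versioned.getValue());
        }
    } finally {
        // Release the factory's connections and threads
        factory.close();
    }
}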

Safety Net

Always use the admin tool to update stores.xml when you add a new schema version. For production stores, the Voldemort team will make the change for you.
The tool checks the new schema version for backward compatibility. If the change is backward compatible, the tool updates stores.xml:

[anagpal@anagpal-ld voldemort]$ ./bin/voldemort-admin-tool.sh --set-metadata stores.xml --set-metadata-value /home/anagpal/projects/stores.xml --url tcp://localhost:6666
 
INFO: Added required field myrec.new-field with default value: ""
INFO: The schema is backward compatible. Old and new clients will be able to read records serialized by one another.
INFO: Added required field myrec.new-field with default value: ""
INFO: The schema is backward compatible. Old and new clients will be able to read records serialized by one another.
Setting stores.xml for localhost:0

Otherwise, the update fails. This protects the store from being corrupted by data serialized with an incompatible schema:

[anagpal@anagpal-ld voldemort]$ ./bin/voldemort-admin-tool.sh --set-metadata stores.xml --set-metadata-value /home/anagpal/projects/stores.xml --url tcp://localhost:6666
ERROR: Illegal type change from STRING to INT for field myrec.original
ERROR: The schema is not backward compatible. New clients will not be able to read existing data.
voldemort.VoldemortException:  The schema is not backward compatible. New clients will not be able to read existing data.
at voldemort.serialization.avro.versioned.SchemaEvolutionValidator.checkSchemaCompatibility(SchemaEvolutionValidator.java:844)
at voldemort.VoldemortAdminTool.main(VoldemortAdminTool.java:476)
The schema is not backward compatible. New clients will not be able to read existing data.
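Both outcomes follow from Avro's schema-resolution rules: a field added without a default cannot be filled in when new clients read old data, and a field's type may only change along the promotions Avro allows, such as int to long, float, or double; long to float or double; and float to double. The sketch below is a deliberately simplified, hypothetical version of such a check for flat records of primitive fields, modelling only those numeric promotions. CompatCheck, Field, and check are illustrative names; Voldemort's actual SchemaEvolutionValidator covers many more cases (nested records, unions, enums, arrays, maps).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, heavily simplified backward-compatibility check for flat
 * records of primitive fields: can a reader using newFields read data
 * written with oldFields?
 */
public final class CompatCheck {

    /** A field: name, Avro primitive type name, and whether it declares a default. */
    public record Field(String name, String type, boolean hasDefault) {}

    // Numeric promotions from the Avro spec: writer type -> reader types it may be read as
    private static final Map<String, Set<String>> PROMOTIONS = Map.of(
            "int", Set.of("long", "float", "double"),
            "long", Set.of("float", "double"),
            "float", Set.of("double"));

    /** Returns the problems found; an empty list means new readers can read old data. */
    public static List<String> check(List<Field> oldFields, List<Field> newFields) {
        List<String> errors = new ArrayList<>();
        for (Field newField : newFields) {
            Field oldField = oldFields.stream()
                    .filter(f -> f.name().equals(newField.name()))
                    .findFirst()
                    .orElse(null);
            if (oldField == null) {
                // Old records lack this field, so the reader needs a default for it
                if (!newField.hasDefault()) {
                    errors.add("Added field " + newField.name() + " without a default value");
                }
            } else if (!oldField.type().equals(newField.type())
                    && !PROMOTIONS.getOrDefault(oldField.type(), Set.of()).contains(newField.type())) {
                errors.add("Illegal type change from " + oldField.type() + " to "
                        + newField.type() + " for field " + newField.name());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Field> v0 = List.of(new Field("original", "string", false));

        // Adding a field with a default is accepted
        System.out.println(check(v0, List.of(
                new Field("original", "string", false),
                new Field("new-field", "string", true))));
        // prints []

        // Changing string -> int is rejected, like the admin tool error above
        System.out.println(check(v0, List.of(new Field("original", "int", false))));
        // prints [Illegal type change from string to int for field original]
    }
}
```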

Auto Rebootstrap when updating the schema

To have clients automatically rebootstrap with the new schema, use the "--stores" option when you update, listing the affected stores:

./bin/voldemort-admin-tool.sh --set-metadata stores.xml --set-metadata-value <path to new stores.xml> --url [url] --stores <comma separated list of stores>