Voldemort support for Avro schema evolution
Introduction
We are introducing support for evolving schemas for people using Avro in Voldemort! This will help Voldemort users to use the data in existing stores and add new fields as their application logic changes.
This project achieves this by adding a new serializer type, "avro-generic-versioned" (see the voldemort.serialization.avro.versioned.AvroVersionedGenericSerializer class).
Important Note:
The "avro-generic-versioned" serializer can only be used with new Voldemort stores, not with existing avro-generic stores. The "avro-generic-versioned" format is not backwards compatible with the old "avro-generic" format.
Usage
You can define a store with the key/value serializer type set to avro-generic-versioned and then use version numbers for the evolving schema.
A stores.xml example of this would look like the following:
<value-serializer>
  <type>avro-generic-versioned</type>
  <schema-info version="0">{
    "type": "record",
    "name": "myrec",
    "fields": [
      { "name": "original", "type": "string" }
    ]
  }</schema-info>
  <schema-info version="1">{
    "type": "record",
    "name": "myrec",
    "fields": [
      { "name": "original", "type": "string" },
      { "name": "new-field", "type": "string", "default": "" }
    ]
  }</schema-info>
</value-serializer>
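The effect of the two schema versions above can be pictured with a small model. This is only a conceptual sketch in plain Java, not Voldemort's or Avro's actual implementation: the class SchemaEvolutionSketch, the Field type, and the resolve method are hypothetical names, and records are modelled as simple string maps. It shows a record written under version 0 being read under version 1, with the missing field filled from its default.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only; the real work is done by Avro schema resolution.
final class SchemaEvolutionSketch {

    /** A field name plus the default used when an older record lacks it (null means no default). */
    static final class Field {
        final String name;
        final String defaultValue;

        Field(String name, String defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    // Hypothetical registry mirroring the two schema-info versions in the stores.xml example.
    static final Map<Integer, List<Field>> SCHEMAS = new HashMap<>();
    static {
        SCHEMAS.put(0, Arrays.asList(new Field("original", null)));
        SCHEMAS.put(1, Arrays.asList(new Field("original", null), new Field("new-field", "")));
    }

    /** Reads a record written under an older version using the reader's schema version. */
    static Map<String, String> resolve(Map<String, String> written, int readerVersion) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Field f : SCHEMAS.get(readerVersion)) {
            if (written.containsKey(f.name)) {
                result.put(f.name, written.get(f.name));
            } else if (f.defaultValue != null) {
                // Field is missing from the old record: fall back to the declared default.
                result.put(f.name, f.defaultValue);
            } else {
                throw new IllegalStateException("No value or default for field " + f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> oldRecord = new HashMap<>();
        oldRecord.put("original", "hello");
        System.out.println(resolve(oldRecord, 1)); // prints {original=hello, new-field=}
    }
}
```

The model also shows why a new field needs a default: without one, a version 0 record cannot be completed when it is read under version 1.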
- Do ALL clients that read data need to be bounced before we can start writes in the new schema?
Not anymore! The Voldemort client has a new auto rebootstrap mechanism that picks up the new schema from the server. This is asynchronous, so writes of objects created using the new schema may fail during this window (~5 seconds). You can choose to manually bounce the clients to pick up the change immediately.
- Do the clients that write the data need to start writing data in the new schema immediately once Voldemort is updated and the client is bounced?
The Voldemort client will always try to serialize the data with the newest schema it has. If the record you supply was created with an old schema, we will serialize it using the old schema. On a get call, however, we will return objects written with the old schema with defaults for the new fields in the new schema. This feature supports rolling upgrades!
- Should I evolve my key schema?
No, never. This can cause things to break.
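Because writes with a new schema may fail for a few seconds until the client rebootstraps, callers that cannot bounce their clients may want to retry. The following is a minimal sketch of such a retry wrapper in plain Java. It is not part of the Voldemort client: the class RetryDuringRebootstrap and its retry method are hypothetical, and the supplier in main is a stand-in for a real put call.

```java
import java.util.function.Supplier;

// Hypothetical helper; not a Voldemort API.
final class RetryDuringRebootstrap {

    /**
     * Runs the operation until it succeeds or maxAttempts is reached,
     * sleeping delayMillis between attempts. Rethrows the last failure.
     */
    static <T> T retry(Supplier<T> operation, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                // Real code should catch only the specific exception it expects here.
                last = e;
                if (attempt < maxAttempts) {
                    sleep(delayMillis);
                }
            }
        }
        throw last;
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Stand-in for a write that fails until the client has picked up the new schema.
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("schema not yet known to client");
            }
            return "written";
        }, 10, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints written after 3 attempts
    }
}
```

For the roughly 5 second window mentioned above, something like 10 attempts with a 1000 ms delay would cover it with some margin; those numbers are an assumption, not a documented recommendation.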
How Do I Query Using a Java Program
Example:
This is a simple example to get you started.
If you have an Avro object for the key, you don't need the JsonDecoder step; just pass the object to the get call directly. If you have a JSON string representation of the key, then you need to follow this route instead.
The get call returns a Versioned object. Call getValue() on it to get the actual Avro object.
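import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.JsonDecoder;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;

public void testAvroVersionedGet() {
    // make sure this is the correct bootstrap URL
    String bootstrapUrl = "tcp://localhost:6666";
    StoreClientFactory factory = new SocketStoreClientFactory(new ClientConfig().setBootstrapUrls(bootstrapUrl));

    // make sure this is the correct store name
    StoreClient<Object, Object> client = factory.getStoreClient("storeName");

    // Schema.parse expects JSON, so a primitive type name is passed as a quoted JSON string
    Schema keySchema = Schema.parse("\"int\"");

    Object key = null;
    try {
        // decode the JSON representation of the key into an Avro object
        JsonDecoder decoder = new JsonDecoder(keySchema, "1234");
        GenericDatumReader<Object> datumReader = new GenericDatumReader<Object>(keySchema);
        key = datumReader.read(null, decoder);
    } catch (IOException e) {
        // probably log this in your code if you copy paste ;-)
        e.printStackTrace();
    }

    if (key == null) {
        // do not query the store with a key that failed to parse
        System.err.println("Error parsing key");
        return;
    }
    System.out.println("Value - " + client.get(key));
}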
Safety Net
You should always use the admin tool to update the stores.xml when adding a new schema. If you want to do this on a production store, the Voldemort team will do it.
If the schema change is backwards compatible, the tool updates the stores.xml:
[anagpal@anagpal-ld voldemort]$ ./bin/voldemort-admin-tool.sh --set-metadata stores.xml --set-metadata-value /home/anagpal/projects/stores.xml --url tcp://localhost:6666
INFO: Added required field myrec.new-field with default value: ""
INFO: The schema is backward compatible. Old and new clients will be able to read records serialized by one another.
INFO: Added required field myrec.new-field with default value: ""
INFO: The schema is backward compatible. Old and new clients will be able to read records serialized by one another.
Setting stores.xml for localhost:0
Otherwise it will fail. This is to protect you from corrupting the store due to serializing data with a bad schema.
[anagpal@anagpal-ld voldemort]$ ./bin/voldemort-admin-tool.sh --set-metadata stores.xml --set-metadata-value /home/anagpal/projects/stores.xml --url tcp://localhost:6666
ERROR: Illegal type change from STRING to INT for field myrec.original
ERROR: The schema is not backward compatible. New clients will not be able to read existing data.
voldemort.VoldemortException: The schema is not backward compatible. New clients will not be able to read existing data.
    at voldemort.serialization.avro.versioned.SchemaEvolutionValidator.checkSchemaCompatibility(SchemaEvolutionValidator.java:844)
    at voldemort.VoldemortAdminTool.main(VoldemortAdminTool.java:476)
The schema is not backward compatible. New clients will not be able to read existing data.
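The two outcomes above correspond to two simple rules: a newly added field needs a default, and an existing field must not change type. The following is a deliberately simplified sketch of those two rules in plain Java. It is not the SchemaEvolutionValidator implementation, which handles more cases; the class CompatibilitySketch, its Field type, and its messages are hypothetical, and the sketch is stricter than Avro itself, which permits some type promotions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a backward-compatibility check; not Voldemort's validator.
final class CompatibilitySketch {

    /** Minimal field description: a type name and whether a default is declared. */
    static final class Field {
        final String type;
        final boolean hasDefault;

        Field(String type, boolean hasDefault) {
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    /** Returns the problems found; an empty list means the change passes both rules. */
    static List<String> check(Map<String, Field> oldFields, Map<String, Field> newFields) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Field> entry : newFields.entrySet()) {
            String name = entry.getKey();
            Field current = entry.getValue();
            Field previous = oldFields.get(name);
            if (previous == null) {
                // Rule 1: old records lack this field, so a default is required to read them.
                if (!current.hasDefault) {
                    errors.add("Added field " + name + " without a default value");
                }
            } else if (!previous.type.equals(current.type)) {
                // Rule 2: existing data was written with the old type.
                errors.add("Illegal type change from " + previous.type + " to " + current.type
                        + " for field " + name);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Field> v0 = new LinkedHashMap<>();
        v0.put("original", new Field("STRING", false));

        Map<String, Field> compatible = new LinkedHashMap<>();
        compatible.put("original", new Field("STRING", false));
        compatible.put("new-field", new Field("STRING", true));

        Map<String, Field> incompatible = new LinkedHashMap<>();
        incompatible.put("original", new Field("INT", false));

        System.out.println(check(v0, compatible));   // prints []
        System.out.println(check(v0, incompatible)); // prints [Illegal type change from STRING to INT for field original]
    }
}
```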
Auto Rebootstrap when updating the schema
Please use the "--stores" option when you update the schema:
./bin/voldemort-admin-tool.sh --set-metadata stores.xml --set-metadata-value <path to new stores.xml> --url [url] --stores <comma separated list of stores>