Was this page helpful?
In addition to the monitoring user interface provided by Spark, you can run another program, after running a migration, to check whether the destination table contains exactly the same items as the source table.
Running this program consists of submitting the same Spark job as the migration, but with a different entry point.
Before running the validator, adjust the corresponding configuration in the top-level validation
property of the file config.yml
:
validation:
# Should WRITETIMEs and TTLs be compared?
compareTimestamps: true
# What difference should we allow between TTLs?
ttlToleranceMillis: 60000
# What difference should we allow between WRITETIMEs?
writetimeToleranceMillis: 1000
# How many differences to fetch and print
failuresToFetch: 100
# What difference should we allow between floating point numbers?
floatingPointTolerance: 0.001
# What difference in ms should we allow between timestamps?
timestampMsTolerance: 0
The exact way to run the validator depends on the way you set up the Spark cluster.
Submit the job submit-cql-job-validator.sh
.
Pass the argument --class com.scylladb.migrator.Validator
to the spark-submit
invocation:
spark-submit --class com.scylladb.migrator.Validator \
--master spark://<spark-master-hostname>:7077 \
--conf spark.scylla.config=<path to config.yaml> \
<path to scylla-migrator-assembly.jar>
Use the following arguments for the Cluster Step that runs a custom JAR:
spark-submit --deploy-mode cluster --class com.scylladb.migrator.Validator --conf spark.scylla.config=/mnt1/config.yaml /mnt1/scylla-migrator-assembly.jar
Pass the argument --class com.scylladb.migrator.Validator
to the spark-submit
invocation:
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Validator \
--master spark://spark-master:7077 \
--conf spark.driver.host=spark-master \
--conf spark.scylla.config=/app/config.yaml \
/jars/scylla-migrator-assembly.jar
Was this page helpful?