Since the Migrator is packaged as a Spark application, you first have to set up a Spark cluster to run it. Then, submit the application along with its configuration to the Spark cluster, which will execute the migration by reading from the source database and writing to the target database.
A Spark cluster is made of several nodes, each of which can host several workers (although there is usually just one worker per node). When you start the Migrator, the Spark driver inspects the job content and splits it into tasks. It then spawns executors on the cluster workers and assigns them the tasks to compute. Since the tasks are processed in parallel, you can increase the potential throughput of the migration by adding worker nodes. Note that the migration throughput is also capped by the read throughput of the source database and the write throughput of the target database.
We suggest starting with a small cluster containing a single worker node with 5 to 10 CPUs, and increasing the number of worker nodes (or the number of CPUs per node) if necessary, as long as the source and target databases are not saturated. We recommend provisioning at least 2 GB of memory per CPU on each node. For instance, a node with 8 CPUs should have at least 16 GB of memory.
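For example, on a standalone Spark cluster you can declare each worker's resources in conf/spark-env.sh. The following is only a sketch, assuming a hypothetical worker node with 8 CPUs sized according to the guideline above:

```bash
# conf/spark-env.sh on a worker node (Spark standalone mode).
# Hypothetical sizing: 8 CPUs with 2 GB of memory per CPU.
SPARK_WORKER_CORES=8
SPARK_WORKER_MEMORY=16g
```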
Caution
Make sure the Spark version, the Scala version, and the Migrator version you use are compatible with each other.
The following pages describe alternative ways to set up a Spark cluster:
Once you have a Spark cluster ready to run the scylla-migrator-assembly.jar, download the file config.yaml.example and rename it to config.yaml. This file contains properties such as source and target that define how to connect to the source database and to the target database, as well as other settings that control the migration. Adapt it to your case according to the following guides:
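For reference, a minimal config.yaml sketch for a hypothetical Cassandra-to-ScyllaDB migration might look as follows. The host names, keyspace, and table below are placeholders; the authoritative list of properties and their defaults is in config.yaml.example:

```yaml
# Illustrative sketch only; copy config.yaml.example and adapt it.
source:
  type: cassandra
  host: cassandra-server     # placeholder host name
  port: 9042
  keyspace: my_keyspace      # hypothetical keyspace and table
  table: my_table
target:
  type: scylla
  host: scylla-server        # placeholder host name
  port: 9042
  keyspace: my_keyspace
  table: my_table
```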
Start the migration by invoking the spark-submit command with the appropriate arguments, as explained in the page Run the Migration.
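As a sketch, assuming a standalone Spark cluster, the invocation might look like the following. The master URL and file paths are placeholders; the page Run the Migration details the exact arguments:

```bash
# Placeholder master URL and paths; adapt to your cluster.
spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://<spark-master-hostname>:7077 \
  --conf spark.scylla.config=/path/to/config.yaml \
  /path/to/scylla-migrator-assembly.jar
```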
You might also be interested in the following extra features: