Getting Started | ScyllaDB Docs

Getting Started

Since the Migrator is packaged as a Spark application, you first have to set up a Spark cluster to use it. Then, submit the application along with its configuration to the Spark cluster, which executes the migration by reading from the source database and writing to the target database.

Set Up a Spark Cluster

A Spark cluster is made of several nodes, each of which can host several workers (although there is usually just one worker per node). When you start the Migrator, the Spark driver looks at the job content and splits it into tasks. It then spawns executors on the cluster workers and feeds them the tasks to compute. Since the tasks are processed in parallel, you can raise the maximum throughput of the migration by increasing the number of worker nodes. Note that the migration throughput is also limited by the read throughput of the source database and the write throughput of the target database.
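The bottleneck reasoning above can be sketched as a small Python model. It assumes, purely for illustration, that Spark capacity grows linearly with the number of worker nodes; the function name and the rows-per-second figures are hypothetical and are not part of the Migrator.

```python
def estimated_throughput(worker_nodes, rows_per_sec_per_node,
                         source_read_limit, target_write_limit):
    """Rough upper bound on migration throughput, in rows per second.

    Assumption (not from the Migrator documentation): Spark capacity
    scales linearly with the number of worker nodes.
    """
    if worker_nodes < 1:
        raise ValueError("at least one worker node is required")
    spark_capacity = worker_nodes * rows_per_sec_per_node
    # The slowest of the three stages bounds the whole pipeline.
    return min(spark_capacity, source_read_limit, target_write_limit)


# Hypothetical figures: adding nodes helps only until a database saturates.
for nodes in (1, 2, 4, 8):
    print(nodes, estimated_throughput(nodes, 20_000, 100_000, 60_000))
```

In this made-up scenario the estimate stops growing once the target database's write limit is reached, which is the point at which adding worker nodes no longer helps.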

We suggest starting with a small cluster containing a single worker node with 5 to 10 CPUs, and increasing the number of worker nodes (or the number of CPUs per node) if necessary, as long as the source and target databases are not saturated. We recommend provisioning at least 2 GB of memory per CPU on each node. For instance, a cluster node with 8 CPUs should have at least 16 GB of memory.
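The memory guideline above is simple arithmetic, shown here as a minimal Python sketch. The helper names are hypothetical; only the figure of 2 GB per CPU comes from the recommendation above.

```python
MIN_MEMORY_GB_PER_CPU = 2  # recommendation stated in the text above


def min_memory_gb(cpus_per_node):
    """Minimum recommended memory (in GB) for a node with the given CPU count."""
    if cpus_per_node < 1:
        raise ValueError("a node needs at least one CPU")
    return cpus_per_node * MIN_MEMORY_GB_PER_CPU


def node_meets_recommendation(cpus_per_node, memory_gb):
    """Return True if the node has at least 2 GB of memory per CPU."""
    return memory_gb >= min_memory_gb(cpus_per_node)


print(min_memory_gb(8))                  # the 8-CPU example from the text
print(node_meets_recommendation(8, 12))  # an under-provisioned node
```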

Caution

Make sure the Spark version, the Scala version, and the Migrator version you use are compatible with one another.

Separate pages of this documentation describe various alternative ways to set up a Spark cluster.

Configure the Migration

Once you have a Spark cluster ready to run the scylla-migrator-assembly.jar, download the file config.yaml.example and rename it to config.yaml. This file contains properties such as source and target, which define how to connect to the source database and to the target database, as well as other settings that control the migration. Adapt it to your case according to the configuration guides in this documentation.
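Before submitting the job, a quick sanity check can catch a configuration that lacks one of the connection sections. The Python sketch below is only an illustration: it assumes config.yaml has already been parsed into a dictionary by a YAML library of your choice, and that source and target are top-level keys. The function name and the placeholder values are hypothetical.

```python
REQUIRED_SECTIONS = ("source", "target")  # properties named in the text above


def missing_sections(config):
    """Return the required top-level sections that are absent or empty.

    `config` is assumed to be the dictionary obtained by parsing config.yaml.
    """
    return [name for name in REQUIRED_SECTIONS if not config.get(name)]


# Hypothetical parsed configuration with placeholder values only.
parsed = {"source": {"type": "placeholder"}}
problems = missing_sections(parsed)
if problems:
    print("config.yaml is missing:", ", ".join(problems))
```

The check only confirms that the sections exist; the settings inside each section still have to be adapted to your source and target databases.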

Run the Migration

Start the migration by invoking the spark-submit command with the appropriate arguments, as explained in the page Run the Migration.
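As a rough illustration of what such an invocation can look like, the Python sketch below assembles a spark-submit argument list. The --class, --master, and --conf options are standard spark-submit options. The entry-point class com.scylladb.migrator.Migrator, the spark.scylla.config property, the master URL, and the function name are assumptions made for this example and should be checked against the page Run the Migration.

```python
import shlex


def build_spark_submit_command(master_url, config_path, jar_path):
    """Assemble a spark-submit argument list for the Migrator.

    The class name and the configuration property below are assumptions,
    not values confirmed by this page.
    """
    return [
        "spark-submit",
        "--class", "com.scylladb.migrator.Migrator",     # assumed entry point
        "--master", master_url,
        "--conf", f"spark.scylla.config={config_path}",  # assumed property name
        jar_path,
    ]


# Hypothetical master URL; the file names come from the sections above.
command = build_spark_submit_command(
    "spark://spark-master:7077",
    "config.yaml",
    "scylla-migrator-assembly.jar",
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can be passed directly to subprocess.run without shell quoting issues; shlex.join is used here only to print a copyable command line.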
