
Getting Started

The Migrator is packaged as a Spark application, so you first need to set up a Spark cluster. You then submit the application, along with its configuration, to the Spark cluster, which executes the migration by reading from the source database and writing to the target database.

Set Up a Spark Cluster

A Spark cluster is made of several nodes, each of which can host one or more workers (although there is usually just one worker per node). When you start the Migrator, the Spark driver looks at the job content and splits it into tasks. It then spawns executors on the cluster workers and feeds them the tasks to compute. Since the tasks are processed in parallel, you can increase the achievable throughput of the migration by increasing the number of worker nodes. Note that the migration throughput is also limited by the read throughput of the source database and the write throughput of the target database.
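The interplay between these limits can be pictured with a back-of-the-envelope model. The sketch below assumes, purely for illustration, that Spark capacity scales linearly with the number of workers. The function name and all figures are hypothetical, not measurements.

```python
def effective_throughput(workers, rows_per_sec_per_worker,
                         source_read_limit, target_write_limit):
    """Rough upper bound on migration throughput, in rows per second.

    Assumes (hypothetically) that Spark capacity grows linearly with the
    number of workers. The slowest of the three stages caps the result.
    """
    spark_capacity = workers * rows_per_sec_per_worker
    return min(spark_capacity, source_read_limit, target_write_limit)


# Hypothetical figures: each worker handles 20,000 rows/s, the source can
# serve 100,000 rows/s, and the target can absorb 50,000 rows/s.
print(effective_throughput(1, 20_000, 100_000, 50_000))  # capped by Spark
print(effective_throughput(4, 20_000, 100_000, 50_000))  # capped by the target
```

In this model, adding workers beyond the point where the target database becomes the cap brings no further gain.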

We suggest starting with a small cluster containing a single worker node with 5 to 10 CPUs, and increasing the number of worker nodes (or the number of CPUs per node) if necessary, as long as the source and target databases are not saturated. We recommend provisioning at least 2 GB of memory per CPU on each node. For instance, a cluster node with 8 CPUs should have at least 16 GB of memory.
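The memory recommendation is simple arithmetic. A minimal sketch, using a hypothetical helper name, computes it for a few node sizes:

```python
MIN_MEMORY_GB_PER_CPU = 2  # recommendation stated on this page


def min_node_memory_gb(cpus):
    """Return the minimum recommended memory (in GB) for a worker node."""
    if cpus < 1:
        raise ValueError("a node needs at least one CPU")
    return cpus * MIN_MEMORY_GB_PER_CPU


for cpus in (5, 8, 10):
    print(f"{cpus} CPUs -> at least {min_node_memory_gb(cpus)} GB of memory")
```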

Caution

Make sure the Spark version, the Scala version, and the Migrator version you use are compatible with each other.

The following pages describe several ways to set up a Spark cluster:

  • on your infrastructure, using Ansible,

  • on your infrastructure, manually,

  • using AWS EMR,

  • or, on a single machine, using Docker.

Configure the Migration

Once you have a Spark cluster ready to run the scylla-migrator-assembly.jar, download the file config.yaml.example and rename it to config.yaml. This file contains properties, such as source and target, that define how to connect to the source and target databases, along with other settings that control the migration. Adapt it to your use case by following one of these guides:

  • Migrate from Apache Cassandra or Parquet files to ScyllaDB.

  • Or, migrate from DynamoDB to ScyllaDB’s Alternator.
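Before submitting the job, it can help to check that the edited config.yaml still contains the source and target sections. The sketch below is a deliberately naive, standard-library-only check that scans for top-level keys; it is not a YAML parser, and the helper names and sample content are hypothetical.

```python
def top_level_keys(yaml_text):
    """Collect top-level keys from YAML text with a naive line scan.

    Only lines starting in column 0 are considered; blank lines, comments,
    list items, and indented lines are skipped. This is not a YAML parser.
    """
    keys = set()
    for line in yaml_text.splitlines():
        if not line or line[0] in " \t#-":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return keys


def missing_sections(yaml_text, required=("source", "target")):
    """Return the required section names that are absent from the text."""
    present = top_level_keys(yaml_text)
    return [name for name in required if name not in present]


# Hypothetical, heavily abbreviated configuration used only for illustration.
sample = """\
# connection settings
source:
  type: cassandra
target:
  type: scylla
"""
print(missing_sections(sample))
```

To check a real file, read it with `pathlib.Path("config.yaml").read_text()` and pass the result to `missing_sections`.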

Run the Migration

Start the migration by invoking the spark-submit command with the appropriate arguments, as explained on the Run the Migration page.
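As a rough illustration, the sketch below assembles a spark-submit command line without running it. The main class name `com.scylladb.migrator.Migrator` and the `spark.scylla.config` property are assumptions not stated on this page, so verify them against the Run the Migration page. The master URL and file paths are hypothetical placeholders.

```python
import shlex


def build_spark_submit(master_url, config_path, jar_path):
    """Assemble the spark-submit argument list (the command is not executed)."""
    return [
        "spark-submit",
        # Assumed entry point of the Migrator; check the Run the Migration page.
        "--class", "com.scylladb.migrator.Migrator",
        "--master", master_url,
        # Assumed property name used to pass the configuration file path.
        "--conf", f"spark.scylla.config={config_path}",
        jar_path,
    ]


# Hypothetical host name and paths.
cmd = build_spark_submit(
    "spark://spark-master:7077",
    "/app/config.yaml",
    "/jars/scylla-migrator-assembly.jar",
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can later be passed to `subprocess.run` without shell quoting issues.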

Extra Features

You might also be interested in the following extra features:

  • rename columns during the migration,

  • replicate changes applied to the source table after the initial snapshot transfer has completed,

  • resume an interrupted migration where it left off,

  • validate that the migration was complete and correct.

Last updated on 28 Apr 2025.