
Set Up a Spark Cluster with AWS EMR

This page describes how to use the Migrator in Amazon EMR. This approach is useful if you already have an AWS account, or if you do not want to manage your infrastructure manually.

  1. Download the file config.yaml.example from our Git repository and save it as config.yaml.

    wget https://github.com/scylladb/scylla-migrator/raw/master/config.yaml.example \
      --output-document=config.yaml
    
  2. Edit config.yaml to configure the migration according to your needs.

  3. Download the latest release of the Migrator.

    wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar
    

    Alternatively, download a specific release of scylla-migrator-assembly.jar.

  4. Upload both files to an S3 bucket.

    aws s3 cp config.yaml s3://<your-bucket>/scylla-migrator/config.yaml
    aws s3 cp scylla-migrator-assembly.jar s3://<your-bucket>/scylla-migrator/scylla-migrator-assembly.jar
    

    Replace <your-bucket> with an S3 bucket name that you manage.

    Each time you change the migration configuration, upload config.yaml to the bucket again.

  5. Create a script named copy-files.sh that downloads config.yaml and scylla-migrator-assembly.jar from your S3 bucket to /mnt1.

    #!/bin/bash
    # Exit with an error if either download fails, not only when the last command fails
    set -euo pipefail
    aws s3 cp s3://<your-bucket>/scylla-migrator/config.yaml /mnt1/config.yaml
    aws s3 cp s3://<your-bucket>/scylla-migrator/scylla-migrator-assembly.jar /mnt1/scylla-migrator-assembly.jar
    
  6. Upload the script to your S3 bucket as well.

    aws s3 cp copy-files.sh s3://<your-bucket>/scylla-migrator/copy-files.sh
    
  7. Log in to the Amazon EMR console.

  8. Choose “Create cluster” to create a new cluster based on EC2.

  9. Configure the cluster as follows:

    • Choose the EMR release emr-7.1.0, or any EMR release that is compatible with the Spark version used by the Migrator.

    • Make sure to include Spark in the application bundle.

    • Choose EC2 instance types that fit your workload (e.g., storage-optimized i4i instances).

    • Make sure to include at least one task node.

    • Add a step to run the Migrator:

      • Type: Custom JAR

      • JAR location: command-runner.jar

      • Arguments:

        spark-submit --deploy-mode cluster --class com.scylladb.migrator.Migrator --conf spark.scylla.config=/mnt1/config.yaml <... other arguments> /mnt1/scylla-migrator-assembly.jar
        

        Replace “<... other arguments>” above with the arguments your migration needs. The page Run the Migration gives a complete description of the arguments expected by spark-submit.

    • Add a bootstrap action to download the Migrator and the migration configuration:

      • Script location: s3://<your-bucket>/scylla-migrator/copy-files.sh

  10. Finalize the rest of the cluster configuration according to your needs, then choose “Create cluster”.

  11. The migration starts automatically once the cluster is fully up.
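The bucket name appears both in the upload commands and inside copy-files.sh. You can script the upload part of the preparation so that the two always agree. This also turns re-uploading after each configuration change into a single command. The following is a minimal sketch and is not part of the Migrator. It assumes that config.yaml is already configured and that scylla-migrator-assembly.jar is in the current directory. The BUCKET and DRY_RUN variables and the run helper are hypothetical names. By default the script only prints the aws commands.

```shell
#!/usr/bin/env bash
# Sketch (not part of the Migrator): regenerate copy-files.sh for one bucket and
# upload it together with config.yaml and scylla-migrator-assembly.jar.
# Assumptions: both files are already in the current directory and configured;
# BUCKET, DRY_RUN and run() are hypothetical names; "your-bucket" is a placeholder.
set -euo pipefail

BUCKET="${BUCKET:-your-bucket}"
DRY_RUN="${DRY_RUN:-1}"   # 1: print the aws commands instead of running them
PREFIX="s3://${BUCKET}/scylla-migrator"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Write the bootstrap script with the bucket name already substituted.
cat > copy-files.sh <<EOF
#!/bin/bash
set -euo pipefail
aws s3 cp ${PREFIX}/config.yaml /mnt1/config.yaml
aws s3 cp ${PREFIX}/scylla-migrator-assembly.jar /mnt1/scylla-migrator-assembly.jar
EOF

# Upload the three files under the same prefix that copy-files.sh reads from.
for f in config.yaml scylla-migrator-assembly.jar copy-files.sh; do
  run aws s3 cp "$f" "${PREFIX}/$f"
done
```

For example, if you save it under the hypothetical name upload-migrator-files.sh, you could run it as: BUCKET=my-bucket DRY_RUN=0 bash upload-migrator-files.sh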
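You can also express the console configuration above as a single aws emr create-cluster call. This is convenient when you create such clusters repeatedly. This is a hedged sketch under stated assumptions, not an official recipe. The cluster name, the instance type and counts, and ActionOnFailure=CONTINUE are choices made for illustration. The script also assumes the EMR default roles, created beforehand with aws emr create-default-roles. BUCKET, INSTANCE_TYPE, DRY_RUN, and join_commas are hypothetical names. You still have to fill in the “<... other arguments>” of the spark-submit step from Run the Migration. By default the script only prints the command; set DRY_RUN=0 to run it.

```shell
#!/usr/bin/env bash
# Sketch: the console settings above as one AWS CLI call.
# Assumptions (not from the Migrator docs): cluster name, instance type and
# counts, EMR default roles, ActionOnFailure=CONTINUE. Adjust them to your needs.
set -euo pipefail

BUCKET="${BUCKET:-your-bucket}"               # placeholder: an S3 bucket you manage
INSTANCE_TYPE="${INSTANCE_TYPE:-i4i.xlarge}"  # assumption: check EMR supports it in your region
DRY_RUN="${DRY_RUN:-1}"                       # 1: print the command instead of running it

# Join arguments with commas, as the AWS CLI shorthand Args=[a,b,c] expects.
join_commas() { local IFS=,; printf '%s' "$*"; }

# Same arguments as the console step, without the "<... other arguments>"
# placeholder: insert the arguments described in Run the Migration before the JAR path.
SPARK_ARGS=(spark-submit --deploy-mode cluster
  --class com.scylladb.migrator.Migrator
  --conf spark.scylla.config=/mnt1/config.yaml
  /mnt1/scylla-migrator-assembly.jar)

CMD=(aws emr create-cluster
  --name scylla-migrator
  --release-label emr-7.1.0
  --applications Name=Spark
  --use-default-roles
  --instance-groups
    "InstanceGroupType=MASTER,InstanceCount=1,InstanceType=${INSTANCE_TYPE}"
    "InstanceGroupType=CORE,InstanceCount=1,InstanceType=${INSTANCE_TYPE}"
    "InstanceGroupType=TASK,InstanceCount=1,InstanceType=${INSTANCE_TYPE}"
  --bootstrap-actions "Path=s3://${BUCKET}/scylla-migrator/copy-files.sh,Name=copy-files"
  --steps "Type=CUSTOM_JAR,Name=scylla-migrator,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[$(join_commas "${SPARK_ARGS[@]}")]")

# Print a shell-quoted copy of the command, or run it.
if [ "$DRY_RUN" = 1 ]; then
  printf '%q ' "${CMD[@]}"; echo
else
  "${CMD[@]}"
fi
```

The step arguments are kept in a Bash array and joined with commas because the AWS CLI shorthand syntax expects Args=[arg1,arg2,...]. The console, by contrast, takes them as one space-separated line.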

© 2025, ScyllaDB. All rights reserved. | Terms of Service | Privacy Policy | ScyllaDB, and ScyllaDB Cloud, are registered trademarks of ScyllaDB, Inc.
Last updated on 28 Apr 2025.
Powered by Sphinx 7.4.7 & ScyllaDB Theme 1.8.6