
Caution

You're viewing documentation for an unstable version of ScyllaDB Migrator. Switch to the latest stable version.

Set Up a Spark Cluster with Ansible

An Ansible playbook is provided in the ansible folder of the Migrator Git repository. The playbook installs the prerequisites and Spark on the master and worker hosts listed in the ansible/inventory/hosts file. The Migrator itself (scylla-migrator) is installed on the Spark master node.

Target OS: The Ansible playbook expects the target hosts to use an Ubuntu-compatible Linux distribution. Ubuntu 22.04 LTS and Ubuntu 24.04 LTS are most broadly tested, but other Ubuntu-compatible Linux distributions are likely to work as well.

Target User: The Ansible playbook connects to the target hosts via SSH as the user ubuntu, because this is the default user created by most AWS EC2 Ubuntu-based AMIs.

  1. Clone the Migrator Git repository:

    git clone https://github.com/scylladb/scylla-migrator.git
    cd scylla-migrator/ansible
    
  2. Update the ansible/inventory/hosts file with your master and worker instances.

  3. Update ansible/ansible.cfg with the location of your private key, if necessary.

  4. Review ansible/template/spark-env-master-sample and ansible/template/spark-env-worker-sample. They contain the environment variables that determine the number of workers, the CPUs per worker, and the memory allocations, along with considerations for setting them.
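
As a rough illustration of the arithmetic behind such settings, the sketch below splits a host's resources evenly across workers. SPARK_WORKER_INSTANCES, SPARK_WORKER_CORES, and SPARK_WORKER_MEMORY are standard Spark standalone variables, but whether the sample files use exactly these names is an assumption. The function name size_workers and the 8 GB reserve are hypothetical too, so defer to the comments in the sample files.

```shell
# Hypothetical helper: split a host's cores and memory evenly across Spark workers.
# Arguments: total cores, total memory in GB, number of worker instances.
# Assumption: 8 GB is held back for the OS and other processes (illustrative value).
size_workers() {
  local total_cores="$1" total_mem_gb="$2" workers="$3"
  local reserve_gb=8
  local cores_per_worker mem_per_worker

  cores_per_worker=$(( total_cores / workers ))
  mem_per_worker=$(( (total_mem_gb - reserve_gb) / workers ))

  echo "export SPARK_WORKER_INSTANCES=$workers"
  echo "export SPARK_WORKER_CORES=$cores_per_worker"
  echo "export SPARK_WORKER_MEMORY=${mem_per_worker}g"
}

# Example: a 16-core, 64 GB host running 4 workers.
size_workers 16 64 4
```

Integer division rounds down, so any leftover cores or memory simply stay unallocated.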

  5. Run ansible-playbook scylla-migrator.yml
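
Because the playbook connects over SSH, a problem with the private key tends to surface only once the run has started. The sketch below is one possible pre-flight check, based on the fact that OpenSSH refuses private keys that other users can access. The function name check_key is hypothetical and not part of the Migrator repository.

```shell
# Hypothetical pre-flight check: verify an SSH private key before running the playbook.
# Succeeds only if the file exists and grants no permissions to group or others.
check_key() {
  local key="$1" perms
  if [ ! -f "$key" ]; then
    echo "key not found: $key" >&2
    return 1
  fi
  # Characters 5-10 of the ls mode string are the group and other permission bits.
  perms=$(ls -ld -- "$key" | cut -c5-10)
  if [ "$perms" != "------" ]; then
    echo "key is accessible by group or others; try: chmod 600 $key" >&2
    return 1
  fi
}
```

It could be chained in front of the playbook run, for example check_key <path-to-private-key> && ansible-playbook scylla-migrator.yml, where the key path is whatever you configured in ansible.cfg.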

  6. On the Spark master node:

    cd scylla-migrator
    ./start-spark.sh
    
  7. On the Spark worker nodes:

    ./start-slave.sh
    
  8. Open the Spark web console:

    • Ensure networking is configured to allow you to access the Spark master node via TCP ports 8080 and 4040.

    • Visit http://<spark-master-hostname>:8080

  9. Review and modify config.yaml based on whether you’re performing a migration to CQL or Alternator:

    • If you’re migrating to the ScyllaDB CQL interface (from Apache Cassandra, ScyllaDB, or another CQL source), make a copy of config.yaml.example, review the comments in it, and edit as directed.

    • If you’re migrating to Alternator (from DynamoDB or another ScyllaDB Alternator instance), make a copy of config.dynamodb.yml, review the comments in it, and edit as directed.

  10. As part of the Ansible deployment, sample submit jobs were created. You may edit and use them.

    • For CQL migration: edit scylla-migrator/submit-cql-job.sh and change the line --conf spark.scylla.config=config.yaml \ to point to whatever you named your configuration file in the previous step.

    • For Alternator migration: edit scylla-migrator/submit-alternator-job.sh and change the line --conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.dynamodb.yml \ to reference the configuration file you created and modified in the previous step.
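
If you would rather script this edit than make it by hand, the sketch below rewrites only the value after spark.scylla.config= and leaves the rest of the line, including the trailing backslash, untouched. The function name set_migrator_config and any file name you pass to it are hypothetical. It assumes the new path contains no spaces, "|", or "&".

```shell
# Hypothetical helper: point a submit script at a different Migrator config file.
# Arguments: path to the submit script, new config file path.
# Assumption: the new path contains no spaces, "|" or "&" (they would confuse sed).
set_migrator_config() {
  local script="$1" new_path="$2" tmp status
  tmp=$(mktemp) || return 1
  # Replace everything between "spark.scylla.config=" and the next space.
  sed "s|\(spark\.scylla\.config=\)[^ ]*|\1${new_path}|" "$script" > "$tmp" &&
    cat "$tmp" > "$script"   # overwrite in place so the script keeps its executable bit
  status=$?
  rm -f "$tmp"
  return $status
}
```

For example, set_migrator_config submit-cql-job.sh my-config.yaml would switch the CQL job to a copy named my-config.yaml (an illustrative name).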

  11. Ensure the table has been created in the target environment.

  12. Submit the migration by running the appropriate job:

    • CQL migration: ./submit-cql-job.sh

    • Alternator migration: ./submit-alternator-job.sh

  13. You can monitor progress in the Spark web console you opened in step 8. Additionally, after the job has started, you can track progress via http://<spark-master-hostname>:4040.

    Note: When no Spark job is running, the progress page on port 4040 is unavailable. It renders only while a Spark job is in progress.
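
Since the page on port 4040 only exists while a job is running, it can be convenient to poll for it after submitting a job. The following is a minimal sketch that assumes curl is installed. The function name wait_for_ui and its default attempt count and delay are hypothetical.

```shell
# Hypothetical helper: poll a URL until it answers, e.g. the Spark job UI on port 4040.
# Arguments: URL, maximum attempts (default 30), seconds between attempts (default 5).
wait_for_ui() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silences progress output, -f treats HTTP errors as failure, -o discards the body.
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      echo "reachable: $url"
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$delay"
  done
  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}
```

After submitting a job, you might run wait_for_ui http://<spark-master-hostname>:4040 and open the page once it reports success.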

© 2025, ScyllaDB. All rights reserved. | Terms of Service | Privacy Policy | ScyllaDB, and ScyllaDB Cloud, are registered trademarks of ScyllaDB, Inc.
Last updated on 28 Apr 2025.