
Set Up a Spark Cluster with Docker¶

This page describes how to set up a Spark cluster locally on your machine by using Docker containers. It requires Docker and Git.

Note

This approach is useful if you do not need a high level of performance and want to quickly try out the Migrator without setting up a real cluster of nodes. For production use, however, we recommend a real cluster.

  1. Clone the Migrator repository.

    git clone https://github.com/scylladb/scylla-migrator.git
    cd scylla-migrator
    
  2. Download the latest release of scylla-migrator-assembly.jar and put it in the directory migrator/target/scala-2.13/.

    mkdir -p migrator/target/scala-2.13
    wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar \
      --directory-prefix=migrator/target/scala-2.13
    

    Alternatively, download a specific release of scylla-migrator-assembly.jar.
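    For example, pinning a specific version could look like the following sketch. The tag v1.0.0 is a placeholder, not a recommendation; pick an actual tag from the project's GitHub releases page.

```shell
# Placeholder tag; substitute a real tag from the GitHub releases page.
VERSION="v1.0.0"
# GitHub serves release assets under releases/download/<tag>/<asset>.
URL="https://github.com/scylladb/scylla-migrator/releases/download/${VERSION}/scylla-migrator-assembly.jar"
echo "$URL"
# Then fetch it into the same directory as in the previous command:
# wget "$URL" --directory-prefix=migrator/target/scala-2.13
```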

  3. Start the Spark cluster.

    docker compose up -d
    
  4. Open the Spark web UI.

    http://localhost:8080

  5. Rename the file config.yaml.example to config.yaml, and configure it according to your needs.
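    For orientation, the sketch below shows the general shape of a config.yaml for a Cassandra-to-ScyllaDB migration. The field names and values are illustrative, written from memory of the example file, and may differ in your version; always start from config.yaml.example and consult the Configuration Reference for the authoritative schema.

```yaml
source:
  type: cassandra
  host: cassandra-server     # illustrative: address of the cluster you migrate from
  port: 9042
  keyspace: my_keyspace      # illustrative keyspace/table names
  table: my_table
target:
  type: scylla
  host: scylla-server        # illustrative: address of the cluster you migrate to
  port: 9042
  keyspace: my_keyspace
  table: my_table
savepoints:
  path: /app/savepoints      # where resumable progress is recorded
  intervalSeconds: 300
```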

  6. Finally, run the migration.

    docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
      --master spark://spark-master:7077 \
      --conf spark.driver.host=spark-master \
      --conf spark.scylla.config=/app/config.yaml \
      <... other arguments> \
      /jars/scylla-migrator-assembly.jar
    

    The spark-master container mounts the ./migrator/target/scala-2.13 directory at /jars and the repository root at /app.

    See the page Run the Migration for a complete description of the expected arguments to spark-submit, and replace “<… other arguments>” above with the appropriate arguments.

  7. Monitor progress in the Spark web UI you opened in step 4. Additionally, once the job has started, you can follow its progress at http://localhost:4040.

    Note that the Spark application UI at port 4040 is only served while a Spark job is running; when no job is active, that page is unavailable.
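    To check from a script whether either UI is reachable, a small sketch (ui_status is a hypothetical helper, not part of the Migrator; it assumes curl is installed):

```shell
# Prints "up" if the given local port answers HTTP, "down" otherwise.
# Hypothetical convenience helper; assumes curl is installed.
ui_status() {
  if curl -sf "http://localhost:$1" > /dev/null 2>&1; then
    echo up
  else
    echo down
  fi
}

ui_status 8080   # Spark master UI (step 4)
ui_status 4040   # application UI; reports "down" until a job is running
```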

Last updated on 28 Apr 2025.