Set Up a Spark Cluster with Docker
This page describes how to set up a Spark cluster locally on your machine by using Docker containers. It requires Docker and Git.
This approach is useful if you do not need high performance and want to quickly try out the Migrator without setting up a real cluster of nodes. For production, however, we recommend using a real cluster.
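Before you follow the steps, you may want to confirm that the required tools are installed. The following is a minimal bash sketch. The function names require_cmd and check_prerequisites are hypothetical helpers, not part of the Migrator. Besides Docker and Git, the sketch also checks for wget, which is used below to download the JAR, and for the Docker Compose v2 plugin, which provides the docker compose command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Migrator) that check the tools this page uses.

# Return 0 if the named command is on PATH; otherwise report it and return 1.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required command: $1" >&2
    return 1
  fi
}

# Check all tools and report every missing one instead of stopping at the first.
check_prerequisites() {
  local status=0 cmd
  for cmd in docker git wget; do
    require_cmd "$cmd" || status=1
  done
  # "docker compose" (Compose v2) is a Docker CLI plugin, so probe it separately.
  if command -v docker >/dev/null 2>&1 && ! docker compose version >/dev/null 2>&1; then
    echo "missing Docker Compose plugin: docker compose" >&2
    status=1
  fi
  return "$status"
}
```

After sourcing the script, run check_prerequisites. It returns a non-zero status and lists every missing tool if anything is absent.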
1. Clone the Migrator repository.
git clone https://github.com/scylladb/scylla-migrator.git
cd scylla-migrator
2. Download the latest release of scylla-migrator-assembly.jar and put it in the directory migrator/target/scala-2.13/.
mkdir -p migrator/target/scala-2.13
wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar \
  --directory-prefix=migrator/target/scala-2.13
Alternatively, download a specific release of scylla-migrator-assembly.jar.
3. Start the Spark cluster.
docker compose up -d
4. Open the Spark web UI.
5. Rename the file …, and configure it according to your needs.
6. Finally, run the migration.
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=spark-master \
  --conf spark.scylla.config=/app/config.yaml \
  <... other arguments> \
  /jars/scylla-migrator-assembly.jar
The container mounts the ./migrator/target/scala-2.13 directory on /jars and the repository root on /app.
See the page Run the Migration for a complete description of the expected arguments, and replace “<... other arguments>” above with the appropriate arguments.
You can monitor progress in the Spark web console you opened in step 4. Additionally, after the job has started, you can track progress via the Spark progress page on port 4040.
FYI: When no Spark job is actively running, this page is unavailable; it renders, and is useful, only while a Spark job is in progress.
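The final step can be wrapped in a small bash script that checks the mounted files exist before calling spark-submit. This is a sketch under stated assumptions: it runs from the repository root, the configuration file is config.yaml at the repository root (the command reads /app/config.yaml, and the repository root is mounted on /app), and the names check_migrator_inputs, build_submit_cmd, run_migration, and SUBMIT_CMD are hypothetical. Any arguments passed to run_migration take the place of <... other arguments>.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the spark-submit command on this page.
# Assumes it runs from the repository root, where config.yaml lives and the
# JAR was downloaded into migrator/target/scala-2.13/.

JAR_DIR="migrator/target/scala-2.13"
JAR_NAME="scylla-migrator-assembly.jar"

# Return 1 (and list what is missing) if a file mounted into the container is absent.
check_migrator_inputs() {
  local status=0
  if [ ! -f "$JAR_DIR/$JAR_NAME" ]; then
    echo "JAR not found: $JAR_DIR/$JAR_NAME" >&2
    status=1
  fi
  if [ ! -f config.yaml ]; then
    echo "configuration not found: ./config.yaml" >&2
    status=1
  fi
  return "$status"
}

# Build the command in the SUBMIT_CMD array; the caller's arguments ("$@")
# go where the page shows "<... other arguments>", before the JAR path.
build_submit_cmd() {
  SUBMIT_CMD=(
    docker compose exec spark-master /spark/bin/spark-submit
    --class com.scylladb.migrator.Migrator
    --master spark://spark-master:7077
    --conf spark.driver.host=spark-master
    --conf spark.scylla.config=/app/config.yaml
    "$@"
    "/jars/$JAR_NAME"
  )
}

# Check the inputs, then run the migration inside the spark-master container.
run_migration() {
  check_migrator_inputs || return 1
  build_submit_cmd "$@"
  "${SUBMIT_CMD[@]}"
}
```

Building the command as a bash array keeps each argument intact, even one containing spaces, without extra quoting. After sourcing the script, call run_migration with the arguments described on the Run the Migration page.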