Set Up a Spark Cluster with Docker

This page describes how to set up a Spark cluster locally on your machine by using Docker containers. It requires Docker and Git.

Note

This approach is useful if you do not need a high level of performance and want to quickly try out the Migrator without setting up a real cluster of nodes. For production, we recommend using a real cluster.

Clone the Migrator repository.

git clone https://github.com/scylladb/scylla-migrator.git
cd scylla-migrator

Download the latest release of scylla-migrator-assembly.jar and put it in the directory migrator/target/scala-2.13/.

mkdir -p migrator/target/scala-2.13
wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar \
  --directory-prefix=migrator/target/scala-2.13

Alternatively, download a specific release of scylla-migrator-assembly.jar.
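As a rough sketch of the alternative, the helpers below build the download URL for a given release tag and fetch the jar into the same directory. They assume that the release publishes an asset named scylla-migrator-assembly.jar and that GitHub's usual releases/download/<tag>/<asset> layout applies. The function names are hypothetical, and the tag is a placeholder you need to take from the repository's releases page.

```shell
# Hypothetical helper: print the asset URL for the release tag given as $1.
# Assumes the release contains an asset named scylla-migrator-assembly.jar.
release_jar_url() {
  printf 'https://github.com/scylladb/scylla-migrator/releases/download/%s/scylla-migrator-assembly.jar\n' "$1"
}

# Hypothetical helper: download the jar for the tag given as $1
# into the directory that the Spark containers expect.
download_release_jar() {
  mkdir -p migrator/target/scala-2.13
  wget "$(release_jar_url "$1")" --directory-prefix=migrator/target/scala-2.13
}
```

After defining the functions, run download_release_jar with the tag of the release you want, from the repository root.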

Start the Spark cluster.

docker compose up -d

Open the Spark web UI.

http://localhost:8080
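The containers can take a few seconds to start, so the web UI may not respond immediately. A minimal sketch of a wait loop, assuming curl is installed on the host; the function name wait_for_url and the default of 30 attempts are illustrative choices, not part of the Migrator tooling.

```shell
# Hypothetical helper: poll a URL once per second until it responds
# or the attempt budget is used up.
#   $1: URL to poll
#   $2: maximum number of attempts (defaults to 30)
wait_for_url() {
  url=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail turns HTTP error statuses (400 and above) into a non-zero exit code.
    if curl --silent --fail --max-time 2 --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, wait_for_url http://localhost:8080 && echo "Spark web UI is up" returns as soon as the master's UI answers.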

Rename the file config.yaml.example to config.yaml, and configure it according to your needs.
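One possible way to do this from the shell is sketched below. It copies the example instead of renaming it, so the template stays available for reference, and it refuses to overwrite an existing config.yaml. It assumes that config.yaml.example sits in the repository root, as the /app/config.yaml path used later suggests; the function name init_config is hypothetical.

```shell
# Hypothetical helper: create config.yaml from the example, run from the repository root.
# Copies rather than renames, and never overwrites an existing config.yaml.
init_config() {
  if [ -e config.yaml ]; then
    echo "config.yaml already exists, leaving it untouched"
  else
    cp config.yaml.example config.yaml
  fi
}
```

Call init_config once, then edit config.yaml according to your needs.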

Finally, run the migration.

docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=spark-master \
  --conf spark.scylla.config=/app/config.yaml \
  <... other arguments> \
  /jars/scylla-migrator-assembly.jar

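The spark-master container mounts the ./migrator/target/scala-2.13 directory on /jars and the repository root on /app. This is why the command refers to the jar as /jars/scylla-migrator-assembly.jar and to the configuration file as /app/config.yaml.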
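Because of these mounts, a missing file on the host shows up as a missing file inside the container. The following preflight check is a small sketch, meant to be run from the repository root on the host before submitting the job; the function name preflight_check is hypothetical.

```shell
# Hypothetical helper: verify that the host files behind the container paths exist.
#   migrator/target/scala-2.13/scylla-migrator-assembly.jar -> /jars/scylla-migrator-assembly.jar
#   config.yaml                                             -> /app/config.yaml
preflight_check() {
  status=0
  for f in migrator/target/scala-2.13/scylla-migrator-assembly.jar config.yaml; do
    if [ -f "$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

The function returns a non-zero status if either file is absent, so it can guard the submit command, for example preflight_check && docker compose exec spark-master ... with the arguments shown above.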

For a complete description of the arguments that spark-submit expects, see the Run the Migration page, and replace "<... other arguments>" above with the appropriate arguments.

You can monitor progress in the Spark web UI you opened in step 4 (http://localhost:8080). Additionally, after the job has started, you can track its progress at http://localhost:4040.

Note that the page at port 4040 is served only while a Spark job is in progress. When no Spark job is running, the browser reports the page as unavailable.
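To tell the two situations apart from a terminal, a one-shot check along these lines may help. It is only a sketch, assuming curl is installed on the host; the function name job_ui_status and the wording of its messages are illustrative.

```shell
# Hypothetical helper: report whether the job progress page responds.
#   $1: URL to check (defaults to http://localhost:4040)
job_ui_status() {
  url=${1:-http://localhost:4040}
  if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
    echo "job UI available at $url"
  else
    echo "job UI unavailable at $url (no job running?)"
  fi
}
```

Running job_ui_status without arguments checks the default address. An "unavailable" result before or after a migration is expected and does not by itself indicate a problem with the cluster.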
