This page describes how to set up a Spark cluster locally on your machine by using Docker containers. It requires Docker and Git.
Note
This approach is useful if you do not need a high level of performance and want to quickly try out the Migrator without having to set up a real cluster of nodes. For production, we recommend using a real cluster.
Clone the Migrator repository.
git clone https://github.com/scylladb/scylla-migrator.git
cd scylla-migrator
Download the latest release of scylla-migrator-assembly.jar and put it in the directory migrator/target/scala-2.13/.
mkdir -p migrator/target/scala-2.13
wget https://github.com/scylladb/scylla-migrator/releases/latest/download/scylla-migrator-assembly.jar \
--directory-prefix=migrator/target/scala-2.13
Alternatively, download a specific release of scylla-migrator-assembly.jar.
Start the Spark cluster.
docker compose up -d
Open the Spark web UI.
Rename the file config.yaml.example to config.yaml, and configure it according to your needs.
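The step above can be sketched as a small shell function, assuming it is run from the repository root. The name prepare_config is hypothetical and not part of the Migrator. The sketch copies the example file instead of renaming it, so the original stays available as a reference, and it leaves an existing config.yaml untouched.

```shell
# Hypothetical helper (not part of the Migrator): create config.yaml
# from the example file without overwriting an existing config.yaml.
prepare_config() {
  if [ -e config.yaml ]; then
    echo "config.yaml already exists; leaving it untouched" >&2
    return 0
  fi
  # Copy rather than rename, so config.yaml.example remains as a template.
  cp config.yaml.example config.yaml
}
```

Run prepare_config from the repository root, then edit config.yaml according to your needs.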
Finally, run the migration.
docker compose exec spark-master /spark/bin/spark-submit --class com.scylladb.migrator.Migrator \
--master spark://spark-master:7077 \
--conf spark.driver.host=spark-master \
--conf spark.scylla.config=/app/config.yaml \
<... other arguments> \
/jars/scylla-migrator-assembly.jar
The spark-master container mounts the ./migrator/target/scala-2.13 directory on /jars and the repository root on /app.
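Because the container only sees what is mounted from the host, it can help to confirm that the files exist on the host before submitting the job. The following is a minimal sketch, assuming it is run from the repository root. The function name preflight is hypothetical; the two paths are the ones used on this page.

```shell
# Hypothetical helper (not part of the Migrator): check that the host files
# mounted into the spark-master container are present.
#   migrator/target/scala-2.13/scylla-migrator-assembly.jar -> /jars/scylla-migrator-assembly.jar
#   config.yaml                                             -> /app/config.yaml
preflight() {
  status=0
  for f in migrator/target/scala-2.13/scylla-migrator-assembly.jar config.yaml; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

If preflight reports a missing file, repeat the download or configuration step before running spark-submit.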
See the Run the Migration page for a complete description of the expected arguments to spark-submit, and replace “<... other arguments>” above with the appropriate arguments.
You can monitor progress in the Spark web UI you opened in step 4. Additionally, after the job has started, you can track progress via http://localhost:4040.
Note that the progress page on port 4040 is unavailable when no Spark job is running. It is served only while a Spark job is in progress.
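Since the page on port 4040 only appears once a job is running, one option is to poll it until it responds. The following is a minimal sketch, assuming curl is installed on the host. The function name wait_for_ui and its default retry values are hypothetical.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Usage: wait_for_ui URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_ui() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors (status 400 and above).
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "available: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unavailable: $url" >&2
  return 1
}
```

For example, wait_for_ui http://localhost:4040 polls the progress page every 2 seconds, up to 30 times, and returns a non-zero status if the page never becomes available.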