Was this page helpful?
Caution
You're viewing documentation for an unstable version of ScyllaDB Migrator. Switch to the latest stable version.
An Ansible playbook is provided in the ansible folder of our Git repository. The Ansible playbook will install the pre-requisites, Spark, on the master and workers added to the ansible/inventory/hosts
file. Scylla-migrator will be installed on the spark master node.
Target OS: The Ansible playbook expects the target hosts to use an Ubuntu-compatible Linux distribution. Ubuntu 22.04 LTS and Ubuntu 24.04 LTS are most broadly tested, but other Ubuntu-compatible Linux distributions are likely to work as well.
Target User: The Ansible playbook connects to the target hosts via SSH as the user ubuntu
, because this is the default user created by most AWS EC2 Ubuntu-based AMIs.
Clone the Migrator Git repository:
git clone https://github.com/scylladb/scylla-migrator.git
cd scylla-migrator/ansible
Update ansible/inventory/hosts
file with master and worker instances
Update ansible/ansible.cfg
with location of private key if necessary
The ansible/template/spark-env-master-sample
and ansible/template/spark-env-worker-sample
contain environment variables determining number of workers, CPUs per worker, and memory allocations - as well as considerations for setting them.
run ansible-playbook scylla-migrator.yml
On the Spark master node:
cd scylla-migrator
./start-spark.sh
On the Spark worker nodes:
./start-slave.sh
Open Spark web console
Ensure networking is configured to allow you access spark master node via TCP ports 8080 and 4040
visit http://<spark-master-hostname>:8080
Review and modify config.yaml based whether you’re performing a migration to CQL or Alternator
If you’re migrating to ScyllaDB CQL interface (from Apache Cassandra, ScyllaDB, or other CQL source), make a copy review the comments in config.yaml.example
, and edit as directed.
If you’re migrating to Alternator (from DynamoDB or other ScyllaDB Alternator), make a copy, review the comments in config.dynamodb.yml
, and edit as directed.
As part of ansible deployment, sample submit jobs were created. You may edit and use the submit jobs.
For CQL migration: edit
scylla-migrator/submit-cql-job.sh
, change line--conf spark.scylla.config=config.yaml \
to point to the whatever you named theconfig.yaml
in previous step.For Alternator migration: edit
scylla-migrator/submit-alternator-job.sh
, change line--conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.dynamodb.yml \
to reference theconfig.yaml
file you created and modified in previous step.
Ensure the table has been created in the target environment.
Submit the migration by submitting the appropriate job
CQL migration: ./submit-cql-job.sh
Alternator migration: ./submit-alternator-job.sh
You can monitor progress by observing the Spark web console you opened in step 7. Additionally, after the job has started, you can track progress via http://<spark-master-hostname>:4040
.
FYI: When no Spark jobs are actively running, the Spark progress page at port 4040 displays unavailable. It is only useful and renders when a Spark job is in progress.
Was this page helpful?