An Ansible playbook is provided in the ansible folder of our Git repository. The playbook installs the prerequisites, including Spark, on the master and worker nodes listed in the ansible/inventory/hosts file. Scylla-migrator is installed on the Spark master node.
The Ansible playbook expects to be run in an Ubuntu environment, by a user named ubuntu (like you get in AWS EC2 Ubuntu-based images).
Clone the Migrator Git repository:
git clone https://github.com/scylladb/scylla-migrator.git
cd scylla-migrator/ansible
Update the ansible/inventory/hosts file with the master and worker instances.
Update ansible/ansible.cfg with the location of the private key, if necessary.
The ansible/template/spark-env-master-sample and ansible/template/spark-env-worker-sample files contain environment variables that determine the number of workers, CPUs per worker, and memory allocations, along with considerations for setting them.
Run ansible-playbook scylla-migrator.yml
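Before running the playbook against real hosts, it can help to confirm that Ansible parses the playbook and can reach every host. The following is a minimal sketch, not part of the repository: preflight is a hypothetical helper name, and it assumes you run it from the scylla-migrator/ansible directory so that ansible.cfg and the inventory are picked up the same way as for the playbook command itself.

```shell
# Hypothetical pre-flight helper (not shipped with scylla-migrator).
# Assumes the current directory is scylla-migrator/ansible.
preflight() {
  # Parse the playbook without executing any task.
  ansible-playbook scylla-migrator.yml --syntax-check || return 1
  # Show which hosts the playbook would target.
  ansible-playbook scylla-migrator.yml --list-hosts || return 1
  # Confirm SSH connectivity and a usable Python on every inventory host.
  ansible all -m ping
}

# Usage: preflight && ansible-playbook scylla-migrator.yml
```

If the ping step fails, the usual suspects are the host entries in the inventory file or the private key location in ansible.cfg.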
On the Spark master node:
cd scylla-migrator
./start-spark.sh
On the Spark worker nodes:
./start-slave.sh
Open the Spark web console:
Ensure networking is configured to allow you to access the Spark master node via TCP ports 8080 and 4040.
Visit http://<spark-master-hostname>:8080
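If the console does not load, a quick TCP check from your workstation can show whether the ports are reachable at all. This is a rough sketch under assumptions: check_port and check_spark_ui are hypothetical helper names, and the check relies on bash's /dev/tcp pseudo-device and the coreutils timeout command being available locally.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Hypothetical wrapper: report on the two Spark ports, 8080 and 4040.
check_spark_ui() {
  for port in 8080 4040; do
    if check_port "$1" "$port"; then
      echo "$1:$port reachable"
    else
      echo "$1:$port NOT reachable"
    fi
  done
}

# Usage: check_spark_ui <spark-master-hostname>
```

Port 4040 is expected to report as not reachable until a Spark job is actually running, so only a failure on port 8080 points to a networking problem at this stage.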
Review and modify config.yaml based on whether you’re performing a migration to CQL or Alternator:
If you’re migrating to the ScyllaDB CQL interface (from Apache Cassandra, ScyllaDB, or another CQL source), make a copy of config.yaml.example, review the comments in it, and edit as directed.
If you’re migrating to Alternator (from DynamoDB or another ScyllaDB Alternator), make a copy of config.dynamodb.yml, review the comments in it, and edit as directed.
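Making the copy can be sketched as follows. copy_config is a hypothetical helper, and the target file names in the usage comment are examples only; the one design choice here is to refuse to overwrite an existing file so that earlier edits are not lost.

```shell
# Hypothetical helper: copy a sample config to a working name without
# overwriting a file you may already have edited.
# $1 = sample file, $2 = name of your working copy.
copy_config() {
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing $2" >&2
    return 1
  fi
  cp "$1" "$2"
}

# Usage, run in the directory holding the sample files
# (the target names below are examples, not required names):
#   copy_config config.yaml.example my-cql-migration.yaml
#   copy_config config.dynamodb.yml my-alternator-migration.yml
```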
As part of the Ansible deployment, sample submit jobs were created. You may edit and use these submit jobs.
For CQL migration: edit scylla-migrator/submit-cql-job.sh and change the line --conf spark.scylla.config=config.yaml \ to point to whatever you named your copy of config.yaml in the previous step.
For Alternator migration: edit scylla-migrator/submit-alternator-job.sh and change the line --conf spark.scylla.config=/home/ubuntu/scylla-migrator/config.dynamodb.yml \ to reference the configuration file you created and modified in the previous step.
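Editing the submit script by hand is fine; if you prefer to script the change, something like the sketch below could work. set_migrator_config is a hypothetical helper, and it assumes the script contains a spark.scylla.config=... setting whose value ends at the next space, as in the lines quoted above.

```shell
# Hypothetical helper: rewrite the value of spark.scylla.config in a submit script.
# $1 = submit script, $2 = path to your config file
# (the path must not contain '|', '&', or backslashes, which are special to sed).
set_migrator_config() {
  tmp="$(mktemp)" || return 1
  # Replace everything after "spark.scylla.config=" up to the next space.
  sed "s|\(spark\.scylla\.config=\)[^ ]*|\1$2|" "$1" > "$tmp" &&
    cat "$tmp" > "$1"   # write back in place so the script keeps its executable bit
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (my-cql-migration.yaml is an example name):
#   set_migrator_config submit-cql-job.sh my-cql-migration.yaml
```

Afterwards, grep spark.scylla.config on the script shows whether the line now points at the intended file.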
Ensure the table has been created in the target environment.
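One way to confirm that the target table exists is to ask the target cluster to describe it. The sketch below is illustrative only: the helper names are hypothetical, the host, keyspace, table, and endpoint values are placeholders you supply, and it assumes cqlsh (for CQL targets) or the AWS CLI (for Alternator targets) is installed where you run it.

```shell
# Hypothetical helper for CQL targets.
# $1 = target node address, $2 = keyspace.table
cql_table_exists() {
  cqlsh -e "DESCRIBE TABLE $2" "$1"
}

# Hypothetical helper for Alternator targets.
# $1 = Alternator endpoint URL, $2 = table name
alternator_table_exists() {
  aws dynamodb describe-table --table-name "$2" --endpoint-url "$1"
}

# Usage (placeholders, not real values):
#   cql_table_exists <target-node> <keyspace>.<table>
#   alternator_table_exists <alternator-endpoint-url> <table>
```

Both commands exit with a non-zero status when the table is missing, so they can gate the submit step in a script.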
Submit the migration by running the appropriate job:
CQL migration: ./submit-cql-job.sh
Alternator migration: ./submit-alternator-job.sh
You can monitor progress in the Spark web console you opened earlier. Additionally, after the job has started, you can track progress via http://<spark-master-hostname>:4040.
FYI: When no Spark job is actively running, the Spark progress page on port 4040 is unavailable. It renders, and is useful, only while a Spark job is in progress.
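Progress can also be checked from the command line. Spark exposes a monitoring REST API under /api/v1 on the same port as the application UI, so a sketch like the following could be used, assuming curl is installed; spark_apps is a hypothetical helper name.

```shell
# Hypothetical helper: list running applications via Spark's monitoring REST API.
# $1 = Spark master hostname. Returns non-zero while no job is running,
# because nothing is listening on port 4040 then.
spark_apps() {
  curl -sf --max-time 5 "http://$1:4040/api/v1/applications"
}

# Usage: spark_apps <spark-master-hostname>
```

The response is JSON. A non-zero exit status here most often just means that no job is currently in progress.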