The MySQL Montreal Meetup is coming up on November 6th, with MySQL’s Morgan Tocker presenting. According to the description, performance and scalability will be among the topics. I thought I’d jot down a few questions for the upcoming discussion.
The broad question I’d like to see addressed is this:
How to scale MySQL on AWS?
AWS’s S3 and EC2 are a leading duo in the space of virtual storage and computing services. These services are true game changers, and that includes MySQL deployments. As more and more people realize the convenience and economic benefits of running their next LAMP web apps on EC2, we need to figure out the best practices for MySQL deployments in those new environments. A few approaches come to mind, but first let’s list some pertinent assumptions about the environment we’re dealing with:
- S3 sits behind a REST API that is slower than a local disk I/O interface.
- Data written to the disk of an EC2 instance is volatile: if the instance fails, the data is lost.
- The bandwidth between EC2 instances and S3 is free of charge.
- EC2 instances come with a local disk with the level of reliability of a generic (cheap) Linux box.
MySQL Replication
In this approach one would set up regular MySQL replication, with optional backups and automated slave-to-master promotion when the master fails.
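One part of automated promotion is deciding which slave takes over. Here is a minimal sketch of that decision, assuming each slave's replication coordinates have already been collected (for instance from the Relay_Master_Log_File and Exec_Master_Log_Pos fields of SHOW SLAVE STATUS). The ReplicaStatus class, the helper names and the host names are hypothetical, and a real failover tool would have to handle much more (fencing the old master, repointing the remaining slaves, and so on).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReplicaStatus:
    """Snapshot of one slave (hypothetical structure, for illustration only)."""
    host: str
    reachable: bool
    relay_master_log_file: str  # e.g. "mysql-bin.000012"
    exec_master_log_pos: int


def log_sequence(log_file: str) -> int:
    """Extract the numeric suffix of a binary log file name."""
    return int(log_file.rsplit(".", 1)[1])


def pick_promotion_candidate(
    replicas: Sequence[ReplicaStatus],
) -> Optional[ReplicaStatus]:
    """Return the reachable slave that has applied the most of the master's log."""
    live = [r for r in replicas if r.reachable]
    if not live:
        return None
    return max(
        live,
        key=lambda r: (log_sequence(r.relay_master_log_file), r.exec_master_log_pos),
    )


if __name__ == "__main__":
    candidates = [
        ReplicaStatus("db2", True, "mysql-bin.000012", 500),
        ReplicaStatus("db3", True, "mysql-bin.000013", 120),
    ]
    print(pick_promotion_candidate(candidates))
```

Comparing the log file sequence first and the position second matters because positions restart whenever the master rotates to a new binary log.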
RightScale has been building some of the tooling to support MySQL deployments on AWS. The approach is described in detail and is available as part of a commercial offering from RightScale.
What is the state of the art in the open source world?
CPU on EC2, data on S3
S3 cannot be mounted as a file system because its REST API doesn’t support all of the features that a file system provides (e.g. file locking). However, the S3 API can be exposed as a block device. One such implementation is ElasticDrive. Another is S3NDB.
In this approach the data would reside on such a block device. When an EC2 instance fails, you’d just launch another one and bind it to the same block device. Data access would be slower than accessing the local disk, but performance may be adequate for some applications.
How will MySQL perform in such an environment? Which applications are suitable to use this kind of setup? Which ones aren’t? How do you know if this approach is for you?
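To make the idea concrete, here is a toy sketch of a block device layered on a key/value object store. It is not how ElasticDrive or S3NDB work internally (I don't know their designs); a plain Python dict stands in for S3, and the class name, key layout and block size are all assumptions made for illustration.

```python
BLOCK_SIZE = 4096  # bytes per block (assumed value)


class ObjectBackedBlockDevice:
    """Toy block device that stores every block as one object in a key/value store."""

    def __init__(self, store: dict, volume: str):
        self.store = store    # stand-in for a remote object store such as S3
        self.volume = volume  # hypothetical volume name used as a key prefix

    def _key(self, block_no: int) -> str:
        return f"{self.volume}/block-{block_no:08d}"

    def write_block(self, block_no: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block device only accepts whole blocks")
        self.store[self._key(block_no)] = bytes(data)

    def read_block(self, block_no: int) -> bytes:
        # Blocks that were never written read back as zeros.
        return self.store.get(self._key(block_no), bytes(BLOCK_SIZE))


if __name__ == "__main__":
    remote = {}  # outlives any single "instance"
    first = ObjectBackedBlockDevice(remote, "mysql-data")
    first.write_block(0, b"a" * BLOCK_SIZE)

    # The first instance fails; a replacement binds to the same volume.
    second = ObjectBackedBlockDevice(remote, "mysql-data")
    print(second.read_block(0) == b"a" * BLOCK_SIZE)
```

In a real setup every read_block or write_block that misses a local cache would turn into a REST round trip, which is where the slowdown relative to a local disk comes from.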
Software RAID
In this approach one would use the local disks of two EC2 instances to set up a software RAID solution such as DRBD. When one of the instances fails, there would need to be some fail-over mechanism to switch MySQL from one instance to the other. This approach is more expensive than the two above because one would have to run twice the number of instances.
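The detection half of such a fail-over mechanism can be sketched as a counter of missed health checks kept by the standby instance. This is only an illustration: the class name and the threshold of three checks are assumptions, and the actual promotion steps (making the replicated disk writable on the standby, starting MySQL, redirecting clients) are left out.

```python
class FailoverMonitor:
    """Decides when a standby should take over, based on consecutive failed checks."""

    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed  # assumed threshold; tune per deployment
        self.missed = 0
        self.promoted = False

    def record_check(self, primary_alive: bool) -> bool:
        """Record one health check; return True only on the check that triggers promotion."""
        if self.promoted:
            return False
        if primary_alive:
            # Any successful check resets the count, so brief hiccups are ignored.
            self.missed = 0
            return False
        self.missed += 1
        if self.missed >= self.max_missed:
            self.promoted = True
            return True
        return False


if __name__ == "__main__":
    monitor = FailoverMonitor(max_missed=3)
    for alive in [True, False, False, False]:
        if monitor.record_check(alive):
            print("primary considered dead, promoting standby")
```

Requiring several consecutive failures trades a slower reaction for fewer false promotions, which matters because two instances both acting as the primary is worse than a short outage.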
Does anyone use MySQL with such a setup? Is it worth it?
Looking forward to the meetup.