PagerDuty Blog

Trading up Your Engine: How to Move Your IOPS-heavy MySQL/Rails Stack to Unicode Without Downtime

Out with the old, in with the new

You’re a techie working for one of the multitude of startups that rushed to market, where the founders hastily glued a Rails app together with candy-bar wrappers and tinfoil.  Once it became obvious that enthusiasm was no substitute for raw coding power, developers were hired to paper over holes in the software architecture.  Finally, when those developers realized what manner of untamed beast the app was, they hired you to clean up the mess and make things pretty.

You know your stack.  You’ve got an old MySQL database; probably MySQL 5.0 or 5.1.  It was set up with default settings (read: we support English) from day one, and likely the only real change (“advancement”) anyone added since then is a read-slave and asynchronous replication.  After years of continued operation in this mode, your devs have come up with a thousand unmaintainable, awful fixes to allow some non-ASCII characters to be stored in BLOB fields.  Meanwhile, your support people complain that most of the planet gets errors using your app with their non-romanized names, and management is annoyed at the sheer number of subtly different transliteration functions in the code.

This was the situation at PagerDuty several months back, and this article discusses how we fixed it – how we transitioned from MySQL 5.1 storing latin1 (ISO-8859-1) characters to the shiny MySQL 5.5 with unicode (UTF-8) characters… and how you never noticed.

The Problem With MySQL in a Nutshell

The character sets used by MySQL when writing data to disk impose some limitations on your application.  A naïve user might claim that MySQL doesn’t need to know anything about character sets, which would make sense if you wanted poor performance when sorting your strings – your CHARs and VARCHARs.  Since you want to take advantage of database indexing to perform implicit server-side sorts (in-process client-side sorting: please die), MySQL has to understand the characters you’re using so that it has a context to sort in that isn’t just ordinal value.  Unfortunately, the default character set MySQL understands is latin1, which excludes symbols used by approximately 90% of the world.  A Unicode character set like UTF-8 is much more appropriate when you intend to store multinational-symbol strings without resorting to BLOBs.

MySQL character sets are cooked into a column at column create time.  Of course, MySQL has long allowed you to ALTER TABLE and modify this property, which makes it easier to move from one character set to another, but ALTER TABLE locks the whole table while doing its work, which is no good for live applications under heavy write load, where your users expect continued responsiveness.  Something a little more involved is necessary.  This is the story of that something.
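
For the record, this is the one-liner you’d reach for on a quiet table – a hypothetical example against a made-up users table, on a server that already speaks utf8mb4 – and it is exactly the table-locking operation the rest of this post works around:

-- simple, but it rebuilds the table and blocks writes for the duration
ALTER TABLE users
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;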

Before You Get Started, Read the Requirements

At PagerDuty, we saw this challenge as a surmountable technical obstacle that shouldn’t impact our business.  Namely:

  • For as long as we continue to use MySQL, we never again want to migrate datastores due to symbol-related storage/input problems (we want to accept a universal symbol set).
  • This switch needed to have at most negligible impact on the ongoing performance of the PagerDuty application (no new cloud infrastructure could be brought up for the purposes of event throughput).
  • Corollary:  no significant storage resources should be newly allocated to accommodate UTF-8-encoded MySQL characters (we allow for at most 2 times the old storage requirements; this is not unreasonable given that most of our users will simply use romanized characters, expecting anything else to fail).
  • The whole procedure that gets this out the door should incur negligible (< 1 minute) downtime for our users.

Sound ambitious? This is the minimum set of conditions we were given, and we’re pleased to say we succeeded in meeting all of them.

MySQL – Unwinding a Plethora of Insanity

MySQL makes converting to UTF-8 incredibly painful, in order to try to cover up the limitations of the InnoDB engine.  We begin by discussing problems with indices over CHAR/VARCHAR data, assuming you use InnoDB (which we used, because at least our server was not from the Stone Age).

Did you know that InnoDB has low limits on the size of single-column indices?  We didn’t either, but we found out what lengths those sneaky MySQL devs went to in trying to keep this from hurting you, the unwary user.  You see, MySQL 5.1’s “utf8” encoding is not true UTF-8.  UTF-8 supports symbols between 1 and 4 bytes long.  MySQL’s utf8 supports symbols only between 1 and 3 bytes in size.  This breaks our first objective – to support all characters.  In order to solve that little oversight, we used the “utf8mb4” encoding supplied in MySQL 5.5[1]… except we weren’t yet running 5.5.  Our solution to this problem required flipping database servers (I did say there’d be a bit of downtime!) – but we’ll get to that.
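
To make the 3-byte ceiling concrete, here’s a minimal sketch (table and column names are made up, the exact failure depends on your sql_mode, and the client connection must itself speak utf8mb4 for the 4-byte literal to arrive intact):

CREATE TABLE utf8_test    (s VARCHAR(10)) DEFAULT CHARSET=utf8;
CREATE TABLE utf8mb4_test (s VARCHAR(10)) DEFAULT CHARSET=utf8mb4;

-- a 4-byte symbol: rejected with an "Incorrect string value" error in
-- strict mode, or silently mangled at the offending character otherwise
INSERT INTO utf8_test    VALUES ('😀');
-- the same symbol stored intact
INSERT INTO utf8mb4_test VALUES ('😀');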

Initial testing of MySQL 5.5 was positive until we attempted to recreate our production tables via mysqldump[2] with UTF-8 encoding in place of latin1:

mysqldump -d the_database | sed -e "s/\(.*DEFAULT CHARSET=\)latin1/\1utf8mb4/" | mysql the_database_utf8

Please don’t hit us.   We were startled by this strange error:

ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

Oh woe is MySQL!  To have seen what it has seen, see what it sees!  Indeed, if you glance at the fine print, InnoDB only supports single-column indices at most 767 bytes in size.  Perhaps that’s why the “utf8” encoding only supports a maximum of 3 bytes per character: it means conversions to utf8 from other charsets work when column indices are involved.  Consider: in order for the index comparator to be fast, all entries need to be the same size – the maximum combined size of the column(s) they span.  With a VARCHAR(255), a pretty standard column type, and MySQL’s crippled utf8 encoding, max_length_of_string * max_size_of_char expands to 255 * 3 => 765.  With utf8mb4, 255 * 4 => 1020.  Oops.  What a pickle.
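
You can reproduce that arithmetic in isolation (an illustrative table, not one of ours; depending on your strict-mode settings the bad case surfaces as the ERROR 1071 above or as a warning plus a silently truncated index prefix):

CREATE TABLE idx_test (
  name VARCHAR(255),
  KEY (name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;      -- 255 * 3 = 765 bytes: squeaks in under 767

CREATE TABLE idx_test_mb4 (
  name VARCHAR(255),
  KEY (name)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;   -- 255 * 4 = 1020 bytes: too big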

Thankfully, the link which describes this limitation also describes the workaround (allowing index size to grow to a max of 3072 bytes for a single column), which led to some of the following lines in our /etc/my.cnf file:

[client]
default-character-set = utf8mb4

[mysqld]
default-storage-engine = INNODB
sql-mode="NO_ENGINE_SUBSTITUTION"

# file_per_table is required for large_prefix
innodb_file_per_table
# file_format = Barracuda is required for large_prefix
innodb_file_format = Barracuda
# large_prefix gives max single-column indices of 3072 bytes = win!
# we'll also need to set ROW_FORMAT=DYNAMIC on each table, though.
innodb_large_prefix

character-set-client-handshake = FALSE
collation-server = utf8mb4_unicode_ci
# a repeated init-connect overrides the earlier one, so issue both statements in a single line
init-connect='SET NAMES utf8mb4; SET collation_connection = utf8mb4_unicode_ci'
character-set-server = utf8mb4

[mysqldump]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

We’re convinced that there’s a more concise way to get what you want out of MySQL, but as the old adage tells us about this situation:  “take off, nuke the site from orbit… it’s the only way to be sure.”  If you boot up a server with this my.cnf file and run mysql_install_db, CREATE TABLE statements specifying ROW_FORMAT=DYNAMIC will do the right thing, and give you CHAR/VARCHAR strings that can be indexed, while also supporting all the symbols you could ever want.
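
Concretely, with that configuration in place, a table definition along these lines (names invented for illustration) creates cleanly, 255-character indexed VARCHAR and all:

CREATE TABLE incidents (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  description VARCHAR(255) NOT NULL,
  KEY index_incidents_on_description (description)
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8mb4
  COLLATE=utf8mb4_unicode_ci
  ROW_FORMAT=DYNAMIC;   -- 255 * 4 = 1020 bytes, legal now that large_prefix raises the cap to 3072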

There is a somewhat related problem here: multi-column indices are also bound to a maximum of 3072 bytes.  This one might be harder to solve.  We didn’t have a clever solution to the issue – only one composite index was affected by it, and that index happened to run over a table with few rows (which was thus ALTERable).  The index ran over the column phone_number, which was needlessly a VARCHAR(255), so a quick ALTER TABLE (well, its abstracted cousin, the Rails “migration”) took care of this for us and sized it down.
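
The raw SQL behind that migration looked something like this – the table name and the new length here are purely illustrative, not our actual schema:

ALTER TABLE contact_methods
  MODIFY phone_number VARCHAR(64);
-- 64 * 4 = 256 bytes for this column's share of the composite index,
-- comfortably inside the 3072-byte ceiling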

Collations:  Learn to Stop Worrying and Love Unicode

Our thinking was that suddenly enlarging indices would turn our snappy MySQL server into a lumbering behemoth.  This turned out not to be the case – the net performance result of our migration was essentially neutral, teetering on a small speed gain!  If you’re running a semi-modern Rails application, chances are this will be true for you too.  The reason?  Collations.

Collations tell MySQL how to sort strings so that they make sense; intuitively, the string “abc” comes before “bbc” in an ascending sort because the leading ‘a’ alphabetically precedes ‘b’.  However, complicated characters require more complicated rules.  For example, the Deutsches Institut für Normung (DIN) defines two possible latin1 collations: DIN-1 (German dictionary) ordering defines the ‘ß’ symbol as equivalent to ‘s’, and DIN-2 (German phone books) defines ß = “ss” (among other differences).  After the reduction is performed, a standard (English) lexical sort is used.
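
The same kind of collation-dependent equality shows up in MySQL’s Unicode collations, and it’s easy to poke at from a utf8mb4 connection (results per the MySQL collation documentation’s ß examples):

SELECT 'ß' = 's'  COLLATE utf8mb4_general_ci;   -- 1: general_ci folds ß down to s
SELECT 'ß' = 'ss' COLLATE utf8mb4_unicode_ci;   -- 1: unicode_ci expands ß to ss, DIN-2 style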

This matters when client connections want a query ordered by a string field, and thus you need some way to sort the strings.  MySQL collations, if used properly, give you that ordering for basically free (as long as you have an index over the string fields).  One often disregarded prerequisite to this benefit is that both client and server must be speaking the same character set – and the same collation.  It turns out that until our UTF-8 database migration, this was not true at PagerDuty.

Consider your Rails application.  Chances are, you’re using either the MySQL/Ruby or the mysql2 gem to power ActiveRecord.  Whichever it is, it’s reading from a database.yml file – which specifies what as the encoding type?  Oh, utf8?  If you dig through the gem code for a while (which we ended up having to do), you’ll notice that encoding gets passed into the MySQL connection settings; it becomes the character set (and determines the collation) used to talk to MySQL.  The fact that you set this to utf8 while talking to a latin1-backed DB is the core of the speedup you’re about to make.

Fact: this entire time, you’ve been wasting CPU cycles on sorting.  MySQL has abstracted this away, and has given you strings sorted in an order that your client (Rails) app understood, all the while having to maintain a mapping between a UTF-8 collation (probably utf8_general_ci) and whatever collation your tables have been bound by.  Don’t believe me?  Watch what happens when you set the client and server to both run at utf8mb4 with the same collation (we chose utf8mb4_unicode_ci; see here for a discussion of Unicode collation differences in MySQL).  Enjoy the speedup.  Thank me later.
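
A quick way to see whether you’re in the mismatched camp is to ask a connection opened by your app what it actually negotiated.  If the client and connection rows below come back as a utf8 flavour while the server and database rows say latin1, you’ve been paying the translation tax all along:

SHOW SESSION VARIABLES LIKE 'character_set_%';
SHOW SESSION VARIABLES LIKE 'collation_%';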

Keep Your Data Chugging:  Migrate + Replicate + Upgrade

It took us this much text, but at last we get to the tricky part:  how are you going to migrate your old, coal-powered datastore?  You’ve already seen a clever hack using mysqldump + sed to load data into a new server.  But your old database is still under write – so now what?  The solution here involves MySQL throwing you a bone with its insistence on knowing and separating client/server character sets.

We’d love to see the internals of how this works, but unfortunately I wasn’t able to take the time to read the MySQL source code or find credible information about this.  Using the above config file for our new server, simply setting up master/slave replication between our old DB and the new 5.5 UTF-8 one worked flawlessly.  We tested all manner of latin1 characters inserted into the old database, and they came back without issue in the replicated copy.  MySQL was doing all the correct translations, and we just had to sit back and watch.  Once the mesmerizing effect wore off, it was time to do some work – namely, all of our webservers needed to have their mysql-client packages updated.  For you see, mysql-client 5.1 doesn’t speak utf8mb4, and will have some issues talking to your 5.5 server also.
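
For the curious, the slave-side hookup was nothing exotic – plain asynchronous replication, sketched here with placeholder host, user, and binlog coordinates (none of these are our real values; take yours from the dump or from SHOW MASTER STATUS):

-- on the new 5.5 utf8mb4 server, after loading the converted dump
CHANGE MASTER TO
  MASTER_HOST='old-master.example.internal',
  MASTER_USER='repl',
  MASTER_PASSWORD='...',
  MASTER_LOG_FILE='mysql-bin.000123',
  MASTER_LOG_POS=4;
START SLAVE;
SHOW SLAVE STATUS\G   -- watch Slave_IO_Running / Slave_SQL_Running / Seconds_Behind_Master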

In order to do this, we used chef to quickly spin up new app-backend servers – clones of our existing servers – but with the new mysql-client version, and configured so that the background workers (all of PagerDuty’s queue processing and asynchronous tasks) were disabled.  The wonderful fruits of the cloud – occasionally useful!  These servers were pointed at the slave database, and were already configured (via chef environment settings, which make much more sense to those who use chef) to fully support utf8mb4 all the way through Rails via mysql2.  With these bad boys ready (and some testing to ensure that they worked the way our testing environments had), we were ready to flip our database.

Do it, Do it Now!  Come on, Flip Me!

The flip process is that incredibly risky moment when you’re simply not sure if everything will work or if you missed some crucial detail, and your customers are about to be very unhappy.  You nervously go through your checklist, making sure that you underscore the critical moment of no return.  In our flip, we had the following components:

  • shut down current background workers on the old app-backends
  • (at this point we’re no longer processing notification sending, but still queueing requests)
  • lock the master database
  • (at this point we are fully stopped – this is the downtime you were warned about!  New requests are frozen)
  • stop and reset the slave
  • run chef on our customer-facing load-balancers, bringing them into the new chef environment and changing them to use our newly spun-up machines as app-backends
  • (at this point we’re taking requests again; pre-flip requests will time out)
  • spin up background workers on the new app-backends
  • (at this point we are fully functional)
  • terminate the old app-backends

As you can imagine, since I’m talking to you about it now, all of these executed without error.  You’ve read in another post how our background processes run, and how easy it is to script monit, particularly in conjunction with chef, to shut down and spin up our background tasks.  Chef-client executions on our load-balancers typically complete within 20 seconds as a result of tireless work by our ops team, so we knew this would be the upper bound for our downtime.  The only SQL commands we had to run to jog things into action were:

(on the master)

BEGIN;
FLUSH TABLES WITH READ LOCK;

(on the slave, once you verify it’s caught up with the master)

STOP SLAVE;
RESET SLAVE;
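
(How do you know it’s caught up?  Nothing fancy – the standard replication check, sketched here for completeness rather than as the exact commands we ran:)

SHOW SLAVE STATUS\G
-- wait for Seconds_Behind_Master: 0 and for the slave's execution position
-- to match the master's position reported by SHOW MASTER STATUS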

That was it.  The stress was gone, and you, the customer, barely noticed we were temporarily ignoring your events.  All that was left was leisurely reconfiguration of our slaves and backups to target a new MySQL server.  Oh, and Rails needed some love; here’s what we added to our application’s config/initializers/activerecord_ext.rb:

module ActiveRecord
  module ConnectionAdapters
    module SchemaStatements
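      # Patch create_table so every table created by a Rails migration picks up
      # utf8mb4, our chosen collation, and the DYNAMIC row format automatically.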
      def create_table_with_dynamic_row_format(table_name, options = {}, &block)
        new_options = options.dup
        new_options[:options] ||= ""
        new_options[:options] << " DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC"
        create_table_without_dynamic_row_format(table_name, new_options, &block)
      end

      alias_method_chain :create_table, :dynamic_row_format
    end
  end
end

Post-Mortem

If you made it this far, congratulations dear reader – you’re dedicated.  Hopefully, this article inspires you to do good and erase the stains of years of predominantly latin1 applications in your company.  Give that engine a good overhaul.  Be warned though… in the technologically-misimplemented world of Unicode, the excitement never ends.  There are always more components to bring into the 21st-century language mix.

[1] If you oppose Han Unification, then after all the effort here we unfortunately still don’t support your eclectic characters.

[2] Your mysqldump should be set to use the utf8 character set (a strict superset of latin1) to generate your text files, otherwise you may see a whackload of gibberish inserted into your new DB.