Migrating a live server to another host with no downtime

I have had a 1U server co-located for some time now at iWeb Technologies' datacenter in Montreal. So far I've had no issues and it did a wonderful job hosting websites & a few other VMs, but because of my concern for its aging hardware I wanted to migrate away before disaster struck.

Modern VPS offerings are a steal in terms of the performance they offer for the price, and Linode's 4096 plan caught my eye as a nice sweet spot. Backed by powerful CPUs and SSD storage, their VPS is blazingly fast; the only downside is that I would lose some RAM and HDD-backed storage compared to my 1U server. The bandwidth provided with the Linode was also a nice bump up from my previous 10Mbps, 500GB/mo traffic limit.

When CentOS 7 was released, I took the opportunity to start modernizing my CentOS 5 configuration and testing it on the new release. I wanted to ensure full continuity for client-facing services - other than a nice speed boost, clients should have to take no manual action on their end to reconfigure their devices or domains.

I also wanted to ensure zero downtime. While the DNS A records are being migrated, I didn't want emails coming in to the wrong server (or clients checking stale inboxes until they started seeing the new mailserver IP). I can easily configure Postfix to relay all incoming mail on the CentOS 5 server to the IP of the CentOS 7 one to avoid any loss of emails, but there's still the issue that some end users might connect to the old server and get served their old IMAP inbox for some time.

So first things first: after developing a prototype VM that offered the same service set, I went about buying a small Linode for a month to test the configuration with some of my existing user data from my CentOS 5 server. MySQL was sufficiently easy to migrate over and Dovecot was able to preserve all UUIDs, so my inbox continued to sync seamlessly. Apache complained a bit when importing my virtual host configurations due to the new 2.4 syntax, but nothing a few sed commands couldn't fix. So with full continuity out of the way, I had to develop a strategy to handle zero downtime.
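The sed commands themselves are not shown here, so the following is only a sketch of the kind of rewrite involved, not the exact commands that were used. It assumes the complaints came from the 2.2-style access directives (Order / Allow from all / Deny from all), which Apache 2.4 replaces with Require. The function name is made up for illustration, and it only handles the blanket allow/deny cases, so anything more selective should be reviewed by hand.

```shell
# Rewrite the most common Apache 2.2 access directives to 2.4 syntax.
# Reads a config on stdin and writes the converted config to stdout.
# Only the blanket "all" cases are handled; other Allow/Deny lines are left alone.
convert_access_directives() {
  sed -E \
    -e '/^[[:space:]]*Order[[:space:]]+(allow,deny|deny,allow)[[:space:]]*$/d' \
    -e 's/^([[:space:]]*)Allow[[:space:]]+from[[:space:]]+all[[:space:]]*$/\1Require all granted/' \
    -e 's/^([[:space:]]*)Deny[[:space:]]+from[[:space:]]+all[[:space:]]*$/\1Require all denied/'
}

# Example (file names are placeholders):
#   convert_access_directives < vhost-2.2.conf > vhost-2.4.conf
```

Writing to a new file rather than editing in place makes it easy to diff the result against the original before loading it.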

With some foresight and DNS TTL adjustments, we can get near zero downtime assuming all resolvers comply with your TTL. Simply set your TTL to 300 (5 minutes) a day or so before the migration occurs and as your old TTL expires, resolvers will see the new TTL and will not cache the IP for as long. Even with a short TTL, that's still up to 5 minutes of downtime and clients often do bad things... The IP might still be cached (e.g. at the ISP, router, OS, or browser) for longer. Ultimately, I'm the one that ends up looking bad in that scenario even though I have done what I can on the server side and have no ability to fix the broken clients.
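To see whether a given resolver has picked up the shorter TTL, one option is to look at the TTL column of a dig answer. The helper below is a small sketch that pulls that column out of `dig +noall +answer` output; the function name, the hostname and the resolver address are placeholders, not values from my setup.

```shell
# Print the remaining TTL of each A record found in `dig +noall +answer` output.
# Answer lines have the form: name TTL class type address
answer_ttls() {
  awk '$3 == "IN" && $4 == "A" { print $2 }'
}

# Example (www.example.com and 192.0.2.53 are placeholders):
#   dig +noall +answer www.example.com A @192.0.2.53 | answer_ttls
```

A caching resolver reports the time remaining on its cached copy, so running this repeatedly against the same resolver should show the value counting down from at most 300 once the old TTL has expired.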

To work around this, I discovered an incredibly handy tool, socat, that can make magic happen. socat routes data between sockets, network connections, files, pipes, you name it. Installing it is as easy as: yum install socat

A quick script later and we can forward all connections from the old host to the new host:


# The new server's address must be provided in NEWIP
: "${NEWIP:?set NEWIP to the address of the new server}"

# Stop services on this host
for SERVICE in dovecot postfix httpd mysqld; do
  /sbin/service "$SERVICE" stop
done

# Some cleanup
rm -f /var/lib/mysql/mysql.sock

# Map the new server's MySQL to localhost:3307
# Assumes capability for password-less (e.g. pubkey) login
# -f backgrounds ssh after authentication, -N runs no remote command
ssh -f -N -L 3307:localhost:3306 "$NEWIP"
socat unix-listen:/var/lib/mysql/mysql.sock,fork,reuseaddr,unlink-early,unlink-close,user=mysql,group=mysql,mode=777 TCP:localhost:3307 &

# Map ports from each service to the new host
for PORT in 110 995 143 993 25 465 587 80 3306; do
  echo "Starting socat on port $PORT..."
  socat TCP-LISTEN:"$PORT",fork TCP:"${NEWIP}":"${PORT}" &
  sleep 1
done

And just like that, every connection made to the old server is immediately forwarded to the new one. This includes the MySQL socket (which is automatically used instead of a TCP connection when a host of 'localhost' is passed to MySQL).

Note how we establish an SSH tunnel mapping a connection to localhost:3306 on the new server to port 3307 on the old one instead of simply forwarding the connection and socket to the new server - this is done so that if you have users who are permitted on 'localhost' only, they can still connect (forwarding the connection directly would deny access because it arrives from an unauthorized remote host).
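Since each socat is started in the background, a failed bind (for example because a service did not stop cleanly) is easy to miss. As a rough sanity check, the sketch below reads a listing from `netstat -ltn` (or `ss -ltn`) and prints any of the given ports that nothing is listening on. The function name is made up for illustration, and it relies on the local address being the fourth column in both tools' output.

```shell
# Read `netstat -ltn` or `ss -ltn` output on stdin and print every port
# given as an argument that has no listener.
missing_listeners() {
  local listening port
  # The local address is column 4; the port is whatever follows the last colon.
  listening=$(awk '{ n = split($4, parts, ":"); print parts[n] }')
  for port in "$@"; do
    printf '%s\n' "$listening" | grep -qxF -- "$port" || echo "$port"
  done
}

# Example, using the same port list as the forwarding script:
#   netstat -ltn | missing_listeners 110 995 143 993 25 465 587 80 3306
```

No output means every port in the list has a listener; it does not prove that the new host is answering on the other end.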

Update: a friend has pointed out this video to me, if you thought 0 downtime was bad enough... These guys move a live server 7km through public transport without losing power or network!

Alex Williamson's talk at KVM Forum 2014

Alex gave a very interesting talk at KVM Forum 2014 about the current state of VGA passthrough using KVM & VFIO:
Also, I think nVidia is making an incredibly silly choice by (apparently accidentally) causing Code 43 in their drivers when virtualization is detected and then refusing to fix the bugs. Virtualization is becoming ever more powerful and this is just going to push potential customers away to AMD. Once they establish a reputation for their cards not working well with virtualization, they're going to have trouble regaining customer confidence even if they reverse their stance on not fixing the Code 43 bugs.

Sharing your Cyberduck bookmarks between computers via cloud sync (Dropbox, Google Drive or Copy)

Cyberduck recently removed a particularly useful piece of information from their wiki regarding the sharing of bookmarks because it is no longer compatible with the sandboxed variant of Cyberduck available from the App Store. It is, however, still compatible with the Windows and OS X download available directly from its website.

To set up bookmark sharing between Cyberduck clients (works with both OS X and Windows), simply create a folder in your cloud sync folder and then point Cyberduck to it.

On OS X, open a Terminal and execute:

defaults write ch.sudo.cyberduck ~/Dropbox/Cyberduck

On Windows, press Super+R (Super is the key with the Windows logo on it) to open the "Run" dialog, and enter %APPDATA%. Next, open the Cyberduck.exe_Url_[some_garble]\[Version]\user.config file and modify the config file to add the new parameter:

<setting name="CdSettings" serializeAs="Xml">
      <setting name="" value="C:\Users\yourname\Dropbox\Cyberduck" />

Advanced Server Monitoring with Riemann and Graphite

My current server monitoring setup is documented in my CentOS 5 server tutorials. It consists of Nagios for service monitoring and Cacti for graphing of metrics including system load, network and disk space.

Both tools are very commonly used and lots of resources are available on their setup & configuration, but I could never shake the feeling that they were plain clunky. Over the past several months, I have done a fair bit of research and evaluated a variety of tools, and thankfully came across the monitoring sucks effort, which aims to collect blog posts on monitoring tools and their different merits and weaknesses. The collection of all the documentation is now kept in the monitoring sucks GitHub repo.

Long story short, each tool seems to only do part of the job. I hate redundancy, and I believe that a good monitoring system would:

  1. provide an overview of the current service status;
  2. notify you appropriately and in a timely manner when things go wrong;
  3. provide a historical overview of data to establish some sort of baseline / normal level for collected metrics (e.g. graphs and 99th percentiles); and
  4. ideally, be able to react proactively when things go wrong.

You'll find that most tools will do two of the four above well, which is just enough to be annoyingly useful. You'll need to implement 2-3 overlapping tools that each do one thing well and the others just okay. Well, I don't like to live with workarounds.

Choosing the right tool for the job

I did a bit of research and solicited some advice on r/sysadmin, but sadly it did not get enough upvotes to attract much attention. Collectd looked like a wonderful utility. It is simple, high-performance and focused on doing one thing well. It was trivial to get it writing tons of system metrics to RRD files, at which point Visage provided a smooth user interface. Although it was a step in the right direction as far as what I was looking for, it still only did two of the four items above.

Introducing Riemann

Then, I stumbled across Riemann through its author's Monitorama 2013 presentation. Although it is not the easiest to configure and its notification support is a bit lacking, it has several features that immediately piqued my interest:

  • Its architecture forgoes the traditional polling and instead processes arbitrary event streams.
    • Events can contain data (the metric) as well as other information (hostname, service, state, timestamp, tags, ttl)
    • Events can be filtered by their attributes and transformed (percentiles, rolling averages, etc)
    • Monitoring new machines is as easy as pushing to your Riemann server from the new host
    • Embed a Riemann client into your application or web service and easily add application level metrics
    • Let collectd do what it does best and have it shove the machine's health metrics to Riemann as an event stream
  • It is built for scale, and can handle thousands of events per second
  • Bindings (clients) are available in multitudes of languages
  • Has (somewhat primitive) support for notifications and reacting to service failures, but Riemann is extensible so you can add what you need
  • An awesome, configurable dashboard

All of this is described more adequately and in greater detail on its homepage. So how do you get it?

Installing Riemann

This assumes you are running CentOS 6 or later (or a recent version of Fedora). In the case of CentOS, it also assumes that you have installed the EPEL repository.

yum install ruby rubygems jre-1.6.0
gem install riemann-tools daemonize
rpm -Uhv
chkconfig riemann on
service riemann start

Be sure to open ports 5555 (both TCP and UDP) and 5556 (TCP) in your firewall. Riemann uses 5555 for event submission and 5556 for a WebSockets connection to the server.
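As a sketch of what that could look like with plain iptables on EL6 (an assumption on my part; adapt it if you manage your firewall some other way), the helper below only prints the rules so they can be reviewed before anything is applied. The function name is made up for illustration.

```shell
# Print (but do not run) the iptables commands that open Riemann's ports:
# 5555/tcp and 5555/udp for event submission, 5556/tcp for WebSockets.
riemann_firewall_rules() {
  echo "iptables -I INPUT -p tcp --dport 5555 -j ACCEPT"
  echo "iptables -I INPUT -p udp --dport 5555 -j ACCEPT"
  echo "iptables -I INPUT -p tcp --dport 5556 -j ACCEPT"
}

# Review the output first, then apply it as root and persist it:
#   riemann_firewall_rules | sh
#   service iptables save
```

These rules accept connections from anywhere; restricting them to the hosts that will actually submit events (with -s) is probably a good idea on a public-facing server.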

Riemann is now ready to go and accept events. You can modify your configuration at /etc/riemann/riemann.config as required - here is a sample from my test installation:

; -*- mode: clojure; -*-
; vim: filetype=clojure

(logging/init :file "/var/log/riemann/riemann.log")

; Listen on the local interface over TCP (5555), UDP (5555), and websockets (5556)
(let [host "my.hostname.tld"]
  (tcp-server :host host)
  (udp-server :host host)
  (ws-server  :host host))

; Expire old events from the index.
(periodically-expire 5)

; Custom stuffs

; Graphite server - connection pool
(def graph (graphite {:host "localhost"}))
; Email handler
(def email (mailer {:from "riemann@my.hostname.tld"}))

; Keep events in the index for 5 minutes by default.
(let [index (default :ttl 300 (update-index (index)))]

  ; Inbound events will be passed to these streams:
  (streams

    ; Events tagged "rollingavg" are turned into rates and percentiles;
    ; everything else is indexed and sent to Graphite as-is.
    (where (tagged "rollingavg")
      (rate 5
        (percentiles 15 [0.5 0.95 0.99] index)
        index graph)
      (else
        index graph))

    ; Calculate an overall rate of events.
    (with {:metric 1 :host nil :state "ok" :service "events/sec" :ttl 5}
      (rate 5 index))

    ; Log expired events.
    (expired
      (fn [event] (info "expired" event)))))

The default configuration was modified here to do a few things differently:

  • Check for expired events every 5 seconds
  • Automatically calculate percentiles for events tagged with rollingavg
  • Send all event data to Graphite for graphing and archival
  • Set an email handler that, with some minor changes, could be used to send service state change notifications

Installing Graphite

Graphite can take data processed by Riemann and store it long-term, while also giving you tons of neat graphs.

yum --enablerepo=epel-testing install python-carbon python-whisper graphite-web httpd

We now need to edit /etc/carbon/storage-schemas.conf to tweak the time density of retained metrics. Since Riemann supports processing events quickly, I like to retain events at a higher precision than the default settings:

# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
#  [name]
#  pattern = regex
#  retentions = timePerPoint:timeToStore, timePerPoint:timeToStore, ...

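# Carbon's internal metrics. This entry should match what is specified in
# CARBON_METRIC_PREFIX and CARBON_METRIC_INTERVAL settings
[carbon]
pattern = ^carbon\.
retentions = 60:90d

#[default_1min_for_1day]
#pattern = .*
#retentions = 60s:1d

# The section name is an arbitrary label
[default]
pattern = .*
retentions = 10s:1h, 1m:7d, 15m:30d, 1h:2y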
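To get a feel for what a retention line costs, it helps to work out how many data points each archive holds: the time to store divided by the time per point. The sketch below does that arithmetic for the retentions above. The helper names are made up for illustration, and it assumes a year counts as 365 days.

```shell
# Convert a Whisper-style duration (10s, 1m, 1h, 30d, 2y, or bare seconds)
# to seconds.
to_seconds() {
  local value unit
  value=${1%[smhdwy]}
  unit=${1#"$value"}
  case "$unit" in
    ''|s) echo "$value" ;;
    m) echo $(( value * 60 )) ;;
    h) echo $(( value * 3600 )) ;;
    d) echo $(( value * 86400 )) ;;
    w) echo $(( value * 604800 )) ;;
    y) echo $(( value * 31536000 )) ;;
  esac
}

# Number of data points held by one "timePerPoint:timeToStore" archive.
retention_points() {
  local step keep
  step=$(to_seconds "${1%%:*}")
  keep=$(to_seconds "${1##*:}")
  echo $(( keep / step ))
}

total=0
for spec in 10s:1h 1m:7d 15m:30d 1h:2y; do
  points=$(retention_points "$spec")
  echo "$spec -> $points points"
  total=$(( total + points ))
done
echo "total: $total points per metric"
```

That works out to 360 + 10080 + 2880 + 17520 = 30840 points per metric. At roughly 12 bytes per point on disk, each Whisper file should come to around 360 KB, which adds up quickly if you collect thousands of metrics.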

After making your changes, start the carbon-cache service:

service carbon-cache start
chkconfig carbon-cache on
touch /etc/carbon/storage-aggregation.conf

Now that Graphite's storage backend, Carbon, is running, we need to start Graphite:

python /usr/lib/python2.6/site-packages/graphite/ syncdb
chown apache:apache /var/lib/graphite-web/graphite.db
service httpd graceful

Graphite should now be available on http://localhost - if this is undesirable, edit /etc/httpd/conf.d/graphite-web.conf and map it to a different hostname / URL according to your needs.
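A quick way to check that Carbon is actually accepting data, independently of Riemann, is to send it a point by hand. Carbon's plaintext protocol is one line per data point in the form "path value timestamp", received on port 2003 by default. The sketch below formats such a line; the function name and the metric name are made up, and it assumes nc is installed.

```shell
# Format one data point in Carbon's plaintext protocol: "path value timestamp".
# The timestamp defaults to the current time.
graphite_line() {
  local path value ts
  path=$1
  value=$2
  ts=${3:-$(date +%s)}
  printf '%s %s %s\n' "$path" "$value" "$ts"
}

# Example: send a test point to carbon-cache on its default plaintext port.
#   graphite_line test.migration.check 42 | nc localhost 2003
```

If everything is wired up, the test metric should show up in the Graphite tree shortly afterwards.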

Note: as of writing, there's a bug in the version of python-carbon shipped with EL6 that complains incessantly to your logs if the storage-aggregation.conf configuration file doesn't exist. Let's create it to avoid a hundred-megabyte log file:

touch /etc/carbon/storage-aggregation.conf

But what about EL5?

I am not going to detail how to install the full Riemann server on EL5, as the dependencies are far behind and it would require quite a bit of work. However, it is possible to install riemann-tools on RHEL/CentOS 5 for monitoring the machine with minimal work.

The riemann-health initscript requires the 'daemonize' command; install it via yum (EL6) or obtain it for EL5 here:

The riemann-tools ruby gem and its dependencies will require a few development packages in order to build, as well as Karan's repo providing an updated ruby-1.8.7:

cat << EOF >> /etc/yum.repos.d/karan-ruby.repo
yum update ruby\*
yum install ruby-devel libxml2-devel libxslt-devel libgcrypt-devel libgpg-error-devel
gem install riemann-tools --no-ri --no-rdoc

Building a home media server with ZFS and a gaming virtual machine

Work has kept me busy lately so it's been a while since my last post... I have been doing lots of research and collecting lots of information over the holiday break and I'm happy to say that in the coming days I will be posting a new server setup guide, this time for a server that is capable of running redundant storage (ZFS RAIDZ2), sharing home media (Plex Media Server, SMB, AFP) as well as a full Windows 7 gaming rig simultaneously!

Windows runs in a virtual machine and is assigned its own real graphics card from the host's hardware using the brand-new VFIO PCI passthrough technique with the VGA quirks enabled. This does require a motherboard and CPU with support for IOMMU, more commonly known as VT-d or AMD-Vi.