One of my pet projects at SoftLayer is looking at a small collection of fancy scripts that scan through all registered Internet domain names to see how many of them are hosted on SoftLayer’s infrastructure. There are a lot of fun little challenges involved, but one of the biggest challenges is managing the distribution of work so that this scan doesn’t take all year. Queuing services are great for task distribution, and for my initial implementation I decided to give running a RabbitMQ instance a try, since at the time it was the only queuing service I was familiar with. Overall, it took me about a week and one beefy server to go from “I need a queue,” to “I have a queue that is actually doing what I need it to.”
While what I had set up worked, looking back, there is a lot about RabbitMQ that I didn’t really have the time to figure out properly. Around the time I finished the first run of this project, Bluemix announced that its MQLight service would allow connections from non-Bluemix resources. So when I got some free time, I decided to move the project to a Bluemix-hosted MQ Light queue, and take some notes on how the migration went.
To better understand how much work was involved, let me quickly explain how the whole “scanning through every registered domain for SoftLayer hosted domains” thing works.
There are three main moving parts in the project:
- The Parser, which is responsible for reading through zone files (which are obtained from the various registrars), filtering out duplicates, and putting nicely formatted domains into a queue.
- The Resolver, which is responsible from taking the nicely formatted domains from queue #1, looking up the domain’s IP address, and putting the result into queue #2.
- The Checker, which takes the domains from queue #2, checks to see if the domains’ IPs belong to SoftLayer or not, and saves the result in a database.
Each queue entry is a package of about 500 domains, which is roughly 200Kb of text data consisting of the domain and some meta-data that I used to see how well everything was performing. There are around 160 million domains I need to review, and resolving a single domain can take anywhere from .001 seconds to four seconds, so being able to push domains quickly through queues is very important.
Things to be aware of
Going into this migration, I made a lot of assumptions about how things worked that caused me grief. So if you are in a similar situation, here is what I wish someone had told me.
AMQP 1.0: MQLight implements the AMQP 1.0 protocol, which is great, because it is the newest and greatest. As everyone knows, newer is usually better. The problem is that my application was using the python-pika library to connect to RabbitMQ, both of which implement AMQP 0.9, which isn’t fully compatible with AMQP 1.0. The Python library I was using gave me a version error when trying to connect to MQ Light. This required a bit of refactoring of my code in order to get everything working properly. The core ideas are the same, but some of the specific API calls are slightly different.
Persistence: Messages sent to a MQ Light queue without active subscribers will be lost, which took me a while to figure out. The UI indicates when this happens, so this is likely just a problem of me not reading the documentation properly and assuming MQ Light worked like RabbitMQ.
Threads: The python-mqlight library uses threads fairly heavily, which is great for performance, but it makes programming a little more thought intensive. Make sure you wait for the connection to initialize before sending any messages, and make sure all your messages have been sent in before exiting.
Apache Proton: MQ Light is built on the Apache Qpid Proton project, and the Python library python-mqlight also uses this.
Setting up MQ Light
Aside from those small issues I mentioned, MQ Light was really easy to set up and start using, especially when compared to running my own RabbitMQ instance.
- Set up the MQ Light Service in Bluemix.
- Install the python-mqlight library (or whatever library supports your language of choice). There are a variety of MQ Light Libraries.
- Try the send/receive examples.
- Write some code.
- Watch the messages come in, and profit.
That’s all there is to it. As a developer, the ease with which I can set up services to try is one of the best things about Bluemix, with MQ Light making a great addition to its portfolio of services.
Some real numbers
After I re-factored my code to be able to use either the pika or python-mqlight libraries interchangeably, I ran a sample set of data through each library to see what impact they had on overall performance, and I was pleasantly surprised to see the results.
Doing a full run-through of all domains would take about seven hours, so I ran this test with only 10,364 domains. Below are the running times for each section, in seconds.
This server was running on a 4 core, 49G Ram VSI.
Since I am using the free, shared tier of MQ Light, I was honestly expecting much worse results. Having only a few seconds increase in runtime was a really big win for MQ Light.
Overall, I was very pleased working with MQ Light, and I highly suggest it as a starting place for anyone wanting to check out queuing services. It was easy to set up, free to try out, and pretty simple once I started to understand the basics.