A planet of blogs from our members...

Tim Hopper: Pyspark's AggregateByKey Method

I can't find a (py)Spark aggregateByKey example anywhere, so I made a quick one.
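To give a sense of what aggregateByKey does without a running Spark cluster, here is a sketch of its semantics in pure Python (a stand-in, not Spark itself): within each partition, values are folded per key with a sequence op starting from a zero value, and partition-level results are then merged with a combine op. The per-key mean via (sum, count) pairs is the classic use case.

```python
def aggregate_by_key(pairs, zero_value, seq_op, comb_op, partitions=2):
    """Simulate RDD.aggregateByKey over a list of (key, value) pairs."""
    # Split pairs across fake partitions, aggregate each, then merge.
    parts = [pairs[i::partitions] for i in range(partitions)]
    merged = {}
    for part in parts:
        local = {}
        for k, v in part:
            local[k] = seq_op(local.get(k, zero_value), v)
        for k, acc in local.items():
            merged[k] = comb_op(merged[k], acc) if k in merged else acc
    return merged

# Per-key (sum, count) accumulators, then means.
seq_op = lambda acc, v: (acc[0] + v, acc[1] + 1)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])

data = [("a", 1), ("a", 3), ("b", 5)]
sums_counts = aggregate_by_key(data, (0, 0), seq_op, comb_op)
means = {k: s / float(c) for k, (s, c) in sums_counts.items()}
```

In actual PySpark, the same `seq_op`/`comb_op` pair would be passed to `rdd.aggregateByKey((0, 0), seq_op, comb_op)`.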

Tim Hopper: Sundry Links for September 30, 2014

Hammock: A lightweight wrapper around the Python requests module to convert REST APIs into "dead simple programmatic APIs". It's a clever idea. I'll have to play around with it before I can come up with a firm opinion.

pipsi: Pipsi wraps pip and virtualenv to allow you to install Python command line utilities without polluting your global environment.

Writing a Command-Line Tool in Python: Speaking of Python command line utilities, here's a little post from Vincent Driessen on writing them.

Iterables vs. Iterators vs. Generators: Vincent has been on a roll lately. He also wrote this "little pocket reference on iterables, iterators and generators" in Python.

Design for Continuous Experimentation: Talk and Slides: I didn't watch the lecture, but Dan McKinley's slides on web experimentation are excellent.

Apache Spark: A Delight for Developers: I've been playing with PySpark lately, and it really is fun.

Caktus Group: Celery in Production

(Thanks to Mark Lavin for significant contributions to this post.)

In a previous post, we introduced using Celery to schedule tasks.

In this post, we address things you might need to consider when planning how to deploy Celery in production.

At Caktus, we've made use of Celery in a number of projects, ranging from simple out-of-band tasks (sending emails or creating image thumbnails) to complex workflows that catalog and process large (10+ GB) files for encryption and remote archival and retrieval. Celery has a number of advanced features (task chains, task routing, auto-scaling) to fit most task workflow needs.

Simple Setup

A simple Celery stack would contain a single queue and a single worker which processes all of the tasks as well as schedules any periodic tasks. Running the worker would be done with

python manage.py celery worker -B

This assumes the django-celery integration, but there are plenty of docs on running the worker (locally as well as daemonized). We typically use supervisord, for which there is an example configuration, but init.d, upstart, runit, or god are all viable alternatives.

The -B option runs the scheduler for any periodic tasks. It can also be run as its own process. See starting-the-scheduler.

We use RabbitMQ as the broker, and in this simple stack we would store the results in our Django database or simply ignore all of the results.

Large Setup

In a large setup we would make a few changes. Here we would use multiple queues so that we can prioritize tasks, and for each queue, we would have a dedicated worker running with the appropriate level of concurrency. The docs have more information on task routing.

The beat process would also be broken out into its own process.

# Default queue
python manage.py celery worker -Q celery
# High priority queue. 10 workers
python manage.py celery worker -Q high -c 10
# Low priority queue. 2 workers
python manage.py celery worker -Q low -c 2
# Beat process
python manage.py celery beat

Note that high and low are just names for our queues, and don't have any implicit meaning to Celery. We allow the high queue to use more resources by giving it a higher concurrency setting.
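Routing tasks to those queues happens in settings; a hypothetical old-style django-celery configuration might look like the following (the task paths are made up for illustration):

```python
# Tasks not matched below fall back to the default "celery" queue.
CELERY_DEFAULT_QUEUE = "celery"
CELERY_ROUTES = {
    "myapp.tasks.send_alert": {"queue": "high"},     # hypothetical task path
    "myapp.tasks.archive_files": {"queue": "low"},   # hypothetical task path
}
```

Each worker from the commands above then consumes only from the queue named by its `-Q` flag.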

Again, supervisor would manage the daemonization and group the processes so that they can all be restarted together. RabbitMQ is still the broker of choice. With the additional task throughput, the task results would be stored in something with high write speed: Memcached or Redis. If needed, these worker processes can be moved to separate servers, but they would have a shared broker and results store.

Scaling Features

Creating additional workers isn't free. The default pool spawns a separate process for each worker, with one worker per CPU by default. Pushing the concurrency far above the number of CPUs can quickly pin the memory and CPU resources on the server.

For I/O heavy tasks, you can dedicate workers using either the gevent or eventlet pools rather than new processes. These can have a lower memory footprint with greater concurrency but are both based on greenlets and cooperative multi-tasking. If there is a library which is not properly patched or greenlet safe, it can block all tasks.

There are some notes on using eventlet, though we have primarily used gevent. Not all of the features are available on all of the pools (time limits, auto-scaling, built-in rate limiting). Previously gevent seemed to be the better supported secondary pool, but eventlet seems to have closed that gap or surpassed it.

The process and gevent pools can also auto-scale. It is less relevant for the gevent pool since the greenlets are much lighter weight. As noted in the docs, you can implement your own subclass of the Autoscaler to adjust how/when workers are added or removed from the pool.

Common Patterns

Task state and coordination is a complex problem. There are no magic solutions whether you are using Celery or your own task framework. The Celery docs have some good best practices which have served us well.

Tasks must assert the state they expect when they are picked up by the worker. You won't know how much time has passed between when the task was queued and when it executes. Another similar task might have already carried out the operation if there is a backlog.
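A minimal sketch of that state-assertion pattern (the names here are hypothetical, not from any real project): the task re-checks the record when it actually runs, because a duplicate queued task may already have done the work.

```python
def send_invoice(invoice, deliver):
    """Process `invoice` (a dict-like record) unless it's already sent.

    `deliver` is whatever side-effecting callable actually sends it.
    """
    if invoice["sent"]:
        return False  # another task already handled it during a backlog
    deliver(invoice)
    invoice["sent"] = True
    return True
```

In a real Celery task you would pass an ID and re-fetch the row inside the task, rather than serializing the object itself, so the check reflects current state.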

We make use of a shared cache (Memcache/Redis) to implement task locks or rate limits. This is typically done via a decorator on the task. One example is given in the docs though it is not written as a decorator.
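Written as a decorator, a cache-based task lock might be sketched like this (hypothetical code, not the docs' exact example). It assumes a memcached-style cache where `add()` succeeds only if the key was absent, which makes the acquire step atomic:

```python
import functools

def task_lock(cache, timeout=300):
    """Allow only one concurrent run of the decorated task per worker fleet.

    Assumes `cache.add(key, value, timeout)` returns True only when the key
    did not already exist (memcached / Django cache semantics).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = "lock:%s" % func.__name__
            if not cache.add(key, "locked", timeout):
                return None  # another worker holds the lock; skip this run
            try:
                return func(*args, **kwargs)
            finally:
                cache.delete(key)
        return wrapper
    return decorator
```

The timeout guards against a crashed worker holding the lock forever; pick it to exceed the task's longest expected runtime.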

Key Choices

When getting started with Celery you must make two main choices:

  • Broker
  • Result store

The broker manages pending tasks, while the result store stores the results of completed tasks.

There is a comparison of the various brokers in the docs.

As previously noted, we use RabbitMQ almost exclusively, though we have used Redis successfully and experimented with SQS. We prefer RabbitMQ because Celery's message-passing style and much of its terminology were written with AMQP in mind. There are no caveats with RabbitMQ like there are with Redis, SQS, or the other brokers that have to emulate AMQP features.

The major caveat with both Redis and SQS is the lack of built-in late acknowledgment, which requires a visibility timeout setting. This can be important when you have long-running tasks. See acks-late-vs-retry.
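With Redis, for instance, those two settings interact directly; here's a hypothetical sketch using the old-style setting names of this era:

```python
# If a task isn't acknowledged before the visibility timeout expires, the
# broker re-delivers it, so a long-running task could run twice. Late acks
# widen that window, because the ack only happens after the task finishes.
CELERY_ACKS_LATE = True
BROKER_TRANSPORT_OPTIONS = {"visibility_timeout": 3600}  # seconds
```

Tune the timeout to comfortably exceed your longest task's runtime.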

To configure the broker, use BROKER_URL.

For the result store, you will need some kind of database. A SQL database can work fine, but using a key-value store can help take the load off the database, as well as provide easier expiration of old results which are no longer needed. Many people choose to use Redis because it makes a great result store, a great cache server, and a solid broker. AMQP backends like RabbitMQ are terrible result stores and should never be used for that, even though Celery supports it.

Results that are not needed should be ignored, using CELERY_IGNORE_RESULT or Task.ignore_result.

To configure the result store, use CELERY_RESULT_BACKEND.
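Taken together, a minimal settings sketch might look like this (hypothetical values; old-style setting names from the django-celery era):

```python
BROKER_URL = "amqp://guest:guest@localhost:5672//"   # RabbitMQ as the broker
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"   # fast key-value results
CELERY_IGNORE_RESULT = False  # set True to drop all task results globally
```

Per-task `ignore_result` overrides the global setting for tasks whose results you never read.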

RabbitMQ in production

When using RabbitMQ in production, one thing you'll want to consider is memory usage.

With its default settings, RabbitMQ will use up to 40% of the system memory before it begins to throttle, and even then can use much more memory. If RabbitMQ is sharing the system with other services, or you are running multiple RabbitMQ instances, you'll want to change those settings. Read the linked page for details.

Transactions and Django

You should be aware that Django's default handling of transactions can be different depending on whether your code is running in a web request or not. Furthermore, Django's transaction handling changed significantly between versions 1.5 and 1.6. There's not room here to go into detail, but you should review the documentation of transaction handling in your version of Django, and consider carefully how it might affect your tasks.


There are multiple tools available for keeping track of your queues and tasks. I suggest you try some and see which work best for you.


When going to production with your site that uses Celery, there are a number of decisions to be made that could be glossed over during development. In this post, we've tried to review some of the decisions that need to be thought about, and some factors that should be considered.

Tim Hopper: iOS's Launch Center Pro, Auphonic, and RESTful APIs

Lately I've been using Auphonic's web service for automating audio post-production and distribution. You can provide Auphonic with an audio file (via Dropbox, FTP, web upload, and more), and it will perform any number of tasks for you, including

  • Tag the track with metadata (including chapter markings)
  • Intelligently adjust levels
  • Normalize loudness
  • Reduce background noise and hums
  • Encode the audio in numerous formats
  • Export the final production to a number of services (including Dropbox, FTP, and Soundcloud)

I am very pleased with Auphonic's product, and it's replaced a lot of post-processing tools I tediously hacked together with Hazel, Bash, and Python.

Among its many other features, Auphonic has a robust RESTful API available to all users. I routinely create Auphonic productions that vary only in basic metadata, and I have started using this API to automate creation of productions from my iPhone.

Launch Center Pro is a customizable iOS app that can trigger all kinds of actions in other apps. You can also create input forms in LCP and send the data from them elsewhere. I created a LCP action with a form for entering title, artist, album, and track metadata that will eventually end up in a new Auphonic production.

The LCP action settings look like this2:

When I launch that action in LCP, I get four prompts like this:

After I fill out the four text fields, LCP uses the x-callback URL I defined to send that data to Pythonista, a masterful "integrated development environment for writing Python scripts on iOS."

In Pythonista, I have a script called New Production. LCP passes the four metadata fields I entered as sys.argv variables to my Python script. The Python script adds these variables to a metadata dictionary that it then POSTs to the Auphonic API using the Python requests library. After briefly displaying the output from the Auphonic API, Pythonista returns me to LCP.

Here's my Pythonista script1:


import sys
import json
import webbrowser

import requests

# Read input from LCP
title = sys.argv[1]
artist = sys.argv[2]
album = sys.argv[3]
track = sys.argv[4]

d = {
    "metadata": {
        "title": title,
        "artist": artist,
        "album": album,
        "track": track
    }
}

# POST production to Auphonic API (supply your own Auphonic credentials)
r = requests.post("https://auphonic.com/api/productions.json",
                  auth=("USERNAME", "PASSWORD"),
                  data=json.dumps(d),
                  headers={"content-type": "application/json"}).json()

# Display API response
print "Response:", r["status_code"]
print "Error:", r["error_message"]
for key, value in r.get("data", {}).get("metadata", {}).iteritems():
    if value:
        print key, ':', value

# Return to LCP (assumed URL scheme for Launch Center Pro)
webbrowser.open('launchpro://')

After firing my LCP action, I can log into my Auphonic account and see an incomplete3 production with the metadata I entered!

While I just specified some basic metadata with the API, Auphonic allows every parameter that can be set on the web client to be configured through the API. For example, you can specify exactly what output files you want Auphonic to create, or create a production using one of your presets. These details just need to be added to the d dictionary in the script above. Moreover, this same type of setup could be used with any RESTful API, not just Auphonic.
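A hedged sketch of a fuller payload: the "preset" and "output_files" keys are drawn from Auphonic's API documentation, but verify the exact parameter names there before relying on this, and note the preset UUID below is a placeholder.

```python
d = {
    "preset": "YOUR_PRESET_UUID",  # placeholder for one of your preset UUIDs
    "metadata": {"title": "Episode 1", "artist": "Tim"},
    "output_files": [
        {"format": "mp3", "bitrate": "96"},  # encode an MP3 at 96 kbps
        {"format": "aac"},                   # and an AAC version
    ],
}
```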

  1. If you want to use this script, you'll have to provide your own Auphonic username and password. 

  2. Here is that x-callback URL if you want to copy it: pythonista://{{New Production}}?action=run&args={{"[prompt:Title]" "[prompt:Artist]" "[prompt:Album]" "[prompt:Track]"}} 

  3. It doesn't have an audio file and hasn't been submitted. 

Tim Hopper: Sundry Links for September 25, 2014

Philosophy of Statistics (Stanford Encyclopedia of Philosophy): I suspect that a lot of the Bayesian vs Frequentist debates ignore the important epistemological underpinnings of statistics. I haven’t finished reading this yet, but I wonder if it might help.

Connect Sunlight Foundation to anything: “The Sunlight Foundation is a nonpartisan non-profit organization that uses the power of the Internet to catalyze greater U.S. Government openness and transparency.” They now have an IFTTT channel. Get push notifications when the president signs a bill!

furbo.org · The Terminal: Craig Hockenberry wrote a massive post on how he uses the Terminal on OS X for fun and profit. You will learn things.

A sneak peek at Camera+ 6… manual controls are coming soon to you! : I’ve been a Camera+ user on iOS for a long time. The new version coming out soon is very exciting.

GitHut - Programming Languages and GitHub: A very clever visualization of various languages represented on Github and of the properties of their respective repositories.

Og Maciel: Books

Woke up this morning and, as usual, sat down to read the Books section of The New York Times while drinking my coffee. This has become sort of a ‘tradition’ for me, and because of it I have been able to learn about many interesting books, some of which I would not have found on my own. I also ‘blame’ this activity for turning my nightstand into a mini-library of its own.

Currently I have the following books waiting for me:

Anyhow, while drinking my coffee this morning I realized just how much I enjoy reading and (what I like to call) catching up with all the books I either read when I was younger but took for granted or finally getting to those books that have been so patiently waiting for me to get to them. And now, whenever I’m not working or with my kids, you can bet your bottom dollar that you’ll find me somewhere outside (when the mosquitos are not buzzing about the yard) or cozily nestled with a book (or two) somewhere quiet around the house.

Book Queue

But to the point of this story, today I realized that, if I could go back in time (which reminds me, I should probably add “The Time Machine” to my list) to the days when I was looking to buy a house, I would have done two things differently:

  1. wired the entire house so that every room would have a couple of ethernet ports;
  2. chosen a house with a large-ish room and added wall-to-wall bookcases, like you see in those movies where a well-off person takes their guests into their private library for tea and biscuits;

I realize that I can’t change the past, and I also realize that perhaps it is a good thing that I took my book reading for granted during my high school and university years… I don’t think I would have enjoyed reading “Dandelion Wine” or “Mrs. Dalloway” as much back then as I when I finally did. I guess reading books is very much like the process of making good wines… with age and experience, the reader, not the book, develops the maturity and ability to properly savor a good story.

Tim Hopper: Sundry Links for September 20, 2014

Open Sourcing a Python Project the Right Way: Great stuff that should be taught in school: “Most Python developers have written at least one tool, script, library or framework that others would find useful. My goal in this article is to make the process of open-sourcing existing Python code as clear and painless as possible.”

elasticsearch/elasticsearch-dsl-py: Elasticsearch is an incredible datastore. Unfortunately, its JSON-based query language is tedious, at best. Here’s a nice higher-level Python DSL being developed for it. It’s great!

Equipment Guide — The Podcasting Handbook: Dan Benjamin of 5by5 podcasting fame is writing a book on podcasting. Here’s his brief equipment guide.

bachya/pinpress: Aaron Bach put together a neat Ruby script that he uses to generate his link posts. This is similar to but better than my sundry tool.

Markdown Resume Builder: I haven’t tried this yet, but I like the idea: a Markdown based resume format that can be converted into HTML or PDF.

Git - Tips and Tricks: Enabling autocomplete in Git is something I should have done long ago.

Apache Storm Design Pattern—Micro Batching: Micro batching is a valuable tool when doing stream processing. Horton Works put up a helpful post outlining three ways of doing it.

Caktus Group: Improving Infant and Maternal Health in Rwanda and Zambia with RapidSMS

Image courtesy of UNICEF, the funders of this project.

I have had the good fortune of working internationally on mobile health applications due to Caktus' focus on public health. Our public health work often uses RapidSMS, a free and open-source Django-powered framework for dynamic data collection, logistics coordination, and communication, leveraging basic short message service (SMS) mobile phone technology. I was able to work on two separate projects tracking data related to the 1000 days between a woman’s pregnancy and the child’s second birthday. Monitoring mothers and children during this time frame is critical as there are many factors that, when monitored properly, can decrease the mortality rates for both mother and child. Both of these projects presented interesting challenges and resulted in a number of takeaways worth further discussion.


The first trip took me to Lusaka, the capital of Zambia, to work on Saving Mothers Giving Life (SMGL), which is administered by the Zambia Center for Applied Health Research and Development (ZCAHRD) office. The ZCAHRD office had recently finished a pilot phase resulting in a number of additional requirements to implement before expanding the project. In addition to feature development and bug fixes, training a local developer was on the docket.

SMGL collects maternal and fetal/child data via SMS text messages.  When an SMS is received by the application, the message is parsed and routed for additional processing based on matching known keywords. For example, I could have a BirthHandler KeywordHandler that allows the application to track new births. Any message that begins with the keyword birth would be further processed by BirthHandler. KeywordHandlers must have, at a minimum, a defined keyword, help and handler functionality:

from rapidsms.contrib.handlers import KeywordHandler

class BirthHandler(KeywordHandler):
    keyword = "birth"

    def help(self):
        self.respond("Send BIRTH BOY or BIRTH GIRL.")

    def handle(self, text):
        if text.upper() == "BOY":
            self.respond("A boy was born!")
        elif text.upper() == "GIRL":
            self.respond("A girl was born!")

An example session:

 > birth 
 < Send BIRTH BOY or BIRTH GIRL.
 > birth boy 
 < A boy was born! 
 > birth girl
 < A girl was born!
 > birth pizza

New Keyword Handlers

The new syphilis keyword handler would allow clinicians to track a mother’s testing and treatment data. For our handler, a user supplies the SYP keyword, mother id, and the date of the visit, followed by the test result indicator or shot series and an optional next shot date:


To record a positive syphilis test result on January 1, 2013 for mother #1 with a next shot date of January 2, 2013, the following SMS would be sent:

  SYP 1 01 01 2013 P 02 01 2013

With these records in hand, the system’s periodic reminder application will send notifications to remind patients of their next visit. Similar functionality exists for tracking pre- and post-natal visits.
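As an illustration of the message format (this is a hedged sketch, not the actual SMGL handler code), parsing a SYP message might look like this; the day-month-year field order is inferred from the example above:

```python
from datetime import date

def parse_syp(text):
    """Parse: SYP <mother_id> <dd> <mm> <yyyy> <result_or_shot> [<dd> <mm> <yyyy>]"""
    tokens = text.split()
    if not tokens or tokens[0].upper() != "SYP":
        raise ValueError("not a SYP message")
    mother_id = int(tokens[1])
    visit = date(int(tokens[4]), int(tokens[3]), int(tokens[2]))
    result = tokens[5].upper()
    next_shot = None
    if len(tokens) >= 9:  # the optional next shot date is present
        next_shot = date(int(tokens[8]), int(tokens[7]), int(tokens[6]))
    return {"mother_id": mother_id, "visit": visit,
            "result": result, "next_shot": next_shot}
```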

The other major feature implemented for this phase was a referral workflow.  It is critical for personnel at facilities ranging from the rural village to the district hospital to be aware of incoming patients with both emergent and non-emergent needs, as the reaction to each case differs greatly.  The format for SMGL referrals is as follows:


To refer mother #1 who is bleeding to facility #100 and requires emergency care:

  REFER 1 100 B 1200 EM

Based on the receiving facility, the reason, and the emergency indicator, different people will be notified of the case. Emergent cases require dispatching ambulances, prepping receiving facilities, and other essential components to increase survivability for the mother and/or child, whereas non-emergent cases may only require clinical workers to be made aware of an inbound patient.


The reporting tools were fairly straightforward: web-based views for each keyword handler that present the data in a filterable, sortable, tabular format. In addition, end users can export the data as a spreadsheet for further analysis. These views give clinicians, researchers, and other stakeholders easily accessible metrics for analyzing the efficacy of the system as a whole.


As mentioned earlier, training a local developer was also a core component of this visit. This person was the office’s jack of all trades for all things technical, from network and systems administration to shuttling around important data on thumb drives. Given his limited exposure to Python, we spent most of our time in-country pair programming, talking through the model-view-template architecture and finding bite-sized tasks for him to work through when not pair programming.

Zambia Takeaways:

  • It was relatively straightforward to write one-off views and exporters for the keyword handlers. But as the number of handlers increases for the project, this functionality could benefit from being abstracted into a generic DRY reporting tool.
  • When training, advocate that the participant has either 100% of his time allocated or draw up designated blocks of time during the day. The ad hoc schedule we worked with was not as fruitful as it could have been, as competing responsibilities often took precedence over actual Django/RapidSMS training.
  • If in Zambia, there are two requisite weekend trips: Victoria Falls and South Luangwa National Park. Visitors to Zambia do themselves a great disservice by not scheduling trips to both areas.

Off to Rwanda!

UNICEF recognized that many countries were working on solving the same problem: monitoring patients and capturing data from those all-important first 1000 Days. A 1000 Days initiative was put forward, whereby countries would contribute resources and code to a single open source platform that all countries could deploy independently. Evan Wheeler, a UNICEF project manager, contacted Caktus about contributing to this project.

We were tasked with building three RapidSMS components of the 1000 Days architecture: an appointment application, a patient/provider API for storing and accessing records from different backends, and a nutrition monitoring application. We would flesh out these applications before our in-country visit to Kigali, Rwanda. While there, working closely with Evan and our in-country counterparts, we would finish the initial versions of these applications as well as orient the local development team to the future direction of the 1000 Days deployment.

rapidsms-appointments allows users to subscribe to a series of appointments based on a timeline of configurable milestones. Appointment reminders are sent out to patients/staff, and there are mechanisms for confirming, rescheduling, and tracking missed/made appointments. The intent of this application was to create an engine for generating keyword handlers based on appointments. Rather than having to write code for each individual timeline-based series (pre- and post-natal mother visits, for example), one could simply configure these through the admin panel. The project overview documentation provides a great entry point.

rapidsms-healthcare obviates the need for countries to track patient/provider data in multiple databases. Many countries utilize third-party datastores, such as OpenMRS, to create a medical records system. With rapidsms-healthcare in 1000 Days, deployments can take advantage of pre-existing patient and provider data by utilizing a default healthcare storage backend or creating a custom backend for their existing datastore. Additional applications can then utilize the healthcare API to access patients and providers.

rapidsms-nutrition is an example of such a library.  It will consume patient data from the healthcare API and monitor child growth, generating statistical assessments based on WHO Child Growth Standards. It utilizes the pygrowup library. With this data in hand, it is relatively easy to create useful visualizations with a library such as d3.js.

Rwanda Takeaways

  • Rwanda is gorgeous.  We had an amazing time in Kigali and at Lake Kivu, one of three EXPLODING LAKES in the world.

No report on Africa would be complete without a few pictures...enjoy!!


Tim Hopper: Quickly Converting Python Dict to JSON

Recently, I've spent a lot of time going back and forth between Python dicts and JSON. For some reason, I decided last week that it'd be useful to be able to quickly convert a Python dict to pretty-printed JSON.

I created a TextExpander snippet that takes a Python dict from the clipboard, converts it to JSON, and pastes it.

Here are the details:

#!/usr/bin/env python
import ast
import json
import subprocess

def getClipboardData():
    p = subprocess.Popen(['pbpaste'], stdout=subprocess.PIPE)
    p.wait()
    data = p.stdout.read()
    return data

# ast.literal_eval safely parses a dict literal; unlike eval, it won't run code
cb = ast.literal_eval(getClipboardData())

print json.dumps(cb, sort_keys=True, indent=4, separators=(',', ': '))

Caktus Group: Q3 Charitable Giving

Our client social impact projects continue here at Caktus, with work presently being done in Libya, Nigeria, Syria, Turkey, Iraq, and the US. But every quarter, we pause to consider the excellent nonprofits that our employees volunteer for and, new this quarter, that they have identified as having a substantive influence on their lives. The following list represents the employee-nominated nonprofits to which we are giving, in alphabetical order:

Animal Protection Society of Durham

The Animal Protection Society of Durham (APS) is a non-profit organization that has been helping animals in our community since 1970, and has managed the Durham County Animal Shelter since 1990. APS feeds, shelters and provides medical attention for nearly 7,000 stray, surrendered, abandoned, abused and neglected animals annually.

The Carrack

The Carrack is owned and run by the community, for the community, and maintains an indiscriminate open forum that enables local artists to perform and exhibit outside of the constraints of traditional gallery models, giving the artist complete creative freedom.

Scrap Exchange

The Scrap Exchange is a nonprofit creative reuse center in Durham, North Carolina whose mission is to promote creativity, and environmental awareness. The Scrap Exchange provides a sustainable supply of high-quality, low-cost materials for artists, educators, parents, and other creative people.

Society for the Prevention of Cruelty to Animals - San Francisco

As the fourth oldest humane society in the U.S. and the founders of the No-Kill movement, the SF SPCA has always been at the forefront of animal welfare. SPCA SF’s animal shelter provides pets for adoption.

Southern Coalition for Social Justice

The Southern Coalition for Social Justice was founded in Durham, North Carolina by a multidisciplinary group, predominantly people of color, who believe that families and communities engaged in social justice struggles need a team of lawyers, social scientists, community organizers and media specialists to support them in their efforts to dismantle structural racism and oppression.

Tim Hopper: Sundry Links for September 10, 2014

textract: textract is a Python module and a command line tool for text extraction from many file formats. It cleverly pulls together many libraries into a consistent API.

Flask Kit: I've been reading a lot about Flask (the Python web server) lately. Flask Kit is a little tool to give some structure to new Flask projects.

cookiecutter: I was looking for this recently, but I couldn't find it. "A command-line utility that creates projects from cookiecutters (project templates). E.g. Python package projects, jQuery plugin projects." There's even a Flask template!

Over 50? You Probably Prefer Negative Stories About Young People: A research paper from a few years ago shows that older people prefer to read negative news about young people. "In fact, older readers who chose to read negative stories about young individuals actually get a small boost in their self-esteem."

Episode 564: The Signature: The fantastic Planet Money podcast explains why signatures are meaningless in a modern age. My scribbles have become even worse since listening to this.

github-selfies: Here's a Chrome and Firefox extension that allows you to quickly embed gif selfies in Github posts. Caution: may lead to improved team morale.

Caktus Group: DjangoCon 2014: Recap

Caktus had a great time at DjangoCon in Portland this year! We met up with old friends and new. The following staff gave talks (we’ll update this post with videos as soon as they’re available):

We helped design the website, so it was gratifying to see the hard work of our design team displayed on the program ad and at various points throughout the conference.

For fellow attendees, you probably noticed our giant inflatable duck, who came out in support of Duckling, our conference outings app. He told us he had a good time too.

Here’s some pictures of our team at DjangoCon:

Tim Hopper: Tracking Weight Loss with R, Hazel, Withings, and IFTTT

As I have noted before, body weight is a noisy thing. Day to day, your weight will probably fluctuate by several pounds. If you're trying to lose weight, this noise can cause unfounded frustration and premature excitement.

When I started a serious weight loss plan a year and a half ago, I bought a wifi-enabled Withings Scale. The scale allows me to automatically sync my weight with Monitor Your Weight, MyFitnessPal, RunKeeper, and other fitness apps on my phone. IFTTT also has great Withings support allowing me to push my weight to various other web services.

One IFTTT rule I have appends my weight to a text file in Dropbox. This file looks like this:

263.86 August 21, 2014 at 05:56AM
264.62 August 22, 2014 at 08:27AM
264.56 August 23, 2014 at 09:41AM
263.99 August 24, 2014 at 08:02AM
265.64 August 25, 2014 at 08:08AM
267.4 August 26, 2014 at 08:16AM
265.25 August 27, 2014 at 09:08AM
264.17 August 28, 2014 at 07:21AM
264.03 August 29, 2014 at 08:43AM
262.71 August 30, 2014 at 08:47AM

For a few months, I have been experimenting with using this time series to give myself a less-noisy update on my weight, and I've come up with a decent solution.

This R script will take my weight time series, resample it, smooth it with a rolling median over the last month, and write summary stats to a text file in my Dropbox. It's not the prettiest script, but it gets the job done for now.1

library(lubridate)
library(zoo)

INPUT_PATH <- "~/Dropbox/Text Notes/Weight.txt"
OUTPUT_PATH <- "~/Dropbox/Text Notes/Weight Stats.txt"

con <- file(INPUT_PATH, "rt")
lines <- readLines(con)
close(con)

# Parse "263.86 August 21, 2014 at 05:56AM" into a (weight, date) pair
parse.line <- function(line) {
  s <- strsplit(line, split=" ")[[1]]
  date.str <- paste(s[2:10][!is.na(s[2:10])], collapse=" ")
  date <- mdy_hm(date.str, quiet=TRUE)
  l <- list(as.numeric(s[1]), date)
  names(l) <- c("weight", "date")
  l
}

list.weight.date <- lapply(lines, parse.line)
weights <- lapply(list.weight.date, function(X) X$weight)
dates <- lapply(list.weight.date, function(X) X$date)

df <- data.frame(weight = unlist(weights), date = do.call("c", dates))

# Keep the last weigh-in per timestamp, then resample to one value per day
ts <- zoo(c(df$weight), df$date)
ts <- aggregate(ts, time(ts), tail, 1)
g <- round(seq(start(ts), end(ts), 60 * 60 * 24), "days")
ts <- na.approx(ts, xout = g)

# Rolling-median weight as of `days` ago, smoothed over the last smooth.n days
days.ago <- function(days, smooth.n) {
  date <- head(tail(index(ts), days + 1), 1)
  smoothed <- rollmedianr(ts, smooth.n)
  as.numeric(smoothed[date])
}

days = 29
current.weight <- days.ago(0, days)
x <- c(current.weight,
       current.weight - days.ago(7, days),
       current.weight - days.ago(30, days),
       current.weight - days.ago(365, days),
       current.weight - max(ts))  # total change, measured from peak weight
x = round(x, 1)
names(x) = c("current", "7days", "30days", "365days", "max")

w <- c(paste("Weight (lbs):", x["current"]),
       paste("Total Δ:", x["max"]),
       paste("1 Week Δ:", x["7days"]),
       paste("1 Month Δ:", x["30days"]),
       paste("1 Year Δ:", x["365days"]))

writeLines(w, OUTPUT_PATH)

The output looks something like this:

Weight (lbs): 265.7
Total Δ: -112
1 Week Δ: -0.8
1 Month Δ: -4.8
1 Year Δ: -75

I want this script to be run every time my weight is updated, so I created a second IFTTT rule that will create a new file in my Dropbox, called new_weight_measurement, every time I weigh in. On my Mac Mini, I have a Hazel rule to watch for a file of this name to be created. When Hazel sees the file, it runs my R script and deletes that file.

My Hazel rule looks like this:

The 'embedded script' that is run is the R script above; I just have to tell Hazel to use the Rscript shell.2

At this point, every time I step on my scale, a text file with readable statistics about my smoothed weight appears in my Dropbox folder.

Of course, I want this updated information to be pushed directly to me. Hazel is again the perfect tool for the job. I have a second Hazel rule that watches for Weight Stats.txt to be created. Hazel can pass the path of the updated file into any script of your choice. You could, for example, use Mailgun to email it to yourself or Pushover to push it to your mobile devices. Obviously, I want to tweet mine.

I have a Twitter account called @hopsfitness where I've recently been tracking my fitness progress. On my Mac Mini, I have t configured to access @hopsfitness from the command line. Thus, tweeting my updated statistics is just a matter of a little shell script executed by Hazel:
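The script itself isn't reproduced here, but a minimal sketch might look like the following (hypothetical: it assumes Hazel passes the path of the updated stats file as the first argument, and that t is already authorized for @hopsfitness):

```shell
#!/bin/sh
# Hypothetical sketch: Hazel passes the path of "Weight Stats.txt" as $1;
# `t update` posts its argument as a tweet from the configured account.
tweet_stats() {
    t update "$(cat "$1")"
}
```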

Since this data goes to Twitter, I can get it painlessly pushed to my phone: Twitter still allows you to subscribe to accounts via text message, which I've done with @hopsfitness. A minute or so after I step on my scale, I get a text with useful information about where I am and where I'm going; this is much preferable to the noisy weight I see on my scale.

  1. This assumes your input file is formatted like mine, but you could easily adjust the first part of the code for other formats. 

  2. You can download R here; installing it should add Rscript to your system path. 

Tim HopperSundry Links for August 30, 2014

Ggplot2 To Ggvis: I'm a huge fan of ggplot2 for data visualization in R. Here's a brief tutorial for ggplot2 users to learn ggvis for generating interactive plots in R using the grammar of graphics.

From zero to storm cluster for scikit-learn classification | Daniel Rodriguez: This is a very cool, if brief, blog post on using streamparse, my company's open source wrapper for Apache Storm, and scikit-learn, my favorite machine learning library, to do machine learning on data streams.

Pythonic means idiomatic and tasteful: My boss Andrew recently shared an old blogpost he wrote on what it means for code to be Pythonic; I think he's right on track.

Pythonic isn’t just idiomatic Python — it’s tasteful Python. It’s less an objective property of code, more a compliment bestowed onto especially nice Python code.

git workflow: In my ever continuing attempt to be able to run my entire life from Alfred, I recently installed this workflow that makes git repositories on my computer easily searchable.

Alfred-Workflow: Speaking of Alfred, here's a handy Python library that makes it easy to write your own (if you're a Python programmer).

Squashing commits with rebase: Turns out you can use git rebase to clean up your commits before you push them to a remote repository. This can be a great way to make the commits your team sees more meaningful; don't abuse it.

Tim HopperSundry Links for August 28, 2014

How do I generate a uniform random integer partition?: This week, I wanted to generate random partitions of integers. Unsurprisingly, stackoverflow pulled through with a Python snippet to do just that.
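The linked snippet isn't reproduced here, but the standard approach is to count partitions first and then sample proportionally to those counts; a sketch (not the stackoverflow answer itself):

```python
import random
from functools import lru_cache

@lru_cache(maxsize=None)
def count(n, k):
    """Number of partitions of n into parts of size at most k."""
    if n == 0:
        return 1
    # Condition on the largest part j, then partition the remainder
    # n - j into parts no larger than j.
    return sum(count(n - j, j) for j in range(1, min(k, n) + 1))

def random_partition(n, k=None):
    """Uniformly sample a partition of n, largest part first."""
    if k is None:
        k = n
    if n == 0:
        return []
    # Pick the largest part with probability proportional to how many
    # partitions begin with it, then recurse on the remainder.
    r = random.randrange(count(n, k))
    for j in range(1, min(k, n) + 1):
        c = count(n - j, j)
        if r < c:
            return [j] + random_partition(n - j, j)
        r -= c
```

For example, `random_partition(10)` returns a non-increasing list of positive integers summing to 10, each of the 42 partitions being equally likely.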

Firefox and Chrome Bookmarks: I love Alfred as a launcher in OS X. I use it many, many times a day. I just found this helpful workflow for quickly searching and opening my Chrome bookmarks.

YNAB for iPad is Here: YNAB has been the best thing to ever happen to my financial life. I use it to track all my finances. They just released a beautiful iPad app. Importantly, it brings the ability to modify a budget to mobile!

Distributed systems theory for the distributed systems engineer: I work on distributed systems these days. I need to read some of these papers.

Tim HopperKeeping IPython Notebooks Running in the Background

I spend a lot of time in IPython Notebooks for work. One of the few annoyances of IPython Notebooks is that they require keeping a terminal window open to run the notebook server and kernel. I routinely launch a Notebook kernel in a directory where I keep my work related notebooks. Earlier this week, I started to wonder if there was a way for me to keep this kernel running all the time without having to keep a terminal window open.

If you've ever tried to do cron-like automation on OS X, you've surely come across launchd, "a unified, open-source service management framework for starting, stopping and managing daemons, applications, processes, and scripts". You've probably also gotten frustrated with launchd and given up.

I recently started using LaunchControl "a fully-featured launchd GUI" for launchd; it's pretty nice and worth $10. It occurred to me that LaunchControl would be a good way to keep my Notebook kernel running in the background.

I created a LaunchControl job to run the following command.

/usr/local/bin/ipython notebook --matplotlib inline --port=9777 --browser=false

This launches an IPython Notebook kernel accessible on port 9777; setting the browser flag to something other than an installed browser prevents a browser window from opening when the kernel is launched.

I added three other launchd keys in LaunchControl:

  • A Working Directory key to tell LaunchControl to start my notebook in my desired folder.
  • A Run At Load key to tell it to start my kernel as soon as I load the job.
  • And a Keep alive key to tell LaunchControl to restart my kernel should the process ever die.
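Those three keys map directly onto launchd's plist format; a hand-written equivalent might look like this (a sketch only: LaunchControl generates the real file for you, and the label and working directory below are made up):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.example.ipython-notebook</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/ipython</string>
        <string>notebook</string>
        <string>--matplotlib</string>
        <string>inline</string>
        <string>--port=9777</string>
        <string>--browser=false</string>
    </array>
    <key>WorkingDirectory</key>
    <string>/Users/me/notebooks</string>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
```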

Here's how it looks in LaunchControl:

After I created it, I just had to save and load, and I was off to the races; the IPython kernel starts and runs in the background. I can access my Notebooks by navigating to localhost:9777 in my browser. Actually, I added parsely.scratch to my hosts file so I can access my Notebooks at parsely.scratch:9777. This works nicely with Chrome's autocomplete feature. I'm avoiding the temptation to run nginx and give it an even prettier url.

Tim HopperSundry Links for August 25, 2014

How can I pretty-print JSON at the command line?: I needed to pretty print some JSON at the command line earlier today. The easiest way might be to pipe it through python -m json.tool.
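For example:

```shell
# Pretty-print compact JSON with Python's stdlib
# (plain `python -m json.tool` works the same on Python 2)
echo '{"name": "tim", "tags": ["python", "data"]}' | python3 -m json.tool
```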

Integrating Alfred & Keyboard Maestro: I love Keyboard Maestro for automating all kinds of things on my Mac, but I'm reaching a limit of keyboard shortcuts I can remember. Here's an Alfred workflow for launching macros instead.

streamparse 1.0.0: My team at Parsely is building a tool for easily writing Storm topologies (for processing large volumes of streaming data) in Python. We just released 1.0.0!

TextExpander Tools: Brett Terpstra, the king of Mac hacks, has some really handy tools for TextExpander.

GNU Parallel: GNU parallel is a shell tool for executing jobs in parallel on one or more computers, with xargs-like syntax. Pretty cool. HT http://www.twitter.com/oceankidbilly.

Tim HopperSundry Links for August 23, 2014

Arrow: better dates and times for Python: Arrow is a slick Python library "that offers a sensible, human-friendly approach to creating, manipulating, formatting and converting dates, times, and timestamps". It's a friendly alternative to datetime.

Docker via Homebrew: I'm starting to use Docker ("Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications") on occasion. Here are easy install instructions for Mac users.

Mining Massive Datasets MOOC: I'm terrible at completing MOOCs, but I'm really interested in this new one on Mining Massive Datasets.

URL Pinner - Chrome Web Store: URL Pinner is one of my favorite Chrome Extensions. I use it to automatically pin my Gmail and Rdio windows (which I almost always have open).

Using multitail for monitoring multiple log files: If you work with distributed systems, you're probably used to SSH-ing into multiple machines to access logs. Multitail might save you some time.

Saturday Morning Breakfast Cereal: SMBC shows how job interviews would go if we were more honest.

Frank WierzbickiJython 2.7 beta3 released!

On behalf of the Jython development team, I'm pleased to announce that the third beta of Jython 2.7 is available. I'd like to thank Adconion Media Group (now Amobee) for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Jython 2.7b3 brings us up to language level compatibility with the 2.7 version of CPython. We have focused largely on CPython compatibility, and so this release of Jython can run more pure Python apps than any previous release. Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

Some highlights of the changes that come in beta 3:
  • Reimplementation of socket/select/ssl on top of Netty 4.
  • Requests now works.
  • Pip almost works (it works with a custom branch).
  • Numerous bug fixes
To get a more complete list of changes in beta 3, see Jim Baker's talk.

As a beta release we are concentrating on bug fixing and stabilization for a production release.

This release is being hosted at maven central. The traditional installer can be found here. See the installation instructions for using the installer. Three other versions are available:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Caktus GroupDjangoCon Ticket Giveaway!

Update: Congratulations to @dmpayton for winning this giveaway!

Caktus is giving away a DjangoCon ticket valued at $850! DjangoCon is the main US Django conference and it’s returning to Portland this year, August 30 - September 4th. Meet fellow Django developers, learn what others are doing, and have a good time!

To enter the giveaway: (1) follow us @caktusgroup and (2) retweet our message by clicking the button below:

The giveaway will end Wednesday, August 20th at 9AM PDT. We’ll randomly select a name and alert the winner by 5PM PDT. Please note that only one entry per individual is allowed and winning tickets are non-transferable.

We hope to see you at DjangoCon this year!

Caktus GroupPyOhio Recap: Celery with Python

Caleb Smith recently gave a talk, “Intro to Celery,” at PyOhio (video below). Celery is a pretty popular topic for us here at Caktus. We use it often in our client work and find it very handy. So we were happy Caleb was out in the world, promoting its use. We sat down with him to hear more about PyOhio and Celery.

What did you enjoy about PyOhio?

PyOhio had good quality talks and a broad range of topics including system administration, web development, and scientific programming. This year, they had over 100 talk submissions and 38 spots, so there was a huge interest in speakers and a lot of variety as a result. They have four tracks and sprints every evening.

Also, PyOhio is free. The value of a free conference is that it lowers the barrier to attend to the costs of hotel, food, and travel. Things are pretty affordable in Columbus. So that’s good for students or people without an employer to help cover costs, like freelancers. People do come from a pretty big range of places across the Midwest and South.

They have a good team of volunteers that take care of everything.

Aside from a vegetable, what is Celery and why should developers use it?

Celery is for offloading background tasks so you can have work happening behind-the-scenes while running a web project. A typical web app does everything within requests and any periodic work with cronjobs. A lot of web projects will block a request on work that needs to be done before giving a response. For example, an image upload form might make the user wait while thumbnails are produced. Sometimes, there’s work that your web project needs to do that doesn’t fit within the upper limit of 30 seconds or so to fulfill a request before timing out the request. Celery allows for offloading this work outside of the web request. It also allows for the distribution of work as needed on multiple machines. You can trigger background tasks periodically for things like nightly backups, importing data, checking on updates to a feed or API, or whatever work that needs to run asynchronously in the background. We use this a ton with some of our client work.

What are Celery alternatives?

There are a few significant ones such as RQ, pyres, gearman and kuyruk. I think Celery is the most common choice among these. You can also just use system cron jobs for the periodic tasks, but cron jobs only work on one machine and are rarely well maintained. A task queue solution such as Celery coordinates with a broker to work on different machines.

What do you think are the challenges to getting started with Celery?

A lot of people think that it only works with Django. That was true when Celery was first released but is no longer true. There’s also somewhat of a barrier to entry because of the terminology involved, the work of setting up system resources such as the message broker, and understanding its role within a project.

You were a former public school music teacher and often teach Python in the community for organizations like Girl Develop It. Is there a relationship you see to giving talks?

Giving talks does feel like an extension of teaching. You learn a lot trying to prepare for it. My talk was about how to get everything set up, the basics of how Celery works, and developing a mental model for programming Celery tasks. A project like Celery can seem very difficult if you are approaching the documentation on your own. The high level overview is a little daunting so it’s nice to provide an on-ramp for people.

Our other blog posts contain more on Celery with Python.

Caktus GroupCaleb Smith to Guest Lecture at Iron Yard Academy

Caleb Smith, a developer at Caktus, will be guest lecturing tomorrow to the inaugural class at the Iron Yard in Durham. Iron Yard is a code school that trains its students in modern programming practices and prepares them for immediate hiring upon graduation. Tobias, our CEO, is on the school’s employer advisory board. Caleb will be speaking on his experience as a Python developer. As an exclusive Python shop, we here at Caktus naturally think it’s the best language for new students--28 of the top 30 universities agree.

Joseph TateMoving a Paravirtualized EC2 legacy instance to a modern HVM one

I had to try a few things before I could get this right, so I thought I'd write about it. These steps are what ultimately worked for me. I had tried several other things to no success, which I'll list at the end of the post.

If you have Elastic Compute Cloud (EC2) instances on the "previous generation" paravirtualization based instance types, and want to convert them to the new/cheaper/faster "current generation", HVM instance types with SSD storage, this is what you have to do:

You'll need a donor Elastic Block Store (EBS) volume so you can copy data from it. Either shutdown the old instance and detach the EBS, or, as I did, snapshot the old system, and then create a new volume from the snapshot so that you can mess up without worrying about losing data. (I was also moving my instances to a cheaper data center, which I could only do by moving snapshots around). If you choose to create a new volume, make a note of which Availability Zone (AZ) you create it in.

Create a new EC2 instance of the desired instance type, configured with a new EBS volume set up the way you want it. Use a base image that's as similar to what you currently have as possible. Make sure you're using the same base OS version, CPU type, and that your instance is in the same AZ as your donor EBS volume. I mounted the ephemeral storage too as a way to quickly rollback if I messed up without having to recreate the instance from scratch.

Attach your donor EBS volume to your new instance as sdf/xvdf, and then mount it at a new directory I'll call /donor
mkdir /donor && mount /dev/xvdf /donor

Suggested: Mount your ephemeral storage on /mnt
mount /dev/xvdb /mnt
and rsync / to /mnt
rsync -aPx / /mnt/
If something goes wrong in the next few steps, you can reverse it by running
rsync -aPx --delete /mnt/ /
to revert to known working state. The rsync options tell rsync to copy in (a)rchive mode, recursing into directories and preserving symlinks, ownership, permissions, and timestamps; to show (P)rogress; and to not e(x)tend beyond a single file system (this leaves /proc, /sys, and your scratch and donor volumes alone).

Copy your /donor volume data to / by running
rsync -aPx /donor/ / --exclude /boot --exclude /etc/grub.d ...
You can include other excludes (use the paths where the files would land on the final volume, not their paths in the donor system). The excluded paths above are for an Ubuntu system. You should replace /etc/grub.d with the path or paths where your distro keeps its bootloader configuration files. I found that copying /boot alone was insufficient because the files in /boot are merely linked to /etc/grub.d.

Now you should be able to reboot your instance into your new upgraded system. Do so, detach the donor EBS volume, and if you used the ephemeral storage as a scratch copy, reset it as you prefer. Switch your Elastic IP, or change your DNS configuration, test your applications, and then clean up your old instance artifacts. Congratulations, you're done.

Be careful of slashes. The rsync command treats /donor/ differently from /donor.

What failed:
Converting the EBS snapshot to an AMI and setting the AMI virtualization type as HVM, then launching a new instance with this AMI actually failed to boot (I've had trouble with this with PV instances too with the Ubuntu base image unless I specified a specific kernel, so I'm not sure whether to blame HVM or the Ubuntu base images).
Connecting a copy of the PV ebs volume to a running HVM system and copying /boot to the donor, then replacing sda1 with the donor volume also failed to boot, though I think if I'd copied /etc/grub.d too it might have worked. This might not get you an SSD backed EBS volume though, if that's desirable.

Caktus GroupOSCON 2014 & REST API Client Best Practices

Mark Lavin, Caktus Technical Director and co-author of the forthcoming Lightweight Django, was recently at OSCON 2014 in Portland where he gave a talk on improving the relationship between server and client for REST APIs. OSCON, with over 3000 attendees, is one of the largest open source conferences around. I sat down with him to ask him about his time there.

Welcome back! This was your second year speaking at OSCON. How did you enjoy it this year?

I enjoyed it. There’s a variety of topics at OSCON. It’s cool to see what people do with open source—there’s such a large number of companies, technologies, and approaches to solutions. There were great conversations and presentations. I especially liked Ignite OSCON where people gave really well-prepared 5 minute talks.

I participated in the OSCON 5k [Mark received 5th place] too. There were a lot of people out. We went over bridges and went up and down this spiral bridge twice. That race was pretty late for me but fun [began at 9pm PST, which is 12AM EST].

Why did you choose REST API client best practices as a talk topic?

It was something that came out of working on Lightweight Django. I was writing about building REST APIs and the JavaScript clients. This prompted a lot of thinking and researching on how to design both ends of it from Julia (Elman, co-author) and me. I found a lot of mixed content and a lot of things I wasn’t happy to see--people skimping on what I felt were best practices.

I think that you need to think about API design in the same way that you think about websites. How is a client going to navigate the API? If it’s asking for a piece of information, how is it going to find a related piece of information? What actions is it allowed to take? Writing a good server can make a client easier, something I’ve seen in my work at Caktus.

Why do you think this isn’t a more common practice?

The focus is often on building a really fast API, not building an API that’s easy to use necessarily. It’s hard to write the client for most APIs. The information that gets passed to the client isn’t always sufficient. Many APIs don’t spend the time to make themselves discoverable, so the client has to spend a lot of work hard coding to make up for the fact that it doesn’t know the location of resources.

What trade-offs do you think exist?

With relational data models, sometimes you end up trading off normalization. A classical “right way” to build a data model is one that doesn’t repeat itself and that doesn’t store redundant data in a very normalized fashion. Denormalizing data can lead to inconsistencies and duplication, but, at times, it can make things faster.

The API design is similar particularly when you have deeply relational structures. There were a lot of conversations about how you make this trade-off. Interestingly enough, Netflix gave a talk about their API and its evolution. They said they started with a normalized structure and discoverable API and found that eventually they had to restructure some pieces into a less normalized fashion for the performance they needed for some of the set-top boxes and game consoles that query their API.

We heard you had an opportunity to give a tutorial. Tell us more about it.

I had the opportunity to help Harry Percival. He recently released a book on Python web development using test-driven development. We’d emailed before and so we knew each other a little bit. He asked me to help him be a TA so I spent Monday morning trying to help people follow his tutorial and get set up learning Python and Django. It was unexpected, but a lot of fun, similar to what Caktus has done with the bootcamps. I like to teach. It’s fun to be a part of that and to help someone understand something they didn’t know before. There were a lot of people interested in learning about Python and Django. I was just happy to participate.

Thanks for sharing your experiences with us Mark!

Thanks for asking me!

Caktus GroupWebsite Redesign for PyCon 2015

PyCon 2015’s website launched today (a day early!). PyCon is the premiere conference for the Python community and one we look forward to attending every year. We’re honored that the Python Software Foundation returned to us this year to revamp the site. We were especially happy to work again with organizer-extraordinaires Ewa Jodlowska and Diana Clarke.

One of the most exciting things for our team is turning ideas into reality. The organizers wanted to retain the colorful nature of the PyCon 2014 site Caktus created. They also wanted the team to use the conference venue, the Palais des congrès de Montréal, as inspiration (pictured below). The new design needed to pay homage to the iconic building without being either too literal or too abstract.


The design team, led by Trevor Ray, worked together to create the design using the building's stairs as inspiration (seen in the photo above). The stairs lend a sense of movement. The colored panes are linked in a snake-like manner, a nod to Python’s namesake. If you look carefully, you will also see the letter P. Working in collaboration with the organizers, the team created multiple drafts, fine-tuning the look and feel with each phase of feedback. The final design represents the direction of the client, the inspiration of the building itself, and the team’s own creativity.

In addition to refreshing PyCon’s website, our developers, led by Rebecca Lovewell, made augmentations to Symposion, a Django project for conference websites. We’ve previously worked with Symposion for PyCon 2014 and PyOhio. For this round of changes, the team used these previous augmentations as a jumping off point for refinements to the scheduler, financial aid processing, and sponsor information sharing.

Up next? A conference t-shirt!

Vinod KurupUsing dynamic queries in a CBV

Let's play 'Spot the bug'. We're building a simple system that shows photos. Each photo has a publish_date and we should only show photos that have been published (i.e. their publish_date is in the past).

``` python models.py
class PhotoManager(models.Manager):
    def live(self, as_of=None):
        if as_of is None:
            as_of = timezone.now()
        return super().get_query_set().filter(publish_date__lte=as_of)
```

And the view to show those photos:

``` python views.py
class ShowPhotosView(ListView):
    queryset = Hero.objects.live()
```


Can you spot the bug? I sure didn't... until the client complained that newly published photos never showed up on the site. Restarting the server fixed the problem temporarily. The newly published photos would show up, but then any photos published after the server restart again failed to display.

The problem is that the ShowPhotosView class is instantiated when the server starts. ShowPhotosView.queryset gets set to the value returned by Hero.objects.live(). That, in turn, is a QuerySet, but it's a QuerySet with as_of set to timezone.now() WHEN THE SERVER STARTS UP. That as_of value never gets updated, so newer photos never get captured in the query.

There's probably multiple ways to fix this, but an easy one is:

``` python views.py
class ShowPhotosView(ListView):
    def get_queryset(self):
        return Hero.objects.live()
```


Now, instead of the queryset being instantiated at server start-up, it's instantiated only when ShowPhotosView.get_queryset() is called, which is when a request is made.
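The same trap exists outside Django, since any class attribute is evaluated exactly once, at class-definition time; a tiny framework-free sketch:

```python
import datetime

class Snapshot:
    created = datetime.datetime.now()   # evaluated once, when the class is defined

    def now(self):
        return datetime.datetime.now()  # evaluated fresh on every call

a = Snapshot()
b = Snapshot()
assert a.created is b.created    # all instances share the frozen value
assert a.now() >= a.created      # the method recomputes each time
```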

Caktus GroupA Culture of Code Reviews

Code reviews are one of those things that everyone agrees are worthwhile, but sometimes don’t get done. A good way to keep getting the benefits of code reviews is to establish, and even nurture, a culture of code reviews.

When code reviews are part of the culture, people don’t just expect their changes to be reviewed, they want their changes reviewed.

Some advantages of code reviews

We can all agree that code reviews improve code quality by spotting bugs. But there are other advantages, especially when changes are reviewed consistently.

Having your own code reviewed is a learning experience. We all have different training and experiences, and code reviews give us a chance to share what we know with others on the team. The more experienced developer might be pointing out some pitfall they’ve learned by bitter experience, while the enthusiastic new developer is suggesting the latest library that can do half the work for you.

Reviewing other people’s code is a learning experience too. You’ll see better ways of doing things that you’ll want to adopt.

If all code is reviewed, there are no parts of the code that only one person is familiar with. The code becomes a collaborative product of the team, not a bunch of pieces “owned” by individual programmers.

Obstacles to code reviews

But you only get the benefits of code reviews if you do them. What are some things that can get in the way?

Insufficient staffing is an obvious problem, whether there’s only one person working on the code, or no one working on the code has time to review other changes, or to wait for their own to be reviewed. To nurture a culture of code reviews, enough staff needs to be allocated to projects to allow code reviews to be a part of the normal process. That means at least two people on a team who are familiar enough with the project to do reviews. If there’s not enough work for two full-time team members, one member could be part-time, or even both. Better two people working on a project part-time than one person full-time.

Poor tools can inhibit code reviews. The more difficult something is, the more likely we are to avoid doing it. Take the time to adopt a good set of tools, whether GitHub’s pull requests, the open source ReviewBoard project, or anything else that handles most of the tedious parts of a code review for you. It should be easy to give feedback, linked to the relevant changes, and to respond to the feedback.

Ego is one of the biggest obstacles. No one likes having their work criticized. But we can do things in ways that reduce people’s concerns.

Code reviews should be universal - everyone’s changes are reviewed, always. Any exception can be viewed, if someone is inclined that way, as an indication that some developers are “better” than others.

Reviews are about the code, not the coder. Feedback should be worded accordingly. Instead of saying “You forgot to check the validity of this input”, reviewers can say “This function is missing validation of this input”, and so forth.

We do reviews because our work is important and we want it to be as good as possible, not because we expect our co-workers to screw it up. At the same time, we recognize that we are all human, and humans are fallible.

Establishing a culture of code reviews

Having a culture where code reviews are just a normal part of the workflow, and we’d feel naked without them, is the ideal. But if you’re not there yet, how can you move in that direction?

It starts with commitment from management. Provide the proper tools, give projects enough staffing so there’s time for reviews, and make it clear that all changes are expected to be reviewed. Maybe provide some resources for training.

Then, get out of the way. Management should not be involved in the actual process of code reviews. If developers are reluctant to have other developers review their changes, they’re positively repelled by the idea of non-developers doing it. Keep the actual process something that happens among peers.

When adding code reviews to your workflow, there are some choices to make, and I think some approaches work better than others.

First, every change is reviewed. If developers pick and choose which changes are reviewed, inevitably someone will feel singled out, or a serious bug will slip by in a “trivial” change that didn’t seem to merit a review.

Second, review changes before they’re merged or accepted. A “merge then review” process can result in everyone assuming someone else will review the change, and nobody actually doing it. By requiring a review and signoff before the change is merged, the one who made the change is motivated to seek out a reviewer and get the review done.

Third, reviews are done by peers, by people who are also active coders. Writing and reviewing code is a collaboration among a team. Everyone reviews and has their own changes reviewed. It’s not a process of a developer submitting a proposed change to someone outside the team for approval.

The target

How will you know when you’re moving toward a culture of code reviews? When people want their code to be reviewed. When they complain about obstacles making it more difficult to get their code reviewed. When the team is happier because they’re producing better code and learning to be better developers.

Vinod KurupSome Emacs Posts

A few cool Emacs posts have flown across my radar, so I'm noting them here for that time in the future when I have time to play with them.

Vinod KurupPygments on Arch Linux

I wrote my first blog post in a little while (ok, ok... 18 months) yesterday and when I tried to generate the post, it failed. Silently failed, which is the worst kind of failure. I'm still not sure why it was silent, but I eventually was able to force it to show me an error message:

```
/home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:354:in `rescue in get_header': Failed to get header. (MentosError)
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:335:in `get_header'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:232:in `block in mentos'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:206:in `mentos'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:189:in `highlight'
    from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:24:in `pygments'
    from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:14:in `highlight'
    from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:37:in `block in render_code_block'
    from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `gsub'
    from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `render_code_block'
    from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:12:in `pre_filter'
    from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:28:in `pre_render'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:112:in `block in pre_render'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `each'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `pre_render'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:166:in `do_layout'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/post.rb:195:in `render'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:200:in `block in render'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `each'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `render'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:41:in `process'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/bin/jekyll:264:in `<top (required)>'
    from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `load'
    from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `<main>'
```


Professor Google tells me that this happens when you try to run the pygments.rb library in a Python 3 environment. (pygments.rb is a Ruby wrapper around the Python Pygments library.) The fix is to run the code in a Python 2 virtualenv. I guess the last time I updated my blog, Arch still had Python 2 as the system default. No, I don't want to check how long ago that was.

```
$ mkvirtualenv -p `which python2` my_blog
(my_blog)$ bundle exec rake generate
```

So now I'm running a Ruby command in a Ruby environment (rbenv) inside a Python 2 virtualenv. Maybe it's time to switch blog tools again...

Vinod KurupHow to create test models in Django

It's occasionally useful to be able to create a Django model class in your unit test suite. Let's say you're building a library which creates an abstract model which your users will want to subclass. There's no need for your library to subclass it, but your library should still test that you can create a subclass and test out its features. If you create that model in your models.py file, then Django will think that it is a real part of your library and load it whenever you (or your users) call syncdb. That's bad.

The solution is to create it in a tests.py file within your Django app. If it's not in models.py, Django won't load it during syncdb.

``` python tests.py
from django.db import models
from django.test import TestCase

from .models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)


class AbstractTest(TestCase):
    def test_my_test_model(self):
        MyTestModel.objects.create(name='test')
```
A problem with this solution is that I rarely use a single tests.py file. Instead we use multiple test files collected in a tests package. If you try to create a model in tests/test_foo.py, then this approach fails because Django tries to create the model in an application named tests, but there is no such app in INSTALLED_APPS. The solution is to set app_label to the name of your app in an inner Meta class.

```python tests/test_foo.py
from django.db import models
from django.test import TestCase

from ..models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)

    class Meta:
        app_label = 'myappname'


class AbstractTest(TestCase):
    def test_my_test_model(self):
        MyTestModel.objects.create(name='test')
```
Oh, and I almost forgot... if you use South, this might not work, unless you set SOUTH_TESTS_MIGRATE to False in your settings file.
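For reference, the relevant setting is a one-liner in your settings module; with it set, South lets the test runner create the test database tables directly instead of running migrations:

```python
# settings.py fragment: skip South migrations when building the test
# database, so models defined only in test modules get created directly.
SOUTH_TESTS_MIGRATE = False
```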

Comments and corrections welcome!

Joe GregorioObservations on hg and git

Having recently moved from Mercurial to Git, here are my observations:

Git just works

No matter what I try to do, there's a short and simple git command that does it. Whether I need to copy a single file from one branch to my current branch, roll back the last two commits and place their changes into the index, or push or pull from a local branch to a differently named remote branch, there's a way to do each of those things. More importantly, Git does them natively; I don't have to turn on plugins to get a particular piece of functionality.

Turning on plugins is a hurdle

The fact that what I consider to be core functionality is hidden away in plugins that need to be turned on manually is an issue. For example, look at this section of the docs for the Google API Python Client:


A big thing that trips up contributors is that "--rebase" is in a plugin (and I keep forgetting to update the docs to explain that).

Git is fast

So Git is fast, not just "ooh that was fast", but fast as in, "there must have been a problem because there's no way it could have worked that fast". That's a feature.


Branches beat MQ

Git branches are much smoother and better integrated than MQ. Maybe this is just because I got stuck on MQ and never learned another way to use hg, but the branch and merge workflow is a lot better than MQ.


ssh URIs just work

In Git, ssh: URIs just work for me. Maybe I just got lucky, or was previously unlucky, but I never seemed to be able to pull or push to a remote repository via ssh with hg, and it worked as advertised with Git.


Git is helpful

Git is filled with helpful messages, many of the form "it looks like you are trying to do blah, here's the exact command line for that", or "you seem to be in 'weird state foo', here are a couple of different command lines you might use to rectify the situation". Obviously those are paraphrases, but the general idea of providing long, helpful messages with actual commands in them is done well throughout Git.


I'm not writing this to cast aspersions on the Mercurial developers, and I've already passed this information along to developers who work on Mercurial. I am hoping that if you're building command line tools, you can incorporate some of the items here, such as helpful error messages, speed, and robust out-of-the-box capabilities.

Caktus GroupContributing Back to Symposion

Recently Caktus collaborated with the organizers of PyOhio, a free regional Python conference, to launch the PyOhio 2014 conference website. The conference starts this weekend, July 26 - 27. As in prior years, the conference web site utilizes Eldarion’s Symposion, an open-source conference management system. Symposion powers a number of annual conference sites including PyCon and DjangoCon. In fact, as of this writing, there are 78 forks of Symposion, a nod to its widespread use for events both large and small. This collaboration afforded us the opportunity to abide by one of our core tenets: giving back to the community.

PyOhio organizers had identified a few pain points during last year’s rollout that were resolvable in a manner that was conducive to contributing back to Symposion so that future adopters could benefit from this work. The areas we focused on were migration support, refining the user experience for proposal submitters and sponsor applicants, and schedule building.

Migration Support


The majority of our projects utilize South for tracking database migrations. Migrations are not an absolute requirement, but for conferences that reuse the same code base from year to year, rather than starting a new repository, it is beneficial to have a migration strategy in place. There were a few minor implementation details to tackle, namely migration dependencies and introspection rules. The Symposion framework has a number of interdependent apps; when using migrations, the database tables must be created in a certain order. For Symposion, there are two such dependencies: Proposals depend on Speakers, and Sponsorship depends on Conferences. The implementation can be seen in this changeset. In addition, Symposion uses a custom field for certain models: django-timezones’ TimeZoneField. There are a few pull requests open on that project to deal with South introspection rules, but none of them have been incorporated, so we added a very simple rule to work around migration errors.
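To illustrate the ordering mechanism, South lets a migration declare cross-app dependencies via a `depends_on` attribute. The sketch below is a standalone stand-in: in a real project the class subclasses `south.v2.SchemaMigration`, and the app and migration names here are illustrative, not Symposion's actual files.

```python
# Hypothetical South-style migration for a proposals app. South reads
# the depends_on attribute and runs the named migrations first, so the
# speakers tables exist before the proposals tables are created.
class Migration(object):  # stands in for south.v2.SchemaMigration
    depends_on = (
        ("speakers", "0001_initial"),  # (app label, migration name)
    )

    def forwards(self, orm):
        pass  # schema operations (db.create_table, etc.) would go here
```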

As mentioned before, these migrations give Symposion a solid migration workflow for future database changes, as well as prepping for Django 1.7’s native schema migration support.

User Experience Issues

Currently, if an unauthenticated user manages to make a proposal submission, they are simply redirected to the home page of the site. Similarly, if an authenticated user without a Speaker profile makes a submission, they are redirected to their dashboard. In both cases, there is no additional feedback for what the user should do next. We utilized the django messages framework to provide contextual feedback with help text and hyperlinks should these be valid submission attempts (https://github.com/pinax/symposion/pull/50/files).

Sponsor submissions are another area that benefited from additional contextual messages. There are a variety of sponsor levels (Unobtanium, Aluminum, etc.) that carry their own sponsor benefits (a print ad in the program, for example). The current workflow redirects a sponsor application, with no contextual message, to the Sponsor Details page, which lists sponsor and benefit details. For sponsor levels with no benefits, this essentially redirects you to an update form for the details you just submitted. Our pull request redirects these cases to the user dashboard with an appropriate message, as well as providing a more helpful message for sponsor applications that do carry benefits (https://github.com/pinax/symposion/pull/49/files).

Schedule Builder


The conference schedule is a key component to the web site, as it lets attendees (and speakers) know when to be where! It is also a fairly complex app, with a number of interwoven database tables. The screenshot below lists the components required to build a conference schedule:

At a minimum, creating one scheduled presentation requires 7 objects spread across 7 different tables. Scale this out to tens or nearly one hundred talks and the process of manually building a schedule becomes egregiously cumbersome. For PyCon 2014 we built a custom importer for the talks schedule. A quick glance reveals it is not easily reusable; there are pinned lunches and breaks, and this particular command assigns accepted proposals to schedule slots. For PyOhio, we wanted to provide something more generic and reusable. Rather than building out the entire schedule of approved talks, we wanted a fairly quick and intuitive way for an administrator to build the schedule’s skeleton via the frontend using a CSV file. The format of the CSV is intentionally basic, for example:

"date","time_start","time_end","kind"," room "
"12/12/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/12/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room2"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room2"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room2"

This sample, when uploaded, will create the requisite backend objects (2 Day, 2 Room, 2 Slot Kind, 8 Slot, and 12 SlotRoom objects). This initial implementation fails if schedule overlaps occur, allows for front-end deletion of schedules, is tested, and is documented as well. Having a schedule builder gives conference organizers a chance to divert more energy into reviewing and securing great talks and keynotes, rather than dealing with the minutiae of administering the schedule itself.
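As a rough, standalone sketch of the parsing such an upload implies: the snippet below derives the distinct days, rooms, and slot kinds from a file in this format. Symposion's actual importer creates Day, Room, SlotKind, Slot, and SlotRoom model instances rather than plain sets, and the function and sample data here are illustrative; note it also tolerates the stray whitespace in the " room " header of the sample above.

```python
import csv
import io

# Illustrative sample in the same shape as the schedule-builder CSV.
SAMPLE = '''"date","time_start","time_end","kind"," room "
"12/12/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room1"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room2"
'''

def summarize_schedule(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    # Tolerate stray whitespace in header names (note " room " above).
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    days, rooms, kinds = set(), set(), set()
    slot_rooms = []  # one entry per CSV row, like Symposion's SlotRoom
    for row in reader:
        days.add(row["date"])
        rooms.add(row["room"])
        kinds.add(row["kind"])
        slot_rooms.append((row["date"], row["time_start"],
                           row["time_end"], row["kind"], row["room"]))
    return days, rooms, kinds, slot_rooms

days, rooms, kinds, slot_rooms = summarize_schedule(SAMPLE)
```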

Symposion is a great Python based conference management system. We are excited about its broad use in general, and in helping contribute to its future longevity and feature set.

Edited to Add (7/21/2014: 3:31PM): PR Merge Efforts

One of the SciPy2014 tech chairs, Sheila, is part of new efforts to get PRs merged. Join the mailing list to learn more about merges based off the PyOhio fork.

Caktus GroupWhat was being in Libya like?

Election day voting on June 25th. Image courtesy of HNEC.


Since this interview was done, Libya’s capital began experiencing more violence. As of today, militias are fighting over control of the Tripoli international airport, the primary way in and out of this section of Libya. We’re keeping our friends and colleagues in Libya in our thoughts in these truly difficult times. This post speaks to the energy and talent of the Libyans we’ve worked with during this challenging democratic transition.

I’m the project manager for our Libya voter registration by text message project, the first in the world. Two of our staff members, Tobias McNulty, CEO, and Elliott Wilkes, Technical Project Manager, were in Libya during the June 25th elections. Given the political chaos and violence surrounding the elections, the US team was worried for them. Tobias recently returned, and Elliott, who had been in Libya since August 2013, left for Nairobi last week. I asked them some questions about being in Tripoli, Libya’s capital.

With the attempted coup just a couple weeks prior, what were tensions like on election day?

Tobias: One of the outcomes of the attempted coup in May was that the election date got moved up to June 25. This put everyone at the Libyan High National Election Commission (HNEC) on task to deliver an election much earlier than originally planned. I’m proud to say that Caktus and HNEC worked closely together to step up to the plate and meet this mandate successfully.

Elliott: Tripoli was plastered with advertisements about the election, both from HNEC and from candidates themselves. Most of the security presence in Tripoli kept to their usual posts on election day: the gas stations. Due to distribution issues and a resulting panic about petrol supplies, there was a run on gas in the city, and army and police vehicles helped guard gas stations to keep the peace while people spent hours in line waiting to top up. In spite of, or perhaps because of, this security presence, we didn’t witness any violence in Tripoli in the days leading up to, during, or following election day. With a few marked exceptions this election proceeded largely without incident - a great success for HNEC. However, one act of violence just after the election did shake the country: the murder of humanitarian Salwa Bugaighis in Benghazi. We were all deeply saddened to hear about that significant loss.

Tobias: Yes, that was tragic. It was hard news for all of us.

How did the voter registration system function on election day?

Tobias: There were three things that HNEC and the citizens of Libya could use our text message-based voter registration system for on election day. First and foremost, poll workers and voters could use it to double-check individual registrations and poll locations. Second, to track when polling centers opened, so HNEC staff knew which centers needed additional support. And lastly, poll workers could send texts with the number of voters that had arrived, giving HNEC real-time turnout figures.

Elliott: On election day, we helped the polling center staff with message formatting, and performing custom database queries to assess progress throughout the day. It’s a real testament to our work that we encountered no technological issues throughout the election process. The system successfully handled nearly 100,000 text messages on election day alone.

What was the mood like at elections headquarters as the count of voters came in?

Tobias: At HNEC offices on election day, the mood was positive and optimistic. We had 1.5 million registrants in the system. We arrived early in the morning, and as the HNEC staff began to arrive, on numerous occasions the office stood and sang along to the Libyan national anthem. We joined in too, of course. We worked long days and accomplished much in the days surrounding the election, but thanks to adequate preparation on the part of HNEC and our team, the workload was manageable.

However, among citizens there is clearly some work to do in terms of motivating voter turnout on election day. Several citizens we talked to were indifferent to the elections, and expressed some distrust in elected leaders generally. That said, elections are still relatively new to Libya, and I think we need to moderate our expectations for how quickly they’ll grow in popularity.

Elliott: Having worked on a number of elections around the world I’ve come to the conclusion that the best election days are the boring ones - that is to say, those without incident. If you’ve done your job well, you go into election day with hours upon hours of preparations for all different outcomes and possibilities, plans A, B, C, to Z. You’ve spent weeks and months building a massive ship and all that’s left to do is enjoy the ride. And thankfully, due to our exhaustive preparations, everything was smooth sailing.

What was it like working with the Libyan government?

Tobias: While from the outside Libya may look like an unstable country with lots of negative coverage in the news, the reality on the ground is that working with HNEC has been a real pleasure. The operations staff at the Commission are motivated to continue strengthening democracy in the country, which was evidenced by the long hours many put in in the days leading up to and following the election. We’re honored that the government of Libya selected Caktus as the technology partner for this highly impactful project.

Elliott: Tobias, I couldn't agree more. Working on this project has been extraordinary. There's something special about working with a group of young, committed citizens putting in the extra hours to ensure their electoral process is as inclusive as possible, especially given that for over forty years, government services in Libya were anything but. Their commitment to the pursuit of democracy and everything that entails has made this project a real pleasure and a deeply humbling experience. I'm proud that we've been able to support them at this critical juncture.

We’ve seen the photos, but want to hear you list the pizza toppings!

Tobias: Ha, the rumors are true. The Libyans put all sorts of things on their pizza, the two most prominent of which are often canned tuna fish and french fries. Ketchup and mayonnaise are two other favorite toppings.

Elliott: It tastes terrible. Honestly, I prefer shawerma. The pizza toppings in Libya can be...a bit exotic for my taste.

Tobias: Luckily, there’s other food and the staff at HNEC didn’t hesitate to invite us in to join their meals. It was a pleasure to break bread with the staff at HNEC during such a momentous week for the country of Libya.


Libyan Pizza
Photo by Elliott Wilkes.


Caktus GroupJuly 2014 ShipIt Day Recap

This past Friday we celebrated another ShipIt day at Caktus. There was a lot of open source contribution, exploring, and learning happening in the office. The projects ranged from native mobile Firefox OS apps, to development on our automated server provisioning templates via Salt, to front-end apps aimed at using web technology to create interfaces where composing new music or performing Frozen’s Let It Go is so easy anyone can do it.

Here is a quick summary of the projects that folks worked on:

Calvin worked on updating our own minimal CMS component for editing content on a site, django-pagelets, to work nicely with Django 1.7. He also is interested in adding TinyMCE support and making it easy to upload images and reference them in the block. If you have any other suggestions for pagelets, get in touch with Calvin.

ShipIt Day Project: Anglo-Saxon / French Etymology Analyzer

Philip worked on code to tag words in a text with basic information about their etymologies. He was interested in exploring words with dual French and Anglo-Saxon variations, e.g. “uncouth” and “rude”. These words have evolved from different origins to have similar meanings in modern English, and it turns out that people often perceive the French- or Latin-derived word, in general, to be more erudite (“erudite” from Latin) than the Anglo-Saxon variant. To explore this concept, Philip harvested word etymologies from the XML version of the Wiktionary database and categorized words from Lewis Carroll’s Alice in Wonderland as well as reports from the National Hurricane Center. His initial results showed that Carroll’s British background was evident in his use of language, and Philip is excited to take what he developed on ShipIt Day and continue to work on the project.

Mark created a Firefox OS app, Costanza, based on a concept from a Seinfeld episode. Mark’s app used standard web tools including HTML, CSS, and Javascript to build an offline app that recorded and played back audio. Mark learned a lot about building apps with the new OS and especially spent a lot of time diving into issues with packaging up apps for distribution.

Rebecca and Scott collaborated on work in porting an application to the latest and greatest Python 3. The migration of apps from Python 2 to Python 3 started off as a controversial subject in the Python community, but slowly there has been lots of progress. Caktus is embracing this transition and trying to get projects ported over when there is time. Rebecca and Scott both wrestled with some of the challenges faced with moving a big project on a legacy server over to a new Python version.

Dan also wrestled with the Python 2 to 3 growing pains, though less directly. He set out to create a reusable Django app that supported generic requirements he had encountered in a number of client apps while exporting data to comma-separated value (CSV) format. But while doing this, he ran into differences in the Python 2 and 3 standard libraries for handling CSVs. Dan created cordwainer, a generic CSV library that works in both Python 2 and 3.
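The core incompatibility is that Python 2's csv module works on byte streams while Python 3's works on text streams, so bytes must be decoded first on Python 3. A minimal sketch of the kind of shim involved (the function name is illustrative; cordwainer's actual API may differ):

```python
import csv
import io
import sys

def read_csv_bytes(raw_bytes, encoding="utf-8"):
    """Parse CSV data from bytes on both Python 2 and Python 3.

    Python 2's csv module expects byte streams; Python 3's expects
    text streams, so the bytes must be decoded before parsing there.
    """
    if sys.version_info[0] >= 3:
        stream = io.StringIO(raw_bytes.decode(encoding))
    else:
        stream = io.BytesIO(raw_bytes)
    return list(csv.reader(stream))

rows = read_csv_bytes(b'name,count\nwidget,3\n')
```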

ShipIt Day Project: Template Include Visualization

Victor and Caleb worked together to create a wonderful tool for debugging difficult and tangled Django template includes. The tool helps template developers edit templates without fear that they won’t know what pages on the live site may be affected by their changes. They used d3 to visualize the template in a way that was interactive and intuitive for template writers to get a handle on complex dependency trees.

Michael has been working on a migraine tracking app for iOS using PhoneGap and jQuery Mobile. He has been diving in and learning about distributing mobile apps using Xcode and interfacing with the phone calendar to store migraine data. In terms of the interface, Michael studied up on accessibility in creating an app whose primary audience will not want to dig into small details or stare long at a bright phone while enduring a migraine.

Karen, Vinod, and Tobias all worked together to help improve Caktus’ Django project template. Karen learned a lot about updating projects on servers provisioned with Salt while trying to close out one of the tickets on our project-template repository. The ticket she was working on concerned deleting stale Python byte code (.pyc) files that are left over when a Python source code file (.py) is deleted from a Git repository. These stale .pyc files can cause errors when they aren’t deleted properly during an upgrade. Vinod worked through many issues getting Docker, instead of VirtualBox with Vagrant, to create the virtual environments in which SaltStack runs when provisioning new servers. Docker is a lighter-weight environment than a full VirtualBox Linux server and would allow for faster iteration while developing provisioning code with SaltStack. Tobias improved the default logging configuration in the template to make it easier to debug errors when they occur, and also got started on some tools for integration testing of the project template itself.

Wray and Hunter collaborated to build a music composition and performance app called Whoppy (go ahead and try it out!). Whoppy uses Web Audio to create a new randomized virtual instrument every time you start the app. Wray and Hunter worked through creating a nice interface that highlights notes in the same key so that it is easier for amateur composers to have fun making music.

Og MacielThe End For Pylyglot


It was around 2005 when I started doing translations for Free and Open-Source Software. Back then I was warmly welcomed to the Ubuntu family and quickly learned all there was to know about using their Rosetta online tool to translate and/or review existing translations for the Brazilian Portuguese language. I spent so much time doing it, even during working hours, that eventually I sort of “made a name for myself” and made my way up to the upper layers of the Ubuntu Community echelon.

Then I “graduated” and started doing translations for the upstream projects, such as GNOME, Xfce, LXDE, and Openbox. I took on more responsibilities, learned to use Git and make commits for myself as well as for other contributors, and strived to unify all Brazilian Portuguese translations across as many different projects as possible. Many discussions were had, and (literally) hundreds of hours were spent going through hundreds of thousands of translations for hundreds of different applications, none of it bringing me any monetary or financial advantage, but all done for the simple pleasure of knowing that I was helping make FOSS applications “speak” Brazilian Portuguese.

I certainly learned a lot through the experience of working on these many projects… sometimes I made mistakes, other times I “fought” alone to make sure that standards and procedures were complied with. All in all, looking back I have only one regret: not being nominated to become the leader of the Brazilian GNOME translation team.

Having handled 50% of the translations for one of the GNOME releases (the other 50% was handled by a good friend, Vladimir Melo while the leader did nothing to help) and spent much time making sure that the release would go out the door 100% translated, I really thought I’d be nominated to become the next leader. Not that I felt that I needed a ‘title’ to show off to other people, but in a way I wanted to feel that my peers acknowledged my hard work and commitment to the project.

Seeing other people, even people with no previous experience, being nominated by the current leader to replace him was a slap in the face. It really hurt me… but I made sure to be supportive and continue to work just as hard. I guess you could say that I lived and breathed translations, my passion not knowing any limits or knowing when to stop…

But stop I eventually did, several years ago, when I realized how hard it was to land a job that would allow me to support my family (back then I had 2 small kids) and continue to do the thing I cared the most about. I confess that I even went through a series of job interviews for the translation role that Jono Bacon, Canonical’s former community manager, was trying to fill, but in the end things didn’t work out the way I wanted. I also flirted with a similar role at MeeGo, but since they wanted me to move to the West Coast I decided not to pursue it (I had also fallen in love with my then-current job).


As a way to keep myself somewhat involved with the translation communities and at the same time learn a bit more about the Django framework, I then created Pylyglot, “a web based glossary compendium for Free and Open Source Software translators heavily inspired by the Open-tran.eu web site… with the objective to ‘provide a concise, yet comprehensive compilation of a body of knowledge’ for translators derived from existing Free and Open Source Software translations.”


I have been running this service on my own, paying for the domain registration and database costs out of my own pocket for a while now, and I find myself facing the dilemma of renewing the domain registration to keep Pylyglot alive for another year… or retiring it and ending once and for all my relationship with FOSS translations.

Having spent the last couple of months thinking about it, I have now arrived at the conclusion that it is time to let this chapter of my life rest. Though the US$140/year that I won’t be spending won’t make me any richer, I don’t foresee myself either maintaining or spending any time improving the project. So this July 21st, 2014 Pylyglot will close its doors and cease to exist in its current form.

To those who knew about Pylyglot and used it and, hopefully, found it to be useful, my sincere thanks. To those who supported my idea and the project itself, whether by submitting code patches, building the web site, or just giving me moral support, thank you!

Caktus GroupRemoval of Mural

We have recently heard complaints about the painting over of the mural on the side of 108 Morris, the building we purchased and are restoring in Downtown Durham. I am personally distressed at this response. I see now, in retrospect, that we needed to work harder to discuss our decision with the community. In our enthusiasm to bring more life to Downtown Durham via ground-level retail space and offices for our staff, we were blind to what the public response to the mural's removal might be. Its removal was not a decision taken lightly, and it was made in consultation with the Historic Preservation Commission. However, we handled this poorly, and we apologize for not making more of an effort to include the community in this decision.

I do wish to emphasize that though we are moving from Carrboro to Durham, many of us are Durhamites, including two of three owners. Many in our small staff of 23 are far from outsiders. We raise our families in Durham. Our CTO, Colin Copeland, is co-captain of Code for Durham, a volunteer organization dedicated solely to giving more access to public records information for the citizens of Durham. Our interest in Downtown Durham is not theoretical; it is the place where we are building our lives… so this building project is a deeply personal one. We want to see Downtown Durham continue to thrive.

Unfortunately, in restoring a long abandoned historic building that had been remodeled by many hands over the decades, we had to make sacrifices. To return the building to its original 1910 state, we needed to unbrick the windows which would also remove sections of Emily Weinstein’s 1996 Eno River mural. The mural would receive further damage around the windows by default. Our contractor told us (and we could see) the mural had begun deteriorating. We were as diligent as humanly possible, referring often to great resources like Endangered Durham and Open Durham for images of the original building in making the final decision. It was a difficult decision and one that we, of course, could not make alone.

We tried our best to not only preserve, but to add to Durham. We submitted our proposal to the Historic Preservation Commission (HPC) and they approved it during a public meeting in April. They had already approved a similar proposal from the previous owner of the building. During the meeting, those who knew better than us-- actual preservationists-- said that going forward with the window openings would do more to preserve the integrity of the building than the more recent mural. These layers of approval made us feel we should proceed with our focus on restoration.

To further ensure we were doing right by Durham, we voluntarily and eagerly followed the guidelines of the National Park Service and the North Carolina State Historic Preservation Office for exterior restorations. The State Historic Preservation Offices and the National Park Service review the rehabilitation work to ensure that it complies with the Secretary’s Standards for Rehabilitation. As residents of Durham, we were excited and motivated at the prospect of further burnishing Downtown Durham’s reputation as a historic center.

Now, we see that we should not have assumed that the community would see and understand our sincere efforts to improve Downtown Durham. We strongly felt that occupation and restoration of a vacant building would be welcomed. We had not heard complaints until yesterday, which surprised us in part because our plans were public. We missed one phone call, and the caller did not respond to our return call. We are new to land development-- as a technology firm, we can safely say that it is not our focus. But we are made up of real people. We are a small firm that got its start thanks to the community around us, so again, it pains me to think we have hurt the community in any way.

In an effort to show our good faith and make amends, we’re planning on having a public meeting within the next few weeks. We are working to arrange a space for it, but will update you as soon as possible. We want to hear your thoughts and brainstorm together how we can better support our new home. We want to listen. We will also happily share with you how the restoration is coming along with photos and mock-ups of the space.

Please sign up to join our mailing list for updates and to find out when the public meeting will be: http://cakt.us/building-updates

Again, we are eager to hear your thoughts.

Sincerely, Tobias McNulty, CEO

Caktus GroupAnnouncing Q2 Caktus Charitable Giving

Caktus participates in social impact projects around the world, but we believe in starting local. We’re proud of the many ways in which our staff contribute to local organizations, each making the world around us just a little better. To further support our employees, Caktus asks employees to suggest donations every quarter. This quarter, we’re sending contributions to the following six non-profits:

RAIN: Regional AIDS Interfaith Network

RAIN engages the community to transform lives and promote respect and dignity for all people touched by HIV through compassionate care, education and leadership development. Caktus staff visited RAIN during a focus group test of a mobile HIV adherence application last year and admired their good work.

Urban Ministries of Durham

Urban Ministries of Durham welcomes more than 6,000 people each year who come seeking food, shelter, clothing and supportive services.

Ronald McDonald House of Chapel Hill

Each year, The Ronald McDonald House of Chapel Hill provides more than 2,200 families with seriously ill or injured children the basic necessities and comforts of home so that they can focus on caring for a sick child. Caktus’ contribution will shelter a family in need for one week.

Raleigh Review

The Raleigh Review mission is to foster the creation and availability of accessible yet provocative contemporary literature through our biannual magazine as well as through workshops, readings, and other community events.

LGBT Center of Durham

The LGBT Center of Raleigh is working in tandem with Durham community members to establish a Durham branch for local events, programs, and resources.

VOICES, the Chapel Hill Chorus

Voices is one of the Triangle’s oldest and most distinguished choral groups with a rich history spanning over three decades. Multiple Caktus employees participate. Caktus is providing financial support for promotional T-shirts for the group.

Caktus GroupTips for Upgrading Django

From time to time we inherit code bases running outdated versions of Django and part of our work is to get them running a stable and secure version. In the past year we've done upgrades from versions as old as 1.0 and we've learned a few lessons along the way.

Tests are a Must

You cannot begin a major upgrade without planning how you are going to test that the site works after the upgrade. Running your automated test suite with warnings enabled will flag new or pending deprecations. If you don’t have an automated test suite, now would be a good time to start one. You don't need 100% coverage, but the more you have, the more confident you will feel about the upgrade.

Integration tests with Django's TestClient can cover a lot of ground with just a few tests. You'll want to use these sparingly because they tend to be slow and fragile, but they can exercise your app much like a human would: submitting forms (both valid and invalid) and navigating to various pages. As you get closer to your final target version, or as you find more edge cases, you can add focused unit tests to cover those areas.

It is possible to do these upgrades without a comprehensive automated test suite, using only manual testing, but then you need a thorough plan to test the entire site. That type of testing is very slow and error-prone, and if you are upgrading across multiple Django versions it may have to be repeated multiple times.

Know Your Release Notes

Given Django's deprecation cycle, it's easiest to upgrade a project one Django version at a time. If you try to jump two releases, you may call Django APIs which no longer exist and you’ll miss the deprecation warnings that existed only in the releases you jumped over. Each version has a few big features and a few things which were changed or deprecated:

  • Django 1.1 added a number of new features to the admin, and most of its deprecations and breaking changes were related to the admin.
  • Django 1.2 added multiple database support and improved the CSRF framework, which deprecated the old DB and CSRF settings and code.
  • Django 1.3 introduced static file handling and class-based views. This started the deprecation of the old function-based generic views and the old-style url tag.
  • Django 1.4 changed the default project layout and the manage.py script. It also improved timezone support, and upgrading usually starts with tracking down RuntimeWarnings about naive datetimes.
  • Django 1.5 added the customizable user model, but more important in terms of upgrading was the removal of function-based generic views like direct_to_template and redirect_to. You can see a great post about changing from the built-in User to a custom User model on our blog. The url tag upgrade was also completed, so if your templates weren't updated yet, you'd have a lot of work to do.
  • Django 1.6 reworked transaction handling and deprecated all of the old transaction management API.
  • Finally, in the upcoming 1.7 version, Django will add built-in migrations, and projects will need to move away from South. The app-loading refactor is also landing, which changes how signals should be registered and how apps like the admin should manage auto-discovery.
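As a concrete sketch of the kind of mechanical change involved, here's what moving off a function-based generic view (removed in Django 1.5) looks like in a URLconf. The URL pattern and template name are hypothetical, and the exact import paths reflect the 1.4-era APIs discussed here:

```python
# urls.py -- before (Django 1.4 and earlier; removed in 1.5):
from django.conf.urls import patterns
from django.views.generic.simple import direct_to_template

urlpatterns = patterns('',
    (r'^about/$', direct_to_template, {'template': 'about.html'}),
)

# urls.py -- after (class-based TemplateView, available since Django 1.3):
from django.conf.urls import patterns
from django.views.generic import TemplateView

urlpatterns = patterns('',
    (r'^about/$', TemplateView.as_view(template_name='about.html')),
)
```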

Do Some Spring Cleaning

As you upgrade your project, remember that each Django version brings new features. Take the opportunity to refactor code which wasn't easily handled by older versions of Django. Django 1.2's "smart if" and "elif" (added in 1.4) can help clean up messy template blocks. Features like prefetch_related (added in 1.4) can help reduce queries on pages loading a number of related objects. The update_fields parameter on the save method (added in Django 1.5) is another place where applications can lower overhead and reduce the risk of parallel requests overwriting each other's data.

There will also be reusable third-party applications which are no longer compatible with the latest Django versions. However, in most cases there are better applications which are up to date. Switching reusable apps can be difficult and disruptive but in most cases it's better than letting it hold you back from the latest Django release. Unless you are willing to take on the full time maintenance of an app which is a release or two behind Django, you are better off looking for an alternative.


Those are the highlights from our experiences. Get a stable test suite in place before starting, take it one release at a time, and do some spring cleaning along the way. Overall it's much less work to upgrade projects as soon as possible after new Django versions are released. But we understand dependencies and other deadlines can get in the way. If you find yourself a few releases behind, we hope this can help guide you in upgrading.


Mark Lavin is the author of the forthcoming book, Lightweight Django, from O'Reilly Media.


Caktus GroupChapelboro.com: Carrboro Firm Develops Web App to Register Voters in Libya

Chapelboro.com recently featured Caktus’ work in implementing the first ever voter registration system via text message.

Caktus GroupO'Reilly Deal: 50% Off Lightweight Django

O'Reilly Media, the go-to source for technical books, just let us know that they're having a 50% off sale on eBook pre-orders of Lightweight Django today. Use coupon code: DEAL.

Lightweight Django is being written by our very own Technical Director, Mark Lavin and Caktus alumna Julia Elman. We would've thought the book was a fantastic intro to the power of Django in web app development anyway, but since Mark and Julia wrote it, we think it’s extra fantastic.

Mark and Julia are continuing to write, but O'Reilly is providing this special pre-release peek for pre-orders. Those that pre-order automatically receive the first three chapters, content as it’s being added, the complete ebook, free lifetime access, multiple file formats, and free updates.

Og MacielFauxFactory 0.3.0

Took some time from my vacation and released FauxFactory 0.3.0 to make it Python 3 compatible and to add a new generate_utf8 method (plus some nice tweaks and code clean up).

As always, the package is available on PyPI and can be installed via pip install fauxfactory.

If you have any constructive feedback or suggestions, or want to file a bug report or feature request, please use the GitHub page.

Caktus GroupCaktus + Durham Bulls Game!

Is there a better way to celebrate the first day of summer than a baseball game? To ring in summer, the Caktus team and their families attended a Durham Bulls game. It was a great chance to hang out in our new city before relocating later this fall.

Caktus @ Durham Bulls Game

Caktus GroupTechPresident: Libya Uses World's First Mobile Voter Registration System

Caktus team members from our Libya mobile voter registration team recently spoke with TechPresident about the context and challenges of implementation.

Caktus GroupCaktus Supports Libya Elections with World’s First SMS Voter Registration System

Today’s election in Libya, the second general election for a governing body since Gaddafi’s ouster, is being supported in-country by our Caktus team. Caktus developers created Libya's SMS voter registration system, the first of its kind in the world.

Since 2013, we have worked closely with the Libyan government to create mobile applications that would enable poll workers and citizens to register to vote. The system currently has over 1.5 million registrants. Using lessons learned in the first national test of the system during the February elections for the constitutional draft writers, we’re excited to be on the ground, supporting the Libyan government.

Our work includes data management, running reports to show progress throughout the day, and assisting poll workers in verifying registration data. With more than 12 tons of paper registrations that resulted from SMS registrations, the vast amount of data streaming to and from the system is keeping our team on their toes.

There are many news articles describing the political instability and significant security challenges faced by Libya. There is no question that the situation is difficult. However, we see the hope and excitement not only of Libya’s election staff, but also of the citizens of this fledgling democracy. We are proud to be amongst the organizations working to support Libya’s democratic transition.

Caktus GroupGetting Started Scheduling Tasks with Celery

Many Django applications can make good use of being able to schedule work, either periodically or just not blocking the request thread.

There are multiple ways to schedule tasks in your Django app, but there are some advantages to using Celery. It’s supported, scales well, and works well with Django. Given its wide use, there are lots of resources to help learn and use it. And once learned, that knowledge is likely to be useful on other projects.

Celery versions

This documentation applies to Celery 3.0.x. Earlier or later versions of Celery might behave differently.

Introduction to Celery

The purpose of Celery is to allow you to run some code later, or regularly according to a schedule.

Why might this be useful? Here are a couple of common cases.

First, suppose a web request has come in from a user, who is waiting for the request to complete so a new page can load in their browser. Based on their request, you have some code to run that's going to take a while (longer than the person might want to wait for a web page), but you don't really need to run that code before responding to the web request. You can use Celery to have your long-running code called later, and go ahead and respond immediately to the web request.

This is common if you need to access a remote server to handle the request. Your app has no control over how long the remote server will take to respond, or the remote server might be down.

Another common situation is wanting to run some code regularly. For example, maybe every hour you want to look up the latest weather report and store the data. You can write a task to do that work, then ask Celery to run it every hour. The task runs and puts the data in the database, and then your Web application has access to the latest weather report.

A task is just a Python function. You can think of scheduling a task as a time-delayed call to the function. For example, you might ask Celery to call your function task1 with arguments (1, 3, 3) after five minutes. Or you could have your function batchjob called every night at midnight.

We'll set up Celery so that your tasks run in pretty much the same environment as the rest of your application's code, so they can access the same database and Django settings. There are a few differences to keep in mind, but we'll cover those later.

When a task is ready to be run, Celery puts it on a queue, a list of tasks that are ready to be run. You can have many queues, but we'll assume a single queue here for simplicity.

Putting a task on a queue just adds it to a to-do list, so to speak. In order for the task to be executed, some other process, called a worker, has to be watching that queue for tasks. When it sees tasks on the queue, it'll pull off the first and execute it, then go back to wait for more. You can have many workers, possibly on many different servers, but we'll assume a single worker for now.

We'll talk more later about the queue, the workers, and another important process that we haven't mentioned yet, but that's enough for now, let's do some work.
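To make the queue-and-worker idea concrete, here's a toy sketch in plain Python. This illustrates the concept only; it is not how Celery is implemented:

```python
# A toy illustration of the queue/worker model described above (NOT Celery).
from collections import deque

queue = deque()  # the "to-do list" of tasks waiting to be run

def schedule(func, *args):
    """Put a task on the queue (conceptually what Celery's .delay() does)."""
    queue.append((func, args))

def worker_step():
    """What a worker does: pull the oldest task off the queue and execute it."""
    func, args = queue.popleft()
    return func(*args)

def add(x, y):
    return x + y

schedule(add, 2, 2)   # returns immediately; nothing has run yet
print(worker_step())  # a worker picks the task up and executes it: prints 4
```

The caller that schedules the task never blocks; all the real work happens when a worker gets around to pulling the task off the queue.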

Installing celery locally

Installing celery for local use with Django is trivial - just install django-celery:

$ pip install django-celery

Configuring Django for Celery

To get started, we'll just get Celery configured to use with runserver. For the Celery broker, which we will explain more about later, we'll use a Django database broker implementation. For now, you just need to know that Celery needs a broker and we can get by using Django itself during development (but you must use something more robust and better performing in production).

In your Django settings.py file:

  1. Add these lines:
import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'

The first two lines are always needed. Line 3 configures Celery to use its Django broker.

Important: Never use the Django broker in production. We are only using it here to save time in this tutorial. In production you'll want to use RabbitMQ, or maybe Redis.

  2. Add djcelery and kombu.transport.django to INSTALLED_APPS:

djcelery is always needed. kombu.transport.django is the Django-based broker, for use mainly during development.

  3. Create Celery's database tables. If using South for schema migrations:
$ python manage.py migrate

If not using South:

$ python manage.py syncdb

Writing a task

As mentioned before, a task can just be a Python function. However, Celery does need to know about it. That's pretty easy when using Celery with Django. Just add a tasks.py file to your application, put your tasks in that file, and decorate them. Here's a trivial tasks.py:

from celery import task

@task()
def add(x, y):
    return x + y

When djcelery.setup_loader() runs from your settings file, Celery will look through your INSTALLED_APPS for tasks.py modules, find the functions marked as tasks, and register them for use as tasks.

Marking a function as a task doesn't prevent calling it normally. You can still call it: z = add(1, 2) and it will work exactly as before. Marking it as a task just gives you additional ways to call it.

Scheduling it

Let's start with the simple case we mentioned above. We want to run our task soon, we just don't want it to hold up our current thread. We can do that by just adding .delay to the name of our task:

from myapp.tasks import add

add.delay(2, 2)

Celery will add the task to its queue ("worker, please call myapp.tasks.add(2, 2)") and return immediately. As soon as an idle worker sees it at the head of the queue, the worker will remove it from the queue, then execute it:

import myapp.tasks

myapp.tasks.add(2, 2)

A warning about import names

It's important that your task is always imported and referred to using the same package name. For example, depending on how your Python path is set up, it might be possible to refer to it as either myproject.myapp.tasks.add or myapp.tasks.add. Or from myapp.views, you might import it as .tasks.add. But Celery has no way of knowing those are all the same task.

djcelery.setup_loader() will register your task using the package name of your app in INSTALLED_APPS, plus .tasks.functionname. Be sure when you schedule your task, you also import it using that same name, or very confusing bugs can occur.

Testing it

Start a worker

As we've already mentioned, a separate process, the worker, has to be running to actually execute your Celery tasks. Here's how we can start a worker for our development needs.

First, open a new shell or window. In that shell, set up the same Django development environment - activate your virtual environment, or add things to your Python path, whatever you do so that you could use runserver to run your project.

Now you can start a worker in that shell:

$ python manage.py celery worker --loglevel=info

The worker will run in that window, and send output there.

Run your task

Back in your first window, start a Django shell and run your task:

$ python manage.py shell
>>> from myapp.tasks import add
>>> add.delay(2, 2)

You should see output in the worker window indicating that the worker has run the task:

[2013-01-21 08:47:08,076: INFO/MainProcess] Got task from broker: myapp.tasks.add[e080e047-b2a2-43a7-af74-d7d9d98b02fc]
[2013-01-21 08:47:08,299: INFO/MainProcess] Task myapp.tasks.add[e080e047-b2a2-43a7-af74-d7d9d98b02fc] succeeded in 0.183349132538s: 4

An Example

Earlier we mentioned using Celery to avoid delaying responding to a web request. Here's a simplified Django view that uses that technique:

# views.py

def view(request):
    form = SomeForm(request.POST)
    if form.is_valid():
        data = form.cleaned_data
        # Schedule a task to process the data later
        do_something_with_form_data.delay(data)
    return render_to_response(...)

# tasks.py

from celery import task

@task()
def do_something_with_form_data(data):
    call_slow_web_service(data['user'], data['text'], ...)


Troubleshooting

It can be frustrating trying to get Celery tasks working, because multiple parts have to be present and communicating with each other. Many of the usual tips still apply:

  • Get the simplest possible configuration working first.
  • Use the python debugger and print statements to see what's going on.
  • Turn up logging levels (e.g. --loglevel debug on the worker) to get more insight.

There are also some tools that are unique to Celery.

Eager scheduling

In your Django settings, you can add:

CELERY_ALWAYS_EAGER = True

and Celery will bypass the entire scheduling mechanism and call your code directly.

In other words, with CELERY_ALWAYS_EAGER = True, these two statements run just the same:

add.delay(2, 2)
add(2, 2)

You can use this to get your core logic working before introducing the complication of Celery scheduling.

Peek at the Queue

As long as you're using Django itself as your broker for development, your queue is stored in a Django database. That means you can look at it easily. Add a few lines to admin.py in your application:

from django.contrib import admin
from kombu.transport.django import models as kombu_models

admin.site.register(kombu_models.Message)

Now you can go to /admin/django/message/ to see if there are items on the queue. Each message is a request from Celery for a worker to run a task. The contents of the message are rather inscrutable, but just knowing if your task got queued can sometimes be useful. The messages tend to stay in the database, so seeing a lot of messages there doesn't mean your tasks aren't getting executed.

Check the results

Anytime you schedule a task, Celery returns an AsyncResult object. You can save that object, and then use it later to see if the task has been executed, whether it was successful, and what the result was.

result = add.delay(2, 2)
if result.ready():
    print "Task has run"
    if result.successful():
        print "Result was: %s" % result.result
    else:
        if isinstance(result.result, Exception):
            print "Task failed due to raising an exception"
            raise result.result
        else:
            print "Task failed without raising exception"
else:
    print "Task has not yet run"

Periodic Scheduling

Another common case is running a task on a regular schedule. Celery implements this using another process, celerybeat. Celerybeat runs continually, and whenever it's time for a scheduled task to run, celerybeat queues it for execution.

For obvious reasons, only one celerybeat process should be running (unlike workers, where you can run as many as you want and need).

Starting celerybeat is similar to starting a worker. Start another window, set up your Django environment, then:

$ python manage.py celery beat

There are several ways to tell celery to run a task on a schedule. We're going to look at storing the schedules in a Django database table. This allows you to easily change the schedules, even while Django and Celery are running.

Add this setting:

CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'

You can now add schedules by opening the Django admin and going to /admin/djcelery/periodictask/. Here's how the fields on a new periodic task are used:

  • Name — Any name that will help you identify this scheduled task later.
  • Task (registered) — This should give a choice of any of your defined tasks, as long as you've started Django at least once after adding them to your code. If you don't see the task you want here, it's better to figure out why and fix it than use the next field.
  • Task (custom) — You can enter the full name of a task here (e.g. myapp.tasks.add), but it's better to use the registered tasks field just above this.
  • Enabled — You can uncheck this if you don't want your task to actually run for some reason, for example to disable it temporarily.
  • Interval — Use this if you want your task to run repeatedly with a certain delay in between. You'll probably need to use the green "+" to define a new schedule. This is pretty simple, e.g. to run every 5 minutes, set "Every" to 5 and "Period" to minutes.
  • Crontab — Use crontab, instead of Interval, if you want your task to run at specific times. Use the green "+" and fill in the minute, hour, day of week, day of month, and month of year. You can use "*" in any field in place of a specific value, but be careful - if you use "*" in the Minute field, your task will run every minute of the hour(s) selected by the other fields. Example: to run every morning at 7:30 am, set Minute to "30", Hour to "7", and the remaining fields to "*".
  • Arguments — If you need to pass arguments to your task, you can open this section and set *args and **kwargs.
  • Execution Options — Advanced settings that we won't go into here.
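If you'd rather define schedules in code instead of the database (using Celery's default scheduler rather than the DatabaseScheduler described above), Celery also reads schedules from the CELERYBEAT_SCHEDULE setting. A sketch, using the Celery 3.0-era setting names; the task names are hypothetical:

```python
# settings.py
from datetime import timedelta
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'add-every-five-minutes': {
        'task': 'myapp.tasks.add',            # registered task name
        'schedule': timedelta(minutes=5),     # interval-style schedule
        'args': (2, 2),
    },
    'morning-report': {
        'task': 'myapp.tasks.send_report',    # hypothetical task
        'schedule': crontab(minute=30, hour=7),  # every morning at 7:30 am
    },
}
```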

Default schedules

If you want some of your tasks to have default schedules, and not have to rely on someone setting them up in the database after installing your app, you can use Django fixtures to provide your schedules as initial data for your app.

  • Set up the schedules you want in your database.
  • Dump the schedules in json format:
$ python manage.py dumpdata djcelery --indent=2 --exclude=djcelery.taskmeta >filename.json
  • Create a fixtures directory inside your app
  • If you never want to edit the schedules again, you can copy your json file to initial_data.json in your fixtures directory. Django will load it every time syncdb is run, and you'll either get errors or lose your changes if you've edited the schedules in your database. (You can still add new schedules, you just don't want to change the ones that came from your initial data fixture.)
  • If you just want to use these as the initial schedules, name your file something else, and load it when setting up a site to use your app:
$ python manage.py loaddata <your-app-label>/fixtures/your-filename.json

Hints and Tips

Don't pass model objects to tasks

Since tasks don't run immediately, by the time a task runs and looks at a model object that was passed to it, the corresponding record in the database might have changed. If the task then does something to the model object and saves it, those changes in the database are overwritten by older data.

It's almost always safer to save the object, pass the record's key, and look up the object again in the task:

from celery import task

@task()
def mytask(pk):
    myobject = MyModel.objects.get(pk=pk)

Schedule tasks in other tasks

It's perfectly all right to schedule one task while executing another. This is a good way to make sure the second task doesn't run until the first task has done some necessary work first.

Don't wait for one task in another

If a task waits for another task, the first task's worker is blocked and cannot do any more work until the wait finishes. This is likely to lead to a deadlock, sooner or later.

If you're in Task A and want to schedule Task B, and after Task B completes, do some more work, it's better to create a Task C to do that work, and have Task B schedule Task C when it's done.

Next Steps

Once you understand the basics, parts of the Celery User's Guide are good reading. I recommend these chapters to start with; the others are either not relevant to Django users or more advanced:

Using Celery in production

The Celery configuration described here is for convenience in development, and should never be used in production.

The most important change to make in production is to stop using kombu.transport.django as the broker, and switch to RabbitMQ or something equivalent that is robust and scalable.

Caktus GroupReflecting on Challenges Faced by Female Developers

Karen Tracey, a Django core committer and Caktus Lead Developer and Technical Manager, recently participated in TriLUG’s panel on Women in Free and/or Open Source Software. Karen was one of five female developers who discussed challenges women face in joining the open source community. We recently caught up with Karen to discuss her own experience.

Why do you think there are so few women software developers?
This question always comes up. There’s no good single answer. I think there are implicit and explicit messages women get from a very young age that this is not for them. Nobody really knows the complete answer. It was great to see a lot of women come to the meeting. I hope the panel was useful for them and encourages increased participation.

Did you think of computer science as a “boy’s only” field?
I’m old enough that when I was entering the field, women’s participation in computer science was at its highest levels. I entered at a time when women were joining a lot of professional fields-- law, medicine, etc. I had no reason to think computer engineering was different.

Also, I had this bubble with technical parents. My father worked for IBM, as had my mother before having children, and I had an IBM PC the year they came out. Also, I went to an all-girl’s high school and I think that helped in the sense that there was no boy’s group to say this is a boy’s thing. For me, there wasn’t a lot of pushing away that younger girls now see in the field.

I think enrollment in computer science degree programs was at its highest when I went to college over twenty-five years ago. Notre Dame had far more men than women at the time, so the single-digit number of women in a class of around 100 seemed more like a reflection of the school’s gender ratio. I was not limited in what I could do.

Did you receive any negative messages at the beginning of your career?
I did with a professor who flat-out stated women shouldn’t be in technical fields. I was a grad student at the time and had received enough positive feedback by then that his opinion did not hold much weight with me, plus he said I was an “exception”. But his message could have been quite inhibiting to me a few years earlier I think. There have been multiple gender-related dustups through the overall open source community. When I first started using Django, I did question whether to sign my own name on my very first question posted to the django-users mailing list. I didn’t know if it was wise to reveal I was a woman before I was established in the community. I did and got an excellent welcome, but I was not sure what to expect having read about various ways in which women were disrespected in such communities.

What do you think individuals in the open source community can do to increase participation by women?
Be welcoming, including explicit individual invitations to attend/participate (this came up during the panel). Be aware that many women may have been receiving these “this is not for you” messages from a young age and try to counteract it. Be observant and try to notice any behavior by others which may be unwelcoming. If you see unwelcoming or bad behavior, take steps to correct it. For example, if someone makes an inappropriate joke, don’t just ignore it but rather make it clear to the joke-teller and whatever group that heard it that you don’t find it funny or appropriate.

Caktus GroupCTO Copeland Featured on WNCN for Open Government App

Colin Copeland, our Chief Technology Officer, recently spoke to WNCN about a new web application, NCFoodInspector.com, that lets Durham County visitors know the cleanliness of nearby restaurants. Colin helped build the application in his spare time as captain of Code for Durham Brigade, an all-volunteer group dedicated to using technology to improve access to publicly available sanitation scores. The group leverages open source technology to build applications.

NCFoodInspector.com displays a map and listing of restaurants, their sanitation scores, and details of any violations. This makes difficult-to-access health inspection information readily available for the first time. To ensure the app reaches multiple populations, it is also available in Spanish.

Colin says this is just the first of many future applications. The Brigade hopes to build more apps that can serve as a resource to the Durham County community using public information.

To view Colin’s interview, visit WNCN.

Og MacielTwenty Three Years

My parents were eagerly awaiting our arrival on an early Spring morning, and when our plane finally landed after the almost 10 1/2 hour flight and we made our way to the luggage claim area, the reunion was filled with a lot of hugging, laughter and a huge sigh of relief. For someone who had spent most of their entire life in a small and sleepy town on the east coast of Brazil, waking up and finding yourself at JFK Airport was nothing short of a major event! I had never seen so many people of so many different races and speaking so many different dialects in my entire life, all 16 years of it! Everywhere I looked, everything was so different from what I was used to… even signs (so many of them) were in a different language! Eventually we grabbed our luggage and made our way to the parking lot looking for our car.

Before my sister and I left Brazil, I had the very hard task of giving away all of my possessions and only bringing the bare minimum to start “a new life”. I was still going through my mid-teenage years, so I had to give away all of my favorite music LPs, books, childhood toys, and all the mementos I had collected through the years. This may not be such a big deal to you, but I have always been very attached to the things people give me, especially if they were given by someone I really cared about. Seeing the things that represented so many people and moments of my life slowly drifting away filled me with a great feeling of personal loss. This feeling would stay with me for the next couple of years as I tried to adjust to my new adopted country. I was a stranger in a different land, where nobody knew me and I did not know anyone.

It’s been 23 years since this event took place, and I’m still here in the “Land of the Free”. Through the years I have survived High School, graduated with a Bachelor of Science from a university in Upstate New York, married (another immigrant, from another country, whom you shall meet soon), moved a couple of times, and now find myself raising three young girls in North Carolina, the first Maciel generation of our families to be born outside our countries! Our similarities and differences, however, go beyond only the generation gap!

You see, contrary to a lot of the “stereotypical” immigrant families, we have completely immersed ourselves into the American way of life and culture, with a dash of our childhood cultures sprinkled here and there to add a little diversity to the mix. My wife and I stopped observing the holidays from our countries of origin a long time ago, especially those with no corresponding holidays here. We share a lot of the things that we learned growing up with our kids, but always in a nostalgic, almost didactic sort of way. We speak a mix of Brazilian Portuguese-Mexican Spanish-New Jersey English at home and try our best not to force our children to learn any one language in particular. As it stands now, our kids’ primary language is English, and even though I still make a habit of speaking in Brazilian Portuguese to them, their vocabulary consists of several words that they only say either in Spanish or Portuguese, like the word “daddy”. My wife’s vocabulary has also gone through a very interesting transformation, and she now speaks more Portuguese than Spanish when talking to our kids. Maybe it is because she was very young when she moved to New York in the early 1990s and never really got a lot of exposure to the Spanish language growing up in a different country.

All I can say is that I call North Carolina home, I vote during elections, I always get emotional when hearing the American Anthem, and together with my wife I raise the next generation of the Maciel family! Maybe they will take some of our culture and teach it to their own kids one day… maybe one day they may even learn to speak Portuguese or Spanish… maybe they won’t, and that is ok by me. We don’t even force them to follow the same religion our parents (and their parents) taught us growing up, preferring that they make that decision on their own, when and if they’re ever interested in doing so. We want them to be able to choose their own paths and make educated decisions about every aspect of their lives without any pressure or guilt.

I’m an American-Brazilian, my wife is American-Mexican and our kids are Americans with a touch of Brazilian and Mexican pride and culture. Together we form the New American Family!

Frank WierzbickiRough cut of Jython devguide.

A while ago I started a port of the CPython devguide, but I never got it to the point that I felt I could release it. I've decided that it's better to have an incomplete version out there than no Jython devguide at all, so I'm doing a soft launch. It still contains many CPython-specific instructions that don't actually apply to Jython, and it certainly has gaps. Some of the gaps are flagged by TODO notes. Please feel free to comment or, best of all, send patches! Patches can be made against the main devguide (for enhancements that apply to both CPython and Jython) or, for Jython-only changes, the Jython fork.

Joe GregorioNo more JS frameworks

Stop writing Javascript frameworks.

Translations: Japanese

JavaScript frameworks seem like death and taxes; inevitable and unavoidable. I'm sure that if I could be a fly on that wall every time someone started a new web project, the very first question they'd ask is, which JS framework are we using? That's how ingrained the role of JS frameworks is in the industry today. But that's not the way it needs to be, and actually, it needs to stop.

Let's back up and see how we got here.

Angular and Backbone and Ember, oh my.

For a long time the web platform, the technology stack most succinctly described as HTML+CSS+JS, was, for lack of a better term, a disaster. Who can forget the IE box model, or the layer tag? I'm sure I just started several of you twitching with flashbacks to the bad old days of web development with just those words.

For a long time there was a whole lot of inconsistency between browsers and we, as an industry, had to write frameworks to paper over them. The problem is that there was disagreement even on the fundamental issues among browsers, like how events propagate, or what tags to support, so every framework not only papered over the holes, but designed their own model of how the browser should work. Actually their own models, plural, because you got to invent a model for how events propagate, a model for how to interact with the DOM, etc. A lot of inventing went on. So frameworks were written, each one a snowflake, a thousand flowers bloomed and gave us the likes of jQuery and Dojo and MochiKit and Ext JS and AngularJS and Backbone and Ember and React. For the past ten years we’ve been churning out a steady parade of JS frameworks.

But something else has happened over the past ten years: browsers got better. Their support for standards improved, and now there are evergreen browsers: automatically updating browsers, each version more capable and standards compliant than the last.

With newer standards landing in every release, I think it's time to rethink the model of JS frameworks. There's no need to invent yet another way to do something; just use HTML+CSS+JS.

So why are we still writing JS frameworks? I think a large part of it is inertia; it's habit. But is that so bad? It's not like frameworks are actively harmful, right? Well, let's first start off by defining what I mean by a web framework. There's actually a gradient of code that starts with a simple snippet, such as a Gist, and moves up through larger and larger collections of code: libraries, and finally frameworks:

gist -> library -> framework

Frameworks aren't just big libraries, they have their own models for how to interact with events, with the DOM, etc. So why avoid frameworks?

Abstractions

One of the problems with frameworks is usually one of their selling points: they abstract away the platform so you can concentrate on building your own software. The problem is that now you have two systems to learn, HTML+CSS+JS and the framework. Sure, if the framework were a perfect abstraction of the web as a platform you would never have to go beyond the framework, but guess what: abstractions leak. So you need to know HTML+CSS+JS, because at some point your program won't work the way you expect it to, and you’ll have to dig down through all the layers in the framework to figure out what's wrong, all the way down to HTML+CSS+JS.

Mapping the iceberg.

A framework is like an iceberg: the 10% floating above the water doesn't look dangerous; it's the hidden 90% that will eventually get you. Actually, the metaphor is even more apt than that: learning a framework is like mapping an iceberg. In order to use the framework you have to apply the effort of mapping out the entire thing, and in the long run the process is pointless because the iceberg is going to melt anyway.

Widgets

Another selling point of frameworks is that you get access to a library of widgets. But really, you shouldn't need to adopt a framework to get access to widgets; they should all be orthogonal and independent. A good example of this today is CodeMirror, a syntax-highlighting code editor built in JavaScript. You can use it anywhere; no framework needed.

There is also the lost effort of building widgets for a framework. Remember all those MochiKit widgets you wrote? Yeah, how much good are they doing you now that you've migrated to Ember, or Angular?

Data Binding

Honestly, I've never needed it, but if you do, it should come in the form of a library and not a framework.

The longer term problem with frameworks is that they end up being silos, they segment the landscape, widgets built for framework A don't work in framework B. That's lost effort.

So what does a post-framework world look like?

HTML+CSS+JS are my framework.

The fundamental idea is that frameworks aren't needed, use the capabilities already built into HTML+CSS+JS to build your widgets. Break apart the monoliths into orthogonal components that can be mixed in any combination. The final pieces that enable all of this fall under the umbrella of Web Components.

HTML Imports, HTML Templates, Custom Elements, and Shadow DOM are the enabling technologies that should allow us to cut the cord from frameworks, allowing the creation of reusable elements and functionality. For a much better introduction see these articles and libraries:

So, we all create <x-flipbox>'s, declare victory, and go home?

No, not actually. The first thing you need for working with Web Components is a set of polyfills for that functionality, such as X-Tag and Polymer. The need for those will decrease over time as browsers flesh out their implementations of those specifications.

A point to be stressed here is that these polyfills aren't frameworks that introduce their own models to developing on the web, they enable the HTML 5 model. But that isn't really the only need, there are still minor gaps in the platform where one browser deviates in a small way from current standards, and that's something we need to polyfill. MDN seems to have much of the needed code, as the documentation frequently contains short per-function polyfills.

So one huge HTML 5 Polyfill library would be good, but even better would be what I call html-5-polyfill-o-matic, a set of tools that allows me to write Web Components via bog standard HTML+JS and then after analyzing my code, either via static analysis or via Object.observe at runtime, it produces a precise subset of the full HTML 5 polyfill for my project.

This sort of functionality will be even more important as I start trying to mix and match web components and libraries from multiple sources, i.e. an <x-foo> from X-Tag and a <core-bar> from Polymer, does that mean I should have to include both of their polyfill libraries? (It turns out the answer is no.) And how exactly should I get these custom elements? Both X-Tag and Brick have custom bundle generators:

If I start creating custom elements do I need to create my own custom bundler too? I don't think that's a scalable idea, I believe we need idioms and tools that handle this much better. This may actually mean changing how we do open source; a 'widget' isn't a project, so our handling of these things needs to change. Sure, still put the code in Git, but do you need the full overhead of a GitHub project? Something lighter weight, closer to a Gist than a current project might be a better fit. How do I minimize/vulcanize all of this code into the right form for use in my project? Something like Asset Graph might be a good start on that.

So what do we need now?

  1. Idioms and guidelines for building reusable components.
  2. Tools that work under those idioms to compile, crush, etc. all that HTML, CSS, and JS.
  3. A scalable HTML 5 polyfill, full or scaled down based on what's really used.

That's what we need to build a future where we don't need to learn the latest model of the newest framework, instead we just work directly with the platform, pulling in custom elements and libraries to fill specific needs, and spend our time building applications, not mapping icebergs.


Q: Why do you hate framework authors?

A: I don’t hate them. Some of my best friends are framework authors. I will admit a bit of inspiration from the tongue-in-cheek you have ruined javascript, but again, no derision intended for framework authors.

Q: You can’t do ____ in HTML5, for that you need a framework.

A: First, that's not a question. Second, thanks for pointing that out. Now let's work together to add the capabilities to HTML 5 that allows ____ to be done w/o a framework.

Q: But ___ isn't a framework, it's a library!

A: Yeah, like I said, it’s a gradient from gist to framework, and you might draw the lines slightly differently from me. That's OK, this isn't about the categorization of any one particular piece of software, it's about moving away from frameworks.

Q: I've been doing this for years, with ___ and ___ and ___.

A: Again, that's not a question, but regardless, good for you, you should be in good shape to help everyone else.

Q: So everyone needs to rewrite dropdown menus, tabs, sliders and toggles themselves?

A: Absolutely not, the point is there should be a way to create those elements in a way that doesn't require buying into one particular framework.

Q: Dude, all those HTML Imports are going to kill my sites performance.

A: Yes, if you implemented all this stuff naively it would, which is why I mentioned the need for tools to compile and crush all the HTML, CSS, and JS.

Q: So I'm not supposed to use any libraries?

A: No, that's not what I said. I was very careful to delineate a line between libraries and frameworks: a library provides an orthogonal piece of functionality that can be used with other libraries. Libraries are fine; it's the frameworks that demand 100% buy-in that I'd like to see us move away from.

Q: But I like data binding!

A: Lots of people do; I was only expressing a personal preference. I didn't say that you shouldn't use data binding, only that you don't need to adopt an entire framework to get it; there are standalone libraries for that.

Og MacielFauxFactory 0.2.1

paper bag release

Hot on its heels, today I’m releasing FauxFactory 0.2.1 to fix a brown-paper-bag bug I encountered last night before going to bed.

Basically, the new “Lorem Ipsum” generator was not honoring the words parameter if you asked for a string longer than 70 characters. I have fixed the issue as well as added a new test to make sure that the generator does the right thing.

The package is available on PyPI (sadly the page is still not rendering correctly… suggestions welcome) and can be installed via pip install fauxfactory.

If you have any constructive feedback or suggestions, or want to file a bug report or feature request, please use the GitHub page.

Image: Cry by LLewleyn Williams a.k.a. SCUD, some rights reserved.

Og MacielFauxFactory 0.2.0

Today I’m releasing FauxFactory 0.2.0 with a new feature: a “Lorem Ipsum” generator. I confess that I did not look around for any existing Python implementation out there and just started writing code. My idea was to create a method that would:

Return a “Lorem Ipsum” string if I passed no arguments:

In [1]: from fauxfactory import FauxFactory

In [2]: FauxFactory.generate_iplum()
Out[2]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.'

Return a single paragraph with a fixed number of words if I passed a numeric words=x argument. If words was a large number, the text would ‘wrap around’ as many times as needed:

In [3]: FauxFactory.generate_iplum(words=8)
Out[3]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit.'

In [4]: FauxFactory.generate_iplum(words=80)
Out[4]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.'

If paragraphs=x was used, then a given number of paragraphs containing the entire “Lorem Ipsum” string is returned:

In [5]: FauxFactory.generate_iplum(paragraphs=1)
Out[5]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.'

In [6]: FauxFactory.generate_iplum(paragraphs=2)
Out[6]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.\nLorem ipsum
dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.'

Finally, if both words and paragraphs are used, then a given number of paragraphs with the specified number of words is returned, with the text ‘flowing’ and ‘wrapping around’ as needed:

In [7]: FauxFactory.generate_iplum(paragraphs=1, words=7)
Out[7]: u'Lorem ipsum dolor sit amet, consectetur adipisicing.'

In [8]: FauxFactory.generate_iplum(paragraphs=3, words=7)
Out[8]: u'Lorem ipsum dolor sit amet, consectetur adipisicing.\nElit,
sed do eiusmod tempor incididunt ut.\nLabore et dolore magna aliqua.
Ut enim.'
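The ‘wrap around’ behavior above can be sketched in a few lines. This is an illustrative re-implementation, not FauxFactory's actual code; `LOREM_WORDS` and `iplum_words` are made-up names, and the real source text is longer:

```python
import itertools

# Abbreviated source text for the generator (punctuation stripped).
LOREM_WORDS = ("Lorem ipsum dolor sit amet consectetur adipisicing elit "
               "sed do eiusmod tempor incididunt ut labore et dolore "
               "magna aliqua").split()

def iplum_words(words):
    """Return exactly `words` words, cycling back to the start of the
    source text whenever more words are requested than it contains."""
    cycled = itertools.islice(itertools.cycle(LOREM_WORDS), words)
    return " ".join(cycled)

print(iplum_words(5))  # -> Lorem ipsum dolor sit amet
```

The real generator layers capitalization, punctuation, and the paragraphs handling on top of this cycling idea.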

The package is available on PyPI (sadly the page is not rendering correctly… suggestions welcome) and can be installed via pip install fauxfactory.

If you have any constructive feedback or suggestions, or want to file a bug report or feature request, please use the GitHub page.

Frank WierzbickiJython 2.7 beta2 released!

Update: This release of Jython requires JDK 7 or above.

On behalf of the Jython development team, I'm pleased to announce that the second beta of Jython 2.7 is available. I'd like to thank Adconion Media Group for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Jython 2.7b2 brings us up to language-level compatibility with the 2.7 version of CPython. We have focused largely on CPython compatibility, and so this release of Jython can run more pure-Python apps than any previous release. Please see the NEWS file for detailed release notes. This is primarily a bugfix release, with numerous improvements, including much-improved support on Windows.

As a beta release we are concentrating on bug fixing and stabilization for a production release.

This release is being hosted at Maven Central. The traditional installer can be found here. See the installation instructions for using the installer. Three other versions are available:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Og MacielHiring is Tough!

So I’ve been trying to hire two Python developers to join my automation team here at Red Hat since last November, 2013… and believe it or not, so far I’ve had absolutely zero success in finding good, strong candidates with real-world experience in North Carolina! I either find really smart people who have relevant backgrounds or could ‘hit the ground running’ but are way out of my current budget, or people who lack real-world experience and fall into more of an entry-level position.

Basically I’m looking for someone who not only can ‘speak Python’ fluently but also has experience doing automation and writing tests, as well as that ‘QE mindset’ that makes you want to automate all things and question all things about a product! Someone who knows how to file a good bug report and how to provide pertinent, relevant information to developers so that they can fix a bug. Finally, someone who believes in continuous integration and is excited about an opportunity to improve and augment our existing testing framework and work with a very exciting product, knowing that your contributions will affect literally thousands of customers worldwide!

Bonus points if you know what Selenium is and have played with Paramiko and/or Requests!!!

Does that interest you? Feel that you got what I’m looking for? Then take a peek at these two positions and apply fast!

Caktus GroupCaleb and Rebecca at this Month’s Girl Develop It Intro to Python Class

One of Caktus’ most pedagogically focused developers, Caleb Smith, will be teaching a class to a group of local budding Pythonistas tomorrow, Saturday the 26th, and Caktus’ Rebecca Lovewell will be contributing as a teaching assistant. You can read more about it, and sign up, via the meetup page for the event. The class is run by the local chapter of Girl Develop It, a group focused on improving the landscape of women in tech via women-focused (but not women-exclusive) educational opportunities.

This class is a labor of love for Caleb and Rebecca who contribute for fun and as a way to help out new coders. Caleb has developed his curriculum in the open using a GitHub repository with input from Rebecca, Nick Lang, and Leslie Ray. It’s great to see a distributed team collaborating using development tools to create curriculum that ultimately gets more women involved in technology through local classes.

Josh JohnsonCentralized Ansible Management With Knockd + Auto-provisioning with AWS

Ansible is a great tool. We’ve been using it at my job with a fair amount of success. When it was chosen, we didn’t have a requirement for supporting Auto scaling groups in AWS. This offers a unique problem – we need machines to be able to essentially provision themselves when AWS brings them up. This has interesting implications outside of AWS as well. This article covers using the Ansible API to build just enough of a custom playbook runner to target a single machine at a time, and discusses how to wire it up to knockd, a “port knocking” server and client, and finally how to use user data in AWS to execute this at boot – or any reboot.

Ansible – A “Push” Model

Ansible is a configuration management tool used in orchestration of large pieces of infrastructure. It’s structured as a simple layer above SSH – but it’s a very sophisticated piece of software. Bottom line, it uses SSH to “push” configuration out to remote servers – this differs from some other popular approaches (like Chef, Puppet and CFEngine) where an agent is run on each machine, and a centralized server manages communication with the agents. Check out How Ansible Works for a bit more detail.

Every approach has its advantages and disadvantages – discussing the nuances is beyond the scope of this article, but the primary disadvantage of Ansible is also one of its strongest advantages: it’s decentralized and doesn’t require agent installation. The problem arises when you don’t know your inventory (Ansible-speak for “the list of all your machines”) beforehand. This can be mitigated with inventory plugins. However, when you have to configure machines that are being spun up dynamically, and that need to be configured quickly, the push model starts to break down.

Luckily, Ansible is highly compatible with automation, and provides a very useful python API for specialized cases.

Port Knocking For Fun And Profit

Port knocking is a novel way of invoking code. It involves listening to the network at a very low level, watching for attempted connections to a specific sequence of ports. No ports are opened. It has its roots in network security, where it’s used to temporarily open up firewalls: you knock, then you connect, then you knock again to close the door behind you. It’s very cool tech.

The standard implementation of port knocking is knockd, included with most major Linux distributions. It’s extremely lightweight and uses a simple configuration file. It supports some interesting features, such as limiting the number of times a client can invoke the knock sequence, by commenting out lines in a flat file.
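To give a feel for that configuration file, here is a minimal knockd configuration sketch; the knock sequence, timeout, and command path are hypothetical placeholders, not values from this article:

```ini
[options]
    logfile = /var/log/knockd.log

[provision]
    # Hypothetical three-port TCP knock sequence; a client that hits
    # these ports in order within seq_timeout seconds triggers the command.
    sequence    = 7000,8000,9000
    seq_timeout = 5
    tcpflags    = syn
    # knockd substitutes %IP% with the knocking client's address, so a
    # (hypothetical) wrapper script receives the host to configure.
    command     = /home/ubuntu/run_playbook.sh %IP%
```

A client would then trigger the run with something like `knock <server-ip> 7000 8000 9000`.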

User Data In EC2

EC2 has a really cool feature called user data that allows you to add some information to an instance upon boot. It works with cloud-init (installed on most AMIs) to perform tasks and run scripts when the machine is first booted, or rebooted.
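As a sketch of how this gets used later in the article, a new instance's user data can be a small script that cloud-init runs at boot; the target address and knock sequence below are hypothetical placeholders:

```shell
#!/bin/bash
# Hypothetical user-data script, run by cloud-init on first boot:
# install the knock client, then fire the knock sequence at the machine
# running Ansible, which will then SSH back in and configure this host.
apt-get update
apt-get install -y knockd        # provides the `knock` client command
knock 10.0.0.5 7000 8000 9000    # hypothetical control-host IP and sequence
```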

Auto Scaling

EC2 provides a mechanism for spinning up instances based on need (or really any arbitrary event). The AWS documentation gives a detailed overview of how it works. It’s useful for responding to sudden spikes in demand, or contracting your running instances during low-demand periods.

Ansible + Knockd = Centralized, On-Demand Configuration

As mentioned earlier, Ansible provides a fairly robust API for use in your own scripts. Knockd can be used to invoke any shell command. Here’s how I tied the two together.


All of my experimentation was done in EC2, using the Ubuntu 12.04 LTS AMI.

To get the machine running ansible configured, I ran the following commands:

$ sudo apt-get update
$ sudo apt-get install python-dev python-pip knockd
$ sudo pip install ansible

Note: it's important that you install the python-dev package before you install ansible. This will provide the proper headers so that the C-based SSH library will be compiled, which is faster than the pure-Python version used when the headers are not available.

You’ll notice some information from the knockd package regarding how to enable it. Take note of this for final deployment, but we’ll be running knockd manually during this proof-of-concept exercise.

On the “client” machine – the one asking to be configured – you need only install knockd. Again, the service isn’t enabled by default, but the package provides the knock command.

EC2 Setup

We require a few things to be done in the EC2 console for this all to work.

First, I created a keypair for use by the tool, which I called “bootstrap”. I downloaded it onto a freshly set up instance I designated for this purpose.

NOTE: It’s important to set the permissions of the private key correctly. They must be set to 0600.

I then needed to create a special security group. The point of the group is to allow all ports from within the current subnet. This gives us maximum flexibility when assigning port knock sequences.

Here’s what it looks like:

Depending on our circumstances, we might also need to open up UDP traffic (port knocks can be TCP or UDP based, or a combination within a sequence).

For the sake of security, a limited range of a specific type of connection is advised, but since we’re only communicating over our internal subnet, the risk here is minimal.

Note that I’ve also opened SSH traffic to the world. This is not advisable as standard practice, but it’s necessary for me since I do not have a fixed IP address on my connection.

Making It Work

I wrote a simple python script that runs a given playbook against a given IP address:

Script to run a given playbook against a specific host

import ansible.playbook
from ansible import callbacks
from ansible import utils

import argparse
import os, sys

# NOTE: the option names and default values below were lost in
# transcription and have been reconstructed; treat them as illustrative.
parser = argparse.ArgumentParser(
    description="Run an ansible playbook against a specific host.")

parser.add_argument(
    "host",
    help="The IP address or hostname of the machine to run the playbook against.")

parser.add_argument(
    "-p", "--playbook",
    default="bootstrap.yml",
    help="Specify path to a specific playbook to run.")

parser.add_argument(
    "-c", "--config",
    default="./run_playbook.ini",
    help="Specify path to a config file. Defaults to %(default)s.")

def run_playbook(host, playbook, user, key_file):
    """
    Run a given playbook against a specific host, with the given username
    and private key file.
    """
    stats = callbacks.AggregateStats()
    playbook_cb = callbacks.PlaybookCallbacks(verbose=utils.VERBOSITY)
    runner_cb = callbacks.PlaybookRunnerCallbacks(stats, verbose=utils.VERBOSITY)

    # Constructor arguments follow the Ansible 1.x PlayBook API.
    pb = ansible.playbook.PlayBook(
        host_list=[host],
        playbook=playbook,
        remote_user=user,
        private_key_file=key_file,
        forks=1,
        stats=stats,
        callbacks=playbook_cb,
        runner_callbacks=runner_cb)

    pb.run()

options = parser.parse_args()

playbook = os.path.abspath("./playbooks/%s" % options.playbook)

run_playbook(options.host, playbook, 'ubuntu', "./bootstrap.pem")

Most of the script is user-interface code, using argparse to bring in configuration options. One unimplemented feature is using an INI file to specify things like the default playbook, pem key, user, etc. These things are just hard-coded in the call to run_playbook for this proof-of-concept implementation.
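That unimplemented INI support could be filled in with the standard library's configparser. A minimal sketch, assuming a hypothetical [run_playbook] section name and made-up default values:

```python
try:
    import configparser                      # Python 3
except ImportError:
    import ConfigParser as configparser      # Python 2

# Hypothetical defaults, standing in for the hard-coded values.
DEFAULTS = {"playbook": "bootstrap.yml",
            "user": "ubuntu",
            "key_file": "./bootstrap.pem"}

def load_config(path):
    """Merge DEFAULTS with any values found in an optional INI file."""
    parser = configparser.ConfigParser()
    settings = dict(DEFAULTS)
    # read() silently skips missing files and returns the list of files
    # it actually parsed, so a missing config just yields the defaults.
    if parser.read(path) and parser.has_section("run_playbook"):
        settings.update(parser.items("run_playbook"))
    return settings
```

The script would then pull user, key, and playbook from load_config(...) rather than hard coding them.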

The real heart of the script is the run_playbook function. Given a host (IP or hostname), a path to a playbook file (assumed to be relative to a “playbooks” directory), a user and a private key, it uses the Ansible API to run the playbook.

This function represents the bare-minimum code required to apply a playbook to one or more hosts. It’s surprisingly simple – and I’ve only scratched the surface here of what can be done. With custom callbacks, instead of the ones used by the ansible-playbook runner, we can fine tune how we collect information about each run.
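As a sketch of that idea, here is a minimal custom runner callback that collects per-host results. The method names follow the Ansible 1.x runner callbacks API; a real implementation would subclass ansible.callbacks.PlaybookRunnerCallbacks, which is assumed here rather than imported:

```python
import logging

class LoggingRunnerCallbacks(object):
    """Stand-in for ansible.callbacks.PlaybookRunnerCallbacks that
    records each host's results as the playbook runs (Ansible 1.x
    method names assumed)."""

    def __init__(self, logger=None):
        self.logger = logger or logging.getLogger("bootstrap")
        self.results = {}  # host -> list of (status, result) tuples

    def on_ok(self, host, res):
        self.results.setdefault(host, []).append(("ok", res))
        self.logger.info("%s: ok", host)

    def on_failed(self, host, res, ignore_errors=False):
        self.results.setdefault(host, []).append(("failed", res))
        self.logger.error("%s: failed: %r", host, res)

    def on_unreachable(self, host, res):
        self.results.setdefault(host, []).append(("unreachable", res))
        self.logger.error("%s: unreachable", host)
```

An instance of a class like this could be passed as runner_callbacks in place of the stock PlaybookRunnerCallbacks to, say, push a notification after each host is configured.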

The playbook I used for testing this implementation is very simplistic (see the Ansible playbook documentation for an explanation of the playbook syntax):

- hosts: all
  sudo: yes
  tasks:
    - name: ensure apache is at the latest version
      apt: update_cache=yes pkg=apache2 state=latest
    - name: drop an arbitrary file just so we know something happened
      copy: src=it_ran.txt dest=/tmp/ mode=0777

It just installs and starts apache, does an apt-get update, and drops a file into /tmp to give me a clue that it ran.

Note that the hosts: setting is set to “all” – this means that this playbook will run regardless of the role or class of the machine. This is essential, since, again, the machines are unknown when they invoke this script.

For the sake of simplicity, and to set a necessary environment variable, I wrapped the call to my script in a shell script:

#!/bin/sh
export ANSIBLE_HOST_KEY_CHECKING=False

cd /home/ubuntu
/usr/bin/python /home/ubuntu/run_playbook.py $1 >> $1.log 2>&1

The $ANSIBLE_HOST_KEY_CHECKING environment variable here is necessary, short of futzing with the ssh configuration for the ubuntu user, to tell Ansible to not bother verifying host keys. This is required in this situation because the machines it talks to are unknown to it, since the script will be used to configure newly launched machines. We’re also running the playbook unattended, so there’s no one to say “yes” to accepting a new key.

The script also does some very rudimentary logging of all output from the playbook run – it creates logs for each host that it services, for easy debugging.

Finally, the following configuration in knockd.conf makes it all work:

[options]
        UseSyslog

[ansible]
        sequence    = 9000, 9999
        seq_timeout = 5
        command     = /home/ubuntu/run.sh %IP%

The first configuration section, [options], is special to knockd – it's used to configure the server itself. Here we're just asking knockd to log messages to the system log (e.g. /var/log/messages).

The [ansible] section sets up the knock sequence for a machine that wants Ansible to configure it. The sequence set here is 9000, 9999 (it can be anything – any port numbers, and any number of ports >= 2). There's a 5 second timeout – in the event that the client doing the knocking takes longer than 5 seconds to complete the sequence, nothing happens.

Finally, the command to run is specified. The special %IP% variable is replaced when the command is executed by the IP address of the machine that knocked.
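The substitution itself is simple string replacement. As an illustration (this is not knockd's implementation, just the equivalent logic in Python):

```python
import shlex

def expand_knock_command(template, ip):
    """Replace %IP% in a knockd-style command template with the
    knocker's address and split it into argv form."""
    return shlex.split(template.replace("%IP%", ip))

# expand_knock_command("/home/ubuntu/run.sh %IP%", "10.0.0.7")
# -> ["/home/ubuntu/run.sh", "10.0.0.7"]
```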

At this point, we can test the setup by running knockd. We can use the -vD options to output lots of useful information.

We just need to then do the knocking from a machine that’s been provisioned with the bootstrap keypair.
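If the knock client isn't installed, the knock is easy to reproduce in Python – it's nothing more than a connection attempt to each port in sequence. Here is a sketch matching the knockd.conf above; the connection attempts are expected to fail, since only the packets reaching the server matter:

```python
import socket

def knock(host, ports, timeout=0.5):
    """Attempt a TCP connection to each port in order. Errors are
    expected and ignored -- the knock server only watches for the
    incoming packets, it never accepts the connection."""
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect((host, port))
        except (socket.timeout, OSError):
            pass
        finally:
            s.close()

# knock("10.0.0.1", [9000, 9999])
```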

Here’s what it looks like (these are all Ubuntu 12.04 LTS instances):

On the “server” machine, the one with the ansible script:

$  sudo knockd -vD
config: new section: 'options'
config: usesyslog
config: new section: 'ansible'
config: ansible: sequence: 9000:tcp,9999:tcp
config: ansible: seq_timeout: 5
config: ansible: start_command: /home/ubuntu/run.sh %IP%
ethernet interface detected
Local IP:
listening on eth0...

On the “client” machine, the one that wants to be provisioned:

$ knock <server-ip> 9000 9999

Back on the server machine, we’ll see some output upon successful knock:

2014-03-23 10:32:02: tcp: -> 74 bytes ansible: Stage 1
2014-03-23 10:32:02: tcp: -> 74 bytes ansible: Stage 2 ansible: OPEN SESAME
ansible: running command: /home/ubuntu/run.sh


Making It Automatic With User Data

Now that we have a way to configure machines on demand – the knock could happen at any time, from a cron job, executed via a distributed SSH client (like fabric), etc – we can use the user data feature of EC2 with cloud-init to do the knock at boot, and every reboot.

Here is the user data that I used, which is technically cloud config code (more examples here):

#cloud-config
packages:
 - knockd

runcmd:
 # <server-ip> is the address of the machine running knockd
 - knock <server-ip> 9000 9999

User data can be edited at any time as long as an EC2 instance is in the “stopped” state. When launching a new instance, the field is hidden in Step 3, under “Advanced Details”:

Once this is established, you can use the "launch more like this" feature of the AWS console to replicate the user data.

This is also a prime use case for writing your own provisioning scripts (using something like boto) or using something a bit higher level, like CloudFormation.
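As a sketch of that direction: the user data is just a string, so a provisioning script only has to render it and hand it to the API. The boto calls in the comment below are illustrative, with placeholder AMI, region, and server address:

```python
def render_user_data(server_ip, ports=(9000, 9999)):
    """Render the cloud-config user data that installs knockd and
    knocks the Ansible server at boot."""
    knock_cmd = "knock %s %s" % (server_ip, " ".join(str(p) for p in ports))
    return "\n".join([
        "#cloud-config",
        "packages:",
        " - knockd",
        "runcmd:",
        " - %s" % knock_cmd,
    ])

# Passing it along with boto (hypothetical values):
# import boto.ec2
# conn = boto.ec2.connect_to_region("us-east-1")
# conn.run_instances("ami-xxxxxxxx", key_name="bootstrap",
#                    user_data=render_user_data("10.0.0.1"))
```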

Auto Scaling And User Data

Auto Scaling is controlled via "auto scaling groups" and "launch configurations". If you're not familiar with them, these can sound like foreign concepts, but they're quite simple.

Auto Scaling Groups define how many instances will be maintained, and set up the events to scale up or down the number of instances in the group.

Launch Configurations are nearly identical to the basic settings used when launching an EC2 instance, including user data. In fact, user data is entered in on Step 3 of the process, in the “Advanced Details” section, just like when spinning up a new EC2 instance.

In this way, we can automatically configure machines that come up via auto scaling.

Conclusions And Next Steps

This proof of concept presents an exciting opportunity for people who use Ansible and have use cases that benefit from a “pull” model – without really changing anything about their setup.

Here are a few miscellaneous notes, and some things to consider:

  • There are many implementations of port knocking beyond knockd. There is a huge amount of information available to dig into the concept itself and its various implementations.
  • The way the script is implemented, it's possible to have different knock sequences execute different playbooks – a "poor man's" method of differentiating hosts.
  • The Ansible script could be coupled with the AWS API to get more information about the particular host it's servicing. Imagine using a tag to set the "class" or "role" of the machine. The API could be used to look up that information about the host, and apply playbooks accordingly. This could also be done with variables – values that are "punched in" when a playbook is run. This means one source of truth for configuration – just add the relevant bits to the right tags, and it just works.
  • I tested this approach with an auto scaling group, but only with a trivial playbook, launching 10 machines at a time. It would be a good idea to test this approach with hundreds of machines and more complex plays. My "free tier" t1.micro instance handled this "stampeding herd" without a blink, but it's unclear how well it really scales. If anyone gives this a try, please let me know how it went.
  • Custom callbacks could be used to enhance the script to send notifications when machines were launched, as well as more detailed logging.
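To make the tag-based idea in the list above concrete: on the Ansible side, the lookup reduces to a small mapping from a machine's "role" tag to a playbook. The role names here and the commented boto call are hypothetical:

```python
ROLE_PLAYBOOKS = {
    "web": "webserver.yml",
    "db": "database.yml",
}

def playbook_for_role(role, default="bootstrap.yml"):
    """Map a machine's 'role' tag to the playbook that configures it."""
    return ROLE_PLAYBOOKS.get(role, default)

# Fetching the tag with boto (hypothetical):
# import boto.ec2
# conn = boto.ec2.connect_to_region("us-east-1")
# tags = conn.get_all_tags(filters={"resource-type": "instance",
#                                   "key": "role"})
```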

Caktus GroupCaktus Attends YTH Live

Last week Tobias and I had a great time attending our first Youth+Tech+Health Live conference. I went to present along with our partners Sara LeGrand and Emily Pike, from Duke and UNC respectively, on our NIH/SBIR-funded game focused on encouraging HIV medication adherence. The panel we spoke on, "Stick to it: Tech for Medical Adherence + Health Interventions," was rounded out by Dano Beck from the Oregon Health Authority, who spoke about how Oregon has successfully used SMS reminders to increase HIV medication adherence.

We had a great response to our talk. It’s not often that you get a chance to talk to other teams around North America focused on creating games to improve health outcomes. We learned about other teams making health-related educational games and lots of programs providing mobile support for youth through SMS messaging help lines. It was clear from the schedule of talks and the conversations happening around the event space that everyone was learning a lot and getting a unique chance to share their projects.

Caktus GroupCaktus is going to Montréal for PyCon 2014!

Caktus is happy to once again be sponsoring and attending PyCon in Montréal this year. Year after year, we look forward to this conference, and we are always impressed with the quality of the speakers it draws. The team consistently walks away with new ideas from the talks, open spaces, and sprints, excited to implement them here at Caktus and in their personal projects.

A favorite part of PyCon for us is meeting and catching up with people, so we are excited to premiere Duckling at PyCon, an app that helps people organize outings and connect at the conference. Come by our booth (#700) on Thursday night at the opening reception to say “hi” and chat with our team!