A planet of blogs from our members...

Vinod Kurup: Using dynamic queries in a CBV

Let's play 'Spot the bug'. We're building a simple system that shows photos. Each photo has a publish_date and we should only show photos that have been published (i.e. their publish_date is in the past).

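``` python models.py
from django.db import models
from django.utils import timezone


class PhotoManager(models.Manager):
    def live(self, as_of=None):
        if as_of is None:
            as_of = timezone.now()
        # get_queryset() was named get_query_set() before Django 1.6
        return super().get_queryset().filter(publish_date__lte=as_of)
```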

And the view to show those photos:

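``` python views.py
from django.views.generic import ListView

from .models import Hero  # assumes Hero lives in this app's models.py


class ShowPhotosView(ListView):
    queryset = Hero.objects.live()
```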

Can you spot the bug? I sure didn't... until the client complained that newly published photos never showed up on the site. Restarting the server fixed the problem temporarily. The newly published photos would show up, but then any photos published after the server restart again failed to display.

The problem is that the body of the ShowPhotosView class is executed only once, when the module is imported as the server starts. At that moment ShowPhotosView.queryset is set to the value returned by Hero.objects.live(). That value is a QuerySet, but it's a QuerySet with as_of set to timezone.now() WHEN THE SERVER STARTS UP. The as_of value never gets updated, so newer photos never get captured in the query.

There are probably multiple ways to fix this, but an easy one is:

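``` python views.py
from django.views.generic import ListView

from .models import Hero  # assumes Hero lives in this app's models.py


class ShowPhotosView(ListView):
    def get_queryset(self):
        # Runs on every request, so timezone.now() is evaluated each time
        return Hero.objects.live()
```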

Now, instead of the queryset being instantiated at server start-up, it's instantiated only when ShowPhotosView.get_queryset() is called, which is when a request is made.
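The timing difference can be seen without Django at all. Here is a minimal sketch in plain Python, where `live()` and the counter behind it are hypothetical stand-ins for `Hero.objects.live()` and `timezone.now()`:

```python
import itertools

# Hypothetical stand-in for timezone.now(): each call returns a later "time".
_clock = itertools.count(1)


def live():
    """Stand-in for Hero.objects.live(): captures the current 'time'."""
    return next(_clock)


class StaleView:
    # The class body runs once, at import time, so live() is called once.
    queryset = live()


class FreshView:
    def get_queryset(self):
        # The method body runs on every call, so live() is called each time.
        return live()
```

Every `StaleView` instance sees the value captured when the class was defined, while each call to `FreshView().get_queryset()` gets a new one.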

Caktus Group: A Culture of Code Reviews

Code reviews are one of those things that everyone agrees are worthwhile, but sometimes don’t get done. A good way to keep getting the benefits of code reviews is to establish, and even nurture, a culture of code reviews.

When code reviews are part of the culture, people don’t just expect their changes to be reviewed, they want their changes reviewed.

Some advantages of code reviews

We can all agree that code reviews improve code quality by spotting bugs. But there are other advantages, especially when changes are reviewed consistently.

Having your own code reviewed is a learning experience. We all have different training and experiences, and code reviews give us a chance to share what we know with others on the team. The more experienced developer might be pointing out some pitfall they’ve learned by bitter experience, while the enthusiastic new developer is suggesting the latest library that can do half the work for you.

Reviewing other people’s code is a learning experience too. You’ll see better ways of doing things that you’ll want to adopt.

If all code is reviewed, there are no parts of the code that only one person is familiar with. The code becomes a collaborative product of the team, not a bunch of pieces “owned” by individual programmers.

Obstacles to code reviews

But you only get the benefits of code reviews if you do them. What are some things that can get in the way?

Insufficient staffing is an obvious problem, whether there’s only one person working on the code, or no one working on the code has time to review other changes, or to wait for their own to be reviewed. To nurture a culture of code reviews, enough staff needs to be allocated to projects to allow code reviews to be a part of the normal process. That means at least two people on a team who are familiar enough with the project to do reviews. If there’s not enough work for two full-time team members, one member could be part-time, or even both. Better two people working on a project part-time than one person full-time.

Poor tools can inhibit code reviews. The more difficult something is, the more likely we are to avoid doing it. Take the time to adopt a good set of tools, whether GitHub’s pull requests, the open source ReviewBoard project, or anything else that handles most of the tedious parts of a code review for you. It should be easy to give feedback, linked to the relevant changes, and to respond to the feedback.

Ego is one of the biggest obstacles. No one likes having their work criticized. But we can do things in ways that reduce people’s concerns.

Code reviews should be universal - everyone’s changes are reviewed, always. Any exception can be viewed, if someone is inclined that way, as an indication that some developers are “better” than others.

Reviews are about the code, not the coder. Feedback should be worded accordingly. Instead of saying “You forgot to check the validity of this input”, reviewers can say “This function is missing validation of this input”, and so forth.

We do reviews because our work is important and we want it to be as good as possible, not because we expect our co-workers to screw it up. At the same time, we recognize that we are all human, and humans are fallible.

Establishing a culture of code reviews

Having a culture where code reviews are just a normal part of the workflow, and we’d feel naked without them, is the ideal. But if you’re not there yet, how can you move in that direction?

It starts with commitment from management. Provide the proper tools, give projects enough staffing so there’s time for reviews, and make it clear that all changes are expected to be reviewed. Maybe provide some resources for training.

Then, get out of the way. Management should not be involved in the actual process of code reviews. If developers are reluctant to have other developers review their changes, they’re positively repelled by the idea of non-developers doing it. Keep the actual process something that happens among peers.

When adding code reviews to your workflow, there are some choices to make, and I think some approaches work better than others.

First, every change is reviewed. If developers pick and choose which changes are reviewed, inevitably someone will feel singled out, or a serious bug will slip by in a “trivial” change that didn’t seem to merit a review.

Second, review changes before they’re merged or accepted. A “merge then review” process can result in everyone assuming someone else will review the change, and nobody actually doing it. By requiring a review and signoff before the change is merged, the one who made the change is motivated to seek out a reviewer and get the review done.

Third, reviews are done by peers, by people who are also active coders. Writing and reviewing code is a collaboration among a team. Everyone reviews and has their own changes reviewed. It’s not a process of a developer submitting a proposed change to someone outside the team for approval.

The target

How will you know when you’re moving toward a culture of code reviews? When people want their code to be reviewed. When they complain about obstacles making it more difficult to get their code reviewed. When the team is happier because they’re producing better code and learning to be better developers.

Vinod Kurup: Some Emacs Posts

A few cool Emacs posts have flown across my radar, so I'm noting them here for that time in the future when I have time to play with them.

Vinod Kurup: Pygments on Arch Linux

I wrote my first blog post in a little while (ok, ok... 18 months) yesterday and when I tried to generate the post, it failed. Silently failed, which is the worst kind of failure. I'm still not sure why it was silent, but I eventually was able to force it to show me an error message:

```
/home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:354:in `rescue in get_header': Failed to get header. (MentosError)
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:335:in `get_header'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:232:in `block in mentos'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:206:in `mentos'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:189:in `highlight'
    from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:24:in `pygments'
    from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:14:in `highlight'
    from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:37:in `block in render_code_block'
    from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `gsub'
    from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `render_code_block'
    from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:12:in `pre_filter'
    from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:28:in `pre_render'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:112:in `block in pre_render'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `each'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `pre_render'
    from /home/vinod/dev/kurup.org/plugins/post_filters.rb:166:in `do_layout'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/post.rb:195:in `render'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:200:in `block in render'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `each'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `render'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:41:in `process'
    from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/bin/jekyll:264:in `<top (required)>'
    from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `load'
    from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `<main>'
```

Professor Google tells me that this happens when you try to run the pygments.rb library in a Python 3 environment (pygments.rb is a Ruby wrapper around the Python Pygments library). The fix is to run the code in a Python 2 virtualenv. I guess the last time I updated my blog, Arch still had Python 2 as the system default. No, I don't want to check how long ago that was.

$ mkvirtualenv -p `which python2` my_blog
(my_blog)$ bundle exec rake generate

So now I'm running a Ruby command in a Ruby environment (rbenv) inside a Python 2 virtualenv. Maybe it's time to switch blog tools again...

Vinod Kurup: How to create test models in Django

It's occasionally useful to be able to create a Django model class in your unit test suite. Let's say you're building a library which creates an abstract model which your users will want to subclass. There's no need for your library to subclass it, but your library should still test that you can create a subclass and test out its features. If you create that model in your models.py file, then Django will think that it is a real part of your library and load it whenever you (or your users) call syncdb. That's bad.

The solution is to create it in a tests.py file within your Django app. If it's not in models.py, Django won't load it during syncdb.

``` python tests.py
from django.db import models
from django.test import TestCase

from .models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)


class AbstractTest(TestCase):
    def test_my_test_model(self):
        # Create MyTestModel instances here and test the abstract model's features.
        pass
```

A problem with this solution is that I rarely use a single tests.py file. Instead, I use multiple test files collected in a tests package. If you try to create a model in tests/test_foo.py, then this approach fails because Django tries to create the model in an application named tests, but there is no such app in INSTALLED_APPS. The solution is to set app_label to the name of your app in an inner Meta class.

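```python tests/test_foo.py
from django.db import models
from django.test import TestCase

from ..models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)

    class Meta:
        # Attach the model to the real app instead of the "tests" package
        app_label = 'myappname'


class AbstractTest(TestCase):
    def test_my_test_model(self):
        # Create MyTestModel instances here and test the abstract model's features.
        pass
```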

Oh, and I almost forgot... if you use South, this might not work, unless you set SOUTH_TESTS_MIGRATE to False in your settings file.
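For reference, the South setting mentioned above is a plain boolean in your settings module. A minimal sketch, assuming a conventional settings.py (the file location is an assumption about your project layout):

```python
# settings.py (assumed location of your Django settings)

# Ask South to build the test database with syncdb instead of running
# migrations, so models defined only in test modules get their tables.
SOUTH_TESTS_MIGRATE = False
```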

Comments and corrections welcome!

Joe Gregorio: Observations on hg and git

Having recently moved from Mercurial to Git, here are my observations:

Git just works

No matter what I try to do, there's a short and simple git command that does it. Need to copy a single file from one branch to my current branch? Need to roll back the last two commits and place their changes into the index? Need to push or pull from a local branch to a remote, differently named branch? There are ways to do all of those things. More importantly, Git does them natively; I don't have to turn on plugins to get a particular piece of functionality.
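As an illustration, the three tasks above map onto stock git commands roughly as follows. This is a sketch rather than the only way to do each one, and the branch, file, and remote names (`feature`, `setup.py`, `origin`, `review/feature`) are hypothetical:

```python
import subprocess

# Each value is the argv for a built-in git command; no plugins are involved.
# Branch, file, and remote names are placeholders for illustration only.
GIT_RECIPES = {
    "copy one file from another branch": [
        "git", "checkout", "feature", "--", "setup.py",
    ],
    "undo the last two commits, keep their changes staged": [
        "git", "reset", "--soft", "HEAD~2",
    ],
    "push a local branch to a differently named remote branch": [
        "git", "push", "origin", "feature:review/feature",
    ],
}


def run_recipe(name, cwd="."):
    """Run one of the recipes above inside the repository at cwd."""
    return subprocess.run(GIT_RECIPES[name], cwd=cwd, check=True)
```

Nothing runs until `run_recipe()` is called, so the dictionary can be read as a cheat sheet on its own.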

Turning on plugins is a hurdle

The fact that what I consider to be core functionality is hidden away in plugins and needs to be turned on manually is an issue. For example, look at this section of the docs for the Google API Python Client:


A big thing that trips up contributors is that "--rebase" is in a plugin (and I keep forgetting to update the docs to explain that).

Git is fast

So Git is fast, not just "ooh that was fast", but fast as in, "there must have been a problem because there's no way it could have worked that fast". That's a feature.


Git branches are much smoother and better integrated than MQ. Maybe this is just because I got stuck on MQ and never learned another way to use hg, but the branch and merge workflow is a lot better than MQ.


In Git ssh: URIs just work for me. Maybe I just got lucky, or was previously unlucky, but I never seemed to be able to pull or push to a remote repository via ssh with hg, and it just worked as advertised with Git.


Git is helpful. Git is filled with helpful messages, many of the form "it looks like you are trying to do blah, here's the exact command line for that", or "you seem to be in 'weird state foo', here's a couple different command lines you might use to rectify the situation". Obviously those are paraphrasing, but the general idea of providing long, helpful messages with actual commands in them is done well throughout Git.


I'm not writing this to cast aspersions on the Mercurial developers, and I've already passed this information along to developers that work on Mercurial. I am hoping that if you're building command line tools that you can incorporate some of the items here, such as helpful error messages, speed, and robust out-of-the-box capabilities.

Caktus Group: Contributing Back to Symposion

Recently Caktus collaborated with the organizers of PyOhio, a free regional Python conference, to launch the PyOhio 2014 conference website. The conference starts this weekend, July 26 - 27. As in prior years, the conference web site utilizes Eldarion’s Symposion, an open source conference management system. Symposion powers a number of annual conference sites including PyCon and DjangoCon. In fact, as of this writing, there are 78 forks of Symposion, a nod to its widespread use for events both large and small. This collaboration afforded us the opportunity to abide by one of our core tenets, that of giving back to the community.

PyOhio organizers had identified a few pain points during last year’s rollout that were resolvable in a manner that was conducive to contributing back to Symposion so that future adopters could benefit from this work. The areas we focused on were migration support, refining the user experience for proposal submitters and sponsor applicants, and schedule building.

Migration Support


The majority of our projects utilize South for tracking database migrations. Migrations are not an absolute requirement, but for those conferences that reuse the same code base from year to year, rather than starting a new repository, it is beneficial to have a migration strategy in place. There were a few minor implementation details to tackle, namely migration dependencies and introspection rules. The Symposion framework has a number of interdependent apps. As such, when using migrations, the database tables must be created in a certain order. For Symposion, there are two such dependencies: Proposals depend on Speakers, and Sponsorship depends on Conferences. The implementation can be seen in this changeset. In addition, Symposion uses a custom field for certain models: django-timezones’ TimeZoneField. There are a few pull requests open on that project to deal with South and introspection rules, but none of them has been incorporated. As such, we added a very simple rule to work around migration errors.

As mentioned before, these migrations give Symposion a solid migration workflow for future database changes, as well as prepping for Django 1.7’s native schema migration support.
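The ordering constraint described above is a small dependency graph. The sketch below is not South's own mechanism and not the code from the changeset; it uses Python's standard `graphlib` only to show how the two stated dependencies determine a valid table-creation order. The app labels are hypothetical shorthand for the Symposion apps.

```python
from graphlib import TopologicalSorter

# Map each app to the apps whose tables must exist first.
# Labels are illustrative; only the two dependencies named above are encoded.
DEPENDS_ON = {
    "proposals": {"speakers"},
    "sponsorship": {"conference"},
}


def migration_order(depends_on):
    """Return the apps in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


order = migration_order(DEPENDS_ON)
```

Any order that puts `speakers` before `proposals` and `conference` before `sponsorship` is acceptable, which is the same guarantee migration dependencies are meant to provide.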

User Experience Issues

Currently, if an unauthenticated user manages to make a proposal submission, they are simply redirected to the home page of the site. Similarly, if an authenticated user without a Speaker profile makes a submission, they are redirected to their dashboard. In both cases, there is no additional feedback for what the user should do next. We utilized the django messages framework to provide contextual feedback with help text and hyperlinks should these be valid submission attempts (https://github.com/pinax/symposion/pull/50/files).

Sponsor submissions are another area that benefited from additional contextual messages. There are a variety of sponsor levels (Unobtanium, Aluminum, etc.) that carry their own sponsor benefits (print ad in program, for example). The current workflow redirects a sponsor application to the Sponsor Details page, with no contextual message, that lists Sponsor and Benefits details. For sponsor levels with no benefits, this essentially redirects you to an update form for the details you just submitted. Our pull request redirects these cases to the user dashboard with an appropriate message, as well as providing a more helpful message for sponsor applications that do carry benefits (https://github.com/pinax/symposion/pull/49/files).

Schedule Builder


The conference schedule is a key component to the web site, as it lets attendees (and speakers) know when to be where! It is also a fairly complex app, with a number of interwoven database tables. The screenshot below lists the components required to build a conference schedule:

At a minimum, creating one scheduled presentation requires 7 objects spread across 7 different tables. Scale this out to tens or nearly one hundred talks and the process of manually building a schedule becomes egregiously cumbersome. For PyCon 2014 we built a custom importer for the talks schedule. A quick glance reveals this is not easily reusable; there are pinned lunches and breaks, and this particular command assigns accepted proposals to schedule slots. For PyOhio, we wanted to provide something that was more generic and reusable. Rather than building out the entire schedule of approved talks, we wanted a fairly quick and intuitive way for an administrator to build the schedule’s skeleton via the frontend using a CSV file. The format of the CSV is intentionally basic, for example:

"date","time_start","time_end","kind","room"
"12/12/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/12/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room2"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room2"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room2"

This sample, when uploaded, will create the requisite backend objects (2 Day, 2 Room, 2 SlotKind, 8 Slot, and 12 SlotRoom objects). This initial implementation fails if schedule overlaps occur and allows for front-end deletion of schedules; it is also tested and documented. Having a schedule builder gives the conference organizers a chance to divert more energy into reviewing and securing great talks and keynotes, rather than dealing with the minutiae of administering the schedule itself.
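To make those object counts concrete, here is a rough sketch that parses the sample CSV and tallies what would be created. It is not the Symposion importer. In particular, the grouping rule (a plenary is one slot shared by every room, while any other kind gets one slot per room) is an assumption inferred from the counts quoted above, and the function name is hypothetical.

```python
import csv
import io
from datetime import datetime

SAMPLE_CSV = """\
"date","time_start","time_end","kind","room"
"12/12/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/12/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room2"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room2"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room2"
"""


def count_schedule_objects(csv_text):
    """Count the distinct days, rooms, kinds, slots and slot-room links."""
    days, rooms, kinds, slots, slot_rooms = set(), set(), set(), set(), set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["date"], "%m/%d/%Y").date()
        start = datetime.strptime(row["time_start"], "%I:%M %p").time()
        end = datetime.strptime(row["time_end"], "%I:%M %p").time()
        kind, room = row["kind"], row["room"]
        # Assumed rule: plenaries share one slot across rooms,
        # every other kind gets its own slot in each room.
        if kind == "plenary":
            slot = (day, start, end, kind)
        else:
            slot = (day, start, end, kind, room)
        days.add(day)
        rooms.add(room)
        kinds.add(kind)
        slots.add(slot)
        slot_rooms.add((slot, room))
    return {
        "days": len(days),
        "rooms": len(rooms),
        "kinds": len(kinds),
        "slots": len(slots),
        "slot_rooms": len(slot_rooms),
    }


counts = count_schedule_objects(SAMPLE_CSV)
```

Under that assumption, `counts` comes out as 2 days, 2 rooms, 2 kinds, 8 slots, and 12 slot-room links, matching the figures in the post.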

Symposion is a great Python-based conference management system. We are excited about its broad use, and about helping to contribute to its longevity and feature set.

Edited to Add (7/21/2014: 3:31PM): PR Merge Efforts

One of the SciPy2014 tech chairs, Sheila, is part of new efforts to get PRs merged. Join the mailing list to learn more about merges based off the PyOhio fork.

Caktus Group: What was being in Libya like?

Election day voting on June 25th. Image courtesy of HNEC.


Since this interview was done, Libya’s capital has experienced more violence. As of today, militias are fighting over control of the Tripoli international airport, the primary way in and out of this section of Libya. We’re keeping our friends and colleagues in Libya in our thoughts in these truly difficult times. This post speaks to the energy and talent of the Libyans we’ve worked with during this challenging democratic transition.

I’m the project manager for our Libya voter registration by text message project, the first in the world. Two of our staff members, Tobias McNulty, CEO, and Elliott Wilkes, Technical Project Manager, were in Libya during the June 25th elections. Given the political chaos and violence surrounding the elections, the US team was worried for them. Tobias recently returned and Elliott left for Nairobi last week. Elliott had been in Libya since August 2013. I asked them some questions about being in Tripoli, Libya’s capital.

With the attempted coup just a couple weeks prior, what were tensions like on election day?

Tobias: One of the outcomes of the attempted coup in May was that the election date got moved up to June 25. This put everyone at the Libyan High National Election Commission (HNEC) on task to deliver an election much earlier than originally planned. I’m proud to say that Caktus and HNEC worked closely together to step up to the plate and meet this mandate successfully.

Elliott: Tripoli was plastered with advertisements about the election, both from HNEC and from candidates themselves. Most of the security presence in Tripoli kept to their usual posts on election day, the gas stations. Due to distribution issues and a resulting panic about petrol supplies, there was a run on gas in the city, and army and police vehicles helped guard gas stations to keep the peace while people spent hours in line waiting to top up. In spite of or perhaps because of this security presence, we didn’t witness any violence in Tripoli in the days leading up to, during, or following election day. With a few marked exceptions this election proceeded largely without incident - a great success for HNEC. However, one act of violence just after the election did shake the country: the murder of humanitarian Salwa Bugaighis in Benghazi. We were all deeply saddened to hear about that significant loss.

Tobias: Yes, that was tragic. It was hard news for all of us.

How did the voter registration system function on election day?

Tobias: There were three things that HNEC and the citizens of Libya could use our text message-based voter registration system for on election day. First and foremost, poll workers and voters could use it to double check individual registrations and poll locations. Second, it tracked when polling centers opened, so HNEC staff knew which centers needed additional support. And lastly, poll workers could text in the number of voters who had arrived, giving HNEC real-time turnout figures.

Elliott: On election day, we helped the polling center staff with message formatting, and performing custom database queries to assess progress throughout the day. It’s a real testament to our work that we encountered no technological issues throughout the election process. The system successfully handled nearly 100,000 text messages on election day alone.

What was the mood like at elections headquarters as the count of voters came in?

Tobias: At HNEC offices on election day, the mood was positive and optimistic. We had 1.5 million registrants in the system. We arrived early in the morning, and as the HNEC staff began to arrive, on numerous occasions the office stood and sang along to the Libyan national anthem. We joined in too of course. The flow was busy, but not unmanageable. We worked long days and accomplished much in the days surrounding the election. But thanks to adequate preparation on the part of HNEC and our team the workload was not unmanageable.

However, among citizens there is clearly some work to do in terms of motivating voter turnout on election day. Several citizens we talked to were indifferent to the elections, and expressed some distrust in elected leaders generally. That said, elections are still relatively new to Libya, and I think we need to moderate our expectations for how quickly they’ll grow in popularity.

Elliott: Having worked on a number of elections around the world I’ve come to the conclusion that the best election days are the boring ones - that is to say, those without incident. If you’ve done your job well, you go into election day with hours upon hours of preparations for all different outcomes and possibilities, plans A, B, C, to Z. You’ve spent weeks and months building a massive ship and all that’s left to do is enjoy the ride. And thankfully, due to our exhaustive preparations, everything was smooth sailing.

What was it like working with the Libyan government?

Tobias: While from the outside Libya may look like an unstable country with lots of negative coverage in the news, the reality on the ground is that working with HNEC has been a real pleasure. The operations staff at the Commission are motivated to continue strengthening democracy in the country, which was evidenced by the long hours many put in in the days leading up to and following the election. We’re honored that the government of Libya selected Caktus as the technology partner for this highly impactful project.

Elliott: Tobias, I couldn't agree more. Working on this project has been extraordinary. There's something special about working with a group of young, committed citizens putting in the extra hours to ensure their electoral process is as inclusive as possible, especially given that for over forty years, government services in Libya were everything but. Their commitment to the pursuit of democracy and everything that entails has made this project a real pleasure and a deeply humbling experience. I'm proud that we've been able to support them at this critical juncture.

We’ve seen the photos, but want to hear you list the pizza toppings!

Tobias: Ha, the rumors are true. The Libyans put all sorts of things on their pizza, the two most prominent of which are often canned tuna fish and french fries. Ketchup and mayonnaise are two other favorite toppings.

Elliott: It tastes terrible. Honestly, I prefer shawerma. The pizza toppings in Libya can be...a bit exotic for my taste.

Tobias: Luckily, there’s other food and the staff at HNEC didn’t hesitate to invite us in to join their meals. It was a pleasure to break bread with the staff at HNEC during such a momentous week for the country of Libya.


Libyan Pizza
Photo by Elliott Wilkes.


Caktus Group: July 2014 ShipIt Day Recap

This past Friday we celebrated another ShipIt day at Caktus. There was a lot of open source contribution, exploring, and learning happening in the office. The projects ranged from native mobile Firefox OS apps, to development on our automated server provisioning templates via Salt, to front-end apps aimed at using web technology to create interfaces where composing new music or performing Frozen’s Let It Go is so easy anyone can do it.

Here is a quick summary of the projects that folks worked on:

Calvin worked on updating our own minimal CMS component for editing content on a site, django-pagelets, to work nicely with Django 1.7. He also is interested in adding TinyMCE support and making it easy to upload images and reference them in the block. If you have any other suggestions for pagelets, get in touch with Calvin.

ShipIt Day Project: Anglo-Saxon / French Etymology Analyzer

Philip worked on code to tag words in a text with basic information about their etymologies. He was interested in exploring words with dual French and Anglo-Saxon variations, e.g. “Uncouth” and “Rude”. These words have evolved from different origins to have similar meanings in modern English, and it turns out that people often perceive the French or Latin derived word, in general, to be more erudite (“erudite” from Latin) than the Anglo-Saxon variant. To explore this concept, Philip harvested word etymologies from the XML version of the Wiktionary database and categorized words from Lewis Carroll’s Alice In Wonderland as well as reports from the National Hurricane Center. His initial results showed that Carroll’s British background was evident in his use of language, and Philip is excited to take what he developed in ShipIt day and continue to work on the project.

Mark created a Firefox OS app, Costanza, based on a concept from a Seinfeld episode. Mark’s app used standard web tools including HTML, CSS, and JavaScript to build an offline app that recorded and played back audio. Mark learned a lot about building apps with the new OS and especially spent a lot of time diving into issues with packaging up apps for distribution.

Rebecca and Scott collaborated on work in porting an application to the latest and greatest Python 3. The migration of apps from Python 2 to Python 3 started off as a controversial subject in the Python community, but slowly there has been lots of progress. Caktus is embracing this transition and trying to get projects ported over when there is time. Rebecca and Scott both wrestled with some of the challenges faced with moving a big project on a legacy server over to a new Python version.

Dan also wrestled with the Python 2 to 3 growing pains, though less directly. He set out to create a reusable Django app that supported generic requirements he had encountered in a number of client apps while exporting data to comma separated value (CSV) format. But, while doing this, he ran into differences between the Python 2 and 3 standard libraries for handling CSVs. Dan created cordwainer, a generic CSV library that works both in Python 2 and 3.

ShipIt Day Project: Template Include Visualization

Victor and Caleb worked together to create a wonderful tool for debugging difficult and tangled Django template includes. The tool helps template developers edit templates without fear that they won’t know what pages on the live site may be affected by their changes. They used d3 to visualize the template in a way that was interactive and intuitive for template writers to get a handle on complex dependency trees.

Michael has been working on a migraine tracking app for iOS using PhoneGap and jQuery Mobile. He has been diving in and learning about distributing mobile apps using Xcode and interfacing with the phone calendar to store migraine data. In terms of the interface, Michael studied up on accessibility in creating the app, whose primary audience will not want to dig into small details or stare at a bright phone for long while enduring a migraine.

Karen, Vinod, and Tobias all worked together to help improve Caktus’ Django project template. Karen learned a lot about updating projects on servers provisioned with Salt while trying to close out one of the tickets on our project-template repository. The ticket she was working on was how to delete stale Python byte code (.pyc) files that are left over when a Python source code file (.py) is deleted from a Git repository. These stale .pyc files can cause errors when they aren’t deleted properly during an upgrade. Vinod worked through many issues getting Docker, instead of VirtualBox with Vagrant, to create virtual environments in which SaltStack can run and provision new servers. Docker is a lighter weight environment than a full VirtualBox Linux server and would allow for faster iteration while developing provisioning code with SaltStack. Tobias improved the default logging configuration in the template to make it easier to debug errors when they occur, and also got started on some tools for integration testing of the project template itself.

Wray and Hunter collaborated to build a music composition and performance app called Whoppy (go ahead and try it out!). Whoppy uses Web Audio to create a new randomized virtual instrument every time you start the app. Wray and Hunter worked through creating a nice interface that highlights notes in the same key so that it is easier for amateur composers to have fun making music.

Og MacielThe End For Pylyglot


It was around 2005 when I started doing translations for Free and Open-Source Software. Back then I was warmly welcomed to the Ubuntu family and quickly learned all there was to know about using their Rosetta online tool to translate and/or review existing translations for the Brazilian Portuguese language. I spent so much time doing it, even during working hours, that eventually I sort of "made a name for myself" and made my way up to the upper layers of the Ubuntu Community echelon.

Then I "graduated" and started doing translations for the upstream projects, such as GNOME, Xfce, LXDE, and Openbox. I took on more responsibilities, learned to use Git and make commits for myself as well as for other contributors, and strove to unify all Brazilian Portuguese translations across as many different projects as possible. Many discussions were had, and (literally) hundreds of hours were spent going through hundreds of thousands of translations for hundreds of different applications, none of it bringing me any monetary or financial advantage, but all done for the simple pleasure of knowing that I was helping make FOSS applications "speak" Brazilian Portuguese.

I certainly learned a lot through the experience of working on these many projects... sometimes I made mistakes, other times I "fought" alone to make sure that standards and procedures were complied with. All in all, looking back I only have one regret: not being nominated to become the leader for the Brazilian GNOME translation team.

Having handled 50% of the translations for one of the GNOME releases (the other 50% was handled by a good friend, Vladimir Melo, while the leader did nothing to help) and spent much time making sure that the release would go out the door 100% translated, I really thought I'd be nominated to become the next leader. Not that I felt that I needed a 'title' to show off to other people, but in a way I wanted to feel that my peers acknowledged my hard work and commitment to the project.

Seeing other people, even people with no previous experience, being nominated by the current leader to replace him was a slap in the face. It really hurt me... but I made sure to be supportive and continue to work just as hard. I guess you could say that I lived and breathed translations, my passion not knowing any limits or knowing when to stop...

But stop I eventually did, several years ago, when I realized how hard it was to land a job that would allow me to support my family (back then I had 2 small kids) and continue to do the thing I cared about the most. I confess that I even went through a series of job interviews for the translation role that Jono Bacon, Canonical's former community manager, was trying to fill, but in the end things didn't work out the way I wanted. I also flirted with another similar role at MeeGo, but since they wanted me to move to the West Coast I decided not to pursue it (I had also fallen in love with my then current job).


As a way to keep myself somewhat still involved with the translation communities and at the same time learn a bit more about the Django framework, I then created Pylyglot, "a web based glossary compedium for Free and Open Source Software translators heavily inspired on the Open-tran.eu web site... with the objective to 'provide a concise, yet comprehensive compilation of a body of knowledge' for translators derived from existing Free and Open Source Software translations."


I have been running this service on my own and paying for the domain registration and database costs out of my own pocket for a while now, and I now find myself facing the dilemma of renewing the domain registration and keeping Pylyglot alive for another year... or retiring it and ending once and for all my relationship with FOSS translations.

Having spent the last couple of months thinking about it, I have now arrived at the conclusion that it is time to let this chapter of my life rest. Though the US$140/year that I won't be spending won't make me any richer, I don't foresee myself either maintaining or spending any time improving the project. So this July 21st, 2014 Pylyglot will close its doors and cease to exist in its current form.

To those who knew about Pylyglot and used it and, hopefully, found it to be useful, my sincere thanks for using it. To those who supported my idea and the project itself, whether by submitting code patches, building the web site or just giving me moral support, thank you!

Caktus GroupRemoval of Mural

We have recently heard complaints about the painting over of the mural on the side of 108 Morris, the building we purchased and are restoring in Downtown Durham. I am personally distressed at this response. I see now, in retrospect, where we needed to work harder to discuss our decision with the community. In our enthusiasm to bring more life to Downtown Durham via ground-level retail space and offices for our staff, we were blind to what the public response might be to the mural. Its removal was not a decision taken lightly and one done in consultation with the Historic Preservation Commission. However, we handled this poorly. We apologize for not making more efforts to include the community in this decision.

I do wish to emphasize that though we are moving from Carrboro to Durham, many of us are Durhamites, including two of three owners. Many in our small staff of 23 feel far from outsiders. We raise our families in Durham. Our CTO, Colin Copeland, is co-captain of Code for Durham, a volunteer organization dedicated solely to giving more access to public records information for the citizens of Durham. But again, our interest in Downtown Durham is not theoretical, but the place we are building our lives… so this building project is a deeply personal one. We want to see Downtown Durham continue to thrive.

Unfortunately, in restoring a long abandoned historic building that had been remodeled by many hands over the decades, we had to make sacrifices. To return the building to its original 1910 state, we needed to unbrick the windows which would also remove sections of Emily Weinstein’s 1996 Eno River mural. The mural would receive further damage around the windows by default. Our contractor told us (and we could see) the mural had begun deteriorating. We were as diligent as humanly possible, referring often to great resources like Endangered Durham and Open Durham for images of the original building in making the final decision. It was a difficult decision and one that we, of course, could not make alone.

We tried our best to not only preserve, but to add to Durham. We submitted our proposal to the Historic Preservation Commission (HPC) and they approved it during a public meeting in April. They had already approved a similar proposal from the previous owner of the building. During the meeting, those who knew better than us-- actual preservationists-- said that going forward with the window openings would do more to preserve the integrity of the building than the more recent mural. These layers of approval made us feel we should proceed with our focus on restoration.

To further ensure we were doing right by Durham, we voluntarily and eagerly followed the guidelines of the National Park Service and the North Carolina State Historic Preservation Office for exterior restorations. The State Historic Preservation Offices and the National Park Service review the rehabilitation work to ensure that it complies with the Secretary’s Standards for Rehabilitation. As residents of Durham, we were excited and motivated at the prospect of further burnishing Downtown Durham’s reputation as a historic center.

Now, we see that we should not have assumed that the community would see and understand our sincere efforts to improve Downtown Durham. We strongly felt that occupation and restoration of a vacant building would be welcomed. We had not heard complaints until yesterday which surprised us in part because our plans were public. We received one phone call we missed, but they did not respond to our return call. We are new to land development-- as a technology firm, we can safely say that it is not our focus. But we are made up of real people. We are a small firm that got its start thanks to the community around us, so again, it pains me to think we have hurt the community in any way.

In an effort to show our good faith and make amends, we’re planning on having a public meeting within the next few weeks. We are working to arrange a space for it, but will update you as soon as possible. We want to hear your thoughts and brainstorm together how we can better support our new home. We want to listen. We will also happily share with you how the restoration is coming along with photos and mock-ups of the space.

Please sign up to join our mailing list for updates and to find out when the public meeting will be: http://cakt.us/building-updates

Again, we are eager to hear your thoughts.

Sincerely, Tobias McNulty, CEO

Caktus GroupAnnouncing Q2 Caktus Charitable Giving

Caktus participates in social impact projects around the world, but we believe in starting local. We’re proud of the many ways in which our staff contribute to local organizations, each making the world around us just a little better. To further support our employees, Caktus asks employees to suggest donations every quarter. This quarter, we’re sending contributions to the following five non-profits:

RAIN: Regional AIDS Interfaith Network

RAIN engages the community to transform lives and promote respect and dignity for all people touched by HIV through compassionate care, education and leadership development. Caktus staff visited RAIN during a focus group test of a mobile HIV adherence application last year and admired their good work.

Urban Ministries of Durham

Urban Ministries of Durham welcomes more than 6,000 people each year who come seeking food, shelter, clothing and supportive services.

Ronald McDonald House of Chapel Hill

Each year, The Ronald McDonald House of Chapel Hill provides more than 2,200 families with seriously ill or injured children the basic necessities and comforts of home so that they can focus on caring for a sick child. Caktus’ contribution will shelter a family in need for one week.

Raleigh Review

The Raleigh Review mission is to foster the creation and availability of accessible yet provocative contemporary literature through our biannual magazine as well as through workshops, readings, and other community events.

LGBT Center of Durham

The LGBT Center of Raleigh is working in tandem with Durham community members to establish a Durham branch for local events, programs, and resources.

VOICES, the Chapel Hill Chorus

Voices is one of the Triangle’s oldest and most distinguished choral groups with a rich history spanning over three decades. Multiple Caktus employees participate. Caktus is providing financial support for promotional T-shirts for the group.

Caktus GroupTips for Upgrading Django

From time to time we inherit code bases running outdated versions of Django and part of our work is to get them running a stable and secure version. In the past year we've done upgrades from versions as old as 1.0 and we've learned a few lessons along the way.

Tests are a Must

You cannot begin a major upgrade without planning how you are going to test that the site works after the upgrade. Running your automated test suite should note warnings for new or pending deprecations. If you don’t have an automated test suite then now would be a good time to start one. You don't need 100% coverage, but the more you have, the more confident you will feel about the upgrade. Integration tests with Django's test client can help cover a lot of ground with just a few tests. You'll want to use these sparingly because they tend to be slow and fragile. However, you can use them to test your app much like a human might do, submitting forms (both valid and invalid), and navigating to various pages. As you get closer to your final target version or you find more edge cases, you can add focused unit tests to cover those areas. It is possible to do these upgrades without a comprehensive automated test suite, using only manual testing, but you need a thorough plan to test the entire site. This type of testing is very slow and error prone, and if you are upgrading across multiple Django versions it may have to be repeated for each one.

Know Your Release Notes

Given Django's deprecation cycle, it's easiest to upgrade a project one Django version at a time. If you try to jump two releases, you may call Django APIs which no longer exist and you’ll miss the deprecation warnings that existed only in the releases you jumped over. Each version has a few big features and a few things which were changed and deprecated. For Django 1.1, there were a number of new features for the admin and most of the deprecations and breaking changes were related to the admin. Django 1.2 added multiple database support and improved the CSRF framework, which deprecated the old DB and CSRF settings and code. Static file handling landed in Django 1.3, as did class based views. This started the deprecation of the old function based generic views and the old-style url tag. Django 1.4 changed the default project layout and the manage.py script. It also improved timezone support, and upgrading usually started with tracking down RuntimeWarnings about naive datetimes. Custom user models were added in Django 1.5, but more important in terms of upgrading was the removal of the function based generic views like direct_to_template and redirect_to. You can see a great post about changing from the built-in User to a custom User model on our blog. The url tag upgrade was also completed, so if your templates weren't updated yet, you'd have a lot of work to do. Django 1.6 reworked the transaction handling and deprecated all of the old transaction management API. Finally, in the upcoming 1.7 version, Django will add built-in migrations and projects will need to move away from South. The app-loading refactor is also landing, which changes how signals should be registered and how apps like the admin should manage auto-discovery.
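Deprecation warnings are only useful if you actually see them, and Python hides some of them by default. The following is a small sketch using only the standard library's warnings module to capture them. The function old_helper is a hypothetical stand-in for any API scheduled for removal, and collect_deprecations is an illustrative helper, not part of Django.

```python
import warnings


def old_helper():
    """Hypothetical function that is scheduled for removal."""
    warnings.warn("old_helper() is deprecated", DeprecationWarning, stacklevel=2)
    return "still works"


def collect_deprecations(func):
    """Call func() and return the text of every deprecation warning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not hide or de-duplicate warnings
        func()
    kinds = (DeprecationWarning, PendingDeprecationWarning)
    return [str(w.message) for w in caught if issubclass(w.category, kinds)]


print(collect_deprecations(old_helper))  # ['old_helper() is deprecated']
```

For a whole test run, starting the interpreter with the -W always option has a similar effect without any code changes.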

Do Some Spring Cleaning

As you upgrade your project remember that there are new features in the Django versions. Take the opportunity to refactor code which wasn't easily handled by older versions of Django. Django 1.2's "smart if" and "elif" (added in 1.4) can help clean up messy template blocks. Features like prefetch_related (added in 1.4) can help reduce queries on pages loading a number of related objects. The update_fields parameter on the save method (added in Django 1.5) is another place where applications can lower overhead and reduce parallel requests overwriting data.

There will also be reusable third-party applications which are no longer compatible with the latest Django versions. However, in most cases there are better applications which are up to date. Switching reusable apps can be difficult and disruptive but in most cases it's better than letting it hold you back from the latest Django release. Unless you are willing to take on the full time maintenance of an app which is a release or two behind Django, you are better off looking for an alternative.


Those are the highlights from our experiences. Get a stable test suite in place before starting, take it one release at a time, and do some spring cleaning along the way. Overall it's much less work to upgrade projects as soon as possible after new Django versions are released. But we understand dependencies and other deadlines can get in the way. If you find yourself a few releases behind, we hope this can help guide you in upgrading.


Mark Lavin is the author of the forthcoming book, Lightweight Django, from O'Reilly Media.


Caktus GroupChapelboro.com: Carrboro Firm Develops Web App to Register Voters in Libya

Chapelboro.com recently featured Caktus’ work in implementing the first ever voter registration system via text message.

Caktus GroupO'Reilly Deal: 50% Off Lightweight Django

O'Reilly Media, the go-to source for technical books, just let us know that they're having a 50% off sale on eBook pre-orders of Lightweight Django today. Use coupon code: DEAL.

Lightweight Django is being written by our very own Technical Director, Mark Lavin and Caktus alumna Julia Elman. We would've thought the book was a fantastic intro to the power of Django in web app development anyway, but since Mark and Julia wrote it, we think it’s extra fantastic.

Mark and Julia are continuing to write, but O'Reilly is providing this special pre-release peek for pre-orders. Those that pre-order automatically receive the first three chapters, content as it’s being added, the complete ebook, free lifetime access, multiple file formats, and free updates.

Og MacielFauxFactory 0.3.0

Took some time from my vacation and released FauxFactory 0.3.0 to make it Python 3 compatible and to add a new generate_utf8 method (plus some nice tweaks and code clean up).

As always, the package is available on PyPI and can be installed via pip install fauxfactory.

If you have any constructive feedback or suggestions, or want to file a bug report or feature request, please use the GitHub page.

Caktus GroupCaktus + Durham Bulls Game!

Is there a better way to celebrate the first day of summer than a baseball game? To ring in summer, the Caktus team and their families attended a Durham Bulls game. It was a great chance to hang out in our new city before relocating later this fall.

Caktus @ Durham Bulls Game

Caktus GroupTechPresident: Libya Uses World's First Mobile Voter Registration System

Caktus team members from our Libya mobile voter registration team recently spoke with TechPresident about the context and challenges of implementation.

Caktus GroupTriangle Business Journal: The Triangle technology fueling Libya's election

The Triangle Business Journal recently featured our work in Libya on the first ever SMS voter registration system.

Caktus GroupCaktus Supports Libya Elections with World’s First SMS Voter Registration System

Today’s election in Libya, the second general election for a governing body since Gaddafi’s ouster, is being supported in-country by our Caktus team. Caktus developers created Libya's SMS voter registration system, the first of its kind in the world.

Since 2013, we have worked closely with the Libyan government to create mobile applications that would enable poll workers and citizens to register to vote. The system currently has over 1.5 million registrants. Using lessons learned in the first national test of the system during the February elections for the constitutional draft writers, we’re excited to be on the ground, supporting the Libyan government.

Our work includes data management, running reports to show progress throughout the day, and assisting poll workers in verifying registration data. With more than 12 tons of paper registrations that resulted from SMS registrations, the vast amount of data streaming to and from the system is keeping our team on their toes.

There are many news articles describing the political instability and significant security challenges faced by Libya. There is no question that the situation is difficult. However, we see hope and excitement not only in Libya’s election staff, but also in the citizens of this fledgling democracy. We are proud to be amongst the organizations working to support Libya’s democratic transition.

Caktus GroupGetting Started Scheduling Tasks with Celery

Many Django applications can make good use of being able to schedule work, either to run periodically or simply to avoid blocking the request thread.

There are multiple ways to schedule tasks in your Django app, but there are some advantages to using Celery. It’s supported, scales well, and works well with Django. Given its wide use, there are lots of resources to help learn and use it. And once learned, that knowledge is likely to be useful on other projects.

Celery versions

This documentation applies to Celery 3.0.x. Earlier or later versions of Celery might behave differently.

Introduction to Celery

The purpose of Celery is to allow you to run some code later, or regularly according to a schedule.

Why might this be useful? Here are a couple of common cases.

First, suppose a web request has come in from a user, who is waiting for the request to complete so a new page can load in their browser. Based on their request, you have some code to run that's going to take a while (longer than the person might want to wait for a web page), but you don't really need to run that code before responding to the web request. You can use Celery to have your long-running code called later, and go ahead and respond immediately to the web request.

This is common if you need to access a remote server to handle the request. Your app has no control over how long the remote server will take to respond, or the remote server might be down.

Another common situation is wanting to run some code regularly. For example, maybe every hour you want to look up the latest weather report and store the data. You can write a task to do that work, then ask Celery to run it every hour. The task runs and puts the data in the database, and then your Web application has access to the latest weather report.

A task is just a Python function. You can think of scheduling a task as a time-delayed call to the function. For example, you might ask Celery to call your function task1 with arguments (1, 3, 3) after five minutes. Or you could have your function batchjob called every night at midnight.

We'll set up Celery so that your tasks run in pretty much the same environment as the rest of your application's code, so they can access the same database and Django settings. There are a few differences to keep in mind, but we'll cover those later.

When a task is ready to be run, Celery puts it on a queue, a list of tasks that are ready to be run. You can have many queues, but we'll assume a single queue here for simplicity.

Putting a task on a queue just adds it to a to-do list, so to speak. In order for the task to be executed, some other process, called a worker, has to be watching that queue for tasks. When it sees tasks on the queue, it'll pull off the first and execute it, then go back to wait for more. You can have many workers, possibly on many different servers, but we'll assume a single worker for now.
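The queue and worker relationship can be sketched with nothing but the standard library. This is a toy model to show the idea and is not how Celery is implemented: the queue here lives in memory, and the worker is a thread rather than a separate process.

```python
import queue
import threading

task_queue = queue.Queue()  # the "to-do list"
results = []


def add(x, y):
    return x + y


def worker():
    """Pull tasks off the queue one at a time until told to stop."""
    while True:
        item = task_queue.get()
        if item is None:  # sentinel value: shut the worker down
            break
        func, args = item
        results.append(func(*args))


# "Scheduling" only puts a message on the queue; nothing runs yet.
task_queue.put((add, (2, 2)))
task_queue.put((add, (10, 5)))

# The tasks run only once a worker is watching the queue.
thread = threading.Thread(target=worker)
thread.start()
task_queue.put(None)
thread.join()

print(results)  # [4, 15]
```

With a single worker the tasks run in the order they were queued, which is why the results come out in that order.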

We'll talk more later about the queue, the workers, and another important process that we haven't mentioned yet, but that's enough for now, let's do some work.

Installing celery locally

Installing celery for local use with Django is trivial - just install django-celery:

$ pip install django-celery

Configuring Django for Celery

To get started, we'll just get Celery configured to use with runserver. For the Celery broker, which we will explain more about later, we'll use a Django database broker implementation. For now, you just need to know that Celery needs a broker and we can get by using Django itself during development (but you must use something more robust and better performing in production).

In your Django settings.py file:

  1. Add these lines:
import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'

The first two lines are always needed. Line 3 configures Celery to use its Django broker.

Important: Never use the Django broker in production. We are only using it here to save time in this tutorial. In production you'll want to use RabbitMQ, or maybe Redis.

  2. Add djcelery and kombu.transport.django to INSTALLED_APPS:
INSTALLED_APPS = (
    # ... your other apps ...
    'djcelery',
    'kombu.transport.django',
)

djcelery is always needed. kombu.transport.django is the Django-based broker, for use mainly during development.

  3. Create celery's database tables. If using South for schema migrations:
$ python manage.py migrate

Otherwise:
$ python manage.py syncdb

Writing a task

As mentioned before, a task can just be a Python function. However, Celery does need to know about it. That's pretty easy when using Celery with Django. Just add a tasks.py file to your application, put your tasks in that file, and decorate them. Here's a trivial tasks.py:

from celery import task

# The decorator is what marks this function as a Celery task
@task()
def add(x, y):
    return x + y

When djcelery.setup_loader() runs from your settings file, Celery will look through your INSTALLED_APPS for tasks.py modules, find the functions marked as tasks, and register them for use as tasks.

Marking a function as a task doesn't prevent calling it normally. You can still call it: z = add(1, 2) and it will work exactly as before. Marking it as a task just gives you additional ways to call it.

Scheduling it

Let's start with the simple case we mentioned above. We want to run our task soon, we just don't want it to hold up our current thread. We can do that by just adding .delay to the name of our task:

from myapp.tasks import add

add.delay(2, 2)

Celery will add the task to its queue ("worker, please call myapp.tasks.add(2, 2)") and return immediately. As soon as an idle worker sees it at the head of the queue, the worker will remove it from the queue, then execute it:

import myapp.tasks

myapp.tasks.add(2, 2)

A warning about import names

It's important that your task is always imported and referred to using the same package name. For example, depending on how your Python path is set up, it might be possible to refer to it as either myproject.myapp.tasks.add or myapp.tasks.add. Or from myapp.views, you might import it as .tasks.add. But Celery has no way of knowing those are all the same task.

djcelery.setup_loader() will register your task using the package name of your app in INSTALLED_APPS, plus .tasks.functionname. Be sure when you schedule your task, you also import it using that same name, or very confusing bugs can occur.
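To see why the name matters, consider a toy registry keyed by dotted name. This is only a sketch of the idea, not Celery's actual registry. The helpers register and call_by_name are hypothetical, and the error raised here is a plain LookupError rather than whatever Celery reports.

```python
registry = {}


def register(module_name, func):
    """Store func under '<module_name>.<function name>'."""
    registry["%s.%s" % (module_name, func.__name__)] = func


def call_by_name(task_name, *args):
    """Look the task up by its exact dotted name, as a worker would."""
    if task_name not in registry:
        raise LookupError("no task registered as %r" % task_name)
    return registry[task_name](*args)


def add(x, y):
    return x + y


# Registered under the app's package name, plus .tasks.functionname.
register("myapp.tasks", add)

print(call_by_name("myapp.tasks.add", 2, 2))  # 4

try:
    call_by_name("myproject.myapp.tasks.add", 2, 2)
except LookupError as exc:
    print(exc)  # same function, different name, so the lookup fails
```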

Testing it

Start a worker

As we've already mentioned, a separate process, the worker, has to be running to actually execute your Celery tasks. Here's how we can start a worker for our development needs.

First, open a new shell or window. In that shell, set up the same Django development environment - activate your virtual environment, or add things to your Python path, whatever you do so that you could use runserver to run your project.

Now you can start a worker in that shell:

$ python manage.py celery worker --loglevel=info

The worker will run in that window, and send output there.

Run your task

Back in your first window, start a Django shell and run your task:

$ python manage.py shell
>>> from myapp.tasks import add
>>> add.delay(2, 2)

You should see output in the worker window indicating that the worker has run the task:

[2013-01-21 08:47:08,076: INFO/MainProcess] Got task from broker: myapp.tasks.add[e080e047-b2a2-43a7-af74-d7d9d98b02fc]
[2013-01-21 08:47:08,299: INFO/MainProcess] Task myapp.tasks.add[e080e047-b2a2-43a7-af74-d7d9d98b02fc] succeeded in 0.183349132538s: 4

An Example

Earlier we mentioned using Celery to avoid delaying responding to a web request. Here's a simplified Django view that uses that technique:

# views.py
from myapp.tasks import do_something_with_form_data

def view(request):
    form = SomeForm(request.POST)
    if form.is_valid():
        data = form.cleaned_data
        # Schedule a task to process the data later
        do_something_with_form_data.delay(data)
    return render_to_response(...)

# tasks.py
from celery import task

@task()
def do_something_with_form_data(data):
    call_slow_web_service(data['user'], data['text'], ...)


Troubleshooting

It can be frustrating trying to get Celery tasks working, because multiple parts have to be present and communicating with each other. Many of the usual tips still apply:

  • Get the simplest possible configuration working first.
  • Use the python debugger and print statements to see what's going on.
  • Turn up logging levels (e.g. --loglevel debug on the worker) to get more insight.

There are also some tools that are unique to Celery.

Eager scheduling

In your Django settings, you can add:

CELERY_ALWAYS_EAGER = True

and Celery will bypass the entire scheduling mechanism and call your code directly.

In other words, with CELERY_ALWAYS_EAGER = True, these two statements run just the same:

add.delay(2, 2)
add(2, 2)

You can use this to get your core logic working before introducing the complication of Celery scheduling.
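The idea behind eager mode can be sketched with a toy decorator. This is not Celery's implementation: ALWAYS_EAGER, pending, ToyResult, and this task decorator are all hypothetical stand-ins. One detail it does try to reflect is that a scheduled call hands back a result object, so the return value is read from its result attribute.

```python
ALWAYS_EAGER = True  # stand-in for the CELERY_ALWAYS_EAGER setting
pending = []         # stand-in for the broker's queue


class ToyResult(object):
    """Minimal stand-in for the result object a scheduled call returns."""

    def __init__(self, value):
        self.result = value


def task(func):
    """Toy decorator that adds a .delay() method to a plain function."""

    def delay(*args, **kwargs):
        if ALWAYS_EAGER:
            # Eager mode: run immediately, in this process.
            return ToyResult(func(*args, **kwargs))
        # Normal mode: just record the request for a worker to pick up.
        pending.append((func.__name__, args, kwargs))
        return None

    func.delay = delay
    return func


@task
def add(x, y):
    return x + y


print(add(2, 2))               # 4
print(add.delay(2, 2).result)  # 4
```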

Peek at the Queue

As long as you're using Django itself as your broker for development, your queue is stored in a Django database. That means you can look at it easily. Add a few lines to admin.py in your application:

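from django.contrib import admin
from kombu.transport.django import models as kombu_models

# Expose the broker's Message table in the Django admin
admin.site.register(kombu_models.Message)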

Now you can go to /admin/django/message/ to see if there are items on the queue. Each message is a request from Celery for a worker to run a task. The contents of the message are rather inscrutable, but just knowing if your task got queued can sometimes be useful. The messages tend to stay in the database, so seeing a lot of messages there doesn't mean your tasks aren't getting executed.

Check the results

Anytime you schedule a task, Celery returns an AsyncResult object. You can save that object, and then use it later to see if the task has been executed, whether it was successful, and what the result was.

result = add.delay(2, 2)
if result.ready():
    print("Task has run")
    if result.successful():
        print("Result was: %s" % result.result)
    else:
        if isinstance(result.result, Exception):
            print("Task failed due to raising an exception")
            raise result.result
        else:
            print("Task failed without raising exception")
else:
    print("Task has not yet run")

Periodic Scheduling

Another common case is running a task on a regular schedule. Celery implements this using another process, celerybeat. Celerybeat runs continually, and whenever it's time for a scheduled task to run, celerybeat queues it for execution.

For obvious reasons, only one celerybeat process should be running (unlike workers, where you can run as many as you want and need).
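The reason is easier to see in a small simulation. The sketch below is a toy model of a scheduler loop, not celerybeat's code. The names beat_tick and simulate, and the fetch_weather task with its 300 second interval, are hypothetical. Each scheduler keeps its own record of when it last queued a task, so two schedulers queue everything twice.

```python
SCHEDULE = {"fetch_weather": 300}  # hypothetical task, due every 300 seconds


def beat_tick(schedule, last_run, now, queued):
    """Queue every task whose interval has elapsed since it was last queued."""
    for name, interval in schedule.items():
        if now - last_run[name] >= interval:
            queued.append(name)  # the scheduler only queues; a worker runs it
            last_run[name] = now


def simulate(n_schedulers, seconds=600, step=60):
    """Run n independent schedulers against one shared queue."""
    queued = []
    states = [{"fetch_weather": 0} for _ in range(n_schedulers)]
    for now in range(step, seconds + 1, step):
        for last_run in states:
            beat_tick(SCHEDULE, last_run, now, queued)
    return queued


print(len(simulate(1)))  # 2: the task is queued at 300s and at 600s
print(len(simulate(2)))  # 4: every run is queued twice
```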

Starting celerybeat is similar to starting a worker. Start another window, set up your Django environment, then:

$ python manage.py celery beat

There are several ways to tell celery to run a task on a schedule. We're going to look at storing the schedules in a Django database table. This allows you to easily change the schedules, even while Django and Celery are running.

Add this setting:

CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'

You can now add schedules by opening the Django admin and going to /admin/djcelery/periodictask/. When adding a new periodic task, here's how the fields are used:

  • Name — Any name that will help you identify this scheduled task later.
  • Task (registered) — This should give a choice of any of your defined tasks, as long as you've started Django at least once after adding them to your code. If you don't see the task you want here, it's better to figure out why and fix it than use the next field.
  • Task (custom) — You can enter the full name of a task here (e.g. myapp.tasks.add), but it's better to use the registered tasks field just above this.
  • Enabled — You can uncheck this if you don't want your task to actually run for some reason, for example to disable it temporarily.
  • Interval — Use this if you want your task to run repeatedly with a certain delay in between. You'll probably need to use the green "+" to define a new schedule. This is pretty simple, e.g. to run every 5 minutes, set "Every" to 5 and "Period" to minutes.
  • Crontab — Use crontab, instead of Interval, if you want your task to run at specific times. Use the green "+" and fill in the minute, hour, day of week, day of month, and month of year. You can use "*" in any field in place of a specific value, but be careful - if you use "*" in the Minute field, your task will run every minute of the hour(s) selected by the other fields. Example: to run every morning at 7:30 am, set Minute to "30", Hour to "7", and the remaining fields to "*".
  • Arguments — If you need to pass arguments to your task, you can open this section and set *args and **kwargs.
  • Execution Options — Advanced settings that we won't go into here.
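To make the Crontab field semantics concrete, here is a minimal sketch in plain Python of how a schedule can be checked against a point in time. It is not Celery's implementation: the function names are made up for illustration, only "*" and comma-separated numbers are handled, and it assumes that every field must match and that day of week counts Sunday as 0.

```python
from datetime import datetime


def field_matches(spec, value):
    """Return True if one crontab field accepts the given value.

    Only "*" and comma-separated integers are supported in this sketch.
    """
    if spec == "*":
        return True
    return value in {int(part) for part in spec.split(",")}


def crontab_matches(when, minute="*", hour="*", day_of_week="*",
                    day_of_month="*", month_of_year="*"):
    """Return True if the datetime `when` satisfies every field."""
    # datetime.weekday() counts Monday as 0; this sketch assumes Sunday is 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day_of_week, cron_weekday)
        and field_matches(day_of_month, when.day)
        and field_matches(month_of_year, when.month)
    )


# Every morning at 7:30: Minute "30", Hour "7", everything else "*".
print(crontab_matches(datetime(2014, 5, 1, 7, 30), minute="30", hour="7"))
# "*" in the Minute field matches every minute of the selected hour.
print(crontab_matches(datetime(2014, 5, 1, 7, 31), minute="*", hour="7"))
```

Both calls match, which is why a "*" left in the Minute field by accident turns a once-a-day task into sixty runs during that hour.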

Default schedules

If you want some of your tasks to have default schedules, and not have to rely on someone setting them up in the database after installing your app, you can use Django fixtures to provide your schedules as initial data for your app.

  • Set up the schedules you want in your database.
  • Dump the schedules in json format:
$ python manage.py dumpdata djcelery --indent=2 --exclude=djcelery.taskmeta >filename.json
  • Create a fixtures directory inside your app
  • If you never want to edit the schedules again, you can copy your json file to initial_data.json in your fixtures directory. Django will load it every time syncdb is run, and you'll either get errors or lose your changes if you've edited the schedules in your database. (You can still add new schedules, you just don't want to change the ones that came from your initial data fixture.)
  • If you just want to use these as the initial schedules, name your file something else, and load it when setting up a site to use your app:
$ python manage.py loaddata <your-app-label/fixtures/your-filename.json>

Hints and Tips

Don't pass model objects to tasks

Since tasks don't run immediately, by the time a task runs and looks at a model object that was passed to it, the corresponding record in the database might have changed. If the task then does something to the model object and saves it, those changes in the database are overwritten by older data.

It's almost always safer to save the object, pass the record's key, and look up the object again in the task:

myobject.save()
mytask.delay(myobject.pk)

def mytask(pk):
    myobject = MyModel.objects.get(pk=pk)
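The overwrite problem can be reproduced without Celery at all. In this sketch a dictionary stands in for the database table and two plain functions stand in for tasks; every name here (DATABASE, load, save, count_view_with_object, count_view_with_pk) is hypothetical and only meant to illustrate the difference between passing an object and passing its key.

```python
import copy

# A dictionary standing in for a database table (an assumption for this sketch).
DATABASE = {1: {"title": "Draft", "views": 0}}


def load(pk):
    """Fetch a fresh copy of the record, like MyModel.objects.get(pk=pk)."""
    return copy.deepcopy(DATABASE[pk])


def save(pk, record):
    """Write the whole record back, like calling save() on a model object."""
    DATABASE[pk] = copy.deepcopy(record)


def count_view_with_object(pk, record):
    """Risky: works on the copy that was captured when the task was queued."""
    record["views"] += 1
    save(pk, record)


def count_view_with_pk(pk):
    """Safer: looks the record up again when the task actually runs."""
    record = load(pk)
    record["views"] += 1
    save(pk, record)


# The task is queued with a copy of the object...
stale_copy = load(1)
# ...and before it runs, something else edits the record.
edited = load(1)
edited["title"] = "Published"
save(1, edited)

count_view_with_object(1, stale_copy)
print(DATABASE[1])  # the title edit has been overwritten by the stale copy

# Repeat the edit, then run the task that only received the key.
edited = load(1)
edited["title"] = "Published"
save(1, edited)
count_view_with_pk(1)
print(DATABASE[1])  # this time the title edit survives
```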

Schedule tasks in other tasks

It's perfectly all right to schedule one task while executing another. This is a good way to make sure the second task doesn't run until the first task has done some necessary work first.

Don't wait for one task in another

If a task waits for another task, the first task's worker is blocked and cannot do any more work until the wait finishes. This is likely to lead to a deadlock, sooner or later.

If you're in Task A and want to schedule Task B, and after Task B completes, do some more work, it's better to create a Task C to do that work, and have Task B schedule Task C when it's done.
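One way to picture the Task A, Task B, Task C arrangement is with a toy in-process queue. This is only a sketch: the pending queue and the delay and run_worker helpers are invented stand-ins for a broker and a worker, not Celery APIs. The point is that no task ever waits; each one does its part, queues the next step, and returns.

```python
from collections import deque

pending = deque()   # stand-in for the broker's queue
log = []            # records what ran, in order


def delay(task, *args):
    """Queue a task for later instead of calling it directly."""
    pending.append((task, args))


def task_a():
    log.append("A: did the first part")
    # Hand off to B and return immediately; A never waits for B.
    delay(task_b, 21)


def task_b(number):
    result = number * 2
    log.append("B: computed %d" % result)
    # B schedules the follow-up work that A would otherwise have waited for.
    delay(task_c, result)


def task_c(result):
    log.append("C: finished up with %d" % result)


def run_worker():
    """A single worker that processes queued tasks one at a time."""
    while pending:
        task, args = pending.popleft()
        task(*args)


delay(task_a)
run_worker()
print(log)
```

Even with a single worker the whole chain completes, because the worker is free again as soon as each task returns. If task_a had blocked waiting for task_b's result, that one worker could never have picked task_b up.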

Next Steps

Once you understand the basics, parts of the Celery User's Guide are good reading. I recommend these chapters to start with; the others are either not relevant to Django users or more advanced:

Using Celery in production

The Celery configuration described here is for convenience in development, and should never be used in production.

The most important change to make in production is to stop using kombu.transport.django as the broker, and switch to RabbitMQ or something equivalent that is robust and scalable.

Caktus Group: Reflecting on Challenges Faced by Female Developers

Karen Tracey, a Django core committer and Caktus Lead Developer and Technical Manager, recently participated in TriLUG’s panel on Women in Free and/or Open Source Software. Karen was one of five female developers who discussed challenges women face in joining the open source community. We recently caught up with Karen to discuss her own experience.

Why do you think there are so few women software developers?
This question always comes up. There’s no good single answer. I think there are implicit and explicit messages women get from a very young age that this is not for them. Nobody really knows the complete answer. It was great to see a lot of women come to the meeting. I hope the panel was useful for them and encourages increased participation.

Did you think of computer science as a “boys only” field?
I’m old enough that when I was entering the field, women were at the highest levels within computer science. I entered at a time where women were joining a lot of professional fields-- law, medicine, etc. I had no reason to think computer engineering was different.

Also, I had this bubble with technical parents. My father worked for IBM, as had my mother before having children, and I had an IBM PC the year they came out. Also, I went to an all-girls high school and I think that helped in the sense that there was no boys’ group to say this is a boys’ thing. For me, there wasn’t a lot of the pushing away that younger girls now see in the field.

I think the highest enrollment in computer science degrees was when I went to college over twenty-five years ago. Notre Dame had far more men than women at the time, so the fact that there was a single-digit number of females in a class of around 100 seemed more like a reflection of the school’s gender ratio. I was not limited in what I could do.

Did you receive any negative messages at the beginning of your career?
I did with a professor who flat-out stated women shouldn’t be in technical fields. I was a grad student at the time and had received enough positive feedback by then that his opinion did not hold much weight with me, plus he said I was an “exception”. But his message could have been quite inhibiting to me a few years earlier I think. There have been multiple gender-related dustups through the overall open source community. When I first started using Django, I did question whether to sign my own name on my very first question posted to the django-users mailing list. I didn’t know if it was wise to reveal I was a woman before I was established in the community. I did and got an excellent welcome, but I was not sure what to expect having read about various ways in which women were disrespected in such communities.

What do you think individuals in the open source community can do to increase participation by women?
Be welcoming, including explicit individual invitations to attend/participate (this came up during the panel). Be aware that many women may have been receiving these “this is not for you” messages from a young age and try to counteract it. Be observant and try to notice any behavior by others which may be unwelcoming. If you see unwelcoming or bad behavior, take steps to correct it. For example, if someone makes an inappropriate joke, don’t just ignore it but rather make it clear to the joke-teller and whatever group that heard it that you don’t find it funny or appropriate.

Caktus Group: CTO Copeland Featured on WNCN for Open Government App

Colin Copeland, our Chief Technology Officer, recently spoke to WNCN about a new web application, NCFoodInspector.com, that lets Durham County visitors know the cleanliness of nearby restaurants. Colin helped build the application in his spare time as captain of Code for Durham Brigade, an all-volunteer group dedicated to using technology to improve access to publicly available sanitation scores. The group leverages open source technology to build applications.

NCFoodInspector.com displays a map and listing of restaurants, their sanitation score, and details of any violations. This makes difficult-to-access health inspection information readily available for the first time. To ensure the app reached multiple populations, it is also available in Spanish.

Colin says this is just the first of many future applications. The Brigade hopes to build more apps that can serve as a resource to the Durham County community using public information.

To view Colin’s interview, visit WNCN.

Og Maciel: Twenty Three Years

My parents were eagerly awaiting our arrival on an early Spring morning, and when our plane finally landed after the almost 10 1/2 hour flight and we made our way to the luggage claim area, the reunion was filled with a lot of hugging, laughter and a huge sigh of relief. For someone who had spent most of their life in a small and sleepy town on the east coast of Brazil, waking up and finding yourself at JFK Airport was nothing short of a major event! I had never seen so many people of so many different races and speaking so many different dialects in my entire life, all 16 years of it! Everywhere I looked, everything was so different from what I was used to… even signs (so many of them) were in a different language! Eventually we grabbed our luggage and made our way to the parking lot looking for our car.

Before my sister and I left Brazil, I had the very hard task of giving away all of my possessions and only bringing the bare minimum to start “a new life”. I was still going through my mid-teenage years, so I had to give away all of my favorite music LPs, books, childhood toys, and all the mementos I had collected through the years. This may not be such a big deal to you, but I have always been very attached to the things people give me, especially if they were given by someone I really cared about. Seeing the things that represented so many people and moments of my life slowly drifting away filled me with a great feeling of personal loss. This feeling would stay with me for the next couple of years as I tried to adjust to my new adopted country. I was a stranger in a different land, where nobody knew me and I did not know anyone.

It’s been 23 years since this event took place, and I’m still here in the “Land of the Free”. Through the years I have survived High School, graduated with a Bachelor of Science from a university in Upstate New York, married (another immigrant from another country who you shall meet soon), moved a couple of times, and now find myself raising three young girls in North Carolina, the first Maciel generation of our families to be born outside our countries! Our similarities and differences, however, go beyond only the generation gap!

You see, contrary to a lot of the “stereotypical” immigrant families, we have completely immersed ourselves into the American way of life and culture, with a dash of our childhood cultures sprinkled here and there to add a little diversity to the mix. My wife and I stopped observing the holidays from our countries of origin a long time ago, especially those with no corresponding holidays here. We share a lot of the things that we learned growing up with our kids, but always in a nostalgic, almost didactic sort of way. We speak a mix of Brazilian Portuguese-Mexican Spanish-New Jersey English at home and try our best not to force our children to learn any language in particular. As it stands now, our kids’ primary language is English and even though I still make a habit of speaking in Brazilian Portuguese to them, their vocabulary consists of several words that they only say either in Spanish or Portuguese, like the word “daddy”. My wife’s vocabulary has also gone through a very interesting transformation, and she now speaks more Portuguese than Spanish when talking to our kids. Maybe it is because she was very young when she moved to New York in the early 1990s and never really got a lot of exposure to the Spanish language growing up in a different country.

All I can say is that I call North Carolina home, I vote during elections, I always get emotional when hearing the American Anthem, and together with my wife I raise the next generation of the Maciel family! Maybe they will take some of our culture and teach it to their own kids one day… maybe one day they may even learn to speak Portuguese or Spanish… maybe they won’t, and that is ok by me. We don’t even force them to follow the same religion our parents (and their parents) taught us growing up, preferring that they make that decision on their own, when and if they’re ever interested in doing so. We want them to be able to choose their own paths and make educated decisions about every aspect of their lives without any pressure or guilt.

I’m an American-Brazilian, my wife is American-Mexican and our kids are Americans with a touch of Brazilian and Mexican pride and culture. Together we form the New American Family!

Frank Wierzbicki: Rough cut of Jython devguide.

A while ago I started a port of the CPython devguide but I never got it to the point that I felt I could release it. I've decided that it's better to have an incomplete version out there vs. having no Jython devguide at all, so I'm doing a soft launch. It contains many CPython-specific instructions that don't actually apply to Jython and it certainly has gaps. Some of the gaps are flagged by TODO notes. Please feel free to comment or, best of all, send patches! Patches can be made against the main devguide (for enhancements that apply to both CPython and Jython) or, for Jython-only changes, against the Jython fork.

Joe Gregorio: No more JS frameworks

Stop writing JavaScript frameworks.

JavaScript frameworks seem like death and taxes; inevitable and unavoidable. I'm sure that if I could be a fly on the wall every time someone started a new web project, the very first question they'd ask is, which JS framework are we using? That's how ingrained the role of JS frameworks is in the industry today. But that's not the way it needs to be, and actually, it needs to stop.

Let's back up and see how we got here.

Angular and Backbone and Ember, oh my.

For a long time the web platform, the technology stack most succinctly described as HTML+CSS+JS, was, for lack of a better term, a disaster. Who can forget the IE box model, or the layer tag? I'm sure I just started several of you twitching with flashbacks to the bad old days of web development with just those words.

For a long time there was a whole lot of inconsistency between browsers and we, as an industry, had to write frameworks to paper over them. The problem is that there was disagreement even on the fundamental issues among browsers, like how events propagate, or what tags to support, so every framework not only papered over the holes, but designed their own model of how the browser should work. Actually their own models, plural, because you got to invent a model for how events propagate, a model for how to interact with the DOM, etc. A lot of inventing went on. So frameworks were written, each one a snowflake, a thousand flowers bloomed and gave us the likes of jQuery and Dojo and MochiKit and Ext JS and AngularJS and Backbone and Ember and React. For the past ten years we’ve been churning out a steady parade of JS frameworks.

But something else has happened over the past ten years; browsers got better. Their support for standards improved, and now there are evergreen browsers: automatically updating browsers, each version more capable and standards compliant than the last. With newer standards like:

I think it's time to rethink the model of JS frameworks. There's no need to invent yet another way to do something, just use HTML+CSS+JS.

So why are we still writing JS frameworks? I think a large part of it is inertia, it's habit. But is that so bad, it's not like frameworks are actively harmful, right? Well, let's first start off by defining what I mean by web framework. There's actually a gradient of code that starts with a simple snippet of code, such as a Gist, and that moves to larger and larger collections of code, moving up to libraries, and finally frameworks:

gist -> library -> framework

Frameworks aren't just big libraries, they have their own models for how to interact with events, with the DOM, etc. So why avoid frameworks?

Abstractions

Well, one of the problems of frameworks is usually one of their selling points, that they abstract away the platform so you can concentrate on building your own software. The problem is that now you have two systems to learn, HTML+CSS+JS, and the framework. Sure, if the framework was a perfect abstraction of the web as a platform you would never have to go beyond the framework, but guess what, abstractions leak. So you need to know HTML+CSS+JS because at some point your program won't work the way you expect it to, and you’ll have to dig down through all the layers in the framework to figure out what's wrong, all the way down to HTML+CSS+JS.

Mapping the iceberg.

A framework is like an iceberg, that 10% floating above the water doesn't look dangerous, it's the hidden 90% that will eventually get you. Actually it's even more apt than that, learning a framework is like mapping an iceberg, in order to use the framework you need to learn the whole thing, apply the effort of mapping out the entire thing, and in the long run the process is pointless because the iceberg is going to melt anyway.

Widgets

Another selling point of frameworks is that you can get access to a library of widgets. But really, you shouldn't need to adopt a framework to get access to widgets, they should all be orthogonal and independent. A good example of this today is CodeMirror, a syntax highlighting code editor built in JavaScript. You can use it anywhere, no framework needed.

There is also the lost effort of building widgets for a framework. Remember all those MochiKit widgets you wrote? Yeah, how much good are they doing you now that you've migrated to Ember, or Angular?

Data Binding

Honestly I've never needed it, but if you do, it should come in the form of a library and not a framework.

The longer term problem with frameworks is that they end up being silos, they segment the landscape, widgets built for framework A don't work in framework B. That's lost effort.

So what does a post-framework world look like?

HTML+CSS+JS are my framework.

The fundamental idea is that frameworks aren't needed, use the capabilities already built into HTML+CSS+JS to build your widgets. Break apart the monoliths into orthogonal components that can be mixed in any combination. The final pieces that enable all of this fall under the umbrella of Web Components.

HTML Imports, HTML Templates, Custom Elements, and Shadow DOM are the enabling technologies that should allow us to cut the cord from frameworks, allowing the creation of reusable elements and functionality. For a much better introduction see these articles and libraries:

So, we all create <x-flipbox>'s, declare victory, and go home?

No, not actually, the first thing you need for working with Web Components are polyfills for that functionality, such as X-Tag and Polymer. The need for those will decrease over time as browsers flesh out their implementations of those specifications.

A point to be stressed here is that these polyfills aren't frameworks that introduce their own models to developing on the web, they enable the HTML 5 model. But that isn't really the only need, there are still minor gaps in the platform where one browser deviates in a small way from current standards, and that's something we need to polyfill. MDN seems to have much of the needed code, as the documentation frequently contains short per-function polyfills.

So one huge HTML 5 Polyfill library would be good, but even better would be what I call html-5-polyfill-o-matic, a set of tools that allows me to write Web Components via bog standard HTML+JS and then after analyzing my code, either via static analysis or via Object.observe at runtime, it produces a precise subset of the full HTML 5 polyfill for my project.

This sort of functionality will be even more important as I start trying to mix and match web components and libraries from multiple sources, i.e. an <x-foo> from X-Tag and a <core-bar> from Polymer, does that mean I should have to include both of their polyfill libraries? (It turns out the answer is no.) And how exactly should I get these custom elements? Both X-Tag and Brick have custom bundle generators:

If I start creating custom elements do I need to create my own custom bundler too? I don't think that's a scalable idea, I believe we need idioms and tools that handle this much better. This may actually mean changing how we do open source; a 'widget' isn't a project, so our handling of these things needs to change. Sure, still put the code in Git, but do you need the full overhead of a GitHub project? Something lighter weight, closer to a Gist than a current project might be a better fit. How do I minimize/vulcanize all of this code into the right form for use in my project? Something like Asset Graph might be a good start on that.

So what do we need now?

  1. Idioms and guidelines for building reusable components.
  2. Tools that work under those idioms to compile, crush, etc. all that HTML, CSS, and JS.
  3. A scalable HTML 5 polyfill, full or scaled down based on what's really used.

That's what we need to build a future where we don't need to learn the latest model of the newest framework, instead we just work directly with the platform, pulling in custom elements and libraries to fill specific needs, and spend our time building applications, not mapping icebergs.


Q: Why do you hate framework authors?

A: I don’t hate them. Some of my best friends are framework authors. I will admit a bit of inspiration from the tongue-in-cheek you have ruined javascript, but again, no derision intended for framework authors.

Q: You can’t do ____ in HTML5, for that you need a framework.

A: First, that's not a question. Second, thanks for pointing that out. Now let's work together to add the capabilities to HTML 5 that allow ____ to be done w/o a framework.

Q: But ___ isn't a framework, it's a library!

A: Yeah, like I said, it’s a gradient from gist to framework, and you might draw the lines slightly differently from me. That's OK, this isn't about the categorization of any one particular piece of software, it's about moving away from frameworks.

Q: I've been doing this for years, with ___ and ___ and ___.

A: Again, that's not a question, but regardless, good for you, you should be in good shape to help everyone else.

Q: So everyone needs to rewrite dropdown menus, tabs, sliders and toggles themselves?

A: Absolutely not, the point is there should be a way to create those elements in a way that doesn't require buying into one particular framework.

Q: Dude, all those HTML Imports are going to kill my site's performance.

A: Yes, if you implemented all this stuff naively it would, which is why I mentioned the need for tools to compile and crush all the HTML, CSS, and JS.

Q: So I'm not supposed to use any libraries?

A: No, that's not what I said, I was very careful to delineate a line between libraries and frameworks, a library providing an orthogonal piece of functionality that can be used with other libraries. Libraries are fine, it's the frameworks that demand 100% buy-in that I'd like to see us move away from.

Q: But I like data binding!

A: Lots of people do, I was only expressing a personal preference. I didn't say that you shouldn't use data binding, but only that you don't need to adopt an entire framework to get data-binding, there are standalone libraries for that.

Og Maciel: FauxFactory 0.2.1


Hot on the heels of the 0.2.0 release, today I'm releasing FauxFactory 0.2.1 to fix a brown paper bag bug I encountered last night before going to bed.

Basically, the new "Lorem Ipsum" generator was not honoring the words parameter if you asked for a string longer than 70 characters. I have fixed the issue as well as added a new test to make sure that the generator does the right thing.

The package is available on PyPI (sadly the page is still not rendering correctly... suggestions welcome) and can be installed via pip install fauxfactory.

If you have any constructive feedback or suggestions, or want to file a bug report or feature request, please use the GitHub page.

Image: Cry by LLewleyn Williams a.k.a. SCUD, some rights reserved.

Og Maciel: FauxFactory 0.2.0

Today I'm releasing FauxFactory 0.2.0 with a new feature, a "Lorem Ipsum" generator. I confess that I did not look around for any existing implementation in Python out there and just started writing code. My idea was to create a method that would:

Return a "Lorem Ipsum" string if I passed no arguments:

In [1]: from fauxfactory import FauxFactory

In [2]: FauxFactory.generate_iplum()
Out[2]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.'

Return a single paragraph with a fixed number of words if I passed a numeric words=x argument. If words was a large number, the text would 'wrap around' as many times as needed:

In [3]: FauxFactory.generate_iplum(words=8)
Out[3]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit.'

In [4]: FauxFactory.generate_iplum(words=80)
Out[4]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.'

If paragraphs=x was used, then a given number of paragraphs containing the entire "Lorem Ipsum" string is returned:

In [5]: FauxFactory.generate_iplum(paragraphs=1)
Out[5]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.'

In [6]: FauxFactory.generate_iplum(paragraphs=2)
Out[6]: u'Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut
enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi
ut aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.\nLorem ipsum
dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.'

Finally, if both words and paragraphs are used, then a given number of paragraphs with the specified number of words is returned, with the text 'flowing' and 'wrapping around' as needed:

In [7]: FauxFactory.generate_iplum(paragraphs=1, words=7)
Out[7]: u'Lorem ipsum dolor sit amet, consectetur adipisicing.'

In [8]: FauxFactory.generate_iplum(paragraphs=3, words=7)
Out[8]: u'Lorem ipsum dolor sit amet, consectetur adipisicing.\nElit,
sed do eiusmod tempor incididunt ut.\nLabore et dolore magna aliqua.
Ut enim.'
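The 'wrap around' behaviour described above can be sketched in a few lines. This is not FauxFactory's actual implementation; the function name lorem and its details are assumptions made for illustration, chosen so that the small examples above come out the same way.

```python
from itertools import cycle, islice

LOREM_IPSUM = (
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim "
    "ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut "
    "aliquip ex ea commodo consequat. Duis aute irure dolor in "
    "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla "
    "pariatur. Excepteur sint occaecat cupidatat non proident, sunt in "
    "culpa qui officia deserunt mollit anim id est laborum."
)


def lorem(words=None, paragraphs=None):
    """Hypothetical generator: `paragraphs` paragraphs of `words` words each."""
    source = LOREM_IPSUM.split()
    if words is None:
        words = len(source)      # default: the whole text per paragraph
    if paragraphs is None:
        paragraphs = 1
    if words < 1 or paragraphs < 1:
        raise ValueError("words and paragraphs must be positive")

    # cycle() restarts the text when it runs out, which gives the wrap-around;
    # sharing one stream across paragraphs lets the text flow between them.
    stream = cycle(source)
    result = []
    for _ in range(paragraphs):
        text = " ".join(islice(stream, words))
        text = text[0].upper() + text[1:]          # capitalize the first word
        result.append(text.rstrip(",.") + ".")     # always end with a period
    return "\n".join(result)


print(lorem(words=8))
print(lorem(paragraphs=3, words=7))
```

Because the word stream never ends, asking for more words than the source text contains simply continues from the beginning again.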

The package is available on PyPI (sadly the page is not rendering correctly... suggestions welcome) and can be installed via pip install fauxfactory.

If you have any constructive feedback or suggestions, or want to file a bug report or feature request, please use the GitHub page.

Frank Wierzbicki: Jython 2.7 beta2 released!

Update: This release of Jython requires JDK 7 or above.

On behalf of the Jython development team, I'm pleased to announce that the second beta of Jython 2.7 is available. I'd like to thank Adconion Media Group for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Jython 2.7b2 brings us up to language level compatibility with the 2.7 version
of CPython. We have focused largely on CPython compatibility, and so this
release of Jython can run more pure Python apps than any previous release.
Please see the NEWS file for detailed release notes. This is primarily a bugfix
release, with numerous improvements, including much improvement on Windows.

As a beta release we are concentrating on bug fixing and stabilization for a
production release.

This release is being hosted at maven central. The traditional installer can be found here. See the installation instructions for using the installer. Three other versions are available:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Og Maciel: Hiring is Tough!

So I've been trying to hire two Python developers to join my automation team here at Red Hat since November 2013... and believe it or not, so far I've had absolutely zero success in finding good, strong candidates with real-world experience in North Carolina! I either find really smart people, who do have relevant backgrounds or could 'hit the ground running' but are way out of my current budget, or people who lack real-world experience and fall into more of an entry-level position.

Basically I'm looking for someone who not only can 'speak Python' fluently but also has experience doing automation and writing tests, as well as that 'QE mindset' that makes you want to automate all things and question all things about a product! Someone who knows how to file a good bug report and knows how to provide pertinent, relevant information to developers so that they can fix a bug. Finally, someone who believes in continuous integration and is excited about an opportunity to improve and augment our existing testing framework and work with a very exciting product, knowing that your contributions will affect literally thousands of customers worldwide!

Bonus points if you know what Selenium is and have played with Paramiko and/or Requests!!!

Does that interest you? Feel that you've got what I'm looking for? Then take a peek at these two positions and apply fast!

Caktus Group: Caleb and Rebecca at this Month’s Girl Develop It Intro to Python Class

One of Caktus’ most pedagogically focused developers, Caleb Smith, will be teaching a class to a group of local budding Pythonistas tomorrow, Saturday the 26th, and Caktus’ Rebecca Lovewell will be contributing as a teaching assistant. You can read more about it, and sign up, via the meetup page for the event. The class is run by the local chapter of Girl Develop It, a group focused on improving the landscape of women in tech via women-focused (but not exclusive) educational opportunities.

This class is a labor of love for Caleb and Rebecca who contribute for fun and as a way to help out new coders. Caleb has developed his curriculum in the open using a GitHub repository with input from Rebecca, Nick Lang, and Leslie Ray. It’s great to see a distributed team collaborating using development tools to create curriculum that ultimately gets more women involved in technology through local classes.

Josh Johnson: Centralized Ansible Management With Knockd + Auto-provisioning with AWS

Ansible is a great tool. We’ve been using it at my job with a fair amount of success. When it was chosen, we didn’t have a requirement for supporting Auto scaling groups in AWS. This offers a unique problem – we need machines to be able to essentially provision themselves when AWS brings them up. This has interesting implications outside of AWS as well. This article covers using the Ansible API to build just enough of a custom playbook runner to target a single machine at a time, and discusses how to wire it up to knockd, a “port knocking” server and client, and finally how to use user data in AWS to execute this at boot – or any reboot.

Ansible – A “Push” Model

Ansible is a configuration management tool used in orchestration of large pieces of infrastructure. It’s structured as a simple layer above SSH – but it’s a very sophisticated piece of software. Bottom line, it uses SSH to “push” configuration out to remote servers – this differs from some other popular approaches (like Chef, Puppet and CFEngine) where an agent is run on each machine, and a centralized server manages communication with the agents. Check out How Ansible Works for a bit more detail.

Every approach has its advantages and disadvantages – discussing the nuances is beyond the scope of this article, but the primary disadvantage that Ansible has is also one of its strongest advantages: it’s decentralized and doesn’t require agent installation. The problem arises when you don’t know your inventory (Ansible-speak for “list of all your machines”) beforehand. This can be mitigated with inventory plugins. However, when you have to configure machines that are being spun up dynamically, and that need to be configured quickly, the push model starts to break down.

Luckily, Ansible is highly compatible with automation, and provides a very useful python API for specialized cases.

Port Knocking For Fun And Profit

Port knocking is a novel way of invoking code. It involves listening to the network at a very low level, and listening for attempted connections to a specific sequence of ports. No ports are opened. It has its roots in network security, where it’s used to temporarily open up firewalls. You knock, then you connect, then you knock again to close the door behind you. It’s very cool tech.
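To make the "knock" concrete, here is a minimal sketch of what a knocking client does, written in plain Python with only the standard library. It is an illustration of the idea, not the code of the knock command itself; the function name and the timeout value are my own assumptions.

```python
import socket


def knock(host, ports, timeout=0.5):
    """Try a TCP connection to each port in order and report the outcome.

    The connections are expected to fail, since the ports stay closed.
    The attempt itself is the signal a port-knocking daemon watches for.
    """
    results = []
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            # connect_ex returns an error code instead of raising for
            # errors reported by the underlying connect() call.
            code = sock.connect_ex((host, port))
        finally:
            sock.close()
        results.append((port, code))
    return results
```

Calling knock("127.0.0.1", [9000, 9999]) returns one (port, error code) pair per port, in the order the knocks were sent.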

The standard implementation of port knocking is knockd, included with most major Linux distributions. It’s extremely lightweight, and uses a simple configuration file. It supports some interesting features, such as limiting the number of times a client can invoke the knock sequence, by commenting out lines in a flat file.

User Data In EC2

EC2 has a really cool feature called user data, that allows you to add some information to an instance upon boot. It works with cloud-init (installed on most AMIs) to perform tasks and run scripts when the machine is first booted, or rebooted.

Auto Scaling

EC2 provides a mechanism for spinning up instances based on need (or really any arbitrary event). The AWS documentation gives a detailed overview of how it works. It’s useful for responding to sudden spikes in demand, or contracting your running instances during low-demand periods.

Ansible + Knockd = Centralized, On-Demand Configuration

As mentioned earlier, Ansible provides a fairly robust API for use in your own scripts. Knockd can be used to invoke any shell command. Here’s how I tied the two together.


All of my experimentation was done in EC2, using the Ubuntu 12.04 LTS AMI.

To get the machine running ansible configured, I ran the following commands:

$ sudo apt-get update
$ sudo apt-get install python-dev python-pip knockd
$ sudo pip install ansible

Note: it’s important that you install the python-dev package before you install ansible. This will provide the proper headers so that the C-based SSH library will be compiled, which is faster than the pure-python version installed when the headers are not available.

You’ll notice some information from the knockd package regarding how to enable it. Take note of this for final deployment, but we’ll be running knockd manually during this proof-of-concept exercise.

On the “client” machine, the one who is asking to be configured, you need only install knockd. Again, the service isn’t enabled by default, but the package provides the knock command.

EC2 Setup

We require a few things to be done in the EC2 console for this all to work.

First, I created a keypair for use by the tool. I called it “bootstrap”. I downloaded it onto a freshly set up instance I designated for this purpose.

NOTE: It’s important to set the permissions of the private key correctly. They must be set to 0600.

I then needed to create a special security group. The point of the group is to allow all ports from within the current subnet. This gives us maximum flexibility when assigning port knock sequences.

Here’s what it looks like:

Depending on our circumstances, we might also need to open up UDP traffic (port knocks can be TCP or UDP based, or a combination within a sequence).

For the sake of security, a limited range of a specific type of connection is advised, but since we’re only communicating over our internal subnet, the risk here is minimal.

Note that I’ve also opened SSH traffic to the world. This is not advisable as standard practice, but it’s necessary for me since I do not have a fixed IP address on my connection.

Making It Work

I wrote a simple python script that runs a given playbook against a given IP address:

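"""
Script to run a given playbook against a specific host
"""

import ansible.playbook
from ansible import callbacks
from ansible import utils

import argparse
import os

parser = argparse.ArgumentParser(
    description="Run an ansible playbook against a specific host."
)

# NOTE: the argument names and defaults below are assumed. Only the help
# strings and the use of options.host / options.playbook are original.
parser.add_argument(
    "host",
    help="The IP address or hostname of the machine to run the playbook against."
)
parser.add_argument(
    "-p", "--playbook",
    default="default.yml",
    help="Specify path to a specific playbook to run."
)
parser.add_argument(
    "-c", "--config-file",
    default="./config.ini",
    help="Specify path to a config file. Defaults to %(default)s."
)

def run_playbook(host, playbook, user, key_file):
    """
    Run a given playbook against a specific host, with the given username
    and private key file.
    """
    stats = callbacks.AggregateStats()
    playbook_cb = callbacks.PlaybookCallbacks(verbose=utils.VERBOSITY)
    runner_cb = callbacks.PlaybookRunnerCallbacks(stats, verbose=utils.VERBOSITY)

    # Keyword arguments assumed from the Ansible 1.x PlayBook API.
    pb = ansible.playbook.PlayBook(
        playbook=playbook,
        host_list=[host],
        remote_user=user,
        private_key_file=key_file,
        stats=stats,
        callbacks=playbook_cb,
        runner_callbacks=runner_cb,
    )

    pb.run()

options = parser.parse_args()

playbook = os.path.abspath("./playbooks/%s" % options.playbook)

run_playbook(options.host, playbook, 'ubuntu', "./bootstrap.pem")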

Most of the script is user-interface code, using argparse to bring in configuration options. One unimplemented feature is using an INI file to specify things like the default playbook, pem key, user, etc. These things are just hard coded in the call to run_playbook for this proof-of-concept implementation.

The real heart of the script is the run_playbook function. Given a host (IP or hostname), a path to a playbook file (assumed to be relative to a “playbooks” directory), a user and a private key, it uses the Ansible API to run the playbook.

This function represents the bare-minimum code required to apply a playbook to one or more hosts. It’s surprisingly simple – and I’ve only scratched the surface here of what can be done. With custom callbacks, instead of the ones used by the ansible-playbook runner, we can fine tune how we collect information about each run.

The playbook I used for testing this implementation is very simplistic (see the Ansible playbook documentation for an explanation of the playbook syntax):

- hosts: all
  sudo: yes
  tasks:
  - name: ensure apache is at the latest version
    apt: update_cache=yes pkg=apache2 state=latest
  - name: drop an arbitrary file just so we know something happened
    copy: src=it_ran.txt dest=/tmp/ mode=0777

It just installs and starts apache, does an apt-get update, and drops a file into /tmp to give me a clue that it ran.

Note that the hosts: setting is set to “all” – this means that this playbook will run regardless of the role or class of the machine. This is essential, since, again, the machines are unknown when they invoke this script.

For the sake of simplicity, and to set a necessary environment variable, I wrapped the call to my script in a shell script:

#!/bin/bash
export ANSIBLE_HOST_KEY_CHECKING=False
cd /home/ubuntu
/usr/bin/python /home/ubuntu/run_playbook.py $1 >> $1.log 2>&1

The $ANSIBLE_HOST_KEY_CHECKING environment variable here is necessary, short of futzing with the ssh configuration for the ubuntu user, to tell Ansible to not bother verifying host keys. This is required in this situation because the machines it talks to are unknown to it, since the script will be used to configure newly launched machines. We’re also running the playbook unattended, so there’s no one to say “yes” to accepting a new key.

The script also does some very rudimentary logging of all output from the playbook run – it creates logs for each host that it services, for easy debugging.

Finally, the following configuration in knockd.conf makes it all work:


[options]
        UseSyslog

[ansible]
        sequence    = 9000, 9999
        seq_timeout = 5
        Command     = /home/ubuntu/run.sh %IP%

The first configuration section, [options], is special to knockd – it’s used to configure the server. Here we’re just asking knockd to log messages to the system log (e.g. /var/log/messages).

The [ansible] section sets up the knock sequence for a machine that wants Ansible to configure it. The sequence set here (it can be anything – any port number and any number of ports >= 2) is 9000, 9999. There’s a 5 second timeout – in the event that the client doing the knocking takes longer than 5 seconds to complete the sequence, nothing happens.

Finally, the command to run is specified. The special %IP% variable is replaced when the command is executed by the IP address of the machine that knocked.
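The sequence, timeout, and %IP% substitution described above can be modelled in a few lines of Python. This is a toy model to show the logic, assuming the timeout is measured from the first knock. It is not how knockd is actually implemented, and the class and method names are made up for illustration.

```python
class KnockTracker:
    """Track each client's progress through a knock sequence."""

    def __init__(self, sequence, seq_timeout, command):
        self.sequence = list(sequence)
        self.seq_timeout = seq_timeout
        self.command = command
        # ip -> (index of the next expected port, time of the first knock)
        self._progress = {}

    def observe(self, ip, port, now):
        """Record a knock; return the command to run if the sequence completed."""
        stage, started = self._progress.get(ip, (0, now))
        if stage and now - started > self.seq_timeout:
            stage, started = 0, now  # too slow: start over
        if port != self.sequence[stage]:
            # Wrong port: forget this client, unless the knock is a valid
            # first knock, in which case it starts a fresh attempt.
            self._progress.pop(ip, None)
            if port != self.sequence[0]:
                return None
            stage, started = 0, now
        stage += 1
        if stage == len(self.sequence):
            self._progress.pop(ip, None)
            return self.command.replace("%IP%", ip)
        self._progress[ip] = (stage, started)
        return None
```

With the configuration from this article, a client at the hypothetical address 10.0.0.7 that knocks 9000 and then 9999 within 5 seconds yields the command "/home/ubuntu/run.sh 10.0.0.7". A client that takes 6 seconds gets nothing.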

At this point, we can test the setup by running knockd. We can use the -vD options to output lots of useful information.

We just need to then do the knocking from a machine that’s been provisioned with the bootstrap keypair.

Here’s what it looks like (these are all Ubuntu 12.04 LTS instances):

On the “server” machine, the one with the ansible script:

$  sudo knockd -vD
config: new section: 'options'
config: usesyslog
config: new section: 'ansible'
config: ansible: sequence: 9000:tcp,9999:tcp
config: ansible: seq_timeout: 5
config: ansible: start_command: /home/ubuntu/run.sh %IP%
ethernet interface detected
Local IP:
listening on eth0...

On the “client” machine, the one that wants to be provisioned:

$ knock 9000 9999

Back on the server machine, we’ll see some output upon successful knock:

2014-03-23 10:32:02: tcp: -> 74 bytes ansible: Stage 1
2014-03-23 10:32:02: tcp: -> 74 bytes ansible: Stage 2 ansible: OPEN SESAME
ansible: running command: /home/ubuntu/run.sh


Making It Automatic With User Data

Now that we have a way to configure machines on demand – the knock could happen at any time, from a cron job, executed via a distributed SSH client (like fabric), etc – we can use the user data feature of EC2 with cloud-init to do the knock at boot, and every reboot.

Here is the user data that I used, which is technically cloud config code (more examples here):

#cloud-config
packages:
 - knockd

runcmd:
 - knock 9000 9999

User data can be edited at any time as long as an EC2 instance is in the “stopped” state. When launching a new instance, the field is hidden in Step 3, under “Advanced Details”:

Once this is established, you can use the “launch more like this” feature of the AWS console to replicate the user data.

This is also a prime use case for writing your own provisioning scripts (using something like boto) or using something a bit higher level, like CloudFormation.
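If you go the scripting route, the user data is just a string, so it can be generated. The sketch below builds a cloud-config document like the one shown earlier. The function name is hypothetical, and so is the server address in the usage note. The knock command takes the host to knock on as its first argument, so the sketch passes one explicitly. The AWS call itself is left out, since the resulting string would simply be handed to whatever launch API you use.

```python
def build_user_data(server, ports, packages=("knockd",)):
    """Return a cloud-config document that installs packages, then knocks."""
    lines = ["#cloud-config", "packages:"]
    lines += [" - %s" % pkg for pkg in packages]
    # knock takes the target host first, followed by the port sequence.
    knock = "knock %s %s" % (server, " ".join(str(p) for p in ports))
    lines += ["", "runcmd:", " - %s" % knock]
    return "\n".join(lines) + "\n"
```

For example, build_user_data("10.0.0.5", [9000, 9999]) produces a document whose runcmd entry is "knock 10.0.0.5 9000 9999", where 10.0.0.5 stands in for the private address of the machine running knockd.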

Auto Scaling And User Data

Auto Scaling is controlled via “auto scaling groups” and “launch configurations”. If you’re not familiar with them, these can sound like foreign concepts, but they’re quite simple.

Auto Scaling Groups define how many instances will be maintained, and set up the events to scale up or down the number of instances in the group.

Launch Configurations are nearly identical to the basic settings used when launching an EC2 instance, including user data. In fact, user data is entered in on Step 3 of the process, in the “Advanced Details” section, just like when spinning up a new EC2 instance.

In this way, we can automatically configure machines that come up via auto scaling.

Conclusions And Next Steps

This proof of concept presents an exciting opportunity for people who use Ansible and have use cases that benefit from a “pull” model – without really changing anything about their setup.

Here are a few miscellaneous notes, and some things to consider:

  • There are many implementations of port knocking, beyond knockd. There is a huge amount of information available to dig into the concept itself, and its various implementations.
  • The way the script is implemented, it’s possible to have different knock sequences execute different playbooks. A “poor-man’s” method of differentiating hosts.
  • The Ansible script could be coupled with the AWS API to get more information about the particular host it’s servicing. Imagine using a tag to set the “class” or “role” of the machine. The API could be used to look up that information about the host, and apply playbooks accordingly. This could also be done with variables – the values that are “punched in” when a playbook is run. This means one source of truth for configuration – just add the relevant bits to the right tags, and it just works.
  • I tested this approach with an auto scaling group, but I’ve used a trivial playbook and only launched 10 machines at a time – it would be a good idea to test this approach with hundreds of machines and more complex plays – my “free tier” t1.micro instance handled this “stampeding herd” without a blink, but it’s unclear how this really scales. If anyone gives this a try, please let me know how it went.
  • Custom callbacks could be used to enhance the script to send notifications when machines were launched, as well as more detailed logging.
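As a rough sketch of the tag idea from the notes above: once the tags for a host have been looked up (however that is done), choosing a playbook is a dictionary lookup. Everything here is hypothetical, including the role names, the file names, and the assumption that tags arrive as a plain dict.

```python
import os

# Hypothetical mapping from a "role" tag value to a playbook file name.
ROLE_PLAYBOOKS = {
    "web": "web.yml",
    "db": "db.yml",
}


def playbook_for(tags, playbook_dir="./playbooks", default="default.yml"):
    """Pick a playbook path from a host's tags, falling back to a default."""
    role = tags.get("role")
    name = ROLE_PLAYBOOKS.get(role, default)
    return os.path.join(playbook_dir, name)
```

A host tagged role=web would get web.yml, and an untagged or unknown host falls back to the default playbook, which keeps the behaviour of the script in this article for machines nobody has classified yet.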

Caktus GroupCaktus Attends YTH Live

Last week Tobias and I had a great time attending our first Youth+Tech+Health Live conference. I went to present along with our partners Sara LeGrand and Emily Pike from Duke and UNC respectively on our NIH/SBIR funded game focused on encouraging HIV medication adherence. The panel we spoke on "Stick to it: Tech for Medical Adherence + Health Interventions" was rounded out by Dano Beck from the Oregon Health Authority speaking about how they have used SMS message reminders successfully to increase HIV medication adherence in Oregon.

We had a great response to our talk. It’s not often that you get a chance to talk to other teams around North America focused on creating games to improve health outcomes. We learned about other teams making health related educational games and lots of programs doing mobile support for youth through SMS messaging help lines. It was clear from the schedule of talks and conversations that happened around the event space, that everyone was learning a lot and getting a unique chance to share about their projects.

Caktus GroupCaktus is going to Montréal for PyCon 2014!

Caktus is happy to once again be sponsoring and attending PyCon in Montreal this year. Year after year, we look forward to this conference and we are always impressed with the quality of the speakers that the conference draws. The team consistently walks away from the talks, open spaces, and sprints with new ideas that they are excited to implement here at Caktus and in their personal projects.

A favorite part of PyCon for us is meeting and catching up with people, so we are excited to premiere Duckling at PyCon, an app that will help people organize outings and connect at the conference. Come by our booth, #700, on Thursday night at the opening reception to say “hi” and chat with our team!

Caktus GroupCaktus has a new website!

It’s been a few years since we last updated our website, and we gave it a whole new look!

With the new site, it’s easy to see just what services we offer, and our processes for bringing our clients’ ideas to life. The new layout allows for more in-depth reviews of our projects, and also highlights our talented and growing team. We also wanted to share more information on our commitment to the open source community and social good. And the updated structure makes finding out about events and reading our ever-popular blog posts simple.

The new design utilizes a responsive grid structure and a refined typographic sensibility.

Wrap this all up with our new branding—adding a bold blue to our green/grey—and you get a polished and informative new site that reflects what Caktus does best.

We hope you find the new site more intuitive, user-friendly and as easy-on-the-eyes as we do!

Caktus GroupNew for PyCon: App for Group Outings + Giant Duck

For PyCon 2014, we’ve been working for the past few months on Duckling, an app to make it easier to find and join casual group outings. Our favorite part of PyCon is meeting up with fellow Pythonistas, but without someone rounding everyone up and sorting the logistics, we’ve found it difficult to figure out who’s going where and when. Our answer to this age-old conference conundrum is Duckling.

Duckling, made for conferences, lets you find, join, and create outings to local restaurants, bars, or other entertainment venues. You can see who’s going, when, and where they’re meeting up to leave. It’s hooked up to Yelp, so you can look at reviews before heading out. And of course, it is written in Django too! 

Lots of options to keep up to date on all outings at PyCon: follow @ducklingquacks, find our Duckling sign in the lobby, or come visit our booth, #700, to see a map that shows where everyone is headed. Also, our booth will have a giant duck. Why? Because look how happy it makes people!

Caktus GroupCaktus Implementing New Policy Modeled on Foreign Corrupt Practices Act

Caktus’ successful growth in mobile messaging applications overseas has been a wonderful point of pride for us. The work we’re doing internationally has real impact on burgeoning democracies, HIV/AIDS patients, and others where technology makes all the difference.

As we continue this work—work we truly believe in—we think it’s important to step back and state our values. We are building a new policy modeled on the U.S. Foreign Corrupt Practices Act (FCPA). The FCPA was originally implemented to address bribery and other corrupt behavior. Our new policy will re-state our company philosophy: we will not give or receive bribes in any form, including gifts. We will adhere to all laws and regulations.

Creating a policy like this was obvious in many ways. Though we’d always operated with the assumption that we’d be transparent and above board (because why else be an open source firm if we didn’t believe in that?), we’re glad to be taking this official step.

Caktus GroupCaktus Presenting HIV mHealth Gamification App for Adherence in San Francisco

We’re pleased to announce that Caktus will be presenting Epic Allies, an mHealth Gamification App to improve ART drug adherence for HIV patients, at this year’s YTH Live in San Francisco.

Alex Lemann, Co-founder and Chief Business Development Officer, will be speaking on the panel, “Stick to It: Tech for Medical Adherence + Health Interventions,” along with our research partners from Duke University and UNC-Chapel Hill.

Our application, still in development, has a strong gamification aspect due to focus group feedback. We’ve been diligently working on building a complete new world within Epic Allies for our users-- complete with a power mad artificial intelligence and all sorts of monsters to defeat. It’s exciting to share all the work we’ve done!

For more information about the conference, visit YTH.

Caktus GroupCaktus Just Bought a Building in Downtown Durham!

After sustained growth that has us packed in six suites, we have spent the past year and a half seeking new space. We’ve found it! I’m pleased to announce that Caktus has bought a historic 1910 building with nearly 11,000 square feet of space in Downtown Durham. We'll be right near five points at 108 Morris St. The new building will be completely renovated from top to bottom to create an open workspace that’ll make it even easier for us to collaborate and share ideas.

The renovations include plans to welcome the open source community and encourage continued small business growth in Downtown Durham. The first floor will contain a retail space up for lease plus a community meeting area for local tech events. The meeting area will have a small kitchen because what’s an event without snacks?

We’re sad to leave all the great restaurants and shops within walking distance from our Carrboro offices, but we’ll also have new opportunities to explore the amenities near the new office. There are bakeries, burger joints, food trucks, and more. Our exercise will be the short walk to and from the local eateries.

We’ll keep everyone posted on construction updates. There is much to be done. The building has been, in its 100+ year history, a nightclub (that our very own project manager, Ben, has played his saxophone at), a funeral home, a bowling alley, and a furniture store. Currently, the construction crew is removing the giant bar and amoeba-like mirrors that line the entire first floor. We’ll let you know if we find anything interesting hidden in the walls.

We look forward to our move to downtown Durham!

Caktus GroupCongrats to PearlHacks Winners (Including Our Intern, Annie)!

Caleb Smith, Caktus developer, awarding the third place prize to TheRightFit creators Bipasa Chattopadhyay, Ping Fu, and Sarah Andrabi.

Many congratulations to the PearlHacks third place winners who won Sphero Balls! The team from UNC’s Computer Science department created TheRightFit, an Android app that helps shoppers know what sizes will fit them and their families among various brands. Their prize of Sphero Balls, programmable balls that can interact and play games via smart phones, was presented by Caktus developer and Pearl Hacks mentor Caleb Smith as part of our sponsorship. PearlHacks, held at UNC-Chapel Hill, is a conference designed to encourage female high school and college programmers from the NC and VA area.

Also, we’re deeply proud of our intern, Annie Daniel (UNC School of Journalism), who was part of the first place team for their application, The Culture of Yes. Excellent job, Annie!

This is what Annie had to say about her team's first place project:

The Culture of Yes was a web app that's meant to broaden the conversation on sexual assault on college campuses. We chose a flagship university from each of the 50 states, created a json file that summarized that university (population, male/female ratio, % in greek life, etc) and wrote scrapers to pull stories on sexual assault from each of the universities' student newspapers. We also allow for user generated content where survivors of sexual assault are able to share their stories and experiences anonymously (or not) and present the similarities and differences of assault experiences across the U.S. It's a Django-based web app built on a bootstrap CSS framework.

Basically we wanted to focus more on the story than quantitative data. We present the university newspaper stories side-by-side with survivor experiences to show how the conversation compares to the experience of sexual assault and how the crime is treated across campuses.

Caktus GroupCaktus is sponsoring Pearl Hacks

Caktus is excited to further encourage young female programmers through our support of PearlHacks, a two-day hackathon and conference hosted by the University of North Carolina - Chapel Hill. This weekend, March 22-23rd, over 200 young women from local high schools and universities in Virginia and North Carolina will arrive for the conference.

We wanted to be more than general corporate sponsors and worked with organizers to find a way to directly engage with the students. Our participation includes being a judge during the competition, and one of our staff, Caleb Smith, will teach a workshop, Introduction to Python. We also hope to build excitement for the hackathon by providing the third place team prize, Sphero balls, an open source robotic ball that the students can program games with.

Over the course of the hackathon, attendees will be able to attend workshops about different areas of development to improve their skills. After these workshops, attendees will create teams and put what they learned in the workshops to use. There will be an all-night hackathon followed by a judging session where the attendees will demo their projects.

It is very exciting that an event like this is being held in the area and we’re so pleased to continue supporting young programmers, especially women. Our other recent efforts include the donation of tickets to PyCon 2014 to PyLadies financial aid and hosting a Python course for Girl Develop It! at our offices.

Caktus GroupCaktus Completes RapidSMS Community Coordinator Development for UNICEF

Colin Copeland, Managing Member at Caktus, has wrapped up work, supported by UNICEF, as the Community Coordinator for the open source RapidSMS project. RapidSMS is a text messaging application development library built on top of the Django web framework. It creates a SMS provider agnostic way of sending and receiving text messages. RapidSMS has been used widely in the mobile health field, in particular in areas where internet access cannot be taken for granted and cell phones are the best communication tool available. This has included projects initiated by UNICEF country offices in Ethiopia, Madagascar, Malawi, Rwanda, Uganda, Zambia, and Zimbabwe.

Modeling RapidSMS on the Django Community

The overall goals set forth by UNICEF for this project included improving community involvement by making RapidSMS easier to use and contribute to. Colin accomplished this by using Django's large and active community as a model. The community employs common standards to maintain consistency across developers. Using this best practice for the RapidSMS developers meant easier communication, collaboration, and work transfer.

Colin shepherded six releases of the RapidSMS framework over his year-long stint, including 948 code commits to the repository. Colin broadened engagement in the RapidSMS community by tapping five others at Caktus to contribute code, including Dan, Alex, Rebecca, Tobias and Caleb. Evan Wheeler, a founder of RapidSMS, oversaw Caktus’ work on the project and offered an outside perspective. Evan acted as a liaison between Caktus and UNICEF by coordinating our work with Erica Kochi, co-lead of UNICEF’s Innovation Unit.

The major releases to the framework included releases 0.10 through 0.15 and included major updates both on the frontend and backend of RapidSMS.

  • 0.10 Pluggable Routers — This opened the door for different router algorithms for different use cases and removed support for the legacy and difficult to debug threaded router. For example, texts can be handled within the request cycle by using the blocking router or pushed off to a queue (which requires extra dependencies) for handling later. This settled a long standing debate on the mailing list, by letting users make their own decisions and having RapidSMS support different router options out of the box.
  • 0.11 Continuous Iteration — This release focused on testing & continuous iteration with the inclusion of a new RapidTest base class, PEP8 related changes, and monitoring test coverage using the TravisCI continuous integration tool.
  • 0.12 Interface Redesign — Caktus developers updated the default RapidSMS interface to now use Twitter Bootstrap. This included reviewing all of the current contrib apps and deprecating the ones that were no longer necessary.
  • 0.13 Bulk Messaging — This included adding an API for sending messages to many phones at once. As part of this change, the new database router was added which keeps track of which messages in a bulk message group have been sent and resends messages if there is an error.
  • 0.14 Production Hosting Documentation — This change includes best practices for hosting a production instance of RapidSMS. It is agnostic as to which cloud provider is chosen, or what provisioning tool is used, but encourages the use of a tool supported by the Django community to automate creating new servers and pushing out code changes.
  • 0.15 Tutorials & Contributing Documentation — The new documentation released in 0.15 was aimed at helping new users get up to speed quickly by following along with the tutorials. The Django tutorials were a strong influence in the tutorials developed for RapidSMS. The final push was to update the documentation to make it clear how to contribute back to the RapidSMS development community so that development work is not duplicated across RapidSMS implementations.

A few of the themes of Colin’s tenure as Community Coordinator of the RapidSMS project were code consistency via PEP8, a focus on automated testing, test coverage monitoring, continuous integration, and improving documentation. There were also some important new features like the Bootstrap facelift & bulk messaging (supported by the router refactor) which will make it easier to write new apps and interact with RapidSMS as an administrator on the web. Colin pushed the community to embody the spirit of the Django and Python communities in RapidSMS through detailed documentation and testing. For more details about the changes, you can see the Release Notes documentation or the commits themselves.

Enabling Sharing of Development and Field Work

Part way through the development tasks for the RapidSMS project, it was brought up on the RapidSMS community mailing list that rapidsms.org was being reported as a source of malware by Google. This motivated an already present need to rebuild RapidSMS.org as a sharing platform for both types of the RapidSMS framework users—software engineers & mobile health project coordinators.

The software engineers need a place to share their reusable RapidSMS packages on the site. This is a repository of reusable code so that new community members can build their packages using existing code. These packages include apps for appointment reminders and SMS based polling. Beyond a shopping ground for reusable code, the package repository also is a place for new developers to go to see the work of others so that they can get a sense of how to structure their own projects.

Project implementers on the other hand want a high level review of real life projects where RapidSMS has been used and what the outcomes of the projects were. That is, if they are evaluating frameworks to be used by a new SMS based product, they can look at the successes that have been attributed to RapidSMS based projects.

Taking into account the needs of both software engineers and project implementors, Caktus redesigned RapidSMS.org with leadership from Colin and Evan Wheeler. The work was done by Caktus team members including design by Julia and backend development by Rebecca, Victor, David, and Caleb. The website is also open source and welcomes all contributions from new bug tickets to pull requests.

Final Thoughts

Overall, Colin provided leadership to RapidSMS and pushed the development standards higher and more in line with its parent projects, Django and Python. The Caktus team, with input from Evan Wheeler and all of the RapidSMS community, rebuilt parts of RapidSMS, from deep in the core of how messages are handled to the look of the external website used by administrative staff. Colin’s leadership laid the groundwork for the next phases of RapidSMS’s codebase and the community surrounding the project.

Tim HopperShould I Do a Ph.D.? Oscar Boykin

Continuing my series of interviews on whether or not a college senior in a technical field should consider a Ph.D., I interviewed Oscar Boykin, a data scientist at Twitter. Prior to joining Twitter, Oscar was a professor of computer engineering at the University of Florida.

A 22-year old college student has been accepted to a Ph.D. program in a technical field. He's academically talented, and he's always enjoyed school and his subject matter. His acceptance is accompanied with 5-years of guaranteed funding. He doesn't have any job offers but suspects he could get a decent job as a software developer. He's not sure what to do. What advice would you give him, or what questions might you suggest he ask himself as he goes about making the decision?

Oscar: The number one question: does he or she have a burning desire to do a PhD? If so, and funding is available without taking loans, then absolutely! Go for it! It is a wonderful experience to try to, at least in one small area, touch the absolute frontier of human knowledge. If you greatly enjoy the area of study, and you find it is the kind of thing you love thinking about day in and day out, if you imagine yourself as some kind of ascetic scholar of old, toiling to make the most minor advance simply for the joy of the work, then by all means.

The second reason to do a Ph.D., and this one hardly needs discussing, is that you want a career that requires it. If you want to teach at a university or do certain scientific occupations, a Ph.D may be required.

If the student has doubts about any of the above, I recommend a master's degree and an industry job. There are a few reasons: 1) If you lack the passion, the risk of not completing and the time investment do not equal the cost. 2) There is little, if any, direct financial benefit to having a Ph.D. and the cost in time is substantial. 3) Professions are changing very fast now and you should expect a lifetime of learning in any case, so without a strong desire for a PhD, why not do that learning in the environment where it is most relevant?

He decides he wants to do the Ph.D. but his timing is negotiable. Would you recommend he jump straight in or take some time off?

Oscar: Doing an internship or one year of work at a relevant company will often give students much better insight into choosing research topics. Choosing a great topic and a great advisor is the entire name of the game when it comes to the Ph.D. BUT, taking time off makes it very easy to get out of the habit of studying and learning, so if commitment is a concern, there is substantial risk that time off will turn into never coming back.

Are there skills developed while earning a Ph.D. that are particularly valuable to being a practicing software engineer? Are there ways in which a non-Ph.D. can work to build similar skills?

Oscar: You can't take too much math. Learning linear algebra, probability theory, information theory, Markov chains, differential equations, and how to do proofs, have all been very valuable to me, and very few, if any, of my colleagues without PhDs have these skills. I have seen on the order of 3-10 people in the hundreds to ~thousand that I've interacted with, that picked these up on their own, so doing so is clearly possible. It is hard to say if getting a PhD helps learn those things: perhaps the same people who learned them with a PhD would have done so without. A safe bet seems to be structured education to pick up such classical mathematics.

In your experience, are there potential liabilities that come with getting a Ph.D.? Do doctoral students learn habits that have to be reversed for them to become successful in industry?

Oscar: One problem with the industry/academia divide is that each has a caricatured picture of the other. Academics fear entering industry means becoming a "code monkey" and often disparage strong coding as a skill. I think this is to the detriment of academia as coding is perhaps as powerful a tool as mathematics. Yet, many academics muddle through coding so much that the assumption by teams hiring academics is they will have to unlearn a lot of bad habits if they join somewhere like Twitter, Facebook, Google, etc… This assumption often means that hiring committees are a bit skeptical of an academic's ability to actually be productive. This skepticism must either be countered with strong coding in an interview or some existing coding project that gives evidence of skill.

You didn't really ask much about the career path of academia vs industry. I did want to address that a bit. First, those paths are much more similar than most people realize. As a professor, on average, your colleagues are brighter, and that is exciting. But academia today is very focused on fundraising, and that fundraising involves a lot of politics and sales-personship. In the software industry today, one has a lot of perks: great salary, lots of (even unlimited) vacation time, the freedom to focus on the things you enjoy the most (compare to being a professor and doing 3-4 different jobs concurrently). As a professor, you are running a startup that can never be profitable: you are always raising money and hiring. The caliber of the very best in industry is also just as high or higher than academia (though the mean may be lower). I much prefer my job at Twitter to my time in academia.

Oscar Boykin is a data scientist at Twitter. He earned a Ph.D. in Physics at UCLA. You can find his scholarly work on Google Scholar. You can find him on Twitter and Github.

Joe GregorioIPython and Curves

I've been impressed with IPython so far. It can be a little fiddly to install, but given the power I'm not going to complain. I'm now working on graphics and, in trying to get up to speed, going back and learning, or relearning, some basics. Today was Bézier curves, and thus this IPython notebook. Note that the content there isn't actually educational; you should follow the links provided to really learn about these constructs. I just wanted an excuse to try out LaTeX and plot some interesting graphs.

Tim HopperRosetta Stone for Stochastic Dynamic Programming

In grad school, I worked on what operations researchers call approximate dynamic programming. This field is virtually identical to what computer scientists call reinforcement learning. It's also been called neuro-dynamic programming. All these things are an extension of stochastic dynamic programming, which is usually introduced through Markov decision processes.

Because these overlapping fields have been developed in different disciplines, mathematical notation for them can vary dramatically. After much confusion and frustration, I created this Rosetta Stone to help me keep everything straight. I hope others will find it useful as well. A PDF and the original LaTeX code are available on my Github.

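| | Bertsekas | Sutton and Barto | Puterman | Powell |
|---|---|---|---|---|
| Stages | $k$ | $t$ | $t$ | $t$ |
| First Stage | $N$ | $1$ | $1$ | $1$ |
| Final Stage | $0$ | $T$ | $N$ | $T$ |
| State Space | … | … | $S$ | $\mathcal{S}$ |
| State | $i$, $i_{…}$ | $s$ | $s$ | $s$, $S_{t}$ |
| Action Space | $U(i)$ | … | $A=\cup_{s\in S}A_{s}$ | … |
| Action | $u$ | $a$ | $a$ | $a$ |
| Policy | $\mu_{…}$ | $\pi(s,a)$, $\pi$ | $\pi$, $d_{…}$ | $\pi$ |
| Transitions | $p_{ij}(u)$ | $\mathcal{P}^{a}_{ss'}$ | $p_{t}(j \mid s_{t},a)$ | $\mathbb{P}(s' \mid S_{t},a_{t})$ |
| Cost | $g(i,u,j)$ | $\mathcal{R}^{a}_{ss'}$ | $r_t(s,a)$ | $C_{t}(S_{t},a_{t})$ |
| Terminal Cost | $G(i_{…})$ | $r_{…}$ | $r_{…}$ | $V_{…}$ |
| Discount | $\alpha$ | $\gamma$ | $\lambda$ | $\gamma$ |
| Q-Value (Policy) | $J_{…}$ | … | … | … |
| Q-Value (Optimal) | … | … | … | … |
| Value (Policy) | $J_{…}$ | $V^{…}$ | $u_{…}$ | $V_{…}$ |
| Value (Optimal) | $J_{k}^{*}$ | $V^{*}$ | $u_{t}^{*}$ | $V_{t}$ |
| Bellman Operator | $T$ | … | … | … |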

Optimal Value Function

  • Bertsekas

$$J_{k}^{*}(i)=\min_{u\in U(i)}\sum_{j=1}^{n}p_{ij}(u)\left(g(i,u,j)+\alpha J^{*}_{k-1}(j)\right)$$

  • Sutton and Barto

$$V^{*}(s)=\max_{a}\sum_{s'}\mathcal{P}^{a}_{ss'}\left[\mathcal{R}^{a}_{ss'}+\gamma V^{*}(s')\right]$$

  • Puterman

$$u_{t}^{*}(s_{t})=\max_{a\in A_{s_{t}}}\left\{r_{t}(s_{t},a)+\sum_{j\in S}p_{t}(j\,|\,s_{t},a)u_{t+1}^{*}(j)\right\}$$

  • Powell

$$V_{t}(S_{t})=\max_{a_{t}}\left\{C_{t}(S_{t},a_{t})+\gamma\sum_{s'\in \mathcal{S}}\mathbb{P}(s'\,\vert\,S_{t},a_{t})V_{t+1}(s')\right\}$$
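All four recursions describe the same backward sweep over stages. As a rough illustration only, here is a minimal Python sketch of finite-horizon backward induction written in Powell's reward-maximizing form. The two-state, two-action problem, the arrays `P` and `C`, the zero terminal value, and the function name are all made up for this example; none of them come from the books above.

```python
# Backward induction for a tiny finite-horizon MDP (illustrative values only).
# V_t(s) = max_a { C(s, a) + gamma * sum_{s'} P(s' | s, a) * V_{t+1}(s') }

def backward_induction(P, C, gamma, T):
    """Return value functions V[0..T] and greedy decisions policy[0..T-1].

    P[a][s][s2] is the probability of moving from s to s2 under action a.
    C[s][a] is the one-period contribution for taking action a in state s.
    The terminal value V[T] is assumed to be zero in this sketch.
    """
    n_states = len(C)
    n_actions = len(C[0])
    V = [[0.0] * n_states for _ in range(T + 1)]
    policy = [[0] * n_states for _ in range(T)]
    for t in range(T - 1, -1, -1):  # sweep backward from the last decision stage
        for s in range(n_states):
            best_value, best_action = None, None
            for a in range(n_actions):
                expected = sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states))
                q = C[s][a] + gamma * expected
                if best_value is None or q > best_value:
                    best_value, best_action = q, a
            V[t][s] = best_value
            policy[t][s] = best_action
    return V, policy


# Hypothetical problem: action 0 stays in place, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [1.0, 0.0]],  # action 1
]
C = [
    [0.0, 1.0],  # state 0: staying earns 0, switching earns 1
    [2.0, 0.0],  # state 1: staying earns 2, switching earns 0
]
V, policy = backward_induction(P, C, gamma=1.0, T=2)
print(V[0])       # [3.0, 4.0]
print(policy[0])  # [1, 0]
```

Switching `max` to `min` (and reading `C` as a cost) would give the Bertsekas convention; the structure of the sweep is otherwise the same.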


  • D.P. Bertsekas. Dynamic Programming and Optimal Control. Number v. 2 in Athena Scientific Optimization and Computation Series. Athena Scientific, 2007. ISBN 9781886529304.
  • W.B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics. John Wiley & Sons, 2011. ISBN 9781118029152.
  • M.L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley Series in Probability and Statistics. Wiley-Interscience, 1994. ISBN 9780471727828.
  • R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, 1998. ISBN 9780262193986.

Tim HopperShould I Do a Ph.D.? Mike Nute

A day late, but never a dollar short, the next contributor to my series on whether or not a college senior should consider getting a Ph.D. is Mike Nute, a current Ph.D. student. Mike scoffed at my banal interview format and instead wrote an open letter to a hypothetical student.

Dear Young Student,

First, as far as whether you should do the Ph.D. program, you should think long and hard about it because there is a lot of upside and a lot of downside. Here are some things in particular you should think about:

1) Going to a Ph.D. program straight from undergrad and continuing essentially the same field of study is a lot like marrying your high school sweetheart. Certainly for some people that works out ok, but for many others it ends in a terrible divorce. In the case of the Ph.D. program, you should bear in mind that the academic world and the real world are two very different places, and that up until this point you have only seen the academic world, so you might very well enjoy the real world just as much. Bear in mind also that the professors and advisors you have dealt with up to this point tend to be people who have thrived in the academic world and their advice will come from that lens. So do your best to get at least some exposure to the variety of jobs out there where you could apply the discipline. You should especially do this if the reason that you think you'd like to get a Ph.D. is to enter academia afterward. Take a little Rumspringa from that life before you enter the order.

2) If you find yourself thinking about the Ph.D. program as a means to an end, like as a requirement for some job, then you should strongly consider whether there is an easier way. More specifically, if you don't think that getting the Ph.D. is going to be fun on its own, then there's a strong chance you'll be miserable and it will end badly. If you want to be a professor, then getting a Ph.D. is really the only way to go about it, but for virtually any other goal there is probably an alternative that doesn't require the same sacrifice.

3) The way you should be thinking about the program is like this: it's a chance to spend five years doing nothing but studying your favorite subject in great depth while keeping adequate health insurance and avoiding debt. You really won't have either the time or the money to do very much else, so you had better really love this subject. You remember that episode of the Simpsons where the devil's ironic punishment for Homer is to force feed him an endless stream of donuts, but gets frustrated because Homer never stops loving the donuts? Well you have to love your subject like Homer loves donuts, because that's going to be you. If you do love it though, you won't even notice being broke or studying all the time. 

4) In fact, you should come to grips right now with the fact that you may finish your Ph.D. and find that you want to change careers and never revisit that subject again. That may sound unimaginable, but it's possible mainly because you're 22 and who the hell knows what you'll want when you're 28. If that happens though, will you look back on having gotten a Ph.D. as a terrible decision and a waste of time? If so, then don't do it. If you think you'll be proud of it and will have enjoyed yourself no matter what, then it's actually a low risk move because that's basically the worst case scenario.

5) You should also note that there is a major difference between a Masters and Ph.D. program. First of all, the Ph.D. will be much more intense, even in the first two years, than a Masters. Since most Masters programs are not funded but Ph.D.s are, you can think of it as the difference between being an employee and being a customer. But on the other hand, most industry jobs are as happy to have you with a Masters as with a Ph.D., so you can easily use the extra years of work to pay off the loans from the Masters program. This reinforces the last point above: the only real reason to do a Ph.D. program is for love of the subject. 

In my case, I came back to grad school after seven years in industry. From my experience you can take two lessons: 1) try to avoid waiting seven years to go back if you can, and 2) if you do wait that long, just know that it's never too late. The longer you wait, the harder the sacrifice of time and money will be. But you gotta do what's going to make you happy, and there are a lot of ways to be happy with a job. You can literally enjoy being at work every day, you can do something that benefits others, you can do something very challenging, or you can do something that enables you to enjoy other parts of your life, like gives you schedule flexibility or a sufficiently high salary. There are others too. In my case, I had a very tiny bit of all of those but not enough of any one to really count, which is why it took me so long to leave. Grad school though is challenging, and it makes me proud as hell to tell people about because I know how hard it is. So think about which of those is the most important to you, and plan accordingly. 

So anyway, good luck young man or lady, and don't stress about it too hard; you can always change your mind later.


Mike Nute

Mike Nute is a recovering actuary and a current Ph.D. student in statistics at the University of Illinois at Urbana-Champaign.

Caktus GroupManaging Events with Explicit Time Zones

Recently we wanted a way to let users create real-life events which could occur in any time zone that the user desired. By default, Django interprets any date/time that the user enters as being in the user’s time zone, but it never displays that time zone, and it converts the time to UTC before storing it, so there is no record of what time zone the user initially chose. This is fine for most purposes, but if you want to specifically give the user the ability to choose different time zones for different events, this won’t work.

One idea I had was to create a custom model field type. It would store both the date/time (in UTC) and the preferred time zone in the database, and provide a form field and some kind of compound widget to let the user set and see the date/time with its proper time zone.

We ended up with a simpler solution. It hinged on considering the time zone separately from a time. In our case, we would set a time zone for an event. Any date/time fields in that event form would then be interpreted to be in that time zone.

Now, displaying a time in any time zone you want isn't too hard, and we weren't worried about that. More troublesome was letting a user enter an arbitrary time zone in one form field, and some date and time in other fields, and interpreting that date and time using the chosen time zone when the form was validated. Normally, Django parses a date/time form field using the user's time zone and gives you back a UTC value, so all time zone information is lost.
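To illustrate the core idea outside of Django, here is a small sketch using only the Python standard library. It treats a naive date/time as wall-clock time in a chosen zone and converts it to UTC for storage. The function name and the fixed UTC-5 offset (standing in for a real named time zone) are assumptions made for the example; with pytz zones you would call the zone's `localize()` method rather than `replace()`.

```python
from datetime import datetime, timedelta, timezone


def interpret_in_zone(naive_dt, tz):
    """Treat a naive wall-clock datetime as being in tz, then convert to UTC."""
    aware = naive_dt.replace(tzinfo=tz)
    return aware.astimezone(timezone.utc)


# Hypothetical event zone: a fixed UTC-5 offset standing in for a named zone.
event_tz = timezone(timedelta(hours=-5), "EST")
entered = datetime(2014, 1, 15, 9, 30)  # what the user typed into the form
stored = interpret_in_zone(entered, event_tz)
print(stored.isoformat())  # 2014-01-15T14:30:00+00:00
```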

We started by defining a custom form to validate entry of time zone names:

from django import forms

class TimeZoneForm(forms.Form):
    """Form just to validate the event timezone field"""
    event_time_zone = forms.ChoiceField(choices=TIME_ZONE_CHOICES)

Then in our view, we processed the submitted form in two steps. First, we got the time zone the user entered.

from django.utils import timezone

def view(request):

    if request.method == 'POST':
        tz_form = TimeZoneForm(request.POST)
        if tz_form.is_valid():
            tz = tz_form.cleaned_data['event_time_zone']

Then, before handling the complete form, we activated that time zone in Django, so the complete form would be processed in the context of that event's time zone:

from django.utils import timezone

def view(request):

    if request.method == 'POST':
        tz_form = TimeZoneForm(request.POST)
        if tz_form.is_valid():
            tz = tz_form.cleaned_data['event_time_zone']
            # Make the event's time zone the current one for this request
            timezone.activate(tz)
            # Process the full form now

When displaying the initial form, we activate the event's time zone before constructing the form, so those times are displayed using the event's time zone:

    # assuming we have an event object already
    timezone.activate(event.event_time_zone)
    # Continue to create form for display on the web page


What we just showed is a simplification of our actual solution, because we were using the Django admin to add and edit events, not custom forms. Here's how we customized the admin.

First, we wanted to display event times in a column in the admin change list, in their proper time zones. That kind of thing is pretty easy in the admin:

from pytz import timezone as pytz_timezone

class EventAdmin(admin.ModelAdmin):
    list_display = [..., 'event_datetime_in_timezone', ...]

    def event_datetime_in_timezone(self, event):
        """Display each event time on the changelist in its own timezone"""
        fmt = '%Y-%m-%d %H:%M:%S %Z'
        dt = event.event_datetime.astimezone(pytz_timezone(event.event_time_zone))
        return dt.strftime(fmt)
    event_datetime_in_timezone.short_description = _('Event time')

This uses pytz to convert the event's time into the event's time zone, then strftime to format it the way we wanted it, including the timezone.
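The same convert-then-format step can be tried without pytz or the admin. The sketch below uses only the standard library; the function name and the fixed-offset zone are assumptions for illustration, not part of our project.

```python
from datetime import datetime, timedelta, timezone


def format_in_zone(utc_dt, tz, fmt="%Y-%m-%d %H:%M:%S %Z"):
    """Convert an aware UTC datetime to tz and format it with the zone name."""
    return utc_dt.astimezone(tz).strftime(fmt)


# Hypothetical fixed-offset zone standing in for the event's named time zone.
event_tz = timezone(timedelta(hours=-5), "EST")
stored = datetime(2014, 1, 15, 14, 30, tzinfo=timezone.utc)
label = format_in_zone(stored, event_tz)
print(label)  # 2014-01-15 09:30:00 EST
```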

Next, when adding a new event, we wanted to interpret the times in the event's time zone. The admin's add view is just a method on the model admin class, so it's not hard to override it, and insert the same logic we showed above:

class EventAdmin(admin.ModelAdmin):
    # ...

    # Override add view so we can peek at the timezone they've entered and
    # set the current time zone accordingly before the form is processed
    def add_view(self, request, form_url='', extra_context=None):
        if request.method == 'POST':
            tz_form = TimeZoneForm(request.POST)
            if tz_form.is_valid():
                timezone.activate(tz_form.cleaned_data['event_time_zone'])
        return super(EventAdmin, self).add_view(request, form_url, extra_context)

That handles submitting a new event. When editing an existing event, we also need to display the existing time values according to the event's time zone. To do that, we override the change view:

class EventAdmin(admin.ModelAdmin):
    # ...

    # Override change view so we can peek at the timezone they've entered and
    # set the current time zone accordingly before the form is processed
    def change_view(self, request, object_id, form_url='', extra_context=None):
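        if request.method == 'POST':
            tz_form = TimeZoneForm(request.POST)
            if tz_form.is_valid():
                timezone.activate(tz_form.cleaned_data['event_time_zone'])
        else:
            # Displaying the form: use the time zone already saved on the event
            obj = self.get_object(request, unquote(object_id))
            timezone.activate(obj.event_time_zone)
        return super(EventAdmin, self).change_view(request, object_id, form_url, extra_context)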

One more thing. Since the single time zone field is applied to all the times in the event, if someone changes the time zone, they might need to also adjust one or more of the times. As a reminder, we added help text to the time zone field:

event_time_zone = models.CharField(
    # ...
    help_text=_('All times for this event are in this time zone. If you change it, '
                'be sure all the times are correct for the new time zone.'),
)

Thanks to Vinod Kurup for his help with this post.

Caktus GroupCaktus is hiring a Web Designer-Contractor

Caktus is actively seeking local web design contractors in the North Carolina Triangle area. We’re looking for folks who can contribute to our growing team of designers on a per-project basis. Our team is focused on designing for the web using HTML5, CSS3, LESS, and responsive design best practices. We take an iterative approach with our clients involving them early and often. So, if you’re a designer looking for some extra work and want to sit in with our sharp team check out the job posting. It has more information about the types of projects you would be working on and some of the skills we hope to find in your toolbox. If it sounds like a good fit, drop us a line—we’d love to chat!

Josh JohnsonBuilding A DNS Sandbox

I’m developing some automation around DNS. It’s imperative that I don’t break anything that might impact any users. This post documents the process I went through to build a DNS sandbox, and serves as a crash-course in DNS, which is, like most technology that’s been around since the dawn of time, a lot simpler than it seems at first glance.

Use Case

When a new machine is brought up, I need to register it with my internal DNS server. The DNS server doesn’t allow remote updating, and there’s no DHCP server in play (for the record, I’m bringing up machines in AWS). I have access to the machine via SSH, so I can edit the zone files directly. As SOP, we use hostdb and a git repository. This works really well for manual updates, but it’s a bit clunky to automate, and has one fatal flaw: there can be a disparity between what’s in DNS and what’s in the git repo. My hope is to eliminate this by using the DNS server as the single source of truth, using other mechanisms for auditing and change-management.

So the point of the DNS sandbox is to make testing/debugging of hostdb easier and to facilitate rapid development of new tools.


Please be aware that this setup is designed strictly for experimentation and testing – it’s not secure, or terribly useful outside of basic DNS functionality. Please take the time to really understand what it takes to run a DNS server before you try to set something like this up outside of a laboratory setting.

Contributions Welcome

If you have any trouble using this post setting up your own DNS sandbox, please leave a comment.

If you have any suggestions or corrections, again, leave a comment!

Together we can make this better, and help make it easier to put together infrastructure for testing for everybody.

Coming Soon

There are a lot of features of DNS that this setup doesn’t take into account. I’m planning on following up this post as I add the following to my sandbox setup (suggestions for other things are welcome!)

  • Remote updates (being able to speak DNS directly to the server from a script)
  • Chrooting the bind installation for security.
  • DNS Security (DNSSEC) – makes remote updates secure.
  • Slaves – DNS can be configured so that when one server is updated, the changes propagate to n number of servers.

Server Setup

I started with a stock Ubuntu 12.04 server instance.

I installed bind9 using apt, and created a couple of directories for log and zone files.

$ sudo apt-get install bind9
$ sudo mkdir -p /var/log/named
$ sudo chown bind:bind /var/log/named
$ sudo mkdir /etc/bind/zones

Zone Files


See: http://www.zytrax.com/books/dns/ch8

  • Zone files have a header section called the SOA
  • Fully-qualified domain names end with a period (e.g. my.domain.com.), subdomains relative to an FQDN do not (e.g. my, www).
  • Zone files have an origin – a base added to every name in the file. The origin is defined using the $ORIGIN directive (see: http://www.zytrax.com/books/dns/ch8/#directives)
  • The @ symbol is used as a shortcut for $ORIGIN.
  • Each zone file is referenced in the named.conf file (see Configuration below for details)
  • The name of the file itself is immaterial – there are many standards in the wild – I’m opting to keep them consistent with the name of the zone in named.conf.
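  • Zone files have a serial number, which consists of the current date and an increment number. Example: 20131226000 (YYYYMMDDXXX). This number must be incremented every time you make a change to a zone file, or bind will ignore the changes. Note that the serial is stored as an unsigned 32-bit integer, so an 11-digit value like this one wraps around (bind reports it as 2951356816); the more common YYYYMMDDNN form fits without wrapping.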
  • Comments start with a semi-colon (;) and run to the end of the line.
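The serial number arithmetic is easy to check by hand. The sketch below is only an illustration (the function names are made up): it shows how an 11-digit YYYYMMDDXXX serial wraps when squeezed into 32 bits, which matches the value named-checkzone reports later in this post, and how a 10-digit YYYYMMDDNN serial can be built instead.

```python
SERIAL_MODULUS = 2 ** 32  # SOA serial numbers are unsigned 32-bit integers


def effective_serial(serial):
    """Return the value a 32-bit serial field would hold for this number."""
    return serial % SERIAL_MODULUS


def dated_serial(year, month, day, revision):
    """Build a YYYYMMDDNN serial; revision must fit in two digits."""
    if not 0 <= revision <= 99:
        raise ValueError("revision must be between 0 and 99")
    return int("%04d%02d%02d%02d" % (year, month, day, revision))


print(effective_serial(20131226000))   # 2951356816
print(dated_serial(2013, 12, 26, 0))   # 2013122600
```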

DNS lookups can happen in two directions: forward and reverse. Forward lookups resolve a domain name to an IP address; reverse lookups resolve an IP address to a domain name. Each type of lookup is controlled from a separate zone file, with different types of records.

See http://www.zytrax.com/books/dns/ch8/#types for details about the different types of records. This post only deals with SOA, NS, A, CNAME and PTR records.

Note that reverse lookup is not required for a functioning DNS setup, but is recommended.

Forward Lookup


Our domain name is example.test.

$ORIGIN example.test.
$TTL 1h

@    IN    SOA    ns.example.test.    hostmaster.example.test. (
    20131226000  ; serial number
             1d  ; refresh
             2h  ; update retry
             4w  ; expiry
             1h  ; minimum
)

@               NS     ns

ns               A      127.0.0.1
box1             A      192.168.0.1
alt              CNAME  box1

  • Line 1 sets the origin. All entries will be a subdomain of example.test. You can put whatever you want in this stanza, but keep it consistent in the other areas.
  • Line 2 sets the Time To Live for records in this zone.
  • Lines 4-10 are the SOA.
  • On line 4, we use @ to stand in for the $ORIGIN directive defined on line 1. We specify the authoritative server (ns.example.test.), which we will define in an A record later. Finally, we specify the e-mail address of a person responsible for this zone, replacing the at symbol (@) with a period.
  • Line 5 contains the serial number. This will need to be incremented every time we make a change. In this example, I’m starting with the current date and 000, so we’ll get 999 updates before we have to increment the date.
  • Line 12 is a requirement of Bind – we must specify at least one NS record for our DNS server. The @ symbol is used again here to avoid typing the origin again. The hostname for the NS record is ns, which means ns.example.test, defined in an A record on line 14.
  • Line 14 defines our DNS server for the NS record on line 12. We’re using localhost here to point back to the default setup we got from using the ubuntu packages.
  • Line 15 is an example of another A record, for a box named box1.example.test. Its IP address is 192.168.0.1. Note that the actual IP addresses here do not need to be routable to the DNS server; all it’s doing is translating a hostname to an IP address. For testing purposes, this can be anything. Just be aware that reverse lookups are scoped to a given address range, so things will need to be consistent across the two zones.
  • Finally on line 16, we have an example of a CNAME record. This aliases the name alt.example.test to box1.example.test, and ultimately resolves to 192.168.0.1.
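The way relative names, the @ shortcut and fully-qualified names relate to the origin can be sketched in a few lines of Python. This is a simplified illustration, not how bind itself is implemented, and the function name is made up.

```python
def qualify(name, origin):
    """Expand a zone-file name to a fully-qualified domain name."""
    if not origin.endswith("."):
        raise ValueError("origin must be fully qualified (end with a period)")
    if name == "@":
        return origin            # @ is shorthand for the origin itself
    if name.endswith("."):
        return name              # already fully qualified, left untouched
    return name + "." + origin   # relative names get the origin appended


origin = "example.test."
print(qualify("@", origin))                   # example.test.
print(qualify("ns", origin))                  # ns.example.test.
print(qualify("box1.example.test.", origin))  # box1.example.test.
print(qualify("box1.example.test", origin))   # box1.example.test.example.test.
```

The last call shows the classic pitfall: a name that was meant to be fully qualified but is missing its trailing period gets the origin appended a second time.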
Reverse Lookup


    We’re setting up reverse lookups for the 192.168.0.x subnet (CIDR 192.168.0.0/24).

    $ORIGIN 0.168.192.in-addr.arpa.
    $TTL 1h

    @   IN  SOA     ns.example.test.     hostmaster.example.test. (
            20131226000  ; serial number
                     1d  ; refresh
                     2h  ; update retry
                     4w  ; expiry
                     1h  ; minimum
    )

        IN      NS      ns.example.test.
    1   IN      PTR     box1.example.test.
    • Lines 1-10 are the SOA, and are formatted the exact same way as in our forward zone file.

      Note that the $ORIGIN is now 0.168.192.in-addr.arpa.. The in-addr.arpa domain is special; used for reverse lookups. The numbers before the top level domain are simply the subnet octets, reversed (192.168.0 becomes 0.168.192).

      Remember, this serves as shorthand for defining the entry records below the SOA.

    • Line 12 is the required NS record, pointing at the one that we set up an A record for in the forward zone file.
    • Finally, line 13 is a typical PTR record. It associates 192.168.0.1 with box1.example.test. Note the trailing period: without it, the origin would be appended to the name.
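If you are scripting against reverse zones, Python's standard ipaddress module can do the octet reversal for you. The helper names below are made up for this sketch, and the split into label and zone assumes a /24 reverse zone like the one above.

```python
import ipaddress


def reverse_name(ip):
    """Return the in-addr.arpa name (no trailing period) for an IPv4 address."""
    return ipaddress.ip_address(ip).reverse_pointer


def split_reverse(ip):
    """Split the reverse name into the PTR label and the /24 zone origin."""
    label, zone = reverse_name(ip).split(".", 1)
    return label, zone + "."


print(reverse_name("192.168.0.1"))   # 1.0.168.192.in-addr.arpa
print(split_reverse("192.168.0.1"))  # ('1', '0.168.192.in-addr.arpa.')
```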


    Configuration

    In the default Ubuntu setup, local configuration is handled in /etc/bind/named.conf.local (this is simply included into /etc/bind/named.conf).

    See http://www.zytrax.com/books/dns/ch7/ for details about the named.conf format and what the directives mean.

    zone "example.test." {
            type master;
            file "/etc/bind/zones/example.test";
            allow-update { none; };
    };

    zone "0.168.192.in-addr.arpa." {
            type master;
            file "/etc/bind/zones/0.168.192.in-addr.arpa";
            allow-update { none; };
    };

    logging{
      channel simple_log {
        file "/var/log/named/bind.log" versions 3 size 5m;
        severity debug;
        print-time yes;
        print-severity yes;
        print-category yes;
      };
      category default{
        simple_log;
      };
    };
    • Lines 1-5 set up our forward zone “example.test.”. Note that allow-update is set to none. This simplifies our configuration and prevents updates to this zone from other servers.
    • Lines 7-11 set up the reverse zone “0.168.192.in-addr.arpa.”.
    • Lines 13-24 set up simple (and verbose) logging to /var/log/named/bind.log. See http://www.zytrax.com/books/dns/ch7/logging.html for details about the settings here.


    Configuration Syntax Check

    We can use the named-checkzone utility to verify our zone file syntax before reloading the configuration.

    You specify the name of the zone and then the filename (the -k fail parameter causes it to return a failed return code when an error is found, useful for automated scripts):

    $ named-checkzone -k fail example.test /etc/bind/zones/example.test
    zone example.test/IN: loaded serial 2951356816

    In the case of a reverse zone file:

    $ named-checkzone -k fail 0.168.192.in-addr.arpa /etc/bind/zones/0.168.192.in-addr.arpa
    zone 0.168.192.in-addr.arpa/IN: loaded serial 2951356817
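Since the return code is what matters for automation, a script can wrap the check. Here is a rough Python sketch; the function names are made up, and it assumes named-checkzone is on the PATH of the machine running the script.

```python
import subprocess


def checkzone_command(zone, path):
    """Build the named-checkzone invocation used in this post."""
    return ["named-checkzone", "-k", "fail", zone, path]


def zone_is_valid(zone, path):
    """Run named-checkzone and report whether it exited with status 0."""
    result = subprocess.run(
        checkzone_command(zone, path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.returncode == 0


cmd = checkzone_command("example.test", "/etc/bind/zones/example.test")
print(" ".join(cmd))
# named-checkzone -k fail example.test /etc/bind/zones/example.test
```

A deployment script could call zone_is_valid() for each zone and only run the reload when every check passes.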

    Reloading Config

    Configuration can be reloaded with the rndc reload command.

    $ sudo rndc reload

    It’s helpful to run tail -f /var/log/named/bind.log in another terminal window during testing.

    Testing DNS Queries

    The definitive tool is dig. nslookup is also useful for basic queries.

    With both tools, it’s possible to specify a specific DNS server to query. In this case, it’s assumed that we’re logged in to the sandbox DNS server, so we’ll use 127.0.0.1 for the server to query.

    With dig

    Note: remove the +short parameter from the end of the query to get more info.

    Forward Lookup

    The A record:

    $ dig @127.0.0.1 box1.example.test +short

    The CNAME:

    $ dig @127.0.0.1 alt.example.test +short

    Reverse Lookup

    $ dig @127.0.0.1 -x 192.168.0.1 +short

    With nslookup

    Forward Lookup

    The A record:

    $ nslookup box1.example.test
    Name:	box1.example.test

    The CNAME:

    $ nslookup alt.example.test
    alt.example.test	canonical name = box1.example.test.
    Name:	box1.example.test

    Reverse Lookup

    $ nslookup 192.168.0.1
    1.0.168.192.in-addr.arpa	name = box1.example.test.

    Using Your Sandbox

    Now that the DNS sandbox is built and working correctly, you may want to add it
    to your list of DNS servers.

    This process will vary depending on what operating system you use, and is an
    exercise best left to the user. However, here are some pointers:

    Note: depending on your setup, you will likely need to put your sandbox DNS server
    first in the list.

    Mac OS X: https://www.plus.net/support/software/dns/changing_dns_mac.shtml

    Ubuntu: http://www.cyberciti.biz/faq/ubuntu-linux-configure-dns-nameserver-ip-address/

Joe GregorioThe shiny parabolic broadcast spreader of vomit

Due to suspected food poisoning I checked into the local emergency room last night around 2 AM, trusty 13 gallon plastic garbage bag in hand, because, I've been throwing up. Once they get me into a room the nurse offers me a shallow pink plastic pan in exchange for my plastic garbage bag, and I'm thinking to myself, "Really, have you never seen anyone vomit in your entire life?". Why on earth would you offer me a shallow round bottom plastic pan as an alternative to a 13 gallon plastic garbage bag? This is vomit we're trying to contain here. This reminds me of a previous visit to the same ER with my son when he had appendicitis, this time we came in with a kitchen garbage pail and the nurse laughed at us and handed him a small metal kidney dish. My son held it in his hands, looking at it perplexedly for about five seconds before he started vomiting again, turning it from a kidney dish into a shiny metal parabolic broadcast spreader of vomit.

I don't know what to make of this phenomenon, as I thought that working in an ER would expose you to a lot of puking people, and thus you'd know better than to give someone a kidney dish to throw up in. I can only come up with two possible conclusions, the first that the ER folks just aren't quick learners and haven't picked up on the association:

kidney dish : vomit
  :: broadcast spreader : grass seed

The other possibility is that my family is unique. Maybe my wife and I are both descended from long lines of projectile vomiters, a long and honorable tradition of high velocity emesis, and the rest of the population is filled with polite, low volume, low velocity vomiters. If so, you people make me sick.

Joe GregorioSnow

Reilly has been learning Javascript, and one of the projects he wanted to do was a snow simulation. I guess growing up in the south, snow is a rare and wondrous event for him.

Tim HopperNoisy Series and Body Weight, Take 2

Back in July, I posted some analysis of my attempt at weight loss. Now that I'm four months further down the line, I thought I'd post a follow-up.

I continue to be fascinated with how noisy my weight time series is. While I've continued to lose weight over time, my weight goes up two out of five mornings.

Here's a plot of the time series of my change in weight. Note how often the change is positive, i.e. I appear to have gained weight:

This volatility can hide the fact that I'm making progress! When I put a regression line through the points, you can see that the average change is slightly below zero:2

I have wondered recently if my average change in weight is correlated with the day of the week. My hypothesis is that my weight tends to go up over the weekends, so I created a boxplot of my change in weight categorized by day.

Indeed, on Sundays and Mondays (i.e. weight change from Saturday morning to Sunday morning and Sunday morning to Monday morning) my median weight change is slightly above zero. This makes sense to me: on Saturdays, I'm more likely to be doing things with friends, and thus I have less control over my meals.1

I wish I had a good explanation for why the change on Friday is so dramatic, but I don't. Any guesses?

  1. Also, beer. 

  2. I mentioned this to my college roommate who is a financial planner. He noted how similar this is to investing; it's a constant battle for him to convince his clients to look at average behavior instead of daily changes.  

Josh JohnsonCan Scrum Scale Down?

Prompted by a discussion on a LinkedIn group, I was reminded of a presentation deck I put together a couple of years ago to capture what my cohorts and I were doing for project management at the time.

The short answer is: “why yes, yes scrum can most certainly scale down”. How far down? I think with the right frame of mind, it can scale down to a single individual.

Here are the slides as they stand. They’re a few years old, entitled “The blitZEN Method”.

I’ve seen this process work, in practice, with as few as 2 people. It’s worked with a cross-functional team of 5. I’ve applied the concepts to individual work as well. So, I’d say I’ve proven it can scale down. But is it really Scrum?

When it was written, the only experience I had with Agile development methodologies was on the team from which The blitZEN Method was born. It shows a bit in my terminology, and the simplicity of the overall approach.

Since then, I’ve left UNC and I’ve worked in several different so-called “Agile” shops. I’ve yet to see any as effective as The blitZEN Method was. These were organizations filled with a zeal for core Agile values, who had consultants and coaches and trainers – folks paid unfathomable amounts of money, all for nothing. At best, people would bypass core values just to get work done. At worst, low-quality code would get rushed through to production in spite of it all – ceremony for the sake of ceremony. Not just Scrum – I’ve seen Kanban fail too.

So, maybe everyone is just doing it wrong. Maybe there’s something really special about The blitZEN Method. Maybe the people I worked with at the time were what was really special. It’s a tough call, even in hindsight.

When you’ve seen something evolve into a proven methodology, and you go out in the world that spawned it, and find yourself constantly bombarded with contradictory information from highly dogmatic sources, you start to wonder what happened – is it me, or is it Agile? Are we all kidding ourselves?

Take a look, and let me know what you think. Feedback is greatly appreciated. I’m especially interested in hearing of any applications of the approach – I haven’t been in a position to try myself for several years.

I hope to update these slides soon, and I would love to incorporate your insights.

Note: I’m also working on a project to expand on some of the concepts – it’s been on my back burner for a while, but keep an eye on How I Develop Web Apps in my github.

Tim HopperTweeting Primes

I recently discovered the Twitter account @primes. Every hour, they tweet the subsequent prime number. This made me wonder two things. First, what is the largest prime that you can tweet (in base-10 encoding in 140 characters)?1 Second, how long until they get there?

Doing some quick calculations in Mathematica, I believe the largest 140 digit prime is the following:

9999999999999999999999999999999999999999999999 9999999999999999999999999999999999999999999999 999999999999999999999999999999999999999999999997

Wolfram Alpha confirms that this is prime and that the next prime is 141 characters.

As for how long it would take, recall that the number of primes less than $n$ is approximately $\frac{n}{\ln n}$. The number of primes less than $10^{140}$ is approximately

$$\pi(10^{140}) \approx \frac{10^{140}}{140\cdot \ln 10} \approx 3.1\cdot 10^{137}.$$

That's $3\cdot 10^{57}$ times the estimated number of atoms in the universe. Looks like @primes should be able to tweet for a while.
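
The arithmetic can be checked with a few lines of Python. This is only a rough sketch: it uses the $\frac{n}{\ln n}$ approximation quoted above, and the one-tweet-per-hour rate from the account description; the variable names are made up for illustration.

```python
import math

digits = 140
# ln(10**140) = 140 * ln(10); floats are fine since only the magnitude matters.
prime_count = 10.0 ** digits / (digits * math.log(10))

# One prime per hour, so convert the count of primes into years of tweeting.
hours_per_year = 24 * 365.25
years_to_tweet = prime_count / hours_per_year

print("primes below 10^140: about {:.1e}".format(prime_count))
print("years at one tweet per hour: about {:.1e}".format(years_to_tweet))
```

Since the estimate counts all primes below $10^{140}$, it slightly overstates the remaining work by ignoring the primes already tweeted, which makes no visible difference at this scale.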

  1. The largest known prime is $2^{57,885,161} − 1$ and has 17,425,170 digits.  

Caktus GroupUsing strace to Debug Stuck Celery Tasks

Celery is a great tool for background task processing in Django. We use it in a lot of the custom web apps we build at Caktus, and it's quickly becoming the standard for all kinds of task scheduling workloads, from simple to highly complex.

On rare occasions, a Celery worker may stop processing tasks and appear completely hung. In other words, issuing a restart command (through Supervisor) or kill (from the command line) doesn't immediately restart or shut down the process. This can be a particular issue when you have a queue set up with only one worker (e.g., to avoid processing any of the tasks in this queue simultaneously), because then none of the new tasks in the queue will get processed. In these cases you may find yourself resorting to manually calling kill -9 <pid> on the process to get the queue started back up again.

We recently ran into this issue at Caktus, and in our case the stuck worker process wasn't processing any new tasks and wasn't showing any CPU activity in top. That seemed a bit odd, so I thought I'd make an attempt to discover what that process was actually doing at the time that it became non-responsive. Enter strace.

strace is a powerful command-line tool for inspecting running processes to determine what "system calls" they're making. System calls are low level calls to the operating system kernel that might involve accessing hard disks, the network, creating new processes, or other such operations. First, let's find the PID of the celery process we're interested in:

ps auxww|grep celery

You'll find the PID in the second column. For the purposes of this post let's assume that's 1234. You can inspect the full command in the process list to make sure you've identified the right celery worker.

Next, run strace on that PID as follows:

sudo strace -p 1234 -s 100000

The -p flag specifies the PID, and the -s flag specifies the maximum length of each string that strace prints. By default that's limited to 32 characters, which we found isn't very helpful if the system call being made includes a long string as an argument. You might see something like this:

Process 1234 attached - interrupt to quit
write(5, "ion id='89273' responses='12'>\n     ...", 37628

In our case, the task was writing what looked like some XML to file descriptor "5". The XML was much longer and at the end included what looked like a few attributes of a pickled Python object, but I've shortened it here for clarity's sake. You can see what "5" corresponds to by looking at the output of lsof:

sudo lsof|grep 1234

The file descriptor shows up in the "FD" column; in our version of lsof, that happens to be the 4th column from the left. You'll see a bunch of files that you don't care about, and then down near the bottom, the list of open file descriptors:

python    1234   myuser    0r     FIFO                0,8      0t0    6593806 pipe
python    1234   myuser    1w     FIFO                0,8      0t0    6593807 pipe
python    1234   myuser    2w     FIFO                0,8      0t0    6593807 pipe
python    1234   myuser    3u     0000                0,9        0       4738 anon_inode
python    1234   myuser    4r     FIFO                0,8      0t0    6593847 pipe
python    1234   myuser    5w     FIFO                0,8      0t0    6593847 pipe
python    1234   myuser    6r      CHR                1,9      0t0       4768 /dev/urandom
python    1234   myuser    7r     FIFO                0,8      0t0    6593850 pipe
python    1234   myuser    8w     FIFO                0,8      0t0    6593850 pipe
python    1234   myuser    9r     FIFO                0,8      0t0    6593916 pipe
python    1234   myuser   10u     IPv4            6593855      0t0        TCP ip-10-142-126-212.ec2.internal:33589->ip-10-112-43-181.ec2.internal:amqp (ESTABLISHED)

You can see "5" corresponds to a pipe, which at least in theory ends up with a TCP connection to the amqp port on another EC2 server (host names are fictional).
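
If lsof isn't available, roughly the same mapping can be read from /proc on Linux. The following is a minimal sketch, not part of our original debugging session: the function name open_file_descriptors is made up for illustration, it assumes a Linux /proc filesystem, and inspecting another user's process generally needs root.

```python
import os


def open_file_descriptors(pid="self"):
    """Map each open fd number of `pid` to what it points at (Linux /proc only)."""
    fd_dir = "/proc/{}/fd".format(pid)
    targets = {}
    if not os.path.isdir(fd_dir):
        return targets  # not Linux, or the process is gone
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


# Inspect the current process; pass a PID such as 1234 to look at a worker.
for fd, target in sorted(open_file_descriptors().items()):
    print(fd, target)
```

Pipes show up as targets like pipe:[inode], which can be matched against the inode numbers in the lsof listing above.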

RabbitMQ was operating properly and not reporting any errors, so our attention turned to the Celery task in question. Upon further examination, an object we were passing to the task included a long XML string as an attribute, which was being pickled and passed to RabbitMQ. Issues have been reported with long argument sizes in Celery before, and while it appears they should be supported, an easy workaround for us (and Celery's recommended approach) was to pass an ID for this object rather than the object itself, greatly reducing the size of the task's arguments and avoiding the risk of overwriting any object attributes.
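
To get a feel for the difference in argument size, here is a small sketch using only the standard library. The Report class, its attributes, and the XML payload are hypothetical stand-ins, not the objects from our project, and no Celery code is involved; it only compares what would have to be pickled in each case.

```python
import pickle


class Report:
    """Hypothetical stand-in for the object that was passed to the task."""

    def __init__(self, report_id, xml_body):
        self.report_id = report_id
        self.xml_body = xml_body


# A long, made-up XML payload attached to the object as an attribute.
xml_body = "<items>" + "<item/>" * 10000 + "</items>"
report = Report(report_id=42, xml_body=xml_body)

# Size of the task argument when the whole object is pickled...
object_size = len(pickle.dumps(report))
# ...versus when only its ID is passed and the task reloads the object itself.
id_size = len(pickle.dumps(report.report_id))

print("pickled object: {} bytes, pickled ID: {} bytes".format(object_size, id_size))
```

With the ID approach, the task looks the object up again when it runs, so it also sees the current state of the object rather than a snapshot taken when the task was queued.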

While there may have been other ways to fix the underlying issue, strace and lsof were crucial in helping us figure out the problem. One might be able to accomplish the same thing with a lot of logging, but if your code is stuck in a system call and doesn't appear to be showing any noticeable CPU usage in top, strace can take you immediately to the root of the problem.

Josh JohnsonWhat does it mean to be a “python shop”?

Python developers: Do you call your team or business a “python shop”? If so, what do you mean? If not, why not?

Caktus GroupShipIt Day 4: SaltStack, Front-end Exploration, and Django Core

Last week everyone at Caktus stepped away from client work for a day and a half to focus on learning and experimenting. This was our fourth ShipIt day at Caktus, our first being almost exactly a year ago. Each time we all learn a ton, not only by diving head first into something new, but also by hearing the experiences of everyone else on the team.

DevOps: Provisioning with SaltStack & LXC+Vagrant

We have found SaltStack to be useful in provisioning servers. It is a Python based tool for spinning up and configuring servers with all of the services that are needed to run your web application. This work is a natural progression from the previous work that we have done at Caktus in deploying code in a consistent way to servers. SaltStack really shines with larger more complicated systems designed for high availability (HA) and scaling where each service runs on its own server. Salt will make sure that the setup is reproducible.

This is often important while testing ideas for HA and scaling. The typical cycle looks like:

  • Work on states in Salt

  • Run Salt through Fabric building the whole system, locally through vagrant, on a cloud provider, or on physical hardware.

  • Pound on the system using benchmarking tools narrowing in on bottlenecks and single points of failure.

  • Start fixing the problems you uncovered in your states starting the cycle over again.

Victor, Vinod, David, and Dan all worked on learning more about SaltStack through scratching different itches they’ve felt while working on client projects. Some of the particular issues folks looked at included understanding the differences between running Salt with and without a master, how to keep passwords safe on the server while sharing them internally on the development team, and setting up Halite, a new web interface for Salt.

In order to test these complicated server configurations, we often rely on Vagrant to run the full system configurations on developers' laptops. This is the quickest way to start building out systems. The problem, though, is that our laptops are not as fast as the hardware that the services will eventually be provisioned on. In order to reduce the time spent waiting in the code-rebuild-test cycle, Scott, our friendly system administrator, delved into running Vagrant with LXC containers as the backend on our development laptops. LXC containers are more lightweight than VirtualBox virtual machines and can be created more quickly on the fly. This involved learning how to upgrade the kernels on the development laptops' Ubuntu long term support images. Scott was successful, and there was a response of “Oooh” and “Ahhh” from the developers when he demoed the speed of creating a new VM with LXC through Vagrant.

Front-end Web + Visualizations: Realtime Web, Cellular Automata, Javascript MVC, & SVG Splines

Caktus has a growing team of front-end developers and folks interested in user interaction. There were a number of projects this ShipIt day that explored different tools for designing visualizations and user experiences.

Mark dove into WebRTC, building a new project, rtc-demo, with the actual demo hosted on Github static pages (note: it only works on recent Firefox builds so far). It’s neat that this is hosted on a static web assets host since the project does allow users to interact with one another. Red & Blue users go to the static site and create links that they share with one another and connect directly through their browsers without any server in between. Both users see a tic-tac-toe board and can alternate taking turns. The moves are relayed directly to their opponent. This exploration allowed Mark to evaluate some new technologies and their support in different browsers. He was able to play around with new modes of interaction on the web where the traditional server-client model can be challenged and hybrid peer-to-peer apps can be built.

Caleb has been taking a class on programming in C outside of work and wanted to continue to work on one of his personal projects during the ShipIt day. There’s something about pointer arithmetic and syntax that Caleb finds fascinating. Caleb extended some of the features of his cellular automata simulation engine, gameoflife. This included some of his rule files, initial states, and rule file engine. The result was experiments with the Wireworld simulation, including a working XOR gate using simple cellular automata rules, and Brian’s Brain, which produces a visually interesting chaotic simulation. Caleb’s demo was a crowd pleaser with everyone enthralled by the puffers, spaceships, and gliders moving around and interacting on the screen.

Rebecca made some amazing usability improvements to our own django-timepiece. Timepiece is part of the suite of tools that we use internally for managing contracts, hours, timesheets, and schedules. Rebecca focused on some of the timesheet related workflows including verifying timesheets. She made these processes more fluid by building an API and making calls to it using Backbone. This allowed users to do things like delete or edit entries on their timesheet before verifying it without leaving the page.

Wray also decided to work on a front-end project. His project focused on building SVGs on the fly in raw Javascript without any supporting libraries. Wray started by showing us his passion for how to mathematically define curves and how useful the different definitions are from a user experience point of view as a designer working in a vector drawing program. The easier it is for the designer to get from what they meant to draw to what is represented on the screen, the better the resulting design will be. He showed a particular interest in the Spiro curve definition. Wray dug into this further by looking at the definition of curves supported by the SVG standard and built an interactive tool for drawing curves by editing the string defining the SVG element on the fly. The resulting project is still experimental at this point, but is an interesting exploration into what can be done with SVGs in Javascript without any supporting libraries.

Django Core

Karen and Tobias waded into some internals of the bleeding edge of Django. In particular, Tobias offered up comments on a pull request related to ticket #21271. Tobias gave feedback to the original pull request author and Django Core committer Tim Graham on the difference between instance and class variables and when it’s appropriate to use class variables in Python.
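
As a general reminder of that distinction (this is a generic illustration with made-up class names, not code from the ticket or the pull request), a mutable class variable is shared by every instance, while an attribute assigned in __init__ belongs to one instance only:

```python
class SharedTags:
    tags = []  # class variable: one list shared by every instance

    def add(self, tag):
        self.tags.append(tag)  # mutates the shared list


class OwnTags:
    def __init__(self):
        self.tags = []  # instance variable: a fresh list per instance

    def add(self, tag):
        self.tags.append(tag)


a, b = SharedTags(), SharedTags()
a.add("x")
shared_result = b.tags  # b sees the tag added through a

c, d = OwnTags(), OwnTags()
c.add("x")
own_result = d.tags  # d is unaffected by c

print(shared_result, own_result)
```

Class variables are a good fit for constants and defaults that never get mutated; state that changes per object usually belongs on the instance.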

Karen gave an illuminating talk on transitioning a particular client project to Django 1.6 from Django 1.5. Her slides are available to view online. She generalized the testing, debugging, and eventually fixing strategies she used while discussing the particular problems she encountered on the project. The strategy Karen used was running manage.py check to check for new settings that may have changed. This is a new management command added in 1.6 and can be used from now on to ease upgrading versions.

She found the following issues when upgrading:

  • UNUSABLE_PASSWORD was part of an internal API that changed. We used this internal in the project as sometimes you must, but special care must be taken when upgrading code that relies on unstable APIs.

  • MAXIMUM_PASSWORD_LENGTH was used on this project after the recent Django security fix. Based on subsequent security updates, this is no longer needed and can be removed in 1.6.

  • BooleanFields now do not automatically default to False. This was changed to encourage coders to be explicit and to increase the level of standards within the code.

  • Django debug toolbar broke in 1.6! This is a shame, but will hopefully be updated by the time 1.6 comes out or soon after.

After this, Karen ran the project’s test suite and discovered the following remaining changes:

Karen urged us to check out the Django 1.6 release notes and remember where you’re using unsupported internals and to check and write tests for that code particularly carefully. Also, she encouraged everyone to run python -Wall manage.py test to help expose more deprecation warnings and bugs.
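
To see what turning on all warnings does, here is a small standalone sketch. The old_api function is a made-up stand-in for a deprecated call, and warnings.simplefilter("always") is used as the in-process counterpart of the -Wall switch:

```python
import warnings


def old_api():
    """Hypothetical function standing in for a deprecated call."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return "result"


with warnings.catch_warnings(record=True) as caught:
    # Show every warning every time, including DeprecationWarning,
    # which is often hidden by the default filters.
    warnings.simplefilter("always")
    value = old_api()

messages = [str(w.message) for w in caught]
categories = [w.category for w in caught]
print(value, messages)
```

The deprecated call still works and returns its value; the warning is the only hint that the code will need attention before the next upgrade.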

Estimation Process Review

Ben and Daryl, our Project Managers at Caktus, worked up a full proposal and project schedule for an internal process of reworking our estimation process. We don’t shy away from large fixed bid projects and pride ourselves on meticulous estimates informed by careful client communication and requirements gathering. Our PMs wanted to help us look at this process and make it more formal as we grow to a larger group with lots of people leading the same process for different incoming projects.

Wrap Up

We had a great time with our latest ShipIt day. Each time we learn a ton from each other and by digging into some tough technical problems. We also learn a little bit more about what makes a good ShipIt day project. There were a number of crowd pleasers in this batch of projects, and while some folks decided to try a completely new library or technology, others decided to stay a bit closer to home by extending learning they did not have time to look into more during a client project. We had a lot of fun and are looking forward to our next ShipIt day.

Josh JohnsonStringList – Is It A String? Is It A List? Depends On How You Use It!

At my job, I had an API problem: I had an object property that could contain one or more strings. I assumed it would be a list, but some of my users felt that ['value',] was harder to write than 'value'. I found myself making the same mistake. So, I solved the problem, then I took the rough class I wrote and polished it. It’s up on my github, at https://github.com/jjmojojjmojo/stringlist

So what is it?

It’s a class that tries to make a slick API when there can be one, or many values for a property. The class can be instantiated with either a single value or many. If you use many, it will act like a list. If you use a single value (a string), it will act like a string. If it’s a string, and you use the .append() method, it becomes a list. Bam.

How would I use it?

Like so (see the README for a more detailed example, and the tests for comprehensive usage):

from stringlist import StringList  # assumed import path; check the README

class Thing(object):
    """
    Arbitrary class representing a single thing with one or more roles.
    """

    roles = StringList()

    def __init__(self, roles=None):
        self.roles = roles

# initialize with a single string value
obj = Thing('single role')

# initialize with multiple string values
obj = Thing(('one', 'two', 'three'))

# set to a single string value
obj.roles = 'new role'

# set to many string values
obj.roles = 'primary', 'secondary'

# when it's a string, it works like a string
obj = Thing('role1')

obj.roles = obj.roles.upper()

# convert to a list using .append()
obj.roles.append('role2')

# now it's a list
obj.roles[1] == 'role2'


  • The module sports 100% test coverage.
  • There is a buildout in the repo. Just run python bootstrap.py; bin/buildout and then you can run the tests with bin/nosetests
  • It was originally developed to avoid ['h', 'e', 'l', 'l', 'o'] mistakes when a single string was used instead of a list.
  • This is the first time I’ve used descriptors in Python. Very cool.
  • It would not have been possible without this marvelous post about Python’s magic methods (big thanks to Rafe Kettler).
  • Some things were not as straight-forward as they could have been, given Python’s string implementation. Of special interest is the implementation of the __delitem__ and __iter__ methods. In both cases, the base str class doesn’t have either method, so instead of doing the sane thing and proxying the call, I had to fall back to re-doing the operation in the method.
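
For readers who haven't used descriptors, here is a much simpler sketch of the general idea. It is not the StringList implementation: the StringOrList descriptor and as_list helper are made-up names, and this version only keeps a single string from being split into characters, without any of the string-that-becomes-a-list behavior.

```python
class StringOrList:
    """Hypothetical descriptor: stores a str as-is, any other iterable as a list."""

    def __set_name__(self, owner, name):
        # Remember where to keep the per-instance value (Python 3.6+).
        self._attr = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self._attr, None)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, str):
            value = list(value)
        setattr(instance, self._attr, value)


def as_list(value):
    """Normalize a str-or-list value to a list without splitting the string."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


class Thing:
    roles = StringOrList()

    def __init__(self, roles=None):
        self.roles = roles


single = Thing('hello')
many = Thing(('one', 'two'))
print(as_list(single.roles), as_list(many.roles))
```

Calling list('hello') directly is what produces the ['h', 'e', 'l', 'l', 'o'] mistake mentioned above; routing every assignment through the descriptor means callers never have to remember the check.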

Tim HopperSublime Text and Markdown

I have largely moved from Textmate to Sublime Text 2 for text editing. Among other reasons, Sublime Text is cross platform, and I use Windows at work and a Mac at home. I have also started writing as much as I can in Markdown.

I intended to write a blog post about using Sublime Text as a tool for writing Markdown. However, the inimitable Federico Viticci, of macstories.net, has already written that post, so I will simply refer you there.

Tim HopperWalmart's Command of Logistics

Steve Sashihara, The Optimization Edge:

What makes Walmart unique is its command of logistics. It continually deconstructs its entire supply chain, from supplier to distribution centers to customers, and treats each link as a decision point, asking a battery of microquestions: Where and how much to buy and at what price? Where to route goods? How to resupply and reorder?