A planet of blogs from our members...

Caktus GroupQ4 ShipIt Day: Dedicated to Creativity

This October, Caktus held its 8th ShipIt Day. Apart from a few diligent individuals who couldn’t spare any time from urgent responsibilities, nearly everyone took a break from their usual projects to work and collaborate on creative and experimental projects, with the aim of trying something new and, ideally, seeing a project through from start to finish in the space of a day and a half.

Participants in ShipIt Day worked on a variety of projects. We all had the chance to try out Calvin’s shared music player application, SharpTunes, which within the first few hours was playing “Hooked On A Feeling”. It utilizes peer-to-peer sharing, similar to BitTorrent, to more efficiently distribute a music file to a large number of users while allowing them to simultaneously listen to a shared playlist. On his blog, he describes how he achieved proof of concept in under an hour and some later challenges with arranging playlists.

Caktus Sharp Tunes - Caktus Ship It Day

David worked on improving the UX (user experience) for Timepiece, our time-tracking system. While getting the chance to brush up on his JavaScript and utilize Bootstrap modals, he worked on improvements for the weekly schedule feature. The current version, although very handy, is increasingly difficult to read as the company grows and more employees’ hours are on the schedule. Therefore, David built a feature which makes it possible to view individual schedules in a modal. Rebecca provided some assistance in getting it deployed, and although it’s not quite done yet, it should save us all a lot of trouble trying to read the schedule from the back of the room at big standup meetings.

Timepiece - Caktus Ship It Day

Tobias built tests for the Django Project Template. The Django Project Template makes it easy to provision new projects in Django, but testing the template itself can be difficult. The new tests, which can be run to exercise the template on a new server and then report back in HipChat, should improve the usability of the template.

Vinod worked on adding Django 1.7 support to RapidSMS, and with help from Dan, successfully reached his goal by the end of ShipIt Day. For next ShipIt Day, he hopes to implement Python 3 support too.

Brian set up a Clojure-based Overtone real time music environment, and although he didn’t reach his goal of using it to build new instruments, he did succeed in creating, in his own words, “some annoying tones.”

Victor and Alex collaborated on School Navigator (still a work in progress) for Code for Durham, designed to help Durham residents understand the perplexing complexity of public school options available to them. Alex imported GIS (geographic information system) data from Durham Public Schools, modeled the data, and built a backend using django-rest-framework. Victor contributed the frontend, which he built using Angular, while getting the chance to learn more about Appcache and Angular.

NC School Navigator - Caktus Ship It Day

Rebecca did some work for BRP Weather using django-pipeline, which gave her, Caleb, and Victor the opportunity to compare the pros and cons of django-compressor and django-pipeline. Although she finds the error messages from django-compressor to be a nuisance and prefers how django-pipeline handles static files, django-pipeline is not very helpful when it cannot find a file and has some issues with Sass files.

Michael continued designing a migraine-tracking app. He designed a simplified data entry system and did some animation design as well. The app is intended to track occurrences of migraines as well as potential triggers, such as barometric pressure and the user’s sleep patterns. Trevor also contributed some assistance with UX details.

Dan made progress on an application called Hall Monitor which he has been working on since before coming to Caktus. It accesses an office’s Google Calendars and uses string matching to check event names on calendars in order to determine who is in or out of the office. For instance, if someone has an event called “wfh” (working from home), it concludes that they are out of the office. Similarly, if someone is at an all-day event, it also logically concludes they are probably out. He demonstrated it to us, showing that it does indeed have an uncanny ability to track our presence.

Caleb set up Clojure and Quil, which lets you use Processing from Clojure, and built an application for displaying animated lines. By modifying the program, the user can instantly change the animation, creating interesting effects. He also created a Game of Life which runs in Quil (see below) and finished a Scheme implementation of Langton’s Ant in Automaton.

Animated Quil Lines - Caktus Ship It Day

Scott used the time to back up changes as well as add a KVM (keyboard, video and mouse) switch to the server rack.

Wray worked on a couple of different projects. He completed a Unity tutorial which involved building a space shooter game that runs on Android, which we all got to try out. He also used the time to work on Curvemixer, which creates interesting vector graphics using curves and circles.

I took the time to write some help files for an application designed to allow medical students to test their radiology knowledge. The help files should allow students and instructors to better understand the many features in the application and writing them allowed me to practice documentation creation.

Overall, ShipIt Day was a very productive and refreshing experience for everyone, allowing us to spend time on the sorts of projects we wouldn’t usually find time to work on. Moreover, we got the chance to find new solutions to projects we may have been stuck on through collaboration.

Tim HopperSundry Links for November 17, 2014

There's no magic: virtualenv edition: I didn't really get virtualenvs until long after I started programming Python, though they're now an essential part of my toolkit. This is a great post explaining how they work.

Traps for the Unwary in Python’s Import System: "Python’s import system is powerful, but also quite complicated."

pyfmt: I recently learned about gofmt for auto-formatting Go code. Here's a similar tool for Python.

Q: Setting User-Agent Field?: A 1996 question in comp.lang.java on how to set the user agent field for a Java crawler. The signature on the question? "Thanks, Larry Page"

alecthomas/importmagic: Python tool and Sublime extension for automatically adding imports.

Caktus GroupSupporting Increased Healthcare Access with NCGetCovered.org

We’ve launched NCGetCovered.org, a site dedicated to helping North Carolinians gain access to health insurance. As many know, enrolling in health insurance can feel daunting. NCGetCovered.org aims to simplify that process by centralizing enrollment information and great resources like live help. The site is launching ahead of the November 15th open enrollment period for the federal healthcare exchange (healthcare.gov).

NCGetCovered.org is a testament to the hard work of the many dedicated to enrolling the uninsured. Caktus created the site on behalf of the Big Tent Coalition, a nonpartisan consortium of more than 100 organizations and 320 individuals pulled from community-based organizations, hospitals, insurance carriers, in-person assisters and non-profit organizations.

Taking the lead on this web project was our neighbor in Durham, MDC, a nonprofit dedicated to closing opportunity gaps and a Big Tent member. MDC is an incredibly forward-thinking organization and saw early on the need for a one-stop shop for health insurance enrollment information. We feel very fortunate to be MDC’s partners in increasing health insurance access in our home state.

Caktus GroupOpen Data Project in Durham - Thumbs Up to Open Government!

In exciting local news, Durham and Durham County are launching a new site dedicated to centralizing public data in Summer 2015. Their press release mentions a health sanitation app Code for Durham built as a model of civic engagement with open data. Our own co-founder and CTO, Colin Copeland, is co-captain of Code for Durham, a volunteer organization dedicated to building apps that improve government transparency.

Their press release describes the project:

“The City of Durham and Durham County Government are embarking on an open data partnership that will lay the groundwork for businesses, non-profits, journalists, universities, and residents to access and use the wealth of public data available between the two government organizations, while becoming even more transparent to the residents of Durham.”

We’re looking forward to seeing all the great apps for Durhamites that result from this big step towards open government!

Caktus GroupWe've Won Two W3 Awards for Creative Excellence on the Web!

We’re honored to announce that we’ve won two W3 Silver Awards for Creative Excellence on the Web. The awards were given in recognition of our homepage redesign and DjangoCon 2014. Many thanks to Open Bastion and, by extension, the Django Software Foundation for selecting us to build the DjangoCon website. Also many thanks to our hardworking team of designers, developers, and project managers who worked on these projects: Dan, Daryl, David, Michael, Rebecca, and Trevor!

Here’s a quote from Linda Day, the director of the Academy of Interactive and Visual Arts (the sponsors of the award):

“We were once again amazed with the high level of execution and creativity represented within this year’s group of entrants. Our winners continue to find innovative and forward-thinking ways to push the boundaries of creativity in web design.”

We’re particularly humbled to learn that there were 4,000 entries this year and to be in the company of winners like Google, ESPN, Visa, and Sony and the many other wonderful companies that received recognition. We’re looking forward to continuing to build great web experiences!

The official press release: http://www.prweb.com/releases/2014-CaktusGroup/11/prweb12306675.htm

Tim HopperSundry Links for November 12, 2014

Amazon Picking Challenge: Kiva Systems (where I interned in 2011) is setting up a robotics challenge for picking items off warehouse shelves.

contexttimer 0.3.1: A handy Python context manager and decorator for timing things.

How-to: Translate from MapReduce to Apache Spark: This is a helpful bit from Cloudera on moving algorithms from MapReduce to Spark.

combinatorics 1.4.3: Here's a Python module adding some combinatorial functions to the language.

Special methods and interface-based type system: Guido van Rossum explains (in 2006) why Python uses len(x) instead of x.len().

Caktus GroupUsing Amazon S3 to Store your Django Site's Static and Media Files

Storing your Django site's static and media files on Amazon S3, instead of serving them yourself, can make your site perform better.

This post is about how to do that. We'll describe how to set up an S3 bucket with the proper permissions and configuration, how to upload static and media files from Django to S3, and how to serve the files from S3 when people visit your site.

S3 Bucket Access

We'll assume that you've got access to an S3 account, and a user with the permissions you'll need.

The first thing to consider is that, while I might be using my dpoirier userid to set this up, I probably don't want our web site using my dpoirier userid permanently. If someone was able to break into the site and get the credentials, I wouldn't want them to have access to everything I own. Or if I left Caktus (unthinkable though that is), someone else might need to be able to manage the resources on S3.

What we'll do is set up a separate AWS user, with the necessary permissions to run the site, but no more, and then have the web site use that user instead of your own.

  • Create the bucket.
  • Create a new user: Go to AWS IAM. Click "Create new users" and follow the prompts. Leave "Generate an access key for each User" selected.
  • Get the credentials:
      • Go to the new user's Security Credentials tab.
      • Click "Manage access keys".
      • Download the credentials for the access key that was created.
      • Save them somewhere, because no one will ever be able to download them again. (Though it's easy enough to create a new access key if you lose the old one's secret key.)
  • Get the new user's ARN (Amazon Resource Name) by going to the user's Summary tab. It'll look like this: "arn:aws:iam::123456789012:user/someusername"
  • Go to the bucket properties in the S3 management console.
  • Add a bucket policy that looks like this, but change "BUCKET-NAME" to the bucket name, and "USER-ARN" to your new user's ARN. The first statement makes the contents publicly readable (so you can serve the files on the web), and the second grants full access to the bucket and its contents to the specified user:

    {
        "Statement": [
            {
              "Sid":"PublicReadForGetBucketObjects",
              "Effect":"Allow",
              "Principal": {
                    "AWS": "*"
                 },
              "Action":["s3:GetObject"],
              "Resource":["arn:aws:s3:::BUCKET-NAME/*"
              ]
            },
            {
                "Action": "s3:*",
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:s3:::BUCKET-NAME",
                    "arn:aws:s3:::BUCKET-NAME/*"
                ],
                "Principal": {
                    "AWS": [
                        "USER-ARN"
                    ]
                }
            }
        ]
    }
    
  • If you need to add limited permissions for another user to do things with this bucket, you can add more statements. For example, if you want another user to be able to copy all the content from this bucket to another bucket:

        {
            "Action": "s3:ListBucket",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::BUCKET-NAME",
            "Principal": {
                "AWS": [
                    "USER-ARN"
                ]
            }
        }
    

That will let the user list the objects in the bucket. The bucket was already publicly readable, but not listable, so adding this permission will let the user sync from this bucket to another one where the user has full permissions.

Expected results:

  • The site can use the access key ID and secret key associated with the user's access key to access the bucket
  • The site will be able to do anything with that bucket
  • The site will not be able to do anything outside that bucket

S3 for Django static files

The simplest case is just using S3 to serve your static files. In Django, we say "static files" to refer to the fixed files that we provide and serve as part of our site - typically images, CSS, and JavaScript, and maybe some static HTML files. Static files do not include any files that might be uploaded by users of the site. We call those "media files".

Before continuing, you should be familiar with managing static files, the staticfiles app, and deploying static files in Django.

Also, your templates should never hard-code the URL path of your static files. Use the static tag instead:

      {% load static from staticfiles %}
      <img src="{% static 'images/rooster.png' %}"/>

That will use whatever the appropriate method is to figure out the right URL for your static files.

The two static tags

Django provides two template tags named static.

The first static is in the static templatetags library, and accessed using {% load static %}. It just puts the value of STATIC_URL in front of the path.

The one from staticfiles ({% load static from staticfiles %}) is smarter - it uses whatever storage class you've configured for static files to come up with the URL.

By using the one from staticfiles from the start, you'll be prepared for any storage class you might decide to use in the future.

Moving your static files to S3

In order for your static files to be served from S3 instead of your own server, you need to arrange for two things to happen:

  1. When you serve pages, any links in the pages to your static files should point at their location on S3 instead of your own server.
  2. Your static files are on S3 and accessible to the web site's users.

Part 1 is easy if you've been careful not to hardcode static file paths in your templates. Just change STATICFILES_STORAGE in your settings.

But you still need to get your files onto S3, and keep them up to date. You could do that by running collectstatic locally, and using some standalone tool to sync the collected static files to S3, at each deploy. But we won't be able to get away with such a simple solution for media files, so we might as well go ahead and set up the custom Django storage we'll need now, and then our collectstatic will copy the files up to S3 for us.

To start, install two Python packages: django-storages (yes, that's "storages" with an "S" on the end), and boto:

    $ pip install django-storages boto

Add 'storages' to INSTALLED_APPS:

    INSTALLED_APPS = (
          ...,
          'storages',
     )

If you want (optional), add this to your common settings:

    AWS_HEADERS = {  # see http://developer.yahoo.com/performance/rules.html#expires
        'Expires': 'Thu, 31 Dec 2099 20:00:00 GMT',
        'Cache-Control': 'max-age=94608000',
    }

That will tell boto that when it uploads files to S3, it should set properties on them so that when S3 serves them, it'll include those HTTP headers in the response. Those HTTP headers in turn will tell browsers that they can cache these files for a very long time.

Now, add this to your settings, changing the first three values as appropriate:

    AWS_STORAGE_BUCKET_NAME = 'BUCKET_NAME'
    AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxxxxxxxxxx'
    AWS_SECRET_ACCESS_KEY = 'yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy'

    # Tell django-storages that when coming up with the URL for an item in S3 storage, keep
    # it simple - just use this domain plus the path. (If this isn't set, things get complicated).
    # This controls how the `static` template tag from `staticfiles` gets expanded, if you're using it.
    # We also use it in the next setting.
    AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME

    # This is used by the `static` template tag from `static`, if you're using that. Or if anything else
    # refers directly to STATIC_URL. So it's safest to always set it.
    STATIC_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN

    # Tell the staticfiles app to use S3Boto storage when writing the collected static files (when
    # you run `collectstatic`).
    STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Only the first three lines should need to be changed for now.

CORS

One more thing you need to set up is CORS. CORS defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. Since we're going to be serving our static files and media from a different domain, if you don't take CORS into account, you'll run into mysterious problems, like Firefox not using your custom fonts for no apparent reason.

Go to your S3 bucket properties, and under "Permissions", click on "Add CORS Configuration". Paste this in:

    <CORSConfiguration>
        <CORSRule>
            <AllowedOrigin>*</AllowedOrigin>
            <AllowedMethod>GET</AllowedMethod>
            <MaxAgeSeconds>3000</MaxAgeSeconds>
            <AllowedHeader>Authorization</AllowedHeader>
        </CORSRule>
    </CORSConfiguration>

I won't bother to explain this, since there are plenty of explanations on the web that you can Google for. The tricky part is knowing you need to add CORS in the first place.

Try it

With this all set up, you should be able to upload your static files to S3 using collectstatic:

    python manage.py collectstatic

If you see any errors, double-check all the steps above.

Once that's successful, you should be able to start your test site and view some pages. Look at the page source and you should see that the images, CSS, and JavaScript are being loaded from S3 instead of your own server. Any media files should still be served as before.

Don't put this into production quite yet, though. We still have some changes to make to how we're doing this.

Moving Media Files to S3

Reminder: Django "media" files are files that have been uploaded by web site users, that then need to be served from your site. One example is a user avatar (an image the user uploads and the site displays with the user's information).

Media files are typically managed using FileField and ImageField fields on models. In a template, you use the url attribute on the file field to get the URL of the underlying file.

For example, if user.avatar is an ImageField on your user model, then

    <img src="{{ user.avatar.url }}">

would embed the user's avatar image in the web page.

By default, when a file is uploaded using a FileField or ImageField, it is saved to a file on a path inside the local directory named by MEDIA_ROOT, under a subdirectory named by the field's upload_to value. When the file's url attribute is accessed, it returns the value of MEDIA_URL, prepended to the file's path inside MEDIA_ROOT.

An example might help. Suppose we have these settings:

    MEDIA_ROOT = '/var/media/'
    MEDIA_URL = 'http://media.example.com/'

and this is part of our user model:

    avatar = models.ImageField(upload_to='avatars')

When a user uploads an avatar image, it might be saved as /var/media/avatars/12345.png. Then <img src="{{ user.avatar.url }}"> would expand to <img src="http://media.example.com/avatars/12345.png">.

Our goal is instead of saving those files to a local directory, to send them to S3. Then instead of having to serve them somehow locally, we can let Amazon serve them for us.

Another advantage of using S3 for media files is that if you scale up by adding more servers, the uploaded files are available to all servers at once.

Configuring Django media to use S3

Ideally, we'd be able to start putting new media files on S3 just by adding this to our settings:

    # DO NOT DO THIS!
    MEDIA_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN
    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Adding those settings would indeed tell Django to save uploaded files to our S3 bucket, and use our S3 URL to link to them.

Unfortunately, this would store our media files on top of our static files, which we're already keeping in our S3 bucket. If we were careful to always set upload_to on our FileFields to directory names that would never occur in our static files, we might get away with it (though I'm not sure Django would even let us). But we can do better.

What we want to do is either enforce storing our static files and media files in different subdirectories of our bucket, or use two different buckets. I'll show how to use the different paths first.

In order for our STATICFILES_STORAGE to have different settings from our DEFAULT_FILE_STORAGE, they need to use two different storage classes; there's no way to configure anything more fine-grained. So, we'll start by creating a custom storage class for our static file storage, by subclassing S3BotoStorage. We'll also define a new setting, so we don't have to hard-code the path in our Python code:

    # custom_storages.py
    from django.conf import settings
    from storages.backends.s3boto import S3BotoStorage

    class StaticStorage(S3BotoStorage):
        location = settings.STATICFILES_LOCATION

Then in our settings:

    STATICFILES_LOCATION = 'static'
    STATICFILES_STORAGE = 'custom_storages.StaticStorage'
    STATIC_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, STATICFILES_LOCATION)

Giving our class a location attribute of 'static' will put all our files into paths on S3 starting with 'static/'.

You should be able to run collectstatic again, restart your site, and now all your static files should have '/static/' in their URLs. Now delete from your S3 bucket any files outside of '/static' (using the S3 console, or whatever tool you like).

We can do something very similar now for media files, adding another storage class:

    class MediaStorage(S3BotoStorage):
        location = settings.MEDIAFILES_LOCATION

and in settings:

    MEDIAFILES_LOCATION = 'media'
    MEDIA_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, MEDIAFILES_LOCATION)
    DEFAULT_FILE_STORAGE = 'custom_storages.MediaStorage'

Now when a user uploads their avatar, it should go into '/media/' in our S3 bucket. When we display the image on a page, the image URL will include '/media/'.

Using different buckets

You can use different buckets for static and media files by adding a bucket_name attribute to your custom storage classes. You can see the whole list of attributes you can set by looking at the source for S3BotoStorage.
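
For example, a minimal sketch along those lines might look like the following (the STATICFILES_BUCKET and MEDIAFILES_BUCKET settings are illustrative names introduced here, not settings defined earlier in this post):

    # custom_storages.py (sketch -- the bucket settings below are illustrative)
    from django.conf import settings
    from storages.backends.s3boto import S3BotoStorage

    class StaticStorage(S3BotoStorage):
        location = settings.STATICFILES_LOCATION
        bucket_name = settings.STATICFILES_BUCKET  # e.g. 'example-static'

    class MediaStorage(S3BotoStorage):
        location = settings.MEDIAFILES_LOCATION
        bucket_name = settings.MEDIAFILES_BUCKET   # e.g. 'example-media'

If you do split files across two buckets, keep in mind that STATIC_URL and MEDIA_URL would each need to be built from the matching bucket's domain, rather than the single AWS_S3_CUSTOM_DOMAIN shown above.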

Moving an existing site's media to S3

If your site already has user-uploaded files in a local directory, you'll need to copy them up to your media directory on S3. There are lots of tools these days for doing this kind of thing. If the command line is your thing, try the AWS CLI tools from Amazon. They worked okay for me.
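
For example, a one-time copy of an existing local media directory up to the media path in your bucket might look something like this (the local path and bucket name are placeholders to adjust for your setup):

    aws s3 sync /var/media/ s3://BUCKET-NAME/media/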

Summary

Serving your static and media files from S3 requires getting a lot of different parts working together. But it's worthwhile for a number of reasons:

  • S3 can probably serve your files more efficiently than your own server.
  • Using S3 saves the resources of your own server for more important work.
  • Having media files on S3 allows easier scaling by replicating your servers.
  • Once your files are on S3, you're well on the way to using CloudFront to serve them even more efficiently using Amazon's CDN service.

Caktus GroupWebcast: Creating Enriching Web Applications with Django and Backbone.js

Update: The live webcast is now available at O'Reilly Media

Our technical director, Mark Lavin, will be giving a tutorial on Django and Backbone.js during a free webcast for O’Reilly Media tomorrow, November 6th, 1pm EST. There will be demos and a discussion of common stumbling blocks when building rich client apps.

Register today!

Here’s a description of his talk:

"Django and Backbone are two of the most popular frameworks for web backends and frontends respectively and this webcast will talk about how to use them together effectively. During the session we'll build a simple REST API with Django and connect it to a single page application built with Backbone. This will examine the separation of client and server responsibilities. We'll dive into the differences between client-side and server-side routing and other stumbling blocks that developers encounter when trying to build rich client applications.

If you're familiar with Python/Django but unfamiliar with Javascript frameworks, you'll get some useful ideas and examples on how to start integrating the two. If you're a Backbone guru but not comfortable working on the server, you'll learn how the MVC concepts you know from Backbone can translate to building a Django application."

Tim HopperSundry Links for November 3, 2014

Public Data Sets : Amazon Web Services: Amazon hosts a number of publicly available datasets on AWS (including the common crawl corpus and the "Marvel Universe Social Graph").

Rapid Web Prototyping with Lightweight Tools: I've shared this before, but my boss Andrew did a fantastic tutorial last year on Flask, Jinja2, MongoDB, and Twitter Bootstrap. Combined with Heroku, it's surprisingly easy to get a website running these days.

rest_toolkit: REST has been my obsession of late. Here's a little Python package for quickly writing RESTful APIs.

The Joys of the Craft: A quote from Fred Brooks' The Mythical Man-Month on why programming is fun.

How do I use pushd and popd commands?: I recently learned bash has pushd and popd commands for temporarily changing directories. This is very handy for scripting.

Tim HopperSundry Links for November 1, 2014

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!): I guess the title says it all. By Joel Spolsky.

Unix Shells - Hyperpolyglot: Very cool comparison of basic command syntax in Bash, Fish, Ksh, Tcsh, and Zsh.

Better Bash history: I'm pretty stuck on Bash at the moment. Here's a way to get a better history in Bash. (Other shells often improve on Bash's history.)

usaddress 0.1: I always love seeing a Python library for something I've tried to do poorly on my own: "usaddress is a python library for parsing unstructured address strings into address components, using advanced NLP methods."

more-itertools: A great extension to the helpful itertools module in Python. Some particularly helpful functions: chunked, first, peekaboo, and take. Unfortunately, it doesn't have Python 3 support at the moment.

Tim HopperPyspark's AggregateByKey Method

I can't find a (py)Spark aggregateByKey example anywhere, so I made a quick one.
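
As a rough illustration of what aggregateByKey does (a generic sketch, not the example from the post), it folds each value into a per-key accumulator with one function and merges accumulators with another:

# Generic illustration, not the example from the linked post:
# compute a per-key average in one pass with aggregateByKey.
from pyspark import SparkContext

sc = SparkContext("local", "aggregate-by-key-demo")
pairs = sc.parallelize([("a", 1), ("a", 3), ("b", 5)])

sums_and_counts = pairs.aggregateByKey(
    (0, 0),                                    # zero value: (sum, count)
    lambda acc, v: (acc[0] + v, acc[1] + 1),   # fold a value into an accumulator
    lambda a, b: (a[0] + b[0], a[1] + b[1]),   # merge two accumulators
)
averages = sums_and_counts.mapValues(lambda t: t[0] / float(t[1]))
averages.collect()  # e.g. [('a', 2.0), ('b', 5.0)]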

Tim HopperSundry Links for September 30, 2014

Hammock: A lightweight wrapper around the Python requests module to convert REST APIs into "dead simple programmatic APIs". It's a clever idea. I'll have to play around with it before I can come up with a firm opinion.

pipsi: Pipsi wraps pip and virtualenv to allow you to install Python command line utilities without polluting your global environment.

Writing a Command-Line Tool in Python: Speaking of Python command line utilities, here's a little post from Vincent Driessen on writing them.

Iterables vs. Iterators vs. Generators: Vincent has been on a roll lately. He also wrote this "little pocket reference on iterables, iterators and generators" in Python.

Design for Continuous Experimentation: Talk and Slides: I didn't watch the lecture, but Dan McKinley's slides on web experimentation are excellent.

Apache Spark: A Delight for Developers: I've been playing with PySpark lately, and it really is fun.

Caktus GroupCelery in Production

(Thanks to Mark Lavin for significant contributions to this post.)

In a previous post, we introduced using Celery to schedule tasks.

In this post, we address things you might need to consider when planning how to deploy Celery in production.

At Caktus, we've made use of Celery in a number of projects, ranging from simple tasks to send emails or create image thumbnails out of band, to complex workflows to catalog and process large (10+ GB) files for encryption and remote archival and retrieval. Celery has a number of advanced features (task chains, task routing, auto-scaling) to fit most task workflow needs.

Simple Setup

A simple Celery stack would contain a single queue and a single worker which processes all of the tasks as well as schedules any periodic tasks. Running the worker would be done with

python manage.py celery worker -B

This assumes you are using the django-celery integration, but there are plenty of docs on running the worker (locally as well as daemonized). We typically use supervisord, for which there is an example configuration, but init.d, upstart, runit, or god are all viable alternatives.

The -B option runs the scheduler for any periodic tasks. It can also be run as its own process. See starting-the-scheduler.

We use RabbitMQ as the broker, and in this simple stack we would store the results in our Django database or simply ignore all of the results.

Large Setup

In a large setup we would make a few changes. Here we would use multiple queues so that we can prioritize tasks, and for each queue, we would have a dedicated worker running with the appropriate level of concurrency. The docs have more information on task routing.

The beat process would also be broken out into its own process.

# Default queue
python manage.py celery worker -Q celery
# High priority queue. 10 workers
python manage.py celery worker -Q high -c 10
# Low priority queue. 2 workers
python manage.py celery worker -Q low -c 2
# Beat process
python manage.py celery beat

Note that high and low are just names for our queues, and don't have any implicit meaning to Celery. We allow the high queue to use more resources by giving it a higher concurrency setting.

Again, supervisor would manage the daemonization and group the processes so that they can all be restarted together. RabbitMQ is still the broker of choice. With the additional task throughput, the task results would be stored in something with high write speed: Memcached or Redis. If needed, these worker processes can be moved to separate servers, but they would have a shared broker and results store.

Scaling Features

Creating additional workers isn't free. The default concurrency uses a new process for each worker and creates a worker per CPU. Pushing the concurrency far above the number of CPUs can quickly pin the memory and CPU resources on the server.

For I/O heavy tasks, you can dedicate workers using either the gevent or eventlet pools rather than new processes. These can have a lower memory footprint with greater concurrency but are both based on greenlets and cooperative multi-tasking. If there is a library which is not properly patched or greenlet safe, it can block all tasks.
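
For example, a worker dedicated to an I/O-heavy queue might be run with the gevent pool and much higher concurrency (the queue name here is just an illustration):

# I/O heavy queue (illustrative name). gevent pool, 100 greenlets
python manage.py celery worker -Q io_heavy -P gevent -c 100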

There are some notes on using eventlet, though we have primarily used gevent. Not all of the features are available on all of the pools (time limits, auto-scaling, built-in rate limiting). Previously gevent seemed to be the better supported secondary pool, but eventlet seems to have closed that gap or surpassed it.

The process and gevent pools can also auto-scale. It is less relevant for the gevent pool since the greenlets are much lighter weight. As noted in the docs, you can implement your own subclass of the Autoscaler to adjust how/when workers are added or removed from the pool.

Common Patterns

Task state and coordination is a complex problem. There are no magic solutions whether you are using Celery or your own task framework. The Celery docs have some good best practices which have served us well.

Tasks must assert the state they expect when they are picked up by the worker. You won't know how much time has passed between when the original task was queued and when it executes. Another similar task might have already carried out the operation if there is a backlog.

We make use of a shared cache (Memcache/Redis) to implement task locks or rate limits. This is typically done via a decorator on the task. One example is given in the docs though it is not written as a decorator.
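
As a rough sketch of that pattern (the lock-key scheme and timeout below are assumptions for illustration, not the example from the Celery docs), such a decorator might look like:

from functools import wraps
from django.core.cache import cache

def skip_if_running(timeout=300):
    """Skip a task invocation if another worker already holds its lock."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # Illustrative lock-key scheme; adapt to your tasks.
            lock_id = "task-lock-%s" % func.__name__
            # cache.add only succeeds if the key doesn't already exist,
            # which gives an atomic "acquire" on Memcached or Redis.
            if not cache.add(lock_id, "locked", timeout):
                return None
            try:
                return func(*args, **kwargs)
            finally:
                cache.delete(lock_id)
        return wrapper
    return decorator

The decorator would sit underneath the task decorator (e.g. @task above @skip_if_running()) so the lock is checked each time the task body runs.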

Key Choices

When getting started with Celery you must make two main choices:

  • Broker
  • Result store

The broker manages pending tasks, while the result store stores the results of completed tasks.

There is a comparison of the various brokers in the docs.

As previously noted, we use RabbitMQ almost exclusively, though we have used Redis successfully and experimented with SQS. We prefer RabbitMQ because Celery's message passing style and much of the terminology was written with AMQP in mind. There are no caveats with RabbitMQ like there are with Redis, SQS, or the other brokers which have to emulate AMQP features.

The major caveat with both Redis and SQS is the lack of built-in late acknowledgment, which requires a visibility timeout setting. This can be important when you have long running tasks. See acks-late-vs-retry.

To configure the broker, use BROKER_URL.

For the result store, you will need some kind of database. A SQL database can work fine, but using a key-value store can help take the load off of the database, as well as provide easier expiration of old results which are no longer needed. Many people choose to use Redis because it makes a great result store, a great cache server and a solid broker. AMQP backends like RabbitMQ are terrible result stores and should never be used for that, even though Celery supports it.

Results that are not needed should be ignored, using CELERY_IGNORE_RESULT or Task.ignore_result.

To configure the result store, use CELERY_RESULT_BACKEND.
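
Putting those pieces together, a minimal configuration might look something like this (the hostnames and credentials are placeholders, not values from this post):

# settings.py -- placeholder hosts/credentials, adjust for your environment
BROKER_URL = "amqp://guest:guest@localhost:5672//"    # RabbitMQ broker
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"    # Redis result store
CELERY_IGNORE_RESULT = False  # or True, if you never read task results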

RabbitMQ in production

When using RabbitMQ in production, one thing you'll want to consider is memory usage.

With its default settings, RabbitMQ will use up to 40% of the system memory before it begins to throttle, and even then can use much more memory. If RabbitMQ is sharing the system with other services, or you are running multiple RabbitMQ instances, you'll want to change those settings. Read the linked page for details.

Transactions and Django

You should be aware that Django's default handling of transactions can be different depending on whether your code is running in a web request or not. Furthermore, Django's transaction handling changed significantly between versions 1.5 and 1.6. There's not room here to go into detail, but you should review the documentation of transaction handling in your version of Django, and consider carefully how it might affect your tasks.

Monitoring

There are multiple tools available for keeping track of your queues and tasks. I suggest you try some and see which work best for you.

Summary

When going to production with your site that uses Celery, there are a number of decisions to be made that could be glossed over during development. In this post, we've tried to review some of the decisions that need to be thought about, and some factors that should be considered.

Tim HopperiOS's Launch Center Pro, Auphonic, and RESTful APIs

Lately I've been using Auphonic's web service for automating audio post-production and distribution. You can provide Auphonic with an audio file (via Dropbox, FTP, web upload, and more), and it will perform any number of tasks for you, including

  • Tag the track with metadata (including chapter markings)
  • Intelligently adjust levels
  • Normalize loudness
  • Reduce background noise and hums
  • Encode the audio in numerous formats
  • Export the final production to a number of services (including Dropbox, FTP, and Soundcloud)

I am very pleased with Auphonic's product, and it's replaced a lot of post-processing tools I tediously hacked together with Hazel, Bash, and Python.

Among its many other features, Auphonic has a robust RESTful API available to all users. I routinely create Auphonic productions that vary only in basic metadata, and I have started using this API to automate creation of productions from my iPhone.

Launch Center Pro is a customizable iOS app that can trigger all kinds of actions in other apps. You can also create input forms in LCP and send the data from them elsewhere. I created a LCP action with a form for entering title, artist, album, and track metadata that will eventually end up in a new Auphonic production.

The LCP action settings look like this2:

When I launch that action in LCP, I get four prompts like this:

After I fill out the four text fields, LCP uses the x-callback URL I defined to send that data to Pythonista, a masterful "integrated development environment for writing Python scripts on iOS."

In Pythonista, I have a script called New Production. LCP passes the four metadata fields I entered as sys.argv variables to my Python script. The Python script adds these variables to a metadata dictionary that it then POSTs to the Auphonic API using the Python requests library. After briefly displaying the output from the Auphonic API, Pythonista returns me to LCP.

Here's my Pythonista script1:

username = "AUPHONIC_USERNAME"
password = "AUPHONIC_PASSWORD"

import sys
import requests
import webbrowser
import time
import json
import datetime as dt

# Read input from LCP
title = sys.argv[1]
artist = sys.argv[2]
album = sys.argv[3]
track = sys.argv[4]

d = {
        "metadata": {
            "title": title,
            "artist": artist,
            "album": album,
            "track": track
            }
    }

# POST production to Auphonic API
r = requests.post("https://auphonic.com/api/productions.json",
          auth=(username,password),
          data=json.dumps(d),
          headers={"content-type": "application/json"}).json()

# Display API Response
print "Response", r["status_code"]
print "Error:", r["error_message"]
for key, value in r.get("data",{}).get("metadata",{}).iteritems():
    if value:
        print key, ':', value

time.sleep(2)
# Return to LCP
webbrowser.open("launch://")

After firing my LCP action, I can log into my Auphonic account and see an incomplete3 production with the metadata I entered!

While I just specified some basic metadata with the API, Auphonic allows every parameter that can be set on the web client to be configured through the API. For example, you can specify exactly what output files you want Auphonic to create, or create a production using one of your presets. These details just need to be added to the d dictionary in the script above. Moreover, this same type of setup could be used with any RESTful API, not just Auphonic.


  1. If you want to use this script, you'll have to provide your own Auphonic username and password. 

  2. Here is that x-callback URL if you want to copy it: pythonista://{{New Production}}?action=run&args={{"[prompt:Title]" "[prompt:Artist]" "[prompt:Album]" "[prompt:Track]"}} 

  3. It doesn't have an audio file and hasn't been submitted. 

Tim HopperSundry Links for September 25, 2014

Philosophy of Statistics (Stanford Encyclopedia of Philosophy): I suspect that a lot of the Bayesian vs Frequentist debates ignore the important epistemological underpinnings of statistics. I haven’t finished reading this yet, but I wonder if it might help.

Connect Sunlight Foundation to anything: “The Sunlight Foundation is a nonpartisan non-profit organization that uses the power of the Internet to catalyze greater U.S. Government openness and transparency.” They now have an IFTTT channel. Get push notifications when the president signs a bill!

furbo.org · The Terminal: Craig Hockenberry wrote a massive post on how he uses the Terminal on OS X for fun and profit. You will learn things.

A sneak peek at Camera+ 6… manual controls are coming soon to you! : I’ve been a Camera+ user on iOS for a long time. The new version coming out soon is very exciting.

GitHut - Programming Languages and GitHub: A very clever visualization of various languages represented on Github and of the properties of their respective repositories.

Og MacielBooks

Woke up this morning and, as usual, sat down to read the Books section of The New York Times while drinking my coffee. This has become sort of a ‘tradition’ for me and because of it I have been able to learn about many interesting books, some of which I would not have found out about on my own. I also ‘blame’ this activity for turning my nightstand into a mini-library of its own.

Currently I have the following books waiting for me:

Anyhow, while drinking my coffee this morning I realized just how much I enjoy reading and (what I like to call) catching up: revisiting books I read when I was younger but took for granted, and finally getting to those books that have been so patiently waiting for me. And now, whenever I’m not working or with my kids, you can bet your bottom dollar that you’ll find me somewhere outside (when the mosquitos are not buzzing about the yard) or cozily nestled with a book (or two) somewhere quiet around the house.

Book Queue

But to the point of this story, today I realized that, if I could go back in time (which reminds me, I should probably add “The Time Machine” to my list) to the days when I was looking to buy a house, I would have done two things differently:

  1. wired the entire house so that every room would have a couple of ethernet ports;
  2. chosen a house with a large-ish room and added wall-to-wall bookcases, like you see in those movies where a well-off person takes their guests into their private library for tea and biscuits;

I realize that I can’t change the past, and I also realize that perhaps it is a good thing that I took my book reading for granted during my high school and university years… I don’t think I would have enjoyed reading “Dandelion Wine” or “Mrs. Dalloway” back then as much as I did when I finally read them. I guess reading books is very much like the process of making good wines… with age and experience, the reader, not the book, develops the maturity and ability to properly savor a good story.

Tim HopperSundry Links for September 20, 2014

Open Sourcing a Python Project the Right Way: Great stuff that should be taught in school: “Most Python developers have written at least one tool, script, library or framework that others would find useful. My goal in this article is to make the process of open-sourcing existing Python code as clear and painless as possible.”

elasticsearch/elasticsearch-dsl-py: Elasticsearch is an incredible datastore. Unfortunately, its JSON-based query language is tedious, at best. Here’s a nice higher-level Python DSL being developed for it. It’s great!

Equipment Guide — The Podcasting Handbook: Dan Benjamin of 5by5 podcasting fame is writing a book on podcasting. Here’s his brief equipment guide.

bachya/pinpress: Aaron Bach put together a neat Ruby script that he uses to generate his link posts. This is similar to but better than my sundry tool.

Markdown Resume Builder: I haven’t tried this yet, but I like the idea: a Markdown based resume format that can be converted into HTML or PDF.

Git - Tips and Tricks: Enabling autocomplete in Git is something I should have done long ago.

Apache Storm Design Pattern—Micro Batching: Micro batching is a valuable tool when doing stream processing. Horton Works put up a helpful post outlining three ways of doing it.

Caktus GroupImproving Infant and Maternal Health in Rwanda and Zambia with RapidSMS

Image courtesy of UNICEF, the funders of this project.

I have had the good fortune of working internationally on mobile health applications due to Caktus' focus on public health. Our public health work often uses RapidSMS, a free and open-source Django powered framework for dynamic data collection, logistics coordination and communication, leveraging basic short message service (SMS) mobile phone technology. I was able to work on two separate projects tracking data related to the 1000 days between a woman’s pregnancy and the child’s second birthday. Monitoring mothers and children during this time frame is critical as there are many factors that, when monitored properly, can decrease the mortality rates for both mother and child. Both of these projects presented interesting challenges and resulted in a number of takeaways worth further discussion.

Zambia

The first trip took me to Lusaka, the capital of Zambia, to work on Saving Mothers Giving Life (SMGL), which is administered by the Zambia Center for Applied Health Research and Development (ZCAHRD) office. The ZCAHRD office had recently finished a pilot phase, resulting in a number of additional requirements to implement before expanding the project. In addition to feature development and bug fixes, training a local developer was on the docket.

SMGL collects maternal and fetal/child data via SMS text messages.  When an SMS is received by the application, the message is parsed and routed for additional processing based on matching known keywords. For example, I could have a BirthHandler KeywordHandler that allows the application to track new births. Any message that begins with the keyword birth would be further processed by BirthHandler. KeywordHandlers must have, at a minimum, a defined keyword, help and handler functionality:

from rapidsms.contrib.handlers import KeywordHandler

class BirthHandler(KeywordHandler):
    keyword = "birth"

    def help(self):
        self.respond("Send BIRTH BOY or BIRTH GIRL.")

    def handle(self, text):
        if text.upper() == "BOY":
            self.respond("A boy was born!")
        elif text.upper() == "GIRL":
            self.respond("A girl was born!")
        else:
            self.help()

An example session:

 > birth 
 < Send BIRTH BOY or birth GIRL. 
 > birth boy 
 < A boy was born! 
 > birth girl
 < A girl was born!
 > birth pizza
 < Send BIRTH BOY or BIRTH GIRL.

New Keyword Handlers

The new syphilis keyword handler would allow clinicians to track a mother’s testing and treatment data. For our handler, a user supplies the SYP keyword, the mother’s ID, and the date of the visit, followed by the test result indicator or shot series and an optional next shot date:

  SYP ID DAY MONTH YEAR P/N/S[1-3] NEXT_DAY NEXT_MONTH NEXT_YEAR

To record a positive syphilis test result on January 1, 2013 for mother #1, with a next shot date of January 2, 2013, the following SMS would be sent:

  SYP 1 01 01 2013 P 02 01 2013
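
A minimal sketch of such a handler, following the same KeywordHandler pattern as BirthHandler above (the parsing, validation, and response text here are illustrative assumptions, not the actual SMGL implementation):

from datetime import date
from rapidsms.contrib.handlers import KeywordHandler

class SyphilisHandler(KeywordHandler):
    # Illustrative sketch only -- not the actual SMGL handler.
    keyword = "syp"

    def help(self):
        self.respond("Send SYP ID DAY MONTH YEAR P/N/S1-3 "
                     "NEXT_DAY NEXT_MONTH NEXT_YEAR")

    def handle(self, text):
        tokens = text.split()
        if len(tokens) not in (5, 8):
            return self.help()
        mother_id, day, month, year, result = tokens[:5]
        visit_date = date(int(year), int(month), int(day))
        next_date = None
        if len(tokens) == 8:
            next_date = date(int(tokens[7]), int(tokens[6]), int(tokens[5]))
        # A real handler would validate the result code, look up the mother,
        # and save a visit record (including next_date) before responding.
        self.respond("Thank you! Recorded syphilis visit for mother %s." % mother_id)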

With these records in hand, the system’s periodic reminder application will send notifications to remind patients of their next visit. Similar functionality exists for tracking pre- and post-natal visits.

The other major feature implemented for this phase was a referral workflow.  It is critical for personnel at facilities ranging from the rural village to the district hospital to be aware of incoming patients with both emergent and non-emergent needs, as the reaction to each case differs greatly.  The format for SMGL referrals is as follows:

  REFER ID FACILITY_ID REASON TIME EM/NEM

To refer mother #1 who is bleeding to facility #100 and requires emergency care:

  REFER 1 100 B 1200 EM

Based on the receiving facility, the reason, and the emergent indicator, different people will be notified of the case. Emergent cases require dispatching ambulances, prepping receiving facilities, and other essential components to increase the survivability for the mother and/or child, whereas non-emergent cases may only require clinical workers to be made aware of an inbound patient.

Reporting

The reporting tools were fairly straightforward: web-based views for each keyword handler that present the data in a filterable, sortable, tabular format. In addition, end users can export the data as a spreadsheet for further analysis. These views give clinicians, researchers, and other stakeholders easily accessible metrics for analyzing the efficacy of the system as a whole.



Training

As mentioned earlier, training a local developer was also a core component of this visit.  This person was the office’s jack of all trades for all things technical, from network and systems administration to shuttling around important data on thumb drives. Given his limited exposure to Python, we spent most of the time in country pair programming, talking through the model-view-template architecture and finding bite sized tasks for him to work through when not pair programming.

Zambia Takeaways:

  • It was relatively straightforward to write one-off views and exporters for the keyword handlers. But, as the number of handlers increases for the project, this functionality could benefit from being abstracted into a generic DRY reporting tool.
  • When training, advocate that the participant has either 100% of his time allocated or draw up designated blocks of time during the day. The ad hoc schedule we worked with was not as fruitful as it could have been, as competing responsibilities often took precedence over actual Django/RapidSMS training.
  • If in Zambia, there are two requisite weekend trips: Victoria Falls and South Luangwa National Park. Visitors to Zambia do themselves a great disservice to not schedule trips to both areas.

Off to Rwanda!

UNICEF recognized that many countries were working on solving the same problem: monitoring the patients and capturing the data from those all-important first 1000 Days. A 1000 Days initiative was put forward, whereby countries would contribute resources and code to a single open source platform that all countries could deploy independently. Evan Wheeler, a UNICEF Project Manager, contacted Caktus about contributing to this project.

We were tasked with building three RapidSMS components of the 1000 Days architecture: an appointment application, a patient/provider API for storing and accessing records from different backends, and a nutrition monitoring application. We would flesh out these applications before our in-country visit to Kigali, Rwanda. While there, working closely with Evan and our in-country counterparts, we would finish the initial versions of these applications as well as orient the local development team to the future direction of the 1000 Days deployment.

rapidsms-appointments  allows users to subscribe to a series of appointments based on a timeline of configurable milestones. Appointment reminders are sent out to patient/staff, and there are mechanisms for confirming, rescheduling, and tracking missed/made appointments. The intent of this application was to create an engine for generating keyword handlers based on appointments. Rather than having to write code for each individual timeline based series (pre- and post-natal mother visits, for example), one could simply configure these through the admin panel. The project overview documentation provides a great entry point.

rapidsms-healthcare obviates the need for countries to track patient/provider data in multiple databases. Many countries utilize third-party datastores, such as OpenMRS, to create a medical records system. With rapidsms-healthcare in 1000 Days, deployments can take advantage of pre-existing patient and provider data by utilizing a default healthcare storage backend, or by creating a custom backend for their existing datastore. Additional applications can then utilize the healthcare API to access patients and providers.

rapidsms-nutrition is an example of such a library.  It will consume patient data from the healthcare API and monitor child growth, generating statistical assessments based on WHO Child Growth Standards. It utilizes the pygrowup library. With this data in hand, it is relatively easy to create useful visualizations with a library such as d3.js.

Rwanda Takeaways

  • Rwanda is gorgeous.  We had an amazing time in Kigali and at Lake Kivu, one of three EXPLODING LAKES in the world.

No report on Africa would be complete without a few pictures...enjoy!!

 

Tim HopperQuickly Converting Python Dict to JSON

Recently, I've spent a lot of time going back and forth between Python dicts and JSON. For some reason, I decided last week that it'd be useful to be able to quickly convert a Python dict to pretty-printed JSON.

I created a TextExpander snippet that takes a Python dict from the clipboard, converts it to JSON, and pastes it.

Here are the details:

#!/usr/bin/env python
import os, json
import subprocess

def getClipboardData():
    p = subprocess.Popen(['pbpaste'], stdout=subprocess.PIPE)
    retcode = p.wait()
    data = p.stdout.read()
    return data

cb = eval(getClipboardData())

print json.dumps(cb, sort_keys=True, indent=4, separators=(',', ': '))
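
For example, with {'b': 1, 'a': [1, 2]} on the clipboard, running the snippet pastes:

{
    "a": [
        1,
        2
    ],
    "b": 1
}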

Caktus GroupQ3 Charitable Giving

Our client social impact projects continue here at Caktus, with work presently being done in Libya, Nigeria, Syria, Turkey, Iraq and the US. But every quarter, we pause to consider the excellent nonprofits that our employees volunteer for and, new this quarter, that they have identified as having a substantive influence on their lives. The following list represents employee-nominated nonprofits which we are giving to in alphabetical order:

Animal Protection Society of Durham

apsofdurham.org
The Animal Protection Society of Durham (APS) is a non-profit organization that has been helping animals in our community since 1970, and has managed the Durham County Animal Shelter since 1990. APS feeds, shelters, and provides medical attention for nearly 7,000 stray, surrendered, abandoned, abused, and neglected animals annually.

The Carrack

thecarrack.org
The Carrack is owned and run by the community, for the community, and maintains an indiscriminate open forum that enables local artists to perform and exhibit outside of the constraints of traditional gallery models, giving the artist complete creative freedom.

Scrap Exchange

scrapexchange.org
The Scrap Exchange is a nonprofit creative reuse center in Durham, North Carolina whose mission is to promote creativity and environmental awareness. The Scrap Exchange provides a sustainable supply of high-quality, low-cost materials for artists, educators, parents, and other creative people.

Society for the Prevention of Cruelty to Animals - San Francisco

sfspca.org
As the fourth oldest humane society in the U.S. and the founders of the No-Kill movement, the SF SPCA has always been at the forefront of animal welfare. SPCA SF’s animal shelter provides pets for adoption.

Southern Coalition for Social Justice

southerncoalition.org
The Southern Coalition for Social Justice was founded in Durham, North Carolina by a multidisciplinary group, predominantly people of color, who believe that families and communities engaged in social justice struggles need a team of lawyers, social scientists, community organizers and media specialists to support them in their efforts to dismantle structural racism and oppression.

Tim HopperSundry Links for September 10, 2014

textract: textract is a Python module and a command line tool for text extraction from many file formats. It cleverly pulls together many libraries into a consistent API.

Flask Kit: I've been reading a lot about Flask (the Python web server) lately. Flask Kit is a little tool to give some structure to new Flask projects.

cookiecutter: I was looking for this recently, but I couldn't find it. "A command-line utility that creates projects from cookiecutters (project templates). E.g. Python package projects, jQuery plugin projects." There's even a Flask template!

Over 50? You Probably Prefer Negative Stories About Young People: A research paper from a few years ago shows that older people prefer to read negative news about young people. "In fact, older readers who chose to read negative stories about young individuals actually get a small boost in their self-esteem."

Episode 564: The Signature: The fantastic Planet Money podcast explains why signatures are meaningless in a modern age. My scribbles have become even worse since listening to this.

github-selfies: Here's a Chrome and Firefox extension that allows you to quickly embed gif selfies in Github posts. Caution: may lead to improved team morale.

Caktus GroupDjangoCon 2014: Recap

Caktus had a great time at DjangoCon in Portland this year! We met up with old friends and new. The following staff gave talks (we’ll update this post with videos as soon as they’re available):

We helped design the website, so it was gratifying to see the hard work of our design team displayed on the program ad and at various points throughout the conference.

Fellow attendees probably noticed our giant inflatable duck, who came out in support of Duckling, our conference outings app. He told us he had a good time too.

Here’s some pictures of our team at DjangoCon:


Update with Caktus DjangoCon talk video links:

  • Anatomy of a Django Project
  • REST: It's Not Just for Servers
  • What is the Django Admin Good For?

Tim HopperTracking Weight Loss with R, Hazel, Withings, and IFTTT

As I have noted before, body weight is a noisy thing. Day to day, your weight will probably fluctuate by several pounds. If you're trying to lose weight, this noise can cause unfounded frustration and premature excitement.

When I started a serious weight loss plan a year and a half ago, I bought a wifi-enabled Withings Scale. The scale allows me to automatically sync my weight with Monitor Your Weight, MyFitnessPal, RunKeeper, and other fitness apps on my phone. IFTTT also has great Withings support allowing me to push my weight to various other web services.

One IFTTT rule I have appends my weight to a text file in Dropbox. This file looks like this:

263.86 August 21, 2014 at 05:56AM
264.62 August 22, 2014 at 08:27AM
264.56 August 23, 2014 at 09:41AM
263.99 August 24, 2014 at 08:02AM
265.64 August 25, 2014 at 08:08AM
267.4 August 26, 2014 at 08:16AM
265.25 August 27, 2014 at 09:08AM
264.17 August 28, 2014 at 07:21AM
264.03 August 29, 2014 at 08:43AM
262.71 August 30, 2014 at 08:47AM

For a few months, I have been experimenting with using this time series to give myself a less-noisy update on my weight, and I've come up with a decent solution.

This R script will take my weight time series, resample it, smooth it with a rolling median over the last month, and write summary stats to a text file in my Dropbox. It's not the prettiest script, but it gets the job done for now.1

INPUT_PATH <- "~/Dropbox/Text Notes/Weight.txt"
OUTPUT_PATH <- "~/Dropbox/Text Notes/Weight Stats.txt"

library(lubridate)
library(ggplot2)
library(zoo)

# READ FILE
con <- file(INPUT_PATH, "rt")
lines <- readLines(con)
close(con)

# PARSE INTO LISTS OF WEIGHTS AND DATES
parse.line <- function(line) {
  s <- strsplit(line, split=" ")[[1]]
  date.str <- paste(s[2:10][!is.na(s[2:10])], collapse=" ")
  date <- mdy_hm(date.str, quiet=TRUE)
  l <- list(as.numeric(s[1]), date)
  names(l) <- c("weight", "date")
  l
}
list.weight.date <- lapply(lines, parse.line)
weights <- lapply(list.weight.date, function(X) X$weight)
dates <- lapply(list.weight.date, function(X) X$date)

# BUILD DATA FRAME
df <- data.frame(weight = unlist(weights), date = do.call("c", dates) )

# CREATE TIME SERIES AND RESAMPLE
ts <- zoo(c(df$weight), df$date)
ts <- aggregate(ts, time(ts), tail, 1)
g <- round(seq(start(ts), end(ts), 60 * 60 * 24), "days")
ts <- na.approx(ts, xout = g)

# FUNCTION TO GET WEIGHT N-DAYS AGO IF WEIGHT IS SMOOTHED BY ROLLING MEDIAN
# OVER A GIVEN (smooth.n) NUMBER OF DAYS
days.ago <- function(days, smooth.n) {
  date <- head(tail(index(ts),days + 1),1)
  smoothed <- rollmedianr(ts, smooth.n)
  as.numeric(smoothed[date])
}

# SMOOTH WEIGHT BY 29 DAYS AND GENERATE SOME SUMMARY STATS
days = 29
current.weight <- days.ago(0, days)
x <- c(current.weight,
       current.weight-days.ago(7, days),
       current.weight-days.ago(30, days),
       current.weight-days.ago(365, days),
       current.weight-max(ts))
x = round(x, 1)
names(x) = c("current", "7days", "30days", "365days", "max")


fileConn<-file(OUTPUT_PATH)
w <- c(paste("Weight (lbs):", x["current"]),
       paste("Total Δ:", x["max"]),
       paste("1 Week Δ:", x["7days"]),
       paste("1 Month Δ:", x["30days"]),
       paste("1 Year Δ:", x["365days"]))
writeLines(w,fileConn)
close(fileConn)

The output looks something like this:

Weight (lbs): 265.7
Total Δ: -112
1 Week Δ: -0.8
1 Month Δ: -4.8
1 Year Δ: -75

I want this script to be run every time my weight is updated, so I created a second IFTTT rule that will create a new file in my Dropbox, called new_weight_measurement, every time I weigh in. On my Mac Mini, I have a Hazel rule to watch for a file of this name to be created. When Hazel sees the file, it runs my R script and deletes that file.

My Hazel rule looks like this:

The 'embedded script' that is run is the R script above; I just have to tell Hazel to use the Rscript shell.2

At this point, every time I step on my scale, a text file with readable statistics about my smoothed weight appears in my Dropbox folder.

Of course, I want this updated information to be pushed directly to me. Hazel is again the perfect tool for the job. I have a second Hazel rule that watches for Weight Stats.txt to be created. Hazel can pass the path of the updated file into any script of your choice. You could, for example, use Mailgun to email it to yourself or Pushover to push it to your mobile devices. Obviously, I want to tweet mine.

I have a Twitter account called @hopsfitness where I've recently been tracking my fitness progress. On my Mac Mini, I have t configured to access @hopsfitness from the command line. Thus, tweeting my updated statistics is just a matter of a little shell script executed by Hazel:
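
The shell script itself appears in the original post as a screenshot. As a rough Python equivalent (a sketch, not the actual script), assuming Hazel passes the path of the updated Weight Stats.txt as the first argument and that t is already authorized for @hopsfitness:

#!/usr/bin/env python
# Sketch of the Hazel-triggered step, not the script from the post.
# Hazel passes the matched file's path as the first argument.
import subprocess
import sys

with open(sys.argv[1]) as f:
    stats = f.read().strip()

# `t update` posts a status from the currently active t account
subprocess.check_call(['t', 'update', stats])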

Since this data goes to Twitter, I can get it painlessly pushed to my phone: Twitter still allows you to subscribe to accounts via text message, which I've done with @hopsfitness. A minute or so after I step on my scale, I get a text with useful information about where I am and where I'm going; this is much preferable to the noisy weight I see on my scale.


  1. This assumes your input file is formatted like mine, but you could easily adjust the first part of the code for other formats. 

  2. You can download R here; installing it should add Rscript to your system path. 

Tim HopperSundry Links for August 30, 2014

Ggplot2 To Ggvis: I'm a huge fan of ggplot2 for data visualization in R. Here's a brief tutorial for ggplot2 users to learn ggvis for generating interactive plots in R using the grammar of graphics.

From zero to storm cluster for scikit-learn classification | Daniel Rodriguez: This is a very cool, if brief, blog post on using streamparse, my company's open source wrapper for Apache Storm, and scikit-learn, my favorite machine learning library, to do machine learning on data streams.

Pythonic means idiomatic and tasteful: My boss Andrew recently shared an old blogpost he wrote on what it means for code to be Pythonic; I think he's right on track.

Pythonic isn’t just idiomatic Python — it’s tasteful Python. It’s less an objective property of code, more a compliment bestowed onto especially nice Python code.

git workflow: In my ever continuing attempt to be able to run my entire life from Alfred, I recently installed this workflow that makes git repositories on my computer easily searchable.

Alfred-Workflow: Speaking of Alfred, here's a handy Python library that makes it easy to write your own (if you're a Python programmer).

Squashing commits with rebase: Turns out you can use git rebase to clean up your commits before you push them to a remote repository. This can be a great way to make the commits your team sees more meaningful; don't abuse it.

Tim HopperSundry Links for August 28, 2014

How do I generate a uniform random integer partition?: This week, I wanted to generate random partitions of integers. Unsurprisingly, stackoverflow pulled through with a Python snippet to do just that.

Firefox and Chrome Bookmarks: I love Alfred as a launcher in OS X. I use it many, many times a day. I just found this helpful workflow for quickly searching and opening my Chrome bookmarks.

YNAB for iPad is Here: YNAB has been the best thing to ever happen to my financial life. I use it to track all my finances. They just released a beautiful iPad app. Importantly, it brings the ability to modify a budget to mobile!

Distributed systems theory for the distributed systems engineer: I work on distributed systems these days. I need to read some of these papers.

Tim HopperKeeping IPython Notebooks Running in the Background

I spend a lot of time in IPython Notebooks for work. One of the few annoyances of IPython Notebooks is that they require keeping a terminal window open to run the notebook server and kernel. I routinely launch a Notebook kernel in a directory where I keep my work-related notebooks. Earlier this week, I started to wonder if there was a way for me to keep this kernel running all the time without having to keep a terminal window open.

If you've ever tried to do cron-like automation on OS X, you've surely come across launchd, "a unified, open-source service management framework for starting, stopping and managing daemons, applications, processes, and scripts". You've probably also gotten frustrated with launchd and given up.

I recently started using LaunchControl "a fully-featured launchd GUI" for launchd; it's pretty nice and worth $10. It occurred to me that LaunchControl would be a good way to keep my Notebook kernel running in the background.

I created a LaunchControl job to run the following command.

/usr/local/bin/IPython notebook --matplotlib inline --port=9777 --browser=false

This launches an IPython Notebook kernel accessible on port 9777; setting the browser flag to something other than an installed browser prevents a browser window from opening when the kernel is launched.

I added three other launchd keys in LaunchControl:

  • A Working Directory key to tell LaunchControl to start my notebook in my desired folder.
  • A Run At Load key to tell it to start my kernel as soon as I load the job.
  • And a Keep alive key to tell LaunchControl to restart my kernel should the process ever die.

Here's how it looks in LaunchControl:

After I created it, I just had to save and load, and I was off to the races; the IPython kernel starts and runs in the background. I can access my Notebooks by navigating to 127.0.0.1:9777 in my browser. Actually, I added 127.0.0.1 parsely.scratch to my hosts file so I can access my Notebooks at parsely.scratch:9777. This works nicely with Chrome's autocomplete feature. I'm avoiding the temptation to run nginx and give it an even prettier url.

Tim HopperSundry Links for August 25, 2014

How can I pretty-print JSON at the command line?: I needed to pretty print some JSON at the command line earlier today. The easiest way might be to pipe it through python -m json.tool.

Integrating Alfred & Keyboard Maestro: I love Keyboard Maestro for automating all kinds of things on my Mac, but I'm reaching a limit of keyboard shortcuts I can remember. Here's an Alfred workflow for launching macros instead.

streamparse 1.0.0: My team at Parsely is building a tool for easily writing Storm topologies (for processing large volumes of streaming data) in Python. We just released 1.0.0!

TextExpander Tools: Brett Terpstra, the king of Mac hacks, has some really handy tools for TextExpander.

GNU Parallel: GNU parallel is a shell tool for executing jobs in parallel using one or more computers using xargs-like syntax. Pretty cool. HT http://www.twitter.com/oceankidbilly.

Tim HopperSundry Links for August 23, 2014

Arrow: better dates and times for Python: Arrow is a slick Python library "that offers a sensible, human-friendly approach to creating, manipulating, formatting and converting dates, times, and timestamps". It's a friendly alternative to datetime.

Docker via Homebrew: I'm starting to use Docker ("Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications") on occasion. Here are easy install instructions for Mac users.

Mining Massive Datasets MOOC: I'm terrible at completing MOOCs, but I'm really interested in this new one on Mining Massive Datasets.

URL Pinner - Chrome Web Store: URL Pinner is one of my favorite Chrome Extensions. I use it to automatically pin my Gmail and Rdio windows (which I almost always have open).

Using multitail for monitoring multiple log files: If you work with distributed systems, you're probably used to SSH-ing into multiple machines to access logs. Multitail might save you some time.

Saturday Morning Breakfast Cereal: SMBC shows how job interviews would go if we were more honest.

Tim HopperSundry Links for August 23, 2014

  • Remove Styles (ie, make the clipboard plain text – not applicable to variables).
  • Set line endings to Mac, Unix or Windows/DOS.
  • Trim Whitespace.
  • Hard wrap or unwrap paragraphs.
  • Lowercase (all characters), Lowercase First (just the first character).
  • Uppercase (all characters), Uppercase First (just the first character).
  • Capitalize (all words) or Title Case (intelligently uppercase certain first letters).
  • Change quotes to Smart, Dumb or French quotation marks.
  • Encode HTML or non-ASCII HTML entities.
  • Decode HTML entities.
  • Generate an HTML list.
  • Percent Encode for URL.
  • Get or delete the last path component or the path extension.
  • Get the basename of the path (ie the name without directory or extension).
  • Expand tilde (~) paths, or abbreviate with a tilde.
  • Resolve symlinks, or standardize the path.
  • Delete or bullet (•) control characters.
  • Calculate an expression and return the result, see the Calculations section.
  • Process Text Tokens and return the result, see the Text Tokens section.
  • Count the characters, words or lines and return the result.

Frank WierzbickiJython 2.7 beta3 released!

On behalf of the Jython development team, I'm pleased to announce that the third beta of Jython 2.7 is available. I'd like to thank Adconion Media Group (now Amobee) for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Jython 2.7b3 brings us up to language level compatibility with the 2.7 version of CPython. We have focused largely on CPython compatibility, and so this release of Jython can run more pure Python apps than any previous release. Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

Some highlights of the changes that come in beta 3:
  • Reimplementation of socket/select/ssl on top of Netty 4.
  • Requests now works.
  • Pip almost works (it works with a custom branch).
  • Numerous bug fixes
To get a more complete list of changes in beta 3, see Jim Baker's talk.

As a beta release we are concentrating on bug fixing and stabilization for a production release.

This release is being hosted at maven central. The traditional installer can be found here. See the installation instructions for using the installer. Three other versions are available:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Caktus GroupDjangoCon Ticket Giveaway!

Update: Congratulations to @dmpayton for winning this giveaway!

Caktus is giving away a DjangoCon ticket valued at $850! DjangoCon is the main US Django conference and it’s returning to Portland this year, August 30 - September 4th. Meet fellow Django developers, learn what others are doing, and have a good time!

To enter the giveaway: (1) follow us @caktusgroup and (2) retweet our message by clicking the button below:

The giveaway will end Wednesday, August 20th at 9AM PDT. We’ll randomly select a name and alert the winner by 5PM PDT. Please note that only one entry per individual is allowed and winning tickets are non-transferable.

We hope to see you at DjangoCon this year!

Caktus GroupPyOhio Recap: Celery with Python

Caleb Smith recently gave a talk, “Intro to Celery,” at PyOhio (video below). Celery is a pretty popular topic for us here at Caktus. We use it often in our client work and find it very handy. So we were happy Caleb was out in the world, promoting its use. We sat down with him to hear more about PyOhio and Celery.

What did you enjoy about PyOhio?

PyOhio had good quality talks and a broad range of topics including system administration, web development, and scientific programming. This year, they had over 100 talk submissions and 38 spots, so there was a huge interest in speakers and a lot of variety as a result. They have four tracks and sprints every evening.

Also, PyOhio is free. The value of a free conference is that it lowers the barrier to attend to the costs of hotel, food, and travel. Things are pretty affordable in Columbus. So that’s good for students or people without an employer to help cover costs, like freelancers. People do come from a pretty big range of places across the Midwest and South.

They have a good team of volunteers that take care of everything.

Aside from a vegetable, what is Celery and why should developers use it?

Celery is for offloading background tasks so you can have work happening behind-the-scenes while running a web project. A typical web app does everything within requests and any periodic work with cronjobs. A lot of web projects will block a request on work that needs to be done before giving a response. For example, an image upload form might make the user wait while thumbnails are produced. Sometimes, there’s work that your web project needs to do that doesn’t fit within the upper limit of 30 seconds or so to fulfill a request before timing out the request. Celery allows for offloading this work outside of the web request. It also allows for the distribution of work as needed on multiple machines. You can trigger background tasks periodically for things like nightly backups, importing data, checking on updates to a feed or API, or whatever work that needs to run asynchronously in the background. We use this a ton with some of our client work.
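
To make the thumbnail example concrete, here is a minimal sketch of such a task (the module layout, broker URL, and use of Pillow are illustrative assumptions, not taken from the talk):

# tasks.py -- illustrative sketch only
from celery import Celery
from PIL import Image

app = Celery('photos', broker='redis://localhost:6379/0')

@app.task
def generate_thumbnail(image_path, thumb_path):
    # Runs in a Celery worker process, outside the web request/response cycle
    img = Image.open(image_path)
    img.thumbnail((128, 128))
    img.save(thumb_path)

The upload view can then enqueue the work and respond immediately with generate_thumbnail.delay('photo.jpg', 'photo_thumb.jpg'), while a worker picks up the job in the background.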

What are Celery alternatives?

There are a few significant ones such as RQ, pyres, gearman and kuyruk. I think Celery is the most common choice among these. You can also just use system cron jobs for the periodic tasks, but cron jobs only work on one machine and are rarely well maintained. A task queue solution such as Celery coordinates with a broker to work on different machines.

What do you think are the challenges to getting started with Celery?

A lot of people think that it only works with Django. That was true when Celery was first released but is no longer true. There’s also somewhat of a barrier to entry because of the terminology involved, the work of setting up system resources such as the message broker, and understanding its role within a project.

You were a former public school music teacher and often teach Python in the community for organizations like Girl Develop It. Is there a relationship you see to giving talks?

Giving talks does feel like an extension of teaching. You learn a lot trying to prepare for it. My talk was about how to get everything set up, the basics of how Celery works, and developing a mental model for programming Celery tasks. A project like Celery can seem very difficult if you are approaching the documentation on your own. The high level overview is a little daunting so it’s nice to provide an on-ramp for people.

Our other blog posts contain more on Celery with Python.

Caktus GroupCaleb Smith to Guest Lecture at Iron Yard Academy

Caleb Smith, a developer at Caktus, will be guest lecturing tomorrow to the inaugural class at the Iron Yard in Durham. Iron Yard is a code school that trains its students in modern programming practices and prepares them for immediate hiring upon graduation. Tobias, our CEO, is on the school's employer advisory board. Caleb will be speaking on his experience as a Python developer. As an exclusive Python shop, we here at Caktus naturally think it's the best language for new students; 28 of the top 30 universities agree.

Joseph TateMoving a Paravirtualized EC2 legacy instance to a modern HVM one

I had to try a few things before I could get this right, so I thought I'd write about it. These steps are what ultimately worked for me. I had tried several other things to no success, which I'll list at the end of the post.

If you have Elastic Compute Cloud (EC2) instances on the "previous generation" paravirtualization based instance types, and want to convert them to the new/cheaper/faster "current generation", HVM instance types with SSD storage, this is what you have to do:

You'll need a donor Elastic Block Store (EBS) volume so you can copy data from it. Either shutdown the old instance and detach the EBS, or, as I did, snapshot the old system, and then create a new volume from the snapshot so that you can mess up without worrying about losing data. (I was also moving my instances to a cheaper data center, which I could only do by moving snapshots around). If you choose to create a new volume, make a note of which Availability Zone (AZ) you create it in.

Create a new EC2 instance of the desired instance type, configured with a new EBS volume set up the way you want it. Use a base image that's as similar to what you currently have as possible. Make sure you're using the same base OS version, CPU type, and that your instance is in the same AZ as your donor EBS volume. I mounted the ephemeral storage too as a way to quickly rollback if I messed up without having to recreate the instance from scratch.

Attach your donor EBS volume to your new instance as sdf/xvdf, and then mount it on a new directory I'll call /donor:
mkdir /donor && mount /dev/xvdf /donor


Suggested: mount your ephemeral storage on /mnt
mount /dev/xvdb /mnt
and rsync / to /mnt:
rsync -aPx / /mnt/
If something goes wrong in the next few steps, you can reverse it by running
rsync -aPx --delete /mnt/ /
to revert to a known working state. The rsync options tell rsync to copy (a)ll files, links, and directories, and all ownership/permissions/mtime/ctime/atime values; to show (P)rogress; and to not e(x)tend beyond a single file system (this leaves /proc, /sys, and your scratch and donor volumes alone).

Copy your /donor volume data to / by running:
rsync -aPx /donor/ / --exclude /boot --exclude /etc/grub.d ...
You can include other excludes (use paths to where they would be copied on the final volume, not the path in the donor system). The excluded paths above are for an Ubuntu system. You should replace /etc/grub.d with the path or paths where your distro keeps its bootloader configuration files. I found that excluding /boot alone was insufficient because the files in /boot are merely links to /etc/grub.d.

Now you should be able to reboot your instance into your new, upgraded system. Do so, detach the donor EBS volume, and if you used the ephemeral storage as a scratch copy, reset it as you prefer. Switch your Elastic IP, or change your DNS configuration, test your applications, and then clean up your old instance artifacts. Congratulations, you're done.

Notes:
Be careful of slashes. The rsync command treats /donor/ differently from /donor.

What failed:
Converting the EBS snapshot to an AMI and setting the AMI virtualization type as HVM, then launching a new instance with this AMI, actually failed to boot (I've had trouble with this with PV instances too with the Ubuntu base image unless I specified a specific kernel, so I'm not sure whether to blame HVM or the Ubuntu base images).
Connecting a copy of the PV EBS volume to a running HVM system and copying /boot to the donor, then replacing sda1 with the donor volume, also failed to boot, though I think if I'd copied /etc/grub.d too it might have worked. This might not get you an SSD-backed EBS volume though, if that's desirable.

Caktus GroupOSCON 2014 & REST API Client Best Practices

Mark Lavin, Caktus Technical Director and author of the forthcoming Django LightWeight, was recently at OSCON 2014 in Portland, where he gave a talk on improving the relationship between server and client for REST APIs. OSCON, with over 3000 attendees, is one of the largest open source conferences around. I sat down with him to ask him about his time there.

Welcome back! This was your second year speaking at OSCON. How did you enjoy it this year?

I enjoyed it. There’s a variety of topics at OSCON. It’s cool to see what people do with open source—there’s such a large number of companies, technologies, and approaches to solutions. There were great conversations and presentations. I especially liked Ignite OSCON where people gave really well-prepared 5 minute talks.

I participated in the OSCON 5k [Mark received 5th place] too. There were a lot of people out. We went over bridges and went up and down this spiral bridge twice. That race was pretty late for me but fun [began at 9pm PST, which is 12AM EST].

Why did you choose REST API client best practices as a talk topic?

It was something that came out of working on Django LightWeight. I was writing about building REST APIs and the javascript clients. This prompted a lot of thinking and research from Julia (Elman, co-author) and me on how to design both ends of it. I found a lot of mixed content and a lot of things I wasn't happy to see: people skimping on what I felt were best practices.

I think that you need to think about API design in the same way that you think about websites. How is a client going to navigate the API? If it’s asking for a piece of information, how is it going to find a related piece of information? What actions is it allowed to take? Writing a good server can make a client easier, something I’ve seen in my work at Caktus.

Why do you think this isn’t a more common practice?

The focus is often on building a really fast API, not necessarily an API that's easy to use. It's hard to write the client for most APIs. The information that gets passed to the client isn't always sufficient. Many APIs don't spend the time to make themselves discoverable, so the client has to do a lot of hard-coding to make up for the fact that it doesn't know the location of resources.

What trade-offs do you think exist?

With relational data models, sometimes you end up trading off normalization. The classical "right way" to build a data model is a very normalized one that doesn't repeat itself or store redundant data. Denormalizing data can lead to inconsistencies and duplication, but, at times, it can make things faster.

The API design is similar, particularly when you have deeply relational structures. There were a lot of conversations about how to make this trade-off. Interestingly enough, Netflix gave a talk about their API and its evolution. They said they started with a normalized structure and discoverable API and found that eventually they had to restructure some pieces in a less normalized fashion for the performance they needed for some of the set-top boxes and game consoles that query their API.

We heard you had an opportunity to give a tutorial. Tell us more about it.

I had the opportunity to help Harry Percival. He recently released a book on Python web development using test-driven development. We'd emailed before, so we knew each other a little bit. He asked me to be a TA, so I spent Monday morning trying to help people follow his tutorial and get set up learning Python and Django. It was unexpected, but a lot of fun, similar to what Caktus has done with the bootcamps. I like to teach. It's fun to be a part of that and to help someone understand something they didn't know before. There were a lot of people interested in learning about Python and Django. I was just happy to participate.

Thanks for sharing your experiences with us Mark!

Thanks for asking me!

Caktus GroupWebsite Redesign for PyCon 2015

PyCon 2015’s website launched today (a day early!). PyCon is the premiere conference for the Python community and one we look forward to attending every year. We’re honored that the Python Software Foundation returned to us this year to revamp the site. We were especially happy to work again with organizer-extraordinaires Ewa Jodlowska and Diana Clarke.

One of the most exciting things for our team is turning ideas into reality. The organizers wanted to retain the colorful nature of the PyCon 2014 site Caktus created. They also wanted the team to use the conference venue, the Palais des congrès de Montréal, as inspiration (pictured below). The new design needed to pay homage to the iconic building without being either too literal or too abstract.

Composite image of the Palais des congrès de Montréal

The design team, led by Trevor Ray, worked together to create the design using the stairs as inspiration (seen in the photo above). The stairs lend a sense of movement. The colored panes are linked in a snake-like manner, a nod to Python's namesake. If you look carefully, you will also see the letter P. Working in collaboration with the organizers, the team created multiple drafts, fine-tuning the look and feel with each phase of feedback. The final design represents the direction of the client, the inspiration of the building itself, and the team's own creativity.

In addition to refreshing PyCon’s website, our developers, as led by Rebecca Lovewell, made augmentations to Symposion, a Django project for conference websites. We’ve previously worked with Symposion for PyCon 2014 and PyOhio. For this round of changes, the team used these previous augmentations as a jumping off point for refinements to the scheduler, financial aid processing, and sponsor information sharing.

Up next? A conference t-shirt!

Vinod KurupUsing dynamic queries in a CBV

Let's play 'Spot the bug'. We're building a simple system that shows photos. Each photo has a publish_date and we should only show photos that have been published (i.e. their publish_date is in the past).

``` python models.py
class PhotoManager(models.Manager):
    def live(self, as_of=None):
        if as_of is None:
            as_of = timezone.now()
        return super().get_query_set().filter(publish_date__lte=as_of)
```

And the view to show those photos:

``` python views.py
class ShowPhotosView(ListView):
    queryset = Hero.objects.live()
```

Can you spot the bug? I sure didn't... until the client complained that newly published photos never showed up on the site. Restarting the server fixed the problem temporarily. The newly published photos would show up, but then any photos published after the server restart again failed to display.

The problem is that the ShowPhotosView class is instantiated when the server starts. ShowPhotosView.queryset gets set to the value returned by Hero.objects.live(). That, in turn, is a QuerySet, but it's a QuerySet with as_of set to timezone.now() WHEN THE SERVER STARTS UP. That as_of value never gets updated, so newer photos never get captured in the query.

There are probably multiple ways to fix this, but an easy one is:

``` python views.py
class ShowPhotosView(ListView):
    def get_queryset(self):
        return Hero.objects.live()
```

Now, instead of the queryset being instantiated at server start-up, it's instantiated only when ShowPhotosView.get_queryset() is called, which is when a request is made.

Caktus GroupA Culture of Code Reviews

Code reviews are one of those things that everyone agrees are worthwhile, but sometimes don’t get done. A good way to keep getting the benefits of code reviews is to establish, and even nurture, a culture of code reviews.

When code reviews are part of the culture, people don’t just expect their changes to be reviewed, they want their changes reviewed.

Some advantages of code reviews

We can all agree that code reviews improve code quality by spotting bugs. But there are other advantages, especially when changes are reviewed consistently.

Having your own code reviewed is a learning experience. We all have different training and experiences, and code reviews give us a chance to share what we know with others on the team. The more experienced developer might be pointing out some pitfall they’ve learned by bitter experience, while the enthusiastic new developer is suggesting the latest library that can do half the work for you.

Reviewing other people’s code is a learning experience too. You’ll see better ways of doing things that you’ll want to adopt.

If all code is reviewed, there are no parts of the code that only one person is familiar with. The code becomes a collaborative product of the team, not a bunch of pieces “owned” by individual programmers.

Obstacles to code reviews

But you only get the benefits of code reviews if you do them. What are some things that can get in the way?

Insufficient staffing is an obvious problem, whether there's only one person working on the code, or no one working on the code has time to review other changes, or to wait for their own to be reviewed. To nurture a culture of code reviews, enough staff needs to be allocated to projects to allow code reviews to be a part of the normal process. That means at least two people on a team who are familiar enough with the project to do reviews. If there's not enough work for two full-time team members, one member could be part-time, or even both. Better two people working on a project part-time than one person full-time.

Poor tools can inhibit code reviews. The more difficult something is, the more likely we are to avoid doing it. Take the time to adopt a good set of tools, whether GitHub’s pull requests, the open source ReviewBoard project, or anything else that handles most of the tedious parts of a code review for you. It should be easy to give feedback, linked to the relevant changes, and to respond to the feedback.

Ego is one of the biggest obstacles. No one likes having their work criticized. But we can do things in ways that reduce people’s concerns.

Code reviews should be universal - everyone’s changes are reviewed, always. Any exception can be viewed, if someone is inclined that way, as an indication that some developers are “better” than others.

Reviews are about the code, not the coder. Feedback should be worded accordingly. Instead of saying “You forgot to check the validity of this input”, reviewers can say “This function is missing validation of this input”, and so forth.

We do reviews because our work is important and we want it to be as good as possible, not because we expect our co-workers to screw it up. At the same time, we recognize that we are all human, and humans are fallible.

Establishing a culture of code reviews

Having a culture where code reviews are just a normal part of the workflow, and we’d feel naked without them, is the ideal. But if you’re not there yet, how can you move in that direction?

It starts with commitment from management. Provide the proper tools, give projects enough staffing so there’s time for reviews, and make it clear that all changes are expected to be reviewed. Maybe provide some resources for training.

Then, get out of the way. Management should not be involved in the actual process of code reviews. If developers are reluctant to have other developers review their changes, they’re positively repelled by the idea of non-developers doing it. Keep the actual process something that happens among peers.

When adding code reviews to your workflow, there are some choices to make, and I think some approaches work better than others.

First, every change is reviewed. If developers pick and choose which changes are reviewed, inevitably someone will feel singled out, or a serious bug will slip by in a “trivial” change that didn’t seem to merit a review.

Second, review changes before they’re merged or accepted. A “merge then review” process can result in everyone assuming someone else will review the change, and nobody actually doing it. By requiring a review and signoff before the change is merged, the one who made the change is motivated to seek out a reviewer and get the review done.

Third, reviews are done by peers, by people who are also active coders. Writing and reviewing code is a collaboration among a team. Everyone reviews and has their own changes reviewed. It’s not a process of a developer submitting a proposed change to someone outside the team for approval.

The target

How will you know when you’re moving toward a culture of code reviews? When people want their code to be reviewed. When they complain about obstacles making it more difficult to get their code reviewed. When the team is happier because they’re producing better code and learning to be better developers.

Vinod KurupSome Emacs Posts

A few cool Emacs posts have flown across my radar, so I'm noting them here for that time in the future when I have time to play with them.

Vinod KurupPygments on Arch Linux

I wrote my first blog post in a little while (ok, ok... 18 months) yesterday and when I tried to generate the post, it failed. Silently failed, which is the worst kind of failure. I'm still not sure why it was silent, but I eventually was able to force it to show me an error message:

```
/home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:354:in `rescue in get_header': Failed to get header. (MentosError)

from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:335:in `get_header'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:232:in `block in mentos'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:206:in `mentos'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:189:in `highlight'
from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:24:in `pygments'
from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:14:in `highlight'
from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:37:in `block in render_code_block'
from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `gsub'
from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `render_code_block'
from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:12:in `pre_filter'
from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:28:in `pre_render'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:112:in `block in pre_render'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `each'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `pre_render'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:166:in `do_layout'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/post.rb:195:in `render'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:200:in `block in render'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `each'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `render'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:41:in `process'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/bin/jekyll:264:in `<top (required)>'
from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `load'
from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `<main>'

```

Professor Google tells me that this happens when you try to run the pygments.rb library in a Python 3 environment. (pygments.rb is a Ruby wrapper around the Python Pygments library). The fix is to run the code in a Python2 virtualenv. I guess the last time I updated my blog, Arch still had Python2 as the system default. No, I don't want to check how long ago that was.

$ mkvirtualenv -p `which python2` my_blog
(my_blog)$ bundle exec rake generate

So now I'm running a Ruby command in a Ruby environment (rbenv) inside a Python 2 virtualenv. Maybe it's time to switch blog tools again...

Vinod KurupHow to create test models in Django

It's occasionally useful to be able to create a Django model class in your unit test suite. Let's say you're building a library which creates an abstract model which your users will want to subclass. There's no need for your library to subclass it, but your library should still test that you can create a subclass and test out its features. If you create that model in your models.py file, then Django will think that it is a real part of your library and load it whenever you (or your users) call syncdb. That's bad.

The solution is to create it in a tests.py file within your Django app. If it's not in models.py, Django won't load it during syncdb.

``` python tests.py
from django.db import models
from django.test import TestCase

from .models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)


class AbstractTest(TestCase):
    def test_my_test_model(self):
        self.assertTrue(MyTestModel.objects.create(name='foo'))
```

A problem with this solution is that I rarely use a single tests.py file. Instead we use multiple test files collected in a tests package. If you try to create a model in tests/test_foo.py, then this approach fails because Django tries to create the model in an application named tests, but there is no such app in INSTALLED_APPS. The solution is to set app_label to the name of your app in an inner Meta class.

```python tests/test_foo.py
from django.db import models
from django.test import TestCase

from ..models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)

    class Meta:
        app_label = 'myappname'


class AbstractTest(TestCase):
    def test_my_test_model(self):
        self.assertTrue(MyTestModel.objects.create(name='foo'))
```

Oh, and I almost forgot... if you use South, this might not work, unless you set SOUTH_TESTS_MIGRATE to False in your settings file.
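
In other words, in the settings your test runs use:

SOUTH_TESTS_MIGRATE = False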

Comments and corrections welcome!

Joe GregorioObservations on hg and git

Having recently moved to using Git from Mercurial here are my observations:

Git just works

No matter what I try to do, there's a short and simple git command that does it. Need to copy a single file from one branch to my current branch, need to roll back the last two commits and place their changes into the index, need to push or pull from a local branch to a remote and differently named branch: there are ways to do all of those things. More importantly, Git does them natively; I don't have to turn on plugins to get a particular piece of functionality.

Turning on plugins is a hurdle

The fact that what I consider to be core functionality is hidden away in plugins and needs to be turned on manually is an issue. For example, look at this section of the docs for the Google API Python Client:

https://code.google.com/p/google-api-python-client/wiki/BecomingAContributor#Submitting_Your_Approved_Code

A big thing that trips up contributors is that "--rebase" is in a plugin (and I keep forgetting to update the docs to explain that).

Git is fast

So Git is fast, not just "ooh that was fast", but fast as in, "there must have been a problem because there's no way it could have worked that fast". That's a feature.

Branching

Git branches are much smoother and more integrated than MQ. Maybe this is just because I got stuck on MQ and never learned another way to use hg, but the branch and merge workflow is a lot better than MQ.

SSH

In Git ssh: URIs just work for me. Maybe I just got lucky, or was previously unlucky, but I never seemed to be able to pull or push to a remote repository via ssh with hg, and it just worked as advertised with Git.

Helpful

Git is helpful. Git is filled with helpful messages, many of the form "it looks like you are trying to do blah, here's the exact command line for that", or "you seem to be in 'weird state foo', here's a couple different command lines you might use to rectify the situation". Obviously those are paraphrasing, but the general idea of providing long, helpful messages with actual commands in them is done well throughout Git.

Caveats

I'm not writing this to cast aspersions on the Mercurial developers, and I've already passed this information along to developers that work on Mercurial. I am hoping that if you're building command line tools, you can incorporate some of the items here, such as helpful error messages, speed, and robust out-of-the-box capabilities.

Caktus GroupContributing Back to Symposion

Recently Caktus collaborated with the organizers of PyOhio, a free regional Python conference, to launch the PyOhio 2014 conference website. The conference starts this weekend, July 26 - 27. As in prior years, the conference web site utilizes Eldarion's Symposion, an open source conference management system. Symposion powers a number of annual conference sites including PyCon and DjangoCon. In fact, as of this writing, there are 78 forks of Symposion, a nod to its widespread use for events both large and small. This collaboration afforded us the opportunity to abide by one of our core tenets, that of giving back to the community.

PyOhio organizers had identified a few pain points during last year’s rollout that were resolvable in a manner that was conducive to contributing back to Symposion so that future adopters could benefit from this work. The areas we focused on were migration support, refining the user experience for proposal submitters and sponsor applicants, and schedule building.

Migration Support

https://github.com/pinax/symposion/pull/47

The majority of our projects utilize South for tracking database migrations. Migrations are not an absolute requirement, but for conferences that reuse the same code base from year to year rather than starting a new repository, it is beneficial to have a migration strategy in place. There were a few minor implementation details to tackle, namely migration dependencies and introspection rules. The Symposion framework has a number of interdependent apps. As such, when using migrations, the database tables must be created in a certain order. For Symposion, there are two such dependencies: Proposals depend on Speakers, and Sponsorship depends on Conferences. The implementation can be seen in this changeset. In addition, Symposion uses a custom field for certain models, django-timezones' TimeZoneField. There are a few Pull Requests open on that project to deal with South and introspection rules, but none of them have been incorporated. As such, we added a very simple rule to work around migration errors.
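
For readers who haven't used those two South features, here is a generic sketch of what they look like (illustrative only, not the actual Symposion changeset):

# In a migration for the proposals app: tell South that the speakers
# tables must exist before this migration runs (illustrative).
from south.v2 import SchemaMigration

class Migration(SchemaMigration):
    depends_on = (
        ("speakers", "0001_initial"),
    )

    def forwards(self, orm):
        pass  # schema operations would go here

    def backwards(self, orm):
        pass

# And a one-line introspection rule lets South freeze a custom field
# such as django-timezones' TimeZoneField:
from south.modelsinspector import add_introspection_rules
add_introspection_rules([], [r"^timezones\.fields\.TimeZoneField"])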

As mentioned before, these migrations give Symposion a solid migration workflow for future database changes, as well as prepping for Django 1.7’s native schema migration support.

User Experience Issues

Currently, if an unauthenticated user manages to make a proposal submission, they are simply redirected to the home page of the site. Similarly, if an authenticated user without a Speaker profile makes a submission, they are redirected to their dashboard. In both cases, there is no additional feedback for what the user should do next. We utilized the django messages framework to provide contextual feedback with help text and hyperlinks should these be valid submission attempts (https://github.com/pinax/symposion/pull/50/files).
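
The underlying pattern is Django's messages framework; a generic sketch (not the exact Symposion views) looks like this:

from django.contrib import messages
from django.shortcuts import redirect, render

def create_proposal(request):
    # "speaker_profile" and the "dashboard" URL name are illustrative
    if not hasattr(request.user, "speaker_profile"):
        messages.info(
            request,
            "To submit a proposal, first create a speaker profile "
            "from your dashboard."
        )
        return redirect("dashboard")
    return render(request, "proposals/proposal_form.html")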

Sponsor submissions are another area that benefited from additional contextual messages. There are a variety of sponsor levels (Unobtanium, Aluminum, etc.) that carry their own sponsor benefits (a print ad in the program, for example). The current workflow redirects a sponsor application to the Sponsor Details page, which lists Sponsor and Benefits details, with no contextual message. For sponsor levels with no benefits, this essentially redirects you to an update form for the details you just submitted. Our pull request redirects these cases to the user dashboard with an appropriate message, as well as providing a more helpful message for sponsor applications that do carry benefits. (https://github.com/pinax/symposion/pull/49/files).

Schedule Builder

https://github.com/pinax/symposion/pull/51/files

The conference schedule is a key component to the web site, as it lets attendees (and speakers) know when to be where! It is also a fairly complex app, with a number of interwoven database tables. The screenshot below lists the components required to build a conference schedule:

At a minimum, creating one scheduled presentation requires 7 objects spread across 7 different tables. Scale this out to tens or nearly one hundred talks and the process of manually building a schedule becomes egregiously cumbersome. For PyCon 2014 we built a custom importer for the talks schedule. A quick glance reveals this is not easily reusable; there are pinned lunches and breaks, and this particular command assigns accepted proposals to schedule slots. For PyOhio, we wanted to provide something that was more generic and reusable. Rather than building out the entire schedule of approved talks, we wanted a fairly quick and intuitive way for an administrator to build the schedule's skeleton via the frontend using a CSV file. The format of the CSV is intentionally basic, for example:

"date","time_start","time_end","kind"," room "
"12/12/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/12/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room2"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room2"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room2"

This sample, when uploaded, will create the requisite backend objects (2 Day, 2 Room, 2 Slot Kinds, 8 slots, and 12 SlotRoom objects). This initial implementation will fail if schedule overlaps occur, allows for front end deletion of Schedules, is tested, and provides documentation as well. Having a schedule builder will allow the conference organizers a chance to divert more energy into reviewing and securing great talks and keynotes, rather than dealing with the minutiae of administering the schedule itself.
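
To make the CSV handling concrete, here is a simplified sketch of the parsing step (an illustration only, not the code from the pull request):

import csv

def read_schedule(csv_path):
    # Collect the distinct days, rooms, and slot kinds, plus the raw rows
    days, rooms, kinds, rows = set(), set(), set(), []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            # Header names match the sample above; stray whitespace is stripped
            row = {key.strip(): value.strip() for key, value in row.items()}
            days.add(row["date"])
            rooms.add(row["room"])
            kinds.add(row["kind"])
            rows.append(row)
    return days, rooms, kinds, rows

The actual management command goes on to create the Day, Room, Slot Kind, Slot, and SlotRoom objects described above from that data, and rejects the upload if any slots overlap.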

Symposion is a great Python based conference management system. We are excited about its broad use in general, and in helping contribute to its future longevity and feature set.

Edited to Add (7/21/2014: 3:31PM): PR Merge Efforts

One of the SciPy2014 tech chairs, Sheila, is part of new efforts to get PRs merged. Join the mailing list to learn more about merges based off the PyOhio fork.

Caktus GroupWhat was being in Libya like?

Election day voting on June 25th. Image courtesy of HNEC.

 

Since this interview was done, Libya's capital began experiencing more violence. As of today, militias are fighting over control of the Tripoli international airport, the primary way in and out of this section of Libya. We're keeping our friends and colleagues in Libya in our thoughts in these truly difficult times. This post speaks to the energy and talent of the Libyans we've worked with during this challenging democratic transition.

I'm the project manager for our Libya voter registration by text message project, the first in the world. Two of our staff members, Tobias McNulty, CEO, and Elliott Wilkes, Technical Project Manager, were in Libya during the June 25th elections. Given the political chaos and violence surrounding the elections, the US team was worried for them. Tobias recently returned and Elliott left for Nairobi last week. Elliott had been in Libya since August 2013. I asked them some questions about being in Tripoli, Libya's capital.

With the attempted coup just a couple weeks prior, what were tensions like on election day?

Tobias: One of the outcomes of the attempted coup in May was that the election date got moved up to June 25. This put everyone at the Libyan High National Election Commission (HNEC) on task to deliver an election much earlier than originally planned. I’m proud to say that Caktus and HNEC worked closely together to step up to the plate and meet this mandate successfully.

Elliott: Tripoli was plastered with advertisements about the election, both from HNEC and from candidates themselves. Most of the security presence in Tripoli kept to their usual posts on election day, the gas stations. Due to distribution issues and a resulting panic about petrol supplies, there was a run on gas in the city, and army and police vehicles helped guard gas stations to keep the peace while people spent hours in line waiting to top up. In spite of, or perhaps because of, this security presence, we didn't witness any violence in Tripoli in the days leading up to, during, or following election day. With a few marked exceptions this election proceeded largely without incident - a great success for HNEC. However, one act of violence just after the election did shake the country: the murder of humanitarian Salwa Bugaighis in Benghazi. We were all deeply saddened to hear about that significant loss.

Tobias: Yes, that was tragic. It was hard news for all of us.

How did the voter registration system function on election day?

Tobias: There were three things that HNEC and the citizens of Libya could use our text message-based voter registration system for on election day. First and foremost, poll workers and voters could use it to double check individual registrations and poll locations. Second, to track when polling centers opened so HNEC staff knew which centers needed additional support. And lastly, poll workers could send texts with the number of voters that arrived, giving HNEC real-time turnout figures.

Elliott: On election day, we helped the polling center staff with message formatting, and performing custom database queries to assess progress throughout the day. It’s a real testament to our work that we encountered no technological issues throughout the election process. The system successfully handled nearly 100,000 text messages on election day alone.

What was the mood like at elections headquarters as the count of voters came in?

Tobias: At HNEC offices on election day, the mood was positive and optimistic. We had 1.5 million registrants in the system. We arrived early in the morning, and as the HNEC staff began to arrive, on numerous occasions the office stood and sang along to the Libyan national anthem. We joined in too, of course. The flow of work was busy but steady. We worked long days and accomplished much in the days surrounding the election, but thanks to adequate preparation on the part of HNEC and our team, the workload was not unmanageable.

However, among citizens there is clearly some work to do in terms of motivating voter turnout on election day. Several citizens we talked to were indifferent to the elections, and expressed some distrust in elected leaders generally. That said, elections are still relatively new to Libya, and I think we need to moderate our expectations for how quickly they’ll grow in popularity.

Elliott: Having worked on a number of elections around the world I’ve come to the conclusion that the best election days are the boring ones - that is to say, those without incident. If you’ve done your job well, you go into election day with hours upon hours of preparations for all different outcomes and possibilities, plans A, B, C, to Z. You’ve spent weeks and months building a massive ship and all that’s left to do is enjoy the ride. And thankfully, due to our exhaustive preparations, everything was smooth sailing.

What was it like working with the Libyan government?

Tobias: While from the outside Libya may look like an unstable country with lots of negative coverage in the news, the reality on the ground is that working with HNEC has been a real pleasure. The operations staff at the Commission are motivated to continue strengthening democracy in the country, which was evidenced by the long hours many put in in the days leading up to and following the election. We’re honored that the government of Libya selected Caktus as the technology partner for this highly impactful project.

Elliott: Tobias, I couldn't agree more. Working on this project has been extraordinary. There's something special about working with a group of young, committed citizens putting in the extra hours to ensure their electoral process is as inclusive as possible, especially given that for over forty years, government services in Libya were anything but. Their commitment to the pursuit of democracy and everything that entails has made this project a real pleasure and a deeply humbling experience. I'm proud that we've been able to support them at this critical juncture.

We’ve seen the photos, but want to hear you list the pizza toppings!

Tobias: Ha, the rumors are true. The Libyans put all sorts of things on their pizza, the two most prominent of which are often canned tuna fish and french fries. Ketchup and mayonnaise are two other favorite toppings.

Elliott: It tastes terrible. Honestly, I prefer shawerma. The pizza toppings in Libya can be...a bit exotic for my taste.

Tobias: Luckily, there’s other food and the staff at HNEC didn’t hesitate to invite us in to join their meals. It was a pleasure to break bread with the staff at HNEC during such a momentous week for the country of Libya.

 

Libyan Pizza
Photo by Elliott Wilkes.

 

Caktus GroupJuly 2014 ShipIt Day Recap

This past Friday we celebrated another ShipIt day at Caktus. There was a lot of open source contribution, exploring, and learning happening in the office. The projects ranged from native mobile Firefox OS apps, to development on our automated server provisioning templates via Salt, to front-end apps aimed at using web technology to create interfaces where composing new music or performing Frozen’s Let It Go is so easy anyone can do it.

Here is a quick summary of the projects that folks worked on:

Calvin worked on updating our own minimal CMS component for editing content on a site, django-pagelets, to work nicely with Django 1.7. He is also interested in adding TinyMCE support and making it easy to upload images and reference them in the block. If you have any other suggestions for pagelets, get in touch with Calvin.

ShipIt Day Project: Anglo-Saxon / French Etymology Analyzer

Philip worked on code to tag words in a text with basic information about their etymologies. He was interested in exploring words with dual French and Anglo-Saxon variations, e.g. “uncouth” and “rude”. These words have evolved from different origins to have similar meanings in modern English, and it turns out that people often perceive the French- or Latin-derived word, in general, to be more erudite (“erudite” coming from Latin) than the Anglo-Saxon variant. To explore this concept, Philip harvested word etymologies from the XML version of the Wiktionary database and categorized words in Lewis Carroll’s Alice in Wonderland as well as in reports from the National Hurricane Center. His initial results showed that Carroll’s British background was evident in his use of language, and Philip is excited to take what he developed on ShipIt Day and continue to work on the project.

Mark created a Firefox OS app, Costanza, based on a concept from a Seinfeld episode. Mark’s app used standard web tools including HTML, CSS, and Javascript to build an offline app that recorded and played back audio. Mark learned a lot about building apps with the new OS and especially spent a lot of time diving into issues with packaging up apps for distribution.

Rebecca and Scott collaborated on porting an application to the latest and greatest Python 3. The migration of apps from Python 2 to Python 3 started off as a controversial subject in the Python community, but there has slowly been a lot of progress. Caktus is embracing this transition and trying to get projects ported over when there is time. Rebecca and Scott both wrestled with some of the challenges of moving a big project on a legacy server over to a new Python version.

Dan also wrestled with the Python 2 to 3 growing pains, though less directly. He set out to create a reusable Django app supporting generic requirements he had encountered in a number of client apps when exporting data to comma separated value (CSV) format. While doing this, he ran into differences between the Python 2 and 3 standard libraries for handling CSVs. Dan created cordwainer, a generic CSV library that works in both Python 2 and 3.
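As a rough illustration of the kind of incompatibility Dan ran into (this shows the standard library only, not cordwainer's API): the csv module expects a binary file on Python 2 but a text file on Python 3, so version-agnostic code ends up branching on the interpreter. A minimal sketch, with a made-up file name:

import csv
import sys

# Python 2's csv module works on byte streams; Python 3's works on text streams.
if sys.version_info[0] < 3:
    out = open('report.csv', 'wb')
else:
    out = open('report.csv', 'w', newline='')

writer = csv.writer(out)
writer.writerow(['name', 'count'])
writer.writerow(['widgets', 3])
out.close()
# Unicode handling differs between the two versions as well, which is another
# reason a small compatibility layer is handy.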

ShipIt Day Project: Template Include Visualization

Victor and Caleb worked together to create a wonderful tool for debugging difficult and tangled Django template includes. The tool helps template developers edit templates without the fear of not knowing which pages on the live site may be affected by their changes. They used d3 to visualize the templates in a way that is interactive and intuitive, so template writers can get a handle on complex dependency trees.

Michael has been working on a migraine tracking app for iOS using PhoneGap and jQuery Mobile. He has been diving in and learning about distributing mobile apps using Xcode and interfacing with the phone’s calendar to store migraine data. In terms of the interface, Michael studied up on accessibility while creating the app, whose primary audience will not want to dig into small details or stare at a bright phone for long while enduring a migraine.

Karen, Vinod, and Tobias all worked together to help improve Caktus’ Django project template. Karen learned a lot about updating projects on servers provisioned with Salt while trying to close out one of the tickets on our project-template repository. The ticket she was working on concerned deleting stale Python byte code (.pyc) files that are left over when a Python source code file (.py) is deleted from a Git repository; these stale .pyc files can cause errors when they aren’t cleaned up properly during an upgrade. Vinod worked through many issues in getting Docker, rather than VirtualBox with Vagrant, to create the virtual environments in which SaltStack runs when provisioning new servers. Docker is a lighter-weight environment than a full VirtualBox Linux server and would allow for faster iteration while developing provisioning code with SaltStack. Tobias improved the default logging configuration in the template to make it easier to debug errors when they occur, and also got started on some tools for integration testing of the project template itself.

Wray and Hunter collaborated to build a music composition and performance app called Whoppy (go ahead and try it out!). Whoppy uses Web Audio to create a new randomized virtual instrument every time you start the app. Wray and Hunter worked through creating a nice interface that highlights notes in the same key, so that it is easier for amateur composers to have fun making music.

Og MacielThe End For Pylyglot

Background

It was around 2005 when I started doing translations for Free and Open-Source Software. Back then I was warmly welcomed to the Ubuntu family and quickly learned all there was to know about using their Rosetta online tool to translate and/or review existing translations for the Brazilian Portuguese language. I spent so much time doing it, even during working hours, that eventually I sort of “made a name for myself” and made my way up through the upper echelons of the Ubuntu community.

Then I “graduated” and started doing translations for the upstream projects, such as GNOME, Xfce, LXDE, and Openbox. I took on more responsibilities, learned to use Git and make commits for myself as well as for other contributors, and strived to unify all Brazilian Portuguese translations across as many different projects as possible. Many discussions were had, (literally) hundreds of hours were spent going through hundreds of thousands of translations for hundreds of different applications, none of it bringing me any monetary or financial advantage, but all done for the simple pleasure of knowing that I was helping make FOSS applications “speak” Brazilian Portuguese.

I certainly learned a lot through the experience of working on these many projects… sometimes I made mistakes, other times I “fought” alone to make sure that standards and procedures were complied with. All in all, looking back I only have one regret: not being nominated to become the leader of the Brazilian GNOME translation team.

Having handled 50% of the translations for one of the GNOME releases (the other 50% was handled by a good friend, Vladimir Melo while the leader did nothing to help) and spent much time making sure that the release would go out the door 100% translated, I really thought I’d be nominated to become the next leader. Not that I felt that I needed a ‘title’ to show off to other people, but in a way I wanted to feel that my peers acknowledged my hard work and commitment to the project.

Seeing other people, even people with no previous experience, being nominated by the current leader to replace him was a slap in the face. It really hurt me… but I made sure to be supportive and continue to work just as hard. I guess you could say that I lived and breathed translations, my passion not knowing any limits or knowing when to stop…

But stop I eventually did, several years ago, when I realized how hard it was to land a job that would allow me to support my family (back then I had 2 small kids) and continue to do the thing I cared about the most. I confess that I even went through a series of job interviews for the translation role that Jono Bacon, Canonical’s former community manager, was trying to fill, but in the end things didn’t work out the way I wanted. I also flirted with a similar role at MeeGo, but since they wanted me to move to the West Coast I decided not to pursue it (I had also fallen in love with my then-current job).

Pylyglot

As a way to keep myself somewhat involved with the translation communities and at the same time learn a bit more about the Django framework, I then created Pylyglot, “a web based glossary compendium for Free and Open Source Software translators heavily inspired by the Open-tran.eu web site… with the objective to ‘provide a concise, yet comprehensive compilation of a body of knowledge’ for translators derived from existing Free and Open Source Software translations.”

Pylyglot

I have been running this service on my own and paying for the domain registration and database costs out of my own pocket for a while now, and I find myself facing the dilemma of renewing the domain registration to keep Pylyglot alive for another year… or retiring it and ending once and for all my relationship with FOSS translations.

Having spent the last couple of months thinking about it, I have now arrived at the conclusion that it is time to let this chapter of my life rest. Though the US$140/year that I won’t be spending won’t make me any richer, I don’t foresee myself maintaining the project or spending any time improving it. So on July 21st, 2014, Pylyglot will close its doors and cease to exist in its current form.

To those who knew about Pylyglot, used it and, hopefully, found it to be useful, my sincere thanks. To those who supported my idea and the project itself, whether by submitting code patches, building the web site, or just giving me moral support, thank you!

Caktus GroupRemoval of Mural

We have recently heard complaints about the painting over of the mural on the side of 108 Morris, the building we purchased and are restoring in Downtown Durham. I am personally distressed at this response. I see now, in retrospect, that we needed to work harder to discuss our decision with the community. In our enthusiasm to bring more life to Downtown Durham via ground-level retail space and offices for our staff, we were blind to what the public response to the mural’s removal might be. Its removal was not a decision taken lightly, and it was made in consultation with the Historic Preservation Commission. However, we handled this poorly. We apologize for not making more of an effort to include the community in this decision.

I do wish to emphasize that though we are moving from Carrboro to Durham, many of us are Durhamites, including two of three owners. Many in our small staff of 23 are far from outsiders. We raise our families in Durham. Our CTO, Colin Copeland, is co-captain of Code for Durham, a volunteer organization dedicated solely to giving the citizens of Durham more access to public records information. Our interest in Downtown Durham is not theoretical; it is the place where we are building our lives, so this building project is a deeply personal one. We want to see Downtown Durham continue to thrive.

Unfortunately, in restoring a long abandoned historic building that had been remodeled by many hands over the decades, we had to make sacrifices. To return the building to its original 1910 state, we needed to unbrick the windows, which would also remove sections of Emily Weinstein’s 1996 Eno River mural. The mural would have sustained further damage around the windows in any case, and our contractor told us (and we could see) that the mural had begun deteriorating. We were as diligent as humanly possible in making the final decision, referring often to great resources like Endangered Durham and Open Durham for images of the original building. It was a difficult decision and one that we, of course, could not make alone.

We tried our best to not only preserve, but to add to Durham. We submitted our proposal to the Historic Preservation Commission (HPC) and they approved it during a public meeting in April. They had already approved a similar proposal from the previous owner of the building. During the meeting, those who knew better than us-- actual preservationists-- said that going forward with the window openings would do more to preserve the integrity of the building than the more recent mural. These layers of approval made us feel we should proceed with our focus on restoration.

To further ensure we were doing right by Durham, we voluntarily and eagerly followed the guidelines of the National Park Service and the North Carolina State Historic Preservation Office for exterior restorations. The State Historic Preservation Offices and the National Park Service review the rehabilitation work to ensure that it complies with the Secretary’s Standards for Rehabilitation. As residents of Durham, we were excited and motivated at the prospect of further burnishing Downtown Durham’s reputation as a historic center.

Now, we see that we should not have assumed that the community would see and understand our sincere efforts to improve Downtown Durham. We strongly felt that occupation and restoration of a vacant building would be welcomed. We had not heard complaints until yesterday, which surprised us in part because our plans were public. We missed one phone call, and the caller did not respond to our return call. We are new to land development-- as a technology firm, we can safely say that it is not our focus. But we are made up of real people. We are a small firm that got its start thanks to the community around us, so again, it pains me to think we have hurt the community in any way.

In an effort to show our good faith and make amends, we’re planning on having a public meeting within the next few weeks. We are working to arrange a space for it, but will update you as soon as possible. We want to hear your thoughts and brainstorm together how we can better support our new home. We want to listen. We will also happily share with you how the restoration is coming along with photos and mock-ups of the space.

Please sign up to join our mailing list for updates and to find out when the public meeting will be: http://cakt.us/building-updates

Again, we are eager to hear your thoughts.

Sincerely, Tobias McNulty, CEO

Caktus GroupAnnouncing Q2 Caktus Charitable Giving

Caktus participates in social impact projects around the world, but we believe in starting local. We’re proud of the many ways in which our staff contribute to local organizations, each making the world around us just a little better. To further support these efforts, Caktus asks employees to suggest donations every quarter. This quarter, we’re sending contributions to the following non-profits:

RAIN: Regional AIDS Interfaith Network

www.carolinarain.org
RAIN engages the community to transform lives and promote respect and dignity for all people touched by HIV through compassionate care, education and leadership development. Caktus staff visited RAIN during a focus group test of a mobile HIV adherence application last year and admired their good work.

Urban Ministries of Durham

www.umdurham.org
Urban Ministries of Durham welcomes more than 6,000 people each year who come seeking food, shelter, clothing and supportive services.

Ronald McDonald House of Chapel Hill

www.rmh-chapelhill.org
Each year, The Ronald McDonald House of Chapel Hill provides more than 2,200 families with seriously ill or injured children the basic necessities and comforts of home so that they can focus on caring for a sick child. Caktus’ contribution will shelter a family in need for one week.

Raleigh Review

www.raleighreview.org
The Raleigh Review mission is to foster the creation and availability of accessible yet provocative contemporary literature through our biannual magazine as well as through workshops, readings, and other community events.

LGBT Center of Durham

www.lgbtcenterofdurham.com
The LGBT Center of Raleigh is working in tandem with Durham community members to establish a Durham branch for local events, programs, and resources.

VOICES, the Chapel Hill Chorus

voiceschapelhill.org
Voices is one of the Triangle’s oldest and most distinguished choral groups with a rich history spanning over three decades. Multiple Caktus employees participate. Caktus is providing financial support for promotional T-shirts for the group.

Caktus GroupTips for Upgrading Django

From time to time we inherit code bases running outdated versions of Django, and part of our work is to get them running on a stable and secure version. In the past year we've done upgrades from versions as old as 1.0, and we've learned a few lessons along the way.

Tests are a Must

You cannot begin a major upgrade without planning how you are going to test that the site works after the upgrade. Running your automated test suite should also surface warnings for new or pending deprecations. If you don't have an automated test suite, now would be a good time to start one. You don't need 100% coverage, but the more you have, the more confident you will feel about the upgrade.

Integration tests with Django's test client can help cover a lot of ground with just a few tests. You'll want to use these sparingly because they tend to be slow and fragile, but you can use them to exercise your app much like a human would, submitting forms (both valid and invalid) and navigating to various pages. As you get closer to your final target version, or as you find more edge cases, you can add focused unit tests to cover those areas.

It is possible to do these upgrades without a comprehensive automated test suite, relying only on manual testing, but then you need a thorough plan to test the entire site. This type of testing is very slow and error prone, and if you are going to be upgrading across multiple Django versions it may have to be repeated several times.
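For example, here is a minimal sketch of the kind of integration test the test client makes cheap; the URLs are hypothetical placeholders for the pages that matter on your site:

from django.test import TestCase


class SiteSmokeTests(TestCase):
    """Exercise full request/response cycles through a few key pages."""

    def test_home_page_renders(self):
        # Hypothetical URL; swap in the pages your users actually hit.
        response = self.client.get('/')
        self.assertEqual(response.status_code, 200)

    def test_contact_form_rejects_empty_submission(self):
        # An invalid POST should redisplay the form rather than crash.
        response = self.client.post('/contact/', data={})
        self.assertEqual(response.status_code, 200)

A handful of tests like these, run before and after each version bump, will catch a surprising share of upgrade breakage.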

Know Your Release Notes

Given Django's deprecation cycle, it's easiest to upgrade a project one Django version at a time. If you try to jump two releases, you may call Django APIs which no longer exist, and you'll miss the deprecation warnings that existed only in the releases you jumped over. Each version has a few big features and a few things which were changed and deprecated.

  • Django 1.1 added a number of new features for the admin, and most of its deprecations and breaking changes were related to the admin.
  • Django 1.2 added multiple database support and improved the CSRF framework, which deprecated the old DB and CSRF settings and code.
  • Django 1.3 introduced static file handling and class-based views. This started the deprecation of the old function-based generic views and the old-style url tag.
  • Django 1.4 changed the default project layout and the manage.py script. It also improved timezone support, and upgrading usually started with tracking down RuntimeWarnings about naive datetimes.
  • Django 1.5 added the customizable User model, but more important in terms of upgrading was the removal of function-based generic views like direct_to_template and redirect_to. You can see a great post about changing from the built-in User to a custom User model on our blog. The url tag upgrade was also completed, so if your templates weren't updated yet, you'd have a lot of work to do.
  • Django 1.6 reworked transaction handling and deprecated all of the old transaction management API.
  • Finally, in the upcoming 1.7 version, Django will add built-in migrations, and projects will need to move away from South. The app-loading refactor is also landing, which changes how signals should be registered and how apps like the admin should manage auto-discovery.
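As a small illustration of the Django 1.4 timezone chore mentioned above, here is a minimal sketch (assuming USE_TZ = True in settings) of the naive-versus-aware distinction those RuntimeWarnings complain about; timezone.now() is the usual drop-in replacement for datetime.now() during that cleanup:

from datetime import datetime

from django.utils import timezone

naive = datetime.now()   # no tzinfo; saving this to a DateTimeField triggers a RuntimeWarning
aware = timezone.now()   # timezone-aware (UTC); what Django 1.4+ expects when USE_TZ = True

print(timezone.is_naive(naive))   # True
print(timezone.is_aware(aware))   # True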

Do Some Spring Cleaning

As you upgrade your project, remember that there are new features in the newer Django versions. Take the opportunity to refactor code which wasn't easily handled by older versions of Django. Django 1.2's "smart if" and "elif" (added in 1.4) can help clean up messy template blocks. Features like prefetch_related (added in 1.4) can help reduce queries on pages loading a number of related objects. The update_fields parameter on the save method (added in Django 1.5) is another place where applications can lower overhead and reduce the risk of parallel requests overwriting each other's data.
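Here is a rough sketch of both of those cleanups; Author and Book are hypothetical models with a reverse book_set relation, not code from any particular project:

# prefetch_related (Django 1.4+): two queries total instead of one per author.
for author in Author.objects.prefetch_related('book_set'):
    titles = [book.title for book in author.book_set.all()]
    print(titles)

# update_fields (Django 1.5+): write only the column that changed, which narrows
# the window in which a parallel request could overwrite the other fields.
author = Author.objects.get(pk=1)
author.num_books = author.book_set.count()
author.save(update_fields=['num_books'])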

There will also be reusable third-party applications which are no longer compatible with the latest Django versions. However, in most cases there are better applications which are up to date. Switching reusable apps can be difficult and disruptive, but in most cases it's better than letting an outdated app hold you back from the latest Django release. Unless you are willing to take on the full-time maintenance of an app which is a release or two behind Django, you are better off looking for an alternative.

Summary

Those are the highlights from our experiences. Get a stable test suite in place before starting, take it one release at a time, and do some spring cleaning along the way. Overall it's much less work to upgrade projects as soon as possible after new Django versions are released. But we understand dependencies and other deadlines can get in the way. If you find yourself a few releases behind, we hope this can help guide you in upgrading.

 

Mark Lavin is the author of the forthcoming book, Lightweight Django, from O'Reilly Media.

 

Caktus GroupChapelboro.com: Carrboro Firm Develops Web App to Register Voters in Libya

Chapelboro.com recently featured Caktus’ work in implementing the first ever voter registration system via text message.

Caktus GroupO'Reilly Deal: 50% Off Lightweight Django

O'Reilly Media, the go-to source for technical books, just let us know that they're having a 50% off sale on eBook pre-orders of Lightweight Django today. Use coupon code: DEAL.

Lightweight Django is being written by our very own Technical Director, Mark Lavin, and Caktus alumna Julia Elman. We would've thought the book was a fantastic intro to the power of Django in web app development anyway, but since Mark and Julia wrote it, we think it's extra fantastic.

Mark and Julia are continuing to write, but O'Reilly is providing this special pre-release peek for pre-orders. Those who pre-order automatically receive the first three chapters, new content as it's added, the complete ebook, free lifetime access, multiple file formats, and free updates.

Og MacielFauxFactory 0.3.0

Took some time from my vacation and released FauxFactory 0.3.0 to make it Python 3 compatible and to add a new generate_utf8 method (plus some nice tweaks and code cleanup).

As always, the package is available on PyPI and can be installed via pip install fauxfactory.

If you have any constructive feedback or suggestions, or would like to file a bug report or feature request, please use the GitHub page.

Caktus GroupCaktus + Durham Bulls Game!

Is there a better way to celebrate the first day of summer than a baseball game? To ring in summer, the Caktus team and their families attended a Durham Bulls game. It was a great chance to hang out in our new city before relocating later this fall.

Caktus @ Durham Bulls Game
Caktus @ Durham Bulls Game
Caktus @ Durham Bulls Game
Caktus @ Durham Bulls Game
Caktus @ Durham Bulls Game
Caktus @ Durham Bulls Game
Caktus @ Durham Bulls Game
Caktus @ Durham Bulls Game

Caktus GroupTechPresident: Libya Uses World's First Mobile Voter Registration System

Caktus team members from our Libya mobile voter registration team recently spoke with TechPresident about the context and challenges of implementation.

Caktus GroupCaktus Supports Libya Elections with World’s First SMS Voter Registration System

Today’s election in Libya, the second general election for a governing body since Gaddafi’s ouster, is being supported in-country by our Caktus team. Caktus developers created Libya's SMS voter registration system, the first of its kind in the world.

Since 2013, we have worked closely with the Libyan government to create mobile applications that would enable poll workers and citizens to register to vote. The system currently has over 1.5 million registrants. Using lessons learned in the first national test of the system during the February elections for the constitutional draft writers, we’re excited to be on the ground, supporting the Libyan government.

Our work includes data management, running reports to show progress throughout the day, and assisting poll workers in verifying registration data. With more than 12 tons of paper registrations that resulted from SMS registrations, the vast amount of data streaming to and from the system is keeping our team on their toes.

There are many news articles describing the political instability and significant security challenges faced by Libya. There is no question that the situation is difficult. However, we see the hope and excitement of not only Libya’s election staff, but also in the citizens of this fledgling democracy. We are proud to be amongst the organizations working to support Libya’s democratic transition.

Caktus GroupGetting Started Scheduling Tasks with Celery

Many Django applications can make good use of being able to schedule work, either to run periodically or simply to avoid blocking the request thread.

There are multiple ways to schedule tasks in your Django app, but there are some advantages to using Celery. It’s supported, scales well, and works well with Django. Given its wide use, there are lots of resources to help learn and use it. And once learned, that knowledge is likely to be useful on other projects.

Celery versions

This documentation applies to Celery 3.0.x. Earlier or later versions of Celery might behave differently.

Introduction to Celery

The purpose of Celery is to allow you to run some code later, or regularly according to a schedule.

Why might this be useful? Here are a couple of common cases.

First, suppose a web request has come in from a user, who is waiting for the request to complete so a new page can load in their browser. Based on their request, you have some code to run that's going to take a while (longer than the person might want to wait for a web page), but you don't really need to run that code before responding to the web request. You can use Celery to have your long-running code called later, and go ahead and respond immediately to the web request.

This is common if you need to access a remote server to handle the request. Your app has no control over how long the remote server will take to respond, or the remote server might be down.

Another common situation is wanting to run some code regularly. For example, maybe every hour you want to look up the latest weather report and store the data. You can write a task to do that work, then ask Celery to run it every hour. The task runs and puts the data in the database, and then your Web application has access to the latest weather report.

A task is just a Python function. You can think of scheduling a task as a time-delayed call to the function. For example, you might ask Celery to call your function task1 with arguments (1, 3, 3) after five minutes. Or you could have your function batchjob called every night at midnight.

We'll set up Celery so that your tasks run in pretty much the same environment as the rest of your application's code, so they can access the same database and Django settings. There are a few differences to keep in mind, but we'll cover those later.

When a task is ready to be run, Celery puts it on a queue, a list of tasks that are ready to be run. You can have many queues, but we'll assume a single queue here for simplicity.

Putting a task on a queue just adds it to a to-do list, so to speak. In order for the task to be executed, some other process, called a worker, has to be watching that queue for tasks. When it sees tasks on the queue, it'll pull off the first and execute it, then go back to wait for more. You can have many workers, possibly on many different servers, but we'll assume a single worker for now.

We'll talk more later about the queue, the workers, and another important process that we haven't mentioned yet, but that's enough for now, let's do some work.

Installing celery locally

Installing celery for local use with Django is trivial - just install django-celery:

$ pip install django-celery

Configuring Django for Celery

To get started, we'll just get Celery configured to use with runserver. For the Celery broker, which we will explain more about later, we'll use a Django database broker implementation. For now, you just need to know that Celery needs a broker and we can get by using Django itself during development (but you must use something more robust and better performing in production).

In your Django settings.py file:

  1. Add these lines:
import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'

The first two lines are always needed. Line 3 configures Celery to use its Django broker.

Important: Never use the Django broker in production. We are only using it here to save time in this tutorial. In production you'll want to use RabbitMQ, or maybe Redis.

  2. Add djcelery and kombu.transport.django to INSTALLED_APPS:
INSTALLED_APPS = (
   ...
   'djcelery',
   'kombu.transport.django',
   ...
)

djcelery is always needed. kombu.transport.django is the Django-based broker, for use mainly during development.

  3. Create celery's database tables. If using South for schema migrations:
$ python manage.py migrate

Otherwise:

$ python manage.py syncdb

Writing a task

As mentioned before, a task can just be a Python function. However, Celery does need to know about it. That's pretty easy when using Celery with Django. Just add a tasks.py file to your application, put your tasks in that file, and decorate them. Here's a trivial tasks.py:

from celery import task

@task()
def add(x, y):
    return x + y

When djcelery.setup_loader() runs from your settings file, Celery will look through your INSTALLED_APPS for tasks.py modules, find the functions marked as tasks, and register them for use as tasks.

Marking a function as a task doesn't prevent calling it normally. You can still call it: z = add(1, 2) and it will work exactly as before. Marking it as a task just gives you additional ways to call it.

Scheduling it

Let's start with the simple case we mentioned above. We want to run our task soon, we just don't want it to hold up our current thread. We can do that by just adding .delay to the name of our task:

from myapp.tasks import add

add.delay(2, 2)

Celery will add the task to its queue ("worker, please call myapp.tasks.add(2, 2)") and return immediately. As soon as an idle worker sees it at the head of the queue, the worker will remove it from the queue, then execute it:

import myapp.tasks

myapp.tasks.add(2, 2)

A warning about import names

It's important that your task is always imported and referred to using the same package name. For example, depending on how your Python path is set up, it might be possible to refer to it as either myproject.myapp.tasks.add or myapp.tasks.add. Or from myapp.views, you might import it as .tasks.add. But Celery has no way of knowing those are all the same task.

djcelery.setup_loader() will register your task using the package name of your app in INSTALLED_APPS, plus .tasks.functionname. Be sure when you schedule your task, you also import it using that same name, or very confusing bugs can occur.
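To make that concrete, here is the habit that avoids the problem, assuming myapp is the entry in INSTALLED_APPS (the module and task names are the same illustrative ones used elsewhere in this post):

# Wherever you schedule the task (views, management commands, other tasks),
# import it by the same absolute path Celery registered it under:
from myapp.tasks import add

add.delay(2, 2)    # queued and looked up as "myapp.tasks.add"

# Avoid mixing in spellings like "from .tasks import add" or
# "from myproject.myapp.tasks import add" for the same module, since Celery
# may not treat them as the same registered task.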

Testing it

Start a worker

As we've already mentioned, a separate process, the worker, has to be running to actually execute your Celery tasks. Here's how we can start a worker for our development needs.

First, open a new shell or window. In that shell, set up the same Django development environment - activate your virtual environment, or add things to your Python path, whatever you do so that you could use runserver to run your project.

Now you can start a worker in that shell:

$ python manage.py celery worker --loglevel=info

The worker will run in that window, and send output there.

Run your task

Back in your first window, start a Django shell and run your task:

$ python manage.py shell
>>> from myapp.tasks import add
>>> add.delay(2, 2)

You should see output in the worker window indicating that the worker has run the task:

[2013-01-21 08:47:08,076: INFO/MainProcess] Got task from broker: myapp.tasks.add[e080e047-b2a2-43a7-af74-d7d9d98b02fc]
[2013-01-21 08:47:08,299: INFO/MainProcess] Task myapp.tasks.add[e080e047-b2a2-43a7-af74-d7d9d98b02fc] succeeded in 0.183349132538s: 4

An Example

Earlier we mentioned using Celery to avoid delaying responding to a web request. Here's a simplified Django view that uses that technique:

# views.py

from django.shortcuts import render_to_response

from myapp.tasks import do_something_with_form_data


def view(request):
    form = SomeForm(request.POST)   # SomeForm is defined elsewhere in the app
    if form.is_valid():
        data = form.cleaned_data
        # Schedule a task to process the data later, then respond right away
        do_something_with_form_data.delay(data)
    return render_to_response(...)

# tasks.py

from celery import task


@task
def do_something_with_form_data(data):
    call_slow_web_service(data['user'], data['text'], ...)

Troubleshooting

It can be frustrating trying to get Celery tasks working, because multiple parts have to be present and communicating with each other. Many of the usual tips still apply:

  • Get the simplest possible configuration working first.
  • Use the Python debugger and print statements to see what's going on.
  • Turn up logging levels (e.g. --loglevel debug on the worker) to get more insight.

There are also some tools that are unique to Celery.

Eager scheduling

In your Django settings, you can add:

CELERY_ALWAYS_EAGER = True

and Celery will bypass the entire scheduling mechanism and call your code directly.

In other words, with CELERY_ALWAYS_EAGER = True, these two statements run just the same:

add.delay(2, 2)
add(2, 2)

You can use this to get your core logic working before introducing the complication of Celery scheduling.

Peek at the Queue

As long as you're using Django itself as your broker for development, your queue is stored in a Django database. That means you can look at it easily. Add a few lines to admin.py in your application:

from django.contrib import admin
from kombu.transport.django import models as kombu_models

admin.site.register(kombu_models.Message)

Now you can go to /admin/django/message/ to see if there are items on the queue. Each message is a request from Celery for a worker to run a task. The contents of the message are rather inscrutable, but just knowing if your task got queued can sometimes be useful. The messages tend to stay in the database, so seeing a lot of messages there doesn't mean your tasks aren't getting executed.

Check the results

Anytime you schedule a task, Celery returns an AsyncResult object. You can save that object, and then use it later to see if the task has been executed, whether it was successful, and what the result was.

result = add.delay(2, 2)
...
if result.ready():
    print "Task has run"
    if result.successful():
        print "Result was: %s" % result.result
    else:
        if isinstance(result.result, Exception):
            print "Task failed due to raising an exception"
            raise result.result
        else:
            print "Task failed without raising exception"
else:
    print "Task has not yet run"

Periodic Scheduling

Another common case is running a task on a regular schedule. Celery implements this using another process, celerybeat. Celerybeat runs continually, and whenever it's time for a scheduled task to run, celerybeat queues it for execution.

For obvious reasons, only one celerybeat process should be running (unlike workers, where you can run as many as you want and need).

Starting celerybeat is similar to starting a worker. Start another window, set up your Django environment, then:

$ python manage.py celery beat

There are several ways to tell celery to run a task on a schedule. We're going to look at storing the schedules in a Django database table. This allows you to easily change the schedules, even while Django and Celery are running.

Add this setting:

CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'

You can now add schedules by opening the Django admin and going to /admin/djcelery/periodictask/. Here's how the fields on a periodic task are used (a code sketch of creating the same records programmatically follows the list):

  • Name — Any name that will help you identify this scheduled task later.
  • Task (registered) — This should give a choice of any of your defined tasks, as long as you've started Django at least once after adding them to your code. If you don't see the task you want here, it's better to figure out why and fix it than use the next field.
  • Task (custom) — You can enter the full name of a task here (e.g. myapp.tasks.add), but it's better to use the registered tasks field just above this.
  • Enabled — You can uncheck this if you don't want your task to actually run for some reason, for example to disable it temporarily.
  • Interval — Use this if you want your task to run repeatedly with a certain delay in between. You'll probably need to use the green "+" to define a new schedule. This is pretty simple, e.g. to run every 5 minutes, set "Every" to 5 and "Period" to minutes.
  • Crontab — Use crontab, instead of Interval, if you want your task to run at specific times. Use the green "+" and fill in the minute, hour, day of week, day of month, and month of year. You can use "*" in any field in place of a specific value, but be careful - if you use "*" in the Minute field, your task will run every minute of the hour(s) selected by the other fields. Example: to run every morning at 7:30 am, set Minute to "30", Hour to "7", and the remaining fields to "*".
  • Arguments — If you need to pass arguments to your task, you can open this section and set *args and **kwargs.
  • Execution Options — Advanced settings that we won't go into here.
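If you would rather create these entries from code than click through the admin, here is a rough sketch using djcelery's models. The field names mirror the admin form described above, and the task is the add example from earlier; treat the exact model fields as assumptions to check against your djcelery version:

import json

from djcelery.models import CrontabSchedule, PeriodicTask

# Run myapp.tasks.add(2, 2) every morning at 7:30.
schedule, _ = CrontabSchedule.objects.get_or_create(
    minute='30',
    hour='7',
    day_of_week='*',
    day_of_month='*',
    month_of_year='*',
)
PeriodicTask.objects.create(
    name='Add two and two every morning',   # any label that helps you find it later
    task='myapp.tasks.add',                 # the registered task name
    crontab=schedule,
    args=json.dumps([2, 2]),                # positional arguments, stored as JSON
)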

Default schedules

If you want some of your tasks to have default schedules, and not have to rely on someone setting them up in the database after installing your app, you can use Django fixtures to provide your schedules as initial data for your app.

  • Set up the schedules you want in your database.
  • Dump the schedules in json format:
$ python manage.py dumpdata djcelery --indent=2 --exclude=djcelery.taskmeta >filename.json
  • Create a fixtures directory inside your app
  • If you never want to edit the schedules again, you can copy your json file to initial_data.json in your fixtures directory. Django will load it every time syncdb is run, and you'll either get errors or lose your changes if you've edited the schedules in your database. (You can still add new schedules, you just don't want to change the ones that came from your initial data fixture.)
  • If you just want to use these as the initial schedules, name your file something else, and load it when setting up a site to use your app:
$ python manage.py loaddata <your-app-label>/fixtures/<your-filename>.json

Hints and Tips

Don't pass model objects to tasks

Since tasks don't run immediately, by the time a task runs and looks at a model object that was passed to it, the corresponding record in the database might have changed. If the task then does something to the model object and saves it, those changes in the database are overwritten by older data.

It's almost always safer to save the object, pass the record's key, and look up the object again in the task:

myobject.save()
mytask.delay(myobject.pk)

...


@task
def mytask(pk):
    myobject = MyModel.objects.get(pk=pk)
    ...

Schedule tasks in other tasks

It's perfectly all right to schedule one task while executing another. This is a good way to make sure the second task doesn't run until the first task has done some necessary work first.

Don't wait for one task in another

If a task waits for another task, the first task's worker is blocked and cannot do any more work until the wait finishes. This is likely to lead to a deadlock, sooner or later.

If you're in Task A and want to schedule Task B, and then do some more work after Task B completes, it's better to create a Task C to do that work and have Task B schedule Task C when it's done.
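Here is a minimal sketch of that pattern; the task and argument names are made up for illustration:

from celery import task


@task()
def task_b(record_id):
    # ... Task B's own work with record_id goes here ...
    # When B is finished, schedule the follow-up work instead of waiting on it.
    task_c.delay(record_id)


@task()
def task_c(record_id):
    # ... work that must only happen after Task B completes ...
    pass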

Next Steps

Once you understand the basics, parts of the Celery User's Guide are good reading. A few of its chapters make a good starting point; the others are either not relevant to Django users or more advanced.

Using Celery in production

The Celery configuration described here is for convenience in development, and should never be used in production.

The most important change to make in production is to stop using kombu.transport.django as the broker, and switch to RabbitMQ or something equivalent that is robust and scalable.

Caktus GroupReflecting on Challenges Faced by Female Developers

Karen Tracey, a Django core committer and Caktus Lead Developer and Technical Manager, recently participated in TriLUG’s panel on Women in Free and/or Open Source Software. Karen was one of five female developers who discussed challenges women face in joining the open source community. We recently caught up with Karen to discuss her own experience.

Why do you think there are so few women software developers?
This question always comes up, and there’s no single good answer. I think there are implicit and explicit messages women get from a very young age that this is not for them, but nobody really knows the complete answer. It was great to see a lot of women come to the meeting. I hope the panel was useful for them and encourages increased participation.

Did you think of computer science as a “boys only” field?
I’m old enough that when I was entering the field, women’s participation in computer science was at its highest levels. I entered at a time when women were joining a lot of professional fields-- law, medicine, etc. I had no reason to think computer engineering was different.

Also, I grew up in a bit of a bubble, with technical parents. My father worked for IBM, as had my mother before having children, and I had an IBM PC the year they came out. I also went to an all-girls high school, and I think that helped in the sense that there was no boys’ group to say this is a boys’ thing. For me, there wasn’t a lot of the pushing away that younger girls now see in the field.

I think enrollment in computer science degrees was at its highest when I went to college over twenty-five years ago. Notre Dame had far more men than women at the time, so the fact that there was a single-digit number of women in a class of around 100 seemed more like a reflection of the school’s gender ratio. I was not limited in what I could do.

Did you receive any negative messages at the beginning of your career?
I did with a professor who flat-out stated women shouldn’t be in technical fields. I was a grad student at the time and had received enough positive feedback by then that his opinion did not hold much weight with me, plus he said I was an “exception”. But his message could have been quite inhibiting to me a few years earlier I think. There have been multiple gender-related dustups through the overall open source community. When I first started using Django, I did question whether to sign my own name on my very first question posted to the django-users mailing list. I didn’t know if it was wise to reveal I was a woman before I was established in the community. I did and got an excellent welcome, but I was not sure what to expect having read about various ways in which women were disrespected in such communities.

What do you think individuals in the open source community can do to increase participation by women?
Be welcoming, including explicit individual invitations to attend/participate (this came up during the panel). Be aware that many women may have been receiving these “this is not for you” messages from a young age and try to counteract it. Be observant and try to notice any behavior by others which may be unwelcoming. If you see unwelcoming or bad behavior, take steps to correct it. For example, if someone makes an inappropriate joke, don’t just ignore it but rather make it clear to the joke-teller and whatever group that heard it that you don’t find it funny or appropriate.

Caktus GroupCTO Copeland Featured on WNCN for Open Government App

Colin Copeland, our Chief Technology Officer, recently spoke to WNCN about a new web application, NCFoodInspector.com, that lets Durham County visitors know the cleanliness of nearby restaurants. Colin helped build the application in his spare time as captain of Code for Durham Brigade, an all-volunteer group dedicated to using technology to improve access to publicly available sanitation scores. The group leverages open source technology to build applications.

NCFoodInspector.com displays a map and listing of restaurants, their sanitation scores, and details of any violations. This makes difficult-to-access health inspection information readily available for the first time. To ensure the app reaches multiple populations, it is also available in Spanish.

Colin says this is just the first of many future applications. The Brigade hopes to build more apps that can serve as a resource to the Durham County community using public information.

To view Colin’s interview, visit WNCN.

Og MacielTwenty Three Years

My parents were eagerly awaiting our arrival on an early Spring morning, and when our plane finally landed after the almost 10 1/2 hour flight and we made our way to the luggage claim area, the reunion was filled with a lot of hugging, laughter, and a huge sigh of relief. For someone who had spent most of his entire life in a small and sleepy town on the east coast of Brazil, waking up and finding himself at JFK Airport was nothing short of a major event! I had never seen so many people of so many different races speaking so many different dialects in my entire life, all 16 years of it! Everywhere I looked, everything was so different from what I was used to… even the signs (so many of them) were in a different language! Eventually we grabbed our luggage and made our way to the parking lot looking for our car.

Before my sister and I left Brazil, I had the very hard task of giving away all of my possessions and bringing only the bare minimum to start “a new life”. I was still going through my mid-teenage years, so I had to give away all of my favorite music LPs, books, childhood toys, and all the mementos I had collected through the years. This may not be such a big deal to you, but I have always been very attached to the things people give me, especially if they were given by someone I really cared about. Seeing the things that represented so many people and moments of my life slowly drift away filled me with a great feeling of personal loss. This feeling would stay with me for the next couple of years as I tried to adjust to my new adopted country. I was a stranger in a different land, where nobody knew me and I did not know anyone.

It’s been 23 years since this event took place, and I’m still here in the “Land of the Free”. Through the years I have survived High School, graduated with a Bachelor of Science from a university in Upstate New York, married (another immigrant, from another country, whom you shall meet soon), moved a couple of times, and now find myself raising three young girls in North Carolina, the first Maciel generation of our families to be born outside our countries! Our similarities and differences, however, go beyond only the generation gap!

You see, contrary to a lot of the “stereotypical” immigrant families, we have completely immersed ourselves in the American way of life and culture, with a dash of our childhood cultures sprinkled here and there to add a little diversity to the mix. My wife and I stopped observing the holidays from our countries of origin a long time ago, especially those with no corresponding holidays here. We share a lot of the things that we learned growing up with our kids, but always in a nostalgic, almost didactic sort of way. We speak a mix of Brazilian Portuguese-Mexican Spanish-New Jersey English at home and try our best not to force our children to learn either language in particular. As it stands now, our kids’ primary language is English, and even though I still make a habit of speaking Brazilian Portuguese to them, their vocabulary includes several words that they only say in Spanish or Portuguese, like the word “daddy”. My wife’s vocabulary has also gone through a very interesting transformation, and she now speaks more Portuguese than Spanish when talking to our kids. Maybe it is because she was very young when she moved to New York in the early 1990s and never really got a lot of exposure to the Spanish language growing up in a different country.

All I can say is that I call North Carolina home, I vote during elections, I always get emotional when hearing the American Anthem, and together with my wife I raise the next generation of the Maciel family! Maybe they will take some of our culture and teach it to their own kids one day… maybe one day they may even learn to speak Portuguese or Spanish… maybe they won’t, and that is ok by me. We don’t even force them to follow the same religion our parents (and their parents) taught us growing up, preferring that they make that decision on their own, when and if they’re ever interested in doing so. We want them to be able to choose their own paths and make educated decisions about every aspect of their lives without any pressure or guilt.

I’m an American-Brazilian, my wife is American-Mexican and our kids are Americans with a touch of Brazilian and Mexican pride and culture. Together we form the New American Family!

Footnotes