A planet of blogs from our members...

Joe Gregorio: Six Places

One of the questions that comes up regularly when talking about zero frameworks is how can you expect to stitch together an application without a framework? The short answer is "the same way you stitch together native elements," but I think it's interesting and instructional to look at those ways of stitching elements together individually.

There are six surfaces, or points of contact, between elements that you can use when stitching them together, whether they are native or custom elements.

Before we go further, a couple of notes on terminology and scope. For scope, realize that we are only talking about the DOM; we aren't talking about composing JS modules or strategies for composing CSS. For terminology, when talking about the DOM I'm referring to the DOM Interface for an element, not the element markup. There is a subtle difference between the markup for an element and the DOM Interface to that element.

For example, <img data-foo="5" src="https://example.com/image.png"/> may be the markup for an image. The corresponding DOM Interface has a src attribute with a value of "https://example.com/image.png", but it doesn't have a "data-foo" attribute; instead, all data-* attributes are available via the dataset attribute on the DOM Interface. In the terminology of the WHATWG Living Standard, this is the distinction between content attributes and IDL attributes, and I'll only be referring to IDL attributes. So, with the preliminaries out of the way, let's get into the six surfaces that can be used to stitch together an application.

Attributes and Methods

The first two surfaces, and probably the most obvious, are attributes and methods. If you are interacting with an element it's usually either reading and writing attribute values:

    ele.src = 'https://example.com/image.png';

or calling element methods:

    ele.focus();

Technically these are the same thing, as they are both just properties with different types. Native elements have their set of defined attributes and methods, and depending on which element a custom element is derived from it will also have that base element's attributes and methods along with the custom ones it defines.


Events

The next two surfaces are events. Events are actually two surfaces because an element can listen for events,

ele.addEventListener('some-event', function(e) { /* */ });

and an element can dispatch its own events:

var e = new CustomEvent('some-event', {detail: details});
ele.dispatchEvent(e);

DOM Position

The final two surfaces are position in the DOM tree, and again I'm counting this as two surfaces because each element has a parent and can be a parent to another element. Yeah, an element has siblings too, but that would bring the total count of surfaces to seven and ruin my nice round even six.

  parent.appendChild(child);
  child.parentElement;  // refers back to parent

Combinations are powerful

Let's look at a relatively simple but powerful example, the 'sort-stuff' element. This is a custom element that allows the user to sort elements. All children of 'sort-stuff' with an attribute of 'data-key' are used for sorting the children of the element pointed to by the sort-stuff's 'target' attribute. See below for an example usage:

<sort-stuff target="#sortable">
  <button data-key=one>Sort on One</button>
  <button data-key=two>Sort on Two</button>
  <ul id=sortable>
    <li data-one=c data-two=x>Item 3</li>
    <li data-one=a data-two=z>Item 1</li>
    <li data-one=d data-two=w>Item 4</li>
    <li data-one=b data-two=y>Item 2</li>
    <li data-one=e data-two=v>Item 5</li>
  </ul>
</sort-stuff>

If the user presses the "Sort on One" button then the children of #sortable are sorted in alphabetical order of their data-one attributes. If the user presses the "Sort on Two" button then the children of #sortable are sorted in alphabetical order of their data-two attributes.
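The sorting behavior is easy to model outside the DOM. Here's a rough Python sketch of it (the dicts and the sort_on helper are hypothetical, standing in for the li elements and the element's click handler):

```python
# Hypothetical model of the children of #sortable: each dict stands in
# for one <li>, holding its data-* attributes plus its text.
children = [
    {"one": "c", "two": "x", "label": "Item 3"},
    {"one": "a", "two": "z", "label": "Item 1"},
    {"one": "d", "two": "w", "label": "Item 4"},
    {"one": "b", "two": "y", "label": "Item 2"},
    {"one": "e", "two": "v", "label": "Item 5"},
]

def sort_on(key):
    # Sort the children alphabetically on the chosen data attribute,
    # as pressing the corresponding button would.
    return sorted(children, key=lambda child: child[key])

print([c["label"] for c in sort_on("one")])
# ['Item 1', 'Item 2', 'Item 3', 'Item 4', 'Item 5']
```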

Here is the definition of the 'sort-stuff' element:

    function Q(query) {
      // Return the nodes matching 'query' as a true Array.
      return Array.prototype.map.call(
          document.querySelectorAll(query),
          function(e) { return e; });
    }

    var SortStuffProto = Object.create(HTMLElement.prototype);

    SortStuffProto.createdCallback = function() {
      Q('[data-key]').forEach(function(ele) {
        ele.addEventListener('click', this.clickHandler.bind(this));
      }.bind(this));
    };

    SortStuffProto.clickHandler = function(e) {
      var target = Q(this.getAttribute('target'))[0];
      var elements = [];
      var children = target.children;
      for (var i=0; i<children.length; i++) {
        var ele = children[i];
        var value = ele.dataset[e.target.dataset.key];
        elements.push({
          value: value,
          node: ele
        });
      }
      elements.sort(function(x, y) {
        return (x.value == y.value ? 0 : (x.value > y.value ? 1 : -1));
      });
      elements.forEach(function(i) {
        target.appendChild(i.node);
      });
    };

    document.registerElement('sort-stuff', {prototype: SortStuffProto});

And here is a running example of the code above:

  • Item 3
  • Item 1
  • Item 4
  • Item 2
  • Item 5

Note the surfaces that were used in constructing this functionality:

  1. sort-stuff has an attribute 'target' that selects the element to sort.
  2. The target children have data attributes that elements are sorted on.
  3. sort-stuff registers for 'click' events from its children.
  4. sort-stuff children have data attributes that determine how the target children will be sorted.

In addition you could imagine adding a custom event 'sorted' that 'sort-stuff' could generate each time it sorts.

So there's your six surfaces that you can use when composing elements into your application. And why the insistence on making the number of surfaces equal six? Because while history may not repeat itself, it does rhyme.

Joe Gregorio: gRPC

Today Google launched gRPC, a new HTTP/2 and Protocol Buffer based system for building APIs. This is Google's third system for web APIs.

The first system was Google Data, which was based on the Atom Publishing Protocol [RFC 5023]. It was an XML protocol over HTTP. The serving system for that grew, but started to hit scalability issues at around 50 APIs. The scaling issues weren't in the realm of serving QPS, but more in the management of that many APIs, such as rolling out new features across all APIs and all clients.

Those growing pains and lessons learned led to the next generation of APIs that launched in 2010. In addition to writing a whole new serving infrastructure to make launching APIs easier, it was also a time to shed XML and build the protocol on JSON. This Google I/O video contains a good overview of the system:

Now, five years later, a third generation API system has been developed, and the team took the opportunity to make another leap, moving to HTTP/2 and Protocol Buffers. This is the first web API system from Google that I haven't been involved in, but I'm glad to see them continuing to push the envelope on web APIs.

Caktus Group: PyCon 2015 Ticket Giveaway

Caktus is giving away a PyCon 2015 ticket, valued at $350. We love going to PyCon every year. It’s the largest gathering of developers using Python, the open source programming language that Caktus relies on. This year, it’ll be held April 8th-16th at the beautiful Palais des congrès de Montréal (the inspiration we used to design the website).

To enter, follow @caktusgroup on Twitter and RT this message.

The giveaway will end Tuesday, March 3rd at 12pm EST. Winner will be notified via Twitter DM. A response via DM is required within 24 hours or entrant forfeits their ticket. Caktus employees are not eligible. Winning entrant must be 18 years of age or older. Ticket is non-transferable.

Bonne chance!

Frank Wierzbicki: Jython 2.7 beta4 released!

[Update: some of the download links were wrong; they should now be correct. Sorry for the mistake!] On behalf of the Jython development team, I'm pleased to announce that the fourth beta of Jython 2.7 is available. I'd like to thank Amobee for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Jython 2.7b4 brings us up to language-level compatibility with the 2.7 version of CPython. We have focused largely on CPython compatibility, and so this release of Jython can run more pure Python apps than any previous release. Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

Jim Baker put together a great summary of the recent work for beta4. As a beta release we are concentrating on bug fixing and stabilization for a production release.

This release is being hosted at Maven Central. The traditional installer can be found here; see the installation instructions for using the installer. Two other versions are also available. To see all of the files available, including checksums, go here and navigate to the appropriate distribution and version.

Caktus Group: Triangle Open Data Day and Code Across

International Open Data Day is this weekend, February 21st and 22nd. As part of the festivities, Code for America is hosting its 4th annual CodeAcross. The event aims to unite developers across the country for a day of civic coding, creating tools that make government services simple, effective, and easy to use. Put simply, “the goal of CodeAcross is to activate the Code for America network and inspire residents everywhere to get actively involved in their community.”

Technology Tank is hosting CodeAcross as part of Triangle Open Data Day, a chance for people living in the Triangle to come together to learn about open data and hacking for civic good. This year will involve a civic hackathon with something for everyone, from novice to expert coders alike.

Not only is Caktus an official bronze sponsor of this year’s Triangle Open Data Day, but CTO Colin Copeland is also the founder and co-captain of Code for Durham, the Durham chapter of Code for America.

Not sure whether you should attend? Don’t worry, Code for America has provided a helpful flowchart to help you decide. We’ll hope to see you there!

Josh Johnson: Clojure + Boot Backend For The React.js Tutorial

Last weekend I worked my way through the introductory tutorial for React.js. It’s very well written and easy to follow; I was really happy with it overall.

For the uninitiated, React.js is a framework that provides a means to create reusable JavaScript components that emit HTML in a very intuitive way. Taking that concept a step further, it’s possible to use React on the backend, utilizing the same components to build the UI that is served to the user initially. The end result is very interesting. React prescribes an intuitive and scalable approach to building complex, dynamic user interfaces as highly reusable components.

These user interfaces avoid the redundancy of generating and manipulating HTML twice – once on the server, and again in the browser.

The server-side rendering feels like a natural pattern in a Node.js environment, but there are examples in the wild of doing server-side rendering with other platforms, most notably Clojure. This is exciting stuff.

React has been around for a while, but this is the first time I’ve taken a close look at it.

The tutorial focuses on building a simple front-end application rendered entirely in the browser. Initially, you work with a standalone HTML page, and near the end, you integrate it with a simple web application.

The source repository for the tutorial provides some example applications written in Python, Ruby and Node.js.

A simple application like this seemed like an ideal use case for a simple boot script, so I decided to write one of my own. Here’s the code inline, but I’ve forked the repository if you’d like to examine the code alongside its cohorts.

#!/usr/bin/env boot
(set-env! :dependencies
  #(into % '[[org.clojure/data.json "0.2.5"]
             [ring/ring-core "1.3.2"]
             [ring/ring-jetty-adapter "1.3.2"]]))

(require '[ring.adapter.jetty     :as jetty]
         '[clojure.data.json      :as json]
         '[ring.middleware.params :refer [wrap-params]]
         '[ring.util.response     :refer [file-response response]])

(defn static
  "Handle static file delivery"
  [request]
  (let [uri (:uri request)
        path (str "./public" uri)]
    (if (= uri "/comments.json")
      (file-response "./_comments.json")
      (file-response path))))

(defn save-comments
  "Save the comments to the json file, and return the new data"
  [request]
  (let [data (json/read-str (slurp "./_comments.json"))
        input (:form-params request)
        out (concat data [input])
        new-json (json/write-str out)]
    (spit "./_comments.json" new-json)
    (response new-json)))

(defn handler
  "Simple handler that delegates based on the request type"
  [request]
  (case (:request-method request)
    :post (save-comments request)
    :get (static request)))

(def app
  "Add middleware to the main handler"
  (wrap-params handler))

(defn -main
  [& args]
  (jetty/run-jetty app {:port 3000}))

Essentially, it sets up two handlers, and then a dispatcher that proxies between them depending on the type of request. If the request is a GET, a static file is assumed. This serves the html and any local dependencies. If the request is specifically for comments.json, the handler serves the _comments.json file.

If the request is a POST, it’s assumed that the body of the request contains a JSON-encoded comment to add. It deserializes that data and the _comments.json file, and appends the new comment to the list. The result is then saved to the filesystem.

Obviously, there is little in the way of error checking going on here. This tracks with the scope of the other example applications.

Note: It’s not clear to me exactly why they used _comments.json to store the data – in my initial prototype I named it comments.json and placed it with the other static files.

Interestingly, this boot script also serves as a minimalistic example of a web application using ring – including adding middleware.

This was a fun way to finish up a really informative tutorial – I’m excited to continue exploring what React.js can do, especially with Clojure!

Special thanks to alandipert and ul from #hoplon for code review and some great advice on cleaning up my initial implementation!

Caktus Group: Astro Code School Tapped to Teach App Development at UNC Journalism School

Our own Caleb Smith, Astro Code School lead instructor, is teaching this semester at UNC’s School of Journalism, one of the nation’s leading journalism schools. He’s sharing his enthusiasm for Django application development with undergraduate and graduate media students in a 500-level course, Advanced Interactive Development.

For additional details about the course and why UNC School of Journalism selected Caktus and Astro Code School, please see our press release.

Caktus Group: PyCon Blog Features Caktus Group

Brian Curtin, a director of the Python Software Foundation, recently interviewed and featured Caktus on the PyCon website. PyCon is the premier event for those of us within the Python and Django open source communities. Brian writes about our work designing the PyCon 2015 website, our efforts in Libya, and what’s on the horizon in 2015. We're excited about this recognition!

Caktus Group: Django Logging Configuration: How the Default Settings Interfere with Yours

My colleague Vinod recently found the answer on Stack Overflow to something that's been bugging me for a long time - why do my Django logging configurations so often not do what I think they should?

Short answer

If you want your logging configuration to behave sensibly, set LOGGING_CONFIG to None in your Django settings, and do the logging configuration from scratch using the Python APIs:

LOGGING_CONFIG = None
LOGGING = {...}  # whatever you want

import logging.config
logging.config.dictConfig(LOGGING)


The kernel of the explanation is in this Stack Overflow answer by jcotton; kudos to jcotton for the answer: before processing your settings, Django establishes a default configuration for Python's logging system, but you can't override it the way you would think, because disable_existing_loggers doesn't work quite the way the Django documentation implies.

The Django documentation for disable_existing_loggers in 1.6, 1.7, and dev (as of January 8, 2015) says "If the disable_existing_loggers key in the LOGGING dictConfig is set to True (which is the default) the default configuration is completely overridden." (emphasis added)

That made me think that I could set disable_existing_loggers to True (or leave it out) and Django's previously established default configuration would have no effect.

Unfortunately, that's not what happens. The disable_existing_loggers flag only does literally what it says: it disables the existing loggers, which is different from deleting them. The result is that they stay in place, they don't log any messages, but they also don't propagate any messages to any other loggers that might otherwise have logged them, regardless of whether they're configured to do so.
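You can watch this happen with plain Python logging, no Django required (a minimal sketch; the logger names are made up):

```python
import logging.config

# A logger created before dictConfig runs, standing in for the loggers
# Django sets up before your LOGGING setting is applied.
early = logging.getLogger("myapp.early")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": True,  # the default
    "root": {"level": "INFO"},
})

# The pre-existing logger wasn't deleted; it was merely switched off,
# so it silently swallows records instead of propagating them.
print(early.disabled)                             # True
print(logging.getLogger("myapp.early") is early)  # True: still registered
```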

What if you try the other option, and set disable_existing_loggers to False? Then your configuration is merged with the previous one (the default configuration that Django has already set up), without disabling the existing loggers. If you use Django's LOGGING setting with the default LOGGING_CONFIG, there is no setting that will simply replace Django's default configuration.
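The merge behavior is just as easy to verify in isolation (again a standalone sketch with made-up logger names):

```python
import logging.config

# Pre-existing configuration, standing in for Django's defaults.
early = logging.getLogger("merge.demo")
early.setLevel(logging.ERROR)

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {"merge.other": {"level": "DEBUG"}},
})

# The earlier setup is left in place alongside the new configuration:
print(early.disabled)                  # False: left enabled
print(early.level == logging.ERROR)    # True: prior level still in effect
```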

Because Django installs several django loggers, the result is that unless you happened to have specified your own configuration for each of them (replacing Django's default loggers), you have some hidden loggers possibly blocking what you expect to happen.

For example - when I wasn't sure what was going on in a Django project, sometimes I'd try just adding a root logger, to the console or to a file, so I could see everything. I didn't know that the default Django loggers were blocking most log messages from Django itself from ever reaching the root logger, and I would get very frustrated trying to see what was wrong with my logging configuration. In fact, my own logging configuration was probably fine; it was just being blocked by a hidden, overriding configuration I didn't know about.

We could work around the problem by carefully providing our own configuration for each logger included in the Django default logging configuration, but that's subject to breaking if the Django default configuration changes.

The most fool-proof solution is to disable Django's own log configuration mechanism by setting LOGGING_CONFIG to None, then setting the log configuration explicitly ourselves using the Python logging APIs. There's an example above.
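Putting it together, a settings module taking that approach might look like the following sketch (the handler and formatter choices here are placeholders, not a recommendation):

```python
# settings.py (sketch)
LOGGING_CONFIG = None  # tell Django not to configure logging at all

import logging.config
logging.config.dictConfig({
    "version": 1,
    # With Django's defaults out of the picture this flag matters less,
    # but False is the least surprising choice.
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "simple"},
    },
    "root": {"handlers": ["console"], "level": "INFO"},
})
```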

The nitty-gritty

The Python documentation is more accurate: "disable_existing_loggers – If specified as False, loggers which exist when this call is made are left enabled. The default is True because this enables old behavior in a backward-compatible way. This behavior is to disable any existing loggers unless they or their ancestors are explicitly named in the logging configuration."

In other words, disable_existing_loggers does literally what it says: it leaves existing loggers in place, it just changes them to disabled.

Unfortunately, Python doesn't seem to document exactly what it means for a logger to be disabled, or even how to do it. The code seems to set a disabled attribute on the logger object. The effect is to stop the logger from calling any of its handlers on a log event. An additional effect of not calling any handlers is to also block propagation of the event to any parent loggers.
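A small standalone sketch (names made up) shows both effects: a disabled logger runs none of its handlers, and nothing propagates to its ancestors:

```python
import logging

captured = []

class ListHandler(logging.Handler):
    # Collects messages so we can see what reached the root logger.
    def emit(self, record):
        captured.append(record.getMessage())

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(ListHandler())

child = logging.getLogger("demo.child")
child.warning("before")   # propagates up to the root handler
child.disabled = True
child.warning("after")    # dropped: no handlers run, nothing propagates

print(captured)  # ['before']
```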

Status of the problem

There's been some recent discussion on the developers' list about at least improving the documentation, with a core developer offering to review anything submitted. And that's where things stand.

Caktus Group: We’re Launching a Django code school: Astro Code School

One of the best ways to grow the Django community is to have more high-quality Django developers. The good news is that we’ve seen sharply increasing demand for Django web applications. The challenge that we and many other firms face is that there’s much higher demand than there is supply: there aren’t enough high-quality Django developers. We’ve talked about this issue intensely internally and with our friends while at DjangoCon and PyCon. We decided that we can offer at least one solution: a new Django-focused code school.

We’re pleased to announce the launch of Astro Code School in Spring 2015. Astro will be the first Django code school on the East Coast. Programs include private trainings and weekend, 3-week, and 12-week full-time courses. In addition to Django, students will learn Python (of course), HTML, CSS, and JavaScript. They will come away being able to build web applications. The shorter programs will be geared towards beginners. The longer programs are for those with previous programming experience. Astro will also provide on-site, private corporate training, another area we frequently get asked about.

Astro will be a separate company under Caktus. To support Astro, we welcome Brian Russell, the new director of Astro. Brian is the former owner of Carrboro Creative Coworking, the place where Caktus got its start. In addition to being a long-term supporter of new developers, Brian is also an artist and entrepreneur. He has a special interest in increasing diversity within open source. Django itself is one of the most respectful and welcoming places for women and minorities and he’s excited to contribute.

Our first and leading instructor will be Caleb Smith, a Caktus developer since 2011. Caleb first joined Caktus as an intern, straight from his days as a public school music teacher. He continued to teach while at Caktus, supporting free and low-cost courses for women through the nonprofit Girl Develop It RDU. He’s also currently teaching an advanced web application course at the University of North Carolina’s School of Journalism and Mass Communication.

We’re building out the space for Astro currently on the first floor of our new headquarters in Downtown Durham. Astro Code School will have a dedicated 1,795 square feet of space. Construction should be complete by April.

Caktus Group: Why I Love Technical Blogging

I love writing blog posts, and today I’m setting out to do something I’ve never tried before: write a blog post about writing blog posts. A big part of our mission at Caktus is to foster and help grow the Python and Django development communities, both locally and nationally. Part of how we’ve tried to accomplish this in the past is through hosting development sprints, sponsoring and attending conferences such as PyCon and DjangoCon, and building a knowledge base of common problems in Python and Django development in our blog. Many in the Django community first get to know Caktus through our blog, and it’s both gratifying and humbling when I meet someone at a conference and the person thanks me for a post Caktus wrote that helped him or her solve a technical problem at some point in the past.

While I personally don’t do as much software development as I used to and hence no longer write as many technical posts, the Caktus blog and many others in the community continue as a constant source of inspiration and education to me. As software developers we are constantly trying to work ourselves out of a job, building tools that organize information and help people communicate. Sharing a brief, highly specific technical blog post serves in a similar capacity; after I’ve spent 1-2 hours or more researching something that ultimately took 5-10 minutes to fix, I’d hate for someone else to have to go through the same experience. Writing up a quick, 1-2 paragraph technical post about the issue not only helps me think through the problem, but also hopefully saves a few minutes of someone else’s life at some point in the future.

To help me better understand what I like so much about blogging, I went back and reviewed the history of Caktus blogging efforts over the past 5 years and separated our posts into categories. While I’m sure there are innumerable ways to do this, in case it serves as a source of inspiration to others, what follows are the categories I came up with:

  • Technical Tidbits. These types of posts are small, usually a paragraph or two, along with a code or configuration snippet. They might cover upgrading a specific open source package or reusable app in your project, or augment existing Django release notes when you find the built-in Django documentation lacking for a specific use case. Posts in this category that we’ve written in the past at Caktus include upgrading django-photologue and changing the SHMMAX setting (for PostgreSQL) on a Mac. These are great posts to write after you’ve just done something for the first time. You’ll have a fresher perspective than someone who’s done the task many times before. Because of this, you can easily anticipate many of the common problems someone coming to the task for the first time might face.

  • Debugging Sugar. Posts handy for debugging purposes often rely on a specific error message or stack trace. Another good candidate for this type of post is documenting an existing Django bug that requires a specific workaround. Posts we’ve written in this category include using strace to debug stuck celery tasks and the (thankfully now obsolete) parsing microseconds in the Django admin. A good sign you need to write a post like this is that you had to spend more than 5-10 minutes Googling for an answer to something or asking your co-workers. If you’re looking for an answer and having trouble finding it, there’s a good chance someone else out there is doing the same and would benefit from your blog post.

  • Open Source Showcase. Open Source Showcase posts are a great way to spread the word about a project you have or a teammate has written, or to validate a 3rd party app or Django feature you’ve found particularly helpful. These are typically longer, more in-depth analyses of a project or feature rather than an answer to a specific technical problem (though the two are not always mutually exclusive). At Caktus we’ve written about our django-scribbler app as well as several new features in Django, including bulk inserts, class-based views, and support for custom user models. While these posts can require a significant time investment to get right, their value as augmentation to or 3rd-party validation of Python and Django development patterns cannot be overstated. Patterns are set through a community rallying around an open source package or approach. Proposing and sharing these ideas openly is what drives the open source community forward.

  • Mini How-tos. Mini How-tos are generally a combination of other types of posts. They start with a specific goal in mind -- setting up a server, installing a reusable app -- and walk the reader through all the necessary steps, services, and packages required to get there. If you feel passionately that something should be done in a certain way, this is a great way to set a standard for the community to be aware of and potentially follow. This could cover anything from configuring a Jenkins slave to using Amazon S3 to store your static and uploaded media. Similar to an Open Source Showcase, Mini How-tos are an asset to the community insofar as they help advance and disseminate common approaches to software development problems. At the same time, they’re open to review and critique by the wider open source community.

A big thank you to everyone in the Python and Django community for being open and willing to share your experiences and problem solving efforts. Without this, Caktus would not be where it is today and for that I am deeply grateful. If this post happens to inspire at least one short technical post from someone who hasn’t written one before, I’ll consider it a success.

Caktus Group: Caktus is looking for a Web Design Director

Over the last two years Caktus’ design portfolio has rapidly been growing. We’ve taken on new projects primarily focused on design and have received community recognition for those efforts. We are happy to have grown our design capabilities to match the level of quality we demand from our Django developers. We have found it’s important to have strength on both sides of the table as each side challenges the other and forces the final product of our process to be as high quality as possible.

In an effort to continue to push ourselves and expand our web design skill sets, Caktus is looking to hire a new Web Design Director. We’re searching for someone who can do a bit of wireframing and user experience and then has the tools necessary to design and code pages. We’re looking for someone who is attuned to both form and function and knows where to focus depending on clients’ needs. Caktus is committed to doing good in our development communities as well as through the projects that we choose to work on, so we are also interested in finding someone who is engaged in the design community.

If you or someone you know would be a good fit, please apply to the position! If you have any questions get in touch.

Caktus Group: Making a Difference for Teens and Young Adults Living with HIV

Caktus has always pursued projects that make a difference in the world, particularly around HIV/AIDS. Now we’re hoping to provide a technology solution to a population that’s notoriously difficult to treat: teens and young adults living with HIV. They’re disproportionately impacted by new infections and, in the South in particular, those with HIV/AIDS have the lowest survival rates of any group living with the disease.

Our goal is to create an app that makes taking daily HIV medication fun. We’ll use games and social networking features to create a habit of taking medication, thereby lowering viral loads, improving health, and curbing the spread of HIV. Users will also be able to have their own private support network with other users. We began our efforts in 2012, building a prototype and conducting user experience tests. Based on that early success, we recently received a huge boost: a major Small Business Innovation Research grant through the National Institutes of Health in partnership with UNC Institute for Global Health and Infectious Diseases and the Duke Global Health Institute.

The grant, made to help socially conscious organizations like ours tackle major US research goals, enables us to expand our team, pictured above, for six months. We’re bringing on board Lucas Rowe (Front-End Developer), Edward Rowe (Game Developer), and Nkechinyere Nwoko (Product Manager). They’ll augment our existing team, Daryl (Project Manager), Wray (Front-End Developer), and Calvin (Developer). The grant also supports research on the project, led by Lisa Hightow-Weidman, MD, MPH, Associate Professor of Medicine at UNC and Sara LeGrand, PhD, Assistant Research Professor at the Duke Global Health Institute and Center for Health Policy and Inequalities Research. Additional research team members include Sybil Hosek, PhD, Kathryn Muessig, PhD, and Joseph Egger, PhD.

The ultimate goal of curbing HIV infection rates through the use of games is an ambitious one, but we hope to support the fight against HIV/AIDS by doing what we do best: building sharp tools that help others.

Caktus Group: Webinar: Testing Client-Side Applications with Django

Technical Director Mark Lavin will be hosting a free O’Reilly webinar today at 4PM EST or 1PM PT on Testing Client-Side Applications with Django. Mark says testing is one of the most popular question topics he receives. It’s also a topic near and dear to Caktus’ quality-loving heart. Mark’s last webinar garnered more than 500 viewers, so sign up quick!

Here’s a description from Mark:

During the session we'll examine a simple REST API with Django connected to a single page application built with Backbone. We'll look at some of the tools available to test the application with both Javascript unit tests and integration tests written in Python. We'll also look at how to organize them in a sane way for your project workflow.

To sign up, visit the webinar page on O’Reilly’s site.

Caktus Group: The Herald-Sun: Durham Firms MDC and Caktus Group Team Up to Assist in Healthcare Enrollment

We’re proud of our collaboration with Durham-based non-profit MDC to create the health insurance exchange alternative for North Carolina, NCGetCovered.org. The site was recently featured in the Herald-Sun. We’re excited to see our commitment to social good making an impact in our local community. Don’t forget the open enrollment deadline for health coverage is February 15th!

Caktus GroupThe Triangle Business Journal: Fighting the South's Low Survival Rate for Those with HIV/AIDS

The Triangle Business Journal recognized Caktus’ work in the fight against HIV/AIDS in an important article discussing lower survival rates for those diagnosed with the disease in the southern United States. The article lauds our work on the medication adherence app Epic Allies as innovative in the effort to develop and maintain consistent medication habits for those living with HIV/AIDS.

Caktus GroupAnnouncing Durham TriPython Project Nights @ Caktus Group

We’re happy to announce that TriPython will be hosting their project nights at our new offices in Durham. This means there’s now a TriPython project night in every major Triangle city every month. It’s great to see that the Triangle Python community has gotten so large.

We loved hosting the Carrboro meetings in our old location, so it’s great that we can continue our support in a new, larger space. The Durham project nights will occur at our offices every third Monday of the month. From TriPython:

Project nights are informal get-togethers. Users new to Python will find this a good opportunity to seek assistance. Seasoned Python users find project night an opportunity to share with others and cross-pollinate projects.
We hope you’ll join us. To celebrate, we’ll have a couple giveaways. The next Durham project night is tonight!

Tim HopperSundry Links for January 19, 2015

Matt Blodgett: But Where Do People Work in This Office?: "After looking through tons of cool office photos of many of the hottest companies in the Valley, I started to play a fun game I made up called 'spot the desks’. I’ll show you what I mean."

Why We (Still) Believe in Private Offices: Joel Spolsky and Fog Creek Software have been relentless defenders of quiet, private offices for developers. They continue that here.

Pandoctor: An Alfred GUI for Pandoc: If you use Pandoc and Alfred, this is worth trying.

Alfred Hop: I use a little bash tool called Hop to bookmark frequently used directories. I made this tool to give me quick access to my bookmarks from Alfred.

Discover Flask: Flask, the lightweight Python framework, is a joy to use. Here’s a nice introduction to it.

Introduction to dplyr: I haven’t used R much since leaving my last job, but the ecosystem has been booming with great tools; dplyr is one of them.

Josh JohnsonBoot: Getting Started With Clojure In < 10 Minutes

With the power of boot, it’s possible to go from “never used java before” to budding Clojure-ist cranking out jars like a pickle factory in record time. This post walks you through the process, and provides some post-‘hello world’ examples, with pointers to more information.


You will need the following: A JDK installed. Really, that’s it. Sun’s JDK or OpenJDK will work. Use the newest version.

You’ll need a way to download things. Feel free to use your browser. The examples below use wget.

If you’re on Linux or Mac OS, you’ll also need root access via sudo – this is not a hard requirement but allows you to install boot for everyone on your machine to use.

This post expects that you know basic Clojure, and tries not to be too clever. For a good introduction, check out Clojure For The Brave and True, specifically Do Things: a Clojure Crash Course.
If you need help with specific forms used, the Clojure Community Documentation is extremely helpful, especially the Clojure Cheat Sheet.

It may be helpful to give the boot readme and wiki documentation a read. If you have questions about boot, IRC is a great way to get boot and clojure rockstars to help you out. Come join us on freenode, in #hoplon.

Dales la Bota (Give ‘em The Boot)

Boot is ‘installed‘ by simply downloading an executable file and putting it somewhere where you can execute it:

$ wget https://github.com/boot-clj/boot/releases/download/2.0.0-rc8/boot.sh
$ mv boot.sh boot && chmod a+x boot && sudo mv boot /usr/local/bin

The real magic happens when boot is run. Boot sets everything up in a .boot directory in your home folder. Without having any code to execute yet, you can trigger this by simply asking boot for help:

$ boot -h

Let’s Play With Clojure

Clojure utilizes a concept called a REPL (Read, Evaluate, Print, Loop). REPLs allow you to interactively run code and experiment.

$ boot repl

Boot then provides you with a prompt, where you can play around:

boot.user=> (+ 1 2 3 4 5)
15
boot.user=> (/ 10 0)

java.lang.ArithmeticException: Divide by zero

Boot also works as a scripting platform – you can construct applications, specify dependencies, and parse command-line arguments.

Here’s a simple Clojure function that prints the fibonacci sequence to a given number of terms:

(defn fib
  ([n]
    (fib [0 1] n))
  ([pair, n]
    (print (first pair) " ")
    (if (> n 0)
      (fib [(second pair) (apply + pair)] (- n 1))
      (println))))

You can paste this into your REPL and try it out:

boot.user=> (defn fib
       #_=>   ([n]
       #_=>   (fib [0 1] n))
       #_=> ([pair, n]
       #_=>   (print (first pair) " ")
       #_=>   (if (> n 0)
       #_=>     (fib [(second pair) (apply + pair)] (- n 1))
       #_=>     (println))))
boot.user=> (fib 10)
0  1  1  2  3  5  8  13  21  34  55
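As an aside for readers coming from other languages, the pair-accumulator recursion above translates almost line for line into Python (an illustration for comparison only; the tutorial itself stays in Clojure):

```python
def fib(pair, n):
    """Mirror of the Clojure version: pair carries (current, next)."""
    if n == 0:
        return [pair[0]]
    # slide the window: next becomes current, their sum becomes next
    return [pair[0]] + fib((pair[1], pair[0] + pair[1]), n - 1)

# prints the same eleven numbers as the REPL session above
print("  ".join(str(x) for x in fib((0, 1), 10)))
```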
We can transform that function into a command-line tool using the power of boot scripting. Assume this file is called fib.boot:
#!/usr/bin/env boot

(defn fib
  ([n]
    (fib [0 1] n))
  ([pair, n]
    (print (first pair) " ")
    (if (> n 1)
      (fib [(second pair) (apply + pair)] (- n 1))
      (println))))

(defn -main [& args]
  (let [limit (first args)]
    (println "Printing fibonacci sequence up to " limit "numbers")
    (fib (Integer/parseInt limit))))

Make the script executable:

$ chmod u+x fib.boot

Now you can run the script:

$ ./fib.boot 10
Printing fibonacci sequence up to  10 numbers
0  1  1  2  3  5  8  13  21  34

The script can declare dependencies, which will be downloaded as needed when the script is run.

Here, we’ll show the use of an external dependency: we can write a new fibonacci implementation that exploits the fact that consecutive numbers in the sequence are related by approximately the golden ratio (ca. 1.62). Rounding makes it all work, but rounding isn’t “baked in” to Clojure, so we’ll use an external library to do it for us, called math.numeric-tower.

Ok, actually, it’s there, you just need to use some existing Java libraries to make it work – I admit this is a bit of a strain!
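The golden-ratio trick itself is easy to sanity-check; this short Python sketch (an illustration, not part of the boot script) shows how rounding the 1.62-multiples recovers the first several Fibonacci numbers:

```python
def fib_golden(n):
    """Approximate Fibonacci via the golden ratio: seed the first three
    values, then each new term is the previous one scaled by ~1.62,
    rounded back to the nearest integer."""
    out = [0, 1, 1]
    while len(out) < n:
        out.append(round(out[-1] * 1.62))
    return out[:n]

print(fib_golden(10))  # rounding snaps each estimate onto the true sequence
```

Note that 1.62 is only an approximation of the golden ratio (≈1.618), so the rounded estimates eventually drift from the true sequence; for the handful of terms these scripts print, they agree exactly.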

#!/usr/bin/env boot

(set-env! :dependencies '[[org.clojure/math.numeric-tower "0.0.4"]])
(require '[clojure.math.numeric-tower :refer [floor ceil round]])

(defn fib [n]
  (loop [counter 0 x 0]
    (if (= counter 0)
      (do (print 0 " " 1 " " 1 " ")
          (recur 3 1))
      (let [y (round (* x 1.62))]
        (print y " ")
        (if (< counter (dec n))
          (recur (+ counter 1) y))))))

(defn -main [& args]
  (let [limit (or (first args) "10")]
    (println "Printing fibonacci sequence up to" limit "numbers")
    (fib (Integer/parseInt limit))))

When you run this code the first time, you’ll notice boot tells you that it’s downloaded some new jars:

$ ./fib.boot
Retrieving clojure-1.4.0.jar from http://clojars.org/repo/
Retrieving math.numeric-tower-0.0.4.jar from http://repo1.maven.org/maven2/
Printing fibonacci sequence up to 10 numbers
0  1  1  2  3  5  8  13  21  34

The syntax to define our -main function and parse our command line options can be a bit tedious. Luckily, we can borrow a macro from boot.core that lets us specify CLI options using a concise, declarative syntax.

For the full syntax, check out the documentation.

Here, we’ll let the user choose which implementation they’d like to use, and utilize the task DSL to do some simple command line options:

#!/usr/bin/env boot

(set-env! :dependencies '[[org.clojure/math.numeric-tower "0.0.4"]])

(require '[clojure.math.numeric-tower :refer [floor ceil round]])
(require '[boot.cli :as cli])

(defn fib
  ([n]
    (fib [0 1] n))
  ([pair, n]
     (print (first pair) " ")
     (if (> n 1)
       (fib [(second pair) (apply + pair)] (- n 1)))))

(defn fibgolden [n]
  (loop [counter 0 x 0]
    (if (= counter 0)
      (do (print (str 0 "  " 1 "  " 1 "  "))
          (recur 3 1))
      (let [y (round (* x 1.62))]
        (print y " ")
        (if (< counter (dec n))
          (recur (+ counter 1) y))))))

(cli/defclifn -main
  "Print a fibonacci sequence to stdout using one of two algorithms."
  [g golden bool "Use the golden mean to calculate"
   n number NUMBER int "Quantity of numbers to generate. Defaults to 10"]
  (let [n (if number number 10)
        note (if golden "[golden]" "[recursive]")]
    (println note "Printing fibonacci sequence up to" n "numbers:")
    (if golden
      (fibgolden n)
      (fib n))))

Now you can see what options are available and tell the script what to do:

$ ./fib.boot -h
Print a fibonacci sequence to stdout using one of two algorithms.

  -h, --help           Print this help info.
  -g, --golden         Use the golden mean to calculate
  -n, --number NUMBER  Quantity of numbers to generate. Defaults to 10

$ ./fib.boot
[recursive] Printing fibonacci sequence up to 10 numbers:
0  1  1  2  3  5  8  13  21  34  

$ ./fib.boot -n 20
[recursive] Printing fibonacci sequence up to 20 numbers:
0  1  1  2  3  5  8  13  21  34  55  89  144  233  377  610  987  1597  2584  4181
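If defclifn’s DSL looks opaque, it helps to see it as declarative option parsing; a roughly equivalent setup using Python’s argparse (shown purely for comparison, not part of the boot script) would be:

```python
import argparse

# the same two options declared by the defclifn vector above
parser = argparse.ArgumentParser(
    description="Print a fibonacci sequence to stdout using one of two algorithms.")
parser.add_argument("-g", "--golden", action="store_true",
                    help="Use the golden mean to calculate")
parser.add_argument("-n", "--number", type=int, default=10, metavar="NUMBER",
                    help="Quantity of numbers to generate. Defaults to 10")

args = parser.parse_args(["-g", "-n", "20"])
print(args.golden, args.number)  # True 20
```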

Working At The Pickle Factory (Packing Java Jars and More Complex Projects)

Now that we’ve got a basic feel for Clojure and using boot, we can build a project that creates a library with an entry point, which we can use and distribute as a jar file.

This opens the doors to being able to deploy web applications, build libraries to share, and distribute standalone applications.

First, we need to create a project structure. This will help us keep things organized, and fit in with the way Clojure handles namespaces and files. We’ll put our source code in src, and create a new namespace, called fib.core:

$ mkdir -p src/fib

In src/fib/core.clj, we’ll declare our new namespace:

(ns fib.core
  (:require [clojure.math.numeric-tower :refer [floor ceil round]]
            [boot.cli :as cli]))

(defn fib
  ([n]
    (fib [0 1] n))
  ([pair, n]
    (print (first pair) " ")
    (if (> n 1)
      (fib [(second pair) (apply + pair)] (- n 1)))))

(defn fibgolden [n]
  (loop [counter 0 x 0]
    (if (= counter 0)
      (do (print (str 0 "  " 1 "  " 1 "  "))
          (recur 3 1))
      (let [y (round (* x 1.62))]
        (print y " ")
        (if (< counter (dec n))
          (recur (+ counter 1) y))))))

(cli/defclifn -main
  "Print a fibonacci sequence to stdout using one of two algorithms."
  [g golden bool "Use the golden mean to calculate"
   n number NUMBER int "Quantity of numbers to generate. Defaults to 10"]
  (let [n (if number number 10)
        note (if golden "[golden]" "[recursive]")]
    (println note "Printing fibonacci sequence up to" n "numbers:")
    (if golden
      (fibgolden n)
      (fib n))))

To build our jar, there are a handful of steps:

  1. Download our dependencies.
  2. Compile our clojure code ahead of time (aka AOT).
  3. Add a POM file describing our project and the version.
  4. Scan all of our dependencies and add them to the fileset to be put into the jar.
  5. Build the jar, specifying a module containing a -main function to run when the jar is invoked.

Helpfully, boot provides built-in functionality to do this for us. Each step is implemented as a boot task. Tasks act as a pipeline: the result of each can influence the next.
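The pipeline model can be sketched in a few lines of Python (a toy analogue for intuition, not boot’s actual implementation): each task is middleware that wraps the next handler, and the fileset flows through the whole chain.

```python
def task(name):
    """A toy boot task: middleware that records its name on the fileset."""
    def middleware(next_handler):
        def handler(fileset):
            # do this task's "work", then hand the result to the next task
            return next_handler(fileset + [name])
        return handler
    return middleware

def compose(*tasks):
    """Chain tasks right-to-left, like Clojure's (comp ...)."""
    pipeline = lambda fs: fs  # innermost handler: just return the fileset
    for t in reversed(tasks):
        pipeline = t(pipeline)
    return pipeline

build = compose(task("aot"), task("pom"), task("uber"), task("jar"))
print(build([]))  # → ['aot', 'pom', 'uber', 'jar']
```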

boot -d org.clojure/clojure:1.6.0 \
     -d boot/core:2.0.0-rc8 \
     -d org.clojure/math.numeric-tower:0.0.4 \
     -s src/ \
     aot -a \
     pom -p fib -v 1.0.0 \
     uber \
     jar -m fib.core

A brief explanation of each task and command line options:

Lines 1-3: each -d option specifies a dependency. Here we list Clojure itself, boot.core, and math.numeric-tower.

Line 4: -s specifies a source directory to look into for .clj files.

Line 5: the AOT task, which compiles all of the .clj files for us. The -a flag tells the task to compile everything it finds.

Line 6: the POM task. This task adds project information to the jar. The -p option specifies the project name, -v is the version.

Line 7: the uber task collects the dependencies so they can be baked into the jar file. This makes the jar big (huge really), but it ends up being self-contained.

Line 8: finally, the jar task. This is the task that actually generates the jar file. The -m option specifies which module has the -main function.

Running the above command produces output something like this:

$ boot -d org.clojure/clojure:1.6.0 \
>      -d boot/core:2.0.0-rc8 \
>      -d org.clojure/math.numeric-tower:0.0.4 \
>      -s src/ \
>      aot -a \
>      pom -p fib -v 1.0.0 \
>      uber \
>      jar -m fib.core
Compiling fib.core...
Writing pom.xml and pom.properties...
Adding uberjar entries...
Writing fib-1.0.0.jar...

At this point, there is a file named fib-1.0.0.jar in the target directory. We can use the java command to run it:

$ java -jar target/fib-1.0.0.jar
[recursive] Printing fibonacci sequence up to 10 numbers:
0  1  1  2  3  5  8  13  21  34

You can send this file to a friend, and they can use it too.

Introducing build.boot

At this point we have a project and can build a standalone jar file from it. This is great, but long command lines are prone to error. Boot provides a mechanism for defining your own tasks and setting the command line options in a single file, named build.boot.

Here’s a build.boot that configures boot in a manner equivalent to the command line switches above:

(set-env! :dependencies
          '[[org.clojure/math.numeric-tower "0.0.4"]
            [boot/core "2.0.0-rc8"]
            [org.clojure/clojure "1.6.0"]]
          :source-paths #{"src/"})

(task-options!
  pom {:project 'fib
       :version "1.0.0"}
  jar {:main 'fib.core}
  aot {:all true})

With build.boot in the current directory, you can now run the tasks like this:

$ boot aot pom uber jar
Compiling fib.core...
Writing pom.xml and pom.properties...
Adding uberjar entries...
Writing fib-1.0.0.jar...

Taking the convenience of build.boot one step further, we can chain the tasks we want to use into our own task, using the deftask macro:

(set-env! :dependencies
          '[[org.clojure/math.numeric-tower "0.0.4"]
            [boot/core "2.0.0-rc8"]
            [org.clojure/clojure "1.6.0"]]
          :source-paths #{"src/"})

(task-options!
  pom {:project 'fib
       :version "1.0.0"}
  jar {:main 'fib.core}
  aot {:all true})

(deftask build
  "Create a standalone jar file that computes fibonacci sequences."
  []
  (comp (aot) (pom) (uber) (jar)))

Now, we can just run boot build to make our standalone jar file. You’ll also see your task show up in the help output:

$ boot -h
   build                      Create a standalone jar file that computes fibonacci sequences.
$ boot build
Compiling fib.core...
Writing pom.xml and pom.properties...
Adding uberjar entries...
Writing fib-1.0.0.jar...

Where To Go From Here

At this point we’ve touched most of the awesomeness that boot gives us. With these basic tools, there are all sorts of interesting things we can do next. Here are some ideas:

Caktus GroupCaktus Libya Project Makes Top 10 Stories of 2014 for Technology Tank

Technology Tank, a communications and tech think tank, recently ranked their top 10 stories of 2014 and Caktus made the cut. Earlier this year, we shared a story about Libya’s SMS voter registration system. We’re happy to learn that other people were just as excited about our work enfranchising Libyan voters as we are! The list is based on number of visits to the story.

Og Maciel2014 in Book Covers

For my last post of 2014 I wanted to show, with pictures, the books I read and spent so much time with this year.

Back in January of 2014 I set out to read 30 books as part of my Reading Challenge. I wanted to focus on reading Brazilian authors early on, as I felt that I really needed to learn more about Brazilian literature and, this time, to read books for fun and not because I was told to, as when I was much younger.

books 1

I took advantage of UNC’s vast book collection, which has a very decent section on Brazilian authors, and I was able to read some really awesome books by Erico Verissimo, Joaquim Manuel de Macedo, José de Alencar, Jorge Amado and Machado de Assis! I also fell in love with these authors, and the fact that it took me a couple of decades to truly appreciate them doesn’t bother me at all, since I believe it took me this long to reach the right maturity level… in other words, I was not ready for them until this year.

books 2

I also fell in love with John Steinbeck and Ray Bradbury, and I think that Grapes of Wrath and Dandelion Wine are two of the best books I have ever read!

Lastly, 2014 was also the year I started reading short stories (to sample different authors and see what they ‘have to offer’), and I highly recommend the short stories of Flannery O’Connor, John Cheever, Simon Rich, and Gabriel García Márquez.

books 3

In 2015 I plan to continue reading more Brazilian authors and exploring different authors and genres! I may even re-read The Lord of the Rings again. :)

Tim HopperSundry Links for December 29, 2014

Sublime: Nice Features & Plugins: A brief talk introducing my favorite editor.

Alfred Workflow for Pinboard: I've started using Pinboard a bit for organizing links. Here's something that has the chance of getting me much deeper into Pinboard: a powerful Alfred Workflow for interacting with Pinboard from your Mac's keyboard. HT Aaron Bachya

fullPage.js: I've been using this jquery plugin in a forthcoming project. It makes it really easy to create slide-like single page websites.

Obfuscating "Hello world!": The author attempts to write the worst 'hello world' possible in Python. He does a good job.

Show Time in Multiple Time Zones with TextExpander: As I spend more time working with people in different time zones, tools like this help remove the cognitive challenge of translating time.

Tim HopperSundry Links for December 22, 2014

Time: Programmers all hate time, timezones, etc. Here are some helpful "notes about time".

Python strftime reference: Speaking of time: "A quick reference for Python's strftime formatting directives." I have to look this stuff up each time I need it.

gitup: "A console script that allows you to easily update multiple git repositories at once"

The “How Does a Google Coder Work?” Edition: I enjoyed this interview. My favorite quote? "When you're reading code is it as clear as reading English?" "If I'm reading C++ code, it's clearer."

Sunset Salvo: John Tukey discusses practical data analysis and statistical humility.

10th Conference on Bayesian Nonparametrics: This is coming up in my own back yard. I’m excited!

Machine Learning: The High-Interest Credit Card of Technical Debt: I haven’t read this in detail, but the premise makes tons of sense to me: "It is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning."

Tim HopperShouldIGetAPhD.com

Last year, I published nine interviews with Internet friends about whether an academically-minded, 22-year-old college senior should pursue a Ph.D. Many people have told me the interviews have been helpful for them or that they've emailed them to others.

I decided to make a dedicated website to host the interviews. You can find it at shouldigetaphd.com.

I hope this continues to be a valuable resource. I'd encourage you to share this with anyone you know who is thinking through this question.

Tim HopperSundry Links for December 6, 2014

Sketching as a Tool for Numerical Linear Algebra: A neat paper on sketching algorithms for linear algebra. No, not that kind of sketching. "One first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution for the original problem."

Maps and the Geospatial Revolution: Coursera is teaching a class in the spring on how geospatial technology has changed our world.

Geoprocessing with Python using Open Source GIS: Speaking of geospatial technology, here are some slides and problems from a class on "Geoprocessing with Python".

How to use the bash command line history: Bash's history can do more than I realized!

A geometric interpretation of the covariance matrix: Superb little post explaining covariance matrices with pictures and geometry.

Og MacielThree Years and Counting!

Making a quick pit stop to mark this milestone in my professional career: today is my 3-year anniversary at Red Hat! Time has certainly flown by and I really cannot believe that it has been three years since I joined this company.

I know it is sort of cliche to say “I can not believe that it has been this long…” and so on and so forth, but it is so true. Back then I joined a relatively new project with very high ambitions, and the first few months had me swimming way out in the deepest part of the pool, trying to learn all ‘Red Hat-things’ and Clojure for the existing automation framework (now we are fully using Python).

I did a lot of swimming for sure, and through the next months, through many long days and weekends and hard work, tears and sweat (you know, your typical life for a Quality Engineer worth his/her salt), I succeeded in adding and wearing many types of hats, going from a Senior Quality Engineer, to a Supervisor of the team, to eventually becoming the Manager for a couple of teams, spread over 4 different countries. Am I bragging? Maaaybe a little bit :) but my point is really to highlight a major key factor that made this rapid ascension path possible: Red Hat’s work philosophy and culture of rewarding those who work hard and truly embrace the company! Sure, I worked really hard, but I have worked just as hard before in previous places and gotten nowhere really fast! Being recognized and rewarded for your hard work is something new to me, and I owe a great debt of gratitude to those who took the time to acknowledge my efforts and allowed me room to grow within this company!

The best part of being a Red Hatter for 3 years? Being surrounded by an enormous pool of talented, exciting people who not only enjoy what they do, but are always willing to teach you something new, and/or to drop what they’re working on to lend you a helping hand! There is not a single day that I don’t learn something new, and thankfully I don’t see any sign of this trend stopping :) Have I mentioned that I love my teammates too? What a great bunch of guys!!! Getting up early in the morning and walking to my home office (yeah, they let me work remotely too) day in, day out, is never a drag because I just know that there are new things to learn and new adventures and ‘achievements to unlock’ right around the corner.

I am Red Hat!!!

Tim HopperSundry Links for December 4, 2014

How do I draw a pair of buttocks?: Have you ever wondered how to plot a pair of buttocks in Mathematica? Of course you have.

Frequentism and Bayesianism: A Python-driven Primer: Jake Vanderplas wrote a "brief, semi-technical comparison" of frequentist and Bayesian statistical inference using examples in Python.

skll: Dan Blanchard released version 1.0 of his very cool command line tool for doing experiments with scikit-learn.

Personalized Recommendations at Etsy: A fantastic post from Etsy's engineering blog on building scalable, personalized recommendations using linear algebra and locally sensitive hashing. I like math.

Pythonic Clojure: Andrew Montalenti wrote a post analyzing Clojure from a Python programmer's perspective. It's great.

Caktus GroupCaktus Hosts Lightweight Django Book Launch with Girl Develop It

With Girl Develop It RDU, we celebrated the launch of Lightweight Django (O'Reilly) with the authors, Caktus Technical Director Mark Lavin and Caktus alum Julia Elman. Sylvia Richardson of Girl Develop It MCed. The event was open to the public and so popular we kept recounting the RSVPs and fretting over the fire code. But, phew, we were good. In attendance were friends, family, fellow Cakti, and Django fans from around the Triangle.

Festivities included lots of Mediterranean food, some of Mark's favorite beers, raffled-off gift bags filled with Caktus goodies, and, for those in the first two rows, free copies of Lightweight Django. The main attraction, of course, was hearing Mark and Julia speak. In response to audience questions, they both emphasized code quality with Mark highlighting the importance of iteration in any process. You can read more about their very strong opinions in Lightweight Django. (See what I did there? I totally just encouraged you to buy it. Go buy it!) They also spoke about the ups and downs of writing, the doubts, and the ways they egged each other on. There too was the challenge of working on client projects full time, writing the rest of the time, having young children, and, very occasionally, sleeping. (Note: Mark also was training for and participating in triathlons during this time, a fact I cannot fully comprehend.)

Congratulations again to Mark and Julia for their great achievement. Also, many thanks to Sylvia for her smooth MC'ing skills, Girl Develop It RDU for co-hosting, O'Reilly Media for the books, and everyone who braved the traffic to get here.

Mark Signing LightWeight Django at Caktus Group

Lightweight Django launch audience at Caktus

Joe Gregoriowsgicollection | BitWorking



The idea of RESTful "Collections", i.e. doing CRUD over HTTP correctly, has been percolating for years now. A Collection is nothing more than a list, a container for resources. While the APP defines a Collection in terms of Atom Feed and Entry documents we don't have to be limited to that. It's time to complete a virtuous circle; RESTLog inspired the Atom Publishing Protocol which inspired David Heinemeier Hansson's World of Resources (pdf) and now it's time to come full circle and get that world of resources in Python.

In particular look at page 18 of that slide deck, where dispatching to a collection of people, the following URIs are to be handled:

  GET    /people         
  POST   /people
  GET    /people/1
  PUT    /people/1
  DELETE /people/1
  GET    /people;new
  GET    /people/1;edit

Now the 'new' and 'edit' URIs can be a bit ambiguous, only in the sense that you might not guess right away that they are nouns, and remember, URIs always identify nouns. I prefer to make the noun-ishness of them more apparent.

  GET    /people;create_form
  GET    /people/1;edit_form

In general, using the notation of Selector, we are looking at URIs of the form:

  /people/[{id}][;{noun}]

And dispatching requests to URIs of that form to functions with nice names:

  GET    /people               list()
  POST   /people               create()
  GET    /people/1             retrieve()
  PUT    /people/1             update()
  DELETE /people/1             delete()
  GET    /people;create_form   get_create_form()
  GET    /people/1;edit_form   get_edit_form()

Introducing wsgicollection, a Python library that does just that, simplifying the implementation of such a Collection under WSGI.
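The mapping in the table above can be sketched as a small dispatch helper (a hypothetical illustration, not wsgicollection's actual internals): the HTTP method, the presence of an {id}, and an optional {noun} together pick the handler name.

```python
def handler_name(http_method, id_=None, noun=None):
    """Map (HTTP method, optional id, optional noun) to a Collection method name."""
    if noun:   # noun URIs, e.g. GET /people;create_form -> get_create_form
        return http_method.lower() + "_" + noun
    if id_:    # item URIs, e.g. GET /people/1 -> retrieve
        return {"GET": "retrieve", "PUT": "update", "DELETE": "delete"}[http_method]
    # collection URIs, e.g. POST /people -> create
    return {"GET": "list", "POST": "create"}[http_method]

print(handler_name("GET", id_="1", noun="edit_form"))  # → get_edit_form
```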

Wsgicollection uses Selector indirectly, relying on it to parse the URIs for {id} and {noun}. In theory it will work with any WSGI middleware that sets values for 'id' and 'noun' in environ['selector.vars'] or environ['wsgiorg.routing_args']. Here is how you would define a WSGI application that implements a collection:

from wsgicollection import Collection

class RecipeCollection(Collection):

    # GET /cookbook/
    def list(self, environ, start_response):
        pass

    # POST /cookbook/
    def create(self, environ, start_response):
        pass

    # GET /cookbook/1
    def retrieve(self, environ, start_response):
        pass

    # PUT /cookbook/1
    def update(self, environ, start_response):
        pass

    # DELETE /cookbook/1
    def delete(self, environ, start_response):
        pass

    # GET /cookbook/;create_form
    def get_create_form(self, environ, start_response):
        pass

    # POST /cookbook/1;comment_form
    def post_comment_form(self, environ, start_response):
        pass

And this class can be easily hooked up to Selector:

import selector

urls = selector.Selector()

urls.add('/cookbook/[{id}][;{noun}]', _ANY_=RecipeCollection())

Now that I have this Collection class it will ease implementing the APP, but as I indicated earlier, the collection (CRUD) model goes beyond that of just Atom, and we'll dig into that next.

You can find the code here.

Update: Fixed a bug where wsgicollection directly imported selector, which it does not need to do. You will, however, need selector installed to run the unit tests.

Update 2: Updated to support routing_args


Joe GregorioConfession of an Infinite Looper | BitWorking


Confession of an Infinite Looper

I admit it. I'm a looper. I will load up a single track on CD or mp3 and put it on infinite loop. The same song. Over and over. For possibly days at a time. I know I'm not the only one, so fess up if you're a looper too.

Some of my favorites for looping:

  • Ozzy Osbourne - Crazy Train
  • The President of the United States of America - Feather Pluckin
  • Iron Maiden - Childhood's End
  • Talking Heads - Once In A Lifetime

Today's post is brought to you by Warrant's "Down Boys", which I've been looping since Monday...

I saw "Lost in Translation" last Friday and have been playing Jesus & Mary Chain's "Just Like Honey" ever since.

Posted by Larry O'Brien on 2003-10-03

I am a looper too although I usually loop for only a few hours playing songs like:

Alphaville - Forever Young
Vivaldi - Spring
Eagles - Hotel California
Emotions - Best of My Love
Isley Brothers - Shout
James Brown - Play that funky music
Lee Soo Young - Final Fantasy X Theme
Madonna - Ray of Light
Norah Jones - Don't Know Why
Savage Garden - I Knew I Love You
Vanessa William - Colors of the Wind

I think the wide variety is due to my being moody. :-)

Posted by Don Park on 2003-10-03

i keep looping Tori Amos and Nirvana UNplugged to death - a winning combination for me - still strong after few years ...

Posted by Alek on 2003-10-04

Songs I've looped for several days:
Red House Painters - Revelation Big Sur
Alanis Morissette - No Pressure Over Cappucino
Everything But The Girl - Driving (cover)
Jeff Buckley - Last Goodbye
Massive Attack - Protection
They Might Be Giants - Where Do They Make Balloons

As for whole CDs I have looped for days or even weeks, the list goes on. I "learn" entire CDs at a time. For the most part, I wake up with a song in my head and have to scramble to put it on so that it doesn't eat up my brain for the rest of the day.

In the past few months I consistently loop through all my Pizzicato Five tracks on iTunes (8 hours' worth) to stay awake. It's pretty hard to feel down when you're listening to J-pop. :)

Posted by steph on 2003-10-06

My wife happen to look over my shoulder when I was reading this.  "Hah! See?!? I'm not the only one."  She's more of a James Tailor and Van Morrison looper (as classics go).  Otherwise, its every new album.  Right now, it's Dido and Ben Harper, with REM likely on the horizon.

This also reminds me of a guy I knew in college who was really into the various RPGs.  In one (Rifts maybe), he had created a character who was some sort of techno-enhanced soldier.  The background story was that in the heat of a battle, a direct hit caused his music device to get fused to the jack in his head, also breaking the device so it could not be turned off.  One song played over and over and over: "rooster" by Alice in Chains.

There was no point to the story.  The post just reminded me of it.

Posted by Seairth on 2003-10-08

I tend to play Meat Beat Manifesto's "It's the music" over and over again when driving my car. Other favs are "Circle Jack (Chase the magic word lego-lego)" by Melt Banana and "The Empty Page" by Sonic Youth; come to think of it, I'm a pretty damn looper myself...

Posted by Adriaan on 2003-10-09


Joe GregorioClearCase as a leading indicator of small technology company failure | BitWorking


ClearCase as a leading indicator of small technology company failure

Is it just me or is ClearCase a leading indicator of small technology company death? I've never used the product, never even seen a demo, and yes, I know, they position themselves as software asset management for medium to large teams. The question comes from the fact that the only people I know that have used ClearCase have used it in the past, all at small technology companies, and they all, without exception, are companies that have gone out of business.

So fill up the comments with your experiences with ClearCase, good or bad, hopefully with someone from at least one small technology company that has succeeded in spite of deploying ClearCase.

Update: In case you are wondering what CMS you should use, consult kittenfight, where subversion beats ClearCase, and so does BitKeeper.

I can't speak for ClearCase, but I worked for a small tech company that nearly destroyed its market share using Rational's "Unified Process." We didn't recover until we dumped all the weight and moved to something more XPish. Rational is where ClearCase originated.

Posted by petrilli on 2004-08-29

Yes, seen it happen at a .com.  They tried to get clearcase (and the paradise of clearquest integration) going for years, while the dev team plodded along using visual source safe (horrible, but it worked...  as long as you don't try to branch.)  The company was delisted, almost went bankrupt, then sold itself for pennies a share.  Then again, it's hard to blame clearcase when the place was offering free overnight shipping on every order.

An ever surer sign of impending doom: the email that comes out from some new vice president you've never heard of, announcing the deal just signed with $MAJOR_VENDOR to implement AN ERP SYSTEM.  When you see this, you are already dead.

Posted by steve minutillo on 2004-08-30

The thing that always amazed me about ClearCase was the sheer incompetence of their Web interface.  I have never seen an app break THAT many usability rules.  It was a nightmare.

I thought the software was pretty good at the core, but that Web interface was (and may still be, for all I know), utterly laughable.

Posted by Deane on 2004-08-30

Actually, Clearcase originated at Atria and was designed by defectors from Apollo Computers back in the late '80s.  I was supporting Apollo dev tools back then when HP bought and dissected the company, and I was one of the original tech supporters of Clearcase.  Even caught one of Paul Levine's marketing talks for 1.0 beta.

Last I used it was 5 years ago with the Chandra X-Ray Observatory project, which worked well since that project was fractured among many companies and locations around the country and had a large diverse code base.  But after 5 years in the eCompany space I'm not sure who uses it anymore or why.

Posted by chris burleson on 2004-08-30

Now you've probably gone and ruined my Pagerank for "Clearcase sucks".  I wrote down all my grudges against The Great Satan of Version Control (ooohhh, I like that) here: http://www.trmk.org/~adam/blog/archive/000106.html

Posted by Adam Keys on 2004-08-30

Clearcase is neat in a large, multi-team environment, but yes, I can see why it would be the death of a small company.

The worst thing about Clearcase is Rational's attempts to integrate it with the rest of their product range, ClearQuest in particular. The best thing is the ability of a central SCM guy to keep many teams' worth of source code under some kind of control.

Posted by Alan Green on 2004-08-30

Whoever said that ClearCase has the worst web interface ever hasn't used ClearQuest. We had a vendor who required us to use their ClearQuest server to handle tickets during QA and it was a painful effort. FogBUGZ has some issues, but it's at least quick and easy.

Posted by Bill Brown on 2004-08-30

I used Clearcase some years ago when I was working for a very large IT company. They could afford to have 2 Clearcase admins.

For smaller companies Clearcase is just too complicated and requires too many resources.

It has lots of esoteric features, and you probably have some problem with your process if you really need them :]


Posted by anonymous on 2004-08-31

I know for certain that here in Singapore, Motorola uses ClearCase in some of their projects.

Posted by Deepak on 2004-08-31

I used it once when I joined a company for a short-term contract. The company is still there but I really do not think it will be there next year... if comments are still open next year, I will confirm this ;)

They had an actual team of 5-6 people devoted to managing that beast. 5 experts that touched ClearCase all day long. As a new developer there, it took me 2 full weeks to manage to get a build working on my machine. Some will argue that the build has nothing to do with ClearCase and you are probably right. But the complexity of things like managing branches caused developers to be lazy and almost never do full updates to newer code. Result: Daily broken builds.

This was (is) no small project: 50 developers working on all tiers of an amazingly complex J2EE application built with WebSphere. Before joining this company, I thought I had seen all nightmares out there. I had not. ClearCase's setup, complexity plus code organization was a huge part of this nightmare.

We used the Windows client. I also remember that for certain operations, we had to login to a unix box and use X to do other specific things (I do not remember what though) not available on the windows client. There were ClearCase admins available 24/7 to support developers that were working around the clock.

Posted by Claude on 2004-09-01

ClearCase is a failure, whether the company is big, medium or small. Projects that use it are incapable of recognising their impending death.

Some managers like to buy things that they don't understand - they think they must be impressive. But sometimes they just don't work, and that's why they don't understand them. Developers suffer.

In theory ClearCase can do some things that CVS (or similar) can't. In practice, you don't need to do those things, and you'll never get that far with ClearCase anyway. Get CVS and enjoy actually being able to change a line of code now and then.

Posted by Murray Cumming on 2004-09-02

Interesting.  My company just paid to have me brought up to speed on ClearCase.  I don't think they will be fading away any time soon considering they have already made it past the century mark.  What blows my mind is that this is only one of four versioning systems they use in different parts of the company.

Disclaimer: I am not a fan of ClearCase.

Posted by Peter on 2004-09-04

Interesting comments on Rational service since big blue has taken over: http://www.gripe2ed.com/scoop/story/2004/9/5/14453/96974

Posted by John Beimler on 2004-09-06

Tality Corporation. A spin-off from Cadence Corporation, doing design services.

Clearcase was purchased. SW development tanked. Tality folds.

Pretty clearcut I'd say.

Posted by Ex Tality Employee on 2004-09-08

Well, I used to work for a multinational that used Clearcase on large and small projects. It was VERY expensive, but as far as I can tell, having since used CVS, it is much more powerful. The branching and merging are done properly, and the tagging is also done properly (ie you can see what was tagged and when...). It also lets you have distributed versioning databases so that you can work in multiple places without having to depend on a central server overseas somewhere.

The web interface may not have been the best, but the Solaris and Windows interface for merging multiple streams of development hammers CVS into a pulp.

Posted by Ryan on 2004-09-08

"The branching and merging are done properly"

Branching is done per-file, only when a file has actually changed. As far as I can tell that's for ClearCase-specific performance reasons.

So, instead of one person at one time saying "this project has branched", and then just letting people work on that branch, this means that individual people need to branch each file before they change it.

Firstly, they forget, secondly it's very difficult to make ClearCase do this automatically, thirdly you have to tell people to make ClearCase do this automatically. But most importantly, someone working on a main branch who has no interest in a second branch, will not branch the file because he doesn't care. And then your build breaks. The CVS solution is simpler and usually what you want.

Posted by Murray Cumming on 2004-09-09

ClearCase is really slow.  Explicit checkouts are painful (but the Vim plugin makes it bearable).

The versioning is file-based, like CVS, instead of changeset-based.  I don't know why people would pay so much money for something so fundamentally flawed.

Posted by Kannan Goundan on 2004-09-14

Having switched to a project at a wireless telco in Atlanta, I have to agree with everything I've read that you folks have said.

Clearcase and Clearquest are both unnecessarily heavy and unusable. Views, dynamic views, no support for generic linux distros, bastardized command line support, the list goes on and on.

The one thing keeping us on it is the fact that we can't seem to make it so subversion or CVS require a defect number from Clearquest to commit. That alone would give us an out.

There is nothing about the software I like. And I've used VSS, Starteam, CVS, Subversion (My favorite now. Atomic commits, Whoot!), and sccs.

Never work for a team using this software if you have a choice. If you don't have a choice, supplement it with a Subversion repository so you only have to check in every once in a while.

Posted by anonymous on 2004-09-16

I am working on a development project that has 500+ developers in 6 locations worldwide, with 80,000+ files in the system. We use the UCM option in ClearCase. Yes, ClearCase has a lot of annoying bits, and the admin overhead can be a big pain. BUT as an admin/developer I see a lot of the issues are with developer misuse of the system. They beat at it repeatedly, ignoring a lot of the main concepts, and then shout when it isn't smooth anymore. And if one more developer comments about how it was much better with CVS, I will break a keyboard over their head. AND finally, it can be slow, and the web or WAN options are next to useless.

Posted by Liam on 2004-10-16

Odd comments.

I guess we are a smallish company with 200 developers in our office and 800 developers worldwide working on large enterprise development systems.

We use CC and CQ tied together with UCM.
As well as being a developer I am also the admin for CC/CQ on our project.

I have to say that once you get over the initial shock of the appalling interfaces and terrible way the products interface with each other (why do they have different logins, neither of which ties into Windows or Unix?), the products do what they claim and rarely need any intervention in my experience.

If you don't use UCM then I can imagine that any company is going to get overwhelmed pretty quickly but if you can live with the restrictions of UCM then I don't see a problem. Merging and versioning is all handled automatically.

I just wish that IBM would rework the products to all be on a common base so that they dovetail together, with the OS, update the interfaces, and improve the performance. (And bring down the crippling price.)

But basically it works for us.

Posted by Rob Scott on 2004-10-19

Let's just clarify, a small company does not have 800 developers. Nor does it have 200 developers.

A small company has less than 10 developers.

Posted by Joe on 2004-10-20

My group has used CC and other systems (primarily for worldwide multisite hardware projects) since the Atria/Rational days.  Our company's worldwide software groups also use CC.

We do have dedicated CC admins, but ours generally do not dictate or enforce configuration management (they prefer to let projects choose their own path to success or ruin.)  They just keep the multisite syncs running.

Over the years our company has developed a robust release and configuration management method on top of CC.  Branches are often lifesavers, and we can reliably and automatically generate coherent releases of our designs.  CC is very fast on adequate servers, and ClearMake's DOs and CRs are unmatched.

I am amused by some of the comments above and in the links, which apparently don't understand what is required by CC and what might be imposed by a clueless CC admin.

It doesn't bother me that CC implements its own filesystem and that the raw data is in some database; our RAID NAS works the same way.  I'm never going to see the raw data on a disk somewhere anyway.

Given the resource (staff, hardware, etc) requirements of CC, I heartily agree it isn't for the small developer, just as a (insert-name-of-exotic-car) isn't for the average commuter.  IMHO, if you can afford the infrastructure for it and you take the time to learn to use it properly, I haven't seen anything that can beat it.

Posted by L on 2004-10-26

I have been using ClearQuest for about 2 years now. We also have ClearCase. Both are used very widely in our organization. Even though I know ClearQuest and ClearCase have problems, I didn't find them as bad as described on this page.

We are planning to integrate ClearCase with ClearQuest, and it looks like we will have to make our decision very carefully.

Victor Nadar, Mumbai Udyan

Posted by Victor Nadar - Mumbai Udyan on 2004-11-01

We use both CC and CQ in a company of around 100-150 developers. I don't think it's all that bad - granted, the interfaces are shocking and there's a bit of overhead, but I can't see it being the company-killer people are describing...

That said, after four years of CC, we're investigating other options - I gather one team is now running a CVS repository that they periodically sync with CC, and I'd like to see a move to Subversion in future.

Posted by Simon on 2004-11-09

I work for a company that almost exclusively uses CC. There have been problems with it of course, but nothing major. The features CC provides outweigh the problems (at least in my book).

There is an initial hurdle before one understands the config-spec syntax. This of course may be the problem with CC, because if one can't write a correct config spec then stuff will not work as expected.

Someone mentioned problems with branching. This is not something I have noticed. If the config-spec is written correctly all files checked out will automagically be branched out to the desired branch.

Posted by anonymous on 2004-11-09
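For readers unfamiliar with the config specs mentioned in the comment above: they are short rule files that tell a ClearCase view which version of each element to select. A minimal auto-branching sketch (the branch name my_dev here is purely hypothetical) looks roughly like this:

```
element * CHECKEDOUT
element * .../my_dev/LATEST
element * /main/LATEST -mkbranch my_dev
```

Rules are evaluated top to bottom: checked-out files win, then the latest version on the developer's branch, and a file not yet changed is selected from /main and branched to my_dev automatically on first checkout.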

I've used CC at a number of small and medium sized companies, and it seems to me that most of the negative comments here are based on a misunderstanding of how to use the product.  As with most products, CC has a large number of features available to the user, but that doesn't mean you have to use them.

I have a team of 2 C++ programmers who have just migrated from VSS to Base CC.  They use it in exactly the same way (i.e. single stream, no branching) and have no additional overhead with CC.  What they do have is a far more robust system, better merging etc.

I also have a team of 6 Java programmers who migrated from VSS to CC.  At first, I tried to implement the same process in VSS as I would have done in CC.  I don't think it was even possible!!  Now we use CC UCM and it's a piece of cake.  We have a release stream (branch), a maint stream and a dev stream.  We can easily merge changes between the streams and take full advantage of activities and baselines.

As for admin, there are two of us (of the 8 programmers above) who take care of things.  This basically involves the creation of new dev and maint streams at the end of a release (about 2 hours' work max) - not too much of an overhead.

It seems to me that the companies wishing to use CC should think about engaging someone with CC expertise.  This person should be able to set up procedures relative to their project size.

To answer the interface problems, I believe IBM is planning to build CC into the Eclipse framework.  This should improve things in that area.

Posted by Phil Kelly on 2004-11-11

Phil (or anyone), do you mean a dev stream per user or a shared dev stream?  For the teams that share a dev stream, are you using dynamic views (where checked-in items appear instantly in all other developers' workspaces) or snapshot views (where developers "pull" changes when they're ready to integrate)?

If you use per-developer streams (common in UCM), do you find the process of rebasing, delivering, baselining, and recommending baselines to be cumbersome? Is it clear that per-developer streams is implicitly the "edit-merge-commit" style common to CVS and Subversion, but with all the overhead of branches?  (CC streams are really branches.)

We've recently ditched per-developer streams (unless there is the rare need) and use snapshot views on the mainline stream.  ClearCase's support for snapshot views is a little primitive compared to, say, Subversion but is far simpler than using UCM.

Posted by A CC User on 2004-11-12

I have used ClearCase for the past 2 years.  I was the buildmaster/local ClearCase admin for our development teams.  ClearCase is by far (out of any application I have ever used, in general) the most difficult and over-engineered tool I have ever used.  I have spent countless hours troubleshooting it and dealing with our CC admins.  It is so tightly coupled with the OS that starting up a Windows PC when the network was down caused each developer a half-hour startup delay while CC tried to start itself over and over again.  The merge and diff tools are complete wastes of time.  Imagine dealing with a crap tool every day you came into work.  I would not recommend Rational Head Case to anyone.

Posted by scranthdaddy on 2004-12-01

why are all comments tagged as spam?
oh well forget it

Posted by Another CC user on 2004-12-07

clearcase blows.

Posted by spunkboy on 2005-02-11

I was just browsing the internet for some ideas on software development when I stumbled across this page. I've had a read through and I don't think it represents a fair argument.

I started using CC back in 97 when it was still Atria or Pure Atria (can't remember which!) and CQ from its beta days in 98/99. I was also in my early 20's and didn't know much about config management, so I was just happy to go with the flow. Basically I've just grown up with it, got used to its quirks along the way, and it's the only source control tool I've used professionally. Yes, this makes me a bit biased towards ClearCase, but I'm not protective of it!

I'll also admit that it took me a long time to fully understand its methods and error messages, but at that time the support from Atria was outstanding. Their help desk staff knew their product inside and out. If I had an issue it was usually resolved there and then on the first phone call. If it wasn't, then it really was a problem and needed to be investigated. Then Rational turned up, and ever since then the support has just gone from bad to worse. Now when I call I have to provide ccdoctor reports, event logs and a million other things. This usually means a very long email trail, and I've usually solved the problem before Rational bothers to call me back. Now that IBM has taken over, it's even worse. First of all I have to navigate their website. Ha! At my last company we actually had an expert on the IBM website. If you had a problem you talked to him and he'd navigate their site and send you the URL! Anyway...

ClearCase on its own, with no UCM and CQ integration, is a fantastic product. Companies fail with ClearCase implementations for many reasons, and license cost is not usually one of them. For example,

1. The projects themselves are planned badly. I have often come in on a Monday morning to find an email saying that development on a new version started last Friday... How on earth am I supposed to support a new release when I have had no input into its development strategy, have no idea about its lifecycle, and could not spend half an hour setting up the new CC environment and perhaps closing off the old one! I could go on, but I think most of you will have experienced this whether you are a developer or a CC administrator.

2. The CC administrator is usually some poor soul who has no idea about configuration management principles, probably didn't want the job to start with, and has a 1-week IBM/Rational course under his/her belt if they're lucky. Then the unreasonable requests start coming in from the management teams, who think ClearCase is no more complicated than Notepad and that this new CC administrator is now a guru! They want projects migrated within the week with no consideration for processes and policies... back to point 1.

3. Branching strategies are the next big failure. I don't mean to offend any UNIX guys, but ClearCase is like a UNIX system: if it can do it, it will do it. It won't complain if it's the wrong way of doing it or suggest a better way. I have seen so many companies using way too many levels of branches that things become unmanageable. A project typically needs 3 or 4 levels of branches - main, development, integration and possibly task-based branches. Again, back to point 1: consultation between admins and managers to understand expectations and plan a strategy.

ClearCase is just a database like any other. Take Oracle for example. If you just slap in a bit of SQL and hope for the best then you might as well use Access. Instead you have to write triggers, procedures, packages and build tools around it. And, as with any database, you have to plan it.

In terms of overhead, ClearCase maintenance should take no more than 3 hours per week. The rest of the time the CC administrator should be writing tools such as build scripts, reporting queries, triggers, automation tools, CM policies/strategies and just generally improving the environment. i.e. things to help developers and keep the boss happy!

I have never had a project fail because of a ClearCase implementation (even with UCM) and I would also expect that those small companies that went under probably would have anyway regardless of the version control tool. Saving the AU$200,000 spent on licensing would probably not have saved the company anyway.

UCM... hmm! First of all, this is targeted at managers, especially non-technical ones. They like the fancy pictures and models of their project's development. I was disgusted at IBM/Rational's sales pitch (I heard it last week) because, for a non-technical manager, it's like giving candy to a baby. It just gets gobbled up. I personally think UCM should be wiped from this Earth and that anyone who thinks UCM is the solution to their problems or project should be shot! Having said that, my boss wants the full UCM rollout..... I know I've just been very harsh, but in any event any failure of UCM comes back to points 1, 2 and 3 above. Whether I like UCM or not is irrelevant.

The ClearCase Exploder GUI? Let's just say "bring back ClearCase details"! Why do I need view shortcuts? I'm an administrator and chances are I created most of the 2000 views on the network. I don't need 2000 shortcuts!

I usually work for corporates, oh and by the way, my last job was to support 1500 developers in 8 sites across 6 countries. I was the only CC administrator until I trained someone else up. Total time spent on CC maintenance was probably 1 day a week usually less (assuming no server crashes or catastrophes). Anyway... most of my other jobs had between 10 and 20 developers which is small for a corporate. None of those projects failed because of ClearCase. Most failed because of changing times or crap management. I now work for a small company of 20 staff of which 9 are developers. They are forking out for both CC and CQ licenses at AU$13000 per CC+CQ license. If this company fails because they implemented ClearCase it won't be ClearCase's fault; it will be mine. It will be my fault for not implementing the solution correctly, for not educating the developers properly and overall for being a crap CC & CM administrator.

Problems with CC on Windows? Chances are it's a Microsoft-related issue affecting ClearCase. Buy a UNIX box! Windows Explorer has just locked up because my server has just crashed. Most of my other applications are now locked up too as a result. If I kill Explorer, the other apps spring back to life! CCWeb, ah yes: it runs on a Windows platform and is designed for IE. Apache web server or not, I rest my case.

Finally, ClearCase is not always the right tool for the job. Brings me back to point 1. Good project management means evaluating the available software and making the right choice, whether that's ClearCase, Perforce, Subversion, BitKeeper or even SourceSafe. It also means employing someone with the right skills, enthusiasm and passion for CM, and not just re-training someone to fill an open position. ClearCase is just a tool to assist the CM process. If the CM process is flawed then the tool will fail. Managers and developers will hate it, and projects/companies can fail. Back to point 1.

Well I hope I didn't bore you too much and I hope this goes through in one piece because I spent a long time writing it!

Posted by BB on 2005-02-24

How do I integrate ClearQuest with Visual SourceSafe?

Posted by Roland Ehi on 2005-03-09

To briefly echo the sentiments of spunkboy, a fool with a tool is still a fool. 

You need to know what you want to do with the tool.  If you can do all of those things with another tool, more power to you.

I have made a reputation out of integrating tools that aren't supposedly integrated.  Base ClearCase (without UCM) is great for that because it has a lot of hooks.  Granted, you have to know how to take advantage of it.

On a single-OS platform ClearCase doesn't take a dedicated admin, except while you work to set it up, automating your processes and putting in enforcement triggers.  I speak from 10 years of experience on this.

When working cross-platform (PC to Unix), I have found it is a bit of a mess.  But the mess is strictly based on crappy networking support on the PC side, not anything to do with ClearCase.

Posted by vobguy on 2005-04-27

Ouch, poor old ClearCase.
I'm trying to set ClearCase up at the moment, and to be fair, it's a bit of a pain in the arse. My main question is: why the hell do you need ClearCase for 10 or fewer developers?!?!
Seems like you're using a sledgehammer to crack an egg.
I think VSS would probably do, if anything. You just need a decent, workable CM process.

Posted by Dave LR on 2005-05-27

Well, I can see why people with little ClearCase skill would not like the tool. However, I can assure you that in the correct hands a team using ClearCase can manage change far better than a team using CVS or, worse yet, no CM tool at all.

ClearCase isn't bad; ClearCase in unskilled hands is.

Posted by chad on 2005-06-30

I've used CC on four projects now and they've all suffered as a result.

I do believe that CC is fundamentally powerful (in the right hands) BUT the conceptual complexity it brings is overwhelming and outweighs its benefits, IMO. Most developers have neither the time nor the inclination to spend weeks learning how to use the SCM system.

As developers we have enough complexity to worry about. An SCM tool should make working with code easier, not harder.

To me, CC is a fetid putrid piece of stinking dog poop. I hate it deeply.

Posted by Tony on 2005-07-08

Hi Guys,
I was just reading all the comments you guys made about CC. Well, I've been using CC and CQ for a while already. We started with an old IBM product, CMVC. I know that many had their issues with this one as well, but you would not believe how many companies out there still use it. I must say that I've seen many other products (VSS, PVCS, CVS, MKS and more recently Seapine's software suite), but in some of its concepts CMVC was ahead of its time.
I originally started in a company that originated in Germany. We had been acquired (and later sold) several times. When we started migrating to CC/CQ we used many concepts from CMVC (for good or for bad).
Well, overall I would not say that CC is that bad. It has many, many things other tools don't have. But at the same time, I believe that with many tools you can make it usable for the needs it is bought for. You just need to know how.

We've successfully used CC/CQ in a multinational environment utilizing multisite. We've managed to create (with extra effort) a distributed development environment for development around the globe and around the clock.

Posted by Raymond Masocol on 2005-08-05


Judging from the feedback above, ClearCase is an administrator's dream. All the control, all the features, but at the cost of being heavyweight.  To a developer, it represents another complex tool to be overcome in their workday. Hence the schism evident between the developer's viewpoint and the administrator's viewpoint.

This line of discussion raises the question: who is more important to the software development process, the developer or the administration staff?



Posted by Will Waterson on 2005-09-25

We've used Rational at our small (< 10 person) company for a couple of years. I wouldn't say it's been an unmitigated disaster, but it sure as hell is not worth the money. I think a lot of companies make the decision to go to Rational because Rational claims to have a process (RUP) and they convince you that you can't do RUP without their tools. They also try to convince you that every single person needs a full enterprise suite, which is crap. Companies could save a lot of money by figuring out what a good process for them actually is and THEN deciding what tools fit the bill. The most commonly used tools in the suite (CC and CQ) don't do any better IMHO than freely available ones like Subversion and Bugzilla respectively, and the others like Purify/Quantify can be nice, but you can probably share one or two licenses across the entire group.

Posted by Ted on 2005-09-27

Just came across this link today as my company started using ClearCase (suckered into it by a Rational sales team, IMHO, as we have successfully been using CVS for the past two years with no problems)!

"ClearCase isnt bad, clearcase in unskilled hands is."
If the tool was any good it would be developer-proof! CVS was up and running in less than an hour (I know, a little slow); CC took a week and it's still causing the development team nightmares - we will probably throw it out, go back to CVS and just let management think we're using it!!!

.... Don't even get me started on the "RUP"-ish stuff - a great process if you throw out the "R" bit of it - UP as put forward in the XP world is great!

Posted by John on 2005-10-03

I fully agree with many of the comments above

1. UCM is bad, very bad. It is all pretty pictures and management-speak. Anybody with a few minutes to spare can use the trigger feature available in base ClearCase to implement a much better, site-specific version of UCM without losing many of the better features of base ClearCase. Do not use UCM!

2. Base ClearCase config specs are incredibly powerful and knowing how to at least read them is important for a developer.

3. If I gave a responsible, apparently intelligent adult a chainsaw and they cut their leg off, would it be my fault? I love the comment that a fool with a tool is still a fool. That is so true. OK, ClearCase may need a bit of thought to set up (it certainly isn't plug 'n' play) but it is worth it for the right size team.

4. ClearCase is overkill for small projects

5. CC/CQ integration is easy and in my experience has never failed

6. One company I worked at who used ClearCase very nearly went under. It was not the fault of ClearCase. ClearCase wasn't (allegedly) playing with the books, writing crap contracts and deceiving the stock market.

So in summary, don't use ClearCase if you are small because it's not worth it, don't use UCM no matter what your size, and ClearCase can't bring your project/company down without significant help and assistance.

Posted by Jim on 2005-10-10


Joe GregorioChrysler 300 | BitWorking


Chrysler 300

Chrysler 300

The first time I saw the new Chrysler 300 I hated it.

The next time I saw it I was intrigued.

The third time I saw it, the Dick-Tracy-Muscle-Car-Terra-Plane shape had firmly wrapped its tentacles around my lizard brain.

I don't own one.


I use a car service for some of my business trips (travel to the airport).  The service I like best uses the 300 and it is a spectacular car for this purpose. 

From the perspective of a rider in the back seat, the car is very comfortable and spacious.  It also has enough cup holders and armrest space to be quite functional as an en-route office.

I haven't driven the car, but the drivers have all commented that they like the car very much, and since they're driving for 6+ hours a day I think their opinion is quite valuable.

Posted by Lou on 2005-06-21


Joe GregorioChristmas 2003 | BitWorking


Christmas 2003

There were cookies.

Picture of decorated sugar cookies


Joe GregorioChina - Day 9 and 10 Chongqing to the White Swan Hotel in Guangzhou | BitWorking


China - Day 9 and 10 Chongqing to the White Swan Hotel in Guangzhou

Day 9 was pretty uneventful, Lynne, Caden and I spent some time in the morning shopping with Mark, Moya and Andie. The afternoon was filled with paperwork and the rest of the day was spent packing for our travelling to Guangzhou the next day.

Part of our shopping was to buy a new large suitcase. We didn't really need it for this leg of the trip but we plan on doing a lot of shopping in Guangzhou, enough that we know we'll need a whole extra suitcase for everything we are going to purchase.

The paperwork took over two hours to complete, not that it bothered me much as Lynne took care of it while Caden and I had together time. One of the things we did together was get more pictures. Here is a shot of the building I was talking about the other day, where a shop has been setup on the ground floor and the building isn't even complete.

Store and noodleshop setup in a building still under construction

I also got some pictures of what our group referred to as "the big dig" next to the Marriott. It is a large excavation for a basement, one of the three new highrises being constructed in the front of the Marriott. The culture here is definitely not as safety conscious as it is in the US. The pictures I took here are from the driveway of the Marriott. The only thing that separated me from a 50 foot fall to the bottom of the pit was a thin landscaped area. No fence. No barriers. Nothing.

Large excavation for the basement of a highrise.

The paperwork that needed to be done is much longer and more complex than anything we've had to fill out so far. That's because it is all for the American government and not for the Chinese government. Over the next 4 days we will complete all the steps we need to adopt Caden in the US.

Day 10

After packing we turned in early, got up early and were on the bus to the Chongqing airport by 9AM. A one and a half hour flight landed us in Guangzhou and a 20 minute bus ride got us to the White Swan Hotel. The White Swan is a 5 star hotel that is a favorite of adoption agencies to put up their clients because it is so nice and it is conveniently located to the services we need. The American consulate is literally a 5 minute walk down the street. Now we were fortunately warned from others that had gone before us that if you stayed at the Chongqing Marriott then not to expect too much when you go to the White Swan. They were right. I am quite honestly not that impressed. They messed up our room assignment, first giving us a room on the 11th floor then retracting it, saying the room wasn't ready, then after waiting they gave us a room on the 15th floor, but it too wasn't cleaned so they gave us the information, but refused to give us the keys until the room was done. Wendy, one of our agency's representatives out of New York was kind enough to let us use her room on the same floor until ours was cleaned. They assured us that they would get us from Wendy's room and give us the keys as soon as it was done. Well, they didn't and 20 minutes later we checked on things to find that the room was done, but our keys were not available, and they'd forgotten to set up a crib in the room. They eventually got the crib set up and the keys to us, but all-in-all it was a pretty shabby introduction to the hotel. Oh, and the rooms are smaller, the beds are smaller, the bathroom is absolutely cramped with no counter or drawer space. In general the staff have been professional, but they verge on snooty and are nowhere near as nice as the staff at the Chongqing Marriott.

The entire feel of the area is very different from Chongqing. The air is much cleaner and the area surrounding the White Swan is almost alien, as it is all European-styled buildings, something to do with Britain and France and the Opium Wars. We'll go into that more tomorrow.

We did a little exploring, mostly to get some bottled water and Coca-Cola, and to find the local laundry.


Water has been constantly on our minds during this trip because none of the tap water is potable. As a matter of fact it has been various shades of brown and tan over the last few days. Given that we don't in any way want to ingest the water, we go through a lot of bottled water. It also means that we have been washing all of Caden's bottles and bowls with bottled water too. You never realize how much water you use washing dishes until you're pouring that water out of a bottle and not from a tap. The water restriction means that at every hotel we have been hanging a towel over the tap to remind ourselves not to use it, even for things like rinsing your mouth after brushing your teeth, or for rinsing your toothbrush off afterwards. Bottled water for everything. Luckily we can pick it up at a reasonable price; a two liter bottle runs only about 70 cents.


Special thanks to Ralf for converting the flower street videos into divx, quicktime and mpeg4!


Joe GregorioChina - Day 8 - Another walk around Chongqing | BitWorking


China - Day 8 - Another walk around Chongqing

Another day and another walk around Chongqing. This time Caden and Lynne accompanied me and Lynne took the pictures.

Porter on a crowded street

Whenever we go out with Caden she gets put in the sling, which does several things. The first thing it does is save my back. The second thing the sling does is cover most of Caden; with a hat, only her eyes are visible outside of the sling if she's awake, and she is completely hidden if she falls asleep. The reason you want to keep her hidden is to avoid the "clothing police". The "clothing police", as we call them, are well-meaning older Chinese women who will run up to you and cover any exposed part of a child and admonish you to keep them covered. The first time we went out Caden had the lower part of her legs exposed and three times on the walk we had ladies approach and pull her pant legs down further over her socks. The "clothing police" are only bested by the "thumb police". Caden sucks her thumb, which is apparently a big no-no in China, and complete strangers have no problem walking up to her and yanking her thumb from her mouth while admonishing us. Now you could try to explain that we just adopted her and that this is a particularly stressful time for her and we'll deal with the thumb sucking at a later date, but when you only speak English and they only speak Mandarin, well, you just keep the thumb hidden.

Caden in the sling.

Being so deep in China, anybody who isn't Asian is a rare sight, and Lynne and I are constantly stared at just for our looks alone, but the attention is much greater when we're out with Caden. Lots of pleasant stares and smiles as we walk along, but if we stop, a crowd will gather. This has an interesting dynamic as many people try out their English on us, and if one person 'makes contact', that is they say something we understand, they instantly get promoted to 'local translator' and everyone around them starts peppering them with questions to ask us. News that we are Americans and that we just adopted Caden is always greeted with huge smiles and great appreciation. Here I am getting slightly mobbed on the pedestrian mall.

A crowd gathers around me and Caden

These two girls were at first shy, but after asking about Caden and getting a couple pictures of themselves taken they started to ham it up.

Two children.

One of the strangest sights in both Beijing and Chongqing has been the sight of decidedly non-Asian mannequins. And it doesn't just end with the mannequins, a good portion of the advertising is populated with decidedly European looking models.


Did I mention that there are a lot of people here?

A crowd

One of the things I won't be able to get across, no matter how many pictures or videos I take, is the amazing contrasts that you can find in Chongqing. The brand new building in the front is built right up to, and includes a bit of a facade for, the aging apartment building behind it. Everywhere you see these contrasts and amazing changes. There is a high-rise across the street from the hotel that is under construction. In the uncompleted ground floor of the building people have already set up a noodle shop. From our hotel window I can count 17 cranes on 15 high-rises currently under construction.

Two buildings

Even in the midst of all the construction and modernity there are reminders that this is an ancient land and culture. Less than half a block from our hotel is this drug store, which is six feet wide: a single aisle twenty feet long.

Drug store

I hope you're all enjoying the photos. Once I get back to the US I need to learn some more about photography in general and digital photography in particular. For example, here is my current 'system' for handling digital photos.

  1. Load photo into Paint Shop Pro.
  2. Crop and resize.
  3. Press "One step photo fix..."
  4. If it doesn't look good then choose another photo.

Joe - it's been really interesting reading about your trip. I can totally relate to some of your "Lost in Translation" moments, as my wife is Chinese (I am not) and we occasionally visit Taiwan.

Congratulations on the adoption!

Posted by Craig Andera on 2004-01-08


Joe GregorioChina - Day 7 - A short walk around Chongqing | BitWorking


China - Day 7 - A short walk around Chongqing

This afternoon I went for a short walk in the area immediately around the Marriott. I looped through the 'pedestrian mall', a shopping area where the streets have been closed to cars, and the local flower market.

A crowded sidewalk

This is Zhonghua Road, a busy shop-lined road on the way to the pedestrian mall. There is a mass of shops along these roads, some no more than 5 feet wide, selling everything from fine suits, to noodles, to shoes. Some of the clothing and shoe shops have their production set up right on the sidewalk, ancient-looking sewing machines running all day long. It might look like a trick of the light but the leaves really are that dull grayish-green color. The coal-fired smoke and fog combine to produce a heavy smog that rains down a fine soot over everything in the city.

There aren't many private vehicles on the roads, mostly buses and taxi cabs. This is actually a pretty rare shot with the street completely empty. The careful art of crossing the street as a pedestrian, well, I'll leave the description of that for another day.

An alley way.

The alley way above is a nice study in contrasts. The smaller buildings nestled back in the alley are much shorter and older than the ones near the front. In the back you can see two high-rises in the mist. Both buildings are incomplete; on the one on the left you can see the ever-present crane on top.

Busy street with a porter

The man on the right hand side of the above picture with the long bamboo pole in his hands is a member of what the locals call the "Pole Army". Chongqing is a very mountainous region and bikes are not much use here, instead there are droves of men with poles that work as porters that carry items around the city. I've seen them carrying some pretty heavy loads. In the center of the picture is one of the clothing shops with the sewing machines set out on the sidewalk.

This brings out a recurrent theme I've seen across the whole trip into China. Whatever problem we would solve with technology in the US, the Chinese solve with people, and where we would use people to do a job in China they would use more people. Where we would use trucks, they have porters. Where we would use backhoes, they use laborers. Another example, in every grocery store and department store we have been in from Beijing to Chongqing, they are staffed with one person per aisle. Yes, you read that correctly, one person per aisle. We have yet to eat in a sit-down restaurant where the patrons outnumbered the staff.

McDonalds sign on a busy street

What good is a city without a McDonalds? How about 60 of them in Chongqing alone. There are two within walking distance of our hotel. This picture is taken on the pedestrian mall, with the McDonalds and a noodle shop on the left. The large mass of people in the middle of the picture are sitting on the benches eating noodles. Huge steaming mounds of delicious smelling noodles served in thin plastic bowls and eaten with the same disposable wooden chopsticks you get at Chinese restaurants in the US.

entrance to the flower market

Here is the flower market across the street from the hotel. I bought Lynne a half-dozen roses here on my way back. It was my first time haggling and I didn't do a very good job, but I got her the half-dozen (actually 7 roses) for 10 Yuan or $1.25US. One of the nicest things about being in China is that I am getting used to the prices and going back to the US is going to be difficult, and we haven't even gotten to do a lot of shopping outside of department stores where there isn't any haggling.

open air flower market

The market is more like an alley, with vendors selling everything from individual flowers, to huge arrangements, to bonsai trees, to gardening supplies. The alley winds through the block and turns right, and as it turns it transitions from flowers to food staples and cooked food. The above picture is a shot looking back down the alley towards the entrance. Each of the roofs on the left is a different shop selling different tools, produce or plants. For those of you with good connections I shot a short video (440KB) of the pedestrian traffic flowing into the entrance of the flower market. This is a pretty good representation of the pedestrian traffic and noise levels that are continuous throughout the city. It is a WMV file that I've tested in both Windows Media Player and Real Player. I can provide the original AVI file to anyone that wants to convert it into different formats, for which I would be eternally grateful.

Special thanks to Ralf for converting the flower street video into divx, quicktime and mp4!

Flower vendor wrapping up a sale.

This final shot is about half-way up the alley; a vendor is wrapping up a sale to the people standing in front of him. Behind him wends another alley, orthogonal to the current one, that contains more plants and trees for sale. There aren't many flowers on the branches he's sold but they have a very strong scent. Speaking of scent, that's one of the things about China that I can't blog. There are wonderful scents like the flower market above, and the great smells of food wafting out of some of the stalls. On the other hand there are some pretty awful stenches, some that are nauseatingly familiar, others of unknown origin and completely beyond any previous experience.

Thank you so much for your tour.  I have a friend who lives in Chongqing and it is my desire to visit her soon.  You have permitted a glimpse, I thank you for that.


Posted by Stephen Cotta on 2004-04-18


Joe GregorioChina - Day 6 | BitWorking


China - Day 6

Today was a very low key day.

I wanted to keep a low profile as I am still recovering from the stomach flu, and we also want time with Caden, so Lynne and I skipped out on the scheduled sightseeing and stuck around the hotel. In the morning we walked around the nearby pedestrian mall with Mark and Moya doing some window shopping and stopping by the grocery store on the way back for necessities. I'm still not fully coherent; I forgot to bring along the camera, a sure sign that staying back was a good idea. The rest of the day was spent playing with and taking care of Caden, eating and exploring the hotel (you laugh, but this place is 39 stories tall, has 8 restaurants and about that many shops).

Caden playing with a rattle. Caden playing with a teething ring.

We took a lot of pictures of Caden, but because she is always laughing and playing so hard most of the shots end up looking like this one.

Blurry Caden playing.
I have to keep saying it -- she's beautiful, and looks thoroughly happy!  I couldn't be more happy for you.  This has definitely been a long gestation :)

Posted by Eric Vitiello on 2004-01-07


Joe GregorioChina - Day 5 - Gotcha Day | BitWorking


China - Day 5 - Gotcha Day

This is the day, Gotcha Day, the day we get Caden.

Lynne with Caden

She came right to us, with no crying at all.

Joe and Caden

It was very cute, she sat on Lynne's lap and just stared at me for a good 5 minutes. Then she reached out for me. We sent pictures out to the orphanage earlier and I think the Aunties did a good job of showing her the pictures because she seemed to recognize us.

Officially Adopted in China

After getting the babies we went to complete the official adoption paperwork. We have now officially adopted her in China. Now we wait for some of the paperwork to get back to us and then on to Guangzhou to adopt her in the US.

What you don't see in the pictures is that I am deathly ill. I caught the stomach bug. For the last picture just seconds before it I was sleeping on a chair in the lobby, shivering with the chills. I went back there immediately after the picture was taken. After we returned to the hotel I slept for 15 hours. I'm feeling better now and can hopefully more fully enjoy this time with our daughter.

Congratulations!!  She's beautiful!  I'm glad it's working so well, and wish you a safe trip home.


Posted by Eric Vitiello on 2004-01-05


Congratulations Joe & Lynne!  What a wonderful thing for you.  Best wishes to you all.

Posted by Jason Clark on 2004-01-05

Congratulations!  Just wonderful.

Posted by Don Park on 2004-01-08


Joe GregorioDetecting Benchmark Regression

Subtitle if this were an academic paper, which it’s not: A k-means clustering derived point statistic highly correlated with regressions in commit-series data with applications to automatic anomaly detection in large sets of benchmark results.

TL;DR: To detect regressions in benchmark data over a series of commits use k-means clustering on mean and variance normed commit-series. For each of the clusters find the best fitting step function to each cluster’s centroid. The metric |step/fit| is highly correlated with regressions, where step is the height of the step function, and fit is the mean square error of the fit of the step function to the centroid.

Below is a description of how we detect performance regressions for the Skia graphics library. I’m writing this up because, after much searching, I haven’t found anyone describing the method we came up with for detecting performance regressions, and maybe this writeup will be useful to other people.

Problem Statement

Skia is an open source cross-platform 2D graphics library. In Skia, like many other software projects, we have a large number of performance tests, aka benchmarks, and we run those benchmarks every time we change the code. Just having a large number of benchmarks isn’t a problem, but being cross-platform means running those tests across many different platforms: Linux, Mac, Windows, Android, ChromeOS, on different GPUs, etc., which leads to a combinatorial explosion in benchmark results. For every commit to Skia today the testing infrastructure generates approximately 40,000 benchmark measurements. That number tends to change frequently as tests, platforms, and configurations are added and removed regularly, and has been over 70,000 per commit in the past several months.


To make the following discussion easier let’s define some terms.

A Trace is a single benchmark and configuration tracked over a series of commits. Note that this isn’t exactly a time series, since the measurements aren’t taken at equidistant times but are spaced by commits to the codebase. Also note that for each benchmark there may be multiple traces: for example, one for Windows 8, one for Linux, and one for Android.

Fig 1 - Trace

A “performance regression” is a significant change in either direction of a metric. Now a metric that drops may actually be a good performance increase, but could also be an indication of a test that is broken, or has stopped working entirely. So regardless of the benchmark, we are looking for step-like changes in either direction.

The issue with tens of thousands of traces is that you just can’t look at the raw numbers, or even plot all the data, and figure out when you’ve had a regression. At first we tried setting performance ranges for each trace, i.e. an upper and lower bound for each trace. If a later commit caused the benchmark results to move outside those bounds then that would trigger an error. There are many drawbacks to monitoring benchmarks by manually placing bounds on each trace:

  1. The most important drawback is that in such a system a single test result can trigger an alert. You know the old phrase, “the plural of anecdote isn’t data”; a single benchmark measurement is virtually meaningless, as any number of anomalies could actually be responsible for that benchmark result changing. For example, a machine could overheat, forcing a move to frequency scaling, or other processes on the machine may starve the test of CPU cycles. You can work to mitigate these eventualities, but they never completely go away. SPC systems such as the Western Electric rules might be applicable in this case, but we’ve never tested them.
  2. Constant manual editing of trace bounds is time consuming and error prone.
  3. Constantly adding manual trace bounds for new benchmarks is time consuming. Add one test and you may have to add many trace bounds, one for each member of that combinatorial explosion.
  4. Forgetting to add new ranges for new benchmarks is another source of error.

Even if you automate the placing of trace bounds, you still have the issue of transient behavior that looks like a regression, and you also have to take pains that the automatic calculation of trace bounds doesn’t mask a true regression.

Fig 2 - Is this a regression or an anomaly?

So we needed a better system than setting trace bounds. The next section explains the system we implemented and have successfully run for several months now.

Before we go further let’s define a few more terms.

Normalized Traces
Normalization is the process of modifying each Trace so that it has a mean of zero and a standard deviation of 1.0. Note that if the standard deviation of a trace is too small, then blowing that up to a standard deviation of 1.0 would introduce nothing but noise, so there’s a lower limit for the standard deviation of a trace, below which we don’t normalize the standard deviation. The idea is to extract just the shape of the trace, so that all the normalized traces are comparable using a sum of squares distance metric. The smaller the sum of squares error is between two normalized traces, the more similar their shapes.
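As a concrete illustration, the normalization step might look like the following sketch (pure Python; the MIN_STD_DEV cutoff value is an assumption for illustration, not Skia’s actual constant):

```python
import math

# Assumed cutoff: below this standard deviation we only center the trace,
# since scaling a nearly flat trace up to std dev 1.0 would amplify noise.
MIN_STD_DEV = 1e-3

def normalize(trace):
    """Return a copy of trace shifted to mean 0 and, where sensible,
    scaled to standard deviation 1.0, so traces compare by shape alone."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    sd = math.sqrt(sum(x * x for x in centered) / n)
    if sd < MIN_STD_DEV:
        return centered  # too flat: centering only
    return [x / sd for x in centered]
```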
k-means clustering
I’m not going to explain k-means clustering in one paragraph; you should go look it up on Wikipedia or any of the fine explanations available on the web. The important point is that we are using k-means clustering to group normalized traces together based on their shape. The idea is that many traces will move together in the same direction from commit to commit. For example, if I speed up rectangle drawing then all tests that use rectangles should get faster, if not in the same proportion.
The centroid is the point at the center of a cluster: in this case the mean of the normalized traces in the cluster, which acts as a prototype shape for the members of the cluster.
Regression Factor

For each cluster of normalized traces we find the best fitting step function to the centroid. From that best fitting step function we calculate Fit and Step, where Fit is the sum of squares error between the step function and the centroid, and Step is the height of the step function.

From there we calculate the Regression Factor:

R = Step / Fit

A smaller Fit value gives you a larger R, which means that the more a centroid looks like a step function, the larger R gets. Similarly, the larger Step gets, the larger R gets, which is a measure of how big a change the centroid represents.
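Here is a sketch of that computation; the exhaustive breakpoint search is an illustrative choice (the post doesn’t specify how the best-fitting step function is found), and the epsilon guard against a perfect fit is an assumption:

```python
def regression_factor(centroid, epsilon=1e-9):
    """Fit the best step function to centroid and return (step, fit, r),
    where step is the step height, fit is the mean square error of the
    fit, and r = step / fit is the Regression Factor."""
    n = len(centroid)
    best_step, best_fit = 0.0, float("inf")
    # Try every possible breakpoint and keep the step function that
    # minimizes the mean square error against the centroid.
    for i in range(1, n):
        left_mean = sum(centroid[:i]) / i
        right_mean = sum(centroid[i:]) / (n - i)
        sse = (sum((x - left_mean) ** 2 for x in centroid[:i]) +
               sum((x - right_mean) ** 2 for x in centroid[i:]))
        fit = sse / n
        if fit < best_fit:
            best_fit = fit
            best_step = left_mean - right_mean
    r = best_step / max(best_fit, epsilon)  # guard against a perfect fit
    return best_step, best_fit, r
```

A centroid that drops cleanly from one level to another yields a small fit and therefore a large |r|, matching the intuition above.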

Putting it all together.

So finally, with all the preliminaries set up, we can get to the core of the system.

  • Collect all Traces over a specific range of commits. We usually use around the last 100-250 commits worth of data.
  • Normalize all the Traces.
  • Perform k-means clustering on all the Normalized Traces.
  • For each cluster calculate the Regression Factor of the centroid.
  • Any cluster with a Regression Factor whose absolute value is over 150 is considered interesting enough to need triaging. Note that 150 was chosen after observing the system in action for a while, the cutoff may be different for your data.
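The clustering step in the list above can be sketched with a minimal hand-rolled Lloyd’s k-means; a real system would use a tuned k-means library, and the names and parameters here are illustrative:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Cluster points (equal-length lists of floats, e.g. normalized
    traces) into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the centroid with the smallest sum-of-squares distance.
        return min(range(k),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(p, centroids[i])))

    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # recompute the centroid as the mean of its members
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids
```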

Here’s a view of the system at work, finding regressions in performance. Note that out of 40,000 benchmarks the first cluster contains 1336 traces and has a Regression Factor of -4.08e+3.

Screenshot 2014-11-23 at 1.14.19 PM.png

Continuous Analysis

The above system works for finding regressions once. But what happens if you want to check for regressions continuously as new commits land and new benchmark results arrive? One last definition:

A cluster is considered interesting if the absolute value of its Regression Factor is over 150. This is only a rule of thumb based on observing the system and may be relevant only to the Skia benchmarks; a different cutoff may be appropriate for other datasets. The important point is that as |R| grows, so does the likelihood of that cluster being a regression.

To continuously monitor for interesting clusters, start by running the above analysis once and find the interesting clusters. If there are any, triage them as either really needing attention, such as a CL that needs to be rolled back, or ignorable, say in the case where a true performance increase was seen. Then on a periodic basis run the analysis again when new data arrives. What should be done with the new set of interesting clusters produced from the analysis? The existing interesting clusters have already been triaged, those same clusters may appear again in the output of the analysis, and new interesting clusters may appear. The process of raising only new interesting clusters for triaging, while folding existing clusters in with similar clusters that appear in the analysis results, is called cluster coalescing.

Cluster coalescing currently works by looking at all the new interesting clusters: if a new cluster contains the same traces as the 20 best traces of an existing cluster, then they are considered the same cluster. Note that ‘best’ means the 20 traces that are closest to the centroid of the cluster. This is an area of active work and we are still experimenting regularly with new cluster coalescing schemes.
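A sketch of that rule, under the assumption that a cluster is represented as a centroid plus a map from trace id to normalized trace (the representation and names are hypothetical, not Skia’s code):

```python
def best_trace_ids(cluster, count=20):
    """Ids of the `count` traces closest to the cluster's centroid,
    by sum-of-squares distance."""
    def dist(trace):
        return sum((a - b) ** 2 for a, b in zip(trace, cluster["centroid"]))
    ranked = sorted(cluster["traces"],
                    key=lambda tid: dist(cluster["traces"][tid]))
    return set(ranked[:count])

def is_same_cluster(existing, new):
    """A new cluster coalesces with an existing, already-triaged cluster
    when it contains all of the existing cluster's best traces."""
    return best_trace_ids(existing) <= set(new["traces"])
```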

Wrap Up

I hope that was useful. Please shoot me any questions on Twitter @bitworking. The code for the software that does the above analysis, and much more, is open sourced here.

Tim HopperSundry Links for November 24, 2014

brushfire: Avi Bryant has been building Brushfire, 'a framework for distributed supervised learning of decision tree ensemble models in Scala.' Fun stuff!

What are the lesser known but useful data structures?: I always enjoy StackOverflow questions like this, though of course it's not considered a good, on-topic question for the site.

Free Programming Books: A huge, crowd-sourced list of free programming books by language and topic.

PhD Dissertations-Machine Learning Department: Seven years of ML PhD dissertations from Carnegie Mellon University. I wish I had time to read Tools for Large Graph Mining.

Caktus GroupQ4 ShipIt Day: Dedicated to Creativity

This October, nearly everyone at Caktus took a break from their usual projects to take part in Caktus's 8th ShipIt Day. Apart from a few diligent individuals who couldn't afford to spare any time from urgent responsibilities, participants spent the time working and collaborating on creative and experimental projects, with the aim of trying something new and ideally seeing a project through from start to finish in the space of a day and a half.

Participants in ShipIt Day worked on a variety of projects. We all had the chance to try out Calvin’s shared music player application, SharpTunes, which within the first few hours was playing “Hooked On A Feeling”. It utilizes peer-to-peer sharing, similar to BitTorrent, to more efficiently distribute a music file to a large number of users while allowing them to simultaneously listen to a shared playlist. On his blog, he describes how he achieved proof of concept in under an hour and some later challenges with arranging playlists.

Caktus Sharp Tunes - Caktus Ship It Day

David worked on improving the UX (user experience) for Timepiece, our time-tracking system. While getting the chance to brush up on his Javascript and utilize bootstrap modals, he worked on improvements for the weekly schedule feature. The current version, although very handy, is increasingly difficult to read as the company grows and more employees’ hours are on the schedule. Therefore, David built a feature which makes it possible to view individual schedules as a modal. Rebecca provided some assistance in getting it deployed, and although it’s not quite done yet, it should save us all a lot of trouble reading the schedule from the back of big standup meetings.

Timepiece - Caktus Ship It Day

Tobias built tests for the Django Project Template. The Django Project Template makes it easy to provision new projects in Django, but testing the template itself can be difficult. The new tests, which exercise the template on a fresh server and report back in HipChat, should improve the usability of the template.

Vinod worked on adding Django 1.7 support to RapidSMS, and with help from Dan, successfully reached his goal by the end of ShipIt Day. For next ShipIt Day, he hopes to implement Python 3 support too.

Brian set up a Clojure-based Overtone real time music environment, and although he didn’t reach his goal of using it to build new instruments, he did succeed in creating, in his own words, “some annoying tones.”

Victor and Alex collaborated on School Navigator (still a work in progress) for Code for Durham, designed to help Durham residents understand the perplexing complexity of public school options available to them. Alex imported GIS (geographic information system) data from Durham Public Schools, modeled the data, and built a backend using django-rest-framework. Victor contributed the frontend, which he built using Angular, while getting the chance to learn more about Appcache and Angular.

NC School Navigator - Caktus Ship It Day

Rebecca did some work for BRP Weather using django-pipeline, which gave her, Caleb, and Victor the opportunity to compare the pros and cons of django-compressor and django-pipeline. She finds the error messages from django-compressor a nuisance and prefers how django-pipeline handles static files, but django-pipeline is not very helpful when it cannot find a file and has some issues with Sass files.

Michael continued designing a migraine-tracking app. He designed a simplified data entry system and did some animation design as well. The app is intended to track occurrences of migraines as well as potential triggers, such as barometric pressure and the user’s sleep patterns. Trevor also contributed some assistance with UX details.

Dan made progress on an application called Hall Monitor which he has been working on since before coming to Caktus. It accesses an office’s Google Calendars and uses string matching to check event names on calendars in order to determine who is in or out of the office. For instance, if someone has an event called “wfh” (working from home), it concludes that they are out of the office. Similarly, if someone is at an all-day event, it also logically concludes they are probably out. He demonstrated it to us, showing that it does indeed have an uncanny ability to track our presence.

Caleb set up Clojure and Quil, which lets you use Processing from Clojure, and built an application for displaying animated lines. By modifying the program, the user can instantly change the animation, creating interesting effects. He also created a Game of Life that runs in Quil (see below) and finished a Scheme implementation of Langton’s Ant in Automaton.

Animated Quil Lines - Caktus Ship It Day

Scott used the time to back up changes as well as add a KVM (keyboard, video and mouse) switch to the server rack.

Wray worked on a couple different projects. He completed a Unity tutorial which involved building a space shooter game which runs on Android, which we all got to try out. He also used the time to work on Curvemixer, which creates interesting vector graphics using curves and circles.

I took the time to write some help files for an application designed to allow medical students to test their radiology knowledge. The help files should allow students and instructors to better understand the many features in the application and writing them allowed me to practice documentation creation.

Overall, ShipIt Day was a very productive and refreshing experience for everyone, allowing us to spend time on the sorts of projects we wouldn’t usually find time to work on. Moreover, we got the chance to find new solutions to projects we may have been stuck on through collaboration.

Tim HopperSundry Links for November 17, 2014

There's no magic: virtualenv edition: I didn't really get virtualenvs until long after I started programming Python, though they're now an essential part of my toolkit. This is a great post explaining how they work.

Traps for the Unwary in Python’s Import System: "Python’s import system is powerful, but also quite complicated."

pyfmt: I recently learned about gofmt for auto-formatting Go code. Here's a similar tool for Python.

Q: Setting User-Agent Field?: A 1996 question in comp.lang.java on how to set the user agent field for a Java crawler. The signature on the question? "Thanks, Larry Page"

alecthomas/importmagic: Python tool and Sublime extension for automatically adding imports.

Caktus GroupSupporting Increased Healthcare Access with NCGetCovered.org

We’ve launched NCGetCovered.org, a site dedicated to helping North Carolinians gain access to health insurance. As many know, enrolling in health insurance can feel daunting. NCGetCovered.org aims to simplify that process by centralizing enrollment information and great resources like live help. The site is launching ahead of the November 15th open enrollment period for the federal healthcare exchange (healthcare.gov).

NCGetCovered.org is a testament to the hard work of the many dedicated to enrolling the uninsured. Caktus created the site on behalf of the Big Tent Coalition, a nonpartisan consortium of more than 100 organizations and 320 individuals pulled from community-based organizations, hospitals, insurance carriers, in-person assisters and non-profit organizations.

Taking the lead on this web project was our neighbor in Durham, MDC, a nonprofit dedicated to closing opportunity gaps and a Big Tent member. MDC is an incredibly forward-thinking organization and saw early on the need for a one-stop shop for health insurance enrollment information. We feel very fortunate to be MDC’s partners in increasing health insurance access in our home state.

Caktus GroupOpen Data Project in Durham - Thumbs Up to Open Government!

In exciting local news, Durham and Durham County are launching a new site dedicated to centralizing public data in Summer 2015. Their press release mentions a health sanitation app Code for Durham built as a model of civic engagement with open data. Our own co-founder and CTO, Colin Copeland, is co-captain of Code for Durham, a volunteer organization dedicated to building apps that improve government transparency.

Their press release describes the project:

“The City of Durham and Durham County Government are embarking on an open data partnership that will lay the groundwork for businesses, non-profits, journalists, universities, and residents to access and use the wealth of public data available between the two government organizations, while becoming even more transparent to the residents of Durham.”

We’re looking forward to seeing all the great apps for Durhamites that result from this big step towards open government!

Caktus GroupWe've Won Two W3 Awards for Creative Excellence on the Web!

We’re honored to announce that we’ve won two W3 Silver Awards for Creative Excellence on the Web. The awards were given in recognition of our homepage redesign and DjangoCon 2014. Many thanks to Open Bastion and, by extension, the Django Software Foundation for selecting us to build the DjangoCon website. Also many thanks to our hardworking team of designers, developers, and project managers who worked on these projects: Dan, Daryl, David, Michael, Rebecca, and Trevor!

Here’s a quote from Linda Day, the director of the Academy of Interactive and Visual Arts (the sponsors of the award):

“We were once again amazed with the high level of execution and creativity represented within this year’s group of entrants. Our winners continue to find innovative and forward-thinking ways to push the boundaries of creativity in web design.”

We’re particularly humbled to learn that there were 4,000 entries this year and to be in the company of winners like Google, ESPN, Visa, and Sony and the many other wonderful companies that received recognition. We’re looking forward to continuing to build great web experiences!

The official press release: http://www.prweb.com/releases/2014-CaktusGroup/11/prweb12306675.htm

Tim HopperSundry Links for November 12, 2014

Amazon Picking Challenge: Kiva Systems (where I interned in 2011) is setting up a robotics challenge for picking items off warehouse shelves.

contexttimer 0.3.1: A handy Python context manager and decorator for timing things.

How-to: Translate from MapReduce to Apache Spark: This is a helpful bit from Cloudera on moving algorithms from MapReduce to Spark.

combinatorics 1.4.3: Here's a Python module adding some combinatorial functions to the language.

Special methods and interface-based type system: Guido van Rossum explains (in 2006) why Python uses len(x) instead of x.len().

Caktus GroupUsing Amazon S3 to Store your Django Site's Static and Media Files

Storing your Django site's static and media files on Amazon S3, instead of serving them yourself, can make your site perform better.

This post is about how to do that. We'll describe how to set up an S3 bucket with the proper permissions and configuration, how to upload static and media files from Django to S3, and how to serve the files from S3 when people visit your site.

S3 Bucket Access

We'll assume that you've got access to an S3 account, and a user with the permissions you'll need.

The first thing to consider is that, while I might be using my dpoirier userid to set this up, I probably don't want our web site using my dpoirier userid permanently. If someone was able to break into the site and get the credentials, I wouldn't want them to have access to everything I own. Or if I left Caktus (unthinkable though that is), someone else might need to be able to manage the resources on S3.

What we'll do is set up a separate AWS user, with the necessary permissions to run the site, but no more, and then have the web site use that user instead of your own.

  • Create the bucket.
  • Create a new user: Go to AWS IAM. Click "Create new users" and follow the prompts. Leave "Generate an access key for each User" selected.
  • Get the credentials
  • Go to the new user's Security Credentials tab.
  • Click "Manage access keys",
  • Download the credentials for the access key that was created, and
  • Save them somewhere because no one will ever be able to download them again.
  • (Though it's easy enough to create a new access key if you lose the old one's secret key.)
  • Get the new user's ARN (Amazon Resource Name) by going to the user's Summary tab. It'll look like this: "arn:aws:iam::123456789012:user/someusername"
  • Go to the bucket properties in the S3 management console.
  • Add a bucket policy that looks like this, but change "BUCKET-NAME" to the bucket name, and "USER-ARN" to your new user's ARN. The first statement makes the contents publicly readable (so you can serve the files on the web), and the second grants full access to the bucket and its contents to the specified user:

        {
            "Statement": [
                {
                    "Sid": "PublicReadForGetBucketObjects",
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "*"
                    },
                    "Action": "s3:GetObject",
                    "Resource": "arn:aws:s3:::BUCKET-NAME/*"
                },
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "USER-ARN"
                    },
                    "Action": "s3:*",
                    "Resource": [
                        "arn:aws:s3:::BUCKET-NAME",
                        "arn:aws:s3:::BUCKET-NAME/*"
                    ]
                }
            ]
        }
  • If you need to add limited permissions for another user to do things with this bucket, you can add more statements. For example, if you want another user (change "OTHER-USER-ARN" to their ARN) to be able to copy all the content from this bucket to another bucket:

        {
            "Action": "s3:ListBucket",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::BUCKET-NAME",
            "Principal": {
                "AWS": [
                    "OTHER-USER-ARN"
                ]
            }
        }

That will let the user list the objects in the bucket. The bucket was already publicly readable, but not listable, so adding this permission will let the user sync from this bucket to another one where the user has full permissions.

Expected results:

  • The site can use the access key ID and secret key associated with the user's access key to access the bucket
  • The site will be able to do anything with that bucket
  • The site will not be able to do anything outside that bucket

S3 for Django static files

The simplest case is just using S3 to serve your static files. In Django, we say "static files" to refer to the fixed files that we provide and serve as part of our site - typically images, css, and javascript, and maybe some static HTML files. Static files do not include any files that might be uploaded by users of the site. We call those "media files".

Before continuing, you should be familiar with managing static files, the staticfiles app, and deploying static files in Django.

Also, your templates should never hard-code the URL path of your static files. Use the static tag instead:

      {% load static from staticfiles %}
      <img src="{% static 'images/rooster.png' %}"/>

That will use whatever the appropriate method is to figure out the right URL for your static files.

The two static tags

Django provides two template tags named static.

The first static is in the static templatetags library, and accessed using {% load static %}. It just puts the value of STATIC_URL in front of the path.

The one from staticfiles ({% load static from staticfiles %}) is smarter - it uses whatever storage class you've configured for static files to come up with the URL.

By using the one from staticfiles from the start, you'll be prepared for any storage class you might decide to use in the future.

Moving your static files to S3

In order for your static files to be served from S3 instead of your own server, you need to arrange for two things to happen:

  1. When you serve pages, any links in the pages to your static files should point at their location on S3 instead of your own server.
  2. Your static files are on S3 and accessible to the web site's users.

Part 1 is easy if you've been careful not to hardcode static file paths in your templates. Just change STATICFILES_STORAGE in your settings.

But you still need to get your files onto S3, and keep them up to date. You could do that by running collectstatic locally, and using some standalone tool to sync the collected static files to S3, at each deploy. But we won't be able to get away with such a simple solution for media files, so we might as well go ahead and set up the custom Django storage we'll need now, and then our collectstatic will copy the files up to S3 for us.

To start, install two Python packages: django-storages (yes, that's "storages" with an "S" on the end), and boto:

    $ pip install django-storages boto

Add 'storages' to INSTALLED_APPS:

    INSTALLED_APPS = (
        # ... your other installed apps ...
        'storages',
    )

Optionally, add this to your common settings:

    AWS_HEADERS = {  # see http://developer.yahoo.com/performance/rules.html#expires
        'Expires': 'Thu, 31 Dec 2099 20:00:00 GMT',
        'Cache-Control': 'max-age=94608000',
    }

That will tell boto that when it uploads files to S3, it should set properties on them so that when S3 serves them, it'll include those HTTP headers in the response. Those HTTP headers in turn will tell browsers that they can cache these files for a very long time.

Now, add this to your settings, changing the first three values as appropriate:

    AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxxxxxxxxxx'
    AWS_SECRET_ACCESS_KEY = 'yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy'
    AWS_STORAGE_BUCKET_NAME = 'your-bucket-name'

    # Tell django-storages that when coming up with the URL for an item in S3 storage, keep
    # it simple - just use this domain plus the path. (If this isn't set, things get complicated).
    # This controls how the `static` template tag from `staticfiles` gets expanded, if you're using it.
    # We also use it in the next setting.
    AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME

    # This is used by the `static` template tag from `static`, if you're using that. Or if anything else
    # refers directly to STATIC_URL. So it's safest to always set it.
    STATIC_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN

    # Tell the staticfiles app to use S3Boto storage when writing the collected static files (when
    # you run `collectstatic`).
    STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Only the first three lines should need to be changed for now.


One more thing you need to set up is CORS. CORS defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. Since we're going to be serving our static files and media from a different domain, if you don't take CORS into account, you'll run into mysterious problems, like Firefox not using your custom fonts for no apparent reason.

Go to your S3 bucket properties, and under "Permissions", click on "Add CORS Configuration". A typical permissive configuration, which allows GET requests from any origin, looks like this:

    <CORSConfiguration>
        <CORSRule>
            <AllowedOrigin>*</AllowedOrigin>
            <AllowedMethod>GET</AllowedMethod>
            <MaxAgeSeconds>3000</MaxAgeSeconds>
            <AllowedHeader>Authorization</AllowedHeader>
        </CORSRule>
    </CORSConfiguration>

I won't bother to explain this, since there are plenty of explanations on the web that you can Google for. The tricky part is knowing you need to add CORS in the first place.

Try it

With this all set up, you should be able to upload your static files to S3 using collectstatic:

    python manage.py collectstatic

If you see any errors, double-check all the steps above.

Once that's successful, you should be able to start your test site and view some pages. Look at the page source and you should see that the images, css, and javascript are being loaded from S3 instead of your own server. Any media files should still be served as before.

Don't put this into production quite yet, though. We still have some changes to make to how we're doing this.

Moving Media Files to S3

Reminder: Django "media" files are files that have been uploaded by web site users, that then need to be served from your site. One example is a user avatar (an image the user uploads and the site displays with the user's information).

Media files are typically managed using FileField and ImageField fields on models. In a template, you use the url attribute on the file field to get the URL of the underlying file.

For example, if user.avatar is an ImageField on your user model, then

    <img src="{{ user.avatar.url }}">

would embed the user's avatar image in the web page.

By default, when a file is uploaded using a FileField or ImageField, it is saved to a file on a path inside the local directory named by MEDIA_ROOT, under a subdirectory named by the field's upload_to value. When the file's url attribute is accessed, it returns the value of MEDIA_URL, prepended to the file's path inside MEDIA_ROOT.

An example might help. Suppose we have these settings:

    MEDIA_ROOT = '/var/media/'
    MEDIA_URL = 'http://media.example.com/'

and this is part of our user model:

    avatar = models.ImageField(upload_to='avatars')

When a user uploads an avatar image, it might be saved as /var/media/avatars/12345.png. Then <img src="{{ user.avatar.url }}"> would expand to <img src="http://media.example.com/avatars/12345.png">.
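This path and URL composition can be sketched in plain Python (a simplified stand-in for what Django's default storage does; the helper names here are just for illustration):

```python
import posixpath

# Settings from the example above
MEDIA_ROOT = '/var/media/'
MEDIA_URL = 'http://media.example.com/'

def saved_path(upload_to, filename):
    # Where the default storage writes the uploaded file on disk
    return posixpath.join(MEDIA_ROOT, upload_to, filename)

def file_url(upload_to, filename):
    # What the field's `url` attribute returns: MEDIA_URL + the relative path
    return MEDIA_URL + posixpath.join(upload_to, filename)
```

Swapping in an S3 storage backend changes only what these two operations do, which is why the rest of your code can stay the same.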

Our goal is instead of saving those files to a local directory, to send them to S3. Then instead of having to serve them somehow locally, we can let Amazon serve them for us.

Another advantage of using S3 for media files is if you scale up by adding more servers, this makes uploaded images available on all servers at once.

Configuring Django media to use S3

Ideally, we'd be able to start putting new media files on S3 just by adding this to our settings:

    MEDIA_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN
    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Adding those settings would indeed tell Django to save uploaded files to our S3 bucket, and use our S3 URL to link to them.

Unfortunately, this would store our media files on top of our static files, which we're already keeping in our S3 bucket. If we were careful to always set upload_to on our FileFields to directory names that would never occur in our static files, we might get away with it (though I'm not sure Django would even let us). But we can do better.

What we want to do is either enforce storing our static files and media files in different subdirectories of our bucket, or use two different buckets. I'll show how to use the different paths first.

In order for our STATICFILES_STORAGE to have different settings from our DEFAULT_FILE_STORAGE, they need to use two different storage classes; there's no way to configure anything more fine-grained. So, we'll start by creating a custom storage class for our static file storage, by subclassing S3BotoStorage. We'll also define a new setting, so we don't have to hard-code the path in our Python code:

    # custom_storages.py
    from django.conf import settings
    from storages.backends.s3boto import S3BotoStorage

    class StaticStorage(S3BotoStorage):
        location = settings.STATICFILES_LOCATION

Then in our settings:

    STATICFILES_LOCATION = 'static'
    STATICFILES_STORAGE = 'custom_storages.StaticStorage'

Giving our class a location attribute of 'static' will put all our files into paths on S3 starting with 'static/'.

You should be able to run collectstatic again, restart your site, and now all your static files should have '/static/' in their URLs. Now delete from your S3 bucket any files outside of '/static' (using the S3 console, or whatever tool you like).

We can do something very similar now for media files, adding another storage class:

    class MediaStorage(S3BotoStorage):
        location = settings.MEDIAFILES_LOCATION

and in settings:

    MEDIAFILES_LOCATION = 'media'
    DEFAULT_FILE_STORAGE = 'custom_storages.MediaStorage'

Now when a user uploads their avatar, it should go into '/media/' in our S3 bucket. When we display the image on a page, the image URL will include '/media/'.

Using different buckets

You can use different buckets for static and media files by adding a bucket_name attribute to your custom storage classes. You can see the whole list of attributes you can set by looking at the source for S3BotoStorage.
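For example, the two storage classes might look like this. Note that AWS_STATIC_BUCKET_NAME and AWS_MEDIA_BUCKET_NAME are settings you would define yourself for this purpose; they are not built-in django-storages settings:

    # custom_storages.py -- sketch of the two-bucket approach
    from django.conf import settings
    from storages.backends.s3boto import S3BotoStorage

    class StaticStorage(S3BotoStorage):
        # Hypothetical setting naming the bucket for static files
        bucket_name = settings.AWS_STATIC_BUCKET_NAME

    class MediaStorage(S3BotoStorage):
        # Hypothetical setting naming the bucket for media files
        bucket_name = settings.AWS_MEDIA_BUCKET_NAME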

Moving an existing site's media to S3

If your site already has user-uploaded files in a local directory, you'll need to copy them up to your media directory on S3. There are lots of tools these days for doing this kind of thing. If the command line is your thing, try the AWS CLI tools from Amazon. They worked okay for me.


Serving your static and media files from S3 requires getting a lot of different parts working together. But it's worthwhile for a number of reasons:

  • S3 can probably serve your files more efficiently than your own server.
  • Using S3 saves the resources of your own server for more important work.
  • Having media files on S3 allows easier scaling by replicating your servers.
  • Once your files are on S3, you're well on the way to using CloudFront to serve them even more efficiently using Amazon's CDN service.

Caktus GroupWebcast: Creating Enriching Web Applications with Django and Backbone.js

Update: The live webcast is now available at O'Reilly Media

Our technical director, Mark Lavin, will be giving a tutorial on Django and Backbone.js during a free webcast for O’Reilly Media tomorrow, November 6th, 1pm EST. There will be demos and a discussion of common stumbling blocks when building rich client apps.

Register today!

Here’s a description of his talk:

"Django and Backbone are two of the most popular frameworks for web backends and frontends respectively and this webcast will talk about how to use them together effectively. During the session we'll build a simple REST API with Django and connect it to a single page application built with Backbone. This will examine the separation of client and server responsibilities. We'll dive into the differences between client-side and server-side routing and other stumbling blocks that developers encounter when trying to build rich client applications.

If you're familiar with Python/Django but unfamiliar with Javascript frameworks, you'll get some useful ideas and examples on how to start integrating the two. If you're a Backbone guru but not comfortable working on the server, you'll learn how the MVC concepts you know from Backbone can translate to building a Django application."


Tim HopperSundry Links for November 3, 2014

Public Data Sets: Amazon Web Services: Amazon hosts a number of public datasets on AWS (including the common crawl corpus and the "Marvel Universe Social Graph").

Rapid Web Prototyping with Lightweight Tools: I've shared this before, but my boss Andrew did a fantastic tutorial last year on Flask, Jinja2, MongoDB, and Twitter Bootstrap. Combined with Heroku, it's surprisingly easy to get a website running these days.

rest_toolkit: REST has been my obsession of late. Here's a little Python package for quickly writing RESTful APIs.

The Joys of the Craft: A quote from Fred Brooks' The Mythical Man-Month on why programming is fun.

How do I use pushd and popd commands?: I recently learned bash has push and popd commands for temporarily changing directories. This is very handy for scripting.

Tim HopperSundry Links for November 1, 2014

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!): I guess the title says it all. By Joel Spolsky.

Unix Shells - Hyperpolyglot: Very cool comparison of basic command syntax in Bash, Fish, Ksh, Tcsh, and Zsh.

Better Bash history: I'm pretty stuck on Bash at the moment. Here's a way to get a better history in Bash. (Other shells often improve on Bash's history.)

usaddress 0.1: I always love seeing a Python library for something I've tried to do poorly on my own: "usaddress is a python library for parsing unstructured address strings into address components, using advanced NLP methods."

more-itertools: A great extension to the helpful itertools module in Python. Some particularly helpful functions: chunked, first, peekable, and take. Unfortunately, it doesn't have Python 3 support at the moment.

Tim HopperPyspark's AggregateByKey Method

I can't find a (py)Spark aggregateByKey example anywhere, so I made a quick one.

Tim HopperSundry Links for September 30, 2014

Hammock: A lightweight wrapper around the Python requests module to convert REST APIs into "dead simple programmatic APIs". It's a clever idea. I'll have to play around with it before I can come up with a firm opinion.

pipsi: Pipsi wraps pip and virtualenv to allow you to install Python command line utilities without polluting your global environment.

Writing a Command-Line Tool in Python: Speaking of Python command line utilities, here's a little post from Vincent Driessen on writing them.

Iterables vs. Iterators vs. Generators: Vincent has been on a roll lately. He also wrote this "little pocket reference on iterables, iterators and generators" in Python.

Design for Continuous Experimentation: Talk and Slides: I didn't watch the lecture, but Dan McKinley's slides on web experimentation are excellent.

Apache Spark: A Delight for Developers: I've been playing with PySpark lately, and it really is fun.

Caktus GroupCelery in Production

(Thanks to Mark Lavin for significant contributions to this post.)

In a previous post, we introduced using Celery to schedule tasks.

In this post, we address things you might need to consider when planning how to deploy Celery in production.

At Caktus, we've made use of Celery in a number of projects ranging from simple tasks to send emails or create image thumbnails out of band to complex workflows to catalog and process large (10+ Gb) files for encryption and remote archival and retrieval. Celery has a number of advanced features (task chains, task routing, auto-scaling) to fit most task workflow needs.

Simple Setup

A simple Celery stack would contain a single queue and a single worker which processes all of the tasks as well as schedules any periodic tasks. Running the worker would be done with:

    python manage.py celery worker -B

This assumes the django-celery integration, but there are plenty of docs on running the worker (locally as well as daemonized). We typically use supervisord, for which there is an example configuration, but init.d, upstart, runit, or god are all viable alternatives.

The -B option runs the scheduler for any periodic tasks. It can also be run as its own process. See starting-the-scheduler.

We use RabbitMQ as the broker, and in this simple stack we would store the results in our Django database or simply ignore all of the results.

Large Setup

In a large setup we would make a few changes. Here we would use multiple queues so that we can prioritize tasks, and for each queue, we would have a dedicated worker running with the appropriate level of concurrency. The docs have more information on task routing.

The beat process would also be broken out into its own process.

    # Default queue
    python manage.py celery worker -Q celery
    # High priority queue. 10 workers
    python manage.py celery worker -Q high -c 10
    # Low priority queue. 2 workers
    python manage.py celery worker -Q low -c 2
    # Beat process
    python manage.py celery beat

Note that high and low are just names for our queues, and don't have any implicit meaning to Celery. We allow the high queue to use more resources by giving it a higher concurrency setting.

Again, supervisor would manage the daemonization and group the processes so that they can all be restarted together. RabbitMQ is still the broker of choice. With the additional task throughput, the task results would be stored in something with high write speed: Memcached or Redis. If needed, these worker processes can be moved to separate servers, but they would have a shared broker and results store.

Scaling Features

Creating additional workers isn't free. The default concurrency uses a new process for each worker and creates a worker per CPU. Pushing the concurrency far above the number of CPUs can quickly pin the memory and CPU resources on the server.

For I/O heavy tasks, you can dedicate workers using either the gevent or eventlet pools rather than new processes. These can have a lower memory footprint with greater concurrency but are both based on greenlets and cooperative multi-tasking. If there is a library which is not properly patched or greenlet safe, it can block all tasks.

There are some notes on using eventlet, though we have primarily used gevent. Not all of the features are available on all of the pools (time limits, auto-scaling, built-in rate limiting). Previously gevent seemed to be the better supported secondary pool, but eventlet seems to have closed that gap or surpassed it.

The process and gevent pools can also auto-scale. It is less relevant for the gevent pool since the greenlets are much lighter weight. As noted in the docs, you can implement your own subclass of the Autoscaler to adjust how/when workers are added or removed from the pool.

Common Patterns

Task state and coordination is a complex problem. There are no magic solutions whether you are using Celery or your own task framework. The Celery docs have some good best practices which have served us well.

Tasks must assert the state they expect when they are picked up by the worker. You won't know how much time has passed since the original task was queued and when it executes. Another similar task might have already carried out the operation if there is a backlog.

We make use of a shared cache (Memcache/Redis) to implement task locks or rate limits. This is typically done via a decorator on the task. One example is given in the docs though it is not written as a decorator.
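One minimal sketch of such a decorator, using the cache's atomic add() to take a lock. Here a small in-memory stand-in replaces the real Django cache so the sketch is self-contained; in production you would use django.core.cache.cache backed by Memcached or Redis, and the task name in the lock key would come from your task:

```python
import functools

class FakeCache:
    # Stand-in for django.core.cache.cache; add() is atomic in real backends.
    def __init__(self):
        self._store = {}
    def add(self, key, value, timeout=None):
        # Returns True only if the key was not already present
        if key in self._store:
            return False
        self._store[key] = value
        return True
    def delete(self, key):
        self._store.pop(key, None)

cache = FakeCache()

def single_instance_task(timeout=60 * 5):
    # Skip execution if another instance of the task already holds the lock.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            lock_id = 'lock-%s' % func.__name__
            if not cache.add(lock_id, 'true', timeout):
                return None  # another worker holds the lock; skip this run
            try:
                return func(*args, **kwargs)
            finally:
                cache.delete(lock_id)
        return wrapper
    return decorator

@single_instance_task()
def rebuild_search_index():
    return 'rebuilt'
```

The same shape works as a rate limit if you skip the delete() and let the cache key expire on its own.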

Key Choices

When getting started with Celery you must make two main choices:

  • Broker
  • Result store

The broker manages pending tasks, while the result store stores the results of completed tasks.

There is a comparison of the various brokers in the docs.

As previously noted, we use RabbitMQ almost exclusively, though we have used Redis successfully and experimented with SQS. We prefer RabbitMQ because Celery's message passing style and much of the terminology was written with AMQP in mind. There are no caveats with RabbitMQ like there are with Redis, SQS, or the other brokers which have to emulate AMQP features.

The major caveat with both Redis and SQS is the lack of built-in late acknowledgment, which requires a visibility timeout setting. This can be important when you have long running tasks. See acks-late-vs-retry.
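For illustration (setting names from Celery 3.x; the timeout value is arbitrary), pairing late acknowledgment with a visibility timeout longer than your slowest task might look like:

```python
# Sketch only -- tune the timeout to your own workload.
CELERY_ACKS_LATE = True   # acknowledge after the task runs, not before
# Redis and SQS re-deliver tasks that are not acknowledged within the
# visibility timeout, so it must exceed your longest task's runtime.
BROKER_TRANSPORT_OPTIONS = {"visibility_timeout": 7200}  # seconds
```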

To configure the broker, use BROKER_URL.

For the result store, you will need some kind of database. A SQL database can work fine, but using a key-value store can help take the load off of the database, as well as provide easier expiration of old results which are no longer needed. Many people choose to use Redis because it makes a great result store, a great cache server and a solid broker. AMQP backends like RabbitMQ are terrible result stores and should never be used for that, even though Celery supports it.

Results that are not needed should be ignored, using CELERY_IGNORE_RESULT or Task.ignore_result.

To configure the result store, use CELERY_RESULT_BACKEND.
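As a sketch, the two choices come together as a pair of settings (the URLs are placeholders for your own infrastructure; Celery 3.x setting names):

```python
# celeryconfig.py -- placeholder URLs
BROKER_URL = "amqp://guest:guest@localhost:5672//"   # RabbitMQ broker
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"   # key-value result store
```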

RabbitMQ in production

When using RabbitMQ in production, one thing you'll want to consider is memory usage.

With its default settings, RabbitMQ will use up to 40% of the system memory before it begins to throttle, and even then can use much more memory. If RabbitMQ is sharing the system with other services, or you are running multiple RabbitMQ instances, you'll want to change those settings. Read the linked page for details.

Transactions and Django

You should be aware that Django's default handling of transactions can be different depending on whether your code is running in a web request or not. Furthermore, Django's transaction handling changed significantly between versions 1.5 and 1.6. There's not room here to go into detail, but you should review the documentation of transaction handling in your version of Django, and consider carefully how it might affect your tasks.


There are multiple tools available for keeping track of your queues and tasks. I suggest you try some and see which work best for you.


When going to production with your site that uses Celery, there are a number of decisions to be made that could be glossed over during development. In this post, we've tried to review some of the decisions that need to be thought about, and some factors that should be considered.

Tim Hopper: iOS's Launch Center Pro, Auphonic, and RESTful APIs

Lately I've been using Auphonic's web service for automating audio post-production and distribution. You can provide Auphonic with an audio file (via Dropbox, FTP, web upload, and more), and it will perform any number of tasks for you, including

  • Tag the track with metadata (including chapter markings)
  • Intelligently adjust levels
  • Normalize loudness
  • Reduce background noise and hums
  • Encode the audio in numerous formats
  • Export the final production to a number of services (including Dropbox, FTP, and Soundcloud)

I am very pleased with Auphonic's product, and it's replaced a lot of post-processing tools I tediously hacked together with Hazel, Bash, and Python.

Among its many other features, Auphonic has a robust RESTful API available to all users. I routinely create Auphonic productions that vary only in basic metadata, and I have started using this API to automate creation of productions from my iPhone.

Launch Center Pro is a customizable iOS app that can trigger all kinds of actions in other apps. You can also create input forms in LCP and send the data from them elsewhere. I created a LCP action with a form for entering title, artist, album, and track metadata that will eventually end up in a new Auphonic production.

The LCP action settings look like this2:

When I launch that action in LCP, I get four prompts like this:

After I fill out the four text fields, LCP uses the x-callback URL I defined to send that data to Pythonista, a masterful "integrated development environment for writing Python scripts on iOS."

In Pythonista, I have a script called New Production. LCP passes the four metadata fields I entered as sys.argv variables to my Python script. The Python script adds these variables to a metadata dictionary that it then POSTs to the Auphonic API using the Python requests library. After briefly displaying the output from the Auphonic API, Pythonista returns me to LCP.

Here's my Pythonista script1:


import sys
import requests
import webbrowser
import time
import json

# Fill in your own Auphonic credentials (see footnote 1)
USERNAME = "your-username"
PASSWORD = "your-password"

# Read input from LCP
title = sys.argv[1]
artist = sys.argv[2]
album = sys.argv[3]
track = sys.argv[4]

d = {
    "metadata": {
        "title": title,
        "artist": artist,
        "album": album,
        "track": track
    }
}

# POST production to Auphonic API
r = requests.post("https://auphonic.com/api/productions.json",
                  auth=(USERNAME, PASSWORD),
                  data=json.dumps(d),
                  headers={"content-type": "application/json"}).json()

# Display API response
print "Response:", r["status_code"]
print "Error:", r["error_message"]
for key, value in r.get("data", {}).get("metadata", {}).iteritems():
    if value:
        print key, ':', value

# Return to LCP
time.sleep(3)
webbrowser.open("launchpro://")

After firing my LCP action, I can log into my Auphonic account and see an incomplete3 production with the metadata I entered!

While I just specified some basic metadata with the API, Auphonic allows every parameter that can be set in the web client to be configured through the API. For example, you can specify exactly which output files you want Auphonic to create, or create a production using one of your presets. These details just need to be added to the d dictionary in the script above. Moreover, this same type of setup could be used with any RESTful API, not just Auphonic.
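For instance, here is a hypothetical extension of the d payload from the script above; the key names reflect my reading of the Auphonic API documentation, and the preset UUID is a placeholder:

```python
# Hypothetical extension of the production payload -- verify the key
# names against the Auphonic API docs before relying on them.
d = {
    "metadata": {"title": "Episode 1", "artist": "Tim Hopper"},
    "output_files": [{"format": "mp3", "bitrate": "96"}],
    "preset": "YOUR-PRESET-UUID",   # placeholder
}
```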

  1. If you want to use this script, you'll have to provide your own Auphonic username and password. 

  2. Here is that x-callback URL if you want to copy it: pythonista://{{New Production}}?action=run&args={{"[prompt:Title]" "[prompt:Artist]" "[prompt:Album]" "[prompt:Track]"}} 

  3. It doesn't have an audio file and hasn't been submitted. 

Tim Hopper: Sundry Links for September 25, 2014

Philosophy of Statistics (Stanford Encyclopedia of Philosophy): I suspect that a lot of the Bayesian vs Frequentist debates ignore the important epistemological underpinnings of statistics. I haven’t finished reading this yet, but I wonder if it might help.

Connect Sunlight Foundation to anything: “The Sunlight Foundation is a nonpartisan non-profit organization that uses the power of the Internet to catalyze greater U.S. Government openness and transparency.” They now have an IFTTT channel. Get push notifications when the president signs a bill!

furbo.org · The Terminal: Craig Hockenberry wrote a massive post on how he uses the Terminal on OS X for fun and profit. You will learn things.

A sneak peek at Camera+ 6… manual controls are coming soon to you! : I’ve been a Camera+ user on iOS for a long time. The new version coming out soon is very exciting.

GitHut - Programming Languages and GitHub: A very clever visualization of various languages represented on Github and of the properties of their respective repositories.

Og Maciel: Books

Woke up this morning and, as usual, sat down to read the Books section of The New York Times while drinking my coffee. This has become sort of a ‘tradition’ for me and because of it I have been able to learn about many interesting books, some of which I would not have found out about on my own. I also ‘blame’ this activity for turning my nightstand into a mini-library of its own.

Currently I have the following books waiting for me:

Anyhow, while drinking my coffee this morning I realized just how much I enjoy reading and (what I like to call) catching up: revisiting the books I read when I was younger but took for granted, and finally getting to those that have been so patiently waiting for me. And now, whenever I’m not working or with my kids, you can bet your bottom dollar that you’ll find me somewhere outside (when the mosquitos are not buzzing about the yard) or cozily nestled with a book (or two) somewhere quiet around the house.

Book Queue

But to the point of this story, today I realized that, if I could go back in time (which reminds me, I should probably add “The Time Machine” to my list) to the days when I was looking to buy a house, I would have done two things differently:

  1. wired the entire house so that every room would have a couple of ethernet ports;
  2. chosen a house with a large-ish room and added wall-to-wall bookcases, like you see in those movies where a well-off person takes their guests into their private library for tea and biscuits.

I realize that I can’t change the past, and I also realize that perhaps it is a good thing that I took my book reading for granted during my high school and university years… I don’t think I would have enjoyed reading “Dandelion Wine” or “Mrs. Dalloway” as much back then as I did when I finally read them. I guess reading books is very much like the process of making good wine… with age and experience, the reader, not the book, develops the maturity and ability to properly savor a good story.

Tim Hopper: Sundry Links for September 20, 2014

Open Sourcing a Python Project the Right Way: Great stuff that should be taught in school: “Most Python developers have written at least one tool, script, library or framework that others would find useful. My goal in this article is to make the process of open-sourcing existing Python code as clear and painless as possible.”

elasticsearch/elasticsearch-dsl-py: Elasticsearch is an incredible datastore. Unfortunately, its JSON-based query language is tedious, at best. Here’s a nice higher-level Python DSL being developed for it. It’s great!

Equipment Guide — The Podcasting Handbook: Dan Benjamin of 5by5 podcasting fame is writing a book on podcasting. Here’s his brief equipment guide.

bachya/pinpress: Aaron Bach put together a neat Ruby script that he uses to generate his link posts. This is similar to but better than my sundry tool.

Markdown Resume Builder: I haven’t tried this yet, but I like the idea: a Markdown based resume format that can be converted into HTML or PDF.

Git - Tips and Tricks: Enabling autocomplete in Git is something I should have done long ago.

Apache Storm Design Pattern—Micro Batching: Micro batching is a valuable tool when doing stream processing. Horton Works put up a helpful post outlining three ways of doing it.

Caktus Group: Improving Infant and Maternal Health in Rwanda and Zambia with RapidSMS

Image courtesy of UNICEF, the funders of this project.

I have had the good fortune of working internationally on mobile health applications due to Caktus' focus on public health. Our public health work often uses RapidSMS, a free and open-source Django powered framework for dynamic data collection, logistics coordination and communication, leveraging basic short message service (SMS) mobile phone technology. I was able to work on two separate projects tracking data related to the 1000 days between a woman’s pregnancy and the child’s second birthday. Monitoring mothers and children during this time frame is critical as there are many factors that, when monitored properly, can decrease the mortality rates for both mother and child. Both of these projects presented interesting challenges and resulted in a number of takeaways worth further discussion.


The first trip took me to Lusaka, the capital of Zambia, to work on Saving Mothers Giving Life (SMGL), which is administered by the Zambia Center for Applied Health Research and Development (ZCAHRD) office. The ZCAHRD office had recently finished a pilot phase, resulting in a number of additional requirements to implement before expanding the project. In addition to feature development and bug fixes, training a local developer was on the docket.

SMGL collects maternal and fetal/child data via SMS text messages.  When an SMS is received by the application, the message is parsed and routed for additional processing based on matching known keywords. For example, I could have a BirthHandler KeywordHandler that allows the application to track new births. Any message that begins with the keyword birth would be further processed by BirthHandler. KeywordHandlers must have, at a minimum, a defined keyword, help and handler functionality:

from rapidsms.contrib.handlers import KeywordHandler

class BirthHandler(KeywordHandler): 
    keyword = "birth"

    def help(self): 
        self.respond("Send BIRTH BOY or BIRTH GIRL.") 

    def handle(self, text): 
        if text.upper() == "BOY": 
            self.respond("A boy was born!") 
        elif text.upper() == "GIRL":
            self.respond("A girl was born!")
        else:
            self.help()

An example session:

 > birth 
 < Send BIRTH BOY or BIRTH GIRL. 
 > birth boy 
 < A boy was born! 
 > birth girl
 < A girl was born!
 > birth pizza
 < Send BIRTH BOY or BIRTH GIRL.

New Keyword Handlers

The new syphilis keyword handler would allow clinicians to track a mother’s testing and treatment data. For our handler, a user supplies the SYP keyword, mother id, the date of the visit followed by the test result indicator or shot series and an optional next shot date:


To record a positive syphilis test result on January 1, 2013 for mother #1 with a next shot date of January 2, 2013, the following SMS would be sent:

  SYP 1 01 01 2013 P 02 01 2013

With these records in hand, the system’s periodic reminder application will send notifications to remind patients of their next visit. Similar functionality exists for tracking pre- and post-natal visits.
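A sketch of how such a handler might pull those fields apart; the field layout is inferred from the example message above, not taken from the production parser:

```python
import datetime

def parse_syp(text):
    """Parse '<mother_id> <dd> <mm> <yyyy> <result> [<dd> <mm> <yyyy>]'
    after the SYP keyword has been stripped by the handler framework."""
    tokens = text.split()
    mother_id = tokens[0]
    visit_date = datetime.date(int(tokens[3]), int(tokens[2]), int(tokens[1]))
    result = tokens[4].upper()          # e.g. P for a positive test
    next_shot_date = None
    if len(tokens) >= 8:                # optional next shot date
        next_shot_date = datetime.date(
            int(tokens[7]), int(tokens[6]), int(tokens[5]))
    return mother_id, visit_date, result, next_shot_date
```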

The other major feature implemented for this phase was a referral workflow.  It is critical for personnel at facilities ranging from the rural village to the district hospital to be aware of incoming patients with both emergent and non-emergent needs, as the reaction to each case differs greatly.  The format for SMGL referrals is as follows:


To refer mother #1 who is bleeding to facility #100 and requires emergency care:

  REFER 1 100 B 1200 EM

Based on the receiving facility, the reason, and the emergent indicator, different people will be notified of the case. Emergent cases require dispatching ambulances, prepping receiving facilities, and other essential steps to increase survivability for the mother and/or child, whereas non-emergent cases may only require clinical workers to be made aware of an inbound patient.


The reporting tools were fairly straightforward: web-based views for each keyword handler that present the data in a filterable, sortable, tabular format. In addition, end users can export the data as a spreadsheet for further analysis. These views give clinicians, researchers, and other stakeholders easily accessible metrics for analyzing the efficacy of the system as a whole.


As mentioned earlier, training a local developer was also a core component of this visit. This person was the office’s jack of all trades for all things technical, from network and systems administration to shuttling around important data on thumb drives. Given his limited exposure to Python, we spent most of the time in country pair programming, talking through the model-view-template architecture and finding bite-sized tasks for him to work through when not pair programming.

Zambia Takeaways:

  • It was relatively straightforward to write one-off views and exporters for the keyword handlers. But as the number of handlers grows, this functionality could benefit from being abstracted into a generic DRY reporting tool.
  • When training, advocate that the participant have either 100% of his time allocated or designated blocks of time drawn up during the day. The ad hoc schedule we worked with was not as fruitful as it could have been, as competing responsibilities often took precedence over actual Django/RapidSMS training.
  • If in Zambia, there are two requisite weekend trips: Victoria Falls and South Luangwa National Park. Visitors to Zambia do themselves a great disservice not to schedule trips to both areas.

Off to Rwanda!

UNICEF recognized that many countries were working on solving the same problem: monitoring patients and capturing data from those all-important first 1000 days. A 1000 Days initiative was put forward, whereby countries would contribute resources and code to a single open source platform that all countries could deploy independently. Evan Wheeler, a UNICEF project manager, contacted Caktus about contributing to this project.

We were tasked with building three RapidSMS components of the 1000 Days architecture: an appointment application, a patient/provider API for storing and accessing records from different backends, and a nutrition monitoring application.  We would flesh out these applications before our in country visit to Kigali, Rwanda. While there, working closely with Evan and our in country counterparts, we would finish the initial versions of these applications as well as orient the local development team to the future direction of the 1000 Days deployment.

rapidsms-appointments allows users to subscribe to a series of appointments based on a timeline of configurable milestones. Appointment reminders are sent out to patients and staff, and there are mechanisms for confirming, rescheduling, and tracking missed/made appointments. The intent of this application was to create an engine for generating keyword handlers based on appointments. Rather than having to write code for each individual timeline-based series (pre- and post-natal mother visits, for example), one could simply configure these through the admin panel. The project overview documentation provides a great entry point.

rapidsms-healthcare obviates the need for countries to track patient/provider data in multiple databases. Many countries utilize third-party datastores, such as OpenMRS, to create a medical records system. With rapidsms-healthcare in 1000 Days, deployments can take advantage of pre-existing patient and provider data by utilizing a default healthcare storage backend, or by creating a custom backend for their existing datastore. Additional applications can then utilize the healthcare API to access patients and providers.

rapidsms-nutrition is an example of such a library.  It will consume patient data from the healthcare API and monitor child growth, generating statistical assessments based on WHO Child Growth Standards. It utilizes the pygrowup library. With this data in hand, it is relatively easy to create useful visualizations with a library such as d3.js.

Rwanda Takeaways

  • Rwanda is gorgeous.  We had an amazing time in Kigali and at Lake Kivu, one of three EXPLODING LAKES in the world.

No report on Africa would be complete without a few pictures...enjoy!!


Tim Hopper: Quickly Converting Python Dict to JSON

Recently, I've spent a lot of time going back and forth between Python dicts and JSON. For some reason, I decided last week that it'd be useful to be able to quickly convert a Python dict to pretty-printed JSON.

I created a TextExpander snippet that takes a Python dict from the clipboard, converts it to JSON, and pastes it.

Here are the details:

#!/usr/bin/env python
import ast
import json
import subprocess

def getClipboardData():
    p = subprocess.Popen(['pbpaste'], stdout=subprocess.PIPE)
    p.wait()
    return p.stdout.read()

# literal_eval safely parses the dict literal from the clipboard,
# unlike a bare eval()
cb = ast.literal_eval(getClipboardData())

print json.dumps(cb, sort_keys=True, indent=4, separators=(',', ': '))

Caktus Group: Q3 Charitable Giving

Our client social impact projects continue here at Caktus, with work presently being done in Libya, Nigeria, Syria, Turkey, Iraq, and the US. But every quarter, we pause to consider the excellent nonprofits that our employees volunteer for and, new this quarter, that they have identified as having a substantive influence on their lives. The following list, given in alphabetical order, represents the employee-nominated nonprofits we are giving to:

Animal Protection Society of Durham

The Animal Protection Society of Durham (APS) is a non-profit organization that has been helping animals in our community since 1970, and has managed the Durham County Animal Shelter since 1990. APS feeds, shelters and provides medical attention for nearly 7,000 stray, surrendered, abandoned, abused and neglected animals annually.

The Carrack

The Carrack is owned and run by the community, for the community, and maintains an indiscriminate open forum that enables local artists to perform and exhibit outside of the constraints of traditional gallery models, giving the artist complete creative freedom.

Scrap Exchange

The Scrap Exchange is a nonprofit creative reuse center in Durham, North Carolina whose mission is to promote creativity and environmental awareness. The Scrap Exchange provides a sustainable supply of high-quality, low-cost materials for artists, educators, parents, and other creative people.

Society for the Prevention of Cruelty to Animals - San Francisco

As the fourth oldest humane society in the U.S. and the founders of the No-Kill movement, the SF SPCA has always been at the forefront of animal welfare. SPCA SF’s animal shelter provides pets for adoption.

Southern Coalition for Social Justice

The Southern Coalition for Social Justice was founded in Durham, North Carolina by a multidisciplinary group, predominantly people of color, who believe that families and communities engaged in social justice struggles need a team of lawyers, social scientists, community organizers and media specialists to support them in their efforts to dismantle structural racism and oppression.