A planet of blogs from our members...

Tim Hopper: ShouldIGetAPhD.com

Last year, I published nine interviews with Internet friends about whether an academically minded, 22-year-old college senior should pursue a Ph.D. Many people have told me the interviews have been helpful to them or that they've emailed them to others.

I decided to make a dedicated website to host the interviews. You can find it at shouldigetaphd.com.

I hope this continues to be a valuable resource. I'd encourage you to share this with anyone you know who is thinking through this question.

Tim Hopper: Sundry Links for December 6, 2014

Sketching as a Tool for Numerical Linear Algebra: A neat paper on sketching algorithms for linear algebra. No, not that kind of sketching. "One first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution for the original problem."
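
A toy illustration of the idea (mine, not from the paper): sketch a tall least-squares problem with a Gaussian random matrix, then solve the much smaller sketched problem.

import numpy as np

# Sketched least squares: compress A and b with a random k x n matrix S,
# then solve the small k x d problem instead of the full n x d one.
np.random.seed(0)
n, d, k = 5000, 10, 200                    # rows, columns, sketch size (k << n)
A = np.random.randn(n, d)
b = A.dot(np.random.randn(d)) + 0.1 * np.random.randn(n)

S = np.random.randn(k, n) / np.sqrt(k)     # Gaussian sketching matrix
x_sketch = np.linalg.lstsq(S.dot(A), S.dot(b), rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_sketch - x_exact))  # small, with high probability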

Maps and the Geospatial Revolution: Coursera is teaching a class in the spring on how geospatial technology has changed our world.

Geoprocessing with Python using Open Source GIS: Speaking of geospatial technology, here are some slides and problems from a class on "Geoprocessing with Python".

How to use the bash command line history: Bash's history can do more than I realized!

A geometric interpretation of the covariance matrix: Superb little post explaining covariance matrices with pictures and geometry.
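
A quick taste of the geometry (my own toy example, not from the post): the eigenvectors of a covariance matrix point along the axes of the point cloud, and the eigenvalues are the variances along those axes.

import numpy as np

# Eigendecomposition of a 2x2 sample covariance matrix: eigenvectors
# give the principal directions, eigenvalues the spread along them.
np.random.seed(0)
X = np.random.randn(1000, 2).dot([[2.0, 0.0], [1.0, 0.5]])  # skewed cloud
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
print(eigvals)   # variance along each principal axis
print(eigvecs)   # columns are the orthogonal principal directions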

Og Maciel: Three Years and Counting!

Making a quick pit stop to mark this milestone in my professional career: today is my 3-year anniversary at Red Hat! Time has certainly flown by and I really cannot believe that it has been three years since I joined this company.

I know it is sort of cliché to say “I cannot believe that it has been this long…” and so on and so forth, but it is so true. Back then I joined a relatively new project with very high ambitions, and the first few months had me swimming way out in the deepest part of the pool, trying to learn all ‘Red Hat things’ and Clojure for the existing automation framework (now we are fully using Python).

I did a lot of swimming for sure, and over the following months, through many long days, weekends of hard work, tears and sweat (you know, your typical life for a Quality Engineer worth his/her salt), I succeeded in wearing many types of hats, going from a Senior Quality Engineer, to a Supervisor of the team, to eventually becoming the Manager for a couple of teams spread over 4 different countries. Am I bragging? Maaaybe a little bit :) but my point is really to highlight a major key factor that made this rapid ascent possible: Red Hat’s work philosophy and culture of rewarding those who work hard and truly embrace the company! Sure, I worked really hard, but I have worked just as hard before in previous places and gotten nowhere really fast! Being recognized and rewarded for your hard work is something new to me, and I owe a great debt of gratitude to those who took the time to acknowledge my efforts and allowed me room to grow within this company!

The best part of being a Red Hatter for 3 years? Being surrounded by an enormous pool of talented, exciting people who not only enjoy what they do, but are always willing to teach you something new, and/or to drop what they’re working on to lend you a helping hand! There is not a single day that I don’t learn something new, and thankfully I don’t see any sign of this trend stopping :) Have I mentioned that I love my teammates too? What a great bunch of guys!!! Getting up early in the morning and walking to my home office (yeah, they let me work remotely too) day in, day out, is never a drag because I just know that there are new things to learn and new adventures and ‘achievements to unlock’ right around the corner.

I am Red Hat!!!

Tim Hopper: Sundry Links for December 4, 2014

How do I draw a pair of buttocks?: Have you ever wondered how to plot a pair of buttocks in Mathematica? Of course you have.

Frequentism and Bayesianism: A Python-driven Primer: Jake Vanderplas wrote a "brief, semi-technical comparison" of frequentist and Bayesian statistical inference using examples in Python.
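
For a flavor of the contrast (a toy example of my own, not taken from the primer): with 60 heads in 100 coin flips, a frequentist reports a point estimate and confidence interval, while a Bayesian reports a posterior distribution over the coin's bias.

from scipy import stats

# 60 heads out of 100 flips.
heads, flips = 60, 100

# Frequentist: maximum-likelihood estimate plus a 95% Wald interval.
p_hat = heads / float(flips)
se = (p_hat * (1 - p_hat) / flips) ** 0.5
print(p_hat, (p_hat - 1.96 * se, p_hat + 1.96 * se))

# Bayesian: a flat Beta(1, 1) prior gives a Beta(61, 41) posterior.
posterior = stats.beta(heads + 1, flips - heads + 1)
print(posterior.mean(), posterior.interval(0.95))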

skll: Dan Blanchard released version 1.0 of his very cool command line tool for doing experiments with scikit-learn.

Personalized Recommendations at Etsy: A fantastic post from Etsy's engineering blog on building scalable, personalized recommendations using linear algebra and locality-sensitive hashing. I like math.
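
The core trick in miniature (my own sketch, not Etsy's code) is random-hyperplane hashing for cosine similarity: vectors pointing in similar directions get similar bit signatures, so likely neighbors land in the same hash bucket.

import numpy as np

# Random-hyperplane LSH: each bit records which side of a random
# hyperplane a vector falls on; similar vectors agree on most bits.
np.random.seed(0)
d, n_bits = 50, 16
planes = np.random.randn(n_bits, d)   # one random hyperplane per bit

def signature(v):
    return tuple(planes.dot(v) > 0)

a = np.random.randn(d)
b = a + 0.05 * np.random.randn(d)     # a near-duplicate of a
c = np.random.randn(d)                # an unrelated item
print(sum(x == y for x, y in zip(signature(a), signature(b))))  # close to 16
print(sum(x == y for x, y in zip(signature(a), signature(c))))  # around 8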

Pythonic Clojure: Andrew Montalenti wrote a post analyzing Clojure from a Python programmer's perspective. It's great.

Caktus Group: Caktus Hosts Lightweight Django Book Launch with Girl Develop It

With Girl Develop It RDU, we celebrated the launch of Lightweight Django (O'Reilly) with the authors, Caktus Technical Director Mark Lavin and Caktus alum Julia Elman. Sylvia Richardson of Girl Develop It MCed. The event was open to the public and so popular we kept recounting the RSVPs and fretting over the fire code. But, phew, we were good. In attendance were friends, family, fellow Cakti, and Django fans from around the Triangle.

Festivities included lots of Mediterranean food, some of Mark's favorite beers, raffled-off gift bags filled with Caktus goodies, and, for those in the first two rows, free copies of Lightweight Django. The main attraction, of course, was hearing Mark and Julia speak. In response to audience questions, they both emphasized code quality with Mark highlighting the importance of iteration in any process. You can read more about their very strong opinions in Lightweight Django. (See what I did there? I totally just encouraged you to buy it. Go buy it!) They also spoke about the ups and downs of writing, the doubts, and the ways they egged each other on. There too was the challenge of working on client projects full time, writing the rest of the time, having young children, and, very occasionally, sleeping. (Note: Mark also was training for and participating in triathlons during this time, a fact I cannot fully comprehend.)

Congratulations again to Mark and Julia for their great achievement. Also, many thanks to Sylvia for her smooth MC'ing skills, Girl Develop It RDU for co-hosting, O'Reilly Media for the books, and everyone who braved the traffic to get here.

Mark signing Lightweight Django at Caktus Group

Lightweight Django launch audience at Caktus

Joe Gregorio: wsgicollection | BitWorking


The idea of RESTful "Collections", i.e. doing CRUD over HTTP correctly, has been percolating for years now. A Collection is nothing more than a list, a container for resources. While the Atom Publishing Protocol (APP) defines a Collection in terms of Atom Feed and Entry documents, we don't have to be limited to that. It's time to complete a virtuous circle: RESTLog inspired the Atom Publishing Protocol, which inspired David Heinemeier Hansson's World of Resources (pdf), and now it's time to come full circle and get that world of resources in Python.

In particular, look at page 18 of that slide deck where, when dispatching to a collection of people, the following URIs are handled:

  GET    /people         
  POST   /people
  GET    /people/1
  PUT    /people/1
  DELETE /people/1
  GET    /people;new
  GET    /people/1;edit

Now the 'new' and 'edit' URIs can be a bit ambiguous, only in the sense that you might not guess right away that they are nouns (and remember, URIs always identify nouns). I prefer to make their noun-ishness more apparent.

  GET    /people;create_form
  GET    /people/1;edit_form

In general, using the notation of Selector, we are looking at URIs of the form:

 /...people/[{id}][;{noun}] 

And dispatching requests to URIs of that form to functions with nice names:

  GET    /people               list()
  POST   /people               create()
  GET    /people/1             retrieve()
  PUT    /people/1             update()
  DELETE /people/1             delete()
  GET    /people;create_form   get_create_form()
  GET    /people/1;edit_form   get_edit_form()
  

Introducing wsgicollection, a Python library that does just that, simplifying the implementation of such a Collection under WSGI.

Wsgicollection uses Selector indirectly, relying on it to parse the URIs for {id} and {noun}. In theory it will work with any WSGI middleware that sets values for 'id' and 'noun' in environ['selector.vars'] or environ['wsgiorg.routing_args']. Here is how you would define a WSGI application that implements a collection:

from wsgicollection import Collection

class RecipeCollection(Collection):

    # GET /cookbook/
    def list(self, environ, start_response):
        pass

    # POST /cookbook/
    def create(self, environ, start_response):
        pass

    # GET /cookbook/1
    def retrieve(self, environ, start_response):
        pass

    # PUT /cookbook/1
    def update(self, environ, start_response):
        pass

    # DELETE /cookbook/1
    def delete(self, environ, start_response):
        pass

    # GET /cookbook/;create_form
    def get_create_form(self, environ, start_response):
        pass

    # POST /cookbook/1;comment_form
    def post_comment_form(self, environ, start_response):
        pass

And this class can be easily hooked up to Selector:

import selector

urls = selector.Selector()

urls.add('/cookbook/[{id}][;{noun}]', _ANY_=RecipeCollection())
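
For intuition, here is a minimal sketch of the dispatch idea (my own illustration, not the actual wsgicollection source): pick a method name from REQUEST_METHOD plus the parsed {id}/{noun} that Selector leaves in the environ, matching the table above.

class SketchCollection(object):
    # A toy dispatcher, not wsgicollection itself; the class name
    # and 405 handling are my own choices.
    def __call__(self, environ, start_response):
        vars_ = environ.get('selector.vars', {})
        id_, noun = vars_.get('id'), vars_.get('noun')
        method = environ['REQUEST_METHOD']
        if noun:
            name = '%s_%s' % (method.lower(), noun)   # e.g. get_create_form
        elif id_ is None:
            name = {'GET': 'list', 'POST': 'create'}.get(method)
        else:
            name = {'GET': 'retrieve', 'PUT': 'update',
                    'DELETE': 'delete'}.get(method)
        handler = getattr(self, name, None) if name else None
        if handler is None:
            start_response('405 Method Not Allowed',
                           [('Content-Type', 'text/plain')])
            return ['405 Method Not Allowed']
        return handler(environ, start_response)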

Now that I have this Collection class, implementing the APP will be easier; but as I indicated earlier, the collection (CRUD) model goes beyond just Atom, and we'll dig into that next.

You can find the code here.

Update: Fixed a bug where wsgicollection directly imported selector, which it does not need to do. You will, however, need selector installed to run the unit tests.

Update 2: Updated to support routing_args

2006-09-28

Joe Gregorio: Confession of an Infinite Looper | BitWorking


I admit it. I'm a looper. I will load up a single track on CD or mp3 and put it on infinite loop. The same song. Over and over. For possibly days at a time. I know I'm not the only one, so fess up if you're a looper too.

Some of my favorites for looping:

  • Ozzy Osbourne - Crazy Train
  • The Presidents of the United States of America - Feather Pluckn
  • Iron Maiden - Childhood's End
  • Talking Heads - Once In A Lifetime

Today's post is brought to you by Warrant's "Down Boys", which I've been looping since Monday...

I saw "Lost in Translation" last Friday and have been playing Jesus & Mary Chain's "Just Like Honey" ever since.

Posted by Larry O'Brien on 2003-10-03

I am a looper too although I usually loop for only a few hours playing songs like:

Alphaville - Forever Young
Vivaldi - Spring
Eagles - Hotel California
Emotions - Best of My Love
Isley Brothers - Shout
James Brown - Play that funky music
Lee Soo Young - Final Fantasy X Theme
Madonna - Ray of Light
Norah Jones - Don't Know Why
Savage Garden - I Knew I Loved You
Vanessa Williams - Colors of the Wind

I think the wide variety is due to my being moody. :-)

Posted by Don Park on 2003-10-03

I keep looping Tori Amos and Nirvana Unplugged to death - a winning combination for me - still strong after a few years...

Posted by Alek on 2003-10-04

Songs I've looped for several days:
Red House Painters - Revelation Big Sur
Alanis Morissette - No Pressure Over Cappuccino
Everything But The Girl - Driving (cover)
Jeff Buckley - Last Goodbye
Massive Attack - Protection
They Might Be Giants - Where Do They Make Balloons
(etc)

As for whole CDs I have looped for days or even weeks, the list goes on. I "learn" entire CDs at a time. For the most part, I wake up with a song in my head and have to scramble to put it on so that it doesn't eat up my brain for the rest of the day.

In the past few months I consistently loop through all my Pizzicato Five tracks on iTunes (8 hours' worth) to stay awake. It's pretty hard to feel down when you're listening to J-pop. :)

Posted by steph on 2003-10-06

My wife happened to look over my shoulder when I was reading this.  "Hah! See?!? I'm not the only one."  She's more of a James Taylor and Van Morrison looper (as classics go).  Otherwise, it's every new album.  Right now, it's Dido and Ben Harper, with REM likely on the horizon.

This also reminds me of a guy I knew in college who was really into the various RPGs.  In one (Rifts maybe), he had created a character who was some sort of techno-enhanced soldier.  The background story was that in the heat of a battle, a direct hit caused his music device to get fused to the jack in his head, also breaking the device so it could not be turned off.  One song played over and over and over: "Rooster" by Alice in Chains.

There was no point to the story.  The post just reminded me of it.

Posted by Seairth on 2003-10-08

I tend to play Meat Beat Manifesto's "It's the music" over and over again when driving my car. Other favs are "Circle Jack (Chase the magic word lego-lego)" by Melt Banana; "The Empty Page" by Sonic Youth; come to think of it, I'm a pretty damn looper myself...

Posted by Adriaan on 2003-10-09

2003-10-03

Joe Gregorio: ClearCase as a leading indicator of small technology company failure | BitWorking


Is it just me or is ClearCase a leading indicator of small technology company death? I've never used the product, never even seen a demo, and yes, I know, they position themselves as software asset management for medium to large teams. The question comes from the fact that the only people I know who have used ClearCase used it in the past, all at small technology companies, and those companies, without exception, have gone out of business.

So fill up the comments with your experiences with ClearCase, good or bad, hopefully including someone from at least one small technology company that has succeeded in spite of deploying ClearCase.

Update: In case you are wondering what SCM you should use, consult kittenfight, where subversion beats ClearCase, and so does BitKeeper.

I can't speak for ClearCase, but I worked for a small tech company that nearly destroyed its market share using Rational's "Unified Process." We didn't recover until we dumped all the weight and moved to something more XPish. Rational is where ClearCase originated.

Posted by petrilli on 2004-08-29

Yes, seen it happen at a .com.  They tried to get clearcase (and the paradise of clearquest integration) going for years, while the dev team plodded along using visual source safe (horrible, but it worked...  as long as you don't try to branch.)  The company was delisted, almost went bankrupt, then sold itself for pennies a share.  Then again, it's hard to blame clearcase when the place was offering free overnight shipping on every order.

An ever surer sign of impending doom: the email that comes out from some new vice president you've never heard of, announcing the deal just signed with $MAJOR_VENDOR to implement AN ERP SYSTEM.  When you see this, you are already dead.

Posted by steve minutillo on 2004-08-30

The thing that always amazed me about ClearCase was the sheer incompetence of their Web interface.  I have never seen an app break THAT many usability rules.  It was a nightmare.

I thought the software was pretty good at the core, but that Web interface was (and may still be, for all I know), utterly laughable.

Posted by Deane on 2004-08-30

Actually, Clearcase originated at Atria and was designed by defectors from Apollo Computers back in the late '80s.  I was supporting Apollo dev tools back then when HP bought and dissected the company, and I was one of the original tech supporters of Clearcase.  Even caught one of Paul Levine's marketing talks for 1.0 beta.

Last I used it was 5 years ago with the Chandra X-Ray Observatory project, which worked well since that project was fractured among many companies and locations around the country and had a large diverse code base.  But after 5 years in the eCompany space I'm not sure who uses it anymore or why.

Posted by chris burleson on 2004-08-30

Now you've probably gone and ruined my Pagerank for "Clearcase sucks".  I wrote down all my grudges against The Great Satan of Version Control (ooohhh, I like that) here: http://www.trmk.org/~adam/blog/archive/000106.html

Posted by Adam Keys on 2004-08-30

Clearcase is neat in a large, multi-team environment, but yes, I can see why it would be the death of a small company.

The worst thing about Clearcase is Rational's attempts to integrate it with the rest of their product range, ClearQuest in particular. The best thing is the ability of a central SCM guy to keep many teams' worth of source code under some kind of control.

Posted by Alan Green on 2004-08-30

Whoever said that ClearCase has the worst web interface ever hasn't used ClearQuest. We had a vendor who required us to use their ClearQuest server to handle tickets during QA and it was a painful effort. FogBUGZ has some issues, but it's at least quick and easy.

Posted by Bill Brown on 2004-08-30

Hi,
I used Clearcase some years ago when I was working for a very large IT company. They could afford to have 2 Clearcase admins.

For smaller companies Clearcase is just too complicated and requires too many resources.

It has lots of esoteric features, and you probably have some problem with your process if you really need them :]

Markus

Posted by anonymous on 2004-08-31

I know for certain that here in Singapore, Motorola uses ClearCase in some of their projects.

Posted by Deepak on 2004-08-31

I used it once when I joined a company for a short-term contract. The company is still there but I really do not think it will be there next year... if comments are still open next year, I will confirm this ;)

They had an actual team of 5-6 people devoted to managing that beast. 5 experts that touched ClearCase all day long. As a new developer there, it took me 2 full weeks to manage to get a build working on my machine. Some will argue that the build has nothing to do with ClearCase and you are probably right. But the complexity of things like managing branches caused developers to be lazy and almost never do full updates to newer code. Result: Daily broken builds.

This was (is) no small project: 50 developers working on all tiers of an amazingly complex J2EE application built with WebSphere. Before joining this company, I thought I had seen all the nightmares out there. I had not. ClearCase's setup and complexity, plus the code organization, were a huge part of this nightmare.

We used the Windows client. I also remember that for certain operations, we had to login to a unix box and use X to do other specific things (I do not remember what though) not available on the windows client. There were ClearCase admins available 24/7 to support developers that were working around the clock.

Posted by Claude on 2004-09-01

ClearCase is a failure, whether the company is big, medium or small. Projects that use it are incapable of recognising their impending death.

Some managers like to buy things that they don't understand - they think they must be impressive. But sometimes they just don't work, and that's why they don't understand them. Developers suffer.

In theory ClearCase can do some things that CVS (or similar) can't. In practice, you don't need to do those things, and you'll never get that far with ClearCase anyway. Get CVS and enjoy actually being able to change a line of code now and then.

Posted by Murray Cumming on 2004-09-02

Interesting.  My company just paid to have me brought up to speed on ClearCase.  I don't think they will be fading away any time soon considering they have already made it past the century mark.  What blows my mind is that this is only one of four versioning systems they use in different parts of the company.

Disclaimer: I am not a fan of ClearCase.

Posted by Peter on 2004-09-04

Interesting comments on Rational service since Big Blue has taken over: http://www.gripe2ed.com/scoop/story/2004/9/5/14453/96974

Posted by John Beimler on 2004-09-06

Tality Corporation. A spin-off from Cadence Corporation, doing design services.

Clearcase was purchased. SW development tanked. Tality folds.

Pretty clearcut I'd say.

Posted by Ex Tality Employee on 2004-09-08

Well, I used to work for a multinational that used ClearCase on large and small projects. It was VERY expensive, but having since used CVS, as far as I can tell ClearCase is much more powerful. The branching and merging are done properly, and the tagging is also done properly (i.e. you can see what was tagged and when...). It also lets you have distributed versioning databases so that you can work in multiple places without having to depend on a central server overseas somewhere.

The web interface may not have been the best, but the Solaris and Windows interface for merging multiple streams of development hammers CVS into a pulp.

Posted by Ryan on 2004-09-08

"The branching and merging are done properly"

Branching is done per-file, only when a file has actually changed. As far as I can tell that's for ClearCase-specific performance reasons.

So, instead of one person at one time saying "this project has branched" and then just letting people work on that branch, individual people need to branch each file before they change it.

Firstly, they forget; secondly, it's very difficult to make ClearCase do this automatically; thirdly, you have to tell people to make ClearCase do this automatically. But most importantly, someone working on the main branch who has no interest in a second branch will not branch the file, because he doesn't care. And then your build breaks. The CVS solution is simpler and usually what you want.

Posted by Murray Cumming on 2004-09-09

ClearCase is really slow.  Explicit checkouts are painful (but the Vim plugin makes it bearable).

The versioning is file-based, like CVS, instead of changeset-based.  I don't know why people would pay so much money for something so fundamentally flawed.

Posted by Kannan Goundan on 2004-09-14

Having switched to a project at a wireless telco in Atlanta, I have to agree with everything I've read that you folks have said.

Clearcase and Clearquest are both unnecessarily heavy and unusable. Views, dynamic views, no support for generic linux distros, bastardized command line support, the list goes on and on.

The one thing keeping us on it is the fact that we can't seem to make it so subversion or CVS require a defect number from Clearquest to commit. That alone would give us an out.

There is nothing about the software I like. And I've used VSS, Starteam, CVS, Subversion (My favorite now. Atomic commits, Whoot!), and sccs.

Never work for a team using this software if you have a choice. If you don't have a choice, supplement it with a subversion repository so you don't have to check in but every once in a while.

Posted by anonymous on 2004-09-16

I am working on a development project that has 500+ developers in 6 locations worldwide, with 80,000+ files in the system. We use the UCM option in ClearCase. Yes, ClearCase has a lot of annoying bits, and the admin overhead can be a big pain. BUT as an admin/developer I see that a lot of the issues are with developer misuse of the system. They beat at it repeatedly, ignoring a lot of the main concepts, and then shout when it isn't smooth anymore. And if one more developer comments about how it was much better with CVS, I will break a keyboard over their head. AND finally, it can be slow, and the web or WAN options are next to useless.

Posted by Liam on 2004-10-16

Odd comments.

I guess we are a smallish company with 200 developers in our office and 800 developers worldwide working on large enterprise development systems.

We use CC and CQ tied together with UCM.
As well as being a developer I am also the admin for CC/CQ on our project.

I have to say that once you get over the initial shock of the appalling interfaces and the terrible way the products interface with each other (why do they have different logins, neither of which ties into Windows or Unix?), the products do what they claim and rarely need any intervention in my experience.

If you don't use UCM then I can imagine that any company is going to get overwhelmed pretty quickly but if you can live with the restrictions of UCM then I don't see a problem. Merging and versioning is all handled automatically.

I just wish that IBM would rework the products to all be on a common base so that they dovetail together and with the OS, update the interfaces, and improve the performance. (And bring down the crippling price.)

But basically it works for us.

Posted by Rob Scott on 2004-10-19

Let's just clarify, a small company does not have 800 developers. Nor does it have 200 developers.

A small company has less than 10 developers.

Posted by Joe on 2004-10-20

My group has used CC and other systems (primarily for worldwide multisite hardware projects) since the Atria/Rational days.  Our company's worldwide software groups also use CC.

We do have dedicated CC admins, but ours generally do not dictate or enforce configuration management (they prefer to let projects choose their own path to success or ruin). They just keep the multisite syncs running.

Over the years our company has developed a robust release and configuration management method on top of CC. Branches are often lifesavers, and we can reliably and automatically generate coherent releases of our designs. CC is very fast on adequate servers, and ClearMake's DOs and CRs (derived objects and configuration records) are unmatched.

I am amused by some of the comments above and in the links, which apparently don't understand what is required by CC and what might be imposed by a clueless CC admin.

It doesn't bother me that CC implements its own filesystem and that the raw data is in some database; our RAID NAS works the same way.  I'm never going to see the raw data on a disk somewhere anyway.

Given the resource (staff, hardware, etc) requirements of CC, I heartily agree it isn't for the small developer, just as a (insert-name-of-exotic-car) isn't for the average commuter.  IMHO, if you can afford the infrastructure for it and you take the time to learn to use it properly, I haven't seen anything that can beat it.

Posted by L on 2004-10-26

I have been using ClearQuest for about 2 years now. We also have ClearCase. Both are used very widely in our organization. I know ClearQuest and ClearCase have problems, but I haven't found them as bad as described on this page.

We are planning to integrate ClearCase with ClearQuest, and it looks like we will have to make our decision very carefully.

Victor Nadar , Mumbai Udyan

Posted by Victor Nadar - Mumbai Udyan on 2004-11-01

We use both CC and CQ in a company of around 100-150 developers. I don't think it's all that bad - granted, the interfaces are shocking and there's a bit of overhead, but I can't see it being the company-killer people are describing...

That said, after four years of CC, we're investigating other options - I gather one team is now running a CVS repository that they periodically sync with CC, and I'd like to see a move to Subversion in future.

Posted by Simon on 2004-11-09

I work for a company that almost exclusively uses CC. There have been problems with it of course, but nothing major. The features CC provides outweigh the problems (at least in my book).

There is an initial hurdle before one understands the config-spec syntax. This, of course, may be the problem with CC, because if one can't write a correct config spec then stuff will not work as expected.

Someone mentioned problems with branching. This is not something I have noticed. If the config-spec is written correctly all files checked out will automagically be branched out to the desired branch.

Posted by anonymous on 2004-11-09

I've used CC at a number of small and medium sized companies, and it seems to me that most of the negative comments here are based on a misunderstanding of how to use the product.  As with most products, CC has a large number of features available to the user, but that doesn't mean you have to use them.

I have a team of 2 C++ programmers who have just migrated from VSS to Base CC.  They use it in exactly the same way (i.e. single stream, no branching) and have no additional overhead with CC.  What they do have is a far more robust system, better merging etc.

I also have a team of 6 Java programmers who migrated from VSS to CC.  At first, I tried to implement the same process in VSS as I would have done in CC.  I don't think it was even possible!!  Now we use CC UCM and it's a piece of cake.  We have a release stream (branch), a maint stream and a dev stream.  We can easily merge changes between the streams and take full advantage of activities and baselines.

As for admin, there are two of us (of the 8 programmers above) who take care of things.  This basically involves the creation of new dev and maint streams at the end of a release (about 2 hours' work max) - not too much of an overhead.

It seems to me that the companies wishing to use CC should think about engaging someone with CC expertise.  This person should be able to set up procedures relative to their project size.

To answer the interface problems, I believe IBM is planning to build CC into the Eclipse framework.  This should improve things in that area.

Posted by Phil Kelly on 2004-11-11

Phil (or anyone), do you mean a dev stream per user or a shared dev stream?  For the teams that share a dev stream, are you using dynamic views (where checked-in items appear instantly in all other developers' workspaces) or snapshot views (where developers "pull" changes when they're ready to integrate)?

If you use per-developer streams (common in UCM), do you find the process of rebasing, delivering, baselining, and recommending baselines to be cumbersome? Is it clear that per-developer streams is implicitly the "edit-merge-commit" style common to CVS and Subversion, but with all the overhead of branches?  (CC streams are really branches.)

We've recently ditched per-developer streams (unless there is the rare need) and use snapshot views on the mainline stream.  ClearCase's support for snapshot views is a little primitive compared to, say, Subversion but is far simpler than using UCM.

Posted by A CC User on 2004-11-12

I have used ClearCase for the past 2 years.  I was the buildmaster/local ClearCase admin for our development teams.  ClearCase is by far (out of any application I have ever used in general) the most difficult and over-engineered tool I have ever used.  I have spent countless hours troubleshooting it and dealing with our CC admins.  It is so tightly coupled with the OS that starting up a Windows PC when the network was down caused each developer a half-hour startup delay while CC tried to start itself over and over again.  The merge and diff tools are complete wastes of time.  Imagine dealing with a crap tool every day you came into work.  I would not recommend Rational Head Case to anyone.

Posted by scranthdaddy on 2004-12-01

why are all comments tagged as spam?
oh well forget it

Posted by Another CC user on 2004-12-07

clearcase blows.

Posted by spunkboy on 2005-02-11

I was just browsing the internet for some ideas on software development when I stumbled across this page. I've had a read through and I don't think it represents a fair argument.

I started using CC back in '97 when it was still Atria or Pure Atria (can't remember which!) and CQ from its beta days in '98/'99. I was also in my early 20s and didn't know much about config management, so I was just happy to go with the flow. Basically I've just grown up with it, got used to its quirks along the way, and it's the only source control tool I've used professionally. Yes, this makes me a bit biased towards ClearCase, but I'm not protective of it!

I'll also admit that it took me a long time to fully understand its methods and error messages, but at that time the support from Atria was outstanding. Their help desk staff knew their product inside and out. If I had an issue it was usually resolved there and then on the first phone call. If it wasn't, then it really was a problem and needed to be investigated. Then Rational turned up, and ever since then the support has just gone from bad to worse. Now when I call I have to provide ccdoctor reports, event logs and a million other things. This usually means a very long email trail, and I've usually solved the problem before Rational bother to call me back. Now that IBM have taken over, it's even worse. First of all I have to navigate their website. Ha! At my last company we actually had an expert on the IBM website. If you had a problem you talked to him and he'd navigate their site and send you the URL! Anyway...

ClearCase on its own, with no UCM and CQ integration, is a fantastic product. Companies fail with ClearCase implementations for many reasons, and license cost is not usually one of them. For example,

1. The projects themselves are planned badly. I have often come in on a Monday morning to find an email saying that development on a new version started last Friday.... How on earth am I supposed to support a new release when I have no input into its development strategy, have no idea about its lifecycle, and cannot spend half an hour setting up the new CC environment and perhaps closing off the old one! I could go on but I think most of you will have experienced this, whether you are a developer or CC administrator.

2. The CC administrator is usually some poor soul who has no idea about configuration management principles, probably didn't want the job to start with, and has a 1-week IBM/Rational course under his/her belt if they're lucky. Then the unreasonable requests start coming in from the management teams who think ClearCase is no more complicated than Notepad and that this new CC administrator is now a guru! They want projects migrated within the week with no consideration for processes and policies... back to point 1.

3. Branching strategies are the next big failure. I don't mean to offend any UNIX guys, but ClearCase is like a UNIX system: if it can do it, it will do it. It won't complain if it's the wrong way of doing it or suggest a better way. I have seen so many companies using way too many levels of branches that things become unmanageable. A project typically needs 3 or 4 levels of branches - main, development, integration and possibly task-based branches. Again, back to point 1: consultation between admins and managers to understand expectations and plan a strategy.

ClearCase is just a database like any other. Take Oracle for example. If you just slap in a bit of SQL and hope for the best then you might as well use Access. Instead you have to write triggers, procedures, packages and build tools around it. And, as with any database, you have to plan it.

In terms of overhead, ClearCase maintenance should take no more than 3 hours per week. The rest of the time the CC administrator should be writing tools such as build scripts, reporting queries, triggers, automation tools, CM policies/strategies and just generally improving the environment. i.e. things to help developers and keep the boss happy!

I have never had a project fail because of a ClearCase implementation (even with UCM) and I would also expect that those small companies that went under probably would have anyway regardless of the version control tool. Saving the AU$200,000 spent on licensing would probably not have saved the company anyway.

UCM... hmm! First of all this is targeted at managers, especially non-technical ones. They like the fancy pictures and models of their project's development. I was disgusted at IBM/Rational's sales pitch (I heard it last week) because, for a non-technical manager, it's like giving candy to a baby. It just gets gobbled up. I personally think UCM should be wiped from this Earth and that anyone who thinks UCM is the solution to their problems or project should be shot! Having said that, my boss wants the full UCM rollout..... I know I've just been very harsh, but in any event any failure of UCM comes back to points 1, 2 and 3 above. Whether I like UCM or not is irrelevant.

The ClearCase Exploder GUI? Let's just say "bring back ClearCase details"! Why do I need view shortcuts? I'm an administrator and chances are I created most of the 2000 views on the network. I don't need 2000 shortcuts!

I usually work for corporates, oh and by the way, my last job was to support 1500 developers in 8 sites across 6 countries. I was the only CC administrator until I trained someone else up. Total time spent on CC maintenance was probably 1 day a week usually less (assuming no server crashes or catastrophes). Anyway... most of my other jobs had between 10 and 20 developers which is small for a corporate. None of those projects failed because of ClearCase. Most failed because of changing times or crap management. I now work for a small company of 20 staff of which 9 are developers. They are forking out for both CC and CQ licenses at AU$13000 per CC+CQ license. If this company fails because they implemented ClearCase it won't be ClearCase's fault; it will be mine. It will be my fault for not implementing the solution correctly, for not educating the developers properly and overall for being a crap CC & CM administrator.

Problems with CC on Windows? Chances are it's a Microsoft-related issue affecting ClearCase. Buy a UNIX box! Windows Explorer has just locked up because my server has just crashed. Most of my other applications are now locked up too as a result. If I kill Explorer, the other apps spring back to life! CCWeb, ah yes, it runs on a Windows platform and is designed for IE. Apache web server or not, I rest my case.

Finally, ClearCase is not always the right tool for the job. Brings me back to point 1. Good project management means evaluating the available software and making the right choice, whether that's ClearCase, Perforce, Subversion, BitKeeper or even SourceSafe. It also means employing someone with the right skills, enthusiasm and passion for CM, and not just re-training someone to fill an open position. ClearCase is just a tool to assist the CM process. If the CM process is flawed then the tool will fail. Managers and developers will hate it, and projects/companies can fail. Back to point 1.

Well I hope I didn't bore you too much and I hope this goes through in one piece because I spent a long time writing it!

Posted by BB on 2005-02-24

How do I integrate ClearQuest with Visual SourceSafe?

Posted by Roland Ehi on 2005-03-09

To briefly echo the sentiments of spunkboy, a fool with a tool is still a fool. 

You need to know what you want to do with the tool.  If you can do all of those things with another tool, more power to you.

I have made a reputation out of integrating tools that aren't supposedly integrated.  Base ClearCase (without UCM) is great for that because it has a lot of hooks.  Granted, you have to know how to take advantage of it.

On a single OS platform ClearCase doesn't take a dedicated admin, except while you work to set it up, automating your processes and putting in enforcement triggers.  I speak from 10 years' experience on this.

When working cross-platform (PC to Unix), I have found it is a bit of a mess.  But the mess is strictly down to crappy networking support on the PC side, not anything to do with ClearCase.

Posted by vobguy on 2005-04-27

Ouch, poor old ClearCase.
I'm trying to set ClearCase up at the moment, and to be fair, it's a bit of a pain in the arse. My main question is: why the hell do you need ClearCase for 10 or fewer developers?!?!
Seems like you're using a sledgehammer to crack an egg.
I think VSS would probably do, if anything. You just need a decent, workable CM process.

Posted by Dave LR on 2005-05-27

Well, I can see why people with little ClearCase skill would not like the tool. However, I can assure you that in the correct hands a team using ClearCase can manage change far better than a team using CVS or, worse yet, no CM tool at all.

ClearCase isn't bad; ClearCase in unskilled hands is.

Posted by chad on 2005-06-30

I've used CC on four projects now and they've all suffered as a result.

I do believe that CC is fundamentally powerful (in the right hands) BUT the conceptual complexity it brings is overwhelming and outweighs its benefits, IMO. Most developers have neither the time nor the inclination to spend weeks learning how to use the SCM system.

As developers we have enough complexity to worry about. An SCM tool should make working with code easier, not harder.

To me, CC is a fetid putrid piece of stinking dog poop. I hate it deeply.

Posted by Tony on 2005-07-08

Hi Guys,
I was just reading all the comments made by you guys about CC. Well, I've been using CC and CQ for a while now. We started with an old IBM product, CMVC. I know that many had their issues with this one as well, but you would not believe how many companies out there still use it. I must say that I've seen many other products (VSS, PVCS, CVS, MKS and more recently Seapine's software suite), but in some of the concepts CMVC was ahead of its time.
I originally started at a company that originated in Germany. We were acquired (and later sold) several times. When we started migrating to CC/CQ we used many concepts from CMVC (for good or for bad).
Well, overall I would not say that CC is that bad. It has many, many things other tools don't have. But at the same time, I believe that with many tools you can make them usable for the needs they were bought for. You just need to know how.

We've successfully used CC/CQ in a multinational environment utilizing multisite. We've managed to create (with extra effort) a distributed development environment for development around the globe and around the clock.
regards
Ray

Posted by Raymond Masocol on 2005-08-05

Hi,

Judging from the feedback above, ClearCase is an administrator's dream. All the control, all the features, but at the cost of being heavyweight.  To a developer, it represents another complex tool to be overcome in their workday. Hence the schism evident between the developer's viewpoint and the administrator's viewpoint.

This line of discussion raises the question: who is more important to the software development process? The developer or the administration staff?

Regards

Will

Posted by Will Waterson on 2005-09-25

We've used Rational at our small (< 10 person) company for a couple of years. I wouldn't say it's been an unmitigated disaster, but it sure as hell is not worth the money. I think a lot of companies make the decision to go to Rational because Rational claims to have a process (RUP) and they convince you that you can't do RUP without their tools. They also try to convince you that every single person needs a full enterprise suite, which is crap. Companies could save a lot of money by figuring out what a good process for them actually is and THEN deciding what tools fit the bill. The most commonly used tools in the suite (CC and CQ) don't do any better IMHO than freely available ones like Subversion and Bugzilla respectively, and the other ones like Purify/Quantify can be nice, but you can probably share one or two licenses across the entire group.

Posted by Ted on 2005-09-27

Just came across this link today as my company started using ClearCase (suckered into it by a Rational sales team IMHO, as we have successfully been using CVS for the past two years with no problems)!

Quote:
"ClearCase isnt bad, clearcase in unskilled hands is."
If the tool was any good it would be developer-proof! CVS was up and running in less than an hour (I know, a little slow); CC took a week and it's still causing the development team nightmares. We will probably throw it out, go back to CVS, and just let management think we're using it!!!

.... Don't even get me started on RUP - a great process if you throw out the "R" bit of it; UP as put forward in the XP world is great!

Posted by John on 2005-10-03

I fully agree with many of the comments above

1. UCM is bad, very bad. It is all pretty pictures and management-speak. Anybody with a few minutes to spare can use the trigger feature available in base ClearCase to implement a much better, site-specific version of UCM without losing many of the better features of base ClearCase. Do not use UCM!

2. Base ClearCase config specs are incredibly powerful and knowing how to at least read them is important for a developer.

3. If I gave a responsible, apparently intelligent, adult a chainsaw and they cut their leg off would it be my fault? I love the comment that a fool with a tool is still a fool. That is so true. OK, ClearCase may need a bit of thought to set up (it certainly isn't plug 'n' play) but it is worth it for the right size team.

4. ClearCase is overkill for small projects

5. CC/CQ integration is easy and in my experience has never failed

6. One company I worked at who used ClearCase very nearly went under. It was not the fault of ClearCase. ClearCase wasn't (allegedly) playing with the books, writing crap contracts and deceiving the stock market.

So in summary, don't use ClearCase if you are small because it's not worth it, don't use UCM no matter what your size, and ClearCase can't bring your project/company down without significant help and assistance.

Posted by Jim on 2005-10-10


2004-08-29

Joe Gregorio: Chrysler 300 | BitWorking


Chrysler 300

The first time I saw the new Chrysler 300 I hated it.

The next time I saw it I was intrigued.

The third time I saw it the Dick-Tracy-Muscle-Car-Terra-Plane shape had firmly wrapped its tentacles around my lizard brain.

I don't own one.

Yet.

I use a car service for some of my business trips (travel to the airport).  The service I like best uses the 300 and it is a spectacular car for this purpose. 

From the perspective of a rider in the back seat, the car is very comfortable and spacious.  It also has enough cup holders and armrest space to be quite functional as an en-route office.

I haven't driven the car, but the drivers have all commented that they like the car very much, and since they're driving for 6+ hours a day I think their opinion is quite valuable.

Posted by Lou on 2005-06-21

2005-06-21

Joe Gregorio: Christmas 2003 | BitWorking


There were cookies.

Picture of decorated sugar cookies

2003-12-25

Joe Gregorio: China - Day 9 and 10 Chongqing to the White Swan Hotel in Guangzhou | BitWorking


Day 9 was pretty uneventful; Lynne, Caden and I spent some time in the morning shopping with Mark, Moya and Andie. The afternoon was filled with paperwork, and the rest of the day was spent packing for our travel to Guangzhou the next day.

Part of our shopping was to buy a new large suitcase. We didn't really need it for this leg of the trip but we plan on doing a lot of shopping in Guangzhou, enough that we know we'll need a whole extra suitcase for everything we are going to purchase.

The paperwork took over two hours to complete, not that it bothered me much, as Lynne took care of it while Caden and I had together time. One of the things we did together was get more pictures. Here is a shot of the building I was talking about the other day, where a shop has been set up on the ground floor and the building isn't even complete.

Store and noodleshop setup in a building still under construction

I also got some pictures of what our group referred to as "the big dig" next to the Marriott. It is a large excavation for the basement of one of the three new highrises being constructed in front of the Marriott. The culture here is definitely not as safety conscious as it is in the US. The pictures I took here are from the driveway of the Marriott. The only thing that separated me from a 50-foot fall to the bottom of the pit was a thin landscaped area. No fence. No barriers. Nothing.

Large excavation for the basement of a highrise.

The paperwork that needed to be done was much longer and more complex than anything we'd had to fill out so far. That's because it is all for the American government and not the Chinese government. Over the next 4 days we will complete all the steps we need to adopt Caden in the US.

Day 10

After packing we turned in early, got up early, and were on the bus to the Chongqing airport by 9 AM. A one-and-a-half-hour flight landed us in Guangzhou and a 20-minute bus ride got us to the White Swan Hotel. The White Swan is a 5-star hotel that is a favorite of adoption agencies for putting up their clients, because it is so nice and conveniently located near the services we need. The American consulate is literally a 5-minute walk down the street. Now, we were fortunately warned by others who had gone before us that if you stayed at the Chongqing Marriott, you shouldn't expect too much when you go to the White Swan. They were right. I am quite honestly not that impressed. They messed up our room assignment, first giving us a room on the 11th floor and then retracting it, saying the room wasn't ready; then, after waiting, they gave us a room on the 15th floor, but it too wasn't cleaned, so they gave us the information but refused to give us the keys until the room was done. Wendy, one of our agency's representatives out of New York, was kind enough to let us use her room on the same floor until ours was cleaned. They assured us that they would get us from Wendy's room and give us the keys as soon as it was done. Well, they didn't, and 20 minutes later we checked on things to find that the room was done but our keys were not available, and they'd forgotten to set up a crib in the room. They eventually got the crib set up and the keys to us, but all in all it was a pretty shabby introduction to the hotel. Oh, and the rooms are smaller, the beds are smaller, and the bathroom is absolutely cramped, with no counter or drawer space. In general the staff have been professional, but they verge on snooty and are nowhere near as nice as the staff at the Chongqing Marriott.

The entire feel of the area is very different from Chongqing. The air is much cleaner and the area surrounding the White Swan is almost alien, as it is all European-styled buildings, something to do with Britain and France and the Opium Wars. We'll go into that more tomorrow.

We did a little exploring, mostly to get some bottled water and Coca-Cola, and to find the local laundry.

Water

Water has been constantly on our minds during this trip because none of the tap water is potable. As a matter of fact, it has been various shades of brown and tan over the last few days. Given that we don't in any way want to ingest the water, we go through a lot of bottled water. It also means that we have been washing all of Caden's bottles and bowls with bottled water too. You never realize how much water you use washing dishes until you're pouring that water out of a bottle and not from a tap. The water restriction means that at every hotel we have been hanging a towel over the tap to remind ourselves not to use it, even for things like rinsing your mouth after brushing your teeth, or for rinsing your toothbrush off afterwards. Bottled water for everything. Luckily we can pick it up at a reasonable price; a two-liter bottle runs only about 70 cents.

Thanks

Special thanks to Ralf for converting the flower street videos into divx, quicktime and mpeg4!

2004-01-10

Joe Gregorio: China - Day 8 - Another walk around Chongqing | BitWorking


Another day and another walk around Chongqing. This time Caden and Lynne accompanied me and Lynne took the pictures.

Porter on a crowded street

Whenever we go out with Caden she gets put in the sling, which does several things. The first thing it does is save my back. The second thing the sling does is cover most of Caden; with a hat, only her eyes are visible outside of the sling if she's awake, and she is completely hidden if she falls asleep. The reason you want to keep her hidden is to avoid the "clothing police". The "clothing police", as we call them, are well-meaning older Chinese women who will run up to you and cover any exposed part of a child and admonish you to keep them covered. The first time we went out Caden had the lower part of her legs exposed, and three times on the walk we had ladies approach and pull her pant legs down further over her socks. The "clothing police" are only bested by the "thumb police". Caden sucks her thumb, which is apparently a big no-no in China, and complete strangers have no problem walking up to her and yanking her thumb from her mouth while admonishing us. Now, you could try to explain that we just adopted her, that this is a particularly stressful time for her, and that we'll deal with the thumb sucking at a later date, but when you only speak English and they only speak Mandarin, well, you just keep the thumb hidden.

Caden in the sling.

Being so deep in China, anybody who isn't Asian is a rare sight, and Lynne and I are constantly stared at for our looks alone, but the attention is much greater when we're out with Caden. Lots of pleasant stares and smiles as we walk along, but if we stop, a crowd will gather. This has an interesting dynamic, as many people try out their English on us, and if one person 'makes contact', that is, they say something we understand, they instantly get promoted to 'local translator' and everyone around them starts peppering them with questions to ask us. News that we are Americans and that we just adopted Caden is always greeted with huge smiles and great appreciation. Here I am getting slightly mobbed on the pedestrian mall.

A crowd gathers around me and Caden

These two girls were at first shy, but after asking about Caden and getting a couple pictures of themselves taken they started to ham it up.

Two children.

One of the strangest sights in both Beijing and Chongqing has been the sight of decidedly non-Asian mannequins. And it doesn't just end with the mannequins, a good portion of the advertising is populated with decidedly European looking models.

Mannequins

Did I mention that there's a lot of people here?

A crowd

One of the things I won't be able to get across, no matter how many pictures or videos I take, is the amazing contrasts that you can find in Chongqing. The brand new building in the front is built right up to, and includes a bit of a facade for, the aging apartment building behind it. Everywhere you see these contrasts and amazing changes. There is a high-rise across the street from the hotel that is under construction. In the uncompleted ground floor of the building people have already set up a noodle shop. From our hotel window I can count 17 cranes on 15 high-rises currently under construction.

Two buildings

Even in the midst of all the construction and modernity there are reminders that this is an ancient land and culture. Less than half a block from our hotel is this drug store, which is six feet wide: a single aisle twenty feet long.

Drug store

I hope you're all enjoying the photos. Once I get back to the US I need to learn some more about photography in general and digital photography in particular. For example, here is my current 'system' for handling digital photos.

  1. Load photo into Paint Shop Pro.
  2. Crop and resize.
  3. Press "One step photo fix..."
  4. If it doesn't look good then choose another photo.

Joe - it's been really interesting reading about your trip. I can totally relate to some of your "Lost in Translation" moments, as my wife is Chinese (I am not) and we occasionally visit Taiwan.

Congratulations on the adoption!

Posted by Craig Andera on 2004-01-08

2004-01-08

Joe GregorioChina - Day 7 - A short walk around Chongqing | BitWorking

BitWorking

China - Day 7 - A short walk around Chongqing

This afternoon I went for a short walk in the area immediately around the Marriott. I looped through the 'pedestrian mall', a shopping area where the streets have been closed to cars, and the local flower market.

A crowded sidewalk

This is Zhonghua Road, a busy, shop-lined road on the way to the pedestrian mall. There is a mass of shops along these roads, some no more than 5 feet wide, selling everything from fine suits, to noodles, to shoes. Some of the clothing and shoe shops have their production set up right on the sidewalk, ancient-looking sewing machines running all day long. It might look like a trick of the light, but the leaves really are that dull grayish-green color. The smoke from coal fires and the fog combine to produce a heavy smog that rains a fine soot down over everything in the city.

There aren't many private vehicles on the roads, mostly buses and taxi cabs. This is actually a pretty rare shot with the street completely empty. The careful art of crossing the street as a pedestrian, well, I'll leave the description of that for another day.

An alley way.

The alleyway above is a nice study in contrasts. The smaller buildings nestled back in the alley are much shorter and older than the ones near its mouth. In the back you can see two high-rises in the mist. Both buildings are incomplete; on the one on the left you can see the ever-present crane on top.

Busy street with a porter

The man on the right-hand side of the picture above, with the long bamboo pole in his hands, is a member of what the locals call the "Pole Army". Chongqing is a very mountainous region and bikes are not much use here; instead, droves of men with poles work as porters, carrying items around the city. I've seen them carrying some pretty heavy loads. In the center of the picture is one of the clothing shops with its sewing machines set out on the sidewalk.

This brings out a recurrent theme I've seen across the whole trip into China: whatever problem we would solve with technology in the US, the Chinese solve with people, and where we would use people to do a job, they would use more people. Where we would use trucks, they have porters. Where we would use backhoes, they use laborers. Another example: every grocery store and department store we have been in, from Beijing to Chongqing, is staffed with one person per aisle. Yes, you read that correctly, one person per aisle. We have yet to eat in a sit-down restaurant where the patrons outnumbered the staff.

McDonalds sign on a busy street

What good is a city without a McDonalds? How about 60 of them in Chongqing alone. There are two within walking distance of our hotel. This picture is taken on the pedestrian mall, with the McDonalds and a noodle shop on the left. The large mass of people in the middle of the picture are sitting on the benches eating noodles. Huge steaming mounds of delicious smelling noodles served in thin plastic bowls and eaten with the same disposable wooden chopsticks you get at Chinese restaurants in the US.

entrance to the flower market

Here is the flower market across the street from the hotel. I bought Lynne a half-dozen roses here on my way back. It was my first time haggling and I didn't do a very good job, but I got her the half-dozen (actually 7 roses) for 10 yuan, or US$1.25. One of the nicest things about being in China is that I am getting used to the prices; going back to the US is going to be difficult, and we haven't even done much shopping outside of department stores, where there isn't any haggling.

open air flower market

The market is more like an alley, with vendors selling everything from individual flowers, to huge arrangements, to bonsai trees, to gardening supplies. The alley winds through the block and turns right, and as it turns it transitions from flowers to food staples and cooked food. The picture above is a shot looking back down the alley towards the entrance. Each of the roofs on the left is a different shop selling different tools, produce, or plants. For those of you with good connections, I shot a short video (440KB) of the pedestrian traffic flowing into the entrance of the flower market. It is a pretty good representation of the pedestrian traffic and noise levels that are continuous throughout the city. It is a WMV file that I've tested in both Windows Media Player and Real Player. I can provide the original AVI file to anyone who wants to convert it into different formats, for which I would be eternally grateful.

Special thanks to Ralf for converting the flower street video into divx, quicktime and mp4!

Flower vendor wrapping up a sale.

This final shot is about half-way up the alley; a vendor is wrapping up a sale to the people standing in front of him. Behind him wends another alley, orthogonal to this one, containing more plants and trees for sale. There aren't many flowers on the branches he's sold, but they have a very strong scent. Speaking of scent, that's one of the things about China that I can't blog. There are wonderful scents, like the flower market above and the great smells of food wafting out of some of the stalls. On the other hand, there are some pretty awful stenches, some nauseatingly familiar, others of unknown origin and completely beyond any previous experience.

Thank you so much for your tour.  I have a friend who lives in Chongqing and it is my desire to visit her soon.  You have permitted a glimpse, I thank you for that.

Stephen

Posted by Stephen Cotta on 2004-04-18

2004-01-07

Joe GregorioChina - Day 6 | BitWorking

BitWorking

China - Day 6

Today was a very low key day.

I wanted to keep a low profile as I am still recovering from the stomach flu, and we also want time with Caden, so Lynne and I skipped the scheduled sightseeing and stuck around the hotel. In the morning we walked around the nearby pedestrian mall with Mark and Moya, doing some window shopping and stopping by the grocery store on the way back for necessities. I'm still not entirely coherent; I forgot to bring along the camera, a sure sign that staying back was a good idea. The rest of the day was spent playing with and taking care of Caden, eating, and exploring the hotel (you laugh, but this place is 39 stories tall, has 8 restaurants, and about that many shops).

Caden playing with a rattle. Caden playing with a teething ring.

We took a lot of pictures of Caden, but because she is always laughing and playing so hard, most of the shots end up looking like this one.

Blurry Caden playing.
I have to keep saying it -- she's beautiful, and looks thoroughly happy!  I couldn't be more happy for you.  This has definitely been a long gestation :)

Posted by Eric Vitiello on 2004-01-07

2004-01-06

Joe GregorioChina - Day 5 - Gotcha Day | BitWorking

BitWorking

China - Day 5 - Gotcha Day

This is the day, Gotcha Day, the day we get Caden.

Lynne with Caden

She came right to us, with no crying at all.

Joe and Caden

It was very cute, she sat on Lynne's lap and just stared at me for a good 5 minutes. Then she reached out for me. We sent pictures out to the orphanage earlier and I think the Aunties did a good job of showing her the pictures because she seemed to recognize us.

Officially Adopted in China

After getting the babies, we went to complete the official adoption paperwork, so we have now officially adopted her in China. Now we wait for some of the paperwork to get back to us, and then it's on to Guangzhou to complete the US side of the adoption.

What you don't see in the pictures is that I am deathly ill. I caught the stomach bug. Just seconds before the last picture was taken, I was sleeping on a chair in the lobby, shivering with the chills, and I went right back there after the picture. After we returned to the hotel I slept for 15 hours. I'm feeling better now and can hopefully more fully enjoy this time with our daughter.

Congratulations!!  She's beautiful!  I'm glad it's working so well, and wish you a safe trip home.

--Eric

Posted by Eric Vitiello on 2004-01-05


Congratulations Joe & Lynne!  What a wonderful thing for you.  Best wishes to you all.

Posted by Jason Clark on 2004-01-05

Congratulations!  Just wonderful.

Posted by Don Park on 2004-01-08

2004-01-05

Joe GregorioDetecting Benchmark Regression

Subtitle if this were an academic paper, which it’s not: A k-means clustering derived point statistic highly correlated with regressions in commit-series data with applications to automatic anomaly detection in large sets of benchmark results.

TL;DR: To detect regressions in benchmark data over a series of commits, use k-means clustering on mean- and variance-normed commit-series. For each cluster, find the best-fitting step function to the cluster's centroid. The metric |step/fit| is highly correlated with regressions, where step is the height of the step function and fit is the mean squared error of the fit of the step function to the centroid.

Below is a description of how we detect performance regressions for the Skia graphics library. I'm writing this up because, after much searching, I haven't found anyone describing the method we came up with for detecting performance regressions, and maybe this writeup will be useful to other people.

Problem Statement

Skia is an open source cross-platform 2D graphics library. In Skia, as in many other software projects, we have a large number of performance tests, aka benchmarks, and we run those benchmarks every time we change the code. Just having a large number of benchmarks isn't a problem, but being cross-platform means running those tests across many different platforms (Linux, Mac, Windows, Android, ChromeOS, on different GPUs, etc.), which leads to a combinatorial explosion in benchmark results. For every commit to Skia today, the testing infrastructure generates approximately 40,000 benchmark measurements. That number tends to change frequently as tests, platforms, and configurations are added and removed; it has been over 70,000 per commit in the past several months.

Definitions

To make the following discussion easier let’s define some terms.

Trace
A Trace is a single benchmark and configuration tracked over a series of commits. Note that this isn't exactly a time series, since the measurements aren't taken at equidistant times but are spaced by commits to the codebase. Also note that for each benchmark there may be multiple traces: for example, one for Windows 8, one for Linux, and one for Android.

Fig 1 - Trace

Regression
A “performance regression” is a significant change in a metric in either direction. A metric that drops may actually represent a performance increase, but it could also be an indication of a test that is broken or has stopped working entirely. So regardless of the benchmark, we are looking for step-like changes in either direction.

The issue with tens of thousands of traces is that you just can't look at the raw numbers, or even plot all the data, and figure out when you've had a regression. At first we tried setting performance ranges for each trace, i.e. an upper and lower bound for each trace. If a later commit caused the benchmark results to move outside those bounds, that would trigger an error. There are many drawbacks to monitoring benchmarks by manually placing bounds on each trace:

  1. The most important drawback is that in such a system a single test result can trigger an alert. You know the old phrase, "the plural of anecdote isn't data": a single benchmark measurement is virtually meaningless, since any number of anomalies could be responsible for that result changing. For example, a machine could overheat, forcing a move to frequency scaling, or other processes on the machine may starve the test of CPU cycles. You can work to mitigate these eventualities, but they never completely go away. SPC systems such as the Western Electric rules might be applicable in this case, but we've never tested them.
  2. Constant manual editing of trace bounds is time consuming and error prone.
  3. Constantly adding manual trace bounds for new benchmarks is time consuming. Add one test and you may have to add many trace bounds, one for each member of that combinatorial explosion.
  4. Forgetting to add new ranges for new benchmarks is another source of error.

Even if you automate the placing of trace bounds, you still have the issue of transient behavior that looks like a regression, and you also have to take pains that the automatic calculation of trace bounds doesn’t mask a true regression.

Fig 2 - Is this a regression or an anomaly?

So we needed a better system than setting trace bounds. The next section explains the system we implemented and have successfully run for several months now.

Before we go further let’s define a few more terms.

Normalized Traces
Normalization is the process of modifying each Trace so that it has a mean of zero and a standard deviation of 1.0. Note that if the standard deviation of a trace is too small, then blowing it up to a standard deviation of 1.0 would introduce nothing but noise, so there's a lower limit on the standard deviation of a trace; below that, we don't normalize the standard deviation. The idea is to extract just the shape of the trace, so that all the normalized traces are comparable using a sum of squares distance metric. The smaller the sum of squares error between two normalized traces, the more similar their shapes. (A sketch of this normalization step appears after these definitions.)
k-means clustering
I’m not going to explain k-means clustering in one paragraph; you should look it up on Wikipedia or any of the fine explanations available on the web. The important point is that we use k-means clustering to group normalized traces together based on their shape. The idea is that many traces will move together in the same direction from commit to commit. For example, if I speed up rectangle drawing, then all tests that use rectangles should get faster, if not in the same proportion.
Centroid
The centroid is the point at the center of a cluster, in this case the mean of the normalized traces in the cluster, which acts as a prototype shape for the members of the cluster.
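
Here's a minimal sketch of the normalization step in Python, assuming numpy; the MIN_STDDEV floor is an assumption, not Skia's actual cutoff:

    import numpy as np

    MIN_STDDEV = 0.001  # assumed floor; the real cutoff depends on your data

    def normalize(trace, min_stddev=MIN_STDDEV):
        """Return a copy of trace with mean 0 and, where meaningful, stddev 1."""
        trace = np.asarray(trace, dtype=float)
        centered = trace - trace.mean()
        stddev = centered.std()
        # If the trace is nearly flat, rescaling would only blow up noise,
        # so center it but leave its scale alone.
        if stddev < min_stddev:
            return centered
        return centered / stddev
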
Regression Factor

For each cluster of normalized traces we find the best-fitting step function to the centroid. From that best-fitting step function we calculate Fit and Step, where Fit is the mean squared error between the step function and the centroid, and Step is the height of the step function.

From there we calculate the Regression Factor:

R = Step / Fit

A smaller Fit value gives you a larger R, which means that the more a centroid looks like a step function, the larger R gets. Similarly, the larger Step gets, the larger R gets; Step is a measure of how big a change the centroid represents.
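
To make this concrete, here's a minimal sketch of fitting the step function and computing R, continuing with the numpy import from the normalization sketch. The exhaustive search over split points and the use of mean squared error are my reading of the description above, not necessarily the exact method Skia uses:

    def step_fit(centroid):
        """Find the best-fitting step function to a centroid.

        Tries every split point, modeling the centroid as one constant level
        before the split and another after it. Returns (step, fit), where
        step is the height of the step (left level minus right level) and
        fit is the mean squared error of the best fit.
        """
        centroid = np.asarray(centroid, dtype=float)
        best_fit, best_step = None, 0.0
        for i in range(1, len(centroid)):
            left, right = centroid[:i], centroid[i:]
            model = np.concatenate([np.full(len(left), left.mean()),
                                    np.full(len(right), right.mean())])
            fit = np.mean((centroid - model) ** 2)
            if best_fit is None or fit < best_fit:
                best_fit, best_step = fit, left.mean() - right.mean()
        return best_step, best_fit

    def regression_factor(centroid, epsilon=1e-12):
        """R = Step / Fit; epsilon guards against a perfect (zero-error) fit."""
        step, fit = step_fit(centroid)
        return step / max(fit, epsilon)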

Putting it all together.

So finally, with all the preliminaries set up, we can get to the core of the system.

  • Collect all Traces over a specific range of commits. We usually use around the last 100-250 commits' worth of data.
  • Normalize all the Traces.
  • Perform k-means clustering on all the Normalized Traces.
  • For each cluster calculate the Regression Factor of the centroid.
  • Any cluster with a Regression Factor whose absolute value is over 150 is considered interesting enough to need triaging. Note that 150 was chosen after observing the system in action for a while; the cutoff may be different for your data. (A sketch of the full pipeline follows this list.)
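
Here's a sketch of that pipeline, tying together the normalize and regression_factor functions sketched earlier with scipy's k-means; the cluster count here is an assumption you'd tune for your data:

    from scipy.cluster.vq import kmeans

    K = 50        # assumed number of clusters; tune for your data
    CUTOFF = 150  # the triage cutoff discussed above

    def find_interesting_clusters(traces):
        """traces: a list of equal-length benchmark Traces covering the
        chosen commit range. Returns (R, centroid) pairs worth triaging."""
        normed = np.array([normalize(t) for t in traces])
        centroids, _ = kmeans(normed, K)
        interesting = []
        for centroid in centroids:
            r = regression_factor(centroid)
            if abs(r) > CUTOFF:
                interesting.append((r, centroid))
        return interesting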

Here’s a view of the system at work, finding regressions in performance. Note that out of 40,000 benchmarks the first cluster contains 1336 traces and has a Regression Factor of -4.08e+3.

Screenshot of the clusters found by the system.

Continuous Analysis

The above system works for finding regressions once. But what happens if you want to check for regressions continuously as new commits land and new benchmark results arrive? One last definition:

Interesting
A cluster is considered interesting if the absolute value of its Regression Factor is over 150. This is only a rule of thumb based on observing the system, and it may be specific to the Skia benchmarks; a different cutoff may be appropriate for other datasets. The important point is that as |R| grows, so does the likelihood that the cluster represents a regression.

To continuously monitor for Interesting clusters, start by running the above analysis once and finding the interesting clusters. If there are any, triage them as either really needing attention (say, a CL needs to be rolled back) or ignorable (say, a true performance increase was seen). Then run the analysis again periodically as new data arrives. What should be done with the new set of interesting clusters produced by the analysis? The existing interesting clusters have already been triaged, and those same clusters may appear in the output again, alongside genuinely new interesting clusters. The process of raising only new interesting clusters for triage, while folding previously triaged clusters together with the similar clusters that reappear in the analysis results, is called cluster coalescing.

Cluster coalescing currently works by looking at all the new interesting clusters; if a new cluster contains the same traces as the 20 best traces in an existing cluster, they are considered the same cluster. Here 'best' means the 20 traces that are closest to the centroid of the cluster. This is an area of active work, and we are still experimenting regularly with new cluster coalescing schemes.
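
As a rough sketch of that test (both helpers here are hypothetical; the real bookkeeping depends on how clusters are stored):

    def same_cluster(new_cluster, old_cluster, n=20):
        """Treat a new cluster as the same as an old one if it contains
        all of the old cluster's n traces closest to its centroid.
        best_traces() and trace_ids are hypothetical stand-ins."""
        best = set(best_traces(old_cluster, n))
        return best <= set(new_cluster.trace_ids)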

Wrap Up

I hope that was useful. Please shoot me any questions on Twitter @bitworking. The code for the software that does the above analysis, and much more, is open sourced here.

Tim HopperSundry Links for November 24, 2014

brushfire: Avi Bryant has been building Brushfire, "a framework for distributed supervised learning of decision tree ensemble models in Scala." Fun stuff!

What are the lesser known but useful data structures?: I always enjoy StackOverflow questions like this, but it is not considered a good, on-topic question for this site, of course.

Free Programming Books: A huge, crowd-sourced list of free programming books by language and topic.

PhD Dissertations-Machine Learning Department: Seven years of ML PhD dissertations from Carnegie Mellon University. I wish I had time to read Tools for Large Graph Mining.

Caktus GroupQ4 ShipIt Day: Dedicated to Creativity

This October, nearly everyone at Caktus took a break from their usual projects to take part in Caktus’s 8th ShipIt Day. Apart from a few diligent individuals who couldn’t spare any time from urgent responsibilities, we all worked and collaborated on creative and experimental projects, with the aim of trying something new and, ideally, seeing a project through from start to finish in the space of a day and a half.

Participants in ShipIt Day worked on a variety of projects. We all had the chance to try out Calvin’s shared music player application, SharpTunes, which within the first few hours was playing “Hooked On A Feeling”. It utilizes peer-to-peer sharing, similar to BitTorrent, to more efficiently distribute a music file to a large number of users while allowing them to simultaneously listen to a shared playlist. On his blog, he describes how he achieved proof of concept in under an hour and some later challenges with arranging playlists.

Caktus Sharp Tunes - Caktus Ship It Day

David worked on improving the UX (user experience) for Timepiece, our time-tracking system. While brushing up on his Javascript and making use of Bootstrap modals, he made improvements to the weekly schedule feature. The current version, although very handy, is increasingly difficult to read as the company grows and more employees’ hours are on the schedule. David therefore built a feature which makes it possible to view individual schedules in a modal. Rebecca provided some assistance in getting it deployed, and although it’s not quite done yet, it should save us all a lot of trouble reading the schedule from the back of big standup meetings.

Timepiece - Caktus Ship It Day

Tobias built tests for the Django Project Template. The Django Project Template makes it easy to provision new projects in Django, but testing the template itself can be difficult. The tests, which can be run to exercise the template on a new server and then report back in HipChat, should therefore improve the usability of the template.

Vinod worked on adding Django 1.7 support to RapidSMS, and with help from Dan, successfully reached his goal by the end of ShipIt Day. For next ShipIt Day, he hopes to implement Python 3 support too.

Brian set up a Clojure-based Overtone real time music environment, and although he didn’t reach his goal of using it to build new instruments, he did succeed in creating, in his own words, “some annoying tones.”

Victor and Alex collaborated on School Navigator (still a work in progress) for Code for Durham, designed to help Durham residents understand the perplexing complexity of public school options available to them. Alex imported GIS (geographic information system) data from Durham Public Schools, modeled the data, and built a backend using django-rest-framework. Victor contributed the frontend, which he built using Angular, while getting the chance to learn more about Appcache and Angular.

NC School Navigator - Caktus Ship It Day

Rebecca did some work for BRP Weather using django-pipeline, which gave her, Caleb, and Victor the opportunity to compare the pros and cons of django-compressor and django-pipeline. Although she finds the error messages with django-compressor to be a nuisance and prefers how django-pipeline handles static files, django-pipeline is not very helpful when it cannot find a file and has some issues with sass files.

Michael continued designing a migraine-tracking app. He designed a simplified data entry system and did some animation design as well. The app is intended to track occurrences of migraines as well as potential triggers, such as barometric pressure and the user’s sleep patterns. Trevor also contributed some assistance with UX details.

Dan made progress on an application called Hall Monitor which he has been working on since before coming to Caktus. It accesses an office’s Google Calendars and uses string matching to check event names on calendars in order to determine who is in or out of the office. For instance, if someone has an event called “wfh” (working from home), it concludes that they are out of the office. Similarly, if someone is at an all-day event, it also logically concludes they are probably out. He demonstrated it to us, showing that it does indeed have an uncanny ability to track our presence.

Caleb set up Clojure and Quil, which allows you to use Processing in Clojure, and built an application for displaying animated lines. By modifying the program, the user can instantly modify the animation, creating interesting effects. He also created a Game of Life which runs in Quil (see below) and finished a Scheme implementation of Langton’s Ant in Automaton.

Animated Quil Lines - Caktus Ship It Day

Scott used the time to back up changes as well as add a KVM (keyboard, video and mouse) switch to the server rack.

Wray worked on a couple different projects. He completed a Unity tutorial which involved building a space shooter game which runs on Android, which we all got to try out. He also used the time to work on Curvemixer, which creates interesting vector graphics using curves and circles.

I took the time to write some help files for an application designed to allow medical students to test their radiology knowledge. The help files should allow students and instructors to better understand the many features in the application and writing them allowed me to practice documentation creation.

Overall, ShipIt Day was a very productive and refreshing experience for everyone, allowing us to spend time on the sorts of projects we wouldn’t usually find time to work on. Moreover, we got the chance to find new solutions to projects we may have been stuck on through collaboration.

Tim HopperSundry Links for November 17, 2014

There's no magic: virtualenv edition: I didn't really get virtualenvs until long after I started programming Python, though they're now an essential part of my toolkit. This is a great post explaining how they work.

Traps for the Unwary in Python’s Import System: "Python’s import system is powerful, but also quite complicated."

pyfmt: I recently learned about gofmt for auto-formatting Go code. Here's a similar tool for Python.

Q: Setting User-Agent Field?: A 1996 question in comp.lang.java on how to set the user agent field for a Java crawler. The signature on the question? Thanks, Larry Page

alecthomas/importmagic: Python tool and Sublime extension for automatically adding imports.

Caktus GroupSupporting Increased Healthcare Access with NCGetCovered.org

We’ve launched NCGetCovered.org, a site dedicated to helping North Carolinians gain access to health insurance. As many know, enrolling in health insurance can feel daunting. NCGetCovered.org aims to simplify that process by centralizing enrollment information and great resources like live help. The site is launching ahead of the November 15th open enrollment period for the federal healthcare exchange (healthcare.gov).

NCGetCovered.org is a testament to the hard work of the many dedicated to enrolling the uninsured. Caktus created the site on behalf of the Big Tent Coalition, a nonpartisan consortium of more than 100 organizations and 320 individuals pulled from community-based organizations, hospitals, insurance carriers, in-person assisters and non-profit organizations.

Taking the lead on this web project was our neighbor in Durham, MDC, a nonprofit dedicated to closing opportunity gaps and a Big Tent member. MDC is an incredibly forward-thinking organization and saw early on the need for a one-stop shop for health insurance enrollment information. We feel very fortunate to be MDC’s partners in increasing health insurance access in our home state.

Caktus GroupOpen Data Project in Durham - Thumbs Up to Open Government!

In exciting local news, Durham and Durham County are launching a new site dedicated to centralizing public data in Summer 2015. Their press release mentions a health sanitation app Code for Durham built as a model of civic engagement with open data. Our own co-founder and CTO, Colin Copeland, is co-captain of Code for Durham, a volunteer organization dedicated to building apps that improve government transparency.

Their press release describes the project:

“The City of Durham and Durham County Government are embarking on an open data partnership that will lay the groundwork for businesses, non-profits, journalists, universities, and residents to access and use the wealth of public data available between the two government organizations, while becoming even more transparent to the residents of Durham.”

We’re looking forward to seeing all the great apps for Durhamites that result from this big step towards open government!

Caktus GroupWe've Won Two W3 Awards for Creative Excellence on the Web!

We’re honored to announce that we’ve won two W3 Silver Awards for Creative Excellence on the Web. The awards were given in recognition of our homepage redesign and DjangoCon 2014. Many thanks to Open Bastion and, by extension, the Django Software Foundation for selecting us to build the DjangoCon website. Also many thanks to our hardworking team of designers, developers and project managers that worked on these projects: Dan, Daryl, David, Michael, Rebecca, and Trevor!

Here’s a quote from Linda Day, the director of the Academy of Interactive and Visual Arts (the sponsors of the award):

“We were once again amazed with the high level of execution and creativity represented within this year’s group of entrants. Our winners continue to find innovative and forward-thinking ways to push the boundaries of creativity in web design.”

We’re particularly humbled to learn that there were 4,000 entries this year and to be in the company of winners like Google, ESPN, Visa, and Sony and the many other wonderful companies that received recognition. We’re looking forward to continuing to build great web experiences!

The official press release: http://www.prweb.com/releases/2014-CaktusGroup/11/prweb12306675.htm

Tim HopperSundry Links for November 12, 2014

Amazon Picking Challenge: Kiva Systems (where I interned in 2011) is setting up a robotics challenge for picking items off warehouse shelves.

contexttimer 0.3.1: A handy Python context manger and decorator for timing things.

How-to: Translate from MapReduce to Apache Spark: This is a helpful bit from Cloudera on moving algorithms from MapReduce to Spark.

combinatorics 1.4.3: Here's a Python module adding some combinatorial functions to the language.

Special methods and interface-based type system: Guido van Rossum explains (in 2006) why Python uses len(x) instead of x.len().

Caktus GroupUsing Amazon S3 to Store your Django Site's Static and Media Files

Storing your Django site's static and media files on Amazon S3, instead of serving them yourself, can make your site perform better.

This post is about how to do that. We'll describe how to set up an S3 bucket with the proper permissions and configuration, how to upload static and media files from Django to S3, and how to serve the files from S3 when people visit your site.

S3 Bucket Access

We'll assume that you've got access to an S3 account, and a user with the permissions you'll need.

The first thing to consider is that, while I might be using my dpoirier userid to set this up, I probably don't want our web site using my dpoirier userid permanently. If someone was able to break into the site and get the credentials, I wouldn't want them to have access to everything I own. Or if I left Caktus (unthinkable though that is), someone else might need to be able to manage the resources on S3.

What we'll do is set up a separate AWS user, with the necessary permissions to run the site, but no more, and then have the web site use that user instead of your own.

  • Create the bucket.
  • Create a new user: Go to AWS IAM. Click "Create new users" and follow the prompts. Leave "Generate an access key for each User" selected.
  • Get the credentials:
    • Go to the new user's Security Credentials tab.
    • Click "Manage access keys".
    • Download the credentials for the access key that was created.
    • Save them somewhere, because no one will ever be able to download them again. (Though it's easy enough to create a new access key if you lose the old one's secret key.)
  • Get the new user's ARN (Amazon Resource Name) by going to the user's Summary tab. It'll look like this: "arn:aws:iam::123456789012:user/someusername"
  • Go to the bucket properties in the S3 management console.
  • Add a bucket policy that looks like this, but change "BUCKET-NAME" to the bucket name, and "USER-ARN" to your new user's ARN. The first statement makes the contents publicly readable (so you can serve the files on the web), and the second grants full access to the bucket and its contents to the specified user:

    {
        "Statement": [
            {
              "Sid":"PublicReadForGetBucketObjects",
              "Effect":"Allow",
              "Principal": {
                    "AWS": "*"
                 },
              "Action":["s3:GetObject"],
              "Resource":["arn:aws:s3:::BUCKET-NAME/*"
              ]
            },
            {
                "Action": "s3:*",
                "Effect": "Allow",
                "Resource": [
                    "arn:aws:s3:::BUCKET-NAME",
                    "arn:aws:s3:::BUCKET-NAME/*"
                ],
                "Principal": {
                    "AWS": [
                        "USER-ARN"
                    ]
                }
            }
        ]
    }
    
  • If you need to add limited permissions for another user to do things with this bucket, you can add more statements. For example, if you want another user to be able to copy all the content from this bucket to another bucket:

        {
            "Action": "s3:ListBucket",
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::BUCKET-NAME",
            "Principal": {
                "AWS": [
                    "USER-ARN"
                ]
            }
        }
    

That will let the user list the objects in the bucket. The bucket was already publicly readable, but not listable, so adding this permission will let the user sync from this bucket to another one where the user has full permissions.

Expected results:

  • The site can use the access key ID and secret key associated with the user's access key to access the bucket
  • The site will be able to do anything with that bucket
  • The site will not be able to do anything outside that bucket

S3 for Django static files

The simplest case is just using S3 to serve your static files. In Django, we say "static files" to refer to the fixed files that we provide and serve as part of our site - typically images, css, and javascript, and maybe some static HTML files. Static files do not include any files that might be uploaded by users of the site. We call those "media files".

Before continuing, you should be familiar with managing static files, the staticfiles app, and deploying static files in Django.

Also, your templates should never hard-code the URL path of your static files. Use the static tag instead:

      {% load static from staticfiles %}
      <img src="{% static 'images/rooster.png' %}"/>

That will use whatever the appropriate method is to figure out the right URL for your static files.

The two static tags

Django provides two template tags named static.

The first static is in the static templatetags library, and accessed using {% load static %}. It just puts the value of STATIC_URL in front of the path.

The one from staticfiles ({% load static from staticfiles %}) is smarter - it uses whatever storage class you've configured for static files to come up with the URL.

By using the one from staticfiles from the start, you'll be prepared for any storage class you might decide to use in the future.

Moving your static files to S3

In order for your static files to be served from S3 instead of your own server, you need to arrange for two things to happen:

  1. When you serve pages, any links in the pages to your static files should point at their location on S3 instead of your own server.
  2. Your static files are on S3 and accessible to the web site's users.

Part 1 is easy if you've been careful not to hardcode static file paths in your templates. Just change STATICFILES_STORAGE in your settings.

But you still need to get your files onto S3, and keep them up to date. You could do that by running collectstatic locally, and using some standalone tool to sync the collected static files to S3, at each deploy. But we won't be able to get away with such a simple solution for media files, so we might as well go ahead and set up the custom Django storage we'll need now, and then our collectstatic will copy the files up to S3 for us.

To start, install two Python packages: django-storages (yes, that's "storages" with an "S" on the end), and boto:

    $ pip install django-storages boto

Add 'storages' to INSTALLED_APPS:

    INSTALLED_APPS = (
          ...,
          'storages',
     )

Optionally, add this to your common settings:

    AWS_HEADERS = {  # see http://developer.yahoo.com/performance/rules.html#expires
        'Expires': 'Thu, 31 Dec 2099 20:00:00 GMT',
        'Cache-Control': 'max-age=94608000',
    }

That will tell boto that when it uploads files to S3, it should set properties on them so that when S3 serves them, it'll include those HTTP headers in the response. Those HTTP headers in turn will tell browsers that they can cache these files for a very long time.

Now, add this to your settings, changing the first three values as appropriate:

    AWS_STORAGE_BUCKET_NAME = 'BUCKET_NAME'
    AWS_ACCESS_KEY_ID = 'xxxxxxxxxxxxxxxxxxxx'
    AWS_SECRET_ACCESS_KEY = 'yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy'

    # Tell django-storages that when coming up with the URL for an item in S3 storage, keep
    # it simple - just use this domain plus the path. (If this isn't set, things get complicated).
    # This controls how the `static` template tag from `staticfiles` gets expanded, if you're using it.
    # We also use it in the next setting.
    AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME

    # This is used by the `static` template tag from `static`, if you're using that. Or if anything else
    # refers directly to STATIC_URL. So it's safest to always set it.
    STATIC_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN

    # Tell the staticfiles app to use S3Boto storage when writing the collected static files (when
    # you run `collectstatic`).
    STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Only the first three lines should need to be changed for now.

CORS

One more thing you need to set up is CORS. CORS defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. Since we're going to be serving our static files and media from a different domain, if you don't take CORS into account, you'll run into mysterious problems, like Firefox not using your custom fonts for no apparent reason.

Go to your S3 bucket properties, and under "Permissions", click on "Add CORS Configuration". Paste this in:

    <CORSConfiguration>
        <CORSRule>
            <AllowedOrigin>*</AllowedOrigin>
            <AllowedMethod>GET</AllowedMethod>
            <MaxAgeSeconds>3000</MaxAgeSeconds>
            <AllowedHeader>Authorization</AllowedHeader>
        </CORSRule>
    </CORSConfiguration>

I won't bother to explain this, since there are plenty of explanations on the web that you can Google for. The tricky part is knowing you need to add CORS in the first place.

Try it

With this all set up, you should be able to upload your static files to S3 using collectstatic:

    python manage.py collectstatic

If you see any errors, double-check all the steps above.

Once that's successful, you should be able to start your test site and view some pages. Look at the page source and you should see that the images, css, and javascript are being loaded from S3 instead of your own server. Any media files should still be served as before.

Don't put this into production quite yet, though. We still have some changes to make to how we're doing this.

Moving Media Files to S3

Reminder: Django "media" files are files that have been uploaded by web site users, that then need to be served from your site. One example is a user avatar (an image the user uploads and the site displays with the user's information).

Media files are typically managed using FileField and ImageField fields on models. In a template, you use the url attribute on the file field to get the URL of the underlying file.

For example, if user.avatar is an ImageField on your user model, then

    <img src="{{ user.avatar.url }}">

would embed the user's avatar image in the web page.

By default, when a file is uploaded using a FileField or ImageField, it is saved to a file on a path inside the local directory named by MEDIA_ROOT, under a subdirectory named by the field's upload_to value. When the file's url attribute is accessed, it returns the value of MEDIA_URL, prepended to the file's path inside MEDIA_ROOT.

An example might help. Suppose we have these settings:

    MEDIA_ROOT = '/var/media/'
    MEDIA_URL = 'http://media.example.com/'

and this is part of our user model:

    avatar = models.ImageField(upload_to='avatars')

When a user uploads an avatar image, it might be saved as /var/media/avatars/12345.png. Then <img src="{{ user.avatar.url }}"> would expand to <img src="http://media.example.com/avatars/12345.png">.

Our goal is, instead of saving those files to a local directory, to send them to S3. Then, instead of having to serve them locally somehow, we can let Amazon serve them for us.

Another advantage of using S3 for media files: if you scale up by adding more servers, your uploaded files are already available to all of them at once.

Configuring Django media to use S3

Ideally, we'd be able to start putting new media files on S3 just by adding this to our settings:

    # DO NOT DO THIS!
    MEDIA_URL = "https://%s/" % AWS_S3_CUSTOM_DOMAIN
    DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

Adding those settings would indeed tell Django to save uploaded files to our S3 bucket, and use our S3 URL to link to them.

Unfortunately, this would store our media files on top of our static files, which we're already keeping in our S3 bucket. If we were careful to always set upload_to on our FileFields to directory names that would never occur in our static files, we might get away with it (though I'm not sure Django would even let us). But we can do better.

What we want to do is either enforce storing our static files and media files in different subdirectories of our bucket, or use two different buckets. I'll show how to use the different paths first.

In order for our STATICFILES_STORAGE to have different settings from our DEFAULT_FILE_STORAGE, they need to use two different storage classes; there's no way to configure anything more fine-grained. So, we'll start by creating a custom storage class for our static file storage, by subclassing S3BotoStorage. We'll also define a new setting, so we don't have to hard-code the path in our Python code:

    # custom_storages.py
    from django.conf import settings
    from storages.backends.s3boto import S3BotoStorage

    class StaticStorage(S3BotoStorage):
        location = settings.STATICFILES_LOCATION

Then in our settings:

    STATICFILES_LOCATION = 'static'
    STATICFILES_STORAGE = 'custom_storages.StaticStorage'
    STATIC_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, STATICFILES_LOCATION)

Giving our class a location attribute of 'static' will put all our files into paths on S3 starting with 'static/'.

You should be able to run collectstatic again, restart your site, and now all your static files should have '/static/' in their URLs. Now delete from your S3 bucket any files outside of '/static' (using the S3 console, or whatever tool you like).

We can do something very similar now for media files, adding another storage class:

    class MediaStorage(S3BotoStorage):
        location = settings.MEDIAFILES_LOCATION

and in settings:

    MEDIAFILES_LOCATION = 'media'
    MEDIA_URL = "https://%s/%s/" % (AWS_S3_CUSTOM_DOMAIN, MEDIAFILES_LOCATION)
    DEFAULT_FILE_STORAGE = 'custom_storages.MediaStorage'

Now when a user uploads their avatar, it should go into '/media/' in our S3 bucket. When we display the image on a page, the image URL will include '/media/'.

Using different buckets

You can use different buckets for static and media files by adding a bucket_name attribute to your custom storage classes. You can see the whole list of attributes you can set by looking at the source for S3BotoStorage.
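
For example, a media storage class pinned to its own bucket might look like the following; the MEDIA_BUCKET_NAME setting is one we're inventing for illustration:

    # in custom_storages.py
    class MediaStorage(S3BotoStorage):
        location = settings.MEDIAFILES_LOCATION
        bucket_name = settings.MEDIA_BUCKET_NAME  # e.g. 'example-media-bucket'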

Moving an existing site's media to S3

If your site already has user-uploaded files in a local directory, you'll need to copy them up to your media directory on S3. There are lots of tools these days for doing this kind of thing. If the command line is your thing, try the AWS CLI tools from Amazon. They worked okay for me.
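
For example, something along these lines should work with the AWS CLI, assuming your MEDIA_ROOT is /var/media and your media files live under the media/ prefix in your bucket:

    aws s3 sync /var/media s3://BUCKET-NAME/media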

Summary

Serving your static and media files from S3 requires getting a lot of different parts working together. But it's worthwhile for a number of reasons:

  • S3 can probably serve your files more efficiently than your own server.
  • Using S3 saves the resources of your own server for more important work.
  • Having media files on S3 allows easier scaling by replicating your servers.
  • Once your files are on S3, you're well on the way to using CloudFront to serve them even more efficiently using Amazon's CDN service.

Caktus GroupWebcast: Creating Enriching Web Applications with Django and Backbone.js

Update: The live webcast is now available at O'Reilly Media

Our technical director, Mark Lavin, will be giving a tutorial on Django and Backbone.js during a free webcast for O’Reilly Media tomorrow, November 6th, 1pm EST. There will be demos and a discussion of common stumbling blocks when building rich client apps.

Register today!

Here’s a description of his talk:

"Django and Backbone are two of the most popular frameworks for web backends and frontends respectively and this webcast will talk about how to use them together effectively. During the session we'll build a simple REST API with Django and connect it to a single page application built with Backbone. This will examine the separation of client and server responsibilities. We'll dive into the differences between client-side and server-side routing and other stumbling blocks that developers encounter when trying to build rich client applications.

If you're familiar with Python/Django but unfamiliar with Javascript frameworks, you'll get some useful ideas and examples on how to start integrating the two. If you're a Backbone guru but not comfortable working on the server, you'll learn how the MVC concepts you know from Backbone can translate to building a Django application."


Tim HopperSundry Links for November 3, 2014

Public Data Sets : Amazon Web Services: Amazon hosts a number of publicly available datasets on AWS (including the common crawl corpus and the "Marvel Universe Social Graph").

Rapid Web Prototyping with Lightweight Tools: I've shared this before, but my boss Andrew did a fantastic tutorial last year on Flask, Jinja2, MongoDB, and Twitter Bootstrap. Combined with Heroku, it's surprisingly easy to get a website running these days.

rest_toolkit: REST has been my obsession of late. Here's a little Python package for quickly writing RESTful APIs.

The Joys of the Craft: A quote from Fred Brooks' The Mythical Man-Month on why programming is fun.

How do I use pushd and popd commands?: I recently learned bash has pushd and popd commands for temporarily changing directories. This is very handy for scripting.

Tim HopperSundry Links for November 1, 2014

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!): I guess the title says it all. By Joel Spolsky.

Unix Shells - Hyperpolyglot: Very cool comparison of basic command syntax in Bash, Fish, Ksh, Tcsh, and Zsh.

Better Bash history: I'm pretty stuck on Bash at the moment. Here's a way to get a better history in Bash. (Other shells often improve on Bash's history.)

usaddress 0.1: I always love seeing a Python library for something I've tried to do poorly on my own: "usaddress is a python library for parsing unstructured address strings into address components, using advanced NLP methods."

more-itertools: A great extension to the helpful itertools module in Python. Some particularly helpful functions: chunked, first, peekaboo, and take. Unfortunately, it doesn't have Python 3 support at the moment.

Tim HopperPyspark's AggregateByKey Method

I can't find a (py)Spark aggregateByKey example anywhere, so I made a quick one.
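
In that spirit, here's a minimal sketch of the method computing a per-key mean. The three arguments are the zero value, a function that folds a value into one partition's accumulator, and a function that merges accumulators across partitions:

    from pyspark import SparkContext

    sc = SparkContext("local", "aggregateByKey-example")
    pairs = sc.parallelize([("a", 1), ("a", 3), ("b", 5)])

    sum_counts = pairs.aggregateByKey(
        (0, 0),                                   # zero value: (sum, count)
        lambda acc, v: (acc[0] + v, acc[1] + 1),  # fold a value into one partition's accumulator
        lambda a, b: (a[0] + b[0], a[1] + b[1]))  # merge accumulators across partitions

    means = sum_counts.mapValues(lambda t: float(t[0]) / t[1])
    print(means.collect())  # [('a', 2.0), ('b', 5.0)]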

Tim HopperSundry Links for September 30, 2014

Hammock: A lightweight wrapper around the Python requests module to convert REST APIs into "dead simple programmatic APIs". It's a clever idea. I'll have to play around with it before I can come up with a firm opinion.

pipsi: Pipsi wraps pip and virtualenv to allow you to install Python command line utilities without polluting your global environment.

Writing a Command-Line Tool in Python: Speaking of Python command line utilities, here's a little post from Vincent Driessen on writing them.

Iterables vs. Iterators vs. Generators: Vincent has been on a roll lately. He also wrote this "little pocket reference on iterables, iterators and generators" in Python.

Design for Continuous Experimentation: Talk and Slides: I didn't watch the lecture, but Dan McKinley's slides on web experimentation are excellent.

Apache Spark: A Delight for Developers: I've been playing with PySpark lately, and it really is fun.

Caktus GroupCelery in Production

(Thanks to Mark Lavin for significant contributions to this post.)

In a previous post, we introduced using Celery to schedule tasks.

In this post, we address things you might need to consider when planning how to deploy Celery in production.

At Caktus, we've made use of Celery in a number of projects ranging from simple tasks to send emails or create image thumbnails out of band to complex workflows to catalog and process large (10+ Gb) files for encryption and remote archival and retrieval. Celery has a number of advanced features (task chains, task routing, auto-scaling) to fit most task workflow needs.

Simple Setup

A simple Celery stack would contain a single queue and a single worker which processes all of the tasks as well as schedules any periodic tasks. Running the worker would be done with

python manage.py celery worker -B

This assumes you're using the django-celery integration, but there are plenty of docs on running the worker (locally as well as daemonized). We typically use supervisord, for which there is an example configuration, but init.d, upstart, runit, or god are all viable alternatives.

The -B option runs the scheduler for any periodic tasks. It can also be run as its own process. See starting-the-scheduler.

We use RabbitMQ as the broker, and in this simple stack we would store the results in our Django database or simply ignore all of the results.

Large Setup

In a large setup we would make a few changes. Here we would use multiple queues so that we can prioritize tasks, and for each queue, we would have a dedicated worker running with the appropriate level of concurrency. The docs have more information on task routing.

The beat process would also be broken out into its own process.

# Default queue
python manage.py celery worker -Q celery
# High priority queue. 10 workers
python manage.py celery worker -Q high -c 10
# Low priority queue. 2 workers
python manage.py celery worker -Q low -c 2
# Beat process
python manage.py celery beat

Note that high and low are just names for our queues, and don't have any implicit meaning to Celery. We allow the high queue to use more resources by giving it a higher concurrency setting.
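
Routing tasks to those queues can be done with a setting along these lines; the task paths are placeholders for your own:

    # settings.py
    CELERY_ROUTES = {
        'myapp.tasks.send_alert': {'queue': 'high'},
        'myapp.tasks.rebuild_reports': {'queue': 'low'},
        # anything unrouted falls through to the default 'celery' queue
    }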

Again, supervisor would manage the daemonization and group the processes so that they can all be restarted together. RabbitMQ is still the broker of choice. With the additional task throughput, the task results would be stored in something with high write speed: Memcached or Redis. If needed, these worker processes can be moved to separate servers, but they would have a shared broker and results store.

Scaling Features

Creating additional workers isn't free. The default concurrency uses a new process for each worker and creates a worker per CPU. Pushing the concurrency far above the number of CPUs can quickly pin the memory and CPU resources on the server.

For I/O heavy tasks, you can dedicate workers using either the gevent or eventlet pools rather than new processes. These can have a lower memory footprint with greater concurrency but are both based on greenlets and cooperative multi-tasking. If there is a library which is not properly patched or greenlet safe, it can block all tasks.

There are some notes on using eventlet, though we have primarily used gevent. Not all of the features are available on all of the pools (time limits, auto-scaling, built-in rate limiting). Previously gevent seemed to be the better supported secondary pool, but eventlet seems to have closed that gap or surpassed it.

The process and gevent pools can also auto-scale. It is less relevant for the gevent pool since the greenlets are much lighter weight. As noted in the docs, you can implement your own subclass of the Autoscaler to adjust how/when workers are added or removed from the pool.

Common Patterns

Task state and coordination is a complex problem. There are no magic solutions whether you are using Celery or your own task framework. The Celery docs have some good best practices which have served us well.

Tasks must assert the state they expect when they are picked up by the worker. You won't know how much time has passed since the original task was queued and when it executes. Another similar task might have already carried out the operation if there is a backlog.

We make use of a shared cache (Memcache/Redis) to implement task locks or rate limits. This is typically done via a decorator on the task. One example is given in the docs though it is not written as a decorator.
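
Here's a minimal sketch of such a lock decorator built on Django's cache; the key scheme and timeout are assumptions, and note that cache.add is only reliably atomic on backends like Memcached:

    from functools import wraps

    from django.core.cache import cache

    def single_instance_task(timeout=60 * 5):
        """Skip the task body if another worker currently holds the lock."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                lock_id = 'task-lock-%s' % func.__name__
                # cache.add only sets the key if it isn't already present.
                if cache.add(lock_id, 'locked', timeout):
                    try:
                        return func(*args, **kwargs)
                    finally:
                        cache.delete(lock_id)
            return wrapper
        return decorator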

Key Choices

When getting started with Celery you must make two main choices:

  • Broker
  • Result store

The broker manages pending tasks, while the result store stores the results of completed tasks.

There is a comparison of the various brokers in the docs.

As previously noted, we use RabbitMQ almost exclusively, though we have used Redis successfully and experimented with SQS. We prefer RabbitMQ because Celery's message passing style and much of the terminology was written with AMQP in mind. There are no caveats with RabbitMQ like there are with Redis, SQS, or the other brokers which have to emulate AMQP features.

The major caveat with both Redis and SQS is the lack of built-in late acknowledgment, which requires a visibility timeout setting. This can be important when you have long running tasks. See acks-late-vs-retry.

To configure the broker, use BROKER_URL.

For the result store, you will need some kind of database. A SQL database can work fine, but using a key-value store can help take the load off of the database, as well as provide easier expiration of old results which are no longer needed. Many people choose to use Redis because it makes a great result store, a great cache server and a solid broker. AMQP backends like RabbitMQ are terrible result stores and should never be used for that, even though Celery supports it.

Results that are not needed should be ignored, using CELERY_IGNORE_RESULT or Task.ignore_result.

To configure the result store, use CELERY_RESULT_BACKEND.
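
Put together, a minimal configuration might look like this; the URLs are placeholders for your own servers:

    # settings.py
    BROKER_URL = 'amqp://guest:guest@localhost:5672//'
    CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
    CELERY_IGNORE_RESULT = False  # or True to discard all results by default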

RabbitMQ in production

When using RabbitMQ in production, one thing you'll want to consider is memory usage.

With its default settings, RabbitMQ will use up to 40% of the system memory before it begins to throttle, and even then can use much more memory. If RabbitMQ is sharing the system with other services, or you are running multiple RabbitMQ instances, you'll want to change those settings. Read the linked page for details.

Transactions and Django

You should be aware that Django's default handling of transactions can be different depending on whether your code is running in a web request or not. Furthermore, Django's transaction handling changed significantly between versions 1.5 and 1.6. There's not room here to go into detail, but you should review the documentation of transaction handling in your version of Django, and consider carefully how it might affect your tasks.

Monitoring

There are multiple tools available for keeping track of your queues and tasks. I suggest you try some and see which work best for you.

Summary

When going to production with your site that uses Celery, there are a number of decisions to be made that could be glossed over during development. In this post, we've tried to review some of the decisions that need to be thought about, and some factors that should be considered.

Tim HopperiOS's Launch Center Pro, Auphonic, and RESTful APIs

Lately I've been using Auphonic's web service for automating audio post-production and distribution. You can provide Auphonic with an audio file (via Dropbox, FTP, web upload, and more), and it will perform any number of tasks for you, including

  • Tag the track with metadata (including chapter markings)
  • Intelligently adjust levels
  • Normalize loudness
  • Reduce background noise and hums
  • Encode the audio in numerous formats
  • Export the final production to a number of services (including Dropbox, FTP, and Soundcloud)

I am very pleased with Auphonic's product, and it's replaced a lot of post-processing tools I tediously hacked together with Hazel, Bash, and Python.

Among its many other features, Auphonic has a robust RESTful API available to all users. I routinely create Auphonic productions that vary only in basic metadata, and I have started using this API to automate creation of productions from my iPhone.

Launch Center Pro is a customizable iOS app that can trigger all kinds of actions in other apps. You can also create input forms in LCP and send the data from them elsewhere. I created a LCP action with a form for entering title, artist, album, and track metadata that will eventually end up in a new Auphonic production.

The LCP action settings look like this2:

When I launch that action in LCP, I get four prompts like this:

After I fill out the four text fields, LCP uses the x-callback URL I defined to send that data to Pythonista, a masterful "integrated development environment for writing Python scripts on iOS."

In Pythonista, I have a script called New Production. LCP passes the four metadata fields I entered as sys.argv variables to my Python script. The Python script adds these variables to a metadata dictionary that it then POSTs to the Auphonic API using the Python requests library. After briefly displaying the output from the Auphonic API, Pythonista returns me to LCP.

Here's my Pythonista script1:

username = "AUPHONIC_USERNAME"
password = "AUPHONIC_PASSWORD"

import sys
import requests
import webbrowser
import time
import json
import datetime as dt

# Read input from LCP
title = sys.argv[1]
artist = sys.argv[2]
album = sys.argv[3]
track = sys.argv[4]

d = {
        "metadata": {
            "title": title,
            "artist": artist,
            "album": album,
            "track": track
            }
    }

# POST production to Auphonic API
r = requests.post("https://auphonic.com/api/productions.json",
          auth=(username,password),
          data=json.dumps(d),
          headers={"content-type": "application/json"}).json()

# Display API Response
print "Response", r["status_code"]
print "Error:", r["error_message"]
for key, value in r.get("data",{}).get("metadata",{}).iteritems():
    if value:
        print key, ':', value

time.sleep(2)
# Return to LCP
webbrowser.open("launch://")

After firing my LCP action, I can log into my Auphonic account and see an incomplete3 production with the metadata I entered!

While I just specified some basic metadata with the API, Auphonic allows every parameter that can be set on the web client to be configured through the API. For example, you can specify exactly what output files you want Auphonic to create, or create a production using one of your presets. These details just need to be added to the d dictionary in the script above. Moreover, this same type of setup could be used with any RESTful API, not just Auphonic's.
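
As a rough sketch (untested; the preset UUID is a placeholder and the output_files key reflects my reading of the API docs), the extended dictionary might look like:

d = {
    "preset": "YOUR_PRESET_UUID",
    "output_files": [{"format": "mp3"}],
    "metadata": {"title": title, "artist": artist,
                 "album": album, "track": track}
}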


  1. If you want to use this script, you'll have to provide your own Auphonic username and password. 

  2. Here is that x-callback URL if you want to copy it: pythonista://{{New Production}}?action=run&args={{"[prompt:Title]" "[prompt:Artist]" "[prompt:Album]" "[prompt:Track]"}} 

  3. It doesn't have an audio file and hasn't been submitted. 

Tim HopperSundry Links for September 25, 2014

Philosophy of Statistics (Stanford Encyclopedia of Philosophy): I suspect that a lot of the Bayesian vs Frequentist debates ignore the important epistemological underpinnings of statistics. I haven’t finished reading this yet, but I wonder if it might help.

Connect Sunlight Foundation to anything: “The Sunlight Foundation is a nonpartisan non-profit organization that uses the power of the Internet to catalyze greater U.S. Government openness and transparency.” They now have an IFTTT channel. Get push notifications when the president signs a bill!

furbo.org · The Terminal: Craig Hockenberry wrote a massive post on how he uses the Terminal on OS X for fun and profit. You will learn things.

A sneak peek at Camera+ 6… manual controls are coming soon to you! : I’ve been a Camera+ user on iOS for a long time. The new version coming out soon is very exciting.

GitHut - Programming Languages and GitHub: A very clever visualization of various languages represented on Github and of the properties of their respective repositories.

Og MacielBooks

Woke up this morning and, as usual, sat down to read the Books section of The New York Times while drinking my coffee. This has become sort of a ‘tradition’ for me and because of it I have been able to learn about many interesting books, some of which I would not have found on my own. I also ‘blame’ this activity for turning my nightstand into a mini-library of its own.

Currently I have the following books waiting for me:

Anyhow, while drinking my coffee this morning I realized just how much I enjoy reading and (what I like to call) catching up with all the books I either read when I was younger but took for granted or finally getting to those books that have been so patiently waiting for me to get to them. And now, whenever I’m not working or with my kids, you can bet your bottom dollar that you’ll find me somewhere outside (when the mosquitos are not buzzing about the yard) or cozily nestled with a book (or two) somewhere quiet around the house.

Book Queue

But to the point of this story, today I realized that, if I could go back in time (which reminds me, I should probably add “The Time Machine” to my list) to the days when I was looking to buy a house, I would have done two things differently:

  1. wired the entire house so that every room would have a couple of ethernet ports;
  2. chosen a house with a large-ish room and added wall-to-wall bookcases, like you see in those movies where a well-off person takes their guests into their private library for tea and biscuits;

I realize that I can’t change the past, and I also realize that perhaps it is a good thing that I took my book reading for granted during my high school and university years… I don’t think I would have enjoyed reading “Dandelion Wine” or “Mrs. Dalloway” as much back then as I did when I finally read them. I guess reading books is very much like the process of making good wines… with age and experience, the reader, not the book, develops the maturity and ability to properly savor a good story.

Tim HopperSundry Links for September 20, 2014

Open Sourcing a Python Project the Right Way: Great stuff that should be taught in school: “Most Python developers have written at least one tool, script, library or framework that others would find useful. My goal in this article is to make the process of open-sourcing existing Python code as clear and painless as possible.”

elasticsearch/elasticsearch-dsl-py: Elasticsearch is an incredible datastore. Unfortunately, its JSON-based query language is tedious, at best. Here’s a nice higher-level Python DSL being developed for it. It’s great!

Equipment Guide — The Podcasting Handbook: Dan Benjamin of 5by5 podcasting fame is writing a book on podcasting. Here’s his brief equipment guide.

bachya/pinpress: Aaron Bach put together a neat Ruby script that he uses to generate his link posts. This is similar to but better than my sundry tool.

Markdown Resume Builder: I haven’t tried this yet, but I like the idea: a Markdown based resume format that can be converted into HTML or PDF.

Git - Tips and Tricks: Enabling autocomplete in Git is something I should have done long ago.

Apache Storm Design Pattern—Micro Batching: Micro batching is a valuable tool when doing stream processing. Horton Works put up a helpful post outlining three ways of doing it.

Caktus GroupImproving Infant and Maternal Health in Rwanda and Zambia with RapidSMS

Image courtesy of UNICEF, the funders of this project.

I have had the good fortune of working internationally on mobile health applications due to Caktus' focus on public health. Our public health work often uses RapidSMS, a free and open-source Django powered framework for dynamic data collection, logistics coordination and communication, leveraging basic short message service (SMS) mobile phone technology. I was able to work on two separate projects tracking data related to the 1000 days between a woman’s pregnancy and the child’s second birthday. Monitoring mothers and children during this time frame is critical as there are many factors that, when monitored properly, can decrease the mortality rates for both mother and child. Both of these projects presented interesting challenges and resulted in a number of takeaways worth further discussion.

Zambia

The first trip took me to Lusaka, the capital of Zambia, to work on Saving Mothers Giving Life (SMGL), which is administered by the Zambia Center for Applied Health Research and Development (ZCAHRD) office. The ZCAHRD office had recently finished a pilot phase, resulting in a number of additional requirements to implement before expanding the project. In addition to feature development and bug fixes, training a local developer was on the docket.

SMGL collects maternal and fetal/child data via SMS text messages.  When an SMS is received by the application, the message is parsed and routed for additional processing based on matching known keywords. For example, I could have a BirthHandler KeywordHandler that allows the application to track new births. Any message that begins with the keyword birth would be further processed by BirthHandler. KeywordHandlers must have, at a minimum, a defined keyword, help and handler functionality:

from rapidsms.contrib.handlers import KeywordHandler

class BirthHandler(KeywordHandler):
    keyword = "birth"

    def help(self):
        self.respond("Send BIRTH BOY or BIRTH GIRL.")

    def handle(self, text):
        if text.upper() == "BOY":
            self.respond("A boy was born!")
        elif text.upper() == "GIRL":
            self.respond("A girl was born!")
        else:
            self.help()

An example session:

 > birth 
 < Send BIRTH BOY or BIRTH GIRL. 
 > birth boy 
 < A boy was born! 
 > birth girl
 < A girl was born!
 > birth pizza
 < Send BIRTH BOY or BIRTH GIRL.

New Keyword Handlers

The new syphilis keyword handler would allow clinicians to track a mother’s testing and treatment data. For our handler, a user supplies the SYP keyword, mother id, the date of the visit followed by the test result indicator or shot series and an optional next shot date:

  SYP ID DAY MONTH YEAR P/N/S[1-3] NEXT_DAY NEXT_MONTH NEXT_YEAR

To record a positive syphilis test result on January 1, 2013 for mother #1 with a next shot date of January 2, 2013, the following SMS would be sent:

  SYP 1 01 01 2013 P 02 01 2013

With these records in hand, the system’s periodic reminder application will send notifications to remind patients of their next visit. Similar functionality exists for tracking pre- and post-natal visits.

The other major feature implemented for this phase was a referral workflow.  It is critical for personnel at facilities ranging from the rural village to the district hospital to be aware of incoming patients with both emergent and non-emergent needs, as the reaction to each case differs greatly.  The format for SMGL referrals is as follows:

  REFER ID FACILITY_ID REASON TIME EM/NEM

To refer mother #1 who is bleeding to facility #100 and requires emergency care:

  REFER 1 100 B 1200 EM

Based on the receiving facility, the reason, and the emergent indicator, different people will be notified of the case. Emergent cases require dispatching ambulances, prepping receiving facilities, and other essential components to increase survivability for the mother and/or child, whereas non-emergent cases may only require clinical workers to be made aware of an inbound patient.

Reporting

The reporting tools were fairly straightforward: web-based views for each keyword handler that present the data in a filterable, sortable, tabular format. In addition, end users can export the data as a spreadsheet for further analysis. These views give clinicians, researchers, and other stakeholders easy access to metrics for analyzing the efficacy of the system as a whole.



Training

As mentioned earlier, training a local developer was also a core component of this visit. This person was the office’s jack of all trades for all things technical, from network and systems administration to shuttling around important data on thumb drives. Given his limited exposure to Python, we spent most of the time in country pair programming, talking through the model-view-template architecture, and finding bite-sized tasks for him to work through when not pair programming.

Zambia Takeaways

  • It was relatively straightforward to write one-off views and exporters for the keyword handlers. But as the number of handlers increases for the project, this functionality could benefit from being abstracted into a generic, DRY reporting tool.
  • When training, advocate that the participant either has 100% of his time allocated or has designated blocks of time drawn up during the day. The ad hoc schedule we worked with was not as fruitful as it could have been, as competing responsibilities often took precedence over actual Django/RapidSMS training.
  • If in Zambia, there are two requisite weekend trips: Victoria Falls and South Luangwa National Park. Visitors to Zambia do themselves a great disservice by not scheduling trips to both areas.

Off to Rwanda!

UNICEF recognized that many countries were working on solving the same problem: monitoring the patients and capturing the data from those all-important first 1000 days. A 1000 Days initiative was put forward, whereby countries would contribute resources and code to a single open source platform that all countries could deploy independently. Evan Wheeler, a UNICEF project manager, contacted Caktus about contributing to this project.

We were tasked with building three RapidSMS components of the 1000 Days architecture: an appointment application, a patient/provider API for storing and accessing records from different backends, and a nutrition monitoring application. We would flesh out these applications before our in-country visit to Kigali, Rwanda. While there, working closely with Evan and our in-country counterparts, we would finish the initial versions of these applications as well as orient the local development team to the future direction of the 1000 Days deployment.

rapidsms-appointments allows users to subscribe to a series of appointments based on a timeline of configurable milestones. Appointment reminders are sent out to patients and staff, and there are mechanisms for confirming, rescheduling, and tracking missed/made appointments. The intent of this application was to create an engine for generating keyword handlers based on appointments. Rather than having to write code for each individual timeline-based series (pre- and post-natal mother visits, for example), one could simply configure these through the admin panel. The project overview documentation provides a great entry point.

rapidsms-healthcare obviates the need for countries to track patient/provider data in multiple databases. Many countries utilize third-party datastores, such as OpenMRS, to create a medical records system. With rapidsms-healthcare in 1000 Days, deployments can take advantage of pre-existing patient and provider data by utilizing a default healthcare storage backend, or by creating a custom backend for their existing datastore. Additional applications can then utilize the healthcare API to access patients and providers.

rapidsms-nutrition is an example of such a library. It consumes patient data from the healthcare API and monitors child growth, generating statistical assessments based on the WHO Child Growth Standards. It utilizes the pygrowup library. With this data in hand, it is relatively easy to create useful visualizations with a library such as d3.js.

Rwanda Takeaways

  • Rwanda is gorgeous.  We had an amazing time in Kigali and at Lake Kivu, one of three EXPLODING LAKES in the world.

No report on Africa would be complete without a few pictures...enjoy!!

 

Tim HopperQuickly Converting Python Dict to JSON

Recently, I've spent a lot of time going back and forth between Python dicts and JSON. For some reason, I decided last week that it'd be useful to be able to quickly convert a Python dict to pretty-printed JSON.

I created a TextExpander snippet that takes a Python dict from the clipboard, converts it to JSON, and pastes it.

Here are the details:

#!/usr/bin/env python
import json
import subprocess

def getClipboardData():
    # Read the OS X clipboard via pbpaste
    p = subprocess.Popen(['pbpaste'], stdout=subprocess.PIPE)
    p.wait()
    return p.stdout.read()

# Note: eval assumes the clipboard holds a trusted Python dict literal
cb = eval(getClipboardData())

print json.dumps(cb, sort_keys=True, indent=4, separators=(',', ': '))

Caktus GroupQ3 Charitable Giving

Our client social impact projects continue here at Caktus, with work presently being done in Libya, Nigeria, Syria, Turkey, Iraq and the US. But every quarter, we pause to consider the excellent nonprofits that our employees volunteer for and, new this quarter, that they have identified as having a substantive influence on their lives. The following list, in alphabetical order, represents the employee-nominated nonprofits we are giving to:

Animal Protection Society of Durham

apsofdurham.org
The Animal Protection Society of Durham (APS) is a non-profit organization that has been helping animals in our community since 1970, and has managed the Durham County Animal Shelter since 1990. APS feeds, shelters and provides medical attention for nearly 7,000 stray, surrendered, abandoned, abused and neglected animals annually.

The Carrack

thecarrack.org
The Carrack is owned and run by the community, for the community, and maintains an indiscriminate open forum that enables local artists to perform and exhibit outside of the constraints of traditional gallery models, giving the artist complete creative freedom.

Scrap Exchange

scrapexchange.org
The Scrap Exchange is a nonprofit creative reuse center in Durham, North Carolina whose mission is to promote creativity and environmental awareness. The Scrap Exchange provides a sustainable supply of high-quality, low-cost materials for artists, educators, parents, and other creative people.

Society for the Prevention of Cruelty to Animals - San Francisco

sfspca.org
As the fourth oldest humane society in the U.S. and the founders of the No-Kill movement, the SF SPCA has always been at the forefront of animal welfare. The SF SPCA’s animal shelter provides pets for adoption.

Southern Coalition for Social Justice

southerncoalition.org
The Southern Coalition for Social Justice was founded in Durham, North Carolina by a multidisciplinary group, predominantly people of color, who believe that families and communities engaged in social justice struggles need a team of lawyers, social scientists, community organizers and media specialists to support them in their efforts to dismantle structural racism and oppression.

Tim HopperSundry Links for September 10, 2014

textract: textract is a Python module and a command line tool for text extraction from many file formats. It cleverly pulls together many libraries into a consistent API.

Flask Kit: I've been reading a lot about Flask (the Python web server) lately. Flask Kit is a little tool to give some structure to new Flask projects.

cookiecutter: I was looking for this recently, but I couldn't find it. "A command-line utility that creates projects from cookiecutters (project templates). E.g. Python package projects, jQuery plugin projects." There's even a Flask template!

Over 50? You Probably Prefer Negative Stories About Young People: A research paper from a few years ago shows that older people prefer to read negative news about young people. "In fact, older readers who chose to read negative stories about young individuals actually get a small boost in their self-esteem."

Episode 564: The Signature: The fantastic Planet Money podcast explains why signatures are meaningless in a modern age. My scribbles have become even worse since listening to this.

github-selfies: Here's a Chrome and Firefox extension that allows you to quickly embed gif selfies in Github posts. Caution: may lead to improved team morale.

Caktus GroupDjangoCon 2014: Recap

Caktus had a great time at DjangoCon in Portland this year! We met up with old friends and new. The following staff gave talks (we’ll update this post with videos as soon as they’re available):

We helped design the website, so it was gratifying to see the hard work of our design team displayed on the program ad and at various points throughout the conference.

For fellow attendees, you probably noticed our giant inflatable duck, who came out in support of Duckling, our conference outings app. He told us he had a good time too.

Here are some pictures of our team at DjangoCon:


Update with Caktus DjangoCon talk video links:

  • Anatomy of a Django Project
  • REST: It’s Not Just for Servers
  • What is the Django Admin Good For?

Tim HopperTracking Weight Loss with R, Hazel, Withings, and IFTTT

As I have noted before, body weight is a noisy thing. Day to day, your weight will probably fluctuate by several pounds. If you're trying to lose weight, this noise can cause unfounded frustration and premature excitement.

When I started a serious weight loss plan a year and a half ago, I bought a wifi-enabled Withings Scale. The scale allows me to automatically sync my weight with Monitor Your Weight, MyFitnessPal, RunKeeper, and other fitness apps on my phone. IFTTT also has great Withings support, allowing me to push my weight to various other web services.

One IFTTT rule I have appends my weight to a text file in Dropbox. This file looks like this:

263.86 August 21, 2014 at 05:56AM
264.62 August 22, 2014 at 08:27AM
264.56 August 23, 2014 at 09:41AM
263.99 August 24, 2014 at 08:02AM
265.64 August 25, 2014 at 08:08AM
267.4 August 26, 2014 at 08:16AM
265.25 August 27, 2014 at 09:08AM
264.17 August 28, 2014 at 07:21AM
264.03 August 29, 2014 at 08:43AM
262.71 August 30, 2014 at 08:47AM

For a few months, I have been experimenting with using this time series to give myself a less-noisy update on my weight, and I've come up with a decent solution.

This R script will take my weight time series, resample it, smooth it with a rolling median over the last month, and write summary stats to a text file in my Dropbox. It's not the prettiest script, but it gets the job done for now.1

INPUT_PATH <- "~/Dropbox/Text Notes/Weight.txt"
OUTPUT_PATH <- "~/Dropbox/Text Notes/Weight Stats.txt"

library(lubridate)
library(ggplot2)
library(zoo)

# READ FILE
con <- file(INPUT_PATH, "rt")
lines <- readLines(con)
close(con)

# PARSE INTO LISTS OF WEIGHTS AND DATES
parse.line <- function(line) {
  s <- strsplit(line, split=" ")[[1]]
  date.str <- paste(s[2:10][!is.na(s[2:10])], collapse=" ")
  date <- mdy_hm(date.str, quiet=TRUE)
  l <- list(as.numeric(s[1]), date)
  names(l) <- c("weight", "date")
  l
}
list.weight.date <- lapply(lines, parse.line)
weights <- lapply(list.weight.date, function(X) X$weight)
dates <- lapply(list.weight.date, function(X) X$date)

# BUILD DATA FRAME
df <- data.frame(weight = unlist(weights), date = do.call("c", dates) )

# CREATE TIME SERIES AND RESAMPLE
ts <- zoo(c(df$weight), df$date)
ts <- aggregate(ts, time(ts), tail, 1)
g <- round(seq(start(ts), end(ts), 60 * 60 * 24), "days")
ts <- na.approx(ts, xout = g)

# FUNCTION TO GET WEIGHT N-DAYS AGO IF WEIGHT IS SMOOTHED BY ROLLING MEDIAN
# OVER A GIVEN (smooth.n) NUMBER OF DAYS
days.ago <- function(days, smooth.n) {
  date <- head(tail(index(ts),days + 1),1)
  smoothed <- rollmedianr(ts, smooth.n)
  as.numeric(smoothed[date])
}

# SMOOTH WEIGHT BY 29 DAYS AND GENERATE SOME SUMMARY STATS
days = 29
current.weight <- days.ago(0, days)
x <- c(current.weight,
       current.weight-days.ago(7, days),
       current.weight-days.ago(30, days),
       current.weight-days.ago(365, days),
       current.weight-max(ts))
x = round(x, 1)
names(x) = c("current", "7days", "30days", "365days", "max")


fileConn <- file(OUTPUT_PATH)
w <- c(paste("Weight (lbs):", x["current"]),
       paste("Total Δ:", x["max"]),
       paste("1 Week Δ:", x["7days"]),
       paste("1 Month Δ:", x["30days"]),
       paste("1 Year Δ:", x["365days"]))
writeLines(w,fileConn)
close(fileConn)

The output looks something like this:

Weight (lbs): 265.7
Total Δ: -112
1 Week Δ: -0.8
1 Month Δ: -4.8
1 Year Δ: -75

I want this script to be run every time my weight is updated, so I created a second IFTTT rule that will create a new file in my Dropbox, called new_weight_measurement, every time I weigh in. On my Mac Mini, I have a Hazel rule to watch for a file of this name to be created. When Hazel sees the file, it runs my R script and deletes that file.

My Hazel rule looks like this:

The 'embedded script' that is run is the R script above; I just have to tell Hazel to use the Rscript shell.2

At this point, every time I step on my scale, a text file with readable statistics about my smoothed weight appears in my Dropbox folder.

Of course, I want this updated information pushed directly to me. Hazel is again the perfect tool for the job. I have a second Hazel rule that watches for Weight Stats.txt to be created. Hazel can pass the path of the updated file into any script of your choice. You could, for example, use Mailgun to email it to yourself or Pushover to push it to your mobile devices. Obviously, I want to tweet mine.

I have a Twitter account called @hopsfitness where I've recently been tracking my fitness progress. On my Mac Mini, I have t configured to access @hopsfitness from the command line. Thus, tweeting my updated statistics is just a matter of a little shell script executed by Hazel:
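
The script itself is tiny; here's a minimal sketch, assuming Hazel passes the matched file's path as the first argument and t is on the PATH:

#!/bin/bash
# $1 is the path to Weight Stats.txt, supplied by Hazel
t update "$(cat "$1")"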

Since this data goes to Twitter, I can get it painlessly pushed to my phone: Twitter still allows you to subscribe to accounts via text message, which I've done with @hopsfitness. A minute or so after I step on my scale, I get a text with useful information about where I am and where I'm going; this is much preferable to the noisy weight I see on my scale.

Update (2014-12-06): I replaced my R script with a Python/pandas script. It requires Python 3 (to render the delta characters).

import dateutil.parser
import pandas as pd
import random
from os.path import expanduser, join
home = expanduser("~")


with open(join(home, "Dropbox/Text Notes/Weight.txt"), "r") as f:
    lines = f.readlines()


def parse_line(line):
    s = line.split(" ")
    weight = float(s[0])
    date = dateutil.parser.parse(' '.join(s[1:4]))
    return date, weight

weight = pd.DataFrame([parse_line(l) for l in lines], columns=["date", "weight"]) \
    .set_index("date") \
    .resample("1D", how="mean")
weight["missing"] = weight.weight.isnull()
weight.weight = weight.weight.interpolate(method="linear")
std = weight.weight.diff().dropna().std()
noise = weight.missing.map(lambda missing: random.normalvariate(0, std) if missing else 0)
weight.weight = weight.weight + noise

smoothed = pd.ewma(weight.weight, span=30)
current = smoothed[-1]
stats = """
Weight (lbs): %(weight).1f
Total Δ: %(total).1f
1 Week Δ: %(week).1f
1 Month Δ: %(month).1f
1 Year Δ: %(year).1f
""".strip() % {"weight": current,
               "total": current - smoothed[0],
               "week": current - smoothed[-8],
               "month": current - smoothed[-32],
               "year": current - smoothed[-366],
               }

with open(join(home, "Dropbox/Text Notes/Weight Stats.txt"), "wb") as f:
    f.write(bytes(stats, 'UTF-8'))

  1. This assumes your input file is formatted like mine, but you could easily adjust the first part of the code for other formats. 

  2. You can download R here; installing it should add Rscript to your system path. 

Tim HopperSundry Links for August 30, 2014

Ggplot2 To Ggvis: I'm a huge fan of ggplot2 for data visualization in R. Here's a brief tutorial for ggplot2 users to learn ggvis for generating interactive plots in R using the grammar of graphics.

From zero to storm cluster for scikit-learn classification | Daniel Rodriguez: This is a very cool, if brief, blog post on using streamparse, my company's open source wrapper for Apache Storm, and scikit-learn, my favorite machine learning library, to do machine learning on data streams.

Pythonic means idiomatic and tasteful: My boss Andrew recently shared an old blogpost he wrote on what it means for code to be Pythonic; I think he's right on track.

Pythonic isn’t just idiomatic Python — it’s tasteful Python. It’s less an objective property of code, more a compliment bestowed onto especially nice Python code.

git workflow: In my ever continuing attempt to be able to run my entire life from Alfred, I recently installed this workflow that makes git repositories on my computer easily searchable.

Alfred-Workflow: Speaking of Alfred, here's a handy Python library that makes it easy to write your own (if you're a Python programmer).

Squashing commits with rebase: Turns out you can use git rebase to clean up your commits before you push them to a remote repository. This can be a great way to make the commits your team sees more meaningful; don't abuse it.

Tim HopperSundry Links for August 28, 2014

How do I generate a uniform random integer partition?: This week, I wanted to generate random partitions of integers. Unsurprisingly, stackoverflow pulled through with a Python snippet to do just that.

Firefox and Chrome Bookmarks: I love Alfred as a launcher in OS X. I use it many, many times a day. I just found this helpful workflow for quickly searching and opening my Chrome bookmarks.

YNAB for iPad is Here: YNAB has been the best thing to ever happen to my financial life. I use it to track all my finances. They just released a beautiful iPad app. Importantly, it brings the ability to modify a budget to mobile!

Distributed systems theory for the distributed systems engineer: I work on distributed systems these days. I need to read some of these papers.

Tim HopperKeeping IPython Notebooks Running in the Background

I spend a lot of time in IPython Notebooks for work. One of the few annoyances of IPython Notebooks is that they require keeping a terminal window open to run the notebook server and kernel. I routinely launch a Notebook kernel in a directory where I keep my work-related notebooks. Earlier this week, I started to wonder if there was a way for me to keep this kernel running all the time without having to keep a terminal window open.

If you've ever tried to do cron-like automation on OS X, you've surely come across launchd, "a unified, open-source service management framework for starting, stopping and managing daemons, applications, processes, and scripts". You've probably also gotten frustrated with launchd and given up.

I recently started using LaunchControl, "a fully-featured launchd GUI"; it's pretty nice and worth $10. It occurred to me that LaunchControl would be a good way to keep my Notebook kernel running in the background.

I created a LaunchControl job to run the following command.

/usr/local/bin/IPython notebook --matplotlib inline --port=9777 --browser=false

This launches an IPython Notebook kernel accessible on port 9777; setting the browser flag to something other than an installed browser prevents a browser window from opening when the kernel is launched.

I added three other launchd keys in LaunchControl:

  • A Working Directory key to tell LaunchControl to start my notebook in my desired folder.
  • A Run At Load key to tell it to start my kernel as soon as I load the job.
  • And a Keep alive key to tell LaunchControl to restart my kernel should the process ever die.

Here's how it looks in LaunchControl:

After I created it, I just had to save and load, and I was off to the races; the IPython kernel starts and runs in the background. I can access my Notebooks by navigating to 127.0.0.1:9777 in my browser. Actually, I added 127.0.0.1 parsely.scratch to my hosts file so I can access my Notebooks at parsely.scratch:9777. This works nicely with Chrome's autocomplete feature. I'm avoiding the temptation to run nginx and give it an even prettier url.

Tim HopperSundry Links for August 25, 2014

How can I pretty-print JSON at the command line?: I needed to pretty print some JSON at the command line earlier today. The easiest way might be to pipe it through python -m json.tool.

Integrating Alfred & Keyboard Maestro: I love Keyboard Maestro for automating all kinds of things on my Mac, but I'm reaching a limit of keyboard shortcuts I can remember. Here's an Alfred workflow for launching macros instead.

streamparse 1.0.0: My team at Parsely is building a tool for easily writing Storm topologies (for processing large volumes of streaming data) in Python. We just released 1.0.0!

TextExpander Tools: Brett Terpstra, the king of Mac hacks, has some really handy tools for TextExpander.

GNU Parallel: GNU parallel is a shell tool for executing jobs in parallel on one or more computers, with xargs-like syntax. Pretty cool. HT http://www.twitter.com/oceankidbilly.

Tim HopperSundry Links for August 23, 2014

Arrow: better dates and times for Python: Arrow is a slick Python library "that offers a sensible, human-friendly approach to creating, manipulating, formatting and converting dates, times, and timestamps". It's a friendly alternative to datetime.

Docker via Homebrew: I'm starting to use Docker ("Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications") on occasion. Here are easy install instructions for Mac users.

Mining Massive Datasets MOOC: I'm terrible at completing MOOCs, but I'm really interested in this new one on Mining Massive Datasets.

URL Pinner - Chrome Web Store: URL Pinner is one of my favorite Chrome Extensions. I use it to automatically pin my Gmail and Rdio windows (which I almost always have open).

Using multitail for monitoring multiple log files: If you work with distributed systems, you're probably used to SSH-ing into multiple machines to access logs. Multitail might save you some time.

Saturday Morning Breakfast Cereal: SMBC shows how job interviews would go if we were more honest.


Frank WierzbickiJython 2.7 beta3 released!

On behalf of the Jython development team, I'm pleased to announce that the third beta of Jython 2.7 is available. I'd like to thank Adconion Media Group (now Amobee) for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Jython 2.7b3 brings us up to language-level compatibility with the 2.7 version of CPython. We have focused largely on CPython compatibility, and so this release of Jython can run more pure Python apps than any previous release. Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

Some highlights of the changes that come in beta 3:
  • Reimplementation of socket/select/ssl on top of Netty 4.
  • Requests now works.
  • Pip almost works (it works with a custom branch).
  • Numerous bug fixes
To get a more complete list of changes in beta 3, see Jim Baker's talk.

As a beta release we are concentrating on bug fixing and stabilization for a production release.

This release is being hosted at Maven Central. The traditional installer can be found here. See the installation instructions for using the installer. Three other versions are available:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Caktus GroupDjangoCon Ticket Giveaway!

Update: Congratulations to @dmpayton for winning this giveaway!

Caktus is giving away a DjangoCon ticket valued at $850! DjangoCon is the main US Django conference and it’s returning to Portland this year, August 30 - September 4th. Meet fellow Django developers, learn what others are doing, and have a good time!

To enter the giveaway: (1) follow us @caktusgroup and (2) retweet our message by clicking the button below:

The giveaway will end Wednesday, August 20th at 9AM PDT. We’ll randomly select a name and alert the winner by 5PM PDT. Please note that only one entry per individual is allowed and winning tickets are non-transferable.

We hope to see you at DjangoCon this year!

Caktus GroupPyOhio Recap: Celery with Python

Caleb Smith recently gave a talk, “Intro to Celery,” at PyOhio (video below). Celery is a pretty popular topic for us here at Caktus. We use it often in our client work and find it very handy. So we were happy Caleb was out in the world, promoting its use. We sat down with him to hear more about PyOhio and Celery.

What did you enjoy about PyOhio?

PyOhio had good quality talks and a broad range of topics including system administration, web development, and scientific programming. This year, they had over 100 talk submissions and 38 spots, so there was a huge interest in speakers and a lot of variety as a result. They have four tracks and sprints every evening.

Also, PyOhio is free. The value of a free conference is that it lowers the barrier to attend to the costs of hotel, food, and travel. Things are pretty affordable in Columbus. So that’s good for students or people without an employer to help cover costs, like freelancers. People do come from a pretty big range of places across the Midwest and South.

They have a good team of volunteers that take care of everything.

Aside from a vegetable, what is Celery and why should developers use it?

Celery is for offloading background tasks so you can have work happening behind-the-scenes while running a web project. A typical web app does everything within requests and any periodic work with cronjobs. A lot of web projects will block a request on work that needs to be done before giving a response. For example, an image upload form might make the user wait while thumbnails are produced. Sometimes, there’s work that your web project needs to do that doesn’t fit within the upper limit of 30 seconds or so to fulfill a request before timing out the request. Celery allows for offloading this work outside of the web request. It also allows for the distribution of work as needed on multiple machines. You can trigger background tasks periodically for things like nightly backups, importing data, checking on updates to a feed or API, or whatever work that needs to run asynchronously in the background. We use this a ton with some of our client work.
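
A minimal sketch of that image-upload example (the task and model names are illustrative, not from a real project):

from celery import shared_task

@shared_task
def make_thumbnails(photo_id):
    # Runs on a worker, outside the request/response cycle.
    from photos.models import Photo  # illustrative model
    photo = Photo.objects.get(pk=photo_id)
    photo.generate_thumbnails()

The view then queues the work with make_thumbnails.delay(photo.pk) and returns immediately, instead of making the user wait while thumbnails are produced.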

What are Celery alternatives?

There are a few significant ones such as RQ, pyres, gearman and kuyruk. I think Celery is the most common choice among these. You can also just use system cron jobs for the periodic tasks, but cron jobs only work on one machine and are rarely well maintained. A task queue solution such as Celery coordinates with a broker to work on different machines.

What do you think are the challenges to getting started with Celery?

A lot of people think that it only works with Django. That was true when Celery was first released but is no longer true. There’s also somewhat of a barrier to entry because of the terminology involved, the work of setting up system resources such as the message broker, and understanding its role within a project.

You were a former public school music teacher and often teach Python in the community for organizations like Girl Develop It. Is there a relationship you see to giving talks?

Giving talks does feel like an extension of teaching. You learn a lot trying to prepare for it. My talk was about how to get everything set up, the basics of how Celery works, and developing a mental model for programming Celery tasks. A project like Celery can seem very difficult if you are approaching the documentation on your own. The high level overview is a little daunting so it’s nice to provide an on-ramp for people.

Our other blog posts contain more on Celery with Python.

Caktus GroupCaleb Smith to Guest Lecture at Iron Yard Academy

Caleb Smith, a developer at Caktus, will be guest lecturing tomorrow to the inaugural class at the Iron Yard in Durham. Iron Yard is a code school that trains its students in modern programming practices and prepares them for immediate hiring upon graduation. Tobias, our CEO, is on the school’s employer advisory board. Caleb will be speaking on his experience as a Python developer. As an exclusive Python shop, we here at Caktus naturally think it’s the best language for new students--28 of the top 30 universities agree.

Joseph TateMoving a Paravirtualized EC2 legacy instance to a modern HVM one

I had to try a few things before I could get this right, so I thought I'd write about it. These steps are what ultimately worked for me. I had tried several other things to no success, which I'll list at the end of the post.

If you have Elastic Compute Cloud (EC2) instances on the "previous generation" paravirtualization based instance types, and want to convert them to the new/cheaper/faster "current generation", HVM instance types with SSD storage, this is what you have to do:

You'll need a donor Elastic Block Store (EBS) volume so you can copy data from it. Either shut down the old instance and detach the EBS volume, or, as I did, snapshot the old system and then create a new volume from the snapshot so that you can mess up without worrying about losing data. (I was also moving my instances to a cheaper data center, which I could only do by moving snapshots around.) If you choose to create a new volume, make a note of which Availability Zone (AZ) you create it in.

Create a new EC2 instance of the desired instance type, configured with a new EBS volume set up the way you want it. Use a base image that's as similar to what you currently have as possible. Make sure you're using the same base OS version, CPU type, and that your instance is in the same AZ as your donor EBS volume. I mounted the ephemeral storage too as a way to quickly rollback if I messed up without having to recreate the instance from scratch.

Attach your donor EBS volume to your new instance as sdf/xvdf, and then mount it on a new directory I'll call /donor:
mkdir /donor && mount /dev/xvdf /donor


Suggested: Mount your ephemeral storage on /mnt:
mount /dev/xvdb /mnt
and rsync / to /mnt:
rsync -aPx / /mnt/
If something goes wrong in the next few steps, you can reverse it by running
rsync -aPx --delete /mnt/ /
to revert to a known working state. The rsync options tell rsync to copy (a)ll files, links, and directories, along with all ownership/permissions/mtime/ctime/atime values; to show (P)rogress; and to not e(x)tend beyond a single file system (this leaves /proc, /sys, and your scratch and donor volumes alone).

Copy your /donor volume data to / by running
rsync -aPx /donor/ / --exclude /boot --exclude /etc/grub.d ...
You can include other excludes (use the paths where the files would land on the final volume, not their paths in the donor system). The excluded paths above are for an Ubuntu system; you should replace /etc/grub.d with the path or paths where your distro keeps its bootloader configuration files. I found that copying /boot alone was insufficient because the files in /boot are merely linked to /etc/grub.d.

Now you should be able to reboot your instance into your new upgraded system. Do so, detach the donor EBS volume, and if you used the ephemeral storage as a scratch copy, reset it as you prefer. Switch your Elastic IP, or change your DNS configuration, test your applications, and then clean up your old instance artifacts. Congratulations, you're done.

Notes:
Be careful of slashes. The rsync command treats /donor/ differently from /donor.

What failed:

  • Converting the EBS snapshot to an AMI, setting the AMI virtualization type to HVM, and launching a new instance from that AMI: the instance failed to boot. (I've had trouble with this with PV instances too with the Ubuntu base image unless I specified a specific kernel, so I'm not sure whether to blame HVM or the Ubuntu base images.)
  • Connecting a copy of the PV EBS volume to a running HVM system and copying /boot to the donor, then replacing sda1 with the donor volume: this also failed to boot, though I think if I'd copied /etc/grub.d too it might have worked. This approach might not get you an SSD-backed EBS volume anyway, if that's desirable.

Caktus GroupOSCON 2014 & REST API Client Best Practices

Mark Lavin, Caktus Technical Director and co-author of the forthcoming Lightweight Django, was recently at OSCON 2014 in Portland, where he gave a talk on improving the relationship between server and client for REST APIs. OSCON, with over 3000 attendees, is one of the largest open source conferences around. I sat down with him to ask him about his time there.

Welcome back! This was your second year speaking at OSCON. How did you enjoy it this year?

I enjoyed it. There’s a variety of topics at OSCON. It’s cool to see what people do with open source—there’s such a large number of companies, technologies, and approaches to solutions. There were great conversations and presentations. I especially liked Ignite OSCON where people gave really well-prepared 5 minute talks.

I participated in the OSCON 5k [Mark received 5th place] too. There were a lot of people out. We went over bridges and went up and down this spiral bridge twice. That race was pretty late for me but fun [began at 9pm PST, which is 12AM EST].

Why did you choose REST API client best practices as a talk topic?

It was something that came out of working on Lightweight Django. I was writing about building REST APIs and the JavaScript clients. This prompted a lot of thinking and researching on how to design both ends of it from Julia (Elman, co-author) and me. I found a lot of mixed content and a lot of things I wasn’t happy to see—people skimping on what I felt were best practices.

I think that you need to think about API design in the same way that you think about websites. How is a client going to navigate the API? If it’s asking for a piece of information, how is it going to find a related piece of information? What actions is it allowed to take? Writing a good server can make a client easier, something I’ve seen in my work at Caktus.

Why do you think this isn’t a more common practice?

The focus is often on building a really fast API, not building an API that’s easy to use necessarily. It’s hard to write the client for most APIs. The information that gets passed to the client isn’t always sufficient. Many APIs don’t spend the time to make themselves discoverable, so the client has to spend a lot of work hard coding to make up for the fact that it doesn’t know the location of resources.

What trade-offs do you think exist?

With relational data models, sometimes you end up trading off normalization. A classical “right way” to build a data model is one that doesn’t repeat itself and that doesn’t store redundant data in a very normalized fashion. Denormalizing data can lead to inconsistencies and duplication, but, at times, it can make things faster.

The API design is similar, particularly when you have deeply relational structures. There were a lot of conversations about how you make this trade-off. Interestingly enough, Netflix gave a talk about their API and its evolution. They said they started with a normalized structure and discoverable API and found that eventually they had to restructure some pieces in a less normalized fashion to get the performance they needed for some of the set-top boxes and game consoles that query their API.

We heard you had an opportunity to give a tutorial. Tell us more about it.

I had the opportunity to help Harry Percival. He recently released a book on Python web development using test-driven development. We’d emailed before and so we knew each other a little bit. He asked me to help him be a TA so I spent Monday morning trying to help people follow his tutorial and get set up learning Python and Django. It was unexpected, but a lot of fun, similar to what Caktus has done with the bootcamps. I like to teach. It’s fun to be a part of that and to help someone understand something they didn’t know before. There were a lot of people interested in learning about Python and Django. I was just happy to participate.

Thanks for sharing your experiences with us Mark!

Thanks for asking me!

Caktus GroupWebsite Redesign for PyCon 2015

PyCon 2015’s website launched today (a day early!). PyCon is the premiere conference for the Python community and one we look forward to attending every year. We’re honored that the Python Software Foundation returned to us this year to revamp the site. We were especially happy to work again with organizer-extraordinaires Ewa Jodlowska and Diana Clarke.

One of the most exciting things for our team is turning ideas into reality. The organizers wanted to retain the colorful nature of the PyCon 2014 site Caktus created. They also wanted the team to use the conference venue, the Palais des congrès de Montréal, as inspiration (pictured below). The new design needed to pay homage to the iconic building without being either too literal or too abstract.


The design team, led by Trevor Ray, worked together to create the design using the stairs as inspiration (seen in the photo above). The stairs allowed a sense of movement. The colored panes are linked in a snake-like manner, a nod to Python’s namesake. If you look carefully, you will also see the letter P. Working in collaboration with the organizers, the team created multiple drafts, fine-tuning the look and feel with each phase of feedback. The final design represents the direction of the client, the inspiration of the building itself, and the team’s own creativity.

In addition to refreshing PyCon’s website, our developers, led by Rebecca Lovewell, made augmentations to Symposion, a Django project for conference websites. We’ve previously worked with Symposion for PyCon 2014 and PyOhio. For this round of changes, the team used these previous augmentations as a jumping-off point for refinements to the scheduler, financial aid processing, and sponsor information sharing.

Up next? A conference t-shirt!

Vinod KurupUsing dynamic queries in a CBV

Let's play 'Spot the bug'. We're building a simple system that shows photos. Each photo has a publish_date and we should only show photos that have been published (i.e. their publish_date is in the past).

``` python models.py
class PhotoManager(models.Manager):

    def live(self, as_of=None):
        if as_of is None:
            as_of = timezone.now()
        return super().get_query_set().filter(publish_date__lte=as_of)
```

And the view to show those photos:

``` python views.py
class ShowPhotosView(ListView):

    queryset = Photo.objects.live()
```

Can you spot the bug? I sure didn't... until the client complained that newly published photos never showed up on the site. Restarting the server fixed the problem temporarily. The newly published photos would show up, but then any photos published after the server restart again failed to display.

The problem is that the ShowPhotosView class body is evaluated once, when the server starts. ShowPhotosView.queryset gets set to the value returned by Photo.objects.live(). That, in turn, is a QuerySet, but it's a QuerySet with as_of set to timezone.now() WHEN THE SERVER STARTS UP. That as_of value never gets updated, so newer photos never get captured in the query.

There are probably multiple ways to fix this, but an easy one is:

``` python views.py
class ShowPhotosView(ListView):

    def get_queryset(self):
        return Photo.objects.live()
```

Now, instead of the queryset being evaluated at server start-up, it's evaluated only when ShowPhotosView.get_queryset() is called, which is when a request is made.

Caktus GroupA Culture of Code Reviews

Code reviews are one of those things that everyone agrees are worthwhile, but sometimes don’t get done. A good way to keep getting the benefits of code reviews is to establish, and even nurture, a culture of code reviews.

When code reviews are part of the culture, people don’t just expect their changes to be reviewed, they want their changes reviewed.

Some advantages of code reviews

We can all agree that code reviews improve code quality by spotting bugs. But there are other advantages, especially when changes are reviewed consistently.

Having your own code reviewed is a learning experience. We all have different training and experiences, and code reviews give us a chance to share what we know with others on the team. The more experienced developer might be pointing out some pitfall they’ve learned by bitter experience, while the enthusiastic new developer is suggesting the latest library that can do half the work for you.

Reviewing other people’s code is a learning experience too. You’ll see better ways of doing things that you’ll want to adopt.

If all code is reviewed, there are no parts of the code that only one person is familiar with. The code becomes a collaborative product of the team, not a bunch of pieces “owned” by individual programmers.

Obstacles to code reviews

But you only get the benefits of code reviews if you do them. What are some things that can get in the way?

Insufficient staffing is an obvious problem, whether there's only one person working on the code, or no one working on the code has time to review other changes or to wait for their own to be reviewed. To nurture a culture of code reviews, enough staff needs to be allocated to projects to allow code reviews to be a part of the normal process. That means at least two people on a team who are familiar enough with the project to do reviews. If there's not enough work for two full-time team members, one member could be part-time, or even both. Better two people working on a project part-time than one person full-time.

Poor tools can inhibit code reviews. The more difficult something is, the more likely we are to avoid doing it. Take the time to adopt a good set of tools, whether GitHub’s pull requests, the open source ReviewBoard project, or anything else that handles most of the tedious parts of a code review for you. It should be easy to give feedback, linked to the relevant changes, and to respond to the feedback.

Ego is one of the biggest obstacles. No one likes having their work criticized. But we can do things in ways that reduce people’s concerns.

Code reviews should be universal - everyone’s changes are reviewed, always. Any exception can be viewed, if someone is inclined that way, as an indication that some developers are “better” than others.

Reviews are about the code, not the coder. Feedback should be worded accordingly. Instead of saying “You forgot to check the validity of this input”, reviewers can say “This function is missing validation of this input”, and so forth.

We do reviews because our work is important and we want it to be as good as possible, not because we expect our co-workers to screw it up. At the same time, we recognize that we are all human, and humans are fallible.

Establishing a culture of code reviews

Having a culture where code reviews are just a normal part of the workflow, and we’d feel naked without them, is the ideal. But if you’re not there yet, how can you move in that direction?

It starts with commitment from management. Provide the proper tools, give projects enough staffing so there’s time for reviews, and make it clear that all changes are expected to be reviewed. Maybe provide some resources for training.

Then, get out of the way. Management should not be involved in the actual process of code reviews. If developers are reluctant to have other developers review their changes, they’re positively repelled by the idea of non-developers doing it. Keep the actual process something that happens among peers.

When adding code reviews to your workflow, there are some choices to make, and I think some approaches work better than others.

First, every change is reviewed. If developers pick and choose which changes are reviewed, inevitably someone will feel singled out, or a serious bug will slip by in a “trivial” change that didn’t seem to merit a review.

Second, review changes before they’re merged or accepted. A “merge then review” process can result in everyone assuming someone else will review the change, and nobody actually doing it. By requiring a review and signoff before the change is merged, the one who made the change is motivated to seek out a reviewer and get the review done.

Third, reviews are done by peers, by people who are also active coders. Writing and reviewing code is a collaboration among a team. Everyone reviews and has their own changes reviewed. It’s not a process of a developer submitting a proposed change to someone outside the team for approval.

The target

How will you know when you’re moving toward a culture of code reviews? When people want their code to be reviewed. When they complain about obstacles making it more difficult to get their code reviewed. When the team is happier because they’re producing better code and learning to be better developers.

Vinod KurupSome Emacs Posts

A few cool Emacs posts have flown across my radar, so I'm noting them here for that time in the future when I have time to play with them.

Vinod KurupPygments on Arch Linux

I wrote my first blog post in a little while (ok, ok... 18 months) yesterday and when I tried to generate the post, it failed. Silently failed, which is the worst kind of failure. I'm still not sure why it was silent, but I eventually was able to force it to show me an error message:

```
/home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:354:in `rescue in get_header': Failed to get header. (MentosError)

from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:335:in `get_header'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:232:in `block in mentos'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/1.9.1/timeout.rb:68:in `timeout'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:206:in `mentos'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/pygments.rb-0.3.4/lib/pygments/popen.rb:189:in `highlight'
from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:24:in `pygments'
from /home/vinod/dev/kurup.org/plugins/pygments_code.rb:14:in `highlight'
from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:37:in `block in render_code_block'
from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `gsub'
from /home/vinod/dev/kurup.org/plugins/backtick_code_block.rb:13:in `render_code_block'
from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:12:in `pre_filter'
from /home/vinod/dev/kurup.org/plugins/octopress_filters.rb:28:in `pre_render'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:112:in `block in pre_render'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `each'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:111:in `pre_render'
from /home/vinod/dev/kurup.org/plugins/post_filters.rb:166:in `do_layout'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/post.rb:195:in `render'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:200:in `block in render'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `each'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:199:in `render'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/lib/jekyll/site.rb:41:in `process'
from /home/vinod/.rbenv/versions/1.9.3-p286/lib/ruby/gems/1.9.1/gems/jekyll-0.12.0/bin/jekyll:264:in `<top (required)>'
from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `load'
from /home/vinod/.rbenv/versions/1.9.3-p286/bin/jekyll:23:in `<main>'

```

Professor Google tells me that this happens when you try to run the pygments.rb library in a Python 3 environment (pygments.rb is a Ruby wrapper around the Python Pygments library). The fix is to run the code in a Python 2 virtualenv. I guess the last time I updated my blog, Arch still had Python 2 as the system default. No, I don't want to check how long ago that was.

```
$ mkvirtualenv -p `which python2` my_blog
(my_blog)$ bundle exec rake generate
```

So now I'm running a Ruby command in a Ruby environment (rbenv) inside a Python 2 virtualenv. Maybe it's time to switch blog tools again...

Vinod KurupHow to create test models in Django

It's occasionally useful to be able to create a Django model class in your unit test suite. Let's say you're building a library that provides an abstract model which your users will subclass. Your library never needs to subclass it, but it should still test that a subclass can be created and that its features work. If you create that model in your models.py file, Django will think it is a real part of your library and load it whenever you (or your users) run syncdb. That's bad.

The solution is to create it in a tests.py file within your Django app. If it's not in models.py, Django won't load it during syncdb.

``` python tests.py
from django.db import models
from django.test import TestCase

from .models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)


class AbstractTest(TestCase):
    def test_my_test_model(self):
        self.assertTrue(MyTestModel.objects.create(name='foo'))
```

A problem with this solution is that I rarely use a single tests.py file. Instead, we use multiple test files collected in a tests package. If you try to create a model in tests/test_foo.py, this approach fails because Django tries to create the model in an application named tests, and there is no such app in INSTALLED_APPS. The solution is to set app_label to the name of your app in an inner Meta class.

``` python tests/test_foo.py
from django.db import models
from django.test import TestCase

from ..models import MyAbstractModel


class MyTestModel(MyAbstractModel):
    name = models.CharField(max_length=20)

    class Meta:
        app_label = 'myappname'


class AbstractTest(TestCase):
    def test_my_test_model(self):
        self.assertTrue(MyTestModel.objects.create(name='foo'))
```

Oh, and I almost forgot... if you use South, this might not work, unless you set SOUTH_TESTS_MIGRATE to False in your settings file.
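
For reference, that's a single line in your settings file:

``` python settings.py
# Build the test database with syncdb instead of running South migrations
SOUTH_TESTS_MIGRATE = False
```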

Comments and corrections welcome!

Joe GregorioObservations on hg and git

Having recently moved to using Git from Mercurial here are my observations:

Git just works

No matter what I try to do, there's a short and simple git command that does it. Need to copy a single file from one branch to my current branch, roll back the last two commits and place their changes into the index, or push or pull from a local branch to a differently named remote branch? There are ways to do all of those things. More importantly, Git does them natively; I don't have to turn on plugins to get a particular piece of functionality.
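
For example, those three tasks each come down to a single command (the branch and file names here are made up):

```
# Copy a single file from another branch into the current working tree
git checkout other-branch -- path/to/file.py

# Roll back the last two commits, leaving their changes staged in the index
git reset --soft HEAD~2

# Push the current local branch to a differently named remote branch
git push origin local-branch:remote-branch
```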

Turning on plugins is a hurdle

The fact that what I consider to be core functionality is hidden away in plugins and needs to be turned on manually is an issue. For example, look at this section of the docs for the Google API Python Client:

https://code.google.com/p/google-api-python-client/wiki/BecomingAContributor#Submitting_Your_Approved_Code

A big thing that trips up contributors is that "--rebase" is in a plugin (and I keep forgetting to update the docs to explain that).
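
For the record, enabling rebase in Mercurial is a small edit to your ~/.hgrc, something like:

```
[extensions]
rebase =
```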

Git is fast

So Git is fast, not just "ooh that was fast", but fast as in, "there must have been a problem because there's no way it could have worked that fast". That's a feature.

Branching

Git branches are much smoother and better integrated than MQ. Maybe that's just because I got stuck on MQ and never learned another way to use hg, but the branch-and-merge workflow is a lot better than working with patch queues.

SSH

In Git, ssh: URIs just work for me. Maybe I just got lucky, or was previously unlucky, but I never seemed to be able to pull from or push to a remote repository via ssh with hg, while it worked as advertised with Git.

Helpful

Git is helpful. It is filled with helpful messages, many of the form "it looks like you are trying to do blah, here's the exact command line for that", or "you seem to be in 'weird state foo', here are a couple of different command lines you might use to rectify the situation". Obviously those are paraphrases, but the general idea of providing long, helpful messages with actual commands in them is carried out well throughout Git.

Caveats

I'm not writing this to cast aspersions on the Mercurial developers, and I've already passed this feedback along to developers who work on Mercurial. I am hoping that if you're building command-line tools, you can incorporate some of the qualities described here, such as helpful error messages, speed, and robust out-of-the-box capabilities.

Caktus GroupContributing Back to Symposion

Recently Caktus collaborated with the organizers of PyOhio, a free regional Python conference, to launch the PyOhio 2014 conference website. The conference starts this weekend, July 26 - 27. As in prior years, the conference web site utilizes Eldarion’s Symposion, an open source conference management system. Symposion powers a number of annual conference sites, including PyCon and DjangoCon. In fact, as of this writing, there are 78 forks of Symposion, a nod to its widespread use for events both large and small. This collaboration afforded us the opportunity to abide by one of our core tenets: giving back to the community.

PyOhio organizers had identified a few pain points during last year’s rollout that could be resolved in a way that contributed back to Symposion, so that future adopters could benefit from the work. The areas we focused on were migration support, refining the user experience for proposal submitters and sponsor applicants, and schedule building.

Migration Support

https://github.com/pinax/symposion/pull/47

The majority of our projects use South for tracking database migrations. Migrations are not an absolute requirement, but for conferences that reuse the same code base from year to year, rather than starting a new repository, it is beneficial to have a migration strategy in place. There were a few minor implementation details to tackle, namely migration dependencies and introspection rules. The Symposion framework has a number of interdependent apps, so when using migrations the database tables must be created in a certain order. For Symposion, there are two such dependencies: Proposals depend on Speakers, and Sponsorship depends on Conferences. The implementation can be seen in this changeset. In addition, Symposion uses a custom field on certain models, django-timezones’ TimeZoneField. There are a few pull requests open on that project to deal with South introspection rules, but none of them have been incorporated, so we added a very simple rule to work around migration errors.
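
As a rough sketch of both pieces (the actual code is in the pull request above; the app and migration names here are illustrative):

``` python
from south.modelsinspector import add_introspection_rules
from south.v2 import SchemaMigration


# Illustrative: in the proposals app's initial migration, declare that
# the speakers tables must be created first.
class Migration(SchemaMigration):
    depends_on = (
        ('speakers', '0001_initial'),
    )
    # ... forwards()/backwards() as generated by South ...


# Teach South to freeze django-timezones' custom TimeZoneField so
# migrations involving it don't error out.
add_introspection_rules([], [r'^timezones\.fields\.TimeZoneField'])
```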

As mentioned before, these migrations give Symposion a solid migration workflow for future database changes, as well as prepping for Django 1.7’s native schema migration support.

User Experience Issues

Currently, if an unauthenticated user manages to make a proposal submission, they are simply redirected to the home page of the site. Similarly, if an authenticated user without a Speaker profile makes a submission, they are redirected to their dashboard. In both cases, there is no feedback about what the user should do next. We used the Django messages framework to provide contextual feedback, with help text and hyperlinks, for cases where these are valid submission attempts (https://github.com/pinax/symposion/pull/50/files).
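
A hypothetical view fragment shows the shape of the change (the real diff is in the linked pull request):

``` python
from django.contrib import messages
from django.shortcuts import redirect


def create_proposal(request, *args, **kwargs):
    # Hypothetical: give unauthenticated submitters an explanation and a
    # next step instead of a silent redirect to the home page.
    if not request.user.is_authenticated():
        messages.info(
            request,
            'To submit a proposal, please log in or create an account first.'
        )
        return redirect('home')
    # ... handle the actual submission ...
```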

Sponsor submission is another area that benefited from additional contextual messages. There are a variety of sponsor levels (Unobtanium, Aluminum, etc.) that carry their own sponsor benefits (a print ad in the program, for example). The current workflow redirects a sponsor application, with no contextual message, to the Sponsor Details page, which lists sponsor and benefit details. For sponsor levels with no benefits, this essentially redirects you to an update form for the details you just submitted. Our pull request redirects these cases to the user dashboard with an appropriate message, and provides a more helpful message for sponsor applications that do carry benefits (https://github.com/pinax/symposion/pull/49/files).

Schedule Builder

https://github.com/pinax/symposion/pull/51/files

The conference schedule is a key component of the web site, as it lets attendees (and speakers) know when to be where! It is also a fairly complex app, with a number of interwoven database tables and several components required to build a conference schedule.

At a minimum, creating one scheduled presentation requires 7 objects spread across 7 different tables. Scale this out to tens or nearly one hundred talks, and the process of manually building a schedule becomes egregiously cumbersome. For PyCon 2014 we built a custom importer for the talks schedule. A quick glance reveals that it is not easily reusable: there are pinned lunches and breaks, and this particular command assigns accepted proposals to schedule slots. For PyOhio, we wanted to provide something more generic and reusable. Rather than building out the entire schedule of approved talks, we wanted a fairly quick and intuitive way for an administrator to build the schedule’s skeleton via the frontend using a CSV file. The format of the CSV is intentionally basic, for example:

"date","time_start","time_end","kind"," room "
"12/12/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/12/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room1"
"12/12/2013","11:00 AM","12:00 PM","talk","Room2"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/12/2013","12:00 PM","12:45 PM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room2"
"12/13/2013","10:00 AM","11:00 AM","plenary","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room1"
"12/13/2013","11:00 AM","12:00 PM","talk","Room2"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room1"
"12/13/2013","12:00 PM","12:45 PM","plenary","Room2"

This sample, when uploaded, will create the requisite backend objects (2 Day, 2 Room, 2 SlotKind, 8 Slot, and 12 SlotRoom objects). The initial implementation fails if schedule overlaps occur, allows for frontend deletion of schedules, is tested, and is documented. Having a schedule builder allows conference organizers to divert more energy into reviewing and securing great talks and keynotes, rather than dealing with the minutiae of administering the schedule itself.
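
A minimal sketch of the parsing side, assuming the column names from the sample above (the real builder, with its overlap checks and object creation, lives in the pull request):

``` python
import csv
from datetime import datetime

# Minimal sketch: pull the date/time/kind/room pieces out of each CSV
# row; these map onto the Day, Room, SlotKind, Slot, and SlotRoom objects.
with open('schedule.csv') as f:
    for row in csv.DictReader(f):
        day = datetime.strptime(row['date'], '%m/%d/%Y').date()
        start = datetime.strptime(row['time_start'], '%I:%M %p').time()
        end = datetime.strptime(row['time_end'], '%I:%M %p').time()
        kind = row['kind'].strip()
        room = row['room'].strip()
        # ... get_or_create() the corresponding schedule objects here ...
```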

Symposion is a great Python-based conference management system. We are excited about its broad use in general, and about helping contribute to its future longevity and feature set.

Edited to Add (7/21/2014: 3:31PM): PR Merge Efforts

One of the SciPy2014 tech chairs, Sheila, is part of new efforts to get PRs merged. Join the mailing list to learn more about merges based off the PyOhio fork.
