A planet of blogs from our members...

Caktus Group: LAN Party at Caktus

This past weekend, our wonderful Technology Support Specialist Scott Morningstar hosted a Local Area Network (LAN) party at Caktus HQ. Held twice a year since 2008, the event allows geeks, gamers, and retro technology lovers to relive the nostalgia of multiplayer gaming in the early days of dial-up internet. In other words, everyone brings their own computer, and uses the LAN to play online games in the company of others. These parties are a lot of fun and add a more personal social element to the online gaming community.

This year, participants played Terraria, an action game with creative world-building elements; Artemis, a spaceship bridge simulator; and Counter-Strike: Global Offensive, a team-based modern warfare game. Not only was it wonderful to see our space filled with enthusiastic gamers, but it was doubly exciting that participants joined gameplay remotely from LAN party events in Boston, Massachusetts; Farmville, Virginia; and Minneapolis, Minnesota.

We love being able to support and host such a wide variety of technology-related events in our community meeting space! For information on other functions held in our downtown Durham headquarters, or in our Astro Code School space, be sure to check out the Events page on our website.

Astro Code School: Video - Interview with Lead Instructor Caleb Smith

In this Astro interview video I talk with our Lead Instructor Caleb Smith. We learn about Caleb's formal education, a connection between music and computer programming, and why teaching excites him. Caleb wrote the curriculum and teaches the Astro Code School Python & Django Web Development class.

Don't forget to subscribe to the Astro Code School YouTube channel. We have a lot more videos in the works.

Caktus Group: Lightning Talk Lunch: Two Useful Organizational Tools

Monthly, we organize short Lightning Talks that take place during the lunch hour here at Caktus. Not only does this allow us a wonderful excuse to have lunch delivered from one of our many local foodie options, but it’s an excellent chance to expand our knowledge on a variety of topics. Past talks have included everything from an introduction to synthesizers and other forms of electronic music, to bug fixing, to the design inspiration behind our PyCon 2015 site.

This month, we had two talks on organizational tools for project management and resource sorting. Developer Dan Poirier gave a brief talk on Pinboard, or, as he fondly refers to it, “social bookmarking for introverts.” Essentially, Pinboard is a database for storing, organizing, and sharing links and bookmarks to articles and pages on the web. Though lacking in sharp design or beautiful layout, Pinboard is useful, highly functional, and extremely intuitive. Dan was a wonderful guide in walking us through how he uses Pinboard to store development tips and articles, as well as information related to his various projects for Caktus. He even built his own front-end for the site to help organize his finds for daily use and to share with other Caktus developers.

Our second talk came from Game Designers Edward and Lucas Rowe, who are currently finishing up the work on our Epic Allies app. Before this project, Caktus wanted to try out a new management tool for development; Epic Allies turned out to be a good fit for testing JIRA, the issue and project tracking software from Atlassian. In their talk, Lucas and Edward took us on a tour of JIRA, discussed its functionality for development projects, and showed us how Epic Allies specifically used this highly customizable platform.

All in all it was an informative day, and Dan, Edward, and Lucas may have all won a few converts to their favorite organizational tools. Now we can’t wait to see what’s in the pipeline for our next set of Lightning Talks!

Astro Code School: Announcing Caktus Scholarships for Astro Code School

We’re very pleased to announce that Caktus Group will be sponsoring up to $20,000 worth of scholarships annually for Astro Code School students. There will be twenty $1,000 scholarships. We hope that these scholarships help increase access to code schools and the wider tech industry:

Caktus Group Diversity & Veterans Scholarship

This scholarship aims to support the careers of underrepresented groups in technology, specifically women, people of color, military veterans, and people with disabilities. For classrooms and the tech industry to be the best they can be, they need ideas from diverse groups of people.

Caktus Group North Carolinians Scholarship

Anyone who lives in North Carolina is eligible to receive this scholarship. Caktus was founded in North Carolina and we’ve benefited from the great talent here. We want tech growth in our area to include those who live here.

You can find more information about our scholarships on the financial aid page.

Astro Code School: What I Learned Teaching at UNC

This spring semester, I had the honor of teaching JOMC-583 "Multimedia Programming and Production" for the University of North Carolina at Chapel Hill School of Journalism and Mass Communication. The course requires university permission and two prior multimedia programming courses that focus on frontend web development. It was a wonderful opportunity to partner with the university, especially with a department that has shown leadership in recent years in adopting innovative programs and coursework for students interested in the data-driven area of journalism.

The course centered on backend web development with Python and Django, and also covered other technologies such as Git, SQL, and the Unix command line. As a rough outline, the lecture topics were:

  1. Unix command line

  2. Git and GitHub

  3. Python

  4. Introductory Django

  5. Django views and templates

  6. Django models and data modeling

  7. Frontend development inside a Django project

  8. Miscellaneous topics

  9. Group project time

The course materials were based on Steven King's curriculum for the course from the year prior and are available at https://github.com/calebsmith/j583

At a high-level, the first half of the course was a mixture of lecture and individual assignments while the second half of the course was spent on two projects. The first development project was completed individually and was small in scale. The second and final project was more ambitious and required collaboration using Github. This served as a nice progression from focusing on concrete skills in isolation to applying those skills and developing further experientially.

One of the group projects was deployed successfully to Heroku and is visible here: http://rackfind.herokuapp.com/

While I think the course was a major learning experience for the students, it certainly was for me as well. It was particularly interesting to see which subject areas students picked up easily or struggled with, and how this often differed from my expectations. In particular, some areas that students picked up quickly were:

  1. The essential Unix command line tools, such as pwd, ls, cd, and so on

  2. Python basics

  3. Python packaging and setup, especially pip and virtualenv

  4. Using Git as a sole contributor

  5. Creating a data model

The students were much quicker to learn these concepts than I anticipated. For instance, we spent two lecture periods focusing on developing skills for the command line, but the first class was enough for most tasks. In the future, I would likely plan on needing only one lecture for that topic.

Some topics that required more reinforcement than anticipated were:

  1. Why writing a custom backend is desirable as opposed to a static HTML site

  2. The semantics of Django URL routing

  3. How to glue JavaScript code into Django templates

I think the fundamental reason students struggled with these topics more than anticipated is that they came to backend programming from a frontend web development background.

This was a great experience for me and it was rewarding to see my students succeed in programming with Python and Django. I'm very much looking forward to more opportunities to teach web development in the future.

Astro Code School: Python Beginner’s Night at Astro

Last night we held the first TriPython Python Beginner’s Night. About twenty-three people interested in Python attended. Many of them were very experienced developers who answered all kinds of questions, from the very basic to the advanced.

A big thanks to all the Caktus Group folks who attended. You helped a lot of people! Thanks also to the other volunteers who attended. It's really cool to live in a city with so many people who enjoy helping others.

The next free Python Beginner's Night is Monday July 6, 2015 from 6pm to 8pm here at Astro Code School (map). We'll be here on the first Monday of each month with free pizza and Python experts. If you can join us please RSVP on the Meetup page. See you soon!

Caktus Group: Epic Allies Featured at mHealth at Duke 2015 Conference

At this year’s mHealth at Duke 2015 Conference, Dr. Lisa Hightow-Weidman discussed her current mHealth projects for HIV prevention. Chief among these projects is her work with Caktus Group on Epic Allies, a mobile gaming app that utilizes social media and mini-games to increase adherence to prescribed medication amongst HIV-positive men who have sex with men (MSM).

Why this particular population? According to research, MSM account for two-thirds of all new HIV infections. In fact, they are the only risk group experiencing an increase in incidence, especially in the southern United States. With 83% of young adults using smartphones, a mobile solution is ideal for targeting at-risk youth in this particular population.

Enter Epic Allies, an adherence intervention that seeks to make taking medication fun while providing social, community support. The app combines gaming, anonymous social interactions, medication reminders, and healthy habit rewards systems to encourage adherence to treatment. The app is the result of a Small Business Innovation Research grant from the National Institutes of Health and was built by Caktus Group in partnership with the UNC Institute for Global Health and Infectious Diseases and the Duke Global Health Institute.


Astro Code School: Learn About Astro Code School Info Session

Join us online at 10am EDT on Thursday, June 25, 2015 for a Google Hangout information session. Caleb and I will host the hangout, talk a little bit about Astro, and then answer any questions you might have. Please share this post and RSVP on the Hangout page.

Caktus Group: Stanford Social Innovation Review Highlights Caktus' Work in Libya

The Stanford Social Innovation Review recently featured Caktus in “Text the Vote” in Suzie Boss’ “What’s Next: New Approaches to Social Change” column. It describes how our team of developers built the world’s first SMS voter registration system in Libya using RapidSMS.

Article excerpt

In a classic leapfrogging initiative, Libya has enabled its citizens to complete voter registration via digital messaging technology.

In late 2013, soon after Vinod Kurup joined Caktus Group, an open source software firm based in Durham, N.C., he became the lead developer for a new app. The client was the government of Libya, and the purpose of the app would be to support voter registration for the 2014 national elections in that country. Bomb threats and protests in Libya made in-person registration risky. “I realized right away that this wasn’t your standard tech project,” says Kurup.

As a result of that project, Libya became the first country in the world where citizens can register to vote via SMS text messaging. By the end of 2014, 1.5 million people—nearly half of all eligible voters in Libya—had taken advantage of the Caktus-designed app during two national elections. “This never would have happened in a country like the United States, where we have established systems in place [for registering voters],” says Tobias McNulty, co-founder and CEO of Caktus. “Libya was perfect for it. They didn’t have an infrastructure. They were looking for something that could be built and deployed fast.”

To read the rest of the article, visit the Stanford Social Innovation Review online.

Caktus Group: Robots Robots Ra Ra Ra!!! (PyCon 2015 Must-See Talk: 6/6)

Part six of six in our PyCon 2015 Must-See Series, a weekly highlight of talks our staff enjoyed at PyCon.

I've had an interest in robotics since high school, but always thought it would be expensive and time consuming to actually do. Over the past few years, though, I've observed the rise of open hardware such as the Arduino and the Raspberry Pi, and modules and kits built on top of them, that make this type of project more affordable and accessible to the casual hobbyist. I was excited by Katherine's talk because Robot Operating System (ROS) seems to do for the software side what Arduino and such do for the hardware side.

ROS is a framework that can be used to control a wide range of robots and hardware. It abstracts away the hard work, allowing for a publish-subscribe method of communicating with your robot's subsystems. A plus side is that you can use higher-level programming languages such as Python or Lisp, not just C and C++, and there is already an active and vibrant open source community built up around it. Katherine did multiple demonstrations with a robot arm she'd brought to the talk, accomplishing a lot with a relatively small amount of easily understandable code. She showed that it was even easy to hook in OpenCV and do things like finding a red bottle cap in the robot's field of vision.
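
To give a flavor of the publish-subscribe model Katherine demonstrated, here is a minimal sketch of a ROS publisher written with the standard rospy client library. The node and topic names ("talker", "chatter") are illustrative, not taken from her demo.

import rospy
from std_msgs.msg import String


def talker():
    # Advertise a topic; any node that subscribes to "chatter" receives these messages.
    pub = rospy.Publisher('chatter', String, queue_size=10)
    rospy.init_node('talker', anonymous=True)
    rate = rospy.Rate(1)  # publish once per second
    while not rospy.is_shutdown():
        pub.publish(String(data='hello from Python'))
        rate.sleep()


if __name__ == '__main__':
    try:
        talker()
    except rospy.ROSInterruptException:
        pass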


More in the PyCon 2015 Must-See Talks Series.

Caktus Group: Testing Client-Side Applications with Django Post Mortem

I had the opportunity to give a webcast for O’Reilly Media during which I encountered a presenter’s nightmare: a broken demo. Worse than that, it was a test failure in a presentation about testing. Is there any way to salvage such an epic failure?

What Happened

It was my second webcast and I chose to use the same format for both. I started with some brief introductory slides, but most of the time was spent as a screen share, going through the code as well as running some commands in the terminal. Since this webcast was about testing, this was mostly writing more tests and then running them. I had git branches set up for each phase of the process, and for the first forty minutes this was going along great. Then it came to the grand finale: integrate the server and client tests all together and run one last time. And it failed.

Test Failure

I quickly abandoned the idea of attempting to live debug this error, and since I was at the end anyway, I just went into my wrap-up. Completely humbled and embarrassed, I tried to answer the questions from the audience as gracefully as I could while inside I wanted to just curl up and hide.

Tracing the Error

The webcast was the end of the working day for me, so when I was done I packed up and headed home. I had dinner with my family and tried not to obsess about what had just happened. The next morning, with a clearer head, I decided to dig into the problem. I had done much of the setup on my personal laptop but ran the webcast on my work laptop. Maybe there was something different about the machine setups. I ran the test again on my personal laptop. Still failed. I was sure I had tested this. Was I losing my mind?

I looked through my terminal history. There it was and I ran it again.

Single Test Passing

It passed! I’m not crazy! But what does that mean? I had run the test in isolation and it passed but when run in the full suite it failed. This points to some global shared state between tests. I took another look at the test.

import os

from django.conf import settings
from django.contrib.staticfiles.testing import StaticLiveServerTestCase
from django.test.utils import override_settings

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait


@override_settings(STATICFILES_DIRS=(
    os.path.join(os.path.dirname(__file__), 'static'), ))
class QunitTests(StaticLiveServerTestCase):
    """Iteractive tests with selenium."""

    @classmethod
    def setUpClass(cls):
        cls.browser = webdriver.PhantomJS()
        super().setUpClass()

    @classmethod
    def tearDownClass(cls):
        cls.browser.quit()
        super().tearDownClass()

    def test_qunit(self):
        """Load the QUnit tests and check for failures."""

        self.browser.get(self.live_server_url + settings.STATIC_URL + 'index.html')
        results = WebDriverWait(self.browser, 5).until(
            expected_conditions.visibility_of_element_located(
                (By.ID, 'qunit-testresult')))
        total = int(results.find_element_by_class_name('total').text)
        failed = int(results.find_element_by_class_name('failed').text)
        self.assertTrue(total and not failed, results.text)

It seemed pretty isolated to me. The test gets its own webdriver instance. There is no file system manipulation. There is no interaction with the database, and even if there were, Django runs each test in its own transaction and rolls it back. Maybe this shared state wasn’t in my code.

Finding a Fix

I’ll admit that when people on IRC or Stack Overflow claim to have found a bug in Django, my first instinct is to laugh. However, Django does have some shared state in its settings configuration. The test is using the override_settings decorator, but perhaps there was something preventing it from working. I started to dig into the staticfiles code and that’s where I found it. Django was using the lru_cache decorator for the construction of the staticfiles finders, which means they were being cached after their first access. Since this test was running last in the suite, the change to STATICFILES_DIRS was not taking effect. Fixing my test simply meant busting this cache at the start of the test.

...
from django.contrib.staticfiles import finders, storage
...
from django.utils.functional import empty
...
class QunitTests(StaticLiveServerTestCase):
...
    def setUp(self):
        # Clear the cache versions of the staticfiles finders and storage
        # See https://code.djangoproject.com/ticket/24197
        storage.staticfiles_storage._wrapped = empty
        finders.get_finder.cache_clear()

All Tests Passing

Fixing at the Source

Digging into this problem, it became clear that this wasn’t just a problem with the STATICFILES_DIRS setting; it was a problem with using override_settings with most of the contrib.staticfiles related settings. In fact, I found the easiest fix for my test case by looking at Django’s own test suite. I decided this really needed to be fixed in Django so that this issue wouldn’t bite any other developers. I opened a ticket and a few days later created a pull request with the fix. After some helpful review from Tim Graham, it was merged and included in the recent 1.8 release.

What’s Next

Having a test which passes alone and fails when run in the full suite is a very frustrating problem. It wasn’t something I planned to demonstrate when I started this webcast, but that’s where I ended up. The problem I experienced was entirely preventable if I had prepared for the webcast better. However, my own failing led to a great example of tracking down global state in a test suite and ultimately helped to improve my favorite web framework in just the slightest amount. Altogether I think it makes the webcast better than I could have planned it.

Caktus Group: Tech Community Yoga Now Offered at Caktus

The Caktus office is now home to a weekly yoga class for the tech community of Durham. Via our employee suggestion box, Lead Designer Ross Pike recommended a Caktus yoga class. Through team effort that suggestion will come to fruition next week. Starting Thursday, June 11th, we will be offering a yoga class taught by professional instructor Christina Conley. The class will be open to the public at large and will be held in our community meeting space at our offices in downtown Durham.

If you are interested in joining the yoga class, you can sign up here ($8 per session): http://www.eventbrite.com/e/tech-community-yoga-class-tickets-17261719267

Also, be on the lookout for a Caktus run club in the next few weeks. Here’s to more great ideas from the suggestion box!

Caktus Group: PyLadies RDU and Astro Code School Team Up for an Intro to Django Workshop

This past Saturday, Caktus developer Rebecca Conley taught a 4-hour introductory level workshop in Django hosted by PyLadies RDU. PyLadies RDU is the local chapter of an international mentorship group for women who love coding in Python. Their main focus is to empower women to become more active participants and leaders in the Python open-source community.

The workshop was held in our Astro Code School space and sponsored by Windsor Circle, Astro Code School, and Caktus Group. Leslie Ray, the local organizer of PyLadies, is always looking for new opportunities “to create a supportive atmosphere for women to learn and teach Python.” With a strong interest in building projects in Django herself, Leslie thought an introductory workshop was the perfect offering for those looking to expand their knowledge in Python as well as a great platform from which Rebecca could solidify her own skills in the language.

“Django is practical,” explains Rebecca, “and it’s the logical next step for those with experience in Python looking to expand their toolkit.”

The event was extremely successful, with a total of thirty students in attendance. Rebecca was impressed with the students, who were “enthusiastic and willing to work cooperatively,” which is always key in workshop environments. The class attracted everyone from undergraduates, to PhD students, to those looking into mid-career changes. In addition, she was glad to team up with PyLadies for the workshop, appreciating the group’s goal to provide a free and friendly environment for those wishing to improve and expand on their skills.

As Rebecca put it, “It’s important to create new gateways for people to enter the field. The group of people with interest in and aptitude for programming is large and diverse, and diversity will make this field better. It’s up to those of us already in the field to open more doors and actively welcome and support people when they come in.”

For more information on PyLadies and their local programming, be sure to join their Meetup page, follow them on Twitter, or check out the international PyLadies group page. Other local groups that provide opportunities to code and that we’re proud sponsors of include Girl Develop It! RDU, TriPython, and Code for Durham. For women in tech seeking career support, Caktus also founded Durham Women in Tech.

Astro Code School: Video - Conditionals in Python

In Caleb Smith's third video in our series about beginning Python, he shows you comparison operators, input(), print(), indentation, and if statements in Python. Use http://repl.it/languages/Python3 to follow along in the browser.
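
As a rough idea of the kind of conditional covered in the video (the exact exercises in the video may differ), here are a few lines you can type into repl.it:

# input() reads a string; int() converts it so comparison operators work.
age = int(input("How old are you? "))

if age >= 18:
    print("You can vote.")
elif age >= 16:
    print("You can drive, but not vote yet.")
else:
    print("Hang in there!")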

Don't forget to subscribe to the Astro Code School YouTube channel. We have a lot more videos in the works.

Astro Code School: Video - Using repl.it with Python 3

This is Caleb Smith's second video in our series about beginning Python. It shows you how to use repl.it, a web-based Python shell and text editor. Use http://repl.it/languages/Python3 to follow along in the browser.

Don't forget to subscribe to the Astro Code School YouTube channel. We have a lot more videos in the works.

Astro Code School: Video - Very First Steps with Python

This is Caleb Smith's first video in our series about beginning Python. It introduces some fundamentals of programming in Python. Topics for this video include data values, types, basic operators and variables.

Don't forget to subscribe to the Astro Code School YouTube channel. We have a lot more videos in the works.

Caktus Group: Creating and Using Open Source: A Guide for ICT4D Managers

Choosing an open source product or platform upon which to build an ICT4D service is hard. Creating a sustainable, volunteer-driven open source project is even harder. There is a proliferation of open source tools in the world, but the messaging used to describe a given project does not always line up with the underlying technology. For example, the project may make claims about modularity or pluggability that, upon further investigation, prove to be exaggerations at best. Similarly, managers of ICT4D projects may be attracted to Open Source because of the promise of a “free” product, but as we’ve learned through trial and error at Caktus, it’s not always less costly to adapt an existing open source project than it would be to engineer a quality system from the ground up.

In this post I will go over some of the criteria we look at when evaluating a new open source project, from a developer’s perspective, in the hopes that it helps managers of ICT4D projects make educated decisions about when it makes sense to adopt a pre-existing open source solution. For those ICT4D managers looking to release a new open source platform, what follows may also prove helpful when deciding how best to allocate resources to the initial release and ongoing management of an open source product or platform. To that end, I’ll provide a high level overview of what matters most: licensing, code quality assessments, automated testing, development workflow, documentation, release frequency, and community engagement.

The three things that are most important to ICT4D projects, I would argue, are quick iteration, replicability, and scalability. Quick iteration is required in order to get early drafts of solutions out in front of beneficiaries to pilot as quickly as possible. Replicability is important when a pilot project is ready to be tested in multiple locations. Similarly, once a pilot has been shown to be successful, the ability to quickly scale up that project to meet regional, national, or even international demand is critical.

The problem is that these three success factors often place competing demands on the project. Doing things the quick and dirty way may be perceived as shortening the time to a working solution, but it also means the solution might not work in other contexts. Similarly, the project might hit a technical barrier when it comes time to scale up. With proper planning and execution, however, I believe all three of these — quick iteration, replicability, and scalability — can be achieved in a way that does not require compromises nor starting over from scratch when it comes time to replicate or scale an ICT4D project. Furthermore, we believe strongly at Caktus that doing things the right way the first time minimizes both risk and the time to develop a software project, even for quick, iterative pilots.

Selecting permissive licenses lowers the barrier to entry

There are many types and subtypes of open source licensing, and trying to select a project based on a license can easily get confusing. Generally speaking, we opt for the more permissive BSD- or MIT-style licenses at Caktus when we have the choice. The main thing to consider when using software with more restrictive licenses such as the GPL or AGPL is that they tend to be less business- or donor-friendly and hence may attract a smaller overall community than they would have otherwise. They can also add requirements that your project might not otherwise have had, such as open-sourcing it.

Creating code readable by humans improves scalability

Code quality is something that is easy to forget about early in a project. ICT4D pilots are often like startups: the drive is to get features out the door as quickly as possible to test and prove the minimum viable product (MVP). We believe you can produce work that is both speedily deployed and later easy to scale by focusing on code quality from the start. In software development there is a concept of “technical debt”: moving quickly without concern for quality creates “debt” that must be paid back, with interest accruing over time.

Code quality includes creating code that is readable to fellow developers. As with any language, clarity for the people reading it matters. At Caktus our preference generally tends to be for the Python programming language because it is well known for being highly readable and easy to learn.

For those ICT4D program managers starting new projects, regardless of the programming language, it’s helpful to build in time for the development team to add automated checks that enforce a code formatting standard. For those evaluating a new open source solution, apart from reviewing the code itself, ICT4D program managers can check for the existence of documented coding standards. The end goal is for all developers on a project to write code that is indistinguishable from another developer’s code; you should not be able to tell from looking at a piece of code who wrote it. This makes it easier both to bring new people into the project and for a developer to jump into a part of the code he or she didn’t write, in case the person who wrote it happens to be inaccessible at the time an urgent change is needed. The code should be the product of the team, not a set of disparate individuals, and having code formatting standards in place helps encourage that. At Caktus, we typically use flake8 (run via Travis CI) to check the format of our code automatically each time a developer makes a commit or submits a pull request.

Automated code testing ensures reliability

Automated code testing is both a best practice and necessary to avoid software failures, but we have seen it dismissed in the rush to deploy. The key question for ICT4D program managers to consider in the planning process is what kind of automated testing developers are using. Automated testing includes both “unit” and “integration” testing. “Unit tests” are pieces of code that individually test discrete parts of the overall code base to ensure they continue to work as expected as changes are made to the system. “Integration tests,” similarly, verify that the different components function when combined into a complete system. The end goal of both types of tests is the same: to ensure that the existing software does not break as features are added or changed or bugs are fixed. Absent automated tests, it’s all too easy for something as small as a bug fix to introduce one or more new, unanticipated bugs in other parts of the system.

At Caktus we primarily use Django’s testing framework, which is based on Python’s built-in unittest framework. We also set up Continuous Integration to run tests on every set of changes automatically and email the developers when tests fail, so the team is always aware when the tests aren’t passing. When evaluating whether or not a project relies heavily on automated testing, two things to look for are (a) whether or not the project advertises test coverage (as a percentage; at least 85-90% is preferred), and (b) whether or not the development process requires new features to come bundled with unit tests. As with code quality, if automated tests are left out of a project, I would argue that the time to develop the project will actually increase rather than decrease, because the development team will end up spending time tracking down bugs that would have been caught by the testing framework, time that could have been spent developing features.
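
As a rough illustration of what such a unit test looks like with Django’s framework, here is a minimal sketch; the Item model and app name are hypothetical, not from any Caktus project.

# Hypothetical example of a Django unit test, run with "manage.py test".
from django.test import TestCase

from myapp.models import Item  # assumed example model


class ItemTests(TestCase):
    def test_str_returns_name(self):
        """The string representation of an Item should be its name."""
        item = Item.objects.create(name="water filter")
        self.assertEqual(str(item), "water filter")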

A documented development workflow streamlines new contributions

The development workflow is another important part of any software project, in particular open source projects. Open source projects should have a clearly documented, community supported method for (a) proposing and discussing potential features or other changes, (b) developing those changes, (c) having those changes reviewed and approved by other developers, (d) merging those changes into the main branch(es), and (e) releasing sets of those changes as numbered releases (e.g., v1.2). Whether a project has these things documented can usually be discovered easily by searching for a “developer manual” or “contributors guide,” as well as reviewing the content of the project’s developer mailing list to see evidence of how contributions work in practice. This documentation acts as a clear entry point for both users and developers without which open source projects wither.

At Caktus we typically use a variant of the GitHub Flow model that includes one additional “staging” or “develop” branch that is used to deploy the code to an intermediary “staging” server. This allows code to be tested before being deployed to the production server. A key part of this workflow is the peer code review, a process by which a fellow developer reviews every new change. Not only does the process help detect potential issues early, it also broadens overall knowledge of the code base. Code reviews can’t be done intermittently or when it’s convenient, but should be done for every change being made to the project. We believe creating a culture of code reviews allows individual developers to forgo ego in favor of a drive towards system integrity. One can evaluate whether a project does code reviews by checking a number of places, including the project developer mailing list, the GitHub or BitBucket “pull requests” feature which allows line-by-line reviews, or simply by reviewing the commit log to see if changes are made directly to the “master” or “default” branch or if they’re made to separate “feature” branches first.

Clear documentation helps create sustainable open source projects

Good documentation is fundamental to any successful open source project. Perhaps counterintuitively, it’s just as easy to have too much documentation as it is to have too little. Signs that an open source project takes documentation seriously include things like how often the documentation is referenced on the project’s mailing list(s), where the documentation is stored, how the documentation is edited, and how easy the documentation makes it for new users and developers of the project to come on board. While not always the case, documentation that is automatically generated from the code can be a case of “too much” rather than “good” documentation. Jacob Kaplan-Moss of the Django project wrote a great blog post back in 2009 on writing good technical documentation that is worth a read for anyone putting together documentation for an open source project.

At Caktus we generally have a preference for storing developer-written documentation in the code repository itself; this allows the team to quickly update documentation when code changes are made, and also makes it easy to spot discrepancies between code changes and documentation changes when doing code reviews. While wikis may be easy to update, they tend to fall out of sync with the code because updating them happens as part of a different process. Hosting documentation in a wiki also makes it harder to refer back to older versions of the documentation if you have a system that’s been running for a few years and have not been able to upgrade the underlying platform.

Regular releases and recent “commits” help ensure continuity

One of the first things we tend to look at (in part because it’s one of the easiest) is to check how recently the project we’re evaluating released a new version and/or how recently someone committed new changes to the code. While it’s not always a bad sign if there hasn’t been a release in a year or two, it’s generally better to find projects that have regular releases of at least 2-3 times a year. It can also be a bad sign, for example, if there are lots of frequent commits to the code repository, but the last “released” (numbered) version is many months or years old. This may mean that the release management has fallen off track, and the project is targeting only internal users rather than the larger open source / ICT4D community.

Developer community engagement is necessary to leverage the power of open source

Community engagement and openness are two more important factors to consider when selecting an open source project as the foundation for (or to add to) an ICT4D solution. Community engagement matters because projects without a community of users and contributors tend not to be maintained over the long run. Engagement of the community can be evaluated by reviewing traffic on the project’s mailing list(s) and bug tracker (for both users and developers) and determining the prevailing character of project communications. Key events to look for include the usual response when someone enters a bug report, submits a suggested change or pull request, or proposes a discussion around the project’s development workflow. While reasonable demands can (and should) be placed on new users for following protocol, a high number of rejected changes or disgruntled first-time users tends to be an indicator of poor community relations. These are some of the reasons why we’re big proponents of the Django framework: the community is almost always warm and welcoming and is quick to enforce this culture. In addition to communications, other positive attributes to look for include documentation around adding new members to the core development team as well as codes of conduct or other policies that set forth in a public way the desire to create an inclusive community for all. These things matter because developers are people, and communication -- as in any discipline -- is critical.

Conclusion

While by no means an all-inclusive list, these are some of the factors I think it’s important to consider when selecting a new open source product to use for an ICT4D solution. I hope to have provided useful insight into the developer’s perspective, one that I think ICT4D program managers should consider when evaluating open source projects. I realize selecting projects that hold themselves to the highest standard on all of these points may be a difficult task, so, as with many things, deficiencies in one area may be made up for with excellence in others. Similarly, implementing all of the above points on an open source project you release will not result in a sudden wave of contributions from volunteer developers, but the more you can do, the more you’ll lower the barrier to entry for developers and facilitate community growth.

I hope to update this post from time to time with new ideas and approaches for evaluating open source projects for use in ICT4D, so if you have any questions, comments, or suggested additions, please leave them in the comments section below. I look forward to your feedback!

Caktus Group: Durham Women in Tech (D-WiT) Starts Strong

This past Tuesday we held our very first gathering for the new Durham Women in Tech (D-WiT) Meetup group. There was a huge turnout and a lot of enthusiasm for the community we’re seeking to support and build. It was particularly wonderful to see our recently opened Astro Code School space full of people.

We began with a short mingling period. I loved hearing everyone’s stories as to why they had come. I met a wide variety of women involved or interested in tech, from students just learning to code and looking for more support in that arena, to professionals with long careers hoping to learn effective methods for shaping a more inclusive culture within the tech industry.

Hao delivered a short presentation on the evening’s topic: imposter syndrome, or the feeling that you’ve flown in under the radar and are about to be found out. The feelings of incompetency and anxiety it evokes can be triggered by doing something new, a tendency towards perfectionism, or being different from those around you. For women—and especially for women of color—being different is often a de facto situation in a male-dominated field.

More important than the discussion of what imposter syndrome is, was the discussion of how to combat it. Attendees split into four groups to offer their own personal experiences with imposter syndrome as well as the tools and methods they’ve developed for resisting it. It was such a rewarding experience to walk away with viable solutions and methods for learning to internalize one’s achievements.

Our next meeting will be in July, and I don’t think I’m alone in my excitement to meet again with this new circle of support within the local tech community.

Astro Code School: Astro Launches in Durham

Astro Code School Director Brian Russell tells Durham Mayor Bill Bell about the school

On Friday, May 1, we held our launch party. A lot of people showed up to welcome Astro Code School to Durham and learn about what we do. I had a great time telling our story to guests. Plus it was fun to meet Mayor Bell!

As a resident of the City of Durham I love working Downtown. It's close to where I live, convenient to a lot of great food and drink, and a great place to run into cool people all the time. I feel as if I'm part of something really awesome at a cool time in Durham history.

Astro's mission to educate people fits really well with a community that's committed to serving others. I first learned about this awesome attribute of Durhamites from friends who work at local non-profits. Inspired by them, I joined AmeriCorps in 2004 as a technology VISTA at the Durham Literacy Center. This experience gave me quite an education and was a big influence on me.

A giant thanks to all the people at Caktus Consulting Group who helped organize the event. Without them it wouldn’t have been possible.

Caktus CTO Colin Copeland, Durham Mayor Bill Bell, and Caktus CBO Alex Leman

We’re right in downtown Durham at 108 Morris Street. I hope that when you have a moment you'll stop in and say hi.

Caktus Group: Cakti at CRS ICT4D 2015

This is Caktus’ first year taking part in Catholic Relief Services’ (CRS) Information and Communication Technologies for Development (ICT4D) conference. The theme of this year’s conference is increasing the impact of aid and development tools through innovation. We’re especially looking forward to all of the speakers from organizations like the International Rescue Committee, USAID, World Vision, and the American Red Cross. In fact, the offerings are so vast, we thought we would provide a little cheat sheet to help you find Cakti throughout this year’s conference.

Wednesday, May 27th

How SMS Powers Democracy in Libya. Vinod Kurup will explain how Caktus used RapidSMS, a Django-based SMS framework, to build the world’s first SMS voter registration system in Libya.

Commodity Tracking System (CTS): Tracking Distribution of Commodities. Jack Byrne, the International Rescue Committee’s (IRC) Syria Response Director, will present on the Caktus-built system IRC uses to track humanitarian aid for Syrian refugees.

Friday, May 29th

Before the Pilot: Planning for Scale. Caktus CTO Colin Copeland will be part of a panel discussion on what technology concepts matter most at the start of a project and the various challenges of pilot programs. Also on the panel will be Jake Watson of IRC and Jeff Wishnie of MercyCorps. Hao Nguyen, Caktus’ Strategy Director, will moderate.

Leveraging the Open Source Community for Truly Sustainable ICT4D. CEO Tobias McNulty will provide his insider’s perspective on the open source community and how best to use that community in the development of ICT4D tools and solutions.

Wednesday, Thursday, and Friday

Throughout the conference you can stop by the Caktus booth to read more about our ICT4D projects and services, meet Cakti, or play one of the mini-games from our Epic Allies app.

Not attending the conference? You can follow @caktusgroup and #ICT4D2015 for live updates!

Caktus Group: PyPy.js: What? How? Why? by Ryan Kelly (PyCon 2015 Must-See Talk: 5/6)

Part five of six in our PyCon 2015 Must-See Series, a weekly highlight of talks our staff enjoyed at PyCon.

From Ryan Kelly's talk I learned that it is actually possible, today, to run Python in a web browser (not something that interprets Python-like syntax and translates it into JavaScript, but an actual Python interpreter!). PyPy.js combines two technologies, PyPy (the Python interpreter written in Python) and Emscripten (an LLVM-to-JavaScript converter, typically used for getting games running in the browser), to run PyPy in the browser. This talk is a must-see for anyone who's longed before to write client-side Python instead of JavaScript for a web app. While realistically being able to do this in production may still be a ways off, at least in part due to the multiple megabytes of JavaScript one needs to download to get it working, I enjoyed the view Ryan's talk provided into the internals of this project. PyPy itself is always fascinating, and this talk made it even more so.


More in the PyCon 2015 Must-See Talks Series.

Caktus Group: Announcing the New Durham Women in Tech (DWiT) Meetup

We’re pleased to officially announce the launch of a new meetup: Durham Women in Tech (DWiT). Through group discussions, lectures, panels, and social gatherings, we hope to provide a safe space for women in small and medium-sized Durham tech firms to share challenges, ideas, and solutions. We especially want to support women on the business side in roles such as operations, marketing, business development, finance, and project management.

A small group of us at Caktus decided to start DWiT after being unable to find a local group for those in similar positions to us: we work on the business side and, as part of a growing company, wear many hats. Our roles often include implementing new processes and policies, tasks that influence culture and corporate direction. We have a seat at the table, but it’s not always clear how to help our companies move forward. How do we work towards removing the barriers women face in the tech industry within our roles? How do we help ourselves and our teams when faced with gendered challenges?

By pulling together a group of similar women, we hope to pool everyone’s experiences into a shared resource. We’ve seen the power of communities for female developers through the organizations Caktus supports internationally and locally with mentors and sponsorship, including, amongst others, Girl Develop It RDU, PyLadies RDU, DjangoGirls, and Pearl Hacks. We’re looking forward to strengthening the resources for women in technology in Durham.

Our inaugural meeting is on Tuesday, May 26th at 6 pm. We will be discussing imposter syndrome, a name for those unfortunate moments when one feels like an imposter, despite external evidence to the contrary. RSVP by joining our meetup group.

Caktus Group: Keynote by Catherine Bracy (PyCon 2015 Must-See Talk: 4/6)

Part four of six in our PyCon 2015 Must-See Series, a weekly highlight of talks our staff enjoyed at PyCon.

My recommendation would be Catherine Bracy's Keynote about Code for America. Cakti should be familiar with Code for America. Colin Copeland, Caktus CTO, is the founder of Code for Durham and many of us are members. Her talk made it clear how important this work is. She was funny, straight-talking, and inspirational. For a long time before I joined Caktus, I was a "hobbyist" programmer. I often had time to program, but wasn't sure what to build or make. Code for America is a great opportunity for people to contribute to something that will benefit all of us. I have joined Code for America and hope to contribute locally soon through Code for Durham.


More in the PyCon 2015 Must-See Talks Series.

Caktus Group: Q2 2015 ShipIt Day ReCap

Last Friday everyone at Caktus set aside their regular client projects for our quarterly ShipIt Day, a chance for Caktus employees to take some time for personal development and independent projects. People work individually or in groups to flex their creativity, tackle interesting problems, or expand their personal knowledge. This quarter’s ShipIt Day saw everything from game development to Bokeh data visualization, Lego robots to superhero animation. Read more about the various projects from our Q2 2015 ShipIt Day.


Victor worked on our version of Ultimate Tic Tac Toe, a hit at PyCon 2015. He added in Jeff Bradbury’s artificial intelligence component. Now you can play against the computer! Victor also cleaned up the code and open sourced the project, now available here: github.com/caktus/ultimatetictactoe.

Philip dove into @total_ordering, a Python decorator that fills in missing comparison methods on a class. Philip was curious as to why @total_ordering is necessary, and what the consequences of NOT using it might be. He discovered that though it is helpful in defining sortable classes, it is not as helpful as one would expect. In fact, rather than speeding things up, adding @total_ordering actually slows things down. But, he concluded, you should still use it to cover certain edge cases.
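
For readers unfamiliar with the decorator, here is a minimal sketch of how @total_ordering is typically used (this is not Philip's benchmark code): define __eq__ and one ordering method, and the decorator supplies the rest.

from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)


# __le__, __gt__, and __ge__ are filled in by the decorator.
assert Version(1, 2) <= Version(1, 3)
assert Version(2, 0) > Version(1, 9)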

Karen updated our project template, the foundation for nearly all Caktus projects. The features she worked on will save us all a lot of time and daily annoyance. These included pulling the database from deployed environments, refreshing the staging environment from production, and more.

Erin explored Bokeh, a Python interactive data visualization library. She initially learned about building visualizations without javascript during PyCon (check out the video she recommended by Sarah Bird). She used Bokeh and the Google API to display data points on a map of Africa for potential use in one of our social impact projects.

Jeff B worked on Lisp implementation in Python. PyPy is written in a restricted version of Python (called RPython) and compiled down into highly efficient C or machine code. By implementing a toy version of Lisp on top of PyPy machinery, Jeff learned about how PyPy works.

Calvin and Colin built the beginnings of a live style guide into Caktus’ Django-project-template. The plan was loosely inspired by Mail Chimp's public style guide. They hope to eventually have a comprehensive guide of front-end elements to work with. Caktus will then be able to plug these elements in when building new client projects. This kind of design library should help things run smoothly between developers and the design team for front-end development.

Neil experimented with Mercury hoping the speed of the language would be a good addition to the Caktus toolkit. He then transitioned to building a project in Elm. He was able to develop some great looking hexagonal data visualizations. Most memorable was probably the final line of his presentation: “I was hoping to do more, but it turns out that teaching yourself a new programming language in six hours is really hard.” All Cakti developers nodded and smiled knowingly.

Caleb used Erlang and cowboy to build a small REST API. With more time, he hopes to offer a REST API that provides geospatial searches for points of interest. This involves creating spatial indexes in Erlang’s built-in Mnesia database using geohashes.

Mark explored some of the issues raised in the Django-project-template and developed various fixes for them, including the way secrets are managed. Now anything that needs to be encrypted is encrypted with a public key generated when you bring up the SALT master. This fixes a very practical problem in the development workflow. He also developed a Django-project-template Heroku-style deploy, setting up a proof of concept project with a “git push” to deploy workflow.

Vinod took the time to read fellow developer Mark Lavin’s book Lightweight Django while I took up DRiVE by Daniel H. Pink to read about what motivates people to do good work or even complete rote tasks.

Scott worked with Dan to compare Salt states to Ansible playbooks. In addition, Dan took a look at Ember, working with the new framework as a potential for front-end app development. He built two simple apps, one for organizing albums in a playlist, and one for to-do lists. He had a lot of fun experimenting and working with the new framework.

Edward and Lucas built a minigame for our Epic Allies app. It was a fun, multi-slot, pinball machine game built with Unity3D.

Hunter built an HTML5 game using Phaser.js. Though he didn’t have the time to make a fully fledged video game, he did develop a fun looking boardgame with different characters, abilities, and animations.

NC developed several animations depicting running and jumping, to be used to animate the superheroes in our Epic Allies app. She loved learning about human movement, how to create realistic animations, and outputting the files in ways that will be useful to the rest of the Epic Allies team.

Wray showed us an ongoing project of his: a front-end framework called sassless, “the smallest CSS framework available.” It consists of front-end elements that allow you to set up a page in fractions so that they stay in position when resizing a browser window (to a point) rather than the elements stacking. In other words, you can build a responsive layout with a very lightweight CSS framework.

One of the most entertaining projects of the day was the collaboration between Rebecca C and Rob, who programmed Lego-bots to dance in a synced routine using the Lego NXT software. Aside from it being a lot of fun to watch robots (and coworkers) dance, the presence of programmable Lego-bots prompted a much welcome visit from Calvin’s son Caelan, who at age 9 is already learning to code!

Caktus Group: Interactive Data for the Web by Sarah Bird (PyCon 2015 Must-See Talk: 3/6)

Part three of six in our PyCon 2015 Must-See Series, a weekly highlight of talks our staff enjoyed at PyCon.

Sarah Bird's talk made me excited to try the Bokeh tutorials. The Bokeh library has very approachable methods for creating data visualizations inside of Canvas elements, all via Python. No JavaScript necessary. Who should see this talk? Python developers who want to add a beautiful data visualization to their websites without writing any JavaScript. Also, Django developers who would like to use QuerySets to create data visualizations should watch the entire video, and then rewind to minute 8:50 for instructions on how to use Django QuerySets with a couple of lines of code.

After the talk, I wanted to build my own data visualization map of the world with plot points for one of my current Caktus projects. I followed up with one of the friendly developers from Continuum Analytics to find out that you do not need to spin up a separate Bokeh server to get your data visualizations running via Bokeh.
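
For anyone curious what a minimal Bokeh plot looks like, here is a small sketch that writes a standalone HTML file with no separate Bokeh server; the data points are made up, and the exact API may vary a bit between Bokeh versions.

from bokeh.plotting import figure, output_file, show

# Write the plot to a standalone HTML file; no Bokeh server needed.
output_file("points.html")

p = figure(title="Example points", x_axis_label="x", y_axis_label="y")
p.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=10)

show(p)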

Astro Code School: Fall Registration Now Open

Registration for the fall Python & Django Web Engineering class is open. You can fill out the application form on the Apply page and get more details on the application Process page. The deadline for applying is August 24, 2015. You can find a full syllabus for this class over on its page, be102.

This class is twelve weeks long and full time Monday to Friday from 9 AM – 5 PM. It'll be taught here at the Astro Code School at 108 Morris Street, Suite 1b, Durham, NC.

Python and Django make a powerful team to build maintainable web applications quickly. When you take this course you will build your own web application during lab time with assistance from your teacher and professional Django developers. You’ll also receive help preparing your portfolio and resume to find a job using the skills you’ve learned.

Please contact me if you have any questions.

Caktus GroupCakti Comment on Django's Class-based Views

After PyCon 2015, we were surprised to realize how many of the Cakti who attended had been asked about Django's class-based views (CBVs). We talked about why this might be, and this is a summary of what we came up with.

Lead Front-End Developer Calvin Spealman has noticed that there are many more tutorials on how to use CBVs than on how to decide whether to use them.

Astro Code School Lead Instructor Caleb Smith reminded us that while "less code" is sometimes given as an advantage of using CBVs, it really depends on what you're doing. Each case is different.

I pointed out that there seem to be some common misconceptions about CBVs.

Misconception: Functional views are deprecated and we're all supposed to be writing class-based views now.

Fact: Functional views are fully supported and not going anywhere. In many cases, they're a good choice.

Misconception: CBVs means using the generic class-based views that Django provides.

Fact: You can use as much or as little of Django's generic views as you like and still be using class-based views. I like Vanilla Views as a simpler, easier-to-understand alternative to Django's generic views that still gives all the advantages of class-based views.

So, when to use class-based views? We decided the most common reason is if you want to reuse code across views. This is common, for example, when building APIs.

Caktus Technical Director Mark Lavin has a simple answer: "I default to writing functions and refactor to classes when needed writing Python. That doesn't change just because it's a Django view."

On the other hand, Developer Rebecca Muraya and I tend to start with CBVs, since if the view ever needs to be refactored, that will be a lot easier if it was split into smaller pieces from the beginning. And so many views fall into the standard patterns of Browse, Read, Edit, Add, and Delete that you can often implement them very quickly by taking advantage of a library of common CBVs. But I'll fall back to Mark's system of starting with a functional view when I'm building something with fairly unique behavior.
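
To make the trade-off concrete, here is a minimal sketch of my own (using a hypothetical Book model) of the same listing page written both ways; neither is "the" right answer, but the class-based version becomes attractive once you want to reuse or override small pieces:

from django.shortcuts import render
from django.views.generic import ListView

from .models import Book  # hypothetical model


def book_list(request):
    """Function-based view: everything in one place, easy to follow."""
    books = Book.objects.order_by("title")
    return render(request, "books/book_list.html", {"books": books})


class BookListView(ListView):
    """Class-based view: the same page assembled from reusable pieces."""
    model = Book
    ordering = ["title"]
    template_name = "books/book_list.html"
    context_object_name = "books"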

Tim HopperHow I Became a Data Scientist Despite Having Been a Math Major

Caution: the following post is laden with qualitative extrapolation of anecdotes and impressions. Perhaps ironically (though perhaps not), it is not a data-driven approach to measuring the efficacy of math majors as data scientists. If you have a differing opinion, I would greatly appreciate it if you would carefully articulate it and share it with the world.

I recently started my third "real" job since finishing school; at my first and third jobs I have been a "data scientist". I was a math major in college (and pretty good at it) and spent a year in the math Ph.D. program at the University of Virginia (where I also performed well). These two facts alone would not have equipped me for a career in data science. In fact, it remains unclear to me that those two facts alone would have prepared me for any career (with the possible exception of teaching) without significantly more training.

When I was in college, Business Week published an article declaring "There has never been a better time to be a mathematician." At the time, I saw an enormous disconnect between the piece and what I was being taught in math classes (and thus what I considered to be a "mathematician"). I have since come across other pieces lauding this as the age of the mathematician, and more often than not, I've wondered whether the author knew what students actually studied in math departments.

The math courses I had as an undergraduate were:

  • Linear algebra
  • Discrete math
  • Differential equations (ODEs and numerical)
  • Theory of statistics 1
  • Numerical analysis 1 (numerical linear algebra) and 2 (quadrature)
  • Abstract algebra
  • Number theory
  • Real analysis
  • Complex analysis
  • Intermediate analysis (point set topology)

My program also required a one-semester intro to C++ and two semesters of freshman physics. In my year as a math Ph.D. student, I took analysis, algebra, and topology classes; had I stayed in the program, my future coursework would have been similar: pure math where homework problems consisted almost exclusively of proofs done with pen and paper (or in LaTeX).

Though my current position occasionally requires mathematical proof, I suspect that is rare among data scientists. While the "data science" demarcation problem is challenging (and I will not seek to solve it here), it seems evident that my curriculum lacked preparation in many essential areas of data science. Chief among these are programming skill, knowledge of experimental statistics, and experience with mathematical modeling.

Few would argue that programming ability is not a key skill of data science. As Drew Conway has argued, a data scientist need not have a degree in computer science, but "Being able to manipulate text files at the command-line, understanding vectorized operations, thinking algorithmically; these are the hacking skills that make for a successful data hacker." Many of my undergrad peers, having briefly seen C++ freshman year and occasionally used Mathematica to solve ODEs for homework assignments, would have been unaware that manipulating a file from the command line was even possible, much less have been able to write a simple sed script; my grad school classmates were little different.

Many data science positions require even more than the ability to solve problems with code. As Trey Causey has recently explained, many positions require understanding of software engineering skills and tools such as writing reusable code, using version control, software testing, and logging. Though I gained a fair bit of programming skill in college, these skills, now essential in my daily work, remained foreign to me until years later.

My math training was lacking in statistics courses. Though my brief exposure to mathematical statistics has been valuable in picking up machine learning, experimental statistics was missing altogether. Many data science teams are interested in questions of causal inference and the design and analysis of experiments; some would call these essential skills for a data scientist. I learned nothing about these topics in math departments. Moreover, machine learning, also a cornerstone of data science, is not a subject I could have even defined until after I was finished with my math coursework; at the end of college, I would have said artificial intelligence was mostly about rule-based systems in Lisp and Prolog.

Yet even if statistics had played a more prominent role in my coursework, those who have studied statistics know there is often a gulf between understanding textbook statistics and being able to effectively apply statistical models and methods to real-world problems. This is only one aspect of a bigger issue: mathematical (including statistical) modeling is an extraordinarily challenging problem, but instruction on effectively modeling real-world problems is absent from many math programs. To this day, defining my problem in mathematical terms is one of the hardest problems I face; I am certain I am not alone in this. Though I am now armed with a wide variety of mathematical models, it is rarely clear exactly which model can or should be applied in a given situation.

I suspect that many people, even technical people, are uncertain as to what academic math is beyond undergraduate calculus. Mathematicians mostly work in the logical manipulation of abstractly defined structures. These structures rarely bear any necessary relationship to physical entities or data sets outside the abstractly defined domain of discourse. Though some might argue I am speaking only of "pure" mathematics, this is often true of what is formally known as "applied mathematics". John D. Cook has made similar observations about the limitations of pure and applied math (as proper disciplines) in dubbing himself a "very applied mathematician". Very applied mathematics is "an interest in the grubby work required to see the math actually used and a willingness to carry it out. This involves not just math but also computing, consulting, managing, marketing, etc." These skills are conspicuously absent from most math curricula I am familiar with.

Given this description of how my schooling left me woefully unprepared for a career in data science, one might ask how I have had two jobs with that title. I can think of several (though probably not all) reasons.

First, the academic study of mathematics provides much of the theoretical underpinning of data science. Mathematics underlies the study of machine learning, statistics, optimization, data structures, analysis of algorithms, computer architecture, and other important aspects of data science. Knowledge of mathematics (potentially) allows the learner to more quickly grasp each of these fields. For example, learning how principal component analysis—a math model that can be applied and interpreted by someone without formal mathematical training—works will be significantly easier for someone with earlier exposure to linear algebra. On a meta-level, training in mathematics forces students to think carefully and solve hard problems; these skills are valuable in many fields, including data science.

My second reason is connected to the first: I unwittingly took a number of courses that later played important roles in my data science toolkit. For example, my current work in Bayesian inference has been made possible by my knowledge of linear algebra, numerical analysis, stochastic processes, measure theory, and mathematical statistics.

Third, I did a minor in computer science as an undergraduate. That provided a solid foundation for me when I decided to get serious about building programming skill in 2010. Though my academic exposure to computer science lacked any software engineering skills, I left college with a solid grasp of basic data structures, analysis of algorithms, complexity theory, and a handful of programming languages.

Fourth, I did a master's degree in operations research (after my year as a math Ph.D. student convinced me pure math wasn't for me). This provided me with experience in mathematical modeling, a broad knowledge of mathematical optimization (central to machine learning), and the opportunity to take graduate-level machine learning classes.1

Fifth, my insatiable curiosity in computers and problem solving has played a key role in my career success. Eager to learn something about computer programming, I taught myself PHP and SQL as a high school student (to make Tolkien fan sites, incidentally). Having been given small Mathematica-based homework assignments in freshman differential equations, I bought and read a book on programming Mathematica. Throughout college and grad school, I often tried—and sometimes succeeded—to write programs to solve homework problems that professors expected to be solved by hand. This curiosity has proven valuable time and time again as I've been required to learn new skills and solve technical problems of all varieties. I'm comfortable jumping in to solve a new problem at work, because I've been doing that on my own time for fifteen years.

Sixth, I have been fortunate enough to have employers who have patiently taught me and given me the freedom to learn on my own. I have learned an enormous amount in my two-and-a-half-year professional career, and I don't anticipate slowing down any time soon. As Mat Kelcey has said: always be sure you're not the smartest one in the room. I am very thankful for three jobs where I've been surrounded by smart people who have taught me a lot, and for supervisors who trust me enough to let me learn on my own.

Finally,4 it would be hard for me to overvalue the four and a half years of participation in the data science community on Twitter. Through Twitter, I have the ear of some of data science's brightest minds (most of whom I've never met in person), and I've built a peer network that has helped me find my current and previous jobs. However, I mostly want to emphasize the pedagogical value of Twitter. Every day, I'm updated on the release of new software tools for data science, the best new blog posts in our field, and the musings of some of my data science heroes. Of course, I don't read every blog post or learn every software tool. But Twitter helps me recognize which posts are most worth my time, and because of Twitter, I know something instead of nothing about Theano, Scalding, and dplyr.2

I don't know to what extent my experience generalizes3, in either the limitations of my education or my analysis of my success, but I am obviously not going to let that stop me from drawing some general conclusions.

For those hiring data scientists, recognize that mathematics as taught might not be the same mathematics you need from your team. Plenty of people with PhDs in mathematics would be unable to define linear regression or Bloom filters. At the same time, recognize that math majors are taught to think well and solve hard problems; these skills shouldn't be undervalued. Math majors are also experienced in reading and learning math! They may be able to read academic papers and understand difficult (even if new) mathematics more quickly than a computer scientist or social scientist. Given enough practice and training, they would probably be excellent programmers.

For those studying math, recognize that the field you love, in its formal sense, may be keeping you away from enjoyable and lucrative careers. Most of your math professors have spent their adult lives solving math problems on paper or on a chalkboard. They are inexperienced in and, possibly, unknowledgeable about very applied mathematics. A successful career in pure mathematics will be very hard and will require you to be very good. While there seem to be lots of jobs in teaching, they will rarely pay well. If you're still a student, you have a great opportunity to take control of your career path. Consider taking computer science classes (e.g. data structures, algorithms, software engineering, machine learning) and statistics classes (e.g. experimental design, data analysis, data mining). For both students and graduates, recognize that your math knowledge becomes very marketable when combined with skills such as programming and machine learning; there is a wealth of good books, MOOCs, and blog posts that can help you learn these things. Moreover, the barrier to entry for getting started with production-quality tools has never been lower. Don't let your coursework be the extent of your education. There is so much more to learn!5


  1. At the same time, my academic training in operations research failed, in some respects, to prepare me for a successful career in operations research. For example, practical math modeling was not sufficiently emphasized, and the skills of computer programming and software development were undervalued. 

  2. I have successfully answered more than one interview question by regurgitating knowledge gleaned from tweets. 

  3. Among other reasons, I didn't really plan to get where I am today. I changed majors no fewer than three times in college (physics, CS, and math) and essentially dropped out of two PhD programs! 

  4. Of course, I have plenty of data science skills left to learn. My knowledge of experimental design is still pretty fuzzy. I still struggle with effective mathematical modeling. I haven't deployed a large scale machine learning system to production. I suck at software logging. I have no idea how deep learning works. 

  5. For example, install Anaconda and start playing with some of these IPython notebooks.

Tim HopperPublishing a Static Site Generator from iOS

A few weeks ago, I set up Travis CI so this Pelican-based blog will publish itself when I commit a new post to GitHub.

At the time, I asked on Twitter if there were any good Git clients that would allow me to push new posts from my iPad; I didn't get any promising replies.

However, I just found out about an app called Working Copy, "a powerful Git client for iOS 8 that clones, edits, commits, pushes, and more."

I just cloned my Stigler Diet repo on my iPad, and I'm composing this post from the Whole Foods cafe on my iPad. If you're reading this post, it's because I successfully published it from here!

Astro Code SchoolVideo - Tips for Using Generators in Python

Here's the third screencast video in Caleb Smith's series about functional programming in Python. This one describes generators, iterators, and iterables in Python, with some tips on how to implement generators.
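
For a flavor of what the screencast covers, here is a tiny sketch of my own (not taken from the video): a generator function and a generator expression, both producing values lazily and consumed through the iterator protocol:

def countdown(n):
    """Yield n, n-1, ..., 1 without building a list in memory."""
    while n > 0:
        yield n
        n -= 1

squares = (x * x for x in countdown(5))  # generator expression: also lazy
print(next(squares))   # 25 -- values are computed one at a time
print(list(squares))   # [16, 9, 4, 1] -- the rest of the sequence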

Don't forget to subscribe to the Astro Code School YouTube channel. Lots more educational screencasts to come.

Caktus GroupBeyond PEP 8 by Raymond Hettinger (PyCon 2015 Must-See Talk: 2/6)

Part two of six in our PyCon 2015 Must-See Series, a weekly highlight of talks our staff enjoyed at PyCon.

I think everyone who codes in any language and uses any automated PEP 8 checker or linter should watch this talk. Unfortunately, going into any detail on what I learned (or really was reminded of) would ruin the effect of actually watching the talk, so I'd simply encourage everyone to watch it. I came away wanting to figure out a way to incorporate its lessons into our Caktus development practices.

Frank WierzbickiJython 2.7.0 final released!

On behalf of the Jython development team, I'm pleased to announce that the final release of Jython 2.7.0 is available! It's been a long road to get to 2.7, and it's finally here! I'd like to thank Amobee for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython for - among other things - bug reports, patches, pull requests, documentation changes, support emails, and fantastic conversation on Freenode at #jython.

Along with language and runtime compatibility with CPython 2.7.0, Jython 2.7 provides substantial support for the Python ecosystem. This includes built-in support for pip/setuptools (you can use it with bin/pip) and a native launcher for Windows (bin/jython.exe), meaning you can finally install Jython scripts on Windows.

Jim Baker presented a talk at PyCon 2015 about Jython 2.7, including demos of new features.

Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

This release is being hosted at maven central. There are three main distributions. In order of popularity:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Astro Code SchoolVideo - Implementing Decorators in Python

This screencast provides some insights into implementing decorators in Python using functional programming concepts and demonstrates some instances where decorators can be useful.

In the video, I reference the blog post Python Decorators in 12 Steps by Simeon Franklin for further reading.
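
As a quick taste of the idea (my own sketch, not an example from the video or the linked post): a decorator is just a function that takes a function and returns a new one, which Python then binds to the original name:

import functools


def log_calls(func):
    @functools.wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print("calling %s with %r %r" % (func.__name__, args, kwargs))
        return func(*args, **kwargs)
    return wrapper


@log_calls
def add(a, b):
    return a + b

add(2, 3)  # prints "calling add with (2, 3) {}" and returns 5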

Caktus GroupCaktus Wins Two Communicator Awards for PyCon 2015

We’re thrilled to announce that we’ve won two awards in the 2015 Communicator Awards competition. With over 6,000 entries received from across the US and around the world, the Communicator Awards is considered the largest and most competitive international awards program honoring creative excellence for communications professionals.

Caktus Group was honored with the Gold Award for Excellence in Event Website and the Silver Award of Distinction for Visual Appeal for the PyCon 2015 site. Both awards recognize the work of designer Trevor Ray, developers David Ray and Rebecca Muraya, and project manager Ben Riseling.

Of course, we’re excited for our work to be recognized, but these awards also represent an opportunity for PyCon to receive well-deserved recognition, especially for the hard work of the event’s organizers. With the 2015 Communicator Awards, they have been placed in the company of such large brands as the Canadian Olympic Team, Frito-Lay, Lexus, and Red Hat.

You can learn more about the origins of the site’s design and Trevor’s design process by listening to his lightning talk “Reimagining PyCon 2015”.

Caktus GroupAIGA Durham Studio Tour Recap

This was the first year Caktus Group participated in the AIGA studio tour and the turnout was amazing. From 5:30 PM till the 9:00 PM close, we had visitors ranging from students to tenured professionals in the design and web development fields sharing stories and touring the newly renovated Caktus Group office. Members from the Caktus design, development, and management teams were present to field questions, give tours, and show select works from the past year.

From the Epic Allies team, visitors got to see a preview of the app’s mini games and designs. Epic Allies is an app that seeks to gamify the process of taking HIV medication. The goal is to help HIV-positive individuals develop and maintain positive habits around taking their medication and making other healthy life choices. The Epic Allies project has been in progress since 2012 and it’s been great to see it evolve.

Visitors were also able to view and explore the 2015 PyCon website. The design and development of the website were completed by Caktus Group in early 2015. Elements of the design were then used throughout the PyCon conference venue in Montreal. The bright winding forms of the design worked well on screen, but they really enveloped the venue and tied everything together. It was a fantastic project made possible by the hard work of many Caktus staff and the conference organizers Ewa Jodlowska and Diana Clarke, who were great to work with.

Finally, there was a behind-the-scenes video of the Caktus Group reception sign installation and the original install template. The video was shot and edited by Caktus’ Wray Bowling and showed the start to finish process of installing the reception sign that was beautifully crafted by Jim at ArtCraft Sign Company - Thanks, Jim. Having missed the actual installation of the sign, I’m glad Wray captured the process.

By the time 9 PM rolled around, a lot of work had been viewed, beers had been drunk, and information had been shared with new friends. If you didn’t make it out for this year’s AIGA studio tour, don’t be sad. You can still make it out next year. There are a lot of talented people in the Triangle, and with so many open studio doors you’re bound to run into more than a few of them.

Caktus GroupMarketplace Radio Highlights How Service Info App Helps 1.5 Million Syrian Refugees

Image Courtesy of UK Department for International Development [CC BY 2.0], via Wikimedia Commons

Recently, one of our projects, Service Info, received national attention thanks to a Marketplace interview. American Public Media’s Kai Ryssdal spoke with International Rescue Committee CEO David Miliband about how Service Info is helping 1.5 million refugees of the Syrian conflict in Lebanon. The Syrian conflict is one of the worst ongoing humanitarian crises, accounting for the majority of the world’s refugees.

“We don’t just need to do more in the Syria crisis, but we’ve got to do things differently,” said Miliband. “The refugees from Syria are educated people, they’re tech savvy people.”

Enter Service Info, a platform developed by Caktus in conjunction with the IRC and the United States government to provide a mobile means for refugees to report on, rate, and find the services available to them. Until now, displaced persons have been one among millions, adrift without the means to inform themselves or take an active role in their own care. The Service Info platform acts as a reliable source of information, telling individuals, for instance, where they can cash in vouchers for goods and aid services, or where their children can attend school. More significantly, the platform enables users to comment on these services, and such feedback will in turn improve their quality.

“Until now, there’s been no proper tech platform for [refugees] to find out what services are available to them,” said Miliband.

Service Info is revolutionary in providing just such a platform. Once the system has been in use on the ground for a certain length of time, Caktus and the IRC hope to increase the reach of Service Info by open sourcing the app. Making the source code freely available enables others to use, improve upon, and replicate the platform. Agencies working in conflict zones and natural disasters would be able to use it to support displaced persons.

Listen to the complete interview to learn more about the excellent work being done by the International Rescue Committee in responding to the world’s most challenging crises.

Caktus GroupPyCon 2015 Talks: Our Must See Picks (1/6)

Whether you couldn’t make it to PyCon this year, were busy attending one of the other amazing talks, or were simply too enthralled by the always popular “hallway track”, there are bound to be talks you missed out on. Thankfully, the PyCon staff does an amazing job not only organizing the conference for attendees, but also producing recordings of all the talks for anyone who couldn’t attend. Even if you attended, you couldn’t have seen every talk, so these recordings are a great safety net.

Because there are so many of them, I asked those who attended for suggestions. We will share our six favorites, one per week, over the next few weeks. Take some time to watch and learn from these talented speakers, recommended by Caktus staff who can’t stop talking about the great time they had in Montreal.

Keynote by Jacob Kaplan-Moss

Suggested by Technical Director Mark Lavin

"Jacob's keynote on Sunday was amazing. He really breaks down the myth of the 10x programmer and why it hurts the tech community. Everyone should watch it. I came away from this talk thinking about how we could improve our hiring and review process to ensure we aren't falling in the traps set by this myth. He's an amazing speaker and leader for our community."

Caktus GroupWhy did Caktus Group start Astro Code School?

Our Astro Code School is now officially accepting applications to its twelve-week Python & Django Web Development class for intermediate programmers! To kick off Astro’s opening, we asked Caktus’ CTO and co-founder Colin Copeland, who recently won a 2015 Triangle Business Journal 40 Under 40 Leadership Award, and Astro’s Director Brian Russell to reflect on the development of Astro as well as the role they see the school playing in the Django community.


Why open the East Coast’s first Django and Python-focused code school?

Colin: Technology is an important part of economic growth in the Triangle area and we wanted to make sure those opportunities reached as many residents as possible. We saw that there were no East Coast formal adult training programs for Django or Python, our specialities. We have experience in this area, having hosted successful Django boot camps and private corporate trainings. Opening a code school was a way to consolidate the training side of Caktus’ business while also giving back to the Triangle-area community by creating a training center to help those looking to learn new skills.

Brian: Ultimately, Caktus noticed a need for developers and the lack of a central place to train them. The web framework Django is written in Python, and Python is a great language for beginning coders. Python is the top learning language at the nation’s best universities. Those are the skills prominent here at Caktus. It was an opportunity to train more people and prepare them for the growing technology industry at firms like Caktus.

How has demand for Django-based web applications changed since Caktus first began?

Colin: It has increased significantly. We only do Django development now; we weren’t specialized in that way when we first started. The sheer number of inbound sales requests is much higher than before. More people are aware of Django, and conferences are bigger. Most significantly, it has an ever-growing reputation as a more professional, stable, and maintainable framework than many alternatives.

How does Astro, then, fit into this growth timeline?

Colin: It’s a pretty simple supply and demand ratio. Astro comes out of a desire to add more developers to the field and meet a growing demand for Django coders. The Bureau of Labor Statistics projects a 20% growth in demand for web developers by 2020. It is not practical to wait for today’s college, high school, or even middle-school students to become developers. Many great software developers are adults coming from second or third careers. Our staff certainly reflects this truth. Astro provides one means for talented adults to move into the growing technology industry.

Where do you see Astro fitting in to the local Python and Django community? For instance, how do you envision Astro’s relationship to some of the groups Caktus maintains a strong relationship with, such as Girl Develop It or TriPython?

Colin: Astro’s goals clearly align with those of Girl Develop It in terms of training and support. And the space will be a great place to host events for local groups and classes.

Brian: Yeah, I see it as a very natural fit. We hope to help those organizations by sponsoring meetups, hosting events, and providing free community programs and workshops. And there is the obvious hope that folks from those groups will enroll as students at Astro. I think it’s also important to note that Chris Calloway, one of the lead organizers for TriPython, is a member of the Astro advisory committee. There is a natural friendship with that community.

How do you hope Astro will change and add to Durham’s existing technical community?

Brian: In general, students with training from Astro will be able to bring their skills to local businesses, schools, non-profits—all sorts of organizations. For me, computer programming is like reading, writing, and arithmetic: it should be a part of core curriculum for students these days. It helps people improve their own net worth and contribute to the local economy. Astro is all about workforce development and improving technical literacy: two things that help entrepreneurs and entrepreneurial enterprises.

What are some of the main goals for Astro in its first year?

Brian: I want to help people find better, higher-paying jobs by teaching them, through our 12-week classes, skills that are usable in the current economy. I’m personally interested in socioeconomic justice, and one way to achieve that is by being highly skilled. Training helps people better themselves no matter what kind of education it is. In the 21st century, computer programming education is one of the most powerful tools for job preparedness and improvement.

Colin: I would love to follow alumni who make it through the classes and see how their skills help them in their careers.

A huge amount of work has gone into getting Astro licensed with the North Carolina Community College Board. A lot of code schools are not licensed. Why was this an important step for Astro?

Brian: Mainly because we wanted to demonstrate to potential students and the public at large that we’ve done our due diligence, that other groups and professionals have vetted us and qualified us as prepared to serve. Ultimately we are licensed in order to protect consumers. Not just licensed—we’re bonded, licensed, and insured. And this is an ongoing guarantee to our students. We will be audited annually for six years. I see it as a promise for continuous and ongoing protection, betterment, and improvement.

So, who would you describe as the ideal student for an Astro course?

Brian: A lot of students. Any. All different kinds. But, more specifically? I would recommend it to folks changing their careers, or people who graduated from high school but for one reason or another were not able to go on to higher education. Astro classes will be excellent for job preparedness and training, so they suit anyone looking to market themselves in the current economy.

Additionally, anyone fine-tuning their career after college or even after grad school. Coding and learning to code is an excellent way to earn money to pay for school without getting into debt. Astro is in no way a replacement for higher ed, but coding classes can augment a well-rounded education. Successful people have a diverse education, and learning to code enables people to align their toolkits with the modern job market.


To learn more about Astro, meet Colin and Brian in person, and celebrate the opening of Astro Code School, be sure to stop by the school’s Launch Party on Friday, May 1st from 6:00 to 9:00 pm. Registration is required.

Astro Code SchoolVideo - Functional Programming in Python

In this video our Lead Instructor Caleb Smith presents basic functional programming concepts and how to apply them in Python. Check back later for more screencasts here and on the new Astro YouTube channel.
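
As a small taste of those concepts (my own sketch, not an example from the video): pure functions, higher-order functions, and composition instead of mutation:

from functools import reduce


def compose(f, g):
    """Return a new function that applies g, then f."""
    return lambda x: f(g(x))

double = lambda x: x * 2
increment = lambda x: x + 1
double_then_increment = compose(increment, double)

print(double_then_increment(10))               # 21
print(list(map(double, [1, 2, 3])))            # [2, 4, 6]
print(reduce(lambda a, b: a + b, [1, 2, 3]))   # 6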

Astro Code SchoolIntro to Django by PyLadies RDU

PyLadies RDU will be offering a free four-hour workshop on Django here at Astro Code School on Saturday, May 30, 2015 from 4 PM to 8 PM. It'll be taught by Caktus Django developer Rebecca Conley. For more information and to RSVP, please join the PyLadies RDU meetup group.

Caktus GroupQ1 2015 Charitable Giving

Though our projects often have us addressing issues around the globe, we like to turn our focus to the local level once a quarter with our charitable giving program. Each quarter we ask our employees to suggest charities and organizations that they are involved in or that have had a substantive influence on their lives. It’s our way of supporting not only our own employees, but also the wider community in which we live and work. This quarter we are pleased to be sending contributions to the following organizations:

The Scrap Exchange

http://scrapexchange.org
The Scrap Exchange is a nonprofit creative reuse center in Durham, North Carolina whose mission is to promote creativity and environmental awareness. The Scrap Exchange provides a sustainable supply of high-quality, low-cost materials for artists, educators, parents, and other creative people. This is the second time staff nominated this organization.

Durham County Library

http://durhamcountylibrary.org/
The Durham County Library provides extensive library services, including book, DVD, audiobook, and A/V equipment rentals. They also provide computer services, internet access, meeting and study rooms on site, as well as a bookmobile and Older Adult and Shut-In Services for those unable to visit the library. Aside from the library’s service towards the community, their archives were incredibly helpful in the restoration of the building at 108 Morris St where our office is now located. Caktus is particularly thankful for the work of Lynn Richardson, Local History Librarian of the North Carolina Collection, for her invaluable help in the restoration process.

Preservation Durham

http://preservationdurham.org/
Preservation Durham’s mission is to protect Durham’s historic assets through action, advocacy, and education. They provide home tours, walking tours, and virtual tours of Durham. They also advocate for historic places in peril and provide informative workshops for those interested in preserving and restoring historical sites. Their workshops were vital in the restoration of our historic office building in downtown Durham.

Durham Bike Co-Op

http://www.durhambikecoop.org/
The Durham Bike Co-op is an all-volunteer, nonprofit, community bike project whose programming includes hands-on repair skill share, the earn-a-bike program, various mobile bike clinics, and community ride events. They help people build, repair, maintain and learn about bicycles and bicycle commuting. Their community-oriented vision and shared labor practices are definitively Durham.

Diaper Bank of North Carolina

http://ncdiaperbank.org/
Safety net programs such as food stamps and WIC do not cover diapers. And a healthy supply of diapers can fall out of the financial reach of many using these programs. The Diaper Bank of North Carolina provides diapers to families in need. The organization makes it easy to get involved—in fact, Caktus leadership volunteered not too long ago—and it addresses a critical need in the fight against poverty in the Triangle.

Frank WierzbickiJython 2.7 release candidate 3 available!

On behalf of the Jython development team, I'm pleased to announce that the third release candidate of Jython 2.7 is available! I'd like to thank Amobee for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

This release is being hosted at maven central. There are three main distributions. In order of popularity:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Caktus GroupCaktus Group's Colin Copeland Recognized Among TBJ’s 40 Under 40

Caktus co-founder and Chief Technology Officer, Colin Copeland, is among an outstanding group of top business leaders to receive the Triangle Business Journal’s 2015 40 Under 40 Leadership Award. The award recognizes individuals for their remarkable contributions to their organizations and to the community.

Colin was one of the co-founders of Caktus, started in 2007 around a second-hand Chapel Hill dining room table. Now, Caktus is the nation’s largest custom web and mobile software firm specializing in Django, an open source web framework. Caktus has built over 100 solutions that have reached more than 4 million lives. Clutch.io, a research firm, lists Caktus as one of the nation’s top web development firms. As a direct result of Colin’s guidance and vision, Caktus has built technology that not only helps business clients, but has addressed some of the most difficult global challenges facing us today: humanitarian aid for war refugees, HIV/AIDS, and open access to democracy, among others.

Colin also served as UNICEF’s community coordinator for RapidSMS, a platform to build technology for developing nations quickly and freely. He used his experience as part of the Django open source community to lay the foundations of a global network of developers working towards improving the world. RapidSMS projects, featured on the BBC, Time Magazine, Fast Company, and others, have reached untold millions in the effort to improve daily lives.

Colin, a Durham resident, is passionate about improving his local community. He used his community-building skills and keen technical expertise to found Code for Durham, a volunteer group dedicated to improving civic engagement by building free technology tools. The group includes software developers, designers, civic activists, policy experts, and government employees. Colin, along with key Code for Durham members, successfully lobbied for increased Durham government transparency via a new Open Data Manager position. The group is working on web applications to help with school navigation, homelessness, bike crash locations, and more.

In keeping with the spirit of supporting his local Durham community, Colin led the historic restoration of Caktus’ new headquarters in downtown Durham. He ensured renovations included a community meeting space that could support local technology groups such as TriPython, Girl Develop It RDU, and PyLadies RDU. He is also a member of Durham’s Rotary Club.

A strong advocate for the power of technology to change lives, Colin led the founding of Caktus’ Astro Code School. Astro provides full-time software development education for adults in an inclusive environment, and will increase access to the Triangle’s growing technology industry.

Colin will be honored at the 40 Under 40 Leadership Awards Gala on June 11th at the Cary Prestonwood Country Club. The Triangle Business Journal will also profile him in a special section of their June 12th print edition.

Caktus GroupPyCon 2015 ReCap

The best part of PyCon? Definitely the people. This is my fifth PyCon, so I’ve had a chance to see the event evolve, especially with the fantastic leadership of Ewa Jodlowska and Diana Clarke. We were also lucky enough to work with them on the PyCon 2015 website. This year we were once again located in the Centre-Ville section of Montreal, close to lots of great restaurants and entertainment.

Mark Lavin, David Ray, and Caleb Smith arrived before the official start of the conference to host a workshop on “Building SMS Applications with Django.” As avid users of RapidSMS for many of our projects, including UNICEF’s Project Mwana and the world’s first SMS voter registration app for Libya, it was a great experience to share our knowledge.

We also had a chance to work with future Django developers through the DjangoGirls Workshop this year. Karen Tracey, David Ray, and Mark Lavin served as mentors to help the mentees build their first Django app. It was wonderful to watch new programmers develop their first apps and we are looking forward to participating in similar events in the future.

The conference kicked off Thursday night with a reception where we debuted a game we built during one of our ShipIt Days. Our Caktus-designed “Ultimate Tic Tac Toe” was a huge hit!

Also on Thursday, the O'Reilly booth held a book signing for Mark Lavin’s Lightweight Django that he coauthored with Julia Elman. An impressively long line of people showed up for the event. Luckily, Mark’s around the office enough that we can get him to sign all sorts of books for us.

Look at all those people!

Friday and Saturday the trade booth show was in full swing. At the Caktus booth, people continued to line up to play “Ultimate Tic Tac Toe” and we gave away five copies of Mark’s book, Lightweight Django, as well as three quadcopters. We were sad to see the quadcopters leave the office but hope that the new recipients enjoy playing with them as much as we did.

We also had some visits from our PyCon 2015 ticket giveaway winners. We gave tickets to the Python community at large and to our local community groups here in North Carolina, including TriPython, Girl Develop It RDU, and PyLadies RDU.

Duckling, an app we developed to make it easier to find and join casual outings at conferences, was also in full use this year at PyCon. We brought along the app’s mascot Quacktus. He even had his own Twitter handle this year to give a bird’s eye view of PyCon happenings. It was great to once again use the app to meet new people and catch up with old friends while exploring Montreal.

On the last night of PyCon, PyLadies held their charity auction and Caktus donated a framed collage of Trevor Ray’s preliminary artwork and sketches that went into his redesign of the PyCon 2015 website. We were very honored that it sold for $1,000 (the second-highest-selling item, second only to Disney’s artwork) and are glad we can provide support to all of the awesome work PyLadies does for the community.

PyCon was, as always, a terrific time for us and we can’t wait until 2016. See you in Portland!

Caktus GroupNow Launching: Astro Code School for Django and Python Education

Since moving to Durham in Fall 2014, we've been busy here at Caktus. We just finished renovating the first floor of our headquarters to house the Triangle's (and East Coast's!) first Django and Python code school, Astro Code School. We're proud to say that the school is now officially open and we'll be celebrating with a public launch party on May 1st.

I spoke with Colin Copeland, Caktus co-founder and Chief Technology Officer, about why Astro matters to Caktus and our region here in North Carolina: "The Triangle has seen an influx of people relocating here to be a part of a thriving technology sector. However, as business leaders, we have a responsibility to make sure innovation in the Triangle doesn’t leave people behind. For folks who have lived here and seen Durham and the rest of the Triangle evolve, we want to make sure they have the opportunity to be a part of the change. That starts with education, and that’s why we are opening the Astro Code School.”

The Bureau of Labor Statistics predicts a 20% growth for web developer jobs from 2012-2022. That growth is twice the projected growth of all U.S. occupations for the same period. Not only will Astro train developers to fill this job sector, but it will also focus on Python and the Django web framework; Python is widely recognized as the leading language for the next generation of programmers.

“It’s an exciting time to be in technology,” adds Colin. “It’s a field whose reach extends beyond the latest cool app. It’s clear technology will play a large part in solving some of the biggest issues of our time—humanitarian aid for war refugees, HIV/AIDS, and access to democracy, just to name a few. That’s some of the work we do at Caktus and we want to make sure Astro Code School gives future technologists the same tools to work towards making the world a better place.”

The first class in Python and Django Web Engineering will be held May 18th through August 10th of this year. Applications are now open and due May 11th.

To celebrate the opening of the school, we will be hosting a launch party on Friday, May 1st from 6:00 to 9:00 pm. Registration is required. The event will be held in our newly renovated historic space at 108 Morris St in downtown Durham. We hope to see you there!

Josh JohnsonRaspberry Pi Build Environment In No Time At All

Leveraging PRoot and qemu, it’s easy to configure Raspberry Pis and to build and install packages without needing to do so on physical hardware. It’s especially nice if you have to work with many disk images at once, create specialized distributions, reset passwords, or install/customize applications that aren’t yet in the official repositories.

I’ve recently dug in to building apps and doing fun things with the Raspberry Pi. With the recent release of the Raspberry Pi 2, it’s an even more exciting platform. I’ve documented what I’ve been using to make my workflow more productive.


Setup

We’ll use a Linux machine. Below are setup instructions for Ubuntu and Arch. I prefer Arch for desktop and personal work; I use Debian or Ubuntu for production deployments.

Arch Linux is a great “tinkerer’s” distribution – if you haven’t used it before it’s worth checking out. It’s great on the Raspberry Pi.

Debian and Ubuntu have some differences, but share the same base and use the same package management system (apt). I’ve included instructions for Ubuntu in particular, since it’s the most similar to Raspbian, the default Raspberry Pi operating system, and folks may be more familiar with that environment.

Generally speaking, you’ll need the following things:

  • A physical computer or virtual machine running some version of Linux (setup instructions are provided for the latest Arch and Ubuntu, but any Linux should work).
  • Installation files for the Raspberry Pi.
  • SD cards suitable for whatever Raspberry Pi you have. We’ll learn how to work with raw disk images and how to copy disk images to SD cards.
  • QEMU, an emulator system, and its ARM processor support (the Raspberry Pi uses an ARM processor).
  • PRoot – a convenience tool that makes it easy to mount a “foreign” filesystem and run commands inside of it without booting.
  • A way to create disk images, and mount them like physical devices.

Once the packages are installed, the commands and processes for building and working with Raspberry Pi boot disks are the same.

NOTE: we assume you have sudo installed and configured.

Virtual Machine Notes

If you’re using an Apple (Mac OS X) computer or Windows, the easiest way to work with Linux systems is via virtualization. VirtualBox is available for most platforms and is easy to work with.

The virtualbox documentation can walk you through the installation of VirtualBox and creating your first virtual machine.

When working with an SD card, you might want to follow instructions for “Access to entire physical hard disk” to make the card accessible to the virtual machine. As an alternative, you could use a USB SD card reader and USB pass-through to present the entire USB device (rather than just the disk) to the virtual machine, and let the virtual machine deal with mounting it.

Both of these approaches can be (very) error prone, but provide the most “native” way of working.

Instead I’d recommend installing guest additions. With guest additions installed in your virtual machine, you can use the shared folders feature of VirtualBox. This makes it easy to copy disk images created in your virtual machine to your host machine, and then you can use the standard instructions for Windows and Mac OS to copy the disks images to your SD cards.

Advanced Usage Note: Personally, my usual method of operation with VirtualBox VMs is to set up Samba in my virtual machine and share a folder over a host-only network (or I’ll use bridged networking so I can connect to it from any machine on my LAN). I’d consider this a more “advanced” approach, but I’ve had more consistent results with it for day-to-day work than with guest additions or mounting host disks. However, for the simple task of copying disk images back and forth to the virtual machine, the shared folders feature should suffice. 

Arch Linux

We’ll use pacman and wget to procure and install most of the tools we need:

$ sudo pacman -S dosfstools wget qemu unzip pv
$ wget http://static.proot.me/proot-x86_64
$ chmod +x proot-x86_64
$ sudo mv proot-x86_64 /usr/local/bin/proot

First, we install the following packages:

dosfstools
Gives us the ability to create FAT filesystems, required for making a disk bootable on the RaspberryPi.
wget
General purpose file grabber – used for downloading installation files and PRoot
qemu
QEMU emulator – allows us to run RaspberryPi executables
unzip
Decompresses ZIP archives.
pv
Pipeline middleware that shows a progress bar (we’ll be using it to make copying disk images with dd a little easier for the impatient)

Then we download PRoot, make the file executable, and copy it to a common location for global executables that everyone on a machine can access, /usr/local/bin. This location is just a suggestion – to follow along with the examples in this article, you just need to put the proot executable somewhere on your $PATH.

Finally, we’ll use an AUR package to obtain the kpartx tool.

kpartx wraps a handful of tasks required for creating loopback devices into a single action.

If you haven’t used the AUR before, check out the documentation first for an overview of the process, and to install prerequisites.

$ wget https://aur.archlinux.org/packages/mu/multipath-tools/multipath-tools.tar.gz
$ tar -zxvf multipath-tools.tar.gz
$ cd multipath-tools
$ makepkg
$ sudo pacman -U multipath-tools-*.pkg.tar.xz

Ubuntu

Ubuntu Desktop comes with most of the tools we need (in particular, wget, the ability to mount dos file systems, and unzip). As such, the process of getting set up for using PRoot is a bit simpler, compared to Arch.

Ubuntu uses apt-get for package installation.

$ sudo apt-get install qemu kpartx pv
$ wget http://static.proot.me/proot-x86_64
$ chmod +x proot-x86_64
$ sudo mv proot-x86_64 /usr/local/bin/proot

First, we install the following packages:

qemu
QEMU emulator – allows us to run RaspberryPi executables
kpartx
Helper tool that wraps a handful of tasks required for creating loopback devices into a single action.
pv
Pipeline middleware that shows a progress bar (we’ll be using it to make copying disk images with dd a little easier for the impatient)

Then, we install PRoot by downloading the binary from proot.me, making it executable, and putting it somewhere on our $PATH, /usr/local/bin, making it available to all users on the system. This location is merely a suggestion, but putting the proot executable somewhere on your $PATH will make it easier to follow along with the examples below.

Working With A Disk Image

A disk (in the Raspberry Pi’s case, we’re talking about an SD card) is just an arrangement of blocks for data storage. On top of those blocks is a description of how files are represented in those blocks, or a filesystem (for more detail, see the Wikipedia articles on Disk Storage and File System).

Disks can exist in the physical world, or can be represented by a special file, called a disk image. We can download pre-made images with Raspbian already installed from the official Raspberry Pi downloads page.

$ wget http://downloads.raspberrypi.org/raspbian_latest -O raspbian_latest.zip
$ unzip raspbian_latest.zip
Archive:  raspbian_latest.zip
  inflating: 2015-02-16-raspbian-wheezy.img

Take note of the name of the img file – it will vary depending on the current release of Raspbian at the time.

At this point we have a disk image we can mount by creating a loopback device. Once we have it mounted, we can use QEMU and PRoot to run commands within it without fully booting it.

We’ll use kpartx to set up a loopback device for each partition in the disk image:

$ sudo kpartx -a -v 2015-02-16-raspbian-wheezy.img 
add map loop0p1 (254:0): 0 114688 linear /dev/loop0 8192
add map loop0p2 (254:1): 0 6277120 linear /dev/loop0 122880

The -a command line switch tells kpartx to create new loopback devices. The -v switch asks kpartx to be more verbose and print out what it’s doing.

We can do a dry-run and inspect the disk image using the -l switch:

$ sudo kpartx -l 2015-02-16-raspbian-wheezy.img
loop0p1 : 0 114688 /dev/loop0 8192
loop0p2 : 0 6277120 /dev/loop0 122880
loop deleted : /dev/loop0

We can see the partitions to be sure, using fdisk -l

$ sudo fdisk -l /dev/loop0

Disk /dev/loop0: 3.1 GiB, 3276800000 bytes, 6400000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0009bf4f

Device       Boot  Start     End Sectors Size Id Type
/dev/loop0p1        8192  122879  114688  56M  c W95 FAT32 (LBA)
/dev/loop0p2      122880 6399999 6277120   3G 83 Linux

We can also see them using lsblk:

$ lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda         8:0    0 14.9G  0 disk 
└─sda1      8:1    0 14.9G  0 part /
sdc         8:32   0 29.8G  0 disk 
└─sdc1      8:33   0 29.8G  0 part /run/media/jj/STEALTH
loop0       7:0    0  3.1G  0 loop 
├─loop0p1 254:0    0   56M  0 part 
└─loop0p2 254:1    0    3G  0 part 

Generally speaking, the first, smaller partition will be the boot partition, and the others will hold data. It’s typical with RaspberryPi distributions to use a simple 2-partition scheme like this.

The new partitions will end up in /dev/mapper:

$ ls /dev/mapper
control  loop0p1  loop0p2

Now we can mount our partitions. We’ll first make a couple of descriptive directories for mount points:

$ mkdir raspbian-boot raspbian-root
$ sudo mount /dev/mapper/loop0p1 raspbian-boot
$ sudo mount /dev/mapper/loop0p2 raspbian-root

At this point we can go to the next section where we will run PRoot and start doing things “inside” the disk image.

Working With An Existing Disk

We can use PRoot with an existing disk (SD card) as well. The first step is to insert the disk into your computer. Your operating system will likely mount it automatically. We also need to find out which device the disk is registered as.

lsblk can answer both questions for us:

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0 14.9G  0 disk 
└─sda1   8:1    0 14.9G  0 part /
sdb      8:16   1 14.9G  0 disk 
├─sdb1   8:17   1   56M  0 part /run/media/jj/boot
└─sdb2   8:18   1    3G  0 part /run/media/jj/f24a4949-f4b2-4cad-a780-a138695079ec
sdc      8:32   0 29.8G  0 disk 
└─sdc1   8:33   0 29.8G  0 part /run/media/jj/STEALTH

On my system, the SD card I inserted (a Raspbian disk I pulled out of a Raspberry Pi) came up as /dev/sdb. It has two partitions, sdb1 and sdb2. Both partitions were automatically mounted, to /run/media/jj/boot and /run/media/jj/f24a4949-f4b2-4cad-a780-a138695079ec, respectively.

Typically, the first, smaller partition will be the boot partition. To verify this, we’ll again use fdisk -l:

$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 14.9 GiB, 16021192704 bytes, 31291392 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0009bf4f

Device     Boot  Start     End Sectors Size Id Type
/dev/sdb1         8192  122879  114688  56M  c W95 FAT32 (LBA)
/dev/sdb2       122880 6399999 6277120   3G 83 Linux

Here we see that /dev/sdb1 is 56 megabytes in size, and is of type “W95 FAT32 (LBA)”. This is typically indicative of a Raspberry Pi boot partition, so /dev/sdb1 is our boot partition, and /dev/sdb2 is our root partition.

We can use the existing mounts that the operating system set up automatically for us, if we want, but it’s a bit easier to un-mount the partitions and mount them somewhere more descriptive, like raspbian-boot and raspbian-root:

$ sudo umount /dev/sdb1 /dev/sdb2
$ mkdir -p raspbian-boot raspbian-root
$ sudo mount /dev/sdb1 raspbian-boot
$ sudo mount /dev/sdb2 raspbian-root

Note: The -p switch to mkdir causes mkdir to ignore already-existing directories. We’ve added it here in case you were following along in the previous section and already have these directories handy.

A call to lsblk will confirm that we’ve mounted things as we expected:

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0 14.9G  0 disk 
└─sda1   8:1    0 14.9G  0 part /
sdb      8:16   1 14.9G  0 disk 
├─sdb1   8:17   1   56M  0 part /run/media/jj/STEALTH/raspbian-boot
└─sdb2   8:18   1    3G  0 part /run/media/jj/STEALTH/raspbian-root
sdc      8:32   0 29.8G  0 disk 
└─sdc1   8:33   0 29.8G  0 part /run/media/jj/STEALTH

Now we can proceed to the next section, and run the same PRoot command to configure, compile and/or install things – but this time we’ll be working directly on the SD card instead of inside of a disk image.

Basic Configuration/Package Installation

Now that we’ve got either a disk image or a physical disk mounted, we can run commands within those filesystems using PRoot.

NOTE: The following command line switches worked for me, but took some experimentation to figure out. Please take some time to read the PRoot documentation so you understand exactly what the switches mean.

We can run any command directly (like, say, apt-get), but it’s useful to be able to “log in” to the disk image (run a shell) and then perform our tasks:

$ sudo proot -q qemu-arm -S raspbian-root -b raspbian-boot:/boot /bin/bash

This mode of PRoot puts us in as the root user inside of the disk image. The -q switch wraps every command in the qemu-arm emulator program, making it possible to run code compiled for the Raspberry Pi’s ARM processor. The -S parameter sets the directory that will be the “root” – essentially, raspbian-root will map to /. -S also fakes the root user (id 0), and adds some protections for us in the event we’ve mixed in files from our host system that we don’t want the disk image code to modify. -b splices in additional directories – we add the /boot partition, since that’s where new kernel images and other boot-related stuff gets installed. This isn’t strictly necessary, but it’s useful for system upgrades and making changes to boot settings. Finally, we tell PRoot which command to run, in this case /bin/bash, the BASH shell.
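
As mentioned above, the same invocation can run a single command directly instead of a shell. For example, a quick package index update (a sketch; the PATH note below applies to non-interactive commands as well):

$ sudo proot -q qemu-arm -S raspbian-root -b raspbian-boot:/boot /usr/bin/apt-get update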

Now that we’re “in” the disk image, we can update and install new packages.

Since root is not a “normal” user in the default Raspbian installation, the path needs to be adjusted:

# export PATH=$PATH:/usr/sbin:/sbin:/bin:/usr/local/sbin

Now we can do the update/upgrade, and install any additional packages we might want (for example, the samba file sharing server):

# apt-get update
# apt-get upgrade
# apt-get install samba

Check out the man page for apt-get for full details (type man apt-get at a shell prompt).

You will likely see a lot of warnings and possibly errors when installing packages – these can usually be ignored, but make note of them; there may be some environmental tweaks that need to be made.

We can do almost anything in the PRoot environment that we could do logged into a running Raspberry Pi.

We can edit config.txt and change settings (for an explanation of the settings, see the documentation):

# vi /boot/config.txt

We can add a new user:

# adduser jj
Adding user `jj' ...
Adding new group `jj' (1004) ...
Adding new user `jj' (1001) with group `jj' ...
Creating home directory `/home/jj' ...
Copying files from `/etc/skel' ...
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
Changing the user information for jj
Enter the new value, or press ENTER for the default
	Full Name []: Josh Johnson
	Room Number []: 
	Work Phone []: 
	Home Phone []: 
	Other []: 

We can grant a user sudo privileges (the default sudo configuration allows anyone in the sudo group to run commands as root via sudo):

# usermod -a -G sudo jj
# groups jj
jj : jj sudo

You can reset someone’s password, or change the password of the default pi user:

# passwd pi
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully

The possibilities here are endless, with a few exceptions:

  • Running code that relies on the GPIO pins or drivers loaded into the kernel will not work.
  • Configuring devices (like, say, a wifi adapter) may work, but device information will likely be wrong.
  • Testing startup/shutdown scripts – since we’re not booting the disk image, these scripts aren’t run.

Compiling For The RPi

Raspbian comes with most of the tools we’ll need (in particular, the build-essential package). Let’s build and install the nginx web server – a relatively easy-to-build package.

If you’ve never compiled software on Linux before: most (but not all!) source code packages are provided as tarballs, and include scripts that help you build the software via what’s known as the “configure, make, make install” (or CMMI) procedure.

Note: For a great explanation (with examples you can follow to build your own CMMI package), George Brocklehurst wrote an excellent article explaining the details behind CMMI called “The magic behind configure, make, make install“.

First we’ll need to obtain the nginx tarball:

# wget http://nginx.org/download/nginx-1.7.12.tar.gz
# tar -zxvf nginx-1.7.12.tar.gz

Next we’ll look for a README or INSTALL file, to check for any extra build dependencies:

# cd nginx-1.7.12
# ls -l
total 660
-rw-r--r-- 1 jj   indiecity 249016 Apr  7 15:35 CHANGES
-rw-r--r-- 1 jj   indiecity 378885 Apr  7 15:35 CHANGES.ru
-rw-r--r-- 1 jj   indiecity   1397 Apr  7 15:35 LICENSE
-rw-r--r-- 1 root root          46 Apr 18 10:21 Makefile
-rw-r--r-- 1 jj   indiecity     49 Apr  7 15:35 README
drwxr-xr-x 6 jj   indiecity   4096 Apr 18 10:21 auto
drwxr-xr-x 2 jj   indiecity   4096 Apr 18 10:21 conf
-rwxr-xr-x 1 jj   indiecity   2478 Apr  7 15:35 configure
drwxr-xr-x 4 jj   indiecity   4096 Apr 18 10:21 contrib
drwxr-xr-x 2 jj   indiecity   4096 Apr 18 10:21 html
drwxr-xr-x 2 jj   indiecity   4096 Apr 18 10:21 man
drwxr-xr-x 2 root root        4096 Apr 18 10:23 objs
drwxr-xr-x 8 jj   indiecity   4096 Apr 18 10:21 src
# view README

We’ll note, helpfully (cue eye roll), that all nginx has put into the README is:

Documentation is available at http://nginx.org

A more direct link gives us a little more useful information. Scanning this, there aren’t any obvious dependencies or features we want to add/enable, so we can proceed.

We can also find out which options are available by running ./configure --help.

Note: There are several configuration options that control where files are put when the compiled code is installed – they may be of use, in particular the standard --prefix. This can help segregate multiple versions of the same application on a system, for example if you need to install a newer/older version and already have one installed via the apt package. It’s also useful for building self-contained directory structures that you can easily copy from one system to another.
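
For example, to keep this particular build self-contained under /opt (the path is purely illustrative):

# ./configure --prefix=/opt/nginx-1.7.12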

Run ./configure and note any warnings or errors. There may be some modules or other things not found – that’s typically OK, but it can help explain an eventual error toward the end of the configure script or during compilation:

# cd nginx-1.7.12
# ./configure
...
checking for PCRE library ... not found
checking for PCRE library in /usr/local/ ... not found
checking for PCRE library in /usr/include/pcre/ ... not found
checking for PCRE library in /usr/pkg/ ... not found
checking for PCRE library in /opt/local/ ... not found
...

./configure: error: the HTTP rewrite module requires the PCRE library.
You can either disable the module by using --without-http_rewrite_module
option, or install the PCRE library into the system, or build the PCRE library
statically from the source with nginx by using --with-pcre=<path> option.

Whoa, we ran into a problem! For our use case (just showing off how to do a CMMI build in a PRoot environment) we probably don’t need the rewrite module, so we can re-run ./configure with the --without-http_rewrite_module switch.

However, it’s useful to understand how to track down dependencies like this, and rewriting is a pretty killer feature of any HTTP server, so let’s install the dependency.

The configure script mentions the “PCRE library”. PCRE stands for “Perl Compatible Regular Expressions”. Perl is a classic systems language with hard-core text processing capabilities. It’s particularly known for its regular expression support and syntax. The Perl regular expression syntax is so useful, in fact, that some folks built a library allowing other programmers to use it without having to use Perl itself.

Note: This information can be found by using your favorite search engine!

There are two ways libraries like PCRE are installed. The first, and easiest, is that a system package will be available with the library pre-compiled and ready to go. The second will require the same steps we’re following to install nginx – download a tarball, extract, and configure, make, make install.

To find a package, you can use apt-cache search or aptitude search.

I prefer aptitude, since it will tell us what packages are already installed:

# aptitude search pcre
v   apertium-pcre2                                     -                                                             
p   cl-ppcre                                           - Portable Regular Express Library for Common Lisp            
p   clisp-module-pcre                                  - clisp module that adds libpcre support                      
p   gambas3-gb-pcre                                    - Gambas regexp component                                     
p   haskell-pcre-light-doc                             - transitional dummy package                                  
p   libghc-pcre-light-dev                              - Haskell library for Perl 5-compatible regular expressions   
v   libghc-pcre-light-dev-0.4-4f534                    -                                                             
p   libghc-pcre-light-doc                              - library documentation for pcre-light                        
p   libghc-pcre-light-prof                             - pcre-light library with profiling enabled                   
v   libghc-pcre-light-prof-0.4-4f534                   -                                                             
p   libghc-regex-pcre-dev                              - Perl-compatible regular expressions                         
v   libghc-regex-pcre-dev-0.94.2-49128                 -                                                             
p   libghc-regex-pcre-doc                              - Perl-compatible regular expressions; documentation          
p   libghc-regex-pcre-prof                             - Perl-compatible regular expressions; profiling libraries    
v   libghc-regex-pcre-prof-0.94.2-49128                -                                                             
p   libghc6-pcre-light-dev                             - transitional dummy package                                  
p   libghc6-pcre-light-doc                             - transitional dummy package                                  
p   libghc6-pcre-light-prof                            - transitional dummy package                                  
p   liblua5.1-rex-pcre-dev                             - Transitional package for lua-rex-pcre-dev                   
p   liblua5.1-rex-pcre0                                - Transitional package for lua-rex-pcre                       
p   libpcre++-dev                                      - C++ wrapper class for pcre (development)                    
p   libpcre++0                                         - C++ wrapper class for pcre (runtime)                        
p   libpcre-ocaml                                      - OCaml bindings for PCRE (runtime)                           
p   libpcre-ocaml-dev                                  - OCaml bindings for PCRE (Perl Compatible Regular Expression)
v   libpcre-ocaml-dev-werc3                            -                                                             
v   libpcre-ocaml-werc3                                -                                                             
i   libpcre3                                           - Perl 5 Compatible Regular Expression Library - runtime files
p   libpcre3-dbg                                       - Perl 5 Compatible Regular Expression Library - debug symbols
p   libpcre3-dev                                       - Perl 5 Compatible Regular Expression Library - development f
p   libpcrecpp0                                        - Perl 5 Compatible Regular Expression Library - C++ runtime f
p   lua-rex-pcre                                       - Perl regular expressions library for the Lua language       
p   lua-rex-pcre-dev                                   - PCRE development files for the Lua language                 
v   lua5.1-rex-pcre                                    -                                                             
v   lua5.1-rex-pcre-dev                                -                                                             
v   lua5.2-rex-pcre                                    -                                                             
v   lua5.2-rex-pcre-dev                                -                                                             
p   pcregrep                                           - grep utility that uses perl 5 compatible regexes.           
p   pike7.8-pcre                                       - PCRE module for Pike                                        
p   postfix-pcre                                       - PCRE map support for Postfix       

See man aptitude for full details, but the gist is that p means the package is available but not installed, v is a virtual package that points to other packages, and i means the package is installed.

What we want is a package with header files and modules we can compile against – these are usually named lib[SOMETHING]-dev.
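
If the list is long, a quick pipe through grep narrows it down to the development packages (just a filtering convenience):

# aptitude search pcre | grep -- -dev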

Scanning the list, we see a package named libpcre3-dev – this is probably what we want; we can find out by installing it:

# apt-get install libpcre3-dev

Now we can re-run ./configure and see if it works:

# ./configure
...
checking for PCRE library ... found
...
Configuration summary
  + using system PCRE library
  + OpenSSL library is not used
  + using builtin md5 code
  + sha1 library is not found
  + using system zlib library

  nginx path prefix: "/usr/local/nginx"
  nginx binary file: "/usr/local/nginx/sbin/nginx"
  nginx configuration prefix: "/usr/local/nginx/conf"
  nginx configuration file: "/usr/local/nginx/conf/nginx.conf"
  nginx pid file: "/usr/local/nginx/logs/nginx.pid"
  nginx error log file: "/usr/local/nginx/logs/error.log"
  nginx http access log file: "/usr/local/nginx/logs/access.log"
  nginx http client request body temporary files: "client_body_temp"
  nginx http proxy temporary files: "proxy_temp"
  nginx http fastcgi temporary files: "fastcgi_temp"
  nginx http uwsgi temporary files: "uwsgi_temp"
  nginx http scgi temporary files: "scgi_temp"

The library was found, the error is gone, and so now we can proceed with compilation.

To build nginx, we simply run make:

# make

If all goes well, then you can install it:

# make install

This same basic process can be used to build custom applications written in C/C++, to build applications that aren’t yet in the package repository, or build applications with specific features or optimizations enabled that the standard packages might not have.

Using Apt To Install Build Dependencies

One more useful thing that apt-get can do for us: it can install the build dependencies for any given package in the repository. This allows us to get most, if not all, potentially missing dependencies to build a known application.

We could have started off with our nginx exploration by first installing its build dependencies:

# apt-get build-dep nginx

This won’t solve every dependency issue, but it’s a useful tool in getting all of your ducks in a row for building, especially for more complex things like desktop applications.

Be careful with build-dep – it can bring in a lot of things, some you may not really need. In our case it’s not really a problem, but be aware of space limitations.
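
If you want to preview what build-dep would pull in before committing to it, apt-get can do a dry run (assuming deb-src entries are enabled in your apt sources):

# apt-get -s build-dep nginx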

Umount and Clean Up

Once we’ve gotten our disk image configured as we like, we need to un-mount it.

First, we need to exit the bash shell we started with PRoot, then we’ll call sync to ensure all data is flushed to any disks:

# exit
$ sync

Now we can un-mount the partitions (the command is the same whether we’re using a disk image or an SD card):

$ sudo umount raspbian-root raspbian-boot

We can double-check that the disk is no longer mounted by calling mount without any additional parameters, or by using lsblk:

$ mount
...

With lsblk, we’ll still see the disks (or loopback devices) present, but not mounted:

$ lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda         8:0    0 14.9G  0 disk 
└─sda1      8:1    0 14.9G  0 part /
sdc         8:32   0 29.8G  0 disk 
└─sdc1      8:33   0 29.8G  0 part /run/media/jj/STEALTH
loop0       7:0    0  3.1G  0 loop 
├─loop0p1 254:0    0   56M  0 part 
└─loop0p2 254:1    0    3G  0 part 

If we’re using a disk image, we’ll want to destroy the loopback devices. This is accomplished with kpartx -d:

$ sudo kpartx -d 2015-02-16-raspbian-wheezy.img

We can verify that it’s gone using lsblk again:

$ lsblk
sda      8:0    0 14.9G  0 disk 
└─sda1   8:1    0 14.9G  0 part /
sdc      8:32   0 29.8G  0 disk 
└─sdc1   8:33   0 29.8G  0 part /run/media/jj/STEALTH

At this point we can write the disk image to an SD card, or eject the SD card and insert it into a Raspberry Pi.

Writing a Disk Image to an SD Card

We’ll use the dd command, which writes raw blocks of data from one block device to another, to copy the disk image we made into an SD card.

NOTE: The SD card you use will be COMPLETELY erased. Proceed with caution.

First, insert the SD card into your computer (or card reader, etc). Depending on your system, it may be automatically mounted. We can find out the device name, and whether it’s mounted, using lsblk:

$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0  14.9G  0 disk 
└─sda1   8:1    0  14.9G  0 part /
sdb      8:16   1  14.9G  0 disk 
├─sdb1   8:17   1 114.3M  0 part 
├─sdb2   8:18   1     1K  0 part 
└─sdb3   8:19   1    32M  0 part /run/media/jj/SETTINGS
sdc      8:32   0  29.8G  0 disk 
└─sdc1   8:33   0  29.8G  0 part /run/media/jj/STEALTH

We can see the new disk came up as sdb. It has three partitions, sdb1, sdb2, and sdb3. Looking at the MOUNTPOINT column, we can tell that my operating system auto-mounted sdb3 into the /run/media/jj/SETTINGS directory.

Note: The partition layout may vary depending on what was on the SD card before you inserted it. My SD card had a fresh copy of NOOBS that hadn’t yet installed an OS.

We can double-check that sdb is the right disk with fdisk:

$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 14.9 GiB, 16021192704 bytes, 31291392 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000cb53d

Device     Boot    Start      End  Sectors   Size Id Type
/dev/sdb1           8192   242187   233996 114.3M  e W95 FAT16 (LBA)
/dev/sdb2         245760 31225855 30980096  14.8G 85 Linux extended
/dev/sdb3       31225856 31291391    65536    32M 83 Linux

fdisk tells us that this is a 16GB drive. The exact amount cited by some drive manufacturers is not in “real” gigabytes (powers of 2[*]) but in billions of bytes – note the byte count: 16,021,192,704.

We can see the three partitions, and what format they are in. The small FAT filesystem is a good indication that this is a bootable Raspberry Pi disk.

With a fresh SD card, the call to fdisk may look more like this:

Disk /dev/sdb: 14.9 GiB, 16021192704 bytes, 31291392 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start      End  Sectors  Size Id Type
/dev/sdb1        8192 31291391 31283200 14.9G  c W95 FAT32 (LBA)

Most SD cards are pre-formatted with a single partition containing a FAT32 filesystem.

It’s important to be able to differentiate between your system drives and the target for copying over your disk image – if you point dd at the wrong place, you can destroy important things, like your operating system!
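
If you are ever unsure which device is which, asking lsblk for the model and transport columns can help tell an internal drive from a USB card reader (a convenience; available columns vary by lsblk version):

$ lsblk -o NAME,SIZE,MODEL,TRAN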

Now that we’re sure that /dev/sdb is our SD card, we can proceed.

Since lsblk indicated that at least one of the partitions was mounted (sdb3), we will first need to un-mount it:

$ sudo umount /dev/sdb3

Now we can verify it’s indeed not mounted:

$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0  14.9G  0 disk 
└─sda1   8:1    0  14.9G  0 part /
sdb      8:16   1  14.9G  0 disk 
├─sdb1   8:17   1 114.3M  0 part 
├─sdb2   8:18   1     1K  0 part 
└─sdb3   8:19   1    32M  0 part 
sdc      8:32   0  29.8G  0 disk 
└─sdc1   8:33   0  29.8G  0 part /run/media/jj/STEALTH

And copy the disk image:

$ sudo dd if=2015-02-16-raspbian-wheezy.img of=/dev/sdb bs=4M
781+1 records in
781+1 records out
3276800000 bytes (3.3 GB) copied, 318.934 s, 10.3 MB/s

This will take some time, and dd gives no output until it’s finished. Be patient.

dd has a fairly simple interface. The if option indicates the in file, or the disk (or disk image in our case) that is being copied. The of option sets the out file, or the disk to write to. bs sets the block size, which indicates how big of a piece of data to write at a time.

The bs value can be tweaked to get faster or more reliable performance in various situations – we’re using 4M (four megabytes) as recommended by raspberrypi.org. The larger the value, the faster dd will run, but there are physical limits to what your system can handle, so it’s best to stick with the recommended value.

As noted, dd gives us no output until it’s completed. This is kind of an annoying thing about dd, but it can be remedied. The easiest way is to install a tool called pv, and split the command – pv acts as an intermediary between two commands and displays a progress bar as data moves along. dd can read and write data to a pipe (details). So we can use two dd commands, put pv in the middle, and get a nice progress bar.

Here’s the same copy as before, but using pv:

Note: Here we’re using sh -c to wrap the command pipeline in quotes. This allows us to provide the entire pipeline as a single unit. If we didn’t, only the part of the pipeline before the first pipe would run under sudo – the rest would run as our regular user, which isn’t what we want.

$ ls -l 2015-02-16-raspbian-wheezy.img 
-rw-r--r-- 1 jj jj 3276800000 Apr 18 07:58 2015-02-16-raspbian-wheezy.img
$ sudo sh -c "dd if=2015-02-16-raspbian-wheezy.img bs=4M | pv --size=3276800000 | dd of=/dev/sdb"
 613MiB 0:02:31 [4.22MiB/s] [===========>                                                      ] 19% ETA 0:10:04

We pass pv a --size argument to give it an idea of how big the file is, so it can provide accurate progress. We found out the size of our disk image using ls -l, which shows the size of the file in bytes.
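
If you would rather not copy the byte count by hand, stat can supply the --size value inline (a small convenience, assuming the GNU version of stat):

$ sudo sh -c "dd if=2015-02-16-raspbian-wheezy.img bs=4M | pv --size=$(stat -c%s 2015-02-16-raspbian-wheezy.img) | dd of=/dev/sdb"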

If we run lsblk again, we’ll see the different partition arrangement now on sdb:

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0 14.9G  0 disk 
└─sda1   8:1    0 14.9G  0 part /
sdb      8:16   1 14.9G  0 disk 
├─sdb1   8:17   1   56M  0 part 
└─sdb2   8:18   1    3G  0 part 
sdc      8:32   0 29.8G  0 disk 
└─sdc1   8:33   0 29.8G  0 part /run/media/jj/STEALTH

fdisk -l gives a bit more detail:

$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 14.9 GiB, 16021192704 bytes, 31291392 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0009bf4f

Device     Boot  Start     End Sectors Size Id Type
/dev/sdb1         8192  122879  114688  56M  c W95 FAT32 (LBA)
/dev/sdb2       122880 6399999 6277120   3G 83 Linux

Now we can sync the disks:

$ sync
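
For extra assurance that the copy succeeded, cmp can compare the card against the image, limiting the comparison to the image’s size in bytes (a quick sanity check; no output means the data matches):

$ sudo cmp -n 3276800000 2015-02-16-raspbian-wheezy.img /dev/sdb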

At this point we have an SD card we can put into a Raspberry Pi and boot.

[*] (1GB = 1 byte * 1024 (kilobyte) * 1024 (megabyte) * 1024 (gigabyte), or 1,073,741,824 bytes)

Extra Credit: Making our own disk image

Some distributions, such as Arch, don’t distribute disk images, but instead distribute tarballs of files. They let you set up the disk however you want, then copy the files over to install the operating system.

We can create our own disk images using fallocate, and then use fdisk or parted (or if you prefer a GUI, gparted) to partition the disk.

We’ll create a disk image for the latest Arch Linux ARM distribution for the Raspberry Pi 2.

Note: You must create the disk image file on a compatible filesystem, such as ext4, for this to work. This is the default system disk filesystem for most modern Linux distributions, including Arch and Ubuntu, so for most people this isn’t a problem. The implication is that this will not work on, say, an external hard drive formatted in an incompatible format, such as FAT32.

First we’ll create an 8 gigabyte empty disk image:

$ fallocate -l 8G arch-latest-rpi2.img
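
If fallocate complains that the operation is unsupported on your filesystem (see the note above), a sparse file created with truncate is a reasonable substitute (a sketch; the file only consumes space as data is written to it):

$ truncate -s 8G arch-latest-rpi2.img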

We’ll use fdisk to partition the disk. We need two partitions. The first will be 100 megabytes, formatted as FAT32. We’ll need to set the partition’s system id to correspond to FAT32 with LBA so that the Raspberry Pi’s BIOS knows how to read it.

Note: I’ve had trouble finding documentation as to exactly why FAT + LBA is required; the assumption is that it has something to do with how the ARM processor loads the operating system in the earliest boot stages. If anyone knows more detail or can point me to the documentation about this, it would be greatly appreciated!

The offset for the partition will be 2048 blocks – this is the default that fdisk will suggest (and what the Arch installation instructions tell us to do).

Note: This seems to work well – however, there is some confusion about partition alignment. The Raspbian disk images use an 8192 block offset, and there is a lot of information available explaining how a bad alignment can cause quicker SD card degradation and hurt write performance. I’m still trying to figure out the best way to address this; it’s another area where community help would be appreciated :) Here are a few links that dig into the issue: http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device, http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/, http://3gfp.com/wp/2014/07/formatting-sd-cards-for-speed-and-lifetime/.

The second partition will be ext4, and will use the rest of the available disk space.

We’ll start fdisk and get the initial prompt. No changes will be saved until we instruct fdisk to do so:

$ fdisk arch-latest-rpi2.img
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x152a22d4.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help):

Most of the information here is just telling us that this is a block device with no partitions. If you need help, as indicated, you can type m:

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

First, we need to create a new disk partition table. This is done by entering o:

Command (m for help): o
Building a new DOS disklabel with disk identifier 0xa8e8538a.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Next, we’ll create our first primary partition, the boot partition, at 2048 blocks offset, 100MB in size.

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-16777215, default 2048): 2048
Last sector, +sectors or +size{K,M,G} (2048-16777215, default 16777215): +100M

By using the relative number +100M, we save ourselves some trouble of having to do math to figure out how many sectors we need.

We can see what we have so far, by using the p command:

Command (m for help): p

Disk arch-latest-rpi2.img: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xa8e8538a

               Device Boot      Start         End      Blocks   Id  System
arch-latest-rpi2.img1            2048      206847      102400   83  Linux

Next, we need to set the partition type (system id) by entering t:

 
Command (m for help): t
Selected partition 1
Hex code (type L to list codes): L

 0  Empty           24  NEC DOS         81  Minix / old Lin bf  Solaris
 1  FAT12           27  Hidden NTFS Win 82  Linux swap / So c1  DRDOS/sec (FAT-
 2  XENIX root      39  Plan 9          83  Linux           c4  DRDOS/sec (FAT-
 3  XENIX usr       3c  PartitionMagic  84  OS/2 hidden C:  c6  DRDOS/sec (FAT-
 4  FAT16 <32M      40  Venix 80286     85  Linux extended  c7  Syrinx
 5  Extended        41  PPC PReP Boot   86  NTFS volume set da  Non-FS data
 6  FAT16           42  SFS             87  NTFS volume set db  CP/M / CTOS / .
 7  HPFS/NTFS/exFAT 4d  QNX4.x          88  Linux plaintext de  Dell Utility
 8  AIX             4e  QNX4.x 2nd part 8e  Linux LVM       df  BootIt
 9  AIX bootable    4f  QNX4.x 3rd part 93  Amoeba          e1  DOS access
 a  OS/2 Boot Manag 50  OnTrack DM      94  Amoeba BBT      e3  DOS R/O
 b  W95 FAT32       51  OnTrack DM6 Aux 9f  BSD/OS          e4  SpeedStor
 c  W95 FAT32 (LBA) 52  CP/M            a0  IBM Thinkpad hi eb  BeOS fs
 e  W95 FAT16 (LBA) 53  OnTrack DM6 Aux a5  FreeBSD         ee  GPT
 f  W95 Ext'd (LBA) 54  OnTrackDM6      a6  OpenBSD         ef  EFI (FAT-12/16/
10  OPUS            55  EZ-Drive        a7  NeXTSTEP        f0  Linux/PA-RISC b
11  Hidden FAT12    56  Golden Bow      a8  Darwin UFS      f1  SpeedStor
12  Compaq diagnost 5c  Priam Edisk     a9  NetBSD          f4  SpeedStor
14  Hidden FAT16 <3 61  SpeedStor       ab  Darwin boot     f2  DOS secondary
16  Hidden FAT16    63  GNU HURD or Sys af  HFS / HFS+      fb  VMware VMFS
17  Hidden HPFS/NTF 64  Novell Netware  b7  BSDI fs         fc  VMware VMKCORE
18  AST SmartSleep  65  Novell Netware  b8  BSDI swap       fd  Linux raid auto
1b  Hidden W95 FAT3 70  DiskSecure Mult bb  Boot Wizard hid fe  LANstep
1c  Hidden W95 FAT3 75  PC/IX           be  Solaris boot    ff  BBT
1e  Hidden W95 FAT1 80  Old Minix
Hex code (type L to list codes): c
Changed system type of partition 1 to c (W95 FAT32 (LBA))

After the t command, we opted to enter L to see the list of possible codes. We then see that W95 FAT32 (LBA) corresponds to the code c.

Now we can make our second primary partition for data storage, utilizing the rest of the disk. We again use the n command:

Command (m for help): n
Partition type:
   p   primary (1 primary, 0 extended, 3 free)
   e   extended
Select (default p): p
Partition number (1-4, default 2): 2
First sector (206848-16777215, default 206848):
Using default value 206848
Last sector, +sectors or +size{K,M,G} (206848-16777215, default 16777215):
Using default value 16777215

We accepted the defaults for all of the prompts.

Now, entering p again, we can see the state of the partition table:

Command (m for help): p

Disk arch-latest-rpi2.img: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xa8e8538a

               Device Boot      Start         End      Blocks   Id  System
arch-latest-rpi2.img1            2048      206847      102400    c  W95 FAT32 (LBA)
arch-latest-rpi2.img2          206848    16777215     8285184   83  Linux

Now we can write out the table (w), which will exit fdisk:

Command (m for help): w
The partition table has been altered!


WARNING: If you have created or modified any DOS 6.x
partitions, please see the fdisk manual page for additional
information.
Syncing disks.
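
As an aside, the same two-partition layout can be created non-interactively, which is handy for scripting. Here is a sketch using parted; this assumes your parted build accepts these fs-types and the lba flag, so verify the result with fdisk -l afterwards:

$ parted --script arch-latest-rpi2.img mklabel msdos mkpart primary fat32 1MiB 101MiB mkpart primary ext4 101MiB 100% set 1 lba on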

Now we need to format the partitions. We’ll use kpartx to create block devices for us that we can format:

$ sudo kpartx -av arch-latest-rpi2.img
add map loop0p1 (252:0): 0 204800 linear /dev/loop0 2048
add map loop0p2 (252:1): 0 16570368 linear /dev/loop0 206848

As we saw earlier, the devices will show up in /dev/mapper, as /dev/mapper/loop0p1 and /dev/mapper/loop0p2.

First we’ll format the boot partition, loop0p1, as FAT:

$ sudo mkfs.vfat /dev/mapper/loop0p1
mkfs.fat 3.0.26 (2014-03-07)
unable to get drive geometry, using default 255/63

Next the data partition, in ext4 format:

$ sudo mkfs.ext4 /dev/mapper/loop0p2
mke2fs 1.42.9 (4-Feb-2014)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
518144 inodes, 2071296 blocks
103564 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2122317824
64 block groups
32768 blocks per group, 32768 fragments per group
8096 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

At this point we just need to mount the new filesystems, download the installation tarball and use tar to extract and copy the files:

First we’ll grab the installation files:

$ wget http://archlinuxarm.org/os/ArchLinuxARM-rpi-2-latest.tar.gz

Next we’ll mount the new filesystems:

$ mkdir arch-root arch-boot
$ sudo mount /dev/mapper/loop0p1 arch-boot
$ sudo mount /dev/mapper/loop0p2 arch-root

And finally populate the disk image with the system files, and move the boot directory to the boot partition:

$ sudo tar -xpf ArchLinuxARM-rpi-2-latest.tar.gz -C arch-root
$ sync
$ sudo mv arch-root/boot/* arch-boot/

We’re using a few somewhat less common parameters for tar. Typically we’ll use -xvf to tell tar to extract (-x), be verbose (-v) and specify the file (-f). We’ve added the -p switch to preserve permissions. This is especially important with system files.

The -C switch tells tar to change to the arch-root directory before extraction, effectively extracting the files directly to the root filesystem.

You may see some warnings about extended header keywords, these can be ignored.

Now we just need to clean up (unmount, remove the loopback devs):

$ sudo umount arch-root arch-boot
$ sudo kpartx -d arch-latest-rpi2.img

Now we’ve got our own Arch disk image we can distribute, or copy onto SD cards. We can also mount it on the loopback and use PRoot to further configure it, as we did above with Raspbian.
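
If the goal is an SD card rather than a distributable image file, the same dd procedure from earlier applies; replace /dev/sdX with your SD card’s device (and double-check it with lsblk first!):

$ sudo dd if=arch-latest-rpi2.img of=/dev/sdX bs=4M
$ sync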

Where To Go From Here

With this basic workflow, we can do all sorts of interesting things. A few ideas:

  • Distribute disk images pre-configured with applications we created.
  • Pre-configure images and SD cards for use in classrooms, meetups, demos, etc.
  • Set up a cron job that runs nightly and creates a disk image with the latest packages.
  • Build our own packages (either just create tarballs or use a tool like FPM and build deb packages) for drivers and other software and save other folks the hassle of doing this themselves.
  • Create rudimentary disk duplication setups for putting one image on a bunch of SD cards.
  • Fix broken installs.
  • Construct build and testing systems; integrate with tools like Jenkins.

So there we go – now you can customize the Raspberry Pi operating system with impunity, on your favorite workstation or laptop machine. If you have any questions, corrections, or suggestions for ways to streamline the process, please leave a comment!


Tim HopperUpdated About Me Page

I just gave my About Me page a long-overdue update.

Astro Code SchoolAstro Launch Party

RSVP: astro-caktus.eventbrite.com
What: Astro Code School Launch Party
Where: 108 Morris Street, Suite 1B, Durham, NC 27705
When: May 1, 2015, 6pm - 9pm

You are invited to the Astro Code School launch party! We’ll have light refreshments and opportunities to meet the fine folks at Astro and Caktus Consulting Group. Come learn more about the first full-time code school to specialize in Python and Django on the East Coast!

Please RSVP at the URL above. I hope you can make it!

Astro Code SchoolFULL Class Syllabus for Python & Django Web Engineering

A day-by-day full class syllabus with a lot more information about what you can learn in our Python & Django Web Engineering class is now available. It's now all on its own page. (BTW, we call this class BE 102. BE stands for Back End. It's a formal name to differentiate it from other classes we plan on providing.)

The deadline to apply for our first class is May 11. If you're interested, please head on over to the Apply page and fill out the form.

Thanks!

Caktus GroupEpic Allies Team Members to Speak at Innovate your Cool

The Art of Cool festival is a staple of spring happenings in the Triangle. A three-day festival to present, promote, and preserve jazz and jazz-influenced music, The Art of Cool always promises to be a great time for those interested in music, art, and delicious food from Durham’s many food trucks. But what does music have to do with programming and app development? This year, Caktus Group is helping to sponsor a new portion of the festival called Innovate Your Cool. Innovate Your Cool celebrates the power of cool ideas, advancing innovative thinking by bringing together intelligent people with radically new ideas.

Not only is Caktus helping to sponsor the event, but our very own Digital Health Product Manager NC Nwoko will be giving a lightning talk on “Hacking HIV Stigma with Game Apps” with Assistant Professor at UNC Gillings School of Global Public Health Kate Muessig. Both Kate and NC are part of the team of intelligent people working on the Epic Allies gaming app for young men and teens who are HIV positive.

The Epic Allies project, originally begun in 2012 in collaboration with Duke and UNC, is a gaming app that seeks to make taking HIV medication—as well as creating and maintaining healthy habits—fun. The app uses games and social networking to reinforce drug adherence, thereby lowering viral loads and curbing the spread of HIV. It is an innovative mHealth solution for a high-risk population in critical need, an ideal topic for the Innovate Your Cool conference.

Also present will be the keynote speaker, Wayne Sutton, talking about diversity in the fields of tech and innovation. Other topics of the event will include startup culture, innovation on “Black Wall Street,” community and economic development, and a panel discussion on Code the Dream. Dr. Chris Emdin of #hiphoped will also be leading a hackathon combining science and hip hop and geared towards high school aged students.

Innovate your Cool is Saturday, April 25th from 10 am to 4pm and will be hosted at American Underground. Register today. We can’t wait!

Caktus GroupFrom Intern to Professional Developer

Quite often, people undertake internships at the beginning of their career. For me, it was a mid-career direction change and a leap of faith. In order to facilitate this career move, I took a Rails Engineering class at The Iron Yard in the fall of 2014. I had limited experience as a developer and no experience in Django prior to my internship at Caktus. Because of the structure and support Caktus provided and my enthusiasm for becoming a developer, my internship turned out to be the ideal way for me to make the transition from a novice to a professional developer.

What I Expected

When I chose to make this career shift, I read and thought a great deal about the challenges I might face due to my age and gender. I had minimal apprehensions about coding itself. I like math and languages, and I’m a good problem-solver. My concerns were about how I would navigate a new industry at this point in my life.

While I had general concerns about making this leap, I was sure Caktus was the place I wanted to try it. When I was in code school, I met Caktus employees and saw some of the work they do, particularly SMS apps in the developing world. It was clear that Caktus’ values as a company align well with mine. They are principled, creative people whose apps make significant and sustainable positive impact on people’s lives. I was excited to be part of a team whose work I supported so wholeheartedly.

What Caktus did

Caktus did a number of things, both consciously and subconsciously, to create a welcoming and supportive environment in which I could learn and succeed. The sexism and ageism that are allegedly rampant in tech are notably absent at Caktus. My co-workers understood that I was a capable but inexperienced developer. They were all eager to share their knowledge and help without making any assumptions about me or my abilities. Sharing knowledge cooperatively is standard operating procedure throughout Caktus, and I think it’s one of the reasons the company is so successful.

Something Caktus did, very deliberately, to help me was to provide me with a female mentor, Karen Tracey, Lead Developer and Technical Manager at Caktus. While any of the developers at Caktus would make great mentors, pairing me with a woman who has worked as a developer for her entire career was incredibly valuable. Karen provided me with thoughtful guidance and insight gained from her experience. She was able to guide me in career choices, goal setting, and on navigating an industry that can be very unwelcoming to women. She showed me that I can succeed and be happy in this industry and, more importantly, helped me figure out how. She also helped me strategize about how I can open doors for others in this industry, particularly those from groups underrepresented in tech. That’s a personal goal of mine, and one I know I will find support from Caktus in pursuing.

What Rob did

Caktus provided additional support in the form of another co-worker, Rob Lineberger, who worked very closely and patiently with me on coding itself. We worked on a real client project, and Rob was very good at scaling work for me so that I could experience some challenges and some accomplishments each day. When I was stuck on a problem, Rob intuited what conceptual background I needed to move forward. He walked me through problems so that I would be able to use the skills and knowledge I was acquiring in the future when I was working on a problem on my own. Working with Rob on this project ended up being a series of lessons in the fundamentals of web development that, in the end, gave me a broad and useful toolbox to use after the internship.

What I did

Because the project was well managed, I was able to work on a variety of different pieces in order to get a really good sense of how a Django app works. One piece of which I took significant ownership was a routing app that communicated with the Google Directions API. This app in particular required that I explore JavaScript and jQuery in a confined, practical context, a very useful opportunity for me to expand my skills. Having discrete, challenging, yet attainable assignments like this created an ideal learning experience, and I was able to produce code that was demonstrated to the client.

In addition to this app, I worked with tables, database logic, and testing, all essential to understanding how Django apps work. I gained knowledge and confidence, and I had a lot of fun coding and getting to know my co-workers professionally and personally. The experience allowed me to see myself as a developer, more specifically as a developer at Caktus. Happily, Caktus saw me the same way, and I am thrilled to continue as a full-time developer with this passionate, dedicated, and inspiring group of people.

Josh JohnsonAdvanced Boot Scripting

As covered in previous posts, boot is an all-around useful tool for building clojure applications, but one feature in particular has proven an adjuncti finalum*: boot lets you do clojure scripting. This elevates clojure to the same high productivity of scripting languages (like my personal favorite, Python), but bakes in dependency management and other goodies. This allows the user to build complexity iteratively, in a straightforward manner (versus generating a bunch of boilerplate project code and building a package). This article explores boot scripting further, illustrating how boot can be used to quickly and easily develop and distribute applications and tools. There’s also discussion about getting your jars into Clojars, and setting up a simple bare-minimum Maven repository.

* EDIT: I originally had “interfectorem pluma” to represent “killer feature” in Latin, however thanks to danielsmulewicz in #hoplon reminding me how stupid Google Translate can be, I consulted a Latin->English dictionary and Wikipedia to attempt an uneducated, but better Latin equivalent. I mention it here because it’s all extremely funny, as interfectorem pluma literally translates to something like “feather murderer”. In my amateur approach adjuncti finalum literally translates to something like “characteristic of the ultimate goal”, which, if even remotely correct, is pretty accurate.

Setup

As I’ve covered before, boot is easy to install. All you need is a JDK and the boot executable. Here’s a recap for the Linux and OSX crowd, just to get you going (we’ll assume you already have a JDK set up, have wget, and have sudo privileges):

$ wget https://github.com/boot-clj/boot/releases/download/2.0.0/boot.sh
$ mv boot.sh boot && chmod a+x boot && sudo mv boot /usr/local/bin
$ boot -u

Note that we are also instructing boot to update itself. This is useful if you’ve used boot in the past – the executable and the core boot libraries are distributed separately.

Making Boot Faster

Adding the following to your environment will speed boot startup by a vast amount. You can either run this command in your terminal, or make it permanent by putting this line into ~/.bash_profile or similar other files for your particular shell. See the JVM-Options page in the boot documentation for details, and other ways to incorporate these settings into your projects:

export BOOT_JVM_OPTIONS="-client -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xverify:none"

A Simple Script

For this article, we’ll start with an example of a useful application that grabs the most recent tweet from the Nihilist Arby’s twitter feed. A great addition to your MOTD to de-motivate users overzealous about the fact that they have SSH privileges to your machine.

Twitter API Tokens

Before we begin, set up an application and obtain a consumer key using a twitter account for which you have the username and password. For the sake of security, you may want to limit the application’s access to read only. The tokens can be used to read anything in the account, and any private feeds the account has access to, so be careful.

Quick Note: Development Deviations

Since we’re not building anything right now, or utilizing the task infrastructure, we don’t need a build.boot file. However, to make prototyping a bit easier, it’s useful to create one that will load the dependencies or libraries we’re playing with when we run boot repl:

#!/usr/bin/env boot
(set-env! :dependencies '[[twitter-api "0.7.8"]])

Alternatively, we can pre-load dependencies on the command line when we run the repl task:

$ boot -d twitter-api:0.7.8 repl

The Script: Version 1

For the first pass of the script, we will hard-code our credentials, and not bother taking any command-line arguments. This illustrates what a bare-minimum boot script looks like.

#!/usr/bin/env boot
(set-env! :dependencies '[[twitter-api "0.7.8"]])
  
(use '[twitter.oauth]
     '[twitter.api.restful]
     '[twitter.callbacks]
     '[twitter.callbacks.handlers])

(import '(twitter.callbacks.protocols SyncSingleCallback))

(defn printer
  [response]
    (println (:text (second response))))

(defn -main
  []
  (statuses-user-timeline 
    :oauth-creds 
      (make-oauth-creds 
        "[YOUR API KEY ID]"
        "[YOUR API KEY]")
    :callbacks (SyncSingleCallback. 
                (comp printer response-return-body)
                exception-print
                exception-print) 
    :params 
      {:screen-name "nihilist_arbys"
       :count 2}))

After making the script executable, it can be run from the command line; the result will be the last tweet. I named my script downer, but you can name it whatever you’d like:

$ chmod +x downer
$ ./downer
Rip it to shreds. Put it on a bun. Slather it in horsey sauce. Watch them line up to gorge. Feeding pigs to pigs. Arbys: a flat circle.

You may see some output on stderr about some missing logging libraries. For now, these can be ignored.

Lets take a quick look at the script’s main components:

  • The first 2 lines are what make this a boot script. The set-env! function and general information about environments can be found in the boot documentation.

    First we have the “shebang” line, which tells the operating system what interpreter to use to run the script. In this case, we’re taking advantage of the convention of having env available at /usr/bin/env on most systems, to figure out where boot is. Then we declare our sole dependency on twitter-api.

  • Lines 4-9 are typical use/import statements. In a boot script, a special namespace is created, called boot.user. You can alternatively load external code using the ns form. The example code could be replaced thusly:
    (ns boot.user
      (:use [twitter.oauth]
            [twitter.api.restful]
            [twitter.callbacks]
            [twitter.callbacks.handlers])
      
      (:import [twitter.callbacks.protocols SyncSingleCallback]))
    
  • Lines 11-28 are the “meat” of the program. Boot will execute the first -main function that it finds in a script. For details about what the code is doing, see the twitter-api and the twitter restful api documentation. In essence, the app makes a RESTful call to the twitter API, providing an API key and the necessary parameters. We then use a special callback to print the message from the result of that call.

Distribution/Installation: Mark 1

The real beauty of this boot script is that it is a self-contained entity. We can send it to anyone who has boot and a JDK installed. They can place the script anywhere they like. Dependencies are automatically downloaded the first time it’s run.

A Not-So-Simple Script

Boot scripting provides a natural progression from “just a script” to “full-blown application”.

Boot scripts contain all of the functions needed to run, but this poses some problems:

  • as functionality grows, the script can quickly become unruly
  • because of the way boot encapsulates the running code, it can be difficult to debug.

The solution to both of these problems is to move code into other files, and use the -main function in your boot script to invoke that code.

This is handled quite simply by utilizing boot’s :source-paths environment option, and a little refactoring.

We’ll construct a directory named src, and create a last_tweet.clj file. In it, we’ll declare a new namespace, last-tweet, and move the code there.

src/last_tweet.clj:

(ns last-tweet
  (:use [twitter.oauth]
        [twitter.api.restful]
        [twitter.callbacks]
        [twitter.callbacks.handlers])
  (:import [twitter.callbacks.protocols SyncSingleCallback]))
  
(defn printer
  [response]
    (println (:text (first response))))

(defn last-tweet
  []
  (statuses-user-timeline 
    :oauth-creds 
      (make-oauth-creds 
        "[YOUR API KEY ID]"
        "[YOUR API KEY]")
    :callbacks (SyncSingleCallback. 
                (comp printer response-return-body)
                exception-print
                exception-print) 
    :params 
      {:screen-name "nihilist_arbys"
       :count 1}))

This code is copied from the original boot script, almost verbatim. We’ve just made use of our own namespace, and renamed -main to last-tweet.

Here is the new downer script:

#!/usr/bin/env boot
(set-env! 
  :dependencies '[[twitter-api "0.7.8"]]
  :source-paths #{"src"})

(require '[last-tweet :refer [last-tweet]])

(defn -main
  []
  (last-tweet))

This greatly simplifies our script, and does a better job of separating our concerns. We’ve segregated the application logic from the user interface. We’ve set ourselves up for some additional refactoring to make things more flexible.

We can add many namespaces to the src directory. We can also add other source paths – the :source-paths directive is a hash set.

Now we can refactor the last-tweet/last-tweet function to take credentials and the twitter account to get a tweet from as arguments:

(defn last-tweet
  "Print the last tweet from a given twitter account"
  [account secret-id secret-key]
  (let [creds (make-oauth-creds secret-id secret-key)
        callback (SyncSingleCallback. 
                  (comp printer response-return-body)
                  exception-print
                  exception-print)]
    (statuses-user-timeline 
      :oauth-creds creds
      :callbacks callback 
      :params 
        {:screen-name account
         :count 1})))

We’ve gone from a hard-coded function to one that is more general-purpose.

Now we can utilize boot’s extremely useful defclifn macro and boot’s task option DSL to wrap our function, allowing the user to provide the values on the command-line, creating a proper user interface.

#!/usr/bin/env boot
(set-env! 
  :dependencies '[[twitter-api "0.7.8"]]
  :source-paths #{"src"})

(require 
  '[last-tweet :refer [last-tweet]]
  '[boot.cli :as cli])

(cli/defclifn -main
  "Prints the last tweet from the given account. Requires twitter user app 
  authentication tokens. 
  
  The authentication tokens can be set using the command-line options below, or
  in the TWITTER_KEY and TWITTER_KEY_ID environment variables.
  
  USAGE: downer [options] [twitter account]"
  [k secret-key KEY str "Secret key from Twitter"
   i secret-key-id KEYID str "Secret key id from Twitter"]
  (let [account (get *args* 0 "nihilist_arbys")
        secret-key (or (System/getenv "TWITTER_KEY") (:secret-key *opts*))
        secret-key-id (or (System/getenv "TWITTER_KEY_ID") (:secret-key-id *opts*))]
    
    (if (or (nil? secret-key) (nil? secret-key-id))
      (println "ERROR: you must provide twitter credentials. Try -h")
      (last-tweet
        account
        secret-key-id
        secret-key))))

A few notes:

  • The docstring for the function is used as the “usage” message when the user passes the -h flag.
  • The task option DSL allows a pre-processing step to be defined for each value. In this case we used str, which treats each argument as a string. Many other coercions are available, including keywords, symbols, files (which take a path and return a java.io.File object), and complex compound values; see the sketch after this list.
  • There are two special variables that are provided by the defclifn macro: *opts* and *args*. *opts* contains all of the processed options as defined in the argument list, in the form of a map. *args* contains all other values passed on the command line, as a vector. We use the *args* variable to allow the user an intuitive way to override the default twitter account.
  • The use of environment variables as alternatives to CLI options is illustrated here. It’s very useful for deployment of more complex applications, and keeps sensitive information out of the process list.
  • We’ve added some error handling to give the user a nice message if they neglect to set their credentials.
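As promised above, here is what an additional, coerced option might look like in the argument vector. The tweet-count option is purely hypothetical and is not wired into the rest of the script:

  [k secret-key KEY str "Secret key from Twitter"
   i secret-key-id KEYID str "Secret key id from Twitter"
   n tweet-count COUNT int "Number of tweets to print"]

With the int directive, (:tweet-count *opts*) would already be an integer by the time the function body runs.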

Now we can see command-line output:

$ ./downer
ERROR: you must provide twitter credentials. Try -h

The output of ./downer -h:

$ ./downer -h
Prints the last tweet from the given account. Requires twitter user app
authentication tokens.

The authentication tokens can be set using the command-line options below, or
in the TWITTER_KEY and TWITTER_KEY_ID environment variables.

USAGE: downer [options] [twitter account]

Options:
  -h, --help                 Print this help info.
  -k, --secret-key KEY       Set secret key from Twitter to KEY.
  -i, --secret-key-id KEYID  Set secret key id from Twitter to KEYID.

We set the environment variables, and try getting the last post from a different, possibly more depressing account:

$ export TWITTER_KEY_ID="XXXXXXXXXXXXXXXXX"
$ export TWITTER_KEY="YYYYYYYYYYYYYYYYYYYYYYYYY"
$ ./downer jjmojojjmojo
FINALLY... this just makes getting the sweet, sweet carrot dogs that much easier... http://t.co/TWYer14JH4  @adzerk

Distribution/Installation, Mark 2

Pulling some of the code out into a separate file has made our little script cleaner, but now distributing the file is slightly more complicated, since we have to provide the script access to the code we factored out.

There are several ways to handle this:

  • Distribute the source code via git, or a tarball. The :source-paths environment parameter can be changed if needed to point to a proper location such as /opt/downer, or /usr/local/lib/downer.
  • Build a library jar file. The jar file can be installed into a local maven repository, or a public one like clojars.

The first option is sub-optimal. It can be made somewhat easier with help from fpm, but it’s still a bit cumbersome. The real beauty of boot scripting is that we don’t have to bother with complex installation procedures.

We can leverage the power of java jar files (which are just zip files under the hood) to contain our source code and other artifacts.

This makes the jar file the best path. Once the jar is installed into a maven repository the script can reach, the script can once again be distributed as a simple stand-alone text file.

We can use boot for this. That’s what it does.

Compiling A Library Jar

For a jar file to be installable via maven (which is what boot and the clojure ecosystem use under the hood), it must contain a pom.xml file. This file declares the project version, the dependencies, and other metadata.

We can construct a jar file from our source code just using the command line, or we can wrap the build up as a custom task in a build.boot file (sketched below).
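The second approach might look something like the following: a minimal sketch of a custom task in build.boot, assuming the dependencies and :source-paths are declared there with set-env! as usual:

(deftask build
  "Build the last-tweet library jar."
  []
  (comp (aot :all true)
        (pom :project 'last-tweet
             :version "1.0.0")
        (jar)))

It would be invoked with boot build. For now, though, we’ll stick with the command line.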

Here’s the basic command to get our last tweet jar:

$ boot -d org.clojure/clojure:1.6.0 \
     -d boot/core:2.0.0-rc12 \
     -d twitter-api:0.7.8 \
     -s src/ \
     aot -a \
     pom -p last-tweet -v 1.0.0 \
     jar

Looking in the target directory, we can see our jar file:

$ ls target/*.jar
last-tweet-1.0.0.jar

Now that we have a jar file, we have several options for distribution; each takes advantage of the Apache Maven ecosystem:

  1. We can send the jar file along with the script to the user, and they can install it with boot.
  2. We can set up our own maven repository and upload the jar to that, then provide access to the user.
  3. We can send the jar file to a public repository like clojars.
  4. We can upload the file to S3, and provide credentials to our user.

Wait, Why Not Distribute A Self-Contained Jar?

We could move the CLI logic into our last-tweet namespace, and get rid of the boot script altogether. We could add the “uber” task and bundle all of our dependencies into a single, stand-alone, self-contained jar file that could be distributed (via maven as described above) without any external dependencies besides a JVM (the user won’t even need boot or clojure).

This process is covered in some detail here.

There’s nothing inherently wrong with this practice. In fact, it’s a good idea to seriously consider it when deciding how to deploy an application.

But when writing boot scripts, it can be very useful to allow the user to change things in the script, or encourage them to write new scripts that use the underlying code in new ways.

It helps to start looking at a boot script much like we would any other shell script – consider composing calls to external code instead of implementing and containing it internally.

This concept coupled with the “it just works” approach of boot makes distributing core code as library dependencies of particular interest. You can make changes to your library code and distribute it once, and when your users run their boot script it will automatically update.

On the other side of that coin, you have less worry about breaking existing scripts “in the wild”.  Users can pin the version of your library to a specific number and avoid automatic updates altogether.

It amounts to an extremely elegant way of constructing tools.

Script Modifications

To use an external jar instead of our bundled-in code, we just need to omit the :source-paths environment directive, and add our jar into the :dependencies list.

Here are the changes to the (set-env!) call:

(set-env! 
  :dependencies '[[twitter-api "0.7.8"]
                  [last-tweet "LATEST"]])

Note that we’re not pinning the version to a particular release; instead we specify the special version string LATEST to signal that we always want the newest available. This is helpful when the library is updated frequently while the boot script is not.

However, be careful not to rely on this too heavily. If the API in the library falls too far out of sync with the script, users will get errors.
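A user who prefers stability over automatic updates can simply pin to an exact release instead:

(set-env! 
  :dependencies '[[twitter-api "0.7.8"]
                  [last-tweet "1.0.0"]])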

Installing A Jar With Boot

Boot provides the install task, which can install jars built with a pipeline of tasks, or a specific jar with the -f option.

$ boot install -f target/last-tweet-1.0.0.jar
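Alternatively, since install can also pick up jars built by a pipeline of tasks, we could chain it onto the end of the build pipeline. Assuming the environment settings live in a build.boot file (as we set up later), that should look something like:

$ boot aot pom jar install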

Now we can run our script and it will use the locally installed jar:

$ ./downer jjmojojjmojo
RT @adzerk: 3 ways for vendors to keep mobile ad tech lean - "be easy to work with" should be a no brainer http://t.co/P3yrKH74WW @blp101 v…

This is the easiest way to get jars working with boot, but it’s the least flexible. Every time you make a change to your code, you need to create a new version of your jar and distribute it to all of your users, and they will need to install it.

Uploading To Clojars

Clojars provides a public maven repository for the greater Clojure community.

There isn’t much in the way of documentation for using boot with clojars, but there is a tutorial, and a handy tool called bootlaces that provides a couple of wrapper boot tasks to make the process more seamless.

Alas, neither of these things goes far enough to help the brand new boot user who wants to make use of clojars for their libraries. Very little is explained, and the tutorial is leiningen-centric.

NOTE: There is also an excellent write-up of the process (also leiningen-centric, but it covers GPG and signing your jars) by Michael Peterson over at ThornyDev, including links to the rationale for signing packages.

So let’s go over the process in detail, from the ground up. Admittedly, this is probably best left for a separate blog post, but clojars is a great service that any clojurist should be equipped to participate in. Once you’ve got a handle on how it works “the hard way”, you are free to use bootlaces or derive your own workflow. It also slots in nicely with the next section, where we build our own maven repository.

In preparation for pushing your jar to clojars, you’ll first need to install GPG.

GPG will be used to sign jar files to ensure they are not tampered with by malicious third parties.

NOTE: For a comprehensive introduction, see The GPG Mini HOWTO.

GPG can be installed via the downloads located at gnupg.org, or using your preferred package manager.

OS X users can use homebrew (brew install gpg).

We’ll need to generate our key, if we’ve never used GPG before:

$ gpg --gen-key

You will be asked many questions. For most, you can accept the default suggested by gpg (press ENTER). Take note of the e-mail address that you use for your key; it will be the identifier for your new key in your keyring.

NOTE: It’s a good idea to specify a pass-phrase. If you decide not to, you can just enter an empty pass-phrase when prompted.

Now that we’ve generated our key, we can see it using gpg --list-keys:

$ gpg --list-keys
/Users/jj/.gnupg/pubring.gpg
----------------------------
pub   2048R/5A36EA7C 2015-05-21
uid                  Josh Johnson <[THE EMAIL YOU PROVIDED]>
sub   2048R/6C662B47 2015-05-21

Next, we need to sign up for a clojars account. Ignore the SSH key entry. We will need to generate a text-based “ASCII-armored” version of our public GPG key to paste into the corresponding text box in the form. This is accomplished with the gpg command:

$ gpg --armor --export [THE EMAIL YOU PROVIDED]
-----BEGIN PGP PUBLIC KEY BLOCK-----
[KEY CONTENT HERE]
-----END PGP PUBLIC KEY BLOCK-----

Copy everything from -----BEGIN PGP PUBLIC KEY BLOCK----- to -----END PGP PUBLIC KEY BLOCK-----, inclusive.

Once you have your account set up, the next step is to add a new repository to our build.boot file:

(set-env! :dependencies '[[twitter-api "0.7.8"]]
          :repositories 
            #(conj % ["clojars-upload" 
                      {:url "https://clojars.org/repo"
                       :username "[YOUR USERNAME]"
                       :password "[YOUR PASSWORD]"}]))

WARNING: You will want to source your username and password from an environment variable, or some other place, like a local config file. We’re putting them here for the sake of simplicity, but this is not a sound practice!

We’ve provided a function to set the environment property :repositories. This allows us to update the list of repositories instead of replacing it.
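To keep the credentials out of the file itself, one option is to read them from environment variables. A minimal sketch (the CLOJARS_USER and CLOJARS_PASS variable names are arbitrary):

(set-env! :dependencies '[[twitter-api "0.7.8"]]
          :repositories 
            #(conj % ["clojars-upload" 
                      {:url "https://clojars.org/repo"
                       :username (System/getenv "CLOJARS_USER")
                       :password (System/getenv "CLOJARS_PASS")}]))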

We’re ready to upload our jar. This can be done with the push boot task:

$ boot push -f target/last-tweet-1.0.0.jar -g -k [THE EMAIL FOR YOUR KEY] -r clojars-upload

Taking a look at clojars, we will see our new jar file has been uploaded!

However, it’s missing a lot of key information – things that weren’t so important when we were building a jar for our own use, but are very important when distributing software to a public repository.

In the next section, we’ll fix this, but also use the power of boot to make our workflow easier.

Adding Better Metadata, Fleshing Out Our build.boot

We’ve constructed a library jar, and have successfully uploaded it to clojars. However, at this point we cannot build and distribute boot scripts that depend on our library. Clojars has a “promotion” process that protects users from seeing jars that do not have essential metadata.

Let’s rebuild our jar with a URL, a license, and a proper description:

$ boot -d org.clojure/clojure:1.6.0 \
     -d boot/core:2.0.0-rc12 \
     -d twitter-api:0.7.8 \
     -s src/ \
     aot -a \
     pom -p last-tweet \
         -v 1.0.0 \
         -u "https://lionfacelemonface.wordpress.com/2015/04/11/advanced-boot-scripting/" \
         -d "Demo project for advanced boot scripting blog post" \
     jar

Now, this is getting a bit (more) unwieldy. It’s better if we put this information into our build.boot file. We’ll still use the command line for now, as opposed to building our own boot tasks, but we’ll set these properties as default options. This way, we are free to construct our build pipeline as we see fit, but we don’t have to specify all of these lengthy parameters on the command line.

We will be able to override these values if we desire, using command line arguments as before.

(set-env! :dependencies 
            '[[twitter-api "0.7.8"]
              [org.clojure/clojure "1.6.0"]
              [boot/core "2.0.0"]]
          :source-paths #{"src"}
          :repositories 
            #(conj % ["clojars-upload" 
                      {:url "https://clojars.org/repo"
                       :username "[YOUR USERNAME]"
                       :password "[YOUR PASSWORD]"}]))

(task-options!
  pom {:project 'last-tweet
       :url "https://lionfacelemonface.wordpress.com/2015/04/11/advanced-boot-scripting/"
       :version "1.0.1"
       :description "Demo project for advanced boot scripting blog post."
       :license {"MIT License" "http://opensource.org/licenses/mit-license.php"}}
  aot {:all true}
  push {:gpg-sign true
        :repo "clojars-upload"
        :gpg-user-id "[EMAIL ASSOCIATED WITH YOUR KEY]"
        :gpg-passphrase "[YOUR PASSPHRASE]"})

This is a lot of stuff, so let’s walk through the new concepts line by line:

Lines 1-4 invoke the set-env! function to declare the dependencies we need included in our jar. These correspond to the -d options in the command line we used earlier.

Line 5 specifies the source directories. We previously specified our source directory with the -s command-line option.

Lines 6-10 update the repositories list with our clojars destination and credentials, as we implemented earlier.

For general explanation of these environment modifying lines, check out Boot Environment, in the Boot Wiki.

The rest of the file represents settings that are passed to boot tasks.

Generally speaking, these correspond 1:1 with the command line options, but are expected to be pre-processed into clojure data objects.

You can figure out the exact key to set for each value using the -h switch. For example, the help text for the pom task looks like this:

$ boot pom -h
Create project pom.xml file.

The project and version must be specified to make a pom.xml.

Options:
  -h, --help              Print this help info.
  -p, --project SYM       Set the project id (eg. foo/bar) to SYM.
  -v, --version VER       Set the project version to VER.
  -d, --description DESC  Set the project description to DESC.
  -u, --url URL           Set the project homepage url to URL.
  -l, --license NAME:URL  Conj [NAME URL] onto the project license map.
  -s, --scm KEY=VAL       Conj [KEY VAL] onto the project scm map (KEY in url, tag).

And we can see that the -d command line option corresponds to the :description key passed to task-options!.

Of particular interest to us are the --project and --license options – these are not specified as simple strings.

The --project option is converted to a clojure symbol, as hinted at by the SYM placeholder variable. To verify this, we need to look at the source for the task, and read the task-option DSL:

 "Create project pom.xml file.
  The project and version must be specified to make a pom.xml."

  [p project SYM      sym       "The project id (eg. foo/bar)."
   v version VER      str       "The project version."
   d description DESC str       "The project description."
   u url URL          str       "The project homepage url."
   l license NAME:URL {str str} "The project license map."
   s scm KEY=VAL      {kw str}  "The project scm map (KEY in url, tag)."]

Here we see, in the fourth column, the handling directive for each command line option. In the case of the --project option, the sym specification casts the value from the command line into a symbol.

The --license option is specified as {str str}, indicating that it is a mapping. On the command line, a colon is used to separate the key of the map from its value. Additional --license command line options will conjoin into a single map. As such, in task-options!, a map is expected.
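For example, the MIT license entry we used in build.boot above could be passed on the command line as:

$ boot pom -p last-tweet -v 1.0.1 -l "MIT License:http://opensource.org/licenses/mit-license.php"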

NOTE: For a comprehensive explanation of the various options, see the Task Options DSL page in the Boot Wiki.

The rest of the options are simply strings. A few, such as the -a, or :all parameter to the aot task, are flags, and are specified with a boolean value.

One last note: the project version has to be incremented every time we change the metadata in our jar file. The output jar will be named accordingly, and if you try to upload a jar with the same version as a previous upload, it will fail with an “Access Denied” error.

Now we can rebuild and redeploy our jar. Since we’re chaining the boot tasks, the push task knows to look for jar files to upload in the working file set, so we don’t have to specify the path.

$ boot aot pom jar push

These tasks can be simply composed into a custom boot task. This is left as an exercise for the reader, but with the following caveat:

Once you’ve uploaded a jar to clojars, there’s no automatic or simple way to get it removed.

You can open an issue in github to ask for a deletion (details here), but it’s considered bad form.

As such, please be careful what you upload! Make sure you’re running tests and verifying your jar files before you push them out for mass consumption.

It’s a good idea to work those sorts of checks into any custom tasks that you put together.

Building Your Own Maven Repository

Maven handles resolving dependencies in the Java ecosystem. In maven terms, a repository is where you store artifacts, chiefly jar files. It’s what boot uses under the hood to resolve and store dependencies.

Maven repositories are relatively simple. If you’ve been using boot, you already have one, located in ~/.m2.

If you take a look you’ll see how the files are laid out:

$ ls -la ~/.m2/repository/
total 0
drwxr-xr-x  41 jj  staff  1394 Apr  5 10:50 .
drwxr-xr-x   3 jj  staff   102 Apr  1 09:46 ..
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 alandipert
drwxr-xr-x   7 jj  staff   238 Apr  1 09:46 boot
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 byte-streams
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 cheshire
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 clj-http
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 clj-http-lite
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 clj-jgit
drwxr-xr-x   3 jj  staff   102 Apr  1 10:49 clj-oauth
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 clj-stacktrace
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 clj-tuple
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 clj-yaml
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 clojure-complete
drwxr-xr-x   7 jj  staff   238 Apr  1 10:49 com
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 commons-codec
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 commons-fileupload
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 commons-io
drwxr-xr-x   3 jj  staff   102 Apr  1 09:46 commons-logging
drwxr-xr-x   3 jj  staff   102 Apr  1 10:49 crouton
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 fs
drwxr-xr-x   3 jj  staff   102 Apr  1 10:49 http
drwxr-xr-x   4 jj  staff   136 Apr  1 12:46 io
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 javax
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 javazoom
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 jline
drwxr-xr-x   3 jj  staff   102 Apr  5 10:50 last-tweet
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 mvxcvi
drwxr-xr-x   4 jj  staff   136 Apr  1 09:47 net
drwxr-xr-x   3 jj  staff   102 Apr  3 08:20 opencv
drwxr-xr-x   3 jj  staff   102 Apr  3 09:52 opencv-native
drwxr-xr-x  14 jj  staff   476 Apr  1 10:49 org
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 potemkin
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 primitive-math
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 reply
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 riddley
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 ring
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 slingshot
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 tigris
drwxr-xr-x   3 jj  staff   102 Apr  1 09:47 trptcolin
drwxr-xr-x   3 jj  staff   102 Apr  1 10:49 twitter-api

Note the last-tweet directory – this is where boot put our jar file when we installed it in the last section.

A maven repository is just this directory structure, made accessible over any of a number of protocols: the file system, HTTP, WebDAV, even directly from S3.

We’ll build a repository that we write to via the file system (we could also use SFTP if this were a remote system), and provide read-only access over HTTP.

Boot doesn’t currently contain any tools to do this sort of work, so we’ll need to install maven.

This is fairly simple: we just need to download the tarball and unpack it. We can then put its bin directory on our $PATH so it’s available (note this will need to go into your .bash_profile or similar location to make the change “stick”):

$ wget http://apache.mirrors.hoobly.com/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
$ tar -xvf apache-maven-3.3.3-bin.tar.gz
$ export PATH="$PWD/apache-maven-3.3.3/bin:$PATH"
$ which mvn
...path to the mvn executable

See the download page for alternative mirrors and formats.

If you are using OS X, you can install maven via homebrew:

$ brew install maven

To construct a new maven repository, we just need to install our jar to it:

$ mvn deploy:deploy-file \
    -DpomFile=target/META-INF/maven/last-tweet/last-tweet/pom.xml \
    -Dfile=target/last-tweet-1.0.0.jar \
    -DrepositoryId=local-repo \
    -Durl="file:///$PWD/my-maven-repo"

As a first pass, we can use the file:// protocol to load the jar from our new repository. We’ll need to remove the file from our local repository first:

$ rm -rf ~/.m2/repository/last-tweet

Then we can add the new repository to our downer script:

(set-env! 
  :dependencies '[[twitter-api "0.7.8"]
                  [last-tweet "LATEST"]]
  :repositories #(conj % '["my-maven-repo" {:url "file://[full-path-to-your-repo]"}]))

We use conj here to preserve the baked-in defaults.

When we run downer now, we’ll see an ever-so-slight pause and a blank line to indicate the jar is being found and copied. We can then verify that it was used by checking ~/.m2/repository:

$ ./downer
$ ls -l ~/.m2/repository
...
last-tweet
...

To share this repository, we have many options, but we’re going to do the simplest thing for our introductory purposes: set up nginx to serve our repository to the public.

Note: Any web server will work, as long as it generates directory listings.

First, we need to install nginx. There are packages available for most operating systems, and it’s in homebrew for folks using OS X.

Since the location of the nginx configuration varies by operating system, we’ll write a bare-minimum configuration file, nginx.conf, and pass it to nginx explicitly:

events {
    worker_connections  1024;
}

http {
    default_type  application/octet-stream;
    
    server {
        
        listen 8080;
        
        location / {
            root [FULL PATH TO YOUR REPOSITORY];
            autoindex on;
        }
    }
} 

Note: You will want to fine-tune the web server for a “production” deployment; this is just a bare-minimum example to get you going.

We can then start up nginx:

$ nginx -c nginx.conf

Nginx will run in the background. Now you can open a browser to http://localhost:8080/, and see your repository.

We can now configure the boot script to use this repository in the same manner we used the file path earlier:

(set-env! 
  :dependencies '[[twitter-api "0.7.8"]
                  [last-tweet "LATEST"]]
  :repositories #(conj % '["my-maven-repo" {:url "http://localhost:8080"}]))

And we can test it in the same way as before:

$ rm -rf ~/.m2/repository/last-tweet
$ ./downer
$ ls -l ~/.m2/repository
...
last-tweet
...

To shut down nginx, we use the -s switch:

$ nginx -s stop

From here, you can construct fairly complex maven systems. Maven supports HTTP authentication, so you can present your repository to the world and still limit access. You can use WebDAV to make the HTTP side of the repository readable and writable.
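In a boot script, such a password-protected repository would be declared just as before, with credentials added (the URL and credentials here are placeholders):

(set-env! 
  :repositories #(conj % ["my-maven-repo" 
                          {:url "http://repo.example.com/"
                           :username "reader"
                           :password "[PASSWORD]"}]))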

Instead of an HTTP front-end, you can stick with the file:// protocol, put the repository on a shared drive, and ensure each user has it mounted at the same location.

SFTP is an option for read/write access to a remote system, using SSH for authentication (it works with keys).


Astro Code SchoolAstro at PyCon 2015

Hello from Montréal, QC! We're here participating in the annual North American 2015 Python Conference.

So far Caleb has helped out at the Django Girls Workshop with three other Caktus Group colleagues.

Caleb teaching at the Django Girls workshop at PyCon2015

I went to the PyCon Education Summit. It was great to see folks from around the world, including North Carolina, share cutting-edge education ideas. There were lots of amazing K-12 and university examples of how Python is being used to teach programming.

Caleb teaching at Django Girls Workshop at PyCon 2015

We're now hanging out at the Expo telling folks from around the world about Durham and our school. So far I've met people from Poland, Canada, India, Hawaii, and lots of US States. Very fun to represent for North Carolina.

Frank WierzbickiJython 2.7 release candidate 2 available!

On behalf of the Jython development team, I'm pleased to announce that the second release candidate of Jython 2.7 is available! We've now fixed the Windows installer issues from rc1. I'd like to thank Amobee for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

This release is being hosted at maven central. There are three main distributions. In order of popularity:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Frank WierzbickiJython 2.7 release candidate 1 available!

[Update: on Windows machines the installer shows an error at the end. The installer needs to be closed manually, but then the install should still work. We will fix this for rc2.]

On behalf of the Jython development team, I'm pleased to announce that the first release candidate of Jython 2.7 is available! We're getting very close to a real release finally! I'd like to thank Amobee for sponsoring my work on Jython. I'd also like to thank the many contributors to Jython.

Jython 2.7rc1 brings us up to language level compatibility with the 2.7 version of CPython. We have focused largely on CPython compatibility, and so this release of Jython can run more pure Python apps than any previous release. Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

This release is being hosted at maven central. There are three main distributions. In order of popularity:
To see all of the files available including checksums, go here and navigate to the appropriate distribution and version.

Caktus GroupHow to Find Cakti at PyCon 2015

We’re very excited for PyCon 2015 and can’t wait for the fun to begin. Working on the PyCon website stoked our excitement early, so it’s almost surreal that PyCon is finally here. With an overwhelming number of great events, we wanted to highlight ones Caktus and our staff will be taking part in. Below you’ll find a list of where we’ll be each day. Please join us!

Wednesday: Building an SMS App with Django (3:30pm)

Ever wanted to build an SMS app? It’s UNICEF’s tool of choice for reaching the most remote and under-resourced areas in the world. Our team (Mark Lavin, David Ray, Caleb Smith) can walk you through the process.

Thursday: DjangoGirls Workshop (9am)

We’re a proud DjangoGirls sponsor. For this workshop, Mark Lavin and Karen Tracey, the leaders of our development team, David Ray, and Astro Code School lead instructor Caleb Smith, will act as TAs and help participants create their first Django app.

Thursday: O’Reilly Book Signing and Opening Reception (6pm)

Mark Lavin will be signing 25 free(!) copies of his O’Reilly book, Lightweight Django. Stop on by while you can; it’s first come, first served. You’ll also find many of us at the reception itself.

Friday - Saturday: Tradeshow

Stop by our tradeshow booth. You can also visit our latest venture, Astro Code School at their own booth. Bring a friend and have a showdown with our Ultimate Tic Tac Toe game (you’ll get some pretty sweet stickers too). We’ll also have daily giveaways like Mark’s Lightweight Django and some mini-quadcopters.

Saturday: PyLadies Auction (6pm)

There’s some fantastic art and other items being offered during the PyLadies auction. Caktus is contributing a framed piece showing early concept art and sketches for PyCon 2015.

Sunday: Job Fair

Do you want to join our team and become a part of the nation’s largest Django firm? Then please come by our booth at the job fair. We’d love to talk to you about ways you can grow with Caktus.

The Whole Thing: Outings with Duckling (24/7)

We cannot neglect to mention the giant duck. You’ll find our duck, nicknamed Quaktus, standing next to the "Meet here for Outings" sign. We built Duckling.us to help people create impromptu get togethers during PyCon. You can use the app to figure out where everyone is going for dinner, drinks, etc. and join in the fun.

Astro Code SchoolPyCon 2015 : See You in Montreal!

Caleb Smith and I are going to Montréal, Quebec, Canada next week for PyCon 2015! It's a huge conference all about the open-source Python programming language. Python is a big part of what we teach here at Astro Code School.

We’ll be at booth #613 in Exhibit Hall 210 in the Palais des Congres. Please come look for us. We’ll have the usual swag like t-shirts for women and men. PLUS we’ll have the very addictive game Ultimate Tic Tac Toe. Play against one another on our big touch screen. It’s harder than it sounds. Will you be an Ultimate Tic Tac Toe champion? Can we win more games than Caktus Group?

Caleb is co-presenting with our Caktus colleagues on Wednesday, April 8 from 3:30 p.m. to 5 p.m. on Building SMS Applications with Django. He’s also coaching at the Django Girls Workshop April 9. No programming experience required. Just bring a laptop and some energy to learn. You’ll be going through the awesome Django Girls tutorial.

I’ll be attending the Python Education Summit. I’m really looking forward to learning more from other professional and amateur python educators. The talk schedule looks nice!

Are you going to PyCon 2015? What parts of PyCon 2015 are you looking forward to? Tutorial Days, Lightning Talks, or Dev Sprints? Let us know by tweeting at us @AstroCodeschool.

Caktus GroupDiamondHacks 2015 Recap

Image via Diamond Hacks Facebook Page

This past weekend, Technical Director Mark Lavin came out to support DiamondHacks, NCSU’s first ever hackathon and conference event for women interested in computer science. Not only is NCSU Mark’s alma mater, but he’s also a strong supporter of co-organizer Girl Develop It RDU (GDI), of which Caktus is an official sponsor.

The weekend’s events began Saturday with nervous excitement as Facebook developer Erin Summers took the stage for her keynote address. Most memorable for Mark was a moment towards the end of Summers’ talk, in which she called for collaboration between neighboring audience members. It was at this point Mark first realized he was the only male in the room—a unique experience for a male developer. “I’m sure there’s lots of women who have felt the way I did,” Mark commented. The moment not only flipped the norms of a traditionally male-dominated field, but also filled Mark with a renewed appreciation for the importance of active inclusivity in the tech industry.

Aside from helping fill swag bags for the weekend’s participants and attending several of the talks, Mark gave a lightning talk, “Python and Django: Web Development Batteries Included.” Knowing attendees would be thinking about their upcoming projects and which language to build in, Mark chose to advocate for Django (he’s a little biased as the co-author of Lightweight Django). He highlighted the overall uses of Python as well as the extensiveness of its standard library. According to Mark, “Python comes with a lot of built-in functionality,” so it’s a great coding language for beginning developers. Mark also covered the basic Django view and model in his talk, emphasizing the features that make Django a complete framework—an excellent choice for a hackathon.

Since supporting diversity in the tech industry was a key focus of the day, Mark also wanted to emphasize the inclusiveness of the Python and Django communities. From the diversity statement on Python’s website, to Django’s code of conduct, the Python community and Django subcommunity have been at the forefront of advocating for diversity and inclusion in the tech world. For Mark, this element has and continues to be “important for the growth of [the] language,” and has contributed to the vitality of these communities.

All in all, the weekend was a great success, with especially memorable talks given by speakers working for Google, Trinket, and Hirease. Mark was impressed with the students’ enthusiasm and focus and lingered after both iterations of his talk to speak with attendees about their careers and interests. The next day he was equally impressed by the range and talent behind Sunday’s hackathon projects as he followed the progress of various teams on Twitter. “These are the students [who] are going to help define what’s next,” he remarked.

Can’t get enough of Python, Django, and the talented Mark Lavin? Neither can we. Mark will be leading a workshop at PyCon on Building SMS Applications with Django along with fellow Cakti David Ray and our code school’s lead instructor, Caleb Smith. We’ll hope to see you there!

Tim HopperParsley the Recipe Parser

A few years ago, I created a Github repo with only a readme for a project I was hoping to start. The project was a tool for parsing ingredients from cooking recipes. I never did start this project, and I just decided to delete the Github repository. What follows is the README file I had written.

The parser should take in an unstructured ingredient recipe string and output a structured version of the ingredient.

In particular, we follow the structure described by Rahul Agarwal and Kevin Miller in a Stanford CS 224n class project. They identify four aspects of an ingredient (bullets quoted directly):

  • AMOUNT: Defines the quantity of some ingredient. Does not refer to lengths of time, sizes of objects, etc.
  • UNIT: Specifies the unit of measure of an ingredient. Examples include "cup", "tablespoons", as well as non-standard measures such as "pinch".
  • INGREDIENT: The main food word of an item that is mentioned in the ingredient list. Groups or transformations of sets of ingredients (such as “dough”) do not fall into this category.
  • DESCRIPTION: A word or phrase that modifies the type of food mentioned, such as the word "chopped".

For example, the ingredient string

1 teaspoon finely chopped, peeled fresh ginger

will be parsed as follows:

  • AMOUNT: 1
  • UNIT : tsp
  • INGREDIENT: ginger
  • DESCRIPTION: finely chopped, peeled

and

2 (11 ounce) can mandarin orange segments, drained

will/might be parsed as:

  • AMOUNT: 22
  • UNIT : oz
  • INGREDIENT: mandarin orange segments
  • DESCRIPTION: drained

Footnotes