A planet of blogs from our members...

Caktus Group: Commuter Benefits and Encouraging Sustainable Commuting

Growth for Durham has meant a lot of great things for Caktus, from an expanding pool of tech talent to an increased interest in civic-minded tech solutions to shape the evolving community. This growth has also brought logistical challenges. Most recently, this meant providing adequate commuter support to our employees in a city whose transportation infrastructure is still nascent.

With limited available parking and an ever growing staff, we were unsure how best to tackle this problem. Rather than find additional parking where it didn’t yet exist, we began instead to investigate how we could potentially change commuting culture itself and create a more sustainable pathway for continued growth.

After careful examination and research, Caktus decided to vastly expand our commuter benefits. Beyond simply offering subsidized parking, starting in September 2017 Cakti will have a range of benefits to choose from. Employees who opt out of the parking benefit can instead receive stipends for biking-related expenses or pre-tax contributions to help cover the cost of public transportation.

As part of this expansion, Caktus has also opted to build ties with local businesses and programs that offer additional perks to green commuters. Employees who choose to bike to work will become automatic members of Bicycle Benefits, an independent group that works with local businesses to offer perks and discounts to local bikers. We’ve also partnered with GoTriangle, the public face of the Research Triangle Public Transportation Authority, and their Emergency Ride Home and GoPerks programs to offer further aid, perks, and rewards to employees who choose greener commuting options.

By offering commuter benefits along with additional perks and rewards for green commuting, we hope to transition a number of our staff to greener modes of transportation. Not only will this provide a more sustainable growth plan in Durham's increasingly urban environment, but it also encourages us to live up to what we value most as a company. We strive to do what's best for the community, whether that means supporting our employees or answering a local call for more sustainable commuting. We hope this will be another step in that direction.

Interested in working for Caktus? Head to our Careers page to view our open positions.

Caktus Group: Caktus 10th Anniversary Event

Caktus turned ten this year and we recently celebrated with a party at our office in Durham, NC. We wouldn’t be where we are today without our employees, clients, family, and friends, so this wasn’t just a celebration of Caktus. It was a celebration of the relationships the company is built on.

Caktus party guests having a good time.

The last five years

Since our last milestone birthday celebration five years ago, Caktus has more than doubled in size, growing from 15 employees to 30-plus. Co-founder and CTO Colin Copeland was honored with the Triangle Business Journal’s 40 Under 40 award. The company itself moved from Carrboro to the historic building we now own in downtown Durham, where we’re pleased to be able to host local tech meetups when we’re not using it for our own special events.

Guests at the Caktus 10th anniversary party listening to a speech in the Tech Space.

In our work, we’ve continued our mission to use technology for good, building the world’s first SMS-based voter registration system, beginning the Epic Allies project to improve outcomes for young men with HIV/AIDS, and launching the Open Data Policing website for tracking police stop data.

Celebrations

Co-founder and CEO Tobias McNulty gave a speech to mark the occasion, sharing a view of how far the company has come. There was also enthusiasm for what we can achieve at Caktus in the next ten years - and those to come after.

Caktus co-founders Tobias McNulty and Colin Copeland.

As part of the celebrations we had food and birthday cupcakes, as well as prize giveaways for our team. Family and friends of Caktus employees joined in on the fun and games.

Caktus employees playing a game at the 10th-anniversary party.

We welcomed several clients as well, and we thank them along with all of those who have worked with us for giving us the opportunity to create meaningful tools that help people. A number of our clients have been with us for years, and we’re proud to have such a good relationship with those who trust us to build solutions for them.

Looking forward

The communities we’re a part of and the individuals in those communities have always been central to our focus. Growing sharp web apps is what we do, but it’s the people who build them and those we build them for that matter. With that in mind, we look forward to continuing to develop our internal initiatives around diverse representation, transparency and fair pay. We are also dedicated to continuing support of the various communities we are a part of, whether technical or geographic, through our charitable giving initiatives, conference and meetup sponsorships, open source contributions, and requiring a code of conduct to be in place and enforced at events we sponsor or attend.

Supportive, inclusive, and welcoming communities helped Caktus grow to where we are today, and we’re honored to be in a position to give back as we celebrate our tenth anniversary.

Credit for all photos: Madeline Gray.

Caktus Group: False Peaks and Temporary Code

In the day-to-day work of building new software and maintaining old software, we can easily lose sight of the bigger picture. I think we can find perspective when we step back and walk through the evolution of a single piece of software.

For example: first, you are asked for a simple slideshow to showcase a few images handed to you. Just five images and the images won't change.

An easy request! It only takes you a short time to build with some simple jQuery. You show the client, they approve it. You deploy it to production and call it a day.

This example, and all the examples in this blog post, are interactive. Try it out!

The next week, your client comes back with a new request. They don't think users realize the slideshow can be navigated, so they ask for previews of the next and previous images to use for navigation:

Wireframe of the slideshow with previous and next image previews used for navigation.

So you jump in. It’s an easy enough addition to the pretty simple slideshow widget you've already built. You slap in two images, position them just so, and add a few more lines of jQuery to bind navigation to them. Once again, it’s a quick review with the client and you ship it off to production.

But the next day, there's a bug report from a user. The slideshow works, the thumbnails show the right image, and the new previous/next preview images navigate correctly. However, the features don't work together, because the thumbnail navigation doesn’t change the new left and right preview images you added.

The client also wants the new navigation to act like a carousel and animate.

Now they want to add more photos.

And they want to be able to add, reorder, and change the photos on their own. That will break all the assumptions you made based on a fixed number of photos.

Every step along the way, you added one more layer. One small request at a time, the overall complexity grew as those requests added up. You took each step with the most efficient option you could find, but somehow you ended up with costly bloat. How does this keep happening?

False peaks.

At each step, you took what looked like the best next step. Ultimately, those steps didn't take you where you needed to go.

We don't subscribe to waterfall development practices at Caktus. Agile is a good choice, but as we work through iterations, how do we bridge across sprints to see the larger picture of our path? And how do we make decisions about technical debt and other concerns whose impact extends beyond a single sprint?

Some of the code we write is there to get us somewhere else. Maybe you need to build a prototype to understand the original problem before you can tackle it with a final solution. Sometimes you need to stub out one area of code because your current task is another focus, and you'll come back to finish it or replace it in a future ticket. Many disciplines have these kinds of temporary artifacts, from the scaffolding built by construction crews to sketches artists make before paintings.

Maybe it is harder for software developers because we often don't know what code is temporary ahead of time. A construction crew would never say, "Now that we've built the roof, we really don't need those walls anymore!" but this is what it can often feel like when we refactor or tear down pieces of a project we worked hard on, even if it really is for the best.

My suggestion: become comfortable with tearing down code as part of the iterative process! Extreme Programming calls this rule "Relentlessly Refactor".

We need to think about some of the features we implement as prototypes, even if they've been shipped to end users. We won't always know that new code or features are stop-gaps or prototypes when we're building them, but we may realize later, when more information comes to light about where those features need to go next, that they were so all along.

Falling into the trap of thinking the work done in sprints is inherently additive is common, but destructive.

If each sprint is about "adding value", we tend to develop a bias towards the addition of artifacts. We consider modification to happen as part of bug fixes, seeing it as correcting a mistake in earlier requirements or code, or as changes stemming from evolving or misunderstood requirements. We may hold a bias against removing artifacts previously added, either within a given sprint or in a later sprint.

Going back to the construction analogy, when you construct a building you create a lot of things that don't end up in the final construction. You build scaffolding for your workers to stand on while the walls are being built up. You build wooden forms to shape concrete, removing them when the foundations and structures are solid.

You fill the building-in-progress with temporary lighting and wiring for equipment until the project is near completion and the permanent electrical and plumbing are hooked up and usable. A construction crew creates a lot of temporary artifacts in the path to creating a permanent structure, and we can learn from this when building software iteratively. We're going to have cases where work that supports what we're completing this sprint isn't necessary or may even be a hindrance in a future sprint. Not everything is permanent, and removing those temporary artifacts isn't a step backward or correcting a mistake. It is just a form of progress.

Jeff Trawick: Upgrading from python-social-auth 0.2.19 to social-auth-core 1.4.0 + social-auth-app-django 1.2.0


I had a few issues with this many moons ago when I was trying the initial social-auth-core packaging. Yesterday I was able to get it to work with the latest version, which in turn allowed me to move from Django 1.10 to Django 1.11.

You will most likely encounter failed Django migrations when making the switch. Some posts on the 'net recommend first upgrading to an intermediate version of python-social-auth to resolve that, but I wanted a simpler production switchover, which I found in this social-app-django ticket. The eventual production deploy solution, after testing locally with a copy of the production database, was:
  1. Temporarily hack my Ansible deploy script to fail after updating the source tree and virtualenv for the new libraries but before running migrations.
  2. On the server, as the project user, run pip uninstall python-social-auth to delete the old package.
  3. On the server, make another copy of the production database and then run update django_migrations set app='social_django' where app='default'; via psql.
  4. On the server, as the project user, run python manage.py migrate social_django 0001 --fake.
  5. Remove the temporary fail from my Ansible deploy script.
  6. Run the deploy again, which will run the remaining migrations.
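
Alongside the database steps above, the Python import paths in settings also change when switching packages. Here is a hedged sketch based on the social-auth-app-django documentation rather than my exact configuration; the Google backend is just an example, so use whichever backends your project actually configures:

INSTALLED_APPS = [
    # ...
    # 'social.apps.django_app.default',   # old python-social-auth app
    'social_django',                       # new social-auth-app-django app
]

AUTHENTICATION_BACKENDS = (
    # 'social.backends.google.GoogleOAuth2',     # old import path
    'social_core.backends.google.GoogleOAuth2',  # new import path (social-auth-core)
    'django.contrib.auth.backends.ModelBackend',
)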

Caktus Group: Quick Tips: How to Change Your Name in JIRA

In May 2017, Atlassian rolled out its new Atlassian ID feature, which gives Atlassian product users a central account that holds their user details. When this change occurred, our G Suite integration combined with Atlassian ID to give some users strange display names in JIRA, which I (as the JIRA admin) can't fix, because users now control their own profiles. However, they don't manage those profiles through JIRA. So how does one change the names that display in JIRA? (Hint: you can't do it through User Management.)

Step 1. Go to https://id.atlassian.com/profile/profile.action and log in.

JIRA account settings page

Step 2. Enter your desired display name in the field labeled Full Name.

JIRA account settings with a name change.

Step 3. Click Save.

Step 4. Return to your JIRA instance. If your name has not updated, log out and then back in again.

Step 5. Revel in your new name.

Setting a new JIRA name.

Want more JIRA? Read up on how we made use of the priority field to ease team anxiety.

Caktus Group: Tips for Product Ownership and Project Management in a Client Services Organization

Looking for some pointers to improve my own client management skills, I scoured the internet to find practical ideas on how to handle challenges better as the product owner (PO) in a client-services organization. I came up completely short.

Using Scrum in a client-services organization comes with its own unique challenges. As the PO, you are technically the project's key stakeholder (along with many other responsibilities nicely outlined here). However, when serving as a PO with external clients, you hold the responsibility, but not always the power, to make the final decisions on the priorities and features of the product. Clients sometimes have an idea of what they want, but it may run counter to what you and your team recommend based on your experience (which is why the client hired you in the first place; it is okay to offer alternatives to their requests, as long as you can back them up with facts). Ultimately the client makes the final decision, but it is our job to give them our best recommendations.

Some companies designate the client as the PO, with all of the responsibilities that go along with that. This approach is often not feasible at Caktus since our clients are off-site, not part of the Scrum team, and have many other external responsibilities that do not involve our project. The client is the subject expert, but not necessarily well-versed enough in Scrum or software development to have the skill set to be a good PO at a technical level.

Here are some tips that I think are helpful for working with non-technical, external clients when using Scrum.

Set and reinforce expectations

You can explain Scrum in detail and give real-world situations to help build an understanding of what it entails, but until a person works within that framework, their full grasp of it will be limited. If your client works in a less technical environment, Scrum is likely new to them. Treat every touchpoint (the discovery phase, Sprint Zero, every review and relevant communication) as an opportunity to underscore what you need from them as a client to help make the project successful. At Caktus, Scrum represents uncharted territory for many of our customers, but the process works because we treat each project as a learning opportunity, incrementally reinforcing the process and improving the agility of our partnership throughout the project.

Be transparent, but take into account the client’s background

In the name of transparency, we always offer clients full access to our ticket tracker and product backlog, a detailed release plan for the most valuable features listing all the tickets we believe we can complete within the sprints, a breakdown of (and calendar invites for) all the team's sprint activities, and an explanation of how those activities relate to their particular project (e.g., in backlog grooming we do ABC, in sprint planning we do XYZ).

Too much information, however, can be paralyzing. Get to know your client (how technical they are, how much time they have to be involved in the project, etc.) before deciding what information will be most helpful for them. The whole point is to create a product that delights the client, and make the process of getting there as smooth and easy as possible.

A client with limited technical knowledge may find digging through a product backlog requires more time than they have. Instead, you can give them consistent updates in other formats, even something as simple as a bulleted list. For example: “These are the tickets we are going to estimate in backlog grooming on Tuesday. Please review the user stories and the Acceptance Criteria (AC) to ensure it aligns with what you feel is important for this feature.” At Caktus, we typically take on the day-to-day management of the product backlog, based on our understanding of the project and the relative priorities communicated to us by our clients. For some clients this can take the place of having full access to everything, which at times serves more to overwhelm than to inform.

Similarly, the release plan should be built around certain features rather than specific tickets. Since a release plan is a best guess based on the initial estimates of the team and is constantly being adjusted, including features to be completed rather than specific tickets gives the team the means to focus attention on meeting the overarching project goals. Hewing to the release plan is not always possible, but when you can do it, it makes things less stressful for your client.

(Over) Communicate

There is a lot to accomplish in a sprint review meeting. You need to talk about what was accomplished, share it with the client, discuss their feedback on the completed work, talk about priorities for the upcoming sprint, and then possibly make adjustments based on the feedback that came out of the review. To help take the pressure off the client to review everything, give feedback, and think about next steps in a one-hour meeting, let clients know when features are ready for review on staging, in advance of the sprint review. That way they have ample time to play around with the features. By the time sprint review comes, they have a solid understanding of progress and we can use the sprint review to walk through specific feedback.

We recommend writing up your upcoming sprint goals as early as you can and sharing them ahead of time. It's important to note that these are only the goals, and that the team decides what they pull into the sprint. Then, after sprint planning, keep the client updated on which features your team was able to pull into the sprint so their expectations are set appropriately.

If you need something from a client, just ask. Explaining dependencies also helps (e.g., adjusting this feature too far down the road will be more expensive than fixing it now, so please give us feedback by X date so we can address it soon). Throughout my four-plus years at Caktus, I've found that technical expertise is only half the battle, and our most successful projects are those in which we stay in constant communication with the client.

Compromise when it makes sense for the client and for your team

Some clients are not comfortable using or navigating the tools we use every day. Therefore, if it helps a client to, for example, download ticket details from JIRA into an Excel spreadsheet formatted in a way that allows them to understand something better, it is worth the extra time and effort. However, keep in mind the overall balance of time and effort. If they ask you to keep a shared spreadsheet updated in real time with all updates in JIRA, help them understand why that might not be a good idea, and come up with some alternative solutions to get them what they need.

Conclusion

Much of what is out there on the internet related to project ownership is related to being a PO at a software company, with internal stakeholders. Having external clients doesn’t make Scrum impossible; it just makes it a little bit more challenging, and requires some tweaking to keep your client - and your team - happy!

Caktus Group: Advanced Django File Handling

Modern Django's file handling capabilities go well beyond what's covered in the tutorial. By customizing the handlers that Django uses, you can do things pretty much any way you want.

Static versus media files

Django divides the files your web site is serving unchanged (as opposed to content delivered by your Django views) into two types.

  • "Static" files are files provided by you, the website developer. For example, these could be JavaScript and CSS files, HTML files for static pages, image and font files used to make your pages look nicer, sample files for users to download, etc. Static files are often stored in your version control system alongside your code.
  • "Media" files are files provided by users of the site, uploaded to and stored by the site, and possibly later served to site users. These can include uploaded pictures, avatars, user files, etc. These files don't exist until users start using the site.

Two jobs of a Django storage class

Both kinds of files are managed by code in Django storage classes. By configuring Django to use different storage classes, you can change how the files are managed.

A storage class has two jobs:

  • Accept a name and a blob of data from Django and store the data under that name.
  • Accept a name of a previously-stored blob of data, and return a URL that when accessed will return that blob of data.

The beauty of this system is that our static and media files don't even need to be stored as files. As long as the storage class can do those two things, it'll all work.
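
To make that concrete, here is a minimal sketch of a custom storage class that keeps blobs in an in-memory dictionary. It is a toy for illustration only, and the /inmemory/ URL prefix is made up; it assumes some view elsewhere would serve those paths:

from django.core.files.base import ContentFile
from django.core.files.storage import Storage


class DictStorage(Storage):
    """A toy storage backend: blobs live in a dict, keyed by name."""

    _blobs = {}

    def _save(self, name, content):
        # Job 1: accept a name and a blob of data, store the data under that name.
        self._blobs[name] = content.read()
        return name

    def _open(self, name, mode='rb'):
        return ContentFile(self._blobs[name], name=name)

    def exists(self, name):
        return name in self._blobs

    def url(self, name):
        # Job 2: given a previously-stored name, return a URL that will serve that data.
        return '/inmemory/' + name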

Runserver

Given all this, you'd naturally conclude that if you've changed STATICFILES_STORAGE and DEFAULT_FILE_STORAGE to storage classes that don't look at the STATIC_URL, STATIC_ROOT, MEDIA_URL, and MEDIA_ROOT settings, you don't have to set those at all.

However, if you remove them from your settings, and try to use runserver, you'll get errors. It turns out that when running with runserver, django.contrib.staticfiles.storage.StaticFilesStorage is not the only code that looks at STATIC_URL, STATIC_ROOT, MEDIA_URL, and MEDIA_ROOT.

This is rarely a problem in practice. runserver should only be used for local development, and when working locally, you'll most likely just use the default storage classes for simplicity, so you'll be configuring those settings anyway. And if you want to run locally in the exact same way as your deployed site, possibly using other storage classes, then you should be running Django the same way you do when deployed as well, and not using runserver.

But you might run into this in weird cases, or just be curious. Here's what's going on.

When staticfiles is installed, it provides its own version of the runserver command that arranges to serve static files for URLs that start with STATIC_URL, looking for those files under STATIC_ROOT. (In other words, it's bypassing the static files storage class.) Therefore, STATIC_URL and STATIC_ROOT need to be valid if you need that to work. Also, when initialized, it does some sanity checks on all four variables (STATIC_URL, STATIC_ROOT, MEDIA_URL, and MEDIA_ROOT), and the checks assume those variables' standard roles, even if the file storage classes have been changed in STATICFILES_STORAGE and/or DEFAULT_FILE_STORAGE.

If you really need to use runserver with some other static file storage class, you can either configure those four settings to something that'll make runserver happy, or use the --nostatic option with runserver to tell it not to try to serve static files, and then it won't look at those settings at startup.
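
One possible arrangement (a sketch, not the only way to do it) is to keep the four standard settings defined for local development and swap in custom storage classes only in a production settings module. The storage class paths below are hypothetical placeholders:

# settings/base.py -- what runserver sees during local development
STATIC_URL = '/static/'
STATIC_ROOT = '/var/www/example/static/'
MEDIA_URL = '/media/'
MEDIA_ROOT = '/var/www/example/media/'

# settings/production.py -- hypothetical custom storage classes
from .base import *  # noqa

STATICFILES_STORAGE = 'myproject.storages.CDNStaticFilesStorage'
DEFAULT_FILE_STORAGE = 'myproject.storages.CustomMediaStorage'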

Using media files in Django

Media files are typically managed in Python using FileField and ImageField fields on models. As far as your database is concerned, these are just char columns storing relative paths, but the fields wrap that with code to use the media file storage class.

In a template, you use the url attribute on the file or image field to get a URL for the underlying file.

For example, if user.avatar is an ImageField on your user model, then

<img src="{{ user.avatar.url }}">

would embed the user's avatar image in the web page.

The default storage class for media, django.core.files.storage.FileSystemStorage, saves files to a path inside the local directory named by MEDIA_ROOT, under a subdirectory named by the field's upload_to value. When the file's url attribute is accessed, it returns the value of MEDIA_URL, prepended to the file's path inside MEDIA_ROOT.

An example might help. Suppose we have these settings:

MEDIA_ROOT = '/var/media/'
MEDIA_URL = '/media/'

and this is part of our user model:

avatar = models.ImageField(upload_to='avatars')

When a user uploads an avatar image, it might be saved as /var/media/avatars/12345.png. That's MEDIA_ROOT, plus the value of upload_to for this field, plus a filename (which is typically the filename provided by the upload, but not always).

Then <img src="{{ user.avatar.url }}"> would expand to <img src="/media/avatars/12345.png">. That's MEDIA_URL plus upload_to plus the filename.

Now suppose we've changed DEFAULT_FILE_STORAGE to some other storage class. Maybe the storage class saves the media files as attachments to email messages on an IMAP server - Django doesn't care.

When 12345.png is uploaded to our ImageField, Django asks the storage class to save the contents as avatars/12345.png. If there's already something stored under that name, Django will change the name to come up with something unique. Django stores the resulting filename in the database field. And that's all Django cares about.

Now, what happens when we put <img src="{{ user.avatar.url }}"> in our template? Django will retrieve the filename from the database field, pass that filename (maybe avatars/12345.png) to the storage class, and ask it to return a URL that, when the user's browser requests it, will return the contents of avatars/12345.png. Django doesn't know what that URL will be, and doesn't have to.
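
The same delegation is visible from Python. Assuming the user model above, the database column holds only the relative name, and the url attribute is answered by whichever storage class is configured:

user = User.objects.get(pk=1)
user.avatar.name   # e.g. 'avatars/12345.png' -- the value stored in the database
user.avatar.url    # whatever URL the configured storage class returns for that name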

For more on what happens between the user submitting a form with attached files and Django passing bits to a storage class to be saved, you can read the Django docs about File Uploads.

Using Static Files in Django

Remember that static file handling is controlled by the class specified in the STATICFILES_STORAGE setting.

Media files are loaded into storage when users upload files. Static files are provided by us, the website developers, and so they can be loaded into storage beforehand.

The collectstatic management command finds all your static files, and saves each one, using the path relative to the static directory where it was found, into the static files storage. [2]

By default, collectstatic looks for all the files inside static directories in the apps in INSTALLED_APPS, but where it looks is configurable - see the collectstatic docs.
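
For example, a project-level directory that isn't inside any app can be added to the search path with the STATICFILES_DIRS setting (a sketch that assumes an assets/ directory at the top of your repository):

import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

STATICFILES_DIRS = [
    os.path.join(BASE_DIR, 'assets'),
]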

So if you have a file myapp/static/js/stuff.js, collectstatic will find it when it looks in myapp/static, and save it in static files storage as js/stuff.js.

You would most commonly access static files from templates, by loading the static templatetags library and using the static template tag. For our example, you'd ask Django to give you the URL where the user's browser can access js/stuff.js by using {% static 'js/stuff.js' %} in your template. For example, you might write:

{% load static %}
<script src="{% static 'js/stuff.js' %}"></script>

If you're using the default storage class and STATIC_URL is set to http://example.com/, then that would result in:

<script src="http://example.com/js/stuff.js"></script>

Maybe then you deploy it, and are using some fancy storage class that knows how to use a CDN, resulting in:

<script src="http://23487234.niftycdn.com/239487/230498234/js/stuff.js"></script>

Other neat tricks can be played here. A storage class could minify your CSS and JavaScript, compile your LESS or SASS files to CSS, and so forth, and then provide a URL that refers to the optimized version of the static file rather than the one originally saved. That's the basis for useful packages like django-pipeline.

[2] collectstatic uses some optimizations to try to avoid copying files unnecessarily, like checking whether the file already exists in storage and comparing timestamps against the original static file, but that's not relevant here.

If you’re looking for more Django tips, we have plenty on our blog.

Caktus Group: DjangoCon 2017 Recap

Mid-August brought travel to Spokane for several Caktus staff members attending DjangoCon 2017. As a Django shop, we were proud to sponsor and attend the event for the eighth year.

Meeting and Greeting

We always look forward to booth time as an opportunity to catch up with fellow Djangonauts and make new connections. Caktus was represented by a team of six this year: Charlotte M, Karen, Mark, Julie, Tobias, and Whitney. We also had new swag and a GoPro Session to give away. Our lucky winner was Vicky. Congratulations!

Winner of our DjangoCon 2017 prize giveaway.

This year we also had a special giveaway: one free ticket to the conference, donated to DjangoGirls Spokane. The winner, Toya, attended DjangoCon for the first time. We hope she had fun!

Top Talks

Our technical staff learned a lot from attending the other talks presented during the conference. Their favorite talks included the keynote by Alicia Carr, The Denormalized Query Engine Design Pattern, and The Power and Responsibility of Unicode Adoption.

Charlotte delivered a well-received talk about writing an API for almost anything. We’ll add the video to this post as soon as it’s available in case you missed it.

Another excellent talk series presented at DjangoCon!

See You Next Time

As always, we had a great time at DjangoCon and extend our sincere thanks to the organizers, volunteers, staff, presenters, and attendees. It wouldn’t be the same conference without you, and we look forward to seeing you at next year’s event.

Caktus Group: Letting Go of JIRA: One Team's Experiment With a Physical Sprint Board

At Caktus, each team works on multiple client-service projects at once, and it’s sometimes challenging to adapt different clients’ various tools and workflows into a single Scrum team’s process.

One of our Scrum teams struggled with their digital issue tracker; we use JIRA to track most of our projects, including the all-important sprint board feature. However, one client used their own custom issue tracker, and it was not feasible to transfer everything to our JIRA instance. A challenge then arose: how do we visualize the work we are doing for this project on our own sprint board?

We stick with JIRA

Since the tasks were already tracked in the client’s tracker, we did not want to duplicate that effort in JIRA, and we were unable to find an existing app to integrate the two trackers so that the data would sync both ways. But we still wanted the work to be represented in our sprints since it took up a significant portion of the team’s time.

Initially, we included placeholder JIRA tickets in our sprint for each person who would work on this project. Those tickets were assigned story points relative to the time that person was planning to spend on it. Essentially, among our other projects’ tasks and stories, we also had distinct blocks of hours to represent the work being done on this separate project.

This solution started to cause some confusion when the team tried to relate story points directly to hours, and it didn’t add any real value since the placeholder tickets lacked any specificity, so we decided to stop using them altogether. As a result, this project was not represented at all on our sprint board or in our velocity, and we did not have a good overall picture of our sprint work. This hindered our transparency and visibility into the team’s workload, and hurt our ability to allocate time across projects effectively (take a look at this post to see how we do that using tokens!).

We transition to a low-tech solution

Eventually, the team left JIRA behind and started using a physical whiteboard in the team room to visualize sprint work. The board allowed us to include tickets from our tracker and our client’s tracker in one central location.

A physical task board at Caktus.

We divided the board into the same columns that were on our JIRA sprint board to represent ticket status: To Do, In Progress, Pull Request, On Staging, Blocked, and Done. We use sticky notes to represent each user story, task, or bug, color-coded by project. Each sticky contains a ticket number that maps to the ticket in one of the trackers, a short title or description, and a story point estimate. We also started tracking sprint burndown and team velocity on large sticky sheets, also posted on the walls of the team room.

A physical sprint burndown chart at Caktus. A physical sprint burndown chart.

A physical team velocity chart at Caktus. A physical team velocity chart.

The physical board evolves

Including distinct tickets from the project in our sprints highlighted another challenge: the project’s priorities were determined by the client instead of by the team’s Product Owner, and the client did not use Scrum. This meant that the client changed the current priorities more frequently than our two-week sprint cadence, and the nature of the project was such that we had to keep up.

The team pointed out that we could not commit to finishing a specific set of tasks for that project since priorities at the beginning of the sprint were not fixed for the following two weeks (which is essential for carrying out a sprint effectively, as it allows the team to stay focused on a stable goal instead of having to shift gears often).

We decided that the best way to handle uncertain priorities was to divide the whiteboard into horizontal rows (or swimlanes), each with its own rules and expectations:

  • One swimlane for sprint work that we commit to finishing, and whose priorities do not change within the sprint.
  • A second swimlane for work that we want to make progress on but cannot commit to finishing in the sprint (mostly due to external dependencies).
  • A third swimlane for work that we have no control over, such as projects where priorities are not stable enough for two-week sprints, and the release day does not align with the end of our sprint. This swimlane uses more of a Kanban workflow, minus the work in progress limits.

All of the team’s projects are now represented with tickets that map to distinct user stories, tasks, and bugs in one central place, giving the team full visibility into the work being done during the sprint, without committing to work that is likely to fall in priority.

Where we are now

The team continues to work out the kinks of using a physical board, such as overlooking details that are included only in the issue trackers, needing to be physically in the team room to know what to work on next, updating tickets only once a day during standup, and sticky notes falling off the board when the room gets too hot.

We have also observed some distinct benefits to leaving JIRA behind:

  • We can easily incorporate new projects that use any issue tracker into our physical sprint board;
  • The team is fully engaged with the physical artifacts and actively drives standups and sprint planning together, as opposed to having one person operate JIRA while everyone else watches;
  • The team enjoys moving the sticky notes along the board, and takes satisfaction in updating the burndown chart (especially when it gets down to zero!);
  • They feel more freedom to experiment with the board, knowing that the possibilities are limited only by their imagination rather than by the capabilities of the software.

I don’t know if the team will continue to use the whiteboard, if they will choose to go back to using JIRA’s sprint board, or if they will come up with some other solution; but as their Scrum Master, I have appreciated the journey, the team’s willingness to experiment and try new things, and their creativity in overcoming the challenges they encountered.

We didn’t always use Scrum at Caktus - check out this blog post to learn how we got started.

Caktus Group: ShipIt Day Recap Q3 2017

Caktus recently held the Q3 2017 ShipIt Day. Each quarter, employees take a step back from business as usual and take advantage of time to work on personal projects or otherwise develop skills. This quarter, we enjoyed fresh crêpes while working on a variety of projects, from coloring books to Alexa skills.

Technology for Linguistics

As both a linguist and a developer, Neil looked at using language technology for a larger project led by Western Carolina University to revitalize Cherokee. This polysynthetic language presents challenges for programming due to its complex word structure.

Using finite state morphology with hfst and Giellatekno, Neil explored defining sounds, a lexicon, and rules to develop a model. In the end, he feels a new framework could help support linguists, and says that Caktus has shown him the value of frameworks and good tooling that could be put to use for this purpose.

Front-end Style Guide Primer

Although design isn’t optional in product development, the Agile methodology doesn’t address user interface (UI) or user experience (UX) design. We use Agile at Caktus, but we also believe in the importance of solid UX in our projects.

Basia, Calvin, and Kia worked to fill the gap. They started building a front-end style guide, with the intention to supply a tool for Caktus teams to use in future projects. Among style guide components considered during this ShipIt Day were layout, typography, and color palettes. Calvin worked to set up the style guide as a standalone app that serves as a demo and testbed for ongoing style guide work. Kia explored the CSS grid as a flexible layout foundation that makes building pages easier and more efficient while accommodating a range of layout needs. Basia focused on typography, investigating responsive font sizing, modular scale, and vertical rhythm. She also started writing color palettes utilizing colors functions in Stylus.

Front-end style guides have long been advocated by Lean UX. They support modular design, enabling development teams to achieve UI and UX consistency across a project. We look forward to continuing this work and putting our front-end style guide into action!

Command Line Interface for Tequila

Jeff B worked on a command line interface to support our use of Tequila. While we currently use Fabric to execute shell commands, it’s not set up to work with Python 3 at the time of writing. Jeff used the Click library to build his project and incorporated difflib from the standard library in order to show a git-style diff of deployment settings. You can dig into the Tequila CLI on the Caktus GitHub account and take a look for yourself!
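
The real Tequila CLI is on the Caktus GitHub account; purely as an illustration of that approach (not Jeff's actual code), a Click command can pair with difflib to print a unified, git-style diff of two settings files:

import difflib

import click


@click.command()
@click.argument('current', type=click.File())
@click.argument('proposed', type=click.File())
def diff_settings(current, proposed):
    """Print a git-style diff between two deployment settings files."""
    diff = difflib.unified_diff(
        current.readlines(),
        proposed.readlines(),
        fromfile=current.name,
        tofile=proposed.name,
    )
    click.echo(''.join(diff))


if __name__ == '__main__':
    diff_settings()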

Wagtail Calendar

Caktus has built several projects using Wagtail CMS, so Charlotte M and Dmitriy looked at adding new functionality. Starting with the goal of incorporating a calendar into the Bakery project, they added an upcoming events button that opens a calendar of events, allowing users to edit and add events.

Charlotte integrated django-scheduler events into Wagtail while Dmitriy focused on integrating the calendar widget onto the EventIndexPage. While they encountered a few challenges which will need further work, they were able to demonstrate a working calendar at the end of ShipIt Day.

Scrum Coloring Book

Charlotte F and Sarah worked together to create a coloring book teaching Scrum information, principles, and diagrams in an easily-digested way. The idea was based on The Scrum Princess. Their story follows Alex, a QA analyst who joins a development team, through the entire process of completing a Scrum project.

Drafting out the Caktus Scrum coloring book.

Over the course of the day, they came up with the flow of the narrative and formatted the book so that each image to color appears on its own page alongside story text and definitions. Any illustrators out there who want to help it come to life?

QA Test Case Tools

Gerald joined forces with Robbie to follow up on Gerald’s project from our Q2 2017 ShipIt Day. This quarter, our QA analysts tinkered with QMetry, adding it to JIRA to see whether this could be the tool to take Caktus’ QA to the next level.

QMetry creates visibility for test cases related to specific user stories and adds a number of testing functions to JIRA, including the ability to group different scenarios by acceptance criteria and add bugs from within the interface when a test fails. Although there are a few configuration issues to be worked out, they feel that this tool does most of what they want to do without too much back-and-forth.

Wagtail Content Chooser

Phil also took the chance to do some work with Wagtail. Using the built-in page-chooser as a guide, he developed a content-chooser that shows all of the blocks associated with that page’s StreamFields. The app can retrieve a content block by its own unique identifier and would enable the admin user to pull that content from other pages into the page being worked on. The next step will be incorporating a save function.

Publishing an Amazon Alexa Skill

For those seeking inspiring quotes, David authored a skill for Amazon Alexa that returns a random quote from forismatic. An avid fan of swag socks, David came across the opportunity to earn some socks (and an Echo Dot) from Amazon if he submitted an Alexa skill and got it certified. He used Flask-Ask, a Flask extension, to develop the skill rapidly, deployed it to AWS Lambda via Zappa, and is now awaiting certification (and socks). Caktus is an AWS Consulting Partner, so acquiring Alexa development chops would give us another service to offer clients.
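
As a rough sketch of that approach (the intent name and handler are hypothetical, not David's actual skill), Flask-Ask maps Alexa intents onto decorated Flask view functions:

import requests
from flask import Flask
from flask_ask import Ask, statement

app = Flask(__name__)
ask = Ask(app, '/')


@ask.intent('GetQuoteIntent')
def get_quote():
    # forismatic.com exposes a simple JSON API for random quotes.
    resp = requests.get(
        'http://api.forismatic.com/api/1.0/',
        params={'method': 'getQuote', 'format': 'json', 'lang': 'en'},
    )
    quote = resp.json().get('quoteText', 'No quote available right now.')
    return statement(quote)


if __name__ == '__main__':
    app.run(debug=True)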

Catching Up on Conferences

Dan caught up on videos of talks from conferences:

He also looked at the possibility of building a new package that preprocesses JavaScript and CSS, but after starting work he realized there’s a reason why existing packages are complicated and resolved to revisit this another time.

That’s all for now!

Although the ShipIt Day projects represent time away from client work, each project helps our team learn new skills or strengthen existing ones that will eventually contribute toward external work. We see it as a way to invest in both our employees and the company as a whole.

To see some of our skills at work in client projects, check out our Case Studies page.

Caktus Group: Transitioning to Scrum: Mapping Job Titles to Scrum Roles

Early in your transition to Scrum, you will be faced with a hard truth: your team or organization has job titles and Scrum has roles, and there is probably little to no overlap between the two. How do you map Susan, lead technical architect, and Tom, project manager, to the three Scrum roles: product owner, Scrum master, and developer?

Depending on the resources driving your transition, you’ll find some ready-made solutions at your fingertips: product managers and strategists become product owners, project managers become Scrum masters, and all the other actors become developers. Easy, problem solved, you might think. Susan is now a developer and Tom is a Scrum master. Stick a fork in this transition because it’s done.

I suggest a different approach. Instead of trying to map titles directly to roles, map people to roles. Take a deeper dive into the Scrum roles: What characteristics does a Scrum master need? What authority do they need? Once you’ve figured that out, which person - not title - best matches the needs of the role?

The Product Owner

The Scrum role of product owner (PO) has the following core responsibilities:

  • Maintain the vision of the product
  • Manage trade-offs in scope, schedule, budget, and quality
  • Own the product backlog
  • Empowered to make decisions
  • Define acceptance criteria and verify that they are met
  • Collaborate with the development team and all stakeholders

Additionally, a good product owner has the following characteristics:

  • Domain knowledge
  • Good communicator
  • Good negotiator
  • Great at building and managing relationships
  • Powerful motivator
  • Willing to make hard and/or unpopular decisions
  • Available to the team

Take a look at the people you have available. Who can best fulfill these responsibilities and has all the necessary characteristics? Pro tip: If someone checks all the boxes except availability, keep looking. A Scrum team with an absent or remote PO is not going to be nearly as effective as a team with a readily and consistently available PO.

The Scrum Master

The core responsibilities of a Scrum master (SM) are to:

  • Lead the team by serving them (servant leadership)
  • Coach
  • Shield team from interference
  • Resolve and remove impediments
  • Act as an agent of change

A good SM also has the following characteristics:

  • Knowledgeable about Agile and Scrum
  • Questioning
  • Patient and steady
  • Collaborative
  • Protective of the team
  • Transparent in their communications

There’s also an additional consideration for the Scrum master role, and that’s the lack of command and control. A Scrum master should not be commanding or controlling; they don’t tell team members what to do, and they don’t control what team members work on or how they work. Which person on your team best fits this role? It’s likely that the best candidate is not your project manager. (After all, what PM is happy not being in control?) And don’t forget that your SM can be a developer if they are the best person for the role (and are suited to wearing multiple hats at once). If you don’t have a suitable candidate for the SM role, it would be better to hire a trained and experienced Scrum master rather than placing an unsuitable person into the role.

The Developer

And what about the developer role? The Scrum Guide defines the Development Team as “professionals who do the work of delivering a potentially releasable Increment of ‘Done’ product at the end of each Sprint”. So ask yourself: who is making the product? You’ll probably come up with a collection of folks with varying job titles, like developer, programmer, quality assurance, architect, artist, designer, etc. Congratulations, all those folks are now in the developer role in Scrum!

Stay Focused

Looking for people with suitable characteristics for each Scrum role may take longer than mapping based on job titles, but it’s worth the effort. If you stay focused on people during your transition, you’ll end up with a smoother transition, happier people, and more productive teams. Find out more about transitioning to Scrum by reading about how we did it at Caktus.

Caktus Group: From User Story Mapping to High-Level Release Plan

At Caktus, we begin many projects with a discovery workshop. A discovery workshop is an opportunity for our product team to get together with client stakeholders in order to answer three questions:

  • What is the problem we are trying to solve?
  • For whom are we solving this problem?
  • How are we going to solve the problem?

This blog post on product discovery outlines ways to help determine the problem to be solved and answer the question of for whom we are solving the problem.

In short, when discussing the problem to be solved, we talk about:

  • Business goals
  • Project goals
  • Potential constraints and risks
  • Success criteria

To find out for whom we are solving the problem we:

  • Define user roles for the application
  • Discuss user goals
  • Identify user pain points

Finally, to identify how we are going to solve the problem, we map out user flows and tasks in an activity called user story mapping.

User Story Mapping

User story mapping is a visualization technique popularized by Jeff Patton that allows product teams to map out an entire application with respect to the different user roles the application must support.

The activity begins by identifying top-level user actions (or user outcomes), writing them out on sticky notes, and arranging them into a row at the top of the user story map. We refer to that top level row as the narrative flow or the backbone of the user story map.

Top-level user actions mapped out.

If you imagine building a to-do list application, the narrative flow could include user outcomes such as:

  • Manage my account
  • Manage my to-do list
  • Share my to-do list

Once the high-level tasks have been identified and represented in the narrative flow, we move on to identify detail tasks, subtasks, and alternative ways of accomplishing a task. To distinguish detailed tasks from the narrative flow in the user story map, we write them out on different color sticky notes and add them to the user story map under the relevant high-level tasks.

A user story map indicating subtasks under the main tasks.

In the case of this imaginary to-do list application, under “Manage my account,” we could list detail tasks such as:

  • Create my account
  • Edit my account
  • Delete my account

and subtasks such as:

  • Edit my contact information
  • Edit my password
  • Edit my avatar

After the entire application is mapped out in this way, we identify a list of most valuable features. We do that by asking stakeholders which features the application can go live without and still deliver its essential business and user value. We draw a prioritization line across the map, consider each user story in the map, and move sticky notes that represent non-essential stories (or features) under the priority line.

A user story map with priority line indicating the most valuable features.

The user story mapping activity leads us to a planned-out application and a list of most valuable features. The prioritized user story map also becomes the first iteration of the project backlog. (A backlog is a list of features or technical tasks that are necessary and sufficient to complete a project.)

Writing User Stories

After the discovery workshop, we translate every sticky note from the map into a properly structured user story. In Agile software development, a user story is a brief description of a desired feature that is written from the perspective of an end-user, and that captures user outcomes that the feature is meant to support. A user story follows a prescribed format:

As a [user type], I want [feature] so that [benefit].

We write user stories as a team on index cards and assign acceptance criteria to each of them. (Acceptance criteria are conditions that a feature must satisfy to be accepted as done or completed.)

User stories are then estimated by the development team. There are a variety of Agile estimation techniques available. We generally use Planning Poker at Caktus, but at the beginning of a project there are too many backlog items to estimate for Planning Poker to be effective. We have found that in those cases, Relative Mass Valuation works well. Using this technique, the team first arranges the user stories in order of their relative size, from small to large level of effort, and then assigns story points to each one using a modified Fibonacci sequence. The result is a fully estimated initial product backlog, which will allow the product owner to create a release plan.

Here is what a set of estimated user stories could look like:

Estimating user stories at Caktus

Creating a High-Level Release Plan

The product owner ranks the estimated user stories by priority, taking into account the business value and relative effort of each one, to best take advantage of the development time available. If the team’s velocity is already known, the product owner can divide the major features into rough sprints to create an initial release plan:

An example of a high-level release plan at Caktus.

The product backlog, and by extension the release plan, evolve constantly as the project progresses: priorities change, scope is added or reduced as feedback is gathered, stories are broken down into smaller ones, etc. As long as new backlog items are estimated and prioritized, the product owner can adjust the release plan to maintain a realistic release timeframe.

Conclusion

The process from user story mapping through writing and estimating the user stories gives development teams a foundation on which to base the development effort. User story mapping is a good way to determine what user tasks must be supported and how they break down into subtasks, as well as which user tasks are not essential for the application to deliver on business and user value. Writing user stories as a team is an opportunity to articulate each story in more detail and spread the knowledge among all members of the team. Finally, estimating user stories with the Relative Mass Valuation technique is an efficient way of sizing many stories in one estimation session by comparing them to each other.

We have found the process useful, but we have also learned some lessons:

  • During user story mapping, the stakeholders’ understanding of the project may evolve and by the end of the activity, the user stories identified at the beginning may change accordingly. In these cases, it is important to revisit those stories at the end of the discovery workshop to confirm or adjust them in light of the newly gained understanding.
  • Writing user stories with team members who have not participated in the discovery workshop can be challenging. In future, we may include a separate workshop debrief session to bring the entire team up to speed on the findings from the discovery workshop before we set out to write user stories.
  • A high-level release plan can be a helpful tool offering an initial timeline for the product release. However, it can become an impediment if its transient nature is not fully understood. In Agile software development, it’s paramount that a high-level release plan such as the one shown here not be treated as a definitive schedule, but rather as an initial take on a possible order of work. As soon as the work begins, that order will change as new information about the project is revealed through the development process.

To learn more about UX techniques used at Caktus, read Product Discovery Part 1: Getting Started or Product Discovery Part 2: From User Contexts to Solutions.

Caktus Group: Is Django the Right Fit for your Project?

You need a website. You may have done some searches, and come back overwhelmed and empty-handed, having found hosting providers offering services that may have sounded familiar (“WordPress”) and ones that may have sounded like a foreign language (“cPanel”). You may have considered hiring someone to build your website, and gotten conflicting answers about what you need and what it would cost. You may have also heard about Django, but you're not sure how it fits into the picture and whether or not it's the right fit for your project. This is common, because there are many different types of websites out there. To help answer the question of whether Django is the right fit for your project, let’s take a look at the landscape.

Figuring out your needs

Most websites fall into one of three categories: Static, Dynamic, or Interactive. Static sites are ones which don’t change much at all; these are typically websites for small, local businesses, listing things such as address, hours, and phone number. Dynamic websites, which are more common, have a static structure but changing content such as a news feed, blog, or pricing which needs to be updated often. A dynamic website may even have a store embedded, where users can make online purchases. At its core, though, the business generates the updates to a dynamic website; visitors simply use what is there. An interactive website, on the other hand, provides many more opportunities for user interaction. Social media websites are interactive, with users creating content (posts) and interacting with others’ content. Dynamic and interactive websites need a content management system (like WordPress or Drupal), or a more custom solution (like Django).

What is a Content Management System?

If you’ve looked into creating a website, you may have heard the term “Content Management System” or “CMS” thrown around. I’ll explain how this fits in by using the analogy of getting a house ready to move in. A static website would be analogous to a furnished apartment, where all the resident needs to do is show up. A CMS, on the other hand, is a fully-built house, but there’s no paint on the walls yet, and there’s no furniture. You’ll need to provide these niceties before you can move in, but you don’t need the expertise of a builder in order to get it ready. Maybe you’ll hire a designer to take care of some of it, or help with some of the decisions, but most people can manage this and do an acceptable job.

That’s pretty much what a CMS is: a website that’s pre-built, but needs that coat of paint, furniture, and some pictures on the walls. A web designer might help you with this, or even do some of that work, but many people can manage this on their own in a pinch. Once set up, a non-technical website owner can add and manage their content there. You may have heard of some common CMS options; WordPress and Drupal are some of the more popular ones. Lots of dynamic websites built today use one type of CMS or another. Even many static websites are now being built using a CMS; the website content may not need to change more than once every year or two, but it’s still nice not to need a developer to change the code directly.

How Django compares to a typical CMS

While WordPress and Drupal are established platforms that can be used to create solid dynamic websites, both are built around being a CMS first. The result is that building in interactive content can be a headache, since these frameworks weren’t really built for users to do much more than browse.

To return to our analogy, if a CMS is a pre-built house that’s missing the paint and furniture, Django is instead the pile of lumber, nails, tools, and other supplies needed to assemble that house. Building a house from those components is certainly not the sort of task that the average homeowner is comfortable taking on, but it has a distinct advantage if the homeowner needs something particularly custom, and that’s exactly where Django shines: in custom website creation.

While Django can be used to create a seamless dynamic website, its flexibility really pays off when building sites that are interactive, or which straddle the boundary between dynamic and interactive. The advantages of Django are numerous, from the vast diversity of Python libraries available (since Django is a Python framework), to the flexibility written into Django itself. If you’re curious to dig into the details of this, we’ve written in much more depth about why we use Django.

Conclusion

If you know that you’ll only ever need a CMS, and the most complex bit of interactivity you’ll need is an online store, then you can probably meet your needs using something like WordPress or Drupal. But if you want the ability to be flexible and add a lot of user interaction like posts, forums, or account management to your website, you’ll probably be better off with a Django solution.

Caktus has been building custom Django websites and apps since 2007. We’ve developed a success model for developing websites the right way and are always happy to chat about your project if you’re still not sure that Django is the right fit for you.

Caktus GroupUpgrading from Wagtail 1.0 to Wagtail 1.11

There are plenty of reasons to upgrade your Wagtail site. Before we look at how to do it, let’s take a look at a few of those reasons.

Why upgrade your Wagtail site?

  • Wagtail is now compatible with Django 1.11 and Python 3.6, so you can use the latest versions (at the time of this blog post) of all three together.
  • Page Revision Management was released in Wagtail 1.4, allowing users to preview and rollback revisions.
Page revision management in Wagtail

Image from http://docs.wagtail.io/

The Wagtail user bar

The new Wagtail Userbar shown with the top-left configuration; note that it does not conflict with the Django Debug Toolbar.

  • Streamfield was already really nice, but the addition of TableBlock looks useful for easily editing tabular data.
  • Page-level permissions for logged-in users belonging to specific groups are now possible via the new Page Privacy Options
  • Wagtail now supports many-to-many relations on the Page model.
  • If you’re using PostgreSQL, you can use the built-in PostgreSQL search engine rather than Elasticsearch (see the settings sketch after this list).
  • Finally, with the June 2017 release of Wagtail 1.11, the Wagtail team updated the Wagtail Explorer with the new admin API and React components. The explorer is now faster to use, includes all of the pages in your site (not just parent pages), and lets you edit a page in fewer steps.
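
As an aside on the PostgreSQL search backend mentioned above, here is a minimal settings sketch of my own (not from the post), assuming Wagtail 1.10 or later, where the backend lives in wagtail.contrib.postgres_search:

# settings.py (hypothetical): enable Wagtail's built-in PostgreSQL search
INSTALLED_APPS = [
    # ... your existing apps ...
    'wagtail.contrib.postgres_search',
]

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail.contrib.postgres_search.backend',
    },
}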

How I ported a Wagtail 1.0 site to Wagtail 1.11

Now that we’ve had a look at the features gained from updating, let’s see how to update.

I decided to port a Wagtail 1.0 project to Wagtail 1.11. I was able to upgrade from 1.0 to 1.11 directly, rather than upgrading version by version (which is a slower process), with a few changes along the way.

To start, I went ahead and created a brand new local virtual environment on my laptop. I pip installed all the current requirements for my Wagtail 1.0 project, and then updated Wagtail.

$(newwagtailenv) pip install -r requirements/dev.txt
$(newwagtailenv) pip install wagtail==1.11

Because we’re tracking the versions of our requirements in a file, I updated the pinned versions that changed as a result of the Wagtail upgrade. This included updates to django-taggit and django-modelcluster, among some other new requirements.

I assumed that data migrations would be required for this Wagtail upgrade. When I ran migrate, I encountered an issue right away.

$(newwagtailenv) python manage.py migrate
... in bind_to_model
    related = getattr(model, self.relation_name).related
TypeError: getattr(): attribute name must be string

I found this post, which helped me solve the issue. I also noticed that, going forward, the Wagtail core team recommends using Stack Overflow to research Wagtail questions.

The error was caused by my use of an older style of InlinePanel definition, with the page model as the first parameter. Because that style was deprecated in Wagtail 1.2, I needed to make a few code changes like this one:

Change:

InlinePanel(CaseStudyPage, 'countries', label="Countries"),

To:

InlinePanel('countries', label="Countries"),

The next error I saw when I tried to migrate had to do with tuples and lists.

$(newwagtailenv) python manage.py migrate
    index.SearchField('intro'),
TypeError: can only concatenate list (not "tuple") to list

For the 1.5 release of Wagtail, the search_fields attribute on the Page models (and other searchable models) changed from a tuple to a list.

This was another pretty simple fix.

Change:

class MyPage(Page):
    ...
    search_fields = Page.search_fields + (
        index.SearchField('intro'),
    )

To:

class MyPage(Page):
    ...
    search_fields = Page.search_fields + [
        index.SearchField('intro'),
    ]

At this point, I was able to successfully run python manage.py migrate. I gave my test suite a try and it ran successfully, so I tested the site out locally as well. It worked beautifully.

That’s all I had to do! But I decided to do one last thing anyway.

I was excited that Wagtail solved the issue of not having many-to-many fields on the Page model in version 1.9. I read up on the new ParentalManyToManyField and made a plan to use it, because having one fewer model meant less code to maintain long-term. Paying down some technical debt now, which is generally considered best practice, also meant that future developers maintaining this Wagtail site wouldn’t have to spend time researching an older workaround in order to get up to speed.

When we originally built this Wagtail site, I used the “through model” workaround described in this issue, defining three separate models for each many-to-many relationship. For instance, I had a CaseStudyPage (based on the Page model), a Country model, and a through model called CountryCaseStudy that created the many-to-many relationship between CaseStudyPage and Country.

Here’s how I moved from the “through model” method to the newly available many-to-many relationship via the ParentalManyToManyField, including how to port the data:

  • Create a new field called countries_new on the Page model (CaseStudyPage) to replace the through model implementation (CountryCaseStudy in this case):
countries_new = ParentalManyToManyField('portal_pages.Country', blank=True)
  • Make a new migration file for this new field so it retains the old data before ripping out the old models.
$ python manage.py makemigrations
  • Create a new data migration file to copy data from the through model to countries_new.
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import models, migrations


# Loop through all Case Study pages and save the countries
# to the new ParentalManyToManyField

def no_op(apps, schema_editor):
    # Do nothing on reversal
    pass

def save_countries_to_new_parental_m2m(apps, schema_editor):
    # Need to import the actual model here
    # versus the "fake" model
    # so that the Clusterable model logic works and we can
    # successfully save the ParentalManyToManyField
    from portal_pages.models import CaseStudyPage

    for csp in CaseStudyPage.objects.all():
        csp.countries_new = [country.country for country in csp.countries.all()]
        csp.save()


class Migration(migrations.Migration):

    dependencies = [
        ('portal_pages', '0055_casestudypage_countries_new'),
        ('wagtailcore', '0038_make_first_published_at_editable'),
    ]

    operations = [
        migrations.RunPython(save_countries_to_new_parental_m2m, no_op),
    ]
  • Update existing code that used the through model to use the new field instead (a rough sketch of the updated model follows this list)
  • Delete the through model now that it’s no longer needed
  • Run makemigrations and migrate again
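
To illustrate the “update existing code” step above, here is a hypothetical sketch of what the updated page model might look like once the through model is gone (field, app, and panel names follow the examples above, not the actual project code):

from django import forms
from modelcluster.fields import ParentalManyToManyField
from wagtail.wagtailadmin.edit_handlers import FieldPanel
from wagtail.wagtailcore.models import Page


class CaseStudyPage(Page):
    # Replaces the old CountryCaseStudy through model
    countries_new = ParentalManyToManyField('portal_pages.Country', blank=True)

    content_panels = Page.content_panels + [
        # Checkboxes are a simple way to edit a many-to-many field in the admin
        FieldPanel('countries_new', widget=forms.CheckboxSelectMultiple),
    ]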

The trickiest part for me was moving data to the newly implemented ParentalManyToManyField. Because the Page model is a Clusterable model, I needed to import the current model class rather than use the historical model state. I spent a little time figuring that out and have to thank Matthew Westcott, who guided me in the right direction from the Wagtail Slack channel.

You can see the updates I made on GitHub on the RapidPro Community Portal wagtail-updates branch. There is still more work to be done and we hope to complete it soon.

Conclusion

The Wagtail CMS has really come into its own as a beautiful and easy-to-use content management system. I highly recommend keeping your Wagtail site up to date to take advantage of all the newest features. To read more about Caktus and Wagtail, check out this post about adding pages outside of the CMS or this one about our participation in Wagtail Sprints.

Caktus GroupCaktus at DjangoCon 2017

In less than a month we’ll be heading out to Spokane, WA for DjangoCon 2017. We’re proud to be attending as sponsors for the eighth year, and look forward to greeting everyone at our booth. On August 16th, we’ll be raffling off a GoPro Session action camera, so be sure to stop by and enter. We’ll also have our comfy new t-shirts and some limited-edition Caktus 10th Anniversary water bottles to give away. They went fast at PyCon, so don’t wait to get yours.

Swag and giveaways for the Caktus DjangoCon booth

As part of our commitment to sharing quality Django content with the community, we’ll also be offering a survey at the booth to find out what you, our audience, are interested in seeing more of. We hope you’ll help us out! If you can’t make it to DjangoCon but still want to participate, you can take the survey on Ona.

Speakers

One of our very own developers will be speaking at DjangoCon this year! We’re excited that Charlotte Mays was selected to speak about writing APIs for almost anything, in which she’ll cover the power and flexibility of Django Rest Framework.

Caktus developer Charlotte Mays delivering a talk

Congratulations to Charlotte! We hope you’ll all go have a listen on Monday, August 14th at 5:30pm.

Talks

In addition to Charlotte’s talk, Caktus developers have quite the list of talks they’re excited to see.

See you there!

Working on a Django web or SMS project and looking for help? We’d love to see if we can help with team augmentation, a discovery workshop, or start-to-finish custom development. Contact us to set up a dedicated time to talk.

Caktus GroupConstructive Code Review (Bonus PyCon 2017 Must-See Talk)

There were so many good talks this year that we're including a bonus entry in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon.

Erik Rose’s talk “Constructive Code Review” is on the surface a talk about how to do just that: review code in a way that builds people up rather than tearing them down. However, in 40 minutes he manages to cover a breadth of topics relevant to anyone who works with other people, including (but not limited to): simple rules to assist you in maintaining constructive communications, tips on how to ensure you receive the feedback you want, methods to manage your emotional state, stress management, a three-step approach to training new people, and ideas on how to build trust. I found this talk so helpful that I’ve watched it twice and taken detailed notes, and recommended it to my teams to watch as well. Highly recommended, whether you code for a living or not!

Caktus GroupReadability Counts (PyCon 2017 Must-See Talk 6/6)

Part 6 in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

"Readability Counts" was a good talk about why your code should be readable and how you get it there. One of the things I appreciated was that while it was very developer-focused, it was human-oriented rather than technical.

In his presentation, Trey Hunner shared four reasons why code should be readable:

  • It makes your life easier
  • Code is more often read than written
  • It is easier to maintain readable code
  • It’s easier to onboard new team members

He also shared a few best practices to achieve this, including usage of white space, line breaks, and code structure; descriptive naming; and choosing the right construct and coding idioms.
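
As a quick, toy illustration of the kind of changes he means (my own example, not from the talk), compare:

# Harder to read: a terse name and everything crammed onto one line
def f(xs): return [x for x in xs if x.is_active and x.total > 100]

# Easier to read: a descriptive name, white space, and line breaks
def large_active_orders(orders):
    return [
        order
        for order in orders
        if order.is_active and order.total > 100
    ]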

Caktus GroupPython Tool Review: Using PyCharm for Python Development - and More

Back in 2011, I wrote a blog post on using Eclipse for Python Development.

I've never updated that post, and it's probably terribly outdated by now. But there's a good reason for that - I haven't used Eclipse in years. Not long after that post, I came across PyCharm, and I haven't really looked back.

Performance

Eclipse always felt sluggish to me. PyCharm feels an order of magnitude more responsive. Sometimes it takes a minute to update its indices after I've checked out a new branch of a very large project, but usually even that is barely noticeable. Once the indices are updated, everything is very fast.

Responding quickly is very important. If I'm deep in a problem and suddenly have to stop and wait for my editor to finish something, it can break my concentration and end up slowing me down much more than you might expect simply because an operation took a few seconds longer than it should.

It's not just editing that's fast. I can search for things across every file in my current project faster than I can type in the search string. It's amazing how useful that simple ability becomes.

Python

PyCharm knows Python. My favorite command is Control-B, which jumps to the definition of whatever is under the cursor. That's not so hard when the variable was just assigned a constant a few lines before. But most of the time, knowing the type of a variable at a particular time requires understanding the code that got you there. And PyCharm gets this right an astonishing amount of the time.

I can have multiple projects open in PyCharm at one time, each using its own virtual environment, and everything just works. This is another absolute requirement for my workflow.

The latest release even understands Python type annotations from the very latest Python, Python 3.6.

Django

PyCharm has built-in support for Django. This includes things like knowing the syntax of Django templates, and being able to run and debug your Django app right in PyCharm.

Git

PyCharm recognizes that your project is stored in a git repo and has lots of useful features related to that, like adding new files to the repo for you and making clear which files are not actually in the repo, showing all changes since the last commit, comparing a file to any other version of itself, pulling, committing, pushing, checking out another branch, creating a branch, etc.

I use some of these features in PyCharm, and go back to the command line for some other operations just because I'm so used to doing things that way. PyCharm is fine with that; when I go back to PyCharm, it just notices that things have changed and carries on.

Because the git support is so handy, I sometimes use PyCharm to edit files in projects that have no Python code at all, like my personal dotfiles and ansible scripts.

Code checking

PyCharm provides numerous options for checking your code for syntax and style issues as you write it, for Python, HTML, JavaScript, and probably whatever else you need to work on. But every check can be disabled if you want to, so your work is not cluttered with warnings you are ignoring, just the ones you want to see.

Cross-platform

When I started using PyCharm, I was switching between Linux at work and a Mac at home. PyCharm works the same on both, so I didn't have to keep switching tools.

(If you're wondering, I'm always using Linux now, except for a few hours a year when I do my taxes.)

Documentation

Admittedly, the documentation is sparse compared to, say, Django's. There seems to be a lot of it on their support web site, but when you start to use it, you realize that most pages have only a paragraph or two that barely touch the surface of things. It's especially frustrating to look for details of how something works in PyCharm, and find a page about it, but all it says is which key invokes it.

Luckily, most of the time you can manage without detailed documentation. But I often wonder how many features could be more useful to me if I only knew about them, because what they do isn't documented.

Commercial product

PyCharm has a free and a paid version, and I use the paid version, which adds support for web development and Django, among other things. I suspect I'm like a lot of my peers in usually looking for free tools and passing over the paid ones. I might not ever have tried PyCharm if I hadn't received a free or reduced-cost trial at a conference.

But I'm here to say, PyCharm is worth it if you write a lot of Python. And I'm glad they have revenue to pay programmers to keep PyCharm working, and to update it as Python evolves.

Conclusion

I'm not saying PyCharm is better than everything else. I haven't tried everything else, and don't plan to. Trying a new development environment seriously is a significant investment in time.

What I can say is that I'm very happy and productive using PyCharm both at work and at home, and if you're dissatisfied with whatever you're using now, it might be worth checking it out.

(Editor’s Note: Neither the author nor Caktus have any connection with JetBrains, the makers of PyCharm, other than as customers. No payment or other compensation was received for this review. This post reflects the personal opinion and experience of the author and should not be considered an endorsement by Caktus Group.)

Caktus GroupRequests, Under the Hood (PyCon 2017 Must-See Talk 5/6)

Part five of six in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

My must-see talk this year was "Requests Under the Hood", in which Cory Benfield reveals some of the dark corners in the Requests library for Python. As one of the library’s core maintainers, Cory is in a unique position to share insights about how beautifully-written code intended for a specific problem becomes dirty over time as it is adapted to edge cases, workarounds, or hacks once deployed.

Cory respectfully sheds light on some of Requests’ most troubling code in an effort to provide teachable moments. He’s a natural speaker, so it made for an engaging presentation. It’s a great reminder for all developers to not rush to judgment when working with legacy code.

Og MacielJust What Is A Quality Engineer? Part 2

Picture of Batman

The last time I wrote about Quality Engineering, I mentioned that some of the reasons why people are not familiar with this term are, in no particular order:

  • 'Quality' is usually something that is added as an afterthought and doesn't really come into the picture, if ever, until the very end of the release process
  • Nobody outside of a QA team really knows what they do. It has something to do with testing...
  • Engineering is usually identified with skills related to writing code and designing algorithms, usually by a developer and not by QA

A quick search on Google shows the following results:

  • 104,000,000 hits for "Software Engineer"
  • 86,900,000 hits for “Quality Control”
  • 83,100,000 hits for “Quality Assurance”
  • 5,390,000 hits for “Quality Engineer”

As you can see, it is no wonder that whenever I say 'quality engineer' people always think that what I really meant to say was 'quality assurance' or 'quality control'. The term is just not that well-known! So in order to clarify what the difference is between these professions, today I'd like to talk a little bit about quality assurance and what I usually think whenever someone tells me that they either work in QA or have a 'QA team'.

Wikipedia tells us that the terms 'quality assurance' (QA) and 'quality control' (QC) are often used interchangeably to refer to ways of ensuring the quality of a service or product.

Furthermore,

"Quality assurance comprises administrative and procedural activities implemented in a quality system so that requirements and goals for a product, service or activity will be fulfilled. It is the systematic measurement, comparison with a standard, monitoring of processes and an associated feedback loop that confers error prevention." -- Wikipedia

That is quite a mouthful (the emphasized words are mine), but I feel that it does a good job of stating the following ideas:

  • Quality Assurance and/or Quality Control is used to assure the quality of a product, but there is no clear distinction as to when in the release process it should be used. In my experience, it usually happens when the product is close to being shipped!
  • Used to make sure that requirements (the what) are fulfilled (the how)
  • Used to measure, monitor and compare results against a standard
  • Used for error prevention (which to me denotes a reactive mode rather than a proactive mode)

In other words, those who do quality assurance for a living are involved in verifying that the final version of the product being tested delivers exactly what was designed with the expected behavior and outcome. It requires that the QA person fully understand what is being added to or changed in the product and, most importantly, what the end result should be. Testing is definitely a big part of the 'day to day' activities for someone in QA, which does provide useful information to create a positive feedback loop and hopefully increase error prevention.

Here's what I don't like about this whole business though:

Quality is something that must be part of all phases of a product and not at the very end of the process. A good QA person is usually so familiar with the product being tested that one could say that QA is the first customer a company has! If you have someone in your team who can fully understand how your product works, where the pain points are, knows at a glance if a new feature or a fix does not follow the existing standards, and has the ability to tell you if something doesn't feel right, would you want to hear this type of feedback at the very end? By then, can you really afford to put things on hold and re-design your product??? In my experience, the answer to this question has 99.99% of the time been 'No'.

Quality is the responsibility of everyone involved with a product and not only of those in QA! Everyone, document writers, translators, user experience (UX) experts, product managers, you name it, everyone should be in the business of delivering and assuring the quality of the product! If you bought something, would you be OK with accepting mediocre user experience, documentation, features and translations? I doubt it.

Monitoring and measuring how a product compares against some set of standardized benchmarks is definitely important but as customers request more and more new features and the product's complexity increases, are your benchmarks also keeping up with all these changes? More importantly, since you are the one using the product day and night, do you have any input into updating the benchmarks? I certainly hope so.

Lastly, if your job is to make sure that no product 'goes out the door' without a thorough validation, that it works as expected and that all known issues have been fixed, aren't you forgetting something? What about the issues that are not known yet? You may be thinking that I'm joking, but seriously. If all you do is prevent errors from being shipped to your customers, how about detecting them as early as possible to give all major stakeholders enough time to make a decision as to what should be done with them? Again, if you're catching them at the end of the release cycle, it could be too late.

If your company has a QA team, then you're already ahead of the game, since it is only when customer dissatisfaction is very high and the final numbers for the quarter start to look gloomy that people start paying attention to delivering quality. But it is not enough if you're just kicking the can down the road, only to find yourself facing the same scenario later on! Quality, good quality, is what everyone in your team should be striving for... not sometimes, but all the time!

If you are in a QA team, do you ever feel like you're ahead of the game or feel like you're constantly playing catch up? Do you wish you could have a chance to catch issues as early as possible? Wouldn't you want to stop racing against the clock to get issues verified and have a shot at doing more exploratory testing and identify problems early on? Would you say 'no' to an opportunity to provide some insight into how the product could be improved and perhaps how some work-flows could be simplified to increase the usability?

It should be clear by now that quality should be systemic for any project or company that takes customer satisfaction as its top priority! Sure, you can test the product as much as you (or your QA team) can handle, but you'd only be treating the symptoms. Maintaining a 'quality first' mentality and improving existing processes to make sure that quality is an integral part of everyone's day-to-day activities is essential if you really want to make a bigger impact!

This is when a Quality Engineer comes in! A Quality Engineer is someone who can actively and continuously drive improvements to the release cycle process, and who is in a unique position to help the entire team adopt these improvements so that everyone is using the same methodologies.

Next time I will talk about quality engineering (QE): what it is, what it isn't, and how you should either be hiring more QEs or, if you're in QA, how you should be working to become a QE!

As always, please let me know what your thoughts are on this topic, as I'd love to get some constructive feedback!

Disclaimer: The opinions contained within this article are mine alone and do not necessarily represent the opinions of any entity whatsoever with which I have been, am now or will be affiliated.

Frank WierzbickiJython 2.7.1 final released!

On behalf of the Jython development team, I'm pleased to announce that the final release of Jython 2.7.1 is available! We thought 2017-07-01 was a perfect time to release version 2.7.1 :) This is a bugfix release. Bug fixes include improvements in ssl and pip support along with lots of improvements in CPython compatibility.

Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

This release is being hosted at maven central. There are three main distributions. In order of popularity:
To see all of the files available including checksums, go to the maven query for org.python+Jython and navigate to the appropriate distribution and version.

Og MacielJust What Is A Quality Engineer? Part 1

Picture of Batman

Whenever I meet someone for the first time, after we get past the initial niceties, eventually the conversation shifts to work and what one does for a living. Inevitably I'm faced with what, at first glance, may sound like a simple question, and the conversation goes like this:

  • New acquaintance: "What do you do at Red Hat?"
  • Me: "I manage a team of quality engineers for a couple of different products."
  • New acquaintance: "Oh, you mean quality assurance, right? QA?"
  • Me: "No, quality engineers. QE."

What usually followed was a lengthy monologue whereby I spent around ten to fifteen minutes explaining what the difference between QA and QE is and what, in my opinion, sets these two professions apart. Now, before I get too deep into this topic, I have to add a disclaimer here so as not to give folks the impression that what I'm talking about is backed by any official definition or some type of professional trade organization! The following are my own definitions and conclusions, none of which were pulled out of thin air, but backed by (so far) 10 years of experience working in the field of delivering quality products. If there are formal definitions out there, and they match my own, it is by pure coincidence.

Why the term 'Quality Engineer' is not well known I'm not sure, but I have a hunch that it may be related to something I noticed throughout the 10 years that I have spent in this field. In my personal experience, 'quality' is something that is not always considered as part of the creation of a new company, product or project. Furthermore, the term 'quality' is also not well defined or understood by those involved in actually attempting to 'get more' of it.

In my experience, folks usually forget about the word 'quality', whatever that may be, happily start planning and developing their new ideas/products and eventually ship it to their customers. If the customer complains that something is not working or performing as advertised or it doesn't meet their expectations, no problem. Someone will convey the feedback back to the developers, a fix will eventually be provided and off it goes to the customer. Have you ever seen this before? I have!

Eventually, assuming that the business is doing well and is attracting more paying customers, it is highly likely that support requests or requests for new features will increase. After all, who wants to pay for something that doesn't work as expected? Also, who doesn't want a new feature of their own either? Depending on the size of the company and the number of new requests going into their backlog, I'd expect that either one of the following events would then take place:

  • More tasks from the backlog would be added to individuals' 'plates', or
  • New associates would be hired to handle the volume of tasks

I guess one could also stop accepting new requests for support or new features, but that would not make your customers happy, would it?

Regardless of the outcome, the influx of new tasks is dealt with, and if things get out of control again, one could always try to get an intern or distribute tasks more evenly. Now, notice how the word 'quality' has not been mentioned yet? It is no accident that, to solve an increase in work, more often than not the number one solution is to throw more resources at it. There's even a name for this type of 'solution': The Mythical Man-Month.

You see, sadly, 'quality' is something that usually only becomes important as an afterthought. It is the last piece added to the puzzle that comprises the machinery of delivering something to an end user. It is only when enough angry and unsatisfied paying customers make enough noise about the unreliability or usability of the product that folks start asking: "Was this even tested before being put on the market?"

If the pain being inflicted by customer feedback is sharp enough, a Quality Assurance (QA) team is hastily put together. Most of the time in my experience, this is a Team of One, usually made up of one of the developers who, after being dragged kicking and screaming from his cubicle, is eventually beaten into accepting his new role as a button pusher, text field filler, testing guy. Issues are then assigned to him and a general sense of relief is experienced by all. Have you also seen this before? I have! I'm 2 for 2 so far!

The idea is that by creating a team of one to sit at the receiving end of the product release cycle, nothing would get shipped until some level of 'quality' is achieved. The fallacy with this approach, however, is that no matter how agile your team may be, the assurance of a product's quality is somehow still part of a waterfall model. Wouldn't it be better if problems were caught as early as possible in the process instead of waiting until the very end? To me that is a no-brainer, but somehow the process of testing a product is still relegated to the very end, usually when the date for the release is just around the corner.

Why is the term Quality Engineer not well known then? I feel that the answer is comprised of several parts:

  • 'Quality' doesn't come into the picture, if ever, until the very end of the game;
  • If there is a QA team, nobody outside of that team really knows what they do. It has something to do with testing...
  • Engineering is usually identified with skills related to writing code and designing algorithms, usually by a developer and not by QA;

No surprise that quality engineering is something foreign to most!

OK, so what is a Quality Engineer then? Glad you asked! The answer to that I shall provide in a subsequent post, as I still need to cover some more ground and talk about what 'quality' is, what someone in QA does and, finally, what a QE is!

My next article will continue this journey through the land of Quality and Engineering, and in the meantime, please let me know what you think about this subject.

Caktus GroupManaging your AWS Container Infrastructure with Python

We deploy Python/Django apps to a wide variety of hosting providers at Caktus. Our django-project-template includes a Salt configuration to set up an Ubuntu virtual machine on just about any hosting provider, from scratch. We've also modified this a number of times for local hosting requirements when our customer required the application we built to be hosted on hardware they control. In the past, we also built our own tool for creating and managing EC2 instances automatically via the Amazon Web Services (AWS) APIs. In March, my colleague Dan Poirier wrote an excellent post about deploying Django applications to Elastic Beanstalk demonstrating how we’ve used that service.

AWS has added many managed services that help ease the process of hosting web applications on AWS. The most important addition to the AWS stack (for us) was undoubtedly Amazon RDS for Postgres, launched in November 2013. As long-time advocates for Postgres, we saw this addition to the AWS suite as the final puzzle piece necessary for building an AWS infrastructure for a typical Django app that requires little to no manual management. Still, the suite of AWS tools and services is immense, and configuring these manually is time-consuming and error-prone; despite everything it offers, setting up "one-click" deploys to AWS (à la Heroku) is still a complex challenge.

In this post, I'll be discussing another approach to hosting Python/Django apps and managing server infrastructure on AWS. In particular, we'll be looking at a Python library called troposphere that allows you to describe AWS resources using Python and generate CloudFormation templates to upload to AWS. We'll also look at a sample collection of troposphere scripts I compiled as part of the preparation for this post, which I've named (at least for now) AWS Container Basics.

Introduction to CloudFormation and Troposphere

CloudFormation is Amazon's answer to automated resource provisioning. A CloudFormation template is simply a JSON file that describes AWS resources and the relationships between them. It allows you to define Parameters (inputs) to the template and even includes a small set of intrinsic functions for more complex use cases. Relationships between resources are defined using the Ref function.

Troposphere allows you to accomplish all of the same things, but with the added benefit of writing Python code rather than JSON. To give you an idea of how Troposphere works, here's a quick example that creates an S3 bucket for hosting (public) static assets for your application (e.g., in the event you wanted to host your Django static media on S3):

from troposphere import Join, Template
from troposphere.s3 import (
    Bucket,
    CorsConfiguration,
    CorsRules,
    PublicRead,
    VersioningConfiguration,
)

template = Template()
domain_name = "myapp.com"

template.add_resource(
    Bucket(
        "AssetsBucket",
        AccessControl=PublicRead,
        VersioningConfiguration=VersioningConfiguration(Status="Enabled"),
        DeletionPolicy="Retain",
        CorsConfiguration=CorsConfiguration(
            CorsRules=[CorsRules(
                AllowedOrigins=[Join("", ["https://", domain_name])],
                AllowedMethods=["POST", "PUT", "HEAD", "GET"],
                AllowedHeaders=["*"],
            )]
        ),
    )
)

print(template.to_json())

This generates a JSON dump that looks very similar to the corresponding Python code, which can be uploaded to CloudFormation to create and manage this S3 bucket. Why not just write this directly in JSON, one might ask? The advantages to using Troposphere are that:

  1. it gives you all the power of Python to describe or create resources conditionally (e.g., to easily provide multiple versions of the same template),
  2. it provides compile-time detection of naming or syntax errors, e.g., via flake8 or Python itself, and
  3. it also validates (most of) the structure of a template, e.g., ensuring that the correct object types are provided when creating a resource.

Troposphere does not detect all possible errors you might encounter when building a template for CloudFormation, but it does significantly improve one's ability to detect and fix errors quickly, without the need to upload the template to CloudFormation for a live test.

Supported resources

Creating an S3 bucket is a simple example, and you don't really need Troposphere to do that. How does this scale to larger, more complex infrastructure requirements?

As of the time of this post, Troposphere includes support for 39 different resource types (such as EC2, ECS, RDS, and Elastic Beanstalk). Perhaps most importantly, within its EC2 package, Troposphere includes support for creating VPCs, subnets, routes, and related network infrastructure. This means you can easily create a template for a VPC that is split across availability zones, and then programmatically define resources inside those subnets/VPCs. A stack for hosting an entire, self-contained application can be templated and easily duplicated for different application environments such as staging and production.
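
For instance, here is a minimal sketch of my own (not taken from the AWS Container Basics templates) showing how a subnet is attached to a VPC using Ref:

from troposphere import Ref, Template
from troposphere.ec2 import VPC, Subnet

template = Template()

vpc = template.add_resource(VPC(
    "Vpc",
    CidrBlock="10.0.0.0/16",
))

# Ref ties the subnet to the VPC above, so CloudFormation knows to
# create the VPC first and substitute its ID here.
template.add_resource(Subnet(
    "PublicSubnetA",
    VpcId=Ref(vpc),
    CidrBlock="10.0.0.0/24",
    AvailabilityZone="us-east-1a",
))

print(template.to_json())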

AWS managed services for a typical web app

AWS includes a wide array of managed services. Beyond EC2, what are some of the services one might need to host a Dockerized web application on AWS? Although each application is unique and will have differing managed service needs, some of the services one is likely to encounter when hosting a Python/Django (or any other) web application on AWS are:

  • S3 for storing and serving static and/or uploaded media
  • RDS for a Postgres (or MySQL) database
  • ElastiCache, which supports both Memcached and Redis, for a cache, session store, and/or message broker
  • CloudFront, which provides edge servers for faster serving of static resources
  • Certificate Manager, which provides a free SSL certificate for your AWS-provided load balancer and supports automatic renewal
  • Virtual Private Clouds (VPCs) for overall network management
  • Elastic Load Balancers (ELBs), which allow you to transparently spread traffic across Availability Zones (AZs). These are managed by AWS and the underlying IPs may change over time.

Provisioning your application servers

For hosting a Python/Django application on AWS, you have essentially four options:

  • Configure your application as a set of task definitions and/or services using the AWS Elastic Container Service (ECS). This is a complex service, and I don't recommend it as a starting point.
  • Create an Elastic Beanstalk Multicontainer Docker environment (which actually creates and manages an ECS Cluster for you behind the scenes). This provides much of the flexibility of ECS, but decouples the deployment and container definitions from the infrastructure. This makes it easier to set up your infrastructure once and be confident that you can continue to use it as your requirements for running additional tasks (e.g., background tasks via Celery) change over the lifetime of a project.
  • Configure an array of EC2 instances yourself, either by creating an AMI of your application or manually configuring EC2 instances with Salt, Ansible, Chef, Puppet, or another such tool. This is an option that facilitates migration for legacy applications that might already have all the tools in place to provision application servers, and it's typically fairly simple to modify these setups to point your application configuration to external database and cache servers. This is the only option available for projects using AWS GovCloud, which at the time of this post supports neither ECS nor EB.
  • Create an Elastic Beanstalk Python environment. This option is similar to configuring an array of EC2 instances yourself, but AWS manages provisioning the servers for you, based on the instructions you provide. This is the approach described in Dan's blog post on Amazon Elastic Beanstalk.

Putting it all together

This was originally a hobby / weekend learning project for me. I'm much indebted to the blog post by Jean-Philippe Serafin (no relation to Caktus) titled How to build a scalable AWS web app stack using ECS and CloudFormation, which I recommend reading to see how one can construct a comprehensive set of managed AWS resources in a single CloudFormation stack. Rather than repeat all of that here, however, I'm going to focus on some of the outcomes and potential uses for this project.

Jean-Philippe Serafin provided all the code for his blog post on GitHub. Starting from that, I've updated and released another project -- a workable solution for hosting fully-featured Python/Django apps, relying entirely on AWS managed services -- on GitHub under the name AWS Container Basics. It includes several configuration variants (thanks to Troposphere) that support stacks with and without NAT gateways as well as three of the application server hosting options outlined above (ECS, EB Multicontainer Docker, or EC2). Contributions are also welcome!

Setting up a demo

To learn more about how AWS works, I recommend creating a stack of your own to play with. You can do so for free if you have an account that's still within the 12-month free tier. If you don't have an account or it's past its free tier window, you can create a new account at aws.amazon.com (AWS does not frown on individuals or companies having multiple accounts; in fact, it's encouraged as an approach for keeping different applications or even environments properly isolated). Once you have an account ready:

  • Make sure you have your preferred region selected in the console via the menu in the top right corner. Sometimes AWS selects an unintuitive default, even after you have resources created in another region.

  • If you haven't already, you'll need to upload your SSH public key to EC2 (or create a new key pair). You can do so from the Key Pairs section of the EC2 Console.

  • Next, click the button below to launch a new stack:

    [Launch Stack button: https://s3.amazonaws.com/cloudformation-examples/cloudformation-launch-stack.png]
  • On the Select Template page:

  • On the Specify Details page:

    • Enter a Stack Name of your choosing. Names that can be distinguished via the first 5 characters are better, because the name will be trimmed when generating names for the underlying AWS resources.
    • Change the instance types if you wish, however, note that the t2.micro instance type is available within the AWS free tier for EC2, RDS, and ElastiCache.
    • Enter a DatabaseEngineVersion. I recommend using the latest version of Postgres supported by RDS. As of the time of this post, that is 9.6.2.
    • Generate and add a random DatabasePassword for RDS. While the stack is configured to pass this to your application automatically (via DATABASE_URL), RDS and CloudFormation do not support generating their own passwords at this time.
    • Enter a DomainName. This should be the fully-qualified domain name, e.g., myapp.mydomain.com. Your email address (or one you have access to) should be listed in the Whois database for the domain. The domain name will be used for several things, including generation of a free SSL certificate via the AWS Certificate Manager. When you create the stack, you will receive an email asking you to approve the certificate (which you must do before the stack will finish creating). The DNS for this domain doesn't need to exist yet (you'll update this later).
    • For the KeyName, select the key you created or uploaded in the prior step.
    • For the SecretKey, generate a random SECRET_KEY which will be added to the environment (for use by Django, if needed). If your application doesn't need a SECRET_KEY, enter a dummy value here. This can be changed later, if needed.
    • Once you're happy with the values, click Next.
  • On the Options page, click Next (no additional tags, permissions, or notifications are necessary, so these can all be left blank).

  • On the Review page, double check that everything is correct, check the "I acknowledge that AWS CloudFormation might create IAM resources." box, and click Create.

The stack will take about 30 minutes to create, and you can monitor its progress by selecting the stack on the CloudFormation Stacks page and monitoring the Resources and/or Events tabs.

Using the demo

When it is finished, you'll have an Elastic Beanstalk Multicontainer Docker environment running inside a dedicated VPC, along with an S3 bucket for static assets (including an associated CloudFront distribution), a private S3 bucket for uploaded media, a Postgres database, and a Redis instance for caching, session storage, and/or use as a task broker. The environment variables provided to your container are as follows (a sketch of how a Django settings module might read them appears after the list):

  • AWS_STORAGE_BUCKET_NAME: The name of the S3 bucket in which your application should store static assets.
  • AWS_PRIVATE_STORAGE_BUCKET_NAME: The name of the S3 bucket in which your application should store private/uploaded files or media (make sure you configure your storage backend to require authentication to read objects and encrypt them at rest, if needed).
  • CDN_DOMAIN_NAME: The domain name of the CloudFront distribution connected to the above S3 bucket; you should use this (or the S3 bucket URL directly) to refer to static assets in your HTML.
  • DOMAIN_NAME: The domain name you specified when creating the stack, which will be associated with the automatically-generated SSL certificate.
  • SECRET_KEY: The secret key you specified when creating this stack.
  • DATABASE_URL: The URL to the RDS instance created as part of this stack.
  • REDIS_URL: The URL to the Redis instance created as part of this stack (may be used as a cache or session storage, e.g.). Note that Redis supports multiple databases and no database ID is included as part of the URL, so you should append a forward slash and the integer index of the database, e.g., /0.
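
Here is a minimal sketch of how a Django settings module might consume these variables (my own example; it assumes the dj-database-url and django-redis packages, which the stack itself does not install):

import os

import dj_database_url  # parses DATABASE_URL into Django's DATABASES format

SECRET_KEY = os.environ['SECRET_KEY']
ALLOWED_HOSTS = [os.environ['DOMAIN_NAME']]

DATABASES = {'default': dj_database_url.config()}

CACHES = {
    'default': {
        'BACKEND': 'django_redis.cache.RedisCache',
        # The stack's REDIS_URL has no database index, so append one.
        'LOCATION': os.environ['REDIS_URL'] + '/0',
    },
}

AWS_STORAGE_BUCKET_NAME = os.environ['AWS_STORAGE_BUCKET_NAME']
AWS_PRIVATE_STORAGE_BUCKET_NAME = os.environ['AWS_PRIVATE_STORAGE_BUCKET_NAME']
STATIC_URL = 'https://%s/static/' % os.environ['CDN_DOMAIN_NAME']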

Optional: Uploading your Docker image to the EC2 Container Registry

One of the AWS resources created by AWS Container Basics is an EC2 Container Registry (ECR) repository. If you're using Docker and don't have a place to store images already (or would prefer to consolidate hosting at AWS to simplify authentication), you can push your Docker image to ECR. You can build and push your Docker image as follows:

DOCKER_TAG=$(git rev-parse HEAD)  # or "latest", if you prefer
$(aws ecr get-login --region <region>)
docker build -t <stack-name> .
docker tag <stack-name>:$DOCKER_TAG <account-id>.dkr.ecr.<region>.amazonaws.com/<stack-name>:$DOCKER_TAG
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/<stack-name>:$DOCKER_TAG

You will need to replace <stack-name> with the name of the stack you entered above, <account-id> with your AWS Account ID, and <region> with your AWS region. You can also see these commands with the appropriate variables filled in by clicking the "View Push Commands" button on the Amazon ECS Repository detail page in the AWS console (note that AWS defaults to using a DOCKER_TAG of latest instead of using the Git commit SHA).

Updating existing stacks

CloudFormation, and by extension Troposphere, also support the concept of "updating" existing stacks. This means you can take an existing CloudFormation template such as AWS Container Basics, fork and tweak it to your needs, and upload the new template to CloudFormation. CloudFormation will calculate the minimum changes necessary to implement the change, inform you of what those are, and give you the option to proceed or decline. Some changes can be done as modifications whereas other, more significant changes (such as enabling encryption on an RDS instance or changing the solution stack for an Elastic Beanstalk environment) require destroying and recreating the underlying resource. CloudFormation will inform you if it needs to do this, so inspect the proposed change list carefully.
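
As a rough sketch of what that flow can look like from Python (my own example using boto3, not part of AWS Container Basics), you can preview an update as a change set before deciding whether to apply it:

import boto3

cloudformation = boto3.client('cloudformation')

with open('stack.json') as f:  # template generated by troposphere
    template_body = f.read()

# Ask CloudFormation to calculate the changes without applying them.
cloudformation.create_change_set(
    StackName='my-stack',            # hypothetical stack name
    ChangeSetName='preview-update',
    TemplateBody=template_body,
    Capabilities=['CAPABILITY_IAM'],
)
cloudformation.get_waiter('change_set_create_complete').wait(
    StackName='my-stack', ChangeSetName='preview-update',
)

# Inspect the proposed changes, then execute or delete the change set.
response = cloudformation.describe_change_set(
    StackName='my-stack', ChangeSetName='preview-update',
)
for change in response['Changes']:
    resource = change['ResourceChange']
    print(resource['Action'], resource['LogicalResourceId'])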

Coming Soon: Deployment

In the next post, I'll go over several options for deploying to your newly created stack. In the meantime, the AWS Container Basics README describes one simple option.

Og MacielOn Reading and writing

Picture of 'On Writing'

This week I started reading On Writing: A Memoir of the Craft by Stephen King, a book that has been mentioned a few times by people I usually interview for my weekly podcast as something that is both inspiring and has had a major impact on their lives and careers. After the third or fourth time someone mentioned it, I finally broke down and got myself a copy at the local bookstore.

I have to say that, so far, I am completely blown away by this book! I can totally see why everyone else recommended it as something that people should add to their BTR (Books To Read) list! First of all, the first section of the book, which Stephen King calls his 'C.V.' (and not his memoirs or autobiography), covers his early life as a child, his experiences and struggles (there are quite a few passages that will most likely get you to laugh out loud) growing up with his mom and older brother, Dan. This section, roughly speaking around 100 pages or so, is so easy to relate to that you can probably be done with it in about 2 hours no matter what your reading pace is. I am always captivated to learn how someone 'came to be', the real 'behind the scenes' if you will, of how someone started out in life and the paths they took to get to where they are now.

The next sections talk about what any aspiring writer should add to their 'toolbox' and it covers many interesting topics and suggestions which, if you really think about it, makes a ton of sense. This is where I am in the book right now, and though it isn't as captivating as the first section, it should still appeal to anyone looking for solid advice on how to become a better writer in my humble opinion.

Though I do one day aspire to become a published writer (fiction most likely), and I am enjoying this book so much that I'm having a real hard time putting it down, the reason why I chose to write about it is related to a piece of advice that Stephen King shares with the reader about the habit of reading.

Stephen King claims that, to become a better writer one must at least obey the following rules:

  • Read every day!
  • Write every day!

It is by reading a lot (something that should come naturally to anyone who reads every day) that one learns new vocabulary words, different styles of prose, how to structure ideas into paragraphs and rhythm. He says that it doesn't matter if you read in 'tiny sips' or in huge 'swallows', but as long as you continue to read every day, you'll develop a great and, in his opinion, required habit for becoming a better writer. Obviously, based on his two rules you'd need to write every day too, and if you're one of us who is toying with the idea of becoming a writer one day (or want to become a better writer), I too highly recommend that you give this book a shot! I know, I know, I have not finished it yet but still... I highly recommend it!

Back to the habit of reading and the purpose of this post, I remember back in 2008 my own 'struggle' to 'find the time' to read non technical books. You know, reading for fun? Back then I was doing a lot of reading, but mostly it consisted of blog posts and articles recommended by my RSS feeds, and since I was very much involved with a lot of different open source projects, I mostly read about GNOME, KDE, Ubuntu and Python. Just the thought of reading a book that did not cover any of these topics gave me a feeling of uneasiness and I couldn't picture myself dedicating time, precious time, to reading 'for fun.' But eventually I realized that I needed to add a bit more variety to my reading experience and that sitting in front of my computer during my lunch break would not help me with this at all. There were too many distractions to lure me away from any book I may be trying to read.

I started out by picking up a book that everyone around me had mentioned many times as being 'wicked cool' and 'couldn't put it down' kind of book. Back then I worked at a startup and most of the engineers around me were much younger than me and at one point or another most of them were into 'the new Harry Potter' book. I confess that I felt judgmental and couldn't fathom the idea of reading a 'kid book' but since I was trying to create a new habit and since my previous attempts had failed miserably, I figured that something drastic was just what the doctor would have recommended. One day after work, before driving back home, I stopped by the public library and picked up Harry Potter and the Sorcerer's Stone.

Next day at work when I took my lunch break, I locked my laptop and went downstairs to a quiet corner of the building's lobby. I picked a nice, comfortable seat with a lot of natural sun light and view of the main entrance and started reading... or at least I thought I did. Whenever I started to read a paragraph, someone would open the door at the main entrance to the building either on their way in or out, and with them went my focus and my mind would start wandering. Eventually I'd catch myself and back to the book my eyes went, only to be disrupted by the next person opening the door. Needless to say, experiment 'Get More Reading Done' was an utter failure!

Caktus Group5 Ways to Deploy Your Python Web App in 2017 (PyCon 2017 Must-See Talk 4/6)

Part four of six in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

I went into Andrew T Baker’s talk on deploying Python applications with some excitement about learning some new deployment methods. I had no idea that Andrew was going to deploy a simple “Hello World” app live, in 5 different ways!

  1. First up, Andrew used ngrok to expose localhost running on his machine to the web. I’ve used ngrok before to share with QA, but never thought about using it to share with a client. Interesting! (See the sketch after this list for a rough idea of methods 1 and 2.)

  2. Heroku was up next with a gunicorn Python web app server, with a warning that scaling is costly after the one free app per account.

  3. The third deploy was “Serverless” with an example via AWS Lambda, although many other serverless options exist.

  4. The next deploy was described as the way most web shops deploy, via virtual machines. The example deploy was done over the Google Cloud Platform, but another popular method for this is via Amazon EC2. This method is fairly manual, Andrew explained, with a need to Secure Shell (SSH) into your server after you spin it up.

  5. The final deploy was done via Docker with a warning that it is still fairly new and there isn't as much documentation available.
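
Here's a rough sketch of what the first two methods look like in practice. It assumes a Flask "Hello World" app saved as app.py; this is my own stand-in, not the app Andrew actually deployed in the talk.

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World"

if __name__ == "__main__":
    app.run(port=5000)

# Method 1 (ngrok): run the app locally (python app.py), then expose it with:
#   $ ngrok http 5000
# Method 2 (Heroku): Heroku runs the app through a WSGI server such as gunicorn,
# e.g. with a Procfile containing:
#   web: gunicorn app:app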

I am planning to rewatch Andrew’s talk and follow along on my machine. I’m excited to see what I can do.

Tim HopperPython Plotting for Exploratory Data Analysis

Plotting is an essential component of data analysis. As a data scientist, I spend a significant amount of my time making simple plots to understand complex data sets (exploratory data analysis) and help others understand them (presentations).

In particular, I make a lot of bar charts (including histograms), line plots (including time series), scatter plots, and density plots from data in Pandas data frames. I often want to facet these on various categorical variables and layer them on a common grid.
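
As a quick, made-up illustration of those chart types (mine, not something taken from pythonplot.com), pandas can produce most of them directly from a DataFrame:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "hour": [9, 10, 11, 12],
    "value": [3, 5, 2, 6],
})

df.plot.bar(x="hour", y="value")      # bar chart
df.plot.line(x="hour", y="value")     # line plot / simple time series
df.plot.scatter(x="hour", y="value")  # scatter plot
df["value"].plot.hist()               # histogram
plt.show()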

To that end, I made pythonplot.com, a brief introduction to Python plotting libraries and a "rosetta stone" comparing how to use them. I also included comparison to ggplot2, the R plotting library that I and many others consider a gold standard.

Philip SemanchukAnalyzing the Anglo-Saxonicity of the Baby BNC

Summary

This is a followup to an earlier post about using Python to measure the “Anglo-Saxonicity” of a text. I’ve used my code to analyze the Baby version of the British National Corpus, and I’ve found some interesting results.

How to Measure Anglo-Saxonicity – With a Ruler or Yardstick?

Introduction

Thanks to a suggestion from Ben Sizer, I decided to analyze the British National Corpus. I started with the ‘baby’ corpus which, as you might imagine, is smaller than the full corpus.

The full corpus is described as a “100 million word snapshot of British English at the end of the twentieth century”. The Baby BNC categorizes its text samples into four groups: academic, conversations, fiction, and news. Below are stack plots showing the percentage of Anglo-Saxon, non-Anglo-Saxon, and unknown words for each document in each of the four groups. The Y axis shows the percentage of words in each category; the numbers along the X axis identify individual documents within the group.
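
(As an aside, stack plots like these are straightforward to produce with matplotlib. The sketch below uses invented numbers and is not the author's plotting code.)

import matplotlib.pyplot as plt

documents = [1, 2, 3]                 # document index within a group (X axis)
anglo_saxon = [67.0, 70.2, 64.5]      # percentage of Anglo-Saxon words
non_anglo_saxon = [17.7, 15.1, 20.0]  # percentage of non-Anglo-Saxon words
unknown = [15.3, 14.7, 15.5]          # percentage of words of unknown origin

plt.stackplot(documents, anglo_saxon, non_anglo_saxon, unknown,
              labels=["Anglo-Saxon", "Non-Anglo-Saxon", "Unknown"])
plt.xlabel("Document")
plt.ylabel("Percentage of words")
plt.legend(loc="upper right")
plt.show()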

I’ve deliberately given the charts non-specific names of Group A, B, C, and D so that we can play a game. :-)

Before we get to the game, here are the averages for each group in table form. (The numbers might not add exactly to 100% due to rounding.)

Group Anglo-Saxon (%) Non-Anglo-Saxon (%) Unknown (%)
Group A 67.0 17.7 15.3
Group B 56.1 25.8 18.1
Group C 72.9 13.2 13.9
Group D 58.6 22.0 19.3

Keep in mind that “unknown” words represent shortcomings in my database more than anything else.

The Game

The Baby BNC is organized into groups of academic, conversations, fiction, and news. Groups A, B, C, and D each represent one of those groups. Which do you think is which?

Below are the answers to the game and a discussion of the results.

Answers

Group Anglo-Saxon (%) Non-Anglo-Saxon (%) Unknown (%)
A = Fiction 67.0 17.7 15.3
B = Academic 56.1 25.8 18.1
C = Conversations 72.9 13.2 13.9
D = News 58.6 22.0 19.3

Discussion

With the hubris that only 20/20 hindsight can provide, I’ll say that I don’t find these numbers terribly surprising. Conversations have the highest proportion of Anglo-Saxon (72.9%) and the lowest of non-Anglo-Saxon (13.2%). Conversations are apt to use common words, and the 100 most common words in English are about 95% Anglo-Saxon. The relatively fast pace of conversation doesn’t encourage speakers to pause to search for those uncommon words lest they bore their listener or lose their chance to speak. I think the key here is not the fact that conversations are spoken, but that they’re impromptu. (Impromptu if you’re feeling French, off-the-cuff if you’re more Middle-English-y, or extemporaneous if you want to go full bore Latin.)

Academic writing is on the opposite end of the statistics, with the lowest portion of Anglo-Saxon words (56.1%) and the highest non-Anglo-Saxon (25.8%). Academic writing tends to be more ambitious and precise. Stylistically, it doesn’t shy away from more esoteric words because its audience is, by definition, well-educated. It doesn’t need to stick to the common core of English to get its point across. In addition, those who shaped academia were the educated members of society, and for many centuries education was tied to the church or limited to the gentry, and both spoke a lot of Latin and French. That has probably influenced even the modern day culture of academic writing.

Two of the academic samples managed to use fewer than half Anglo-Saxon words. They are a sample from Colliding Plane Waves in General Relativity (a subject Anglo-Saxons spent little time discussing, I’ll wager) and a sample from The Lancet, the British medical journal (49% and 47% Anglo-Saxon, respectively). It’s worth noting that these samples also displayed the highest and fifth-highest percentages of words of unknown etymology (26% and 21%, respectively) among the 30 samples in this category. A higher proportion of unknowns depresses the results in the other two categories.

Fiction rests in the middle of this small group of 4 categories, and I’m a little surprised that the percentage of Anglo-Saxon is as high as it is. I feel like fiction lends itself to the kind of description that tends to use more non-Anglo-Saxon words, but in this sample it’s not all that different from conversation.

News stands out for having barely more Anglo-Saxon words than academic writing, and also the highest percentage of words of unknown etymological origin. The news samples are drawn principally from The Independent, The Guardian, The Daily Telegraph, The Belfast Telegraph, The Liverpool Daily Post and Echo, The Northern Echo, and The Scotsman. It would be interesting to analyze each of these groups independently to see if they differ significantly.

Future

My hypothesis that conversations have a high percentage of Anglo-Saxon words because they’re off-the-cuff rather than because they’re spoken is something I can challenge with another experiment. Speeches are also spoken, but they’re often written in advance, without the pressure of immediacy, so the author would have time to reach for a thesaurus. I predict speeches will have an Anglo-Saxon/non-Anglo-Saxon profile closer to that of fiction than of either of the extremes in this data. It might vary dramatically based on speaker and audience, so I’ll have to choose a broad sample to smooth out biases.

I would also like to work with the American National Corpus.

Stay tuned, and let me know in the comments if you have observations or suggestions!

Caktus GroupThe Caktus Success Model

Here at Caktus, we’ve been fortunate to work on some incredible projects with our clients. For example, we built the world’s first SMS voter registration app for Libya, developed a digital archiving system for the world’s largest on-demand video provider, and created a survey tool with auto-scaling architecture to help the University of Chicago track school reform.

While working on these and other projects over the past 10 years, we recognized a set of factors that contribute to development done right, which we call the Caktus Success Model. Those factors are strategic partnerships, a sharp team, focusing on impact, and developing scalable apps. Keep reading for more details on each factor, or download our Shipping Faster white paper for tips on implementing them.

Strategic Partnerships

We look at the word “partnership” a little differently here. Rather than simply focusing on building strong external connections, we actively look for ways to break down internal silos. Our sales and marketing team is invited to sit in on sprint reviews, our technical staff contribute to content for our blog, and all areas of the business are encouraged to communicate openly with each other. As a result, we can offer a more cohesive experience to our clients and deliver applications that take into account the expertise - and expectations - of all of the stakeholders involved.

How does this contribute to our success? It removes barriers to working effectively, improves morale, and helps us actively manage scope for client projects.

Focusing on Impact

We use the Agile Scrum methodology in our work to better prioritize the features that will deliver the most business value. Because cost and schedule are often less flexible than scope, effective prioritization ensures that the features being worked on are those that are most important to meet project objectives.

We call our approach Most Valuable Features First (MVFF), which is essentially a more approachable way to look at Minimum Viable Product (MVP). We’ve found it does more to encourage buy-in and input from non-technical stakeholders. As part of the MVFF process, stakeholders come to an agreement on what features are the most valuable to them and how high on the priority list they should go. This approach complements our Agile development process well because we receive feedback quickly and can change course if needed.

Scalable Apps

Approaching projects with the assumption that they will eventually grow or change is key to the planning process at Caktus. Maintaining high code quality and building with future scalability in mind are essential to delivering quality web apps. A deadline cannot be allowed to overrule quality work, which is why we insist on 90%+ test coverage, peer code reviews on every pull request, PEP8 coding standards, and following Django best practices.

Although it can seem like this slows things down or limits scope in the short-term, future updates and improvements will come much more easily to projects developed correctly the first time. For our clients, our focus on technical clarity and robustness from the start of the project means that they save time and money on future work. For us, it means we can take pride in our work and spend less time onboarding new developers or new projects.

Building with scalability and future updates in mind is especially important considering that your web app will need regular updates over the course of its lifetime. (Not convinced? We’ve got a blog post sharing 3 big reasons why you should update your Django app.)

A Sharp Team

The sharp team at Caktus is our most valuable asset. Our staff includes core Django contributors, as well as developers dedicated to contributing to the wider open source community. In our hiring process, we look for individuals with depth of vision and a passion for tackling complex challenges. We also strive to build a diverse team, because we find that a variety of perspectives enables us to find solutions that may otherwise be overlooked. Additionally, getting everyone together under one roof and building a co-located team increases the effectiveness of our communication.

It’s not just about what people bring with them when they join Caktus, however. One of our company benefits is a personal development budget for each member of staff and dedicated time off to attend conferences, workshops, and trainings that benefit further career growth. We also hold quarterly ShipIt days to provide dedicated time for our entire team - both developers and non-technical staff - to work on a project or skill of personal interest. Investing in our team once they’re here ensures that the individuals, the company, and our clients all benefit from their growth.

Shipping Faster

Now that you’ve seen the overview of what makes Caktus successful, read more about each factor, including the steps to take to implement each one and more on the benefits of doing so, in our free white paper.

Frank WierzbickiJython 2.7.1 release candidate 3 released!

On behalf of the Jython development team, I'm pleased to announce that the third release candidate of Jython 2.7.1 is available! This is a bugfix release. Bug fixes include improvements in ssl and pip support.

Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

This release is being hosted at maven central. There are three main distributions. In order of popularity:
To see all of the files available including checksums, go to the maven query for org.python+Jython and navigate to the appropriate distribution and version.

Caktus GroupHow Documentation Works, and How to Make It Work for Your Project (PyCon 2017 Must-See Talk 3/6)

Part three of six in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

A talk that stuck out in my mind from PyCon 2017 was Daniele Procida’s "How documentation works, and how to make it work for your project". In his presentation, he walks through four types of documentation: “Tutorials”, “How-To Guides”, “Reference Guides”, and “Discussions”, giving examples of each, as well as the reasons why each one is valuable to a project.

We value good documentation at Caktus and try to make our projects easily understandable for both technical and non-technical people, so it is always interesting to hear others’ thoughts on the topic. I found Daniele’s presentation clear in terms of why each of the four types of documentation is valuable, and actionable in terms of how to structure documentation within projects.

Caktus GroupTriAgile 2017 Recap

I attended the TriAgile conference in Raleigh, NC on March 30th, 2017 and wanted to share some takeaways from a couple of the talks I saw. Overall, the conference was a nice opportunity to network with some local Agilists and hear about their work and experiences. It was well organized, and I was pleasantly surprised by the size of the Agile community in the Triangle!

The Power of an Agile Mindset

Linda Rising’s keynote talk, "The Power of an Agile Mindset" addressed the differences between fixed versus growth mindsets and how that relates to Agile. She began by outlining an experiment in which students were split into “growth” and “fixed” groups after being given an easy set of questions. Later presented with a series of choices, the fixed group students chose easier tests, felt easily discouraged, showed decreased performance, and lied about their test scores. The growth group students, however, chose more challenging tests, worked hard, enjoyed the challenges, wanted to see tests of students who did better than them, and showed improved performance. (See Mindset by Carol Dweck for further reading on this experiment.)

When boiled down, the fixed mindset views ability as a static attribute that cannot be changed or improved (like height), avoids challenges, is defined by failure, sees effort as being for those who have no talent, and reacts with helplessness in the face of challenge. The growth mindset, or Agile mindset, views ability as an attribute that can be exercised and grown like a muscle, wants to learn, embraces challenge, learns from and is resilient to failure, and recognizes effort as the path to mastery.

When put into the context of an organization, a fixed mindset results in companies who hire only “the best talent” and then continuously fire the lowest performing employees (a practice called “rank and yank”). Growth mindset companies hire based on attitude, focusing on growing their employees, providing learning opportunities, and establishing a culture of trust.

Mindset is important in the practice of Agile at all levels of an organization, from management, to the teams, to the individuals. A fixed mindset can grow to become an Agile mindset through feedback and experimental manipulations (sound familiar?). Rising offered some practical advice for how to reach an Agile mindset, such as praising effort, practice, strategy, and process rather than talent or innate abilities, and encouraging making mistakes and learning from failure rather than punishing or ignoring it.

I found this talk inspiring, and was proud to see that Caktus squarely aligns with the characteristics of the growth mindset companies. Agile is a journey, not a destination: in the end, “reaching the Agile mindset” is not about attaining some nebulous end goal, but about valuing the process of learning and growing along the way.

The Stances of Coaching

The talk that was most relevant to my role as ScrumMaster was “The Stances of Coaching” by AgileBill Krebs. Krebs’ talk was an introduction to the four “stances” that an Agile Coach must learn to consciously adopt depending on the situation at hand. They are:

  • TEACH: In Teaching stance, the coach relays their expertise on the subject to a group.
  • FACILITATE: In Facilitation stance, the coach leaves their expertise at the door and remains neutral in order to facilitate a group’s conversation/meeting/activity.
  • MENTOR: In Mentoring stance, the coach uses their expertise to help an individual improve.
  • COACH: In Coaching stance, the coach remains neutral and guides an individual on their journey to answer their own questions.

A diagram of the four Agile coaching stances.

The most practical advice that I took away from this talk was to ask powerful questions that are open rather than leading, to avoid stacking questions (asking one at a time and giving the person or group a chance to answer), and to get comfortable with silence. Consciously switching stances can be challenging, but knowing which stance is appropriate will get you halfway there.

Some of the other talks I saw were:

  • “Things to Consider to Ensure a Successful Minimum Viable Product (MVP)” by Thomas Friend, who emphasized dedicating resources to simplifying the MVP in order to focus on core functionality, building and rolling out a small product, and then improving on it once market feedback can be collected.
  • “Why Are There No Women On Our Team? Cognitive Biases We See in Product Development” by Catherine Louis, who explained some common biases and the benefits of having varied and diverse perspectives on development teams, leading to increased creativity, new ideas, and a broader culture.
  • “Consensus Workshop: Group Facilitation to Generate Creative Solutions” by Becca Halstead, who walked the audience through the Consensus Workshop Method in five stages: Context, Brainstorm, Cluster, Name, Resolve. This method helps generate true consensus from a large number of ideas in an engaging way.
  • “Release Control, Become Good Servants” by Camille Spruill, who outlined the characteristics of a good servant-leader, as well as the goals of building teams and communities, empowering others, and bringing out the best in them.

The variety of talks at TriAgile meant there was something on offer for anyone working in an Agile environment. I was tempted to simply follow the Coaching track, but I am glad to have deviated from it in order to see some of the talks that were less directly relevant to my role. I will certainly return for TriAgile 2018, where I also hope to see more pre-conference workshops available.

Caktus GroupPython at Instagram (PyCon 2017 Must-See Talk 2/6)

Part two of six in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

One of the talks that I considered a must-see was a keynote presentation by Instagram employees Lisa Guo and Hui Ding about upgrading Python and Django.

This is something that many businesses know they "should" do, but think is too impractical "right now". Instagram performed an upgrade without any downtime and without slowing down the pipeline of new features. Caktus carries out Django upgrades as part of our managed hosting and upgrade protection services, so this talk was especially relevant to what we do as a company.

Tim HopperParallelizing a Python Function for the Extremely Lazy

Do you ever want to be able to run a Python function in parallel on a set of inputs? Have you ever gotten frustrated with the GIL, the multiprocessing library, or joblib?

Try this:

Install Python Fire to run your command from the command line

Install Python Fire with $ pip install fire.

Add this snippet to the bottom of your file:

if __name__ == '__main__':
    import fire
    fire.Fire()

Install GNU Parallel

$ brew install parallel or $ sudo apt-get install parallel may work for you. Otherwise, see this.

Run your function from the command line

$ parallel -j3 "python python_file.py function_name {1} " ::: input1 input2 input3 input4 input5

  • parallel is the command for GNU Parallel.
  • -j3 tells Parallel to run at most 3 processes at once.
  • {1} fills in each item after the ::: as an argument to the function_name.

For example

(lazy) ~ $ cat python_file.py
from time import sleep

def function_name(arg1):
    print("Starting to run with", arg1)
    sleep(2)
    print("Finishing to run with", arg1)

if __name__ == '__main__':
    import fire
    fire.Fire()
(lazy) ~ $ parallel -j3 --lb  "python -u python_file.py function_name {1} " ::: input1 input2 input3 input4 input5
Starting to run with input2
Starting to run with input1
Starting to run with input3
Finishing to run with input2
Finishing to run with input1
Finishing to run with input3
Starting to run with input4
Starting to run with input5
Finishing to run with input4
Finishing to run with input5

I added --lb and -u to keep Python and Parallel from buffering the output so you can see it being run in parallel.

Tim HopperCondaHTTPError: HTTP 401 UNAUTHORIZED for url

I was getting this message when I tried to install packages from conda-forge with Conda:

Fetching package metadata ...
CondaHTTPError: HTTP 401 UNAUTHORIZED for url <https://conda.anaconda.org/conda-forge/osx-64/repodata.json>
Elapsed: 00:00.920954
CF-RAY: 36ad7cbd5d1c23d8-IAD

The remote server has indicated you are using invalid credentials for this channel.

If the remote site is anaconda.org or follows the Anaconda Server API, you
will need to
  (a) remove the invalid token from your system with `anaconda logout`, optionally
      followed by collecting a new token with `anaconda login`, or
  (b) provide conda with a valid token directly.

Further configuration help can be found at <https://conda.io/docs/config.html>.

I tried to do $ anaconda logout but didn't have a program called anaconda installed.

You can install the Anaconda Cloud Client with $ conda install anaconda-client.

After that, I was able to do $ anaconda logout followed by $ anaconda login where I used my old Binstar credentials (now anaconda.org).

I'm not the only one having this problem.

Caktus GroupDecorators, Unwrapped: How Do They Work? (PyCon 2017 Must-See Talk 1/6)

Part one of six in the 2017 edition of our annual PyCon Must-See Series, highlighting the talks our staff especially loved at PyCon. While there were many great talks, this is our team's shortlist.

Back when I started coding in Python, I remember that decorators were one of the most difficult concepts for me to understand. At the time, I tried to watch a couple of videos to understand them better and ended up no clearer than when I started.

In her talk "Decorators, unwrapped", Katie Silverio uses the example of a “timer” decorator which will log the time any function takes to stdout.

And then, Katie does something really great. She writes some code without any decorators that acts like a decorator. It was a really novel way to teach the audience how decorators work. I’m looking forward to giving this talk a re-watch the next time I need to use a decorator in some code.
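
To give a flavor of the example (my own reconstruction, not code from Katie's slides): a timer decorator wraps a function, and the @ syntax is just shorthand for reassigning the function's name to the wrapped version.

import time
from functools import wraps


def timer(func):
    """Log to stdout how long each call to func takes."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print("{} took {:.3f}s".format(func.__name__, time.time() - start))
        return result
    return wrapper


@timer
def slow_add(a, b):
    time.sleep(0.5)
    return a + b


# "Unwrapped": the @timer line above is equivalent to defining the function
# normally and then rebinding its name to the wrapped version.
def slow_sub(a, b):
    time.sleep(0.5)
    return a - b

slow_sub = timer(slow_sub)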

Frank WierzbickiJython 2.7.1 release candidate 2 released!

On behalf of the Jython development team, I'm pleased to announce that the second release candidate of Jython 2.7.1 is available! This is a bugfix release. Bug fixes include improvements in ssl and pip support.

Please see the NEWS file for detailed release notes. This release of Jython requires JDK 7 or above.

This release is being hosted at maven central. There are three main distributions. In order of popularity:
To see all of the files available including checksums, go to the maven query for org.python+Jython and navigate to the appropriate distribution and version.

Caktus GroupSubtests are the Best

Testing our code is important. Because developers write bugs, it’s valuable to catch and correct them before the code gets to production so our apps work as they should. Specifically, we want tests that are DRY (Don’t Repeat Yourself), thorough, and readable. Though there are many ways to try to accomplish these goals, subtests make each of them easier. If you’re not using subtests in your test classes, you probably should be.

Subtests were introduced in Python 3.4, and a few uses are briefly covered in the Python documentation. However, there are many more uses that have made my testing better. Their value is probably best seen through an example.

DRYer code and using parameters for cleaner errors

Let’s say we have a function is_user_error() that takes a status code and returns True if a user error has occurred (any 400-level status code is considered a user error).
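
The post doesn't show the implementation of is_user_error() itself; a minimal version consistent with that description might look like this sketch.

def is_user_error(status_code):
    """Return True if the status code is a 4xx (user error) code."""
    return 400 <= status_code < 500

We could test this function with one test that has many assertions: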

import unittest

from ourapp.functions import is_user_error


class IsUserErrorTestCase(unittest.TestCase):
    def test_yes(self):
        """User errors return True."""
        self.assertTrue(is_user_error(400))
        self.assertTrue(is_user_error(401))
        self.assertTrue(is_user_error(402))
        self.assertTrue(is_user_error(403))
        self.assertTrue(is_user_error(404))
        self.assertTrue(is_user_error(405))
        

But that’s many lines of code, so to be DRYer, we could test the same functionality by writing a for loop:

import unittest

from ourapp.functions import is_user_error


class IsUserErrorTestCase(unittest.TestCase):
    def test_yes(self):
        """User errors return True."""
        for status_code in range(400, 500):
            self.assertTrue(is_user_error(status_code))

That’s a lot DRYer, and allows us to test many more status codes, but if one of those status codes failed, we would get something like:

======================================================================
FAIL: test_yes (IsUserErrorTestCase)
User errors return True.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_example.py", line 10, in test_yes
    self.assertTrue(is_user_error(status_code))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)

So we’re left wondering, which status code failed? Instead, we can use subtests with parameters:

import unittest

from ourapp.functions import is_user_error


class IsUserErrorTestCase(unittest.TestCase):
    def test_yes(self):
        """User errors return True."""
        for status_code in range(400, 500):
            with self.subTest(status_code=status_code):
                self.assertTrue(is_user_error(status_code))

Our failure becomes:

======================================================================
FAIL: test_yes (IsUserErrorTestCase) (status_code=405)
User errors return True.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_example.py", line 10, in test_yes
    self.assertTrue(is_user_error(status_code))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (failures=1)

This lets us know, in the first line, that status_code 405 was the one that failed.

As you can see, subtests give us DRYer code than a single test, and clearer error messages than multiple tests.

DRYer code and using a message (msg) for cleaner errors

Another example: Let's say we're working on a Django project where we created a new API endpoint (/api/people/), and we want to test the fields that are required for it. For the sake of this example, let's say that the endpoint looks like this:

{
    "id": "",
    "first_name": "",
    "last_name": "",
    "address": "",
    "birthdate": "",
    "favorite_color": "",
    "favorite_number": "",
    "timezone_name": ""
}

and the required fields for POSTing are: first_name, last_name, and address. One way to test which fields are required would be to write a single test:

from django.test import TestCase


class PeopleEndpointTestCase(TestCase):
    def setUp(self):
        super().setUp()
        # Do the rest of the setup for logging in the user and giving the user
        # the required permissions

    def test_people_endpoint_post_data(self):
        """Test POSting to the people endpoint with valid and invalid data."""
        url = '/api/people/'
        minimum_required_data = {
            "first_name": "Joe",
            "last_name": "Shmo",
            "address": "123 Fake Street",
        }

        # POSTing with all of the required data
        response = self.client.post(
            url,
            data=minimum_required_data,
            content_type='application/json')
        self.assertEqual(response.status_code, 201)

        # POSTing with the first_name missing
        data = minimum_required_data.copy()
        data.pop('first_name')
        response = self.client.post(
            url,
            data=data,
            content_type='application/json')
        self.assertEqual(response.status_code, 400)

        # POSTing with the last_name missing
        data = minimum_required_data.copy()
        data.pop('last_name')
        response = self.client.post(
            url,
            data=data,
            content_type='application/json')
        self.assertEqual(response.status_code, 400)

        # POSTing with the address missing
        data = minimum_required_data.copy()
        data.pop('address')
        response = self.client.post(
            url,
            data=data,
            content_type='application/json')
        self.assertEqual(response.status_code, 400)

But that's not very DRY, since each section does the same thing. If we split each section into its own test it might be easier to read, but it would be even less DRY. Instead, we could write a loop for each of the sections like this:

from django.test import TestCase

class PeopleEndpointTestCase(TestCase):
    def setUp(self):
        super().setUp()
        # Do the rest of the setup for logging in the user and giving the user
        # the required permissions

    def test_people_endpoint_post_data(self):
        """Test POSTing to the /people/ endpoint with valid and invalid data."""
        url = '/api/people/'
        minimum_required_data = {
            "first_name": "Joe",
            "last_name": "Shmo",
            "address": "123 Fake Street",
        }

        with self.subTest('POSTing with all of the required data'):
            response = self.client.post(
                url,
                data=minimum_required_data,
                content_type='application/json')
            self.assertEqual(response.status_code, 201)

        missing_subtests = (
            # A tuple of (field_name, subtest_description)
            ('first_name', 'Missing the first_name field'),
            ('last_name', 'Missing the last_name field'),
            ('address', 'Missing the address field'),
        )
        for field_name, subtest_description in missing_subtests:
            with self.subTest(subtest_description):
                # Remove the missing field from the minimum_required_data
                data = minimum_required_data.copy()
                data.pop(field_name)

                # POST with the missing field name
                response = self.client.post(
                    url,
                    data=data,
                    content_type='application/json')
                self.assertEqual(response.status_code, 400)

Now the test uses fewer lines, and any failures are logged to the console with the relevant subtest_description.

Independent tests

Another thing to be aware of is that subtests run independently, so we get all of the subtest failures printed into the console, rather than just the first one.

For example, let’s say we’re working on a Django project that has a User model in the our_cool_app app. Users sometimes create Blogposts, and we want to track the most recent time a user has posted, so we create a method called get_latest_activity(). The Blogpost model has a created_datetime field that tracks when it was created, and a creator field that tracks the user that created it.

# models.py
from django.db import models
from django.utils import timezone

from our_cool_app.models import User


class Blogpost(models.Model):
    # ...other fields here...
    created_datetime = models.DateTimeField(default=timezone.now)
    creator = models.ForeignKey(User, on_delete=models.CASCADE)

We should test that this method works correctly, so we can write the test a few different ways. As one test:

import datetime

from django.test import TestCase
from django.utils import timezone

# UserFactory and BlogpostFactory are assumed to be factory_boy-style factories
# defined elsewhere in the project.


class UserTestCase(TestCase):
    def test_get_latest_activity(self):
        user = UserFactory()

        # A user with no blogposts has no latest activity
        self.assertIsNone(user.get_latest_activity())

        # A user with 1 blogpost has that blogpost's created_datetime as the latest activity
        blogpost = BlogpostFactory(creator=user)
        self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

        # A user with multiple blogposts
        a_day_ago = timezone.now() - datetime.timedelta(days=1)
        a_week_ago = timezone.now() - datetime.timedelta(days=7)
        BlogpostFactory(creator=user, created_datetime=a_day_ago)
        BlogpostFactory(creator=user, created_datetime=a_week_ago)
        self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

        # Future blogposts don't count
        tomorrow = timezone.now() + datetime.timedelta(days=1)
        BlogpostFactory(creator=user, created_datetime=tomorrow)
        self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

        # Other people's blogposts don't count
        another_user = UserFactory()
        BlogpostFactory(creator=another_user)
        self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

This looks ok as long as we are careful to make clear comments and use blank lines for better readability. However, if the first assertion (# A user with no blogposts has no latest activity) fails, then the rest of the test doesn’t run. Instead, we can use subtests to write:

import datetime

from django.test import TestCase
from django.utils import timezone


class UserTestCase(TestCase):
    def test_get_latest_activity(self):
        """Test the get_latest_activity() method."""
        user = UserFactory()

        with self.subTest("A user with no blogposts has no latest activity"):
            self.assertIsNone(user.get_latest_activity())

        with self.subTest("A user with one blogpost"):
            blogpost = BlogpostFactory(creator=user)
            self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

        with self.subTest("A user with multiple blogposts"):
            a_day_ago = timezone.now() - datetime.timedelta(days=1)
            a_week_ago = timezone.now() - datetime.timedelta(days=7)
            BlogpostFactory(creator=user, created_datetime=a_day_ago)
            BlogpostFactory(creator=user, created_datetime=a_week_ago)
            self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

        with self.subTest("Future blogposts don't count"):
            tomorrow = timezone.now() + datetime.timedelta(days=1)
            BlogpostFactory(creator=user, created_datetime=tomorrow)
            self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

        with self.subTest("Other people's blogposts don't count"):
            another_user = UserFactory()
            BlogpostFactory(creator=another_user)
            self.assertEqual(user.get_latest_activity(), blogpost.created_datetime)

As a result, our code looks more readable, is DRY, and provides useful error messages for each section that fails, rather than just the first one.

A Caveat

While subtests do run independently of each other, they don’t each run in their own database transaction, so any changes to the database made within one subtest will persist through subsequent subtests until the end of the test. From our previous example, the user’s first blogpost was created in the “A user with one blogpost” subtest, and it persists until the end of the test (which is why the user’s latest activity in the “A user with multiple blogposts” subtest is still this first blogpost’s created_datetime).

Conclusion

Subtests allow us to write short DRY sections of code and to have meaningful error messages for when our code fails. There are many ways to use subtests, and more things you can do with them than what is outlined in this blog post, so go and use them to write tests that are thorough, DRY, and readable.

For further reading about newer features of Python, check out New year, New Python: 3.6.

Tim HopperChoice of the Name Dynamic Programming

Richard Bellman quoted by Stuart Dreyfus via Garrett Jones:

I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, ‘Where did the name, dynamic programming, come from?’ The 1950s were not good years for mathematical research.

We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical.

The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons.

I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities.

Caktus GroupPyCon 2017 Recap

Caktus attended PyCon 2017 in Portland from May 18-21. It was the first PyCon for some, while others were PyCon veterans. All of us were looking forward to the opportunity to hear some great talks and make or renew connections in the Python community.

Getting Set Up

Right after arriving from the East Coast on Thursday it was time to set up for Day 1 of the expo. We eagerly prepared for the meet and greet opening night.

Some of the Caktus team at PyCon 2017

Our Ultimate Tic Tac Toe game made a comeback this year with a new AI feature. We only had 2 winners, although one of them beat the AI four times! (Our developers think it’s time to turn the difficulty up to 11.)

Visitors to the Caktus booth playing Ultimate Tic Tac Toe

Meetings and Giveaways

As expected, booth time was busy. It was exciting to welcome so many members of the Python community. We enjoyed chatting about what we do at Caktus and our work building web apps with Django, as well as making connections or meeting with people we hadn’t seen in a while. Mark was excited to catch up with Tom Christie of Django REST Framework.

Technical Director Mark Lavin with Tom Christie of Django REST Framework.

In addition to Caktus swag, we had two prize giveaways on offer this year. The first one was a social contest on Twitter. Christine was our winner by random selection - congratulations!

Winner of the Caktus Twitter contest at PyCon 2017

The second giveaway was a random drawing of people who signed up for our newsletter. Congratulations to Mike for winning that drawing.

Winner of the prize draw at the Caktus booth

Lots to Talk About

Six Caktus developers attended a variety of talks and came out of each of them feeling inspired by fresh ideas.

We can’t wait to see how they apply what they’ve learned to their work at Caktus.

After-hours Event

This year, Caktus hosted an exclusive, invite-only event at Fuse Bar. We were excited to welcome clients and special guests for sushi, nibbles, and refreshments.

Sushi ordered for the Caktus after-hours event.

Some of our clients are based across the country from us, so it was nice to have the opportunity to catch up in person and mingle.

Job Fair

We spoke to lots of interested developers at the PyCon job fair on May 21st. Some of the most common questions we received - and their answers - are below.

Question 1: Where is Caktus based?

Our office is in a lovely brick building in downtown Durham, North Carolina.

Question 2: What does Caktus do / What projects have you worked on?

Caktus is a consultancy and development agency. Our work primarily focuses on building custom web and SMS apps in Python, using the Django framework. Some of our projects include building the world’s first SMS-based voter registration system, a live event management app, and responsive, database-driven websites for clients in industries like food and beverage, travel, and entertainment.

We also do team augmentation and offer discovery workshops to help our clients strategically plan out the user experience and technical development aspects of their projects. To find out more about our specialties at Caktus, visit our Services page.

Question 3: Do you have opportunities for remote work?

We hire remotely for some contract work, but full-time positions must be based out of our Durham office.

Question 4: Where can I find your current openings / How can I apply?

If you’re interested in joining us at Caktus, check out our Careers page. We’ll need a resume and cover letter, and encourage you to send a link to your GitHub, portfolio, or website as well.

Caktus is growing fast, and we’re pleased to offer great benefits and a fair, equal-opportunity working environment. Why not grow with us?

Until next year!

As always, we had a great time meeting and mingling with our fellow Pythonistas. Thanks to the organizers for putting on another fantastic event, our fellow sponsors for supporting it, the volunteers who kept things moving, and of course, all the attendees for your energy and enthusiasm. See you next year!

Tim HopperBackyard Macro Videography

I've been munching on sunflower seeds while working on my back patio, and some tiny ants (Monomorium minimum, I think) have been enjoying the leftovers.

I pulled out my camera, 50mm lens, and extension tube to experiment with macro videography. The result is quite fun!


Here's what the recording setup looked like.

Caktus Group3 Reasons to Upgrade to the Latest Version of Django

When considering a website upgrade, many business stakeholders probably think about the frontend, i.e., how the website looks or the features users interact with. Perhaps less often considered is the importance of upgrading the backend; that is, the databases, applications, and servers powering all the behind-the-scenes activity. Infrastructure support and upgrades are necessary but often performed as a separate project from any improvements to design or user experience, rather than as part of a holistic update project.

With that in mind, it helps to have an understanding of why upgrading the backend should be considered a necessary part of any website upgrade project. We offer 3 reasons, focusing on our specialty of Django-based websites. Upgrading:

  • increases security,
  • reduces development and maintenance costs, and
  • ensures support for future growth.

Read on for details about each of these factors, or get in touch to speak with us about them.

Increase the security of your site

The Django framework is continually being improved and certain releases are designated as “Long Term Support” (LTS) versions. LTS versions receive security updates and bug fixes for a three-year period, as opposed to the usual 18 months. When your website uses an unsupported version of Django, newly uncovered bugs are not being fixed, patched, or supported by the Django and Open Source communities. No new security fixes are planned for retired versions, a situation that carries a number of risks.

These risks come in the form of vulnerabilities - weaknesses that leave your site open to attack. Attacks could potentially cause servers to go down, data to be leaked or stolen, or features to stop working. If a vulnerability is taken advantage of, it could lead to a loss of reputation and potentially a loss of revenue or legal ramifications. With high consumer expectations and increasing requirements from international data protection laws, this could prove disastrous for organizations or web applications without stringent upgrade plans in place.

If your site is using an older version of Django, a security patch may not be released for it. This means that a fix for the vulnerability would have to be authored and implemented by your development team, which, over time, is less cost-effective than upgrading to the LTS version.

Upgrading to an LTS release offers significant benefits, including security updates as needed. Fixes for security issues and vulnerabilities are implemented quickly. There is no need to implement fixes yourself (or hire out expensive custom work). Taking proactive steps to upgrade reduces risk and can save you the trouble of expensive, reactive steps in the event of a cyberattack.

Reduce development and maintenance costs

In addition to improving security and ensuring support for future growth, upgrading also offers productivity benefits for development teams. Many extra lines of code may be required in order to continue to backport fixes for your website or app as issues occur or features are added. Adding all this code and continuing to use old versions of Django will eventually lead to technical debt, where the short-term fixes and outdated code end up creating extra work to patch and maintain a project in the long run.

Custom fixes and patches also introduce a large learning curve for new developers or contractors. The issue here is two-fold: Onboarding new developers is more time consuming than it needs to be, and if key personnel leave, you may lose knowledge which is integral to maintaining or updating the project.

Upgrading your version of Django reduces technical debt by eliminating old, no-longer-needed code. It also allows your development team to reduce the time and money spent on addressing security issues and bug fixes, freeing up time for them to work on website improvements or revenue-generating work.

Ensure support for future growth

Extensibility is the practice of keeping future growth in mind when working on a development project. We often hear from potential clients who built a website or web app in the early days of their business, when releasing features quickly took precedence over planning for future growth. Whether that growth is in the form of more data, more users, or more functionality, planning for it impacts current design and development decisions. When growth isn’t considered, scaling up the project and adding new features requires a disproportionate amount of work. If the original development was not intended to support the changes being made, custom workarounds must be introduced.

Where does this leave your web project? Technologically out of date, unnecessarily clunky, and less able to deliver a quality experience to site visitors.

Upgrading Django from an out-of-date version to a more recent LTS version not only provides access to software that is constantly receiving bug and security fixes; it also simplifies the upgrade process when a new version of Django is released with a feature needed by your project. If your project is two, three, even four releases behind, upgrading all at once could be cost-prohibitive. By regularly upgrading, you gain near-immediate access to new features in Django if and when needed. In other words, you can depend on a highly-engaged developer community actively working to add features rather than reinventing the wheel by developing them yourself.

Next steps

The wider open source development community is producing great tools and enhancements every day and the community continues to grow in size and support. Your project may find itself left behind if Django is left unsupported - or growing along with the community if you upgrade.

So where to get started? For clients considering an upgrade, we generally advise moving up to the most recent LTS release. While the latest version of Django offers the newest features, the LTS version represents a version of Django that will be more cost efficient to maintain given the community’s three-year commitment to releasing updates for it.
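
If you're not sure which version a project is currently running, checking takes one line. (This is a small illustrative snippet, not something from the original post.)

import django

# django.get_version() is part of Django's public API.
print(django.get_version())  # e.g. "1.8.18" means it's time to plan a move to an LTS release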

As Django specialists, Caktus developers have experience upgrading and maintaining Django-based websites. We have successfully completed upgrades for numerous clients and offer an upgrade protection plan that ensures your site will be maintained year to year as well as updated to the most recent LTS version of Django. Sound good? Get in touch to start the process of upgrading and securing your website, or take a look at some of our other services if you’ve got a larger project in mind.

Tim HopperAdversarial Learning: Stories of Degradation and Humiliation

My friends Andrew and Joel were kind enough to have me back on their podcast Adversarial Learning. We shared our tales of bad data science interviews. Enjoy!

Caktus GroupUniqueness is an Advantage

Back in March, the organizers from the Women in Tech summit asked if I’d like to collaborate on a panel on diversity in technology at their Philadelphia summit. Back in October, I had created a panel on “Staying a Women in Tech” and was excited for the opportunity to speak on such a significant topic in my hometown of Philadelphia. I was introduced to Brigitte Daniel, who had submitted this new panel, and we began putting the panel plans together. Brigitte would moderate the discussion and I would be a panelist along with three other dynamic women in tech: Elise Wei, Jumoke Dada and Gulrukh Ahanger.

Brigitte, founder of Mogulette, a program centered “around educating, mentoring, and empowering women, with a focus on women of color who are interested in careers in business and technology”, thought it would be a great ice breaker to begin the panel with a clip of Bozoma Saint John at WWDC 2016. Saint John’s ability to connect with her audience on a massive scale set an example for other women of color in the tech industry looking for ways to become more visible and make a name for themselves.

After panel introductions, with panelists ranging from developers and managers to founders and chapter leaders of non-profit organizations that help women learn to code, we began chatting about how we could use our uniqueness to our advantage on our respective teams. Here are some of the highlights captured by our audience on Twitter:

Gulrukh Ahanger

Jumoke Dada

Elise Wei

Erin Mullaney

Overall, I was incredibly happy with how our panel turned out. I definitely heard things from other panelists that I could take back with me and think about. It’s an incredible feeling to be with a group of smart women who are there to help lift each other up.

Gif via Libby VanderPloeg on Giphy

See what other events Cakti have participated in or check out these talks.

Caktus GroupCaktus Consulting Group is an Official AWS Consulting Partner

We’re proud to announce that Caktus has become a certified Amazon Web Services (AWS) Consulting Partner in recognition of the depth and breadth of our AWS expertise. Since AWS became an option for fast, flexible, and low-cost infrastructure, we’ve used it to build scalable web and cloud apps for our clients. We’ve used AWS services for computing, networking, storage, databases, security, and application services for 10 clients over the last few years (and that’s not including the projects we do for fun or as part of ShipIt Day projects).

In addition to our client experience, we have 7 individual AWS certifications amongst our staff. AWS Certification is industry-recognized and demonstrates a thorough, tested knowledge of Amazon Web Services.

We’re looking forward to building more apps with AWS as our top pick for cloud computing services. Joining the Amazon Web Services Partner Network puts us in good company and grants us access to a special range of tools that we can put to work for our clients. To learn more about how we use AWS to deliver highly scalable apps, please contact us.

Or, check out a few of our top blog posts on working with AWS:

Caktus GroupCelebrating 10 Years of Building Web Apps the Right Way

This year marks 10 years of building sharp web apps at Caktus Group. We’re honored by the trust our clients have put in us; it has enabled Caktus to grow from a team of 3 Python developers to an organization of 31 people and supported our efforts to give back to the local and open source communities.

What do Caktus staff have to say about this milestone?

Looking back

Looking back on her 7 years of work at Caktus, Karen says, “It’s been fun to go from 6 people to 30, be a part of that growth and work on communication and defining roles.” She cites her enjoyment of building long-term customer relationships and being able to nurture projects from their early development to completion and improvement over time as her main reason for sticking with Caktus.

Mark, also with Caktus for 7 years, is proud of how his work here and the support of his colleagues provided opportunities to speak at community events and co-author Lightweight Django. He adds, “I feel like I've grown with this company and it's grown with me. I'm proud to say I work here. I remember my first DjangoCon when we were 5-6 people. I told someone I worked at Caktus and they said ‘Oh, I've heard of you guys.’ I knew then that we were doing something right.”

Caktus Top 10s

Top 10 Caktus GitHub contributions

  1. django-project-template
  2. django-scribbler
  3. django-treenav
  4. django-pagelets
  5. fabulaws
  6. django-email-bandit
  7. margarita
  8. django-file-picker
  9. django-jsx
  10. django-comps

Top 10 blog posts in the last year

As part of our commitment to giving back to the development community, we maintain a technical blog with tips for Django, Python, UX, and more. The collection has gotten pretty big over the years! Our 10 most popular blog posts, in order of views, are:

  1. Using Amazon S3 to Store Your Django Site's Static and Media Files
  2. Getting Started Scheduling Tasks with Celery
  3. Migrating to a Custom User Model in Django
  4. Getting Started using Python in Eclipse
  5. Configuring a Jenkins Slave
  6. Custom JOINs with Django's query.join()
  7. Best Python Libraries
  8. Celery in Production
  9. Django Logging Configuration: How the Default Settings Interfere with Yours
  10. Writing Unit Tests for Django Migrations

Building Apps the Right Way

Even with those top 10s, one of the things we’re most proud of is our dedication to building web apps the right way. Caktus CEO Tobias McNulty says, “Doing things right is an ethos that extends to all areas of the business - not just app development. We’ve spent the last 10 years working to implement processes and systems that ensure we continually deliver excellent work to our clients and treat our staff with fairness and respect.”

Caktus' core values guide both our internal and external interactions and have played a key part in growing Caktus to where it is today. We’re confident they will also continue to drive our future growth.

We’ve encountered an immense variety of technical challenges in our 10-year mission to build web applications the right way. It is always a delight to bring that experience to bear on a new project. If you have a project that might benefit from Caktus’ approach, don’t hesitate to get in touch.

Caktus GroupUsing Tokens During Sprint Planning to Allocate Time

In January of 2016, Caktus transitioned from a general Agile development environment to a more focused Scrum environment. Part of this transition entailed moving from a targeted budget allocation approach per project, to a self-organizing, goal-based team structure with no obvious provision for tight, consistent control over project budgets.

If managing budgets is part of your job, you can appreciate how much our project managers struggled with this. We shifted to working in 2-week-long goal-based sprints, but still had to pay attention to budget constraints. We searched for a way to still effectively manage our budgets, but to do so without exercising unseemly amounts of un-Scrum-like “command and control”.

We also noticed that the development team members were having their share of budget-related struggles:

  • If we didn’t discuss hourly budgets in sprint planning, it was difficult for team members to gauge whether the stories they were committing to aligned both with the sprint goals and the project budgets.
  • However, if we did discuss hourly budgets in sprint planning, the teams tended to feel a lack of agency, which inhibited self-organization. This also tended to introduce an unwanted comparison of hours to story points.
  • It was common for the team to commit to stories that appeared to meet the sprint goals, and not realize until the end of the sprint that multiple team members had over-focused on the same project during the sprint. This could lead to overspending on some projects, while underspending on others.
  • The teams realized that without increased transparency regarding budgets, it was entirely possible for them to deliver sprint goals and satisfy client needs, but still come in over- or under-budget.

So how do we maintain our project budgets, empower the team, be truly Agile, and still deliver a working product at the end of each sprint that meets the goals?

Tokens!

Here’s our solution:

  1. Acquire supplies: colored tokens, large pads of paper, markers.
  2. Create a grid on a large piece of paper with enough boxes for your team’s budget sources. These can be projects, sprint activities, time off, etc., but should reflect the main buckets of time that your team members allocate to during sprints. Label each box.
  3. Designate a budget for each token to represent. Our tokens each represent 1 half day (or 4 hours).
  4. During sprint planning, each team member selects a color and takes the number of tokens equal to their availability during the sprint. Full-time team members at Caktus get 20 tokens; part-time members take fewer as appropriate.
  5. Team members then allocate their tokens in the boxes as they see fit. At this time, the Product Owner (PO) can communicate any budget limitations for specific projects. The team resolves allocation conflicts amongst themselves.

This exercise helps the team identify and resolve these frequent conflicts:

  • Sprint goals not achievable within the budget
  • Over- or under-allocation by individual team members, due to PTO, enthusiasm, or other commitments
  • Project favoritism (everyone wants to work on one project)
  • Project only gets time from a single team member, leaving no space for pull request review or quality assurance from the rest of the team

Typically, our project managers (playing the role of Product Owner) go over the sprint goals prior to this exercise. Sprint planning then starts with the token exercise, which usually takes no more than 5-10 minutes; once the initial allocation is complete, the team progresses to planning the rest of their sprint. The grid with the tokens is left in play through the end of the sprint, allowing team members to reference and adjust it as needed. After sprint planning, the Scrum Master posts the initial allocations to the team’s Basecamp.

The budget allocations per person that come out of this exercise are not communicated to the team again during the sprint by the PO, Scrum Master, or other stakeholders. The team can choose to reference the allocations or not, as they see fit. If desired, the PO can compare the initial allocation data to the actual expenditures after the sprint ends. This type of comparison over multiple sprints can be useful in identifying trends that the PO can act upon. For example, if a project is consistently allocated more time in sprint planning than is actually spent on it, yet the goals are always completed, this could be an indication that the goals and/or stories for that project are too small and can be increased in scope.

All three of our Scrum teams are choosing to use this exercise for now. We’ve found that the token exercise provides budget transparency for the development teams, a mechanism for hourly budget management (without command and control) for the POs, and a starting point for team conversations about resource allocation. It also starts sprint planning with a hands-on activity that gets the team thinking and moving around.

Looking for more info about using Scrum and Agile in web development? Read about how we implemented Scrum in a client-services organization, or check out this post about using priority in Scrum to reduce team anxiety.

Caktus GroupShipIt Day Recap Q2 2017

Once per quarter, Caktus employees have the opportunity to take a day away from client work to focus on learning or refreshing skills, testing out ideas, or working on open source contributions. The Q2 2017 ShipIt Day work included building apps, updating open source projects, trying out new tools, and more. Keep reading for the details.

PostgreSQL Performance

Erin used ShipIt Day to watch a tutorial on Postgres performance by Craig Kerstiens and test the Caktus website with some of the things she learned. She used the free pgAdmin III tool to try out some of Craig’s suggested database queries for performance monitoring. While drilling down into our website, she explored cache and index hit rates, reviewed query performance on our blog, and tested with pg_stat_statements to find the most expensive queries in aggregate across the database. Erin plans to use her findings to inform decisions impacting website performance.
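
The recap doesn't list the exact queries Erin ran, but pg_stat_statements makes this kind of check easy to sketch. Assuming the extension is already installed and enabled, something like the following (the connection string is a placeholder) lists the most expensive queries in aggregate:

import psycopg2

# Placeholder connection string; point this at the database being profiled.
conn = psycopg2.connect('dbname=caktus_website')
cur = conn.cursor()

# The ten most expensive queries in aggregate, by total time spent.
cur.execute("""
    SELECT query, calls, total_time
    FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;
""")
for query, calls, total_time in cur.fetchall():
    print('%10.1f ms  %8d calls  %s' % (total_time, calls, query[:60]))

cur.close()
conn.close()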

GitHub Pull Requests Tool

Dan built a tool to help with GitHub pull requests. The tool watches a pull request until it’s ready to merge, then merges it for him. He built it from scratch using the requests library and GitHub API. The tool works by reloading the page occasionally to see if the request is ready to merge, and has tab title changes to make it easy to keep an eye on the status of the request.
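
Dan's tool isn't described in detail here, but a bare-bones sketch of the same idea, polling the GitHub API with requests until a pull request is mergeable and then merging it, might look like the following. The repository, pull request number, and token are placeholders, and a real tool would also want to check CI status before merging:

import time

import requests

API = 'https://api.github.com'
REPO = 'example-org/example-repo'  # placeholder repository
HEADERS = {'Authorization': 'token YOUR-GITHUB-TOKEN'}  # placeholder token

def wait_and_merge(repo, number):
    url = '%s/repos/%s/pulls/%d' % (API, repo, number)
    while True:
        pr = requests.get(url, headers=HEADERS).json()
        # 'mergeable' is None while GitHub is still computing it.
        if pr.get('state') == 'open' and pr.get('mergeable'):
            response = requests.put(url + '/merge', headers=HEADERS)
            print('Merged!' if response.ok else 'Merge failed: %s' % response.text)
            return
        time.sleep(60)  # poll again in a minute

wait_and_merge(REPO, 42)  # 42 is a placeholder pull request number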

Book Club Voting App

Charlotte M built an app to help the Caktus book club vote on their next book. Members can view and add books to the book list for the next election, then vote using an election interface. Once the books are selected, members vote by dragging and dropping the titles in their preferred order to submit votes.

As part of her project, Charlotte researched real-life voting systems and settled on the Borda Count method, preferring a consensus-based system over a majoritarian one.
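
The Borda Count itself is simple enough to sketch in Python: each ballot ranks the candidate books, and a book earns more points the higher it is ranked. The titles below are placeholders rather than the book club's actual list:

from collections import Counter

def borda_results(ballots):
    """Each ballot is a list of books, most preferred first."""
    scores = Counter()
    for ballot in ballots:
        n = len(ballot)
        for position, book in enumerate(ballot):
            # First place earns n points, second place n - 1, and so on.
            scores[book] += n - position
    return scores.most_common()

ballots = [
    ['Dune', 'Middlemarch', 'Neuromancer'],
    ['Middlemarch', 'Dune', 'Neuromancer'],
    ['Neuromancer', 'Dune', 'Middlemarch'],
]
print(borda_results(ballots))
# [('Dune', 7), ('Middlemarch', 6), ('Neuromancer', 5)]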

Open Source Projects

Mark reviewed open source projects and worked on maintenance for the Sick Muse project, a front end for collectd. He wanted to make it work on Python 3 and ensure it works on the latest version of Tornado. While the back end worked, he found that the JavaScript/Bower-based front end broke, and he plans to remove Bower in the future. As part of maintenance, he also worked to improve test coverage from 50% to 88%.

Test Case Management Tool Research

Gerald researched test case management tools that would integrate with JIRA, aiming to find something that would mesh with JIRA as well as sharing a similar visual style. He looked at qTest by QAsymphony, Xray for JIRA, and Zephyr for JIRA, settling on Zephyr for testing. Although it required him to create a few workarounds, Gerald got Zephyr up and running, demonstrating a few user stories and test cases.

For the next ShipIt Day, Gerald plans to look at QA metrics and reporting.

Python for Data Visualization

NC’s grad school coursework requires at least one semester of Python and data visualization. She spent this ShipIt Day working on a media library, building its exit function and queries for media types that would allow the user to get a list of all of the records that fit a given query.

User Stories for Agile Development

UX designer Basia read User Stories Applied: For Agile Software Development by Mike Cohn to brush up her skills in writing user stories as a way to enhance the user story mapping techniques she leverages in discovery workshops. She shared a review of what a user story is and what it conveys, noting that each must be accompanied by acceptance criteria that will validate developed functionality.

She also walked through why user stories should be used in software development, the importance of working as a team to verbally communicate them, the usefulness of user stories in helping to defer details until the team is sure they’re needed, and how they discourage teams from pretending they know up-front everything there is to be known about the project. Most importantly for the Agile developer, she explained how user stories encourage iterative work.

Hello Ansible

Working on an Ansible project for ShipIt Day.

Neil, Dmitriy, and Jeff B worked as a team to start a bare-bones “hello world” project using Ansible and write an Ansible playbook for its deployment using nginx, gunicorn, Django, Postgres, and memcached.

As a newcomer to devops, Neil liked using Ansible and learned that it’s not as scary as it seems. Dmitriy liked working through the different steps and generally learning about Ansible. Jeff was on board as an advisor, and sees areas where more documentation can be written.

Project Tequila

Vinod worked together with the ‘Hello Ansible’ team on a similar project. Caktus hosts many client projects, all of which were initially created with varying deployment recipes. Many of these use Margarita, Caktus’s homegrown library of Salt recipes. Vinod decided to take one of these older projects (the Libya SMS project) and investigate how it could be migrated from Margarita to Tequila, our internally-developed library for Ansible. This worked surprisingly well (thanks to help from Jeff B, the primary author of Tequila) and by Friday afternoon, we had a single app server deployed successfully to a Vagrant box.

Blogging

Sarah, Charlotte F, Elizabeth and Gannon created and reviewed posts for the Caktus blog, with topics including sprint planning, conference recaps, and project management. Keep an eye out for those in the next several weeks.

Triangulated Hearts

Kia Lam demonstrates her project for ShipIt Day Q2 2017.

Kia revisited a project built a year and a half ago using the Processing library for animation. The project uses the Triangulate and Minim libraries. The animation of a heart reacts to sound, including voice or a song, by changing color and shifting geometric lines.

For the next version Kia would like to make adjustments to functions built into the library. It’s currently audio reactive but not beat reactive, something she intends to work on.

Or, maybe she’ll animate the Caktus logo for our 10th anniversary party!

Until next time

As you can see, Cakti have been busy on a range of projects. Want to join us and work on sharp web apps? Check out the Caktus careers page for current openings.

Caktus GroupBuilding a Custom Block Template Tag

Building custom tags for Django templates has gotten much easier over the years, with decorators provided that do most of the work when building common, simple kinds of tags.

One area that isn't covered is block tags, the kind of tags that have an opening and ending tag, with content inside that might also need processing by the template engine. (Confusingly, there's a block tag named "block", but I'm talking about block tags in general).

A block tag can do pretty much anything, which is probably why there's not a simple decorator to help write them. In this post, I'm going to walk through building an example block tag that takes arguments that can control its logic.

Django Documentation

There are a couple of pages in the Django documentation that you should at least scan before continuing, and will likely want to consult while reading: the how-to guide on custom template tags and filters, and the template API reference.

What our example tag will do

Let's write a tag that can make simple changes to its content, changing occurrences of one string to another. We'll call it replace, and usage might look like this:

{% replace old="dog" new="cat" %}
My dog is great.  I love dogs.
{% endreplace %}

which would end up rendered as My cat is great.  I love cats..

We'll also have an optional numeric argument to limit how many times we do the replacement:

{% replace 1 old="dog" new="cat" %}
My dog is great.  I love dogs.
{% endreplace %}

which we'll want to render as My cat is great. I love dogs..

Parsing the template

The first thing we'll write is the compilation function, which Django will call when it's parsing a template and comes across our tag. Conventionally, such functions are called do_<tagname>. We tell Django about our new tag by registering it:

from django import template

register = template.Library()

def do_replace(parser, token):
  pass

register.tag('replace', do_replace)

We'll be passed two arguments, parser which is the state of parsing of the template, and token which represents the most recently parsed token in the template - in our case, the contents of our opening template tag. For example, if a template contains {% replace 1 2 foo='bar' %}, then token will contain "replace 1 2 foo='bar'".

To parse that token, I ended up writing the following method as a general-purpose template tag argument parser:

from django.template.base import FilterExpression, kwarg_re

def parse_tag(token, parser):
    """
    Generic template tag parser.

    Returns a three-tuple: (tag_name, args, kwargs)

    tag_name is a string, the name of the tag.

    args is a list of FilterExpressions, from all the arguments that didn't look like kwargs,
    in the order they occurred, including any that were mingled amongst kwargs.

    kwargs is a dictionary mapping kwarg names to FilterExpressions, for all the arguments that
    looked like kwargs, including any that were mingled amongst args.

    (At rendering time, a FilterExpression f can be evaluated by calling f.resolve(context).)
    """
    # Split the tag content into words, respecting quoted strings.
    bits = token.split_contents()

    # Pull out the tag name.
    tag_name = bits.pop(0)

    # Parse the rest of the args, and build FilterExpressions from them so that
    # we can evaluate them later.
    args = []
    kwargs = {}
    for bit in bits:
        # Is this a kwarg or an arg?
        match = kwarg_re.match(bit)
        kwarg_format = match and match.group(1)
        if kwarg_format:
            key, value = match.groups()
            kwargs[key] = FilterExpression(value, parser)
        else:
            args.append(FilterExpression(bit, parser))

    return (tag_name, args, kwargs)

Let's work through what that does.

Calling split_contents() on the token is like calling .split(), but it's smart about quoted parameters and will keep them intact. We get back bits, a list of strings representing the parts of the template tag invocation, very much like sys.argv gives us for running a program, except that no quotation marks have been stripped away.

The first element in bits is our template tag name itself. We remove it because we don't really need it for parsing the arguments, but save it for generality.

Next we work through the arguments, using the same regular expression as Django's template library to decide which arguments are positional and which are keyword arguments.

The regular expression for keyword arguments also splits on the =, so we can extract the keyword and the value.

We'd like our argument values to support literal values, variables, and even applying filters. We can't actually evaluate our arguments yet, since we're just parsing the template and don't have any particular template context yet where we could look for things like variables. What we do instead is construct a FilterExpression for each one, which parses the syntax of the value, and uses the parser state to find any filters that are referred to.

When all that is done, this method returns a three-tuple: (<tagname>, <args>, <kwargs>).

Our replace tag has two required kwargs and an optional arg. We can check that now:

from django.template import TemplateSyntaxError

# ...

def do_replace(parser, token):
    tag_name, args, kwargs = parse_tag(token, parser)

    usage = '{{% {tag_name} [limit] old="fromstring" new="tostring" %}} ... {{% end{tag_name} %}}'.format(tag_name=tag_name)
    if len(args) > 1 or set(kwargs.keys()) != {'old', 'new'}:
        raise TemplateSyntaxError("Usage: %s" % usage)

Note again how we haven't hardcoded the tag name.

Let's pull our limit argument out of the args list:

if args:
    limit = args[0]
else:
    limit = FilterExpression('-1', parser)

If no limit was supplied, we default to -1, which will indicate later that there's no limit. We wrap it in a FilterExpression so we can just call limit.resolve(context) without having to check whether limit is a FilterExpression or not.

We can't check the values here. They might depend on the context, so we'll have to check them at rendering time.

This is all similar to what we might do if we were writing a non-block tag without using any of the helpful decorators that hide some of this detail. But now we need to deal with some unknown amount of template following our opening tag, up to our closing tag. We need to ask the template parser to process everything in the template until we get to our closing tag:

nodelist = parser.parse(('endreplace',))

We get back a NodeList object (django.template.NodeList), which represents a list of template "nodes" representing the parsed part of the template, up to but not including our end tag.

We tell the parser to just ignore our end tag, which is the next token:

parser.delete_first_token()

Now we're done parsing the part of the template from our opening tag to our closing tag. We have the arguments to our tag in limit and kwargs, and the parsed template between our tags in nodelist.

Django expects our function to return a new node object that stores that information for us to use later when the template is rendered. We haven't written the code for our node object yet, but here's how our parsing function will end:

return ReplaceNode(nodelist, limit=limit, old=kwargs['old'], new=kwargs['new'])

Reviewing what we've done so far

Each time Django comes across {% replace ... %} while parsing a template, it calls do_replace(). We parse all the text from {% replace ... %} to {% endreplace %} and store the result in an instance of ReplaceNode. Later, whenever Django renders the parsed template using a particular context, we'll be able to use that information to render this part of the template.
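
Assembled in one place, with its imports, the compilation function from the steps above looks like this (parse_tag is the helper defined earlier):

from django import template
from django.template import TemplateSyntaxError
from django.template.base import FilterExpression

register = template.Library()

def do_replace(parser, token):
    # Split the tag into its name, positional args, and keyword args.
    tag_name, args, kwargs = parse_tag(token, parser)

    # Enforce the expected syntax: an optional limit plus old=... and new=...
    usage = '{{% {tag_name} [limit] old="fromstring" new="tostring" %}} ... {{% end{tag_name} %}}'.format(tag_name=tag_name)
    if len(args) > 1 or set(kwargs.keys()) != {'old', 'new'}:
        raise TemplateSyntaxError("Usage: %s" % usage)

    # Default the limit to -1 ("no limit"), wrapped so we can resolve() it later.
    if args:
        limit = args[0]
    else:
        limit = FilterExpression('-1', parser)

    # Parse everything up to our closing tag, then discard the closing tag itself.
    nodelist = parser.parse(('endreplace',))
    parser.delete_first_token()

    return ReplaceNode(nodelist, limit=limit, old=kwargs['old'], new=kwargs['new'])

register.tag('replace', do_replace)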

The node

Let's start coding our template node. All we need it to do so far is to store the information we got from parsing part of the template:

from django import template

class ReplaceNode(template.Node):
    def __init__(self, nodelist, limit, old, new):
        self.nodelist = nodelist
        self.limit = limit
        self.old = old
        self.new = new

Rendering

As we've seen, the result of parsing a Django template is a NodeList containing a list of node objects. Whenever Django needs to render a template with a particular context, it calls each node object, passing the context, and asks the node object to render itself. It gets back some text from each node, concatenates all the returned pieces of text, and that's the result.

Our node needs its own render method to do this. We can start with a stub:

class ReplaceNode(template.Node):
  ...
  def render(self, context):
    ...
    return "result"

Now, let's look at those arguments again. We've mentioned that we couldn't validate their values before, because we wouldn't know them until we had a context to evaluate them in.

When we code this, we need to keep in mind Django's policy that in general, render() should fail silently. So we program defensively:

from django.utils.html import conditional_escape

class ReplaceNode(template.Node):
  ...
  def render(self, context):
      # Evaluate the arguments in the current context
      try:
          limit = int(self.limit.resolve(context))
      except (ValueError, TypeError):
          limit = -1

      from_string = self.old.resolve(context)
      to_string = conditional_escape(self.new.resolve(context))
      # Those should be checked for stringness. Left as an exercise.

Also note that we conditionally escape the replacement string. That might have come from user input, and can't be trusted to be blindly inserted into a web page due to the risk of Cross Site Scripting.

Now we'll render whatever was between our template tags, getting back a string:

content = self.nodelist.render(context)

Finally, do the replacement and return the result (mark_safe comes from django.utils.safestring):

content = mark_safe(content.replace(from_string, to_string, limit))
return content

We've escaped our own input, and the block contents we got from the template parser should already be escaped too, so we mark the result safe so it won't get double-escaped by accident later.

Conclusion

We've seen, step by step, how to build a custom Django template tag that accepts arguments and works on whole blocks of a template. This example does something pretty simple, but with this foundation, you can create tags that do anything you want with the contents of a block.

If you found this post useful, we have more posts about Django, Python, and many other interesting topics.

Caktus GroupCaktus Activities at PyCon 2017

It’s almost time for PyCon and the team here at Caktus is ready to meet other attendees. Where and how can you find us?

At the booth

Keep an eye out for the Caktus booth in space 232 at the conference expo from May 18-20. Members of our team look forward to welcoming you there, including Colin, David, Julie, and Whitney.

For those interested in learning how Caktus delivers web apps faster, we’ll have copies of our Shipping Faster white paper on hand for review. There will be a prize draw for a Polaroid Cube+ mini action camera, so stop by the booth and let us scan your badge to be entered. The winner will be announced on May 20. Follow Caktus on Twitter for live updates.

Caktus giveaway prizes for PyCon 2017

We’ll also be doing an early giveaway for wireless headphones on May 18 during the opening reception at the expo. Here’s how you can win:

  1. Take a picture of yourself at the Caktus booth
  2. Tweet it to us @CaktusGroup with hashtag #CaktusPyCon

The winner will be randomly selected and announced around 8pm the same evening.

In addition to the prizes we’ll have Caktus swag. Be among the first to get one of our special edition 10th anniversary water bottles!

Caktus swag for PyCon 2017

At the talks and events

Several of our development team are attending as well, so look for them in different talks and events. While some of them are repeat attendees, Dmitriy is looking forward to attending for the first time and listening to talks like Immutable Programming - Writing Functional Python.

Developer Erin is interested in listening to Cython as Secret Weapon for Efficiency. She’s also looking forward to volunteering as a TA at Young Coders: Outside In because of how much she’s enjoyed being a Django Girls coach, and is excited to share her enthusiasm for code with them.

Sarah went to PyCon last year and was very impressed by the inclusiveness of the community. She said, “Everyone is so willing to teach, help, and share with other attendees. I'm looking forward to the testing-centered talks.”

If you’re planning on heading out for the 5k fun run, look out for Mark. He’s also interested in several talks, including Library UX: Using abstraction towards friendlier APIs.

After hours

We’ll be hosting an after hours event on Friday, May 19. Join us for an exclusive happy hour gathering to enjoy some light refreshments. Stop by our booth to get on the invite list.

At the job fair

Caktus is hiring! We’re looking for a Django Web Developer, so stop by table 24 on Sunday, May 21 to talk to members of the team about what it’s like to work with Caktus. We have offices in Durham, NC and Baltimore, MD.

See you soon!

There’s less than a month to go and we can’t wait to meet you. Be sure to contact us to let us know that you’re interested in speaking with us about something in particular, and we’ll be sure to set up a time.

Tim HopperLike most great mathematicians, he expects universal precision

From the Autobiography of Benjamin Franklin:

Thomas Godfrey, a self-taught mathematician, great in his way, and afterward inventor of what is now called Hadley's Quadrant. But he knew little out of his way, and was not a pleasing companion; as, like most great mathematicians I have met with, he expected universal precision in every-thing said, or was for ever denying or distinguishing upon trifles, to the disturbance of all conversation.

I'm a recovering Godfrey Precisionist.

Philip SemanchukThanks, Science!

I took part in the Raleigh March for Science last Saturday. For the opportunity to learn about it, participate in it, photograph it, share it with you — oh, and also for, you know, being alive today — thanks, science!

Caktus GroupCalling all Cat Herders: New Meetup for Digital Project Managers

When I first became a digital project manager (DPM), I struggled to find relevant resources. A ton of information was available on traditional project management, but not much specifically on digital project management. Eventually, I connected with another DPM in my organization and we quickly became friends and confidants. She opened my eyes to the Digital PM Summit, a new conference targeted at DPMs, which was ultimately the inspiration for my new Meetup group.

I attended the Digital PM Summit three times (and even presented last year!). The conference opened up a whole new world to me, one full of practical resources and friendly, helpful contacts. Ever since I first attended the conference, I wanted to replicate some piece of it to connect with DPMs in the Triangle area. It took a few years, but the Triangle Digital Project Managers Meetup has finally come to fruition, thanks in large part to Caktus Group.

The new Meetup is focused on providing opportunities for DPMs in the Research Triangle Area (and beyond) to network, share knowledge, and support each other. No certifications are required to join, and it doesn’t matter what process (or lack thereof) you use -- Waterfall, Agile, Scrum, or your own Special Secret Sauce. Some of our meetings will be based on a professional topic, while other meetings will be more social. Our goal is to meet at least once every two months.

Project Management is a Team Effort

Over the past few years, I’ve found that many DPMs work solo, or have few cohorts within their organization. At Caktus, we have three full-time project managers and one full-time project management director. PMs at Caktus also serve in the more defined Scrum role of product owner. I’ve been a part of the Caktus team since September 2016, and it’s the first time I’ve been lucky enough to work with a team of DPMs.

When I worked alone, it was often intimidating and even frustrating, because I didn’t have anyone to bounce ideas off of and there was no PM precedent or process already in place. I felt like I was constantly reinventing the wheel. Working in a silo makes connecting with a group of similar professionals even more important, in order to share ideas, stay current, and grow your skills. For these reasons, my team supported my idea to create a Meetup and Caktus, which believes in supporting community involvement, provided me with the time and resources needed to do so.

The Triangle DPM kickoff meeting was held in the Caktus Tech Space at the end of February. The group was small, but passionate and experienced. One attendee even joined remotely via web conference. It was perfect, exactly the kind of inclusive, friendly group that I wanted to bring together. A group where even if you can’t attend in person, you can join remotely and still be home with your kids. A group where everyone is welcome, regardless of industry or job title -- and DPMs have a variety of titles like Digital Producer, Product Owner, or even Account Director! If you manage anything digital, from the Django web development that we do at Caktus to online marketing campaigns or even video games, you’re welcome to join the Meetup.

Come join fellow cat herders at a future meeting. Details will be posted on the Triangle DPM Meetup page and the Caktus events page.

Caktus GroupProduct Discovery Part 2: From User Contexts to Solutions

In the first installment of this two-part series, I introduced product discovery as the process of building a shared understanding about the product between stakeholders and the product team, which helps you make better decisions about what to build. I also suggested that we look at product discovery as a four-step process:

  1. Frame the problem
  2. Identify the users
  3. Map out user actions, tasks, and workflows
  4. Sketch out ideas

Having previously discussed how to frame the problem and identify the users, let’s move on to mapping out user tasks and workflows, and sketching out solutions.

Map out user tasks and workflows

User-centered software design and development arose from the recognition that we must account for human capabilities and characteristics when we build systems and technologies. That’s why so much emphasis is placed on understanding users through research and on empathizing with them by employing tools such as personas and proto-personas.

In order to understand the users, you identify their demographic, psychological, and behavioral characteristics, as well as their goals, needs, pain points, and possible solutions to their challenges. And to place that information in context, you build a narrative within which your users function as they use your product.

User task flowchart

At the very least, when building a product you will create a user flowchart to capture tasks the product should support, decisions the user will be making within the system, user inputs, and system outputs.

While a user flowchart is a useful and succinct way to diagram tasks that need to be supported by an application, there are other methods of capturing user actions that are more story-based and thereby help build a richer representation of user behaviors.

Agile user story mapping

Agile user story mapping is a visualization technique introduced by Jeff Patton that depicts a user’s path through an application. It can also be used to map out user workflows outside of the application.

In Agile software development, a user story is a brief description of a desired feature that is written from the perspective of an end-user, and that captures user outcomes that the feature is meant to support.

Mapping user stories is a group activity in which teams build a narrative about how users engage with software. Using sticky notes, stakeholders and product teams map out user workflows, tasks, task variations, and sub-tasks in chronological order from left to right, and in order of priority or detail from top to bottom.

The resulting user story map is an artifact that offers a quick look at the application’s big picture while preserving a level of detail that can be leveraged to create a backlog. It also shows feature prioritization and can assist in the estimation process.

A fragment of a user story map done at Caktus for an animal rescue.

User experience mapping

User experience mapping (or user journey mapping) is the process of capturing and communicating complex user interactions across various channels through which the user comes in contact with your product and/or company. It helps build an understanding of user actions, feelings, thoughts, pain and satisfaction points that go beyond the realm of the application itself. The resulting experience (or journey) map provides an omnichannel representation of user experience with touch points at which the user interacts with your product and opportunities to create new or better experiences.

A fragment of a user experience (journey) map created at Caktus for an animal rescue.

Narrative arc story mapping

I recently learned about yet another story mapping technique. While I have not had a chance to try it out in my workflow, I found it intriguing and thought it worth sharing.

This approach to story mapping, popularized by Donna Lichaw, relies on a narrative arc as a framework to develop three types of stories — the concept story (the big picture story), the origin story (how your product will be discovered by users), and the usage story (how people use your product). The concept and origin stories are perfect tools for discovering new products, while the usage story can be leveraged to understand the current use and engagement patterns of your product and to identify opportunities for improvements.

Sketch out solutions

By this time in the process, you should know what problem you’re solving and for whom you’re solving it. You should also have a pretty good idea of what workflows and experiences your application should support. Now it’s time to start discussing specific solutions.

Whether in a session involving stakeholders or with your internal team, you can conduct activities that will help you hone in on the right idea. Here are a few suggestions:

  • Ideation: Work individually to produce as many ideas as possible (either by sketching or describing them in writing), then as a team select a small list of solutions that best address the problem at hand.
  • Brainstorming: Work as a team to generate a wealth of ideas by finding inspiration in each other’s concepts, then work together at identifying the best ones.
  • Prototyping: Build paper prototypes, sketch wireframes (on paper or digitally), or code a simple interface to start validating your ideas. You can test your prototypes with team members or recruit a few users and conduct lightweight usability testing.

Get the job done

Each project is unique and product discovery should be tailored to the needs of each project. Whether you develop personas or proto-personas, draw user flowcharts, map user stories, or create an omnichannel user experience map depends on the product you’re building, the resources you have, and the project management paradigm your team follows.

For Agile teams, the lean proto-persona strategy combined with small scale user research and agile story mapping can build a strong foundation of product discovery. But Agile teams will also benefit from the omnichannel perspective of a user journey map that places the product in the context of a broader ecosystem.

At Caktus we often kick off a project with a discovery workshop. The workshop is an opportunity for our team and the client team to get together and build a shared understanding of the product to be built. Working off of existing data or making assumptions where data are lacking, we frame the problem, identify user types, and build personas or proto-personas. In the process, we also identify knowledge gaps and may recommend small scale user research as appropriate. On projects where the problem is well framed and users understood, we work with stakeholders to map out user actions, tasks, and workflows using a technique that best fits the needs and the budget of a given project.

We come out of a workshop with a summary of product goals, identified target user types, and a list of the most valuable content or most valuable product features. If the workshop includes Agile user story mapping, the added benefit is an artifact that can easily be translated into a prioritized backlog. If, during the workshop, we map users’ entire and omnichannel experience, we gain a breadth of understanding of the user journey that goes beyond the application itself, and can support the development of current and future projects.

By establishing in this way a solid, shared understanding of stakeholders’ and users’ needs before any code is committed, we increase our chances of making the right decisions about what to build. And by doing so, we reduce the long-term cost because we reduce risks and decrease the need for rework down the road.

Don't forget to read part 1 of this blog post to learn about how to get started with product discovery.

Caktus GroupProduct Discovery Part 1: Getting Started

When setting out to build a new website or web application, it is a good idea to build a shared understanding of the product between stakeholders and the product team. Through research and collaborative activities that aim to answer questions about the product, its goals, and its users’ needs, the stakeholders and product team discover the full breadth and depth of the application to be built, as well as contexts and implications that need to be considered for the product to be successful. We call this process product discovery.

A study conducted by the Institute of Electrical and Electronic Engineers (IEEE) found that software development projects fail when they do not address stakeholders’ needs adequately. It has also been shown that 50% of programmers’ time is spent on avoidable rework. By devoting resources upfront to build a solid, shared understanding of project goals, users, and user contexts, you can ensure that you will be building the right solution and minimizing waste.

Product discovery can be approached as a four-step process:

  1. Frame the problem
  2. Identify the users
  3. Map out user actions, tasks, and workflows
  4. Sketch out ideas

Steps 1 and 2, framing the problem and identifying the users, get you started with understanding the ramifications of your product. Step 3, mapping out user tasks and workflows, is a way to define user contexts and begin exploring solutions. Finally step 4, sketching out ideas, is a step toward articulating a solution.

In this article, I will focus on steps 1 and 2. Steps 3 and 4 will be covered in the second installment of this two-part series.

Frame the problem

When framing the problem(s), you are striving to answer the following questions:

  • What problem(s) am I trying to solve?
  • For whom am I solving this problem?
  • Why am I solving this problem?
  • What does success mean and how can I measure it?
  • What constraints do I need to accommodate?

Answers to these questions may be drawn from your business analytics and existing user or customer research. Data that inform your answers may come from:

  • Competitive audit
  • SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis
  • User or customer interviews or surveys that reveal pain points, needs, and goals
  • Existing use and engagement patterns
  • Points of drop-off or failure, etc.

If data in those areas are lacking, you may start out by making assumptions and stating hypotheses that you will later put to test.

Identify the users

You can identify your users by asking questions such as:

  • What are the demographic, psychological, and behavioral characteristics of the users?
  • What are users’ goals, needs, and pain points?
  • What user outcomes do I need to support?
  • What are the workflows my users employ?
  • How do users interface with my product?
  • How do users leverage technology in their life and/or work?
  • What types of solutions would best serve the users?
  • Other questions about your users’ lives and work, and their interactions with products similar to yours.

You can gain answers to these questions by conducting user research including:

  • Usability testing (observing people using an existing product or a competitor’s product)
  • User interviews (talking to users directly about their workflows, goals, needs, and pain points)
  • User surveys (having users answer questions, usually online)
  • Contextual inquiry (observing users in the context in which they use or would use your product)

Armed with the data, you can then develop user profiles called personas. Personas are tools that allow you to consolidate information about your users into a succinct format, and, perhaps more importantly, give your users a human face. Personas are documented user profiles. But they are also a device that helps you identify with your users and develop empathy for them. It is particularly true in the case of so-called proto-personas — user profiles not based on actual data, but rather on assumptions and guesses you make about your users.

A sample proto-persona created at Caktus for an animal rescue project.

Personas (or proto-personas) include information grouped into categories, and there are multiple suggestions about good categories to use.

In Lean UX, Jeff Gothelf and Josh Seiden recommend grouping information about users into:

  • Sketch and name
  • Behavioral and demographic information
  • Pain points and needs
  • Potential solutions

Ladies that UX suggest the following information categories to build a persona:

  • Bio and demographics
  • Emotions and behaviors
  • Goals
  • Solutions

In User Story Mapping, Jeff Patton shares a persona template that includes:

  • User type and role
  • Name and sketch
  • Context
  • About
  • Implications

Next, strive for deeper understanding and explore solutions

Once you’ve gained an understanding of the problem you are solving and the characteristics of the users, you’re ready to dive deeper into user contexts and to start considering solutions. In the next blog post, I will discuss techniques that can be leveraged to explore user contexts and ways to start identifying solutions. Stay tuned!

This blog post continues with Part 2: From User Contexts to Solutions, to be released later this week.

Tim HopperMetawork is more interesting than work

This Software Engineering Radio interview with Neal Ford on Success Skills for Architects is full of gems about building effective software.

He talks a lot about how coders love to solve problems, and that love can lead them to invent interesting, but unnecessary, problems to solve. This is true.

Metawork is more interesting than work. It's so hard to get back to simplicity, because we love complicated little puzzles to solve, so we keep overengineering everything.

Anyone who's developing software would benefit from listening.

Tim HopperTowards Reducing Distractions while Working

Staying focused while working in front of a computer and within reach of a smartphone is hard.

In 2017, teaching people to focus is becoming an industry.

I've been trying to rethink distractions in my own life, particularly in my work environment. Here are some things that have helped:

Working from Home

Working in an office, especially an open floor plan office, is disastrous for staying focused. DeMarco and Lister wrote about this in Peopleware 30 years ago, and yet open offices are the norm for startups today.

I'm much more productive by working from home in my quiet office or on my back patio. I'm finally able to spend my time thinking about hard problems rather than ways of silencing Constant Throat Clearer or Perpetual Annoying Laugher.

Notifications

Every app and website these days wants to send you notifications. I'm aggressive about reducing notifications down to those that I need to see, and I let almost nothing notify me with sound. I use Do Not Disturb mode on my phone and Mac whenever I need to stop notifications altogether.

Slack

Slack has become the new normal for company communication. Some would say Slack itself is ruining our focus, but having it regularly available has been essential for my own work.

I've come up with a few ways to take control of Slack:

  1. Only show "My unread, along with everything I've starred" in the sidebar. See Michael Lopp's excellent post on Slack for more here.
  2. Enable notifications selectively.
  3. Sign out of distracting avocational Slacks.

Social Media

I've started using an app called Focus to block distracting websites (including Facebook and Twitter.com) and apps on my work computer from 9 AM to 5:30 PM. I use Focus's scheduling feature so blocking isn't optional for me.

I've decided not to block Tweetbot. Though it can be distracting, Twitter is an invaluable way for me to learn from my professional colleagues, bounce ideas off of them, and have a good laugh.

On my iPhone, iPad, and personal laptop, I've started using Freedom to block all social media during the day. This has stopped me from instinctively checking Instagram every time I walk to the bathroom or get stuck on a hard problem. I highly recommend it.1

I also use Freedom to block social media for the first hour I'm up in the morning and before I go to bed.

Email

I have two main tactics to keep email from being distracting.

  • I aggressively unsubscribe from mailing lists and ads.
  • I use Sanebox to filter low priority messages out of my inbox.

When emails only need a brief reply, I tend to write responses as soon as possible. At the moment, I'm trying to break people of the expectation that I'll respond quickly. Using services like Boomerang which lets me write emails now and have them sent later helps here.

Reading

Long-form reading at the computer is terrible for comprehension. As Doug Lemov has argued, you have to get away from your computer and other devices to read deeply. I do this by printing articles or reading on my iPad with Freedom blocking enabled. I take my printouts or iPad and walk away from my desk to read.

Todo Items

I'm a firm believer in the Getting Things Done principle of reducing the cognitive overhead of tracking to-do items in my head. I use Omnifocus for task management. Mail Drop and this Alfred workflow help me to quickly add tasks to my Omnifocus inbox. When I think of something I need to take care of outside of work, I drop that thought into Omnifocus; this keeps those personal to-do items from distracting me while I'm working.


Staying focused is hard. I'm still learning how to do it well, and I'm sure I'm not the only one struggling to improve here. If you have any tips to share, I'd love to hear them!


  1. I can't use Freedom on my work computer, because it acts as a VPN which conflicts with my work VPN. 

Philip SemanchukHow to Measure Anglo-Saxonicity – With a Ruler or Yardstick?

Summary (Nutshell)

This is a first look at a work in progress. I’m using Python to study text from an etymological perspective. Specifically, I’m measuring how many words in a given English language text have Anglo-Saxon origin. Many people (including myself) think that Anglo-Saxon words convey a different sense than their counterparts of French/Latin origin. To demonstrate the point in a small way, I’ve included a Latin and Anglo-Saxon version of each heading in this blog post.

Background (Milieu)

English is a Germanic language with Scandinavian influence, with a big layer of Old French poured on top. That Old French (Anglo Norman French, to be specific) was principally derived from Latin, so English is a hybrid between two major Indo-European language groups. Those mongrel origins are a big part of why English is messy and rich.

French was introduced to English as the language of conquerors and nobility. French was also the language of some European royalty in the 18th and 19th century, further adding to its reputation as a language associated with high status. Even today, English words with French origins often have higher cultural status than their counterparts with Anglo-Saxon origins (think cuisine versus cooking, illumination versus light, create versus make, and escargot versus snail). By contrast, the Anglo-Saxon words are often considered more visceral (think sea versus ocean, sweat versus perspire, and free versus emancipated — more on that last pair in a moment).

For instance, when taunting someone, you reach for blunt Anglo-Saxon words. “Your mother was a hamster, and your father smelled of elderberries!” is 100% Anglo-Saxon, except for “elderberries” which was coined in Middle English from “elder” and “berry”, both of Anglo-Saxon origin.

A still of the French taunting King Arthur from Monty Python and the Holy Grail: William of Normandy in 1067, addressing his English subjects.

Legal documents and government issuances, on the other hand, tend to include more words of Latin/French origin. It’s no coincidence that the Latin/French words “Emancipation Proclamation” describe a legal act, but if you want to stir the heart about emancipation, you say something like “Free at last!”(1)  which is all Anglo-Saxon.

Others have written more eloquently than I about how word origin influences tone (Annalisa Quinn at NPR, Gemma Varnom, and M. Birch, to suggest a few), so I won't belabor the point more than I already have. But I wanted to talk about how it inspired the project I've been working on.

The Project (The Work)

I should preface this by saying that I Am Not A Linguist, and I don’t even play one on TV.

I thought it would be interesting to perform lexicographical analysis of text from an etymological perspective. My etymological categorization is necessarily simple. When I look at a text, I put each word into one of three etymological categories: Anglo-Saxon, non-Anglo-Saxon, or unknown. From this rough grouping I generate statistics that allow me to compare one text to another.
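
The tool and its etymology database aren't public yet (more on that at the end of this post), but the core loop is easy to sketch: stem each word, then look the stem up in a mapping from stems to origins. In this toy version the tiny ETYMOLOGY dictionary stands in for the real database, and NLTK's punkt tokenizer data is assumed to be installed:

from collections import Counter

from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

# A stand-in for the real etymology database, keyed by stem.
ETYMOLOGY = {
    'dog': 'anglo-saxon',
    'great': 'anglo-saxon',
    'love': 'anglo-saxon',
}

stemmer = SnowballStemmer('english')

def categorize(text):
    counts = Counter()
    for word in word_tokenize(text.lower()):
        if not word.isalpha():
            continue  # skip punctuation tokens
        stem = stemmer.stem(word)
        counts[ETYMOLOGY.get(stem, 'unknown')] += 1
    return counts

print(categorize("My dog is great.  I love dogs."))
# Counter({'anglo-saxon': 4, 'unknown': 3})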

For instance, does one author consistently use more Anglo-Saxon words than other authors? Does an author’s usage of Anglo-Saxon words change from one work to another? Also of interest to me is the etymology of words as the book progresses from front to back. Do the relative frequencies of etymologies change as the book progresses towards its exciting conclusion? For authors writing in English as a second language, is their word selection influenced by their first language?

All of the questions above can be explored with the tool I’ve written. It’s easier to show the tool’s output than describe it, so here’s an analysis of Lewis Carroll’s 1865 work “Alice’s Adventures in Wonderland”.

The graph below shows the relative frequency of the three etymological categories as the book progresses from beginning to end.

A graphical representation of how the etymological ratio of Alice in Wonderland changes as one progresses through the book

This graph shows the relative frequency of the three etymological categories as counting statistics for various part-of-speech categories.

A graphical representation of the counts of words by parts of speech and etymology in Alice in Wonderland

The table below is a more detailed version of the chart immediately above. Some percentages may not add up to 100% due to rounding.

             Total   %age of All Words   Anglo-Saxon    non-Anglo-Saxon   Unknown
All Words    26624   100%                18233 (68%)    3812 (14%)        4579 (17%)
Unique       3528    13%                 1354 (38%)     899 (25%)         1275 (36%)
Nouns        8522    32%                 4521 (53%)     2354 (27%)        1647 (19%)
Verbs        5479    20%                 2994 (54%)     565 (10%)         1920 (35%)
Adjectives   1639    6%                  896 (54%)      375 (22%)         368 (22%)
Adverbs      1974    7%                  1348 (68%)     420 (21%)         206 (10%)
Other        9010    33%                 8474 (94%)     98 (1%)           438 (4%)

Observations (What I See)

There’s some minor observations to be made here, but the strength of this tool will be in comparative analysis. It’s hard to draw conclusions from one analysis before I have an idea of what’s typical.

For instance, at first glance, the ratio of Anglo-Saxon to non-Anglo-Saxon words looks dramatic, but this says more about English than it does about Carroll. The most common words in English are overwhelmingly Anglo-Saxon in origin. (2)  For the small sample size of works I’ve processed so far (just 8 in total), I can see that it’s common for roughly three quarters of the words to be Anglo-Saxon. Alice in Wonderland isn’t an outlier by that standard.

We can also see that the frequency of Anglo-Saxon words decreases slightly throughout the book. This is the kind of trend that I find interesting, but in this case it’s due to an increase in the number of words of unknown etymology. Sometimes a word’s etymology is truly unknown. More often, though, the etymology is classified as unknown for other reasons. Most likely, it’s simply not in my etymological database (which isn’t very complete yet). Also, the word could be a proper noun, an invented word (like “woodshadows” from James Joyce’s Ulysses), or a word for which the etymology is ambiguous. An example of this last category is “bank” which is Anglo-Saxon in origin when referring to the side of a river, but French/Italian in origin when referring to a place that handles money.

At present, the quantity of words classified as “unknown” is too large for my tastes, and I plan to reduce it by improving both my database and the tool.

Verbs are overrepresented in the “unknown” category. My guess is that this is an artifact of my stemmer having difficulty stemming verbs. (I’m currently using the Snowball Stemmer from NLTK.)

As you can see, at this point it’s easier to draw conclusions about the representation of the data than it is about the data themselves. That leads me to the next (and final) topic in this post.

Future (What’s to Come)

As I said in the introduction, this is an early look at a work in progress. Here’s some of the things I’d like to add –

  • Better etymological data
  • Large scale comparisons of text to look for trends (across authors, genres, etc.)
  • More numeric (rather than visual) descriptions of the data to facilitate automated comparison. One idea is to add the mean and standard deviation of the percentage of Anglo-Saxon words.
  • Open sourcing

If you have any suggestions on how to use this tool or make it more interesting, I’d love to hear them in the comments below. I moderate all comments to filter spam which is yet another Viking influence on England.

Endnotes

Like English itself, “Endnote” is an etymological hybrid. “End” is of Anglo-Saxon origin, while  “note” comes from Old French/Latin.

1. Martin Luther King, Jr. isn’t the only person to have said “Free at last!”, but his use of it is perhaps the most famous. His “I Have a Dream” speech makes brilliant use of etymological contrasts. Many of his memorable phrases in that speech (“I have a dream today”, “Let freedom ring”, “Free at last”) are Anglo-Saxon.

2. In 2014 I pulled from Wikipedia a list of the 100 most common English words. At the time, it contained just four non-Anglo-Saxon words. They were “just” (ME < Latin), “people” (ME < Anglo-French < Latin), “use” (ME < Old French, replaced OE brucan, cognate w/modern Swedish bruk-), and “because” (ME < Fr ‘par cause’). There are lots of ways to count the 100 most common words, and doubtless the list would have been different in Carroll’s day. But my guess is that the presence of Anglo-Saxon hasn’t changed dramatically from that 96% regardless of when and how one counts.

Caktus GroupLearning to ask the right questions, or people

I ask a lot of questions as a developer. Some of them have been more basic, like ‘How do I import a Python function from one file into another?’, and some more complex, like ‘How should we take an API request and return a dynamically-generated PDF as a response?’

As I have continued to learn, a couple things have been particularly beneficial:

  1. Learning to Google the question to find the answer
  2. Finding more advanced developers to answer my questions and guide my thinking

As I have grown as a developer, I have improved at knowing when to use each resource, and each remains an important part of my growth. I’ve gathered some points here that have been helpful to me during my learning, as well as some suggestions on how to help others become sharper developers.

At the beginning:

When learning to develop we oftentimes have direct questions that can be answered by another person, or by searching the internet. I remember having questions such as ‘What is a terminal and a shell?’ or ‘How do I know if something is a Python file?’. An experienced developer can answer these questions quickly, but can also point a beginner in the direction of how to find these answers on the internet. Some important things I learned at this point:

  • The answer to my question is most likely on the internet (Stack Overflow, Stack Exchange, etc.) and I should get better at finding it
  • Asking a more experienced developer may be faster, but figuring out the answer on my own can be more useful for knowing how to find answers successfully in the future
  • It is helpful to have a person help me think through the implications of my questions
  • Thinking critically through my question and what I’m trying to solve is a lot more important than the specific answer

For people new to development, I recommend trying to Google your questions first. It may take some time to figure out how to look things up on Google or Stack Overflow, but these are useful skills that even experienced developers use every day. I also recommend finding an experienced programmer to provide any further clarification and direction - look out for meetups, or, if possible, ask a friend. For some Python meetups around Durham, NC, see TriPython.

For experienced programmers fielding such questions, remember that it’s far more valuable to help the asker get better at thinking through issues than to simply hand over a quick answer.

With some experience:

As we gain development experience, we become better able to answer our own questions by doing internet searches, but we also encounter more complex questions, like ‘What happened to cause this timeout error?’ or ‘Is it possible to build an app that does...?’. Again, it’s important to do Google searches to see if other people have asked similar questions, or to ask these questions to more advanced developers. Some things I learned at this point:

  • Other people have probably attempted to use this feature, and either written about it, or documented it
  • There are multiple ways to accomplish what I am trying to do
  • Some features/libraries are better than others
  • It’s always helpful to ask myself, ‘What am I trying to solve here?’
  • More experienced developers oftentimes know how to solve my issue better, so I should ask their opinion

For people asking these questions, I recommend not only searching the internet to find what other people have asked or answered about specific features or libraries, but also asking questions about what a particular feature or library means for the project: What do we want to accomplish with this feature or library? How would this feature affect the user?

The answers to these questions can be beneficial in framing questions about the feature or library to more experienced developers.

With more experience:

Greater experience means that we can usually answer our more basic questions on our own. However, it also leads to different questions, oftentimes changing ‘Can I’ and ‘How does one’ questions into ‘Should I’ and ‘Why would one’ questions. An example may be considering a library and researching its issues, maintenance, and other people’s experience with it, or considering a feature for a project and whether it will lead to our own maintenance issues. Generally, the questions we ask can aid in deciding between the tradeoffs of different options, and this is something that a more experienced developer can help with.

Some things I learned at this point:

  • It is almost always possible to build a feature to do what we want
  • Every feature has tradeoffs
  • I usually haven’t thought of all the tradeoffs or the historical reasons for using certain libraries and not others, but a more experienced developer may have

I encourage people with more complex development questions to come up with some options for solving an issue or adding a feature. Use a Google search or another developer’s experience to research these, and then ask a more experienced developer what the implications of each option might be for the project.

A caveat:

When searching online, it is always possible to find outdated answers, especially for rapidly-changing topics like JavaScript, or when researching new libraries. While someone’s answer may have worked 5 years ago, the library may have changed and no longer support the previous method, or the community may have moved on to different development guidelines or frameworks. To combat this, I recommend limiting Google searches to just the past year or two, or checking whether older answers have newer comments with more accurate information. Another possibility is finding options on the internet and asking a more experienced developer which one is the better solution.

Further thoughts:

As I have continued learning as a developer, I have become better at knowing what to search for on the internet, and also at asking questions of more experienced developers. I still have lots of basic questions, and now also some more complex ones. While practice makes us better at searching for answers on the internet, I am particularly grateful to be able to ask questions of more experienced developers and to allow them to guide my thinking about certain topics. In fact, I have really relied on the expertise of more experienced developers to guide the way that I approach technical issues and figure out the best way to resolve them. I encourage any tech firm to support these kinds of interactions between developers, whether in a formal way (mentorship meetings, hosting meetups, etc.) or informally. I know it has made my time at Caktus both more enjoyable and more efficient.

Becoming a better developer is an ongoing process. Check out how to plan for mistakes as a developer to continue the journey.

Caktus GroupDigging Into Django QuerySets

Object-relational mappers (or ORMs for short), such as the one built into Django, make it easy for even new developers to become productive without needing deep knowledge of relational databases. They abstract away the details of database access, replacing tables with declarative model classes and queries with chains of method calls. Since this is all done in standard Python, developers can build on top of it further, adding instance methods to a model to wrap reusable pieces of logic. However, the abstraction provided by ORMs is not perfect. There are pitfalls lurking for unwary developers, such as the N + 1 problem. On the bright side, it is not difficult to explore and gain a better understanding of Django's ORM, and taking the time and effort to do so will help you become a better Django developer.

In this article I'll be setting up a simple example app, consisting of nothing more than a few models, and then making use of the Django shell to perform various queries and examine the results. You don't have to follow along, but it is recommended that you do so.

First, create a clean virtualenv. Here I'll be using Python 3 all the way, but there should be little difference with Python 2.

$ mkvirtualenv -p $(which python3) querysets
Already using interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/jrb/.virtualenvs/querysets/bin/python3
Also creating executable in /home/jrb/.virtualenvs/querysets/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.

Next, install Django and IPython,

(querysets) $ pip install django ipython

Create the new project.

(querysets) $ django-admin.py startproject querysets
(querysets) $ cd querysets/
(querysets) $ ./manage.py startapp qs

Update querysets/settings.py to add 'qs' to the end of the INSTALLED_APPS list. Then, edit qs/models.py to add the simple models we will be dealing with:

from django.db import models


class OnlyOne(models.Model):
    name = models.CharField(max_length=16)


class MainModel(models.Model):
    name = models.CharField(max_length=16)
    # Many MainModels can share a single OnlyOne.
    # (From Django 2.0 onward, ForeignKey also requires an on_delete argument.)
    one = models.ForeignKey(OnlyOne)


class RelatedModel(models.Model):
    name = models.CharField(max_length=16)
    # related_name='many' gives MainModel a reverse accessor: main_instance.many
    main = models.ForeignKey(MainModel, related_name='many')

Finally, set up the database.

(querysets) jrb@caktus025:~/caktus/querysets$ ./manage.py makemigrations qs
Migrations for 'qs':
  qs/migrations/0001_initial.py:
    - Create model MainModel
    - Create model OnlyOne
    - Create model RelatedModel
    - Add field one to mainmodel
(querysets) jrb@caktus025:~/caktus/querysets$ ./manage.py migrate
...

Running python manage.py shell should now pull up an IPython session.

Now that we have a working project set up, we'll need some means of keeping track of the quantity and the raw SQL of any queries sent to the database. Django's TransactionTestCase class provides an assertNumQueries method, which is interesting but too specific and too tied to the test suite for our needs. However, examining its implementation, we can see that it ultimately makes use of a context manager called CaptureQueriesContext, from the django.test.utils module. This context manager causes a database connection to capture all of the SQL queries sent, even if query logging is otherwise turned off (i.e., when DEBUG = False), and makes those queries available on the context object. I find it a useful debugging tool for tracking down code that issues too many queries to the database, in situations where Django Debug Toolbar won't help.

At the time of writing, the most recent released version of Django is 1.10.6. I've copied the code for CaptureQueriesContext for this version below, with a few irrelevancies redacted.

class CaptureQueriesContext(object):
    def __init__(self, connection):
        self.connection = connection

    @property
    def captured_queries(self):
        return self.connection.queries[self.initial_queries:self.final_queries]

    def __enter__(self):
        self.force_debug_cursor = self.connection.force_debug_cursor
        self.connection.force_debug_cursor = True
        self.initial_queries = len(self.connection.queries_log)
        self.final_queries = None
        request_started.disconnect(reset_queries)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.connection.force_debug_cursor = self.force_debug_cursor
        request_started.connect(reset_queries)
        if exc_type is not None:
            return
        self.final_queries = len(self.connection.queries_log)

So here we can see several things of interest to us. The context manager keeps a reference to the database connection (as self.connection), it sets and then unsets a flag on the connection (self.connection.force_debug_cursor) which tells the connection to do the captures, it stores the number of queries at the start and at the end (self.initial_queries and self.final_queries), and finally, it provides a slice of the actual queries captured as the property captured_queries. Nothing here restricts its use to the test suite, so we'll be making use of it throughout in our IPython session.

Let's try it out now.

In [1]: from django.test.utils import CaptureQueriesContext

In [2]: from django.db import connection

In [3]: from qs import models

In [4]: with CaptureQueriesContext(connection) as context:
   ...:     print(models.MainModel.objects.all())
   ...:
<QuerySet []>

In [5]: print(context.initial_queries, context.final_queries)
0 1

So we can see that there were no queries to start out with, and that a query was issued to the database by our code. Let's see what that looks like,

In [6]: print(context.captured_queries)
[{'time': '0.001', 'sql': 'SELECT "qs_mainmodel"."id", "qs_mainmodel"."name", "q
s_mainmodel"."one_id" FROM "qs_mainmodel" LIMIT 21'}]

This shows us that the captured_queries property gives us a list of dicts, and each dict contains the raw SQL and the time it took to execute. In the above query, note the LIMIT 21. This is there because the repr() of a QuerySet limits itself to showing no more than 20 of the items it contains. The additional twenty-first item is captured so that it knows whether or not to add an ellipsis at the end to indicate that there are more items available.

Let's create some data. First up, we need a quick and dirty way of populating the name fields

In [7]: import random

In [8]: import string

In [9]: def random_name():
   ...:     return ''.join(random.choice(string.ascii_letters) for i in range(16
   ...: ))
   ...:

In [10]: random_name()
Out[10]: 'nRtybzKaSZWjHOBZ'

Now the objects

In [11]: with CaptureQueriesContext(connection) as context:
    ...:     models.OnlyOne.objects.bulk_create([
    ...:         models.OnlyOne(name=random_name())
    ...:         for i in range(5)
    ...:     ])
    ...:     models.MainModel.objects.bulk_create([
    ...:         models.MainModel(name=random_name(), one_id=i + 1)
    ...:         for i in range(5)
    ...:     ])
    ...:     models.RelatedModel.objects.bulk_create([
    ...:         models.RelatedModel(name=random_name(), main_id=i + 1)
    ...:         for i in range(5)
    ...:         for x in range(7)
    ...:     ])
    ...:

In [12]: print(context.final_queries - context.initial_queries)
6

In [13]: print(context.captured_queries)
[{'sql': 'BEGIN', 'time': '0.000'}, {'sql': 'INSERT INTO "qs_onlyone" ("name") S
ELECT \'TSdxoYKuGUnijmVY\' UNION ALL SELECT \'DNcSpIJbnVXjbabq\' UNION ALL SELEC
T \'suAQQAzEflBqLxuc\' UNION ALL SELECT \'hWtuPkfjdATxZhNV\' UNION ALL SELECT \'
GTwPkXTUSpZYBWCT\'', 'time': '0.000'}, {'sql': 'BEGIN', 'time': '0.000'}, {'sql'
: 'INSERT INTO "qs_mainmodel" ("name", "one_id") SELECT \'fsekHOfSJxdiGiqp\', 1
UNION ALL SELECT \'dHdPoqeKZzRCJEql\', 2 UNION ALL SELECT \'MiDwEPvqIuxCEArT\',
3 UNION ALL SELECT \'yCCRaPiLzWnnUewS\', 4 UNION ALL SELECT \'ftfQWWfuZhXblNlF\'
, 5', 'time': '0.000'}, {'sql': 'BEGIN', 'time': '0.000'}, {'sql': 'INSERT INTO
"qs_relatedmodel" ("name", "main_id") SELECT \'tMOCzPRjKZHbwBLb\', 1 UNION ALL S
ELECT \'SLxCmCtmxeCwpkAC\', 1 UNION ALL SELECT \'qccDQKsgmTFCIMsF\', 1 UNION ALL
 SELECT \'LWAvYdGQvBdlsYjI\', 1 UNION ALL SELECT \'gLjaGTjoLoNMkbDl\', 1 UNION A
LL SELECT \'PqjvVLhCFoMOlVfH\', 1 UNION ALL SELECT \'BpHnEzhucSZNryWs\', 1 UNION
 ALL SELECT \'CYOkzJgsrGYOLOoB\', 2 UNION ALL SELECT \'HheikdsLaQnWpZBj\', 2 UNI
ON ALL SELECT \'mXqccnNYLDrQOoiT\', 2 UNION ALL SELECT \'BipJWoDVlJoyNPxD\', 2 U
NION ALL SELECT \'BgvvYHlAHegyRjbF\', 2 UNION ALL SELECT \'GrlOlbMwnqfkPKZX\', 2
 UNION ALL SELECT \'OFJchZLVjmXNAHjO\', 2 UNION ALL SELECT \'SYHRkSvBmupzUHXO\',
 3 UNION ALL SELECT \'imAQEQUyrNoRNRSG\', 3 UNION ALL SELECT \'ZEvmnPMurchiLfcd\
', 3 UNION ALL SELECT \'kYtKQNoeoUuxpYPC\', 3 UNION ALL SELECT \'FvGFRSMariUanWs
L\', 3 UNION ALL SELECT \'VKXEeClDnrnruAng\', 3 UNION ALL SELECT \'eDnEaWAqWRWdC
vMc\', 3 UNION ALL SELECT \'wmIKiiqHBAJiOkMb\', 4 UNION ALL SELECT \'pzEMvmVqbSk
LICVO\', 4 UNION ALL SELECT \'dIclLsVIXaHIyUYk\', 4 UNION ALL SELECT \'nDyHLSYgB
AYIZAkP\', 4 UNION ALL SELECT \'GfrOYcPPYRXMBvmC\', 4 UNION ALL SELECT \'PZUiAwe
kQlmIMJAW\', 4 UNION ALL SELECT \'jnWbngcVgVPFAJNn\', 4 UNION ALL SELECT \'RQQyr
DQpTIPxItND\', 5 UNION ALL SELECT \'SaFNLtavfdceqzTE\', 5 UNION ALL SELECT \'CSm
oYuPNttJTFdlH\', 5 UNION ALL SELECT \'PxufMeDfIBeMAtQV\', 5 UNION ALL SELECT \'m
NaTQepfHkFMRFet\', 5 UNION ALL SELECT \'CHlOqOHXIDyzorfW\', 5 UNION ALL SELECT \
'BKgXGwdXJQBMQGJM\', 5', 'time': '0.000'}]

This looks pretty ugly, but we can see that each .bulk_create() results in two queries, a BEGIN starting the transaction, and an INSERT INTO with a crazy set of SELECT and UNION ALL clauses following it.

OK, now that we are finally all set up, let's explore. What happens if we just create a QuerySet and assign it to a variable?

In [14]: with CaptureQueriesContext(connection) as context:
    ...:     qs = models.MainModel.objects.all()
    ...:

In [15]: print(context.final_queries - context.initial_queries)
0

In [16]: print(context.captured_queries)
[]

No queries were sent to the database! This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed. Similarly, .filter(), .exclude(), and the other QuerySet-returning methods will not, by themselves, trigger a query sent to the database.

In [17]: with CaptureQueriesContext(connection) as context:
    ...:     qs = models.MainModel.objects.filter(name='foo')
    ...: print(context.final_queries - context.initial_queries)
    ...:
0

In [18]: with CaptureQueriesContext(connection) as context:
    ...:     qs = models.MainModel.objects.filter(name='foo')
    ...:     qs2 = qs.filter(name='bar')
    ...: print(context.final_queries - context.initial_queries)
    ...:
0

Here we see that even chaining a filtered QuerySet off of another QuerySet is insufficient to cause a database access. However, non-QuerySet-returning methods such as .count() will result in a query sent to the database.

In [19]: with CaptureQueriesContext(connection) as context:
    ...:     count = models.MainModel.objects.count()
    ...: print(context.final_queries - context.initial_queries)
    ...:
1

So, when will a QuerySet result in a round-trip to the database? Basically, this happens any time concrete results are needed from the QuerySet, such as when it is looped over, explicitly or implicitly. Here are some of the more typical cases

In [20]: with CaptureQueriesContext(connection) as context:
    ...:     for m in models.MainModel.objects.all():
    ...:         obj = m
    ...:     r = repr(models.OnlyOne.objects.all())
    ...:     l = len(models.RelatedModel.objects.all())
    ...:     list_main = list(models.MainModel.objects.all())
    ...:     b = bool(models.OnlyOne.objects.all())
    ...: print(context.final_queries - context.initial_queries)
    ...:
5

Note that each of these triggers its own query. The Django docs have a full list of the things that cause a QuerySet to trigger a query.

We've now seen that simply instantiating a QuerySet doesn't send a query to the database, and that obtaining data out of it does. The next most obvious question is, will a QuerySet ask for data from the database multiple times? Let's find out

In [21]: with CaptureQueriesContext(connection) as context:
    ...:     qs = models.MainModel.objects.all()
    ...:     L = list(qs)
    ...:     L2 = list(qs)
    ...: print(context.final_queries - context.initial_queries)
    ...:
1

Terrific! Just as we would hope, the QuerySet somehow reuses its previous data when we ask for it again. Keep in mind, though, that if we attempt to further refine a QuerySet,

In [22]: with CaptureQueriesContext(connection) as context:
    ...:     qs = models.MainModel.objects.all()
    ...:     L = list(qs)
    ...:     qs2 = qs.filter(name__startswith='b')
    ...:     L2 = list(qs2)
    ...: print(context.final_queries - context.initial_queries)
    ...:
2

it does not re-use the data. So how does this work? The implementation of QuerySet can be found in django.db.models.query, but in particular, let's look at the implementation of the relevant methods

def __iter__(self):
    self._fetch_all()
    return iter(self._result_cache)

def _fetch_all(self):
    if self._result_cache is None:
        self._result_cache = list(self.iterator())
    if self._prefetch_related_lookups and not self._prefetch_done:
        self._prefetch_related_objects()

def iterator(self):
    return iter(self._iterable_class(self))

So we can see that iterating over a QuerySet will check to see if a cache at ._result_cache is populated yet, and if not, populates it with a list of objects. This list, then, is what will be iterated over. Subsequent iterations will then get the cache, so no further queries are issued. Doing a chained .filter() call, though, results in a new QuerySet that does not share the cache of the previous one.
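
If you're curious, you can watch this cache fill up in the shell. Note that _result_cache is a private attribute, so this is for exploration only and may change between Django versions; the count of 5 assumes the objects we created earlier.

qs = models.MainModel.objects.all()
print(qs._result_cache)        # None: no query has been sent yet
L = list(qs)                   # forces the query and fills the cache
print(len(qs._result_cache))   # 5 with the data above; later iterations reuse this list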

The iterator() method used above is a documented public method, which returns an iterator over a configurable iterable class of model instances. Note that it does not involve the cache, so subsequent calls will result in a new query to the database. So why is this a public method? Under what circumstances would it be useful to not populate the cache? The iterator() method is most useful when you have memory concerns when iterating over a particularly large QuerySet, or one that has a large amount of data stored in the fields, especially if it is known that the QuerySet will only be used once and then thrown away.
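
For example, a one-shot pass over a large table might look like this sketch, where handle_row is a hypothetical per-row function:

# .iterator() streams results without building the result cache,
# so memory use stays low for a single pass over a big QuerySet.
for item in models.MainModel.objects.iterator():
    handle_row(item)  # hypothetical processing function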

Interestingly, certain non-QuerySet-returning methods such as .count(),

In [23]: with CaptureQueriesContext(connection) as context:
    ...:     qs = models.MainModel.objects.all()
    ...:     L = list(qs)
    ...:     c = qs.count()
    ...: print(context.final_queries - context.initial_queries)
    ...:
1

can also make use of an already filled cache.

A common pattern that you will see is iterating over a QuerySet in a template, and rendering information about each item, which may involve access of related objects. To simulate this, let's loop and set the name of each item's OnlyOne into a variable.

In [24]: with CaptureQueriesContext(connection) as context:
    ...:     for item in models.MainModel.objects.all():
    ...:         name = item.one.name
    ...: print(context.final_queries - context.initial_queries)
    ...:
6

Six queries! What could possibly be going on here?

In [25]: for q in context.captured_queries:
    ...:     print(q['sql'])
    ...:
SELECT "qs_mainmodel"."id", "qs_mainmodel"."name", "qs_mainmodel"."one_id" FROM "qs_mainmodel"
SELECT "qs_onlyone"."id", "qs_onlyone"."name" FROM "qs_onlyone" WHERE "qs_onlyone"."id" = 1
SELECT "qs_onlyone"."id", "qs_onlyone"."name" FROM "qs_onlyone" WHERE "qs_onlyone"."id" = 2
SELECT "qs_onlyone"."id", "qs_onlyone"."name" FROM "qs_onlyone" WHERE "qs_onlyone"."id" = 3
SELECT "qs_onlyone"."id", "qs_onlyone"."name" FROM "qs_onlyone" WHERE "qs_onlyone"."id" = 4
SELECT "qs_onlyone"."id", "qs_onlyone"."name" FROM "qs_onlyone" WHERE "qs_onlyone"."id" = 5

As we can see, we have one query which populates the main QuerySet, but then as each item gets processed, each sends an additional query to get the item's associated OnlyOne object. This is referred to as the N + 1 Problem. But how can we fix it? It turns out that Django comes with a QuerySet method for just this purpose: select_related(). If we adjust our code like this,

In [26]: with CaptureQueriesContext(connection) as context:
    ...:     for item in models.MainModel.objects.select_related('one').all():
    ...:         name = item.one.name
    ...: print(context.final_queries - context.initial_queries)
    ...:
1

we drop back down to only one query again

In [27]: for q in context.captured_queries:
    ...:     print(q['sql'])
    ...:
SELECT "qs_mainmodel"."id", "qs_mainmodel"."name", "qs_mainmodel"."one_id", "qs_
onlyone"."id", "qs_onlyone"."name" FROM "qs_mainmodel" INNER JOIN "qs_onlyone" O
N ("qs_mainmodel"."one_id" = "qs_onlyone"."id")

So .select_related('one') tells Django to do an INNER JOIN across the foreign key, and make use of that information when instantiating the objects in Python. Great! The select_related() method is capable of taking multiple arguments and will do a join for each of them. You can also join multiple tables deep by using Django's double-underscore syntax, for example .select_related('foo__bar') would join our main model's table with the table for 'foo', and then further join with the table for 'bar'. Note that other things that would cause a join in the sql, such as filtering on a field on the related object, will not cause that related object to be made available as a Python object; you still need to specify your .select_related() fields explicitly.
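
With the models from this article, a two-level version of that double-underscore syntax would look something like this sketch, following RelatedModel to MainModel to OnlyOne:

# One query with two INNER JOINs: qs_relatedmodel -> qs_mainmodel -> qs_onlyone
for related in models.RelatedModel.objects.select_related('main__one'):
    name = related.main.one.name  # no extra queries per row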

This all works if the model we are querying has a foreign key to the other model. What if the relationship runs the other way, resulting in a one-to-many relationship?

In [29]: with CaptureQueriesContext(connection) as context:
    ...:     for item in models.MainModel.objects.all():
    ...:         for related in item.many.all():
    ...:             name = related.name
    ...: print(context.final_queries - context.initial_queries)
    ...:
6

In [30]: for q in context.captured_queries:
    ...:     print(q['sql'])
    ...:
SELECT "qs_mainmodel"."id", "qs_mainmodel"."name", "qs_mainmodel"."one_id" FROM
"qs_mainmodel"
SELECT "qs_relatedmodel"."id", "qs_relatedmodel"."name", "qs_relatedmodel"."main
_id" FROM "qs_relatedmodel" WHERE "qs_relatedmodel"."main_id" = 1
SELECT "qs_relatedmodel"."id", "qs_relatedmodel"."name", "qs_relatedmodel"."main
_id" FROM "qs_relatedmodel" WHERE "qs_relatedmodel"."main_id" = 2
SELECT "qs_relatedmodel"."id", "qs_relatedmodel"."name", "qs_relatedmodel"."main
_id" FROM "qs_relatedmodel" WHERE "qs_relatedmodel"."main_id" = 3
SELECT "qs_relatedmodel"."id", "qs_relatedmodel"."name", "qs_relatedmodel"."main
_id" FROM "qs_relatedmodel" WHERE "qs_relatedmodel"."main_id" = 4
SELECT "qs_relatedmodel"."id", "qs_relatedmodel"."name", "qs_relatedmodel"."main
_id" FROM "qs_relatedmodel" WHERE "qs_relatedmodel"."main_id" = 5

As before, we get 6 queries. However, if we were to try to use .select_related('many'), we would get a FieldError. For this situation, Django provides a different method to mitigate the problem: prefetch_related.

In [31]: with CaptureQueriesContext(connection) as context:
    ...:     for item in models.MainModel.objects.prefetch_related('many').all()
    ...: :
    ...:         for related in item.many.all():
    ...:             name = related.name
    ...: print(context.final_queries - context.initial_queries)
    ...:
2

Two queries; that's at least better. What's going on here, though, and why two? If we take a look at the queries generated, we see

In [32]: for q in context.captured_queries:
    ...:     print(q['sql'])
    ...:
SELECT "qs_mainmodel"."id", "qs_mainmodel"."name", "qs_mainmodel"."one_id" FROM
"qs_mainmodel"
SELECT "qs_relatedmodel"."id", "qs_relatedmodel"."name", "qs_relatedmodel"."main
_id" FROM "qs_relatedmodel" WHERE "qs_relatedmodel"."main_id" IN (1, 2, 3, 4, 5)

So it turns out that Django first loads up the QuerySet for MainModel, then it determines what primary key values it received, and then does a second query on RelatedModel, filtering on those that have a foreign key to one of those values.

There is one thing that you should be aware of when prefetching one-to-many relationships in this manner. A fairly typical thing to do is to make use of Django model's object-oriented nature, and write instance methods that do some non-trivial computation, sometimes involving looping or filtering on one-to-many or many-to-many relationships. We'll simulate that here by just using a .filter() call in the inner loop

In [33]: with CaptureQueriesContext(connection) as context:
    ...:     for item in models.MainModel.objects.prefetch_related('many').all()
    ...: :
    ...:         for related in item.many.filter(name__startswith='b'):
    ...:             name = related.name
    ...: print(context.final_queries - context.initial_queries)
    ...:
7

And now we find that we're back up to seven queries, despite the use of .prefetch_related(). What's going on here is that the prefetch makes item.many.all() act exactly like an already iterated-over QuerySet, like the ones from earlier in this article, by filling its cache for later re-use. However, as in those earlier cases, any further refinement produces a new QuerySet that does not share that cache. In many cases, it would simply be better to iterate over the relationship and filter in Python directly. Additionally, starting with version 1.7, Django provides a Prefetch object, which allows more control over the query used in the prefetch_related() call. I advise using tools such as Django Debug Toolbar, with real data, to determine what makes the most sense for your use case.
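
Here is a rough sketch of what a Prefetch object could look like for the filtered case above; to_attr stores the prefetched rows on a separate attribute so they aren't confused with item.many.all():

from django.db.models import Prefetch

# Two queries total: one for MainModel, one for the filtered RelatedModels.
qs = models.MainModel.objects.prefetch_related(
    Prefetch(
        'many',
        queryset=models.RelatedModel.objects.filter(name__startswith='b'),
        to_attr='b_related',
    )
)
for item in qs:
    for related in item.b_related:  # already in memory, no per-item queries
        name = related.name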

There is another thing that you should be aware of when encapsulating queries involving one-to-many or many-to-many relationships. You may see code like this

def some_expensive_calculation(self):
    related_objs = RelatedModel.objects.filter(main=self)
    ...

This code, as we should now be able to see, is an anti-pattern that will always issue a query when called from a MainModel item, regardless of whatever optimizations have been used on the QuerySet which obtained the MainModel in the first place. It would be better to do this instead

def some_expensive_calculation(self):
    related_objs = self.many.all()
    ...

That way, if we have calling code that does this

for item in models.MainModel.objects.prefetch_related('many'):
    result = item.some_expensive_calculation()
    ...

we should only get the two queries we expect, not one for the main set plus one each for however many items are in that set.

So now we've seen that the QuerySets that you use in your apps can have significant real-world performance implications. However, with some care and understanding of the simple concepts behind Django's QuerySets, you can improve your code and become a better Django developer. But more than that, I hope that you take away from this article the realization that you shouldn't be afraid to read Django's source code to see how something works, or to build minimal working examples or simple tools to explore problems within the Django shell.

Read more Django posts on the Caktus blog.

Tim HopperWeb Development and Design for the Backend Developer

I've been tinkering with websites for nearly 20 years. My friend Hunter and I were big into making terrible Angelfire sites as pre-teens. In high school, my dad paid me to make him a webpage for his doctor's office (I used Frontpage). A year or two after that, I read Kevin Yank's "Build Your Own Database Driven Website Using PHP & MySQL" and hacked together a PHP back-end for a Lord of the Rings fan site.

In recent years, I've put together this blog, shouldigetaphd.com, and a few other simple web-based side projects. However, I haven't kept up with modern web development, and my projects have been hacked together from boilerplate or templates. Though I've programmed professionally since 2011, I've spent very little of that time writing anything close to graphical user interfaces.

I have a number of other side projects that I'd like to do at some point, and most of them would require some sort of graphical interface. While I could work on app development, I think web-based implementations would be a great starting place.

A few months back, I decided to stop watching Netflix on the treadmill and instead use those 45 minutes each morning to learn; in particular, I've been trying to learn more about modern(ish) web design and development. My work has a subscription to Safari Books Online which gives me access to copious technical books and video tutorials.

The number of resources available on Safari (along with YouTube, blog posts, etc.) is astounding. I started many video tutorials on Safari that I quickly realized weren't going to be useful. Yet there are many gems to be found, which I share here with you.

What follows is an overview of the technologies I've realized I need to learn more about and links to the resources I've found valuable in learning about them. If you think there are gaps I haven't yet filled or better resources than I've listed below, I'd love your feedback.

What I Knew Going In

I've been a professional software developer and data scientist since 2012. I mostly write Python, but I've programmed in a number of different languages.

I have a pretty good grasp on how HTML and CSS work. I've used enough Javascript over the years to be dangerous; I understand how it runs in the browser. I understand what a DOM is and how it relates to the page source.

I've used the Python Flask web framework for several projects. I understand how to respond to HTTP requests with server-generated content. I had some idea of how to run my own web server on AWS.

I've used Jekyll, Hugo, and Pelican to create statically generated sites.

I understood DNS at a high level, but never really learned what all the different DNS types were, and I didn't understand why name server changes take so long to propagate.

I had some idea of what node.js and npm are.

I'm a committed Sublime Text user.

A Meta Tutorial on Web Development

A great place to start is Andrew Montalenti's lengthy tutorial on using Python, Flask, Bootstrap, and Mongo to rapidly prototype a website. The tutorial is out of date, but the principles still stand.

Another great resource is Cody Lindley's free Front-End Developer's Handbook. This is a substantial list meta-resource that organizes links for learning all angles of front-end development. "It is specifically written with the intention of being a professional resource for potential and currently practicing front-end developers to equip themselves with learning materials and development tools."

Chrome Developer Tools

One of the most important tools for me in learning more about web development has been the Chrome Developer Tools. You can live-edit the DOM elements and style sheets and watch how a website changes. I've mostly learned Developer Tools by exploring them myself, but there are lots of tutorials for them on YouTube.

HTML, CSS, and Bootstrap

Many modern websites are responsive: they automatically adapt to various size screens and devices, from phones to desktops. Writing responsive websites from scratch requires deep knowledge of HTML, CSS, Javascript, and browsers. Unless you're doing this professionally, you probably don't want to write a responsive site from scratch.

For several projects, I've used the lightweight Skeleton project to create simple, responsive pages.

Recently, I decided to dive deep into the more robust Bootstrap framework, originally developed at Twitter.

I watched Brock Nunn's Building a Responsive Website with Bootstrap (Safari), a two hour tutorial on getting started with Bootstrap. The documentation for Bootstrap is clear (if terse) and worth reading through.

Once you have a basic idea of how Bootstrap works, the best thing you can do is start playing with it. Since I was familiar with the Pelican static site generator, I decided to switch this blog to a Bootstrap theme, starting with pelican-bootstrap3.

I've worked with Bootstrap 3 until now. Bootstrap 4 is about to come out. Bootstrap 4 moves the style sheets from LESS to SASS and adds Flexbox functionality. Unless you understand what those mean (more below), you'd be fine using version 3.

I wanted to get a better grasp on CSS selectors, so I read Eric Meyer's brief Selectors, Specificity, and the Cascade: Applying CSS3 to Documents (Safari).

I watched Marty Hall's JavaScript, jQuery, and jQuery UI tutorial (Safari). I was able to skip big chunks where I already understood certain parts, but it helped me fill in lots of gaps.

Advanced Stylesheets (LESS, SASS, and Flexbox)

There are several alternatives to writing raw CSS. Two popular ones are Less and SASS. These "preprocessors" allow you to write CSS-like stylesheets but with constructs such as variables, nesting, inheritance, and mathematical operators.

I found this brief tutorial on Less (Safari) helpful, and I've enjoyed Less a lot. I haven't used SASS yet, but it's very similar. I'll probably switch to SASS when I start using Bootstrap 4.

Another modern innovation is the Flexbox layout model for CSS. Stone River Learning has a great tutorial on Flexbox (Safari). It seems that Flexbox is the future of CSS-based layouts, and it's worth learning about.

Advanced JavaScript (Elm, React, Angular, Backbone, Ember)

The JavaScript web framework space has exploded. Many of these are implementations of the Model, View, Controller pattern, including React, Angular, and Ember. These tools allow the creation of complex web apps (as well as mobile apps).

Web Server Operations and DNS

I learned a ton from Linux Web Operations (Safari) by Ben Whaley. "The videos discuss the relationship between web and application servers, load balancers, and databases and introduce configuration management, monitoring, containers, cryptography, and DNS."

I've struggled with DNS configuration over the years, so I watched Cricket Liu's Learning DNS series (Safari). I still wouldn't want to be responsible for a company's complex DNS infrastructure, but I can now configure my own sites' DNS with a little more understanding.

Development Automation

Package Managers

It's likely that any modern web project will have some external Javascript dependencies. Package managers (analogous to PyPI or Anaconda.org in Python) have emerged to help support this. Node.js comes with the npm package manager, but Bower seems to make more sense for front-end development.1 Cody Lindley has a nice introduction to npm and Bower. Bower is well documented and easy to start using. There is a nice Flask extension to help you integrate Bower with your Python project.

Task Automation

Web development comes with lots of build-style tasks that have to happen repeatedly. For example, before you can render a webpage in the browser, you might need to convert the Less to CSS and start a local web server. Before deploying to production, you might want to also run tests and minify your Javascript.

There's a GUI application called Codekit that can do a lot of these tasks. You can also do it through a Node.js program called Grunt. I haven't used it yet, but it looks like following the documentation would be the best way to get started.

Gulp is a popular alternative to Grunt.

Design

Visual Design

Design has never been my strong point. One way to compensate for that is to rely on the work of others. There are copious Bootstrap themes available, and some are even free.

I enjoyed Software Engineering Daily's interview with Tracy Osborn on Design for Non-designers. She has some blog posts on the topic. Tracy recommends COLOURLovers for color ideas and Font Pair for selecting fonts from Google Fonts.

User Experience Design

On the topic of UX, I finally read Steve Krug's classic Don't Make Me Think (Safari); it's great. Ginny Redish's Letting Go of Words (Safari) is similarly excellent.

Conclusion

I've learned a lot in the past few months. I've filled in some gaps about how CSS works. I've gotten a better grasp on the Javascript prototype model. I've learned that I can start with higher-level tools (e.g. Bootstrap and jQuery) to rapidly build my side projects with some amount of visual appeal. I'm learning how to use available tools to reduce the boilerplate I have to write, automate tedious tasks, and reduce my personal technical debt.

I still have a lot of learning and a lot of practicing ahead of me, but I'm starting to feel confident that I could make headway on some of my projects. The modern frontend development landscape is massive, varied, and ever changing, but that shouldn't prohibit you from diving in if you want to.


Footnotes

  1. The recent buzz in package management has been about Yarn, a replacement for npm.