A planet of blogs from our members...

Caktus Group: Coding for Time Zones & Daylight Saving Time — Oh, the Horror

In this post, I review some reasons why it's really difficult to program correctly when using times, dates, time zones, and daylight saving time, and then I'll give some advice for working with them in Python and Django. Also, I'll go over why I hate daylight saving time (DST).

TIME ZONES

Let's start with some problems with time zones, because they're bad enough even before we consider DST, and they'll help us ease into it.

Time Zones Shuffle

Time zones are a human invention, and humans tend to change their minds, so time zones also change over time.

Many parts of the world struggle with time changes. For example, let's look at the Pacific/Apia time zone, which is the time zone of the independent country of Samoa. Through December 29, 2011, it was -11 hours from Coordinated Universal Time (UTC). From December 31, 2011, Pacific/Apia became +13 hours from UTC.

What happened on December 30, 2011? Well, it never happened in Samoa, because December 29, 23:59:59-11:00 is followed immediately by December 31, 0:00:00+13:00.

Local Date   Local Time   Zone     UTC Date     UTC Time
2011-12-29   23:59:59     UTC-11   2011-12-30   10:59:59
2011-12-31   00:00:00     UTC+13   2011-12-30   11:00:00

That's an extreme example, but time zones change more often than you might think, often due to changes in government or country boundaries.

The bottom line here is that a time and a time zone are meaningless unless you also know the date.

Always Convert to UTC?

As programmers, we're encouraged to avoid issues with time zones by "converting" times to UTC (Coordinated Universal Time) as early as possible, and converting to the local time zone only when necessary to display times to humans. But there's a problem with that.

If all you care about is the exact moment in the lifetime of the universe when an event happened (or is going to happen), then that advice is fine. But for humans, the time zone that they expressed a time in can be important, too.

For example, suppose I'm in North Carolina, in the eastern time zone, but I’m planning an event in Memphis, which is in the central time zone. I go to my calendar program and carefully enter the date and "3:00 p.m. CST". The calendar follows the usual convention and converts my entry to UTC by adding 6 hours, so the time is stored as 9:00 p.m. UTC, or 21:00 UTC. If the calendar uses Django, there's not even any extra code needed for the conversion, because Django does it automatically.

The next day I look at my calendar to continue working on my event. The event time has been converted to my local time zone, or eastern time, so the calendar shows the event happening at "4:00 p.m." (instead of the 3:00 p.m. that it should be). The conversion is not useful for me, because I want to plan around other events in the location where the event is happening, which is using CST, so my local time zone is irrelevant.
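
To make that concrete, here's a minimal sketch of the round trip using pytz (the dates and zones are chosen to match the example; January keeps us in standard time):

import pytz
from datetime import datetime

central = pytz.timezone('US/Central')
eastern = pytz.timezone('US/Eastern')

# The event as entered: 3:00 p.m. Central (CST in January)
event = central.localize(datetime(2019, 1, 15, 15, 0))
print(event)                       # 2019-01-15 15:00:00-06:00

# "Converted" to UTC for storage -- the Central zone is now gone
stored = event.astimezone(pytz.utc)
print(stored)                      # 2019-01-15 21:00:00+00:00

# Redisplayed later in my local (Eastern) zone: 4:00 p.m., not 3:00
print(stored.astimezone(eastern))  # 2019-01-15 16:00:00-05:00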

The bottom line is that following the advice to always convert times to UTC results in lost information. We're sometimes better off storing times with their non-UTC time zones. That's why it's kind of annoying that Django always "converts" local times to UTC before saving them to the database, and even before returning them from a form. The original time zone is lost unless you go to the trouble of saving it separately and converting the time back to that zone after fetching it from the database. I wrote about this before.

By the way, I've been putting "convert" in scare quotes because talking about converting times from one time zone to another carries an implicit assumption that such converting is simple and loses no information, but as we see, that's not really true.

DAYLIGHT SAVING TIME

Daylight saving time (DST) is even more of a human invention than time zones.

Time zones are a fairly obvious adaptation to the conflict between how our bodies prefer to be active during the hours when the sun is up, and how we communicate time with people in other parts of the world. Historical changes in time zones across the years are annoying, but since time zones are a human invention it's not surprising that we'd tweak them every now and then.

DST, on the other hand, amounts to changing entire time zones twice every year. What does the US/Eastern time zone mean? I don't know, unless you tell me the date. From January 1, 2018 to March 10, 2018, it meant UTC-5. From March 11, 2018 to November 3, 2018, it meant UTC-4. And from November 4, 2018 to December 31, 2018, it meant UTC-5 again.

But it gets worse. From Wikipedia:

The Uniform Time Act of 1966 ruled that daylight saving time would run from the last Sunday of April until the last Sunday in October in the United States. The act was amended to make the first Sunday in April the beginning of daylight saving time as of 1987. The Energy Policy Act of 2005 extended daylight saving time in the United States beginning in 2007. So local times change at 2:00 a.m. EST to 3:00 a.m. EDT on the second Sunday in March and return at 2:00 a.m. EDT to 1:00 a.m. EST on the first Sunday in November.

So in a little over 50 years, the rules changed 3 times.

Even if you have complete and accurate information about the rules, daylight saving time complicates things in surprising ways. For example, you can't convert 2:30 a.m. March 11, 2018, in the US/Eastern time zone to UTC, because that time never happened — our clocks had to jump directly from 1:59:59 a.m. to 3:00:00 a.m. See below:

Local Date   Local Time   Zone   UTC Date     UTC Time
2018-03-11   1:59:59      EST    2018-03-11   6:59:59
2018-03-11   3:00:00      EDT    2018-03-11   7:00:00

You can't convert 1:30 a.m. November 4, 2018, in the US/Eastern time zone to UTC either, because that time happened twice. You would have to specify whether it was 1:30 a.m. November 4, 2018 EDT or 1:30 a.m. November 4, 2018 EST:

Local Date   Local Time   Zone   UTC Date     UTC Time
2018-11-04   1:00:00      EDT    2018-11-04   5:00:00
2018-11-04   1:30:00      EDT    2018-11-04   5:30:00
2018-11-04   1:59:59      EDT    2018-11-04   5:59:59
2018-11-04   1:00:00      EST    2018-11-04   6:00:00
2018-11-04   1:30:00      EST    2018-11-04   6:30:00
2018-11-04   1:59:59      EST    2018-11-04   6:59:59
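
Using pytz, both failure modes can be surfaced explicitly with the is_dst argument. A minimal sketch:

import pytz
from datetime import datetime

eastern = pytz.timezone('US/Eastern')

# 2:30 a.m. on March 11, 2018 never happened
try:
    eastern.localize(datetime(2018, 3, 11, 2, 30), is_dst=None)
except pytz.exceptions.NonExistentTimeError:
    print('2:30 a.m. does not exist on this date')

# 1:30 a.m. on November 4, 2018 happened twice
try:
    eastern.localize(datetime(2018, 11, 4, 1, 30), is_dst=None)
except pytz.exceptions.AmbiguousTimeError:
    print('1:30 a.m. is ambiguous on this date')

# Disambiguate by saying which occurrence you mean
edt = eastern.localize(datetime(2018, 11, 4, 1, 30), is_dst=True)
est = eastern.localize(datetime(2018, 11, 4, 1, 30), is_dst=False)
print(edt.astimezone(pytz.utc))  # 2018-11-04 05:30:00+00:00
print(est.astimezone(pytz.utc))  # 2018-11-04 06:30:00+00:00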

Advice on How to Properly Manage datetimes

Here are some rules I try to follow.

When working in Python, never use naive datetimes. (Those are datetime objects without timezone information, which unfortunately are the default in Python, even in Python 3.)

Use the pytz library when constructing datetimes, and review the documentation frequently. Properly managing datetimes is not always intuitive, and using pytz doesn't prevent me from using it incorrectly and doing things that will provide the wrong results only for some inputs, making it really hard to spot bugs. I have to triple-check that I'm following the docs when I write the code and not rely on testing to find problems.

Let me strengthen that even further. It is not possible to correctly construct datetimes with timezone information using only Python's own libraries when dealing with timezones that use DST. I must use pytz or something equivalent.

If I'm tempted to use datetime.replace, I need to stop, think hard, and find another way to do it. datetime.replace is almost always the wrong approach, because changing one part of a datetime without consideration of the other parts is almost guaranteed to not do what I expect for some datetimes.
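
As a sketch of why: attaching a pytz time zone with datetime.replace picks the zone's earliest historical offset rather than the right one for the date, and even correctly built datetimes need pytz's normalize() after arithmetic that crosses a DST boundary:

import pytz
from datetime import datetime, timedelta

eastern = pytz.timezone('US/Eastern')

# Wrong: replace() attaches an arbitrary (historical) offset
bad = datetime(2018, 3, 10, 12, 0).replace(tzinfo=eastern)
print(bad)   # 2018-03-10 12:00:00-04:56 -- off by nearly an hour

# Right: let pytz pick the correct offset for that date
good = eastern.localize(datetime(2018, 3, 10, 12, 0))
print(good)  # 2018-03-10 12:00:00-05:00

# Arithmetic across the spring-forward gap needs normalize()
later = good + timedelta(days=1)
print(eastern.normalize(later))  # 2018-03-11 13:00:00-04:00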

When using Django, be sure to set USE_TZ = True. If Django emits warnings about naive datetimes being saved in the database, treat them as if they were fatal errors, track them down, and fix them. If I want to, I can even turn them into actual fatal errors; see this Django documentation.
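
That approach, roughly as the Django docs show it, promotes the warning to an exception:

import warnings

warnings.filterwarnings(
    'error',
    r'DateTimeField .* received a naive datetime',
    RuntimeWarning,
    r'django\.db\.models\.fields',
)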

When processing user input, consider whether a datetime's original timezone needs to be preserved, or if it's okay to just store the datetime as UTC. If the original timezone is important, see this post I wrote about how to get and store it.

Conclusion

Working with human times correctly is complicated and unintuitive, and it takes a lot of careful attention to detail to get right. Further, some of the oft-given advice, like always working in UTC, can cause problems of its own.

Caktus Group: Why We Love Wagtail (and You Should, Too)

New clients regularly ask us if we build WordPress sites. When we dig deeper, we generally learn that they’re looking for a user-friendly content management system (CMS) that will allow them to effortlessly publish and curate their site content. As we’ve written about previously, WordPress can be a good fit for simple sites. However, the majority of our clients need a more robust technical solution with customizable content management tools. For the Python-driven web applications that we develop, we love to work with Wagtail.

What is Wagtail?

Wagtail is a Python-driven CMS built on the Django web framework. It has all the features you’d expect from a quality CMS:

  • intuitive navigation and architecture
  • user-friendly content editing tools
  • painless image uploading and editing capabilities
  • straightforward and rapid installation

What Makes Wagtail Different?

From the user’s perspective, Wagtail’s content editor is what sets it apart, and it’s why we really love it. Most content management systems use a single WYSIWYG (“what you see is what you get”) HTML editor for page content. While Wagtail includes a WYSIWYG editor — the RichTextField — it also has the StreamField, which provides an interface that allows you to create and intermix custom content modules, each designed for a specific type of content.

What does that mean in practice? Rather than wrangling an image around text in the WYSIWYG editor and hoping it displays correctly across devices, you can drop an image into a separate, responsive module, which has a custom data model. In other words:

As a user, you don’t need to customize your content to the capabilities of your CMS. Instead, you customize your CMS to maximize your content.

Let’s Take a Look

Below is a screenshot of the WordPress editing dashboard, with the single HTML content area. You can edit and format text, add an image, and insert an HTML snippet — all the basics.

Screenshot of the WordPress CMS

Now take a look at Wagtail. Each type of content has its own block — that’s StreamField at work. The icons at the bottom display the developer-defined modules available, which in this case are Heading block, Paragraph block, Image block, Block quote, and Embed block. This list of modules can be extended to include a variety of custom content areas based on your specific website.

Screenshot of the Wagtail CMS
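
For developers curious what defining those modules looks like, here's a minimal sketch of a Wagtail 2.x page model (the block names mirror the screenshot; a real model would vary):

# models.py
from wagtail.core import blocks
from wagtail.core.fields import StreamField
from wagtail.core.models import Page
from wagtail.admin.edit_handlers import StreamFieldPanel
from wagtail.embeds.blocks import EmbedBlock
from wagtail.images.blocks import ImageChooserBlock


class BlogPage(Page):
    body = StreamField([
        ('heading', blocks.CharBlock()),
        ('paragraph', blocks.RichTextBlock()),
        ('image', ImageChooserBlock()),
        ('block_quote', blocks.BlockQuoteBlock()),
        ('embed', EmbedBlock()),
    ])

    content_panels = Page.content_panels + [
        StreamFieldPanel('body'),
    ]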

Using the blocks and modules, a web content editor can quickly add a paragraph of text, followed by an image, and then a blockquote to create a beautiful, complete web page. To demonstrate this better, we put together a short video of StreamField in action.

You can also learn more about StreamField and other features on the Wagtail website.

Powered by Django

At its core, Wagtail is a Django app, meaning that it seamlessly integrates with other Django and Python applications. This allows near-endless flexibility to extend your project with added functionality. For example, if your application includes complex Python-based data analysis on the backend but you want to easily display output to site visitors, Wagtail is the ideal choice for content management.

The Bottom Line

Wagtail provides content management features that go above and beyond the current abilities of a WordPress site, plus the inherent customization and flexibility of a Django app. We love working with Wagtail because of the clear advantages it provides to our clients and content managers. We highly recommend the Wagtail CMS to all our clients.

Contact us to see if Wagtail would be a good fit for your upcoming project.

Caktus Group: Django: Recommended Reading

Pictured: Our library of reference books at Caktus cover topics including Django and Python, as well as project management and Agile methodologies.

At Caktus, we believe in continued learning (and teaching). It's important to read up on the latest industry trends and technologies to stay current in order to address our clients' challenges. We even maintain a library in our office for staff use, and we add references frequently. Our team enjoys sharing what they've learned by contributing to online resources, such as the Django Documentation and the Mozilla Developer Network Web Docs. Below is a list (in alphabetical order) of the books, blogs, and other documents that we’ve found to be the most accurate, helpful, and practical for Django development.

Django Documentation

Authors: Various

Recommended by Developer Dmitriy Chukhin

Overview: When Dmitriy first began learning about Django, he went through the official Django tutorial. Then, as a developer, he read through other pieces of documentation that are relevant to his work.

A Valuable Lesson: Dmitriy learned that detailed documentation makes working with a framework significantly easier than trying to figure it out on his own or from other developers’ posts about their errors. The documentation is readable, uses understandable language, and gives useful examples, making it a lot friendlier than Dmitriy expected. It also encouraged him to keep using Django, since its core developers clearly consider it important to make their software usable and well-documented. One particularly helpful feature is the ‘version switcher’ in the bottom right corner of each page, which lets readers view the documentation for a specific version of Django. Since our projects at Caktus use a number of different Django versions, it’s handy to switch between versions to see when a feature was added, changed, or deprecated. The quality of the Django documentation also encouraged Dmitriy to thoroughly document the code he writes for the people who will work with it in the future.

Why You Should Read This: The Django tutorial is a great place to begin learning about using Django. The reference guide is best for those who are already using Django and need to look up details on how to use forms, views, URLs, and other parts of the Django API. The topic guides provide high-level explanations.

Django User’s Group

Authors: Various

Recommended by Lead Developer and Technical Director Karen Tracey

Overview: The Django User’s Group is a public Google Group that Karen found when she first started using Django in 2006 and ran into some trouble with database tables. She posted her challenges and questions on the Google Group and received a response the same day. She’s been using Django ever since — coincidence?

A Valuable Lesson: Django Users was Karen’s first introduction to the Django community and she learned a great deal from it. It was also her entry into becoming a regular contributing member of the community.

Why You Should Read This: If you have a Django puzzle that you can’t solve, searching the group and (if that fails to yield results) writing up and posting a question is a great way to get a solution. Karen also notes that sometimes it’s not even necessary to post, since the act of writing the question in a way others can understand sometimes makes the answer clear! Reading various posts in the group is also a way to see the issues that trip up newcomers, and trying to answer others’ questions provides helpful learning opportunities too.

High Performance Django

Authors: Peter Baumgartner & Yann Malet

Recommended by Developer Neil Ashton

Overview: High Performance Django promises to “give you a repeatable blueprint for building and deploying fast, scalable Django sites.” Neil first learned about this book from friend and former coworker Jeff Bradberry, who pointed it out as a way to start pushing his Django development skills beyond a firm grasp of the basics.

A Valuable Lesson: Neil learned that making Django perform at scale means keeping the weight off Django itself. The book taught him about making effective use of the high-performance technologies that make up the rest of the stack to respond to browser requests as early and quickly as possible. It taught him that there’s more to building web apps with Django than just Django, and it opened the door to thinking and learning about many other features of the web app development landscape.

Why You Should Read This: This book is ideal for anyone who’s beginning a career in web app development. It’s especially helpful for those with a different background, whether it’s front-end development or something further afield like computational linguistics. It’s easy to lose sight of the forest for the trees as a new web developer, and this book manages to provide you with a feel for the big picture in a surprisingly small number of pages.

Mozilla Developer Network Web Docs

Authors: Various

Recommended by Developer Vinod Kurup

Overview: The Mozilla Developer Network (MDN) Web Docs are a popular resource when it comes to nearly any general web development topic. It’s authored by multiple contributors, and you can be an author, too. Vinod usually visits the site when he’s struggling with a piece of code, and the MDN pops up at the top of his web search results. Caktus especially loves the MDN because we were fortunate to work with Mozilla on the project that powers the MDN.

A Valuable Lesson: Vinod and his team used Vue.js on a recent project, and he learned a lot more about modern JavaScript than he had needed to know in the past. One specific topic that confused him was JavaScript Promises. Fortunately, the MDN has documentation on using promises and more detailed reference material about Promise.then(). Those two pieces of documentation cleared up a lot of confusion for Vinod. He also likes how each page of reference documentation includes a browser compatibility section, which helps him identify whether his code will work in the browsers that our clients use.

Why You Should Read This: The MDN provides excellent documentation on every basic front-end technology, including HTML, CSS, and JavaScript, among others. Since Mozilla is at the forefront of helping to create the specifications for these tools, you can trust that the documentation is authoritative. It’s also constantly being worked on and updated, so you know you’re not getting documentation on a technology that has been deprecated. Finally, and most importantly, the documentation is GOOD! It covers the basic syntax and always includes common usage examples, so it’s clear how to use the tool. In addition, there are many other gems, including tutorials (both basic and advanced) on a wide variety of web development topics.

Towards 14,000 Write Transactions Per Second on my Laptop

Author: Peter Geoghegan

Recommended by CEO Tobias McNulty

Overview: Towards 14,000 write transactions per second on my laptop is a relatively short blog post that provides an overview of two little-discussed Postgres settings: commit_delay and commit_siblings.

A Valuable Lesson: This post not only provides an overview of commit_delay and commit_siblings, but also an important change to the former that dramatically improved its effectiveness since the release of Postgres 9.3. For database servers that need to handle a lot of writes, the commit_delay setting (which is disabled by default, as of Postgres 11) gives you an efficient way to "group" writes to disk that helps increase overall throughput by sacrificing a small amount of latency. The setting has been instrumental to us at Caktus in optimizing Postgres clusters for a couple of client projects, yet Tobias rarely, if ever, sees it mentioned in more general talks and how-tos on optimizing Postgres.

Why You Should Read This: These settings will change nothing for read-heavy sites/apps (such as a CMS), but if you use Postgres in a write-heavy Django (or other) application, you should learn about and potentially configure these settings to improve the product.

Two Scoops of Django

Authors: Daniel Roy Greenfeld and Audrey Roy Greenfeld

Recommended by Developer Dan Poirier

Overview: Two Scoops of Django has several editions, and the latest is 1.11 (Dan read edition 1.8). The editions stand the test of time, and the authors go through nearly all facets of Django development. They share what has worked best for them and what to watch out for, so you don't have to learn it all the hard way. The authors’ tagline is, “Making Python and Django as fun as ice cream,” and who doesn’t love ice cream?

A Valuable Lesson: By reading the Django Documentation (referenced earlier in this post), you can learn what each setting does. Then in chapter 5 of Two Scoops, read about a battle-tested scheme for managing different settings and files across multiple environments, from local development to testing servers and production, while protecting your secrets (passwords, keys, etc). Similarly, chapter 19 covers what cases you should and shouldn't use the Django admin for, warns about using list_editable in multi-user environments and gives tips for securing the admin and customizing it.
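
The flavor of that settings scheme, as a rough sketch (the module layout and names are illustrative, not the book's exact code):

# settings/production.py -- one module per environment, all inheriting
# from a shared base module, with secrets kept out of version control
import os

from .base import *  # noqa: shared defaults live in settings/base.py

DEBUG = False
ALLOWED_HOSTS = ['www.example.com']

# Secrets come from the environment, never from the repo
SECRET_KEY = os.environ['DJANGO_SECRET_KEY']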

Why You Should Read It: The great thing about the book is that the chapters stand alone, so you can pick it up and read whichever chapter you need. Dan keeps the book handy at his desk for nearly all his Django projects. The book is not only full of useful information; almost every page also includes examples or diagrams.

Well Read

We recommend these readings on Django development because they provide valuable insight and learning opportunities. What do you refer to when you need a little help with Django? If you have any recommendations or feedback, please leave them in the comments below.

Caktus Group: Impressed by Devopsdays Charlotte 2019

We have a small two-person Infrastructure Ops team here at Caktus (including myself), so I was excited to go to my first devopsdays Charlotte and be surrounded by Ops people. The event was held just outside of Charlotte, at the Red Ventures auditorium in Indian Land, South Carolina. About 200 people gathered there for two days of talks and open sessions. Devopsdays events are held multiple times a year, in various locations around the world. Check out their schedule to see if there will be an event near you.

On Thursday afternoon, Quintessence Anx gave an awesome technical Ignite talk on Sensory Friendly Monitoring. She packed a whole lot of monitoring wisdom into 5 minutes and 20 slides, so I was then looking forward to what she had to say about diversity. She spoke on Unquantified Serendipity: Diversity in Development, and it ended up being my favorite talk.

Devopsdays speaker Quintessence Anx on stage, giving her presentation

Quintessence (pictured right) provided a lot of actionable information and addressed many common concerns that people have about diversity. She told the story of how, as a junior developer, her mentors often told her how to solve problems, while they told her male peers how to find answers. She suggested that mentors give a “hand up, not a hand out” and stressed the importance of being introduced to a mentor’s network so that the mentee can start building their own network. I thought the talk struck the right balance between urgency and applicability.

View her slides and a mind map of an earlier version of the talk.

The Friday Keynote was given by Sonja Gupta and Corey Quinn, and was titled Embarrassingly Large Numbers: Salary Negotiation for Humans. It focused on how to upgrade your income by getting a new job. This talk was informative and entertaining, including more f-bombs than all the other presentations combined. Some of the points they made were:

  • interview for jobs you don’t plan to take
  • interview at least once a quarter
  • never take the first offer

They also recognized that negotiation is hard, but you are not rude if you ask for what you’re worth. I was looking forward to this keynote since I recently began following Corey’s newsletter, Last Week in AWS, where he has elevated snark to an art form.

I enjoyed the event, and I am looking forward to attending devopsdays Raleigh in the fall. The next devopsdays Charlotte will take place in 2020.

Caktus Group: Suggestions For Picking Up Old Projects

At Caktus, we work on many projects, some of which are built by us from start to finish, while others are inherited from other sources. Oftentimes, we pick up a project that we either have not worked on in a long time, or haven’t worked on at all, so we have to get familiar with the code and figure out the design decisions that were made by those who developed it (including when the developers are younger versions of ourselves). Moreover, it is a good idea to improve the setup process in each project, so others can have an easier time getting set up in the future. In our efforts to work on such projects, a few things have been helpful both for becoming familiar with the projects more quickly, and for making the same projects easier to pick up in the future.

Getting Set Up

Here are my top tips for getting up to speed on projects:

Documentation

As a perhaps obvious first step, it can be helpful to read through a README or other documentation, assuming it exists. At Caktus, we write steps for how to get set up locally with each project, and those steps are helpful for getting familiar with the major aspects of a project. If there is no README or obvious documentation, we look for comments within files; the first few lines often document the functionality of the rest of the file. It also benefits future developers if you add to or create setup documentation as you get set up yourself. Though you likely don’t have much project knowledge at this point, even notes like ‘installing node 8.0 here works as of 2019-01-01’ can be helpful.

Project Structure

If you're working on a Django project, for instance, you can look for the files that come with the project (a urls.py, models.py files, views.py files, and so on) to illuminate the functionality that the project provides and the pages that a user can visit. For non-Django projects, it can still be useful to look at the directories within the project and try to make sense of the different parts of the application. Even large Django projects with many models.py files can reveal what is happening through their directory structure. As an example, we once began working on a project with a few dozen models.py files, each containing a number of models. Since reading through every models.py file wasn’t feasible, it was helpful to see which directory each one lived in, which gave us the general structure of the project.

Tests

In terms of getting familiar with the project, tests (if they exist) are a great place to look, since they provide data for how the different parts of the project are supposed to work. For example, tests for Django views give examples of what data might be sent to the server, and what should happen when such data is sent there. Any test data files (for example, a test XML file for a project that handles XML) can also provide information about what the code should be handling. If we know that a project needs to accept a new XML structure, seeing the old XML structure can save a lot of time when figuring out how the code works.

Improving the Code

Getting familiar with the code should also mean making the code friendlier for future developers. Bear in mind that those future developers may, in fact, be us in a few years, and it’s much friendlier and more efficient to start working on a project that is well-documented and well-tested than on one that is neither. The code doesn’t have to be improved all at once; it’s possible to start somewhere, even if that means adding comments and tests for a short function. With time, the codebase improves and becomes easier to work with.

Refactoring

Often when beginning to look at a new (or unfamiliar) project, we get the urge to start by refactoring the code to be more efficient, more readable, or just more modern. However, it’s more helpful to resist this urge at first, until we understand the project better, since working code is better than non-working code. There are often good reasons why things were written a certain way, and changing them may have consequences we aren’t aware of, especially if the project lacks sufficient tests. It can be better to add comments to the code as we figure out how things work, focus on tests, and leave refactoring for later.

Testing

Testing is a great place to start improving the codebase. If tests already exist, then working on a feature or bugfix should include improving the tests. If tests don’t exist, then working on a feature or bugfix should include adding relevant tests. Having tests in place will also make any refactoring work easier to do, since they can be used to check what, if anything, broke during the refactoring.

Documentation

As mentioned above, documentation makes starting to work on a project much easier. Moreover, working through getting the project set up is a great time to either add or improve a README file or other setup documentation. As you continue working on the code, you can continue to make documentation improvements along the way.

Conclusion

Having made these recommendations, I should also acknowledge that we are often faced with various constraints (time, resources, or scope) when working on client projects. While these constraints are real, best practices should still be followed, and changes can be made while working on the code, such as adding tests for new functionality, or improving comments, documentation, and tests for features that already exist and are being changed. Doing so will help any future developers to understand the project and get up to speed on it more efficiently, and this ultimately saves clients time and money.

Caktus Group: Community & Caktus: Charitable Giving, Winter 2018

Pictured: Developer Dan Poirier is an advocate for WCPE and a volunteer announcer. WCPE is one of the recipients of our charitable giving program.

We are pleased to continue serving the North Carolina community at-large through our semi-annual Charitable Giving Program. Twice a year we solicit proposals from our team to contribute to a variety of non-profit organizations. With this program, we look to support groups in which Cakti are involved or that have impacted their lives in some way. This gives Caktus a chance to support our own employees as well as the wider community. For winter 2018, we were pleased to donate to the following organizations:

ARTS North Carolina

ARTS North Carolina “calls for equity and access to the arts for all North Carolinians, unifies and connects North Carolina’s arts communities, and fosters arts leadership.” Our Account Executive Tim Scales has been a board member and supporter of this organization for several years.

Learn more about ARTS NC.

Museum of Life and Science

The Museum of Life and Science’s mission is to “create a place of lifelong learning where people, from young child to senior citizen, embrace science as a way of knowing about themselves, their community, and their world.” Our Chief Business Development Officer Ian Huckabee is a current museum board member and sits on the executive and finance committees.

Learn more about the Museum of Life and Science.

Sisters’ Voices

Pictured: Developer Vinod Kurup’s daughter and niece, members of the Sisters' Voices choir.

Sisters’ Voices is a “choral community of girls, within which each is known and supported while being challenged to grow as a musician and as a person.” Caktus Developer Vinod Kurup’s niece Vishali and daughter Anika (pictured from left to right) are members of the choir. Vinod believes that being a member of the choir has “enriched their lives and taught them the love of music, and they developed an appreciation of their own voice.”

Learn more about Sisters’ Voices.

WCPE Radio

WCPE is a non-commercial, independent, listener-supported radio station, dedicated to excellence in classical music. Broadcasting includes service to the Piedmont area, Raleigh, Durham, and Chapel Hill on 89.7 FM. Their facility is staffed 24 hours a day, 7 days a week.

“WCPE gained the distinction of being the only public radio station in the eastern half of North Carolina to stay on the air during Hurricane Fran in 1996, acting as an Emergency Broadcast Relay station, providing weather information directly from the National Weather Service.” Caktus Developer Dan Poirier is an advocate for WCPE and has been listening and donating for years. Last year, he trained as a volunteer announcer and now commutes 90 miles round-trip, 2-3 times a month to work a shift on the air.

Learn more about WCPE.

The Giving Continues!

Caktus’ next round of giving will be June 2019, and we look forward to supporting another group of organizations that are committed to enriching the lives of North Carolinians!

Caktus Group: A Guide To Creating An API Endpoint With Django Rest Framework

As part of our work to make sharp web apps at Caktus, we frequently create API endpoints that allow other software to interact with a server. Often this means a frontend app (React, Vue, or Angular), though it could also be some other piece of software that needs to talk to the server. Across projects, a lot of our API endpoints end up functioning in similar ways, so we have become efficient at writing them, and this blog post gives an example of how to do so.

First, a few resources: read more about API endpoints in this previous blog post and review documentation on Django Rest Framework.

A typical request for an API endpoint may be something like: 'the front end app needs to be able to read, create, and update companies through the API'. Here is a summary of creating a model, a serializer, and a view for such a scenario, including tests for each part:

Part 1: Model

For this example, we’ll assume that a Company model doesn’t currently exist in Django, so we will create one with some basic fields:

# models.py
from django.db import models


class Company(models.Model):
    name = models.CharField(max_length=255)
    description = models.TextField(blank=True)
    website = models.URLField(blank=True)
    street_line_1 = models.CharField(max_length=255)
    street_line_2 = models.CharField(max_length=255, blank=True)
    city = models.CharField(max_length=80)
    state = models.CharField(max_length=80)
    zipcode = models.CharField(max_length=10)

    def __str__(self):
        return self.name

Writing tests is important for making sure our app works well, so we add one for the __str__() method. Note: we use the factory-boy and Faker libraries for creating test data:

# tests/factories.py
from factory import DjangoModelFactory, Faker

from ..models import Company


class CompanyFactory(DjangoModelFactory):
    name = Faker('company')
    description = Faker('text')
    website = Faker('url')
    street_line_1 = Faker('street_address')
    city = Faker('city')
    state = Faker('state_abbr')
    zipcode = Faker('zipcode')

    class Meta:
        model = Company

# tests/test_models.py
from django.test import TestCase

from ..models import Company
from .factories import CompanyFactory


class CompanyTestCase(TestCase):
    def test_str(self):
        """Test for string representation."""
        company = CompanyFactory()
        self.assertEqual(str(company), company.name)

With a model created, we can move on to creating a serializer for handling the data going in and out of our app for the Company model.

Part 2: Serializer

Django Rest Framework uses serializers to handle converting data between JSON or XML and native Python objects. There are a number of helpful serializers we can import that will make serializing our objects easier. The most common one we use is a ModelSerializer, which conveniently can be used to serialize data for Company objects:

# serializers.py
from rest_framework.serializers import ModelSerializer

from .models import Company

class CompanySerializer(ModelSerializer):
    class Meta:
        model = Company
        fields = (
            'id', 'name', 'description', 'website', 'street_line_1', 'street_line_2',
            'city', 'state', 'zipcode'
        )

That is all that’s required for defining a serializer, though a lot more customization can be added (a sketch follows the list), such as:

  • outputting fields that don’t exist on the model (maybe something like is_new_company, or other data that can be calculated on the backend)
  • custom validation logic for when data is sent to the endpoint for any of the fields
  • custom logic for creates (POST requests) or updates (PUT or PATCH requests)
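
As an example of the second item, here’s a sketch of per-field validation (the zipcode rule is illustrative, not part of the original example; DRF calls validate_<field_name>() hooks automatically):

# serializers.py
from rest_framework.serializers import ModelSerializer, ValidationError

from .models import Company


class CompanySerializer(ModelSerializer):
    class Meta:
        model = Company
        fields = (
            'id', 'name', 'description', 'website', 'street_line_1', 'street_line_2',
            'city', 'state', 'zipcode'
        )

    def validate_zipcode(self, value):
        # Called automatically when data is sent to the endpoint
        if not value.replace('-', '').isdigit():
            raise ValidationError('Zipcode must contain only digits.')
        return value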

It’s also beneficial to add a simple test for our serializer, making sure that the values for each of the fields in the serializer match the values for each of the fields on the model:

# tests/test_serializers.py
from django.test import TestCase

from ..serializers import CompanySerializer
from .factories import CompanyFactory


class CompanySerializerTestCase(TestCase):
    def test_model_fields(self):
        """Serializer data matches the Company object for each field."""
        company = CompanyFactory()
        serializer = CompanySerializer(company)
        for field_name in [
            'id', 'name', 'description', 'website', 'street_line_1', 'street_line_2',
            'city', 'state', 'zipcode'
        ]:
            self.assertEqual(
                serializer.data[field_name],
                getattr(company, field_name)
            )

Part 3: View

The view is the layer in which we hook up a URL to a queryset, and a serializer for each object in the queryset. Django Rest Framework again provides helpful objects that we can use to define our view. Since we want to create an API endpoint for reading, creating, and updating Company objects, we can use Django Rest Framework mixins for such actions. Django Rest Framework does provide a ModelViewSet which by default allows handling of POST, PUT, PATCH, and DELETE requests, but since we don’t need to handle DELETE requests, we can use the relevant mixins for each of the actions we need:

# views.py
from rest_framework.mixins import (
    CreateModelMixin, ListModelMixin, RetrieveModelMixin, UpdateModelMixin
)
from rest_framework.viewsets import GenericViewSet

from .models import Company
from .serializers import CompanySerializer


class CompanyViewSet(GenericViewSet,  # generic view functionality
                     CreateModelMixin,  # handles POSTs
                     RetrieveModelMixin,  # handles GETs for 1 Company
                     UpdateModelMixin,  # handles PUTs and PATCHes
                     ListModelMixin):  # handles GETs for many Companies

      serializer_class = CompanySerializer
      queryset = Company.objects.all()

And to hook up our viewset to a URL:

# urls.py
from django.conf.urls import include, re_path
from rest_framework.routers import DefaultRouter
from .views import CompanyViewSet


router = DefaultRouter()
router.register(r'company', CompanyViewSet, base_name='company')

urlpatterns = [
    re_path('^', include(router.urls)),
]

Now we have an API endpoint that allows making GET, POST, PUT, and PATCH requests to read, create, and update Company objects. In order to make sure it works just as we expect, we add some tests:

# tests/test_views.py
from django.test import TestCase
from django.urls import reverse
from rest_framework import status

from ..models import Company
from .factories import CompanyFactory, UserFactory


class CompanyViewSetTestCase(TestCase):
      def setUp(self):
          self.user = UserFactory(email='testuser@example.com')
          self.user.set_password('testpassword')
          self.user.save()
          self.client.login(email=self.user.email, password='testpassword')
          self.list_url = reverse('company-list')

      def get_detail_url(self, company_id):
          return reverse('company-detail', kwargs={'pk': company_id})

      def test_get_list(self):
          """GET the list page of Companies."""
          companies = [CompanyFactory() for i in range(0, 3)]

          response = self.client.get(self.list_url)

          self.assertEqual(response.status_code, status.HTTP_200_OK)
          self.assertEqual(
              set(company['id'] for company in response.data['results']),
              set(company.id for company in companies)
          )

      def test_get_detail(self):
          """GET a detail page for a Company."""
          company = CompanyFactory()
          response = self.client.get(self.get_detail_url(company.id))
          self.assertEqual(response.status_code, status.HTTP_200_OK)
          self.assertEqual(response.data['name'], company.name)

      def test_post(self):
          """POST to create a Company."""
          data = {
              'name': 'New name',
              'description': 'New description',
              'street_line_1': 'New street_line_1',
              'city': 'New City',
              'state': 'NY',
              'zipcode': '12345',
          }
          self.assertEqual(Company.objects.count(), 0)
          response = self.client.post(self.list_url, data=data)
          self.assertEqual(response.status_code, status.HTTP_201_CREATED)
          self.assertEqual(Company.objects.count(), 1)
          company = Company.objects.all().first()
          for field_name in data.keys():
                self.assertEqual(getattr(company, field_name), data[field_name])

      def test_put(self):
          """PUT to update a Company."""
          company = CompanyFactory()
          data = {
              'name': 'New name',
              'description': 'New description',
              'street_line_1': 'New street_line_1',
              'city': 'New City',
              'state': 'NY',
              'zipcode': '12345',
          }
          response = self.client.put(
              self.get_detail_url(company.id),
              data=data
          )
          self.assertEqual(response.status_code, status.HTTP_200_OK)

          # The object has really been updated
          company.refresh_from_db()
          for field_name in data.keys():
              self.assertEqual(getattr(company, field_name), data[field_name])

      def test_patch(self):
          """PATCH to update a Company."""
          company = CompanyFactory()
          data = {'name': 'New name'}
          response = self.client.patch(
              self.get_detail_url(company.id),
              data=data
          )
          self.assertEqual(response.status_code, status.HTTP_200_OK)

          # The object has really been updated
          company.refresh_from_db()
          self.assertEqual(company.name, data['name'])

      def test_delete(self):
          """DELETEing is not implemented."""
          company = CompanyFactory()
          response = self.client.delete(self.get_detail_url(company.id))
          self.assertEqual(response.status_code, status.HTTP_405_METHOD_NOT_ALLOWED)
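
Note that these tests import a UserFactory that wasn’t defined in Part 1. A minimal sketch (the field names assume Django’s default User model, and logging in by email as in setUp assumes an authentication backend that accepts email):

# tests/factories.py (continued)
from django.contrib.auth import get_user_model
from factory import DjangoModelFactory, Faker, Sequence


class UserFactory(DjangoModelFactory):
    username = Sequence(lambda n: 'user{}'.format(n))
    email = Faker('email')

    class Meta:
        model = get_user_model()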

As the app becomes more complicated, we add more functionality (and more tests) to handle things like permissions and required fields. For a quick way to limit permissions to authenticated users, we add the following to our settings file:

# settings file
REST_FRAMEWORK = {
    'DEFAULT_PERMISSION_CLASSES': ('rest_framework.permissions.IsAuthenticated',)
}

And add a test that only permissioned users can access the endpoint:

# tests/test_views.py
from django.test import TestCase
from django.urls import reverse
from rest_framework import status

from .factories import CompanyFactory, UserFactory


class CompanyViewSetTestCase(TestCase):

      ...

      def test_unauthenticated(self):
          """Unauthenticated users may not use the API."""
          self.client.logout()
          company = CompanyFactory()

          with self.subTest('GET list page'):
              response = self.client.get(self.list_url)
              self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)

          with self.subTest('GET detail page'):
              response = self.client.get(self.get_detail_url(company.id))
              self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)

          with self.subTest('PUT'):
              data = {
                  'name': 'New name',
                  'description': 'New description',
                  'street_line_1': 'New street_line_1',
                  'city': 'New City',
                  'state': 'NY',
                  'zipcode': '12345',
              }
              response = self.client.put(self.get_detail_url(company.id), data=data)
              self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)
              # The company was not updated
              company.refresh_from_db()
              self.assertNotEqual(company.name, data['name'])

          with self.subTest('PATCH'):
              data = {'name': 'New name'}
              response = self.client.patch(self.get_detail_url(company.id), data=data)
              self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)
              # The company was not updated
              company.refresh_from_db()
              self.assertNotEqual(company.name, data['name'])

          with self.subTest('POST'):
              data = {
                  'name': 'New name',
                  'description': 'New description',
                  'street_line_1': 'New street_line_1',
                  'city': 'New City',
                  'state': 'NY',
                  'zipcode': '12345',
              }
              response = self.client.post(self.list_url, data=data)
              self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)

          with self.subTest('DELETE'):
              response = self.client.delete(self.get_detail_url(company.id))
              self.assertEqual(response.status_code, status.HTTP_403_FORBIDDEN)
              # The company was not deleted
              self.assertTrue(Company.objects.filter(id=company.id).exists())

As our project grows, we can edit these permissions, make them more specific, and continue to add more complexity, but for now, these are reasonable defaults to start with.

Conclusion

Adding an API endpoint to a project can take a considerable amount of time, but with the Django Rest Framework tools, it can be done more quickly and be well-tested. Django Rest Framework’s helpers have let us create many endpoints at Caktus, so our process has become a lot more efficient while still maintaining good coding practices. That efficiency frees us to focus our efforts elsewhere as we continue to build sharp web apps.

Caktus Group: 7 Conferences We’re Looking Forward To

Above: The Internet Summit in Raleigh is one of the local conferences we recommend attending. (Photo by Ian Huckabee.)

At Caktus, we strongly believe in professional development and continued learning. We encourage our talented team to stay up to date with industry trends and technologies. During 2018, Cakti attended a number of conferences around the country. Below is a list (in alphabetical order) of the ones we found the most helpful, practical, and interesting. We look forward to attending these conferences again, and if you get the chance, we highly recommend that you check them out as well.

All Things Open

Recommended by Account Executive Tim Scales

Next Conference Location: Raleigh

Next Conference Date: October 13 - 15, 2019

All Things Open is a celebration and exploration of open source technology and its impact. Topics range from nuts and bolts sessions on database design and OSS languages to higher-level explorations of current trends like machine learning with Python and practical blockchain applications.

The annual conference is heavily publicized in the open source community, of which Caktus is an active member. All Things Open attracts open source thought leaders from across industries, and it’s a valuable learning experience for both non-technical newbies and expert developers to gain insight into a constantly evolving field.

Tim particularly enjoyed the session by expert Jordan Kasper of the Defense Digital Service about his efforts to implement open source development at the US Department of Defense with the code.mil project. It was an enlightening look at the use of open source in the federal government, including the challenges and opportunities involved.

Above: Hundreds of attendees at DjangoCon 2018. (Photo by Bartek Pawlik.)

DjangoCon

Recommended by Developer David Ray

Next Conference Location: TBD

Next Conference Date: September 22 - 27, 2019

DjangoCon is the preeminent conference for Django users, and as early proponents of Django, Caktus has sponsored the conference since 2009. We always enjoy the annual celebration of all things Django, featuring presentations and workshops from veteran Djangonauts and enthusiastic beginners.

David enjoyed Russell Keith-Magee’s informative and hilarious talk on navigating the complexities of time zones in computer programming. It’s rare for a conference talk to explore both the infinite complexity of a topic while also providing concrete tools to resolve the complexity as Russell did. Attending DjangoCon is a must for Django newbies and veterans alike. It’s an opportunity to sharpen your craft and explore the possibilities of our favorite framework in a fun, collaborative, and supportive environment.

Read more in our conference recap post and Django Depends on You: A Takeaway from DjangoCon.

Digital PM Summit

Recommended by Lead Project Manager Gannon Hubbard

Next Conference Location: Orlando

Next Conference Date: October 20 - 22, 2019

The Digital PM Summit is an annual gathering of project managers, which includes presentations, workshops, and breakout sessions focused on managing digital projects. The event is organized by the Bureau of Digital and was held in Memphis in 2018. Cakti attended and spoke at the conference in previous years; check out the highlights of Marketing Content Manager Elizabeth Michalka’s talk on investing in relationships from the 2016 conference.

The summit provides a unique opportunity for attendees to network and learn from each other. Indeed, the biggest draw for Gannon was the chance to be in the same room as so many other project managers. The project management (PM) role encompasses an array of activities — planning and defining scope, activity planning and sequencing, resource planning, time and cost estimating, and reporting progress, to name a few. There are professional books and months-long certificates to teach this knowledge, but nothing is better than being able to ask, “Hey, have you run into this before?” The ability to compare notes with PMs he doesn’t work with every day is invaluable, and the three most impactful sessions for Gannon were:

  • Rachel Gertz’s talk “Static to Signal: A Lesson in Alignment”
  • Meghan McInerney’s talk on “The Ride or Die PM”
  • Lynn Winter’s talk “PM Burnout: The Struggle Is Real”

Each of these talks seemed to build on one another. Rachel Gertz set the stage with her keynote by pointing out that the project manager is the nexus of a project. The course of a project is determined by countless small adjustments, and the PM is the one who makes those adjustments. Nine times out of 10, when a project fails, it’s because of something the PM did or didn’t do.

Also a keynote, Meghan McInerney’s talk (pictured) identified the primary attributes of a PM who’s at the top of their game. They’re reliable, adaptable, and a strategic partner for their clients. When you hit this ideal, you’re the one asking stakeholders and team members hard questions about “Should we?” — and as the PM, you’re the only one who can be counted on to ask those questions. Lynn Winter’s lightning talk cautions against giving too much of yourself over to this, though. As she pointed out, the role is often made up of all the tasks that no one else wants to do, and there’s a good chance at least a few of those tasks will take a toll on you. You have to make space for yourself if you’re going to be effective.

Internet Summit

Recommended by Chief Business Development Officer Ian Huckabee

Next Conference Location: Raleigh

Next Conference Date: November 13 - 14, 2019

The Internet Summit is a marketing conference that attracts impressive speakers and provides opportunities to stay on top of digital marketing trends and technologies that can help drive growth and success. Internet Summit conferences are organized by Digital Summit and are also held in various cities in addition to Raleigh.

Digital marketing is constantly changing, so it’s important to stay current. At Internet Summit, Ian heard valuable information from dozens of the country’s top digital marketers who shared current trends and best practices for using real-time data to build intelligence and improve customer interaction and engagement. Keynote speakers included marketing guru, author, and former dot-com business exec Seth Godin and founder of The Onion Scott Dikkers.

These summits are key to staying on trend with how people want to be reached, and the execution strategies are, in most cases, proven. For instance, behavioral targeting tools have evolved to the point where ABM (account-based marketing) can be extremely effective when executed properly. Also meaningful for Ian was the talk Marketing Analytics: Get the Insights You Need Faster, by Matt Hertig. Ian walked away from the workshop with tangible advice on managing large volumes of data to provide meaningful insights and analysis more quickly.

JupyterDay in the Triangle

Recommended by Chief Technical Officer and Co-founder Colin Copeland

Next Conference Location: Chapel Hill

Next Conference Date: TBD

In 2018, Colin was excited to attend JupyterDay in the Triangle because it provided a chance to learn from the greater Python/Jupyter community, and it took place right around the corner in Chapel Hill. He especially enjoyed the following presentations:

  • Matthew McCormick’s talk on Interactive 3D and 2D Image Visualization for Jupyter, which demonstrated how notebooks can be used to explore very large datasets and scientific imaging data
  • Joan Pharr’s talk on Learning in Jupyter, which focused on how her company uses Jupyter notebooks for SME training and onboarding new team members

Colin’s favorite presentation was Have Yourself a Merry Little Notebook by Ginny Gezzo of IBM (pictured). She focused on using Python notebooks to solve Advent of Code (https://adventofcode.com/) puzzles. The talk highlighted the value of notebooks as a tool for exploring and experimenting with problems that you don’t know how to solve. Researching solutions can easily lead in many directions, so it’s valuable to have a medium that records what you did and how you got there, and that lets you easily share your results with other team members.

Colin has worked with Jupyter notebooks and sees great value in them. For example, Caktus used them on a project with Open Data Policing to inspect merging multiple sheets of an Excel workbook in Pandas (see the project on GitHub).

Above: The Caktus team and booth during PyCon 2018.

PyCon

Recommended by Technology Support Specialist Scott Morningstar

Next Conference Location: Cleveland

Next Conference Date: May 1 - 9, 2019

Cakti regularly attend and sponsor PyCon. It’s the largest annual gathering of Python users and developers. The next conference will take place in Cleveland, OH, in May 2019.

The event also includes an “unconference” that runs in parallel with the scheduled talks. Scott especially enjoyed these open sessions during the 2018 conference. PyCon dedicates space to open sessions every year. These are informal, community-driven meetings where anybody can post a topic, with a time and place to gather. The open sessions Scott attended covered a wide range of topics, from using Python to control CNC milling machines to how reporters can use encryption to protect sources. He also enjoyed a session on Site Reliability Engineering (SRE), which included professionals from Intel, Google, and Facebook speaking about how they manage infrastructure at scale.

See our full recap of PyCon 2018, and our must-see talks series.

TestBash

Recommended by Quality Assurance Analyst Gerald Carlton

Next Conference Location: Varies

Next Conference Date: Multiple

TestBash was so good in 2017 that Gerald decided to attend again in 2018. The conferences are organized nationally and internationally by the Ministry of Testing, and in 2018, Gerald attended the event in San Francisco. He originally learned about the event on a QA testing forum called The Club. Read about what he loved at the 2017 conference.

Cookie with the TestBash logo.

TestBash is a single-track conference providing a variety of talks that cover all areas of testing. Talks range from topics such as manual testing, automation, and machine learning, to less technical topics including work culture and quality.

One particularly interesting talk was given by Paul Grizzaffi, a Principal Automation Architect at Magenic, who argued that automation development is software development. The same principles used when developing features for a website can also be used when building out the automation that tests that website. Just as code is written to build a website, code is also written to create the automated scripts that test it, so there is a valid argument for treating automation as software development. The talk highlighted that automation is sometimes seen as just an extra tool, when it’s actually something we build to perform a task, which isn’t so different from the process one would go through when developing a new website. Paul’s talk is available on The Dojo (with a free membership), and you can read more on his blog.

TestBash provides practical information that attendees can learn and take back to their teams to implement. Attendees not only learn from the speakers but also from each other by sharing their challenges and how they overcame them. It’s also a positive environment for networking and building friendships. Gerald met people at the conference who he’s stayed in touch with and who provide a lasting sounding board.

Worth Going Again

We recommend these conferences and look forward to attending them again because they provide such valuable learning and networking opportunities.

What conferences do you plan to attend this year? If you have any recommendations, please leave them in the comments below.

Caktus GroupThe Secret Lives of Cakti (Part 2)

Pictured from left: Our musically inclined Cakti, Dane Summers, Dan Poirier, and Ian Huckabee.

The first installment of the secret lives of Cakti highlighted some colorful extracurriculars (including rescuing cats, running endurance events, and saving lives). This time, we’re taking a look at our team’s unexpected musical talents.

If you Google musicians and programming, you’ll find dozens of posts exploring the correlation between musical talent and programming expertise. Possible factors include musicians’ natural attention to detail, their trained balance between analysis and creativity, and their comfort with both solitary focus and close collaboration.

Cakti are no exception to this, and creative talent runs deep across our team. Here are a few of our musical colleagues.

Dane Summers with his fretless banjo.

Appalachian Picker Dane Summers

Contract programmer Dane is inspired by old-time Appalachian music as both a banjo player and flat foot clogger. After ten years of learning to play, he’s managed to accumulate four banjos, but his favorite (and the only currently-functional one) is a fretless that he plays in the traditional Round Peak style. He’s working up to singing while he plays, at which point we hope he'll do an in-office concert.

The Multi-Talented Dan Poirier

Our sharp developer Dan has multiple musical passions. As a singer, he lends his baritone to Voices, a distinguished community chorus in Chapel Hill. You can also hear him a couple of times a month on WCPE, the Classical Station, as an announcer for Weekend Classics. Rumor has it that he’s also a dab hand at the ukulele, though until he shows off his talents at the office we won’t know for sure.

Blues Guitarist Ian Huckabee

Holding the distinction of being the only Caktus team member to jam with Harry Connick, Jr., our chief business development officer Ian has played blues guitar since he was 12 years old. He also started his professional career in the music business, managing the NYC recording studios for Sony Music Entertainment. His current musical challenge is mastering Stevie Ray Vaughan’s cover of Little Wing.

Waiting for the Band to Get Together

No word yet on whether Dan, Dane, and Ian are planning to start a Caktus band, but we’ll keep you posted. If they do, they’ll have more talent to draw from: our team also includes an opera singer, multiple guitarists, a fiddle player, and others.

Want to get to know us better? Drop us a line.

Caktus GroupHow to Use Django Bulk Inserts for Greater Efficiency

It's been a while since we last discussed bulk inserts on the Caktus blog. The idea is simple: if you have an application that needs to insert a lot of data into a Django model — for example, a background task that processes a CSV file (or some other text file) — it pays to "chunk" those updates to the database so that multiple records are created through a single database operation. This reduces the total number of round-trips to the database, something my colleague Dan Poirier discussed in more detail in the post linked above.
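
As a quick refresher, here's the basic idea in ORM terms (MyModel is a stand-in, not a model from a real project):

from myapp.models import MyModel  # hypothetical model

# One INSERT statement creates all three rows, instead of one round-trip
# per object.
MyModel.objects.bulk_create([
    MyModel(attr1='a'),
    MyModel(attr1='b'),
    MyModel(attr1='c'),
])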

Today, we use Django's Model.objects.bulk_create() regularly to help speed up operations that insert a lot of data into a database. One of those projects involves processing a spreadsheet with multiple tabs, each of which might contain thousands or even tens of thousands of records, some of which might correspond to multiple model classes. We also need to validate the data in the spreadsheet and return errors to the user as quickly as possible, so structuring the process efficiently helps to improve the overall user experience.

While it's great to have support for bulk inserts directly in Django's ORM, the ORM does not provide much assistance in terms of managing the bulk insertion process itself. One common pattern we found ourselves using for bulk insertions was to:

  1. build up a list of objects
  2. when the list got to a certain size, call bulk_create()
  3. make sure any objects remaining (i.e., which might be fewer than the chunk size of prior calls to bulk_create()) are inserted as well

Since for this particular project we needed to repeat the same logic for a number of different models in a number of different places, it made sense to abstract that into a single class to handle all of our bulk insertions. The API we were looking for was relatively straightforward:

  • Set bulk_mgr = BulkCreateManager(chunk_size=100) to create an instance of our bulk insertion helper with a specific chunk size (the number of objects that should be inserted in a single query)
  • Call bulk_mgr.add(unsaved_model_object) for each model instance we needed to insert. The underlying logic should determine if/when a "chunk" of objects should be created and does so, without the need for the surrounding code to know what's happening. Additionally, it should handle objects from any model class transparently, without the need for the calling code to maintain separate object lists for each model.
  • Call bulk_mgr.done() after adding all the model objects, to insert any objects that may have been queued for insertion but not yet inserted.

Without further ado, here's a copy of the helper class we came up with for this particular project:

from collections import defaultdict
from django.apps import apps


class BulkCreateManager(object):
    """
    This helper class keeps track of ORM objects to be created for multiple
    model classes, and automatically creates those objects with `bulk_create`
    when the number of objects accumulated for a given model class exceeds
    `chunk_size`.
    Upon completion of the loop that's `add()`ing objects, the developer must
    call `done()` to ensure the final set of objects is created for all models.
    """

    def __init__(self, chunk_size=100):
        self._create_queues = defaultdict(list)
        self.chunk_size = chunk_size

    def _commit(self, model_class):
        model_key = model_class._meta.label
        model_class.objects.bulk_create(self._create_queues[model_key])
        self._create_queues[model_key] = []

    def add(self, obj):
        """
        Add an object to the queue to be created, and call bulk_create if we
        have enough objs.
        """
        model_class = type(obj)
        model_key = model_class._meta.label
        self._create_queues[model_key].append(obj)
        if len(self._create_queues[model_key]) >= self.chunk_size:
            self._commit(model_class)

    def done(self):
        """
        Always call this upon completion to make sure the final partial chunk
        is saved.
        """
        for model_name, objs in self._create_queues.items():
            if len(objs) > 0:
                self._commit(apps.get_model(model_name))

You can then use this class like so:

import csv

# Note: csv.DictReader (rather than csv.reader) gives us rows keyed by
# column name, and in Python 3 the file should be opened in text mode.
with open('/path/to/file', newline='') as csv_file:
    bulk_mgr = BulkCreateManager(chunk_size=20)
    for row in csv.DictReader(csv_file):
        bulk_mgr.add(MyModel(attr1=row['attr1'], attr2=row['attr2']))
    bulk_mgr.done()

I tried to simplify the code here as much as possible for the purposes of this example, but you can obviously expand this as needed to handle multiple model classes and more complex business logic. You could also potentially put bulk_mgr.done() in its own finally: or except ExceptionType: block; however, you should be careful not to write to the database again if the original exception is database-related.

Another useful pattern might be to design this as a context manager in Python. We haven't tried that yet, but you might want to.
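
If you do try it, a minimal sketch (untested) might look like the following, using contextlib and flushing the final chunks only on a clean exit, so we don't write to the database after a database-related exception:

from contextlib import contextmanager


@contextmanager
def bulk_create_manager(chunk_size=100):
    # Wrap BulkCreateManager so callers can't forget to call done().
    mgr = BulkCreateManager(chunk_size=chunk_size)
    yield mgr
    # This line runs only if the with-block finished without raising.
    mgr.done()

Usage then becomes:

with bulk_create_manager(chunk_size=20) as bulk_mgr:
    for row in csv.DictReader(csv_file):
        bulk_mgr.add(MyModel(attr1=row['attr1'], attr2=row['attr2']))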

Good luck with speeding up your Django model inserts, and feel free to post below with any questions or comments!

Caktus GroupCaktus Blog: Top 18 Posts of 2018

In 2018, we published 44 posts on our blog, including technical how-to’s, a series on UX research methods, web development best practices, and tips for project management. Among all those posts, 18 rose to the top of the popularity list in 2018.

Most Popular Posts of 2018

  1. Creating Dynamic Forms with Django: Our most popular blog post delves into a straightforward approach to creating dynamic forms.

  2. Make ALL Your Django Forms Better: This post also focuses on Django forms. Learn how to efficiently build consistent forms, across an entire website.

  3. Django vs WordPress: How to decide?: Once you invest in a content management platform, the cost to switch later may be high. Learn about the differences between Django and WordPress, and see which one best fits your needs.

  4. Basics of Django Rest Framework: Django Rest Framework is a library which helps you build flexible APIs for your project. Learn how to use it, with this intro post.

  5. How to Fix your Python Code's Style: When you inherit code that doesn’t follow your style preferences, fix it quickly with the instructions in this post.

  6. Filtering and Pagination with Django: Learn to build a list page that allows filtering and pagination by enhancing Django with tools like django_filter.

  7. Better Python Dependency Management with pip-tools: One of our developers looked into using pip-tools to improve his workflow around projects' Python dependencies. See what he learned with pip-tools version 2.0.2.

  8. Types of UX Research: User-centered research is an important part of design and development. In this first post in the UX research series, we dive into the different types of research and when to use each one.

  9. Outgrowing Sprints: A Shift from Scrum to Kanban: Caktus teams have used Scrum for over two years. See why one team decided to switch to Kanban, and the process they went through.

  10. Avoiding the Blame Game in Scrum: The words we use, and the tone in which we use them, can either nurture or hinder the growth of Scrum teams. Learn about the importance of communicating without placing blame.

  11. What is Software Quality Assurance?: A crucial but often overlooked aspect of software development is quality assurance. Find out more about its value and why it should be part of your development process.

  12. Quick Tips: How to Find Your Project ID in JIRA Cloud: Have you ever created a filter in JIRA full of project names and returned to edit it, only to find all the project names replaced by five-digit numbers with no context? Learn how to find the project in both the old and new JIRA experience.

  13. UX Research Methods 2: Analyzing Behavior: Learn about UX research methods best suited to understand user behavior and its causes.

  14. UX Research Methods 3: Evaluating What Is: One set of techniques included in UX research involves evaluating the landscape and specific instances of existing user experience. Learn more about competitive landscape review.

  15. Django or Drupal for Content Management: Which Fits your Needs?: If you’re building or updating a website, you should integrate a content management system (CMS). See the pros and cons of Django and Drupal, and learn why we prefer Django.

  16. 5 Scrum Master Lessons Learned: Whether your team is new to Scrum or not, check out these lessons learned. Some are practical, some are abstract, and some are helpful reminders like “Stop being resistant to change, let yourself be flexible.”

  17. Add Value To Your Django Project With An API: This post for business users and beginning coders outlines what an API is and how it can add value to your web development project.

  18. Caktus Blog: Best of 2017: How appropriate that the last post in this list is about our most popular posts from the previous year! So, when you’ve read the posts above, check out our best posts from 2017.

Thank You for Reading Our Blog

We look forward to giving you more content in 2019, and we welcome any questions, suggestions, or feedback. Simply leave a comment below.

Caktus GroupMy New Year’s Resolution: Work Less to Code Better

You may look at my job title (or picture) and think, “Oh, this is easy, he’s going to resolve to stand up at his desk more.” Well, you’re not wrong, that is one of my resolutions, but I have an even more important one. I, Jeremy Gibson, resolve to do less work in 2019. You’re probably thinking that it’s bold to admit this on my employer’s blog. Again, you’re not wrong, but I think I can convince them that the less work I do, the more clear and functional my code will become. My resolution has three components.

1) I will stop using os.path to do path manipulations and will only use pathlib.Path on any project that uses Python 3.4+

I acknowledge that pathlib is better than me at keeping operating system eccentricities in mind. It is also better at keeping my code DRYer and more readable. I will not fight that.

Let's take a look at an example that is very close to parity. First, a simple case using os.path and pathlib.

  # Opening a file with os.path
  import os

  pn = 'my_file.txt'

  if not os.path.exists(pn):
    open(pn, 'a').close()

  with open(pn) as fh:
    ...  # Manipulate

Next, pathlib.Path

  # Opening a file with Path

  from pathlib import Path

  p = Path("my_file.txt")

  if not p.exists():
    p.touch()

  with p.open() as fh:
    ...  # Manipulate

This seems like a minor improvement, if any at all, but hear me out. The pathlib version is more internally consistent: pathlib sticks to its own idiom, whereas os.path must step outside of itself to accomplish path-related tasks like file creation. While this might seem minor, not having to code-switch to accomplish a task can be a big help for new developers and veterans alike.

Not convinced by the previous example? Here’s a more complex example of path work that you might typically run across during development — validating a set of files in one location and then moving them to another location, while making the code workable over different operating systems.

With os.path

  import os
  import shutil

  p_source = os.path.join(os.getcwd(), "my", "source", "path")
  p_target = os.path.join("some", "target", "path")

  for root, dirs, files in os.walk(p_source):
    for f in files:
      if f.endswith(".tgz"):
        # Validate the file here, setting `valid`
        if valid:
          shutil.move(os.path.join(root, f), p_target)

With pathlib

  from pathlib import Path

  # pathlib translates path separators between operating systems
  p_source = Path.cwd() / "my" / "source" / "path"
  p_target = Path("some/target/path")

  for pth in p_source.rglob("*.tgz"):
    # Validate the file here, setting `valid`
    if valid:
      pth.rename(p_target / pth.name)

Note: with pathlib I don't have to worry about os.sep. Less work! More readable!

Also, as in the first example, all path manipulation and control is now contained within the library, so there is no need to pull in outside os functions or the shutil module. To me, this is more satisfying. When working with paths, it makes sense to work with one type of object that understands itself as a path, rather than different collections of functions nested in other modules.

Ultimately, for me, this is a more human way to think about the processes that I am manipulating, which makes it easier and less work. Yaay!

2) I will start using f'' strings on Python 3.6+ projects.

I acknowledge that adding .format() is a waste of precious line characters (I'm looking at you, PEP 8) and % notation is unreadable. f'' strings make my code more elegant and easier to read. They also move closer to the other idioms used by Python, like r'' and b'' and the no-longer-necessary (if you are on Python 3) u''. Yes, this is a small thing, but less work is the goal.

    for k, v in somedict.items():
        print('The key is {}\n The value is {}'.format(k, v))

vs.

    for k, v in somedict.items():
        print(f'The key is {k}\n The value is {v}')

Another advantage in readability and maintainability is that I don't have to keep track of parameter position as before with .format(k, v) if I later decide that I really want v before k.

3) I will work toward, as much as possible, writing my tests before I write my code.

I acknowledge that I am bad about jumping into a problem, trying to solve it before I fully understand the behavior I want to see (don't judge me, I know some of you do this, too). I hope, foolishly, that the behavior will reveal itself as I solve the various problems that crop up.

Writing your tests first may seem counterintuitive, but hear me out. This is known as Test-Driven Development (TDD). Rediscovered by Kent Beck in 2003, it is a programming methodology that seeks to tackle the problem of managing code complexity with our puny human brains.

The basic concept is simple: to understand how to build your program, you must understand how it will fail. So the first thing you should do is write tests for the behaviors of your program. These tests will fail, and that is good, because now you (the programmer with the puny human brain) have a map for your code. As you make each test pass, you will quickly know if new code doesn't play well with the rest, because the other tests will start failing.
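
As a tiny illustration of that rhythm (slugify here is a made-up example function, not from any real project), the first failing test and the minimal code that satisfies it might look like:

    import unittest


    def slugify(text):
        # Step 2: written *after* the test below, and only enough to pass it.
        return text.replace(" ", "-")


    class TestSlugify(unittest.TestCase):
        # Step 1: describe the behavior we want before implementing it.
        def test_replaces_spaces_with_hyphens(self):
            self.assertEqual(slugify("new years post"), "new-years-post")


    if __name__ == "__main__":
        unittest.main()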

This idea is closely related to Acceptance Test Driven Development, which you may have also heard of, and is mentioned in this Caktus post.

It All Adds Up

Although these three parts of my resolution are not huge, together they will allow me to work less: initially, as I write the code, and then in the future when I come back to code I wrote two sprints ago that is now a mystery to me.

So that's it, I'm working less next year, and that will make my code better.

Caktus GroupHow to Fix your Python Code's Style

Sometimes we inherit code that doesn't follow the style guidelines we prefer when we're writing new code. We could just run flake8 on the whole codebase and fix everything before we continue, but that's not necessarily the best use of our time.

Another approach is to update the styling of files when we need to make other changes to them. To do that, it's helpful to be able to run a code style checker on just the files we're changing. I've written tools to do that for various source control systems and languages over the years. Here's the one I'm currently using for Python and flake8.

I call this script flake. I have a key in my IDE bound to run it and show the output so I can click on each line to go to the code that has the problem, which makes it pretty easy to fix things.

It can run in two modes. By default, it checks any files that have uncommitted changes. Or I can pass it the name of a git branch, and it checks all files that have changes compared to that branch. That works well when I'm working on a feature branch that is several commits downstream from develop and I want to be sure all the files I've changed while working on the feature are now styled properly.
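
In other words, the two modes look something like this (hypothetical invocations, assuming the script is saved as flake somewhere on your PATH):

$ flake            # check files with uncommitted changes
$ flake develop    # check files changed relative to the develop branch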

The script is in Python, of course.

Work from the repository root

Since we're going to work with file paths output from git commands, it's simplest if we first make sure we're in the root directory of the repository.

#!/usr/bin/env python3
import os
import os.path
import subprocess

if not os.path.isdir('.git'):
    print("Working dir: %s" % os.getcwd())
    result = subprocess.run(['git', 'rev-parse', '--show-toplevel'], stdout=subprocess.PIPE)
    dir = result.stdout.decode().rstrip('\n')
    os.chdir(dir)
    print("Changed to %s" % dir)

We use git rev-parse --show-toplevel to find out what the top directory in the repository working tree is, then change to it. But first we check for a .git directory, which tells us we don't need to change directories.

Find files changed from a branch

If a branch name is passed on the command line, we want to identify the Python files that have changed compared to that branch.

import sys
...
if len(sys.argv) > 1:
    # Run against files that are different from *branch_name*
    branch_name = sys.argv[1]
    cmd = ["git", "diff", "--name-status", branch_name, "--"]
    out = subprocess.check_output(cmd).decode('utf-8')
    changed = [
        # "M\tfilename"
        line[2:]
        for line in out.splitlines()
        if line.endswith(".py") and "migrations" not in line and line[0] != 'D'
    ]

We use git diff --name-status <branch-name> -- to list the changes compared to the branch. We skip file deletions — that means we no longer have a file to check — and migrations, which never seem to quite be PEP-8 compliant and which I've decided aren't worth trying to fix. (You may decide differently, of course.)

Find files with uncommitted changes

Alternatively, we just look at the files that have uncommitted changes.

else:
    # See what files have uncommitted changes
    cmd = ["git", "status", "--porcelain", "--untracked=no"]
    out = subprocess.check_output(cmd).decode('utf-8')
    changed = []
    for line in out.splitlines():
        if "migrations" in line:
            # Auto-generated migrations are rarely PEP-8 compliant. It's a losing
            # battle to always fix them.
            continue
        if line.endswith('.py'):
            if '->' in line:
                # A file was renamed. Consider the new name changed.
                parts = line.split(' -> ')
                changed.append(parts[1])
            elif line[0] == 'M' or line[1] != ' ':
                changed.append(line[3:])

Here we take advantage of git status --porcelain, which ensures the output won't change from one git version to the next and is fairly easy to parse in a script. (Maybe I should investigate using --porcelain with the other git commands in the script, but what I have now works well enough.)

Run flake8 on the changed files

Either way, changed now has a list of the files we want to run flake8 on.

cmd = ['flake8'] + changed
rc = subprocess.call(cmd)
if rc:
    print("Flake8 checking failed")
    sys.exit(rc)

Running flake8 with subprocess.call this way sends the output to stdout so we can see it. flake8 will exit with a non-zero status if there are problems; we print a message and also exit with a non-zero status.

Wrapping up

I might have once written a script like this in Shell or Perl, but Python turns out to work quite well once you get a handle on the subprocess module.

The resulting script is useful for me. I hope you'll find parts of it useful too, or at least see something you can steal for your own scripts.

Caktus GroupOur Top Tip for Computer Security

‘Tis the season for shopping online, sending cute holiday e-cards, and emailing photos to grandparents. But during all this festive online activity, how much do you think about your computer security? For example, is your password different for every shopping and e-card site that you use? If not, it should be!

Given that Friday, November 30, is Computer Security Day, it’s a good time to consider whether your online holiday habits are putting you at risk of a data breach. And our top tip is to use a different password for every website and online account. You’ve probably heard this a hundred times already, but it’s the first line of defense that you have against attacks.

We all should take computer and internet security seriously. The biggest threat to ordinary users is password reuse, like having the same (or similar) username and password combination for Amazon, Facebook, and your health insurance website. This issue is frighteningly common — the resource Have I Been Pwned has collected 5.6 billion username and password pairs since 2013. Once attackers breach one of your online accounts, they then try the same username and password on sites across the internet, looking for another match.

If one password on one website is breached, then all your other accounts with the same password are vulnerable.

It’s worth reiterating: Don’t use the same password on more than one website. Otherwise, your accounts are an easy target for an attacker to gain valuable data like your credit card number and go on a holiday shopping spree that’ll give you a headache worse than any eggnog hangover you’ve ever had!

More Tips to Fend Off an Online Grinch

Here are a few more tips for password security, to help protect your personal information from attacks, scams, phishing, and other unsavory Grinch-like activity:

  1. Create a strong password for every website and online account. A password manager like LastPass or 1Password can help you create unique passwords for every online account. Be sure to also choose a strong passphrase with 2-factor authentication for your password manager login, and then set it up to automatically generate passwords for you.

  2. Choose 2-factor authentication. Many websites now offer some means of 2-factor authentication. It takes a few more minutes to set up, but it’s worth it. Do this on as many websites as possible to make your logins more secure.

  3. Do not send personal or business-related passwords via email. It may be an easy means of communication, but email is not a secure method of communication.

Have Holly, Jolly Holidays

You have an online footprint consisting of various accounts, email providers, social media, and web browsing history. Essential personal info, like your health, banking, and credit records, is online, too. All of this info is valuable and sellable to someone, and the tools attackers use to steal your data are cheap. All they need is one credit card number, and the payoff may be huge. Don’t let that credit card number be yours; otherwise, you won’t have a very jolly holiday.

Be vigilant, especially around the holidays, when there’s an increase in online commerce and communication, and therefore a greater chance that an attacker may succeed in getting the info they want from you.

Caktus GroupDjango Depends on You: A Takeaway from DjangoCon

Photo by Bartek Pawlik.

DjangoCon 2018 attracted attendees from around the world, including myself and several other Cakti (check out our DjangoCon recap post). Having attended a number of DjangoCons in the past, I looked forward to reconnecting with old colleagues and friends within the community, learning new things about our favorite framework, and exploring San Diego.

While it was a privilege to attend DjangoCon in person, you can also experience it remotely: many of the talks are available online. For that, I’m grateful to the DjangoCon organizers, sponsors, and staff who put in the time and energy to ensure these talks are readily available for viewing on YouTube.

Learn How to Give Back to the Django Framework

While I listened to a lot of fascinating talks, there was one that stood out and was the most impactful to me. I also think it is relevant and important for the whole Django community. If you have not seen it, I encourage you to watch and rewatch Carlton Gibson’s “Your web framework needs you!". Carlton was named a Django Fellow in January of 2018 and provides a unique perspective on the state of Django as an open source software project, from the day-to-day management, to the (lack of) diversity amongst the primary contributors, to the ways that people can contribute at the code and documentation levels.

This talk resonated with me because I have worked with open source software my entire career. It has enabled me to bootstrap and build elegant solutions with minimal resources. Django and its ilk have afforded me opportunities to travel the globe and engage with amazing people. However, in over 15 years of experience, my contributions back to the software and communities that have served me well have been nominal in comparison to the benefits I have received. But I came away from the talk highly motivated to contribute more, and am eager to get that ball rolling.

Carlton says in his talk, “we have an opportunity to build the future of Django here.” He’s right, our web framework needs us, and via his talk you will discover how to get involved in the process, as well as what improvements are being made to simplify onboarding. I agree with Carlton, and believe it’s imperative to widen the net of contributors by creating multiple avenues for contributions that are easily accessible and well supported. Contributions are key to ensuring a sound future for the Django framework. Whether it’s improving documentation, increasing test coverage, fixing bugs, building new features, or some other aspect that piques your interest, be sure to do your part for your framework. The time that I am able to put toward contributing to open source software has always supplied an exponential return, so give it a try yourself!

Watch the talk to see how you can contribute to the Django framework.

Caktus GroupDjangoCon 2018 Recap

Above: Hundreds of happy Djangonauts at DjangoCon 2018. (Photo by Bartek Pawlik.)

That’s it, folks — another DjangoCon in the books! Caktus was thrilled to sponsor and attend this fantastic gathering of Djangonauts for the ninth year running. This year’s conference ran from October 14 - 19, in sunny San Diego. ☀️

Our talented Caktus contractor Erin Mullaney was a core member of this year’s DjangoCon organizing team, plus five more Cakti joined as participants: CTO Colin Copeland, technical manager Karen Tracey, sales engineer David Ray, CBDO Ian Huckabee, and myself, account exec Tim Scales.

What a Crowd!

At Caktus we love coding with Django, but what makes Django particularly special is the remarkable community behind it. From the inclusive code of conduct to the friendly smiles in the hallways, DjangoCon is a welcoming event and a great opportunity to meet and learn from amazing people. With over 300 Django experts and enthusiasts attending from all over the world, we loved catching up with old friends and making new ones.

What a Lineup!

DjangoCon is three full days of impressive and inspiring sessions from a diverse lineup of presenters. Between the five Cakti there, we managed to attend almost every one of the presentations.

We particularly enjoyed Anna Makarudze’s keynote address about her journey with coding, Russell Keith-Magee’s hilarious talk about tackling time zone complexity, and Tom Dyson’s interactive presentation about Django and Machine Learning. (Videos of the talks should be posted soon by DjangoCon.)

What a Game!

Thanks to the 30+ Djangonauts who joined us for the Caktus Mini Golf Outing on Tuesday, October 16! Seven teams putted their way through the challenging course at Belmont Park, talking Django and showing off their mini golf skills. We had fun meeting new friends and playing a round during the beautiful San Diego evening.

Thanks to all the organizers, volunteers, and fellow sponsors who made DjangoCon 2018 a big success. We look forward to seeing you again next year!

Caktus GroupFiltering and Pagination with Django

If you want to build a list page that allows filtering and pagination, you have to get a few separate things to work together. Django provides some tools for pagination, but the documentation doesn't tell us how to make that work with anything else. Similarly, django_filter makes it relatively easy to add filters to a view, but doesn't tell you how to add pagination (or other things) without breaking the filtering.

The heart of the problem is that both features use query parameters, and we need to find a way to let each feature control its own query parameters without breaking the other one.

Filters

Let's start with a review of filtering, with an example of how you might subclass ListView to add filtering. To make it filter the way you want, you need to create a subclass of FilterSet and set filterset_class to that class. (See the django_filter documentation for how to write a filterset.)

class FilteredListView(ListView):
    filterset_class = None

    def get_queryset(self):
        # Get the queryset however you usually would.  For example:
        queryset = super().get_queryset()
        # Then use the query parameters and the queryset to
        # instantiate a filterset and save it as an attribute
        # on the view instance for later.
        self.filterset = self.filterset_class(self.request.GET, queryset=queryset)
        # Return the filtered queryset
        return self.filterset.qs.distinct()

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        # Pass the filterset to the template - it provides the form.
        context['filterset'] = self.filterset
        return context

Here's an example of how you might create a concrete view to use it:

class BookListView(FilteredListView):
    filterset_class = BookFilterset

And here's part of the template that uses a form created by the filterset to let the user control the filtering.

<h1>Books</h1>
  <form action="" method="get">
    {{ filterset.form.as_p }}
    <input type="submit" />
  </form>

<ul>
    {% for object in object_list %}
        <li>{{ object }}</li>
    {% endfor %}
</ul>

filterset.form is a form that controls the filtering, so we just render that however we want and add a way to submit it.

That's all you need to make a simple filtered view.

Default values for filters

I'm going to digress slightly here, and show a way to give filters default values, so when a user loads a page initially, for example, the items will be sorted with the most recent first. I couldn't find anything about this in the django_filter documentation, and it took me a while to figure out a good solution.

To do this, I override __init__ on my filter set and add default values to the data being passed:

class BookFilterSet(django_filters.FilterSet):
    def __init__(self, data, *args, **kwargs):
        data = data.copy()
        data.setdefault('format', 'paperback')
        data.setdefault('order', '-added')
        super().__init__(data, *args, **kwargs)

I tried some other approaches, but this seemed to work out the simplest, in that it didn't break or complicate things anywhere else.
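
Pagination

Pagination itself is the easy part: Django's ListView supports it out of the box once you set paginate_by on the view. A minimal sketch (the page size of 50 is arbitrary):

class BookListView(FilteredListView):
    filterset_class = BookFilterset
    paginate_by = 50

The template can then build links to other pages with the page query parameter, for example <a href="?page={{ page_obj.next_page_number }}">Next</a>. Those simple links work fine on their own, but they interact badly with filtering, as we'll see next.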

Combining filtering and pagination

Unfortunately, linking to pages as described above breaks filtering. More specifically, whenever you follow one of those links, the view will forget whatever filtering the user has applied, because that filtering is also controlled by query parameters, and these links don't include the filter's parameters.

So if you're on a page https://example.com/objectlist/?type=paperback and then follow a page link, you'll end up at https://example.com/objectlist/?page=3 when you wanted to be at https://example.com/objectlist/?type=paperback&page=3.

It would be nice if Django helped out with a way to build links that set one query parameter without losing the existing ones, but I found a nice example of a template tag on StackOverflow and modified it slightly into this custom template tag that helps with that:

# <app>/templatetags/my_tags.py
from django import template

register = template.Library()


@register.simple_tag(takes_context=True)
def param_replace(context, **kwargs):
    """
    Return encoded URL parameters that are the same as the current
    request's parameters, only with the specified GET parameters added or changed.

    It also removes any empty parameters to keep things neat,
    so you can remove a param by setting it to ``""``.

    For example, if you're on the page ``/things/?with_frosting=true&page=5``,
    then

    <a href="/things/?{% param_replace page=3 %}">Page 3</a>

    would expand to

    <a href="/things/?with_frosting=true&page=3">Page 3</a>

    Based on
    https://stackoverflow.com/questions/22734695/next-and-before-links-for-a-django-paginated-query/22735278#22735278
    """
    d = context['request'].GET.copy()
    for k, v in kwargs.items():
        d[k] = v
    for k in [k for k, v in d.items() if not v]:
        del d[k]
    return d.urlencode()

Here's how you can use that template tag to build pagination links that preserve other query parameters used for things like filtering:

{% load my_tags %}

{% if is_paginated %}
  {% if page_obj.has_previous %}
    <a href="?{% param_replace page=1 %}">First</a>
    {% if page_obj.previous_page_number != 1 %}
      <a href="?{% param_replace page=page_obj.previous_page_number %}">Previous</a>
    {% endif %}
  {% endif %}

  Page {{ page_obj.number }} of {{ paginator.num_pages }}

  {% if page_obj.has_next %}
    {% if page_obj.next_page_number != paginator.num_pages %}
      <a href="?{% param_replace page=page_obj.next_page_number %}">Next</a>
    {% endif %}
    <a href="?{% param_replace page=paginator.num_pages %}">Last</a>
  {% endif %}

  <p>Objects {{ page_obj.start_index }} to {{ page_obj.end_index }}</p>
{% endif %}

Now, if you're on a page like https://example.com/objectlist/?type=paperback&page=3, the links will look like ?type=paperback&page=2, ?type=paperback&page=4, etc.

Caktus GroupThe Secret Lives of Cakti

Pictured from left: Caktus team members Vinod Kurup, Karen Tracey, and David Ray.

The Caktus team includes expert developers, sharp project managers, and eagle-eyed QA analysts. However, you may not know that there’s more to them than meets the eye. Here’s a peek at how Cakti spend their off-hours.

Vinod Kurup, M.D.

By day Vinod is a mild-mannered developer, but at night he swaps his keyboard for a stethoscope and heads to the hospital. Vinod’s first career was in medicine, and prior to Caktus he worked many years as an MD. While he’s now turned his expertise to programming, he still works part-time as a hospitalist. Now that’s what I call a side hustle.

Karen Tracey, Cat Rescuer

When Karen isn’t busy as both lead developer and technical manager for Caktus, she works extensively with Alley Cats and Angels, a local cat rescue organization dedicated to improving the lives and reducing the population of homeless cats in the Triangle area. She regularly fosters cats and kittens, which is why you sometimes find feline friends hanging out in the Caktus office.

David Ray, Extreme Athlete

Software development and extreme physical endurance training don’t generally go together, but let me introduce you to developer/sales engineer David. When not building solutions for Caktus clients, David straps on a 50-pound pack and completes 24-hour rucking events. Needless to say, he’s one tough Caktus. (Would you believe he’s also a trained opera singer?)

David Ray at a rucking event.

Pictured: David Ray at a recent rucking event.

These are just a few of our illustrious colleagues! Our team also boasts folk musicians, theater artists, sailboat captains, Appalachian cloggers, martial artists, and more.

Want to get to know us better? Drop us a line.

Caktus GroupDjango or Drupal for Content Management: Which Fits your Needs?

If you’re building or updating a website, you’re probably wondering about which content management system (CMS) to use. A CMS helps users — particularly non-technical users — to add pages and blog posts, embed videos and images, and incorporate other content into their site.

CMS options

You could go with something quick and do-it-yourself, like WordPress (read more about WordPress) or a drag-and-drop builder like Squarespace. If you need greater functionality, like user account management or asset tracking, or if you’re concerned about security and extensibility, you’ll need a more robust CMS. That means using a framework to build a complex website that can manage large volumes of data and content.

Wait, what’s a framework?

Put simply, a framework is a library of reusable code that is easily edited by a web developer to produce custom products more quickly than coding everything from scratch.

Django and Drupal are both frameworks with dedicated functionality for content management, but there is a key difference between them:

  • Drupal combines aspects of a web application framework with aspects of a CMS
  • Django separates the framework and the CMS

The separation that Django provides makes it easier for content managers to use the CMS because they don’t have to tinker with the technical aspects of the framework. A popular combination is Django and Wagtail, which is our favorite CMS.

I think I’ve heard of Drupal ...

Drupal is open source and built with the PHP programming language. For some applications, its customizable templates and quick integrations make it a solid choice. It’s commonly used in higher education settings, among others.

However, Drupal’s predefined templates and plugins can also be its weakness: while they are useful for building a basic site, they are limiting if you want to scale the application. You’ll quickly run into challenges attempting to extend the basic functionality, including adding custom integrations and nonstandard data models.

Other criticisms include:

  • Poor backwards compatibility, particularly for versions earlier than Drupal 7. In this case, updating a Drupal site requires developers to rewrite code for elements of the templates and modules to make them compatible with the newest version. Staying up-to-date is important for security reasons, which can become problematic if the updates are put off too long.
  • Unit testing is difficult due to Drupal’s method of storing configurations in a database, making it difficult to test the effects of changes to sections of the code. Failing to do proper testing may allow errors to make it to the final version of the website.
  • Another database-related challenge lies in how the site configuration is managed. If you’re trying to implement changes on a large website consisting of thousands of individual content items or users, none of the things that usually make this easier — like the ability to view line-by-line site configuration changes during code review — are possible.

What does the above mean for non-technical stakeholders? Development processes are slowed down significantly because developers have to pass massive database files back and forth with low visibility into the changes made by other team members. It also means there is an increased likelihood that errors will reach the public version of your website, creating even more work to fix them.

Caktus prefers Django

Django is used by complex, high-profile websites, including Instagram, Pinterest, and Eventbrite. It’s written in the powerful, open-source Python programming language, and Django itself was created specifically to speed up the process of web development. It’s fast, secure, scalable, and intended for use with database-driven web applications.

A huge benefit of Django is more control over customization, plus data can easily be converted. Since it’s built on Python, Django uses a paradigm called object-oriented programming, which makes it easier to manage and manipulate data, troubleshoot errors, and re-use code. It’s also easier for developers to see where changes have been made in the code, simplifying the process of updating the application after it goes live.

How to choose the right tool

Consider the following factors when choosing between Drupal and Django:

  • Need for customization
  • Internal capacity
  • Planning for future updates

Need for customization: If your organization has specific, niche features or functionality that require custom development — for example, data types specific to a library, university, or scientific application — Django is the way to go. It requires more up-front development than template-driven Drupal but allows greater flexibility and customization. Drupal is a good choice if you’re happy to use templates to build your website and don’t need customization.

Internal capacity: Drupal’s steep learning curve means that it may take some time for content managers to get up to speed. In comparison, we’ve run training workshops that get content management teams up and running on Django-based Wagtail in only a day or two. Wagtail’s intuitive user interface makes it easier to manage regular content updates, and the level of customization afforded by Django means the user interface can be developed in a way that feels intuitive to users.

Planning for future updates: Future growth and development should be taken into account when planning a web project. The choices made during the initial project phase will impact the time, expense, and difficulty of future development. As mentioned, Drupal has backwards compatibility challenges, and therefore a web project envisioned as fast-paced and open to frequent updates will benefit from a custom Django solution.

Need a second opinion?

Don’t just take our word for it. Here’s what Brad Busenius at the University of Chicago says about their Django solution:

"[It impacts] the speed and ease at which we can create highly custom interfaces, page types, etc. Instead of trying to bend a general system like Drupal to fit our specific needs, we're able to easily build exactly what we want without any additional overhead. Also, since we're often understaffed, the fact that it's a developer-friendly system helps us a lot. Wagtail has been a very positive experience so far."

The bottom line

Deciding between Django and Drupal comes down to your specific needs and goals, and it’s worth considering the options. That said, based on our 10+ years of experience developing custom websites and web applications, we almost always recommend Django with Wagtail because it’s:

  • Easier to update and maintain
  • More straightforward for content managers to learn and use
  • More efficient with large data sets and complex queries
  • Less likely to let errors slip through the cracks

If you want to consider Django and whether it will suit your next project, we’d be happy to talk it through and share some advice. Get in touch with us.

Caktus GroupDiverse Speaker Line-up for DjangoCon is Sharp

Above: Caktus Account Manager Tim Scales gears up for DjangoCon.

We’re looking forward to taking part in the international gathering of Django enthusiasts at DjangoCon 2018, in San Diego, CA. We’ll be there from October 14 - 19, and we’re proud to attend as sponsors for the ninth year! As such, we’re hosting a mini golf event for attendees (details below).

This year’s speakers are impressive, thanks in part to Erin Mullaney, one of Caktus’ talented developers, who volunteered with DjangoCon’s Program Team. The three-person team, including Developer Jessica Deaton of Wilmington, NC, and Tim Allen, IT Director at The Wharton School, reviewed 257 speaker submissions. They ultimately chose the speakers with the help of a rating system that included community input.

“It was a lot of fun reading the submissions,” said Erin, who will also attend DjangoCon. “I’m really looking forward to seeing the talks this year, especially because I now have a better understanding of how much work goes into the selection process.”

Erin and the program team also created the talk schedule. The roster of speakers includes more women and underrepresented communities due to the DjangoCon diversity initiatives, which Erin is proud to support.

What we’re excited about

Erin said she’s excited about a new State of Django panel that will take place on Wednesday, October 17, which will cap off the conference portion of DjangoCon, before the sprints begin. It should be an informative wrap-up session.

Karen Tracey, our Lead Developer and Technical Manager, is looking forward to hearing “Herding Cats with Django: Technical and social tools to incentivize participation” by Sage Sharp. This talk seems relevant to the continued vibrancy of Django's own development, said Karen, since the core framework and various standard packages are developed with limited funding and rely tremendously on volunteer participation.

Our Account Manager Tim Scales is particularly excited about Tom Dyson’s talk, “Here Come The Robots,” which will explore how people are leveraging Django for machine learning solutions. This is an emerging area of interest for our clients, and one of particular interest to Caktus as we grow our areas of expertise.

We’re looking forward to plenty of other talks as well.

Follow us on Twitter @CaktusGroup and #DjangoCon to stay tuned on the talks.

Golf anyone?

If you’re attending DjangoCon, come play a round of mini golf with us. Look for our insert in your conference tote bag. It includes a free pass to a mini golf outing that we’re hosting at Tiki Town Adventure Golf on Tuesday, October 16, at 7:00 p.m. (please RSVP online). The first round of golf is on us! Whoever shoots the lowest score will win a $100 Amazon gift card.*

No worries if you’re not into mini golf! Instead, find a time to chat with us one-on-one during DjangoCon.

*In the event of a tie, the winner will be selected from a random drawing from the names of those with the lowest score. Caktus employees can play, but are not eligible for prizes.

Caktus GroupBetter Python Dependency Management with pip-tools

I recently looked into whether I could use pip-tools to improve my workflow around projects' Python dependencies. My conclusion was that pip-tools would help on some projects, but it wouldn't do everything I wanted, and I couldn't use it everywhere. (I tried pip-tools version 2.0.2 in August 2018. If there are newer versions, they might fix some of the things I ran into when trying pip-tools.)

My problems

What were the problems I wanted to find solutions for, that just pip wasn't handling? Software engineer Kenneth Reitz explains them pretty well in his post, but I'll summarize here.

Let me start by briefly describing the environments I'm concerned with. First is my development environment, where I want to manage the dependencies. Second is the test environment, where I want to know exactly what packages and versions we test with, because then we come to the deployed environment, where I want to use exactly the same Python packages and versions as I've used in development and testing, to be sure no problems are introduced by an unexpected package upgrade.

The way we often handle that is to have a requirements file with every package and its version specified. We might start by installing the packages we know that we need, then saving the output of pip freeze to record all the dependencies that also got installed and their versions. Installing into an empty virtual environment using that requirements file gets us the same packages and versions.
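
In shell terms, that traditional workflow looks something like this (celery standing in for whatever package we actually need):

$ <venv>/bin/pip install celery
$ <venv>/bin/pip freeze > requirements.txt

# ... later, in a fresh virtual environment:
$ <venv>/bin/pip install -r requirements.txt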

But there are several problems with that approach.

First, we no longer know which packages in that file we originally wanted, and which were pulled in as dependencies. For example, maybe we needed Celery, but installing it pulled in a half-dozen other packages. Later we might decide we don't need Celery anymore and remove it from the requirements file, but we don't know which other packages we can also safely remove.

Second, it gets very complicated if we want to upgrade some of the packages, for the same reasons.

Third, having to do a complete install of all the packages into an empty virtual environment can be slow, which is especially aggravating when we know little or nothing has changed, but that's the only way to be sure we have exactly what we want.

Requirements

To list my requirements more concisely:

  • Distinguish direct dependencies and versions from incidental
  • Freeze a set of exact packages and versions that we know work
  • Have one command to efficiently update a virtual environment to have exactly the frozen packages at the frozen versions and no other packages
  • Make it reasonably easy to update packages
  • Work with both installing from PyPI, and installing from Git repositories
  • Take advantage of pip's hash checking to give a little more confidence that packages haven't been modified
  • Support multiple sets of dependencies (e.g. dev vs. prod, where prod is not necessarily a subset of dev)
  • Perform reasonably well
  • Be stable

That's a lot of requirements. It turned out that I could meet more of them with pip-tools than just pip, but not all of them, and not for all projects.

Here's what I tried, using pip, virtualenv, and pip-tools.

How to set it up

  1. I put the top-level requirements in requirements.in/*.txt.

    To manage multiple sets of dependencies, we can include "-r file.txt", where "file.txt" is another file in requirements.in, as many times as we want. So we might have a base.txt, a dev.txt that starts with -r base.txt and then adds django-debug-toolbar etc, and a deploy.txt that starts with -r base.txt and then adds gunicorn.

    There's one annoyance that seems minor at this point, but turns out to be a bigger problem: pip-tools only supports URLs in these requirements files if they're marked editable with -e.

# base.txt
Django<2.0
-e git+https://github.com/caktus/django-scribbler@v0.8.0#egg=django-scribbler

# dev.txt
-r base.txt
django-debug-toolbar

# deploy.txt
-r base.txt
gunicorn
  2. Install pip-tools in the relevant virtual environment:
$ <venv>/bin/pip install pip-tools
  3. Compile the requirements as follows:
$ <venv>/bin/pip-compile --output-file requirements/dev.txt requirements.in/dev.txt

This looks only at the requirements file(s) we tell it to look at, and not at what's currently installed in the virtual environment. So one unexpected benefit is that pip-compile is faster and simpler than installing everything and then running pip freeze.

The output is a new requirements file at requirements/dev.txt.

pip-compile nicely puts a comment at the top of the output file to tell developers exactly how the file was generated and how to make a newer version of it.

#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --output-file requirements/dev.txt requirements.in/dev.txt
#
-e git+https://github.com/caktus/django-scribbler@v0.8.0#egg=django-scribbler
django-debug-toolbar==1.9.1
django==1.11.15
pytz==2018.5
sqlparse==0.2.4           # via django-debug-toolbar
  4. Be sure requirements, requirements.in, and their contents are in version control.

How to make the current virtual environment have the same packages and versions

To update your virtual environment to match your requirements file, ensure pip-tools is installed in the desired virtual environment, then:

$ <venv>/bin/pip-sync requirements/dev.txt

And that's all. There's no need to create a new empty virtual environment to make sure only the listed requirements end up installed. If everything is already as we want it, no packages need to be installed at all. Otherwise only the necessary changes are made. And if there's anything installed that's no longer mentioned in our requirements, it gets removed.

Except ...

pip-sync doesn't seem to know how to uninstall the packages that we installed using -e <URL>. I get errors like this:

Can't uninstall 'pkgname1'. No files were found to uninstall.
Can't uninstall 'pkgname2'. No files were found to uninstall.

I don't really know, then, whether pip-sync is keeping those packages up to date. Maybe before running pip-sync, I could just

rm -rf $VIRTUAL_ENV/src

to delete any packages that were installed with -e? But that's ugly and would be easy to forget, so I don't want to do that.

How to update versions

  1. Edit requirements.in/dev.txt if needed.
  2. Run pip-compile again, exactly as before (see the note on upgrading below):
$ <venv>/bin/pip-compile --output-file requirements/dev.txt requirements.in/dev.txt
  3. Update the requirements files in version control.
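
One detail worth knowing: when the output file already exists, pip-compile generally keeps the versions that are already pinned there rather than hunting for newer ones. Recent versions of pip-tools have --upgrade and --upgrade-package flags for that (django below is just an example package; check pip-compile --help in your version for the exact options):

$ <venv>/bin/pip-compile --upgrade --output-file requirements/dev.txt requirements.in/dev.txt
$ <venv>/bin/pip-compile --upgrade-package django --output-file requirements/dev.txt requirements.in/dev.txt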

Hash checking

I'd like to use hash checking, but I can't yet. pip-compile can generate hashes for packages we will install from PyPI, but not for ones we install with -e <URL>. Also, pip-sync doesn't check hashes. pip install will check hashes, but if any package has a hash, it will fail unless every package has one. So if we have any -e <URL> packages, we have to turn off hash generation, or we won't be able to pip install with the compiled requirements file. We could still use pip-sync with the requirements file, but since pip-sync doesn't check hashes, there's not much point in having them, even if we don't have any -e packages.
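
For a project with no -e packages, the pieces that do work fit together something like this (a sketch; flag names as I understand them from the pip-compile and pip documentation):

$ <venv>/bin/pip-compile --generate-hashes --output-file requirements/dev.txt requirements.in/dev.txt
$ <venv>/bin/pip install --require-hashes -r requirements/dev.txt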

What about pipenv?

Pipenv promises to solve many of these same problems. Unfortunately, it imposes other constraints on my workflow that I don't want. It's also changing too fast at the moment to rely on in production.

Pipenv solves several of the requirements I listed above, but fails on these:

  • It only supports two sets of requirements (base, and base plus dev), not arbitrary sets as I'd like.
  • It can be very slow.
  • It's not (yet?) stable: the interface and behavior are changing constantly, sometimes multiple times in the same day.

It also introduces some new constraints on my workflow. Primarily, it wants to control where the virtual environment is in the filesystem. That prevents me both from putting my virtual environment where I'd like it to be and from using different virtual environments with the same working tree.

Shortcomings

pip-tools still has some shortcomings, in addition to the problems with checking hashes I've already mentioned.

Most concerning are the errors from pip-sync when packages have previously been installed using -e <URL>. I feel this is an unresolved issue that needs to be fixed.

Also, I'd prefer not to have to use -e at all when installing from a URL.

This workflow is more complicated than the one we're used to, though I don't think it's any more complicated than what we'd have with pipenv.

The number and age of open issues in the pip-tools git repository worry me. True, there are orders of magnitude fewer than in some projects, but it still suggests to me that pip-tools isn't as well maintained as I might like if I'm going to rely on it in production.

Conclusions

I don't feel that I can trust pip-tools when I need to install packages from Git URLs.

But many projects don't need to install packages from Git URLs, and for those, I think adding pip-tools to my workflow might be a win. I'm going to try it with some real projects and see how that goes for a while.

Josh JohnsonState And Events In CircuitPython: Part 3: State And Microcontrollers And Events (Oh My!)

In this part of the series, we'll apply what we've learned about state to our simple testing code from part one.

Not only will we debounce some buttons without blocking, we'll use state to more efficiently control some LEDs.

We'll also explore what happens when state changes, and how we can take advantage of that to do even more complex things with very little code, using the magic of event detection 🌈.

All of this will be done in an object-oriented fashion, so we'll learn a lot about OOP as we go along.

Josh JohnsonState And Events In CircuitPython: Part 2: Exploring State And Debouncing The World

In this part of the series, we're going to really dig into what state actually is. We'll use analogies from real life, and then look at how we might model real-life state using Python data structures.

But first, we'll discuss a common problem that all budding electronics engineers have to deal with at some point: "noisy" buttons and how to make them "un-noisy", commonly referred to as "debouncing".

We'll talk about fixing the problem in the worst, but maybe easiest way: by blocking. We'll also talk about why it's bad.
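
To give a taste of what that looks like, here's a minimal sketch of the blocking approach (this isn't code from the article, and board.D4 is just an assumed button pin):

import time
import board
import digitalio

# Button wired between D4 and ground, using the internal pull-up,
# so the pin reads False while the button is held down.
button = digitalio.DigitalInOut(board.D4)
button.direction = digitalio.Direction.INPUT
button.pull = digitalio.Pull.UP

while True:
    if not button.value:  # pressed (pulled low)
        print("button pressed!")
        # "Debounce" by sleeping past the noisy period. This blocks:
        # nothing else in the program can run while we sleep.
        time.sleep(0.2)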

Caktus GroupNational Day of Civic Hacking in Durham

Pictured: Simone Sequeira, Senior Product Manager of GetCalFresh, with event attendees at Caktus.

On August 11, I attended the National Day of Civic Hacking hosted by Code for Durham. More than 30 attendees came to the event, hosted in the Caktus Group Tech Space, to collaborate on civic projects that focus on the needs of Durham residents.

National Day of Civic Hacking is a nationwide day of action that brings together civic leaders, local government officials, and community organizers who volunteer their skills to help their local community. Simone Sequeira, Senior Product Manager of GetCalFresh, came from Oakland to participate and present at our Durham event. Simone inspired us with a presentation of GetCalFresh, a Code for America-supported project that streamlines the application process for food assistance in California. It started as just an idea and turned into a product used statewide that’s supported by more than half a dozen employees. Many Code for Durham projects also start as ideas, and the National Day of Civic Hacking provided an opportunity to turn those ideas into realities.

Pictured: Laura Biedeger, a Team Captain at Code for Durham and a co-organizer of the event, speaks to attendees. I'm standing to the left.

Durham Projects

We worked on a variety of projects in Durham, including the following:

One group of designers, programmers, and residents audited the Code for Durham website. The group approached the topic from a user-centered design perspective: they identified and defined user personas and wrote common scenarios of visitors to the site. By the end of the event they had documented the needs of the site and designed mockups for the new site.

Regular volunteers with Code for Durham have been working with the Durham Innovation Team to create an automated texting platform for the Drivers License Restoration Initiative, which aims to support a regular amnesty of driver’s license suspensions in partnership with the Durham District Attorney’s Office. During our event, volunteers added a Spanish-language track to the platform.

The “Are We Represented?” project focused on voter education: showing how the makeup of County Commissioner boards across the state compares to the population of their counties. During the event I worked with Jason Jones, the Analytics and Innovation Manager of Greensboro, to deploy the project to the internet (and we succeeded!).

Pictured: The Are We Represented group reviews State Board of Elections data files.

Another group partnered with End Hunger in Durham, which provides a regularly updated list of food pantries and food producers (gardeners, farmers, grocery stores, bakeries) that regularly donate surplus food. The volunteers reviewed an iOS app they had developed to easily find a pantry, and discussed the development of an Android app.

Join Us Next Time!

The National Day of Civic Hacking gave volunteers a chance to get inspired about new project opportunities, to meet new volunteers and city employees, and to focus on a project for an extended period of time. The projects will continue at Code for Durham’s regularly hosted Meetup at the Caktus Group Tech Space. Volunteers are always welcome, so join us at the next Meetup!

Josh JohnsonState And Events In CircuitPython: Part 1: Setup

This is the first article in a series that explores concepts of state in CircuitPython.

In this installment, we discuss the platform we're using (both CircuitPython and the Adafruit M0/M4 boards that support it), and build a simple circuit for demonstration purposes. We'll also talk a bit about abstraction.

This series is intended for people who are new to Python, programming, and/or microcontrollers, so there's an effort to explain things as thoroughly as possible. However, experience with basic Python would be helpful.

Caktus GroupComplicated Project? Start with our Discovery Workshop Guide

If you’ve ever struggled to implement a complicated development project, starting your next one with a discovery workshop will help. Discovery workshops save you time and money over the course of a project because we help you answer important questions in advance, ensuring that the final product lines up with your primary end goals. Our new guide, Shared Understanding: A Guide to Caktus Discovery Workshops, demonstrates the value of these workshops and why we’ve made them a core component of our client services.

Set Your Project Up for Success

Discovery workshops are vital in defining a project and are an ideal way to overcome the challenges that arise when multiple stakeholders have varying opinions and conflicting visions. By facilitating a discovery workshop, we create a shared understanding of the project and ultimately streamline the development process to ensure that our clients get the best value for their investment. Projects that begin with a discovery phase are more successful for these simple reasons:

  • They cost less because we build the right thing first
  • They’re done faster because we focus on the most valuable features first
  • They have better results because user needs are prioritized from the start

Discovery workshops are part of our best practices for building sharp web apps the right way. We’ve proven that these workshops ensure that projects not only hit their objectives but that they do so on budget, reducing the likelihood of requiring additional work (or money) further down the line.

Get Our Guide

Shared Understanding: A Guide to Caktus Discovery Workshops explains what a Caktus discovery workshop is. It also:

  • Demonstrates how to achieve a shared understanding among stakeholders
  • Provides techniques to uncover discrepancies or areas lacking clarity in the project vision
  • Explains how this knowledge translates into tangible benefits during the project estimation and development process

The guide is an introduction to the aspects of user-centered requirements gathering that we find most useful at Caktus, and we hope you’ll take a moment to read the free guide.

Caktus GroupShipIt Day Summer 2018 Recap

On July 27 - 28, we ran our quarterly ShipIt Day here at Caktus. These 24-hour sprints, which we’ve organized since 2012, allow Cakti to explore new projects that expand or increase their skill sets. The event kicked off at 3:00 pm on Thursday and we reconvened at 3:00 pm on Friday to showcase our progress. The goal is to experiment, take risks, and try something new.

Here’s what we got up to this time:

Ship It Day Infographic 2018

Show me more!

Read about previous ShipIt Days in these blog posts.

Philip SemanchukThanks to PyOhio 2018!

Thanks to organizers, sponsors, volunteers, and attendees for another great PyOhio!

Here are the slides from my talk on Python 2 to 3 migration.