A Balanced Look at GraphQL

Is GraphQL a tool that will make your life better or worse?

TL;DR: as with any technology it depends on your situation and needs. There are real benefits for developer experience and solving many related problems with a cohesive architecture, but it’s not just a simple drop-in replacement for REST, HATEOAS or {g,t}RPC. GraphQL is better for serious projects with interactive user interfaces and a rich relational data model. If you are willing to adopt a new paradigm and use it as a foundation for a new application, it can be worthwhile and make your life moderately easier.

I’ll summarize the tradeoffs here and go into more detail with examples and explanations afterwards.

Pros:

Can obviate the need for a web framework or server entirely
End-to-end type safety
Great for language-agnostic APIs (vs. something like tRPC) and code generation
Presents a unified client interface that can weave together independent services or functions on the backend
Implement simple handlers without even talking to your application backend
Add (simple) declarative access controls
Smart caching
Real-time push events (WebSockets) are solved for you
Documentation generation for free (like OpenAPI but better)
Client has more control over what data is fetched retrieved
Designed for data structured as a graph (obviously) or hierarchical/relational data
Fetch all the data the application needs in a single request (great for mobile apps)
Can segregate your application services into independent units to get the benefits of a microservice architecture without the overhead and complexity

Cons:

Requires some specialized tools to work with compared to REST
Not as universally understood and adopted as REST
Can be more strict than necessary for your project
Requires some sort of extra schema definition step to add a new “endpoint”
Not necessarily suitable for public/unauthenticated APIs
May involve code generation, an extra build step
Unusual HTTP semantics
- All requests are to the same endpoint
- All responses are 200 OK
- Everything is a POST
No support for untyped/unstructured fields, just “string”

Caveats

I’m writing based on my experience with particular implementations of GraphQL servers and clients. My setup has been AWS AppSync, graphql-code-generator, TypeScript, React, and Apollo Client. Some of my opinions are relevant to this setup and some are broadly applicable. I think AWS and React are common enough technologies that it’s worth looking more at some concrete implementation details there.

Other Opinions

GraphQL was designed and created by Facebook, a company that seems to make a lot of money and build performant software and complex user interfaces that work across a variety of platforms. They are also behind React and are worth taking seriously when it comes to web technologies, even if they were originally cursed with PHP and MySQL.

Anecdotally I’ve seen some articles and videos across the transom bagging on GraphQL. I recently listened to a terrific interview on the Changelog podcast diving deep into API design, and the design of hypermedia and HATEOAS which I highly recommend. There was some very light and admitted highly subjective criticism of GraphQL on it but they are keeping an open mind. I hope to fill some open minds with my arguments for and against GraphQL, which is the best you can do with any technology other than say “it depends.” The episode and transcript can be found here:

Changelog & Friends 24: HATEOAS corpus – Listen on Changelog.com

Amazon has some arguments for GraphQL over REST as well.

What is GraphQL?

GraphQL is an attempt to add another layer of abstraction and standardization on top of traditional HTTP-based APIs. There’s nothing amazingly groundbreaking about it, but it provides for a standard architecture that enforces many practices that can be beneficial for people building moderately serious apps. You must define a schema for the types of requests your clients can make, what fields appear in the request and response bodies, what changes can be subscribed to (think WebSockets), and extensions that can be implemented by the gateway (think permissions or @deprecated).

This schema can be derived from code or written before code. It contains enough information to generate types so your clients and servers know exactly what will be in a given request, which is also enabled by the validation that will be automatically performed to ensure all requests and responses strictly conform to the declared schema. So as far as checking types of data or getting only the fields you explicitly allow in a request, that is one less thing to worry about on the server side and is a more secure default than REST APIs which allow the client to send any sort of data and fields it pleases with a request. The schema also enables documentation, client generation, a fully discoverable API, and a playground for testing requests. Very similar to what you get if you use OpenAPI (aka Swagger) generation tools with REST APIs, but in my opinion it’s much smoother and pleasant compared with OpenAPI. When you define an operation (equivalent of an endpoint), it must be a query, mutation, or subscription. This is very much like HTTP verbs like GET and POST, but more helpfully explicit and strict about the semantics.

The API discovery of fields and operations gives a bit more flexibility in API design and the actual work that must be done to make sure the client can retrieve all the data it needs in a single request. I’m going to borrow the examples from graphql.org here:

On the left is a query which is asking for two separate pieces of information at the same time: give me the `name` of `hero` and give me the `name` of the `droid` with `id` `2000`.

The schema definition and server implementation can make many fields on objects like hero or droid available to be queried, optionally spawning additional requests to fulfill the request. This lets the API designer expose all the data they deem necessary for the client to have at some point without having to define endpoints or parameters to handle every permutation of data fetches and information a client may wish to display in a given interface. It pushes a little more work onto the client side of things but in a very natural way. Let the client decide for a given situation what data it needs, ask the server, which then will issue the appropriate requests to fulfill the request. This may sound a bit like a “backend for frontend” often implemented in microservice architectures. GraphQL is a wonderful replacement for a BFF.

A Replacement for Web Frameworks

This is going to be controversial statement but you can replace your web server, web framework, DTOs (data transfer objects, also called schemas), OpenAPI generation tooling, BFF, and application load balancer with GraphQL. It solves all of these problems for you in an elegant and well-thought-out cohesive architecture. But it is a different paradigm than your typical backend developer is used to.

Boring History

As a certified Old Person building web applications since about 2000, I’ve encountered a good number of web frameworks. Perl, Python, Java, Node, Ruby, they’re all the exact same thing from Catalyst to Flask to Spring to Express to Nest to Rails. In a nutshell they:

Let you map routes to function handlers
Define what shape data should be in for requests and responses
Validate API requests
Run in some container to service requests

They typically have some other helpful things like ORM integration and middleware but these are trivial to incorporate if desired.

What? How Can I Live Without A Web Framework??

Do I detect a note of panic? Relax. GraphQL handles these things for you. It maps requests to handlers, strictly defines what shape data will be in going in and out of your endpoints, and takes care of the HTTP request handling layer for you. So you can define a schema and then functions which are invoked as needed to service client requests, possibly in parallel or in response to certain fields being requested by the client.

Arguably at the end of the day we have in its end result the same functionality whether we use a web framework or GraphQL, but I believe the bundling of this functionality with the extra thought and care that has gone into making GraphQL a more modern solution is worth serious consideration for new projects. Web application frameworks haven’t changed in twenty years, best practices have evolved since then. In the end you can achieve the same results with CGI scripts or PHP 3, or <form> tags and jQuery, but we know that there have been advancements in tooling and architecture since then.

Maybe I Do Need A Web Framework?

As I started off with, I am absolutely not arguing GraphQL is better in every way or should be used in all situations. There are a great many valid reasons for not choosing to adopt GraphQL for your project. A few off the top of my head:

Adding a new GraphQL-based service when you already have other APIs (REST or otherwise). GraphQL is a terrific unifier of services but a big ask of teams to adopt incrementally. Managing tooling for two API paradigms sounds like extra suffering. I will point out that you can use a REST API Gateway as a data source for an AppSync GraphQL service though.
Small projects. Often I am working on projects that are business-critical, involve large relational databases, will be worked on and maintained actively for years. If you want to throw together a small project or a prototype, the extra overhead is not worth it. The payoffs come with larger applications.
When your frontend and backend are tightly coupled. GraphQL adds an extra layer of abstraction which is extraneous if your backend and frontend are part of the same application. For example if you are using NextJS as both client and server, I would recommend server actions or tRPC.
Public APIs. If you are building an API which does not require authentication, or is designed to be easy to consume by clients outside your organization, I would advise sticking to something everyone understands and can be easily tested with tools like CURL. If you want other developers to get up and running and consuming your API with a minimum of extra obstacles, just stick with REST like everyone else.

However if you are embarking on a greenfield, serious project and you expect your API to scale with your organization and provide a high level of strictness and clarity and documentation, GraphQL may be a strong candidate.

Type Safety

I have similar feelings about GraphQL as I do about TypeScript. Sure, people have been building web applications just fine thank you with REST and JavaScript for many years, maybe around 25 at this point which is a lifetime in an engineer’s career. But there comes a point, maybe after another late-night debugging session where you eventually discover some mistyped parameter or property that shouldn’t have existed at runtime is ruining your weekend. Maybe you wonder, why do all these things have to be runtime errors? Maybe the Python heads don’t know there is a better way and think all errors should be runtime errors. I don’t know.

GraphQL and TypeScript ask a bit more of you, the person designing the API or writing the code to implement it. It’s more steps to solve the same problem and you just want to close this ticket. Why waste my time? The answer is that our tools can yell at us before we ship the code, not after. Obviously not in all cases, but many classes of bugs become less likely to happen at runtime, which does save us time and suffering in the long run. They increase maintainability, readability, code completion, AI copilots, and documentationability.

Same reason we write tests. It takes more time to write tests than to not write tests, but it pays off in the long run if we think about the runtime bugs we avoid and the safety we get when refactoring code, and the lower likelihood of someone less familiar with the system breaking other things in a few years time when they try to change something. Again not every software project needs tests, but you know who you are.

GraphQL’s strictness is especially helpful if annoying in the context of an API. A published API is a contract with your clients you must not break. If a client relies on a username field being present in some endpoint, you must keep that field around. If you want to move or rename the field or make it a different type, you risk breaking every old version of your clients which still expect that field to exist with a particular type. GraphQL doesn’t solve the problem by allowing you to just change schemas up on your clients as you like, but it can alert you to the fact you are making a backwards-incompatible schema change and yell at you. This is a very useful CI action to run. For example the graphql-inspector:

Because there’s nothing worse than accidentally pushing an API change that breaks every client. These are the kinds of validations we can leverage with GraphQL.

GraphQL Server

You do of course need some software that performs the duties of a GraphQL API server. There are many very featureful and robust options like Hasura and Apollo, with features like taking your existing Postgres database and generating operations and types, or federation where you can have multiple servers handling different pieces of your graph.

Myself, I always default to using an AWS service if it is good enough to suit my needs. It’s not because AWS has the greatest version of every service you will never need. Reasons why:

The more infrastructure Amazon runs instead of me, the better.
- It will be more reliable than if I run it myself
- I don’t have to maintain it
- They have great documentation and support and monitoring
- I don’t ever have to think about scaling anything
AWS services are designed to work with each other. You end up with a cohesive architecture that is more than the sum of its parts.
- Integration with distributed tracing
- Integration with other AWS data sources
  - Can run queries on RDS directly
  - Can query DynamoDB directly
- Integrated authentication with Cognito
Pay-as-you-go billing. Great for prototyping on the cheap but being able to scale to handle serious volume without changing anything.

AWS AppSync

So I’ve been using AWS AppSync. It’s a bit oddly named but it’s a serverless GraphQL API. You upload a schema to it, define how users authenticate to it, and then connect operations to resolvers. You can tell it to directly query RDS or DynamoDB, you can write JavaScript resolvers that are executed inside AppSync (this is very cool but also maybe unwieldy to maintain or debug, I would compare it to using stored procedures in your DB instead of application code), and you can use lambda functions to handle requests. It can handle caching and tracing for you.

A powerful architecture is to use a serverless GraphQL server and functions defined for most of your application query, mutation, and subscription operations. Depending on your setup and needs this can give you many of the benefits of a microservices-style architecture without all the headaches and overhead introduced. You may have different problems (mostly cold starts) but I am a big of it regardless.

To give an example, suppose a client runs a getProfile query. You want some business logic to grab the currently authenticated user’s profile information from the database and return it. You would create a standalone lambda function in whatever language you like that is called when the getProfile query is performed, do your business logic, and return the result. AppSync handles the request/response validation and authentication and dispatch. This function is self-contained, not connected to anything else in your application, which has some consequences:

You can deploy new versions of this function without changing anything else in your application, if you want.
You can define scaling limits, memory limits, and timeout for this function, if you want.
You can define exactly what permissions the function has if you want to, what security groups it’s in, its IAM role. This is the absolute maximum application of the principle of least privilege, if fine-grained security is important for you.
You can write it in whatever language you want, build it with whatever tooling you want.
Different teams and projects can manage different subsets of functions, enabling teams to work independently but each be contributing to a unified service.

These are all optional benefits. You can also use a single IAM role and toolchain and language and limits for all functions in your application, but you also have the flexibility to slice up pieces of your services into independent units to avoid a monolith or big ball of mud situation. You can create a “monolith” library of all of your business logic and repositories but package it as separate functions. Again, if you want. You can also create a single resolver function that handles all requests if you prefer simplicity. Or do both!

This architecture is most suited to compiled languages (like Rust or Go) or JavaScript/TypeScript because each function should be small and self-contained. This is where JavaScript’s tree-shaking tools really shine. If your function loads some data from the database and imports some functions, that is all of the code that will be in your function. Your codebase may grow but the resolver functions stay the same size. If you want you can have one large monolithic-style library, or a set of libraries, where each resolver function only pulls out the pieces it needs from the library and compiles or bundles it into a small piece of code that is invoked on-demand by your GraphQL server. Basically any language should be fine for this setup unless it’s Python.

In my opinion this is a wonderful modern application of the UNIX tool philosophy: do one thing and do it pretty well. Create small self-contained units that can be assembled by the user to suit their needs.

Another thing AppSync handles for you are subscriptions. These let your client application subscribe to events or data updates in real-time. Setting up and administering your own WebSocket services can be a giant pain in the ass, especially if behind a load balancer. GraphQL and AppSync make this Just Work and Not Your Problem.

My biggest problem with AppSync is that it does not support unauthenticated requests. Any application will have a need for these, for example handling signup or forgot password flows (arguably these can be handled out of the box by Cognito but you will have other needs). So your application may need something like a REST API to fall back on for these types of requests or to use an unauthenticated Cognito identity pool role (cool although a pain to set up). I hope AWS adds a better solution some day.

Just REST With Extra Steps?

Some feedback from my coworkers when building a project making heavy use of GraphQL was “it feels like REST with extra steps.” Which is absolutely true, though the number of extra steps can depend quite a bit on your setup, like if you’re doing code-first or schema-first, need to run a code generation step, and depending on how you consume the API.

Speaking as generally as possible, it’s likely building a REST API will be faster and easier than GraphQL. If these are your priority, you probably want REST. It’s standard, you can build new endpoints quickly without having to define the request and response shapes, polymorphic types are much easier to deal with, you probably don’t need any code generation or extra tooling, and everyone’s familiar with it.

GraphQL will probably be more complex but this complexity pays off depending on your application. Your technical and business requirements are the deciding factor. If end-to-end type safety and validation is important to you, or self-documenting APIs, or real-time subscriptions, or you need to do relational queries via your API, or want client generation for different platforms, GraphQL may save you time and be more maintainable.

If your application isn’t so serious, or is small and unlikely to grow into a large API surface, if it’s just you or a small team, or if your frontend and backend can be tightly coupled, I would probably do a REST API due to its simplicity and low overhead.

GraphQL vs. REST: When to Use Each

When to Use GraphQL

GraphQL is particularly effective in scenarios with:

Complex Data Requirements: When applications need to retrieve complex, nested data structures in a single request, GraphQL provides a more efficient way to query the exact data required.
Dynamic Queries: In cases where clients may need to change their data requirements frequently, GraphQL enables clients to request only the fields they need, reducing the payload size and improving performance.
Multiple Resources: If your application architecture involves multiple resources that often link together, GraphQL allows for a more streamlined approach to fetching related data in a single round trip.
Real-Time Functionality: Applications that benefit from real-time updates, such as chat applications or live dashboards, can leverage GraphQL’s subscription capabilities for handling real-time data effortlessly.

When to Use REST

If you’re looking for:

Simplicity and Speed: For simple applications with straightforward API needs, REST provides a quick and easy method to implement CRUD operations with standard HTTP verbs.
Less Frequent Changes: If your API endpoints and data requirements are stable and unlikely to change frequently, REST’s fixed endpoints reduce complexity and can be more straightforward for consumer integration.
Well-Defined Resource Models: When your application is built around a clear resource structure that maps directly to HTTP verbs, REST’s resource-oriented approach is a natural fit, making it easy to understand and implement.
Support for Caching: RESTful services often leverage HTTP caching mechanisms effectively. If caching responses is a key consideration for performance, REST can provide built-in support for caching through standard HTTP headers.

In conclusion, evaluating your application’s specific needs will guide your choice between GraphQL and REST. For complex, dynamic data interactions with a focus on developer flexibility and efficiency, opt for GraphQL. For simpler use cases with stable requirements and resource-oriented designs, REST remains a robust and effective solution.

Mastering JavaScript Tree-Shaking

One fantastic feature of JavaScript (compared to say, Python) is that it is possible to bundle your code for packaging; removing everything that is not needed to run your code. We call this “tree-shaking” because the stuff you aren’t using falls out leaving the strongly-attached bits. This reduces the size of your output, whether it’s a NPM module, code to run in a browser, or a NodeJS application.

By way of illustration, suppose we have a script that imports a function from a library. Here it calls apple1() from lib.ts:

If we bundle script.ts using esbuild:

esbuild --bundle script.ts > script-bundle.js

Then we get the resulting output:

The main point here being that only apple1 is included in the bundled script. Since we didn’t use apple2 it gets shaken out like an overripe piece of fruit.

Motivation

There are many reasons this is a valuable feature, the main one is performance. Less code means less time spent parsing JavaScript when your code is executed. It means your web app loads faster, your docker image is smaller, your serverless function cold start time is reduced, your NPM module takes up less disk space.

A robust application architecture can be a serverless architecture where your application is composed of discrete functions. These functions can function like an API if you put an API gateway or GraphQL server that invokes the functions for different routes and queries, or can be triggered by messages in queues or files being uploaded to a bucket or as regularly scheduled events. In this setup each function is self-contained, only containing whatever code is needed for its specific functionality and no unrelated code. This is in contrast to a monolith or microservice where the entire application must be loaded up in order to handle a request. No matter how large your project gets, each function remains about the same size.

I build applications using Serverless Stack, which has a terrific developer experience focused on building serverless applications on AWS with TypeScript and CDK in a local development environment. Under the hood it uses esbuild. Let’s peek under the hood.

Mechanics

Conceptually tree-shaking is pretty straightforward; just throw away whatever code our application doesn’t use. However there are a great number of caveats and tricks needed to debug and finesse your bundling.

Tree-shaking is a feature provided by all modern JavaScript bundlers. These include tools like Webpack, Turbopack, esbuild, and rollup. If you ask them to produce a bundle they will do their best to remove unused code. But how do they know what is unused?

The fine details may vary from bundler to bundler and between targets but I’ll give an overview of salient properties and options to be aware of. I’ll use the example of using esbuild to produce node bundles for AWS Lambda but these concepts apply generally to anyone who wants to reduce their bundle size.

Measuring

Before trying to reduce your bundle size you need to look at what’s being bundled, how much space everything takes up, and why. There are a number of tools at our disposal which help visualize and trace the tree-shaking process.

Bundle Buddy

This is one of the best tools for analyzing bundles visually and it has very rich information. You will need to ask your bundler to produce a meta-analysis of the bundling process and run Bundle Buddy on it (it’s all local browser based). It supports webpack, create-react-app, rollup, rome, parcel, and esbuild. For esbuild you specify the --metafile=meta.json option.

When you upload your metafile to Bundle Buddy you will be presented with a great deal of information. Let’s go through what some of it indicates.

Let’s start with the duplicate modules.

This section lets you know that you have multiple versions of the same package in your bundle. This can be due to your dependencies or your project depending on different versions of a package which cannot be resolved to use the same version for some reason.

Here you can see I have versions 3.266.0 and 3.272 of the AWS SDK and two versions of fast-xml-parser. and The best way to hunt down why different versions may be included is to simply ask your package manager. For example you can ask pnpm:

$ pnpm why fast-xml-parser

dependencies:
@aws-sdk/client-cloudformation 3.266.0
├─┬ @aws-sdk/client-sts 3.266.0
│ └── fast-xml-parser 4.0.11
└── fast-xml-parser 4.0.11
@aws-sdk/client-cloudwatch-logs 3.266.0
└─┬ @aws-sdk/client-sts 3.266.0
  └── fast-xml-parser 4.0.11
...
@prisma/migrate 4.12.0
└─┬ mongoose 6.8.1
  └─┬ mongodb 4.12.1
    └─┬ @aws-sdk/credential-providers 3.282.0
      ├─┬ @aws-sdk/client-cognito-identity 3.282.0
      │ └─┬ @aws-sdk/client-sts 3.282.0
      │   └── fast-xml-parser 4.1.2
      ├─┬ @aws-sdk/client-sts 3.282.0
      │ └── fast-xml-parser 4.1.2
      └─┬ @aws-sdk/credential-provider-cognito-identity 3.282.0
        └─┬ @aws-sdk/client-cognito-identity 3.282.0
          └─┬ @aws-sdk/client-sts 3.282.0
            └── fast-xml-parser 4.1.2

So if I want to shrink my bundle I need to figure out how to get it so that both @aws-sdk/client-* and @prisma/migrate can agree on a common version to share so that only one copy of fast-xml-parser needs to end up in my bundle. Since this function shouldn’t even be importing @prisma/migrate (and certainly not mongodb) I can use that as a starting point for tracking down an unneeded import which will discuss shortly. Alternatively you can open a PR for one of the dependencies to use a looser version spec (e.g. ^4.0.0) for fast-xml-parser or @aws-sdk/client-sts.

With duplicate modules out of the way, the main meat of the report is the bundled modules. This will usually be broken up into your code and stuff from node_modules:

When viewing in Bundle Buddy you can click on any box to zoom in for a closer look. We can see that of the 1.63MB that comprises our bundle, 39K is for my actual function code:

This is interesting but not where we need to focus our efforts.

Clearly the prisma client and runtime are taking up sizable parcels of real-estate. There’s not much you can do about this besides file a ticket on GitHub (as I did here with much of this same information).

But looking at our node_modules we can see at a glance what is taking up the most space:

This is where you can survey what dependencies are not being tree-shaken out. You may have some intuitions about what belongs here, doesn’t belong here, or seems too large. For example in the case of my bundle the two biggest dependencies are on the left there, @redis-client (166KB) and gremlin 97KB). I do use redis as a caching layer for our Neptune graph database, of which gremlin is a client library that one uses to query the database. Because I know my application and this function I know that this function never needs to talk to the graph database so it doesn’t need gremlin. This is another starting point for me to trace why gremlin is being required. We’ll look at that later on when we get into tracing. Also noteworthy is that even though I only use one redis command, the code for handling all redis commands get bundled, adding a cost of 109KB to my bundle.

Finally the last section in the Bundle Buddy readout is a map of what files import other files. You can click in for what looks like a very interesting and useful graph but it seems to be a bit broken. No matter, we can see this same information presented more clearly by esbuild.

esbuild –analyze

Your bundler can also operate in a verbose mode where it tells you WHY certain modules are being included in your bundle. Once you’ve determined what is taking up the most space in your bundle or identified large modules that don’t belong, it may be obvious to you what the problem is and where and how to fix it. Oftentimes it may not be so clear why something is being included. In my example above of including gremlin, I needed to see what was requiring it.

We can ask our friend esbuild:

esbuild --bundle --analyze --analyze=verbose script.ts --outfile=tmp.js 2>&1 | less

The important bit here being the --analyze=verbose flag. This will print out all traces of all imports so the output gets rather large, hence piping it to less. It’s sorted by size so you can start at the top and see why your biggest imports are being included. A couple down from the top I can see what’s pulling in gremlin:

   ├ node_modules/.pnpm/gremlin@3.6.1/node_modules/gremlin/lib/process/graph-traversal.js ─── 13.1kb ─── 0.7%
   │  └ node_modules/.pnpm/gremlin@3.6.1/node_modules/gremlin/index.js
   │     └ backend/src/repo/gremlin.ts
   │        └ backend/src/repo/repository/skillGraph.ts
   │           └ backend/src/repo/repository/skill.ts
   │              └ backend/src/repo/repository/vacancy.ts
   │                 └ backend/src/repo/repository/candidate.ts
   │                    └ backend/src/api/graphql/candidate/list.ts

This is extremely useful information for tracking down exactly what in your code is telling the bundler to pull in this module. After a quick glance I realized my problem. The file repository/skill.ts contains a SkillRepository class which contains methods for loading a vacancy’s skills which is used by the vacancy repository which is eventually used by my function. Nothing in my function calls the SkillRepository methods which need gremlin, but it does include the SkillRepository class. What I foolishly assumed was that the methods on the class I don’t call will get tree-shaken out. This means that if you import a class, you will be bringing in all possible dependencies any method of that class brings in. Good to know!

@next/bundle-analyzer

This is a colorful but limited tool for showing you what’s being included in your NextJS build. You add it to your next.config.js file and when you do a build it will pop open tabs showing you what’s being bundled in your backend, frontend, and middleware chunks.

The amount of bullshit @apollo/client pulls in is extremely aggravating to me.

Modularize Imports

It was helpful for learning that using top-level Material-UI imports such as import { Button, Dialog } from "@mui/material" will pull in ALL of @mui/material into your bundle. Perhaps this is because NextJS still is stuck on CommonJS, although that is pure speculation on my part.

While you can fix this by assiduously doing import { Button } from "@mui/material/Button" everywhere this is hard to enforce and tedious. There is a NextJS config option to rewrite such imports:

  modularizeImports: {
    "@mui/material": {
      transform: "@mui/material/{{member}}",
    },
    "@mui/icons-material": {
      transform: "@mui/icons-material/{{member}}",
    },
  },

Webpack Analyzer

Has a spiffy graph of imports and works with Webpack.

Tips and Tricks

CommonJS vs. ESM

One factor that can affect bundling is using CommonJS vs. EcmaScript Modules (ESM). If you’re not familiar with the difference, the TypeScript documentation has a nice summary and the NodeJS package docs are quite informative and comprehensive. But basically CommonJS is the “old, busted” way of defining modules in JavaScript and makes use of things like require() and module.exports, whereas ESM is the “cool, somewhat less busted” way to define modules and their contents using import and export keywords.

Tree-shaking with CommonJS is possible but it is more wooley due to the more procedural format of defining exports from a module whereas ESM exports are more declarative. The esbuild tool is specifically built around ESM, in the docs it says:

This way esbuild will only bundle the parts of your packages that you actually use, which can sometimes be a substantial size savings. Note that esbuild’s tree shaking implementation relies on the use of ECMAScript module import and export statements. It does not work with CommonJS modules. Many packages on npm include both formats and esbuild tries to pick the format that works with tree shaking by default. You can customize which format esbuild picks using the main fields and/or conditions options depending on the package.

So if you’re using esbuild, it won’t even bother trying unless you’re using ESM-style imports and exports in your code and your dependencies. If you’re still typing require then you are a bad person and this is a fitting punishment for you.

As the documentation highlights, there is a related option called mainFields which affects which version of a package esbuild resolves. There is a complicated system for defining exports in package.json which allows a module to contain multiple versions of itself depending on how it’s being used. It can have one entrypoint if it’s require‘d, a different one if imported, or another if used in a browser.

The upshot is that you may need to tell your bundler explicitly to prefer the ESM (“module“) version of a package instead of the fallback CommonJS version (“main“). With esbuild the option looks something like:

esbuild --main-fields=module,main --bundle script.ts

Setting this will ensure the ESM version is preferred, which may result in improved tree-shaking.

Minification

Tree-shaking and minification are related but distinct optimizations for reducing the size of your bundle. Tree-shaking eliminates dead code, whereas minification rewrites the result to be smaller, for example replacing a function identifier e.g. “frobnicateMajorBazball” with say “a1“.

Usually enabling minification is a simple option in your bundler. This bundle is 2.1MB minified, but 4.5MB without minification:

❯ pnpm exec esbuild --format=esm --target=es2022 --bundle --platform=node --main-fields=module,main  backend/src/api/graphql/candidate/list.ts --outfile=tmp.js

  tmp.js  4.5mb ⚠️

⚡ Done in 239ms


❯ pnpm exec esbuild --minify --format=esm --target=es2022 --bundle --platform=node --main-fields=module,main  backend/src/api/graphql/candidate/list.ts --outfile=tmp-minified.js

  tmp-minified.js  2.1mb ⚠️

⚡ Done in 235ms

Side effects

Sometimes you may want to import a module not because it has a symbol your code makes use of but because you want some side-effect to happen as a result of importing it. This may be an import that extends jest matchers, or initializes a library like google analytics, or some initialization that is performed when a file is imported.

Your bundler doesn’t always know what’s safe to remove. If you have:

import './lib/initializeMangoids'

In your source, what should your bundler do with it? Should it keep it or remove it in tree-shaking?

If you’re using Webpack (or terser) it will look for a sideEffects property in a module’s package.json to check if it’s safe to assume that simply importing a file does not do anything magical:

{
  "name": "your-project",
  "sideEffects": false
}

Code can also be annotated with /*#__PURE__ */ to inform the minifier that this code has no side effects and can be tree-shaken if not referred to by included code.

var Button$1 = /*#__PURE__*/ withAppProvider()(Button);

Read about it in more detail in the Webpack docs.

Externals

Not every package you depend on needs to necessarily be in your bundle. For example in the case of AWS lambda the AWS SDK is included in the runtime. This is a fairly hefty dependency so it can shave some major slices off your bundle if you leave it out. This is done with the external flag:

❯ pnpm exec esbuild --minify --format=esm --target=es2022 --bundle --platform=node --main-fields=module,main  backend/src/api/graphql/candidate/list.ts --outfile=tmp-minified.js

  tmp-minified.js  2.1mb 


❯ pnpm exec --external:aws-sdk --minify --format=esm --target=es2022 --bundle --platform=node --main-fields=module,main  backend/src/api/graphql/candidate/list.ts --outfile=tmp-minified.js

  tmp-minified.js  1.8mb

One thing worth noting here is that there are different versions of packages depending on your runtime language and version. Node 18 contains the AWS v3 SDK (--external:@aws-sdk/) whereas previous versions contain the v2 SDK (--external:aws-sdk). Such details may be hidden from you if using the NodejsFunction CDK construct or SST Function construct.

On the CDK slack it was recommended to me to always bundle the AWS SDK in your function because you may be developing against a different version than what is available in the runtime. Or you can pin your package.json to use the exact version in the runtime.

Another reason to use externals is if you are using a layer. You can tell your bundler that the dependency is already available in the layer so it’s not needed to bundle it. I use this for prisma and puppeteer.

Performance Impacts

For web pages the performance impacts are instantly noticeable with a smaller bundle size. Your page will load faster both over the network and in terms of script parsing and execution time.

Another way to get an idea of what your node bundle is actually doing at startup is to profile it. I really like the 0x tool which can run a node script and give you a flame graph of where CPU time is spent. This can be an informative visualization and let you dig into what is being called when your script runs:

For serverless applications you can inspect the cold start (“initialization”) time for your function on your cloud platform. I use the AWS X-Ray tracing tool. Compare before and after some aggressive bundle size optimizations:

The cold-start time went from 2.74s to 1.60s. Not too bad.

What is Web3? Should You Care?

Web3 is: read/write/execute with artificial scarcity and cryptographic identity. Should you care? Yes.

What?

Let’s break it down.

Back when I started my career, “web2.0” was the hot new thing.

The “2.0” part of it was supposed to capture a few things: blogs, rounded corners on buttons and input fields, sharing of media online, 4th st in SOMA. But what really distinguished it from “1.0” was user-generated content. In the “1.0” days if you wanted to publish content on the web you basically had to upload an HTML file, maybe with some CSS or JS if you were a hotshot webmaster, to a server connected to the internet. It was not a user-friendly process and certainly not accessible to mere mortals.

Web 1.0 Developer Starter Kit pic.twitter.com/wqr52Use60
— Mischa Spiegelmock 🇺🇦 (@spiegelmock) October 22, 2019

The user-generated content idea was that websites could allow users to type stuff in and then save it for anyone to see. This was mostly first used for making blogs like LiveJournal and Moveable Type possible, later MySpace and Facebook and Twitter and wordpress.com where I’m still doing basically the same thing as back then. I don’t have to edit a file by hand and upload it to a server. You can even leave comments on my article! This concept seems so mundane to us now but it changed the web into an interactive medium where any human with an internet connection and cheap computer can publish content to anyone else on the planet. A serious game-changer, for better or for worse.

If you asked most people who had any idea about any of this stuff what would be built with web 2.0 they would probably have said “blogs I guess?” Few imagined the billions of users of YouTube, or grandparents sharing genocidal memes on Facebook, or TikTok dances. The concept of letting normies post stuff on the internet was too new to foresee the applications that would be built with it or the frightful perils it invited, not unlike opening a portal to hell.

Web3

The term “web3” is designed to refer to a similar paradigm shift underway.

Before getting into it I want to address the cryptocurrency hype. Cryptocurrency draws in a lot of people, many of dubious character qualities, that are lured by stories of getting rich without doing any work. This entire ecosystem is a distraction, although some of the speculation is based on organizations and products which may or may not have actual value and monetizable utility at some point in the present or future. This article is not about cryptocurrency, but about the underlying technologies which can power a vast array of new technologies and services that were not possible before. Cryptocurrency is just the first application of this new world but will end up being one of the most boring.

What powers the web3 world? What underlies it? With the help of blockchain technology a new set of primitives for building applications is becoming available. I would say the key interrelated elements are: artificial scarcity, cryptographic identity, and global execution and state. I’ll go into detail what I mean here, although to explain these concepts in detail in plain English is not trivial so I’m going to skip over a lot.

Cryptographic identity: your identity in web3-land consists of what is called a “keypair” (see Wikipedia), also known as a wallet. The only thing that gives you access to control your identity (and your wallet) is the fact that you are in physical or virtual possession of the “private key” half of the keypair. If you hold the private key, you can prove to anyone who’s asking that you own the “public key” associated with it, also known as your wallet address. So what?

Your identity is known to the world as your public key, or wallet address. There is an entire universe of possibilities that this opens up because only you, the holder of your private key, can prove that you own that identity. To list just a short number of examples:

No need to create a new account on every site or app you use.
No need for relying on Facebook, Google, Apple, etc to prove your identity (unless you want to).
People can encrypt messages for you that only you can read, without ever communicating with you, and post the message in public. Only the holder of the private key can decrypt such messages.
Sign any kind of message, for example voting over the internet or signing contracts.
Strong, verifiable identity. See my e-ID article for one such example provided by Estonia.
Anonymous, throwaway identities. Create a new identity for every site or interaction if you want.
Ownership or custody of funds or assets. Can require multiple parties to unlock an identity.
Link any kind of data to your identity, from drivers licenses to video game loot. Portable across any application. You own all the data rather than it living on some company’s servers.
Be sure you are always speaking to the same person. Impossible to impersonate anyone else’s identity without stealing their private key. No blue checkmarks needed.

There are boundless other possibilities opened up with cryptographic identity, and some new pitfalls that will result in a lot of unhappiness. The most obvious is the ease with which someone can lose their private key. It is crucial that you back yours up. Like write the recovery phrase on a piece of paper and put it in a safe deposit box. Brace yourself for a flood of despairing clickbait articles about people losing their life savings when their computer crashes. Just as we have banks to relieve us of the need to stash money under our mattresses, trusted (and scammer) establishments with customer support phone numbers and backups will pop up to service the general populace and hold on to their private keys.

Artificial scarcity: this one should be the most familiar by now. With blockchain technology came various ways of limiting the creation and quantity of digital assets. There will only ever be 21 million bitcoins in existence. If your private key proves you own a wallet with some bitcoin attached you can turn it into a nice house or lambo. NFTs (read this great deep dive explaining WTF a NFT is) make it possible to limit ownership of differentiated unique assets. Again we’re just getting started with the practical applications of this technology and it’s impossible to predict what this will enable. Say you want to give away tickets to an event but only have room for 100 people. You can do that digitally now and let people trade the rights. Or resell digital movies or video games you’ve purchased. Or the rights to artwork. Elites will use it for all kinds of money laundering and help bolster its popularity.

Perhaps you require members of your community to hold a certain number of tokens to be a member of the group, as with Friends With Benefits to name one notable example. If there are a limited number of $FWB tokens in existence, it means these tokens have value. They can be transferred or resold from people who aren’t getting a lot out of their membership to those who more strongly desire membership. As the group grows in prestige and has better parties the value of the tokens increases. As the members are holders of tokens it’s in their shared interest to increase the value the group provides its members. A virtuous cycle can be created. Governance questions can be decided based on the amount of tokens one has, since people with more tokens have a greater stake in the project. Or not, if you want to run things in a more equitable fashion you can do that too. Competition between different organizational structures is a Good Thing.

This concept is crucial to understand and so amazingly powerful. When it finally clicked for me is when I got super excited about web3. New forms of organization and governance are being made possible with this technology.

The combination of artificial scarcity, smart contracts, and verifiable identity is a super recipe for new ways of organizing and coordinating people around the world. Nobody knows the perfect system for each type of organization yet but there will be countless experiments done in the years to come. No technology has more potential power than that which coordinates the actions of people towards a common goal. Just look at nation states or joint stock companies and how they’ve transformed the world, both in ways good and bad.

The tools and procedures are still in their infancy, though I strongly recommend this terrific writeup of different existing tools for managing these Decentralized Autonomous Organizations (DAOs). Technology doesn’t solve all the problems of managing an organization of course, there are still necessary human layers and elements and interactions. However some of the procedures that have until now rested on an reliable and impartial legal system (something most people in the world don’t have access to) for the management and ownership of corporations can now be partially handled not only with smart contracts (e.g. for voting, enacting proposals, gating access) but investment, membership, and participation can be spread to theoretically anyone in the world with a smartphone instead of being limited to the boundaries of a single country and (let’s be real) a small number of elites who own these things and can make use of the legal system.

Any group of like-minded people on the planet can associate, perhaps raise investment, and operate and govern themselves as they see fit. Maybe for business ventures, co-ops, nonprofits, criminal syndicates, micro-nations, art studios, or all sorts of new organizations that we haven’t seen before. I can’t predict what form any of this will take but we have already seen the emergence of DAOs with billions of dollars of value inside them and we’re at the very, very early stages. This is what I’m most juiced about.

Check out the DAO Dashboard. This is already happening and it’s for real.

And to give one more salient example: a series of fractional ownership investments can be easily distributed throughout the DAO ecosystem. A successful non-profit that sponsors open source development work, Gitcoin, can choose to invest some of its GTC token in a new DAO it wants to help get off the ground, Developer DAO. The investment proposal, open for everyone to see and members to vote on, would swap 5% of the newly created Developer DAO tokens (CODE being the leading symbol proposal right now) for 50,000 GTC tokens, worth $680,000 at the time of writing. Developer DAO plans to use this and other funds raised to sponsor new web3 projects acting as an incubator that helps engineers build their web3 skills up for free. Developer DAO can invest its own CODE tokens in new projects and grants, taking a similar fraction of token ownership in new projects spun off by swapping CODE tokens. In this way each organization can invest a piece of itself in new projects, each denominated in their own currency which also doubles as a slice of ownership. It’s like companies investing shares of their own stock into new ventures without having to liquidate (liquidity can be provided via Uniswap liquidity pools). In this case we’re talking about an organic constellation of non-profit and for-profit ventures all distributing risk, investment capital, and governance amongst themselves with minimal friction that anyone in the world can participate in.

Global execution and state: there are now worldwide virtual machines, imaginary computers which can be operated by anyone and the details of their entire history, operations, and usage is public. These computers can be programmed with any sort of logic and the programs can be uploaded and executed by anyone, for a fee. Such programs today are usually referred to as smart contracts although that is really just one possible usage of this tool. What will people build with this technology? It’s impossible to predict at this early age, like imagining what smartphones will look like when the PC revolution is getting started.

These virtual machines are distributed across the planet and are extremely resilient and decentralized. No one person or company “owns” Ethereum (to use the most famous example) although there is a DAO that coordinates the standards for the virtual machine and related protocols. When a new proposal is adopted by the organization, the various software writers update their respective implementations of the Ethereum network to make changes and upgrades. It’s a voluntary process but one that works surprisingly well, and is not unlike the set of proposals and standards for the internet that have been managed for decades by the Internet Engineering Task Force (IETF).

A diagram showing where gas is needed for EVM operations — Ethereum virtual machine. More pictures here.

Also worth mentioning are zero-knowledge proofs which can enable privacy, things like anonymizing transactions and messaging. Of course these will for sure be used to nefarious ends, but they also open up possibilities for fighting tyranny and free exchange of information. Regardless of my opinion or anyone else’s, the cat’s out of the bag and these will be technologies that societies will need to contend with.

History of the Web Infographic: Web1, Web2, Web3.

Why should I care?

I didn’t care until recently, a month ago maybe. When I decided to take a peek to see what was going on in the web3 space, I found a whole new world. There are so many engineers out there who have realized the potential in this area, not to mention many of the smartest investors and technologists. The excitement is palpable and the amount of energy in the community is invigorating. I joined the Developer DAO, a new community of people who simply want to work on cool stuff together and help others learn how to program with this new technology. Purely focused on teaching and sharing knowledge. People from all over the world just magically appear and help each other build projects, not asking for anything in return. If you want to learn more about the web3 world you could do a lot worse than following @Developer_DAO on twitter.

As with all paradigm shifts, some older engineers will scoff and dismiss the new hotness as a stupid fad. There were those who pooh-poohed personal computers which could never match the power and specialized hardware of mainframes, those who mocked graphical interfaces as being for the weak, a grumpy engineer my mother knew who said the internet is “just a fad”, and people like Oracle’s CEO Larry Ellison saying the cloud is just someone else’s computer. Or me, saying the iPhone looks like a stupid idea.

The early phase of web3 is cryptocurrencies and blockchains (“layer 1”) solutions. Not something that non-technical people or really anyone can take full advantage of because there are few interfaces to interact with it. In the phase we’re in right now developer tools and additional layers of abstraction (“layer 2”) are starting to become standardized and accessible, and it’s just now starting to become possible to build web3 applications with user interfaces. Very soon we’ll start to see new types of applications appearing, to enable new kinds of communities, organizations, identity, and lots more nobody has dreamed up yet. There will be innumerable scams, a crash like after the first web bubble, annoying memesters and cryptochads. My advice is to ignore the sideshows and distractions and focus on the technology, tooling, and communities that weren’t possible until now and see what creative and world-changing things people build with web3.

For more information I recommend:

Ethereum.org overview of DAOs (text)
The decentralized future (podcast)
Chris Dixon and Naval Ravikant on The Wonders of Web3 (podcast – Tim Ferriss)
@Developer_DAO on twitter (tweets)

Frameworkless Web Applications

Since we have (mostly) advanced beyond CGI scripts and PHP the default tool many people reach for when building a web application is a framework. Like drafting a standard legal contract or making a successful Hollywood film, it’s good to have a template to work off of. A framework lends structure to your application and saves you from having to reinvent a bunch of wheels. It’s a solid foundation to build on which can be a substantial “batteries included” model (Rails, Django, Spring Boot, Nest) or a lightweight “slap together whatever shit you need outta this” sort of deal (Flask, Express).

The idea of a web framework is that there are certain basic features that most web apps need and that these services should be provided as part of the library. Nearly all web frameworks will give you some custom implementation of some or all of:

Configuration
Logging
Exception trapping
Parsing HTTP requests
Routing requests to functions
Serialization
Gateway adaptor (WSGI, Rack, WAR)
Middleware architecture
Plugin architecture
Development server

There are many other possible features but these are extremely common. Just about every framework has its own custom code to route a parsed HTTP request to a handler function, as in “call hello() when a GET request comes in for /hello.”

There are many great things to say about this approach. The ability to run your application on any sort of host from DigitalOcean to Heroku to EC2 is something we take for granted, as well as being able to easily run a web server on your local environment for testing. There is always some learning curve as you learn the ins and outs of how you register a URL route in this framework or log a debug message in that framework or add a custom serializer field.

But maybe we shouldn’t assume that our web apps always need to be built with a framework. Instead of being the default tool we grab without a moment’s reflection, now is a good time to reevaluate our assumptions.

Serverless

What struck me is that a number of the functions that frameworks provide are not needed if I go all-in on AWS. Long ago I decided I’m fine with Bezos owning my soul and acceded to writing software for this particular vendor, much as many engineers have built successful applications locked in to various layers of software abstraction. Early programmers had to decide which ISA or OS they wanted to couple their application to, later we’re still forced to make non-portable decisions but at a higher layer of abstraction. My python or JavaScript code will run on any CPU architecture or UNIX OS, but features from my cloud provider may restrict me to that cloud. Which I am totally fine with.

I’ve long been a fan of and written about serverless applications on this blog because I enjoy abstracting out as much of my infrastructure as possible so as to focus on the logic of my application that I’m interested in. My time is best spent concerning myself with business logic and not wrangling containers or deployments or load balancer configurations or gunicorn.

I’ve had a bit of a journey over the years adopting the serverless mindset, but one thing has been holding me back and it’s my attachment to web frameworks. While it’s quite common and appropriate to write serverless functions as small self-contained scripts in AWS Lambda, building a larger application in this fashion feels like trying to build a house without a foundation. I’ve done considerable experimentation mostly with trying to cram Flask into Lambda, where you still have all the comforts of your familiar framework and it handles all the routing inside a single function. You also have the flexibility to easily take your application out of AWS and run it elsewhere.

There are a number of issues with the approach of putting a web framework into a Lambda function. For one, it’s cheating. For another, when your application grows large enough the cold start time becomes a real problem. Web frameworks have the side-effect of loading your entire application code on startup, so any time a request comes in and there isn’t a warm handler to process it, the client must wait for your entire app to be imported before handling the request. This means users occasionally experience an extra few seconds of delay on a request, not good from a performance standpoint. There are simple workarounds like provisioned concurrency but it is a clear sign there is a flaw in the architecture.

Classic web frameworks are not appropriate for building a truly serverless application. It’s the wrong tool for the architecture.

The Anti-Framework

Assuming you are fully bought in to AWS and have embraced the lock-in lifestyle, life is great. AWS acts like a framework of its own providing all of the facilities one needs for a web application but in the form of web services of the Amazonian variety. If we’re talking about RESTful web services, it’s possible to put together an extremely scalable, maintainable, and highly available application.

Logging, monitoring: CloudWatch
Tracing: X-Ray
Alerting: Incident Manager
Configuration, exception trapping, execution: Lambda
HTTP request parsing, request routing: API Gateway
Relational database: Aurora Serverless
Configuration, secrets: Secrets Manager

No docker, kubernetes, or load balancers to worry about. You can even skip the VPC if you use the Aurora Data API to run SQL queries.

The above list could go on for a very long time but you get the point. If we want to be as lazy as possible and leverage cloud services as much as possible then what we really want is a tool for composing these services in an expressive and familiar fashion. Amazon’s new Cloud Development Kit (CDK) is just the tool for that. If you’ve never heard of CDK you can read a friendly introduction here or check out the official docs.

In short CDK lets you write high-level code in Python, TypeScript, Java or .NET, and compile it to a CloudFormation template that describes your infrastructure. A brief TypeScript example from cursed-webring:

// API Gateway with CORS enabled
const api = new RestApi(this, "cursed-api", {
  restApiName: "Cursed Service",
  defaultCorsPreflightOptions: {
    allowOrigins: apigateway.Cors.ALL_ORIGINS,
  },
  deployOptions: { tracingEnabled: true },
});
// defines the /sites/ resource in our API
const sitesResource = api.root.addResource("sites");
// get all sites handler, GET /sites/
const getAllSitesHandler = new NodejsFunction(
  this,
  "GetCursedSitesHandler",
  {
    entry: "resources/cursedSites.ts",
    handler: "getAllHandler",
    tracing: Tracing.ACTIVE,
  }
);
sitesResource.addMethod("GET", new LambdaIntegration(getAllSitesHandler));

Is CDK a framework? It depends how you define “framework” but I consider more to be infrastructure as code. By allowing you to effortlessly wire up the services you want in your application, CDK more accurately removes the need for any sort of traditional web framework when it comes to features like routing or responding to HTTP requests.

While CDK provides a great way to glue AWS services together it has little to say when it comes to your application code itself. I believe we can sink even lower into the proverbial couch by decorating our application code with metadata that generates the CDK resources our application declares, specifically Lambda functions and API Gateway routes. I call it an anti-framework.

@JetKit/CDK

Why TypeScript?

As an aside, TypeScript is now my preferred choice for backend development. JavaScript no, but TypeScript yes. The rapid evolution and improvements in the language with Microsoft behind it have been impressive. The language is as strict as you want it to be. Having one set of tooling, CI/CD pipelines, docs, libraries and language experience in your team is much easier than supporting two. All the frontends we work with are React and TypeScript, why not use the same linters, type checking, commit hooks, package repository, formatting configuration, and build tools instead of maintaining say, one set for a Python backend and another for a TypeScript frontend?

Python is totally fine except for its lack of type safety. Do not even attempt to blog at me ✋🏻 about mypy or pylance. It is like saying a Taco Bell is basically a real taqueria. Might get you through the day but it’s not really the same thing 🌮

Construct Generation

So we’ve seen the decorated application code, how does it get turned into cloud resources? With the ResourceGeneratorConstruct, a CDK construct that takes your functions and classes as input and generates AWS resources as output.

import { CorsHttpMethod, HttpApi } from "@aws-cdk/aws-apigatewayv2"
import { Construct, Duration, Stack, StackProps, App } from "@aws-cdk/core"
import { ResourceGeneratorConstruct } from "@jetkit/cdk"
import { aliveHandler, AlbumApi } from "../backend/src"  // your app code
export class InfraStack extends Stack {
  constructor(scope: App, id: string, props?: StackProps) {
    super(scope, id, props)
    // create API Gateway
    const httpApi = new HttpApi(this, "Api", {
      corsPreflight: {
        allowHeaders: ["Authorization"],
        allowMethods: [CorsHttpMethod.ANY],
        allowOrigins: ["*"],
        maxAge: Duration.days(10),
      },
    })
    // transmute your app code into infrastructure
    new ResourceGeneratorConstruct(this, "Generator", {
      resources: [AlbumApi, aliveHandler], // supply your API views and functions here
      httpApi,
    })
  }
}

It is necessary to explicitly pass the functions and classes you want resources for to the generator because otherwise esbuild will optimize them out of existence.

Try It Out

I’m a big fan of Serverless Stack, a lightweight toolkit for doing CDK-driven development with a few very useful features like the most advanced local development environment for AWS serverless applications.

Also please take a look at my starter kit for new serverless TypeScript applications.

Woodworker Designs and Builds the Perfect Tiny House Boat called the Le Koroc — Maybe a foundation isn’t needed after all

Web Services with AWS CDK

If you want to build a cloud-native web service, consider reaching for the AWS Cloud Development Kit. CDK is a new generation of infrastructure-as-code (IaC) tools designed to make packaging your code and infrastructure together as seamless and powerful as possible. It’s great for any application running on AWS, and it’s especially well-suited to serverless applications.

The CDK consists of a set of libraries containing resource definitions and higher-level constructs, and a command line interface (CLI) that synthesizes CloudFormation from your resource definitions and manages deployments. You can imperatively define your cloud resources like Lambda functions, S3 buckets, APIs, DNS records, alerts, DynamoDB tables, and everything else in AWS using TypeScript, Python, .NET, or Java. You can then connect these resources together and into more abstract groupings of resources and finally into stacks. Typically one entire service would be one stack.

class HelloCdkStack extends Stack {
  constructor(scope: App, id: string, props?: StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, 'MyFirstBucket', {
      versioned: true
    });
  }
}

CDK doesn’t exactly replace CloudFormation because it generates CloudFormation markup from your resource and stack definitions. But it does mean that if you use CDK you don’t really ever have to manually write CloudFormation ever again. CloudFormation is a declarative language, which makes it challenging and cumbersome to do simple things like conditionals, for example changing a parameter value or not including a resource when your app is being deployed to production. When using a typed language you get the benefit of writing IaC with type checking and code completion, and the ability to connect resources together with a very natural syntax. One of the real time-saving benefits of CDK is that you can group logical collections of resources into reusable classes, defining higher level constructs like CloudWatch canary scripts, NodeJS functions, S3-based websites with CloudFront, and your own custom constructs of whatever you find yourself using repeatedly.

The CLI for CDK gives you a set of tools mostly useful for deploying your application. A simple cdk deploy parses your stacks and resources, synthesizes CloudFormation, and deploys it to AWS. The CLI is basic and relatively new, so don’t expect a ton of mature features just yet. I am still using the Serverless framework for serious applications because it has a wealth of built-in functionality and useful plugins for things like testing applications locally and tailing CloudWatch logs. AWS’s Serverless Application Model (SAM) is sort of equivalent to Serverless, but feels very Amazon-y and more like a proof-of-concept than a tool with any user empathy. The names of all of these tools are somewhat uninspired and can understandably cause confusion, so don’t feel bad if you feel a little lost.

Sample CDK Application

I built a small web service to put the CDK through its paces. My application has a React frontend that fetches a list of really shitty websites from a Lambda function and saves them in the browser’s IndexedDB, a sort of browser SQL database. The user can view the different shitty websites with previous and next buttons and submit a suggestion of a terrible site to add to the webring. You can view the entire source here and the finished product at cursed.lol.

To kick off a CDK project, run the init command: cdk init app --language typescript.

This generates an application scaffold we can fill in, beginning with the bin/cdk.ts script if using TypeScript. Here you can optionally configure environments and import your stacks.

#!/usr/bin/env node
import "source-map-support/register";
import * as cdk from "@aws-cdk/core";
import { CursedStack } from "../lib/stack";

const envProd: cdk.Environment = {
  account: "1234567890",
  region: "eu-west-1",
};

const app = new cdk.App();
new CursedStack(app, "CursedStack", { env: envProd });

The environment config isn’t required; by default your application can be deployed into any region and AWS account, making it easy to share and create development environments. However if you want to pre-define some environments for dev/staging/prod you can do that explicitly here. The documentation suggests using environment variables to select the desired AWS account and region at deploy-time and then writing a small shell script to set those variables when deploying. This is a very flexible and customizable way to manage your deployments, but it lacks the simplicity of Serverless which has a simple command-line option to select which stage you want. CDK is great for customizing to your specific needs, but doesn’t quite have that out-of-the-box user friendliness.

DynamoDB

Let’s take a look at a construct that defines a DynamoDB table for storing user submissions:

import * as core from "@aws-cdk/core";
import * as dynamodb from "@aws-cdk/aws-dynamodb";

export class CursedDB extends core.Construct {
  submissionsTable: dynamodb.Table;

  constructor(scope: core.Construct, id: string) {
    super(scope, id);

    this.submissionsTable = new dynamodb.Table(this, "SubmissionsTable", {
      partitionKey: {
        name: "id",
        type: dynamodb.AttributeType.STRING,
      },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
    });
  }
}

Here we create a table that has a string id primary key. In this example we save the table as a public property (this.submissionsTable) on the instance of our Construct because we will want to reference the table in our Lambda function in order to grant write access and provide the name of the table to the function so that it can write to the table. This concept of using a class property to keep track of resources you want to pass to other constructs isn’t anything particular to CDK – it’s just something I decided to do on my own to make it easy to connect different pieces of my service together.

Lambda Functions

Here I declare a construct which defines two Lambda functions. One function fetches a list of websites for the user to browse, and the other handles posting submissions which saved into our DynamoDB submissionsTable as well as Slacked to me. I am extremely lazy and manage most of my applications this way. We use the convenient NodejsFunction high-level construct to make our lives easier. This is the most complex construct of our stack. It:

Loads a secret containing our Slack webhook URL
Defines a custom property submissionsTable that it expects to receive
Defines an API Gateway with CORS enabled
Creates an API resource (/sites/) to hold our function endpoints
Defines two Lambda NodeJS functions (note that our source files are TypeScript – compilation happens automatically)
Connects the Lambda functions to the API resource as GET and POST endpoints
Grants write access to the submissionsTable to the submitSiteHandler function

import * as core from "@aws-cdk/core";
import * as apigateway from "@aws-cdk/aws-apigateway";
import * as sm from "@aws-cdk/aws-secretsmanager";
import { NodejsFunction } from "@aws-cdk/aws-lambda-nodejs";
import { LambdaIntegration, RestApi } from "@aws-cdk/aws-apigateway";
import { Table } from "@aws-cdk/aws-dynamodb";

// ARN of a secret containing the slack webhook URL
const slackWebhookSecret =
  "arn:aws:secretsmanager:eu-west-1:178183757879:secret:cursed/slack_webhook_url-MwQ0dY";

// required properties to instantiate our construct
// here we pass in a reference to our DynamoDB table
interface CursedSitesServiceProps {
  submissionsTable: Table;
}

export class CursedSitesService extends core.Construct {
  constructor(
    scope: core.Construct,
    id: string,
    props: CursedSitesServiceProps
  ) {
    super(scope, id);

    // load our webhook secret at deploy-time
    const secret = sm.Secret.fromSecretCompleteArn(
      this,
      "SlackWebhookSecret",
      slackWebhookSecret
    );

    // our API Gateway with CORS enabled
    const api = new RestApi(this, "cursed-api", {
      restApiName: "Cursed Service",
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
      },
    });

    // defines the /sites/ resource in our API
    const sitesResource = api.root.addResource("sites");

    // get all sites handler, GET /sites/
    const getAllSitesHandler = new NodejsFunction(
      this,
      "GetCursedSitesHandler",
      {
        entry: "resources/cursedSites.ts",
        handler: "getAllHandler",
      }
    );
    sitesResource.addMethod("GET", new LambdaIntegration(getAllSitesHandler));

    // submit, POST /sites/
    const submitSiteHandler = new NodejsFunction(
      this,
      "SubmitCursedSiteHandler",
      {
        entry: "resources/cursedSites.ts",
        handler: "submitHandler",
        environment: {
          // let our function access the webhook and dynamoDB table
          SLACK_WEBHOOK_URL: secret.secretValue.toString(),
          CURSED_SITE_SUBMISSIONS_TABLE_NAME: props.submissionsTable.tableName,
        },
      }
    );
    // allow submit function to write to our dynamoDB table
    props.submissionsTable.grantWriteData(submitSiteHandler);
    sitesResource.addMethod("POST", new LambdaIntegration(submitSiteHandler));
  }
}

While there’s a lot going on here it is very readable if taken line-by-line. I think this showcases some of the real expressibility of CDK. That props.submissionsTable.grantWriteData(submitSiteHandler) stanza is really 👨🏻‍🍳👌🏻. It grants that one function permission to write to the DynamoDB table that we defined in our first construct. We didn’t have to write any IAM policy statements, reference CloudFormation resources, or even look up exactly which actions this statement needs to consists of. This gives you a bit of the flavor of CDK’s simplicity compared to writing CloudFormation by hand.

If you’d like to look at the source code of these Lambdas you can find it here. Fetching the list of sites is accomplished by loading a Google Sheet as a CSV (did I mention I’m really lazy?) and the submission handler does a simple DynamoDB Put call and hits the Slack webhook with the submission. I love this kind of web service setup because once it’s deployed it runs forever and I never have to worry about managing it again, and it costs roughly $0 per month. If a website is submitted I can evaluate it and decide if it’s shitty enough to be included, and if so I can just add it to the Google Sheet. And I have a record of all submissions in case I forget or one gets lost in Slack or something.

CloudFront CDN

Let’s take a look at one last construct I put together for this application, a CloudFront CDN distribution in front of a S3 static website bucket. I realized the need to mirror many of these lame websites because due to their inherent crappiness they were slow, didn’t support HTTPS (needed when iFraming), and might not stay up forever. A little curl --mirror magic fixed that right up.

It’s important to preserve these treasures

Typically defining a CloudFront distribution with HTTPS support is a bit of a headache. Again the high-level constructs you get included with CDK really shine here and I made use of the CloudFrontWebDistribution construct to define just what I needed:

import {
  CloudFrontWebDistribution,
  OriginProtocolPolicy,
} from "@aws-cdk/aws-cloudfront";
import * as core from "@aws-cdk/core";

// cursed.llolo.lol ACM cert
const certificateArn =
  "arn:aws:acm:us-east-1:1234567890:certificate/79e60ba9-5517-4ce3-8ced-2d9d1ddb1d5c";

export class CursedMirror extends core.Construct {
  constructor(scope: core.Construct, id: string) {
    super(scope, id);

    new CloudFrontWebDistribution(this, "cursed-mirrors", {
      originConfigs: [
        {
          customOriginSource: {
            domainName: "cursed.llolo.lol.s3-website-eu-west-1.amazonaws.com",
            httpPort: 80,
            originProtocolPolicy: OriginProtocolPolicy.HTTP_ONLY,
          },
          behaviors: [{ isDefaultBehavior: true }],
        },
      ],
      aliasConfiguration: {
        acmCertRef: certificateArn,
        names: ["cursed.llolo.lol"],
      },
    });
  }
}

This creates a HTTPS-enabled CDN in front of my existing S3 bucket with static website hosting. I could have created the bucket with CDK as well but, since there can only be one bucket with this particular domain that seemed a bit overkill. If I wanted to make this more reusable these values could be stack parameters.

The Stack

Finally the top-level Stack contains all of our constructs. Here you can see how we pass the DynamoDB table provided by the CursedDB construct to the CursedSitesService containing our Lambdas.

import * as cdk from "@aws-cdk/core";
import { CursedMirror } from "./cursedMirror";
import { CursedSitesService } from "./cursedSitesService";
import { CursedDB } from "./db";

export class CursedStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const db = new CursedDB(this, "CursedDB");
    new CursedSitesService(this, "CursedSiteServices", {
      submissionsTable: db.submissionsTable,
    });
    new CursedMirror(this, "CursedSiteMirrorCDN");
  }
}

Putting it all together, all that’s left to do is run cdk deploy to summon our cloud resources into existence and write our frontend.

Security Warnings

It’s great that CDK asks for confirmation before opening up ports:

Is This Better?

Going through this exercize of creating a real service using nothing but CDK was a great way for me to get more comfortable with the tools and concepts behind it. Once I wrapped my head around the way the constructs fit together and started discovering all of the high-level constructs already provided by the libraries I really started to dig it. Need to load some secrets? Need to define Lambda functions integrated to API Gateway? Need a CloudFront S3 bucket website distribution? Need CloudWatch canaries? It’s already there and ready to go along with strict compile-time checking of your syntax and properties. I pretty much never encountered a situation where my code compiled but the deployment was invalid, a vastly improved state of affairs from trying to write CloudFormation manually.

And what about Terraform? In my humble opinion if you’re going to build cloud-native software it’s a waste of effort to abstract out your cloud provider and their resources. Better to embrace the tooling and particulars of one provider and specialize instead of pursuing some idealistic cloud-agnostic setup at a great price of efficiency. Multi-cloud is the worst practice.

The one thing that I missed most from the Serverless framework was tailing my CloudWatch logs. When I had issues in my Lambda logic (not something the CDK can fix for you) I had to go into the CloudWatch console to look at the logs instead of simply being able to tail them from the command line. The upshot though is that CDK is simply code, and writing your own tooling around it using the AWS API should be straightforward enough. I expect SAM and the CDK CLI to only get more mature and user-friendly over time, so I imagine I’ll be building projects of increasing seriousness with them as time progresses.

If you want to learn more, start with the CDK docs. And if you know of any cursed websites please feel free to mash that submit button.

Is Software Contracting For You?

It was in the last great recession that I started doing contract software development, about 2008-2010. The bubble didn’t burst with as much force and shrapnel as in 2000, but it had a distinct dampening of the animal spirits of SOMA, San Francisco where all the startups lived.

It was somewhat by design. The previous job I had, writing Java and ActionScript for a marketing research company was chill enough but felt aimless. I had so little motivation I ended up coasting for a few months and playing a lot of pingpong. I wanted some new challenges, so decided to go freelance.

The first gig I got was off of Craigslist, although I imagine these days Upwork would be a better place to look for work. The project was very limited in scope; mostly adding a Google Maps visualization on top of some grant data for a nonprofit. It was a couple weeks of work, a couple grand, and time to move on.

Some time later a former colleague of mine hired me for some development work at a SF startup doing IP telephony combined with podcasting. This was 2009 so nobody had ever heard of podcasts and smartphones were still a fresh new technology people weren’t entirely sure what to do with. The gig was a fun challenge – enabling people to listen to and produce podcasts using only a telephone, no apps involved. I ended up developing some pretty innovative software that resembled a web framework but for touch tone and interactive voice response (IVR) phone applications. We successfully moved the application from an expensive managed solution to our own in-house platform that I designed, significantly cutting down operating costs.

After some time working on that project, I met a couple of guys who wanted to build a SEO-optimized directory of medical professionals, starting with Spanish-speaking plastic surgeons. I said I could cobble together a search engine in my spare time and was engaged. Some days I would walk the two blocks over from the small telephony company office to the small office housing the nascent doctor directory business and show my progress.

It seemed clear at the time that the podcasting telephony company while highly experimental, was professionally run. It had not one but two Stanford business school co-CEOs running it, with what amounted to a successful track record in the form of an early dot-com electronic greeting card company. Remember those? Weird shit. There were respectable investors, a small team of smart and highly competent professionals, and a beautiful office on Howard St. I recall situated on the floor below our office sat a little room consisting of no furniture save a well-stocked bar with never any people in sight, and a sign on the door reading “GitHub.” I felt like great things could happen.

In contrast, my side gig seemed like some small-time SEO hustling along with a smaller team and paycheck and no Stanford business vibes or major VC funding. I didn’t see anything wrong with that, and still tried to do a professional job, but it seemed like more of a dead end compared to the “real” engineering I was doing, fighting battles with touchy open-source PBX software and voice recognition grammars.

Then things turned out completely differently from what I expected. The doctor directory project kept growing, expanding, and taking on a life of its own. We got a proper office at 1st and Mission, hired an engineer, a designer, a salesperson. Plastic surgeons were mostly ditched, and now we were primarily helping American dentists establish a presence on this new “world wide web” technology they couldn’t quite wrap their heads around. This contacting gig that I imagined would consist of a couple months of basic work kept growing and there was always more work to do. Without any planning or expectations it turned into a real company, with eventually a staff of 25 talented and terrific people and an extremely respectable office in the Financial District. We built software to help all kinds of small medical practices in the US manage their patient communication, from appointment reminders to e-visits to actually useful online medical Q&A. Six years later we sold the company to a large practice management software firm in Irvine, CA.

Following that experience my business partner John and I started a new company together again, this time on purpose. We started hiring and training some of the best young engineers, taking on projects and filling outstaffing needs for our clients, staffing a couple offices in Eastern Europe until covid forced us to go purely remote. This has in effect scaled up from my original single-person consulting operation into a powerhouse team of crack young engineers ready to take on complex software projects.

The classic Silicon Valley VC-backed, Stanford-connected, hip startup went nowhere, and closed its doors. I got permission to open-source the IVR framework we built but little else came of it. As it happened, consulting across different clients helped me to gain a broader picture of what was actually possible and break my preconceived notions founded on image instead of substance. I accidentally ended up starting a company which went on to help make American health care just a little tiny bit less terrible, created a couple dozen jobs, and had the profound and unique experience of building up the software for a company starting completely from nothing up through due diligence and acquisition. Along the way I learned some lessons about software contracting I want to share with others who may be considering going rōnin and setting out on their own as freelancers.

Tradeoffs and Considerations

It’s my nature to think of everything in terms of trade-offs. Maybe because I have engineer-brain, or because I’m a libra, who knows. There are real benefits to consulting as opposed to being a full-time employee, but also some downsides.

Legal Concerns

In America at least, contracting means forming your own business, doing 1099 tax forms and racking up deductions, and drafting and reviewing contracts. It’s more effort and responsibility than being a full-time employee somewhere, as you’re now responsible for taxes and legal matters. Even if you’re not in America, being able to work across borders means creating a legal business entity.

I started by initially getting a DBA, or “Doing Business As” name, under which I could legally create contracts and other paperwork using an official-looking business name. Later on I “upgraded” to a California S-Corporation, which gives favorable tax treatment once your income reaches a certain threshold along with some legal liability protection. If your corporation is sued, all that can be collected usually is what the corporation has, shielding you personally to some degree. A Cali S-Corp will run you $800 a year not counting the time spent on paperwork and taxes you or your accountant/tax attorney will be doing.

Even getting a DBA or Fictitious Business Name is far from simple. In Contra Costa County for example:

Within 30 days after a fictitious business name statement has been filed, the registrant shall cause it to be published in a newspaper of general circulation in the county where the fictitious business name statement was filed or, if there is no such newspaper in that county, in a newspaper of general circulation in an adjoining county. If the registrant does not have a place of business in this state, the notice shall be published in a newspaper of general circulation in Sacramento County. The publication must be once a week for four successive weeks and an affidavit of publication must be filed with the county clerk where the fictitious business name statement was filed within 30 days after the completion of the publication.
https://www.ccclerkrec.us/clerk/clerk/fictitious-business-name/

You also need to take out an ad in a local paper announcing the new business and inform your county.

You don’t need to do this yourself necessarily, services like BusinessRocket can take care of business formation and taxes for a small fee.

One of the first things you’ll need to do is ask your friends for legal services recommendations. If you know anyone who is a contractor or a small business owner they probably have lawyers they work with and can recommend. Or you can hit me up. I enlisted the services of a business attorney to help me draft and review contracts, and a tax attorney to take care of the corporation taxes and paperwork. Obviously lawyers are not cheap and in theory you can do all of this yourself, but they can also save you a great time of time and money as well by warning you about common pitfalls and fuckups, and suggest ways to better protect yourself or take advantage of favorable tax laws.

When you are going to agree to do work for a client they will want to know your hourly or project rate and a contract to sign. Understand that around a third to a half of your hourly rate is going to go to taxes, so adjust accordingly. A contract will need to have some important pieces of information. I highly suggest not listening to me and listening to an Actual Lawyer in your country or state about drafting a contract, but as far as software development you will typically need to include a “schedule of work.” This is the scope of what you will be expected to deliver in order to get paid.

Come at the schedule of work with a PM mindset – you have to figure out what the client actually wants and what actually needs to get built. If you end up needing to do more work outside of this scope then it can be a headache, and having a contract which spells out what you’ve been asked to do can make an effective backstop against scope creep. As a related matter, I highly suggest doing per-hour billing rather than doing projects for a fixed price whenever possible, because as we all know the best laid o’ work statement often go awry.

Technology

Sometimes you have a particular skill or framework or vertical or other some such specialization, and you can look for relevant gigs. When I started out long ago mine was Perl, and I didn’t have a hard time at all finding work. The first nonprofit project I worked on was already using my favorite Perl web framework. Some jobs will leave it entirely up to you to create something from nothing, and you can have your choice of technology for solving the problem. Sometimes you will go work for a company with their own existing codebase that wants to expand it, fix existing problems, or throw it away and rewrite from scratch.

For me this is one of the most thrilling parts of contracting; getting to see how different companies operate. I’ve gotten an opportunity to look at a good amount of different codebases and operational setups. It gives you a broader view of the landscape, allowing you to borrow best practices others have landed on, and learn from the mistakes of others. I’ve gotten to encounter a lot of technologies I would not have otherwise run into, like seeing different implementations of microservices, getting really familiar with SNMP, and deconstructing a J2EE application. When you work for one company or for yourself for a long time, it can be hard to stay current or get experience with other technologies. When working with different companies, you can rapidly take in and observe various stacks that organizations have coalesced around, usually ending up with good practices. There’s an infinite combination of frameworks, languages, architectures, libraries, development environments, and security practices and the state of the art is always in flux. Having exposure to new assemblies of technology keeps you curious and informed and better able to make decisions for your clients and projects.

Most of our business consists of either building new projects for people, taking over existing projects, or joining existing teams. We get to experience not just code of course, but see how different organizations are run, different business models, all kinds of personalities and team dynamics. It can potentially open you up to a richer tapestry of experiences and cultures than working on the same product and team for years and years will.

The technology you encounter will vary wildly, from hipster web microframeworks to ancient enterprise Java. Being flexible and able to rapidly adapt and figure out the basics of a lot of different technologies is a very valuable skill, as is learning how to start up every kind of bespoke development environment out there. I sort of have a weird perverted dream of going around rewriting ancient COBOL applications for desperate businesses to run on modern serverless cloud-first architecture.

You never know where freelancing will take you.

Clients

The coolest thing about being a contractor is that you can be your own boss. You set your own schedule, work from where you want, and don’t technically have to wear pants a lot of the time. Maybe this is less of a big deal than it used to be thanks to the ‘rona but flexibility is definitely something I value a lot.

Of course in the end, everyone has a boss, you can’t escape from it. The CEO has to answer to the board and investors, the investors have to answer to their partners, the partners have to answer to funds, and so on. Your boss is now the client, since they’re the one cutting your checks now.

In my experience this has been a good thing. I can report honestly that I’ve enjoyed working with 100% of my past and present clients and things have on the whole gone very smoothly. Much of it comes down to choosing your clients. You will turn some people down because they are looking for someone with different skills, don’t pay enough, have a Million Dollar App Idea I Just Need Someone To Build It, are unprofessional, or just not cut out for the whole business thing. Just don’t work for these people. It’s okay to turn down work. Always act professionally though, no matter what situation you find yourself in. Your reputation absolutely follows you around, and taking pride in your work and professionalism is a requirement for being a freelancer.

Another skill that you must be consciously aware of and always seeking to improve is communication. Being direct, open, and transparent with your clients can often mean the difference between a successful project and one that ends up in a mess of assumptions and bad feelings. Underpromise and overdeliver is the mantra. Over-communicate, bring any concerns about the project to the fore, have regularly scheduled progress meetings when applicable, and do demos for your client. Focus on delivering something visible, something the client can look at and play with, so you can get feedback early. There will almost always be some gray area between what your client has in their mind and what you envision in your own, not just in designs but in all the details in the details.

Sometimes a client may come to you with detailed designs and specifications, but I’ve never been in a situation where all the information needed to deliver a project was hashed out up front. Most of the time very little of it is. You need to establish a good two-way street of communication and always be asking for feedback and clarification of ambiguities. Get a MVP in their hands as early as possible and iterate on it.

Other Skills

Consider also what skills you can bring besides just writing code. Familiarity and expertise in UX and design is very valuable and basically a requirement for most software jobs these days. We’ve had clients come to us to help them perform due diligence on codebases of companies that they are considering acquiring. We’ve performed security audits on codebases, sometimes unbidden. And thanks to our extensive commercial experience building successful companies we also can provide valuable consulting services on marketing, sales and raising capital.

Whatever extra you can bring to the table for your clients be sure to market it and build up your experience and knowledge in that area. It can make the difference between being just another coder and a valuable partner to your client.

Working on three laptops at once can be a valuable skill.

Getting Started

For many contractor-curious folks getting started may be scary or daunting.

If you’re a full-time employee, becoming a contractor means giving up some job security. You may not have a guaranteed paycheck for a while, if ever. Being a contractor, especially if you’re starting out and doing it alone, brings many uncertainties. However working as a full-time employee carries its own sort of risks. Job security isn’t what it used to be, spending time on bureaucracy and pleasing your superiors may not make you fulfilled, and there’s a real limit to how much you can accomplish for yourself as part of a larger organization.

There is comfort in working at a company; all you have to do is show up and be told what to do, you don’t have to think much about taxes or contracts, someone else can make a lot of the decisions about how to run the company. But if you want to take on more responsibility, have the opportunity to grow and learn to run your own business, and do things your way then consider becoming a contractor. Almost everyone I know who works for a company of any size will be happy to tell you all about the mistakes and boneheaded ideas of their superiors. Everyone has ideas of how the company they work for could be run better. I say if you really believe this then work for yourself.

So how to begin? You have two options: line up to your first gig, or join a company doing freelance work. You can put the word out to your network that you are available for work, or you can go look for jobs posted on sites like Upwork. There are also companies that specialize in doing contract work and are often looking for contractors to augment their pool of developers. If you go this route know that such companies may not always have work immediately lined up for you, but they may be happy to interview you and keep you on file in case some work comes up that matches your qualifications.

My suggestion is to do both; look at what’s out there, get a feel for what people are looking for and how much they’re offering and also check out companies that are doing the kind of contracting work you’re interested in and offer your services to them. There certainly is no shortage of work and job opportunities out there for contractors, it’s more a matter of finding a good fit for you that will be engaging work and well-compensated. Even if you start with some small simple jobs, they can definitely lead to greater opportunities as you gain more confidence, experience, references, and a better understanding of the market.

I would of course be remiss as a small business owner if I did not mention that our consulting company JetBridge is always looking for smart and talented engineers. If you’re thinking of becoming a contractor, feel free to drop me a line, and I might be able to help you get started or refer you to other work out there. I know it can be an intimidating career jump, but it can be extremely rewarding and full of new opportunities as well. And if the current tech bubble happens to pop again someday, it just might be a great time to try something new.

Communication Tips for Engineers

Communication is a fundamental skill for engineers. No one builds anything on their own. Whether participating in an open-source project or being employed to crank out code, you need to work with others. The value of effective communication skills cannot be overstated.

Even Linus Torvalds, a curmudgeon who will never lack work and whose word is law in the Linux community, has acknowledged the need to be more measured in his criticisms and more generous with empathy.

Soft skills, in the parlance of our times, are in vogue. Employers and potential collaborators will judge you based on your ability to lucidly communicate your thoughts in an agreeable and succinct fashion.

There are many great guides and books written about how to communicate effectively with other human beings. A lot centers around having empathy. Having an understanding of where someone is coming from, considering the information that they have that you don’t or vice-versa, and being respectful are basic tenets. I offer a few suggestions here:

Use interrogatives instead of declarations

Even when you are pretty sure of a fact you want to communicate to someone else, it is often better phrased as a question rather than a statement of fact.

“Why did you write it this way?” is infinitely preferable to “this code is wrong.”

If discussing a solution or implementation with someone, ask them what they think first, regardless if you already have a plan in your head. They may suggest what you are already thinking, may have thought about things you haven’t, and may bring up other good ideas. Especially for junior people; it helps them engage their critical thinking skills instead of learning to rely on you to provide the answer. It gives people more ownership and desire to defend their idea because it’s theirs.

Don’t argue forever over things

This is one I am especially guilty of, and it has to do with knowing when to concede a technical argument. Very often there are no clear right answers for how to proceed with a feature or fix, and a healthy discussion of the tradeoffs can be illuminating and help arrive at a reasonable solution.

However since many tradeoff estimations involve a lot of guesswork and feelings and intuition, the “best” answer may never be agreed on. At some point you have to agree and move forward, and different people may have ideas of when that time has passed. I know I’ve driven at least one person crazy by continuing past the point they considered to be entering the domain of diminishing returns.

Often the proper solution isn’t clear. Agreeing on a proposal, prototyping, and gathering more data is more fruitful than making people exasperated or arguing for hours. Once you sit down to actually try it, it may quickly resolve the debate.

Try to not talk too much shit on other people’s code

“Oho!” said the pot to the kettle; “You are dirty and ugly and black! Sure no one would think you were metal, Except when you’re given a crack.”

“Not so! not so!” kettle said to the pot; “‘Tis your own dirty image you see; For I am so clean – without blemish or blot – That your blackness is mirrored in me.”

Source

If there’s one thing engineers love to do it’s complain about languages, libraries, tools, operating systems, service providers, interfaces, APIs, containerization systems, people on mailing lists, etc… When your tools don’t work right it can cause you hours of frustration and confusion, which nobody enjoys.

A moment’s contemplation will recollect the vast amount of sub-standard, buggy, hacked together balls of mud that the experienced engineer has thrown together in the past. If you write perfect defect-free code then you should continue with your expressions of distaste for the inferior engineers out there making fools of themselves. If on the other hand, you realize you have made plenty of blunders of your own, consider going easy on the target of your ire.

I’ve seen many online communities devolve into an ever-smaller group of grumpy guys, mostly chatting about how much everyone else is stupid and sucks. These communities sap your soul. Inject some positivity into your world when you can.

Communicate proactively

People aren’t mind readers and they generally are focused on their own work and problems. If you have information that may be useful to others, don’t wait for them to come to you and ask for it. If someone hasn’t asked you for an update or provided you with what you need, reach out rather than suffering in silence.

Over-communicating in a team is almost always better than under-communicating. By letting people know what you’re doing you can help others prioritize, not duplicate work, know whom to ask questions, inform you if they’re making changes that could affect your work, and reduce the need for people to make assumptions.

Don’t assume

Assumptions are frequently foolish and rarely right. Often a brief message can save you time, making sure you don’t start off down a fruitless path. And by asking, you are communicating information about what you are working on and the fact that the knowledge you seek is not as widely shared or accessible as the owner of it may assume.

Comment and document what you’re doing and why

Nothing makes working with code others have written go smoothly like having comments liberally sprinkled throughout the source code. Whether it’s a new hire or yourself in two years when you’ve forgotten why you need to check that condition in that insane query, a brief summary of your logic and thinking at the time can help impart understanding and reduce the need for assumptions.

Thanks for reading!

Org2Blog

This is my first post published using the emacs-to-wordpress tool Org2Blog. It’s pretty nifty.

Decentralizing Social Media

No, it has nothing to do with blockchain.

What kind of language should Facebook forbid? What kind of regulations should the U.S. government promogulate regarding whom Twitter can ban?

☞ Who cares?? ☜
Not me.

A depressing amount of energy and ink is wasted on these questions which shouldn’t even be issues in the first place. We don’t have to base our public discourse on platforms that corporations or even governments control.

The great news is that there does exist an alternative to the model of having all social media content go through a couple of companies. There is certainly no technical reason it should work that way, and there is a solution to the problem that has a foundation in technology, though there is naturally a social component as well.

What is this problem that needs a solution? I think it’s fantastically illustrated by all of these articles and experts and laws being passed to try to nudge Facebook, Youtube, Twitter to control what people are allowed to say and post. Busybodies, Concerned Citizens, corrupt politicians, think tanks, your parents, all want to petition these platforms to decide what you should be able to read or write. I view this as a problem, because I don’t think anyone should decide for me what information I should be able to share or consume. Not Mark Zuckerberg, not Donald Trump, not Jack Dorsey, not my congressional representatives, not the People’s Republic of China.

Government and private corporations in control of censorship are not the only problem here. As everyone knows these services are free, and as everyone also knows if the product is free then you are the product. Facebook and Google make almost all of their money from extracting and mining as much personal data about you as possible to sell to advertisers, PR agencies, and politicians. There is a better way.

Federated Social Media

The answer is federation. Decentralization. Distributed systems. You’re already familiar with the concept, just think of email. You don’t have an email username, you have an address. Your email address is a username on a host – mspiegelmock@gmail.com specifies the user mspiegelmock on the system gmail.com. I can write an email to someone else like rms@gnu.org, even if they don’t use gmail. I ask my provider gmail.com to send a message to the gnu.org host which is responsible for delivering it to the user rms.

No company “runs” email, yet all email servers know how to pass messages to each other. There is a vast array of different email hosts, providers, server applications and client apps. You can choose to sign up with a free provider like Google or Microsoft, your employer may provide you an account, or you can run your own server. You can use any sort of app you like with email, such as gmail.com, Apple Mail, Superhuman, Outlook, mutt, or emacs. In the earlier days of web-based mail there were a few options to choose from, like Yahoo and Hotmail, and eventually the company which provided the best user experience ended up grabbing a significant slice of email users, thanks to the wonders of competition. There were and are some issues with spam and malicious content to be sure, though a great amount of progress has been made on systems to combat it (spam and virus filters, real-time blacklists, DKIM/SPF). This is what federation looks like.

To grasp the concept behind federated social media, think of email. You sign up for an account with an instance (“host”) that you feel comfortable with, or run your own if you’re so inclined.

To follow someone you need their address, like @wooster@social.coop.

Think Twitter, but as an email address. The address denotes the username (@wooster) and the host the user is registered on (social.coop).

You have two timelines, in addition the people you follow. One timeline is the “local” timeline, which is everyone else on your instance. If you join an instance of people that share a particular hobby, language, interest, region or philosophy you get to start out with a feed that may have posts that may be relevant for you. Your instance can link, or “federate”, with other similar instances, connecting users on your instance with users on the other instances.

Moderation

Why not run a whitehouse.gov instance instead?

Just because there is no central authority for content moderation doesn’t mean that the system is full of abuse and Those Sorts Of People you would like to avoid. These things exist to be sure, as they do on any platform, but they are confined to their own instances. Moderation does exist, but unlike Facebook or Twitter you can choose your moderators. Most instances have policies about what external content they block, what types of instances they want to federate with, and what kinds of content they permit. If you disagree with their policies, you are free to join an instance that fits with your preferences, or start your own.

There are plenty of people I don’t want to hear from, there are plenty of posts our there that would decrease my quality of life, and I’m fine with outsourcing some moderation. I just don’t want this guy to be the final arbiter of all information.

Propaganda, trolls, abuse, and misinformation exist on every platform. You can find it on YouTube, LiveJournal, TikTok, Twitter, and no doubt on federated social media. Media literacy is an important skill that should be taught to help media consumers understand biases and distortions inherent in all media. A platform that helpfully provides fact-checking would be desirable to many users. But the fact remains that you cannot outsource critical thinking. There’s no getting around this.

The problem with top-down centralized structures that take it upon themselves to decide what information can or can not be spread should be plain. Corporations and politicians have bad incentives and the temptation to misuse such power to cover up misdeeds is too powerful for most to resist. We all know now what happened with Chernobyl and the misery caused by the suppression of information. Maybe if we had some supremely enlightened and benevolent information despot it would be okay to put them in charge, but I can’t really think of anyone I want to grant that authority over me.

The instance I belong to is social.coop, a social media cooperative. It’s a group of people who donate a small amount of money to pay for a server to host a Mastodon instance and volunteers who help maintain and administer it. There is an online forum for discussions and consensus-based decision-making, and a lot of smart people on it. This is just one example of the kind of self-organization that is possible in the fediverse.

Mastodon

Today the most popular software for plugging into the federated social media network is Mastodon. It’s free (AGPL) and open source naturally, and there are a number of apps you can use with it, including some slick paid apps. The Mastodon web interface looks something like this:

And the “Toot” app looks like this:

Many communities run Mastodon instances, some are public, some are cooperatives, some are private. You don’t have to use Mastodon to talk to people using Mastodon, if you use software that speaks ActivityPub then you can follow, share, post, like, comment, and communicate with anyone else in the fediverse. Just like if you use any email software, you can email anyone else using email software, which I think is pretty neat.

Mastodon is federated social media, federated social media is not Mastodon.

Technical Details (ActivityPub)

In the early days there were a number of attempts at creating social networking protocols with really obnoxious names like “pubsubhubbub.” After a bit of experimentation an official standard was published by the World Wide Web Consortium (W3C) in 2018, going by the name ActivityPub. You may know the W3C from their earlier hits like HTML, CSS, XML, and SVG.

As it’s an extremely recent standard, we’re still in the very early days of implementations. I expect there will be a number of libraries and clients popping up, along with not just standalone servers but server capabilities integrated into existing platforms and sites. Any site that lets you log in and post content could be modified to plug into the fediverse by implementing the relatively simple, JSON-based protocol. Your existing accounts could turn into ActivityPub Actor objects. New social networks can in effect bootstrap themselves by leveraging existing users, software, and federation networks, and we’ll see companies and open source projects compete to offer the best user experience.

Honestly the best way to learn about ActivityPub is to go read the standard doc. It’s in plain English with friendly cartoons and very straightforward.

Actor with messages flowing from rest of world to inbox and from outbox to rest of world

The very brief gist of it is that there is a client ↔︎ server protocol and a server ↔︎ sever protocol, much like IRC. You can read notifications that arrive in your inbox and you can publish messages to the world via an outbox. All content and objects on the platform are simple JSON documents that live at a URL (an IRI to be precise). Technically you could make a compliant ActivityPub server with a static webserver (I think? Tell me if I’m wrong).

Social media isn’t just text and image posts. The site PeerTube is a decentralized version of YouTube:

PeerTube is a free and open-source, decentralized, federated video platform powered by ActivityPub and WebTorrent, that uses peer-to-peer technology to reduce load on individual servers when viewing videos.

User Base

The network effect is what makes a social network attractive. The more people on it, the more useful it is. Having celebrities is a big draw for many people. It’s the biggest challenge that any challenger to the status quo faces.

I believe there is nothing permanent about Facebook or Twitter. I remember when it was hard to imagine anything replacing MySpace, or when everyone on the Russian-speaking internet had a LiveJournal. Fads change and great masses of people move smoothly to new platforms with ever increasing rapidity.

https://the-federation.info/ – Federation Stats

The thing that is so powerful about a social media protocol is that it seems, at least to me, like the logical conclusion of social media. Once a growing critical mass of people move to it, either because they are sick of Facebook’s shit or some cool new company’s platform happens to be powered by ActivityPub, most of the innovation around communities, software, moderation, new forms of media, organization and technology can happen within the fediverse. Because of the open and extensible nature of the underlying foundation anyone can plug in a conformant piece of software and shape their piece of it how they want while still interoperating with the wider world. It’s a perfect vehicle in which one can imagine fictional visions of the future internet taking shape, like the Metaverse from Snowcrash or the VR Net from Otherland.

https://pawoo.net – Japanese Mastodon with 625,000 users and 45 million posts

The internet is a decentralized collection of autonomous systems held together by communities defining standards and protocols. Distributed social media maps very nicely onto the architecture of the internet and promotes freedom of expression and experimentation with new forms of media, social organization, and technology. Doubtlessly it will create new problems as well, like intensifying the internet hypernormalization effect. It will likely be some time before norms and robust community structures are figured out at scale. The ability for anyone to participate in the wider social community within bounds and parameters they set for themselves will be messier but ultimately more powerful and driven by more positive incentives than the current social media monopolists.

Links

If you want to give Mastodon a spin you can check out joinmastodon.org.

To learn more about ActivityPub read the spec (really, it’s not dry). More info about the types of Actors, Activities and Objects on ActivityPub can be found here.

PeerTube homepage explaining federation of video content.

Why does decentralization matter? Mastodon op-ed.

The-federation.info – statistics and open source fediverse projects.

Excellent ActivityPub Conf Talk on this topic, from September 2020

Python 2020: Modern Best Practices

Python and related tooling continues to progress and evolve. I’d like to share some of the tools and practices we’re using at JetBridge to develop python web applications.

This is by no means an exhaustive account or a definite list of all best practices, and I hope readers will share what’s working well for them so I can learn and incorporate that knowledge. I don’t know about everything out there but I can at least present a survey of what we’ve been using on multiple projects with success.

Python

Let’s start with… python. As of January 1st, 2020 python 2 support was officially discontinued. If you are still maintaining any python 2 code you are using the language equivalent of Windows XP. Not only is python 2 no longer receiving security updates but now all python module authors will feel comfortable dropping any support for python 2 in any future versions of their modules, which means your dependencies are unlikely to receive security updates as well. Using python 2 is now a legitimate security risk.

Python 3.8 is out. What’s new in it?

The “walrus operator” := allows you to initialize a variable as part of any expression and save a line or two of code. The battle in PEP 572 over getting this operator included in the language was so unpleasant that it caused Guido van Rossum to ragequit his Benevolent Dictator For Life of Python role.

__pycache__ directories are now managed out-of-tree so they stop polluting your deployments and source control.

New additions to python’s type system – TypedDict lets you define the shape of a dictionary type, Literal lets you easily construct literal value constraints such as for enumerated value options, and at long last we have built-in support for structural subtyping, also known as Protocols.

F-string debug syntax – now instead of writing:

print(f"blorp={blorp}")

You can write:

print(f"{blorp=}")

Which is terrific news for those of us who will continue using print statements to debug until the day we die.

Python 3.9 is expected out in October 2020.

Linting and Formatting

Keeping your code neat and formatted can really help with readability and enforcing a consistent style. The tooling can also help catch potential bugs or mistakes. Here’s what we’re using:

Flake8 – Classic Linting Tool

Run as a pre-commit hook or in your CI flow. We suggest installing and enabling the plugins:

tox.ini configuration:

[flake8]
ignore = E305,E402,E501,I101,I100,I201
max-line-length = 160
exclude = .git,__pycache__,build,dist,.serverless,node_modules,migrations,.venv,.bento
enable-extensions = pep8-naming,flake8-debugger,flake8-docstrings

Mypy – Type Checking

Mypy performs the useful function of type-checking, to the extent one can in python. It does on some occasions catch useful errors for you and is improving as time goes on. Still, the usefulness of python’s bolted-on type system afterthought is limited compared to say, any other typed language.

If you are adding it to an existing project with many dependencies you may need to add ignore_missing_imports = True to your mypy.ini configuration file until you can resolve all of the warnings you’re going to get.

Bento – Static Analysis

Bento is a very new tool that attempts to be sort of a meta-linter, combining a number of different checker tools into one, most notably Bandit, a “Security oriented static analyser for python code.” It’s designed to integrate into git hooks and CI workflows relatively easily. It’s still quite new and not super mature yet but this is definitely a tool to keep your eye on. The analysis engines are open source and provided for free, though the company behind it is working to offer paid features for larger teams.

Black – Formatting

Black is a brutal and fantastic code formatter, much like prettier for python. It can be run as a pre-commit hook to make sure your code is formatted correctly, or you can have your editor run it automatically on save (my preference). It is technically possible to modify the formatting rules but there is no reason you should ever do that. Just enable it, always run it on every changed file, and never worry about 97% of code formatting issues ever again.

Workflow Integration

People are of differing opinions on whether you should add these tools into your editor, git hooks, or CI pipeline. Personally I have all of these tools hooked into my editor (mostly spacemacs but giving PyCharm a try) and love having my code formatted upon saving and seeing type errors inline in my code. This is definitely the best way to develop but it doesn’t enforce any standards in your team. Maybe you can always expect the people working on your project to have their editors configured correctly but this is mostly unrealistic for most teams.

You can add it as a pre-commit (or pre-push) git hook, which ensures everything is run before it goes to CI. The downside is this can add extra setup steps for the project or greatly increased execution time for common git commands.

Another option is to run all of your checks in CI and let developers be responsible for committing code that is correct or suffer failed tests. I have CircleCI configured to install dependencies and then run the checks as separate jobs in parallel.

And these options are not mutually exclusive. You can totally do all three together.

Testing

Switching away from unittest.TestCase and lots of custom helper functions to create objects in favor of pytest fixtures and factoryboy made testing vastly more pleasant, especially when writing tests that talk to the database.

Our setup for writing tests that interact with Flask and SQLAlchemy is to set up fixtures with factoryboy which helps you declaratively write fixture factories for all your database models and pytest-factoryboy which lets you register your factories as pytest fixtures. The plugin pytest-postgresql allows easy creation of a PostgreSQL database for running tests and pytest-flask-sqlalchemy patches in a mocked database session (or sessionmaker or engine if you need them) during tests that ensures each test runs in a subtransaction. Subtransactions (aka SAVEPOINT) allow you to run each test isolated in its own transaction and all changes are rolled back at the end of the test. This allows each test to be invisible to any other test or transaction and also to have all database changes cleaned up automatically. This is the most efficient way to run database tests with a high degree of reproducibility to how your application will be running for real.

There are a lot of pieces here but they fit together beautifully in the end. Your test setup may look something like this:

myapp/db/fixture.py – where we like to define database factories. These can be used for populating development environments and tests with sample DB rows.

from faker import Factory as FakerFactory
import factory
from jetkit.db import Session  # see https://github.com/jetbridge/jetkit-flask/blob/e3fc3448933ffbfb573cc1dfc873364cd17d4aca/jetkit/db/__init__.py#L10

faker: FakerFactory = FakerFactory.create()

class SQLAFactory(factory.alchemy.SQLAlchemyModelFactory):
    """Use a scoped session when creating factory models."""

    class Meta:
        abstract = True
        # by providing access to our current sqlalchemy session the factory can automatically 
        # add newly-created objects to the session (i.e. insert into the DB)
        sqlalchemy_session = Session


class UserFactoryFactory(SQLAFactory):
    """Base class for user factories with common fields."""
    class Meta:
        abstract = True

    dob = factory.LazyAttribute(lambda x: faker.simple_profile()["birthdate"])
    name = factory.LazyAttribute(lambda x: faker.name())
    password = 'my-default-pw!'
    avatar_url = factory.LazyAttribute(
        lambda x: f"https://placem.at/people?w=200&txt=0&random={random.randint(1, 100000)}"
    )


class NormalUserFactory(UserFactoryFactory):
    """Create a user with type=Normal."""
    class Meta:
        model = NormalUser

    email = factory.Sequence(lambda n: f"normaluser.{n}@example.com")

This sets us up with a factory that can produce NormalUser objects. In our setup we use SQLAlchemy polymorphism to distinguish between different user types with different model classes and the UserFactoryFactory (how very enterprise) gives us a base class to quickly define factories for each type of user model.

myapp/test/conftest.py – place to add fixtures made available to your tests. Documentation on these fixtures is provided here.

from myapp.db.fixtures import NormalUserFactory
from pytest_factoryboy import register

register(NormalUserFactory)

This register helper function takes our factory and creates two pytest fixtures out of it. One fixture will be called normal_user which will always return a user object in our DB session, created on demand once per test. The other fixture will be normal_user_factory which will accept arguments to override the factory defaults.

Next we set up fixtures for database, app, and our DB session:

@pytest.fixture(scope="session")
def database(request):
    """Create a Postgres database for the tests, and drop it when the tests are done."""
    with DatabaseJanitor(DB_USER, DB_HOST, DB_PORT, DB_NAME, DB_VERSION):
        yield

This provides a new database for the entire test session – it’s only created once and dropped when everything is finished.

@pytest.fixture(scope="session")
def app(database):
    """Create a Flask app context for tests."""
    # here we pass in config overrides to our create_app
    app = create_app(config=dict(SQLALCHEMY_DATABASE_URI=DB_CONN, TESTING=True))

    with app.app_context():
        yield app

The above code provides us with a Flask app and context for the duration of the entire test session. You can push a new context for each test if you like (remove the scope fixture argument) but I’ve never needed to do this.

@pytest.fixture(scope="session")
def _db(app):
    """Provide the transactional fixtures with access to the database via a Flask-SQLAlchemy database connection."""
    from myapp.db import db
    db.create_all()
    return db

This is the magic hook to provide our database session to pytest-flask-sqlalchemy. We need to provide the package of our SQLAlchemy instance to our pytest configuration in tox.ini:

[pytest]
# mock sqlalchemy database session during testing
mocked-sessions = myapp.db.db.session

Now we can define a fixture for a HTTP client to talk to our app:

@pytest.fixture
def client(app, normal_user):
    # get flask test client
    client = app.test_client()

    access_token = create_access_token(identity=normal_user)

    # set environ http header to authenticate user
    client.environ_base["HTTP_AUTHORIZATION"] = f"Bearer {access_token}"

    return client

This fixture has a dependency on two other fixtures; app and normal_user. We defined the app fixture just above, and the normal_user fixture is automatically added for us by the pytest_factoryboy register helper.

So now that we have a client fixture and a normal_user fixture, we can write very straightforward tests for API calls. Suppose we want to test a user API:

 def test_user_api(client, normal_user):
    response = client.get("/api/user/0")
    assert response.status_code == 404

    user_response = client.get(f"/api/user/{normal_user.id}")
    assert user_response.status_code == 200
    assert user_response.json.get("id") == normal_user.id

The simplicity and compactness of this test is striking. We don’t have any test cases, we define our dependencies in the function arguments, we use straightforward assert statements to check our responses. The test runs in an isolated subtransaction, dependency injection is performed to load the complete dependencies for this particular test, and it couldn’t possibly be any cleaner.

If you’re curious why we’re doing a simple assert here and not something like self.assertEqual() the answer is that pytest overrides the built in assert function with a more test-friendly and powerful version. You will still receive output exactly as you would expect from any test framework if the assertion fails. See the pytest documentation for more details.

Virtual Environments ﹠ Dependencies

The most modern tool for managing dependencies and virtual environments is Pipenv. It’s a bit more npm-style than venv or virtualenvwrapper, with a lockfile, split dev dependencies, and environment management via command line instead of sourcing anything in your shell. It saves the virtual environment files away out of tree.

The downsides for Pipenv are that it is frankly super slow and there hasn’t been an official release in over a year despite very active development. I hope that a faster new release will come out sometime soon.

Pipfiles are the future, no reason to be using requirements.txt anymore.

One more feature that may be of interest to some is the ability to define multiple sources in a Pipfile. If you have certain dependencies that need to be pulled from an internal package index server for example, you can define that source for only those dependencies instead of having to globally change your pypi mirror.

Web Framework

Some of the popular modern web frameworks are Django, Flask, and Falcon.

Django

Django is a pretty heavy solution but has the benefit of everything being set up for you. It’s not a tool I reach for because I normally only try to create lightweight API servers, with little to no server-side rendering of HTML, and I don’t find Django as suited to a serverless architecture as something more lightweight.

Flask

Flask has been our go-to tool for years. It gives you a basic core into which you can plug in components and features as needed. The setup involved in creating the perfect enterprise-ready Flask app from scratch is considerable and takes some experience to get right on your own. The flexibility and ability to craft an application perfectly suited to your needs is invaluable for serious projects, and the simplicity and whipupitude makes it perfect for dead-simple services too.

I’ve written at length about writing serverless web applications with Flask:

Falcon

If Flask is too heavy for you, there’s the Falcon microframework. If you’re writing a web service for a system with 64k of RAM and it’s not talking to any database or external services and the CPU overhead of handling HTTP requests and responses is the main bottleneck, Falcon may be a good choice. Their documentation really emphasizes how fast it is. I don’t think your web framework is usually the primary concern when it comes to speed but doubtless there are situations where this is needed.

Digression: Request Globals

<Digression>

There is one funky aspect of how Flask provides access to the current “app” context and the current request context that bothers or confuses some people. There exists an instance of your web application that contains configuration, routes, error handlers, and extensions that comprise your app. When your app is started up a new “app context” is pushed onto the app context stack to keep track of what app is currently active:

from myapp import app
with app.app_context():
    do_stuff_with_my_app()

In any code running inside of this context, you can access the current application.

from flask import current_app
def do_stuff_with_my_app():
    print(current_app.config['SOME_KEY'])

What’s important here is that current_app is a context variable proxy, which you can treat like a global variable but actually belongs to a context stack and is thread safe. Typically you only need to deal directly with pushing an app context if you’re writing scripts or wrappers that utilize your Flask app instance.

A similar approach is used for the current request context. When your Flask app is running (inside an app context) and a new request comes in, a new request context is pushed onto the request context stack to keep track of the request and request-local variables.

So whereas in many web frameworks like node’s Express you get passed in request and response objects as part of your handler:

app.post('/', function(request, response) {
  console.log(request.body);
  response.send(request.body);
})

Or in python’s Falcon:

import json
import falcon
class Resource(object):
    def on_post(self, req, resp):
        body = json.load(req.stream)
        print(body)
        resp.body = body
        resp.status = falcon.HTTP_200

In Flask one might write:

from flask import request
@app.route("/", methods=["POST"])
def app_index():
    body = request.get_json()
    print(body)
    return body

Again, request looks somewhat like a global variable but in reality it is a proxy object to a thread-local object on a context stack. The request is pushed automatically for you by Flask when the request comes in, so you mostly don’t have to know or care about manipulating this stack, unless you are writing some of the more exotic kinds of test cases.

This global-seeming access to context may feel dirty to some, likely conditioned by a healthy aversion to global variables or “god-objects” because of thread safety issues, poor code organization, and the inability to grapple with multiple instances of such objects simultaneously in the same program. These are valid concerns that the LocalProxy objects and context stacks effectively mitigate, while still providing a simple and convenient method to access the instances as needed from anywhere in your codebase, with the only caveat that you are responsible for pushing an app context if you are doing something outside the normal request flow.

I confess that the appeal of this approach was not obvious to me until I tried building a Flask app that talked to a database without using the Flask-SQLAlchemy extension. This extension integrates SQLAlchemy (an ORM) sessions with the Flask contexts so you can always easily access a database session that is local to the current request and transaction, or linked to your app context if not inside a request.

The real value of these context variables comes when you try to modularize your code and database routines. One problem that this solves is when you have a database transaction started inside a request, and then you call into some other code which may call other code which performs queries that should be inside the same transaction, as in a typical atomic operation that a RESTful endpoint might do. Somewhere you must retain a database handle to this operation, and expecting it to be passed through every function that might conceivably call another function that might perform a query is not feasible or clean. Being able to simply import a database session object that is automatically scoped to the finest level of application work you are performing (i.e. to the current request, or not) and assume it belongs to the current database transaction is a truly simple and elegant solution.

This approach has been recognized as a useful tool and in fact in python 3.7 gained first-class support in the form of contextvars from PEP 567. Opinions certainly may differ on the purity and magical-ness of this mechanism but I consider the simplicity and accessibility it affords to be the stronger argument. And given that it is now enshrined in python core means it is unlikely to go away anytime soon.

</Digression>

Putting Into Practice

If some of these ideas sound just splendid to you and you want to try them out, by all means give them a spin. If you’re looking to incrementally adopt new tools and features to your codebase implementing each of these suggestions independently should be manageable. However if you’re starting a new project or want to maximally embrace JetBridge style, it’s a daunting task to configure and wire up all of these practices into a well-organized and clean template. Honestly, setting up the database tests and Flask extensions is tedious. I’m lazy and don’t feel like doing it for new projects. That’s why we’ve created an open-source app starter kit and utility library for rapidly building modern, enterprise-ready python web applications with all of these practices and many more baked in and ready to go. Sort of a Create-React-App (we have one of those too) for our very opinionated python web service setup where we can put these recommendations into practice and save ourselves time setting up each new service.

sls-flask

Our starter kit is called sls-flask. It generates a Flask app skeleton with pytest fixtures, RESTful APIs and serialization, database factories, linting, authentication and more in a serverless-first package. It utilizes our handy JetKit-Flask python library that provides common database utilities (soft delete, upsert, UUIDs), S3 asset support, starting points for authentication and user access and other bits of functionality we’ve found useful in many projects.

Updates:

It’s February and already this article is out of date!

I tried setting up Github Actions for CI and boy is it a lot easier to configure than CircleCI. Highly recommend giving it a shot.

Poetry and pipx have been suggested as alternatives to pipenv for package management and running python applications (think npx) and I definitely plan to give these tools a closer look for my next project.

Also async python programming is becoming much more popular now. Some favorite servers and libraries mentioned were: starlette, fastapi, aiohttp, httpx, databases, Gino, asyncpg. At the application server level I’m not sure there is a big benefit to using asynchronous invocation handling if you’re already using functions as a service, but there are absolutely benefits to be had for making asynchronous calls to external services and database queries within a function invocation to parallelize requests.

Serverless WebSockets

WebSockets, the standard for doing real-time bidirectional communication typically between a browser and a server, is a fair attempt to create a standard to supplant the previously employed hacky solutions and continues to evolve in terms of implementation.

The basic idea has primarily been to establish some sort of channel in which a server can “push” events to a client, rather than the client “polling” every so often to see if there is new information. This was until fairly recently a relatively obscure concept, but now any smartphone owner is extremely well-acquainted with push notifications. This real-time channel has been used for not just notifications but also services like VOIP and gaming.

In the days before the WebSocket standard various semi-clever attempts to implement push notifications were devised. The first was using <iframe>s to load an HTML document using chunked encoding, where the server would write a script tag with some new data in the form of JavaScript commands when the data became available. When the browser encountered a closing script tag it would execute the JS immediately even though the document was still streaming.

The next scheme was using XML HTTP Request (aka XHR [aka AJAX]) to do something similar but without needing an <iframe>. This was known as “long-polling”, or “comet.” This was still mostly a unidirectional channel and suffered from timeouts and reconnection issues with potential race conditions.

Now with WebSockets we have a much improved system and wide browser support. But what about the backend? What happens when a browser or other client connects to a WebSocket server?

Previously we’ve developed and hosted WebSocket servers written in Perl, Go, and Python, using PostgreSQL asynchronous events as the message passing system. Deploying WebSocket servers is not as straightforward as HTTP servers because of the long-lived connections and having to perform TCP load balancing. Depending on your hosting setup you may have to deal with internal timeouts or getting events from your message bus to the right backend via some subscription mechanism.

Architecture

Since I love not running servers I’ve been excited about the chance to use serverless WebSockets via AWS API Gateway. In this new scheme you define Lambda functions that react to events such as authentication, connect, disconnect, and user-defined events that can be read from JSON message bodies.

Infrastructure-wise the setup is extremely basic. All of the real work to handle authorization and events and done in code, which we will look at shortly. Let’s use a concrete example of a typical WebSocket use case – sending notifications from the server to the client to inform it of some data change in order for the client to update some information in real time or notify the user.

For my application I created an authorizer function that validates a JWT encoded in the WebSocket URL query parameters (there is no good way in a browser to set headers when opening a WebSocket connection). This function denies or grants access to proceed and saves the authenticated user ID in the principalId response field, which is passed along to subsequent event handlers.

Once the authorization check is successful the special $connect route is called if there is a handler defined. In this handler we have the user ID in the invocation event passed along from the authorizer response and we have a connectionId. We save this user ID and connection ID pair in our database so that we can know who is connected and have the ability to send them a notification later on using their connectionId.

The API Gateway makes a best-effort attempt to detect disconnections and invokes the special $disconnect route whereupon our handler removes the connection record from the database.

Putting all of these pieces together with actual working code required me gathering a fair bit of information from different sources and working out the proper request fields and response formats but it all worked out wonderfully in the end. I’d like to share the working code examples for the handlers and some sample client code as well.

The Code

To define your handlers and when they get invoked you need to configure API Gateway to register your authorizer handler and the assorted route handlers. Using the Serverless toolkit this is straightforward and nicely documented. My configuration looks something like:

functions:
  # websocket authorizer
  wsAuth:
    handler: notifier.ws.handler.authorizer

  # websocket $connect
  wsConnect:
    handler: notifier.ws.handler.connect
    events:
      - websocket:
          route: $connect
          authorizer:
            name: wsAuth
            identitySource:
              - route.request.querystring.token  # token query param

  # websocket $disconnect
  wsDisconnect:
    handler: notifier.ws.handler.disconnect
    events:
      - websocket:
          route: $disconnect

And the authorizer:

def authorizer(event, context):
    method_arn = event.get("methodArn")
    def deny(msg):
        return {"message": msg,
                "policyDocument": gen_policy(method_arn=method_arn, allow=False)
        }

    # get access token from query string
    query_params = event.get("queryStringParameters")
    if not query_params:
        return deny("missing queryStringParameters")
    if "token" not in query_params:
        return deny("missing token in query string")
    token = query_params["token"]
    if not token:
        return deny("empty token")

    # decode and verify JWT token
    decoded = None
    try:
        decoded = decode_token(token)
    except ExpiredSignatureError:
        return deny("Expired token")

    identity = decoded.get("identity")
    if not identity:
        raise Exception("invalid JWT; missing identity")

    # allow access
    policy = gen_policy(method_arn=method_arn, allow=True)
    context = {}  # can add more auth context info here if desired
    res = {
        "principalId": identity,
        "policyDocument": policy,
        "context": context
    }
    return res

def gen_policy(method_arn: str, allow: bool):
    effect = "Allow" if allow else "Deny"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Action": "execute-api:Invoke",
            "Effect": effect,
            "Resource": method_arn
        }],
    }

This looks for a JWT in the query string and attempts to parse and validate it. If successful then an IAM policy is returned along with the decoded identity ID. The details of the event and policy can be found in the Lambda REQUEST WebSocket authorizer documentation.

If the client is granted Invoke access to the execute-api service then API Gateway will call our $connect route next:

def connect(event, context):
    ctx = event.get("requestContext", {})
    # get user and connection id
    conn_id = ctx.get("connectionId")
    auth = ctx.get("authorizer", {})
    user_id = auth.get("principalId")

    if not user_id:
        return make_response(401, "Not authorized")

    if not conn_id:
        raise Exception("missing connectionId")

    # save the connection id/user id pair in DB
    WebsocketClient.save_connection(
        user_id=user_id,
        connection_id=conn_id,
        domain_name=ctx["domainName"],
        stage=ctx["stage"],
    )
    db.session.commit()

    return make_response(200, "ok")

def make_response(status_code, body):
    if not isinstance(body, str):
        body = json.dumps(body)
    return {"statusCode": status_code, "body": body}

The purpose of this route is to store the user ID and connection ID in the database along with the connection’s domain and stage. We will use this to send our notification to the client.

def send_ws(user_id, message):
    """Push a notification to the user if they have an active websocket connection."""
    connections = WebsocketClient \
        .query \
        .filter_by(user_id=user_id) \
        .all()

    for conn in connections:
        conn.send(message)

And conn.send():

import boto3
import json
from notifier.db import db, Model
from botocore.exceptions import ClientError

class WebsocketClient(Model):

    ...

    def send(self, message):
        """Send a message to an active connection.

        :param message: can be anything that is JSON-serializable."""
        # get APIGW management client
        apigw_mgmt_client = boto3.client(
            "apigatewaymanagementapi",
            endpoint_url=f"https://{self.domain_name}/{self.stage}",
        )
        try:
            # send message
            apigw_mgmt_client.post_to_connection(
                Data=json.dumps(message).encode("utf-8"),
                ConnectionId=self.connection_id,
            )
        except ClientError as err:
            # gracefully handle case where client is no longer connected
            code = int(err.response["Error"]["Code"])
            if code == 410:
                # client gone, cleanup
                db.session.delete(self)
                db.session.commit()
                return
            raise

This is the where the real action happens. When we want to send a message from the server to the client we do it with the PostToConnection call. We need to provide the API Gateway domain and stage for it to construct the URL needed for the API call. Boto is simply doing HTTP requests to interact with the WebSocket connection as documented here. And you can use an HTTP client directly if you like to get connection info, send a message, and close the connection.

For completeness let’s look at handling the $disconnect route:

def disconnect(event, context):
    # get connection ID
    ctx = event.get("requestContext", {})
    conn_id = ctx.get("connectionId")
    if not conn_id:
        raise Exception("no connection id found")

    # delete the connection record from our DB
    WebsocketClient.delete_connection(connection_id=conn_id)
    db.session.commit()
    return make_response(200, "ok")

Client ➞ Server Messages

But wait, there’s more!

Our application is now ready to send notifications to our client, but if we want to be able to receive messages from the client we can support this case as well. We can define custom routes that are matched based on a route key as documented here and here. In practice this means that if API Gateway receives a JSON message it looks for the route name by default in a field called "action" and decides which Lambda to call based on that value. You can also create a $default route to catch any unhandled message if you prefer to do things that way as well.

Client Code

I implemented a basic WebSocket client in TypeScript using the standard WebSocket API. The only special thing it does is append your access token (managed with axios-jwt) to the WebSocket connection URL.

import { refreshTokenIfNeeded } from 'axios-jwt'

export const WEBSOCKET_EVENT = 'onwebsocketmessage'

export class WSEvent extends Event {
  message: object

  constructor(msg: object) {
    super(WEBSOCKET_EVENT)
    this.message = msg
  }
}

export type WSEventHandler = (ev: WSEvent) => void

export default class WSClient extends EventTarget {
  ws: WebSocket | undefined
  public isConnected: boolean = false
  reconnectTime: number = 1 // time in seconds before reconnect

  // connect
  public open = async () => {
    if (this.ws) {
      if (this.ws.readyState === WebSocket.CONNECTING || this.ws.readyState === WebSocket.OPEN)
        // already open/opening
        return

      this.ws.close() // do reconnect
    }

    // config from create-react-app+dotenv
    if (!process.env.REACT_APP_WS_URL) throw new Error('REACT_APP_WS_URL missing')
    const host = new URL(process.env.REACT_APP_WS_URL)

    // make sure auth token is fresh
    // requestRefresh defined elsewhere - see axios-jwt documentation
    const accessToken = await refreshTokenIfNeeded(requestRefresh)

    // add auth token to URL
    if (accessToken) host.searchParams.set('token', accessToken)

    // create new websocket client
    if (!this.ws) {
      this.ws = new WebSocket(String(host))
      this.ws.onopen = this.handleOpen
      this.ws.onclose = this.handleClose
      this.ws.onmessage = this.handleMessage
    }
  }

  // disconnect
  public close = () => {
    if (this.ws) this.ws.close()
  }

  public reconnect() {
    if (this.ws) this.ws.close()
    this.open()
  }

  // CALLBACKS

  protected handleOpen = (ev: Event) => {
    this.isConnected = true
    this.reconnectTime = 1 // reset reconnect timer

    const ws = this.ws
    if (!ws) return
  }

  protected handleClose = (ev: Event) => {
    this.isConnected = false

    // do reconnect
    setTimeout(() => {
      this.reconnectTime *= 2 // exponential backoff

      this.open()
    }, this.reconnectTime * 1000)

    // reconnect?
    this.open()
  }

  protected handleMessage = (ev: MessageEvent) => {
    // handle message received on WS
    const data = ev.data
    if (!data) return

    // try to parse as JSON
    const msg = JSON.parse(data)

    // create new websocket event and dispatch it to listeners
    const msgEvt = new WSEvent(msg)
    this.dispatchEvent(msgEvt)
  }
}

And as a bonus here’s a React hook that lets you register an event handler for WebSocket messages:

import * as React from 'react'
import WSClient, { WEBSOCKET_EVENT, WSEvent } from './api'

// singleton
let client: WSClient

interface IUseWebSocketClientArgs {
  onEvent?: (evt: WSEvent) => void
}

const useWebSocketClient = ({ onEvent }: IUseWebSocketClientArgs) => {
  React.useEffect(() => {
    if (!client) client = new WSClient()

    // listen for events
    if (onEvent) client.addEventListener(WEBSOCKET_EVENT, onEvent as EventListener)

    // ensure client is connected
    client.open()

    // cleanup handler
    return () => {
      if (onEvent) client.removeEventListener(WEBSOCKET_EVENT, onEvent as EventListener)
    }
  })
  return { client }
}

export default useWebSocketClient

Conclusion

Like many other serverless technologies this approach is certainly not practical for every use case but it is quite reasonable for a lot of common cases. While API Gateway WebSockets kind of support binary data payloads the serverless approach is probably best suited to your application if you’re passing occasional JSON messages around and dealing with relatively low throughput and volume.