Lambda Function To Route Twilio Incoming Phone Calls

The absolute fastest way to create a serverless Twilio incoming call webhook Lambda:

npm install -g serverless sls install --url https://github.com/revmischa/slspy --name twilcall echo "twilio" > requirements.txt
Code language: PHP (php)

handler.py:

from twilio.twiml.voice_response import VoiceResponse from urllib.parse import parse_qsl def hello(event, context): call = dict(parse_qsl(event['body'])) resp = VoiceResponse() resp.say("hello world!", voice='alice') response = { "headers": {"content-type": "text/xml"}, "statusCode": 200, "body": str(resp), } return response
Code language: JavaScript (javascript)

Deploy, then set webhook handler for your phone number:

sls deploy

That’s all!

Customize your response with TwiML.

See this gist to check out more advanced handling with loading the Twilio API key from Secrets Manager, doing lookups with Twilio Add-Ons to detect spam/robocalling, and detailed caller lookup info that is output to a slack channel every call.

Making Use Of AWS Secrets Manager

One of the many new services re-invented at AWS’s re:invent conference was the storage of secrets for applications. Secrets in essence are generally things your application may need to run but you don’t really want to put in source control. Things like API keys, password salt, database connection strings and the like.

Current ways of providing secrets to applications are things like configuration files deployed separately, Heroku’s config which exposes them as environment variables, HashiCorp Vault, CloudFormation variables, and plenty of other solutions. AWS already had a dedicated mechanism for storing secrets and making them available via an API call in it’s SSM Parameter Store service, in which you could store values, with optional encryption.

AWS Secrets Manager is slightly but not very different from SSM Parameter Store; it adds secret rotation capabilities and isn’t as buried deep down inside the obliquely named “AWS Systems Manager” service. I think those are the main features. Oh it also lets you package a set of key/value pairs into one secret, whereas Param Store makes you create separate values that require individual API calls to retrieve.


Using Secrets With Serverless

To store encrypted secrets in the AWS Secrets Manager and make them available to your serverless application, you need to do the following:

  • Create a secret in Secrets Manager.  Select “Other type of secrets” unless you are storing database connection info, in which case click one of those buttons instead.
  • Select an encryption key to use. Probably best to create a key per application/stage.
  • Create key/value pairs for your secrets.
  • After creating the secret, there will be sample code for different languages that shows you how to read it in in your application.

Encryption Key Access

To create encryption keys, look in IAM (click “Encryption Keys on the bottom left in IAM) and assign admins and users of the keys. I manually assign the lambda role that’s been created as a user of the encryption key it needs access to in the console. There may be a way to automate this but it feels like an appropriate step to have some small manual intervention.

Your lambda’s role will look like $service-$stage-$region-lambdaRole

IAM Role

In addition to granting your lambda role access to decrypt your secret, you also need to grant it the ability to access your secret.:

plugins: - serverless-python-requirements - serverless-pseudo-parameters provider: name: aws runtime: python3.7 stage: ${opt:stage, 'dev'} # default to dev if not specified on command-line iamRoleStatements: - Effect: Allow Action: secretsmanager:GetSecretValue Resource: arn:aws:secretsmanager:#{AWS::Region}:#{AWS::AccountId}:secret:myapp/${self:provider.stage}/* environment: LOAD_SECRETS: true
Code language: PHP (php)

This is taken from my serverless.yml file. Note the secret:myapp/dev/* bit – it’s a good idea to use prefixes in your key names so that you can restrict access to resources by stage (i.e. dev only has access to dev secrets, prod only has access to prod secrets). In general prefixing AWS resources with application/stage identifiers is a crude but readable and simple way to create IAM policies for access groups of related resources.

serverless-pseudo-parameters is a handy plugin that lets you interpolate things like AWS::Region in your configuration so that you don’t need to do an unwieldy CloudFormation Fn::Join array to generate the ARN.

Reading Secrets In Your Application

Now that your application has access to GetSecretValue and is a user of the encryption key used to encrypt your secrets, you need to access the secrets in your application.

Sample code is provided in the Secrets Manager console to read your secret. One thing to note is that if you store key/value pairs in your secret, which is most likely what you’ll want to do almost all the time, you get it back as JSON. The sample code provided doesn’t decode it. The general idea is something like:

import boto3 import base64 import json def get_secret(secret_name): """Fetch secret via boto3.""" client = boto3.client(service_name='secretsmanager') get_secret_value_response = client.get_secret_value(SecretId=secret_name) if 'SecretString' in get_secret_value_response: secret = get_secret_value_response['SecretString'] return json.loads(secret) else: decoded_binary_secret = base64.b64decode(get_secret_value_response['SecretBinary']) return decoded_binary_secret
Code language: JavaScript (javascript)

And then you can do as you please with the secrets.

App Configuration

I like consuming secrets directly into my application’s configuration at startup. With Flask it looks something like this:

if os.getenv('LOAD_SECRETS'): # fetch config secrets from Secrets Manager secret_name = app.config['SECRET_NAME'] # dev, prod secrets = get_secret(secret_name=secret_name) if secrets: log.debug(f"{len(secrets.keys())} secrets loaded") app.config.update(secrets) else: log.debug("Failed to load secrets")
Code language: PHP (php)
  • First, only try to load secrets if we’re executing under serverless/lambda. LOAD_SECRETS is defined as true in the serverless.yml snippet above. This should not be true when running tests, as you don’t want tests to access secrets and you don’t want to pay for all the additional accesses anyhow.
  • SECRET_NAME is in application config and can be different for dev and prod.
  • app.config.update(secrets) adds whatever you stored in Secrets Manager to your app config.
  • I should probably make a dumb PyPI module to do this automatically.

Serverless Python API Development

In a previous article I discussed how to interact with the serverless AWS Lambda platform using only tools provided by Amazon. This was a valuable experiment that I suggest applying to any new technology or interesting new system you’d like to learn. Start with the basics and try doing a project without too many extra tools or abstractions so that you can get an idea of how the underlying system works and what’s unpleasant or boilerplate-y or requires too much effort. Once you have an idea of how the pieces fit together you can have a much better appreciation for the abstractions that go on top because you understand how they work, what problems they are solving, and what pain they are saving you from.

AWS services are powerful but generally need to be put together in coherent ways to achieve your goals. They’re modules that provide the functionality you need but still require some glue to make a nice developer experience. Fortunately because the entire platform is scriptable, software tools and additional layers of abstraction are rapidly increasing the capabilities of software engineers on their own to manage configuration without the need for any hardware or humans in between. CloudFormation (CF) allows declaration of your infrastructure with JSON or YAML. CF templates like the Serverless and CodeStar transforms make it easier to write less CloudFormation code to describe a serverless configuration. And then tools like the Serverless toolkit add another layer of automation on top of CF and provide a really excellent developer experience. Not to be outdone, Amazon provides an even higher level toolkit called Amplify (subject of a future article) to further increase the leverage of effort to available hardware and software muscle.

Serverless Toolkit

After going through the process of building some toy applications using AWS SAM and the Serverless CF transform, I quickly saw some of the drawbacks of not using a more advanced system to automate things:

  1. Viewing logs. Looking at CloudWatch logs in the AWS Console is not a great way to view the output of your application in real time, or in any time really. 
  2. It wasn’t clear to me how to save some pieces of a serverless application architecture for re-use in later projects. I posed a question to the Flask mailing list and IRC channel about how to make an extension based around it and didn’t get a useful response.
  3. Defining stuff like API gateways, S3 buckets for code, and domains in CF is tedious. It can be automated further.
  4. It would be nice to have some information readily available, such as what URL my application is deployed at.
  5. Deployments, including to different stages.
  6. Telling me when a deployment is finished, especially when using CodeStar.
  7. Invoking functions for testing and via automation.
  8. Managing dependencies.

And some other general stuff like keeping track of the correct AWS configuration profile and region. 

As happens so often in the field of Computers, I’m not the first one to encounter these issues and some other people have already solved most of the problem for me. 

To ensure a steady supply of confusion when discussing the relatively recent trend of serverless application architecture, there exists a collection of tools called Serverless, which resides on serverless.com. This should not be confused with serverless the adjective or the Serverless Application Model (SAM) or the AWS Serverless CF transform.

Every one of the issues mentioned above is simply handled by Serverless. I believe it’d be an unnecessary expenditure of time and effort to continue to develop serverless applications without it, based upon my recent experience trying to do so. Unless you’re just starting out and want to get a feel for the basics first, that is.

I won’t reiterate the Serverless quickstart here, go try it out yourself on their site. It takes very little effort, especially if you already have AWS credentials set up. I will instead talk about what advantages it gives you:

Logging

This is easy.  You can view (and tail) the logs for any function with

sls logs -f myfunction -t

Reusability

# immediately create a Flask app based on my template
sls install --url https://github.com/revmischa/serverless-flask --name myapp

Some of what people have been doing is going the same route as Create-React-App and creating templates for Serverless projects that can be accessed with “sls install.” On the one hand this does make it very easy to create and share reusable setups and allows for divergence as templates evolve, but it makes it much harder for projects started with older templates to incorporate new refinements. In the realm of Flask and Python, I don’t feel this problem is solved just by templates and some sort of python module that can co-evolve is needed. Something analogous to the react-scripts package that goes along with Create-React-App would likely be the way to go.

Configuration And CloudFormation

Now you declare your resources and functions in the serverless.yml configuration file, along with lots of other useful stuff

Nearly all of the boilerplate CF needed for serverless like a S3 bucket for code, IAM permissions for invoke and CloudWatch, API Gateway, etc are totally hidden from you and you never need to care about them. Only the minimum configuration and CF needed to describe what’s unique about your setup is required from you. On a scale of sendmail.conf to .emacs, serverless.yml is fairly high on the configuration file sublimity scale.

Info

This is easy. Where’d I park my domain again?

$ sls info
Service Information
service: myapp
stage: dev
region: eu-central-1
stack: myapp-dev
api keys:
  None
endpoints:
  ANY - https://di1baaanvc.execute-api.eu-central-1.amazonaws.com/dev
  ANY - https://di1baaanvc.execute-api.eu-central-1.amazonaws.com/dev/{proxy+}
  GET - https://di1baaanvc.execute-api.eu-central-1.amazonaws.com/dev/openapi
functions:
  app: myapp-dev-app
  openapi: myapp-dev-openapi
Serverless Domain Manager Summary
Domain Name
  myappmyapp.net
Distribution Domain Name
  dcwyw3gslhqw1.cloudfront.net

Deployment

This is easy too! Too easy!

$ sls deploy
$ sls deploy -s prod # specify stage

This bundles requirements if needed, packages the service, uploads to S3, and kicks off a CloudFormation stack update. 

Notice that sweet Serverless Domain Manager Summary section?
That, my friend, is the serverless-domain-manager plugin. If you want your endpoints to be deployed under a domain name you already have in a Route53 zone (and hopefully have an ACM certificate in us-west-1 to go with it) you can have Serverless automatically fire up the domain or subdomain for you along with a CloudFront distribution and API Gateway domain mapping.

I discovered an issue with the domain manager plugin selecting the ACM certificate for your domain at random among a list of matching domain names. This was picking an expired previous certificate, so I fixed it to filter out any unusable certificates. My PR was quickly and politely merged. Always a positive sign.

Waiting / Notifications

The aforementioned deploy command tells you when it’s done. Then you can test it out right away. You can speed it by only deploying a specific function, or using the S3 accelerate option to speed up uploading of your artifacts. Don’t waste time deploying stuff you don’t need or watching the CodeStar web UI.

Invoking Functions

AWS SAM is pretty easy, and so is Serverless. If developing a python webapp with the serverless-wsgi plugin, you can also serve your app up locally.

Managing Dependencies

(This part is python-specific)

How to manage dependencies for your python lambda? Well, just stick them in requirements.txt. Duh, right? With Serverless, more or less right. Remember that any dependencies have to be bundled in your lambda’s zip file. Need to build binary dependencies and not on a linux amd64 platform? Just add “dockerizePip: true” to the serverless-python-requirements plugin configuration in serverless.yml and you’re good to go.

Note that if invoking functions locally or starting the WSGI server, you still need a local virtualenv. One wacky non-Serverless template I looked at used pipenv instead to manage both local and lambda dependencies, but I couldn’t advise it; it’s pretty weaksauce.

Extending Serverless

Mostly what I’ve been doing with AWS Lambda is making small web API services using Python and the Flask microframework. With serverless providing exactly the tooling I need, I also want to be able to start new projects with a minimum of effort and have some pieces already in place that I can build on for my application.

I forked a serverless-flask template I found and started building on top of it. I made it not ask if you want to use python 2 or 3 (why not ask if I want UTF-8 or EBCDIC while you’re at it?) and defaulted dockerizing pip to false.

If building an API server in Flask, your life can be made much nicer with the addition of marshmallow to handle serializing and deserializing requests, flask-apispec to integrate marshmallow with OpenAPI (“swagger”) and Flask, and CORS. My version of the template includes all of this to make it as easy as humanly possible to make a documented serverless python REST API with the absolute minimal amount of effort and typing. And as a bonus it generates client libraries for your API from the OpenAPI definition in any language you desire.

Instructions for using the template and getting started quickly can be found here.

Serverless? Why Not

This article is a marker in the path our journey so far has taken us. Improving how we build applications and services is an ongoing process. Our previous milestone was unassisted AWS services, this present adventure was improved tooling for those services, and the next level to up may be AWS Amplify and GraphQL. Or maybe not. Stay tuned.

Serverless Python Web Applications With AWS Lambda and Flask

This is the first part of a series on serverless development with Python. Next part.

Serverless applications are great from the perspective of a developer – no infrastructure to manage, automatically scaling to meet requests without ever having to think about it, pay by the RAM gigabyte/second, and the ability to deploy via code however you desire. Logging comes free. For DevOps folks it’s a nightmare, as it represents the rapidly approaching obsolescence of their skills involving setting up web servers, load balancing, monitoring, logging, hanging out in datacenters, and other such quaint aspects of deploying web applications. Making a traditional web application run on AWS Lambda is not quite trivial yet, but is well worth understanding and considering next time you need a web service somewhere and it will surely get smoother and easier with time.

Oh yeah, what’s serverless mean here? It means you don’t manage any servers or long-lived processes. You put code in the cloud, and you run it whenever you want, however much you want, and don’t have to worry about the scaling part, while paying for only the CPU, memory, network traffic, and other services you consume. It’s a completely different way of deploying an application compared to managing daemons and servers and infrastructure and load balancing and all fun stuff that has very little to do with the code you want to write and deploy.

Python And Flask

There’s no shortage of options for web frameworks, and you can do a lot worse than Python 3 with Flask. Flask is a nice mixture of being able to create a simple web service with very little boilerplate, but also can also be used as a component for building more complex web applications, with the caveat that it’s more suited to APIs rather than server-side rendered apps when compared with something like Django. You should probably be writing single-page apps these days anyway.

Python is a very well-supported, readable, maintainable language with a vast amount of libraries available on the python module index. If you take the time to set up lint with mypy and flake8 you can even have some up-front checking of types and common mistakes. And it’s supported by AWS Lambda.

What A Serverless Flask Application Looks Like

To build a serverless Lambda application you should have a CloudFormation configuration template that describes your architecture.

The Lambda itself is one piece of this architecture; it is a function that can be invoked and can return a result. To use it as a web service, there needs to be a way to reach it from the web. AWS provides the API Gateway service which can listen for HTTP(S) requests on an endpoint and do something when requested. APIGW can either have an entry for each endpoint you desire (POST /api/foo,  GET /api/bar, etc…) or it can proxy any request at a given host and path prefix to a Lambda and then interpret the response as an HTTP response to send back to the requestor. In this way it works like CGI or WSGI, and as long as your web framework knows how to deserialize an API Gateway proxy request and then serialize the response back into the format the APIGW expects, it can appear as any other web application “container” to your app.

There is a very simple Flask extension for this – AWSGI. The example there shows everything you need to do to make a web application that runs as a serverless app:

Screen Shot 2018-08-29 at 21.54.26.png

The only thing that is really Lambda-specific is the lambda_handler, which gives Flask the WSGI request object it expects and then translates the response appropriately into the format AWSGI expects.

So between the Lambda hosting your application code and the API Gateway acting as its reverse proxy, your infrastructure is pretty clear at this point. There are some associated bits needed like IAM permissions and the like, and maybe a database or S3 bucket or whatever else your app requires. This can all be specified in the CloudFormation template.

AWS has CloudFormation “transforms” that simplify this configuration by providing templated resources for your template to use. Templates on templates is a truly auspicious way to declare configuration. You will harness more slack than you had previously dreamed possible.

Resources:
    HelloWorldFunction:
        Type: AWS::Serverless::Function
        Properties:
            CodeUri: hello_world/build/
            Handler: app.lambda_handler
            Runtime: python3.6
            Events:
                HelloWorld:
                    Type: Api
                    Properties:
                        Path: /hello
                        Method: get

This gives you a Lambda that runs code uploaded to a S3 bucket when invoked at /hello.

Deployment

Deployment is not yet quite as smooth as say, deploying your application to Heroku, but it is getting there. There do exist some popular tools for doing serverless orchestration like Serverless Framework but I’ve been trying to see how far I can get just using AWS’s native tooling.

AWS has a service that I think no one but myself has actually used called CodeStar. This sets up a serverless deployment pipeline for you automatically, configuring CodePipeline, CodeBuild, and CloudFormation to give you an entire CI/CD system. You can easily configure it to run a build every time you check something into a GitHub repo, and update a CloudFormation deployment automatically. In addition, it even has another level of template laziness in the form of the CodeStar CloudFormation transform.

The documentation on CodeStar and the transform is more or less non-existent, which makes it actually kind of a special challenge to use. However it does appear to take care of the step of uploading your code to an S3 bucket and providing some roles. Your CodeStar template.yml may look something like:

  Flask:
    Type: AWS::Serverless::Function
    Properties:
      Timeout: 10
      Handler: myapp/index.lambda_handler
      Runtime: python3.6
      Role:
        Fn::ImportValue:
          !Join ['-', [!Ref 'ProjectId', !Ref 'AWS::Region', 'LambdaTrustRole']]

If you create a repository with the file myapp/index.py, with a lambda_handler function like the AWSGI handler previously shown, then you now have a serverless web application that will be updated every time you push to your GitHub repo and CodeBuild passes.

Note that CodeStar may have the potential to make developing serverless applications as easy as Heroku with the power of everything you get from AWS and CloudFormation, but it definitely feels today like something no one has actually tried to use. I attempted to add an IAM role to my Lambda function and I managed to stump AWS support. They finally admitted to me that there is no documented way to customize the IAM role:

“After some further testing and research into this I found that you cannot currently customize the serverless Lambda function’s IAM Role from the template.yml.

Instead, you will need to manually add the desired permissions to the IAM Role directly. As you pointed out though, the Lambda function’s role is created automatically and you do not have insight into how to identify the IAM Role being used.”

There is a workaround involving deducing how the role name is derived in the CodeStar transform (which is a rather opaque and mystical bit of machinery) and adding permissions to it, but as far as I’m aware none of this is really supported or documented. They said they plan to fix it though.

You absolutely do not have to use CodeStar for deploying Lambdas, but if you relish a little bit of a challenge with some of the more esoteric AWS services it can be a valuable tool depending on your needs. There’s always some satisfaction at watching unloved AWS services grow and mature over the years (CodeDeploy…) and you can tell war stories about arguing with AWS support reps for months on a single ticket.

Speaking of which, when I tried using CodeBuild for running tests for my project they didn’t have a python 3.6 image, making that tool completely useless. Looks like they have one now though, so maybe not useless anymore.

Other Deployment Options

If you really just want to whip up a quick Lambda, you really don’t need a whole CI/CD system or CloudFormation of course. If you just want to write some code and run it and then not worry about it anymore, you can set it all up manually as a one-off function. I do this for some things like Slack bots and random little web services. I use Sublime Text 3 with an AWS Lambda editor plugin that I made. It lets you edit a Lambda directly from within Sublime, and uploads a new version whenever you hit save. It lets you invoke it and view the output within Sublime, and it has a handy shortcut for adding dependencies via pip to your project. It’s incredibly simple and vastly superior to using the web-based function editor, or unzipping and rezipping your bundle every time you want to modify the code.

Dependencies

Screen Shot 2018-08-29 at 23.15.50

Your Lambda is actually just distributed as a zip file of a directory. Into this directory goes your application code as well as any data files or dependencies it needs, along with your hopes and dreams. If you depend on other libraries (besides boto3, which Lambda already has for your convenience), you need to include them.

For a simple deployment, you can copy in libraries installed into a virtualenv for your project. If the libraries include native code, you must compile it on a linux amd64 machine because that’s what Lambda runs on. Some tools automate this with docker.

If you want something a little more friendly to use, you can set up a directory to stuff things in.

For my project, I made a dumb little “local pip” script that I can use to install packages with pip into a directory (“vendor/”). It’s nothing special or fancy. Just runs “pip install -t …” and deletes some unnecessary files afterwards.

In my application’s __init__.py file at the top I add vendor and the root path to PYTHONPATH, sort of giving me a mini-venv where I can just use any module installed to vendor/:

import os
import sys
vendor_path = os.path.abspath(os.path.join(__file__, '..', '..', 'vendor'))
lib_path = os.path.abspath(os.path.join(__file__, '..', '..'))
sys.path.append(lib_path)
sys.path.append(vendor_path)
from flask import Flask
...

And then any dependencies I package up can be imported easily.

Putting It Together

To experiment with CodeStar and serverless Flask I made a simple web application. It lets people ask a question or answer a question. It was created initially as a Slack app, although that didn’t quite go as I hoped.

An Aside: My goal was to allow anyone on Slack to receive the questions and answer them, but as I had it only hooked up to a Slack team full of trolls and degenerates the Slack app reviewer was extremely unimpressed with the quality and thoughtfulness of the responses he got when testing it. Which is entirely my fault, but whatever. It’s still available on the Slack app repository but since they wouldn’t permit it to work across teams (something about not being “appropriate for the workplace”?) the Slack interface is of limited usefulness.

Anywhoozlebee, the application is a simple one: allow users to ask questions, or respond to questions. It was implemented first as a web service compatible with the Slack webhooks and Slashcommand HTTP APIs, and then later as a REST API for the web.

If this was a serious project I would use PostgreSQL for a database, but in addition to trying to teach myself how to best design a serverless Flask application, I also wanted to spend as little money as possible hosting it. Unfortunately PostgreSQL is not exactly serverless at this point in time, and you can expect to spend at least tens of dollars a month on AWS if you want a PostgreSQL server on anything except a free tier micro EC2 instance. So I decided to try using AWS’s DynamoDB nosql… thing. It’s a pretty unpleasant key-value store and the boto3 documentation is written by a sadist, but it is cheap and can also scale a lot without having to care much. In theory anyway. Though apparently it sucks?

DynamoDB costs a few bucks a month for tables and indexes, although you can probably get by with one or two if you don’t try to do things like you would in a relational database. Between that and a million requests and 400,000 GB-seconds of compute time a month for free for Lambda, you can have some code run and a place to store data for peanuts. And it should scale horizontally without any effort or thought. I’m sure it’s not that simple in reality, but it’s nice to imagine. At least I never have to configure a webserver or administer a machine just to deploy a web application, and can do it on the (hella) cheap. One of the real values of serverless applications is the ability to just set something up once and then never worry about it again. If it works the first time, it’ll keep working. You don’t need to worry about disks dying or backups or dealing with traffic spikes or downtime. Sure AWS isn’t absolutely perfect but I sure trust them to keep my lambdas running day and night more than I trust most people, including myself. Especially including myself.

Secrets

With any application deployment, you will likely need to store some secrets. You don’t need to give your Lambda an AWS API key as it is invoked with an IAM role that you can grant access to the services it needs. For external services, you can use the AWS SSM Parameter Store. It just lets you store secrets and retrieve them if your role or user is granted permissions to read them. It’s a great place to store things like API keys and tokens.

Screen Shot 2018-08-29 at 23.47.30

Since we’re using Flask, we can easily integrate SSM Parameter Store with the Flask config.py:

import boto3
ssm = boto3.client('ssm')

def get_ssm_param(param_name: str, required: bool = True) -> str:
    """Get an encrypted AWS Systems Manger secret."""
    response = ssm.get_parameters(
        Names=[param_name],
        WithDecryption=True,
    )
    if not response['Parameters'] or not response['Parameters'][0] or not response['Parameters'][0]['Value']:
        if not required:
            return None
        raise Exception(
            f"Configuration error: missing AWS SSM parameter: {param_name}")
    return response['Parameters'][0]['Value']

TWILIO_API_SID = get_ssm_param('qanda_twilio_account_sid')
TWILIO_API_SECRET = get_ssm_param('qanda_twilio_account_secret')
SLACK_OAUTH_CLIENT_ID = get_ssm_param('qa_slack_oauth_client_id')
SLACK_OAUTH_CLIENT_SECRET = get_ssm_param('qa_slack_oauth_client_secret')
SLACK_VERIFICATION_TOKEN = get_ssm_param('qanda_slack_verification_token')
SLACK_LOG_ENDPOINT = get_ssm_param('qanda_slack_log_webhook', required=False)

Secrets status: secreted.

Running Locally

Because Lambdas run inside of AWS, you might think that it would be very cumbersome to have to deploy and test every code change you make using AWS. And that would suck, if you actually had to do that. There’s an AWS project called SAM-CLI – Serverless Application Model Command Line Interface. Using docker Lambda images, you can invoke your application within the same environment it would be running under on Lambda. You can either feed it a JSON file describing a Lambda request and view the response, or you can start it up as a server that you can connect to like any other local development webserver. You do have to provide an AWS API key though if you want your app to make use of AWS services, as it’s running on your local machine and not under the auspices of an instance role in AWS.

Further Examples

In summary the above are considerations that are necessary for creating and deploying a serverless web application. I’m pretty pleased with the way everything fit together in my learning project QandA and I invite you to look at the project structure and source code for a complete working example. There are some more details I could go into about how I structured the Flask application, but they aren’t really Lambda- or serverless-specific and if you’re interested, really just check out the code.

Serverless API

Lambda is still in the early days and far from mature, and not yet as easy to work with as Heroku. But there is a high upside to being able to just have “code running in the cloud” without having to think about it or manage any server, and for basically free. Once you’ve taken the small amount of effort to set up a serverless application, you’re rewarded with an easy way to run code on the internet without having to worry about anything below the level of “request -> application code -> response”. I prefer worrying about code and configuration files over managing infrastructure and servers.

Ulex: an Open Source Legal Framework

Originally posted on the Charter Cities Institute blog

What Can Lawyers Learn from Programmers?

What kinds of professionals spend their days reading, writing, and editing rules?  Two kinds: lawyers and computer programmers. Despite this fundamental similarity, however, they seem to live in different worlds.  That’s probably because lawyers mistakenly think that programmers don’t have much to teach them (and, as a consequence, because programmers try to stay very, very far away from lawyers).  In fact, though, lawyers could learn a lot from coders.

Computer code and legal code are similar in more than a few respects. Both declare operations to be performed under specific sets of conditions, attempt to create definitions that correspond to human activity, incorporate and revise previously-written code, and (hopefully) account for exceptional circumstances. While there are a great many differences as well, there may yet be an opportunity for cross-pollinating knowledge and experience from one field to the other.

Legal systems resemble pre-1970’s software in terms of portability and reusability. Much like how programs and operating systems were bespoke designed unique for each architecture, governance and codes of laws are currently crafted for each environment or government and unable to build upon each other, despite performing very similar functions.

Modern software developers are aware of the recent explosion of new ideas, experimentation, forms of organization and impressive decentralized projects that came with the introduction of suitable layers of abstraction and portability in the form of C and UNIX and later combined with the open source movement and the internet. By freeing programmers from having to rewrite operating systems that did more or less the same thing but with different interfaces, they could focus on actually writing programs, and even port those programs to different computer architectures without having to rewrite the entire application. Instead of writing code in an assembly language that was incompatible with all other platforms, the introduction of a higher-level language allowed programs to be transferred from one environment to another.

While some members of the software profession may take this wonderful state of affairs for granted in the digital realm, in the offline world they live under systems of laws and governance that are still waiting for a common framework, a standardized legal operating system, a basic foundation that hobbyists and experts from different countries and disciplines can openly collaborate on. Many of us may have vague desires and ideas of how to share the lessons learned from portable computer systems with the legal profession, though such ideas are likely worth their weight in gold Dunning-Krugerrands. Now however, a few in the legal profession have attempted to apply these concepts to law.

But first, an explanation of the problem: many people are unsatisfied with aspects of the governments and legal systems they live under, and most people have little agency to come up with their own improvements. There is a high exit cost to changing countries, citizenship, and governments. Improvements and refinements from one jurisdiction cannot always be easily taken and applied to another because of incompatible legal systems, different definitions, unintelligible languages, and jurisprudence. The overhead of experimentation can be great; each new government writes its constitution from scratch, comes up with court systems, has its own legislatures and judges, and so on. This limits the scope for the sort of healthy robust competition and innovation that has been seen in the world of software since the introduction of portable programs and operating systems, open-source development, reusable libraries, and collaboration on a global scale.

Perhaps you would like to create a new set of codes for yourself and like-minded individuals to live by. Maybe you believe laws or punishments are unfair or unjust, or you certain immoral or unsavory behavior should be curtailed. Taxes should go to fund socially useful schemes or taxes are too damn high. Freedom of movement is a basic human right or we need to keep the bad guys out. Whatever your vision, there are practical and theoretical ways of implementing it today, be it via homeowner associations, Special Economic Zones, setting up your own seasteading colony in international waters, founding a religion, or incorporating a new town. There exists a vast number of overlapping codes and laws that govern all people already, but they often lack a shared foundation of well-understood, time-tested common principles. Enter Ulex.

What is Ulex?

In a nutshell, Ulex provides a set of sane legal defaults including a simple system for resolving disputes closely modeled on the system commonly used to arbitrate international trading disputes. There are recommended basic modules for civil procedure, torts, contracts, and property that incorporate best practices as codified by organizations including the American Law Institute/International Institute for the Unification of Law’s (ALI/UNIDROIT’s) Principles of Transnational Civil Procedure, and selected volumes from The ALI’s Restatements of the Law.

Ulex 1.1 can be viewed as a template for creating a legal distribution. It references the contents of the legal packages from quality upstream maintainers, along with some system utilities in the form of meta-rules, optional modules for criminal law, procedural rules and substantive rules. This distribution should not be considered final, complete, or the best legal system for any new self-governing group of people, but rather a starting point for experimentation. Since not everyone who wishes to create a new society is well versed in legal history and modern best practices, having a jumping-off point with quality material curated by a law professor should be useful. Not everyone wanting to build applications may know how to design a working operating system, and they shouldn’t have to.

Creating legal systems by means of references other documents is a common and systematic practice, much in the way that software is rarely written from scratch but instead makes use of libraries of already packaged code. So too can legal distributions be created with a few well-thought references to systems that already work well and some legal wording glue to create a coherent system. Implementation is accomplished via contract law in the context of the host state, a necessary bootstrapping mechanism for now. Ulex version 1.1 includes an optional host sovereign integration module (section 5) if better compatibility is desired.

All that is needed for implementation is for parties to formally agree:

Only Ulex 1.1 shall govern any claim or question arising under or related to this agreement, including the proper forum for resolving disputes, all rules applied therein, and the form and effect of any judgment.

Ulex is not the final answer in self-organizing legal systems but a potential first organizing principle and base layer of abstraction upon which more varied and ambitious legal projects can be based. If a common template and minimum functioning system can be designed, the process and results can be embraced and extended around the world as people increasingly experiment with new forms and options for self-governance. With competing designs and implementations may come, eventually, more inspired and community-driven legal systems all developed in the finest tradition of open source development.

For future reading, the description of Ulex 1.1 is recommended.


projectM: OpenGL and Shader Modernization

I wrote recently about the interesting work being done on the open source music visualizer projectM, and that progress is being made on modernizing the graphics capabilities of this library. Since it was first created, there have been changes made in the way OpenGL is to be used for widespread compatibility and support for GPU shader programs.

Now the efforts are in the home stretch; OpenGL ES support is close to complete, and there is support for handling a decent percentage of the shader programs found in the visualization presets. The vast majority of the work has been done (by @deltaoscarmike), and mostly bug fixing remains.

After setting up vertex buffers and attributes and reworking the texturing and sampling code, the final step for full modern compatibility was dealing with the HLSL shaders. MilkDrop is the visualizer for Milkdrop that projectM is built for compatibility with. It was made with DirectX and the shader language that it supported in presets is HLSL, the language used by DirectX. Since projectM is designed to be run on other systems besides Windows, some form of support for HLSL was needed. Previously it was handled by the Cg toolkit from nVIDIA, but this project has been long abandoned and it was an unpleasant and sizeable dependency. The only other solution was to convert the HLSL code into GLSL, the OpenGL shader language. Use is now being made of hlsltranslator to transpile HLSL to GLSL, which is no small feat. Additional support for a #define preprocessor directive needed to be added due to its liberal use in previous code and presets.

Default warp and composite fragment shaders were created, switching to the preset versions if they are present and compile without problems. The blur shader was reimplemented in GLSL, and some helper routines recreated in GLSL for the presets that expect things like computing angle or texture sampling. The combination of running preset shader code on the GPU with the recent intel SSE optimizations for CPU per-pixel math brings performance improvements as well.

The real amazing news is that projectM now builds on a raspberry pi. This is notable not just because of the popularity of them, including running media centers, but because if it builds on a raspberry pi it should (with some effort of course) eventually run anywhere. One of the biggest issues was that projectM could only be run on a desktop or laptop computer. But now by using modern graphics operations it can run on accelerated hardware on mobile and embedded devices via OpenGLES, opening up many new opportunities, and potentially making it much easier to re-integrate into downstream projects like Kodi and VLC. Running it on the web via Emscripten, compiling to wasm and using WebGL is doable again. It’s amazing what possibilities modernizing software can bring.

It should be mentioned that the vast majority of the OpenGL work was done by superstar deltaoscarmike who apparently really knows what he’s doing.

Some screenshots of the glsl branch:

 

<Programming>: A Peek At Academia

IMG_1747

This week the 2nd conference was held in Nice, France. The topic: the art, science, and engineering of programming.

It was my first brush with the world of serious academic computer scientists. I don’t have a degree myself, though I did study CS for a couple of years at USF before deciding work was more fun and interesting and lucrative. I was by far the least qualified person at the conference; most were Ph.D students or professors or postdocs. They all came from a world whose details I was hazy on, at best. It turned out that my world, the world of “industry” as they term it, also is something of a faraway land to them.

Throughout the workshops and presentations I was struck by a few things: how well-read and smart and knowledgable the participants were, the fun abstract and interesting nature of the problems they were solving, and the complete lack and regard of any (at least to me) practical applications of their work and research.

Partly due to being in an unfamiliar realm out of my depth on many topics, and maybe partly with a little bit of jealousy of the intellectual playground they get to spend their days in, I tried to keep an open mind about the talks and learn what I could. I did really want to know if there were applications of the problems they were solving outside the world of academic research into solving problems of other academic researchers, but I felt like it’d be improper to inquire. Let me give some concrete examples.

IMG_1743

Researchers at Samsung built a prototype for sending code to be executed on other devices and making use of their resources. One demo was showing a game being played on a phone, and then having the game display seamlessly transfer to another device (that could be a smart TV, in theory). Pretty neat!

Well for one, this was for Tizen, an operating system that exists purely as a bargaining chip for Samsung and a backup strategy to not be solely dependent on Android. So there is no real-world application for this to run on any real devices. Furthermore, giving other devices the ability to make use of Tizen device resources is a huge avenue for security problems, as the presenter readily acknowledged. When combined with the fact that Tizen has more holes than swiss cheese this is doubly worrying. Additionally, one of the major unsolved huge problems with IoT (besides security) is interoperability between devices of different manufacturers. When I asked if there was any interest or plan to submit their work to a standards track, the presenter and host of the session got very confused.

IMG_1741

Another talk was a study of running safe C/C++/Fortran code. It included implementations to provide safety of memory management, bounds checking, variadic arguments length checking, use-after-free, and double-free errors. Awesome! Fantastic! Just one catch – it’s only for running said code on the JVM. Since people don’t usually run C++ code on the JVM, this is of limited use, except possibly for tooling for people running Ruby on the JVM, one attendee told me. The talk and the paper had nothing to say on the performance overhead. Research was sponsored by Oracle Labs.

Actually, a vast amount of the research topics were relating to Java and the JVM, a situation I found scandalous. I had hoped and even assumed that academics would be proponents of Free Software, because of the massive contributions to learning, understanding, and implementation, while being unencumbered by profit-driven abuses of the legal system to the determent of progress. And yet they all live and breathe Java! Java is of course NOT free (not as in free beer, but as in libre – a French word meaning “Richard Stallman”), as was recently affirmed by a US district court which found Google liable for potentially 8 or 9 billion dollars for writing header files mimicking an interface of Java. Java is probably the least free language out there now.

In the first workshop I went to there was a session where we would all get to play with a new attribute-based grammar to compose a basic C language parser. Cool! But it was all done in Java. I said “sorry, I don’t have a JDK” and the entire room burst out laughing. “Who the fuck uses JAVA?” I asked, incredulous. “Uh, everybody!” came the smart-ass reply. Since there was no functioning internet at the conference venue, I couldn’t download the JDK, so I left. What a sad state of affairs for academia, to be so beholden to the most evil corporation in software today.

Research was presented on improving the efficiency of parsing ambiguities resulting from deep priority conflicts. An interesting and thoughtful study of helping compilers do a better job of catching a certain type of ambiguity and resolving it in an optimal fashion. They applied their analysis to 10,000 Java files on GitHub and 3,000 OCaml files, and found three conflicts in two Java files, but a great many in the OCaml source files.

Screen Shot 2018-04-15 at 16.15.54.png

So for all the folks out there doing serious work with OCaml, you’re in luck!

IMG_1750
My favorite talk and the winner of an award at the conference was simply about Lisp, Jazz, and Aikido. And how they’re all cool and similar.

IMG_1758

Another enjoyable and award-winning talk was about how academics talk about Monads. Whether they are railroad tracks, boxes, or sometimes burritos.

IMG_1762

One of the student research projects sounded at first like it might be getting dangerously close to some sort of potentially useful application one day. One student talked about his system for dynamic access control for database applications. Unfortunately it requires using a contract language of his own devising in a lisp-lisp environment.


Don’t get me wrong, I enjoyed the conference and was frankly intimidated by all the super smart folks there. But it left me with a feel that so much talent, brains and time was being spent solving problems purely for the benefit and respect of other academics, instead of trying to solve serious problems facing us vulgārēs who have deadlines and business objectives and real-world problems to solve. The best part of the conference by far was just talking to the people there. I had a lot of interesting and thoughtful conversations. The research, eh.

IMG_1756

projectM Music Visualizer Status Update

As I’ve ended up with de facto maintainership of the illustrious projectM open source music visualizer I’ve seen a fair bit of interest in the project. I think I at least owe a blog post to update folks on where it’s at, what needs working on, and how to help make it better.

 

What is projectM?

projectM is a music visualizer program. In short it makes cool animations that are synchronized and reactive to any music input. I say music and not audio because it includes beat detection for making interesting things happen on the beat.

Screen Shot 2014-08-25 at 12.31.07 AM

History

Some of you may remember the old windows mp3 player WinAmp. It contained a supremely amazing and innovative music visualizer called Milkdrop written by a gentleman from nVidia named Ryan Geiss, known just as Geiss. The visualizer was not a single set of rules for visualizing audio but rather a mathematical interpreter that would read in “preset” files which were sets of equations. You can read the very illuminating description here of how the files are defined if you’re interested. In short there is a set of per-frame equations describing colors and FFT waveforms and simple transformations, and there is a set of per-vertex equations for more detailed transformations and deformations.

Due to the popularity of WinAmp and Milkdrop there have been many thousands of presets authored and shared with really stunning and innovative visual effects ranging from animated fractals to dancing stick figures to bizarre abstract soups. The files are often named things like:

  • shifter – cellular_Phat_YAK_Infusion_v2.milk
  • [dylan] cube in a room -no effects – code is very messy nz+ finally some serious stfu (loavthe).milk
  • NeW Adam Master Mashup FX 2 Zylot – In death there is life (Dancing Lights mix)+ Tumbling Cubes 3d.milk
  • suksma + aderassi geiss – the sick assumptions you make about my car [shifter’s esc shader] nz+.milk
  • flexi + cope – i blew you a soap bubble now what – feel the projection you are, connected to it all nz+ wrepwrimindloss w8.milk

And so on.

Screen Shot 2014-07-18 at 2.15.36 PM

As I understand it, possibly incorrectly, there were two major problems with Milkdrop. First that it was implemented with DirectX, win32 APIs and assembler, and secondly that it was not open source (though it was made open source fairly recently). So some enterprising folks in 2003 created projectM as an open source reimplementation that would be Milkdrop preset-compatible.

I didn’t work on projectM originally and I am not responsible for the vast majority of it. However the previous authors and contributors have for whatever reason mostly abandoned the project so it was left to random people to make it work. The code is quite old although the core Milkdrop preset parsing, beat detection, most of the OpenGL (more on that later) calls, and rendering is in fine shape. projectM is really just a library though, designed to be used by applications. In the past there have been XMMS and VLC plugins, a Qt application, pulseaudio and jack-based applications, and more.

 

OSX iTunes Plugin

Not really having a good solution for OSX I went ahead and ported the ancient iTunes visualizer code to work on a then-modern version of iTunes and voila! projectM on OSX. Though I did have to deal with the very unfortunate Objective-C++ “language” to make it work. Not Objective-C, Objective-C++. No I didn’t know that existed either.

Screen Shot 2014-08-25 at 12.33.50 AM

I tried to submit the plugin to the Mac App store as a free download. Not to make money or anything, just to make it easy for people to get it. The unpleasantness of this experience with Apple and their rejection is actually what spurred me to start this blog so I could complain about it.

Much to my, and apparently a number of other people’s dismay, a very recent version of iTunes or macOS caused the iTunes visualizer to stop working as well as it did. It appears to be related to drawing and subviews in the plugin.

 

Cross-Platform Standalone Application

I decided that what would be better is a cross-platform standalone application that simply listens to audio input and visualizes it. This dream was made possible by a very recent addition to the venerable cross-platform libsdl2 media library adding support for audio capture. I quickly hacked together a passable but very basic SDL2-based application that runs on Linux and macOS and in theory windows and other platforms as well. Some work needs to be done to add key commands, text overlays (preset name, help, etc), better fullscreen support and easy selection of which audio input device to use.

The main application code demonstrates how simple libprojectM is to use. All one must do is set up an OpenGL rendering context, set some configuration settings, and start feeding in audio PCM data to the projectM instance. It automatically performs beat detection and drawing to the current OpenGL context. It’s really ideal for being integrated into other applications and I hope people continue to do so.

Screen Shot 2018-02-18 at 20.49.03.png

You can obtain source, OSX and linux builds from the releases page. This is super crappy and experimental and needed some configuration tuning to make it look good, and you need to drop the presets folder in. But it’s a start.

Build System

In their infinite wisdom the original authors chose the cmake build system. After wasting many hours of my life I will not get back and almost giving up on the software profession altogether I decided it would be easier to switch to GNU autotools, the same build system almost all other open source projects use, than to deal with cmake’s bullshit. So now it uses autotools (aka the “./configure && make && make install” system everyone knows and loves).

 

Needed Efforts

This is where you come in. If you like music visualizers and want to help the software achieve greater things there is some work to be done modernizing it.

The most important task by far is getting rid of the OpenGL immediate-mode calls and replacing them with vertex buffer object instructions. VBO is a “new” (not new at all) way of doing things that involves creating a chunk of memory containing vertices and pushing it to the GPU so it can decide how and when to render your triangles. The old-school way was “immediate mode” where you would tell OpenGL things like glBegin(GL_QUADS) (“I’m going to give you a sequence of vertices for quadrilaterals”) and give it vertices one at a time. This is tremendously inefficient and slow so it isn’t supported on the newer OpenGL ES which is what any embedded device (like a phone or raspberry pi) supports, as well as WebGL.

I believe that projectM would be most awesome as a hardware device with an audio input and an HDMI output, but making a reasonably-sized and -priced solution would mean using an embedded device. It would be great to have a web application (I attempted to do this with Emscripten, a JavaScript backend for llvm) but that requires WebGL. Having an open source app for Android and iOS would be amazing. All of this requires the small number of existing immediate-mode calls to be updated to use VBOs instead. Somebody who knows more about this stuff or has more time than me should do it. There aren’t a lot of places in the code where they are used; see this document.

Astute readers may note that there already are iOS and Android projectM apps. They are made by one of the old developers who has made the decision to not share his modern OpenGL modifications with the project because he makes money off of them.

rms3.png
why the fuck hoarders don’t share their code back

Another similar effort is to replace the very old dependency on the nVidia Cg framework for enabling shaders. Cg was used because it matches Directx’s shader syntax. GLSL, the standard OpenGL shader language is not the same, and requires manual conversion of the shaders in each preset.

The Cg framework has been deprecated and unsupported for many years and work needs to be done to use the built-in GLSL compilation calls instead of Cg and convert the preset shaders. I already did some work on this but it’s far from finished.

 

The Community

The reason I’m writing this blog post is because of the community interest in the project. People do send pull requests and file issues, and we definitely could use more folks involved. I am busy with work and can’t spend time on it right now but I’m more than happy to guide and help out anyone wishing to contribute. We got an official IRC channel on irc.freenode.net #projectm so feel free to hang around there and ask any questions you have. Or just start making changes and send PRs.

Open-source Trusted Computing for IoT

(Originally posted on the Linux Weekly News)

At this year’s FOSDEM in Brussels, Jan Tobias Mühlberg gave a talk on the latest work on Sancus, a project that was originally presented at the USENIX Security Symposium in 2013. The project is a fully open-source hardware platform to support “trusted computing” and other security functionality. It is designed to be used for internet of things (IoT) devices, automotive applications, critical infrastructure, and other embedded devices where trusted code is expected to be run.

IMG_1125

A common security practice for some time now has been to sign executables to ensure that only the expected code is running on a system and to prevent software that is not trusted from being loaded and executed. Sancus is an architecture for trusted embedded computing that enables local and remote attestation of signed software, safe and secure storage of secrets such as encryption keys and certificates, and isolation of memory regions between software modules. In addition to the technical specification [PDF], the project also has a working implementation of code and hardware consisting of compiler modifications, additions to the hardware description language for a microcontroller to add functionality to the processor, a simulator, header files, and assorted tools to tie everything together.

Many people are already familiar with code signing; by default, smartphones won’t install apps that haven’t been approved by the vendor (i.e. Apple or Google) because each app must be submitted for approval and then signed using a key that is shipped pre-installed on every phone. Similarly, many computers support mechanisms like ARM TrustZone or UEFI Secure Boot that are designed to prevent hardware rootkits at the bootloader level. In practice, some of those technologies have been used to restrict computers to boot only Microsoft Windows or Google Chrome OS, though there are ways to disable the enforcement for most hardware.

In somewhat of a contrast to more proprietary schemes that some argue restrict the freedom of end-users, the Sancus project is a completely open-source design built explicitly on open-source hardware, libraries, operating systems, crypto, and compilers. It can be used, if desired, in specialized contexts where it is of critical importance that trusted code runs in isolation, on say an automobile braking actuator attached to a controller area network bus, or a smart grid system such as the type that was hacked in Ukraine during the attack by Russia. These are the opposite of general-purpose devices; instead, one specific function must be performed and integrity and isolation are critical.

The problem is that many medical devices, automotive controllers, industrial controllers, and similar sensitive embedded systems are made up of limited microcontrollers that may have software modules from different vendors. Misbehaving or malicious software can interfere in the operation of those other modules, expose or steal secrets, and compromise the integrity of the system. Integrity checks based in software are bypassed relatively easily compared to gate-level hardware checking; those checks also add considerable overhead and non-deterministic performance behavior.

Sancus 2.0 extends the openMSP430 16-bit microcontroller with a small and efficient set of strong security primitives, weighing in at under 1,500 lines of Verilog code and increasing power consumption by about 6%, according to Mühlberg. It can disallow jumps to undeclared entry points, provide memory isolation, and attestation for software modules.

Besides providing a key hierarchy and chain of trust for loading software modules, Sancus has a simple metadata descriptor for each module that stores the .text and .data ranges in memory; it then ensures that a .data section is inaccessible unless the program counter is in the .text range of the appropriate module. This is a simple but effective process isolation mechanism to ensure that secrets are not accessible from other software modules and that one module cannot disturb the memory of other modules.

Sancus 2.0 comes with openMSP430 hardware extension Verilog code for use with FPGA boards and with the open-source Icarus Verilog tool. A simple “hello, world” example module written in C demonstrates the basic structure of a software module designed to be loaded in a trusted environment. There are also more complex examples and a demonstration trusted vehicular component system. An LLVM-based compiler is used to compile software to signed modules designed to be loaded by a trusted microcontroller.

Mühlberg mentioned that there is ongoing work on creating secure paths between peripherals for secure I/O, integration with common existing hardware solutions such as ARM TrustZone or Intel SGX, formal verification, and ensuring suitability for realtime applications.

To give a feel for the system in action, Mühlberg showed a demonstration video comparing two simulated automotive controller networks with malicious code running on a node. One can see the unsecured system behave erratically when receiving invalid messages, whereas the Sancus system gracefully slows down and safely disengages.

Much has been written about the upcoming IoTpocalypse: the lack of security in critical infrastructure and general despair about the dismal state of easily exploitable embedded systems as they multiply and get connected to the internet. A project based on open-source building blocks and free-software ethos that attempts to provide a layer of integrity and deterministic behavior to microcontrollers should be lauded and considered by anyone building hardware applications where security and reliability are strong requirements.

 

Information vs. Encodings

A concept about modern computing that often confuses people is the difference between some piece of data and the encoding, or representation of that data.

Everyone knows computers use binary. They use 1s and 0s to store and manipulate information. Do they use binary numbers?

Computers can only store information as patterns of electrical switches, set in the “on” or “off” position. There is no such thing as a “binary” number, only a number that is encoded as a binary pattern. Numbers are information, and they don’t actually exist. We can write down Arabic numerals like “42”, or write it in base-2 as “101010”, but these are merely different ways of encoding the same number. It’s up to us to come up with a scheme of encoding information using whatever is available.

Humans have all used base-10 numbering systems throughout history because we have ten fingers. In Roman times people used Roman numerals, which were pretty clumsy and not especially well-suited for arithmetic or algebra. Later, Europeans switched to Arabic numerals (0-9) while keeping the Latin writing system (A-Z).

So the number 42 is still the same number whether it’s written as XLII, “forty-two”, 4️⃣2️⃣, 0x2A, etc. All represent the same number, just encoded different ways. It’s up to the person interpreting the encoding using a particular scheme to translate it from the written-down form into useful information.

This doesn’t apply to only numbers but text, audio, video, web pages, hard disks, subtitles, and anything else one may want to be able to store in some hard copy form and represent digitally. Assuming lossless encoding, a FLAC of a song is the same information as an AIFF of a song is the same information as a zipped WAV of a song. They all represent the same PCM audio data just in different formats.

This blog post is a bunch of dumb words that anyone who understands English can make some sense of, but it’s stored as a sequence of bytes using the UTF-8 encoding standard which is a way of storing Unicode glyphs as a sequence of bytes (byte = 8 bits, hence “UTF-8”). Unicode is a mapping of codepoints (numbers) to glyphs, with some fancy rules about combining glyphs and things. Unicode is not a format, there are different ways to encode the codepoints into a machine-processable format.

As far as computers are concerned you can only deal with bits, grouped into bytes. The most convenient way to store and retrieve any data from RAM or storage or over a network is a stream of bytes. If you want to represent some information in a computer, you need some encoding scheme to translate it to and from a stream of bytes. How you want to accomplish this can be entirely up to you. The information only has the meaning you choose to imbue it with.