Travel: Venice

Water. Lots and lots of water.

Tourists. Lots and lots of tourists. I’ve been to a lot of major cities, but never have I seen so many tourists in one place.

Good food, of course, this being Italy. Expect to wait half an hour to get your order taken and up to 45 minutes to get the check and pay.

Venice hosts the Biennale, which alternates between art and architecture. Each participating country sets up its own wacky pavilion to show off its stuff. It’s a pretty fantastic experience. America’s art installation was a giant ball of modern so-called art.


<Programming>: A Peek At Academia


This week the second <Programming> conference was held in Nice, France. Its topic: the art, science, and engineering of programming.

It was my first brush with the world of serious academic computer scientists. I don’t have a degree myself, though I did study CS for a couple of years at USF before deciding work was more fun, interesting, and lucrative. I was by far the least qualified person at the conference; most attendees were Ph.D. students, postdocs, or professors. They all came from a world whose details I was hazy on, at best. It turned out that my world, the world of “industry” as they call it, is also something of a faraway land to them.

Throughout the workshops and presentations I was struck by a few things: how well-read, smart, and knowledgeable the participants were; the fun, abstract, and interesting nature of the problems they were solving; and the complete lack of, and disregard for, any practical applications of their work and research (at least any that were apparent to me).

Partly because I was in an unfamiliar realm and out of my depth on many topics, and maybe partly out of a little jealousy of the intellectual playground they get to spend their days in, I tried to keep an open mind about the talks and learn what I could. I really did want to know whether the problems they were solving had applications beyond academic research into other academic researchers’ problems, but I felt it would be improper to ask. Let me give some concrete examples.


Researchers at Samsung built a prototype for sending code to be executed on other devices, making use of those devices’ resources. One demo showed a game being played on a phone, with the game display then transferring seamlessly to another device (in theory, a smart TV). Pretty neat!

Well, for one, this was for Tizen, an operating system that exists purely as a bargaining chip for Samsung and a backup strategy so it isn’t solely dependent on Android. So there is no real-world application for this to run on any real devices. Furthermore, letting other devices make use of a Tizen device’s resources is a huge avenue for security problems, as the presenter readily acknowledged. Combined with the fact that Tizen has more holes than Swiss cheese, this is doubly worrying. Additionally, one of the huge unsolved problems with IoT (besides security) is interoperability between devices from different manufacturers. When I asked whether there was any interest in, or plan for, submitting their work to a standards track, the presenter and the session host got very confused.


Another talk was a study of running C/C++/Fortran code safely. It included implementations providing memory safety: bounds checking, checking the number of variadic arguments, and catching use-after-free and double-free errors. Awesome! Fantastic! Just one catch: it only works for running said code on the JVM. Since people don’t usually run C++ code on the JVM, this is of limited use, except possibly, one attendee told me, for tooling for people running Ruby on the JVM. Neither the talk nor the paper had anything to say about the performance overhead. The research was sponsored by Oracle Labs.

Actually, a huge number of the research topics related to Java and the JVM, a situation I found scandalous. I had hoped, and even assumed, that academics would be proponents of Free Software, because of its massive contributions to learning, understanding, and implementation, while being unencumbered by profit-driven abuses of the legal system to the detriment of progress. And yet they all live and breathe Java! Java is of course NOT free (not as in free beer, but as in libre, a French word meaning “Richard Stallman”), as was recently affirmed by a US federal appeals court, which ruled that Google’s copying of the Java API declarations was not fair use and left Google potentially liable for 8 or 9 billion dollars. Java is probably the least free language out there now.

In the first workshop I attended, there was a session where we would all get to play with a new attribute-based grammar to compose a basic C language parser. Cool! But it was all done in Java. I said “sorry, I don’t have a JDK” and the entire room burst out laughing. “Who the fuck uses JAVA?” I asked, incredulous. “Uh, everybody!” came the smart-ass reply. Since there was no functioning internet at the conference venue, I couldn’t download the JDK, so I left. What a sad state of affairs for academia, to be so beholden to the most evil corporation in software today.

Research was presented on resolving parsing ambiguities caused by deep priority conflicts more efficiently: an interesting and thoughtful study of helping compilers do a better job of catching a certain type of ambiguity and resolving it optimally. The researchers applied their analysis to 10,000 Java files on GitHub and 3,000 OCaml files, and found three conflicts in two Java files, but a great many in the OCaml source files.


So for all the folks out there doing serious work with OCaml, you’re in luck!

My favorite talk and the winner of an award at the conference was simply about Lisp, Jazz, and Aikido. And how they’re all cool and similar.


Another enjoyable and award-winning talk was about how academics talk about monads: whether they are railroad tracks, boxes, or sometimes burritos.


One of the student research projects sounded at first like it might be getting dangerously close to some sort of potentially useful application one day. One student talked about his system for dynamic access control for database applications. Unfortunately it requires using a contract language of his own devising in a Lisp-like environment.

Don’t get me wrong, I enjoyed the conference and was frankly intimidated by all the super smart folks there. But it left me with a feeling that so much talent, brains, and time was being spent solving problems purely for the benefit and respect of other academics, instead of trying to solve serious problems facing us vulgārēs who have deadlines and business objectives and real-world problems to solve. The best part of the conference by far was just talking to the people there. I had a lot of interesting and thoughtful conversations. The research, eh.


Travel: Amsterdam

Amsterdam’s a fine place; good cheese, markets, art, bicycles, weed, idyllic empty parks.

Seems like it’d be a chill place to live, but quite pricey.

Hemp, Marijuana, and Hash Museum

First day there I nearly got run down by a dozen bicycles, not realizing you have to look both ways before crossing the sidewalk.

Rembrandt’s House


projectM Music Visualizer Status Update

Since I’ve ended up as the de facto maintainer of the illustrious projectM open source music visualizer, I’ve seen a fair bit of interest in the project. I think I at least owe folks a blog post on where it’s at, what needs work, and how to help make it better.


What is projectM?

projectM is a music visualizer program. In short, it makes cool animations that are synchronized with, and react to, any music input. I say music and not audio because it includes beat detection for making interesting things happen on the beat.
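To give a rough idea of what beat detection involves, here is a minimal sketch of the classic energy-comparison approach: flag a beat when the energy of the current block of samples is well above the average energy of the last second or so. This is a generic textbook technique, not projectM’s actual detector; the function name, block size, history length, and the 1.4 threshold are all assumptions chosen for illustration.

```python
from collections import deque


def detect_beats(samples, block_size=1024, history_blocks=43, threshold=1.4):
    """Return the indices of blocks whose energy jumps above the recent average.

    samples: mono PCM samples as floats in [-1.0, 1.0].
    history_blocks: how many previous blocks form the running average;
    43 blocks of 1024 samples is roughly one second of audio at 44.1 kHz.
    """
    history = deque(maxlen=history_blocks)
    beats = []
    starts = range(0, len(samples) - block_size + 1, block_size)
    for block_index, start in enumerate(starts):
        block = samples[start:start + block_size]
        # Instantaneous energy: sum of squared sample values in this block.
        energy = sum(s * s for s in block)
        # Compare only once a full window of history is available.
        if len(history) == history.maxlen:
            average = sum(history) / len(history)
            if energy > threshold * average:
                beats.append(block_index)
        history.append(energy)
    return beats
```

A more serious detector would usually split the signal into frequency bands first, so a kick drum isn’t drowned out by everything else; this sketch treats the whole signal as one band.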



Some of you may remember the old Windows MP3 player WinAmp. It contained a supremely amazing and innovative music visualizer called Milkdrop, written by a gentleman from nVidia named Ryan Geiss, known simply as Geiss. The visualizer was not a single set of rules for visualizing audio but rather a mathematical interpreter that would read in “preset” files, which were sets of equations. You can read the very illuminating description here of how the files are defined if you’re interested. In short, there is a set of per-frame equations describing colors, FFT waveforms, and simple transformations, and a set of per-vertex equations for more detailed transformations and deformations.
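To make “presets are sets of equations” concrete, here is a toy evaluator for per-frame lines. It assumes the `per_frame_N=variable = expression;` line layout found in .milk files and understands only a handful of operators and functions; it is an illustration, not projectM’s parser, and real presets use much more (per-vertex equations, functions such as `if()` and `sqr()`, custom waves and shapes). It walks Python’s `ast` rather than calling `eval()` so a preset can’t run arbitrary code. The names `run_per_frame` and `_eval` are made up for this sketch.

```python
import ast
import math
import operator
import re

# The tiny subset this toy evaluator understands. Real presets support many
# more operators and functions than this.
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_FUNCS = {"sin": math.sin, "cos": math.cos, "abs": abs}
_PER_FRAME_KEY = re.compile(r"per_frame_\d+$")


def _eval(node, env):
    """Evaluate a whitelisted expression AST against the variables in env."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS):
        return _FUNCS[node.func.id](*(_eval(arg, env) for arg in node.args))
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def run_per_frame(preset_text, env):
    """Run every per_frame_N line of a preset once, updating env in place."""
    for line in preset_text.splitlines():
        key, _, code = line.partition("=")
        if not _PER_FRAME_KEY.match(key.strip()):
            continue  # skip [preset00], initial values, per_frame_init_N, etc.
        # One line may hold several ';'-separated assignments.
        for statement in code.split(";"):
            if statement.strip():
                name, _, expr = statement.partition("=")
                tree = ast.parse(expr.strip(), mode="eval")
                env[name.strip()] = _eval(tree.body, env)
    return env
```

A renderer would call something like this once per frame, after updating `time` and the audio variables, and then use the resulting values (zoom, rotation, colors) to draw.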

Due to the popularity of WinAmp and Milkdrop there have been many thousands of presets authored and shared with really stunning and innovative visual effects ranging from animated fractals to dancing stick figures to bizarre abstract soups. The files are often named things like:

  • shifter – cellular_Phat_YAK_Infusion_v2.milk
  • [dylan] cube in a room -no effects – code is very messy nz+ finally some serious stfu (loavthe).milk
  • NeW Adam Master Mashup FX 2 Zylot – In death there is life (Dancing Lights mix)+ Tumbling Cubes 3d.milk
  • suksma + aderassi geiss – the sick assumptions you make about my car [shifter’s esc shader] nz+.milk
  • flexi + cope – i blew you a soap bubble now what – feel the projection you are, connected to it all nz+ wrepwrimindloss w8.milk

And so on.


As I understand it, possibly incorrectly, there were two major problems with Milkdrop: first, that it was implemented with DirectX, Win32 APIs, and assembler; and second, that it was not open source (though it was made open source fairly recently). So some enterprising folks in 2003 created projectM as an open source reimplementation that would be compatible with Milkdrop presets.

I didn’t work on projectM originally and I am not responsible for the vast majority of it. However, the previous authors and contributors have, for whatever reason, mostly abandoned the project, so it was left to random people to make it work. The code is quite old, although the core Milkdrop preset parsing, beat detection, most of the OpenGL calls (more on that later), and rendering are in fine shape. projectM is really just a library, though, designed to be used by applications. In the past there have been XMMS and VLC plugins, a Qt application, PulseAudio- and JACK-based applications, and more.


OSX iTunes Plugin

Not really having a good solution for OSX, I went ahead and ported the ancient iTunes visualizer code to work on a then-modern version of iTunes and voila! projectM on OSX. Though I did have to deal with the very unfortunate Objective-C++ “language” to make it work. Not Objective-C, Objective-C++. No, I didn’t know that existed either.


I tried to submit the plugin to the Mac App store as a free download. Not to make money or anything, just to make it easy for people to get it. The unpleasantness of this experience with Apple and their rejection is actually what spurred me to start this blog so I could complain about it.

Much to my dismay, and apparently that of a number of other people, a very recent version of iTunes or macOS caused the iTunes visualizer to stop working as well as it used to. It appears to be related to drawing and subviews in the plugin.


Cross-Platform Standalone Application

I decided that what would be better is a cross-platform standalone application that simply listens to audio input and visualizes it. This dream was made possible by a very recent addition of audio capture support to the venerable cross-platform SDL2 media library (libsdl2). I quickly hacked together a passable but very basic SDL2-based application that runs on Linux and macOS and, in theory, Windows and other platforms as well. Some work still needs to be done to add key commands, text overlays (preset name, help, etc.), better fullscreen support, and easy selection of which audio input device to use.

The main application code demonstrates how simple libprojectM is to use. All one must do is set up an OpenGL rendering context, set some configuration settings, and start feeding audio PCM data to the projectM instance. It automatically performs beat detection and draws to the current OpenGL context. It’s really ideal for integrating into other applications, and I hope people continue to do so.


You can get the source, plus OSX and Linux builds, from the releases page. It’s super crappy and experimental, needs some configuration tuning to make it look good, and requires you to drop the presets folder in. But it’s a start.

Build System

In their infinite wisdom the original authors chose the CMake build system. After wasting many hours of my life I will not get back, and almost giving up on the software profession altogether, I decided it would be easier to switch to GNU autotools, the build system used by a great many other open source projects, than to deal with CMake’s bullshit. So now it uses autotools (aka the “./configure && make && make install” system everyone knows and loves).


Needed Efforts

This is where you come in. If you like music visualizers and want to help the software achieve greater things there is some work to be done modernizing it.

The most important task by far is getting rid of the OpenGL immediate-mode calls and replacing them with vertex buffer objects (VBOs). VBOs are a “new” (not new at all) way of doing things that involves creating a chunk of memory containing vertices and pushing it to the GPU so it can decide how and when to render your triangles. The old-school way was “immediate mode”, where you would tell OpenGL things like glBegin(GL_QUADS) (“I’m going to give you a sequence of vertices for quadrilaterals”) and then give it vertices one at a time. This is tremendously inefficient and slow, so it isn’t supported in OpenGL ES, which is what embedded devices (like phones or the Raspberry Pi) support, or in WebGL.
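The data-side half of that change can be sketched without any GL at all. Instead of handing vertices over one call at a time, you collect them into one contiguous float array that can be uploaded to a VBO in a single call (in C, `glBufferData(GL_ARRAY_BUFFER, ...)`) and drawn with one `glDrawArrays(GL_TRIANGLES, ...)`. Since OpenGL ES has no `GL_QUADS`, each quad also has to become two triangles. This is a minimal sketch assuming 2D vertices and a hypothetical helper name, not code from projectM.

```python
from array import array


def quads_to_triangle_buffer(quads):
    """Flatten quads into one float array of triangles, ready to upload as a VBO.

    Each quad is four (x, y) vertices in drawing order, as they would have
    been passed to glVertex2f() between glBegin(GL_QUADS) and glEnd().
    OpenGL ES has no GL_QUADS, so each quad (a, b, c, d) is split into the
    triangles (a, b, c) and (a, c, d).
    """
    buffer = array("f")  # 32-bit floats, tightly packed
    for a, b, c, d in quads:
        for x, y in (a, b, c, a, c, d):
            buffer.extend((x, y))
    return buffer
```

With two floats per vertex, the vertex count for the draw call is `len(buffer) // 2`. The triangle split is the simplest option; indexed drawing with an element buffer would avoid repeating the shared vertices.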

I believe that projectM would be most awesome as a hardware device with an audio input and an HDMI output, but making a reasonably sized and priced solution would mean using an embedded device. It would be great to have a web application (I attempted this with Emscripten, a compiler from LLVM to JavaScript), but that requires WebGL. Having an open source app for Android and iOS would be amazing. All of this requires the small number of existing immediate-mode calls to be updated to use VBOs instead. Somebody who knows more about this stuff, or has more time than me, should do it. There aren’t a lot of places in the code where they are used; see this document.

Astute readers may note that there already are iOS and Android projectM apps. They are made by one of the old developers, who has decided not to share his modern OpenGL modifications with the project because he makes money off of them.

why the fuck hoarders don’t share their code back

Another similar effort is replacing the very old dependency on the nVidia Cg framework for shaders. Cg was used because it matches DirectX’s shader syntax (HLSL). GLSL, the standard OpenGL shader language, is not the same, and requires manual conversion of the shaders in each preset.
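To illustrate why that conversion is more than find-and-replace, here is a deliberately naive sketch, assuming the preset shaders use the usual Cg/HLSL built-ins. Some names have a direct GLSL counterpart (`float4` to `vec4`, `lerp` to `mix`, `frac` to `fract`), but others, such as `saturate()` and `mul()`, change shape and have to be rewritten by hand. The mapping table is a small illustrative subset and the function name is hypothetical; this is not what projectM does.

```python
import re

# Cg/HLSL built-in names with a direct GLSL counterpart (illustrative, not
# exhaustive). tex2D maps to texture2D in older GLSL; newer GLSL uses texture.
RENAMES = {
    "float2": "vec2",
    "float3": "vec3",
    "float4": "vec4",
    "float4x4": "mat4",
    "lerp": "mix",
    "frac": "fract",
    "tex2D": "texture2D",
}

# Calls that can't be fixed by renaming: saturate(x) becomes clamp(x, 0.0, 1.0)
# and mul(a, b) becomes a multiplication whose operand order needs thought.
NEEDS_MANUAL_WORK = ("saturate", "mul")


def naive_cg_to_glsl(source):
    """Rename whole-word identifiers; report calls a rename can't handle."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")
    converted = pattern.sub(lambda m: RENAMES[m.group(1)], source)
    leftovers = [name for name in NEEDS_MANUAL_WORK
                 if re.search(r"\b" + name + r"\s*\(", converted)]
    return converted, leftovers
```

The word-boundary anchors keep it from mangling identifiers like `myfloat4`, but anything involving semantics (matrix order, sampler setup, precision qualifiers) still needs a human or a real shader cross-compiler.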

The Cg framework has been deprecated and unsupported for many years and work needs to be done to use the built-in GLSL compilation calls instead of Cg and convert the preset shaders. I already did some work on this but it’s far from finished.


The Community

The reason I’m writing this blog post is the community interest in the project. People do send pull requests and file issues, and we could definitely use more folks involved. I am busy with work and can’t spend time on it right now, but I’m more than happy to guide and help out anyone wishing to contribute. We have an official IRC channel, #projectm, so feel free to hang around there and ask any questions you have. Or just start making changes and send PRs.

Open-source Trusted Computing for IoT

(Originally posted on the Linux Weekly News)

At this year’s FOSDEM in Brussels, Jan Tobias Mühlberg gave a talk on the latest work on Sancus, a project that was originally presented at the USENIX Security Symposium in 2013. The project is a fully open-source hardware platform to support “trusted computing” and other security functionality. It is designed to be used for internet of things (IoT) devices, automotive applications, critical infrastructure, and other embedded devices where trusted code is expected to be run.


A common security practice for some time now has been to sign executables to ensure that only the expected code is running on a system and to prevent software that is not trusted from being loaded and executed. Sancus is an architecture for trusted embedded computing that enables local and remote attestation of signed software, safe and secure storage of secrets such as encryption keys and certificates, and isolation of memory regions between software modules. In addition to the technical specification [PDF], the project also has a working implementation of code and hardware consisting of compiler modifications, additions to the hardware description language for a microcontroller to add functionality to the processor, a simulator, header files, and assorted tools to tie everything together.
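Sancus’s actual attestation protocol is defined in its specification; what follows is only a generic sketch of MAC-based remote attestation, using HMAC-SHA256 from Python’s standard library as a stand-in and a made-up key-derivation step. The idea: a per-module key is bound to the exact module code, a verifier sends a fresh nonce, and the device proves it is running the expected code by returning a MAC over that nonce. All function names here are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_module_key(node_key: bytes, module_code: bytes) -> bytes:
    """Hypothetical derivation: bind a per-module key to a hash of the module's code."""
    identity = hashlib.sha256(module_code).digest()
    return hmac.new(node_key, identity, hashlib.sha256).digest()


def attest(node_key: bytes, loaded_code: bytes, nonce: bytes) -> bytes:
    """Device side: MAC the verifier's nonce using the key of the code actually loaded."""
    module_key = derive_module_key(node_key, loaded_code)
    return hmac.new(module_key, nonce, hashlib.sha256).digest()


def verify(expected_module_key: bytes, nonce: bytes, response: bytes) -> bool:
    """Verifier side: recompute the expected MAC and compare in constant time."""
    expected = hmac.new(expected_module_key, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    # In a real design the node key would stay inside the hardware.
    node_key = secrets.token_bytes(16)
    module = b"compiled module bytes"
    # The party deploying the module is assumed to hold its derived key.
    module_key = derive_module_key(node_key, module)
    nonce = secrets.token_bytes(16)  # fresh per request, so old answers can't be replayed
    print(verify(module_key, nonce, attest(node_key, module, nonce)))
    print(verify(module_key, nonce, attest(node_key, module + b"\x90", nonce)))
```

Because the key depends on a hash of the code, a module that has been tampered with even by one byte ends up with a different key and can’t produce a valid answer.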

Many people are already familiar with code signing; by default, smartphones won’t install apps that haven’t been approved by the vendor (i.e. Apple or Google) because each app must be submitted for approval and then signed using a key that is shipped pre-installed on every phone. Similarly, many computers support mechanisms like ARM TrustZone or UEFI Secure Boot that are designed to prevent hardware rootkits at the bootloader level. In practice, some of those technologies have been used to restrict computers to boot only Microsoft Windows or Google Chrome OS, though there are ways to disable the enforcement for most hardware.

In somewhat of a contrast to more proprietary schemes that some argue restrict the freedom of end users, the Sancus project is a completely open-source design built explicitly on open-source hardware, libraries, operating systems, crypto, and compilers. It can be used, if desired, in specialized contexts where it is of critical importance that trusted code runs in isolation: on, say, an automobile braking actuator attached to a controller area network bus, or a smart grid system such as the type that was hacked in Ukraine during the attack by Russia. These are the opposite of general-purpose devices; instead, one specific function must be performed, and integrity and isolation are critical.

The problem is that many medical devices, automotive controllers, industrial controllers, and similar sensitive embedded systems are made up of limited microcontrollers that may have software modules from different vendors. Misbehaving or malicious software can interfere in the operation of those other modules, expose or steal secrets, and compromise the integrity of the system. Integrity checks based in software are bypassed relatively easily compared to gate-level hardware checking; those checks also add considerable overhead and non-deterministic performance behavior.

Sancus 2.0 extends the openMSP430 16-bit microcontroller with a small and efficient set of strong security primitives, weighing in at under 1,500 lines of Verilog code and increasing power consumption by about 6%, according to Mühlberg. It can disallow jumps to undeclared entry points, provide memory isolation, and attestation for software modules.

Besides providing a key hierarchy and chain of trust for loading software modules, Sancus has a simple metadata descriptor for each module that stores the .text and .data ranges in memory; it then ensures that a .data section is inaccessible unless the program counter is in the .text range of the appropriate module. This is a simple but effective process isolation mechanism to ensure that secrets are not accessible from other software modules and that one module cannot disturb the memory of other modules.
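A software model of that rule may make it easier to picture. This is a minimal sketch, not Sancus’s Verilog: the descriptor fields and function names are hypothetical, address ranges are modeled as Python `range` objects, and it also includes the entry-point rule mentioned earlier (code outside a module may only jump into it through a declared entry point).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleDescriptor:
    """Hypothetical metadata for one protected module (address ranges are half-open)."""
    name: str
    text: range
    data: range
    entry_points: frozenset


class IsolationViolation(Exception):
    """Raised where the hardware would block the access or jump."""


def check_data_access(modules, pc: int, address: int) -> None:
    """A module's .data is only reachable while pc is inside that module's .text."""
    for m in modules:
        if address in m.data and pc not in m.text:
            raise IsolationViolation(
                f"pc {pc:#06x} cannot access {m.name} data at {address:#06x}")


def check_jump(modules, pc: int, target: int) -> None:
    """Code outside a module may enter it only through a declared entry point."""
    for m in modules:
        if target in m.text and pc not in m.text and target not in m.entry_points:
            raise IsolationViolation(
                f"jump from {pc:#06x} into {m.name} at {target:#06x}")
```

In hardware these checks happen on every memory access and jump, in parallel with normal execution, which is why they can be cheap; in software the same checks would cost cycles on every instruction.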

Sancus 2.0 comes with openMSP430 hardware extension Verilog code for use with FPGA boards and with the open-source Icarus Verilog tool. A simple “hello, world” example module written in C demonstrates the basic structure of a software module designed to be loaded in a trusted environment. There are also more complex examples and a demonstration trusted vehicular component system. An LLVM-based compiler is used to compile software to signed modules designed to be loaded by a trusted microcontroller.

Mühlberg mentioned that there is ongoing work on creating secure paths between peripherals for secure I/O, integration with common existing hardware solutions such as ARM TrustZone or Intel SGX, formal verification, and ensuring suitability for realtime applications.

To give a feel for the system in action, Mühlberg showed a demonstration video comparing two simulated automotive controller networks with malicious code running on a node. One can see the unsecured system behave erratically when receiving invalid messages, whereas the Sancus system gracefully slows down and safely disengages.

Much has been written about the upcoming IoTpocalypse: the lack of security in critical infrastructure and general despair about the dismal state of easily exploitable embedded systems as they multiply and get connected to the internet. A project based on open-source building blocks and free-software ethos that attempts to provide a layer of integrity and deterministic behavior to microcontrollers should be lauded and considered by anyone building hardware applications where security and reliability are strong requirements.


Travel: Budapest, Hungary

Budapest is one of my favorite places of all. The architecture is some of the best in the world, and there’s a mix of cultures, from Austria-Hungary to Turks to Magyars. Everything I ate there was delicious. Great public transit. Turkish-style baths everywhere. Fun boat tour on the Danube, the river separating Buda and Pest.

Good to visit as an American, too. At immigration most foreigners were being fingerprinted and questioned for long periods of time. When they saw my US passport they just waved me through.

My only real complaint is that all written words in Budapest are in Hungarian, a language I have absolutely zero chance of ever learning or understanding. It belongs to the Uralic family along with Finnish and Estonian, and has basically no relation to Greek, Latin, or the Romance or Slavic languages. Besides the unfortunate language, it’s a great place.



Soviet war memorial
Amazing architecture
Opera house?
One of my favorite restaurants – Cafe Kör
Inside: organ concert
Sweet church
Energy drink
Man in airport approving of Budapest

Why Travel

Most people enjoy traveling, myself included. A relatively recent trend gaining popularity, however, is turning travel from a vacation- or business-oriented experience into a general modus vivendi.

Having lived all my life in the San Francisco Bay Area, I decided I should expand my horizons and try living elsewhere for a change. Anywhere. Because my work making software involves computers and the internet, I can work from anywhere as long as I have a computer and the internet, so why not take advantage of that?

Turns out it’s really not so hard. Traveling light is the key, really. All I’ve taken with me is one carry-on sized backpack with some clothes and a laptop, and mostly just stayed at Airbnbs. It’s that simple.

Pretty much all my stuff except clothes

Oddly enough many of my friends, themselves often computer typers and Linux janitors living in the Bay Area, express a desire to travel around as well. They say “oh I could never afford that though.” To which I respond “fool, I can’t afford to live in fucking San Francisco, how the fuck can I afford not to travel?” I spend a lot less money seeing the world than I did suffering through the bleak dystopian dysfunctional morass that is modern-day San Francisco.

KRON 4: Lawless rioters light a pile of cardboard recycling aflame in the Tenderloin

SF was a good deal more functional and cooler when I moved there in 2005 but now it’s beyond repair and hardly worth the astronomical cost of living there. Maybe the subject for another post sometime.

Because I’m not on vacation, I don’t do a ton of sightseeing. I try to hit one or two famous things in each city I go to but really I’m working most of the time. It’s like if I were at home, except that I’m not at home. Just working from different cities all the time.

I’ll always miss the Mexican food while traveling though :/

Turns out plane travel can be extremely reasonable when you can be flexible about dates and destinations and can plan a couple of months in advance. Plane tickets around the globe can sometimes be had for a song, and intra-Europe flights can cost in the low double digits if you fly Wizz Air or Ryanair. I wrote before about flying on the cheap, including my $116 one-way ticket from SFO to Amsterdam. And I haven’t even gotten in on the rewards cards and miles redemption schemes that are out there.


If you’re self-employed, attending conferences is a solid plan for a few reasons. For one, if they are related to your work, you can claim conference-related expenses as deductible business expenses. It also gives you a good reason to go to new destinations, meet new people, write articles covering the talks, and of course learn new stuff.

Thermal storage management tech talk

A few websites make the itinerant lifestyle much simpler. One is the Nomad List community, which has the most relevant list of destinations, measuring things like air quality, internet speed, safety, weather, friendliness and more. Also there’s a wonderful Slack chat associated with it where you can ask anything at all that’s travel-related, and even meet up with other people doing the same thing you are in basically any major city in the world. A couple weeks ago I met up with some very nice folks from there in Chiang Mai, Thailand, which is something like the digital nomad capital of the world if such a place can be said to exist.

Chiang Mai at night

Wifi isn’t always the greatest, but I’ve had fantastic luck getting local SIM cards wherever I travel. They almost always provide good speeds and decent latency at a decent price. There’s even a helpful wiki with everything you could ever want to know about data SIM cards anywhere in the world. One important thing to know, though: even if you have an awesome free roaming plan like, say, T-Mobile’s, your normal SIM will be slooooow if you leave the continent. I learned from a telecom engineer at the IETF conference that your carrier tunnels your IP traffic through its own network when you’re roaming, meaning all your traffic goes to the USA (or wherever your carrier is) and back. Get a local data SIM.

Airbnb, well, it’s just awesome. I’ve stayed at 40 of them so far, and it’s been mostly problem-free. The worst case has been a couple of times when I got canceled on last-minute, something a bit annoying but hardly the end of the world. You just take it in stride and get a new place to stay. I always make sure to get a place with a washing machine, so laundry isn’t a big deal. Sadly most countries aren’t into dryers like the USA, but you learn to live with these setbacks.

Actually, come to think of it, dealing with foreign washing machines is extremely challenging sometimes.

Google Translate is no help with printed Japanese… at all…

I think taking things in stride is really key to exploring the world. Maybe some places have traffic lights that change every ten minutes or so (I don’t know what’s up with that in Thailand), or people with whom communication, verbal or otherwise, is an impossibility, or you leave a bank card in an ATM, or a million other things go wrong. I’ve had the good fortune not to encounter any catastrophes, and anything else can be dealt with by a generous application of calm and just asking yourself “okay, well, what should I do now?” It all works out fine.


Map of the places I’ve been in Europe


Above are the places in Europe that I’ve been in the past few years. I’ll write more about them in subsequent posts. (I wrote about my previous travels in Poland and Ukraine back in 2015 here).

Summary of Europe: Budapest is probably my favorite. Lviv and Kyiv in Ukraine are excellent and quite easy on the pocketbook. Berlin and Amsterdam are also great but definitely on the pricier end. Serbia sucks; don’t go there. Paris is Paris. Dublin has shit weather and bad food. I wasn’t that into Northern Italy but still want to visit Southern Italy pretty badly. Skip Warsaw but check out Wrocław in Poland, and Prague and “Bohemian Switzerland” in the Czech Republic. Brussels is boring and a bit too Frenchy for my tastes, but they’ve got good beer and fries so I can’t hate too much. Barcelona is hot. Croatia is a nice getaway that isn’t in Schengen if you’re running up against the limit of days you can spend visa-free in that part of Europe.

Alright, Northern Italy isn’t so bad I guess

At the end of 2017 I veered off to new waters – Asia and Oceania.

Map of the places I’ve been in Asia and Oceania

Sydney and New Zealand are great places to go in December if you’re like me and hate the cold, ’cause it’s summertime in the Southern Hemisphere. They’re also just pretty great places, and people speak English, though the time zone difference really complicates things if you’re working with other people or trying to keep in touch with friends. New Zealand is UTC+12 (UTC+13 during its summer daylight saving time), roughly the opposite of Western Europe, though it’s damn pretty there. Japan I wasn’t into so much: serious language barrier, hella cold, wack food, complicated getting around, expensive, tiny apartments. Hong Kong, on the other hand, is a fantastic place I plan on returning to as soon as I get the chance. I stayed in six different cities in Thailand and they were mostly very agreeable: great food obviously, a warm and pleasant climate, prices and exchange rates comparable to Ukraine, lots of smiling, friendly people, what’s probably some of the top-tier scuba diving in the world at Ko Tao, a massive expat community in Chiang Mai, and plenty of great nature, temples, night markets, and things to see. Just be sure not to throw shade on the monarchy or the junta running the country while you’re there and you’ll be fine.

I’ve come to the end of this leg of my travels and will be heading back to Europe shortly. As you can see from Google Maps I’m writing this from the most perfect civilization ever created by man – Singapore. More on that later.


More posts to come, so stay tuned!

Information vs. Encodings

A concept in modern computing that often confuses people is the difference between a piece of data and the encoding, or representation, of that data.

Everyone knows computers use binary. They use 1s and 0s to store and manipulate information. Do they use binary numbers?

Computers can only store information as patterns of electrical switches, set in the “on” or “off” position. There is no such thing as a “binary” number, only a number that is encoded as a binary pattern. Numbers are information; they don’t physically exist anywhere. We can write the number forty-two as the Arabic numerals “42”, or in base 2 as “101010”, but these are merely different ways of encoding the same number. It’s up to us to come up with a scheme for encoding information using whatever is available.
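The point that “42” and “101010” are two spellings of one number is easy to check in code. Here is a short sketch, written out by hand rather than using Python’s built-in `format(n, "b")` and `int(text, 2)`, so the repeated-division step that produces the digits is visible. The function names are just for this example.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n: int, base: int) -> str:
    """Encode a non-negative integer as a digit string in the given base (2 to 16)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)  # peel off the least significant digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def from_base(text: str, base: int) -> int:
    """Decode a digit string back into the number it represents."""
    value = 0
    for ch in text:
        value = value * base + DIGITS.index(ch.upper())
    return value
```

`to_base(42, 2)` gives "101010" and `to_base(42, 16)` gives "2A", and decoding either one with the matching base gets you back the same 42.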

Most human cultures have used base-10 numbering systems, probably because we have ten fingers (though there are exceptions, like the Babylonians’ base 60). In Roman times people used Roman numerals, which were pretty clumsy and not especially well-suited for arithmetic or algebra. Later, Europeans switched to Arabic numerals (0-9) while keeping the Latin alphabet (A-Z).

So the number 42 is the same number whether it’s written as XLII, “forty-two”, 4️⃣2️⃣, 0x2A, etc. All represent the same number, just encoded in different ways. It’s up to whoever interprets the encoding to use the right scheme to translate it from the written-down form into useful information.
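Roman numerals are a good example of a scheme that is perfectly lossless but awkward to compute with. Here is a short sketch of encoding and decoding them, assuming the modern subtractive convention (IV, XL, CM, and so on) and the usual 1 to 3999 range; the function names are just for this example.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def to_roman(n: int) -> str:
    """Encode 1..3999 greedily, largest symbol (or subtractive pair) first."""
    if not 0 < n < 4000:
        raise ValueError("this sketch covers 1 to 3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def from_roman(text: str) -> int:
    """Decode: a symbol followed by a larger one is subtracted, otherwise added."""
    total = 0
    for i, ch in enumerate(text):
        value = VALUES[ch]
        if i + 1 < len(text) and VALUES[text[i + 1]] > value:
            total -= value
        else:
            total += value
    return total
```

`to_roman(42)` gives "XLII", and `from_roman("XLII")`, `int("101010", 2)`, and `int("2A", 16)` all decode to the same 42. Try doing long division in the first one, though.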

This applies not only to numbers but also to text, audio, video, web pages, hard disks, subtitles, and anything else one may want to store in some hard-copy form and represent digitally. Assuming lossless encoding, a FLAC, an AIFF, and a zipped WAV of a song all contain the same information: the same PCM audio data, just in different formats.
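The same idea can be shown with nothing but Python’s standard library. Real audio codecs would need third-party packages, so in this sketch hex text, Base64, and zlib compression stand in for FLAC, AIFF, and zipped WAV: three lossless encodings of one arbitrary chunk of bytes that all decode back to identical data.

```python
import base64
import zlib

# Stand-in for some raw PCM audio samples: any bytes will do.
original = bytes(range(256)) * 4

# Three different, lossless representations of the same information.
as_hex = original.hex()                  # text, 2 characters per byte
as_base64 = base64.b64encode(original)   # text, 4 characters per 3 bytes
as_zlib = zlib.compress(original)        # compressed binary

print(len(original), len(as_hex), len(as_base64), len(as_zlib))

# Each representation decodes back to exactly the same bytes.
assert bytes.fromhex(as_hex) == original
assert base64.b64decode(as_base64) == original
assert zlib.decompress(as_zlib) == original
```

The sizes differ a lot, but the information is identical in all three.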

This blog post is a bunch of dumb words that anyone who understands English can make some sense of, but it’s stored as a sequence of bytes using UTF-8, an encoding that stores Unicode code points as sequences of 8-bit units (bytes, hence “UTF-8”). Unicode itself is a mapping of code points (numbers) to characters, with some fancy rules about combining characters and the like. Unicode is not a storage format; there are several ways to encode its code points into a machine-processable form, such as UTF-8, UTF-16, and UTF-32.
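To see the difference between code points and their encodings, this short Python sketch encodes one string (“café €”, written with escapes so the source file’s own encoding doesn’t matter) three different ways. The little-endian codec variants are chosen only so that Python doesn’t prepend a byte-order mark.

```python
# "café €" written with escapes: U+00E9 is "é", U+20AC is "€".
text = "caf\u00e9 \u20ac"

# One sequence of Unicode code points...
print([f"U+{ord(ch):04X}" for ch in text])

# ...stored three different ways. The "-le" (little-endian) codecs pin the
# byte order, so Python doesn't prepend a byte-order mark.
encoded = {}
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    data = text.encode(encoding)
    encoded[encoding] = data
    print(f"{encoding:>9}: {len(data):2} bytes  {data.hex()}")
    # Decoding with the matching scheme recovers exactly the same text.
    assert data.decode(encoding) == text
```

The code points are identical in every case, but UTF-8 needs 9 bytes here (“é” takes two and “€” takes three), UTF-16 needs 12, and UTF-32 needs 24.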

As far as computers are concerned, you can only deal with bits, grouped into bytes. The most convenient way to store and retrieve any data from RAM or storage or over a network is as a stream of bytes. If you want to represent some information in a computer, you need an encoding scheme to translate it to and from a stream of bytes. How you accomplish this is entirely up to you. The information only has the meaning you choose to imbue it with.
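A small sketch with Python’s `struct` module makes the point: the same four bytes mean entirely different things depending on which scheme reads them, and a scheme of your own is just an agreed-upon layout. The `(sensor_id, temperature)` record below is a hypothetical example, not any standard format.

```python
import struct

# Four bytes produced by encoding the integer 42 as a big-endian,
# unsigned 32-bit value.
raw = struct.pack(">I", 42)
print(raw)  # b'\x00\x00\x00*'

# The bytes carry no inherent meaning; the reader's scheme supplies it.
as_big_endian = struct.unpack(">I", raw)[0]     # 42
as_little_endian = struct.unpack("<I", raw)[0]  # 704643072
as_float = struct.unpack(">f", raw)[0]          # a tiny subnormal float
as_text = raw.decode("ascii")                   # three NUL characters and "*"

# A home-made scheme for a hypothetical (sensor_id, temperature) record:
# a big-endian unsigned 16-bit id followed by a 64-bit float.
RECORD = struct.Struct(">Hd")
encoded = RECORD.pack(7, 21.5)
decoded = RECORD.unpack(encoded)
print(len(encoded), decoded)  # 10 (7, 21.5)
```

Anyone who knows the `>Hd` layout can read the record back; anyone who doesn’t just sees ten opaque bytes.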

Heroku logging to AWS Lambda

If you use Heroku and AWS and want to customize your Heroku application logging, you can hook Logplex up to AWS Lambda.


When a Heroku application writes to stdout or stderr, the output gets shuttled to the magical world of Logplex. The logs enter as syslog messages containing information like facility, priority, etc. Logplex carries not only your application’s logs but also logs from Heroku’s build and deploy systems, PostgreSQL, and other add-ons. Shortly after arrival these logs are dispatched to whatever sinks your Heroku app has configured, which can be add-ons like Papertrail or custom log sink URLs. Sink destinations can be syslog (optionally over TLS) or syslog-over-HTTPS using octet counting framing.
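To illustrate the octet counting framing: each frame is a decimal byte count, a space, and then exactly that many bytes of syslog message, with frames simply concatenated. Below is a minimal sketch that splits such a body and decodes the leading `<PRI>` field into facility and severity (PRI = facility × 8 + severity). The function names are hypothetical and the sample messages are made up for illustration, not captured Logplex output.

```python
def split_octet_counted(body: bytes) -> list:
    """Split an octet-counted syslog body into individual messages."""
    messages = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)   # end of the length prefix
        length = int(body[pos:space])   # a count of bytes, not characters
        start = space + 1
        if start + length > len(body):
            raise ValueError("truncated frame")
        messages.append(body[start:start + length])
        pos = start + length            # the next frame starts right after
    return messages


def decode_pri(message: bytes) -> tuple:
    """Return (facility, severity) from a leading "<PRI>" field."""
    pri = int(message[1:message.index(b">")])
    return pri >> 3, pri & 0x07  # PRI = facility * 8 + severity


# Build a sample body from two made-up messages.
samples = [
    b"<190>1 2018-01-01T00:00:00+00:00 host app web.1 - hello\n",
    b"<40>1 2018-01-01T00:00:01+00:00 host app web.1 - State changed\n",
]
body = b"".join(str(len(m)).encode() + b" " + m for m in samples)

for msg in split_octet_counted(body):
    print(decode_pri(msg), msg)
```

Because the length prefix counts bytes, the body has to be split before any of it is decoded from UTF-8.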

One advantage of this setup is that your application can emit logs with a minimum of blocking. At one point I had my application sending logs to Slack directly, but this added latency any time I logged anything. By sending to Logplex instead, I can process the application messages asynchronously without doing anything remotely fancy in my application. Another benefit is that you can handle your application, database, build, and deploy logs all in the same unified fashion.

Using AWS API Gateway and Lambda, you can set up your own Logplex sink and do whatever you desire with the logs coming out of Logplex. This includes your application’s output as well as add-on and Heroku platform messages. You can then send them into CloudWatch Logs, or even to Slack as in this example:

"""Sample handler for parsing Heroku logplex drain events (
Expects messages to be framed with the syslog TCP octet counting method.
This is designed to be run as a Python 3.6 Lambda function.
"""
import json
import logging
import os
from base64 import b64decode
from collections import defaultdict

import boto3
import iso8601
import requests
from pyparsing import Word, Suppress, nums, Optional, Regex, pyparsing_common, alphanums

# syslog severity levels (same values as the stdlib `syslog` module constants)
LOG_WARNING = 4
LOG_NOTICE = 5
LOG_INFO = 6
LOG_DEBUG = 7

# Assumption: the KMS-encrypted Slack webhook URL (without the "https://"
# prefix) is stored in this Lambda environment variable.
ENCRYPTED_HOOK_URL = os.environ['ENCRYPTED_HOOK_URL']
HOOK_URL = "https://" + boto3.client('kms').decrypt(CiphertextBlob=b64decode(ENCRYPTED_HOOK_URL))['Plaintext'].decode('ascii')
CHANNEL = "#alerts"

log = logging.getLogger('myapp.heroku.drain')


class Parser(object):
    def __init__(self):
        ints = Word(nums)
        # priority
        priority = Suppress("<") + ints + Suppress(">")
        # version
        version = ints
        # timestamp
        timestamp = pyparsing_common.iso8601_datetime
        # hostname
        hostname = Word(alphanums + "_" + "-" + ".")
        # source
        source = Word(alphanums + "_" + "-" + ".")
        # appname; an optional "[pid]" is suppressed entirely so it can't
        # shift the positions of the fields that follow
        appname = Word(alphanums + "(" + ")" + "/" + "-" + "_" + ".") + Optional(Suppress("[" + ints + "]")) + Suppress("-")
        # message
        message = Regex(".*")
        # pattern build
        self.__pattern = priority + version + timestamp + hostname + source + appname + message

    def parse(self, line):
        parsed = self.__pattern.parseString(line)
        # get priority/severity (PRI = facility * 8 + severity)
        priority = int(parsed[0])
        severity = priority & 0x07
        facility = priority >> 3
        payload = {}
        payload["priority"] = priority
        payload["severity"] = severity
        payload["facility"] = facility
        payload["version"] = parsed[1]
        payload["timestamp"] = iso8601.parse_date(parsed[2])
        payload["hostname"] = parsed[3]
        payload["source"] = parsed[4]
        payload["appname"] = parsed[5]
        payload["message"] = parsed[6]
        return payload


parser = Parser()


def lambda_handler(event, context):
    handle_lambda_proxy_event(event)
    # API Gateway Lambda proxy response with an empty body
    return {
        "isBase64Encoded": False,
        "statusCode": 200,
        "headers": {"Content-Length": "0"},
        "body": "",
    }


def handle_lambda_proxy_event(event):
    body = event['body']
    headers = event['headers']

    # sanity-check source
    assert headers['X-Forwarded-Proto'] == 'https'
    assert headers['Content-Type'] == 'application/logplex-1'

    # split into chunks (a loop rather than recursion, so a large batch
    # can't hit Python's recursion limit)
    def get_chunk(payload: bytes):
        while payload:
            # payload = payload.lstrip()
            msg_len, syslog_msg_payload = payload.split(b' ', maxsplit=1)
            if msg_len == b'':
                raise Exception(f"failed to parse heroku logplex payload: '{payload}'")
            try:
                msg_len = int(msg_len)
            except Exception as ex:
                raise Exception(f"failed to parse {msg_len} as int, payload: {payload}") from ex
            # only grab msg_len bytes of syslog_msg
            syslog_msg = syslog_msg_payload[0:msg_len]
            payload = syslog_msg_payload[msg_len:]
            yield syslog_msg.decode('utf-8')

    # group messages by source,app
    # format for slack
    srcapp_msgs = defaultdict(dict)
    chunk_count = 0
    for chunk in get_chunk(bytes(body, 'utf-8')):
        chunk_count += 1
        evt = parser.parse(chunk)
        if not filter_slack_msg(evt):
            # skip stuff filtered out
            continue
        # add to group
        sev = evt['severity']
        group_name = f"SEV:{sev} {evt['source']} {evt['appname']}"
        if sev not in srcapp_msgs[group_name]:
            srcapp_msgs[group_name][sev] = list()
        srcapp_msgs[group_name][sev].append(str(evt["timestamp"]) + ': ' + evt["message"])

    for group_name, sevs in srcapp_msgs.items():
        for severity, lines in sevs.items():
            if not lines:
                continue
            title = group_name
            # format as a slack message attachment, colored by this group's
            # severity (not by whichever event happened to be parsed last)
            slack_att = slack_format_attachment(log_msg=None, log_rec={'severity': severity})
            text = "\n" + "\n".join(lines)
            slack(text=text, title=title, attachments=[slack_att], channel=CHANNEL, severity=severity)

    # sanity-check number of parsed messages
    assert int(headers['Logplex-Msg-Count']) == chunk_count
    return ""


def slack_format_attachment(log_msg=None, log_rec=None, title=None):
    """Format as slack attachment."""
    severity = int(log_rec['severity'])
    # color
    color = None
    if severity == LOG_DEBUG:
        color = "#aaaaaa"
    elif severity == LOG_INFO:
        color = "good"
    elif severity == LOG_NOTICE:
        color = "#439FE0"
    elif severity == LOG_WARNING:
        color = "warning"
    elif severity < LOG_WARNING:
        # error!
        color = "danger"
    attachment = {
        # 'text': "`" + log_msg + "`",
        # 'parse': 'none',
        'author_name': title,
        'color': color,
        'mrkdwn_in': ['text'],
        'text': log_msg,
        # 'fields': [
        #     # {
        #     #     'title': "Facility",
        #     #     'value': log_rec["facility"],
        #     #     'short': True,
        #     # },
        #     # {
        #     #     'title': "Severity",
        #     #     'value': severity,
        #     #     'short': True,
        #     # },
        #     {
        #         'title': "App",
        #         'value': log_rec["appname"],
        #         'short': True,
        #     },
        #     # {
        #     #     'title': "Source",
        #     #     'value': log_rec["source"],
        #     #     'short': True,
        #     # },
        #     {
        #         'title': "Timestamp",
        #         'value': str(log_rec["timestamp"]),
        #         'short': True,
        #     }
        # ]
    }
    return attachment


def filter_slack_msg(msg):
    """Return true if we should send to slack."""
    sev = msg["severity"]  # e.g. LOG_DEBUG
    source = msg["source"]  # e.g. 'app'
    appname = msg["appname"]  # e.g. 'heroku-postgres'
    body = msg["message"]

    if sev >= LOG_DEBUG:
        return False
    if body.startswith('DEBUG '):
        return False
    # if source == 'app' and sev > LOG_WARNING:
    #     return False
    if appname == 'router':
        return False
    if appname == 'heroku-postgres' and sev >= LOG_INFO:
        return False
    if 'sql_error_code = 00000 LOG: checkpoint complete' in body:
        # ignore checkpoint
        return False
    if 'sql_error_code = 00000 NOTICE: pg_stop_backup complete, all required WAL segments have been archived' in body:
        # ignore backup completion notice
        return False
    if 'sql_error_code = 00000 LOG: checkpoint starting: ' in body:
        # ignore checkpoint
        return False
    if appname == 'logplex' and body.startswith('Error L10'):
        # NN messages dropped since...
        return False
    return True


def slack(text=None, title=None, attachments=None, icon=None, channel='#alerts', severity=LOG_WARNING):
    if not attachments:
        attachments = []
    if not icon:
        # pick an emoji icon based on severity
        icon = 'mega'
        if severity == LOG_DEBUG:
            icon = 'information_source'
        elif severity == LOG_INFO:
            icon = 'information_desk_person'
        elif severity == LOG_NOTICE:
            icon = 'scroll'
        elif severity == LOG_WARNING:
            icon = 'warning'
        elif severity < LOG_WARNING:
            # error!
            icon = 'boom'
    message = {
        "username": title,
        "channel": channel,
        "icon_emoji": f":{icon}:",
        "attachments": attachments,
        "text": text,
    }
    slack_raw(message)


def slack_raw(payload):
    response = requests.post(
        HOOK_URL, data=json.dumps(payload),
        headers={'Content-Type': 'application/json'}
    )
    if response.status_code != 200:
        raise ValueError(
            'Request to slack returned an error %s, the response is:\n%s'
            % (response.status_code, response.text)
        )



There is one major deficiency in this system that is worth noting: there is no way for your application to alter a log message’s syslog fields. So even if your application logger knows a particular message is debug, or warn, or error, it all comes across as severity level 6 (info). Logs from other components such as PostgreSQL preserve their log severities, but your application is a second-class citizen: there is no mechanism to send actual syslog messages to Logplex, even though add-ons and internal Heroku machinery clearly do. I filed a ticket about this and complained at length, and they told me they have no plans to allow users to send syslog-formatted messages to Logplex, so everyone is stuck with only stdout/stderr. This means that if you wish to treat messages of differing severities differently in your Logplex sink, you can’t, at least not with the out-of-band syslog data that your sink receives. As far as the sink can tell, all of your application’s debug logs and error logs look the same, which is frankly an impossible situation when it comes to logging. Hopefully they fix this some day.
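One possible workaround, sketched here under assumptions: have the application prefix every line with its level name (the filter in the handler above already checks for a `DEBUG ` prefix), then recover a syslog severity from that prefix in the sink. The `LEVEL_TO_SEVERITY` mapping, the log format, and both function names are hypothetical conventions for this sketch, not anything Logplex provides.

```python
import logging
import sys

# Hypothetical mapping from Python level names to syslog severities
# (RFC 5424 numbering: lower is more severe).
LEVEL_TO_SEVERITY = {
    "CRITICAL": 2,
    "ERROR": 3,
    "WARNING": 4,
    "INFO": 6,
    "DEBUG": 7,
}


def configure_app_logging(stream=sys.stdout) -> None:
    """App side: write 'LEVEL logger: message' lines for Logplex to collect."""
    logging.basicConfig(
        stream=stream,
        level=logging.DEBUG,
        format="%(levelname)s %(name)s: %(message)s",
    )


def severity_from_message(message: str, default: int = 6) -> tuple:
    """Sink side: recover (severity, rest_of_message) from the level prefix.

    Falls back to `default` (6, info, which is what Logplex reports for
    application output) when no known prefix is present.
    """
    level, _, rest = message.partition(" ")
    if level in LEVEL_TO_SEVERITY:
        return LEVEL_TO_SEVERITY[level], rest
    return default, message
```

In the app you would call `configure_app_logging()` once at startup; in the sink, something like `severity_from_message(evt["message"])` could overwrite the useless severity before filtering. It only works for output your own code formats, so third-party libraries printing directly to stdout still arrive as info.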