IoT Security Through Open Certification

(Cross-posted from SF ISOC blog)

The more jaded nerds who’ve been around the block a few times here in San Francisco have an understandably dismissive attitude towards the use and abuse of technological buzzwords, of which “IoT” is a contemporary offender. In one sense they’re correct in that what we’re talking about are embedded systems connected to the internet, Big Deal. But remind them that it’s a bunch of embedded systems connected to the internet in the context of security, and the salient point is sharply made. They quickly turn from dismissive to despondent, knowing where this is all likely headed.

Obligatory Scary References and Predictions

Where is it headed? You don’t have to turn to prognostication to get a glimpse of the consequences of the Earth being flooded with sloppily-developed firmware. In case you missed it, in September and October of 2016 the Mirai botnet press-ganged thousands of embedded devices, comprising 36 depressingly-poorly-secured IoT products shipping with default usernames and passwords, into “multiple major DDoS attacks in DNS services of [the] DNS service provider Dyn […] using Mirai malware installed on a large number of IoT devices, resulting in the inaccessibility of several high profile websites such as GitHub, Twitter, Reddit, Netflix, Airbnb and many others” (https://en.wikipedia.org/wiki/Mirai_(malware)). At volumes of 620-1024 Gbps, these attacks were extremely consequential and disruptive, essentially breaking the internet for many users for the better part of a day.

This attack went after the lowest-hanging fruit possible: default usernames and passwords on internet-addressable devices. The sophistication required was likely minimal.

Even more recently someone set up ZMap to find Raspberry Pis with SSH enabled and the default username and password, and created a worm capable of infecting millions of hosts. It probably took the author an afternoon to make.

As these sorts of devices proliferate and attacks increase in sophistication, we can expect a corresponding increase in bad days for network admins, not to mention the hapless end user. The FBI in 2015 felt the need to issue a PSA to this effect: “The FBI is warning companies and the general public to be aware of IoT vulnerabilities cybercriminals could exploit.”

The danger is well-known and publicized and not worth belaboring for too long. The real question is of course: what can we do about it?

Incentives and Obstacles

The reason that many IoT products have poor security is not a failure of morals, bad upbringing, or stupidity, but a reasonable economic calculation on the part of the manufacturer. They are concerned primarily with the time to market. Taking extra time to design, build and test their code properly only adds delay, for which they see no tangible benefit. Because these products are made by thousands of large and small manufacturers and pieced together by various developers and engineers around the world, a top-down regulatory approach is impractical. There are simply too many moving parts, countries, agencies, software libraries and stacks for effective regulations to keep pace with this fast-moving target. So what’s to be done?

In the opinion of people smarter than me, what’s needed is an open certification for things connected to the internet asserting a minimum level of security. It doesn’t need to be ultra-rigorous to be of benefit, at least at the basic level. A simple “this device is not almost certainly going to get taken over and wreak havoc” stamp would be a great first step, one that many manufacturers are not passing muster on presently.

Why a certification?

A certification process can be designed collaboratively and openly, can be implemented by anyone, doesn’t require action from policymakers, can have different levels of rigor, and most importantly provides a market-based incentive to manufacturers to not make obvious, common blunders. The result can only be greater security and stability for pretty much the entire internet-connected planet. As a user of the internet I have a personal interest in not having everything susceptible to hacks and being used to take down internet infrastructure.

It’s the opinion of respected security professionals that this is a positive and necessary measure.

There would be incentive to manufacturers to conform to the certification; consumers and institutions should prefer to purchase conforming devices vs. similar devices that haven’t been vetted. Consider a government or corporation procurement policy that mandates that conforming devices be preferred or required.

This is not a novel idea. There are in fact a small number of company-sponsored certifications already, but as far as I can tell they are proprietary and each is run by a single company. The most promising proposal comes from the Online Trust Alliance initiative from the Internet Society. They define a set of best practices for securing IoT devices and also take into consideration notifications and privacy. Their IoT Trust Framework provides a solid assurance that a device is trustworthy to deploy, at least more than any random off-the-shelf thing.

Other Options

Certification is not the only option for securing Things and embedded devices. Governmental policy is another possibility, though necessarily limited in its jurisdiction, scope, and ability to keep up with new developments in a rapidly-changing, highly technical field. Also I don’t get to make policy, but I can help make a certification. As an example of useful legislation, Dan Geer suggests making liability contingent on the openness of the firmware: if you use closed-source, proprietary systems then you are more legally liable for damage caused than if you used open-source software. This is both practical and reasonable, as open-source code can be audited and improved by the community, particularly if you go out of business but your devices remain. He has many more such intelligent suggestions that he lays out in his 2014 Black Hat keynote, which I highly recommend watching. I also thought highly of his suggestion that devices should either be remotely updateable (with signed updates of course) to patch flaws in the field, or they should “expire” and stop being connected to the internet after some period of time, say five years. Having insecure devices on the internet is one thing; having un-patchable systems that stay around forever is quite another. This could easily be a component of certification.

Another, more extreme approach, one that as far as I’m aware was not predicted, is that some people such as the hacker “The Janit0r” have taken it upon themselves to release worms using vectors similar to the Mirai botnet’s to take over insecure IoT devices and then either brick them or firewall them so that they can’t be used maliciously. The Janit0r claims he has bricked over two million insecure devices so far, so that they can’t be press-ganged into evil servitude. The similar Hajime worm has no DDoS capability and instead blocks ports to lock down the device:

From https://www.symantec.com/connect/blogs/hajime-worm-battles-mirai-control-internet-things:

“There are some features that are noticeably missing from Hajime. It currently doesn’t have any distributed denial of service (DDoS) capabilities or any attacking code except for the propagation module. Instead, it fetches a statement from its controller and displays it on the terminal approximately every 10 minutes. The current message is:

Just a white hat, securing some systems.

Important messages will be signed like this!

Hajime Author.

Contact CLOSED

Stay sharp!

[…]

To the author’s credit, once the worm is installed it does improve the security of the device. It blocks access to ports 23, 7547, 5555, and 5358, which are all ports hosting services known to be exploitable on many IoT devices. Mirai is known to target some of these ports.”
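If you want to check whether a device you own is still exposing the ports listed in the quote above, a minimal sketch might look like the following. The function name open_ports and the default timeout are my own inventions for illustration. It only attempts plain TCP connections and says nothing about whether the service behind an open port is actually exploitable.

```python
import socket

# Ports the quoted Symantec post says Hajime blocks.
HAJIME_BLOCKED_PORTS = (23, 7547, 5555, 5358)


def open_ports(host, ports=HAJIME_BLOCKED_PORTS, timeout=1.0):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    reachable = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                reachable.append(port)
    return reachable
```

Calling open_ports("192.168.1.50") (a placeholder address) against your own camera or router returns the list of those ports that answered. An empty list is the result you are hoping for.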

Community and Governance

Another reason for optimism is the response from assorted institutions, individuals and corporations. AWS should be praised for absolutely requiring proper (mutual TLS) authentication for anyone using their IoT platform. On June 8th, 2017 the US NTIA put out an RFC specifically about hardening IoT devices and preventing botnets. The San Francisco Bay Area Internet Society has a new IoT Working Group to promote security and best practices for development, which I’m happy to be leading. If this is a topic of interest to you, there are plenty of communities of people willing to work together to make the coming flood of Things a positive transition instead of an internet minefield.

 

 

Mischa’s {Fun,Lucrative} Project Ideas

You are free to steal these ideas for yourself or work with me on them. If you make a million bucks please buy me a burrito.

Table of Contents

1 Next-generation live video streaming system

  • Use WebRTC
    • WebRTC is a new standard defining peer-to-peer (usually) live video/audio streaming. It can be easily used in web browsers with JavaScript. It is brand new and not fully deployed yet (supported in all browsers except Safari, with Safari support coming soon).
    • It is a way to do streaming live video without Flash or HLS, both of which suck.
  • Janus
    • Janus is a two-faced server written in C that speaks RTP and RTSP on one side, and WebRTC on the other. It acts like a WebRTC peer and can be used to turn WebRTC into a client/server model, or facilitate client/client communications. It is an awesome project and will enable all kinds of cool, standards-based, real-time video and audio communication in web browsers and other things.
  • Your own peer-to-peer Skype replacement website
    • With Janus and a couple hundred lines of JavaScript you could make a peer-to-peer video calling web page. It’s really not that hard. Why not make something to replace the piece of shit that is Skype. It’ll be insanely popular and you barely need any infrastructure.

2 Economic data overlay for Google Earth

  • I’ve always thought it’d be really cool to have a geospatial visualization of different economic measurements like GDP, PPP, happiness, Gini coefficient, core and non-core inflation, unemployment, corruption, etc. This would give anyone in the world the ability to visualize how different countries (and regions?) stack up to each other in an intuitive way. Graphs and data are not accessible to most people, but visually seeing their country be obviously shittier compared to neighboring countries could help increase demand for measures to improve the quality of life of their citizenry.
  • Maps and visualizations can convey information and affect people a lot more than articles and numbers can.
  • Trick is finding good, comparable sources of data. Probably the CIA world factbook and Shadow Govt Stats would be good places to start.

3 IOT security certification biz

4 AWS consulting biz

  • AWS is awesome and can save companies gazillions of dollars on capex, datacenter, personnel, development and operations costs. If you aren’t using AWS you’re an idiot. Really. You just don’t know it yet.
  • Doing AWS The Right Way isn’t hard but requires some experience or just reading how to do it The Right Way.

5 Resell AWS functionality (rekognition, polly)

6 COBOL modernization biz

  • So many businesses have important applications that are built on old systems like COBOL and JCL.
  • They are desperate to modernize these codebases and not be reliant on impossible-to-replace hardware and systems that nobody understands and that they cannot hire anyone to maintain.
  • COBOL is incredibly unsexy-sounding and most young people have never heard of it and want to build apps or screw around with JavaScript.
  • Everyone who is qualified to do this is dead or retired.
  • Companies and governments are unable to hire anyone to fix their crap.
  • COBOL is incredibly easy to read. Probably hard to write, but you wouldn’t need to write any.
  • Porting legacy applications to modern systems could be extremely fun and lucrative.

7 Reality capture * VR

8 Hardware projectM

  • I maintain a nifty music visualizer project called projectM. It is an open-source re-implementation of the venerable WinAmp Milkdrop visualizer.
  • It needs some software work to port it to be OpenGL-ES compatible.
  • Once it works with GLES you could easily make an embedded Linux system (probably a Raspberry Pi or something more beefy) that would have audio in and HDMI out.

9 dancar

SF PostgreSQL Conference

Recently beautiful South San Francisco hosted the annual Silicon Valley PostgreSQL conference, a gathering of the world’s top open-source database nerds.

Some of the fantastic talks I attended were:

PL/pgsql:

A deep dive into the myriad features of the built-in postgres procedural language, PL/pgsql. It’s a sort of funny-looking but very capable and featureful language that lets you very easily mix procedural code with SQL statements and types based on your rows and tables. It’s something I’ve used before in a very limited form, but I really had no idea how many standard scripting language features were available, including things like “auto” and composite types, multiple return values, IN/OUT/INOUT/VARIADIC parameters, automatic function AST and SQL prepared statement caching, and anonymous functions. PL/pgsql is very handy for trigger functions, administrative functions (like partitioning tables on a periodic basis) and distilling complex logic into reusable pieces. There are some important caveats about function performance, so if you’re planning on calling them often be sure to read up on what you should and shouldn’t do. Try to avoid functions calling other functions if possible, take advantage of the advisory keywords like IMMUTABLE, and figure out if it’s okay to serialize inside of a transaction boundary.

pg_paxos:

Paxos is a distributed consensus algorithm and its integration into postgres as an extension gives you the nifty ability to paxosly-replicate tables and use a paxos(key) function to find out what value a majority of nodes report back with the option to use constraints as well. Seems like it could be useful for things like master elections, geographically disparate systems that have low latency for local writes but eventually become consistent, and times when you only care about an upper or lower bound (easy with the constraints). Not sure if I’ll ever have a need for it or not.

Go:

Went to a talk on using Go with PostgreSQL. There’s a nice driver for it. Mostly people seem to do raw SQL queries; using ORMs like gorm doesn’t seem like a very popular option. I imagine that’s largely because people using Go are doing so because they care about performance, and because ORMs are obviously going to be more limited in a feature-constrained compiled language. Speaker claimed his Go rewrite of pgnetdetective was a bajillion times faster than the Python version.

Becoming a PostgreSQL Guru:

We all want to be the proverbial unixbeard guru in the corner office who acolytes petition to receive tidbits of wisdom. A big ingredient in achieving enlightenment involves knowing what the new aggregate functions (see sections 7.2.4 and 7.2.5) can do for you. There are easy ways to auto-generate hierarchical aggregates by groups of different ranges and sets, using GROUPING SETS, CUBE, ROLLUP, LATERAL JOIN, CTE and window functions. If you find yourself needing to generate some reports there’s a really good chance some of these new features can speed things up a huge amount and require less code.
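To make concrete what a hierarchical aggregate like GROUP BY ROLLUP (region, city) hands back, here is a small pure-Python emulation. It is only meant to show the shape of the result, not how postgres computes it. The sales rows, the column names and the rollup_sum helper are all made up for illustration.

```python
from collections import defaultdict


def rollup_sum(rows, keys, value):
    """Emulate SELECT keys..., sum(value) ... GROUP BY ROLLUP (keys...).

    ROLLUP (a, b) groups by (a, b), then by (a), then by () for the grand
    total. Rolled-up columns are reported as None, much like SQL NULL.
    """
    totals = defaultdict(int)
    for row in rows:
        # One grouping level per prefix of the key list, longest first.
        for depth in range(len(keys), -1, -1):
            prefix = tuple(row[k] for k in keys[:depth])
            group = prefix + (None,) * (len(keys) - depth)
            totals[group] += row[value]
    return dict(totals)


# Hypothetical data standing in for a real table.
sales = [
    {"region": "west", "city": "SF", "amount": 10},
    {"region": "west", "city": "LA", "amount": 5},
    {"region": "east", "city": "NYC", "amount": 7},
]
result = rollup_sum(sales, ["region", "city"], "amount")
```

The result holds one entry per city, one subtotal per region keyed like ("west", None), and a grand total under (None, None). In the database you get all of that from a single query instead of three separate GROUP BYs glued together with UNION ALL.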

Durability:

Postgres has many knobs related to how safe you want to be with your data. These are great to know about in some detail because often you will have different demands based on your application or business. Naturally they have tradeoffs so knowing how to make informed choices on the matter is crucial. For example if you’re a bank, you may not want to finish a transaction until 3-phase commit happens on all write replicas, but if you have some web session cookie table or log table on a single box you may want to make it SET UNLOGGED to vastly improve performance, with the caveat that you may not have perfect crash recovery of the latest writes if something terrible happens. Great that postgres gives you lots of options in these areas.

Supporting legacy systems:

A gentleman from a consulting company shared his experiences as a person hired by companies to come in and support or maintain or migrate extreme legacy systems and how useful postgres is in that process, along with some Java toolkit for bridging old systems. He namedropped things like FoxPro, JCL, COBOL, Solaris and a bunch of other things I didn’t recognize. I’ve always thought it’d be a fun job to take these ultra old systems that companies entirely depend on and are desperate to get off of and help them out. It’s not hip like writing new JavaScript build systems or whatever but I bet there’s real good money in it. One thing that’s always stuck in my head was how during the California budget crisis ten years ago or so, the governor wanted to pay all state employees minimum wage but the comptroller said it couldn’t be done. You see, the state’s payroll system runs on COBOL and their two job reqs have gone unfilled for years and years. Probably because all COBOL devs are dead or retired. It’s written out in plain English though so I don’t get what the big deal is…

In conclusion it was a fine set of talks, and I wish I could have seen some of the others that were going on at the same time in other rooms. The SF PostgreSQL Meetup has more of these same types of great informative lectures going on year-round and I highly recommend attending them if this sort of stuff gets you pumped up too.

 

projectM: Open-Source Music Visualization

Update: more recent information can be found here

 

If you remember the old Windows music player Winamp, it came with an amazing visualizer named Milkdrop written by a guy at nVidia named Geiss. This plugin performed beat detection, split the music into frequency buckets with an FFT, and then fed that info into a randomly-selected “preset.” The presets are equations and parameters controlling waveform equations, colors, shapes, shaders, “per-pixel” equations (not actually per-screen-pixel, rather a smaller mesh that is interpolated) and more.
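For a rough feel of the audio analysis side, here is a pure-Python sketch of splitting a block of samples into frequency buckets and flagging a beat. This is not Milkdrop’s or projectM’s actual code. It uses a naive DFT where a real implementation would use an FFT, and the bucket count, the sensitivity value and all function names are assumptions for illustration.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; magnitudes of bins 0..n/2-1."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        mags.append(abs(acc))
    return mags


def band_energies(samples, bands=3):
    """Sum the spectrum into `bands` buckets, e.g. bass, mid and treble."""
    mags = dft_magnitudes(samples)
    edges = [i * len(mags) // bands for i in range(bands + 1)]
    return [sum(mags[edges[i]:edges[i + 1]]) for i in range(bands)]


def is_beat(bass_energy, recent_bass, sensitivity=1.5):
    """Call it a beat when the bass jumps well above its recent average."""
    if not recent_bass:
        return False
    return bass_energy > sensitivity * (sum(recent_bass) / len(recent_bass))
```

A preset would then read values like these each frame, for example scaling a shape by the bass bucket or changing colors when is_beat() fires.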

Most of the preset files have ridiculous names like:

  • “suksma + aderassi geiss – the sick assumptions you make about my car [shifter’s esc shader] nz+.milk”
  • “lit claw (explorers grid) – i don’t have either a belfry or bats bitch.milk”
  • “Eo.S. + Phat – chasers 12 sentinel Daemon – mash0000 – multi-band time-distortion aurora granules.milk”
  • “Goody + martin – crystal palace – Schizotoxin – The Wild Iris Bloom – mess2 nz+ i have no character and feel entitled to one.milk”

Milkdrop was originally only for Windows and was not open-source, so a few very smart folks got together and re-implemented Milkdrop in C++ under the LGPL license. The project created plugins to visualize audio from Winamp, XMMS, iTunes, Jack, Pulseaudio and ALSA. Pretty awesome stuff.

This was a while ago, but recently I wanted to try it out on OSX. I quickly realized that the original iTunes plugin code was out of date by about 10 major versions and wasn’t even remotely interested in compiling, not to mention lacking a bunch of dependencies built for OSX.

So I went ahead and updated the iTunes plugin code, mostly in a zany language called Objective-C++ which combines C++ and Objective-C. It’s a little messed up but I guess it works for this particular case. I grabbed the dependencies and built them by hand, including static versions for OSX in the repository to make it much easier for others to build it (and myself).

Getting it to build was no small feat either. Someone made the unfortunate decision to use cmake instead of autotools. I can understand the hope and desire to use something better than autotools, but cmake ain’t it. Everything is written in some ungodly undocumented DSL that is unlike any other language you’ve used and it makes a giant mess all over your project folders like an un-housebroken puppy fed a laxative. I have great hope that the new Meson build system will be awesome and let us all put these miserable systems out to pasture. We’ll see.

cmake – not even once

Long story short after a bunch of wrangling I got this all building as a native OSX iTunes plugin. With a bit of tweaking and tossing in the nVidia Cg library I got the quality and rendering speed to be top-notch and was able to reduce the latency between the audio and rendering, although I think there’s still a few frames of delay I’d like to figure out how to reduce.

I wanted to share my plugin with Mac users, so I tried putting it in the Mac App Store. What resulted was a big fat rejection from Apple because I guess they don’t want to release plugins via the app store. You can read about those travails here. I think that unpleasant experience is what got me to start this blog so I could publicly announce my extreme displeasure with Apple’s policies towards developers trying to contribute to their ecosystem.

After trying and failing to release via the app store I put the plugin up on my GitHub, along with a bunch of the improvements I made. I forked the SourceForge version, because SourceForge can go wither and die for all I care.

I ended up trying to get it running in a web page with Emscripten and on an embedded linux device (raspberry pi). Both of these efforts required getting it to compile with the embedded spec for OpenGL, GLES. Mostly I accomplished this by #ifdef’ing out immediate-mode GL calls like glRect(). After a lot more ferocious battling with cmake I got it running in SDL2 on Linux on a Raspberry Pi. Except it goes about 1/5fps, lol. Need to spend some time profiling to see if that can be sped up.

I also contacted a couple of the previous developers and the maintainers on SourceForge. They were helpful and gave me commit access to SF, though one said he was hoarding his GLES modifications for the iOS and Android versions. Fair enough I guess.

Now we’re going to try fully getting rid of the crufty old SourceForge repo, moving everything to GitHub. We got a snazzy new GitHub homepage and even our first pull request!

My future dreams for this project would be to make an embedded Linux device that has an audio input jack and outputs visualizations via HDMI, possibly a Raspberry Pi, maybe something beefier. Apparently some crazy mad genius implemented this mostly in an FPGA but has stopped producing the boards. I don’t know if I’m hardcore enough to go that route. Probably not.

In conclusion it’s been nice to be able to take a nifty library and update it, improve it, put out a release that people can use and enjoy, and work with other contributors to make software for making pretty animations out of music. Hopefully with our fresh new homepage and an official GitHub repo we will start getting more contributors.

I recorded a crappy demo video. The actual visualizer is going 60fps and looks very smooth, but the desktop video recorder I used failed to capture at this rate so it looks really jumpy. It’s not actually like that.

AWS Lambda Editor Plugin for Sublime Text

Editing the source of a lambda procedure in AWS can be very cumbersome. Logging in with two-factor authentication and then selecting your lambda and using their web-based “IDE” with nested scroll bars going on on the page is not the greatest. Even worse is if your function actually has dependencies! Then you cannot view the source on the web and must download a zip file, and re-zip and upload it every time you wish to make a change.

Naturally after a while of doing this I got pretty fed up so I created a handy plugin (documentation and source on GitHub) for my editor of choice these days, Sublime Text. After setting up your AWS access key if you haven’t done so already (it uses the awscli or boto config) and installing the plugin via the Sublime Package Manager, you can call up a list of lambdas from within your editor.

After selecting a lambda to edit, it downloads the zip (even if it wasn’t originally a zip), sticks it in a temporary directory and creates a sublime project for you. When you save any of the files it will automatically zip up the files in the project and update the function source automatically, as if you were editing a local file. Simplicity itself.
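The save-and-upload half of that loop can be sketched in a few lines of Python. This is not the plugin’s actual source, just the general idea: zip_directory and push_function are hypothetical helper names, and the Lambda client is passed in so the sketch does not depend on any particular AWS configuration.

```python
import io
import os
import zipfile


def zip_directory(root):
    """Return the bytes of a zip archive containing every file under `root`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                # Store paths relative to the project root so the handler
                # file ends up at the top level of the archive.
                archive.write(path, os.path.relpath(path, root))
    return buf.getvalue()


def push_function(lambda_client, function_name, project_dir):
    """Re-zip `project_dir` and upload it as the function's new code."""
    return lambda_client.update_function_code(
        FunctionName=function_name,
        ZipFile=zip_directory(project_dir),
    )
```

With boto3 installed this would be called as push_function(boto3.client("lambda"), "my-function", "/tmp/my-function"), where both names are placeholders. The download direction goes through the same client: get_function(FunctionName=...) returns a response whose Code.Location field is a temporary URL for the current zip.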

If you use AWS lambda and Sublime Text, get this plugin! It’ll save you a ton of time. Watch it in action:

 

Video instructions for installing the plugin from scratch:

Mac OS X El Capitan and OpenSSL Headers

Apple stopped including the OpenSSL development headers on recent versions of OSX, trying to get people to move away from the old 0.9.8 version that’s been deprecated for a very long time. Making people stop using this shared library is a Good Thing to be sure but you may come across older software that you want to build for yourself.

If you install a newer version of OpenSSL instead, you will likely find that older programs fail to build against it because a lot of data structures have been made opaque. You may see errors such as:

error: variable has incomplete type 'EVP_PKEY' (aka 'struct evp_pkey_st')
        EVP_PKEY pk;
                 ^
/usr/local/include/openssl/ossl_typ.h:92:16: note: forward declaration of 'struct evp_pkey_st'
typedef struct evp_pkey_st EVP_PKEY;

If you want to get such code to compile there’s a quick and easy solution! OSX still ships with the 0.9.8 library, you just need to provide the headers. Remove any newer versions of OpenSSL, grab the 0.9.8 sources, and copy over the headers:

$ sudo cp -r include/openssl /usr/local/include/

And then you’re all set.

Developing a cloud-based IoT service

In my previous post I describe my adventures in building an AWS IoT-enabled application for a proprietary embedded linux system and getting it to run. The next step in our journey is to create a service that communicates with our device and controls it in a useful way.

What can we do with a system running with the aws_iot library? We can use the MQTT message bus to subscribe to channels and publish messages, and we can diff the current device state against the desired device state shadow stored on the server. Now we need the service side of the puzzle.
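The “diff against the desired state” idea is easy to picture with a few lines of Python. This is only an illustration of the concept, not code from the aws_iot library or the shadow service, and the shadow_delta name and the example keys are made up.

```python
_MISSING = object()  # sentinel so a reported value of None is not confused with "absent"


def shadow_delta(desired, reported):
    """Return the parts of `desired` that `reported` does not match yet.

    Nested dicts are compared recursively, so only changed leaves come back.
    """
    delta = {}
    for key, want in desired.items():
        have = reported.get(key, _MISSING)
        if isinstance(want, dict) and isinstance(have, dict):
            nested = shadow_delta(want, have)
            if nested:
                delta[key] = nested
        elif want != have:
            delta[key] = want
    return delta


# Hypothetical camera state: the server wants snapshot #2, the device is on #1.
desired = {"led": "on", "snapshot": {"seq": 2, "quality": "hd"}}
reported = {"led": "on", "snapshot": {"seq": 1, "quality": "hd"}}
delta = shadow_delta(desired, reported)
```

Here delta comes out as {"snapshot": {"seq": 2}}, which is the only thing the device would have to act on.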

My sample IoT application is to be able to view images on an IP camera from anywhere on the internet. I’m planning to incorporate live HD video streaming as well but that is a whole other can of worms we don’t need to open for this demonstration. My more modest goal for now will be to create a service where I can request a snapshot from the camera be uploaded to AWS’s Simple Storage Service (S3) which can store files and serve them up to authenticated users. In addition I will attempt to build the application server logic around AWS Lambda, a service for running code in response to events without actually having to deploy a server or run a daemon of any sort. If I can manage this then I will have a truly cloud-based service; one that does not consume any more resources than are required to perform its job and with no need to pre-allocate any servers or storage. It will be running entirely on Amazon’s infrastructure with only small bits of configuration, policy and code inserted in the right places to perform the relatively simple tasks required of my app. This is the Unemployed DevOps lifestyle, the dream of perfect lazy scalability and massive offloading of effort and operations to Amazon. There is of course a large downside to this setup, namely that I am at the mercy of Amazon. If they are missing a feature I need then I’m pretty much screwed and if their documentation is poor then I will suffer enormously. A partial description of my suffering and screwed state continues below.

I’ve been bitten before by my foolish impetuousness in attempting to use new AWS services that have clearly not been fully fleshed out. I was an early adopter of the CodeDeploy system, a super useful and nifty system for deploying changes to your application on EC2 instances from S3 or even straight from GitHub. Unfortunately it turned out to not really be finished or tested or documented and I ended up wasting a ton of time trying to make it work and deal with corner cases. It’s a dope service but it’s really painfully clear nobody at AWS has ever bothered to actually try using it for a real application, and all of my feature requests, bug reports and in-person sessions with AWS architects have resulted in exactly zero improvements despite the hours of free QA I performed for them. As a result I am now more cautious when using new AWS services, such as IoT and Lambda.

In truth attempting to make use of the IoT services and client library has been one of the most frustrating and difficult uphill battles I’ve ever waged against a computer. The documentation is woefully incomplete, I’ve wasted tons of time guessing at what various parameters should be, most features don’t really behave as one would expect and the entire system is just super buggy and non-deterministic. Sometimes when I connect it just fails. Or when subscribing to MQTT topics.

Usually this doesn’t happen. But sometimes it does!

Why does it disconnect me every few seconds? I don’t know. I enabled autoReconnect (which is a function pointer on a struct unlike every other function) so it does reconnect at least, except when it just fails for no apparent reason.

setAutoReconnectStatus is only mentioned as being a typedef in the MQTT client documentation. One would assume you should call the function aws_iot_mqtt_autoreconnect_set_status(), but the sample code does indeed call the struct’s function pointer instead. No other part of the library uses this fakeo method call style.

On the boto3 (Python AWS client library) side things are not really any better. The device shadow support (called IoT Dataplane) documentation is beyond unhelpful, at least as of this writing. If you want to update a device state dictionary (its “shadow”) in Python, say, in a lambda, you call the following method:

Usually when you want to specify a dictionary-type object as a param in python it’s customary to pass it around as a dict. It’s pretty unusual for an API that is expecting a dictionary data structure to expect you to already have encoded it as JSON, but whatever. What is really missing in this documentation is the precise structure of the update payload JSON string you’re supposed to pass in. You’re supposed to pass in the desired new state in the format {"state": {"desired": { … }}}:

My dumb lambda

If you hunt around from the documentation pages referenced by the update_thing_shadow() documentation you may uncover the correct incantation, though not on the page it links to. It would really save a lot of time if they just mentioned the desired format.
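For reference, a minimal sketch of that incantation looks something like this. The helper names, the thing name and the snapshot_requested key are placeholders of my own. The only parts taken from the API are the update_thing_shadow() call with its thingName and payload parameters and the {"state": {"desired": ...}} envelope described above.

```python
import json


def build_shadow_update(desired):
    """Wrap a desired-state dict in the envelope and encode it as JSON bytes."""
    return json.dumps({"state": {"desired": desired}}).encode("utf-8")


def set_desired_state(iot_data_client, thing_name, desired):
    """Ask the shadow service to move `thing_name` toward `desired`."""
    return iot_data_client.update_thing_shadow(
        thingName=thing_name,
        payload=build_shadow_update(desired),  # JSON bytes, not a dict
    )
```

Inside a lambda this would be called as set_desired_state(boto3.client("iot-data"), "my-camera", {"snapshot_requested": True}).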

I really have no idea why it wants a seekable object for the payload since it’s not like you can really send files around. I actually first attempted to send an image over the IoT message bus with no luck, until I realized that the biggest message that can ever be sent over it is 128k. This application would be infinitely simpler if I could transmit the image snapshot over my existing message bus but that would be too easy. I am fairly certain my embedded linux system can handle buffering many megabytes of data and my network is pretty solid; it’s really a shame that AWS is so resource-constrained!

The reason I am attempting to use the device shadow to communicate is that my current scheme for getting an image from the device into AWS in lieu of the message bus is:

  • The camera sends a MQTT message that indicates it is online
  • When the message is received, a DevicePolicy matches the MQTT topic and invokes a lambda
  • The lambda generates a presigned S3 request that will allow the client to upload a file to an S3 bucket
  • The lambda updates the device shadow with the request params
  • A device shadow delta callback on the camera is triggered (maybe once, maybe twice, maybe not at all, by my testing)
  • Callback receives the S3 request parameters and uploads the file via libcurl to S3
  • Can now display thumbnail to a web client from S3
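The lambda in the middle of that flow can be sketched roughly as follows. This assumes boto3's `generate_presigned_post` on the S3 client and `update_thing_shadow` on the `iot-data` client. The object key layout, the `upload` shadow key and the bucket name are hypothetical, and the thing name is passed in by hand because the incoming message does not say who sent it.

```python
import json


def handle_camera_online(event, s3, iot_data, bucket, thing_name, expires=300):
    """Presign an S3 upload and hand the request params to the camera
    through its device shadow. `event` is accepted but not inspected."""
    key = "snapshots/%s.jpg" % thing_name  # hypothetical object key layout
    # generate_presigned_post returns a dict with "url" and "fields"
    post = s3.generate_presigned_post(Bucket=bucket, Key=key, ExpiresIn=expires)
    desired = {"upload": {"url": post["url"], "fields": post["fields"]}}
    payload = json.dumps({"state": {"desired": desired}}).encode("utf-8")
    iot_data.update_thing_shadow(thingName=thing_name, payload=payload)
    return desired


def lambda_handler(event, context):
    import boto3  # only needed when running inside AWS
    return handle_camera_online(
        event,
        boto3.client("s3"),
        boto3.client("iot-data"),
        bucket="my-camera-bucket",  # hypothetical bucket name
        thing_name="MischaTest",    # hard-coded single Thing
    )
```

Passing the clients in as arguments keeps the core function testable with stand-in objects, without touching AWS.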

I went to the AWS Loft to talk to an Amazon architect, a nice free service the company provides. He didn’t seem to know much about IoT, but he spoke with some other engineers there about my issues. He said there didn’t appear to be any way to tell what client sent a message, which kind of defeats the entire point of the extra security features, and he was going to file an internal ticket about that. As far as uploading a file greater than 128k, the above scheme was the best we could come up with.

Regarding the security, I still am completely at a loss as to how one is supposed to manage more than one device client at a time. You’re supposed to create a “device” or a “Thing”, which has a policy and unique certificate and keypair attached to it and its own device shadow state. I assume the keypair and device shadows are supposed to be associated with a single physical device, which means you will need to automate some sort of system that provisions all of this along with a unique ThingName and ClientID for each physical device and then include that in your configuration header and recompile your application. For each device, I guess? There is no mention of exactly how provisioning is supposed to work when you have more than one device, and I kinda get the feeling nobody’s thought that far ahead. Further evidence in support of this theory is that SNS messages or lambdas that are invoked from device messages do not include any sort of authenticated ClientID or ThingName, so there’s no way to know where you are supposed to deliver your response. Right now I just have it hard-coded to my single Thing for testing. I give Amazon 10/10 for the strict certificate and keypair verification, but that’s only one part of a scheme that as far as I can tell has no mechanism for verifying the client’s identity when invoking server-side messages and code.
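As a rough illustration of the "new header per device" chore, here is a sketch that renders the device-specific defines from a template. The define names are the ones in the SDK's sample config header. The `camera-<serial>` naming scheme and the idea of using the same string for ThingName and ClientID are assumptions made for this example, not something AWS prescribes.

```python
HEADER_TEMPLATE = """\
#define AWS_IOT_MQTT_CLIENT_ID         "{client_id}"
#define AWS_IOT_MY_THING_NAME          "{thing_name}"
#define AWS_IOT_CERTIFICATE_FILENAME   "{cert_id}-certificate.pem.crt"
#define AWS_IOT_PRIVATE_KEY_FILENAME   "{cert_id}-private.pem.key"
"""


def render_device_header(serial, cert_id):
    """Render the per-device part of the config header.

    Reusing the serial for both ThingName and ClientID is an assumption
    of this sketch.
    """
    name = "camera-%s" % serial
    return HEADER_TEMPLATE.format(client_id=name, thing_name=name, cert_id=cert_id)
```

A build script could call this once per device before compiling, which is exactly the per-device recompile the paragraph above grumbles about.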

It wasn’t my intention to bag on AWS IoT, but after months of struggling to get essentially nowhere I am rather frustrated. I sincerely hope that it improves in usability and stability because it does have a great deal of powerful functionality and I’d very much like to base my application on it. I’d be willing to help test and report issues as I have in the past, except that I can’t talk to support without going in to the loft in person or paying for a support plan, and the fact that all of my previous efforts at testing and bug reporting have added up to zero fixes or improvements doesn’t really motivate me either.

If I can get this device shadow delta callback to actually work like it’s supposed to I’ll post more updates as I progress. It may be slow going though. The code, such as it is, is here.

 

Diving into IoT development using AWS

I’m more allergic than most people to buzzwords. I cringe big time when companies suddenly start rebranding their products with the word “cloud” or tack on a “2.0”. That said, I realize that the cloud is not just computers in a datacenter and the Internet of Things isn’t all meaningless hype either. There exists a lot of cool new technology, miniaturization, super cheap hardware of all shapes and sizes and power requirements, ever more rapid prototyping and a lot more that adds up to what looks like a new era in embedded system hardware.

People at the embedded linux conference can’t wait to tell you about IoT stuff

But what will drive this hardware? There is a lot of concern about the software that’s going to be running on these internet-connected gadgets because we all just know that the security on most of these things is going to be downright laughable, but now since they’re a part of your car, your baby monitor, your oven, your insulin pump and basically everything, this is gonna be a big problem.

So I’ve embarked on a project to try to build an IoT application properly and securely. I think it’ll be fun, a good learning experience, and even a useful product that I may be able to sell one day. At any rate it’s an interesting technical challenge.

My project is thus: to build a cloud-based IoT (ughhh sorry) IP camera for enterprise surveillance. It will be based on as much open source software as possible, ABRMS-licensed, mobile-first and capable of live streaming without any video transcoding.

I think I know how to do this; I’ve written a great deal of real-time streaming software in the past. I want to offload as much of the hard stuff as possible: let the hardware do all the h.264 encoding and let AWS manage all of the security, message queueing and device state tracking.

At the Dublin gstreamer conference I got to chat up an engineer from Axis, an awesome Swedish company that makes the finest IP cameras money can buy. He informed me that they have a new program called ACAP (Axis Camera Application Platform) which lets you write what are essentially “apps”: software packages that can be uploaded to their cameras. And they’re all running Linux! Sweet!

And recently I also learned of a new IoT service from Amazon AWS. I was dreading the humongo task of writing a whole new database-backed web application and APIs for tracking devices, API keys, device states, authentication, message queueing and all of that nonsense. Well it looks like the fine folks at Amazon already did all the hard work for me!

So I had my first development goal: create a simple AWS-IoT client and get it to run on an Axis camera.

Step one: get access to ACAP

Axis doesn’t really make it very easy to join their development program. None of their API documentation is public. I’m always very wary of companies that feel like they need to keep their interfaces a secret. What are you hiding? What are you afraid of? Seems like a really weird thing to be a control freak about. And it majorly discourages developers from playing around with your platform or knowing about what it can do.

But that is a small trifle compared to joining the program. I filled out a form requesting access to become a developer and was eventually rewarded with a salesbro emailing me that he was busy with meetings for the next week but could hop on a quick call with me to tell me about their program. I informed them that I already wanted to join the program and typed all the relevant words regarding my interest into their form and didn’t need to circle back with someone on a conference call in a few weeks’ time, but they were really insistent that they communicate words via telephone.

After Joe got to give me his spiel on the phone I got approved to join the Axis developer partner program. As for ACAP, they give you an SDK which you can also download as an Ubuntu VirtualBox image. Inside the SDK is a tutorial PDF, several cross-compiler toolchains, some shady Makefile includes, scripts for packaging your app up and some handy precompiled libraries for the various architectures.

Basically the deal is that they give you cross-compilers and an API for accessing bits of the camera’s functionality, things like image capture, event creation, super fancy storage API, built-in HTTP server CGI support, and even video capture (though support told me vidcap is super jankity and I shouldn’t use it). The cross-compilers support Ambarella ARM, ARTPEC (a chip of Axis’s design) and some MIPS thing, these being the architectures used in various Axis products. They come with a few libraries all ready to link, including glib, RAPP (RAster Processing Primitives library) and fixmath. Lastly there’s a script that packages your app up, building a fat package for as many architectures as you want, making distribution super simple. Now all I had to do was figure out how to compile and make use of the IoT libraries with this build system.

Building mbedTLS and aws_iot

AWS has three SDKs for their IoT clients: Arduino Yún, node.js and embedded C linux platforms. The Arduino client does sound cool but that’s probably underpowered for doing realtime HD video, and I’m not really the biggest node.js fan. Linux embedded C development is where it is at, on the realz. This is the sort of thing I want to be doing with my life.

Hells yeah!
Word

All that I needed to do was create a Makefile that builds the aws_iot client library and TLS support with the Axis toolchain bits. Piece of cake right? No, not really.

The IoT AWS service takes security very seriously, which is super awesome and they deserve props for forcing users to do things correctly: use TLS 1.2, include a device certificate and root CA cert with each device and give each device a private key. Wonderful! Maybe there is hope and the IoT future will not be a total ruinfest. The downside to this strict security of course is that it is an ultra pain in the ass to set up.

You are offered your choice of poison: OpenSSL or mbedTLS. I’d never heard of mbedTLS before but it looked like a nice little library that will get the job done that isn’t a giant bloated pain in the ass to build. OpenSSL has a lot of build issues I won’t go into here.

To set up your app you create a key and cert for a device and then load them up in your code:

 connectParams.pRootCALocation = rootCA;
 connectParams.pDeviceCertLocation = clientCRT;
 connectParams.pDevicePrivateKeyLocation = clientKey;

Simple enough. Only problem was that I was utterly confused by what these files were supposed to be. When you set up a certificate in the IoT web UI it gives you a public key, a private key and a certificate PEM. After a lot of dumbness and AWS support chatting we finally determined that rootCA referred to a secret CA file buried deep within the documentation and the public key was just a bonus file that you didn’t need to use. In case anyone else gets confused as fuck by this like I was you can grab the root CA file from here.
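If you are staring at a pile of PEM files and are not sure which is which, the first line of each file says what it holds. Here is a small sketch that reads that header; the function name is made up, and it only looks at the `-----BEGIN ...-----` label, so it cannot tell the root CA from the device certificate (both are certificates).

```python
PEM_KINDS = {
    "CERTIFICATE": "certificate",
    "RSA PRIVATE KEY": "private key",
    "EC PRIVATE KEY": "private key",
    "PRIVATE KEY": "private key",
    "RSA PUBLIC KEY": "public key",
    "PUBLIC KEY": "public key",
}


def pem_kind(text):
    """Return what a PEM blob contains, judging only by its BEGIN line."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("-----BEGIN ") and line.endswith("-----"):
            label = line[len("-----BEGIN "):-len("-----")]
            return PEM_KINDS.get(label, "unknown (%s)" % label)
    return "not PEM"
```

Going by the field names, pRootCALocation and pDeviceCertLocation should both point at files this reports as a certificate, and pDevicePrivateKeyLocation at a private key. Anything it reports as a public key is the bonus file you can ignore.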

The AWS IoT C SDK (amazon web services internet of things C software development kit) comes with a few sample programs by way of documentation. They demonstrate connecting to the message queue and viewing and updating device shadows.

#define AWS_IOT_MQTT_HOST              "B13C0YHADOLYOV.iot.us-west-2.amazonaws.com" ///< Customer specific MQTT HOST. The same will be used for Thing Shadow
#define AWS_IOT_MQTT_PORT              8883 ///< default port for MQTT/S
#define AWS_IOT_MQTT_CLIENT_ID         "MischaTest" ///< MQTT client ID should be unique for every device
#define AWS_IOT_MY_THING_NAME          "MischaTest" ///< Thing Name of the Shadow this device is associated with
#define AWS_IOT_ROOT_CA_FILENAME       "root-ca.pem" ///< Root CA file name
#define AWS_IOT_CERTIFICATE_FILENAME   "1cd9c753bf-certificate.pem.crt" ///< device signed certificate file name
#define AWS_IOT_PRIVATE_KEY_FILENAME   "1cd9c753bf-private.pem.key" ///< Device private key filename

To get it running you edit the config header file, copy your certificates and run make. Then you can run the program and see it connect and do stuff like send messages.

[Screenshot: successful run]

Once you’ve got a connection set up from your application to the IoT API you’re good to go. Kind of. Now that I had a simple C application building with the Axis ACAP SDK and a sample AWS IoT application building on linux, the next step was to combine them into the ultimo baller cloud-based camera software. This was not so easy.

Most of my efforts towards this were spent tweaking the Makefile to pull in the mbedTLS code, aws_iot code and my application code in a setup that would allow cross-compiling and some semblance of incremental building. I had to up my Make game considerably but in the end I was victorious. You can see the full Makefile in all its glory here.

The gist of it is that it performs the following steps:

  • loads ACAP make definitions (include $(AXIS_TOP_DIR)/tools/build/rules/common.mak)
  • sets logging level (LOG_FLAGS)
  • grabs all the source and include files/dirs for mbedTLS and aws_iot
  • defines a static library target for all of the aws_iot/mbedTLS code
  • produces the executable

The advantage of creating aws-iot.a is that I can quickly build changes to my application source without having to re-link dozens of files.

I combined the Axis logging macros and the aws_iot style logging into one syslog-based system so that I can see the full output when the app is running on the device.

Uploading to the Camera

Once I had an ACAP application building I was finally able to try deploying it to a real camera (via a make target, of course).

Getting the app running on the camera and outputting useful logging took quite a bit of effort. I really ran into a brick wall with certificate verification however. My first problem was getting the certs into the package, which was just a simple config change. But then it began failing. Eventually I realized it was because the clock on the camera was not set correctly. Realizing the importance of a proper config, including NTP, I wrote a script to configure a new camera via the REST API. I wanted it to be as simple as possible to run so I wrote it without requiring any third party libraries. It also shares the package uploader config for the camera IP and password so if you’ve already entered it you don’t need to again.
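To see why a wrong clock breaks TLS, remember that every certificate carries a validity window and the verifier compares it against the local time. A minimal sketch using the standard library's `ssl.cert_time_to_seconds`; the function name and the dates below are made up for illustration and are not taken from any real certificate.

```python
import ssl


def clock_within_validity(now, not_before, not_after):
    """True if the epoch time `now` falls inside the validity window,
    with both ends given in the "Mon DD HH:MM:SS YYYY GMT" form."""
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end


# Illustrative window only
NOT_BEFORE = "Mar 10 00:00:00 2016 GMT"
NOT_AFTER = "Dec 31 23:59:59 2049 GMT"

# A device that booted with its clock at the epoch (1970) is outside the window.
print(clock_within_validity(0, NOT_BEFORE, NOT_AFTER))
```

A clock that is far off in either direction fails the same check, which is why getting NTP configured has to come before any certificate debugging.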

With NTP configured at least there are no more certificate expired errors. I’m able to connect just fine on normal x86 linux, but it fails to verify the certs when running on the camera. After I asked support, they suggested recompiling mbedTLS with -O0 (disable optimizations) when building on ARM. After doing so, it connects and works!


🌭🍕🍔 !!!!! Success!

To summarize: at this point we now have an embedded ARM camera device that is able to connect and communicate with the AWS IoT API securely. We can send and receive messages and device shadow states.

So what’s next? Now we need a service for the camera to talk to.

 

AWS Orchestration / Unemployed DevOps Professionals

Now that we live in the age of ~the cloud~ it strikes me that for many software projects the traditional role of system administrator, and its more recent rebranding as “DevOps”, is not strictly required.

I can’t speak much to other cloud hosting platforms but Amazon Web Services really is the flyest shit. All of Amazon’s internal infrastructure has been built with APIs to control everything for many years by decree of Jeff Bezos, the CEO. This was a brilliant requirement that he mandated because it allowed Amazon to become one of the first companies able to re-sell its spare server capacity and make its automated platform services available to everyday regular developers such as myself. They’ve been in the business of providing service-oriented infrastructure to anyone with a credit card longer than almost anyone, and their platform is basically unmatched. Whereas before setting up a fancy HTTPS load balancer or highly available database cluster with automated backups was a time-consuming process, now a few clicks or API calls will set you up with nearly any infrastructure your application requires, with as much reliability and horsepower as you’re willing to pay for.

I’ve heard some people try to argue that AWS is expensive. This is true if you use it like a traditional datacenter, which it’s not. If you try running all your own services on EC2 and pay an army of expensive “DevOps” workers to waste time playing with Puppet or Chef or some other nonsense then perhaps it’s a bit costly. Though compared with power, bandwidth, datacenter, sysadmin and hardware costs and maintenance overhead of running on your own metal I still doubt AWS is going to run you any more. In all likelihood your application really isn’t all that special. You probably have a webserver, a database, store some files somewhere, maybe a little memcached cluster and load balancer or two. All this can be had for cheap in AWS and any developer could set up a highly available production-ready cluster in a few hours.

Us software engineers, we write applications. These days a lot of them run on the internet and you wanna put them somewhere on some computers connected to the internet. Back in “the day” you might have put them on some servers in a datacenter (or your parents’ basement). Things are a little different today.

Some time ago I moved my hosting from a traditional datacenter to AWS. I really didn’t know a lot about it so I asked the advice of some smart and very experienced nerds. I thought it would be pretty much the same as what I was doing but using elastic compute instances instead of bare metal. They all told me “AWS is NOT a datacenter in the cloud. Stop thinking like that.”

For example, you could spin up some database server instances to run MySQL or PostgreSQL OR you could just have AWS do it for you. You could set up HAproxy and get really expensive load balancers, or simply use an elastic load balancer. Could run a mail server if you’re into that, but I prefer SES. Memcached? Provided by ElastiCache. Thinking of setting up nagios and munin? CloudWatch is already integrated into everything.

Point being: all the infrastructure you need is provided by Amazon and you don’t need to pay DevOps jokers to set it up for you. AWS engineers have already done all the work. Don’t let smooth-talking Cloud Consultants talk you into any sort of configuration management time-wasters like Puppet. Those tools impose extra overhead to make your systems declaratively configured rather than imperatively because they are designed for people who maintain systems. In EC2-land you can and should be able to kill off any instance at any time and a new one will pop up in its place, assuming you’re using autoscaling groups. You are using ASgroups, right? You will be soon!

When you can re-provision any instance at will, there is no longer any need to maintain and upgrade configuration. Just make a new instance and terminate the old ones. Provision your systems using bash. Or RPMs if you want to get really fancy. You really don’t need anything else.

I’m a fan of Amazon Linux, which is basically just CentOS. I use a nifty yum plugin that lets me store RPMs in the Simple Storage Service (S3) and have instances authenticate via IAM instance roles. This is a supremely delightful way of managing dependencies and provisioning instances.

The last piece of the puzzle is orchestration; once you have all of your infrastructure in place you still need to perform tasks to maintain it. Updating your launch configurations and autoscaling groups, deploying code to your servers, terminating and rebuilding clusters, declaring packages to be installed on EC2 with cloud-init and so on. You could do all of this by hand maybe or script it, except that you don’t have to because I already did it for you!

To be totally honest, my AWS setup is pretty freaking sweet. The reason it is freaking sweet is because I listened to grumpy old AWS wizards and took notes and built their recommendations into a piece of software I call Udo – short for Unemployed DevOps.

Udo is a pretty straightforward application. It essentially provides a configuration-driven command-line interface to Boto, which is the python library for interfacing with the AWS APIs. It is mostly centered around autoscaling groups, which are a very powerful tool not only for performing scaling tasks but also for logically grouping your instances. In your configuration file you can define multiple clusters to group your ASgroups, and then define “roles” within your clusters. I use this system to create clusters for development, QA, staging and production, and then in each cluster I have “webapp” roles and “worker” roles, to designate instances which should handle web requests vs. asynchronous job queue workers. You can of course structure your setup however you want though.
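To make the cluster/role idea concrete, here is a sketch of how a configuration might map to autoscaling group names. The dictionary layout and the `cluster-role` naming are assumptions for illustration only and are not Udo's actual udo.yml schema.

```python
CONFIG = {  # hypothetical layout, not Udo's real config format
    "clusters": {
        "stage": {"roles": ["webapp", "worker"]},
        "prod": {"roles": ["webapp", "worker"]},
    }
}


def asgroup_names(config):
    """List one autoscaling group name per cluster/role pair."""
    return [
        "%s-%s" % (cluster, role)
        for cluster, spec in sorted(config["clusters"].items())
        for role in spec["roles"]
    ]
```

With a predictable name per pair, every other tool can find its instances from nothing more than a cluster name and a role name.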

Using Udo is as simple as it gets. It’s a python module you can install like any other (sudo easy_install udo). Once it’s installed you create a configuration file for your application’s setup and a Boto credentials file if you don’t already have one. Then you can take it for a spin.

The cluster/role management feature is central to the design. It makes it so you never have to keep track of individual instances or keep track of IP addresses or run any sort of agents on your instances. Finding all of your stage webapp server IPs for example is as easy as looking up the instances in the stage-webapp autoscaling group. You can easily write tools to automate tasks with this information. We have a script that allows you to send commands to an autoscaling group via SSH, which works by reading the external IPs of the instances in the group. This is so useful we plan on adding it to Udo sometime in the near future, but it’s an example of the sort of automation that would normally require fancy tools and daemons or keeping track of IPs in some database somewhere, but is totally simplified by making use of the tools which Amazon already provides you.
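The lookup described above can be sketched with two boto3 calls: `describe_auto_scaling_groups` on the `autoscaling` client and `describe_instances` on the `ec2` client. This is an illustration rather than the script mentioned above; the function name is made up, and pagination is ignored for brevity.

```python
def asgroup_public_ips(name, autoscaling, ec2):
    """Return the public IPs of the instances in one autoscaling group."""
    resp = autoscaling.describe_auto_scaling_groups(AutoScalingGroupNames=[name])
    ids = [
        inst["InstanceId"]
        for group in resp["AutoScalingGroups"]
        for inst in group["Instances"]
    ]
    if not ids:
        return []
    ips = []
    for reservation in ec2.describe_instances(InstanceIds=ids)["Reservations"]:
        for inst in reservation["Instances"]:
            # Instances without a public address simply lack this key.
            if "PublicIpAddress" in inst:
                ips.append(inst["PublicIpAddress"])
    return ips
```

In real use you would pass `boto3.client("autoscaling")` and `boto3.client("ec2")`, then feed the resulting list to whatever runs your SSH commands.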

Udo has a few nifty features on offer. One handy command is “updatelc” – update launchconfiguration. Normally you cannot modify a launch configuration attached to an autoscaling group, so Udo will instead create a copy of your existing launchconfig and then replace the existing launchconfig on your asgroup, allowing you to apply udo.yml configuration changes without terminating your asgroup. Very handy for not having to bring down production to make changes.
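The copy-and-swap trick can be sketched against the boto3 `autoscaling` client using `describe_launch_configurations`, `create_launch_configuration` and `update_auto_scaling_group`. This is not Udo's code, just an illustration of the idea. The function name is made up, and a real copy may need extra care with some fields (for example how UserData is encoded).

```python
READ_ONLY = ("LaunchConfigurationARN", "CreatedTime")


def copy_launch_config(autoscaling, asgroup, old_name, new_name, overrides):
    """Clone a launch configuration with changes and point the group at it."""
    resp = autoscaling.describe_launch_configurations(
        LaunchConfigurationNames=[old_name])
    params = dict(resp["LaunchConfigurations"][0])
    # Fields that describe returns but create does not accept.
    for key in READ_ONLY:
        params.pop(key, None)
    # Drop empty values rather than sending them back (an assumption of
    # this sketch: empty strings and lists carry no information here).
    params = {k: v for k, v in params.items() if v not in ("", [], None)}
    params.update(overrides)
    params["LaunchConfigurationName"] = new_name
    autoscaling.create_launch_configuration(**params)
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asgroup, LaunchConfigurationName=new_name)
    return params
```

Running instances keep going with their old settings; only instances launched after the swap pick up the new configuration.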

Another powerful feature is tight integration with CodeDeploy, a recent addition to the AWS ops tools suite. As far as I’m aware Udo is the first and only application to support CodeDeploy at this time and I actually have an epic support ticket open with a sizable pile of feature requests and bug reports. Despite its rather alpha level of quality it is extremely handy and we are already using it in production. It allows AWS to deploy a revision straight from GitHub or S3 to all instances in an autoscaling group or with a particular tag, all without any intervention on your part other than issuing an API call to create a deployment. You can add some hooks to be run at various stages of the deployment for tasks like making sure all your dependencies are installed or restarting your app. I’d honestly say it’s probably the final nail in the coffin for the DevOps industry.
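For reference, the API call that creates a deployment looks roughly like this with the boto3 `codedeploy` client and its `create_deployment` method. The helper names, application name and group name are hypothetical; this is a sketch of the call, not how Udo wraps it.

```python
def github_revision(repository, commit_id):
    """Build the revision description for a commit hosted on GitHub."""
    return {
        "revisionType": "GitHub",
        "gitHubLocation": {"repository": repository, "commitId": commit_id},
    }


def deploy(codedeploy, application, group, revision):
    """Create a deployment and return its ID."""
    resp = codedeploy.create_deployment(
        applicationName=application,
        deploymentGroupName=group,
        revision=revision,
    )
    return resp["deploymentId"]
```

Usage would be something like `deploy(boto3.client("codedeploy"), "myapp", "stage-webapp", github_revision("me/myapp", "<commit sha>"))`, after which CodeDeploy takes over and runs your hooks on each instance.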