Developing a cloud-based IoT service

In my previous post I describe my adventures in building an AWS IoT-enabled application for a proprietary embedded linux system and getting it to run. The next step in our journey is to create a service that communicates with our device and controls it in a useful way.

What can we do with a system running with the aws_iot library? We can use the MQTT message bus to subscribe to channels and publish messages, and we can diff the current device state against the desired device state shadow stored on the server. Now we need the service side of the puzzle.

My sample IoT application is to be able to view images on an IP camera from anywhere on the internet. I’m planning to incorporate live HD video streaming as well but that is a whole other can of worms we don’t need to open for this demonstration. My more modest goal for now will be to create a service where I can request a snapshot from the camera be uploaded to AWS’s Simple Storage Service (S3) which can store files and serve them up to authenticated users. In addition I will attempt to build the application server logic around AWS Lambda, a service for running code in response to events without actually having to deploy a server or run a daemon of any sort. If I can manage this then I will have a truly cloud-based service; one that does not consume any more resources than are required to perform its job and with no need to pre-allocate any servers or storage. It will be running entirely on Amazon’s infrastructure with only small bits of configuration, policy and code inserted in the right places to perform the relatively simple tasks required of my app. This is the Unemployed DevOps lifestyle, the dream of perfect lazy scalability and massive offloading of effort and operations to Amazon. There is of course a large downside to this setup, namely that I am at the mercy of Amazon. If they are missing a feature I need then I’m pretty much screwed and if their documentation is poor then I will suffer enormously. A partial description of my suffering and screwed state continues below.

I’ve been bitten before by my foolish impetuousness in attempting to use new AWS services that have clearly not been fully fleshed out. I was an early adopter of the CodeDeploy system, a super useful and nifty system for deploying changes to your application on EC2 instances from S3 or even straight from GitHub. Unfortunately it turned out to not really be finished or tested or documented and I ended up wasting a ton of time trying to make it work and deal with corner cases. It’s a dope service but it’s really painfully clear nobody at AWS has ever bothered to actually try using it for a real application, and all of my feature requests and bug reports and in-person sessions with AWS architects have all resulted in exactly zero improvements despite my hours of free QA I performed for them. As a result I am now more cautious when using new AWS services, such as IoT and Lambda.

In truth attempting to make use of the IoT services and client library has been one of the most frustrating and difficult uphill battles I’ve ever waged against a computer. The documentation is woefully incomplete, I’ve wasted tons of time guessing at what various parameters should be, most features don’t really behave as one would expect and the entire system is just super buggy and non-deterministic. Sometimes when I connect it just fails. Or when subscribing to MQTT topics.

Usually this doesn't happen. But sometimes it does!
Usually this doesn’t happen. But sometimes it does!

Why does it disconnect me every few seconds? I don’t know. I enabled autoReconnect (which is a function pointer on a struct unlike every other function) so it does reconnect at least, except when it just fails for no apparent reason.

setAutoReconnectStatus is only mentioned as being a typedef in the MQTT client documentation. One would assume you should call the function aws_iot_mqtt_autoreconnect_set_status(), but the sample code does indeed call the struct’s function pointer instead. No other part of the library uses this fakeo method call style.

On the boto3 (python AWS clienet library) side things are not really any better. The device shadow support (called IoT Dataplane) documentation is beyond unhelpful at least as of this writing. If you want to update a device state dictionary (its “shadow”) in python, say, in a lambda, you call the following method:

Usually when you want to specify a dictionary-type object as a param in python it’s customary to pass it around as a dict. It’s pretty unusual for an API that is expecting a dictionary data structure to expect you to already have encoded it as JSON, but whatever. What is really missing in this documentation is the precise structure of the update payload JSON string you’re supposed to pass in. You’re supposed to pass in the desired new state in the format {“state”: { “desired”: { … } } }:

My dumb lambda

If you hunt around from the documentation pages referenced by the update_thing_shadow() documentation you may uncover the correct incantation, though not on the page it links to. It would really save a lot of time if they just mentioned the desired format.

I really definitely have no reason why it wants a seekable object for the payload since it’s not like you can really send files around. I actually first attempted to send an image over the IoT message bus with no luck, until I realized that the biggest message that can ever be sent over it is 128k. This application would be infinitely simpler if I could transmit the image snapshot over my existing message bus but that would be too easy. I am fairly certain my embedded linux system can handle buffering many megabytes of data and my network is pretty solid, it’s really a shame that AWS is so resource-constrained!

The reason I am attempting to use the device shadow to communicate is that my current scheme for getting an image from the device into AWS in lieu of the message bus is:

  • The camera sends a MQTT message that indicates it is online
  • When the message is received, a DevicePolicy matches the MQTT topic and invokes a lambda
  • The lambda generates a presigned S3 request that will allow the client to upload a file to an S3 bucket
  • The lambda updates the device shadow with the request params
  • A device shadow delta callback on the camera is triggered (maybe once, maybe twice, maybe not at all, by my testing)
  • Callback receives the S3 request parameters and uploads the file via libcurl to S3
  • Can now display thumbnail to a web client from S3

I went to the AWS Loft to talk to an Amazon architect, a nice free service the company provides. He didn’t seem to know much about IoT, but he spoke with some other engineers there about my issues. He said there didn’t appear to be any way to tell what client sent a message, which kind of defeats the entire point of the extra security features, and he was going to file an internal ticket about that. As far as uploading a file greater than 128k, the above scheme was the best we could come up with.

Regarding the security, I still am completely at a loss as to how one is supposed to manage more than one device client at a time. You’re supposed to create a “device” or a “Thing”, which has a policy and unique certificate and keypair attached to it and its own device shadow state. I assume the keypair and device shadows are supposed to be associated with a single physical device, which means you will need to automate some sort of system that provisions all of this along with a unique ThingName and ClientID for each physical device and then include that in your configuration header and recompile your application. For each device, I guess? There is no mention of what exactly how provisioning is supposed to work when you have more than one device, and I kinda get the feeling nobody’s thought that far ahead. Further evidence in support of this theory is that SNS messages or lambdas that are invoked from device messages do not include any sort of authenticated ClientID or ThingName, so there’s no way to know where you are supposed to deliver your response. Right now I just have it hard-coded to my single Thing for testing. I give Amazon 10/10 for the strict certificate and keypair verification, but that’s only one part of a scheme that as far as I can tell has no mechanism for verifying the client’s identity when invoking server-side messages and code.

It wasn’t my intention to bag on AWS IoT, but after months of struggling to get essentially nowhere I am rather frustrated. I sincerely hope that it improves in usableness and stability because it does have a great deal of powerful functionality and I’d very much like to base my application on it. I’d be willing to help test and report issues as I have in the past, except that I can’t talk to support without going in to the loft in person or paying for a support plan, and the fact that all of my previous efforts at testing and bug reporting have added up to zero fixes or improvements doesn’t really motivate me either.

If I can get this device shadow delta callback to actually work like it’s supposed to I’ll post more updates as I progress. It may be slow going though. The code, such as it is, is here.


Diving into IoT development using AWS

I’m more allergic than most people to buzzwords. I cringe big time when companies suddenly start rebranding their products with the word “cloud” or tack on a “2.0”. That said, I realize that the cloud is not just computers in a datacenter and the Internet of Things isn’t all meaningless hype either. There exists a lot of cool new technology, miniaturization, super cheap hardware of all shapes and sizes and power requirements, ever more rapid prototyping and lot more that adds up to what looks like a new era in embedded system hardware.

People at the embedded linux conference can't wait to tell you about IoT stuff
People at the embedded linux conference can’t wait to tell you about IoT stuff

But what will drive this hardware? There is a lot of concern about the software that’s going to be running on these internet-connected gadgets because we all just know that the security on most of these things is going to be downright laughable, but now since they’re a part of your car, your baby monitor, your oven, your insulin pump and basically everything, this is gonna be a big problem.

So I’ve embarked on a project to try to build an IoT application properly and securely. I think it’ll be fun, a good learning experience, and even a useful product that I may be able to sell one day. At any rate it’s an interesting technical challenge.

My project is thus: to build a cloud-based IoT (ughhh sorry) IP camera for enterprise surveillance. It will be based on as much open source software as possible, ABRMS-licensed, mobile-first and capable of live streaming without any video transcoding.

I think I know how to do this, I’ve written a great deal of real-time streaming software in the past. I want to offload as much as the hard stuff as possible; let the hardware do all the h.264 encoding and let AWS manage all of the security, message queueing and device state tracking.

At the Dublin gstreamer conference I got to chat up an engineer from Axis, an awesome Swedish company that makes the finest IP cameras money can buy. He informed me that they have a new program called ACAP (Axis Camera Application Platform) which essentially lets you write what are essentially “apps” that are software packages that can be uploaded to their cameras. And they’re all running Linux! Sweet!

And recently I also learned of a new IoT service from Amazon AWS. I was dreading the humongo task of writing a whole new database-backed web application and APIs for tracking devices, API keys, device states, authentication, message queueing and all of that nonsense. Well it looks like the fine folks at Amazon already did all the hard work for me!

So I had my first development goal: create a simple AWS-IoT client and get it to run on an Axis camera.

Step one: get access to ACAP

Axis doesn’t really make it very easy to join their development program. None of their API documentation is public. I’m always very wary of companies that feel like they need to keep their interfaces a secret. What are you hiding? What are you afraid of? Seems like a really weird thing to be a control freak about. And it majorly discourages developers from playing around with your platform or knowing about what it can do.

But that is a small trifle compared to joining the program. I filled out a form requesting access to become a developer and was eventually rewarded with a salesbro emailing me that he was busy with meetings for the next week but could hop on a quick call with me to tell me about their program. I informed them that I already wanted to join the program and typed all the relevant words regarding my interest into their form and didn’t need to circle back with someone on a conference call in a few weeks’ time, but they were really insistent that they communicate words via telephone.

After Joe got to give me his spiel on the phone I got approved to join the Axis developer partner program. As far as ACAP they give you a SDK which you can also download as an Ubuntu VirtualBox image. Inside the SDK is a tutorial PDF, several cross-compiler toolchains, some shady Makefile includes, scripts for packaging your app up and some handy precompiled libraries for the various architectures.

Basically the deal is that they give you cross-compilers and an API for accessing bits of the camera’s functionality, things like image capture, event creation, super fancy storage API, built-in HTTP server CGI support, and even video capture (though support told me vidcap super jankity and I shouldn’t use it). The cross-compilers support Ambarella ARM, ARTPEC (a chip of Axis’s design) and some MIPS thing, these being the architectures used in various Axis products. They come with a few libraries all ready to link, including glib, RAPP (RAster Processing Primitives library) and fixmath. Lastly there’s a script that packages your app up, building a fat package for as many architectures as you want, making distribution super simple. Now all I had to do was figure out how to compile and make use of the IoT libraries with this build system.

Building mbedTLS and aws_iot

AWS has three SDKs for their IoT clients: Arduino Yún, node.js and embedded C linux platforms. The Arduino client does sound cool but that’s probably underpowered for doing realtime HD video, and I’m not really the biggest node.js fan. Linux embedded C development is where it is at, on the realz. This is the sort of thing I want to be doing with my life.

Hells yeah!

All that I needed to do was create a Makefile that builds the aws_iot client library and TLS support with the Axis toolchain bits. Piece of cake right? No, not really.

The IoT AWS service takes security very seriously, which is super awesome and they deserve props for forcing users to do things correctly: use TLS 1.2, include a server certificate and root CA cert with each device and give each device a private key. Wonderful! Maybe there is hope and the IoT future will not be a total ruinfest. The downside to this strict security of course is that it is an ultra pain in the ass to set up.

You are offered your choice of poison: OpenSSL or mbedTLS. I’d never heard of mbedTLS before but it looked like a nice little library that will get the job done that isn’t a giant bloated pain in the ass to build. OpenSSL has a lot of build issues I won’t go into here.

To set up your app you create a key and cert for a device and then load them up in your code:

 connectParams.pRootCALocation = rootCA;
 connectParams.pDeviceCertLocation = clientCRT;
 connectParams.pDevicePrivateKeyLocation = clientKey;

Simple enough. Only problem was that I was utterly confused by what these files were supposed to be. When you set up a certificate in the IoT web UI it gives you a public key, a private key and a certificate PEM. After a lot of dumbness and AWS support chatting we finally determined that rootCA referred to a secret CA file buried deep within the documentation and the public key was just a bonus file that you didn’t need to use. In case anyone else gets confused as fuck by this like I was you can grab the root CA file from here.

The AWS IoT C SDK (amazon web services internet of things C software development kit) comes with a few sample programs by way of documentation. They demonstrate connecting to the message queue and viewing and updating device shadows.

#define AWS_IOT_MQTT_HOST              "" ///< Customer specific MQTT HOST. The same will be used for Thing Shadow                                                                                                                       
#define AWS_IOT_MQTT_PORT              8883 ///< default port for MQTT/S                                                                  
#define AWS_IOT_MQTT_CLIENT_ID         "MischaTest" ///< MQTT client ID should be unique for every device                                 
#define AWS_IOT_MY_THING_NAME          "MischaTest" ///< Thing Name of the Shadow this device is associated with                          
#define AWS_IOT_ROOT_CA_FILENAME       "root-ca.pem" ///< Root CA file name                                                               
#define AWS_IOT_CERTIFICATE_FILENAME   "1cd9c753bf-certificate.pem.crt" ///< device signed certificate file name                          
#define AWS_IOT_PRIVATE_KEY_FILENAME   "1cd9c753bf-private.pem.key" ///< Device private key filename                                      

To get it running you edit the config header file, copy your certificates and run make. Then you can run the program and see it connect and do stuff like send messages.


Once you’ve got a connection set up from your application to the IoT API you’re good to go. Kind of. Now that I had a simple C application building with the Axis ACAP SDK and a sample AWS IoT application building on linux, the next step was to combine them into the ultimo baller cloud-based camera software. This was not so easy.

Most of my efforts towards this were spent tweaking the Makefile to pull in the mbedTLS code, aws_iot code and my application code in a setup that would allow cross-compiling and some semblance of incremental building. I had to up my Make game considerably but in the end I was victorious. You can see the full Makefile in all its glory here.

The gist of it is that it performs the following steps:

  • loads ACAP make definitions (include $(AXIS_TOP_DIR)/tools/build/rules/common.mak)
  • sets logging level (LOG_FLAGS)
  • grab all the source and include files/dirs for mbedTLS and aws_iot
  • define a static library target for all of the aws_iot/mbedTLS code – Screen Shot 2016-03-20 at 2.19.16 PM
  • produce executable:
    Screen Shot 2016-03-20 at 8.39.58 PM.png

The advantage of creating aws-iot.a is that I can quickly build changes to my application source without having to re-link dozens of files.

I combined the Axis logging macros and the aws_iot style logging into one syslog-based system so that I can see the full output when the app is running on the device.

Uploading to the Camera

Once I finally had an ACAP application building I was finally able to try deploying it to a real camera (via make target of course):

Screen Shot 2016-03-20 at 2.18.58 PM

Screen Shot 2016-03-20 at 2.14.21 PM

Getting the app running on the camera and outputting useful logging took quite a bit of effort. I really ran into a brick wall with certificate verification however. My first problem was getting the certs into the package, which was just a simple config change. But then it began failing. Eventually I realized it was because the clock on the camera was not set correctly. Realizing the importance of a proper config, including NTP, I wrote a script to configure a new camera via the REST API. I wanted it to be as simple as possible to run so I wrote it without requiring any third party libraries. It also shares the package uploader config for the camera IP and password so if you’ve already entered it you don’t need to again.

With NTP configured at least there are no more certificate expired errors. I’m able to connect just fine on normal x86 linux, but fails to verify the certs when running on the camera. After asking support, they suggest recompiling mbedTLS with -O0 (disable optimizations) when building on ARM. After doing so, it connects and works!

Screen Shot 2016-03-20 at 2.14.51 PM

🌭🍕🍔 !!!!! Success!

To summarize; at this point we now have an embedded ARM camera device that is able to connect and communicate with the AWS IoT API securely. We can send and receive messages and device shadow states.

So what’s next? Now we need a service for the camera to talk to.