Developing a cloud-based IoT service

In my previous post I describe my adventures in building an AWS IoT-enabled application for a proprietary embedded linux system and getting it to run. The next step in our journey is to create a service that communicates with our device and controls it in a useful way.

What can we do with a system running with the aws_iot library? We can use the MQTT message bus to subscribe to channels and publish messages, and we can diff the current device state against the desired device state shadow stored on the server. Now we need the service side of the puzzle.

My sample IoT application is to be able to view images on an IP camera from anywhere on the internet. I’m planning to incorporate live HD video streaming as well but that is a whole other can of worms we don’t need to open for this demonstration. My more modest goal for now will be to create a service where I can request a snapshot from the camera be uploaded to AWS’s Simple Storage Service (S3) which can store files and serve them up to authenticated users. In addition I will attempt to build the application server logic around AWS Lambda, a service for running code in response to events without actually having to deploy a server or run a daemon of any sort. If I can manage this then I will have a truly cloud-based service; one that does not consume any more resources than are required to perform its job and with no need to pre-allocate any servers or storage. It will be running entirely on Amazon’s infrastructure with only small bits of configuration, policy and code inserted in the right places to perform the relatively simple tasks required of my app. This is the Unemployed DevOps lifestyle, the dream of perfect lazy scalability and massive offloading of effort and operations to Amazon. There is of course a large downside to this setup, namely that I am at the mercy of Amazon. If they are missing a feature I need then I’m pretty much screwed and if their documentation is poor then I will suffer enormously. A partial description of my suffering and screwed state continues below.

I’ve been bitten before by my foolish impetuousness in attempting to use new AWS services that have clearly not been fully fleshed out. I was an early adopter of the CodeDeploy system, a super useful and nifty system for deploying changes to your application on EC2 instances from S3 or even straight from GitHub. Unfortunately it turned out to not really be finished or tested or documented and I ended up wasting a ton of time trying to make it work and deal with corner cases. It’s a dope service but it’s really painfully clear nobody at AWS has ever bothered to actually try using it for a real application, and all of my feature requests and bug reports and in-person sessions with AWS architects have all resulted in exactly zero improvements despite my hours of free QA I performed for them. As a result I am now more cautious when using new AWS services, such as IoT and Lambda.

In truth attempting to make use of the IoT services and client library has been one of the most frustrating and difficult uphill battles I’ve ever waged against a computer. The documentation is woefully incomplete, I’ve wasted tons of time guessing at what various parameters should be, most features don’t really behave as one would expect and the entire system is just super buggy and non-deterministic. Sometimes when I connect it just fails. Or when subscribing to MQTT topics.

Usually this doesn't happen. But sometimes it does!
Usually this doesn’t happen. But sometimes it does!

Why does it disconnect me every few seconds? I don’t know. I enabled autoReconnect (which is a function pointer on a struct unlike every other function) so it does reconnect at least, except when it just fails for no apparent reason.

setAutoReconnectStatus is only mentioned as being a typedef in the MQTT client documentation. One would assume you should call the function aws_iot_mqtt_autoreconnect_set_status(), but the sample code does indeed call the struct’s function pointer instead. No other part of the library uses this fakeo method call style.

On the boto3 (python AWS clienet library) side things are not really any better. The device shadow support (called IoT Dataplane) documentation is beyond unhelpful at least as of this writing. If you want to update a device state dictionary (its “shadow”) in python, say, in a lambda, you call the following method:

Usually when you want to specify a dictionary-type object as a param in python it’s customary to pass it around as a dict. It’s pretty unusual for an API that is expecting a dictionary data structure to expect you to already have encoded it as JSON, but whatever. What is really missing in this documentation is the precise structure of the update payload JSON string you’re supposed to pass in. You’re supposed to pass in the desired new state in the format {“state”: { “desired”: { … } } }:

My dumb lambda

If you hunt around from the documentation pages referenced by the update_thing_shadow() documentation you may uncover the correct incantation, though not on the page it links to. It would really save a lot of time if they just mentioned the desired format.

I really definitely have no reason why it wants a seekable object for the payload since it’s not like you can really send files around. I actually first attempted to send an image over the IoT message bus with no luck, until I realized that the biggest message that can ever be sent over it is 128k. This application would be infinitely simpler if I could transmit the image snapshot over my existing message bus but that would be too easy. I am fairly certain my embedded linux system can handle buffering many megabytes of data and my network is pretty solid, it’s really a shame that AWS is so resource-constrained!

The reason I am attempting to use the device shadow to communicate is that my current scheme for getting an image from the device into AWS in lieu of the message bus is:

  • The camera sends a MQTT message that indicates it is online
  • When the message is received, a DevicePolicy matches the MQTT topic and invokes a lambda
  • The lambda generates a presigned S3 request that will allow the client to upload a file to an S3 bucket
  • The lambda updates the device shadow with the request params
  • A device shadow delta callback on the camera is triggered (maybe once, maybe twice, maybe not at all, by my testing)
  • Callback receives the S3 request parameters and uploads the file via libcurl to S3
  • Can now display thumbnail to a web client from S3

I went to the AWS Loft to talk to an Amazon architect, a nice free service the company provides. He didn’t seem to know much about IoT, but he spoke with some other engineers there about my issues. He said there didn’t appear to be any way to tell what client sent a message, which kind of defeats the entire point of the extra security features, and he was going to file an internal ticket about that. As far as uploading a file greater than 128k, the above scheme was the best we could come up with.

Regarding the security, I still am completely at a loss as to how one is supposed to manage more than one device client at a time. You’re supposed to create a “device” or a “Thing”, which has a policy and unique certificate and keypair attached to it and its own device shadow state. I assume the keypair and device shadows are supposed to be associated with a single physical device, which means you will need to automate some sort of system that provisions all of this along with a unique ThingName and ClientID for each physical device and then include that in your configuration header and recompile your application. For each device, I guess? There is no mention of what exactly how provisioning is supposed to work when you have more than one device, and I kinda get the feeling nobody’s thought that far ahead. Further evidence in support of this theory is that SNS messages or lambdas that are invoked from device messages do not include any sort of authenticated ClientID or ThingName, so there’s no way to know where you are supposed to deliver your response. Right now I just have it hard-coded to my single Thing for testing. I give Amazon 10/10 for the strict certificate and keypair verification, but that’s only one part of a scheme that as far as I can tell has no mechanism for verifying the client’s identity when invoking server-side messages and code.

It wasn’t my intention to bag on AWS IoT, but after months of struggling to get essentially nowhere I am rather frustrated. I sincerely hope that it improves in usableness and stability because it does have a great deal of powerful functionality and I’d very much like to base my application on it. I’d be willing to help test and report issues as I have in the past, except that I can’t talk to support without going in to the loft in person or paying for a support plan, and the fact that all of my previous efforts at testing and bug reporting have added up to zero fixes or improvements doesn’t really motivate me either.

If I can get this device shadow delta callback to actually work like it’s supposed to I’ll post more updates as I progress. It may be slow going though. The code, such as it is, is here.

 

2 thoughts on “Developing a cloud-based IoT service

Leave a Reply