Introduction: AWS and IBM: an IoT Services Comparison
Today we compare two stacks for developing IoT applications, looking at them from the point of view of the different services they offer.
Step 1: Functions As a Service
FaaS is a category of cloud services used to build a “serverless” architecture.
FaaS allows the customers to develop, run, and manage application functionalities without building and maintaining the infrastructure.
Amazon offers AWS Lambda, and IBM offers IBM Cloud Functions. The two services are quite similar, although Lambda was the first of its kind. With FaaS you can run pieces of code in the cloud, and each service supports a different set of programming languages.
IBM supports more languages, and with Docker it is easy to run scripts written in other languages. The same can be done with Lambda, but it is less immediate. You can read an example here: https://aws.amazon.com/blogs/opensource/rust-runt...
Both services have usage limits; we report them in a table and highlight the best values.
The price is based on gigabyte-seconds (GB-s) of RAM, plus a charge on the number of requests in the case of AWS Lambda.
Each service has a free plan, and the two plans are almost equivalent. As you can see, Lambda is a bit cheaper per GB-s, but it adds a per-request cost that Cloud Functions doesn’t have, so the overall cost is almost the same. Of course, if you need to run tasks that eat a lot of memory and make few requests, you should use Lambda. The main advantage of IBM Cloud Functions, in our opinion, is that its stack is open source: it is completely based on Apache OpenWhisk and can also be deployed on a private infrastructure.
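To make the trade-off between the GB-s rate and the per-request fee concrete, here is a minimal sketch of the cost formula. The class, the function and above all the rates are our own placeholders, not the providers' real price lists, and the free tier is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaasPricing:
    """Hypothetical price list; the numbers used below are placeholders."""
    price_per_gb_second: float                # 1 GB of RAM held for 1 second
    price_per_million_requests: float = 0.0   # 0.0 models a provider with no request fee


def monthly_cost(pricing, invocations, avg_duration_s, memory_gb):
    """Return the monthly bill for one function, ignoring any free tier."""
    gb_seconds = invocations * avg_duration_s * memory_gb
    compute = gb_seconds * pricing.price_per_gb_second
    requests = invocations / 1_000_000 * pricing.price_per_million_requests
    return compute + requests


# Placeholder rates, NOT real prices: one plan is cheaper per GB-s but bills requests.
with_request_fee = FaasPricing(price_per_gb_second=0.000016, price_per_million_requests=0.20)
without_request_fee = FaasPricing(price_per_gb_second=0.000017)

# Memory-hungry job that is called rarely.
heavy = dict(invocations=10_000, avg_duration_s=60.0, memory_gb=2.0)
# Tiny function that is called very often.
chatty = dict(invocations=50_000_000, avg_duration_s=0.05, memory_gb=0.128)

for name, load in (("heavy", heavy), ("chatty", chatty)):
    a = monthly_cost(with_request_fee, **load)
    b = monthly_cost(without_request_fee, **load)
    print(f"{name}: with request fee = {a:.2f}, without request fee = {b:.2f}")
```

With these made-up rates the plan with the lower GB-s price wins on the memory-heavy job, while the plan without a request fee wins on the chatty one. Plug in the current price lists before drawing any real conclusion.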
Step 2: Machine Learning
A field where the IBM and AWS stacks offer similar services is machine learning: Amazon with SageMaker and IBM with Watson Machine Learning. The two services are very similar in many respects: both present themselves as tools to help data scientists and developers build, train and then deploy their machine learning models into production-ready environments. The philosophies the two companies adopt, however, vary quite a bit.
Both services let you choose between different degrees of control over the models you use. In Watson ML, you have some built-in models that are already trained for very specific tasks: for example, if you want to recognize which objects are present in a picture, you just import the VisualRecognitionV3 model and pass it the picture you want to analyze. You can also build a “custom model”, but in Watson ML this mostly means taking an already built model and running your own training on it, so the customization is quite limited. It’s important to note, though, that neither SageMaker nor Watson ML is the only way of doing machine learning on its vendor’s stack; they are just services that aim to make developers’ lives easier. The Watson ML platform also supports many of the most popular machine learning libraries, so you can even build a model from scratch with PyTorch, TensorFlow or similar libraries. You either use those libraries directly or use the pre-made models; there is no middle ground. Also, Watson ML doesn't support Amazon's library of choice, Apache MXNet, which instead has first-class support in SageMaker.
Amazon SageMaker’s approach, even when using built-in options, is a bit more low level: rather than making you choose from pre-made models, it lets you choose from a plethora of already implemented training algorithms, which you can use when building your model in a more traditional way. If these aren’t enough, you can also use your own algorithm. This approach certainly requires more knowledge of how machine learning is done than just using a trained model in Watson ML.
At first glance it may seem that Watson ML is the “easy and quick” way, with Amazon SageMaker being the more complex one to set up. This is not entirely true from some points of view: SageMaker is structured so that everything runs in a Jupyter notebook, while for the same features in Watson ML you have to set up many different sub-services from the web UI. Data preprocessing also has dedicated spaces in the IBM service, while SageMaker relies on you doing it all from code in your notebook. This, plus the fact that Jupyter notebooks aren’t exactly the best choice from a software engineering point of view, may prevent SageMaker from scaling very well in production. Both services have pretty good and simple mechanisms to deploy your model and make APIs for it available to the outside world.
In conclusion, Watson ML performs better in huge projects where Jupyter notebooks start showing their limits, and where you don’t need much customization in what the model itself does. SageMaker is a lot better when you need more flexibility in defining the algorithms, but when using it you need to take into account that you have to rely on Jupyter notebooks, which may not scale well in production. A solution could be to decouple the rest of the code from the model as much as possible, so that the code in the actual notebooks doesn’t get too big and we can better organize our software in other modules that just use our model’s API.
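The decoupling idea can be sketched as follows. This is only an illustration under our own assumptions: ModelClient, needs_maintenance, the JSON payload shape and the fake transport are all hypothetical, and a real deployment would replace the fake transport with an HTTP call to the endpoint exposed by SageMaker or Watson ML.

```python
import json
from typing import Callable, Sequence

# A transport takes a JSON request body and returns a JSON response body.
Transport = Callable[[str], str]


class ModelClient:
    """Thin wrapper: the only place that knows how the model is called."""

    def __init__(self, transport: Transport):
        self._transport = transport

    def predict(self, features: Sequence[float]) -> float:
        body = json.dumps({"features": list(features)})
        reply = json.loads(self._transport(body))
        return float(reply["prediction"])


def needs_maintenance(client: ModelClient, vibration: float, temperature: float) -> bool:
    """Application logic: depends on ModelClient, not on any notebook code."""
    return client.predict([vibration, temperature]) > 0.5


def fake_transport(body: str) -> str:
    """Stand-in for the call to a deployed endpoint (hypothetical payload format)."""
    features = json.loads(body)["features"]
    score = 1.0 if sum(features) > 100 else 0.0
    return json.dumps({"prediction": score})


client = ModelClient(fake_transport)
print(needs_maintenance(client, vibration=30.0, temperature=90.0))
print(needs_maintenance(client, vibration=1.0, temperature=20.0))
```

Because the application only sees predict(), the notebook can stay small and the rest of the software can be tested with a fake transport like the one above.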
Step 3: Data Streaming & Analytics
Data streaming services are crucial for handling and analyzing large flows of data in real time. The flow can go from the cloud to the users’ devices, as in video streaming, or from the users to the cloud, as with IoT telemetry and sensor readings. Especially in the second case, each single source may upload only a small amount of data, yet the overall throughput coming from all the devices consumes considerable bandwidth, so it makes sense to use a service specialized in handling such flows. Without handling this continuous flow directly, we would have to buffer the incoming information in temporary storage and process it later with some computational engine. The problem with this approach is that we would have to coordinate several different services to achieve what a single data streaming service already does on its own, increasing the complexity of the application’s maintenance and configuration. In addition, buffering can in principle make our application no longer real time, since an item can be processed only after all the items before it have been processed, and adding precedence policies to the buffer can, again, increase the complexity drastically.
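A quick back-of-the-envelope calculation shows both effects described above. The device count, payload size, backlog and processing time are invented numbers, chosen only to illustrate the arithmetic.

```python
def aggregate_throughput_bytes_per_s(devices, payload_bytes, messages_per_s):
    """Total inbound traffic when every device sends small messages at a fixed rate."""
    return devices * payload_bytes * messages_per_s


def fifo_wait_s(items_ahead, processing_time_s):
    """Time an item waits in a FIFO buffer before its own processing starts."""
    return items_ahead * processing_time_s


# Hypothetical fleet: 100,000 sensors, each sending one 200-byte reading per second.
total = aggregate_throughput_bytes_per_s(devices=100_000, payload_bytes=200, messages_per_s=1)
print(f"aggregate inbound traffic: {total / 1_000_000:.1f} MB/s")

# Hypothetical backlog: 50,000 buffered items, 2 ms of processing each.
wait = fifo_wait_s(items_ahead=50_000, processing_time_s=0.002)
print(f"wait for the newest item: about {wait:.0f} s")
```

Each device sends a trivial amount of data, yet together they produce 20 MB/s, and a reading that lands behind the assumed backlog waits on the order of a hundred seconds, which is hardly real time.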
Summing up, data streaming services offer data flow handling in real time, with an easy configuration, and can provide analytics on the incoming data. Here we compare the two main streaming services of the IBM and AWS stack, namely IBM Streams and AWS Kinesis.
We start by noting that all the basic features that we may want from a streaming service are offered by both IBM and AWS. These features include virtually infinite processing rate, low latency and real time data analytics. Since we are talking about professional services, they both offer production-grade tools for deployment and automation.
Talking about data analytics, both services offer it as an option, so you pay for it only if you need it. In the case of Kinesis, when you don’t need analytics but just data flow handling, you are charged per GB processed, whereas IBM charges by processing time. Pricing per GB will generally be less expensive than pricing per time, since you are paying only for the incoming traffic. For the rest of this post we will consider both IBM Streams and AWS Kinesis with the data analytics feature enabled.
Streams and Kinesis integrate with different services for pre-processing and filtering the incoming data before passing it to data analytics: Apache Edgent and AWS Lambda, respectively. While these services are radically different from each other, we will discuss them only from the point of view of the two streaming services. The fundamental difference is that Apache Edgent executes on the device, while AWS Lambda executes in the cloud. This brings a number of pros and cons. On the Lambda side we have a flexible and easy-to-use service with seamless Kinesis integration, but it requires the data to be already uploaded to the cloud, so we lose efficiency and also pay Kinesis for data that will eventually be discarded. On the Edgent side, most of the computation is done, well, at the edge of the network (that is, on the devices), before useless data is uploaded to the cloud. The main drawback is that Edgent is a large framework, which may take time to set up and could be complex to maintain. Another difference that could be relevant when choosing a platform is that Edgent is fully open source, while Lambda is not. This can be seen both as a pro, since having access to the code that you or your customer will execute is always a positive thing, and as a con, because there may be situations where you need urgent support that not all open source projects can provide.
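The benefit of filtering at the edge can be illustrated with a few lines of plain Python. Note that this is not the Edgent API (Edgent is a Java framework); edge_filter and the deadband value are our own hypothetical names, used only to show how much traffic never has to reach the cloud.

```python
def edge_filter(readings, deadband):
    """Yield a reading only if it differs from the last uploaded one by more than deadband."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > deadband:
            last_sent = value
            yield value


# Made-up temperature samples collected on the device.
readings = [20.0, 20.1, 20.0, 20.2, 23.5, 23.6, 23.4, 20.1]

# Edge-side filtering: only significant changes are uploaded.
uploaded = list(edge_filter(readings, deadband=1.0))

print(f"uploaded {len(uploaded)} of {len(readings)} readings: {uploaded}")
```

With cloud-side filtering all eight samples would be uploaded (and billed by the streaming service) before most of them are discarded; with the filter on the device only the significant changes travel over the network.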
Another feature worth mentioning is that Kinesis automatically scales the allocated resources. The hardware it offers is composed of a number of so-called Kinesis Processing Units (KPUs) running in parallel, where one KPU provides 1 vCore and 4 GB of RAM. Their number depends on the needs of the application, and they are allocated dynamically and automatically (what you pay for is the CPU time multiplied by the number of KPUs). Just remember that Kinesis charges you one additional KPU if you run a Java application. IBM Streams, instead, does not provide this kind of flexibility: it offers a container with fixed hardware (more details when we talk about pricing). On the other hand, IBM Streams is more open than Kinesis, since it interfaces with the WAN via commonly used protocols such as HTTP and MQTT, while Kinesis is confined to the AWS ecosystem.
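The KPU billing rule described above (time multiplied by the number of KPUs, plus one extra KPU for Java applications) can be written down as a small sketch. The function names, the hourly usage profile and the rate are hypothetical; check the current AWS price list for the real figure.

```python
def billable_kpu_hours(kpus_per_hour, java_application=False):
    """Sum the KPUs used in every hour, adding one extra KPU per hour for Java apps."""
    extra = 1 if java_application else 0
    return sum(kpus + extra for kpus in kpus_per_hour)


def kpu_cost(kpus_per_hour, price_per_kpu_hour, java_application=False):
    """Bill = billable KPU-hours times the hourly rate of a single KPU."""
    return billable_kpu_hours(kpus_per_hour, java_application) * price_per_kpu_hour


# Hypothetical day: 20 quiet hours on 1 KPU, 4 busy hours on 4 KPUs.
usage = [1] * 20 + [4] * 4
rate = 0.10  # placeholder price per KPU-hour, NOT the real one

print(billable_kpu_hours(usage), "KPU-hours without the Java surcharge")
print(billable_kpu_hours(usage, java_application=True), "KPU-hours with it")
print(f"daily cost for a Java application: {kpu_cost(usage, rate, java_application=True):.2f}")
```

For small applications that mostly sit on a single KPU, the extra Java KPU is a large share of the bill, so it is worth accounting for when estimating costs.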
As a final comparison let’s talk about pricing, and let us say up front that IBM doesn’t do great on this point. We configured solutions in three categories (basic, high-end, ultra-high-end) for both IBM and AWS, and we compare their prices. In the basic configuration we have one AWS KPU, mentioned earlier, against an IBM solution with the same hardware. For the high-end we have 8 KPUs running in parallel for Kinesis and 2 containers, also in parallel, for IBM, each with 4 vCores and 12 GB of RAM. For the ultra-high-end IBM offers a single container with 16 vCores and 128 GB of RAM; we omitted an equivalent AWS solution, since an application that requires this much RAM might not be able to run split across different KPUs. The prices we report are expressed in $/month, assuming 24/7 usage. For the basic configuration AWS and IBM cost $164 and $490 respectively, and for the high-end $1320 and $3500; for the ultra-high-end there is only IBM, at $6300. From these results we can see that Kinesis works better for everyone from the everyday user up to the enterprise level, while it lacks options to directly handle data analytics that require an enormous amount of computing power. Kinesis delivers a better performance/$ ratio than IBM Streams, helped also by the dynamic allocation of small resource blocks only when needed, while IBM offers a fixed container. As a result, if your workload is characterized by peaks, with IBM you are forced to overestimate your application’s needs and configure the solution for the worst-case scenario. IBM does offer hourly fees instead of paying for the full month, but this is not automated as it is with Kinesis.
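The effect of peaks on a fixed container versus dynamically allocated blocks can be sketched as follows. The demand profile, the unit capacity and the price are invented, and the same unit price is used on both sides on purpose, so that only the allocation policy differs.

```python
import math


def fixed_cost(demand, unit_capacity, price_per_unit_hour):
    """Fixed container: sized once for the worst hour, billed for every hour."""
    units = math.ceil(max(demand) / unit_capacity)
    return units * len(demand) * price_per_unit_hour


def elastic_cost(demand, unit_capacity, price_per_unit_hour):
    """Elastic allocation: each hour is billed only for the units it needs (at least 1)."""
    unit_hours = sum(max(1, math.ceil(d / unit_capacity)) for d in demand)
    return unit_hours * price_per_unit_hour


# Hypothetical day: 22 hours needing 1 vCore, 2 peak hours needing 8 vCores.
peaky = [1.0] * 22 + [8.0] * 2
price = 0.10  # placeholder price per vCore-hour, identical for both policies

print(f"fixed:   {fixed_cost(peaky, unit_capacity=1.0, price_per_unit_hour=price):.2f}")
print(f"elastic: {elastic_cost(peaky, unit_capacity=1.0, price_per_unit_hour=price):.2f}")
```

With a flat workload the two policies cost the same; the gap opens only when the peak is much higher than the average, which is exactly the worst-case sizing problem described above.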
Step 4: IoT Architecture
Configuring devices for AWS IoT is quite easy compared to IBM Watson IoT, because in IBM Watson IoT the authentication is per device, with a token, and once the token has been displayed it is never shown again.
Coming to pricing, IBM Watson IoT is again quite costly compared to AWS IoT. IBM Watson IoT charges are based on the number of devices, data storage and data traffic. In AWS IoT, instead, we pay once and can then add more devices, as well as data published from devices and delivered to devices.
Start with your device, whether it’s a sensor, a gateway, or something else, and let us help you connect it to the cloud.
Your device data is always secure when you connect to the cloud using the open, lightweight MQTT messaging protocol or HTTP. With the help of these protocols and Node-RED we can connect our device to the IoT platform and access live and historical data.
Use our secure APIs to connect your apps with data from your devices.
Create applications within our given cloud service to interpret data.
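As an illustration of the per-device token authentication over MQTT, the sketch below builds the connection parameters and an event message. The host name, client ID format, "use-token-auth" user name and topic layout are the Watson IoT conventions as we recall them, so treat them as assumptions and verify them against the platform documentation. DeviceIdentity and both helper functions are our own hypothetical names, and no MQTT library is actually called here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceIdentity:
    org_id: str
    device_type: str
    device_id: str
    auth_token: str  # displayed only once at registration, so store it safely


def mqtt_connection_params(dev):
    """Connection settings for a device (assumed Watson IoT conventions)."""
    return {
        "host": f"{dev.org_id}.messaging.internetofthings.ibmcloud.com",
        "port": 8883,  # MQTT over TLS
        "client_id": f"d:{dev.org_id}:{dev.device_type}:{dev.device_id}",
        "username": "use-token-auth",
        "password": dev.auth_token,
    }


def event_message(event_id, data):
    """Return the (topic, payload) pair for publishing one JSON event."""
    topic = f"iot-2/evt/{event_id}/fmt/json"
    return topic, json.dumps(data)


# Example values are made up.
device = DeviceIdentity("abc123", "sensor", "dev01", "token-shown-once")
params = mqtt_connection_params(device)
topic, payload = event_message("status", {"temperature": 21.5})
print(params["client_id"], topic, payload)
```

These values can then be handed to any MQTT client library to open the connection and publish the event.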