The Industrial Internet Consortium (IIC) recently released v1.8 of its Industrial Internet Reference Architecture (IIRA), which introduces an architectural pattern called the “layered databus” pattern. A databus is a new way to manage dataflow in an industrial system.
The IIRA databus definition is:
A databus is a data-centric information-sharing technology that implements a virtual, global data space. Software applications read and update entries in a global data space. Updates are shared between applications via a publish-subscribe communications mechanism.
Key characteristics of a databus are:
a) the participants/applications directly interface with the data,
b) the infrastructure understands, and can therefore selectively filter the data, and
c) the infrastructure imposes rules and guarantees of Quality of Service (QoS) parameters such as rate, reliability, and security of data flow.
Like a database, a databus is a data-centric concept. Let’s look at what it really means.
How is a databus different from a database?
Perhaps the easiest way to understand is to compare a databus to the more familiar database.
The key difference: A database implements data-centric storage. It saves old information that you can later search by relating properties of the stored data. A databus implements data-centric interaction. It manages future information by letting you filter by properties of the incoming data.
Under the hood, a database stores its data in files. But user applications don’t see that; applications interact with the data itself. A database knows how to interpret the data, enables operations like search, and enforces access control. There is no application-level awareness of the file system. Programs using a database read, write, and search for data objects, not files.
Similarly, under the hood, a databus sends messages. But user applications don’t see that; applications interact directly with data and data “Quality of Service” (QoS) properties like age and rate. The databus controls access to the data, how data in the system changes, and when participants get updates. There is no application-level awareness or concept of a “message”. Programs using a databus read, write, and get updates to data; they do not send and receive messages.
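To make the “data, not messages” idea concrete, here is a minimal in-process sketch in Python. Everything in it (the ToyDatabus class, its write, read, and subscribe methods, the “VehiclePosition” topic) is a hypothetical illustration, not the API of DDS or any real product. The latest values are held in a plain dictionary only so the sketch is self-contained; a real databus is distributed middleware.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class ToyDatabus:
    """Hypothetical in-process stand-in for a databus (not a real DDS API)."""

    def __init__(self) -> None:
        # topic -> key -> latest value: a tiny stand-in for the global data space
        self._space: Dict[str, Dict[str, Any]] = defaultdict(dict)
        # topic -> callbacks interested in updates to that topic
        self._subscribers: Dict[str, List[Callable[[str, Any], None]]] = defaultdict(list)

    def write(self, topic: str, key: str, value: Any) -> None:
        """Update one data entry and notify every subscriber of the change."""
        self._space[topic][key] = value
        for callback in self._subscribers[topic]:
            callback(key, value)

    def read(self, topic: str, key: str) -> Any:
        """Return the latest value of one data entry, or None if never written."""
        return self._space[topic].get(key)

    def subscribe(self, topic: str, callback: Callable[[str, Any], None]) -> None:
        """Ask to be told whenever an entry in this topic changes."""
        self._subscribers[topic].append(callback)


bus = ToyDatabus()
seen = []
bus.subscribe("VehiclePosition", lambda key, value: seen.append((key, value)))

# The writer updates a data object; it never builds or addresses a message.
bus.write("VehiclePosition", "car-17", {"x": 12.0, "y": 3.5})

print(seen)
print(bus.read("VehiclePosition", "car-17"))
```

Note that neither the writer nor the reader names the other, and neither handles a message. Both only name the data they care about.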
This common property of “data centricity” is hugely powerful in both cases. By enforcing simple rules that control the data model, databases ensure consistency. By exposing the data to search and retrieval by all users, databases greatly ease system integration. By allowing discovery of data and schema, databases also enable generic tools for monitoring, measuring and mining information.
Similarly, a databus understands the content of, and need for, the transmitted data. With knowledge of the structure and demands on data, the databus can do things like filter information, selecting when or if to do updates. The infrastructure itself can control QoS like update rate, reliability and guaranteed notification of peer liveliness. The infrastructure can discover data flows and offer those to applications and generic tools alike. This accessible data greatly eases system integration.
Databases enable huge enterprise systems with thousands of applications sharing storage; databases power the enterprise. Similarly, databuses enable huge industrial systems with thousands of interacting applications; databuses will power the IIoT.
Is a databus a database that you interact with via a pub-sub interface?
No, there is no database. A database implies storage: the data physically resides somewhere. A databus has no storage.
The databus defines how to interact with future information. For instance, if “you” are an intersection controller, you can subscribe to updates of vehicles within 200m of your position. Those updates will then be delivered to you, should a vehicle ever approach. Delivery is guaranteed in many ways (start within 0.01 seconds, updated 100 times a second, reliable, etc.). The data may never be stored at all; it’s just delivered.
Isn’t a databus the same as pub-sub?
Most pub-sub is very primitive. An application “registers interest”, and then everything is simply sent to that application. So, for instance, an intersection collision-detection algorithm could subscribe to “vehicle positions”. Pub-sub sends messages from any sensor capable of producing positions, with no knowledge of the data inside that message. Even “content filtering” pub-sub offers only very simple specs, and requires the system to pre-select what’s important for all. There’s no real control of flow.
A databus is much more expressive. That intersection could say “I am interested only in vehicle positions within 200m, moving at least 2m/s towards me. If a vehicle falls into my specs, I need to be updated 200 times a second. You (the databus) need to guarantee me that all sensors feeding this algorithm promise to deliver data that fast, no slower or faster. If a sensor updates 1000 times a second, then only send me every 5th update. I also need to know that you actually are in touch with currently-live sensors (which I define as producing in the last 0.01 seconds) on all possible roadway approaches at all times.” (These are a few of the 20+ QoS settings in the Data Distribution Service standard.)
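The intersection’s request combines a filter on content (range and closing speed) with a filter on rate (200 updates a second from a 1000-per-second sensor). A rough Python sketch of those two filters follows. All names here (VehicleSample, content_filter, TimeBasedFilter, deliver) are hypothetical, the geometry assumes the intersection sits at the origin, and none of this is the DDS API. In a real databus the middleware applies such filters, ideally at the source, rather than the application.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class VehicleSample:
    vehicle_id: str
    t_ms: int    # sample time in milliseconds
    x: float     # position in metres, intersection at the origin
    y: float
    vx: float    # velocity in m/s
    vy: float


def closing_speed(s: VehicleSample) -> float:
    """Speed toward the intersection in m/s (positive means approaching)."""
    dist = math.hypot(s.x, s.y)
    if dist == 0.0:
        return 0.0
    return -(s.x * s.vx + s.y * s.vy) / dist


def content_filter(s: VehicleSample, max_range_m: float = 200.0,
                   min_closing_mps: float = 2.0) -> bool:
    """Keep only vehicles within range that are moving toward the intersection."""
    return math.hypot(s.x, s.y) <= max_range_m and closing_speed(s) >= min_closing_mps


class TimeBasedFilter:
    """Pass at most one sample per vehicle every min_separation_ms milliseconds."""

    def __init__(self, min_separation_ms: int) -> None:
        self.min_separation_ms = min_separation_ms
        self._last_ms: Dict[str, int] = {}

    def accept(self, s: VehicleSample) -> bool:
        last = self._last_ms.get(s.vehicle_id)
        if last is not None and s.t_ms - last < self.min_separation_ms:
            return False
        self._last_ms[s.vehicle_id] = s.t_ms
        return True


def deliver(samples: Iterable[VehicleSample], rate_hz: int = 200) -> List[VehicleSample]:
    """Apply the content filter first, then thin the survivors to rate_hz."""
    rate_filter = TimeBasedFilter(min_separation_ms=1000 // rate_hz)
    return [s for s in samples if content_filter(s) and rate_filter.accept(s)]


def one_second_at_1khz(vehicle_id: str, x0: float, vx: float) -> List[VehicleSample]:
    """One second of made-up sensor data at 1000 samples a second."""
    return [VehicleSample(vehicle_id, i, x0 + vx * i / 1000.0, 0.0, vx, 0.0)
            for i in range(1000)]


samples = (
    one_second_at_1khz("approaching", 150.0, -10.0)   # in range, closing at 10 m/s
    + one_second_at_1khz("too-far", 500.0, -10.0)     # outside 200m
    + one_second_at_1khz("leaving", 100.0, 10.0)      # in range but moving away
)
delivered = deliver(samples)
print(len(delivered))  # 200: every 5th sample of the one vehicle that matches
```

Using integer millisecond timestamps keeps the rate filter exact; with floating-point seconds the “every 5th update” boundary would need a tolerance.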
Note that a subscribing application in the primitive pub-sub case is very dependent on the actual properties of its producers. It has to somehow trust that they are alive (!), that they have enough buffers to save the information it may need, and that they won’t flood it with information or provide it too slowly. If there are 10,000 cars being sensed 1000 times a second, but only 3 within 200m, it will have to receive 10,000 * 1000 = 10 million samples every second just to find the 3 * 200 = 600 it needs to pay attention to. It will have to ping every single sensor 100 times a second just to ensure it is active. If there are redundant sensors on different paths, it has to ping them all independently and somehow make sure all paths are covered. It also has to know the schema of the producers, etc.
The application in the second case will, by contrast, receive exactly the 600 samples it cares about, comfortable in the knowledge that at least one sensor for each path is active. The flow rate is guaranteed. Sufficient reliability is guaranteed. The total dataflow is reduced by 99.994% (we only need 600 of every 10 million samples, and smart middleware does filtering at the source). The databus delivers only the needed information.
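For readers who want to check the arithmetic, the figures above work out as follows. The variable names are made up for this sketch; the numbers are the ones used in the example.

```python
vehicles_sensed = 10_000      # cars being tracked
sensor_rate_hz = 1_000        # samples per second per car
vehicles_in_range = 3         # cars within 200m of the intersection
requested_rate_hz = 200       # updates per second the intersection asked for

unfiltered = vehicles_sensed * sensor_rate_hz        # what primitive pub-sub delivers
needed = vehicles_in_range * requested_rate_hz       # what the databus delivers
reduction_pct = 100 * (1 - needed / unfiltered)

print(f"{unfiltered:,} -> {needed:,} samples/s ({reduction_pct:.3f}% reduction)")
# prints: 10,000,000 -> 600 samples/s (99.994% reduction)
```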
Which applications need a databus?
The databus standard, Data Distribution Service (DDS), is an active, growing family of standards. It has significant use across many Industrial IoT verticals, including medical, transportation, smart cities, energy, and defense.
Example applications include intelligent robots, autonomous cars, wind turbines, connected medical devices, and large coordinated systems like NASA’s 300k-pt launch control SCADA and advanced Navy combat management. These systems need reliability even when components fail, data fast enough to control physical processes, selective discovery, and scalable delivery. They bring together teams of programmers working on large software systems over long periods. Databuses are great for software integration of demanding IIoT systems.
About Your Guest Blogger
Stan Schneider is CEO of Real-Time Innovations (RTI), the Industrial Internet of Things connectivity platform company. RTI is the largest embedded middleware vendor, with an extensive footprint in all areas of the Industrial Internet of Things, including Energy, Medical, Automotive, Transportation, Defense, and Industrial Control. Appinions recently named RTI the “most influential” company in the Industrial Internet of Things in an article published in Forbes and Reuters. Stan is the small company representative on the Industrial Internet Consortium Steering Committee. With over 200 companies, the goal of the IIC is to develop, test, and promote the standards that are crucial to the success of the next industrial revolution. Stan serves on the advisory boards for Smart Industry and IoT Solutions World Congress. Embedded Computing Design Magazine presented Stan the Top Embedded Innovator Award for 2015. Stan holds a BS and MS from the University of Michigan and a PhD in Electrical Engineering and Computer Science from Stanford University. Twitter: https://twitter.com/RTIStan/