Intelligent automation using multi-modal, AI-based predictive maintenance


Artificial intelligence (AI) adoption in organizations is maturing from prototype to production, but various factors are impeding its accelerated growth. AI, in simple terms, is technology that can learn and, as a result, provide responses that were not explicitly programmed or predicted by its creators. AI enables machines to learn from “experience” while adjusting to new inputs and stimuli so they can perform human-like tasks, ultimately leading to more complete automation of many traditionally manual processes.

Service management has long been dominated by manual processes. Businesses invest heavily in robust service management across enterprises and industries because equipment downtime in these environments is costly and affects many people. Industries and enterprises today operate far more machinery and equipment than they did five years ago, and hence carry a larger service management portfolio than in the past. The recent push to accelerate digital transformation is adding new kinds of devices, IoT devices, into organizations’ portfolios. Further, the rapidly growing Industrial IoT (IIoT) is driving factories to roll out massive IoT networks and “digital twins”; market forecasts project the digital twin market to grow 2000% over the next 10 years. 5G connectivity in industry is fuelling IIoT and digital-twin implementations. All these developments further broaden the scope of service management.

Surging demand for service management solutions is driving accelerated adoption of AI approaches across industries and enterprises. Given the rapid pace at which billions of devices are being connected to the internet through IoT, the need for AI will become even more prominent as devices demand continuous service and support in real time. Recent studies show that enterprises adopt AI predominantly in ITSM (Information Technology Service Management), whereas industries adopt it predominantly in predictive maintenance of equipment.


Although this is where AI holds some of its biggest potential, operation and maintenance (O&M) divisions and their stakeholders in any organization find it difficult to navigate the AI adoption cycle quickly. Unlike ITSM or predictive maintenance alone, O&M use cases are laden with tightly knit manned processes that require multiple AI technologies working in tandem to realize the full potential of “intelligent automation”. The last mile of operation analytics, the virtual mile between operations and service, is a vital O&M use case that has opened its doors to intelligent automation using multi-modal, AI-based predictive maintenance. It covers the planning phase of the service life cycle: scheduling, dispatching and ticketing. Little research has been done on the challenges of implementing AI in a highly manned environment with traditional silos, which by its very nature demands a consolidated AI technology that brings different tools under one bundle to solve problems the way a human would. Widespread implementation of AI in last-mile operation analytics, or of AI in general, is further impaired by skill shortages, lack of management support, a perception of higher risk and, most importantly, the fact that organizations find it overwhelming to deal with too many disconnected AI technologies to solve a single problem.

Every potential use case of AI in an organization needs a mix of AI technologies. A new approach that consolidates the different pieces of AI available today (text retrieval, text extraction, speech recognition, conversational AI, image analytics, video analytics, predictive maintenance and so on) into one well-integrated bundle for solving an organization's problems would be a huge step toward accelerating the adoption of AI.


Sensfix is undertaking that step to expedite the adoption of AI among its current and potential customers in the facilities management domain. Industrial facilities, as well as enterprise facilities such as offices and corporate real estate that use Sensfix's current offering of workflow digitization and automation, are gearing up for a future of intelligent automation using AI.

Sensfix is instituting a new way for devices to be maintained, repaired and cared for. Whether it is a plant manager in an industrial plant or facility, a facility manager in an enterprise facility, an operations & maintenance manager in an oil & gas facility or a customer support manager at an OEM, Sensfix will be the new AI-driven virtual assistant for each of them, assisting with their day-to-day standard operating procedures, silos, workflows, best-practice methodologies and so on. It will automatically perform learnt tasks that are repetitive in nature and will keep learning new tasks.

The sensfix SaaS is an IoT-connected, AI-driven Service Lifecycle Management platform that helps operation & maintenance managers easily digitize their workflows, enabling devices to self-schedule, self-dispatch and self-ticket their repair & maintenance needs automatically.

sens-rule engine

At its core is an integrated rules-engine ↔ workflow-engine combination that works coherently and collaboratively to achieve maximum scalability, operability and adaptability. Sensfix's rules engine is a first-of-its-kind multi-modal (text, image, video, audio, IoT data) inference engine built entirely on AI, allowing it to act as a full virtual human assistant to a service manager. Rules engines have evolved over several decades in pursuit of the perennial goal of an automated decision-making system that can model complexity, time and uncertainty. The earliest rules engines used forward chaining, which starts with a number of facts (or data) and applies rules to derive all possible conclusions from those facts. Only a few expert decision-making systems used backward chaining, which works backwards from conclusions to facts: few facts (or data) are asked for, but many rules are searched.
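To make the forward-chaining idea concrete, here is a minimal, illustrative sketch (not Sensfix's actual engine): rules fire whenever their conditions are satisfied by a working memory of facts, and derived conclusions are added back as new facts until nothing more can be inferred. The maintenance rules and fact names are invented for the example.

```python
def forward_chain(facts, rules):
    """Repeatedly apply rules until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all its conditions are known facts
            # and its conclusion is not yet in working memory.
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical maintenance rules: (set of conditions, conclusion)
rules = [
    ({"oil_on_floor", "near_pump"}, "suspect_seal_leak"),
    ({"suspect_seal_leak", "abnormal_noise"}, "raise_priority_ticket"),
]

derived = forward_chain({"oil_on_floor", "near_pump", "abnormal_noise"}, rules)
# "raise_priority_ticket" is reached via the intermediate fact "suspect_seal_leak"
```

Backward chaining would instead start from the goal "raise_priority_ticket" and search backwards for rules and facts that could establish it.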


IFTTT (If This Then That) is a category of forward chaining that was widely adopted because it was scalable and flexible, offered easy integration with third-party systems and was easy to operate, despite its limitations in tackling complexity. The need to model complexity in the temporal domain led to innovations like flow-based programming, stream processing, complex event processing and decision trees. Modeling uncertainty was made possible only by applying AI techniques. A handful of AI-based rules and decision-making engines have been built in the last few years, primarily focused on the gaming domain, with healthcare a distant second. The majority of industrial and enterprise applications need different modes of AI to work coherently and collaboratively, much like the different sensing organs in a human body. Sensfix's multi-modal AI engine has embarked on a journey to become a competent human assistant to an operation & maintenance manager in an industrial or enterprise facility.


AI technologies for handling different input modalities, such as numerical data, categorical data, text, speech, audio, image and video, are disparate, often highly specific to one modality, and not interoperable. Even text handling is not uniform across genres: different methods and algorithms are needed for email text, SMS text, operational manual text, conversational text and so on.

Very few systems can handle more than two of these modalities under a single umbrella. Significant challenges exist in combining these modalities in everyday industrial applications, for example monitoring and displaying a centralized view of the health of a system or the progress of a service request. Researchers have been busy perfecting AI for individual modalities, so there are very few publications that focus on integrating them. This is a largely unexplored field requiring serious attention from enterprises, as universities are ill poised (mostly due to lack of data) to handle this kind of amalgamation of inputs. Industries and enterprises should lead this effort so that academia can accelerate the pending research.
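One simple and well-established way to combine modalities is late fusion: each per-modality model emits a probability per fault class, and a confidence-weighted average produces a joint verdict. The sketch below is illustrative only; the class names, weights and scores are invented, and real systems would learn the weights rather than fix them by hand.

```python
def late_fusion(scores_by_modality, weights):
    """Weighted average of per-class probabilities across modalities."""
    classes = next(iter(scores_by_modality.values())).keys()
    total_w = sum(weights[m] for m in scores_by_modality)
    return {
        c: sum(weights[m] * scores_by_modality[m][c]
               for m in scores_by_modality) / total_w
        for c in classes
    }

# Hypothetical per-modality outputs for one incident
scores = {
    "image": {"oil_leak": 0.7, "normal": 0.3},
    "audio": {"oil_leak": 0.4, "normal": 0.6},
    "text":  {"oil_leak": 0.9, "normal": 0.1},
}
weights = {"image": 0.5, "audio": 0.2, "text": 0.3}

fused = late_fusion(scores, weights)
# The fused verdict favours "oil_leak" because image and text agree
```

Late fusion sidesteps the synchronisation problem noted above (each modality is scored independently), at the cost of ignoring cross-modal interactions that joint models could capture.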

In this project we tackle precisely this amalgamation problem, and will start exploring practical methods of handling mixed input signals, leveraging the research done on individual modalities. Since we have to pioneer many of these multi-modal signal handling techniques, our progress could be hard and slow. Our intent is to develop or extend methods inspired by research results that not only work in labs but also appear to have been verified in industry. This severely limits our choices, and we may often rely on results that have withstood the test of time and that can handle our own input data and signals, rather than on cutting-edge techniques that are yet to be widely proven.

One of the strongest requirements in industry is explainability of results. Not all modern AI methods offer explainable models or results. We will thus be very careful in evaluating and using AI techniques originating from research labs, even when seemingly better methods are reported by researchers.

An example user journey

Let us look at a simple problem that a factory worker can walk into on any given day:

  • Walking past a machine with a mobile device carrying the sensfix app, the worker finds an oil leak dripping to the ground.
  • He invokes the sensfix app and points his mobile device at the oil leak area.
  • With the push of a button, the sensfix app records a video of the leaking oil, images of the machine and an audio clip of the noise the machine generates, and finally prompts the user for a text message explaining the problem.
  • The sensfix AI engine (sens-ai) picks up this information and validates it; if not satisfied, it engages the worker in a conversation on the sens-channel (sensfix's advanced chatbot) in the mobile app, possibly asking for more relevant information.
  • The AI engine, if need be, launches a service ticket, attaching all the input modalities. It then identifies the 'best' technician to handle the problem, notifies him, and advises him to carry certain equipment and spare parts, for example a hose pipe from the maintenance store (after confirming that enough hose pipes are in stock and placing an indent with the stores against the ticket). It also suggests a few pages of the relevant document or manual as reading material before attending to the service request, and books the earliest slot in his calendar, taking into account the priority of the incident (or service request).
  • Later on, the sensfix AI engine can help the technician perform the service by generating a dedicated digitized service workflow for him to follow. While following those steps, the technician can be asked to leave feedback in text, audio, photo and video form, which is archived in the knowledge database used for the AI engine's self-learning.


To achieve this vision of the sensfix product, the technologies Sensfix develops have to reach the following milestones in the best possible way, subject to project time and resource limits:

  1. Sensfix mobile app sends GPS or indoor coordinates (working in collaboration with an indoor location tracking solution) to sens-ai (the AI engine). Sens-ai, knowing that GPS coordinates can be inaccurate by up to 10 m, figures out that three machines are in the proximity of the user and bookmarks them for further analysis. (Challenge: the AI algorithms have to handle the uncertainties of different modalities and their measurements. This requires information fusion techniques exemplified by Bayesian learning methods and beyond.)
  2. Sensfix mobile app records a few images of the machine in quick sequence. Quite often a single image is not adequate due to focus, blur or obstruction. Images from mobile devices can be degraded by limited contrast, inappropriate exposure, imperfect auto-focusing or motion compensation, lack of photographic skill, and so on. To decide whether to process or delete an image, a reliable measure of image degradation that separates blurry images from sharp ones is needed; the method must be fast, easy to implement and accurate. While deep-learning-based methods like DeepFocus [DF19] can address this problem quite adequately, they need significant computing power as well as immense effort in gathering and labeling training data. We intend to develop simpler methods that can run on the mobile device itself, based on local information as implemented in [Tsomo08] and [Liu08]. By combining the strengths of both methods, we aim to develop a very fast, low-cost method that improves on both. Images that pass through this sieve can, if need be, still be caught by server-side code implementing simpler deep learning models based on ImageNet [IM14]-style neural networks pre-trained on millions of generic images. Using model-based techniques, we will synthesize blur and other effects so that deep-learning-based techniques can still be trained. This dual approach of correcting faults on the mobile, with a fine-grained sieve on the server side, gives us the best of both worlds, makes the method more foolproof and should yield industry-specific results.
  3. Sensfix mobile app also records audio emissions of the machine. We do not expect audio captured on mobiles to be of sufficient quality to pinpoint machine faults, but it could be an aid at a gross level.
  4. If the audio has human noise superimposed, the sensfix mobile app has to attempt to capture acoustic emissions that are as clean as possible. (Challenge: combined DSP (digital signal processing) and AI methods would be needed to separate machine-generated from human-generated noise.)
  5. Sensfix mobile app streams all of the above information (geo-positioning and images) to sens-ai. Sens-ai uses the images to recognize the machine type, and then, based on the location information, identifies the exact machine. (Challenge: identifying objects in the wild, under different illumination conditions and 3D viewing angles, is a tough problem. The current generation of image recognition neural networks assumes clean images as input.)
  6. If sens-ai does not identify the machine, it engages in a conversation with the user on the sens-channel (an advanced chatbot), instructing the mobile app to ask the user for more details in text (or audio), or to change the angle or distance of observation. (Challenge: AI image recognition has to work at sub-second speeds. The AI algorithms have to learn that the images in step 2 are also valid images of the same object and use them for improved recognition later on.)
  7. Sensfix mobile app then takes a video of the leakage. A video of a static scene is often no better than an image; video becomes useful when parts are moving in an unexpected way or some liquid or gas is escaping at an unexpected place. (Challenge: AI algorithms have to recognize whether anything substantial is changing in the scene, distinguishing it from expected movements and from unexpected but non-substantial changes due to gentle or wild camera motion. To date, this is an unsolved problem.)
  8. Sens-ai then instructs the mobile app to ask the user to take an image of the object being worked on. (Challenge: depending on the machine, certain images of its output could help service people identify machine problems more easily. From the feedback on the ticket, the AI algorithms learn to associate machine output quality with machine defects. As the sensfix app is used repeatedly, it gathers more labeled data of this kind, making itself smarter by the day, literally.)
  9. Sensfix mobile app then prompts the user to enter his observations. (Challenge: learn associations between lay users' perceptions of problems, expressed as text, and real machine defects. User perceptions are the only clues that emails contain when the app is not used, so that actions can later be taken based on email text alone. User perceptions tend to be noisy and at times conflicting. In addition, one needs to develop email classification techniques, which are quite different from normal text classification. Email classification is a long-studied problem with quite some history, but it needs significant re-engineering to meet the needs of this project.)
  10. Taking into account all this information, the prior history of similar machine faults, and the history and availability of the repair personnel, sens-ai has to decide whom to email about the problem. If suitable persons are not available, it has to escalate the issue. (Challenge: the problem here is matching machine faults to the soft skills, expertise and cost of the relevant technicians. This essentially requires developing ontologies of faults and human skills, and metrics to match them in the service domain.)
  11. Modern machines come with a host of sensors that transmit a plethora of information related to machine health, most of it in numerical form. It has been repeatedly found in other domains, such as healthcare, that using additional data on top of numerical data leads to better results. Researchers have tried mixing numerical and text data, or images and text, and a few other bi-modal combinations. Combining all these modalities can aid the service person in quicker diagnosis and repair of the fault. (Challenge: processing all these modalities, numerical data, text, images, acoustic and vibration data, many of which may not even be synchronised, is the grand AI challenge we wish to take up. Much of our effort will go into this aspect in the final year of the project, while the individual-modality problems will attract the major share of our attention in the first two years.)
  12. Sensfix mobile app needs to help a service person who wants to refer to the pages of reference documents (or manuals) where the appropriate action sequences are described. In addition, if the suggestions or help rendered by the app are not appropriate, and the service person suspects some other problem and needs to know the maintenance steps to follow, he may key in his requirement or diagnosis, and the app has to retrieve relevant material with high fidelity so that he does not lose time searching for information. (Challenge: quite often the service person expresses his request in one way, whereas the service manuals discuss the same issue in a different way altogether. This disparity is exacerbated by the inexperience of the service person. To address this mismatch, one needs domain-specific, semantics-based retrieval methods that are somewhat cognizant of the machines and their faults. This requires building fault- and service-related ontologies, which is highly manpower-intensive, and very few reliable tools exist to ease the effort.)
  13. To support and schedule preventive maintenance activities for individual machines or equipment, many existing systems, like IBM Maximo, need information to be fed into the system for each and every machine. This requires studying the maintenance manuals and filling in the fields of the application, which is laborious and prone to human error. By automating this activity, one could gain a competitive advantage in commissioning the sensfix system at new customer sites. (Challenge: this kind of extraction of relevant information from service and maintenance manuals falls under the information extraction domain. While researchers have decried pattern-based methods, many industries find them more acceptable because the extraction patterns are easy to explain. In this project, we will try to enhance this technique using transfer learning, which no one seems to have attempted so far.)
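As a concrete illustration of the uncertainty handling in milestone 1, a noisy GPS fix can be fused with the known positions of nearby machines by scoring each candidate with a Gaussian likelihood (σ ≈ 10 m, matching the accuracy figure above). The coordinates and machine IDs below are made up; a real system would also fold in a non-uniform prior from usage history.

```python
import math

def machine_posterior(fix, machines, sigma=10.0):
    """Normalized posterior over candidate machines, uniform prior,
    isotropic Gaussian position noise with std dev `sigma` (metres)."""
    def likelihood(pos):
        d2 = (pos[0] - fix[0]) ** 2 + (pos[1] - fix[1]) ** 2
        return math.exp(-d2 / (2 * sigma ** 2))
    raw = {mid: likelihood(pos) for mid, pos in machines.items()}
    z = sum(raw.values())
    return {mid: v / z for mid, v in raw.items()}

# Hypothetical machine positions in metres, relative to the GPS fix
machines = {"pump-7": (2.0, 1.0), "press-3": (8.0, -4.0), "fan-2": (25.0, 30.0)}
posterior = machine_posterior(fix=(0.0, 0.0), machines=machines)
# "pump-7" gets the highest posterior; the distant "fan-2" is nearly ruled out
```

The same scoring extends naturally to fusing further evidence (e.g. the image-recognition confidence from milestone 5) by multiplying likelihoods before normalizing.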
For many of the above steps, individually interesting algorithms have been developed. Those algorithms have been showcased, often with limited data or limited noise, under fairly favourable conditions. On the factory floor, things are not that clean, and the algorithms have not been tested in real-life situations. We believe significant changes, or even major innovations, may be needed to make them work well. Since this is in itself a never-ending story, we will research them to the extent of building the best we can in the first two years of the project.
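As an example of the kind of cheap, explainable building block milestone 2 calls for, the variance of an image's Laplacian is a classic on-device sharpness score: a blurred image has few strong edges, so the Laplacian response is flat and its variance low. This is a stand-in for the local-information methods cited above, not a reimplementation of them, and the threshold used here is purely illustrative; it would be tuned on real shop-floor images.

```python
def laplacian_variance(img):
    """img: 2-D list of grayscale values.
    Returns the variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y-1][x] + img[y+1][x] + img[y][x-1] + img[y][x+1]
                   - 4 * img[y][x])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

sharp = [[0, 0, 255, 255]] * 4   # hard vertical edge -> high variance
flat  = [[128] * 4] * 4          # uniform patch -> zero variance

# Illustrative threshold; real deployments would calibrate it per device
is_blurry = laplacian_variance(flat) < 50.0 < laplacian_variance(sharp)
```

The appeal for the mobile-side sieve is that this runs in a single pass over the pixels with no model weights, leaving heavier deep-learning checks to the server side as described in milestone 2.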


Many of the AI techniques reported from research labs need careful reassessment and significant fine-tuning before they can be used in real-life factory-floor environments. This project is all about that, and about integrating the disparate systems into a single functional unit. In the process of algorithm ruggedization and integration, several minor to major innovations will have to be made in each of the challenge areas mentioned above. New custom data will be collected at scale and labelled, and methods to reduce the labelling effort, such as transfer learning, will have to be investigated. Many of the needed innovations, whether new algorithms or judicious use of transfer learning, will necessarily be made on a per-need basis and are difficult to foresee at the outset.