Est. reading time: 16 minutes
Author: Steph Locke
It’s no secret that the disruption of Industry 4.0 and the challenges presented by Covid-19 have been a push for manufacturers to evaluate digital transformation and consider going smart with AI in their factories. This article, adapted from the webinar shared below, is aimed at manufacturers who are interested in the techniques, data infrastructure and processes needed to support building internal data science & AI intellectual property.
We will be drilling into the technical aspects of AI and focusing on the machine learning and data science side of things, how these work in terms of methods, and how manufacturers can identify the best solution based on their business challenges. Discover what you'll need to put in place to start leveraging AI and ML, customising real products, and building your own bespoke solutions over using off-the-shelf, plug-and-play style solutions. Jump to the relevant sections below:
- What is AI: recapping the basics
- R&D behind AI
- Identify the core ML solution for your business challenge
- Critical Infrastructure
- Your analytical maturity
- Your process
- AI readiness and back-to-work readiness
Rewatch the webinar
What is AI: recapping the basics
AI performs cognitive tasks which fall under reasoning, understanding or interacting.
- Reasoning - going from real-world, imperfect data and turning that into rules and ways of operating.
- Understanding - interpreting sensory inputs such as sight, text, voice, etc.
- Interacting - combining the reasoning and understanding capabilities to provide a more natural interface between humans and computers.
These capabilities make it easier for humans to interact with computers without necessarily needing to code or be 100% accurate or specific in order to get it to do what they want. They can be broken down further into specialisms, displayed in the table below.
|Specialism|What it does|Example use|
|---|---|---|
|Computer vision|Working with images and video|Detecting health and safety risks on the shop floor|
|Natural language processing|Processing text from documents or audio|Extracting insights from field reports, news or academic papers|
|Conversational interactions|Chatbots|Adding “face-to-face” and chatbot type interfaces into processes for a more intuitive way of using them|
|Speech processing|Facilitating communication across multiple languages|Real-time translation in meetings or calls|
|Knowledge mining|Mapping useful knowledge from unstructured data|Uncovering how timing or settings of a process affect the quality or throughput in a factory|
|Induction|The process of learning by example, generating rules from data|This underpins many other areas of AI, such as machine learning|
AI is used heavily in smart factories for processes such as quality control and generative design, or to oversee the whole system and predict maintenance needs. It can also be used behind the scenes, at headquarters and among your staff, to automate repetitive processes and tasks that fall within the knowledge workers' sphere. Check out this article for a deeper dive on Industry 4.0 and see our AI in manufacturing resource for some case studies and a lighter overview of AI.
R&D behind AI
Computer vision, speech and language are the core techniques that underpin many of the different solutions and use cases that can be applied to manufacturing. These are almost always based on a machine learning model.
An AI solution that identifies defects in stock will have been built on a machine learning model that has been shown many images of what is acceptable and what is not. A speech recognition AI that translates sounds into text will have been trained on many examples of the shapes of the sound waves that represent certain words.
If you’re using AI without conducting your own research & development (R&D) as described above, you are usually relying on other people's machine learning models. The advantage is that a model built by an organisation like Microsoft or OpenAI will have been trained on the huge amounts of data that they have access to or have invested heavily in. So whether you use an entirely off-the-shelf solution or a more customisable one, you can leverage their intellectual property, research and data to build your own solution.
How to identify the core ML solution for your business challenge
There are four key machine learning and induction tasks that are relevant to manufacturing. As a business person, when you're thinking about the problem you're trying to solve, if you can work out which you are trying to do, this will speed up the process of going from business challenge to delivering real value.
- Predicting values
- Predicting labels
- Discovering patterns
- Discovering anomalies
Predicting values can be anything from how much a unit will sell for, how many defects we might expect, or projecting what our cash flow is going to look like. There’s a huge variety of things that we can try to predict with machine learning.
Predicting labels refers to predicting outcomes. For example, whether a machine will break down or not or whether an image contains a product vs whether the product is missing from the image. These labels are things that have a probability of occurring.
Discovering patterns in our data can help point out unexpected relationships. A well-known example, usually attributed to Walmart, one of the pioneers of data science in business, is the discovery that beer and diapers were often sold together. As the story goes, user research into this relationship found that it came from dads on their way home from work being asked to pick up diapers and grabbing some beers since they were already in the shop. Based on this data, the two products were placed next to each other to increase sales.
Discovering anomalies is particularly useful in a manufacturing setting. It is important to be able to find out when processes are deviating from the norm and to be able to intervene before something goes wrong. This can be applied in a sophisticated and distributed manner.
Below, we give a high-level introduction to these techniques so you can better understand what data scientists and machine learning specialists will use to solve specific business problems for you.
Classification is where you try and predict a label, i.e. do you classify this as x or y. Predictive maintenance is a classic example of where classification is used to understand the drivers of whether something is going to break down or not.
Predictive maintenance works by collecting data about the machinery and external factors such as the dates of previous repairs and maintenance, temperatures the machinery is operating in, etc. By matching previous outages with the occurring conditions at the time, we can feed this into an algorithm to make rules and predict when it is most likely to happen again.
Based on this, you can put in manual or automated interventions. Automated interventions would generate work orders to be approved before something breaks down, giving you a chance to arrange maintenance before breakdowns occur.
Three key ways of performing classification are decision trees, regression (logistic regression, when the outcome is a label), and neural networks.
Decision trees identify ways to split data to get the cleanest outcome groups. They are based on a set of "if this then that" style rules. These can get very complicated, potentially sifting through hundreds of sensors on your machinery to figure out the right “if this then that” boundaries, with many levels of rules nested inside each other. You can also combine multiple variations, so you end up with not just a single tree but jungles and forests. Grouping decision trees in this manner can get you more sophisticated results.
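To make the idea of a "cleanest split" concrete, here is a minimal sketch of a single-rule decision tree (a "stump"). It assumes Gini impurity as the measure of cleanliness, which is one common choice rather than the only one, and the temperature readings and breakdown labels are made up for illustration.

```python
from typing import List, Tuple

def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the cleanest 'value <= threshold' split."""
    best = (float("nan"), float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds sit halfway between neighbouring distinct readings.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical data: bearing temperature (deg C) and whether a breakdown followed (1) or not (0).
temperatures = [61.0, 64.5, 66.0, 70.2, 78.9, 81.3, 84.0, 90.5]
broke_down = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, impurity = best_split(temperatures, broke_down)
print(f"if temperature <= {threshold:.2f} then 'ok' else 'at risk' (impurity {impurity:.2f})")
```

A full decision tree repeats this search inside each resulting group, and a forest trains many such trees on different samples of the data and combines their votes.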
Regression predicts the chances of something happening. It is based on the classic line of best fit across multiple columns of data, adapted (as logistic regression) so that the output is a probability between 0 and 1. This gives you fine-grained control: rather than only saying whether something will happen or not, you can set a probability threshold. For example, you can choose to get alerts only when the probability of a breakdown is greater than 70%, so you don’t waste resources on too many false alarms.
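As a rough illustration of the probability threshold, the sketch below fits a one-variable logistic regression with plain gradient descent and raises an alert only above 70%. The vibration readings, the learning rate and the function names are assumptions chosen for the example; in practice a data scientist would normally reach for an established library rather than hand-written training code.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs: List[float], ys: List[int],
                 lr: float = 0.1, epochs: int = 20000) -> Tuple[float, float]:
    """Fit p = sigmoid(w * x + b) by plain gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def breakdown_probability(x: float, w: float, b: float) -> float:
    """Predicted probability of a breakdown for one reading."""
    return sigmoid(w * x + b)

def should_alert(probability: float, threshold: float = 0.7) -> bool:
    """Alert only when the predicted probability exceeds the chosen threshold."""
    return probability > threshold

# Hypothetical history: vibration level (mm/s) and whether a breakdown followed.
vibration = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
broke_down = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(vibration, broke_down)
for reading in (1.2, 2.8, 4.4):
    p = breakdown_probability(reading, w, b)
    print(f"vibration {reading}: p(breakdown) = {p:.2f}, alert = {should_alert(p)}")
```

Raising the threshold reduces false alarms at the cost of missing some genuine breakdowns, so the right value is a business decision as much as a technical one.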
Neural networks use multiple iterations to predict the chances. They are the most common type of algorithm in deep learning. Unlike regression, which finds a single line of best fit on which to base its rules, neural networks self-optimise by passing over the training data many times with the aim of producing a more accurate solution. Neural networks typically require far more data, and given enough of it they can achieve greater accuracy; the downside is a decrease in explainability. There is currently a lot of money and research going into increasing explainability in these more sophisticated classification models in order to build trust.
In summary, decision trees and regression models are better for insights. They can be used in autonomous situations, however they usually require some form of human supervision. They are often used to inform, and a human assesses whether to take action.
Anomaly detection is particularly relevant in manufacturing for detecting unusual values in rich real-time data that indicate how your machinery is working. Deviations from the expected range may cause quality or production issues, or may indicate a breakdown or a significant risk to operators.
There are three types of unusual values: point anomalies, contextual anomalies, and collective anomalies. Point anomalies are the easiest and least sophisticated set of algorithms or rules to implement, while collective anomalies are the most sophisticated and least likely to generate false positives.
Point anomalies are simply values that fall outside the expected range. They are unusual in contrast to the whole dataset. This approach has been used for years in classic monitoring, as it can be as simple as triggering an alert for values outside the 95th percentile. However, it can trigger false positives in cases where values are cyclical or have a wide range.
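The percentile rule can be sketched in a few lines. This example assumes a nearest-rank percentile computed from a historical baseline and only checks the upper limit; the pressure readings and function names are invented for illustration.

```python
import math
from typing import List

def percentile(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted list (pct between 1 and 100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def point_anomalies(history: List[float], new_readings: List[float], pct: int = 95) -> List[float]:
    """Return the new readings that exceed the chosen percentile of the history."""
    limit = percentile(sorted(history), pct)
    return [r for r in new_readings if r > limit]

# Hypothetical baseline of pressure readings (bar) from normal operation.
history = [4.9, 5.0, 5.1, 5.0, 4.8, 5.2, 5.1, 5.0, 4.9, 5.3,
           5.0, 5.1, 4.7, 5.2, 5.0, 4.9, 5.1, 5.0, 5.4, 5.0]

print(point_anomalies(history, [5.0, 5.2, 6.1, 4.9, 5.35]))
```

Because the limit is fixed for the whole dataset, a reading that is normal at one point in a cycle and abnormal at another is treated the same way, which is where the false positives come from.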
Contextual anomalies are monitored based on the context of the neighbouring values, or the last few values that occurred before. For example, in a set of values that are steadily increasing, perhaps RPMs, a sudden drop of 50% might appear normal in contrast with the whole dataset, but would be an anomaly that's worth pointing out at that moment. Contextual anomalies build from recent occurrences, so they give you far fewer false positives.
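One simple way to express "context" is to compare each reading with the average of the few readings just before it. The sketch below does that, with the window size, the 35% tolerance and the RPM series all chosen as assumptions for the example.

```python
from typing import List

def contextual_anomalies(values: List[float], window: int = 3,
                         max_change: float = 0.35) -> List[int]:
    """Return indexes of readings that differ from the mean of the previous
    `window` readings by more than `max_change` (as a fraction of that mean)."""
    flagged = []
    for i in range(window, len(values)):
        baseline = sum(values[i - window:i]) / window
        if baseline and abs(values[i] - baseline) / abs(baseline) > max_change:
            flagged.append(i)
    return flagged

# Hypothetical RPM series: steadily increasing, with one sudden 50% drop.
rpm = [700, 800, 900, 1000, 1100, 1200, 1300, 1400, 700, 1500, 1600]

print(contextual_anomalies(rpm))
```

The dropped value of 700 also appears at the very start of the series, so a rule based on the whole dataset would treat it as normal; only the comparison with its recent neighbours marks it as unusual.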
Collective anomalies give you a retrospective view of unusual values over time. While they are harder to detect in real time, and less useful for real-time applications, they are great for detecting quality defect patterns, fraudulent transactions or tampering with systems.
Pattern detection can be used for identifying groups in data that doesn't currently have labels, or we can use it to find associations or correlations between different things.
K-means clustering is the most common and straightforward clustering method, where K represents the number of clusters you ask for. It works by assigning each record to the cluster whose mean it is closest to, then recalculating the means and repeating until the groups settle. This is a form of data mining that can uncover related groups you didn't know existed; for example, it can be applied to finding new customer groups.
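The assign-then-recalculate loop looks roughly like this. It is a bare-bones sketch: the customer figures are hypothetical, the starting centroids are simply picked by hand, and a fixed number of iterations stands in for a proper convergence check.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def dist2(a: Point, b: Point) -> float:
    """Squared straight-line distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean_point(points: List[Point]) -> Point:
    """The mean (centroid) of a non-empty list of points."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))

def k_means(points: List[Point], centroids: List[Point], iterations: int = 10):
    """Repeat: assign each point to its nearest centroid, then move each centroid
    to the mean of the points assigned to it."""
    clusters: List[List[Point]] = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical customers: (orders per year, average order value in thousands).
customers = [(2.0, 1.0), (3.0, 1.5), (2.5, 1.2), (20.0, 9.0), (22.0, 10.0), (21.0, 9.5)]

centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
for centre, members in zip(centroids, clusters):
    print(f"cluster around {centre}: {len(members)} customers")
```

Here K is 2 because two starting centroids are supplied; choosing K, and choosing sensible starting points, are both decisions that real implementations handle more carefully.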
Hierarchical clustering is a multi-level grouping of records, resulting in a hierarchy of clusters. It’s similar to a decision tree, but instead of optimising for an outcome or label, it groups similar values in clusters that are as dissimilar from other clusters as possible. The best thing about a hierarchical clustering model is the ability to harvest a group at any level. For example, the first split may only have three groups, but the final split could have 300. This is great for categorisation and building a taxonomy of records.
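The ability to "harvest" groups at any level can be shown with a small bottom-up (agglomerative) sketch. It assumes one-dimensional data and single linkage, meaning two clusters are as close as their nearest members; the cycle times are invented, and real tools support other distance measures and linkage rules.

```python
from typing import List

def agglomerate(values: List[float], n_clusters: int) -> List[List[float]]:
    """Start with every value in its own cluster and keep merging the two
    closest neighbouring clusters until only `n_clusters` remain."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > n_clusters:
        # With sorted 1-D data, the closest pair of clusters is always adjacent,
        # and their single-linkage distance is the gap between their edges.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

# Hypothetical cycle times in seconds for a batch of parts.
cycle_times = [10, 11, 12, 30, 31, 55, 56, 58]

print(agglomerate(cycle_times, 3))  # a finer level of the hierarchy
print(agglomerate(cycle_times, 2))  # a coarser level of the same hierarchy
```

Each coarser level is formed by merging groups from the finer level beneath it, which is what makes the result usable as a taxonomy.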
Associations identify co-occurrences and correlations. This relates back to the Walmart example of buying beer and diapers together, or the Netflix recommendation engine that suggests what to watch next. The association method can help to find patterns that show what is normal and why things deviate from it. It can be used to optimise your inventory, stock or prices based on what you have, which can be really useful for inventory management or upselling products.
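A minimal sketch of how co-occurrence can be measured is shown below. It uses two standard measures: support (how often a pair appears together) and lift (how much more often than chance, with values above 1 suggesting a real association). The baskets are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

def pair_stats(baskets: List[List[str]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Return {(item_a, item_b): (support, lift)} for every pair seen together."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(pair for basket in baskets
                          for pair in combinations(sorted(set(basket)), 2))
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # Lift compares the observed co-occurrence with what independence would predict.
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        stats[(a, b)] = (support, lift)
    return stats

# Hypothetical shopping baskets.
baskets = [
    ["beer", "diapers", "crisps"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "diapers", "milk"],
    ["bread", "crisps"],
]

support, lift = pair_stats(baskets)[("beer", "diapers")]
print(f"beer + diapers: support {support:.2f}, lift {lift:.2f}")
```

The same calculation works on any set of things recorded together, for example spare parts ordered on the same work order or faults logged in the same shift.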
Critical Infrastructure
In order to start using these techniques, you first need to invest in some critical infrastructure that will underpin it all. Let’s take a closer look at what this involves.
You need to be investing in your data. You should treat your data like any other asset in your organisation. Data must be maintained and it must adhere to quality processes, documentation and compliance considerations in order to remain trustworthy, useful and cost-effective. The following four areas of data collection are particularly relevant to manufacturing, however it’s likely that your business generates tonnes of data every day, so you will want to store and explore its value.
Sensors are a great way to collect data from equipment old and new in your operating environment. Take a look at our previous session called The Historian and AI for a deeper view of how you can collect data from your historian appliances using sensors in a consolidated manner.
ERP and logistics systems hold a wealth of data, from metadata about your product to the fleet that ships it or even who your staff are. ERP data is usually vital for making sense of what’s going on in the right context. Making sure you get good quality data from your ERP doesn't just help the business perform better day-to-day, but will also help build these bespoke machine learning tools to solve business problems.
Events refer to incidents that are not continuous, such as a breakdown, repair or an accident. You may pick these up through your sensor networks or spreadsheet logs. It is important to log events and the conditions in which they occurred so that you can make links to causes and use this to make predictions for future events.
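Linking an event to the conditions in which it occurred often comes down to finding the latest sensor reading taken before the event. The sketch below shows one way to do that lookup with Python's standard `bisect` module; the timestamps, temperatures and the `conditions_at` helper are all hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional

def conditions_at(event_time: datetime, reading_times: List[datetime],
                  readings: List[float]) -> Optional[float]:
    """Return the most recent reading taken at or before `event_time`,
    or None if the event predates every reading. `reading_times` must be sorted."""
    i = bisect_right(reading_times, event_time)
    return readings[i - 1] if i else None

# Hypothetical sensor log: one temperature reading every 15 minutes.
reading_times = [
    datetime(2020, 6, 1, 8, 0),
    datetime(2020, 6, 1, 8, 15),
    datetime(2020, 6, 1, 8, 30),
    datetime(2020, 6, 1, 8, 45),
]
temperatures = [70.1, 72.4, 88.9, 91.2]

breakdown_time = datetime(2020, 6, 1, 8, 40)  # hypothetical logged event
print(conditions_at(breakdown_time, reading_times, temperatures))
```

Pairing every logged breakdown or repair with readings like this is what produces the labelled examples that classification models learn from.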
Third party data can be used to help you maximise the value of your solutions. Looking at Walmart again, they factor in weather forecasts to their just-in-time logistics process for delivering goods, meaning that if there is a heatwave coming, they ship more warm weather foods like ice cream, beers and BBQ items. This inventory optimisation leads to increased sales. In manufacturing, third party data like weather, supplier data and customer returns can be integrated into your solutions to give a richer perspective on your business.
The rise of data has meant we now collect it from a number of sources and systems, and at different frequencies (e.g. daily weather forecasts vs continuous data from your machinery sensors). Matching this up in a classic database can be very difficult, particularly when dealing with rapidly changing data with potentially changing attributes.
As you grow or encounter new challenges, you might need to make changes to your repository model or reconfigure its structure, which is easy to do in a data lake but could be costly to do with a database. For example, with Covid-19 challenges, you may need to collect metadata around staff and social distancing measures that you didn’t before. Data lakes allow you to capture more types of data, structured or unstructured, and the data does not need to be prepped at the time of storage. It can be called upon later and reconfigured only when needed.
You should always exercise caution when using data lakes, as your data is an important asset and you don’t want to make a mess of it by treating it carelessly. You should use recommended structures and maintain a central data catalogue which will capture metadata on the various sources, make your data accessible and enforce data governance.
When done right, a data lake is a low-cost way to consolidate data from lots of sources without substantial set-up time.
Your analytical maturity
Where you stand with critical infrastructure depends heavily on your current data capabilities. The great news is that wherever you are on this scale, you can advance from spreadsheets to edge AI in a matter of years.
If you’re at a stage where you’re working on spreadsheets, jumping to machine learning isn’t what we would advise next. A better idea at this stage is to start using software that has AI built in. This way you are leveraging other people’s data while setting out on a journey to become more data-savvy. You might start picking up on data quality issues caused by a generic solution which isn’t a perfect fit for your problem.
If you’re collecting data on a database and have access to developing or coding capabilities, you can start adding some off-the-shelf APIs to do things like adding captions to images taken by field engineers. This level still relies on other people’s models, but you start enriching your own data to improve processes.
If you're using connected devices to collect data and have accessible ERP data, you’re ready to move on to pre-trained AI analytics, which can cover things like predictive maintenance. There’s a range of tools you can use, such as Power BI, Azure Machine Learning and DataRobot, to start experimenting with different models. This can be done by upskilling people who are not necessarily specialists but can follow guided workflows to use these tools for things like understanding the key drivers behind machinery breakdowns.
If you’re at a stage where you’re ready to build your own bespoke analytical models, you either have or need to acquire people who understand the mathematical principles behind the different types of models, structures and data management options to be able to guide you to solutions that are well tuned to your situation.
Finally, once you're comfortable with generating AI and machine learning models to drive insight and inform business decisions, you can move on to edge AI, training these models and then actually deploying them on the device. This gives you real-time monitoring of your machinery and the potential to start fixing problems immediately or have alerts sent. This requires good quality data and a foundation of trust in the data which comes from creating a positive data culture.
Your process
With these technical details in mind, you can approach your business problems in a new way. You understand the types of solutions that can be built, ways to work with your data and how to prepare it to be able to support machine learning processes. There are endless use cases inside manufacturing from improving safety to lowering costs, and understanding these processes helps you understand the possible solutions, even if you’re not the one implementing them.
Your process begins by identifying a business improvement, problem or challenge, and setting a goal against it, e.g. a critical measure for optimising profits, reducing overheads or increasing safety. From there you will need to assess your internal skills and build a prototype. If this goes well you look at how to put that into production and how to scale this concept across your organisation.
Fundamentally, building models is like any R&D effort, taking you from a business need, to requirements, to initial design and implementation followed by iterations.
AI Readiness and Back to Work Readiness
As previously mentioned, we have a list of other manufacturing resources to help you learn more. As an organisation, our passion is helping businesses adopt artificial intelligence and other emerging technologies. Interestingly enough, we believe that it’s not so much a technical problem, but the focus should be on transforming your people and processes to have everyone understanding and engaging with data in a way that benefits the whole organisation.
We resolve the people and process challenges for businesses to successfully adopt AI by upskilling staff, building your AI strategy and providing access to experts to work with you on building your bespoke solutions. Get in touch to discuss accessing our services to kickstart your transformation journey.
Don't miss our guest sessions from the series that help businesses think about some of the larger concerns about using AI in the company. From an overall AI Readiness perspective, Clare Dillon (NuaWorks) and Ashwini Mathur (Novartis) talk about building trusted AI products and embedding this capability throughout the organisation. Then Matt Macdonald-Wallace (Mockingbird Consulting) and Dr. Iain Keaney (Skellig.ai) look at the use of IoT and privacy respecting data science to help businesses operate in the post-COVID19-lockdown world.