Predictive Maintenance & Monitoring using Machine Learning: Demo & Case study (Cloud Next ’18)


MANJU DEVADAS:
Welcome, everybody. Hope you had a good
morning since morning, and I hope many of you
attended the keynote session. Let me introduce myself, and
I also have my co-speaker, and we’ll quickly give
his introduction as well. My name is Manju Devadas. I’m the founder, CEO of Pluto7. We are one of the key
partners for machine learning and AI implementation
for customers on GCP. And it’s my honor
and pleasure to be co-speaking with
Prashant Dhingra, so he’ll be joining
onstage very shortly. And one thing
I want to mention about him is, he is one
of the people at Google I’ve actively
followed for a while. And many of you may have
seen him on YouTube, at various sessions. Very insightful. Buckle up your seats if
there are seat belts. You’re going to see some
pretty incredible demos and walkthroughs of
real world problems that are being solved on
Google Cloud with machine learning and AI. So before I pass it
onto him, I would like to mention a
few things and also walk you through one of the
case studies that we solved. What we essentially do is
go to customer sites and look at, from an
innovation angle, what can Google machine learning
and AI technologies do for solving their
real world problems. I’ll go into one case study. There’s also a
deeper session later in the day on this case study. So while they’re bringing
up the right slide, let me give you some
introduction to what I’m going to be talking about. Improving the taste of beer. What do machine
learning and AI have to do with improving
the taste of beer? That’s the question
that was posed to us less than six months back. And what I’m going
to walk you through is our work with one of the
world’s largest breweries. I’m pretty sure eight out of
10 beer drinkers in this room drink that beer. And I can’t tell you the name. You’ll figure it out eventually
in the later session today. Essentially, they said, OK,
we think machine learning and AI can do something. But we don’t know exactly what. You’re the supply chain
manufacturing domain expert. Come and show us what it can do. In short, when we solved it,
it was an eye-opener for them. Not only saving
millions of dollars, but also improving the taste of
beer, which is a good thing. Having said that, I’ll walk you
through that a little bit more. If you saw the keynote
from [INAUDIBLE] this morning and
the demos, those were some of the very
initial examples. There are tons and tons
of examples of use cases being solved with prebuilt
machine learning models, AutoML, custom ML
models and so on. So I’ll walk you
through one example. And then, of course,
during Q&A, we’ll have a lot more to
speak about and answer some of your questions. When it comes to
predictive maintenance and preventative
monitoring, Prashant is going to walk you through
some of the deeper demos. But let me set the context
and a little bit of thinking. Machine learning and AI,
why is it a big deal? If I had told you in ’93 or ’94, in
an enterprise setting, that the internet is going to
change your world, the reaction would have been, yeah,
connecting computers would be useful, but
I’m not sure how it’s going to change my business. Today, nobody needs
to explain how the internet changed your business. If you take machine learning and
AI, in my simple explanation, we rely on computers to do
a lot of different things– computations, storage,
organizing, searching, finding, but ultimately, the
decision-making we humans want to have and control. Just like driving the
car, it’s very hard for us to give complete control
to the driverless car. It’s the same thing in
enterprise decision-making, whether it’s invoice
processing, or replacing a part, or planning a
shipment and so on. We want to be in control. When you break
down these problems and allow a machine to
do it, in some cases, it might do a better
job and help us. So it is this
decision-making that we are talking about when
we say let the machine crunch through the numbers,
look for the patterns, and make better
decisions than humans. Now, I just spoke to
you about beer, one of the most consumed
beverages apart from water. When we said we’ll
improve the taste of beer with machine learning
and AI, it was something that everybody wondered. What would that be? What has that got
to do with taste? Now let me go a
little bit into the detail. So as most of you may
know, creating a beer involves fermentation
in a kettle and then filtering the beer to
get it into your bottle or can. And for that, when is the
beer ready to transport it into a bottle? It is mostly human
judgment in most breweries. It is looking at the color
of the beer or the particles and what’s called turbidity. There are few things
that humans are involved. In this particular
brewery, there was a 30-year
experienced brewmaster who made the judgment. We’ll talk about his
accuracy level in a minute. But essentially, the problem
we are solving here is this: the beer flows through
the kettle, passes through the filter, and
then, the clear beer comes out of the other
side of the filter. And then, it gets bottled
or stored in a can. Now when you
replace this filter, here is the key problem. The problem costs millions of
dollars when it’s done wrong. So as you’re processing the
beer, as you are bottling the beer, the
color of the beer (there’s good beer and bad beer) turns from good to bad to worse. You want to catch it
right before it gets bad. Ideally,
you want to catch it at exactly the right time. And now this is
human decision-making, looking at the taste
and color and so on. And then you say, oh,
the filter has gone bad. Now, let’s replace it. You replace it too early,
it’s not good, or too late, it’s not good. Essentially, if you
replace it too early, you’re replacing a
filter that costs– it’s not just the filter cost. You bring the production down. You’re doing it at
Monday at 11:00 AM, which was not expected. Now your labor, and then your
transportation, your trucks, and so on. This one brewery makes 100
kegs of beer every month. So think about the
magnitude here. So in short, when a
brewmaster made the decision on when to replace the
filter, he was 60% right, which means 40% of the time
either the filter was wrong, or he made a bad decision. Essentially, he
got it wrong 40% of the time. Now how did he
make this decision? He made his decision through
his 30 years of experience. He made his decisions
through, OK, I look at temperature,
pressure, turbidity. There are four or five
different key data points, which he believed. I call it human bias. And for the most
part, they are right. But sometimes they’re wrong. And that’s kind of what we want
machine learning and AI to do something for us. And there is no magic here. It is more number
and data crunching. And when we say more
number and data crunching, you look for data patterns
in your neglected data. When I say neglected data, I mean
data that you neglected, because it’s too hard for you
to look across all the columns, across all the rows and
identify the patterns. Or it’s humanly not
possible for you to identify the perfect
scenario where the data patterns occur, such that it’s telling
you the filter is wrong. In the ERP, they had
200 columns of data. And they relied on only
three columns of data, while the rest held tons of
useful information that they couldn’t use. Now what’s the difference
with what we did? Again, like I said,
there is no magic. We just looked through all the
available, reasonably meaningful columns, which is more commonly termed
feature engineering. We identified the columns
that are more relevant. And we built a machine
learning model on GCML. Now OK, machine learning
model, in case anybody is new, in my simple
terms, it’s just mimicking the simple
decision-making process of beer filtration replacement. Take the decision,
mimic that into a model, and deploy it on GCML. It’s really as simple as that. But again, there is
a lot of complexity you’ll appreciate when
you go into solving more and more business problems. So now all of these
things when it is done, these are not done over a
three months, six months, or a year-long project. There are experimentations
done in weeks. And you have to show
the results, results that they can believe in. And the best way
is when they ask me to fly back and
present about the results, it was not me presenting. It was the brewmaster presenting
that I can’t beat this model. It’s too good. Now we need to take
it to production. So essentially, it’s not
just building a machine learning model, but
it’s also making sure that your stakeholders
believe. Many of them are new to machine
learning and AI. They need time to comprehend. At the end of the day,
you’re tying your machine learning models, your
processors, and the information that you find together into
a machine learning model that you deploy. These are pretty much
the high level steps of machine learning
model deployment. When you look at this
on the left side, there are really
two main components to deploying the model. First, you do it locally,
training and applying the model. I won’t go into the details
of training and deploying these models, because
there’s another use case that Prashant is
going to share, where you’re going to get a
flavor for what it looks like when a model is running. So in other words,
preventive maintenance is one of the key topics that
many companies around the world are watching very closely. Because there are numerous
decisions that gets made in the manufacturing
and supply chain world. And with that,
what you are really talking about as
direct ROI impact in the form of saving money,
increasing productivity. So with that, let me pass it on
to Prashant who has some very interesting demos to show you. Prashant. PRASHANT DHINGRA:
Thanks a lot, Manju. [APPLAUSE] Thanks a lot, Manju, for
the great case study. My name is Prashant Dhingra. I’ll walk you through two cases. We’ll showcase a
case study, like how you can deploy predictive
maintenance model on a [INAUDIBLE] data set. And we can also
showcase, like yesterday, where you will use a river data. And we will see a scenario
how you can predict the water flow in a river. So the common use cases
for predictive maintenance are that companies want to
predict which machine or which device
is going to fail. In machine learning
terms, we call this a classification problem, where
you have a set of sensor data. As Manju showed you, it was
a classification problem, like whether this filter
is spoiled or not. So which machine will fail? Which device will fail? Or which car will fail? This is one
kind of scenario. The other kind of scenario
in predictive maintenance is: what is the remaining
life of a machine? So if you have an
oil rig, or if you have an engine in
an aircraft, what is its remaining life? If you have a battery,
what is the remaining life of the battery? This is the second type of
machine learning scenario. We call these
regression scenarios. The other, more
advanced scenario we use in machine learning is
called an optimization scenario. Generally, a human looks at
the machine learning output and makes a decision. When you are a mature company, you can go further. Within Google, we
looked into our data centers. And once our machine learning
model got mature, instead of a human making a decision,
we let the machine learning model make the decision itself. For example, in Google data
centers, we saved 40% of energy by using reinforcement learning,
where the machine learning model makes a decision, like how
much cooling power to use. So the first step is you build a
classification or regression problem for determining what is
the remaining life of a machine or whether this
machine will fail. Once you achieve
maturity, then you start building
optimization scenario. Then there is a fourth
type of scenario. Many companies don’t
have labeled data. If you have labeled
data, like historical data, and when a machine failed
as a [INAUDIBLE], you can build a classification
or regression model. Many times, a company simply
wants to identify patterns, like where the anomalies are. Sometimes a company doesn’t
want to find anomalies, but wants to
create a benchmark, like if millions of
vehicles are used, whether these vehicles are
used in the right way or not. If there are a lot of aircraft,
which aircraft landing is the right landing, and which
aircraft landing is an anomaly? So you can create benchmarks, and
you can also detect anomalies, without using labeled data. That’s the fourth
type of scenario. There’s a fifth type
of scenario also. I recognized this scenario when
I was working with a customer. Sometimes it is not possible
to identify whether the device will fail or not. But many times, you are
interested in the outcome. For example, if
you are measuring the amount of water flow in the
river and the device fails, you can make a
machine learning model to predict when the
device will fail. But when the device
fails, the water is still flowing in the river. The same thing also
happens in industry. Sometimes the
sensors fail, but the data is still getting generated. So can you predict
the water flow when the measuring device
for water flow fails? So there’s another
type of scenario. We are calling it predictive
monitoring, or a virtual sensor. You might be using a
different terminology. So this is the fifth
type of scenario. So depending upon your
need, your customer need, brainstorm what
kind of a scenario is the right scenario,
which scenario will give you a right value proposition. Accordingly, make a decision
what kind of a machine learning model you want to build. When you have decided which
use cases you want to go after, the common problem
in machine learning is how you will
collect sensor data. So here, we are talking
about the machine learning for IoT scenario. Three key challenges are
collect data from your sensors or from equipment,
create features. You create features so that
you bring data into a shape where the algorithm
can recognize it and algorithm can work on it. Deep learning,
generally, is very good in working with
a data set, where even if you do not have the
right number of features, it can identify features itself. But generally, you want
to create features, so that you bring the
data in a good shape. Once you bring the
data in a good shape, then you can select an
algorithm and build a model. So going back again,
determine the right scenario for your customers. Once you determine
the right scenario, ensure you can collect the data. Generally, we talk about
defining a scenario. When you work on the
machine learning model, you should try to
convert a business use case into a machine
learning use case. What I mean by
that is, generally, we will say that
here is a use case. We want to predict
whether there would be a battery failure, whether
there will be a car failure or not. But define in the
use case how you will use the output of your
machine learning model. What is the definition
of breakdown? Because breakdown sometimes
means device fails. Breakdown sometimes means the
device is generating more heat. Breakdown sometimes means it is
working at 70% efficiency, and it is not fully operational. Sometimes breakdown
means it is producing more vibration or sound. So define what is your
definition of breakdown, which you want to avoid. Then define what kind
of signals or patterns you have that
show that degradation. And determine how often
you have been collecting signals and how much
normal and failure data you have. Once you have all this
data, put those details into a use case. Because at that point in
time, your data scientist should be able to make the
right decision, like what is the definition of breakdown,
what data set you have, what you are trying
to predict, and how much of normal and
failure data you have. Then take it further. Convert a use case
into a hypothesis. For example, many times
you want to predict whether this device will fail. Sometimes you want to predict
whether this device will fail in three weeks,
one week, one month. So you want to do it
in multiple periods. So define that period. Sometimes you want to
predict whether this car will fail due to a battery
problem or a starter problem. Same thing– the
machine will fail because of part x or part y. So convert you use case
in the form of hypothesis. And once you have converted
into form of the hypothesis, then do the data
exploration exercise to determine whether your
data set and the use case are right for this use case or not. And I will show you example,
where we do two demos. And we will do the data
exploration for both. So these are the
general steps we go through when we build a model. We define a use case. We convert that use case into
hypothesis and [INAUDIBLE] use case. Then we do the data exploration. In data exploration, many
times, you make a decision that, yes, use case
and data set match, and we can proceed
and build a model. Then you select an algorithm,
you build a pipeline, and after you apply
the algorithm, you will have a model. And then you iterate
on improving the model performance. Then you present the
results to the business and make a decision whether you
want to take it to production or not. And if the business is happy, you
put the model in production and then start monitoring it. And this cycle continues,
because your data patterns will change over a period of time. So you continue to monitor it. Many times when you
do data exploration, you realize that you do not have
the right data or this use case is not the right use case. At that point, you go back
and make a decision whether you need
different data or whether you need more
data. And then you collect that data. Sometimes you decide
to change the use case. Then you work with your business
to define a different use case altogether that
can be built on your data. For example, I gave
you a river example. Say there is a
gauge in the river, and the gauge
breaks because a tree is
floating in the river. If somebody takes
historical data on how often the gauges break, it won’t help, because a
tree floats down the river at random. So you can’t build a
predictive maintenance model. So you go back to your
business and define what kind of other use
case you can build. Then you can think through
another use case, which we’ll show you in the second demo:
you can define a new use case where, if a gauge fails, how
can I still predict the water flow? And then you have 12 months or
six months to replace the gauge. So let’s go into two demos. We’ll take two examples. We’ll take a predictive
maintenance example. And the other one is
predictive monitoring, which is a modified version
of predictive maintenance. When you build a predictive
maintenance example, let’s say you have defined a
use case where you have oil rigs or you have aircraft. And you have sensor data
coming from those aircraft. You want to determine
when this aircraft or when this oil rig will fail. So you want to look for
degradation patterns in the data exploration. For example, speed,
efficiency, and pressure drop when a machine gets
older or an engine gets older. So for example, on day
one, the speed will be good. When the machine is getting
older, the speed will be lower. Same thing with heat,
noise, and vibration: they are generally lower
in a new machine. And as the machine gets
older, they start increasing. So look for such patterns. And when you look
for such patterns and you see the real evidence,
then you can make a decision. And you can be more
comfortable that you can build a machine learning model. The example here is that we
used a NASA data set. This NASA data set is
about a turbine engine. It’s showing you four machines
in four different colors: yellow, green, blue, and red. They are failing at
different points in time. You will see that the sensor one value
doesn’t change between day one and the day the
machine fails. So it doesn’t have any pattern. But for sensors two, three, and four,
as the engine is getting older, their values are
rising steadily. And actually, after half
of the life of the engine, the values are
rising more rapidly. So this gives us a confidence
that if we use this data, we can predict the
failure of an engine. So if you see this kind of
pattern in your machine or in your engine, it
gives you confidence that the data exploration
phase went well. And now, you can go ahead and
proceed building the model. So whether you take an
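
The data exploration check described here (flat sensors carry no signal, steadily rising sensors do) can be sketched as follows. This is a minimal illustration, not the method used in the demo: the function name `trend_strength`, the sensor names, and the readings are all hypothetical, and Pearson correlation against reading order is just one assumed way to score a degradation trend.

```python
import math


def trend_strength(values):
    """Pearson correlation between reading order (0, 1, 2, ...) and sensor value.

    Close to +1 or -1 suggests a steady rise or fall as the engine ages;
    0.0 is returned for a flat sensor, which carries no degradation pattern.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    var_y = sum((y - mean_y) ** 2 for y in values)
    if var_y == 0:
        return 0.0  # constant readings: nothing to learn from this sensor
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings for one engine, one value per cycle (made-up numbers).
sensors = {
    "sensor_1": [518, 518, 518, 518, 518, 518, 518, 518],  # never changes
    "sensor_2": [642, 642, 643, 643, 644, 646, 648, 651],  # rises with age
}
trends = {name: trend_strength(vals) for name, vals in sensors.items()}
useful = [name for name, t in trends.items() if abs(t) > 0.8]
print(useful)
```

Under these assumptions only `sensor_2` passes the threshold, which mirrors the observation that sensor one shows no pattern while the other sensors rise as the engine ages. The 0.8 cutoff is arbitrary.
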
example of an oil rig, or whether you take an
example of an aircraft, you can build similar model. So for example, if there
are number of aircrafts, and if we click
on an aircraft, we see what engines are installed. And then you can see the
data coming from that engine. For example, there’s a turbine. There’s the nozzle pressure,
temperature, and fan speed. This is the real-time data. You will see that every second,
it is getting refreshed. Here, there is
historical data. And here, you can see whether
there are any anomalies or not. This is a traditional big
data and IoT solution. On top of that,
there’s a machine learning model that predicts
what is the remaining life of the engine. Generally, the
life of the engine is measured in terms of cycles. So it shows that there are 46
cycles left for this engine. So what is happening
behind the scene is we’re getting the
data from engine. And as we get the data,
it’s shown on a dashboard, like what is the health of the
current data, which is good. And using the data set,
the machine learning model makes a prediction. What is the remaining
life of an engine? So as I mentioned to you
in an earlier example, first define a use case. This case is about
predicting the remaining life of an engine. It’s a regression problem. Then we looked into
the engine data set. And when we looked into
the engine data set, we saw there were three
or four sensors where the values were normal. And as the engine got older,
the values were rising steadily. So because there is a degradation
pattern in those sensors’ data, using those sensors’
data, we were able to build a
machine learning model that can very accurately
predict the remaining life of an engine. And you can use the same
concept in other domains also. You can use it in an oil rig. You can use it
in machinery. So if you have a data set coming
from your machinery or engine which shows a degradation pattern,
you can build a model easily. This is the real
data set, where you can see the data shows a
pattern that the values are rising for these four sensors. Sensor two, three, four, and
seven, for three sensors, it is continuously rising. And four sensors,
it is coming down as the engine is getting older. So we talked about a use case. We talked about a data
exploration exercise. So what are the best
practices for data collection when you are building a
predictive maintenance model? These are generally
comprehensive data set or data attributes
you can collect. So if you have an IoT data,
knowing about the IoT data, like whichever type
of device you have, it might be giving you
temperature or heat, noise, vibration, voltage. Or it might be
sending you images. Generally, the IoT data is very
powerful in making a prediction whether that device
will fail or not. So time series data is the
most powerful and more useful. When you combine the data
with the static data, like what is the make and
model of an engine, what is the configuration
and build or a software, what is running on that engine,
combining that static data with a time series data gives
you a very powerful overview. For example, in this case
also, when we built a model on the NASA data set, we had
an error of 45 RMSE (root mean square error). But when we combined that
with the static data, which are the operational
characteristics of that engine, we were able to
reduce it from 45 to five. So the error
dropped drastically. Having IoT data is great,
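
For reference, RMSE (root mean square error) is the square root of the average squared gap between actual and predicted values. The sketch below computes it for a remaining-life prediction; the function name and the numbers are hypothetical and are not the NASA results quoted in the talk.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical remaining-life values in cycles (made-up numbers).
actual_cycles = [100, 80, 60, 40]
predicted_cycles = [90, 85, 70, 40]
print(rmse(actual_cycles, predicted_cycles))  # errors 10, -5, -10, 0 -> 7.5
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones, and the result is in the same unit as the target (here, cycles).
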
because it’s very powerful. Combining that with the static
data makes it more powerful. Many times, depending upon
your domain knowledge, you also want to put
usage history data. For example, if
there are two buses– one bus is used to take 20
people to office every day. Another bus is used in a
crowded place, where it takes 100 people to office every day. Second bus is likely
to fail more often. So knowing how many miles
a car has been used, how many hours it has
been used, or how much was the load on that car or
machine every time, generally helps you in making
a prediction. If you take another
example, if you are trying to predict
battery failure, if you have a car at home
which you start four or five times in a day versus
if you are a contractor, and if you start your car
50 times in a day when you go from different
homes and deliver things or fulfill some service
and move to another home, second car battery is
going to fail more often. So knowing the usage
history data also makes your model very powerful. And if you do the
maintenance on your parts, knowing about when was the
last maintenance done, when was the last service done,
adding that data set also into this data set also makes
your model more powerful. Not every company will have
all four of these data sets. But if you’re
planning to build predictive maintenance, plan
to put together such data sets. Time series
data is most powerful. Combining it with your
equipment detail data or operational
characteristics data generally makes very good models. Depending upon your
domain, if the load makes a big difference, have
a data set about the load. And having maintenance
data also helps. So with this, you can
build a powerful machine learning model. So once you have
defined a use case, you have done the
data collection. What are the next
steps for building a machine learning model? So in predictive maintenance,
one of the common problems is you need to have a label. And many times, you
need to create labels. So the example here is,
let’s say you get signal data on different days. Or you might be getting
it at different hours every week. So when a machine actually
fails, that is the final label. But if your goal is to predict
failure one week in advance, you need to tag
the one week of data before failure as labeled data. That is your failure data. So that is your positive label. And the remaining data becomes
your negative label. So that is additional work. If you do this exercise,
then your model will become more powerful. So depending upon your use
case and how far in advance you want to predict the
failure, tag that much period before failure as your label. Similarly, if you are trying to
predict what is the remaining life of a machine, you can
easily build a deep learning model where you
have a final failure and you have the sensor data. But if you can tag the
data at various places, like for example, how the signal
was looking at 20% life, at 40% life, at 60% life, your model
becomes much richer. So if you’re not getting a
good result in iteration one, you can try to create such labels. And then you will be
able to see good result. Once you have a data
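
The two labeling ideas described above can be sketched in a few lines: tagging the window before a failure as the positive class, and deriving remaining-life labels from a run-to-failure history. This is a minimal sketch under assumed conventions; the function names, the integer reading index, and the example sizes are hypothetical.

```python
def failure_window_labels(num_readings, failure_index, horizon):
    """Classification labels: 1 for readings within `horizon` steps before
    (or at) the failure, 0 for everything earlier."""
    start = failure_index - horizon
    return [1 if start <= i <= failure_index else 0 for i in range(num_readings)]


def remaining_life_labels(num_cycles):
    """Regression labels for a run-to-failure history: for each cycle,
    (remaining cycles, fraction of life already used)."""
    if num_cycles < 2:
        raise ValueError("need at least two cycles")
    last = num_cycles - 1
    return [(last - i, i / last) for i in range(num_cycles)]


# Hypothetical machine: 20 daily readings, failure on the last day,
# and a goal of predicting failure one week ahead.
daily_labels = failure_window_labels(num_readings=20, failure_index=19, horizon=7)
print(daily_labels)

# Hypothetical engine that ran for 5 cycles before failing.
print(remaining_life_labels(5))
```

Changing `horizon` is how a three-week, one-week, or one-month hypothesis would be expressed in this sketch, and the life fraction gives the 20%, 40%, 60% tagging points mentioned in the talk.
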
set, and then you have created the right
labels, then the next step is to create features. Depending upon your
domain, you need to select what kind of
features you need to create. But these are standard
features: you sometimes create the minimum, or
maximum, or count, or sum of various
attributes to determine whether those features
have a pattern. Sometimes you use tumbling
averages or moving averages because they are
generally good at showing short-term or medium-term
patterns of failure. So you can create tumbling
averages and rolling averages. Depending upon your
domain, sometimes you have to create different
types of features. So once you have a use case,
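
As an illustration of the window features mentioned here, the sketch below computes a rolling (moving) average, which slides one reading at a time, and a tumbling average, which uses non-overlapping blocks. The function names and readings are hypothetical.

```python
def rolling_mean(values, window):
    """Moving average: one output per position once `window` readings exist."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def tumbling_mean(values, window):
    """Tumbling average: one output per non-overlapping block of `window`
    readings; a trailing partial block is dropped."""
    return [
        sum(values[i : i + window]) / window
        for i in range(0, len(values) - window + 1, window)
    ]


# Hypothetical vibration readings (made-up numbers).
readings = [10, 12, 11, 15, 18, 24]
print(rolling_mean(readings, 3))   # four overlapping windows
print(tumbling_mean(readings, 3))  # two separate blocks -> [11.0, 19.0]
```

The same pattern extends to the minimum, maximum, count, or sum per window by swapping the aggregation.
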
you have a data set, now, you have created labels. And after that, you are
ready to select an algorithm. So here, you will see that
depending upon your use case type– whether it is
a classification use case, multiclass classification
use case, regression use case, or anomaly detection
kind of use case, you can select from various
different algorithms. And you can also
select whether you want to use traditional
machine learning, whether you want to use
a deep neural network, or whether you want to use
a more powerful technique that also uses memory in
the deep neural network, like a recurrent neural network. So generally, for
classification problem, you will use traditional ML,
like random forest or decision trees. If your absolute
values have pattern, then you will use
deep neural network. And sometimes if the spikes
are showing a pattern, then it’s better to use
recurrent neural network. I will show you an example when
we reach one of the graphs. If you see that the absolute
values of your data are showing you a pattern, then
you can use traditional ML, or you can use a deep
neural network. That will give
you a good result. But if the absolute values are
not showing you a pattern, and if you see that your
failure is indicated by how steep the
spike was, then you should be considering
recurrent neural network. Similarly, for multiclass
classification, same thing. You can consider RNN, DNN,
or standard random forest kind of algorithm. You can also use RNN or
LSTM for regression problem. Or you can use random forest
or a hidden Markov chain for regression problem. If you are using an
anomaly detection, there are multiple techniques
for anomaly detection you can use. If you want to use deep neural
networks for anomaly detection, you can also consider
using autoencoders, which are very powerful
in identifying anomalies in a vector. For example, if you
have complex data, and you do not know which
of the vectors represented by this complex data is an anomaly,
you can use an autoencoder. And then there is
a variation of it. It is called a
conditional autoencoder, which is very
powerful in finding anomalies in a data set
which has complex vectors. So depending upon
your use case, try to use simpler algorithm
first– traditional ML or deep neural network. If it gives you a good
result, you need not try anything else. If it doesn’t give
you a good result, then you should try
recurrent neural network. Regarding anomaly
detection, you should also make a decision, like
what kind of anomalies you are trying to identify. Are you trying to
identify point anomalies, where individual data
points are out of range? Or contextual anomalies,
where in a certain context, when some device was running
in a certain context, it was generating more heat? Is that the kind of anomaly
you’re trying to identify? Sometimes you want to see
whether a sequence of things was an anomaly. So try to identify what
kind of anomalies you want to find. Sometimes the scenario can
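
A point anomaly, the simplest of the types listed, can be flagged with a z-score: how many standard deviations a reading sits from the mean. The sketch below is one assumed approach, not the technique used in the talk; the function name, threshold, and temperatures are hypothetical.

```python
import statistics


def point_anomalies(values, threshold=2.5):
    """Return indexes of readings more than `threshold` population standard
    deviations away from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all readings identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical temperature readings with one obvious spike (made-up numbers).
temperatures = [70, 71, 69, 70, 72, 70, 71, 69, 70, 120]
print(point_anomalies(temperatures))  # -> [9]
```

A contextual anomaly would need the same check run separately per context (for example per operating mode), since a reading that is normal in one context can be out of range in another. Note that with very few readings a single outlier inflates the standard deviation, so the threshold has to be chosen with the sample size in mind.
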
become more complex. For example, if aircraft fly
in different weather conditions from different
airports, and there are different
types of aircrafts, and the runway
size is different, and they use a different pattern
when they are taking off. And you want to
identify which one was the right takeoff, which
one was not a right takeoff. Standard anomaly detection
may or may not work. In that case, you want to use
an autoencoder and a conditional autoencoder. So you selected a use case. After that, you
selected a data set. And you have the right schema. Then you created the labels. You created the features. You selected the algorithm. And then, you go ahead
and train the model. So the traditional
mechanism for training a model is you have a
denormalized data set. You give it to the
neural network. The ML model gets trained and
starts making predictions. And you see the
cost, like how far the predicted is from the actual. And that’s how you get the
confidence whether your model is good or not. And then you can take
it to production. So generally, when you define a
use case at that point in time, you also define
what kind of metrics you want to optimize
depending upon your use case. So I will not go into detail
about precision and recall. These are standard metrics
for classification problems. But you may want to
work with your business to determine what is
more important. Is reducing false
positives more important, or reducing false
negatives more important? For example, if you are
doing predictive maintenance for an aircraft engine, you want
to predict failure in advance. Reducing false negatives is
generally more important. But if you are trying to
do predictive maintenance for a car battery, you do
not want to replace the battery unnecessarily either. So reducing false positives
is also a good goal. So you need to work with
your business in advance to determine how
you want to trade off between false positives
and false negatives, so that you can
decide when to move the model to production. And generally, predictive
maintenance data sets are not balanced data sets. So you do not use
the standard accuracy metric. Other metrics to consider
are how much attention you want to pay
to the true positive rate or true negative rate,
and to the false positive rate or false negative rate. So work with your
business to understand what metrics are right for it. Generally, data scientists are
good at tuning metrics. So even if you agree to pay
attention to multiple metrics, agree on one metric
which you will optimize, and for the remaining metrics
you may want to have a satisficing criterion. So that way your
data science team knows what the minimum bar is
for the remaining metrics. And then, they can continue
to optimize your optimizing metric. One other small thing:
if you are building a multiclass
classification model, then you need to
compute that metric for the multiclass
classification. There are two methods used: micro and macro. Micro generally gives
attention to each instance, and you sum up
the true positives across all instances. In the macro method, you give
importance to each class as a whole. You can
determine which class you want to pay attention to. You compute the
metric for each class. And then you average it
over all the classes. So here is an example
of a regression problem. I’ll show you an
aircraft engine data set. When we used the deep learning
network with the IoT data, we were getting an
RMSE of 45. But even if the number 45
looks very big, it was OK, because the actual values
were also high. So when you
see the 2017 result, this is the result we
were getting last year. In the beginning, when
the engine was new, it was predicting
with a high error. Let’s say that
on July 1 the actual value was 120 weeks, and it might be
predicting 90 weeks. So even if the error
was 30 weeks, we still knew that there
was a lot of life left. But as the engine
became older, it started predicting the right value. So an RMSE of 45 was still
acceptable to the business in this case.
When we started adding
other data attributes, we started getting
an RMSE of 2 to 5. Then overall, the results
were almost correct. So the results were very good. So you should try to
add more data elements, like the operational
characteristics when an engine is running. That will generally give
you a better result. So we talked about
the common use cases. And we took an example of
an engine scenario, where we used a NASA public data set. And we predicted the
remaining life of an engine. Let’s take another example. We’ll use a USGS data set. This is the United States
Geological Survey. They collect data about how
much water is flowing in the river. In this use case, they
wanted to predict water flow in the river. And the reason was that
when a flood comes, the gauges generally break. And this is the time the
government needs the data, because the emergency
response team needs to know where to locate people. And this data is also
used on a daily basis, to decide how much water should be
given to farms or agriculture, how much should be
carried forward in the river, and whether a dam should be
storing the water or releasing it. So the USGS
collects this data. And it is used for flood risk,
water distribution, reservoir and dam management, and it
is used for agriculture. Now, as for when these gauges break,
we looked into the data. And gauge breakage
was very random. For example, in
California, gauges don’t break very often. But in Alaska, gauges break very
often because the water spikes are very big there. And generally, these gauges
break either during a storm or randomly, when there is a tree
or debris coming down the river, and that breaks the gauge. So here, you will see
an example of a gauge. The gauge has
a part in the river. And there is some part above it. And if something is
floating in the river, it can break the gauge. And you will see there are 8,200
gauges built across the nation. And these gauges have
a huge cost also. They’re important for life. But they also
require $184 million to maintain all
these 16,300 gauges. And when a gauge breaks during
a storm or a flood, that is the time when the government
needs this data most. And sending somebody to replace
the gauge is very dangerous. Even if you predict in
advance that this gauge is going to break and
you replace it first, the new gauge will break too. So here, we discussed
with the business that predictive maintenance
is not the right use case. So something else
should be done. So we came up
with a new scenario here: instead of predicting
when the gauge will break, we should start predicting
the water flow in a river. Predicting water flow
in a river is, again, not straightforward. Because it’s not like
sales forecasting, which has seasonal attributes. Water flow in a river depends
upon the weather, rain. It also depends upon how
much snow is melting. And how much snow
is melting depends upon how much snow was
accumulated last year or the last few years. And that is variable,
and one can’t build a model very easily. There has been a lot of research
over many years. And that problem is
still not solved. So what we did in this case
is we looked into the data, knowing that we
can’t directly predict the water flow in a river. But we looked across
the watershed. Here, if you look
into this example, we did a data
exploration exercise. This exercise was done in
Google Data Studio. We looked into the USGS data. There are gauges in a watershed. You will see that there is at
least a pattern: when the water level is increasing,
it is increasing in most of the gauges. And when it is decreasing,
it is decreasing in most of the gauges. There’s no straightforward
correlation. But there is some pattern. You will also see that
there is a negative pattern. For example, if a dam is
storing water, or a reservoir is
storing water, during that time, the
water flow in the river connected below it reduces. When a dam or a reservoir
releases the water, suddenly the dam
water flow reduces, or the reservoir
water flow reduces. But the water flow
downstream increases. So there is either a
big positive correlation or a big negative correlation. So the first exercise we did was,
using a machine learning model, we tried to identify
which sets of gauges have a correlation,
positive or negative. And then we
eliminated gauges that don’t have a correlation. We also did a
narrower correlation exercise. Here, you will see that
in these gauges, the water flow either increases
suddenly in all of them or reduces suddenly
in all of them. Then we looked into a
sheet from BigQuery, where we built this
very simple correlation matrix in BigQuery, showing
which gauges act together and which gauges
don’t act together. Once we had this
data set, we had high confidence that we
could go ahead and build a model. At that time, we built a model. And then we were able
to make predictions with very high accuracy.
The red line shown here is the predicted value. And the blue line shown
here is the actual value. So this is on a
training data set. This is on a
validation data set. It’s [? close. ?] And
I’ll show you a demo. This is on a real data
set for the month of July, which we have never seen before. It’s predicting
quite accurately. So here is a simple application. This area covers
a river in Montana. So it has a number of gauges. If I click on a gauge,
I see how much the water flow is. And then I can go here and
see what the water flow is. And assuming this gauge
had been missing since May, these are the actual
values from May. Assuming this gauge was
missing, the purple value is the predicted value. So it would have
predicted very accurately while this gauge was broken. So in this case, we
looked into the use case. The use case was, can we predict
the failure in the water gauge? Then we determined that this
is not the right use case for this problem. Then we shifted the problem. Can we predict the water flow? Predicting the
water flow, again, is a very complex problem. Research has been happening
for the last 20 years. Because it depends
upon the snow melting, nobody has been able to predict it. So we used a workaround
technique here, where we took a set
of gauges together. We identified that these
gauges act together. There may not be a very
strong correlation, but they act together. And then, we built
two machine learning models. One does the clustering
of the gauges. The second does the prediction. So if during a
storm, let’s say you have a cluster of 10 gauges,
and five of these gauges break down, you can still predict their values from the
remaining five gauges. And you need not
replace them immediately. And you can predict nine
to 12 months very easily. And the prediction
accuracy will not go down. Can we shift to the slides? So this is somebody
I [? pointed ?] out. We created cluster of gauges. Then we applied
machine learning. We were able to build the
models that can very accurately predict the water flow. whether you’re building
predictive maintenance for an oil rig, or for a
aircraft, or for a river, if you can collect the data
using your IoT device– so in this case, within Google,
we used Google Cloud IoT Core. And we used Pub/Sub.
Those are our IoT components. Then we used Dataflow
and BigQuery for data processing and storage. And we used Cloud ML
for the model building. This is a reference architecture
you can pretty much use in most of the use cases
where you have IoT data, and you want to build
a predictive maintenance solution. We didn’t get the time
to talk about edge. In many cases, you want to
build predictive maintenance on the edge also. So we have TensorFlow Lite. You can take your TensorFlow
or Cloud ML model, and you can compress
it very easily. And once you compress it,
the model becomes lighter. And you can deploy
it on an edge device. So that’s all we have for today. So in summary, if
you have IoT data, discuss the use case with the
business, agree on the metrics, then do the data exploration to
determine whether your data set and use case match. And do a checkpoint. Once you pass the checkpoint,
build a predictive maintenance use case. If the predictive maintenance
use case is not right, think about a predictive
monitoring use case. And then if it meets
the value prop, then build either of these. And focus on creating labels. Focus on creating features
and, finally, training the model. And you will be able to
build an end-to-end solution. [MUSIC PLAYING]
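
The talk mentions micro and macro averaging for multiclass metrics without showing the arithmetic. The following is a minimal sketch of that idea, not the speakers' code. The class names (`healthy`, `degraded`, `failed`), the sample labels, and the helper functions are all hypothetical, chosen only to mimic an imbalanced predictive maintenance data set.

```python
def confusion_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def micro_recall(counts):
    """Pool true positives and false negatives over all classes, then divide."""
    tp = sum(c[0] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return tp / (tp + fn) if tp + fn else 0.0


def macro_recall(counts):
    """Compute recall per class, then take the unweighted mean over classes."""
    per_class = [tp / (tp + fn) if tp + fn else 0.0
                 for tp, _, fn in counts.values()]
    return sum(per_class) / len(per_class)


# Hypothetical, imbalanced labels: 8 healthy, 1 degraded, 1 failed.
y_true = ["healthy"] * 8 + ["degraded", "failed"]
# The model misses the single degraded unit and calls it healthy.
y_pred = ["healthy"] * 8 + ["healthy", "failed"]

counts = confusion_counts(y_true, y_pred)
print("micro recall:", round(micro_recall(counts), 3))  # 0.9
print("macro recall:", round(macro_recall(counts), 3))  # 0.667
```

In this made-up example the micro average stays high because the majority class dominates the pooled counts, while the macro average drops because the one missed `degraded` unit pulls that class's recall to zero. That difference is one reason to agree with the business up front on which averaging to report.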
