A comprehensive framework from real-time prognostics to maintenance decisions

Studying the influence of imperfect prognostics information on maintenance decisions is an underexplored area. To bridge this gap, a new comprehensive maintenance support system is proposed. First, a survival theory-based prognostics module employing the Weibull time-to-event recurrent neural network was deployed in which prognostics competence was enhanced by predicting the parameters of failure distribution. In conjunction with this, a new predictive maintenance (PdM) planning model was framed via a trade-off between corrective maintenance and time lost due to PdM. This optimises maintenance time based on operational and maintenance cost parameters from the historical data. The performance of the proposed framework is demonstrated using an experimental case study on maintenance planning for cutting tools within a manufacturing facility. Systematic sensitivity analysis is provided, and the impact of imperfect prognostics information on maintenance decisions is discussed. Results show that uncertainty about prediction declines as time goes on, and as uncertainty declines, the maintenance timing becomes closer to the remaining useful life. This is expected, as the risk of making a wrong decision decreases over time.


| INTRODUCTION
Much of the maintenance undertaken today is either corrective (replacing an asset after it fails) or preventive (assuming a certain level of degradation with no input from the asset itself, and maintaining the asset on a fixed schedule regardless of whether its condition requires it). Both strategies are inefficient. Thus, predictive maintenance has attracted substantial research focus over the last decade [1]. Predictive maintenance involves forecasting the asset's remaining useful life (RUL) to plan maintenance activities. For instance, a novel integrated diagnostics and prognostics system using a support vector machine was presented in [2], and [3] advanced a statistical approach for prognostics using an estimated linear model. [4] developed three data-driven prognostics models, namely, offline, online and semi-offline, for different industrial circumstances, namely, online and offline condition monitoring. [5] presented a new product quality-based failure-prediction module for the manufacturing environment. [6] proposed a multi-agent system framework for online prognostics. The same was implemented for collaborative prognostics in [7]. [8] also reported accurate life predictions through the stepwise regression feature subset selection technique. [9] designed a prognostics approach under unknown initial wear. [10] proposed a new particle-filtering-based method to provide a precise time-to-failure estimate. [11] avoided losing potentially valuable information about an asset's degradation process by using extended capsule neural networks for fault prognostics and, in particular, RUL estimation for multidimensional sensor data. In [12], probabilistic estimation of RUL by three data-driven prognostic methodologies is presented based on state-of-the-art as well as innovative mathematical models, which are gradient-boosted trees, Bayesian neural networks and non-homogeneous hidden semi-Markov models.
In [13][14][15], many approaches to RUL assessment using physics-based and data-driven methods are reviewed. It is observed that available approaches spotlight the prognostics step and do not consider maintenance decisions, which are addressed independently.
For instance, [16] proposed a maintenance optimisation model that did not consider prognostics information, which resulted in maintenance shortage or overage. [17] presented an optimisation model with time to failure subject to a Weibull probability distribution. [18] addressed maintenance planning for a repairable system. [19] gave a methodology for predictive maintenance that solely considered the diagnostics information. [20] presented a cost-oriented predictive maintenance strategy based on reliability state. [21] analysed the cost implications of various multi-agent system architectures but did not consider the effect of imperfect prognostics on maintenance decisions. [22,23] dealt with post-prognostics implications but under the assumption that the prognostics data of the asset were available at all times. Nonetheless, none of these works proposed an end-to-end system (from real-time prognostics to maintenance decisions) that examined the effect of imperfect prognostics on maintenance decisions. Consequently, a novel comprehensive maintenance support system is proposed herein.
First, a survival theory-based real-time prognostics module employing the Weibull time-to-event recurrent neural network (WTTE-RNN) was deployed. It is innovatively modelled to project a detailed picture of asset reliability by predicting the asset's probability of failure distribution in contrast to predicting time to failure. This is coupled with a novel predictive maintenance (PdM) planning model that resourcefully balances corrective maintenance (CM) costs and PdM costs to facilitate the determination of the optimal maintenance time that minimises overall system maintenance costs. In addition, we analysed how imperfect prognostic information influences maintenance decisions. Critical insights are underlined, namely, uncertainty about prediction drops as time goes on; as uncertainty drops, maintenance timing becomes closer to RUL because the risk of making a wrong decision is decreasing. Based on this analysis, guidelines are offered for managers to help them improve their chances of making the right maintenance decisions. This study acts as a proof of concept, showing the importance of utilising prognostics information in maintenance planning.
The novel contribution of this paper is the conceptualisation of a comprehensive maintenance support system that satisfies three essential requirements: (a) a real-time prognostics approach that can be applied to a wide range of systems; (b) a flexible maintenance-decision model that rapidly evaluates different operational and maintenance costs; (c) consideration of the implications of imperfect prognostics on maintenance decisions to find the right moment for performing maintenance activities. With such a comprehensive system, a supervisor can plan maintenance activities that are more effective at reducing machine downtime and improving production flow. The added contribution lies in the results. The performance of the framework is demonstrated via a case study from a manufacturing environment to assess suitability, quality, reliability, robustness, and applicability in a real-world industrial environment. Lastly, the work is complemented with a systematic sensitivity analysis.
The rest of the paper is structured as follows. Section 2 provides details of the methodology involved in developing the comprehensive maintenance support system. Section 3 gives the specifics of the real-life case study and implementation results. Lastly, Section 4 concludes and highlights the key contributions.

| Prognostics module
In the recent technical literature, a large variety of prognostic applications that estimate time to failure have been reported [13,14,24]. For the experiments discussed here, a novel real-time prognostics algorithm, WTTE-RNN, was deployed. WTTE-RNN combines survival theory with recurrent neural networks (RNNs) to model an asset's failure probability distribution. WTTE-RNN assumes the distribution to be Weibull in nature, which is a standard assumption for reliability studies [25]. Inputs to WTTE-RNN are multivariate vectors comprising real-time sensor data, and outputs are the scale and shape parameters of the Weibull distribution. RNNs have been recommended for time-series predictions such as prognostics on several occasions [26]. Added benefits of WTTE-RNN are the capabilities to analyse both censored and uncensored data and to predict failure distributions rather than a single RUL value. Failure distributions are more informative than single-valued RUL predictions because they give a detailed picture of an asset's future health and therefore enable operators to plan risk-based maintenance activities.
For completeness, the relevant mathematics is described here. Because we only use WTTE-RNN for analysing censored data, the mathematics described here corresponds to that part of the WTTE-RNN description only. The reader is advised to refer to [27] for more information. WTTE-RNN relies on a log-likelihood loss function to be optimised by the RNN to predict the shape parameter ($\theta$) and scale parameter ($\eta$) of a Weibull probability distribution. The prognostics module analyses a vector of sensor values as an input, representing the asset's current health condition at a given time, and based on this, the RNN estimates ($\theta$, $\eta$). The log-likelihood loss function to be maximised by the RNN is

$$\log(L) = \sum_{n=1}^{N} \sum_{t=0}^{T_n} \log\left[\Pr\left(Y_t^n = y_t^n \mid x\right)\right] \qquad (1)$$

The prognostics module tries to maximise the probability of the estimated time to failure ($Y_t^n$) being equal to the actual time to failure ($y_t^n$) for an available vector of sensor features ($x$). The summations ($\sum_{n=1}^{N} \sum_{t=0}^{T_n}$) are made over every trajectory ($N$) and over every time step of every trajectory ($T_n$). The probabilities appearing in Equation (1) are obtained via survival analysis.
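For predicting discrete events, the probabilities can be shown as

$$\Pr(Y = t) = e^{-\Lambda(t)} - e^{-\Lambda(t+1)} = e^{-\Lambda(t)}\left(1 - e^{-d(t)}\right)$$

where $\Lambda(t)$ is the cumulative hazard function and $d(t) = \Lambda(t+1) - \Lambda(t)$ is the step cumulative hazard function. For a Weibull distribution for the probability of occurrence of an event with respect to time, the cumulative hazard function is

$$\Lambda(t) = \left(\frac{t}{\eta}\right)^{\theta}$$

The loss function for a Weibull distribution of events therefore becomes

$$\log(L_d) = \sum_{n=1}^{N} \sum_{t=0}^{T_n} \left\{ \log\left[\exp\left(\left(\frac{y_t^n + 1}{\eta_t^n}\right)^{\theta_t^n} - \left(\frac{y_t^n}{\eta_t^n}\right)^{\theta_t^n}\right) - 1\right] - \left(\frac{y_t^n + 1}{\eta_t^n}\right)^{\theta_t^n} \right\} \qquad (2)$$

where $\eta_t^n$ and $\theta_t^n$ are the scale and shape parameters of the Weibull distribution, respectively, and $y_t^n$ is the time to failure at every time step $t$ and trajectory $n$. In summary, the RNN attempts to find the weights that maximise the $\log(L_d)$ described in Equation (2). The output (failure probability distribution) of the prognostics module is coupled with the PdM planning model.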
For many assets, data-driven prognostics models overcome the limitations of reliability-based prognostics models. In our method, we have combined the best of both techniques by using a data-driven model to optimise a reliability-theory-based loss function. Concretely, the Weibull loss function (see Equation 2) is inspired by reliability theory. It enables flexible modelling of an asset's probability of failure over its lifetime. However, the Weibull loss function's parameters are estimated using an RNN, which is a commonly used data-driven prognostics technique [28] wherein the RNN learns from the past history of failures and generates the parameters corresponding to the least overall loss.
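The discrete Weibull log-likelihood that the RNN maximises can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard discrete survival identities Pr(Y = y) = exp(-Λ(y)) - exp(-Λ(y+1)) for an observed failure and Pr(Y > y) = exp(-Λ(y+1)) for a censored observation, with Λ(t) = (t/η)^θ. The function names are hypothetical, and the scale and shape values are passed in directly rather than produced by a network.

```python
import math


def weibull_cum_hazard(t, scale, shape):
    """Cumulative hazard of a Weibull distribution: (t / scale) ** shape."""
    return (t / scale) ** shape


def discrete_weibull_loglik(y, scale, shape, observed=True):
    """Log-likelihood of a single discrete time-to-failure observation.

    observed=True  -> failure seen at step y, returns log Pr(Y = y)
    observed=False -> censored at step y, returns log Pr(Y > y)
    (Both forms are the assumed standard discrete survival identities.)
    """
    hazard_next = weibull_cum_hazard(y + 1, scale, shape)
    if not observed:
        return -hazard_next
    step = hazard_next - weibull_cum_hazard(y, scale, shape)
    # log(exp(step) - 1) - Lambda(y + 1); expm1 keeps precision for small steps
    return math.log(math.expm1(step)) - hazard_next


def total_loglik(records):
    """Sum the log-likelihood over (y, scale, shape) records of all
    trajectories and time steps, mirroring the double summation."""
    return sum(discrete_weibull_loglik(y, s, k) for y, s, k in records)
```

A training loop would adjust the network weights so that the predicted (scale, shape) pairs increase `total_loglik`; a predicted scale near the true time to failure scores higher than one far from it.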

| Predictive maintenance planning model
We consider an industrial facility consisting of a single-asset system whose time to failure follows a Weibull distribution. In this case, failure is viewed as asset degradation ($F_{AD}$) because of wear and tear. It is assumed that whenever a failure is observed, corrective replacement (CR) is carried out, incurring a CR cost. The unexpected failure of the asset due to degradation can increase risks and safety hazards. Accordingly, predictive replacement (PdR) of the asset is carried out to lower the probability of asset failure and reduce the risk of an unexpected failure. However, PdR requires additional time and funds. Therefore, PdR optimisation is performed to trade off the failure cost against the PdR cost. To exhibit the benefits of PdM, a cost model is developed that captures the costs of the industrial operation that are governed by failures and PdM planning. The economic objective is to minimise the expected total cost per unit of time of conducting predictive maintenance ($[ETC]_{(PdM)}$) by choosing the optimal time for PdR ($O_{PdR}$). Here, $[ETC]_{(PdM)}$ is the ratio of the sum of the expected total cost of CR due to asset degradation ($E[(C_{CR})_{F_{AD}}]$) and of PdR ($E[C_{PdR}]$) to the planning period/evaluation time ($E_T$) for which the analysis is performed. It is written as follows:

$$[ETC]_{(PdM)} = \frac{E[(C_{CR})_{F_{AD}}] + E[C_{PdR}]}{E_T}$$

Theoretical and numerical models of the constituent costs in $[ETC]_{(PdM)}$ are detailed in the following subsections.
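The objective described above (sum of the two expected costs divided by the evaluation time, minimised over candidate replacement times) can be sketched as follows. This is only an illustrative outline: the function names are hypothetical, and the two cost callables are placeholders that a reader would supply from the cost models of the following subsections.

```python
def expected_total_cost_rate(expected_cr_cost, expected_pdr_cost, evaluation_time):
    """Expected total cost per unit of time: (E[CR cost] + E[PdR cost]) / E_T."""
    if evaluation_time <= 0:
        raise ValueError("evaluation_time must be positive")
    return (expected_cr_cost + expected_pdr_cost) / evaluation_time


def optimal_replacement_time(candidate_times, cr_cost_at, pdr_cost_at):
    """Evaluate the cost rate at each candidate time and return the
    (time, cost_rate) pair with the smallest cost rate.

    cr_cost_at and pdr_cost_at are caller-supplied functions of time
    (hypothetical placeholders for the expected CR and PdR cost models).
    """
    best_time = min(
        candidate_times,
        key=lambda t: expected_total_cost_rate(cr_cost_at(t), pdr_cost_at(t), t),
    )
    best_rate = expected_total_cost_rate(
        cr_cost_at(best_time), pdr_cost_at(best_time), best_time
    )
    return best_time, best_rate
```

A plain grid search is used here for transparency; any one-dimensional optimiser could replace it if the cost curve is smooth.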

| Corrective replacement cost
Assume the system is stopped during the replacement, and take $(C_{CR})_{F_{AD}}$ as the cost of CR due to asset degradation, including the downtime cost. Consequently, the expected cost of CR owing to asset degradation ($E[(C_{CR})_{F_{AD}}]$) is determined as

$$E[(C_{CR})_{F_{AD}}] = \left\{A_{CR} \times [P_r \times C_{lp} + C_L] + C_{FCR}\right\} \times F(E_T)_{\theta,\eta}$$

where $A_{CR} \times [P_r \times C_{lp} + C_L]$ is the downtime cost owing to CR, $A_{CR}$ is the mean time to perform the corrective replacement (hours), $P_r$ is the production rate (products/hour), $C_{lp}$ is the cost of lost production (GBP), $C_L$ is the cost of labour (GBP/hour), $C_{FCR}$ is the fixed cost of corrective replacement (including the cost of asset replacement) and $F(E_T)_{\theta,\eta}$ is the cumulative probability of failure owing to asset degradation for a given evaluation time as a function of a given shape ($\theta$) and scale ($\eta$) parameter.
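As a rough numerical illustration of the expected corrective replacement cost, the sketch below multiplies the cost of one corrective replacement (downtime cost plus fixed cost) by the Weibull cumulative failure probability at the evaluation time. The multiplicative form is an assumption inferred from the symbol definitions above, and all function and argument names are hypothetical.

```python
import math


def weibull_cdf(t, scale, shape):
    """Cumulative probability of failure by time t for a Weibull distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def expected_cr_cost(eval_time, scale, shape, mean_cr_time,
                     production_rate, lost_production_cost,
                     labour_cost, fixed_cr_cost):
    """Expected corrective replacement cost at eval_time (assumed form).

    downtime cost = mean_cr_time * (production_rate * lost_production_cost
                                    + labour_cost)
    expected cost = (downtime cost + fixed_cr_cost) * F(eval_time)
    """
    downtime_cost = mean_cr_time * (
        production_rate * lost_production_cost + labour_cost
    )
    return (downtime_cost + fixed_cr_cost) * weibull_cdf(eval_time, scale, shape)
```

Under this form the expected cost rises from zero towards the full cost of one corrective replacement as the evaluation time grows, which is what drives the planner towards earlier replacement.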

| Predictive replacement cost
In general, the cost of PdR for an asset is modelled to include the downtime cost with the replacement, labour, and asset costs. The replaced asset always has some useful remaining life that is usually not considered in PdR cost [29]. Comprehensive models that can consider the effect of lost remaining life on overall PdR cost will become increasingly critical. Therefore, in our model, the effect of an asset's lost remaining life is modelled in the PdR cost. This prompts the optimum utilisation of asset life. RUL is the residual life of the asset after a certain time. Here, the proposed model captures the real-time RUL information of the asset with the help of the failure probability distribution acquired as an output from the prognostics module. Moreover, we take the cost of lost remaining life ($CLRUL_i$) relative to mean life cost. It is assumed that asset cost is uniformly distributed over the lifetime of the asset [30]; the $CLRUL_i$ is given as

$$CLRUL_i = \frac{C_A}{A_L} \times \eta_i \, \Gamma\left(1 + \frac{1}{\theta_i}\right)$$

where $C_A$ is the cost of the asset (GBP) and $A_L$ is the mean life of the asset (hours).
The function $\eta_i \, \Gamma(1 + 1/\theta_i)$, which is the mean of the predicted Weibull distribution, gives the RUL of the asset in hours at a given point of time. Therefore, the total cost per PdR is given as

$$E[C_{PdR}] = A_{PdR} \times [P_r \times C_{lp} + C_L] + C_{FPdR} + CLRUL_i$$

where $A_{PdR} \times [P_r \times C_{lp} + C_L]$ is the downtime cost owing to PdR, $A_{PdR}$ is the mean time to perform PdR (hours), and $C_{FPdR}$ is the fixed cost of PdR (GBP) (including the cost of asset replacement). To assess the optimal time for PdR ($O_{PdR}$), a balance is struck between the cost of the lost remaining life of the asset and the maintenance and failure costs. The sum of both costs per unit of time, $[ETC]_{(PdM)}$, is calculated for each time step of the asset's operation; the time step with the minimum cost gives the optimal PdR cost along with the optimal time for PdR.
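To make the lost-remaining-life term concrete, the sketch below computes the expected RUL as the mean of a Weibull distribution (scale times Γ(1 + 1/shape)), prices it at the asset's cost per hour of mean life, and adds it to the downtime and fixed costs of a predictive replacement. The additive structure of the PdR cost is an assumption drawn from the description above, and all names are hypothetical.

```python
import math


def mean_rul(scale, shape):
    """Mean of a Weibull distribution, used here as the expected RUL (hours)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def cost_lost_rul(asset_cost, mean_life, scale, shape):
    """Cost of remaining life discarded at replacement, assuming the asset
    cost is spread uniformly over its mean life."""
    return (asset_cost / mean_life) * mean_rul(scale, shape)


def pdr_cost(mean_pdr_time, production_rate, lost_production_cost,
             labour_cost, fixed_pdr_cost, lost_rul_cost):
    """Total cost of one predictive replacement (assumed additive form):
    downtime cost + fixed cost + cost of lost remaining life."""
    downtime_cost = mean_pdr_time * (
        production_rate * lost_production_cost + labour_cost
    )
    return downtime_cost + fixed_pdr_cost + lost_rul_cost
```

With an asset cost of 3000 GBP and a mean life of 3.90 h, each hour of discarded life is priced at roughly 769 GBP, so replacing far ahead of the predicted RUL is penalised heavily.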

| EXPERIMENTAL CASE STUDY
In this section, the proposed methodology is verified on the cutting tool degradation data set from the reliability and prognostics repository provided by Industrial and Systems Engineering, IIT Indore, India. This data set was generated for prognostics and health management studies [31]. The data are provided by a testing platform equipped with a CNC milling machine and sensors, viz. dynamometers, etc. The primary objective is to provide real-life historical data at different operating conditions for a population of identical cutting tools, with cutting force sensor data that characterise the degradation of the tools over their entire operational life. In the present study, the data considered comprise six identical cutting tools operated at a fixed operating condition. Here, the cutting force signal in the feed direction is measured for every 0.07 h of operation for each tool until complete failure. As the preprocessing step, four statistical features, viz. average cutting force, root mean squared value, signal power, and maximum force level, are extracted from the cutting force signals for all time steps (0.07 h). These four features, along with time, are used to represent the degradation of the tools.
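The four statistical features named above can be computed per time step along the following lines. This is a minimal sketch: the function name is hypothetical, and signal power is taken here as the mean squared amplitude, which is a common definition but an assumption about the data set's preprocessing.

```python
import math


def force_features(samples):
    """Compute four summary features from one window of cutting-force samples.

    Returns average force, root mean squared value, signal power
    (assumed to be the mean squared amplitude) and maximum force level.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    n = len(samples)
    power = sum(s * s for s in samples) / n
    return {
        "mean": sum(samples) / n,
        "rms": math.sqrt(power),
        "power": power,
        "max": max(samples),
    }
```

Each 0.07 h window of raw force samples would be reduced to one such feature record, and the time index would be appended as a further input.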

| Performance assessment of the prognostics module
An exhaustive performance investigation of the prognostics module is carried out to assess its suitability, quality, reliability, robustness, and applicability in a real-world industrial environment. To this end, we divided the data into two subsets, training data (four tools) and testing data (two tools). All four trajectories in the training data correspond to the same failure type (breakage) and operating conditions. The trajectories are of different lengths and comprise the same number of sensor features. Moreover, the noise associated with the sensor values is random and can be filtered using a moving average. Therefore, the trajectories are cleaned (rolling average with window size 10), and the values are normalised. Finally, we obtain a training data set of four run-to-failure trajectories with six features corresponding to each time step.
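The cleaning step (rolling average with window size 10, followed by normalisation) might look like the sketch below. Two details are assumptions, since the text does not specify them: the window is trailing and shrinks at the start of the series, and normalisation is min-max scaling to [0, 1]. The function names are hypothetical.

```python
def rolling_mean(values, window=10):
    """Trailing moving average; the first window-1 outputs average only
    the samples seen so far (an assumed edge-handling choice)."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def min_max_normalise(values):
    """Scale values linearly to [0, 1] (assumed normalisation scheme)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

In practice the scaling constants would be fitted on the training trajectories only and reused on the test trajectories, so that no information from the test tools leaks into training.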
We trained WTTE-RNN with one long short-term memory (LSTM) layer. The architecture for the RNN is 15*10*20*10*5, with the 15-neuron layer being the LSTM layer. Two sets of experiments (for two assets from the testing data, test IDs I and II) were performed to analyse performance. Table 1 gives the details of the implementation results for both test IDs. Here, to gauge suitability, mean absolute error (MAE) is computed. In this analysis, MAE measures how close the module's predictions of the RUL are to the actual RUL. The MAE values of 1.14 and 0.41 from the prognostics module show that the predicted RUL is close to the actual RUL, demonstrating the suitability of the module in real life. Stability is measured by precision in terms of the dispersion of the prediction error around its mean. Values as low as 1.37 and 0.51 show that the predicted RULs are stable. Next, the quality of prediction is assessed using the correlation coefficient and Spearman's rho. These measures indicate the strength of the association between predicted and actual RUL. Values near 1 for both test IDs show that the predicted RULs have a strong positive relationship with the actual RULs, indicating the high quality of predictions from the module. Moreover, the high reliability of these predictions is evident from the low values of the root mean squared error. Lastly, robustness is evaluated by plotting the predicted RUL against the actual RUL, as shown in Figures 1 and 2. These figures show that the predicted RUL stays close to the actual RUL, indicating that the prognostics module is robust in predicting the RUL of the asset. These implementation results support a predictive maintenance framework that depends on timely warning of upcoming failures.
(Note: Errors exist between the predicted and actual RUL (as seen in Figures 1 and 2). Concerning their influence, as uncertainty in prediction decreases, the optimal maintenance timing gets closer to the RUL. Section 3.2.1 analyses the implications of imperfect prognostics information on maintenance decisions in detail.)

| Predictive maintenance framework
We consider a production facility in which the cutting tool is represented as a single-component machine producing mild steel plates. All operational cost parameters gathered from the historical data are listed in Table 2. The cost of the cutting tool ($C_A$) utilised in the process is 3000 GBP. The mean life of the tool is computed from historical failure data and is 3.90 h. As per the maintenance history, the mean times to conduct corrective ($A_{CR}$) and predictive ($A_{PdR}$) replacement tasks are 0.6 h each. The fixed costs of corrective ($C_{FCR}$) and predictive ($C_{FPdR}$) replacement are 3000 GBP each.

| Implications of imperfect prognostics information on maintenance decisions
To analyse the implications of imperfect prognostics information on maintenance decisions, we evaluate the $[ETC]_{(PdM)}$ at every 10th time step of the test assets. Here, for each test asset ID, we run the prognostics module at a different time step ($T_1, T_2, \ldots, T_n$), predict the probability distribution ($\theta$ and $\eta$) and estimate the variance. These parameters ($\theta$ and $\eta$) are fed to the predictive maintenance planning model to obtain the optimal time for predictive replacement ($O_{PdR}$). Table 3 presents the detailed results for both test assets. The variance of the distribution gives details about uncertainty in prediction (higher variance means higher prediction uncertainty). At the initial stage of the operation when there is very little information about the condition of the asset, the variance will be higher, implying higher uncertainty in prediction. As time goes on and we obtain more information about the asset's health, the variance will be reduced, implying lower uncertainty in prediction. Along those lines, Figure 3 shows the probability density function (PDF) for test asset ID I at the first and second time steps. It can be observed that at $T_1$, the PDF is very wide, with a variance of 1.90. This is because the prediction of $\theta$ and $\eta$ at time step $T_1$ has high uncertainty because it is the first point of prediction, and there is a lack of information in terms of asset condition. However, as time goes on and we obtain more information about asset condition, uncertainty in predicted $\theta$ and $\eta$ is reduced. This is evident as the PDF at $T_2$ is narrowed, with 80.53% less variance than in time step $T_1$, indicating a drop in uncertainty. This becomes clearer from Figures 4 and 5, which show the PDF for all time steps for test asset IDs I and II. This implies that uncertainty about prediction drops as time goes on. On the other hand, Figures 6 and 7 show the predicted RUL and optimal time for replacement ($O_{PdR}$). It can be observed that at time step $T_1$, the optimal time for PdR is far from the predicted RUL, although at time step $T_5$, the optimal time for PdR is very close to the predicted RUL. This is again because of the reduced uncertainty of the prediction. This implies that as uncertainty goes down, the maintenance timing becomes closer to the RUL because the risk of making a wrong decision is decreasing.

TABLE 2 Parameters utilised in the case study
Accordingly, the guideline for the operational planner is not to stick to the initial optimal maintenance plan but instead to update the predictive maintenance plan dynamically as time goes on, so that maintenance decisions are made at the right time.
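The link between the predicted Weibull parameters and prediction uncertainty can be made explicit through the closed-form Weibull variance, η²[Γ(1 + 2/θ) − Γ(1 + 1/θ)²]. The sketch below computes this variance and the percentage reduction between two prediction points; the function names are hypothetical and the sketch is only an illustration of the calculation, not the authors' code.

```python
import math


def weibull_variance(scale, shape):
    """Variance of a Weibull distribution with the given scale and shape."""
    first_moment = math.gamma(1.0 + 1.0 / shape)
    second_moment = math.gamma(1.0 + 2.0 / shape)
    return scale ** 2 * (second_moment - first_moment ** 2)


def variance_reduction_percent(earlier_variance, later_variance):
    """Percentage drop in variance from an earlier to a later prediction."""
    return 100.0 * (earlier_variance - later_variance) / earlier_variance
```

For a fixed scale, a larger predicted shape parameter gives a narrower distribution, so a planner re-running this calculation at each prediction point can track how quickly the uncertainty is shrinking before committing to a replacement time.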

| Sensitivity analysis
In practice, estimates of the relevant process and cost parameters are subject to inaccuracies, so it is vital to identify how such errors influence the quality of the output. Accordingly, a systematic sensitivity analysis of the essential model parameters is carried out, as can be seen in Table 4. The base level is the same as in the case study, and four other levels of each parameter are used at ±10% and ±20% of the base value. The ranges of the optimal parameter and the obtained cost are presented in Tables 4 and 5. Figure 8 shows that $[ETC]_{(PdM)}$ is most sensitive to the fixed cost of PdR and less sensitive to parameters such as the mean time to perform the PdR. Thus, the fixed cost of a PdR should be estimated accurately.
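A one-at-a-time sensitivity study of this kind (each parameter varied to ±10% and ±20% of its base value while the others stay at base level) can be sketched as follows. The function names and the cost function passed in are hypothetical; the actual study would plug in the cost model of Section 2.

```python
def one_at_a_time_sensitivity(cost_fn, base_params,
                              levels=(-0.20, -0.10, 0.0, 0.10, 0.20)):
    """Vary each parameter in turn by the given relative levels.

    cost_fn is called with keyword arguments matching base_params.
    Returns {parameter: {level: cost}}.
    """
    results = {}
    for name, base_value in base_params.items():
        row = {}
        for level in levels:
            trial = dict(base_params)  # all other parameters stay at base level
            trial[name] = base_value * (1.0 + level)
            row[level] = cost_fn(**trial)
        results[name] = row
    return results


def sensitivity_range(row):
    """Spread of the cost over all levels of one parameter."""
    return max(row.values()) - min(row.values())
```

Ranking parameters by `sensitivity_range` gives the same kind of ordering that a tornado-style plot conveys: the parameters with the widest spread are the ones whose estimation errors matter most.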

| CONCLUSION AND FUTURE WORK
This paper formulates a comprehensive framework from real-time prognostics to maintenance decisions. The intention was to provide manufacturers with a complete maintenance support system to prevent asset performance degradation and unexpected failures in a timely manner. The following are the key offerings of this study:
a) For the prognostics phase, a new sensor-based prognostics module was modelled employing the WTTE-RNN, in which prognostics competence was enhanced by predicting the parameters of the failure distribution rather than a single time to failure; therefore, the offered approach responds better to real-world requirements.
b) For the post-prognostics phase, a new predictive maintenance planning model was framed through a trade-off between CM and lost remaining life due to predictive maintenance, thus allowing rapid optimisation of the time for maintenance via all gathered operational and maintenance cost parameters.
c) The model's performance is highlighted via a case study from a manufacturing environment, complemented with a systematic sensitivity analysis. The influence of imperfect prognostics information on maintenance decisions is discussed, with two insights highlighted: uncertainty about prediction drops as time goes on, and as uncertainty is reduced, the maintenance timing becomes closer to the RUL because the risk of making a wrong decision decreases over time.

FIGURE 3 Probability density function plot at $T_1$ and $T_2$ for ID I
FIGURE 4 Probability density function plot for $T_1$ to $T_7$ for ID I
FIGURE 5 Probability density function plot for $T_1$ to $T_4$ for ID II
In essence, the model covers the entire process from conducting prognostics to making maintenance decisions. Such complete models integrating monitoring characteristics, prognostics, and maintenance assessment can give rise to fruitful discussions.
In the future, this framework will be extended to consider a fleet of assets for fleet-level prognostics and predictive maintenance planning. The service operations of a fleet of assets can be controlled with a multi-agent system. In this case, our model can be deployed along with these agents so they calculate health indicators using real-time data collected from the controlled assets. Using collaborative prognostics, it has been previously shown that agents engaging in information exchange with others, controlling assets with similar conditions of degradation behaviour, can improve the accuracy of predictions.
One other direction for future work is to generate updated maintenance plans and schedules, based on our proposed model, with an accurate real-time view of asset condition at each time of evaluation. As a desirable characteristic in these agent-based control systems is to minimise resource consumption and communication overhead, another research direction is to analyse the impact on the multi-agent control system of using a reduced number of features and aggregated statistics as in the proposed prognostics model.