skills, and knowledge of mathematics and statistics to extract meaningful insights from data. Data science models use ML and AI to generate insights that engineers, scientists, analysts, and business teams can translate into tangible business value. These models process and examine large datasets to uncover hidden patterns, unknown correlations, and other useful information. Multivariable analysis reduces process data to its essential dimensions, making it easier to visualise and interpret. This helps identify the minimum critical set of variables driving process performance.

Traditionally, first principles and rule-based models were used to predict process and equipment upsets as part of early event detection (EED) models. However, there has been a paradigm shift in the accuracy and advance warning of EED models with the use of advanced data science and AI/ML-based technologies. These models can dynamically learn the changing behaviour of the process and predict potential process upsets well in advance, allowing them to be prevented. Developing soft sensors and identifying good operating zones are further examples of data analytics models being used in the process industries.

There are multiple ways to build data analytics models. After the correct use case has been identified, the generic steps involved in model development are as follows (see Figure 4):

Data extraction
• Analyse the process and identify critical parameters/datasets.
• Identify the required dataset, which may be available in a single source or in multiple sources (for example, process data historian, laboratory information management system [LIMS], SAP, Meridium).
• Extract, transform, load (ETL) the data.
• Sanitise the data.

Data exploration and feature selection
• Develop overall trends and correlations; identify features and key tags.
• Conduct auto-correlation analysis and treat missing data.
• Explore correlations among the selected features.

Classification: good vs poor
• When identifying a good operating zone, segregate the data with respect to good vs poor operating conditions.
• Examine the segregated clusters and their behaviours in detail.

Variable importance and optimisation
• Identify the contributing parameters that cause deviations from optimal performance, then tune and validate the model accordingly.
• Translate data science outcomes into operational and technology language.

Model validation and deployment
Once the model is validated, it is deployed online. The deployed model provides insights and guides operators towards the good operating zone in real time.

Hybrid models – best of both worlds
The hybrid process model combines the advantages of first principles models and data science models to improve the overall accuracy of chemical process models, overcoming some of the disadvantages and limitations of each approach. Hybrid models can easily adapt to historical and real-time data from plants to generate reliable, prescriptive advice that helps to continuously optimise plant operations.

Figure 4 Data analytics model development stages: data extraction; data exploration and feature selection; classification (good vs poor); variable importance and optimisation; model validation and deployment
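The data extraction and sanitisation steps described above can be sketched in a few lines of pandas. This is a minimal illustration, not the authors' implementation: the tag names (`reactor_temp`, `feed_rate`, `product_purity`) and the idea of merging a high-frequency historian feed with sparse LIMS results are hypothetical examples.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly historian data and two-hourly LIMS lab results
historian = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=6, freq="h"),
    "reactor_temp": [350.0, 352.0, np.nan, 355.0, 354.0, 353.0],
    "feed_rate": [100.0, 101.0, 99.0, np.nan, 102.0, 100.5],
})
lims = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=3, freq="2h"),
    "product_purity": [99.1, 98.7, 99.3],
})

# ETL: align the two sources on time; each historian row picks up the
# most recent lab result (merge_asof carries LIMS values forward)
data = pd.merge_asof(historian.sort_values("timestamp"),
                     lims.sort_values("timestamp"), on="timestamp")

# Sanitise: interpolate short gaps in the process signals
cols = ["reactor_temp", "feed_rate"]
data[cols] = data[cols].interpolate()

print(data.isna().sum().sum())  # remaining missing values → 0
```

In practice the merge direction and interpolation limits would depend on the sampling rates and acceptable gap lengths for each tag.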
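The multivariable analysis mentioned above, reducing process data to its essential dimensions, is commonly done with principal component analysis. The sketch below is an assumed illustration with synthetic data: ten correlated tags that are really driven by two underlying factors, which PCA recovers.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic process data: 200 samples of 10 tags generated from only
# 2 hidden factors (e.g. feed quality and operating severity) plus noise
factors = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 10))
tags = factors @ loadings + 0.05 * rng.normal(size=(200, 10))

# Keep just enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
scores = pca.fit_transform(tags)

print(scores.shape[1])  # essential dimensions driving the process
```

Because the noise is small relative to the two driving factors, PCA keeps only two components, which is exactly the "minimum critical set of variables" idea: operators can monitor two scores instead of ten raw tags.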
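The classification and variable-importance stages can likewise be sketched with a tree-based classifier. This is a schematic example on synthetic data, not the modelling approach the article prescribes: labels mark good vs poor operation, and the fitted model's feature importances point to the parameter driving the deviation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic data: 3 process tags; only tag 0 actually determines
# whether operation falls in the good (1) or poor (0) zone
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Variable importance: which tag contributes most to good vs poor?
importance = clf.feature_importances_
print(int(np.argmax(importance)))  # index of the dominant tag → 0
```

On real plant data the labels would come from segregating historical periods of good and poor operation, and the importance ranking would be validated against process knowledge before being translated into operational guidance.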
Refining India