GECCO '18: Proceedings of the Genetic and Evolutionary Computation Conference Companion
SESSION: Competition entry: Internet of Things: online anomaly detection for drinking water quality
Online anomaly detection for drinking water quality using a multi-objective machine learning approach
This paper proposes the use of multi-objective machine learning to solve the problem of online anomaly detection for drinking water quality. The problem involves an imbalanced data set in which events, the minority class, must be correctly detected from a time series of water quality and operational data. To develop two different robust systems, signal processing and feature engineering are used to prepare the data, while evolutionary multi-objective optimization is used for feature selection and ensemble generation. The proposed systems are tested with hold-out validation during optimization and are expected to generalize well to future test data.
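The abstract does not name the evolutionary multi-objective algorithm used. As an illustration only, a minimal Pareto-based feature-selection loop might look like the sketch below: individuals are feature bit-masks, variation is a single bit-flip, and the two minimized objectives are a stand-in validation error (`error_fn`, hypothetical) and the number of selected features.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(masks, objs):
    """Return the masks whose objective vectors are not dominated."""
    return [masks[i] for i in range(len(masks))
            if not any(dominates(objs[j], objs[i])
                       for j in range(len(masks)) if j != i)]

def evolve(n_features, error_fn, generations=50, pop_size=20, seed=0):
    """Toy EMO feature selection: minimize (error_fn(mask), #features).
    This is an illustrative sketch, not the authors' actual system."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # One bit-flip mutation per parent produces the offspring.
        children = []
        for mask in pop:
            child = mask[:]
            i = rng.randrange(n_features)
            child[i] = not child[i]
            children.append(child)
        union = pop + children
        objs = [(error_fn(m), sum(m)) for m in union]
        # Elitism: keep the non-dominated set, refill at random.
        pop = pareto_front(union, objs)[:pop_size]
        while len(pop) < pop_size:
            pop.append(rng.choice(union))
    objs = [(error_fn(m), sum(m)) for m in pop]
    return pareto_front(pop, objs)
```

In a real system, `error_fn` would train a classifier on the selected features and return its hold-out validation error; here it is left abstract.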
This paper proposes a deep BiLSTM ensemble method to detect anomalies in drinking water quality. First, a convolutional neural network (CNN) is utilized as a feature extractor to process the raw water quality data. Second, a bidirectional Long Short-Term Memory network (BiLSTM) is employed to handle the time-series prediction problem. Then, a linear combination of the predictions from the last t time steps, weighted by a discount factor, is used to ensemble the final event output. Finally, cost-sensitive learning combined with Adam optimization is applied to train the model, accounting for the class imbalance of the event labels.
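The discounted ensembling step can be sketched in a few lines. The default discount value (0.9) and the normalization of the weights are assumptions made for illustration; the abstract does not give these details.

```python
def discounted_ensemble(step_preds, gamma=0.9):
    """Linearly combine the predictions made for the same event at the
    last t time steps. step_preds[k] is the prediction made k steps ago,
    so older predictions are down-weighted by gamma**k. Normalizing the
    weights to sum to 1 is an assumption, not taken from the paper."""
    weights = [gamma ** k for k in range(len(step_preds))]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, step_preds)) / total
```

For example, with gamma=0.5 the most recent of two predictions receives twice the weight of the older one.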
This paper evaluates anomaly detection approaches for drinking-water quality. Two major machine learning techniques are compared: manual feature engineering with feature subset selection for dimensionality reduction, and automatic feature learning through a recurrent neural network. Both methods incorporate the time domain for change detection. Preliminary results show superior performance for automatic feature learning, with an F1 score of 80%. While the feature set proposed in this work outperforms naive classification with the original features, further analysis is needed to reach performance comparable to the automatic approach.
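The F1 score used to compare the approaches is the harmonic mean of precision and recall on the positive (event) class, which is why it suits this imbalanced setting: true negatives, the overwhelming majority, do not enter the score. A minimal computation:

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), computed on the
    positive (event/anomaly) class, with labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is 0, so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, a detector that finds two of three events with no false alarms scores precision 1.0, recall 2/3, and hence F1 = 0.8.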