26–27 Feb 2020
RMIB
Europe/Brussels timezone

AI for the Quality Control of Surface Data

Not scheduled
20m
Conference Room (RMIB)

Ringlaan 3 B-1180 Brussels Belgium

Speaker

Dr Christian Sigg (MeteoSwiss)

Description

MeteoSwiss is engaged in a multi-year effort to renew its automated quality control (AQC) system for surface data.
Our AQC used to be a classical expert system built from hand-designed rules. The ruleset follows WMO recommendations and includes tests for physical and climatological limits as well as tests for the consistency between measurements. Although a lot of domain expertise went into the design of each rule, the performance of the system as a whole remained unsatisfactory. A quantitative analysis showed that simple rules (such as tests for physical limits) perform well, whereas complex rules perform poorly and are also difficult to maintain (the combined rule definitions took up 70,000 rows in an RDBMS table).
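For illustration, a simple rule of the kind that performs well might look like the sketch below. The parameter names, the limit values and the handling of missing values are assumptions made for this example and are not taken from the MeteoSwiss ruleset.

```python
from typing import Optional

# Illustrative limits only (assumed for this sketch,
# not the operational MeteoSwiss values).
PHYSICAL_LIMITS = {
    "air_temperature_2m_degC": (-80.0, 60.0),
    "relative_humidity_percent": (0.0, 100.0),
}


def passes_limit_test(parameter: str, value: Optional[float]) -> bool:
    """Return True if the value lies within the physical limits of the parameter."""
    if value is None:
        # Assumption: a missing value cannot be confirmed as plausible.
        return False
    lower, upper = PHYSICAL_LIMITS[parameter]
    return lower <= value <= upper
```

A rule like this has a single, easily audited threshold pair per parameter, which may be one reason such tests stay maintainable while rules that combine many conditions do not.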
We are therefore replacing complex rules with machine learning (ML) models that are trained on past measurements and expert feedback. These data-driven models achieve much better performance and allow for an explicit trade-off between costs (false positives and negatives) and benefits (true positives). As examples, we present an SVM model to detect spurious precipitation in a weighing rain gauge and an XGBoost model to validate climatological snow measurement series. We also present a spatio-temporal consistency test for 2 m air temperature where the data is its own model.
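The explicit trade-off between costs and benefits can be sketched as a choice of decision threshold on a model score. The example below is a minimal illustration, assuming that a model outputs a score per measurement (higher means more suspicious), that expert feedback provides labels (1 for erroneous, 0 for correct), and that hypothetical unit costs for false positives and false negatives are given. None of these names or values come from the MeteoSwiss system.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Sum the costs of all wrong decisions when flagging scores >= threshold."""
    cost = 0.0
    for score, is_erroneous in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_erroneous:
            cost += cost_fp  # correct measurement flagged: false positive
        elif not flagged and is_erroneous:
            cost += cost_fn  # erroneous measurement missed: false negative
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost.

    Candidates are the observed scores plus infinity (flag nothing).
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn),
    )


# Toy data: model scores and expert labels (1 = erroneous measurement).
scores = [0.1, 0.4, 0.6, 0.9]
labels = [0, 1, 0, 1]

# Missing an error is expensive: the threshold moves down.
low = best_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0)
# False alarms are expensive: the threshold moves up.
high = best_threshold(scores, labels, cost_fp=5.0, cost_fn=1.0)
```

With these toy values, making missed errors costlier lowers the selected threshold, and making false alarms costlier raises it. The same model thus serves different operating points without retraining.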
We also replaced our flag-based quality information (QI) with a probabilistic QI. Using the Naïve Bayes approximation, the AQC computes the probability that a measurement is plausible given all available test outcomes. This probabilistic plausibility combines prior knowledge, the output of several independent AQC systems and expert feedback into a QI summary for our users that is both simple and well-defined.
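One way to read the Naïve Bayes combination is sketched below. It assumes that each test is binary (failed or passed), that its false alarm rate and detection rate are known, and that test outcomes are conditionally independent given whether the measurement is plausible. The function name, the parameterisation and the numbers are illustrative assumptions, not the MeteoSwiss implementation.

```python
import math


def plausibility(prior, tests):
    """Posterior probability that a measurement is plausible.

    prior: P(plausible) before any test is run, strictly between 0 and 1.
    tests: iterable of (failed, false_alarm_rate, detection_rate), where
        false_alarm_rate = P(test fails | measurement plausible) and
        detection_rate   = P(test fails | measurement implausible),
        both strictly between 0 and 1.
    """
    log_plausible = math.log(prior)
    log_implausible = math.log(1.0 - prior)
    for failed, false_alarm_rate, detection_rate in tests:
        if failed:
            log_plausible += math.log(false_alarm_rate)
            log_implausible += math.log(detection_rate)
        else:
            log_plausible += math.log(1.0 - false_alarm_rate)
            log_implausible += math.log(1.0 - detection_rate)
    # Normalise in a numerically stable way.
    m = max(log_plausible, log_implausible)
    p = math.exp(log_plausible - m)
    q = math.exp(log_implausible - m)
    return p / (p + q)


# One failed test with a 1% false alarm rate and a 90% detection rate.
after_failure = plausibility(0.99, [(True, 0.01, 0.9)])
# The same test passing instead.
after_pass = plausibility(0.99, [(False, 0.01, 0.9)])
```

In this sketch, a single failed test lowers the plausibility from the prior of 0.99 to roughly 0.52, while a passed test raises it slightly above the prior. Further test outcomes or expert feedback would enter as additional factors in the same product.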
We discuss issues that arise when deploying ML models in the operational data-processing chain. It is tempting to train models that rely on a multitude of data sources, and doing so typically increases the raw model performance during development. But it also creates technical debt that becomes visible after deployment: the models are exposed to more missing data, to changing data distributions (e.g. due to an instrument replacement or a product version change) and to comparison measurements that are themselves implausible.
We also discuss the still-important role of manual QC. Our climatological series continue to receive expert inspection, even though the conditioning of models can reduce the risk of filtering out extreme values. Expert feedback also forms the basis for evaluating and training our models, creating a symbiotic relationship in which the AQC in turn reduces the number of cases that need expert inspection.
We end with a quick survey of other AI-related activities in the Surface Data group at MeteoSwiss, including the extraction of prevailing visibility from webcams with the help of a deep neural network classifier, the detection and classification of pollen from holographic images, and the use of generative adversarial networks for the photo-realistic visualization of weather forecasts.

Primary author

Dr Christian Sigg (MeteoSwiss)
