A UNECE guide, Machine Learning for Official Statistics, released today, will help national and international statistical organizations to harness the power of machine learning (ML) to modernize the production of official statistics.
As found by a 2021 project led by the United Kingdom and UNECE, ML holds enormous potential for modernizing the production of statistics, which can be very time- and resource-intensive when relying on conventional approaches. Statistical offices are increasingly exploring the addition of ML to their arsenal of tools to process very large datasets - such as price information gathered automatically or 'scraped' from online stores - since conventional statistical techniques and tools prove prohibitively expensive in the face of such large volumes of data.
The potential is especially great for processes that require 'human-like' decision-making, such as reading a textual description and assigning it to a category, or looking at an image to identify what it represents. Traditionally, this has been done either manually or through a complex rule-based system, both of which are costly, time-consuming and hard to manage.
Advances in ML have been well publicized. Many of us already know that computers have learned to paint in the style of Rembrandt, to write articles just like humans and to determine the 3D shape of proteins. But how do these exciting findings translate to the more mundane world of official statistics produced by National Statistical Offices (NSOs)? NSOs are governed by the Fundamental Principles of Official Statistics and carry a great weight of responsibility for each number they produce, so they cannot fall prey to hype.
A series of 21 pilot studies has been conducted as part of two initiatives: the UNECE High-Level Group for the Modernisation of Official Statistics (HLG-MOS) Machine Learning Project (2019-20) and the United Kingdom Office for National Statistics (ONS) - UNECE Machine Learning Group 2021. These studies made clear that for official statistics, the real difficulty begins when NSOs try to move from innovative, fun 'experiments' to everyday production of statistics. For this to happen, the exciting new solutions must be connected seamlessly into regular business processes. The significant changes in infrastructure, organizational structure and culture required to make this happen mean that many machine learning solutions for statistics end up being left on the shelf.
The new Guide draws on the findings of these pilot studies to identify the specific barriers to adoption in NSOs and offer recommendations for tackling them. The key message is that advancing ML in official statistics depends on two things.
The first is acceptance: ultimately, ML will only be used if it is widely accepted, from both a statistical and an ethical point of view. The Guide identifies a range of factors that are key to such acceptance, including the ability to clearly demonstrate added value, a visible and transparent respect for ethical and legal considerations, and the alignment of the innovations with the business needs of the NSO.
The second prerequisite is facilitation. This means fostering a setting in which NSOs can make best use of the potential of ML. This comes from ensuring that offices have the right skills among their staff, the necessary computing infrastructure, a corporate strategy that favours research and development, and strong engagement throughout the NSO from technical staff to senior managers.
The Guide, and the project on which it is based, concludes that ML in official statistics is more than mere buzz; it is a must where it can add value, but it should not be used where it does not.
Figure: Keys to the acceptance of machine learning in official statistics
The field of ML is evolving fast, with new methods, platforms and approaches coming out every month. To keep up with the pace of change and avoid duplication of effort, there is a great need for knowledge sharing and collaboration within the official statistics community. UNECE remains at the helm of these efforts this year through the Machine Learning Group 2022, run in partnership with the UK's ONS, to help statistical organizations harness the power of machine learning.