Training Window Adjusted for Migration Model

This page explains how we determined the new training window for the international migration model. The training window is the rolling period of historical border crossing data used to train the model for provisional migration estimates. Our goal is to reduce the size of the initial revisions and standard errors by adjusting this window.

Since the international migration model has been operating - starting with the November 2018 provisional migration estimates (published in January 2019) - the training window has been the 36 most recent finalised months.

Starting with the June 2024 provisional migration estimates (to be published on 13 August 2024), Stats NZ will adopt a new training window which includes the 32 most recent finalised months, plus finalised border crossings from the four oldest provisional months. This change in the training window produces smaller standard errors and revisions for provisional estimates. Final migration estimates are unaffected by this change, because in those estimates the migrant status of all travellers is certain and no modelling is required.

How is migration measured?

The outcomes-based measure of migration, with provisional and final estimates, is the official way we measure international migration in New Zealand. To classify a border crossing as a migrant movement, we need to observe up to 16 months of travel after that border crossing. It therefore takes 17 months before final migration estimates are available, using the 12/16-month rule.

We use a machine-learning model to classify travellers whose migrant status is uncertain. The proportion of travellers in each month who have an uncertain migrant status, and thus require modelling, is highest in the most recent months being estimated.

The model learns the features of border crossings that make them more or less likely to be migrant crossings, by looking at historical arrivals and departures for which the migrant status is known (finalised). Since the outcomes-based measure was implemented, a three-year (36-month) rolling window of historical journeys has been used to train the model. The window ends with the most recent month for which migrant status has been finalised. For example, for migration estimates up to May 2024 (published in July 2024), the training window was from February 2020 to January 2023 inclusive.
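
The window dates in this example can be reproduced with simple month arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the last finalised month is always 16 months before the latest estimate month, as in the May 2024 example.

```python
from datetime import date

FINALISATION_LAG = 16  # assumed gap (months) between last finalised month and latest estimate month
WINDOW_LENGTH = 36     # months in the original rolling training window


def add_months(month_start: date, offset: int) -> date:
    """Shift a first-of-month date by a (possibly negative) number of months."""
    index = month_start.year * 12 + (month_start.month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def rolling_window(latest_estimate_month: date) -> tuple[date, date]:
    """Return the (first, last) month of the 36-month finalised training window."""
    last_final = add_months(latest_estimate_month, -FINALISATION_LAG)
    first = add_months(last_final, -(WINDOW_LENGTH - 1))
    return first, last_final


# Estimates up to May 2024, as in the example above.
first, last = rolling_window(date(2024, 5, 1))
print(first.isoformat(), "to", last.isoformat())
```

For May 2024 this gives a window from February 2020 to January 2023, matching the dates quoted above.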

Revisions and standard errors

Through early 2024, we observed higher than usual revisions (estimates for November 2023 and December 2023) and increased standard errors (estimates for March 2024, April 2024, and May 2024) for the international migration model. This was largely due to the impact of COVID-19 border restrictions (March 2020 to July 2022) on our training window.

For example, the first revision of December 2023 monthly migrant departures was down 31 percent, compared with December 2019 (down 17 percent) and December 2018 (down 11 percent). Table 1 shows the standard errors using March 2024 as an example.

Table 1

Standard errors in first estimates for international migration model, migration by direction, March month

Direction            Standard error,    Range of standard errors,
                     March 2024         March months 2019-2023
Migrant arrivals     1,409              350-600
Migrant departures   941                150-400
Net migration        1,517              200-700
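
To put the Table 1 figures in proportion, the short sketch below divides each March 2024 standard error by the upper end of the 2019-2023 range. The variable names are illustrative, and the ratio is only a rough indicator, not a measure used in the original analysis.

```python
# (March 2024 standard error, upper end of the March 2019-2023 range), from Table 1.
TABLE_1 = {
    "Migrant arrivals": (1409, 600),
    "Migrant departures": (941, 400),
    "Net migration": (1517, 700),
}

# Ratio of the March 2024 standard error to the largest standard error in earlier March months.
ratios = {direction: se_2024 / upper for direction, (se_2024, upper) in TABLE_1.items()}

for direction, ratio in ratios.items():
    print(f"{direction}: {ratio:.1f} times the previous upper bound")
```

In each direction the March 2024 standard error is more than double the largest value seen in the previous five March months.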

These elevated revisions and standard errors suggest the training data was less representative of current traveller behaviour than in the past. For provisional estimates published in late 2023 to early 2024, the training window spanned the entire period of New Zealand's COVID-19 pandemic-related border restrictions. During the pandemic period, the proportion of border crossings that were migrants was significantly higher than before and after (Figure 1), and this impacted the provisional migration estimates.

The percentage of border crossings classified as migrants or non-migrants between January 2018 and May 2023 has varied:

  • From January 2018 to February 2020, 1 to 2 percent of border crossings were migrants.
  • From March 2020 to March 2022, migrants ranged from 3 to 43 percent, which reflected greatly reduced short-term (less than 12 months) border crossings rather than more migrants.
  • After March 2022, the percentage of migrants ranged between 3 and 6 percent.

Figure 1

Month-Year   Non-migrants (%)   Migrants (%)
Jan-18       98                 2
Feb-18       98                 2
Mar-18       98                 2
Apr-18       99                 1
May-18       98                 2
Jun-18       98                 2
Jul-18       98                 2
Aug-18       98                 2
Sep-18       98                 2
Oct-18       98                 2
Nov-18       98                 2
Dec-18       98                 2
Jan-19       98                 2
Feb-19       98                 2
Mar-19       99                 1
Apr-19       99                 1
May-19       98                 2
Jun-19       98                 2
Jul-19       98                 2
Aug-19       98                 2
Sep-19       98                 2
Oct-19       98                 2
Nov-19       98                 2
Dec-19       98                 2
Jan-20       98                 2
Feb-20       98                 2
Mar-20       97                 3
Apr-20       94                 6
May-20       85                 15
Jun-20       81                 19
Jul-20       73                 27
Aug-20       66                 34
Sep-20       63                 37
Oct-20       64                 36
Nov-20       61                 39
Dec-20       59                 41
Jan-21       60                 40
Feb-21       57                 43
Mar-21       59                 41
Apr-21       86                 14
May-21       94                 6
Jun-21       93                 7
Jul-21       92                 8
Aug-21       82                 18
Sep-21       64                 36
Oct-21       64                 36
Nov-21       65                 35
Dec-21       65                 35
Jan-22       72                 28
Feb-22       78                 22
Mar-22       88                 12
Apr-22       94                 6
May-22       95                 5
Jun-22       96                 4
Jul-22       97                 3
Aug-22       97                 3
Sep-22       97                 3
Oct-22       97                 3
Nov-22       97                 3
Dec-22       97                 3
Jan-23       97                 3
Feb-23       96                 4
Mar-23       96                 4
Apr-23       97                 3
May-23       96                 4
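
The three period ranges quoted before Figure 1 can be checked directly against the figure's data. The sketch below copies the migrant percentages from Figure 1; the helper names are illustrative.

```python
# Migrant share of border crossings (percent), Jan-18 to May-23, copied from Figure 1.
MIGRANT_PCT = [
    2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,              # 2018
    2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,              # 2019
    2, 2, 3, 6, 15, 19, 27, 34, 37, 36, 39, 41,      # 2020
    40, 43, 41, 14, 6, 7, 8, 18, 36, 36, 35, 35,     # 2021
    28, 22, 12, 6, 5, 4, 3, 3, 3, 3, 3, 3,           # 2022
    3, 4, 4, 3, 4,                                   # 2023 (Jan to May)
]


def month_index(year: int, month: int) -> int:
    """Position of a month in MIGRANT_PCT, where January 2018 is index 0."""
    return (year - 2018) * 12 + (month - 1)


def share_range(start: tuple[int, int], end: tuple[int, int]) -> tuple[int, int]:
    """Return (min, max) migrant percentage between two (year, month) pairs, inclusive."""
    values = MIGRANT_PCT[month_index(*start): month_index(*end) + 1]
    return min(values), max(values)


pre_covid = share_range((2018, 1), (2020, 2))
restricted = share_range((2020, 3), (2022, 3))
post_covid = share_range((2022, 4), (2023, 5))
print(pre_covid, restricted, post_covid)
```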

Including the months most impacted by COVID-19 (with an unstable migration pattern) in the training data results in high revisions and high standard errors. Excluding the oldest months and including mostly finalised months, whose migration better reflects post-COVID-19 migration dynamics, could improve the earlier estimates. We conducted experiments to determine a more appropriate training window.

Training window experiment

We explored a range of models with different lengths and types of training window and compared them with the existing 36-month model:

  • 24-month model: rolling training window of the 24 most recent finalised months.
  • 32+4-month model: rolling training window of the 32 most recent finalised months plus finalised crossings from the four oldest provisional months.
  • 36+4-month model: rolling training window of the 36 most recent finalised months plus finalised crossings from the four oldest provisional months.
  • 36-fixed month model: fixed training window with 36 months of finalised months from January 2016 to December 2018.
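
As a rough illustration of how these candidate windows differ, the sketch below computes the first and last month each one would cover for the June 2024 provisional estimates. It is a minimal sketch with hypothetical names, assuming the last finalised month is 16 months before the latest estimate month. It only works out month ranges; in the "+4" models only crossings with finalised migrant status would be taken from the four provisional months.

```python
FINALISATION_LAG = 16  # assumed gap (months) between last finalised month and latest estimate month


def shift(month: tuple[int, int], offset: int) -> tuple[int, int]:
    """Shift a (year, month) pair by a (possibly negative) number of months."""
    index = month[0] * 12 + (month[1] - 1) + offset
    return index // 12, index % 12 + 1


def candidate_windows(latest_estimate_month: tuple[int, int]) -> dict:
    """Return {model name: (first month, last month)} for the four candidate models."""
    last_final = shift(latest_estimate_month, -FINALISATION_LAG)
    return {
        "24-month": (shift(last_final, -23), last_final),
        "32+4-month": (shift(last_final, -31), shift(last_final, 4)),
        "36+4-month": (shift(last_final, -35), shift(last_final, 4)),
        "36-fixed": ((2016, 1), (2018, 12)),  # does not move with the estimate month
    }


windows = candidate_windows((2024, 6))
for name, (first, last) in windows.items():
    print(f"{name}: {first[0]}-{first[1]:02d} to {last[0]}-{last[1]:02d}")
```

Under these assumptions, the 32+4-month window for June 2024 runs from July 2020 to June 2023, so it still spans 36 months in total.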

Key findings

  • The 24-month model shows improvements in revisions and standard errors. However, analysis carried out in 2019 (before the 36-month model was implemented) suggested that a 24-month model gives lower accuracy and higher relative errors compared with the 36-month model.
  • The 32+4-month model shifts the training window forward by 4 months. Over 99 percent of border crossings have finalised migrant status in the four oldest provisional months, allaying concerns that this model is biased to the simplest travel histories. This reduces the time between the final training month and the first estimation month from 16 to 12 months.
  • The 36+4-month model covers 40 months, so 8 calendar months each appear three times in the training data and the other 4 calendar months each appear four times. This results in an uneven distribution of training data, with a disproportionate amount of information concentrated in those 4 months.
  • The 36-fixed month model produces much higher standard errors than the 36-month model and will not reflect changes in migration patterns as we move further away from the 2016 to 2018 training window.
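
The uneven calendar coverage of a 40-month window can be seen by counting how often each calendar month occurs. In the sketch below the function name and the March starting month are arbitrary choices for illustration; any starting month gives the same split.

```python
from collections import Counter


def calendar_month_counts(first_month: int, length: int) -> Counter:
    """Count how often each calendar month (1-12) appears in a run of consecutive months."""
    return Counter((first_month - 1 + i) % 12 + 1 for i in range(length))


counts_36 = calendar_month_counts(first_month=3, length=36)
counts_40 = calendar_month_counts(first_month=3, length=40)

# Summarise as {number of occurrences: how many calendar months have that count}.
print(dict(Counter(counts_36.values())))
print(dict(Counter(counts_40.values())))
```

A 36-month window covers every calendar month exactly three times, whereas a 40-month window gives four of the calendar months an extra occurrence.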

Comparison of 36-month and 32+4-month rolling training windows

Revisions

Figure 2 shows international migrant estimates by direction for models using the 36-month and the 32+4-month rolling training window, for the November 2023, December 2023, and January 2024 months.

Figure 2

This three-by-three matrix of line graphs shows estimates of international migration by direction, monthly, November 2023-January 2024. See the text alternative under image.

Text alternative for Figure 2 Estimates of international migration by direction, monthly, November 2023-January 2024
The graphs show monthly revisions to provisional migrant arrivals, migrant departures, and net migration for each month from November 2023 to January 2024 using two different training windows. The data is shown in a three-by-three matrix of line graphs where the columns represent November 2023 (left), December 2023 (middle), and January 2024 (right); and the rows represent migrant arrivals (top), migrant departures (middle), and net migration (bottom). The vertical axis shows the estimated number of migrants. The time of each estimate is labelled on the horizontal axis. Source: Stats NZ.

For November 2023 and December 2023, the model using the 32+4-month rolling training window produces lower estimates of migrant arrivals and departures than the model using the 36-month rolling training window for the first through fourth estimates. For January 2024 arrivals, the model using the 32+4-month rolling training window produces higher estimates than the model using the 36-month rolling training window for the first through fourth estimates.

Estimates are more aligned between the two models from the fifth estimate onwards.

The second estimates of November 2023 departures, December 2023 departures, December 2023 arrivals, and January 2024 departures were closer than the first estimates, indicating smaller revision sizes.

Standard errors

For migrant arrivals, migrant departures, and net migration, the model using the 32+4-month rolling training window produces lower standard errors than the model using the 36-month rolling training window for March, April, and May 2024.

Figure 3

Standard errors by training window

Month-Year   Direction     32+4-month rolling   36-month rolling
                           training window      training window
Mar-2024     Arrivals      328                  1409
Mar-2024     Departures    266                  941
Mar-2024     Net           407                  1517
Apr-2024     Arrivals      313                  1527
Apr-2024     Departures    265                  724
Apr-2024     Net           423                  1758
May-2024     Arrivals      326                  906
May-2024     Departures    501                  716
May-2024     Net           616                  1142
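
One way to summarise Figure 3 is the percentage reduction in standard error when moving from the 36-month to the 32+4-month window. The sketch below uses the Figure 3 values; the function and variable names are illustrative, and percentage reduction is only one possible summary, not a measure reported in the original analysis.

```python
# (32+4-month standard error, 36-month standard error), copied from Figure 3.
STANDARD_ERRORS = {
    ("Mar-2024", "Arrivals"): (328, 1409),
    ("Mar-2024", "Departures"): (266, 941),
    ("Mar-2024", "Net"): (407, 1517),
    ("Apr-2024", "Arrivals"): (313, 1527),
    ("Apr-2024", "Departures"): (265, 724),
    ("Apr-2024", "Net"): (423, 1758),
    ("May-2024", "Arrivals"): (326, 906),
    ("May-2024", "Departures"): (501, 716),
    ("May-2024", "Net"): (616, 1142),
}


def reduction_pct(new_se: float, old_se: float) -> float:
    """Percentage by which new_se is smaller than old_se."""
    return 100 * (old_se - new_se) / old_se


reductions = {key: reduction_pct(new, old) for key, (new, old) in STANDARD_ERRORS.items()}

for (month, direction), pct in reductions.items():
    print(f"{month} {direction}: {pct:.0f}% lower")
```

Every month and direction shows a reduction, with the smallest gain for May 2024 departures.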

Conclusion

In most cases, the model using the 32+4-month rolling training window gives smaller revisions and lower standard errors than the model using the 36-month rolling training window. In other periods tested, it performs at least as well as the 36-month rolling training window. Model estimates using the 36-month and 32+4-month rolling training windows are similar to each other after the fourth estimate.

This change in training window is intended to provide long-term improvements and stability in the accuracy of migration estimates. Regardless of which training window is selected, the timeliness of provisional migration estimates means that revisions are inevitable.

The 32+4-month rolling training window reduces the size of initial revisions and standard errors and will be used from June 2024 provisional estimates (to be published on 13 August 2024).

Enquiries

Dave Adair
0508 525 525
[email protected]

ISBN 978-1-991307-06-4

Stats NZ Public Release.