As is customary in Australia, I acknowledge the Ngunnawal people, on whose lands I am recording these remarks, and all First Nations people joining this international workshop.
Thank you to our OECD public management and budgeting colleagues - Jon Blondal, Andrew Blazey and the team - for helping to coordinate this event and for offering me the opportunity to provide this opening address. This event is being run by the OECD in collaboration with the Australian Centre for Evaluation in the Department of the Treasury. The Australian Government is delighted to be contributing to global efforts to advocate for better evidence. And we are keen to connect with international endeavours that promote its generation, synthesis and sharing in public policy.
Today, I want to discuss how countries can collaborate to better create and use evidence. This is a substantial reform agenda. Indeed, I argue that the rise of randomised trials and the better use of evidence isn't just another worthy public policy tweak. It's bigger than that. Much bigger. Effectively using evidence to make policy decisions is a public administration reform on par with the biggest changes in good government that humanity has put into place. It is the seventh phase of good government.
Let's take a quick moment to run through the major milestones in the history of public administration.
Six big reforms in the history of public administration
Throughout history, there have been 6 big reforms in public administration.
The first was the rise of bureaucracy and professionalised governance. During the 18th and 19th centuries, public administration shifted away from patronage and informal systems towards impartiality, specialisation and accountability. Democratic institutions and a robust civil society provided the conditions for an independent and accountable civil service.
The second big reform occurred in the early 20th century: the efficiency revolution. Inspired by industrial principles, scientific management brought a focus on efficiency and rational organisation to public administration.
In response to economic crises and post‑WWII recovery, we saw the rise of the third big reform - the welfare state and the expansion of government responsibilities in social welfare, healthcare and economic planning.
The fourth big reform in public administration in the late 20th century was market‑oriented governance. We saw governments adopt private‑sector practices like outsourcing, performance metrics, and competition.
Concerns about accountability also carried through to the fifth big reform - the era of digital transformation and e‑governance. In the early 21st century, technology revolutionised public administration, enabling data‑driven decision‑making and citizen engagement.
Building on the lessons learnt during the digital transformation, the past decade has seen the move towards adaptive governance - the sixth big reform in public administration. Top‑down processes were swapped out for more flexible, collaborative and cross‑sector approaches that embrace 'long‑term systems thinking' to address interconnected crises such as climate change (Brunner and Lynch 2017).
Each of these 6 big reforms from the past 3 centuries has helped to reshape government and improve citizens' lives.
The seventh big reform in public administration: randomised trials
Today I want to argue that we are on the cusp of a seventh big reform in public administration.
It will involve the widespread adoption of randomised trials as a means of testing policies by providing a counterfactual.
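To see why randomisation supplies a counterfactual, consider a minimal simulation. This is an illustrative sketch with made‑up numbers, not data from any real trial: because participants are assigned to a program purely by chance, the control group shows what would have happened to the treated group without it, so a simple difference in means recovers the program's effect.

```python
# Illustrative sketch: how random assignment creates a counterfactual.
# All numbers here are invented for demonstration; no real trial data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n = 1000                        # hypothetical participants
treated = rng.random(n) < 0.5   # coin-flip assignment to the program

# Hypothetical outcome (say, monthly earnings): a noisy baseline plus a
# true program effect of +50 for those assigned to treatment.
baseline = rng.normal(500, 100, n)
outcome = baseline + np.where(treated, 50, 0)

# Because assignment was random, the groups differ only by chance, so the
# control mean stands in for the treated group's no-program outcome.
effect = outcome[treated].mean() - outcome[~treated].mean()
t_stat, p_value = stats.ttest_ind(outcome[treated], outcome[~treated])
print(f"Estimated effect: {effect:.1f} (true effect 50, p = {p_value:.3g})")
```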
This reform should include the synthesis of quality evidence about what works, and what doesn't, to provide public administrators with robust knowledge that can improve people's lives.
Let's consider a couple of examples to see how this might work in practice.
Eye care is often a neglected field of public health in developing economies.
In rural Bangladesh, a randomised trial of providing free reading glasses involved more than 800 adults with jobs requiring close attention to detail, such as tea pickers, weavers, and seamstresses (Jacobs 2024). The study found that when workers were given free reading glasses, they earned 33 per cent more than those who were not given glasses (Sehrin et al. 2024).
Speaking to The New York Times, Dr Nathan Congdon, one of the study's authors, said that '…what makes the results especially exciting is the potential to convince governments that vision care interventions are as inexpensive, cost‑effective and life‑changing as anything else that we can offer in healthcare' (Jacobs 2024).
As well as garnering evidence on what does work, the widespread adoption of randomised trials must also include quality evidence about what doesn't work.
In 2014, the US state of Massachusetts launched a 4‑year intervention program called the Juvenile Justice Pay for Success Initiative (Patrick 2014). The program aimed to reduce recidivism and improve employment outcomes among young men at high risk of re‑offending (Third Sector 2024).
The initiative involved an experimental financial contract called 'Pay for Success' - also known as a social impact bond. Funders assumed the US$27 million up‑front financial risk, and the government would repay the cost of the program only if a third‑party evaluator and validator determined that the initiative had reduced the number of days the young men spent in jail and improved their employment and job readiness (Patrick 2014).
At the end of the 4‑year program, a randomised trial found no discernible effects on reincarceration or employment (Coalition for Evidence‑Based Policy 2025). Neither the recidivism nor the employment outcomes were sizable enough to trigger repayment under the pay‑for‑success contract (Roca Inc. et al. 2024).
Why randomised trials should be prioritised over other forms of evaluation
When the evaluation of a social program does not produce the hoped‑for results, it's difficult to avoid feelings of disappointment.
But this has been the reality for some time.
We know from the history of large, well‑conducted randomised trial evaluations that only a small percentage find that the intervention being evaluated produces a meaningful improvement over the status quo.
As Peter Rossi put it in his 1987 Iron Law of Evaluation: 'The expected value of any net impact assessment of any large‑scale social program is zero' (Arnold Ventures 2018a).
But here's the light on the hill.
The 'iron law' applies to most fields of research. That includes medicine, where 50-80 per cent of positive results from initial clinical studies are overturned by a subsequent randomised trial (Arnold Ventures 2018a).
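A back‑of‑the‑envelope Bayes calculation shows why overturn rates of this magnitude are unsurprising. The figures below are illustrative assumptions of mine, not numbers from the cited studies: if only a minority of candidate interventions truly work, even reasonably reliable initial studies will produce many positive findings that a confirmatory trial later fails to replicate.

```python
# Illustrative Bayes arithmetic: why many positive initial findings are
# later overturned. All inputs are assumptions chosen for demonstration.
base_rate = 0.10   # assumed share of candidate interventions that truly work
power = 0.80       # assumed chance a study detects a real effect
alpha = 0.05       # assumed chance a study reports a false positive

# Probability that an initial study comes back positive, and that a
# positive result reflects a real effect (Bayes' rule)
p_positive = base_rate * power + (1 - base_rate) * alpha
p_true = base_rate * power / p_positive

# Share of positive initial results that a same-quality confirmatory
# randomised trial would fail to replicate
p_confirm = p_true * power + (1 - p_true) * alpha
print(f"P(real effect | positive study) = {p_true:.2f}")              # ~0.64
print(f"Share of positive results overturned = {1 - p_confirm:.2f}")  # ~0.47
```

On these assumptions, nearly half of positive findings fail to replicate; with a lower base rate of truly effective interventions, or weaker initial studies, the overturn rate climbs into the range observed in medicine.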
In medicine, the move towards randomised trials continues to save lives and stop unnecessary interventions.
For every new treatment - such as AIDS drugs, the HPV vaccine and genetic testing - medicine has discarded old ones, like bloodletting, gastric freezing and tonsillectomy (Leigh 2018).
The willingness to test cures against placebos, or the best available alternative, is how we make progress. In public policy, we can do the same. If it works, we use it; if not, it's back to the lab.
The central goal of evaluation: finding interventions that work
The key is having a big, ambitious goal to strive towards.
I propose the primary goal of government evaluation should be to find interventions that work.
More specifically - to build a body of programs backed by strong, replicated randomised trial evidence of important, lasting improvements in people's lives.
In other words, evidence that provides policymakers with confidence that if another jurisdiction were to implement the program faithfully in a similar population, it would improve people's lives in a meaningful way.
Imagine being able to confidently draw from a codified body of social programs and interventions that your jurisdiction could test, deploy and regulate.
In the United States, the Coalition for Evidence‑Based Policy points to Saga Education, a high‑dosage mathematics tutoring program for year 9 and 10 students in low‑income US schools that underwent 3 rigorous randomised trials. The program produced sizable, statistically significant effects on students' maths scores on district tests at the end of the tutoring year (Arnold Ventures 2024a). I'll come back to this program a bit later.
Similarly, the Coalition for Evidence‑Based Policy points to 2 job‑training programs for low‑income adults that were both shown to increase long‑term earnings by 20 to 40 per cent. These programs focused on the fast‑growing IT and financial services sectors, where jobs are well paid, and employees are in high demand (Arnold Ventures 2022a and 2022b).
Finding interventions that work should be evaluators' central goal. It is the only plausible path by which rigorous evaluations will improve the human condition. If we don't allocate spending based on rigorous evidence, it is hard to see how governments can make progress on critical social problems.
Here in Australia, a think tank study examined a sample of 20 Australian Government programs conducted between 2015 and 2022 (Winzar et al. 2023).
The report concluded that 95 per cent of the programs, which had a total expenditure of over A$200 billion, were not properly evaluated. And its analysis of Australian state and territory government evaluations reported similar results.
The researchers noted that the problems with evaluation started from the outset of program and policy design. They also estimated that fewer than 1.5 per cent of government evaluations use a randomised design (Winzar et al. 2023).
This finding echoes the Australian Productivity Commission's 2020 report into the evaluation of Indigenous programs (Productivity Commission 2020).
This report concluded that 'both the quality and usefulness of evaluations of policies and programs affecting Aboriginal and Torres Strait Islander people are lacking', and that 'Evaluation is often an afterthought rather than built into policy design' (Productivity Commission 2020).
Finding what works: using strong signals from prior research
If we accept that the central goal of evaluation is to find interventions that work, there are important implications for researchers and research funders.
In particular, it makes sense to evaluate an intervention with a large randomised trial only if there is a strong signal in prior research.
Examples of prior research could include a pilot randomised trial, a high‑quality quasi‑experiment, or a randomised trial of a related program.
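One practical reason for this screening rule is statistical power, and the cost that flows from it. As a rough sketch (a textbook normal‑approximation formula with illustrative inputs, not any agency's actual methodology), the number of participants needed grows with the inverse square of the effect size, so confirmatory trials of weakly supported, small‑effect interventions quickly become very expensive.

```python
# Rough sample-size sketch: participants per arm needed to detect a
# standardised effect size d with a two-sided test. Textbook
# normal-approximation formula; all inputs are illustrative.
from scipy.stats import norm

def n_per_arm(d: float, alpha: float = 0.05, power: float = 0.80) -> float:
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the test
    z_power = norm.ppf(power)           # quantile for the desired power
    return 2 * ((z_alpha + z_power) / d) ** 2

for d in (0.5, 0.2, 0.1):
    print(f"effect size {d}: ~{n_per_arm(d):,.0f} participants per arm")
# effect size 0.5: ~63 | 0.2: ~392 | 0.1: ~1,570 participants per arm
```

Detecting an effect half as large requires roughly four times as many participants, which is why a strong prior signal - evidence that the effect is likely to be sizable - matters so much before committing to a full‑scale trial.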
This is the approach that Arnold Ventures is taking in the US via the Coalition for Evidence‑Based Policy, the US nonprofit relaunched under the leadership of Jon Baron (Coalition for Evidence‑Based Policy n.d.).
Rigorous testing has enabled Arnold Ventures to build a growing body of proven interventions in education and training (Coalition for Evidence‑Based Policy n.d.). The same approach is being used by the US Department of Education in its Investing in Innovation Fund, since renamed the Education Innovation and Research Program, and it has yielded a much higher success rate in identifying truly effective interventions. As of 2019, the Fund's robust evidence standards had resulted in positive impacts for 40 to 50 per cent of its larger grants.
Compare this to the US Department of Health and Human Services' Teen Pregnancy Prevention Program, which had a much lower hit rate of success - just 17 per cent - for its larger grants (Arnold Ventures 2019).
Arnold Ventures (2018b) proposes a strategy for policy and researchers that involves 3 tiers of evidence - top, middle and low.
Expand the implementation of programs backed by strong ('top tier') evidence of sizable, sustained effects on important life outcomes.
Fund and/or conduct rigorous evaluations of programs backed by highly promising ('middle tier') evidence, to hopefully move them into the top tier.
Build the pipeline of promising programs through modest investments in the development and initial testing of many diverse approaches (as part of a 'lower tier').
This is about systematising our use of evidence: a familiar approach in medicine, but one that has not been standard practice for all policymakers.
It is about producing tangible proof that randomised policy trials improve lives, in the same way that we already have tangible proof that randomised medical trials save lives.
As a specific example of this kind of approach, in the US state of Maryland, a partnership between Arnold Ventures and the state government is already scaling up proven programs.
In August last year, the high‑dosage maths tutoring program for 9th and 10th graders I mentioned earlier (Saga Education) and ASSISTments - an educational tool for mathematics - received scale‑up funding under the US$20 million Maryland Partnership for Proven Programs with Arnold Ventures (Arnold Ventures 2024b).
In the UK, the development of the What Works Network is a world‑leading achievement that owes much to a network of evidence‑based policymakers. That includes the extraordinary David Halpern, who will be speaking on the panel shortly (for an excellent snapshot of his recommendations for the coming decade, see Halpern 2023).
Across health and housing, education and employment, hundreds of UK randomised trials have been conducted. For a practitioner, policymaker or curious member of the British public, it is now easier than ever to see what we know, and what we do not (Leigh 2024a).
For example, the Education Endowment Foundation has run literally hundreds of randomised trials in the education sector. It uses these findings, alongside rigorous evaluations conducted outside the UK, to advocate for evidence‑based education policies (Education Endowment Foundation n.d.).
The Education Endowment Foundation has commissioned 316 research projects, 208 of which are randomised trials. Some 60 per cent of schools in England have taken part in a randomised trial funded by the Foundation, and 70 per cent of school leaders use its teaching and learning toolkit when deciding how to spend funding on pupils from disadvantaged backgrounds.
Here in Australia, we are committed to taking a stronger approach towards evidence‑based policymaking.
In July 2023 we established the Australian Centre for Evaluation in the Department of the Treasury.
The main role of the centre is to collaborate with other Australian Government departments to conduct rigorous evaluations, including randomised trials. Such agreements have already been forged with federal agencies responsible for employment, health, education and social services.
Led by Eleanor Williams, armed with a modest budget of A$2 million per year and just over a dozen staff, the Centre operates on smarts and gentle persuasion, not mandates or orders (Leigh 2024b).
No agency is forced to use the services of the Australian Centre for Evaluation, but all are encouraged to do so. This reflects the reality that evaluation, unlike audit, isn't something that can be done as an afterthought. A high‑quality impact evaluation needs to be built into the design of a program from the outset (Leigh 2024b).
The centre takes an active role in considering aspects that are relevant to all evaluations, such as rigorous ethical review and access to administrative microdata. The Australian Bureau of Statistics is playing a pivotal role in brokering access to administrative data for policy experiments.
Collaboration with evaluation researchers outside of government is critical, too. Thanks to a joint initiative by the Centre and the Australian Education Research Organisation, we now have the Impact Evaluation Practitioners Network, which is bringing together government and external impact evaluators.
The centre has several randomised trials currently underway, and I await the results with interest.
In the next month, the centre will release a Randomised Controlled Trial Showcase Report, featuring examples of public policy‑related trials in Australia.
Another organisation doing extraordinarily thorough research across the whole of social policy and the social sciences is the nonprofit Campbell Collaboration.
For example, the Campbell Countering Violent Extremism evidence synthesis program is a global research initiative that is attracting attention here in Australia. The program originated from a 5‑country partnership of Australia, Canada, New Zealand, the UK and the US (Campbell Collaboration n.d.). Professor Lorraine Mazerolle from the University of Queensland is one of the principal investigators on the program (Campbell Collaboration n.d.).
Creating an experimenting society
Bringing a 'what works' philosophy to social policy is vital to helping the most vulnerable.
And it is by no means a new idea. It follows the path forged by the prominent social scientist Donald Campbell.
He is of course, the 'Campbell' in the Campbell Collaboration, which was named after him to honour his substantial contributions to social science and methodology.
Over 50 years ago, Dr Campbell wrote Methods for the Experimenting Society, outlining his vision for helping governments to produce better‑informed policies and social interventions via research and evaluation (Campbell 1991).[1]
In this paper, Campbell forewarns policymakers of the 'over‑advocacy trap', where advocates of a new social program or policy make exaggerated claims about its effectiveness in order to get it adopted (Campbell 1991). He effectively highlights the tension between the need for strong advocacy to get social programs funded and adopted, and the need for rigorous evaluation to determine their true effectiveness (Campbell 1991).
Thirty years after Dr Campbell wrote Methods for the Experimenting Society, the US Department of Education was allocating over a billion US dollars each year to an after‑school program called the 21st Century Community Learning Centers initiative.
The program, initiated in 1998, saw children attend the centres for up to 4 hours of after‑school activities, taking part in everything from tutoring to drama to sports. It attracted high‑profile advocates, including the former Californian governor and Mr Universe, Arnold Schwarzenegger.
It's no wonder, then, that a randomised trial by Mathematica in 2003 startled everyone with its findings (Haskins 2009). Attending the after‑school program raised a child's likelihood of being suspended from school (Leigh 2018). And there was no evidence that the program improved academic outcomes.
The program's prominent advocates had fallen head‑first into the over‑advocacy trap.
Overcoming denial with collaboration and momentum
American political scientist Ron Haskins commented on how easy it was for Schwarzenegger to flex his celebrity muscle to overcome a negative evaluation. 'The lesson here, yet again, is that good evidence does not speak for itself in the policy process and is only one - sometimes a rather puny - element in a policy debate' (Haskins 2009).
Overcoming denial in the face of rigorous evidence requires continuous collaboration and sustained momentum. In 2025 and beyond, we will need both to reach the tipping point on the widespread use of rigorous impact evaluation across public policy. It will be harder to run roughshod over good evidence if OECD nations continue to collaborate - domestically, with non‑profit researchers outside of government, and internationally, with one another.
Philanthropic foundations in the UK, US and other OECD nations have a strong track record in supporting randomised policy trials. Initiatives such as the Maryland Partnership for Proven Programs with Arnold Ventures, which I mentioned earlier, demonstrate that the 'what works' philosophy in social policy is gaining traction.
Here in Australia, the Paul Ramsay Foundation launched a A$2.1 million open grant round in 2024. Its structure is similar to a successful model that the Laura and John Arnold Foundation has deployed in the United States over the past decade (Leigh 2024c).
The grants, which last for 3 years and are valued at up to A$300,000 each, will support up to 7 experimental evaluations conducted by non‑profits with a social impact mission - for example, improving education outcomes for young people with disabilities, reducing domestic and family violence, or helping jobless people find work (Paul Ramsay Foundation 2024).
The Australian Centre for Evaluation supported the open grant round and is helping to connect grantees with administrative data relevant to their evaluations. I am excited to see what we learn from these studies (Leigh 2024b).
One of the most appealing advantages of well‑conducted randomised trials is that they resonate with 3 democratic principles: non‑arbitrariness, revisability and public justification (Tanasoca and Leigh 2024).
This gives us good democratic reasons to seek out such evidence for policymaking. Indeed, the more democratic a regime is, the more likely it is to conduct randomised trials (Tanasoca and Leigh 2024).
Recall that the first big public administration reform - the growth of a professionalised civil service - rested on the development of democratic institutions. Nobel laureates Daron Acemoglu and James Robinson call this the 'Red Queen effect', in which societies offering more public goods must also offer more democratic social power (Acemoglu and Robinson 2019).
The seventh reform - randomised trials and evidence‑based policymaking - takes us further along the corridor. Things are not true simply because politicians assert them. Policies must be backed by evidence, and citizens must be able to test and trust that evidence.
Democracies are on this journey together, and international collaboration is vital to reaching the tipping point.
This is not about the performative use of words like 'evaluation' and 'evidence'. It is about raising the quality and quantity of evidence, which is one reason that I keep referring to randomised trials. I acknowledge the work of the OECD towards achieving the goal of institutionalising rigorous evaluation across public policy areas, as per the OECD Recommendation of the Council on Public Policy Evaluation (OECD 2022).
The second annual update of the Global Commission on Evidence also confirms many signs of momentum towards the Commission's 3 implementation priorities: formalising and strengthening domestic evidence‑support systems, enhancing and leveraging the global evidence architecture, and putting evidence at the centre of everyday life (Global Commission on Evidence 2024).
Conclusion
We're here because we care about good government. And because we understand that evaluation and evidence science are not fields in their infancy.
Just as we don't put homeopathy on the same level as science‑based medicine, it is a mistake to think that evidence‑free policy is on a par with evidence‑based policy.
OECD governments have decades of experience in identifying evidence gaps, putting policies to the test and implementing the most effective programs (Leigh 2024a).
Policymaking by focus groups and gut‑feel alone is the modern‑day equivalent of bloodletting and lobotomies in medicine (Leigh 2024a). Which is why the seventh big reform to public administration must focus on finding interventions that work. And on building a body of programs backed by strong, replicated randomised trial evidence of important, lasting improvements in people's lives.
This goal requires OECD nations to get behind the momentum of the Global Commission on Evidence.
This will have massive benefits. It will save lives. It will save dollars. And it will make government work better.
So let's make it happen.
My thanks to officials in the Australian Centre for Evaluation for valuable drafting assistance, and to Jon Baron, President and CEO of the Coalition for Evidence‑Based Policy, and David Halpern CBE, President Emeritus at the Behavioural Insights Team, for valuable discussions that helped shape this speech.
References
Acemoglu D and Robinson JA (2019) The Narrow Corridor: States, Societies, and the Fate of Liberty, Penguin, New York.
Arnold Ventures (21 March 2018a) 'How to solve U.S. social problems when most rigorous program evaluations find disappointing effects (part one in a series)', Straight Talk on Evidence, accessed 15 January 2025.
Arnold Ventures (13 April 2018b) 'How to solve U.S. social problems when most rigorous program evaluations find disappointing effects (part 2 - a proposed solution)', Straight Talk on Evidence, accessed 15 January 2025.
Arnold Ventures (18 June 2019) 'Evidence‑Based Policy 'Lite' Won't Solve U.S. Social Problems: The Case of HHS's Teen Pregnancy Prevention Program', Straight Talk on Evidence, accessed 15 January 2025.
Arnold Ventures (26 October 2022a) 'Year Up', Social Programs That Work, accessed 15 January 2025.
Arnold Ventures (21 March 2022b) 'Per Scholas Employment/Training Program for Low-Income Workers', Social Programs That Work, accessed 15 January 2025.
Arnold Ventures (11 July 2024a) 'Saga Math Tutoring', Social Programs That Work, accessed 15 January 2025.
Arnold Ventures (28 August 2024b) Governor Moore Announces $20 Million in Grants for Education Programs, First Awards Under Maryland Partnership for Proven Programs with Arnold Ventures [media release], Arnold Ventures, accessed 16 January 2025.
Australian Education Research Organisation (n.d.) About us, Australian Education Research Organisation website, accessed 22 January 2025.
Brunner R and Lynch A (2017) 'Adaptive Governance', Oxford Research Encyclopedia of Climate Science, doi:10.1093/acrefore/9780190228620.013.601.
Campbell Collaboration (n.d.) Our work, Campbell Collaboration website, accessed 16 January 2025.
Campbell Collaboration (n.d.) About the CVE programme, Campbell Collaboration website, accessed 21 January 2025.
Campbell DT (1991) 'Methods for the Experimenting Society', Evaluation Practice, 12(3):223-260.
Education Endowment Foundation (n.d.) How we work, Education Endowment Foundation website, accessed 22 January 2025.
Global Commission on Evidence to Address Societal Challenges (2024) 'Global Evidence Commission update 2024: Building momentum in strengthening domestic evidence‑support systems, enhancing the global evidence architecture, and putting evidence at the centre of everyday life' [PDF 5MB], McMaster Health Forum, Hamilton, accessed 17 January 2025.
Halpern D (2023) 'Foreword', in Sanders M and Breckon J (eds) The What Works Centres: Lessons and Insights from an Evidence Movement, Bristol University Press, Bristol.
Haskins R (17-18 August 2009) 'With a scope so wide: using evidence to innovate, improve, manage, budget' [roundtable presentation], Strengthening Evidence‑based Policy in the Australian Federation, Session 1: Evidence‑based policy, its principles and development, Canberra, accessed 16 January 2025.
Jacobs A (4 April 2024) 'Glasses Improve Income, Not Just Eyesight', The New York Times, accessed 15 January 2025.
Leigh A (2018) Randomistas: How Radical Researchers Changed Our World, Black Inc, Melbourne.
Leigh A (3 October 2024a) 'Address to the UK Evaluation Task Force, 9 Downing Street, London' [presentation], London, accessed 15 January 2025.
Leigh A (17 June 2024b) 'Address to the Australian Evaluation Showcase, Canberra' [presentation], Australian Evaluation Showcase, Canberra, accessed 15 January 2025.
Leigh A (28 November 2024c) 'Address to 10th Annual Social Impact Measurement Network Australia Awards' [presentation], 10th Annual Social Impact Measurement Network Australia Awards, Virtual, accessed 17 January 2025.
OECD (Organisation for Economic Co‑operation and Development) (2022) Recommendation of the Council on Public Policy Evaluation, Adopted on 06/07/2022, OECD Legal Instruments, OECD/LEGAL/0478, accessed 17 January 2025.
Patrick DL (29 January 2014) Massachusetts Launches Landmark Initiative to Reduce Recidivism Among At‑Risk Youth [media release], Commonwealth of Massachusetts, accessed 14 January 2025.
Paul Ramsay Foundation (17 June 2024) 'Experimental evaluation open grant round', Paul Ramsay Foundation, accessed 17 January 2025.
Productivity Commission (2020) Indigenous Evaluation Strategy: Background Paper, Australian Government.
Roca Inc., Commonwealth of Massachusetts, and Third Sector Capital Partners (30 August 2024) Final Report: the Massachusetts Juvenile Justice Pay for Success project, accessed 14 January 2025.
Sehrin F, Jin L, Naher K, Chandra Das N, Chan VF, Li DF, Bergson S, Gudwin E, Clarke M, Stephan T and Congdon N (2024) 'The effect on income of providing near vision correction to workers in Bangladesh: The THRIVE (Tradespeople and Hand‑workers Rural Initiative for a Vision‑enhanced Economy) randomized controlled trial', PLOS ONE, 19(4):e0296115, doi:10.1371/journal.pone.0296115.
Tanasoca A and Leigh A (2024) 'The Democratic Virtues of Randomized Trials', Moral Philosophy and Politics, 22(1):113-140, doi:10.1515/mopp‑2022-0039.
Winzar C, Tofts‑Len S and Corpu E (2023) Disrupting disadvantage 3: Finding what works, Committee for Economic Development of Australia, Melbourne, accessed 16 January 2025.
Footnotes
[1] Campbell's paper was written around 1971 and used in presentations to the Eastern Psychological Association and the American Psychological Association. It was revised and first published in 1988 (see Campbell 1991).