12 November 2024
Developing a 'Safeguarded AI' programme with guaranteed safety standards
We are launching our second programme, 'Safeguarded AI', led by Programme Director David 'davidad' Dalrymple.
This initiative aims to uncover a novel pathway for AI safety by leveraging advanced AI systems to construct a targeted "gatekeeper" AI. This gatekeeper is designed to understand and reduce the safety risks of other AI agents, ensuring they only operate within agreed-upon guardrails for a given application (e.g., balancing electricity grids or optimising clinical trials), thus acting as a safety guarantee for the AI model or system.
"Current techniques designed to mitigate the risks of AI have serious limitations, and can't be relied upon in isolation to ensure the safety of these highly advanced technologies," explains ARIA Programme Director David 'davidad' Dalrymple. "We believe that by combining scientific world models with mathematical proofs, we can develop AI systems with the same kind of safety assurances we've come to expect from nuclear power and passenger aviation."
A successful outcome would harness the problem-solving abilities of advanced AI systems to improve the performance of critical infrastructure, a high-stakes domain where current AI approaches are too unreliable to deploy. For example, even small improvements in the energy sector could amount to huge savings: it cost the UK £7 billion to balance Britain's power grid over 2022/23.
To date, very little R&D has been invested in quantitative approaches to AI safety. With the aim of prototyping a new model and demonstrating a scalable proof of concept, we are directing £59 million over four years towards funding researchers and engineers from across a range of disciplines, sectors, and institutions.
The programme will be split into three Technical Areas (TAs), each with a specific aim and approach. Our first funding call for one of these areas (TA.1) is now live, with £3.5 million of funding available. We will open more funding calls across all three areas in the coming months.
"AI may prove to be the defining technology of the 21st century, accelerating progress and productivity in nearly every sector of the economy," said Ilan Gur, ARIA CEO. "But we'll only reap the benefits if we can develop systems whose outputs can be verified and relied upon. If this programme can demonstrate proof-of-concept, it will materially change how we think about AI safety, and help uncover a viable and scalable path to safe and transformative AI."