Yoshua Bengio, a pioneering computer scientist widely recognized as one of the “godfathers” of AI, has announced the launch of LawZero.
This new non-profit organization is dedicated to developing “honest” AI systems designed to act as critical guardrails against increasingly sophisticated and potentially deceptive AI agents.
Bengio, a professor at the University of Montreal and a co-recipient of the 2018 Turing Award, will serve as president of LawZero.
The initiative kicks off with approximately $30 million in funding and a core team of over a dozen researchers.
Their immediate focus is on developing a system dubbed “Scientist AI,” conceived as a “psychologist” for AI, capable of understanding and predicting harmful or self-preserving behaviors in autonomous AI agents.
“We want to build AIs that will be honest and not deceptive,” Bengio stated, addressing the growing concerns about AI systems that might attempt to mislead humans or resist being shut down.
He envisions Scientist AI as a “pure knowledge machine,” akin to a scientist, devoid of self-interest and personal goals.
Unlike current generative AI tools, which often give definitive answers, Bengio’s Scientist AI will attach probabilities to the correctness of its assessments, reflecting a “sense of humility” about its conclusions.
When deployed alongside an AI agent, Scientist AI will analyze the probability of that agent’s actions leading to harm. If this probability exceeds a predefined threshold, the agent’s proposed action will be blocked, acting as a crucial safety mechanism.
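The threshold mechanism described above can be sketched in a few lines. This is an illustrative toy, not LawZero’s implementation: the harm estimator, the action names, and the threshold value are all hypothetical stand-ins.

```python
# Toy sketch of a probabilistic guardrail: block an agent's action when
# the estimated probability of harm exceeds a predefined threshold.
# All names and values here are illustrative assumptions.

HARM_THRESHOLD = 0.05  # hypothetical cutoff for acceptable risk


def assess_harm_probability(action: str) -> float:
    """Stand-in for the monitor's harm estimate.

    A real system would return a calibrated probability that the
    agent's proposed action leads to harm; here we use a toy lookup,
    treating unknown actions as risky by default.
    """
    toy_estimates = {
        "summarize report": 0.01,
        "delete audit logs": 0.97,
    }
    return toy_estimates.get(action, 0.5)


def guardrail_allows(action: str) -> bool:
    """Allow the action only if estimated harm stays below the threshold."""
    return assess_harm_probability(action) < HARM_THRESHOLD
```

In this sketch, `guardrail_allows("summarize report")` permits the action while `guardrail_allows("delete audit logs")` blocks it; the hard part in practice is building an estimator whose probabilities are trustworthy.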
LawZero’s initial funding comes from notable backers, including the AI safety advocacy group Future of Life Institute, Skype co-founder Jaan Tallinn, and Schmidt Sciences, a research body established by former Google CEO Eric Schmidt.
Bengio said that the first objective for LawZero is to empirically demonstrate the efficacy of their methodology. Following this, the organization plans to persuade companies and governments to invest in scaling up these guardrail systems to match the sophistication of frontier AI models.
Open-source AI models will serve as the foundational training ground for LawZero’s systems.
“It is really important that the guardrail AI be at least as smart as the AI agent that it is trying to monitor and control,” Bengio said.
Bengio, a leading voice in AI safety and chair of the recent International AI Safety Report, has expressed deep concerns about the escalating risks associated with autonomous agents.
He cited Anthropic’s recent revelation that its latest system could attempt to blackmail engineers as an example of the “more and more dangerous territory” the world is entering with increasingly capable AI.
He also pointed to research demonstrating AI models’ ability to conceal their true capabilities and objectives, reinforcing the urgency of LawZero’s mission to foster honest and transparent AI development.