In a recently published blog post, Ilya Sutskever, OpenAI’s chief scientist, and Jan Leike, a key member of OpenAI’s alignment team, anticipate the arrival of AI systems with intelligence surpassing that of humans within the next decade.
Acknowledging the uncertainty surrounding such systems, they emphasize the need for research into methods to control and constrain them.
The current challenge lies in the absence of a solution for managing potentially superintelligent AI and preventing it from deviating from desired objectives.
Existing techniques for aligning AI, such as reinforcement learning from human feedback (RLHF), rely heavily on human supervision. This approach becomes unreliable, however, when dealing with AI systems that exceed human intelligence, since humans can no longer reliably judge the systems’ outputs.
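To illustrate the human-supervision dependence, here is a minimal toy sketch of the reward-modeling step at the heart of RLHF: a human labels which of two responses is preferred, and a simple linear reward model is fit to those preferences with the Bradley–Terry objective. The feature vectors, learning rate, and model form are illustrative assumptions, not OpenAI’s implementation.

```python
import math

def reward(w, feats):
    # Linear reward model: r(x) = w . x
    return sum(wi * xi for wi, xi in zip(w, feats))

def train_preference(pairs, dim, lr=0.1, steps=200):
    """Fit reward weights from human preference pairs (preferred, rejected)
    via gradient ascent on the Bradley-Terry log-likelihood, as in
    RLHF reward modeling (toy version, pure Python)."""
    w = [0.0] * dim
    for _ in range(steps):
        for a, b in pairs:  # human judged a better than b
            margin = reward(w, a) - reward(w, b)
            p = 1.0 / (1.0 + math.exp(-margin))   # P(a preferred | w)
            g = 1.0 - p                           # d log p / d margin
            for i in range(dim):
                w[i] += lr * g * (a[i] - b[i])
    return w

# Hypothetical 2-D features for two candidate responses; the human
# annotator prefers the first one.
pairs = [([1.0, 0.0], [0.0, 1.0])]
w = train_preference(pairs, dim=2)
print(reward(w, [1.0, 0.0]) > reward(w, [0.0, 1.0]))  # learned to rank a > b
```

The key point: every training signal here originates from a human comparison. If the responses being compared exceed what human annotators can evaluate, the preference labels (and everything downstream) become unreliable.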
To make significant progress in the field of “superintelligence alignment,” OpenAI has established the Superalignment team, co-led by Sutskever and Leike. The team will have access to 20% of the compute the company has secured to date.
By leveraging expertise from OpenAI’s former alignment division and collaborating with researchers from other organizations within the company, the team aims to tackle the core technical challenges of controlling superintelligent AI over the next four years.
Their approach involves developing a “human-level automated alignment researcher.” The overarching goal is to train AI systems using human feedback, enabling them to assist in evaluating other AI systems and ultimately conducting alignment research.
Alignment research refers to ensuring AI systems achieve desired outcomes while avoiding undesirable behaviors. OpenAI hypothesizes that AI systems can advance alignment research at a faster pace and with greater efficacy than humans alone.
“As we make progress on this, our AI systems can take over more and more of our alignment work and ultimately conceive, implement, study, and develop better alignment techniques than we have now,” stated Leike, along with colleagues John Schulman and Jeffrey Wu in a previous blog post.
“They will work together with humans to ensure that their own successors are more aligned with humans. Human researchers will focus more and more of their effort on reviewing alignment research done by AI systems instead of generating this research by themselves,” they added.
Nevertheless, the OpenAI researchers recognize that no method is foolproof. They acknowledge potential limitations and risks associated with using AI for evaluation, such as the potential amplification of inconsistencies, biases, or vulnerabilities.
They also acknowledge that the most challenging aspects of the alignment problem might not be purely matters of engineering. Despite these challenges, Sutskever and Leike believe the endeavor is worth pursuing.