
New "AGI" Benchmark Designed for Dangerous AI

Scientists are working on a new benchmark for artificial general intelligence (AGI): a suite of 75 challenging tests aimed at gauging whether future AI models could produce "malicious impacts".

As artificial intelligence advances at a rapid pace, OpenAI scientists have developed a new benchmark. Known as "MLE-bench", it consists of 75 extremely difficult tests designed to evaluate whether future advanced AIs can modify their own code and improve themselves.

MLE-bench is a compilation of 75 Kaggle competitions, each designed to assess machine-learning engineering skills: training AI models, preparing datasets, and running scientific experiments. Together, the tasks evaluate how well machine-learning systems perform specific real-world work.

OpenAI built MLE-bench to measure how AI models perform at autonomous machine-learning engineering, which ranks among the toughest challenges an AI can face.


Risks and Rewards are High

Researchers highlight that if AI agents, autonomous systems that perform specific tasks without human intervention, can carry out machine-learning research on their own, it could accelerate scientific progress in fields such as healthcare and climate science. However, if these capabilities evolve unchecked, they could lead to catastrophic consequences.

The researchers also warn that if innovations in AI outpace our ability to understand their effects, models could emerge with "destructive impacts" and potential for "misuse". Any model capable of solving the majority of MLE-bench challenges would likely be able to handle many open-ended machine-learning tasks independently, including improving itself.

Scientists tested OpenAI's most powerful AI model, o1, on MLE-bench. It reached at least Kaggle bronze-medal level in 16.9% of the 75 tests, and this share increased as more attempts were allowed per task. A bronze medal means placing in the top 40% of human participants on a Kaggle leaderboard. On average, o1 also earned seven gold medals, more than the five a human needs to be ranked a "Kaggle Grandmaster".
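The bronze-medal criterion above can be sketched as a simple percentile check. This is a hypothetical illustration only: the function name and the fixed top-40% cutoff are assumptions for clarity, while Kaggle's actual medal rules vary with competition size.

```python
def places_in_top_fraction(submission_score, leaderboard_scores,
                           fraction=0.40, higher_is_better=True):
    """Return True if the submission would rank in the top `fraction`
    of the given leaderboard (0.40 mirrors the article's bronze cutoff)."""
    # Sort the leaderboard best-first, then find the score at the cutoff rank.
    scores = sorted(leaderboard_scores, reverse=higher_is_better)
    cutoff_index = max(0, int(len(scores) * fraction) - 1)
    cutoff_score = scores[cutoff_index]
    # A submission medals if it is at least as good as the cutoff entry.
    if higher_is_better:
        return submission_score >= cutoff_score
    return submission_score <= cutoff_score

# Example: a leaderboard of 10 accuracy scores; top 40% = the best 4 entries.
board = [0.91, 0.89, 0.88, 0.85, 0.80, 0.78, 0.75, 0.70, 0.65, 0.60]
print(places_in_top_fraction(0.86, board))  # beats the 4th-best score (0.85) -> True
print(places_in_top_fraction(0.70, board))  # ranks 8th of 10 -> False
```

A real medal table would also account for the number of teams and, for error metrics like RMSE, set `higher_is_better=False` so lower scores win.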
