July 1, 2026 · OpenAI

OpenAI Releases GeneBench-Pro to Measure AI Performance in Computational Biology

Simplified: This one piqued my curiosity... and then made me raise an eyebrow. OpenAI just released GeneBench-Pro, a new benchmark that measures how well AI agents handle complex computational biology tasks: genomics, translational medicine, that kind of serious science. It consists of 129 problems with deliberately noisy data to simulate real-world conditions. The top score goes to GPT-5.6 Sol Pro at 31.5%, with Claude Opus 4.8 reaching 16%. So the company that wrote the test also walks away with the top grade. It's like your chemistry teacher designing the exam and then showing up to class with the diploma in hand 🙄. What did impress me: at just 31.5%, the best-scoring model still fails roughly 7 out of 10 of these biology problems. That's not a knock, just evidence of how hard real science is. For those of us building with AI in health or research, this yardstick exists and it measures something real.

Read at the source: OpenAI ↗
