AI Safety

23 researchers · 11 papers · 2 projects · 2 builders

Researchers (23)

Yoshua Bengio

Mila / Université de Montréal

447,206 citations · h-index 183

Ilya Sutskever

Safe Superintelligence Inc.

220,945 citations · h-index 64

Stuart Russell

UC Berkeley

115,000 citations · h-index 86

Ian Goodfellow

Google DeepMind

82,619 citations · h-index 64

Dario Amodei

Anthropic

68,000 citations · h-index 42

Philip Torr

University of Oxford

68,000 citations · h-index 88

Dawn Song

UC Berkeley

58,000 citations · h-index 82

Aleksander Madry

MIT CSAIL

55,000 citations · h-index 65

Oren Etzioni

Allen Institute for AI (AI2)

52,000 citations · h-index 85

Noah Smith

University of Washington / Allen AI

48,000 citations · h-index 82

Yejin Choi

University of Washington / Allen AI

45,000 citations · h-index 75

Percy Liang

Stanford University

42,998 citations · h-index 85

Pushmeet Kohli

Google DeepMind

41,000 citations · h-index 72

Yarin Gal

University of Oxford

38,000 citations · h-index 45

Dan Hendrycks

Center for AI Safety

35,000 citations · h-index 42

Shakir Mohamed

Google DeepMind

34,000 citations · h-index 52

Emily Bender

University of Washington

24,806 citations · h-index 38

Finale Doshi-Velez

Harvard University

22,000 citations · h-index 48

Been Kim

Google DeepMind

18,000 citations · h-index 32

Timnit Gebru

DAIR Institute

15,864 citations · h-index 28

Jacob Steinhardt

UC Berkeley

14,000 citations · h-index 35

Tatsunori Hashimoto

Stanford University

12,000 citations · h-index 30

Sara Hooker

Cohere for AI

2,851 citations · h-index 21

Papers (11)

Artificial Intelligence: A Modern Approach

Pearson (4th Edition) · 2020 · 72,000 citations

Towards Deep Learning Models Resistant to Adversarial Attacks

ICLR 2018 · 2018 · 12,000 citations

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

ICML 2016 · 2016 · 8,200 citations

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

FAccT 2021 · 2021 · 5,500 citations

Training language models to follow instructions with human feedback

NeurIPS 2022 · 2022 · 4,260 citations

Measuring Massive Multitask Language Understanding

ICLR 2021 · 2021 · 3,800 citations

Machine Learning Interpretability: A Survey on Methods and Metrics

Electronics · 2019 · 2,200 citations

Concrete Problems in AI Safety

arXiv preprint · 2016 · 1,800 citations

SoK: Eternal War in Memory

IEEE S&P 2013 · 2013 · 1,500 citations

Learning Safe Multi-Agent Control with Decentralized Neural Barrier Certificates

ICLR 2021 · 2021 · 600 citations

Holistic Evaluation of Language Models

TMLR · 2023 · 415 citations

Projects (2)

Cleanlab

Open-source tool to find and fix label errors in datasets. Implements confident learning, which automatically detects noisy labels in ML datasets.

by Jeremy Nixon

Circuits / Mechanistic Interpretability

Pioneering research on understanding neural networks by reverse-engineering their internal mechanisms. Published the influential Circuits thread on Distill.

by Chris Olah