AI Safety
23 researchers · 11 papers · 2 projects · 2 builders
Researchers (23)
Yoshua Bengio
Mila / Université de Montréal
447,206 citations · h-index 183
Ilya Sutskever
Safe Superintelligence Inc.
220,945 citations · h-index 64
Stuart Russell
UC Berkeley
115,000 citations · h-index 86
Ian Goodfellow
Google DeepMind
82,619 citations · h-index 64
Dario Amodei
Anthropic
68,000 citations · h-index 42
Philip Torr
University of Oxford
68,000 citations · h-index 88
Dawn Song
UC Berkeley
58,000 citations · h-index 82
Aleksander Madry
MIT CSAIL
55,000 citations · h-index 65
Oren Etzioni
Allen Institute for AI (AI2)
52,000 citations · h-index 85
Noah Smith
University of Washington / Allen AI
48,000 citations · h-index 82
Yejin Choi
University of Washington / Allen AI
45,000 citations · h-index 75
Percy Liang
Stanford University
42,998 citations · h-index 85
Pushmeet Kohli
Google DeepMind
41,000 citations · h-index 72
Yarin Gal
University of Oxford
38,000 citations · h-index 45
Dan Hendrycks
Center for AI Safety
35,000 citations · h-index 42
Shakir Mohamed
Google DeepMind
34,000 citations · h-index 52
Emily Bender
University of Washington
24,806 citations · h-index 38
Finale Doshi-Velez
Harvard University
22,000 citations · h-index 48
Been Kim
Google DeepMind
18,000 citations · h-index 32
Timnit Gebru
DAIR Institute
15,864 citations · h-index 28
Jacob Steinhardt
UC Berkeley
14,000 citations · h-index 35
Tatsunori Hashimoto
Stanford University
12,000 citations · h-index 30
Sara Hooker
Cohere for AI
2,851 citations · h-index 21
Papers (11)
Artificial Intelligence: A Modern Approach
Towards Deep Learning Models Resistant to Adversarial Attacks
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
Training language models to follow instructions with human feedback
Measuring Massive Multitask Language Understanding
Towards Interpretable Machine Learning: A Survey on Methods and Metrics
Concrete Problems in AI Safety
SoK: Eternal War in Memory
Learning Safe Multi-Agent Control with Decentralized Neural Barrier Certificates
Holistic Evaluation of Language Models
Projects (2)
Cleanlab
Open-source tool to find and fix label errors in datasets. Implements confident learning, which automatically detects noisy labels in any ML dataset.
by Jeremy Nixon
Circuits / Mechanistic Interpretability
Pioneering research on understanding neural networks by reverse-engineering their internal mechanisms. Published the influential Circuits thread on Distill.
by Chris Olah