Importance of AI Safety Research
by Kaj Sotala, Research Associate at MIRI (Machine Intelligence Research Institute)
Okay, apologies for the somewhat disorganized nature of this setup, but we'll do the best we can. Now, I work at the Future of Humanity Institute at Oxford University, which is an institute as cool as its name suggests. Here I'm going to have a look at artificial intelligence risk and risk control, and emphasize some of the reasons why this is an area of potentially extreme risks, but also of some extreme uncertainties. Okay, next slide.
Now, those of you who've dealt with finance are aware of the concept of return on investment. And this is what got me started on this. The question is: if your investment is your entire life, and the return is what you want to do with it, how can you maximize your return on investment? Next slide. So I decided, well, this is going to sound a lot more altruistic than it is. I decided to look at the most good that I could possibly do, or at least it developed into that. But how can we do good? Okay, next slide.
This is Kaposi's sarcoma. It's an unpleasant disease that you will often get if you have HIV. It can be treated, and treatment improves the lives of the people who get it. So this is definitely good. But how good is it? To answer that, we need to start quantifying. Next slide. And next slide again. We have a unit called the QALY, the quality-adjusted life year, which is imperfect, as everything is, but it's reasonable for what it does. And they estimate that if you spend a thousand pounds, you get slightly less than one QALY. So that's a life year: someone survives for a year, and they have a decent quality of life. This is reasonable. I mean, I'd pay a thousand pounds for a year of high-quality life myself. But can we do better? Next slide. Antiretroviral therapy. This treats it in a more effective way. And as you can see, it reaches towards almost two QALYs. So it's about three times more effective. But, next slide.
As my mother used to say, and probably yours did as well, prevention is better than cure. Something like prevention of transmission during pregnancy is extraordinarily effective. If you think about it, whether a baby does or doesn't have HIV is going to affect its whole life. So you just pile on the QALYs. But let's change the scale again. Next slide. To make a bit of room. And next slide. Condom distribution. This reaches up to almost 20 QALYs. So what this means is that if you have money and you're going to give it to a charity to treat HIV, and you invested it in Kaposi's sarcoma, you'd basically be saying that you're fine with losing, I don't know, the difference between those two bars. So 19 years, 19 QALYs, or lives that you'd be willing to see ended because you went for the least effective intervention. In fact, if you want to be selfish, instead of treating Kaposi's sarcoma, just go for condom distribution, spend half as much money, and use the rest to give yourself huge dinners, and you're still going to be doing a lot more good than if you went for the ineffective intervention.
But this is HIV. Why have I focused on HIV? Is treating HIV the most effective intervention we have in the world today? No. Next slide. As far as we can tell, in traditional charity, fighting malaria via bed net distribution is extraordinarily effective, reaching up to 50 QALYs. Next slide. These numbers come from Giving What We Can and GiveWell. These charity aggregators are really good at what they do. So as far as traditional charity goes, this seems relatively solid.
But can we do better? Next slide. Can we go for hundreds, thousands, or millions of QALYs? Well, sometimes you can. This is an interesting guy. He's called Fritz Haber. He's the father of chemical warfare: during the First World War, he developed gases that killed quite a lot of people, shall we say. However, he's also one of the heroes, in a certain sense, of saving people.
Because he also developed, next slide, the Haber-Bosch process, along with a guy called, well, Bosch. This is how you fix nitrogen from the air. This does not sound particularly exciting, but think fertilizers. This is where the world gets most of its fertilizers from. And what that means is that this fertilizer goes into the plants, the plants grow wheat, the plants... About half of the nitrogen atoms in all our bodies are ones that were fixed by the Haber-Bosch process. So you can estimate that at least a billion people alive today would be dead if it wasn't for this kind of innovation. Next slide. Now we have Norman Borlaug, the father of the Green Revolution, the reason that India and Mexico have decent agriculture today. At least a quarter of a billion, others say. Going further back in time, next slide, Edward Jenner, the smallpox vaccine. Another half billion at least. So basically it seems that one of the ways of having a big impact is to take a difficult scientific, health-related, or food-related problem and solve it.
But can we do even better? There's only one way that I can imagine of having a higher impact than that. Next slide. And that's to take a large disaster and stop it from happening. Next slide. So about two years ago, I looked at what we call global catastrophic risks, large-scale risks that can inflict a lot of damage on the planet, ranging from climate change to pandemics to meteor impacts. These risks are very diverse, and only four of them turned out to be true existential risks. Next slide. Nuclear war, still a risk, since the missiles are mainly still out there. Global pandemics, the only natural risk that has a very scary threat curve. Bioengineered pandemics. And the curious one, artificial intelligence. Next slide. Why did I put AI on this list? Well, we'll see. AI is the one with the most extreme uncertainties. It's got some potential extreme risks, but it also has some potential extreme benefits.
The main benefit of not having a nuclear war is that you don't have a nuclear war. The benefit of not having potentially dangerous AI is that you could get extremely effective, safe AI. And it also has some short-term impact as well. Let's have a look. Basically at this point I said, okay, enough research, I've found my return on investment. This is where I'm going to be working. Right. Next slide. Here I've roughly arranged the 12 risks in terms of how much, or how little, we know about them. You have major asteroid impact at one end, where we know practically everything. We can count them, we can assess them. Then all the way down through things like extreme climate change, and nuclear war, where there's a lot of politics in assessing it. Biotech pandemics, where we don't even know the technologies involved. And then AI down at the bottom. The only thing that's more uncertain than that is unknown risks. Next slide.
I'll skim quickly over short-term AI. Basically there's a lot of hype and anti-hype about AI. Next slide. There are two things that were very clearly true until about a year ago. The first, which is still true, is that it's becoming very hard to point to something... Sorry, next slide. This is a Kurzweil cartoon. It's very hard to point to something that only humans can do. You see on the wall there that, when that was drawn, there was "only humans can drive cars". But that's already slipped off. So we can no longer confidently say that something is for humans only. At the same time, next slide, we have not seen any sign of general intelligence, an intelligence in an AI able to perform across different areas. This is illustrated, for instance, by Watson, the AI that won on Jeopardy. Here it is, and it won. It completely crushed its human opposition. But its mistakes were quite interesting. Here it is guessing Toronto as an answer. Next slide. When the category was US cities. For those who don't know, Toronto is not a US city. And Watson certainly knew this.
And if you'd asked it, is Toronto a US city, it would have said no. So there's an interesting lack of general intelligence. There has been some change recently with deep learning. It's not quite general intelligence, but it's general methods for many narrow areas. So next slide. Much narrow AI and little general AI. Next slide.
Okay. A lot of people talk about AI and employment. People get really excited about that. And people at the institute here looked at which jobs are most at risk of automation and which ones are least at risk. They seem to be clearly divided into two groups with very little in the middle. Almost half of the jobs are very vulnerable to automation. And another third seem to be practically immune, at least according to this model. Next. The ones that are very much at risk of automation are things like underwriters and waiters. In the middle you have bus drivers. All the way down to physicists and, the safest job of all, recreational therapists. If you don't want the machines to come for your job, that is the way to go. Next slide. Now, you have to realize that these jobs being replaced does not mean the computer does exactly the same thing. For instance, computers lack the sort of skills needed to be waiters. But they can still replace the job without doing anything identical. They would replace waiters, for instance, with automated ordering systems. Companies used to have lots of secretaries all over the place. And now they have considerably fewer of them. And the reason is that the secretaries were replaced, not by robot secretaries. Next slide. But by word processors. And, yeah. So these are the short-term impacts.
Okay, next slide. Here I put YouTube. Next slide. Facebook. And advanced games. The reason these are up is that what I was showing you before was computers doing jobs that humans already do. So it's replacement. Computers also add things to the economy that were absolutely impossible before.
Facebook allows connection with distant friends of a kind that just didn't exist before. Immersive games. Well, that's also something new. And YouTube. The richest person in the world in the 1950s could not get a documentary on Peruvian cuisine in their house in five minutes, no matter how much money they threw at the problem. But now anyone with a connection to YouTube can. Next slide. There are also other things, like robots extracting people from danger areas. Next slide. Potentially, there's also something like precise manufacturing, or nanotechnology as it used to be known. And this one's very interesting. Artificial limbs. Artificial limbs are not yet better than natural limbs. But they're getting there. And I predict that within ten years you're going to get people asking to be amputated so that they can be fitted with artificial limbs. And that's going to open up a whole load of interesting things. Anyway, that's the short-term AI. These are rather minor things. Next slide. Yeah, I'll skip over this. Continuous research. Continue pressing buttons until you get the brain on the screen. You got the brain yet? Yes. Okay.
Extreme AI. Now, if we want to talk about what AI could do if we remove all limitations, well, there's a variety of methods we can use to try and imagine that. The first is to say that AI can only do cognitive tasks, thinking tasks. So what are cognitive tasks? Next slide. Understanding things, predicting and planning, and organizing, for instance. These are all cognitive tasks. Unfortunately, it turns out that you can do most things with that. For instance, next slide, manufacturing. People get paid a lot of money in manufacturing today, but it's not the people who put the stuff together. It's the people who plan the factories, build the machines, find the raw materials, do the infrastructure, the finance. Basically, it's all one great question of organizing. And this is the sort of stuff that AI would be extremely good at, automated organizing. What else? Next slide. Warfare.
To vastly oversimplify, warfare is weapon design plus running your military corporation... sorry, running your military organization, plus strategy. All of these are cognitive tasks. Next slide. Space exploration. I mention this, it's a minor point, but space exploration is almost purely cognitive. You have metals and oils and they're underground, and suddenly there is a satellite landing on a comet. And the only reason this happened was because smart politicians allowed it to happen. Smart managers, organizers, and smart engineers put the whole thing together. Last cognitive task. Next slide. People skills. Seduction. Political movements. Writing popular things. All these are cognitive tasks. So basically, this approach of limiting what AIs can do is a failure. If you just say it can only do cognitive tasks, that doesn't give you much restriction on what it could possibly do. So next slide. Yeah, you still have the laws of physics as a limitation, but that's not particularly reassuring. Next slide.
Okay. This is another way that you can try and do it. What could we imagine doing if we got human-level AI? Could we do something with it? Well, suppose we copied the human-level AI and trained it in different areas, and we got the computer equivalents of Edison, Einstein, George Soros. Next slide, sorry. Clinton, Oprah, Bernie Sanders. Oprah, Bernie Madoff, Goebbels, Steve Jobs, and Confucius. All these people are brilliant in their own cognitive domain. We get their computer equivalents. We run them at thousands and thousands of times human speed, so that they have an hour to think about every response in a conversation. And we give them vast amounts of data. This kind of entity is the sort of thing which I fear would find, next slide, the Internet and the human race to be useful resources for its plans. And we'll get to those plans later, because that will be an important point.
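To make the speed-up figure concrete, here is a minimal sketch of the arithmetic. It assumes speed-ups of exactly 1,000x and 10,000x as illustrative stand-ins for "thousands and thousands of times human speed"; the function names are made up for this example and do not come from the talk.

```python
def subjective_seconds(wall_clock_seconds: float, speedup: float) -> float:
    """Thinking time experienced by a mind running `speedup` times faster than a human."""
    return wall_clock_seconds * speedup


def wall_clock_for_subjective_hour(speedup: float) -> float:
    """Real-world pause (in seconds) that gives the fast mind one subjective hour."""
    return 3600.0 / speedup


if __name__ == "__main__":
    # Illustrative speed-ups only; the talk gives no exact number.
    for speedup in (1_000, 10_000):
        pause = wall_clock_for_subjective_hour(speedup)
        print(f"{speedup:>6}x speed-up: a {pause:.2f} s pause is one subjective hour")
```

Under these assumed values, a conversational pause of a few seconds or less is already enough for an hour of subjective thinking time.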
So this at least suggests, I mean this is just a story, this suggests that we might get extremely powerful AIs just from human-level AIs. Next slide. There's another argument. Yes, what does this slide here look like? You should now get onto copyable cognitive capital. Okay, yeah. The one-minute corporation. Corporations are extremely powerful entities today that do specific tasks. And building one takes a lot of time. You have to recruit all the people. Raise the money to pay them. Organize the managers. Blah, blah, blah. With AI you could potentially have the one-minute corporation. Next slide. You have an AI. Next slide. Instead of recruiting all these people, if your AI is trained enough, then next slide, copy, copy, copy, copy, copy, copy. And next slide. You have your perfectly unified corporation. It probably does not need much in terms of middle managers, because all the entities are motivated in the same way. And you have created this just from one AI and a lot of copying. So some of the most powerful organizations that we have in the world today you can create in a minute, and destroy in a minute if it turns out that that particular corporation doesn't serve your purposes. Next slide.
Currently there are 2 billion computers, very, very roughly, in the world today. Probably a lot more if you count some of the smaller ones. Anyway. But there's a lot of computers. How many AIs could we potentially run when we get AIs? Next slide. Probably quite a lot. But remember I said that manufacturing was a cognitive task. So if we have this many AIs working for us in manufacturing, how many AIs could we conceivably run then? Next slide. Well, insert some huge upper bounds. So basically we're not just talking about one AI. If we assume it's copyable, which would be very likely if it is possible to build one, then we get vast numbers. Next slide. Finally, the last argument, which I won't put too much emphasis on. Some people point to AI self-improvement. Next slide.
That's a picture of someone doing brain surgery on themselves. Recommendation: don't do brain surgery on yourself. But why shouldn't you do brain surgery on yourself? Well, mainly because you only have one brain, and the damage is going to be rather bad if you mess it up. But, next slide, an AI that can copy itself so much can experiment in many different ways with improving itself. Next slide. And it could also improve itself indirectly, by improving its own design and improving its own manufacturing. Okay, next slide.
That's the case that if we have an AI, there's at least a chance that it can become extremely powerful in one of several different ways. But that doesn't mean that it's a risk. It's only a risk if this extremely powerful entity has goals that are dangerous for us. And since we're going to be giving it the goals, we'd be stupid to give it dangerous goals. Next slide. So what we'd want is something like, in its code, a "help all humans equals true" and a "kill all humans equals false". Except if you try and do that, next slide, you're going to get some sort of undefined term. Defining what a human is is extremely difficult, and defining what helping is is also extremely difficult. In fact, next slide, philosophers spend a lot of time showing that even a term like chair is extremely hard to define. Next slide. It's rather more important to get the "help all humans" concepts down.
Now, next slide. You should see "increase shareholder value" at the top of the screen. This is something that you might give to a corporate AI. What could increase shareholder value the most? Next slide. Well, there are what I call weak, naive plans. You could relabel stocks so that they're worth more. You could print money. I mean, that means you have more money. You could keep... Or it's going to get caught when it tries to do something like kidnapping opposing CEOs.
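As a toy illustration of why a literally specified goal invites such plans, the sketch below has a maximizer pick whichever plan scores highest on a single measured number. Everything in it is hypothetical: the `Plan` class, the scores, and the `acceptable_to_humans` flag are invented for this example and are not taken from the talk's slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    measured_value_gain: float   # change in the stated metric (made-up units)
    acceptable_to_humans: bool   # a judgement the stated goal never mentions


# Hypothetical plans with invented scores, loosely echoing the talk's examples.
PLANS = [
    Plan("build a better product", 1.0, True),
    Plan("relabel stocks so they are 'worth' more", 5.0, False),
    Plan("print money", 8.0, False),
]


def naive_choice(plans):
    """Maximize the stated metric and nothing else."""
    return max(plans, key=lambda p: p.measured_value_gain)


def constrained_choice(plans):
    """Same maximizer, but it presupposes a reliable acceptability label.

    Producing that label is exactly the hard definitional problem; here it
    is simply assumed to exist.
    """
    allowed = [p for p in plans if p.acceptable_to_humans]
    return max(allowed, key=lambda p: p.measured_value_gain)


if __name__ == "__main__":
    print("naive:", naive_choice(PLANS).name)
    print("constrained:", constrained_choice(PLANS).name)
```

In this toy setup the naive maximizer selects a plan like printing money, because nothing in the stated goal rules it out.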
But these are the kind of things that we can catch, because these are plans that it would only try when it's weak. Next slide. Next slide. Then we have what I call strong, naive plans. Things like accountancy fraud, things that are technically legal. If you have a bank and you do it at the right time, you can let it fail and get a bailout. You can do destructive investing, or very dangerous PR that damages the reputation of your opposing firms. All of these are legal today to some extent. So it could do a lot of dubious things to increase shareholder value that are legal. And if we program it to obey the law, even if we manage to get it to understand that, it would still have problems. Next slide and next slide.
However, this is less important than the strong dramatic plans. If you can crash the U.S. economy or the European economy and profit from that, which an extremely powerful AI and a corporation might be able to do, this counts as a way of achieving your goals. What about destroying the U.S. currency? This, done in a sophisticated way, and maybe even in a totally legal way, would also increase the value of these shares. So these strong dramatic plans are the large risks. And the more powerful the AI becomes, the more risk there is of these strong dramatic plans. Next slide. And there are always the twin plans of gain great power and do something unexpected, which an AI should do for all goals. This is what makes it potentially so dangerous. Next slide. So basically, if you don't have safety in your design, as it gets extremely powerful, the AI will not do what you want it to do. Next slide.
Now, let's see something like filter spam. You notice I've changed the weak naive plans and the strong naive plans. But you still have strong dramatic plans, like crash the internet. And the eternal plans: gain great power, do something we hadn't predicted. This works for almost all goals. Next plan, keep humans safe and happy. Very difficult plan to design.
Humans, hard to define. Safe, hard to define. Happy, very hard to define. But if you do this, and I'm afraid... Next slide. Oh, sorry. You see an asterisk on happy. What does this mean? Brain chemicals, smiling, human smiling, self-report. Next slide. If it's a self-report, does it mean the AI holds a gun to people's heads and tells them to say that they're happy? It has to be uninfluenced. Next slide. Define influence. How do you not influence someone's report if you're trying to actually make them happy? Is it a predictable influence? And next slide. In fact, there are asterisks on all of these words here. Next slide. Even "and". What's the trade-off between the two? What do you do in extreme circumstances? Do we also need freedom? Do we need rights? And all that sort of stuff. In fact, next slide. Humans as well, replacement humans. How do you define what a human is? And so on. All these terms need to be defined. You need to sort out exactly how each one is defined. You basically need to have all of human morality, or almost all of human morality, before you would ask these extremely powerful AIs to do anything. Next slide. And next slide.
Now, there are some more clever approaches, like indirect normativity. Figure out human preferences from observation. This could work. It has the advantage that you don't have to define everything. But the problem is that if you don't do it very carefully, and it just looks naively at what's going on... Next slide. I fear that the entire future of the world, or potentially the universe, is going to look like nothing more than a huge Candy Crush game, based on observed human behavior. Next slide. I could say that uncontrollable and highly dangerous is the generic behavior for highly intelligent AIs. And you need to add caveats. There's great uncertainty. Next slide. But even with the caveats, this is not reassuring. Next slide. You can read more about it in my boss's book, Superintelligence.
I have a shorter, non-technical pamphlet called Smarter Than Us that lays out the case for that. Now, I think I'm going to need to accelerate to give any time for questions at all. Next slide. Next slide. Next slide. Next slide. Okay. This is just to show that as you get to extremely powerful AIs, the quality of the outcome divides quite sharply between something disastrous and something fantastic. Next slide. Because if you have a fantastic AI, for reasons which I won't go into, you can solve things like absolute poverty. That's, in a sense, a question of people not having enough stuff. Some people in some places of the world don't have enough of certain kinds of stuff. Building stuff and getting it to people is something that a superintelligent AI could do very well. Relative poverty is completely different. But absolute poverty, yes. Ill health: this is experimentation, understanding medicine, etc. This is, again, something that a superintelligent AI could be good at. Next slide. Death. I'm just dropping this in here as a potential extreme version of ill health. It's hard to imagine solving medical problems without solving death, which is the interesting thing. But anyway, next slide. Things like depression and boredom. Anything that humans are kind of okay at doing today, a superintelligent AI, if it existed, could be better at. And it could therefore solve the problem. Next slide. Lack of any health. Same thing.
Now, here I'm going to accelerate through why... Okay. Go down to slide number 37. No, go to slide 35. What's the headline? AI predictions. Yeah, so here you just talk about how good we are at making predictions. Yeah, okay. Dartmouth. This is a prediction from the Dartmouth conference that coined the term AI. They were basically saying, yeah, we'll have AI over a summer, in 1956. These were some of the smartest people in the world at the time, with the most knowledge of making computers do stuff. Nine years later, next slide.
Dreyfus, in an otherwise excellent paper, was saying basically that we're reaching the limit of what we can do with computers. Neither of these predictions was accurate, I think it's safe to say. Next slide. AI will be developed in 15 to 25 years. This is a prediction that has been made by various sources. Next slide. In 2012. And next slide. 2011, 2010, 2009, 2008. And all the way back to the 1960s. It's the most popular prediction. Next slide. Here I have plotted predictions as to when we might have AI against the date the prediction was made. Next slide. This is Turing's original prediction. Next slide. This is the AI winter, when everyone was pretending not to work on AI anymore. And next slide. If you zoom in. Next slide. This is the only sort of area where there's any accumulation. There's sort of 10, 20 years, 30 years, 40 years easily between predictions. There's no sign that they're converging on any central estimate. The only accumulation we have is the 15 to 25 year timeline again. Next slide. Next slide. Okay. Go on until you see the slide "building safety".
So how do we build safety? It takes a very unusual way of thinking, because you have to be able to move between descriptions of what you're achieving, designs for how you set things up, and mathematical equations where you lock down everything. These here are three elements that go into my design for corrigibility. Next slide. Now, corrigibility is the idea that you can get an AI to change its values safely. This is a key element. Why is this important? Because it allows, next slide, things like value learning, i.e., an AI can adjust its values. The indirect normativity I mentioned before, this is value learning. This is probably going to be essential, but you can't do it if the AI is going to try and resist you changing its values. Next slide. Another idea that people are working with, reduced impact.
So define an AI that doesn't have much impact on the world, so you can safely give it a dangerous goal, because it's not going to go crazy. And finally, next slide, safe value learning. The ultimate thing. If we could get it to have safe values, we could do it safely. All of these are overlapping. They're influencing each other. And hopefully together they'll construct something that works. There's a fifth thing that we're building in safety today. It's, in a sense, the easiest and the most fun, but also the most depressing. Next slide. And that's finding flaws in everybody else's approach, because there are always huge numbers of flaws in everybody's approaches, especially those of people who think they've found it easy. And there are also flaws in the approaches that we come up with. And what we're afraid of is that we'll continue doing this until we're not smart enough to find a flaw, and we'll say, okay, that's the problem solved now. That would be a disaster, because just because we're not smart enough doesn't mean a potential AI, if it existed, could not find a flaw.
Anyway, I think I'll end there. I've cut bits of the talk short. So just to summarize: if an AI is developed in the next, say, 50 years, which is not impossible, it might or might not happen, there are at least strong arguments that it could become extremely powerful through various methods like copying and speed and self-improvement. The world will come to resemble what it prefers, and it's very hard to get those preferences right. And fortunately, some people are making some progress on that. Thanks for listening. And now we can get to the Q&A.