Superintelligence reading group


Are you curious about risks from artificial intelligence? Do you want to find out whether the issue deserves the attention it gets from many effective altruists? Have you been meaning to think more carefully about it sometime?

If so, I invite you to join MIRI’s online Superintelligence reading group, starting in two weeks. I will run it, and it will aim to discuss all kinds of questions at many levels of expertise, with an eye to carefully assessing the argument for doing something about AI. I append the MIRI post, which contains further details. Sign up here to be alerted when it starts.



Nick Bostrom’s eagerly awaited Superintelligence comes out in the US this week. To help you get the most out of it, MIRI is running an online reading group where you can join with others to ask questions, discuss ideas, and probe the arguments more deeply.

The reading group will “meet” on a weekly post on the LessWrong discussion forum. For each ‘meeting’, we will read about half a chapter of Superintelligence, then come together virtually to discuss. I’ll summarize the chapter, and offer a few relevant notes, thoughts, and ideas for further investigation. (My notes will also be used as the source material for the final reading guide for the book.)

Discussion will take place in the comments. I’ll offer some questions, and invite you to bring your own, as well as thoughts, criticisms and suggestions for interesting related material. Your contributions to the reading group might also (with permission) be used in our final reading guide for the book.

We welcome both newcomers and veterans on the topic. Content will aim to be intelligible to a wide audience, and topics will range from novice to expert level. All levels of time commitment are welcome. We especially encourage AI researchers and practitioners to participate. Just use a pseudonym if you don’t want your questions and comments publicly linked to your identity.

We will follow this preliminary reading guide, produced by MIRI, reading one section per week.

If you have already read the book, don’t worry! To the extent you remember what it says, your superior expertise will only be a bonus. To the extent you don’t remember what it says, now is a good time for a review! If you don’t have time to read the book, but still want to participate, you are also welcome to join in. I will provide summaries, and many things will have page numbers, in case you want to skip to the relevant parts.

If this sounds good to you, first grab a copy of Superintelligence. You may also want to sign up here to be emailed when the discussion begins each week. The first virtual meeting (forum post) will go live at 6pm Pacific on Monday, September 15th. Subsequent meetings will start at 6pm every Monday, so if you’d like to coordinate for quick-fire discussion with others, put that into your calendar. If you prefer flexibility, come by any time! And remember that if there are any people you would especially enjoy discussing Superintelligence with, link them to this post!

Topics for the first week will include impressive displays of artificial intelligence, why computers play board games so well, and what a reasonable person should infer from the agricultural and industrial revolutions.

Why we should err in both directions


Crossposted from the Global Priorities Project

This is an introduction to the principle that when we are making decisions under uncertainty, we should choose so that we may err in either direction. We justify the principle, explore the relation with Umeshisms, and look at applications in priority-setting.

Some trade-offs

How much should you spend on your bike lock? A cheaper lock saves you money at the cost of security.

How long should you spend weighing up which charity to donate to before choosing one? Longer means less time for doing other useful things, but you’re more likely to make a good choice.

How early should you aim to arrive at the station for your train? Earlier means less chance of missing it, but more time hanging around at the station.

Should you be willing to undertake risky projects, or stick only to safe ones? The safer your threshold, the more confident you can be that you won’t waste resources, but some of the best opportunities may have a degree of risk, and you might be able to achieve a lot more with a weaker constraint.

The principle

We face trade-offs and make judgements all the time, and inevitably we sometimes make bad calls. In some cases we should have known better; sometimes we are just unlucky. As well as trying to make fewer mistakes, we should try to minimise the damage from the mistakes that we do make.

Here’s a rule which can be useful in helping you do this:

When making decisions that lie along a spectrum, you should choose so that you think you have some chance of being off from the best choice in each direction.

We could call this principle erring in both directions. It might seem counterintuitive — isn’t it worse to not even know what direction you’re wrong in? — but it’s based on some fairly straightforward economics. I give a non-technical sketch of a proof at the end, but the essence is: if you’re not going to be perfect, you want to be close to perfect, and this is best achieved by putting your actual choice near the middle of your error bar.

So the principle suggests that you should expect to waste a bit of time at the station, but should not leave such a large margin that you would catch the train whatever went wrong.


Just saying that you should have some chance of erring in either direction isn’t enough to tell you what you should actually choose. It can, though, be a useful warning sign when you’re going substantially wrong, and since these are the most important cases to fix, the principle has some use even in this form.

A more careful analysis would tell you that at the best point on the spectrum, a small change in your decision produces about as much expected benefit as expected cost. In ideal circumstances we can use this to work out exactly where on the spectrum we should be (in some cases more than one point may fit this, so you need to compare them directly). In practice it is often hard to estimate the marginal benefits and costs well enough for this to be a useful approach. So although it is theoretically optimal, you will only sometimes want to try to apply this version.

Say in our train example that you found missing the train as bad as 100 minutes waiting at the station. Then you want to leave time so that an extra minute of safety margin gives you a 1% reduction in the absolute chance of missing the train.

For instance, say your options in the train case look like this:

Safety margin (min)             1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
Chance of missing train (%)    50   30   15    8    5    3    2  1.5  1.1  0.8  0.6  0.4  0.3  0.2  0.1

Then the optimal safety margin to leave is somewhere between 6 and 7 minutes: this is where the marginal minute leads to a 1% reduction in the chance of missing the train.
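The marginal reasoning can be checked by brute force. The sketch below is a minimal illustration, not part of the original analysis: it takes the probabilities from the table, the text's exchange rate of 100 minutes of waiting per missed train, and computes the expected cost in minutes of each safety margin. The names `MISS_CHANCE_PERCENT`, `expected_cost` and `best_margins` are hypothetical.

```python
# Safety margin in minutes -> chance of missing the train, in percent (from the table above).
MISS_CHANCE_PERCENT = {
    1: 50, 2: 30, 3: 15, 4: 8, 5: 5, 6: 3, 7: 2, 8: 1.5,
    9: 1.1, 10: 0.8, 11: 0.6, 12: 0.4, 13: 0.3, 14: 0.2, 15: 0.1,
}

# From the text: missing the train is as bad as 100 minutes of waiting.
MISS_COST_MINUTES = 100


def expected_cost(margin: int) -> float:
    """Minutes spent waiting plus the expected cost of missing the train."""
    p_miss = MISS_CHANCE_PERCENT[margin] / 100
    return margin + p_miss * MISS_COST_MINUTES


def best_margins(tolerance: float = 1e-9) -> list:
    """All margins whose expected cost is (up to float error) the lowest."""
    costs = {m: expected_cost(m) for m in MISS_CHANCE_PERCENT}
    lowest = min(costs.values())
    return [m for m, c in costs.items() if c - lowest < tolerance]


if __name__ == "__main__":
    for m in MISS_CHANCE_PERCENT:
        print(f"{m:2d} min margin: expected cost {expected_cost(m):5.2f} min")
    print("Best margins:", best_margins())
```

Margins of 6 and 7 minutes both come out at an expected cost of 9 minutes, matching the text. Even at the optimum there is still a 2 to 3% chance of missing the train, so the best choice leaves room to err in either direction.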

Predictions and track records

So far, we’ve phrased the idea in terms of the predicted outcomes of actions.

Another, better-known perspective on the idea looks at events that have already happened. For example: “If you’ve never missed a flight, you’re spending too much time in airports.”

These formulations, dubbed ‘Umeshisms’, only work for decisions that you make multiple times, so that you can gather a track record.

An advantage of applying the principle to track records is that it’s more obvious when you’re going wrong. Introspection can be hard.

You can even apply the principle to track records of decisions that don’t look like choices from a spectrum. For example, it is given as advice in the game of bridge: if you don’t sometimes double the stakes on hands that eventually go against you, you’re not doubling enough. Although doubling or not is a binary choice, erring in both directions still works because ‘how often to double’ is a trait that roughly falls on a spectrum.


There are some circumstances where the principle may not apply.

First, if you think the correct point is at one extreme of the available spectrum. For instance nobody says ‘if you’re not worried about going to jail, you’re not committing enough armed robberies’, because we think the best number of armed robberies to commit is probably zero.

Second, if the available points in the spectrum are discrete and few in number. Take the example of the bike locks. Perhaps there are only three options available: the Cheap-o lock (£5), the Regular lock (£20), and the Super lock (£50). You might reasonably decide on the Regular lock, thinking that maybe the Super lock is better, but that the Cheap-o one certainly isn’t. When you buy the Regular lock, you’re pretty sure you’re not buying a lock that’s too tough. But since only two of the locks are good candidates, there is no decision you could make which tries to err in both directions.

Third, in the case of evaluating track records, it may be that your record isn’t long enough to expect to have seen errors in both directions, even if they should both come up eventually. If you haven’t flown that many times, you could well be spending the right amount of time — or even too little — in airports, even if you’ve never missed a flight.
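The point about short track records is easy to quantify. Suppose, purely as an assumption for illustration, that your ideal airport routine carries a 2% chance of missing any given flight, and that flights are independent. The chance of a spotless record after n flights is then (1 - 0.02)^n. The function name `chance_of_no_misses` is hypothetical.

```python
def chance_of_no_misses(p_miss: float, n_flights: int) -> float:
    """Probability of never missing a flight in n independent trips,
    each of which is missed with probability p_miss."""
    return (1 - p_miss) ** n_flights


# Hypothetical per-flight miss rate for a well-calibrated traveller.
P_MISS = 0.02

for n in (5, 20, 50, 200):
    print(f"{n:3d} flights: {chance_of_no_misses(P_MISS, n):.1%} chance of never missing one")
```

On these assumptions a spotless record after 20 flights is still more likely than not (about two chances in three), whereas after 200 flights it would occur less than 2% of the time and would be good evidence of excess caution.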

Finally, a warning about a case where the principle is not supposed to apply. It shouldn’t be applied directly to try to equalise the probability of being wrong in either direction without taking any account of the magnitude of the loss. So, for example, if someone says you should err on the side of caution by getting an early train to your job interview, that might look as though it conflicts with the idea of erring in both directions. But normally what’s meant is that you should have a higher probability of failing in one direction (wasting time by taking an earlier train than needed), because the consequences of failing in the other direction (missing the interview) are much worse.
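One way to see how unequal costs produce unequal error probabilities is a simple model in which each minute of error has a fixed cost. The sketch below assumes, hypothetically, that the travel time you will need is equally likely to be any whole number of minutes from 0 to 100, that each minute of waiting costs 1 unit, and that each minute of lateness costs either 1 or 9 units. This linear lateness cost is a swapped-in simplification: the interview example above is closer to an all-or-nothing cost. All names are illustrative.

```python
# Hypothetical uncertainty: the travel time you will need is equally likely
# to be any whole number of minutes from 0 to 100.
TRAVEL_TIMES = list(range(101))


def expected_loss(margin, times, wait_cost, late_cost):
    """Average loss from leaving `margin` minutes, with linear costs
    for arriving early (waiting) and for arriving late."""
    total = 0.0
    for t in times:
        if margin >= t:
            total += wait_cost * (margin - t)  # erred early: time wasted
        else:
            total += late_cost * (t - margin)  # erred late
    return total / len(times)


def best_margin(times, wait_cost, late_cost):
    # With piecewise-linear loss, an optimum lies at one of the possible travel times.
    return min(sorted(set(times)),
               key=lambda m: expected_loss(m, times, wait_cost, late_cost))


def chance_late(margin, times):
    """Fraction of scenarios in which the chosen margin is too small."""
    return sum(t > margin for t in times) / len(times)


for wait_cost, late_cost in [(1, 1), (1, 9)]:
    m = best_margin(TRAVEL_TIMES, wait_cost, late_cost)
    print(f"costs {wait_cost}:{late_cost} -> margin {m}, "
          f"P(late) = {chance_late(m, TRAVEL_TIMES):.2f}")
```

With equal costs the sketch picks the median margin of 50, with roughly even chances of erring either way. When lateness is nine times as costly it picks a margin of 90, with about a 10% chance of being late. Both kinds of error remain possible; only their probabilities shift.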

Conclusions and applications to prioritisation

Seeking to err in both directions can provide a useful tool in helping to form better judgements in uncertain situations. Many people may already have internalised the key points, but it can be useful to have a label to facilitate discussion. Additionally, having a clear principle can help you to apply it in cases where you might not have noticed it was relevant.

How might this principle apply to priority-setting? It suggests that:

  • You should spend enough time and resources on the prioritisation itself that you think some of the time may have been wasted (for example, you should spend a while at the end without changing your mind much), but not so much that you are totally confident you have the right answer.
  • If you are unsure what discount rate to use, you should choose one so that you think that it could be either too high or too low.
  • If you don’t know how strongly to weigh fragile cost-effectiveness estimates against more robust evidence, you should choose a level so that you might be over- or under-weighing them.
  • When you are providing a best-guess estimate, you should choose a figure which could plausibly be wrong either way.

And one on track records:

  • Suppose you’ve made lots of grants. Then if you’ve never backed a project which has failed, you’re probably too risk-averse in your grantmaking.

Appendix: a sketch proof of the principle

Assume the true graph of value (on the vertical axis) against the decision you make (on the horizontal axis, representing the spectrum) is smooth, rising to a single peak with a fairly flat top and falling away on either side.



The highest value is achieved at d, so this is where you’d like to be. But assume you don’t know quite where d is. Say your best guess is that d=g. But you think it’s quite possible that d>g, and quite unlikely that d<g. Should you choose g?

Suppose we compare g to g’, which is just a little bit bigger than g. If d>g, then switching from g to g’ would be moving up the slope on the left of the diagram, which is an improvement. If d=g then it would be better to stick with g, but it doesn’t make so much difference because the curve is fairly flat at the top. And if g were bigger than d, we’d be moving down the slope on the right of the diagram, which is worse for g’ — but this scenario was deemed unlikely.

Aggregating the three possibilities, we found that two of them were better for sticking with g, but in one of these (d=g) it didn’t matter very much, and the other (d<g) just wasn’t very likely. In contrast, the third case (d>g) was reasonably likely, and noticeably better for g’ than g. So overall we should prefer g’ to g.

In fact we’d want to continue moving until the marginal upside from going slightly higher was equal to the marginal downside; this would have to involve a non-trivial chance that we are going too high. So our choice should have a chance of failure in either direction. This completes the (sketch) proof.
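A small numerical instance of the argument may help, under assumptions that are ours rather than part of the proof. Take the smooth, single-peaked value curve -(x - d)^2, a best guess of g = 0, and beliefs putting probability 0.5 on d = 0, 0.4 on d = 2 (the 'quite possible' case d > g) and 0.1 on d = -1 (the 'quite unlikely' case d < g). Searching a grid of choices for the highest expected value moves you from g up to 0.7. That choice still has a chance of being too high and a chance of being too low. The names below are hypothetical.

```python
# Hypothetical beliefs about where the peak d lies: {d: probability}.
BELIEFS = {0.0: 0.5, 2.0: 0.4, -1.0: 0.1}
GUESS = 0.0  # g: the single most likely location of the peak


def value(x, d):
    """Assumed smooth value curve, peaked at d and flat at the top."""
    return -(x - d) ** 2


def expected_value(x):
    return sum(p * value(x, d) for d, p in BELIEFS.items())


# Candidate decisions from -2.0 to 3.0 in steps of 0.1.
choices = [i / 10 for i in range(-20, 31)]
best = max(choices, key=expected_value)

# "Too high" means the choice lies above the true peak, "too low" below it.
p_too_high = sum(p for d, p in BELIEFS.items() if d < best)
p_too_low = sum(p for d, p in BELIEFS.items() if d > best)

print(f"E[value] at g = {expected_value(GUESS):.2f}, "
      f"at best choice {best} = {expected_value(best):.2f}")
print(f"P(choice too high) = {p_too_high:.1f}, P(choice too low) = {p_too_low:.1f}")
```

With a quadratic curve the optimum is simply the mean of your beliefs about d; other smooth curves give other optima, but the same push away from g.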

Note: There was an assumption of smoothness in this argument. I suspect it may be possible to get slightly stronger conclusions or work from slightly weaker assumptions, but I’m not certain what the most general form of this argument is. It is often easier to build a careful argument in specific cases.

Acknowledgements: thanks to Ryan Carey, Max Dalton, and Toby Ord for useful comments and suggestions.

Conversation with Holden Karnofsky, Nick Beckstead, and Eliezer Yudkowsky on the “long-run” perspective on effective altruism


Earlier this year, I had an email conversation with Holden Karnofsky, Eliezer Yudkowsky, and Luke Muehlhauser about future-oriented effective altruism, as a follow-up to an earlier conversation Holden had with Luke and Eliezer.

The conversation is now available here. My highlights from the conversation:

NICK: I think the case for “do the most good” coinciding with “do what is best in terms of very long-term considerations” rests on weaker normative premises than your conversation suggests it does. For example, I don’t believe you need the assumption that creating a life is as good as saving a life, or a constant fraction as good as that. I have discussed a more general kind of argument—as well as some of the most natural and common alternative moral frameworks I could think of—in my dissertation (especially ch. 3 and ch. 5). It may seem like a small point, but I think you can introduce a considerable amount of complicated holistic evaluation into the framework without undermining the argument for focusing primarily on long-term considerations.

For another point, you can have trajectory changes or more severe “flawed realizations” that don’t involve extinction. E.g., you could imagine a version of climate change where bad management of the problem results in the future being 1% worse forever or you could have a somewhat suboptimal AI that makes the future 1% worse than it could have been (just treat these as toy examples that illustrate a point rather than empirical claims). If you’ve got a big enough future civilization, these changes could plausibly outweigh short-term considerations (apart from their long-term consequences) even if you don’t think that creating a life is within some constant fraction of saving a life.

HOLDEN: On your first point – I think you’re right about the *far future* but I have more trouble seeing the connection to *x-risk* (even broadly defined). Placing a great deal of value on a 1% improvement seems to point more in the direction of working toward broad empowerment/improvement and weigh toward e.g. AMF. I think I need to accept the creating/saving multiplier to believe that “all the value comes from whether or not we colonize the stars.”

NICK: The claim was explicitly meant to be about “very long-term considerations.” I just mean to be speaking to your hesitations about the moral framework (rather than your hesitations about what the moral framework implies).

I agree that an increased emphasis on trajectory changes/flawed realizations (in comparison with creating extra people) supports putting more emphasis on factors like broad human empowerment relative to avoiding doomsday scenarios and other major global disruptions.

ELIEZER: How does AMF get us to a 1% better *long-term* future? Are you envisioning something along the lines of “Starting with a 1% more prosperous Earth results in 1% more colonization and hence 1% more utility by the time the stars finally burn out”?

HOLDEN: I guess so. A 1% better earth does a 1% better job in the SWH transition? I haven’t thought about this much and don’t feel strongly about what I said.


HOLDEN: Something Weird Happens – Eliezer’s term for what I think he originally intended Singularity to mean (or how I interpret Singularity).

(will write more later)

NICK: I feel that the space between your take on astronomical waste and Bostrom’s take is smaller than you recognize in this discussion and in discussions we’ve had previously. In the grand scheme of things, it seems the position you articulated (under the assumptions that future generations matter in the appropriate way) puts you closer to Bostrom than it does to (say) 99.9% of the population. I think most outsiders would see this dispute as analogous to a dispute between two highly specific factions of Marxism or something. As Eliezer said, I think your disagreement is more about how to apply maxipok than whether maxipok is right (in the abstract).[…]

I think there’s an interesting analogy with the animal rights people. Suppose you hadn’t considered the long-run consequences of helping people so much and you become convinced that animal suffering on factory farms is of comparable importance to billions of humans being tortured and killed each year, and that getting one person to be a vegetarian is like preventing many humans from being tortured and killed. Given that you accept this conclusion, I think it wouldn’t be unreasonable for you to update strongly in favor of factory farming being one of the most high priority areas for doing good in the world, even if you didn’t know a great deal about RFMF and so on. Anyway, it does seem pretty analogous in some important ways. This looks to me like a case where some animal rights people did something analogous to the process you critiqued and thereby identified factory farming,

HOLDEN: Re: Bostrom’s essay – I see things differently. I see “the far future is extremely important” as a reasonably mainstream position. There are a lot of mainstream people who place substantial value on funding and promoting science, for that exact reason. Certainly there are a lot of people who don’t feel this way, and I have arguments with them, but I don’t feel Bostrom’s essay tells us nearly as much when read as agreeing with me. I’d say it gives us a framework that may or may not turn out to be useful.

So far I haven’t found it to be particularly useful. I think valuing extinction prevention as equivalent to saving something like 5*N lives (N=current global population) leads to most of the same conclusions. Most of my experience with Bostrom’s essay has been people pointing to it as a convincing defense of a much more substantive position.

I think non-climate-change x-risks are neglected because of how diffuse their constituencies are (the classic issue), not so much because of apathy toward the far future, particularly not from failure to value the far future at [huge number] instead of 5*N.

NICK: […] Though I’m not particularly excited about refuges, they might be a good test case. I think that if you had this 5N view, refuges would be obviously dumb but if you had the view that I defended in my dissertation then refuges would be interesting from a conceptual perspective.

HOLDEN: One of the things I’m hoping to clarify with my upcoming posts is that my comfort with a framework is not independent of what the framework implies. Many of the ways in which you try to break down arguments do not map well onto my actual process for generating conclusions.

NICK: I’m aware that this isn’t how you operate. But doesn’t this seem like an “in the trenches” case where we’re trying to learn and clarify our reasoning, and therefore your post would suggest that now is a good time to engage in sequence thinking?

HOLDEN: Really good question that made me think and is going to make me edit my post. I concede that sequence thinking has important superiorities for communication; I also think that it COULD be used to build a model of cluster thinking (this is basically what I tried to do in my post – define cluster thinking as a vaguely specified “formula”). One of the main goals of my post is to help sequence thinkers do a better job modeling and explicitly discussing what cluster thinking is doing.

What’s frustrating to me is getting accused of being evasive, inconsistent, or indifferent about questions like this far future thing; I’d rather be accused of using a process that is hard to understand by its nature (and shouldn’t be assumed to be either rational or irrational; it could be either or a mix).

Anyway, what I’d say in this case is:

  • I think we’ve hit diminishing returns on examining this particular model of the far future. I’ve named all the problems I see with it; I have no more. I concede that this model doesn’t have other holes that I’ve identified, for the moment. I’ve been wrong before re: thinking we’ve hit diminishing returns before we have, so I’m open to more questions.
  • In terms of how I integrate the model into my decisions, I cap its signal and give it moderate weight. “Action X would be robustly better if I accepted this model of the far future” is an argument in favor of action X but not a decisive one. This is the bit that I’ve previously had trouble defending as a principled action, and hopefully I’ve made some progress on that front. I don’t intend this statement to cut off discussion on the sequence thinking bit, because more argument along those lines could strengthen the robustness of the argument for me and increase its weight.

HOLDEN: Say that you buy Apple stock because “there’s a 10% chance that they develop a wearable computer over the next 2 years and this sells over 10x as well as the iPad has.” I short Apple stock because “I think their new CEO sucks.” IMO, it is the case that you made a wild guess about the probability of the wearable computer thing, and it is not the case that I did.

NICK: I think I’ve understood your perspective for a while, I’m mainly talking about how to explain it to people.

I think this example clarifies the situation. If your P(Apple develops a wearable computer over the next 2 years and this sells over 10x as well as the iPad has) = 10%, then you’d want to buy Apple stock. In this sense, if you short Apple stock, you’re committed to P(Apple develops a wearable computer over the next 2 years and this sells over 10x as well as the iPad has) < 10%. In this sense, you often can’t get out of being committed to ranges of subjective probabilities.

The way you think about it, the cognitive procedure is more like: ask a bunch of questions, give answers to the questions, give weights to your question/answer pairs, make a decision as a result. You’re “relying on an assumption” only if that assumption is your answer to one of the questions and you put a lot of weight on that question/answer pair. Since you just relied on the pair (How good is the CEO?, The CEO sucks), you didn’t rely on a wild guess about P(Apple develops a wearable computer over the next 2 years and this sells over 10x as well as the iPad has). And, in this sense, you can often avoid being committed to subjective probabilities.

When I first heard you say, “You’re relying on a wild guess,” my initial reaction was something like, “Holden is making the mistake of thinking that his actions don’t commit him to ranges of subjective probabilities (in the first sense). It looks like he hasn’t thought through the Bayesian perspective on this.” I do think this is a real mistake that people make, though they may (often?) be operating more on the kind of basis you have described. I started thinking you had a more interesting perspective when, when I was pressing you on this point, you said something like, “I’m committed to whatever subjective probability I’m committed to on the basis of the decision that’s an outcome of this cognitive procedure.”

Strategic considerations about different speeds of AI takeoff


Crossposted from the Global Priorities Project

Co-written by Owen Cotton-Barratt and Toby Ord

There are several different kinds of artificial general intelligence (AGI) which might be developed, and there are different scenarios which could play out after one of them reaches a roughly human level of ability across a wide range of tasks. We shall discuss some of the implications we can see for these different scenarios, and what that might tell us about how we should act today.

A key difference between different types of post-AGI scenario is the ‘speed of takeoff’. This could be thought of as the time between first reaching a near human-level artificial intelligence and reaching one that far exceeds our capacities in almost all areas (or reaching a world where almost all economically productive work is done by artificial intelligences). In fast takeoff scenarios, this might happen over a scale of months, weeks, or days. In slow takeoff scenarios, it might take years or decades. There has been considerable discussion about which speed of takeoff is more likely, but less discussion about which is more desirable and what that implies.

Are slow takeoffs more desirable?

There are a few reasons to think that we’re more likely to get a good outcome in a slow takeoff scenario.

First, safety work today has an issue of nearsightedness. Since we don’t know quite what form artificial intelligence will eventually take, specific work today may end up being of no help on the problem we eventually face. If we had a slow takeoff scenario, there would be a period of time in which AGI safety researchers had a much better idea of the nature of the threat, and were able to optimise their work accordingly. This could make their work several times more valuable.

Second, and perhaps more crucially, in a slow takeoff the concerns about AGI safety are likely to spread much more widely through society. It is easy to imagine this producing widespread societal support of a level at or exceeding that for work on climate change, because the issue would be seen to be imminent. This could translate to much more work on securing a good outcome, perhaps hundreds of times the total which had previously been done. Although there are some benefits to having work done serially rather than in parallel, these are likely to be overwhelmed by the sheer quantity of extra high-quality work which would attack the problem. Furthermore, the slower the takeoff, the more this additional work can also be done serially.

A third key factor is that a slow takeoff seems more likely to lead to a highly multipolar scenario. If AGI has been developed commercially, the creators are likely to license out copies for various applications. Moreover, it could give enough time for competitors to bring alternatives up to speed.

We don’t think it’s clear whether multipolar outcomes are overall a good thing, but we note that they have some advantages. In the short term they are likely to preserve something closer to the existing balance of power, which gives more time for work to ensure a safe future. They are additionally less sensitive to the prospect of a treacherous turn or of any single-point failure mode in an AGI.

Strategic implications

If we think that there will be much more time for safety work in slow takeoff scenarios, there seem to be two main implications:

First, when there is any chance to influence matters, we should generally push towards slow takeoff scenarios. They are likely to have much more safety work done, and this is a large factor which could easily outweigh our other information about the relative desirability of the scenarios.

Second, we should generally focus safety research today on fast takeoff scenarios. Since there will be much less safety work in total in these scenarios, extra work is likely to have a much larger marginal effect. This can be seen as hedging against a fast takeoff even if we think it is undesirable.
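The second implication rests on diminishing marginal returns. As a rough illustration only, assume the value of safety work in a scenario grows with the logarithm of the total work it receives, that a fast takeoff has probability 0.3 and a slow one 0.7, and that the slow scenario ends up with 100 times as much total work. These numbers are hypothetical, loosely echoing the 'hundreds of times' figure above, and the function name is ours. The sketch compares the expected gain from one extra unit of work aimed at each scenario.

```python
import math

# Hypothetical scenario parameters (not estimates from the text):
# probability of the scenario, and total safety work it will eventually receive.
SCENARIOS = {
    "fast": {"prob": 0.3, "total_work": 1.0},
    "slow": {"prob": 0.7, "total_work": 100.0},
}


def gain_from_extra_work(name, extra=1.0):
    """Expected gain from adding `extra` units of work aimed at one scenario,
    assuming value grows with the logarithm of total work (diminishing returns)."""
    s = SCENARIOS[name]
    w = s["total_work"]
    return s["prob"] * (math.log(w + extra) - math.log(w))


for name in SCENARIOS:
    print(f"{name}: expected gain {gain_from_extra_work(name):.4f}")
```

Under these assumptions the extra unit aimed at the fast-takeoff scenario is worth roughly thirty times as much, even though that scenario is assumed to be less likely. The size of the gap depends heavily on the invented numbers.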

Overall it seems to us that the AGI safety community has internalised the second point, and sensibly focused on work addressing fast takeoff scenarios. It is less clear that we have appropriately weighed the first point. Either of these points could be strengthened or outweighed by a better understanding of the relevant scenarios.

For example, it seems that neuromorphic AGI would be much harder to understand and control than an AGI with a much clearer internal architecture. So conditional on a fast takeoff, it would be bad if the AGI were neuromorphic. People concerned with AGI safety have argued against a neuromorphic approach on these grounds. However, precisely because it is opaque, neuromorphic AGI may be less able to perform fast recursive self-improvement, and this would decrease the chance of a fast takeoff. Given how much better a slow takeoff appears, we should perhaps prefer neuromorphic approaches.

In general, the AGI safety community focuses much of its attention on recursive self-improvement approaches to designing a highly intelligent system. We think that this makes sense inasmuch as it draws attention to the dangers of fast takeoff scenarios and hedges against being in one, but we would want to take care not to promote the approach for those considering designing an AGI. Drawing attention to the power of recursive self-improvement could end up being self-defeating if it encourages people to design such systems, producing a faster takeoff.

In conclusion, it seems that when doing direct technical safety work, it may be reasonable to condition on a fast takeoff, as that is the scenario where our early work matters most. When choosing strategic direction, however, it is a mistake to condition on a fast takeoff, precisely because our decisions may affect the probability of a fast takeoff.

Thanks to Daniel Dewey for conversations and comments.

A relatively atheoretical perspective on astronomical waste


Crossposted from the Global Priorities Project


It is commonly objected that the “long-run” perspective on effective altruism rests on esoteric assumptions from moral philosophy that are highly debatable. Yes, the long-term future may overwhelm aggregate welfare considerations, but does it follow that the long-term future is overwhelmingly important? Do I really want my plan for helping the world to rest on the assumption that the benefit from allowing extra people to exist scales linearly with population when large numbers of extra people are allowed to exist?

In my dissertation on this topic, I tried to defend the conclusion that the distant future is overwhelmingly important without committing to a highly specific view about population ethics (such as total utilitarianism). I did this by appealing to more general principles, but I did end up delving pretty deeply into some standard philosophical issues related to population ethics. And I don’t see how to avoid that if you want to independently evaluate whether it’s overwhelmingly important for humanity to survive in the long-term future (rather than, say, just deferring to common sense).

In this post, I outline a relatively atheoretical argument that affecting long-run outcomes for civilization is overwhelmingly important, and attempt to side-step some of the deeper philosophical disagreements. It won’t be an argument that preventing extinction would be overwhelmingly important, but it will be an argument that other changes to humanity’s long-term trajectory overwhelm short-term considerations. And I’m just going to stick to the moral philosophy here. I will not discuss important issues related to how to handle Knightian uncertainty, “robust” probability estimates, or the long-term consequences of accomplishing good in the short run. I think those issues are more important, but I’m just taking on one piece of the puzzle that has to do with moral philosophy, where I thought I could quickly explain something that may help people think through the issues.

In outline form, my argument is as follows:

  1. In very ordinary resource conservation cases that are easy to think about, it is clearly important to ensure that the lives of future generations go well, and it’s natural to think that the importance scales linearly with the number of future people whose lives will be affected by the conservation work.
  2. By analogy, it is important to ensure that, if humanity does survive into the distant future, its trajectory is as good as possible, and the importance of shaping the long-term future scales roughly linearly with the expected number of people in the future.
  3. Premise (2), when combined with the standard set of (admittedly debatable) empirical and decision-theoretic assumptions of the astronomical waste argument, yields the standard conclusion of that argument: shaping the long-term future is overwhelmingly important.

As in other discussions of this issue (such as Nick Bostrom’s papers “Astronomical Waste” and “Existential Risk Prevention as Global Priority,” and my dissertation), this discussion will generally assume that we’re talking about good accomplished from an impartial perspective, and will not attend to deontological, virtue-theoretic, or justice-related considerations.

A review of the astronomical waste argument and an adjustment to it

The standard version of the astronomical waste argument runs as follows:

  1. The expected size of humanity’s future influence is astronomically great.
  2. If the expected size of humanity’s future influence is astronomically great, then the expected value of the future is astronomically great.
  3. If the expected value of the future is astronomically great, then what matters most is that we maximize humanity’s long-term potential.
  4. Some of our actions are expected to reduce existential risk in not-ridiculously-small ways.
  5. If what matters most is that we maximize humanity’s long-term potential and some of our actions are expected to reduce existential risk in not-ridiculously-small ways, what it is best to do is primarily determined by how our actions are expected to reduce existential risk.
  6. Therefore, what it is best to do is primarily determined by how our actions are expected to reduce existential risk.

I’ve argued for adjusting the last three steps of this argument in the following way:

  4’. Some of our actions are expected to change our development trajectory in not-ridiculously-small ways.
  5’. If what matters most is that we maximize humanity’s long-term potential and some of our actions are expected to change our development trajectory in not-ridiculously-small ways, what it is best to do is primarily determined by how our actions are expected to change our development trajectory.
  6’. Therefore, what it is best to do is primarily determined by how our actions are expected to change our development trajectory.

The basic thought here is that what the astronomical waste argument really shows is that future welfare considerations swamp short-term considerations, so that long-term consequences for the distant future are overwhelmingly important in comparison with purely short-term considerations (apart from long-term consequences that short-term consequences may produce).

Astronomical waste may involve changes in quality of life, rather than size of population

Often, the astronomical waste argument is combined with the idea that the best way to minimize astronomical waste is to minimize the probability of premature human extinction. How important it is to prevent premature human extinction is a subject of philosophical debate, and the debate largely rests on whether it is important to allow large numbers of people to exist in the future. So when someone complains that the astronomical waste argument rests on esoteric assumptions about moral philosophy, they are implicitly objecting to premise (2) or (3). They are saying that even if human influence on the future is astronomically great, maybe changing how well humanity exercises its long-term potential isn’t very important, because maybe it isn’t important to ensure that a large number of people live in the future.

However, the concept of existential risk is wide enough to include any drastic curtailment of humanity’s long-term potential, and the concept of a “trajectory change” is wide enough to include any small but important change in humanity’s long-term development. And the value of reducing these existential risks or bringing about these trajectory changes need not depend on changes in the population. For example,

  • In “The Future of Human Evolution,” Nick Bostrom discusses a scenario in which evolutionary dynamics result in substantial decreases in quality of life for all future generations, and the main problem is not a population deficit.
  • Paul Christiano outlined long-term resource inequality as a possible consequence of developing advanced machine intelligence.
  • I discussed various specific trajectory changes in a comment on an essay mentioned above.

There is limited philosophical debate about the importance of changes in the quality of life of future generations

The main group of people who deny that it is important that future people exist have “person-affecting views.” These people claim that if I must choose between outcome A and outcome B, and person X exists in outcome A but not outcome B, it’s not possible to affect person X by choosing outcome A rather than B. Because of this, they claim that causing people to exist can’t benefit them and isn’t important. I think this view suffers from fatal objections which I have discussed in chapter 4 of my dissertation, and you can check that out if you want to learn more. But, for the sake of argument, let’s agree that creating “extra” people can’t help the people created and isn’t important.

A puzzle for people with person-affecting views goes as follows:

Suppose that agents as a community have chosen to deplete rather than conserve certain resources. The consequences of that choice for the persons who exist now or will come into existence over the next two centuries will be “slightly higher” than under a conservation alternative (Parfit 1987, 362; see also Parfit 2011 (vol. 2), 218). Thereafter, however, for many centuries the quality of life would be much lower. “The great lowering of the quality of life must provide some moral reason not to choose Depletion” (Parfit 1987, 363). Surely agents ought to have chosen conservation in some form or another instead. But note that, at the same time, depletion seems to harm no one. While distant future persons, by hypothesis, will suffer as a result of depletion, it is also true that for each such person a conservation choice (very probably) would have changed the timing and manner of the relevant conception. That change, in turn, would have changed the identities of the people conceived and the identities of the people who eventually exist. Any suffering, then, that they endure under the depletion choice would seem to be unavoidable if those persons are ever to exist at all. Assuming (here and throughout) that that existence is worth having, we seem forced to conclude that depletion does not harm, or make things worse for, and is not otherwise “bad for,” anyone at all (Parfit 1987, 363). At least: depletion does not harm, or make things worse for, and is not “bad for,” anyone who does or will exist under the depletion choice.

The seemingly natural thing to say if you have a person-affecting view is that because conservation doesn’t benefit anyone, it isn’t important. But this is a very strange thing to say, and people having this conversation generally recognize that saying it involves biting a bullet. The general tenor of the conversation is that conservation is obviously important in this example, and people with person-affecting views need to provide an explanation consonant with that intuition.

Whatever the ultimate philosophical justification, I think we should say that choosing conservation in the above example is important, and this has something to do with the fact that choosing conservation has consequences that are relevant to the quality of life of many future people.

Intuitively, giving N times as many future people higher quality of life is N times as important

Suppose that conservation would have consequences relevant to 100 times as many people in case A as it would in case B. How much more important would conservation be in case A? Intuitively, it would be 100 times more important. This generally fits with Holden Karnofsky’s intuition that a 1/N probability of saving N lives is about as important as saving one life, for any N:

I wish to be the sort of person who would happily pay $1 for a robust (reliable, true, correct) 10/N probability of saving N lives, for astronomically huge N – while simultaneously refusing to pay $1 to a random person on the street claiming s/he will save N lives with it.

More generally, we could say:

Principle of Scale: Other things being equal, it is N times better (in itself) to ensure that N people in some position have higher quality of life than other people who would be in their position than it is to do this for one person.

I had to state the principle circuitously to avoid saying that things like conservation programs could “help” future generations, because according to people with person-affecting views, if our “helping” changes the identities of future people, then we aren’t “helping” anyone and that’s relevant. If I had said it in ordinary language, the principle would have said, “If you can help N people, that’s N times better than helping one person.” The principle could use some tinkering to deal with concerns about equality and so on, but it will serve well enough for our purposes.

The Principle of Scale may seem obvious, but even it is debatable; you wouldn’t find philosophical agreement about it. For example, some philosophers who claim that additional lives have diminishing marginal value would say that in situations where many people already exist, it matters much less if a person is helped. I attack these perspectives in chapter 5 of my dissertation, and you can check that out if you want to learn more. But, in any case, the Principle of Scale does seem pretty compelling (especially if you’re the kind of person who doesn’t have time for esoteric debates about population ethics), so let’s run with it.
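As a purely illustrative sketch, the arithmetic behind the Principle of Scale and Karnofsky’s bet can be written out in Python. The functions and numbers are hypothetical, not taken from the post. `diminishing_value` is an invented stand-in for the rival view just mentioned, on which helping someone counts for less when many people already exist.

```python
from fractions import Fraction


def scale_value(n_people, gain_per_person=1):
    # Principle of Scale: helping n people is n times as good as helping one.
    return n_people * gain_per_person


def expected_value(probability, n_people, gain_per_person=1):
    # Expected value of an action that helps n people with the given probability.
    return probability * scale_value(n_people, gain_per_person)


def diminishing_value(n_people, background_population, gain_per_person=1):
    # Hypothetical rival view, for illustration only: each person helped
    # counts for less the more people already exist.
    return Fraction(n_people * gain_per_person, 1 + background_population)


# A 10/N chance of saving N lives is worth 10 lives in expectation, for any N.
for n in (10, 10**6, 10**30):
    print(n, expected_value(Fraction(10, n), n))

# Under the rival view, helping one person matters far less in a populous world.
print(diminishing_value(1, 0), diminishing_value(1, 10**9))
```

Exact `Fraction` arithmetic is used so that the result stays exactly 10 even for astronomically large N, with no floating-point rounding.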

Now for the most questionable steps: let’s assume with the astronomical waste argument that the expected number of future people is overwhelming, and that it is possible to improve the quality of life for an overwhelming number of future people through forward-thinking interventions. If we combine this with the Principle of Scale and wave our hands a bit, we get the conclusion that shifting quality of life for an overwhelming number of future people is overwhelmingly more important than any short-term consideration. And that is very close to what the long-run perspective says about helping future generations, though importantly different because this version of the argument might not put weight on preventing extinction. (I say “might not” rather than “would not” because if you disagree with the people with person-affecting views but accept the Principle of Scale outlined above, you might just accept the usual conclusion of the astronomical waste argument.)

Does the Principle of Scale break down when large numbers are at stake?

I have no argument that it doesn’t, but I note that (i) this wasn’t Holden Karnofsky’s intuition about saving N lives, (ii) it isn’t mine, and (iii) I don’t really see a compelling justification for it. The main reason I can think of for wanting it to break down is not liking the conclusion that affecting long-run outcomes for humanity is overwhelmingly important in comparison with short-term considerations. If you really want to avoid the conclusion that shaping the long-term future is overwhelmingly important, I believe it would be better to accommodate this idea by appealing to other perspectives and a framework for integrating the insights of different perspectives (such as the one that Holden has talked about), rather than by altering this perspective. If you are in that position, my hope is that reading this post will cause you to put more weight on the perspectives that place great importance on the future.


To wrap up, I’ve argued that:

  1. Reducing astronomical waste need not involve preventing human extinction—it can involve other changes in humanity’s long-term trajectory.
  2. While not widely discussed, the Principle of Scale is fairly attractive from an atheoretical standpoint.
  3. The Principle of Scale—when combined with other standard assumptions in the literature on astronomical waste—suggests that some trajectory changes would be overwhelmingly important in comparison with short-term considerations. It could be accepted by people who have person-affecting views or people who don’t want to get too bogged down in esoteric debates about moral philosophy.

The perspective I’ve outlined here is still philosophically controversial, but it is at least somewhat independent of the standard approach to astronomical waste. Ultimately, any take on astronomical waste—including ignoring it—will be committed to philosophical assumptions of some kind, but perhaps the perspective outlined would be accepted more widely, especially by people with temperaments consonant with effective altruism, than perspectives relying on more specific theories or a larger number of principles.

Agricultural research and development


Crossposted from the Giving What We Can blog

Foreword: The Copenhagen Consensus and other authors have highlighted the potential of agricultural R&D as a high-leverage opportunity. This was enough to get us interested in understanding the area better, so we asked David Goll, a Giving What We Can member and a professional economist, to investigate how it compares to our existing recommendations. – Owen Cotton-Barratt, Director of Research for Giving What We Can


Around one in every eight people suffers from chronic hunger, according to the Food and Agriculture Organization’s most recent estimates (FAO, 2013). Two billion suffer from micronutrient deficiencies. One quarter of children are stunted. Increasing agricultural yields, and therefore the availability of food, will be essential in tackling these problems, which are likely to get worse as population and income growth place ever greater pressure on supply. To some extent, yield growth can be achieved through improved use of existing technologies. Access to and use of irrigation, fertilizer and agricultural machinery remains limited in some developing countries. However, targeted research and development will also be required to generate new technologies (seeds, animal vaccines and so on) that allow burgeoning food demand to be met.

Agricultural research and development encompasses an extremely broad range of activities and potential innovations. A 2008 paper issued by the Consultative Group on International Agricultural Research (von Braun et al., 2008), an international organization that funds and coordinates agricultural research, identifies 14 ‘best bets’. These include developing hybrid and inbred seeds with improved yield potential, better resistance to wheat rust, increased drought tolerance and added nutritional value, but also encompass the development of new animal vaccines, better fertilizer use and improved processing and management techniques for fisheries.

Notable successes in seed development seem to have generated immense social benefit. The high-yielding varieties that spread through the ‘Green Revolution’ are often credited with driving a doubling of rice and wheat yields in Asia from the late 1960s to the 1990s, saving hundreds of millions of people from famine (see, for instance, Economist, 2014). Given the prevalence of hunger and the high proportion of the extremely poor who work as farmers, agricultural research and development seems to offer a potential opportunity for effective altruism.

Existing benefit-cost estimates are promising, though not spectacular. The Copenhagen Consensus project ranked R&D to increase yield enhancements as the sixth most valuable social investment available, behind deworming and micronutrient interventions but ahead of popular programmes such as conditional cash transfers for education (Copenhagen Consensus, 2012).

The calculations that fed into this decision were based on two main categories of benefit. First, higher-yield seeds allow production of larger quantities of agricultural output at a lower cost, bolstering the income of farmers. Around 70 per cent of the African labour force works in agriculture, many in smallholdings that generate little income above subsistence (IFPRI, 2012). Boosting gains from agriculture could clearly provide large benefits for many of the worst off. Second, decreased costs of production lead to lower prices for food, allowing consumers to purchase more or freeing up their income to be spent elsewhere.

Projecting out to 2050, these two types of benefit alone are expected to outweigh the costs of increased R&D by 16 to 1 (Hoddinott et al., 2012). By comparison, the benefit-cost ratios estimated within the same project for salt iodization (a form of micronutrient supplement) range between 15 to 1 and 520 to 1, with the latest estimates finding a benefit-cost ratio of 81 to 1 (Hoddinott et al., 2012), and most of the estimates reported to the Copenhagen Consensus panel for the benefit-cost ratio of conditional cash transfers for education fall between 10 to 1 and 2 to 1 (Orazem, 2012). Using a very crude method, we can also convert the benefit-cost ratios into approximate QALY terms. Using a QALY value of three times annual income and taking the income of the beneficiaries to be $4.50 a day (around average income per capita in Sub-Saharan Africa), agricultural R&D is estimated to generate a benefit equivalent to one QALY for every $304.
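The crude QALY conversion can be reproduced as a short Python sketch. The function name and parameters are illustrative. The inputs (a 16 to 1 ratio, $4.50 a day, one QALY valued at three times annual income) come from the paragraph above. The post does not say how many days per year were assumed: a 365-day year gives about $308 per QALY and a 360-day year gives about $304, so the quoted figure is reproduced up to that kind of rounding choice.

```python
def cost_per_qaly(benefit_cost_ratio, daily_income, income_multiple=3.0, days_per_year=365):
    """Crude conversion of a benefit-cost ratio into dollars spent per QALY.

    Assumes one QALY is worth `income_multiple` times the beneficiary's annual
    income, and that each $1 spent produces `benefit_cost_ratio` dollars of benefit.
    """
    qaly_value = income_multiple * daily_income * days_per_year  # dollars per QALY
    return qaly_value / benefit_cost_ratio


print(round(cost_per_qaly(16, 4.50)))                      # 308
print(round(cost_per_qaly(16, 4.50, days_per_year=360)))   # 304
```

Under the same assumptions, the 81 to 1 estimate for salt iodization corresponds to roughly $61 per QALY, which shows how strongly the conversion depends on the benefit-cost ratio.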

Other types of benefit were not tabulated in the Copenhagen Consensus study, but should also be high. Strains that are resistant to drought, for instance, could greatly reduce year-to-year variation in crop yields. More resilient seeds could mitigate the negative effects of climate change on agriculture. Lower food prices may lead to better child nutrition, with life-long improved health and productivity. Finally, higher yields may decrease the potential for conflict due to the pressure on limited land, food and water resources resulting from climate change and population growth. Each of these benefits alone may justify the costs of research and development but, with our limited knowledge, they are not easily quantified.

The high benefit-cost ratio found by the Copenhagen Consensus team is broadly consistent with other literature. A meta-analysis of 292 academic studies on this topic found that the median rate of return on agricultural R&D is around 44% (Alston et al., 2000). A rate of return, in this sense, is the discount rate at which the discounted costs of an investment equal its discounted benefits, rather like the interest rate on a bank account. More recent studies, focusing on research in Sub-Saharan Africa, have found aggregate returns of 55% (Alene, 2008).
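To make the definition concrete, here is a minimal Python sketch that finds an internal rate of return by bisection. The cash flows are invented for illustration. The solver assumes a conventional project (costs first, benefits afterwards), so that net present value falls as the rate rises and crosses zero exactly once.

```python
def npv(rate, cash_flows):
    """Net present value of yearly cash flows, the first occurring at t = 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=0.0, high=10.0, tol=1e-10):
    """Internal rate of return: the rate at which NPV is zero, found by bisection.

    Assumes NPV is positive at `low`, negative at `high`, and decreasing in between.
    """
    if npv(low, cash_flows) * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    while high - low > tol:
        mid = (low + high) / 2
        if npv(mid, cash_flows) > 0:
            low = mid   # still profitable at this rate, so the root lies higher
        else:
            high = mid
    return (low + high) / 2


# Hypothetical project: spend 100 now, receive 110 next year.
print(round(irr([-100, 110]), 4))  # 0.1
```

Bisection is chosen for robustness rather than speed; for projects whose cash flows change sign more than once, the IRR may not be unique and this simple approach does not apply.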

Unfortunately, the rate of return on investment is not directly comparable to a benefit-cost ratio; the methodology applied often deviates from the welfare-based approach used by the Copenhagen Consensus team, and the two numbers cannot be accurately converted into similar terms. Nonetheless, a crude conversion can give a ballpark estimate of the benefit-cost ratio implied by these studies. Assuming that a marginal increase in spending on research is borne upfront and that research generates a constant stream of equal benefits each year from then on, the benefit-cost ratio for an investment with a 44% rate of return at a 5% discount rate is about 9 to 1.
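The crude conversion can be made explicit with a sketch, assuming (as the text does) a one-off cost C at t = 0 followed by an equal benefit b every year from t = 1 onward. An internal rate of return r means C = b / r, the present value of a perpetuity, so b = rC. Discounting the same stream at rate d gives benefits worth b / d = (r / d)C, so the benefit-cost ratio is simply r / d. The function names below are illustrative.

```python
def bcr_from_irr(irr, discount_rate):
    """Benefit-cost ratio implied by an IRR when benefits form a perpetuity."""
    return irr / discount_rate


def bcr_truncated(irr, discount_rate, years):
    """Same ratio computed by summing the benefit stream for a finite horizon.

    Per $1 of upfront cost the annual benefit is `irr`, received at t = 1..years.
    """
    return sum(irr / (1 + discount_rate) ** t for t in range(1, years + 1))


print(round(bcr_from_irr(0.44, 0.05), 1))  # 8.8, i.e. roughly 9 to 1
```

The perpetuity assumption matters: if the benefits stopped after 30 years, `bcr_truncated(0.44, 0.05, 30)` gives about 6.8 rather than 8.8.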

There are, however, at least two reasons to treat these high benefit-cost estimates with skepticism.

First, estimating the effect of research and development is difficult. One problem is attribution. Growth in yields can be observed, as can spending on research and development, but it is much more difficult to observe which spending on research led to which increase in yields. If yields grew last year in Ethiopia, was this the result of research that occurred two years ago or ten years ago? Were the improved yields driven by spending on research within Ethiopia, or were they a spillover from research conducted elsewhere in the region or even on another continent? Estimating the effect of R&D spending requires researchers to adopt a specific temporal and spatial model dictating which expenditures can affect which yields in which countries. Teasing out causality can therefore be tricky, and some studies have suggested that inappropriate attribution may have led to systematic bias in the available estimates (e.g. Alston et al., 2009).

Another problem is cherry picking. Estimates garnered from meta-analysis are likely to be upwardly biased because studies are much more likely to be conducted on R&D programmes that are perceived to be successful. Failed programmes, on the other hand, are likely to be ignored and, as a result, the research may paint an overly optimistic picture of the potential impact of R&D.

Second, for new technologies to have an impact on the poor, they need to be widely adopted. This step should not be taken for granted. Adoption rates for improved varieties of crops remain low throughout Africa; farmer-saved seeds, which are unlikely to be improved, account for around 80 per cent of planted seeds in Africa compared to a global average of 35 per cent (AGRA, 2013). To some extent, this is because previous research has been poorly targeted at regional needs. The high-yield varieties developed during the Green Revolution required irrigation or very high levels of rainfall. New seed development focused on wheat and rice, rather than alternative crops such as sorghum, cassava and millet. High-yielding varieties also required extensive fertilizer use. All of these features rendered them unsuitable for the African context, and help explain why it was not easy to replicate the Asian success story elsewhere (Elliot, 2010).

There are also more structural features of many developing countries that will limit adoption. Lack of markets for surplus production can mean that smallholders see limited benefit from larger harvests, especially when new seeds are costly and require additional labour and expensive fertilizer. Weak property rights undermine incentives to invest, since farmers may be unable to hold on to their surplus crop or sell it at a fair price. Unavailability of credit means that, even when it makes good economic sense for farmers to invest in improved seeds, they may not be able to raise the initial capital required. The benefit-cost estimates discussed above, based on a synthesis of evidence from a diverse set of contexts, may underestimate the difficulties with adoption in more challenging countries.

Even in Asia during the Green Revolution, high-yield varieties were adopted first and foremost by large agricultural interests rather than smallholders (Wiggins et al., 2013). If the same happened with newly developed seeds, the impact on the poorest would be more limited than suggested in the Copenhagen Consensus study. The poorest could still benefit from lower food prices and increased employment in the agricultural sector, but in extreme scenarios smallholders may even lose out to low-cost competition from larger farms that adopt new seeds.

In combination, the difficulties with estimating the effects of R&D and the potential barriers to adoption suggest that the estimated benefit-cost ratios reported earlier are likely to be upwardly biased. The benefit-cost ratios estimated are also lower than those associated with Giving What We Can’s currently recommended charities. For instance, the $304 per QALY estimate based on the Copenhagen Consensus benefit-cost ratio, which appears to be at the higher end of the literature, compares unfavourably to GiveWell’s baseline estimate of $45 to $115 per DALY for insecticide treated bednets (GiveWell, 2013). The benefit-cost ratios also appear to be lower than those associated with micronutrient supplements, as discussed earlier. While there are significant benefits that remain unquantified within agricultural R&D, the same is also true for interventions based on bednet distribution, deworming and micronutrient supplements. As a result, while this area could yield individual high impact opportunities, the literature as it stands does not seem to support the claim that agricultural R&D is likely to be more effective than the best other interventions.


  • Food and Agriculture Organization, ‘The State of Food and Agriculture 2013’ (2013)
  • von Braun, J., Fan, S., Meinzen-Dick, R., Rosegrant, M. and Nin Pratt, A., ‘What to Expect from Scaling Up CGIAR Investments and “Best Bet” Programs’ (2008)
  • Copenhagen Consensus, ‘Expert Panel Findings’ (2012)
  • Hoddinott, J., Rosegrant, M. and Torero, M., ‘Investments to reduce hunger and undernutrition’ (2012)
  • Orazem, P., ‘The Case for Improving School Quality and Student Health as a Development Strategy’ (2012)
  • Alliance for a Green Revolution in Africa, ‘Africa Agriculture Status Report 2013: Focus on Staple Crops’ (2013)
  • International Food Policy Research Institute, ‘2012 Global Food Policy Report’ (2012)
  • Elliot, K., ‘Pulling Agricultural Innovation and the Market Together’ (2010)
  • Wiggins, S., Farrington, J., Henley, G., Grist, N. and Locke, A., ‘Agricultural development policy: a contemporary agenda’ (2013)
  • GiveWell, ‘Mass distribution of long-lasting insecticide-treated nets (LLINs)’ (2013), retrieved July 10th 2014
  • The Economist, ‘A bigger rice bowl’ (May 10th 2014)

How to treat problems of unknown difficulty


Crossposted from the Global Priorities Project

This is the first in a series of posts that take aim at the question: how should we prioritise work on problems where we have very little idea of our chances of success? In this post we’ll see some simple models-from-ignorance that allow us to estimate the chances of success from extra work. In later posts we’ll examine the counterfactuals to estimate the value of the work. For those who prefer a different medium, I gave a talk on this topic at the Good Done Right conference in Oxford this July.


How hard is it to build an economically efficient fusion reactor? How hard is it to prove or disprove the Goldbach conjecture? How hard is it to produce a machine superintelligence? How hard is it to write down a concrete description of our values?

These are all hard problems, but we don’t have a good idea of just how hard they are, even to an order of magnitude. This is in contrast to a problem like giving a laptop to every child, where we know that it’s hard but could produce a fairly good estimate of how many resources it would take.

Since we need to make choices about how to prioritise between work on different problems, this is clearly an important issue. We can prioritise using benefit-cost analysis, choosing the projects with the highest ratio of future benefits to present costs. When we don’t know how hard a problem is, though, our ignorance makes the size of the costs unclear, and so the analysis is harder to perform. Since we make decisions anyway, we are implicitly making some judgements about when work on these projects is worthwhile, but we may be making mistakes.

In this article, we’ll explore practical epistemology for dealing with these problems of unknown difficulty.


We will use a simplifying model for problems: that they have a critical threshold D such that the problem will be completely solved when D resources are expended, and not at all before that. We refer to this as the difficulty of the problem. After the fact the graph of success with resources will look something like this:

Of course the assumption is that we don’t know D. So our uncertainty about where the threshold is will smooth out the curve in expectation. Our expectation beforehand for success with resources will end up looking something like this:
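The smoothing effect can be sketched in Python: average the threshold step function over many possible values of D, and the result is the probability that D is at most the resources spent. The prior used here (D log-uniform between 1 and 10^6 resource units, represented by evenly spaced quantiles) is an assumption chosen only for illustration.

```python
def success(resources, difficulty):
    """Threshold model: fully solved once resources reach D, not at all before."""
    return 1.0 if resources >= difficulty else 0.0


def expected_success(resources, difficulties):
    """Average the step function over possible difficulties.

    This estimates P(D <= resources), the smooth curve we expect beforehand.
    """
    return sum(success(resources, d) for d in difficulties) / len(difficulties)


# Illustrative prior: D log-uniform between 1 and 10**6 resource units,
# represented by 1000 evenly spaced quantiles on the log scale.
n = 1000
prior_sample = [10 ** (6 * (i + 0.5) / n) for i in range(n)]

for r in (10, 1_000, 100_000):
    print(r, expected_success(r, prior_sample))
```

Each individual curve is a sharp step, but their average rises gradually; with this particular prior it climbs by about one sixth for every factor of ten in resources.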

Assuming a fixed difficulty is a simplification, since of course resources are not all homogeneous, and we may get lucky or unlucky. I believe that this is a reasonable simplification, and that taking these considerations into account would not change our expectations by much, but I plan to explore this more carefully in a future post.

What kind of problems are we looking at?

We’re interested in one-off problems where we have a lot of uncertainty about the difficulty. That is, the kind of problem we only need to solve once (answering a question a first time can be Herculean; answering it a second time is trivial), and which may not easily be placed in a reference class with other tasks of similar difficulty. Knowledge problems, as in research, are a central example: they boil down to finding the answer to a question. The category might also include trying to effect some systemic change (for example by political lobbying).

This is in contrast to engineering problems, which can be reduced, roughly, to performing a known task many times. For these we get a fairly good picture of how the problem scales. Note that this includes some knowledge work: the “known task” may actually be different each time. For example, proofreading one page of text is not quite the same as proofreading another, but we have a fairly good reference class, so we can estimate moderately well the difficulty of proofreading a page of text, and quite well the difficulty of proofreading a 100,000-word book (where the length helps to smooth out the variance in estimates of individual pages).

Some knowledge questions can naturally be broken up into smaller sub-questions. However, these typically won’t form a tight enough class that we can estimate the difficulty of the overall problem from the difficulty of the first few sub-questions. It may well be that one of the sub-questions carries essentially all of the difficulty, so making progress on the others is only a very small help.

Model from extreme ignorance

One approach to estimating the difficulty of a problem is to assume that we understand essentially nothing about it. If we are completely ignorant, we have no information about the scale of the difficulty, so we want a scale-free prior. This determines that the prior obeys a power law. Then, we update on the amount of resources we have already expended on the problem without success. Our posterior probability distribution for how many resources are required to solve the problem will then be a Pareto distribution. (Fallenstein and Mennen proposed this model for the difficulty of the problem of making a general-purpose artificial intelligence.)

There is still a question about the shape parameter of the Pareto distribution, which governs how thick the tail is. It is hard to see how to infer this on a priori grounds, but we might hope to estimate it by generalising from a very broad class of problems people have successfully solved in the past.
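A minimal Python sketch of this model, assuming a Pareto posterior whose scale equals the resources already spent without success and whose shape is `alpha`. For x at least the amount spent, P(D > x | D > spent) = (spent / x)^alpha. The function name is illustrative. A useful consequence is that doubling total resources succeeds with probability 1 - 2^(-alpha), however much has already been spent. A smaller alpha means a thicker tail and a lower chance that doubling is enough.

```python
def prob_solved_with_extra(spent, extra, alpha):
    """Chance that the next `extra` resources solve the problem, given failure so far.

    Pareto posterior with scale `spent` and shape `alpha`:
    P(D > x | D > spent) = (spent / x) ** alpha for x >= spent.
    """
    return 1.0 - (spent / (spent + extra)) ** alpha


for alpha in (0.5, 1.0, 2.0):
    # Doubling total resources: the answer is 1 - 2 ** -alpha, whatever `spent` is.
    print(alpha, round(prob_solved_with_extra(100.0, 100.0, alpha), 3))
```

With alpha = 1, for example, doubling the total resources gives a 50% chance of success. This is why estimating the shape parameter matters so much for prioritisation.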

This idealised case is a good starting point, but in actual cases our estimate may be wider or narrower than this. It may be narrower if we have some idea of a reasonable (if very approximate) reference class for the problem, or some idea of the rate of progress made towards the solution. For example, assuming a Pareto distribution implies that there’s always a nontrivial chance of solving the problem at any minute, and we may be confident that we are not that close to solving it. It may be broader because a Pareto distribution implies that the problem is certainly solvable, and some problems will turn out to be impossible.
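Both departures from the pure Pareto model can be illustrated with a short sketch. `p_impossible` is a hypothetical parameter: the current, already-updated probability that no amount of resources will solve the problem. The rest of the probability mass follows the same Pareto posterior as before.

```python
def prob_solved(spent, extra, alpha, p_impossible=0.0):
    """Pareto model of difficulty, optionally mixed with a chance of impossibility.

    The remaining (1 - p_impossible) of the probability mass follows a Pareto
    posterior with scale `spent` and shape `alpha`.
    """
    pareto = 1.0 - (spent / (spent + extra)) ** alpha
    return (1.0 - p_impossible) * pareto


# Even a 1% increase in resources has roughly an alpha / 100 chance of success:
# the "nontrivial chance at any minute" that may be implausible in practice.
print(round(prob_solved(100.0, 1.0, alpha=1.0), 4))
# Allowing a 40% chance of impossibility scales every success probability down.
print(round(prob_solved(100.0, 100.0, alpha=1.0, p_impossible=0.4), 2))
```

This covers only the "broader" direction explicitly. Narrowing the estimate, for instance by using a reference class, would require replacing the Pareto form itself rather than rescaling it.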

This might lead people to criticise the idea of using a Pareto distribution. If they have enough extra information that they don’t think their beliefs represent a Pareto distribution, can we still say anything sensible?

Reasoning about broader classes of model

In the previous section, we looked at a very specific and explicit model. Now we take a step back. We assume that people will have complicated enough priors and enough minor sources of evidence that it will in practice be impossible to write down a true distribution for their beliefs. Instead we will reason about some properties that this true distribution should have.

The cases we are interested in are those where we do not have a good idea of the order of magnitude of D, the difficulty of a task. This is an imprecise condition, but we might think of it as meaning something like:

There is no difficulty X such that we believe the probability of D lying between X and 10X is more than 30%.

Here the “30%” figure can be adjusted up for a less stringent requirement of uncertainty, or down for a more stringent one.
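To get a feel for how spread out this condition forces our beliefs to be, suppose, purely as an illustrative assumption, that beliefs about log10(D) are normally distributed with standard deviation σ (measured in decades). A single decade [X, 10X] captures the most probability when it is centred on the median, giving 2Φ(0.5/σ) - 1, where Φ is the standard normal CDF. The sketch finds the smallest σ that keeps this at or below a chosen threshold. Under this assumption, the 30% threshold corresponds to σ of roughly 1.3 decades.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_mass_in_one_decade(sigma: float) -> float:
    """Largest P(X <= D <= 10X) over X, if log10(D) ~ Normal(mu, sigma).

    An interval of width 1 on the log10 scale captures the most mass when it
    is centred on the mean, which gives 2*Phi(0.5/sigma) - 1.
    """
    return 2.0 * normal_cdf(0.5 / sigma) - 1.0


def min_sigma_for_threshold(threshold: float = 0.3) -> float:
    """Smallest sigma (in decades) with max_mass_in_one_decade(sigma) <= threshold."""
    lo, hi = 1e-6, 100.0
    for _ in range(200):  # bisection: the captured mass falls as sigma grows
        mid = 0.5 * (lo + hi)
        if max_mass_in_one_decade(mid) > threshold:
            lo = mid
        else:
            hi = mid
    return hi


for threshold in (0.2, 0.3, 0.4):
    print(threshold, round(min_sigma_for_threshold(threshold), 2))
```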

Now consider what our subjective probability distribution might look like, where difficulty lies on a logarithmic scale. Our high level of uncertainty will smooth things out, so it is likely to be a reasonably smooth curve. Unless we have specific distinct ideas for how the task is likely to be completed, this curve will probably be unimodal. Finally, since we are unsure even of the order of magnitude, the curve cannot be too tight on the log scale.

Note that this should be our prior subjective probability distribution: we are gauging how hard we would have thought it was before embarking on the project. We’ll discuss below how to update this in the light of information gained by working on it.

The distribution might look something like this:

In some cases it is probably worth trying to construct an explicit approximation of this curve. However, this could be quite labour-intensive, and we usually have uncertainty even about our uncertainty, so we will not be entirely confident in the result.

Instead, we could ask what properties tend to hold for this kind of probability distribution. For example, one well-known phenomenon which is roughly true of these distributions but not all probability distributions is Benford’s law.
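Benford’s law says that the leading digit d appears with probability log10(1 + 1/d). One way to see the link with distributions that are wide and smooth on a log scale: if D is log-uniform across a whole number of decades, the leading-digit frequencies come out exactly as Benford predicts. The sketch checks this by sampling. The range of six decades is an arbitrary illustrative choice.

```python
import math
import random
from collections import Counter


def benford(d: int) -> float:
    """Benford's-law probability that the leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)


def leading_digit_from_log10(log_value: float) -> int:
    """Leading digit of 10**log_value, read off the fractional part of the exponent."""
    return int(10 ** (log_value % 1.0))


rng = random.Random(1)
n = 100_000
# Log-uniform over six whole decades: log10(D) is uniform on [0, 6].
counts = Counter(leading_digit_from_log10(rng.uniform(0.0, 6.0)) for _ in range(n))
for d in range(1, 10):
    print(d, round(counts[d] / n, 3), round(benford(d), 3))
```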

Approximating as locally log-uniform

It would sometimes be useful to be able to make a simple analytically tractable approximation to the curve. This could be faster to produce, and easily used in a wider range of further analyses than an explicit attempt to model the curve exactly.

As a candidate for this role, we propose working with the assumption that the distribution is locally flat. This corresponds to being log-uniform. The smoothness assumptions we made should mean that our curve is nowhere too far from flat. Moreover, it is a very easy assumption to work with, since it means that the expected returns scale logarithmically with the resources put in: in expectation, a doubling of the resources is equally good regardless of the starting point.

It is, unfortunately, never exactly true. Although our curves may be approximately flat, they cannot be everywhere flat: a curve that is flat across the whole log scale cannot be normalised, so it does not even give a probability distribution! But it may work reasonably well as a model of local behaviour. If we want to turn it into a probability distribution, we can do this by estimating the plausible range of D and assuming the distribution is uniform (on the log scale) across that range. In our example we would be approximating the blue curve by something like this red box:

Obviously in the example the red box is not a fantastic approximation. But nor is it a terrible one. Over the central range, it never differs from the true value by much more than a factor of 2. While crude, this could still represent a substantial improvement on the current state of some of our estimates. A big advantage is that it is easily analytically tractable, so it will be quick to work with. In the rest of this post we’ll explore the consequences of this assumption.

Places this might fail

In some circumstances, we might expect high uncertainty over difficulty without everywhere having local log-returns. A key example is if we have bounds on the difficulty at one or both ends.

For example, suppose we are interested in X, which consists of a task of radically unknown difficulty plus a repetitive and predictable part of difficulty 1000. Then our beliefs about the difficulty of X will only include values above 1000, and may be quite clustered there, so returns will not even be approximately logarithmic. The behaviour in the positive tail might still be roughly logarithmic.

In the other direction, we may know that there is a slow and repetitive way to achieve X, with difficulty 100,000. We are unsure whether there could be a quicker way. In this case our distribution will be uncertain over difficulties up to around 100,000, then have a spike. This will give the reverse behaviour, with roughly logarithmic expected returns in the negative tail, and a different behaviour around the spike at the upper end of the distribution.
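To see how far these two bounded cases depart from logarithmic returns, the sketch compares the extra chance of success from doubling resources at different points. The lower bound of 1,000 and the cap of 100,000 come from the examples above. The log-uniform ranges assumed for the uncertain part (up to 10^6 and 10^7) are hypothetical choices made for illustration.

```python
import math


def p_success_floor_case(z, floor=1_000.0, lo=1.0, hi=1e6):
    """P(D <= z) when D = floor + U, with U log-uniform on [lo, hi] (known lower bound)."""
    u = z - floor
    if u <= lo:
        return 0.0
    return math.log(min(u, hi) / lo) / math.log(hi / lo)


def p_success_cap_case(z, cap=100_000.0, lo=1.0, hi=1e7):
    """P(D <= z) when D = min(U, cap), with U log-uniform on [lo, hi] (slow method works at cap)."""
    if z >= cap:
        return 1.0
    if z <= lo:
        return 0.0
    return math.log(z / lo) / math.log(hi / lo)


def gain_from_doubling(p_success, z):
    """Extra chance of success from doubling total resources from z to 2z."""
    return p_success(2 * z) - p_success(z)


print("pure log-uniform doubling gain:", math.log(2) / math.log(1e6))
for z in (1_500.0, 100_000.0):
    print("lower-bound case, z =", z, gain_from_doubling(p_success_floor_case, z))
for z in (1_000.0, 50_000.0):
    print("known-slow-method case, z =", z, gain_from_doubling(p_success_cap_case, z))
```

With these assumptions, a doubling just above the lower bound of 1,000 is worth roughly twice as much as a doubling far above it. In the capped case, doublings below the cap have the usual logarithmic value. A doubling that reaches 100,000 also collects the whole spike of probability sitting there.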

In some sense each of these is diverging from the idea that we are very ignorant about the difficulty of the problem, but it may be useful to see how the conclusions vary with the assumptions.

Implications for expected returns

What does this model tell us about the expected returns from putting resources into trying to solve the problem?

Under the assumption that the prior is locally log-uniform, all of the probability is spread evenly across the width of the box in the diagram. This width is w = ln(y) - ln(x), where x is the value at the start of the box (where the problem could first plausibly be solved), y is the value at the end of the box, and the logarithms are natural. Since the box represents a probability distribution, its height is 1/w.

For any z between x and y, the modelled chance of success from investing z resources is equal to the fraction of the box which has been covered by that point. That is:

$$\text{Chance of success before reaching } z \text{ resources} = \frac{\ln(z/x)}{\ln(y/x)} \tag{1}$$

So while we are in the relevant range, the chance of success is equal for any doubling of the total resources. We could say that we expect logarithmic returns on investing resources.
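A minimal sketch of the box model and equation (1) follows. The range [x, y] is a hypothetical choice spanning six orders of magnitude. The loop checks the claim that every doubling inside the range adds the same chance of success, namely ln(2)/w.

```python
import math


def box_width(x: float, y: float) -> float:
    """Width w = ln(y) - ln(x) of the log-uniform box spanning resources [x, y]."""
    return math.log(y) - math.log(x)


def chance_of_success_by(z: float, x: float, y: float) -> float:
    """Equation (1): chance of success before reaching z resources, clamped outside [x, y]."""
    if z <= x:
        return 0.0
    if z >= y:
        return 1.0
    return math.log(z / x) / math.log(y / x)


# Hypothetical plausible range: the problem might first be solvable at 1e3 units
# of resources, and the box model assumes it is solved by 1e9.
x, y = 1e3, 1e9
w = box_width(x, y)
print("width w:", w, "height 1/w:", 1 / w)

# Every doubling inside the range adds the same chance of success, ln(2)/w.
for z in (2e3, 1e5, 1e7):
    gain = chance_of_success_by(2 * z, x, y) - chance_of_success_by(z, x, y)
    print(z, gain, math.log(2) / w)
```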

Marginal returns

Sometimes of greater relevance to our decisions is the marginal chance of success from adding an extra unit of resources at z. This is given by the derivative of Equation (1):

$$\text{Chance of success from a marginal unit of resource at } z = \frac{1}{zw} \tag{2}$$
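As a quick sanity check on equation (2), the sketch compares 1/(zw) with a numerical derivative of equation (1). The range [x, y] is again a hypothetical choice.

```python
import math


def chance_of_success_by(z, x, y):
    """Equation (1), valid for x <= z <= y: ln(z/x) / ln(y/x)."""
    return math.log(z / x) / math.log(y / x)


def marginal_chance(z, x, y):
    """Equation (2): the derivative of equation (1) in z, namely 1 / (z * w)."""
    return 1.0 / (z * math.log(y / x))


x, y = 1e3, 1e9  # hypothetical plausible range
for z in (1e4, 1e5, 1e6):
    h = 1e-3 * z  # step size for a central finite difference
    numeric = (chance_of_success_by(z + h, x, y) - chance_of_success_by(z - h, x, y)) / (2 * h)
    print(z, marginal_chance(z, x, y), numeric)
```

Because of the 1/z factor, an extra unit of resources added at ten times the total investment buys one tenth of the marginal chance.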

So far, we’ve just been looking at estimating the prior probabilities — before we start work on the problem. Of course when we start work we generally get more information. In particular, if we would have been able to recognise success, and we have invested z resources without observing success, then we learn that the difficulty is at least z. We must update our probability distribution to account for this. In some cases we will have relatively little information beyond the fact that we haven’t succeeded yet. In that case the update will just be to curtail the distribution to the left of z and renormalise, looking roughly like this:

Again the blue curve represents our true subjective probability distribution, and the red box represents a simple model approximating this. Now the simple model gives slightly higher estimated chance of success from an extra marginal unit of resources:

$$\text{Chance of success from an extra unit of resources after } z = \frac{1}{z\,(\ln y - \ln z)} \tag{3}$$
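The sketch below illustrates the update, using a hypothetical range. Equation (3) gives the same answer as conditioning the original box on failure: take the prior marginal chance from equation (2) and divide by the probability of not having succeeded by z, which is 1 minus equation (1). It also shows that the updated marginal chance exceeds the prior one.

```python
import math


def marginal_after_failure(z, y):
    """Equation (3): marginal chance once z resources have failed; the box now spans [z, y]."""
    return 1.0 / (z * (math.log(y) - math.log(z)))


def marginal_by_conditioning(z, x, y):
    """The same quantity derived from the prior box on [x, y]: density at z over P(D > z)."""
    w = math.log(y / x)
    prior_marginal = 1.0 / (z * w)        # equation (2)
    survival = 1.0 - math.log(z / x) / w  # 1 minus equation (1)
    return prior_marginal / survival


x, y = 1e3, 1e9  # hypothetical plausible range
for z in (1e4, 1e6, 1e8):
    prior = 1.0 / (z * math.log(y / x))
    print(z, prior, marginal_by_conditioning(z, x, y), marginal_after_failure(z, y))
```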

Of course in practice we often will update more. Even if we don’t have a good idea of how hard fusion is, we can reasonably assign close to zero probability that an extra $100 today will solve the problem today, because we can see enough to know that the solution won’t be found imminently. This looks like it might present problems for this approach. However, the truly decision-relevant question is about the counterfactual impact of extra resource investment. The region where we can see little chance of success has a much smaller effect on that calculation, which we discuss below.

Comparison with returns from a Pareto distribution

We mentioned that one natural model of such a process is as a Pareto distribution. If we have a Pareto distribution with shape parameter α, and we have so far invested z resources without success, then we get:

$$\text{Chance of success from an extra unit of resources} = \frac{\alpha}{z} \tag{4}$$
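A minimal numerical check of equation (4) follows. The chance per extra unit after failing with z resources is the Pareto density at z divided by the probability of not yet having succeeded. The shape and minimum values are hypothetical. The result is α/z whatever the minimum was.

```python
def pareto_density(t, scale, alpha):
    """Density of a Pareto distribution with minimum `scale` and shape `alpha`."""
    return alpha * scale**alpha / t ** (alpha + 1) if t >= scale else 0.0


def pareto_survival(t, scale, alpha):
    """P(D > t) for the same Pareto distribution."""
    return (scale / t) ** alpha if t >= scale else 1.0


def marginal_chance_pareto(z, scale, alpha):
    """Chance per extra unit after failing with z resources: density over survival at z."""
    return pareto_density(z, scale, alpha) / pareto_survival(z, scale, alpha)


alpha = 0.5  # hypothetical shape parameter
for scale in (1.0, 100.0):
    for z in (1e3, 1e5):
        # Equation (4) predicts alpha / z, independent of the minimum `scale`.
        print(scale, z, marginal_chance_pareto(z, scale, alpha), alpha / z)
```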

This is broadly in line with equation (3). In both cases the key term is a factor of 1/z. In each case there is also an additional factor, representing roughly how hard the problem is. In the case of the log-uniform box, this depends on estimating an upper bound for the difficulty of the problem; in the case of the Pareto distribution, it is handled by the shape parameter. It may be easier to introspect and extract a sensible estimate for the width of the box than for the shape parameter, since the width is couched more in terms that we naturally understand.
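One way to line the two models up is to set equation (4) equal to equation (3). This is an illustrative gloss rather than part of the original argument. It gives the shape parameter that produces the same marginal chance at the current level of investment z:

$$\frac{\alpha}{z} = \frac{1}{z\,\ln(y/z)} \quad\Longleftrightarrow\quad \alpha = \frac{1}{\ln(y/z)}$$

For example, suppose the top of the box lies three orders of magnitude beyond what has already been spent (y/z = 1000). This corresponds to α = 1/ln(1000) ≈ 0.14. Unlike the fixed shape parameter of a Pareto distribution, this equivalent α rises as z approaches y.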

Further work

In this post, we’ve just explored a simple model for the basic question of how likely success is at various stages. Of course it should not be used blindly, as you may often have more information than is incorporated into the model, but it represents a starting point if you don’t know where to begin, and it gives us something explicit which we can discuss, critique, and refine.

In future posts, I plan to:

  • Explore what happens in a field of related problems (such as a research field), and explain why we might expect to see logarithmic returns ex post as well as ex ante.
    • Look at some examples of this behaviour in the real world.
  • Examine the counterfactual impact of investing resources working on these problems, since this is the standard we should be using to prioritise.
  • Apply the framework to some questions of interest, with worked proof-of-concept calculations.
  • Consider what happens if we relax some of the assumptions or take different models.

Ben Kuhn on the effective altruist movement


Ben Kuhn is a data scientist and engineer at a small financial technology firm. He previously studied mathematics and computer science at Harvard, where he was also co-president of Harvard College Effective Altruism. He writes on effective altruism and other topics at his website.

Pablo: How did you become involved in the EA movement?

Ben: When I was a sophomore in high school (that’s age 15 for non-Americans), Peter Singer gave his The Life You Can Save talk at my high school. He went through his whole “child drowning in the pond” spiel and explained that we were morally obligated to give money to charities that helped those who were worse off than us. In particular, I think at that point he was recommending donating to Oxfam in a sort of Kantian way where you gave an amount of money such that if everyone gave the same percentage it would eliminate world poverty. My friends and I realized that there was no utilitarian reason to stop at that amount of money–you should just donate everything that you didn’t need to survive.

So, being not only sophomores but also sophomoric, we decided that since Prof. Singer didn’t live in a cardboard box and wear only burlap sacks, he must be a hypocrite and therefore not worth paying attention to.

Sometime in the intervening two years I ran across Yvain’s essay Efficient Charity: Do Unto Others and through it GiveWell. I think that was the point where I started to realize Singer might have been onto something. By my senior year (ages 17-18) I at least professed to believe pretty strongly in some version of effective altruism, although I think I hadn’t heard of the term yet. I wrote an essay on the subject in a publication that my writing class put together. It was anonymous (under the brilliant nom de plume of “Jenny Ross”) but somehow my classmates all figured out it was me.

The next big update happened during the spring of my first year of Harvard, when I started going to the Cambridge Less Wrong meetups and met Jeff and Julia. Through some chain of events they set me up with the folks who were then running Harvard High-Impact Philanthropy (which later became Harvard Effective Altruism). After that spring, almost everyone else involved in HHIP left and I ended up becoming president. At that point I guess I counted as “involved in the EA movement”, although things were still touch-and-go for a while until John Sturm came onto the scene and made HHIP get its act together and actually do things.

Pablo: In spite of being generally sympathetic to EA ideas, you have recently written a thorough critique of effective altruism. I’d like to ask you a few questions about some of the objections you raise in that critical essay. First, you have drawn a distinction between pretending to try and actually trying. Can you tell us what you mean by this, and why you claim that a lot of effective altruism can be summarized as “pretending to actually try”?

Ben: I’m not sure I can explain better than what I wrote in that post, but I’ll try to expand on it. For reference, here’s the excerpt that you referred to:

By way of clarification, consider a distinction between two senses of the word “trying”…. Let’s call them “actually trying” and “pretending to try”. Pretending to try to improve the world is something like responding to social pressure to improve the world by querying your brain for a thing which improves the world, taking the first search result and rolling with it. For example, for a while I thought that I would try to improve the world by developing computerized methods of checking informally-written proofs, thus allowing more scalable teaching of higher math, democratizing education, etc. Coincidentally, computer programming and higher math happened to be the two things that I was best at. This is pretending to try. Actually trying is looking at the things that improve the world, figuring out which one maximizes utility, and then doing that thing. For instance, I now run an effective altruist student organization at Harvard because I realized that even though I’m a comparatively bad leader and don’t enjoy it very much, it’s still very high-impact if I work hard enough at it. This isn’t to say that I’m actually trying yet, but I’ve gotten closer.

Most people say they want to improve the world. Some of them say this because they actually want to improve the world, and some of them say this because they want to be perceived as the kind of person who wants to improve the world. Of course, in reality, everyone is motivated by other people’s perceptions to some extent–the only question is by how much, and how closely other people are watching. But to simplify things let’s divide the world up into those two categories, “altruists” and “signalers.”

If you’re a signaler, what are you going to do? If you don’t try to improve the world at all, people will notice that you’re a hypocrite. On the other hand, improving the world takes lots of resources that you’d prefer to spend on other goals if possible. But fortunately, looking like you’re improving the world is easier than actually improving the world. Since people usually don’t do a lot of due diligence, the kind of improvements that signallers make tend to be ones with very good appearances and surface characteristics–like PlayPumps, water-pumping merry-go-rounds which initially appeared to be a clever and elegant way to solve the problem of water shortage in developing countries. PlayPumps got tons of money and celebrity endorsements, and their creators got lots of social rewards, even though the pumps turned out to be hideously expensive, massively inefficient, prone to breaking down, and basically a disaster in every way.

So in this oversimplified world, the EA observation that “charities vary in effectiveness by orders of magnitude” is explained by “charities” actually being two different things: one group optimizing for looking cool, and one group optimizing for actually doing good. A large part of effective altruism is realizing that signaling-charities (“pretending to try”) often don’t do very much good compared to altruist-charities.

(In reality, of course, everyone is driven by some amount of signalling and some amount of altruism, so these groups overlap substantially. And there are other motivations for running a charity, like being able to convince yourself that you’re doing good. So it gets messier, but I think the vastly oversimplified model above is a good illustration of where my point is coming from.)

Okay, so let’s move to the second paragraph of the post you referenced:

Using this distinction between pretending and actually trying, I would summarize a lot of effective altruism as “pretending to actually try”. As a social group, effective altruists have successfully noticed the pretending/actually-trying distinction. But they seem to have stopped there, assuming that knowing the difference between fake trying and actually trying translates into ability to actually try. Empirically, it most certainly doesn’t. A lot of effective altruists still end up satisficing—finding actions that are on their face acceptable under core EA standards and then picking those which seem appealing because of other essentially random factors. This is more likely to converge on good actions than what society does by default, because the principles are better than society’s default principles. Nevertheless, it fails to make much progress over what is directly obvious from the core EA principles. As a result, although “doing effective altruism” feels like truth-seeking, it often ends up being just a more credible way to pretend to try.

The observation I’m making here is roughly that EA seems not to have switched entirely to doing good for altruistic rather than signaling reasons. It’s more like we’ve switched to signaling that we’re doing good for altruistic rather than signaling reasons. In other words, the motivation didn’t switch from “looking good to outsiders” to “actually being good”–it switched from “looking good to outsiders” to “looking good to the EA movement.”

Now, the EA movement is way better than random outsiders at distinguishing between things with good surface characteristics and things that are actually helpful, so the latter criterion is much stricter than the former, and probably leads to much more good being done per dollar. (For instance, I doubt the EA community would ever endorse something like PlayPumps.) But, at least at the time of writing that post, I saw a lot of behavior that seemed to be based on finding something pleasant and with good surface appearances rather than finding the thing that optimized utility–for instance, donating to causes without a particularly good case that they were better than saving or picking career options that seemed decent-but-not-great from an EA perspective. That’s the source of the phrase “pretending to actually try”–the signaling isn’t going away, it’s just moving up a level in the hierarchy, to signaling that you don’t care about signaling.

Looking back on that piece, I think “pretending to actually try” is still a problem, but my intuition is now that it’s probably not huge in the scheme of things. I’m not quite sure why that is, but here are some arguments against it being very bad that have occurred to me:

  • It’s probably somewhat less prevalent than I initially thought, because the EAs making weird-seeming decisions may be doing them for reasons that aren’t transparent to me and that get left out by the typical EA analysis. The typical EA analysis tends to be a 50,000-foot average-case argument that can easily be invalidated by particular personal factors.
  • As Katja Grace points out, encouraging pretending to really try might be optimal from a movement-building perspective, inasmuch as it’s somewhat inescapable and still leads to pretty good results.
  • I probably overestimated the extent to which motivated/socially-pressured life choices are bad, for a couple reasons. I discounted the benefit of having people do a diversity of things, even if the way they came to be doing those things wasn’t purely rational. I also discounted the cost of doing something EA tells you to do instead of something you also want to do.
  • For instance, suppose for the sake of argument that there’s a pretty strong EA case that politics isn’t very good (I know this isn’t actually true). It’s probably good for marginal EAs to be dissuaded from going into politics by this, but I think it would still be bad for every single EA to be dissuaded from going into politics, for two reasons. First, the arguments against politics might turn out to be wrong, and having a few people in politics hedges against that case. Second, it’s much easier to excel at something you’re motivated to do, and the category of “people who are excellent at what they do” is probably as important to the EA movement as “people doing job X” for most X.

I also just haven’t noticed as much pretending-stuff going on in the last few months, so maybe we’re just getting better at avoiding it (or maybe I’m getting worse at noticing it). Anyway, I still definitely think there’s pretending-to-actually-try going on, but I don’t think it’s a huge problem.

Pablo: In another section of that critique, you express surprise at the fact that so many effective altruists donate to global health causes now. Why would you expect EAs to use their money in other ways, whether by donating now to other causes or by donating later? And what, in your opinion, explains this focus on causes for which we have relatively good data?

Ben: I’m no longer sure enough of where people’s donations are going to say with certainty that too much is going to global health. My update here comes from a combination of being overconfident when I wrote the piece and what looks like an increase in waiting to donate shortly after I wrote it. The latter was probably due in large part to AMF’s delisting and perhaps the precedent set by GiveWell employees, many of whom waited last year (though others argued against it). (Incidentally, I’m excited about the projects going on to make this more transparent, e.g. the questions on the survey about giving!)

The giving now vs. later debate has been ably summarized by Julia Wise on the EA blog. My sense from reading various arguments for both sides is that I more often see bad arguments for giving now. There are definitely good arguments for giving at least some money now, but on balance I suspect I’d like to see more saving. Again, though, I don’t have a great idea of what people’s donation behavior actually is; my samples could easily be biased.

I think my strongest impression right now is that I suspect we should be exploring more different ways to use our donations. For instance, some people who are earning to give have experimented with funding people to do independent research, which was a pretty cool idea. Off the top of my head, some other things we could try include scholarships, essay contest prizes, career assistance for other EAs, etc. In general it seems like there are tons of ways to use money to improve the world, many of which haven’t been explored by GiveWell or other evaluators and many of which don’t even fall in the category of things they care about (because they’re too small or too early-stage or something), but we should still be able to do something about them.

Pablo: In the concluding section of your essay, you propose that self-awareness be added to the list of principles that define effective altruism. Any thoughts on how to make the EA movement more self-aware?

Ben: One thing that I like to do is think about what our blind spots are. I think it’s pretty easy to look at all the stuff that is obviously a bad idea from an EA point of view, and think that our main problem is getting people “on board” (or even “getting people to admit they’re wrong”) so that they stop pursuing obviously bad ideas. And that’s certainly helpful, but we also have a ways to go just in terms of figuring things out.

For instance, here’s my current list of blind spots, areas where I wish there were a lot more thinking and idea-spreading going on than there currently is:

  • Being a good community. The EA community is already having occasional growing pains, and this is only going to get worse as we gain steam e.g. with Will MacAskill’s upcoming book. And beyond that, I think that ways of making groups more effective (as opposed to individuals) have a lot of promise for making the movement better at what we do. Many, many intellectual groups fail to accomplish their goals for basically silly reasons, while seemingly much worse groups do much better on this dimension. It seems like there’s no intrinsic reason we should be worse than, say, Mormons at building an effective community, but we’re clearly not there yet. I think there’s absolutely huge value in getting better at this, yet almost no one putting in a serious concerted effort.
  • Knowing history. Probably as a result of EA’s roots in math/philosophy, my impression is that our average level of historical informedness is pretty low, and that this makes us miss some important pattern-matches and cues. For instance, I think a better knowledge of history could help us think about capacity-building interventions, policy advocacy, and community building.
  • Fostering more intellectual diversity. Again because of the math/philosophy/utilitarianism thing, we have a massive problem with intellectual monoculture. Of my friends, the ones I now most enjoy talking with about altruism are largely the ones who associate least with the broader EA community, because they have more interesting and novel perspectives.
  • Finding individual effective opportunities. I suspect that there’s a lot of room for good EA opportunities that GiveWell hasn’t picked up on because they’re specific to a few people at a particular time. Some interesting stuff has been done in this vein in the past, like funding small EA-related experiments, funding people to do independent secondary research, or giving loans to other EAs investing in themselves (at least I believe this has been done). But I’m not sure if most people are adequately on the lookout for this kind of opportunity.

(Since it’s not fair to say “we need more X” without specifying how we get it, I should probably also include at least one anti-blind spot that I think we should be spending fewer resources on, on the margin: object-level donations to e.g. global health causes. I feel like we may be hitting diminishing returns here. Probably donating some is important for signalling reasons, but I think it doesn’t have a very high naive expected value right now.)

Pablo: Finally, what are your plans for the mid-term future?  What EA-relevant activities will you engage in over the next few years, and what sort of impact do you expect to have?

Ben: A while ago I did some reflecting and realized that most of the things I did that I was most happy about were pretty much unplanned–they happened not because I carefully thought things through and decided that they were the best way to achieve some goal, but because they intuitively seemed like a cool thing to do. (Things in this category include starting a blog, getting involved in the EA/rationality communities, running Harvard Effective Altruism, getting my current job, etc.) As a result, I don’t really have “plans for the mid-term future” per se. Instead, I typically make decisions based on intuitions/heuristics about what will lead to the best opportunities later on, without precisely knowing (or even knowing at all, often) what form those opportunities will take.

So I can’t tell you what I’ll be doing for the next few years–only that it will probably follow some of my general intuitions and heuristics:

  • Do lots of things. The more things I do, the more I increase my “luck surface area” to find awesome opportunities.
  • Do a few things really well. The point of this heuristic is hopefully obvious.
  • Do things that other people aren’t doing–or more accurately, things that not enough people are doing relative to how useful or important they are. My effort is most likely to make a difference in an area that is relatively under-resourced.

I’d like to take a moment here to plug the conference call on altruistic career choice that Holden Karnofsky of GiveWell had, which makes some great specific points along these lines.

Anyway, that’s my long-winded answer to the first part of this question. As far as EA-relevant activities and impacts, all the same caveats apply as above, but I can at least go over some things I’m currently interested in:

  • Now that I’m employed full-time, I need to start thinking much harder about where exactly I want to give: both what causes seem best, and which interventions within those causes. I actually currently don’t have much of a view on what I would do with more unrestricted funds.
  • Related to the point above about self-awareness, I’m interested in learning some more EA-relevant history–how previous social movements have worked out, how well various capacity-building interventions have worked, more about policy and the various systems that philanthropy comes into contact with, etc.
  • I’m interested to see to what extent the success of Harvard Effective Altruism can be sustained at Harvard and replicated at other universities.

I also have some more speculative/gestational interests–I’m keeping my eye on these, but don’t even have concrete next steps in mind:

  • I think there may be under-investment in healthy EA community dynamics, preventing common failure modes like unfriendliness, ossification to new ideas, groupthink etc.–though I can’t say for sure because I don’t have a great big-picture perspective of the EA community.
  • I’m also interested in generally adding more intellectual/epistemic diversity to EA–we have something of a monoculture problem right now. Anecdotally, there are a number of people who I think would have a really awesome perspective on many problems that we face, but who get turned off of the community for one reason or another.

Crossposted from Pablo’s blog

Audio recordings from Good Done Right available online


This July saw the first academic conference on effective altruism. The three-day event took place at All Souls College, one of the constituent colleges of the University of Oxford. The conference featured a diverse range of speakers addressing issues related to effective altruism in a shared setting. It was a fantastic opportunity to share insights and ideas from some of the best minds working on these issues.

I’m very pleased to announce that audio recordings from most of the talks are now available on the conference website, alongside speakers’ slides (where applicable). I’m very grateful to all of the participants for their fantastic presentations, and to All Souls College and the Centre for Effective Altruism for supporting the conference.

Crossposted from the Giving What We Can blog

‘Special Projects’ at the Centre for Effective Altruism


This is a short overview of a talk that I gave alongside William MacAskill and Owen Cotton-Barratt at the Centre for Effective Altruism Weekend Away last weekend.  This post does not contain new information for people familiar with the Centre for Effective Altruism’s work.  

New projects at the Centre for Effective Altruism are incubated within the Special Projects team.  We carry out a number of activities before choosing which ones to scale up.  The projects that we are currently working on are listed below.

The Global Priorities Project is a joint research initiative between the Future of Humanity Institute at the University of Oxford and the Centre for Effective Altruism. It attempts to prioritise between the pressing problems currently facing the world in order to establish in which areas we might have the most impact. You can read more about the project here.

Through the Global Priorities Project we are also engaged in policy advising for the UK Government.  Our first report to be published under this initiative is on unprecedented technological risk.  Our team regularly visits Government departments and No. 10 Downing Street to discuss policy proposals that we are developing as part of this work.

We are also scaling up our effective altruism outreach. As part of this work we are developing into a landing page for people new to effective altruism. We are also developing outreach activities to coincide with the release of multiple books on effective altruism in 2015, including one by our co-founder William MacAskill which will be published by Penguin in the USA, and Guardian Faber (the publishing house of the national newspaper) in the UK.

We have also launched Effective Altruism Ventures, a commercial company that will hold the rights to William MacAskill’s upcoming book, which will also engage in outreach activities related to effective altruism.  This company is not part of the Centre for Effective Altruism.

If you have any questions about any of these projects, please do not hesitate to contact me at or in the comments below.