AI safety
Studying and reducing the existential risks posed by advanced artificial intelligence

Quick takes

67 · 5d · 6
We should expect the incentives and culture of AI-focused companies to make them uniquely terrible for producing safe AGI.

From a "safety from catastrophic risk" perspective, I suspect an "AI-focused company" (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:
1. Incentives
2. Culture

From an incentives perspective, consider realistic alternative organizational structures to "AI-focused company" that nonetheless have enough firepower to host successful multibillion-dollar scientific/engineering projects:
1. As part of an intergovernmental effort (e.g. CERN's Large Hadron Collider, the ISS)
2. As part of a governmental effort of a single country (e.g. Apollo Program, Manhattan Project, China's Tiangong)
3. As part of a larger company (e.g. Google DeepMind, Meta AI)

In each of those cases, I claim that there are stronger (though still not ideal) organizational incentives to slow down, pause/stop, or roll back deployment if there is sufficient evidence or reason to believe that further development can result in major catastrophe. In contrast, an AI-focused company has every incentive to go ahead on AI when the case for pausing is uncertain, and minimal incentive to stop or even take things slowly.

From a culture perspective, I claim that without knowing any details of the specific companies, you should expect AI-focused companies to be more likely than plausible contenders to have the following cultural elements:
1. Ideological AGI Vision: AI-focused companies may have a large contingent of "true believers" who are ideologically motivated to make AGI at all costs, and
2. No Pre-existing Safety Culture: AI-focused companies may have minimal or no strong "safety" culture where people deeply understand, have experience in, and are motivated by a desire to avoid catastrophic outcomes.

The first one should be self-explanatory. Th
2 · 3d · 1
Status: Fresh argument I just came up with. I welcome any feedback!

Allowing the U.S. Social Security Trust Fund to invest in stocks like any other national pension fund would enable the U.S. public to capture some of the profits from AGI-driven economic growth.

Currently, and uniquely among national pension funds, Social Security is only allowed to invest its reserves in non-marketable Treasury securities, which are very low-risk but also provide a low return on investment relative to the stock market. By contrast, the Government Pension Fund of Norway (also known as the Oil Fund) famously invests up to 60% of its assets in the global stock market, and the Japanese Government Pension Investment Fund invests in a 50-50 split of stocks and bonds.[1]

The Social Security Trust Fund, which is currently worth about $2.9 trillion, is expected to run out of reserves by 2034, as the retirement-age population increases. It has been proposed that allowing the Trust Fund to invest in stocks would allow it to remain solvent through the end of the century, avoiding the need to raise taxes or cut benefits (e.g. by raising the retirement age).[2] However, this policy could put Social Security at risk of insolvency in the event of a stock market crash.[3] Given that the stock market has returned about 10% per year for the past century, however, I am not very worried about this.[4]

More to the point, if (and when) "transformative AI" precipitates an unprecedented economic boom, it is possible that a disproportionate share of the profits will accrue to the companies involved in the production of the AGI, rather than the economy as a whole. This includes companies directly involved in creating AGI, such as OpenAI (and its shareholder Microsoft) or Google DeepMind, and companies farther down the value chain, such as semiconductor manufacturers. If this happens, then owning shares of those companies will put the Social Security Trust Fund in a good position to benefit from the econo
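For a rough sense of the return gap this argument leans on, here is a minimal Python sketch of the compounding arithmetic. The 2.5% Treasury-like rate and the 30-year horizon are illustrative assumptions I've added, not figures from the post, and the model ignores ongoing contributions, benefit payouts, inflation, and market volatility:

```python
# Rough illustrative sketch (not from the post): compound growth of a $2.9T
# reserve under an assumed Treasury-like return vs. the ~10%/yr equity return
# cited above. Ignores contributions, benefit payouts, and volatility.

def compound(value_trillions: float, annual_return: float, years: int) -> float:
    """Value after compounding at a fixed annual return."""
    return value_trillions * (1 + annual_return) ** years

reserves = 2.9  # trillions of dollars, per the post
horizon = 30    # years; arbitrary illustrative horizon (assumption)

for label, rate in [("Treasury-like (assumed 2.5%)", 0.025),
                    ("Equity-like (~10%)", 0.10)]:
    print(f"{label}: ~${compound(reserves, rate, horizon):.1f}T after {horizon} years")
```

On these assumptions the reserve ends up roughly 8x larger under equity-like returns than under the Treasury-like rate, which is the kind of gap the solvency argument points at.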
14 · 22d
American Philosophical Association (APA) announces two $10,000 AI2050 Prizes for philosophical work related to AI, with June 23, 2024 deadline:
https://dailynous.com/2024/04/25/apa-creates-new-prizes-for-philosophical-research-on-ai/
https://www.apaonline.org/page/ai2050
https://ai2050.schmidtsciences.org/hard-problems/
85 · 7mo · 6
Being mindful of the incentives created by pressure campaigns

I've spent the past few months trying to think about the whys and hows of large-scale public pressure campaigns (especially those targeting companies — of the sort that have been successful in animal advocacy). A high-level view of these campaigns is that they use public awareness and corporate reputation as a lever to adjust corporate incentives. But making sure that you are adjusting the right incentives is more challenging than it seems. Ironically, I think this is closely connected to specification gaming: it's often easy to accidentally incentivize companies to do more to look better, rather than doing more to be better.

For example, an AI-focused campaign calling out RSPs recently began running ads that single out AI labs for speaking openly about existential risk (quoting leaders acknowledging that things could go catastrophically wrong). I can see why this is a "juicy" lever — most of the public would be pretty astonished/outraged to learn some of the beliefs that are held by AI researchers. But I'm not sure if pulling this lever is really incentivizing the right thing.

As far as I can tell, AI leaders speaking openly about existential risk is good. It won't solve anything in and of itself, but it's a start — it encourages legislators and the public to take the issue seriously. In general, I think it's worth praising this when it happens. I think the same is true of implementing safety policies like RSPs, whether or not such policies are sufficient in and of themselves. If these things are used as ammunition to try to squeeze out stronger concessions, it might just incentivize the company to stop doing the good-but-inadequate thing (i.e. CEOs are less inclined to speak about the dangers of their product when it will be used as a soundbite in a campaign, and labs are probably less inclined to release good-but-inadequate safety policies when doing so creates more public backlash than they were
12 · 20d
A lot of policy research seems to be written with an agenda in mind, to shape the narrative. This kind of defeats the point of policy research, which is supposed to inform stakeholders rather than actively convince or nudge them. It might cause polarization on some topics and is, in itself, probably eroding the legitimacy of the space. I have seen similar concerning parallels in the non-profit space, where some third-sector actors endorse/do things which they see as good but which destroy trust in the whole space. This gives me scary unilateralist's curse vibes.
31 · 3mo
Y-Combinator wants to fund Mechanistic Interpretability startups

"Understanding model behavior is very challenging, but we believe that in contexts where trust is paramount it is essential for an AI model to be interpretable. Its responses need to be explainable. For society to reap the full benefits of AI, more work needs to be done on explainable AI. We are interested in funding people building new interpretable models or tools to explain the output of existing models."

Link: https://www.ycombinator.com/rfs (scroll to 12)
What they look for in startup founders: https://www.ycombinator.com/library/64-what-makes-great-founders-stand-out
8 · 25d · 1
Quote from VC Josh Wolfe: https://overcast.fm/+5AWO95pnw/46:15
44 · 7mo · 4
I might elaborate on this at some point, but I thought I'd write down some general reasons why I'm more optimistic than many EAs on the risk of human extinction from AI. I'm not defending these reasons here; I'm mostly just stating them.

* Skepticism of foom: I think it's unlikely that a single AI will take over the whole world and impose its will on everyone else. I think it's more likely that millions of AIs will be competing for control over the world, in a similar way that millions of humans are currently competing for control over the world. Power or wealth might be very unequally distributed in the future, but I find it unlikely that it will be distributed so unequally that there will be only one relevant entity with power. In a non-foomy world, AIs will be constrained by norms and laws. Absent severe misalignment among almost all the AIs, I think these norms and laws will likely include a general prohibition on murdering humans, and there won't be a particularly strong motive for AIs to murder every human either.
* Skepticism that value alignment is super-hard: I haven't seen any strong arguments that value alignment is very hard, in contrast to the straightforward empirical evidence that e.g. GPT-4 seems to be honest, kind, and helpful after relatively little effort. Most conceptual arguments I've seen for why we should expect value alignment to be super-hard rely on strong theoretical assumptions that I am highly skeptical of. I have yet to see significant empirical successes from these arguments. I feel like many of these conceptual arguments would, in theory, apply to humans, and yet human children are generally value aligned by the time they reach young adulthood (at least, value aligned enough to avoid killing all the old people). Unlike humans, AIs will be explicitly trained to be benevolent, and we will have essentially full control over their training process. This provides much reason for optimism.
* Belief in a strong endogenous response to AI: