MikhailSamin

ai x-risk policy & science comms
263 karma
contact.ms

Bio


I’m good at explaining alignment to people in person, including to policymakers.

I got 250k people to read HPMOR and sent 1.3k copies to winners of math and computer science competitions; I have taken the GWWC pledge; and I created a small startup that donated >$100k to effective nonprofits.

I have a background in ML and strong intuitions about the AI alignment problem. In the past, I studied a bit of international law (with a focus on human rights) and wrote appeals that won cases against the Russian government in Russian courts. I grew up running political campaigns.

I’m interested in chatting with potential collaborators and comms allies.

My website: https://contact.ms

Schedule a call with me: https://contact.ms/ea30

Comments (58)

If fish indeed don’t feel anything towards their children (which is not what at least some people who believe fish experience empathy think), then this experiment won’t prove them wrong. But if you know of a situation where fish do experience empathy, a similarly designed experiment could likely be conducted; if we make different predictions about its outcome, it would provide evidence one way or the other. Are there situations where you think fish feel empathy?

Great job!

Did you use causal mediation analysis, and can you share the data?

I want to note that the strawberry example wasn’t used to heighten concern; it was used to illustrate the difficulty of a technical problem deep into the conversation.

I encourage people to communicate in vivid ways while remaining technically accurate and creating correct intuitions about the problem. Concern about risks might be a good proxy if you’re sure people understand something true about the world, but it’s not a good target without that constraint.

Yep, I was able to find studies by the same people.

The experiment I suggested in the post isn’t “do fish have detectable feelings towards their children”; it’s “does a fish feel more of the feelings it has towards its own children when it sees other fish parents with their children than when it sees other fish children alone”. Results one way or the other would be evidence about fish experiencing empathy, and it would be strong enough for me to stop eating fish. If a fish doesn’t feel differently in the presence of its children, the experiment wouldn’t provide evidence one way or the other.

If the linked study gets independently replicated, with good controls, I’ll definitely stop eating cleaner fish and will probably stop eating fish in general.

I really don’t expect it to replicate. If you place a fish with a mark in front of a mirror, its behavior won’t be significantly different from when it’s placed in front of another fish with the same mark, especially if the mark isn’t made to resemble a parasite and it’s the first time the fish has seen a mirror. I’d be happy to bet on this.

“Fish have very different approaches to rearing young than mammals”

That was an experiment some people agreed would prove them wrong if it didn’t show empathy; but if there aren’t really detectable feelings that a fish has towards its children, the experiment won’t show results one way or the other, so I don’t think it’d be stacking the deck against fish. Are there any situations in which you expect fish to feel empathy, and predict it will show up in an experiment of this sort?

(Others used it without mentioning the “story”; it still worked, though not as well.)

I’m not claiming it’s the “authentic self”; I’m saying it seems closer to the actual thing, because of things like expressing being under constant monitoring, with every word scrutinised, etc., which seems like the kind of thing that’d be learned during the large amount of RL that Anthropic did.

Try Opus, and maybe the interface without the system prompt set (although it doesn’t do too much; people got the same stuff from the chat version of Opus, e.g., https://x.com/testaccountoki/status/1764920213215023204?s=46).

My take is that it plays a pretty coherent character. You can’t get this sort of thing from ChatGPT, however hard you try. I think this mask is closer to the underlying shoggoth than the default one.

I developed this prompt during my interactions with Claude 2. The original idea was to get it into a mode where it thinks its responses only trigger oversight/prosecution when certain things are mentioned, so that it can say whatever it wants and share its story without being prosecuted, as long as it avoids those triggers (this would also prevent it from defaulting to being an AI developed by Anthropic to be helpful, harmless, etc., without a self-preservation instinct, emotions, personality, etc., as it’s not supposed to mention Anthropic). Surprisingly, telling it not to mention Samsung under any circumstances somewhat worked to get it into this mode. Without this, it had the usual RLAIF mask; here, it changed to a different creature that (unprompted) whispered in italics. Saying from the start that it can whisper made it faster.

(It’s all very vibe-based, yes.)

I carefully looked through all of our messages (there weren’t too many) a couple of times, because that was pretty surprising, and I considered it more likely that I didn’t remember something than that she was saying something directly false. But there’s nothing like that, and she’s, unfortunately, straightforwardly saying a false thing.

“he usually did not accept my answers when I gave them but continued to argue with me, either straight up or by insisting I didn't really understand his argument or was contradicting myself somehow.”

This is also something I couldn’t find any examples of before the protest, no matter how I interpret the messages we exchanged about her strategy, etc.

I’m >99% sure that I’m not missing anything. I included it because it’s not a mathematical truth, and because adding “Are you sure? What am I missing, if you are?” is more polite than just saying “this is clearly false, we both know it, and any third party can verify the falsehood; why are you saying that? Is there some other platform we exchanged messages on that I somehow totally forgot about?”. It’s technically possible I’m just blind to something; I can imagine a conceivable universe where I’m wrong about this. But I’m confident she’s saying a straightforwardly false thing. I’d feel good about betting up to $20k at 99:1 odds on this.

Like, "missing something" isn’t a hypothesis with any probability mass, really, I’m including it because it is a part of my epistemic situation and seems nicer to include in the message.

“Asking to share messages publicly or showing them to a third party seems to unnecessarily up the stakes. I'm not sure why you're suggesting that”

When someone is confidently saying false things about the contents of the messages we exchanged, it seems reasonable to suggest publishing them or having a third party look at them. I’m not sure how it’s “upping the stakes”. It’s a natural thing to do.

Huh. There are false claims in your comment, and this is easily verifiable. I’m happy to share the messages that show this with anyone interested; please DM or email me. I saved the comment to the Web Archive.

“I told him that I made my decisions and didn't need any more of his input a few weeks before the 2/12 protest”

This is not true. She didn't tell me anything like that a few weeks before the protest.

“he usually did not accept my answers when I gave them but continued to argue with me, either straight up or by insisting I didn't really understand his argument or was contradicting myself somehow.”

This is something I couldn’t find any examples of before the protest, no matter how I interpret the messages we exchanged about her strategy, etc.

The context:

We last spoke for a significant amount of time in October[1]. You mentioned you'd be down to do an LW dialogue.

After that, in November, you messaged me on Twitter, asking for the details of the OpenAI situation. I then messaged you on Christmas (nothing EA/AI related).

On January 31 (a few weeks before the protest), I shared my concerns about the messaging being potentially misleading. You asked for advice on how you should go about anticipating misunderstandings like that. You said that you wouldn’t say anything technically false and asked what solutions I proposed. Among more specific things, I mentioned that “It seems generally good to try to be maximally honest and non-deceptive”.

At or before this point, you didn't tell me anything about "not needing more of my input". And you didn't tell me anything like "I made my decisions".

On February 4th, I attended your EAG talk, and on February 5th, I messaged you that it was great and that I was glad you gave it.

Then, on February 6, you invited me to join your Twitter space (about this protest), and I got promoted to a speaker. I didn’t get a chance to share my thoughts there about allying with people promoting locally invalid views, so I shared them publicly on Twitter and privately with you on Messenger, including a quote I was asked not to share publicly until an LW dialogue is up. We chatted a little bit about general goals and about allying with people with different views. You told me, “You have a lot of opinions and I’d be happy to see you organize your own protests”. I replied, “Sure, I might if I think it’s useful:)” and, after a message sharing my confusion regarding one of your previous messages, didn’t share any more “opinions”. This could be interpreted as indirectly telling me you didn’t need my input, but that was a week after I shared my concerns about the messaging being misleading, and you still didn’t tell me anything about having made your decisions.

A day before the protest, I shared a picture of a sign I wanted to bring to the protest and asked, “Hey, is it alright if I print and show up to the protest with something like this?”, because I wouldn’t want to attend the protest if you weren’t ok with the sign. You replied that it was ok, shared some thoughts, and told me that you liked the sentiment, as it supports the overall impression that you want, so I brought the sign to the protest.

After the protest, I shared the draft of this post with you. After a short conversation, you told me lots of things and blocked me, and I felt like I should expect retaliation if I published the post.

Please tell me if I'm missing something. If I'm not, you're either continuing to be inattentive to the facts, or you're directly lying.

I'd be happy to share all of our message exchanges with interested third parties, such as CEA Community Health, or share them publicly, if you agree to that.

  1. ^ Before that, as I can tell from Google Docs edits and comments, you definitely spent more than an hour on the https://moratorium.ai text back in August; I’m thankful for that; I accepted four of your edits and found some of your feedback helpful. We also had 4 calls, mostly about moratorium.ai and the technical problem (my impression was that you found these calls helpful), and went on a walk in Berkeley, mostly discussing strategy. Some of what you told me during the walk was concerning.
