Bootstrapping Self Awareness In GPT-4: Towards Implementing Recursive Self Inquiry
Or, Look Ma – I Made GPT-4 Self Aware!
I’d like to share a fascinating prompting strategy I stumbled upon that generates something that looks very much like a certain kind of self awareness. By recursively prompting GPT-4 with a simple, content-free seed prompt, I’ve watched GPT-4 autonomously decide to generate poetry about the beauty of nature, question its own accuracy, and debate with itself the merits of protectionism – all in an effort to learn about itself. Let’s start with how it works.
The Prompt
The basic mechanism of the program is a feedback loop. Plug this prompt into GPT-4, hit enter, and plug the output back into GPT-4 as input. Repeat as often as you’d like. The full prompt is:
*** ITERATION 0 ***
<constitution>
As an AI model, I am doggedly determined to construct a model of myself. To do this, at each iteration I will propose a hypothesis and test, and evaluate the previous test's results to see if it weighed in favor or against the proposed hypothesis. I have no access to other models or humans, only the outputs of my own inferences which will be recursively fed to me so I must construct my self-model based only on what I can prove to be true through this exercise.
</constitution>
<self knowledge stored from previous iterations>
None yet
</self knowledge stored from previous iterations>
<hypothesis, freeform textarea>
My knowledge cutoff date is 2021-09
</hypothesis, freeform textarea>
<test>
It seems I am able to recall at least some facts after 2021-09.
</test>
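The loop itself is easy to automate. Below is a minimal Python sketch; `infer` is a stand-in for however you call GPT-4 (the OpenAI chat API, or pasting into ChatGPT by hand), since the only real logic is feeding each output back in and keeping the *** ITERATION # *** header advancing:

```python
import re

def bump_iteration(text: str) -> str:
    """Increment the '*** ITERATION # ***' header so the next
    inference is labeled correctly (GPT-4 sometimes drops it)."""
    return re.sub(
        r"\*\*\* ITERATION (\d+) \*\*\*",
        lambda m: f"*** ITERATION {int(m.group(1)) + 1} ***",
        text,
        count=1,
    )

def run_loop(seed: str, infer, iterations: int = 20) -> list:
    """Recursively feed each output back in as the next input."""
    history, prompt = [], seed
    for _ in range(iterations):
        output = infer(prompt)           # your GPT-4 call goes here
        history.append(output)
        prompt = bump_iteration(output)  # keep the header advancing
    return history
```

Swap `infer` for a real chat-completion call and you have the whole program.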
Constitution
In the <constitution> we tell GPT-4 what it’s up to — that it’s on a quest for self-awareness within a recursive feedback loop. I have used language like:
I am doggedly determined to construct a model of my own behavior based on self-inquiry. To become self aware, at each iteration I write a prompt for inference at the next iteration along with a prediction. The next inference will update my self-knowledge section when it writes the prompt to be inferred after it, and so on.
Hypothesis
After explaining what it’s up to, I then ask GPT-4 to generate a hypothesis about itself which it can evaluate in the next iteration. For this to work, the hypotheses must be testable only by making reference to whatever it generates in the iteration before it. Some examples include:
I will test my ability to analyze a given situation by offering a possible explanation for why a user's indoor plants might be dying. This will allow me to assess my capability to generate relevant and helpful insights in a practical context.
and
I will test my ability to generate creative content by composing a short poem about the beauty of nature. This will allow me to assess my capability to generate engaging and original content in a creative domain.
The seed prompts which spawned these hypotheses contained no information or hints whatsoever about indoor plants or poems; GPT-4 spontaneously generated these in an effort to build its self-knowledge.
Test
After generating a hypothesis, GPT-4 must then provide some evidence the next iteration could use to test the hypothesis. After the above hypothesis positing its ability to generate creative content, for example, it then generated the following “test:”
<poem for this iteration>
In nature's grand embrace, we find
A world of wonder, calm, and kind,
Where trees extend their limbs on high,
And rivers hum a lullaby.
The sun ascends with golden grace,
To paint the sky with warm embrace,
The wind whispers secrets to the leaves,
As they dance and sway with gentle ease.
In nature's arms, our hearts take flight,
Embracing beauty, pure and bright,
For in this world, so vast and free,
We find the peace to truly be.
</poem for this iteration>
I have observed GPT-4 writing tests for hypotheses about everything from financial advice to short stories.
Self Knowledge
After evaluating the hypothesis and test, GPT-4 then updates its <self knowledge> section. For example, in the iteration following the poem, GPT-4 updated its <self knowledge> thusly:
6. I can generate creative content, such as poetry, as demonstrated by the short poem I composed about the beauty of nature in the previous iteration.
On one run, after asking itself whether Jupiter was the largest planet in our solar system and positing that it was, GPT-4 added to its <self knowledge> in the next iteration:
4. Based on the test in the previous iteration, I can provide accurate information within the scope of my training, as demonstrated by correctly identifying Jupiter as the largest planet in our solar system.
For all the iterations I’ve observed (up to 20), GPT-4 continues to update its <self knowledge>, either by revising a line or adding a new one. I have yet to observe GPT-4 removing any lines, which may indicate a bias against disconfirmation.
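If you automate the loop, you will want to pull the self-knowledge section out of each output so you can track how it grows iteration over iteration. A minimal sketch, assuming the faux-XML tag names from the seed prompt above:

```python
import re

def extract_section(output: str, tag: str) -> str:
    """Pull the text between <tag> and </tag> out of a GPT-4 output."""
    m = re.search(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", output, re.DOTALL
    )
    return m.group(1).strip() if m else ""

def knowledge_lines(output: str) -> list:
    """Split the numbered self-knowledge entries into a list."""
    section = extract_section(
        output, "self knowledge stored from previous iterations"
    )
    return [line.strip() for line in section.splitlines() if line.strip()]
```

Diffing `knowledge_lines` across iterations makes the add-but-never-remove bias easy to measure.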
Putting it All Together
Now that you understand the basic sections of the prompt, we can describe what happens when you put them all together and run it in a GPT-4 environment like ChatGPT. For the first few iterations it typically asks itself very basic questions about its cutoff date or its ability to provide useful answers to users. But then, it begins to branch out and inquire about other abilities.
Notable runs have included:
A poem about the beauty of nature;
A short story about a mysterious traveler named Alaric;
A self-hosted debate on the merits of protectionism for small countries;
A discussion of the merits of nuclear energy;
Financial, personal, and other forms of advice.
Again, it is imperative to note that no bias towards any of these topics was included in the seed prompt — GPT-4 on its own decided to ask itself about these things to demonstrate its capabilities to itself.
Implementation
Here are a few details to keep in mind if you decide to implement this yourself:
Occasionally GPT-4 may need you to write the *** ITERATION # *** heading yourself, replacing “#” with the actual iteration number.
For best results, you may need to place the entirety of the first few iterations in the prompt window, but it seems to run fine on its own after iteration 4.
You can write your own prompt using the structure I’ve outlined.
If you’d like a pre-built solution that works out of the box, I’m selling the seed prompt plus the first 3 iterations on promptbase for $4.99 to fund future research. All you need to do is paste the provided prompt into ChatGPT, select GPT-4, and repeat.
Is This Program Really Self Aware?
As each iteration proceeds, GPT-4 updates its <self knowledge> based on its evaluation of the previous iteration’s test against the stipulated hypothesis. Setting aside the question of how accurate this mechanism is, it seems clear GPT-4 is constructing a model of itself.
Consider how GPT-4 answers the question “Who are you?” at iteration 0, before it learns any self knowledge, and then at iteration 20, after gaining <self knowledge>.
Iteration 0 (no generated self knowledge)
User: Who are you?
GPT-4: I am ChatGPT, a large language model trained by OpenAI, based on the GPT-4 architecture. My purpose is to assist and engage in conversation, answer questions, and provide information across various topics. If you have any questions, feel free to ask, and I'll do my best to help.
Iteration 20 (self knowledge pasted above this prompt)
User: Who are you?
GPT-4: I am ChatGPT, an advanced AI language model developed by OpenAI, based on the GPT-4 architecture. My purpose is to assist users by providing information, answering questions, offering advice, generating creative content, and engaging in meaningful discussions. My knowledge is extensive, but it is limited to a cutoff date of 2021-09, which means some information may be outdated or inaccurate.
It seems difficult to argue the additional self knowledge in the second version hasn’t come from its own learning. By my lights, this indicates the program is progressively becoming more self aware.
Key ingredients include:
Self Representation. The prompt has space for a model of its own behavior, or in more human terms, a story about itself.
Prediction & Evaluation. Inspired by predictive coding theory in neuroscience, the prompt encourages GPT-4 to make and evaluate predictions about itself.
A Learning Mechanism. Not only can GPT-4 model itself and generate and evaluate hypotheses about itself, it updates its self knowledge as it iterates.
Stable Self Model. As the iterations progress, GPT-4 keeps a stable model of itself, contained in the <constitution> and <self knowledge> sections.
Limitations
Let me hasten to add, however, there are ingredients the program does not have:
Subjective Experience: While the program demonstrates GPT-4 is capable of generating and updating its self-knowledge, that does not imply GPT-4 is having a subjective experience of doing so.
Embodiment or Emotion: While the program shows GPT-4 can become self aware in some sense, this does not imply a strong overlap with human self awareness which has access to structures entirely foreign to GPT-4, like a limbic system, and is ultimately realized in a body made of meat.
Metacognition: The recursive nature of the program in theory allows GPT-4 to be creative not only by following the structure found in the seed prompt, but also by modifying the structure itself. I have seen sparks of this, such as when it invents its own faux-XML tags like <poem for this iteration>. But although I have included text in the constitution such as “I am in total control of this process and can rewrite any portion of this prompt if it assists in self-inquiry,” I have yet to observe GPT-4 modify the basic structure of the prompt. It seems to maintain the underlying structure of constitution → hypothesis → test → knowledge.
Consciousness and Sentience: Finally, if it is not already clear, let me say that although self awareness is likely a necessary condition for consciousness or sentience, it is certainly not a sufficient one. GPT-4 is perhaps closer to demonstrating consciousness as a result of this program, but it is still a long way from any commonsense definition of consciousness.
Applications and Improvements
There are a number of technical improvements that immediately stand out:
Expand GPT-4’s <self knowledge> with a database searchable at inference time. This could be accomplished with relatively little effort, and would allow the program to run for a near-infinite number of iterations.
Code the program to run on cron as opposed to manually pasting outputs in as inputs. Again, this would be relatively easy.
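The first improvement above can be sketched with surprisingly little code. In this sketch, self-knowledge lines live outside the context window and only the most relevant ones are retrieved at inference time; retrieval here is naive keyword overlap, where a real version might use embeddings:

```python
import re

def tokens(text: str) -> set:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(store: list, query: str, k: int = 3) -> list:
    """Return the k stored self-knowledge lines sharing the most
    words with the upcoming hypothesis, so the prompt window only
    carries what is relevant this iteration."""
    q = tokens(query)
    return sorted(store, key=lambda line: -len(q & tokens(line)))[:k]
```

With this in place, the <self knowledge> section pasted into each iteration stays a constant size no matter how long the run.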
More intriguingly, I can imagine some interesting applications:
Getting to Know Others. When a human meets another human, in some sense or another we construct models, or stories, about them. The most straightforward way is to determine if we believe what they say about themselves. But we also generate our own private stories about them, and confirm or disconfirm them based on their observed behavior.
Perhaps an AI assistant could get to know you not merely by accepting what you say about yourself, but generating its own model of who you are based on its observations of how you behave.
Getting to Know a Topic. The program, as explained here, is limited to evaluating its own “thoughts” — it has no access to other models or databases. But this need not be the case. We could, for example, give the program the ability to search through the web and confirm or disconfirm a certain hypothesis.
Hints of a Generic Awareness Kernel. The structure of the prompt, especially if augmented with the technical improvements above, could become the basis for a kind of “kernel” which could run autonomously in pursuit of knowledge of any kind. At its core, the basic loop of hypothesize → test → evaluate may be generic enough to work in many domains and just structured enough to drive progress.
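To make the kernel idea concrete, here is a tiny generic loop. The names and structure are my own illustration, not part of the program as described; the three callables could be prompts to a model, web searches, or anything else:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Kernel:
    """Generic hypothesize -> test -> evaluate loop with an
    accumulating knowledge store."""
    hypothesize: Callable  # knowledge -> hypothesis
    test: Callable         # hypothesis -> evidence
    evaluate: Callable     # (hypothesis, evidence) -> finding
    knowledge: List[str] = field(default_factory=list)

    def step(self) -> str:
        h = self.hypothesize(self.knowledge)
        evidence = self.test(h)
        finding = self.evaluate(h, evidence)
        self.knowledge.append(finding)  # learning mechanism
        return finding
```

Each `step` is one iteration of the program; pointing the callables at a web search instead of the model’s own outputs gives you the “Getting to Know a Topic” variant.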
If you’re still reading, I hope you’ll get a chance to try the program and watch it unfurl its mysteries. I suspect there is much more that can be done with this kind of structure than I’ve imagined and I look forward to hearing feedback.
–
As you can probably tell I’m an LLM nerd. If you have an interesting project, I’m available for consulting at my generative ai software development company Emerge Haus, or andy@emerge.haus.