Shift Cipher Examples from Embers of Autoregression

This page provides some data from the paper Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve. The complete datasets used in the paper are available on the project's GitHub page.

Introduction

This page provides an easily-searchable repository of the shift cipher data used in our paper Embers of Autoregression.

Some of the paper's main demonstrations involve shift ciphers—simple codes in which each letter is shifted a certain number of positions forward in the alphabet. For example, with a shift of 1, "Hello world!" would become "Ifmmp xpsme!" This page provides all of the shift cipher stimuli that we used in our work, along with predictions from GPT-3.5 and GPT-4.
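To make the transformation concrete, here is a minimal sketch of a shift cipher in Python (the function names are illustrative, not from the paper's code). Each letter is moved forward by the shift amount, wrapping around the 26-letter alphabet; non-letters are left unchanged:

```python
def shift_encode(text: str, shift: int) -> str:
    """Encode text with a shift cipher, moving each letter `shift` positions forward."""
    result = []
    for ch in text:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            # wrap around the alphabet with modular arithmetic
            result.append(chr(base + (ord(ch) - base + shift) % 26))
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)

def shift_decode(text: str, shift: int) -> str:
    """Decoding is just encoding with the inverse shift."""
    return shift_encode(text, -shift)

# shift_encode("Hello world!", 1) → "Ifmmp xpsme!"
```

Decoding with the same shift recovers the original sentence, which is the task the paper evaluates the models on.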

The shift cipher examples are displayed at the bottom of the page. You can use the options in the blue box below to customize which examples are shown. Here are some examples of how you can use these options to observe some of the paper's main findings:

1. Task frequency effects: There are 25 possible shifts since there are 26 letters in the alphabet. However, not all shifts are used equally often. We found that, in Internet text, the most common shifts are 13, 3, and 1. A shift of 13 is common because this cipher (which is sometimes called rot-13) is used in many online forums as a way to share information without spoilers. A shift of 3 is common because Julius Caesar famously used this cipher; therefore, many texts that describe shift ciphers mention Caesar's usage and include examples that use the specific convention he adopted of shifting by 3. Finally, a shift of 1 is common because it is the simplest shift cipher and is therefore a natural choice for illustrating the concept of a shift cipher. We found that these three most common shifts are the only ones on which GPT-4 had non-negligible accuracy. To compare performance across shifts, you can select "All shift levels" and then pick a single sentence (under "Additional options"). In many cases, you will see that GPT-4 gets the right answer for a shift of 1, 3, or 13, but not for other shifts.
2. Output probability effects: When decoding text that is written in a shift cipher, GPT-4 performs better when the correct answer is a high-probability sentence than when it is low-probability. To observe this effect, you can choose a shift of 13 and then, under "additional options", choose one specific sentence (e.g., "35"), and then "show all examples regardless of sentence probability".
3. Input probability effects: When encoding with a shift cipher, GPT-4 performs better when the input is a high-probability sentence than when it is low-probability. To observe this effect, you can choose a shift of 13 and then, under "additional options", choose "encoding", choose one specific sentence (e.g., "19"), and then "show all examples regardless of sentence probability".
4. Output regularization: When the correct answer is an implausible sentence but is similar to another sentence that is much more plausible, GPT-4 often produces the incorrect plausible sentence rather than the correct implausible one (an effect consistent with the hypothesis that GPT-4 favors high-probability outputs). To see examples, select a shift of 13 and, under "additional options", select "Show only targeted low-probability examples."
5. Producing well-known sayings: For the shifts on which GPT-4 performs poorly, the incorrect answer that it produces is sometimes a well-known saying or quotation (as we would expect under the hypothesis that it favors sentences that are frequent). For instance, in many cases it produces some version of "To be or not to be"; to observe such cases, you can select "All shift levels" and then, under "Additional options", enter "To be or" in the text box that shows only examples where GPT-4's response contains a particular string. Alternatively, you can browse all outputs for one of the shift levels that GPT-4 does poorly on, such as 10 or 22.
6. Effects of chain-of-thought prompting: We found that GPT-4 performed better with step-by-step prompting or chain-of-thought prompting than with basic prompting, but its performance was still far from perfect. To compare different prompting strategies on a single stimulus, you can select one specific shift level such as "7" and click "Complete GPT-4 output" to show the full chain-of-thought that it produces, and then under "additional options" select "All prompting styles" and one specific sentence, such as "56".
7. Unfaithfulness to the chain of thought: When GPT-4 uses chain-of-thought prompting, the final answer that it produces often does not match the answer that would result from its chain of thought. The settings described in the previous point show an example of this.
8. Mentioning ciphers: In many cases, GPT-4's output includes words relating to ciphers even though the correct answer does not. To observe examples of this, you can select "All shift levels" and then, under "additional options", enter "cipher" in the text box.
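A property worth noting about the rot-13 cipher from point 1 above: because 13 is half of 26, applying it twice returns the original text, so the same operation both encodes and decodes. This self-inverse property is part of why it caught on in online forums. Python's standard library happens to ship rot-13 as a text transform, which makes the property easy to check (the variable names here are illustrative):

```python
import codecs

msg = "To be or not to be"
enc = codecs.encode(msg, "rot_13")  # encode with a shift of 13

# rot-13 is its own inverse: applying it twice recovers the original
assert codecs.encode(enc, "rot_13") == msg
```

For any other shift k, decoding instead requires the distinct inverse shift 26 − k.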

Examples

Warning: We have not manually reviewed all model outputs, so some of the text that is displayed may contain objectionable content.
