How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN

Tom McCoy, Paul Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz

This post accompanies the paper found here.


Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore...


So that now, to still the beating of my heart, I stood repeating

Novel n-grams and syntactic structures

"Doubtless," said I, "what it utters is its only stock and store"

Morphology and syntax

Then, upon the velvet sinking, I betook myself to linking
Fancy unto fancy


Much I marvelled this ungainly fowl to hear discourse so plainly,
Though its answer little meaning—little relevancy bore


And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamp-light o'er him streaming throws his shadow on the floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted—nevermore!