How much do language models copy from their training data? Evaluating linguistic novelty in text generation using RAVEN


Tom McCoy, Paul Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz

This post accompanies the paper, which can be found here.

Introduction

Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore...

Supercopying

So that now, to still the beating of my heart, I stood repeating

Novel n-grams and syntactic structures

"Doubtless," said I, "what it utters is its only stock and store"

Morphology and syntax

Then, upon the velvet sinking, I betook myself to linking
Fancy unto fancy

Semantics

Much I marvelled this ungainly fowl to hear discourse so plainly,
Though its answer little meaning—little relevancy bore

Conclusion

And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamp-light o'er him streaming throws his shadow on the floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted—nevermore!