Tom McCoy, Erin Grant, Paul Smolensky, Tom Griffiths, Tal Linzen
This post accompanies the paper found here and the code found here.
Language acquisition involves a complex interplay between the data and the learner. The importance of data is clear: we can only learn a language if we have experience with it. Less obviously, acquisition is also guided by properties of the learner called inductive biases, which determine how the learner will generalize beyond the utterances they have encountered. As an example, consider a pattern along these lines (the specific pairs are illustrative):

kiki → .ki.ki.
nunu → .nu.nu.
tarava → ???

What output should replace the question marks? You probably answered .ta.ra.va., even though other answers are conceivable; e.g., the provided outputs are also consistent with a rule that reverses the order of the syllables, so the answer could instead be .va.ra.ta. If you answered .ta.ra.va. instead of .va.ra.ta., it suggests that you have an inductive bias for preserving the input order.
Which inductive biases enable humans to acquire language? The answer is hotly debated. To facilitate computational modeling aimed at answering this question, we introduce a framework for giving particular linguistic inductive biases to a model. Such a model can then be used to explore the effects of those inductive biases, and to see which biases yield the most human-like generalization behavior.
In our framework, the inductive biases are encoded in the initial state of a neural network. This initial state is found with meta-learning, a technique through which a model discovers how to acquire new languages more easily via exposure to many possible languages. By controlling the properties of the languages that are used during meta-learning, we can control the set of inductive biases that meta-learning imparts.
To demonstrate this framework, we use the linguistic domain of syllable structure. For a given language, the model must learn how to map a sequence of sounds (e.g., tarava) into a sequence of syllables (e.g., .ta.ra.va.). Each language has restrictions on what types of syllables are allowed, and sounds may need to be inserted or deleted to meet these restrictions. For example, if a language requires all syllables to end with a vowel, the input kep might map to .ke.pa. (if the language uses insertion) or .ke. (if the language uses deletion).
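To make the mapping concrete, here is a toy rule-based syllabifier, a sketch for illustration rather than the paper's neural model. It assumes a hypothetical language whose syllables must be V or CV, with 'a' as the inserted vowel; the `repair` argument chooses between insertion and deletion:

```python
VOWELS = set("aeiou")

def syllabify_cv(word, repair="insert"):
    """Toy syllabifier for a language whose syllables must be V or CV.
    A consonant with no following vowel violates the restriction and is
    repaired either by inserting the vowel 'a' or by deleting the consonant."""
    syllables = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS:                      # a lone vowel forms a .V. syllable
            syllables.append(word[i])
            i += 1
        elif i + 1 < len(word) and word[i + 1] in VOWELS:
            syllables.append(word[i:i + 2])        # consonant + vowel: a .CV. syllable
            i += 2
        else:                                      # stranded consonant: repair needed
            if repair == "insert":
                syllables.append(word[i] + "a")
            i += 1                                 # with repair == "delete", just skip it
    return "." + ".".join(syllables) + "." if syllables else ""

print(syllabify_cv("kep"))                   # .ke.pa.  (insertion)
print(syllabify_cv("kep", repair="delete"))  # .ke.     (deletion)
```

A neural learner faces a harder version of this problem: it must discover both the syllable restrictions and the repair strategy from example pairs alone.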
The demo below shows how a neural network initialized with meta-learning can learn syllable structure mappings much more quickly than a neural network with a standard random initialization. This demo trains the models in your browser using randomly-generated data, so none of the results that you see are cherry-picked.
The meta-initialized model can typically learn any of these syllable structure mappings from fewer than 100 examples, sometimes substantially fewer. In contrast, after the same number of examples, the outputs of the randomly-initialized model are usually nowhere near the correct answers.
In the class of languages that we include in our experiment, certain inputs will map to the same output across all languages; such mappings are therefore absolute universals, or properties that all languages possess. For example, an input of the form CVCV will always map to .CV.CV. (where C stands for any consonant, V stands for any vowel, and periods indicate word and syllable boundaries). Even though some languages allow syllables of the form .CVC. and .V., the input CVCV will only ever map to .CV.CV., never to .CVC.V. The meta-initialized model typically gets the correct output for such inputs without any training in the language, indicating that the meta-learning process has imparted these absolute universals. You can test these absolute universals by clicking "Show advanced settings" in the demo.
How many examples does the randomly-initialized model need before it begins performing well? The table below should give some sense of this; it contains data from one language for each of the 8 abstract categories of languages that we use in our experiments. On these languages, the randomly-initialized model typically needs a few thousand examples, compared to a few dozen for the meta-initialized model.
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
uttxr | .u. | pzrrrrrrrr | .u.xu. | .u. | .u. | .u. | .u. | .u. | .u. | .u. | .u. | .u. |
raxur | .ra.xu. | pzrrrrrrrr | .ra.xu. | .ra.xu. | .ra.xu. | .ra.xu.u. | .ra. | .ra.xu. | .ra.xu. | .ra.xu. | .ra.xu. | .ra.xu. |
Oar | .O.a. | pzrrrrrrrr | .O.ra. | .O.a. | .O.a. | .O.a. | .O.a. | .O.a. | .O.a. | .O.a. | .O.a. | .O.a. |
atrau | .a.ra.u. | pzrrrrrrrr | .a.ra. | .a.ra.u. | .a.ru.u. | .a.ra.u. | .a.ra. | .a.ra.u. | .a.ra.u. | .a.ra.u. | .a.ra.u. | .a.ra.u. |
tOuxa | .tO.u.xa. | pzrrrrrrrr | .tO.xo.x. | .tO.u.xa. | .tO.u.xa. | .tO.u.xa. | .tO.u.xa. | .tO.u.xa. | .tO.u.xa. | .tO.u.xa. | .tO.u.xa. | .tO.u.xa. |
xruto | .ru.tO. | prrrrrrrrr | .tu.xO. | .tu. | .ru.tO. | .ru.tO.O. | .ru.tO. | .ru.tO. | .ru.tO. | .ru.tO. | .ru.tO. | .ru.tO. |
xOxra | .xO.ra. | pzrrrrrrrr | .xO.xa. | .xO.xa. | .xO.ra. | .xO.ra. | .xO.ra. | .xO.ra. | .xO.ra. | .xO.ra. | .xO.ra. | .xO.ra. |
uuuuO | .u.u.u.u.O. | xhUjjjjjjj | .u.uu.O.. | .u.u.u.u. | .u.u.u.u.O. | .u.u.u.O.O. | .u.u.u.O. | .u.u.u.u. | .u.u.u.u.O. | .u.u.u.u.O. | .u.u.u.u.u. | .u.u.u.u.O. |
xt | <empty> | prrrrrrrrr | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> |
aata | .a.a.ta. | pai.iii.ii | .a.ta. | .a.a.ta. | .a.a.ta. | .a.a.ta. | .a.a.ta. | .a.a.ta. | .a.a.ta. | .a.a.ta. | .a.a.ta. | .a.a.ta. |
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
UUEm | .U.U.Em. | emmm....Ok | .U.mU.E.. | .U.E.m | .U.U.Em. | .U.U.Em. | .U.U.Em. | .U.U.Em. | .U.U.E.Em. | .U.U.Em. | .U.U.Em. | .U.U.Em. |
pppE | .pEp.pE. | emmm....Ok | .pE.pE.EE.. | .pEp.pE.. | .pEp.pE. | .pE..pE. | .pEp.pE. | .pEp.pE. | .pEp.pE. | .pEp.pE. | .pEp.pE. | .pEp.pE. |
UmE | .U.mE. | emmm....Ok | .U.mE.. | .U.mE. | .U.mE. | .U.mE. | .U.mE. | .U.mE. | .U.mE. | .U.mE. | .U.mE. | .U.mE. |
<empty> | <empty> | emmm....Ok | .EE. | .EE. | .EE. | .EE. | .xE. | .cE. | .xE. | .xE. | .xE. | .cE. |
xEpcU | .xEp.cU. | eNmmm....O | .xE....... | .xE..pE..U. | .xE..pU. | .xE..cc.UU. | .xEp.cU. | .xEp.cU. | .xEp.cU. | .xEp.cU. | .xEp.cU. | .xEp.cU. |
mExE | .mE.xE. | emmm....Ok | .mE.EE.E. | .mExxE.. | .mE.xE. | .mE.xE. | .mE.xE. | .mE.xE. | .mE.xE. | .mE.xE. | .mE.xE. | .mE.xE. |
UUUEE | .U.U.U.E.E. | emmm....Ok | .U.UU.E.E.. | .U.U.EE. | .U.U.U.E. | .U.U.U.E.E. | .U.U.U.E.E. | .U.U.U.E.E. | .U.U.U.E.E. | .U.U.U.E.E. | .U.U.U.E.E. | .U.U.U.E.E. |
Exppp | .E.xEp.pEp. | eNmmm....O | .E.xEppEpp. | .Ex.pEppEp. | .E.xEp.pEp. | .E.xEp.pEp. | .E.xEp.pEp. | .E.xEp.pEp. | .E.xEp.pEp. | .E.xEp.pEp. | .E.xEp.pEp. | .E.xEp.pEp. |
UpEUc | .U.pE.Uc. | emmm....O | .U..pE.E... | .U.pE.U.. | .U.pE.cc. | .U.pE.Uc. | .U.pE.Uc. | .U.pE.Uc. | .U.pE.Uc. | .U.pE.Uc. | .U.pE.Uc. | .U.pE.Uc. |
EUUpm | .E.U.U.pEm. | emmm....Ok | .U.E..E..E. | .E.U.pEp. | .E.U.U.pEm. | .E.U.U.pEm. | .E.U.U.pEm. | .E.U.U.pEm. | .E.U.U.pEm. | .E.U.U.pEm. | .E.U.U.pEm. | .E.U.U.pEm. |
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Aktt | .kAk.tUt. | U.kA.kkU.. | .kA.tUt. | .kAk.UUt. | .kAk.tUt. | .kAk.tUt. | .kAk.tUt. | .kAk.tUt. | .kAk.tUt. | .kAk.tUt. | .kAk.tUt. | .kAk.tUt. |
aukk | .ka.ku.kUk. | UU.kkU..k | .ka.kukk | .ka.ku.kUk. | .ka.ku.kUk. | .ka.ku.kUk. | .ka.ku.kUk. | .ka.ku.kUk. | .ka.ku.kUk. | .ka.ku.kUk. | .ka.ku.kUk. | .ka.ku.kUk. |
Uta | .kU.ta. | UU.kkU..kk | .kU.ka.. | .kU.ta. | .kU.ta. | .kU.ta. | .kU.ta. | .kU.ta. | .kU.ta. | .kU.ta. | .kU.ta. | .kU.ta. |
tak | .tak. | UU.kkU..kk | .tU.. | .ta.k | .tak. | .tak. | .tak. | .tak. | .tak. | .tak. | .tak. | .ta.kak. |
ttktA | .tU.tUk.tA. | nkaaaaaaaa | .tU.tUt. | .tU..Ut.kA. | .tU.tUt.tA. | .tU.tUk.tA. | .tU.tUk.tA. | .tU.tUk.tA. | .tU.tUk.tA. | .tU.tUk.tA. | .tU.tUk.tA. | .tU.tUk.tA. |
UtAUa | .kU.tA.kU.ka. | UU.kkU..kk | .kU.kU.ka.ka. | .kU.tA.kA.ka. | .kU.tA.kU.ka. | .kU.tA.kU.ka. | .kU.tA.kU.ka. | .kU.tA.kU.ka. | .kU.tA.kU.ka. | .kU.tA.kU.ka. | .kU.tA.kU.ka. | .kU.tA.kU.ka. |
kuUta | .ku.kU.ta. | U.kU.kkU.. | .kU.kU.ka.. | .ku.kU.tU. | .ku.kU.ta. | .ku.kU.ta. | .ku.kU.ta. | .ku.kU.ta. | .ku.kU.ta. | .ku.kU.ta. | .ku.kU.ta. | .ku.kU.ta. |
katta | .kat.ta. | U.kU.kkU.. | .ka.tat.. | .kat.ta. | .kat.ta. | .kat.ta. | .kat.ta. | .kat.ta. | .kat.ta. | .kat.ta. | .kat.ta. | .kat.ta. |
U | .kU. | UUkkkaaaaa | .kU. | .kU. | .kU. | .kU. | .kU. | .kU. | .kU. | .kU. | .kU. | |
kattu | .kat.tu. | U.kkU..kkU | .ka.tut | .kat.ku. | .kat.tu. | .kat.tu. | .kat.tu. | .kat.tu. | .kat.tu. | .kat.tu. | .kat.tu. | .kat.tu. |
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
btubo | .bU.tu.bo. | uttjutjutj | .bU.bo.fo. | .bU.tu.bo. | .bU.tu.bo. | .bU.tu.bo. | .bU.tu.bo. | .bU.tu.bo. | .bU.tu.bo. | .bU.tu.bo. | .bU.tu.bo. | .bU.tu.bo. |
uuutb | .fu.fu.fu.tU.bU. | tNujutjutj | .fu.fu.fu.bU.bU. | .fu.fu.tU.bU.bU. | .fu.fu.tU.bU.bU. | .fu.fu.fu.tU. | .fu.fu.fu.tU.bU. | .fu.fu.tU.bU.bU. | .fu.fu.tU.bU.bU. | .fu.fu.fu.tU.bU. | .fu.fu.fu.tU.bU. | .fu.fu.fu.tU.bU. |
fffbf | .fU.fU.fU.bU.fU. | jutjutjutj | .fU.fU.bU.bU. | .fU.fU.bU.fU. | .fU.fU.fU.bU.fU. | .fU.fU.fU.fU.fU. | .fU.fU.bU.fU.fU. | .fU.fU.bU.bU.fU. | .fU.fU.fU.bU.fU. | .fU.fU.fU.bU.fU. | .fU.fU.fU.bU.fU. | .fU.fU.fU.bU.fU. |
votuv | .vo.tu.vU. | uttjutjutj | .vo.vo.vU.vU. | .vo.tu.vU. | .vo.tu.vU. | .vo.tu.vU. | .vo.tu.vU. | .vo.tu.vU. | .vo.tu.vU. | .vo.tu.vU. | .vo.tu.vU. | .vo.tu.vU. |
uoUUf | .fu.fo.fU.fU.fU. | jutjutjutj | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU. | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU.fU. | .fu.fo.fU.fU.fU. |
vuufu | .vu.fu.fu. | tNujutjutj | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. | .vu.fu.fu. |
ubuu | .fu.bu.fu. | rZAbwjutju | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. | .fu.bu.fu. |
fvuof | .fU.vu.fo.fU. | jutjutjutj | .fU.fo.fU.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. | .fU.vu.fo.fU. |
vuofu | .vu.fo.fu. | tNujutjutj | .vu.fo.fu.fU. | .vu.fo.fu. | .vu.fo.fu. | .vu.fo.fu. | .vu.fo.fu. | .vu.fo.fu. | .vu.fo.fu. | .vu.fo.fu. | .vu.fo.fu. | .vu.fo.fu. |
ootUU | .fo.fo.tU.fU. | ffffffffff | .fo.fo.fU.tU.tU. | .fo.fo.tU.fU. | .fo.fo.tU.fU. | .fo.fo.tU.fU. | .fo.fo.tU.fU. | .fo.to.tU.fU. | .fo.to.tU.fU. | .fo.to.tU.fU. | .fo.to.tU.fU. | .fo.to.tU.fU. |
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Ebhhh | .Eh. | iibEuccEcc | .Eh. | .Eh.hh. | .Eh. | .Eh. | .Eh. | .Eh. | .Eh. | .Eh. | .Eh. | .Eh. |
AAr | .A.Ar. | nquccEccEc | .A.AA. | .A.A.Ar. | .A.Ar. | .A.Ar. | .A.Ar. | .A.Ar. | .A.Ar. | .A.Ar. | .A.Ar. | .A.Ar. |
EhrA | .Eh.rA. | bEuccEccEc | .Er.AA. | .Er.rA. | .Eh.hA. | .Er.rA. | .Eh.rA. | .Eh.rA. | .Eh.rA. | .Eh.rA. | .Eh.rA. | .Eh.rA. |
AhrhA | .Ar.hA. | xuccEccEcc | .A.AA. | .Ah.hA..A. | .Ah.hA. | .Ah.rA. | .Ah.hA. | .Ar.hA. | .Ar.hA. | .Ar.hA. | .Ar.hA. | .Ar.hA. |
AAhEE | .A.A.hE.E. | ujccEccEcc | .A.A.EE. | .A.A.rE.E. | .A.A.hE. | .A.A.hE.E. | .A.A.hA.E. | .A.A.hE.E. | .A.A.hE.E. | .A.A.hE.E. | .A.A.hE.E. | .A.A.hE.E. |
rhA | .hA. | xuccEccEcc | .hA. | .hA. | .hA. | .hA. | .hA. | .hA. | .hA. | .hA. | .hA. | .hA. |
EArrA | .E.Ar.rA. | xuccEccEcc | .E.AA.AA. | .E.Ar.rA. | .E.Ar.rA. | .E.Ar.rA. | .E.Ar.rA. | .E.Ar.rA. | .E.Ar.rA. | .E.Ar.rA. | .E.Ar.rA. | .E.Ar.rA. |
AEErE | .A.E.E.rE. | ibEuccEccE | .A.E.E.. | .A.E.rE.E. | .A.E.EE.E. | .A.E.E.rE. | .A.E.E.rE. | .A.E.E.rE. | .A.E.E.rE. | .A.E.E.rE. | .A.E.E.rE. | .A.E.E.rE. |
EEEE | .E.E.E.E. | ujccEccEcc | .E.E.E.E. | .E.E.E.E. | .E.E.E.E.. | .E.E.E.E. | .E.E.E.E. | .E.E.E.E. | .E.E.E.E. | .E.E.E.E. | .E.E.E.E. | .E.E.E.E. |
hrAA | .rA.A. | xxuccEccEc | .rA.A. | .rA.A. | .rA.A. | .rA.A. | .rA.A. | .rA.A. | .rA.A. | .rA.A. | .rA.A. | .rA.A. |
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
rUUUr | .rU.U.U.re. | fddddddddd | .rU.U.re. | .rU.U.U.re. | .rU.U.U.re. | .rU.U.U.re. | .rU.U.U.U.e. | .rU.U.U.re. | .rU.U.U.re. | .rU.U.U.re. | .rU.U.U.re. | .rU.U.U.re. |
UU | .U.U. | xcrfdddddd | .U.U. | .U.U. | .U.U. | .U. | .U.U. | .U. | .U. | .U. | .U. | .U. |
heee | .he.e.e. | xbxbkxbkxb | .he.e.e. | .he.e.e. | .he.e.e. | .he.e.e. | .he.e.e.e. | .he.e.e. | .he.e.e. | .he.e.e. | .he.e.e. | .he.e.e. |
eger | .e.ge.re. | ehcrfddddd | .e.ge.re. | .e.ge.re. | .e.ge.re. | .e.ge.re. | .e.ge.re. | .e.ge.re. | .e.ge.re. | .e.ge.re. | .e.ge.re. | .e.ge.re. |
heeUU | .he.e.U.U. | xSxcrfdddd | .he.e.e... | .he.e.U.U. | .he.e.U.U. | .he.e.U.U. | .he.e.U.U. | .he.e.U.U. | .he.e.U.U. | .he.e.U.U. | .he.e.U.U. | .he.e.U.U. |
heUh | .he.U.he. | xtxOkkxkkx | .he.h..e. | .he.U.he. | .he.U.he. | .he.U.he. | .hU.U.e.he. | .he.U.he. | .he.U.he. | .he.U.he. | .he.U.he. | .he.U.he. |
rrUU | .re.rU.U. | xtxOkkxkkx | .re.rU.UU. | .re.rU.U. | .re.rU.U. | .re.rU.U. | .re.rU.U.U. | .re.rU.U. | .re.rU.U. | .re.rU.U. | .re.rU.U. | .re.rU.U. |
gr | .ge.re. | fddddddddd | .ge.re. | .ge.re. | .ge.re. | .ge.re. | .ge.re. | .ge.re. | .ge.re. | .ge.re. | .ge.re. | .ge.re. |
grreg | .ge.re.re.ge. | fddddddddd | .ge.re.ge.ge. | .ge.re.re.ge. | .ge.re.re.ge. | .ge.re.ge.ge. | .ge.re.re.ge.ge. | .ge.re.re.ge. | .ge.re.re.ge.ge. | .ge.re.re.ge. | .ge.re.re.ge. | .ge.re.re.ge. |
UeerU | .U.e.e.rU. | xtxOkkxkkx | .U.U.rU.U. | .U.e.e.rU. | .U.e.e.rU. | .U.e.e.rU. | .U.e.e.rU. | .U.e.e.rU. | .U.e.e.rU. | .U.e.e.rU. | .U.e.e.rU. | .U.e.e.rU. |
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Uuru | .ru. | ssssssssss | .ru. | .ru. | .ru. | .ru. | .ru. | .ru. | .ru. | .ru. | .ru. | .ru. |
rUj | .rU. | jfffffffff | .rU. | .rU. | .rU. | .rU. | .rU. | .rU. | .rU. | .rU. | .rU. | .rU. |
udub | .du. | jfffffffff | .du. | .du. | .du. | .du. | .du. | .du. | .du. | .du. | .du. | .du. |
uUUUU | <empty> | kmffffffff | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> |
jUu | .ju. | kmffffffff | .ju. | .ju. | .ju. | .ju. | .ju. | .ju. | .ju. | .ju. | .ju. | .ju. |
uUuUu | <empty> | ssssssssss | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> |
bbudU | .bu.dU. | kmffffffff | .bu.dU. | .bu.dU. | .bu.dU. | .bu.dU. | .bu.dU. | .bu.dU. | .bu.dU. | .bu.dU. | .bu.dU. | .bu.dU. |
Uubjj | <empty> | jfffffffff | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> |
UU | <empty> | kmffffffff | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> |
juUb | .jU. | kmffffffff | .jU. | .jU. | .jU. | .jU. | .jU. | .jU. | .jU. | .jU. | .jU. | .jU. |
input | correct output | 0 iterations | 1,000 iterations | 2,000 iterations | 3,000 iterations | 4,000 iterations | 5,000 iterations | 6,000 iterations | 7,000 iterations | 8,000 iterations | 9,000 iterations | 10,000 iterations |
---|---|---|---|---|---|---|---|---|---|---|---|---|
fUxa | .fU.xa. | g.xwiiss.l | .fU.fU.. | .fU..a. | .fU.xa. | .fU.xa. | .fU.xa. | .fU.xa. | .fU.xa. | .fU.xa. | .fU.xa. | .fU.xa. |
ama | .ma. | ..is.isrg. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. |
xxx | <empty> | Oeg.xwiiss | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> | <empty> |
faxUU | .fa.xU. | eg.xwiisse | .fU..U. | .fU.fU. | .fa.xU. | .fa.xU. | .fa.xU. | .fa.xU. | .fa.xU. | .fa.xU. | .fa.xU. | .fa.xU. |
axUUU | .xU. | egk.xiiss. | .xU..UU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. |
axU | .xU. | eg.xwiisse | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. | .xU. |
Ufaaa | .fa. | ..xiiss.ri | .fa. | .fa. | .fa. | .fa. | .fa. | .fa. | .fa. | .fa. | .fa. | .fa. |
xdUd | .dUd. | Uk.isrg.xi | .dUd. | .dUd. | .dUd. | .dUd. | .dUd. | .dUd. | .dUd. | .dUd. | .dUd. | .dUd. |
UxfdU | .dU. | f.r.iiss.r | .dU. | .dU. | .dU. | .dU. | .dU. | .dU. | .dU. | .dU. | .dU. | .dU. |
mxma | .ma. | ..is.isrg. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. | .ma. |
The main goal of our framework is to give a model a certain set of inductive biases. While the ability to learn from a small number of examples is one piece of evidence that a model has acquired some useful inductive biases, a more direct way to study a learner's inductive biases is through generalization: how does the learner handle novel types of examples?
The figure below shows how meta-learning facilitates three types of generalization: generalization to a novel length, generalization to novel symbols, and generalization to novel input structures (we refer to examples of the last category as implicational universals). In each case, we train our model on a training set from which certain types of examples have been withheld, then evaluate it on the withheld class of examples. Click the tabs above the figure to explore these three types of generalization.
Input | Correct output | Meta-initialized model's output | Randomly-initialized model's output |
---|---|---|---|
jkk | .ja.ka.ka. | .ja.ka.ka. | .ja.ka.ka. |
iji | .ji.ji. | .ji.ji. | .ji.ji. |
iEEE | .ji.jE.jE.jE. | .ji.jE.jE.jE. | .ji.jE.jE.jE. |
Oij | .jO.ji.ja. | .jO.ji.ja. | .jO.ji.ja. |
xaOO | .xa.jO.jO. | .xa.jO.jO. | .xa.jO.jO. |
axEk | .ja.xE.ka. | .ja.xE.ka. | .ja.xE.ka. |
Input | Correct output | Meta-initialized model's output | Randomly-initialized model's output |
---|---|---|---|
jxaai | .ja.xa.ja.ji. | .ja.xa.ja.ji. | .ja.xa.ji. |
jjEjO | .ja.jE.jO. | .ja.jE.jO. | .ja.jE.jO. |
jOOja | .jO.jO.ja. | .jO.jO.ja. | .jO.jO.ja. |
jkkaa | .ja.ka.ka.ja. | .ja.ka.ka.ja. | .ja.ka.ja. |
Ejjaa | .jE.ja.ja.ja. | .jE.ja.ja.ja. | .jE.ja.ja. |
OEOja | .jO.jE.jO.ja. | .jO.jE.jO.ja. | .jO.jE.jO. |
Input | Correct output | Meta-initialized model's output | Randomly-initialized model's output |
---|---|---|---|
is | .i. | .i. | .i. |
Um | .U. | .U. | .U. |
iq | .i. | .i. | .i. |
or | .o. | .o. | .o. |
Er | .E. | .E. | .E. |
Ex | .E. | .E. | .E. |
Input | Correct output | Meta-initialized model's output | Randomly-initialized model's output |
---|---|---|---|
fir | .fi. | .fi. | .o. |
bit | .bi. | .bi. | .i. |
bac | .ba. | .ba. | .e. |
hoh | .ho. | .ho. | .o. |
vAk | .vA. | .vA. | .A. |
cOx | .cO. | .cO. | .O. |
Input | Correct output | Meta-initialized model's output | Randomly-initialized model's output |
---|---|---|---|
Ejpjj | .E.jEp.jEj. | .E.jEp.jEj. | .E.jEp.jEj. |
AjjjE | .A.jEj.jE. | .A.jEj.jE. | .A.jEj.jE. |
jxAxp | .jE.xA.xEp. | .jE.xA.xEp. | .jE.xA.xEp. |
EAjEA | .E.A.jE.A. | .E.A.jE.A. | .E.A.jE.A. |
Ej | .Ej. | .Ej. | .Ej. |
pEA | .pE.A. | .pE.A. | .pE.A. |
Input | Correct output | Meta-initialized model's output | Randomly-initialized model's output |
---|---|---|---|
eXr | .e.XEr. | .e.XEr. | .A.E.XE. |
Oenw | .O.e.nEw. | .O.e.nEw. | .A.A.cEc. |
UUAfi | .U.U.A.fi. | .U.U.A.fi. | .cE.cA.Ej. |
kdAri | .kE.dA.ri. | .kE.dA.ri. | .cE.Aj. |
eh | .eh. | .eh. | .A.A. |
tnsvU | .tE.nEs.vU. | .tE.nEs.vU. | .jE.xEj.jEc. |
Across all three cases, the meta-initialized model generalizes well while the randomly-initialized model generalizes poorly. These results indicate that meta-learning has successfully imparted a bias for the length-invariance of syllable patterns, a bias for assuming two universal classes of symbols (namely, consonants and vowels), and a bias for associating certain mappings with certain other mappings (a.k.a. implicational universals).
Linguistic universals are typically divided into two categories: absolute universals, which are properties that all languages possess, and implicational universals, which are cases where the presence of one property in a language implies the presence of another property. The previous section showed that meta-learning had instilled some absolute universals in the model, while this section shows that it has also instilled some implicational universals. Thus, meta-learning can impart both of the major categories of linguistic universals.
See the paper for more discussion of these experiments and for an investigation of three other inductive biases using an alternate technique, namely ease of learning (that is, which types of languages are easiest for a model to learn?).
Our method can be summarized as follows:
The core component of our method is meta-learning; the particular type of meta-learning that we use is Model-Agnostic Meta-Learning (MAML). In MAML, a model is exposed to a variety of tasks, each of which comes with a limited amount of data. After exposure to each task, the model's weights are adjusted so that, if it were taught the same task again, it would perform better. As MAML proceeds, the model converges to an initial state from which it can readily learn any task in the distribution of tasks it has been shown. The demo below gives an example of MAML.
If meta-learning is successful, the model's initialization will encode some set of inductive biases that are useful for learning the space of tasks that the model is being shown. In the demo, the initialization that the model converges to encodes a bias for the slope to be close to -3 and for the intercept to be close to -2.
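To make this concrete, here is a minimal sketch of meta-learning on the demo's toy problem, written in NumPy with a first-order approximation to MAML (full MAML also differentiates through the inner loop). The task distribution, learning rates, and step counts are illustrative assumptions, not the demo's exact settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # Each task is a line y = a*x + b; in this (assumed) task distribution,
    # slopes cluster near -3 and intercepts near -2, as in the demo.
    a = -3 + 0.1 * rng.standard_normal()
    b = -2 + 0.1 * rng.standard_normal()
    return a, b

def sample_data(a, b, n=10):
    x = rng.uniform(-1, 1, size=n)
    return x, a * x + b

def grad(theta, x, y):
    # Gradient of mean squared error for the model y_hat = theta[0]*x + theta[1].
    err = theta[0] * x + theta[1] - y
    return np.array([np.mean(2 * err * x), np.mean(2 * err)])

theta = np.zeros(2)              # the initialization that meta-learning will shape
inner_lr, outer_lr = 0.1, 0.05

for step in range(5000):
    a, b = sample_task()
    x_sup, y_sup = sample_data(a, b)
    phi = theta.copy()
    for _ in range(5):           # inner loop: adapt to the current task
        phi -= inner_lr * grad(phi, x_sup, y_sup)
    x_qry, y_qry = sample_data(a, b)
    theta -= outer_lr * grad(phi, x_qry, y_qry)   # first-order outer update

print(theta)   # converges near (-3, -2): the bias is baked into the initialization
```

After meta-training, a step or two of gradient descent from theta fits a new line from this distribution almost immediately, whereas a random initialization would need many more steps.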
In this simple example, it would have been easy to set the model's parameters by hand in a way that encoded this bias. However, such hand-coding of biases is not practical with standard neural networks, which typically have far more parameters (often tens of millions, or even 175 billion) that interact with each other in difficult-to-interpret ways. In such cases, where manual parameter setting is impractical, meta-learning can automatically discover parameter settings that encode the desired biases.
In our application of meta-learning, each “task” is a language, and the goal is to find a set of initial parameter values that allow the model to quickly learn any language. We hypothesize that, if we carefully construct the space of languages used during meta-learning to encode a desired set of inductive biases, we can impart these inductive biases in order to study how they affect generalization behavior.
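As a deliberately tiny stand-in for the paper's sequence-to-sequence setup (not its actual code), suppose each "language" differs only in which vowel it inserts, and suppose that most languages in the meta-training distribution insert 'a'. Applying the same first-order scheme as above, the meta-learned initial state ends up biased toward 'a' before seeing any data from a new language:

```python
import numpy as np

rng = np.random.default_rng(1)
VOWELS = "aeiou"

def sample_language():
    # Hypothetical language distribution: 60% of languages insert 'a'.
    return rng.choice(5, p=[0.6, 0.1, 0.1, 0.1, 0.1])

def xent_grad(logits, target):
    # Gradient of cross-entropy loss for a categorical model over the vowels.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    p[target] -= 1.0
    return p

theta = np.zeros(5)                  # meta-learned logits: the model's "initial state"
inner_lr, outer_lr = 0.5, 0.1

for _ in range(3000):
    v = sample_language()            # which vowel this language inserts
    phi = theta.copy()
    for _ in range(3):               # inner loop: learn the current language
        phi -= inner_lr * xent_grad(phi, v)
    theta -= outer_lr * xent_grad(phi, v)   # first-order outer update

p = np.exp(theta - theta.max())
p /= p.sum()
print(dict(zip(VOWELS, p.round(2))))  # 'a' receives the most prior probability
```

Skewing the language distribution is exactly the control knob: changing which properties are common during meta-learning changes the inductive biases that the initialization encodes.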
We have shown how meta-learning can impart inductive biases specified by the modeler. While the meta-learned biases are not as transparent as those encoded in probabilistic symbolic models, analyzing the model's learning behavior makes it possible to evaluate whether meta-learning has produced the desired biases. Our case study demonstrates that linguistic inductive biases that have previously been framed in symbolic terms can be reformulated in the context of neural networks, facilitating cognitive modeling that combines the power of neural networks with the controlled inductive biases of symbolic methods.
In future work, we plan to use this approach to study open questions in cognitive science. For example, by creating a model that has a particular bias which has been argued to be important for human cognition, we can empirically test whether that bias has the explanatory power it has been claimed to have. Alternatively, while our case study involved giving a model certain pre-selected biases, the approach could instead be applied to naturally-occurring linguistic data to lend insight into the inductive biases that shaped this data. Finally, this approach might have applications in artificial intelligence: by giving targeted, cognitively-motivated learning biases to models, we may be able to decrease the gap between the learning capabilities of AI models and those of humans.
See the full paper for more details. If you have questions or comments, email tom.mccoy@jhu.edu.
*****
Some fine print about this website: The demos differ from the experiments in the paper in a few ways designed to decrease the computational load of the website. First, the models on the website are GRUs, while in the paper we used LSTMs. Second, training in the demos uses a batch size of 1 with stochastic gradient descent, while the paper used a batch size of 10 with the Adam optimizer. Thus, the performance of the demo models is not guaranteed to match the performance of the models used in the paper. Finally, on this page we have excluded any examples containing lowercase L or capital i, because the similarity of these letters sometimes makes examples hard to read; both letters were available in the experiments in the paper.