4. Normalisation in Menota

In the handbook we recommend that lemmatisation follow the norm of ONP in Copenhagen.

There are, however, a number of practical questions that need to be settled. Andrea de Leeuw van Weenen (Leiden) has written a memo that we will use as the basis for the discussion:


In the handbook one is advised to use the ONP normalisation. This is in itself a great step forward, as the diversity between the various grammars, handbooks and dictionaries is rather greater than one would expect. But it is still not as clear-cut as one might wish. In the first place, ONP has changed the spelling of several headwords since they started publishing. For example, the first volumes have <pt>, whereas the wordlist now uses <ft>, except where there is a direct relation to forms with /p/. And I suppose that such changes will keep coming. In other respects, too, the wordlist on the internet does not always agree with the printed volumes. So one should probably specify "according to the ONP wordlist". This wordlist, however, is not yet in its final state and still has many inconsistencies, as I realized when comparing "my" lemmata for the Homily Book and Möðruvallabók with the lemmata in the wordlist.

I think that it is important that the files deposited at the Menota site should use the same normalisation. Otherwise linguistic research will be far more difficult. As the ONP normalisation is mainly that of the early 13th century (apart from the decision to have short vowels before l+fgkm), this will put many texts in rather old-fashioned dress, but for the sake of comparison this is a good thing.

Quite apart from the remaining errors and inconsistencies, there is the question of the multiple headwords. These are now found only in the part already published. For the words not yet in a printed volume only the first variant is listed. Accordingly I have as a rule opted for the first variant, but what to do when a single headword combines two spellings by bracketing an s?

Take ágang(s)samr. Do I take it up with s or with ss? For the use of the Menota texts it will be important that all texts use the same lemma. For if one is interested in a certain word, it is often not so easy to remember all the different ways to spell the lemma. Searching in the ONP wordlist, I have not a few times entered a search word and got no result. Usually I have some idea where the word may be hiding, but there have been a few times when I had to ask the ONP staff where they had hidden the word. So in order to avoid that, we all have to take the same decisions, or at least try to.
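The search problem with bracketed headwords can be illustrated with a minimal sketch. It assumes that an optional letter is always written in round brackets, as in ágang(s)samr. The function name expand_headword is hypothetical and is not part of any ONP or Menota tool. It simply lists every spelling a bracketed headword stands for, so that a search could match either one.

```python
import re
from itertools import product


def expand_headword(headword: str) -> list[str]:
    """List all spellings covered by a headword with bracketed optional parts.

    Hypothetical helper: assumes optional segments are written as '(x)'.
    """
    # With a capturing group, re.split alternates literal and optional parts:
    # 'ágang(s)samr' -> ['ágang', 's', 'samr']
    parts = re.split(r"\(([^()]*)\)", headword)
    literals = parts[0::2]
    optionals = parts[1::2]

    spellings = []
    # Try every combination of leaving out or keeping each optional segment.
    for choice in product((False, True), repeat=len(optionals)):
        pieces = [literals[0]]
        for keep, optional, literal in zip(choice, optionals, literals[1:]):
            if keep:
                pieces.append(optional)
            pieces.append(literal)
        spellings.append("".join(pieces))
    return spellings


variants = expand_headword("ágang(s)samr")
```

Here variants holds both the spelling with s and the one with ss. Which of the two should serve as the shared lemma remains a decision to be taken, not something the sketch can settle.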

I have also normalised all feminines in -an/-un as -un, and all masculine u-stems in -aðr/-uðr as -uðr. That is, I go for the oldest form, which seems to be ONP's rule but has not yet been carried out consistently. In a number of cases a lemma is printed (in normal, not bold) with a reference to another lemma. Here I find it more difficult to decide. Sometimes we are dealing with words with the same meaning, sometimes with a word group that is treated under one of its elements, sometimes with a form that has an independent function. Should I lemmatise aðalfestr as alaðsfestr? I think not. Nor should alls adv be lemmatised as allr. Nor annarrhvárr as annarr, or annarsstaðar as staðr. But what about auðkvisi? Normalise as aukvisi?

Probably the Menota handbook will tell me how to handle pluralia tantum, but I don't have it at hand at the moment. So I include the point here in order not to forget it myself.

ONP sticks to the older forms with -kð rather than -kt in words like sekð. I presume it follows that preterite and participle forms should also be normalised with kð (sekðan, not sektan).

Then there are innerparadigmatic variants: In verbs the 1st person plural is sometimes without -m. Should the -m be added? I have done so, because I intend the normalized text for students, who in my experience have trouble enough without such niceties. The 2nd person plural is sometimes without its final -ð. This I have added as well.

I have also normalised eigið to eiguð (in the indicative) and 3rd person plural indicative eiga to eigu, and correspondingly for the other preterite-presents.

I am still undecided about contraction: búm or búum, but feel that if we go for early OIc it should be the contracted form. I have normalised dat. pl. bøndum to bóndum (and gen. pl. bønd to bónda). I have normalised eðr to eða. This mostly for the benefit of students. But what to do with the participle efnt/efnat? or with -ligast forms next to -igst forms? or with aðila vs aðiljar?

I also tend towards the article enn, rather than inn, as the older form. I have normalised dat. sg. ey to eyju, ǫnd to ǫndu, and I think I should do the same with the various words in -ing (a real mess, those). And I write the underlying margt rather than mart, vatns rather than vats or vaz.

Should fanginn be replaced by fenginn? Gylltr by gylldr? Mun by man?

And what to do with nokkurr? The oldest form is nekkverr, but even to me it feels a bit over the top to use that as the norm. But I don't believe in nǫkkurr: that would have to come from the nakkverr/nakkvarr group, which, if I go by the Homily Book, is rather rare. It will rather have been nekkverr > nøkkurr, which then after the merger of ǫ and ø should be written nökkurr, and later became nokkurr (but when this change took place is far from clear to me). Nøkkurr will look strange to most users; nökkurr is out, as ö is not part of the character set used by ONP. For the moment I have used the modern form nokkurr.

I have normalised skyldu to skuldu. The whole situation with the vowels in the past tense of skulu and munu is a big mess. I would be in favour of normalising with u in the indicative and with y in the subjunctive. The problem is that a form like myndi in the ms can be both indicative and subjunctive. So in many cases I cannot decide. Germans tend to follow their own rules in these cases, but I am far from sure that it worked the same way in Old Norse and think that we need a far larger corpus before we can set rules for this.

Tigr/tøgr is another problem word. In fact it needs normalising in all cases. So if we stick to the oldest stage:

       sg.     pl.
nom    tøgr    tigir
gen    tegar   tega
dat    tigi    tøgum
acc    tøg     tøgu

This has the disadvantage that the nom. sg. doesn't coincide with the lemma, which the ONP wordlist gives as tigr.

In the u-stems we have also the fact that later mss have -i in the acc.pl. rather than -u. Normalise to -u?

tveim or tveimr? þrem or þremr?

Some words sometimes show syncope, sometimes not; ýmiss is an example. Should one normalise that?

A related problem is the dative þráin. To start with, it is already difficult to decide whether the ms has þrani or þrain, but should this variant be shown in the normalised version? I am tempted to say no, but that is because I see the normalised version of a fully analyzed text as intended for the use of students, who have difficulties enough without having to deal with that type of irregularity.

I would like to have your opinion about the problems indicated above. I suppose that you aim at uniformity at the lemma level. This already will require some rules to be set up. Should uniformity at the <norm> level be aimed for as well, then much more will have to be put into the rules.
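What uniformity at the <norm> level would mean in practice can be sketched as a simple consistency check. The sketch assumes that each word token has already been reduced to a triple of lemma, morphological analysis and normalised form. The function name find_norm_conflicts, the analysis labels and the sample tokens are all hypothetical; the forms themselves are taken from the cases discussed above.

```python
from collections import defaultdict


def find_norm_conflicts(tokens):
    """Report (lemma, analysis) pairs that occur with more than one normalised form.

    Hypothetical helper: tokens are (lemma, analysis, norm) triples.
    """
    seen = defaultdict(set)
    for lemma, analysis, norm in tokens:
        seen[(lemma, analysis)].add(norm)
    # Keep only the pairs for which the normalised forms disagree.
    return {key: sorted(forms) for key, forms in seen.items() if len(forms) > 1}


# Illustrative tokens, as if drawn from two differently normalised texts.
tokens = [
    ("eiga", "3pl.pres.ind", "eigu"),
    ("eiga", "3pl.pres.ind", "eiga"),
    ("eða", "conj", "eða"),
    ("eða", "conj", "eðr"),
    ("vatn", "gen.sg", "vatns"),
]
conflicts = find_norm_conflicts(tokens)
```

With these sample tokens the check flags eiga and eða, where two texts disagree, and passes vatn. Uniformity at the lemma level alone would pass all three, which is why rules for the <norm> level would have to go considerably further.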


