Chapter 6. Abbreviations: typology and encoding

Version 1.1 (5 May 2004)

6.1 Introduction
6.2 Abbreviation marks on the base line
6.3 Combining abbreviation marks
6.4 Special cases
6.5 List of abbreviation marks


6.1 Introduction

Abbreviations are a common feature of medieval manuscripts. In the medieval Nordic tradition, abbreviations were used most frequently in Norwegian and Icelandic manuscripts, and particularly in the latter. In some Icelandic manuscripts as many as a third of the words may be abbreviated, some of them with several abbreviation marks. The system of abbreviations was inherited from English and Continental practice, but in its Nordic adoption the usage of some abbreviation marks was extended, and some new types were developed.

Abbreviations are usually divided into four categories (see e.g. Hreinn Benediktsson 1965, p. 85 and, for a more detailed classification, Kristian Kålund 1907, pp. viii-x):

(1) Suspensions. The first part of the word, often the initial letter only, is written out, followed by a dot or similar mark. The plural may be represented by a doubling of the initial letter, e.g. "ss." = synir (sons).

(2) Contractions. Some letters are left out, but the initial and final letters are written out, often together with one or more of the intermediate letters. The abbreviation is often indicated with a horizontal bar above the word.

(3) Interlinear marks. The interlinear abbreviation is usually either a vowel, representing "r" or "v" + the vowel itself, or a consonant, representing "a" + the consonant itself.

(4) Special signs (brevigraphs). These signs are usually placed on the base line and are thus akin to ordinary letters. The Tironian notae belong to this category.

The typology below takes as its point of departure the location of the abbreviations. The main distinction is drawn between abbreviation signs placed on the base line and those placed above (or through or below) a base line character. We suggest that letter-sized characters on the base line are referred to as signs, while combining abbreviation marks (above, through or below another character) are referred to as marks. For the sake of simplicity, however, we shall refer to both categories as marks in this chapter.

6.1.1 Glyphs

Glyphs are shown in a font based on Courier. Since abbreviation marks typically appear as part of words and are frequently associated with a base line character, we have chosen to illustrate each mark within the context of a whole word. Those who wish to see the abbreviation marks in isolation may go to sections 1.2 and 1.5 of the character list.

6.1.2 Entity names

All abbreviations are referred to with entity names, with the exception of the full stop, ".", and the colon, ":". Entity names are placed within the delimiters "&" and ";", and we have tried to make them as short and mnemonic as possible. As a rule, we have based the entity name on the typical expansion of the abbreviation. Thus, the semicolon used as an abbreviation for "ed" (or "eð" / "eþ") is given the entity name "&ed;".

As explained in ch. 2, we aim at synchronizing our use of entities with those recommended by ISO. Since ISO presently defines no abbreviation entities, we have had to devise our own in this chapter.

6.1.3 Unicode values

Unicode 4.0 defines only a handful of abbreviation characters, and only a few of them are of interest for our purposes. The great majority of abbreviation characters must therefore be assigned code values in the Private Use Area. The exceptions are the full stop, colon and semicolon, which belong to the range Basic Latin, and the Tironian sign for et, in the range General Punctuation (see the discussion below in sections 6.2.1, 6.2.2, 6.2.8 and 6.2.9), together with a number of marks that can be identified with existing Unicode characters of other kinds, such as runes and combining diacritical marks, as noted in the relevant sections below.

A complete list of suggested Unicode values is given in sections 1.2 and 1.5 of the character list.
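To make the scheme concrete, the Python sketch below resolves entity references in a transcription string to the code points suggested in this chapter. It is a hypothetical helper, not part of any published MENOTA tool, and its table covers only entities whose values appear in the examples of this chapter; the authoritative list remains the one in the character list. Full stop and colon are not entities and therefore pass through unchanged.

```python
import re

# Partial entity table, taken from the examples in this chapter.
# Values in the F000 range are Private Use Area assignments and will
# only display with a font that supports them.
ENTITIES = {
    "et": 0x204A,        # TIRONIAN SIGN ET
    "ed": 0x003B,        # semicolon used as abbreviation mark
    "con": 0x0254,       # LATIN SMALL LETTER OPEN O
    "rum": 0xF154,       # PUA
    "cross": 0x271D,     # LATIN CROSS
    "mrun": 0x16D8,      # runic "m"
    "frun": 0x16A0,      # runic "f"
    "rscap": 0x0280,     # LATIN LETTER SMALL CAPITAL R
    "bar": 0x0305,       # COMBINING OVERLINE
    "barbl": 0x0332,     # COMBINING LOW LINE
    "combflour": 0xF1C6, # PUA
    "us": 0xF15B,        # PUA
    "er": 0xF152,        # PUA
    "ra": 0xF157,        # PUA
    "rabar": 0xF1C1,     # PUA
    "ur2": 0xF153,       # PUA
    "asup": 0x0363,      # COMBINING LATIN SMALL LETTER A
    "isup": 0x0365,      # COMBINING LATIN SMALL LETTER I
    "osup": 0x0366,      # COMBINING LATIN SMALL LETTER O
    "rsup": 0x036C,      # COMBINING LATIN SMALL LETTER R
    "rscapsup": 0xF026,  # PUA
    "combdot": 0x0307,   # COMBINING DOT ABOVE
    "ssharp": 0x00DF,    # LATIN SMALL LETTER SHARP S
}

# Entity names may contain digits and an optional "-n" variant index.
ENTITY_RE = re.compile(r"&([A-Za-z0-9-]+);")

def resolve(transcription: str) -> str:
    """Replace each entity reference with its character.

    Unknown entities raise KeyError instead of being silently dropped.
    """
    return ENTITY_RE.sub(lambda m: chr(ENTITIES[m.group(1)]), transcription)

print(resolve("m&ed;"))         # the preposition "með"
print(resolve("&mrun;&asup;"))  # "manna"
```

Raising on unknown names is a deliberate choice in this sketch: a typo in an entity name is then caught at once rather than disappearing from the output.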

6.1.4 Descriptive names

As is the case with ordinary characters (cf. ch. 3), we adhere to the naming scheme in Unicode. Since Unicode 4.0 defines only one abbreviation mark in the Latin alphabet, the TIRONIAN SIGN ET in the range General Punctuation, and only one in each of the Armenian, Syriac, Devanagari, Thai and Khmer alphabets, there are no fully clear models for descriptive names. We suggest ABBREVIATION SIGN "000" as a general name for abbreviations occupying a separate position on the base line, and COMBINING ABBREVIATION MARK "000" for those typically placed above, through or below a base line character.

For suggested descriptive names, please refer to sections 1.2 and 1.5 of the character list.

6.2 Abbreviation marks on the base line

Abbreviation marks on the base line behave like any other character. The typology is discussed and exemplified below. A complete list is found in section 1.2 of the character list.

6.2.1 The "et" mark

The Tironian nota resembling the number "7" (or the character "z" with or without a crossbar) is often used for the conjunction "ok" / "oc" (in Latin "et"). We recommend using the entity name "&et;", reflecting the Latin origin of the abbreviation.

In Unicode 4.0 this character is located at 204A in the range General Punctuation.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	(et)	&et;	204A

There are at least three different variants of this sign. If the transcriber wishes to make a distinction between these, we suggest supplying an index for each type, e.g. "&et-1;", "&et-2;" and "&et-3;". The meaning of each entity must be explained in the header of the transcription and specified in the DTD.
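If variants are numbered in this way, a transcription can still be reduced to the base entities when the distinction is not needed, for example when producing a normalised text. The sketch below assumes the "-n" suffix convention suggested above (which also applies to numbered bar forms such as "&bar-1;"); the function names are hypothetical.

```python
import re

# An entity reference with an optional numeric variant index:
# "&et;", "&et-2;", "&bar-3;". Names may themselves contain digits ("&ur2;").
VARIANT_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(?:-(\d+))?;")

def collapse_variants(transcription: str) -> str:
    """Map every indexed variant back to its base entity (&et-2; -> &et;)."""
    return VARIANT_RE.sub(lambda m: f"&{m.group(1)};", transcription)

def variant_counts(transcription: str) -> dict:
    """Count each (entity, index) pair; the index is None when absent."""
    counts = {}
    for m in VARIANT_RE.finditer(transcription):
        index = int(m.group(2)) if m.group(2) else None
        key = (m.group(1), index)
        counts[key] = counts.get(key, 0) + 1
    return counts

print(collapse_variants("&et-1; &et-3; &et;"))
print(variant_counts("&et-1; &et-1; &et-2;"))
```

Counting before collapsing lets the transcriber check how the variants are distributed in a manuscript while keeping a variant-free text for other uses.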

6.2.2 The "ed" mark

The semicolon was used for "e" + dental consonant, often in the preposition "með". We recommend "&ed;" as entity name.

In Unicode 4.0 the semicolon is located at 003B in the range Basic Latin. When the semicolon is used as a punctuation mark, it should be transcribed as such, i.e. simply as ";". When it is used as an abbreviation mark we recommend that it is transcribed with an entity, "&ed;".

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	m(eð)	m&ed;	003B
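Since both uses of the semicolon resolve to the same code point, U+003B, the distinction exists only at the entity level and should be exploited before a transcription is converted to Unicode. The hypothetical helper below counts the two uses. Note that every entity reference itself ends in ";", so references must be removed before bare semicolons are counted; otherwise "&ed;" would also be counted as punctuation.

```python
import re

# Any entity reference, including indexed variants such as "&et-2;".
ENTITY_RE = re.compile(r"&[A-Za-z][A-Za-z0-9]*(?:-\d+)?;")

def classify_semicolons(transcription: str) -> dict:
    """Count semicolons used as the "ed" mark versus as punctuation."""
    abbreviations = transcription.count("&ed;")
    # Strip all entity references so their closing ";" is not miscounted.
    without_entities = ENTITY_RE.sub("", transcription)
    return {"abbreviation": abbreviations,
            "punctuation": without_entities.count(";")}

print(classify_semicolons("m&ed; honum; oc"))
# {'abbreviation': 1, 'punctuation': 1}
```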

6.2.3 The "con" mark

A sign resembling a backwards "c" was often used for "con" in Latin and "kon" in Nordic words. As entity name, we recommend "&con;".

This "con" mark is partially similar to 0254 LATIN SMALL LETTER OPEN O in the range IPA Extensions of Unicode 4.0 and may be identified with this character.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	(kon)a	&con;a	0254

6.2.4 The "rum" mark

The sequence "rum" was often abbreviated with a character resembling a small version of the number 4 (in fact, it is the round "r" with a stroke across its tail). We recommend the entity name "&rum;" and a separate code point in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	eo(rum)	eo&rum;	F154

6.2.5 The cross mark

The word "kross" was sometimes abbreviated with the cross symbol, which we suggest calling "&cross;".

This "kross" mark can be identified with 271D LATIN CROSS in the range Dingbats of Unicode 4.0.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	(kross)	&cross;	271D

6.2.6 The "m" rune

The runic character for "m" was sometimes used for the word "maðr" (including case forms with the stem "mann-"). We recommend the entity name "&mrun;", as introduced in ch. 5.2.7.

Unicode 4.0 has defined a selection of 81 runes from the Older and Younger Futhark in the Runic range. This range includes the "m" rune.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	(maðr)	&mrun;	16D8

The runic character may appear with interlinear marks ("a", "i", "e", "n", "z") for various inflected forms of the word "maðr", e.g. "manna", "manni"/"manne", "mann", "mannz". The encoding of this type is discussed in ch. 6.3.7 below.

6.2.7 The "f" rune

The runic character for "f" was sometimes used for the word "fé". In analogy with the use of the "m" rune, we suggest the entity name "&frun;".

The "f" rune is included in the Runic range of Unicode 4.0.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	(fé)	&frun;	16A0

6.2.8 Dot (full stop)

Dots were often used as abbreviation marks, typically for suspensions, e.g. "s." for "sonr" (or "segja", "svara"). They may sometimes appear on both sides of the abbreviated word, ".s.". We recommend that the dot is transcribed in the same manner as a full stop, i.e. with the "." mark in Basic Latin. Thus, no entity name is called for.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	s(onr)	.s.	002E
	k(onun)gr	.kgr.	002E

If the transcriber wishes to distinguish between the dot used as an abbreviation mark and the dot used as a punctuation mark, we suggest that the entity name "&dot;" could be used in the former case and "." in the latter. However, we believe that there will arise a number of cases where it is difficult to decide whether the dot in the manuscript is a mark of abbreviation, punctuation or both, e.g. when a suspended word is the last word in a sentence. We therefore believe it is better to accept that the full stop is an ambivalent mark, as is also (although to a much lesser extent) the case with the colon and the runic characters "f" and "m".

6.2.9 Colon

The colon is sometimes, though not often, used as a mark of suspension, in the same manner as the dot (full stop). In analogy with the encoding of dots we suggest transcribing the colon simply as a colon, i.e. without using an entity.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	Rognv(aldr)	Rognv:	003A

6.2.10 Small capitals

In Old Icelandic, small capitals were used to denote geminated (long) consonants, or they were simply used ornamentally (especially in Old Norwegian). In ch. 5.2.3 above we recommended that they be encoded as entities in both cases. The use of small capitals can be seen as a form of abbreviation, but in a number of cases the usage is open to interpretation. We recommend that the transcriber copy the text as it stands, transcribing a small capital as a small capital irrespective of whether it denotes gemination or serves as an ornament. Thus, exactly the same entities are used here as those introduced in ch. 5.2.3.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	heRa	he&rscap;a	0280

For the encoding of small capitals with dot above, please see ch. 6.3.8 below.

6.3 Combining abbreviation marks

The majority of abbreviation marks are placed above, through or below a base line character. It could be argued that they really refer to the whole word, but from an analytical point of view we recommend that they are encoded immediately after the base line character with which they seem most closely associated. Cf. the rules in ch. 2.2.1.

The typology of combining abbreviation marks is discussed and exemplified below, while a complete list is found in section 1.5 in the character list. Note that all abbreviation marks of this type are described as "combining". This means that they do not occupy a separate position on the base line, but are attached to the immediately preceding base line character.

It is sometimes difficult to decide whether a sign is placed on the base line or above another base line character. For example, the "us" mark (cf. ch. 6.3.3 below) may sometimes occupy a position of its own, although slightly raised above the base line. The classification in this chapter is based on what we believe are the prototypical positions of the abbreviation marks.
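The rule that a combining mark belongs to the immediately preceding base line character can be applied mechanically once a transcription has been resolved to code points. In the sketch below (hypothetical helper names), standard combining marks are recognised by their Unicode general category (Mn, Mc, Me), while Private Use Area code points carry no such property, so the PUA values this chapter treats as combining are listed by hand.

```python
import unicodedata

# PUA code points treated as combining in this chapter: flourish, "us",
# "er", "ra", "ra" with bar, "ur", interlinear small capital "r".
PUA_COMBINING = {0xF1C6, 0xF15B, 0xF152, 0xF157, 0xF1C1, 0xF153, 0xF026}

def is_combining(ch: str) -> bool:
    """True for Unicode combining marks and for the listed PUA marks."""
    return unicodedata.category(ch).startswith("M") or ord(ch) in PUA_COMBINING

def clusters(text: str) -> list:
    """Group each base line character with the combining marks after it."""
    groups = []
    for ch in text:
        if groups and is_combining(ch):
            groups[-1] += ch   # attach to the immediately preceding base
        else:
            groups.append(ch)  # a base character (or a stray leading mark)
    return groups

# "b&osup;g" (borg) resolved: b + COMBINING LATIN SMALL LETTER O + g
print(clusters("b\u0366g"))
```

Where the "us" mark is written on the base line rather than raised, this sketch still attaches it to the preceding character, in line with the prototypical classification adopted here.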

6.3.1 Horizontal bar

Historically, the horizontal bar is the earliest form of abbreviation mark, and it is also the most ambiguous type. It is commonly used for "m" or "n" and is often referred to as the "nasal stroke", but it is also used in a number of other contexts as a mark of suspension or contraction. We recommend using the same entity name in all instances, "&bar;". The unmarked position of the bar is above the immediately preceding character.

This horizontal bar is partially similar to 0304 COMBINING MACRON and 0305 COMBINING OVERLINE in the range Combining Diacritical Marks of Unicode 4.0, and may be identified with the latter.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	han(n)	han&bar;	0305
	p(restr)	p&bar;	0305
	þ(at)	þ&bar;	0305

In the last example, the bar crosses the ascender of the character "þ". In our view, this is incidental: the bar is always placed above the x-height of the base line characters, so where a character has an ascender, the bar simply crosses this stroke.

The unmarked position of the bar is above the base line character, and this is therefore part of the definition of the entity "&bar;". In some cases the bar may be placed below the base line character. Here, we suggest the entity name "&barbl;" (for "bar below").

The horizontal bar below is partially similar to 0331 COMBINING MACRON BELOW or 0332 COMBINING LOW LINE in the range Combining Diacritical Marks of Unicode 4.0, and may be identified with the latter.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	p(er)	p&barbl;	0332

It is possible to identify various shapes of the horizontal bar. In general we recommend that the transcriber should not make more distinctions than strictly necessary. If the transcriber for some reason would like to create a typology of bar forms, we suggest that this is done in the same way as with the "et" mark, i.e. by numbering, "&bar-1;", "&bar-2;", "&bar-3;", etc. Cf. ch. 6.2.1 above.

6.3.2 Flourish

The flourish may be described as a horizontal bar with a return. It appears in the abbreviation of the Latin word "pro" in contradistinction to "per", which typically is abbreviated with a simple horizontal bar. We suggest using the entity name "&combflour;" and recommend that it is given a separate code point in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	p(ro)fat	p&combflour;&fins;at	F1C6

6.3.3 The "us" mark

Originally a Tironian nota, a mark resembling a small version of the number "9" is often used for "us". It is usually placed in a raised position, though not always clearly above the preceding character. Since the typical position of this mark is above the base line, we regard it as a combining mark and suggest the entity name "&us;" and recommend that it is given a separate code point in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	la(us)	la&us;	F15B

6.3.4 The "er" mark

A mark resembling a zigzag was frequently used as an abbreviation of a front vowel (including diphthongs) + "r", e.g. "ir", "er", "eir", "ær". The earliest form resembles a horizontal stroke with a descender to the left and an ascender to the right. It later acquired a zigzag-like form, and later still came to resemble the letter "u" turned upside down. We suggest using the entity name "&er;" since this is the most common expansion of the abbreviation. We recommend that it is given a separate code point in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	v(er)	v&er;	F152

6.3.5 The "ra" mark

Originally an open form of the character "a", this mark was used as an abbreviation for "ra" or "va". One variant resembles the Greek omega-sign and another variant the omega-sign with a horizontal bar above. We suggest using the entity name "&ra;" for the first type and "&rabar;" for the second. We recommend that both marks are given separate code points in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	s(va)	s&ra;	F157
	f(ra)	&fins;&rabar;	F1C1

6.3.6 The "ur" mark

The syllable "ur" (sometimes "yr") can be abbreviated by a mark resembling a small version of the number 2. Other forms of this mark resemble a tilde or a horizontal version of the number 8 (i.e. the mathematical infinity symbol), cf. Hreinn Benediktsson 1965, p. 91. Due to the considerable variation in form, we suggest that it may be useful to distinguish between two main forms, using the entity "&ur2;" for the first type and "&ur8;" for the second. Cf. section 1.5 in the character list.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	ock(ur)	ock&ur2;	F153

6.3.7 Interlinear characters

Interlinear characters are a common type of abbreviation. An interlinear vowel typically represents a consonant (often "r") + the vowel itself, while an interlinear consonant typically represents a vowel (often "a") + the consonant itself. We suggest that interlinear abbreviation marks are named by the character itself + "sup" (for "superscript"), e.g. "&asup;" (interlinear "a"), "&osup;" (interlinear "o"), "&rscapsup;" (interlinear small capital "r"), etc.

Unicode 4.0 includes a selection of 13 superscript characters, namely "a", "e", "i", "o", "u", "c", "d", "h", "m", "r", "t", "v", "x". They are located at the end of the range Combining diacritical marks, 0363-036F. We suggest that these characters are used to display interlinear characters and that characters outside this selection are given separate code points in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	b(or)g	b&osup;g	0366
	m(anna)	m&asup;	0363
	v(ir)þa	v&isup;þa	0365
	þeg(ar)	þeg&rsup;	036C
	Otta(rr)	Otta&rscapsup;	F026
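Since the entity names follow the pattern letter + "sup" and the Unicode names follow the pattern COMBINING LATIN SMALL LETTER X, the table for the thirteen standard superscript characters can be derived from the Unicode Character Database instead of being typed by hand. A minimal sketch using Python's standard unicodedata module; interlinear characters outside this set, such as "&rscapsup;", still need their Private Use Area values added separately.

```python
import unicodedata

# The 13 letters with combining superscript forms in Unicode 4.0 (U+0363-U+036F),
# listed in code point order.
LETTERS = "aeioucdhmrtvx"

def superscript_entities() -> dict:
    """Map "&xsup;"-style entity names to the standard combining letters."""
    return {
        f"&{letter}sup;": unicodedata.lookup(
            f"COMBINING LATIN SMALL LETTER {letter.upper()}")
        for letter in LETTERS
    }

table = superscript_entities()
for name, ch in table.items():
    print(name, f"U+{ord(ch):04X}")  # &asup; U+0363 ... &xsup; U+036F
```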

The runic character "m", which can itself be used as an abbreviation (cf. ch. 6.2.6 above), can appear with an interlinear abbreviation mark. The encoding follows the pattern above.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	(manna)	&mrun;&asup;	16D8 + 0363

Since the first entity, "&mrun;", is defined as a base line character and the second, "&asup;", as an interlinear mark placed above the immediately preceding base line character, there will be no doubt as to the positioning.

6.3.8 Superscript dots

Superscript dots are sometimes used to denote length. It is a moot question whether this is a type of abbreviation, but in any case the transcriber should use an entity for the encoding. We recommend that superscript dots are transcribed in analogy with other combining abbreviation marks and suggest using the entity name "&combdot;" (for "combining dot above").

Unicode 4.0 has a combining dot above, 0307 COMBINING DOT ABOVE, in the range Combining Diacritical Marks.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	leg(g)ia	leg&combdot;ia	0307

Sometimes the dot is used above small capitals. Since small capitals themselves are a way of representing gemination, the dot above is redundant. The encoding will simply be the same as above. Cf. ch. 6.2.10 above.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	var(r)	va&rscap;&combdot;	0307

6.4 Special cases

6.4.1 Nomina sacra

In some cases the whole word must be analysed as an abbreviation. This applies to the traditional nomina sacra, i.e. abbreviations for sacred words such as "iesus" and "christus". These contain characters which originally were Greek but might be taken for Latin characters. For example, the "p" in "xpm" is originally a Greek "rho" ("r").

We believe these abbreviations should be encoded as a sequence of the individual base line characters and one or more combining bars above. In the examples below, the originally Greek base line characters have been identified with the similar-looking Latin characters. Greek characters might also have been used in the encoding (such as "&igr;" for GREEK SMALL LETTER IOTA, etc.).

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	(iesus)	i&bar;h&bar;c&bar;	0305 (+ 0305 + 0305)
	(christum)	x&bar;p&bar;m&bar;	0305 (+ 0305 + 0305)

Note that the combining bar above has been encoded more than once in these examples. That ensures an appropriate display of the manuscript text, since the bar will be shown as extending over the whole word. However, it may be argued that there is only a single bar in each example, and that this bar simply happens to extend over more than one character. This problem is discussed more fully in ch. 6.4.5 below.

6.4.2 Interlinear characters in other contexts

Interlinear (superscript) characters are used in various ways, not always as abbreviations. According to de Leeuw van Weenen 2000, pp. 36-43, there are four types:

(a) as abbreviation

This type is discussed in ch. 6.3.7 above. Here, we recommend the usage of entities such as "&asup;".

(b) as addition

When interlinear characters are used for adding characters which were left out by the scribe we recommend that this is encoded by use of the element <add> and the attribute place="supralinear" (cf. ch. 7.2). There is no need for an entity of the type "&asup;" since the location of the character is indicated by the element.

Manuscript form	Expanded form	Encoding
	han`a´	han<add place="supralinear">a</add>

(c) as complements to Roman numerals

Inflected forms of Roman numerals are sometimes specified by interlinear characters. In these cases the interlinear characters are not placed above any base line character but merely raised above the base line. We suggest using the element <seg> and the attribute type="superscript".

Manuscript form	Expanded form	Encoding
	v.`ti´	v.<seg type="superscript">ti</seg>

(d) as space savers

Especially at the end of a line, one or more characters may be placed above the last word to save space and complete the line. We suggest the same encoding as in (c) above.

Manuscript form	Expanded form	Encoding
	e`s´	e<seg type="superscript">s</seg>
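Because types (b) to (d) are marked up with elements rather than entities, they can be told apart with an ordinary XML parser. The sketch below uses Python's xml.etree.ElementTree; the wrapper element <w> is a hypothetical addition so that a fragment parses as a single element. Two limitations follow from the scheme itself: fragments containing entities such as "&asup;" (type (a)) will only parse once the entities are declared, e.g. via the DTD, and types (c) and (d) share one encoding, so a program cannot distinguish them.

```python
import xml.etree.ElementTree as ET

def interlinear_uses(fragment: str) -> list:
    """List raised characters in an encoded fragment, classified by markup."""
    root = ET.fromstring(f"<w>{fragment}</w>")  # <w> is a dummy wrapper
    uses = []
    for el in root.iter():
        if el.tag == "add" and el.get("place") == "supralinear":
            uses.append(("addition", el.text))      # type (b)
        elif el.tag == "seg" and el.get("type") == "superscript":
            uses.append(("superscript", el.text))   # types (c) and (d)
    return uses

print(interlinear_uses('han<add place="supralinear">a</add>'))
# [('addition', 'a')]
print(interlinear_uses('v.<seg type="superscript">ti</seg>'))
# [('superscript', 'ti')]
```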

6.4.3 Missing abbreviation mark

Occasionally one finds a word that is obviously abbreviated but shows no trace of an abbreviation mark. In such cases there is no alternative but to transcribe the text as it reads in the manuscript.

Manuscript form	Expanded form	Encoding
	d(rottning)	d

6.4.4 Nesting (stacking) of abbreviation marks

There are a few examples of base line characters which are abbreviated with an abbreviation mark which is itself abbreviated. An example is the base line character "m" with an interlinear "o" which in turn has a horizontal bar. According to rule 7 in ch. 2.2.1 above this abbreviation should be encoded as the sequence "m" + "&osup;" + "&bar;".

Manuscript form	Expanded form	Encoding
	m(onnom)	m&osup;&bar;

Since "&osup;" is defined as a combining character, it follows that it is placed above the immediately preceding character, in this case "m", and since "&bar;" is also defined as a combining character, it follows that it is placed above "&osup;". There is therefore no doubt as to the positioning of each part.
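The stacking order is also stable under Unicode normalization, which matters if transcriptions are later normalized for searching or comparison. The sketch below checks this with Python's unicodedata module: both marks in "m&osup;&bar;" have canonical combining class 230 (above), and canonical reordering never swaps marks of equal class. As a caution, marks of different classes are reordered: a hypothetical bar below (class 220) written after a bar above would be moved in front of it. The rendering is unaffected, since the two orders are canonically equivalent, but string comparisons should then be made on normalized text.

```python
import unicodedata

# m + &osup; + &bar;  ->  m, COMBINING LATIN SMALL LETTER O, COMBINING OVERLINE
monnom = "m\u0366\u0305"

# Both marks have combining class 230 ("above"), so normalization
# leaves the stacking order untouched.
print([unicodedata.combining(c) for c in monnom[1:]])  # [230, 230]
print(unicodedata.normalize("NFD", monnom) == monnom)  # True

# Hypothetical case: bar above (230) followed by bar below (220).
# Canonical reordering sorts the marks by class, so the bar below moves first.
mixed = "p\u0305\u0332"
print(unicodedata.normalize("NFD", mixed) == "p\u0332\u0305")  # True
```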

6.4.5 Extension of abbreviation marks

As a rule, combining abbreviation marks are associated with a single base line character. Thus, the sequence "m&osup;" means that the interlinear character "o" is seen as being placed above "m" and not above any other character. However, some abbreviation marks extend over more than one character. For example, the word "k(ir)kia" may be abbreviated with a horizontal bar crossing both the first and the second "k". We believe it is sufficient to associate the abbreviation mark with only one of these characters, preferably the first.

Manuscript form	Expanded form	Encoding
	k(ir)kia	k&bar;kia

It is possible to encode this word so that the bar is associated with both characters. This is in a sense closer to the manuscript form, but it means that a single abbreviation mark may appear as two distinct marks (unless it is somehow stated that the two marks belong together). Thus, this is a more complex and possibly misleading solution.

Manuscript form	Expanded form	Encoding
	k(ir)kia	k&bar;k&bar;ia

On the other hand, it should be noted that this is a case where 0305 COMBINING OVERLINE is appropriate, since it connects to the left and right. Cf. the reference in ch. 6.3.1 above.

6.4.6 Sporadic ligatures with abbreviation marks

In ch. 5.3 we recommended that sporadic ligatures should not be encoded by use of separate entities but by the element <seg> with the attribute type="ligature". A sporadic ligature is basically a joining of two base line characters which together do not reflect a separate phonological value. This is the case with ligatures such as "s+k" and "p+p" which in this respect are identical to "s" + "k" and "p" + "p".

Manuscript form	Expanded form	Encoding
	(pp)	<seg type="ligature">pp</seg>

However, some ligatures are formed in such a manner that it is difficult to distinguish the separate parts. This applies to the ligatures of "h", "k" and "þ" with long s. In these cases, we suggest using individual entities. The first two can be identified with existing Unicode characters, while the third must be assigned a code point in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	h(an)s	&hslonglig;	0266
	k(onung)s	&kslonglig;	0199
	þ(es)s	&thornslonglig;	E734

Sometimes, a horizontal bar is used across these ligatures. The bar may be encoded separately with its usual entity, &bar; (cf. ch. 6.3.1 above) or with a character located in the Private Use Area.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	k(onung)s	&kslonglig;&bar;	0199 + 0305
	k(onung)s	&kslongligbar;	E7C8

6.4.7 The character "r" as interlinear ligature

A rather special type of abbreviation is an interlinear "r" in ligature with a base line character such as "þ". We suggest encoding this as a sporadic ligature of "þ" and interlinear "r".

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	þ(ar)	<seg type="ligature">þ&rsup;</seg>	00FE + 036C

6.4.8 Sharp "s"

In late Old Norwegian, the "sharp s" appears in a number of abbreviations, e.g. for "skilling", "smør" and "son". The German character "sharp s" is defined in Unicode 4.0 as 00DF LATIN SMALL LETTER SHARP S in the range Latin-1 Supplement. It may be practical to distinguish between the "sharp s" in German usage and the similar-looking abbreviation mark in Nordic manuscripts. We therefore suggest the ISO entity name "&szlig;" for the German letter (if needed) and "&ssharp;" for the abbreviation mark.

Abbreviated form	Expanded form	Encoding	Abbreviation mark code point
	Hakon(son)	Hakon&ssharp;	00DF

6.5 List of abbreviation marks

A complete list of abbreviation marks is found in sections 1.2 and 1.5 in the character list.


Version 1.0 published 20 May 2003. Version 1.1 published 5 May 2004.