Chapter 6. Abbreviations: typology and encoding
Version 1.0 (20 May 2003)
6.1 Introduction
6.2 Abbreviation marks on the base line
6.3 Combining abbreviation marks
6.4 Special cases
6.5 List of abbreviation marks
6.1 Introduction
Abbreviations are a common feature of Medieval manuscripts. In the Medieval Nordic tradition, abbreviations were used most frequently in Norwegian and Icelandic manuscripts, and particularly in the latter. In some Icelandic manuscripts as many as a third of the words may be abbreviated, some of them with several abbreviation marks. The system of abbreviations was inherited from English and Continental practice, but its adoption also meant that the usage of some abbreviation marks was extended, and it led to the development of some new types.
Abbreviations are usually divided into four categories (see e.g. Hreinn Benediktsson 1965: 85 and, for a more detailed classification, Kristian Kålund 1907: viii-x):
(1) Suspensions. The first part of the word, often the initial letter only, is written out, followed by a dot or similar mark. The plural may be represented by a doubling of the initial letter, e.g. "ss." = synir (sons).
(2) Contractions. Some letters are left out, but the initial and final letters are written out, often one or more of the intermediate as well. The abbreviation is often indicated with a horizontal bar above the word.
(3) Interlinear marks. The interlinear abbreviation is usually a vowel representing either "r" or "v" + the vowel itself, or a consonant representing "a" + the consonant itself.
(4) Special signs (brevigraphs). These signs are usually placed on the base line and are thus akin to ordinary letters. The Tironian notae belong to this category.
The typology below takes as its point of departure the location of the abbreviations. The main distinction is drawn between abbreviation signs placed on the base line and those placed above (or through or below) a base line character. We suggest that letter-sized characters on the base line are referred to as signs, while combining abbreviation marks (above, through or below another character) are referred to as marks. For the sake of simplicity, however, we shall refer to both categories as marks in this chapter.
6.1.1 Glyphs
Glyphs are shown in a font based on Courier. Since abbreviation marks typically appear as part of words and are frequently associated with a base line character we have chosen to illustrate each mark within the context of a whole word. Those who wish to see the abbreviation marks in isolation may go to sections 1.2 and 1.5 of the character list.
6.1.2 Entity names
All abbreviations are referred to with entity names, with the exception of full stop, ".", and colon, ":". Entity names are placed within the delimiters "&" and ";", and we have tried to give as short and mnemonic names as possible. As a rule, we have based the entity name on the typical expansion of the abbreviation. Thus, the semicolon which is an abbreviation for "ed" (or "eð" / "eþ") is given the entity name "&ed;".
As explained in ch. 2, we aim at synchronizing our use of entities with those recommended by ISO. Since there presently are no abbreviation entities in ISO, we are left on our own in this chapter.
6.1.3 Unicode values
Unicode 3.2 has only defined a handful of abbreviation characters and only a few of interest for our use. The great majority of abbreviation characters must therefore be defined as code values in the Private Use Area. The only exceptions are the full stop, colon and semicolon, which are part of the range Basic Latin in Unicode, and the Tironian sign for et, in the range General Punctuation. See the discussion below in sections 6.2.8 and 6.2.9.
A complete list of suggested Unicode values is given in sections 1.2 and 1.5 of the character list.
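Whether a given code point falls in the Private Use Area can be checked mechanically. The following is a minimal Python sketch offered as an illustration only; the function name is_private_use is hypothetical, and the sample values are code points quoted in the tables of this chapter.

```python
import unicodedata

def is_private_use(codepoint: int) -> bool:
    """Return True if the code point has the Unicode category Co (private use)."""
    return unicodedata.category(chr(codepoint)) == "Co"

# Sample values quoted in this chapter (entity name -> code point).
samples = {"et": 0x204A, "ed": 0xE602, "con": 0xE600, "mrun": 0x16D8}

for name, cp in samples.items():
    area = "Private Use Area" if is_private_use(cp) else "standard Unicode"
    print(f"&{name}; U+{cp:04X} {area}")
```

A check of this kind may help when a transcription is exchanged between projects, since Private Use Area values are only meaningful by agreement.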
6.1.4 Descriptive names
As is the case with ordinary characters (cf. ch. 3) we adhere to the naming scheme in Unicode. Since Unicode 3.2 only defines one abbreviation mark in the Latin alphabet, the TIRONIAN SIGN ET in the range General Punctuation, and only one in each of the Armenian, Syriac, Devanagari, Thai and Khmer alphabets, we do not have completely clear examples of descriptive names. We suggest ABBREVIATION SIGN "000" as a general name for abbreviations occupying a separate position on the base line, and COMBINING ABBREVIATION MARK "000" for those typically placed above, through or below a base line character.
For suggested descriptive names, please refer to sections 1.2 and 1.5 of the character list.
6.2 Abbreviation marks on the base line
Abbreviation marks on the base line behave as any other character. The typology is discussed and exemplified below. A complete list is found in section 1.2 of the character list.
6.2.1 The "et" mark
The Tironian Nota resembling the number "7" (or the character "z" with or without a crossbar) is often used for the conjunction "ok" / "oc" (in Latin "et"). We recommend using the entity name "&et;", reflecting the Latin origin of the abbreviation.
In Unicode 3.2 this character is located at 204A in the range General Punctuation.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           (et)           &et;      204A
There are at least three different variants of this sign. If the transcriber wishes to make a distinction between these, we suggest supplying an index for each type, e.g. "&et-1;", "&et-2;" and "&et-3;". The meaning of each entity must be explained in the header of the transcription and specified in the DTD.
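How such indexed entities might be declared can be sketched as follows. This is an illustration under stated assumptions, not a prescribed DTD: the helper entity_declaration is hypothetical, and the three Private Use Area values for the variants are placeholders invented for the example. Only 204A comes from this section.

```python
def entity_declaration(name: str, codepoint: int) -> str:
    """Format one internal entity declaration that resolves to a single code point."""
    return f'<!ENTITY {name} "&#x{codepoint:04X};">'

# 204A is the code point given above for the "et" mark.
# The three Private Use Area values are placeholders chosen for this
# illustration only; they are not taken from the character list.
variants = {
    "et": 0x204A,
    "et-1": 0xF000,
    "et-2": 0xF001,
    "et-3": 0xF002,
}

for name, cp in variants.items():
    print(entity_declaration(name, cp))
```

Generating the declarations from one table keeps the entity names used in the transcription and those specified in the DTD in step.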
6.2.2 The "ed" mark
The semicolon was used for "e" + dental consonant, often in the preposition "með". We recommend "&ed;" as entity name.
In Unicode 3.2 the semicolon is located at 003B in the range Basic Latin. When the semicolon is used as a punctuation mark, it should be transcribed as such, i.e. simply as ";". When it is used as an abbreviation mark we recommend that it is transcribed with an entity, "&ed;". As an abbreviation mark it sometimes has a form resembling the number "3", i.e. drawn with a single stroke. For this reason, we suggest that a separate code point should be allocated to this character in the Private Use Area.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           m(eð)          m&ed;     E602
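Because the entity delimiter ";" and the punctuation mark ";" share a character, it may be useful to tell the two apart in an encoded text. The sketch below is a minimal illustration; the function name, the simplified name pattern and the sample string are assumptions made for the example.

```python
import re

# An entity reference: "&", a name, then ";". The name pattern is a
# simplification of XML names, assumed sufficient for this chapter's entities.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9-]*);")

def classify_semicolons(text: str) -> dict:
    """Count "&ed;" abbreviation marks and bare ";" punctuation marks."""
    abbreviations = sum(1 for m in ENTITY.finditer(text) if m.group(1) == "ed")
    # Remove every entity reference so that only literal semicolons remain.
    punctuation = ENTITY.sub("", text).count(";")
    return {"abbreviation": abbreviations, "punctuation": punctuation}

# A constructed sample: two abbreviated "með" and one punctuation semicolon.
print(classify_semicolons("m&ed; &th;eim; ok m&ed;"))
```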
6.2.3 The "con" mark
A sign resembling a backwards "c" was often used for "con" in Latin and "kon" in Nordic words. As entity name, we recommend "&con;".
This "con" mark is partially similar to 0254 LATIN SMALL LETTER OPEN O in the range IPA Extensions of Unicode 3.2, but should probably not be identified with this character. We therefore recommend that it is given a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           (kon)a         &con;a    E600
6.2.4 The "rum" mark
The sequence "rum" was often abbreviated with a character resembling a small version of the number 4, or rather a round "r" with a stroke across its tail. We recommend the entity name "&rum;" and a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           eo(rum)        eo&rum;   E607
6.2.5 The cross mark
The word "kross" was sometimes abbreviated with the cross symbol, for which we suggest the entity name "&cross;".
This "kross" mark is partially similar to 2020 DAGGER in the range General Punctuation of Unicode 3.2, but should probably not be identified with this character. We therefore recommend that it is given a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           (kross)        &cross;   E601
6.2.6 The "m" rune
The runic character for "m" was sometimes used for the word "maðr" (including case forms with the stem "mann-"). We recommend the entity name "&mrun;", as introduced in ch. 5.2.7.
Unicode 3.2 has defined a selection of 81 runes from the older and younger futhark in the Runic range. This range includes the "m" rune.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           (maðr)         &mrun;    16D8
The runic character may appear with interlinear marks ("a", "i", "e", "n", "z") for various inflected forms of the word "maðr", e.g. "manna", "manni"/"manne", "mann", "mannz". The encoding of this type is discussed in ch. 6.3.7 below.
6.2.7 The "f" rune
The runic character for "f" was sometimes used for the word "fé". In analogy with the use of the "m" rune, we suggest the entity name "&frun;".
The "f" rune is included in the Runic range of Unicode 3.2.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           (fé)           &frun;    16A0
6.2.8 Dot (full stop)
Dots were often used as abbreviation marks, typically for suspensions, e.g. "s." for "sonr" (or "segja", "svara"). They may sometimes appear on both sides of the abbreviated word, ".s.". We recommend that the dot is transcribed in the same manner as a full stop, i.e. with the "." mark in Basic Latin. Thus, no entity name is called for.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           s(onr)         .s.       002E
[glyph]           k(onun)gr      .kgr.     002E
If the transcriber wishes to distinguish between the dot used as an abbreviation mark and the dot used as a punctuation mark, we suggest that the entity name "&dot;" could be used in the former case and "." in the latter. However, we believe that there will arise a number of cases where it is difficult to decide whether the dot in the manuscript is a mark of abbreviation, punctuation or both, e.g. when a suspended word is the last word in a sentence. We therefore believe it is better to accept that the full stop is an ambivalent mark, as is also (although to a much lesser extent) the case with the colon and the runic characters "f" and "m".
6.2.9 Colon
The colon is sometimes, though not often, used as a mark of suspension, in the same manner as the dot (full stop). In analogy with the encoding of dots we suggest transcribing the colon simply as a colon, i.e. without using an entity.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           Rognv(aldr)    Rognv:    003A
6.2.10 Small capitals
In Old Icelandic and Norwegian small capitals were used to denote geminated (long) consonants or they were simply used ornamentally (especially in Old Norwegian). In ch. 5.2.3 above we recommended that they should be encoded as entities in both cases. The use of small capitals can be seen as a form of abbreviation, but there will be a number of cases where the usage is open to interpretation. We recommend that the transcriber copies the text as it is, transcribing a small capital as a small capital irrespective of whether it is being used to denote gemination or as an ornament. Thus, exactly the same entities will be used here as introduced in ch. 5.2.3.
Abbreviated form  Expanded form  Encoding    Abbreviation mark code point
[glyph]           heRa           he&rscap;a  0280
For the encoding of small capitals with dot above, please see ch. 6.3.8 below.
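Since marks on the base line behave as any other character, resolving their entities amounts to a simple substitution. The following Python sketch illustrates this under stated assumptions: the table holds only a selection of the entity names and code points given in section 6.2, and the names BASELINE_MARKS and resolve are hypothetical.

```python
import re

# A selection of entity names and code points from the tables of section 6.2.
BASELINE_MARKS = {
    "et": 0x204A,
    "ed": 0xE602,
    "con": 0xE600,
    "rum": 0xE607,
    "mrun": 0x16D8,
    "frun": 0x16A0,
    "rscap": 0x0280,
}

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9-]*);")

def resolve(text: str, table: dict = BASELINE_MARKS) -> str:
    """Replace each known entity reference with its character.

    Unknown entities are left untouched so that nothing is silently lost.
    """
    def substitute(match):
        name = match.group(1)
        return chr(table[name]) if name in table else match.group(0)
    return ENTITY.sub(substitute, text)

# List the code points of the resolved example "eo&rum;".
print([f"U+{ord(c):04X}" for c in resolve("eo&rum;")])
```

Leaving unknown entities in place is a design choice of this sketch; a stricter tool might instead report them as errors.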
6.3 Combining abbreviation marks
The majority of abbreviation marks are placed above, through or below a base line character. It could be argued that they really refer to the whole word, but from an analytical point of view we recommend that they are encoded immediately after the base line character to which they seem most closely associated. Cf. the rules in ch. 2.2.1.
The typology of combining abbreviation marks is discussed and exemplified below, while a complete list is found in section 1.5 in the character list. Note that all abbreviation marks of this type are described as "combining". This means that they do not occupy a separate position on the base line, but are attached to the immediately preceding base line character.
It is sometimes difficult to decide whether a sign is placed on the base line or above another base line character. For example, the "us" mark (cf. ch. 6.3.3 below) may sometimes occupy a position of its own, although slightly raised above the base line. The classification in this chapter is based on what we believe are the prototypical positions of the abbreviation marks.
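The rule that a combining mark attaches to the immediately preceding base line character can be sketched in Python. This is an illustration only: the set PUA_COMBINING is a partial selection of the Private Use Area values given in the tables of this section, and the function names are hypothetical.

```python
import unicodedata

# Private Use Area code points have no combining class in Unicode, so the
# combining abbreviation marks of this chapter are listed explicitly.
# The set is a partial selection taken from the tables in section 6.3.
PUA_COMBINING = {0xE700, 0xE703, 0xE704, 0xE705, 0xE706, 0xE708, 0xE709, 0xE70B}

def is_combining(char: str) -> bool:
    """True for standard combining characters and the listed PUA marks."""
    return unicodedata.combining(char) != 0 or ord(char) in PUA_COMBINING

def clusters(text: str) -> list:
    """Split text into base line characters, each with its attached marks."""
    result = []
    for char in text:
        if result and is_combining(char):
            result[-1] += char  # attach to the immediately preceding base
        else:
            result.append(char)
    return result

# "han&bar;" after entity resolution: h, a, n + bar (E700).
print(clusters("han\ue700"))
```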
6.3.1 Horizontal bar
The horizontal bar is from a historical point of view the earliest form of an abbreviation mark and it is also the most ambiguous type. It is commonly used for "m" or "n" and is often referred to as "nasal stroke", but it is also used in a number of other contexts, as a mark of suspension or contraction. We recommend using the same entity name in all instances, "&bar;". The unmarked position of the bar is above the immediately preceding character.
This horizontal bar is partially similar to 0304 COMBINING MACRON and 0305 COMBINING OVERLINE in the range Combining Diacritical Marks of Unicode 3.2, but should probably not be identified with these characters. The macron is used to indicate length, and the overline connects to the left and right, as opposed to the prototypical horizontal bar. We therefore recommend that it is given a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding   Abbreviation mark code point
[glyph]           han(n)         han&bar;   E700
[glyph]           p(restr)       p&bar;     E700
[glyph]           þ(at)          &th;&bar;  E700
In the last example, the bar crosses the ascender of the character "þ". In our view, this is only a coincidence, since the bar in all cases is placed above the x-height of the base line characters. If there is a character with an ascender, the bar will simply cross this stroke.
The unmarked position of the bar is above the base line character, and this is therefore part of the definition of the entity "&bar;". In some cases the bar may be placed below the base line character. Here, we suggest the entity name "&barbl;" (for "bar below").
The horizontal bar below is partially similar to 0331 COMBINING MACRON BELOW or 0332 COMBINING LOW LINE in the range Combining Diacritical Marks of Unicode 3.2, but should probably not be identified with these characters. We therefore recommend that it is given a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           p(er)          p&barbl;  E703
It is possible to identify various shapes of the horizontal bar. In general we recommend that the transcriber should not make more distinctions than strictly necessary. If the transcriber for some reason would like to create a typology of bar forms, we suggest that this is done in the same way as with the "et" mark, i.e. by numbering, "&bar-1;", "&bar-2;", "&bar-3;", etc. Cf. ch. 6.2.1 above.
6.3.2 Flourish
The flourish may be described as a horizontal bar with a return. It appears in the abbreviation of the Latin word "pro" in contradistinction to "per", which typically is abbreviated with a simple horizontal bar. We suggest using the entity name "&flour;" and recommend that it is given a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding          Abbreviation mark code point
[glyph]           p(ro)fat       p&flour;&fins;at  E705
6.3.3 The "us" mark
Originally a Tironian Nota, a mark resembling a small version of the number "9" is often used for "us". It is usually placed in a raised position, though not always clearly above the preceding character. Since the typical position of this mark is above the base line, we regard it as a combining mark. We suggest the entity name "&us;" and recommend that it is given a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           la(us)         la&us;    E70B
6.3.4 The "er" mark
A mark resembling a zigzag was frequently used as abbreviation of a front vowel (including diphthongs) + "r", e.g. "ir", "er", "eir", "ær". The earliest form resembles a horizontal stroke with a descender to the left and an ascender to the right. It later acquired a zigzag-like form, and even later resembles the letter "u" turned upside-down. We suggest using the entity name "&er;" since this is the most common expansion of the abbreviation. We recommend that it is given a separate code point in the Private Use Area.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           v(er)          v&er;     E704
6.3.5 The "ra" mark
Originally an open form of the character "a", this mark was used as an abbreviation for "ra" or "va". One variant resembles the Greek omega-sign, and another variant the omega-sign with a horizontal bar above. We suggest using the entity name "&ra;" for the first type, and "&rabar;" for the second. We recommend that both marks are given separate code points in the Private Use Area.
Abbreviated form  Expanded form  Encoding       Abbreviation mark code point
[glyph]           s(va)          s&ra;          E706
[glyph]           f(ra)          &fins;&rabar;  E708
6.3.6 The "ur" mark
The syllable "ur" (sometimes "yr") can be abbreviated by a mark resembling a small version of the number 2. Later forms of this mark resemble a tilde, and even later forms a horizontal version of the number 8 (like the mathematical infinity symbol). Due to the considerable variation in form we suggest that it might be useful to distinguish between two main forms, using the entity "&ur2;" for the first type and "&ur8;" for the second. Cf. section 1.5 in the character list.
Abbreviated form  Expanded form  Encoding  Abbreviation mark code point
[glyph]           ock(ur)        ock&ur2;  E709
6.3.7 Interlinear characters
Interlinear characters are a common type of abbreviation. An interlinear vowel typically represents a consonant (often "r") + the vowel itself, while an interlinear consonant typically represents a vowel (often "a") + the consonant itself. We suggest that interlinear abbreviation marks are named by the character itself + "sup" (for "superscript"), e.g. "&asup;" (interlinear "a"), "&osup;" (interlinear "o"), "&rscapsup;" (interlinear small capital "r"), etc.
Unicode 3.2 includes a selection of 13 superscript characters, namely "a", "e", "i", "o", "u", "c", "d", "h", "m", "r", "t", "v", "x". They are located at the end of the range Combining diacritical marks, 0363-036F. We suggest that these characters are used to display interlinear characters, and that characters outside this selection are given separate code points in the Private Use Area.
Abbreviated form  Expanded form  Encoding        Abbreviation mark code point
[glyph]           b(or)g         b&osup;g        0366
[glyph]           m(anna)        m&asup;         0363
[glyph]           v(ir)þa        v&isup;&th;a    0365
[glyph]           þeg(ar)        &th;eg&rsup;    036C
[glyph]           Otta(rr)       Otta&rscapsup;  E910
The runic character "m", which itself can be used as an abbreviation (cf. ch. 6.2.6 above), can appear with an interlinear abbreviation mark. The encoding follows the pattern above.
Abbreviated form  Expanded form  Encoding      Abbreviation marks code points
[glyph]           (manna)        &mrun;&asup;  16D8 + 0363
Since the first entity, "&mrun;", is defined as a base line character and the second, "&asup;", as an interlinear mark placed above the immediately preceding base line character, there will be no doubt as to the positioning.
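The naming scheme for interlinear characters, together with the range 0363-036F, can be sketched in Python. This is a hedged illustration: the function sup_entity and the two constants are hypothetical names, and a result of None only signals that a Private Use Area value would be needed, without proposing one.

```python
import unicodedata

# The 13 superscript letters of Unicode 3.2, in code point order 0363-036F.
SUP_LETTERS = "aeioucdhmrtvx"
SUP_CODEPOINTS = {letter: 0x0363 + i for i, letter in enumerate(SUP_LETTERS)}

def sup_entity(letter: str):
    """Return (entity name, code point) for an interlinear letter.

    The code point is None when the letter is outside the Unicode
    selection and would need a Private Use Area value instead.
    """
    return letter + "sup", SUP_CODEPOINTS.get(letter)

for letter in ["o", "a", "r", "n"]:
    name, cp = sup_entity(letter)
    if cp is None:
        print(f"&{name}; needs a Private Use Area code point")
    else:
        print(f"&{name}; U+{cp:04X} {unicodedata.name(chr(cp))}")
```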
6.3.8 Superscript dots
Superscript dots are sometimes used to denote length. It is a moot question whether this is a type of abbreviation, but in any case the transcriber should use an entity for the encoding. We recommend that superscript dots are transcribed in analogy with other combining abbreviation marks and suggest using the entity name "&dotab;" (for "dot above").
Unicode 3.2 has a combining dot above in the range Combining diacritical marks.
Abbreviated form  Expanded form  Encoding      Abbreviation mark code point
[glyph]           leg(g)ia       leg&dotab;ia  0307
Sometimes the dot is used above small capitals. Since small capitals themselves are a way of representing gemination, the dot above is redundant. The encoding will simply be the same as above. Cf. ch. 6.2.10 above.
Abbreviated form  Expanded form  Encoding          Abbreviation mark code point
[glyph]           var(r)         va&rscap;&dotab;  0307
6.4 Special cases
6.4.1 Nomina sacra
In some cases the whole word must be analyzed as an abbreviation. This applies to the traditional nomina sacra, i.e. abbreviations for sacred words such as "iesus" and "christus". These contain characters which originally were Greek but might be taken for Latin characters. For example, the "p" in "xpm" is originally a Greek "rho" ("r"). We believe these abbreviations should be encoded as separate entities and given individual code points in the Private Use Area.
Manuscript form  Expanded form  Encoding  Abbreviation mark code point
[glyph]          (iesus)        &ihc;     E610
[glyph]          (christum)     &xpm;     E615
6.4.2 Interlinear characters in other contexts
Interlinear (superscript) characters are used in various ways, not always as abbreviations. According to de Leeuw van Weenen 2000: 36-43 there are four types:
(a) as abbreviation
This type is discussed in ch. 6.3.7 above. Here, we recommend the usage of entities such as "&asup;".
(b) as addition
When interlinear characters are used for adding characters which were left out by the scribe we recommend that this is encoded by use of the element <add> and the attribute place="supralinear" (cf. ch. 7.2). There is no need for an entity of the type "&asup;" since the location of the character is indicated by the element.
Manuscript form  Expanded form  Encoding
[glyph]          han`a´         han<add place="supralinear">a</add>
(c) as complementation of Roman numbers
Inflected forms of Roman numbers are sometimes specified by interlinear characters. In these cases the interlinear characters are not placed above any base line character but merely raised above the base line. We suggest using the element <seg> and the attribute type="superscript".
Manuscript form  Expanded form  Encoding
[glyph]          v.`ti´         v.<seg type="superscript">ti</seg>
(d) as space savers
Especially at the end of a line one or more characters may be placed above the last word to save space and complete the line. We suggest the same encoding as in (c) above.
Manuscript form  Expanded form  Encoding
[glyph]          e`s´           e<seg type="superscript">s</seg>
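The element-based encodings in (b) to (d) can be produced programmatically. The sketch below uses Python's standard xml.etree.ElementTree module. It is an illustration only: the helper with_interlinear and the wrapping <w> element are assumptions made so that each fragment has a single root.

```python
import xml.etree.ElementTree as ET

def with_interlinear(base: str, raised: str, tag: str, **attributes) -> str:
    """Serialize base text followed by one element holding the raised text.

    The wrapping <w> element is an assumption made so that the fragment
    has a single root; it is not prescribed by this section.
    """
    word = ET.Element("w")
    word.text = base
    child = ET.SubElement(word, tag, attributes)
    child.text = raised
    return ET.tostring(word, encoding="unicode")

# (b) addition by the scribe
print(with_interlinear("han", "a", "add", place="supralinear"))
# (c) complementation of a Roman number
print(with_interlinear("v.", "ti", "seg", type="superscript"))
```

Type (a), by contrast, needs no element at all, since the entity itself carries the position.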
6.4.3 Missing abbreviation mark
From time to time one can find examples of a word that obviously is abbreviated but where there is no trace of the abbreviation mark. There is then no alternative but transcribing the text as it reads in the manuscript.
Manuscript form  Expanded form  Encoding
[glyph]          d(rottning)    d
6.4.4 Nesting (stacking) of abbreviation marks
There are a few examples of base line characters which are abbreviated with an abbreviation mark which is itself abbreviated. An example is the base line character "m" with an interlinear "o" which in turn has a horizontal bar. According to rule 7 in ch. 2.2.1 above this abbreviation should be encoded as the sequence "m" + "&osup;" + "&bar;".
Manuscript form  Expanded form  Encoding
[glyph]          m(onnom)       m&osup;&bar;
Since "&osup;" is defined as a combining character it follows that it is placed above the immediately preceding character, in this case "m", and since "&bar;" is also defined as a combining character it follows that it is placed above "&osup;". There is therefore no doubt as to the positioning of each part.
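The ordering principle, base character first and then each mark from the lowest to the highest, can be sketched as a pair of Python helpers. Both function names are hypothetical, and the decoder assumes that the base is plain text and that every entity following it is a combining mark.

```python
import re

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9-]*);")

def encode_stack(base: str, *marks: str) -> str:
    """Encode a base character followed by its marks, lowest mark first."""
    return base + "".join(f"&{mark};" for mark in marks)

def decode_stack(encoded: str):
    """Split an encoded cluster into its base and its marks, bottom to top.

    Assumes the base is plain text (not itself an entity) and that every
    entity after it is a combining mark.
    """
    first = ENTITY.search(encoded)
    if first is None:
        return encoded, []
    return encoded[:first.start()], ENTITY.findall(encoded[first.start():])

print(encode_stack("m", "osup", "bar"))
print(decode_stack("m&osup;&bar;"))
```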
6.4.5 Extension of abbreviation marks
As a rule, combining abbreviation marks are associated with a single base line character. Thus, the sequence "m&osup;" means that the interlinear character "o" is seen as being placed above "m" and not above any other character. However, some abbreviation marks extend over more than one character. For example, the word "k(ir)kia" may be abbreviated with a horizontal bar crossing both the first and the second "k". We believe it is sufficient to associate the abbreviation mark with only one of these characters, preferably the first.
Manuscript form  Expanded form  Encoding
[glyph]          k(ir)kia       k&bar;kia
It is possible to encode this word so that the bar is associated with both characters. This is in a sense closer to the manuscript form, but it means that a single abbreviation mark may appear as two distinct marks (unless it is somehow stated that the two marks belong together). Thus, this is a more complex and possibly misleading solution.
Manuscript form  Expanded form  Encoding
[glyph]          k(ir)kia       k&bar;k&bar;ia
On the other hand, it should be noted that this is a case where 0305 COMBINING OVERLINE might be useful, since it connects to left and right. Cf. the reference in ch. 6.3.1 above.
6.4.6 Sporadic ligatures with abbreviation marks
In ch. 5.3 we recommended that sporadic ligatures should not be encoded by use of separate entities but by the element <seg> with the attribute type="ligature". A sporadic ligature is basically a joining of two base line characters which together do not reflect a separate phonological value. This is the case with ligatures such as "s+k" and "p+p" which in this respect are identical to "s" + "k" and "p" + "p".
Manuscript form  Expanded form  Encoding
[glyph]          (pp)           <seg type="ligature">pp</seg>&bar;
However, some ligatures are formed in such a manner that it is difficult to distinguish the separate parts. That applies to ligatures of tall s + h, k and þ. In these cases, we suggest that it is advisable to use individual entities and give them separate code points in the Private Use Area.
Manuscript form  Expanded form  Encoding         Abbreviation mark code point
[glyph]          h(an)s         &hstalllig;      E800
[glyph]          k(onung)s      &kstalllig;      E801
[glyph]          þ(es)s         &thornstalllig;  E802
Sometimes, a horizontal bar is used across these ligatures. In that case, we suggest that the bar is encoded separately with its usual entity, &bar;. Cf. ch. 6.3.1 above.
Manuscript form  Expanded form  Encoding          Abbreviation marks code points
[glyph]          k(onung)s      &kstalllig;&bar;  E801 + E700
6.4.7 The character "r" as interlinear ligature
A quite special type of abbreviation is interlinear "r" in ligature with e.g. "þ". We believe that it is practical to distinguish between the ordinary interlinear "r", which typically does not touch any base line character, and the ligating type. For the latter we suggest using the entity name "&rarm;" (where "arm" refers to the fact that it appears as an arm of an ascender). We recommend that it is given a separate code point in the Private Use Area.
Manuscript form  Expanded form  Encoding    Abbreviation mark code point
[glyph]          þ(ar)          &th;&rarm;  (00FE) + E707
6.4.8 Sharp "s"
In late Old Norwegian the "sharp s" appears in a number of abbreviations, e.g. for "skilling", "smør" and "son". The German character "sharp s" is defined in Unicode 3.2 as 00DF LATIN SMALL LETTER SHARP S in the range Latin-1 Supplement. It may be practical to distinguish between the "sharp s" in German usage and the similar-looking abbreviation mark in Nordic manuscripts. We therefore suggest the entity name "&ssharp;" for the German letter (if needed) and "&Ssharp;" for the abbreviation mark.
Manuscript form  Expanded form  Encoding       Abbreviation mark code point
[glyph]          Hakon(son)     Hakon&Ssharp;  00DF
6.5 List of abbreviation marks
A complete list of abbreviation marks is found in sections 1.2 and 1.5 in the character list.
Preliminary version created 14 January 2002. Version 1.0 published 20 May 2003.