gold-core-0.2

This document contains a *working draft* of the General Ontology for Linguistic Description (GOLD), an ontology developed by several individuals at the University of Arizona as part of the E-MELD (Electronic Metastructure for Endangered Languages Data) project. Contributors of ontological content include D. Terence Langendoen, Scott Farrar, William D. Lewis, Peter Norquest, and Brian Fitzsimons. The following people gave much advice in the development of GOLD: Adam Pease, Ian Niles, and John Bateman. We also gained much insight from a taxonomy of linguistic categories provided by the Summer Institute of Linguistics (SIL). By convention, this ontology capitalizes the first letter of the names of classes and individuals (e.g., 'BoundUnit') but uses lower case for the names of relations, called properties in OWL (e.g., 'realizes'). Changes since the last version include a complete distribution of GOLD into separate files.

2004-12-26, edited by Scott Farrar

The Protege metadata ontology (in the version that is used inside Protege). Note that this is an OWL Full ontology, with annotation properties that have range and domain restrictions. However, the "official" online release of this file is OWL DL, so that ontologies that use Protege metadata annotations can still be shared as OWL DL.

A form unit that participates in syntactic relations. These are classified according to structural complexity, i.e. syntactically complex or simple (lexical).

An OrthPhrase is a concatenation of one or more instances of OrthWord.

OrthPhrase

LinguisticFeature, also called 'property', 'quality' or 'feature name', is the class of features that may be associated with units relevant to a linguistic system, e.g., the feature 'tense' has values: 'past', 'present', ..., 'future'. In the broader domain, the class 'feature' can be thought of as the set of qualities associated with some object in general, e.g., color, size, shape, etc. (Shieber 1986: 12; Gaerdenfors 2000; Masolo et al.
2002).

LinguisticFeatureValue is the class of values that may be associated with instances of LinguisticFeature. That is, specific features have specific feature values associated with them, e.g., the feature 'tense' has 'past', 'present', ..., 'future' as values. In the broader domain, the class of LinguisticFeatureValue can be thought of as the set of qualia associated with some feature in general, each a point in cognitive space. E.g., red is a quale in color space (Shieber 1986: 12; Maxwell, Simons, and Hayashi 2001; Gaerdenfors 2000; Masolo et al. 2002).

Any linguistic feature that pertains to the semantic content in a linguistic system. Under construction.

An OrthWord is the fundamental unit of an orthography, usually set off by white space.

OrthWord

The form units below the level of the syntactic word, i.e. those form units that do not participate in syntactic relations, but only in morphological relations. That is, a morphological unit cannot occupy a lexical position in a syntactic construction. Morphological units are the smallest form units that have a meaning. In some theories, these correspond to the notion of morphemes or constructions. In a feature system, these elements carry morphological or morphosyntactic features.

A text is a linguistic sign above the level of the clause, that is, at the discourse level. Relations that hold among various Texts include discourse constituency relations. Note that a text is distinct from DiscourseSegement, the corresponding semantic unit at the level of discourse.

A sign is an abstract structure whose instances participate in a linguistic system, or `language'. By definition, a linguistic sign must have a form component (whose elements are phonological units) and a meaning component (whose elements are semantic units). The formal structure of a linguistic sign is determined by the grammar of a language.
The information value of a linguistic sign, its meaning, is not fixed, but determined by the conventions of the language. The relation of form to meaning is largely arbitrary within a semiotic system. Signs are classified primarily according to what kinds of formal relations they participate in, and secondarily according to their complexity (whether they are atomic or composed of other signs). Signs range from morphological and syntactic constructions to whole discourse segments (Saussure 1955; Hervey 1979; Pollard and Sag 1994).

Also called 'grammatical categories', or 'grams', a morphosyntactic feature is the class of linguistic features inhering in form units. Morphosyntactic features give form units their morphosyntactic behavior in a grammar. E.g., two form units can 'agree' according to shared form features. This class is intended to represent only the formal aspects of morphosyntax; that is, there is no notional component. In a grammatical system, attributes of the same type express meanings from the same conceptual domain. That is, they occur in contrast to one another, and are typically expressed in the same fashion (Crystal 1985: 43-44; Hopper, P. 1992: 81; Bybee 1985: 191).

FormFeatureValue is the class of values that may be associated with instances of FormFeature. In a FeatureSystem, these dictate the formal properties of the grammar and may or may not be true semantically. A set of FeatureValues forms an integral part of a language's FeatureSystem (Pollard and Sag 1994; Maxwell, Simons, and Hayashi 2001).

Term

This class includes any expression that is not conventionally a part of a written language, but is used to name various features, values, and other linguistic constructs. Terms are used in interlinear text, often on the second line, to annotate or 'gloss' transcriptions, e.g., '1st' or 'NOM'. More later.

SymbolicString is a very general category subsuming any entity which is the product of a writing process.
Instances are usually symbolic, either part of the orthographic or other conventional system. NOTE: there is significant room here for expanding the ontology, that is, to account for different types of orthographies: e.g., hieroglyphs, Unicode characters, Chinese characters, Roman alphabetic characters, etc.

SymbolicString

A special type of OrthPhrase usually representing a Clause. In Western writing systems, an OrthSentence is set off by white space on the left edge and some kind of punctuation, such as a period or question mark, on the right.

OrthPart is the subclass of OrthographicExpression whose members are not orthographically independent, that is, they cannot stand alone as words but compose to form words. Note that an OrthPart is not the same as a single character, although some OrthParts are single characters.

OrthPart

An elementary unit of which SymbolicStrings are composed. A single Character is also defined as a subclass of SymbolicString itself, e.g., the letter 'a', or a Chinese character.

OrthographicExpression

An OrthographicExpression is composed of the standard characters of an orthographic system. In a Romanized system, it is the 'spelling' associated with some word. An OrthographicExpression is governed by the orthographic combinatorial rules of a particular language. OrthographicExpressions are not transcriptions of any external entity, but independent linguistic expressions which refer directly to the LinguisticUnits of the language. They are the physical realizations of some human language, possibly no longer spoken.

infixedIn is the relation between a Lexical- or SublexicalUnit and a Root. The Root is realized as discontinuous, surrounding the inserted Lexical- or SublexicalUnit (Hartmann and Stork 1972: 111).

A relation holding between morphological units.

This relation holds between two form units and represents the notion of precedence in a language.
That is, (precedes A B) means that A comes before B in the linearization of the realization of linguistic signs. The inverse of this relation is 'follows'.

Any relation that establishes the linear ordering of form units.

The relation between an orthographic expression in one language and some orthographic expression in another such that the translation is done on a word-by-word, or morpheme-by-morpheme, basis without regard for idiomatic usage.

literalTranslation

This relation associates some LinguisticSign with its corresponding sound. This relation may become useful when working with sound files.

An object, traditionally defined, is either a direct object or an indirect object. An object, in some usages, is any grammatical relation other than subject (Crystal 1985: 211; Hartmann and Stork 1972: 155-156; Mish et al. 1990: 814; Comrie 1989: 66).

translates

The relation between an orthographic expression in one language and some orthographic expression in another such that both expressions have the same or roughly the same meaning.

This relates a FeatureSystem to a FeatureConstraint.

A relation holding between syntactic units, often manifesting itself in shared form features. NOTE: this could be better defined once syntactic roles and relations are developed.

This relation associates some LinguisticSign with a SemanticUnit. NOTE: This will be expanded with the development of the semantic component of GOLD.

This is the superclass of common lexical relations such as synonym, antonym, etc. NOTE: this needs work. Such relations really pertain to meaning and not to form units.

The relation between a morphological unit and the lexical unit to which it is attached. The LexicalUnit is usually a Root or Stem. The inverse of prefix is suffix (Crystal 1980: 281; Hartmann and Stork 1972: 182; Mish et al. 1990: 927).

synonym

antonym

This relates a SimpleSpecification to some instance of LinguisticFeatureValue.

Any relation between form units.
A direct object is a grammatical relation that exhibits a combination of certain independent syntactic properties, such as the following: the usual grammatical characteristics of the patient of typically transitive verbs; particular case marking; a particular clause position; the conditioning of an agreement affix on the verb; the capability of becoming the clause subject in passivization; the capability of reflexivization. The identification of the direct object relation may be further confirmed by finding significant overlap with similar direct object relations previously established in other languages. This may be done by analyzing correspondence between translation equivalents (Crystal 1985: 94; Hartmann and Stork 1972: 155; Mish et al. 1990: 358; Comrie 1989: 66; Andrews, Avery 1985: 68, 120, 126; Comrie 1985a: 337).

The relation between a linguistic unit and a linguistic feature. A feature inheres in its host. NOTE: this relation is distinct from hasFormFeature, which pertains to data structures.

hypernym

NOTE: still lacks development.

This relation holds between two form units and represents the notion of circumscription in a morphosyntactic system. That is, (circumscribes A B) means that part of A comes before B and part of A comes after B, in the linearization of the units of a language.

Any relation holding between syntactic units.

The relation between a Lexicon and its contents, instances of LexicalItem. NOTE: this could probably be replaced by the memberOf relation from set theory.

This relation holds between two form units and represents the inverse of 'precedes'. That is, (follows A B) means that A comes after B in the linearization of the realization of linguistic signs. The inverse of this relation is 'precedes'.

This subsumes all structuring relations used for LinguisticDataStructures. As a naming convention to distinguish relations in data structures from other relations, all names of dataStructuringRelations begin with 'has-'.
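The ordering relations 'precedes', 'follows', and 'circumscribes' described above can be illustrated with a minimal Python sketch. It is not part of GOLD. It assumes a hypothetical model in which each form unit is identified by the token positions it occupies in the linearization, and it reads 'precedes' strictly, as every part of A coming before every part of B. The class FormUnit, the helper unit(), and the example forms are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormUnit:
    """A form unit located by the positions it occupies (hypothetical model)."""
    label: str
    positions: frozenset


def unit(label, *positions):
    """Convenience constructor: unit("ge...t", 0, 2)."""
    return FormUnit(label, frozenset(positions))


def precedes(a, b):
    """(precedes A B): A comes before B in the linearization.

    Assumption: every position of A lies before every position of B.
    """
    return max(a.positions) < min(b.positions)


def follows(a, b):
    """(follows A B): defined as the inverse of precedes."""
    return precedes(b, a)


def circumscribes(a, b):
    """(circumscribes A B): part of A comes before B and part of A comes after B."""
    return min(a.positions) < min(b.positions) and max(b.positions) < max(a.positions)


# Two adjacent words: 'the' at position 0, 'house' at position 1.
the, house = unit("the", 0), unit("house", 1)

# A discontinuous unit: the German participle circumfix ge-...-t around 'spiel'.
circumfix, stem = unit("ge...t", 0, 2), unit("spiel", 1)

print(precedes(the, house), follows(house, the), circumscribes(circumfix, stem))
```

Defining follows in terms of precedes mirrors the statement that each relation is the inverse of the other, so the two can never disagree.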
This relation expresses dominance between form units, e.g., (constituent `un' `unbelievable') or (constituent `the house' `in the house').

This semiotic relation associates some OrthographicExpression with some Entity. It differs from 'labels' in that a name is usually considered part of the orthographic system, whereas a label is not.

This relation associates some LinguisticSign with its phonological structure.

This relates a ComplexSpecification to a FeatureStructure, thus giving a FeatureStructure its recursive properties.

This relates a FeatureStructure to a FeatureSpecification.

All relations that have the linguistic sign as the domain.

This relates a LexicalItem to a LexicalUnit, those elements commonly represented in a dictionary.

The relation between a morphological unit and the lexical unit to which it is attached. The LexicalUnit is usually a Root or Stem. The inverse of suffix is 'prefix' (Crystal 1980: 340; Hartmann and Stork 1972: 226; Mish et al. 1990: 1179).

freeTranslation

The relation between an orthographic expression in one language and some orthographic expression in another such that both expressions have exactly the same meaning. The words in the translation may not correspond to those in the source expression.

names

This relation names or simply associates some SymbolicString with any Entity.

This relates either a FeatureStructure or a FeatureConstraint to its type, expressed by an instance of PartOfSpeech.

A general category subsuming relations relevant at the level of the Clause, such as predicate and subject.

A grammatical relation is a role of a phrase or complement clause that determines syntactic behaviors such as the following: word position in a clause; verb agreement; participation and behavior in such operations as passivization (Comrie 1989: 65-66; Andrews, Avery 1985: 66).
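The data-structure relations above (a FeatureStructure has FeatureSpecifications, and a ComplexSpecification points back to a FeatureStructure) can be sketched in Python as follows. This is only an illustration under stated assumptions. The property names are not given in this text, so the field names 'feature', 'value', 'type', and 'specifications' are hypothetical, as are the feature names 'agreement', 'person', and 'number'. The sketch also assumes that SimpleSpecification and ComplexSpecification are both kinds of FeatureSpecification, and that a nested structure may omit its type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeatureSpecification:
    feature: str  # a type of LinguisticFeature, e.g. "tense"


@dataclass
class SimpleSpecification(FeatureSpecification):
    value: str  # an instance of LinguisticFeatureValue, e.g. "past"


@dataclass
class FeatureStructure:
    specifications: List[FeatureSpecification] = field(default_factory=list)
    type: Optional[str] = None  # a PartOfSpeech; optional here by assumption


@dataclass
class ComplexSpecification(FeatureSpecification):
    value: FeatureStructure  # the nested structure is the source of recursion


def flatten(fs, prefix=""):
    """Return {dotted feature path: value} for every simple specification."""
    out = {}
    for spec in fs.specifications:
        path = prefix + spec.feature
        if isinstance(spec, ComplexSpecification):
            out.update(flatten(spec.value, path + "."))
        else:
            out[path] = spec.value
    return out


agreement = FeatureStructure([
    SimpleSpecification("person", "third"),
    SimpleSpecification("number", "singular"),
])
verb = FeatureStructure(
    [SimpleSpecification("tense", "past"), ComplexSpecification("agreement", agreement)],
    type="Verb",
)
print(flatten(verb))
```

Because a ComplexSpecification holds a whole FeatureStructure as its value, structures can nest to any depth, which is the recursive property the text attributes to this relation.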
The predicate is the relation between the Clause and a portion of a clause, excluding the subject, that expresses something about the subject (Crystal 1980: 280; Hartmann and Stork 1972: 182; Pei and Gaynor 1954: 173; Pike and Pike 1982: 40; Mish et al. 1990: 926; Crystal 1985: 241-242).

meronym

This relates a FeatureSpecification to a type of LinguisticFeature.

An obliqueObject is a grammatical relation proposed for a noun phrase clause constituent with the following characteristics: its nature and behavior are more readily describable in semantic terms than syntactic; it is likely to be the most constrained in the semantic roles that it may individually express; it is likely to be marked by an adposition or case affix; it is not likely to be a target of syntactic rules, such as agreement with the verb, or strategies of relativization (Andrews, Avery 1985: 81-82, 92, 127-128; Comrie 1989: 66, 179).

This is a relation between two OrthographicExpressions, usually some native orthography and some other orthographic system.

A subject is a grammatical role that exhibits certain independent syntactic properties, such as: the grammatical characteristics of the agent of typically transitive verbs; the grammatical characteristics of the single argument of intransitive verbs; a particular case marking or clause position; the conditioning of an agreement affix on the verb; the capability of being obligatorily or optionally deleted in certain grammatical constructions (such as the following clauses: adverbial, complement, coordinate); the conditioning of same subject markers and different subject markers in switch-reference systems; and finally, the capability of coreference with reflexive pronouns. The identification of the subject relation may be further confirmed by finding significant overlap with similar subject relations previously established in other languages.
This may be done by analyzing correspondence between translation equivalents (Crystal 1985: 293; Hartmann and Stork 1972: 224; Pei and Gaynor 1954: 205; Mish et al. 1990: 1174; Pike and Pike 1982: 458-459; Andrews, Avery 1985: 68-69, 103-117; Comrie 1989: 66).

An indirect object is a grammatical relation that is one means of expressing the semantic role of goal and other similar roles. It is proposed for languages in which the role is distinct from the direct object and the oblique object on the basis of multiple independent syntactic or morphological criteria, such as the following: having a particular case marking, commonly dative; governing an agreement affix on the verb, such as person or number; being distinct from oblique relations in that it may be relativized (Crystal 1985: 156; Hartmann and Stork 1972: 155-156; Pei and Gaynor 1954: 99; Givon 1984: 109-110; Mish et al. 1990: 614; Andrews, Avery 1985: 126-128; Comrie 1989: 66).

hasConstituent relates a Constituent to a higher Constituent in a StructuralDescription. This is the data-structure equivalent of the actual morphosyntactic dominance relations. Rewrite rules express this relationship.

A transcription is a relation between a language event and a string. A transcription is something that represents some other entity observed by a transcriber. Therefore, it is an interpretation of an original source. For example, this is usually the primary data in interlinear glossed text. The key difference between a transcription and an orthographic expression is that a transcription involves two entities, a transcriber and a source (speaking event, audio file, video file, some other orthographic expression).

This relation associates some LinguisticUnit with an orthographic realization composed in the native orthography, as when an English-speaking transcriber transcribes an English conversation.
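The link between hasConstituent and rewrite rules noted above can be made concrete with a short Python sketch. It is an illustration, not GOLD's own data model. The Constituent class, its fields, and the category labels PP, P, NP, Det, and N are assumptions. The example phrase 'in the house' is the one used for the 'constituent' relation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Constituent:
    category: str                     # hypothetical category label
    children: List["Constituent"] = field(default_factory=list)
    form: str = ""                    # orthographic form, for leaves only


def has_constituent_pairs(node):
    """Yield (lower, higher) category pairs, one per dominance link."""
    for child in node.children:
        yield (child.category, node.category)
        yield from has_constituent_pairs(child)


def rewrite_rules(node):
    """Read off one rewrite rule per non-leaf constituent, top-down."""
    rules = []
    if node.children:
        rhs = " ".join(child.category for child in node.children)
        rules.append(f"{node.category} -> {rhs}")
        for child in node.children:
            rules.extend(rewrite_rules(child))
    return rules


# Structural description of 'in the house'.
pp = Constituent("PP", [
    Constituent("P", form="in"),
    Constituent("NP", [
        Constituent("Det", form="the"),
        Constituent("N", form="house"),
    ]),
])

print(list(has_constituent_pairs(pp)))
print(rewrite_rules(pp))
```

Each pair follows the direction stated in the text, from a constituent to the higher constituent that contains it, and each rewrite rule summarizes all pairs that share the same higher constituent.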
realization

This relation is included as a work-around, because string literals cannot be used as classes themselves, i.e., as subjects in an RDF graph. 'physicalForm' expresses the substantial part of all written expressions.
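The motivation for 'physicalForm' can be sketched in plain Python, without an RDF library. The sketch models a graph as a list of (subject, predicate, object) triples and enforces the RDF restriction that a literal may not appear in subject position. The classes Resource and Literal, the function add_triple, and the node name 'ex:house_1' are hypothetical and chosen only for illustration.

```python
class Resource(str):
    """A named node; may appear as subject or object of a triple."""


class Literal(str):
    """A string literal; may appear only as the object of a triple."""


def add_triple(graph, subject, predicate, obj):
    """Append a triple, rejecting literals in subject position."""
    if isinstance(subject, Literal):
        raise TypeError("a literal cannot be the subject of a triple")
    graph.append((subject, predicate, obj))


graph = []

# The written expression is a node of its own; the string hangs off it
# through 'physicalForm', so further statements can be made about the node.
word = Resource("ex:house_1")  # hypothetical identifier
add_triple(graph, word, "rdf:type", Resource("OrthWord"))
add_triple(graph, word, "physicalForm", Literal("house"))

# Using the bare string as a subject is rejected.
try:
    add_triple(graph, Literal("house"), "rdf:type", Resource("OrthWord"))
    rejected = False
except TypeError:
    rejected = True

print(len(graph), rejected)
```

Introducing a node for each written expression costs one extra triple per expression, but it lets the expression be typed and related to other entities like any other resource.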