Data Components

The GOLD Community is founded on best-practice data resources. Because these data may come from disparate sources, concern different languages, and be described from different theoretical perspectives, they must be mapped onto a common semantic resource: GOLD. Mapping from data to knowledge is not a simple transformation, however. The various terminologies used in the best-practice resources first need to be made transparent and compatible with one another. Thus, best-practice data resources are mapped to a set of descriptive profile resources, shown in Figure 1. These resources in turn allow for the transition from XML (data) to RDF/OWL (knowledge), described below.


However, because the idea of best practice is relatively new, the Community must also rely on the large number of legacy data resources already available on the Web, i.e., those not in a best-practice format. The Community intends to promote services that transform legacy resources into best-practice resources. In the actual implementation, legacy resources will be mapped to a set of legacy mapping resources. This is shown in Figure 1 below:


Figure 1: From legacy to best practice.

Best practice resources

Best-practice data resources minimally use Unicode and are in a consistent XML format with an accompanying XML Schema or DTD. In addition, it is recommended that such resources utilize one of the many formats suggested by the E-MELD School of Best Practice.


Descriptive profiles

A profile minimally consists of a mapping of terms used in the data source document to concepts in the ontology. We refer to this as a terminology mapping. A terminology mapping document is a simple set of terms, a termset, linked to concepts in the ontology. A terminology mapping has these minimal requirements:

Beyond terminology mappings, a profile may include a grammatical sketch of the language in question, e.g., an enumeration of the possible features in the grammatical system.
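
The idea of a terminology mapping can be illustrated with a minimal sketch. It assumes a termset is just a lookup table from the tags used in a source document to concept identifiers. The tags, the `gold:` identifiers, and the function name `map_terms` are hypothetical placeholders chosen for illustration, not confirmed names from GOLD or from any actual profile format.

```python
# Hypothetical termset: tags used in a source document, linked to ontology concepts.
# The concept identifiers are illustrative placeholders, not confirmed GOLD names.
TERMSET = {
    "PST": "gold:PastTense",
    "N": "gold:Noun",
    "PL": "gold:Plural",
}


def map_terms(tags, termset):
    """Split tags into those with a concept in the termset and those without."""
    mapped = {}
    unmapped = []
    for tag in tags:
        if tag in termset:
            mapped[tag] = termset[tag]
        else:
            unmapped.append(tag)
    return mapped, unmapped


# "ERG" has no entry in the termset, so it is reported as unmapped.
mapped, unmapped = map_terms(["N", "PL", "ERG"], TERMSET)
```

Reporting unmapped tags separately, rather than dropping them, makes it easy to see which terms in a data source still need an entry in the profile.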

Legacy resources

Legacy resources are given in unstructured formats such as HTML and text documents, or in proprietary formats such as Microsoft Word, which cannot be read without special software that may not always be supported.


Legacy mapping resources

Legacy mapping resources map legacy materials to descriptive profiles. This methodology provides a short-cut when the entire legacy resource cannot be converted to a best-practice format. Instead, the most important aspects of the resource, e.g., a partial grammatical description, are captured.


Knowledge Components

Once a framework for best-practice data is in place, the real advantages of the GOLD Community emerge, as data can be transformed into knowledge. The following figure shows GOLD in relation to various other knowledge components. At the right is an OWL version of an upper ontology (e.g., SUMO or DOLCE). At the middle level is GOLD itself, which is actually a network of separate OWL files extending the upper ontology via the subclass relation. Towards the left are various Community of Practice Extension (COPE) resources, which extend GOLD via the subclass relation into the various theory- or language-specific subdomains. Finally, on the extreme left is the RDF store, consisting of instantiated best-practice resources.


Figure 2: The Knowledge Components of the GOLD Community.

Community of practice extensions

COPEs are essentially sub-ontologies that extend GOLD. COPEs provide two main benefits for the GOLD Community. First, they provide the means to create 'communities of practice', that is, communities of consensus formed around specific terminologies and services. With COPEs, communities can maintain their language-specific, theory-specific, or resource-specific knowledge in discrete, manageable packets. Second, COPEs give individual communities the means to relate their work to one another, because all COPEs extend a single semantic resource, GOLD. A specific community, e.g., linguists concerned with Bantu languages, could create a COPE ensuring that their terminology is interoperable with that of a completely different community, such as one centered around the use of WordNet.
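
The way two COPEs become interoperable through GOLD can be sketched with a toy subclass table. This is only an illustration under simplifying assumptions: every class name below is hypothetical, each class has a single superclass (OWL itself allows several), and the helpers `ancestors` and `nearest_shared_class` are invented for this sketch rather than taken from any GOLD tooling.

```python
# Hypothetical subclass links mirroring the layering described above:
# upper ontology <- GOLD <- COPE. All class names are illustrative only.
SUBCLASS_OF = {
    "gold:TenseFeature": "upper:Abstract",
    "gold:PastTense": "gold:TenseFeature",
    "bantu:RemotePast": "gold:PastTense",  # class from a hypothetical Bantu COPE
    "wn:PastForm": "gold:PastTense",       # class from a hypothetical WordNet COPE
}


def ancestors(cls, subclass_of):
    """Follow subclass links upward and return the superclasses, nearest first."""
    chain = []
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def nearest_shared_class(a, b, subclass_of):
    """Return the closest class that both a and b are (or descend from), if any."""
    seen = {a, *ancestors(a, subclass_of)}
    for candidate in [b, *ancestors(b, subclass_of)]:
        if candidate in seen:
            return candidate
    return None


chain = ancestors("bantu:RemotePast", SUBCLASS_OF)
shared = nearest_shared_class("bantu:RemotePast", "wn:PastForm", SUBCLASS_OF)
```

In this toy table the two COPE classes never refer to each other, yet they can be compared because both descend from the same GOLD class.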


Instances: the RDF store

The RDF store consists of instantiated classes from COPEs and GOLD. The instances correspond directly to the data and annotation as expressed in best-practice XML. To be maximally useful, the RDF store can be loaded into an RDF framework (e.g., Sesame) for fast knowledge retrieval.
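
The correspondence between best-practice XML and instances in the RDF store can be sketched as follows. The XML vocabulary (`text`, `morpheme`, `gloss`), the language code `xyz`, the `data:` and `gold:` identifiers, and both function names are assumptions made for illustration. Triples are represented as plain Python tuples here, standing in for what a real RDF framework would hold.

```python
import xml.etree.ElementTree as ET

# Hypothetical best-practice XML fragment; element and attribute names are assumed.
XML_DATA = """<text lang="xyz">
  <morpheme id="m1" form="ka" gloss="PST"/>
  <morpheme id="m2" form="li" gloss="PL"/>
</text>"""

# Hypothetical terminology mapping from gloss tags to ontology concepts.
TERMSET = {"PST": "gold:PastTense", "PL": "gold:Plural"}


def xml_to_triples(xml_text, termset):
    """Turn each annotated morpheme into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for morpheme in root.iter("morpheme"):
        subject = "data:" + morpheme.get("id")
        # The gloss tag decides which ontology class the instance belongs to.
        triples.append((subject, "rdf:type", termset[morpheme.get("gloss")]))
        triples.append((subject, "data:form", morpheme.get("form")))
    return triples


def instances_of(concept, triples):
    """Return the subjects typed as the given concept."""
    return [s for s, p, o in triples if p == "rdf:type" and o == concept]


triples = xml_to_triples(XML_DATA, TERMSET)
past_instances = instances_of("gold:PastTense", triples)
```

A list scan like `instances_of` is enough for a handful of triples; the point of loading the store into an RDF framework is that such lookups are indexed and can be expressed as queries.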