7 Formalizing texts

In this chapter, we discuss some problems related to formal representations of texts. We demonstrate how a text can be formalized and turned into a knowledge base in Prolog+CG. We also give some additional examples.

Formalization can be viewed as a process and as a product.

Here, the product will be a knowledge base containing a rich semantic representation of a domain. The knowledge base must be constructed in such a way that the semantics of all relevant elements in the domain are accounted for, and in such a way that it becomes possible to reason over it in a suitable manner.

In the preceding paragraph, the words "relevant" and "suitable" point to the fact that every formal representation is indeed an interpretation governed by the perspective of those who construct the knowledge base. It reflects the need to think about formalization not only as a product, but as an ongoing process as well.

In any process of formalization, the domain must be carefully observed, and the purpose of having a knowledge base as well as the perspective of those who will eventually use the knowledge base must be taken into account.

Consider a simple question: what is the ontological status of a human being? Which of the following descriptions is more correct?

a) universal > animate > animal > mammal > human_being.
b) universal > animate > animal, human_being.

Solution a) might be suitable for representing biological facts, whereas solution b) lets us make a clear distinction between man and beast, which is useful, for example, when analyzing narratives. The point is that the "better" representation is the one that reflects not only the domain in question but also the intended use of the knowledge base.

Prolog+CG enables us to combine two kinds of reasoning with a knowledge base:

  • Operations on the graphs themselves (such as investigating concepts in use or parts of a graph).
  • Operations on the type-hierarchy (such as looking for more or less general concepts or graphs). This is also known as ontology-driven analysis.

In formalizing a text -- more generally speaking: a domain -- both of these options should be kept in mind.
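
The two kinds of reasoning listed above can be illustrated with a small Python sketch. It is not Prolog+CG code, and it simplifies a conceptual graph to a list of (concept, relation, concept) triples. The type names, the relation names agnt and dest, and the helper functions are all assumptions made for the example.

```python
# Hypothetical type hierarchy: each type maps to its immediate supertype.
SUPERTYPE = {
    "animate": "universal",
    "animal": "animate",
    "human_being": "animate",
    "act": "universal",
    "walk": "act",
    "place": "universal",
    "city": "place",
}

# A toy graph, roughly "a human being walks to a city",
# written as (concept type, relation, concept type) triples.
GRAPH = [
    ("walk", "agnt", "human_being"),
    ("walk", "dest", "city"),
]


def concepts_in(graph):
    """Operation on the graph itself: collect the concept types in use."""
    found = set()
    for source, _relation, target in graph:
        found.add(source)
        found.add(target)
    return found


def is_subtype(subtype, supertype):
    """Operation on the hierarchy: is subtype at or below supertype?"""
    current = subtype
    while current is not None:
        if current == supertype:
            return True
        current = SUPERTYPE.get(current)
    return False


def specializations_in(graph, general_type):
    """Combine both: concepts in the graph that fall under general_type."""
    return sorted(c for c in concepts_in(graph) if is_subtype(c, general_type))


print(specializations_in(GRAPH, "animate"))  # ['human_being']
```

The last function shows why both options matter: a query phrased in terms of a general type such as animate only finds human_being because the hierarchy says how the two are related.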

While building the knowledge base, you are likely to find that you need to work iteratively on the hierarchy and on the graphs. Often the representation of a new sentence prompts you to rethink the structure of (parts of) the type hierarchy -- and vice versa.

To keep this process from becoming too chaotic, we recommend that you pay careful attention to the construction of the top of the hierarchy, while thinking about the following questions:

  • What are we likely to find in this domain?
  • What kinds of investigations are we likely to use the knowledge base for?

Note: This introduction and the next section were written by Henrik Schärfe. He is responsible for the lucidity, not me.
Ulrik Petersen
