
Content analysis consists of assigning text to categories so that the text
can be compared. The key is to have valid categories, that is, categories
that measure what they purport to represent. If the performance indicator is
unambiguous, the classification can be straightforward. For example, in the
case of the legal affairs programme dealing with international trade law,
documents can be searched to see whether they refer to the work of
UNCITRAL.
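As an illustration only, the unambiguous indicator above can be sketched as a simple text search. This assumes that a case-insensitive match on "UNCITRAL" or its full title is an acceptable stand-in for "refers to the work of UNCITRAL"; the function name `code_document` and the sample documents are hypothetical.

```python
import re

# Assumption: a mention of UNCITRAL (acronym or full title) is treated as
# a reference to its work. This is a proxy, not a definitive test.
PATTERN = re.compile(
    r"\bUNCITRAL\b|United Nations Commission on International Trade Law",
    re.IGNORECASE,
)

def code_document(text: str) -> str:
    """Assign a document to one of two categories (hypothetical helper)."""
    if PATTERN.search(text):
        return "refers to UNCITRAL"
    return "does not refer to UNCITRAL"

# Hypothetical sample documents, for illustration only.
documents = {
    "doc1": "The report cites the UNCITRAL Model Law on Electronic Commerce.",
    "doc2": "The session discussed maritime boundaries.",
}

for name, text in documents.items():
    print(name, "->", code_document(text))
```

Because the function always returns exactly one of the two labels, this scheme is all-inclusive and mutually exclusive by construction. A keyword match can still misfire (a passing mention is not necessarily a reference to the work), so borderline documents may need to be read by a person.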
Once you have determined the information to extract, you define categories:
classifications of the information that allow it to be compared. In the
example, a simple scheme would have two categories: document refers to the
work of UNCITRAL, and document does not refer to the work of UNCITRAL. More
complex categories could be developed by specifying the subject of the work
of UNCITRAL to which reference is made. The categories can be as complex as
needed to show whether and why performance has taken place. Good categories
are all-inclusive (exhaustive) and mutually exclusive: every observation can
be placed in some category, and no observation can be placed in more than
one. If an observation could fit into more than one category, or into none,
the categories are inadequate.
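A more complex scheme, coded by the subject of UNCITRAL's work, can enforce the two criteria directly: a document that matches no category or several categories is rejected rather than silently coded. The subjects and keywords below are illustrative assumptions, not an official classification, and `assign_category` and `CodingError` are hypothetical names.

```python
# Hypothetical subject categories; subjects and keywords are assumptions.
CATEGORIES = {
    "arbitration": ["arbitration", "arbitral"],
    "insolvency": ["insolvency", "bankruptcy"],
    "electronic commerce": ["electronic commerce", "e-commerce"],
}

class CodingError(ValueError):
    """Raised when a document fits no category or more than one."""

def matching_categories(text: str) -> list[str]:
    """Return every category whose keywords appear in the text."""
    lowered = text.lower()
    return [
        name
        for name, keywords in CATEGORIES.items()
        if any(keyword in lowered for keyword in keywords)
    ]

def assign_category(text: str) -> str:
    matches = matching_categories(text)
    if not matches:
        # Not all-inclusive: the scheme has no place for this document.
        raise CodingError("no category fits")
    if len(matches) > 1:
        # Not mutually exclusive: the document fits several categories.
        raise CodingError(f"fits several categories: {matches}")
    return matches[0]

print(assign_category("Draft rules on arbitral proceedings."))  # arbitration

try:
    assign_category("Cross-border insolvency and arbitration.")
except CodingError as err:
    print("Inadequate categories:", err)
```

Raising an error, instead of picking the first match, keeps the inadequacy visible so the scheme itself can be revised. One common way to make a scheme all-inclusive is to add a residual "other subject" category.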
Finally, coding categories should be tested before they are applied. Once
the proposed categories are defined, it is important to determine (test)
whether they are realistic in terms of the data. To do this, code a sample
of documents to see whether the categories actually “work” and whether they
provide the basis for the type of analysis needed. If the test reveals
ambiguity in the categories (a given text could be assigned to more than one
category), or makes it clear that some categories will be empty, the
classification scheme can be adjusted.
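The test on a sample can be sketched as follows, again assuming keyword-based coding with hypothetical categories and sample documents. The `pilot_test` helper reports the three problems discussed above: ambiguous documents, documents that fit no category, and categories that stay empty.

```python
from collections import Counter

# Hypothetical pilot scheme; keywords are illustrative assumptions.
CATEGORIES = {
    "arbitration": ["arbitration", "arbitral"],
    "insolvency": ["insolvency"],
    "electronic commerce": ["electronic commerce"],
}

def matching_categories(text: str) -> list[str]:
    lowered = text.lower()
    return [name for name, kws in CATEGORIES.items()
            if any(kw in lowered for kw in kws)]

def pilot_test(sample: dict[str, str]) -> dict:
    """Code a sample of documents and report problems with the scheme."""
    counts = Counter()
    ambiguous = {}   # documents that fit more than one category
    uncoded = []     # documents that fit no category
    for doc_id, text in sample.items():
        matches = matching_categories(text)
        if len(matches) == 1:
            counts[matches[0]] += 1
        elif matches:
            ambiguous[doc_id] = matches
        else:
            uncoded.append(doc_id)
    # A category is reported as empty if no sample document was coded
    # into it unambiguously.
    empty = [name for name in CATEGORIES if counts[name] == 0]
    return {"counts": dict(counts), "ambiguous": ambiguous,
            "uncoded": uncoded, "empty": empty}

# Hypothetical sample, for illustration only.
sample = {
    "A": "Comments on the arbitration rules.",
    "B": "Insolvency and arbitration in cross-border cases.",
    "C": "Progress on the model insolvency law.",
    "D": "Maritime boundary delimitation.",
}

report = pilot_test(sample)
for key, value in report.items():
    print(key, value)
```

In this made-up sample, document B is ambiguous, document D fits nowhere, and "electronic commerce" stays empty, which are exactly the signals that would prompt adjusting the scheme before coding the full set of documents.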