How to Code

Core Steps

The following outlines the process of coding conversational material for research. Coding for practical analysis is unlikely to use the later steps, which are associated with the reliable analyses that science demands. In departments where practical analysis is used, those responsible are encouraged to organize regular (e.g., quarterly) sessions in which they independently code some archive material and undertake a reliability analysis. This can serve as useful evidence of the department’s competence in obtaining reliable coding ‘on the ground’ when resources are limited.

Here are the basic steps:

Gain clear access to the to-be-coded material, either as audio, as a transcription, or both.

Determine the ‘unit’ of analysis. This is not a simple decision and requires a compromise between analysis objectives and practical constraints. There are at least 3 possibilities:

  • Thought units: conceptually, a thought unit depicts a complete idea that a person wishes to express; operationally, it is an independent clause with a subject and an object (e.g., “I agree with you”). It therefore represents the level at which analysis isolates single communicative acts, and so avoids the danger of overlooking smaller but potentially significant components of negotiators’ behavior.
  • Utterances: perhaps the most common unit of coding, the utterance is a single statement by an individual. Although often more reliable as a unit of coding, it is not without error, particularly in dialogues where there are constant interruptions or over-talk. It can also be more difficult to code than thought units, since the average utterance contains about 2 thought units that may have conflicting motivations and orientations.
  • Episodes: non-overlapping segments of interaction during which people communicate regarding a single, clearly distinguishable issue, without significant deviation (known as dialogue movement) away from that issue. Formally defined as rhetorical structure analysis.
  • Single interactions: where multiple recording sessions are available, it is possible to code each recording session as a unit.

Code the material. The coding of materials means identifying when a particular behavior has occurred. For example, if coding for the cylinder model, the easiest way to begin this process is to code for each dimension—orientation, motivation and intensity—separately. After a short period it is possible to recognize individual frames in one pass. If coding at the level of behavior, it is best to have the coding sheet beside you and work steadily through the material.

Determine the reliability of your coding. This involves having a second (or third, etc.) coder examine and independently code the same piece of material. The idea is to determine whether or not you both come up with the same set of codes, since this is evidence that the coding is reproducible and not simply a projection of your personal interpretations. At a minimum, coding reliability should be tested in three stages:

  1. Unitization: This refers to a check that judges agree about the units that are being coded. For example, do both coders identify the same thought units? It is assessed using Guetzkow’s U statistic, which is calculated as U = (O1 - O2) / (O1 + O2), where O1 is the number of units Observer 1 sees in a text, and O2 is the number of units Observer 2 sees in the same text.
  2. Code reliability: This refers to a check that judges assign each unit to the same code. For example, do we both see this paragraph as cooperative-instrumental? It is assessed using Cohen’s Kappa statistic, which is easiest to calculate and examine using a statistics package like SPSS. In SPSS you can simply enter your coding as qualitative variables in two columns (ensuring there aren’t erroneous capitalizations, misspellings or spaces). SPSS’s cross-tabs function allows the calculation of Kappa.
  3. Sequence reliability: When an individual codes a piece of interaction, she or he is vulnerable to recency effects—the possibility of using a code again simply because it is salient in mind. It is possible to check that this is not the case by calculating a Sequence Kappa, using the software available here:
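Separately from any dedicated software, the first two checks in the list above can be worked through by hand. The following is a minimal Python sketch under stated assumptions: the function names guetzkow_u and cohens_kappa and the example codes are illustrative, and the Kappa calculation is the standard two-coder formula rather than a description of what SPSS does internally. It does not cover the Sequence Kappa.

```python
from collections import Counter


def guetzkow_u(units_coder1, units_coder2):
    """Guetzkow's U = (O1 - O2) / (O1 + O2), from each coder's count of units."""
    return (units_coder1 - units_coder2) / (units_coder1 + units_coder2)


def cohens_kappa(codes1, codes2):
    """Cohen's Kappa for two coders' codes assigned to the same units, in order."""
    if len(codes1) != len(codes2) or not codes1:
        raise ValueError("both coders must code the same, non-empty set of units")
    n = len(codes1)
    # Proportion of units on which the two coders chose the same code.
    observed = sum(a == b for a, b in zip(codes1, codes2)) / n
    # Agreement expected by chance, from each coder's own code frequencies.
    freq1, freq2 = Counter(codes1), Counter(codes2)
    expected = sum(freq1[code] * freq2[code] for code in freq1) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: coder 1 found 52 units, coder 2 found 48.
print(guetzkow_u(52, 48))  # 0.04

# Hypothetical codes for four units that both coders agreed on.
coder1 = ["cooperative", "cooperative", "competitive", "competitive"]
coder2 = ["cooperative", "cooperative", "competitive", "cooperative"]
print(cohens_kappa(coder1, coder2))  # 0.5
```

A U close to zero means the two coders found a similar number of units. Note that U compares counts only, so it does not confirm that the unit boundaries fall in the same places.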


Some Personal Observations

A coding scheme must be relevant to the end user. Early work on the cylinder model used lots of behaviors, but this aspect has seldom been taught. It was the model’s nine message frames that caught on. In some domains, such as counter-radicalization, only one of the dimensions is used.

It can be useful to ask the end user what they use and develop the coding framework from this. The Table of Ten did this. This is all but imperative when you want to translate your analysis back to the end user.

Focus on coding behavior relevant to a particular concern, if that is the operational concern. For example, if the research team are worried about emotional stability, then it is more valuable to focus on the detail of identity-oriented (and possibly relationally oriented) dialogue.

Focus on coding sets of behavior. There is often a belief or concern that we have to capture everything. This is not true and reflects poor practice. Capturing everything only increases the chances of error and suggests that the purpose of the analysis has not been thought out. For user/law enforcement analyses, this can appear in court as though the investigators were ‘digging for dirt’. For researchers, this likely limits the reproducibility and generalizability of findings.

Why not have professionals code some transcripts (in the hostage negotiation world we’ve started doing that for training)? There is no better way of getting professionals to understand what you are doing as researchers, and for you as researchers to better understand how they understand their interactions. For example, the cylinder model is used by some departments as a way of structuring their debriefing process following an assignment.

Dependent variables must be dynamic. If you are analyzing with a focus on identifying what works with a particular suspect, then, almost inevitably, your measure of what is working must be part of your coding. In interview work this likely involves coding for information provision. We tend to use the following categories:

  • Case-related personal information: information about the suspect’s motivation, feelings, thoughts or background (e.g., “I took the money because I have a lot of debts.”).
  • Case-related contextual information: information about the criminal event and/or the involvement of others (e.g., “I took the money while the attendant was smoking a cigarette.”).
  • Refusing to give information: being silent or refusing to answer (e.g., “No comment.”).


Comparing against interaction outcome is not useful. One may have 12 hours of dialogue during which there is a 30-second interaction where crucial information is provided. Taking the whole 12 hours of codes as being responsible for the 30-second success is not enlightening. It’s also not consistent with an information-gathering philosophy, which defines the HIG.

Analysis should also avoid being frequency based. Frequencies don’t translate very well because behaviors occur in sequences, not in amounts. They can lead to dangerous assumptions, such as that certain behaviors are always good while others are always bad.

Any form of analysis that respects the sequence is valuable. Here are the main ones.

  • Contingencies: The co-occurrences of behaviors within a sequence, often referred to as contingencies or lag-1 relationships, provide important information about the way in which a sequence is constructed. See, for example, Bakeman, R., Deckner, D. F., & Quera, V. (2005). Analysis of Behavioral Streams. In D. M. Teti (Ed.), Handbook of research methods in developmental psychology (pp. 394–420). Oxford, UK: Blackwell Publishers.
  • State-transition diagrams: These graphs consider “what follows what” (i.e., lag-1 conditionals). They are useful for identifying “pathways” through a sequence (e.g., a life history), and for finding critical features within the average sequence (e.g., turning points). They do not, however, take account of longer dependencies (i.e. lag-2 upwards), and alternative methods are required to capture more complex relationships (e.g., proximity coefficient, see below). See, for example, Fossi, J., Clarke, D. D., & Lawrence, C. (2005). Bedroom rape: Sequences of sexual behavior in stranger assaults. Journal of Interpersonal Violence, 20, 1444-1466.
  • Proximity analysis: This form of analysis measures the interrelationships among codes within a sequence using a general coefficient. The coefficient avoids the arithmetic manipulations and extrinsic assumptions made by many existing techniques, and it uses data efficiently, which allows comparisons across speakers, among transcripts, and across different sections of the same sequence. See, for example, Taylor, P. J., & Donald, I. J. (2007). Testing the relationship between local cue-response patterns and global dimensions of communication behavior. British Journal of Social Psychology, 46, 273-298. Software available here
  • Phase analysis: Explores consistency in interaction by identifying coherent periods, or phases, of behavior. It works on sequence data in which entities or behaviors are coded as discrete events that may or may not recur. See, for example, Fisher, B.A. (1970). Decision emergence: Phases in group decision making. Speech Monographs, 37, 53-66.
  • Optimal matching analysis: An extension of phase analysis, this method calculates the overall similarity of two or more sequences, or the similarity of these sequences to a prototype (Holmes, 1997). Similarity is based on the number of changes (known as Indels [Insertions and Deletions]) that it is necessary to make before one sequence becomes the exact replica of the other. See, for example, Holmes, M.E., & Sykes, R.E. (1993). A test of the fit of Gulliver’s phase model to hostage negotiations. Communication Studies, 44, 38-55.
  • Motif analysis: This method seeks to identify sub-sequences (motifs) that are common to most sequences. The method relies on something called Gibbs sampling, which iterates over the data many times until motif candidates (of various lengths) that occur more than might be expected by chance emerge.  It is useful in any scenario where the researcher wants to identify the fundamental or core sequence of behaviors that are common to most cases. See, for example, Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., & Wootton, J. C. (1993).  Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science, 262, 208-214.
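Two of the ideas in the list above are simple enough to sketch directly: the lag-1 conditionals behind contingencies and state-transition diagrams, and the count of insertions and deletions behind optimal matching. The following Python is a minimal illustration only. The function names, the single-letter codes and the example sequences are assumptions made for the sketch, and it is not the software or the exact procedure used in the cited papers.

```python
from collections import Counter, defaultdict


def lag1_transitions(sequence):
    """Probability of each code given the code immediately before it."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    antecedent_totals = Counter(sequence[:-1])
    table = defaultdict(dict)
    for (previous, following), count in pair_counts.items():
        table[previous][following] = count / antecedent_totals[previous]
    return dict(table)


def indel_distance(seq_a, seq_b):
    """Fewest insertions plus deletions needed to turn seq_a into seq_b."""
    rows, cols = len(seq_a), len(seq_b)
    # Dynamic-programming table for the longest common subsequence (LCS);
    # every element outside the LCS has to be deleted or inserted.
    lcs = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if seq_a[i] == seq_b[j]:
                lcs[i + 1][j + 1] = lcs[i][j] + 1
            else:
                lcs[i + 1][j + 1] = max(lcs[i][j + 1], lcs[i + 1][j])
    return rows + cols - 2 * lcs[rows][cols]


# Hypothetical coded sequence: A, B and C stand for three behavior codes.
coded = ["A", "B", "A", "B", "A", "C"]
for previous, following in lag1_transitions(coded).items():
    print(previous, "->", following)

# One deletion (the second "A") turns the first sequence into the second.
print(indel_distance(list("ABAC"), list("ABC")))  # 1
```

The transition table is what a state-transition diagram draws as arrows. As the list notes, it says nothing about lag-2 or longer dependencies.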


Additional Thoughts

In research terms, coding reliability is everything. It is far too easy for researchers to overlook the consequence of having a poor measurement tool, which is of course to render the findings un-generalizable.

Coding must be relevant to suspects. Where at all possible, it is useful to have a coder who understands the cultural background of the suspect. In research, it is useful to have cultural diversity among those doing the analysis.

It is possible to conduct automated coding using N-gram based modeling (there is a piece of work here to be funded). This offers two advantages. First, it can speed up coding in situations where a rapid assessment is needed. Second, the automated coding can be compared to the human coding as a form of reliability check.
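To give a feel for what N-gram based coding could look like, here is a toy Python sketch. Everything in it is an assumption made for illustration: the function names, the three short training utterances, the code labels, and the choice of a naive Bayes-style score with add-one smoothing. It is not an existing tool, and a usable coder would need far more human-coded material.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n=2):
    """Word n-grams of every length from 1 to n for a lower-cased utterance."""
    words = text.lower().split()
    grams = []
    for size in range(1, n + 1):
        grams += [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    return grams


def train(coded_utterances, n=2):
    """Count n-grams per code from (utterance, code) pairs coded by a human."""
    counts = defaultdict(Counter)
    for text, code in coded_utterances:
        counts[code].update(ngrams(text, n))
    return counts


def predict(counts, text, n=2):
    """Return the code whose n-gram counts best fit the utterance."""
    vocabulary = {gram for grams in counts.values() for gram in grams}
    best_code, best_score = None, -math.inf
    for code, grams in counts.items():
        # Add-one smoothing so an unseen n-gram does not zero out a code.
        total = sum(grams.values()) + len(vocabulary) + 1
        score = sum(math.log((grams[g] + 1) / total) for g in ngrams(text, n))
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical human-coded training utterances.
examples = [
    ("I took the money because I have debts", "personal"),
    ("I took the money while the attendant was smoking", "contextual"),
    ("No comment", "refusal"),
]
model = train(examples)
print(predict(model, "no comment"))
print(predict(model, "the attendant was smoking"))
```

Running the automated codes and a human coder’s codes through the same Kappa calculation used for two human coders would give the reliability comparison described above.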

It is often the case that a completed coding contains one code that occurs a high percentage of the time (say, 50% or more). Since the purpose of a coding scheme is to detect variation in behavior, this strongly suggests that the coding scheme is not appropriate for the task. A greater level of granularity is needed in which the frequent code is divided into further ‘sub-codes’.
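This check is easy to automate. The sketch below is a minimal Python illustration: the function name dominant_codes and the example codes are assumptions, and the 0.5 threshold simply mirrors the 50% figure mentioned above rather than any established cut-off.

```python
from collections import Counter


def dominant_codes(codes, threshold=0.5):
    """Return {code: share} for codes whose share of all coded units meets the threshold."""
    counts = Counter(codes)
    total = len(codes)
    return {code: n / total for code, n in counts.items() if n / total >= threshold}


# Hypothetical completed coding of ten units.
coding = ["x"] * 6 + ["y"] * 3 + ["z"]
print(dominant_codes(coding))  # {'x': 0.6}
```

Any code returned is a candidate for splitting into sub-codes.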

Less is more (in terms of coding). One of my most successful pieces of research, on bystander intervention and violence, used two codes – escalation vs. de-escalation.