Thursday, December 15, 2022

How We Compare Ourselves to Others

by Denise Wilson, December 15, 2022

Often when researchers are trying to study how one group of people behaves compared to another group or groups, we code our data to support a wide group of statistical techniques called regression analyses. Regression simply tries to come up with a mathematical relationship (typically linear) between input (independent) variables and an output (dependent) variable.

Without getting into the math of it all, by far the most popular way to code individuals by their demographic characteristics is a process called dummy coding (which, BTW, does not imply that anyone is a dummy).

Dummy coding works by identifying a reference group and then giving everyone who doesn't belong to that reference group a label and a category of their own.  For example, in a population of students who are White, Asian, Black, Multiracial, or of "other" races, we might choose the reference group to be White people.  To find a place for all races in the statistical analysis, we could then dummy code the five categories of race into four variables:

  • Asian:  this variable would code all Asian students as "1" and everyone else, including White students, as "0"
  • Black:  this variable would code all Black students as "1" and everyone else, including White students, as "0"
  • Multiracial:  this variable would code all Multiracial students as "1" and everyone else, including White students, as "0"
  • Other:  this variable would code all "other" race students as "1" and everyone else, including White students, as "0"

Dummy coding, whether intended or not, inherently implies that the reference group is "normal" and explores whether there is something not normal about the remaining racial groups.  Results in studies that use dummy coding often sound like: "Asian students experienced less belonging than White students," or "Black students had higher test scores than White students," and so on. Dummy coding, intentionally or not, often sets us up to aspire to what White people do.   
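
For readers who like to see the coding itself, here is a minimal sketch of dummy coding in Python. The group labels come from the example above; the function name `dummy_code` and the choice of White as the reference group are assumptions made for illustration, not part of any particular statistics package.

```python
# Group labels from the example in the text; "White" is the reference group.
GROUPS = ["White", "Asian", "Black", "Multiracial", "Other"]
REFERENCE = "White"


def dummy_code(race, groups=GROUPS, reference=REFERENCE):
    """Return one 0/1 variable per non-reference group (hypothetical helper)."""
    if race not in groups:
        raise ValueError(f"unknown group: {race!r}")
    # Each variable is 1 only for members of its own group and 0 for everyone
    # else, so the reference group is 0 on all four variables.
    return {g: int(race == g) for g in groups if g != reference}


if __name__ == "__main__":
    for race in GROUPS:
        print(race, dummy_code(race))
```

Because White students are "0" on every variable, each regression coefficient ends up describing how one group differs from White students, which is where the "compared to White students" language comes from.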

Effect coding, on the other hand, works similarly to dummy coding in that it codes demographic data into integers, but unlike dummy coding, it does so in a way that compares each group to the grand mean (the unweighted average of the group means of the outcome variable, so every group counts equally regardless of its size). In plain English, this means that effect coding allows us to compare results to the norm across all groups in the study rather than to a particular reference group. This leads to statements like "Asian students experienced less belonging than was the norm in the larger student population in this study," or "Black students had higher post-test scores than was the norm among all students enrolled in the course." Using the same example as for dummy coding of race, effect coding would also code five categories of race into four variables, but a little bit differently than for dummy coding:

  • Asian:  this variable would code all White students as "-1", all Asian students as "1", and all non-White and non-Asian students as "0"
  • Black:  this variable would code all White students as "-1",  all Black students as "1", and all non-White and non-Black students as "0"
  • Multiracial:  this variable would code all White students as "-1", all Multiracial students as "1", and all non-White and non-Multiracial students as "0"
  • Other:  this variable would code all White students as "-1", all "other" race students as "1",  and all Black, Asian, and Multiracial students as "0"
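
The same scheme can be sketched in Python. As before, `effect_code` is a hypothetical helper, and the belonging scores below are made up purely for illustration (they are not data from any study). The sketch relies on a standard property of effect coding when race is the only predictor: the intercept equals the unweighted grand mean, and each coefficient equals that group's mean minus the grand mean.

```python
from statistics import mean

GROUPS = ["White", "Asian", "Black", "Multiracial", "Other"]
REFERENCE = "White"


def effect_code(race, groups=GROUPS, reference=REFERENCE):
    """Return one -1/0/1 variable per non-reference group (hypothetical helper)."""
    if race not in groups:
        raise ValueError(f"unknown group: {race!r}")
    if race == reference:
        # The reference group is -1 on every variable.
        return {g: -1 for g in groups if g != reference}
    return {g: int(race == g) for g in groups if g != reference}


# Made-up belonging scores, for illustration only.
scores = {
    "White": [4.0, 5.0],
    "Asian": [3.0, 3.0],
    "Black": [4.0, 4.0],
    "Multiracial": [5.0, 5.0],
    "Other": [3.0, 4.0],
}
group_means = {g: mean(v) for g, v in scores.items()}
# Unweighted grand mean: the average of the group means, not of all students.
grand_mean = mean(group_means.values())
# With race as the only predictor, each effect-coded coefficient is the
# group's mean minus the grand mean.
coefs = {g: group_means[g] - grand_mean for g in GROUPS if g != REFERENCE}


def predict(race):
    """Predicted score: intercept (grand mean) plus coefficient times code."""
    codes = effect_code(race)
    return grand_mean + sum(coefs[g] * codes[g] for g in coefs)


if __name__ == "__main__":
    for race in GROUPS:
        print(race, effect_code(race), predict(race))
```

In this sketch, each coefficient reads as "how far this group sits from the average across groups," and the prediction for White students is recovered from the other four coefficients rather than serving as the yardstick.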

Put simply, the effect-coded approach above allows us to refrain from judging what is normal and to simply compare what certain groups are feeling or doing to the average across the groups in the sample we are studying. Reaching the norm or (unweighted) average may still not be the ultimate goal, but it keeps us from devising strategies or designing interventions whose goal is to get everyone acting like White people.

References:

Mayhew, M. J., & Simonoff, J. S. (2015). Non-White, no more: Effect coding as an alternative to dummy coding with implications for higher education researchers. Journal of College Student Development, 56(2), 170-175.

UCLA Advanced Research Computing:  Statistical Methods and Data Analytics.  Coding systems for categorical variables in regression analysis.

UCLA Advanced Research Computing:  Statistical Methods and Data Analytics.  Interpreting the coefficients of an effect-coded variable in a regression model.  


Denise Wilson is a professor of electrical and computer engineering at the University of Washington in Seattle, Washington. Her research interests in engineering education focus on belonging, engagement, and instructional support in the engineering classroom.  She is also invested in engineering workplace research focused on understanding belonging and inclusivity.     

The Elusive Mere Belonging

Gregory Walton and Geoffrey Cohen, researchers at Stanford University, have conducted a wide range of controlled experiments on students to ...