When we study how one group of people behaves compared with another group or groups, we often code our data to support a broad family of statistical techniques called regression analyses. Regression tries to find a mathematical relationship (typically linear) between input (independent) variables and an output (dependent) variable.
Without getting into the math of it all, by far the most popular way to code individuals by their demographic characteristics is a process called dummy coding (which, by the way, does not imply that anyone is a dummy).
Dummy coding works by identifying a reference group and then giving everyone who doesn't belong to that reference group a label and a category of their own. For example, in a population of students who are White, Asian, Black, Multiracial, or of "other" races, we might choose the reference group to be White people. To find a place for all races in the statistical analysis, we could then dummy code the five categories of race into four variables:
- Asian: this variable would code all Asian students as "1" and all other students, including White students, as "0"
- Black: this variable would code all Black students as "1" and all other students, including White students, as "0"
- Multiracial: this variable would code all Multiracial students as "1" and all other students, including White students, as "0"
- Other: this variable would code all "other" race students as "1" and all other students, including White students, as "0"
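The scheme in the list above can be sketched in a few lines of Python. This is only an illustration: the names `CATEGORIES`, `REFERENCE`, and `dummy_code` are made up for this example, and the category labels are simply the ones used in the text.

```python
# Hypothetical labels taken from the example above; "White" is the reference group.
CATEGORIES = ["White", "Asian", "Black", "Multiracial", "Other"]
REFERENCE = "White"


def dummy_code(race):
    """Return the four dummy variables for one student as a dict."""
    if race not in CATEGORIES:
        raise ValueError(f"unknown category: {race!r}")
    # One variable per non-reference group: 1 if the student belongs to it, else 0.
    # A student in the reference group therefore gets 0 on every variable.
    return {group: int(race == group) for group in CATEGORIES if group != REFERENCE}


# Show the coding for one student of each race.
for race in CATEGORIES:
    print(race, dummy_code(race))
```

Note that White students end up with "0" on all four variables, which is what makes them the baseline every other group is compared against.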
Effect coding, on the other hand, works similarly to dummy coding in that it codes demographic data into integers, but unlike dummy coding, it does so in a way that compares each group to the grand mean (the unweighted average of the group means on the outcome variable, so every group counts equally regardless of its size). In plain English, this means that effect coding allows us to compare results to the norm across all groups in the study rather than to a particular reference group. This leads to statements like "Asian students experienced less belonging than was the norm in the larger student population in this study," or "Black students had higher post-test scores than was the norm among all students enrolled in the course." Using the same example as for dummy coding of race, effect coding would also code five categories of race into four variables, but a little differently:
- Asian: this variable would code all White students as "-1", all Asian students as "1", and all non-White, non-Asian students as "0"
- Black: this variable would code all White students as "-1", all Black students as "1", and all non-White, non-Black students as "0"
- Multiracial: this variable would code all White students as "-1", all Multiracial students as "1", and all non-White, non-Multiracial students as "0"
- Other: this variable would code all White students as "-1", all "other" race students as "1", and all Black, Asian, and Multiracial students as "0"
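As a rough sketch, the effect-coding rules in the list above could be written in Python as follows. The names `CATEGORIES`, `REFERENCE`, and `effect_code` are invented for this illustration, and the labels are the ones from the example.

```python
# Hypothetical labels taken from the example above.
CATEGORIES = ["White", "Asian", "Black", "Multiracial", "Other"]
REFERENCE = "White"  # the group that receives -1 on every variable


def effect_code(race):
    """Return the four effect-coded variables for one student as a dict."""
    if race not in CATEGORIES:
        raise ValueError(f"unknown category: {race!r}")
    groups = [group for group in CATEGORIES if group != REFERENCE]
    if race == REFERENCE:
        # The only difference from dummy coding: -1 everywhere instead of 0.
        return {group: -1 for group in groups}
    # Everyone else: 1 on their own group's variable, 0 on the others.
    return {group: int(race == group) for group in groups}


# Show the coding for one student of each race.
for race in CATEGORIES:
    print(race, effect_code(race))
```

The only change from dummy coding is the row of "-1" values for White students; that small change is what shifts the comparison point from the White group to the grand mean.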
In short, the effect-coded approach to demographic data allows us to refrain from judging what is normal and to simply compare what certain groups are feeling or doing to the average across all the groups we are studying. Reaching the norm or (unweighted) average may still not be the ultimate goal, but this approach keeps us from devising strategies or designing interventions whose goal is to get everyone acting like White people.
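To see the difference in interpretation concretely, here is a minimal sketch that fits the same regression under both coding schemes. Everything in it is an assumption for illustration: the scores are invented, the helper names `design_matrix` and `fit` are hypothetical, and NumPy's least-squares solver stands in for whatever statistics package you actually use.

```python
import numpy as np

REFERENCE = "White"
GROUPS = ["Asian", "Black", "Multiracial", "Other"]  # non-reference groups

# Invented post-test scores, deliberately unbalanced across groups.
labels = ["White", "White", "Asian", "Asian", "Asian",
          "Black", "Multiracial", "Multiracial", "Other"]
scores = [80, 84, 70, 74, 78, 88, 76, 80, 68]


def design_matrix(labels, scheme):
    """Build an intercept column plus four coded columns ("dummy" or "effect")."""
    rows = []
    for race in labels:
        if scheme == "effect" and race == REFERENCE:
            codes = [-1] * len(GROUPS)
        else:
            codes = [int(race == group) for group in GROUPS]
        rows.append([1] + codes)
    return np.array(rows, dtype=float)


def fit(labels, scores, scheme):
    """Ordinary least squares; returns a dict mapping term name to coefficient."""
    X = design_matrix(labels, scheme)
    y = np.array(scores, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["Intercept"] + GROUPS, coef))


for scheme in ("dummy", "effect"):
    print(scheme, {name: round(float(value), 2)
                   for name, value in fit(labels, scores, scheme).items()})
```

With these made-up numbers the group means are 82 (White), 74 (Asian), 88 (Black), 78 (Multiracial), and 68 (Other). Under dummy coding the intercept is the White mean (82) and the Asian coefficient is about -8, meaning Asian students scored 8 points below White students. Under effect coding the intercept is the unweighted grand mean (78) and the Asian coefficient is about -4, meaning Asian students scored 4 points below the average across all groups.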
References:
UCLA Advanced Research Computing: Statistical Methods and Data Analytics. Coding systems for categorical variables in regression analysis.
UCLA Advanced Research Computing: Statistical Methods and Data Analytics. Interpreting the coefficients of an effect-coded variable in a regression model.