When creating dummy variables for a categorical variable, why do we need to discard one of them?
If you have 3 groups for race, you need only 2 dummy variables to represent membership in a race group.
In general, for k groups, you use only (k-1) dummy variables.
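As a minimal sketch of the (k-1) rule, the snippet below builds dummy variables with pandas for a made-up `race` column with k = 3 groups. The data and the column name are assumptions for illustration only; the category order is set so that "other" is the group that gets dropped.

```python
import pandas as pd

# Hypothetical sample data: one categorical column with k = 3 groups.
df = pd.DataFrame({"race": ["white", "black", "other", "black", "white"]})

# Fix the category order so that "other" comes first and is the one dropped.
df["race"] = pd.Categorical(df["race"], categories=["other", "white", "black"])

# drop_first=True keeps k - 1 = 2 indicator columns; the dropped category
# ("other") becomes the reference group.
dummies = pd.get_dummies(df["race"], prefix="race", drop_first=True, dtype=int)
print(dummies)
```

A row with 0 in both `race_white` and `race_black` belongs to the dropped group, so no information is lost.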
It’s helpful to think of each dummy variable as a yes/no question about group membership.
Suppose your race groups are:
1 = white
2 = black
3 = other
Dummy variable 1 answers the question: Do you identify yourself as white? 0 = no, 1 = yes.
Dummy variable 2 answers the question: Do you identify yourself as black? 0 = no, 1 = yes.
Provided that your groups are mutually exclusive and exhaustive, a person who answers no to the first two questions must be a member of group 3, other race.
In fact, if you include a third dummy variable in a regression that also has an intercept, the analysis will fail (or the software will drop a column for you), because the third dummy variable is perfectly predictable from the first two: it equals 1 minus their sum. This perfect multicollinearity means the coefficients cannot be uniquely estimated.
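The redundancy can be checked numerically. The sketch below uses NumPy and a made-up sample of group codes (1 = white, 2 = black, 3 = other, as above); the variable names are illustrative. It compares the rank of a design matrix holding an intercept plus 2 dummies with one holding an intercept plus all 3.

```python
import numpy as np

# Made-up sample of group codes: 1 = white, 2 = black, 3 = other.
group = np.array([1, 2, 3, 1, 2, 3, 3, 1])

d_white = (group == 1).astype(float)
d_black = (group == 2).astype(float)
d_other = (group == 3).astype(float)
intercept = np.ones_like(d_white)

# The third dummy is fully determined by the other two.
assert np.array_equal(d_other, 1 - d_white - d_black)

# Design matrices: intercept + (k - 1) dummies vs. intercept + all k dummies.
X_k_minus_1 = np.column_stack([intercept, d_white, d_black])
X_all_k = np.column_stack([intercept, d_white, d_black, d_other])

rank_k_minus_1 = np.linalg.matrix_rank(X_k_minus_1)
rank_all_k = np.linalg.matrix_rank(X_all_k)

# Adding the third dummy adds a column but does not raise the rank.
print(rank_k_minus_1, "of", X_k_minus_1.shape[1], "columns")
print(rank_all_k, "of", X_all_k.shape[1], "columns")
```

When the rank is smaller than the number of columns, ordinary least squares has no unique solution, which is why one dummy is left out.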