
What Is Entropy in a Decision Tree Algorithm?

In decision trees, each time you split the data set into two parts, you want those two parts to be as tidy as possible. Here, tidy means that each of the two sets contains only elements of one category (associated with a single label), which makes it easier to make a decision. In practice, however, splitting a set into two pure sets almost never happens; the two sets usually contain elements with two or more different labels. Entropy is used to quantify the purity of the sets in terms of the element labels. Entropy is an indicator of how messy your data is.
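Concretely, for a set in which a fraction p of the elements carry one label and the remaining fraction 1 − p carry the other, the entropy of the set, measured in bits, is:

entropy = −p·log₂(p) − (1 − p)·log₂(1 − p)

with the convention that 0·log₂(0) counts as 0, so a pure set (p = 0 or p = 1) has entropy zero.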

Imagine you have some data with elements labeled either Label 1 or Label 2. If a set is pure and contains only elements with Label 1 (or only elements with Label 2), the entropy is at its minimum (zero). If the set contains a mix of elements with Label 1 and Label 2, the entropy rises. It reaches its maximum (one) when the set contains as many elements with Label 1 as with Label 2.
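To make this concrete, here is a minimal Python sketch of the formula above (the function name and label strings are just examples, not part of any particular library). It also generalizes naturally to more than two labels by summing over all label proportions:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the proportion p of each label.
    # A pure set has a single label with p = 1, so its entropy is 0.
    return -sum((count / total) * log2(count / total)
                for count in counts.values())

print(entropy(["Label 1"] * 10))                   # pure set: 0.0
print(entropy(["Label 1"] * 5 + ["Label 2"] * 5))  # 50/50 mix: 1.0
print(entropy(["Label 1"] * 8 + ["Label 2"] * 2))  # mostly pure: ~0.72
```

As the examples show, the messier the mix of labels, the closer the entropy gets to its maximum of 1.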