"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

November 28, 2017

Day #92 - Mean Encoding

Mean Encoding
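  • Create new features derived from existing categorical features
  • Label encoding is the usual baseline for categorical features
  • Mean encoding replaces each category with the mean of the target within that category: sum(target) / number of rows in the category
  • For a binary target, this is the proportion of ones (positive labels) in the category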
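  • Mean encoding vs. label encoding:
  • Label encoding: the category order is arbitrary, with no logical relation to the target
  • Mean encoding: categories are ordered by their target mean, so the classes become more separable
  • As a result, trees can reach a better loss with shorter trees
  • Signs that mean encoding may help: trees need a huge number of splits on a categorical feature
  • ...and the model tries to treat every category differently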
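The label vs. mean encoding comparison can be sketched in a few lines of plain Python. This is a minimal illustration assuming a binary target; the toy `cities`/`target` data and the helper names `label_encode` and `mean_encode` are hypothetical, chosen so that the alphabetical (label) order does not match the target order.

```python
from collections import defaultdict

def label_encode(categories):
    """Map each distinct category to an integer in alphabetical order.
    The resulting order carries no information about the target."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping

def mean_encode(categories, target):
    """Replace each category with sum(target) / number of rows in that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1
    mapping = {cat: sums[cat] / counts[cat] for cat in counts}
    return [mapping[c] for c in categories], mapping

# Hypothetical toy data: a categorical "city" feature and a binary target.
cities = ["Moscow", "Tver", "Klin", "Moscow", "Tver", "Klin"]
target = [1, 1, 0, 1, 0, 0]

label_values, label_map = label_encode(cities)
mean_values, mean_map = mean_encode(cities, target)

# Target mean of each category, listed in the order each encoding places
# the categories along the feature axis.
by_label = [mean_map[c] for c in sorted(label_map, key=label_map.get)]
by_mean = sorted(mean_map.values())
print(by_label)  # not monotonic: isolating the high-mean category needs two thresholds
print(by_mean)   # monotonic: one threshold separates low-mean from high-mean categories
```

This fits the encoding on all rows, which is enough to show the separability effect; on real training data that leaks the target into the feature, so out-of-fold or regularized estimates are normally used instead.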
Constructing Mean Encoding
  • Goods - Number of ones in a group
  • Bads - Number of zeros in a group
Likelihood = Goods/(Goods + Bads) = mean(target)
Weight of Evidence = ln(Goods/Bads)*100
Count = Goods = sum(target)
Diff = Goods-Bads
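The four variants can be computed per group from Goods and Bads alone. A minimal sketch, assuming a binary 0/1 target; the helper names `group_stats` and `encodings` and the toy data are hypothetical. Weight of Evidence is undefined when a group has no goods or no bads, and this sketch simply returns `None` there rather than applying any smoothing.

```python
import math
from collections import defaultdict

def group_stats(categories, target):
    """Per-category (Goods, Bads): number of ones and number of zeros in the target."""
    goods = defaultdict(int)
    bads = defaultdict(int)
    for cat, y in zip(categories, target):
        if y == 1:
            goods[cat] += 1
        else:
            bads[cat] += 1
    # dict.fromkeys keeps the categories in order of first appearance.
    return {cat: (goods[cat], bads[cat]) for cat in dict.fromkeys(categories)}

def encodings(goods, bads):
    """Likelihood, Weight of Evidence, Count and Diff for a single group."""
    likelihood = goods / (goods + bads)  # = mean(target) in the group
    # ln(Goods/Bads) * 100; undefined (None here) when either count is zero.
    woe = math.log(goods / bads) * 100 if goods > 0 and bads > 0 else None
    count = goods                        # = sum(target) in the group
    diff = goods - bads
    return {"likelihood": likelihood, "woe": woe, "count": count, "diff": diff}

# Hypothetical toy data.
cities = ["Moscow", "Tver", "Klin", "Moscow", "Tver", "Klin", "Tver"]
target = [1, 1, 0, 1, 0, 0, 1]

for city, (g, b) in group_stats(cities, target).items():
    print(city, encodings(g, b))
```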


Happy Learning!!!
