"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 01, 2016

Day #32 - Regularization in Machine Learning


Large coefficients often indicate overfitting. To avoid this we perform regularization, which adds a penalty on the size of the coefficients to the loss.
  • L1 - Sum of absolute values of the coefficients (Lasso - Least Absolute Shrinkage and Selection Operator). The L1 constraint region is diamond shaped, so the solution tends to land on the coordinate axes, driving some coefficients to exactly zero. This results in variable elimination - features that contribute minimally are dropped.
  • L2 - Sum of squares of the coefficients (Ridge). The L2 constraint region is circle shaped, so it shrinks all coefficients in the same proportion but eliminates none (see the sketch after this list).
  • Discriminative - In an SVM we use a hyperplane to separate the classes. This is an example of a discriminative approach.
  • Probabilistic - A generative view where the data is modeled by a Gaussian distribution. This ties back to the Central Limit Theorem: as the number of points grows, the data tends toward a Normal distribution, so we fit a Gaussian model.
  • Max Likelihood - Choosing the parameters that maximize the probability that a point p belongs to the assumed distribution.
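
The variable-elimination behaviour of L1 versus the proportional shrinkage of L2 is easy to see with scikit-learn's Lasso and Ridge estimators. The snippet below is a minimal sketch, assuming scikit-learn and numpy are available; the dataset, feature count and alpha values are arbitrary choices for illustration.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy regression data: only the first 3 of 10 features actually matter
rng = np.random.RandomState(0)
X = rng.randn(200, 10)
true_coef = np.array([5.0, -3.0, 2.0] + [0.0] * 7)
y = X @ true_coef + rng.randn(200) * 0.5

# L1 (Lasso): drives the weak coefficients exactly to zero
lasso = Lasso(alpha=0.5).fit(X, y)
print("Lasso coefficients:", np.round(lasso.coef_, 2))

# L2 (Ridge): shrinks every coefficient a little but eliminates none
ridge = Ridge(alpha=0.5).fit(X, y)
print("Ridge coefficients:", np.round(ridge.coef_, 2))

With the illustrative alpha above, Lasso typically reports exact zeros for the seven irrelevant features while Ridge keeps small non-zero values for all ten.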
Good Read for L2 - Indeed, using the L2 loss comes from the assumption that the data is drawn from a Gaussian distribution; minimizing squared error is then maximum likelihood estimation under Gaussian noise.
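
To make the Gaussian connection concrete, here is a small numpy sketch (with made-up numbers and an assumed noise standard deviation) showing that the Gaussian negative log-likelihood of the residuals differs from the sum of squared errors only by a scale factor and a constant, so minimizing one minimizes the other.

import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.3, 3.7])
sigma = 1.0  # assumed noise standard deviation

residuals = y_true - y_pred
sse = np.sum(residuals ** 2)  # L2 loss (sum of squared errors)

# Negative log-likelihood under y ~ Normal(y_pred, sigma^2)
nll = 0.5 * sse / sigma**2 + len(y_true) * 0.5 * np.log(2 * np.pi * sigma**2)

print("SSE:", sse)
print("Gaussian NLL:", nll)  # equals SSE/(2*sigma^2) plus a constant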

Another Read -

  • The L1 loss function minimizes the absolute differences between the estimated values and the target values. It is more robust and generally less affected by outliers.
  • The L2 loss function minimizes the squared differences between the estimated and target values. The L2 error becomes much larger in the presence of outliers because the errors are squared (see the sketch below).
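
A quick way to see the outlier sensitivity is to compare mean absolute error (L1) and mean squared error (L2) on the same predictions before and after corrupting one target value. A minimal sketch with made-up numbers:

import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.2, 6.9, 9.1])

def l1_loss(a, b):
    return np.mean(np.abs(a - b))   # mean absolute error

def l2_loss(a, b):
    return np.mean((a - b) ** 2)    # mean squared error

print("No outlier   -> L1:", l1_loss(y_true, y_pred), "L2:", l2_loss(y_true, y_pred))

# Corrupt one target to act as an outlier
y_out = y_true.copy()
y_out[-1] = 30.0
print("With outlier -> L1:", l1_loss(y_out, y_pred), "L2:", l2_loss(y_out, y_pred))
# L1 grows modestly; L2 blows up because the outlier error is squared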

Happy Learning!!!
