"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 17, 2016

Day #50 - Recommendation Systems

Recommendation Systems
  • Content Based
  • Collaborative (User-User / Item-Item)
Content-Based Key Features - Recommends based on the user's own taste and historical behaviour; no need for other users' data
Pros 
  • No Need of Data from Other users
  • Recommend New and Unpopular items
Cons
  • Finding appropriate features is hard
  • Unable to Exploit Quality judgment from other users
Collaborative Key Features - Recommendation based on similar users / similar items. 
Item-Item 
  • For item-i, find other similar items
  • Better than user-user
  • Need enough users data
  • Works on any kind of item; no feature selection is needed
User-User
  • Find users who have bought / rated similar items
  • Hard to find users who have rated the same items
More Advanced Methods - Latent Factor Models
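
Below is a minimal item-item similarity sketch on a made-up ratings matrix - cosine similarity between item columns, which is the core of item-item collaborative filtering (the data and names are purely illustrative):

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items); 0 = not rated.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two item rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a.dot(b) / denom if denom else 0.0

# For item 0, find the other items most similar to it (item-item collaborative filtering).
target = 0
sims = [(j, cosine_sim(ratings[:, target], ratings[:, j]))
        for j in range(ratings.shape[1]) if j != target]
print(sorted(sims, key=lambda x: -x[1]))
```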

Happy Learning!!!

Day #49 - Clustering Key Notes






Happy Learning!!!

December 13, 2016

Day #47 - Deep Dive - Learnings

Tip #1 - Support Vector Machines
  • Performs classification by finding the optimal separating hyperplane that separates the two classes and maximizes the distance to the closest point from either class (this distance is the margin)
  • Training involves non-linear optimization
  • Objective function is convex
  • So, the solution to optimization problem is relatively straight forward
Tip #2 Regularization - Involves adding penalty term in Error function. Two types of regularization in linear regression
  • Ridge
  • Lasso
Tip #3 - Stochastic Gradient Descent
  • Also called online or incremental gradient descent (in contrast to batch gradient descent, which uses the full dataset per step)
  • One example at a time, move at once
  • Cheaper computation
  • Randomization helps escape shallow valleys and poor ("silly") local minima
  • Simplest possible optimization
  • SGD is applied in Neural Networks
Tip #4 - Gradient Descent
  • Meant to minimize non-linear function
  • Error measure convex function
  • Finding local minimum
  • Initialize -> Iterate until termination -> Adjust learning rate -> Terminate on local minimum
  • Return Weights
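
As a small sketch of that loop (the quadratic objective, learning rate and tolerance below are arbitrary choices for illustration):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-6, max_iter=1000):
    """Initialize -> iterate -> stop when the step is tiny -> return weights."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = lr * grad(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # terminate near a (local) minimum
            break
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimum is at x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x0=[0.0]))
```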
Tip #5 - Bias and Variance
  • Models with too few parameters may lead to high bias
  • Models with too many parameters are inaccurate due to large variance
Happy Learning!!!

December 11, 2016

Day #46 - Recursive Feature Elimination

Recursive feature elimination is stepwise backward feature elimination.

Backward Search
  • Start with all features
  • Greedily remove the least relevant feature
  • Stop when the required minimum number of features remains
Recursive Feature Elimination
  • Train SVM
  • Rank the Features
  • Eliminate Feature with lowest Rank
  • Repeat until required number of features are retained
In each iteration RFE eliminates the feature with the minimum weight. The intuition is that the feature with the minimum weight has the least influence on the learned weight vector.
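
A hedged scikit-learn sketch of the same loop - train a linear SVM, rank features by weight, drop the weakest, repeat (the synthetic dataset and parameter values are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# RFE trains the SVM, ranks features by |weight|, and eliminates the weakest
# feature each iteration until only n_features_to_select remain.
selector = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=4, step=1)
selector.fit(X, y)
print(selector.support_)   # mask of retained features
print(selector.ranking_)   # 1 = retained; higher numbers were eliminated earlier
```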

Happy Learning!!!

Day #45 - Handling Imbalanced Classes

  • SMOTE - Synthetic minority over sampling technique
  • Sampling with Replacement
  • Sampling without Replacement
  • Under sampling of Majority Class, Oversampling of Minority Class
  • Collect more samples
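
A small sketch of over-sampling the minority class by sampling with replacement using scikit-learn's resample; note that SMOTE proper synthesizes new points rather than duplicating existing ones (typically via the third-party imbalanced-learn package). The data below is made up:

```python
import numpy as np
from sklearn.utils import resample

X = np.random.randn(100, 3)
y = np.array([0] * 90 + [1] * 10)          # imbalanced: 90 vs 10

X_min, y_min = X[y == 1], y[y == 1]
# Oversample the minority class (with replacement) up to the majority size.
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=90, random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))                   # now 90 / 90
```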
Happy Learning!!!

December 04, 2016

Day #44 - Real time and Batch Analytics - Vendors - Stack Comparison

Summary of analysis after evaluating different stacks




Happy learning!!!


December 03, 2016

Day #43 - Random Forest - One Page Summary


Consider how the mighty random forest...[From linkedin Post]
1. Handles both classification and regression.
2. Works with both categorical and numeric data.
3. Doesn't require centering/scaling of numeric data.
4. Is robust to outliers and over-fitting.
5. Works well on many business problems with hyperparameter default values.
6. Estimates generalization error.
7. Provides insights into feature importance.
8. Can be trained in parallel.
9. Provides an intuitive vehicle for understanding and working the bias-variance trade-off.
10. Supports problems with complex decision boundaries elegantly. 
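
A short sketch touching on points 5-8 above - scikit-learn defaults, an out-of-bag estimate of generalization error, feature importances and parallel training (the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,
    oob_score=True,   # estimate generalization error from out-of-bag samples
    n_jobs=-1,        # train trees in parallel
    random_state=0,
)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
print("Feature importances:", rf.feature_importances_)
```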

Happy Learning!!!

November 26, 2016

Day #42 - Classes in python

Today it's a bit more on classes in Python. The concept is similar to C# / C++ / Java.
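
A minimal sketch in that spirit (the class and attribute names are just illustrative):

```python
class Employee:
    """A simple class with a constructor, an instance method and a class attribute."""

    company = "Acme"                 # class attribute, shared by all instances

    def __init__(self, name, salary):
        self.name = name             # instance attributes
        self.salary = salary

    def annual_salary(self):
        return self.salary * 12


e = Employee("Asha", 1000)
print(e.name, e.company, e.annual_salary())
```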


November 11, 2016

Day #41 - Machine Learning Interesting Qns


I read through a lot of material. Some readings are very clear and worth bookmarking. Here are some of those questions and answers:
  1. How does KNN predict class label for new example ?
    • Find the K nearest neighbours of the example to be classified, then take the majority vote of their class labels
  2. Classification - Map input to discrete outputs
  3. Generative Model - Naive Bayes
  4. Discriminative Model - SVM, Decision Trees, Neural Networks, Boosting, KNN
  5. Regression - Map input to continuous outputs
  6. Decision Trees - Embedded (implicit) feature selection method
  7. PCA
    • Taking Data into a new space
    • Number of Eigen Values = Number of original dimensions
    • Pick the top k Eigen Value Vectors

  8. Linearly non-separable in the normal plane - with the SVM kernel technique we can project the data into a higher-dimensional space and make it linearly separable
  9. Linearly separable

Happy Learning!!!

November 05, 2016

Day #40 - Download Images from Web using Python

This post is about downloading images from a URL
  • Read from the input file
  • Perform recursive download for all files
  • Handle errors with try/except so each file downloads successfully
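
A minimal sketch of the same idea using only the standard library; the input file format (one image URL per line) and file names are assumptions:

```python
import os
import urllib.request

def download_images(url_file, out_dir="images"):
    """Read URLs from the input file and download each one, handling errors."""
    os.makedirs(out_dir, exist_ok=True)
    with open(url_file) as f:
        for i, url in enumerate(line.strip() for line in f if line.strip()):
            try:
                dest = os.path.join(out_dir, f"image_{i}.jpg")
                urllib.request.urlretrieve(url, dest)
                print("downloaded", url)
            except Exception as exc:   # one bad URL should not stop the run
                print("failed", url, exc)

# download_images("urls.txt")   # hypothetical input file
```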


October 31, 2016

Day #39 - Useful Tool MyMediaLite for Recommendations

This post is based on learnings from an assignment - link1, link2

Input is User-Items file as listed below


Sample Execution Command


We supply user 20 (in user20.txt) to identify recommendations for that user. The recommender type is specified with the --recommender parameter.

Happy Learning!!!

October 10, 2016

Day #36 - Pandas Dataframe Learnings
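
A few everyday DataFrame operations as a quick sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Chennai", "Delhi", "Chennai", "Mumbai"],
    "sales": [120, 80, 150, 90],
})

print(df.head())                            # first rows
print(df.describe())                        # summary statistics for numeric columns
print(df[df["sales"] > 100])                # boolean filtering
print(df.groupby("city")["sales"].sum())    # group-by aggregation
df["sales_k"] = df["sales"] / 1000          # derived column
```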

Happy Learning!!!

Day #35 - Bias Vs Variance


These are frequently occurring terms with respect to performance of model against training and testing data sets.

Classification error = Bias + Variance

Bias (Under-fitting)
  • Bias is high if the concept class cannot model the true data  distribution well, and does not depend on training set size.
  • High Bias will lead to under-fitting
How to identify High Bias
  • Training Error will be high
  • Cross Validation error also will be high (Both will be nearly the same)
Variance(Over-fitting)
  • High Variance will lead to over-fitting
How to identify High Variance
  • Training Error will be low
  • Cross Validation error also will be Very Very High compared to training error
How to Fix?
Variance decreases with more training data, and increases with more complicated classifiers

Happy Learning!!!

October 08, 2016

Day #34 - What is the difference between Logistic Regression and Naive Bayes?

Both are probabilistic
Logistic Regression
  • Discriminative (the entire approach is purely discriminative)
  • Models P(Y|X)
  • Final value lies between 0 and 1
  • Formula: exp(w0 + w1x) / (exp(w0 + w1x) + 1)
  • Equivalently: 1 / (1 + exp(-(w0 + w1x)))
Binary Logistic Regression - 2 class
Multinomial Logistic Regression - More than 2 class

Example - Link




Link - Ref
Logistic Regression
  • Classification Model
  • Probability of success as a sigmoid function of a linear combination of features
  • y belongs to (0,1) - 2 Class problem
  • p(yi) = 1 / (1 + e^-(w1x1 + w2x2))
  • Linear combination of features - w1x1+w2x2
  • w can be found with maximum likelihood estimation
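
The logistic formula above written out as a tiny sketch (the weights are arbitrary illustrative values):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w0, w1 = -1.0, 2.0          # illustrative coefficients
x = 0.8
p = sigmoid(w0 + w1 * x)    # P(y = 1 | x) under logistic regression
print(p)                    # equivalently exp(w0 + w1*x) / (1 + exp(w0 + w1*x))
```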
Naive Bayes
  • Generative Model
  • Models P(X|Y); conditional independence of features given the class is the Naive Bayes assumption
  • Distribution for each class
Happy Learning

October 04, 2016

October 02, 2016

Good Data Science Course Links


AI Lectures

Introduction to Machine Learning

Happy Learning!!!

Short Analytics Concept Videos



  • Descriptive Analysis (Analysis of existing data, Trends and Patterns), 
  • Diagnostic Analysis (Reasons / Patterns behind events)
  • Predictive Analytics (Future how will it look like) 
  • Prescriptive Analysis (How to be prepared / handle the future)

Great Compilation, Keep Learning!!!

October 01, 2016

Day #32 - Regularization in Machine Learning


A large coefficient will result in overfitting; regularization is performed to avoid it.
  • L1 - Penalty on the sum of absolute values (Lasso - Least Absolute Shrinkage and Selection Operator). The L1 constraint region has corners on the coordinate axes, so the solution often sets some coefficients exactly to zero. This results in variable elimination: features that contribute minimally are dropped.
  • L2 - Penalty on the sum of squares of values (Ridge). The L2 constraint region is circle shaped, so it shrinks all coefficients in the same proportion but eliminates none.
  • Discriminative - In SVM we use a hyperplane to classify the classes. This is an example of a discriminative approach.
  • Probabilistic - Assumes the data is generated by a Gaussian distribution. This is again based on the Central Limit Theorem: many points tend towards a normal distribution, so a Gaussian model is applied.
  • Max Likelihood - Probability that the point p belongs to one distribution. 
Good Read for L2 - Indeed, using the L2 loss comes from the assumption that the data is drawn from a Gaussian distribution

Another Read -

  • L1 Loss function minimizes the absolute differences between the estimated values and the existing target values. L1 loss function is more robust and is generally not affected by outliers
  • L2 loss function minimizes the squared differences between the estimated and existing target values. L2 error will be much larger in the case of outliers 
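
A quick sketch contrasting the two penalties on the same synthetic data - Lasso (L1) drives some coefficients exactly to zero while Ridge (L2) only shrinks them (the alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# Only the first two features actually matter in this synthetic target.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

print("Lasso:", Lasso(alpha=0.1).fit(X, y).coef_)   # several coefficients become exactly 0
print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_)   # all shrunk, none eliminated
```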

Happy Learning!!!

September 25, 2016

Persuading Organization to embrace analytics

Products and systems currently running need to look at their data collection techniques to identify more relevant data and perform better analytics. If current systems rely on point-in-time data and overwrite or archive historical records over time, we lose valuable information.

Why Analytics ?
  • Predict your future based on your past and present
  • Correct your mistakes before it's too late
  • Identify and correct poor performing segments of business

How Analytics differs from Business Intelligence ?
  • I have worked for ETL, data marts, Schemas for BI projects
  • BI helps summarize and compare business performance YoY, QoQ
  • Analytics is the next step beyond BI, looking at future trends

Where are we lagging ?

We need analytics but we do not have enough data points / features to perform analytics. Data collection is a key aspect. The underlying blood of Data science is collecting meaningful data and making models out of it. We need to devote sufficient time to collect data, pipeline it, process and aggregate it for Data Analysis, Modelling.

To evolve from a current product into a system with analytics capabilities we need to change the way we store and process data. Technical aspects, project deadlines, and resistance to change have to be handled to make things work.

Persist, Persuade, Implement....

Happy Learning!!!

September 05, 2016

Day #31 - Support Vector Machines

SVM
  • Support Vector Machines
  • Widest Street approach separating +ve and -ve classes, Separations as wide as possible
  • SVM works on classifying only two classes
  • Hard SVM (Strictly linearly separable)
  • Soft SVM (allows some points to fall on the wrong side; the constant C controls how much this is penalized)
  • Kernel Functions perform transformation of data
  • Using Kernel function we simulate idea of finding linear separator 
  • Kernels take data into higher dimensional space
  • Other Key concepts discussed (Lagrange Multipliers, Quadratic Optimization problem)
  • Lagrangian constraint transform from 1D to 2D data
  • SVM (Linear way of approximation)
  • Types of Kernels - Polynomial Kernel, Radial Basis Function Kernel, Sigmoid Kernel
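
A short scikit-learn sketch of the soft-margin constant C and a kernel choice (the dataset and parameter values are only for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A dataset that is not linearly separable in the original space.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional space;
# C controls how much points are allowed to fall on the wrong side (soft margin).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))
```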
Maths Behind it - Link
Good Relevant Read - SVM

Happy Data Analysis!!!

Day #30 - Machine Learning Fundamentals

Supervised Learning
  • Classification and Regression problems
  • Past data + Past outputs leveraged
  • Regression - Continuous Values
  • Classification - Discrete Labels
Unsupervised
  • Clustering - Discrete Labels
  • Dimensionality reduction - Continuous Values
Classifiers
  • SVM (Linear way of approximations)
  • KNN (Lazy learner)
  • Decision Tree (Rule based approach, Set of Rules)
  • Naive Bayes (Pick class with maximum probability)
Evaluation Methods
  • K-Fold Validation
  • Cross Validation
  • Ranking / Search - Relevance
  • Clustering - Intra-cluster and inter-cluster distances
  • Regression - Mean Square Error
  • ROC Curve 
Bagging
  • Build each classifier on a bootstrap sample (a random sample of the data drawn with replacement)
  • Repeat to build many classifiers and average / vote their predictions
  • Random Forests - Random combination of Trees
  • Randomly decide and split on attributes
Boosting
  • Multiple weak classifiers combined into a strong classifier
  • Examples are re-weighted so later learners focus on earlier mistakes
  • Adaboost - Adaptive boosting
Stacking
  • Use Output from one classifier as input for another classifier
  • Knn -> O/P -> SVM
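
A sketch of that KNN -> SVM stacking idea using scikit-learn's StackingClassifier (the estimator choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# KNN's cross-validated predictions become input features for the SVM meta-learner.
stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=5))],
    final_estimator=SVC(),
)
stack.fit(X, y)
print(stack.score(X, y))
```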
Happy Learning!!!

August 31, 2016

Day #29 - Decision Trees

  • Hierarchical, Divide and Conquer strategy, Supervised algorithm
  • Works on numerical data
  • Concepts discussed - Information gain, entropy computation (Shannon entropy)
  • Pruning based on chi-square / Shannon entropy
  • Convert all string / character into categorical / numerical mappings
  • You can also bucketize continuous variables
Basic Python pointers

Good Reads
Link1 , Link2, Link3, Link4, Link5, Link6

Happy Learning!!!

August 15, 2016

Day #28 - R - Forecast Library Examples

The following examples were discussed, using the R forecast library:
  • Moving Average
  • Single Exponential Smoothing - Uses single smoothing factor
  • Double Exponential Smoothing - Uses two constants and is better at handling trends
  • Triple Exponential Smoothing - Smoothing factor, trend, seasonal factors considered
  • ARIMA

Happy Learning!!!

August 08, 2016

Applied Machine Learning Notes


Supervised Learning
  • Classification (Discrete Labels)
  • Regression (Output is continuous, Example - Age, Stock prices)
  • Past data + Past Outputs used
Unsupervised Learning
  • Dimensionality reduction (Data in higher dimensions, Remove dimension without losing lot of information)
  • Reducing dimensionality makes it easy for computation (Continuous values)
  • Clustering (Discrete labels)
  • No Past outputs, Only current data
Reinforcement Learning
  • Game playing is learned without labelled outputs (reinforcement rather than supervised learning)
  • Learning Policy
  • Negative / Positive reward for each step
Type of Models
  • Inductive (Learn model, Learn from a function) vs Transductive (Lazy learning ex- Opinion from like minded people)
  • Online (learn from every new incoming tweet) vs Offline (look at the past 1 year of tweets)
  • Generative (Apply Gaussian on Data, Use ML and compute Mean / Variance) vs Discriminative (Two sides of Line)
  • Parametric vs Non-Parametric Models
Happy Learning!!!

July 31, 2016

Fifth Elephant Day #2

Fifth Elephant Day #2 - Part I

Session #1 - Content Marketing
  • Distribute relevant consistent content. Traditional vs Content Marketing
Challenges
  • Delivering content with speed. Channel proliferation (mobile, computers, tablets)
  • Intersection of Brands, Trends, Community Interests (Social media post and metrics)
  • Data from social media pages, online aggregators



Technical Details
  • Computation of term frequency, inverse document frequency
  • Using Solr, Lucene for Indexes
  • Cosine Similarity
  • Greedy Algorithm
Session #2 - Reasoning
  • Prediction vs Reasoning problem
  • Prediction Problems Evolution 
  • At Advanced level Deep Learning, XGBoost, Graphical models
When Apply prediction ?
Features as input -> Prediction performed (Independent, stateless)

Reasoning - Sequential, Stateful Exploration
Reasoning Problems - Diagnosis, routes, games, crossing roads

Flavours of Reasoning
  • Algorithmic (Search)
  • Logical reasoning
  • Bayesian probabilistic reasoning
  • Markovnian reasoning
Knowledge, learning the process of reasoning; knowledge graphs were shown as an implementation of reasoning
{subject, predicate, object}

Session #3 - Continuous online learning
  • 70% noise in C2B communication
  • 100% noise in B2C communication
  • Zipfian
Technicalities
  • Apriori - Market Basket Analysis
  • XGBoost - Alternative to DL
  • Bias - Variance Tradeoff
  • Spectral Clustering

Bird of Feathers Session
  • Google DeepMind (used to optimize data-centre cooling)
  • Bayesian Probabilistic Learning
  • Deep Learning - Build Hierarchy of features (OCR type of problems)
  • Traditional Neural Network (Fully Connected, lot of degree of freedom)
  • Structural causality (Subsystem appears before, Domain knowledge)
  • Temporal causality - This and then that happened
  • CNN - learning weights
  • Spectral clustering
  • PCA (reduce denser to smaller)
  • Deep Learning - Hidden layers obtained through coarse grained process
Deep Learning workshop Notes
  • Neural Networks
  • Multiple Layers
  • Lots of data
People Involved - Hinton, Andrew Ng, Bengio, LeCun

Deep Learning now
  • Speech recognition
  • Google Deep Models on Phone
  • Google street view (House numbers)
  • Imagenet
  • Captioning images
  • Reinforcement learning
Neural Networks
  • Simple mathematical units combine into complex functions
  • X-> input, W-> weights, Non linear function of output
Multiple Layers
  • Multiple hidden layers between input and output
  • Training hidden layers is challenge
Gradient Descent
  • Define loss function
  • Minimize by moving along gradient
Backpropagation
  • Move Errors back through the network
  • Chain rule conception
Tools
  • Caffe - network described via configuration files
  • Torch - describe the network in Lua
  • Theano - describes the computation, generates CUDA code, runs it and returns results
CNN
  • Used for images
  • Images are organized
  • Apply Convolutional filter
  • For Deep Learning GPU is important
Imagenet Competition
  • Convolution (Have all nice features retain them)
  • Pooling (Shrink image)
  • Softmax
  • Other
Simple RNNs - suffer from the vanishing/exploding gradient problem
LSTM (Long Short Term memory)
Interword relationships from corpus (word2vec)

Happy Learning!!!

July 28, 2016

Fifth Elephant Day #1 Notes - Part II

Sessions # - Link

Talk #3 - Machine Learning in FinTech
  • Lending Space
  • Credit underwriting system
India
  • 2% Credit card usage
  • 65% of population < 27 yrs
  • Digital foot print (mobile)
  • Identity (Aadhar)
40 Decisions / Minute -> 100 Crores a month

Use Cases / Scenarios
  • Truth Score (Validity of address / person / sources)
  • Need Score (Urgency / Time to respond application)
  • Saver Score (cash flow real-time analytics)
  • Credit Score (Debt to income)
  • Credit awareness score
  • Continuous risk assessments
Talk #4 - Driving Behaviour from Smartphone Sensors
  • For Safety driving using smartphone sensors
  • Spatial / location data
  • Road traffic injuries due to distracted driving
  • Phone usage - 4x crash risk
  • Speedy driving - 45% car crash history
  • Driving behavior analysis / driving feedback
  • GPS + Inertial Navigational sensors (Accelerometer / Gyroscope / Magnetometer)
Characterization
  • Drive detection
  • Event detection
  • Collision detection
Qualification
  • Drive summarization and scoring
  • Risk modelling
Optimization
  • Events, location of events, duration of events
Dynamics
  • Sensors
  • Availability - wide variety across devices
  • Raw Data - noisy, unevenly spaced time series
  • Events - Time scales, combination of sensors
  • Model building - Labelled vs unlabelled data, feature engineering
  • Algorithms - Stream / batch efficiency
Techniques
  • Cluster data 
  • Eliminated uninteresting time periods
  • Classification / Regression models
  • Spectral clustering
Talk #5 - Indian Agriculture
  • Crop rotation literacy
  • Data curation, Query tools on data product
  • Visualization and plotting of Agricultural data
Talk #6 and #7 - The last two talks were from ecologists
  • Using Image comparison for Big Cat Counting
  • Predicting Big Cat Areas (Territories)
  • Observe Nature, Frame Hypothesis, Design Experiments
  • Confront with competing hypothesis
  • Spacegap program
  • Markov chain Monte-Carlo technique


Happy Learning!!!

Fifth Elephant Day #1 Notes - Part I

Sessions # - Link

Talk #1 - Data for Genomic Analysis

Great talk by Ramesh. I had attended his session / technical discussion earlier. This session provided insights on genome / discrepancies in genome sequence leading to rare diseases.

Genome - 3 Billion X 2 Characters
Character variations vary from person to person
Stats (1/10th of probability of cancer)
Baseline risk for breast cancer (1/8),(1/70) ovarian cancer
BRCA1 mutation (5-6 fold increase in breast cancer, 27 fold increase for ovarian cancer)

In India
  • 35% inherited risk mutation
  • 1/25 Thalassemia 
  • 1 in 400-900 Retinitis Pigmentosa
  • 1 in 500, Hypertrophic Cardiomyopathy
Data Processing
  • 1 Billion reads - 100GB data per person
  • Very similar sequence yet one character might differ
  • But reference is 3 Billion long
Efficiency
  • Need fast indexing
  • Suffix Trees and variations
  • Hash table based approaches
Reference Genome Sequence
  • Volume of data
  • Funnel down of variety of dimensions
  • Triplet Code (Molecule)
  • Variants of triplets nailed down to differences in the genome
  • GPU processing / reduce computation time
Concepts Discussed / Used
  • Hypothesis Testing
  • Stats Models
  • GPU Processing to reduce computation time
They also provide assessment for hereditary diseases at corporate level.

Talk #2 - Alternative to Wall Street Data

This session gave me some new strategies to collect / analyze data

How to Identify occupancy rate at hotel ?
  •  Count of cars from parking lots
  •  Number of rooms lights on
  •  Take pics of rooms from corner of street and predict based on images collected
  •  Unconventional ways to think of data collection (Beating the wall street model)
What are usual ways
  •  Checking websites
From an investor's perspective, lodging key metrics are a very important aspect
Data Sources
  • Direct data gathering
  • Web harvesting
  • Primary research
Primary Research
  • Look at notice patterns in front of you
  • Difference in invoice numbers
  • Serial number changes, difference values
Free Data Sets in link
Lot of opportunity
  • Analyze international markets (India / China)
  • COGS
  • SG
  • ETC
How to value data sets ?
  • Scarcity - How widely used
  • Granularity - Time / aggregation level
  • Structured
  • Coverage



What is the generative value
  • Revenue Surprise Estimates
  • Dataset insight / Analysis
  • Operating GAAP measures
A great case study on the impact of smart watches vs luxury watches was presented. This session provided great insight into unconventional data collection ways:
  • Generate money in automated system
  • Stock sensitivity to revenue surprises
  • Identify underlying ground truth
"Some Refreshing changes to world of investment"

Happy Learning!!!

June 17, 2016

June 15, 2016

Day #26 - R - Moving Weighted Average

Example code based on a two-day workshop on the Azure ML module. A simple example of storing and accessing data from an Azure workspace.



Happy Learning!!!

June 01, 2016

Day #25 - Data Transformations in R

This post is on performing Data Transformations in R. This would be part of feature modelling. Advanced PCA will be done during later stages



Data Normalization in Python

Happy Learning!!!

May 20, 2016

Day #24 - Python Code Examples

Examples for - for loops, while loops, dictionaries, functions and plotting graphs

Happy Learning!!!

Day #23 - Newton Raphson - Gradient Descent

Newton Raphson
  • Optimization Technique
  • Newton's method tries to find a point x satisfying f'(x) = 0
  • Between two successive approximations
  • Stop iteration when difference between x(n+1) and x(n) is close to zero
Formula
  • For finding a root of f(x) = 0: x(n+1) = x(n) - f(x)/f'(x)
  • For minimization (finding f'(x) = 0): x(n+1) = x(n) - f'(x)/f''(x)
  • Choose a suitable starting value x0
Gradient Descent
  • Works for convex function
  • x(n+1) = x(n) - af'(x)
  • a - learning rate
  • Gradient descent tries to find such a minimum x by using information from the first derivative of f
  • Gradient descent and Newton-Raphson are similar; only the update rule differs
More Reads - Link
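
A small sketch comparing the two update rules on a single convex function f(x) = (x - 2)^2 + 1 (the starting point and learning rate are arbitrary):

```python
def f_prime(x):            # f(x) = (x - 2)^2 + 1
    return 2 * (x - 2)

def f_double_prime(x):
    return 2.0

# Newton-Raphson for minimization: x <- x - f'(x) / f''(x)
x = 10.0
for _ in range(5):
    x = x - f_prime(x) / f_double_prime(x)
print("Newton:", x)        # converges to 2 in one step for a quadratic

# Gradient descent: x <- x - a * f'(x), with learning rate a
x, a = 10.0, 0.1
for _ in range(100):
    x = x - a * f_prime(x)
print("Gradient descent:", x)
```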







Optimal Solutions
  • Strategy to get bottom of the valley, go-down in steepest slope
  • Measures local error function with respect to the parameter vector
  • Once gradient zero you have reached the minimum
  • Learning rate, steps to converge to a minimum
  • How to converge to Global vs Local Minimum
  • Gradient descent only guarantees a local minimum, not the global one
  • Whether the cost function contours are elongated or circular affects the speed of convergence


Happy Learning!!!

May 14, 2016

Day #22 - Data science - Maths Basics


Eigen Vector - Vector along which there is no change in direction

Eigen Value - Amount of Scaling factor defined by Eigen value

Eigen Value Decomposition - Eigen decomposition can be performed only on a square matrix

Trace - Sum of Eigen Values

Rank of A - Number of Non-Zero Eigen Values

SVD - Singular Value Decomposition
  • Swiss Army Knife of Linear Algebra
  • SVD - for Stock market Prediction
  • SVD - for Data Compression
  • SVD - to model sentiments
  • SVD is Greatest Gift of Linear Algebra to Data Science
  • The singular values of A are the square roots of the eigenvalues of AᵀA (A transpose times A)
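
A quick numpy check of that last point - the singular values of A equal the square roots of the eigenvalues of AᵀA (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
eigvals = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of AᵀA (ascending order)

print(s)                                 # singular values (descending)
print(np.sqrt(eigvals)[::-1])            # square roots of the eigenvalues, same numbers
```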
Happy Learning!!! (Revise  - Relearn - Practice)

May 09, 2016

Day #21 - Data Science - Maths Basics - Vectors and Matrices

Matrix - Combination of rows and columns
Check for linear dependence - row-reduce (e.g. R2 = R2 - 2R1); when one of the rows becomes all zeros, the rows are linearly dependent
Span - Linear combination of vectors
Rank - Linearly Independent set

Good Related Read - Span

Vector Space - Space of vectors, collection of many vectors
If v, w belong to the space, v + w also belongs to the space; a scalar multiple of a vector also lies in the space
If the determinant is non-zero, then the vectors are linearly independent. Otherwise, they are linearly dependent

Vector space properties
  • Commutative  x+y = y+x
  • Associative (x+y)+z = x+(y+z)
  • Origin vector - Vector with all zeros, 0+x = x+0 = x
  • Additive (Inverse) - For every X there exists -x such that x+(-x) = 0
  • Distributivity over scalar sum, (r+s)x = rx+sx
  • Distributivity over vector sum, r(x+y) = rx+ry
  • Identity multiplication, 1*x = x
Subspace
Vector Space V, Subset W. W is called subspace of V
Properties
W is subspace in following conditions
  • Zero vector belongs to W 
  • if u and v are vectors, u+v is in W (closure under +)
  • if v is any vector in W, and c is any real number, c.v is in W
Any vector v in the span of a subset S of V can be represented as a linear combination
 v = r1v1 + r2v2 + ... + rkvk
where v1, v2, ..., vk are distinct vectors from S and each ri belongs to R

Basis - a linearly independent spanning set. A set is a basis if every vector in the vector space is a linear combination of the set. All bases of a vector space V have the same cardinality.

Null Space, Row Space, Column Space
Let A be m x n matrix
  • Null Space - the set of all solutions of Ax = 0; the null space of A, denoted Null A, is the set of all homogeneous solutions of Ax = 0
  • Row Space - the subspace of R^n spanned by the row vectors of A
  • Column Space - the subspace of R^m spanned by the column vectors of A
Norms - Measure of length and magnitude
  • For (1,-1,2), L1 Norm = Absolute value = 1+1+2 = 4
  • L1 - Same Angle
  • L2 - Plane
  • L3 - Sum of vectors in 3D space
  • L2 norm of (5,2) = sqrt(5*5 + 2*2) = sqrt(29)
  • L infinity - Max of (5,2) = 5
Orthogonal - Dot product equals Zero
Orthogonality - orthogonal (perpendicular) non-zero vectors are linearly independent
Orthogonal matrix will always have determinant +/-1


Differential Equations - Notes - Link


Lectures - Link

Course Notes - Link

Happy Learning!!!

May 08, 2016

Day #20 - PCA basics

Machine Learning Algorithms adjusts itself based on the input data set. Very different from traditional rules based / logic based systems. The capability to tune itself and work according to changing data set makes it self-learning / self-updating systems. Obviously, the inputs / updated data would be supplied by humans.

Basics
  • Line is unidirectional, Square is 2D, Cube is 3D
  • Fundamentally shapes are just set of points
  • For a N-dimensional space it is represented in N-dimensional hypercube
Feature Extraction
  • Converting a feature vector from Higher to lower dimension
PCA (Principal Component Analysis)
  • Input is a large number of correlated variables. We perform an orthogonal transformation to convert them into uncorrelated variables, and identify principal components based on the highest variation
  • Orthogonal vector - Dot product equals zero. The components perpendicular to each other
  • This is achieved using SVD (Singular Value Decomposition)
  • SVD internally solves the matrix and identifies the Eigen Vectors
  • Eigen vector does not change direction when linear transformation is applied
  • PCA is used to explain variations in data. Find principal component with largest variation, Direction with next highest variation (orthogonal for first PCA)
  • Rotation or Reflection is referred as Orthogonal Transformation
  • PCA - Use components with high variations
  • SVD - Express Data as a Matrix
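
A compact scikit-learn sketch - fit PCA on correlated data and check that most of the variance sits in the first component (the data is synthetic):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
x1 = rng.randn(200)
x2 = 2 * x1 + 0.3 * rng.randn(200)      # strongly correlated with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                # data expressed in uncorrelated components
print(pca.explained_variance_ratio_)    # most of the variance sits in the first component
```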
More Reads

Happy Learning!!!

May 03, 2016

Day #19 - Probability Basics

Concepts
  • Events - Subset of Sample Space
  • Sample Space - Set of all possible outcomes
  • Random Variable - Outcome of experiment captured by Random variable
  • Permutation - Ordering matters
  • Combination - Ordering does not matter
  • Binomial - Only two outcomes per trial
  • Poisson - Events that take place over and over again. Rate of Event denoted by lambda
  • Geometric - Suppose you'd like to figure out how many attempts at something is necessary until the first success occurs, and the probability of success is the same for each trial and the trials are independent of each other, then you'd want to use the geometric distribution
  • Conditional Probability - P(A|B) = probability that A occurs given that B has already occurred
  • Normal Distribution - Appears because of central limit theorem (Gaussian and Normal Distribution both are same)
From Quora -  
"Consider a binomial distribution with parameters n and p. The distribution is underlined by only two outcomes in the run of an independent trial- success and failure. A binomial distribution converges to a Poisson distribution when the parameter n tends to infinity and the probability of success p tends to zero. These extreme behaviours of the two parameters make the mean constant i.e. n*p = mean of Poisson distribution "

May 01, 2016

Day #18 - Linear Regression , K Nearest Neighbours

Linear Regression
  • Fitting straight line to set of data points
  • Create line to predict new values based on previous observations
  • Uses OLS (Ordinary Least Squares). Minimize squared error between each point and line
  • Maximum likelihood estimation
  • R squared - Fraction of total variation in Y
  • R Squared close to 0 - terrible fit
  • R Squared close to 1 - good fit
  • High R Squared means a good fit
Linear Regression (Ref - Link )
  • ML Model to predict continuous variables based on set of features
  • Used where target variable is continuous
  • Minimize residuals of points from the line
  • Find line of best fit
  • y = mx + c
  • Sum of squared residuals = sum((y - mx - c)^2)
  • Reduce residuals
  • Assumptions in LR
  • Linearity, residuals following a Gaussian (normal) distribution, independence of errors, constant variance (homoscedasticity)
Updated May 28/ 2020



KNN
  • Supervised Machine Learning Technique
  • New Data point classify based on distance between existing points
  • Choice of K - Small enough to pick neighbours
  • Determine value of K based on trial tests
  • K nearest neighbours on scatter plot and identify neighbours
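
Both models side by side as a scikit-learn sketch (the data and the choice of K are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Linear regression: fit a line y = m*x + c by ordinary least squares.
X = np.arange(10).reshape(-1, 1)
y = 3 * X.ravel() + 2 + np.random.randn(10)
lr = LinearRegression().fit(X, y)
print(lr.coef_, lr.intercept_, lr.score(X, y))   # slope, intercept, R squared

# KNN: classify a new point by the majority vote of its K nearest neighbours.
Xc = np.array([[1, 1], [2, 1], [8, 8], [9, 9]])
yc = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(Xc, yc)
print(knn.predict([[7, 7]]))
```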
Related Read
Recommendation Algo Analysis
Linear Regression
Linear Regression - Concept and Theory
Linear Regression Problem 1
Linear Regression Problem 2
Linear Regression Problem 3

Happy Learning!!!

April 22, 2016

Day #17 - Python Basics

Happy Learning!!!

Neural Networks Basics


Notes from Session
  • Neurons - Synapses. Model brain at high level
  • Machine Learning  - Algorithms for classification and prediction
  • Mimic brain structure in technology
  • Recommender engines use neural networks
  • With more data we can increase accuracy of models
  • Linear regression, y = mx + b. Fit the data set with as little error as possible.
Neural Network
  • Equation starts from neuron
  • Multiply weights to inputs (Weights are coefficients)
  • Apply activation function (Depends on problem being solved)
Basic Structure
  • Input Layer
  • Hidden Layer (Multiple hidden layers) - Computation done @ hidden layer
  • Output Layer
  • Supervised learning (Train & Test)
  • Loss function determines how error looks like
  • Deep Learning - Automatic Feature Detection


Happy Learning!!!

April 14, 2016

Basics - SUPPORT VECTOR MACHINES

Good Reading from link

Key Notes
  • Allow non-linear decision boundaries
  • SVM - Out of box supervised learning technique
  • Feature Space - Finite dimensional vector space
  • Each dimension represents feature
  • Goal of SVM - Train a model that assigns unseen objects to a particular category
  • Creates linear partition of feature space
  • Based on the features, it places an object above or below the separating hyperplane
  • No stochastic element involved (No involvement of any previous state status)
  • Support vector classifiers or soft margin classifiers - allow some observations to be on the incorrect side of the hyperplane (a soft margin)
Advantage
  • High Dimensionality, Memory Efficiency, Versatility
Disadvantages
  • Non probabilistic
More Reads

Happy Learning!!!

Day #16 - Python Basics

Happy Learning!!!

April 10, 2016

Probability Tips

  • Discrete random variables are things we count
  • A discrete variable is a variable which can only take a countable number of values
  • Probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value.
  • Continuous random variables are things we measure
  • A continuous random variable is a random variable where the data can take infinitely many values.
  • Probability density function (PDF), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value
  • Bernoulli process is a finite or infinite sequence of binary random variables
  • Markov Chain - stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event
Let's Continue Learning!!!

Day #15 - Data Science - Maths Basics

Day #15 - Mathematics Basics

Sets Basics
  • Cardinality - Number of distinct elements in Set (For a Finite Set)
  • For Real numbers cardinality infinite
Rational Numbers - Made of Ratio of two numbers
Fibonacci series was introduced in 1202 - Amazing :)

Functions
  • Represents relationship between mathematical variables
  • Spread of all possible output is called range
  • Function that maps from A to B. A is referred as (Domain), B is referred as co-domain
Matrix
  • Rows and columns define matrix 
  • 2D array of numbers 
  • Eigen Values - Scalars, Eigen Vector - Vectors special set of values associated with Matrix M
  • Eigen Vectors - Those directions remain unchanged by action of matrix M
  • Trace - Sum of diagonal elements
  • Rank of Matrix - Number of linearly independent vectors
Determinant
  • Can be computed only for square matrix
Vector
  • Vectors have magnitude, length and direction
  •  Magnitude and the cosine of the angle give you the direction
  • Vector product non-commutative
  •  Dot product commutative
  •  Vector is linearly independent if none of vectors can be written as sum of multiple of other vectors



 Happy Learning!!!

April 09, 2016

April 07, 2016

Day #13 - Maths and Data Science

  • Recommender Systems - Pure matrix decomposition problem
  • Deep Learning - Matrix Calculus
  • Google Search - Page Rank, Social Media Graph Analysis - Eigen Decomposition
Happy Learning!!!

April 04, 2016

Ensemble



  • Combine many predictors and provide weighted average
  • Use single kind of learner but multiple instances
  • Collection of "Ok" predictors and combine them making them powerful
  • Learn Predictors and combine them using another new model 
  • One layer of predictors providing features for next layer 
Happy Learning!!!

March 31, 2016

March 30, 2016

Good Read - Winning Data Science Competitions






Interviews



Questions From Data Science Interviews

Let your work do the talking
Highly Creative Person - More(['work','attention','energy']) -> ['More Benefit']

Happy Learning!!!

Interesting Read - Data Science & Applications

Happy Reading!!!

March 29, 2016

Good Data Science Tech Talk

Today I spent some time with Tech talk on predictive modelling. Good Coverage of fundamentals, Needs revision again.


Read William Chen's answer to What are the most common mistakes made by aspiring data scientists? on Quora

Happy Learning!!!!

March 28, 2016

Data Science Day #12 - Text Processing in R

Today's post is basics on text processing using R. We look at removing stop words, numbers, punctuations, lower case conversion etc..


Happy Learning!!!

March 27, 2016

Day #11 - Data Science Learning Notes - Evaluating a model

R Square - Goodness of Fit Test
  • R square = (1- (Sum of Squares of Error/Sum of Squares of Total))
  • SST - Variance of dependent variable
  • SSE - Variance of Actual vs Predicted Values
Adjusted R Square 
  • Adjusted R Square = 1 - ((1 - R Square) * (n - 1)/(n - p - 1))
  • P - Number of independent variables
  • n - records in dataset
RMSE (Root mean square error)
  • For every record predicted compute error 
  • Square it and find mean
  • RMSE should be similar for the training and testing datasets
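
The three measures above computed directly, as a small sketch on made-up values:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])
n, p = len(y), 1                          # n records, p independent variables

sse = np.sum((y - y_pred) ** 2)           # variance of actual vs predicted
sst = np.sum((y - y.mean()) ** 2)         # variance of the dependent variable
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
rmse = np.sqrt(np.mean((y - y_pred) ** 2))

print(r2, adj_r2, rmse)
```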
Bias (Underfit)
  • Model can't explain the dataset
  • R Square value very less
  • Add more Independent variable
Variance
  • RMSE High for test dataset, RMSE low for training dataset
  • Cut down Independent variable
Collinearity Problem
  • Conduct a significance test (p-values) to check whether the null hypothesis holds
Next Pending Reads
  • Subset Selection Technique
  • Cross Validation Technique
  • Z test / P Test
Happy Learning!!!

March 22, 2016

Day #10 Data Science Learning - Correlations

Correlation
  • If you have correlation you can use machine learning to predict variables
  • Mutual relationship connection between two or more things
  • Correlation shows inter dependence between two variables
  • Measure - How much one changes when other also changes ?
  • Popularly Used - Pearson Correlation coefficient
  • Value ranges from -1 to +1
  • Negative correlation (Closer to -1) - One value goes up other goes down
  • Closer to Zero (No Correlation)
  • Closer to 1 (Positive Correlation)
Correlation - Relationship between two values
Causation - Reason for change in value (Cholesterol vs weight, Dress Size Vs Cholesterol). Identify if it is incidental.

Handling highly correlated variables
  • Remove one of the two correlated variables; when correlation = 1 or -1, one of the variables is fully redundant.
  • Perform a PCA
  • Permutation feature importance (Link)
  • Greedy Elimination (GE): iteratively eliminate feature of highest correlated feature pair
  • Recursive Feature Elimination (RFE): recursive elimination of features with respect to their importance
  • Lasso Regularisation (LR): use L1 regularisation to drop features whose weights are driven to zero
  • Principal Component Analysis (PCA): transform the data set with PCA and choose the components with the highest variation
Ref - Link1, Link2, Link3

Happy Learning!!!

March 20, 2016

Data Science Tip Day #8 - Dealing with Skewness for Error histograms after Linear Regression

In previous posts we saw histogram-based validation of errors. When a left- or right-skewed distribution is observed, some transformations to apply are:
  • Right Skewed - Apply Log
  • Slightly Right - Square root
  • Left Skewed - Exponential
  • Slightly Left - Square root, Cube
Happy Learning!!!

March 19, 2016

Data Science Tip Day#7 - Interaction Variables

This post is using interaction variables while performing linear regression

For illustration purposes let's construct a dataset with three vectors (y, x, z)



Happy Learning!!!

Data Science Tip Day#6 - Linear Regression, Polynomial Regression


We will be renaming the R tips to Data Science Tips going forward. Today we will look at linear regression, polynomial regression and variable interactions. Today's stats class was very useful and interesting. Any topic of interest needs practice to master the fundamentals.

Linear Regression
Assume a Linear Equation y = mx+b
  • Here m is slope, b is intercept value at x = 0
  • Similarly, we use the equation to express relationship between dependent and independent variables
  • In this equation y = mx+b, y is the dependent variable. x is the independent variable
For illustration purposes let's construct a dataset with two vectors (y, x)




Happy Learning!!!

March 17, 2016

R Day #5 Tip of the Day - Linear Regression

I have taken up Udemy Data Science Classes. Below notes from Linear Regression Classes

Linear Regression 
  • Analyze relationship between two or multiple variables
  • Goal is preparing equation (Outcome y, Predictors x)
  • Estimate value of dependent and independent variables using relationship equations
  • Used for Continuous variables that have some correlation between them
  • Goodness of fit test to validate model
Linear Equations
  • Explains relationship between two variables
  •  X (Independent- Predictor), Y (Dependent)
  •  Y = AX + B
  •  A (Slope) = change in Y / change in X
  •  B - Intercept (Value of Y when X =0)
  •  Equation becomes predictor of Y
Fitting Line
  • Sum of squares of vertical distances minimal
  • Best Line = least residual
  • Difference between model and actual values are called as residuals
Goodness of Fit
  • R square measure 
  • Sum of squares of distances (Sum of squares of vertical distances minimal)
  • Uses residual values
  • Higher R square value better the fit (close to 1 higher fit)
  • Higher Correlation means better fit (R square will also be high) 
Multiple Regression
  • Multiple predictors involved (X Values)
  • More than one independent variable used to predict dependent variable
  • Y = A1X1 + A2X2 + A3X3 + ... + ApXp + B

Homoscedasticity -  all random variables in the sequence or vector have the same finite variance
heteroscedasticity -  variability of a variable is unequal across the range of values of a second variable that predicts it
Pearson's correlation coefficient (r) is a measure of the strength of the association between the two variables

Happy Learning!!!

March 16, 2016

R Day #4 - Tip for the Day - Learning Resources


I came across this site NYU Stats R Site

Please bookmark it for a good walkthrough of and learning on core R topics

Happy Learning!!!

March 13, 2016

R Day #3 - Tip for the Day

This is based on reading from notes from link 

Logistic Regression
  • Applied when response is binary
  • (0/1, Yes/No, etc.); also known as a dichotomous outcome variable
Binomial probability model
  • consists of (i) n independent trials where 
  • (ii) each trial results in one of two possible outcomes (Yes/No, 1/0)
  • (iii) the probability p of a success stays the same for each trial
Maximum likelihood - Find the value of the parameter(s) (in this case p) which makes the observed data most likely to have occurred

Poisson Regression
Applied for below situations
  • The occurrences of the event of interest in non-overlapping “time” intervals are independent
  • The probability of two or more events in a small time interval is small, and
  • The probability that an event occurs in a short interval of time is proportional to the length of the time interval
  • Heteroscedasticity - means unequal error variances
Negative Binomial Model
  • The Poisson model does not always provide a good fit to a count response. 
  • An alternative model is the negative binomial distribution
Happy Learning!!!

March 11, 2016

Day #2 - Multivariate Linear Regression - R

  • More than one predictor involved in this case

Happy Learning!!!

March 10, 2016

R Day #1 - Simple Linear Regression - Slope

UCLA Notes were very useful.

Linear Regression Model - Represents the mean of the response variable as a function of slope and intercept parameters. Can be used for predictions. I have earlier used a moving average algorithm for forecasting.
  • Simple Linear Regression - one explanatory (independent) variable and one dependent variable
  • Multivariate Linear Regression - Number of Explanatory variables more than 1
Good Summary of Data Quality Issues were summarized
  • Data-entry errors
  • Missing values
  • Outliers
  • Unusual (e.g. asymmetric) distributions
  • Unexpected patterns
R Cookbook had good step by step examples to try out - link

Basics Maths Again

Slope - the line's rate of change in the vertical direction

y = mx + b
  • y = dependent variable as y depends on x
  • x = independent variable
  • m , b = characteristics of line
  • b = y intercept where line crosses y axis
Ref - Link

Slope = Rise / Run = Change in Y / Change in X

Example: for the line y = x, the points (1, 1) and (2, 2) lie on the line

Slope = (y2 - y1) / (x2 - x1) = (2 - 1) / (2 - 1) = 1

Slope > 1 - the line tilts more towards the y-axis (steeper than 45 degrees)
Slope < 1 - the line tilts more towards the x-axis (flatter than 45 degrees)




Ref - Link


Ref - Link

Happy Learning!!!