"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

December 17, 2016

Day #50 - Recommendation Systems

Recommendation Systems
  • Content Based
  • Collaborative (User-User / Item-Item)
Content-Based Key Features - Recommends based on the user's own taste and historical behaviour; no need for other users' data
Pros 
  • No Need of Data from Other users
  • Recommend New and Unpopular items
Cons
  • Finding appropriate features is hard
  • Unable to Exploit Quality judgment from other users
Collaborative Key Features - Recommendation based on similar users / similar items. 
Item-Item 
  • For item-i, find other similar items
  • Better than user-user
  • Need enough users data
  • Works on any kind of item; no feature selection is needed
User-User
  • Find users who have bought / rated similar items
  • Hard to find users who have rated the same items
More Advanced Methods - Latent Factor Models
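
Below is a minimal item-item similarity sketch on a made-up ratings matrix - cosine similarity between item columns, which is the core of item-item collaborative filtering (the data and names are purely illustrative):

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items); 0 = not rated.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two item rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a.dot(b) / denom if denom else 0.0

# For item 0, find the other items most similar to it (item-item collaborative filtering).
target = 0
sims = [(j, cosine_sim(ratings[:, target], ratings[:, j]))
        for j in range(ratings.shape[1]) if j != target]
print(sorted(sims, key=lambda x: -x[1]))
```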

Happy Learning!!!

Day #49 - Clustering Key Notes






Happy Learning!!!

December 13, 2016

Day #47 - Deep Dive - Learnings

Tip #1 - Support Vector Machines
  • Performs classification by finding the optimal separating hyperplane that separates the two classes and maximizes the distance to the closest point from either class (this distance is the margin)
  • Training involves non-linear optimization
  • Objective function is convex
  • So, the solution to optimization problem is relatively straight forward
Tip #2 Regularization - Involves adding penalty term in Error function. Two types of regularization in linear regression
  • Ridge
  • Lasso
Tip #3 - Stochastic Gradient Descent
  • Also called online or incremental gradient descent (in contrast to batch gradient descent, which uses the full dataset per step)
  • One example at a time, move at once
  • Cheaper computation
  • Randomization helps escape shallow valleys and poor ("silly") local minima
  • Simplest possible optimization
  • SGD is applied in Neural Networks
Tip #4 - Gradient Descent
  • Meant to minimize non-linear function
  • Error measure convex function
  • Finding local minimum
  • Initialize -> Iterate until termination -> Adjust learning rate -> Terminate on local minimum
  • Return Weights
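
As a small sketch of that loop (the quadratic objective, learning rate and tolerance below are arbitrary choices for illustration):

```python
import numpy as np

def gradient_descent(grad, x0, lr=0.1, tol=1e-6, max_iter=1000):
    """Initialize -> iterate -> stop when the step is tiny -> return weights."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        step = lr * grad(x)
        x = x - step
        if np.linalg.norm(step) < tol:   # terminate near a (local) minimum
            break
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); the minimum is at x = 3.
print(gradient_descent(lambda x: 2 * (x - 3), x0=[0.0]))
```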
Tip #5 - Bias and Variance
  • Models with too few parameters may lead to high bias
  • Models with too many parameters are inaccurate due to large variance
Happy Learning!!!

December 11, 2016

Day #46 - Recursive Feature Elimination

Recursive feature elimination is stepwise backward feature elimination.

Backward Search
  • Start with all features
  • Greedily remove the least relevant feature
  • Stop when the required minimum number of features remains
Recursive Feature Elimination
  • Train SVM
  • Rank the Features
  • Eliminate Feature with lowest Rank
  • Repeat until required number of features are retained
In each iteration RFE eliminates the feature with the minimum weight. The intuition is that the feature with the minimum weight has the least influence on the learned weight vector.
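
A hedged scikit-learn sketch of the same loop - train a linear SVM, rank features by weight, drop the weakest, repeat (the synthetic dataset and parameter values are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# RFE trains the SVM, ranks features by |weight|, and eliminates the weakest
# feature each iteration until only n_features_to_select remain.
selector = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=4, step=1)
selector.fit(X, y)
print(selector.support_)   # mask of retained features
print(selector.ranking_)   # 1 = retained; higher numbers were eliminated earlier
```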

Happy Learning!!!

Day #45 - Handling Imbalanced Classes

  • SMOTE - Synthetic minority over sampling technique
  • Sampling with Replacement
  • Sampling without Replacement
  • Under sampling of Majority Class, Oversampling of Minority Class
  • Collect more samples
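
A small sketch of over-sampling the minority class by sampling with replacement using scikit-learn's resample; note that SMOTE proper synthesizes new points rather than duplicating existing ones (typically via the third-party imbalanced-learn package). The data below is made up:

```python
import numpy as np
from sklearn.utils import resample

X = np.random.randn(100, 3)
y = np.array([0] * 90 + [1] * 10)          # imbalanced: 90 vs 10

X_min, y_min = X[y == 1], y[y == 1]
# Oversample the minority class (with replacement) up to the majority size.
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=90, random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(np.bincount(y_bal))                   # now 90 / 90
```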
Happy Learning!!!

December 04, 2016

Day #44 - Real time and Batch Analytics - Vendors - Stack Comparison

Summary of analysis after evaluating different stacks




Happy learning!!!


December 03, 2016

Day #43 - Random Forest - One Page Summary


Consider how the mighty random forest...[From linkedin Post]
1. Handles both classification and regression.
2. Works with both categorical and numeric data.
3. Doesn't require centering/scaling of numeric data.
4. Is robust to outliers and over-fitting.
5. Works well on many business problems with hyperparameter default values.
6. Estimates generalization error.
7. Provides insights into feature importance.
8. Can be trained in parallel.
9. Provides an intuitive vehicle for understanding and working the bias-variance trade-off.
10. Supports problems with complex decision boundaries elegantly. 
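
A short sketch touching on points 5-8 above - scikit-learn defaults, an out-of-bag estimate of generalization error, feature importances and parallel training (the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,
    oob_score=True,   # estimate generalization error from out-of-bag samples
    n_jobs=-1,        # train trees in parallel
    random_state=0,
)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
print("Feature importances:", rf.feature_importances_)
```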

Happy Learning!!!

November 26, 2016

Day #42 - Classes in python

Today it's a bit more on classes in Python. The concept is similar to C# / C++ / Java.
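
A minimal sketch in that spirit (the class and attribute names are just illustrative):

```python
class Employee:
    """A simple class with a constructor, an instance method and a class attribute."""

    company = "Acme"                 # class attribute, shared by all instances

    def __init__(self, name, salary):
        self.name = name             # instance attributes
        self.salary = salary

    def annual_salary(self):
        return self.salary * 12


e = Employee("Asha", 1000)
print(e.name, e.company, e.annual_salary())
```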


November 11, 2016

Day #41 - Machine Learning Interesting Qns


I read through a lot of material. Some readings are very clear and worth bookmarking. Here are some of those questions and answers:
  1. How does KNN predict class label for new example ?
    • Find the K nearest neighbours of the example to be classified, then take the majority vote of their class labels
  2. Classification - Map input to discrete outputs
  3. Generative Model - Naive Bayes
  4. Discriminative Model - SVM, Decision Trees, Neural Networks, Boosting, KNN
  5. Regression - Map input to continuous outputs
  6. Decision Trees - Embedded (implicit) feature selection method
  7. PCA
    • Taking Data into a new space
    • Number of Eigen Values = Number of original dimensions
    • Pick the top k Eigen Value Vectors

  8. Linearly non-separable in the normal plane - with the SVM kernel technique we can project the data into a higher-dimensional space and make it linearly separable
  9. Linearly separable

Happy Learning!!!

November 05, 2016

Day #40 - Download Images from Web using Python

This post is about downloading images from a URL
  • Read from the input file
  • Perform recursive download for all files
  • Handle errors with try/except so each file downloads successfully
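
A minimal sketch of the same idea using only the standard library; the input file format (one image URL per line) and file names are assumptions:

```python
import os
import urllib.request

def download_images(url_file, out_dir="images"):
    """Read URLs from the input file and download each one, handling errors."""
    os.makedirs(out_dir, exist_ok=True)
    with open(url_file) as f:
        for i, url in enumerate(line.strip() for line in f if line.strip()):
            try:
                dest = os.path.join(out_dir, f"image_{i}.jpg")
                urllib.request.urlretrieve(url, dest)
                print("downloaded", url)
            except Exception as exc:   # one bad URL should not stop the run
                print("failed", url, exc)

# download_images("urls.txt")   # hypothetical input file
```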


October 31, 2016

Day #39 - Useful Tool MyMediaLite for Recommendations

This post is based on learnings from an assignment - link1, link2

Input is User-Items file as listed below


Sample Execution Command


We supply user 20 (in user20.txt) to identify recommendations for that user. The recommender type is specified with the --recommender parameter.

Happy Learning!!!

October 10, 2016

Day #36 - Pandas Dataframe Learnings
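
A few everyday DataFrame operations as a quick sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Chennai", "Delhi", "Chennai", "Mumbai"],
    "sales": [120, 80, 150, 90],
})

print(df.head())                            # first rows
print(df.describe())                        # summary statistics for numeric columns
print(df[df["sales"] > 100])                # boolean filtering
print(df.groupby("city")["sales"].sum())    # group-by aggregation
df["sales_k"] = df["sales"] / 1000          # derived column
```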

Happy Learning!!!

Day #35 - Bias Vs Variance


These are frequently occurring terms with respect to performance of model against training and testing data sets.

Classification error = Bias + Variance

Bias (Under-fitting)
  • Bias is high if the concept class cannot model the true data  distribution well, and does not depend on training set size.
  • High Bias will lead to under-fitting
How to identify High Bias
  • Training Error will be high
  • Cross Validation error also will be high (Both will be nearly the same)
Variance(Over-fitting)
  • High Variance will lead to over-fitting
How to identify High Variance
  • Training Error will be low
  • Cross Validation error also will be Very Very High compared to training error
How to Fix?
Variance decreases with more training data, and increases with more complicated classifiers

Happy Learning!!!

October 08, 2016

Day #34 - What is the difference between Logistic Regression and Naive Bayes?

Both are probabilistic
Logistic Regression
  • Discriminative (the entire approach is purely discriminative)
  • Models P(Y|X)
  • Final value lies between 0 and 1
  • Formula: exp(w0 + w1x) / (exp(w0 + w1x) + 1)
  • Equivalently: 1 / (1 + exp(-(w0 + w1x)))
Binary Logistic Regression - 2 class
Multinomial Logistic Regression - More than 2 class

Example - Link




Link - Ref
Logistic Regression
  • Classification Model
  • Probability of success as a sigmoid function of a linear combination of features
  • y belongs to (0,1) - 2 Class problem
  • p(yi) = 1 / (1 + e^-(w1x1 + w2x2))
  • Linear combination of features - w1x1+w2x2
  • w can be found with maximum likelihood estimation
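
The logistic formula above written out as a tiny sketch (the weights are arbitrary illustrative values):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w0, w1 = -1.0, 2.0          # illustrative coefficients
x = 0.8
p = sigmoid(w0 + w1 * x)    # P(y = 1 | x) under logistic regression
print(p)                    # equivalently exp(w0 + w1*x) / (1 + exp(w0 + w1*x))
```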
Naive Bayes
  • Generative Model
  • Models P(X|Y); conditional independence of features given the class is the Naive Bayes assumption
  • Distribution for each class
Happy Learning

October 04, 2016

October 02, 2016

Good Data Science Course Links


AI Lectures

Introduction to Machine Learning

Happy Learning!!!

Short Analytics Concept Videos



  • Descriptive Analysis (Analysis of existing data, Trends and Patterns), 
  • Diagnostic Analysis (Reasons / Patterns behind events)
  • Predictive Analytics (Future how will it look like) 
  • Prescriptive Analysis (How to be prepared / handle the future)

Great Compilation, Keep Learning!!!

October 01, 2016

Day #32 - Regularization in Machine Learning


A large coefficient will result in overfitting; regularization is performed to avoid it.
  • L1 - Penalty on the sum of absolute values (Lasso - Least Absolute Shrinkage and Selection Operator). The L1 constraint region has corners on the coordinate axes, so the solution often sets some coefficients exactly to zero. This results in variable elimination: features that contribute minimally are dropped.
  • L2 - Penalty on the sum of squares of values (Ridge). The L2 constraint region is circle shaped, so it shrinks all coefficients in the same proportion but eliminates none.
  • Discriminative - In SVM we use a hyperplane to classify the classes. This is an example of a discriminative approach.
  • Probabilistic - Assumes the data is generated by a Gaussian distribution. This is again based on the Central Limit Theorem: many points tend towards a normal distribution, so a Gaussian model is applied.
  • Max Likelihood - Probability that the point p belongs to one distribution. 
Good Read for L2 - Indeed, using the L2 loss comes from the assumption that the data is drawn from a Gaussian distribution

Another Read -

  • L1 Loss function minimizes the absolute differences between the estimated values and the existing target values. L1 loss function is more robust and is generally not affected by outliers
  • L2 loss function minimizes the squared differences between the estimated and existing target values. L2 error will be much larger in the case of outliers 
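
A quick sketch contrasting the two penalties on the same synthetic data - Lasso (L1) drives some coefficients exactly to zero while Ridge (L2) only shrinks them (the alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
# Only the first two features actually matter in this synthetic target.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)

print("Lasso:", Lasso(alpha=0.1).fit(X, y).coef_)   # several coefficients become exactly 0
print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_)   # all shrunk, none eliminated
```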

Happy Learning!!!

September 25, 2016

Persuading Organization to embrace analytics

Products and systems currently running need to look at their data collection techniques to identify more relevant data and perform better analytics. If current systems rely on point-in-time data and overwrite or archive historical records over time, we lose valuable information.

Why Analytics ?
  • Predict your future based on your past and present
  • Correct your mistakes before it's too late
  • Identify and correct poor performing segments of business

How Analytics differs from Business Intelligence ?
  • I have worked for ETL, data marts, Schemas for BI projects
  • BI helps summarize and compare business performance YoY, QoQ
  • Analytics is the next step beyond BI, looking at future trends

Where are we lagging ?

We need analytics but we do not have enough data points / features to perform analytics. Data collection is a key aspect. The underlying blood of Data science is collecting meaningful data and making models out of it. We need to devote sufficient time to collect data, pipeline it, process and aggregate it for Data Analysis, Modelling.

To evolve from a current product into a system with analytics capabilities we need to change the way we store and process data. Technical aspects, project deadlines, and resistance to change have to be handled to make things work.

Persist, Persuade, Implement....

Happy Learning!!!

September 05, 2016

Day #31 - Support Vector Machines

SVM
  • Support Vector Machines
  • Widest Street approach separating +ve and -ve classes, Separations as wide as possible
  • SVM works on classifying only two classes
  • Hard SVM (Strictly linearly separable)
  • Soft SVM (allows some points to fall on the wrong side; the constant C controls how much this is penalized)
  • Kernel Functions perform transformation of data
  • Using Kernel function we simulate idea of finding linear separator 
  • Kernels take data into higher dimensional space
  • Other Key concepts discussed (Lagrange Multipliers, Quadratic Optimization problem)
  • Lagrangian constraint transform from 1D to 2D data
  • SVM (Linear way of approximation)
  • Types of Kernels - Polynomial Kernel, Radial Basis Function Kernel, Sigmoid Kernel
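
A short scikit-learn sketch of the soft-margin constant C and a kernel choice (the dataset and parameter values are only for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A dataset that is not linearly separable in the original space.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# The RBF kernel implicitly maps the data into a higher-dimensional space;
# C controls how much points are allowed to fall on the wrong side (soft margin).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print(clf.score(X, y))
```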
Maths Behind it - Link
Good Relevant Read - SVM

Happy Data Analysis!!!

Day #30 - Machine Learning Fundamentals

Supervised Learning
  • Classification and Regression problems
  • Past data + Past outputs leveraged
  • Regression - Continuous Values
  • Classification - Discrete Labels
Unsupervised
  • Clustering - Discrete Labels
  • Dimensionality reduction - Continuous Values
Classifiers
  • SVM (Linear way of approximations)
  • KNN (Lazy learner)
  • Decision Tree (Rule based approach, Set of Rules)
  • Naive Bayes (Pick class with maximum probability)
Evaluation Methods
  • K-Fold Validation
  • Cross Validation
  • Ranking / Search - Relevance
  • Clustering - Intra-cluster and inter-cluster distances
  • Regression - Mean Square Error
  • ROC Curve 
Bagging
  • Build each classifier on a bootstrap sample (a random sample of the data drawn with replacement)
  • Repeat to build many classifiers and average / vote their predictions
  • Random Forests - Random combination of Trees
  • Randomly decide and split on attributes
Boosting
  • Multiple weak classifiers combined into a strong classifier
  • Examples are re-weighted so later learners focus on earlier mistakes
  • Adaboost - Adaptive boosting
Stacking
  • Use Output from one classifier as input for another classifier
  • Knn -> O/P -> SVM
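
A sketch of that KNN -> SVM stacking idea using scikit-learn's StackingClassifier (the estimator choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# KNN's cross-validated predictions become input features for the SVM meta-learner.
stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=5))],
    final_estimator=SVC(),
)
stack.fit(X, y)
print(stack.score(X, y))
```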
Happy Learning!!!

August 31, 2016

Day #29 - Decision Trees

  • Hierarchical, Divide and Conquer strategy, Supervised algorithm
  • Works on numerical data
  • Concepts discussed - Information gain, entropy computation (Shannon entropy)
  • Pruning based on chi-square / Shannon entropy
  • Convert all string / character into categorical / numerical mappings
  • You can also bucketize continuous variables
Basic Python pointers

Good Reads
Link1 , Link2, Link3, Link4, Link5, Link6

Happy Learning!!!

August 15, 2016

Day #28 - R - Forecast Library Examples

The following examples were discussed, using the R forecast library:
  • Moving Average
  • Single Exponential Smoothing - Uses single smoothing factor
  • Double Exponential Smoothing - Uses two constants and is better at handling trends
  • Triple Exponential Smoothing - Smoothing factor, trend, seasonal factors considered
  • ARIMA

Happy Learning!!!

August 08, 2016

Applied Machine Learning Notes


Supervised Learning
  • Classification (Discrete Labels)
  • Regression (Output is continuous, Example - Age, Stock prices)
  • Past data + Past Outputs used
Unsupervised Learning
  • Dimensionality reduction (Data in higher dimensions, Remove dimension without losing lot of information)
  • Reducing dimensionality makes it easy for computation (Continuous values)
  • Clustering (Discrete labels)
  • No Past outputs, Only current data
Reinforcement Learning
  • Game playing is learned without labelled outputs (reinforcement rather than supervised learning)
  • Learning Policy
  • Negative / Positive reward for each step
Type of Models
  • Inductive (Learn model, Learn from a function) vs Transductive (Lazy learning ex- Opinion from like minded people)
  • Online (learn from every new incoming tweet) vs Offline (look at the past 1 year of tweets)
  • Generative (Apply Gaussian on Data, Use ML and compute Mean / Variance) vs Discriminative (Two sides of Line)
  • Parametric vs Non-Parametric Models
Happy Learning!!!

July 31, 2016

Fifth Elephant Day #2

Fifth Elephant Day #2 - Part I

Session #1 - Content Marketing
  • Distribute relevant consistent content. Traditional vs Content Marketing
Challenges
  • Delivering content with speed. Channel proliferation (mobile, computers, tablets)
  • Intersection of Brands, Trends, Community Interests (Social media post and metrics)
  • Data from social media pages, online aggregators



Technical Details
  • Computation of term frequency, inverse document frequency
  • Using Solr, Lucene for Indexes
  • Cosine Similarity
  • Greedy Algorithm
Session #2 - Reasoning
  • Prediction vs Reasoning problem
  • Prediction Problems Evolution 
  • At Advanced level Deep Learning, XGBoost, Graphical models
When Apply prediction ?
Features as input -> Prediction performed (Independent, stateless)

Reasoning - Sequential, Stateful Exploration
Reasoning Problems - Diagnosis, routes, games, crossing roads

Flavours of Reasoning
  • Algorithmic (Search)
  • Logical reasoning
  • Bayesian probabilistic reasoning
  • Markovnian reasoning
Knowledge, learning the process of reasoning; knowledge graphs were shown as an implementation of reasoning
{subject, predicate, object}

Session #3 - Continuous online learning
  • 70% noise in C2B communication
  • 100% noise in B2C communication
  • Zipfian
Technicalities
  • Apriori - Market Basket Analysis
  • XGBoost - Alternative to DL
  • Bias - Variance Tradeoff
  • Spectral Clustering

Bird of Feathers Session
  • Google DeepMind (used to optimize data-centre cooling)
  • Bayesian Probabilistic Learning
  • Deep Learning - Build Hierarchy of features (OCR type of problems)
  • Traditional Neural Network (Fully Connected, lot of degree of freedom)
  • Structural causality (Subsystem appears before, Domain knowledge)
  • Temporal causality - This and then that happened
  • CNN - learning weights
  • Spectral clustering
  • PCA (reduce denser to smaller)
  • Deep Learning - Hidden layers obtained through coarse grained process
Deep Learning workshop Notes
  • Neural Networks
  • Multiple Layers
  • Lots of data
People Involved - Hinton, Andrew Ng, Bengio, LeCun

Deep Learning now
  • Speech recognition
  • Google Deep Models on Phone
  • Google street view (House numbers)
  • Imagenet
  • Captioning images
  • Reinforcement learning
Neural Networks
  • Simple mathematical units combine into complex functions
  • X-> input, W-> weights, Non linear function of output
Multiple Layers
  • Multiple hidden layers between input and output
  • Training hidden layers is challenge
Gradient Descent
  • Define loss function
  • Minimize by moving along gradient
Backpropagation
  • Move Errors back through the network
  • Chain rule conception
Tools
  • Caffe - network described via configuration files
  • Torch - describe the network in Lua
  • Theano - describes the computation, generates CUDA code, runs it and returns results
CNN
  • Used for images
  • Images are organized
  • Apply Convolutional filter
  • For Deep Learning GPU is important
Imagenet Competition
  • Convolution (Have all nice features retain them)
  • Pooling (Shrink image)
  • Softmax
  • Other
Simple RNNs - suffer from the vanishing/exploding gradient problem
LSTM (Long Short Term memory)
Interword relationships from corpus (word2vec)

Happy Learning!!!

July 28, 2016

Fifth Elephant Day #1 Notes - Part II

Sessions # - Link

Talk #3 - Machine Learning in FinTech
  • Lending Space
  • Credit underwriting system
India
  • 2% Credit card usage
  • 65% of population < 27 yrs
  • Digital foot print (mobile)
  • Identity (Aadhar)
40 Decisions / Minute -> 100 Crores a month

Use Cases / Scenarios
  • Truth Score (Validity of address / person / sources)
  • Need Score (Urgency / Time to respond application)
  • Saver Score (cash flow real-time analytics)
  • Credit Score (Debt to income)
  • Credit awareness score
  • Continuous risk assessments
Talk #4 - Driving Behaviour from Smartphone Sensors
  • For Safety driving using smartphone sensors
  • Spatial / location data
  • Road traffic injuries due to distracted driving
  • Phone usage - 4x crash risk
  • Speedy driving - 45% car crash history
  • Driving behavior analysis / driving feedback
  • GPS + Inertial Navigational sensors (Accelerometer / Gyroscope / Magnetometer)
Characterization
  • Drive detection
  • Event detection
  • Collision detection
Qualification
  • Drive summarization and scoring
  • Risk modelling
Optimization
  • Events, location of events, duration of events
Dynamics
  • Sensors
  • Availability - wide variety across devices
  • Raw Data - noisy, unevenly spaced time series
  • Events - Time scales, combination of sensors
  • Model building - Labelled vs unlabelled data, feature engineering
  • Algorithms - Stream / batch efficiency
Techniques
  • Cluster data 
  • Eliminated uninteresting time periods
  • Classification / Regression models
  • Spectral clustering
Talk #5 - Indian Agriculture
  • Crop rotation literacy
  • Data curation, Query tools on data product
  • Visualization and plotting of Agricultural data
Talk #6 and #7 - The last two talks were from ecologists
  • Using Image comparison for Big Cat Counting
  • Predicting Big Cat Areas (Territories)
  • Observe Nature, Frame Hypothesis, Design Experiments
  • Confront with competing hypothesis
  • Spacegap program
  • Markov chain Monte-Carlo technique


Happy Learning!!!

Fifth Elephant Day #1 Notes - Part I

Sessions # - Link

Talk #1 - Data for Genomic Analysis

Great talk by Ramesh. I had attended his session / technical discussion earlier. This session provided insights on genome / discrepancies in genome sequence leading to rare diseases.

Genome - 3 Billion X 2 Characters
Character variations vary from person to person
Stats (1/10th of probability of cancer)
Baseline risk for breast cancer (1/8),(1/70) ovarian cancer
BRCA1 mutation (5-6 fold increase in breast cancer, 27 fold increase for ovarian cancer)

In India
  • 35% inherited risk mutation
  • 1/25 Thalassemia 
  • 1 in 400-900 Retinitis Pigmentosa
  • 1 in 500, Hypertrophic Cardiomyopathy
Data Processing
  • 1 Billion reads - 100GB data per person
  • Very similar sequence yet one character might differ
  • But reference is 3 Billion long
Efficiency
  • Need fast indexing
  • Suffix Trees and variations
  • Hash table based approaches
Reference Genome Sequence
  • Volume of data
  • Funnel down of variety of dimensions
  • Triplet Code (Molecule)
  • Variants of triplets nailed down to differences in the genome
  • GPU processing / reduce computation time
Concepts Discussed / Used
  • Hypothesis Testing
  • Stats Models
  • GPU Processing to reduce computation time
They also provide assessment for hereditary diseases at corporate level.

Talk #2 - Alternative to Wall Street Data

This session gave me some new strategies to collect / analyze data

How to Identify occupancy rate at hotel ?
  •  Count of cars from parking lots
  •  Number of rooms lights on
  •  Take pics of rooms from corner of street and predict based on images collected
  •  Unconventional ways to think of data collection (Beating the wall street model)
What are usual ways
  •  Checking websites
From an investor's perspective, lodging key metrics are a very important aspect
Data Sources
  • Direct data gathering
  • Web harvesting
  • Primary research
Primary Research
  • Look at notice patterns in front of you
  • Difference in invoice numbers
  • Serial number changes, difference values
Free Data Sets in link
Lot of opportunity
  • Analyze international markets (India / China)
  • COGS
  • SG
  • ETC
How to value data sets ?
  • Scarcity - How widely used
  • Granularity - Time / aggregation level
  • Structured
  • Coverage



What is the generative value
  • Revenue Surprise Estimates
  • Dataset insight / Analysis
  • Operating GAAP measures
A great case study on the impact of smart watches vs luxury watches was presented. This session provided great insight into unconventional data collection ways:
  • Generate money in automated system
  • Stock sensitivity to revenue surprises
  • Identify underlying ground truth
"Some Refreshing changes to world of investment"

Happy Learning!!!

June 17, 2016

June 15, 2016

Day #26 - R - Moving Weighted Average

Example code based on a two-day workshop on the Azure ML module. A simple example of storing and accessing data from an Azure workspace.



Happy Learning!!!

June 01, 2016

Day #25 - Data Transformations in R

This post is on performing Data Transformations in R. This would be part of feature modelling. Advanced PCA will be done during later stages



Data Normalization in Python

Happy Learning!!!

May 20, 2016

Day #24 - Python Code Examples

Examples for - for loops, while loops, dictionaries, functions and plotting graphs

Happy Learning!!!

Day #23 - Newton Raphson - Gradient Descent

Newton Raphson
  • Optimization Technique
  • Newton's method tries to find a point x satisfying f'(x) = 0
  • Between two successive approximations
  • Stop iteration when difference between x(n+1) and x(n) is close to zero
Formula
  • For finding a root of f(x) = 0: x(n+1) = x(n) - f(x)/f'(x)
  • For minimization (finding f'(x) = 0): x(n+1) = x(n) - f'(x)/f''(x)
  • Choose a suitable starting value x0
Gradient Descent
  • Works for convex function
  • x(n+1) = x(n) - af'(x)
  • a - learning rate
  • Gradient descent tries to find such a minimum x by using information from the first derivative of f
  • Gradient descent and Newton-Raphson are similar; only the update rule differs
More Reads - Link
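
A small sketch comparing the two update rules on a single convex function f(x) = (x - 2)^2 + 1 (the starting point and learning rate are arbitrary):

```python
def f_prime(x):            # f(x) = (x - 2)^2 + 1
    return 2 * (x - 2)

def f_double_prime(x):
    return 2.0

# Newton-Raphson for minimization: x <- x - f'(x) / f''(x)
x = 10.0
for _ in range(5):
    x = x - f_prime(x) / f_double_prime(x)
print("Newton:", x)        # converges to 2 in one step for a quadratic

# Gradient descent: x <- x - a * f'(x), with learning rate a
x, a = 10.0, 0.1
for _ in range(100):
    x = x - a * f_prime(x)
print("Gradient descent:", x)
```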







Optimal Solutions
  • Strategy to get bottom of the valley, go-down in steepest slope
  • Measures local error function with respect to the parameter vector
  • Once gradient zero you have reached the minimum
  • Learning rate, steps to converge to a minimum
  • How to converge to Global vs Local Minimum
  • Gradient descent only guarantees a local minimum, not the global one
  • Whether the cost function contours are elongated or circular affects the speed of convergence


Happy Learning!!!

May 14, 2016

Day #22 - Data science - Maths Basics


Eigen Vector - Vector along which there is no change in direction

Eigen Value - Amount of Scaling factor defined by Eigen value

Eigen Value Decomposition - Eigen decomposition can be performed only on a square matrix

Trace - Sum of Eigen Values

Rank of A - Number of Non-Zero Eigen Values

SVD - Singular Value Decomposition
  • Swiss Army Knife of Linear Algebra
  • SVD - for Stock market Prediction
  • SVD - for Data Compression
  • SVD - to model sentiments
  • SVD is Greatest Gift of Linear Algebra to Data Science
  • The singular values of A are the square roots of the eigenvalues of AᵀA (A transpose times A)
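
A quick numpy check of that last point - the singular values of A equal the square roots of the eigenvalues of AᵀA (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
eigvals = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of AᵀA (ascending order)

print(s)                                 # singular values (descending)
print(np.sqrt(eigvals)[::-1])            # square roots of the eigenvalues, same numbers
```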
Happy Learning!!! (Revise  - Relearn - Practice)

May 09, 2016

Day #21 - Data Science - Maths Basics - Vectors and Matrices

Matrix - Combination of rows and columns
Check for linear dependence - row-reduce (e.g. R2 = R2 - 2R1); when one of the rows becomes all zeros, the rows are linearly dependent
Span - Linear combination of vectors
Rank - Linearly Independent set

Good Related Read - Span

Vector Space - Space of vectors, collection of many vectors
If v, w belong to the space, v + w also belongs to the space; a scalar multiple of a vector also lies in the space
If the determinant is non-zero, then the vectors are linearly independent. Otherwise, they are linearly dependent

Vector space properties
  • Commutative  x+y = y+x
  • Associative (x+y)+z = x+(y+z)
  • Origin vector - Vector with all zeros, 0+x = x+0 = x
  • Additive (Inverse) - For every X there exists -x such that x+(-x) = 0
  • Distributivity over scalar sum, (r+s)x = rx+sx
  • Distributivity over vector sum, r(x+y) = rx+ry
  • Identity multiplication, 1*x = x
Subspace
Vector Space V, Subset W. W is called subspace of V
Properties
W is subspace in following conditions
  • Zero vector belongs to W 
  • if u and v are vectors, u+v is in W (closure under +)
  • if v is any vector in W, and c is any real number, c.v is in W
Any vector v in the span of a subset S of V can be represented as a linear combination
 v = r1v1 + r2v2 + ... + rkvk
where v1, v2, ..., vk are distinct vectors from S and each ri belongs to R

Basis - a linearly independent spanning set. A set is a basis if every vector in the vector space is a linear combination of the set. All bases of a vector space V have the same cardinality.

Null Space, Row Space, Column Space
Let A be m x n matrix
  • Null Space - the set of all solutions of Ax = 0; the null space of A, denoted Null A, is the set of all homogeneous solutions of Ax = 0
  • Row Space - the subspace of R^n spanned by the row vectors of A
  • Column Space - the subspace of R^m spanned by the column vectors of A
Norms - Measure of length and magnitude
  • For (1,-1,2), L1 Norm = Absolute value = 1+1+2 = 4
  • L1 - Same Angle
  • L2 - Plane
  • L3 - Sum of vectors in 3D space
  • L2 norm of (5,2) = sqrt(5*5 + 2*2) = sqrt(29)
  • L infinity - Max of (5,2) = 5
Orthogonal - Dot product equals Zero
Orthogonality - orthogonal (perpendicular) non-zero vectors are linearly independent
Orthogonal matrix will always have determinant +/-1


Differential Equations - Notes - Link


Lectures - Link

Course Notes - Link

Happy Learning!!!

May 08, 2016

Day #20 - PCA basics

Machine Learning Algorithms adjusts itself based on the input data set. Very different from traditional rules based / logic based systems. The capability to tune itself and work according to changing data set makes it self-learning / self-updating systems. Obviously, the inputs / updated data would be supplied by humans.

Basics
  • Line is unidirectional, Square is 2D, Cube is 3D
  • Fundamentally shapes are just set of points
  • For a N-dimensional space it is represented in N-dimensional hypercube
Feature Extraction
  • Converting a feature vector from Higher to lower dimension
PCA (Principal Component Analysis)
  • Input is a large number of correlated variables. We perform an orthogonal transformation to convert them into uncorrelated variables, and identify principal components based on the highest variation
  • Orthogonal vector - Dot product equals zero. The components perpendicular to each other
  • This is achieved using SVD (Singular Value Decomposition)
  • SVD internally solves the matrix and identifies the Eigen Vectors
  • Eigen vector does not change direction when linear transformation is applied
  • PCA is used to explain variations in data. Find principal component with largest variation, Direction with next highest variation (orthogonal for first PCA)
  • Rotation or Reflection is referred as Orthogonal Transformation
  • PCA - Use components with high variations
  • SVD - Express Data as a Matrix
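
A compact scikit-learn sketch - fit PCA on correlated data and check that most of the variance sits in the first component (the data is synthetic):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
x1 = rng.randn(200)
x2 = 2 * x1 + 0.3 * rng.randn(200)      # strongly correlated with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                # data expressed in uncorrelated components
print(pca.explained_variance_ratio_)    # most of the variance sits in the first component
```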
More Reads

Happy Learning!!!

May 03, 2016

Day #19 - Probability Basics

Concepts
  • Events - Subset of Sample Space
  • Sample Space - Set of all possible outcomes
  • Random Variable - Outcome of experiment captured by Random variable
  • Permutation - Ordering matters
  • Combination - Ordering does not matter
  • Binomial - Only two outcomes per trial
  • Poisson - Events that take place over and over again. Rate of Event denoted by lambda
  • Geometric - Suppose you'd like to figure out how many attempts at something is necessary until the first success occurs, and the probability of success is the same for each trial and the trials are independent of each other, then you'd want to use the geometric distribution
  • Conditional Probability - P(A|B) = probability that A occurs given that B has already occurred
  • Normal Distribution - Appears because of central limit theorem (Gaussian and Normal Distribution both are same)
From Quora -  
"Consider a binomial distribution with parameters n and p. The distribution is underlined by only two outcomes in the run of an independent trial- success and failure. A binomial distribution converges to a Poisson distribution when the parameter n tends to infinity and the probability of success p tends to zero. These extreme behaviours of the two parameters make the mean constant i.e. n*p = mean of Poisson distribution "

May 01, 2016

Day #18 - Linear Regression , K Nearest Neighbours

Linear Regression
  • Fitting straight line to set of data points
  • Create line to predict new values based on previous observations
  • Uses OLS (Ordinary Least Squares). Minimize squared error between each point and line
  • Maximum likelihood estimation
  • R squared - Fraction of total variation in Y
  • R Squared close to 0 - terrible fit
  • R Squared close to 1 - good fit
  • High R Squared means a good fit
Linear Regression (Ref - Link )
  • ML Model to predict continuous variables based on set of features
  • Used where target variable is continuous
  • Minimize residuals of points from the line
  • Find line of best fit
  • y = mx + c
  • Sum of squared residuals = sum((y - mx - c)^2)
  • Reduce residuals
  • Assumptions in LR
  • Linearity, residuals following a Gaussian (normal) distribution, independence of errors, constant variance (homoscedasticity)
Updated May 28/ 2020



KNN
  • Supervised Machine Learning Technique
  • New Data point classify based on distance between existing points
  • Choice of K - Small enough to pick neighbours
  • Determine value of K based on trial tests
  • K nearest neighbours on scatter plot and identify neighbours
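
Both models side by side as a scikit-learn sketch (the data and the choice of K are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Linear regression: fit a line y = m*x + c by ordinary least squares.
X = np.arange(10).reshape(-1, 1)
y = 3 * X.ravel() + 2 + np.random.randn(10)
lr = LinearRegression().fit(X, y)
print(lr.coef_, lr.intercept_, lr.score(X, y))   # slope, intercept, R squared

# KNN: classify a new point by the majority vote of its K nearest neighbours.
Xc = np.array([[1, 1], [2, 1], [8, 8], [9, 9]])
yc = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(Xc, yc)
print(knn.predict([[7, 7]]))
```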
Related Read
Recommendation Algo Analysis
Linear Regression
Linear Regression - Concept and Theory
Linear Regression Problem 1
Linear Regression Problem 2
Linear Regression Problem 3

Happy Learning!!!

April 22, 2016

Day #17 - Python Basics

Happy Learning!!!

Neural Networks Basics


Notes from Session
  • Neurons - Synapses. Model brain at high level
  • Machine Learning  - Algorithms for classification and prediction
  • Mimic brain structure in technology
  • Recommender engines use neural networks
  • With more data we can increase accuracy of models
  • Linear regression, y = mx + b. Fit the data set with as little error as possible.
Neural Network
  • Equation starts from neuron
  • Multiply weights to inputs (Weights are coefficients)
  • Apply activation function (Depends on problem being solved)
Basic Structure
  • Input Layer
  • Hidden Layer (Multiple hidden layers) - Computation done @ hidden layer
  • Output Layer
  • Supervised learning (Train & Test)
  • Loss function determines how error looks like
  • Deep Learning - Automatic Feature Detection


Happy Learning!!!

April 14, 2016

Basics - SUPPORT VECTOR MACHINES

Good Reading from link

Key Notes
  • Allow non-linear decision boundaries
  • SVM - Out of box supervised learning technique
  • Feature Space - Finite dimensional vector space
  • Each dimension represents feature
  • Goal of SVM - Train a model that assigns unseen objects to a particular category
  • Creates linear partition of feature space
  • Based on the features, it places an object above or below the separating hyperplane
  • No stochastic element involved (No involvement of any previous state status)
  • Support vector classifiers or soft margin classifiers - allow some observations to be on the incorrect side of the hyperplane (a soft margin)
Advantage
  • High Dimensionality, Memory Efficiency, Versatility
Disadvantages
  • Non probabilistic
More Reads

Happy Learning!!!

Day #16 - Python Basics

Happy Learning!!!

April 10, 2016

Probability Tips

  • Discrete random variables are things we count
  • A discrete variable is a variable which can only take a countable number of values
  • Probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value.
  • Continuous random variables are things we measure
  • A continuous random variable is a random variable where the data can take infinitely many values.
  • Probability density function (PDF), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value
  • Bernoulli process is a finite or infinite sequence of binary random variables
  • Markov Chain - stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event
Let's Continue Learning!!!

Day #15 - Data Science - Maths Basics

Day #15 - Mathematics Basics

Sets Basics
  • Cardinality - Number of distinct elements in Set (For a Finite Set)
  • For Real numbers cardinality infinite
Rational Numbers - Made of Ratio of two numbers
Fibonacci series was introduced in 1202 - Amazing :)

Functions
  • Represents relationship between mathematical variables
  • Spread of all possible output is called range
  • Function that maps from A to B. A is referred as (Domain), B is referred as co-domain
Matrix
  • Rows and columns define matrix 
  • 2D array of numbers 
  • Eigen Values - Scalars, Eigen Vector - Vectors special set of values associated with Matrix M
  • Eigen Vectors - Those directions remain unchanged by action of matrix M
  • Trace - Sum of diagonal elements
  • Rank of Matrix - Number of linearly independent vectors
Determinant
  • Can be computed only for square matrix
Vector
  • Vectors have magnitude, length and direction
  •  Magnitude and the cosine of the angle give you the direction
  • Vector product non-commutative
  •  Dot product commutative
  •  Vector is linearly independent if none of vectors can be written as sum of multiple of other vectors



 Happy Learning!!!

April 09, 2016

April 07, 2016

Day #13 - Maths and Data Science

  • Recommender Systems - Pure matrix decomposition problem
  • Deep Learning - Matrix Calculus
  • Google Search - Page Rank, Social Media Graph Analysis - Eigen Decomposition
Happy Learning!!!

April 04, 2016

Ensemble



  • Combine many predictors and provide weighted average
  • Use single kind of learner but multiple instances
  • Collection of "Ok" predictors and combine them making them powerful
  • Learn Predictors and combine them using another new model 
  • One layer of predictors providing features for next layer 
Happy Learning!!!

March 31, 2016

March 30, 2016

Good Read - Winning Data Science Competitions






Interviews



Questions From Data Science Interviews

Let your work do the talking
Highly Creative Person - More(['work','attention','energy']) -> ['More Benefit']

Happy Learning!!!

Interesting Read - Data Science & Applications

Happy Reading!!!

March 29, 2016

Good Data Science Tech Talk

Today I spent some time with Tech talk on predictive modelling. Good Coverage of fundamentals, Needs revision again.


Read William Chen's answer to What are the most common mistakes made by aspiring data scientists? on Quora

Happy Learning!!!!

March 28, 2016

Data Science Day #12 - Text Processing in R

Today's post is basics on text processing using R. We look at removing stop words, numbers, punctuations, lower case conversion etc..


Happy Learning!!!

March 27, 2016

Day #11 - Data Science Learning Notes - Evaluating a model

R Square - Goodness of Fit Test
  • R square = (1- (Sum of Squares of Error/Sum of Squares of Total))
  • SST - Variance of dependent variable
  • SSE - Variance of Actual vs Predicted Values
Adjusted R Square 
  • Adjusted R Square = 1 - ((1 - R Square) * (n - 1)/(n - p - 1))
  • P - Number of independent variables
  • n - records in dataset
RMSE (Root mean square error)
  • For every record predicted compute error 
  • Square it and find mean
  • RMSE should be similar for the training and testing datasets
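
The three measures above computed directly, as a small sketch on made-up values:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4])
n, p = len(y), 1                          # n records, p independent variables

sse = np.sum((y - y_pred) ** 2)           # variance of actual vs predicted
sst = np.sum((y - y.mean()) ** 2)         # variance of the dependent variable
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
rmse = np.sqrt(np.mean((y - y_pred) ** 2))

print(r2, adj_r2, rmse)
```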
Bias (Underfit)
  • Model can't explain the dataset
  • R Square value very less
  • Add more Independent variable
Variance
  • RMSE High for test dataset, RMSE low for training dataset
  • Cut down Independent variable
Collinearity Problem
  • Conduct a significance test (p-values) to check whether the null hypothesis holds
Next Pending Reads
  • Subset Selection Technique
  • Cross Validation Technique
  • Z test / P Test
Happy Learning!!!

March 22, 2016

Day #10 Data Science Learning - Correlations

Correlation
  • If you have correlation you can use machine learning to predict variables
  • Mutual relationship connection between two or more things
  • Correlation shows inter dependence between two variables
  • Measure - How much one changes when other also changes ?
  • Popularly Used - Pearson Correlation coefficient
  • Value ranges from -1 to +1
  • Negative correlation (Closer to -1) - One value goes up other goes down
  • Closer to Zero (No Correlation)
  • Closer to 1 (Positive Correlation)
Correlation - Relationship between two values
Causation - Reason for change in value (Cholesterol vs weight, Dress Size Vs Cholesterol). Identify if it is incidental.

Handling highly correlated variables
  • Remove one of the two correlated variables; when correlation = 1 or -1, one of the variables is fully redundant.
  • Perform a PCA
  • Permutation feature importance (Link)
  • Greedy Elimination (GE): iteratively eliminate feature of highest correlated feature pair
  • Recursive Feature Elimination (RFE): recursive elimination of features with respect to their importance
  • Lasso Regularisation (LR): use L1 regularisation to drop features whose weights are driven to zero
  • Principal Component Analysis (PCA): transform the data set with PCA and choose the components with the highest variation
Ref - Link1, Link2, Link3

Happy Learning!!!

March 20, 2016

Data Science Tip Day #8 - Dealing with Skewness for Error histograms after Linear Regression

In previous posts we saw histogram-based validation of errors. When a left- or right-skewed distribution is observed, some transformations to apply are:
  • Right Skewed - Apply Log
  • Slightly Right - Square root
  • Left Skewed - Exponential
  • Slightly Left - Square root, Cube
Happy Learning!!!

March 19, 2016

Data Science Tip Day#7 - Interaction Variables

This post is using interaction variables while performing linear regression

For illustration purposes let's construct a dataset with three vectors (y, x, z)



Happy Learning!!!

Data Science Tip Day#6 - Linear Regression, Polynomial Regression


We will be renaming the R tips to Data Science Tips going forward. Today we will look at linear regression, polynomial regression and variable interactions. Today's stats class was very useful and interesting. Any topic of interest needs practice to master the fundamentals.

Linear Regression
Assume a Linear Equation y = mx+b
  • Here m is slope, b is intercept value at x = 0
  • Similarly, we use the equation to express relationship between dependent and independent variables
  • In this equation y = mx+b, y is the dependent variable. x is the independent variable
For illustration purposes let's construct a dataset with two vectors (y, x)




Happy Learning!!!

March 17, 2016

R Day #5 Tip of the Day - Linear Regression

I have taken up Udemy Data Science Classes. Below notes from Linear Regression Classes

Linear Regression 
  • Analyze relationship between two or multiple variables
  • Goal is preparing equation (Outcome y, Predictors x)
  • Estimate value of dependent and independent variables using relationship equations
  • Used for Continuous variables that have some correlation between them
  • Goodness of fit test to validate model
Linear Equations
  • Explains relationship between two variables
  •  X (Independent- Predictor), Y (Dependent)
  •  Y = AX + B
  •  A (Slope) = change in Y / change in X
  •  B - Intercept (Value of Y when X =0)
  •  Equation becomes predictor of Y
Fitting Line
  • Sum of squares of vertical distances minimal
  • Best Line = least residual
  • Difference between model and actual values are called as residuals
Goodness of Fit
  • R square measure 
  • Sum of squares of distances (Sum of squares of vertical distances minimal)
  • Uses residual values
  • Higher R square value better the fit (close to 1 higher fit)
  • Higher Correlation means better fit (R square will also be high) 
Multiple Regression
  • Multiple predictors involved (X Values)
  • More than one independent variable used to predict dependent variable
  • Y = A1X1 + A2X2 + A3X3 + ... + ApXp + B

Homoscedasticity -  all random variables in the sequence or vector have the same finite variance
heteroscedasticity -  variability of a variable is unequal across the range of values of a second variable that predicts it
Pearson's correlation coefficient (r) is a measure of the strength of the association between the two variables

Happy Learning!!!

March 16, 2016

R Day #4 - Tip for the Day - Learning Resources


I came across this site NYU Stats R Site

Please bookmark it for a good walkthrough of and learning on core R topics

Happy Learning!!!

March 13, 2016

R Day #3 - Tip for the Day

This is based on reading from notes from link 

Logistic Regression
  • Applied when response is binary
  • (0/1, Yes/No, etc.); also known as a dichotomous outcome variable
Binomial probability model
  • consists of (i) n independent trials where 
  • (ii) each trial results in one of two possible outcomes (Yes/No, 1/0)
  • (iii) the probability p of a success stays the same for each trial
Maximum likelihood - Find the value of the parameter(s) (in this case p) which makes the observed data most likely to have occurred

Poisson Regression
Applied for below situations
  • The occurrences of the event of interest in non-overlapping “time” intervals are independent
  • The probability of two or more events in a small time interval is small, and
  • The probability that an event occurs in a short interval of time is proportional to the length of the time interval
  • Heteroscedasticity - means unequal error variances
Negative Binomial Model
  • The Poisson model does not always provide a good fit to a count response. 
  • An alternative model is the negative binomial distribution
Happy Learning!!!

March 11, 2016

Day #2 - Multivariate Linear Regression - R

  • More than one predictor involved in this case

Happy Learning!!!

March 10, 2016

R Day #1 - Simple Linear Regression - Slope

UCLA Notes were very useful.

Linear Regression Model - Represents the mean of the response variable as a function of slope and intercept parameters. Can be used for predictions. I have earlier used a moving average algorithm for forecasting.
  • Simple Linear Regression - one explanatory (independent) variable and one dependent variable
  • Multivariate Linear Regression - Number of Explanatory variables more than 1
Good Summary of Data Quality Issues were summarized
  • Data-entry errors
  • Missing values
  • Outliers
  • Unusual (e.g. asymmetric) distributions
  • Unexpected patterns
R Cookbook had good step by step examples to try out - link

Basics Maths Again

Slope - the line's rate of change in the vertical direction

y = mx + b
  • y = dependent variable as y depends on x
  • x = independent variable
  • m , b = characteristics of line
  • b = y intercept where line crosses y axis
Ref - Link

Slope = Rise / Run = Change in Y / Change in X

Example: for the line y = x, the points (1, 1) and (2, 2) lie on the line

Slope = (y2 - y1) / (x2 - x1) = (2 - 1) / (2 - 1) = 1

Slope > 1 - the line tilts more towards the y-axis (steeper than 45 degrees)
Slope < 1 - the line tilts more towards the x-axis (flatter than 45 degrees)




Ref - Link


Ref - Link

Happy Learning!!!