"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 31, 2016

Fifth Elephant Day #2

Fifth Elephant Day #2 - Part I

Session #1 - Content Marketing
  • Distribute relevant, consistent content. Traditional vs. content marketing
Challenges
  • Delivering content with speed. Channel proliferation (mobile, computers, tablets)
  • Intersection of brands, trends and community interests (social media posts and metrics)
  • Data from social media pages, online aggregators

Technical Details
  • Computation of term frequency and inverse document frequency (TF-IDF)
  • Using Solr / Lucene for indexing
  • Cosine Similarity
  • Greedy Algorithm
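The notes only list TF-IDF and cosine similarity, so here is a minimal sketch of how the two fit together for matching content. It assumes simple whitespace tokenization and a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is the variant scikit-learn uses by default. It does not reproduce the speaker's Solr/Lucene setup or the greedy step, and the three documents are made up.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document (tf x smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = ["cricket world cup final", "world cup cricket highlights", "monsoon rainfall forecast"]
vecs = tf_idf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3), cosine(vecs[0], vecs[2]))
```

Documents that share no terms score exactly 0. Rare terms such as "final" get a higher IDF weight than shared terms such as "cup".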
Session #2 - Reasoning
  • Prediction vs Reasoning problem
  • Evolution of prediction problems
  • At the advanced level: deep learning, XGBoost, graphical models
When to apply prediction?
Features as input -> prediction performed (independent, stateless)

Reasoning - Sequential, Stateful Exploration
Reasoning Problems - Diagnosis, routes, games, crossing roads

Flavours of Reasoning
  • Algorithmic (Search)
  • Logical reasoning
  • Bayesian probabilistic reasoning
  • Markovian reasoning
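For the algorithmic (search) flavour, route finding is the classic example. Below is a minimal sketch that assumes an unweighted road graph stored as a dict of neighbour lists; the `roads` graph is made up. Breadth-first search carries state (the frontier and the visited nodes) as it explores. That state is the "sequential, stateful exploration" that distinguishes reasoning from one-shot prediction.

```python
from collections import deque

def shortest_route(graph, start, goal):
    """Breadth-first search: path with the fewest hops, or None if unreachable."""
    frontier = deque([start])
    came_from = {start: None}          # visited set + back-pointers
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:    # walk back-pointers to rebuild the path
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = node
                frontier.append(neighbour)
    return None

roads = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
print(shortest_route(roads, "A", "F"))   # ['A', 'B', 'D', 'F']
```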
Knowledge, learning the process of reasoning; knowledge graphs were shown in the implementation of reasoning
{subject, predicate, object}
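A knowledge graph stores facts as {subject, predicate, object} triples. The sketch below uses hypothetical facts and a hypothetical `is_a` predicate; real systems use dedicated triple stores. It shows that pattern matching plus one transitive inference rule already gives a small form of logical reasoning.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return {(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)}

def infer_transitive(triples, predicate):
    """Repeatedly add (a, p, c) whenever (a, p, b) and (b, p, c) hold."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, _, b) in match(inferred, p=predicate):
            for (_, _, c) in match(inferred, s=b, p=predicate):
                if (a, predicate, c) not in inferred:
                    inferred.add((a, predicate, c))
                    changed = True
    return inferred

facts = {
    ("sparrow", "is_a", "bird"),
    ("bird", "is_a", "animal"),
    ("animal", "is_a", "living_thing"),
    ("bird", "can", "fly"),
}
closed = infer_transitive(facts, "is_a")
print(sorted(match(closed, s="sparrow", p="is_a")))
# [('sparrow', 'is_a', 'animal'), ('sparrow', 'is_a', 'bird'), ('sparrow', 'is_a', 'living_thing')]
```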

Session #3 - Continuous online learning
  • 70% noise in C2B communication
  • 100% noise in B2C communication
  • Zipfian distribution
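The notes only say "Zipfian". Assuming this refers to Zipf's law for word frequencies, the r-th most common word occurs roughly 1/r as often as the most common one. A rank-frequency table is a quick way to eyeball this on a corpus; the sample text here is made up.

```python
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(text.lower().split())
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]

sample = "the the the the a a b"
for rank, word, count in rank_frequency(sample):
    # under Zipf's law, rank * count stays roughly constant
    print(rank, word, count, rank * count)
```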
Technicalities
  • Apriori - Market Basket Analysis
  • XGBoost - Alternative to DL
  • Bias - Variance Tradeoff
  • Spectral Clustering
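Below is a compact, unoptimized sketch of Apriori for market basket analysis. Frequent itemsets are grown one item at a time, and any candidate with an infrequent subset is pruned (the Apriori property). The baskets and the 0.5 support threshold are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, sup in apriori(baskets, min_support=0.5).items():
    print(sorted(itemset), sup)
```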

Birds of a Feather Session
  • Google DeepMind (used for air conditioning)
  • Bayesian Probabilistic Learning
  • Deep Learning - builds a hierarchy of features (OCR-type problems)
  • Traditional neural network (fully connected, many degrees of freedom)
  • Structural causality (Subsystem appears before, Domain knowledge)
  • Temporal causality - This and then that happened
  • CNN - learning weights
  • Spectral clustering
  • PCA (reduce dense, high-dimensional data to fewer dimensions)
  • Deep Learning - hidden layers obtained through a coarse-graining process
Deep Learning Workshop Notes
  • Neural Networks
  • Multiple Layers
  • Lots of data
People involved - Hinton, Andrew Ng, Bengio, LeCun

Deep Learning now
  • Speech recognition
  • Google deep models on phones
  • Google Street View (house numbers)
  • ImageNet
  • Captioning images
  • Reinforcement learning
Neural Networks
  • Simple mathematical units combine into complex functions
  • X -> input, W -> weights; the output is a non-linear function of the weighted inputs
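Here is a single unit as a minimal sketch: the weighted sum of the inputs plus a bias goes through a non-linearity. The sigmoid is my assumption; the talk did not say which function was used.

```python
import math

def neuron(x, w, b):
    """One unit: sigmoid(w . x + b)."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b   # weighted sum of inputs
    return 1.0 / (1.0 + math.exp(-z))              # non-linear activation

print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))   # 0.5, since z = 0
```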
Multiple Layers
  • Multiple hidden layers between input and output
  • Training hidden layers is a challenge
Gradient Descent
  • Define loss function
  • Minimize by moving along gradient
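The two gradient descent steps above are sketched here on a toy one-parameter loss, L(w) = (w - 3)^2. I chose it only because its gradient, 2(w - 3), is easy to write by hand. The learning rate and step count are arbitrary.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the loss."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# loss L(w) = (w - 3)^2 has gradient dL/dw = 2 * (w - 3) and minimum at w = 3
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)
```

With lr = 0.1 each step shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make this particular loss diverge.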
Backpropagation
  • Move Errors back through the network
  • Based on the chain rule
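To make the chain-rule idea concrete, here is a sketch for a hypothetical two-weight network: x -> tanh(w1 * x) -> w2 * h, with a squared-error loss. The error is pushed back one layer at a time. The result is then compared against a finite-difference estimate, a common way to sanity-check backprop code.

```python
import math

def forward(w1, w2, x):
    h = math.tanh(w1 * x)   # hidden layer
    y = w2 * h              # linear output layer
    return h, y

def loss(w1, w2, x, t):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2

def backprop(w1, w2, x, t):
    """Analytic gradients (dL/dw1, dL/dw2) via the chain rule."""
    h, y = forward(w1, w2, x)
    dL_dy = y - t                       # error at the output
    dL_dw2 = dL_dy * h
    dL_dh = dL_dy * w2                  # error moved back to the hidden layer
    dL_dw1 = dL_dh * (1 - h ** 2) * x   # tanh'(z) = 1 - tanh(z)^2
    return dL_dw1, dL_dw2

def numerical_grad(w1, w2, x, t, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    g1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
    return g1, g2

print(backprop(0.3, -0.7, 1.5, 0.2))
print(numerical_grad(0.3, -0.7, 1.5, 0.2))
```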
Tools
  • Caffe - network defined in a configuration file
  • Torch - describe the network in Lua
  • Theano - describe the computation; it generates CUDA code, runs it and returns the results
CNN
  • Used for images
  • Images have spatial organization (neighbouring pixels are related)
  • Apply convolutional filters
  • GPUs are important for deep learning
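Below is a plain-Python sketch of applying a convolutional filter, with valid padding and stride 1. Like most deep learning libraries it does not flip the kernel, so strictly it is a cross-correlation. The tiny image and the [-1, 1] edge filter are illustrative.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (valid padding, stride 1, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# left half dark (0), right half bright (1): a vertical edge
image = [[0, 0, 1, 1] for _ in range(4)]
edge_filter = [[-1, 1]]                 # responds to left-to-right increases
print(conv2d(image, edge_filter))       # [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

In a CNN the kernel values are the weights being learned, rather than hand-picked as here.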
ImageNet Competition
  • Convolution (extract useful features and retain them)
  • Pooling (Shrink image)
  • Softmax
  • Other
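The pooling and softmax stages can be sketched as below. This assumes 2x2 max pooling with stride 2, a common choice but not necessarily what was described. Softmax subtracts the maximum score before exponentiating to avoid overflow.

```python
import math

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size block."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                               # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool(fmap))          # [[6, 8], [14, 16]]
print(softmax([1.0, 2.0, 3.0]))
```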
Simplest RNN - problems training with gradient descent (vanishing gradients)
LSTM (Long Short-Term Memory)
Inter-word relationships learned from a corpus (word2vec)
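For the simplest RNN the notes only say "gradient descent problem". Assuming this means the well-known vanishing gradient, the sketch below follows one scalar recurrence, h_t = tanh(w * h_{t-1} + x_t), and multiplies the per-step derivatives given by the chain rule. With |w| < 1 each factor is below 1 in magnitude, so the gradient reaching early time steps shrinks roughly geometrically. This is the motivation for LSTMs. The weight and inputs are arbitrary.

```python
import math

def rnn_gradient(w, x_seq, h0=0.0):
    """d h_T / d h_0 for the recurrence h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    for x in x_seq:
        h = math.tanh(w * h + x)
        grad *= w * (1 - h ** 2)   # chain rule: d h_t / d h_{t-1}
    return grad

for steps in (5, 20, 50):
    print(steps, rnn_gradient(0.5, [0.1] * steps))
```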

Happy Learning!!!

July 28, 2016

Fifth Elephant Day #1 Notes - Part II

Talk #3 - Machine Learning in FinTech
  • Lending Space
  • Credit underwriting system
India
  • 2% Credit card usage
  • 65% of population < 27 yrs
  • Digital footprint (mobile)
  • Identity (Aadhaar)
40 decisions / minute -> 100 crores a month

Use Cases / Scenarios
  • Truth Score (Validity of address / person / sources)
  • Need Score (urgency / time to respond to the application)
  • Saver Score (real-time cash-flow analytics)
  • Credit Score (Debt to income)
  • Credit awareness score
  • Continuous risk assessments
Talk #4 - Driving Behaviour from Smartphone Sensors
  • Safer driving using smartphone sensors
  • Spatial / location data
  • Road traffic injuries due to distracted driving
  • Phone usage - 4x crash risk
  • Speeding - 45% of car crash history
  • Driving behavior analysis / driving feedback
  • GPS + inertial navigation sensors (Accelerometer / Gyroscope / Magnetometer)
Characterization
  • Drive detection
  • Event detection
  • Collision detection
Qualification
  • Drive summarization and scoring
  • Risk modelling
Optimization
  • Events, location of events, duration of events
Dynamics
  • Sensors
  • Availability - wide variety across devices
  • Raw Data - noisy, unevenly spaced time series
  • Events - Time scales, combination of sensors
  • Model building - Labelled vs unlabelled data, feature engineering
  • Algorithms - Stream / batch efficiency
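Two of the dynamics above, noisy unevenly spaced raw data and event detection, can be sketched together. First the samples are resampled onto a uniform time grid by linear interpolation. Then the code flags runs where the signal exceeds a threshold, for example harsh braking in longitudinal acceleration. The readings and the 3.0 m/s^2 threshold are hypothetical, not values from the talk.

```python
from bisect import bisect_right

def resample(times, values, step):
    """Linearly interpolate an unevenly spaced series onto a uniform grid."""
    n_points = int((times[-1] - times[0]) // step) + 1
    grid, out = [], []
    for k in range(n_points):
        t = times[0] + k * step
        i = bisect_right(times, t) - 1          # last sample at or before t
        if i >= len(times) - 1:
            out.append(values[-1])              # at the final sample
        else:
            t0, t1 = times[i], times[i + 1]
            frac = (t - t0) / (t1 - t0)
            out.append(values[i] + frac * (values[i + 1] - values[i]))
        grid.append(t)
    return grid, out

def detect_events(grid, values, threshold):
    """Return (start, end) times of runs where |value| exceeds the threshold."""
    events, start = [], None
    for t, v in zip(grid, values):
        if abs(v) > threshold and start is None:
            start = t
        elif abs(v) <= threshold and start is not None:
            events.append((start, t))
            start = None
    if start is not None:                       # event still open at the end
        events.append((start, grid[-1]))
    return events

# Hypothetical longitudinal acceleration readings (seconds, m/s^2)
times = [0.0, 0.4, 1.5, 2.0, 3.1, 4.0]
accel = [0.0, -0.5, -4.0, -5.0, -1.0, 0.0]
grid, uniform = resample(times, accel, step=1.0)
print(detect_events(grid, uniform, threshold=3.0))   # [(2.0, 3.0)]
```

In a streaming setting the same threshold logic can run sample by sample, while resampling usually needs a small buffer of recent readings.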
Techniques
  • Cluster data
  • Eliminate uninteresting time periods
  • Classification / Regression models
  • Spectral clustering
Talk #5 - Indian Agriculture
  • Crop rotation literacy
  • Data curation, query tools on the data product
  • Visualization and plotting of Agricultural data
Talks #6 and #7 - The last two talks were from ecologists
  • Using image comparison for big cat counting
  • Predicting Big Cat Areas (Territories)
  • Observe Nature, Frame Hypothesis, Design Experiments
  • Confront with competing hypothesis
  • SPACECAP program
  • Markov chain Monte Carlo (MCMC) technique
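For the MCMC item, here is a generic random-walk Metropolis-Hastings sketch. It is not SPACECAP's model, which I have not reproduced. It only shows the accept/reject idea, here sampling a standard normal target given by its log-density. Step size, sample count and seed are arbitrary.

```python
import math
import random

def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target known up to a constant."""
    rng = random.Random(seed)
    x, log_p = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)          # symmetric proposal
        log_p_new = log_density(proposal)
        # accept with probability min(1, p(proposal) / p(x))
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)                            # a rejected move repeats x
    return samples

draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
kept = draws[1000:]                                  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((d - mean) ** 2 for d in kept) / len(kept)
print(round(mean, 2), round(var, 2))                 # should be near 0 and 1
```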


Happy Learning!!!

Fifth Elephant Day #1 Notes - Part I

Talk #1 - Data for Genomic Analysis

Great talk by Ramesh. I had attended his session / technical discussion earlier. This session provided insights into the genome and how discrepancies in the genome sequence lead to rare diseases.

Genome - 3 billion x 2 characters
Characters vary from person to person
Stats (1/10th of probability of cancer)
Baseline risk: breast cancer (1/8), ovarian cancer (1/70)
BRCA1 mutation (5-6-fold increase in breast cancer risk, 27-fold increase in ovarian cancer risk)

In India
  • 35% inherited risk mutation
  • 1 in 25 - Thalassemia
  • 1 in 400-900 - Retinitis Pigmentosa
  • 1 in 500 - Hypertrophic Cardiomyopathy
Data Processing
  • 1 Billion reads - 100GB data per person
  • Reads are very similar to the reference, yet a single character might differ
  • But the reference is 3 billion characters long
Efficiency
  • Need fast indexing
  • Suffix Trees and variations
  • Hash table based approaches
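Here is a toy version of the hash-table approach. It indexes every k-mer of the reference, looks up the read's first k-mer as a seed, then verifies the full read while allowing a small number of mismatches (the single differing character mentioned above). The sequences, k = 4 and the one-mismatch limit are made up. Real aligners are far more elaborate, and this seed scheme misses reads whose first k-mer contains the mismatch.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Hash table: k-mer string -> list of start positions in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify the whole read."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue                                  # read would run off the end
        mismatches = [i for i, (a, b) in enumerate(zip(read, window)) if a != b]
        if len(mismatches) <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTACGTTAGCCGATAGGCT"
index = build_kmer_index(reference, k=4)
print(map_read("TTAGCCGCTA", reference, index, k=4))  # [(7, [7])]
```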
Reference Genome Sequence
  • Volume of data
  • Funnel down across a variety of dimensions
  • Triplet Code (Molecule)
  • Variants in triplets narrowed down to differences in the genome
  • GPU processing / reduce computation time
Concepts Discussed / Used
  • Hypothesis Testing
  • Stats Models
  • GPU Processing to reduce computation time
They also provide hereditary disease assessments at the corporate level.

Talk #2 - Alternative to Wall Street Data

This session gave me some new strategies to collect / analyze data

How to identify the occupancy rate at a hotel?
  • Count cars in the parking lots
  • Number of rooms with lights on
  • Take pictures of the rooms from a street corner and predict occupancy from the images collected
  • Unconventional ways to think about data collection (beating the Wall Street model)
What are the usual ways?
  • Checking websites
From an investor's perspective, lodging key metrics are a very important aspect
Data Sources
  • Direct data gathering
  • Web harvesting
  • Primary research
Primary Research
  • Notice the patterns in front of you
  • Differences in invoice numbers
  • Serial number changes, differences in values
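One way to read the invoice-number idea: if a business numbers its invoices sequentially, two invoice numbers observed some days apart give a rough count of invoices issued in between. Below is a minimal sketch that assumes a single gap-free numbering sequence, which is a big assumption in practice. The dates and numbers are invented.

```python
from datetime import date

def invoices_per_day(observations):
    """Estimate invoices issued per day from (date, invoice_number) sightings.

    Assumes one sequential, gap-free numbering scheme (hypothetical).
    """
    obs = sorted(observations)
    (first_day, first_no), (last_day, last_no) = obs[0], obs[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("need observations on at least two different days")
    return (last_no - first_no) / days

sightings = [(date(2016, 7, 1), 10450), (date(2016, 7, 11), 10950), (date(2016, 7, 21), 11350)]
print(invoices_per_day(sightings))   # 45.0
```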
Free Data Sets in link
Lots of opportunity
  • Analyze international markets (India / China)
  • COGS
  • SG
  • Etc.
How to value data sets?
  • Scarcity - how widely it is used
  • Granularity - Time / aggregation level
  • Structured
  • Coverage

What is the generated value?
  • Revenue Surprise Estimates
  • Dataset insight / Analysis
  • Operating GAAP measures
A great case study on the impact of smartwatches vs. luxury watches was presented. This session provided great insight into unconventional ways of collecting data.
  • Generate money in automated system
  • Stock sensitivity to revenue surprises
  • Identify underlying ground truth
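Here is a hedged sketch of two quantities from the list above. Revenue surprise is taken as the relative gap between reported revenue and the consensus estimate. Stock sensitivity is taken as the ordinary least-squares slope of returns on surprises. Both definitions are my assumptions about how the terms were used, and the numbers are made up.

```python
def revenue_surprise(actual, consensus):
    """Relative beat (+) or miss (-) of reported revenue versus consensus."""
    return (actual - consensus) / consensus

def sensitivity(surprises, returns):
    """OLS slope of stock returns regressed on revenue surprises."""
    n = len(surprises)
    mean_s = sum(surprises) / n
    mean_r = sum(returns) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(surprises, returns))
    var = sum((s - mean_s) ** 2 for s in surprises)
    return cov / var

surprises = [revenue_surprise(a, c) for a, c in [(102.0, 100.0), (97.0, 100.0), (110.0, 100.0)]]
returns = [0.03, -0.05, 0.12]          # made-up post-announcement returns
print(sensitivity(surprises, returns))
```

An alternative-data estimate of revenue would plug in where `actual` is before the company reports, which is where the edge over the consensus would come from.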
"Some refreshing changes to the world of investment"

Happy Learning!!!