"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

July 21, 2014

Machine Learning Notes - Anomaly Detection - Entropy Computation


This post covers what I learned from a Machine Learning session conducted by my colleague Gopi. It was a really good introduction and gave me a lot of motivation to learn the topic.

Concepts Discussed
  • Homogeneity - Is my data homogeneous
  • Pick the odd one out (Anomaly detection)
  • Entropy Computation
The session used a wide variety of examples for finding odd sets and variations. For example, identify the anomalous row in the set below:
1,1,1,2
1,2,2,1
1,2,1,1
1,0,1,2

The last row, the only one containing a zero, is the odd one out. Identifying such rows using entropy computation was very useful.

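Entropy Formula

H = -[p1*log2(p1) + p2*log2(p2) + ... + pn*log2(pn)]

where pi is the fraction of values in the row that are equal to the i-th distinct value.

Formula detailed notes from link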

For row (1,1,1,2)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 - 0.5]
 = 0.811
For row (1,2,2,1)
 = -[((2/4)*log2(2/4)) + ((2/4)*log2(2/4))]
 = -[-0.5 - 0.5]
 = 1
For row (1,2,1,1)
 = -[((3/4)*log2(3/4)) + ((1/4)*log2(1/4))]
 = -[-0.311 - 0.5]
 = 0.811

For row (1,0,1,2)
 = -[((2/4)*log2(2/4)) + ((1/4)*log2(1/4)) + ((1/4)*log2(1/4))]
 = -[-0.5 - 0.5 - 0.5]
 = 1.5

The last row has the highest entropy, so it is the anomaly. Excluding it leaves a more homogeneous data set.
In general, if the data set becomes homogeneous after removing a particular record, that record is the anomalous one.
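
The row-by-row computation above can be sketched in a few lines of Python. This is only a minimal illustration of the idea from the session, not the code used there; the names row_entropy, rows and anomaly are my own, and picking the single row with the highest entropy is an assumption that fits this small example.

```python
from collections import Counter
from math import log2


def row_entropy(row):
    """Shannon entropy (in bits) of the values in one row."""
    counts = Counter(row)
    total = len(row)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# The four rows from the example above.
rows = [
    (1, 1, 1, 2),
    (1, 2, 2, 1),
    (1, 2, 1, 1),
    (1, 0, 1, 2),
]

entropies = [row_entropy(r) for r in rows]

# Assumption: the single row with the highest entropy is the anomaly.
anomaly_index = max(range(len(rows)), key=lambda i: entropies[i])
anomaly = rows[anomaly_index]

for r, h in zip(rows, entropies):
    print(r, round(h, 3))
print("Anomaly:", anomaly)
```

On a larger data set a threshold on entropy would probably work better than simply taking the maximum, but for four rows the maximum is enough to pick out the row containing the zero.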

More Concepts Introduced
  • Conditional Probability
  • ID3 Algorithm
  • Measure Entropy
  • Decision Tree
  • Random Forest
  • Bagging Technique
Happy Learning!!! 

July 18, 2014

Multithreading - Automation Basics - Usage of lock to ensure thread safety


Example Code
  • Usage of lock to ensure thread safety
using System;
using System.Threading;

namespace MultithreadedApp
{
    class Program
    {
        // The lock object is static so that all instances share it. With a
        // per-instance lock object each thread would take its own lock and
        // the threads would never block one another.
        private static readonly object threadLock = new object();

        void RunCode()
        {
            // Only one thread at a time can run this block, so each thread
            // prints its 100 lines without interleaving with the others.
            lock (threadLock)
            {
                for (int i = 0; i < 100; i++)
                {
                    Console.WriteLine("Value of i " + i);
                }
            }
        }

        static void Main(string[] args)
        {
            Thread[] agents = new Thread[10];
            Program[] P = new Program[10];
            for (int i = 0; i < 10; i++)
            {
                P[i] = new Program();
                agents[i] = new Thread(P[i].RunCode);
                agents[i].Start();
            }
            // Keep the console open until Enter is pressed.
            Console.ReadLine();
        }
    }
}
Happy Learning!!!

July 17, 2014

Big data Testing Tools - Functional and Performance

This post is based on my learning notes on functional test tools for the Big Data ecosystem. Earlier posts covered the basics of the big data ecosystem components. Here I am sharing the first version of my test tools analysis for functional testing.

| Product / Area | Testing Tools | Test Approach | Programming Language | Reference |
|---|---|---|---|---|
| HiveMQ | MQTT Testing Utility, Tsung | | | |
| Storm | storm test | | Clojure | |
| Hive, Pig | Beetest, Pigmix, Apache DataFu | Query HIVE (Similar to TSQL) | | |
| Map Reduce Jobs | MRUnit, MRBench | | | |
| Analytics | | Lift charts, Target shuffling, Bootstrap sampling to test the consistency of the model | | |
| HBASE | Junit, Mockito, Apache MRUnit | | | |
| | Jmeter Plugins for Hadoop, HBASE, Cassandra | | | |


More Tools
Performance Testing Tools Analysis

Area
Tool
Comments
HBASE
Inbuilt tool
usage –
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10240 randomWrite 1
More Reads - link

Automation scripts for comparing different HBase BlockCache implementations - link

Hbase write throughput link

PerformanceEvaluation (Inbuilt HBASE tool) - (Validate Read / Writes / Scans performance in the environment etc..)
YCSB
Performance testing HBase using YCSB , link1, link2

All Big Data Areas (HBASE, Hadoop, MapReduce)
Sandstorm commercial cloud / on-premises tool for Big Data QA

Kafka




Spark



More Reads
Impetus Perf Engineering Blog
Rest-assured - Java DSL for easy testing of REST services
Retrofit - A type-safe REST client for Android and Java
G7 - Tools for Big Data

Cloud Testing Tools

Test Environment Setup using Cloud Infrastructure
  • Load generation in cloud for on premises application
  • Load generator on premises, application on cloud
  • Load generator and application both on cloud
Amazon Cloud Pieces
  • EC2 - Elastic Compute Cloud -> Compute (virtual servers)
  • EBS - Elastic Block Store -> Block storage volumes for EC2 instances (for example, to hold database files)
  • S3 - Simple Storage Service -> Object storage
  • Ec2 Dream Tool for connecting to multiple cloud providers - link
Blazemeter
  • Distributed geographical performance test tool
  • For First level of testing only
  • Upload and run your custom Jmeter Scripts through blazemeter
Load Test Tools
  • Flood.io
  • loadfocus
Security Testing Tools
  • NTOSpider
  • Burp Proxy
Blazemeter walkthrough Example

Step 1 - Create Load Test


Step 2 - Configure URL


Step 3 - Start Run



Step 4 - Reports



You can also upload JMeter scripts and execute them through BlazeMeter.

Happy Learning!!!

July 06, 2014

Weekend Reading - Webinar - Performance Testing Approach for Big Data Applications.

A very good session: the webinar on Performance Testing Approach for Big Data Applications. A few interesting notes / slides from the session:


  • Rate of Data Ingestion - How fast does the system consume data?
  • Data Processing - How fast the data is processed. Test data processing in isolation, with the data sets already populated. Run specific perf tests (MR jobs, Pig, Hive scripts)
  • Data Persistence - An I/O-bound process (data writes / updates on the DB, garbage collection, monitoring metrics)
  • Complete end-to-end time for processing (network connectivity, processing, results)
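
As a rough illustration of the first point, ingestion rate can be measured as records consumed per unit of time. The sketch below is my own, not from the webinar: measure_ingestion_rate is a hypothetical helper, and an in-memory list stands in for the real system under test.

```python
import time


def measure_ingestion_rate(ingest, records):
    """Return the number of records per second consumed by `ingest`."""
    start = time.perf_counter()
    count = 0
    for record in records:
        ingest(record)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small inputs.
    return count / elapsed if elapsed > 0 else float("inf")


# Assumption: a plain list stands in for the real ingestion endpoint.
sink = []
rate = measure_ingestion_rate(sink.append, range(100_000))
print("Ingested", len(sink), "records at", round(rate), "records/sec")
```

In a real test the ingest callable would write to the actual system (a queue, HBase, HDFS and so on), and the same timing wrapper could be reused around the processing and persistence stages.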

Big Data Test Challenges
  • Diverse Technologies
  • Unavailability of Test Tools for Big Data Technologies / Scenarios
  • Limited Monitoring / Diagnostic Solutions
  • Test Scripting / Environment
Perf Test Tools
  • Use cloud to simulate large infrastructure
  • Cloud orchestration scripts (Puppet, Chef)

Approach
  • Identify production workload patterns based on how the system is used in production
  • Fault Tolerance Scenarios
  • Hadoop monitoring tools to check Map reduce jobs
  • Selecting Test Clients - Custom code
  • Performance / Failover tests to ensure scalability (Node failures during processing)
Test Parameters and Summary



A very nice, practical and useful webinar. There are a lot of posts and webinars on this topic, and this one stands out as especially practical.

More Reads
Evaluating SolrMeter for Performance Testing
Benchmarking with HTTPerf.js and NodeUnit

Happy Learning!!!

Weekend Reads - API Testing (Web Service Testing, Inspect Http request, Rest API Testing) - Free tools

A very good presentation with a compiled list of free tools for API testing. More details - Free API debugging and testing tools you should know about

Tools List from the presentation and SO reads
While reading through the list, I went back to check the SOAP and REST basics. For API testing / web service testing, we will look only at REST and SOAP based web services. The summary below is consolidated from StackOverflow answers and posts. References - StackOverflow reference answers link. For more details, please check the reference link.

| REST | SOAP |
|---|---|
| REST is over HTTP. REST has no WSDL interface definition | SOAP can be over any transport protocol, such as HTTP, FTP, SMTP, JMS etc. |
| REST stands for Representational State Transfer. The REST approach uses the standard GET, PUT, POST, and DELETE verbs | SOAP stands for Simple Object Access Protocol. SOAP builds an XML protocol on top of HTTP / TCP/IP. |
| REST is good for getting a blob of data that you don't have to work with | SOAP describes functions and types of data. If you want to get an object, SOAP is way quicker and easier to implement |
| Typically uses normal HTTP methods instead of a big XML format describing everything | Has several protocols and technologies relating to it: WSDL, XSDs, SOAP, WS-Addressing |
| REST plays well with AJAX'y web pages. If you keep your requests simple, you can make service calls directly from your JavaScript, and that comes in very handy. | SOAP is useful from a tooling perspective because the WSDL is so easily consumed by tools. So, you can get Web Service clients generated for you in your favourite language. |
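
To make the difference concrete, here is a small sketch that builds (but does not send) one request of each kind using Python's standard library. The URLs, the GetOrder operation and the OrderId element are hypothetical and exist only for illustration; a real SOAP service defines its operations and namespaces in its WSDL.

```python
import urllib.request

# Hypothetical endpoints, used only for illustration.
REST_URL = "http://example.com/api/orders/42"
SOAP_URL = "http://example.com/soap/OrderService"

# REST: the HTTP verb plus the resource URL carry the intent.
rest_request = urllib.request.Request(REST_URL, method="GET")

# SOAP: the operation is described in an XML envelope that is POSTed.
soap_envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetOrder xmlns="http://example.com/orders">
      <OrderId>42</OrderId>
    </GetOrder>
  </soap:Body>
</soap:Envelope>"""

soap_request = urllib.request.Request(
    SOAP_URL,
    data=soap_envelope.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        # Hypothetical action URI; SOAP 1.1 services expect this header.
        "SOAPAction": "http://example.com/orders/GetOrder",
    },
    method="POST",
)

print(rest_request.get_method(), rest_request.full_url)
print(soap_request.get_method(), soap_request.full_url)
```

Passing either object to urllib.request.urlopen would actually send it. From a testing point of view, the REST call is checked through its verb, URL and status code, while the SOAP call is checked mostly through the XML in the request and response bodies.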

More Reads
From AWS Blog - 80% REST / 20% SOAP usage pattern

Happy Learning!!!

July 05, 2014

APIs Good Read

I'm learning Big Data basics and checking real-life architectures to understand the technology and its implementation. I came across an interesting slide on APIs and am sharing it here.


Happy Reading!!!