"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

October 20, 2014

Open Source Test Tools vs. Commercial Test Tools


I have never had a taste for working with record-and-playback tools. I have mostly developed custom tools and scripts for QA and deployment tasks. I work across different streams - database development, QA, tools development, performance, and big data. Working in different areas provides a broader perspective than doing repetitive things. My view of QA has evolved towards reusable scripts for data generation and scripts that simplify or eliminate repetitive tasks during deployment, configuration, testing, and validation.

I have worked with Selenium, Coded UI, and custom-developed automation test frameworks. In every company I have worked for, I have seen the automation framework being re-engineered: either the code base becomes too big to manage and modify, or newly hired folks prefer building from scratch over maintaining what exists. The ROI calculation for this is questionable. When the quality of development is poor, every small bug that reaches QA can show up as hundreds of bugs. How many of those bugs could have been caught by basic checks on the DEV side is another point to consider when measuring QA bugs.

QA efforts are often viewed as commodity work, where the focus is mainly on delivery and repetitive cycles of testing are accepted. Instead of such a model, a joint DEV-QA effort would help identify most bugs before a build is released to QA.

Both open source and commercial tools help address test automation challenges. One instance: working with Windows CE (WinCE) apps, it is very difficult to automate hardware-software integration workflows. TestComplete eliminated most of this effort by emulating actions on MyMobiler, which in turn mimics a real user on WinCE-installed hardware.

The effort involved in automating a WinCE app deployed on a device by hand, versus using a tool like TestComplete that eliminates the pain of writing Win32 calls (SendMessage, SendKeys) yourself, is worth evaluating before worrying about the license cost.
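To give a feel for the plumbing such a tool hides, below is a minimal sketch of simulating keystrokes from code using java.awt.Robot. It is only an illustration of hand-rolled input simulation in general; it does not reproduce the Win32 SendMessage / SendKeys calls against MyMobiler that the original effort involved.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;

// Illustrative only: types "OK" and presses Enter into whichever window has
// focus. Hand-rolled input simulation like this (or raw Win32 SendMessage /
// SendKeys) is the kind of plumbing a tool such as TestComplete hides.
public class KeystrokeSketch {
    public static void main(String[] args) throws AWTException {
        Robot robot = new Robot();
        robot.setAutoDelay(100);   // pause between generated events (ms)
        robot.delay(3000);         // time to focus the target window manually

        typeKey(robot, KeyEvent.VK_O);
        typeKey(robot, KeyEvent.VK_K);
        typeKey(robot, KeyEvent.VK_ENTER);
    }

    private static void typeKey(Robot robot, int keyCode) {
        robot.keyPress(keyCode);
        robot.keyRelease(keyCode);
    }
}
```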

Overall, only a mix of open source tools, commercial tools, in-house scripts, quality coding practices, and unit testing can ensure a quality product. Responsibility does not sit with one function; every function needs to be accountable for delivering a quality product.

Automation tools are primarily viewed in terms of record / playback and automation framework implementation. Beyond that, they can also be leveraged for:
  • Throwaway scripts that help functional testers eliminate repetitive tasks
  • Supporting system activities during functional testing - monitoring, screen capture, simulating user events, supporting long-running tests (see the sketch after this list)
  • Extending to Support / UAT environments for deployment / installation where several client, server, and web components need to be installed
  • Aiding automated deployments and uninstallations
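As one illustration of the monitoring / screen-capture idea above, here is a minimal throwaway sketch using the Selenium Java bindings. The status-page URL, capture interval, and file names are assumptions for illustration, not anything from the original post.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// Throwaway helper: capture a screenshot of a (hypothetical) status page every
// few minutes while a long-running functional test executes elsewhere.
public class ScreenCaptureMonitor {
    public static void main(String[] args) throws Exception {
        WebDriver driver = new ChromeDriver();
        try {
            for (int i = 0; i < 5; i++) {
                driver.get("http://example.com/status");   // placeholder URL
                File shot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
                Files.copy(shot.toPath(), Paths.get("status_" + i + ".png"),
                           StandardCopyOption.REPLACE_EXISTING);
                Thread.sleep(5 * 60 * 1000);               // wait five minutes
            }
        } finally {
            driver.quit();
        }
    }
}
```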
Happy Learning!!!

TestRail - TCM Tool

This post is an analysis of TestRail and of migrating existing test cases into TestRail.

TestRail has a great web interface for creating and organizing test cases. The factors that make TestRail a competitive candidate are:
  • Ease of creating / managing test cases
  • Migration support for existing test cases
  • API support for automated migration, test case creation, execution, and updating of results (see the sketch after this list)
  • Out-of-the-box test case execution reports
  • Integration with bug tracking tools
  • Hosted / on-premise model
  • Existing GitHub projects for .NET, Java, and other languages with automation / migration support
  • Great Tech Support
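As a small illustration of the API support mentioned above, here is a hedged sketch that reports a result against a case using TestRail's v2 REST endpoint add_result_for_case, assuming a recent JDK's java.net.http client. The base URL, run and case IDs, and credentials are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

// Sketch: mark TestRail case 42 in run 7 as "passed" via the v2 REST API.
// The URL, run/case IDs, and credentials below are placeholders.
public class TestRailResultPush {
    public static void main(String[] args) throws Exception {
        String baseUrl = "https://yourcompany.testrail.io/";
        String auth = Base64.getEncoder()
                .encodeToString("user@example.com:API_KEY".getBytes());
        String body = "{\"status_id\": 1, \"comment\": \"Passed by automation\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "index.php?/api/v2/add_result_for_case/7/42"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + auth)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```
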
Test Case Migration Efforts

The different aspects involved in a test case migration effort for any TCM tool:
  1. Test case template - identify the required built-in and custom fields for the test case template; develop, modify, and evaluate templates to arrive at the final test case template
  2. Organizing test cases (test suites) - feature-wise and release-related test cases; analyse, identify, and evaluate a structure (functional, regression, feature areas) plus release-specific cases
  3. Migration efforts - based on the template and test case structure, prepare custom XML for all test cases to be migrated
    1. Arrive at an approach to validate all migrated cases
    2. For attributes identified in the test case template, decide what values to fill in for existing test cases where those attributes were not previously used
  4. Default values for unused fields (drop-down lists / custom values)
  5. Automation integration and defect / bug tracking tool integration
  6. Automation test cases - identify automation test cases, templates, and details
  7. QA reports - analyse the reports available out of the box in TestRail and any custom reporting needs: email-based reporting on metrics, daily test case execution, etc.
  8. Custom tools - write test cases in Excel and upload them directly to TestRail; the same tool can be used to update test results from Excel as well (see the sketch after this list)
  9. QA process document - develop a process document (guidelines / best practices) on adding test cases; updating functional, regression, and release-related test cases; TestRail permissions; and test case reviews in TestRail
  10. Test results archival / maintenance - an approach for archiving and maintaining test results, test runs, and test cases
  11. Hosting - pros and cons of local hosting vs. cloud hosting
  12. Security / administration / configuring users - admin-related aspects, identifying roles and permissions for users
  13. Identifying pilot projects for the TestRail evaluation period after finalizing the above areas - pilot projects for usability, tracking, and upgrading before complete migration
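For the custom Excel upload tool in item 8, a rough sketch is shown below. It assumes Apache POI for reading the workbook, a single title column in the first sheet, and a hypothetical TestRail section ID (12); a real template would need more columns and proper JSON handling.

```java
import java.io.File;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

// Sketch of the Excel-to-TestRail upload tool: read one title per row from the
// first sheet and create a case under an assumed section (id 12) via add_case.
public class ExcelCaseUploader {
    public static void main(String[] args) throws Exception {
        String baseUrl = "https://yourcompany.testrail.io/";   // placeholder
        String auth = Base64.getEncoder()
                .encodeToString("user@example.com:API_KEY".getBytes());
        HttpClient client = HttpClient.newHttpClient();

        try (Workbook workbook = WorkbookFactory.create(new File("testcases.xlsx"))) {
            Sheet sheet = workbook.getSheetAt(0);
            for (Row row : sheet) {
                String title = row.getCell(0).getStringCellValue();   // column A = title
                String body = "{\"title\": \"" + title.replace("\"", "\\\"") + "\"}";  // naive escaping
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create(baseUrl + "index.php?/api/v2/add_case/12"))
                        .header("Content-Type", "application/json")
                        .header("Authorization", "Basic " + auth)
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(title + " -> " + response.statusCode());
            }
        }
    }
}
```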

Happy Learning!!!

October 10, 2014

HBase Overview Notes

Limitations of Hadoop 1.0
  • No random access --> Hadoop is geared towards batch access (OLAP)
  • Not suitable for real-time access
  • No updates - the access pattern Hadoop is best suited for is WORM (Write Once, Read Multiple times)
Why HBase
  • Flexible schema design --> a new column can be added when a row is added (see the sketch after this list)
  • Multiple versions of a single cell (data)
  • Columnar storage
  • Cache columns at client side
  • Compression of columns
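A minimal sketch of the flexible schema and multi-version points above, assuming the HBase 1.x-style Java client and an existing table 'metrics' with column family 'cf' (both assumptions for illustration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: assumes a table 'metrics' with column family 'cf' already exists.
// New column qualifiers can be introduced per Put (flexible schema), and a Get
// can request several versions of the same cell.
public class HBaseFlexibleSchemaSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("metrics"))) {

            Put put = new Put(Bytes.toBytes("row1"));
            // 'new_column' need not be declared anywhere up front
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("new_column"),
                          Bytes.toBytes("value-1"));
            table.put(put);

            Get get = new Get(Bytes.toBytes("row1"));
            get.setMaxVersions(3);   // fetch up to 3 versions of each cell
            Result result = table.get(get);
            System.out.println(result);
        }
    }
}
```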
Read vs. Write

  • A trade-off between availability (compromising on writes) and consistency (compromising on reads)
HBase
  • A NoSQL class of non-relational storage systems
  • An RDBMS stores data row by row; HBase uses columnar storage
  • HBase relies on HDFS for replication
  • ZooKeeper - takes the initial requests from the client; the client goes through ZooKeeper first: Client -> ZooKeeper -> HMaster
  • Region Server - serves the regions; the region server process runs on the slaves (data nodes)
Happy Learning!!!

October 09, 2014

Pig Overview Notes

Pig
  • Primarily for semi-structured data
  • It is called 'Pig' because it processes all kinds of data
  • Pig Latin is a data flow language, not a declarative query language like SQL (see the sketch after this list)
  • MapReduce is for Java programmers, Hive for T-SQL folks, and Pig for rapid prototyping and increased productivity
  • Pig runs on the client side; it need not be installed on the cluster
  • Execution sequence - Query Parser -> Semantic Checking -> Logical Optimizer (variable level) -> Logical-to-Physical Translator -> Physical-to-MapReduce Translator -> MapReduce Launcher
  • Pig concepts - Map: a set of key-value pairs; Tuple: an ordered list of data; Bag: an unordered collection of tuples
  • Pig - for client-side access and semi-structured data; Hive works only within the cluster
  • Hive - best suited for SQL-style analytics on structured data
  • MapReduce - for audio / video analytics, a hand-written MapReduce approach is the only option
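A small sketch of a Pig data flow driven from the client side, using the PigServer Java API in local mode; the input file 'access.log' and its schema are assumptions for illustration:

```java
import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

// Sketch: a simple Pig data flow (load, group, count) run in local mode from
// the client machine. The input file and its schema are illustrative only.
public class PigFlowSketch {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);
        pig.registerQuery("logs = LOAD 'access.log' AS (user:chararray, url:chararray);");
        pig.registerQuery("byUser = GROUP logs BY user;");
        pig.registerQuery("counts = FOREACH byUser GENERATE group, COUNT(logs);");

        Iterator<Tuple> it = pig.openIterator("counts");   // triggers execution
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```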

Happy Learning!!!

October 08, 2014

Hive Overview Notes

  • Data warehousing package built on top of Hadoop
  • For managing and querying structured data
  • Hive uses the embedded Apache Derby database for its metastore by default
  • The metastore_db folder holds the persisted metastore data
  • Suitable for the WORM (Write Once, Read Many times) access pattern
  • Core components are the Shell, Metastore, Execution Engine, Compiler (parse, plan, optimize), and Driver
  • Tables can be created as internal (managed) tables or external tables pointing to external files (see the sketch after this list)
  • When an internal table is dropped, both schema and data are dropped; for an external table only the schema is dropped, not the data. Data for both internal and external tables resides in HDFS
  • Data files for managed tables are stored under /user/hive/warehouse
  • Bucketing in Hive - hash value % number of buckets determines which bucket a row goes into (partitioning, by contrast, splits data into directories by partition column value)
  • A partitioned table should always be an internal Hive table
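A short sketch of the internal vs. external table distinction above, issued through the HiveServer2 JDBC driver; the connection string, table names, and HDFS location are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch: create one internal (managed) and one external table over HiveServer2
// JDBC. Host, database, table names and the HDFS path below are placeholders.
public class HiveTableSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Managed table: DROP TABLE removes schema AND data under /user/hive/warehouse
            stmt.execute("CREATE TABLE IF NOT EXISTS orders_internal "
                       + "(id INT, amount DOUBLE) "
                       + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

            // External table: DROP TABLE removes only the schema, the files stay put
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS orders_external "
                       + "(id INT, amount DOUBLE) "
                       + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                       + "LOCATION '/data/orders'");
        }
    }
}
```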
Happy Learning!!!

October 07, 2014

Map Reduce Internals

The client submits the job; the Job Tracker does the splitting and scheduling of the job.

Mapper
  • The mapper runs the business logic (e.g., word counting)
  • The mapper extracts what you need from each record
  • The record reader provides input to the mapper in key-value format
  • Map-side join (via distributed caching)
  • The output of the mapper is a list of keys and values; the output of the map function is stored in a sequence file
  • The framework does the splitting based on the input format; the default record delimiter is the newline (text format)
  • Every row / record goes through the map function
  • When a row is split across two 64 MB blocks, that row is merged back into a complete record and then processed
  • The default block size in Hadoop 2.0 is 128 MB
Reducer
  • The reducer polls for map output; the Job Tracker tells it which nodes to poll
  • The default number of reducers is 1; this is configurable
  • Multiple reduce phases within a single job are not possible; multi-level (chained) MR jobs are
  • Reduce-side join (join at the reducer level)
Combiner
  • Combiner - a mini reducer that runs before map output is written to disk (e.g., finding the max value in the map's data)
  • A combiner is used when the map task itself can do some pre-aggregation to minimize the reducer's workload
Partitioner
  • The hash partitioner is the default partitioner
  • Flow: Mapper -> Combiner -> Partitioner -> Reducer (for multi-dimensional outputs, e.g., 2012 max sales by product, 2013 max sales by location) - see the sketch below
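The classic word count below sketches how these pieces fit together: the mapper emits (word, 1), the same reduce logic doubles as a combiner for map-side pre-aggregation, the (default) hash partitioner routes keys to reducers, and the reducer sums the counts. Input and output paths come from the command line; this is a generic sketch, not code from the original session.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

// Sketch: word count wired as Mapper -> Combiner -> Partitioner -> Reducer.
public class WordCount {

    // The record reader hands each line to map() as (byte offset, line text)
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }

    // Used both as the combiner (mini reducer on the map side) and as the reducer
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);           // map-side pre-aggregation
        job.setPartitionerClass(HashPartitioner.class);   // the default; shown for clarity
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

A job like this would typically be packaged into a jar and submitted with the usual hadoop jar command, passing the input and output paths as the two arguments.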
Happy Learning!!!

October 06, 2014

Hadoop Ecosystem Internals

Hadoop internals - this post is a quick summary from a learning session.

Data Copy Basics (Writing data to HDFS)
  • Network proximity is considered during data storage (the first two IPs returned are closest to the client)
  • Data is stored in 64 MB blocks
  • The default replication factor is 3
  • The client gets an error message when the write operation to the primary node fails
  • Blocks are split horizontally across different machines
  • Slaves use SSH to connect to the master (communication between nodes is also over SSH)
  • Client communication happens over RPC
  • Writing happens in parallel; replication happens in a pipeline
Analysis / Reads (Reading Data from HDFS)
  • Client -> Master -> the nearest node IPs are returned
  • The master knows the utilization of the nodes and allocates the least-used machine that holds a copy of the data for processing (see the sketch below)
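For context, here is what a client-side write and read look like through the HDFS FileSystem API; the block splitting, pipelined replication, and node selection described above all happen behind these calls. The path used is a placeholder.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: one write and one read through the HDFS client API. Block placement,
// pipelined replication and datanode selection happen behind these calls.
public class HdfsReadWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf)) {

            Path path = new Path("/tmp/sample.txt");   // placeholder path
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            }

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                System.out.println(reader.readLine());
            }
        }
    }
}
```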
Concepts
  • Namenode - Metadata
  • DataNode - Actual Data
  • chmod 755 - owner has read, write, and execute; group and others have read and execute
  • Rack - Physical Set of Machines
  • Node - Individual machine
  • Cluster - Set of Racks
Happy Learning!!!

October 03, 2014

HBase Primer Part III


This post is an overview of read / write operations in HBase. The steps were clearly laid out in the paper "Exploring NOSQL, Hadoop and HBase" by Ricardo Pettine and Karim Wadie; I'm unable to locate the link to download the paper.

I'm reposting a few steps from the paper, which lists the steps of read / write operations in HBase. ZooKeeper is used for coordination in both Storm and HBase.

Data Path
Table - the HBase table
  Region - regions for the table
    Store - a store per column family for each region
      MemStore - a memstore for each store
        Store File - store files for each store
          Block - blocks within a store file

Write Path
  • The client request is sent to ZooKeeper
  • ZooKeeper finds the metadata and returns it to the client
  • The client scans the region server to find where the new key's data needs to be stored
  • The client sends the request to the region server
  • The region server processes the request; the write operation goes through the WAL (Write-Ahead Log), a concept found in other databases too
  • The write then lands in the memstore; when the memstore is full, the data is flushed to disk

Read Path
  • The client issues a Get command
  • ZooKeeper identifies the metadata and returns it to the client
  • The client scans the region server to locate the data
  • Both memstore and store files are scanned

Happy Learning!!!