"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

March 24, 2012

Hadoop VS Vertica - Big Data Webinar

[You may also like - Big Data - Basics - Getting Started]

Vertica was ranked #1 based on Big Data revenues. Please check the webinar on Hadoop vs RDBMS for Big Data Analytics. Below are my notes from the webinar.
  • Big Data volume is projected to grow to 35 ZB by 2020
Hadoop Overview
  • Supports unstructured data processing
  • Hadoop / MapReduce programming
  • Different types of computations possible
  • Low-cost infrastructure
  • Batch-processing based, not real time

Survey of Big Data Tools based on parameters (Deep Analytics, not just aggregating data; Real Time Analytics; Volume of Data)
Vertica seems to succeed on all the above parameters. Vertica features are listed below:
  • RDBMS compliant
  • Elastic, Scalable
  • Auto tuning feature
  • Built in Analytics
  • Based on MPP Architecture
  • Real time Analytics Support

Vertica Product Details
  • Column based Storage Overview (Key differentiator is I/O savings compared to traditional Row based access)
  • Support for Column Based Compression
  • High Availability Support
  • Less storage space required to store data in Vertica compared to Hadoop for the same amount of data
Summary
  • Hadoop is good for Exploratory Analysis
  • Vertica is recommended for Real time / Interactive Analytics. Vertica also provides a Hadoop Connector to pull data from Hadoop

Happy Learning!!! 

Tool Developer Notes - Part V

[Previous Post in Series - Tool Developer Notes - Part IV]

This post is based on work done in the last couple of weeks.

Tip #1 - While trying to run a WinForms exe, I used to get the below message prompt on my Win7 machine

After a little googling, an MSDN forum question was useful to fix this warning. The workaround is to edit the manifest file in the Debug folder which contains the exe file.

Added the below lines of code to the manifest
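A sketch of the kind of entry involved, assuming the prompt was the UAC elevation / compatibility warning (the exact values depend on your application):

<trustInfo xmlns="urn:schemas-microsoft-com:asm.v2">
  <security>
    <requestedPrivileges xmlns="urn:schemas-microsoft-com:asm.v3">
      <!-- asInvoker runs the exe with the caller's token instead of prompting for elevation -->
      <requestedExecutionLevel level="asInvoker" uiAccess="false" />
    </requestedPrivileges>
  </security>
</trustInfo>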

This fixed the message.

Tip #2 - C# File Exists Check
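A minimal sketch of the check (the file path is an example):

using System;
using System.IO;

class FileExistsExample
{
    static void Main()
    {
        string path = @"C:\Temp\Input.txt";
        // File.Exists returns false for directories and for paths the caller cannot access
        if (File.Exists(path))
        {
            Console.WriteLine("File exists - " + path);
        }
        else
        {
            Console.WriteLine("File not found - " + path);
        }
    }
}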

Tip #3 - C# Directory Exists Check
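The same idea for directories (the folder path is an example):

using System;
using System.IO;

class DirectoryExistsExample
{
    static void Main()
    {
        string folder = @"C:\Temp";
        // Directory.Exists returns false if the path points to a file instead of a folder
        if (Directory.Exists(folder))
        {
            Console.WriteLine("Directory exists - " + folder);
        }
        else
        {
            Console.WriteLine("Directory not found - " + folder);
        }
    }
}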

Tip #4 - C# - Check Return Value in SQL Command Return

In the earlier post in the Tool Developer Notes series I was not checking whether the return value is NULL.
  • Modified the example to check that the returned object is not null
  • Simple Windows Console Application
  • Add the references as per the example code snippet below
using System;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

namespace TestExample
{
    class Program
    {
        public static void Main(string[] args)
        {
            Program P = new Program();
            P.FetchScalarValue();
        }

        public void FetchScalarValue()
        {
            SqlCommand comm = null;
            try
            {
                Console.WriteLine("Fetch Single DB Values Example \n");
                comm = SetUpConnection();
                OpenConnection(comm);
                comm.CommandType = CommandType.Text;
                comm.CommandText = @"SELECT TOP 1 FirstName
                                  FROM [AdventureWorksLT2008R2].[SalesLT].[Customer]";
                // ExecuteScalar returns null when the query produces no rows,
                // so check before calling ToString()
                object result = comm.ExecuteScalar();
                if (result != null)
                {
                    Console.WriteLine("Query Result is " + result.ToString());
                }
                Console.ReadLine();   // keep the console window open
            }
            catch (Exception EX)
            {
                Console.WriteLine("Error - " + EX.Message);
            }
            finally
            {
                // Close the connection even when the query fails
                if (comm != null)
                {
                    CloseConnection(comm);
                }
            }
        }
        public SqlCommand SetUpConnection()
        {
            string strConn = ConfigurationManager.AppSettings["ConnectionString"];
            SqlCommand comm = new SqlCommand();
            comm.Connection = new SqlConnection(strConn);
            return comm;
        }
        public void OpenConnection(SqlCommand comm)
        {
            comm.Connection.Open();
        }
        public void CloseConnection(SqlCommand comm)
        {
            comm.Connection.Close();
        }
    }
}

App.config file Entry


<?xml version="1.0" encoding="utf-8" ?>
<configuration>
      <appSettings file="" >
            <clear />
            <add key="ConnectionString" value="Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=AdventureWorksLT2008R2;Data Source=.\SQLSERVER2008R2" />
      </appSettings>
</configuration>


Happy Learning!!!

March 17, 2012

TSQL Enhancements in SQL 2012 - Part III

[Previous Post in Series - TSQL Enhancements in SQL 2012 - Part II]

This dedicated post is for the SQL 2012 paging feature, a very important feature in terms of performance impact. I have seen pagination play a crucial role during data loading. This inbuilt paging support helps fetch only the required data.

It would be interesting to see how paging is implemented by the database engine, and what the execution plan looks like when you run a paging query.
Scenario
  • Create a Table
  • Populate a million records
  • Add Indexes on columns
  • Run Paging queries
  • Check Execution Plan
Step 1 - Creating Table and Populating Data
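A minimal sketch of this kind of setup (the table name, columns and row-generation trick are my assumptions, not the original script):

-- Create a table with a clustered primary key
CREATE TABLE dbo.PagingDemo
(
    Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    Payload VARCHAR(50)
);

-- Populate a million rows using a cross join as a row generator
INSERT INTO dbo.PagingDemo (Payload)
SELECT TOP (1000000) 'Row'
FROM sys.all_columns a
CROSS JOIN sys.all_columns b;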
Step 2 - SELECT query with a filter condition using the primary key column. As you can see it is a Clustered Index Seek operation
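Something like the below, filtering on the clustered key column:

SELECT Id, Payload
FROM dbo.PagingDemo
WHERE Id = 500;   -- Clustered Index Seek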
Step 3 - Implement the same using the paging option. Check the execution plan
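The paging form uses the new OFFSET / FETCH syntax (page size 10, third page, as an example):

SELECT Id, Payload
FROM dbo.PagingDemo
ORDER BY Id
OFFSET 20 ROWS FETCH NEXT 10 ROWS ONLY;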



Step 4 - Results of Paging Query

Since we are requesting pagination on a clustered index column, I expected this to be a seek, but it seems to be a SCAN. This might be a costly operation. I hope to post the answer for it in the next set of posts.


Happy Learning!!!!

TSQL Enhancements in SQL 2012 - Part II

[Previous Post in Series - TSQL Enhancements in SQL 2012 - Part I]


SEQUENCE

  • This feature is similar to the Identity concept
  • Normally an Identity column is set automatically; in the case of a Sequence, the value is fetched at the TSQL code level
  • Similar to Identity, you can set an increment value and a max value
  • From the Virtual Lab training material it is mentioned - the recommended scenario is when you need the value before loading into a table, with increment and max-value limits
Scenario
  • Create a Sequence with Start Value 0, Increment by 5, Max Value 50, then cycle and restart from 0 (a sketch follows)
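A minimal sketch of that scenario (the sequence name is an assumption):

-- CYCLE restarts the sequence from MINVALUE once MAXVALUE is crossed
CREATE SEQUENCE dbo.DemoSequence
    AS INT
    START WITH 0
    INCREMENT BY 5
    MINVALUE 0
    MAXVALUE 50
    CYCLE;

-- Each call returns the next value: 0, 5, 10, ... 50, then 0 again
SELECT NEXT VALUE FOR dbo.DemoSequence;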

Format

  • Specify a format string. Reusing the date examples from the earlier post; a sketch is below
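A small sketch of FORMAT with dates (the format strings are examples):

-- FORMAT accepts .NET-style format strings
SELECT FORMAT(GETDATE(), 'dd-MM-yyyy') AS [DayMonthYear],
       FORMAT(GETDATE(), 'MMMM yyyy')  AS [MonthYear];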

TRY_CONVERT

  • Tries to convert a value; on success the value is returned, else NULL
  • Useful when you do row-by-row operations and one of the values exceeds a limit (e.g. assigning a bigint or string value to an int variable); see the sketch below
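A quick sketch of TRY_CONVERT:

SELECT TRY_CONVERT(INT, '123') AS ValidValue,     -- returns 123
       TRY_CONVERT(INT, 'ABC') AS InvalidValue;   -- returns NULL instead of raising an error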
Paging - pagination of results, a very nice feature. We will try this in the next post.

Happy Learning!!!!!

March 16, 2012

TSQL Enhancements in SQL 2012 - Part I

[Next Post in Series - TSQL Enhancements in SQL 2012 - Part II]

I had taken sessions on SQL 2008 and SQL 2008 R2 TSQL enhancements in my earlier roles. This post covers TSQL enhancements in SQL 2012. Online SQL Server Virtual Labs are available in the link; you should give them a try. I have tried a couple of exercises for enhancements in TSQL for SQL Server 2012
  • THROW feature support in Try-Catch
  • IIF - simplify IF-ELSE logic
  • CHOOSE - select a value from a list based on the index of the value
  • CONCAT - string concatenation feature
Throw Keyword Support

The feature addition is that the THROW keyword is supported to throw exceptions. This is useful for debugging - you can use THROW while debugging, and for a production database you can catch and log the errors instead.

Normal Try-Catch Scenario
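A minimal sketch (the divide-by-zero is just to force an error):

BEGIN TRY
    SELECT 1/0;   -- raises a divide-by-zero error
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS ErrorNumber,
           ERROR_MESSAGE() AS ErrorMessage;
END CATCH;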


With Throw Keyword
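The same scenario, with THROW re-raising the error to the caller:

BEGIN TRY
    SELECT 1/0;
END TRY
BEGIN CATCH
    -- THROW with no arguments re-raises the original error
    THROW;
END CATCH;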
     
Concat
  • Concatenates column values; useful for reporting results (sketch below)
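A quick sketch of CONCAT, reusing the AdventureWorks sample from earlier posts:

-- CONCAT implicitly converts non-string arguments and treats NULL as an empty string
SELECT CONCAT(FirstName, ' ', LastName) AS FullName
FROM [AdventureWorksLT2008R2].[SalesLT].[Customer];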


IIF and CHOOSE
  • IIF simplifies an IF-ELSE condition by specifying both outcomes in a single line
Example - maximum of 3 numbers (sketch below)
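A sketch using nested IIF (the values are examples):

DECLARE @a INT = 10, @b INT = 25, @c INT = 17;

-- IIF(condition, true_value, false_value), nested to pick the largest
SELECT IIF(@a > @b,
           IIF(@a > @c, @a, @c),
           IIF(@b > @c, @b, @c)) AS MaxOfThree;   -- returns 25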

CHOOSE
  • Returns the value from the list based on the index specified (sketch below)
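A quick sketch of CHOOSE (the list values are examples; the index is 1-based):

SELECT CHOOSE(2, 'SQL 2008', 'SQL 2008 R2', 'SQL 2012') AS PickedValue;   -- returns 'SQL 2008 R2'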
In the next post we will cover
  • TRY_CONVERT
  • Format
Happy Learning!!!!

Tool Developer Notes - Part IV

[Previous Post in Series - Tool developer notes part III]
Tip #1 - Initially I used to read the IP address from the App.config file for my C# application. Based on feedback I moved the URL assembly into the implementation; only the IP address and port are read from App.config.

While trying to run it, the below error appeared. Error - System.UriFormatException: Invalid URI: A port was expected because of there is a colon (':') present but the port could not be parsed.
The below link was useful for the answer. Let's try out that answer

Step 1 - A simple Windows Forms App

Under the App.config appSettings section add the following keys

<!--IP Address-->
<add key="IPAddress" value="117.00.162.110"/>
<!--Port-->
<add key="Port" value="1433"/>

Step 2 - Add references to System.Configuration and System.Web



Step 3 - For the button click add the below code in the handler

string IpAddress = ConfigurationManager.AppSettings["IPAddress"];
string Port = ConfigurationManager.AppSettings["Port"];
string TestUrl = string.Format(@"http://{0}:{1}/test/services/testnode/",
    HttpUtility.UrlEncode(IpAddress),
    HttpUtility.UrlEncode(Port));
Uri U = new Uri(TestUrl, UriKind.Absolute);
MessageBox.Show(U.ToString());

Using HttpUtility and Uri to set up the URL fixed the issue. If the URL is relative you need to use UriKind.Relative when constructing the new Uri (uniform resource identifier). Also note the double '//' after 'http:' - this is also very important.

Tip #2 - Log4Net logging created duplicate entries in the logs. This was due to a configuration setting in the app.config file. A Stack Overflow answer was useful to correct the duplicate-logging issue. Let's try a sample configuration for Log4Net logging; for the earlier example please check the link

Modified App.Config file

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler,log4net" />
  </configSections>
  <appSettings file="" >
    <clear />
    <add key="log4net.Internal.Debug" value="false"/>
  </appSettings>
  <!-- This section contains the log4net configuration settings -->
  <log4net debug="true">
    <appender name="LogFileAppender" type="log4net.Appender.FileAppender">
      <param name="File" value="Log4NetExample.log"/>
      <param name="AppendToFile" value="false" />
      <layout type="log4net.Layout.PatternLayout">
        <header type="log4net.Util.PatternString" value="[START LOG] %newline" />
        <footer type="log4net.Util.PatternString" value="[END LOG] %newline" />
        <conversionPattern value="%d [%t] %-5p - %m%n"/>
      </layout>
    </appender>
    <!-- Setup the root category, add the appenders and set the default level -->
    <root>
      <level value="INFO" />
      <appender-ref ref="LogFileAppender" />
    </root>
    <!-- additivity="false" stops events from this logger also flowing to the
         root logger, which is what caused the duplicate entries -->
    <logger name="log4NetExample" additivity="false">
      <level value="INFO" />
      <appender-ref ref="LogFileAppender" />
    </logger>
  </log4net>
</configuration>

These config changes to the previous example fixed the duplicate-logging issue. I still need to explore all the Log4Net settings; hoping to try them out in the next set of posts.

Tip #3 - This is for the error - "The AXIS engine could not find a target service to invoke! targetService is ABC/". I received this error when trying to invoke a service through proxy code. The reason was the trailing forward slash '/', which needed to be removed to fix the issue. There were many answers provided for this error; the hint to remove the slash from the answer was useful.

Tip #4 - Read Text from a File and Encode it in UTF8 format
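A minimal sketch (the file path is an example):

using System;
using System.IO;
using System.Text;

class ReadUtf8Example
{
    static void Main()
    {
        // Read the whole file as text, then get its UTF-8 byte representation
        string text = File.ReadAllText(@"C:\Temp\Input.txt");
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        Console.WriteLine("Read {0} characters, {1} UTF-8 bytes", text.Length, utf8Bytes.Length);
    }
}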

Tip #5 - Reading a File till the end
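One way, reading line by line with StreamReader until the end of the file (the path is an example):

using System;
using System.IO;

class ReadToEndExample
{
    static void Main()
    {
        using (StreamReader reader = new StreamReader(@"C:\Temp\Input.txt"))
        {
            string line;
            // ReadLine returns null once the end of the file is reached
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line);
            }
        }
    }
}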

Tip #6 - Why do I get the error "Object reference not set to an instance of an object"?
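The usual cause - calling a member on a reference that was never assigned an object. A tiny sketch:

using System;

class NullReferenceExample
{
    static void Main()
    {
        string s = null;

        // s.Length would throw NullReferenceException because s points to no object;
        // guard with a null check (or initialize s) before using it
        if (s != null)
        {
            Console.WriteLine(s.Length);
        }
        else
        {
            Console.WriteLine("s is null - assign it before use");
        }
    }
}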


Happy Learning!!!

March 04, 2012

Big Data - Basics - Getting Started

[You may also like - NOSQL - can it replace RDBMS Databases?]

This post is towards learning the fundamentals and evolution of big data computing, based on a discussion with one of my colleagues - the quest to find the details of big data. Where did it start, why is it needed, and what is the current state of big data computing?


Why Big Data ?
  • Distributed data processing, supporting massive data (petabytes), and scalability were challenges in traditional BI systems (MSSQL, Oracle and other BI solution providers)
  • The need for an alternative to transaction-processing systems based on ACID properties, fixed schema designs and their scalability issues led to the evolution of NOSQL databases and Hadoop-based systems
  • Search engines and social networking sites accumulate large amounts of data in a very short time
  • Scalability, flexible schema support and indexing support are properties of NOSQL systems
  • Moving away from traditional ETL-based data processing, which took a lot of time to consolidate various data sources and process large amounts of data
Phases Involved in Traditional BI Processing
MSBI
  • ETL processing
  • Build data marts
  • Build cubes
  • Run SSRS reports
Phases Involved in Big Data Processing
  • Storage can be Hadoop based / NOSQL based. It would be useful to check on the evolution of Hadoop.
  • Detailed BI processing in Hadoop. The presentation Realtime BI in Hadoop is useful
  • Some important metrics on big data processing: Yahoo processed 1 TB of data in 16 secs and 1 PB of data in 16 hours (Source - Link - Slide 29)
  • Hadoop vs RDBMS (Slide 17 of the presentation is good)
How Big Data Evolved
  • Everything started with Google's map/reduce approach, followed by the evolution of Hadoop by 2006
  • Yahoo, Facebook, Twitter and other major players opted for Hadoop-based databases and NOSQL databases
  • Reference - the link was useful
How Map Reduce Works
  • Input data is converted (mapped) into meaningful key / value pairs
  • Since data is processed (reduced) where it lives, there is no need to load all the data and process it at the server level
  • The reduced data consolidated from various sources is used for data analytics / further data processing (data marts etc..)
  • The post is a very good note in simple terms to learn a Map / Reduce implementation - Map reduce includes (distributed data processing, data stored as keys)
  • The word-count program mentioned is available in the link; a toy sketch of the idea is below
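A toy illustration of the map/reduce idea in C# (plain LINQ, not Hadoop code - the document strings are made up):

using System;
using System.Linq;

class WordCountSketch
{
    static void Main()
    {
        string[] docs = { "big data hadoop", "hadoop map reduce", "big data" };

        var counts = docs
            .SelectMany(doc => doc.Split(' '))                // map: emit each word
            .Select(word => new { Key = word, Value = 1 })    // as a (word, 1) pair
            .GroupBy(pair => pair.Key)                        // shuffle: group by key
            .Select(g => new { Word = g.Key, Count = g.Sum(p => p.Value) }); // reduce: sum counts

        foreach (var c in counts)
        {
            Console.WriteLine("{0} : {1}", c.Word, c.Count);
        }
    }
}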
How Hadoop Works
  • Hadoop is a framework for distributed data processing
  • Based on the Map Reduce approach
  • Slide 9 of the presentation is a very good representation of a Hadoop setup.
  • The key components include HBASE for storage and HIVE - a query language for Hadoop
  • SQOOP - imports data from RDBMS systems to Hadoop clusters; Pig, Avro etc..
  • Good presentation - link
Summarizing Key Points on Hadoop Usage
  • Suitable for data mining and analytics on unstructured data
  • Not recommended for RDBMS-compliant systems - banking, OLTP-based systems, financial systems etc..
How Microsoft & Oracle Play with Big Data
Startups in Big Data Space

  • NUODB - Cloud-based RDBMS-compliant database. Capable of large data processing and a competitive player in the big data space
  • SPIRE - Based on Hadoop and HBASE. A real-time scalable database
  • RethinkDB - Key/value pair based storage
  • Emergence of columnar databases; Vertica ranked No.1 for columnar data
More Reads
Planning to get started with Hadoop, NuoDB in coming weeks....

Another Excellent Articles Collection List from Wikibon
Big Data: Hadoop, Business Analytics and Beyond
Real-Time Data Management and Analytics Come in Many Flavors
Big Data Market Size and Vendor Revenues
Microsoft is BIG in Big Data

Happy Learning!!!