"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

August 16, 2014

HBase Primer - Loading Data into HBase Using Sqoop / Java Code

This post covers examples of using Sqoop and custom Java code to import data into HBase, Hive, and HDFS from a MSSQL database.

These steps were tried on the Cloudera VM - cloudera-quickstart-vm-4.4.0-1-vmware

From Linux terminal > hbase shell
create 'Employee_Details', {NAME => 'Details'}

Example #1 (Import to Hbase from MSSQL DB Table)
sqoop import --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --hbase-table "Employee_Details"  --column-family "Details" --table "dbo.employee"  --split-by "id"

Example #2 (List Tables in MSSQL DB)
sqoop list-tables --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver"

Example #3 (Import to HDFS from MSSQL DB Table)
sqoop import --connect "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" --table "dbo.employee"  --split-by "id"

hadoop fs -ls (List files in hadoop file system)

Example #4 (Import into Hive Table from MSSQL DB Table)
sqoop import --hive-import --create-hive-table --hive-table Employee_Hive --connect  "jdbc:sqlserver://11.11.11.11;database=TestDB" --username "sa" --password "sa" --driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" -m 1 --table "dbo.employee" 

Example #5 - Custom Java Code
  • All required JARs need to be added to compile the project - getting the classpath right was one of the challenges in making this code work.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;

public class HBaseTest
{
   public static void main(String[] args) throws IOException
   {
      HBaseTest ht = new HBaseTest();
      ht.dropTable();
      ht.createTable();
      ht.updateRecords();
   }

   // Shared connection settings for the local (Cloudera VM) cluster
   private Configuration getConfig()
   {
      Configuration config = HBaseConfiguration.create();
      config.set("hbase.zookeeper.quorum", "localhost");
      config.set("hbase.zookeeper.property.clientPort", "2181");
      return config;
   }

   // Disable and delete the table if it already exists
   public void dropTable()
   {
      try
      {
         HBaseAdmin admin = new HBaseAdmin(getConfig());
         admin.disableTable("Employee_Details");
         admin.deleteTable("Employee_Details");
      }
      catch (Exception ex)
      {
         ex.printStackTrace();
      }
   }

   // Create the table with two column families: Id and Details
   public void createTable()
   {
      try
      {
         HTableDescriptor ht = new HTableDescriptor("Employee_Details");
         ht.addFamily(new HColumnDescriptor("Id"));
         ht.addFamily(new HColumnDescriptor("Details"));
         HBaseAdmin hba = new HBaseAdmin(getConfig());
         hba.createTable(ht);
      }
      catch (Exception ex)
      {
         ex.printStackTrace();
      }
   }

   // Insert one row with columns in both families
   public void updateRecords()
   {
      try
      {
         HTable table = new HTable(getConfig(), "Employee_Details");
         Put put = new Put(Bytes.toBytes("row1"));
         put.add(Bytes.toBytes("Details"), Bytes.toBytes("Name"), Bytes.toBytes("Raka"));
         put.add(Bytes.toBytes("Details"), Bytes.toBytes("Location"), Bytes.toBytes("Chennai"));
         put.add(Bytes.toBytes("Id"), Bytes.toBytes("Eid"), Bytes.toBytes("E001")); // employee id
         table.put(put);
         table.close();
      }
      catch (Exception ex)
      {
         ex.printStackTrace();
      }
   }
}

These were useful Sqoop examples to try out.

Happy Learning!!!

August 09, 2014

HBase Primer - Querying for Seeks / Scans - Part I

This post is an HBase primer on seeks/scans. I recommend spending time on HBase schema design to get a basic understanding of HBase table structure. The motivation slide for this post is from the Cloudera Session slides post.




I wanted to try out the suggested pattern of tables. This can be compared to the T-SQL equivalent of DDL and DML scripts. The key learning from this exercise is querying - selects with filters. Note that there are no aggregate functions in HBase; Hive and Phoenix, which sit on top of HBase, serve the purpose of aggregations.
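Since HBase itself offers no SUM/AVG, a client either iterates scan results and aggregates in application code, or pushes the query down to Hive/Phoenix. A minimal sketch of the client-side approach, with the scan results simulated as an in-memory map (not a real HBase Scan) using the Details:Salary values from the table below:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClientSideAggregate {
    public static void main(String[] args) {
        // Simulated scan output: row key -> Details:Salary, parsed to a number.
        Map<String, Long> salaries = new LinkedHashMap<>();
        salaries.put("row1", 1990L);
        salaries.put("row2", 5000L);
        salaries.put("row3", 50000L);
        // The aggregation happens entirely on the client side.
        long total = salaries.values().stream().mapToLong(Long::longValue).sum();
        System.out.println(total); // 56990
    }
}
```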

HBase (one-liner) - low latency, consistent, best suited for random read/write big data access

A few bookmarks provide a great short intro - link1

Hbase Queries (Tried it on cloudera-quickstart-vm-4.4.0-1-vmware)
hbase shell

Disable and Drop Table (shell commands are lowercase)
disable 'Employee'
drop 'Employee'

Create Table
Details is a column family. Inside this column family are the columns Name, Location, DOB and Salary.

create 'Employee', {NAME => 'Details'}

Insert Records
put 'Employee', 'row1', 'Details:Name', 'Raj'
put 'Employee', 'row1', 'Details:Location', 'Chennai'
put 'Employee', 'row1', 'Details:DOB', '01011990'
put 'Employee', 'row1', 'Details:Salary', '1990'

put 'Employee', 'row2', 'Details:Name', 'Raja'
put 'Employee', 'row2', 'Details:Location', 'Delhi'
put 'Employee', 'row2', 'Details:DOB', '01011991'
put 'Employee', 'row2', 'Details:Salary', '5000'

put 'Employee', 'row3', 'Details:Name', 'Kumar'
put 'Employee', 'row3', 'Details:Location', 'Mumbai'
put 'Employee', 'row3', 'Details:DOB', '01011992'
put 'Employee', 'row3', 'Details:Salary', '50000'
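One thing worth noting about the puts above: every value, including Salary, is stored as a string, and HBase filters with 'binary:' compare raw bytes. That makes comparisons lexicographic, not numeric - a pitfall this small standalone sketch demonstrates (plain Java, no cluster needed):

```java
public class ByteOrderPitfall {
    public static void main(String[] args) {
        // String.compareTo mirrors byte-wise comparison for ASCII strings.
        // "900" sorts AFTER "5000" because '9' > '5', even though 900 < 5000.
        System.out.println("900".compareTo("5000") > 0); // true
    }
}
```

So a filter like `SingleColumnValueFilter('Details','Salary',>, 'binary:5000')` would match a salary of "900"; to compare numerically, store fixed-width zero-padded strings or binary-encoded numbers.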

Select based on Column Qualifiers
scan 'Employee', {COLUMNS => ['Details:Name', 'Details:Location']}
scan 'Employee', {COLUMNS => ['Details:Location']}

Single Filter - Search by Location Column Filter
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai')" }

ValueFilter - Search by Location Value
scan 'Employee' , { COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filter - Search by Name and Location
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND SingleColumnValueFilter('Details','Name',=, 'binary:Raj')"}

Timestamp and Filter - Search with Timestamp and Filter
scan 'Employee' ,{TIMERANGE=>[1407324224184,1407324224391], COLUMNS => 'Details:Location', FILTER => "ValueFilter (=, 'binary:Chennai')"}

Multiple Filters - Search with multiple filters on the same column (OR)
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') OR SingleColumnValueFilter('Details','Location',=, 'binary:Delhi')"}

Filter using regExMatch
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'regexstring:Che.*',true,true)" }

Search using Prefix
scan 'Employee' ,{ FILTER => "SingleColumnValueFilter('Details','Location',=, 'binary:Chennai') AND PrefixFilter ('row1')"}

Return only one record
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 1, STARTROW => 'row1'}

Return two records
scan 'Employee', {COLUMNS => ['Details:Location'], LIMIT => 2, STARTROW => 'row1'}

Return records between a range of row keys (STOPROW is exclusive)
scan 'Employee', {COLUMNS => ['Details:Location'], STARTROW => 'row1',STOPROW=>'row2'}
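HBase keeps rows sorted by the raw bytes of the row key, which is why STARTROW/STOPROW ranges behave lexicographically rather than numerically. A sorted map over string keys mimics that ordering for plain ASCII keys; the extra 'row10' key here is hypothetical, added only to show where it would land:

```java
import java.util.TreeMap;

public class RowKeyOrder {
    public static void main(String[] args) {
        // TreeMap over strings matches HBase's byte-wise row ordering for ASCII keys.
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("row1", "Chennai");
        rows.put("row10", "Bangalore"); // sorts BETWEEN row1 and row2
        rows.put("row2", "Delhi");
        rows.put("row3", "Mumbai");
        // Equivalent of STARTROW => 'row1' (inclusive), STOPROW => 'row2' (exclusive)
        System.out.println(rows.subMap("row1", "row2").keySet()); // [row1, row10]
    }
}
```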

Get Specific Row with Filter
get 'Employee', 'row1', {FILTER => "ValueFilter (=, 'binary:Chennai')"}

Count records
count 'Employee'

Timestamp Range - Return within the timestamp range
scan 'Employee' ,{TIMERANGE=>[1407324224391,1407324234707]}
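The TIMERANGE bounds above are HBase cell timestamps, which are epoch milliseconds (the upper bound is exclusive). Decoding one of the bounds makes the range human-readable - a quick standalone check:

```java
import java.time.Instant;

public class HBaseTimestamps {
    public static void main(String[] args) {
        // HBase cell timestamps are milliseconds since the Unix epoch.
        Instant ts = Instant.ofEpochMilli(1407324224184L);
        System.out.println(ts); // 2014-08-06T11:23:44.184Z
    }
}
```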

HBase Table with 3 Column Families
create 'Employee1', {NAME => 'Name'}, {NAME => 'Location'}, {NAME => 'DOB'}

Query with column family
scan 'Employee1', {COLUMNS => ['Name']}

Delete a Column Family (Location is a column family in Employee1)
alter 'Employee1', 'delete' => 'Location'