"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 29, 2013

Exploring Hortonworks Sandbox - Part I (on Windows 7)

Setup Steps
  1. Downloaded VirtualBox from Link
  2. Followed the Hortonworks Windows tutorial Link
  3. Downloaded the 1.8 GB Hortonworks Sandbox from Link
  4. After configuring it, ran through the first tutorial - Link
  5. Started the server and opened http://192.168.56.101:8000/ in a browser on the Win7 machine
  6. The Sandbox was set up and configured to use a 192-series IP address. I was able to use the browser interface on Win7 to upload files and run queries
  7. Credentials to log on to the server: login root, password hadoop
  8. The command to shut down is poweroff
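
A quick way to confirm step 5 from the host machine is a small reachability check. This is a minimal sketch using only the Python standard library; the helper names sandbox_url and is_reachable are made up for illustration, and the address is the one from my setup, so yours may differ.

```python
import urllib.request


def sandbox_url(host="192.168.56.101", port=8000):
    """Build the web UI address from step 5 (host and port are from my setup)."""
    return "http://{}:{}/".format(host, port)


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to url gets a non-error response in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError.
        return False


# Usage, with the Sandbox VM running:
#   print(is_reachable(sandbox_url()))
```

If this reports False while the VM is up, the host-only network adapter in VirtualBox is the first thing I would look at.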

First Example Notes
  • Downloaded example data from Link 
  • Upload worked in Google Chrome but not in IE
  • poweroff is the command to shut down the sandbox machine
  • Uploading data, running basic Select queries worked fine
More Info Tutorials
My Feedback
  • Impressively easy to set up and use
  • Got started in under 2 hours
  • Good Learning Start
More Reads

WTF does a Data Scientist do all day?

How do I become a data scientist?

What are some software and skills that every Data Scientist should know?

Read Quote of Joe Blitzstein's answer to Data Science: What is it like to design a data science class? on Quora

Read Quote of Nishant Neeraj's answer to Big Data: What should be ideal size, skill set and composition of team for a successful Big Data implementation in an organization? on Quora

Read Quote of Sean Owen's answer to Job Interviews: How can a computer science graduate student prepare himself for data scientist/machine learning intern interviews? on Quora

Read Quote of Pronojit Saha's answer to Data Science: What are some software and skills that every Data Scientist should know? on Quora

Read Quote of Ye Zhao's answer to How do I become a data scientist? on Quora

How does one begin to learn data science?

Harvard Data Science Course  

Software engineer's guide to getting started with data science


Happy Learning!!!

September 20, 2013

Advanced Cloud Computing 2013 Notes


Yesterday I attended ACC2013, held at NIMHANS. Every conference provides a lot of inspiration and motivation to try out new things.

Session #1 - Inauguration Talk

The inauguration was done by Padma Bhushan Rajaram. He explained cloud computing in very simple terms: it is essentially utility computing. The term was coined by a management professor named Chellappa.
He had earlier written a paper in 2005 on challenges in utility computing. He recalled his experiences with new jargon and technical abbreviations, giving several expansions of BYOD as an example: Bring Your Own Drinks, Bring Your Own Dope, and, most recently, Bring Your Own Device. Storage and processing costs have come down, and Amazon and Google have turned this into a business opportunity by renting out their excess storage and processing infrastructure.

He stressed several areas where standardization is needed to realize the full potential of the cloud. Examples: SLAs that guarantee the required performance when infrastructure is hosted or shared, a universally usable cloud, and interoperability between cloud providers.

Session #2 - Talk by Karnataka's IT Secretary VidyaShankar, IAS

He spoke about developing trends: virtualization, cloud, and 3D technologies. He also mentioned a couple of products, CloudMagic and Cubby.

Session #3 - Connected Systems, Cloud beginner tech talk by Vikas Agarwal (Tally)

This talk focused on the evolution of cloud computing. He traced it from the earliest systems through to the cloud.
  •  Stage 1 - Mainframe Systems
  •  Stage 2 - PC Era (Moving data to personal systems)
  •  Stage 3 - LAN (Locally connected systems) - Intranets
  •  Stage 4 - Connected Era, WANs
  •  Stage 5 - Evolution of Internet (Globally Connected)
  •  Stage 6 - Cloud (Shared computing, storage) - Access anytime / anywhere
Challenges / Features in Cloud
  •  Pooling Optimization
  •  Elasticity
  •  Efficiency
Session #4 - Big Data in Safety & Security Domain, Tech talk by Bob Brewin (Tyco)

This talk focused on the basics of cloud computing and on challenges and applications in the fire and security domain. Key points covered were
  • Fallacies of distributed computing
  • Current challenges in the fire and security domain:
      • Identifying false positives
      • Predictive analytics to identify and isolate false positives
      • Real-time monitoring
Session #5 - Cloud Services in Yahoo by Jothi Padmanabhan

Yahoo has its own private cloud. The speaker provided details on Yahoo's infrastructure and software stack.
Challenges
  • Scaling systems as per growing data
  • Data Partitioning
  • Data Consistency
  • Hardware provisioning
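
The data partitioning challenge above can be illustrated with simple hash partitioning. This is a generic sketch, not the scheme Yahoo described in the talk; the function names and the choice of SHA-256 are my own assumptions.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a string key to a partition index using a stable hash."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # The same key always lands on the same partition.
    return int(digest, 16) % num_partitions


def partition_keys(keys, num_partitions):
    """Group keys into one bucket (list) per partition."""
    buckets = [[] for _ in range(num_partitions)]
    for key in keys:
        buckets[partition_for(key, num_partitions)].append(key)
    return buckets
```

Note that with plain modulo hashing, changing the number of partitions moves most keys, which is one reason scaling and partitioning show up together in the list of challenges.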
Benefits of Private Cloud
  • Developers can focus on Application logic instead of designing for crash / recovery scenarios
  • Focus on appealing content for users (UX Exp)
Requirements for Cloud
Multitenancy
  • Several applications will share the same hardware and software
  • Resources can be shared, but applications should not conflict with each other on performance
  • Multiple apps will be running in parallel
  • A spike in one app's resource consumption should not affect other applications' performance
  • The SLAs defined for performance need to be met for all hosted apps
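
One common way to keep a spike in one app from hurting the others is a per-app rate limit. The sketch below uses a token bucket per tenant; it is an illustration of the idea only, not something presented in the talk, and the class names are hypothetical. Time is passed in explicitly so the behaviour is easy to follow (real code would likely use time.monotonic()).

```python
class TokenBucket:
    """Allows bursts up to capacity, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantLimiter:
    """Keeps one independent bucket per app, so quotas are not shared."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.buckets = {}

    def allow(self, app, now):
        if app not in self.buckets:
            self.buckets[app] = TokenBucket(
                self.capacity, self.refill_per_second, now)
        return self.buckets[app].allow(now)
```

With a capacity of 3, a burst of five requests from one app gets three through and two rejected, while a second app arriving at the same moment is still served from its own bucket.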
Elasticity
  • Applications will have projected capacity vs actual capacity
  • Projected capacity is based on a ballpark figure; actual load can only be measured once the product is implemented
  • Scale as you need
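
"Scale as you need" can be sketched as a small sizing rule: derive the number of instances from measured load rather than from the projected figure. The function name, the headroom parameter, and the limits are my own assumptions for illustration.

```python
import math


def replicas_needed(load_rps, capacity_per_replica_rps,
                    headroom_percent=20, min_replicas=1, max_replicas=100):
    """Return how many replicas the measured load calls for.

    load_rps: measured requests per second (actual, not projected).
    capacity_per_replica_rps: requests per second one replica can serve.
    headroom_percent: spare capacity kept on top of the measured load.
    """
    if capacity_per_replica_rps <= 0:
        raise ValueError("capacity_per_replica_rps must be positive")
    required = load_rps * (100 + headroom_percent) / (100 * capacity_per_replica_rps)
    # Round up, then clamp to the allowed range.
    return max(min_replicas, min(max_replicas, math.ceil(required)))
```

For example, 1000 requests per second with replicas that each handle 100 and 20% headroom works out to 12 replicas.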
Scalable
  • Offerings include processing many requests, storing huge volumes of data, and analytics on top of that data
Other key aspects include Availability, Security, Metering, Global APIs, Load Balancing, Simple APIs

A more detailed architecture is explained in the paper link
  • Overview of OpenStack
  • Apache Traffic Server used as caching proxy server
  • Proxy (Route Traffic through intermediate steps)
  • Reverse proxy vs Forward Proxy (Several Variations)
  • Yahoo has 25K Clusters and 40K Servers
  • Mobstor (Storage for large unstructured files)
  • Sherpa (NoSQL solution from Yahoo)
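
To make the caching proxy idea from the list above concrete, here is a toy in-memory version. It is only a sketch of the concept and has nothing to do with how Apache Traffic Server is implemented; the class name, the fetch_from_origin callable, and the TTL policy are all assumptions.

```python
class CachingProxy:
    """Sits in front of an origin and reuses responses until they expire."""

    def __init__(self, fetch_from_origin, ttl_seconds):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> body
        self.ttl_seconds = ttl_seconds
        self.cache = {}  # path -> (expires_at, body)

    def get(self, path, now):
        entry = self.cache.get(path)
        if entry is not None and entry[0] > now:
            # Fresh copy available: the origin is not contacted.
            return entry[1], "HIT"
        body = self.fetch_from_origin(path)
        self.cache[path] = (now + self.ttl_seconds, body)
        return body, "MISS"
```

A reverse proxy places this in front of the servers, while a forward proxy places it in front of the clients; the caching logic itself is the same in both cases.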

Happy Learning!!!

September 01, 2013

PyCon India 2013 - Day 2 Session Notes

Here are my notes from the second day's sessions.
Session #1 – Raspberry Pi basics by Sudar Muthu
A good basic session. The speaker presented the content and demo very well. Notes from the session
  • The simplest "hello world" program on the Raspberry Pi is a light blink program
  • The speaker also spoke about controlling devices
  • Devices can be controlled using PWM (pulse-width modulation)
  • PWM.py – pull up (higher voltage), pull down (lower voltage)
  • Protocols – I2C, SPI, Serial. These protocols can be used to talk to devices
  • Interacting with a webcam using PyGame
More Reads -  Distributed Computing Tutorials, Author website – HardwareforFun
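
The blink and PWM notes from this session can be sketched without any hardware attached. This is not the speaker's code: set_pin is a hypothetical callable that, on a real Raspberry Pi, would wrap whatever GPIO library you use to drive the pin high or low.

```python
import time


def pwm_times(frequency_hz, duty_cycle_percent):
    """Return (on_seconds, off_seconds) for one PWM period."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    if not 0 <= duty_cycle_percent <= 100:
        raise ValueError("duty_cycle_percent must be between 0 and 100")
    period = 1.0 / frequency_hz
    on_time = period * duty_cycle_percent / 100.0
    return on_time, period - on_time


def blink(set_pin, times, on_seconds, off_seconds, sleep=time.sleep):
    """Drive the pin high then low, `times` times.

    set_pin: hypothetical callable taking True (high) or False (low).
    sleep: injectable so the loop can be exercised without waiting.
    """
    for _ in range(times):
        set_pin(True)
        sleep(on_seconds)
        set_pin(False)
        sleep(off_seconds)
```

A slow blink is just PWM at a very low frequency: blink(set_pin, 5, *pwm_times(1, 50)) toggles once per second at a 50% duty cycle. Raising the frequency and varying the duty cycle is what lets PWM dim an LED or control a motor's speed.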
Session #2 – Robotics Demo
  • ROS (Robot Operating System)
  • The author used a Raspberry Pi and an Arduino node
Tools
  • Speech Synthesis – Festival and pyFestival
  • Speech Recognition – GStreamer, PocketSphinx
  • Artificial Intelligence – AIML, pyAIML (Artificial Intelligence Markup Language)
  • GUI – Qt and PyQt
Author site – Technolabz, Lentin Joseph
Session #3 – Arduino and Internet of Things
Arduino – Open source electronics prototyping platform
Advantages of Arduino
  • Easy to use
  • Cheap
  • Open Hardware
  • Open Documents
Components
  • HW: device – electronic prototyping board
  • SW: bootloader
  • SW: libraries
  • SW: IDE
  • Interfacing – connected to a computer using Bluetooth / USB
Tools
Use Cases
  • Talk over Serial, RF, Ethernet
  • Attach Sensor and relay other readings
  • Attach Actuators and make things move
  • Connecting devices through web
  • Security Sensor, email on touch
  • Author - Avik Dhupar
  • http://www.arduino.cc/
Session #4 - Testing tools Sessions (Open Source Tools)
  • Fabric for distributed testing (this deployment tool can be used for distributed computing testing)
  • STAF – Software Testing Automation Framework from IBM
  • Nitrate – Test case management tool
  • TestLink – Test case management tool
  • Beaker Project – Managing automated tests
Session #5 – Web Scraping
  • Author used http://scrapy.org/
  • Pablo Hoffman is the Scrapy developer
  • Author Anuvrat Parashar provided examples of crawling the web, collecting data, and extracting information from the collected data
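
The "extracting information from collected data" step can be tried out with nothing but the Python standard library. To be clear, this is not Scrapy and not the speaker's example; it is a minimal sketch using html.parser, with LinkExtractor and extract_links as made-up names.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

A crawler repeats this in a loop: fetch a page, extract its links, queue the ones not yet visited. Scrapy handles that loop, along with scheduling and politeness, for you.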
More Tools

Happy Learning!!!