"No one is harder on a talented person than the person themselves" - Linda Wilkinson ; "Trust your guts and don't follow the herd" ; "Validate direction not destination" ;

September 28, 2014

PyCon 2014


Every time I attend a conference I plan to post my notes the same day. The delay in posting is inversely proportional to the probability of posting at all. After missing a few conferences, today's post is on my learnings from PyCon 2014. Every conference leaves behind a lot of motivation and inspiration to learn and deliver.

Interesting Quotes 

"Functionality is an asset. Code is a liability" by @sanand0
"Premature Optimization is baby evil" by @sanand0
"If it's not tested, it doesn't work. If it's tested, it may work" - @voidspace
"Libraries are good, your brain is better" - @sanand0

Short Notes from Sessions

Panel Discussion on Python Frameworks - Django, Flask, Web.py 
  • The discussion was pretty interesting. For beginners, Web.py followed by Flask was the suggestion
  • Django, the elephant in the room, scored over the rest on usage, features and documentation
Notes from the interesting talks are shared in the Auth Evolution and Spark Overview posts below.

Happy Learning!!!

Auth Evolution

This session, Auth as a Service by Kiran, provided a good overview of how authentication has evolved over the past decade.

The complete text of the presentation is available in the link. The text is pretty exhaustive; I am only writing down the key points for my reference:
  • HTTP Basic Auth
  • Cookies
  • Cryptographically signed tokens
  • HTTPS
  • Database-backed sessions
HTTP Basic Auth - Username and password are sent in every HTTP request. To log out you need to send a wrong password; that wrong password gets preserved and the server rejects the requests after that.
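
A minimal sketch of how Basic Auth credentials travel with a request, using the requests library (the URL and credentials below are made-up placeholders):

    import base64
    import requests

    # requests builds the "Authorization: Basic ..." header from the auth tuple
    resp = requests.get("https://example.com/api/profile",
                        auth=("alice", "s3cret"))

    # The same header built by hand: "Basic " + base64("username:password")
    token = base64.b64encode(b"alice:s3cret").decode()
    print(resp.status_code, "Authorization: Basic " + token)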

Cookies - A regular HTML form login; the username and password are encoded and put in an HTTP cookie, which is then sent with every request.
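
A minimal Flask sketch of the idea, assuming a form that posts a username field (the cookie name and "encoding" here are placeholders, not a secure scheme):

    from flask import Flask, request, make_response

    app = Flask(__name__)

    @app.route("/login", methods=["POST"])
    def login():
        # Take the submitted credential and drop it into a cookie;
        # the browser will send this cookie with every later request.
        user = request.form["username"]
        resp = make_response("logged in")
        resp.set_cookie("auth", user)
        return resp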

Cryptographically signed tokens - A random key plus the username; the cookie is now checked against the key to verify it is the same user. SSL on top of it made sure most of the issues were fixed.
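
A small sketch of signing and verifying such a token with the standard library's hmac module (the secret key and username are placeholders):

    import hashlib
    import hmac

    SECRET_KEY = b"server-side-secret"   # known only to the server

    def sign(username):
        # Token = username plus an HMAC signature computed with the secret key
        sig = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
        return username + ":" + sig

    def verify(token):
        username, sig = token.rsplit(":", 1)
        expected = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(sig, expected)

    cookie = sign("alice")
    print(cookie, verify(cookie))   # tampering with the username breaks verification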

Database-backed sessions - This is a very nice one. These days I get notifications from Quora / Google telling me how many open sessions I have and the locations I previously logged in from. That is all done through database-backed sessions, and it seems to address all the issues that came up as limitations of the previous approaches.
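
A rough sketch of the idea with sqlite3: the cookie only carries a random session id, while the user, location and revocation state live in a server-side table (the schema and values are my own illustration, not from the talk):

    import sqlite3
    import uuid

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, user TEXT, location TEXT)")

    def create_session(user, location):
        # Random session id goes into the cookie; everything else stays server-side
        sid = uuid.uuid4().hex
        db.execute("INSERT INTO sessions VALUES (?, ?, ?)", (sid, user, location))
        return sid

    create_session("alice", "Bangalore")
    create_session("alice", "Chennai")

    # "Your open sessions" style listing; revoking one is just a DELETE
    print(db.execute("SELECT id, location FROM sessions WHERE user = ?",
                     ("alice",)).fetchall())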

Good Refresher!!!

Happy Learning!!!

Spark Overview


I remember the Spark keyword coming up during Big Data architecture discussions in my team, but I had never looked further into Spark. The session by Jyotiska NK on Python + Spark: Lightning Fast Cluster Computing was a useful starter on Spark. (slides of talk)

Spark 
  • In memory cluster computing framework for large scale data processing
  • Developed in Scala, with Java and Python APIs
  • Not meant to replace Hadoop; it can sit on top of Hadoop
  • Spark Summit slides / videos from past events are good references - link 
Python Offerings
  • PySpark, data pipelines using Spark (see the sketch after this list)
  • Spark for real-time / batch processing
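
A minimal PySpark word-count sketch of what the API looks like, assuming a local Spark install; "notes.txt" is a placeholder input file:

    from pyspark import SparkContext

    sc = SparkContext("local", "wordcount")

    counts = (sc.textFile("notes.txt")
                .flatMap(lambda line: line.split())   # transformation
                .map(lambda word: (word, 1))          # transformation
                .reduceByKey(lambda a, b: a + b))     # transformation

    print(counts.take(5))                             # action triggers execution
    sc.stop()
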
Spark Vs Map Reduce Differences
This section was the highlight of the session. The way data is handled in the Map Reduce execution model versus the Spark approach is the key difference.

Map Reduce Approach - Load data from disk into RAM; Mapper, Shuffler and Reducer are the different stages. Processing is distributed, and fault tolerance is achieved by replicating data.

Spark - Load data into RAM and keep it there until you are done; data is cached in RAM from disk for iterative processing. If the data is too large, the rest is spilled to disk. This allows interactive processing of datasets without having to reload the data into memory each time. The core abstraction is the RDD (Resilient Distributed Dataset).
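
A small sketch of the caching idea for iterative work, using the same kind of local PySpark setup as above (the file name is again a placeholder):

    from pyspark import SparkContext

    sc = SparkContext("local", "iterative")
    data = sc.textFile("notes.txt").cache()   # keep the RDD in RAM after first use

    # Both passes reuse the cached data instead of re-reading it from disk
    print(data.count())
    print(data.filter(lambda line: "spark" in line.lower()).count())
    sc.stop()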

RDD - A read-only collection of objects distributed across machines. If part of it is lost, it can still be recomputed. 

RDD Operations
  • Transformations - map, filter, sort, flatMap
  • Actions - reduce, count, collect, save data to local disk. Actions usually involve disk operations (see the sketch below)
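
A brief sketch of the transformation vs action split, assuming the same local PySpark setup as above; transformations are lazy and only an action triggers the actual job:

    from pyspark import SparkContext

    sc = SparkContext("local", "rdd-ops")
    nums = sc.parallelize(range(1, 11))

    evens = nums.filter(lambda n: n % 2 == 0)   # transformation: nothing runs yet
    doubled = evens.map(lambda n: n * 2)        # transformation: still lazy

    print(doubled.collect())                    # action: job executes, results return
    print(doubled.reduce(lambda a, b: a + b))   # action: another job
    doubled.saveAsTextFile("doubled_out")       # action: writes output to disk
    sc.stop()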

More Reads
Testing Spark Best Practices
Gatling - Open Source Perf Test Framework
Spark Paper

Happy Learning!!!