Friday, March 8, 2013

Hadoop-Related Jargon

Hadoop implements a computational paradigm named MapReduce. If you are not familiar with MapReduce, read the Wikipedia article here.
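
To give a feel for the paradigm, here is a minimal single-machine sketch of a MapReduce-style word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key (the step a framework does between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

The point of the split is that each map call looks at only one record and each reduce call looks at only one key, so both phases can be spread over a cluster without the functions knowing about each other.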

Some Hadoop Related Terms:
  • Hadoop: An open-source MapReduce implementation, developed largely at Y! (Yahoo!)
  • HDFS: Distributed file system written in Java for the Hadoop framework
  • Pig: High level scripting language to work with Hadoop
  • Hive: Data warehouse infrastructure to work with Hadoop, uses HiveQL (an SQL-like language) 
  • HBase: A non-relational key/value datastore to work with Hadoop
  • Mahout: A set of machine learning algorithms to work with Hadoop on Big data

Some Other Systems Similar to Hadoop:
  • Dryad
    • Developed at Microsoft
    • Tasks modeled as directed acyclic graph
    • Sequential programs are connected using one-way channels
  • S4
    • Developed by Y!
    • Stream processing
    • Runs on the Java platform
  • Spark
    • Developed at UC Berkeley
    • Supports in-memory queries instead of going to disk for every request
    • Implemented in Scala
    • Runs on a cluster manager (such as Mesos)
  • Storm
    • Developed by Twitter
    • Stream processing
    • Guarantees message processing
  • BashReduce
    • Works with Linux commands such as sort and grep
  • Disco
    • Developed at Nokia
    • Backend is written in Erlang
    • Works with Python
  • GraphLab
    • Developed at CMU
    • For machine learning tasks
    • Data should fit in main memory
    • Is not fault tolerant
  • HPCC
    • Uses Enterprise Control Language (ECL)
    • Implemented in C++
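
The Dryad entry above describes tasks modeled as a directed acyclic graph, with sequential programs connected by one-way channels. As a rough illustration of that idea only (this is not Dryad's API), the sketch below runs ordinary Python functions in dependency order using the standard-library graphlib module (Python 3.9+). The names run_dag, TASKS, and CHANNELS are hypothetical, and everything runs on one machine.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, channels, source_input):
    """Run each task after all of its upstream tasks have finished.

    tasks:    dict mapping task name -> function(list_of_inputs) -> output
    channels: list of (producer, consumer) pairs, i.e. one-way edges
    """
    predecessors = {name: [] for name in tasks}
    for producer, consumer in channels:
        predecessors[consumer].append(producer)

    outputs = {}
    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(predecessors).static_order():
        upstream = [outputs[p] for p in predecessors[name]]
        # Tasks with no upstream channel read the external input instead.
        outputs[name] = tasks[name](upstream or [source_input])
    return outputs


# Hypothetical example graph: read feeds count and longest, both feed report.
TASKS = {
    "read": lambda ins: ins[0].split(),
    "count": lambda ins: len(ins[0]),
    "longest": lambda ins: max(ins[0], key=len),
    "report": lambda ins: f"{ins[0]} words, longest is {ins[1]!r}",
}
CHANNELS = [
    ("read", "count"),
    ("read", "longest"),
    ("count", "report"),
    ("longest", "report"),
]

if __name__ == "__main__":
    result = run_dag(TASKS, CHANNELS, "hadoop runs mapreduce jobs")
    print(result["report"])
```

Each function here is a plain sequential program; the parallelism in a system of this kind would come from the scheduler running independent nodes (count and longest in this example) at the same time.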
