Fun With Data: Hadoop Related Jargons

Friday, March 8, 2013


Hadoop implements a computational paradigm named MapReduce. If you are not familiar with MapReduce, read its Wikipedia article.
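To make the paradigm concrete, here is a minimal in-process sketch of the three MapReduce phases (map, shuffle/group by key, reduce), using word count, the usual introductory example. It runs on one machine in plain Python. `map_reduce`, `word_count_mapper` and `word_count_reducer` are illustrative names, not Hadoop's Java API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Mapper = Callable[[Any], Iterable[Tuple[Hashable, Any]]]
Reducer = Callable[[Hashable, List[Any]], Any]


def map_reduce(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> Dict[Hashable, Any]:
    # Map + shuffle: run the mapper on every record and group the emitted
    # values by key, as the framework does between the map and reduce phases.
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one reducer call per key, visiting keys in sorted order
    # (Hadoop also hands keys to each reducer in sorted order).
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum the 1s emitted for this word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In Hadoop, the same mapper and reducer logic would run as many parallel tasks over the input's HDFS blocks, with the framework performing the shuffle between them.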

Some Hadoop-Related Terms:

Hadoop: An open-source MapReduce implementation, developed largely at Yahoo!
HDFS: Distributed file system written in Java for the Hadoop framework
Pig: High-level scripting language (Pig Latin) for writing Hadoop jobs
Hive: Data warehouse infrastructure built on Hadoop; queries are written in HiveQL (an SQL-like language)
HBase: A non-relational, distributed key/value datastore that runs on top of Hadoop
Mahout: A library of scalable machine learning algorithms that run on Hadoop over big data
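To illustrate the HDFS entry: HDFS stores a file as fixed-size blocks and keeps several copies of each block on different machines. The sketch below assumes the Hadoop 1.x default block size of 64 MB and the default replication factor of 3. The round-robin placement and the datanode names `dn1` to `dn4` are simplifications for illustration; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB: the Hadoop 1.x default block size
REPLICATION = 3                # HDFS default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the sizes, in bytes, of the blocks a file is stored as."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Put each block on `replication` distinct datanodes, round-robin.

    Simplification: real HDFS chooses nodes with a rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
    print([size // (1024 * 1024) for size in blocks])  # [64, 64, 64, 8]
    for block, nodes in place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, nodes)
```

Because each block lives on several nodes, losing one machine does not lose data, and MapReduce tasks can be scheduled on a node that already holds the block they read.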

Some Other Systems Similar to Hadoop:

Dryad

Developed at Microsoft
Jobs are modeled as a directed acyclic graph (DAG)
Sequential programs are connected using one-way channels
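The Dryad execution model can be sketched in a few lines: each vertex is an ordinary sequential function, each edge is a one-way channel carrying one vertex's output to another, and a vertex runs only after all of its inputs are ready. This is a single-process toy that assumes Python 3.9+ for `graphlib`; `run_dag`, `PIPELINE` and `CHANNELS` are hypothetical and say nothing about Dryad's actual API or scheduler.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


def run_dag(vertices: Dict[str, Callable[..., Any]],
            inputs_of: Dict[str, List[str]]) -> Dict[str, Any]:
    """Run every vertex once all of its upstream vertices have produced output."""
    # graphlib expects each node mapped to its predecessors, i.e. its input channels.
    graph = {name: inputs_of.get(name, []) for name in vertices}
    outputs: Dict[str, Any] = {}
    # static_order() yields each vertex after all of its predecessors and
    # raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        upstream = [outputs[src] for src in graph[name]]
        outputs[name] = vertices[name](*upstream)
    return outputs


# Hypothetical job: one reader fans out to two filters whose outputs are merged.
PIPELINE: Dict[str, Callable[..., Any]] = {
    "read": lambda: list(range(10)),
    "evens": lambda xs: [x for x in xs if x % 2 == 0],
    "odds": lambda xs: [x for x in xs if x % 2 == 1],
    "merge": lambda evens, odds: (sum(evens), sum(odds)),
}
CHANNELS = {"evens": ["read"], "odds": ["read"], "merge": ["evens", "odds"]}

if __name__ == "__main__":
    print(run_dag(PIPELINE, CHANNELS)["merge"])  # (20, 25)
```

The "evens" and "odds" vertices do not depend on each other, so a real DAG engine could run them on different machines at the same time; the acyclic restriction is what makes such a schedule always exist.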

S4

Developed at Yahoo!
Stream processing
Runs on the Java platform
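S4's design routes each event, by the value of a key attribute, to a processing element (PE) instance dedicated to that key, and each PE keeps its own state as events stream past. The toy sketch below models that idea for a running word count. The class and function names are hypothetical; S4's real Java API looks different.

```python
from typing import Dict, Iterable


class WordCountPE:
    """One processing-element instance per distinct key value; holds its own state."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.count = 0

    def on_event(self, event: Dict[str, str]) -> None:
        self.count += 1


class KeyedDispatcher:
    """Routes each event to the PE instance for its key, creating it on first sight."""

    def __init__(self, key_attribute: str) -> None:
        self.key_attribute = key_attribute
        self.instances: Dict[str, WordCountPE] = {}

    def dispatch(self, event: Dict[str, str]) -> None:
        key = event[self.key_attribute]
        if key not in self.instances:
            self.instances[key] = WordCountPE(key)
        self.instances[key].on_event(event)


def count_stream(events: Iterable[Dict[str, str]]) -> Dict[str, int]:
    dispatcher = KeyedDispatcher("word")
    for event in events:  # events are handled one at a time, with no batch boundary
        dispatcher.dispatch(event)
    return {key: pe.count for key, pe in dispatcher.instances.items()}


if __name__ == "__main__":
    stream = [{"word": "hadoop"}, {"word": "s4"}, {"word": "hadoop"}]
    print(count_stream(stream))  # {'hadoop': 2, 's4': 1}
```

Keying PEs this way is what lets such a system spread the work: all events for one key go to the same PE instance, so instances for different keys can live on different nodes.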

Spark

Developed at UC Berkeley
Keeps working datasets in memory across queries instead of re-reading them from disk for each job
Implemented in Scala
Runs on a cluster manager such as Mesos
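The in-memory point is easiest to see with an iterative job that scans the same dataset several times. In this toy sketch, an uncached dataset is re-read from its source on every pass, while a cached one is read once and then served from memory. The method names `cache` and `collect` borrow from Spark's RDD API, but `Dataset`, `CountingSource` and `iterative_job` are hypothetical stand-ins, not Spark classes.

```python
from typing import Callable, List, Optional


class CountingSource:
    """Hypothetical slow storage (think HDFS) that counts how often it is read."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)
        self.reads = 0

    def read(self) -> List[float]:
        self.reads += 1
        return list(self._values)


class Dataset:
    """Toy dataset that is loaded on demand and can optionally be cached in memory."""

    def __init__(self, loader: Callable[[], List[float]]) -> None:
        self._loader = loader
        self._cached: Optional[List[float]] = None
        self._keep_in_memory = False

    def cache(self) -> "Dataset":
        # Mark the dataset to be kept in memory after its first load.
        self._keep_in_memory = True
        return self

    def collect(self) -> List[float]:
        if not self._keep_in_memory:
            return self._loader()          # uncached: go back to storage every time
        if self._cached is None:
            self._cached = self._loader()  # cached: load once...
        return self._cached                # ...then serve from memory


def iterative_job(data: Dataset, passes: int) -> float:
    # Stand-in for an iterative algorithm that scans the full dataset each pass.
    total = 0.0
    for _ in range(passes):
        total += sum(data.collect())
    return total


if __name__ == "__main__":
    plain, cached = CountingSource([1.0, 2.0, 3.0]), CountingSource([1.0, 2.0, 3.0])
    iterative_job(Dataset(plain.read), passes=5)
    iterative_job(Dataset(cached.read).cache(), passes=5)
    print(plain.reads, cached.reads)  # 5 1
```

Both runs compute the same result; the difference is that the cached run touches storage once instead of once per pass, which is where iterative algorithms such as machine learning training loops gain the most.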

Storm

Developed by Twitter
Stream processing
Guarantees that every message is processed (at-least-once)
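Storm's processing guarantee relies on tracking the tree of tuples that each spout tuple spawns: every tuple ID is XORed into a per-tree checksum once when it is emitted and once when it is acked, so the checksum returns to zero only when every tuple in the tree has been acked, and the spout replays the message if that never happens. The sketch below is a single-process model of that bookkeeping under those assumptions. `AckTracker`, `process_message`, `deliver_at_least_once` and the three-stage topology are hypothetical, not Storm's Java API.

```python
import random
from typing import Dict, List, Optional


class AckTracker:
    """Toy model of Storm's acker: one XOR checksum per spout (root) tuple."""

    def __init__(self) -> None:
        self.checksums: Dict[int, int] = {}

    def emitted(self, root: int, tuple_id: int) -> None:
        # XOR the new tuple's ID into its tree's checksum.
        self.checksums[root] = self.checksums.get(root, 0) ^ tuple_id

    def acked(self, root: int, tuple_id: int) -> None:
        # XOR the same ID again; the two XORs cancel out.
        self.checksums[root] = self.checksums.get(root, 0) ^ tuple_id

    def is_complete(self, root: int) -> bool:
        return root in self.checksums and self.checksums[root] == 0


def process_message(words: List[str], fail_on: Optional[str] = None) -> bool:
    """Hypothetical topology: spout -> split bolt (one tuple per word) -> sink bolt."""
    tracker = AckTracker()
    root = random.getrandbits(64)        # random 64-bit IDs make an accidental zero unlikely
    tracker.emitted(root, root)          # spout emits the root tuple
    children = [random.getrandbits(64) for _ in words]
    for child in children:               # split bolt emits anchored child tuples,
        tracker.emitted(root, child)
    tracker.acked(root, root)            # then acks its own input tuple
    for word, child in zip(words, children):
        if word != fail_on:              # sink acks every child except a failed one
            tracker.acked(root, child)
    return tracker.is_complete(root)


def deliver_at_least_once(words: List[str], failed_attempts: int) -> int:
    """Replay the message until its tuple tree completes; return the attempt count."""
    attempt = 0
    while True:
        attempt += 1
        fail_on = words[0] if attempt <= failed_attempts else None
        if process_message(words, fail_on):
            return attempt


if __name__ == "__main__":
    print(deliver_at_least_once(["storm", "guarantees", "processing"], failed_attempts=2))
```

Because a failed tree is replayed from the spout, tuples that had already succeeded are processed again; the guarantee is at-least-once, not exactly-once.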

BashReduce

Implements MapReduce in Bash, using standard Unix commands such as sort and grep
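The general pattern behind running a job with plain Unix tools is: split the input among machines, let each one filter (grep) and sort its own chunk, then merge the sorted chunks. The sketch below simulates that in one Python process. The round-robin partitioning and the function names are assumptions for illustration, not a description of BashReduce's actual script.

```python
import heapq
import re
from typing import List


def partition(lines: List[str], num_workers: int) -> List[List[str]]:
    # Round-robin split, standing in for shipping a share of the input to each host.
    return [lines[i::num_workers] for i in range(num_workers)]


def grep_and_sort(lines: List[str], pattern: str, num_workers: int = 3) -> List[str]:
    # Per-worker step: keep lines matching the regex (like grep), then sort them (like sort).
    local_results = [
        sorted(line for line in chunk if re.search(pattern, line))
        for chunk in partition(lines, num_workers)
    ]
    # Final step: k-way merge of the already-sorted outputs, the job `sort -m` does for files.
    return list(heapq.merge(*local_results))


if __name__ == "__main__":
    words = ["pear", "apple", "fig", "banana", "apple cider", "date", "grape"]
    print(grep_and_sort(words, r"a"))
    # ['apple', 'apple cider', 'banana', 'date', 'grape', 'pear']
```

Merging pre-sorted chunks is cheaper than re-sorting everything at the end, which is why the expensive sort work can be pushed out to the workers.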

Disco

Developed at Nokia
Backend is written in Erlang
Jobs are written in Python

GraphLab

Developed at CMU
For machine learning tasks
Data must fit in main memory
Is not fault tolerant
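GraphLab programs are expressed as update functions that run on each vertex and read the values of neighboring vertices; PageRank is the classic example. The sketch below follows that vertex-centric style with a synchronous loop and keeps the whole graph in ordinary in-memory dictionaries, which is why the data must fit in RAM. The function is hypothetical, not GraphLab's C++ API, and it assumes every vertex has at least one out-link and every link target is also a key in `out_links`.

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iters: int = 1000) -> Dict[str, float]:
    nodes = list(out_links)
    n = len(nodes)
    # Build in-neighbor lists once; each vertex update reads from these.
    in_links: Dict[str, List[str]] = {v: [] for v in nodes}
    for u, targets in out_links.items():
        for v in targets:
            in_links[v].append(u)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iters):
        new_rank = {}
        for v in nodes:
            # Gather: sum the rank each in-neighbor sends along its out-links.
            total = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            # Apply: update this vertex's value.
            new_rank[v] = (1 - damping) / n + damping * total
        delta = max(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # stop once no vertex changes noticeably
            break
    return rank


if __name__ == "__main__":
    ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
    for vertex, score in sorted(ranks.items()):
        print(vertex, round(score, 4))
```

A system like GraphLab runs the per-vertex update in parallel and can schedule only the vertices whose neighbors changed, but the arithmetic of each update is the same as in this loop.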

HPCC

Uses Enterprise Control Language (ECL)
Implemented in C++
