Great Option for Unstructured Data
March 28, 2018


Bharadwaj (Brad) Chivukula | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User

Modules Used

  • Hadoop Common
  • Hadoop Distributed File System
  • Hadoop MapReduce

Overall Satisfaction with Hadoop

  • Used for massive data collection, storage, and analytics
  • Used for MapReduce processes, Hive tables, Spark job input, and data backups
  • Used for storing retail catalog and session data, enabling an omnichannel customer experience and 360-degree customer insight
  • Provides a consistent data store that can be integrated with other platforms, giving us one single source of truth

Pros

  • HDFS is reliable and solid; in my experience, there have been very few problems using it
  • Enterprise support from multiple vendors makes it easier to 'sell' inside an enterprise
  • High scalability and redundancy
  • Horizontal scaling and distributed architecture

Cons

  • Weaker organizational support: bugs need to be fixed, and outside help takes a long time to push updates
  • Not for small data sets
  • Data security needs to be strengthened
  • The NameNode has no replication, so recovering from a NameNode failure takes a lot of time
  • With so many Hadoop projects, community focus is divided, which slows down some bug fixes
  • Requires a mindset change among business partners
  • Adopting Hadoop/MapReduce involves a learning curve
  • For real-time streaming, use Spark, which works in stark contrast to the way MapReduce (MR) does
  • Use Hive for querying
  • Less appropriate for small data sets
  • Works well for scenarios with bulk volumes of data; teams with offline applications can confidently go for the Hadoop file system
  • It is not an instant-query tool like SQL, so use it only if your application can wait while the data is crunched
  • Not for real-time applications
