StreamSets: A Powerful Data Engineering + DataOps Tool
We use StreamSets heavily not only for our batch use cases but also for real-time use cases, such as consuming from a Kafka topic and streaming the data to Azure Event Hub.
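StreamSets wires this Kafka-to-Event-Hub flow together on its canvas; outside the tool, the same relay might be sketched roughly as below. This is a minimal sketch assuming the third-party `kafka-python` and `azure-eventhub` packages; the topic, connection string, and hub name are placeholders, not values from our setup.

```python
import json

def to_payload(record: dict) -> bytes:
    """Serialize one record's value as JSON bytes for Event Hub."""
    return json.dumps(record, sort_keys=True).encode("utf-8")

def relay(bootstrap_servers: str, topic: str, conn_str: str, hub_name: str) -> None:
    # Imports kept local so the pure helper above stays dependency-free.
    from kafka import KafkaConsumer                                # pip install kafka-python
    from azure.eventhub import EventHubProducerClient, EventData  # pip install azure-eventhub

    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers)
    producer = EventHubProducerClient.from_connection_string(conn_str, eventhub_name=hub_name)
    with producer:
        for msg in consumer:
            # One event per batch keeps the sketch simple; a real relay
            # would accumulate events until the batch is full.
            batch = producer.create_batch()
            batch.add(EventData(msg.value))
            producer.send_batch(batch)
```

In StreamSets the consumer, serialization, and producer above are each a configurable stage, which is what makes the canvas approach attractive.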
Pros
- An easy-to-use canvas for creating data engineering pipelines.
- A wide range of available Stages, i.e. sources, processors, executors, and destinations.
- Supports both Batch and Streaming Pipelines.
- Scheduling is way easier than cron.
- Integration with Key-Vaults for Secrets Fetching.
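To illustrate the scheduling point above: without the built-in scheduler, a nightly ingestion run would need a crontab entry along these lines (the script path and log path are hypothetical, for illustration only):

```
# Hypothetical crontab entry for a nightly 2 AM ingestion run; in StreamSets
# the same schedule is configured from the job scheduler UI instead.
0 2 * * * /opt/pipelines/run_ingest.sh >> /var/log/ingest.log 2>&1
```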
Cons
- Monitoring/visualization could be improved considerably (e.g. monitoring a job to see what happened with a data transfer 7 days back).
- The logging mechanism could be simplified (logs can be filtered by level, e.g. "ERROR", "DEBUG", "ALL", but it still takes time to get familiar enough to understand them).
- Limited auto-scaling for heavy transfer loads (moving >5 million records from JDBC to an ADLS destination in Avro format takes a long time).
- No concept of global variables shared across pipelines; this is missing.
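One common mitigation for the slow >5-million-record transfer noted above is to page the JDBC read in key-ordered chunks so no single query materializes the whole table. The sketch below is hypothetical: `sqlite3` stands in for the real source database, and the table and column names are made up for the example.

```python
import sqlite3

def fetch_in_chunks(conn, table: str, key: str, chunk_size: int):
    """Yield rows in key-ordered chunks (keyset pagination)."""
    last = None
    while True:
        if last is None:
            cur = conn.execute(
                f"SELECT * FROM {table} ORDER BY {key} LIMIT ?", (chunk_size,))
        else:
            cur = conn.execute(
                f"SELECT * FROM {table} WHERE {key} > ? ORDER BY {key} LIMIT ?",
                (last, chunk_size))
        rows = cur.fetchall()
        if not rows:
            return
        yield rows
        last = rows[-1][0]  # assumes the key is the first selected column

# Tiny demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 11)])
chunks = list(fetch_in_chunks(conn, "events", "id", 4))
# 10 rows read as batches of 4, 4, and 2
```

Each yielded chunk could then be written out as its own Avro file, which also parallelizes naturally across workers.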
- Simplified and improved our overall data ingestion and integration process.
- Support for various heterogeneous source systems such as RDBMS, Kafka, Salesforce, and Key Vault.
- A secure, easy-to-launch integration tool.
- Cloudera Distribution Hadoop (CDH)