TrustRadius
StreamSets

Overview

What is StreamSets?

StreamSets in San Francisco offers their DataOps Platform, a subscription based streaming analytics platform including StreamSets Data Collector data source management, Control Hub for data movement architecture management, StreamSets Data Collector Edge IoT manager, DataFlow Performance Manager (DPM), and…

Pricing

N/A
Unavailable

Entry-level set up fee?

  • No setup fee

Offerings

  • Free Trial
  • Free/Freemium Version
  • Premium Consulting/Integration Services

Alternatives Pricing

What is Striim?

Striim is an enterprise-grade platform that offers continuous real-time data ingestion, high-speed in-flight stream processing, and sub-second delivery of data to cloud and on-premises endpoints.

What is Cloudera Data Platform?

Cloudera Data Platform (CDP), launched September 2019, is designed to combine the best of Hortonworks and Cloudera technologies to deliver an enterprise data cloud. CDP includes the Cloudera Data Warehouse and machine learning services as well as a Data Hub service for building custom business…

Product Details

StreamSets Technical Details

Operating Systems: Unspecified
Mobile Application: No

Reviews From Top Reviewers

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Users have found StreamSets to be a versatile and user-friendly platform that solves a variety of data integration challenges. One key use case is the ability to develop on-premises and deploy to the cloud, which helps users control their cloud budget. The platform has also been praised for its seamless integration with Apache Kafka and Apache NiFi, simplifying the process of connecting these tools with a data lake.

StreamSets has proven valuable for real-time data consumption, filtering, tagging, and monitoring of systems, as well as anomaly detection based on traffic patterns. Users have used the platform for data movement, migration, and ingestion, reducing downtime and simplifying the process. StreamSets has also been widely used for data extraction from various source systems, including IoT devices, enabling users to gain insights from previously inaccessible data sources.

Users appreciate the tool's ability to handle different data formats and to save time compared to hand-coded ETL. It has been used effectively for big data ETL problems, offering fast transfer, support for various sources and destinations, and prompt support. StreamSets has also been used in AI/ML tasks such as building transformations for knowledge graphs.

Overall, StreamSets has proven reliable and efficient in handling data ingestion from various sources, meeting the needs of users across industries and providing flexibility in designing pipelines with minimal coding.

Users have made several recommendations for StreamSets based on their experiences.

Firstly, they suggest trying out the data collector, as it is free to download and install. This allows users to explore the capabilities of the tool without any financial commitment.

Secondly, users recommend using Docker for local testing and deployment in a development environment. This suggestion helps streamline the process and ensure smooth integration with other systems.

Lastly, users praise StreamSets as one of the best ETL/ELT tools for data ingestion. They mention its ability to handle large volumes of data efficiently. Additionally, users appreciate the high level of customization offered by StreamSets, allowing them to tailor it to meet their specific enterprise needs. They also commend the support team for their dedication in tweaking the software for missing components.

To optimize performance, users advise analyzing data transfer requirements carefully and configuring the data conversion nodes appropriately. They emphasize the need for sufficient memory to support these requirements.
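As a rough illustration of the sizing advice above, the following Python sketch estimates how many batches a bulk transfer needs and how much memory one in-flight batch might occupy. The function name, the parameters, and the overhead factor are hypothetical and are not StreamSets settings; the record count and record size are made-up figures.

```python
import math


def estimate_transfer(total_records, avg_record_bytes, batch_size, overhead_factor=2.0):
    """Back-of-envelope batch count and per-batch memory for a bulk transfer.

    overhead_factor is an assumed multiplier for in-memory object overhead;
    it is not a measured or vendor-documented value.
    """
    if total_records < 0 or avg_record_bytes <= 0 or batch_size <= 0:
        raise ValueError("record count must be >= 0; sizes must be positive")
    # Number of batches needed to move every record (last batch may be partial).
    batches = math.ceil(total_records / batch_size)
    # Approximate memory held by one full batch, in MiB.
    batch_mib = batch_size * avg_record_bytes * overhead_factor / (1024 * 1024)
    return {"batches": batches, "batch_mib": batch_mib}


# Hypothetical workload: 5 million records of about 512 bytes, 10,000 per batch.
plan = estimate_transfer(5_000_000, 512, 10_000)
print(plan)
```

An estimate like this only bounds the memory for one batch in one pipeline; real requirements also depend on the stages in the pipeline and on how many pipelines run concurrently.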

Overall, these recommendations highlight StreamSets' value as a versatile tool for fast-paced Data Engineering pipeline development and reliable data ingestion, especially when dealing with large amounts of data.

StreamSets: A Powerful Data Engineering + DataOps Tool

Rating: 9 out of 10
May 06, 2022
AK
Vetted Review
Verified User
StreamSets DataOps Platform
4 years of experience
As part of a healthcare service provider account, our data engineering team used StreamSets to design data pipelines that hydrate/load on-prem data (from various RDBMS sources) to the cloud, i.e., Azure and GCP. These datasets are then used by data scientists and analysts to find patterns and insights for customers' healthcare benefits.

We use StreamSets heavily not only for our batch use cases but also for real-time use cases, such as consuming from a Kafka topic and streaming data to Azure Event Hub.
Pros
  • An easy-to-use canvas for creating data engineering pipelines.
  • A wide range of available stages, i.e., Sources, Processors, Executors, and Destinations.
  • Supports both batch and streaming pipelines.
  • Scheduling is much easier than with cron.
  • Integration with key vaults for fetching secrets.
Cons
  • Monitoring/visualization could be improved and enhanced a lot (e.g., monitoring a job to see what happened with a data transfer 7 days ago).
  • The logging mechanism could be simplified (logs can be filtered by "ERROR", "DEBUG", "ALL", etc., but it still takes some time to become familiar with them).
  • Auto-scaling for heavy-load transfers is lacking (transferring more than 5 million records from JDBC to an ADLS destination as Avro files takes a long time).
  • There is no concept of global variables; there should be one.
We design StreamSets pipelines for nearly all of our batch and streaming scenarios. A few well-suited, tried-and-tested use cases:
1. JDBC to ADLS data transfer based on source refresh frequency.
2. Kafka to GCS.
3. Kafka to Azure Event Hub.
4. HDFS to ADLS data transfer.
5. Schema generation to generate Avro.
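
To make the last use case concrete, here is a minimal Python sketch of what generating an Avro schema from a sample record can look like. It is an illustration only and does not reproduce how StreamSets generates schemas; the function name, the type mapping, the nullable-string fallback, and the sample record are all assumptions.

```python
import json

# Assumed mapping from Python types to Avro primitive type names.
_AVRO_TYPES = {bool: "boolean", int: "long", float: "double", str: "string", bytes: "bytes"}


def infer_avro_schema(record, name="Record"):
    """Build an Avro record schema (as a dict) from one flat sample record."""
    fields = []
    for key, value in record.items():
        if value is None:
            # A null sample carries no type information; assume a nullable string.
            avro_type = ["null", "string"]
        elif type(value) in _AVRO_TYPES:
            # Exact type lookup, so True/False map to "boolean" rather than "long".
            avro_type = _AVRO_TYPES[type(value)]
        else:
            raise TypeError(f"unsupported type for field {key!r}: {type(value).__name__}")
        fields.append({"name": key, "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}


# Hypothetical sample row, e.g. one record read from a JDBC source.
sample = {"id": 42, "source": "orders", "amount": 19.5, "active": True, "note": None}
schema = infer_avro_schema(sample, name="SampleRow")
print(json.dumps(schema, indent=2))
```

Inferring from a single record is fragile (a null or an integer-valued float can mislead it), so a real pipeline would typically derive the schema from source metadata or from several records.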

The easy-to-design canvas, job scheduling, fragment creation and reuse, and the wide range of built-in stages make it an even more favorable tool for me for designing data engineering pipelines.
Streaming Analytics (5): 9.0 (90%)
  • Visualization Dashboards: 7.0 (70%)
  • Low Latency: 8.0 (80%)
  • Integrated Development Tools: 10.0 (100%)
  • Data wrangling and preparation: 10.0 (100%)
  • Data Enrichment: 10.0 (100%)
  • Simplified and improved the overall data ingestion and integration process.
  • Support for various heterogeneous source systems such as RDBMS, Kafka, Salesforce, and Key Vault.
  • Secure, easy-to-launch integration tool.
  • Cloudera Distribution Hadoop (CDH)
StreamSets is a one-stop solution for designing data engineering pipelines and doesn't require deep programming knowledge. It's so user-friendly that anyone on the team can contribute to pipeline design. In Hadoop, one has to be proficient in programming to use its various components such as Hive, HDFS, and Kafka, but in StreamSets all of these stages are built in and ready to use with minor configuration.