Overview
What is Cassandra?
Cassandra is a NoSQL database from Apache.
Popular Features
- Availability: 8.8 (5 ratings), 88%
- Performance: 8.5 (5 ratings), 85%
- Security: 8.0 (5 ratings), 80%
- Concurrency: 7.6 (5 ratings), 76%
Pricing
Entry-level set up fee?
- No setup fee
Offerings
- Free Trial
- Free/Freemium Version
- Premium Consulting/Integration Services
Alternatives Pricing
What is MongoDB?
MongoDB is an open source document-oriented database system. It is part of the NoSQL family of database systems. Instead of storing data in tables as is done in a "classical" relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format…
What is HCL Zen Edge Data Management?
HCL Zen Edge Data Management (formerly Actian Zen) is a NoSQL and SQL (fully ANSI compliant) embedded database that runs on Windows, Linux, Android, iOS, macOS, in VMs and Containers with AES 256-bit encryption. Version footprints range from 5MB (client only) to 50 MB (embedded client-server) to…
Product Demos
Presto and Cassandra: Doing SQL and Joins on Cassandra Tables
CassandraDB Connector Demo | CassandraDB Integration
Open Source BI Tools and Cassandra
Spark and Cassandra: Doing SQL and Joins on Cassandra Tables
Real-time IoT data analytics and visualization with Kaa, Apache Cassandra, and Apache Zeppelin
Features
NoSQL Databases
NoSQL databases are designed to be used across large distributed systems. They are notably more scalable and much faster at handling very large data loads than traditional relational databases.
- Performance: 8.5 (5 ratings)
How fast the database performs under data load
- Availability: 8.8 (5 ratings)
Availability is the probability that the NoSQL database will be available to perform its function when called upon.
- Concurrency: 7.6 (5 ratings)
Concurrency is the ability for multiple processes to access or change shared data simultaneously. The greater the number of concurrent user processes that can execute without blocking each other, the greater the concurrency of the database system.
- Security: 8 (5 ratings)
Security features include authentication against external security mechanisms like LDAP and Windows Active Directory, as well as authorization or privilege management. Some NoSQL databases also support encryption.
- Scalability: 9.5 (5 ratings)
NoSQL databases are inherently more scalable than relational databases and have built-in support for replication and partitioning of data to support scalability.
- Data model flexibility: 6.7 (5 ratings)
NoSQL databases do not rely on tables, columns, rows, or schemas to organize and retrieve data, but use more flexible data models to accommodate the large volume and variety of data being generated by modern applications.
- Deployment model flexibility: 7 (5 ratings)
Can be deployed on-premise or in the cloud.
Product Details
Cassandra Technical Details
Attribute | Value |
---|---|
Operating Systems | Unspecified |
Mobile Application | No |
Reviews and Ratings (93)
Community Insights
- Business Problems Solved
- Pros
- Cons
Apache Cassandra has gained extensive popularity and usage across various critical use cases and platform solutions in many organizations. Users have found it particularly useful in the tax domain, small businesses, profile platforms, and AB testing platforms. Algorithmic Ads, for example, relies solely on Cassandra for both real-time transactions and analytics.
In terms of implementation, a lightweight Java application serves as the primary means of accessing Cassandra, providing a RESTful web services API for seamless integration with other applications. This API is used internally as well as by customers, making it a central point for integration that includes business logic and data. The outstanding performance, linear scalability, and continuous availability of Cassandra make it a preferred choice among developers when a highly available NoSQL database is required.
Furthermore, Cassandra has proven its capabilities in multiple scenarios. It currently supports an enterprise eCommerce platform, offering excellent performance and acting as a powerful NoSQL database. Additionally, it has been employed to build a fully functional proof of concept for a shipment cloud concept at FedEx. By combining in-memory and NoSQL storage solutions, Cassandra enables a unified RESTful service that handles queries for the latest or historical shipment status. Moreover, users have found that Cassandra serves as a reliable backup for the IMDG component in case of a complete crash.
Cassandra's versatility extends to other domains as well. It effectively handles non-standard RDBMS data by providing fast write speeds and suitability for storing flat data. Many organizations leverage its cluster configuration to store personalization data for customers, ensuring up-to-date information with low latency. Cassandra also plays a crucial role in storing data in JSON format, allowing for efficient data storage and retrieval.
Moreover, Cassandra seamlessly integrates with various systems to provide distributed system logic. For instance, it is a core component of the HyperStore S3-compatible object storage system and collaborates with other Java servers to create scalable and fault-tolerant architectures.
Additionally, reviewers report that Cassandra has proven its efficiency in academic projects related to cloud computing and Salesforce, outperforming traditional RDBMS solutions. One reviewer who simulated real-time apps modeled on Facebook and Uber found in JMeter tests that Cassandra outperformed an RDBMS.
Although users have encountered challenges with the documentation, they still highly recommend using Cassandra for its scalability and faster request processing. Overall, Cassandra is a valuable asset for geographically dispersed architectures, offering availability, consistency, data distribution across multiple machines, and expandability on demand.
Greatest community and adoption: Reviewers describe this Java-based NoSQL database as having the greatest community and adoption. Many users have found it to be a highly popular choice among developers, who benefit from the extensive support and resources available.
Excellent integration with Apache Hadoop, Apache Spark, and Solr: Reviewers have consistently praised the database for its excellent integration capabilities with Apache Hadoop, Apache Spark, and Solr. This seamless integration provides a robust ecosystem of tools that enable efficient unit tests and stress testing.
Best-in-class performance across various workloads: Users have consistently highlighted the exceptional performance of this database across various read/write/mixed workloads. Its ability to provide low latency and high throughput has been widely appreciated by customers who require fast data retrieval and processing.
Missing Features: Some users have expressed that Apache Cassandra lacks certain functionalities, such as security and advanced tools like OpsCenter. They believe these features should be included in the open source version.
Challenging Data Modeling: Users with a background in relational databases may find it challenging to understand and work with NoSQL databases like Cassandra. They mention that data modeling needs to revolve around queries rather than the data structure.
Operational Challenges: Managing a large Cassandra cluster, even with the DataStax Enterprise Version, can pose challenges for maintenance teams due to frequent version upgrades and auto-repair. Users express the need for improved operational tools and continued enhancements to handle large clusters and massive amounts of data effectively.
Reviews (1-16 of 16)
review of cassandra
- Masterless
- Schema-less
- Multiple datacenter usage w/ little or no data loss
- Rebuild/repair of objects (tables) in the keyspaces; allow keyspaces to be excluded from repair.
- Monitoring tool: OpsCenter support for Cassandra 3.x (or some other open source tool)
- Browser-type UI to view data (rather than cqlsh)
One of the Best NoSQL Databases!
- Continuous data availability is extremely powerful feature of Cassandra.
- Overall cost effective and low maintenance database platform.
- High performance and low tolerance NoSQL database.
- Moving data between Cassandra and relational database platforms could be improved.
- Database event logging can be handled more efficiently.
Cassandra at scale
- Availability
- Fast performance
- Horizontal scalability
- Memory first
- Partition based
- Dealing with tombstones
- Maintenance/upgrade
- Compaction and repair
Cassandra: A highly available and scalable database
- Cassandra is a masterless design, hence massively scalable. It is great for applications and use cases that cannot afford to lose data. There is no single point of failure.
- You can add more nodes to Cassandra to linearly increase your transactions/requests. Also, it has great support across cloud regions and data centers.
- Cassandra provides features like tunable consistency, data compression, and CQL (Cassandra Query Language), all of which we use.
- The underlying storage model of Cassandra is a key-value store. So when you model your data, it is based on how you want to query it and not on how the data is structured. This results in repetition of data when storing. There is also no referential integrity, and there is no concept of JOINs in Cassandra.
- Data aggregation functions like SUM, MIN, MAX, AVG, and others are very costly, if possible at all. Hence ad-hoc querying or analysis is difficult.
You can use it where you want to store log or user-behavior types of data. You can use it for heavy-write or time-series data storage. It is good in retail applications for fast product catalog inputs and lookups.
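The query-first modeling this review describes can be sketched without a cluster. The snippet below is a minimal in-memory illustration in Python, not Cassandra code: two dictionaries stand in for two hypothetical tables (`orders_by_customer` and `orders_by_product`), each keyed by the partition key that one query needs. All names and sample rows are assumptions made for illustration.

```python
from collections import defaultdict

# Each "table" is keyed by the partition key of the one query it serves.
# Table and field names are hypothetical, chosen only for this sketch.
orders_by_customer = defaultdict(list)
orders_by_product = defaultdict(list)


def record_order(order_id, customer, product, amount):
    """Write the same order twice, once per query pattern (no JOINs later)."""
    row = {"order_id": order_id, "customer": customer,
           "product": product, "amount": amount}
    orders_by_customer[customer].append(row)
    orders_by_product[product].append(row)


def orders_for_customer(customer):
    """Single-key lookup: answered from one partition, no cross-table work."""
    return orders_by_customer.get(customer, [])


def orders_for_product(product):
    """Same data, reachable by a different key because it was stored twice."""
    return orders_by_product.get(product, [])


record_order(1, "alice", "book", 12.5)
record_order(2, "alice", "lamp", 30.0)
record_order(3, "bob", "book", 12.5)

print([o["order_id"] for o in orders_for_customer("alice")])
print([o["order_id"] for o in orders_for_product("book")])
```

The duplication is the trade-off the reviewer mentions: every write goes to each query-specific structure, and in exchange each read is a single-key lookup.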
Cassandra - a tunable NoSQL datastore
- Write speed. Cassandra is very fast while writing data due to its unique architecture.
- Tunable consistency: the consistency level can be tuned for a particular data set so that it stays available during an outage.
- CQL: the Cassandra Query Language resembles a subset of SQL and eases the transition from a more traditional database.
- Aggregation functions are not very efficient.
- Ad-hoc queries do not perform well. Only queries that were envisioned while designing the database perform well.
- Performance is unpredictable.
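The arithmetic behind tunable consistency can be shown in a few lines. This is a minimal sketch in plain Python, not driver code: it assumes a single data center with replication factor `rf` and uses the common rule that a read is guaranteed to overlap the latest write when read replicas plus write replicas exceed the replication factor. The function names are hypothetical.

```python
def quorum(replication_factor):
    """Replicas needed for a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Map a consistency level name to the number of replicas that must answer."""
    table = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return table[level]


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


def tolerated_failures(level, replication_factor):
    """How many replicas can be down while the operation can still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


rf = 3
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level,
          read_sees_latest_write(write_level, read_level, rf),
          tolerated_failures(write_level, rf))
```

With three replicas, writing and reading at `ONE` tolerates two failed replicas but gives no overlap guarantee, while `QUORUM` on both sides guarantees overlap and still tolerates one failed replica. That is the availability-versus-consistency dial the reviewers refer to.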
Pretty good software
- Runs on commodity hardware
- Built-in fault tolerance
- Can grow horizontally
- It is a bit difficult for people that come from the SQL world.
- Managing anti-entropy repair is still a bit of a challenge.
- Better security patches.
Cassandra, put into the real business context
- Cassandra is very strong at storing time-series transaction data. Simply by reversing the time-series order when creating the table, we can very quickly fetch the "latest" records even from millions of associated transactions, because the latest record is always at the top of the search. Combined with the TTL feature of Cassandra columns, it is easy to "auto" delete old data.
- Cassandra combines the key-value store from Amazon's Dynamo with the column family data model from Google's Bigtable, which makes it easy to manage both structured and unstructured data models efficiently.
- The Solr integration provided by the DataStax Enterprise version can even cover some ad-hoc query needs that may not have been fully taken into account at the beginning of the project when the tables were created. This adds considerable room to maneuver for a large enterprise or project that requires some flexibility in practice.
- The linear scalability provided by Cassandra allows us to easily scale the cluster up or down by simply adding or removing servers.
- Both read and write throughput of Cassandra are quite good.
- Managing a big Cassandra cluster, even with the DataStax Enterprise version, is still quite challenging for a maintenance team, considering the frequent version upgrades (even in a rolling fashion) and even more frequent auto-repair. In this area, a powerful tool should be provided to "automate" the process as much as possible.
- The TTL design is good. The pain is that a TTL set on data already inserted cannot simply be updated; the data has to be reinserted. This causes a lot of issues when a change in business strategy requires the purge strategy to be updated as well.
- As Cassandra is Java based, the GC sometimes eats into performance. If Cassandra allowed more use of non-heap memory, that would reduce the GC effort and free up more of the hardware's power.
- A default indexing strategy for JSON-formatted data is not available in DataStax's Solr integration. At the moment we have to implement our own to support the JSON text we store: we extract the key fields that might need to be searched ad hoc, convert them into JSON format (a one-level map only), and save them into a Cassandra column. On top of that, we want Solr to index the key of each token.
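The time-series pattern from this review (rows kept newest-first, each with a TTL fixed at write time) can be illustrated with a small in-memory model. This is a sketch in plain Python, not Cassandra code. The class `TimeSeriesPartition` and its methods are hypothetical, and for simplicity it assumes each row is written at its own event timestamp, so the expiry is `timestamp + ttl_seconds`.

```python
import bisect


class TimeSeriesPartition:
    """One partition whose rows are kept newest-first, each with its own TTL."""

    def __init__(self):
        self._keys = []   # negated timestamps, ascending, so newest comes first
        self._rows = []   # (timestamp, value, expires_at), parallel to _keys

    def insert(self, timestamp, value, ttl_seconds):
        # The expiry is fixed at write time; changing it means re-inserting.
        idx = bisect.bisect_left(self._keys, -timestamp)
        self._keys.insert(idx, -timestamp)
        self._rows.insert(idx, (timestamp, value, timestamp + ttl_seconds))

    def latest(self, now, limit=1):
        """Return up to `limit` newest rows that have not expired at `now`."""
        result = []
        for timestamp, value, expires_at in self._rows:
            if expires_at > now:
                result.append((timestamp, value))
                if len(result) == limit:
                    break
        return result


shipment = TimeSeriesPartition()
shipment.insert(100, "picked up", ttl_seconds=150)
shipment.insert(300, "delivered", ttl_seconds=150)
shipment.insert(200, "in transit", ttl_seconds=150)

print(shipment.latest(now=310, limit=2))
print(shipment.latest(now=360, limit=2))
```

Because rows are already ordered newest-first, fetching the latest status stops after the first few rows instead of scanning the whole history. The second call shows older rows dropping out once their TTL has passed.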
Cassandra Usage and Needs
- Cassandra has a lot of APIs readily available for map-reduce queries (like materialized queries).
- Cassandra uses a ring architecture; there is no master-slave approach (as in HBase). Data published on one node is synced with the other nodes in the ring, whereas HBase has a dedicated master node that orchestrates the data onto its slaves.
- Write Speed
- Multi Data Center Replication
- Tunable Consistency
- Integrates with JVM because it's written in Java
- Cassandra Query Language resembles a subset of SQL (a gentler learning curve)
- No Ad-Hoc Queries: Cassandra's data storage layer is basically a key-value storage system. This means that you must "model" your data around the queries you want to surface, rather than around the structure of the data itself.
- There are no aggregation queries available in Cassandra.
- Not fit for transactional data.
Cassandra as NoSQL fault tolerant database choice
- Cassandra can perform reads/writes very quickly
- Nodes in a ring keep up to date by sharing information with each other
- Cassandra is well suited for scalable applications needing keyspace storage
- Cassandra's query language is clunky, which is likely due to the nature of NoSQL.
- Lacking the ability to relate data between sets makes querying harder, but this again is the nature of NoSQL.
Cassandra, a highly scalable NoSQL DB
- Automatic data sharding between nodes
- High availability
- Python driver support
- Managing Cassandra nodes (adding, removing)
- Need a separate tool to have a console (DataStax OpsCenter)
What makes Cassandra different!!!!
I have simulated a few real-time apps like Facebook and Uber using both an RDBMS and Cassandra, and checked the performance using JMeter. The results clearly show that Cassandra boosts performance over the RDBMS. One thing I find difficult with Cassandra is following the documentation, which is not very understandable.
- Undoubtedly performance is an important reason
- We have not encountered a single point of failure
- Scalability of Cassandra is good which is the most important for the companies where demand is scaling day by day.
- Cassandra has a wide range of asynchronous jobs and background tasks that are not scheduled by the client, so execution can be erratic.
- Because Cassandra is a key-value store, doing things like SUM, MIN, MAX, AVG, and other aggregations is incredibly resource intensive, if even possible to accomplish.
- I think the querying options for retrieving data are very limited.
Well Suited
- Tunable Consistency
- Write Speed
Less Appropriate
- Ad-Hoc Queries
- Unpredictable Performance
Apache Cassandra - Why Would You Look Elsewhere?
- As a Java-based NoSQL database, it has the greatest community and adoption. Coupled with great Apache Hadoop, Apache Spark, and Solr integration and a strong tools ecosystem (unit tests, stress testing), it is an unbeatable combination!
- As a hybrid based on a masterless architecture as in Dynamo and a column family data model as in Bigtable, it hits the bull's-eye!
- It has best in class performance across different kinds of read/write/mixed workloads. It provides linear scalability which works for the best performance, lowest latency and highest throughput.
- Its tunable consistency model lets you have the consistency your platform/application needs.
- If configured correctly, there is no downtime and no data loss. These are key criteria in critical domains.
- Apache Cassandra is lacking some features that DataStax provides in the Enterprise version, for example security and advanced tools like OpsCenter. These would be a great addition to open source Apache Cassandra.
- At times we noticed some versions had issues not known in advance, for example, LostNotificationError on repair of nodes. However steadily the newer releases have become better and more stable.
- Examples for the DataStax native driver with Cassandra 2.1 can be improved, as they do not cover all the scenarios one would need in production.
- If you prefer to work with an open source project and be hands-on, Apache Cassandra is one of the best. However, if you need a managed Cassandra-like service where you do not even want to configure/deploy/backup/restack, a DynamoDB service would be preferable.
- Cassandra is a JVM-based NoSQL database, so garbage collector tuning is a key aspect. Garbage collection is better with JDK 8 and the G1GC garbage collector; otherwise, configure the ConcurrentMarkSweep (CMS) garbage collector in an optimum manner.
Apache Cassandra is a NoSQL database and is well suited where you need high availability, linear scalability, tunable consistency, and high performance across varying workloads. It has worked well for our use cases, and I shared my experiences on using it effectively at the last Cassandra summit! http://bit.ly/1Ok56TK
Although it is a NoSQL database, you can tune it to be strongly consistent and successfully use it as such. However, that is not the usual pattern, as you trade off latency. It works well if you require that. If your use case needs a strongly consistent environment with the semantics of a relational database, or a data warehouse, or NoSQL with ACID transactions, Apache Cassandra may not be the optimum choice.
Cassandra, hands-on review, after 4 years of serious use
- Continuous availability: as a fully distributed database (no master nodes), we can update nodes with rolling restarts and accommodate minor outages without impacting our customer services.
- Linear scalability: for every unit of compute that you add, you get an equivalent unit of capacity. The same application can scale from a single developer's laptop to a web-scale service with billions of rows in a table.
- Amazing performance: if you design your data model correctly, bearing in mind the queries you need to answer, you can get answers in milliseconds.
- Time-series data: Cassandra excels at recording, processing, and retrieving time-series data. It's a simple matter to version everything and simply record what happens, rather than going back and editing things. Then, you can compute things from the recorded history.
- Cassandra is a poor choice for implementing application queues.
- NoSQL requires thinking differently, and can be challenging for people with strong relational database backgrounds to understand. The CQL language helps with this, but it pays to understand how the engine works under the hood. That said, the benefits outweigh the challenge of the learning curve!
- Database compactions and anti-entropy repair can be burdensome on a busy cluster. Significant improvements have been made in recent versions, but it remains an operational challenge.
Cassandra Rocks !!!
- Cassandra is highly scalable.
- It provides the flexibility to store data in any format. You can add column families dynamically as needed by the application.
- One of the best NoSQL solutions I've used so far.
- A better UI access for reading the data.
- More graphical information to understand how the data is being processed, system uptime/downtime, etc.
- I used Cassandra-cli for running queries, but it is not very helpful when it returns a lot of results. If there were some way to improve the user queries, it would be great.
Cassandra - pretty good if you know what you are doing
It serves as the storage layer in our homegrown sensor analytics platform, which uses Spark for computation. We use it to store billions of samples of wearable sensor data collected in various studies and experiments.
- High Availability - we utilize the data replication features of Cassandra. This enables us to access our data even when several nodes have gone down
- Data Locality - our architecture combines Cassandra storage nodes and computation nodes in the same machine. This enables us to utilize data locality and limit expensive network IO to read data.
- Elasticity - Cassandra is a shared-nothing architecture. Nodes can be added very easily, and they discover the network topology. As soon as a node has joined the Cassandra ring, the data is redistributed among the nodes and the new node's share is streamed to it automatically.
- Cassandra runs on the JVM and therefore may require a lot of GC tuning for read/write intensive applications.
- Requires manual periodic maintenance - for example it is recommended to run a cleanup on a regular basis.
- There are a lot of knobs and buttons to configure the system. For many cases the default configuration will be sufficient, but if it's not, you will need significant ramp-up on the inner workings of Cassandra in order to tune it effectively.
It is well suited for storing immutable data as deletes are extremely inefficient. As such, it is well suited for data archive and deep storage.
It is less appropriate for OLAP, as it has limited aggregation and filtering abilities and no grouping whatsoever.
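The elasticity point in this review (a node joins the ring and only its share of the data moves to it) follows from consistent hashing, which can be sketched in plain Python. This is a deliberately simplified model and not how Cassandra is implemented in detail: it assumes one token per node, uses MD5 purely for a deterministic hash, and ignores replication and virtual nodes. The `Ring` class and the node and key names are hypothetical.

```python
import bisect
import hashlib


def token(name):
    """Deterministic 64-bit token for a node name or a partition key."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Masterless ring: every node owns the token range ending at its token."""

    def __init__(self, nodes=()):
        self._tokens = []   # sorted node tokens
        self._owners = {}   # token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def owner(self, key):
        """First node clockwise from the key's token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token(key))
        return self._owners[self._tokens[idx % len(self._tokens)]]


keys = [f"sensor-{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that changed owner now belongs to the new node; nothing else moved.
print(len(moved), all(after[k] == "node-d" for k in moved))
```

In this model, adding `node-d` only takes over part of one neighbor's range, so the rest of the ring keeps its data in place. No node acts as a coordinator for placement, since any node can compute an owner from the key alone.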
- Performant. In particular, write performance is very good. Recently, a lot of work to address the changing systems environment has been done to take advantage of areas like SSDs and very dense storage systems.
- Distributed system logic. Multiple data centers and other common network configurations like heterogeneous nodes are handled and exploited well.
- Community. Strong community with users and project contributors worldwide. The open-source and commercial software people work well together with sharing of lessons learned and improvements based on feedback.
- Operational tools. Would like to see continued work to improve the operational capability for large clusters and large amounts of data. For example, analyzing the on-disk files.
- Repair. Being able to run repair continuously and with greater control to avoid any spikes in resource use.