
This project creates a BigDataX component through which an application client can communicate efficiently with a database cluster built on the Cassandra architecture.
The target database cluster, called "Galaxy", comprises 10 dual-core 20 GB servers with 80 TB of data storage. With BigDataX in place, the system is expected to support 40 million transactions per day with a 30 ms response time.
Data is stored in Galaxy using the cluster attribute (a combination of an attribute and its value) as the row key. Each row key maps to two column families: one for the IDs in the data listing and the other for the fields of the data details. Every field/element is stored at a fixed length, so the corresponding records can be located quickly by offset, avoiding a large number of random read requests and disk seeks.
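As a rough sanity check on these targets, the sketch below converts the daily volume into an average request rate. It assumes load is spread evenly across the day and across all 10 servers, which real traffic rarely is, so peak rates would be higher. The variable names are illustrative only.

```python
# Back-of-the-envelope sizing from the stated targets.
# Assumption: traffic is uniform over 24 hours and over all servers.
TRANSACTIONS_PER_DAY = 40_000_000
SERVERS = 10
RESPONSE_TIME_S = 0.030
SECONDS_PER_DAY = 24 * 60 * 60

avg_tps = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY   # cluster-wide average rate
avg_tps_per_server = avg_tps / SERVERS             # average rate per server
# Little's law: requests in flight = arrival rate * time spent in the system
avg_in_flight = avg_tps * RESPONSE_TIME_S

print(f"{avg_tps:.0f} tx/s cluster-wide, {avg_tps_per_server:.1f} tx/s per server")
print(f"about {avg_in_flight:.1f} requests in flight on average")
```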
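The benefit of fixed-length fields can be illustrated with a minimal sketch: when every record has the same size, the n-th record starts at byte n times the record size, so it can be read directly without scanning. The field layout here (an 8-byte listing ID, a 16-byte name and a 4-byte price) is a hypothetical example, not the actual Galaxy schema.

```python
import struct

# Hypothetical fixed-width layout: 8-byte ID, 16-byte name, 4-byte price.
# ">" selects big-endian standard sizes with no alignment padding.
RECORD = struct.Struct(">Q16sI")

def pack_record(listing_id, name, price):
    # "16s" pads short names with NUL bytes (and truncates longer ones),
    # so every record is exactly RECORD.size bytes long.
    return RECORD.pack(listing_id, name.encode("utf-8"), price)

def read_record(blob, index):
    # Fixed-length records make the byte offset a simple multiplication.
    offset = index * RECORD.size
    listing_id, raw_name, price = RECORD.unpack_from(blob, offset)
    return listing_id, raw_name.rstrip(b"\x00").decode("utf-8"), price

# Five records stored back to back, as they might sit in one column value.
blob = b"".join(pack_record(i, f"item{i}", 100 + i) for i in range(5))
print(read_record(blob, 3))
```

The trade-off is wasted space for short values and a hard cap on field length, in exchange for a single seek per record.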
Cassandra
Cassandra is a product that combines the data model of Google Bigtable with the high availability of Amazon Dynamo. Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.

Each Cassandra server [node] is assigned a unique Token that determines what keys it is the primary replica for. If you sort all nodes' Tokens, the Range of keys each is responsible for is (PreviousToken, MyToken], that is, from the previous token (exclusive) to the node's token (inclusive). The machine with the lowest Token gets both all keys less than that token, and all keys greater than the largest Token; this is called a "wrapping Range."
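The range rule above can be sketched as a lookup over the sorted token list. This is a simplified model, not Cassandra's own code: tokens are small integers here for readability, whereas a real cluster derives them from the partitioner, and the function name `primary_token` is made up for the example.

```python
from bisect import bisect_left

def primary_token(sorted_tokens, key_token):
    """Return the token of the node that is the primary replica for key_token.

    Each node owns the range (previous_token, its_token], so the owner is
    the first node whose token is greater than or equal to the key's token.
    """
    i = bisect_left(sorted_tokens, key_token)
    # Keys above the largest token wrap around to the lowest-token node.
    return sorted_tokens[i % len(sorted_tokens)]

tokens = [10, 50, 90]  # hypothetical three-node ring
for key in (5, 10, 11, 90, 95):
    print(key, "->", primary_token(tokens, key))
```

Keys 5 and 95 both map to the node with token 10, which is the wrapping Range described above.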
Besides datacenters, you can also tell Cassandra which nodes are in the same rack within a datacenter. Cassandra will use this to route both reads and data movement for Range changes to the nearest replicas. This is configured by a user-pluggable EndpointSnitch class in the configuration file.
EndpointSnitch is related to, but distinct from, the replication strategy itself: RackAwareStrategy needs a properly configured Snitch to place replicas correctly, but even absent a Strategy that cares about datacenters, the rest of Cassandra will still be location-sensitive.
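To make the idea of location-sensitive routing concrete, here is a toy model of what a snitch provides: a mapping from node to datacenter and rack, and an ordering of replicas by proximity. It is only an illustration. The topology table, addresses and function names are hypothetical, and this is not Cassandra's EndpointSnitch interface.

```python
# Hypothetical topology: node address -> (datacenter, rack).
TOPOLOGY = {
    "10.0.0.1": ("dc1", "rack1"),
    "10.0.0.2": ("dc1", "rack1"),
    "10.0.0.3": ("dc1", "rack2"),
    "10.0.1.1": ("dc2", "rack1"),
}

def distance(a, b):
    """0 = same node, 1 = same rack, 2 = same datacenter, 3 = other datacenter."""
    if a == b:
        return 0
    dc_a, rack_a = TOPOLOGY[a]
    dc_b, rack_b = TOPOLOGY[b]
    if dc_a != dc_b:
        return 3
    return 1 if rack_a == rack_b else 2

def sort_by_proximity(local, replicas):
    # Nearest replicas first, so reads can be routed to the closest copy.
    return sorted(replicas, key=lambda node: distance(local, node))

print(sort_by_proximity("10.0.0.1", ["10.0.1.1", "10.0.0.3", "10.0.0.2"]))
```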