I am having trouble estimating the disk usage of a secondary index in Cassandra 3.x.
Tyler Hobbs, a Cassandra committer at DataStax, gave the following answer for version 0.7:
Their size will roughly be:
(the cardinality of the set of indexed values * the avg size of the indexed values) + (the number of keys in the indexed column family * the avg size of keys in the column family).
Nodes only index rows that are stored locally -- that is, only rows for which they are a replica.
In the second term, (the number of keys in the indexed column family * the avg size of keys in the column family), am I correct in understanding that the keys are the columns in the Primary Key of the table to be indexed?
Also, is this still valid for version 3.x?
Thank you for your cooperation.
A Cassandra secondary index is an inverted index.
For example, suppose you have the following table:
CREATE TABLE users(
user_id text,
name text,
group text,
PRIMARY KEY(user_id)
);
Create an index on this table:
CREATE INDEX users_by_group ON users(group);
Each Cassandra node then creates an internal table like the following for the portion of the users table data that the node holds:
CREATE TABLE users_by_group(
user_id text,
group text,
PRIMARY KEY((group), user_id)
);
When a SELECT uses the index, the index table is searched first to obtain the primary keys of the matching rows in the original table.
The original table is then searched again using the primary keys that were obtained.
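The two-step lookup described above can be sketched in CQL as follows. This is only a conceptual illustration: the values 'admin' and 'u001' are hypothetical, and the two internal steps are shown as comments because the index table is an internal structure, not something a client is expected to query directly.

```cql
-- Query as issued by the client; the index on "group" allows this filter.
SELECT user_id, name FROM users WHERE group = 'admin';

-- Step 1 (conceptual, performed on each node): look up the index table
-- to collect the primary keys of the matching rows.
-- SELECT user_id FROM users_by_group WHERE group = 'admin';

-- Step 2 (conceptual): read the original table once per primary key
-- returned by step 1, e.g. for a hypothetical key 'u001'.
-- SELECT user_id, name FROM users WHERE user_id = 'u001';
```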
Therefore, to answer your question ("Am I correct in understanding that the keys in 'the number of keys in the indexed column family * the avg size of keys in the column family' are the columns in the Primary Key of the table to be indexed?"):
Yes. As you said, the keys are the columns included in the Primary Key.
This structure has not changed in version 3.x.
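As a rough worked example of the quoted formula applied to the users table: assume, purely for illustration, 100 distinct group values averaging 10 bytes each and 1,000,000 rows whose user_id averages 20 bytes. These figures are hypothetical, not measurements.

```latex
\begin{align*}
\text{index size}
  &\approx (\text{distinct groups} \times \text{avg group size})
   + (\text{number of keys} \times \text{avg key size}) \\
  &= (100 \times 10) + (1{,}000{,}000 \times 20) \\
  &= 1{,}000 + 20{,}000{,}000 \\
  &= 20{,}001{,}000 \text{ bytes} \approx 20 \text{ MB}
\end{align*}
```

This is the total for one copy of the data. Because each node indexes only the rows it stores locally, a single node holds only its share of that total. The formula counts raw key and value bytes, so it does not account for per-row storage overhead or compression.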