Cassandra
Cassandra is a NoSQL, row-oriented, highly scalable and highly available database.
CassandraByteStore needs the cassio package to be installed:
%pip install --upgrade --quiet cassio
The Store takes the following parameters:
- table: The name of the table in which to store the data.
- session: (Optional) The Cassandra driver session. If not provided, the session resolved by cassio is used.
- keyspace: (Optional) The keyspace of the table. If not provided, the keyspace resolved by cassio is used.
- setup_mode: (Optional) The mode used to create the Cassandra table (SYNC, ASYNC or OFF). Defaults to SYNC.
CassandraByteStore
The CassandraByteStore is an implementation of ByteStore that stores the data in your Cassandra instance.
Store keys must be strings and are mapped to the row_id column of the Cassandra table.
Store values are bytes and are mapped to the body_blob column of the Cassandra table.
from langchain_community.storage import CassandraByteStore
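The key and value contract described above can be illustrated without a running cluster. The following is a minimal sketch, not the library's implementation: DictByteStore is a hypothetical in-memory stand-in whose dictionary plays the role of the row_id and body_blob columns. The method names follow the LangChain ByteStore interface, in which mget returns None for keys that are not present. The type checks in mset are a choice made for this sketch only.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


class DictByteStore:
    """Hypothetical in-memory stand-in; NOT part of LangChain or cassio."""

    def __init__(self) -> None:
        # Plays the role of the table: row_id (str) -> body_blob (bytes).
        self._rows: Dict[str, bytes] = {}

    def mset(self, key_value_pairs: Sequence[Tuple[str, bytes]]) -> None:
        for key, value in key_value_pairs:
            # Validation added for this sketch: str keys, bytes values.
            if not isinstance(key, str):
                raise TypeError("keys must be strings")
            if not isinstance(value, bytes):
                raise TypeError("values must be bytes")
            self._rows[key] = value

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Missing keys yield None, in the same position as the requested key.
        return [self._rows.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._rows.pop(key, None)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        for key in self._rows:
            if prefix is None or key.startswith(prefix):
                yield key


demo = DictByteStore()
demo.mset([("k1", b"v1"), ("k2", b"v2")])
found = demo.mget(["k1", "missing"])
demo.mdelete(["k1"])
remaining = sorted(demo.yield_keys())
print(found, remaining)
```

A stand-in of this kind can be handy in unit tests, where the same calls are later pointed at a real CassandraByteStore.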
Init from a cassandra driver Session
You need to create a cassandra.cluster.Session object, as described in the Cassandra driver documentation. The details vary (e.g. with network settings and authentication), but this might be something like:
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
You need to provide the name of an existing keyspace of the Cassandra instance:
CASSANDRA_KEYSPACE = input("CASSANDRA_KEYSPACE = ")
Creating the store:
store = CassandraByteStore(
table="my_store",
session=session,
keyspace=CASSANDRA_KEYSPACE,
)
store.mset([("k1", b"v1"), ("k2", b"v2")])
print(store.mget(["k1", "k2"]))
[b'v1', b'v2']
Init from cassio
It's also possible to use cassio to configure the session and keyspace.
import cassio
cassio.init(contact_points="127.0.0.1", keyspace=CASSANDRA_KEYSPACE)
store = CassandraByteStore(
table="my_store",
)
store.mset([("k1", b"v1"), ("k2", b"v2")])
print(store.mget(["k1", "k2"]))
Usage with CacheBackedEmbeddings
You can use CassandraByteStore together with CacheBackedEmbeddings to cache the results of embedding computations.
from langchain.embeddings import CacheBackedEmbeddings
from langchain_openai import OpenAIEmbeddings
cassio.init(contact_points="127.0.0.1", keyspace=CASSANDRA_KEYSPACE)
store = CassandraByteStore(
table="my_store",
)
embeddings = CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings=OpenAIEmbeddings(), document_embedding_cache=store
)
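To show what caching embeddings in a byte store amounts to, here is a rough, self-contained sketch of the idea. It is not how CacheBackedEmbeddings is implemented: the helper make_cached_embedder, the toy_embed function, the SHA-256 key scheme, and the JSON serialization are all assumptions made for illustration, and a plain dict stands in for the store. The library's actual key hashing and value encoding may differ.

```python
import hashlib
import json
from typing import Callable, Dict, List


def make_cached_embedder(
    embed: Callable[[str], List[float]],
    cache: Dict[str, bytes],
) -> Callable[[List[str]], List[List[float]]]:
    """Wrap `embed` so each text is computed once and then served from `cache`.

    Hypothetical helper for illustration; `cache` stands in for a byte store.
    """

    def embed_documents(texts: List[str]) -> List[List[float]]:
        results = []
        for text in texts:
            # Assumed key scheme: hex SHA-256 digest of the text.
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            blob = cache.get(key)
            if blob is None:
                # Cache miss: compute the vector and store it as bytes.
                blob = json.dumps(embed(text)).encode("utf-8")
                cache[key] = blob
            results.append(json.loads(blob.decode("utf-8")))
        return results

    return embed_documents


calls: List[str] = []


def toy_embed(text: str) -> List[float]:
    # Toy stand-in for a real embedding model; records each invocation.
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]


cache: Dict[str, bytes] = {}
embed_documents = make_cached_embedder(toy_embed, cache)
first = embed_documents(["hello world", "cassandra"])
second = embed_documents(["hello world", "cassandra"])
print(len(calls))
```

The second call returns the same vectors without invoking toy_embed again, which is the saving a persistent store such as CassandraByteStore is meant to provide across processes and restarts.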