Tonbo is an embedded database that lets you query and update Arrow data without running a database server. You can run it locally or against object storage like S3 with the same consistency guarantees.
In this guide, we’ll start by writing some data to local disk, then move on to using S3 as the storage backend.
It’s much shorter than the usual database guide, and along the way you’ll touch on a few ideas from modern database systems, such as query predicates and asynchronous writes.
Prerequisites: ensure you have Rust with Cargo installed. If not, install via rustup: https://www.rust-lang.org/tools/install
Create a new project:
cargo new tonbo-quickstart
cd tonbo-quickstart
Add dependencies:
cargo add [email protected] tokio --features tokio/rt-multi-thread,tokio/macros
cargo add [email protected] --features ext-hooks
If you’ve used ORMs or typed record stores before, this should feel familiar: describe the data as a struct, and mark the fields that form the primary key.
use tonbo::prelude::*;

#[derive(Record)]
struct User {
    #[metadata(k = "tonbo.key", v = "true")]
    id: String,
    name: String,
    score: Option<i64>,
}
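To see what the derive is describing, here is roughly the equivalent Arrow schema spelled out by hand with the arrow crate. This is a sketch: the exact metadata layout Tonbo expects is an assumption on our part, mirroring the #[metadata(...)] attribute above.

use std::collections::HashMap;
use arrow::datatypes::{DataType, Field, Schema};

// A sketch of the Arrow schema the derive plausibly produces. The
// "tonbo.key" metadata entry mirrors the attribute on the id field.
let id = Field::new("id", DataType::Utf8, false)
    .with_metadata(HashMap::from([("tonbo.key".to_string(), "true".to_string())]));
let schema = Schema::new(vec![
    id,
    Field::new("name", DataType::Utf8, false),
    Field::new("score", DataType::Int64, true), // Option<i64> maps to a nullable column
]);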
Mark the primary key in the schema with #[metadata(k = "tonbo.key", v = "true")]. Then create or reuse a directory on disk (here /tmp/tonbo-quickstart) for the database files:
let db = DbBuilder::from_schema(User::schema())?
    .on_disk("/tmp/tonbo-quickstart")?
    .open()
    .await?;
Tonbo ingests data in columnar batches rather than row by row. Batching amortizes per-write overhead and matches Arrow’s columnar layout, which makes it more efficient and better suited to programmatic workflows.
let users = vec![
    User { id: "u1".into(), name: "Alice".into(), score: Some(100) },
    User { id: "u2".into(), name: "Bob".into(), score: Some(85) },
    User { id: "u3".into(), name: "Carol".into(), score: None },
];
Let’s ingest these rows:
let mut builders = User::new_builders(users.len());
builders.append_rows(users);
db.ingest(builders.finish().into_record_batch()).await?;
Tonbo performs ingestion asynchronously, fitting naturally into async and serverless runtimes.
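Because ingestion is async, it composes with the rest of your code. As an illustration, here is a sketch of ingesting two batches concurrently; it assumes ingest takes a shared reference to the handle (if it needs exclusive access, await the calls sequentially instead), and first_half / second_half are hypothetical Vec<User> splits of your data:

// Sketch: build two batches, then await both ingests concurrently.
// Assumes ingest(&self, ...); first_half/second_half are placeholders.
let mut b1 = User::new_builders(first_half.len());
b1.append_rows(first_half);
let mut b2 = User::new_builders(second_half.len());
b2.append_rows(second_half);
tokio::try_join!(
    db.ingest(b1.finish().into_record_batch()),
    db.ingest(b2.finish().into_record_batch()),
)?;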
Tonbo uses expressions rather than SQL to query data, making it easy to push intent down to the execution layer. Here we push a predicate (score > 80) so the engine can avoid scanning rows that don’t match, i.e. the rows whose score is 80 or lower (or missing entirely).
// query definition
let filter = Predicate::gt(ColumnRef::new("score"), ScalarValue::from(80_i64));
let batches = db.scan().filter(filter).collect().await?;
// result consumption
for batch in &batches {
    for user in batch.iter_views::<User>()?.try_flatten()? {
        println!("{} - {} ({:?})", user.id, user.name, user.score);
    }
}
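The same expression style should extend to other comparisons. For instance, a point lookup on the primary key, assuming an eq constructor exists alongside the gt used above (not needed for the rest of this guide):

// Hypothetical eq predicate mirroring Predicate::gt; a lookup on the key
// column lets the engine prune everything but the matching row.
let by_id = Predicate::eq(ColumnRef::new("id"), ScalarValue::from("u2"));
let bob = db.scan().filter(by_id).collect().await?;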
Put the snippets above into src/main.rs, wrapped in an async entry point (sketched just below), and run:
cargo run
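One detail before running: the snippets use .await and ?, so they need an async entry point. A minimal wrapper using the tokio runtime added earlier (Box<dyn Error> is one convenient error type, not the only choice); keep the use line and the User struct above main:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // paste the open, ingest, and scan snippets from above here
    Ok(())
}

With the sample data, the filtered scan should print something like (row order may vary):

u1 - Alice (Some(100))
u2 - Bob (Some(85))

Carol doesn’t appear: her score is None, so she can’t satisfy score > 80.

Finally, the S3 backend promised at the start. The builder method below is an assumption on our part, sketched by analogy with .on_disk; check Tonbo’s documentation for the real method name and how it picks up credentials:

// Hypothetical: swap the local directory for an S3 bucket. The method
// name, its arguments, and the credential handling are assumptions.
let db = DbBuilder::from_schema(User::schema())?
    .on_s3("my-bucket", "tonbo-quickstart/")?
    .open()
    .await?;

Everything else in the guide, ingestion and scans alike, works the same against object storage.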