mirror of
https://source.quilibrium.com/quilibrium/ceremonyclient.git
synced 2025-01-23 14:15:18 +00:00
# Pebble [![Build Status](https://github.com/cockroachdb/pebble/actions/workflows/ci.yaml/badge.svg?branch=master)](https://github.com/cockroachdb/pebble/actions/workflows/ci.yaml) [![GoDoc](https://godoc.org/github.com/cockroachdb/pebble?status.svg)](https://godoc.org/github.com/cockroachdb/pebble) <sup><sub><sub>[Coverage](https://storage.googleapis.com/crl-codecover-public/pebble/index.html)</sub></sub></sup>

#### [Nightly benchmarks](https://cockroachdb.github.io/pebble/)

Pebble is a LevelDB/RocksDB inspired key-value store focused on
performance and internal usage by CockroachDB. Pebble inherits the
RocksDB file formats and a few extensions such as range deletion
tombstones, table-level bloom filters, and updates to the MANIFEST
format.

Pebble intentionally does not aspire to include every feature in RocksDB and
specifically targets the use case and feature set needed by CockroachDB:

* Block-based tables
* Checkpoints
* Indexed batches
* Iterator options (lower/upper bound, table filter)
* Level-based compaction
* Manual compaction
* Merge operator
* Prefix bloom filters
* Prefix iteration
* Range deletion tombstones
* Reverse iteration
* SSTable ingestion
* Single delete
* Snapshots
* Table-level bloom filters

RocksDB has a large number of features that are not implemented in
Pebble:

* Backups
* Column families
* Delete files in range
* FIFO compaction style
* Forward iterator / tailing iterator
* Hash table format
* Memtable bloom filter
* Persistent cache
* Pin iterator key / value
* Plain table format
* SSTable ingest-behind
* Sub-compactions
* Transactions
* Universal compaction style

***WARNING***: Pebble may silently corrupt data or behave incorrectly if
used with a RocksDB database that uses a feature Pebble doesn't
support. Caveat emptor!

## Production Ready

Pebble was introduced as an alternative storage engine to RocksDB in
CockroachDB v20.1 (released May 2020) and was used in production
successfully at that time. Pebble was made the default storage engine
in CockroachDB v20.2 (released Nov 2020). Pebble is being used in
production by users of CockroachDB at scale and is considered stable
and production ready.

## Advantages

Pebble offers several improvements over RocksDB:

* Faster reverse iteration via backwards links in the memtable's
  skiplist.
* Faster commit pipeline that achieves better concurrency.
* Seamless merged iteration of indexed batches. The mutations in the
  batch conceptually occupy another memtable level.
* L0 sublevels and flush splitting for concurrent compactions out of L0 and
  reduced read-amplification during heavy write load.
* Faster LSM edits in LSMs with large numbers of sstables through use of a
  copy-on-write B-tree to hold file metadata.
* Delete-only compactions that drop whole sstables that fall within the bounds
  of a range deletion.
* Block-property collectors and filters that enable iterators to skip tables,
  index blocks and data blocks that are irrelevant, according to user-defined
  properties over key-value pairs.
* Range keys API, allowing KV pairs defined over a range of keyspace with
  user-defined semantics and interleaved during iteration.
* Smaller, more approachable code base.
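
To make the delete-only compaction bullet above more concrete, here is a minimal conceptual sketch. It is not Pebble's implementation: the `tableBounds` and `rangeDel` types and the `droppable` function are hypothetical names invented for illustration, and the sketch ignores sequence numbers and snapshots, which a real engine must also take into account. It only shows the core idea that a table whose key bounds lie entirely inside a range deletion can be dropped without rewriting any data.

```go
package main

import "fmt"

// tableBounds is a hypothetical stand-in for an sstable's metadata:
// its smallest and largest user keys, both inclusive.
type tableBounds struct {
	name              string
	smallest, largest string
}

// rangeDel is a hypothetical range deletion covering [start, end).
type rangeDel struct {
	start, end string
}

// droppable returns the names of the tables that lie entirely inside the
// range deletion. Tables that only partially overlap are not returned,
// because they still hold live keys outside the deleted range.
func droppable(tables []tableBounds, rd rangeDel) []string {
	var out []string
	for _, t := range tables {
		if t.smallest >= rd.start && t.largest < rd.end {
			out = append(out, t.name)
		}
	}
	return out
}

func main() {
	tables := []tableBounds{
		{"000001.sst", "a", "c"},
		{"000002.sst", "d", "f"},
		{"000003.sst", "g", "m"},
	}
	// Only the second table falls completely within [d, h).
	fmt.Println(droppable(tables, rangeDel{start: "d", end: "h"}))
}
```

In this toy model the third table overlaps the deleted range but extends past it, so it would need a regular compaction rather than a whole-file drop.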

See the [Pebble vs RocksDB: Implementation
Differences](docs/rocksdb.md) doc for more details on implementation
differences.

## RocksDB Compatibility

Pebble strives for forward compatibility with RocksDB 6.2.1 (the latest
version of RocksDB used by CockroachDB). Forward compatibility means
that a DB generated by RocksDB can be used by Pebble. Currently, Pebble
provides bidirectional compatibility with RocksDB (a Pebble generated DB
can be used by RocksDB) when using its FormatMostCompatible format. New
functionality that is backwards incompatible is gated behind new format
major versions. In general, Pebble only provides compatibility with the
subset of functionality and configuration used by CockroachDB. The scope
of RocksDB functionality and configuration is too large to adequately
test and document all the incompatibilities. The list below contains
known incompatibilities.

* Pebble's use of WAL recycling is only compatible with RocksDB's
  `kTolerateCorruptedTailRecords` WAL recovery mode. Older versions of
  RocksDB would automatically map incompatible WAL recovery modes to
  `kTolerateCorruptedTailRecords`. New versions of RocksDB will
  disable WAL recycling.
* Column families. Pebble does not support column families, nor does
  it attempt to detect their usage when opening a DB that may contain
  them.
* Hash table format. Pebble does not support the hash table sstable
  format.
* Plain table format. Pebble does not support the plain table sstable
  format.
* SSTable format version 3 and 4. Pebble does not support version 3
  and version 4 format sstables. The sstable format version is
  controlled by the `BlockBasedTableOptions::format_version` option.
  See [#97](https://github.com/cockroachdb/pebble/issues/97).

## Format major versions

Over time Pebble has introduced new physical file formats. Backwards
incompatible changes are made through the introduction of 'format major
versions'. By default, when Pebble opens a database, it uses
`FormatMostCompatible`. This version is bi-directionally compatible with RocksDB
6.2.1 (with the caveats described above).

To opt into new formats, a user may set `FormatMajorVersion` on the
[`Options`](https://pkg.go.dev/github.com/cockroachdb/pebble#Options)
supplied to
[`Open`](https://pkg.go.dev/github.com/cockroachdb/pebble#Open), or
upgrade the format major version at runtime using
[`DB.RatchetFormatMajorVersion`](https://pkg.go.dev/github.com/cockroachdb/pebble#DB.RatchetFormatMajorVersion).
Format major version upgrades are permanent; there is no option to
return to an earlier format.

The table below outlines the history of format major versions:

| Name                               | Value | Migration  |
|------------------------------------|-------|------------|
| FormatMostCompatible               | 1     | No         |
| FormatVersioned                    | 3     | No         |
| FormatSetWithDelete                | 4     | No         |
| FormatBlockPropertyCollector       | 5     | No         |
| FormatSplitUserKeysMarked          | 6     | Background |
| FormatSplitUserKeysMarkedCompacted | 7     | Blocking   |
| FormatRangeKeys                    | 8     | No         |
| FormatMinTableFormatPebblev1       | 9     | No         |
| FormatPrePebblev1Marked            | 10    | Background |
| FormatSSTableValueBlocks           | 12    | No         |
| FormatFlushableIngest              | 13    | No         |
| FormatPrePebblev1MarkedCompacted   | 14    | Blocking   |
| FormatDeleteSizedAndObsolete       | 15    | No         |
| FormatVirtualSSTables              | 16    | No         |

Upgrading to a format major version with 'Background' in the migration
column may trigger background activity to rewrite physical file
formats, typically through compactions. Upgrading to a format major
version with 'Blocking' in the migration column will block until a
migration is complete. The database may continue to serve reads and
writes if upgrading a live database through
`RatchetFormatMajorVersion`, but the method call will not return until
the migration is complete.
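
As a rough planning aid, the table and migration rules above can be modeled in a few lines of Go. This is a standalone sketch, not part of the Pebble API: the `formatVersion` type, the `history` slice, and the `planUpgrade` function are hypothetical names, and the data is simply copied from the table in this README. The sketch assumes that an upgrade passes through every intermediate version between the current and the target one, so each 'Background' or 'Blocking' entry in between is reported.

```go
package main

import (
	"errors"
	"fmt"
)

// formatVersion mirrors one row of the format major version table.
type formatVersion struct {
	name      string
	value     int
	migration string // "No", "Background", or "Blocking"
}

// history is copied from the table in this README.
var history = []formatVersion{
	{"FormatMostCompatible", 1, "No"},
	{"FormatVersioned", 3, "No"},
	{"FormatSetWithDelete", 4, "No"},
	{"FormatBlockPropertyCollector", 5, "No"},
	{"FormatSplitUserKeysMarked", 6, "Background"},
	{"FormatSplitUserKeysMarkedCompacted", 7, "Blocking"},
	{"FormatRangeKeys", 8, "No"},
	{"FormatMinTableFormatPebblev1", 9, "No"},
	{"FormatPrePebblev1Marked", 10, "Background"},
	{"FormatSSTableValueBlocks", 12, "No"},
	{"FormatFlushableIngest", 13, "No"},
	{"FormatPrePebblev1MarkedCompacted", 14, "Blocking"},
	{"FormatDeleteSizedAndObsolete", 15, "No"},
	{"FormatVirtualSSTables", 16, "No"},
}

// planUpgrade lists the versions above current, up to and including target,
// that involve a migration. Downgrades are rejected because format major
// version upgrades are permanent.
func planUpgrade(current, target int) ([]formatVersion, error) {
	if target < current {
		return nil, errors.New("cannot return to an earlier format major version")
	}
	var steps []formatVersion
	for _, v := range history {
		if v.value > current && v.value <= target && v.migration != "No" {
			steps = append(steps, v)
		}
	}
	return steps, nil
}

func main() {
	steps, err := planUpgrade(5, 14)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range steps {
		fmt.Printf("%d %s: %s\n", s.value, s.name, s.migration)
	}
}
```

Under these assumptions, moving from value 5 to value 14 reports two 'Background' and two 'Blocking' migrations, which is useful to know before calling `RatchetFormatMajorVersion` on a live database.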

For reference, the table below lists the range of supported Pebble format major
versions for CockroachDB releases.

| CockroachDB release | Earliest supported                 | Latest supported          |
|---------------------|------------------------------------|---------------------------|
| 20.1 through 21.1   | FormatMostCompatible               | FormatMostCompatible      |
| 21.2                | FormatMostCompatible               | FormatSetWithDelete       |
| 22.1                | FormatMostCompatible               | FormatSplitUserKeysMarked |
| 22.2                | FormatMostCompatible               | FormatPrePebblev1Marked   |
| 23.1                | FormatSplitUserKeysMarkedCompacted | FormatFlushableIngest     |
| 23.2                | FormatSplitUserKeysMarkedCompacted | FormatVirtualSSTables     |
| 24.1 plan           | FormatSSTableValueBlocks           |                           |

## Pedigree

Pebble is based on the incomplete Go version of LevelDB:

https://github.com/golang/leveldb

The Go version of LevelDB is based on the C++ original:

https://github.com/google/leveldb

Optimizations and inspiration were drawn from RocksDB:

https://github.com/facebook/rocksdb

## Getting Started

### Example Code

```go
package main

import (
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// Open (or create) a database in the "demo" directory.
	db, err := pebble.Open("demo", &pebble.Options{})
	if err != nil {
		log.Fatal(err)
	}
	key := []byte("hello")
	// Write a key and sync it to disk before returning.
	if err := db.Set(key, []byte("world"), pebble.Sync); err != nil {
		log.Fatal(err)
	}
	// The returned value is only valid until closer.Close is called.
	value, closer, err := db.Get(key)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s %s\n", key, value)
	if err := closer.Close(); err != nil {
		log.Fatal(err)
	}
	if err := db.Close(); err != nil {
		log.Fatal(err)
	}
}
```