Common issues and solutions with Badger
Update: with the new Value(func(v []byte))
API, this deadlock can no longer
happen.
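For reference, here is a minimal sketch of that closure-based read on Badger v2 or later; the import path, directory, and key are illustrative:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Illustrative path; any writable directory works.
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Seed a sample key so the read below has something to find.
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte("answer"), []byte("42"))
	}); err != nil {
		log.Fatal(err)
	}

	// Read it back with the closure-based Value API. The slice handed to
	// the callback is only valid inside the callback.
	err = db.View(func(txn *badger.Txn) error {
		item, err := txn.Get([]byte("answer"))
		if err != nil {
			return err
		}
		return item.Value(func(val []byte) error {
			fmt.Printf("answer = %s\n", val)
			return nil
		})
	})
	if err != nil {
		log.Fatal(err)
	}
}
```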
The following is true for users on Badger v1.x.
This can happen if a long-running iteration has Prefetch set to false, but an Item::Value call is made inside the loop. That causes Badger to acquire read locks over the value log files to prevent value log GC from removing a file from underneath the iteration. As a side effect, this also blocks a new value log GC file from being created when the value log file boundary is hit.
Please see GitHub issues #293 and #315.
There are multiple workarounds during iteration:

1. Use Item::ValueCopy instead of Item::Value when retrieving a value (see the sketch after this list).
2. Set Prefetch to true. Badger would then copy over the value and release the file lock immediately.
3. When Prefetch is false, don’t call Item::Value and do a pure key-only iteration. This might be useful if you just want to delete a lot of keys.
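A minimal sketch of workarounds 1 and 3; the import path assumes Badger v3, and the directory and key format are illustrative:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(txn *badger.Txn) error {
		// Workaround 3: key-only iteration. With PrefetchValues=false,
		// values are not fetched unless explicitly requested.
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false
		it := txn.NewIterator(opts)
		defer it.Close()

		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			fmt.Printf("key: %s\n", item.Key())

			// Workaround 1: if a value is occasionally needed, ValueCopy
			// returns a copy instead of a slice pointing into the value log.
			val, err := item.ValueCopy(nil)
			if err != nil {
				return err
			}
			fmt.Printf("value: %s\n", val)
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```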
Are you creating a new transaction for every single key update, and waiting for it to Commit fully before creating a new one? This leads to very low throughput.
We’ve created the WriteBatch API, which provides a way to batch up many updates into a single transaction and commit that transaction using callbacks to avoid blocking. This amortizes the cost of a transaction really well and provides the most efficient way to do bulk writes. Note that the WriteBatch API doesn’t allow any reads. For read-modify-write workloads, you should use the Transaction API.
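A minimal sketch of bulk loading with WriteBatch; the directory, key format, and count are illustrative:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// WriteBatch batches many updates into as few transactions as needed
	// and commits them asynchronously behind the scenes.
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	for i := 0; i < 100000; i++ {
		key := []byte(fmt.Sprintf("key-%06d", i))
		if err := wb.Set(key, []byte("value")); err != nil {
			log.Fatal(err)
		}
	}
	// Flush waits until all pending writes have been committed.
	if err := wb.Flush(); err != nil {
		log.Fatal(err)
	}
}
```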
If you’re using Badger with SyncWrites=false, then your writes might not be written to the value log and won’t get synced to disk immediately. Writes to the LSM tree are done in memory first, before they get compacted to disk. The compaction would only happen once BaseTableSize has been reached. So, if you’re doing a few writes and then checking, you might not see anything on disk. Once you Close the database, you’ll see these writes on disk.
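Conversely, if you need every write synced before it is acknowledged, here is a sketch of opening the DB with synchronous writes enabled (the path is illustrative); this trades throughput for per-write durability:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// WithSyncWrites(true) makes Badger sync writes to disk before
	// acknowledging them, at the cost of write throughput.
	opts := badger.DefaultOptions("/tmp/badger").WithSyncWrites(true)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```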
Just like forward iteration goes to the first key which is equal to or greater than the SEEK key, reverse iteration goes to the first key which is equal to or less than the SEEK key. Therefore, the SEEK key would not be part of the results. You can typically add a 0xff byte as a suffix to the SEEK key to include it in the results. See the following issues: #436 and #347.
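A minimal sketch of reverse iteration that appends a 0xff byte to the seek key; the prefix and directory are illustrative:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.Reverse = true
		it := txn.NewIterator(opts)
		defer it.Close()

		// Appending 0xff to the seek key makes the reverse iterator start
		// at the last key sharing the "user/" prefix rather than skipping it.
		prefix := []byte("user/")
		seek := append(append([]byte{}, prefix...), 0xff)
		for it.Seek(seek); it.ValidForPrefix(prefix); it.Next() {
			fmt.Printf("key: %s\n", it.Item().Key())
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```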
We recommend using instances which provide local SSD storage, without any limit on the maximum IOPS. In AWS, these are storage optimized instances like i3. They provide local SSDs which clock 100K IOPS over 4KB blocks easily.
If you’re seeing panics like this, it is because you’re operating on a closed DB. This can happen if you call Close() before sending a write, or if you call it multiple times. You should ensure that you call Close() only once, and that all your read/write operations finish before closing.
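One way to arrange that, sketched here with a sync.WaitGroup: every writer goroutine finishes before Close() is called exactly once. The keys and directory are illustrative.

```go
package main

import (
	"fmt"
	"log"
	"sync"

	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			err := db.Update(func(txn *badger.Txn) error {
				return txn.Set([]byte(fmt.Sprintf("key-%d", n)), []byte("value"))
			})
			if err != nil {
				log.Println(err)
			}
		}(i)
	}

	// Wait for all writers to finish, then close the DB exactly once.
	wg.Wait()
	if err := db.Close(); err != nil {
		log.Fatal(err)
	}
}
```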
We highly recommend setting a high number for GOMAXPROCS, which allows Go to observe the full IOPS throughput provided by modern SSDs. In Dgraph, we have set it to 128. For more details, see this thread.
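If you prefer setting it in code rather than through the GOMAXPROCS environment variable, a minimal sketch at process start (128 here mirrors the Dgraph setting mentioned above):

```go
package main

import "runtime"

func main() {
	// Equivalent to launching the process with GOMAXPROCS=128.
	runtime.GOMAXPROCS(128)

	// ... open Badger and run the rest of the application here.
}
```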
We recommend setting max file descriptors
to a high number depending upon the
expected size of your data. On Linux and Mac, you can check the file descriptor
limit with ulimit -n -H
for the hard limit and ulimit -n -S
for the soft
limit. A soft limit of 65535
is a good lower bound. You can adjust the limit
as needed.
This error means you have a Badger directory which was created by an older version of Badger and you’re trying to open it with a newer version. The underlying data format can change across Badger versions, so users have to migrate their data directory. Badger data can be migrated from version X to version Y by following the steps listed below. Assume you were on Badger v1.6.0 and you wish to migrate to v2.0.0.
Install Badger version v1.6.0
cd $GOPATH/src/github.com/dgraph-io/badger
git checkout v1.6.0
cd badger && go install
This should install the old Badger binary in your $GOBIN.
Create Backup
badger backup --dir path/to/badger/directory -f badger.backup
Install Badger version v2.0.0
cd $GOPATH/src/github.com/dgraph-io/badger
git checkout v2.0.0
cd badger && go install
This should install the new Badger binary in your $GOBIN.
Restore data from backup
badger restore --dir path/to/new/badger/directory -f badger.backup
This creates a new directory at path/to/new/badger/directory and adds data in the new format to it.
NOTE - The preceding steps shouldn’t cause any data loss, but please ensure the new data is valid before deleting the old Badger directory.
Badger doesn’t directly use Cgo, but it relies on the https://github.com/DataDog/zstd library for ZSTD compression, and that library requires gcc/cgo. You can build Badger without Cgo by running CGO_ENABLED=0 go build. This builds Badger without support for the ZSTD compression algorithm.
As of Badger versions v2.2007.4 and v3.2103.1, the DataDog ZSTD library was replaced by a pure Go version and Cgo is no longer required. The new library is backwards compatible in nearly all cases:
Yes, they’re compatible both ways. The only exception is 0 bytes of input, which gives 0 bytes of output with the Go zstd. But you already have zstd.WithZeroFrames(true), which wraps 0 bytes in a header so it can be fed to DD zstd. This is only relevant when downgrading.
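A minimal sketch of enabling ZSTD compression on Badger v2+; the directory and compression level are illustrative:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/badger/v3/options"
)

func main() {
	// With v2.2007.4+/v3.x, ZSTD uses the pure Go implementation,
	// so no Cgo is needed for this to work.
	opts := badger.DefaultOptions("/tmp/badger").
		WithCompression(options.ZSTD).
		WithZSTDCompressionLevel(3)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```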