
Discussion: BadgerDB should offer arbitrarily sized, atomic transactions #1325

@mvanderkroon

Description

Observation

The current BadgerDB implementation keeps in-flight transactions in memory until txn.Commit() is called. If the number of pending operations exceeds a predetermined limit, ErrTxnTooBig is returned. The de-facto way of dealing with this seems to be to commit the pending transaction then and there, create a new transaction, and continue the sequence of operations.
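
For reference, this is roughly what the commit-and-continue workaround looks like. A minimal sketch, assuming the badger v2 API (db.NewTransaction, txn.Set, badger.ErrTxnTooBig, txn.Commit); the writeAll helper and its arguments are hypothetical:

```go
package main

import badger "github.com/dgraph-io/badger/v2"

// writeAll applies a set of writes, committing mid-way whenever the
// transaction grows too large. Each intermediate Commit becomes visible to
// readers, which is exactly the atomicity problem described below.
func writeAll(db *badger.DB, entries map[string][]byte) error {
	txn := db.NewTransaction(true) // read-write transaction
	defer func() { txn.Discard() }()
	for k, v := range entries {
		if err := txn.Set([]byte(k), v); err == badger.ErrTxnTooBig {
			// Flush what we have so far and start a fresh transaction.
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			if err := txn.Set([]byte(k), v); err != nil {
				return err
			}
		} else if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```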

Problem statement

This is unacceptable for many applications for two reasons:

  1. It will leave the database in an inconsistent state; perhaps not at the physical level, but certainly at the application level.
  2. Concurrent transactions will use up available RAM quickly and unpredictably.

In my opinion, the current situation essentially breaks the Atomicity guarantee for any transaction larger than (some function of) available RAM, or combination of transactions that exceeds (some function of) available RAM.

Real-world examples

To illustrate: our use case involves regularly updating blocks containing large numbers of keys, which means deleting the previously existing block (currently in use by the application and its users) and then inserting the new block. The number of keys per block is not fixed. One can easily see how inconvenient it is for the users of our application if the ErrTxnTooBig error happens right at the moment we have finished deleting the existing keyset: the entire block currently in use by the application would be removed before its replacement is written. This is what I mean by leaving the database in an inconsistent state at the application level.
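
To make this concrete, below is a rough sketch of that delete-then-insert flow against the same badger v2 API; replaceBlock, the key-prefix convention, and the block layout are hypothetical and only illustrate the shape of the problem. As written, a large block simply fails with ErrTxnTooBig; applying the commit-and-continue workaround from above would make it succeed, but could force a commit right after the deletes, publishing a state in which the old block is gone and the new one is not yet written:

```go
// replaceBlock deletes every key under prefix and writes newBlock in what is
// intended to be a single atomic step. With large blocks, this transaction
// can exceed Badger's in-memory limit and fail with ErrTxnTooBig.
func replaceBlock(db *badger.DB, prefix []byte, newBlock map[string][]byte) error {
	return db.Update(func(txn *badger.Txn) error {
		// Collect the keys of the existing block first, then delete them.
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // keys only
		it := txn.NewIterator(opts)
		var old [][]byte
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			old = append(old, it.Item().KeyCopy(nil))
		}
		it.Close()

		for _, k := range old {
			if err := txn.Delete(k); err != nil {
				return err // ErrTxnTooBig may surface here, mid-delete
			}
		}
		for k, v := range newBlock {
			if err := txn.Set([]byte(k), v); err != nil {
				return err // ...or here, before the new block is complete
			}
		}
		return nil
	})
}
```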

Additionally, we can imagine another situation: many blocks in various parts of the application are written to simultaneously, as concurrent external users might do. Once some combination of these in-flight transactions exceeds the available RAM, the application crashes. Perhaps the ErrTxnTooBig mechanism already takes this into account; I'm not sure.

Discussion

I believe the author of issue #1242 was also hinting at this problem, but I think the issue goes beyond ErrTxnTooBig, and a solution (if this were accepted as a problem to begin with!) would likely entail far-reaching changes to how BadgerDB stores in-flight transactions. Postgres could serve as a good example of how to achieve this: https://brandur.org/postgres-atomicity.

My understanding is that many other databases follow similar practices; for Neo4J I know this is the case for certain. I also don't mean to imply this is a bad choice per se; perhaps BadgerDB/DGraph and others are aimed at use cases where atomicity for arbitrarily sized transactions is not needed and raw performance has priority, and that is OK!

However, the (optional) ability to deal with arbitrarily sized transactions would make BadgerDB/DGraph a suitable solution for a wider domain of real-world applications.

I'm curious to learn:

  • Whether this is perceived as a problem at all
  • Whether a solution to this problem is part of BadgerDB's future
  • If the answers to both are yes, how it should be approached
  • Whether there is anything I could do to help

Metadata

Labels

  • area/api: Issues related to current API limitations.
  • kind/feature: Something completely new we should consider.
  • priority/P2: Somehow important but would not block a release.
  • status/accepted: We accept to investigate or work on it.
