Skip to content

Adopt a linear-time replay algorithm #305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jan 25, 2022
Merged

Conversation

davidmrdavid
Copy link
Collaborator

@davidmrdavid davidmrdavid commented Jan 7, 2022

Follow-up to: Azure/azure-functions-durable-python#302

Problem Statement

A DF SDK's primary function is to rehydrate an orchestrator's state by processing its internal History log. We call this: "replaying an orchestrator". Previously, our approach to replay had a quadratic complexity cost in the number of History events.

This PR makes our replay algorithm linear-time instead. We do this by iterating over the orchestration History array only once, which is possible by making better use of the data within each History event. Based on our experience with this algorithm in the DF Python SDK, this change should give us unprecedented performance, correctness, and scalability improvements for JS/TS DF users.

Other contributions

This PR is quite intrusive to our Node.JS SDK: most, if not all, the "hot paths" of our SDK have been re-worked. As a result, this was a good opportunity to evaluate dropping support for legacy behavior and interfaces/types that we were supporting mostly for backwards compatibility's sake and that newer SDKs such as Python does not include.

As a result, my goal after merging this PR is to release a new major version of this SDK.

Outline of changes

A new replay driver

The TaskOrchestrationExecutor is the new driver of orchestration replay. Previously, each DF API was responsible for inferring it's corresponding Task status by searching through the History log. Now, this responsibility is centralized in the TaskOrchestrationExecutor. This class interleaves iterating through the History log with resuming the execution of the user-defined generator (the orchestrator code). While iterating through the History, it seeks to update the state of currently-scheduled ("open") tasks which, when resolved ("closed"), allow us to resume running the generator.

Support for the Extension-level OOProc replay schema V2

The DF extension "recently" introduced a new replay protocol for OOProc orchestrations. By subscribing to this new extension-level replay protocol, OOProc SDKs enjoy improved correctness when dealing with WhenAny and WhenAll tasks. This PR adds support to this new protocol.

To do this, it changes its representation of user actions depending on the upperSchemaVersion field sent by the extension.

New task state machines

Tasks are now represented as state machines that can be signaled to update their state. This is particularly helpful in the case of WhenAny/WhenAll tasks, where the task needs to determine its current status by listening to state changes in its child-tasks.

Breaking Changes

There are at least 3 breaking changes introduced in this PR.

  1. We no longer support returning a Task from an orchestrator to be valid. Previously, a statement such as return context.CallActivity(...) would be treated as return yield context.CallActivity(...). This is no longer the case.

  2. We no longer support yielding the ContinueAsNew API, and doing so would throw an exception. This API is supposed to be fire-and-forget and so it doesn't make sense to support "awaiting" it.

  3. All user-facing Task types now inherit from single identifier :Task. Additionally, the user-facing Task` types have been simplified and no longer expose various property that were for framework-internal use only. Our users have been asking for a more streamlined experience with Task-types so this is long overdue as well.

To my knowledge, this is the full extent of breaking changes, but I feel that the changes in this PR are disruptive enough that there might be one other minor behavior I could be missing. In any case, the behavior of the Node SDK after this PR should be more consistent with the Python SDK, which I believe is a good thing.

Follow-ups

This PR will need to be benchmarked and the changes extensively tested before a release.

Note:

Some files with major changes (like the orchestration context) are showing up as minimized due to large diffs. Don't miss them while reviewing :)

Thanks!

@davidmrdavid davidmrdavid marked this pull request as draft January 7, 2022 00:48
@davidmrdavid davidmrdavid marked this pull request as ready for review January 7, 2022 01:09
@cgillum
Copy link
Member

cgillum commented Jan 10, 2022

Do we need to make any documentation updates to account for the mentioned breaking changes?

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review feedback. I still have more to review.

@davidmrdavid davidmrdavid requested a review from cgillum January 25, 2022 01:08
@davidmrdavid
Copy link
Collaborator Author

The current state of the code should have addressed all PR feedback with the exception of two things:

  1. Adding the copyright header, which I'll do later to avoid over-cluttering this PR
  2. Saving some cycles by eagerly indexing into our arrays, and then checking for undefined. I'm concerned about letting undefined values flow through the code, so my first instinct is to avoid this; unless we feel this will be a real performance concern.

@cgillum: I'd appreciate if you could make another pass over this PR, resolving your old comments and questions if they've been addressed. Thanks!

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more things (and a couple old ones too).

@davidmrdavid davidmrdavid requested a review from cgillum January 25, 2022 04:17
@davidmrdavid
Copy link
Collaborator Author

My latest PR responds to the most recent round of feedback. My bad for missing a few of your previous comments, there was a lot to address :)

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this iteration. Remaining comments are optional/non-blocking.

@davidmrdavid davidmrdavid requested a review from cgillum January 25, 2022 19:41
@davidmrdavid
Copy link
Collaborator Author

Wohoo, we're finally ready to merge 🎉
Super excited to get this out!

I'll add the copyright headers in a different PR, to avoid cluttering this one

@davidmrdavid davidmrdavid merged commit f337aff into dev Jan 25, 2022
@davidmrdavid davidmrdavid deleted the dajusto/linear-replay branch January 25, 2022 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants