Adopt a linear-time replay algorithm #305

davidmrdavid · 2022-01-07T00:25:45Z

Follow-up to: Azure/azure-functions-durable-python#302

Problem Statement

A DF SDK's primary function is to rehydrate an orchestrator's state by processing its internal History log. We call this "replaying an orchestrator". Previously, our approach to replay had quadratic time complexity in the number of History events.

This PR makes our replay algorithm linear-time instead. We do this by iterating over the orchestration History array only once, which is possible by making better use of the data within each History event. Based on our experience with this algorithm in the DF Python SDK, this change should give us significant performance, correctness, and scalability improvements for JS/TS DF users.
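As a rough illustration of the difference (a simplified sketch, not the SDK's actual code), compare a per-task search with a single pass over the History. The HistoryEvent shape, the event kinds, and both function names are hypothetical stand-ins chosen for this example.

```typescript
// Hypothetical, simplified event shape; real History events carry more fields.
interface HistoryEvent {
  kind: "TaskScheduled" | "TaskCompleted";
  taskId: number;
  result?: unknown;
}

// Quadratic approach: each task lookup rescans the History from the start,
// so T tasks over N events cost O(T * N).
function resolveByScanning(history: HistoryEvent[], taskIds: number[]): Map<number, unknown> {
  const results = new Map<number, unknown>();
  for (const id of taskIds) {
    const done = history.find((e) => e.kind === "TaskCompleted" && e.taskId === id);
    if (done !== undefined) {
      results.set(id, done.result);
    }
  }
  return results;
}

// Linear approach: walk the History once and let each event update the
// open task it refers to, using the id carried by the event itself.
function resolveInOnePass(history: HistoryEvent[]): Map<number, unknown> {
  const open = new Set<number>();
  const results = new Map<number, unknown>();
  for (const event of history) {
    if (event.kind === "TaskScheduled") {
      open.add(event.taskId);
    } else if (open.delete(event.taskId)) {
      // Completion of a task we know is open: record its result.
      results.set(event.taskId, event.result);
    }
  }
  return results;
}
```

Both functions produce the same results for a well-formed History; only the amount of scanning differs.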

Other contributions

This PR is quite intrusive to our Node.js SDK: most, if not all, of the "hot paths" of our SDK have been reworked. As a result, this was a good opportunity to evaluate dropping support for legacy behavior and interfaces/types that we were supporting mostly for backwards compatibility's sake and that newer SDKs such as Python do not include.

As a result, my goal after merging this PR is to release a new major version of this SDK.

Outline of changes

A new replay driver

The TaskOrchestrationExecutor is the new driver of orchestration replay. Previously, each DF API was responsible for inferring its corresponding Task status by searching through the History log. Now, this responsibility is centralized in the TaskOrchestrationExecutor. This class interleaves iterating through the History log with resuming the execution of the user-defined generator (the orchestrator code). While iterating through the History, it seeks to update the state of currently-scheduled ("open") tasks which, when resolved ("closed"), allow us to resume running the generator.
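The interleaving described above can be sketched roughly as follows. This is a minimal illustration under simplifying assumptions (one open task at a time, completion events only), not the actual TaskOrchestrationExecutor; OpenTask, CompletionEvent, replay, and sampleOrchestrator are hypothetical names.

```typescript
// Hypothetical minimal shapes used only for this sketch.
interface OpenTask {
  id: number;
}

interface CompletionEvent {
  taskId: number;
  result: unknown;
}

// Walk the History once. Whenever an event closes the task the generator is
// currently waiting on, resume the generator with that task's result.
function replay(
  orchestrator: () => Generator<OpenTask, unknown, unknown>,
  history: CompletionEvent[],
): { finished: boolean; output?: unknown } {
  const gen = orchestrator();
  let step = gen.next(); // run the orchestrator up to its first yield
  for (const event of history) {
    if (step.done) {
      break; // the orchestrator already returned
    }
    const waitingOn = step.value;
    if (waitingOn.id !== event.taskId) {
      continue; // event for a task we are not waiting on
    }
    step = gen.next(event.result); // task closed: resume user code
  }
  return step.done ? { finished: true, output: step.value } : { finished: false };
}

// A stand-in orchestrator that awaits two tasks in sequence.
function* sampleOrchestrator(): Generator<OpenTask, unknown, unknown> {
  const first = yield { id: 0 };
  const second = yield { id: 1 };
  return [first, second];
}
```

With only the first completion event in the History, replay reports the orchestrator as unfinished; with both events present, it runs to completion.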

Support for the Extension-level OOProc replay schema V2

The DF extension "recently" introduced a new replay protocol for OOProc orchestrations. By subscribing to this new extension-level replay protocol, OOProc SDKs enjoy improved correctness when dealing with WhenAny and WhenAll tasks. This PR adds support for this new protocol.

To do this, it changes its representation of user actions depending on the upperSchemaVersion field sent by the extension.
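A hedged sketch of what version-dependent action encoding could look like is shown below. The Action shape, the function names, and the specific rule (V2 keeps WhenAll/WhenAny nesting, V1 gets a flattened list) are assumptions made for illustration; only the upperSchemaVersion field name comes from the description above.

```typescript
// Hypothetical action shapes; the real wire format is not reproduced here.
type Action =
  | { kind: "callActivity"; name: string }
  | { kind: "whenAll" | "whenAny"; children: Action[] };

// Collect the leaf actions of a possibly nested action into `out`.
function flattenAction(action: Action, out: Action[]): void {
  if (action.kind === "callActivity") {
    out.push(action);
    return;
  }
  for (const child of action.children) {
    flattenAction(child, out);
  }
}

// Assumption: schema V2 (and later) understands nested compound actions,
// while older schemas expect a flat list of leaf actions.
function encodeActions(actions: Action[], upperSchemaVersion: number): Action[] {
  if (upperSchemaVersion >= 2) {
    return actions;
  }
  const flat: Action[] = [];
  for (const action of actions) {
    flattenAction(action, flat);
  }
  return flat;
}
```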

New task state machines

Tasks are now represented as state machines that can be signaled to update their state. This is particularly helpful in the case of WhenAny/WhenAll tasks, where the task needs to determine its current status by listening to state changes in its child tasks.
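To make the idea concrete, here is a minimal sketch of tasks as state machines, assuming three states and a parent link for notifications. The class names and members (AtomicTask, CompoundTask, WhenAllTask, WhenAnyTask, setValue, onChildChanged) are hypothetical and simplified; they are not the SDK's actual types.

```typescript
type TaskState = "running" | "completed" | "failed";

// A task that can be signaled with a value or an error.
class AtomicTask {
  state: TaskState = "running";
  result: unknown = undefined;
  parent?: CompoundTask;

  setValue(isError: boolean, value: unknown): void {
    this.state = isError ? "failed" : "completed";
    this.result = value;
    // Notify the enclosing WhenAll/WhenAny task, if there is one.
    this.parent?.onChildChanged(this);
  }
}

// A task whose state is derived from the state changes of its children.
abstract class CompoundTask extends AtomicTask {
  constructor(protected readonly children: AtomicTask[]) {
    super();
    for (const child of children) {
      child.parent = this;
    }
  }

  abstract onChildChanged(child: AtomicTask): void;
}

class WhenAllTask extends CompoundTask {
  onChildChanged(child: AtomicTask): void {
    if (this.state !== "running") {
      return; // already resolved
    }
    if (child.state === "failed") {
      this.setValue(true, child.result); // fail fast on the first error
    } else if (this.children.every((c) => c.state === "completed")) {
      this.setValue(false, this.children.map((c) => c.result));
    }
  }
}

class WhenAnyTask extends CompoundTask {
  onChildChanged(child: AtomicTask): void {
    if (this.state === "running") {
      this.setValue(false, child); // resolves to the first child that closes
    }
  }
}
```

In this sketch a WhenAllTask stays "running" until every child has completed, while a WhenAnyTask closes as soon as any child does.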

Breaking Changes

There are at least 3 breaking changes introduced in this PR.

Returning a Task from an orchestrator is no longer supported. Previously, a statement such as return context.CallActivity(...) would be treated as return yield context.CallActivity(...). This is no longer the case.
Yielding the ContinueAsNew API is no longer supported, and doing so will throw an exception. This API is supposed to be fire-and-forget, so it doesn't make sense to support "awaiting" it.
All user-facing Task types now inherit from a single identifier: Task. Additionally, the user-facing Task types have been simplified and no longer expose various properties that were for framework-internal use only. Our users have been asking for a more streamlined experience with Task types, so this is long overdue as well.
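The first breaking change can be illustrated with plain generators, independent of the SDK. In this sketch, FakeTask and callActivity are hypothetical stand-ins for a Task and a DF API call; the point is only how return and return yield differ from the perspective of whatever drives the generator.

```typescript
// Hypothetical stand-ins for a Task and a DF API call.
interface FakeTask {
  name: string;
}

const callActivity = (name: string): FakeTask => ({ name });

// No longer supported: the Task object itself becomes the orchestrator's
// return value, so the driver never gets a chance to wait on it.
function* returnsTask(): Generator<FakeTask, unknown, unknown> {
  return callActivity("Hello");
}

// Supported: the Task is yielded to the driver, and the orchestrator
// returns whatever value the driver resumes it with.
function* yieldsTask(): Generator<FakeTask, unknown, unknown> {
  return yield callActivity("Hello");
}

const returnedStep = returnsTask().next(); // generator finishes immediately
const yieldedStep = yieldsTask().next(); // generator pauses, waiting on the Task
```

Here returnedStep.done is true and its value is the unresolved task object, while yieldedStep.done is false and the task is handed to the driver to be awaited.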

To my knowledge, this is the full extent of breaking changes, but I feel that the changes in this PR are disruptive enough that there might be other minor behavior changes I am missing. In any case, the behavior of the Node SDK after this PR should be more consistent with the Python SDK, which I believe is a good thing.

Follow-ups

This PR will need to be benchmarked and the changes extensively tested before a release.

Note:

Some files with major changes (like the orchestration context) are showing up as minimized due to large diffs. Don't miss them while reviewing :)

Thanks!

package.json

cgillum · 2022-01-10T18:23:32Z

Do we need to make any documentation updates to account for the mentioned breaking changes?

cgillum

Partial review feedback. I still have more to review.

src/actions/actiontype.ts

src/actions/whenanyaction.ts

src/durableorchestrationbindinginfo.ts

src/durableorchestrationcontext.ts

src/taskorchestrationexecutor.ts

davidmrdavid · 2022-01-25T01:14:19Z

The current state of the code should have addressed all PR feedback with the exception of two things:

Adding the copyright header, which I'll do later to avoid over-cluttering this PR
Saving some cycles by eagerly indexing into our arrays, and then checking for undefined. I'm concerned about letting undefined values flow through the code, so my first instinct is to avoid this, unless we feel this will be a real performance concern.

@cgillum: I'd appreciate it if you could make another pass over this PR, resolving your old comments and questions if they've been addressed. Thanks!

cgillum

Just a few more things (and a couple old ones too).

src/iorchestratorstate.ts

src/durableorchestrationcontext.ts

src/orchestratorstate.ts

src/taskorchestrationexecutor.ts

src/task.ts

src/taskorchestrationexecutor.ts

davidmrdavid · 2022-01-25T04:18:08Z

My latest PR responds to the most recent round of feedback. My bad for missing a few of your previous comments, there was a lot to address :)

cgillum

I'm good with this iteration. Remaining comments are optional/non-blocking.

src/iorchestratorstate.ts

src/task.ts

davidmrdavid · 2022-01-25T21:34:03Z

Woohoo, we're finally ready to merge 🎉
Super excited to get this out!

I'll add the copyright headers in a different PR, to avoid cluttering this one.

Enable linear-time replay

f7a34e4

davidmrdavid commented Jan 7, 2022

davidmrdavid marked this pull request as draft January 7, 2022 00:48

davidmrdavid added 2 commits January 6, 2022 17:03

document translation from V2 to V1 actions

51a84cf

document testing helper

7bbfd81

davidmrdavid marked this pull request as ready for review January 7, 2022 01:09

davidmrdavid requested review from AnatoliB, bachuv, amdeel and cgillum January 7, 2022 01:10

davidmrdavid mentioned this pull request Jan 7, 2022

callHttp does no polling #284

Closed

cgillum reviewed Jan 10, 2022

cgillum requested changes Jan 10, 2022

davidmrdavid added 3 commits January 24, 2022 16:48

respond to PR feedback

80e5fb7

drop extensionSchema enum

b40e36f

check for non-generator orchestrations

84e331a

davidmrdavid requested a review from cgillum January 25, 2022 01:08

cgillum reviewed Jan 25, 2022

davidmrdavid and others added 3 commits January 24, 2022 19:19

Update src/taskorchestrationexecutor.ts

9ddde58

Co-authored-by: Chris Gillum <cgillum@microsoft.com>

apply PR feedback

e644485

Merge branch 'dajusto/linear-replay' of https://github.com/Azure/azur…

f6347e8

…e-functions-durable-js into dajusto/linear-replay

davidmrdavid requested a review from cgillum January 25, 2022 04:17

cgillum approved these changes Jan 25, 2022

simplifying timer logic

b1e62a2

davidmrdavid requested a review from cgillum January 25, 2022 19:41

davidmrdavid merged commit f337aff into dev Jan 25, 2022

davidmrdavid deleted the dajusto/linear-replay branch January 25, 2022 21:34

This was referenced Jan 25, 2022

Support V2 replay protocol #300

Closed

Unifying TaskSet and Task under a single identifier #185

Closed
