Skip to content

Log error for old invocation status table #3585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

slinkydeveloper
Copy link
Contributor

Part of #1852

@slinkydeveloper slinkydeveloper added the release-blocker Blocker for the next release label Jul 25, 2025
@slinkydeveloper slinkydeveloper added this to the 1.5 milestone Jul 25, 2025
Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this warning for our users @slinkydeveloper. Did you check whether it adds a noticeable start up penalty when the partition store is a bit larger? If that's the case, then we could remember if this check was executed before and had a negative outcome to avoid it in the future. Apart from this +1 for merging.

Comment on lines +629 to +633
error!(
"Detected an invocation with id {invocation_id} stored using the deprecated format. Cancel or kill the invocation.\nTHIS WILL BECOME AN HARD ERROR IN 1.6"
)
} else {
error!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a warn since Restate will still work after seeing these log messages.

pub fn verify_deprecations(&self, key_range: RangeInclusive<PartitionKey>) {
// Verify invocation status v1 is unused
let Ok(iter) = self.iterator_from::<InvocationStatusKeyV1>(
TableScan::FullScanPartitionKeyRange(key_range),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check whether this full table scan has a noticeable effect when starting Restate with a larger db?

@AhmedSoliman AhmedSoliman self-requested a review July 30, 2025 08:03
@@ -606,6 +609,33 @@ impl PartitionStore {
pub fn partition(&self) -> &Arc<Partition> {
self.db.partition()
}

pub fn verify_deprecations(&self, key_range: RangeInclusive<PartitionKey>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key range should be available already in self.partition_key_range(), or do you have a scenario where partial range scan is needed?

Comment on lines +629 to +631
error!(
"Detected an invocation with id {invocation_id} stored using the deprecated format. Cancel or kill the invocation.\nTHIS WILL BECOME AN HARD ERROR IN 1.6"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1- Isn't this a big ask for our users? is there a way to auto migrate those invocations instead?
2- What if the server operator is not he owner of the invocation, e.g. restate cloud or a platform team?
3- Is the intention here to warn about a single invocation_id even if the table has many entries?

Side nit: no need for shouting in log messages.

@@ -150,6 +150,8 @@ impl SpawnPartitionProcessorTask {
Err(e) => Err(ProcessorError::from(e)),
}?;

partition_store.verify_deprecations(partition.key_range);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should do operations like these on startup. If this is something that can happen lazily (potentially after a partition becomes leader) then we should do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you would move this in Leadership state?

@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented Jul 30, 2025

@AhmedSoliman

We already lazily execute the migration since 1.2, this is gonna trigger for invocations that have made no progress since 1.1 (not even transitioned to another state, etc). I don't think this is a big ask, the only realistic case where an invocation is blocked since 1.1 is someone doing a sleep of 1 year or similar (very unlikely).

@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented Jul 30, 2025

@AhmedSoliman Also check #1852 for the list of what we've already done in the context of this deprecation/removal. If you have alternative ideas for for removing this piece of code, that would be appreciated.

@AhmedSoliman
Copy link
Contributor

@slinkydeveloper is it possible to auto migrate the remaining invocations? Imagine if a user is running on v1.1 and successively upgrades to v1.2, v1.3, then v1.4. They might not have enough time for all their invocations to change states.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-blocker Blocker for the next release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants