Add job benchmark loop #2226

Open
wants to merge 10 commits into
base: master

Conversation

Kobzol
Member

@Kobzol Kobzol commented Aug 20, 2025

This PR adds the main logic required to execute job benchmarks in the collector.

It handles:

  • Updating the collector's heartbeat periodically
  • Quick loading of already downloaded sysroots, to avoid redownloading them between jobs and across collector restarts (useful for local testing)
  • Dequeuing jobs, including in-progress jobs, and expanding benchmark sets
  • Distinguishing between transient and permanent job errors, storing job errors into the DB, and marking jobs as failed or successful
  • Marking jobs that have been dequeued too many times as failed
  • Reconnecting to the DB after a transient I/O, network, or DB error, to try to refresh the DB connection
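
To make the error-handling bullets above concrete, here is a minimal sketch of how a single dequeued job might be resolved. It is not the code from this PR: `JobError`, `JobOutcome`, `MAX_DEQUEUE_COUNT`, and `resolve_job` are hypothetical names chosen for illustration, and the limit of 3 is an assumed value.

```rust
/// Hypothetical error classification; the PR's real types may differ.
#[derive(Debug)]
enum JobError {
    /// I/O, network, or DB hiccup: retrying (after reconnecting) may help.
    Transient(String),
    /// The job itself is broken: retrying will not help.
    Permanent(String),
}

/// What the collector should record for the job (hypothetical).
#[derive(Debug, PartialEq)]
enum JobOutcome {
    Success,
    /// Mark the job as failed and store the message in the DB.
    Failed(String),
    /// Leave the job in progress so it can be dequeued again later.
    Retry(String),
}

/// Assumed limit; the actual value used by the collector is not stated here.
const MAX_DEQUEUE_COUNT: u32 = 3;

/// Decides the outcome of one dequeued job.
/// `dequeue_count` is how many times the job has been dequeued so far,
/// including the current attempt.
fn resolve_job(
    dequeue_count: u32,
    run: impl FnOnce() -> Result<(), JobError>,
) -> JobOutcome {
    // A job that keeps coming back is failed without being run again.
    if dequeue_count > MAX_DEQUEUE_COUNT {
        return JobOutcome::Failed(format!(
            "job was dequeued {dequeue_count} times (limit {MAX_DEQUEUE_COUNT})"
        ));
    }
    match run() {
        Ok(()) => JobOutcome::Success,
        Err(JobError::Permanent(msg)) => JobOutcome::Failed(msg),
        Err(JobError::Transient(msg)) => JobOutcome::Retry(msg),
    }
}
```

In this sketch a transient error does not mark the job as failed. The job stays available for another dequeue, and the dequeue counter is what eventually stops a job that never succeeds.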

@Kobzol Kobzol requested a review from Jamesbarford August 20, 2025 10:32
component,
urls
))
if !non_404_error {
Contributor

I could be mistaken, but this reads slightly oddly: if the HTTP error is not a 404 (Not Found), we produce a "SHA not found" error? And if it was a 404, we fall through to an I/O error, even though the SHA actually was not found.

Member Author

The variable name wasn't great; I renamed it and added a comment.

if !non_404_error {
Err(SysrootDownloadError::SysrootShaNotFound)
} else {
Err(SysrootDownloadError::IO(anyhow!(
Contributor

We can use resp.error_for_status() here, so that we also get the actual error message from the response, which could be useful for debugging: https://docs.rs/reqwest/latest/reqwest/struct.Response.html#method.error_for_status

Member Author

error_for_status() is nice, but the reason we don't provide more context here is simply that we have up to three (potentially different) errors. Now that we detect 404s explicitly, we could just bail out on the first non-404 error, but I'm a bit worried about backwards compatibility: I don't know if "toolchain not found" is always reported with a 404.
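
For illustration, the classification discussed in this thread could be sketched roughly as below. This is a simplified, std-only approximation rather than the code in the diff: the variant names `SysrootShaNotFound` and `IO` come from the snippet above, but the `String` payload (instead of an `anyhow` error), the `AttemptFailure` type, and the `classify_failures` function are assumptions made to keep the example self-contained.

```rust
/// Variant names follow the diff; the payload is simplified to a String
/// (the real code wraps an `anyhow` error).
#[derive(Debug, PartialEq)]
enum SysrootDownloadError {
    SysrootShaNotFound,
    IO(String),
}

/// Hypothetical record of why one download URL failed.
#[derive(Debug)]
enum AttemptFailure {
    /// The server answered with a non-success HTTP status code.
    HttpStatus(u16),
    /// The request failed before any response arrived.
    Transport(String),
}

/// Collapses the failures from all attempted URLs into a single error.
/// Only when every attempt ended in a 404 is the SHA reported as not found;
/// any other failure is surfaced as an I/O error listing each attempt.
fn classify_failures(failures: &[AttemptFailure]) -> SysrootDownloadError {
    let only_404s = !failures.is_empty()
        && failures
            .iter()
            .all(|f| matches!(f, AttemptFailure::HttpStatus(404)));

    if only_404s {
        SysrootDownloadError::SysrootShaNotFound
    } else {
        let details: Vec<String> = failures
            .iter()
            .map(|f| match f {
                AttemptFailure::HttpStatus(code) => format!("HTTP {code}"),
                AttemptFailure::Transport(msg) => msg.clone(),
            })
            .collect();
        SysrootDownloadError::IO(details.join("; "))
    }
}
```

Collecting every failure before deciding keeps the current behaviour of trying all URLs, at the cost of a less specific message than bailing out on the first non-404 error would give.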
