
Conversation

Contributor

@embarnard embarnard commented Jul 3, 2025

Link to pivotal/JIRA issue

Is PM acceptance required? (delete one)

  • Yes - don't merge until JIRA issue is accepted!

Reminder: merge main into this branch and get green tests before merging to main

What was done?

  • Add a new archiver service for archiving the state file intakes from the 2024 tax year
  • Uploaded new az_addresses.csv, id_addresses.csv, nj_addresses.csv, md_addresses.csv, and nc_addresses.csv to vita-min-prod-docs in S3, so that when these new intakes are created they will have a new batch of addresses to pick from, different from last year's

How to test?

  1. Create intakes for each state with unique contact methods
  2. Transition their submissions to accepted
  3. Run bundle exec rake state_file:ty24:archive_#{state_code}
  4. Check for the existence of StateFileArchivedIntakes with the correctly copied-over data
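The per-state tasks in step 3 could be defined roughly as below. This is a hedged sketch, not the PR's actual rakefile: the namespace layout and state list are inferred from this PR's description, and the real tasks would depend on the Rails environment before calling the service.

```ruby
require "rake"
extend Rake::DSL

# Hypothetical sketch of the per-state archive tasks. In the real app each
# task would load the Rails environment and invoke the archiver service.
namespace :state_file do
  namespace :ty24 do
    %w[az id md nc nj].each do |state_code|
      desc "Archive accepted TY24 #{state_code.upcase} state file intakes"
      task "archive_#{state_code}" do
        # Real task body (assumption, mirroring the snippet reviewed below):
        # StateFile::Ty24ArchiverService.archive!(state_code: state_code, batch_size: 50)
      end
    end
  end
end
```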


github-actions bot commented Jul 3, 2025

Heroku app: https://gyr-review-app-6017-79ac9d54b8fc.herokuapp.com/
View logs: heroku logs --app gyr-review-app-6017 (optionally add --tail)

@embarnard embarnard marked this pull request as ready for review July 21, 2025 16:57
@embarnard embarnard changed the title Add TY-24 archiver service Add TY-24 state file intake archiver service Jul 21, 2025
# we batch these since archiving involves copying the submission pdf to a new location in s3
StateFile::Ty24ArchiverService.archive!(
  state_code: args[:state_code],
  batch_size: 50
)
Contributor Author

Last year the batch number was 10; it seemed a little stingy to me, so I increased it, but I'm happy to go down again.


et = Time.find_zone('America/New_York')
start_date = et.parse('2025-01-15 00:00:00') # state_file_start_of_open_intake
end_date = et.parse('2025-10-25 23:59:59') # state_file_end_of_in_progress_intakes
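In plain Ruby (without ActiveSupport's Time.find_zone), the window check implied by these bounds can be sketched as follows; the hardcoded Eastern Time offsets (EST in January, EDT in October) are assumptions standing in for the zone lookup.

```ruby
require "time"

# Window bounds from the snippet above, with explicit ET offsets
# standing in for Time.find_zone('America/New_York').
start_date = Time.parse("2025-01-15 00:00:00 -0500") # state_file_start_of_open_intake
end_date   = Time.parse("2025-10-25 23:59:59 -0400") # state_file_end_of_in_progress_intakes

# True when a timestamp falls inside the TY24 filing window.
def in_ty24_window?(time, start_date, end_date)
  time.between?(start_date, end_date)
end
```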
Contributor

How many times will this be run? I'm wondering if these dates might need to be changed.

Contributor
@anisharamnani anisharamnani Jul 21, 2025

Ah, because of the query for the already-archived intakes, this doesn't have to change. 👍

# remove intakes that have already been archived
archived_emails = StateFileArchivedIntake.where(state_code: state_code, tax_year: tax_year)
                                         .pluck(:email_address)
archived_phones = StateFileArchivedIntake.where(state_code: state_code, tax_year: tax_year)
                                         .pluck(:phone_number)
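A toy sketch of how these archived-contact lists are presumably used (the names below are illustrative, not the service's actual internals): candidates whose sign-in email or phone already appears in the archive get dropped, which is what makes re-running the task safe without changing the date window.

```ruby
# Illustrative stand-in for a state file intake record.
CandidateIntake = Struct.new(:email_address, :phone_number)

# Drop candidates whose contact method was already archived for this
# state and tax year, so repeated runs don't create duplicates.
def reject_already_archived(candidates, archived_emails, archived_phones)
  candidates.reject do |intake|
    archived_emails.include?(intake.email_address) ||
      archived_phones.include?(intake.phone_number)
  end
end
```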
Contributor

Why query by the email and phone as opposed to the hashed ssn?

Contributor

I guess they are both unique, so it doesn't really matter, but I was just curious.

Contributor Author

Since this is the unique identifier they use to sign in, they either pick email or phone number. Last year we only did this check on emails, since we didn't have sms working in the season before that. But I was thinking about it more, and I don't know why we didn't just add a key reference to the state file intake, like a data_source_id, instead. Wdyt?

Contributor

🤔 Ah, I think what you have is OK. Considering that this will eventually exist in another db, the data_source_id would go away.

)
archived.save!

if intake.submission_pdf.attached?
Contributor

Why would we archive intakes where there is no submission pdf?

🤔 It feels like a weird state to be in, considering that the query checks whether the submission has been accepted.

Could we just query to guarantee that the submission PDF is present?

Contributor Author

That's a good question. I assumed there might be weird cases where they have an accepted return but the submission pdf itself got deleted for some reason. In any case, this is just following the structure from last year. @mpidcock Martha, as someone who worked on the service last year, do you know why we would want to archive intakes where there are no submission pdfs?

Member

Here's my best recollection about this: there really shouldn't ever be a case where there's no submission pdf, but my assumption was that if we did fail to retrieve it, then that was a system error, and we would need to investigate on a case-by-case basis about what happened. I didn't want to simply skip over any submissions with no pdf, because we still know they should have a pdf, and I wanted to be able to flag anyone in that state. We didn't have an issue though, they all had pdfs.

Contributor Author

Okay, so the plan would have been to note all the ones that didn't have pdfs via the Rails logger warning, and then investigate why there wasn't a submission pdf.

Contributor Author

I guess another thing I could do is not create an archived intake at all when there's no submission pdf, and just print the logger warning.

Member

Yeah, I think that's ok. We were in a rush to delete the old records before, so I was prioritizing archiving anything we might need so that the og records could be safely cleared. We don't have that concern this year, though, so skipping + logging should be fine!
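The skip-and-log behavior agreed on in this thread could look roughly like this toy version (plain Ruby with illustrative names; the real service works on ActiveRecord intakes with ActiveStorage attachments and copies the pdf to S3 before saving).

```ruby
require "logger"

# Illustrative stand-in for an intake; pdf_attached mimics
# ActiveStorage's submission_pdf.attached? check.
PdfIntake = Struct.new(:id, :pdf_attached) do
  def submission_pdf_attached?
    pdf_attached
  end
end

# Archive only intakes that still have a submission pdf; warn and
# skip the rest so they can be investigated case by case.
def archive_with_skip_logging(intakes, logger: Logger.new($stdout))
  archived = []
  intakes.each do |intake|
    if intake.submission_pdf_attached?
      archived << intake # real service: copy the pdf to S3, then save the archived record
    else
      logger.warn("intake #{intake.id} has no submission pdf; skipping archive")
    end
  end
  archived
end
```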

Contributor
@anisharamnani anisharamnani left a comment

great work em! thank you ❤️

@embarnard embarnard merged commit 5dabe43 into main Jul 25, 2025
8 checks passed
@embarnard embarnard deleted the FYST-2143-update-ty-23-archiver-service-for-2024 branch July 25, 2025 15:40