2021 11 08 webex j ftwg
# 11/08/21 webex notes for joint FT/Sessions WGs meeting
Attending: Howard Pritchard, Thomas Hines, Trupeshkumar Patel, Martin Schreiber, Martin Schulz, Isaias Urenak, Grace Nansamba
- Discuss Reinit proposal
- Other
Wes gives a high-level overview of the Reinit proposal, since no one from the WG that promotes it was present. It is a good fit for BSP-style applications that use CPR-style (checkpoint/restart) approaches. The proposal distinguishes asynchronous from synchronous error notification. If an error occurs, the code gets reinitialized. The app specifies a function to call when an error is detected by the MPI library or runtime. Synchronous error notification is related to a new function, MPI_Test_failure.
If a process goes away, the Reinit proposal will restart/spawn a new process. Wes is not sure about the options for how this appears to the application. The desire is to make it as close to CPR as possible.
A main target is legacy BSP-style applications. The desire is to require minimal changes to these applications so they can make use of the reinit functionality. Open question: how would one implement MPI_Reinit if it were implemented by an external library?
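The rollback control flow described above can be illustrated without MPI at all. The following is a minimal sketch, assuming a `setjmp`/`longjmp`-based rollback inside a single process. Every name in it (`run_with_reinit`, `simulate_failure`, `bsp_main`, `start_state`) is hypothetical and is not taken from the Reinit proposal. It does not model process respawn or the actual failure notification. It only shows how an application-supplied function could be re-entered from the top and resume from its last checkpoint.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* All names below are hypothetical illustrations, not the proposal's API. */
typedef enum { START_NEW, START_RESTARTED } start_state;
typedef int (*resilient_fn)(start_state state);

static jmp_buf restart_point;      /* where control returns after a failure */
static int restart_count = 0;      /* number of rollbacks performed */
static int checkpointed_step = 0;  /* stands in for a checkpoint on stable storage */
static int failure_injected = 0;   /* ensures the simulated failure fires once */

/* Stand-in for the runtime reporting a failure: abandon the current
 * execution and roll back to the restart point. */
static void simulate_failure(void) {
    longjmp(restart_point, 1);
}

/* Stand-in for a reinit-style entry point: runs fn, and re-enters it from
 * the top whenever simulate_failure() is called. */
static int run_with_reinit(resilient_fn fn) {
    assert(fn != NULL);
    if (setjmp(restart_point) != 0) {
        restart_count++;  /* we arrived here via longjmp */
    }
    return fn(restart_count == 0 ? START_NEW : START_RESTARTED);
}

/* A BSP-style loop of 5 supersteps with a checkpoint after each one. */
static int bsp_main(start_state state) {
    int step = (state == START_RESTARTED) ? checkpointed_step : 0;
    for (; step < 5; step++) {
        if (step == 3 && !failure_injected) {
            failure_injected = 1;
            simulate_failure();  /* does not return */
        }
        /* compute and communicate for this superstep would go here */
        checkpointed_step = step + 1;
    }
    return step;
}
```

Calling `run_with_reinit(bsp_main)` from `main` returns 5 after exactly one rollback: the first pass completes supersteps 0 to 2, the simulated failure fires at superstep 3, and the second pass resumes from the checkpoint. The only change to the legacy loop is reading its starting step from the checkpoint when restarted, which is the kind of minimal change the notes describe as the goal.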
Some historical discussion.
- Should we discuss MPI Stages at some point?
- Tools and Sessions
Links:
- Google doc we're using - https://docs.google.com/document/d/1l7LQ8eeVOUW69TDVG9LjKJUuerfE3S3teaMFG5DOudM/edit#heading=h.voobxhw94rt3
- Miro document - https://miro.com/app/board/o9J_l_Rxe9Q=/