-
Notifications
You must be signed in to change notification settings - Fork 293
feat(iroh)!: Emit mDNS expiry events #3409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks for making this PR! The code looks really solid. Mind adding a description of how mDNS expriry works? E.g. is it configurable, how is it triggered etc? I only know the basics of mDNS and some pointers towards those details would help me understand how this fits into the iroh discovery mechanism. |
any specific reason this is still a draft? |
I've still got to finish implementing a proper unit test. For some reason @flub will write up more tomorrow morning! |
It's all handled by the |
To build a mental picture of how these expiry events fit into iroh discovery I would like to have some mental picture of how the expiry events are emitted by swarm-discovery here. Superficially it seems like it is just a timeout since they last announced themselves? But I'm not sure if there's any more to it. Searching the crate docs you linked for "expiry" doesn't give me anything clear. I'd like to understand how this works because this is a breaking change to the discovery trait, and we need to be sure this is generic enough and will be future proof. For example, if my "this is a simple timeout" reasoning above is correct, does it really make sense to let every discovery mechanism decide it's own timeout? Maybe! Does it make more sense to instead have an (maybe optional) "announced at" time? Maybe! Is this all a special case of the swarm-discovery and does it make more sense to add the expiry stuff to that itself without involving the discovery trait? Maybe! This is why I'd like to get a better picture of the entire system we're involving here. |
If we think about this theoretically I think an expiry event should be emitted under two conditions:
The first condition is just a timer that aligns with the TTL of the DNS record emitted by the mDNS system. DNS records are normally published with a TTL which is the amount of time they are valid to be cached for. This is a reasonable value to use for the purpose of knowing if the node is still available as any resonable implementation would update the network before the TTL elapses. An argument could be made that condition 2 of a node going offline is the equilevent of it's record expiring (condition 1). The issue with this is if the node information is user-facing (as it is in my usecase) this is a horrible user expirence. When someone quits the application on a remote computer, as long as it gracefully shutdown, you would expect it to instantly disappear from the UI on all other machines on the network. If we jump back into Iroh. The design i've implemented here with the discovery system emitting If we relied on userspace timers, I don't think we would be able to have the instant feedback when a node went offline as the timer isn't deeply tied into the information the discovery system has. I personally think the individual discovery systems should implement the expiry themselves which means they can use whatever logic they want internally to determine if the node is still available. Although you could argue this is less flexible from a user perspective, I think there is generally only a single correct value for if a node is available or not. For example in the case of mDNS DNS record, when the TTL expires the client should be treated as expired. If the default TTL was a problem I think this should be adjusted as the point of emitting the DNS record, not on a node which is receiving it. I have only been using the mDNS discovery system so i'm not certain how this fits into the other systems but I assume all systems would have some form of TTL or expiry signal as you always need way of knowing if the discovery information is still going to be valid. So tldr: this isn't just a simple timeout, it also takes into account a node going offline near-instantly which is crucial to the UX of the application i'm building. Let me know if that makes a bit more sense, and anything if you would like me to clarify furthur! |
@oscartbeaumont Thanks for the overview! This helped me a lot. Do you know if mDNS (and the swarm-discovery crate) do announce when they're going offline? |
(Also, apologies for the delay! I've been on holiday and am catching up on things now) |
@@ -326,7 +326,7 @@ pub trait Discovery: std::fmt::Debug + Send + Sync + 'static { | |||
fn resolve( | |||
&self, | |||
_node_id: NodeId, | |||
) -> Option<BoxStream<Result<DiscoveryItem, DiscoveryError>>> { | |||
) -> Option<BoxStream<Result<DiscoveryEvent, DiscoveryError>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think resolve is specifically to find a DiscoveryItem
or not, would this conceivably ever need to return DiscoveryEvent::Expired
(or whatever colour that bikesheds ends up having)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was resolve
is scoped to a single node where subscribe
is for getting the events for every node.
If that is correct then personally I think DiscoveryEvent
here makes sense, if not we could concider changing it back but personally I think that feels a little inconsistent.
I suppose the method's name being resolve
doesn't fit the best with the new return type.
I'm happy to do whatever you think here, but personally I think we leave it or concider renaming resolve
to something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi hi! resolve
should still return a DiscoveryItem
. subscribe
makes sense to return a stream of events, but resolve
should never need to return an expired event. If a NodeId
can no longer be resolved, it should return None
.
I don't fully understand what would be inconsistent. If you mean, it seems cleaner to have the same return type in both resolve
and subscribe
methods, I think that it is not only fine to have different return types, but could be better in some ways. It would more clearly show, just by the function signature, that the methods have different purposes.
But let me know if you were referring to something else!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolve
is meant to be a request for a single node. The reason it returns a stream is because a discovery service might be using multiple queries behind the scenes and different results may come in at different times. iroh wants to start using the first result as soon as possible, but if later results are better still have them available. But this stream is not used for a "long" time.
Hence it makes more sense to leave this as a DiscoveryItem
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should emit Expired
for nodes that remove_node_info
is called on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With current understanding this was a bad call. The static provider does not, and should not, implement Discovery::subscribe
. Please revert this.
I have observed this in testing but I actually can't figure out what is causing it within |
Description
I'm working on a project where we want to have a list of peers that can be connected to without actually establishing a connection to each of them. The user can then later select a peer and we will establish a connection to the ones that are required.
For this to work we need to be able to tell our frontend about the available peers on the network. Right now Iroh emits events when peers are discovered but it doesn't provide a mechanism to detect when those peers are no longer online so they remain in the frontend UI until the application is restarted.
This PR implements a system for Iroh's discovery system to emit events when a peer is no-longer aviable. This is implemented into the core discovery system, however the only discovery mechanism that currently emits these are mDNS.
This PR solves #3040.
Breaking Changes
All methods that previously retruned
DiscoveryItem
now returnDiscoveryEvent
. This includes theDiscovery
trait,Endpoint::discovery_stream
, etc.Change checklist