Hi guys, thanks for the amazing dataset!
However, my colleagues and I have encountered several issues with your evaluation script, which prevent us from getting 100% accuracy when testing GT against GT:
- You set the distance to an invisible point (annotated invisible for GT, or out-of-range invisible for pred) to `dist_th`:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L159
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L179
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L190
As a result, the x & z error counting is off: the errors are at least `dist_th = 1.5` for invisible points. I'm guessing these distances should be ignored here instead.
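To illustrate what I mean, here is a minimal sketch of excluding invisible points from the error accumulation instead of clamping their distance to `dist_th`. The names (`dist`, `vis_gt`, `vis_pred`, `point_errors`) are hypothetical and not taken from `eval_3D_lane.py`:

```python
import numpy as np

def point_errors(dist, vis_gt, vis_pred):
    """Accumulate x/z errors over mutually visible points only.

    dist: per-point distances between matched GT and pred points
    vis_gt, vis_pred: boolean visibility masks (illustrative names)
    """
    both_visible = vis_gt & vis_pred
    if not both_visible.any():
        # Nothing to evaluate on this pair instead of a fake dist_th error
        return 0.0, 0
    # Invisible points are ignored rather than counted as dist_th = 1.5
    return dist[both_visible].sum(), int(both_visible.sum())

# Usage: the third point is invisible, so it no longer inflates the error
dist = np.array([0.0, 0.0, 2.0])
vis = np.array([True, True, False])
total, n = point_errors(dist, vis, vis)
# total == 0.0, n == 2
```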
- Because of the previous issue, if a GT line is entirely invisible, any pred's distance to this GT line will be exactly `dist_th = 1.5`, so it won't pass the initial check here:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L203
and will be accumulated as FP/FN error. Simply removing this check could have other consequences, like a division by zero later in:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L208
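One way to avoid the division by zero would be to skip entirely-invisible lines explicitly. A hedged sketch, with illustrative names (`safe_match_ratio` is not from the script):

```python
def safe_match_ratio(num_matched, num_visible):
    """Fraction of visible GT points matched, guarding the empty case.

    If a GT line has no visible points at all (the situation described
    above), return None so the caller can skip the line instead of
    dividing by zero. Illustrative helper, not from eval_3D_lane.py.
    """
    if num_visible == 0:
        return None  # skip entirely-invisible lines
    return num_matched / num_visible

assert safe_match_ratio(3, 4) == 0.75
assert safe_match_ratio(0, 0) is None
```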
Anyway, this problem should not appear in practice, because the script filters lines to have at least 2 visible points. However, the x-range filtering is inconsistent between:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L104
and
Also, there is no filtering after interpolation: if a line has 2 visible points before interpolation but fewer afterwards, it will also produce entirely invisible lines. For example, a line with y coordinates [23.5, 23.8] is valid, but since `y_samples` are only integers, it won't be valid after interpolation/extrapolation.
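The `[23.5, 23.8]` case can be checked numerically; a small sketch (assuming integer `y_samples` and the 2-visible-point threshold, with an illustrative helper name):

```python
import numpy as np

MIN_VISIBLE = 2  # same threshold the script applies before interpolation

def visible_after_resampling(y_line, y_samples):
    """Count sample positions covered by the line's y-range.

    A line with y in [23.5, 23.8] covers no integer y_samples, so it
    becomes entirely invisible after interpolation even though it had
    2 visible points before. Illustrative helper, not from the script.
    """
    in_range = (y_samples >= y_line.min()) & (y_samples <= y_line.max())
    return int(in_range.sum())

y_samples = np.arange(0, 100)  # integer sampling positions
n = visible_after_resampling(np.array([23.5, 23.8]), y_samples)
assert n < MIN_VISIBLE  # the line should be filtered out again here
```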
Btw, by testing GT against GT, I can only get around 87% F1. I saved the GT after the coordinate transform and filtering. If you could clarify the intended ignore mechanism, I can make a pull request to fix this for you. There are two popular ignore mechanisms in metrics; I think the first one sounds better and aligns more with your original metric (only suggestions here):
- ignore the region and let the prediction predict anything there (e.g., the 255 ignore index in segmentation datasets);
- neither encourage nor discourage a pred, provided it matches an ignored GT (e.g., in MOTChallenge, non-pedestrian classes are ignored if matched with IoU 0.5; otherwise the pred counts as a FP).
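The second mechanism can be sketched as follows, assuming a hypothetical matching output of `(pred_id, gt_id)` pairs (none of these names come from your script):

```python
def score_predictions(matches, ignored_gt_ids):
    """MOTChallenge-style ignore handling (the second option above).

    matches: list of (pred_id, gt_id) pairs, where gt_id is None for
    an unmatched prediction. ignored_gt_ids: set of GT lines marked
    ignore. A pred matched to an ignored GT counts as neither TP nor
    FP. All names are illustrative.
    """
    tp = fp = 0
    for pred_id, gt_id in matches:
        if gt_id is None:
            fp += 1     # unmatched pred is a false positive
        elif gt_id in ignored_gt_ids:
            continue    # matched an ignored GT: drop from both counts
        else:
            tp += 1
    return tp, fp

# pred 0 matches a real GT, pred 1 matches an ignored GT, pred 2 is unmatched
tp, fp = score_predictions([(0, 10), (1, 99), (2, None)], {99})
# (tp, fp) == (1, 1)
```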
I suspect these issues were inherited from the synthetic benchmark, and they could non-trivially affect your already-reported results.