
Evaluation script Bugs? #15

@voldemortX


Hi guys, thanks for the amazing dataset!
However, my colleagues and I have encountered several issues with your evaluation script, which prevented us from getting 100% accuracy when testing GT against GT:

  1. You set the distance to an invisible point (annotated invisible for GT, or out-of-range invisible for pred) to dist_th:

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L159
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L179
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L190

So the x & z error counting will be off: the error will be at least dist_th = 1.5 for invisible points. I'm guessing these distances should be ignored here (a sketch covering points 1 and 2 follows after this list).

  2. Because of 1, if a GT line is entirely invisible, any pred's distance to this GT line will be exactly dist_th = 1.5, so it won't pass the initial check here:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L203
    and will be accumulated as FP/FN error. Simply removing this check could have other consequences, like a division by zero later in:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L208
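
To illustrate points 1 and 2 together, here is a minimal sketch (not the repo's API; `dist`, `vis_gt`, and `vis_pred` are hypothetical names for the per-point distances and the GT/pred visibility masks at the fixed y_samples):

```python
import numpy as np

dist_th = 1.5  # matching threshold, mirrors eval_3D_lane.py

def lane_error(dist, vis_gt, vis_pred):
    """Accumulate errors over mutually visible points only, instead of
    clamping invisible points to dist_th (point 1), and report None for
    fully invisible pairs so callers can ignore them instead of
    dividing by zero or counting FP/FN (point 2)."""
    both = np.logical_and(vis_gt, vis_pred)
    n_visible = int(both.sum())
    if n_visible == 0:
        return None  # fully invisible pair: ignore rather than penalize
    return float(dist[both].sum()) / n_visible

# Example: two visible points with small errors, two invisible points.
dist = np.array([0.1, 0.2, dist_th, dist_th])
vis_gt = np.array([True, True, False, False])
vis_pred = np.array([True, True, True, False])
print(lane_error(dist, vis_gt, vis_pred))  # ~0.15, not inflated by dist_th
```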

Anyway, this problem should not appear in practice, because the script filters lines to have at least 2 visible points. However, the x-range filtering is inconsistent between:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L104
and

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L121
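
For what it's worth, routing both call sites through one shared helper would remove this inconsistency by construction. A minimal sketch, assuming each lane is an (N, 3) array of (x, y, z) points and x_min/x_max span the evaluation range (these names are my assumptions, not the repo's API):

```python
import numpy as np

def prune_lane_by_x_range(lane, x_min, x_max):
    """Keep only points inside [x_min, x_max]; apply identically to GT
    and pred so the two filtering sites can't disagree."""
    keep = np.logical_and(lane[:, 0] >= x_min, lane[:, 0] <= x_max)
    lane = lane[keep]
    return lane if lane.shape[0] >= 2 else None  # need >= 2 points to interpolate
```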

Also, there is no filtering after interpolation: if a line has 2 visible points before interpolation but not afterwards, it will also produce an entirely invisible line. For example, a line with y coordinates [23.5, 23.8] is valid, but since y_samples are only integers, it won't be valid after (ex)interpolation.
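
A minimal repro of that corner case (the integer y_samples grid here is an assumption on my side):

```python
import numpy as np

# A line that passes the ">= 2 visible points" check before resampling
# has zero visible points afterwards, because no integer y_sample
# falls inside its y range.
y_line = np.array([23.5, 23.8])        # original lane y coordinates
y_samples = np.arange(0, 100, 1.0)     # integer sampling positions
visible = (y_samples >= y_line.min()) & (y_samples <= y_line.max())
print(int(visible.sum()))  # 0 -> entirely invisible after resampling
```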


Btw, by testing GT against GT, I can only get around 87% F1 (I saved the GT after the coordinate transform and filtering). If you could clarify the ignore mechanism, I can make a pull request to fix this for you. There are two popular ignore mechanisms in metrics; I think the first one sounds better and aligns more with your original metric (only suggestions here; a sketch follows the list):

  1. Ignore and let the prediction predict anything (e.g., the 255 ignore index in segmentation datasets).
  2. Neither encourage nor discourage a pred, provided it matches an ignored GT (e.g., MOTChallenge ignores non-pedestrian classes if matched at IoU 0.5, and otherwise counts the pred as a FP).
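
A minimal sketch of option 2 at the lane level (`matches`, `gt_ignored`, and the counting helper are all hypothetical names, nothing from the repo): a pred matched to an ignored GT counts as neither TP nor FP, and ignored GT lines never count as FN.

```python
def count_stats(matches, gt_ignored, n_pred, n_gt):
    """matches: (pred_idx, gt_idx) pairs from the matching step;
    gt_ignored: per-GT flag marking ignored lines."""
    tp = sum(1 for p, g in matches if not gt_ignored[g])
    matched_preds = {p for p, g in matches}      # includes ignored matches
    fp = n_pred - len(matched_preds)             # only unmatched preds are FP
    fn = (n_gt - sum(gt_ignored)) - tp           # only non-ignored GT can be FN
    return tp, fp, fn

# Example: 2 preds, 2 GTs, second GT ignored, both matched -> (1, 0, 0).
print(count_stats([(0, 0), (1, 1)], [False, True], n_pred=2, n_gt=2))
```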

I think these issues could have been inherited from the synthetic benchmark, and they could non-trivially influence your already-evaluated results.

cc @ChonghaoSima @dyfcalid @hli2020
