
Evaluation script Bugs? #15

@voldemortX


Hi guys, thanks for the amazing dataset!
However, my colleagues and I have encountered several issues with your evaluation script, which prevented us from getting 100% accuracy when testing GT against GT:

  1. You set the distance to an invisible point (annotated invisible for GT, or out-of-range invisible for pred) to dist_th:

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L159
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L179
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L190

So the x & z error counting will be off: the error will be at least dist_th = 1.5 for invisible points. I'm guessing these distances should be ignored here (a sketch covering points 1 and 2 follows after this list).

  2. Because of 1, if a GT line is entirely invisible, any pred's distance to this GT line will be exactly dist_th = 1.5, so it won't pass the initial check here:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L203
    and will be accumulated as FP/FN error. Simply removing this check could have other consequences, like a division by zero later in:
    https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L208
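
To illustrate points 1 and 2 together, here is a minimal sketch (not the repo's API; `dist`, `vis_gt`, and `vis_pred` are hypothetical names for the per-point distances and the GT/pred visibility masks at the fixed y_samples):

```python
import numpy as np

dist_th = 1.5  # matching threshold, mirrors eval_3D_lane.py

def lane_error(dist, vis_gt, vis_pred):
    """Accumulate errors over mutually visible points only, instead of
    clamping invisible points to dist_th (point 1), and report None for
    fully invisible pairs so callers can ignore them instead of
    dividing by zero or counting FP/FN (point 2)."""
    both = np.logical_and(vis_gt, vis_pred)
    n_visible = int(both.sum())
    if n_visible == 0:
        return None  # fully invisible pair: ignore rather than penalize
    return float(dist[both].sum()) / n_visible

# Example: two visible points with small errors, two invisible points.
dist = np.array([0.1, 0.2, dist_th, dist_th])
vis_gt = np.array([True, True, False, False])
vis_pred = np.array([True, True, True, False])
print(lane_error(dist, vis_gt, vis_pred))  # ~0.15, not inflated by dist_th
```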

Anyway, this problem should not appear in practice, because the script filters lines to have at least 2 visible points. However, the x-range filtering is inconsistent between:
https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L104
and

https://github.com/OpenPerceptionX/OpenLane/blob/f74ecca299e032e100c0ca200a3299c1745de084/eval/LANE_evaluation/lane3d/eval_3D_lane.py#L121
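
For what it's worth, routing both call sites through one shared helper would remove this inconsistency by construction. A minimal sketch, assuming each lane is an (N, 3) array of (x, y, z) points and x_min/x_max span the evaluation range (these names are my assumptions, not the repo's API):

```python
import numpy as np

def prune_lane_by_x_range(lane, x_min, x_max):
    """Keep only points inside [x_min, x_max]; apply identically to GT
    and pred so the two filtering sites can't disagree."""
    keep = np.logical_and(lane[:, 0] >= x_min, lane[:, 0] <= x_max)
    lane = lane[keep]
    return lane if lane.shape[0] >= 2 else None  # need >= 2 points to interpolate
```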

Also, there is no filtering after interpolation: if a line has 2 visible points before interpolation but not afterwards, it will also produce an entirely invisible line. For example, a line with y coordinates [23.5, 23.8] is valid, but since y_samples are only integers, it won't be valid after (ex)interpolation.
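
A minimal repro of that corner case (the integer y_samples grid here is an assumption on my side):

```python
import numpy as np

# A line that passes the ">= 2 visible points" check before resampling
# has zero visible points afterwards, because no integer y_sample
# falls inside its y range.
y_line = np.array([23.5, 23.8])        # original lane y coordinates
y_samples = np.arange(0, 100, 1.0)     # integer sampling positions
visible = (y_samples >= y_line.min()) & (y_samples <= y_line.max())
print(int(visible.sum()))  # 0 -> entirely invisible after resampling
```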


Btw, by testing GT against GT, I can only get around 87% F1 (I saved the GT after the coordinate transform and filtering). If you could clarify the ignore mechanism, I can make a pull request to fix this for you. There are two popular ignore mechanisms in metrics; I think the first one sounds better and aligns more with your original metric (only suggestions here; a sketch follows the list):

  1. Ignore and let the prediction predict anything (e.g., the 255 ignore index in segmentation datasets).
  2. Neither encourage nor discourage a pred, provided it matches an ignored GT (e.g., MOTChallenge ignores non-pedestrian classes if matched at IoU 0.5, and otherwise counts the pred as a FP).
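
A minimal sketch of option 2 at the lane level (`matches`, `gt_ignored`, and the counting helper are all hypothetical names, nothing from the repo): a pred matched to an ignored GT counts as neither TP nor FP, and ignored GT lines never count as FN.

```python
def count_stats(matches, gt_ignored, n_pred, n_gt):
    """matches: (pred_idx, gt_idx) pairs from the matching step;
    gt_ignored: per-GT flag marking ignored lines."""
    tp = sum(1 for p, g in matches if not gt_ignored[g])
    matched_preds = {p for p, g in matches}      # includes ignored matches
    fp = n_pred - len(matched_preds)             # only unmatched preds are FP
    fn = (n_gt - sum(gt_ignored)) - tp           # only non-ignored GT can be FN
    return tp, fp, fn

# Example: 2 preds, 2 GTs, second GT ignored, both matched -> (1, 0, 0).
print(count_stats([(0, 0), (1, 1)], [False, True], n_pred=2, n_gt=2))
```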

I think these issues could have been inherited from the synthetic benchmark, and they could non-trivially influence your already-evaluated results.

cc @ChonghaoSima @dyfcalid @hli2020
