Arrow interface (RFC 86) returns features based on bounding box overlap instead of intersection when using spatial filter #8347

@brendan-ward

Expected behavior and actual behavior.

Given a geometry spatial filter or a bounding box spatial filter, the Arrow interface returns more features than actually intersect the filter for some drivers (GPKG, FlatGeobuf), whereas the regular interface returns only the expected features. It appears that the Arrow interface tests the filter against the bounding boxes of the geometries in the data source instead of their actual geometries.

Other drivers tested (Shapefile, GeoJSON) produce expected results when using both the regular and Arrow interfaces.

For example, when querying NaturalEarth countries (WGS84 coordinates), a point located in the middle of Canada returns only a single record for Canada when using the regular interface, whereas it returns Canada, Russia, and the USA when using the Arrow interface (bounding boxes for Russia and the USA wrap around the anti-meridian).
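To illustrate the suspected difference, here is a minimal pure-Python sketch. It is not GDAL code: the L-shaped polygon, the test point, and the helper names (`bbox`, `point_in_bbox`, `point_in_ring`) are all made up for illustration. It only shows how a bounding-box test can accept a point that an exact geometry test rejects, which is the behavior the Arrow interface appears to exhibit.

```python
# Illustrative only: a bounding-box test can accept a point that the
# exact geometry test rejects. Nothing here is GDAL API.

def bbox(ring):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_bbox(pt, box):
    """True if the point lies inside or on the edge of the box."""
    x, y = pt
    minx, miny, maxx, maxy = box
    return minx <= x <= maxx and miny <= y <= maxy


def point_in_ring(pt, ring):
    """Ray-casting point-in-polygon test for a simple ring without holes."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped polygon: its bounding box is the full 4x4 square,
# but the upper-right part of that square is not covered by the polygon.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
pt = (3, 3)

bbox_hit = point_in_bbox(pt, bbox(l_shape))   # what a bbox-only filter sees
exact_hit = point_in_ring(pt, l_shape)        # what a true intersection test sees

print(f"bbox test: {bbox_hit}, exact test: {exact_hit}")
```

In the NaturalEarth case, the point in Canada would play the role of `pt`, and Russia and the USA the role of `l_shape`: their bounding boxes contain the point, but their geometries do not.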

First observed in pyogrio #285

Steps to reproduce the problem.

Using a test GPKG file created from NaturalEarth countries (1:110m):

from osgeo import ogr

path = "/tmp/test.gpkg"
driver = ogr.GetDriverByName("GPKG")

dataSource = driver.Open(path, 0)
layer = dataSource.GetLayer()

# point located in Canada
layer.SetSpatialFilter(ogr.CreateGeometryFromWkt("Point (-105 55)"))
# or 
# layer.SetSpatialFilterRect(-105, 54, -104, 55)

iso_a3 = []
for feature in layer:
    iso_a3.append(feature.GetField("iso_a3"))

print(f"Using regular interface: {iso_a3}")


stream = layer.GetArrowStreamAsPyArrow()

iso_a3_arrow = []
for batch in stream:
    iso_a3_arrow.extend(batch.field("iso_a3").tolist())

print(f"Using arrow interface: {iso_a3_arrow}")

Outputs:

Using regular interface: ['CAN']
Using arrow interface: ['RUS', 'USA', 'CAN']

Operating system

MacOS 12.6.5 (M1)

GDAL version and provenance

Reproduced using both:

  • 3.7.1 installed via Homebrew
  • gdal python package (3.7.1.1) installed via pip
