Correctly write mixed geometry types #82
Even though I admit the current behaviour is also a bit weird, for me personally this feels more like a step back than an improvement: without any warning or error, you now lose the metadata (e.g. in a GeoPackage) stating that a layer contains, say, polygons. So I'd rather go for the 2 lines of code to give the user the choice:
|
For the default behaviour of GDAL, I suppose it will depend on the driver used to read the data or the way it is read. Most of my experience is with running SQL statements on GeoPackages, and for that case GDAL will pick the geometry type of the first row fetched as the geometry type of the destination file, probably because it cannot know all the geometry types that will be fetched later.
I also wondered how e.g. QGIS deals with a "geometry" column so I had a (very quick) look:
So, all in all, QGIS at least tries to do the best job possible to handle the file, and even though there is a delay, at least you can use it.
Note that this PR is meant as the baseline default behaviour (so not as a replacement for #75). We can still improve upon that for specific cases, such as detecting the case of single/multi geometries of the same type and automatically doing something for that (cf. #75), or raising a warning if we use "Unknown" and letting users overwrite it with an argument like geometry_type=
But to reiterate, the current behaviour on the main branch prevents writing data to some formats and is seemingly random in some cases (depending on the order of the rows). For example:
For GeoJSON and Shapefile, nothing changes with this PR. GeoJSON simply doesn't care about geometry types, and Shapefile only supports a single type anyway (and for the case of single/multi of one type, passing "Unknown" is still fine, as GDAL seems to infer the type in that case and writes a correct file). I would argue that the change in this PR is therefore generally an improvement (compared to I certainly agree that in the case of Polygon/MultiPolygon for GeoPackage, where it generally seems to be handled fine by readers like QGIS if the geometry type is not strictly correct (and even better handled compared to it set to "Geometry", according to your experiment). But again, we can iterate on this to find improvements for those specific cases on top of the general behaviour.
Do you have an example of that? I tried |
Thanks for pulling this out as an isolated change @jorisvandenbossche
Bringing this in line with what Fiona / GeoPandas already does is a good argument for handling "Unknown" this way, since we want pyogrio to be a nearly seamless drop-in for Fiona in GeoPandas.
I see #75 building on top of this for mixed plurality but not type. I'd like to see both in before we cut a release.
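To make the "mixed plurality but not type" case concrete, here is a minimal sketch of how it could be detected from per-feature geometry type names. The function name `combined_geometry_type` is hypothetical, and promoting to the Multi variant is only one possible choice for illustration; it is not necessarily what #75 does.

```python
def combined_geometry_type(geometry_types):
    """Pick a layer geometry type from per-feature geometry type names.

    Hypothetical helper for illustration only; not part of pyogrio.
    """
    unique = set(geometry_types)
    if len(unique) == 1:
        # Only one type present: use it as-is.
        return next(iter(unique))

    # Strip a leading "Multi" so "Polygon" and "MultiPolygon" compare equal.
    bases = {
        name[len("Multi"):] if name.startswith("Multi") else name
        for name in unique
    }
    if len(bases) == 1:
        base = next(iter(bases))
        if base in {"Point", "LineString", "Polygon"}:
            # Assumption: single/multi of the same base type is promoted to Multi.
            return "Multi" + base

    # Truly mixed types (or no features at all): no constraint on the type.
    return "Unknown"
```

With this sketch, a Polygon/MultiPolygon mix maps to "MultiPolygon", while a Point/Polygon mix still falls back to "Unknown".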
raising a warning if we use "Unknown" and let users overwrite it with an argument like geometry_type=
I like this option: emit a warning for the default behavior, but let the user explicitly sidestep that by giving us specific instructions.
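A rough sketch of what that option could look like, assuming a `geometry_type` keyword as suggested above. The function name `resolve_geometry_type` and the warning text are hypothetical and only illustrate the idea; this is not pyogrio's actual API.

```python
import warnings


def resolve_geometry_type(geometry_types, geometry_type=None):
    """Decide which geometry type to declare for the output layer.

    Hypothetical helper for illustration only; not part of pyogrio.
    """
    if geometry_type is not None:
        # The user gave explicit instructions: no inference, no warning.
        return geometry_type

    unique = set(geometry_types)
    if len(unique) == 1:
        return next(iter(unique))

    # Default behaviour: fall back to "Unknown" (any type), but say so.
    warnings.warn(
        "Mixed geometry types found; writing layer with 'Unknown' geometry "
        "type. Pass geometry_type=... to override.",
        UserWarning,
        stacklevel=2,
    )
    return "Unknown"
```

Calling `resolve_geometry_type(["Polygon", "MultiPolygon"], geometry_type="MultiPolygon")` returns the explicit value without a warning, while omitting the keyword returns "Unknown" and emits one.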
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
I would prefer to defer that to another issue/PR, focusing here on the minimal change to align us with Fiona/GeoPandas (and to follow the GeoPackage spec).
Related to the discussion in #75
This is a very simple change (it does not yet try to detect whether mixed geometries are truly mixed or only single/multi of the same type): if there is more than one geometry type, we create the GDAL dataset with the "Unknown" geometry type. "Unknown" here is the slightly confusing name for "any" geometry type: "Use wkbUnknown if there are no constraints on the types geometry to be written." (from https://gdal.org/api/raster_c_api.html#gdal_8h_1a7f01d3d8584f29e4ef3c56b5af49d816)
This is consistent with what Fiona (and therefore geopandas) currently does.
And I would argue that this is also what GDAL does to some extent (although it is of course not an exactly equivalent situation). If you start from a source like GeoJSON, which does not encode geometry type information and can store any mix of types (thus, very similar to a GeoDataFrame), GDAL (or ogrinfo) will infer it as "Unknown (any)", and for example when converting this GeoJSON file to a GPKG using ogr2ogr it will keep this "Unknown" geometry type.
Without this change, the tests that I added would fail (except for GeoJSON): for GPKG it "works" but sets a "Point" geometry type in the file, which is thus a "wrong" file. For FGB it actually errors.
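As a minimal sketch of the rule described in this PR (not the actual pyogrio code): given the geometry type name of each feature, use that type if there is exactly one, and otherwise fall back to "Unknown". The function name `infer_layer_geometry_type` is hypothetical, and treating an empty input as "Unknown" is an assumption made here for illustration.

```python
def infer_layer_geometry_type(geometry_types):
    """Return the geometry type to declare when creating the layer.

    Hypothetical helper for illustration only; not part of pyogrio.
    """
    unique = set(geometry_types)
    if len(unique) == 1:
        # All features share one type: declare exactly that type.
        return next(iter(unique))
    # More than one type: "Unknown" means "any" geometry type.
    # Assumption: an empty input is also treated as "Unknown".
    return "Unknown"


print(infer_layer_geometry_type(["Point", "Point"]))    # Point
print(infer_layer_geometry_type(["Point", "Polygon"]))  # Unknown
```

Note that the result no longer depends on the order of the rows, unlike picking the type of the first feature.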