Releases: geopandas/pyogrio
Version 0.11.1
Version 0.11.0
Improvements
- Capture all errors logged by GDAL when opening a file fails (#495).
- Add support to read and write ".gpkg.zip" (GDAL >= 3.7), ".shp.zip", and
  ".shz" files (#527).
- Compatibility with the string dtype in the upcoming pandas 3.0 release (#493).
Bug fixes
- Fix WKB writing on big-endian systems (#497).
- Fix writing fids to e.g. GPKG file with `use_arrow` (#511).
- Fix error in `write_dataframe` when writing an empty or all-None object
  column with `use_arrow` (#512).
Packaging
- The GDAL library included in the wheels is upgraded from 3.9.2 to 3.10.3 (#499).
Version 0.10.0
Improvements
- Add support to read, write, list, and remove `/vsimem/` files (#457).
Bug fixes
- Silence warning from `write_dataframe` with `GeoSeries.notna()` (#435).
- Enable mask & bbox filter when geometry column not read (#431).
- Raise NotImplementedError when user attempts to write to an open file handle (#442).
- Prevent seek on read from compressed inputs (#443).
Packaging
- For the conda-forge package, change the dependency from `libgdal` to
  `libgdal-core`. This package is significantly smaller as it doesn't contain
  some large GDAL plugins. Extra plugins can be installed as separate conda
  packages if needed: more info here. This also leads to `pyproj` becoming an
  optional dependency; you will need to install `pyproj` in order to support
  spatial reference systems (#452).
- The GDAL library included in the wheels is updated from 3.8.5 to GDAL 3.9.2 (#466).
- pyogrio now requires a minimum version of Python >= 3.9 (#473).
- Wheels are now available for Python 3.13.
Version 0.9.0
Version 0.8.0
Improvements
- Support for writing based on Arrow as the transfer mechanism of the data
  from Python to GDAL (requires GDAL >= 3.8). This is provided through the
  new `pyogrio.raw.write_arrow` function, or by using the `use_arrow=True`
  option in `pyogrio.write_dataframe` (#314, #346).
- Add support for `fids` filter to `read_arrow` and `open_arrow`, and to
  `read_dataframe` with `use_arrow=True` (#304).
- Add some missing properties to `read_info`, including layer name, geometry
  name and FID column name (#365).
- `read_arrow` and `open_arrow` now provide GeoArrow-compliant extension
  metadata, including the CRS, when using GDAL 3.8 or higher (#366).
- The `open_arrow` function can now be used without a `pyarrow` dependency. By
  default, it will now return a stream object implementing the
  Arrow PyCapsule Protocol (i.e. having an `__arrow_c_stream__` method). This
  object can then be consumed by your Arrow implementation of choice that
  supports this protocol. To keep the previous behaviour of returning a
  `pyarrow.RecordBatchReader`, specify `use_pyarrow=True` (#349).
- Warn when reading from a multilayer file without specifying a layer (#362).
- Allow writing to a new in-memory datasource using an `io.BytesIO` object (#397).
Bug fixes
- Fix error in `write_dataframe` if input has a date column and
  non-consecutive index values (#325).
- Fix encoding issues on Windows for some formats (e.g. ".csv") and always
  write ESRI Shapefiles using UTF-8 by default on all platforms (#361).
- Raise exception in `read_arrow` or `read_dataframe(..., use_arrow=True)` if
  a boolean column is detected due to an error in GDAL reading boolean values
  for FlatGeobuf / GPKG drivers (#335, #387); this has been fixed in
  GDAL >= 3.8.3.
- Properly ignore fields not listed in the `columns` parameter when reading
  from the data source not using the Arrow API (#391).
- Properly handle decoding of ESRI Shapefiles with a user-provided `encoding`
  option for `read`, `read_dataframe`, and `open_arrow`, and correctly encode
  Shapefile field names and text values to the user-provided `encoding` for
  `write` and `write_dataframe` (#384).
- Fixed bug preventing reading from bytes or file-like in `read_arrow` /
  `open_arrow` (#407).
Packaging
- The GDAL library included in the wheels is updated from 3.7.2 to GDAL 3.8.5.
Potentially breaking changes
- Using a `where` expression combined with a list of `columns` that does not
  include the column referenced in the expression is not recommended and will
  now return results based on driver-dependent behavior, which may include
  either returning empty results (even if non-empty results are expected from
  the `where` parameter) or raising an exception (#391). Previous versions of
  pyogrio incorrectly set ignored fields against the data source, allowing it
  to return non-empty results in these cases.
Version 0.7.2
Version 0.7.1
Bug fixes
- Fix unspecified dependency on `packaging` (#318).
Version 0.7.0
Improvements
- Support reading and writing datetimes with timezones (#253).
- Support writing dataframes without geometry column (#267).
- Calculate feature count by iterating over features if GDAL returns an
  unknown count for a data layer (e.g., OSM driver); this may have significant
  performance impacts for some data sources that would otherwise return an
  unknown count (count is used in `read_info`, `read`, `read_dataframe`) (#271).
- Add `arrow_to_pandas_kwargs` parameter to `read_dataframe` + reduce memory
  usage with `use_arrow=True` (#273)
- In `read_info`, the result now also contains the `total_bounds` of the layer
  as well as some extra `capabilities` of the data source driver (#281).
- Raise error if `read` or `read_dataframe` is called with parameters to read
  no columns, geometry, or fids (#280).
- Automatically detect supported driver by extension for all available
  write drivers and addition of `detect_write_driver` (#270).
- Addition of `mask` parameter to `open_arrow`, `read`, `read_dataframe`,
  and `read_bounds` functions to select only the features in the dataset that
  intersect the mask geometry (#285). Note: GDAL < 3.8.0 returns features that
  intersect the bounding box of the mask when using the Arrow interface for
  some drivers; this has been fixed in GDAL 3.8.0.
- Removed warning when no features are read from the data source (#299).
- Add support for `force_2d=True` with `use_arrow=True` in `read_dataframe` (#300).
Other changes
- test suite requires Shapely >= 2.0
- using `skip_features` greater than the number of features available in a
  data layer now returns empty arrays for `read` and an empty DataFrame for
  `read_dataframe` instead of raising a `ValueError` (#282).
- enabled `skip_features` and `max_features` for `read_arrow` and
  `read_dataframe(path, use_arrow=True)`. Note that this incurs overhead
  because all features up to the next batch size above `max_features` (or size
  of data layer) will be read prior to slicing out the requested range of
  features (#282).
- The `use_arrow=True` option can be enabled globally for testing using the
  `PYOGRIO_USE_ARROW=1` environment variable (#296).
Bug fixes
- Fix int32 overflow when reading int64 columns (#260)
- Fix `fid_as_index=True` doesn't set fid as index using `read_dataframe` with
  `use_arrow=True` (#265)
- Fix errors reading OSM data due to invalid feature count and incorrect
  reading of OSM layers beyond the first layer (#271)
- Always raise an exception if there is an error when writing a data source
  (#284)
Potentially breaking changes
- In `read_info` (#281):
  - the `features` property in the result will now be -1 if calculating the
    feature count is an expensive operation for this driver. You can force it
    to be calculated using the `force_feature_count` parameter.
  - for boolean values in the `capabilities` property, the values will now be
    booleans instead of 1 or 0.
Packaging
- The GDAL library included in the wheels is updated from 3.6.4 to GDAL 3.7.2.
Version 0.6.0
Improvements
- Add automatic detection of 3D geometries in `write_dataframe` (#223, #229)
- Add "driver" property to `read_info` result (#224)
- Add support for dataset open options to `read`, `read_dataframe`, and
  `read_info` (#233)
- Add support for pandas' nullable data types in `write_dataframe`, or
  specifying a mask manually for missing values in `write` (#219)
- Standardized 3-dimensional geometry type labels from "2.5D " to
  " Z" for consistency with well-known text (WKT) formats (#234)
- Failure and warning error messages from GDAL are no longer printed to
  stderr: failures were already translated into Python exceptions
  and warning messages are now translated into Python warnings (#236, #242).
- Add access to low-level pyarrow `RecordBatchReader` via
  `pyogrio.raw.open_arrow`, which allows iterating over batches of Arrow
  tables (#205).
- Add support for writing dataset and layer metadata (where supported by
  driver) to `write` and `write_dataframe`, and add support for reading
  dataset and layer metadata in `read_info` (#237).
Packaging
- The GDAL library included in the wheels is updated from 3.6.2 to GDAL 3.6.4.
- Wheels are now available for Linux aarch64 / arm64.