`orjson` for much faster JSON operations? #4385

I'm trying to optimize performance for a high-throughput use case (proposed by @janosh) in which large numbers of `Structure` objects need to be saved to and loaded from JSON. I'm starting with the saving phase of the workflow.

Quick line profiling reveals two significant bottlenecks:
- The `as_dict` method of `Structure`: preliminary profiling suggests this is due to (unnecessarily) repeated calculation of `Lattice` properties such as `lengths`/`angles` during serialization. I believe proper caching could resolve this, provided the cache is invalidated whenever the internal `_matrix` changes. **This would be a separate PR**, so I won't cover it further here.
- **Topic of this thread: slow `json.dump` to file**

```
Total time: 65.3412 s
File: show_structure_json_bottleneck.py
Function: save_to_file at line 38

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    38                                           @profile
    39                                           def save_to_file(tmpdir):
    40      1001        921.1      0.9      0.0      for i, struct in enumerate(STRUCTURES):
    41      1000   32742869.8  32742.9     50.1          d = struct.as_dict()
    42      1000       9134.1      9.1      0.0          path = os.path.join(tmpdir, f"struct_{i}.json")
    43      2000      77023.5     38.5      0.1          with open(path, "w") as f:
    44      1000   32511267.6  32511.3     49.8              json.dump(d, f)
```
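
To make the caching idea from the first bullet concrete, here is a minimal sketch. `CachedLattice` is a hypothetical stand-in and not pymatgen's actual `Lattice`; it only assumes that derived properties depend solely on the matrix, so the cache can be cleared whenever the matrix is reassigned.

```python
import math


class CachedLattice:
    """Hypothetical stand-in for Lattice (not the pymatgen class)."""

    def __init__(self, matrix):
        self._cache = {}
        self.matrix = matrix  # goes through the setter below

    @property
    def matrix(self):
        return self._matrix

    @matrix.setter
    def matrix(self, value):
        # Store an immutable copy so the matrix cannot change behind the cache.
        self._matrix = tuple(tuple(float(x) for x in row) for row in value)
        # Invalidate every derived property when the matrix changes.
        self._cache.clear()

    @property
    def lengths(self):
        # Compute once, then serve from the cache until the matrix changes.
        if "lengths" not in self._cache:
            self._cache["lengths"] = tuple(
                math.sqrt(sum(x * x for x in row)) for row in self._matrix
            )
        return self._cache["lengths"]
```

A real implementation would also have to deal with in-place mutation of a NumPy `_matrix`, which this sketch sidesteps by storing tuples.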

---

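If we replace the builtin `json` with the faster [`orjson`](https://github.com/ijl/orjson), which is close to a drop-in replacement (note that `orjson.dumps` returns `bytes` rather than `str`; apart from the API differences discussed below, I'm not aware of any caveats so far), we get (tested on WSL2 Ubuntu 22.04):
- roughly 8x speedup converting a `Structure` dict to a string
- roughly 14x speedup saving a `Structure` dict to a plain file (about 2.4x for gzipped output)
- a modest speedup loading JSON back (roughly 15% less time)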

```
== Dump to String ==
    json: 0.9821 s
  orjson: 0.1173 s

== Save to File ==
  json_plain: 3.2918 s, 105549.10 KB
orjson_plain: 0.2284 s, 97192.09 KB
     json_gz: 8.0444 s, 32786.10 KB
   orjson_gz: 3.3784 s, 32217.06 KB

== Load from File ==
  json_plain: 3.1989 s
orjson_plain: 2.6992 s
     json_gz: 3.4490 s
   orjson_gz: 2.9699 s
```

Script: [compare_structure_json.txt](https://github.com/user-attachments/files/19998603/compare_structure_json.txt)

Since the speedup is substantial and requires almost no code change (and JSON is used heavily throughout the code base, so more than just `Structure` would benefit), perhaps we could consider making `orjson` the default JSON handler? (I'm afraid that adding it as an optional dependency would require too much code to detect whether `orjson` is available.)
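
For comparison, here is a rough sketch of what the optional-dependency route could look like if the availability check lived in a single helper module. The function names (`dumps_bytes`, `loads_any`, `save_json`, `load_json`) are hypothetical and not part of pymatgen or `monty`. Because `orjson.dumps` returns `bytes`, the files are opened in binary mode.

```python
import json

try:
    import orjson  # third-party, optional
except ImportError:
    orjson = None


def dumps_bytes(obj) -> bytes:
    """Serialize to UTF-8 bytes, using orjson when it is installed."""
    if orjson is not None:
        return orjson.dumps(obj)
    return json.dumps(obj).encode("utf-8")


def loads_any(data):
    """Parse JSON from bytes or str, using orjson when it is installed."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)


def save_json(obj, path) -> None:
    # Binary mode: both branches of dumps_bytes return bytes.
    with open(path, "wb") as f:
        f.write(dumps_bytes(obj))


def load_json(path):
    with open(path, "rb") as f:
        return loads_any(f.read())
```

The two backends do not produce byte-identical output: `orjson` writes compact separators and raw UTF-8, while `json.dumps` by default adds spaces after separators and escapes non-ASCII characters. Anything that compares serialized strings directly would therefore need checking.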

However, I'm a bit unsure about the migration process, since [`json.dumps` currently accepts `kwargs`](https://docs.python.org/3/library/json.html#basic-usage) (e.g. #4295) that `orjson.dumps` does not, and this might cause breakage: https://github.com/materialsproject/pymatgen/blob/1e36ee511a89d828daa2882353c74167947a1d8e/src/pymatgen/core/structure.py#L2963
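
One possible way to limit breakage is sketched below: translate the few `json.dumps` keyword arguments that have an `orjson` counterpart, and fall back to the builtin `json` for everything else. `dumps_compat` is a hypothetical helper name. The sketch assumes only `indent=2`, `sort_keys`, and `default` are worth translating, which map to `orjson.OPT_INDENT_2`, `orjson.OPT_SORT_KEYS`, and the `default` argument of `orjson.dumps`.

```python
import json

try:
    import orjson  # third-party, optional
except ImportError:
    orjson = None


def dumps_compat(obj, **kwargs) -> str:
    """Hypothetical helper: use orjson when kwargs translate, else json."""
    if orjson is not None:
        remaining = dict(kwargs)
        indent = remaining.pop("indent", None)
        sort_keys = remaining.pop("sort_keys", False)
        default = remaining.pop("default", None)
        # orjson only offers 2-space indentation; any other kwarg left
        # over (separators, ensure_ascii, cls, ...) has no counterpart.
        if indent in (None, 2) and not remaining:
            option = 0
            if indent == 2:
                option |= orjson.OPT_INDENT_2
            if sort_keys:
                option |= orjson.OPT_SORT_KEYS
            raw = orjson.dumps(obj, default=default, option=option or None)
            return raw.decode("utf-8")
    # Fallback: the builtin json honours every keyword argument.
    return json.dumps(obj, **kwargs)
```

Even when the kwargs translate, the output is not guaranteed to match `json.dumps` character for character (for example, `orjson` serializes `NaN` as `null`), so callers that depend on the exact text would still need a review.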


### `monty` JSON and lack of custom JSON encoder support

[`orjson` is already an optional dependency of `monty`](https://github.com/materialsvirtuallab/monty/blob/b4b879bc0b607edf05ea9180e7593428f1429385/pyproject.toml#L32), so it may also be worth double-checking whether `monty` makes full use of `orjson`, since quite a few JSON-related workloads are handled implicitly by `monty`.

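Also, `orjson` doesn't directly support custom encoder/decoder classes (e.g. `MontyEncoder`): `orjson.dumps` only accepts a `default` callable, and there is no equivalent of the `cls` or `object_hook` arguments. This needs some more thought.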
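
As a starting point for that discussion, the encoding half could perhaps be covered by the `default` callable that both `json.dumps` and `orjson.dumps` accept. The sketch below is not `MontyEncoder`; `as_dict_default`, `encode`, and `Point` are hypothetical names, and the only assumption is that custom objects expose an `as_dict()` method.

```python
import json

try:
    import orjson  # third-party, optional
except ImportError:
    orjson = None


def as_dict_default(obj):
    """Called only for objects the serializer cannot handle natively."""
    if hasattr(obj, "as_dict"):
        return obj.as_dict()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def encode(obj) -> bytes:
    """Serialize to bytes, routing unknown objects through as_dict()."""
    if orjson is not None:
        return orjson.dumps(obj, default=as_dict_default)
    return json.dumps(obj, default=as_dict_default).encode("utf-8")


class Point:
    """Hypothetical MSONable-like object used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def as_dict(self):
        return {"@class": "Point", "x": self.x, "y": self.y}
```

Decoding is the harder half: since `orjson.loads` takes no hook, rebuilding objects from `@class`-style markers would need a separate pass over the parsed result.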

