Skip to content

Conversation

e3krisztian
Copy link
Contributor

This PR addresses two issues (at the commit level they are separately implemented):

Signed-off-by: Krisztian Fekete <1246751+e3krisztian@users.noreply.github.com>
ValidationError had a single untyped property: .data, which hold
different objects in the XML and JSON cases, while they do have some
common properties.

The new properties introduced in this commit are
- .message: the error message text
- .path: pointer to the location of the error in the input

Signed-off-by: Krisztian Fekete <1246751+e3krisztian@users.noreply.github.com>
ValidationError.data and .message is provided by third party libraries,
and they can give a message of any length. E.g. jsonschema inserts its
input in all of its messages, which could be arbitrary big.

To be able to show these errors to the user, some pre-processing is
needed. The new method allows for squeezing these messages in a way,
that is least disruptive, and has special knowledge how to shorten
jsonschema messages.

It is definitely still a workaround, and ideally the libraries should
not yield unlimited messages.

Signed-off-by: Krisztian Fekete <1246751+e3krisztian@users.noreply.github.com>
There was only one validation method, that returned a single error (and
it was the first error (JSON) or the last one (XML)).

The new function `SchemabasedValidator.iterate_errors()` allows to
enumerate over all the validation errors.

Signed-off-by: Krisztian Fekete <1246751+e3krisztian@users.noreply.github.com>
@e3krisztian e3krisztian requested a review from a team as a code owner June 25, 2025 17:48
Copy link
Member

@jkowalleck jkowalleck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this feature.

to me, the current implementation looks odd and not very pythonic.

Instead of the current one, I have a design proposal:
add an optional parameter to validate_str that controls the return type a bit.
Something like def validate_str(self, data: str, *, all_errors: boolean=False) -> None | ValidationError | Iterable[ValidationError]:
The implementation would return None if there were no errors at all.
The implementation would return ValidationError if there was at least one error and all_errors is False.
The implementation would return Iterable[ValidationError] if there was at least one error and all_errors is True.

after the implementation is done, we would add proper type hints for each of the implementations with @overload.

what do you think?

@jkowalleck jkowalleck changed the title Validation improvements: ValidationError fields, limiting message sizes and iterating over all errors feat: ValidationError fields, limiting message sizes and iterating over all errors Jun 26, 2025
@jkowalleck
Copy link
Member

jkowalleck commented Jun 26, 2025

thanks for working on this feature.

to me, the current implementation looks odd and not very pythonic.

Instead of the current one, I have a design proposal: add an optional parameter to validate_str that controls the return type a bit. Something like def validate_str(self, data: str, *, all_errors: boolean=False) -> None | ValidationError | Iterable[ValidationError]: The implementation would return None if there were no errors at all. The implementation would return ValidationError if there was at least one error and all_errors is False. The implementation would return Iterable[ValidationError] if there was at least one error and all_errors is True.

after the implementation is done, we would add proper type hints for each of the implementations with @overload.

what do you think?

I am working on the proposed implementation design, and will pullrequest it to your branch once i am done.

PS: pullrequested the proposed changes

…ll errors

Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
@jkowalleck
Copy link
Member

i see this PR is mixing scopes.

  1. it makes validation return an iterable - almost done
  2. it adds additional capabilities to the error class - still in discussion.

i like the iterable feature - it brings immediately value for downstream users.
I see the additional capabilities debatable.

this means: we have a mix of scopes, and each scope is potentially moving in a different pace - from a developer/maintainer's perspective.

Therefore, @e3krisztian , I'd ask to remove all the additional error capabilities from this PR, and open a different PR for them.
This enables us to release the "iterable" feature soon, while we discuss/refine the feature of additional error class capabilities.

A quick outlook: i could imagine we might find, in our discussion, that the whole error class shall be redesigned - which would be a breaking change I'd welcome.

@e3krisztian
Copy link
Contributor Author

@jkowalleck I appreciate the quick reviews and responses!

You are probably right, that this PR had attempted to change too much.

Unfortunately we clearly have a different view of a library. For me it includes providing guarantees, and being responsible for not leaking the underlying implementation abstractions (which here in the XML case is not even a public one!), and if necessary working around/covering/or making explicit the problems in them.

This PR attempted to somewhat address what was missing in my view around the validation interface, but it is clear, that our views of what needs to be abstracted away from the underlying implementations differ, as well as our view on what is "Pythonic code", and thus I find it very hard, and expensive to continue discussing it at code level.

Since the iterate over errors is your implementation, and above you are asking me to remove all the rest, I ask you to take over this PR, and remove or keep whatever you feel right, or close it and open one for your error iteration implementation.

We can continue the discussions about the error interfaces in the original issues, but I no longer intend to open PR-s implementing them.

@jkowalleck
Copy link
Member

jkowalleck commented Jun 27, 2025

@jkowalleck I appreciate the quick reviews and responses!

You are probably right, that this PR had attempted to change too much.

Unfortunately we clearly have a different view of a library. For me it includes providing guarantees, and being responsible for not leaking the underlying implementation abstractions (which here in the XML case is not even a public one!), and if necessary working around/covering/or making explicit the problems in them.

Nope. no different views - I share your opinion.
This is why I ask to split the PR, so the two scopes can be handled in their respective pace.
I do not want to lose your ideas of improving the error class. The error class was nothing people cared about much in the past - that's why it is so sloppy today. I appreciate your ideas.

This PR attempted to somewhat address what was missing in my view around the validation interface, but it is clear, that our views of what needs to be abstracted away from the underlying implementations differ, as well as our view on what is "Pythonic code", and thus I find it very hard, and expensive to continue discussing it at code level.

Since the iterate over errors is your implementation, and above you are asking me to remove all the rest, I ask you to take over this PR, and remove or keep whatever you feel right, or close it and open one for your error iteration implementation.

We can continue the discussions about the error interfaces in the original issues, but I no longer intend to open PR-s implementing them.

I see. thats why requirements engineering is usually done in tickets (#827), not PRs ;-)

i will remove the error class changes from this very PR, and probably merge/release the rest soon.
This way we have access to all underlying errors sooner than the more complex improvements on the error class.

Afterwards, I will continue the discussion for error class improvements in #827 - which already has this scope of improving the error class to make it valuable and independent of 3rd party implementations. More details in the tickets soon - your opinion is appreciated very much!

PS: removed debatable parts and started/continued discussion in #827 (comment)

Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
@jkowalleck jkowalleck changed the title feat: ValidationError fields, limiting message sizes and iterating over all errors feat: schema based validation may return iterable of all errors Jun 27, 2025
@jkowalleck jkowalleck self-requested a review June 27, 2025 16:35
Signed-off-by: Jan Kowalleck <jan.kowalleck@gmail.com>
@jkowalleck jkowalleck merged commit f95576f into CycloneDX:main Jun 28, 2025
42 checks passed
@jkowalleck
Copy link
Member

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jul 7, 2025
## v10.3.0 (2025-06-30)

### Documentation

- Instructions for code style
  ([`160810f`](CycloneDX/cyclonedx-python-lib@160810f))

### Features

- Schema based validation may return iterable of all errors
  ([#834](CycloneDX/cyclonedx-python-lib#834),
  [`f95576f`](CycloneDX/cyclonedx-python-lib@f95576f))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants