feat: enrichment steps on all convert pipelines (incl docx, html, etc) #2251
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
✅ DCO Check Passed. Thanks @dolfim-ibm, all your commits are properly signed off. 🎉
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
lgtm!
Codecov Report❌ Patch coverage is
LGTM
🚀
This PR refactors the pipelines to allow enrichment of "standard items" in all pipelines. This makes it possible to run picture classification and description on the images embedded in MS Word and HTML documents.
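To illustrate the idea, here is a minimal sketch of how shared enrichment steps on a base pipeline could apply to every input format. It is not the code from this PR and does not use the library's API: `Picture`, `SharedOptions`, `SketchPipeline`, the two subclasses, and the flag names are all hypothetical, and the "models" are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Picture:
    """Hypothetical stand-in for an image embedded in a document."""
    image_bytes: bytes
    labels: List[str] = field(default_factory=list)
    description: Optional[str] = None


@dataclass
class SharedOptions:
    """Hypothetical options that every pipeline understands."""
    classify_pictures: bool = False
    describe_pictures: bool = False


class SketchPipeline:
    """Runs a format-specific build step, then the shared enrichment steps."""

    def __init__(self, options: SharedOptions) -> None:
        self.options = options

    def build(self, source: str) -> List[Picture]:
        # Each format-specific pipeline extracts its own pictures.
        raise NotImplementedError

    def enrich(self, pictures: List[Picture]) -> List[Picture]:
        # Shared steps live on the base class, so no subclass repeats them.
        for pic in pictures:
            if self.options.classify_pictures:
                pic.labels.append("unclassified")  # placeholder for a classifier
            if self.options.describe_pictures:
                # Placeholder for a description model.
                pic.description = f"embedded image, {len(pic.image_bytes)} bytes"
        return pictures

    def convert(self, source: str) -> List[Picture]:
        return self.enrich(self.build(source))


class WordSketchPipeline(SketchPipeline):
    def build(self, source: str) -> List[Picture]:
        # Assumption: pretend the Word file contains one embedded image.
        return [Picture(image_bytes=b"\x89PNG")]


class HtmlSketchPipeline(SketchPipeline):
    def build(self, source: str) -> List[Picture]:
        # Assumption: pretend the HTML page contains two embedded images.
        return [Picture(image_bytes=b"GIF8"), Picture(image_bytes=b"\xff\xd8")]
```

With this layout, passing the same `SharedOptions` instance to either subclass turns the enrichment steps on or off, and a new format only has to implement `build`.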
Actual changes:
- artifacts_path to the base pipeline_options and class; this removes redundant code from the other pipelines.
- ConvertPipeline, which has options to enable the common enrich steps.
- PictureItem without the page images.

Checklist: