
Conversation

shkarupa-alex
Contributor

This is the third and, I hope, the last PR to support models like OlmOcr (1: #1802, 2: #1808).

VLM models may generate more than plain text or markdown.
E.g.:

  • OlmOcr generates JSON containing the page language, rotation, and recognized text.
  • Nanonets-OCR-s generates recognized text as markdown, but renders tables and some other elements as HTML.

For such models we need a way to decode the VLM response (i.e. fully convert it to markdown).
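For the OlmOcr case, such a decoding step could look like the sketch below. This is illustrative only: the `natural_text` field name is an assumption, not a confirmed part of the OlmOcr output schema.

```python
import json

def decode_olmocr_response(raw: str) -> str:
    """Unwrap an OlmOcr-style JSON reply into plain markdown text.

    Illustrative sketch: the "natural_text" field name is an assumption,
    not a confirmed part of the OlmOcr schema.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # the model returned plain text after all
        return raw
    # the reply also carries metadata such as page language and rotation,
    # but only the recognized text is needed for markdown export
    return payload.get("natural_text", raw)
```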

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
Contributor

github-actions bot commented Jul 8, 2025

DCO Check Passed

Thanks @shkarupa-alex, all your commits are properly signed off. 🎉


mergify bot commented Jul 8, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

@shkarupa-alex shkarupa-alex changed the title (feat: vlm): Ability to preprocess VLM response (feat): Ability to preprocess VLM response Jul 8, 2025
@shkarupa-alex shkarupa-alex changed the title (feat): Ability to preprocess VLM response feat(vlm): Ability to preprocess VLM response Jul 8, 2025

codecov bot commented Jul 8, 2025

Codecov Report

Attention: Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| docling/models/api_vlm_model.py | 0.00% | 2 Missing ⚠️ |
| .../models/vlm_models_inline/hf_transformers_model.py | 0.00% | 2 Missing ⚠️ |
| docling/models/vlm_models_inline/mlx_model.py | 0.00% | 2 Missing ⚠️ |


@dolfim-ibm
Contributor

@shkarupa-alex we would like to propose an alternative solution.

First, let's add a method decode_response() to ApiVlmOptions (and similarly for HuggingFaceTransformersVlmModel) which simply returns the input text.

Then, in your OlmOcr example, you can make a derived class from ApiVlmOptions which overrides it with the JSON parsing.

Are you willing to adapt your PR in this direction? (sorry for the late response)
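The proposal above could be sketched roughly as follows. The base class here is a stand-in for docling's ApiVlmOptions reduced to the proposed hook, and the `natural_text` field name is an assumed example, not a confirmed schema detail.

```python
import json

class ApiVlmOptionsSketch:
    """Stand-in for docling's ApiVlmOptions, reduced to the proposed hook."""

    def decode_response(self, text: str) -> str:
        # default behaviour: the response is already markdown, pass through
        return text

class OlmOcrVlmOptions(ApiVlmOptionsSketch):
    """Derived options that unwrap an OlmOcr-style JSON reply."""

    def decode_response(self, text: str) -> str:
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            return text  # not JSON, pass through unchanged
        # "natural_text" is an assumed field name for illustration
        return payload.get("natural_text", text)
```

The pipeline would then call options.decode_response(raw) on every model reply, so users of the base class see no behavior change.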

@shkarupa-alex
Contributor Author

I will apply your recommendations in a few days.

@shkarupa-alex
Contributor Author

@dolfim-ibm I moved decode_response to the VLM options as you proposed.
But to keep the API consistent I also moved the per-page prompt formulation (introduced in #1808) to the VLM options.

…de). Per-page prompt formulation also moved to vlm options to keep api consistent.

Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
@shkarupa-alex shkarupa-alex force-pushed the vlm-preprocess-response branch from 21932ac to 713612e on August 10, 2025 09:35
@dolfim-ibm
Contributor

Thanks @shkarupa-alex, I just noticed I had a big typo in my message above. What we planned is to put the functions in the model classes, not the options. Sorry about the message being misleading.

@shkarupa-alex
Contributor Author

shkarupa-alex commented Aug 11, 2025

@dolfim-ibm could you please clarify what you mean by a model?
If you are talking about ApiVlmModel, HuggingFaceMlxModel and HuggingFaceTransformersVlmModel, that would be a good place, but users currently cannot override them: https://github.com/docling-project/docling/blob/main/docling/pipeline/vlm_pipeline.py#L78

Can you propose how to pass an overridden model class (or instance) to the pipeline?

Contributor

@dolfim-ibm dolfim-ibm left a comment


Great, thanks for following up on all iterations!

@dolfim-ibm
Contributor

@shkarupa-alex we discussed the approach a bit more and we think it covers the short-term needs. Again, thanks a lot for the contribution and for following up on the discussions.

We are actually rethinking the whole stage and model-runtime design, so we might have to do a few iterations on this topic as well in the coming days.

@dolfim-ibm dolfim-ibm merged commit 5f050f9 into docling-project:main Aug 12, 2025
11 checks passed