feat(vlm): Ability to preprocess VLM response #1907
Conversation
Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
✅ DCO Check Passed. Thanks @shkarupa-alex, all your commits are properly signed off. 🎉
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Codecov Report. Attention: Patch coverage is … 📢 Thoughts on this report? Let us know!
@shkarupa-alex we would like to propose an alternative solution. First, let's add a `decode_response` method. Then, in your OlmOcr example, you can make a derived class that overrides it. Are you willing to adapt your PR in this direction? (Sorry for the late response.)
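For illustration, a minimal sketch of what such a hook could look like, assuming a pass-through default plus an OlmOcr-specific override; the class names and signature here are hypothetical, not docling's actual API:

```python
# Hypothetical sketch only: class names and signatures are illustrative.

class BaseVlmModel:
    def decode_response(self, text: str) -> str:
        # Default behaviour: the VLM already returns Markdown, pass it through.
        return text


class OlmOcrVlmModel(BaseVlmModel):
    def decode_response(self, text: str) -> str:
        # OlmOcr-style models need extra post-processing here, e.g. unwrapping
        # the structured envelope around the page text (see the decoding sketch
        # in the PR description further below).
        ...
```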
I will apply your recommendations in a few days.
@dolfim-ibm I moved `decode_response` to the VLM options as you proposed.
…de). Per-page prompt formulation was also moved to the VLM options to keep the API consistent. Signed-off-by: Shkarupa Alex <shkarupa.alex@gmail.com>
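For reference, a rough sketch of what that intermediate, options-based design could look like, with both the per-page prompt formulation and the response decoding as callables on the options; all names, signatures, and defaults here are assumptions:

```python
from dataclasses import dataclass
from typing import Callable


def default_prompt(page_no: int) -> str:
    # Same instruction for every page; a real implementation could vary by page.
    return "Convert this page to Markdown."


def default_decode(text: str) -> str:
    # Pass-through: assume the response is already Markdown.
    return text


@dataclass
class SketchVlmOptions:
    # Hypothetical options object; not the actual docling VLM options class.
    formulate_prompt: Callable[[int], str] = default_prompt
    decode_response: Callable[[str], str] = default_decode
```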
Force-pushed from 21932ac to 713612e.
Thanks @shkarupa-alex, I just noticed I had a big typo in my message above. What we planned is to put the functions in the model classes, not the options. Sorry about the message being misleading.
@dolfim-ibm could you please clarify what … is? Can you propose how to pass an overridden model class (or instance) to the pipeline?
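For concreteness, one conceivable way to wire a custom pipeline (and thus a custom model) into the converter; the `pipeline_cls` hook on `PdfFormatOption` exists in docling, but `CustomVlmPipeline` and the idea of swapping the model inside it are assumptions made for the sake of this question:

```python
from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline


class CustomVlmPipeline(VlmPipeline):
    """Hypothetical subclass that would construct a model whose
    decode_response is overridden for OlmOcr-style output."""
    # ... swap in the custom model here ...


converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_cls=CustomVlmPipeline)
    }
)
```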
Great, thanks for following up on all iterations!
@shkarupa-alex we just discussed the approach a bit more and we think it covers the short-term needs. Again, thanks a lot for the contribution and for following up on the discussions. We are actually rethinking the whole stage and model-runtime design, so we might have to do a few iterations on this topic as well in the coming days.
This is the third and, I hope, the last PR to support models like OlmOcr (1: #1802, 2: #1808).
VLM models may generate something more than just text or Markdown; OlmOcr, for example, wraps the extracted page text in structured output rather than returning Markdown directly.
For such models we need a way to decode the VLM response, i.e., fully convert it to Markdown.
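As a hedged illustration of what such decoding could involve for an OlmOcr-style response (the JSON envelope and the `natural_text` field name are assumptions about the model's output, not something this PR defines):

```python
import json


def decode_olmocr_response(text: str) -> str:
    """Unwrap the Markdown/plain text from a JSON-wrapped VLM response."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON at all, fall back to the raw response
    if isinstance(payload, dict):
        # "natural_text" is an assumed field name for the page content.
        return payload.get("natural_text") or ""
    return text
```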
Checklist: