TLDR: We propose DART (Duplication-Aware Reduction of Tokens), a training-free method that prunes vision tokens based on duplication, achieving 88.9% token reduction and 1.99 speed-up while ...
Ideally I don't have to deploy this as a docker image because that comes with it's own difficulties in a corporate environment, but if applying patch to the code base is too difficult, I'll have to go ...