Upload a screenshot and provide a description of an element. This demo uses the OpenCUA-32B model.