Reasoning About Images

Testing GPT-4V’s ability to understand and reason about visual content.

Experiments

Spatial Reasoning

Questions like “What is to the left of X?” or “How are these objects arranged?”

Result: Generally accurate for clear images, struggles with complex scenes.

Counting

Questions like “How many people are in this photo?”

Result: Accurate for small numbers, less reliable above 10.

Reading Text

OCR-style tasks: reading signs, labels, handwriting.

Result: Good for printed text, variable for handwriting.

Diagrams and Charts

Interpreting flowcharts, graphs, UML diagrams.

Result: Can describe structure, may misread specific values.

Practical Applications