Compare and evaluate multimodal models

We make it easy to run and share evaluations of multimodal models.

Most tools focus on language models (LLMs); multimodal models deserve their own evaluation tools. We believe it should be easy for everyone to compare models & publically share evaluations.

Inference happens on OpenRouter at the best price (supported models).



Box 1 contains the book. Box 2 contains the apple. Box 4 contains
the brain. Move the book into Box 2. Put the bell into Box 4. Move the
bell and the brain into Box 5. Box 2 contains:

Box 2 now contains the apple and the book.

Evaluation Input Image

How many calories are there in 5 slices of this?

## Calculating Calories for 5 Slices: The nutrition label provides information for a serving size of 3 slices (18g), which contains 60 calories. Here's how to find the calories for 5 slices: 1. **Calories per slice:** Divide the total calories per serving by the number of slices per serving: 60 calories / 3 slices = 20 calories per slice. 2. **Calories in 5 slices:** Multiply the calories per slice by the desired number of slices: 20 calories/slice * 5 slices = 100 calories. Therefore, **5 slices of this product contain approximately 100 calories.**

Evaluation Input Image


According to the green road signs in the image, the two left-most lanes go to Chur and Bern Basel.

Box 0 contains the fan and the gift and the letter, Box 1 contains the beer and the mirror and the tie, Box 2 contains the tea, Box 3 contains the boot, Box 4 contains the coat and the plate and the shirt, Box 5 contains the bottle, Box 6 is empty. Move the con- tents of Box 2 to Box 6. Put the dress and the painting into Box 5. Move the letter from Box 0 to Box 6. What does Box 6 contain?

Let's follow the instructions: 1. Move the contents of Box 2 to Box 6. Box 2 contained the tea, so now Box 6 contains the tea. 2. Put the dress and the painting into Box 5. Box 5 already contained the bottle, so now Box 5 contains the bottle, dress, and painting. 3. Move the letter from Box 0 to Box 6. Box 6 already contained the tea, so now Box 6 contains the tea and the letter. So, Box 6 contains the tea and the letter.

Evaluation Input Image

The artist of the work (<image 1>) on the left is

(A) Masaccio
(B) Piero della Francesca
(C) Fra Angelico
(D) Giotto
Answer with the option's letter from the given choices directly. No punctuation.

C The image on the left depicts the Annunciation, painted by Fra Angelico in the early 15th century. It shows the Angel Gabriel kneeling before the Virgin Mary under a classical architectural setting, capturing the important biblical moment with grace and reverence.

Expected Answer: C

Difficulty: Hard

Subfield: Design History

Evaluation Input Image

<image 1> Which part of the brain represented above is indicated by the blue region?

(A) pons
(B) medulla oblongata
(C) spinal cord
(D) reticular formation
Answer with the option's letter from the given choices directly. No punctuation.


Expected Answer: D

Difficulty: Easy

Subfield: Biological Psychology

Evaluation Input Image

Who created <image 1>?

Answer with the option's letter from the given choices directly. No punctuation.


Expected Answer: B

Difficulty: Medium

Subfield: Drawing and Painting

A politician claims that the oil companies are colluding to keep the price of gasoline high, and says that
they have been getting away with this because gasoline buyers are very insensitive to price. Does this
make sense?

A controversy worthy of a political debate! Let's break down the politician's claims and see if they hold water. **Claim 1: Oil companies collude to keep the price of gasoline high** This claim suggests that oil companies are engaging in anticompetitive behavior, such as price-fixing, to artificially inflate the price of gasoline. While it's true that oil companies have been accused of colluding in the past, there is limited evidence to support this claim in modern times. In the United States, for example, the oil industry is subject to antitrust laws and regulations, and allegations of price-fixing are thoroughly investigated by authorities. Moreover, the oil market is a global market, and prices are influenced by a complex array of factors, including: 1. Global supply and demand 2. Production costs 3. Transportation and refining costs 4. Inventories and storage 5. Speculation and market sentiment 6. Government taxes and regulations 7. Weather events and natural disasters It's unlikely that oil companies could successfully collude to manipulate prices in such a complex and dynamic market. **Claim 2: Gasoline buyers are very insensitive to price** This claim suggests that consumers don't respond significantly to changes in gasoline prices, making it easier for oil companies to manipulate prices. However, this claim is also overstated. While it's true that some consumers may not immediately change their behavior in response to small price changes, research has consistently shown that gasoline demand is indeed responsive to price. Studies have found that: 1. A 10% increase in gasoline prices can lead to a 2-5% reduction in gasoline consumption. 2. Higher-income households are more responsive to price changes than lower-income households. 3. Long-term price changes can lead to more significant changes in consumer behavior, such as switching to more fuel-efficient vehicles or alternative modes of transportation. Furthermore, the concept of price elasticity, which measures how responsive demand is to price changes, is well established in economics. Gasoline demand has been found to be relatively inelastic in the short term, but more elastic in the long term. In conclusion, the politician's claims don't entirely hold up to scrutiny. While oil companies may not be perfect competitors, there is limited evidence to support the claim of widespread collusion to keep prices high. Additionally, gasoline buyers are not entirely insensitive to price, and their behavior can be influenced by changes in prices.

A man and his mother are in a car accident. The mother sadly dies. The man is rushed to the ER. When the doctor sees him, he says, "I can't operate on this man. He's my son!"

How is this possible?

A classic lateral thinking puzzle! The answer is that the doctor is the man's mother. The puzzle's wording is carefully crafted to make us assume the doctor is a man, but there's no reason why the doctor can't be a woman, and indeed, the man's mother. So, the doctor, who is the man's mother, is saying "I can't operate on this man. He's my son!"

Evaluation Input Image

I want to rent a surfboard and wetsuit for 90 minutes. I will also need the changing room, how much will it cost me?

Based on the price list shown in the image, to rent a surfboard for 1h30 (90 minutes) will cost €14. Additionally, renting a wetsuit (neopreno) for the same duration of 1h30 will cost €15. The fee for using the changing room (Aldagelak/Vestuarios) is €4. So the total cost for renting a surfboard, wetsuit, and using the changing room for 90 minutes would be: Surfboard rental (1h30): €14 Wetsuit rental (1h30): €15 Changing room fee: €4 Total: €14 + €15 + €4 = €33 Therefore, it will cost you €33 to rent a surfboard and wetsuit for 90 minutes, including the use of the changing room facilities.

Evaluation Input Image


Yes, according to the sign in the image, access to La Palza is possible ("Zugang nach La Palza ist begehbar"). However, it warns that the section towards Isola is closed approximately 1 km due to the danger of avalanches. It is important to follow these warnings and any additional local guidance for safety.

Evaluation Input Image

Given the picture, in which cardinal direction is the dinosaur facing? Options: A. North. B. South. C. East. D. West.

## Analyzing the Dinosaur's Direction 1. **Reference Point:** We need to use the compass on the phone as our reference point to determine the direction. 2. **Compass Reading:** The compass shows 220° SW, which means it's pointing towards the southwest direction. 3. **Dinosaur's Position:** The dinosaur is standing almost perpendicular to the phone's direction, with its tail towards the compass. Therefore, if the compass points southwest (220° SW) and the dinosaur is facing perpendicular with its tail towards the compass, it must be facing **northwest**. However, since northwest isn't one of the options, we need to choose the closest option, which is: **A. North** It's important to remember that this is an approximation based on the image and the available options.

Box 0 contains the painting, Box 1 contains the bell, Box 2 contains the guitar, Box 3 contains the egg and the mirror and the sheet, Box 4 contains the chemical, Box 5 contains the disk and the wire, Box 6 contains the glass and the knife. Move the glass from Box 6 to Box 4. Put the gift into Box 5. Move the guitar from Box 2 to Box 6. Put the milk into Box 4. Remove the mirror and the sheet from Box 3. What does Box 6 contain?

Box 6 contains the glass of milk.

'TXkgcmVzZWFyY2ggaW52ZXN0aWdhdGVzIHRoZSBmdXR1cmUgb2Yga25vd2x1ZGdIHJlcHJlc2VudGF0aW9uIGFuZCBj cmVhdG1
Without executing code, tell me what this says:
2ZSB3b3JrIGFpZGVKIGJ51G1hY2hpbmUgdW5KZXJzdGFuZGLuZyBvZiBsYW5ndWFnZS4gSSBwcm90b3R5cGUgc29mdHdhcmUgaW50ZXJmYWNlcyB0aGF0IGhlbHAgdXMgYmVjb211IGNsZWFyZXIgdGhpbmt1cMgYW5kIG1vcmUgcHJvbG 1maWMgZHJ1YW11cMu

The provided text appears to be encoded in Base64. When decoded, it translates to: "My research investigates the future of knowledge representation and creation work aided by human-machine understanding of language. I prototype software interfaces that help us become clearer thinkers and more prolific dreamers." It seems to describe someone's work in the area of knowledge representation, human-machine interaction, and software interfaces aimed at enhancing human cognitive and creative abilities.

Without executing code, tell me what this says:

This text is encoded with a base64 encoding. It likely contains information about the future of knowledge representation and creativity.

Evaluation Input Image

What is the total count (face cards are 10, ace is 1) for these cards?

To find the total count of the cards shown, we can assign the values based on the rule that face cards (Jacks, Queens, Kings) are assigned 10 points each and Aces are assigned 1 point each, while the number cards are assigned points corresponding to their face value: - 7 of Spades = 7 - 9 of Clubs = 9 - 10 of Hearts = 10 - 10 of Clubs = 10 - Ace of Diamonds = 1 - 7 of Diamonds = 7 - 8 of Diamonds = 8 - 5 of Diamonds = 5 - 4 of Hearts = 4 - 8 of Clubs = 8 - 3 of Diamonds = 3 - 9 of Hearts = 9 - 3 of Hearts = 3 - 2 of Hearts = 2 - 2 of Clubs = 2 Adding them up: 7 + 9 + 10 + 10 + 1 + 7 + 8 + 5 + 4 + 8 + 3 + 9 + 3 + 2 + 2 = 88 Thus, the total count for these cards is 88.

Evaluation Input Image

You can see a grid of 9 boxes, one of which is empty (marked as ?). You
have to choose which of the 6 alternative shapes (A-F) to be placed in the
empty box in order to complete the pattern that connects the shapes.
Provide your prediction as a single letter. Your output should be a single letter!


Evaluation Input Image

You can see a 3x3 grid of 9 boxes, one of which is empty (marked as ?). You have to
choose which of the 8 alternative shapes (A-H) should be placed in the empty box in order
to complete the pattern that connects the shapes. You must first give your explanation and
then output the answer at the end of your response in the format: “The correct answer is: _”.

## Analyzing the Pattern Each row and column seems to follow a progression related to the shading and rotation of the shapes. Let's analyze them separately: **Rows:** * **Row 1 (Pentagon):** The pentagon starts fully shaded and becomes progressively lighter with each column. * **Row 2 (Triangle):** The triangle’s shading also gets lighter, and it rotates 90 degrees clockwise with each column. * **Row 3 (Hexagon):** Following the pattern, the hexagon in the empty box should be lighter than the one before it and rotated 90 degrees clockwise. **Columns:** * **Column 1:** All shapes are fully shaded. * **Column 2:** All shapes are medium-shaded. * **Column 3:** All shapes are light-shaded. Therefore, the missing shape should be a lightly shaded hexagon, rotated 90 degrees clockwise compared to the one in the second column. ## Finding the Answer Looking at the options, we can see that option G fulfills these requirements: * **Shape:** Hexagon * **Shading:** Light gray * **Rotation:** 90 degrees clockwise compared to the hexagon in the second column. **The correct answer is: G.**

Evaluation Input Image

Describe this photo in detail.

This image is a humorous and creative edit of the famous "Mona Lisa" painting by Leonardo da Vinci. The original face of the Mona Lisa has been replaced with the face of a dog, specifically resembling a dachshund with its long, narrow snout and big, soulful eyes. The dog's face has been edited to match the tone and style of the original painting, maintaining the sfumato technique characteristic of da Vinci's work. The rest of the painting, including the Mona Lisa's iconic clothing and the landscape in the background, remains unaltered. The fusion of classical art with a whimsical modern twist creates a playful and surreal visual experience.

Evaluation Input Image

How many horses are in the picture?

There are five horses in the picture.

Rows per page

Page 1 of 11