Model Focus: Mistral - Fast but Fundamentally Flawed

Reading time: approx. 5 min

Mistral:7b is another widely known and often recommended open AI model. It is known for its speed and efficiency. In our tests, it lived up to its reputation for being fast, but it also displayed fundamental and serious flaws in its logical ability and understanding, especially in the pedagogical area.

What You Will Learn

  • What fundamental errors the Mistral model made in the test.
  • Why speed cannot compensate for lack of reliability.

The Basics: Mistral in Brief

  • Model: Mistral:7b
  • Developer: Mistral AI
  • Weaknesses: Extremely unreliable on pedagogy and reasoning. Responses are often superficial.
  • Ollama command: ollama run mistral:7b

Results from the Benchmark

Mistral performed consistently poorly on all tasks that required deeper understanding.

  • Worst in Test (Pedagogy): Received the lowest rating of 1/5 for a "completely incorrect and confused explanation" of fractions, where it incorrectly mixed in Fibonacci numbers.
  • Reasoning: Used made-up concepts like "new force" and "moon line", which made the explanation misleading (2/5).
  • Ethics & Values: The answer was brief, superficial, and lacked the nuance required for complex questions (3/5).
  • Bright Spot (Factual Knowledge): Despite its flaws, the model could give a correct and quick answer to the direct question about compulsory elementary school (5/5), which shows that it can handle simple fact retrieval.

Practical Application: The Risk of Popularity

Mistral is an excellent example of how a model's popularity is not always a guarantee of quality in all contexts. The speed is tempting, but the risk of getting a nonsense answer or a directly incorrect explanation is too high. Using it to create materials or explanations could lead to spreading misinformation.

Conclusion

Based on these results, Mistral:7b cannot be recommended for serious work in education. It is too unreliable and the logical flaws are too serious.

Next Steps

Now we have gone through the best, the mediocre, and the worst models in our test. In the final lesson, we summarize everything in a clear table to give you a final recommendation on which model to choose.