
The ‘strawberry’ problem: How to overcome the limitations of AI

MONews



Large language models (LLMs) like ChatGPT and Claude have become household names around the world, and many people worry that AI is coming for their jobs. It is therefore ironic that nearly every LLM-based system struggles with a simple task: counting the number of “r”s in the word “strawberry.” The failure is not limited to the letter “r”; other examples include counting the “m”s in “mammal” and the “p”s in “hippopotamus.” In this article, we will analyze the cause of these failures and walk through a simple workaround.

LLMs are powerful AI systems trained on massive amounts of text to understand and produce human-like language. They excel at tasks like answering questions, translating languages, summarizing content, and even creative writing, predicting and constructing coherent responses from the input they receive. LLMs are designed to recognize patterns in text, which lets them handle a wide range of language-related tasks with impressive accuracy.

Despite these abilities, failing to count the number of “r”s in the word “strawberry” is a reminder that LLMs cannot “think” like humans. They do not process the information we give them the way humans do.

Screenshots of conversations with ChatGPT and Claude about the number of “r”s in “strawberry.”

Almost all high-performing LLMs today are built on the transformer architecture. This deep learning architecture does not ingest text directly as input. Instead, it relies on a process called tokenization, which converts text into numeric representations, or tokens. Some tokens may be complete words (such as “monkey”), while others may be parts of words (such as “mon” and “key”). Each token is like a code the model understands. Breaking everything into tokens helps the model predict the next token in a sentence more accurately.

LLMs do not memorize words. Instead, they become good at guessing what comes next by learning how these tokens combine in different ways. For the word “hippopotamus,” the model might see the tokens “hip,” “pop,” “o,” and “tamus,” without ever knowing that the word is made up of the letters “h,” “i,” “p,” “p,” “o,” “p,” “o,” “t,” “a,” “m,” “u,” “s.”
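The splitting described above can be sketched in a few lines of Python. This is a toy illustration, not a real LLM tokenizer: real models use learned subword vocabularies (such as BPE), but the effect is the same — the model sees token pieces, not individual letters. The vocabulary below is hypothetical, chosen to reproduce the examples from the text.

```python
# Hypothetical toy vocabulary; real tokenizers learn theirs from data.
VOCAB = {"straw", "berry", "mon", "key", "hip", "pop", "o", "tamus"}

def tokenize(word: str) -> list[str]:
    """Greedy longest-match split of a word into vocabulary pieces."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return tokens

print(tokenize("strawberry"))    # ['straw', 'berry']
print(tokenize("hippopotamus"))  # ['hip', 'pop', 'o', 'tamus']
```

Note that none of the pieces the model receives for “strawberry” contains a lone “r” — which is exactly why letter-level questions are hard for it.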

A model architecture that could look at individual characters directly, without tokenizing them, would potentially avoid this problem, but for today’s transformer architectures it is computationally infeasible.

Also consider how LLMs produce output text: they predict the next token based on the previous input and output tokens. This is effective for generating context-aware, human-like text, but unsuitable for simple tasks like counting letters. When asked how many “r”s are in the word “strawberry,” an LLM predicts the answer purely from the structure of the input sentence.
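The next-token loop can be sketched as follows. The “model” here is a hypothetical hard-coded lookup table standing in for a trained network; the point is only the shape of the process: each step appends one predicted token to the sequence, and nothing in the loop ever counts letters.

```python
def toy_next_token(context: list[str]) -> str:
    """Stand-in for a trained model: a hard-coded next-token table."""
    table = {"the": "straw", "straw": "berry", "berry": "is", "is": "red"}
    return table.get(context[-1], "<eos>")

def generate(prompt: list[str], max_new: int = 5) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new):
        nxt = toy_next_token(tokens)  # prediction depends only on prior tokens
        if nxt == "<eos>":            # stop when the model predicts end-of-sequence
            break
        tokens.append(nxt)
    return tokens

print(generate(["the"]))  # ['the', 'straw', 'berry', 'is', 'red']
```

An answer to “how many r’s are in strawberry?” is produced by this same predict-and-append loop, which is why it can confidently emit a wrong number.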

Here’s the solution:

LLMs may not be able to “think” or reason logically, but they are good at understanding structured text. An excellent example of structured text is computer code, in any of many programming languages. If you ask ChatGPT to use Python to count the number of “r”s in “strawberry,” there is a good chance you will get the right answer. When an LLM needs to perform counting or other tasks that require logical reasoning or arithmetic, the surrounding software can be designed to include prompts that ask the LLM to use a programming language to process the input query.
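The Python the model would be asked to produce is trivial, which is exactly why delegating to code works: the counting happens character by character, outside the token-prediction loop. A minimal version:

```python
word = "strawberry"

# Built-in substring count: scans the actual characters of the string.
count = word.count("r")
print(count)  # 3

# Equivalent explicit loop, for clarity.
explicit = sum(1 for ch in word if ch == "r")
print(explicit)  # 3
```

Either form gives the correct answer deterministically, because the program operates on letters rather than on tokens.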

Conclusion

Simple character-counting experiments reveal a fundamental limitation of LLMs such as ChatGPT and Claude. Despite their incredible ability to generate human-like text, write code, and answer questions, these AI models cannot yet “think” like humans. The experiment exposes the models for what they are: pattern-matching prediction algorithms, not “intelligence” capable of understanding or reasoning. However, knowing in advance what kinds of prompts work well can mitigate the problem somewhat. As AI becomes more integrated into our lives, recognizing its limitations is critical to responsible use and realistic expectations of these models.

Chinmay Jog is a senior machine learning engineer at Pangiam.

DataDecisionMakers

Welcome to the VentureBeat community!

DataDecisionMakers is a place for professionals, including technical people, who work with data to share data-related insights and innovations.

If you want to read about cutting-edge ideas, latest information, best practices, and the future of data and data technology, join DataDecisionMakers.

You might also consider contributing your own article!

Learn more at DataDecisionMakers
