What Is Multimodal Search & Why Is It Useful?
Published December 2, 2024.
When it comes to search in eCommerce, the basic formula no longer cuts it. Shoppers want to be understood on a deeper level. This is where machine learning, natural language processing and vector search come in: each helps search work the way people actually think and express themselves. Multimodal search is an extension of this. Let’s take a look at how it works.
What Is A Multimodal System?
Multimodal systems leverage information that comes in multiple modalities. These can be:
- Text
- Image
- Video
- Audio
The systems learn to associate features across these modalities for a defined task. In search, this means that relevant data can be retrieved from a database and presented as results. This data can be in any of these formats, as long as it is relevant to the query.
Visual Discovery is already being advanced in eCommerce as a way to search using images, and virtual and augmented reality take this even further. Multimodal search combines the best elements of visual search with natural language processing.
Similarly, voice commerce is becoming more popular, in step with the rise of voice notes as a form of communication. Shoppers may want to dictate their search verbally, and multimodal search also helps to facilitate this.
Why Is It Useful?
Multimodal search is useful because it hugely increases the scope of the retrieval system and opens up new ways to enter a search query. Shoppers who may not know the right terms, or who are looking for products best described by how they sound or look, can search with those inputs directly. This can revolutionize the way search functions.
Multimodal Search Use Cases
Let’s take a look at two use cases:
Use Case #1
“I want something like this image, but in red”
[Image contains a certain piece of furniture, but in a different color]
Multimodal search could analyze the image and return all relevant images and products to the searcher. The search blends the image provided with the text of the color (in this case ‘red’), which creates a much more natural and human-like search experience.
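To make the blending step concrete, here is a minimal sketch in Python. It assumes the image and the text have already been turned into vectors in the same space. The four-dimensional vectors, the catalog, and the `blend` and `rank` helpers are all made up for illustration; a real system would get its vectors from a trained model and would likely combine them in a more sophisticated way.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so no single input dominates the blend."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def blend(image_vec, text_vec, text_weight=0.5):
    """Combine an image vector and a text vector into one query vector."""
    img, txt = normalize(image_vec), normalize(text_vec)
    return [(1 - text_weight) * i + text_weight * t for i, t in zip(img, txt)]

def rank(query_vec, catalog):
    """Return product names ordered from most to least similar to the query."""
    return sorted(catalog, key=lambda name: cosine(query_vec, catalog[name]), reverse=True)

# Hypothetical toy vectors. Dimensions: [sofa, chair, red, blue]
CATALOG = {
    "red sofa": [1.0, 0.0, 1.0, 0.0],
    "blue sofa": [1.0, 0.0, 0.0, 1.0],
    "red chair": [0.0, 1.0, 1.0, 0.0],
}
image_of_blue_sofa = [1.0, 0.0, 0.0, 1.0]  # the shopper's uploaded photo
text_red = [0.0, 0.0, 1.0, 0.0]            # the words "but in red"

image_only = rank(image_of_blue_sofa, CATALOG)
blended = rank(blend(image_of_blue_sofa, text_red), CATALOG)

print("Image only:", image_only)
print("Image + text:", blended)
```

With the image alone, the closest match in this toy catalog is the blue sofa the shopper already has. Once the text vector is blended in, the red sofa moves to the top, which is the behavior the use case describes.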
Use Case #2
If a user just types “floral dress”, their search will return lots of results.
If a user types “floral dress” and attaches an image of the dress they envision, the chances of them retrieving the actual dress they are looking for are much higher.
Combining the text and the image modalities increases the opportunity for preferred results.
The Benefits Of Multimodal Search
Multimodal search bridges the gap between different search modalities. By bringing different data modalities into a joint embedding space, where a photo and a text description of the same product sit close together, the difference between them lessens. This hugely broadens the scope of what eCommerce site search can achieve.
When text is the only input that search accepts, the user’s expression is limited. With a multimodal search interface, users have more freedom of expression and range. Relevance of search results stems from similarity to available products, rather than from the medium used to search.
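A joint embedding space can be pictured with a small Python sketch. The two dictionaries below are hypothetical stand-ins for trained text and image encoders, and the file names and three-dimensional vectors are invented for illustration. The point is only that once both modalities map into the same space, a text query can be compared directly against product images.

```python
import math

# Hypothetical stand-ins for trained encoders. Each maps an input from one
# modality to a vector in the same shared 3-dimensional space.
TEXT_ENCODER = {
    "floral dress": [0.9, 0.1, 0.0],
    "leather boots": [0.0, 0.1, 0.9],
}
IMAGE_ENCODER = {
    "dress_photo.jpg": [0.8, 0.2, 0.1],
    "boots_photo.jpg": [0.1, 0.0, 0.95],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_image(text_query):
    """Find the product image whose vector is most similar to the text query."""
    query_vec = TEXT_ENCODER[text_query]
    return max(IMAGE_ENCODER, key=lambda name: cosine(query_vec, IMAGE_ENCODER[name]))

print(nearest_image("floral dress"))
print(nearest_image("leather boots"))
```

Here relevance comes purely from how similar the vectors are, not from whether the query was typed or uploaded, so the same comparison would work in the other direction, from an image query to text descriptions.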
Why It’s Important
If you continue to use single-modality search, you won’t be keeping up with the competition. The same is true if your site isn’t using the latest merchandising or personalization capabilities: users will expect more.
Google now offers a multimodal and personalized search experience, so this is what shoppers will come to expect. There are platforms that let you offer these functions, so it’s important to give shoppers this opportunity.
Many shoppers are visual, and the combination of visual and textual inputs creates the opportunity for an optimized user experience.