In this blog, we summarise Amazon's latest science paper on its multimodal search technology, which combines visual and textual data and works alongside Amazon's COSMO algorithm to improve click-through rate (CTR) by 6.08%.
Amazon has recently unveiled a groundbreaking multimodal search technology that marks a significant advancement in AI for eCommerce (read the science paper here). Traditionally, Amazon’s search engine relied heavily on keyword matching, where users typed in specific terms to find products. While this approach worked, it often fell short in delivering accurate results, especially when the search terms were vague or the products had multiple attributes. With multimodal search, Amazon now combines textual and visual data, such as product images and descriptions, in a single AI-powered search process. Working alongside Amazon’s COSMO algorithm (read more here), this lets the search system better understand the full intent behind a user’s query and return significantly more accurate and relevant results.
At the heart of Amazon’s multimodal search technology is a set of models that combine product images with detailed textual descriptions. Amazon uses a “3-tower” model and a more sophisticated “4-tower” model, both of which use AI to process different types of data and align them with one another. The 3-tower model takes into account the query image, the product image, and the product’s text description, so that the search results are not only visually similar but also contextually relevant. The 4-tower model goes further by adding a fourth input, a short text query such as a product category or brand name, which narrows the results even more precisely. This model also leverages AI-generated images to better match product listings with the user’s intent. For example, if you’re searching for a specific type of jacket, the model will consider both how the jacket looks and key descriptive terms like “waterproof” or “winter,” giving you results that match your needs more closely.
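The post does not spell out how the towers are trained to "align" their data. A common way to align separate encoders is a CLIP-style contrastive (InfoNCE) loss, so the sketch below assumes that approach: each tower's output is replaced by a made-up embedding vector, matching rows in a batch are positives, other rows are negatives, and the loss rewards high similarity between matches. The tower names, the pairings in the hypothetical `THREE_TOWER_PAIRS` and `FOUR_TOWER_PAIRS` lists, and the temperature of 0.07 are illustrative assumptions, not details taken from Amazon's paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.07) -> float:
    """Mean InfoNCE loss: row i of `anchors` should match row i of `positives`,
    and every other row in the batch serves as an in-batch negative."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true match
    return total / len(anchors)


# Hypothetical tower pairings: the post does not say which pairs the paper aligns.
THREE_TOWER_PAIRS: List[Tuple[str, str]] = [
    ("query_image", "product_image"),
    ("query_image", "product_text"),
    ("product_image", "product_text"),
]
FOUR_TOWER_PAIRS = THREE_TOWER_PAIRS + [
    ("query_text", "product_image"),
    ("query_text", "product_text"),
]


def multi_tower_loss(batch: Dict[str, List[Vector]], pairs: List[Tuple[str, str]]) -> float:
    """Average the pairwise contrastive losses over every aligned pair of towers."""
    return sum(info_nce(batch[a], batch[b]) for a, b in pairs) / len(pairs)


# Toy batch of two items; each tower's "output" is a made-up 3-d embedding.
aligned = {
    "query_image": [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1]],
    "product_image": [[0.9, 0.2, 0.0], [0.1, 0.9, 0.0]],
    "product_text": [[1.0, 0.0, 0.2], [0.0, 1.0, 0.3]],
    "query_text": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
}
print(multi_tower_loss(aligned, THREE_TOWER_PAIRS))
print(multi_tower_loss(aligned, FOUR_TOWER_PAIRS))
```

In this sketch, moving from three to four towers only means adding the query-text pairs to the list; the real image and text encoders that would produce these vectors are left out.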
The integration of AI-powered multimodal search technology is transforming the shopping experience on Amazon. By processing both images and text in the search process, Amazon’s system can deliver results that are far more aligned with what customers are actually looking for, which helps avoid the common frustration of irrelevant search results. Amazon’s COSMO algorithm further enhances this capability by providing a deeper understanding of the context and intent behind each query. Tests have shown that this new approach significantly improves click-through rates, demonstrating the effectiveness of AI for eCommerce in enhancing search relevance and user satisfaction.
For the offline experiments, Amazon collected a dataset of 100 million images from 23 million Amazon products and used it to train the 3-tower model, which aligns query images with catalog images and product text. The results showed a 4.95% improvement in image-matching click-through rate compared with traditional methods, which was attributed to the model's ability to better align visual and textual information and so reduce irrelevant or inaccurate matches. Building on the 3-tower model, Amazon then developed the 4-tower model, which adds a text query as an extra input. This model was tested on a subset of 56 million images and their associated text, organised into 400 million quadruples of data points. The 4-tower model showed an additional 1.13% improvement in CTR over the 3-tower model, owing to its ability to incorporate more detailed and specific text information and return more precise results.
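To make "aligning query images with catalog images and product text" concrete, here is a hypothetical retrieval step that assumes all towers map into one shared embedding space. A product's score is the cosine similarity of the query image to both its image and its text, plus an optional weighted term for a short text query, in the spirit of the 4-tower setup. The catalog entries, the 3-d embeddings and the `text_weight` parameter are invented for illustration; a production system would use learned embeddings and an approximate nearest-neighbour index rather than a linear scan.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_products(
    query_image: Vector,
    catalog: Dict[str, Dict[str, Vector]],
    query_text: Optional[Vector] = None,
    text_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Score every product against the query and return (product_id, score), best first."""
    scored = []
    for product_id, emb in catalog.items():
        # 3-tower-style signal: query image vs. product image and product text.
        score = cosine(query_image, emb["image"]) + cosine(query_image, emb["text"])
        # 4-tower-style signal: an optional short text query vs. the product text.
        if query_text is not None:
            score += text_weight * cosine(query_text, emb["text"])
        scored.append((product_id, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up embeddings; the three dimensions loosely mean "jacket", "waterproof", "winter".
catalog = {
    "waterproof-winter-jacket": {"image": [0.9, 0.3, 0.3], "text": [0.7, 0.5, 0.5]},
    "light-summer-jacket": {"image": [0.95, 0.0, 0.0], "text": [0.9, 0.0, 0.1]},
    "rain-boots": {"image": [0.1, 0.9, 0.0], "text": [0.1, 0.9, 0.1]},
}
jacket_photo = [1.0, 0.0, 0.0]
waterproof_query = [0.0, 1.0, 0.0]

print(rank_products(jacket_photo, catalog))                    # image only
print(rank_products(jacket_photo, catalog, waterproof_query))  # image + "waterproof"
```

With the photo alone, the plain summer jacket ranks first because it looks most like the query image; adding the "waterproof" text query lifts the waterproof winter jacket to the top, mirroring the jacket example above. How much the text query should count (`text_weight` here) is a tuning choice this sketch does not try to answer.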
In addition to the offline experiments, Amazon conducted extensive online A/B testing to evaluate the real-world impact of these models on customer behavior. The A/B tests ran the new models in parallel with the existing search system and tracked how users interacted with the search results. In total, Amazon's new multimodal search technology improved click-through rate (CTR) by 6.08%. These results indicate that the technology significantly improves the relevance and accuracy of search results, making it a powerful tool in the eCommerce AI landscape.
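CTR is clicks divided by impressions, and an A/B lift is usually reported as the relative change in CTR between the control and treatment arms. The sketch below assumes that definition (the post does not say whether its percentages are relative or absolute) and uses invented click and impression counts. It also checks how the two model-level gains combine: the reported 6.08% total equals 4.95% + 1.13%, whereas two relative lifts applied one after the other would compound to roughly 6.14%.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: the fraction of impressions that received a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(control_ctr: float, treatment_ctr: float) -> float:
    """Relative change of the treatment CTR over the control CTR (0.05 means +5%)."""
    return (treatment_ctr - control_ctr) / control_ctr


# Hypothetical A/B counts, chosen only to illustrate the arithmetic.
control = ctr(clicks=10_000, impressions=500_000)    # 2.0000% CTR
treatment = ctr(clicks=10_608, impressions=500_000)  # 2.1216% CTR
print(f"lift: {relative_lift(control, treatment):.2%}")

# Two ways to combine the 3-tower gain (4.95%) and the extra 4-tower gain (1.13%).
additive = 4.95 + 1.13                                # matches the reported 6.08% total
compounded = ((1 + 0.0495) * (1 + 0.0113) - 1) * 100  # about 6.14% if the lifts compound
print(f"additive: {additive:.2f}%, compounded: {compounded:.2f}%")
```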
This new era of multimodal search technology, underpinned by advanced eCommerce AI, is setting a new standard for online shopping. By integrating the best of both visual and textual search methods with AI, Amazon is not only enhancing the user experience but also providing sellers with powerful new tools to reach their customers more effectively.