Machine learning for product matching: the fashion use case

Thursday 15th

12:20 | 13:00

Theatre 20


Keywords defining the session:

- Machine learning

- Product matching

- Use case

Takeaway points of the session:

- Learn which technologies, machine learning algorithms and techniques can be used to implement an accurate end-to-end product matching pipeline.

- Learn the challenges, potential solutions, heuristics applied and lessons learned when deploying a real product matching system at scale.


The retail e-commerce market is booming. E-retail revenues are projected to grow to 4.88 trillion US dollars in 2021. As a result, the e-commerce landscape is going through a technological evolution in which big data plays a central role. Big data opens up new opportunities for retailers to strengthen their market position, offering business advantages when huge data volumes are meaningfully gathered and analysed.

One decisive factor in this scenario is the customer, whose habits have changed in recent years. Aggregated shopping tools and price comparison portals have become the norm. Customers can now research and compare products extensively online before purchasing, and they are always just one click away from your competitors. Online retailers are therefore being pushed to introduce data-analysis platforms that give a transparent market overview, where competitors and prices can be monitored at scale. In fact, price comparison has become key in retail market analysis, and there is one underlying piece of technology that makes it possible: product matching.

Product matching is the problem of identifying identical product entities across e-commerce sites. It is central to answering key questions in retail analytics, since it enables companies to understand competitors' pricing strategies, track stock availability and compare product life cycles. Product matching can be seen as a relatively simple problem in some domains, such as electronics, where software can in principle easily compare the model numbers on a TV or tablet. It is much more difficult in markets like clothing, where universal codes are not common and there is a lot of variation in styles, materials and colors.
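As a rough illustration of why identifier-based domains are the easier case, the following minimal sketch compares two listings by a normalized model number. The function names and the model numbers are hypothetical and are not taken from the system presented in the talk.

```python
import re


def normalize_model_number(raw: str) -> str:
    """Uppercase the string and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def same_product(model_a: str, model_b: str) -> bool:
    """Treat two listings as the same product if their normalized codes agree."""
    return normalize_model_number(model_a) == normalize_model_number(model_b)


# Made-up model numbers, written the way two different shops might list them.
print(same_product("AB-1234 X", "ab1234x"))   # same code, different formatting
print(same_product("AB-1234 X", "AB-1235 X"))  # different code
```

A fashion listing usually offers no such shared code, so a comparison like this has nothing reliable to key on, which is where the techniques discussed in the talk come in.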

In this talk we will present a real product matching use case: matching products in the fashion domain. We will detail and discuss the different elements required to implement an accurate product matching solution that automatically detects duplicates in retail catalogs. Specifically, we will address:

– The difficulties of extracting structured product data from fashion e-commerce sites. In retail, there is no requirement about how a product description should look or what information it should contain. The completeness of the product specifications and the taxonomies used to organize the products differ across e-shops. We will see which technologies can map and normalize product data to categorise fashion products into a unified and standardized taxonomy.

– The machine learning algorithms involved, and specifically the role of deep learning. Deep learning represents the biggest trend in machine learning in recent years, and for good reason. It is a machine learning technique that is capable of learning excellent representations directly from raw data. Deep learning has already provided state-of-the-art results in problems dealing with unstructured data, and it is the de facto solution in the industry for dealing with text and images. We will explain how this technique has become a game changer when comparing product data and why it is fueling current product matching engines.

– Tips and advice for training the matching algorithm and determining its effectiveness. We will discuss what must be taken into account when generating product matching datasets and which heuristics may be useful in the model training pipeline. We will also review which metrics are appropriate for evaluating the performance of a matching algorithm.

– The challenges and problems of deploying product matching at scale. Because every product may in principle need to be compared with every other, product matching can become an intractable problem as catalogs grow. We will discuss the lessons learned when moving our system into production and how to perform adequate QA.
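
The last two points can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the pipeline presented in the talk: it uses blocking (a common way to avoid comparing every product with every other) and computes precision and recall over predicted match pairs. The catalogs, the field names (`id`, `brand`, `category`) and the choice of blocking key are all hypothetical.

```python
from collections import defaultdict


def candidate_pairs(catalog_a, catalog_b, key):
    """Blocking: only pair items that share the same blocking key."""
    blocks = defaultdict(list)
    for item in catalog_b:
        blocks[key(item)].append(item)
    for item in catalog_a:
        for other in blocks.get(key(item), []):
            yield item["id"], other["id"]


def precision_recall(predicted, truth):
    """Precision and recall over sets of (id_a, id_b) match pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy catalogs from two shops.
catalog_a = [
    {"id": "a1", "brand": "acme", "category": "dress"},
    {"id": "a2", "brand": "acme", "category": "shoes"},
    {"id": "a3", "brand": "zeta", "category": "dress"},
]
catalog_b = [
    {"id": "b1", "brand": "acme", "category": "dress"},
    {"id": "b2", "brand": "zeta", "category": "dress"},
    {"id": "b3", "brand": "zeta", "category": "coat"},
]

# Assumed blocking key: brand plus category.
pairs = list(candidate_pairs(catalog_a, catalog_b,
                             key=lambda p: (p["brand"], p["category"])))
exhaustive = len(catalog_a) * len(catalog_b)
print(f"{len(pairs)} candidate pairs instead of {exhaustive}")

# For illustration only, treat every candidate pair as a predicted match
# and score it against a made-up ground truth.
precision, recall = precision_recall(pairs, [("a1", "b1")])
print(f"precision={precision}, recall={recall}")
```

In a real system a trained matcher would score each candidate pair rather than accepting all of them, and the blocking key itself can cost recall if true matches land in different blocks, so both stages need to be evaluated.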