A Product Grouping Service

What is an E-Commerce Marketplace?

As opposed to traditional e-commerce websites, marketplaces are sites that allow you to buy products from third-party merchants in addition to, or instead of, products sold directly by the site itself. Jet follows this model, selling its own products as well as products from many other merchants. So do Amazon, Walmart, and eBay, among many others.

How do Marketplaces Work?

From a catalog perspective, traditional e-commerce sites are relatively simple. They maintain a list of relatively static products with a single source of pricing, product data, shipping rules, and inventory. Marketplaces are much more complex. Product information is often aggregated from different merchants and multiple merchants may sell the same products or variations of the same “master product.” A master product is an item that varies on a particular attribute or attributes. For instance, clothing usually varies on color and size. It is the job of the grouping service to identify and match variant products that are children of the same master product. When merchants sell the same product, the marketplace has to decide which merchant(s) to allow to sell a product to the customer and must keep track of inventory, pricing, shipping, and product information from multiple sources.

What is a Grouping Service and Why do We Need One?

As data sources flow into a marketplace, they are ingested, matched, and categorized into products: entities which contain all the information about a given item. Many merchants call these SKUs, and it is usually necessary to store this SKU data in a “staging” area until the information for a given SKU is in a completed state. Once enough data is aggregated for a given SKU, it can be promoted into the catalog and sold to users. However, these entities are not the complete (master) products you see on an e-commerce site like Jet’s. Rather, they are variants of products. For example, if we sell a shirt on Jet, one of these SKUs would represent a small green variant of that shirt, and another a large red one.

But for many of the products on an e-commerce site, the ideal user experience is for all of these variants to be grouped together into a “master product” so that they can be represented by a single tile in the search results:

[Image: product array]

And a single product detail page with drop-down boxes or other user interface widgets which allow customers to choose between the variants:

[Image: sizes]

The above shows three different SKUs grouped together to create the proper user experience. In this context, a grouping service analyzes the data provided to a marketplace from each of the data sources and uses this along with manual intervention to match and group variant SKUs into groups representing master products. This operation is somewhat tricky and complex (and very interesting) and today we’re going to show one possible implementation to you!
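To make the relationship between variant SKUs and a master product concrete, here is a minimal Python sketch. The `Sku` and `MasterProduct` types, their field names, and the sample values are assumptions made for illustration; they are not Jet's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Sku:
    """One sellable variant, e.g. a small green shirt."""
    sku_id: str
    attributes: dict  # e.g. {"size": "S", "color": "Green"}


@dataclass
class MasterProduct:
    """A group of variant SKUs shown as one search tile and one detail page."""
    group_id: str
    dimensions: tuple  # attribute dimensions the variants differ on
    skus: list = field(default_factory=list)

    def options(self):
        """Values to offer per dimension, as a drop-down box would list them."""
        return {d: sorted({s.attributes[d] for s in self.skus})
                for d in self.dimensions}


# Three hypothetical SKUs grouped into one master product.
shirt = MasterProduct("group-1", ("size", "color"), [
    Sku("sku-1", {"size": "S", "color": "Green"}),
    Sku("sku-2", {"size": "L", "color": "Red"}),
    Sku("sku-3", {"size": "L", "color": "Green"}),
])
print(shirt.options())  # {'size': ['L', 'S'], 'color': ['Green', 'Red']}
```

The search results need only the `MasterProduct`, while the detail page uses `options()` to build one selector per dimension.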

What Operations Does a Grouping Service Perform?

Grouping SKUs

Grouping can be initiated manually via commands from users, or it can be triggered by deleting or merging groups, by changes to a SKU, or by the addition of a new SKU to the catalog. Before SKUs are grouped, their attribute types, attribute values, identifiers, brand, and root categorization are evaluated, matched, and validated against potential groups. This evaluation may also result in new groups being created, SKUs being removed from old groups, or groups being merged.
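The evaluation step can be sketched roughly as follows. This is a simplified illustration, not the service's real matching logic: the `Sku` and `Group` shapes, the field names, and the rule that a single shared identifier is enough are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sku:
    sku_id: str
    brand: str
    root_category: str
    identifiers: frozenset  # identifiers supplied by the data sources
    attributes: dict


@dataclass
class Group:
    group_id: str
    brand: str
    root_category: str
    identifiers: set
    dimensions: tuple  # attribute dimensions the group varies on


def matches(sku, group):
    """True when the SKU agrees with the group on every grouping criterion."""
    return (
        sku.brand == group.brand
        and sku.root_category == group.root_category
        and bool(sku.identifiers & group.identifiers)
        and all(d in sku.attributes for d in group.dimensions)
    )


def candidate_groups(sku, groups):
    """All groups the SKU could join; an empty list means it founds a new one."""
    return [g for g in groups if matches(sku, g)]


groups = [
    Group("g1", "Acme", "Clothing", {"src-1"}, ("size", "color")),
    Group("g2", "Acme", "Shoes", {"src-2"}, ("size",)),
]
new_sku = Sku("sku-9", "Acme", "Clothing", frozenset({"src-1"}),
              {"size": "M", "color": "Blue"})
print([g.group_id for g in candidate_groups(new_sku, groups)])  # ['g1']
```

If more than one group comes back, that is a hint that the candidates themselves may need to be merged.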

Merging Groups

Sometimes multiple groups are created for a set of SKUs that should all be grouped together. There are several scenarios under which this can happen:

  1. When the identifiers for one group of SKUs don’t overlap with those of another group of SKUs, even though they all represent variants of the same product
  2. When the data for one SKU or group of SKUs is initially wrong and is then updated to match another group of SKUs

These situations result from missing or incorrect information provided from our data sources. The problems are rectified when additional or corrected information is provided or when manual intervention occurs.

Handling SKU Updates

Changes to SKUs are monitored and evaluated by the grouping service as they occur. Changes that are irrelevant to grouping are ignored, but modifications or additions to a SKU’s identifiers, attributes, brand, or categorization are evaluated. Attribute, brand, and categorization changes may result in a SKU joining or being removed from a group. Changes to identifiers may result in groups being merged or ungrouped SKUs joining groups.
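One way to picture this filtering is to compare two snapshots of a SKU and react only to the fields that matter for grouping. The snapshot shape (plain dictionaries), the field names, and the action names below are hypothetical.

```python
# Fields whose changes can affect group membership (assumed names).
GROUPING_FIELDS = {"identifiers", "attributes", "brand", "categorization"}


def grouping_relevant_changes(old, new):
    """Return the grouping-relevant fields that differ between two snapshots."""
    return {f for f in GROUPING_FIELDS if old.get(f) != new.get(f)}


def actions_for(changed):
    """Translate changed fields into follow-up work for the grouping service."""
    actions = []
    if changed & {"attributes", "brand", "categorization"}:
        actions.append("revalidate_membership")   # SKU may join or leave a group
    if "identifiers" in changed:
        actions.append("look_for_merge_or_join")  # groups may merge
    return actions


before = {"brand": "Acme", "categorization": "Shirts",
          "attributes": {"size": "S"}, "identifiers": {"src-1"}, "price": 10}
price_only = {**before, "price": 12}
new_identifier = {**before, "identifiers": {"src-1", "src-2"}}

print(actions_for(grouping_relevant_changes(before, price_only)))      # []
print(actions_for(grouping_relevant_changes(before, new_identifier)))  # ['look_for_merge_or_join']
```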

Deleting Groups

Group deletions occur either in response to manual commands or automatically when there are no longer any SKUs in a group. Any SKUs remaining in the group when it is deleted will automatically be regrouped, if possible.

Other Features of the Grouping Service

Attribute Effects on Group Formation

Groups are created because all of their member SKUs vary on the same attribute dimensions (in addition to sharing brand, identifiers, and categorization). When a group is created, the initial SKU that creates the group provides the attribute dimensions that the rest of the group has to match. For example, if a group is formed by a shirt whose grouping information identifies it as varying on size and color dimensions, then all other shirts that join that group must also have appropriate attributes for size and color (in addition to matching the other information for the group).

Variation Refinements

When a marketplace like Jet ingests data sources, they often have related but different attribute types. However, even if a SKU contains a different attribute type from the SKU that created the group, we want them to match if that attribute type is in the same family. To do this we could use a concept called variation refinements. These are sets of attributes for each dimension that a group varies on. If a SKU forms a group and its attribute information matches a particular variation refinement (e.g. size/color), then we use this to identify the attributes that the group varies on. This allows other SKUs to join the group even when their attributes don’t match exactly, as long as each has attributes for every dimension (e.g. size/color) that the SKU that created the group has. It also allows the set of attributes for each dimension (or even the number of dimensions) to change over time so that membership can be expanded or contracted based on the attributes contained in the set for each dimension.
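As a rough sketch, a variation refinement can be modeled as a mapping from each dimension to the family of attribute types that count toward it. The attribute type names in `REFINEMENTS` are invented for illustration, as are the function names.

```python
# Hypothetical refinements: dimension -> family of equivalent attribute types.
REFINEMENTS = {
    "size": {"size", "clothing_size", "shoe_size"},
    "color": {"color", "colour", "color_name"},
}


def dimension_values(attributes, refinements=REFINEMENTS):
    """Map a SKU's raw attribute types onto dimensions via their families."""
    values = {}
    for attr_type, value in attributes.items():
        for dimension, family in refinements.items():
            if attr_type in family:
                values.setdefault(dimension, value)
    return values


def can_join(attributes, group_dimensions):
    """A SKU qualifies if it has a value for every dimension of the group."""
    values = dimension_values(attributes)
    return all(d in values for d in group_dimensions)


founder = {"size": "M", "color": "Blue"}
group_dimensions = tuple(dimension_values(founder))  # ("size", "color")

print(can_join({"clothing_size": "L", "colour": "Red"}, group_dimensions))  # True
print(can_join({"clothing_size": "L"}, group_dimensions))                   # False
```

Because membership is decided per family rather than per exact attribute type, adding a new attribute type to a family in `REFINEMENTS` widens membership without touching existing groups.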

Automatic Regrouping

Sometimes events occur which result in SKUs being removed from their current group. This is usually caused by changes to a SKU’s information that disqualify it from its current group. When this happens, we want to reevaluate the SKU for membership in another group, so the grouping service automatically reevaluates each SKU from scratch when it is removed from a group, regardless of the reason. This is especially useful in cases where a brand or category changes for an entire group of products. Such a change results in each of the SKUs being removed from the original group, but as they are reevaluated, all of them automatically join a new group which is created with the appropriate metadata.
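The brand-change scenario can be sketched in a few lines. To keep the example small, groups here are keyed only by brand and category, which is a deliberate simplification: real matching also considers identifiers and attributes, and the `regroup` function is hypothetical.

```python
def regroup(sku_id, brand, category, groups):
    """Evaluate a SKU from scratch: leave the old group, join or found a new one.

    `groups` maps a (brand, category) key to the set of SKU ids in that group.
    """
    for key in list(groups):
        groups[key].discard(sku_id)
        if not groups[key]:
            del groups[key]  # a group with no SKUs left is deleted
    groups.setdefault((brand, category), set()).add(sku_id)
    return (brand, category)


groups = {("Acme", "Shirts"): {"sku-1", "sku-2", "sku-3"}}

# The brand of every SKU in the group is corrected from "Acme" to "Acme Co".
for sku_id in ["sku-1", "sku-2", "sku-3"]:
    regroup(sku_id, "Acme Co", "Shirts", groups)

print(groups == {("Acme Co", "Shirts"): {"sku-1", "sku-2", "sku-3"}})  # True
```

The first SKU to be reevaluated founds the new group, the rest join it, and the original group disappears once it is empty.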

Duplicate Attribute Value Validation

When a SKU applies for membership in a group, it first has to be validated to ensure it meets the criteria to join the group. If it passes that validation, we do further checks to ascertain that the values that this new SKU has for the variation dimensions do not duplicate those of any other SKU that is currently in the group. So if we have a group that varies on color and we already have a SKU with a value of “Red” then any future SKUs that also have a value of “Red” will be prevented from joining the group. This is because duplicate SKUs are an indication that SKUs within the group should be merged or that the group doesn’t vary on enough dimensions. Allowing these SKUs to join also results in a poor user experience on Jet.com.
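The duplicate check amounts to comparing the tuple of values a SKU has on the group's dimensions against the tuples of the existing members. A minimal sketch, with hypothetical function names and sample data:

```python
def variation_key(attributes, dimensions):
    """The values a SKU has on the group's variation dimensions, in order."""
    return tuple(attributes[d] for d in dimensions)


def duplicate_of(candidate, members, dimensions):
    """Return the id of a member with identical dimension values, else None."""
    key = variation_key(candidate, dimensions)
    for sku_id, attributes in members.items():
        if variation_key(attributes, dimensions) == key:
            return sku_id
    return None


members = {"sku-1": {"color": "Red"}, "sku-2": {"color": "Blue"}}

print(duplicate_of({"color": "Red"}, members, ("color",)))    # sku-1
print(duplicate_of({"color": "Green"}, members, ("color",)))  # None
```

Returning the conflicting SKU id, rather than a bare boolean, gives whoever reviews the rejection a starting point for deciding whether the two SKUs should be merged.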

Automatic Attribute Dimension Expansion

As we explained in the previous section, duplicate values for SKUs that try to group together may indicate that the group doesn’t vary on enough dimensions. For example, if we have a group that varies on color and two SKUs show up with the same color but also vary on size, we can assume that the group should vary on color and size. Towards this end, if a SKU attempts to join a group and fails because it contains duplicate values on the attribute dimensions that the group varies on, we automatically attempt to increase the number of dimensions that the group varies on in certain situations. If increasing the attribute dimensions allows the rejected SKU to join the group and all existing SKUs in the group to remain grouped then the group’s attributes are expanded and the new SKU is allowed to join the group.

Group Source Aggregation

As mentioned above, group sources are the identifiers provided with SKU data sources which link SKUs as sibling variants of the same master product. As data sources are ingested into the system, the group source identifiers associated with SKUs can change. This is extremely useful because a new identifier may link the group a SKU currently belongs to with another group, allowing the grouping service to merge them. By retaining all of the identifiers associated with a SKU, regardless of priority, and continually trying to match them against the identifiers of other groups, we can maximize our ability to merge groups and ensure that SKUs converge into larger and larger groups.
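Merging on shared identifiers maps naturally onto a union-find (disjoint-set) structure. The sketch below uses that technique purely as an illustration; the source does not say how the real service stores or merges groups, and the class ignores the brand, category, and attribute validation described earlier.

```python
class GroupIndex:
    """Union-find over SKU ids: SKUs sharing any identifier form one group."""

    def __init__(self):
        self.parent = {}  # sku_id -> parent sku_id
        self.owner = {}   # identifier -> a sku_id that carries it

    def _find(self, sku_id):
        self.parent.setdefault(sku_id, sku_id)
        while self.parent[sku_id] != sku_id:
            # Path halving keeps lookups fast as groups grow.
            self.parent[sku_id] = self.parent[self.parent[sku_id]]
            sku_id = self.parent[sku_id]
        return sku_id

    def add_identifier(self, sku_id, identifier):
        """Record an identifier for a SKU; earlier identifiers are retained."""
        root = self._find(sku_id)
        if identifier in self.owner:
            other = self._find(self.owner[identifier])
            self.parent[other] = root  # the two groups merge
        else:
            self.owner[identifier] = sku_id

    def same_group(self, a, b):
        return self._find(a) == self._find(b)


index = GroupIndex()
for sku_id, identifier in [("a", "X"), ("b", "X"), ("c", "Y"), ("d", "Y")]:
    index.add_identifier(sku_id, identifier)
print(index.same_group("a", "c"))  # False: two separate groups

index.add_identifier("b", "Y")     # an update links SKU b to the other group
print(index.same_group("a", "c"))  # True: the groups have merged
```

Because identifiers are only ever added, groups in this sketch can only grow, which mirrors the convergence described above. Splitting a group again would need a different structure.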

Error Logging

As we attempt to group SKUs together, many issues can occur that prevent us from succeeding. Various pieces of data can be missing or invalid, conflicts can occur between brands or categorization, duplicate values show up, etc. These situations are often indications of invalid data, duplicate SKUs, duplicate groups, or other problems that need to be investigated and corrected. So we identify and log all errors, their causes, and the related SKUs and groups so that they can be reviewed and action can be taken.
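A structured failure record makes this kind of review easier than free-text log lines. The sketch below is one possible shape: the reason codes, field names, and the in-memory `failures` list (standing in for durable storage) are assumptions.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Optional

log = logging.getLogger("grouping")


class FailureReason(Enum):
    MISSING_DATA = "missing_data"
    BRAND_CONFLICT = "brand_conflict"
    CATEGORY_CONFLICT = "category_conflict"
    DUPLICATE_VALUES = "duplicate_values"


@dataclass
class GroupingFailure:
    reason: FailureReason
    sku_id: str
    group_id: Optional[str]  # None when no candidate group was found
    detail: str


def record_failure(failure, sink):
    """Keep the failure for later review and emit a log line for operators."""
    sink.append(failure)
    log.warning("grouping failed: %s sku=%s group=%s (%s)",
                failure.reason.value, failure.sku_id,
                failure.group_id, failure.detail)


failures = []
record_failure(
    GroupingFailure(FailureReason.DUPLICATE_VALUES, "sku-3", "group-1",
                    "color=Red already present on sku-1"),
    failures,
)
```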

Some Examples

Automatic Attribute Dimension Expansion

Below is a group with 2 SKUs that varies on size. A third SKU tries to join the group but it has the same size as a SKU that is already in the group:

[Image: Group Needs To Be Expanded]

So the grouping service tries to expand the attribute dimensions that the group varies on. In this case it tries varying on color in addition to size. Now each of the SKUs is unique across the two dimensions, since the two size Large SKUs are different colors (Red and Green):

[Image: Group Expanded]

The end result is a group containing all 3 SKUs!
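The same example can be reproduced in code. This is a sketch under assumptions: the size and color of the first SKU (Small, Red) are made up, the function names are hypothetical, and the search simply tries the smallest set of extra dimensions first, whereas the real service only attempts expansion in certain situations.

```python
from itertools import combinations


def is_unique(skus, dimensions):
    """True when no two SKUs share the same values on every dimension."""
    keys = [tuple(s[d] for d in dimensions) for s in skus]
    return len(keys) == len(set(keys))


def expand_dimensions(members, candidate, dimensions):
    """Find the smallest set of extra dimensions that makes every SKU unique.

    Returns the expanded dimension tuple, or None if no expansion works.
    """
    everyone = members + [candidate]
    # Only attributes that every SKU carries can become group dimensions.
    shared = set.intersection(*(set(s) for s in everyone)) - set(dimensions)
    for n in range(1, len(shared) + 1):
        for extra in combinations(sorted(shared), n):
            expanded = tuple(dimensions) + extra
            if is_unique(everyone, expanded):
                return expanded
    return None


members = [{"size": "Small", "color": "Red"}, {"size": "Large", "color": "Red"}]
candidate = {"size": "Large", "color": "Green"}

print(is_unique(members + [candidate], ("size",)))       # False: two Larges
print(expand_dimensions(members, candidate, ("size",)))  # ('size', 'color')
```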

Group Source Aggregation

In the example below we have 4 SKUs in 2 groups that should be grouped together but aren’t because their identifiers don’t match.

But now one of the SKUs receives an update with a new identifier, which is aggregated with its existing identifier (regardless of priority). This new identifier is then evaluated and matches the identifier for the other group:

[Image: Aggregate Group Source (1)]

Because the two groups now share an identifier, they merge:

[Image: One group]

What technologies might we use?

The grouping service is part of a larger application needed by a marketplace: a matching and taxonomy engine. The marketplace needs this engine to generate the SKUs themselves and tag them with appropriate identifiers and other data. At Jet, this matching and taxonomy engine is called Nova and is a constellation of microservices written in F# and running in the Azure cloud. Different microservices read from and write to several different storage and messaging technologies:

  • EventStore: used to immutably store events that aggregate to form SKUs
  • Kafka: Jet’s message bus of choice
  • Redis: used to cache mappings and other data in order to speed up the transfer of information within Nova and to downstream systems like search
  • Azure Queues: used to receive and send commands from other services and manually from users
  • Azure SQL: the source of truth for information about first-class citizen entities, identifiers, mappings, and failures
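The idea of events that aggregate to form a SKU can be shown with a plain fold over an event list. Nova itself is written in F#; this Python sketch uses invented event names and shapes and does not call any EventStore client API.

```python
from functools import reduce


def apply(state, event):
    """Apply one event to the current SKU state and return the new state."""
    kind, data = event
    if kind == "SkuCreated":
        return {"sku_id": data["sku_id"], "attributes": {}, "identifiers": set()}
    if kind == "AttributeSet":
        attributes = {**state["attributes"], data["name"]: data["value"]}
        return {**state, "attributes": attributes}
    if kind == "IdentifierAdded":
        identifiers = state["identifiers"] | {data["identifier"]}
        return {**state, "identifiers": identifiers}
    return state  # event types this sketch does not know are ignored


def snapshot(events):
    """Fold an immutable event stream into the current picture of a SKU."""
    return reduce(apply, events, None)


events = [
    ("SkuCreated", {"sku_id": "sku-1"}),
    ("AttributeSet", {"name": "size", "value": "L"}),
    ("AttributeSet", {"name": "color", "value": "Red"}),
    ("IdentifierAdded", {"identifier": "src-1"}),
]
sku = snapshot(events)
print(sku["attributes"])  # {'size': 'L', 'color': 'Red'}
```

The events are never modified; replaying them always yields the same snapshot, and it is the snapshot that downstream consumers read.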

Next Steps

The service described above meets many of a marketplace’s needs with regard to grouping. That said, there are still many other things that can be explored in this area:

  • Normalize shared data from the variant SKUs up into the master product group to avoid duplication and improve the user experience (e.g. group brand)
  • Make the group a first-class citizen with event sourcing, snapshotting, and standardized commands and events
  • Improve concurrency, which is tricky because the source of truth is stored in related tables rather than as snapshots and because of the high level of contention we see when all of a group’s SKUs are updated simultaneously
  • Smart group merging: the ability to merge groups that vary on different attribute dimensions by using heuristics to choose the most effective attribute dimensions for the merged group to vary on. Candidates include:
    • The dimensions of the group with the greater number of dimensions, as this will avoid more duplicate values
    • The dimensions of the group with the smaller number of dimensions, to allow the most SKUs to join even if they don’t have attribute values for all dimensions of the other group
    • An expanded set of dimensions encompassing the dimensions of both groups
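The three candidate heuristics for smart merging are easy to state in code, which makes their trade-offs easier to compare. The function and strategy names below are hypothetical, and ties between equally sized groups are broken in favor of the first group.

```python
def merged_dimensions(a, b, strategy):
    """Pick the dimensions for a merged group from two groups' dimensions."""
    if strategy == "more":    # fewer duplicate values after the merge
        return a if len(a) >= len(b) else b
    if strategy == "fewer":   # lets the most SKUs qualify for the group
        return a if len(a) <= len(b) else b
    if strategy == "union":   # every dimension of both groups, a's first
        return a + tuple(d for d in b if d not in a)
    raise ValueError(f"unknown strategy: {strategy}")


shirts_a = ("size",)
shirts_b = ("color", "size")

print(merged_dimensions(shirts_a, shirts_b, "more"))   # ('color', 'size')
print(merged_dimensions(shirts_a, shirts_b, "fewer"))  # ('size',)
print(merged_dimensions(shirts_a, shirts_b, "union"))  # ('size', 'color')
```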

2 comments

  1. Thank you for the great post (and for the great blog as well) 🙂

    A couple of questions:

    1. Do you run EventStore in Azure? Anything you would like to share?
    2. You wrote: “Azure SQL– *the source of truth* for information about first-class citizen entities, identifiers, mappings, and failures”. I would expect the event streams to be the source of truth, not the db. Am I missing something?

    1. 1. Yes, we run EventStore hosted in Azure.
      2. I guess the terminology is open to interpretation. The snapshots stored in Azure SQL are our source of truth in that they are used by the rest of the system as up to date and accurate pictures of our products. The events are the data that is aggregated to generate them so they are also a “source of truth” but are raw and require business logic to aggregate.
