Product repositories: how to reconcile them?
Simplify repository management with data reconciliation.
Obstacles to reconciling product repositories
Our customer, a major player in the consumer credit market, wanted to offer a one-click online financing plan to all buyers of used vehicles.
Most of the partner sites selling used vehicles use Argus (and sometimes JATO) as their automotive reference repository, but the algorithms that generate our customer's financing plans were based on a different repository: EUROTAX.
To let private customers receive their financing plan within seconds, each entry in one repository had to be matched to exactly one entry in the other. The repositories shared no common key, and differences in how they described vehicles made this matching non-trivial.
Reconcile product repositories with Tale of Data
Use of special joins designed by Tale of Data (approx. 100,000 entries per repository):
- Creation, in each repository, of a composite key for every entry by concatenating several fields (e.g. model, long version label, number of doors, year of first registration).
- Each composite key is matched with the composite key(s) of the other repository with which it shares the most "words". These words are weighted according to their rarity in the corpus of composite keys (principle: the rarer a shared word is in the corpus, the more credible the match).
- Elimination of multiple matches using numerical "arbitration" fields (such as price incl. VAT or CO2 emission level). These fields are not standardized enough to be included in the composite key, but they are very effective for choosing between candidates when a vehicle from one repository matches several vehicles in the other: the candidate with the closest price and CO2 emissions is kept.
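The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Tale of Data's implementation: the field names (`model`, `version`, `doors`, `year`, `price`, `co2`) and the sample records are hypothetical, the rarity weighting uses a standard inverse-document-frequency formula, log(N / df), as one plausible reading of "weighted according to their rarity", and arbitration simply sums the relative gaps on price and CO2.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical field names; in the project, these were chosen with business experts.
KEY_FIELDS = ("model", "version", "doors", "year")
ARBITRATION_FIELDS = ("price", "co2")


def composite_key(record):
    """Concatenate the key fields into one lowercase string."""
    return " ".join(str(record[f]) for f in KEY_FIELDS).lower()


def words(key):
    """Split a composite key into its set of 'words' (runs of word characters)."""
    return set(re.findall(r"\w+", key))


def idf_weights(keys):
    """Assumed IDF weighting: log(N / number of keys containing the word)."""
    doc_freq = Counter(w for key in keys for w in words(key))
    return {w: math.log(len(keys) / df) for w, df in doc_freq.items()}


def best_candidates(src_words, index, weights):
    """Return target indices whose keys share the highest total word weight with the source."""
    scores = defaultdict(float)
    for w in src_words:
        weight = weights.get(w, 0.0)
        if weight <= 0:  # a word present in every key carries no information
            continue
        for j in index.get(w, ()):
            scores[j] += weight
    if not scores:
        return []
    top = max(scores.values())
    return sorted(j for j, s in scores.items() if math.isclose(s, top))


def arbitration_distance(a, b):
    """Sum of relative gaps on the arbitration fields (assumes positive numeric values)."""
    return sum(abs(a[f] - b[f]) / max(a[f], b[f]) for f in ARBITRATION_FIELDS)


def reconcile(source, target):
    """Map each source index to at most one target index."""
    src_keys = [composite_key(r) for r in source]
    tgt_keys = [composite_key(r) for r in target]
    weights = idf_weights(src_keys + tgt_keys)

    # Inverted index: word -> target entries containing it, so not all pairs are compared.
    index = defaultdict(list)
    for j, key in enumerate(tgt_keys):
        for w in words(key):
            index[w].append(j)

    matches = {}
    for i, key in enumerate(src_keys):
        candidates = best_candidates(words(key), index, weights)
        if candidates:
            # Multiple matches: keep the candidate closest on price and CO2.
            matches[i] = min(candidates, key=lambda j: arbitration_distance(source[i], target[j]))
    return matches


# Hypothetical sample records.
source = [
    {"model": "Clio", "version": "1.5 dCi 90 Business", "doors": 5, "year": 2018, "price": 14500, "co2": 95},
    {"model": "208", "version": "1.2 PureTech 82 Active", "doors": 3, "year": 2017, "price": 9800, "co2": 104},
]
target = [
    {"model": "CLIO", "version": "Business dCi 90", "doors": 5, "year": 2018, "price": 16900, "co2": 110},
    {"model": "CLIO", "version": "Business dCi 90", "doors": 5, "year": 2018, "price": 14700, "co2": 96},
    {"model": "208", "version": "Active PureTech 82", "doors": 3, "year": 2017, "price": 9900, "co2": 104},
]

matches = reconcile(source, target)
print(matches)  # {0: 1, 1: 2}
```

In the sample, the first source entry ties between two target entries with identical composite keys, and arbitration keeps the one whose price and CO2 are closest. The inverted index avoids scoring every pair of entries, which matters at around 100,000 entries per repository.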
The benefits of reconciling product repositories
Thanks to the involvement of business experts with in-depth knowledge of automotive reference repositories, the fields making up the composite key, as well as the arbitration fields, were carefully selected.
The rate of unique matches rose from:
- 55% with the first approach, in which the customer's Data Scientists coded string-matching algorithms in Python, whose results the business teams repeatedly rejected over several months,
- to 95% with the composite-key approach, combined with business involvement, proposed by Tale of Data.
Since the remaining 5% of multiple matches showed no significant difference in the generated financing plan, the customer's business teams validated the Tale of Data approach after one week.