
Aggregating multiple databases with Record Lineage

Record Lineage makes it possible to group and unify data from different sources while retaining the link between each record and its original source.


The need


Our client wanted to publish, on a single portal, a database created by pooling records from 12 source databases.

Overlaps between different source databases made deduplication necessary to ensure that just one view of each record was offered to portal visitors.

Since portal users can correct and/or enrich published information (crowdsourcing), a link had to be maintained between each entry in the aggregated database and its matching record(s) in the source databases (record lineage) so that corrections could also be applied to the source.

Although this particular use case concerns cultural sites, it can be applied identically to lists of businesses and individuals (CRM), product databases, etc.

Proposed solution


Verification and geolocation of postal addresses.

Verification of postal codes and conversion of postal codes into INSEE codes.

Harmonization of data from each of the 12 source databases to obtain a single target format.

Multi-criteria (name, address) and multi-strategy (phonetic, Levenshtein distance, N-gram, etc.) deduplication.

Record Lineage: preservation throughout the processing chain of each record's identifier and its original source database.

Automation of the entire processing chain in both directions (source databases → aggregated database AND aggregated database → source databases) to propagate any updates and enrichments that may occur on either side.
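To make the deduplication and Record Lineage steps above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in the project: the record fields, the 0.8 threshold, the rule for combining strategies (best score per criterion) and the sample data are all hypothetical, and only two of the listed strategies (Levenshtein distance and N-gram) are shown.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    source_db: str   # hypothetical name of the source database
    source_id: str   # identifier of the record inside that source
    name: str
    address: str


@dataclass
class AggregatedEntry:
    name: str
    address: str
    # Record lineage: every (source_db, source_id) merged into this entry.
    lineage: list = field(default_factory=list)


def normalize(text):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def levenshtein(a, b):
    """Edit distance computed row by row (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the sets of character n-grams."""
    grams_a = {a[i:i + n] for i in range(max(len(a) - n + 1, 1))}
    grams_b = {b[i:i + n] for i in range(max(len(b) - n + 1, 1))}
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def is_duplicate(rec, entry, threshold=0.8):
    """Assumed rule: name AND address must both match under some strategy."""
    def best(a, b):
        a, b = normalize(a), normalize(b)
        return max(levenshtein_similarity(a, b), ngram_similarity(a, b))

    return (best(rec.name, entry.name) >= threshold
            and best(rec.address, entry.address) >= threshold)


def aggregate(records, threshold=0.8):
    """Merge duplicates into single entries while recording their lineage."""
    entries = []
    for rec in records:
        match = next((e for e in entries if is_duplicate(rec, e, threshold)), None)
        if match is None:
            match = AggregatedEntry(rec.name, rec.address)
            entries.append(match)
        match.lineage.append((rec.source_db, rec.source_id))
    return entries


# Made-up sample records from two hypothetical source databases.
records = [
    Record("db_a", "17", "Musée du Louvre", "Rue de Rivoli, 75001 Paris"),
    Record("db_b", "A-204", "Musee du Louvre", "rue de Rivoli 75001 Paris"),
    Record("db_a", "18", "Château de Chambord", "41250 Chambord"),
]
for entry in aggregate(records):
    print(entry.name, entry.lineage)
```

In this sketch the two Louvre records collapse into one aggregated entry whose lineage lists both source identifiers, which is what later allows a correction made on the portal to be traced back to each source database.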



A single view of each record on the portal, thanks to deduplication.

The possibility for the owners of the 12 source databases to crowdsource corrections and apply them to their own database.

Up-to-date data on the portal, including both the latest modifications made to the source databases AND corrections/enrichments by crowdsourcing.

Complete automation of the process, enabling corrections to be propagated in both directions at regular intervals.
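As an illustration of how crowdsourced corrections could travel back to the source databases, the following Python sketch fans each portal correction out along the record lineage and lets each owner apply the resulting change list to their own data. Everything here is an assumption made for the example: the function names, the in-memory dictionaries standing in for databases, the identifiers and the sample correction are hypothetical.

```python
from collections import defaultdict


def corrections_by_source(corrections, lineage_index):
    """Group portal corrections by the source database they must reach.

    corrections: list of (aggregated_entry_id, field_name, new_value)
    lineage_index: aggregated_entry_id -> list of (source_db, source_id)
    """
    outbox = defaultdict(list)
    for entry_id, field_name, new_value in corrections:
        for source_db, source_id in lineage_index[entry_id]:
            outbox[source_db].append((source_id, field_name, new_value))
    return dict(outbox)


def apply_corrections(source_table, changes):
    """Apply one owner's change list to that owner's (in-memory) table."""
    for source_id, field_name, new_value in changes:
        source_table[source_id][field_name] = new_value


# Made-up lineage: one aggregated entry built from two source records.
lineage_index = {"agg-1": [("db_a", "17"), ("db_b", "A-204")]}

# Made-up correction submitted by a portal visitor.
corrections = [("agg-1", "opening_hours", "09:00-18:00")]

# Hypothetical stand-in for the table of source database "db_a".
db_a = {"17": {"name": "Musée du Louvre", "opening_hours": "unknown"}}

outbox = corrections_by_source(corrections, lineage_index)
apply_corrections(db_a, outbox["db_a"])
print(db_a["17"]["opening_hours"])
```

Keeping a separate change list per source database, rather than writing to every source directly, reflects the idea that each owner decides when to apply the corrections to their own database. A scheduled job could run both directions of the chain at regular intervals.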
