search results

28 items found

  • Marketing | Tale of Data

    Marketing & HR Information Systems. Segmentation / Churn, CRM migration, Recommendations, Optimization of marketing campaigns, HRIS Migration.
    Improving the quality of HR data before migrating to a new HRIS. For our client, improving the quality of HR data and monitoring its reliability over time was an essential prerequisite for the success of its future projects, first and foremost the migration to its new HR platform. Solution provided by Tale of Data: Automated Quality Audit reports detecting anomalies at the cell, column or row level: missing or malformed data, outliers, duplicates. Automation of data quality operations: cleaning, harmonization, enrichment by fuzzy joins on first and last names, deduplication. Implementation of the validation rules (Tale of Data business rules engine). Creation of alert dashboards, in order to anticipate and prevent possible drops in data quality over time. Benefits: Reduced data migration time to the customer's new HR database. Increased efficiency of Business / IT collaboration on data reliability. Strong increase in reporting consistency thanks to improved data quality. Risk reduction on the whole project, namely the migration of all HR data to the new HRIS platform.
    Optimization of marketing campaigns by improving data quality and enriching CRM data. Our client wanted to increase the relevance of the marketing messages sent to its customers. To achieve this goal, it needed to improve the segmentation of its customer base and therefore solve two problems: the reliability of CRM data (multiple views of the same customer, i.e. duplicates, and inconsistencies in emails, postal addresses and phone numbers) and the lack of contextual information about customers in the CRM. Solution provided by Tale of Data: Verification and geolocation of postal addresses, enrichment of each address with the IRIS code (for French addresses). Multi-criteria (name, first name, address) and multi-strategy (phonetics, Levenshtein distance, N-gram, etc.) deduplication. Correction of telephone numbers: verification of the country code if present, otherwise reconstitution of the country code from the address. Emails: checking for the existence of domains, fixing differences between a contact's name and the spelling of their name in the email. Enrichment (Data Augmentation) by cross-referencing with the Open Data First Names Repository: determination of the most probable age group for each customer from their first name. Benefits: A unique view of each customer thanks to deduplication (Single Customer View). A more relevant segmentation thanks to data enriched with the IRIS code deduced from geolocation. Indeed, many Open Data datasets use the IRIS code as a key and provide valuable information: standard of living, equipment and services, housing, energy consumption, medical functions, etc. Correcting e-mails, telephone numbers and postal addresses significantly reduced message sending failures. The information acquired on age groups made it possible to further refine the segmentation.
    Optimization of communication campaigns by verifying people's identities and correcting postal addresses. Our client, a French department, wanted to communicate with the beneficiaries of various services (household help, help for the elderly, remote assistance, etc.). The objective was to avoid unnecessary mailings that created an image problem or generated unnecessary costs: several mailings to the same people, wrong addresses, mailings to deceased people, etc. Solution provided by Tale of Data: Reconciliation of the postal addresses of beneficiaries with the French National Address Database. Multi-criteria (name, first name, address) and multi-strategy (phonetic, Levenshtein distance, N-gram, etc.) deduplication in order to identify duplicates, family homes (same last name and same address), and people benefiting from several services. Identification of deceased people in order to remove them from mailing lists: cross-referencing with the Open Data Deceased People Repository on the last name and first name (phonetic + N-gram) as well as on the date of birth. Benefits: Identification of people under a single identity, regardless of the services from which they benefit. Sharp reduction in the number of postal items thanks to the identification of individuals and family homes (several tens of thousands of stamped letters saved over the year). Fraud detection: identification of people benefiting from services that are non-cumulative. Image gain by removing deceased people from the mailing lists.
    Other scenarios are possible; do not hesitate to contact us to discuss your business cases.
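The multi-criteria, multi-strategy matching described above can be illustrated with a minimal sketch. It is not Tale of Data's engine: the field names, the 0.75 threshold, the choice of character bigrams and the rule "keep the best score of the two strategies" are all assumptions made for the example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to a 0..1 similarity."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def bigrams(text: str) -> set:
    padded = f" {text} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def ngram_similarity(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams."""
    ga, gb = bigrams(a), bigrams(b)
    return len(ga & gb) / len(ga | gb)


def same_person(r1: dict, r2: dict, threshold: float = 0.75) -> bool:
    """Multi-criteria: every field must match under at least one strategy."""
    for field in ("last_name", "first_name", "address"):  # hypothetical field names
        a, b = r1[field].lower().strip(), r2[field].lower().strip()
        if max(edit_similarity(a, b), ngram_similarity(a, b)) < threshold:
            return False
    return True


def find_duplicates(records: list) -> list:
    """Return index pairs of records judged to describe the same person."""
    return [(i, j) for (i, r1), (j, r2) in combinations(enumerate(records), 2)
            if same_person(r1, r2)]


records = [
    {"last_name": "Dupont", "first_name": "Marie", "address": "12 rue de la Paix"},
    {"last_name": "Dupond", "first_name": "Marie", "address": "12 rue de la paix"},
    {"last_name": "Martin", "first_name": "Paul", "address": "3 avenue Foch"},
]
print(find_duplicates(records))  # → [(0, 1)]
```

Comparing every pair is quadratic, so a real deduplication run over millions of rows would first group candidates (for example by postal code) before scoring them.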

  • Banking and Finance | Tale of Data

    Corporate Finance and Banking. Risk-weighted assets (RWA), BCBS 239, KYC, Basel III.
    BCBS 239 compliance. Our client, one of the most important private banking players in Europe, had an obligation to comply with the BCBS 239 standard. On January 9, 2013, the Basel Committee published a set of principles under the name BCBS 239, the objective of which was to enable banks to increase their reporting capacity and the accuracy of regulatory reports. Solution provided by Tale of Data: Automated Quality Audit report detecting anomalies at the cell, field or record level: missing or malformed data, insufficient number of decimal places, outliers, duplicates. Consistency check: a business rules engine in natural language detects business inconsistencies (e.g. for a given record, field A and field B cannot be simultaneously empty). Chaining and automation of data processing (Flows). Benefits: Reduced production time for regulatory reports. Automated data aggregation: alerts are automatically raised when a consistency checking rule is violated. Production of accurate and reliable risk data. Reduced risk of non-compliance.
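As an illustration of the consistency check quoted above (field A and field B cannot be simultaneously empty), here is a minimal sketch. Tale of Data expresses such rules in natural language through its rules engine; the Python functions `not_both_empty` and `audit` and the field names below are hypothetical stand-ins, not the product's API.

```python
def is_empty(value) -> bool:
    """Treat None and blank strings as empty."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def not_both_empty(field_a: str, field_b: str):
    """Build a rule that fails when both fields of a record are empty."""
    def rule(record: dict) -> bool:
        return not (is_empty(record.get(field_a)) and is_empty(record.get(field_b)))
    rule.description = f"{field_a} and {field_b} cannot both be empty"
    return rule


def audit(records: list, rules: list) -> list:
    """Return one (record index, rule description) alert per violated rule."""
    return [(index, rule.description)
            for index, record in enumerate(records)
            for rule in rules
            if not rule(record)]


records = [
    {"field_a": "x", "field_b": ""},
    {"field_a": "", "field_b": None},
    {"field_a": None, "field_b": 3},
]
alerts = audit(records, [not_both_empty("field_a", "field_b")])
print(alerts)  # → [(1, 'field_a and field_b cannot both be empty')]
```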

  • Security | Tale of Data

    Security. Fight against the financing of terrorism, Cybersecurity, Internal threats.
    Internal threats: sensitive-information leak prevention. Our client, one of the largest private banking players in Europe, wanted to minimize the risk of sensitive information leaking (identities, financial transactions, etc.). Since this type of leak is most often due to internal malicious acts, the Information Systems Security Manager wanted to exhaustively identify the sensitive information present in the bank's information system in order to increase the level of protection. Two questions therefore arose: Where exactly are sensitive data stored? Which databases? Which tables? Which columns? But also which files (e.g. Excel files and other listings disseminated on the internal network)? What types of sensitive data are these?
    Solution provided by Tale of Data: Our “Mass Data Discovery” technology enabled us to automatically scan all relational databases, all shared network disks (all directories and their sub-directories were searched for Excel, CSV, XML or JSON files), the CRM and the content management systems (SharePoint). Each record in each table was analyzed for sensitive data: last name, first name, addresses, e-mails, telephone numbers, bank account numbers, etc. The results were aggregated at the field level (whether in a database, an Excel file or a CSV listing): at the end of the scan we knew, for example, the exact number of people's last names present in any Excel file on the bank's network drives.
    Benefits: The data scan (a “bottom-up” approach) provided the chief information security officer (CISO) with exhaustive identification and localization of sensitive data. The scan report allowed security teams to greatly minimize the risk of data leaks: By tracking down malicious SQL queries that they previously thought were harmless (any SQL query fetching columns that are part of the list of sensitive columns established by the data scan). By systematically checking access to network directories containing sensitive data, which they did not know to be sensitive before the data scan. By verifying the effectiveness of the anonymization procedures: cross-referencing (using Tale of Data fuzzy joins) anonymized files with a list of known customers should not normally generate any match. Finding any match means that the anonymization process must be reworked. By controlling the risk of information leaks over time using regular scans, up to several times a day. Indeed, new listings can appear for a few hours on the network just before a leak.
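The record-by-record analysis with aggregation at the field level can be sketched as follows. This is only an illustration of the idea: the three regular expressions are deliberately simplistic assumptions, and the column names are invented; the detectors actually used by the Mass Data Discovery scan are not described in the text.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple detectors (assumptions for the example).
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d .-]{7,}\d"),
    "iban": re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}"),
}


def scan_table(rows: list) -> Counter:
    """Count, per (field, data type), how many cells look sensitive."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            text = str(value).strip()
            for kind, pattern in DETECTORS.items():
                if pattern.fullmatch(text):
                    counts[(field, kind)] += 1
    return counts


def sensitive_fields(counts: Counter) -> list:
    """Fields in which at least one sensitive value was found."""
    return sorted({field for field, _kind in counts})


rows = [
    {"contact": "a.b@example.com", "tel": "+33 1 23 45 67 89",
     "acct": "FR7630006000011234567890189"},
    {"contact": "c@example.org", "tel": "n/a", "acct": ""},
]
report = scan_table(rows)
print(sensitive_fields(report))
```

The resulting list of sensitive fields is the kind of input a security team could compare against the columns fetched by SQL queries, as described in the benefits above.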

  • Time Series Analysis and Monitoring | Tale of Data

    Industry. Sensor data (IoT), Financial transactions, Plant Information, Detection of anomalies.
    Monitoring and improving the quality of PI data. Our client, an industrial group with hundreds of subsidiaries around the world, wanted to control and improve the quality of PI data (PI = Plant Information: data emitted by sensors installed on production sites). The objectives were as follows: Create PI nomenclatures (Assets, Attributes, Tags) with clear naming rules and free from duplicates, in order to allow better reuse of Tags as well as cross-site analyses. Set up an efficient monitoring system for PI Tags (= time series): real-time detection of missing or inconsistent data, identification of defective sensors, etc. Supply Data Scientist teams with reliable data, an essential prerequisite for building consistent and efficient predictive models (forecasting, predictive maintenance, etc.).
    Solution provided by Tale of Data: Harmonization of the sensor nomenclature: Tale of Data automatically reconciles texts (name, description, etc.) that differ in spelling, using advanced “fuzzy matching” algorithms: phonetics (English / French), consonant (or vowel) frequency, word fragmentation (N-gram), or even automatic word weighting (Inverse Document Frequency), in which a low weight is assigned to the least discriminating words. Monitoring of sensor data using Tale of Data's time series analysis algorithms: Determination, by type of sensor, of the appropriate alert thresholds for the measured values (temperature, pressure, etc.) and for the elapsed time between two measurements; both sets of thresholds were obtained by running an automatic analysis over several years of history. Setting up automatic alerts when previously determined thresholds are exceeded or when data is missing.
    Benefits: Labeling harmonization and deduplication enabled the creation of a shared repository of PI metadata: Assets, Attributes, Tags. This shared PI metadata repository, with clear naming rules, opened up many possibilities: Consistent representation of the system: the same set of attributes for items representing the same type of equipment, with standardized names, descriptions and units of measure. Facilitation of "multipoint" analysis: standardized metadata make it possible to aggregate or compare time series, whether for monitoring, reporting or predictive analysis (Machine Learning). The time series analysis made it possible to put into production, in a few weeks, a fully automated monitoring system continuously analyzing data from tens of thousands of sensors. Alerts on very precise conditions have been set up (e.g. sensors emitting erroneous values or presenting anomalies in the time intervals between two measurements). These alerts can be reconfigured by business users at any time, without writing any code.
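The two kinds of thresholds described above (on measured values and on the elapsed time between two measurements) can be sketched as follows. The text does not say how the thresholds were derived from history, so the "mean plus or minus k standard deviations" rule used here is an assumption, as are the function names and the sample readings.

```python
from statistics import mean, pstdev


def learn_bounds(history: list, k: float = 3.0) -> tuple:
    """Derive (low, high) alert bounds from historical observations.

    Assumption for this sketch: bounds are mean +/- k population standard deviations.
    """
    mu, sigma = mean(history), pstdev(history)
    return mu - k * sigma, mu + k * sigma


def check_series(timestamps: list, values: list, value_bounds: tuple, max_gap: float) -> list:
    """Return (timestamp, reason) alerts for out-of-range values and long gaps."""
    low, high = value_bounds
    alerts = []
    for i, (t, v) in enumerate(zip(timestamps, values)):
        if not low <= v <= high:
            alerts.append((t, "value out of range"))
        if i > 0 and t - timestamps[i - 1] > max_gap:
            alerts.append((t, "gap too long"))
    return alerts


# Thresholds learned from (hypothetical) history of one sensor type.
past_values = [20.0, 21.0, 19.0, 20.0, 20.0, 21.0, 19.0, 20.0]
past_times = [0, 60, 120, 180, 240]  # seconds
past_gaps = [b - a for a, b in zip(past_times, past_times[1:])]
value_bounds = learn_bounds(past_values)
_, max_gap = learn_bounds(past_gaps)

# New readings: one abnormal temperature, one missing measurement.
alerts = check_series([0, 60, 120, 300, 360], [20.0, 20.5, 35.0, 20.1, 19.8],
                      value_bounds, max_gap)
print(alerts)  # → [(120, 'value out of range'), (300, 'gap too long')]
```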

  • Fraud Detection | Tale of Data

    Fraud Detection. Laundering, VAT fraud, False invoices, Hidden financing.
    Detection of document fraud. Our client, a French ministry, wanted to improve the effectiveness of controls over the allocation of administrative documents. The size of the database (nearly one hundred million records) and the variety of applications allowing the entry of information, most often manually, severely limited the effectiveness of fraud detection.
    Solution provided by Tale of Data: Reconciliation of the postal addresses of recipients of administrative documents with the French National Address Database made it possible to obtain reliable and standardized addresses. Multi-criteria (name + first name + address) and multi-strategy (phonetic, Levenshtein distance, N-gram, etc.) deduplication to spot people who have obtained several versions of administrative documents that are supposed to be unique. Cross-referencing, using fuzzy joins, with other databases of the ministry on the name and first name (phonetic + N-gram) as well as on the date of birth (with a tolerance of a few days), in order to identify people who requested several administrative documents that are supposed to be non-cumulative.
    Benefits: The ministry was able to identify people (sometimes up to several hundred in the same county) who had several versions of the same administrative document, allowing them to avoid sanctions. The phenomenon had gone unnoticed until then because of a few approximations in the spelling of the name, in the address (e.g. a street number pointing to the neighboring building) or in the date of birth (1 to 2 days apart). Aggravating factor: obtaining such documents was impossible without internal complicity within the ministry. Standardized postal addresses made it possible, by simple grouping and counting, to spot suspicious addresses used to request a number of administrative documents far exceeding the number of inhabitants at the specified address (by a factor of 10 or even 100). The cross-referencing of several databases brought to light many cases of prohibited accumulation of administrative documents.
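Two of the checks mentioned above lend themselves to a short sketch: spotting suspicious addresses "by simple grouping and counting", and comparing dates of birth with a tolerance of a few days. The record layout, the `max_per_address` limit and the two-day default are assumptions chosen for the example.

```python
from collections import Counter
from datetime import date


def suspicious_addresses(requests: list, max_per_address: int) -> dict:
    """Group requests by standardized address and keep the over-represented ones."""
    counts = Counter(request["address"] for request in requests)
    return {address: n for address, n in counts.items() if n > max_per_address}


def birth_dates_match(d1: date, d2: date, tolerance_days: int = 2) -> bool:
    """Fuzzy comparison of two dates of birth, tolerating a few days of difference."""
    return abs((d1 - d2).days) <= tolerance_days


requests = [
    {"name": "A", "address": "1 RUE DES LILAS 75020 PARIS"},
    {"name": "B", "address": "1 RUE DES LILAS 75020 PARIS"},
    {"name": "C", "address": "1 RUE DES LILAS 75020 PARIS"},
    {"name": "D", "address": "8 PLACE BELLECOUR 69002 LYON"},
]
flagged = suspicious_addresses(requests, max_per_address=2)
print(flagged)  # → {'1 RUE DES LILAS 75020 PARIS': 3}
```

The grouping only works because the addresses were standardized first; with free-text addresses, the three Paris requests above could easily fall into three different groups.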

  • Home | Tale of Data

    MAKE YOUR DATA TALK. To make your data more reliable, use it and make it accessible to everyone… without coding. A new approach to enhance your data and monitor it in real time.
    Steps to mastering data. Discover the data: Gather disparate sources, databases or files. Identify the corrections to be made without leaving the application. Audit the data: Duplicate, merge, transform and enrich data. Automatically identify missing, malformed, irrelevant or inconsistent data. Structure the data: Use our suggestion engine to assemble, convert, normalize, merge. Enrich your data with internal or external repositories (in Open Source). On-the-fly generation of audit reports on data quality and history of all operations performed. Enrich the reliable audit trail. Exploit your data: Configuration of personalized alerts: presence of particular anomalies, empty or inconsistent data, duplicates, etc. Powerful detection algorithms (using fuzzy logic, N-gram, phonetics, etc.).
    With valued, transformed and enriched data, Tale of Data allows business departments and data managers to free themselves from IT developments and acquire reliable and therefore usable data. The company presented by Jean-Christophe Bouramoué, Chairman.
    Why choose Tale of Data? "Tale of Data is a tool that allows you to quickly get results without being a computer scientist." "Data capitalization generates a reduction in data acquisition costs and times." "I chose Tale of Data for its ease of use: everything can be done with the mouse, the history of transformations is archived and it's very easy to go back."
    They trust us: Bank and finance, Data sharing, Insurance and security. Our testimonials: Stephane AVRONSART, SNCF Network.

  • Marketing | Tale of Data

    Marketing. False declarations, Escheat, Risk assessment, Solvency II (pillar 3).
    Our client wishes to improve the effectiveness of controls on inactive accounts or dormant life insurance contracts. The Tale of Data solution allows it to uniquely identify its customers: Reconciliation of highly similar natural or legal persons thanks to the multi-criteria, multi-algorithm deduplication engine. Use of our phonetic search algorithm, which tolerates a one-letter gap and close-sounding spellings ("au", "eau", "o"). Determination of a similarity score allowing fine deduplication. Removal of punctuation, spaces or unwanted characters in names. Enrichment with external repositories (business repositories, Open Data, INSEE files, etc.).
    Tale of Data makes it possible to analyze and detect acts presenting an anomaly or an inconsistency. Fraud detection is carried out by an automatic query system. A list of relevant business criteria is determined beforehand. This list is then integrated into a repository in the solution, where it allows our algorithms to identify dissonant behaviors and suspicious links, or to detect inconsistencies.
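The step "removal of punctuation, spaces or unwanted characters in names" can be illustrated with a minimal sketch. The function name `normalize_name` is hypothetical, and folding accents and case in the same pass is an extra assumption of this example, not something the text states.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Keep only letters, fold accents (assumption) and upper-case the result."""
    decomposed = unicodedata.normalize("NFKD", name)
    letters = [c for c in decomposed
               if c.isalpha() and not unicodedata.combining(c)]
    return "".join(letters).upper()


print(normalize_name("Jean-François"))  # → JEANFRANCOIS
print(normalize_name("Du Pont") == normalize_name(" dupont."))  # → True
```

Comparing normalized keys before computing any similarity score avoids treating "Du Pont" and "dupont." as different people merely because of spacing or punctuation.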

  • Glossary | Tale of Data

    The Tale of Data lexicon. A word you don't understand? We are aware that the vocabulary of the data world does not speak to everyone. Find below the definitions of the words followed by a *.
    The glossary
    Algorithm: set of operating rules specific to a calculation; sequence of formal rules (source: Le Robert).
    Matching algorithm / fuzzy matching: algorithmic process based on an approximate match of two inputs, rather than an exact match. In practice, different algorithms are available in Tale of Data that rely, for example, on the specificities of French or English phonetics. Other approaches are offered, such as giving more weight to consonants or using proven mathematical procedures such as the Levenshtein distance.
    API or Application Programming Interface: software interface that allows software or a service to be “connected” to another software or service in order to exchange data and functionalities.
    Relational database: in computing, a relational database is a database where information is organized in two-dimensional arrays called relations or tables. According to this relational model, a database consists of one or more relations (source: Wikipedia).
    BAN (National Address Base): the base grouping the official addresses of the French territory. This database is said to be “open”: its access and use are left free to users, whether private or public.
    BCBS 239: banking standard aimed at increasing the capacity of banks to aggregate financial risk data, produce reports and improve the quality of this risk data.
    Churn: refers to the loss of customers or subscribers. The term is mainly used in the world of telecom companies and in that of banks. It is used in particular to measure the average duration of a subscription to an offer or service (subscription to a sports TV package, to a magazine, to a newspaper, etc.). It is one of the main indicators of customer satisfaction (source: Journal du Net).
    Cluster: mode of operation distributed over several servers, which makes it possible to process a large amount of data in parallel.
    IRIS code: the "Grouped Islands for Statistical Information" are territorial division bricks of uniform size created by INSEE. Each unit contains roughly 2,000 inhabitants.
    Connectors: means to connect to a data source of a particular type (for example a SQL Server database, or an Azure Blob Storage type file server, etc.). See the Architecture section.
    Crowd sourcing: mode of organization calling on contributions from a large number of people to enrich and improve content. For example, Wikipedia is an encyclopedia whose content is enriched with the help of a very large number of contributors.
    Data visualization (dataviz): method of communicating figures or raw information by transforming them into easy-to-read visual objects: points, bars, curves, maps.
    Data scientist: data specialist who collects, processes and analyzes data and makes it speak to improve the company's performance.
    Deduplication: method to eliminate duplicates.
    Levenshtein distance: measures the difference between two character strings. It is equal to the minimum number of characters that must be deleted, inserted or replaced to go from one string to the other (source: Wikipedia).
    PI (Plant Information) data: data produced on industrial sites by sensors installed on production sites and sent to a storage system.
    Record: a row in a database or file (as opposed to a column).
    Data enrichment: consists of completing the data, improving it and structuring it through the use of another source (repository, database, file, etc.).
    Flow: processing built by the user to carry out data remediation, preparation and monitoring tasks. A flow is by construction designed for production.
    Flow Designer: environment in the Tale of Data software used to develop Flows in order to design transformations on the data.
    Geolocation: technology to determine the location of an object or a person with a certain precision (source: CNIL).
    Artificial intelligence: set of techniques that allow computers to simulate and reproduce human intelligence.
    Fuzzy joins: assembly of several sources by making correspondences between them using fuzzy matching algorithms.
    Full-text join: joins multiple sources by searching deep into all specified textual data. This makes it possible, for example, to discover links between records in two tables that differ only in word order. A conventional algorithm will not be able to detect this type of correspondence, whereas it may be obvious for a human operator and for a full-text join algorithm.
    Natural language: means that the user does not need to know computer languages to use the solution. The functions are all usable via self-explanatory menus.
    Machine Learning: automatic learning that consists of letting algorithms discover patterns in a data set. Once this training has been completed, the algorithm is able to find the patterns in a new data set.
    Mass Data Discovery: process of exploring an information system to discover and map all the data present in it. This notably makes it possible to establish an atlas of stored sensitive data (such as personal data). It also allows the generation of a report analyzing the quality of the stored data.
    Metadata: data used to characterize or describe another piece of data, physical or digital (source: Larousse). Examples: file size, creation date, modification date, etc.
    N-gram: method used in Tale of Data to evaluate the similarity between several words or between several sentences. More generally, it is a succession of N elements of the same type extracted from a text, a sequence or a signal; the elements may in particular be words or letters (source: Wiktionary).
    Open Data: literally, “open data”, refers to data to which access is totally public and free of rights, as are its exploitation and reuse. The National Address Base and the SIRET database are examples of information that can be consulted as Open Data.
    Pattern: a user-defined pattern that can be searched for in the data, or used as part of its transformation.
    Phonetics / Phonetic algorithm / Phonetic analysis: comparison of terms according to how they sound. Example: search for similarity between surnames with the sound [o], which can be spelled o, ô, au, eau.
    Record Lineage: representation offered by Tale of Data which shows, for a particular data set, the list and chaining structure of the data used to feed it (the “upstream flows”), as well as all the data sets and sequences that depend on it (the “downstream flows”). This visualization makes it possible to understand the origin of the data (upstream view) and to establish the impact of a change in the data on other data sets that depend on it (downstream view).
    Data reconciliation: process of homogenizing data and grouping them according to their nature or source.
    Rectification: phase during which the “raw” data is analyzed for correction.
    Repository: list of elements forming a reference system. Example: a product repository is the list of all products, with a number of attributes for each product.
    Business rules: set of transformation operations on data, defined by the user of Tale of Data without writing code, i.e. through an intuitive interface that allows conditions to be specified for each operation, as complex as necessary. Tale of Data provides a readable summary of the rules that have been defined, which can be reused in other Flows and other data transformation operations.
    Runtime: environment in the Tale of Data software used to run Flows in order to perform transformations on the data. The execution of Flows can be triggered directly by the user, or be scheduled in an extremely flexible way.
    Remediation: solving quality problems present in the data.
    SaaS or Software as a Service: system for providing software in the form of a service, accessible via an Internet browser.
    Time series: data series indexed by time. The GDP of a country or the evolution of the population are time series.
    Script: computer program which, when run, performs an action or displays a Web page.
    Shadow IT: all the data and processing carried out on the sidelines of the IT department (e.g. unofficial MS Access databases, Excel files with macros, etc.). This data and software are invisible to the IT department, which generates a security and non-compliance risk (GDPR).
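The phonetics entry of the glossary gives the example of the sound [o], which can be spelled o, ô, au or eau. A toy phonetic key built on that single example might look like the sketch below. It is far cruder than any real French phonetic algorithm, and the function name and the extra "collapse doubled letters" rule are assumptions added for illustration.

```python
import re


def phonetic_key(word: str) -> str:
    """Toy phonetic key: unify spellings of the sound [o], collapse doubled letters."""
    key = word.lower()
    key = re.sub(r"eau|au|ô", "o", key)   # "eau", "au", "ô" all sound like "o"
    key = re.sub(r"(.)\1+", r"\1", key)   # "rr" -> "r", "ll" -> "l", ...
    return key


for surname in ("Moreau", "Morau", "Moro", "Morô"):
    print(surname, "->", phonetic_key(surname))  # every line ends with "moro"
```

Two surnames are then considered phonetically close when their keys are equal, which can be combined with an N-gram or Levenshtein score on the original spellings.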

  • Pricing | Tale of Data

    Three pricing options depending on your deployment strategy. Fixed billing independent of the amount of data processed.
    SaaS: monthly user license fees; the amount decreases depending on the duration of the subscription and the number of user licenses.
    On Premise / Single Server: user licenses + server license; minimum one-year subscription.
    On Premise or Cloud / Big Data Cluster: Apache Spark runtime environment; cluster management with YARN or Kubernetes; billing based on the number of Data Nodes in your cluster.
    All the platform's features are included in each of the three options: database and file connectors; Flow Designer (interactive design of data processing pipelines); 80+ standard transformations; repositories for advanced data reconciliation; multi-criteria and multi-strategy data deduplication; integrated data visualization tools; Data Discovery; Runtime (scheduling of data preparation and monitoring jobs). Contact us for a 2-week free trial of the solution.
    Data sovereignty: Tale of Data offers a secure SaaS infrastructure in which the hosting and all data processing are located in France. The price of the licenses includes access to all the platform's features, functional and technical support, and software upgrades.

  • Compliance and Risks | Tale of Data

    Compliance and risks. Audit, Risk management, Litigation, GDPR.
    Personal data scans for GDPR compliance. Our client had to comply with the General Data Protection Regulation (GDPR). To do so, all the personal data present in its information system had to be associated with processing acceptable to the supervisory authority. To achieve this goal, our client had to be able to answer these 4 questions: Who within the company keeps personal data? What types of personal data are these? Where are these personal data stored: in databases, but also in Shadow IT (e.g. Excel files disseminated on the internal network)? For what purpose are these data kept?
    Solution provided by Tale of Data: Our “Mass Data Discovery” technology enabled us to automatically scan all relational databases, all shared network disks (all directories and their sub-directories were searched for Excel, CSV, XML or JSON files), the CRM and the content management systems (SharePoint). Each record in each table was analyzed for sensitive data: last name, first name, addresses, e-mails, telephone numbers, bank account numbers, etc. The results were aggregated at the field level (whether in a database, an Excel file or a CSV listing): at the end of the scan we knew, for example, the exact number of people's last names present in any Excel file on the network drives. The application mapping document provided by our customer's IT department enabled us to establish the link between the personal data found and the actual usage of the data.
    Benefits: Scanning all the data (a “bottom-up” approach) made it possible to carry out a comprehensive analysis, as opposed to interviews, which rely on the memory of the people interviewed and on documentation that is rarely up to date. The scan report gave the register of processing operations a lot of credibility and enabled the DPO (Data Protection Officer) to better organize anonymization tasks and therefore to greatly minimize the risks of non-compliance. The automation of the entire process gave our client the ability to run regular scans, in order to prevent any accumulation of non-legitimate personal data over time.
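The first step of such a scan, searching all directories and sub-directories for Excel, CSV, XML or JSON files, can be sketched with the standard library alone. The function name, the suffix list and the example path are assumptions; opening each file and analyzing its records is left out.

```python
from pathlib import Path

# File types named in the text; the exact suffix list is an assumption.
TARGET_SUFFIXES = {".xls", ".xlsx", ".csv", ".xml", ".json"}


def find_data_files(root) -> list:
    """Recursively list files under `root` whose extension marks them as data files."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in TARGET_SUFFIXES
    )


if __name__ == "__main__":
    # Hypothetical share; replace with a real directory to try it out.
    for path in find_data_files("/mnt/shared"):
        print(path)
```

Running this inventory on a schedule and comparing successive results is one simple way to notice new files appearing on shared drives between two scans.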