New Study on De-Duplication Technologies and Practices

Child health integration projects create enterprise-wide, person-centric systems from disparate files with different business rules for identification. Data cleaning activities, termed “de-duplication,” are performed to match and merge records appropriately. Projects are challenged to select the most effective de-duplication tools and strategies for their environments.
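As a rough illustration of what "match and merge" involves, the sketch below groups records on a simple deterministic key and folds each group into one record. It is not taken from the report or from any product it evaluates. The field names (`first_name`, `last_name`, `dob`, `phone`), the match key, and the merge rule are all assumptions chosen for the example.

```python
from collections import defaultdict


def match_key(record):
    """Build a hypothetical match key: normalized last name, first initial, date of birth."""
    last = record["last_name"].strip().lower()
    first_initial = record["first_name"].strip().lower()[:1]
    return (last, first_initial, record["dob"])


def merge(records):
    """Merge matched records, keeping the first non-empty value seen for each field."""
    merged = {}
    for record in records:
        for field, value in record.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged


def deduplicate(records):
    """Group records that share a match key, then merge each group into one record."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [merge(group) for group in groups.values()]


# Illustrative data only: two source files describe the same child differently.
records = [
    {"first_name": "Maria", "last_name": "Lopez", "dob": "2001-04-09", "phone": ""},
    {"first_name": "maria ", "last_name": "LOPEZ", "dob": "2001-04-09", "phone": "555-0100"},
    {"first_name": "Sam", "last_name": "Chen", "dob": "2002-11-30", "phone": ""},
]
people = deduplicate(records)
```

An exact-key rule like this one misses misspellings and transposed dates, and it can wrongly merge twins who share a first initial. Trade-offs of that kind are what projects must weigh when they compare tools and strategies.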

Projects in the All Kids Count Connections community of practice on integrated child health information systems are among those that grapple with this issue. Its members requested this study to research de-duplication software and approaches, perform limited testing and technical analysis, and document the findings in matrices that compare effectiveness, underlying approach, cost, and other factors.

This 2003 report by Susan Salkowitz of Salkowitz and Associates and Stephen Clyde of Utah State University’s Computer Science Department provides a description, analysis, and evaluation of de-duplication software based on vendor information and limited testing, documents de-duplication practices of the participating Connections projects, and discusses different approaches and their efficacy.

The study identified no single best product, but it provides a framework that health information systems integration projects can use to examine alternatives, weigh the trade-offs, and choose products and strategies that match their requirements. It also demonstrates the value of the community of practice in contributing to the body of knowledge about public health informatics, and it identifies areas for further work.


©2005 Public Health Informatics Institute
All Rights Reserved

Last updated July 13, 2005