June, 2005

The match-and-merge dilemma

Since the beginning of electronic data entry, the problem of uniquely identifying individuals and removing duplicate records within a database has plagued developers and program managers. Duplicate records can cause problems that are far more serious than inconvenience and aggravation. Incomplete and fragmented records contribute to:
The match-merge dilemma has grown more complex as public health agencies increasingly seek to integrate information from disparate program databases. Unfortunately, help in tackling the problem has not been readily available. As a result, public health agencies typically wrestle with their de-duplication issues alone, essentially working in a vacuum.

Now, a Connections workgroup on Unique Records is developing a toolkit to address match-merge issues in public health. The Unique Records workgroup is creating a portfolio of useful guides, including principles and concepts, real-world case studies, a questionnaire, and a self-assessment tool. This approach is designed to assist public health professionals in assessing the problem, evaluating solutions, and making informed decisions. The workgroup is composed of Connections members who have experience in de-duplication issues, other invited experts, consultant Susan Salkowitz of Salkowitz Associates LLC, and Stephen Clyde of Utah State University’s Computer Science Department.

Child health integration projects create enterprise-wide, person-centric systems from disparate databases with different business rules for identification of individuals. Data cleaning activities, often termed “de-duplication,” consist of various processes:
These processes are often termed “record coalescing,” which refers to linking records, merging records, or both.

Knowledge sharing: a key success factor

For years, public health agencies have wrestled with their de-duplication issues system by system without sharing knowledge among programs. Although most public health programs share similar issues with matching, merging, and linking, each database and system configuration is unique. Even when programs use the same kind of databases, software, or configurations, each system’s data may have unique patterns, data-entry fields, and ways of processing input.

“In de-duplication, there isn’t one approach that solves everyone’s problems,” said Dr. Clyde. “In public health, you’re dealing with legacy systems that have evolved from different sources, often resulting from funding into silo programs. What may seem like subtle differences in data structure can result in significant problems for matching and merging. There are as many possible solutions to the duplicate data problem as there are individual situations.

“You might ask,” he continued. “Why isn’t there only one kind of car on the road? People’s needs and preferences differ and change over time, and that’s reflected in evolving designs, whether they’re cars or databases.”

When issues and problems are shared in a community of practice such as Connections, commonalities among programs and approaches emerge, and this knowledge can be synthesized so that programs can manage de-duplication more effectively.

Right tools at the right time

The Connections community of practice formed a Unique Records workgroup to develop a multifaceted toolkit, set for release in fall 2005, to help public health agencies improve their de-duplication processes by planning and analyzing their projects methodically.
The workgroup agreed that such a toolkit would offer a better chance of addressing the problems and allow individual agencies an opportunity to develop solutions that work best for their integration projects. The new toolkit does not simply give answers; it helps developers and program managers address de-duplication problems in their own agency settings.

Workgroup participants are contributing their own experiences in identifying and managing duplicate records and understanding how to resolve them to serve as examples and guidelines for others. These materials are being assembled in a toolkit format so that people who are ready to roll up their sleeves can start down the road toward improving records quality in a database, said Salkowitz. “It’s a hands-on guide that distills textbook material and best practices for all programs to apply to get de-duplication done.”

Components of the toolkit
©2005 Public Health Informatics Institute