The Evils of Data Redundancy
Christine R. Coker
Source: Executive Update
Special Section Feature
Published: September 2002
There is a gremlin at work within the information management of most associations. That gremlin has a name: Redundant Data. In this Executive Update Special Section Feature, Christine Coker - who has developed and improved association databases for almost twenty years - explores the pitfalls and virtues of maintaining "clean data".
Redundant data is great if it is a backup and directly linked to the source, but the evil element of redundant data surfaces when an organization has duplicate data in two or more unlinked files or applications. For example, say you have a centralized database that includes basic membership information, and each person acting as a liaison to a committee maintains a Word roster for his or her committee. You now have redundant, duplicate information. When John Smith's e-mail address changes, it must be changed in at least two places. Now let's say this committee communicates via a listserv. Unless that listserv is hooked into the main database, that's three changes. And the liaison also has John in their Outlook contacts - four changes. Oh yes, and John is a real up-and-comer in the organization and he's on two other committees. That's 10 changes (1 in the main database and 3 for each of his three committees).
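The arithmetic in the John Smith example can be sketched in a few lines of Python. This is only an illustration: the function name `places_to_update` and the list of per-committee copies are hypothetical labels for the scenario described above, not part of any real system.

```python
# Hypothetical tally of the unlinked copies each committee keeps of a
# member's contact details, taken from the John Smith example.
COPIES_PER_COMMITTEE = [
    "Word roster",
    "committee mailing list",
    "liaison's Outlook contacts",
]


def places_to_update(committee_count: int) -> int:
    """Return how many separate copies must be edited for one address change.

    One copy lives in the main database; every committee adds its own
    unlinked copies on top of that.
    """
    return 1 + committee_count * len(COPIES_PER_COMMITTEE)


print(places_to_update(1))  # one committee
print(places_to_update(3))  # John's three committees
```

With one committee the count is 4, and with three committees it is 10, matching the figures in the example. The number grows with every new committee or side list, which is the core of the problem.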
While the committee example likely hits home with many associations, that is not the only danger spot for damaging redundant data. Your education folks might have a list of speakers; your publications folks might have a list of authors; or perhaps your accounting department keeps a list of people in arrears. A foundation donor list, a registered Web user list, a public relations list, a special industry group list, a chapter officer list… the number of places an organization can keep information is bounded only by the activities of that organization.
What at first may not seem like much of a problem can actually add up to a major waste of time and money. And then there's the black eye the organization can get when it takes three months before changes are made in all necessary places. In the meantime, those who haven't made the change yet are still trying to communicate using outdated data. The credibility of the organization is at stake, and that "up-and-comer" might just turn into a "down-and-outer" - all because the organization was sloppy in how it maintains data.
The key for associations, of course, is to keep one centralized database that reduces the amount of redundant data as much as possible - ideally down to none. The goal is to develop a centralized database that can do most of the things for most of the people and functions, with the flexibility to change and grow with your organization.
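Your centralized database will contain three categories of data:
Basic info - core details about each member, such as name, address, and phone
Lookup - reference lists, such as committee names or member types
Transactions - multiple records for each unit of basic information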
The centralized database should contain at the minimum the following information:
|Table||Category||Contents|
|Member||Basic Info||Name, address, phone|
|Committees||Lookup||List of committee names|
|Member Type||Lookup||List of member types|
|Committee Members||Transactions||Which members are on which committees|
|Events||Lookup & Basic Info||List and details about the events / conferences|
|Event Attendees||Transactions||Who attended which event|
|Dues||Lookup||Dues structure based on parameters|
|Dues Invoiced||Transactions||Dues invoiced to members and paid transactions|
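As a rough illustration of how part of this list might translate into an actual database, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`member`, `committee`, `committee_member`, `email`, and so on) and the sample rows are assumptions made for this example; the article does not prescribe a particular schema or product.

```python
import sqlite3

# In-memory database for illustration only; schema names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE member_type (            -- lookup
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE member (                 -- basic info
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    address        TEXT,
    phone          TEXT,
    email          TEXT,
    member_type_id INTEGER REFERENCES member_type(id)
);
CREATE TABLE committee (              -- lookup
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE committee_member (       -- transactions
    member_id    INTEGER NOT NULL REFERENCES member(id),
    committee_id INTEGER NOT NULL REFERENCES committee(id),
    PRIMARY KEY (member_id, committee_id)
);
""")

conn.execute("INSERT INTO member_type (id, name) VALUES (1, 'Regular')")
conn.execute(
    "INSERT INTO member (id, name, email, member_type_id) "
    "VALUES (1, 'John Smith', 'john@old.example', 1)"
)
conn.executemany(
    "INSERT INTO committee (id, name) VALUES (?, ?)",
    [(1, "Education"), (2, "Publications"), (3, "Membership")],
)
conn.executemany(
    "INSERT INTO committee_member (member_id, committee_id) VALUES (1, ?)",
    [(1,), (2,), (3,)],
)


def roster(committee_name):
    """Build a committee roster on demand from the single member record."""
    rows = conn.execute(
        """
        SELECT m.name, m.email
        FROM committee_member AS cm
        JOIN member AS m ON m.id = cm.member_id
        JOIN committee AS c ON c.id = cm.committee_id
        WHERE c.name = ?
        ORDER BY m.name
        """,
        (committee_name,),
    )
    return rows.fetchall()


# One change in one place; every roster reflects it immediately.
conn.execute("UPDATE member SET email = 'john@new.example' WHERE id = 1")
print(roster("Education"))
```

Because the committee table stores only who sits on which committee, not a copy of each person's contact details, an address change is made once and every roster picks it up.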
It is important to note that data redundancy is not a technological failure but a management or process failure. Technologically, the basic ingredients are interconnectivity (the computers must be able to interact across a network of some sort) and the serving and receiving of data from a centralized database. Except in isolated cases, interconnectivity is not much of a problem for organizations. However, everyone has different needs for the information, and it's difficult to envision pulling all this disparate data together, so choosing from the many centralized database software options becomes the challenge.
The most important first step in reducing redundancy in the workplace is to identify all the places where the duplicate data is kept. Gathering that information will help you identify who is keeping what data where.
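One simple way to record the results of that inventory is sketched below in Python. The repository names and field names are hypothetical examples drawn from the scenarios in this article, and `find_redundant_fields` is an illustrative helper, not a feature of any particular product.

```python
from collections import defaultdict


def find_redundant_fields(inventory):
    """Map each field kept in more than one repository to the places holding it."""
    holders = defaultdict(list)
    for repository, fields in inventory.items():
        for field in fields:
            holders[field].append(repository)
    return {
        field: sorted(repositories)
        for field, repositories in holders.items()
        if len(repositories) > 1
    }


# Hypothetical inventory gathered by asking each department what it keeps.
inventory = {
    "central database": ["name", "address", "e-mail"],
    "committee Word roster": ["name", "e-mail"],
    "liaison Outlook contacts": ["name", "e-mail", "phone"],
    "speaker list": ["name", "topic"],
}

for field, places in find_redundant_fields(inventory).items():
    print(field, "is kept in:", ", ".join(places))
```

Fields that appear in only one place, such as the speaker topic here, are candidates for a new module in the centralized database rather than signs of duplication.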
The next step should be to select a project leader - perhaps someone from the department keeping the most data, such as member services or event management - to help comb through the information that is being duplicated. The leader analyzes all the known data repositories in the organization to see what is unique about the ways that data is accessed. A systems analyst or process engineer is usually a helpful mediator during this stage. Special attention should be paid not only to listing the specific kinds of information that are valuable for the association to track, but also to how that information will be used as output - whether it's a member directory, badges for events, broadcast faxes, or dues invoices. The ease and flexibility of getting information out of your centralized database is probably the most important feature of the software solution that you decide on.
Probably the most important step in developing a centralized database is to recognize that you have a budget and need to assign priorities. While it would be nice to have an automated interface with your e-mail service so that you could click a button and automatically send the Board of Directors an e-mail alerting them to an upcoming meeting, it is more important to be able to easily maintain who is on the Board of Directors. The auto e-mail feature can be added later. Everyone will have different priorities, so it will be the job of the project leader or project team to determine which data redundancy problems cause the organization the most pain. Once that is determined, the leader or team can decide how to attack the problems - by adding functionality to the centralized database or by changing the processes for how employees access and use data. Diplomatically, the leader or team will likely need to give each department a few of its "must have" features and a couple of "nice-to-have" features. However, it is important to note that the needs of a department or employee will often change after a new system is up and running or if programs and priorities change.
For many associations, each department or function has its own specialized need for information processing. Each area will have developed a database, spreadsheet or other method to meet their unique needs, and likely will be unwilling to give it up without a fight. While the benefits of a centralized database would still outweigh the costs of redundant systems, sometimes a hybrid approach will work. The goal here is to utilize the 'master database' of information where possible, and allow the individual department to extract that data for their special project or needs.
For example, let's say the public affairs department is doing a special mailing to collect funds from members for a special project. This is a one-time only event. The department wants to keep track of who they sent the letter to, follow up with phone calls, and eventually record who sent in money. In this instance, the centralized database can be utilized to extract the name, address, and phone information to a spreadsheet. The spreadsheet data can be sorted and selected to produce a personalized letter. Columns could be added to the spreadsheet to record phone conversations and money received.
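A one-time extract like the one described above might look something like the following Python sketch. It assumes a hypothetical `member` table with `id`, `name`, `address`, and `phone` columns, and the extra `call_notes` and `amount_received` columns are illustrative stand-ins for whatever the department wants to track. Carrying the member's ID into the spreadsheet is a design choice of this sketch, made so the rows can later be matched back to the centralized database.

```python
import csv
import io
import sqlite3


def extract_mailing_list(conn, out):
    """Write a one-time snapshot of member contact data as CSV; return the row count."""
    writer = csv.writer(out)
    # Contact columns come from the central database; the last two are blank
    # working columns for the department to fill in.
    writer.writerow(
        ["member_id", "name", "address", "phone", "call_notes", "amount_received"]
    )
    count = 0
    for row in conn.execute(
        "SELECT id, name, address, phone FROM member ORDER BY id"
    ):
        writer.writerow(list(row) + ["", ""])
        count += 1
    return count


# Stand-in for the centralized database (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?, ?)",
    [
        (1, "John Smith", "12 Elm St", "555-0100"),
        (2, "Ann Lee", "9 Oak Ave", "555-0101"),
    ],
)

buffer = io.StringIO()  # a real extract would write to a file instead
rows_written = extract_mailing_list(conn, buffer)
lines = buffer.getvalue().splitlines()
print(rows_written, "rows extracted")
```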
Sounds nice and easy. But what happens when the department wants to do another, separate mailing a year later, but only to the biggest donors? This should set off loud alarm bells, because the key to the hybrid approach is that the original extraction of data was for a one-time event. The contact information in the spreadsheet might be out of date the next time the department wants to use it. The solution? The natural temptation would be to simply make the changes in the spreadsheet so that the public affairs employees don't have to go back and look up the contact information for every person or company they want to reach. Voilà, you have a redundant data problem, which might be compounded by the fact that the public affairs employees might update their spreadsheet but not pass along changes to the main database.
The organization has a decision to make: it can either add the necessary functionality to track the information in the centralized database or it can cross-reference the donor list with the centralized database contact information manually. Both choices mean using either time or money to solve the problem. A third possible decision would be to make an exception in this case; after all, it's just a tiny fraction of the records in the database that is at issue. The major flaw with this decision is that every department with any member, vendor, or customer contact likely has its own isolated case that would only affect a fraction of the database records. The end result would be data chaos.
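The cross-referencing option does not have to be entirely manual if the donor list carries a key that matches the centralized database. The Python sketch below assumes such a key exists; `member_id`, `refresh_contacts`, and the sample records are all hypothetical names used for illustration.

```python
def refresh_contacts(donor_rows, central_members):
    """Overwrite stale contact fields in donor rows with current central data.

    Donor-specific columns (such as amounts received) are kept as they are.
    Returns the refreshed rows plus the IDs that could not be matched.
    """
    refreshed, unmatched = [], []
    for row in donor_rows:
        current = central_members.get(row["member_id"])
        if current is None:
            unmatched.append(row["member_id"])
            continue
        # Central values win for any field present in both records.
        refreshed.append({**row, **current})
    return refreshed, unmatched


# Year-old spreadsheet rows (hypothetical data).
donor_rows = [
    {"member_id": 1, "phone": "555-0100", "amount_received": 500},
    {"member_id": 7, "phone": "555-0199", "amount_received": 250},
]

# Current contact data pulled from the centralized database.
central_members = {
    1: {"phone": "555-0142"},
}

refreshed, unmatched = refresh_contacts(donor_rows, central_members)
print(refreshed)
print("Needs manual review:", unmatched)
```

The contact details always flow from the centralized database to the spreadsheet, never the other way, so the spreadsheet remains a disposable working copy rather than a second source of truth.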
Accounting software is a special case, but even it should be set up so that the same dues transaction detail does not have to be recorded twice, once in the membership database and once in the accounting software. An automated interface (with the appropriate batch controls/headers/detail) can be written to reduce duplicate data entry between membership dues invoicing and collections. Appropriate security would limit access to this process. Your detailed membership transactions would stay with the membership data, and only the account totals by date would be stored in your accounting software.
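To make the idea of a batch with header, detail, and control totals more concrete, here is a minimal Python sketch. The record layout, the field names, and the functions `build_accounting_batch` and `verify_batch` are assumptions for illustration; a real interface would follow whatever import format the accounting package actually accepts.

```python
from collections import defaultdict
from decimal import Decimal


def build_accounting_batch(transactions, batch_id):
    """Summarize detailed dues transactions into account totals by date."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for txn in transactions:
        totals[(txn["date"], txn["account"])] += txn["amount"]

    detail = [
        {"date": date, "account": account, "total": total}
        for (date, account), total in sorted(totals.items())
    ]
    header = {
        "batch_id": batch_id,
        "record_count": len(detail),
        "control_total": sum((line["total"] for line in detail), Decimal("0")),
    }
    return {"header": header, "detail": detail}


def verify_batch(batch):
    """Check the batch controls before the accounting side accepts the batch."""
    detail = batch["detail"]
    header = batch["header"]
    return (
        header["record_count"] == len(detail)
        and header["control_total"]
        == sum((line["total"] for line in detail), Decimal("0"))
    )


# Detailed transactions stay in the membership database (hypothetical data).
transactions = [
    {"date": "2002-09-01", "account": "dues", "amount": Decimal("150.00")},
    {"date": "2002-09-01", "account": "dues", "amount": Decimal("150.00")},
    {"date": "2002-09-02", "account": "dues", "amount": Decimal("75.50")},
]

batch = build_accounting_batch(transactions, batch_id="2002-09-A")
print(batch["header"])
print(verify_batch(batch))
```

Only the summarized lines cross over to accounting, and the control total gives both sides a quick way to confirm that nothing was lost or altered along the way.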
Linking your membership information with other pieces of software, for example your Web page, is made easier by setting up an interface or extract process. Whether in batch mode (run once a day or once a week) or linked dynamically (real-time), it is not difficult to share information from the centralized database for a variety of uses.
And occasionally you need access to your data on the road, or remotely from another office. A centralized database can be accessed via a VPN (virtual private network) through the Internet or through a replication process that allows you to take a copy of the database with you on the road, make changes to it, then update those changes back at the office. This function is especially useful for managing conferences or events at a remote location.
The trick to making a centralized database work, despite everyone having their own fiefdom of needs and information, is to try to meet the majority of those needs in the one database. Just as there are modules of information that everyone will use - member names, phone numbers, addresses - there will be specifically designed modules for each department in the centralized database, such as dues, events, and committees. Each of these detailed modules can be developed to meet that group's needs, with the added benefit that - with the appropriate permissions - others can view and share that information.
Whether a system is the latest whiz-bang association management system that includes dozens of modules and powers the content on a Web site, or a simple, specifically designed Microsoft Access database, the redundant data gremlin can rear its ugly head. Technology is a tool that can help neutralize the gremlin, but the key resides with the users of the data. They must understand that the modern, technological world has no place for data fiefdoms. Eliminating them may mean spending time and resources on programming, or it may mean developing different work processes. In the end, having one clean source of data will enable an organization to better serve all of its various constituents.