How to Find and Clean Dirty Data

January 13, 2023 | Data, Technology

If dirty data sounds like something you want to get rid of, and fast, it is. What is dirty data? It is corrupted, incomplete, or inaccurate data that is clogging up your accounting system and making it difficult to produce accurate reports. Here, we share tips for finding and cleaning such data from your systems.

A Primer on Dirty Data

One question you may ask is, “How does dirty data get into our system in the first place?”

Several factors contribute to dirty data in a system. The first is obvious: human error. Individuals entering customer information may inadvertently create duplicates by misspelling a company or customer name or using an acronym instead of spelling out a word. Other factors that can cause poor data quality include lack of internal controls, merging systems together, and inadequate processes to manage data.
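
To make the misspelling case concrete, here is a minimal Python sketch that flags customer names that look alike after light normalization. The sample names and the 0.85 similarity threshold are assumptions chosen for illustration, not values from any particular accounting system, and the flagged pairs are meant for manual review rather than automatic deletion.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalized forms are similar enough to review."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical customer list with a casing variant and a misspelling.
customers = [
    "Acme Supply Co.",
    "ACME Supply Co",
    "Acme Suply Co.",
    "Riverside Food Bank",
]

for first, second in likely_duplicates(customers):
    print(f"Review: {first!r} vs {second!r}")
```

A plain similarity score like this will not match an acronym to its spelled-out form, so those cases still need a lookup list or a human reviewer.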

Steps to Identify and Correct Data Issues

CPAs and accountants who suspect that data quality issues are contributing to poor decision making should take steps to uncover and rectify dirty data. The following process can help identify problematic data.

  1. Understand and map the business process that creates the data: Data is captured as part of a business process or workflow. For example, an accounts payable process starts with an order for goods or services, continues with receipt of an invoice from the vendor, and ends with payment to the vendor. Mapping the process shows where data enters the system and where it should be reviewed for accuracy. For example, controls need to be in place to ensure that invoice amounts match the contract amounts and that the final payment matches the approved invoice.
  2. Analyze data sources: How is data input into the system? Is it manually entered or automatically entered? Manual data entry creates more potential for mistakes, so these should be your first areas of inquiry.
  3. Identify acceptable data elements: Another important step is to identify which data elements, or data fields, are acceptable. Making these consistent helps keep data entry consistent.
  4. Review existing data sets and tables: Although this step is time-consuming, it is important to manually open existing data sets and data tables and review them. You may wish to break this step into smaller parts or tackle it one hour per day for large datasets. This gives your mind a break between review sessions to ensure you see things with a fresh eye.
  5. Note what data is problematic or missing: After reviewing the data, take notes on which elements are missing or incomplete. These should be fixed as soon as possible.
  6. Document the database requirements: Create a data dictionary, which defines what information goes into each field. Document the requirements for data entry as well. This ensures consistency in the future when others enter information into the system.
  7. Identify exceptions: As with every rule-based system, there will be exceptions to the rules. Identify these exceptions and document them as well to provide guidelines for what is an acceptable deviation from the norm.
  8. Clean the database: Fix any errors and remove duplicates after reviewing the entire database.
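
Parts of the review in steps 4 through 6 can be automated once a data dictionary exists. The Python sketch below is one possible approach under stated assumptions: the field names, format rules, and sample invoice records are all hypothetical, and a real data dictionary would reflect your own system's fields.

```python
import re

# Hypothetical data dictionary: field name -> (required?, expected format).
DATA_DICTIONARY = {
    "vendor_id": (True, re.compile(r"V\d{4}")),
    "invoice_date": (True, re.compile(r"\d{4}-\d{2}-\d{2}")),
    "amount": (True, re.compile(r"\d+(\.\d{2})?")),
    "memo": (False, None),
}


def find_problems(record):
    """List the missing or badly formatted fields in one record."""
    problems = []
    for field, (required, pattern) in DATA_DICTIONARY.items():
        value = (record.get(field) or "").strip()
        if not value:
            if required:
                problems.append(f"{field}: missing")
        elif pattern is not None and not pattern.fullmatch(value):
            problems.append(f"{field}: unexpected format {value!r}")
    return problems


# Hypothetical invoice records exported from an accounting system.
records = [
    {"vendor_id": "V0012", "invoice_date": "2023-01-05", "amount": "1250.00", "memo": ""},
    {"vendor_id": "12", "invoice_date": "01/05/2023", "amount": "1250.00"},
    {"vendor_id": "V0044", "invoice_date": "2023-01-09", "amount": ""},
]

# Map each record's position to the problems found in it.
report = {index: find_problems(record) for index, record in enumerate(records)}
for index, problems in report.items():
    if problems:
        print(f"Record {index}: {problems}")
```

A report like this only notes what is problematic or missing. Deciding how to fix each record, and which deviations count as acceptable exceptions, remains a manual step.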

Hint: There are companies that can help you clean up big databases, especially those involving names and addresses. These companies can conduct what is called a “merge/purge/suppression” by comparing datasets and identifying for manual review any potential duplicates. Then duplicates can be merged, deleted (purged), or suppressed (hidden) depending on your needs. While this may not be an appropriate step for confidential financial information, for customer databases it can be an enormous time saver.
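
For readers curious what a merge/purge/suppression involves, the following Python sketch shows the general idea on a tiny customer list. It is an illustration only, not how any vendor implements the service: the record fields, the matching rule (normalized name plus ZIP code), and the choice to treat the first record in each group as the one to keep are all assumptions.

```python
from collections import defaultdict


def match_key(record):
    """Build a comparison key from the normalized name and ZIP code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name, record["zip"].strip())


def group_candidates(records):
    """Group records that share a match key; return only groups with duplicates."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


def resolve(group, action):
    """Apply a reviewer's decision to one group of candidate duplicates."""
    keep, extras = group[0], group[1:]
    if action == "merge":
        # Fill empty fields on the kept record from the other records.
        merged = dict(keep)
        for other in extras:
            for field, value in other.items():
                if not merged.get(field):
                    merged[field] = value
        return [merged]
    if action == "purge":
        # Delete the extra records outright.
        return [keep]
    if action == "suppress":
        # Keep every record but flag the extras as hidden.
        return [keep] + [{**other, "suppressed": True} for other in extras]
    raise ValueError(f"unknown action: {action}")


# Hypothetical customer records.
customers = [
    {"name": "Acme Supply Co.", "zip": "53703", "phone": ""},
    {"name": "ACME SUPPLY CO", "zip": "53703 ", "phone": "555-0100"},
    {"name": "Riverside Food Bank", "zip": "53711", "phone": "555-0199"},
]

for group in group_candidates(customers):
    print(resolve(group, "merge"))
```

The grouping step only proposes candidates. As the hint notes, a person should review them before anything is merged, purged, or suppressed.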

Garbage In, Garbage Out!

Failing to clean dirty data can lead to poor decisions based on reports built from bad data (garbage in, garbage out). Taking the necessary steps now to have clean data in your system will be worth the short-term costs and resources required. Contact us for more information on this topic, help with your data clean-up project, or best practices for data entry and for sharing data between multiple systems, including automation, reporting, and compliance.

Welter Consulting

Welter Consulting brings people and technology together to deliver effective solutions for nonprofit organizations. We offer software and services that can help you with your accounting needs. Please contact us for more information.