Tackling Poor Data Quality in the Age of Big Data
Jinfo Blog

11th June 2014

By Andrew Lucas

Abstract

The opportunity for organisations to benefit from big data may be limited by poor data quality, and the cost of poor data is substantial. The majority of marketing databases surveyed were found to be lacking basic data. It is more cost-effective to prevent data problems than to fix them.

Item

The benefits of big data are expected to be huge - revenues of $24 billion by 2016 according to IDC, with early adopters gaining a competitive margin over rivals.

But this potential big data bonanza may not be shared by all; poor data quality may undermine the efforts of some.

Data Accuracy is Essential

According to the "Business Intelligence Maturity Audit" (biMA®) carried out by the IT and business services group Steria, "Data quality is the Achilles heel of BI (Business Intelligence) and continues to be neglected, despite being the foundation of all BI analysis."

These findings are evidence that the truism of "garbage in, garbage out" (GIGO) has a real-world impact.

Data accuracy can be vital in avoiding problems in Anti-Money Laundering (AML) and Know Your Customer (KYC) datasets. Some of the technical issues with people's names, for example, were examined by Victoria Meyer in her FreePint article Combat Transcription Errors with Linguistic Identity Matching.

Impact on The Bottom Line & Beyond

Errors in marketing data also have a negative impact on business. D&B's white paper, "Gaining the Data Edge", quotes the Gartner Group as saying that "poor data quality negatively impacts a company's bottom line by an average of $8.2 million annually in operational inefficiencies, lost sales and unrealised new opportunities".

As well as the negative financial impact, poor quality marketing data can reduce the effectiveness of campaigns, cause embarrassment to an organisation by sending inappropriate communications and reduce the efficiency of sales staff.

The Essence of High Quality Data

A recent survey, "The State of Marketing Data" (PDF), from NetProspex, a B2B marketing data services company, highlights some of the challenges faced by companies.

The survey analysed 61 million records and found that 84% of marketing databases are barely functional. Of the records analysed, 88% lacked basic company data such as industry, company revenue or number of employees, whilst 64% did not include a phone number.

So what does high quality data look like? The IBM "What is Data Quality" post describes the characteristics of high quality data as:

  • Complete
  • Accurate
  • Available
  • Timely.
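The first of these characteristics, completeness, is the one the survey figures above measure, and it is also the easiest to check routinely. The following is a minimal Python sketch of such a check, not the method used by NetProspex or IBM. The field names and sample records are hypothetical and exist only for illustration.

```python
from typing import Iterable, Mapping

# Hypothetical field names, chosen only for illustration.
REQUIRED_FIELDS = ("industry", "revenue", "employees", "phone")


def is_blank(value: object) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_rate(records: Iterable[Mapping[str, object]], field: str) -> float:
    """Return the share of records (0.0 to 1.0) in which `field` is missing."""
    records = list(records)
    if not records:
        return 0.0
    missing = sum(1 for record in records if is_blank(record.get(field)))
    return missing / len(records)


# Invented sample records; the phone numbers are fictional.
sample = [
    {"industry": "Retail", "revenue": 1_200_000, "employees": 40, "phone": ""},
    {"industry": None, "revenue": None, "employees": None, "phone": "020 7946 0000"},
    {"industry": "Finance", "revenue": 5_000_000, "employees": 120, "phone": "020 7946 0001"},
    {"industry": "", "revenue": 300_000, "employees": 8, "phone": None},
]

for field in REQUIRED_FIELDS:
    print(f"{field}: {missing_rate(sample, field):.0%} missing")
```

A report like this only covers completeness. Accuracy, availability and timeliness need other checks, such as comparison against a trusted reference source or a last-updated date.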

A Two-Pronged Approach to Improving Data Quality

The challenges presented by poor data quality can be tackled from both the front and the back end, depending on the type of data.

For marketing information, where the data structure is relatively straightforward, the problems often arise from the initial input, or failure to input, the data - the GIGO syndrome.

In this scenario, according to D&B, "Data quality is a business issue, not an IT issue". The D&B report "The Big Payback on Quality Data" claims that "it is far more cost-efficient to prevent data issues than to resolve them".

Information Professionals Play a Key Role

For other types of big data the answer is often more to do with the "data about the data" - the metadata. Minimum metadata requirements need to be established for big data quality and management. Taxonomies also need to be defined to enable organisation-wide use of data.
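As a rough illustration of what a minimum metadata requirement might look like in practice, the Python sketch below checks a dataset's metadata against a required set of fields and a controlled vocabulary. The field names and taxonomy terms are assumptions made for this example; each organisation would define its own.

```python
from typing import Dict, List

# Hypothetical minimum metadata set; a real organisation would define its own.
MINIMUM_METADATA = {"title", "owner", "source", "created", "subject"}

# Hypothetical controlled vocabulary standing in for an organisation-wide taxonomy.
TAXONOMY = {"customers", "finance", "marketing", "operations"}


def metadata_problems(metadata: Dict[str, str]) -> List[str]:
    """List the ways a dataset's metadata falls short of the minimum standard."""
    problems = []
    for key in sorted(MINIMUM_METADATA):
        if not metadata.get(key):
            problems.append(f"missing {key}")
    subject = metadata.get("subject")
    if subject and subject not in TAXONOMY:
        problems.append(f"subject '{subject}' not in taxonomy")
    return problems


# Invented example: three required fields are absent and the subject is off-taxonomy.
incomplete = {"title": "Q2 leads", "subject": "sales"}
for problem in metadata_problems(incomplete):
    print(problem)
```

Running a check of this kind when a dataset is registered, rather than after it is in use, follows the same prevention-over-cure logic that D&B describes for marketing data.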

Information professionals, with their knowledge of data structures (metadata, taxonomies and indexing), can play a key part in establishing the data standards of an organisation.

They also have a role in ensuring that people within the organisation understand the importance of capturing and entering accurate data.

Editor's Note

FreePint Subscribers can log in to read and share more in Andrew Lucas' article, Big Data Bonanza - But Only for Those With High Quality Data.

The FreePint Topic Series: Big Data in Action ran from April to June 2013. Visit the Topic page to find out more and see the links to the published articles.
