Many big data implementations are leaving document management systems behind, but DMSes house major stores of unstructured data. Should data analysts think again?

Image: Server room data center, 3D rendering. Credit: Production Perig/Shutterstock

The earliest document management systems (DMSes) appeared in the 1980s. They moved documents beyond physical file cabinets and PC server storage and onto networks, where multiple people and departments within a single company could access a trove of documents in electronic form.

Since then, document management systems have been a driving force behind companies’ efforts to digitalize. These systems scan, index, store, retrieve and transform documents. They have been instrumental in moving paper-based documents and images out of file cabinets and storage rooms and onto widely distributed networks that employees across the company can reach.

The question is: Are companies linking document management with their big data strategies?

In many cases, companies are lagging.

The big data repositories now being built combine systems of record data with unstructured data arriving from the Internet of Things and outside sources. Document management systems are used in this process, but big data strategies don’t necessarily include a concerted effort to make full use of the data in a DMS.

On the DMS side, users search, digitize and organize data, but other big data technologies, such as data cleaning and normalization, artificial intelligence, machine learning and more advanced algorithm development, aren’t yet broadly used.

Of course, there are niche exceptions.

One of these exceptions is the legal discovery process, which pores through reams of documents that are often housed in corporate document management systems. The goal of legal discovery software is to analyze unstructured documents and to use AI and machine learning to determine which documents (out of thousands) are most relevant to a potential upcoming legal case, and which are not.

In this instance, there are no lengthy corporate arguments about whether it’s necessary to import documents into a big data repository from a DMS. The use case stands by itself.

However, in other cases, a compelling reason to mesh a DMS with a big data repository might not be there. For instance, does a genome sequencing experiment really rely on what a DMS would typically include?

The takeaway is not that every big data repository needs DMS data, but that the DMS should always be considered as a source. The DMS so often becomes an outlier in big data strategy because data scientists and IT data analysts have a tendency to overlook it.

What should companies do to ensure that their DMSes are included as potential sources of information that flows into a big data repository? Here are four steps.

  1. Document the types of data that are in the DMS so they can be evaluated for inclusion in big data repositories.
  2. Verify that the DMSes the company is using have a full set of APIs (application programming interfaces) that make data transfers into big data repositories easy.
  3. Develop a standard extract, transform and load methodology that can take incoming data from a DMS and prepare it for use in a big data repository.
  4. Determine any outgoing results from big data analytics that should be exported to DMSes for user access.
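
To make steps 1 and 3 more concrete, here is a minimal Python sketch of what a standard extract, transform and load routine might look like. Everything in it is an assumption for illustration: the `DmsDocument` record shape, the function names and the JSON Lines output format are hypothetical, and a real DMS would supply documents through its own API rather than an in-memory list.

```python
import io
import json
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class DmsDocument:
    """Hypothetical shape of one record exported from a DMS."""
    doc_id: str
    doc_type: str
    title: str
    body: str


def normalize_type(doc_type):
    """Lowercase and trim a document type label so variants match."""
    return doc_type.strip().lower()


def inventory(docs):
    """Step 1: count documents per type so each type can be evaluated."""
    return Counter(normalize_type(d.doc_type) for d in docs)


def transform(doc):
    """Step 3 (transform): flatten one document into a clean record."""
    # Collapse runs of whitespace left over from scanning or export.
    text = re.sub(r"\s+", " ", doc.body).strip()
    return {
        "id": doc.doc_id,
        "type": normalize_type(doc.doc_type),
        "title": doc.title.strip(),
        "text": text,
        "word_count": len(text.split()),
    }


def load(records, out):
    """Step 3 (load): write records as JSON Lines to a file-like object."""
    count = 0
    for record in records:
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


def run_etl(docs, out, include_types):
    """Transform and load only the document types chosen for inclusion."""
    selected = (
        transform(d) for d in docs
        if normalize_type(d.doc_type) in include_types
    )
    return load(selected, out)


if __name__ == "__main__":
    # Made-up sample documents standing in for a DMS export.
    sample = [
        DmsDocument("1", "Contract ", " Supply agreement ", "Term:\n  12 months."),
        DmsDocument("2", "invoice", "Invoice 1001", "Total due: 400"),
    ]
    print(inventory(sample))
    buffer = io.StringIO()
    run_etl(sample, buffer, include_types={"contract"})
    print(buffer.getvalue())
```

The inventory gives analysts a quick view of what the DMS holds before anything is moved, and the `include_types` filter keeps the decision about which document types belong in the repository explicit rather than buried in the transfer logic.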