Advice for CIOs: How to Stand Out in the Face of Mass Unstructured Data?


    Apr 21, 2023

    Unstructured data accounts for more than 80% of new enterprise data and is increasingly important to production and decision-making.

    Unstructured data does not exist in a recognized data structure such as a relational database table. It includes text, images, documents, and audio and video data. According to Huawei’s GIV report, the global data volume will reach 180 ZB by 2025, of which more than 80% is unstructured data.

    Big data, AI, and high-performance data analytics (HPDA) give rise to mass unstructured data

    With the development of new technologies and applications such as 5G, cloud computing, big data, AI, and HPDA, enterprise unstructured data such as video, images, and files is growing rapidly. The volume of unstructured data is increasing from the PB level to the EB level. For example, a tier-one carrier processes up to 15 PB of data per day on average. In HPDA, a single DNA sequencer, a remote sensing satellite, and an autonomous-driving training car generate 8.5 PB, 18 PB, and 180 PB of data per year, respectively.

    Unstructured data is widely used in enterprises and is becoming crucial to production and decision-making

    As digital transformation accelerates, unstructured data is widely used in enterprises. AI is a typical example: 56% of enterprises use AI for at least one business function, and many of these scenarios analyze and process unstructured data. Improved enterprise data governance capabilities unlock data-driven business growth, and unstructured data is beginning to move into production and decision-making systems.

    In the healthcare industry, historical image archive files are being accessed more frequently. AI-powered image reading helps shorten diagnosis from 15 minutes to 20 seconds and increase diagnosis accuracy from 40% to 95%. In the financial industry, to support online real-time credit extension, banks use a big data platform to perform real-time analytics, interactive analysis, offline processing, and real-time queries. This helps banks promptly identify new opportunities and risks, and cuts credit investigation time from about one week to real time.

    Efficient and reliable storage of mass unstructured data underpins enterprise data governance

    Mass unstructured data is generated in public clouds, at edge sites, on IoT terminals, and most often in enterprise data centers. It is predicted that unstructured data in enterprise data centers will grow at a CAGR of 18% and, by 2025, exceed the amount of unstructured data in public clouds, accounting for 51% of all unstructured data. More enterprises are choosing to deploy unstructured data storage in enterprise data centers.

    To store unstructured data efficiently and securely in enterprise data centers, a growing number of industries are looking for professional-grade distributed storage solutions. The financial industry uses distributed storage to store image, audio, and video data. In the education industry, distributed storage is an effective way to support HPDA. Adoption is most common in the manufacturing industry, where breakthroughs in autonomous guided vehicles, industrial Internet, and industrial simulation drive explosive data growth and a greater need for distributed storage.

    However, the storage capacity that enterprises purchased previously cannot meet today's needs. The first problem to overcome is capacity: the traditional multi-copy technique stores several full replicas of every piece of data, so it consumes several times the data's size in raw capacity and becomes a barrier to storing unstructured data at scale. To optimize storage space utilization, enterprises need the data reduction techniques implemented by professional distributed storage solutions. These include:

    • High-ratio elastic erasure coding (EC)
    • Deduplication
    • Compression
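
    The capacity argument above can be made concrete with simple arithmetic. The sketch below is illustrative only: the function names and the example parameters (three copies, a 22+2 EC layout, 100 PB of data) are assumptions chosen for the example, not figures from any specific product. Deduplication and compression gains depend heavily on the workload, so they are left out.

```python
def replication_efficiency(copies: int) -> float:
    """Fraction of raw capacity that holds unique data under N-copy replication."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 / copies


def ec_efficiency(data_fragments: int, parity_fragments: int) -> float:
    """Fraction of raw capacity that holds unique data under k+m erasure coding."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least 1 data fragment and no negative parity")
    return data_fragments / (data_fragments + parity_fragments)


def raw_capacity_needed(usable_pb: float, efficiency: float) -> float:
    """Raw capacity (PB) required to store `usable_pb` of data at a given efficiency."""
    return usable_pb / efficiency


if __name__ == "__main__":
    data_pb = 100.0  # hypothetical data set size
    three_copy = replication_efficiency(3)  # assumed 3-copy policy
    wide_ec = ec_efficiency(22, 2)          # assumed 22+2 EC layout
    print(f"3 copies: {three_copy:.1%} usable, "
          f"{raw_capacity_needed(data_pb, three_copy):.1f} PB raw")
    print(f"EC 22+2:  {wide_ec:.1%} usable, "
          f"{raw_capacity_needed(data_pb, wide_ec):.1f} PB raw")
```

    Under these assumptions, three copies leave roughly 33% of raw capacity usable, while a 22+2 layout leaves roughly 92%. The gap widens with the EC ratio, which is why high-ratio EC matters at the PB-to-EB scale.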

    Replacing general-purpose servers with high-density storage hardware optimizes TCO by reducing footprint, power consumption, and O&M complexity.

    In addition, the industry uses professional distributed storage that integrates software and hardware to provide enterprise customers with end-to-end solutions featuring high reliability, performance, and scalability. This simplifies deployment, management, and services and reduces OPEX.

    Unstructured data management is becoming increasingly complex. It is difficult to allocate data to the appropriate storage space manually, quickly, and flexibly, which makes data management inefficient and O&M costly. To deal with these data mobility issues, professional distributed storage implements hot, warm, and cold data tiering: data is placed in the appropriate tier based on policies and migrated between tiers automatically for optimal ROI. This approach is widely used across industries.
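
    A policy-based tiering decision of this kind can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the policy here is based only on time since last access, and the thresholds (30 and 180 days), the tier names, and the function names are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical policy thresholds; real policies are configured per workload.
HOT_MAX_AGE = timedelta(days=30)
WARM_MAX_AGE = timedelta(days=180)


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Pick a tier from the time elapsed since the data was last accessed."""
    age = now - last_access
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"


def plan_migrations(objects: dict, now: datetime) -> list:
    """Return (name, current_tier, target_tier) for every object in the wrong tier.

    `objects` maps an object name to a (current_tier, last_access) pair.
    """
    moves = []
    for name, (current_tier, last_access) in objects.items():
        target = choose_tier(last_access, now)
        if target != current_tier:
            moves.append((name, current_tier, target))
    return moves


if __name__ == "__main__":
    now = datetime(2023, 4, 21)
    catalog = {
        "scan-001.dcm": ("hot", now - timedelta(days=400)),
        "train-042.bag": ("cold", now - timedelta(days=2)),
        "report.pdf": ("warm", now - timedelta(days=90)),
    }
    for name, src, dst in plan_migrations(catalog, now):
        print(f"migrate {name}: {src} -> {dst}")
```

    Running the planner on a schedule and handing its output to a migration job is what removes the manual allocation step. A production policy would typically also weigh factors such as access frequency, file type, and capacity watermarks.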

    Unstructured data-powered technologies often involve multiple access protocols (file, object, and HDFS) in one data processing flow. To ensure premium usability, preferred solutions adopt multi-protocol interworking without data copying to reduce data redundancy. Figure 1 shows the data processing flow of autonomous driving training.

    Figure 1: Data processing flow of autonomous driving training

    What we suggest

    • Strengthen the enterprise IT team's capabilities in processing mass unstructured data

    As enterprises use unstructured data more widely, especially in their production and decision-making systems, the ability to store mass unstructured data efficiently and extract its value to enable informed decisions becomes a key competitive edge. Enterprise IT teams therefore need to strengthen their capabilities in processing mass unstructured data, and to extend their capabilities, which today center on structured data, to the design, planning, and management of mass unstructured data.

    • Choose professional distributed storage to build a foundation for mass unstructured data

    To improve the efficiency of using mass unstructured data for production, use a professional distributed storage system to build a global unified data storage foundation centered on unstructured data. It is best to choose a distributed storage system that supports hybrid workloads, multi-protocol interworking (file, object, and HDFS), data reduction, and high-density hardware to ensure sufficient capacity, superb data mobility, and premium usability.

    • Evaluate multiple factors before deciding whether to host unstructured data in enterprise data centers or public clouds

    When designing and planning the deployment of mass unstructured data, enterprises must consider full data lifecycle management (data generation, storage, access, and migration), as well as data sharing and mobility between service platforms and even across cloud platforms. In addition, enterprises should evaluate factors such as TCO, performance, and security before choosing between enterprise data centers and public clouds.

    Learn more about Huawei’s Data Storage solutions.

    Disclaimer: Any views and/or opinions expressed in this post by individual authors or contributors are their personal views and/or opinions and do not necessarily reflect the views and/or opinions of Huawei Technologies.

