One Huge Cluster Caught: Information Asset Standardization


Information asset standardization is the purposeful and punctiliously deliberate consolidation of redundant, contradictory stories, processes, and databases into enterprise requirements. The proliferation of information property can have the best opposed influence on environmental well being; standardization has many well being advantages:

  • Reduces the probability that ill-constructed property take down processes, nodes, and clusters
  • Reduces rivalry and competitors for compute and storage
  • Reduces course of and repair failures and related troubleshooting effort
  • Reduces effort spent sustaining and supporting redundant property

Though the impacts of information asset standardization on environmental well being may be larger than every other class on this collection, the enterprise worth advantages exponentially outweigh them: normal knowledge definitions, improved knowledge governance, constant knowledge interpretation, better knowledge trustworthiness, and improved data-driven choice making. Ideally you’re realizing these advantages utilizing Cloudera Information Catalog.  

Complete knowledge standardization is a multiyear journey and certain pointless, however the low hanging fruit is ripe for the choosing. We strongly advocate embarking on this journey till returns diminish.

Report Standardization

Take these steps:

  1. Stock stories, together with possession, utilization statistics, and report frequency. 
  2. Goal for retirement any stories unused within the final yr, then within the final 6 months. Pay explicit consideration to report frequency as low utilization of an annual report could also be acceptable.
  3. Choose a report archival methodology commensurate along with your buyer partnership dynamic (we hope you’re not in knowledge purgatory )
    1. Two weeks earlier than, every week earlier than, and the day of the archival, notify report homeowners as to which stories you plan to archive, permitting them a grace interval to object and supply justification for the stories continued existence. 
    2. Conversely, archive them with out notification and restore a report when anybody shouts about it.
  4. Archive focused stories. In Tableau, we desire to easily assign report possession to a system consumer which prohibits additional use whereas enabling us to simply restore it if requested and justified.
  5. Repeat the train quarterly. In our expertise, 80-90% of reporting stock may be archived in as little as 2 quarters.
    1. In case your visualization instrument employs extract jobs, cease them, and notice any database archival targets.
  6. Often examine the appropriateness of report refresh charges and negotiate.
  7. Over time, consolidate extra property by grafting closely used report options and capabilities into enterprise normal dashboards then retire redundant legacy stories. Admittedly, that is troublesome and time consuming work often undertaken as a method to trusted knowledge, not environmental well being. 

DB Standardization

  1. As earlier than, stock database property, possession, refresh frequency, and related utilization statistics. 
  2. Goal short-term/testing databases and consumer databases owned by former FTE. 
  3. Talk far and broad. We’re not as courageous as to archive dbs with out notification and permission most often. We take pleasure in our jobs and need to hold them.
  4. Archive the databases. We often archive into a typical archival database. In our expertise, this could scale back 35-55% of manufacturing tables. 
  5. Often negotiate refresh charges and knowledge retention insurance policies with database homeowners.
  6. We strongly advocate taking the multiyear journey to standardize centralized knowledge property into enterprise requirements as a lot as potential as it may possibly considerably enhance knowledge trustworthiness and correct data-driven choice making.

Pipelines and Jobs Standardization

Database asset standardization will determine archival alternatives for (1) the pipeline stock, right here referring to processes which transfer knowledge from one repository or supply to a different repository or curated dataset, in addition to (2) the roles stock, right here referring to queries which offer views or persist knowledge throughout the atmosphere. Standardizing processes is excessive effort with diminishing returns on environmental well being; due to this fact, start with processes that:

  • Steadily fail
  • Are most crucial
  • Are most regularly up to date
  • Are probably the most useful resource intensive

As at all times, in case you want help figuring out or executing knowledge asset standardization, have interaction our Skilled Providers consultants. We did!

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles