Automate the Organization of Your Critical Data Assets

by Nick Freund - January 31, 2024

In any organization, one of the key challenges end users face is finding data easily enough that they can use it in their daily work.

The challenge users face in simply “finding the dashboard” is evidenced by a commonly implemented hack: bookmarking a frequently used report. While a simple workaround like this might serve as an adequate band-aid for an individual user, the inability of users organization-wide to find what they need is a key impediment to widespread adoption of your data.

Data asset management solutions can accelerate the democratization of your data, particularly if they have a robust system for automating collections of data assets.

How are collections different from folders?

A collection is, simply put, a consolidated library of assets — the most impactful are organized around specific themes, job functions, or projects. Collections also represent a step forward over more traditional folder systems, bringing a higher level of flexibility and automation to your method for organizing assets.

While the associated assets might come from a single analytics tool, the most powerful and useful collections include assets from the different systems where users access and analyze data. This could be as simple as a self-service asset from your BI solution alongside a Google Sheet.

In its ideal form, a collection should encourage users not to even think about what systems they are using, and instead focus on accessing data they need.

Unlike a traditional folder system, collections are not mutually exclusive, allowing assets to exist in multiple collections at once rather than being restricted to an enforced hierarchy. Collections can also be automatically populated and shared with your target audience of end users, saving your team the manual work of maintaining folders and ensuring everyone who needs access to a specific set of assets has it.

What problems do collections solve?

Collections are meant to address many of the persistent issues organizations face with data asset sprawl, and the resulting disorder that can infect even the most data-driven organization. Among the problems solved:

Providing clarity on which assets are available across different systems

  • As we have explored in a previous article, the accumulation of assets across many platforms can be a huge hurdle for users trying to access the correct information.
  • Collections give users a single access point, and cut down on wasted time bouncing between different browser tabs in order to find the data they need. 
  • The thematic nature of collections also means that the search process is more goal-oriented, and involves fewer keyword searches in analytics tools where an asset might not even exist.

Contextualizing assets that may need to be used together, or that provide additional context a single asset cannot

  • Analytics assets exist in conversation with each other. At a more technical level, data catalogs that chart data lineage, or tools with lineage features such as dbt, can make these relationships clear for technical users.
  • However, for most end users, that level of detail will be unnecessary. What is important is that all the assets around a related theme are brought together in one place, and that they can then use those assets in conjunction with each other to answer questions. 
  • Most users in your organization do not need to know the exact data source a particular column in one of the tables in your data warehouse comes from, but they do need to know which assets have information about that data point, so they can analyze them contextually.

Establishing non-mutually exclusive sets of assets, rather than forcing them into the hierarchy of folders

  • One of the problems with creating traditional folders of assets in an analytics tool is that in most cases the structure of assets in those tools will be mutually exclusive, with assets existing only in one folder. This leads teams to create copies or duplicates of assets that pertain to more than one team or topic. 
  • Because many assets are multi-purpose and make sense to include in different contexts, collections allow you to create libraries that do not need exclusive ownership of the assets they contain. For example, a dashboard tracking sales pipeline and achievement may belong in a collection distributed to the Sales team, as well as one shared with the executive leadership team.
  • Having this kind of flexibility allows you to truly tailor collections to their exact use case, and prevents the need to duplicate assets. This is incredibly important, since duplication is a surefire way to bring more disorder to your environment.
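To make the non-exclusive idea concrete, here is a minimal sketch in Python. It assumes collections reference assets by ID rather than holding copies; the class names, IDs, and collection names are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """A single data asset, stored once in the catalog (hypothetical shape)."""
    asset_id: str
    title: str


@dataclass
class Collection:
    """A named library that references assets by ID instead of copying them."""
    name: str
    asset_ids: set[str] = field(default_factory=set)


def collections_containing(asset_id: str, collections: list[Collection]) -> list[str]:
    """Return the names of every collection that references the asset."""
    return [c.name for c in collections if asset_id in c.asset_ids]


# Hypothetical example data: one dashboard, two audiences.
pipeline = Asset("dash-001", "Sales Pipeline and Achievement")
catalog = {pipeline.asset_id: pipeline}

sales = Collection("Sales Team", {"dash-001"})
executives = Collection("Executive Leadership", {"dash-001"})

# The same asset appears in both collections...
print(collections_containing("dash-001", [sales, executives]))
# ...while the catalog still holds a single copy of it.
print(len(catalog))
```

Because membership is only a reference, a change to the underlying dashboard is seen by both audiences, and there is no second copy to drift out of date.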

Automating the monotonous busy work required to maintain traditional folders

  • One of the problems in curating traditional folders is that the work is manual. In order to, for example, create a folder of assets pertaining to the marketing team, a user will need to manually drag and drop assets. This becomes cumbersome to maintain over time, especially as the number of curated folders in your environment proliferates.
  • In contrast, a collection is populated based on a predefined query of asset metadata. For example, you can define a collection based on who created an asset, when it was created, where it is in its lifecycle, its source, its topic, or any appropriate combination, and the collection will then be automatically maintained on an ongoing basis.
  • Collections also function as a sharing mechanism. Once a collection has been created, you just add each new user to the collection rather than granting access to each individual asset. This ensures everyone gets access to the same information without the manual busywork of updating their access.
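The bullets above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific product works: the metadata fields mirror the ones listed above, the rule is a plain predicate function, and every name and value (Asset, Collection, visible_to, the sample users) is hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Asset:
    """Metadata for one asset (hypothetical fields based on the list above)."""
    asset_id: str
    creator: str
    created: date
    lifecycle: str  # e.g. "draft", "published", "deprecated"
    source: str     # the tool the asset lives in
    topic: str


Rule = Callable[[Asset], bool]


@dataclass
class Collection:
    """A collection defined by a metadata rule and shared with a set of users."""
    name: str
    rule: Rule
    members: set[str] = field(default_factory=set)

    def assets(self, catalog: list[Asset]) -> list[Asset]:
        # Membership is computed from the rule, never curated by hand.
        return [a for a in catalog if self.rule(a)]


def visible_to(user: str, collections: list[Collection], catalog: list[Asset]) -> set[str]:
    """Asset IDs a user can reach through the collections they belong to."""
    return {
        a.asset_id
        for c in collections
        if user in c.members
        for a in c.assets(catalog)
    }


catalog = [
    Asset("a1", "dana", date(2023, 6, 1), "published", "bi_tool", "marketing"),
    Asset("a2", "lee", date(2023, 9, 15), "draft", "spreadsheet", "marketing"),
    Asset("a3", "dana", date(2022, 3, 10), "published", "bi_tool", "sales"),
]

# Rule: published marketing assets created in 2023 or later.
marketing = Collection(
    "Marketing",
    rule=lambda a: a.topic == "marketing"
    and a.lifecycle == "published"
    and a.created >= date(2023, 1, 1),
)

before = [a.asset_id for a in marketing.assets(catalog)]

# A new asset lands in the catalog; nobody drags it into a folder.
catalog.append(Asset("a4", "sam", date(2024, 1, 5), "published", "spreadsheet", "marketing"))
after = [a.asset_id for a in marketing.assets(catalog)]

# Sharing happens once, at the collection level.
marketing.members.add("priya")
print(before, after, sorted(visible_to("priya", [marketing], catalog)))
```

The draft asset and the sales asset never match the rule, while the newly added asset joins the collection without any manual step. Adding a user to the collection is the only access change needed.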

There are many good use cases for collections, but at their most basic, collections solve the problem of where end users can find the data they need, and draw attention to the relationships between different assets.

How are collections more powerful than an intranet or folders?

We touched on this briefly in the previous section, but the differences between collections and other methods of organizing data are worth exploring in more detail. 

One obvious advantage of a collection over a folder in a native analytics tool is that collections are inherently tool-agnostic, meaning that your relevant data from many different tools can all be housed in the same place for easy reference. The organization can happen within a single place – your data asset management solution – rather than scattered across the different systems where your data and analytics might live. This is a boon to data users across your organization, as they can more easily locate what they need in a single place.
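One way to picture tool-agnostic organization is a thin normalization step that maps each system's assets onto a common record before they are grouped. The Python sketch below assumes two made-up payload shapes and placeholder URLs; it does not reflect the real API of any BI tool or spreadsheet service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRecord:
    """Common shape for an asset, whatever tool it came from (hypothetical)."""
    asset_id: str
    title: str
    source: str
    url: str


def from_bi_dashboard(raw: dict) -> AssetRecord:
    # Assumed payload shape: {"id": ..., "name": ..., "link": ...}
    return AssetRecord(f"bi:{raw['id']}", raw["name"], "bi_tool", raw["link"])


def from_spreadsheet(raw: dict) -> AssetRecord:
    # Assumed payload shape: {"file_id": ..., "title": ..., "web_url": ...}
    return AssetRecord(
        f"sheet:{raw['file_id']}", raw["title"], "spreadsheet", raw["web_url"]
    )


# Placeholder inputs standing in for what two different systems might return.
records = [
    from_bi_dashboard(
        {"id": 42, "name": "Pipeline Overview", "link": "https://example.com/bi/42"}
    ),
    from_spreadsheet(
        {"file_id": "xyz", "title": "Quota Plan", "web_url": "https://example.com/sheets/xyz"}
    ),
]

# One list, one shape: the user browses titles, not tools.
for record in records:
    print(record.title, "->", record.url)

sources = {record.source for record in records}
```

Once every asset shares one shape, grouping and searching no longer depend on which system an asset lives in.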

Moreover, because you can automate the maintenance and sharing of a collection, you save your team the monotonous work of curating folders, and you help ensure the ongoing quality of the data assets that populate a collection and are therefore consumed.

Alongside traditional folder structures, intranets are another way teams have tried to highlight the most important assets users need to know about. While an intranet has the benefit of being tool-agnostic, its inherent limitation is that it does not contain the actual assets it refers to, nor is its maintenance automated. At best it will contain links out to the relevant system, and at worst it will be so out of date that the information will be meaningless.

Collections, by contrast, are a single access point for the real assets and their live data, removing this additional layer of maintenance and navigation between the user and the final data product. This approach brings your organization one step closer to real-time, data-driven decision making.

Why use collections?

In a data asset management system, asset collections are not just drag-and-drop folders, but are instead powered by an underlying rules engine. Rather than relying on manual maintenance, collections can be automatically populated, shared, and updated in real time. This solves pain points for both your data builders and your data consumers.

For data builders, the benefits are primarily in the amount of time saved both in maintenance and in having to answer fewer questions from other teams about which data to use. Keeping an intranet or folder structure up to date is a laborious process — automated collections save all of that busywork.

For data consumers, the benefits are even more evident. With fewer uncertainties about which data to use for which task, the time to value for your data, and the relative confidence in that data, improve substantially. Collections allow data consumers to spend their time driving positive impacts in the organization, rather than sifting through several different tools or asking their data team to find what they need.

The real benefit, of course, is the greater efficiency for your business in taking advantage of your existing analytics, and more time for your analytics teams to build the data products that make your organization successful.
