Wednesday, February 8, 2023

How to Improve Data Discovery with Persona-Driven Strategies – Atlan


Guest blog by Jacob Frackson, Practice Lead at Montreal Analytics


Data discovery and personas: Different personas use data differently, and your data stack should reflect that. Finance and product both care about customers, but how can you present those two different variants without causing confusion? From access control to naming conventions, we’ll explain how personas can make your data stack more usable and scalable.


Introduction

The possibilities are nearly endless when it comes to your data, but how do you choose what to build and design for? Who gets to drive when designing your data pipeline and data model?

A single company-wide vision is often out of touch with the real needs, and individual-oriented design is often unrealistic, so where does that leave us? By defining and leveraging your internal personas, you and your data team can strike the right balance between those two extremes and design a data stack that really works. 

Problem

What do these problems look like in action, and are they present at your organization? Let’s use the fictional Poutine Shop project as an example. Poutine Shop is an internal project by Montreal Analytics where we have built an ecommerce website for a business that sells poutine, a traditional dish from Quebec made of french fries, cheese curds and gravy. Poutine Shop strives to solve the global poutine supply chain, one online order at a time. We’ll use this example to showcase the two data organization extremes and how personas can help the business better organize its data model.

Poutine Shop, an internal project by Montreal Analytics, used here to demonstrate data personas

Overly generic, company-level data models are superficial and may lead to miscommunications or misinterpretations. While some company-wide metrics can be very powerful and help tie everyone together – such as Customers Served All Time or Monthly Customer Growth – others can be an accident waiting to happen. With revenue, for example, it’s possible to maintain a single universal definition, but what happens when the finance team wants to start reporting on revenue net of cancellations or refunds? And what if sales wants to move the date up and start counting revenue when the contract is signed, not when the payment is collected? Well, now the simple term “revenue” is not nearly sufficient for all these potential analyses and use cases! How do we decide who gets to use “revenue” and what does everyone else do?
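To make the ambiguity concrete, here is a minimal sketch in Python of three competing "revenue" definitions applied to the same orders. The `Order` fields, the function names, and the sample data are all hypothetical and only illustrate how quickly the numbers diverge.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Order:
    amount: float
    signed_on: date            # when the order/contract was signed
    paid_on: Optional[date]    # None while payment is still outstanding
    refunded: float = 0.0      # amount refunded or cancelled, if any


def _in_month(d: Optional[date], year: int, month: int) -> bool:
    return d is not None and d.year == year and d.month == month


def collected_revenue(orders: List[Order], year: int, month: int) -> float:
    """'Universal' view: revenue counts when the payment is collected."""
    return sum(o.amount for o in orders if _in_month(o.paid_on, year, month))


def net_revenue(orders: List[Order], year: int, month: int) -> float:
    """Finance view: collected revenue net of refunds and cancellations."""
    return sum(o.amount - o.refunded
               for o in orders if _in_month(o.paid_on, year, month))


def booked_revenue(orders: List[Order], year: int, month: int) -> float:
    """Sales view: revenue counts as soon as the contract is signed."""
    return sum(o.amount for o in orders if _in_month(o.signed_on, year, month))


# Hypothetical sample data for February 2023.
ORDERS = [
    Order(100.0, signed_on=date(2023, 1, 30), paid_on=date(2023, 2, 2)),
    Order(50.0, signed_on=date(2023, 2, 10), paid_on=date(2023, 2, 10), refunded=50.0),
    Order(200.0, signed_on=date(2023, 2, 27), paid_on=None),
]

print(collected_revenue(ORDERS, 2023, 2))  # 150.0
print(net_revenue(ORDERS, 2023, 2))        # 100.0
print(booked_revenue(ORDERS, 2023, 2))     # 250.0
```

Same month, same orders, three different answers, and each of them could reasonably be labelled "revenue".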

Diagram of different data personas that use the "revenue" metric

At the other end of the spectrum, if everyone is left to define revenue on their own we have either low adoption or even more miscommunication! With less structure, many potential data stakeholders will be pushed out due to their lack of familiarity with the tool or lack of confidence in their skills. On the other hand, if they do start defining their own metrics and models it’s inevitable that they will become more and more complex, creating plenty of potential for misalignment: Do you have the same filters? Do you use the same timeframe? Are you using the same aggregate? Sure, it can be very useful and flexible that anyone can now explore fully and define their own metrics, but your building blocks are too small and you don’t have sufficient guardrails in place! 

How different data personas define the "revenue" metric

Defining your Personas

Personas sit between these two extremes, so let’s talk about defining them.

Personas are groups of one or more stakeholders that are characterized by their shared relationship to data: How do they use data? How do they talk about data? What assumptions do they have about that data? In more complex organizations you can have overlapping personas or even nested personas. Data-savvy personas often lead to sub-personas too: for example, if Operations at the Poutine Shop is very sophisticated in their usage of data, subdivisions may be needed to properly define the differences between how the prep team and the delivery team think about scheduling and success metrics.

Personas are often defined by the data they’re interested in, the language they use to describe that data, and the application of that data. When thinking about your personas, if you don’t already have an idea, the best place to start is your org chart. If that doesn’t feel sufficient, look to your biggest data consumers today and see what defines their personas. 

The Poutine Shop has four business units: Finance, Product, Operations, and Marketing. Operations is further split into three main functions: Prep, Delivery, and Support. This is a pretty good starting point for our personas, knowing that we can always add or subdivide in the future!
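One way to capture this starting point is as a small tree, so that sub-personas can be added later without restructuring anything. The sketch below is only an illustration: the `Persona` class and its `leaves` method are hypothetical names, not part of any tool mentioned in this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Persona:
    name: str
    children: List["Persona"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the most specific personas, in org-chart order."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


# Personas taken from the Poutine Shop org chart described above.
POUTINE_SHOP = Persona("Poutine Shop", [
    Persona("Finance"),
    Persona("Product"),
    Persona("Operations", [
        Persona("Prep"),
        Persona("Delivery"),
        Persona("Support"),
    ]),
    Persona("Marketing"),
])

print(POUTINE_SHOP.leaves())
# ['Finance', 'Product', 'Prep', 'Delivery', 'Support', 'Marketing']
```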

How different data personas in the org chart define and describe different data terms

Design Principles

With personas identified, the data model design can now be updated and tuned. Above, we saw that different personas are interested in different subjects and use different terminology to talk about those subjects; what can we do to accommodate that? Here are four steps that any organization can start applying:

Namespace

Start by reviewing your data namespace. How do you name your schemas? Tables? Columns? How do you name the folders in your BI Tool? In general, how do you choose unique names for all things data? 

Names should be unique, pattern-driven, and meaningful; when choosing names, think about what other names or entities might be competing for that name and choose names that don’t cause contention or create confusion. 

Here are a few examples for column-naming:

  • All timestamp columns should be named in the past tense and suffixed with `_at`: created_at, updated_at, ordered_at
  • All booleans should be prefixed with is_ or has_: is_active, has_subscription
  • All natural keys should be suffixed by `_id` and all surrogate keys should be suffixed by `_sk`: order_id, item_id, order_item_sk
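Rules like these are easy to check mechanically. Below is a minimal Python sketch of such a check. The function name, the `key_kind` argument, and the list of data type names are assumptions for illustration, and the "past tense" part of the timestamp rule is not checked because it needs human judgment.

```python
from typing import List, Optional

# Assumed type names; adjust to whatever your warehouse reports.
TIMESTAMP_TYPES = {"timestamp", "timestamptz", "datetime"}
BOOLEAN_TYPES = {"boolean", "bool"}


def check_column(name: str, data_type: str,
                 key_kind: Optional[str] = None) -> List[str]:
    """Return the naming-convention violations for one column.

    key_kind is "natural", "surrogate", or None for non-key columns.
    """
    problems = []
    dtype = data_type.lower()
    if dtype in TIMESTAMP_TYPES and not name.endswith("_at"):
        problems.append(f"{name}: timestamp columns should end with _at")
    if dtype in BOOLEAN_TYPES and not name.startswith(("is_", "has_")):
        problems.append(f"{name}: booleans should start with is_ or has_")
    if key_kind == "natural" and not name.endswith("_id"):
        problems.append(f"{name}: natural keys should end with _id")
    if key_kind == "surrogate" and not name.endswith("_sk"):
        problems.append(f"{name}: surrogate keys should end with _sk")
    return problems


print(check_column("created_at", "timestamp"))   # []
print(check_column("active", "boolean"))
# ['active: booleans should start with is_ or has_']
print(check_column("order_item_id", "bigint", key_kind="surrogate"))
# ['order_item_id: surrogate keys should end with _sk']
```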

Design Language

Once the names and their patterns have been settled, it’s time to think about everything in between: how should dashboards be designed for different personas? How should documentation norms differ across personas? What are the design rules for each persona when working with data?

Design language includes naming conventions, but it also includes layout, aggregation types, visualization types, documentation length and content, and more. What works for one persona is going to feel entirely out of place for another.

Here are a few examples of metric naming conventions:

  • Product: User Growth, Lifetime Value; these names are concise and packed with meaning.
  • Operations: Time to Package (from Order Placed, Min.), Time to Deliver (from Order Packaged, Min.); these names are verbose, specific, and descriptive.
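As a rough sketch, the two styles above could be encoded in a small helper so that labels stay consistent within each persona. The `metric_label` function and its parameters are hypothetical; only the example metric names come from the list above.

```python
from typing import Optional


def metric_label(name: str, persona: str,
                 start_event: Optional[str] = None,
                 unit: Optional[str] = None) -> str:
    """Format a metric name in the persona's design language."""
    if persona == "Operations":
        # Operations labels are verbose: they spell out the starting
        # event and the unit, so both are required here.
        if start_event is None or unit is None:
            raise ValueError("Operations metrics need a start event and a unit")
        return f"{name} (from {start_event}, {unit})"
    # Other personas, such as Product, keep the concise name.
    return name


print(metric_label("User Growth", "Product"))
# User Growth
print(metric_label("Time to Package", "Operations", "Order Placed", "Min."))
# Time to Package (from Order Placed, Min.)
```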

Access and Organization

Namespaces and design language have been reviewed and updated, and now we need to think about day-to-day usage and access. What does each persona need access to and what don’t they? Limiting access, be it through strict permissions or simply by organizing content to move it off of people’s homepages, can be a huge help. It lightens the cognitive load of using the platform by showing you things in your own design language first, and it helps minimize the risk of misusing or misinterpreting data.

Folder structures, schemas, and in some cases access grants, can all help improve the data workflow for your personas. 
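For the access-grant part, one possible approach is to keep a persona-to-schema mapping in code and generate the grants from it. The sketch below is an assumption-heavy illustration: the schema names, the `_reader` role suffix, and the `analytics` database are hypothetical, and the statements follow Snowflake-style `GRANT` syntax that should be verified against your own warehouse before use.

```python
from typing import Dict, List

# Hypothetical mapping: every persona sees the shared core schema
# plus the schema written in its own design language.
PERSONA_SCHEMAS: Dict[str, List[str]] = {
    "finance": ["core", "finance"],
    "operations": ["core", "operations"],
}


def grant_statements(persona_schemas: Dict[str, List[str]],
                     database: str = "analytics") -> List[str]:
    """Build read-only grant statements for each persona's role."""
    statements = []
    for persona, schemas in sorted(persona_schemas.items()):
        role = f"{persona}_reader"
        for schema in schemas:
            target = f"{database}.{schema}"
            statements.append(f"GRANT USAGE ON SCHEMA {target} TO ROLE {role};")
            statements.append(
                f"GRANT SELECT ON ALL TABLES IN SCHEMA {target} TO ROLE {role};"
            )
    return statements


for statement in grant_statements(PERSONA_SCHEMAS):
    print(statement)
```

Keeping the mapping in version control means access changes go through the same review process as the rest of the data model.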

Automation and Process

We’ve overhauled our organization, and now we need to maintain it. Defining processes and adding automation is often critical to maintaining your data systems. In this regard, there are many different techniques at your disposal, ranging from out-of-the-box tools

Here are a few common examples:

  1. Using MR/PR templates to add table- and column-level naming reviews to your code review process
  2. Using dashboards like those available in Looker to review unused content
  3. Using dbt to test your information schema for naming convention consistency
  4. Using various tools like Slido’s dbt-coverage tool to measure your documentation or testing coverage
  5. Using Atlan to create personalized workspaces and automatically maintain data systems with automated column descriptions, documentation, quality checks, and more.
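To give a feel for what a documentation coverage check measures, here is a minimal Python sketch over an in-memory catalog. It is not how dbt-coverage or any other tool listed above is implemented; the catalog structure, function names, and sample tables are hypothetical.

```python
from typing import Dict, List

# Hypothetical catalog: {table: {column: description}}.
CATALOG: Dict[str, Dict[str, str]] = {
    "orders": {
        "order_id": "Natural key of the order.",
        "ordered_at": "When the order was placed.",
        "is_delivered": "",
    },
    "customers": {
        "customer_id": "Natural key of the customer.",
    },
}


def undocumented_columns(catalog: Dict[str, Dict[str, str]]) -> List[str]:
    """List table.column names whose description is missing or blank."""
    return sorted(
        f"{table}.{column}"
        for table, columns in catalog.items()
        for column, description in columns.items()
        if not (description and description.strip())
    )


def documentation_coverage(catalog: Dict[str, Dict[str, str]]) -> float:
    """Share of columns with a non-blank description (0.0 if no columns)."""
    total = sum(len(columns) for columns in catalog.values())
    if total == 0:
        return 0.0
    return (total - len(undocumented_columns(catalog))) / total


print(documentation_coverage(CATALOG))  # 0.75
print(undocumented_columns(CATALOG))    # ['orders.is_delivered']
```

A check like this could run in CI and fail the build when coverage drops below an agreed threshold.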

Conclusion

Thanks to the steps above, the Poutine Shop’s data model is more attuned to its team. Marketing is able to use data effectively, Operations is able to use data effectively, and neither has to compromise!

Future data pipelines and models will be able to push this boundary even further, from personas to sub-personas and even personalized data model experiences. Diving head-first into individualized data models is a recipe for disaster, but in the future, and by leveraging metadata, we’ll be able to slowly push towards that level of personalization. Ultimately, we want our data models to be useful and adaptable. Today, personas are a great technique, but in the future they’ll be only one tool in our toolbelt.


Montreal Analytics

Montreal Analytics is a Modern Data Stack consulting firm of 45+ people based out of North America. We help our clients on the whole data journey: pipelines, warehousing, modeling, visualization and activation, using technologies like Fivetran, Snowflake, dbt, Sigma, Looker and Census. From strategic advisory to hands-on development and enablement, our agile team can deploy greenfield data platforms, tackle complex migrations and audit & refactor entangled data models.


Imagine… what does Netflix for data look like? Data teams are diverse. Analysts, engineers, scientists, and architects all have their own preferences. Why serve the same generic experience to all individual personas?

Learn more about Atlan’s powerful Personas and Purposes, an effortless way to personalize Atlan to every user persona, business domain, and data project in your organization.
