A couple of weeks ago, dbt Labs made a big splash at its annual conference by announcing the new dbt Semantic Layer. The news spawned excited tweets, in-depth thinkpieces, and celebration from partners like us.
The term “semantic layer” (also known as a “metrics layer”) has been around for decades. dbt didn’t invent the concept, nor the word, though their version is certainly worth paying attention to.
“But Austin, what is a semantic layer then?” So glad you asked.
In this article, I’ll break down what a semantic layer is in simple terms and why you should care about dbt’s Semantic Layer.
What is a semantic layer, and where did it come from?
Semantic layer is a very literal term – it’s the “layer” in a data architecture that uses “semantics” (words) that the business user will actually understand. Sometimes it’s called the “business layer” or the “metrics layer”.
Instead of exposing raw tables with column names like “A000_CUST_ID_PROD”, data teams build a semantic layer and rename that column “Customer”. Semantic layers also hide the underlying code from business users. That code can get quite complex as data teams try to capture the business logic for key metrics, dimensions, and schemas.
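To make the renaming idea concrete, here is a minimal Python sketch of a semantic layer as a simple lookup from business-friendly names to raw column names. Only the “Customer” to “A000_CUST_ID_PROD” pairing comes from the example above. The second column, the `raw_orders` table, and the `build_select` helper are all made up for illustration, and this is not how any particular BI tool implements it.

```python
# Hypothetical mapping from business-friendly names to raw warehouse columns.
SEMANTIC_NAMES = {
    "Customer": "A000_CUST_ID_PROD",
    "Order Total": "A000_ORD_AMT_USD",  # made-up column, for illustration only
}


def build_select(table, fields):
    """Translate business-friendly field names into a SELECT on the raw table."""
    unknown = [f for f in fields if f not in SEMANTIC_NAMES]
    if unknown:
        raise KeyError(f"No semantic definition for: {unknown}")
    # Alias each raw column with the name the business user asked for.
    columns = ", ".join(f'{SEMANTIC_NAMES[f]} AS "{f}"' for f in fields)
    return f"SELECT {columns} FROM {table}"


print(build_select("raw_orders", ["Customer"]))
```

The business user only ever types “Customer”. The cryptic column name lives in one place, which the data team maintains.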
So where did this idea come from? Back in the day (I’m talking about the ’90s and early 2000s), we had pretty basic data tech. It was very slow and very hard to use if you didn’t have a deep IT background.
Big companies like IBM, SAP, and Oracle built Business Intelligence (BI) tools like Cognos, Business Objects, and Oracle BI, which would take smaller chunks of data from a clunky data warehouse and let IT people build these semantic layers for business users. Essentially, they were more human-readable data layers for business users.
The challenge with early semantic layers
This business-friendly layer sounds like a “nice to have” improvement, but it was really a necessity because trying to run even a basic report across an entire data warehouse could take hours or even days. (Yes, days.)
Enter the first problem: old-school semantic layers took wayyyyy too long to build, since people depended on IT to set up and modify them. To make matters worse, they were cumbersome to maintain since business needs were always changing.
The business users’ solution… export to Excel!
Enter fancy new BI tools like Tableau, Qlik, and Power BI. The theory was that if we empower the business users to “self-serve” with low-code or no-code BI tools, the IT bottleneck will go away and analytics will officially be democratized! At least, that was the idea.
Enter the second problem: we abandoned the semantic layer concept for years, in favor of agility.
Unlike the old IT tools, these new BI tools could be bought and used by many more kinds of people. Instead of 1 BI tool using 1 semantic layer, built by 1 team from 1 data warehouse, we had multiple BI tools being used by all kinds of teams, with no real semantic layer.
Just picture this scenario, which probably seems all too real to most data people. I bring my Tableau dashboard to a meeting, someone else brings their Excel workbook, and someone else brings a Power BI dashboard. We all then show different numbers for “total revenue last quarter”. Uh oh!
After years of the data world alternately ignoring and chasing the self-service BI dream, the semantic layer blew up as a topic again. (We even flagged it as one of the six big ideas from 2022 in our Future of the Modern Data Stack report.)
This started in January, when Base Case proposed “Headless Business Intelligence”, a new approach to solving problems with business metrics and terms. A couple months later, Benn Stancil talked about the “missing metrics layer” in today’s data stack.
That’s when things really took off. Airbnb announced that it had been building a home-grown metrics platform called Minerva to solve this issue. Other prominent tech companies soon followed suit, including LinkedIn, Uber, and Spotify. Then dbt opened a PR hinting at a metrics or semantics layer, which included links to those foundational blogs by Benn and Base Case.
The result has been a big open question in the data and analytics world: how can we bring back all the great things that IT loved about semantic layers (consistency, clear governance, and trusted, reliable data) without compromising the agility that analysts and business users demand?
Now, less than two years after this debate kicked off, it seems that the future of the semantic layer has finally become a reality.
The dbt Semantic Layer
Enter dbt Labs and its new Semantic Layer!
“The dbt Semantic Layer is the interface between your data and your analyses: A platform for compiling and accessing dbt assets in downstream tools. Data practitioners can define metrics in their dbt projects, then data consumers can query consistently defined metrics in downstream tools.”
Cameron Afzal, Product Manager for the dbt Semantic Layer
The core concept behind dbt’s Semantic Layer is: define things once, use them anywhere.
Why does that make people happy? It brings the concept of a semantic layer, and its universal metrics, into dbt’s transformation layer.
Data teams can build their models and metrics in dbt, then manage them with the developer tooling they already use, like version control and release management.
Regardless of what BI tool they use, analysts and business users can then grab data and go into that meeting, confident that their answer will match everyone else’s because they all pulled the metric from one centralized place.
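Here is a rough Python sketch of the “define things once, use them anywhere” idea. To be clear, this is not dbt’s syntax or API. The `Metric` class, the `orders` table, its columns, and the `compile_metric` helper are all hypothetical. The point is just that the business logic (what counts as revenue) lives in one definition, and every consumer gets its query generated from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A single, central definition of a business metric (hypothetical shape)."""
    name: str
    table: str
    expression: str   # SQL aggregate that computes the metric
    filter: str = ""  # optional business rule applied to every query


# Defined once, by the data team.
METRICS = {
    "total_revenue": Metric(
        name="total_revenue",
        table="orders",
        expression="SUM(amount)",
        filter="status = 'completed'",
    ),
}


def compile_metric(name, group_by=None):
    """Build the SQL for a metric, optionally sliced by one dimension."""
    metric = METRICS[name]
    select = f"{metric.expression} AS {metric.name}"
    if group_by:
        select = f"{group_by}, {select}"
    sql = f"SELECT {select} FROM {metric.table}"
    if metric.filter:
        sql += f" WHERE {metric.filter}"
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# Used anywhere: a dashboard and a spreadsheet export share the same logic.
print(compile_metric("total_revenue"))
print(compile_metric("total_revenue", group_by="region"))
```

Both queries carry the same filter, so nobody can accidentally count cancelled orders in one tool and exclude them in another.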
dbt + Atlan
The dbt Semantic Layer is great in its own right, but what makes it even more exciting is how it ties in with key tools across the modern data stack… and we’re one of them!
Alongside the dbt keynote, we announced our partnership with dbt Labs and our integration with the Semantic Layer. With this, joint customers will have access to an end-to-end governance framework for data models and metrics in the modern data stack.
The dbt Semantic Layer created a standard way to define metrics across your transformations and models. Now our integration brings these rich metrics into the rest of the data stack.
With this integration, dbt metrics and models are first-class assets in Atlan. This means that they are searchable and discoverable through our platform and part of auto-generated, column-level lineage, just like any Snowflake table, Fivetran pipeline, or Looker dashboard.
Our native dbt Cloud integration ingests all dbt metrics and metadata about dbt models, and merges them with metadata from all the other tools in the data stack. It then creates column-level lineage from source to BI and sends that unified context back into tools like Snowflake and the BI tools where people work daily.
With powerful impact and root cause analysis, modern data teams finally have the tools they need for end-to-end data governance and change management at every stage of the data lifecycle.
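If you’re wondering what impact analysis on lineage looks like in principle, here is a small Python sketch. It is not Atlan’s API or data model. The asset names and the `LINEAGE` dictionary are invented for illustration. It just shows the general technique: store which assets feed which, then walk downstream from the thing you’re about to change.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets directly downstream.
LINEAGE = {
    "snowflake.raw_orders.amount": ["dbt.model.orders.amount"],
    "dbt.model.orders.amount": ["dbt.metric.total_revenue"],
    "dbt.metric.total_revenue": ["looker.dashboard.revenue_overview"],
}


def downstream_impact(asset):
    """Breadth-first walk returning every asset affected by a change to `asset`."""
    seen = set()
    affected = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in LINEAGE.get(current, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


for name in downstream_impact("snowflake.raw_orders.amount"):
    print(name)
```

Root cause analysis is the same walk in the other direction: start from the broken dashboard and follow the edges upstream.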