How to do automated testing for Power BI semantic models with Azure DevOps
People coming from software engineering or data engineering know that automated testing in a CI/CD pipeline has been common practice for years. So I was wondering why there wasn't a single example of it out there for Power BI (at least for my use case). In this post, I'll show how you can do it for Power BI as well.
In my last post, “How to enforce Power BI Best Practices with Azure DevOps”, I already implemented a CI/CD pipeline that enforces Power BI best practices with the Tabular Editor 2 Best Practice Analyzer. I also included a Python check so that my developers aren't allowed to create Power BI tables from custom SQL queries; they may only import database views or tables.
Now I will build on top of that.
Table of contents:
· 1. The Problem
· 2. The Idea
· 3. The Script
· 4. Demo
· 5. Conclusion
1. The Problem
- Development of a PBI semantic model locally
- Deploying to staging for manual testing
- Shipping to Prod with Tabular Editor or ALM Toolkit
This is a development process often seen for self-service models. The only problem I have with it is the manual testing. Let's assume you want to change a measure or something else on the model. How can you make sure that your measures still return the expected values? Azure DevOps pipelines are a good fit for that.
2. The Idea
The idea is pretty simple. Let's say we already have one or more reports in our prod environment. Since the visuals in these reports should still work after making changes to the model, we take their DAX queries as tests.
So what I do next is the following:
- Run the DAX query on the staging semantic model
- Run the DAX query on the prod semantic model
- Compare the results
3. The Script
First, create a new project in Visual Studio Code. (I'm using my setup from my last article.)
Then load your prod version of the Power BI semantic model in the “.pbip” format.
Then create a folder “tests” with a subfolder “dax”. In that folder, put your DAX queries as text files.
Even in a self-service setup, I see teams publishing reports via the Power BI app. So just take those queries as your tests. I would also suggest creating some big DAX aggregation tables, so that most of your measures/KPIs are covered.
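As an illustration of what such a test file could contain, here is a hypothetical example that writes one query into tests/dax/ (the names 'Date'[Year] and [Total Sales] are placeholders for whatever your own model defines):

```python
import os

# Hypothetical example query; replace the table, column, and measure
# names with ones that exist in your own semantic model.
SAMPLE_QUERY = """
EVALUATE
SUMMARIZECOLUMNS(
    'Date'[Year],
    "Total Sales", [Total Sales]
)
"""

# Create the test folder structure and save the query as a text file
os.makedirs("tests/dax", exist_ok=True)
with open("tests/dax/total_sales_by_year.txt", "w") as f:
    f.write(SAMPLE_QUERY)
```

One file per visual or aggregation table keeps each failing test easy to trace back to a specific report element.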
Let’s go through the script:
import pandas as pd
import msal
import json
import requests
import sys
import os
First, we import the necessary libraries.
client_id = "your data here"
tenant_name = "your data here"
client_secret = "your data here"
authority_url = "https://login.microsoftonline.com/" + tenant_name
scope = ["https://analysis.windows.net/powerbi/api/.default"]
app = msal.ConfidentialClientApplication(client_id, authority=authority_url, client_credential=client_secret)
result = app.acquire_token_for_client(scopes=scope)
access_token = result['access_token']
header = {'Content-Type':'application/json', 'Authorization':f'Bearer {access_token}'}
Since I'm using the Power BI REST API, this code acquires the access token used to authenticate all requests.
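One caveat: MSAL does not raise an exception when token acquisition fails; acquire_token_for_client returns a dict with error details instead, so result['access_token'] can fail with a confusing KeyError. A small guard like this (a sketch; extract_token is my own name, not part of MSAL) makes failures explicit:

```python
def extract_token(result: dict) -> str:
    """Return the access token from an MSAL result dict, or raise a
    readable error using the details MSAL provides on failure."""
    if "access_token" in result:
        return result["access_token"]
    raise RuntimeError(
        f"Token acquisition failed: {result.get('error')} - "
        f"{result.get('error_description')}"
    )

# Example with a fake successful result (no real MSAL call here):
token = extract_token({"access_token": "abc123"})
```

In the pipeline this turns a silent KeyError into a log line that tells you immediately whether the client secret expired or the app registration is misconfigured.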
prod_workspace_id = "your data here"
prod_model_id = "your data here"
dev_workspace_id = "your data here"
dev_model_id = "your data here"
To run DAX queries on the semantic models, I need the workspace and model IDs.
# Path to the directory containing the text files
directory = 'tests/dax/'

# Iterate through each file in the directory
for filename in os.listdir(directory):
    # Check if the file is a text file
    if filename.endswith('.txt'):
        # Open the file and read its contents
        with open(os.path.join(directory, filename), 'r') as file:
            query = file.read()
        run_checks(query, filename.replace(".txt", ""), prod_model_id, dev_model_id)
This code block loops through each text file you put into the tests/dax folder and calls the function “run_checks” for each query.
def run_query(query, model_id):
    """Execute a DAX query against a dataset and return the result
    as a DataFrame rounded to two decimal places."""
    body = {
        "queries": [{"query": query}],
        "serializerSettings": {"includeNulls": "true"}
    }
    # Endpoint for executing DAX queries against a dataset
    dax_query_url = "https://api.powerbi.com/v1.0/myorg/datasets/" + model_id + "/executeQueries"
    # Send the DAX query request and fail fast on HTTP errors
    response = requests.post(dax_query_url, headers=header, data=json.dumps(body))
    response.raise_for_status()
    # Extract the rows from the JSON response
    rows = response.json()['results'][0]['tables'][0]['rows']
    # Create a DataFrame and round all values to two decimal places
    return pd.DataFrame(rows).round(2)

def run_checks(query, test_name, prod_model_id, dev_model_id):
    # Run the same query against both semantic models
    df_dev = run_query(query, dev_model_id)
    df_prod = run_query(query, prod_model_id)
    # Perform comparisons
    shape_equal = df_prod.shape == df_dev.shape
    columns_equal = df_prod.columns.equals(df_dev.columns)
    values_equal = df_prod.equals(df_dev)
    index_equal = df_prod.index.equals(df_dev.index)
    nan_equal = df_prod.isna().equals(df_dev.isna())
    # Check if all comparisons are True
    if shape_equal and columns_equal and values_equal and index_equal and nan_equal:
        print(f"Test: {test_name} success")
    else:
        print(f"Test: {test_name} failed")
        print(f"##vso[task.logissue type=error;]Test: {test_name} error")
        sys.exit(1)
This code block checks whether the two semantic models return the same values for a query. If they don't, the test fails and the pipeline returns an error.
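Rounding to two decimal places works, but it can still flag harmless floating-point noise that lands right on a rounding boundary. An alternative sketch (frames_match is my own helper, not part of the script above) compares the two result sets with a numeric tolerance using pandas' built-in testing helper:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

def frames_match(df_prod, df_dev, atol=0.01):
    """True if both result sets agree within an absolute tolerance,
    without mutating the frames by rounding."""
    try:
        assert_frame_equal(df_prod, df_dev, check_exact=False, atol=atol)
        return True
    except AssertionError:
        return False

# Values differing by less than the tolerance still count as equal
a = pd.DataFrame({"Total Sales": [100.004, 200.0]})
b = pd.DataFrame({"Total Sales": [100.001, 200.0]})
print(frames_match(a, b))  # True
```

A nice side effect: assert_frame_equal also catches shape, column, index, and dtype mismatches in one call, so the five separate comparisons above collapse into a single check.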
4. Demo
Now let me walk you through one example. This is what my Power BI semantic model looks like:
First, I want to showcase what happens when I add a new table by entering data manually.
I save the file and upload it to the staging workspace.
Now I go into visual studio code and create a feature branch.
Commit the changes and go into Azure DevOps. There I can create a pull request.
Now we look at the pipeline run.
As expected, the run failed, because in my last article I added a rule that forbids manually created tables in my Power BI semantic model.
Let's remove that table and try again.
This time it worked! Granted, I didn't actually change anything on the semantic model, but the concept should be clear.
What’s left in Azure DevOps is to approve the pull request and complete the process.
Sometimes you still have to update the model or report in the prod workspace. This works similarly to using Tabular Editor 2 or ALM Toolkit.
5. Conclusion
This is what the new process looks like:
- Development of a PBI semantic model locally
- Deploying to staging
- Create changes as a feature branch
- Create a pull request, so that the pipeline starts
- Get approval for pull request
- Sync main branch with prod workspace
Why I think CI/CD is important even for Power BI semantic models:
- Once you set up your tests and pipelines, your team can develop much faster than before.
- Your team has to write its best practices down explicitly (as Tabular Editor 2 BPA rules).
- Gain enterprise maturity. As I mentioned at the beginning, CI/CD and automated testing have been best practices for years in every other IT-related field.
- Easier onboarding for new colleagues, since they can start contributing in their first week. If they break something or violate a best-practice rule, they get immediate feedback.
What is your opinion on automated testing with Azure DevOps? How can I improve my example?