Fast Cube Processing Solution: “Process Add” via MDX, XMLA, and T-SQL
Problem
In the enterprise companies, we have to design pipelines that are used to gather data from different sources and then we transfer them into our data warehouse. We use data warehouses for creating cubes. In case the volume of data reaches billions and we need to have a daily process of the entire cubes as ‘Full Process’, the read time from the data warehouse may occur bottleneck. So, we propose a solution based on the “Process Add” option in SQL Server Analysis Server multidimensional mode. This solution is implemented based on MDX, T-SQL, and XMLA languages.
Solution
For example, we need to find the latest day when it has value in measure groups. It shows the latest day that our cube had processed. So, we write an MDX query on our cube to return Latest Day that Adventure Works cube has the “Internet Sale Amount” measure.
WITH MEMBER [Measures].[LatestDay] AS
TAIL(
FILTER(
[Date].[Date].[Date].MEMBERS,
NOT ISEMPTY([Measures].[Internet Sales Amount])
)
).ITEM(0).NAME
SELECT [Measures].[LatestDay] ON COLUMNS
FROM [Adventure Works]
WITH MEMBER: It creates a member in the measure group that contains the latest day that exists on the cube.
TAIL: This function picks up the last item of our filter. The filter finds the members of the date dimension having the sale amount.
We can execute this MDX script through an open query on an SSAS linked server and get the result as a variable. Two things are important when you try to create a linked server of SSAS on your SQL instance. Firstly you should use a provider as “Microsoft OLE DB Provider for Analysis Services” and specify “Data Source”. Moreover security info must be defined. We also must define the database name as ‘Catalog’ for working with Open Query.
Secondly, we must enable “RPC” and “RPC Out” in the server option for execution in the T-SQL script on the linked server and get the result.
In the next step, we could run below script with T-SQL and get the latest day.
DECLARE @ResultTable TABLE(Result NVARCHAR(MAX))
INSERT INTO @ResultTable
SELECT *
FROM OPENQUERY(MULTIDIMENSIONAL, '
WITH MEMBER [Measures].[LatestDay] AS
TAIL(
FILTER(
[Date].[Date].[Date].MEMBERS,
NOT ISEMPTY([Measures].[Internet Sales Amount])
)
).ITEM(0).NAME
SELECT [Measures].[LatestDay] ON COLUMNS
FROM [Adventure Works]
')-- January 28, 2014
Now we convert the result to an integer value that helps us to compare it with our data source key.
DECLARE @LatestDay INTSET @LatestDay = (SELECT Result FROM @ResultTable)SELECT @LatestDay = CONCAT(YEAR(@LatestDay),
MONTH(@LatestDay),
DAY(@LatestDay))-- January 28, 2014 will be converted to 2014128
Now, we could use the latest day on WHERE clause as a parameter with running that via XMLA script on Analysis Server. This helps us to write dynamically in one script. In order to generate the script, we could go to the SSAS instance -> Database -> Cube and then right-click and choose Process. subsequently, in the process option column select Process Add and “configure” will appear on the setting column.
Firstly, you have to choose the measure group and partition that you want to add. Secondly, you should write your query like the screenshot, and then after pressing OK, you need to click on “Script” on process cube wizard to generate the XMLA script on SSMS in order to edit the script for converting it to dynamic mode. In the generated script, find the <QueryDefinition> tag as below:
<QueryDefinition>SELECT *
FROM FactSale
WHERE DateKey ></QueryDefinition>
Then change the whole script like this. Put the defined parameter ‘@LetestDay’ from MDX query on it.
DECLARE @XMLA NVARCHAR(MAX) =
'...
<QueryDefinition>
SELECT *
FROM FactSale
WHERE DateKey >'
+ @LatestDay +
'</QueryDefinition>
...'
Then, the execution of XMLA script happens via T-SQL on the Analysis Server.
EXECUTE (@XMLA) AT MULTIDIMENSIONAL
Conclusion
In this article, we write some query and script to show how it is possible that you can add up your data ASAP on the cube via “Process Add”. You may notice that the process adds acts like insert, not update. So, if your data change through time, we recommend you make a partition as temporary for adding data fast, and then after a while do “process full” on your main partition to obtain all the updated records after fast data adding operation. Finally, by customizing the dimension key and planning a repeated job on SQL Server, you could achieve nearly real-time processing on your cubes!
Script on GitHub is available here.