Analysis Services Team Blog

Using Azure Analysis Services on Top of Azure Data Lake Store


The latest release of SSDT Tabular adds support for Azure Data Lake Store (ADLS) to the modern Get Data experience (see the following screenshot). Now you can augment your big data analytics workloads in Azure Data Lake with Azure Analysis Services and provide rich interactive analysis for selected data subsets at the speed of thought!

If you are unfamiliar with Azure Data Lake, check out the various articles at the Azure Data Lake product information site. Also read the article “Get started with Azure Data Lake Analytics using Azure portal.”

Following these instructions, I provisioned a Data Lake Analytics account called tpcds for this article and a new Data Lake Store called tpcdsadls. I also added one of my existing Azure Blob Storage accounts, which contains the 1 TB TPC-DS data set I created and used in the series “Building an Azure Analysis Services Model on Top of Azure Blob Storage.” The idea is to move this data set into Azure Data Lake as a highly scalable and sophisticated analytics backend, from which to serve a variety of Azure Analysis Services models.

For starters, Azure Data Lake can process raw data and put it into targeted output files so that Azure Analysis Services can import the data with less overhead. For example, you can remove any unnecessary columns at the source, which eliminates about 60 GB of data from my 1 TB TPC-DS data set and therefore benefits processing performance, as discussed in “Building an Azure Analysis Services Model on Top of Azure Blob Storage–Part 3.”

Moreover, with relatively little effort and a few small changes to a U-SQL script, you can provide multiple targeted data sets to your users, such as a small data set for modelling purposes plus one or more production data sets with the most relevant data. In this way, a data modeler can work efficiently in SSDT Tabular against the small data set prior to deployment, and after production deployment, business users can get the relevant information they need from your Azure Analysis Services models in Microsoft Power BI, Microsoft Office Excel, and Microsoft SQL Server Reporting Services. And if a data scientist still needs more than what’s readily available in your models, you can use Azure Data Lake Analytics (ADLA) to run further U-SQL batch jobs directly against all the terabytes or petabytes of source data you may have. Of course, you can also take advantage of Azure HDInsight as a highly reliable, distributed and parallel programming framework for analyzing big data. The following diagram illustrates a possible combination of technologies on top of Azure Data Lake Store.

Azure Data Lake Analytics (ADLA) can process massive volumes of data extremely quickly. Take a look at the following screenshot, which shows a Data Lake job processing approximately 2.8 billion rows of TPC-DS store sales data (~500 GB) in under 7 minutes!

The screen in the background uses source files in Azure Data Lake Store and the screen in the foreground uses source files in Azure Blob Storage connected to Azure Data Lake. The performance is comparable, so I decided to leave my 1 TB TPC-DS data set in Azure Blob Storage, but if you want to ensure absolute best performance or would like to consolidate your data in one storage location, consider moving all your raw data files into ADLS. It’s straightforward to copy data from Azure Blob Storage to ADLS by using the AdlCopy tool, for example.
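For example, copying a blob container into a Data Lake Store folder takes a single AdlCopy command of roughly the following shape. The account names mirror this article's setup, and the key is a placeholder:

AdlCopy /Source https://aasuseast2.blob.core.windows.net/income-band/ /Dest swebhdfs://tpcdsadls.azuredatalakestore.net/income-band/ /SourceKey <storage account key>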

With the raw source data in a Data Lake-accessible location, the next step is to define the U-SQL scripts to extract the relevant information and write it along with column names to a series of output files. The following listing shows a general U-SQL pattern that can be used for processing the raw TPC-DS data and putting it into comma-separated values (csv) files with a header row.

// Extract raw rows from all matching source files. The {child_id} token in the
// file-name pattern becomes a virtual column, and the trailing "empty" column
// absorbs the delimiter that ends each row in the TPC-DS .dat files.
@raw_parsed = EXTRACT child_id int,
                <list of all table columns>,
                empty string
FROM "<URI to Blob Container>/{*}_{child_id}_100.dat"
USING Extractors.Text(delimiter: '|');

// Keep only the relevant columns (and optionally restrict the rows).
@filtered_results = SELECT <list of relevant table columns>
FROM @raw_parsed
<WHERE clause to extract specific rows for output>;

// Write the result as a csv file with a header row.
OUTPUT @filtered_results
TO "/<output folder>/<filename>.csv"
USING Outputters.Csv(outputHeader:true);

The next listing shows a concrete example based on the small income_band table. Note how the query extracts a portion of the file name into a virtual child_id column in addition to the actual columns from the source files. This child_id column comes in handy later when generating multiple output csv files for the large TPC-DS tables. Also, the ORDER BY … FETCH clause is not strictly needed in this example because the income_band table only has 20 rows, but it’s included to illustrate how to restrict the amount of data per table to a maximum of 100 rows to create a small modelling data set.

@raw_parsed = EXTRACT child_id int,
                      b_income_band_sk string,
                      b_lower_bound string,
                      b_upper_bound string,
                      empty string
FROM "wasb://income-band@aasuseast2/{*}_{child_id}_100.dat"
USING Extractors.Text(delimiter: '|');

@filtered_results = SELECT b_income_band_sk,
                           b_lower_bound,
                           b_upper_bound
FROM @raw_parsed
ORDER BY child_id ASC
FETCH 100 ROWS;

You can find complete sets of U-SQL scripts to generate output files for different scenarios (modelling, single csv file per table, multiple csv files for large tables, and large tables filtered by last available year) at the GitHub repository for Analysis Services.

For instance, for generating the modelling data set, there are 25 U-SQL scripts to generate a separate csv file for each TPC-DS table. You can run each U-SQL script manually in the Microsoft Azure portal, yet it is more convenient to use a small Microsoft PowerShell script for this purpose. Of course, you can also use Azure Data Factory, which among other things enables you to run U-SQL scripts on a scheduled basis. For this article, however, the following Microsoft PowerShell script suffices.

# Submit every U-SQL script in the folder as an ADLA job and wait for each to finish.
$script_folder = "<Path to U-SQL Scripts>"
$adla_account = "<ADLA Account Name>"
Login-AzureRmAccount -SubscriptionName "<Azure Subscription Name>"

Get-ChildItem $script_folder -Filter *.usql |
Foreach-Object {
    # Submit the job, then block until it completes before starting the next one.
    $job = Submit-AdlJob -Name $_.Name -AccountName $adla_account -ScriptPath $_.FullName -DegreeOfParallelism 100
    Wait-AdlJob -Account $adla_account -JobId $job.JobId
}

Write-Host "Finished processing U-SQL jobs!"

It does not take long for Azure Data Lake to process the requests. You can use the Data Explorer feature in the Azure Portal to double-check that the desired csv files have been generated successfully, as the following screenshot illustrates.

With the modelling data set in place, you can finally switch over to SSDT and create a new Analysis Services Tabular model at the 1400 compatibility level. Make sure you have the latest version of the Microsoft Analysis Services Projects package installed so that you can pick Azure Data Lake Store from the list of available connectors. You will be prompted for the Azure Data Lake Store URL and you must sign in using an organizational account. Currently, the Azure Data Lake Store connector only supports interactive logons, which is an issue for processing the model in an automated way in Azure Analysis Services, as discussed later in this article. For now, let’s focus on the modelling aspects.

The Azure Data Lake Store connector does not automatically establish an association between the folders or files in the store and the tables in the Tabular model. In other words, you must create each table individually and select the corresponding csv file in Query Editor. This is a minor inconvenience. It also implies that each table expression specifies the folder path to the desired csv file individually. If you are using a small data set from a modelling folder to create the Tabular model, you would need to modify every table expression during production deployment to point to the desired production data set in another folder. Fortunately, there is a way to centralize the folder navigation by using a shared expression so that only a single expression requires an update on production deployment. The following diagram depicts this design.

To implement this design in a Tabular model, use the following steps:

  1. Start Visual Studio and check under Tools -> Extensions and Updates that you have the latest version of Microsoft Analysis Services Projects installed.
  2. Create a new Tabular project at the 1400 compatibility level.
  3. Open the Model menu and click on Import From Data Source.
  4. Pick the Azure Data Lake Store connector, provide the storage account URL, and sign in by using an Organizational Account. Click Connect and then OK to create the data source object in the Tabular model.
  5. Because you chose Import From Data Source, SSDT displays Query Editor automatically. In the Content column, click on the Table link next to the desired folder name (such as modelling) to navigate to the desired root folder where the csv files reside.
  6. Right-click the Table object in the right Queries pane, and click Create Function. In the No Parameters Found dialog box, click Create.
  7. In the Create Function dialog box, type GetCsvFileList, and then click OK.
  8. Make sure the GetCsvFileList function is selected, and then on the View menu, click Advanced Editor.
  9. In the Edit Function dialog box informing you that updates from the Table object will no longer propagate to the GetCsvFileList function if you continue, click OK.
  10. In Advanced Editor, note how the GetCsvFileList function navigates to the modelling folder, enter a whitespace character at the end of the last line to modify the expression, and then click Done.
  11. In the right Queries pane, select the Table object, and then in the left Applied Steps pane, delete the Navigation step, so that Source is the only remaining step.
  12. Make sure the Formula Bar is displayed (View menu -> Formula Bar), and then redefine the Source step as = GetCsvFileList() and press Enter. Verify that the list of csv files is displayed in Query Editor, as in the following screenshot.
  13. For each table you want to import:
    1. Right-click the existing Table object and click Duplicate.
    2. In the Content column, click on the Binary link next to the desired file name (such as call_center) and verify that Query Editor parses the columns and detects the data types correctly.
    3. Rename the table according to the csv file you selected (such as call_center).
    4. Right-click the renamed table object (such as call_center) in the Queries pane and click Create New Table.
    5. Verify that the renamed table object (such as call_center) is no longer displayed in italic, which indicates that the query will now be imported as a table into the Tabular model.
  14. After you have created all the desired tables by using the sequence above, delete the original Table object by right-clicking it and selecting Delete.
  15. In Query Editor, click Import to add the GetCsvFileList expression and the tables to your Tabular model.

During the import, SSDT Tabular pulls in the small modelling data set. Prior to production deployment, it is then a simple matter of updating the shared expression: right-click the Expressions node in Tabular Model Explorer, select Edit Expressions, and change the folder name in Advanced Editor. The screenshot below highlights the folder name in the GetCsvFileList expression. As long as each table can find its corresponding csv file in the new folder location, deployment and processing can succeed.
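To make this concrete, here is a rough M sketch of the shared expression and a table expression built on it. The account URL and folder name follow the example in this article, but the exact navigation steps and csv options depend on what Query Editor generates in your model.

// GetCsvFileList: a parameterless function returning the csv files in the modelling folder.
() =>
let
    Source = DataLake.Contents("adl://tpcdsadls.azuredatalakestore.net"),
    ModellingFolder = Source{[Name = "modelling"]}[Content]
in
    ModellingFolder

// A table expression such as call_center then starts from the shared function:
let
    Source = GetCsvFileList(),
    CallCenterCsv = Source{[Name = "call_center.csv"]}[Content],
    ImportedCsv = Csv.Document(CallCenterCsv, [Delimiter = ",", QuoteStyle = QuoteStyle.Csv]),
    PromotedHeaders = Table.PromoteHeaders(ImportedCsv)
in
    PromotedHeaders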

Another option is to deploy the model with the Do Not Process deployment option and use a small TOM application in Azure Functions to process the model on a scheduled basis. Of course, you can also use SSMS to connect to your Azure Analysis Services server and send a processing command, but it might be inconvenient to keep SSDT or SSMS connected for the duration of the processing cycle. Processing against the full 1 TB data set with a single csv file per table took about 15 hours to complete. Processing with four csv files/partitions for the seven large tables and maxActiveConnections on the data source set to 46 concurrent connections took roughly 6 hours. This is remarkably faster in comparison to using general BLOB storage, as in the Building an Azure Analysis Services Model on Top of Azure Blob Storage article, and suggests that there is potential for performance improvements in the Azure BLOB storage connector.
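The core of such a TOM application can be quite small. Here is a minimal sketch, assuming placeholder server and database names and that valid credentials are supplied with the connection:

using Microsoft.AnalysisServices.Tabular;

class ProcessModel
{
    static void Main()
    {
        var server = new Server();

        // Placeholder connection; in Azure Functions, the credentials would
        // typically come from app settings rather than an interactive prompt.
        server.Connect("asazure://westus.asazure.windows.net/myserver");

        Database db = server.Databases.FindByName("TpcDsModel");
        db.Model.RequestRefresh(RefreshType.Full); // mark all tables for a full refresh
        db.Model.SaveChanges();                    // executes the refresh on the server

        server.Disconnect();
    }
}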

Even the processing performance against Azure Data Lake could possibly be further increased, as the processor utilization on an S9 Azure Analysis Server suggests (see the following screenshot). For the first 30 minutes, processor utilization is close to the maximum and then it decreases as the AS engine finishes more and more partitions and tables. Perhaps with an even higher degree of parallelism, such as with eight or twelve partitions for each large table, Azure AS could keep processor utilization near the maximum for longer and finish the processing work sooner. But processing optimizations through elaborate table partitioning schemes is beyond the scope of this article. The processing performance achieved with four partitions on each large table suffices to conclude that Azure Data Lake is a very suitable big-data backend for Azure Analysis Services.

There is currently only one important caveat: The Azure Data Lake Store connector only supports interactive logons. When you define the Azure Data Lake Store data source, SSDT prompts you to log on to Azure Data Lake. The connector performs the logon and then stores the obtained authentication token in the model. However, this token only has a limited lifetime. Chances are that processing succeeds right after the initial deployment, but when you come back the next day and want to process again, you get an error stating “The credentials provided for the DataLake source are invalid.” See the screenshot below. Either deploy the model again in SSDT, or right-click the data source in SSMS and select Refresh Credentials to log on to Data Lake again and submit fresh tokens to the model.

A subsequent article is going to cover how to handle authentication tokens programmatically, so stay tuned for more on connecting to Azure Data Lake and other big data sources on the Analysis Services team blog. And as always, please deploy the latest monthly release of SSDT Tabular and send us your feedback and suggestions by using SSASPrev at Microsoft.com or any other available communication channels such as UserVoice or MSDN forums.


Using Legacy Data Sources in Tabular 1400


The modern Get Data experience in Tabular 1400 brings interesting new data discovery and transformation capabilities to Analysis Services. However, not every BI professional is equally excited. Especially those who prefer to build their models exclusively on top of SQL Server databases or data warehouses and appreciate the steadiness of tried and proven T-SQL queries over fast SQL OLE DB provider connections might not see a need for mashups. If you belong to this group of BI professionals, there is good news: Tabular 1400 fully supports provider data sources and native query partitions. The modern Get Data experience is optional.

Upgrading from 1200 to 1400

Perhaps the easiest way to create a Tabular 1400 model with provider data sources and native query partitions is to upgrade an existing 1200 model to the 1400 compatibility level. If you use Windiff or a similar tool to compare the Model.bim file in your Tabular project before and after the upgrade, you will find that not much has changed. In fact, the only change concerns the compatibilityLevel property, which the upgrade logic sets to a value of 1400, as the following screenshot reveals.

At the 1400 compatibility level, regardless of the data sources and table partitions, you can use any advanced modeling feature, such as detail rows expressions and object-level security. There are no dependencies on structured data sources or M partitions using the Mashup engine. Legacy provider data sources and native query partitions work just as well. They bypass the Mashup engine. It’s just two different code paths to get the data.

Provider data sources versus structured data sources

Provider data sources get their name from the fact that they define the parameters for a data provider in the form of a connection string that the Analysis Services engine then uses to connect to the data source. They are sometimes referred to as legacy data sources because they are typically used in 1200 and earlier compatibility levels to define the data source details.

Structured data sources, on the other hand, get their name from the fact that they define the connection details in structured JSON property bags. They are sometimes referred to as modern or Power Query/M-based data sources because they correspond to Power Query/M-based data access functions, as explained in more detail in Supporting Advanced Data Access Scenarios in Tabular 1400 Models.

At first glance, provider data sources have an advantage over structured data sources because they provide full control over the connection string. You can specify any advanced parameter that the provider supports. In contrast, structured data sources only support the address parameters and options that their corresponding data access functions support, which is usually sufficient. Note that provider data sources also have disadvantages, as explained in the next section.

A small sample application can help to illustrate the metadata differences between provider data sources and structured data sources. Both can be added to a Tabular 1400 model using Tabular Object Model (TOM) or the Tabular Model Scripting Language (TMSL).
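For instance, the two flavors might look as follows in the JSON-based metadata (the connection details are illustrative, and credential settings are omitted):

{
  "type": "provider",
  "name": "Legacy SQL Server",
  "connectionString": "Provider=SQLNCLI11;Data Source=localhost;Initial Catalog=AdventureWorksDW;Integrated Security=SSPI",
  "impersonationMode": "impersonateServiceAccount"
}

{
  "type": "structured",
  "name": "SQL/localhost;AdventureWorksDW",
  "connectionDetails": {
    "protocol": "tds",
    "address": {
      "server": "localhost",
      "database": "AdventureWorksDW"
    }
  }
}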

Note that Analysis Services always invokes the Mashup engine when using structured data sources to get the data. It might or might not for provider data sources. The choice depends on the table partitions on top of the data source, as the next section explains.

Query partitions versus M partitions

Just as there are multiple types of data source definitions in Tabular 1400, there are also multiple partition source types to import data into a table. Specifically, you can define a partition by using a QueryPartitionSource or an MPartitionSource, as in the following TOM code sample.
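A minimal sketch of such a sample, assuming an existing TOM Model object named model and illustrative table, data source, and query names:

var table = model.Tables["FactInternetSales"];

// A query partition source with a native T-SQL query.
table.Partitions.Add(new Partition
{
    Name = "FactInternetSales - Query",
    Source = new QueryPartitionSource
    {
        DataSource = model.DataSources["SQL/localhost;AdventureWorksDW"],
        Query = "SELECT * FROM [dbo].[FactInternetSales]"
    }
});

// An M partition source with a Power Query/M expression.
table.Partitions.Add(new Partition
{
    Name = "FactInternetSales - M",
    Source = new MPartitionSource
    {
        Expression = "let Source = #\"SQL/localhost;AdventureWorksDW\", " +
            "Data = Source{[Schema=\"dbo\",Item=\"FactInternetSales\"]}[Data] in Data"
    }
});

model.SaveChanges();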

As illustrated, you can mix query partitions with M partitions, even on a single table. The only requirement is that all partition sources must return the same set of source columns, mapped to table columns at the Tabular metadata layer. In the example above, both partitions use the same data source and import the same data, so you end up with duplicate rows. This is normally not what you want, but in this concrete example, the duplicated rows help to illustrate that Analysis Services could indeed process both partition sources successfully, as in the following screenshot.

The Model.bim file reveals that the M and query partition sources reference a structured data source, but they could also reference a provider data source, as in the screenshot below the following table, which summarizes the possible combinations. In short, you can mix and match to your heart’s content.

  1. Provider data source + query partition source: The AS engine uses the cartridge-based connectivity stack to access the data source.
  2. Provider data source + M partition source: The AS engine translates the provider data source into a generic structured data source and then uses the Mashup engine to import the data.
  3. Structured data source + query partition source: The AS engine wraps the native query on the partition source into an M expression and then uses the Mashup engine to import the data.
  4. Structured data source + M partition source: The AS engine uses the Mashup engine to import the data.

Scenarios 1 and 4 are straightforward. Scenario 3 is practically equivalent to scenario 4: instead of creating a query partition source with a native query and having the AS engine convert it into an M expression, you could define an M partition source in the first place and use the Value.NativeQuery function to specify the native query, as the following screenshot demonstrates. Of course, this only works for connectors that support native source queries and the Value.NativeQuery function.
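A sketch of such an M partition expression, with an illustrative data source name and query:

let
    Source = #"SQL/localhost;AdventureWorksDW",
    Result = Value.NativeQuery(Source, "SELECT * FROM dbo.FactInternetSales")
in
    Result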

Scenario 2, “M partition on top of a provider data source” is more complex than the others because it involves converting the provider data source into a generic structured data source. In other words, a provider data source pointing to a SQL Server database is not equivalent to a structured SQL Server data source because the AS engine does not convert this provider data source into a structured SQL Server data source. Instead, it converts it into a generic structured OLE DB, ODBC, or ADO.NET data source depending on the data provider that the provider data source referenced. For SQL Server connections, this is usually an OLE DB data source.

The fact that provider data sources are converted into generic structured data sources has important implications. For starters, M expressions on top of a generic data source differ from M expressions on top of a specific structured data source. For example, as the next screenshot highlights, an M expression over an OLE DB data source requires additional navigation steps to get to the desired table. You cannot simply take an M expression based on a structured SQL Server data source and put it on top of a generic OLE DB provider data source. If you tried, you would most likely get an error that the expression references an unknown variable or function.
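For illustration, an expression over a generic OLE DB provider data source typically needs navigation steps similar to the following sketch (the object names are illustrative, and the exact navigation fields depend on the provider):

let
    Source = #"Legacy SQL Server",
    AdventureWorksDW = Source{[Name = "AdventureWorksDW"]}[Data],
    dbo = AdventureWorksDW{[Name = "dbo"]}[Data],
    FactInternetSales = dbo{[Name = "FactInternetSales"]}[Data]
in
    FactInternetSales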

Moreover, the Mashup engine cannot apply its query optimizations for SQL Server when using a generic OLE DB data source, so M expressions on top of generic provider data sources cannot be processed as efficiently as M expressions on top of specific structured data sources. For this reason, it is better to add a new structured data source to the model for any new M expression-based table partitions than to use an existing provider data source. Provider data sources and structured data sources can coexist in the same Tabular model.

In Tabular 1400, the main purpose of a provider data source is backward compatibility with Tabular 1200, so that the processing behavior of your models does not change just because you upgraded to 1400, and so that any ETL logic that programmatically generates data sources and table partitions continues to work seamlessly. As mentioned, query partitions on top of a provider data source bypass the Mashup engine. However, processing performance is not necessarily inferior with a structured data source, thanks to a number of engine optimizations. This might seem counterintuitive, so it is a good idea to double-check the processing performance in your environment. The Microsoft SQL Server Native Client OLE DB Provider can indeed perform faster than the Mashup engine, so in very large Tabular 1400 models connecting to SQL Server databases, it can be advantageous to use a provider data source and query partitions.

Data sources and partitions in SSDT Tabular

With TMSL and TOM, you can create data sources and table partitions in any combination, but this is not the case in SSDT Tabular. By default, SSDT creates structured data sources, and when you right-click a structured data source in Tabular Model Explorer and select Import New Tables, you launch the modern Get Data UI. Among other things, the default behavior helps to provide a consistent user interface and avoids confusion. You don’t need to weigh the pros and cons of provider versus structured and you don’t need to select a different partition source type and work with a different UI just because you wanted to write a native query. As explained in the previous section, an M expression using Value.NativeQuery is equivalent to a query partition over a structured data source.

Only if a model already contains provider data sources, say due to an upgrade from 1200, does SSDT display the legacy UI for editing these metadata objects. By the same token, when you right-click a provider data source in Tabular Model Explorer and select Import New Tables, you launch the legacy UI for defining a query partition source. If you don’t add any new data sources, the user interface is still consistent with the 1200 experience. Yet, if you mix provider and structured data sources in a model, the UI switches back and forth depending on which object type you edit. See the following screenshot with the modern experience on the left and the legacy UI on the right; which one you see depends on the data source type you right-clicked.

Fully enabling the legacy UI

BI professionals who prefer to build their Tabular models exclusively on top of SQL Server data warehouses using native T-SQL queries might look unfavorably at SSDT Tabular’s strong bias towards the modern Get Data experience. The good news is that you can fully enable the legacy UI to create provider data sources in Tabular 1400 models, so you don’t need to resort to TMSL or TOM for this purpose.

In the current version of SSDT Tabular, you must configure a DWORD parameter called “Enable Legacy Import” in the Windows registry. Setting this parameter to 1 enables the legacy UI. Setting it to zero or removing the parameter disables it again. To enable the legacy UI, you can copy the following lines into a .reg file and import the file into the registry. Do not forget to restart Visual Studio to apply the changes.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Microsoft SQL Server\14.0\Microsoft Analysis Services\Settings]
"Enable Legacy Import"=dword:00000001

With the legacy UI fully enabled, you can right-click on Data Sources in Tabular Model Explorer and choose to Import From Data Source (Legacy) or reuse Existing Connections (Legacy), as in the following screenshot. As you would expect, these options create provider data sources in the model and then you can create query partitions on top of these.

Wrapping things up

While the AS engine, TMSL, and TOM give you full control over data sources and table partitions, SSDT Tabular attempts to simplify things by favoring structured data sources and M partitions wherever possible. The legacy UI only shows up if you already have provider data sources or query partitions in your model. Should legacy data sources and query partitions be first-class citizens in Tabular 1400? Perhaps SSDT should provide an explicit option in the user interface to enable the legacy UI, eliminating the need to configure a registry parameter. Let us know if this is something we should do. Also, there is currently no SSDT support for creating M partitions over provider data sources or query partitions over structured data sources because these scenarios seem less important and less desirable. Do you need these features?

Send us your feedback via email to SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. Or simply post a comment to this article. Influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers!

Asynchronous Refresh with the REST API for Azure Analysis Services


We are pleased to introduce the REST API for Azure Analysis Services. Using any programming language that supports REST calls, you can now perform asynchronous data-refresh operations. This includes synchronization of read-only replicas for query scale out.

Data-refresh operations can take some time depending on various factors including data volume and level of optimization using partitions, etc. These operations have traditionally been invoked with existing methods such as using TOM (Tabular Object Model), PowerShell cmdlets for Analysis Services, or TMSL (Tabular Model Scripting Language). The traditional methods may require long-running HTTP connections. A lot of work has been done to ensure the stability of these methods, but given the nature of HTTP, it may be more reliable to avoid long-running HTTP connections from client applications.

The REST API for Azure Analysis Services enables data-refresh operations to be carried out asynchronously. It therefore does not require long-running HTTP connections from client applications. Additionally, there are other built-in features for reliability such as auto retries and batched commits.

Base URL

The base URL follows this format:

https://<rollout>.asazure.windows.net/servers/<serverName>/models/<resource>/

For example, consider a model named AdventureWorks, on a server named myserver, located in the West US Azure region. The server name is:

asazure://westus.asazure.windows.net/myserver

The base URL is:

https://westus.asazure.windows.net/servers/myserver/models/AdventureWorks/

Using the base URL, resources and operations can be appended based on the following diagram:

REST API Diagram

  • Anything that ends in “s” is a collection.
  • Anything that ends with “()” is a function.
  • Anything else is a resource/object.

For example, you can use the POST verb on the Refreshes collection to perform a refresh operation:

https://westus.asazure.windows.net/servers/myserver/models/AdventureWorks/refreshes

Authentication

All calls must be authenticated with a valid Azure Active Directory (OAuth 2) token in the Authorization header and must meet the following requirements:

  • The token must be for either a user or an application service principal.
  • The user or application must have sufficient permissions on the server or model to make the requested call. The permission level is determined by roles within the model or the admin group on the server.
  • The token must have the correct audience set to: “https://*.asazure.windows.net”.

POST /refreshes

To perform a refresh operation, use the POST verb on the /refreshes collection to add a new refresh item to the collection. The Location header in the response includes the refresh ID. Because the operation is asynchronous, the client application can disconnect and check the status later if required.

Only one refresh operation is accepted at a time for a model. If there is a current running refresh operation and another is submitted, the 409 Conflict HTTP status code will be returned.

The body may, for example, resemble the following:

{
    "Type": "Full",
    "CommitMode": "transactional",
    "MaxParallelism": 2,
    "RetryCount": 2,
    "Objects": [
        {
            "table": "DimCustomer",
            "partition": "DimCustomer"
        },
        {
            "table": "DimDate"
        }
    ]
}

Here’s a list of parameters:

Type (Enum; optional; default: automatic)
    The type of processing to perform. The types are aligned with the TMSL refresh command types: full, clearValues, calculate, dataOnly, automatic, add, and defragment.

CommitMode (Enum; optional; default: transactional)
    Determines whether objects are committed in batches or only when complete. Modes include default, transactional, and partialBatch.

MaxParallelism (int; optional; default: 10)
    The maximum number of threads on which to run processing commands in parallel. This aligns with the MaxParallelism property that can be set in the TMSL Sequence command or by using other methods.

RetryCount (int; optional; default: 0)
    The number of times the operation retries before failing.

Objects (array; optional; default: the entire model)
    An array of objects to be processed. Each object includes "table" when processing an entire table, or "table" and "partition" when processing a partition. If no objects are specified, the whole model is refreshed.

CommitMode equal to partialBatch can be used when doing an initial load of a large dataset that may take hours. At time of writing, the batch size is the MaxParallelism value, but this may change. If the refresh operation fails after successfully committing one or more batches, the successfully committed batches remain committed; the operation does not roll them back.
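Putting this together, here is a minimal C# sketch that submits a refresh request and captures the refresh location from the Location header. It is not the full sample linked later in this article; the base URL is the example from earlier, and token acquisition is assumed to have happened already.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

static async Task<Uri> SubmitRefreshAsync(string accessToken)
{
    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri(
            "https://westus.asazure.windows.net/servers/myserver/models/AdventureWorks/");
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", accessToken);

        var body = new StringContent(
            "{ \"Type\": \"Full\", \"CommitMode\": \"transactional\", \"MaxParallelism\": 2 }",
            Encoding.UTF8, "application/json");

        HttpResponseMessage response = await client.PostAsync("refreshes", body);

        // Throws on failure, e.g. a 409 Conflict if a refresh is already running.
        response.EnsureSuccessStatusCode();

        // The Location header points at the new refresh; its last segment is the refresh ID.
        return response.Headers.Location;
    }
}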

GET /refreshes/<refreshId>

To check the status of a refresh operation, use the GET verb on the refresh ID. Here’s an example of the response body. The status field returns “inProgress” if the operation is in progress.

{
    "startTime": "2017-12-07T02:06:57.1838734Z",
    "endTime": "2017-12-07T02:07:00.4929675Z",
    "type": "full",
    "status": "succeeded",
    "currentRefreshType": "full",
    "objects": [
        {
            "table": "DimCustomer",
            "partition": "DimCustomer",
            "status": "succeeded"
        },
        {
            "table": "DimDate",
            "partition": "DimDate",
            "status": "succeeded"
        }
    ]
}

GET /refreshes

To retrieve a list of historical refresh operations for a model, use the GET verb on the /refreshes collection. Here is an example of the response body. At time of writing, the last 30 days of refresh operations are stored and returned, but this is subject to change.

[
    {
        "refreshId": "1344a272-7893-4afa-a4b3-3fb87222fdac",
        "startTime": "2017-12-09T01:58:04.76",
        "endTime": "2017-12-09T01:58:12.607",
        "status": "succeeded"
    },
    {
        "refreshId": "474fc5a0-3d69-4c5d-adb4-8a846fa5580b",
        "startTime": "2017-12-07T02:05:48.32",
        "endTime": "2017-12-07T02:05:54.913",
        "status": "succeeded"
    }
]

DELETE /refreshes/<refreshId>

To cancel an in-progress refresh operation, use the DELETE verb on the refresh ID.

POST /sync

Having performed refresh operations, it may be necessary to synchronize the new data with replicas for query scale out. To perform a synchronize operation for a model, use the POST verb on the /sync function. The Location header in the response includes the sync operation ID.

GET /sync?operationId=<operationId>

To check the status of a sync operation, use the GET verb passing the operation ID as a parameter. Here’s an example of the response body:

{
    "operationId": "cd5e16c6-6d4e-4347-86a0-762bdf5b4875",
    "database": "AdventureWorks2",
    "UpdatedAt": "2017-12-09T02:44:26.18",
    "StartedAt": "2017-12-09T02:44:20.743",
    "syncstate": 2,
    "details": null
}

Possible values for syncstate include the following:

  • 0: Replicating. Database files are being replicated to a target folder.
  • 1: Rehydrating. The database is being rehydrated on read-only server instance(s).
  • 2: Completed. The sync operation completed successfully.
  • 3: Failed. The sync operation failed.
  • 4: Finalizing. The sync operation has completed but is performing clean up steps.

Code sample

Here’s a C# code sample to get you started:

https://github.com/Microsoft/Analysis-Services/tree/master/RestApiSample

To use the code sample, first do the following:

  1. Clone or download the repo. Open the RestApiSample solution.
  2. Find the line “client.BaseAddress = …” and provide your base URL (see above).

The code sample can use the following forms of authentication:

  • Interactive login or username/password
  • Service principal

Interactive login or username/password

This form of authentication requires an Azure application be set up with the necessary API permissions assigned. This section describes how to set up the application using the Azure portal.

  1. Select the Azure Active Directory section, click App registrations, and then New application registration.

New app registration

  2. In the Create blade, enter a meaningful name, select Native application type, and then enter “urn:ietf:wg:oauth:2.0:oob” for the Redirect URI. Then click the Create button.

Create-App

  3. Select your app from the list and take note of the Application ID.

App-ID

  4. In the Settings section for your app, click Required permissions, and then click Add.

Required-Permissions

  5. In Select an API, type “SQL Server Analysis Services” into the search box. Then select “Azure Analysis Services (SQL Server Analysis Services Azure)”.

API-Permissions

  6. Select Read and Write all Models and then click the Select button. Then click Done to add the permissions. It may take a few minutes to propagate.

API-Permissions

  7. In the code sample, find the method UpdateToken(). Observe the contents of this method.
  8. Find the line “string clientID = …” and enter the application ID you previously recorded.
  9. Run the sample.

Service principal

Please see the Automation of Azure Analysis Services with Service Principals and PowerShell blog post for how to set up a service principal and assign the necessary permissions in Azure Analysis Services. Having done this, the following additional steps are required:

  1. Find the line “string authority = …” and enter your organization’s tenant ID in place of “common”. See code comments for further info.
  2. Comment/uncomment so the ClientCredential class is used to instantiate the cred object. Ensure the <App ID> and <App Key> values are accessed in a secure way or use certificate-based authentication for service principals.
  3. Run the sample.

Automation of Analysis Services with NuGet packages


We are pleased to announce that the Analysis Services Management Objects (AMO) and ADOMD client libraries are now available from NuGet.org! This simplifies the development and management of automation tasks for Azure Analysis Services and SQL Server Analysis Services.

NuGet provides benefits including the following.

  • Azure Functions require less manual intervention to work with Azure Analysis Services.
  • ISVs and developers of community tools for Analysis Services such as DAX Studio, Tabular Editor, BISM Normalizer and others will benefit from simplified deployment and reliable (non-GAC) references.

Visit this site for information on what NuGet is for and how to use it.

AMO

https://www.nuget.org/packages/Microsoft.AnalysisServices.retail.amd64/

AMO contains the object libraries to create and manage both Analysis Services multidimensional and tabular models. The object library for tabular is called the Tabular Object Model (TOM). See here for more information.

ADOMD

https://www.nuget.org/packages/Microsoft.AnalysisServices.AdomdClient.retail.amd64/

ADOMD is used primarily for development of client tools that submit MDX or DAX queries to Analysis Services for user analysis. See here for more information.
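For example, a project can reference both packages with entries like these in its project file (the version number is illustrative):

<ItemGroup>
  <PackageReference Include="Microsoft.AnalysisServices.retail.amd64" Version="15.0.2" />
  <PackageReference Include="Microsoft.AnalysisServices.AdomdClient.retail.amd64" Version="15.0.2" />
</ItemGroup>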

MSI installer

We recommend that all developers who use these libraries migrate to NuGet references instead of using the MSI installer. The MSI installers for AMO and ADOMD are still available here. Starting from MAJOR version 15 and for the foreseeable future, we plan to release the client libraries as both NuGet packages and MSI installers. In the long term, we want to retire the MSI installer.

NuGet versioning

The AssemblyVersion of the NuGet package assemblies will follow semantic versioning: MAJOR.MINOR.PATCH. This ensures NuGet references load the expected version even if there is a different version in the GAC (resulting from an MSI install). We will increment at least the PATCH for each public release. AMO and ADOMD versions will be kept in sync.

MSI versioning

The MSI version will continue to use a versioning scheme for AssemblyVersion like 15.0.0.0 (MAJOR version only). The MSI installs assemblies to the GAC and overrides previous MSI installations for the same MAJOR version. This versioning scheme ensures new releases do not affect NuGet references. It also ensures a single entry in Add/Remove Programs for each installation of a MAJOR version.

Windows Explorer file properties

The AssemblyFileVersion (visible in Windows Explorer as the File Version) will be the full version for both MSI and NuGet (e.g. 15.0.1.333).

The AssemblyInformationalVersion (visible in Windows Explorer as the Product Version) will be the semantic version for both MSI and NuGet (e.g. 15.0.1.0).

For anyone who has done significant development with Analysis Services, we hope you enjoy the NuGet experience for Analysis Services!

New memory options for Analysis Services


We’re happy to introduce some new memory settings for Azure Analysis Services and SQL Server Analysis Services tabular models. These new settings are primarily for resource governance, and in some cases can speed up data refresh.

IsAvailableInMdx

The IsAvailableInMdx column property is available in Azure Analysis Services and SQL Server Analysis Services 2017 CU7.

Setting it to false prevents attribute hierarchies from being built, which reduces memory consumption. It also means the column is not available for group-by queries in MDX clients such as Excel. Fact (transactional) table columns often don’t need to be grouped and are often the ones that consume the most memory.

This property can also improve the performance of refresh operations, particularly for tables with many partitions, because building attribute hierarchies can be very expensive: they are built over all the data in the table.

Currently, this property is not yet exposed in SSDT. It must be set in the JSON-based metadata by using Tabular Model Scripting Language (TMSL) or the Tabular Object Model (TOM). This property is specified as a Boolean.

The following snippet of JSON-based metadata from the Model.bim file disables attribute hierarchies for the Sales Amount column:

  {
    "name": "Sales Amount",
    "dataType": "decimal",
    "sourceColumn": "SalesAmount",
    "isAvailableInMdx": false
  }
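In TOM, the equivalent is a one-line property assignment (the table and column names are illustrative):

// Disable attribute hierarchy creation for a fact column, then save the change.
model.Tables["Internet Sales"].Columns["Sales Amount"].IsAvailableInMDX = false;
model.SaveChanges();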

QueryMemoryLimit

The Memory\QueryMemoryLimit property can be used to limit memory spools built by DAX queries submitted to the model. Currently, this property is only available in Azure Analysis Services.

Changing this property can be useful in controlling expensive queries that result in significant materialization. If the spooled memory for a query hits the limit, the query is cancelled and an error is returned, reducing the impact on other concurrent users of the system. Currently, MDX queries are not affected. The limit does not account for other general memory allocations used by the query.

A value from 1 to 100 is interpreted as a percentage; values above 100 are interpreted as bytes. The default value of 0 means not specified, and no limit is applied.

You can set this property by using the latest version of SQL Server Management Studio (SSMS), in the Server Properties dialog box. See the Server Memory Properties article for more information.

QueryMemoryLimit

DbpropMsmdRequestMemoryLimit

The DbpropMsmdRequestMemoryLimit XMLA property can be used to override the Memory\QueryMemoryLimit server property value for a connection. Currently, this property is only available in Azure Analysis Services. The unit of measure is kilobytes. See the Connection String Properties article for more information.
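For example, a client connection string could cap a session at 1 GB (1,048,576 KB) along these lines (the server and database names are placeholders):

Provider=MSOLAP;Data Source=asazure://westus.asazure.windows.net/myserver;Initial Catalog=AdventureWorks;DbpropMsmdRequestMemoryLimit=1048576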

RowsetSerializationLimit

The OLAP\Query\RowsetSerializationLimit server property limits the number of rows returned in a rowset to clients. Currently, this property is only available in Azure Analysis Services.

This property, set in the Server Properties dialog box in the latest version of SSMS, applies to both DAX and MDX. It can be used to protect server resources from extensive data export usage. Queries submitted to the server that would exceed the limit are cancelled and an error is returned. The default value is -1, meaning no limit is applied.

Whitepaper on modeling for AS tabular scalability


This is a guest post from Daniel Rubiolo of the Power BI Customer Advisory Team (PBI CAT).

To scale Tabular models to very big volumes of data, and with the objective of getting as much of the data as possible into memory so users can consume it in their analyses, we must consider data preparation and modeling best practices to help the engine do its best work.

AS (Analysis Services) Tabular at its core has an in-memory columnar database engine optimized for BI exploratory analytics. It’s highly efficient in encoding and compressing data in memory, supporting custom business logic with calculated columns & measures, enabling high concurrency, and delivering blazing fast responses.

In this whitepaper (DOCX, PDF) we cover a simplified overview of how the Tabular engine works, and several data modeling design best practices that take the most advantage of the engine’s inner workings. This article was written with an Azure Analysis Services example, but equally applies to SQL Server 2017 Analysis Services. (With a few exceptions, these guidelines also apply to Excel’s Power Pivot and Power BI Desktop & Service.)

With a real-world example, we cover recommendations such as:

  1. Benefits of designing your model as a “dimensional model”, a.k.a. “star schema”.
    • Pre-processing your data in this manner will enable the most scale for high data volumes, and deliver the best performance at query time.
  2. Steps for optimizing Dimensions:
    • Minimize the number of columns.
    • Reduce cardinality (data type conversions).
    • Filter out unused dimension values (unless a business scenario requires them).
    • Integer Surrogate Keys (SK).
    • Ordered by SK (to maximize Value encoding).
    • Hint for VALUE encoding on numeric columns.
    • Hint for disabling hierarchies on SKs.
  3. Steps for optimizing Facts:
    • Handle early arriving facts. [Facts without corresponding dimension records.]
    • Replace dimension IDs with their surrogate keys.
    • Reduce cardinality (data type conversions).
    • Consider moving calculations to the source (to use in compression evaluations).
    • Ordered by less diverse SKs first (to maximize compression).
    • Increased Tabular sample size for deciding Encoding, by considering segments and partitions.
    • Hint for VALUE encoding on numeric columns.
    • Hint for disabling hierarchies.
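To illustrate the encoding and hierarchy hints from the lists above, here is how they can appear as column properties in the JSON-based metadata (a sketch; the names are illustrative):

{
  "name": "SalesAmount",
  "dataType": "decimal",
  "sourceColumn": "SalesAmount",
  "encodingHint": "Value",
  "isAvailableInMdx": false
}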

AS Tabular provides very high flexibility in what models you can build. The guidelines in this article will help you maximize the capabilities you provide with your solutions. We hope you find it useful!

What’s new for SQL Server 2019 Analysis Services CTP 2.3


We are pleased to announce public CTP 2.3 of SQL Server 2019 Analysis Services. The new features detailed here are planned to ship later in Power BI Premium and Azure Analysis Services.

Calculation groups

Here is a question for seasoned BI professionals: what is the most powerful feature of SSAS multidimensional? Many would say the ability to define calculated members, typically using scoped cell assignments. Calculated members in multidimensional enable complex calculations by reusing calculation logic. Unfortunately, Analysis Services tabular doesn’t have equivalent functionality. Correction: it does now!!!

Calculation groups address the issue of proliferation of measures in complex BI models often caused by common calculations like time-intelligence. Enterprise models are reused throughout large organizations, so they grow in scale and complexity. It is not uncommon for Analysis Services models to have hundreds of base measures. Each base measure often requires the same time-intelligence analysis. For example, Sales and Order Count may require:

  • Sales MTD, Sales QTD, Sales YTD, Sales PY, Sales YOY%, …
  • Orders MTD, Orders QTD, Orders YTD, Orders PY, Orders YOY%, …

As you can see, this can easily explode the number of measures. If a model has 100 base measures and each requires 10 time-intelligence representations, the model ends up with 1,000 measures in total (100*10). This creates the following problems.

  • The user experience is overwhelming because users must sift through so many measures
  • DAX is difficult to maintain
  • Model metadata is bloated

Calculation groups address these issues. They are presented to end-users as a table with a single column. Each value in the column represents a reusable calculation that can be applied to any of the measures where it makes sense. The reusable calculations are called calculation items.

By reducing the number of measures, calculation groups present an uncluttered user interface to end users. They are an elegant way to manage DAX business logic. Users simply select calculation groups in the field list to view the calculations in Power BI visuals. There is no need for the end user or modeler to create separate measures.

Calculation groups user experience

 

Time-intelligence example

Consider the following calculation group example.

Table: Time Intelligence
Column: Time Calculation
Precedence: 20

 

Calculation items and their expressions:

"Current"
    SELECTEDMEASURE()

"MTD"
    CALCULATE(SELECTEDMEASURE(), DATESMTD(DimDate[Date]))

"QTD"
    CALCULATE(SELECTEDMEASURE(), DATESQTD(DimDate[Date]))

"YTD"
    CALCULATE(SELECTEDMEASURE(), DATESYTD(DimDate[Date]))

"PY"
    CALCULATE(SELECTEDMEASURE(), SAMEPERIODLASTYEAR(DimDate[Date]))

"PY MTD"
    CALCULATE(
        SELECTEDMEASURE(),
        SAMEPERIODLASTYEAR(DimDate[Date]),
        'Time Intelligence'[Time Calculation] = "MTD"
    )

"PY QTD"
    CALCULATE(
        SELECTEDMEASURE(),
        SAMEPERIODLASTYEAR(DimDate[Date]),
        'Time Intelligence'[Time Calculation] = "QTD"
    )

"PY YTD"
    CALCULATE(
        SELECTEDMEASURE(),
        SAMEPERIODLASTYEAR(DimDate[Date]),
        'Time Intelligence'[Time Calculation] = "YTD"
    )

"YOY"
    SELECTEDMEASURE() -
    CALCULATE(
        SELECTEDMEASURE(),
        'Time Intelligence'[Time Calculation] = "PY"
    )

"YOY%"
    DIVIDE(
        CALCULATE(
            SELECTEDMEASURE(),
            'Time Intelligence'[Time Calculation] = "YOY"
        ),
        CALCULATE(
            SELECTEDMEASURE(),
            'Time Intelligence'[Time Calculation] = "PY"
        )
    )

 

Here is a DAX query and output. The output shows the calculations applied. For example, QTD for March 2012 is the sum of January, February and March 2012.

EVALUATE
CALCULATETABLE (
    SUMMARIZECOLUMNS (
        DimDate[CalendarYear],
        DimDate[EnglishMonthName],
        "Current", CALCULATE ( [InternetTotalSales], 'Time Intelligence'[Time Calculation] = "Current" ),
        "QTD",     CALCULATE ( [InternetTotalSales], 'Time Intelligence'[Time Calculation] = "QTD" ),
        "YTD",     CALCULATE ( [InternetTotalSales], 'Time Intelligence'[Time Calculation] = "YTD" ),
        "PY",      CALCULATE ( [InternetTotalSales], 'Time Intelligence'[Time Calculation] = "PY" ),
        "PY QTD",  CALCULATE ( [InternetTotalSales], 'Time Intelligence'[Time Calculation] = "PY QTD" ),
        "PY YTD",  CALCULATE ( [InternetTotalSales], 'Time Intelligence'[Time Calculation] = "PY YTD" )
    ),
    DimDate[CalendarYear] IN { 2012, 2013 }
)

Time intelligence

 

Sideways recursion

Some of the calculation items refer to other calculation items in the same calculation group. This is called “sideways recursion”. For example, YOY% (shown below for easy reference) refers to two other calculation items, but they are evaluated separately using different CALCULATE statements. Other types of recursion are not supported (see below).

DIVIDE(
    CALCULATE(
        SELECTEDMEASURE(),
        'Time Intelligence'[Time Calculation] = "YOY"
    ),
    CALCULATE(
        SELECTEDMEASURE(),
        'Time Intelligence'[Time Calculation] = "PY"
    )
)

 

Single calculation item in filter context

Here is the definition of PY YTD:

CALCULATE(
    SELECTEDMEASURE(),
    SAMEPERIODLASTYEAR(DimDate[Date]),
    'Time Intelligence'[Time Calculation] = "YTD"
)

The YTD argument to the CALCULATE() function overrides the filter context to reuse the logic already defined in the YTD calculation item. It is not possible to apply both PY and YTD in a single evaluation. Calculation groups are only applied if a single calculation item from the calculation group is in filter context.

This is illustrated by the following query and output.

EVALUATE
CALCULATETABLE (
    SUMMARIZECOLUMNS (
        DimDate[CalendarYear],
        DimDate[EnglishMonthName],

        //No time intelligence applied: all calc items in filter context:
        "InternetTotalSales", [InternetTotalSales],

        //No time intelligence applied: 2 calc items in filter context:
        "PY || YTD", CALCULATE ( [InternetTotalSales],
            'Time Intelligence'[Time Calculation] = "PY" || 'Time Intelligence'[Time Calculation] = "YTD"
        ),

        //YTD applied: exactly 1 calc item in filter context:
        "YTD", CALCULATE ( [InternetTotalSales], 'Time Intelligence'[Time Calculation] = "YTD" )
    ),
    DimDate[CalendarYear] = 2012
)

Single-calc-item

A calculation group should be designed so that, for the end user, it only makes sense to apply one of its calculation items at a time. If there is a business requirement to allow the end user to apply more than one calculation item at a time, multiple calculation groups with different precedence values should be used.

 

Precedence

In the same model as the time-intelligence example above, the following calculation group also exists. It contains average calculations that are independent of traditional time intelligence in that they don’t change the date filter context; they just apply average calculations within it.

In this example, a daily average calculation is defined. It is common in oil-and-gas applications to use calculations such as “barrels of oil per day”. Other common business examples include “store sales average” in the retail industry.

Whilst such calculations are calculated independently of time-intelligence calculations, there may well be a requirement to combine them. For example, the end-user might want to see “YTD barrels of oil per day” to view the daily-oil rate from the beginning of the year to the current date. In this scenario, precedence should be set for calculation items.

Table: Averages
Column: Average Calculation
Precedence: 10

 

Calculation items and their expressions:

"No Average"
    SELECTEDMEASURE()

"Daily Average"
    DIVIDE(SELECTEDMEASURE(), COUNTROWS(DimDate))

 

Here is a DAX query and output.

EVALUATE
    CALCULATETABLE (
        SUMMARIZECOLUMNS (
        DimDate[CalendarYear],
        DimDate[EnglishMonthName],
        "InternetTotalSales", CALCULATE (
            [InternetTotalSales],
            'Time Intelligence'[Time Calculation] = "Current",
            'Averages'[Average Calculation] = "No Average"
        ),
        "YTD", CALCULATE (
            [InternetTotalSales],
            'Time Intelligence'[Time Calculation] = "YTD",
            'Averages'[Average Calculation] = "No Average"
        ),
        "Daily Average", CALCULATE (
            [InternetTotalSales],
            'Time Intelligence'[Time Calculation] = "Current",
            'Averages'[Average Calculation] = "Daily Average"
        ),
        "YTD Daily Average", CALCULATE (
            [InternetTotalSales],
            'Time Intelligence'[Time Calculation] = "YTD",
            'Averages'[Average Calculation] = "Daily Average"
        )
    ),
    DimDate[CalendarYear] = 2012
)

YTD-Daily-Avg

The following table shows how the March 2012 values are calculated.

YTD
    Sum of InternetTotalSales for Jan, Feb, and Mar 2012
    = 495,364 + 506,994 + 373,483

Daily Average
    InternetTotalSales for Mar 2012 divided by the number of days in March
    = 373,483 / 31

YTD Daily Average
    YTD for Mar 2012 divided by the number of days in Jan, Feb, and Mar
    = 1,375,841 / (31 + 29 + 31)

 

For easy reference, here is the definition of the YTD calculation item. It is applied with Precedence of 20.

CALCULATE(SELECTEDMEASURE(), DATESYTD(DimDate[Date]))

Here is Daily Average. It is applied with Precedence of 10.

DIVIDE(SELECTEDMEASURE(), COUNTROWS(DimDate))

Since the precedence of the Time Intelligence calculation group is higher than the Averages one, it is applied as broadly as possible. The YTD Daily Average calculation applies YTD to both the numerator and the denominator (count of days) of the daily average calculation.

This is equivalent to this calculation:

CALCULATE(DIVIDE(SELECTEDMEASURE(), COUNTROWS(DimDate)), DATESYTD(DimDate[Date]))

Not this one:

DIVIDE(CALCULATE(SELECTEDMEASURE(), DATESYTD(DimDate[Date])), COUNTROWS(DimDate))

 

New DAX functions

The following new DAX functions have been introduced to work with calculation groups.

SELECTEDMEASURE()
    Returns a reference to the measure currently in context.

SELECTEDMEASURENAME()
    Returns a string containing the name of the measure currently in context.

ISSELECTEDMEASURE( M1, M2, … )
    Returns a Boolean indicating whether the measure currently in context is one of those specified as an argument.

 

SELECTEDMEASURENAME() or ISSELECTEDMEASURE() can be used to conditionally apply calculation items depending on the measure in context. For example, it probably doesn’t make sense to calculate the daily average of a ratio measure.

With ISSELECTEDMEASURE():

IF (
    ISSELECTEDMEASURE ( [Expense Ratio 1], [Expense Ratio 2] ),
    SELECTEDMEASURE (),
    DIVIDE ( SELECTEDMEASURE (), COUNTROWS ( DimDate ) )
)
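With SELECTEDMEASURENAME(), an equivalent sketch looks like the following. Note this version compares name strings, so the measure names (illustrative here) must match the model exactly:

IF (
    SELECTEDMEASURENAME () IN { "Expense Ratio 1", "Expense Ratio 2" },
    SELECTEDMEASURE (),
    DIVIDE ( SELECTEDMEASURE (), COUNTROWS ( DimDate ) )
)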

ISSELECTEDMEASURE() has the advantage of working with formula fixup, so measure-name changes are reflected automatically, whereas string comparisons against SELECTEDMEASURENAME() must be updated by hand.

 

Power BI implicit measures

Calculation groups work with model measures and query-scope measures, but not with inline DAX calculations. This is shown by the following query.

DEFINE
    MEASURE FactInternetSales[QueryScope] = SUM ( FactInternetSales[SalesAmount] )
EVALUATE
CALCULATETABLE (
    SUMMARIZECOLUMNS (
        DimDate[CalendarYear],
        DimDate[EnglishMonthName],

        //YTD applied successfully to model measure:
        "Model Measure", CALCULATE (
            [InternetTotalSales],
            'Time Intelligence'[Time Calculation] = "YTD"
        ),

        //YTD applied successfully to query scope measure:
        "Query Scope", CALCULATE (
            [QueryScope],
            'Time Intelligence'[Time Calculation] = "YTD"
        ),

        //YTD not applied to inline calculation:
        "Inline", CALCULATE (
            SUM ( FactInternetSales[SalesAmount] ),
            'Time Intelligence'[Time Calculation] = "YTD"
        )
    ),
    DimDate[CalendarYear] = 2012
)

Power BI implicit measures are created when the end user drags columns onto visuals to view aggregated values without creating an explicit measure. At the time of writing, Power BI generates the DAX for implicit measures as inline DAX calculations, which means implicit measures don't work with calculation groups. To reserve the right to change this at a later date, a new model property, DiscourageImplicitMeasures, has been introduced and is visible in TOM. In the current version, it must be set to true to create calculation groups. When set to true, Power BI Desktop in Live Connect mode disables creation of implicit measures.

 

DMV support

The following Dynamic Management Views (DMV) have been introduced for calculation groups.

  • TMSCHEMA_CALCULATION_GROUPS
  • TMSCHEMA_CALCULATION_ITEMS
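These DMVs can be queried like any other Analysis Services schema rowset, for example from an SSMS query window. A minimal sketch (each statement is submitted on its own):

SELECT * FROM $SYSTEM.TMSCHEMA_CALCULATION_GROUPS

SELECT * FROM $SYSTEM.TMSCHEMA_CALCULATION_ITEMS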

 

OLS

Object-level security (OLS) defined on calculation group tables is not supported in the current release. OLS can still be defined on other tables in the same model. If a calculation item refers to an OLS-secured object, a generic error is returned on evaluation. This is the planned behavior for SSAS 2019.

 

Planned for a forthcoming CTP

We plan to introduce the following items in a forthcoming SQL Server 2019 CTP.

  • MDX query support with calculation groups.
  • Row-level security (RLS). RLS is not supported in CTP 2.3. The planned behavior for SSAS 2019 is that RLS can be defined on other tables in the same model, but not on calculation groups themselves (directly or indirectly).
  • Dynamic format strings. Calculation groups increase the need for dynamic format strings. For example, the YOY% calculation item needs to be displayed as a percentage, while the others should probably inherit the data type of the measure currently in context. We plan to introduce dynamic format strings in an upcoming SQL Server 2019 CTP.
  • ALLSELECTED DAX function support with calculation groups.
  • Detail rows support with calculation groups.

 

Limitations of CTP 2.3

CTP 2.3 of SSAS is still an early build of SSAS 2019. It is being released for testing and feedback purposes only, and should not be used by customers in production environments. This applies to models with or without calculation groups.

 

New 1470 Compatibility Level

To use the new features, existing models must be upgraded to the 1470 compatibility level. 1470 models cannot be deployed to SQL Server 2017 or earlier, or downgraded to lower compatibility levels.

 

Differences between calculation groups in tabular and calculated members in multidimensional

Calculated members in multidimensional are a little more flexible and enable a few scenarios beyond calculation groups, but they come at the cost of added complexity. We feel calculation groups in tabular provide a great deal of the benefits, with significantly less complexity.

Single calculation-item column

Calculation groups can only have a single calculation-item column, whereas multidimensional allows multiple hierarchies with calculated members in a single utility dimension.

A DAX filter on a column value implicitly filters the other columns in the same table to the values of that row. Without introducing new semantics and complexity, multiple calculation-item columns in a single table would filter each other implicitly, so are disallowed. If you have a requirement to apply multiple calculation items at a time, use separate calculation groups and the Precedence property shown above.

Recursion safeguards not required

MDX supports recursion although there are known performance limitations. Quite often the same query results can be achieved using MDX set-based calculations instead of recursion.

In multidimensional, the right-hand side of MDX-script cell assignments to calculated members created by the Business Intelligence Wizard includes a reference to the real member from the attribute hierarchy. This is required to safeguard against recursion.

Since DAX doesn’t support recursion, so we don’t need to worry about this for calculation groups. The complexity bar is kept lower. If we ever decide to support recursive DAX in the future, we could perhaps introduce an advanced property to indicate that a DAX object is enabled for recursion, and only then require such safeguards to be in place.

Calculation items cannot be created on other column types

Multidimensional allows creation of calculated members on attribute hierarchies that are not part of utility dimensions. For example, a Northwest Region member can be added to the State hierarchy to aggregate Washington, Oregon and Idaho. This is useful for custom-grouping scenarios but can increase the likelihood of solve-order issues.

Calculation items cannot be added to other column types. This keeps semantic definitions simpler. As we enhance calculation groups in the future – for example, if we introduce query-scoped calculation groups – we will take care to learn from the solve-order lessons of the past and strive for consistent behaviors.

 

Tooling

Calculation groups and many-to-many relationships are currently engine-only features. SSDT support will come before SQL Server 2019 general availability. In the meantime, you can use the fantastic open-source community tool Tabular Editor to author calculation groups. Alternatively, you can use SSAS programming and scripting interfaces such as TOM and TMSL.

[Screenshot: authoring a calculation group in Tabular Editor]

 

Pace of delivery

We think you will agree the AS engine team has been on a tear lately. This is the same team that recently delivered, or is currently working on, the following breakthrough features for Power BI.

  • Arguably the biggest scalability feature in the history of the AS engine: aggregations
  • Policy-based incremental refresh
  • Opening the XMLA endpoint to bring AS to Power BI

Calculation groups are yet another monumental feature delivered in a relatively short period of time, demonstrating Microsoft's continued commitment to enterprise BI customers.

 

Download Now

To try SQL Server 2019 CTP 2.3, find download instructions on the SQL Server 2019 web page. Enjoy!

 

What’s new for SQL Server 2019 Analysis Services CTP 2.4


We are excited to announce the public CTP 2.4 of SQL Server 2019 Analysis Services. This public preview includes the following enhancements for Analysis Services tabular models.

  • Many-to-many relationships
  • Memory settings for resource governance

Many-to-many relationships

Many-to-many (M2M) relationships in CTP 2.4 are based on the M2M relationships in Power BI described here, although they do not work with composite models. They allow relationships between tables where both columns are non-unique: a relationship can be defined between a dimension and a fact table at a granularity higher than the key column of the dimension. This avoids having to normalize dimension tables and can improve the user experience, because the resulting model has a smaller number of tables with logically grouped columns. For example, if Budget is defined at the Product Category level, it is not necessary to normalize the Product dimension into two tables, one at the granularity of Product and the other at the granularity of Product Category, as sketched below.
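As an illustrative sketch (the table and column names here are hypothetical, not from the announcement), suppose a Budget table stored at Product Category granularity is related to the Product table on the non-unique Product[ProductCategory] column. Both facts can then be summarized by the shared attribute without splitting the Product dimension:

EVALUATE
SUMMARIZECOLUMNS (
    'Product'[ProductCategory],
    "Total Sales", SUM ( Sales[SalesAmount] ),
    "Total Budget", SUM ( Budget[BudgetAmount] )
)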

Tooling

Many-to-many relationships are currently engine-only features. SSDT support will come before SQL Server 2019 general availability. In the meantime, you can use the fantastic open-source community tool Tabular Editor to create many-to-many relationships. Alternatively, you can use SSAS programming and scripting interfaces such as TOM and TMSL.

New 1470 Compatibility Level

To use many-to-many relationships, existing models must be upgraded to the 1470 compatibility level. 1470 models cannot be deployed to SQL Server 2017 or earlier, or downgraded to lower compatibility levels.

Memory settings for resource governance

The memory settings described here are already available in Azure Analysis Services. With CTP 2.4, they are now also supported by SQL Server 2019 Analysis Services.

QueryMemoryLimit

The Memory\QueryMemoryLimit property can be used to limit memory spools built by DAX queries submitted to the model.

Changing this property can be useful in controlling expensive queries that result in significant materialization. If the spooled memory for a query hits the limit, the query is cancelled and an error is returned, reducing the impact on other concurrent users of the system. The limit currently applies only to DAX queries, not MDX, and it does not account for other general memory allocations used by the query.

A value from 1 through 100 is interpreted as a percentage; values above 100 are interpreted as a number of bytes. For example, a value of 10 means 10 percent, whereas a value of 10,737,418,240 means 10 GB. The default value of 0 means not specified, and no limit is applied.

You can set this property by using the latest version of SQL Server Management Studio (SSMS), in the Server Properties dialog box. See the Server Memory Properties article for more information.

DbpropMsmdRequestMemoryLimit

The DbpropMsmdRequestMemoryLimit XMLA connection string property can be used to override the Memory\QueryMemoryLimit server property value for a connection. The unit of measure is kilobytes. See the Connection String Properties article for more information.
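A minimal sketch of a client connection string using this property (the server and database names are placeholders); the value 1048576 KB corresponds to a 1 GB limit for queries on this connection:

Provider=MSOLAP;Data Source=myserver;Initial Catalog=AdventureWorksDW;DbpropMsmdRequestMemoryLimit=1048576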

RowsetSerializationLimit

The OLAP\Query\RowsetSerializationLimit server property limits the number of rows returned in a rowset to clients.

This property, set in the Server Properties dialog box in the latest version of SSMS, applies to both DAX and MDX. It can be used to protect server resources from extensive data export usage. Queries submitted to the server that would exceed the limit are cancelled and an error is returned. The default value is -1, meaning no limit is applied.

Download Now

To get started with SQL Server 2019 CTP 2.4, find download instructions on the SQL Server 2019 web page. Enjoy!


Future Analysis Services posts on Power BI blog


The Analysis Services Team Blog has served as the medium for many historic announcements and updates in Microsoft BI over the past 10 years. Well, Microsoft is moving off MSDN and TechNet blogs, so this will be the last post on this blog site. The site, including all historical posts, will remain online as a read-only archive.

‘So where will new Analysis Services blog posts be published?’ I hear you ask. We have made it clear that Power BI will be a one-stop shop for both enterprise and self-service BI on a single, all-inclusive platform. Power BI will be a superset of Analysis Services, so there is no better place to announce future Analysis Services features than, you guessed it, on the Power BI blog. The Analysis Services engine is at the foundation of Power BI and powers its datasets, so this just feels right.

Please stay tuned to the new Analysis Services category on the Power BI blog for future Azure Analysis Services and SQL Server Analysis Services announcements:

https://powerbi.microsoft.com/blog/category/analysis-services/


