Analysis Services Team Blog

DirectQuery in SQL Server 2016 Analysis Services whitepaper


I am excited to announce the availability of a new whitepaper called “DirectQuery in SQL Server 2016 Analysis Services”. This whitepaper written by Marco Russo and Alberto Ferrari will take your understanding and knowledge of DirectQuery to the next level so you can make the right decisions in your next project. Although the whitepaper is written for SQL Server Analysis services many of the concepts are shared with Power BI.

A small summary of the whitepaper:

DirectQuery transforms the Microsoft SQL Server Analysis Services Tabular model into a metadata layer on top of an external database. For SQL Server 2016, DirectQuery was redesigned for dramatically improved speed and performance; however, it is also now more complex to understand and implement. There are many tradeoffs to consider when deciding when to use DirectQuery versus in-memory mode (VertiPaq). Consider using DirectQuery if you have either a small database that is updated frequently or a large database that would not fit in memory.

Download the whitepaper here.


Learn how to “think like a Freak” with Freakonomics authors at Microsoft Data Insights Summit


With more than 5 million copies sold in 40 countries, Freakonomics is a worldwide phenomenon — and we’re thrilled to announce that authors Steven Levitt and Stephen Dubner will join us for a special guest keynote at Microsoft Data Insights Summit, taking place June 12–13, 2017 in Seattle.

The Microsoft Data Insights Summit is our user conference for business analysts — and the place to be for those who want to create a data-driven culture at their organization. Now in its second year, the event is packed with strong technical content, hands-on workshops, and a great lineup of speakers. Plus, attendees can meet 1:1 with the experts behind Microsoft’s data insights tools and solutions, including Microsoft Power BI, SQL Server BI, Excel, PowerApps, and Flow.

From their bestselling books, to a documentary film, to a podcast boasting 8 million monthly downloads, authors Levitt and Dubner have been creating a data culture all their own, showing the world how to make smarter, savvier decisions with data. With their trademark blend of captivating storytelling and unconventional analysis, the duo will teach you to think more productively, more creatively, and more rationally — in other words, they’ll teach you how to think like a Freak!

Levitt and Dubner will get below the surface of modern business practices, discussing the topics that matter most to today’s businesses: how to create behavior change, incentives that work (and don’t work), and the value of asking unpopular questions. Their keynote will leave the audience energized, prepared to solve problems more effectively, and ready to succeed in fresh, new ways.

If you want to learn how to use data to drive better decisions in your business, you don’t want to miss this keynote — or the rest of the Microsoft Data Insights Summit. Register now to join us at the conference, June 12–13. Hope to see you there!

Meet the Azure Analysis Services team at upcoming user group meetings


Come meet the Analysis Services team in person as they answer your questions on Analysis Services in Azure. Learn about the new service and features available now.

The success of any modern data-driven organization requires that information is available at the fingertips of every business user, not just IT professionals and data scientists, to guide their day-to-day decisions. Self-service BI tools have made huge strides in making data accessible to business users. However, most business users don't have the expertise or desire to do the heavy lifting that is typically required (finding the right sources of data, importing the raw data, transforming it into the right shape, and adding business logic and metrics) before they can explore the data to derive insights. With Azure Analysis Services, a BI professional can create a semantic model over the raw data and share it with business users so that all they need to do is connect to the model from any BI tool and immediately explore the data and gain insights. Azure Analysis Services uses a highly optimized in-memory engine to provide responses to user queries at the speed of thought.

SQL Saturday Silicon Valley – April 22nd
Microsoft Technology Center, 1065 La Avenida, Mountain View, CA
Group Site
Register Now

Boston Power BI User Group – April 25th 6:30pm – 8:30pm
MS Office 5 Wayside Road, Burlington, MA
Group Site
Register Now

New York Power BI User Group – April 27th 6pm-8:30pm
MS Office Times Square, NY
Group Site
Register Now

Philadelphia Power BI User Group – May 1st 3pm-6pm
MS Office Malvern, PA
Group Site
Register Now

Philadelphia SQL User Group – May 2nd
Group Site
Register Now

Portland Power BI User Group Meeting – late May
CSG Pro Office 734 NW 14th Ave Portland OR 97209
Group Site
Registration: Coming Soon!

New to Azure Analysis Services? Find out how you can try Azure Analysis Services or learn how to create your first data model.

Introducing DirectQuery Support for Tabular 1400


With the production release of SSDT 17.0, Tabular projects now support DirectQuery mode at the 1400 compatibility level, so you can tap into large data sets that exceed the available memory on the server and meet data freshness requirements that would otherwise be difficult if not impossible to achieve in Import mode. As with Tabular 1200 models, DirectQuery 1400-supported data sources include SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Oracle, and Teradata, as the following screenshot indicates, and you can only define a single data source per model. Available DAX functions are also limited, as documented in “DAX Formula Compatibility in DirectQuery Mode.” Another important restriction pertains to the M queries that you can create in DirectQuery mode.

Given that Analysis Services must transform all DAX and MDX client queries into source queries to send them to the source where the data resides, M transformations must be foldable. A foldable transformation is a transformation that the Mashup engine can translate (or fold) into the query dialect of the source, such as T-SQL for SQL Server or PL/SQL for Oracle. You can use the View Native Query option in the Query Builder dialog to verify that the transformation you create is foldable. If the option is available and can display a native query, the transformation meets the DirectQuery requirements (see the following screenshot).

Native Query Folding
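
To make this concrete, here is a minimal sketch of a foldable M query against a hypothetical SQL Server source. The server, database, table, and column names are illustrative assumptions, not part of any particular model:

// Hedged sketch: both Table.SelectRows and Table.SelectColumns fold into the
// native T-SQL statement that the View Native Query option displays.
let
    Source = Sql.Database("myserver", "AdventureWorksDW"),
    Sales = Source{[Schema = "dbo", Item = "FactInternetSales"]}[Data],
    Filtered = Table.SelectRows(Sales, each [SalesAmount] > 1000),
    Result = Table.SelectColumns(Filtered, {"SalesOrderNumber", "OrderDateKey", "SalesAmount"})
in
    Result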

On the other hand, if the option is unavailable and a warning is displayed, you must remove the problematic step because it does not meet the DirectQuery requirements. If you attempt to create a table based on an unsupported M query, SSDT Tabular will display an error message asking you to redefine the query or switch the model into Import mode, as the following screenshot illustrates.

Unsupported Query Features in DirectQuery mode

The DirectQuery experience in SSDT Tabular is similar to Power BI Desktop, but there are some noteworthy differences. For example, in Power BI Desktop, you can switch individual connections into DirectQuery mode whereas SSDT Tabular enables DirectQuery only on a per-model basis, as the following screenshot illustrates with the Power BI Desktop dialog in the background and SSDT Tabular Solution Explorer and Properties window in the front. Mixing Import and DirectQuery mode data sources is not supported in a Tabular model because, in DirectQuery mode, a model can only have a single data source. Also, Power BI Desktop supports Live mode against Analysis Services, which Tabular models do not support.

Another issue worth mentioning is that there currently is no data preview for tables defined in the model. The preview in Query Editor works just fine, but when you apply the changes by clicking Import, the resulting table in the model remains empty because models in DirectQuery mode do not contain any data as all queries are directed to the source. Usually, you can work around this issue by adding a sample partition, as the article “Add sample data to a DirectQuery model in Design Mode” (https://docs.microsoft.com/en-us/sql/analysis-services/tabular-models/add-sample-data-to-a-directquery-model-in-design-mode) describes, but sample partitions are not yet supported in 1400 mode. This will be completed in a future SSDT Tabular release.

Enabling DirectQuery mode

Moreover, SSDT Tabular, running inside Visual Studio, requires 32-bit drivers, while the SSAS engine runs as a 64-bit process and requires the 64-bit versions. This is particularly an issue when connecting to Oracle. Make sure you install the drivers per the following requirements.

Data source | SSDT with Integrated Mode (32-bit) | SSAS Server (64-bit)
SQL Server, Azure SQL Database, Azure SQL Data Warehouse | Drivers preinstalled with the operating system | Drivers preinstalled with the operating system
Oracle | .NET provider for Oracle | OLE DB provider for Oracle (OraOLEDB.Oracle), .NET provider for Oracle (Oracle.DataAccess.Client)
Teradata | .NET provider for Teradata | .NET provider for Teradata (Teradata.Client.Provider)

And that’s it for a quick introduction of DirectQuery support for Tabular 1400. Please take it for a test drive and send us your feedback and suggestions via ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. You can influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers.

 

New Get Data Capabilities in the GA Release of SSDT Tabular 17.0 (April 2017)


With the General Availability (GA) release of SSDT 17.0, the modern Get Data experience in Analysis Service Tabular projects comes with several exciting improvements, including DirectQuery support (see the blog article "Introducing DirectQuery Support for Tabular 1400"), additional data sources (particularly file-based), and support for data access options that control how the mashup engine handles privacy levels, redirects, and null values. Moreover, the GA release coincides with the CTP 2.0 release of SQL Server 2017, so the modern Get Data experience benefits from significant performance improvements when importing data. Thanks to the tireless effort of the Mashup engine team, data import performance over structured data sources is now on par with legacy provider data sources. Internal testing shows that importing data from a SQL Server database through the Mashup engine is in fact faster than importing the same data by using SQL Server Native Client directly!

Last month, the blog article “What makes a Data Source a Data Source?” previewed context expressions for structured data sources—and the file-based data sources that SSDT Tabular 17.0 GA adds to the portfolio of available data sources make use of context expressions to define a generic file-based source as an Access Database, an Excel workbook, or as a CSV, XML, or JSON file. The following screenshot shows a structured data source with a context expression that SSDT Tabular created for importing an XML file.

Context Expression for an XML file data source.

Note that file-based data sources are still a work in progress. Specifically, the Navigator window that Power BI Desktop shows for importing multiple tables from a source is not yet enabled so you end up immediately in the Query Editor in SSDT. This is not ideal because it makes it hard to import multiple tables. A forthcoming SSDT release is going to address this issue. Also, when trying to import from an Access database, note that SSDT Tabular in Integrated Workspace mode would require both the 32-bit and 64-bit ACE provider, but both cannot be installed on the same computer. This issue requires you to use a remote workspace server running SQL Server 2017 CTP 2.0, so that you can install the 32-bit driver on the SSDT workstation and the 64-bit driver on the server running Analysis Services CTP 2.0.


Keep in mind that SSDT Tabular 17.0 GA uses the Analysis Services CTP 2.0 database schema for Tabular 1400 models. This schema is incompatible with CTPs of SQL vNext Analysis Services. You cannot open Tabular 1400 models with previous schemas and you cannot deploy Tabular 1400 models with a CTP 2.0 database schema to a server running a previous CTP version.


Another great data source that you can find for the first time in SSDT Tabular is Azure Blob Storage, which will be particularly interesting when Azure Analysis Services provides support for the 1400 compatibility level. When connecting to Azure Blob Storage, make sure you provide the account name or URL without any containers in the data source definition, such as https://myblobdata.blob.core.windows.net. If you appended a container name to the URL, SSDT Tabular would fail to generate the full set of data source settings. Instead, select the desired container in the Navigator window, as illustrated in the following screenshot.

Importing from Azure Blob Storage

As mentioned above, SSDT Tabular 17.0 GA uses the Analysis Services CTP 2.0 database schema for Tabular 1400 models. This database schema is more complete than any previous schema version. Specifically, you can find additional Data Access Options in the Properties window when selecting the Model.bim file in Solution Explorer (see the following screenshot). These data access options correspond to those options in Power BI Desktop that are applicable to Tabular 1400 models hosted on an Analysis Services server, including:

  • Enable Fast Combine (default is false)   When set to true, the mashup engine will ignore data source privacy levels when combining data.  
  • Enable Legacy Redirects (default is false)  When set to true, the mashup engine will follow HTTP redirects that are potentially insecure (for example, a redirect from an HTTPS to an HTTP URI).  
  • Return Error Values as Null (default is false)  When set to true, cell level errors will be returned as null. When false, an exception will be raised if a cell contains an error.  

Data Access Options in SSDT Tabular

With the Enable Fast Combine setting in particular, you can now begin to refer to multiple data sources in a single source query.
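
For reference, these options are stored at the model level in the Model.bim metadata. The following is a minimal sketch of how they might appear, assuming the camelCase property names of the JSON-based metadata; verify the exact names by inspecting a model that has the options set:

"model": {
  "dataAccessOptions": {
    "fastCombine": true,
    "legacyRedirects": true,
    "returnErrorValuesAsNull": true
  }
}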

Yet another great feature that is now available to you in SSDT Tabular is the Add Column from Example capability introduced with the April 2017 Update of Power BI Desktop. For details, refer to the article “Add a column from an example in Power BI Desktop.” The steps are practically identical. Add Column from Example is a great illustration of how the close collaboration and teamwork between the AS engine, Mashup engine, Power BI Desktop, and SSDT Tabular teams is compounding the value delivered to our customers.

Looking ahead, apart from tying up loose ends such as the Navigator dialog for file-based sources, there is still a sizeable list of data sources we are going to add in further SSDT releases. Named expressions, discussed in this blog a while ago, also still need to find their way into SSDT Tabular, and there are other items such as support for the full set of impersonation options that Analysis Services provides for data sources that can use Windows authentication. Currently, only the service account and explicit Windows credentials can be used. Forthcoming impersonation options include current user and unattended accounts.

In short, the work to enable the modern Get Data experience in SSDT Tabular is not yet finished. Even though SSDT Tabular 17.0 GA is fully supported in production environments, Tabular 1400 is still evolving. The database schema is considered complete with CTP 2.0, but minor changes might still be coming. So please go ahead and deploy SSDT Tabular 17.0 GA, use it to work with your Tabular 1200 models, and take Tabular 1400 for a thorough test drive. And as always, please send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. Influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers!

Introducing a DAX Editor Tool Window for SSDT Tabular


The April 2017 release of SSDT Tabular for Visual Studio 2015 and 2017 comes with a DAX editor tool window that can be considered a complement to or replacement for the formula bar. You can find it on the View menu under Other Windows, and then select DAX Editor, as the following screenshot illustrates. You can dock this tool window anywhere in Visual Studio. If you select a measure in the Measure Grid, DAX Editor lets you edit the formula conveniently. You can also right-click on a measure in Tabular Model Explorer and select Edit Formula. Authoring new measures is as easy as typing a new formula in DAX Editor and clicking Apply. Of course, DAX Editor also lets you edit the expressions for calculated columns.

DAX Editor Tool Window for SSDT Tabular

SSDT Tabular also displays the DAX Editor when defining Detail Rows expressions, which is an improvement over previous releases of SSDT Tabular that merely let you paste an expression into the corresponding textbox in the Properties window, as the following screenshot illustrates. When working with measures, calculated columns, and the detail rows expression properties, note that there is only one DAX Editor tool window instance, so the DAX Editor switches to the expression you currently want to edit.

Detail Rows Expression in DAX Editor

The DAX Editor tool window is a continuous improvement project. We have plans to include features such as code formatting and additional IntelliSense capabilities. Of course, we are also looking forward to hearing from you. So please send us your feedback and suggestions via ProBIToolsFeedback or SSASPrev at Microsoft.com, and report any issues you encounter. Or use any other available communication channels such as UserVoice or MSDN forums. You can influence the evolution of SSDT Tabular to the benefit of all our customers.

What’s new in SQL Server 2017 CTP 2.0 for Analysis Services


The public CTP 2.0 of SQL Server 2017 on Windows is available here! This public preview includes the following enhancements for Analysis Services tabular.

  • Object-level security to secure model metadata in addition to data.
  • Transaction-performance improvements for a more responsive developer experience.
  • Dynamic Management View improvements for 1200 and 1400 models enabling dependency analysis and reporting.
  • Improvements to the authoring experience of detail rows expressions.
  • Hierarchy and column reuse to be surfaced in more helpful locations in the Power BI field list.
  • Date relationships to easily create relationships to date dimensions based on date columns.
  • Default installation option for Analysis Services is tabular, not multidimensional.

Other enhancements not covered by this post include the following.

  • New Power Query data sources. See this post for more info.
  • DAX Editor for SSDT. See this post for more info.
  • Support for M expressions on existing DirectQuery data sources. See this post for more info.
  • SSMS improvements, such as viewing, editing, and scripting support for structured data sources.

Incompatibility with previous CTP versions

Tabular models with 1400 compatibility level that were created with previous versions are incompatible with CTP 2.0. They do not work correctly with the latest tools. Please download and install the April 2017 (17.0 GA) release of SSDT and SSMS.

Object-level security

Roles in tabular models already support a granular list of permissions, and row-level filters to help protect sensitive data. Further information is available here. CTP 1.1 introduced table-level security.

CTP 2.0 builds on this by introducing column-level security, which allows sensitive columns to be protected. This helps prevent a malicious user from discovering that such a column exists.

Column-level and table-level security are collectively referred to as object-level security (OLS).

The current version requires that column-level security is set using the JSON-based metadata, Tabular Model Scripting Language (TMSL), or Tabular Object Model (TOM). We plan to deliver SSDT support soon. The following snippet of JSON-based metadata from the Model.bim file secures the Base Rate column in the Employee table of the Adventure Works sample tabular model by setting the MetadataPermission property of the ColumnPermission class to None.

"roles": [
  {
    "name": "Users",
    "description": "All allowed users to query the model",
    "modelPermission": "read",
    "tablePermissions": [
      {
        "name": "Employee",
        "columnPermissions": [
          {
            "name": "Base Rate",
            "metadataPermission": "none"
          }
        ]
      }
    ]
  }
]

DAX query references to secured objects

If the current user is a member only of the Users role, the following query that explicitly refers to the [Base Rate] column fails with an error message saying the column cannot be found or may not be used.

EVALUATE
SELECTCOLUMNS(
    Employee,
    "Id", Employee[Employee Id],
    "Name", Employee[Full Name],
    "Base Rate", Employee[Base Rate] --Secured column
)

The following query refers to a measure that is defined in the model. The measure formula refers to the Base Rate column. It also fails with an equivalent error message. Model measures that refer to secured tables or columns are indirectly secured from queries.

EVALUATE
{ [Average of Base Rate] } --Indirectly secured measure

As you would expect, IntelliSense for DAX queries in SSMS also honors column-level security and does not disclose secured column names to unauthorized users.

Detail-rows expression references to secured objects

It is anticipated that the SELECTCOLUMNS() function will be commonly used for detail-rows expressions. Due to this, SELECTCOLUMNS() is subject to special behavior when used by DAX expressions in the model. The following detail-rows expression defined on the [Reseller Total Sales] measure does not return an error when invoked by a user without access to the [Base Rate] column. Instead it returns a table with the [Base Rate] column excluded.

--Detail rows expression for [Reseller Total Sales] measure
SELECTCOLUMNS(
    Employee,
    "Id", Employee[Employee Id],
    "Name", Employee[Full Name],
    "Base Rate", Employee[Base Rate] --Secured column
)

The following query returns the output shown below – with the [Base Rate] column excluded from the output – instead of returning an error.

EVALUATE
DETAILROWS([Reseller Total Sales])

detailrows secured output

However, derivation of a scalar value using a secured column fails on invocation of the detail-rows expression.

--Detail rows expression for [Reseller Total Sales] measure
SELECTCOLUMNS(
    Employee,
    "Id", Employee[Employee Id],
    "Name", Employee[Full Name],
    "Base Rate", Employee[Base Rate] * 1.1 --Secured column
)

Limitations of RLS and OLS combined from different roles

OLS and RLS are additive; conceptually, they grant access rather than deny it. This means that combined membership in different roles that specify RLS and OLS could inadvertently cause security leaks. Hence, combining RLS and OLS from different roles is not permitted.

RLS additive membership

Consider the following roles and row filters.

Role | Model Permission | Table | RLS Filter
RoleA | Read | Geography | Geography[Country Region Name] = "United Kingdom"
RoleB | Read | Geography | Geography[Country Region Name] = "United States"

Users who are members of both RoleA and RoleB can see data for the UK and the US.

OLS additive membership

A similar concept applies to OLS. Consider the following roles.

Role | Model Permission | Table | OLS Column Permission
RoleA | Read | Employee | [Base Rate], MetadataPermission=None
RoleB | Read | |

RoleB allows access to all tables and columns in the model. Therefore, users who are members of both RoleA and RoleB can query the [Base Rate] column.

RLS and OLS combined from different roles

Consider the following roles that combine RLS and OLS.

Role | Purpose | Model Permission | Table | Permission
RoleA | Provide access to sales in the UK by customer (not product) | Read | Geography | RLS Filter: Geography[Country Region Name] = "United Kingdom"
RoleA | | | Product | OLS Table Permission: MetadataPermission=None
RoleB | Provide access to sales in the US by product (not customer) | Read | Geography | RLS Filter: Geography[Country Region Name] = "United States"
RoleB | | | Customer | OLS Table Permission: MetadataPermission=None

The following diagram shows the intersection of the tables and rows relevant to this discussion.

rls-ols-quadrant-copy

RoleA is intended to expose data only for the top right quadrant.

RoleB is intended to expose data only for the bottom left quadrant.

Given the additive nature of OLS and RLS, Analysis Services would allow access to all four quadrants by combining these permissions for users who are members of both roles. Data would be exposed that neither role intended to expose. For this reason, queries for users who are granted RLS and OLS permissions combined from different roles fail with an error message stating that the combination of active roles results in a dynamic security configuration that is not supported.

Transaction-performance improvements

SSDT updates the workspace database during the development process. Optimized transaction management in CTP 2.0 is expected to result in a more responsive developer experience due to faster metadata updates to the workspace database.

DMV improvements

DISCOVER_CALC_DEPENDENCY is back! This Dynamic Management View (DMV) is useful for tracking and documenting dependencies between calculations and other objects in a tabular model. In previous versions, it worked for tabular models at the 1100 and 1103 compatibility levels, but it did not work for 1200 models. In CTP 2.0, it works for all tabular compatibility levels including 1200 and 1400.

The following query shows how to use the DISCOVER_CALC_DEPENDENCY DMV.

SELECT * FROM $System.DISCOVER_CALC_DEPENDENCY;

There are differences in the output for 1200 and 1400 models. The easiest way to understand them is to compare the output for models with different compatibility levels. Notable differences are listed here for reference.

  • Relationships in 1200 and higher are identified by name (normally a GUID) in the OBJECT column. Active relationships have OBJECT_TYPE of “ACTIVE_RELATIONSHIP”; inactive relationships have OBJECT_TYPE of “RELATIONSHIP”. 1103 and lower models differ because they include all relationships with OBJECT_TYPE of “RELATIONSHIP” and an additional “ACTIVE_RELATIONSHIP” row to flag each active relationship.
  • 1103 and lower models include a row with OBJECT_TYPE “HIERARCHY” for each attribute hierarchy dependency on its column. 1200 and higher do not.
  • 1200 and higher models include rows for calculated tables with OBJECT_TYPE “CALC_TABLE”. Calculated tables are not supported in 1103 or lower models.
  • 1200 and higher models currently do not include rows for measure data dependencies on tables and columns. Data dependencies between DAX measures are included.

We intend to make further improvements to DISCOVER_CALC_DEPENDENCY in forthcoming CTPs, so stay tuned.

Improved authoring experience for Detail Rows

The April 2017 release (17.0 GA) of SSDT provides an improved authoring experience with IntelliSense and syntax highlighting for detail rows expressions using the new DAX Editor for SSDT. Click on the ellipsis in the Detail Rows Expression property to activate the DAX editor.

detailrows daxeditor

Hierarchy & column reuse

Hierarchy reuse is a Power BI feature, although it is surfaced differently in Analysis Services. Power BI uses it to provide easy access to implicit date hierarchies for date fields. Introducing such features for Analysis Services furthers the strategic objective of enabling a consistent modeling experience with Power BI.

power-bi-variations

Tabular models created with CTP 2.0 can leverage hierarchy reuse to surface user hierarchies and columns – not limited to those from a date dimension table – in more helpful locations in the Power BI field list. This can provide a more guided analytics experience for business users.

For example, the Calendar hierarchy from the Date table can be surfaced as a field in Internet Sales, and the Fiscal hierarchy as a field in the Sales Quota table. This assumes that, for some business reason, sales quotas are frequently reported by fiscal date.

The current version requires that hierarchy and column reuse is set using the JSON-based metadata, Tabular Model Scripting Language (TMSL), or Tabular Object Model (TOM). The following snippet of JSON-based metadata from the Model.bim file associates the Calendar hierarchy from the Date table with the Order Date column from the Internet Sales table. As shown by the type name, the feature is also known as variations.

{
  "name": "Order Date",
  "dataType": "dateTime",
  "sourceColumn": "OrderDate",
  "variations": [
    {
      "name": "Calendar Reuse",
      "description": "Show Calendar hierarchy as field in Internet Sales",
      "relationship": "3db0e485-88a9-44d9-9a12-657c8ef0f881",
      "defaultHierarchy": {
          "table": "Date",
          "hierarchy": "Calendar"
      },
      "isDefault": true
    }
  ]
}

The current version also requires the ShowAsVariationsOnly property on the dimension table to be set to true, which hides the dimension table. We intend to remove this restriction in a forthcoming CTP.

{
  "name": "DimDate",
  "showAsVariationsOnly": true

The Order Date field in Internet Sales now defaults to the Calendar hierarchy, and allows access to the other columns and hierarchies in the Date table.

as-variations

Date relationships

Continuing the theme of bringing Power BI features to Analysis Services, CTP 2.0 allows the creation of date relationships using only the date part of a DateTime value. Power BI uses this internally for relationships to hidden date tables.

Date relationships that ignore the time component currently only work for imported models, not Direct Query.

The current version requires that date relationship behavior is set using the JSON-based metadata, Tabular Model Scripting Language (TMSL), or Tabular Object Model (TOM). The following snippet of JSON-based metadata from the Model.bim file defines a relationship from Reseller Sales to Date based only on the date part of the Order Date column. Valid values for JoinOnDateBehavior are DateAndTime and DatePartOnly.

{
  "name": "100ca454-655f-4e46-a040-cfa2ca981f88",
  "fromTable": "Reseller Sales",
  "fromColumn": "Order Date",
  "toTable": "Date",
  "toColumn": "Date",
  "joinOnDateBehavior": "datePartOnly"
}

Default installation option is tabular

Tabular mode is now the default installation option for SQL Server Analysis Services in CTP 2.0.

default-tabular-install

Note: this also applies to installations from the command line. Please see this document for further information on how to set up automated installations of Analysis Services from the command line. In CTP 2.0, if the ASSERVERMODE parameter is not provided, the installation will be in tabular mode. Previously it was multidimensional.
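
For illustration, an unattended tabular installation might look like the following sketch; the instance name and administrator account are placeholders, and you should confirm the parameters against the setup documentation for your build:

Setup.exe /QUIET /ACTION=Install /FEATURES=AS /INSTANCENAME=MSSQLSERVER /ASSERVERMODE=TABULAR /ASSYSADMINACCOUNTS="CONTOSO\ASAdmins" /IACCEPTSQLSERVERLICENSETERMS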

Extended events

Extended events were not working in CTP 1.3. They do work again in CTP 2.0 (actually since CTP 1.4).

Download now!

To get started, download SQL Server 2017 on Windows CTP 2.0 from here. Be sure to keep an eye on this blog to stay up to date on Analysis Services.

SSMS Improvements for Analysis Services in the April 2017 Release


The April 2017 Release of SSMS for Analysis Services is the first release with support for the modern Get Data experience. This release also features additional capabilities for the DAX parser, which come in handy when authoring or fine-tuning queries in the DAX Query Window. Azure Analysis Services now also supports Multi-Factor Authentication (MFA) based on Active Directory Universal Authentication. So, don’t delay and download SQL Server Management Studio 17.0 today.

JSON-based data source definition

As far as the modern Get Data experience is concerned, you can now view, edit, and script out structured data sources, as the following screenshot illustrates. This comes in handy when you want to update data source information or credentials in a deployed Tabular 1400 model on-prem or in the cloud. Azure Analysis Services will release support for 1400 models very soon. You can develop with the Integrated workspace until this is available.

Note that you can edit the connection details, credentials, options, as well as the context expression for a data source in SSMS. The values for these parameters correspond to strings in JSON format, and they depend on the data source type. For example, a SQL Server data source requires different connection details than an OData feed. The documentation covering all these options is not yet available, but you can glean the settings from a Tabular 1400 model in SSDT. Just create a data source of the desired type, set the connection details and credentials as desired, and then analyze the JSON-based data source definition in the resulting Model.bim file.
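
As a rough illustration, the scripted definition of a SQL Server structured data source might look similar to the following sketch; the server, database, and credential values are placeholders, and the exact property set may differ by data source type and build:

{
  "type": "structured",
  "name": "SQL/myserver;AdventureWorksDW",
  "connectionDetails": {
    "protocol": "tds",
    "address": {
      "server": "myserver",
      "database": "AdventureWorksDW"
    },
    "authentication": null,
    "query": null
  },
  "credential": {
    "AuthenticationKind": "ServiceAccount"
  }
}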

 DAX Query Window

The DAX parser improvements, on the other hand, show up in the DAX Query Window. Among other things, the DAX Query Window now supports parentheses matching in IntelliSense, which facilitates examining how parentheses match up with each other in an expression. For example, if you select a closing parenthesis, the DAX Query Window automatically highlights the corresponding opening parenthesis. Moreover, you can now enjoy support for DEFINE MEASURE and DEFINE VAR to define measures and named variables without distracting red squiggles. The following screenshot shows parentheses matching based on a simple DAX query with a named variable.
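
For example, a query of the following shape (a hedged sketch using measure and table names from the Adventure Works sample model) now gets full parser support in the DAX Query Window:

DEFINE
    VAR SelectedYear = 2013
    MEASURE 'Reseller Sales'[Doubled Sales] = [Reseller Total Sales] * 2
EVALUATE
    FILTER (
        SUMMARIZECOLUMNS ( 'Date'[Calendar Year], "Doubled Sales", [Doubled Sales] ),
        'Date'[Calendar Year] = SelectedYear
    )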

Active Directory Universal Authentication

Finally, when connecting to Azure Analysis Services, you can choose Active Directory Universal Authentication as the authentication method, enter your user name, and then click Connect. SSMS then displays a tenant-specific sign-in page where you provide the remaining credential information, such as a password, and complete any other MFA steps (see the following screenshot). If another user previously signed in, the sign-in dialog warns you to sign out as that user before signing in. After a successful login, SSMS caches the sign-in token in memory for future reconnects. Note that the in-memory token is not shared across processes, so a new or second instance of SSMS will not have the cached credentials.

And last but not least, SQL Server Management Studio 17.0 is now released for general availability (GA), meaning it is fully supported in production environments. This GA release is also closely aligned with SQL Server vNext CTP 2.0. This is important to note because in CTP 2.0, Analysis Services introduces some changes that break backward compatibility for Tabular 1400 models. You will get errors if you connect with SSMS 17.0 GA to an Analysis Services server running an earlier CTP. For example, you cannot access the database properties dialog for a Tabular 1400 model. To avoid such issues, make sure you update all your preview server deployments to SQL Server vNext CTP 2.0.


Join Alberto Cairo at the Microsoft Data Insights Summit to Explore the Different Dimensions of Data Visualization


We are excited to welcome data visualization expert Alberto Cairo, known for his award-winning global achievements in infographics and multimedia as a professor, author, journalist, and designer, back to the Microsoft Data Insights Summit, taking place June 12-13, 2017 in Seattle. This year Alberto will provide a special guest keynote focused on the different dimensions of data visualization and how to entice your audience to engage with your information.

Alberto Cairo

Alberto serves as the Knight Chair in Visual Journalism at the School of Communication at the University of Miami (UM) and is the author of The Functional Art: an Introduction to Information Graphics and Visualization (2012) and The Truthful Art: Data, Charts, and Maps for Communication (2016). He has more than 15 years of experience in news organizations as a journalist, designer, and team manager, and is praised for his ability to explain in clear terms how to work with data, discover the stories hidden within, and share those stories with the world. Alberto transforms elementary principles of data and scientific reasoning into tools that can be used in daily life.

At the Data Insights Summit, Alberto will enlighten attendees on how data visualization can be used to explore data, communicate results and engage people with compelling information. In addition to discussing the three dimensions of exploration, presentation and engagement, Alberto’s keynote will provide takeaways for how you can let your audience interact with data and grow their interest in your story.

If you want to learn more about the different dimensions of data visualization and how to captivate your audience when telling a data story, register to attend the Data Insights Summit. The conference will be filled with technical content, hands-on workshops, and a great lineup of speakers. Plus, attendees can meet 1:1 with the experts behind Microsoft’s data insights tools and solutions, including Microsoft Power BI, SQL Server BI, Excel, PowerApps, and Flow.

Additionally, you can get a sample of Alberto’s teachings in action with his Power BI Data Storytelling and Visualization Courses. We look forward to Alberto’s keynote and hope you’ll join us at the Data Insights Summit!

Register today!

1400 Compatibility Level in Azure Analysis Services


We are excited to announce the public preview of the 1400 compatibility level for tabular models in Azure Analysis Services! This brings a host of new connectivity and modeling features for comprehensive, enterprise-scale analytic solutions delivering actionable insights. The 1400 compatibility level will also be available in SQL Server 2017 Analysis Services, ensuring a symmetric modeling capability across on-premises and the cloud.

Here are just some highlights of the new features available to 1400 models.

  • New infrastructure for data connectivity and ingestion into tabular models with support for TOM APIs and TMSL scripting. This enables:
    • Support for additional data sources, such as Azure Blob storage. Additional data sources are planned soon.
    • Data transformation and data mashup capabilities.
  • Support for BI tools such as Microsoft Excel to drill down to detailed data from an aggregated report. For example, when end users view total sales for a region and month, they can view the associated order details.
  • Object-level security to secure table and column names in addition to the data within them.
  • Enhanced support for ragged hierarchies such as organizational charts and chart of accounts.
  • Various other improvements for performance, monitoring and consistency with the Power BI modeling experience.

Please see this post on the Azure blog for more information.

Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 1


In a comment to a recent blog article, Bill Anton raised a question about the target scenarios for the modern Get Data experience in Tabular 1400 models, especially concerning file-based data sources. So, let’s look at a concrete example from my personal to-do list: Building a Tabular model on top of industry standard synthetic data for testing purposes.

Performance testing and benchmarking is an important part of quality assurance for Analysis Services. A prerequisite is a representative workload and TPC-DS provides such a workload, widely accepted across the industry. It defines a relational database schema and includes tools to generate data files at various scale factors (see TPC-DS on the TPC.org site). It also defines SQL queries to evaluate performance in a replicable way. Of course, for Analysis Services, these SQL queries must be converted to DAX and/or MDX queries, but that’s beside the point. The point is that TPC-DS can generate a large amount of file-based source data, which I want to bring into a Tabular model to enjoy blazing fast query performance.

Of course, I am particularly interested in testing Azure Analysis Services across all tiers from the smallest to the largest offering. This requires me to bring potentially terabytes of TPC-DS source data into the cloud. A scalable way to accomplish this is to use Azure blob storage in conjunction with Azure Data Factory to orchestrate the data movement into an Azure SQL Data Warehouse that then serves as the backend for Azure Analysis Services. The following diagram highlights typical technologies that may be found in an Azure SQL Data Warehouse environment. For details about getting data into Azure SQL Data Warehouse, see “Load data into Azure SQL Data Warehouse.”

Azure SQL Data Warehouse

There are many good reasons to use Azure SQL Data Warehouse for enterprise workloads, ranging from true elasticity, high reliability, and global availability all the way to automatic threat detection. Another is to implement an infrastructure that helps focus dedicated teams on specific tasks to extract, transform, load, and maintain the data with high reliability and consistency. It is hard to overstate the importance of accurate and trustworthy data for BI solutions.

On the other hand, the TPC-DS source data requires little more than a bulk insert operation. Ensuring accuracy can be as trivial as counting the rows in the destination tables after import. In this situation, Azure SQL Data Warehouse would merely be used as a staging repository. With the Azure Blob Storage connector in Tabular 1400, it makes sense to simplify and streamline the data pipeline significantly, as the following diagram illustrates. Not only is the environment relatively easy to deploy, it also accelerates the movement of data into the Tabular model because an entire bulk insert step is eliminated.

Azure Analysis Services Model on Top of Azure Blob Storage

Having provisioned an Azure Analysis Services server and a blob storage account, the next step is to generate the source data with the desired scale factor. TPC-DS includes the source code for the data generator tool (dsdgen.exe) for this purpose. For an initial test, it might be a good idea to start small, using the command dsdgen /scale 1 to generate only about 1GB of source data. The dsdgen tool generates this data relatively quickly. It also doesn’t take too long to upload the resulting data files to Blob storage by using the AZCopy command-line utility, as the following screenshot reveals.

AZCopy command-line utility

With the data files in Blob storage, the moment has come to create a Tabular model at the 1400 compatibility level and import the data by using the Azure Blob Storage connector. Unfortunately, the Azure Blob Storage connector does not recognize the individual blobs as tables. Instead, the Import from Data Source flow produces a single table with the list of all files (see the following Query Builder screenshot).

First import of blobs

The goal is to create 25 tables, one for each file. This is perhaps more quickly accomplished programmatically by using the Tabular Object Model (TOM), as outlined in "Introducing a Modern Get Data Experience for SQL Server vNext on Windows CTP 1.1 for Analysis Services," but it's also doable in SSDT Tabular. In Query Editor, I could repeatedly open the Query menu and then click Duplicate Query to create a total of 25 queries (or right-click an existing query and then choose Duplicate from the context menu). The next step would be to rename the queries using the table names per the TPC-DS database schema, as documented in the tpcds.sql file included in the TPC-DS toolset. Essentially, the table names correspond to the data file names. Then, for each table query, I would click the Binary content link (or right-click the link and select Drill Down) to drill down and include only the content of the relevant file, as in the following screenshot.

Importing 25 tables

The Query Editor automatically discovers the delimiter in the TPC-DS source data and shows the file contents neatly organized in columns, but it cannot discover the column names because these names are not included in the data files. It would be necessary to rename the columns manually. At this point, you might be wondering if there is a way to avoid all the tedious work. Renaming hundreds of columns manually in Query Builder across 25 rather wide tables isn't exactly a pleasant undertaking.

There are several options to streamline the process of defining the tables in the model. An obvious one is to generate the tables programmatically by using TOM. Another is to modify the dsdgen source code to include the column names in the first line of each data file so that Query Builder can discover the column names automatically. In the end, I chose a slightly different approach that is taking some elements from both. I would parse the tpcds.sql file to get the tables with their columns, but instead of generating the tables programmatically, I would write the columns per table into a header file. I could then upload the header file together with the data file per table into a separate Azure Blobs container and combine them in the table query to get the column names and the content.
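
To illustrate the idea, here is one plausible shape for such a per-table query. This is a hedged sketch only: the data source reference, container name, and blob names are specific to my setup, and the navigation steps that Query Builder generates may differ slightly:

// Combine the one-line header file with the dsdgen data file and promote the headers.
let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    Container = Source{[Name = "call-center"]}[Data],
    Header = Csv.Document(Container{[Name = "call_center_header.dat"]}[Content], [Delimiter = "|"]),
    Data = Csv.Document(Container{[Name = "call_center.dat"]}[Content], [Delimiter = "|"]),
    Combined = Table.Combine({Header, Data}),
    Result = Table.PromoteHeaders(Combined)
in
    Result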

The following code snippet uses a couple of regular expressions to extract the table and column names from the tpcds.sql file. It then creates a separate subfolder for each table and saves a header file with the column names in it. It also uses the Microsoft.WindowsAzure.Storage API to create a separate container for each table in the Azure Blob storage account from which Azure Analysis Services is going to import the data. The only caveat is that underscores cannot be used in Azure Blob container names, so the code replaces the underscores in the table names with hyphens (it is not too much work to flip these characters back again later in the Tabular model). The code also generates the M queries for the tables and writes them into separate files. The queries can then be pasted into the Advanced Editor window of the Query Builder in SSDT. Note that the line Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/", is specific to my Tabular model because it refers to the data source object with the name "AzureBlobs/https://tpcdsfiles blob core windows net/". If the data source had a different name, that line would have to be updated accordingly.

Code Snippet

The data files for each table, generated by using dsdgen, can now be placed in each table’s subfolder and uploaded together with the header file to the corresponding Azure Blob container. The following batch file gets the job done. It takes the URL to the Azure Blob service endpoint as a command line parameter, loops through the subfolders and calls dsdgen to generate each table file within its subfolder, and even handles the special child tables catalog_returns, store_returns, and web_returns, which are generated together with their corresponding *_sales tables in a single dsdgen command. The batch simply moves these *_returns.dat files to their correct subfolders after dsdgen finishes.

@ECHO OFF
FOR /D %%A IN ("D:\TPC-DS\data\*") DO (
ECHO.%%~nA| FIND /I "_returns">Nul && (
ECHO.Skipping %%~nA child table
) || (
ECHO Generating data for %%~nA
dsdgen /scale 1 /dir %%~fA /table %%~nA
)
setlocal enabledelayedexpansion
ECHO.%%~nA| FIND /I "_sales">Nul && (
set "file=%%~nA.dat"
set "file=!file:_sales=_returns!"
set "srcFolder=%%~fA"
set "destFolder=!srcFolder:_sales=_returns!"
set "filePath=!srcFolder!\!file!"
move /-y !filePath! !destFolder!
)
endlocal
)
FOR /D %%A IN ("D:\TPC-DS\data\*") DO (
setlocal enabledelayedexpansion
set "table=%%~nA"
set "table=!table:_=-!"
set "target=%1/!table!"
ECHO Uploading data to !target!
"C:\Program Files (x86)\Microsoft SDKs\Azure\AzCopy\AzCopy.exe" /Source:%%~fA /Dest:!target! /DestKey:<key> /S
endlocal
)

And that's it! After uploading all header and data files into their specific Blob containers, it's straightforward to import the tables with their column names. As the following screenshot reveals, it is now possible to select all Blob containers in the Navigator window; each container corresponds to a table with the relevant files, and the queries are quickly copied and pasted into the Advanced Editor from the .m files generated with the code snippet listed earlier. No more manually creating or renaming additional queries or columns in the Query Builder dialog box!

Final blob import

So far, the resulting solution is a nice toy, only suitable for small scale factors. Serious data volumes require a slightly modified approach. The data needs to be generated in chunks distributed over many source files per table, the tables should be based on multiple partitions, and the partition queries should take advantage of named expressions. As soon as SSDT Tabular supports named expressions in a forthcoming release, a second blog post will cover how to take the current model to enterprise scale and run it on the largest tiers in Azure Analysis Services.

But there is one more aspect that must be addressed: table relationships. The TPC-DS schema defines quite a few table relationships, as you can discover by analyzing the tpcds_ri.sql file. Using an approach like the one already covered, you could parse the tpcds_ri.sql file and create the relationships programmatically in the Tabular model by using TOM. Some relationships depend on multiple columns, for which you would need to create calculated columns that concatenate the key columns into a single column, as sketched below. But before doing so, keep in mind that the purpose of table relationships is fundamentally different between a relational system and Analysis Services. In a relational system, relationships help to ensure referential integrity, among other things. In Analysis Services, relationships establish how the data should be correlated and filtered across tables. So, instead of analyzing the tpcds_ri.sql file and blindly creating a large number of relationships, it would be better to only create those relationships that are actually required for data analysis.
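
Where a required relationship does span multiple columns, the composite key could be materialized with a calculated column along these lines (a hedged sketch; the TPC-DS catalog_returns key columns cr_item_sk and cr_order_number are used for illustration):

-- Calculated column on catalog_returns concatenating the two key columns
= catalog_returns[cr_item_sk] & "|" & catalog_returns[cr_order_number]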

Also keep in mind that table relationships in Analysis Services do not enforce referential integrity. This is important if you are planning to build production solutions without a relational database or data warehouse in the backend. If accuracy and consistency of data matters—and it usually does in a BI solution—some controls must be implemented to ensure data integrity. Building Tabular models in Azure Analysis Services directly on top of Azure blobs may seem attractive from an infrastructure complexity and data processing perspective, but the advantages and limitations should be carefully evaluated.

And that’s it for this first blog post covering Tabular 1400 in Azure Analysis Services on top of Azure Blob Storage. Stay tuned for the second blog post to take the data volume to more interesting scale factors. And as always, please deploy the latest monthly release of SSDT Tabular and use it to take Tabular 1400 for a test drive. Send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. Influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers!

What’s new in SQL Server 2017 CTP 2.1 for Analysis Services


The public CTP 2.1 of SQL Server 2017 is available here! This public preview includes the following enhancements for Analysis Services tabular models.

  • Shared M expressions are shown in the SSDT Tabular Model Explorer, and can be maintained using the Query Editor.
  • Dynamic Management View (DMV) improvements.
  • Opening a file with the .MSDAX extension in SSDT enables non-model-related DAX IntelliSense.
  • Encoding hints can be set in the SSDT properties window.

Shared M expressions

Shared M expressions are shown in the Tabular Model Explorer in SSDT! By right-clicking the Expressions node, you can edit the expressions in the Query Editor. This should seem familiar to Power BI Desktop users.

query-editor
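
For instance, a shared expression can encapsulate a common transformation that several table or partition queries then reference. The following is a hedged sketch; the data source name, table, and filter are illustrative only:

// Shared expression (named FilteredCustomers here for illustration)
let
    Source = #"SQL/localhost;AdventureWorksDW",
    Customers = Source{[Schema = "dbo", Item = "DimCustomer"]}[Data],
    Filtered = Table.SelectRows(Customers, each [YearlyIncome] > 50000)
in
    Filtered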

DMV improvements

DISCOVER_CALC_DEPENDENCY

M dependencies are included in DISCOVER_CALC_DEPENDENCY for CTP 2.1. As communicated in the CTP 2.0 blog post, DISCOVER_CALC_DEPENDENCY now works with 1200 models.

The following query returns the output shown below. M expressions and structured data sources are included for 1400 models.

SELECT * FROM $System.DISCOVER_CALC_DEPENDENCY
WHERE OBJECT_TYPE = 'PARTITION' OR OBJECT_TYPE = 'M_EXPRESSION';

dmv-output

The output is a superset of the information shown by the Query Dependencies visual, which is available in SSDT from the Query Editor.

query-dependencies

This information is useful for numerous scenarios including the following.

  • Documentation of tabular models.
  • Community tools such as BISM Normalizer that perform incremental metadata deployment and merging, as well as other 3rd party tools, can use it for impact analysis.

MDSCHEMA_MEASUREGROUP_DIMENSIONS

CTP 2.1 provides a fix for MDSCHEMA_MEASUREGROUP_DIMENSIONS. This DMV is used by various client tools to show measure dimensionality. For example, the Explore feature in Excel Pivot Tables allows the user to cross-drill to dimensions related to the selected measures.

explore

Prior to CTP 2.1, some rows were missing in the output for 1200 models, which meant the Explore feature did not work correctly. This is fixed in CTP 2.1.

We intend to make further DMV improvements in forthcoming releases, so stay tuned.

DAX file editing

Opening a file with the .MSDAX extension in SSDT allows DAX editing with non-model related IntelliSense such as highlighting, statement completion and parameter info. As you can imagine, we intend to use this for interesting features to be released in the future!

dax-file-editing

Encoding hints

As documented in this blog post, encoding hints are an advanced feature introduced in CTP 1.3. They can help optimize processing (data refresh) for large in-memory tabular models. In CTP 2.1, encoding hints can be set in the SSDT Visual Studio Properties window.

encoding-hints
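
Under the covers, the hint is stored as a column property in the JSON-based metadata. A minimal sketch follows, assuming the encodingHint property name and the Value/Hash values described in the encoding hints blog post:

{
  "name": "Sales Amount",
  "dataType": "decimal",
  "sourceColumn": "SalesAmount",
  "encodingHint": "Value"
}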

Download now!

To get started, download SQL Server 2017 CTP 2.1. The May 2017 release of the Analysis Services VSIX for SSDT is available here. VSIX deployment for Visual Studio 2017 is discussed in this blog post.

Be sure to keep an eye on this blog to stay up to date on Analysis Services!

Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 2


The first part in this series covering Azure Analysis Services models on top of Azure Blob Storage discussed techniques to implement a small Tabular 1400 model based on synthetic TPC-DS source data. This second part continues the journey to take Tabular 1400 in Azure Analysis Services to larger scale factors—up to the maximum capacity that Azure Analysis Services currently provides.

Taking a Tabular 1400 model to large scale requires an efficient approach that mitigates limitations in the tools and maximizes performance in the model. For starters, it would take a long time to generate 1 TB of TPC-DS source data by using a single dsdgen instance. A much better approach is to run multiple dsdgen instances in parallel to save time and create a 1 TB set of smaller source files that are easier to handle individually. Furthermore, having generated and uploaded the source data to Azure Blob storage, it would not be advisable to create the Tabular model directly against the full set of data because SQL Server Data Tools for Analysis Services Tabular (SSDT Tabular) would attempt to download all that data to the workspace server. Even if the workspace server had the capacity, it’s an unnecessarily large data transfer. Instead, it is a common best practice to create a representative subset of the data and then build the data model against that source, and then later switch the source during production deployment. Moreover, the Tabular 1400 model must be designed with data management and performance requirements in mind. Among other things, this includes a partitioning scheme for the tables in the model. And last but not least, the source queries of the table partitions should be optimized to avoid redundancies and keep the model metadata clean and small. The May 2017 release of SSDT Tabular introduces support for named expressions in Tabular 1400 models and this article demonstrates how to use them for source query optimization.

Generating 1 TB of TPC-DS source data by using multiple dsdgen instances is easy thanks to the command line parameters PARALLEL and CHILD. The PARALLEL parameter indicates the overall number of child processes to generate the source data. The CHILD parameter defines which particular chunk of data a particular dsdgen instance generates. For example, I distributed the data generation across 10 virtual machines in Azure, with each VM running 10 child dsdgen instances. Running in parallel, these 10 instances utilized the eight available cores per VM close to 100% and finished the data generation in roughly 2 hours. The following screenshot shows the resource utilization on one of the VMs about half an hour into the processing.

VM-Resource-Utilization

Windows PowerShell and the Azure Resource Manager cmdlets make provisioning 10 Azure VMs a blast. For a sample script to create a fully configured virtual machine, see the article “Create a fully configured virtual machine with PowerShell.” I then installed AzCopy via https://aka.ms/downloadazcopypr and copied the TPC-DS tool set and an empty data folder with all the sub-containers to each VM (as discussed in Part 1). Next, I slightly modified the batch file from Part 1 to create and upload the source files to accommodate the different file name format that dsdgen uses when PARALLEL and CHILD parameters are specified. Instead of <table name>.dat, dsdgen now generates file names as <table name>_<child id>_<parallel total>.dat. An additional batch file then helped to launch the 10 data generation processes, passing in the child ID and Blob service endpoint URL as parameters. It’s a trivial batch file, as listed below. On the second VM, the loop would start at 11 and go to 20, and so forth (see the following illustration).

@echo off
for /l %%x in (1, 1, 10) do (

start createandupload.bat %%x https://<BlobStorageAccount>.blob.core.windows.net/

)

Parallel Source File Upload

Having finished the data generation and verified that all files were uploaded to Azure Blob storage successfully, I deleted all provisioned VM resources by using a single Remove-AzureRmResourceGroup command. Deleting a resource group deletes all associated Azure resources. Needless to say, the Azure Blob storage account with the generated data must not be associated with this resource group, because it must remain available for the next steps.

The next task is to create a representative sample of the TPC-DS data for modelling purposes. This can be as easy as placing the 1 GB data set generated in Part 1 in a separate Azure Blob storage account. However, dsdgen creates a different set of source files per table for the 1 GB versus the 1 TB scale factor, even if the same PARALLEL and CHILD parameters are specified. If it is important to generate the same set of source files just with less data—and in my case it is because I want to create source queries on top of a large collection of blobs representative of a 1 TB data set—a different approach is needed.

By using the 1 GB data set from Part 1 and the code snippet below, I generated a representative set of sample files identical to the ones that dsdgen generates for a 1 TB data set. The following table summarizes how the code snippet distributed the 1 GB of data across the sample files. The code snippet then used the Azure SDK to upload the files to my second Azure Blob storage account.

Table Name Row Count (1GB) File Count (1TB) Max Rows Per Sample File
call_center 6 1 6
catalog_page 11718 1 11718
catalog_returns 144067 100 1441
catalog_sales 1441548 100 14416
customer 100000 100 1000
customer_address 50000 100 500
customer_demographics 1920800 100 19208
date_dim 73049 1 73049
dbgen_version 1 1 1
household_demographics 7200 1 7200
income_band 20 1 20
inventory 11745000 100 117450
item 18000 1 18000
promotion 300 1 300
reason 35 1 35
ship_mode 20 1 20
store 12 1 12
store_returns 287514 100 2876
store_sales 2880404 100 28805
time_dim 86400 1 86400
warehouse 5 1 5
web_page 60 1 60
web_returns 71763 100 718
web_sales 719384 100 7194
web_site 30 1 30

Code Listing for Source File Generation

The code is not very efficient, but it does get the job done eventually—giving me enough time to think about the partitioning scheme for the larger tables in the model. If you study the available Performance Guide for Analysis Services Tabular, you will find that the partitioning of tables in a Tabular model does not help to improve query performance. However, starting with SQL Server 2016 Analysis Services, Tabular models can process multiple partitions in parallel, so partitioning can help to improve processing performance. Still, as the performance guide points out, excessive partitioning could result in many small column segments, which could impact query performance. It’s therefore best to be conservative. A main reason for partitioning in Tabular models is to aid in incremental data loading, which is precisely my intention.

The goal is to load as much TPC-DS data as possible into a Tabular 1400 model. The largest Azure Analysis Services server currently has 20 cores and 100 GB of RAM. Even larger servers with 200 GB or 400 GB of RAM will be available soon. So, how large of a Tabular 1400 model can such a server load? The answer depends on, among other things, the compressibility of the source data. Achievable ratios can vary widely. With a cautious assumption of 2:1 compressibility, 1 TB of source data would far exceed 100 GB of RAM. It’s going to be necessary to start with smaller subsets. And even if a larger server could fit all the source data into 400 GB of RAM, it would still be advisable to go for incremental data loading. The data set consists of more than 1,000 blob files. Pulling all these files into a Tabular model at once would likely hit throttling thresholds on the Azure Blob storage side causing substantial delays during processing.

The TPC-DS tables can be categorized as follows:

Category Tables Amount of Data
Small tables with only 1 source file per table. call_center, catalog_page, date_dim, dbgen_version, household_demographics, income_band, item, promotion, reason, ship_mode, store, time_dim, warehouse, web_page, web_site ~0.1 GB
Medium tables with 100 source files per table. customer, customer_address, customer_demographics ~5 GB
Large tables with 100 source files per table. catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, web_sales ~950 GB

The small tables with only 1 source file per table can be imported at once. These tables do not require an incremental loading strategy. Similarly, the medium tables do not add much data and can be loaded in full, but the large tables require a staged approach. So, the first processing cycle will include all files for the small and medium tables, but only the first source file for each large table. This reduces the source data volume to approximately 10 GB for the initial processing cycle. Subsequent processing cycles can then add further partitions to the large tables to import the remaining data until RAM capacity is exhausted on the Azure Analysis Services server. The following diagram illustrates this staged loading process.

Staged Loading Process

By using the 1 GB sample data set in Azure Blob storage, I can now build a Tabular 1400 model by using the May 2017 release of SSDT Tabular and implement the staged loading process by taking advantage of named expressions in the source queries. Note that previous SSDT Tabular releases are not able to deal with named expressions. The May (or a later) release is an absolute must have.

Regarding source queries, the small tables with a single source file don’t require any special attention. The source queries covered in Part 1 would suffice. So, let’s take one of the more interesting medium tables that comprises 100 source files, such as the customer table, create the source query for that table, and then see how the result could apply in a similar way to all other tables in the model.

The first step is to create a source query for the customer table by using Query Builder in SSDT Tabular. And the first task is to exclude the header file from the list of data files. It will be included later. In Navigator, select the customer table, and then in Query Builder, right-click on the Name cell in the last row (showing a value of “header_customer.dat”), expand Text Filters, and then select Does not start with. Next, in the Applied Steps list, for the resulting Filtered Rows step, click on the Settings button, and then in the Filter Rows dialog box, change the value for “does not begin with” from “header_customer.dat” to just “header” so that this filter can be applied later on in the same way to any header file in the data set. Click OK and verify that the header file has disappeared from the list of source files.
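
For reference, here is a minimal sketch of what the customer query looks like at this point (the data source name is the one from my model, which appears again later in this article; step names follow the Query Builder defaults):

let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    customer1 = Source{[Name="customer"]}[Data],
    #"Filtered Rows" = Table.SelectRows(customer1, each not Text.StartsWith([Name], "header"))
in
    #"Filtered Rows"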

The next task is to combine the remaining files for the table. In the header cell of the Content column, click on the Combine Files button, as illustrated in the following screenshot, and then in the Combine Files dialog box, click OK.

Combining Files

As you can see in the Queries pane above, this sequence of steps creates quite a few new expressions, which the customer query relies on to combine the 100 source files for the table. However, apart from the fact that the header file still needs to be added to get the column names, it is a good idea to optimize the expressions at this point to keep the model free of clutter. The sample queries are unnecessary and should be eliminated. This is especially notable when importing many tables. In the TPC-DS case with 25 tables, Query Builder would generate 25 different sets of these sample queries, which would amount to a total of 75 avoidable expressions in the model. The only named expression worth keeping is the Transform File from customer function. Again, Query Builder would generate multiple such transform functions (one for each table) where only a single such function suffices.

The first cleanup step is to eliminate the need for the sample queries by editing the customer source query. In Query Builder, in the Applied Steps list, delete all steps after the “Invoke Custom Function 1” step so that this invoke step is the last step in the query, which adds a “Transform File from customer” column to the table. Right-click this column and select Remove other Columns so that it is the only remaining column. Next, click on the Expand button in this column’s header, and then make sure you deselect the Use original column name as prefix checkbox and click OK. At this point, the customer query no longer references any sample expressions so they can be deleted in Query Builder. Also, rename the “Transform File from customer” function and just call it “Transform File” as in the screenshot below so that it can be used across multiple tables without causing confusion. I also shortened the M expression for this function as follows.

let
    Source = (File) => let
        Source = Csv.Document(File, [Delimiter="|"])
    in
        Source
in
    Source

Optimizing Source Query

Note that even the Transform File function could be eliminated by replacing its reference in the Invoke Custom Function1 step with a direct call to the Csv.Document function, as in each Csv.Document([Content], [Delimiter="|"]). But don’t eliminate the Transform File function just yet.
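
If you did go that route, the invoke step would look something like the following sketch, shown only for illustration (the name of the preceding step depends on how far the query has been edited at this point):

#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Rows", "Transform File", each Csv.Document([Content], [Delimiter="|"]))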

The next task is to extend the customer query to discover and apply the header file dynamically. This involves the following high-level steps:

Step M Expression
Get all files from the Blob container for the customer table that start with "header". #"Get Header File List" = Table.SelectRows(customer1, each Text.StartsWith([Name], "header"))
Read the content of the first file from this list by using the Transform File function. There should only be one header file. Any additional files would be ignored. #"Read First Header File" = #"Transform File"(#"Get Header File List"{0}[Content])
Transform the first row from this header file into a table. #"Headers Table" = Record.ToTable(#"Read First Header File"{0})
Clean up the headers table by removing any rows that have no values. #"Cleaned Headers Table" = Table.SelectRows(#"Headers Table", each [Value] <> null and [Value] <> "")
Modify the Expanded Transform File step and replace the long lists of static column references with the Name and Value lists from the cleaned headers table. #"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File", #"Cleaned Headers Table"[Name], #"Cleaned Headers Table"[Value])

The result is a customer table that includes all source files from the table’s Blob container with the correct header names, as in the following screenshot.

Finished Customer Table

This is great progress, but the job is not yet done. The next challenge is to convert this source query into a global function so that it can be used across all tables, not just the customer table. The existing Transform File function can serve as a template for the new function. In Query Builder, right-click on Transform File, and select Duplicate. Give the new function a meaningful name, such as ReadBlobData.

M functions follow the format = (Parameter List) => let statement. As parameters, I use DataSource and BlobContainerName, and the let statement is almost an exact copy of the query for the customer table, except that I replaced the data source and container references with the corresponding DataSource and BlobContainerName parameters. It’s relatively straightforward to copy and adjust the entire source query by using Advanced Editor, as in the screenshot below. Also, make sure to save the original source query in a separate text file because it might be needed again. The next step then is to replace the customer source query and call the ReadBlobData function instead, as follows (note that the data source name is specific to my data model):

let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    BlobData = ReadBlobData(Source, "customer")
in
    BlobData

ReadBlobData Function

The results so far suffice for the customer table, but there is one more requirement to support the staged imports for the large tables. In other words, the ReadBlobData function should not just read all the source files from a given Azure Blob container at once but in specified ranges. In Query Editor, this is easy to add to the original table query. It is not so easy to do in a complex named expression, such as the ReadBlobData function. Unfortunately, editing a complex named expression in Query Editor almost always requires jumping into the Advanced Editor. No doubt, there is room for improvements in future SSDT Tabular releases.

As a workaround, I temporarily reverted the customer query using my previously saved copy, selected the Filtered Rows step, and then on the Rows menu, selected Keep Range of Rows. After clicking Insert in the Insert Step dialog box, I specified appropriate values for the First row and Number of rows parameters and clicked OK (see the following screenshot).

Keep Range of Rows

The new Kept Range of Rows step then needed to be inserted into the ReadBlobData function in between the Filtered Rows and the Invoke Custom Function1 steps in Advanced Editor. The ReadBlobData function also required two additional parameters called FirstRow and NumberOfRows, as in #"Kept Range of Rows" = Table.Range(#"Filtered Rows", Value.Subtract(FirstRow, 1), NumberOfRows). Note that the Query Builder UI considers the value of 1 to refer to row 0, so the ReadBlobData function uses the Value.Subtract function to maintain this behavior for the FirstRow parameter. This completes the ReadBlobData function (see the following code listing). It can now be called from all source queries in the model, as summarized in the table below.

let
    Source = (DataSource, BlobContainerName, FirstRow, NumberOfRows) => let
        customer1 = DataSource{[Name=BlobContainerName]}[Data],
        #"Filtered Rows" = Table.SelectRows(customer1, each not Text.StartsWith([Name], "header")),
        #"Kept Range of Rows" = Table.Range(#"Filtered Rows", Value.Subtract(FirstRow, 1), NumberOfRows),
        #"Invoke Custom Function1" = Table.AddColumn(#"Kept Range of Rows", "Transform File", each #"Transform File"([Content])),
        #"Removed Other Columns" = Table.SelectColumns(#"Invoke Custom Function1", {"Transform File"}),
        #"Get Header File List" = Table.SelectRows(customer1, each Text.StartsWith([Name], "header")),
        #"Read First Header File" = #"Transform File"(#"Get Header File List"{0}[Content]),
        #"Headers Table" = Record.ToTable(#"Read First Header File"{0}),
        #"Cleaned Headers Table" = Table.SelectRows(#"Headers Table", each [Value] <> null and [Value] <> ""),
        #"Expanded Transform File" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File", #"Cleaned Headers Table"[Name], #"Cleaned Headers Table"[Value])
    in
        #"Expanded Transform File"
in
    Source
Category Source Query
Small tables with only 1 source file per table.
let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    BlobData = ReadBlobData(Source, "<blob container name>", 1, 1)
in
    BlobData
Medium tables with 100 source files per table.
let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    BlobData = ReadBlobData(Source, "<blob container name>", 1, 100)
in
    BlobData
Large tables with 100 source files per table.
let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    BlobData = ReadBlobData(Source, "<blob container name>", 1, 1)
in
    BlobData

It is straightforward to create the 25 TPC-DS tables in the model by using the above source query pattern. Still, there is one more issue to address: the source queries do not yet detect the data types for the table columns, which is an important prerequisite to analyzing the data. For each table, I modified the source query as follows (a sketch of the resulting query appears after these steps):

  1. In Query Builder on the Rows menu, select all columns.
  2. On the Transform menu, under Any Column, select Detect Data Type.
  3. Display Advanced Editor and double-check that the detected data type for each column in the Changed Type step is correct.
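
The resulting query is sketched below with the Changed Type step shortened to a few representative customer columns; the actual step lists every column of the table, and the detected types depend on your data:

let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    BlobData = ReadBlobData(Source, "customer", 1, 100),
    #"Changed Type" = Table.TransformColumnTypes(BlobData, {{"c_customer_sk", Int64.Type}, {"c_first_name", type text}, {"c_birth_year", Int64.Type}})
in
    #"Changed Type"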

As a side note, instead of editing the source query of an existing table, it is currently better to delete the table and recreate it from scratch. There are still some work items left to finish before the table editing in SSDT Tabular can work reliably.

And that’s it as far as creating the tables for my TPC-DS Tabular 1400 model is concerned. The initial data load into the workspace model finishes quickly because I’m working against the 1 GB sample data set. The row counts in the following screenshot confirm that only a small subset of the data is imported.

Importing Tables

The Tabular model is now ready for deployment to an Azure Analysis Services server. Apart from updating the deployment settings in the Tabular project properties to point SSDT Tabular to the desired target server, this would require changing the data source definition to import the data from the actual 1 TB data set. In Tabular Model Explorer, this can be accomplished by right-clicking on the existing data source object, and then choosing Change Source, which displays the Azure Blob Storage dialog box to update the Account name or URL, as in the following screenshot.

Changing the Data Source

Finally, the model can be deployed by switching to Solution Explorer, right-clicking on the project node, and selecting Deploy. If necessary, SSDT will prompt for the access key before deploying the model. Processing only takes minutes because the source queries only import a few gigabytes at this time (see the screenshots below for processor activity and memory consumption on the Azure Analysis Services server during and after processing). Having finished the deployment and initial processing, it is a good idea to change the data source definition again to revert to the 1 GB sample data set. This helps to avoid accidentally downloading large amounts of data to the SSDT workstation.

And that’s it for this round of working with a Tabular 1400 model in Azure Analysis Services on top of Azure Blob storage. The data model is now deployed. The next part is to add partitions to load more data. For this part, I am going to switch from SSDT Tabular to SQL Server Management Studio (SSMS), Tabular Model Scripting Language (TMSL), and Tabular Object Model (TOM) in Azure Functions. One of the main reasons is that SSDT Tabular does not facilitate incremental modifications to a model. It prefers to deploy models in an all or nothing method, which is not suitable for the data load strategy discussed in the current article. SSMS, TMSL, and TOM provide finer control. So, stay tuned for part 3 in this series to put Azure Analysis Services under some serious data pressure.

And as always, please take Tabular 1400 for a test drive. Send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. Influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers!

 

Update to Auto-Partitioning Code Sample & Whitepaper


A new version of the whitepaper and code sample of automated partition management (see here for more info) is released! Enhancements include the following.

  • Support for 1400 models with M partitions and M expressions. Query partitions (with queries of the data source dialect) are still supported.
  • Auto retry n times on error for near-real time scenarios and environments with network reliability issues.
  • Integrated authentication for Azure AS with synchronization between on-prem Windows AD and Azure AD.
  • Auto max date automatically set to current date option for easier maintenance of configuration data.
  • All of the above is configurable in the configuration and logging database, which now uses a SQL Server Database project for easier deployment and schema comparisons of new versions.
  • Classification of log messages as Error or Informational.

Building an Azure Analysis Services Model on Top of Azure Blob Storage—Part 3


Part 2 finished with the deployment of a scalable Tabular 1400 model in Azure Analysis Services on top of Azure Blob storage. Part 3 continues the story by attempting to load up to 1 TB of TPC-DS source data into the model–hosted on the largest server that Azure Analysis Services offered at the time of this writing (an S9 with a capacity of 640 Query Processing Units and 400 GB of Cache). Can Azure Analysis Services import 1 TB of source data? How long would processing take and could it be accelerated? Let’s find out!

For this part of the story, I work primarily in SQL Server Management Studio (SSMS). SSMS can connect directly to an Azure Analysis Services server by using the fully qualified server name. For this purpose, it supports Active Directory Universal Authentication as well as Active Directory Password Authentication. SSMS enables you to perform administrative actions against a deployed model, such as incremental data loads and other direct operations, through the user interface as well as through scripts. The main tasks include editing data sources, adding partitions, and processing tables. Recent releases of SSMS also include a DAX Query Editor, as introduced in “SSMS Improvements for Analysis Services in the April 2017 Release,” which is especially convenient if you want to double-check row counts after processing or run other queries. For example, the following screenshot shows a straightforward DAX query to count the rows for each TPC-DS table after a full import.

Counting Table Rows

But before SSMS can show any impressive row counts, it is necessary to get the TPC-DS data into the Tabular model. Initially, I had planned to do this in increments, but I was anxious to see if the full TPC-DS data set could be processed at all, so I decided to go all in at once with an attempt to import the full 1 TB of source data. This required modifying the existing partitions of the large tables in the deployed model (catalog_returns, catalog_sales, inventory, store_returns, store_sales, web_returns, and web_sales) to pull in all 100 data files per table. Accordingly, the ReadBlobData line in the source queries had to be changed from ReadBlobData(Source, “<blob container>”, 1, 1) to ReadBlobData(Source, “<blob container>”, 1, 100). By right-clicking on each large table, selecting Partitions, and then clicking on the Edit button in the Partitions dialog box, this task was quickly accomplished. Next, I ran a Tabular Model Scripting Language (TMSL) script to process these seven tables in full, as the following screenshot illustrates.

Processing All Data At Once

Processing took roughly 21 hours to complete (see the script execution time in the lower right corner of the SSMS query window above). Certainly, not an impressive processing performance, but it was exciting to see that an S9 Azure Analysis Services server could take a 1 TB TPC-DS data set. The server overallocated about 25 GB of memory (a total of 425 GB), but processing succeeded. After a manual server restart in the Azure Portal to free up any unused memory, the server reallocated approximately 390 GB to load the model. The following graph shows the memory allocation on the server prior and after the restart.

Memory After Restart

Note that memory allocation is not necessarily equivalent to model size. Especially, the Intel Threading Building Blocks (Intel TBB) allocator might proactively allocate more memory than is strictly needed. The Intel TBB allocator is enabled by default on large Azure Analysis Services servers for best performance and scalability.

Perhaps a more detailed view of the memory consumption is available through the DISCOVER_OBJECT_MEMORY_USAGE schema rowset. Again, the numbers are not always exact, but they do provide a sufficient estimate. Kasper de Jonge published a useful workbook called BISMServerMemoryReport.xlsx that relies on the DISCOVER_OBJECT_MEMORY_USAGE rowset to analyze the memory consumption on an Analysis Services server at any desired level of detail. And thanks to the full compatibility and rather seamless exchangeability of Azure Analysis Services with SQL Server Analysis Services, it is straightforward to use Kasper’s workbook to analyze the size of the TPC-DS tables and their columns on an Azure Analysis Services server, as in the screenshot below.

BISMServerMemoryReport

So, 1 TB of TPC-DS source data fit into a 350 GB Tabular model. This is not a sensational compression ratio, but the TPC-DS tables are rather wide and not optimized for column-based compression. Still, smaller models are easier to handle, so I looked for low-hanging fruit to reduce the model size and optimize the data import.

In terms of model size, the first and foremost optimization step is to eliminate unnecessary table columns from the model. As far as TPC-DS is concerned, unnecessary columns are those columns that are not referenced in any of the TPC-DS benchmark queries. Why import columns that aren’t participating in any queries? A quick analysis of the benchmark queries revealed that there are quite a few unused columns in the large TPC-DS tables. Furthermore, the BISMServerMemoryReport.xlsx workbook showed that these unused columns consume about 60 GB in the model (see the following spreadsheet). Eliminating these columns would yield nice savings in terms of model size and therefore memory capacity.

Unecessary TPC-DS Columns

To remove these unnecessary columns, I switched back to SSDT Tabular, deleted the columns one at a time by using Tabular Model Explorer (TME), and then redeployed the model with Processing Options set to Full so that SSDT would fully reprocess the model after deployment. Following the deployment, I continued in SSMS as before to update the source queries of the large tables so that the ReadBlobData function would again include all data files for the large tables, and then ran my TMSL processing script one more time.

As anticipated, the resulting model was about 60 GB smaller than before and the server would allocate about 75 GB less memory, as shown below. Note, however, that the processing time did not decrease because the data transfer still included the full 1 TB of source data. This is because the data files first need to be transferred before file parsing can be performed locally on the Azure Analysis Services server. It is only that fewer parsed columns are mapped to table columns, resulting in a smaller model size. If Azure Blob Storage could filter out the unused columns right away, as more sophisticated data sources could, such as Azure SQL Data Warehouse, then the transfer of about 150 GB of raw data could have been avoided and processing time would have been improved as well. But this was not an option.

Processing Smaller Model

Given that the files needed to be read from Azure Blob Storage as before, it was not necessary to edit the source queries or modify the ReadBlobData function. As the following diagram illustrates based on the store_sales table, the ReadBlobData function still reads the contents of all the source files and continues to offer the full set of parsed columns to Azure Analysis Services for import. It’s just that Azure Analysis Services ignores the ss_net_paid_inc_tax column because it was deleted from the store_sales table in the model.

Column Parsing in Azure AS

If Azure Blob Storage does not offer an option to reduce the data volume at the source, then perhaps processing time can be improved by pulling more data into Azure Analysis Services in parallel. For the initial data import, I modified the existing partition on each of the seven large tables to import all 100 files per table. So, Azure Analysis Services processed seven partitions in parallel, which took more than 21 hours to complete. The next test would be to use two partitions per table, each importing 50 files.

In SSMS, connected to the Azure Analysis Services server, I performed the following steps (see also the screenshots below):

  1. Run Process Clear on all seven large tables to purge the existing data.
  2. Edit the existing partition of each table and change the ReadBlobData(Source, "<container name>", 1, 100) line to ReadBlobData(Source, "<container name>", 1, 50).
  3. Create a copy of the partition to add a second partition to each table and change the ReadBlobData(Source, "<container name>", 1, 50) line to ReadBlobData(Source, "<container name>", 51, 50). A sketch of the resulting partition source queries follows these steps.
  4. Run Process Full on the large tables to import the data again.
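
For clarity, here is a sketch of the two partition source queries after these edits, using store_sales as an example and assuming the container is named after the table, as in the earlier parts of this series:

// Partition 1: source files 1 through 50
let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    BlobData = ReadBlobData(Source, "store_sales", 1, 50)
in
    BlobData

// Partition 2: source files 51 through 100
let
    Source = #"AzureBlobs/https://tpcdsfiles blob core windows net/",
    BlobData = ReadBlobData(Source, "store_sales", 51, 50)
in
    BlobData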

As you can see in the screenshot below, using two partitions per table helped to reduce the processing time by roughly 7 hours.

14 Partitions Processing

If 2 partitions per table have such a positive effect, then 4 partitions might yield perhaps even more gains. There is, however, one more detail to consider: the maxConnections parameter on the data source. By default, maxConnections is not specified explicitly. The default value is 10. So, Analysis Services establishes a maximum of 10 concurrent connections to the data source by default. Yet, with 7 large tables in the model and 4 partitions each, Analysis Services would need to process 28 partitions in parallel. Hence, it is necessary to adjust the maxConnections setting, as in the screenshot below. Note that the user interface currently does not expose the maxConnections parameter for modern data sources. In the current tools, this parameter must be specified through TMSL or programmatically by using the Tabular Object Model. Note also that maxConnections should not exceed the number of processor cores on the server. With 28 partitions and maxConnections set to 28, the S9 Azure Analysis Services server was able to finish processing in 11 hours and 36 minutes.

28 Partitions 28 maxConnections

Subsequent experiments with higher partition counts (up to 100 partitions per table — one source file per partition) and additional storage accounts (up to seven — one for each table) did not produce any further noteworthy gains. As mentioned earlier, processing time could be reduced by switching to a more sophisticated data source, such as Azure SQL Data Warehouse, and then excluding the unnecessary columns at the source. A corresponding test showed that it took no more than an amazing 2 hours and 30 minutes to load the entire data set into an Azure SQL Data Warehouse by using PolyBase, following the steps outlined in the tutorial “Load data with PolyBase in SQL Data Warehouse,” and then processing times in Azure Analysis Services could be reduced to around 9 hours. But for the mere joy of processing 1 TB of raw blob data in Azure Analysis Services, 11 hours and 36 minutes was reasonably sufficient.

And that’s it for this rather detailed journey about deploying a Tabular 1400 model in Azure Analysis Services on top of a 1 TB TPC-DS data set in Azure Blob Storage. Thanks to the modern Get Data experience, you can build a flexible data import pipeline directly in the model and process even very large data sets within a reasonable timespan. And as always, please deploy the latest monthly release of SSDT Tabular and SSMS and use these tools to take Tabular 1400 in Azure Analysis Services for a test drive. Send us your feedback and suggestions by using ProBIToolsFeedback or SSASPrev at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums. Influence the evolution of the Analysis Services connectivity stack to the benefit of all our customers!


What’s new in SQL Server 2017 RC1 for Analysis Services


The RC1 public preview of SQL Server 2017 is available here! It includes Dynamic Management View improvements for tabular models with compatibility level 1200 and 1400.

DMVs are useful in numerous scenarios including the following.

  • Exposing information about server operations and health.
  • Documentation of tabular models.
  • Numerous client tools use DMVs for a variety of reasons. For example, BISM Normalizer uses them to perform impact analysis for incremental metadata deployment and merging.

RC1 rounds off the DMV improvements introduced in CTP 2.0 and CTP 2.1.

DISCOVER_CALC_DEPENDENCY

DISCOVER_CALC_DEPENDENCY now works with 1200 and 1400 models. 1400 models show dependencies between M partitions, M expressions and structured data sources.

Further enhancements in RC1 include the following for 1200 (where applicable) and 1400 models.

  • Named dependencies result from DAX or M expressions that explicitly reference other objects. RC1 introduces named dependencies for DAX in addition to DAX data dependencies. Previous versions of this DMV returned only data dependencies. In many cases a dependency is both named and data. RC1 returns the superset.
  • In addition to dependencies between M partitions, M expressions and structured data sources, dependencies between provider data sources and non-M partitions (these are the traditional partition and data source types for tabular models) are returned in RC1.
  • The following new schema restrictions have been introduced to allow focused querying of the DMV. The table below shows the intersection of the schema restrictions with the type of objects covered.
    • KIND with values of ‘DATA_DEPENDENCY’ or ‘NAMED_DEPENDENCY’.
    • OBJECT_CATEGORY with values of ‘DATA_ACCESS’ or ‘ANALYSIS’.
Object Type KIND OBJECT_CATEGORY
Mashup NAMED_DEPENDENCY DATA_ACCESS
Provider data source & non-M partitions DATA_DEPENDENCY DATA_ACCESS
DAX named dependencies NAMED_DEPENDENCY ANALYSIS
Other data dependencies DATA_DEPENDENCY ANALYSIS
  • Mashup dependencies are dependencies between M partitions, M expressions and structured data sources. They are named, M-expression based, and only apply to 1400 models.
  • Provider data source & non-M partitions are dependencies between traditional partitions and provider data sources. They are based on properties in tabular metadata rather than expression based, so are not considered “named”. They are available for 1200 and 1400 models.
  • DAX named dependencies are explicit named references in DAX expressions. They are available for 1200 and 1400 models.
  • Other data dependencies are data dependencies for DAX expressions and other types of data dependencies such as hierarchies and relationships. To avoid potential performance issues, data dependencies from DAX measures are only returned when using a QUERY schema restriction. They are available for 1100, 1103, 1200 and 1400 models.

1100 and 1103 models only return other data dependencies, and they ignore the new schema restrictions.

DAX data dependencies

DAX data dependencies and DAX named dependencies are not necessarily the same thing. For example, a calculated table called ShipDate with a DAX formula of “=DimDate” clearly has a named dependency (and data dependency) on the DimDate table. It also has data dependencies on the columns within DimDate, but these are not considered named dependencies.

Example: [KIND]=’NAMED_DEPENDENCY’

The following query returns the output shown below. All DAX and M expression named references in the model are included. These can originate from calculated tables/columns, measures, M partitions, row-level security filters, detail rows expressions, etc.

SELECT * FROM SYSTEMRESTRICTSCHEMA
    ($SYSTEM.DISCOVER_CALC_DEPENDENCY, [KIND] = 'NAMED_DEPENDENCY')

Named dependency

Example: [KIND]=’DATA_DEPENDENCY’

The following query returns the output shown below. Some data dependencies happen to also be named dependencies, in which case they are returned by this query and the one above with a NAMED_DEPENDENCY schema restriction.

SELECT * FROM SYSTEMRESTRICTSCHEMA
    ($SYSTEM.DISCOVER_CALC_DEPENDENCY, [KIND] = 'DATA_DEPENDENCY')

Data dependency

Example: [OBJECT_CATEGORY]=’DATA_ACCESS’

The following query returns the output shown below. Partitions, M expressions and data source dependencies are included.

SELECT * FROM SYSTEMRESTRICTSCHEMA
    ($SYSTEM.DISCOVER_CALC_DEPENDENCY, [OBJECT_CATEGORY] = 'DATA_ACCESS')

Data access

Example: [OBJECT_CATEGORY]=’ANALYSIS’

The following query returns the output shown below. The results of this query are mutually exclusive with the results above with a DATA_ACCESS schema restriction.

SELECT * FROM SYSTEMRESTRICTSCHEMA
    ($SYSTEM.DISCOVER_CALC_DEPENDENCY, [OBJECT_CATEGORY] = 'ANALYSIS')

Analysis

MDSCHEMA_MEASUREGROUP_DIMENSIONS

RC1 provides improvements for this DMV, which is used by various client tools to show measure dimensionality. For example, the Explore feature in Excel Pivot Tables allows the user to cross-drill to dimensions related to the selected measures.

RC1 corrects the cardinality columns, which were previously showing incorrect values.

SELECT * FROM $System.MDSCHEMA_MEASUREGROUP_DIMENSIONS;

MEASUREGROUP_DIMENSIONS

Download now!

To get started, download SQL Server 2017 RC1. The latest release of the Analysis Services VSIX for SSDT is available here. VSIX deployment for Visual Studio 2017 is discussed in this blog post.

Be sure to keep an eye on this blog to stay up to date on Analysis Services!

Model Comparison and Merging for Analysis Services


Relational-database schema comparison and merging is a well-established market. Leading products include SSDT Schema Compare and Redgate SQL Compare, which is partially integrated into Visual Studio. These tools are used by organizations seeking to adopt a DevOps culture to automate build-and-deployment processes and increase the reliability and repeatability of mission critical systems.

Comparison and merging of BI models also introduces opportunities to bridge the gap between self-service and IT-owned “corporate BI”. This helps organizations seeking to adopt a “bi-modal BI” strategy to mitigate the risk of competing IT-owned and business-owned models offering redundant solutions with conflicting definitions.

Such functionality is available for Analysis Services tabular models. Please see the Model Comparison and Merging for Analysis Services whitepaper for detailed usage scenarios, instructions and workflows.

This is made possible with BISM Normalizer, which we are pleased to announce now resides on the Analysis Services Git repo. BISM Normalizer is a popular open-source tool that works with Azure Analysis Services and SQL Server Analysis Services. All tabular model objects and compatibility levels, including the new 1400 compatibility level, are supported. As a Visual Studio extension, it is tightly integrated with source control systems, build and deployment processes, and model management workflows.

Schema-Diff

Thanks to Javier Guillen (Blue Granite), Chris Webb (Crossjoin Consulting), Marco Russo (SQLBI), Chris Woolderink (Tabular) and Bill Anton (Opifex Solutions) for their contributions to the whitepaper.

Online Analysis Services Course: Developing a Multidimensional Model


Check out the excellent new online course by Peter Myers and Chris Randall for Microsoft Learning Experiences (LeX). Learn how to develop multidimensional data models with SQL Server 2016 Analysis Services. The complete course is available on edX at no cost to audit, or you can highlight your new knowledge and skills with a Verified Certificate for a small charge. Enrollment is available at edX.

Deploying Analysis Services and Reporting Services Project Types in Visual Studio 2017


(Co-authored by Mike Mallit)

SQL Server Data Tools (SSDT) adds four different project types to Visual Studio 2017 to create SQL Server Database, Analysis Services, Reporting Services, and Integration Services solutions. The Database Project type is directly included with Visual Studio. The Analysis Services and Reporting Services project types are available as separate Visual Studio Extension (VSIX) packages. The Integration Services project type, on the other hand, is only available through the full SSDT installer due to dependencies on COM components, VSTA, and the SSIS runtime, which cannot be packaged into a VSIX file. The full SSDT for Visual Studio 2017 installer is available as a first preview at https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt.

This blog article covers the VSIX packages for the Analysis Services and Reporting Services project types, specifically the deployment and update of these extension packages as well as troubleshooting best practices.

Deploying Analysis Services and Reporting Services

In Visual Studio 2017, the Analysis Services and Reporting Services project types are always deployed through the VSIX packages, even if you deploy these project types by using the full SSDT installer. The SSDT installer simply downloads the VSIX packages, which ensures that you are deploying the latest released versions. But you can also deploy the VSIX packages individually; you can find them in the Visual Studio Marketplace.

The SSDT Installer is the right choice if you don’t want to add the Analysis Services and Reporting Services project types to an existing instance of Visual Studio 2017 on your workstation. The SSDT Installer installs a separate instance of SSDT for Visual Studio 2017 to host the Analysis Services and Reporting Services project types.

On the other hand, if you want to deploy the VSIX packages in an existing Visual Studio instance, it is perhaps easiest to display the Extensions and Updates dialog box in Visual Studio by clicking on Extensions and Updates on the Tools menu, then expanding Online in the left pane and selecting the Visual Studio Marketplace node. Then search for “Analysis Services” or “Reporting Services” and then click the Download button next to the desired project type, as the following screenshot illustrates. After downloading the desired VSIX package, Visual Studio schedules the installation to begin when all Visual Studio instances are closed.

The actual VSIX installation is very straightforward. The only input requirement is to accept the licensing terms by clicking on the Modify button in the VSIX Installer dialog box.

Of course, you can also use the full SSDT Installer to add the Analysis Services and Reporting Services project types to an existing Visual Studio 2017 instance. These project types support all available editions of Visual Studio 2017. The SSDT installer requires Visual Studio 2017 version 15.3 or later. Earlier versions are not supported, so make sure you apply the latest updates to your Visual Studio 2017 instances.

Updating Analysis Services and Reporting Services

One of the key advantages of VSIX packages is that Visual Studio automatically informs you when updates are available. So, it’s less burdensome to stay on the latest updates. This is especially important considering that updates for the Analysis Services and Reporting Services project types are released monthly. Whether you chose the SSDT Installer or the VSIX deployment method, you get the same update notifications because both methods deploy the same VSIX packages.

You can also check for updates at any time by using the Extensions and Updates dialog box in Visual Studio. In the left pane, expand Updates, and then select Visual Studio Marketplace to list any available updates that have not been deployed yet.

Troubleshooting VSIX Deployments

Although VSIX deployments are very straightforward, there are situations that may require troubleshooting, such as when a deployment completes unsuccessfully or when a project type fails to load. When troubleshooting the deployment of the Analysis Services and Reporting Services project types, keep the following software dependencies in mind:

  • The Analysis Services and Reporting Services project types require Visual Studio 2017 version 15.3 or later. Among other things, this is because of Microsoft OLE DB Provider for Analysis Services (MSOLAP). To load MSOLAP, the project types require support for Registration-Free COM, which necessitates Visual Studio 2017 version 15.3 at a minimum.
  • The VSIX packages for Analysis Services and Reporting Services depend on a shared VSIX, called Microsoft.DataTools.Shared.vsix, which is a hidden package that doesn’t get installed separately. It is installed when you select the Microsoft.DataTools.AnalysisServices.vsix or the Microsoft.DataTools.ReportingServices.vsix. Most importantly the shared VSIX contains data providers for Analysis Services (MSOLAP, ADOMD.NET, and AMO), which both project types rely on.

If you are encountering deployment or update issues, use the following procedure to try to resolve the issue:

  1. Check whether any previous versions of the VSIX packages are installed. If no previous versions exist, skip steps 2 and 3. If previous versions are present, continue with step 2.
  2. Uninstall any previous instances of the project types and verify that the shared VSIX is also uninstalled:
    1. Start the Visual Studio Installer application and click Modify on the Visual Studio instance you are using.
    2. Click on Individual Components at the top, and then scroll to the bottom of the list of installed components.
    3. Under Uncategorized, clear the checkboxes for Microsoft Analysis Services Projects, Microsoft Reporting Services Projects, and Microsoft BI Shared Components for Visual Studio. Make sure you remove all three VSIX packages, and then click the Modify button.

Note: Occasionally, an orphaned Microsoft BI Shared Components for Visual Studio package causes deployment issues. If an entry exists, uninstall it.

  3. Check the following folder paths to make sure these folders do not exist. If any of them exist, delete them.
    1. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\CommonExtensions\Microsoft\BIShared
    2. C:\Program Files (x86)\Microsoft Visual Studio\SSDT\Enterprise\Common7\IDE\CommonExtensions\Microsoft\SSAS
    3. C:\Program Files (x86)\Microsoft Visual Studio\SSDT\Enterprise\Common7\IDE\CommonExtensions\Microsoft\SSRS
    4. C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\PublicAssemblies\Microsoft BI
  4. Install the VSIX packages for the Analysis Services and Reporting Services project types from Visual Studio Marketplace and verify that the issue was resolved. If you are still not successful, continue with step 5.
  5. Repair the Visual Studio instance to fix any shell-related issues that may prevent the deployment or update of the VSIX packages as follows:
    1. Start the Visual Studio Installer application, click More Options on the instance, and then choose Repair.
    2. Alternatively, use the command line with Visual Studio closed. Run the following command: "%programfiles(x86)%\Microsoft Visual Studio\Installer\resources\app\layout\InstallCleanup.exe" -full

Note: InstallCleanup.exe is a utility to delete cache and instance data for Visual Studio 2017. It works across instances and deletes existing corrupt, partial and full installations. For more information, see Troubleshooting Visual Studio 2017 installation and upgrade failures.

  6. Repeat the entire procedure to uninstall packages, delete folder paths, and then re-install the project types.

In short, version mismatches between the shared VSIX and the project type VSIX packages can cause deployment or update issues as well as a damaged Visual Studio instance. Uninstalling the VSIX packages and deleting any extension folders that may have been left behind takes care of the former and repairing or cleaning the Visual Studio instance takes care of the latter root cause.

Troubleshooting SSDT Installer Issues

Another known cause of issues relates to the presence of older SSAS and SSRS VSIX packages when installing the preview release of the SSDT Installer. The newer Microsoft BI Shared Components for Visual Studio VSIX package included in the SSDT Installer is incompatible with the older SSAS and SSRS VSIX packages, so you must uninstall the existing SSAS and SSRS VSIX packages prior to running the SSDT Installer. Once the SSAS and SSRS VSIX packages version 17.3 are released to the Visual Studio Marketplace, upgrading the packages prior to running the SSDT Installer will also avoid the version mismatch issues.

 

And that’s it for a quick overview of the VSIX deployment, update, and troubleshooting for the Analysis Services and Reporting Services project types in Visual Studio 2017. And as always, please send us your feedback and suggestions by using ProBIToolsFeedback at Microsoft.com. Or use any other available communication channels such as UserVoice or MSDN forums.

Supporting Advanced Data Access Scenarios in Tabular 1400 Models


In the context of this blog article, advanced data access scenarios refers to situations where data access functions inside M expressions cannot easily be represented through data source definitions in Tabular metadata. These scenarios are predominantly encountered when copying and pasting complex M expressions from an Excel workbook or a Power BI Desktop file into a Tabular 1400 model, or when importing Power BI Desktop files into Azure Analysis Services. Often, these complex M expressions require manual tweaking for processing of the Tabular 1400 model to succeed.

Background information

Before jumping into these advanced data access scenarios, let’s review two basic concepts that differ between M expressions in Power BI Desktop files and in Tabular 1400 models: Data Source Representations (DSRs) and Credentials/Privacy Settings. Understanding the differences is a prerequisite to mastering the advanced data access scenarios covered later in this article.

In Power BI Desktop, you can write a very simplistic M expression like the following and import the resulting table of database objects into your data model:

let
    Source = Sql.Database("server01", "AdventureWorksDW")
in
    Source

Yet, in Tabular 1400, you first need to define a data source in the model, perhaps called “SQL/server01;AdventureWorksDW”, and then write the M expression like this:

let
    Source = #"SQL/server01;AdventureWorksDW"
in
    Source

The result is the same. So why did Tabular 1400 models introduce a different way of providing the data source information? One of the main reasons is explicit data source definitions facilitate deployments. Instead of editing 50 table expressions in a model as part of a production deployment, you can simply edit a single data source definition and all 50 table expressions access the correct production data source. Another key requirement is programmability. While SSDT Tabular is certainly a main tool for creating Tabular 1400 models, it’s by no means the only one. In fact, many Tabular models are created and deployed in an automated way through scripts or ETL packages. The ability to define data sources in a model programmatically is very important.

Going one level deeper, check out the definition of the Sql.Database function in the Power Query (M) Formula Reference. You can see it is defined as in the Data Access Function column of the following table. Next, look at a structured data source definition for SQL Server in a Model.bim file of a Tabular 1400 model. The data source definition follows the format shown in the Data Source Definition column of the below table. A side-by-side comparison shows you that all parameters of the Sql.Database function are also available in the data source definition. That’s not surprising. However, the Mashup engine only works with M expressions. It doesn’t know anything about Tabular 1400 metadata. During modeling and processing, the tools and the AS engine must translate the data access functions and their data source definitions back and forth with the help of the Mashup engine.

Data Access Function Data Source Definition
Sql.Database(
    server as text,
    database as text,
    optional options as nullable record) as table
{
    "type": "structured",
    "name": "name as text",
    "connectionDetails": {
        "protocol": "tds",
        "address": {
            "server": "server as text",
            "database": "database as text"
        },
        "authentication": null,
        "query": "optional query as text"
    },
    "options": { optional options as nullable record },
    "credential": { … }
}

 

As a side note, if your M expressions in Power BI Desktop use data access functions that do not yet have a data source representation in Tabular 1400, then you cannot use these M expressions in your Tabular 1400 model. We first must enable/extend the corresponding connector. Our goal is to add a data source representation to every connector available in Power BI Desktop and then enable these connectors for Tabular 1400.

So far, things are relatively straightforward. But, it gets tricky when credential objects come into the picture. In the M expressions above, where are the credentials to access the data source? Power BI Desktop stores the credentials in a user-specific location on the local computer. This mechanism doesn’t work for Tabular 1400 because these models are hosted on an Analysis Services server. The credentials are stored in the credential property of the structured data source in the model metadata (see the data source definition in the table above).

Given that Power BI Desktop stores the credentials outside of the .pbix file, how does the Mashup engine know which credentials to use for which data source? The crucial detail is the Mashup engine uses the data access parameters to establish the association between the data access functions and their credentials. In other words, if your Power BI Desktop file includes multiple Sql.Database(“server01”, “AdventureWorksDW”) statements, because they all use the same address parameters, there is only one credential for all of them. You can notice this in Power BI Desktop as you are prompted only once for credentials to access a given data source, no matter how often you use the corresponding data access function. By default, Power BI Desktop associates a credential with the highest level of the address parameters, such as the SQL Server server name. But you can select lower levels if the data source supports it, such as server and database name, as shown in the following screenshot.

The tricky part is the same association rules apply to Tabular 1400. Theoretically, each data source definition in a Tabular 1400 model has its own credential object. This is certainly the case at the Tabular metadata layer. But as mentioned earlier, the Mashup engine does not deal with Tabular metadata. The Mashup engine expects M expressions with resolved data access functions. The AS engine must translate the structured data source definitions accordingly. If two data source definitions had the same address parameters, it would no longer be possible to identify their corresponding credential objects uniquely at the M layer. To avoid credential ambiguity, there can be only one data source definition with a given set of address parameters in a Tabular 1400 model. Plans exist to eliminate this restriction in a future compatibility level, but in Tabular 1400 this limitation is important to keep in mind when dealing with advanced data access scenarios.

Now, let’s jump into these advanced data access scenarios with increasing levels of complexity.

Accessing the same data source with different options

As a warmup, look at the following M expression. It's not very useful, but it illustrates the point. The expression includes two Sql.Database calls with the same address parameters but different options. As explained above, you cannot create two separate data source objects for these two function calls in Tabular 1400. And if you can only create one, you only get one options record (see the data source definition above). So what are you going to do with the second command timeout? Can you perhaps use the same command timeout for both calls and consolidate them into a single data source definition? And what if you can't?

let
    Source1 = Sql.Database("server01", "AdventureWorksDW", [CommandTimeout=#duration(0, 0, 5, 0)]),
    Source2 = Sql.Database("server01", "AdventureWorksDW", [CommandTimeout=#duration(0, 0, 10, 0)]),
    Combined = Table.Combine({Source1, Source2})
in
    Combined

The solution is not intuitive, perhaps even misleading. It is fragile, it breaks dependency analysis in the tools, and it breaks the programmability contract because you can no longer simply update the data source definition and expect your M expressions to pick up the changes. It clearly needs a better implementation in a future compatibility level. But for now, you can simply define a data source with the address parameters shown above and then use the M expression unmodified in Tabular 1400. The credential object is magically applied to all data access functions that use the same address parameters across all M expressions in the model. That's how the Mashup engine works, as the following screenshot illustrates.

Note: This workaround of using a data source merely as a holder of a credential object is not recommended and should be considered a last-resort option. If possible, create a common set of options for all data access calls to the same source and use a single data source object to replace the data access functions. Another option might be to register the same server with multiple names in DNS and then use a different server name in each data source definition.
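
For instance, if you create a single structured data source that carries a common set of options (such as the longer of the two command timeouts) in its options record, the warmup expression above reduces to a minimal sketch like the following. The data source name "SQL/server01;AdventureWorksDW" is assumed here, following the naming convention used later in this article.

let
    // The single structured data source carries the shared options, so the
    // M expression only needs to reference it by name.
    Source = #"SQL/server01;AdventureWorksDW",
    Combined = Table.Combine({Source, Source})
in
    Combined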

Accessing the same data source with different native queries

Perhaps, you can consolidate your data access options and keep things straightforward in your Tabular 1400 model. But what if your data access functions use different native queries as in the following example?

let
    Source1 = Sql.Database("server01", "AdventureWorksDW", [Query="SELECT * FROM dimCustomer WHERE LastName = 'Walker'"]),
    Source2 = Sql.Database("server01", "AdventureWorksDW", [Query="SELECT * FROM dimCustomer WHERE LastName = 'Jenkins'"]),
    Combined = Table.Combine({Source1, Source2})
in
    Combined

Of course, one option is to rewrite the first native query to deliver the combined result so that you can eliminate the second function call. Yet, rewriting the native queries isn't really required if you modify the M expression to use the Value.NativeQuery function, as in the following expression. Now there is only a single Sql.Database function call, which can nicely be replaced with a data source definition in Tabular 1400.

let
    Source = Sql.Database("server01", "AdventureWorksDW"),
    Table1 = Value.NativeQuery(Source, "SELECT * FROM dimCustomer WHERE LastName = 'Walker'"),
    Table2 = Value.NativeQuery(Source, "SELECT * FROM dimCustomer WHERE LastName = 'Jenkins'"),
    Combined = Table.Combine({Table1, Table2})
in
    Combined

Note: Even if you consolidate the queries, as in SELECT * FROM dimCustomer WHERE LastName = 'Walker' OR LastName = 'Jenkins', avoid putting this native query on the query parameter of the data source definition. The query parameter exists only for full compatibility between data access functions and data source definitions. It shouldn't be used because it narrows the data source down to a particular query, so you could not import any other data from that source. And as explained earlier, you cannot define a second data source object with the same address parameters in Tabular 1400. So, define the data source in the broadest terms and then use the Value.NativeQuery function to submit a native query.
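
To make this concrete, here is a minimal sketch of a partition expression that follows this advice, again assuming the structured data source is named "SQL/server01;AdventureWorksDW" as in the example further below.

let
    // Reference the broadly defined structured data source and submit the
    // consolidated native query through Value.NativeQuery.
    Source = #"SQL/server01;AdventureWorksDW",
    Customers = Value.NativeQuery(Source, "SELECT * FROM dimCustomer WHERE LastName = 'Walker' OR LastName = 'Jenkins'")
in
    Customers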

Handling parameterized data access functions

The next level of challenges revolves around the dynamic definition of data source parameters. This technique is occasionally used in Power BI Desktop files to maintain the address parameters for multiple data access function calls centrally. The following screenshot shows an example.
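
In M terms, such a parameterized query typically looks something like the following sketch, in which ServerName and DatabaseName are hypothetical Power BI Desktop parameters maintained as separate queries.

let
    // ServerName and DatabaseName are hypothetical parameters defined elsewhere
    // in the .pbix file and referenced here instead of hardcoded values.
    Source = Sql.Database(ServerName, DatabaseName),
    Customers = Source{[Schema="dbo",Item="DimCustomer"]}[Data]
in
    Customers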

Fortunately, the concept of M-based address parameter definitions is comparable to the concept of a data source definition in Tabular 1400. In most cases, you should be able to define the Tabular 1400 data source as usual by simply using the parameter values directly. Then, replace the data access function calls in your M expressions with a reference to the data source, delete the M-based parameter definitions, and the job is done. The result is an ordinary Tabular 1400 M expression.

let
    Source = #"SQL/server01;AdventureWorksDW"
in
    Source

Of course, this technique only works for trivial, single-valued parameter definitions. For more complex scenarios, well, read on.

Dealing with dynamic data access functions

The ultimate challenge in the advanced data access scenarios is the fully dynamic implementation of a data access function call. There are various flavors. Let’s analyze the following simple example first.

let
    #"Dynamic Function" = Sql.Database,
    Source = #"Dynamic Function"("server01", "AdventureWorksDW")
in
    Source

Clearly, this M expression accesses a SQL Server database called AdventureWorksDW on server01. You can create a data source definition for this database in Tabular 1400 and replace these lines with a reference to that data source definition, as shown in the sample in the previous section. This is easy because it isn't really a dynamic data source yet. Here's a fully dynamic example.

let
    Source = Sql.Database("server01", "Dynamic"),
    DataAccessPoints = Source{[Schema="dbo",Item="DataAccessPoints"]}[Data],
    NavigationTable = Table.AddColumn(DataAccessPoints, "Data", each Connect([FunctionName], [Parameter1], [Parameter2]))
in
    NavigationTable

This M expression retrieves all rows from a DataAccessPoints table in a SQL Server database called Dynamic. It then passes the column values of each row to a custom function called Connect, which connects to the specified data source. The Connect function is based on the following M expression.

let
    Source = (FunctionName, Param1, Param2) => let
        // Turn the #shared record (all functions and queries in scope) into a table.
        FunctionTable = Record.ToTable(#shared),
        // Find the row whose Name matches the requested data access function.
        FunctionRow = Table.SelectRows(FunctionTable, each ([Name] = FunctionName)),
        Function = FunctionRow{0}[Value],
        // Invoke the function with one or two parameters, depending on what was passed in.
        Result = if Param2 = null then Function(Param1) else Function(Param1, Param2)
    in
        Result
in
    Source
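
For illustration only, invoking this Connect function directly with a hypothetical set of parameters would look like the following sketch; the lookup resolves Sql.Database from #shared and calls it with both address parameters.

let
    // Hypothetical standalone call; equivalent to Sql.Database("server01", "AdventureWorksDW").
    Customers = Connect("Sql.Database", "server01", "AdventureWorksDW")
in
    Customers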

Not only are the address parameters (Param1 and Param2) dynamically assigned, but the name of the data access function itself is also passed in as a parameter (FunctionName). So, unless you know the contents of the DataAccessPoints table, you cannot even determine what data sources this M expression accesses! You must evaluate the expressions against the actual data to know what credentials you need to supply, and you must define privacy settings. In this example, the following screenshot reveals that the expressions connect to an OData feed as well as a SQL Server database, but it could really be any supported data source type.
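
For example, the DataAccessPoints table in such a setup might contain rows along the lines of the following sketch; the function names and addresses are purely hypothetical.

let
    // Hypothetical contents of the DataAccessPoints table: one SQL Server
    // database and one OData feed, matching the scenario described above.
    DataAccessPoints = #table(
        {"FunctionName", "Parameter1", "Parameter2"},
        {
            {"Sql.Database", "server01", "AdventureWorksDW"},
            {"OData.Feed", "https://services.odata.org/V3/Northwind/Northwind.svc/", null}
        })
in
    DataAccessPoints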

So, if you wanted to use such a dynamic construct, you would have to create the corresponding data source objects with their credentials in your Tabular 1400 model. Power BI Desktop can prompt you for the missing credentials when you add a new entry to the DataAccessPoints table, but Analysis Services cannot because there is no interactive user on the server, so processing would fail. You would have to add a matching data source definition with the missing credentials and privacy settings upfront, which somewhat defeats the purpose of a fully dynamic data source definition.

Looking to the future

Perhaps you are wondering at this point why we didn't provide more flexible support for credential handling in Tabular 1400. One of the main reasons is that the footprint of the modern Get Data experience on the Mashup engine is already significant. It wasn't justifiable to take a sledgehammer approach at the Mashup layer, especially where it concerns security features used across several Microsoft technologies, including Power BI.

Fully dynamic scenarios are certainly interesting and fun, but they are corner cases. The more common advanced data access scenarios can be handled with moderate effort in Tabular 1400. And a future compatibility level is going to remove the limitation of a single data source definition per set of address parameters, although that requires some replumbing deep in the Mashup engine. On the other hand, how Tabular models are going to support fully dynamic data source definitions in future compatibility levels hasn't yet been decided. One idea is to move credential storage out of the Tabular metadata. Another is to introduce new metadata objects that can be associated with data access functions through their address parameters. Yet another might be to invent a form of processing handshake. And there may be other options. If you have a great idea, please post a comment or send it our way via ssasprev at microsoft.com, or use any other available communication channels, such as UserVoice or the MSDN forums.

That’s it for this excursion into handling advanced data access scenarios when moving M expressions from Power BI Desktop to a Tabular 1400 model. One of the next articles will show you how to use legacy provider data sources and native query partitions together with the new structured data sources and M partitions in a Tabular 1400 model. Stay tuned for more Get Data coverage on the Analysis Services Team Blog!
