How to – Export Dataverse (Dynamics 365) data to Azure SQL using Azure Data Factory pipeline template


[Visual Guide to Azure Data Factory - https://acloudguru.com/blog/engineering/a-visual-guide-to-azure-data-factory]

Using the new Azure Data Factory pipeline template – Copy Dataverse data from Azure Data Lake to Azure SQL – we can now easily export the Dataverse data to Azure SQL Database.

https://docs.microsoft.com/en-us/power-platform-release-plan/2021wave1/data-platform/export-dataverse-data-azure-sql-database

Check other posts on Azure Data Factory

Select Pipeline from template option inside the Data Factory

Search for Dataverse and select the Copy Dataverse data from Azure Data Lake to Azure SQL template

Let us specify the User Inputs required by the template – i.e. Azure SQL Database and the Data Lake Storage.

First we have created the linked service for the Azure SQL Database.

We’d use it to connect to the below table MyContacts.

Similarly create a linked service to Azure Data Lake Gen 2, which holds our Dataverse data.

Get the URL from the Container’s property. (replace blob with dfs in the URL)

To get the storage account key, select Access Keys >> Show Keys >> Copy the Key for the Storage Account.

Here we have already configured Azure Synapse Link for Dataverse

https://nishantrana.me/2020/09/07/export-data-from-common-data-service-to-azure-data-lake-storage-gen2/

Now as we have defined the User Inputs, select Use this template.

Navigate to the data flow created – DataverseToAzureSQL

Select our source ADLS and check and configure its properties.

Source Settings

Here we have the Inline dataset type set to Common Data Model and the Linked service is the AzureDataLakeStorage1 we created earlier.

Source Option

Specify the Dataverse folder for the Root Location.

Here we have specified the contact entity from our Data Lake Storage.

Projection

In the projection we have cleared the generated schema using Clear Schema, also selected Schema options >> Allow schema drift


We have enabled Allow schema drift option which will create the required columns in the destination Azure SQL Table.

Optimize

Inspect

Data preview

As we have not turned on Debug mode, there is nothing to preview

Now let us move to our Destination – SQL.

Sink

Here we have AzureSQLTable dataset connected to contact table in Azure SQL and have checked Allow schema drift option.


Below is our AzureSQLTable Dataset connected to the MyContacts table.

Settings

Here we have selected Allow Insert as the Update Method and Table
action as Recreate table – as we want the destination table to be re-created dynamically based on the source.

Mapping

We have left it to Auto mapping.

Optimize

Inspect

Data preview

Let us Publish All our changes and Debug our pipeline.

Let us monitor our pipeline run.

We can see that pipeline has run successfully and took around 4 minutes.

We can see the contact’s data copied to our Azure SQL successfully.

So here we covered the insert operation, in the next posts we’d see how we can configure update, upsert and delete operation.

Also, check

Posts on Azure Data Factory

Posts on Azure Data Lake

Hope it helps..

Fixed – AuthorizationFailed. The client with object id does not have authorization to perform action ‘’Microsoft.Authorization/roleAssignments/write’ over scope ‘storageaccount’ – Azure Data Lake


While configuring the Azure Synapse Link/ Export to Data Lake service, we were getting below error for one of the users.

{“code”:”AuthorizationFailed”,”message”:”The client ‘nishantr@pmaurua105.onmicrosoft.com’ with object id ‘d56d5fbb-0d46-4814-afaa-e429e5f252c8’ does not have authorization to perform action ‘Microsoft.Authorization/roleAssignments/write’ over scope ‘/subscriptions/30ed4d5c-4377-4df1-a341-8f801a7943ad/resourceGroups/RG/providers/Microsoft.Storage/storageAccounts/saazuredatalakecrm/providers/Microsoft.Authorization/roleAssignments/2eb81813-3b38-4b2e-bc14-f649263b5fcf’ or the scope is invalid. If access was recently granted, please refresh your credentials.”}

The current Role assignments of the user were as below.

To resolve it we need to give the Owner role to the user configuring the Export to Data lake service on the Storage account. (apart from System Admin role within Dynamics 365)

https://docs.microsoft.com/en-us/powerapps/maker/data-platform/export-to-data-lake#prerequisites

After assigning the Owner Role (and removing the other roles), it worked successfully.

Check other posts on Azure Data Lake/ Azure Synapse Link

Posts on Azure Data Lake

Hope it helps..

Advertisements

How to setup – Azure Synapse Link – Microsoft Dataverse


Azure Synapse Link (earlier known as Export to Data Lake Service) provides seamless integration of DataVerse with Azure Synapse Analytics, thus making it easy for users to do ad-hoc analysis using the familiar T-SQL with Synapse Studio, build Power BI Reports using Azure Synapse Analytics Connector or use Azure Spark in Azure Synapse for analytics.

As a first step, we need to create the Azure Synapse Workspace.

Login to Azure Portal (https://portal.azure.com/ ) with appropriate roles and create a Synapse workspace.

https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-create-workspace

Wait for the deployment to be complete. Below are the resources created as part of the deployment.

Login to the maker portal (https://make.powerapps.com/ ) and select the appropriate environment

Click on Azure Synapse Link and check the Connect to your Azure Synapse Analytics workspace (preview) and specify the storage account created in the previous step

Here we have selected the Contact and System User table for export.

On clicking Save, we got this error couple of times, even though the account being used was Owner across the Azure resources. Trying save 3-4 times more, allowed the link to be created.

Click on Go to Azure Synapse Analytics workspace, opens the Azure Synapse Studio

Within Synapse Studio we can see our Dataverse Database and run query.

Similarly, we create a Power BI report with Azure Synapse Analytics (SQL DW) connector.

Copy the Serverless SQL endpoint of the Synapse workspace.

Select the Direct Query option.

https://azure.microsoft.com/en-gb/blog/power-your-business-applications-data-with-analytical-and-predictive-insights/

Hope it helps..

Advertisements

Azure Synapse Link / Export to Data Lake Service – Performance (initial sync)


Recently we configured the Export to Data Lake service for one of our projects.

Just sharing the performance, we got during the initial sync.

Entity

Count

Contact

2,36,2581

Custom entity

1,61,3554

The sync started at 11:47 A.M and was completed around 4:50 P.M. – around 5 hours i.e. 300 minutes

Let us consider total records synced as 3975000 – 1613446 + 2362581 + 2 = 3976029.

So the performance here comes down to

  • 795000 records per hour
  • 13250 records per minutes
  • 220 records per second.

Of course, it will vary depending on the specific environment, table / entity type, attributes in it etc. it gives us a rough idea.

Hope it helps..

Advertisements

Advanced configuration settings – Azure Synapse Link / Export to Data Lake service (Dataverse/ Dynamics 365)


The Export to Data Lake service now has some Advanced configuration settings available.

To learn more on Export to Data Lake service

https://nishantrana.me/2020/12/10/posts-on-azure-data-lake/

The new settings allow us to configure how the DataVerse / CRM table data is written to Azure Data Lake.

  • In-Place update or upsert (default)
  • Append Only

With the in-place update, the default setting, the file will contain the full data set, and any update in the source will update the same in the synced CSV file or the data partition, similarly, any record deleted will delete the row from the data partition, unlike Append Only where a new row will be added in case of both update and delete.

For huge volume of data, Microsoft recommends opting for Append only mode. This mode is also preferable when an organization wants to incrementally review the changed data.

The other option is to define the data partition strategy.

  • By Month (default)
  • By Year

With this option, files generated are partitioned by either year or more granular month-wise, which can be specified per-table basis.

Microsoft recommends Monthly partition if data volume is high.

Now, let us see it in action.

For the Lead table, we haven’t selected the option for advanced configuration settings and are going by default.

  • Append Only – No
  • Partition Strategy – Month

For contact, we have enabled the advanced configuration settings and opted for Partition Strategy as Year.

For Account , we have opted for Append Only as true, for which the Partition strategy option is disabled and set as Year.

The final configuration à

Within the container inside the Storage Account, we can see corresponding folders created per table/entity along with model.json as shown below.

Let us explore the Lead folder –

We can see 2 CSV created with format YYYY-MM.csv i.e. having the month part in it because we had specified Partition Strategy as Month i.e. the default value.

For Contact and Account, the Pattern Strategy was Year, so we have files generated in format YYYY.csv

Let us update one of the lead records by appending ‘Updated’ in the last name field.

After the successful sync,

we can see the record updated in the .csv / partition.

The same is the case with the contact record.

Now let us update an account record, it had Append Only specified as Yes.

Here we update the Account Name field from Litware to Litware Updated.

After the sync

We can see a new row appended with the updated record along with the original record.

Let us delete the same account record

As expected, being Append Only mode, we can see a new row added for the Litware record.

We have 2 additional rows apart from the original row, one created for update and the other for delete action.

Export to Data Lake service is Microsoft’s recommended way of synchronizing Dataverse Data with external storage, and we can see them continuously investing and adding enhancements to it.

Get all the details here –

https://docs.microsoft.com/en-us/powerapps/maker/data-platform/export-to-data-lake#data-partition-strategy

Hope it helps..

Fixed – Initial sync status – Not Started – Azure Synapse Link / Export to Data Lake


Recently while configuring the Export to Data Lake service, we observed the initial sync status being stuck for one of the tables at Not started.

Manage tables option also was not working

All changes for existing tables are temporarily paused when we are in the process of exporting data for new table(s). We will resume writing changes for existing table(s) after we complete exporting data for new table(s) to the Azure data lake.

As it was seemed to be stuck forever, we tried Unlinking the data lake and linking it back.

Select Yes

We left the Delete data lake file system unchecked.

Created a New link to data lake with the same storage account.

We would get the below error à

An error occurred: Container: dataverse-pmaurya105-unqdc8ed1c1df824188bbe2225de96f0 already existed with files for storage account: saazuredatalakecrm. Please clean dataverse-pmaurya105-unqdc8ed1c1df824188bbe2225de96f0

Basically, if we are using the same storage account for linking, we need to first delete or clean the container.

We cleaned the container.

And tried again.

This time it worked

Here we can check the Delete data lake file system option while Unlinking the data lake

This will perform the same step – deleting the files within the container.

If that doesn’t work or is not feasible, we should raise Microsoft Support Ticket.

https://powerusers.microsoft.com/t5/Microsoft-Dataverse/Added-new-Table-to-Export-to-Data-Lake-Now-Sync-is-blocked/m-p/924560#M11400

Check other posts on Azure Data Lake and Dynamics 365 – 

https://nishantrana.me/2020/12/10/posts-on-azure-data-lake/

Hope it helps..

Advertisements