How to – Export Dataverse (Dynamics 365) data to Azure SQL using Azure Data Factory pipeline template


[Visual Guide to Azure Data Factory - https://acloudguru.com/blog/engineering/a-visual-guide-to-azure-data-factory]

Using the new Azure Data Factory pipeline template – Copy Dataverse data from Azure Data Lake to Azure SQL – we can now easily export the Dataverse data to Azure SQL Database.

https://docs.microsoft.com/en-us/power-platform-release-plan/2021wave1/data-platform/export-dataverse-data-azure-sql-database

Check other posts on Azure Data Factory

Select the Pipeline from template option inside the Data Factory.

Search for Dataverse and select the Copy Dataverse data from Azure Data Lake to Azure SQL template.

Let us specify the User Inputs required by the template – i.e. Azure SQL Database and the Data Lake Storage.

First we have created the linked service for the Azure SQL Database.

We’d use it to connect to the below table MyContacts.

Similarly create a linked service to Azure Data Lake Gen 2, which holds our Dataverse data.

Get the URL from the Container’s properties and replace blob with dfs in the URL.
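
The blob-to-dfs replacement can also be scripted. Below is a minimal Python sketch; the storage account name (mystorageaccount) and the container name (dataverse-container) are placeholders, not values from this post.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_dfs_url(url: str) -> str:
    """Swap the '.blob.' endpoint for '.dfs.' in a storage URL's host name."""
    parts = urlsplit(url)
    # Only the host is changed, so a container or folder named "blob" is left alone.
    host = parts.netloc.replace(".blob.", ".dfs.", 1)
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Placeholder account and container names, for illustration only.
print(blob_to_dfs_url("https://mystorageaccount.blob.core.windows.net/dataverse-container"))
```

Replacing only within the host name avoids accidentally changing a path segment that happens to contain the word blob.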

To get the storage account key, select Access Keys >> Show Keys >> Copy the Key for the Storage Account.
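
For reference, the same linked service can be described as JSON instead of filling in the form. The Python sketch below builds such a definition, assuming account key authentication. The linked service name is the one used in this post; the URL and key are placeholders. In practice the key would usually be kept in Azure Key Vault rather than pasted in plain text.

```python
import json


def adls_gen2_linked_service(name: str, dfs_url: str, account_key: str) -> dict:
    """Build a linked service definition for Azure Data Lake Storage Gen2."""
    return {
        "name": name,
        "properties": {
            # "AzureBlobFS" is the connector type used for ADLS Gen2.
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": dfs_url,
                "accountKey": {"type": "SecureString", "value": account_key},
            },
        },
    }


# Placeholder URL and key, for illustration only.
definition = adls_gen2_linked_service(
    "AzureDataLakeStorage1",
    "https://mystorageaccount.dfs.core.windows.net",
    "<storage-account-key>",
)
print(json.dumps(definition, indent=2))
```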

Here we have already configured Azure Synapse Link for Dataverse.

https://nishantrana.me/2020/09/07/export-data-from-common-data-service-to-azure-data-lake-storage-gen2/

Now as we have defined the User Inputs, select Use this template.

Navigate to the data flow created – DataverseToAzureSQL

Select our source ADLS and check and configure its properties.

Source Settings

Here we have the Inline dataset type set to Common Data Model and the Linked service is the AzureDataLakeStorage1 we created earlier.

Source Option

Specify the Dataverse folder for the Root Location.

Here we have specified the contact entity from our Data Lake Storage.

Projection

In the projection, we have cleared the generated schema using Clear Schema and selected Schema options >> Allow schema drift.

With Allow schema drift enabled, the required columns are created in the destination Azure SQL table.

Optimize

Inspect

Data preview

As we have not turned on Debug mode, there is nothing to preview.

Now let us move to our Destination – SQL.

Sink

Here we have the AzureSQLTable dataset connected to the contact table in Azure SQL, and we have checked the Allow schema drift option.


Below is our AzureSQLTable Dataset connected to the MyContacts table.

Settings

Here we have selected Allow Insert as the Update Method and Recreate table as the Table action, as we want the destination table to be re-created dynamically based on the source.
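
To make the effect of Recreate table combined with schema drift more concrete, here is a conceptual Python sketch. It is not what the data flow executes internally: SQLite stands in for Azure SQL, every column is created as TEXT, and the function name and sample rows are made up for illustration.

```python
import sqlite3


def recreate_and_insert(conn, table, rows):
    """Drop and re-create `table` from the columns seen in `rows`, then insert them."""
    # Collect columns in first-seen order so newly appearing (drifted) columns are included.
    columns = []
    for row in rows:
        for col in row:
            if col not in columns:
                columns.append(col)
    if not columns:
        raise ValueError("no source columns to build the table from")

    quoted = [f'"{c}"' for c in columns]
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({", ".join(q + " TEXT" for q in quoted)})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({", ".join(quoted)}) VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    return columns


conn = sqlite3.connect(":memory:")
# Hypothetical source rows; the second row carries an extra column.
source_rows = [
    {"contactid": "1", "fullname": "Contact A"},
    {"contactid": "2", "fullname": "Contact B", "emailaddress1": "b@example.com"},
]
print(recreate_and_insert(conn, "MyContacts", source_rows))
print(conn.execute('SELECT COUNT(*) FROM "MyContacts"').fetchone()[0])
```

Because the table is dropped and rebuilt on every run, whatever was in the destination before the run is lost, which is acceptable here since only the insert operation is covered.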

Mapping

We have left it to Auto mapping.

Optimize

Inspect

Data preview

Let us Publish All our changes and Debug our pipeline.

Let us monitor our pipeline run.

We can see that the pipeline ran successfully and took around 4 minutes.

We can see the contact data copied to our Azure SQL database successfully.

So here we covered the insert operation. In the next posts, we’d see how we can configure the update, upsert and delete operations.

Also, check

Posts on Azure Data Factory

Posts on Azure Data Lake

Hope it helps..

Write batch size, data integration unit, and degree of copy parallelism in Azure Data Factory for Dynamics CRM / 365 Dataset


Let us take a simple example where we are moving contact records (.CSV) stored in Azure File Share to Dataverse or Dynamics 365 (UPSERT).

The CSV file has 50,000 sample contact records (generated using https://extendsclass.com/csv-generator.html) and is stored in Azure File Storage.

Another option for generating sample data:

https://nishantrana.me/2020/05/26/using-data-spawner-component-ssis-to-generate-sample-data-in-dynamics-365/

The Source in our Data Factory pipeline.

The Sink is our Dynamics 365 / Dataverse sandbox environment. Here we are using the Upsert write behavior.

For the Sink, the default Write batch size is 10.

Max concurrent connections specifies the upper limit of concurrent connections established to the data store during the activity run.

Below is our Mapping configuration

The Settings tab for the pipeline allows us to specify the following:

Data Integration Unit specifies the power of the copy execution (a combination of CPU, memory, and network resource allocation).

Degree of copy parallelism specifies the maximum number of parallel threads to be used.

Let us run the pipeline with the default values.

  • Write Batch Size (Sink) – 10
  • Degree of copy parallelism – 10
  • Data integration unit – Auto (4)
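
These three settings end up as properties in the Copy activity's JSON definition. The Python sketch below shows roughly where each one lives. The source and sink type names are assumptions that depend on the datasets actually used, and the function name is hypothetical. Leaving a value out corresponds to leaving the field blank or on Auto in the UI.

```python
import json
from typing import Optional


def copy_type_properties(write_batch_size: int = 10,
                         parallel_copies: Optional[int] = None,
                         data_integration_units: Optional[int] = None) -> dict:
    """Build the typeProperties section of a Copy activity (illustrative only)."""
    props = {
        # Assumed source type for a CSV file; depends on the source dataset.
        "source": {"type": "DelimitedTextSource"},
        "sink": {
            # Assumed sink type for a Dataverse dataset.
            "type": "CommonDataServiceForAppsSink",
            "writeBehavior": "upsert",
            "writeBatchSize": write_batch_size,  # Write batch size (Sink)
        },
    }
    if parallel_copies is not None:
        props["parallelCopies"] = parallel_copies  # Degree of copy parallelism
    if data_integration_units is not None:
        props["dataIntegrationUnits"] = data_integration_units  # omitted = Auto
    return props


# Default run from this post: batch size 10, parallelism 10, DIU left on Auto.
print(json.dumps(copy_type_properties(10, 10), indent=2))
```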

The result: it took around 58 minutes to create 50K contact records.

We then ran the pipeline a few more times, specifying different batch sizes and degrees of copy parallelism.

We kept Max concurrent connections blank and Data Integration Unit as Auto. (During our testing, even when we set it to higher values, the used DIUs value was always 4.)

Below are the results we got:

Write Batch Size    Degree of copy parallelism    Data Integration Unit (Auto)    Total Time (Minutes)
100                 8                             4                               35
100                 16                            4                               29
1000                32                            4                               35

250                 8                             4                               35
250                 16                            4                               25
250                 32                            4                               55

500                 8                             4                               38
500                 16                            4                               29
500                 32                            4                               28

750                 8                             4                               37
750                 16                            4                               25
750                 32                            4                               17

999                 8                             4                               36
999                 16                            4                               30
999                 32                            4                               20
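
To compare the runs more easily, the timings above can be turned into throughput figures. The short Python sketch below copies the values from the results as listed and uses the 58-minute default run as the baseline; the variable names are just for illustration.

```python
RECORDS = 50_000          # contact records in the CSV file
BASELINE_MINUTES = 58     # run with the default settings

# (write batch size, degree of copy parallelism, total minutes), as listed above.
results = [
    (100, 8, 35), (100, 16, 29), (1000, 32, 35),
    (250, 8, 35), (250, 16, 25), (250, 32, 55),
    (500, 8, 38), (500, 16, 29), (500, 32, 28),
    (750, 8, 37), (750, 16, 25), (750, 32, 17),
    (999, 8, 36), (999, 16, 30), (999, 32, 20),
]

# Pick the run with the shortest total time.
best_batch, best_parallel, best_minutes = min(results, key=lambda r: r[2])

print(f"Fastest run: batch size {best_batch}, parallelism {best_parallel}, "
      f"{best_minutes} minutes")
print(f"Throughput: about {RECORDS / best_minutes:.0f} records per minute "
      f"(baseline: about {RECORDS / BASELINE_MINUTES:.0f})")
print(f"Speed-up over the default run: {BASELINE_MINUTES / best_minutes:.1f}x")
```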

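The results show that increasing the batch size and degree of copy parallelism generally improves the performance in our scenario, though not uniformly: the fastest run (17 minutes) used a batch size of 750 with a degree of copy parallelism of 32, while a batch size of 250 with the same parallelism took 55 minutes.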

Ideally, we should run a few tests with different combinations before settling on a specific configuration, as the results could vary.
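
One simple way to plan such tests is to list every combination up front. The Python sketch below generates the grid of batch sizes and parallelism values we tried; the dictionary keys are only labels for the plan, and each combination still has to be run in the pipeline and timed.

```python
from itertools import product

BATCH_SIZES = [100, 250, 500, 750, 999]
PARALLELISM = [8, 16, 32]

# One entry per combination of write batch size and degree of copy parallelism.
test_plan = [
    {"writeBatchSize": batch, "parallelCopies": parallel}
    for batch, parallel in product(BATCH_SIZES, PARALLELISM)
]

for run_number, settings in enumerate(test_plan, start=1):
    print(run_number, settings)
```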

On trying to set the batch size to more than 1000, we would get the below error:

ExecuteMultiple Request batch size exceeds the maximum batch size allowed.
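
The batch size also decides how many batched requests are needed for the 50,000 records. The Python sketch below splits records into batches and counts them, treating 1000 as the upper limit based on the error above; the function names are hypothetical and this is not how the connector is implemented.

```python
import math
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH_SIZE = 1000  # limit suggested by the error message above


def batches(records: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `records`, each at most `batch_size` long."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(records), batch_size):
        yield list(records[start:start + batch_size])


def request_count(total_records: int, batch_size: int) -> int:
    """Number of batches needed to send `total_records` records."""
    return math.ceil(total_records / batch_size)


for size in (10, 750, 999):
    print(size, request_count(50_000, size))
```

With the default batch size of 10, the 50,000 records need 5,000 batches, while a batch size of 750 needs only 67.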

Also refer –

Optimizing Data Migration: https://community.dynamics.com/crm/b/crminthefield/posts/optimizing-data-migration-integration-with-power-platform

Using Data Factory with Dynamics 365: https://nishantrana.me/2020/10/21/posts-on-azure-data-factory/

Optimum batch size with SSIS: https://nishantrana.me/2018/06/04/optimum-batch-size-while-using-ssis-integration-toolkit-for-microsoft-dynamics-365/

Hope it helps..
