Use query acceleration to retrieve data from Azure Data Lake Storage


A few key points about query acceleration –

Query acceleration supports an ANSI SQL-like language to retrieve only the required subset of the data from the storage account, reducing network latency and compute cost.

Query acceleration requests can process only one file, so joins and GROUP BY aggregates aren’t supported.

Query acceleration supports both Data Lake Storage (with hierarchical namespace enabled) and blobs in the storage account.

Query acceleration supports CSV and JSON formatted data as input.
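
For instance, a query along the following lines (an illustrative sketch – the column names are made up, and the syntax follows the query acceleration SQL reference linked at the end) filters and projects the data on the service side, so only the matching rows travel over the network

SELECT "First Name", "Email" FROM BlobStorage WHERE "Country" = 'IN' LIMIT 100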

Let us take a simple example to see it in action.

Within mydatalakegen (StorageV2 (general purpose v2)), we have All Contacts.csv inside the mycrmcontainer container.

Open the Windows PowerShell command window

Sign in to Azure subscription

  • Connect-AzAccount

Register the query acceleration feature

  • Register-AzProviderFeature -ProviderNamespace Microsoft.Storage -FeatureName BlobQuery

Register the resource provider

  • Register-AzResourceProvider -ProviderNamespace 'Microsoft.Storage'
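
We can check whether the feature registration has completed (it can take a few minutes) before proceeding

  • Get-AzProviderFeature -ProviderNamespace Microsoft.Storage -FeatureName BlobQuery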

Create a console application project in Visual Studio and add the required NuGet packages – based on the namespaces used in the sample below, these are Azure.Storage.Blobs and CsvHelper.
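
Equivalently, the packages can be added from the command line (assuming the .NET CLI is installed)

  • dotnet add package Azure.Storage.Blobs
  • dotnet add package CsvHelper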

Sample Code –


using System;
using System.Globalization;
using System.IO;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;
using CsvHelper;
using CsvHelper.Configuration;

namespace MyQuery
{
    class Program
    {
        static void Main(string[] args)
        {
            // Storage account connection string
            string connectionString = "DefaultEndpointsProtocol=https;AccountName=mydatalakegen;AccountKey=orc8e1Dpclu5P3Ox9PIlsLG2/x8KZLcmgyhOEgz6yFTmzFJty+EpHQ==;EndpointSuffix=core.windows.net";

            // Initialize the BlockBlobClient pointing to the blob to query
            BlockBlobClient myBlobClient = new BlockBlobClient(connectionString, "mycrmcontainer", "All Contacts.csv");

            // Define the query
            // "First Name" - column header containing a space
            // _4 - refers to the 4th column in the csv file
            // LIMIT - limit to the first 10 records
            string query = @"SELECT ""First Name"", _4, email FROM BlobStorage LIMIT 10";

            var blobQueryOptions = new BlobQueryOptions();
            blobQueryOptions.InputTextConfiguration = new BlobQueryCsvTextOptions() { HasHeaders = true };

            // Run the query against the blob; only the selected columns and rows are returned
            var result = myBlobClient.Query(query, blobQueryOptions);
            var reader = new StreamReader(result.Value.Content);

            // Parse the returned CSV and print each record
            var parser = new CsvReader(reader, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true });

            while (parser.Read())
            {
                Console.Out.WriteLine(String.Join(" ", parser.Context.Record));
            }

            Console.ReadLine();
        }
    }
}

Output –

Get all the details here –

https://docs.microsoft.com/en-us/azure/storage/blobs/query-acceleration-sql-reference

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration-how-to?tabs=azure-powershell%2Cpowershell

Hope it helps..

Use AzCopy to transfer files from local drive to Azure Blob Storage


AzCopy is a command-line utility that can be used for copying data to and from the storage accounts.

Download the appropriate version of the tool –

https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10#download-azcopy

We will upload the files to the following container

Below are the source files

Let us log in first (here we are using Azure Active Directory to authorize AzCopy; the other option is using a SAS token)

https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?WT.mc_id=itopstalk-blog-thmaure#authorize-azcopy
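
The login command itself is simply the below (a specific tenant can be passed with the --tenant-id parameter if needed)

azcopy login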

It will ask us to open the URL in the browser and enter the code followed by credentials.

After successful sign-in we can close the browser window.

Now let us transfer the directory along with the files inside it using the below syntax

azcopy copy '<local-directory-path>' 'https://<storage-account-name>.<blob or dfs>.core.windows.net/<container-name>' --recursive

in our case,

azcopy copy 'C:\Customers' 'https://storageaccountrg9b58.blob.core.windows.net/mycontainer1' --recursive

Get the URL from the Properties of the container

Make sure the account (service principal) used has the Storage Blob Data Contributor or Storage Blob Data Owner role assigned, which is required for uploading the files.

Run the command.

We can see the folder and files successfully transferred.
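
To double-check from the command line, we can also list the contents of the container (same container URL as above)

azcopy list 'https://storageaccountrg9b58.blob.core.windows.net/mycontainer1'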

Check other posts – 

Transfer files using the Azure Blob Upload task and the Premium File Transfer task in an SSIS package

https://nishantrana.me/2020/11/24/transfer-files-from-local-drive-to-azure-blob-using-azure-blog-upload-task-ssis/

https://nishantrana.me/2020/11/20/transfer-files-from-local-drive-to-azure-blob-using-premium-file-transfer-task-ssis/

Hope it helps..

Fixed – AuthorizationPermissionMismatch Azure Blob Storage


We got the below error while trying to transfer files to Azure Blob Storage using AzCopy

INFO: Authentication failed, it is either not correct, or expired, or does not have the correct permission -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/azure-storage-blob-go@v0.10.1-0.20201022074806-8d8fc11be726/azblob/zc_storage_error.go:42

===== RESPONSE ERROR (ServiceCode=AuthorizationPermissionMismatch) =====

Description=This request is not authorized to perform this operation using this permission.

RequestId:43ee21af-501e-0055-30ef-c07ec3000000

Time:2020-11-22T16:51:42.0459952Z, Details:

   Code: AuthorizationPermissionMismatch

   PUT https://storageaccountrg9b58.blob.core.windows.net/mycontainer1/Customers/CIF1/Sample1.txt?timeout=901

Here we were using Azure Active Directory to provide authorization credentials to AzCopy

https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10#option-1-use-azure-active-directory

The account (service principal) we were using had the Owner role.

To fix this issue, we assigned the Storage Blob Data Contributor role to the account.

Retrying after some time fixed the issue (role assignments can take a few minutes to propagate).
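
For reference, the role can also be assigned from PowerShell (a sketch – the sign-in name and scope below are placeholders for your own values; for a service principal, use -ApplicationId instead of -SignInName)

New-AzRoleAssignment -SignInName "user@contoso.com" -RoleDefinitionName "Storage Blob Data Contributor" -Scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/storageaccountrg9b58"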

Points to keep in mind –

https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-authorize-azure-active-directory#verify-role-assignments

Hope it helps..