Azure Data Factory

December 15, 2015

by Innova Solutions

The enterprise data landscape is growing exponentially in volume, variety, and complexity. Unstructured, semi-structured, and cloud-borne data must be processed through a variety of tools, frameworks, and custom services.

Azure Data Factory (ADF) is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. ADF is part of Microsoft’s analytics suite rather than a standalone ETL (Extract, Transform, Load) tool: it composes, orchestrates, and monitors pipelines over diverse data from many origins, which gives it the character of a versatile data collector.

Azure Data Factory works across on-premises, cloud, and SaaS data sources to assimilate, prepare, transform, analyze, and finally publish data. Like a manufacturing factory that runs various equipment to ingest raw materials and convert them into finished goods, ADF collects raw data from diverse sources and transforms it into ready-to-use information.

In addition to these capabilities, Azure Data Factory provides a dashboard where users can monitor their data pipelines.

Azure Data Factory Concepts

Dataset: Datasets are named references/pointers to the data you want to use as an input or an output of an Activity, e.g., tables or files.

Activity: Activities define the actions to perform on your data, e.g., a Hive query or a copy operation.

Pipeline: Pipelines are logical groupings of Activities that are managed, monitored, and scheduled as a unit.

Linked service: Linked services define the information (such as connection strings) that Data Factory needs to connect to external resources, e.g., a SQL Server instance or a Hadoop cluster.
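
To make these concepts concrete, here is a minimal sketch in ADF’s JSON authoring format of a linked service and a dataset that references it. The names, host, path, and credentials below are illustrative assumptions, not values from this article.

```json
{
  "name": "OnPremFileServer",
  "properties": {
    "type": "OnPremisesFileServer",
    "description": "Linked service: how Data Factory reaches an on-premises file share through a gateway",
    "typeProperties": {
      "host": "\\\\localmachine\\share",
      "gatewayName": "MyDataGateway",
      "userId": "DOMAIN\\user",
      "password": "********"
    }
  }
}
```

```json
{
  "name": "OnPremCsvFile",
  "properties": {
    "type": "FileShare",
    "description": "Dataset: a named pointer to a .csv input produced outside ADF, hence external: true",
    "linkedServiceName": "OnPremFileServer",
    "typeProperties": {
      "folderPath": "data",
      "fileName": "input.csv",
      "format": { "type": "TextFormat", "columnDelimiter": "," }
    },
    "external": true,
    "availability": { "frequency": "Day", "interval": 1 }
  }
}
```

An Activity then names such datasets as its inputs and outputs, and a Pipeline groups one or more activities under a single schedule, as shown in the walkthrough later in this article.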

Developers can create a data factory with the Azure portal, PowerShell (using Azure Resource Manager templates), Visual Studio (with Azure .NET SDK 2.7 or later), or the REST APIs of the Azure Data Factory SDK. Users can author activities, combine them into a pipeline, and set an execution schedule.
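
As a sketch of the Resource Manager route, a minimal ARM template declaring an empty data factory might look like the following; the factory name, location, and apiVersion are assumptions to adapt to your own subscription.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": [
    {
      "type": "Microsoft.DataFactory/datafactories",
      "apiVersion": "2015-10-01",
      "name": "MyDataFactory",
      "location": "West US"
    }
  ]
}
```

Such a template can then be deployed from PowerShell with the standard resource-group deployment cmdlets, after which linked services, datasets, and pipelines are added to the factory.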

ADF Can Come in Handy in the Following Scenarios

  • Access to data sources such as on-premises SQL Server, Azure SQL Database, Azure Blob Storage, etc.
  • Data transformation through Hive, Pig, C#, etc. (a sketch of a Hive activity follows this list).
  • Monitoring data pipelines, validation, execution of scheduled jobs, and more.
  • Loading into desired destinations such as on-premises SQL Server, Azure SQL Database, or Azure Blob Storage.
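
For example, a transformation step can be expressed as an HDInsight Hive activity inside a pipeline definition. The sketch below assumes a pre-existing HDInsight linked service and a Hive script stored in blob storage; every name here is an illustrative placeholder.

```json
{
  "name": "TransformWithHive",
  "type": "HDInsightHive",
  "linkedServiceName": "MyHDInsightCluster",
  "typeProperties": {
    "scriptPath": "scripts/transform-logs.hql",
    "scriptLinkedService": "MyBlobStorage"
  },
  "inputs": [ { "name": "RawLogs" } ],
  "outputs": [ { "name": "CleanLogs" } ]
}
```

ADF runs the referenced .hql script on the linked HDInsight cluster on the schedule defined by the activity and its datasets.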

Prerequisites to Use ADF

  • Microsoft Data Management Gateway (required to reach on-premises data sources)
  • An Azure subscription – credit/debit card information is needed to sign up, and a 30-day free trial is available.

It’s a common notion that ADF is here to replace SSIS, but ADF comes as a complementary service and does not compete with SSIS.

| Aspect | SSIS | ADF |
|---|---|---|
| Development tool | SSDT/BIDS | Portal, ADF Editor, PowerShell, JSON script |
| Data sources and destinations | Many sources and destinations | Azure Storage, Azure SQL Database, SQL Server, file system |
| Data transformations | Many built-in transformations | Fewer activities; custom activities must be written |
| Environment | Requires administrative effort plus capable hardware and software | Azure takes care of everything |
| Pricing | Pay for features | Pay per usage |
| Error handling | Event handlers, failure precedence constraints | Alert rules and error-message logging; no event handlers |
| Deployment | Deployment wizard | PowerShell scripts |
| Monitoring | SSIS logging and catalog reports | Powerful GUI, data-slice execution, drill-through monitoring |
| Data lineage | None | Available |
| Security | Role-based for deploy, execute, and monitor | Roles such as owner, contributor, reader, Data Factory contributor, and user access administrator |

 

Moving On-premises Files to Azure SQL Database

Steps to copy .csv file data to a cloud database:

  1. Place the source (.csv) file on the local machine.
  2. Create a SQL database in the Azure portal.
  3. Create a destination table in the above-created database using SSMS on your local machine.
  4. Create a data factory in the Azure portal for the required transfer (a JSON sketch of these definitions follows this list):
    • Create a gateway to access on-premises resources (file, database).
    • Create linked services for the source (file) and destination (Azure table).
    • Create datasets for the source and destination.
    • Create a pipeline to move data from source to destination.
  5. Monitor and manage – for each dataset you can see the status of each data slice (whether it has transferred or not), failure logs if any, and a re-run option if required.
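
As a rough illustration of step 4, a one-activity copy pipeline in ADF’s JSON format might look like the sketch below, reusing the illustrative OnPremCsvFile dataset from earlier; the Azure SQL linked service and the AzureSqlOutputTable dataset would be defined analogously, and all names and dates are placeholders.

```json
{
  "name": "CopyCsvToAzureSqlPipeline",
  "properties": {
    "description": "Copies the on-premises .csv file into the Azure SQL table once per day",
    "activities": [
      {
        "name": "CopyCsvToSql",
        "type": "Copy",
        "inputs": [ { "name": "OnPremCsvFile" } ],
        "outputs": [ { "name": "AzureSqlOutputTable" } ],
        "typeProperties": {
          "source": { "type": "FileSystemSource" },
          "sink": { "type": "SqlSink", "writeBatchSize": 1000 }
        },
        "scheduler": { "frequency": "Day", "interval": 1 }
      }
    ],
    "start": "2015-12-01T00:00:00Z",
    "end": "2015-12-31T00:00:00Z"
  }
}
```

Once the pipeline is deployed, the Monitor and Manage view in step 5 shows the status of each daily slice produced by this activity.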

Azure Data Factory is a highly available, fault-tolerant, and fully managed service. Additionally, features like automatic cluster management, retries for transient failures, configurable timeout policies, and alerts make it a very convenient tool for processing data in a managed environment.
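
Those retry and timeout behaviors are configured per activity through a policy block. A minimal sketch, with values that are assumptions rather than recommendations, which could be added to the Copy activity shown earlier:

```json
"policy": {
  "retry": 3,
  "timeout": "01:00:00",
  "concurrency": 1,
  "executionPriorityOrder": "OldestFirst"
}
```

With this policy, a failing slice is retried up to three times, each run is cancelled after an hour, only one slice executes at a time, and backlogged slices are processed oldest first.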
