Definition of Azure Data Factory
Azure Data Factory is a cloud-based data integration service. It does not store data itself, but allows you to create and monitor automated workflows that collect, integrate, and (to some extent) transform large volumes of data from disparate sources, and pass them on to other services that can store, transform, analyse and use the data. You can think of Azure Data Factory like a conveyor belt in a physical factory, where your data is a stream of products that are being sorted, collated and packaged.
Does your organisation need Azure Data Factory?
Azure Data Factory converts disparate data silos into trusted information that can be stored in a centralised repository. It can be used for a large-scale, single data migration (for example, if you want to migrate your data to an Azure storage service) or to continually process streams of data that are generated by your company from different sources, allowing you to implement data-driven processes that act on a central data repository.
You may need Azure Data Factory if:
you want to break away from data silos
your company utilises large volumes of data from disparate sources
you operate (or want to operate) data-driven processes
you make use of (or plan to make use of) cloud-based Azure data analysis and AI technologies
you have experienced bottlenecks when trying to prepare large quantities of data for analysis
you need to migrate a large volume of data between storage services
Benefits of Azure Data Factory include:
It can combine cloud-based and on-premises data securely and efficiently.
It streamlines and cleans data for use on a variety of Azure services.
It can handle structured, unstructured and semi-structured data, including streaming data.
The data pipelines it creates are highly available and fault-tolerant.
You have the option to set up processes via a visual interface or by writing code.
You can perform some transformations on your data within the pipeline you set up, making the output ready for analysis.
You can monitor and manage on-demand, trigger-based, and clock-driven custom flows.
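As a rough illustration of a clock-driven flow, the sketch below builds a schedule trigger definition as a Python dictionary and prints it as JSON. The field names follow the JSON trigger format as it is commonly documented for Azure Data Factory, but treat them as assumptions to check against the current documentation. The helper `build_schedule_trigger` and the names `HourlyTrigger` and `ExamplePipeline` are hypothetical.

```python
import json


def build_schedule_trigger(name, pipeline_name, frequency, interval, start_time):
    """Return a schedule trigger definition (as a dict) for one pipeline.

    The structure is an assumption based on the documented JSON format;
    verify it before deploying anything.
    """
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,   # e.g. "Minute", "Hour", "Day"
                    "interval": interval,     # run every `interval` units
                    "startTime": start_time,  # ISO 8601 timestamp
                    "timeZone": "UTC",
                }
            },
            # The pipeline this trigger starts (hypothetical name).
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


trigger = build_schedule_trigger(
    "HourlyTrigger", "ExamplePipeline", "Hour", 1, "2024-01-01T00:00:00Z"
)
print(json.dumps(trigger, indent=2))
```

An on-demand run would not need a trigger definition at all, since it is started manually or through an API call.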
Prerequisites and Integrations
To get started with Azure Data Factory, you need a Microsoft Azure account. Read more about Microsoft Azure here.
You do not need coding skills to set up and automate data migration pipelines with Azure Data Factory, because it provides a visual interface. If you want greater control over the pipeline and the ability to customise processes, you can use Python, .NET or Azure Resource Manager (ARM) templates.
Azure Data Factory can take raw input data from a variety of sources and deliver trusted output data to a variety of sinks. Compatible sources and sinks include Azure storage services such as Azure Blob Storage, Azure Cosmos DB, and Azure SQL Database, as well as third-party products such as Amazon Redshift and Salesforce. A variety of data transformation activities can be performed as part of your data pipeline, including Hive, Pig and Hadoop Streaming scripts.
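To make the source-to-sink idea concrete, here is a minimal sketch of a pipeline with a single copy step, written as a Python dictionary and printed as JSON. It assumes the JSON pipeline format used by Azure Data Factory (a `Copy` activity with a `BlobSource` and a `SqlSink`), so check the property names against the current documentation before relying on them. The helper `build_copy_pipeline` and the dataset names `RawBlobDataset` and `CleanSqlDataset` are hypothetical and would need to exist as datasets in your own data factory.

```python
import json


def build_copy_pipeline(name, input_dataset, output_dataset):
    """Return a pipeline definition (as a dict) with a single Copy activity.

    The activity reads from a Blob Storage dataset and writes to a
    SQL dataset. Dataset names are placeholders supplied by the caller.
    """
    copy_activity = {
        "name": "CopyBlobToSql",
        "type": "Copy",
        "inputs": [{"referenceName": input_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource"},  # where the raw data comes from
            "sink": {"type": "SqlSink"},       # where the output is delivered
        },
    }
    return {"name": name, "properties": {"activities": [copy_activity]}}


pipeline = build_copy_pipeline("ExamplePipeline", "RawBlobDataset", "CleanSqlDataset")
print(json.dumps(pipeline, indent=2))
```

A definition like this could be pasted into the code view of the visual interface or submitted through one of the SDKs. Building it as plain data keeps the example independent of any particular SDK version.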
Security and Compliance
Azure Data Factory is built on Microsoft Azure security infrastructure and uses all Microsoft Azure security measures. You can read more about Microsoft Azure security here.
Azure Data Factory is certified for HIPAA and HITECH, ISO/IEC 27001, ISO/IEC 27018 and CSA STAR. It can also support GDPR compliance: for example, you can set up a process that fetches and consolidates the personal information held about a person and sends it to them when they make a GDPR information request.
Pricing
Azure Data Factory is a pay-as-you-go service so you only pay for what you use, and there are no up-front set-up fees. Read more about Microsoft Azure and its pricing structure here.
When you set up data pipelines with Azure Data Factory, you add a number of steps, called activities, to each pipeline. The price of the service is based on the number of activities executed and the time it takes to execute them each time they run.
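The pricing model described above can be sketched as a simple calculation: one charge that scales with the number of activity runs, plus one that scales with execution time. The function name and both rates below are hypothetical placeholders, not actual Azure prices. Real rates vary by region and activity type and should be taken from the official pricing page.

```python
def estimate_pipeline_cost(activity_runs, execution_hours,
                           price_per_1000_runs, price_per_hour):
    """Estimate a bill from activity runs and total execution time.

    All prices are supplied by the caller; none are real Azure rates.
    """
    run_charge = activity_runs / 1000 * price_per_1000_runs
    time_charge = execution_hours * price_per_hour
    return run_charge + time_charge


# Hypothetical month: 3,000 activity runs taking 10 hours in total,
# with made-up rates of 1.00 per 1,000 runs and 0.25 per hour.
monthly_cost = estimate_pipeline_cost(
    activity_runs=3000,
    execution_hours=10,
    price_per_1000_runs=1.00,
    price_per_hour=0.25,
)
print(monthly_cost)  # 5.5
```

Because both terms grow with how often a pipeline runs, reducing the run frequency or the number of activities per run lowers the estimate proportionally.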
You can read more about Azure Data Factory pricing here.
Alternatives to Azure Data Factory
Azure Data Factory is largely intended for Azure customers who need to integrate data from Microsoft and Azure sources. While Azure Data Factory does offer limited support for third-party sources such as Amazon Redshift, other data integration solutions, such as AWS Data Pipeline and Alooma, may be more appropriate for your business if you do not primarily use Azure data services.
If the majority of your data is stored on-premises and you are not planning to migrate to a cloud storage solution, SQL Server Integration Services (SSIS) may be a more appropriate tool for your data integration. SSIS also offers a wider range of transformations within the data pipeline than Azure Data Factory.