The company announced new copy capability limits for Azure Data Factory, while also highlighting the use of parallelism and PolyBase to boost movement speed.
Microsoft has announced that it is making data movement faster on the Azure platform by speeding up Azure Data Factory. In an official blog post, the company detailed how it improved performance by boosting data throughput through the service.
Copy capability now has the following limits:
Ingest 1 TB of data into Azure Blob Storage from on-premises File System and Azure Blob Storage in about three hours (i.e. at 100 MBps)
Ingest 1 TB of data into Azure Data Lake Store from on-premises File System and Azure Blob Storage in about three hours (i.e. at 100 MBps)
Ingest 1 TB of data into Azure SQL Data Warehouse from Azure Blob Storage in about three hours (i.e. at 100 MBps)
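The "about three hours" figure follows directly from the quoted throughput. A quick check of the arithmetic, assuming decimal units (1 TB = 10^12 bytes) and that "MBps" means megabytes per second; the function name is illustrative only:

```python
def transfer_hours(size_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Return the time in hours needed to move size_bytes at a steady throughput."""
    return size_bytes / throughput_bytes_per_sec / 3600

TB = 10**12   # assumption: decimal terabyte
MB = 10**6    # assumption: decimal megabyte

hours = transfer_hours(1 * TB, 100 * MB)
print(f"{hours:.2f} hours")  # 10,000 seconds, a little under three hours
```

At a sustained 100 MBps, 1 TB takes 10,000 seconds, or roughly 2.8 hours, which matches the rounded figure above.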
In the post, Microsoft expands on how it has achieved increased performance in Azure Data Factory, citing parallelism as one factor. Data movement time has been reduced by allowing the service to read source data and write data to a destination at the same time:
“You now have ways to specify the parallelism factor when reading the data from the source store and when writing the data to the destination store. You could also decide to not specify it and the service will automatically figure out the best for you.”
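To illustrate the general idea of a parallelism factor, here is a minimal sketch in plain Python. It is not Azure Data Factory's API or implementation: the `parallel_copy` function, the `parallelism` parameter, and the in-memory dictionaries standing in for source and destination stores are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(source: dict, destination: dict, parallelism: int = 4) -> list:
    """Copy every entry from source to destination using `parallelism` worker threads."""
    def copy_one(key):
        # Read one part from the source and write it to the destination.
        destination[key] = source[key]
        return key

    # Several parts are in flight at once, so reads and writes overlap.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(copy_one, list(source)))

# Hypothetical stores: eight small "parts" of a larger dataset.
source = {f"part-{i}": bytes([i]) * 4 for i in range(8)}
destination = {}
copied = parallel_copy(source, destination, parallelism=4)
print(len(copied), "parts copied")
```

In a real service the right degree of parallelism depends on the source, the destination, and the network between them, which is presumably why the service offers to choose a value automatically.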
Microsoft is also using PolyBase to move data in some formats, with the company saying that it has seen a 300x performance boost when loading large amounts of data into Azure SQL Data Warehouse with PolyBase.
However, PolyBase does not work with all types of data or stores, so in those cases users will need to move the data to Azure Blob Storage first and then use PolyBase for the onward movement.