Azure Data Transfer and Transformation

Azure Data Factory

Azure Data Factory (ADF) is one of the Azure services available for transferring data from on-premises systems or another cloud provider to Azure, or between Azure resources in the same subscription or in different subscriptions.

Azure Data Factory Concepts

  • Pipeline: the processing steps performed on the data. It is a set of Activities.
  • Activity: a single action performed on the data; for example, it can be (Ingest data from )
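The relationship between a Pipeline and its Activities can be illustrated with a small sketch. This is plain Python and not the Azure Data Factory SDK: the `Activity` and `Pipeline` classes, the `ingest` and `cast_amount` steps, and the sample rows are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Rows = List[dict]


@dataclass
class Activity:
    """One action performed on the data (hypothetical model, not the ADF SDK)."""
    name: str
    run: Callable[[Rows], Rows]


@dataclass
class Pipeline:
    """An ordered set of activities, applied one after another."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, rows: Rows) -> Rows:
        # Feed the output of each activity into the next one.
        for activity in self.activities:
            rows = activity.run(rows)
        return rows


def ingest(_: Rows) -> Rows:
    # Stand-in for an ingest step; a real one would read from a source system.
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4"}]


def cast_amount(rows: Rows) -> Rows:
    # Simple transformation step: convert the amount from text to a number.
    return [{**row, "amount": float(row["amount"])} for row in rows]


pipeline = Pipeline(
    "demo_pipeline",
    [Activity("ingest", ingest), Activity("cast_amount", cast_amount)],
)
result = pipeline.execute([])
print(result)
```

In ADF itself, pipelines and activities are declared in the service rather than written as classes like these; the sketch only mirrors the "a pipeline is a set of activities" idea.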

Azure Databricks

Azure Databricks is used for data transformation and enrichment.

Lambda Architecture in Google & Azure Cloud

Lambda Architecture Definition

Lambda Architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods, in order to build robust, scalable and fault-tolerant (against both human and machine errors) big data systems.

The Lambda Architecture also tries to balance latency and accuracy.

Lambda Architecture Layers

  • Batch Layer (holds the master dataset)
  • Serving Layer
  • Speed Layer
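How the three layers cooperate can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a production design: the page-view events, the function names `batch_view`, `update_realtime_view` and `query`, and the use of in-memory counters are all hypothetical.

```python
from collections import Counter

# Master dataset: immutable, append-only events already covered by the last batch run.
master_dataset = [{"page": "home"}, {"page": "home"}, {"page": "about"}]

# Events that arrived after the last batch run.
recent_events = [{"page": "home"}, {"page": "pricing"}]


def batch_view(events):
    """Batch layer: recompute the view from the whole master dataset."""
    return Counter(event["page"] for event in events)


def update_realtime_view(view, event):
    """Speed layer: update the view incrementally, one event at a time."""
    view[event["page"]] += 1
    return view


def query(page, batch, realtime):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[page] + realtime[page]


batch = batch_view(master_dataset)
realtime = Counter()
for event in recent_events:
    update_realtime_view(realtime, event)

print(query("home", batch, realtime))     # 3
print(query("pricing", batch, realtime))  # 1
```

The batch view is accurate but only as fresh as the last batch run, while the real-time view covers the gap with low latency; merging the two at query time is where the balance between latency and accuracy comes from.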

Lambda Architecture Properties:

  • A paradigm for Big Data
  • Balances throughput, latency, fault tolerance and scalability in data processing.
  • Suited to the modern data warehouse

Applying the Lambda Architecture with Spark, Kafka, and Cassandra

The tools and techniques are the following:

  • Spark DataFrames and Spark SQL, together with Spark’s Data Source API, to load, store and manipulate data.
  • Spark Streaming and Spark-Kafka integration techniques -> for reliability and speed.
  • A Kafka data producer -> to simulate the real-time data stream fed into the streaming application.
  • A stateful Spark Streaming application -> to preserve global state and use memory efficiently with approximate algorithms.
  • Errors and code updates -> a production application, such as a stateful Spark Streaming application, isn’t complete without the ability to handle errors and code updates.
  • Persisting data to Cassandra and HDFS -> to work with a scalable NoSQL database and persist the data to Cassandra and HDFS.
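The producer and stateful-streaming items above can be sketched without a cluster. The following is plain Python standing in for Kafka and Spark Streaming, so none of it is the real Kafka or Spark API: `simulated_producer`, `micro_batches`, `update_state`, the sensor names and the batch size are hypothetical. It also keeps exact running totals rather than using an approximate algorithm.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Event = Tuple[str, int]  # (key, value)


def simulated_producer(n_events: int, seed: int = 42) -> Iterator[Event]:
    """Stand-in for a Kafka producer: yields (sensor, reading) events."""
    rng = random.Random(seed)  # seeded so the simulated stream is repeatable
    sensors = ["sensor-a", "sensor-b", "sensor-c"]
    for _ in range(n_events):
        yield rng.choice(sensors), rng.randint(1, 10)


def micro_batches(events: Iterator[Event], size: int) -> Iterator[List[Event]]:
    """Group the stream into fixed-size micro-batches."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def update_state(state: Dict[str, int], batch: List[Event]) -> Dict[str, int]:
    """Fold one micro-batch into the running per-key totals (the global state)."""
    for key, value in batch:
        state[key] += value
    return state


state: Dict[str, int] = defaultdict(int)
n_batches = 0
for batch in micro_batches(simulated_producer(10), size=4):
    state = update_state(state, batch)
    n_batches += 1

print(n_batches, dict(state))
```

The state carried from one micro-batch to the next is what makes the application stateful. In a real deployment that state would need to be checkpointed so it can be recovered after errors and code updates.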

Lambda Architecture on Azure, Google and AWS

