The Data Integration Lead/Architect will be responsible for designing, building, and optimizing high-volume, reliable data pipelines that integrate data from Azure Databricks (source) environments into Azure Data Lake Storage (ADLS) and ultimately into the Snowflake Data Warehouse (target). This role requires deep expertise in the Microsoft Azure data stack, Databricks/Spark optimization, and Snowflake architecture to ensure data quality, security, and performance across the entire data lifecycle.

Key Responsibilities
Blueprint Development: Design and document end-to-end data integration architectures that are scalable, high-performing, and secure, leveraging the Azure data platform (Databricks, ADLS, Azure Synapse Analytics, etc.).
ELT/ETL Strategy: Define and implement the optimal ELT (Extract, Load, Transform) strategy for data movement, prioritizing Databricks/Spark for complex transformations and Snowflake for warehousing and consumption.
Data Modeling: Architect and govern the data models within the Data Lake (e.g., Delta Lake format) and Snowflake, ensuring consistency and efficiency for reporting and analytics.
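By way of illustration, a minimal sketch of the Delta Lake side of this modeling work; the storage account, container layout, orders entity, order_id key, and order_date partition column are all hypothetical:

```python
# Sketch: shaping a raw entity into a curated, partitioned Delta table.
# Paths, the order_id key, and the order_date partition column are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-modeling").getOrCreate()

raw = spark.read.format("delta").load(
    "abfss://raw@<storage_account>.dfs.core.windows.net/sales/orders"
)

(raw.dropDuplicates(["order_id"])       # enforce one row per business key
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")          # align partitions with common query filters
    .save("abfss://curated@<storage_account>.dfs.core.windows.net/sales/orders"))
```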
Databricks/Spark Development: Lead the development of robust, optimized data ingestion and transformation pipelines using Python/PySpark or Scala within Azure Databricks notebooks.
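One common shape such a pipeline might take is incremental ingestion with Databricks Auto Loader; this is a hedged sketch in which the container names, schema location, and checkpoint path are placeholders:

```python
# Sketch: incremental JSON ingestion from ADLS into a Bronze Delta table
# using Databricks Auto Loader. Paths are hypothetical; assumes the
# Databricks-provided `spark` session.
landing = "abfss://landing@<storage_account>.dfs.core.windows.net/events/"
bronze = "abfss://bronze@<storage_account>.dfs.core.windows.net/events/"

(spark.readStream
    .format("cloudFiles")                              # Auto Loader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", bronze + "_schema")
    .load(landing)
    .writeStream
    .option("checkpointLocation", bronze + "_checkpoint")
    .trigger(availableNow=True)                        # process backlog, then stop
    .start(bronze))
```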
Pipeline Orchestration: Implement and manage pipeline orchestration using tools such as Azure Data Factory (ADF) or Databricks Workflows to schedule runs, monitor execution, and manage task dependencies.
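As a sketch of the orchestration piece, a two-task Databricks Workflows job with a dependency can be defined through the Jobs API 2.1; the workspace URL, token, notebook paths, cluster id, and schedule below are all assumptions:

```python
# Sketch: a two-task Databricks Workflows job via the Jobs API 2.1.
# Host, token, notebook paths, and cluster id are placeholders.
import requests

job_spec = {
    "name": "adls_to_snowflake_nightly",
    "tasks": [
        {
            "task_key": "ingest_bronze",
            "notebook_task": {"notebook_path": "/pipelines/ingest_bronze"},
            "existing_cluster_id": "<cluster_id>",
        },
        {
            "task_key": "load_snowflake",
            "depends_on": [{"task_key": "ingest_bronze"}],  # runs only after ingest succeeds
            "notebook_task": {"notebook_path": "/pipelines/load_snowflake"},
            "existing_cluster_id": "<cluster_id>",
        },
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    "https://<workspace>.azuredatabricks.net/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=job_spec,
)
resp.raise_for_status()
```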
Snowflake Integration: Design and implement efficient data-loading mechanisms into Snowflake, leveraging features such as Snowpipe and bulk loading via external stages on ADLS, and optimizing SQL queries.
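For instance, a hedged sketch of a bulk load from an external stage over ADLS using the Snowflake Python connector; the stage, table, and connection details are illustrative:

```python
# Sketch: bulk-loading Parquet files from an ADLS external stage into Snowflake.
# Stage name, table, and credentials are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="LOAD_WH", database="ANALYTICS", schema="CURATED",
)
cur = conn.cursor()
cur.execute("""
    COPY INTO orders
    FROM @adls_curated_stage/orders/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")
```

Snowpipe would be the continuous-ingestion counterpart, triggering the same COPY logic from file-arrival event notifications rather than on a schedule.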
Performance Tuning: Identify and resolve performance bottlenecks in Databricks/Spark jobs (e.g., cluster sizing, shuffle optimization) and within Snowflake (e.g., micro-partition pruning, clustering keys).
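On the Spark side, two of the most common levers look roughly like this; the tiny demo DataFrames stand in for a real fact/dimension pair:

```python
# Sketch: common Spark-side fixes for shuffle-heavy joins.
# The demo DataFrames are stand-ins for a large fact table and a small dimension.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-demo").getOrCreate()

# Right-size shuffle parallelism instead of the default 200 partitions.
spark.conf.set("spark.sql.shuffle.partitions", "64")

facts = spark.createDataFrame([(1, 100.0), (2, 250.0)], ["customer_id", "amount"])
dim_customers = spark.createDataFrame([(1, "EMEA"), (2, "APAC")], ["customer_id", "region"])

# Broadcast the small dimension so the large fact table is not shuffled.
result = facts.join(F.broadcast(dim_customers), "customer_id")
result.show()
```

On the Snowflake side, the analogous lever for large, frequently filtered tables is a clustering key (ALTER TABLE ... CLUSTER BY (col)), which improves micro-partition pruning.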
Data Quality: Establish comprehensive data quality checks and validation frameworks across integration layers to ensure accuracy and completeness of data moving from source to target.
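A minimal sketch of the layer-to-layer validation this implies; the paths and the order_id business key are assumptions:

```python
# Sketch: post-load reconciliation between Bronze and Silver Delta tables.
# Paths and the order_id key are hypothetical; assumes a Databricks `spark` session.
bronze = spark.read.format("delta").load(
    "abfss://bronze@<storage_account>.dfs.core.windows.net/sales/orders")
silver = spark.read.format("delta").load(
    "abfss://silver@<storage_account>.dfs.core.windows.net/sales/orders")

# Completeness: no rows dropped between layers.
assert bronze.count() == silver.count(), "row count mismatch between layers"

# Validity: the business key is populated and unique downstream.
assert silver.filter("order_id IS NULL").count() == 0, "null order_id in Silver"
assert silver.count() == silver.select("order_id").distinct().count(), "duplicate order_id"
```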
Security & Compliance: Ensure all data movement adheres to enterprise security standards, including encryption at rest and in transit, access control (RBAC in Azure/Snowflake), and compliance requirements.
Technical Leadership: Act as the subject matter expert and technical lead for all data integration projects.
Code Review: Perform rigorous code reviews for data engineering team members, ensuring adherence to best practices, coding standards, and performance benchmarks.
Documentation: Create and maintain detailed technical design specifications, architecture diagrams, and operational runbooks.
Qualifications
Bachelor's or Master's degree in Computer Science, Data Engineering, or a related quantitative field.
7+ years of progressive experience in Data Engineering, including at least 3 years focused on Data Integration Architecture.
Proven experience designing and deploying solutions in a Microsoft Azure/Databricks/Snowflake environment is mandatory.
Azure certification (e.g., Azure Data Engineer Associate - DP-203) is a significant plus.