Role Overview:
The Data Engineer will be a key builder on our AI journey, responsible for designing,
constructing, and maintaining the data infrastructure required to support our AI initiatives. This role will focus on building robust, scalable data pipelines that extract data from a variety of sources, integrate it with our data lake/warehouse, and prepare it for analysis by our Data Analysts and for training custom AI models. This position is critical to our near-term use of vendor-provided AI capabilities and to our longer-term goal of building custom solutions.
Key Responsibilities:
• Design, build, and maintain scalable and efficient ETL/ELT data pipelines that ingest data from internal and external sources (e.g., APIs from EPIC and Workday, relational databases, flat files) into our data lake and data warehouse, ensuring data is clean, accessible, and ready for analysis and model training (see the sketch after this list).
• Collaborate with the Data Analyst and other stakeholders to understand their data
requirements and provide them with clean, well-structured datasets.
• Implement data governance, security, and quality controls to ensure data integrity
and compliance.
• Automate data ingestion, transformation, and validation processes.
• Work with our broader IT team to ensure seamless integration of data infrastructure
with existing systems.
• Contribute to the evaluation and implementation of new data technologies and
tools.
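
For illustration only, the sketch below shows, in Python, the kind of ETL pipeline described in the first responsibility: extract records from a REST API, apply basic validation, and load the result into a warehouse table. The endpoint, field names, and table are hypothetical placeholders, and sqlite3 stands in for the real data warehouse so the sketch stays self-contained.

    # Minimal ETL sketch (illustrative; endpoint, fields, and table are hypothetical).
    import sqlite3
    import requests

    API_URL = "https://api.example.com/v1/encounters"  # placeholder endpoint

    def extract(url: str) -> list[dict]:
        # Pull raw records; fail fast on HTTP errors.
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def transform(records: list[dict]) -> list[tuple]:
        # Basic validation: drop records missing an id, normalize whitespace.
        return [
            (r["id"], (r.get("patient_name") or "").strip(), r.get("dept"))
            for r in records
            if r.get("id") is not None
        ]

    def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
        # sqlite3 stands in for the warehouse to keep the sketch runnable.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS encounters "
            "(id INTEGER PRIMARY KEY, patient_name TEXT, dept TEXT)"
        )
        conn.executemany("INSERT OR REPLACE INTO encounters VALUES (?, ?, ?)", rows)
        conn.commit()

    if __name__ == "__main__":
        with sqlite3.connect("warehouse.db") as conn:
            load(transform(extract(API_URL)), conn)

In practice this logic would typically live in an orchestrated tool such as Azure Data Factory or an IICS mapping rather than a standalone script.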
Required Skills & Qualifications:
• ETL/ELT Development: Strong experience in designing and building data pipelines
using ETL/ELT tools and frameworks.
• SQL: Advanced proficiency in SQL for data manipulation, transformation, and
optimization.
• EPIC & Informatica: Experience with EPIC EMR and Informatica is required.
• Programming: Strong programming skills in Python (or a similar language) for
scripting, automation, and data processing.
• Data Warehousing: Experience with data warehousing concepts and technologies.
• Cloud Computing: Hands-on experience with at least one major cloud platform's
data services (e.g., Microsoft Azure Data Factory, Microsoft Fabric, Informatica Intelligent Cloud Services (IICS)).
• Version Control: Proficiency with Git for code management and collaboration.
• Problem-Solving: Proven ability to troubleshoot and resolve data pipeline issues.
• Data Modeling: Experience with various data modeling techniques (e.g., dimensional modeling; an illustrative star-schema query appears after this list).
• Real-time Processing: Familiarity with real-time data streaming technologies (e.g.,
Kafka, Azure Event Hubs).
• Education: Bachelor's degree in Computer Science, Engineering, or related field.
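
As a rough gauge of the SQL and dimensional-modeling depth expected, a candidate should be comfortable writing queries like the illustrative star-schema example below. The fact_encounters and dim_department tables are hypothetical, not our actual schema, and sqlite3 appears only so the sketch is self-contained; any DB-API warehouse connection would work the same way.

    # Illustrative star-schema query: monthly encounter volume per department.
    import sqlite3

    MONTHLY_VOLUME_SQL = """
    SELECT d.department_name,
           strftime('%Y-%m', f.encounter_date) AS month,
           COUNT(*)                            AS encounter_count
    FROM fact_encounters f
    JOIN dim_department d ON d.department_key = f.department_key
    GROUP BY d.department_name, month
    ORDER BY month, encounter_count DESC;
    """

    def monthly_volume(conn: sqlite3.Connection) -> list[tuple]:
        # Assumes the fact and dimension tables already exist in the warehouse.
        return conn.execute(MONTHLY_VOLUME_SQL).fetchall()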
Nice-to-Have Skills:
• API Integration: Experience building data connectors and integrating with APIs from major enterprise systems (e.g., EPIC, Workday); a connector sketch follows this list.
• CI/CD: Knowledge of Continuous Integration/Continuous Deployment practices for
data pipelines.
• MLOps: A basic understanding of the machine learning lifecycle and of how to
build data pipelines that support model training and deployment.
• Microsoft Fabric: Direct experience with Microsoft Fabric's
integrated data platform (OneLake, Data Factory, Synapse Data Engineering).
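
To make the API-integration expectation concrete, below is a hedged Python sketch of a paginated connector of the kind used against enterprise systems. The endpoint, bearer-token auth, and limit/offset paging are assumptions for illustration; the real EPIC and Workday APIs have their own auth schemes and paging conventions.

    # Illustrative paginated API connector (endpoint, auth, and paging are hypothetical).
    from typing import Iterator

    import requests

    def fetch_all(base_url: str, token: str, page_size: int = 500) -> Iterator[dict]:
        # Yield records page by page until the API returns an empty batch.
        session = requests.Session()
        session.headers["Authorization"] = f"Bearer {token}"
        offset = 0
        while True:
            resp = session.get(
                base_url,
                params={"limit": page_size, "offset": offset},
                timeout=30,
            )
            resp.raise_for_status()
            batch = resp.json().get("results", [])
            if not batch:
                return
            yield from batch
            offset += page_size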