Industry: Data Services Partner
Job Type: Permanent
Job Location: Makati
Work Setup: Flexible
Experience Level: Senior
Responsibilities
- Design and implement data processing systems using distributed frameworks and platforms such as Hadoop, Spark, Snowflake, Airflow, or similar technologies.
- Build data pipelines to ingest data from various sources such as databases, APIs, or streaming platforms.
- Integrate and transform data to ensure its compatibility with the target data model or format.
- Design and optimize data storage architectures, including data lakes, data warehouses, or distributed file systems.
- Implement techniques like partitioning, compression, or indexing to optimize data storage and retrieval (a minimal illustrative sketch follows this list).
- Identify and resolve bottlenecks, tune queries, and implement caching strategies to enhance data retrieval speed and overall system efficiency.
- Design and implement data models that support efficient data storage, retrieval, and analysis.
- Collaborate with data scientists and analysts to understand their requirements and provide them with well-structured and optimized data for analysis and modeling purposes.
- Utilize frameworks like Hadoop or Spark to perform distributed computing tasks, such as parallel processing, distributed data processing, or machine learning algorithms.
- Implement security measures to protect sensitive data and ensure compliance with data privacy regulations.
- Establish data governance practices to maintain data integrity, quality, and consistency.
- Monitor system performance, identify anomalies, and conduct root cause analysis to ensure smooth and uninterrupted data operations.
- Communicate complex technical concepts to non-technical stakeholders in a clear and concise manner.
- Stay updated with emerging technologies, tools, and techniques in the field of big data engineering.
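For illustration only, the snippet below is a minimal PySpark sketch of the storage-optimization work described above (partitioned, compressed Parquet writes); the bucket paths, table, and column names are hypothetical placeholders, not part of the role's actual environment.

```python
# Minimal PySpark sketch: ingest raw CSV, clean it, and write partitioned,
# compressed Parquet. All paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_ingest").getOrCreate()

raw = (spark.read
       .option("header", True)
       .csv("s3://example-bucket/raw/orders/"))   # hypothetical source path

cleaned = (raw
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .withColumn("order_date", F.to_date("order_ts"))
           .dropDuplicates(["order_id"]))

(cleaned.write
 .mode("overwrite")
 .partitionBy("order_date")          # partitioning enables partition pruning on date filters
 .option("compression", "snappy")    # compression reduces storage footprint and I/O
 .parquet("s3://example-bucket/curated/orders/"))  # hypothetical target path
```

Partitioning by a commonly filtered column such as a date lets downstream queries skip irrelevant files, while columnar compression keeps storage and scan costs down.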
Requirements
- Strong analytical thinking and problem-solving skills.
- Strong communication skills, with the ability to translate technical details for business and non-technical stakeholders.
- Extensive experience designing and building data pipelines (ELT/ETL) for large-scale datasets. Familiarity with tools like Databricks, Apache NiFi, Apache Airflow, or Informatica is advantageous (a minimal orchestration sketch follows this list).
- Proficiency in programming languages such as Python, R, or Scala is essential.
- In-depth knowledge and experience with distributed systems and technologies, including on-premises platforms, Apache Hadoop, Spark, Hive, or similar frameworks. Familiarity with cloud-based platforms like AWS, Azure, or Google Cloud is highly desirable.
- Solid understanding of data processing techniques such as batch processing, real-time streaming, and data integration. Experience with data analytics tools and frameworks like Apache Kafka, Apache Flink, or Apache Storm is a plus.
- Experience with Azure data services, specifically Azure Databricks and Azure Data Factory.
- Experience with Git repository maintenance and DevOps concepts.
- Familiarity with build, test, and deployment processes.
- Additional certifications in big data technologies or cloud platforms are advantageous.
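For illustration only, the snippet below is a minimal Apache Airflow sketch of the kind of ELT orchestration referenced above, assuming Airflow 2.4+ with the TaskFlow API; the DAG name, schedule, and task bodies are hypothetical placeholders rather than a real pipeline.

```python
# Minimal Airflow TaskFlow sketch of a daily ELT pipeline (assumes Airflow 2.4+).
# Task bodies are placeholders; a real pipeline would use provider hooks/operators.
from datetime import datetime
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_orders_elt():

    @task
    def extract() -> list[dict]:
        # Pull rows from a hypothetical source API or database.
        return [{"order_id": 1, "amount": 100.0}]

    @task
    def load(rows: list[dict]) -> None:
        # Load the extracted rows into the target warehouse before
        # in-warehouse transforms run.
        print(f"loading {len(rows)} rows")

    load(extract())


daily_orders_elt()
```

Chaining the tasks through their return values is what gives Airflow the dependency graph; scheduling, retries, and monitoring then come from the orchestrator rather than hand-rolled scripts.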