● Analyze, design, develop and maintain data pipelines.
● Identify manual and repetitive processes and automate them with tools such as Airflow and Spark (see the Airflow sketch after this list).
● Design the infrastructure required for optimal ETL from a variety of sources, including log files and SQL and NoSQL databases.
● Collaborate with various teams such as product, design and data analysis to provide the necessary infrastructure.
● Implement all stacks with Docker-based tools.
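For illustration, the automation work above might look like the following minimal Apache Airflow DAG sketch. The DAG id, task names, schedule, and extract/load bodies are hypothetical placeholders, and Airflow 2.4+ is assumed.

```python
# Minimal sketch of a daily ETL DAG (assumes Airflow 2.4+; all names
# and the extract/load bodies are hypothetical placeholders).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_logs():
    """Placeholder: pull raw log files from the source system."""


def load_to_warehouse():
    """Placeholder: load transformed records into the warehouse."""


with DAG(
    dag_id="daily_log_etl",           # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                # replaces a manual daily job
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_logs", python_callable=extract_logs)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    extract >> load  # extract must finish before load starts
```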
Requirements
● In-depth knowledge of SQL and at least one of Python, Scala, or Java.
● Experience working with stream processing tools such as ksqlDB (formerly KSQL) and Kafka Streams (see the ksqlDB sketch after this list).
● Familiarity with the Apache big data ecosystem and hands-on experience with tools such as Hadoop, Spark, and Kafka.
● Experience working with SQL and NoSQL databases (PostgreSQL, MongoDB, etc.).
● Experience working with Kafka Connect and its connectors for SQL and NoSQL databases (see the Kafka Connect sketch after this list).
● Familiarity with workflow automation tools such as Apache Airflow and Apache NiFi.
● Teamwork skills and familiarity with Agile methodology.
● Ability to design and develop ML infrastructure and services.
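As an illustration of the stream-processing requirement, here is a minimal sketch of declaring a ksqlDB stream over a Kafka topic through the ksqlDB REST API. The server address, topic name, and schema are hypothetical.

```python
# Minimal sketch: declare a ksqlDB stream over an existing Kafka topic
# via the ksqlDB REST API (server address, topic, and schema are
# hypothetical).
import requests

KSQLDB_URL = "http://localhost:8088/ksql"  # assumed ksqlDB server address

statement = """
CREATE STREAM pageviews (
    user_id VARCHAR,
    page VARCHAR,
    viewed_at BIGINT
) WITH (
    KAFKA_TOPIC = 'pageviews',
    VALUE_FORMAT = 'JSON'
);
"""

resp = requests.post(KSQLDB_URL, json={"ksql": statement, "streamsProperties": {}})
resp.raise_for_status()
print(resp.json())  # ksqlDB reports the status of each executed statement
```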
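And for the Kafka Connect requirement, a minimal sketch of registering a JDBC source connector for PostgreSQL through the Connect REST API. The worker address, connector name, and connection settings are hypothetical, and the Confluent JDBC connector is assumed to be installed on the worker.

```python
# Minimal sketch: register a JDBC source connector through the Kafka
# Connect REST API (worker address, connector name, and connection
# settings are hypothetical).
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # assumed Connect worker

connector = {
    "name": "postgres-orders-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db:5432/shop",  # hypothetical DSN
        "connection.user": "etl",
        "connection.password": "secret",
        "table.whitelist": "orders",     # stream only the orders table
        "mode": "incrementing",          # track new rows by a growing id column
        "incrementing.column.name": "id",
        "topic.prefix": "pg.",           # rows land on the topic "pg.orders"
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print("Created connector:", resp.json()["name"])
```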