Data Engineer - Spark/Python, Uncap Research Labs

  • Company name: Uncap Research Labs
  • Working location: Office
  • Job type: Full Time

Experience: 4 years required

Pay: Salary information not included

Type: Full Time

Location: Haryana

Skills: Apache Spark, Apache Kafka, Languages, Tools, Spark SQL, Spark Streaming, Apache Cassandra, Data Volume Management, Feature Engineering, Batch, Realtime Processing, Pipeline Integration, System Optimization, Scalable Data Services, Advanced Knowledge

About Uncap Research Labs

Job Description

  • Apache Spark: Expertise in Spark for large-scale data processing and analytics, including Spark SQL and Spark Streaming.
  • Apache Kafka: Proficiency in Kafka for real-time data streaming, message brokering, and building event-driven architectures.
  • Apache Cassandra: In-depth knowledge of Cassandra for scalable, high-performance distributed database management.
  • Data Volume Management: Experience in efficiently managing and processing large volumes of data (e.g., 1.8 TB daily).
  • Feature Engineering: Ability to build feature engineering pipelines that integrate and transform data from diverse sources.
  • Batch and Real-Time Processing: Proficiency in developing and deploying production data pipelines for both batch and real-time processing.
  • Pipeline Integration: Expertise in integrating data pipelines with various data sources and ensuring smooth data flow.
  • System Optimization: Experience in performance engineering for big data systems, including optimization techniques for both data processing and storage.
  • Scalable Data Services: Ability to architect and design scalable data services that efficiently handle high user demand and large data volumes.
  • Advanced Knowledge: Strong understanding of data structures and algorithms relevant to backend development and big data processing.
  • Languages: Proficiency in languages commonly used for backend development and data processing, such as Java, Scala, or Python.
  • Tools: Familiarity with version control systems (e.g., Git) and deployment processes for managing code and infrastructure changes.

(ref: hirist.tech)