Data Science Engineer - IV, Voice & Speech
Tanla Platforms Limited
Experience: 5 years required
Pay: Salary information not included
Type: Full Time
Location: Hyderabad
Skills: ASR, TTS, Deep Learning, WebRTC, transformers, Docker, Kubernetes, AWS, Azure, GCP, CUDA, Voice AI, ML Engineer, Speaker Diarization, wake word detection, audio processing pipelines, NLP technology, real-time transcription, VITS, Tacotron, FastSpeech, speaker identification, audio streaming pipelines, WebSocket, gRPC, Twilio Media Streams, IVR automation, AI contact center platforms, voice activity detection, GANs, VAEs, Diffusion Models, text normalization, G2P mapping, NLP intent extraction, emotion/prosody control, modular system development, Cloud Deployment, ONNX, TorchScript, Model deployment
Job Description
We are looking for a highly experienced Voice AI/ML Engineer to take the lead in designing and deploying real-time voice intelligence systems. The role covers ASR, TTS, speaker diarization, and wake word detection, along with production-grade modular audio processing pipelines that support next-generation contact center solutions, intelligent voice agents, and high-quality audio systems. You will work at the convergence of deep learning, streaming infrastructure, and speech/NLP technology, with a focus on scalable, low-latency systems that handle diverse audio formats and real-world applications.

Your responsibilities will include:
- Building, fine-tuning, and deploying ASR models such as Whisper, wav2vec 2.0, and Conformer for real-time transcription.
- Developing high-quality TTS systems using VITS, Tacotron, and FastSpeech for natural-sounding voice generation.
- Implementing speaker diarization to segment and identify speakers in multi-party conversations using embeddings and clustering techniques.
- Designing wake word detection models with ultra-low latency and high accuracy, even in noisy conditions.

In addition, you will be involved in:
- Architecting bi-directional real-time audio streaming pipelines using WebSocket, gRPC, Twilio Media Streams, or WebRTC.
- Integrating voice AI models into live voice agent solutions, IVR automation, and AI contact center platforms.
- Building scalable microservices for audio processing, encoding, and streaming across various codecs and containers.
- Leveraging deep learning and NLP techniques for speech and language tasks.

Furthermore, you will be responsible for:
- Developing reusable modules for different voice tasks and system components.
- Designing APIs and interfaces for orchestrating voice tasks across multi-stage pipelines.
- Writing efficient Python code, optimizing models for real-time inference, and deploying them on cloud platforms.
Join us at Tanla for impactful work, tremendous growth opportunities, and an innovative environment where diversity is championed and inclusivity is valued.