Remote Machine Learning Engineer (Voice Cloning and Speech Synthesis) at Factored

Description:

Factored is seeking an experienced Machine Learning Engineer with expertise in text-to-speech (TTS) models and voice cloning technologies.
The role involves developing and optimizing ML models that improve the experience of voice actors who generate content in multiple languages.
Responsibilities include designing, developing, and optimizing TTS models while preserving the style and authenticity of the original voice actors.
The engineer will implement real-time, scalable voice cloning systems with inference times under one second.
Collaboration with teams on audio datasets, including voice recordings and multilingual transcriptions, is essential.
The position requires experimentation with models like StyleDiffusion and exploring advanced approaches for realistic speech synthesis.
A key task is scaling systems so they perform reliably for millions of users in high-demand scenarios.
The engineer will handle audio data preparation, including splitting, upsampling/downsampling, and file management, using tools such as Whisper.
Integration of models into a cloud environment (e.g., AWS) for deployment and monitoring is also part of the role.

Candidates must have strong proficiency in Python and experience with machine learning frameworks such as TensorFlow or PyTorch.
Proven expertise in speech synthesis and TTS technologies, with a focus on realistic, human-like output, is required.
Experience with voice cloning and familiarity with models like StyleDiffusion or similar is necessary.
The ability to deliver real-time solutions that perform reliably in production environments is essential.
Experience working with audio datasets, including data preprocessing, splitting, upsampling/downsampling, and file management, is required.
Familiarity with multilingual models and working with transcriptions in multiple languages is expected.
Proficiency in cloud platforms like AWS and experience deploying machine learning models in production environments is necessary.
Experience with Whisper or similar tools for handling audio datasets is required.
Knowledge of traditional ML techniques, such as gradient boosting (e.g., XGBoost) for model optimization, is a plus.

Factored offers a transparent workplace where every employee has a voice in building the company.
The company is committed to investing in employees' careers and professional growth in meaningful ways.
Employees work alongside passionate and intelligent colleagues in a collaborative environment.
The company values honesty, diligence, and kindness, creating a positive workplace culture.
Employees have opportunities for learning and growth based on merit, not just experience.
Factored promotes a fun and engaging work environment, with activities such as making music, playing sports, and hosting parties.