Fusemachines is a leading provider of AI strategy, talent, and education services, founded by Dr. Sameer Maskey.
The company aims to democratize AI and has a presence in four countries: Nepal, the United States, Canada, and the Dominican Republic.
This is a remote role offered as a 6-month contract, with the potential for full-time employment directly with the client afterward.
As a Data Scientist, you will contribute to new product development in a collaborative, small-team environment.
Your responsibilities will include writing production code for both run-time and build-time applications.
You will design and implement data-driven solutions for complex business challenges by working with large-scale natural language datasets.
The role involves prototyping new ideas and collaborating with data scientists, product designers, data engineers, front-end developers, and domain experts.
You will work in a fast-paced, start-up-like culture while leveraging the resources and scale of an established company.
Requirements:
You must have practical experience with large language models (LLMs), including prompt engineering, fine-tuning, building RAG-based applications, and benchmarking, using frameworks like LangChain.
A strong background in natural language processing (NLP) is required, with experience using tools such as spaCy, word2vec, Flair, and BERT.
Formal training in machine learning is necessary, including knowledge of dimensionality reduction, clustering, embeddings, and sequence classification algorithms.
Proficiency in Python and experience with ML frameworks like PyTorch, TensorFlow, and Hugging Face Transformers are essential.
You should have experience with cloud platforms such as AWS, GCP, or Azure.
An understanding of data modeling principles and complex data architectures is required.
Experience working with relational and NoSQL databases and vector stores (e.g., MySQL, Postgres, Solr, Elasticsearch, OpenSearch) is necessary.
Familiarity with distributed computing frameworks like Spark or Ray, and with Scala, is highly preferred.
Knowledge of API development, containerization (Docker, Kubernetes), and ML deployment is highly preferred.
Hands-on experience with MLOps/AIOps, including experiment tracking tools like Langfuse and DVC, is required.
A Master's degree in Data Science, Computer Science, Statistics, Machine Learning, or a related field is preferred.
You should have at least 5 years of relevant work experience.
Benefits:
The position offers the opportunity to work remotely, providing flexibility in your work environment.
You will have the chance to contribute to innovative projects in a collaborative team setting.
The role allows for professional growth and the potential for full-time employment after the initial contract period.
You will be part of a diverse and inclusive workplace that values contributions from all qualified individuals.