The LLM & RAG Solutions Architect at BlackStone eIT will be responsible for designing and implementing solutions that leverage Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques.
This role focuses on creating innovative solutions that enhance data retrieval, natural language processing, and information delivery for clients.
Responsibilities include developing architectures that incorporate LLM and RAG technologies to improve client solutions.
The architect will collaborate with data scientists, engineers, and business stakeholders to understand requirements and translate them into effective technical solutions.
They will design and implement workflows that integrate LLMs with existing data sources for enhanced information retrieval.
The role involves evaluating and selecting appropriate tools and frameworks for building and deploying LLM and RAG solutions.
Conducting research on emerging trends in LLMs and RAG to inform architectural decisions is also a key responsibility.
The architect must ensure the scalability, security, and performance of LLM and RAG implementations.
Providing technical leadership and mentorship to development teams in LLM and RAG best practices is expected.
They will develop and maintain comprehensive documentation on solution architectures, workflows, and processes.
Engaging with clients to communicate technical strategies and educate them on the benefits of LLM and RAG is essential.
Monitoring and troubleshooting implementations to ensure optimal operation and to address issues as they arise is also part of the role.
Requirements:
Proven, hands-on experience in multi-agent chatbot architectures is required, including designing and implementing multi-agent conversational systems that support scalable, modular interaction handling.
The candidate must have demonstrated capability in deploying and integrating large language models (LLMs) in on-premise environments, ensuring data security and compliance.
Prior experience in successfully implementing RAG pipelines is necessary, including knowledge of embedding strategies, vector databases, document chunking, and query optimization.
A deep understanding of optimizing RAG systems for performance and relevance is required, including latency reduction, caching strategies, embedding quality improvements, and hybrid retrieval techniques.
Familiarity with open-source LLMs (e.g., LLaMA, Qwen, Mistral, Falcon) is preferred but not mandatory.
Experience with vector databases and similarity-search libraries such as FAISS, Weaviate, Qdrant, etc., is also preferred.
Knowledge of workflow orchestration using frameworks like LangChain, LlamaIndex, Haystack, etc., is a plus.
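For candidates less familiar with the RAG components named above (document chunking, embeddings, and vector retrieval), the core idea can be sketched in a few lines. This toy example uses bag-of-words count vectors in place of learned embeddings and a linear scan in place of a vector database; both are simplifying assumptions made so the sketch runs without external libraries, not how a production pipeline would be built.

```python
# Toy RAG retrieval sketch: chunking, embedding, and similarity search.
# Bag-of-words vectors stand in for learned embeddings (an assumption
# for illustration only; real pipelines use embedding models and a
# vector database such as FAISS, Weaviate, or Qdrant).
import math
from collections import Counter

def chunk(text, size=8):
    """Split text into fixed-size word chunks (a simple chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Embed text as a word-count vector (placeholder for a real model)."""
    return Counter(w.strip(".,").lower() for w in text.split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=1):
    """Rank chunks by similarity to the query and return the best matches."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

corpus = ("RAG systems retrieve relevant documents before generation. "
          "Vector databases store embeddings for fast similarity search. "
          "Chunking splits long documents into retrievable passages.")
chunks = chunk(corpus)
print(retrieve("vector database similarity search", chunks))
```

In a real deployment the retrieved chunks would be prepended to the LLM prompt as grounding context; the orchestration frameworks named above (LangChain, LlamaIndex, Haystack) wrap exactly this retrieve-then-generate flow.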
Benefits:
The position offers paid time off to support work-life balance.
A performance bonus is available to reward outstanding contributions.
Opportunities for training and development are provided to enhance professional growth.