Please, let Extreme Networks know you found this job
on RemoteYeah.
This helps us grow 🌱.
Description:
We are seeking a talented Edge AI Staff Engineer with specialized expertise in GPU/TPU acceleration to join our team.
The ideal candidate will have extensive hands-on experience in local Large Language Models (LLM) inference with embedded GPU/TPU architectures.
As a Staff Engineer specializing in Edge AI, you will play a crucial role in shaping the future of our Edge AI solution, leveraging the power of GPU/TPU acceleration and enterprise-grade, large-scale edge compute.
The successful candidate will combine technical excellence with effective leadership, creating a positive impact on both projects and team dynamics.
Key responsibilities include influencing the Edge AI strategy by providing expert advice on design and architecture, making critical decisions regarding technical directions, scalability, and system performance.
You will develop and optimize AI inference models for deployment on edge devices with embedded GPU/TPU accelerators, focusing on local Large Language Model (LLM) inference.
Implement and fine-tune low-latency model inference pipelines to meet real-time performance requirements.
Collaborate with cross-functional teams to integrate AI inference solutions into edge computing platforms and applications.
Work with the GPU Hardware Design Team to design and optimize GPUs that power next-generation devices.
Conduct performance profiling and optimization to maximize the efficiency of GPU/TPU acceleration for local LLM inference.
Stay current with advancements in GPU/TPU technologies and edge AI frameworks, incorporating them into solution designs as appropriate.
Provide technical expertise and support to project teams, ensuring successful implementation and deployment of edge AI solutions.
Lead and inspire a team of engineers, providing guidance, setting goals, and ensuring collaboration.
Oversee project planning, execution, and delivery, ensuring alignment with business objectives.
Manage all phases of technical projects, from conception to completion, developing project specifications, tracking progress, and controlling costs.
Foster a positive work environment, encouraging professional growth and knowledge sharing.
Requirements:
A Bachelor’s degree in Computer Science, Engineering, or a related field is required; a Master’s degree is preferred.
A minimum of 5 years of hands-on experience in AI model development and deployment, with a focus on edge computing and local LLM inference, is required.
Strong programming skills in languages such as Python and C++ are necessary.
Proficiency in LLM serving frameworks (e.g., vLLM, Text Generation Inference, OpenLLM, Ray Serve, and Hugging Face Transformers) and deep learning libraries is required.
Extensive experience with GPU/TPU acceleration for AI inference, including parallelism techniques (tensor, pipeline, data, and sharded data parallelism) and performance tuning, is essential.
Hands-on experience with one or more GPU compute frameworks (CUDA, Vulkan, or OpenCL) is required.
Deep knowledge of GPU memory layout and familiarity with NVIDIA Jetson, ARM Mali, or relevant SoC configurations is necessary.
Knowledge of parallel computation, memory scheduling, and structural optimization is required.
Excellent problem-solving and analytical skills, along with a passion for innovation and continuous learning, are essential.
Benefits:
Join a company that values inclusion and fosters an atmosphere where all employees thrive because of their differences.
Be part of a global networking leader that is well-positioned to deliver scalable outcomes and accelerate digital transformation efforts.
Work in a positive environment that encourages professional growth and knowledge sharing.
Opportunity to influence the Edge AI strategy and work on cutting-edge technologies in GPU/TPU acceleration.
Engage in a role that combines technical excellence with effective leadership, impacting both projects and team dynamics.
Apply now