Posted on Nov 4, 01:28

DL Performance Software Engineer - LLM Inference

Santa Clara, United States of America
Remote

Hi everyone — I’m Akbar, Senior Manager of Deep Learning Inference Software at NVIDIA. I lead our engineering efforts around vLLM and SGLang, two of the most widely used open-source LLM inference frameworks.

We’re building teams focused on making LLM inference faster, more efficient, and more reliable at scale — from runtime and scheduling optimizations to kernel fusion, distributed serving, and continuous integration across new GPU architectures (Hopper, Blackwell, etc.).

We’re hiring for multiple roles:

  • Senior Deep Learning Software Engineer, Inference (Apply Here)
  • Engineering Manager, Deep Learning Inference (Apply Here)
  • DL Performance Software Engineer - LLM Inference (Apply Here)

These roles are remote-friendly (North America preferred) and fully focused on upstream open-source development — working directly with the maintainers and the wider AI community.

If you’re excited about large-scale inference, compiler/runtime performance, and pushing GPUs to their limits, we’d love to talk.