Salla announces an opening for a Senior SRE Engineer (MLOps) in Makkah

Senior SRE Engineer (MLOps) - AI

🏢 Salla

🕒 Posted: 17 June 2026 📍 Makkah · Engineering and Technology jobs

Job details

Salla is looking for a Senior SRE Engineer (MLOps) to join the Salla AI team in Makkah.

About the role

This role focuses on running AI and machine learning systems as real production systems, not side experiments, by owning the operational layer around models, prompts, agents, inference services, and retrieval systems. You will enable generative and agentic AI features to operate reliably, securely, and cost-effectively at scale within the Salla ecosystem. The role requires strong platform-engineering and SRE experience, with an emphasis on reliability, observability, safe releases, cost, and governance, and close collaboration with engineering, data, and AI teams to provide a fast, safe path to production. AI systems fail differently from ordinary services: a prompt change can behave like a code change, agents that call tools need auditability, and latency, quality, and cost can move together in uncomfortable ways.

Tasks and responsibilities

Own reliability for ML and agentic AI services in production - SLOs, dashboards, alerts, runbooks, and incident follow-ups.
Build end-to-end observability across the AI stack - latency, errors, traces, tool calls, cost, and user impact.
Design safe-release patterns for models, prompts, agents, tools, and configuration, including canary, rollback, feature-flag, and evaluation-gate strategies.
Provide operational support for inference APIs, queues, retrieval layers, and AI workflows running on Kubernetes/EKS.
Establish ownership, traceability, and guardrails around what agentic systems (such as Sidekick and the growth advisor) are allowed to do, including how they call internal tools.
Defend agent tool-calling against prompt injection and untrusted-data risks - establish and enforce data-trust boundaries so that untrusted store/merchant content cannot manipulate agent decisions, tool calls, or actions.
Drive AI cost governance - per-model and per-pod spend visibility, token-cost tracking, and anomaly alerting.
Build automation and self-service paths so that product teams have a known safe path to production instead of rebuilding it each time.
Turn recurring operational pain into simple, reusable platform standards that other teams adopt.
Participate in architecture discussions, code reviews, and technical decision-making.

Requirements

4+ years in SRE, platform engineering, DevOps, or production infrastructure, operating distributed systems in production and not only in demos.
Hands-on experience with Kubernetes and cloud-native systems in production.
Familiarity with deploying machine learning projects.
Strong command of CI/CD, GitOps, observability, and incident response.
Solid experience with infrastructure-as-code, secrets management, and networking.
Ability to write automation or platform tooling in Python or a similar language.
Production judgment - knowing how to make systems measurable, debuggable, repeatable, and safe to change (you do not need to be a machine learning researcher).
Ability to work across teams, explain trade-offs clearly, and turn operational pain into standards that engineers will actually use.

Preferred skills (nice to have)

Experience with MLOps or ML platforms - model serving, registries, evaluation, feature/data dependencies, drift monitoring, or ML pipelines.
Familiarity with LLM applications or agentic systems - RAG, vector databases, tool calling, workflow orchestration, memory, traces, guardrails, or evaluation pipelines.
Exposure to tooling such as OpenTelemetry, Prometheus, Grafana, MLflow, KServe, Ray, LiteLLM, vLLM, LangGraph, Arize Phoenix, or LangSmith.
Experience with Kafka consumers, GPU workloads, inference optimization, model routing, or AI cost governance.
Experience working in cross-functional product teams that include AI, backend, and frontend engineers.

Original text of the posting

Description

Salla is looking for a Senior SRE Engineer (MLOps) to join our Salla AI team. This role focuses on running our AI and ML systems as real production systems, not side experiments - owning the operational layer around models, prompts, agents, inference services, and retrieval systems. You will be responsible for enabling Agentic AI and Generative AI features to operate reliably, securely, and cost-effectively at scale within the Salla ecosystem.

This role is SRE- and platform-engineering-first, with a strong emphasis on reliability, observability, safe releases, cost, and governance, while collaborating closely with engineering, data, and AI teams to give every pod a fast, safe path to production. It exists because AI systems fail differently from normal services - a prompt change can behave like a code change, an agent calling tools needs auditability, and latency, quality, and cost can move together in uncomfortable ways.

Key Responsibilities

Own reliability for ML and agentic AI services in production - SLOs, dashboards, alerts, runbooks, and incident follow-ups
Build observability across the AI stack - latency, errors, traces, tool calls, cost, and user impact
Design safe-release patterns for models, prompts, agents, tools, and configuration, including canary, rollback, feature-flag, and evaluation-gate strategies
Provide operational support for inference APIs, queues, retrieval layers, and AI workflows running on Kubernetes/EKS
Establish ownership, traceability, and guardrails around what agentic systems (e.g. Sidekick, the growth advisor) are allowed to do, including how they call internal tools
Defend agent tool-calling against prompt injection and untrusted-data risks - establish and enforce data-trust boundaries so that untrusted store/merchant content cannot manipulate agent decisions, tool calls, or actions
Drive AI cost governance - per-model and per-pod spend visibility, token-cost tracking, and anomaly alerting
Build automation and self-service paths so product teams have a known safe path to production instead of rebuilding it each time
Turn recurring operational pain into simple, reusable platform standards that other teams adopt
Participate in architecture discussions, code reviews, and technical decision-making

Requirements

4+ years in SRE, platform engineering, DevOps, or production infrastructure, operating distributed systems in production - not only in demos
Hands-on experience with Kubernetes and cloud-native systems in production
Familiarity with deploying ML projects
Strong command of CI/CD, GitOps, observability, and incident response
Solid experience with infrastructure-as-code, secrets management, and networking
Ability to write automation or platform tooling in Python, or a similar language
Production judgment - knowing how to make systems measurable, debuggable, repeatable, and safe to change (you do not need to be a machine learning researcher)
Ability to work across teams, explain trade-offs clearly, and turn operational pain into standards engineers will actually use

Nice to have:

Experience with MLOps or ML platforms - model serving, registries, evaluation, feature/data dependencies, drift monitoring, or ML pipelines
Familiarity with LLM applications or agentic systems - RAG, vector databases, tool calling, workflow orchestration, memory, traces, guardrails, or evaluation pipelines
Exposure to tooling such as OpenTelemetry, Prometheus, Grafana, MLflow, KServe, Ray, LiteLLM, vLLM, LangGraph, Arize Phoenix, or LangSmith
Experience with Kafka consumers, GPU workloads, inference optimization, model routing, or AI cost governance
Experience working in cross-functional product teams involving AI, backend, and frontend engineers

Source: the employer's official website - added to this site on 17 June 2026
