First Principles for the MLOps Engineer
June 27, 2022

Taimur Rashid
Redis

Launching an airplane from an aircraft carrier is a systematic, well-coordinated process that involves reliable systems, high-performance catapults, precise navigation systems, and above all, a specialized crew with distinct roles and responsibilities for managing air operations. This crew, known as the flight deck crew, wears colored jerseys that visually distinguish their functions. Everyone associated with the flight deck has a specific job. Launching machine learning (ML) models into production is not entirely different, except that instead of launching a 45,000-pound plane into the air, ML teams are launching trained ML models into production to serve predictions.

There are several ways to categorize the function of taking trained ML models and launching them into production. One of them is MLOps engineering, which can be defined as the technical systems and processes associated with the stages of the ML lifecycle (also referred to as the MLOps cycle), from data preparation and model building to production deployment and management.

While MLOps engineering entails the provisioning, deployment, and management of infrastructure that enables model building, data labeling, and model inference, it can go much deeper than that. MLOps engineering can entail developing algorithms too.

Mature IT functions like data engineering, data preparation, and data quality all have corresponding personas that perform specific tasks, or in the frequently mentioned parlance, "Jobs to Be Done."

MLOps engineering also has a specific persona, and that is the MLOps Engineer. What do MLOps Engineers do?

Put simply, MLOps Engineers design, deploy, and operate the underlying systems (infrastructure) that allow data science teams to do their jobs, which include feature engineering, model training, model validation, and model refinement, to name a few. MLOps Engineers also automate the processes around those needs so that the work involved in launching ML models into production is streamlined, simplified, and instrumented.
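As a rough illustration of what "streamlined, simplified, and instrumented" can mean in practice, here is a minimal Python sketch of a pipeline runner that executes named stages in order and records how long each one takes. Everything in it is an assumption made for illustration: the stage names, the `run_pipeline` helper, and the toy one-parameter "model" are hypothetical and not drawn from any particular MLOps tool.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stage type: takes the pipeline state and returns an updated copy.
Stage = Callable[[dict], dict]


def run_pipeline(stages: List[Tuple[str, Stage]], state: dict) -> Tuple[dict, Dict[str, float]]:
    """Run stages in order and record wall-clock seconds spent in each."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        state = stage(state)
        timings[name] = time.perf_counter() - start
    return state, timings


def engineer_features(state: dict) -> dict:
    # Toy feature step: keep each (x, y) pair as-is.
    return {**state, "features": [(x, y) for x, y in state["raw"]]}


def train_model(state: dict) -> dict:
    # Toy "model": least-squares slope through the origin, y ~ w * x.
    pairs = state["features"]
    w = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return {**state, "weight": w}


def validate_model(state: dict) -> dict:
    # Mean absolute error on the training data (illustration only).
    w = state["weight"]
    pairs = state["features"]
    mae = sum(abs(y - w * x) for x, y in pairs) / len(pairs)
    return {**state, "mae": mae}


# Hypothetical stage names mirroring the tasks listed above.
PIPELINE: List[Tuple[str, Stage]] = [
    ("feature_engineering", engineer_features),
    ("model_training", train_model),
    ("model_validation", validate_model),
]

if __name__ == "__main__":
    result, timings = run_pipeline(PIPELINE, {"raw": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]})
    print("weight:", result["weight"], "mae:", result["mae"])
    print("seconds per stage:", timings)
```

In a real team the stages would call out to training and serving infrastructure, and the timings would feed a monitoring system rather than a print statement. The point of the sketch is only that each step is named, repeatable, and measured.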

As with any other IT role, the MLOps Engineer role spans a broad spectrum of functional tasks. Fundamentally, an MLOps Engineer fuses software engineering expertise with knowledge of machine learning.

While the number of tools, frameworks, and approaches continues to expand and evolve, certain skill sets are needed that transcend specific tools and frameworks. That’s why it’s important to ground the discussion in first principles. There is a core list of skill sets an MLOps Engineer needs to carry out specific tasks, though not all are required in every case: the tasks an MLOps Engineer undertakes are a function of the existing composition, size, and maturity of the broader ML team.

Some of these first principles, or core skill sets, include:

1. Programming experience

2. Data science knowledge

3. Familiarity with math and statistics

4. Problem-solving skills

5. Proficiency with machine learning and deep learning frameworks

6. Hands-on experience with prototyping

Related to these core skill sets are knowledge of and experience with programming languages, DevOps tools, and databases (relational, data warehousing, in-memory, etc.). A variety of online resources unpack the details of these skill sets, and the list continues to evolve as more companies mainstream ML across their teams.

While definitions are important, the industry is still early in defining MLOps engineering and in characterizing the roles and responsibilities of an MLOps Engineer. In the journey towards understanding this domain, and the associated education and learning paths to become an MLOps Engineer, it’s important not to be too dogmatic. By focusing on the Jobs to Be Done, and applying them to the context of the project, company process, and maturity of teams, companies can better structure and define the MLOps engineering crew that can launch ML models into production.

Taimur Rashid is Chief Business Development Officer at Redis.