ML Cloud Services Misuses

Catalog Proposal

Inefficient data transmission

Data Collection and Preprocessing

Refers to suboptimal data transmission between components of an ML service-based system, such as storage services, compute nodes, and other cloud resources. It results in increased latency, higher costs, and decreased performance.
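
One simple mitigation is to keep data stores in the same region as the compute that consumes them. The sketch below is a minimal illustration using boto3; the bucket name and region are assumptions.

    import boto3

    # Create the training-data bucket in the same region as the training
    # compute, avoiding cross-region transfer latency and egress charges.
    s3 = boto3.client("s3", region_name="eu-west-1")
    s3.create_bucket(
        Bucket="ml-training-data-eu",
        CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
    )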

Not using batch API for data loading

Data Collection and Preprocessing

Cloud providers offer batch-processing APIs to optimize data loading performance by handling data in batches. However, developers sometimes fail to use these batch APIs, opting instead to load data items individually or to implement their own batch-processing solutions. This misuse can lead to increased data transfer times, higher memory usage, and reduced overall performance.
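
The sketch below contrasts the two styles, using DynamoDB's batch_get_item as one example of a provider batch API; the table name, key schema, and item ids are assumptions.

    import boto3

    dynamodb = boto3.client("dynamodb")
    keys = [{"id": {"S": str(i)}} for i in range(100)]

    # Misuse: one request per data item.
    items = [dynamodb.get_item(TableName="training-data", Key=k).get("Item")
             for k in keys]

    # Better: the provider's batch API fetches up to 100 items per request.
    resp = dynamodb.batch_get_item(RequestItems={"training-data": {"Keys": keys}})
    items = resp["Responses"]["training-data"]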

Non-specification of early stopping criteria

Training

ML cloud services often provide options for setting early stopping criteria to prevent overfitting and reduce unnecessary computational costs. However, developers may leave these criteria unspecified, allowing training runs to continue longer than necessary.
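
As a hedged sketch of how such a criterion can be declared, the Azure ML Python SDK v1 exposes an enable_early_stopping flag on AutoMLConfig; the task, dataset, and column names below are assumptions, and other providers offer equivalent settings.

    from azureml.train.automl import AutoMLConfig

    automl_config = AutoMLConfig(
        task="classification",
        training_data=train_dataset,        # an existing TabularDataset
        label_column_name="churn",
        primary_metric="AUC_weighted",
        # Stop when the score is no longer improving instead of exhausting
        # the full iteration/time budget.
        enable_early_stopping=True,
        experiment_timeout_hours=1,
    )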

Avoiding parallel training experiments

Training

Cloud providers offer the capability to run parallel training experiments to speed up the model training process and improve the efficiency of the system. Disabling parallel experiments slows down model development and limits the exploration of different approaches with the same resources to find the best-performing model.
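
Using Azure AutoML again as an illustration, concurrency is enabled by pointing the run at a multi-node cluster and raising max_concurrent_iterations; the cluster and configuration values are assumptions.

    from azureml.train.automl import AutoMLConfig

    automl_config = AutoMLConfig(
        task="classification",
        training_data=train_dataset,
        label_column_name="churn",
        compute_target=cpu_cluster,      # an AmlCompute cluster with >= 4 nodes
        # Run several training experiments in parallel instead of one at a time.
        max_concurrent_iterations=4,
        iterations=40,
    )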

Not using automatic hyperparameter tuning

Training

ML cloud providers offer the capability to define a search space and automatically optimize an ML model's hyperparameters. However, developers may forgo this capability and rely on manually chosen or default hyperparameter values, limiting model performance.
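
A sketch with the SageMaker Python SDK's HyperparameterTuner is shown below; the estimator, metric name, ranges, and data channels are assumptions tied to the built-in XGBoost algorithm.

    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

    tuner = HyperparameterTuner(
        estimator=xgb_estimator,                 # an existing sagemaker Estimator
        objective_metric_name="validation:auc",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.3),
            "max_depth": IntegerParameter(3, 10),
        },
        max_jobs=20,
        max_parallel_jobs=4,
    )
    # The service searches the space automatically instead of relying on
    # hand-picked or default hyperparameter values.
    tuner.fit({"train": train_input, "validation": validation_input})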

Not using training checkpoints

Training

Cloud providers offer functionality to save the current state of a training experiment as checkpoints and to resume training from the most recent checkpoint rather than starting from scratch. This can save significant time and computational resources, especially when training large and complex models. However, developers may neglect to save training checkpoints in cloud storage.
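
A sketch of persisting checkpoints to cloud storage with the SageMaker Python SDK follows; the training image, role, bucket, and instance settings are assumptions.

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri=training_image,
        role=execution_role,
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        # Whatever the training script writes to the local checkpoint path is
        # synced to S3, so an interrupted run can resume instead of restarting.
        checkpoint_s3_uri="s3://my-bucket/experiments/churn/checkpoints/",
        checkpoint_local_path="/opt/ml/checkpoints",
        use_spot_instances=True,
        max_run=3600,
        max_wait=7200,
    )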

Bad choice of training compute targets

Training

Refers to selecting suboptimal hardware and compute resources for training ML models. Cloud providers offer various types of training compute targets, yet not all resources can be used for automated machine learning, machine learning pipelines, or the designer.
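
A sketch of provisioning an appropriate target with the Azure ML Python SDK v2 is shown below; the cluster name and VM size are assumptions, ml_client is an already-authenticated MLClient, and provider documentation lists which targets are supported for automated ML, pipelines, and the designer.

    from azure.ai.ml.entities import AmlCompute

    # A GPU-backed AmlCompute cluster, a target type usable by training jobs
    # and ML pipelines; a CPU-only size would be a poor fit for large deep
    # learning workloads.
    gpu_cluster = AmlCompute(
        name="gpu-cluster",
        size="Standard_NC6s_v3",
        min_instances=0,
        max_instances=4,
    )
    ml_client.compute.begin_create_or_update(gpu_cluster)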

Excluding algorithms in automated ML

Training

Cloud providers offer automated machine learning services that, for a given prediction task, perform experiments with various ML algorithms to generate an optimized model ready for deployment. However, developers may mistakenly exclude promising candidate algorithms when configuring these services, thereby limiting the effectiveness and performance of the resulting model.
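
A hedged sketch using Azure AutoML's blocked_models option (SDK v1) follows; the excluded model names are assumptions, and the point is that every exclusion should be deliberate and justified.

    from azureml.train.automl import AutoMLConfig

    automl_config = AutoMLConfig(
        task="classification",
        training_data=train_dataset,
        label_column_name="churn",
        # Every entry here removes a candidate from the search; excluding a
        # strong family such as gradient boosting can cap achievable accuracy.
        blocked_models=["XGBoostClassifier", "LightGBM"],
    )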

Misinterpreting output

Testing

ML services offer pre-built models that operate on high-dimensional continuous representations yet often ultimately produce a small discrete set of outputs. Consequently, ML services’ output can contain complicated, easily misinterpretable semantics, leading to bugs.
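
As one concrete case, Amazon Comprehend's detect_sentiment returns both a discrete label and per-class confidence scores; the sketch below inspects the scores instead of trusting the label alone, and the 0.7 threshold is an arbitrary assumption.

    import boto3

    comprehend = boto3.client("comprehend", region_name="us-east-1")
    review_text = "The product arrived late, but it works great."

    resp = comprehend.detect_sentiment(Text=review_text, LanguageCode="en")
    label = resp["Sentiment"]          # "POSITIVE", "NEGATIVE", "NEUTRAL", or "MIXED"
    scores = resp["SentimentScore"]    # per-class confidence scores

    # The label alone hides low-confidence and MIXED cases, so check the
    # scores before acting on the prediction.
    if label == "MIXED" or max(scores.values()) < 0.7:
        print("Ambiguous sentiment, routing to manual review")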

Ignoring fairness evaluation

Testing

ML cloud providers offer the possibility of fairness evaluation, which is crucial to ensure unbiased and equitable models. However, developers may rely solely on performance metrics such as accuracy or precision to evaluate the effectiveness of a model.
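
A sketch of a group-wise evaluation with the open-source fairlearn library (which also underpins Azure's fairness tooling) follows; the model, test data, and sensitive-feature column are assumptions.

    from fairlearn.metrics import MetricFrame, demographic_parity_difference
    from sklearn.metrics import accuracy_score

    y_pred = model.predict(X_test)

    # Report accuracy broken down by sensitive group, not just the overall score.
    frame = MetricFrame(
        metrics=accuracy_score,
        y_true=y_test,
        y_pred=y_pred,
        sensitive_features=X_test["gender"],
    )
    print(frame.by_group)
    print(demographic_parity_difference(
        y_test, y_pred, sensitive_features=X_test["gender"]))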

Ignoring testing schema mismatch

Testing

Cloud providers offer ML services to detect mismatched data schemas, including feature or data distribution mismatches between training, testing, and production data, often by raising alerts. However, developers might neglect to set up these alerts or may disable them.
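
A sketch of an explicit schema check with TensorFlow Data Validation, the open-source library behind several providers' validation services, is shown below; the dataframes are assumptions.

    import tensorflow_data_validation as tfdv

    # Infer a schema from the training data, then validate serving data
    # against it instead of silently ignoring mismatches.
    train_stats = tfdv.generate_statistics_from_dataframe(train_df)
    schema = tfdv.infer_schema(train_stats)

    serving_stats = tfdv.generate_statistics_from_dataframe(serving_df)
    anomalies = tfdv.validate_statistics(serving_stats, schema)
    if anomalies.anomaly_info:
        raise ValueError(f"Schema mismatch detected: {anomalies.anomaly_info}")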

Using suboptimal evaluation metrics

Testing

Some ML services optimize and evaluate models based on a set of specified evaluation metrics. Those metrics determine how the model’s performance is measured during training and how it should be evaluated during testing. However, developers sometimes choose suboptimal evaluation metrics, which can lead to less effective models that do not align well with business needs or dataset characteristics.
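
A toy scikit-learn illustration of why the choice matters on an imbalanced dataset:

    from sklearn.metrics import accuracy_score, f1_score

    # 99% of the labels are negative; a model that always predicts "negative"
    # looks excellent on accuracy and useless on F1.
    y_true = [0] * 99 + [1]
    y_pred = [0] * 100

    print(accuracy_score(y_true, y_pred))  # 0.99
    print(f1_score(y_true, y_pred))        # 0.0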

Overwriting existing ML APIs without versioning

Deployment

Cloud providers support ML API versioning through several practices and tools, such as Azure API Management and the AWS Management Console. Without version control, it becomes challenging to track changes, revert to previous versions, or understand the evolution of the deployed model. However, developers may ignore these practices and unintentionally overwrite existing ML APIs without proper versioning, which can lead to issues in the production environment.
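
A sketch of relying on the registry's versioning in the Azure ML Python SDK v2, instead of overwriting the served artifact in place, follows; the model name and path are assumptions and ml_client is an already-authenticated MLClient.

    from azure.ai.ml.entities import Model

    # Registering under the same name creates a new, higher version rather
    # than overwriting the API that is already serving traffic.
    model = Model(path="./model", name="churn-model")
    registered = ml_client.models.create_or_update(model)
    print(registered.name, registered.version)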

Choosing the wrong deployment endpoint

Deployment

Cloud providers offer online endpoints and batch endpoints for deployment. Online endpoints are mainly used to operationalize models for real-time inference through synchronous, low-latency requests, while batch endpoints are mainly used to operationalize models or pipelines for long-running asynchronous inference. However, developers may choose an inappropriate deployment endpoint for their workload.
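
A sketch of the two endpoint types in the Azure ML Python SDK v2 follows; the endpoint names are assumptions and ml_client is an already-authenticated MLClient.

    from azure.ai.ml.entities import BatchEndpoint, ManagedOnlineEndpoint

    # Synchronous, low-latency scoring of individual requests:
    online_endpoint = ManagedOnlineEndpoint(name="churn-online", auth_mode="key")

    # Long-running, asynchronous scoring over accumulated data:
    batch_endpoint = BatchEndpoint(name="churn-batch")

    ml_client.online_endpoints.begin_create_or_update(online_endpoint)
    ml_client.batch_endpoints.begin_create_or_update(batch_endpoint)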

Disabling automatic rollbacks

Deployment

ML cloud service providers offer features that automatically roll back to a previous stable version of a model if the newly deployed version causes errors or performance issues. However, developers might disable this feature, allowing poorly performing models to remain in production and negatively impact the system's performance.
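
A hedged sketch of attaching rollback alarms when updating a SageMaker endpoint with boto3 follows; the endpoint, configuration, and alarm names are assumptions, and the field names should be checked against the current API reference.

    import boto3

    sm = boto3.client("sagemaker")
    sm.update_endpoint(
        EndpointName="churn-endpoint",
        EndpointConfigName="churn-config-v2",
        DeploymentConfig={
            "BlueGreenUpdatePolicy": {
                "TrafficRoutingConfiguration": {
                    "Type": "ALL_AT_ONCE",
                    "WaitIntervalInSeconds": 300,
                },
            },
            # Roll back automatically if these CloudWatch alarms fire while
            # the new version is being rolled out.
            "AutoRollbackConfiguration": {
                "Alarms": [{"AlarmName": "churn-endpoint-5xx-errors"}],
            },
        },
    )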

Disabling automatic scaling for online prediction service

Deployment

Automatic scaling for online prediction services helps manage varying rates of prediction requests while minimizing cloud usage costs. Disabling this feature prevents resources from being adjusted dynamically based on demand, which can leave the online prediction service without sufficient capacity during peaks or waste resources during low-traffic periods.
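
A sketch of enabling target-tracking autoscaling for a SageMaker endpoint variant via Application Auto Scaling follows; the endpoint and variant names, capacities, and target value are assumptions.

    import boto3

    autoscaling = boto3.client("application-autoscaling")
    resource_id = "endpoint/churn-endpoint/variant/AllTraffic"

    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )
    autoscaling.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            # Add or remove instances to keep invocations per instance near 70.
            "TargetValue": 70.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    )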

Improper handling of ML API limits

Serving

Refers to not respecting API rate limits. These limits are a set of measures put in place to help ensure the stability and performance of the ML API system. However, developers may fail to handle the limits of a cloud-based ML API, leading to a sudden stop in predictions when the quota is exceeded.
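
A minimal, provider-agnostic sketch of absorbing throttling instead of letting predictions stop is shown below; RateLimitError stands in for whatever throttling or quota exception the concrete SDK raises.

    import random
    import time

    class RateLimitError(Exception):
        """Stand-in for the throttling/quota exception raised by your ML SDK."""

    def predict_with_backoff(call, max_retries=5):
        # Retry a rate-limited ML API call with exponential backoff and jitter
        # instead of failing outright when the quota is temporarily exceeded.
        for attempt in range(max_retries):
            try:
                return call()
            except RateLimitError:
                time.sleep((2 ** attempt) + random.random())
        raise RuntimeError("ML API quota still exceeded after retries")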

Misusing synchronous/asynchronous APIs with deployment type

Serving

Refers to using an inappropriate API style for the deployment endpoint. It is not recommended to use asynchronous API requests with online predictions, i.e., in situations that require timely inference, nor to use synchronous APIs with batch endpoint deployments, i.e., when an immediate response is not required and processing the accumulated data in a single request is sufficient.
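
A provider-agnostic sketch of matching the call style to the endpoint type follows; online_client, batch_client, and their methods are hypothetical placeholders for the concrete SDK.

    import time

    def score_single_request(online_client, payload):
        # Online endpoint: one synchronous, low-latency call per request.
        return online_client.predict(payload)               # hypothetical client API

    def score_accumulated_data(batch_client, input_uri):
        # Batch endpoint: submit the accumulated data as a single asynchronous
        # job and poll for completion instead of blocking on a synchronous call.
        job = batch_client.submit_job(input_uri=input_uri)   # hypothetical client API
        while not batch_client.is_finished(job):
            time.sleep(60)
        return batch_client.download_results(job)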

Calling the wrong ML service API

Serving

Cloud providers often offer multiple ML APIs for the same task. Without a thorough understanding of these APIs, developers might call the wrong one, leading to significantly degraded prediction accuracy, incorrect prediction results, or even software failures.
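
One concrete instance: Amazon Rekognition's DetectText and Amazon Textract's DetectDocumentText both extract text, but the former targets natural-scene photos and the latter scanned documents; the bucket and object names below are assumptions.

    import boto3

    image = {"S3Object": {"Bucket": "my-bucket", "Name": "scanned-invoice.png"}}

    # Wrong tool for an invoice: scene-text detection degrades accuracy
    # silently rather than failing loudly.
    rekognition = boto3.client("rekognition")
    scene_text = rekognition.detect_text(Image=image)

    # Appropriate API for scanned documents and forms.
    textract = boto3.client("textract")
    document_text = textract.detect_document_text(Document=image)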

Ignoring data drift monitoring

Monitoring

Refers to ignoring the need to continually assess changes in the statistical characteristics or distribution of the data used by the system, which is necessary to ensure the expected performance. Cloud providers encourage using skew and drift detection to detect when the statistical properties of incoming data change over time in a way that affects the ML service-based system.
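
A minimal sketch of one drift check that such services automate, here a two-sample Kolmogorov-Smirnov test from SciPy over a single feature; the dataframes, feature name, and significance threshold are assumptions.

    from scipy.stats import ks_2samp

    # Compare the serving-time distribution of a feature against the training
    # distribution; flag drift when the difference is statistically significant.
    statistic, p_value = ks_2samp(train_df["age"], serving_df["age"])
    if p_value < 0.01:
        print("Possible data drift detected for feature 'age'")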