MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Questions 4

A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model's predictions?

Options:

Use SageMaker Model Monitor on the deployed model.

Use SageMaker Clarify on the deployed model.

Show the distribution of inferences from A/B testing in Amazon CloudWatch.

Add a shadow endpoint. Analyze prediction differences on samples.

Questions 5

A company needs to give its ML engineers appropriate access to training data. The ML engineers must access training data from only their own business group. The ML engineers must not be allowed to access training data from other business groups.

The company uses a single AWS account and stores all the training data in Amazon S3 buckets. All ML model training occurs in Amazon SageMaker.

Which solution will provide the ML engineers with the appropriate access?

Options:

Enable S3 bucket versioning.

Configure S3 Object Lock settings for each user.

Add cross-origin resource sharing (CORS) policies to the S3 buckets.

Create IAM policies. Attach the policies to IAM users or IAM roles.

Questions 6

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker compute costs reach a specific threshold.

Which solution will meet these requirements?

Options:

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

Add resource tagging by editing the SageMaker user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is reached.

Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Questions 7

A company collects customer data daily and stores it as compressed files in an Amazon S3 bucket partitioned by date. Each month, analysts process the data, check data quality, and upload results to Amazon QuickSight dashboards.

An ML engineer needs to automatically check data quality before the data is sent to QuickSight, with the LEAST operational overhead.

Which solution will meet these requirements?

Options:

Run an AWS Glue crawler monthly and use AWS Glue Data Quality rules to check data quality.

Run an AWS Glue crawler and create a custom AWS Glue job with PySpark to evaluate data quality.

Use AWS Lambda with Python scripts triggered by S3 uploads to evaluate data quality.

Send S3 events to Amazon SQS and use Amazon CloudWatch Insights to evaluate data quality.

Questions 8

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.

The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.

Which solution will provide the HIGHEST performance for data retrieval?

Options:

Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.
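
To illustrate the date-prefix option: when objects are laid out under one prefix per day, a 3-day trend query only has to read three prefixes rather than the full 30 days of data. The sketch below is a minimal illustration, assuming a hypothetical bucket name (`example-clicks`), base prefix (`clicks`), and a Hive-style `dt=YYYY-MM-DD` partition key. None of these names come from the question.

```python
from datetime import date, timedelta

# Hypothetical names for illustration only; the question does not specify them.
BUCKET = "example-clicks"
BASE_PREFIX = "clicks"


def partition_prefixes(end_day: date, days: int) -> list[str]:
    """Return the S3 prefixes of the `days` most recent date partitions, newest first."""
    return [
        f"s3://{BUCKET}/{BASE_PREFIX}/dt={(end_day - timedelta(days=offset)).isoformat()}/"
        for offset in range(days)
    ]


if __name__ == "__main__":
    # A 3-day report touches exactly three prefixes.
    for prefix in partition_prefixes(date(2024, 3, 1), 3):
        print(prefix)
```

With a layout like this, a lifecycle rule can also be scoped to the same base prefix, so archiving does not require moving objects between buckets.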

Questions 9

A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.

What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

Options:

Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

Create an Apache Spark job that uses a custom processing script on Amazon EMR.

Create a SageMaker processing job by calling the SageMaker Python SDK.

Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Questions 10

A company is using Amazon SageMaker AI to develop a credit risk assessment model. During model validation, the company finds that the model achieves 82% accuracy on the validation data. However, the model achieved 99% accuracy on the training data. The company needs to address the model accuracy issue before deployment.

Which solution will meet this requirement?

Options:

Add more dense layers to increase model complexity. Implement batch normalization. Use early stopping during training.

Implement dropout layers. Use L1 or L2 regularization. Perform k-fold cross-validation.

Use principal component analysis (PCA) to reduce the feature dimensionality. Decrease model layers. Implement cross-entropy loss functions.

Augment the training dataset. Remove duplicate records from the training dataset. Implement stratified sampling.

Questions 11

A company has historical data that shows whether customers needed long-term support from company staff. The company needs to develop an ML model to predict whether new customers will require long-term support.

Which modeling approach should the company use to meet this requirement?

Options:

Anomaly detection

Linear regression

Logistic regression

Semantic segmentation

Questions 12

A company is using an Amazon S3 bucket to collect data that will be used for ML workflows. The company needs to use AWS Glue DataBrew to clean and normalize the data.

Which solution will meet these requirements?

Options:

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew profile job.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew recipe job.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a profile job.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a recipe job.

Questions 13

An ML engineer develops a neural network model to predict whether customers will continue to subscribe to a service. The model performs well on training data. However, the accuracy of the model decreases significantly on evaluation data.

The ML engineer must resolve the model performance issue.

Which solution will meet this requirement?

Options:

Penalize large weights by using L1 or L2 regularization.

Remove dropout layers from the neural network.

Train the model for longer by increasing the number of epochs.

Capture complex patterns by increasing the number of layers.

Questions 14

A digital media entertainment company needs real-time video content moderation to ensure compliance during live streaming events.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use Amazon Rekognition and AWS Lambda to extract and analyze the metadata from the videos' image frames.

Use Amazon Rekognition and a large language model (LLM) hosted on Amazon Bedrock to extract and analyze the metadata from the videos’ image frames.

Use Amazon SageMaker AI to extract and analyze the metadata from the videos' image frames.

Use Amazon Transcribe and Amazon Comprehend to extract and analyze the metadata from the videos' image frames.

Questions 15

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

After the data is aggregated, the ML engineer must implement a solution to automatically detect anomalies in the data and to visualize the result.

Which solution will meet these requirements?

Options:

Use Amazon Athena to automatically detect the anomalies and to visualize the result.

Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.

Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

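Answer: Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.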

Explanation:

Amazon SageMaker Data Wrangler is a comprehensive tool that streamlines the process of data preparation and offers built-in capabilities for anomaly detection and visualization.

Key Features of SageMaker Data Wrangler:

Data Importation: Connects seamlessly to various data sources, including Amazon S3 and on-premises databases, facilitating the aggregation of transaction logs, customer profiles, and MySQL tables.

Anomaly Detection: Provides built-in analyses to detect anomalies in time series data, enabling the identification of outliers that may indicate fraudulent activities.

Visualization: Offers a suite of visualization tools, such as histograms and scatter plots, to help understand data distributions and relationships, which are crucial for feature engineering and model development.

Implementation Steps:

Data Aggregation:

Import data from Amazon S3 and on-premises MySQL databases into SageMaker Data Wrangler.

Utilize Data Wrangler's data flow interface to combine and preprocess datasets, ensuring a unified dataset for analysis.

Anomaly Detection:

Apply the anomaly detection analysis feature to identify outliers in the dataset.

Configure parameters such as the anomaly threshold to fine-tune the detection sensitivity.

Visualization:

Use built-in visualization tools to create charts and graphs that depict data distributions and highlight anomalies.

Interpret these visualizations to gain insights into potential fraud patterns and feature interdependencies.

Advantages of Using SageMaker Data Wrangler:

Integrated Workflow: Combines data preparation, anomaly detection, and visualization within a single interface, streamlining the ML development process.

Operational Efficiency: Reduces the need for multiple tools and complex integrations, thereby minimizing operational overhead.

Scalability: Handles large datasets efficiently, making it suitable for extensive transaction logs and customer profiles.

By leveraging SageMaker Data Wrangler, the ML engineer can effectively detect anomalies and visualize results, facilitating the development of a robust fraud detection model.
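
The idea of tuning an anomaly threshold can be shown with a small stand-alone sketch. This is not Data Wrangler's implementation, which the explanation above does not describe. It swaps in a plain z-score rule, and the function name `flag_outliers` and the sample amounts are made up for illustration.

```python
from statistics import mean, pstdev


def flag_outliers(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return the indexes of values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical transaction amounts with one unusually large value.
    amounts = [20.0, 22.0, 19.0, 21.0, 23.0, 20.0, 22.0, 500.0]
    print(flag_outliers(amounts, threshold=2.0))
    print(flag_outliers(amounts, threshold=3.0))
```

Lowering the threshold makes the check more sensitive. In this tiny sample the large value is flagged at a threshold of 2.0 but not at 3.0, because a single extreme point also inflates the standard deviation it is measured against.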

Analyze and Visualize - Amazon SageMaker

Transform Data - Amazon SageMaker

Questions 16

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Questions 17

An ML engineer must choose the appropriate Amazon SageMaker algorithm to solve specific AI problems.

Select the correct SageMaker built-in algorithm from the following list for each use case. Each algorithm should be selected one time.

• Random Cut Forest (RCF) algorithm

• Semantic segmentation algorithm

• Sequence-to-Sequence (seq2seq) algorithm

Options:

Questions 18

An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

• Feature splitting

• Logarithmic transformation

• One-hot encoding

• Standardized distribution

Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select three.)

Options:

Questions 19

An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.

The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.

Which solution will meet this requirement?

Options:

Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.

Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.

Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.

Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.

Questions 20

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

Options:

Optimize for accuracy. Use image augmentation on the less common images.

Optimize for F1 score. Use image augmentation on the less common images.

Optimize for accuracy. Use SMOTE to generate synthetic images.

Optimize for F1 score. Use SMOTE to generate synthetic images.
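
The choice of metric matters when one class is very rare. The sketch below uses made-up labels (10 positives out of 1,000) and two small helper functions written for this example to compare accuracy and F1 score for a model that always predicts the majority class.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives, so precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 10 rare-class samples, 990 common-class samples.
    y_true = [1] * 10 + [0] * 990
    always_majority = [0] * 1000
    print(accuracy(y_true, always_majority))
    print(f1_score(y_true, always_majority))
```

A model that never predicts the rare class still scores high on accuracy here, while its F1 score for the rare class is zero.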

Questions 21

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

Accuracy

Area Under the ROC Curve (AUC)

F1 score

Mean absolute error (MAE)

Questions 22

A company is using Amazon SageMaker AI to build an ML model to predict customer behavior. The company needs to explain the bias in the model to an auditor. The explanation must focus on demographic data of the customers.

Which solution will meet these requirements?

Options:

Use SageMaker Clarify to generate a bias report. Send the report to the auditor.

Use AWS Glue DataBrew to create a job to detect drift in the model's data quality. Send the job output to the auditor.

Use Amazon QuickSight integration with SageMaker AI to generate a bias report. Send the report to the auditor.

Use Amazon CloudWatch metrics from the SageMaker AI namespace to create a bias dashboard. Share the dashboard with the auditor.

Questions 23

An ML engineer is using AWS CodeDeploy to deploy new container versions for inference on Amazon ECS.

The deployment must shift 10% of traffic initially, and the remaining 90% must shift within 10–15 minutes.

Which deployment configuration meets these requirements?

Options:

CodeDeployDefault.LambdaLinear10PercentEvery10Minutes

CodeDeployDefault.ECSAllAtOnce

CodeDeployDefault.ECSCanary10Percent15Minutes

CodeDeployDefault.LambdaCanary10Percent15Minutes

Questions 24

Case study

The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.

Which algorithm should the ML engineer use to meet this requirement?

Options:

LightGBM

Linear learner

K-means clustering

Neural Topic Model (NTM)

Questions 25

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.

What could be the reason for the reduced F1 score?

Options:

Concept drift occurred in the underlying customer data that was used for predictions.

The model was not sufficiently complex to capture all the patterns in the original baseline data.

The original baseline data had a data quality issue of missing values.

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Questions 26

An ML engineer is designing an AI-powered traffic management system. The system must use near real-time inference to predict congestion and prevent collisions.

The system must also use batch processing to perform historical analysis of predictions over several hours to improve the model. The inference endpoints must scale automatically to meet demand.

Which combination of solutions will meet these requirements? (Select TWO.)

Options:

Use Amazon SageMaker real-time inference endpoints with automatic scaling based on ConcurrentInvocationsPerInstance.

Use AWS Lambda with reserved concurrency and SnapStart to connect to SageMaker endpoints.

Use an Amazon SageMaker Processing job for batch historical analysis. Schedule the job with Amazon EventBridge.

Use Amazon EC2 Auto Scaling to host containers for batch analysis.

Use AWS Lambda for historical analysis.

Questions 27

An ML engineer has a custom container that performs k-fold cross-validation and logs an average F1 score during training. The ML engineer wants Amazon SageMaker AI Automatic Model Tuning (AMT) to select hyperparameters that maximize the average F1 score.

How should the ML engineer integrate the custom metric into SageMaker AI AMT?

Options:

Define the average F1 score in the TrainingInputMode parameter.

Define a metric definition in the tuning job that uses a regular expression to capture the average F1 score from the training logs.

Publish the average F1 score as a custom Amazon CloudWatch metric.

Write the F1 score to a JSON file in Amazon S3 and reference it in ObjectiveMetricName.
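
For the regular-expression option, a metric definition is a name paired with a regex whose capture group picks the value out of a training log line. The sketch below only exercises such a regex locally with Python's `re` module. The metric name `validation:avg_f1` and the log format `avg_f1=<number>` are assumptions, since the question does not say how the container logs the score.

```python
import re
from typing import Optional

# Hypothetical metric definition in the Name/Regex shape used for tuning-job
# metric definitions. The log format "avg_f1=<number>" is an assumption.
METRIC_DEFINITION = {
    "Name": "validation:avg_f1",
    "Regex": r"avg_f1=([0-9]*\.?[0-9]+)",
}


def extract_metric(log_line: str, definition: dict) -> Optional[float]:
    """Return the value captured by the definition's regex, or None if no match."""
    match = re.search(definition["Regex"], log_line)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(extract_metric("fold 5/5 done; avg_f1=0.8732", METRIC_DEFINITION))
    print(extract_metric("epoch 1 loss=0.4", METRIC_DEFINITION))
```

In a real tuning job, a definition like this would be supplied with the job's metric definitions and its name used as the objective metric. Checking the regex against sample log lines first is a cheap way to catch a pattern that never matches.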

Questions 28

A company wants to share data with a vendor in real time to improve the performance of the vendor's ML models. The vendor needs to ingest the data in a stream. The vendor will use only some of the columns from the streamed data.

Which solution will meet these requirements?

Options:

Use AWS Data Exchange to stream the data to an Amazon S3 bucket. Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) query to define relevant columns.

Use Amazon Kinesis Data Streams to ingest the data. Use Amazon Managed Service for Apache Flink as a consumer to extract relevant columns.

Create an Amazon S3 bucket. Configure the S3 bucket policy to allow the vendor to upload data to the S3 bucket. Configure the S3 bucket policy to control which columns are shared.

Use AWS Lake Formation to ingest the data. Use the column-level filtering feature in Lake Formation to extract relevant columns.

Questions 29

A company is planning to create several ML prediction models. The training data is stored in Amazon S3. The entire dataset is more than 5 TB in size and consists of CSV, JSON, Apache Parquet, and simple text files.

The data must be processed in several consecutive steps. The steps include complex manipulations that can take hours to finish running. Some of the processing involves natural language processing (NLP) transformations. The entire process must be automated.

Which solution will meet these requirements?

Options:

Process data at each step by using Amazon SageMaker Data Wrangler. Automate the process by using Data Wrangler jobs.

Use Amazon SageMaker notebooks for each data processing step. Automate the process by using Amazon EventBridge.

Process data at each step by using AWS Lambda functions. Automate the process by using AWS Step Functions and Amazon EventBridge.

Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.

Questions 30

A healthcare company wants to detect irregularities in patient vital signs that could indicate early signs of a medical condition. The company has an unlabeled dataset that includes patient health records, medication history, and lifestyle changes.

Which algorithm and hyperparameter should the company use to meet this requirement?

Options:

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to greater than 100 to regulate tree complexity.

Use the Amazon SageMaker AI k-means clustering algorithm. Set k to determine the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to the number of training iterations.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set num_trees to greater than 100.

Questions 31

A company is building a near real-time data analytics application to detect anomalies and failures for industrial equipment. The company has thousands of IoT sensors that send data every 60 seconds. When new versions of the application are released, the company wants to ensure that application code bugs do not prevent the application from running.

Which solution will meet these requirements?

Options:

Use Amazon Managed Service for Apache Flink with the system rollback capability enabled to build the data analytics application.

Use Amazon Managed Service for Apache Flink with manual rollback when an error occurs to build the data analytics application.

Use Amazon Data Firehose to deliver real-time streaming data programmatically for the data analytics application. Pause the stream when a new version of the application is released and resume the stream after the application is deployed.

Use Amazon Data Firehose to deliver data to Amazon EC2 instances across two Availability Zones for the data analytics application.

Questions 32

An ML engineer is developing a classification model. The ML engineer needs to use custom libraries in processing jobs, training jobs, and pipelines in Amazon SageMaker AI.

Which solution will provide this functionality with the LEAST implementation effort?

Options:

Manually install the libraries in the SageMaker AI containers.

Build a custom Docker container that includes the required libraries. Host the container in Amazon Elastic Container Registry (Amazon ECR). Use the ECR image in the SageMaker AI jobs and pipelines.

Use a SageMaker AI notebook instance and install libraries at startup.

Run code externally on Amazon EC2 and import results into SageMaker AI.

Questions 33

A company wants to build an anomaly detection ML model. The model will use large-scale tabular data that is stored in an Amazon S3 bucket. The company does not have expertise in Python, Spark, or other languages for ML.

An ML engineer needs to transform and prepare the data for ML model training.

Which solution will meet these requirements?

Options:

Prepare the data by using Amazon EMR Serverless applications that host Amazon SageMaker Studio notebooks.

Prepare the data by using the Amazon SageMaker Data Wrangler visual interface in Amazon SageMaker Canvas.

Run SQL queries from a JupyterLab space in Amazon SageMaker Studio. Process the data further by using pandas DataFrames.

Prepare the data by using a JupyterLab notebook in Amazon SageMaker Studio.

Questions 34

A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results.

An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs.

Which solution will meet these requirements?

Options:

Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Questions 35

An ML engineer decides to use Amazon SageMaker AI automated model tuning (AMT) for hyperparameter optimization (HPO). The ML engineer requires a tuning strategy that uses regression to slowly and sequentially select the next set of hyperparameters based on previous runs. The strategy must work across small hyperparameter ranges.

Which solution will meet these requirements?

Options:

Grid search

Random search

Bayesian optimization

Hyperband

Questions 36

A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP system virtual machine (SVM). The SVM is in the same VPC as SageMaker.

An ML engineer must make the training data accessible for ML models that are in the SageMaker environment.

Which solution will meet these requirements?

Options:

Mount the FSx for ONTAP file system as a volume to the SageMaker instance.

Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Questions 37

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of data quality for the models and must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and send alerts.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and send alerts.

Deploy the models by using Amazon ECS on AWS Fargate. Use Amazon EventBridge to monitor the data quality and send alerts.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and send alerts.

Questions 38

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company is experimenting with consecutive training jobs.

How can the company MINIMIZE infrastructure startup times for these jobs?

Options:

Use Managed Spot Training.

Use SageMaker managed warm pools.

Use SageMaker Training Compiler.

Use the SageMaker distributed data parallelism (SMDDP) library.

Questions 39

A travel company wants to create an ML model to recommend the next airport destination for its users. The company has collected millions of data records about user location, recent search history on the company's website, and 2,000 available airports. The data has several categorical features with a target column that is expected to have a high-dimensional sparse matrix.

The company needs to use Amazon SageMaker AI built-in algorithms for the model. An ML engineer converts the categorical features by using one-hot encoding.

Which algorithm should the ML engineer implement to meet these requirements?

Options:

Use the CatBoost algorithm to recommend the next airport destination.

Use the DeepAR forecasting algorithm to recommend the next airport destination.

Use the Factorization Machines algorithm to recommend the next airport destination.

Use the k-means algorithm to cluster users into groups and map each group to the next airport destination.

Questions 40

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

Options:

Apply principal component analysis (PCA) to oversample the minority class in the training dataset.

Apply Synthetic Minority Oversampling Technique (SMOTE) to generate new synthetic samples of the minority class in the training dataset.

Randomly oversample the majority class in the validation dataset.

Apply k-means clustering to undersample the minority class in the test dataset.
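
For reference, the core of SMOTE is interpolation between a minority-class sample and one of its nearest minority-class neighbors. The sketch below is a simplified, standard-library version of that idea, not a production implementation. The function name `smote`, the default `k`, and the toy data are assumptions, and it expects at least two minority samples. In line with the option that mentions SMOTE, it would be applied to the training split only.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 3, seed: int = 0) -> list[list[float]]:
    """Generate `n_new` synthetic samples by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority samples to `base`, excluding `base` itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


if __name__ == "__main__":
    # Toy minority-class feature vectors (hypothetical values).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for sample in smote(minority, n_new=3):
        print(sample)
```

Because each synthetic point lies on a segment between two real minority samples, the new samples stay within the region that the minority class already occupies.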

Questions 41

A company is building an enterprise AI platform. The company must catalog models for production, manage model versions, and associate metadata such as training metrics with models. The company needs to eliminate the burden of managing different versions of models.

Which solution will meet these requirements?

Options:

Use the Amazon SageMaker Model Registry to catalog the models. Create unique tags for each model version. Create key-value pairs to maintain associated metadata.

Use the Amazon SageMaker Model Registry to catalog the models. Create model groups for each model to manage the model versions and to maintain associated metadata.

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model. Use the repositories to catalog the models and to manage model versions and associated metadata.

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model. Create unique tags for each model version. Create key-value pairs to maintain associated metadata.

Questions 42

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Questions 43

A company is creating an ML model to identify defects in a product. The company has gathered a dataset and has stored the dataset in TIFF format in Amazon S3. The dataset contains 200 images in which the most common defects are visible. The dataset also contains 1,800 images in which there is no defect visible.

An ML engineer trains the model and notices poor performance in some classes. The ML engineer identifies a class imbalance problem in the dataset.

What should the ML engineer do to solve this problem?

Options:

Use a few hundred images and Amazon Rekognition Custom Labels to train a new model.

Undersample the 200 images in which the most common defects are visible.

Oversample the 200 images in which the most common defects are visible.

Use all 2,000 images and Amazon Rekognition Custom Labels to train a new model.

Questions 44

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Questions 45

A company is using an AWS Lambda function to monitor the metrics from an ML model. An ML engineer needs to implement a solution to send an email message when the metrics breach a threshold.

Which solution will meet this requirement?

Options:

Log the metrics from the Lambda function to AWS CloudTrail. Configure a CloudTrail trail to send the email message.

Log the metrics from the Lambda function to Amazon CloudFront. Configure an Amazon CloudWatch alarm to send the email message.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure a CloudWatch alarm to send the email message.

Log the metrics from the Lambda function to Amazon CloudWatch. Configure an Amazon CloudFront rule to send the email message.

Questions 46

An ML engineer is developing a neural network to run on new user data. The dataset has dozens of floating-point features. The dataset is stored as CSV objects in an Amazon S3 bucket. Most objects and columns are missing at least one value. All features are relatively uniform except for a small number of extreme outliers. The ML engineer wants to use Amazon SageMaker Data Wrangler to handle missing values before passing the dataset to the neural network.

Which solution will provide the MOST complete data?

Options:

Drop samples that are missing values.

Impute missing values with the mean value.

Impute missing values with the median value.

Drop columns that are missing values.

Questions 47

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records per second.

The company needs a scalable AWS solution to identify anomalous data points with the LEAST operational overhead.

Which solution will meet these requirements?

Options:

Ingest data into Amazon Kinesis Data Streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to detect anomalies.

Ingest data into Kinesis Data Streams. Deploy a SageMaker AI endpoint and use AWS Lambda to detect anomalies.

Ingest data into Apache Kafka on Amazon EC2 and use SageMaker AI for detection.

Send data to Amazon SQS and use AWS Glue ETL jobs for batch anomaly detection.

Questions 48

A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.

An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.

Which solution will meet these requirements with the LEAST development effort?

Options:

Create code to evaluate each instance's memory and compute usage.

Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

Check AWS CloudTrail event history for the creation of the resources.

Run AWS Compute Optimizer.

Questions 49

An ML engineer needs to use Amazon SageMaker to fine-tune a large language model (LLM) for text summarization. The ML engineer must follow a low-code no-code (LCNC) approach.

Which solution will meet these requirements?

Options:

Use SageMaker Studio to fine-tune an LLM that is deployed on Amazon EC2 instances.

Use SageMaker Autopilot to fine-tune an LLM that is deployed by a custom API endpoint.

Use SageMaker Autopilot to fine-tune an LLM that is deployed on Amazon EC2 instances.

Use SageMaker Autopilot to fine-tune an LLM that is deployed by SageMaker JumpStart.

Questions 50

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users.

Which solution will meet these requirements?

Options:

Set up SageMaker Debugger and create a custom rule.

Set up blue/green deployments with all-at-once traffic shifting.

Set up blue/green deployments with canary traffic shifting.

Set up shadow testing with a shadow variant of the new model.

Questions 51

A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks.

What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

Options:

Adjust the model's parameters and hyperparameters.

Initiate a manual Model Monitor job that uses the most recent production data.

Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.

Include additional data in the existing training set for the model. Retrain and redeploy the model.

Questions 52

A company is developing an ML model for a customer. The training data is stored in an Amazon S3 bucket in the customer's AWS account (Account A). The company runs Amazon SageMaker AI training jobs in a separate AWS account (Account B).

The company defines an S3 bucket policy and an IAM policy to allow reads to the S3 bucket.

Which additional steps will meet the cross-account access requirement?

Options:

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

Questions 53

A company must install a custom script on any newly created Amazon SageMaker AI notebook instances.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

Create a lifecycle configuration script to install the custom script when a new SageMaker AI notebook is created. Attach the lifecycle configuration to every new SageMaker AI notebook as part of the creation steps.

Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script. Push the ECR image to a Docker registry. Attach the Docker image to a SageMaker Studio domain. Select the kernel to run as part of the SageMaker AI notebook.

Create a custom package index repository. Use AWS CodeArtifact to manage the installation of the custom script. Set up AWS PrivateLink endpoints to connect CodeArtifact to the SageMaker AI instance. Install the script.

Store the custom script in Amazon S3. Create an AWS Lambda function to install the custom script on new SageMaker AI notebooks. Configure Amazon EventBridge to invoke the Lambda function when a new SageMaker AI notebook is initialized.

Questions 54

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

Options:

Use SageMaker AI file mode to load and process the images in batches.

Reduce the batch size of the model and increase the number of pre-processing threads.

Reduce the quality of the training images in the S3 bucket.

Convert the images into RecordIO format and use the lazy loading pattern.

Questions 55

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to run an on-demand workflow to monitor bias drift for models that are deployed to real-time endpoints from the application.

Which action will meet this requirement?

Options:

Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

Invoke an AWS Lambda function to pull the sagemaker-model-monitor-analyzer built-in SageMaker image.

Use AWS Glue Data Quality to monitor bias.

Use SageMaker notebooks to compare the bias.

Answer: Configure the application to invoke an AWS Lambda function that runs a SageMaker Clarify job.

Explanation:

Monitoring bias drift in deployed machine learning models is crucial to ensure fairness and accuracy over time. Amazon SageMaker Clarify provides tools to detect bias in ML models, both during training and after deployment. To monitor bias drift for models deployed to real-time endpoints, an effective approach involves orchestrating SageMaker Clarify jobs using AWS Lambda functions.

Implementation Steps:

Set Up Data Capture:

Enable data capture on the SageMaker endpoint to record input data and model predictions. This captured data serves as the basis for bias analysis.

Develop a Lambda Function:

Create an AWS Lambda function configured to initiate a SageMaker Clarify job. This function will process the captured data to assess bias metrics.

Schedule or Trigger the Lambda Function:

Configure the application to invoke the Lambda function on demand. If recurring checks are also needed, schedule the function with Amazon EventBridge (formerly Amazon CloudWatch Events). This setup allows bias monitoring to run whenever the application requires it.

Analyze and Respond to Results:

After each Clarify job completes, review the generated bias reports. If bias drift is detected, take appropriate actions, such as retraining the model or adjusting data preprocessing steps.
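The Lambda step above can be sketched roughly as follows. This is a minimal illustration, not the exam's reference solution: it assumes the captured records have already been flattened into a dataset that the analysis configuration file describes, and the image URI, role ARN, instance type, and event field names are hypothetical placeholders. The only AWS call used is the SageMaker `create_processing_job` operation; the client is injectable so that the handler can be exercised without AWS credentials.

```python
import time

# Hypothetical placeholders: substitute the Clarify image URI for your Region
# and an execution role from your own account.
CLARIFY_IMAGE_URI = "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-clarify-processing:1.0"
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleClarifyRole"


def _s3_input(name, s3_uri, local_path):
    """Describe one file-mode S3 input for the processing container."""
    return {
        "InputName": name,
        "S3Input": {
            "S3Uri": s3_uri,
            "LocalPath": local_path,
            "S3DataType": "S3Prefix",
            "S3InputMode": "File",
        },
    }


def build_clarify_job_request(job_name, data_uri, config_uri, output_uri):
    """Build keyword arguments for the SageMaker CreateProcessingJob call."""
    return {
        "ProcessingJobName": job_name,
        "RoleArn": ROLE_ARN,
        "AppSpecification": {"ImageUri": CLARIFY_IMAGE_URI},
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",  # assumed size
                "VolumeSizeInGB": 30,
            }
        },
        "ProcessingInputs": [
            _s3_input("dataset", data_uri, "/opt/ml/processing/input/data"),
            _s3_input("analysis_config", config_uri, "/opt/ml/processing/input/config"),
        ],
        "ProcessingOutputConfig": {
            "Outputs": [
                {
                    "OutputName": "analysis_result",
                    "S3Output": {
                        "S3Uri": output_uri,
                        "LocalPath": "/opt/ml/processing/output",
                        "S3UploadMode": "EndOfJob",
                    },
                }
            ]
        },
    }


def lambda_handler(event, context, sagemaker_client=None):
    """Start one on-demand Clarify bias analysis job and return its identifiers."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the sketch can be tested offline

        sagemaker_client = boto3.client("sagemaker")

    job_name = f"clarify-bias-{int(time.time())}"
    request = build_clarify_job_request(
        job_name,
        event["captured_data_s3_uri"],    # hypothetical event fields
        event["analysis_config_s3_uri"],
        event["output_s3_uri"],
    )
    response = sagemaker_client.create_processing_job(**request)
    return {"jobName": job_name, "jobArn": response["ProcessingJobArn"]}
```

The application would invoke this function with the three S3 locations in the event payload, then read the bias report from the output prefix once the job finishes.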

Advantages of This Approach:

Automation: Utilizing AWS Lambda for orchestrating Clarify jobs enables automated and scalable bias monitoring without manual intervention.

Cost-Effectiveness: AWS Lambda's serverless nature ensures that you only pay for the compute time consumed during the execution of the function, optimizing resource usage.

Flexibility: The solution can be tailored to specific monitoring needs, allowing for adjustments in monitoring frequency and analysis parameters.

By implementing this solution, the company can monitor bias drift on demand for models that are deployed to real-time endpoints, which helps the AI application maintain fairness and accuracy throughout its lifecycle.
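The data capture step in the implementation steps could look roughly like the following sketch. It only builds the request for the SageMaker `create_endpoint_config` operation; the variant name, instance type, and sampling default are assumptions chosen for illustration, not values taken from the case study.

```python
def build_endpoint_config_request(config_name, model_name, capture_s3_uri,
                                  sampling_percentage=100):
    """Build keyword arguments for CreateEndpointConfig with data capture enabled."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",    # assumed variant name
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": "ml.m5.large",  # assumed instance type
            }
        ],
        "DataCaptureConfig": {
            "EnableCapture": True,
            "InitialSamplingPercentage": sampling_percentage,
            "DestinationS3Uri": capture_s3_uri,
            # Capture both the request payload and the model's response.
            "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
        },
    }
```

With boto3 available, the request would be passed as `boto3.client("sagemaker").create_endpoint_config(**request)` before the endpoint is created or updated.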

References: Bias drift for models in production - Amazon SageMaker; Schedule Bias Drift Monitoring Jobs - Amazon SageMaker

Questions 56

An ML model is deployed in production. The model has performed well and has met its metric thresholds for months.

An ML engineer who is monitoring the model observes a sudden degradation. The performance metrics of the model are now below the thresholds.

What could be the cause of the performance degradation?

Options:

Lack of training data

Drift in production data distribution

Compute resource constraints

Model overfitting

Questions 57

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company's internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

Options:

Use Amazon Kendra Experience Builder.

Use Amazon Aurora PostgreSQL with the pgvector extension.

Use Amazon RDS for PostgreSQL with the pgvector extension.

Use the AWS Glue Data Catalog metadata repository.

Questions 58

A company is developing a customer support AI assistant by using an Amazon Bedrock Retrieval Augmented Generation (RAG) pipeline. The AI assistant retrieves articles from a knowledge base stored in Amazon S3. The company uses Amazon OpenSearch Service to index the knowledge base. The AI assistant uses an Amazon Bedrock Titan Embeddings model for vector search.

The company wants to improve the relevance of the retrieved articles to improve the quality of the AI assistant's answers.

Which solution will meet these requirements?

Options:

Use auto-summarization on the retrieved articles by using Amazon SageMaker JumpStart.

Use a reranker model before passing the articles to the foundation model (FM).

Use Amazon Athena to pre-filter the articles based on metadata before retrieval.

Use Amazon Bedrock Provisioned Throughput to process queries more efficiently.

Questions 59

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

Options:

Deploy on EC2 Auto Scaling behind an ALB.

Deploy to a SageMaker AI real-time endpoint.

Deploy to a SageMaker AI Asynchronous Inference endpoint.

Deploy to Amazon ECS on EC2.

Questions 60

An ML engineer is analyzing a classification dataset before training a model in Amazon SageMaker AI. The ML engineer suspects that the dataset has a significant imbalance between class labels that could lead to biased model predictions. To confirm class imbalance, the ML engineer needs to select an appropriate pre-training bias metric.

Which metric will meet this requirement?

Options:

Mean squared error (MSE)

Difference in proportions of labels (DPL)

Silhouette score

Structural similarity index measure (SSIM)

Questions 61

A company is developing an internal cost-estimation tool that uses an ML model in Amazon SageMaker AI. Users upload high-resolution images to the tool.

The model must process each image and predict the cost of the object in the image. The model also must notify the user when processing is complete.

Which solution will meet these requirements?

Options:

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

Questions 62

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company must implement a manual approval-based workflow to ensure that only approved models can be deployed to production endpoints.

Which solution will meet this requirement?

Options:

Use SageMaker Experiments to facilitate the approval process during model registration.

Use SageMaker ML Lineage Tracking on the central model registry. Create tracking entities for the approval process.

Use SageMaker Model Monitor to evaluate the performance of the model and to manage the approval.

Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to "Approved."

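Answer: Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to "Approved."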

Explanation:

To implement a manual approval-based workflow ensuring that only approved models are deployed to production endpoints, Amazon SageMaker provides integrated tools such as SageMaker Pipelines and the SageMaker Model Registry.

SageMaker Pipelines is a robust service for building, automating, and managing end-to-end machine learning workflows. It facilitates the orchestration of various steps in the ML lifecycle, including data preprocessing, model training, evaluation, and deployment. By integrating with the SageMaker Model Registry, it enables seamless tracking and management of model versions and their approval statuses.

Implementation Steps:

Define the Pipeline:

Create a SageMaker Pipeline encompassing steps for data preprocessing, model training, evaluation, and registration of the model in the Model Registry.

Incorporate a Condition Step to assess model performance metrics. If the model meets predefined criteria, proceed to the next step; otherwise, halt the process.

Utilize the RegisterModel step to add the trained model to the Model Registry.

Set the ModelApprovalStatus parameter to PendingManualApproval during registration. This status indicates that the model awaits manual review before deployment.

Manual Approval Process:

Notify the designated approver upon model registration. This can be achieved by integrating Amazon EventBridge to monitor registration events and trigger notifications via AWS Lambda functions.

The approver reviews the model's performance and, if satisfactory, updates the model's status to Approved using the AWS SDK or through the SageMaker Studio interface.

Deploy the Approved Model:

Configure the pipeline to automatically deploy models with an Approved status to the production endpoint. This can be managed by adding deployment steps conditioned on the model's approval status.
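The registration, approval, and deployment gate described in these steps can be sketched with three SageMaker API calls: `create_model_package`, `update_model_package`, and `describe_model_package`. The function names, content types, and the idea of passing the client in as a parameter are assumptions for illustration; a real pipeline would typically perform the registration through its own model registration step rather than a direct SDK call.

```python
def register_pending(sm_client, group_name, image_uri, model_data_url):
    """Register a model version that must be reviewed before deployment."""
    response = sm_client.create_model_package(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="PendingManualApproval",
        InferenceSpecification={
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_url}],
            "SupportedContentTypes": ["text/csv"],       # assumed formats
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    )
    return response["ModelPackageArn"]


def approve(sm_client, model_package_arn, note):
    """Record the reviewer's manual decision on a registered model version."""
    sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=note,
    )


def is_deployable(sm_client, model_package_arn):
    """Deployment gate: only versions with status Approved may go to production."""
    details = sm_client.describe_model_package(ModelPackageName=model_package_arn)
    return details["ModelApprovalStatus"] == "Approved"
```

In use, `sm_client` would be `boto3.client("sagemaker")`. The deployment code calls `is_deployable` first and skips any version that is still `PendingManualApproval` or has been rejected.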

Advantages of This Approach:

Automated Workflow: SageMaker Pipelines streamline the ML workflow, reducing manual interventions and potential errors.

Governance and Compliance: The manual approval step ensures that only thoroughly evaluated models are deployed, aligning with organizational standards.

Scalability: The solution supports complex ML workflows, making it adaptable to various project requirements.

By implementing this solution, the company can establish a controlled and efficient process for deploying models, ensuring that only approved versions reach production environments.

References: Automate the machine learning model approval process with Amazon SageMaker Model Registry and Amazon SageMaker Pipelines; Update the Approval Status of a Model - Amazon SageMaker

Exam Code: MLA-C01

Exam Name: AWS Certified Machine Learning Engineer - Associate

Last Update: Feb 19, 2026

Questions: 207
