Pre-Summer Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: clap70

MLA-C01 AWS Certified Machine Learning Engineer - Associate Questions and Answers

Questions 4

A company is using an Amazon S3 bucket to collect data that will be used for ML workflows. The company needs to use AWS Glue DataBrew to clean and normalize the data.

Which solution will meet these requirements?

Options:

A.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew profile job.

B.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew recipe job.

C.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a profile job.

D.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a recipe job.

Buy Now
Questions 5

A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to serve the model.

Which solution will set up the required online validation with the LEAST operational overhead?

Options:

A.

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

B.

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

C.

Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.

D.

Configure the ALB to route 10% of the traffic to the new model at the existing SageMaker endpoint. Monitor the number of invocations by using AWS CloudTrail.

Buy Now
Questions 6

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

B.

Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

C.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

D.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Buy Now
Questions 7

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a retraining job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

A.

Use an AWS Glue crawler and an AWS Glue ETL job to detect data drift. Use AWS Glue triggers to automate the retraining job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the retraining job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the retraining job.

D.

Use Amazon QuickSight anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the retraining job.

Buy Now
Questions 8

A company ' s ML engineer is creating a classification model. The ML engineer explores the dataset and notices a column named day_of_week. The column contains the following values: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Which technique should the ML engineer use to convert this column’s data to binary values?

Options:

A.

Binary encoding

B.

Label encoding

C.

One-hot encoding

D.

Tokenization

Buy Now
Questions 9

An ML engineering team has a data processing pipeline that ingests sensor data from IoT devices into an Amazon S3 bucket. The pipeline then processes the data by using AWS Glue extract, transform, and load (ETL) jobs for ML modeling. The team noticed throttling errors in the ETL jobs. The data ingestion process has also been slower than normal.

What is the cause of the problem?

Options:

A.

The AWS Glue service quotas have been reached.

B.

The network bandwidth between the IoT devices and the AWS Region is insufficient.

C.

The AWS Glue ETL jobs are not optimized for parallel processing.

D.

The AWS Glue execution role is missing Amazon S3 permissions.

Buy Now
Questions 10

A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS.

Which solution will meet these requirements with the LEAST effort?

Options:

A.

Use SageMaker built-in algorithms to train the proprietary datasets.

B.

Use SageMaker script mode and premade images for ML frameworks.

C.

Build a container on AWS that includes custom packages and a choice of ML frameworks.

D.

Purchase similar production models through AWS Marketplace.

Buy Now
Questions 11

A company uses AWS CodePipeline to orchestrate a continuous integration and continuous delivery (CI/CD) pipeline for ML models and applications.

Select and order the steps from the following list to describe a CI/CD process for a successful deployment. Select each step one time. (Select and order FIVE.)

. CodePipeline deploys ML models and applications to production.

· CodePipeline detects code changes and starts to build automatically.

. Human approval is provided after testing is successful.

. The company builds and deploys ML models and applications to staging servers for testing.

. The company commits code changes or new training datasets to a Git repository.

Options:

Buy Now
Questions 12

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.

The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.

Which solution will provide the HIGHEST performance for data retrieval?

Options:

A.

Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

B.

Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

C.

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

D.

Put each day ' s time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.

Buy Now
Questions 13

A music streaming company constantly streams song ratings from an application to an Amazon S3 bucket. The company wants to use the ratings as an input for training and inference of an Amazon SageMaker AI model.

The company has an AWS Glue Data Catalog that is configured with the S3 bucket as the source. An ML engineer needs to implement a solution to create a repository for this data. The solution must ensure that the data stays synchronized during batch training and real-time inference.

Which solution will meet these requirements?

Options:

A.

Ingest data into SageMaker Feature Store from the S3 bucket. Apply tags and indexes.

B.

Use Amazon Athena. Create tables by using CREATE TABLE AS SELECT (CTAS) queries to group data.

C.

Use AWS Lake Formation. Apply tag-based control on the data.

D.

Use the Generate Data Insights function in SageMaker Data Wrangler.

Buy Now
Questions 14

An ML engineer is using Amazon SageMaker to train a deep learning model that requires distributed training. After some training attempts, the ML engineer observes that the instances are not performing as expected. The ML engineer identifies communication overhead between the training instances.

What should the ML engineer do to MINIMIZE the communication overhead between the instances?

Options:

A.

Place the instances in the same VPC subnet. Store the data in a different AWS Region from where the instances are deployed.

B.

Place the instances in the same VPC subnet but in different Availability Zones. Store the data in a different AWS Region from where the instances are deployed.

C.

Place the instances in the same VPC subnet. Store the data in the same AWS Region and Availability Zone where the instances are deployed.

D.

Place the instances in the same VPC subnet. Store the data in the same AWS Region but in a different Availability Zone from where the instances are deployed.

Buy Now
Questions 15

A company is uploading thousands of PDF policy documents into Amazon S3 and Amazon Bedrock Knowledge Bases. Each document contains structured sections. Users often search for a small section but need the full section context. The company wants accurate section-level search with automatic context retrieval and minimal custom coding.

Which chunking strategy meets these requirements?

Options:

A.

Hierarchical

B.

Maximum tokens

C.

Semantic

D.

Fixed-size

Buy Now
Questions 16

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model ' s algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

After the data is aggregated, the ML engineer must implement a solution to automatically detect anomalies in the data and to visualize the result.

Which solution will meet these requirements?

Options:

A.

Use Amazon Athena to automatically detect the anomalies and to visualize the result.

B.

Use Amazon Redshift Spectrum to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

C.

Use Amazon SageMaker Data Wrangler to automatically detect the anomalies and to visualize the result.

D.

Use AWS Batch to automatically detect the anomalies. Use Amazon QuickSight to visualize the result.

Buy Now
Questions 17

An ML engineer must choose the appropriate Amazon SageMaker algorithm to solve specific AI problems.

Select the correct SageMaker built-in algorithm from the following list for each use case. Each algorithm should be selected one time.

• Random Cut Forest (RCF) algorithm

• Semantic segmentation algorithm

• Sequence-to-Sequence (seq2seq) algorithm

Options:

Buy Now
Questions 18

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon S3 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Deploy the model on Amazon SageMaker AI. Create a set of AWS Lambda functions to identify and remove the sensitive data.

B.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

C.

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

D.

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Buy Now
Questions 19

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company needs to use the central model registry to manage different versions of models in the application.

Which action will meet this requirement with the LEAST operational overhead?

Options:

A.

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model.

B.

Use Amazon Elastic Container Registry (Amazon ECR) and unique tags for each model version.

C.

Use the SageMaker Model Registry and model groups to catalog the models.

D.

Use the SageMaker Model Registry and unique tags for each model version.

Buy Now
Questions 20

A customer call center uses Amazon Transcribe to convert hundreds of audio recordings of conversations between customers and support agents to text files. The call center wants to use the text files to train an ML model. To comply with industry regulations, the call center must remove customer names, addresses, and phone numbers from the training text files.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Use Amazon Bedrock Guardrails to process and redact personal information from the text files.

B.

Use the AWS Glue Detect PII transform to remove personal information from the text files.

C.

Store the text files in Amazon S3 buckets. Use S3 Object Lambda functions to redact personal information.

D.

Configure an Amazon SageMaker Data Wrangler custom transformation to remove personal information from the text files.

Buy Now
Questions 21

A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually.

The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

B.

Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.

C.

Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.

D.

Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.

Buy Now
Questions 22

An ML engineer needs to run intensive model training jobs each month that can take 48–72 hours. The jobs can be interrupted and resumed. The engineer has a fixed budget and needs the most cost-effective compute option.

Which solution will meet these requirements?

Options:

A.

Purchase Reserved Instances with partial upfront payment.

B.

Purchase On-Demand Instances.

C.

Purchase SageMaker AI Savings Plans.

D.

Purchase Spot Instances that use automated checkpoints.

Buy Now
Questions 23

An ML engineer is tuning an image classification model that shows poor performance on one of two available classes during prediction. Analysis reveals that the images whose class the model performed poorly on represent an extremely small fraction of the whole training dataset.

The ML engineer must improve the model ' s performance.

Which solution will meet this requirement?

Options:

A.

Optimize for accuracy. Use image augmentation on the less common images to generate new samples.

B.

Optimize for F1 score. Use image augmentation on the less common images to generate new samples.

C.

Optimize for accuracy. Use Synthetic Minority Oversampling Technique (SMOTE) on the less common images to generate new samples.

D.

Optimize for F1 score. Use Synthetic Minority Oversampling Technique (SMOTE) on the less common images to generate new samples.

Buy Now
Questions 24

An ML engineering team is spread across multiple locations. When the lead ML engineer opens an Amazon SageMaker AI notebook, the ML engineer does not see the latest merged notebook made by other team members from a Git repository.

The lead ML engineer must see the latest SageMaker AI notebook updates.

Which solution will meet this requirement?

Options:

A.

Run the !git pull origin master command.

B.

Run the !git commit command.

C.

Run the !git push origin master command.

D.

Run the !git branch command.

Buy Now
Questions 25

An ML engineer wants to deploy a workflow that processes streaming IoT sensor data and periodically retrains ML models. The most recent model versions must be deployed to production.

Which service will meet these requirements?

Options:

A.

Amazon SageMaker Pipelines

B.

Amazon Managed Workflows for Apache Airflow (MWAA)

C.

AWS Lambda

D.

Apache Spark

Buy Now
Questions 26

A hospital wants to predict patient outcomes for the coming year An ML engineer must improve several existing ML models that currently perform poorly.

Select the correct regularization method from the following list to improve each model Select each regularization method one time, more than one time, or not at all. (Select THREE.)

• L1 regularization

• L2 regularization

• Early stopping

Options:

Buy Now
Questions 27

A company has implemented a data ingestion pipeline for sales transactions from its ecommerce website. The company uses Amazon Data Firehose to ingest data into Amazon OpenSearch Service. The buffer interval of the Firehose stream is set for 60 seconds. An OpenSearch linear model generates real-time sales forecasts based on the data and presents the data in an OpenSearch dashboard.

The company needs to optimize the data ingestion pipeline to support sub-second latency for the real-time dashboard.

Which change to the architecture will meet these requirements?

Options:

A.

Use zero buffering in the Firehose stream. Tune the batch size that is used in the PutRecordBatch operation.

B.

Replace the Firehose stream with an AWS DataSync task. Configure the task with enhanced fan-out consumers.

C.

Increase the buffer interval of the Firehose stream from 60 seconds to 120 seconds.

D.

Replace the Firehose stream with an Amazon Simple Queue Service (Amazon SQS) queue.

Buy Now
Questions 28

An ML engineer is training a simple neural network model. The ML engineer tracks the performance of the model over time on a validation dataset. The model ' s performance improves substantially at first and then degrades after a specific number of epochs.

Which solutions will mitigate this problem? (Choose two.)

Options:

A.

Enable early stopping on the model.

B.

Increase dropout in the layers.

C.

Increase the number of layers.

D.

Increase the number of neurons.

E.

Investigate and reduce the sources of model bias.

Buy Now
Questions 29

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

A.

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the retraining job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

D.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Buy Now
Questions 30

A retail company is analyzing customer purchase data to develop personalized product recommendations. The company wants to use Amazon SageMaker Clarify to assess fairness metrics across different customer groups to avoid potential bias in the recommendation system.

The recommendation system needs to identify if certain customer segments are underrepresented in the training data. The company needs to choose a pre-training bias metric in SageMaker Clarify.

Which metric meets these requirements?

Options:

A.

Prediction distribution skew

B.

Feature attribution bias

C.

Class imbalance ratio

D.

Model performance gap

Buy Now
Questions 31

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a

central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company must implement a manual approval-based workflow to ensure that only approved models can be deployed to production endpoints.

Which solution will meet this requirement?

Options:

A.

Use SageMaker Experiments to facilitate the approval process during model registration.

B.

Use SageMaker ML Lineage Tracking on the central model registry. Create tracking entities for the approval process.

C.

Use SageMaker Model Monitor to evaluate the performance of the model and to manage the approval.

D.

Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to " Approved. "

Buy Now
Questions 32

A company that has hundreds of data scientists is using Amazon SageMaker to create ML models. The models are in model groups in the SageMaker Model Registry.

The data scientists are grouped into three categories: computer vision, natural language processing (NLP), and speech recognition. An ML engineer needs to implement a solution to organize the existing models into these groups to improve model discoverability at scale. The solution must not affect the integrity of the model artifacts and their existing groupings.

Which solution will meet these requirements?

Options:

A.

Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

B.

Create a model group for each category. Move the existing models into these category model groups.

C.

Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

D.

Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

Buy Now
Questions 33

An ML engineer is building a generative AI application on Amazon Bedrock by using large language models (LLMs).

Select the correct generative AI term from the following list for each description. Each term should be selected one time or not at all. (Select three.)

• Embedding

• Retrieval Augmented Generation (RAG)

• Temperature

• Token

Options:

Buy Now
Questions 34

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar

dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.

The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.

B.

Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

C.

Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

D.

Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Buy Now
Questions 35

A company uses Amazon SageMaker AI to create ML models. The data scientists need fine-grained control of ML workflows, DAG visualization, experiment history, and model governance for auditing and compliance.

Which solution will meet these requirements?

Options:

A.

Use AWS CodePipeline with SageMaker Studio and SageMaker ML Lineage Tracking.

B.

Use AWS CodePipeline with SageMaker Experiments.

C.

Use SageMaker Pipelines with SageMaker Studio and SageMaker ML Lineage Tracking.

D.

Use SageMaker Pipelines with SageMaker Experiments.

Buy Now
Questions 36

A streaming media company uses a churn risk model to assess the churn risk of its premium tier customers. Each month, the company runs an aggregation job on individual customers’ streaming data and uploads the user engagement features to an Amazon S3 bucket. The company manually re-trains the churn risk model with the user engagement data.

The current process requires manual intervention and is time-consuming. The company needs a solution that automatically re-trains the churn prediction model with the most recent data.

Which solution will meet these requirements with the SHORTEST delay?

Options:

A.

Set up an Amazon EventBridge rule to run an Amazon Elastic Container Service (Amazon ECS) task hourly for model re-training. Configure the ECS task to use the most recent data from the S3 bucket.

B.

Configure the S3 bucket to invoke an AWS Lambda function that re-trains the model.

C.

Create a pipeline in Amazon SageMaker Pipelines for re-training. Configure an Amazon EventBridge rule to monitor S3 PutObject creation events and invoke the pipeline.

D.

Create a pipeline in Amazon SageMaker Pipelines for re-training. Configure a pipeline schedule to re-train the model.

Buy Now
Questions 37

A company regularly receives new training data from a vendor of an ML model. The vendor delivers cleaned and prepared data to the company’s Amazon S3 bucket every 3–4 days.

The company has an Amazon SageMaker AI pipeline to retrain the model. An ML engineer needs to run the pipeline automatically when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.

Create an S3 lifecycle rule to transfer the data to the SageMaker AI training instance and initiate training.

B.

Create an AWS Lambda function that scans the S3 bucket and initiates the pipeline when new data is uploaded.

C.

Create an Amazon EventBridge rule that matches S3 upload events and configures the SageMaker pipeline as the target.

D.

Use Amazon Managed Workflows for Apache Airflow (MWAA) to orchestrate the pipeline when new data is uploaded.

Buy Now
Questions 38

An ML engineer is using Amazon SageMaker Canvas to build a custom ML model from an imported dataset. The model must make continuous numeric predictions based on 10 years of data.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.

Accuracy

B.

InferenceLatency

C.

Area Under the ROC Curve (AUC)

D.

Root Mean Square Error (RMSE)

Buy Now
Questions 39

A company uses a training job on Amazon SageMaker Al to train a neural network. The job first trains a model and then evaluates the model ' s performance ag

test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model ' s final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

. Change the epoch count.

. Choose an Amazon EC2 Spot Fleet.

· Change the batch size.

. Use early stopping on the training job.

· Use the SageMaker Al distributed data parallelism (SMDDP) library.

. Stop the training job.

Options:

Buy Now
Questions 40

A company is developing a generative AI conversational interface to assist customers with payments. The company wants to use an ML solution to detect customer intent. The company does not have training data to train a model.

Which solution will meet these requirements?

Options:

A.

Fine-tune a sequence-to-sequence (seq2seq) algorithm in Amazon SageMaker JumpStart.

B.

Use an LLM from Amazon Bedrock with zero-shot learning.

C.

Use the Amazon Comprehend DetectEntities API.

D.

Run an LLM from Amazon Bedrock on Amazon EC2 instances.

Buy Now
Questions 41

A company uses a batching solution to process daily analytics. The company wants to provide near real-time updates, use open-source technology, and avoid managing or scaling infrastructure.

Which solution will meet these requirements?

Options:

A.

Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless clusters.

B.

Create Amazon MSK Provisioned clusters.

C.

Create Amazon Kinesis Data Streams with Application Auto Scaling.

D.

Create self-hosted Apache Flink applications on Amazon EC2.

Buy Now
Questions 42

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

A.

Enable network isolation.

B.

Configure traffic encryption by using security groups.

C.

Enable inter-container traffic encryption.

D.

Enable VPC flow logs.

Buy Now
Questions 43

An ML engineer is preparing a dataset that contains medical records to train an ML model to predict the likelihood of patients developing diseases.

The dataset contains columns for patient ID, age, medical conditions, test results, and a " Disease " target column.

How should the ML engineer configure the data to train the model?

Options:

A.

Remove the patient ID column.

B.

Remove the age column.

C.

Remove the medical conditions and test results columns.

D.

Remove the " Disease " target column.

Buy Now
Questions 44

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

Options:

A.

AWS::SageMaker::Model

B.

AWS::SageMaker::Endpoint

C.

AWS::SageMaker::NotebookInstance

D.

AWS::SageMaker::Pipeline

Buy Now
Questions 45

An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

• Feature splitting

• Logarithmic transformation

• One-hot encoding

• Standardized distribution

Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all (Select three.)

Options:

Buy Now
Questions 46

A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.

Which solution will meet these requirements?

Options:

A.

Increase the temperature parameter and the top_k parameter.

B.

Increase the temperature parameter. Decrease the top_k parameter.

C.

Decrease the temperature parameter. Increase the top_k parameter.

D.

Decrease the temperature parameter and the top_k parameter.

Buy Now
Questions 47

An ML engineer is setting up a continuous integration and continuous delivery (CI/CD) pipeline for an ML workflow in Amazon SageMaker AI. The pipeline needs to automate model re-training, testing, and deployment whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer wants to track model versions for auditing.

Which solution will meet these requirements?

Options:

A.

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and to track model versions.

B.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.

Create an AWS Lambda function to re-train and deploy the model. Use Amazon EventBridge to invoke the Lambda function. Reference the Lambda logs to track model versions.

D.

Use SageMaker AI notebook instances to manually re-train and deploy the model when needed. Reference AWS CloudTrail logs to track model versions.

Buy Now
Questions 48

A bank needs to use Amazon SageMaker AI to create an ML model to determine which customers qualify for a new product. The bank must use algorithms that SageMaker AI directly supports. The model must be explainable to the bank ' s regulators.

Which modeling approach will meet these requirements?

Options:

A.

Train the model by using the Object2Vec algorithm.

B.

Train the model by using the linear learner algorithm.

C.

Train a neural network.

D.

Train the model by using the k-means algorithm.

Buy Now
Questions 49

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model ' s performance? (Select TWO.)

Options:

A.

Increase the model complexity by increasing the number of features in the dataset.

B.

Decrease the model complexity by reducing the number of features in the dataset.

C.

Decrease the model complexity by reducing the number of samples in the dataset.

D.

Increase the value of the L2 regularization parameter.

E.

Decrease the value of the L2 regularization parameter.

Buy Now
Questions 50

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company ' s internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

Options:

A.

Use Amazon Kendra Experience Builder.

B.

Use Amazon Aurora PostgreSQL with the pgvector extension.

C.

Use Amazon RDS for PostgreSQL with the pgvector extension.

D.

Use the AWS Glue Data Catalog metadata repository.

Buy Now
Questions 51

A company is using ML to predict the presence of a specific weed in a farmer ' s field. The company is using the Amazon SageMaker linear learner built-in algorithm with a value of multiclass_dassifier for the predictorjype hyperparameter.

What should the company do to MINIMIZE false positives?

Options:

A.

Set the value of the weight decay hyperparameter to zero.

B.

Increase the number of training epochs.

C.

Increase the value of the target_precision hyperparameter.

D.

Change the value of the predictorjype hyperparameter to regressor.

Buy Now
Questions 52

An ML engineer has trained an ML model by using Amazon SageMaker AI. The ML engineer determines that the model is overfitting and that the training data contains unnecessary features. The ML engineer must reduce the overfitting and the impact of the unnecessary features.

Which solution will meet these requirements?

Options:

A.

Apply L1 regularization to the training data. Retrain the model.

B.

Use SageMaker Debugger to apply L1 regularization to the running model.

C.

Increase the number of training iterations. Retrain the model.

D.

Decrease the number of training iterations. Retrain the model.

Buy Now
Questions 53

A company is developing an ML model to predict customer satisfaction. The company needs to use survey feedback and the past satisfaction level of customers to predict the future satisfaction level of customers.

The dataset includes a column named Feedback that contains long text responses. The dataset also includes a column named Satisfaction Level that contains three distinct values for past customer satisfaction: High, Medium, and Low. The company must apply encoding methods to transform the data in each column.

Which solution will meet these requirements?

Options:

A.

Apply one-hot encoding to the Feedback column and the Satisfaction Level column.

B.

Apply one-hot encoding to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

C.

Apply label encoding to the Feedback column. Apply binary encoding to the Satisfaction Level column.

D.

Apply tokenization to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

Buy Now
Questions 54

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

C.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

D.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Buy Now
Questions 55

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company ' s main competitor.

Which solution will meet this requirement?

Options:

A.

Configure the competitor ' s name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor ' s name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor ' s name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor ' s name.

Buy Now
Questions 56

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take.

An ML engineer must implement a solution to optimize the data for query performance.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

A.

Configure an AWS Lambda function to split the .csv files into smaller objects in the S3 bucket.

B.

Configure an AWS Glue job to drop columns that have string type values and to save the results to the S3 bucket.

C.

Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet format.

D.

Configure an Amazon EMR cluster to process the data that is in the S3 bucket.

Buy Now
Questions 57

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.

An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.

Which solution will meet these requirements?

Options:

A.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

B.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

C.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

D.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Buy Now
Questions 58

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

Options:

A.

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

B.

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

C.

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

D.

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.

Buy Now
Questions 59

A company uses an Amazon SageMaker AI model for real-time inference with auto scaling enabled. During peak usage, new instances launch before existing instances are fully ready, causing inefficiencies and delays.

Which solution will optimize the scaling process without affecting response times?

Options:

A.

Change to a multi-model endpoint configuration.

B.

Integrate Amazon API Gateway and AWS Lambda to manage invocations.

C.

Decrease the scale-in cooldown period and increase the maximum instance count.

D.

Increase the cooldown period after scale-out activities.

Buy Now
Questions 60

A company ' s ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker AI endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model ' s predictions?

Options:

A.

Use SageMaker Model Monitor on the deployed model.

B.

Use SageMaker Clarify on the deployed model.

C.

Show the distribution of inferences from A/B testing in Amazon CloudWatch.

D.

Add a shadow endpoint. Analyze prediction differences on samples.

Buy Now
Questions 61

An ML engineer wants to use, prepare, and load data from Amazon S3 for analytics. The ML engineer must run an extract, transform, and load (ETL) job to discover the schema of the data and to store the metadata.

Which solution will meet these requirements with the LEAST manual effort?

Options:

A.

Use AWS Glue to run the ETL job. Use the job to discover the schema and to store the associated metadata in the AWS Glue Data Catalog.

B.

Create an Amazon SageMaker Data Wrangler flow to run the ETL job. Use the job to discover the schema and to store the associated metadata in an S3 bucket.

C.

Create an ETL pipeline by using Amazon Athena integrated with AWS Step Functions. Use the pipeline to run the ETL job to discover the schema and to store the associated metadata in an S3 bucket.

D.

Launch an Amazon EC2 instance that includes the scikit-learn library to run the ETL job. Use the job to discover the schema and to store the associated metadata in Amazon Redshift.

Buy Now
Questions 62

A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results.

An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs.

Which solution will meet these requirements?

Options:

A.

Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

B.

Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

C.

Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

D.

Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Buy Now
Questions 63

A company has trained an ML model that is packaged in a container. The company will integrate the model with an existing Python web application. The company needs to host the model on AWS by using Kubernetes.

The company does not want to manage the control plane and must provision the resources in a repeatable manner. The infrastructure must be provisioned by using Python.

Which solution will meet these requirements?

Options:

A.

Use AWS CloudFormation to provision Amazon EC2 instances in multiple Availability Zones. Set up a Kubernetes cluster. Host the model container on the Kubernetes cluster.

B.

Use the AWS CLI to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

C.

Use the AWS Cloud Development Kit (AWS CDK) to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

D.

Use AWS CloudFormation to provision an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. Store the image in an Amazon Elastic Container Registry (Amazon ECR) repository. Host the model container on the EKS cluster.

Buy Now
Questions 64

A company wants to use large language models (LLMs) supported by Amazon Bedrock to develop a chat interface for internal technical documentation.

The documentation consists of dozens of text files totaling several megabytes and is updated frequently.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Train a new LLM in Amazon Bedrock using the documentation.

B.

Use Amazon Bedrock guardrails to integrate documentation.

C.

Fine-tune an LLM in Amazon Bedrock with the documentation.

D.

Upload the documentation to an Amazon Bedrock knowledge base and use it as context during inference.

Buy Now
Questions 65

A company has an ML model that needs to run one time each night to predict stock values. The model input is 3 MB of data that is collected during the current day. The model produces the predictions for the next day. The prediction process takes less than 1 minute to finish running.

How should the company deploy the model on Amazon SageMaker to meet these requirements?

Options:

A.

Use a multi-model serverless endpoint. Enable caching.

B.

Use an asynchronous inference endpoint. Set the InitialInstanceCount parameter to 0.

C.

Use a real-time endpoint. Configure an auto scaling policy to scale the model to 0 when the model is not in use.

D.

Use a serverless inference endpoint. Set the MaxConcurrency parameter to 1.

Buy Now
Questions 66

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (Cl/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.

Select and order the pipeline ' s correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

• An S3 event notification invokes the pipeline when new data is uploaded.

• S3 Lifecycle rule invokes the pipeline when new data is uploaded.

• SageMaker retrains the model by using the data in the S3 bucket.

• The pipeline deploys the model to a SageMaker endpoint.

• The pipeline deploys the model to SageMaker Model Registry.

Options:

Buy Now
Questions 67

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

Options:

A.

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B.

Use a custom Amazon SageMaker AI notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Buy Now
Questions 68

A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

Options:

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network Ad for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Buy Now
Questions 69

A company is developing an ML model to forecast future values based on time series data. The dataset includes historical measurements collected at regular intervals and categorical features. The model needs to predict future values based on past patterns and trends.

Which algorithm and hyperparameters should the company use to develop the model?

Options:

A.

Use the Amazon SageMaker AI XGBoost algorithm. Set the scale_pos_weight hyperparameter to adjust for class imbalance.

B.

Use k-means clustering with k to specify the number of clusters.

C.

Use the Amazon SageMaker AI DeepAR algorithm with matching context length and prediction length hyperparameters.

D.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm with contamination to set the expected proportion of anomalies.

Buy Now
Questions 70

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

A.

Accuracy

B.

Area Under the ROC Curve (AUC)

C.

F1 score

D.

Mean absolute error (MAE)

Buy Now
Questions 71

A travel company wants to create an ML model to recommend the next airport destination for its users. The company has collected millions of data records about user location, recent search history on the company ' s website, and 2,000 available airports. The data has several categorical features with a target column that is expected to have a high-dimensional sparse matrix.

The company needs to use Amazon SageMaker AI built-in algorithms for the model. An ML engineer converts the categorical features by using one-hot encoding.

Which algorithm should the ML engineer implement to meet these requirements?

Options:

A.

Use the CatBoost algorithm to recommend the next airport destination.

B.

Use the DeepAR forecasting algorithm to recommend the next airport destination.

C.

Use the Factorization Machines algorithm to recommend the next airport destination.

D.

Use the k-means algorithm to cluster users into groups and map each group to the next airport destination.

Buy Now
Questions 72

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

Options:

A.

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Buy Now
Exam Code: MLA-C01
Exam Name: AWS Certified Machine Learning Engineer - Associate
Last Update: May 17, 2026
Questions: 230
MLA-C01 pdf

MLA-C01 PDF

$25.5  $84.99
MLA-C01 Engine

MLA-C01 Testing Engine

$30  $99.99
MLA-C01 PDF + Engine

MLA-C01 PDF + Testing Engine

$40.5  $134.99