Winter Sale Limited Time 60% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: bestgeek

MLS-C01 AWS Certified Machine Learning - Specialty Questions and Answers

Questions 4

A Machine Learning Specialist is configuring automatic model tuning in Amazon SageMaker

When using the hyperparameter optimization feature, which of the following guidelines should be followed to improve optimization?

Choose the maximum number of hyperparameters supported by

Options:

A.

Amazon SageMaker to search the largest number of combinations possible

B.

Specify a very large hyperparameter range to allow Amazon SageMaker to cover every possible value.

C.

Use log-scaled hyperparameters to allow the hyperparameter space to be searched as quickly as possible

D.

Execute only one hyperparameter tuning job at a time and improve tuning through successive rounds of experiments

Buy Now
Questions 5

A global financial company is using machine learning to automate its loan approval process. The company has a dataset of customer information. The dataset contains some categorical fields, such as customer location by city and housing status. The dataset also includes financial fields in different units, such as account balances in US dollars and monthly interest in US cents.

The company’s data scientists are using a gradient boosting regression model to infer the credit score for each customer. The model has a training accuracy of 99% and a testing accuracy of 75%. The data scientists want to improve the model’s testing accuracy.

Which process will improve the testing accuracy the MOST?

Options:

A.

Use a one-hot encoder for the categorical fields in the dataset. Perform standardization on the financial fields in the dataset. Apply L1 regularization to the data.

B.

Use tokenization of the categorical fields in the dataset. Perform binning on the financial fields in the dataset. Remove the outliers in the data by using the z-score.

C.

Use a label encoder for the categorical fields in the dataset. Perform L1 regularization on the financial fields in the dataset. Apply L2 regularization to the data.

D.

Use a logarithm transformation on the categorical fields in the dataset. Perform binning on the financial fields in the dataset. Use imputation to populate missing values in the dataset.

Buy Now
Questions 6

A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet files for exploration and analysis. Which of the following services would both ingest and store this data in the correct format?

Options:

A.

AWSDMS

B.

Amazon Kinesis Data Streams

C.

Amazon Kinesis Data Firehose

D.

Amazon Kinesis Data Analytics

Buy Now
Questions 7

A company is running an Amazon SageMaker training job that will access data stored in its Amazon S3 bucket A compliance policy requires that the data never be transmitted across the internet How should the company set up the job?

Options:

A.

Launch the notebook instances in a public subnet and access the data through the public S3 endpoint

B.

Launch the notebook instances in a private subnet and access the data through a NAT gateway

C.

Launch the notebook instances in a public subnet and access the data through a NAT gateway

D.

Launch the notebook instances in a private subnet and access the data through an S3 VPC endpoint.

Buy Now
Questions 8

A Machine Learning Specialist is required to build a supervised image-recognition model to identify a cat. The ML Specialist performs some tests and records the following results for a neural network-based image classifier:

Total number of images available = 1,000 Test set images = 100 (constant test set)

The ML Specialist notices that, in over 75% of the misclassified images, the cats were held upside down by their owners.

Which techniques can be used by the ML Specialist to improve this specific test error?

Options:

A.

Increase the training data by adding variation in rotation for training images.

B.

Increase the number of epochs for model training.

C.

Increase the number of layers for the neural network.

D.

Increase the dropout rate for the second-to-last layer.

Buy Now
Questions 9

This graph shows the training and validation loss against the epochs for a neural network

The network being trained is as follows

• Two dense layers one output neuron

• 100 neurons in each layer

• 100 epochs

• Random initialization of weights

Which technique can be used to improve model performance in terms of accuracy in the validation set?

Options:

A.

Early stopping

B.

Random initialization of weights with appropriate seed

C.

Increasing the number of epochs

D.

Adding another layer with the 100 neurons

Buy Now
Questions 10

A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting model using TensorFlow. The training is currently implemented on a single-GPU machine and takes approximately 23 hours to complete. The training needs to be run daily.

The model accuracy js acceptable, but the company anticipates a continuous increase in the size of the training data and a need to update the model on an hourly, rather than a daily, basis. The company also wants to minimize coding effort and infrastructure changes

What should the Machine Learning Specialist do to the training solution to allow it to scale for future demand?

Options:

A.

Do not change the TensorFlow code. Change the machine to one with a more powerful GPU to speed up the training.

B.

Change the TensorFlow code to implement a Horovod distributed framework supported by Amazon SageMaker. Parallelize the training to as many machines as needed to achieve the business goals.

C.

Switch to using a built-in AWS SageMaker DeepAR model. Parallelize the training to as many machines as needed to achieve the business goals.

D.

Move the training to Amazon EMR and distribute the workload to as many machines as needed to achieve the business goals.

Buy Now
Questions 11

A company is using Amazon Polly to translate plaintext documents to speech for automated company announcements However company acronyms are being mispronounced in the current documents How should a Machine Learning Specialist address this issue for future documents'?

Options:

A.

Convert current documents to SSML with pronunciation tags

B.

Create an appropriate pronunciation lexicon.

C.

Output speech marks to guide in pronunciation

D.

Use Amazon Lex to preprocess the text files for pronunciation

Buy Now
Questions 12

An e-commerce company needs a customized training model to classify images of its shirts and pants products The company needs a proof of concept in 2 to 3 days with good accuracy Which compute choice should the Machine Learning Specialist select to train and achieve good accuracy on the model quickly?

Options:

A.

m5 4xlarge (general purpose)

B.

r5.2xlarge (memory optimized)

C.

p3.2xlarge (GPU accelerated computing)

D.

p3 8xlarge (GPU accelerated computing)

Buy Now
Questions 13

A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types of animals. The Specialist has built a series of layers in a neural network that will take an input image of an animal, pass it through a series of convolutional and pooling layers, and then finally pass it through a dense and fully connected layer with 10 nodes The Specialist would like to get an output from the neural network that is a probability distribution of how likely it is that the input image belongs to each of the 10 classes

Which function will produce the desired output?

Options:

A.

Dropout

B.

Smooth L1 loss

C.

Softmax

D.

Rectified linear units (ReLU)

Buy Now
Questions 14

A company wants to predict the sale prices of houses based on available historical sales data. The target

variable in the company’s dataset is the sale price. The features include parameters such as the lot size, living

area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built,

and postal code. The company wants to use multi-variable linear regression to predict house sale prices.

Which step should a machine learning specialist take to remove features that are irrelevant for the analysis

and reduce the model’s complexity?

Options:

A.

Plot a histogram of the features and compute their standard deviation. Remove features with high variance.

B.

Plot a histogram of the features and compute their standard deviation. Remove features with low variance.

C.

Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual

correlation scores.

D.

Run a correlation check of all features against the target variable. Remove features with low target variable

correlation scores.

Buy Now
Questions 15

A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public transit in New York City. One of the random variables is discrete, and represents the number of minutes New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes.

Which prior probability distribution should the ML Specialist use for this variable?

Options:

A.

Poisson distribution ,

B.

Uniform distribution

C.

Normal distribution

D.

Binomial distribution

Buy Now
Questions 16

A company that promotes healthy sleep patterns by providing cloud-connected devices currently hosts a sleep

tracking application on AWS. The application collects device usage information from device users. The

company's Data Science team is building a machine learning model to predict if and when a user will stop

utilizing the company's devices. Predictions from this model are used by a downstream application that

determines the best approach for contacting users.

The Data Science team is building multiple versions of the machine learning model to evaluate each version

against the company’s business goals. To measure long-term effectiveness, the team wants to run multiple

versions of the model in parallel for long periods of time, with the ability to control the portion of inferences

served by the models.

Which solution satisfies these requirements with MINIMAL effort?

Options:

A.

Build and host multiple models in Amazon SageMaker. Create multiple Amazon SageMaker endpoints, one

for each model. Programmatically control invoking different models for inference at the application layer.

B.

Build and host multiple models in Amazon SageMaker. Create an Amazon SageMaker endpoint

configuration with multiple production variants. Programmatically control the portion of the inferences

served by the multiple models by updating the endpoint configuration.

C.

Build and host multiple models in Amazon SageMaker Neo to take into account different types of medical

devices. Programmatically control which model is invoked for inference based on the medical device type.

D.

Build and host multiple models in Amazon SageMaker. Create a single endpoint that accesses multiple

models. Use Amazon SageMaker batch transform to control invoking the different models through the

single endpoint.

Buy Now
Questions 17

A Machine Learning Specialist is building a logistic regression model that will predict whether or not a person will order a pizza. The Specialist is trying to build the optimal model with an ideal classification threshold.

What model evaluation technique should the Specialist use to understand how different classification thresholds will impact the model's performance?

Options:

A.

Receiver operating characteristic (ROC) curve

B.

Misclassification rate

C.

Root Mean Square Error (RM&)

D.

L1 norm

Buy Now
Questions 18

Example Corp has an annual sale event from October to December. The company has sequential sales data from the past 15 years and wants to use Amazon ML to predict the sales for this year's upcoming event. Which method should Example Corp use to split the data into a training dataset and evaluation dataset?

Options:

A.

Pre-split the data before uploading to Amazon S3

B.

Have Amazon ML split the data randomly.

C.

Have Amazon ML split the data sequentially.

D.

Perform custom cross-validation on the data

Buy Now
Questions 19

An Amazon SageMaker notebook instance is launched into Amazon VPC The SageMaker notebook references data contained in an Amazon S3 bucket in another account The bucket is encrypted using SSE-KMS The instance returns an access denied error when trying to access data in Amazon S3.

Which of the following are required to access the bucket and avoid the access denied error? (Select THREE )

Options:

A.

An AWS KMS key policy that allows access to the customer master key (CMK)

B.

A SageMaker notebook security group that allows access to Amazon S3

C.

An 1AM role that allows access to the specific S3 bucket

D.

A permissive S3 bucket policy

E.

An S3 bucket owner that matches the notebook owner

F.

A SegaMaker notebook subnet ACL that allow traffic to Amazon S3.

Buy Now
Questions 20

A Machine Learning Specialist trained a regression model, but the first iteration needs optimizing. The Specialist needs to understand whether the model is more frequently overestimating or underestimating the target.

What option can the Specialist use to determine whether it is overestimating or underestimating the target value?

Options:

A.

Root Mean Square Error (RMSE)

B.

Residual plots

C.

Area under the curve

D.

Confusion matrix

Buy Now
Questions 21

A Machine Learning Specialist works for a credit card processing company and needs to predict which

transactions may be fraudulent in near-real time. Specifically, the Specialist must train a model that returns the

probability that a given transaction may fraudulent.

How should the Specialist frame this business problem?

Options:

A.

Streaming classification

B.

Binary classification

C.

Multi-category classification

D.

Regression classification

Buy Now
Questions 22

A data scientist is working on a public sector project for an urban traffic system. While studying the traffic patterns, it is clear to the data scientist that the traffic behavior at each light is correlated, subject to a small stochastic error term. The data scientist must model the traffic behavior to analyze the traffic patterns and reduce congestion.

How will the data scientist MOST effectively model the problem?

Options:

A.

The data scientist should obtain a correlated equilibrium policy by formulating this problem as a multi-agent reinforcement learning problem.

B.

The data scientist should obtain the optimal equilibrium policy by formulating this problem as a single-agent reinforcement learning problem.

C.

Rather than finding an equilibrium policy, the data scientist should obtain accurate predictors of traffic flow by using historical data through a supervised learning approach.

D.

Rather than finding an equilibrium policy, the data scientist should obtain accurate predictors of traffic flow by using unlabeled simulated data representing the new traffic patterns in the city and applying an unsupervised learning approach.

Buy Now
Questions 23

A manufacturing company uses machine learning (ML) models to detect quality issues. The models use images that are taken of the company's product at the end of each production step. The company has thousands of machines at the production site that generate one image per second on average.

The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists used an industrial PC that ran AWS IoT Greengrass with a long-running AWS Lambda function that uploaded the images to Amazon S3. The uploaded images invoked a Lambda function that was written in Python to perform inference by using an Amazon SageMaker endpoint that ran a custom model. The inference results were forwarded back to a web service that was hosted at the production site to prevent faulty products from being shipped.

The company scaled the solution out to all manufacturing machines by installing similarly configured industrial PCs on each production machine. However, latency for predictions increased beyond acceptable limits. Analysis shows that the internet connection is at its capacity limit.

How can the company resolve this issue MOST cost-effectively?

Options:

A.

Set up a 10 Gbps AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images. Increase the size of the instances and the number of instances that are used by the SageMaker endpoint.

B.

Extend the long-running Lambda function that runs on AWS IoT Greengrass to compress the images and upload the compressed files to Amazon S3. Decompress the files by using a separate Lambda function that invokes the existing Lambda function to run the inference pipeline.

C.

Use auto scaling for SageMaker. Set up an AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images.

D.

Deploy the Lambda function and the ML models onto the AWS IoT Greengrass core that is running on the industrial PCs that are installed on each machine. Extend the long-running Lambda function that runs on AWS IoT Greengrass to invoke the Lambda function with the captured images and run the inference on the edge component that forwards the results directly to the web service.

Buy Now
Questions 24

For the given confusion matrix, what is the recall and precision of the model?

Options:

A.

Recall = 0.92 Precision = 0.84

B.

Recall = 0.84 Precision = 0.8

C.

Recall = 0.92 Precision = 0.8

D.

Recall = 0.8 Precision = 0.92

Buy Now
Questions 25

IT leadership wants Jo transition a company's existing machine learning data storage environment to AWS as a temporary ad hoc solution The company currently uses a custom software process that heavily leverages SOL as a query language and exclusively stores generated csv documents for machine learning

The ideal state for the company would be a solution that allows it to continue to use the current workforce of SQL experts The solution must also support the storage of csv and JSON files, and be able to query over semi-structured data The following are high priorities for the company:

• Solution simplicity

• Fast development time

• Low cost

• High flexibility

What technologies meet the company's requirements?

Options:

A.

Amazon S3 and Amazon Athena

B.

Amazon Redshift and AWS Glue

C.

Amazon DynamoDB and DynamoDB Accelerator (DAX)

D.

Amazon RDS and Amazon ES

Buy Now
Questions 26

A company will use Amazon SageMaker to train and host a machine learning (ML) model for a marketing campaign. The majority of data is sensitive customer data. The data must be encrypted at rest. The company wants AWS to maintain the root of trust for the master keys and wants encryption key usage to be logged.

Which implementation will meet these requirements?

Options:

A.

Use encryption keys that are stored in AWS Cloud HSM to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3.

B.

Use SageMaker built-in transient keys to encrypt the ML data volumes. Enable default encryption for new Amazon Elastic Block Store (Amazon EBS) volumes.

C.

Use customer managed keys in AWS Key Management Service (AWS KMS) to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3.

D.

Use AWS Security Token Service (AWS STS) to create temporary tokens to encrypt the ML storage volumes, and to encrypt the model artifacts and data in Amazon S3.

Buy Now
Questions 27

An e commerce company wants to launch a new cloud-based product recommendation feature for its web application. Due to data localization regulations, any sensitive data must not leave its on-premises data center, and the product recommendation model must be trained and tested using nonsensitive data only. Data transfer to the cloud must use IPsec. The web application is hosted on premises with a PostgreSQL database that contains all the data. The company wants the data to be uploaded securely to Amazon S3 each day for model retraining.

How should a machine learning specialist meet these requirements?

Options:

A.

Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest tables without sensitive data through an AWS Site-to-Site VPN connection directly into Amazon S3.

B.

Create an AWS Glue job to connect to the PostgreSQL DB instance. Ingest all data through an AWS Site- to-Site VPN connection into Amazon S3 while removing sensitive data using a PySpark job.

C.

Use AWS Database Migration Service (AWS DMS) with table mapping to select PostgreSQL tables with no sensitive data through an SSL connection. Replicate data directly into Amazon S3.

D.

Use PostgreSQL logical replication to replicate all data to PostgreSQL in Amazon EC2 through AWS Direct Connect with a VPN connection. Use AWS Glue to move data from Amazon EC2 to Amazon S3.

Buy Now
Questions 28

An ecommerce company is automating the categorization of its products based on images. A data scientist has trained a computer vision model using the Amazon SageMaker image classification algorithm. The images for each product are classified according to specific product lines. The accuracy of the model is too low when categorizing new products. All of the product images have the same dimensions and are stored within an Amazon S3 bucket. The company wants to improve the model so it can be used for new products as soon as possible.

Which steps would improve the accuracy of the solution? (Choose three.)

Options:

A.

Use the SageMaker semantic segmentation algorithm to train a new model to achieve improved accuracy.

B.

Use the Amazon Rekognition DetectLabels API to classify the products in the dataset.

C.

Augment the images in the dataset. Use open source libraries to crop, resize, flip, rotate, and adjust the brightness and contrast of the images.

D.

Use a SageMaker notebook to implement the normalization of pixels and scaling of the images. Store the new dataset in Amazon S3.

E.

Use Amazon Rekognition Custom Labels to train a new model.

F.

Check whether there are class imbalances in the product categories, and apply oversampling or undersampling as required. Store the new dataset in Amazon S3.

Buy Now
Questions 29

A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric This workflow will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-through on data that goes stale every 24 hours

With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease costs, the Specialist wants to reconfigure the input hyperparameter range(s)

Which visualization will accomplish this?

Options:

A.

A histogram showing whether the most important input feature is Gaussian.

B.

A scatter plot with points colored by target variable that uses (-Distributed Stochastic Neighbor Embedding (I-SNE) to visualize the large number of input variables in an easier-to-read dimension.

C.

A scatter plot showing (he performance of the objective metric over each training iteration

D.

A scatter plot showing the correlation between maximum tree depth and the objective metric.

Buy Now
Questions 30

An insurance company is developing a new device for vehicles that uses a camera to observe drivers' behavior and alert them when they appear distracted The company created approximately 10,000 training images in a controlled environment that a Machine Learning Specialist will use to train and evaluate machine learning models

During the model evaluation the Specialist notices that the training error rate diminishes faster as the number of epochs increases and the model is not accurately inferring on the unseen test images

Which of the following should be used to resolve this issue? (Select TWO)

Options:

A.

Add vanishing gradient to the model

B.

Perform data augmentation on the training data

C.

Make the neural network architecture complex.

D.

Use gradient checking in the model

E.

Add L2 regularization to the model

Buy Now
Questions 31

A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords.

Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?

Options:

A.

Use Amazon SageMaker script mode and use train.py unchanged. Point the Amazon SageMaker training invocation to the local path of the data without reformatting the training data.

B.

Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.

C.

Rewrite the train.py script to add a section that converts TFRecords to protobuf and ingests the protobuf data instead of TFRecords.

D.

Prepare the data in the format accepted by Amazon SageMaker. Use AWS Glue or AWS Lambda to reformat and store the data in an Amazon S3 bucket.

Buy Now
Exam Code: MLS-C01
Exam Name: AWS Certified Machine Learning - Specialty
Last Update: Feb 2, 2023
Questions: 208
MLS-C01 pdf

MLS-C01 PDF

$32  $80
MLS-C01 Engine

MLS-C01 Testing Engine

$38  $95
MLS-C01 PDF + Engine

MLS-C01 PDF + Testing Engine

$52  $130