
DY0-001 CompTIA DataX Exam Questions and Answers

Question 4

A data scientist is preparing to brief a non-technical audience that is focused on analysis and results. During the modeling process, the data scientist produced the following artifacts:

Which of the following artifacts should the data scientist include in the briefing? (Choose two.)

Options:

A.

Final charts and dashboards

B.

Model selection, justification, and purpose

C.

Code documentation

D.

Mathematical descriptions of clustering algorithms included in the selected model

E.

Model performance statistics (accuracy, precision, recall, F1 score, etc.)

F.

Data dictionary

Question 5

A data scientist is clustering a data set but does not want to specify the number of clusters present. Which of the following algorithms should the data scientist use?

Options:

A.

DBSCAN

B.

k-nearest neighbors

C.

k-means

D.

Logistic regression
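Density-based clustering discovers the number of clusters from the data rather than taking it as an input. The sketch below is a minimal pure-Python DBSCAN, written for illustration rather than taken from any library; the function name `dbscan`, the sample points, and the `eps`/`min_pts` values are hypothetical choices.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster label per point (-1 = noise)."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1

    def neighbors(i):
        # All points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is also a core point: keep expanding
    return labels

# Hypothetical data: two tight groups and one isolated point
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

No cluster count is passed in; two clusters and one noise point emerge from the density parameters alone, whereas k-means requires `k` up front.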

Question 6

A data scientist is building a model to predict customer credit scores based on information collected from reporting agencies. The model needs to automatically adjust its parameters to adapt to recent changes in the information collected. Which of the following is the best model to use?

Options:

A.

Decision tree

B.

Random forest

C.

Linear discriminant analysis

D.

XGBoost

Question 7

Which of the following explains back propagation?

Options:

A.

The passage of convolutions backward through a neural network to update weights and biases

B.

The passage of accuracy backward through a neural network to update weights and biases

C.

The passage of nodes backward through a neural network to update weights and biases

D.

The passage of errors backward through a neural network to update weights and biases
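To make "passing the error backward" concrete, here is a minimal sketch of a two-weight network (one sigmoid hidden unit, one linear output, squared-error loss). The weights, input, target, and learning rate are hypothetical values chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)  # hidden activation
    y = w2 * h           # linear output
    return h, y

def loss(w1, w2, x, t):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2

def backward(w1, w2, x, t):
    """Propagate the output error backward to get dLoss/dw for each weight."""
    h, y = forward(w1, w2, x)
    err = y - t                        # dL/dy: the error at the output
    grad_w2 = err * h                  # dL/dw2
    err_h = err * w2                   # error passed back to the hidden unit
    grad_w1 = err_h * h * (1 - h) * x  # chain rule through the sigmoid
    return grad_w1, grad_w2

w1, w2, x, t, lr = 0.5, -0.3, 1.0, 1.0, 0.1
before = loss(w1, w2, x, t)
g1, g2 = backward(w1, w2, x, t)
w1, w2 = w1 - lr * g1, w2 - lr * g2  # gradient-descent update of the weights
after = loss(w1, w2, x, t)
print(f"loss before={before:.4f}, after={after:.4f}")
```

The quantity flowing backward is the error (`err`, then `err_h`), not accuracy, convolutions, or nodes; each weight's gradient is that error scaled by local derivatives.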

Question 8

A data scientist uses a large data set to build multiple linear regression models to predict the likely market value of a real estate property. The selected new model has an RMSE of 995 on the holdout set and an adjusted R² of 0.75. The benchmark model has an RMSE of 1,000 on the holdout set. Which of the following is the best business statement regarding the new model?

Options:

A.

The model should be deployed because it has a lower RMSE.

B.

The model's adjusted R² is exceptionally strong for such a complex relationship.

C.

The model fails to improve meaningfully on the benchmark model.

D.

The model's adjusted R² is too low for the real estate industry.
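Judging whether an RMSE gain is meaningful is easier as a relative change. A minimal sketch using the two RMSE values from the question; the helper names `rmse` and `relative_improvement` are illustrative, not from any library.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error of predictions against actual values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def relative_improvement(benchmark_rmse, new_rmse):
    """Fractional reduction in RMSE relative to the benchmark."""
    return (benchmark_rmse - new_rmse) / benchmark_rmse

improvement = relative_improvement(1000, 995)
print(f"RMSE reduced by {improvement:.1%}")  # RMSE reduced by 0.5%
```

A 0.5% reduction on the holdout set is small relative to the cost and risk of replacing a benchmark model.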

Question 9

A movie production company would like to find the actors appearing in its top movies using data from the tables below. The resulting data must show all movies in Table 1, enriched with actors listed in Table 2.

Which of the following query operations achieves the desired data set?

Options:

A.

Perform an INNER JOIN between Table 1 using column Movie, and Table 2 using column Acted_In.

B.

Perform a UNION between Table 1 using column Movie, and Table 2 using column Acted_In.

C.

Perform an INTERSECT between Table 1 using column Movie, and Table 2 using column Acted_In.

D.

Perform a LEFT JOIN on Table 1 using column Movie, with Table 2 using column Acted_In.
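The original tables are not reproduced here, so the sketch below invents two small hypothetical tables (`top_movies` with a `Movie` column, `actors` with `Actor` and `Acted_In` columns) in an in-memory SQLite database to contrast a LEFT JOIN with an INNER JOIN.

```python
import sqlite3

# Hypothetical stand-ins for Table 1 and Table 2
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE top_movies (Movie TEXT);
CREATE TABLE actors (Actor TEXT, Acted_In TEXT);
INSERT INTO top_movies VALUES ('Alpha'), ('Beta'), ('Gamma');
INSERT INTO actors VALUES ('Ann', 'Alpha'), ('Bob', 'Alpha'), ('Cy', 'Beta');
""")

# LEFT JOIN keeps every movie; movies without actors get NULL (None)
left = conn.execute("""
    SELECT m.Movie, a.Actor
    FROM top_movies AS m
    LEFT JOIN actors AS a ON m.Movie = a.Acted_In
    ORDER BY m.Movie, a.Actor
""").fetchall()

# INNER JOIN drops movies that have no matching actor rows
inner = conn.execute("""
    SELECT m.Movie, a.Actor
    FROM top_movies AS m
    INNER JOIN actors AS a ON m.Movie = a.Acted_In
    ORDER BY m.Movie, a.Actor
""").fetchall()

print(left)   # [('Alpha', 'Ann'), ('Alpha', 'Bob'), ('Beta', 'Cy'), ('Gamma', None)]
print(inner)  # [('Alpha', 'Ann'), ('Alpha', 'Bob'), ('Beta', 'Cy')]
```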

Question 10

A data scientist is working with a data set that covers a two-year period for a large number of machines. The data set contains:

    Machine system ID numbers

    Sensor measurement values

    Daily timestamps for each machine

The data scientist needs to plot the total measurements from all the machines over the entire time period. Which of the following is the best way to present this data?

Options:

A.

Scatter plot

B.

Line plot

C.

Histogram

D.

Box-and-whisker plot
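Before plotting, the per-machine rows have to be collapsed into one total per day, giving an ordered time series that a line plot can draw. A minimal sketch of that aggregation step; the rows and the helper name `daily_totals` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (machine_id, date, measurement); the real data set is not shown
records = [
    ("M1", "2024-01-01", 5.0),
    ("M2", "2024-01-01", 3.0),
    ("M1", "2024-01-02", 4.0),
    ("M2", "2024-01-02", 6.0),
]

def daily_totals(rows):
    """Sum measurements across all machines for each day, in date order."""
    totals = defaultdict(float)
    for _machine_id, day, value in rows:
        totals[day] += value
    return sorted(totals.items())  # ISO-format dates sort chronologically as strings

series = daily_totals(records)
print(series)  # [('2024-01-01', 8.0), ('2024-01-02', 10.0)]
```

The dates become the x-axis and the totals the y-axis, so the trend over the two-year period reads left to right.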

Question 11

A data scientist is using the following confusion matrix to assess model performance:

                            Actually Fails    Actually Succeeds
    Predicted to Fail       80%               20%
    Predicted to Succeed    15%               85%

The model is predicting whether a delivery truck will be able to make 200 scheduled delivery stops.

Every time the model is correct, the company saves 1 hour in planning and scheduling.

Every time the model is wrong, the company loses 4 hours of delivery time.

Which of the following is the net model impact for the company?

Options:

A.

25 hours lost

B.

25 hours saved

C.

165 hours lost

D.

165 hours saved
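The net impact is a weighted sum over the confusion-matrix cells. The sketch below assumes each cell percentage can be read as a count out of the 200 scheduled stops (the four cells sum to 200); that reading is an assumption, not something the question states.

```python
# Assumption: each cell percentage is read as a count out of the 200 stops
# (80 + 20 + 15 + 85 = 200). Keys are (predicted, actual).
confusion = {
    ("fail", "fail"): 80,
    ("fail", "succeed"): 20,
    ("succeed", "fail"): 15,
    ("succeed", "succeed"): 85,
}
HOURS_SAVED_IF_CORRECT = 1
HOURS_LOST_IF_WRONG = 4

correct = sum(n for (pred, actual), n in confusion.items() if pred == actual)
wrong = sum(n for (pred, actual), n in confusion.items() if pred != actual)
net_hours = correct * HOURS_SAVED_IF_CORRECT - wrong * HOURS_LOST_IF_WRONG
print(correct, wrong, net_hours)  # 165 35 25
```

A positive `net_hours` means time saved: 165 hours gained from correct predictions minus 140 hours lost to wrong ones.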

Question 12

Which of the following techniques enables automation and iteration of code releases?

Options:

A.

Virtualization

B.

Markdown

C.

Code isolation

D.

CI/CD

Question 13

A data scientist built several models that perform about the same but vary in the number of features. Which of the following models should the data scientist recommend for production according to Occam's razor?

Options:

A.

The model with the fewest features and highest performance

B.

The model with the fewest features and the lowest performance

C.

The model with the most features and the lowest performance

D.

The model with the most features and the highest performance

Question 14

A data analyst wants to combine tables from a database so that the result contains the most data. Which of the following is the best way to accomplish this objective?

Options:

A.

INNER JOIN

B.

LEFT OUTER JOIN

C.

RIGHT OUTER JOIN

D.

FULL OUTER JOIN
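The join types differ in which unmatched rows they keep, which determines how many rows come back. A minimal pure-Python sketch on two hypothetical key/value tables; the `join` helper is illustrative and mimics SQL semantics, with `None` standing in for NULL.

```python
def join(left, right, how):
    """Join two lists of (key, value) rows on key; None fills the unmatched side."""
    out = []
    matched_right = set()
    for lk, lv in left:
        matches = [(i, rv) for i, (rk, rv) in enumerate(right) if rk == lk]
        for i, rv in matches:
            out.append((lk, lv, rv))
            matched_right.add(i)
        if not matches and how in ("left", "full"):
            out.append((lk, lv, None))  # keep unmatched left row
    if how in ("right", "full"):
        for i, (rk, rv) in enumerate(right):
            if i not in matched_right:
                out.append((rk, None, rv))  # keep unmatched right row
    return out

# Hypothetical tables with partially overlapping keys
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]

counts = {how: len(join(left, right, how)) for how in ("inner", "left", "right", "full")}
print(counts)  # {'inner': 2, 'left': 3, 'right': 3, 'full': 4}
```

Only the full outer join keeps unmatched rows from both sides, so it returns the largest result.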

Question 15

A data scientist has constructed a model that meets the minimum performance requirements specified in the proposal for a prediction project. The data scientist thinks the model's accuracy should be improved, but the proposed deadline is approaching. Which of the following actions should the data scientist take first?

Options:

A.

Continue collecting data.

B.

Request additional funding.

C.

Consult the key project stakeholder.

D.

Test additional model specifications.

Question 16

Given matrix A:

Which of the following is Aᵀ (the transpose of A)?

Options:

A.

B.

C.

D.

Question 17

Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?

Options:

A.

Binomial

B.

Exponential

C.

Normal

D.

Poisson

Question 18

A data scientist is analyzing a data set with categorical features and would like to make those features more useful when building a model. Which of the following data transformation techniques should the data scientist use? (Choose two.)

Options:

A.

Normalization

B.

One-hot encoding

C.

Linearization

D.

Label encoding

E.

Scaling

F.

Pivoting
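Both common categorical transformations can be sketched in a few lines of pure Python. The helper names and the `colors` column are hypothetical; categories are sorted alphabetically here so the codes are deterministic.

```python
def label_encode(values):
    """Map each category to an integer code (implies an order between categories)."""
    categories = sorted(set(values))
    mapping = {c: i for i, c in enumerate(categories)}
    return [mapping[v] for v in values], categories

def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector with one column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories

colors = ["red", "green", "blue", "green"]  # hypothetical categorical feature
codes, cats = label_encode(colors)
vectors, _ = one_hot_encode(colors)
print(cats)     # ['blue', 'green', 'red']
print(codes)    # [2, 1, 0, 1]
print(vectors)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Label encoding is compact but imposes an artificial ordering, so it suits ordinal categories or tree-based models; one-hot encoding avoids the ordering at the cost of one column per category.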

Question 19

Which of the following layer sets includes the minimum three layers required to constitute an artificial neural network?

Options:

A.

An input layer, a pooling layer, and an output layer

B.

An input layer, a convolutional layer, and a hidden layer

C.

An input layer, a hidden layer, and an output layer

D.

An input layer, a dropout layer, and a hidden layer

Question 20

A data analyst is examining the correlation matrix of a new data set to identify issues that could adversely impact model performance. Which of the following is the analyst most likely checking for?

Options:

A.

Undersampling

B.

Multicollinearity

C.

Oversampling

D.

Overfitting
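Scanning a correlation matrix for multicollinearity amounts to flagging feature pairs whose correlation magnitude is close to 1. A minimal sketch with a hand-written Pearson function; the feature values and the 0.9 cutoff are assumptions for illustration, not a standard threshold.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical features; x2 is roughly 2 * x1, so the two carry overlapping information
features = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x3": [5, 3, 4, 1, 2],
}
THRESHOLD = 0.9  # assumed cutoff; practitioners choose their own

flagged = [
    (a, b)
    for a, b in combinations(features, 2)
    if abs(pearson(features[a], features[b])) > THRESHOLD
]
print(flagged)  # [('x1', 'x2')]
```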

Question 21

A computer vision model is trained to identify cats on a training set that is composed of both cat and dog images. The model predicts a picture of a cat is a dog. Which of the following describes this error?

Options:

A.

Error due to reality

B.

False positive error

C.

Sampling error

D.

Type II error
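Naming the error depends on which class is treated as positive. The sketch below assumes "cat" is the positive class, since the model is trained to identify cats; the `error_type` helper is hypothetical.

```python
def error_type(actual, predicted, positive="cat"):
    """Name the confusion-matrix cell for one prediction, given the positive class."""
    if predicted == positive:
        return "true positive" if actual == positive else "false positive (Type I)"
    return "false negative (Type II)" if actual == positive else "true negative"

print(error_type(actual="cat", predicted="dog"))  # false negative (Type II)
```

Missing a real cat is a false negative, which is a Type II error; calling a dog a cat would be the Type I case.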

Question 22

A data scientist is presenting the recommendations from a months-long modeling and experimentation process to the company’s Chief Executive Officer. Which of the following is the best set of artifacts to include in the presentation?

Options:

A.

Methods, data overview, results, recommendations, and charts

B.

Results, recommendations, justifications, and clear charts

C.

Recommendation, charts, justifications, code reviews, and results

D.

Methodology, code snippets, findings, data tables, and p-values

Question 23

A data analyst wants to save a newly analyzed data set to a local storage option. The data set must meet the following requirements:

    Be minimal in size

    Have the ability to be ingested quickly

    Have the associated schema, including data types, stored with it

Which of the following file types is the best to use?

Options:

A.

JSON

B.

Parquet

C.

XML

D.

CSV

Question 24

A data scientist wants to predict a person's travel destination. The options are:

    Branson, Missouri, United States

    Mount Kilimanjaro, Tanzania

    Disneyland Paris, Paris, France

    Sydney Opera House, Sydney, Australia

Which of the following models would best fit this use case?

Options:

A.

Linear discriminant analysis

B.

k-means modeling

C.

Latent semantic analysis

D.

Principal component analysis

Question 25

The term "greedy algorithms" refers to machine-learning algorithms that:

Options:

A.

update priors as more data is seen.

B.

examine every node of a tree before making a decision.

C.

apply a theoretical model to the distribution of the data.

D.

make the locally optimal decision.
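Coin change is a classic illustration of a greedy choice: always take the largest coin that fits. The denominations below are hypothetical, picked so that the locally optimal choice is not globally optimal; a dynamic-programming version is included for comparison.

```python
import math

def greedy_change(amount, coins):
    """Greedy: always take the largest coin that still fits (locally optimal)."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    return result

def min_coins(amount, coins):
    """Dynamic programming: the globally optimal number of coins."""
    best = [0] + [math.inf] * amount
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a:
                best[a] = min(best[a], best[a - coin] + 1)
    return best[amount]

coins = [1, 3, 4]  # hypothetical denominations chosen to expose the greedy trap
print(greedy_change(6, coins))  # [4, 1, 1]
print(min_coins(6, coins))      # 2
```

The greedy routine grabs the 4 first and ends with three coins, while 3 + 3 uses only two. Greedy decision-tree splitting makes the same kind of trade: the best split at each node, with no look-ahead.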

Exam Code: DY0-001
Exam Name: CompTIA DataX Exam
Last Update: Jun 15, 2025
Questions: 85