Spring Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: clap70

Databricks-Certified-Data-Engineer-Associate Databricks Certified Data Engineer Associate Exam Questions and Answers

Questions 4

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

The data engineer runs the following query to join these tables together:

Which of the following will be returned by the above query?

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Buy Now
Questions 5

A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.

Which of the following approaches can the data engineer take to identify the table that is dropping the records?

Options:

A.

They can set up separate expectations for each table when developing their DLT pipeline.

B.

They cannot determine which table is dropping the records.

C.

They can set up DLT to notify them via email when records are dropped.

D.

They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.

E.

They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.

Buy Now
Questions 6

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:

A.

A key-value pair configuration

B.

The preferred DBU/hour cost

C.

A path to cloud storage location for the written data

D.

A location of a target database for the written data

E.

At least one notebook library to be executed

Buy Now
Questions 7

A data engineer is developing an ETL process based on Spark SQL. The execution fails. The data engineer checks the Spark Ul and can see the ERRORS as follows:

Which two corrective actions should the data engineer perform to resolve this issue?

Choose 2 answers - (Q) Narrow the filters in order to collect less data in the query

Options:

A.

Upsize the worker nodes and activate autoshuffle partitions

B.

Upsize the driver node and deactivate autoshuffle partitions

C.

Cache the dataset in order to boost the query performance

D.

Fix the shuffle partitions to 50 to ensure the allocation

Buy Now
Questions 8

Identify how the count_if function and the count where x is null can be used

Consider a table random_values with below data.

What would be the output of below query?

select count_if(col > 1) as count_a. count(*) as count_b.count(col1) as count_c from random_values col1

0

1

2

NULL -

2

3

Options:

A.

3 6 5

B.

4 6 5

C.

3 6 6

D.

4 6 6

Buy Now
Questions 9

A data engineer needs to optimize the data layout and query performance for an e-commerce transactions Delta table. The table is partitioned by "purchase_date" a date column which helps with time-based queries but does not optimize searches on user statistics "customer_id", a high-cardinality column.

The table is usually queried with filters on "customer_i

d" within specific date ranges, but since this data is spread across multiple files in each partition, it results in full partition scans and increased runtime and costs.

How should the data engineer optimize the Data Layout for efficient reads?

Options:

A.

Alter table implementing liquid clustering on "customerid" while keeping the existing partitioning.

B.

Alter the table to partition by "customer_id".

C.

Enable delta caching on the cluster so that frequent reads are cached for performance.

D.

Alter the table implementing liquid clustering by "customer_id" and "purchase_date".

Buy Now
Questions 10

Which type of workloads are compatible with Auto Loader?

Options:

A.

Streaming workloads

B.

Machine learning workloads

C.

Serverless workloads

D.

Batch workloads

Buy Now
Questions 11

What is the primary function of the Silver layer in the Databricks medallion architecture?

Options:

A.

lngest raw data in its original state

B.

Validate, clean, and deduplicate data for further processing

C.

Aggregate and enrich data for business analytics

D.

Store historical data solely for auditing purposes

Buy Now
Questions 12

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Options:

A.

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

B.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

C.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

D.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

E.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.

Buy Now
Questions 13

In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?

Options:

A.

When the location of the data needs to be changed

B.

When the target table is an external table

C.

When the source table can be deleted

D.

When the target table cannot contain duplicate records

E.

When the source is not a Delta table

Buy Now
Questions 14

A data engineering team is using Kafka to capture event data and then ingest it into Databricks. The team wants to be able to see these historical events. Medallion architecture is already in place. The team wants to be mindful of costs.

Where should this historical event data be stored?

Options:

A.

Gold

B.

Silver

C.

Bronze

D.

Raw layer

Buy Now
Questions 15

Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?

Options:

A.

When they are working interactively with a small amount of data

B.

When they are running automated reports to be refreshed as quickly as possible

C.

When they are working with SQL within Databricks SQL

D.

When they are concerned about the ability to automatically scale with larger data

E.

When they are manually running reports with a large amount of data

Buy Now
Questions 16

A Python file is ready to go into production and the client wants to use the cheapest but most efficient type of cluster possible. The workload is quite small, only processing 10GBs of data with only simple joins and no complex aggregations or wide transformations.

Which cluster meets the requirement?

Options:

A.

Job cluster with Photon enabled

B.

Interactive cluster

C.

Job cluster with spot instances disabled

D.

Job cluster with spot instances enabled

Buy Now
Questions 17

A new data engineering team team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

GRANT ALL PRIVILEGES ON TABLE sales TO team;

B.

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

C.

GRANT SELECT ON TABLE sales TO team;

D.

GRANT USAGE ON TABLE sales TO team;

E.

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Buy Now
Questions 18

Which query is performing a streaming hop from raw data to a Bronze table?

A)

B)

C)

D)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

Buy Now
Questions 19

A departing platform owner currently holds ownership of multiple catalogs and controls storage credentials and external locations. The data engineer wants to ensure continuity: transfer catalog ownership to the platform team group, delegate ongoing privilege management, and retain the ability to receive and share data via Delta Sharing.

Which role must be in place to perform these actions across the metastore?

Options:

A.

Account Admin, because account admins can only create metastores but cannot change ownership of catalogs.

B.

Workspace Admin, because workspace admins can transfer ownership of any Unity Catalog object.

C.

Metastore Admin, because metastore admins can transfer ownership and manage privileges across all metastore objects, including shares and recipients.

D.

Catalog Owner, because catalog owners can transfer any object in any catalog in the metastore.

Buy Now
Questions 20

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which of the following approaches can be used to identify the owner of new_table?

Options:

A.

Review the Permissions tab in the table's page in Data Explorer

B.

All of these options can be used to identify the owner of the table

C.

Review the Owner field in the table's page in Data Explorer

D.

Review the Owner field in the table's page in the cloud storage solution

E.

There is no way to identify the owner of the table

Buy Now
Questions 21

A new data engineering team team has been assigned to an ELT project. The new data engineering team will need full privileges on the table sales to fully manage the project.

Which command can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

grant all privileges on table sales TO team;

B.

GRANT SELECT ON TABLE sales TO team;

C.

GRANT SELECT CREATE MODIFY ON TABLE sales TO team;

D.

GRANT ALL PRIVILEGES ON TABLE team TO sales;

Buy Now
Questions 22

Which of the following benefits is provided by the array functions from Spark SQL?

Options:

A.

An ability to work with data in a variety of types at once

B.

An ability to work with data within certain partitions and windows

C.

An ability to work with time-related data in specified intervals

D.

An ability to work with complex, nested data ingested from JSON files

E.

An ability to work with an array of tables for procedural automation

Buy Now
Questions 23

A data engineer needs to combine sales data from an on-premises PostgreSQL database with customer data in Azure Synapse for a comprehensive report. The goal is to avoid data duplication and ensure up-to-date information

How should the data engineer achieve this using Databricks?

Options:

A.

Develop custom ETL pipelines to ingest data into Databricks

B.

Use Lakehouse Federation to query both data sources directly

C.

Manually synchronize data from both sources into a single database

D.

Export data from both sources to CSV files and upload them to Databricks

Buy Now
Questions 24

A new data engineering team has been assigned to work on a project. The team will need access to database customers in order to see what tables already exist. The team has its own group team.

Which of the following commands can be used to grant the necessary permission on the entire database to the new team?

Options:

A.

GRANT VIEW ON CATALOG customers TO team;

B.

GRANT CREATE ON DATABASE customers TO team;

C.

GRANT USAGE ON CATALOG team TO customers;

D.

GRANT CREATE ON DATABASE team TO customers;

E.

GRANT USAGE ON DATABASE customers TO team;

Buy Now
Questions 25

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to an ELT job. The ELT job has its Databricks SQL query that returns the number of input records containing unexpected NULL values. The data engineer wants their entire team to be notified via a messaging webhook whenever this value reaches 100.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of NULL values reaches 100?

Options:

A.

They can set up an Alert with a custom template.

B.

They can set up an Alert with a new email alert destination.

C.

They can set up an Alert with a new webhook alert destination.

D.

They can set up an Alert with one-time notifications.

E.

They can set up an Alert without notifications.

Buy Now
Questions 26

Which method should a Data Engineer apply to ensure Workflows are being triggered on schedule?

Options:

A.

Scheduled Workflows require an always-running cluster, which is more expensive but reduces processing latency.

B.

Scheduled Workflows process data as it arrives at configured sources.

C.

Scheduled Workflows can reduce resource consumption and expense since the cluster runs only long enough to execute the pipeline.

D.

Scheduled Workflows run continuously until manually stopped.

Buy Now
Questions 27

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The table is configured to run in Production mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:

A.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

B.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

C.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will be deployed for the update and terminated when the pipeline is stopped.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

E.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

Buy Now
Questions 28

A Databricks workflow fails at the last stage due to an error in a notebook. This workflow runs daily. The data engineer fixes the mistake and wants to rerun the pipeline. This workflow is very costly and time-intensive to run.

Which action should the data engineer do in order to minimise downtime and cost?

Options:

A.

Switch to another cluster

B.

Repair run

C.

Re-run the entire workflow

D.

Restart the cluster

Buy Now
Questions 29

Which SQL keyword can be used to convert a table from a long format to a wide format?

Options:

A.

TRANSFORM

B.

PIVOT

C.

SUM

D.

CONVERT

Buy Now
Questions 30

A data engineer is setting up access control in Unity Catalog and needs to ensure that a group of data analysts can query tables but not modify data.

Which permission should the data engineer grant to the data analysts?

Options:

A.

SELECT

B.

INSERT

C.

MODIFY

D.

ALL PRIVILEGES

Buy Now
Questions 31

A data engineer that is new to using Python needs to create a Python function to add two integers together and return the sum?

Which of the following code blocks can the data engineer use to complete this task?

A)

B)

C)

D)

E)

Options:

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Buy Now
Questions 32

Calculate the total sales amount for each region and store the results in a new dataframe called region_sales.

Given the expected result:

Which code will generate the expected result?

Options:

A.

region_sales = sales_df.groupBy("region").agg(sum("sales_amountM).alias("total_sales_amount"))

B.

region_sales = sales_df. sum ("salen_aiTiount") . groupBy ("region") .alias ("total_sale3_amount")

C.

region_sales= sales_df.groupBy("category").sum(nsales_amount").alias("t_otal_sales_amounl")

D.

region sales - sales_df.agg(sum("sales_amount").groupBy("region").alias("total sales amount"))

Buy Now
Questions 33

A data engineer needs access to a table new_uable, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which approach can be used to identify the owner of new_table?

Options:

A.

There is no way to identify the owner of the table

B.

Review the Owner field in the table's page in the cloud storage solution

C.

Review the Permissions tab in the table's page in Data Explorer

D.

Review the Owner field in the table’s page in Data Explorer

Buy Now
Questions 34

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

Options:

A.

dbfs:/user/hive/database/customer360

B.

dbfs:/user/hive/warehouse

C.

dbfs:/user/hive/customer360

D.

More information is needed to determine the correct response

Buy Now
Questions 35

A data engineer wants to delegate day-to-day permission management for the schema main.marketing to the mkt-admins group, without making them workspace admins. They should be able to grant and revoke privileges for other users on objects within that schema.

Which approach aligns with Unity Catalog’s ownership and privilege model?

Options:

A.

Transfer ownership of the schema main.marketing to mkt-admins; owners can manage privileges on the schema and its contained objects.

B.

Grant MANAGE permissions on the metastore to mkt-admins, which allows managing privileges for all schemas and tables globally.

C.

Grant USE SCHEMA on main.marketing, and MODIFY on all tables to mkt-admins, which enables the management of grants within the schema.

D.

Make mkt-admins a workspace-level admins group, then assign SELECT on main.marketing to allow privilege delegation.

Buy Now
Questions 36

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?

Options:

A.

SELECT * FROM sales

B.

There is no way to share data between PySpark and SQL.

C.

spark.sql("sales")

D.

spark.delta.table("sales")

E.

spark.table("sales")

Buy Now
Questions 37

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.

None of these changes will need to be made

B.

The pipeline will need to stop using the medallion-based multi-hop architecture

C.

The pipeline will need to be written entirely in SQL

D.

The pipeline will need to use a batch source in place of a streaming source

E.

The pipeline will need to be written entirely in Python

Buy Now
Questions 38

Which TWO items are characteristics of the Gold Layer?

Choose 2 answers

Options:

A.

Read-optimized

B.

Normalised

C.

Raw Data

D.

Historical lineage

E.

De-normalised

Buy Now
Questions 39

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

Options:

A.

A data lakehouse provides storage solutions for structured and unstructured data.

B.

A data lakehouse supports ACID-compliant transactions.

C.

A data lakehouse allows the use of SQL queries to examine data.

D.

A data lakehouse stores data in open formats.

E.

A data lakehouse enables machine learning and artificial Intelligence workloads.

Buy Now
Questions 40

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Options:

A.

The pipeline can have different notebook sources in SQL & Python.

B.

The pipeline will need to be written entirely in SQL.

C.

The pipeline will need to be written entirely in Python.

D.

The pipeline will need to use a batch source in place of a streaming source.

Buy Now
Questions 41

A new data engineering team team. has been assigned to an ELT project. The new data engineering team will need full privileges on the database customers to fully manage the project.

Which of the following commands can be used to grant full permissions on the database to the new data engineering team?

Options:

A.

GRANT USAGE ON DATABASE customers TO team;

B.

GRANT ALL PRIVILEGES ON DATABASE team TO customers;

C.

GRANT SELECT PRIVILEGES ON DATABASE customers TO teams;

D.

GRANT SELECT CREATE MODIFY USAGE PRIVILEGES ON DATABASE customers TO team;

E.

GRANT ALL PRIVILEGES ON DATABASE customers TO team;

Buy Now
Questions 42

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:

A.

They can turn on the Auto Stop feature for the SQL endpoint.

B.

They can ensure the dashboard's SQL endpoint is not one of the included query's SQL endpoint.

C.

They can reduce the cluster size of the SQL endpoint.

D.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

E.

They can set up the dashboard's SQL endpoint to be serverless.

Buy Now
Questions 43

A team creates YAML manifests that declare jobs, resources, and dependencies, then deploys them to Databricks using the Databricks CLI. The deployment succeeds.

Which feature are they using?

Options:

A.

Databricks Asset Bundles

B.

GitHub

C.

Terraform

D.

DataOps

Buy Now
Questions 44

A data engineer at a company that uses Databricks with Unity Catalog needs to share a collection of tables with an external partner who also uses a Databricks workspace enabled for Unity Catalog. The data engineer decides to use Delta Sharing to accomplish this.

What is the first piece of information the data engineer should request from the external partner to set up Delta Sharing?

Options:

A.

Their Databricks account password

B.

The name of their Databricks cluster

C.

The IP address of their Databricks workspace

D.

The sharing identifier of their Unity Catalog metastore

Buy Now
Questions 45

Which of the following data workloads will utilize a Gold table as its source?

Options:

A.

A job that enriches data by parsing its timestamps into a human-readable format

B.

A job that aggregates uncleaned data to create standard summary statistics

C.

A job that cleans data by removing malformatted records

D.

A job that queries aggregated data designed to feed into a dashboard

E.

A job that ingests raw data from a streaming source into the Lakehouse

Buy Now
Questions 46

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

Options:

A.

They can set up an Alert with a custom template.

B.

They can set up an Alert with a new email alert destination.

C.

They can set up an Alert with one-time notifications.

D.

They can set up an Alert with a new webhook alert destination.

E.

They can set up an Alert without notifications.

Buy Now
Questions 47

An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually rerunning the query each day and waiting for the results.

Which of the following approaches can the manager use to ensure the results of the query are updated each day?

Options:

A.

They can schedule the query to refresh every 1 day from the SQL endpoint's page in Databricks SQL.

B.

They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL.

C.

They can schedule the query to refresh every 1 day from the query's page in Databricks SQL.

D.

They can schedule the query to run every 1 day from the Jobs UI.

E.

They can schedule the query to run every 12 hours from the Jobs UI.

Buy Now
Exam Name: Databricks Certified Data Engineer Associate Exam
Last Update: Mar 5, 2026
Questions: 159
Databricks-Certified-Data-Engineer-Associate pdf

Databricks-Certified-Data-Engineer-Associate PDF

$25.5  $84.99
Databricks-Certified-Data-Engineer-Associate Engine

Databricks-Certified-Data-Engineer-Associate Testing Engine

$30  $99.99
Databricks-Certified-Data-Engineer-Associate PDF + Engine

Databricks-Certified-Data-Engineer-Associate PDF + Testing Engine

$40.5  $134.99