DP-203 Dump Free


DP-203 Dump Free – 50 Practice Questions to Sharpen Your Exam Readiness.

Looking for a reliable way to prepare for your DP-203 certification? Our DP-203 Dump Free includes 50 exam-style practice questions designed to reflect real test scenarios—helping you study smarter and pass with confidence.

Using a DP-203 dump free set of questions can give you an edge in your exam prep by helping you:

  • Understand the format and types of questions you’ll face
  • Pinpoint weak areas and focus your study efforts
  • Boost your confidence with realistic question practice

Below, you will find 50 free questions from our DP-203 Dump Free collection. These cover key topics and are structured to simulate the difficulty level of the real exam, making them a valuable tool for review or final prep.

Question 1

You are designing a folder structure for the files in an Azure Data Lake Storage Gen2 account. The account has one container that contains three years of data.
You need to recommend a folder structure that meets the following requirements:
✑ Supports partition elimination for queries by Azure Synapse Analytics serverless SQL pools
✑ Supports fast data retrieval for data from the current month
✑ Simplifies data security management by department
Which folder structure should you recommend?

A. Department/DataSource/YYYY/MM/DataFile_YYYYMMDD.parquet

B. DataSource/Department/YYYY/MM/DataFile_YYYYMMDD.parquet

C. DD/MM/YYYY/Department/DataSource/DataFile_DDMMYY.parquet

D. YYYY/MM/DD/Department/DataSource/DataFile_YYYYMMDD.parquet

 


Suggested Answer: A

Department top level in the hierarchy to simplify security management.
Month (MM) at the leaf/bottom level to support fast data retrieval for data from the current month.

Question 2

You are designing an Azure Databricks cluster that runs user-defined local processes.
You need to recommend a cluster configuration that meets the following requirements:
✑ Minimize query latency.
✑ Maximize the number of users that can run queries on the cluster at the same time.
✑ Reduce overall costs without compromising other requirements.
Which cluster type should you recommend?

A. Standard with Auto Termination

B. High Concurrency with Autoscaling

C. High Concurrency with Auto Termination

D. Standard with Autoscaling

 


Suggested Answer: B

A High Concurrency cluster is a managed cloud resource. The key benefits of High Concurrency clusters are that they provide fine-grained sharing for maximum resource utilization and minimum query latencies.
Databricks chooses the appropriate number of workers required to run your job. This is referred to as autoscaling. Autoscaling makes it easier to achieve high cluster utilization, because you don’t need to provision the cluster to match a workload.
Incorrect Answers:
C: The cluster configuration includes an auto terminate setting whose default value depends on cluster mode:
Standard and Single Node clusters terminate automatically after 120 minutes by default.
High Concurrency clusters do not terminate automatically by default.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/clusters/configure

Question 3

You have an Azure Stream Analytics query. The query returns a result set that contains 10,000 distinct values for a column named clusterID.
You monitor the Stream Analytics job and discover high latency.
You need to reduce the latency.
Which two actions should you perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Add a pass-through query.

B. Increase the number of streaming units.

C. Add a temporal analytic function.

D. Scale out the query by using PARTITION BY.

E. Convert the query to a reference query.

 


Suggested Answer: BD

Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption
https://docs.microsoft.com/en-us/azure/stream-analytics/repartition
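
As a rough illustration of option D, the sketch below repartitions a Stream Analytics query on the input's PartitionId so each Event Hubs partition is processed in parallel. The input alias, output alias, and window are placeholders, not part of the original question.

-- Hypothetical input/output names; PARTITION BY lets each Event Hubs partition
-- be processed independently, which reduces latency when paired with enough
-- streaming units (option B).
SELECT
    clusterID,
    PartitionId,
    COUNT(*) AS EventCount
INTO [blob-output]
FROM [eventhub-input] PARTITION BY PartitionId
GROUP BY clusterID, PartitionId, TumblingWindow(minute, 1)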

Question 4

HOTSPOT
-
You have an Azure Synapse Analytics workspace that contains three pipelines and three triggers named Trigger1, Trigger2, and Trigger3.
Trigger3 has the following definition.
 Image
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
 Image

 


Suggested Answer:
Correct Answer Image

 

Question 5

You are designing a solution that will use tables in Delta Lake on Azure Databricks.
You need to minimize how long it takes to perform the following:
•	Queries against non-partitioned tables
•	Joins on non-partitioned columns
Which two options should you include in the solution? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. the clone command

B. Z-Ordering

C. Apache Spark caching

D. dynamic file pruning (DFP)

 


Suggested Answer: BD

 
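For context, a minimal Databricks SQL sketch of option B is shown below; the table and column names are assumptions. Dynamic file pruning (option D) is an optimizer feature that is enabled by default on Databricks and needs no DDL of its own.

-- Hypothetical Delta table and columns; ZORDER BY co-locates related values in
-- the same files so data skipping (and dynamic file pruning during joins) can
-- skip more files even though the table is not partitioned.
OPTIMIZE sales_orders
ZORDER BY (customer_id, product_id);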

Question 6

You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table.
Which output mode should you use?

A. update

B. complete

C. append

 


Suggested Answer: C

Append Mode: Only new rows appended in the result table since the last trigger are written to external storage. This is applicable only for the queries where existing rows in the Result Table are not expected to change.
Incorrect Answers:
B: Complete Mode: The entire updated result table is written to external storage. It is up to the storage connector to decide how to handle the writing of the entire table.
A: Update Mode: Only the rows that were updated in the result table since the last trigger are written to external storage. This is different from Complete Mode in that Update Mode outputs only the rows that have changed since the last trigger. If the query doesn’t contain aggregations, it is equivalent to Append mode.
Reference:
https://docs.databricks.com/getting-started/spark/streaming.html

Question 7

You have an Azure Data Factory pipeline named pipeline1 that contains a data flow activity named activity1.
You need to run pipeline1.
Which runtime will be used to run activity1?

A. Azure Integration runtime

B. Self-hosted integration runtime

C. SSIS integration runtime

 


Suggested Answer: A

 

Question 8

You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses PolyBase to load data from comma-separated value (CSV) files stored in Azure Data Lake Storage Gen2 using an external table.
Files with an invalid schema cause errors to occur.
You need to monitor for an invalid schema error.
For which error should you monitor?

A. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.’

B. Cannot execute the query “Remote Query” against OLE DB provider “SQLNCLI11” for linked server “(null)”. Query aborted- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.

C. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass] occurred while accessing external file.’

D. EXTERNAL TABLE access failed due to internal error: ‘Java exception raised on call to HdfsBridge_Connect: Error [No FileSystem for scheme: wasbs] occurred while accessing external file.’

 


Suggested Answer: B

Error message: Cannot execute the query “Remote Query”
Possible Reason:
This error occurs because the files have different schemas. The PolyBase external table DDL, when pointed to a directory, recursively reads all the files in that directory. When a column or data type mismatch happens, this error can be seen in SSMS.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-errors-and-possible-solutions
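
To see why option B is the schema-mismatch symptom, here is a hedged sketch of a PolyBase external table with a reject threshold of zero rows; the object names are placeholders. With REJECT_VALUE = 0, the first row that fails to parse against the declared columns aborts the query with the "maximum reject threshold (0 rows) was reached" message.

-- Hypothetical names for the table, data source, and file format objects.
CREATE EXTERNAL TABLE dbo.SalesStaged
(
    SaleId     INT,
    SaleAmount DECIMAL(18, 2),
    SaleDate   DATE
)
WITH
(
    LOCATION     = '/sales/',          -- folder of CSV files in ADLS Gen2
    DATA_SOURCE  = MyAdlsDataSource,   -- assumed external data source
    FILE_FORMAT  = CsvFileFormat,      -- assumed DELIMITEDTEXT file format
    REJECT_TYPE  = VALUE,
    REJECT_VALUE = 0                   -- any malformed row raises the error in option B
);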

Question 9

HOTSPOT
-
You have an Azure subscription that contains the Azure Synapse Analytics workspaces shown in the following table.
 Image
Each workspace must read and write data to datalake1.
Each workspace contains an unused Apache Spark pool.
You plan to configure each Spark pool to share catalog objects that reference datalake1.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
 Image

 


Suggested Answer:
Correct Answer Image

 

Question 10

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads:
✑ A workload for data engineers who will use Python and SQL.
✑ A workload for jobs that will run notebooks that use Python, Scala, and SQL.
✑ A workload that data scientists will use to perform ad hoc analysis in Scala and R.
The enterprise architecture team at your company identifies the following standards for Databricks environments:
✑ The data engineers must share a cluster.
✑ The job cluster will be managed by using a request process whereby data scientists and data engineers provide packaged notebooks for deployment to the cluster.
✑ All the data scientists must be assigned their own cluster that terminates automatically after 120 minutes of inactivity. Currently, there are three data scientists.
You need to create the Databricks clusters for the workloads.
Solution: You create a Standard cluster for each data scientist, a High Concurrency cluster for the data engineers, and a High Concurrency cluster for the jobs.
Does this meet the goal?

A. Yes

B. No

 


Suggested Answer: A

We need a High Concurrency cluster for the data engineers and the jobs.
Note: Standard clusters are recommended for a single user. Standard can run workloads developed in any language: Python, R, Scala, and SQL.
A high concurrency cluster is a managed cloud resource. The key benefits of high concurrency clusters are that they provide Apache Spark-native fine-grained sharing for maximum resource utilization and minimum query latencies.
Reference:
https://docs.azuredatabricks.net/clusters/configure.html

Question 11

You have an Azure Stream Analytics job named Job1.
The metrics of Job1 from the last hour are shown in the following table.
 Image
The late arrival tolerance for Job1 is set to five seconds.
You need to optimize Job1.
Which two actions achieve the goal? Each correct answer presents a complete solution.
NOTE: Each correct answer is worth one point.

A. Increase the number of SUs.

B. Parallelize the query.

C. Resolve errors in output processing.

D. Resolve errors in input processing.

 


Suggested Answer: AB

 

Question 12

You have a Log Analytics workspace named la1 and an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 sends logs to la1.
You need to identify whether a recently executed query on Pool1 used the result set cache.
What are two ways to achieve the goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Review the sys.dm_pdw_sql_requests dynamic management view in Pool1.

B. Review the sys.dm_pdw_exec_requests dynamic management view in Pool1.

C. Use the Monitor hub in Synapse Studio.

D. Review the AzureDiagnostics table in la1.

E. Review the sys.dm_pdw_request_steps dynamic management view in Pool1.

 


Suggested Answer: BC

 
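A minimal sketch of option B is shown below; in a dedicated SQL pool, sys.dm_pdw_exec_requests exposes a result_cache_hit column, where 1 indicates the request was answered from the result set cache.

-- Run inside Pool1; filter or order as needed to find the recent query.
SELECT
    request_id,
    command,
    result_cache_hit,   -- 1 = served from the result set cache
    start_time,
    total_elapsed_time
FROM sys.dm_pdw_exec_requests
ORDER BY start_time DESC;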

Question 13

HOTSPOT -
You are designing a near real-time dashboard solution that will visualize streaming data from remote sensors that connect to the internet. The streaming data must be aggregated to show the average value of each 10-second interval. The data will be discarded after being displayed in the dashboard.
The solution will use Azure Stream Analytics and must meet the following requirements:
✑ Minimize latency from an Azure Event hub to the dashboard.
✑ Minimize the required storage.
✑ Minimize development effort.
What should you include in the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point
Hot Area:
 Image

 


Suggested Answer:
Correct Answer Image

Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
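
As a hedged sketch of the aggregation part of this solution, the Stream Analytics query below averages sensor values over 10-second tumbling windows and writes straight to a Power BI output, so no intermediate storage is needed. The input, output, and column names are placeholders.

-- Hypothetical input/output aliases and sensor columns.
SELECT
    sensorId,
    AVG(sensorValue) AS AverageValue,
    System.Timestamp() AS WindowEnd
INTO [powerbi-output]
FROM [eventhub-input] TIMESTAMP BY eventTime
GROUP BY sensorId, TumblingWindow(second, 10)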

Question 14

You have an enterprise data warehouse in Azure Synapse Analytics named DW1 on a server named Server1.
You need to determine the size of the transaction log file for each distribution of DW1.
What should you do?

A. On DW1, execute a query against the sys.database_files dynamic management view.

B. From Azure Monitor in the Azure portal, execute a query against the logs of DW1.

C. Execute a query against the logs of DW1 by using the Get-AzOperationalInsightsSearchResult PowerShell cmdlet.

D. On the master database, execute a query against the sys.dm_pdw_nodes_os_performance_counters dynamic management view.

 


Suggested Answer: A

For information about the current log file size, its maximum size, and the autogrow option for the file, you can also use the size, max_size, and growth columns for that log file in sys.database_files.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/logs/manage-the-size-of-the-transaction-log-file
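
A minimal sketch of option A; when run against DW1, sys.database_files reports file sizes in 8 KB pages, so the query converts the log file size to megabytes.

SELECT
    name,
    type_desc,
    size * 8.0 / 1024 AS size_mb,   -- size is in 8 KB pages
    max_size,
    growth
FROM sys.database_files
WHERE type_desc = 'LOG';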

Question 15

You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?

A. DWU percentage

B. Cache hit percentage

C. DWU limit

D. Data IO percentage

 


Suggested Answer: B

Monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the adaptive cache for dedicated SQL pools.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache

Question 16

HOTSPOT -
You need to implement an Azure Databricks cluster that automatically connects to Azure Data Lake Storage Gen2 by using Azure Active Directory (Azure AD) integration.
How should you configure the new cluster? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: Premium –
Credential passthrough requires an Azure Databricks Premium Plan
Box 2: Azure Data Lake Storage credential passthrough
You can access Azure Data Lake Storage using Azure Active Directory credential passthrough.
When you enable your cluster for Azure Data Lake Storage credential passthrough, commands that you run on that cluster can read and write data in Azure Data
Lake Storage without requiring you to configure service principal credentials for access to storage.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough

Question 17

DRAG DROP -
You have an Azure Active Directory (Azure AD) tenant that contains a security group named Group1. You have an Azure Synapse Analytics dedicated SQL pool named dw1 that contains a schema named schema1.
You need to grant Group1 read-only permissions to all the tables and views in schema1. The solution must use the principle of least privilege.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct orders you select.
Select and Place:
 Image

 


Suggested Answer:
Correct Answer Image

Step 1: Create a database user named dw1 that represents Group1 and use the FROM EXTERNAL PROVIDER clause.
Step 2: Create a database role named Role1 and grant Role1 SELECT permissions to schema1.
Step 3: Assign Role1 to the Group1 database user.
Reference:
https://docs.microsoft.com/en-us/azure/data-share/how-to-share-from-sql
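
A minimal T-SQL sketch of the three steps, run in dw1; Group1 is mapped in as an external (Azure AD) user, and the role keeps the grant scoped to schema1 only, which satisfies least privilege.

-- Step 1: database user that represents the Azure AD group
CREATE USER [Group1] FROM EXTERNAL PROVIDER;

-- Step 2: role with read-only access to schema1
CREATE ROLE Role1;
GRANT SELECT ON SCHEMA::schema1 TO Role1;

-- Step 3: add the group's database user to the role
ALTER ROLE Role1 ADD MEMBER [Group1];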

Question 18

DRAG DROP
-
You have an Azure Synapse Analytics dedicated SQL pool.
You need to create a copy of the data warehouse and make the copy available for 28 days. The solution must minimize costs.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
 Image

 


Suggested Answer:
Correct Answer Image

 

Question 19

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure subscription that contains an Azure data factory named ADF1.
From Azure Data Factory Studio, you build a complex data pipeline in ADF1.
You discover that the Save button is unavailable, and there are validation errors that prevent the pipeline from being published.
You need to ensure that you can save the logic of the pipeline.
Solution: You disable all the triggers for ADF1.
Does this meet the goal?

A. Yes

B. No

 


Suggested Answer: B

 

Question 20

HOTSPOT -
You have an Azure Synapse Analytics dedicated SQL pool.
You need to create a table named FactInternetSales that will be a large fact table in a dimensional model. FactInternetSales will contain 100 million rows and two columns named SalesAmount and OrderQuantity. Queries executed on FactInternetSales will aggregate the values in SalesAmount and OrderQuantity from the last year for a specific product. The solution must minimize the data size and query execution time.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: CLUSTERED COLUMNSTORE INDEX
Columnstore indexes are the standard for storing and querying large data warehousing fact tables. This index uses column-based data storage and query processing to achieve gains of up to 10 times the query performance in your data warehouse over traditional row-oriented storage. You can also achieve gains of up to 10 times the data compression over the uncompressed data size. Beginning with SQL Server 2016 (13.x) SP1, columnstore indexes enable operational analytics: the ability to run performant real-time analytics on a transactional workload.
Note: A clustered columnstore index is the physical storage for the entire table.
Reference Image
To reduce fragmentation of the column segments and improve performance, the columnstore index might store some data temporarily in a clustered index called a deltastore and a B-tree list of IDs for deleted rows. The deltastore operations are handled behind the scenes. To return the correct query results, the clustered columnstore index combines query results from both the columnstore and the deltastore.
Box 2: HASH([ProductKey])
A hash-distributed table distributes rows based on the value in the distribution column and is designed to achieve high performance for queries on large tables. Choose a distribution column with data that distributes evenly.
Incorrect:
* Not HASH([OrderDateKey]): OrderDateKey is a date column, and a date column should not be used for hash distribution. All data for the same date lands in the same distribution, so if several users all filter on the same date, only 1 of the 60 distributions does all the processing work.
* A replicated table has a full copy of the table available on every Compute node. Queries run fast on replicated tables since joins on replicated tables don’t require data movement. Replication requires extra storage, though, and isn’t practical for large tables.
* A round-robin table distributes table rows evenly across all distributions. The rows are distributed randomly. Loading data into a round-robin table is fast, but queries can require more data movement than the other distribution methods.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
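
Putting the two boxes together, a minimal sketch of the resulting DDL might look like the following; the column types are assumptions, and only the columns named in the question are shown.

CREATE TABLE dbo.FactInternetSales
(
    ProductKey    INT            NOT NULL,
    OrderDateKey  INT            NOT NULL,
    SalesAmount   DECIMAL(18, 2) NOT NULL,
    OrderQuantity INT            NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,        -- minimizes data size for a 100-million-row fact table
    DISTRIBUTION = HASH([ProductKey])   -- queries aggregate by a specific product
);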

Question 21

You have two fact tables named Flight and Weather. Queries targeting the tables will be based on the join between the following columns.
 Image
You need to recommend a solution that maximizes query performance.
What should you include in the recommendation?

A. In the tables use a hash distribution of ArrivalDateTime and ReportDateTime.

B. In the tables use a hash distribution of ArrivalAirportID and AirportID.

C. In each table, create an IDENTITY column.

D. In each table, create a column as a composite of the other two columns in the table.

 


Suggested Answer: B

Hash-distribution improves query performance on large fact tables.
Incorrect Answers:
A: Do not use a date column for hash distribution. All data for the same date lands in the same distribution. If several users are all filtering on the same date, then only 1 of the 60 distributions do all the processing work.

Question 22

HOTSPOT -
You have an Azure subscription that contains a logical Microsoft SQL server named Server1. Server1 hosts an Azure Synapse Analytics SQL dedicated pool named Pool1.
You need to recommend a Transparent Data Encryption (TDE) solution for Server1. The solution must meet the following requirements:
✑ Track the usage of encryption keys.
✑ Maintain the access of client apps to Pool1 in the event of an Azure datacenter outage that affects the availability of the encryption keys.
 Image
What should you include in the recommendation? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: TDE with customer-managed keys
Customer-managed keys are stored in the Azure Key Vault. You can monitor how and when your key vaults are accessed, and by whom. You can do this by enabling logging for Azure Key Vault, which saves information in an Azure storage account that you provide.
Box 2: Create and configure Azure key vaults in two Azure regions
The contents of your key vault are replicated within the region and to a secondary region at least 150 miles away, but within the same geography to maintain high durability of your keys and secrets.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/security/workspaces-encryption
https://docs.microsoft.com/en-us/azure/key-vault/general/logging

Question 23

You have an Azure Synapse Analytics dedicated SQL pool.
You need to create a fact table named Table1 that will store sales data from the last three years. The solution must be optimized for the following query operations:
•	Show order counts by week.
•	Calculate sales totals by region.
•	Calculate sales totals by product.
•	Find all the orders from a given month.
Which data should you use to partition Table1?

A. product

B. month

C. week

D. region

 


Suggested Answer: B

 
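A hedged sketch of what partitioning Table1 by month could look like; the column names, types, distribution, and boundary values are assumptions used only to illustrate monthly RANGE partitioning on an integer date key.

CREATE TABLE dbo.Table1
(
    OrderKey     BIGINT         NOT NULL,
    OrderDateKey INT            NOT NULL,  -- YYYYMMDD integer key
    ProductKey   INT            NOT NULL,
    RegionKey    INT            NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([ProductKey]),
    PARTITION ( [OrderDateKey] RANGE RIGHT FOR VALUES
                (20220101, 20220201, 20220301) )  -- one boundary per month, extended across three years
);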

Question 24

DRAG DROP -
You have an Azure subscription.
You plan to build a data warehouse in an Azure Synapse Analytics dedicated SQL pool named pool1 that will contain staging tables and a dimensional model.
Pool1 will contain the following tables.
 Image
You need to design the table storage for pool1. The solution must meet the following requirements:
✑ Maximize the performance of data loading operations to Staging.WebSessions.
✑ Minimize query times for reporting queries against the dimensional model.
Which type of table distribution should you use for each table? To answer, drag the appropriate table distribution types to the correct tables. Each table distribution type may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: Replicated –
The best table storage option for a small table is to replicate it across all the Compute nodes.
Box 2: Hash –
Hash-distribution improves query performance on large fact tables.
Box 3: Round-robin –
Round-robin distribution is useful for improving loading speed.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute

Question 25

You develop data engineering solutions for a company.
A project requires the deployment of data to Azure Data Lake Storage.
You need to implement role-based access control (RBAC) so that project members can manage the Azure Data Lake Storage resources.
Which three actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. Create security groups in Azure Active Directory (Azure AD) and add project members.

B. Configure end-user authentication for the Azure Data Lake Storage account.

C. Assign Azure AD security groups to Azure Data Lake Storage.

D. Configure Service-to-service authentication for the Azure Data Lake Storage account.

E. Configure access control lists (ACL) for the Azure Data Lake Storage account.

 


Suggested Answer: ACE

AC: Create security groups in Azure Active Directory. Assign users or security groups to Data Lake Storage Gen1 accounts.
E: Assign users or security groups as ACLs to the Data Lake Storage Gen1 file system
Reference:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data

Question 26

You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
 Image
FactPurchase will have 1 million rows of data added daily and will contain three years of data.
Transact-SQL queries similar to the following query will be executed daily.
SELECT SupplierKey, StockItemKey, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
  AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey
Which table distribution will minimize query times?

A. replicated

B. hash-distributed on PurchaseKey

C. round-robin

D. hash-distributed on DateKey

 


Suggested Answer: B

Hash-distributed tables improve query performance on large fact tables. Round-robin tables are useful for improving loading speed.
Incorrect:
Not D: Do not use a date column. All data for the same date lands in the same distribution. If several users are all filtering on the same date, then only 1 of the 60 distributions does all the processing work.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute

Question 27

You are designing an Azure Databricks interactive cluster. The cluster will be used infrequently and will be configured for auto-termination.
You need to ensure that the cluster configuration is retained indefinitely after the cluster is terminated. The solution must minimize costs.
What should you do?

A. Pin the cluster.

B. Create an Azure runbook that starts the cluster every 90 days.

C. Terminate the cluster manually when processing completes.

D. Clone the cluster after it is terminated.

 


Suggested Answer: A

Azure Databricks retains cluster configuration information for up to 70 all-purpose clusters terminated in the last 30 days and up to 30 job clusters recently terminated by the job scheduler. To keep an all-purpose cluster configuration even after it has been terminated for more than 30 days, an administrator can pin a cluster to the cluster list.
Reference:
https://docs.microsoft.com/en-us/azure/databricks/clusters/

Question 28

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes mapping data flow, and then inserts the data into the data warehouse.
Does this meet the goal?

A. Yes

B. No

 


Suggested Answer: B

If you need to transform data in a way that is not supported by Data Factory, you can create a custom activity (not a mapping data flow) with your own data processing logic and use the activity in the pipeline. For example, you can create a custom activity to run R scripts on an HDInsight cluster with R installed.
Reference:
https://docs.microsoft.com/en-US/azure/data-factory/transform-data

Question 29

You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:
The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.
 Image
✑ Line total sales amount and line total tax amount will be aggregated in Databricks.
✑ Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.
You need to recommend an output mode for the dataset that will be processed by using Structured Streaming. The solution must minimize duplicate data.
What should you recommend?

A. Update

B. Complete

C. Append

 


Suggested Answer: A

By default, streams run in append mode, which adds new records to the table.
Incorrect Answers:
B: Complete mode: replace the entire table with every batch.
Reference:
https://docs.databricks.com/delta/delta-streaming.html

Question 30

You have an Azure Synapse Analytics workspace.
You plan to deploy a lake database by using a database template in Azure Synapse.
Which two elements are included in the template? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. relationships

B. data formats

C. linked services

D. table permissions

E. table definitions

 


Suggested Answer: AE

 

Question 31

You are creating an Azure Data Factory pipeline.
You need to add an activity to the pipeline. The activity must execute a Transact-SQL stored procedure that has the following characteristics:
•	Returns the number of sales invoices for a current date
•	Does NOT require input parameters
Which type of activity should you use?

A. Stored Procedure

B. Get Metadata

C. Append Variable

D. Lookup

 


Suggested Answer: D

 

Question 32

DRAG DROP
-
You have an Azure Data Lake Storage account named account1.
You use an Azure Synapse Analytics serverless SQL pool to access sales data stored in account1.
You need to create a bar chart that displays sales by product. The solution must minimize development effort.
In which order should you perform the actions? To answer, move all actions from the list of actions to the answer area and arrange them in the correct order.
 Image

 


Suggested Answer:
Correct Answer Image

 

Question 33

You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Contacts. Contacts contains a column named Phone.
You need to ensure that users in a specific role only see the last four digits of a phone number when querying the Phone column.
What should you include in the solution?

A. table partitions

B. a default value

C. row-level security (RLS)

D. column encryption

E. dynamic data masking

 


Suggested Answer: E

Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a policy-based security feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.
Reference:
https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
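
A minimal sketch of option E, assuming Phone is a character column; the partial() mask leaves only the last four characters visible to users who do not hold the UNMASK permission.

ALTER TABLE dbo.Contacts
ALTER COLUMN Phone ADD MASKED WITH (FUNCTION = 'partial(0, "XXX-XXX-", 4)');

-- Users or roles without UNMASK see the masked value; the specific role in the
-- question simply is not granted UNMASK, so it sees only the last four digits.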

Question 34

You have an Azure Synapse Analytics dedicated SQL pool named SQL1 and a user named User1.
You need to ensure that User1 can view requests associated with SQL1 by querying the sys.dm_pdw_exec_requests dynamic management view. The solution must follow the principle of least privilege.
Which permission should you grant to User1?

A. VIEW DATABASE STATE

B. SHOWPLAN

C. CONTROL SERVER

D. VIEW ANY DATABASE

 


Suggested Answer: A

 
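For reference, the grant itself is a single statement run in SQL1 (a minimal sketch of option A); VIEW DATABASE STATE is the smallest permission that lets User1 query sys.dm_pdw_exec_requests.

GRANT VIEW DATABASE STATE TO User1;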

Question 35

HOTSPOT -
You plan to develop a dataset named Purchases by using Azure Databricks. Purchases will contain the following columns:
✑ ProductID
✑ ItemPrice
✑ LineTotal
✑ Quantity
✑ StoreID
✑ Minute
✑ Month
✑ Hour
✑ Year
✑ Day
 Image
You need to store the data to support hourly incremental load pipelines that will vary for each Store ID. The solution must minimize storage costs.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: partitionBy –
We should overwrite at the partition level.
Example:
df.write.partitionBy(“y”,”m”,”d”)
.mode(SaveMode.Append)
.parquet(“/data/hive/warehouse/db_name.db/” + tableName)
Box 2: (“StoreID”, “Year”, “Month”, “Day”, “Hour”)
Box 3: parquet(“/Purchases”)
Reference:
https://intellipaat.com/community/11744/how-to-partition-and-write-dataframe-in-spark-without-deleting-partitions-with-no-new-data

Question 36

You have an on-premises Linux server that contains a database named DB1.
You have an Azure subscription that contains an Azure data factory named ADF1 and an Azure Data Lake Storage account named ADLS1.
You need to create a pipeline in ADF1 that will copy data from DB1 to ADLS1.
Which type of integration runtime should you use to read the data from DB1?

A. self-hosted integration runtime

B. Azure integration runtime

C. Azure-SQL Server Integration Services (SSIS)

 


Suggested Answer: A

 

Question 37

You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?

A. DWU limit

B. Cache hit percentage

C. Local tempdb percentage

D. Data IO percentage

 


Suggested Answer: B

 

Question 38

HOTSPOT
-
You have an Azure Blob storage account that contains a folder. The folder contains 120,000 files. Each file contains 62 columns.
Each day, 1,500 new files are added to the folder.
You plan to incrementally load five data columns from each new file into an Azure Synapse Analytics workspace.
You need to minimize how long it takes to perform the incremental loads.
What should you use to store the files and in which format? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
 Image

 


Suggested Answer:
Correct Answer Image

 

Question 39

You have a SQL pool in Azure Synapse.
A user reports that queries against the pool take longer than expected to complete. You determine that the issue relates to queried columnstore segments.
You need to add monitoring to the underlying storage to help diagnose the issue.
Which two metrics should you monitor? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. Snapshot Storage Size

B. Cache used percentage

C. DWU Limit

D. Cache hit percentage

 


Suggested Answer: BD

D: Cache hit percentage: (cache hits / cache miss) * 100 where cache hits is the sum of all columnstore segments hits in the local SSD cache and cache miss is the columnstore segments misses in the local SSD cache summed across all nodes
B: (cache used / cache capacity) * 100 where cache used is the sum of all bytes in the local SSD cache across all nodes and cache capacity is the sum of the storage capacity of the local SSD cache across all nodes
Incorrect Answers:
C: DWU limit: Service level objective of the data warehouse.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-concept-resource-utilization-query-activity

Question 40

You have an Azure subscription that contains an Azure Data Lake Storage Gen2 account named account1 and an Azure Synapse Analytics workspace named workspace1.
You need to create an external table in a serverless SQL pool in workspace1. The external table will reference CSV files stored in account1. The solution must maximize performance.
How should you configure the external table?

A. Use a native external table and authenticate by using a shared access signature (SAS).

B. Use a native external table and authenticate by using a storage account key.

C. Use an Apache Hadoop external table and authenticate by using a shared access signature (SAS).

D. Use an Apache Hadoop external table and authenticate by using a service principal in Microsoft Azure Active Directory (Azure AD), part of Microsoft Entra.

 


Suggested Answer: A

 

Question 41

You have an Azure subscription that is linked to a tenant in Microsoft Azure Active Directory (Azure AD), part of Microsoft Entra. The tenant contains a security group named Group1. The subscription contains an Azure Data Lake Storage account named myaccount1. The myaccount1 account contains two containers named container1 and container2.
You need to grant Group1 read access to container1. The solution must use the principle of least privilege.
Which role should you assign to Group1?

A. Storage Table Data Reader for myaccount1

B. Storage Blob Data Reader for container1

C. Storage Blob Data Reader for myaccount1

D. Storage Table Data Reader for container1

 


Suggested Answer: B

 

Question 42

You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast columns to specified types of data, and insert the data into a table in an
Azure Synapse Analytics dedicated SQL pool. The CSV file contains three columns named username, comment, and date.
The data flow already contains the following:
✑ A source transformation.
✑ A Derived Column transformation to set the appropriate types of data.
✑ A sink transformation to land the data in the pool.
You need to ensure that the data flow meets the following requirements:
✑ All valid rows must be written to the destination table.
✑ Truncation errors in the comment column must be avoided proactively.
✑ Any rows containing comment values that will cause truncation errors upon insert must be written to a file in blob storage.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. To the data flow, add a sink transformation to write the rows to a file in blob storage.

B. To the data flow, add a Conditional Split transformation to separate the rows that will cause truncation errors.

C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.

D. Add a select transformation to select only the rows that will cause truncation errors.

 


Suggested Answer: AB

B: Example:
1. This conditional split transformation defines the maximum length of “title” to be five. Any row that is less than or equal to five will go into the GoodRows stream. Any row that is larger than five will go into the BadRows stream.
Reference Image
A:
2. Now we need to log the rows that failed. Add a sink transformation to the BadRows stream for logging. Here, we’ll “auto-map” all of the fields so that we have logging of the complete transaction record. This is a text-delimited CSV file output to a single file in Blob Storage. We’ll call the log file “badrows.csv”.
Reference Image
3. The completed data flow is shown below. We are now able to split off error rows to avoid the SQL truncation errors and put those entries into a log file. Meanwhile, successful rows can continue to write to our target database.
Reference Image
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows

Question 43

You have an Azure Data Factory pipeline named Pipeline1. Pipeline1 contains a copy activity that sends data to an Azure Data Lake Storage Gen2 account.
Pipeline1 is executed by a schedule trigger.
You change the copy activity sink to a new storage account and merge the changes into the collaboration branch.
After Pipeline1 executes, you discover that data is NOT copied to the new storage account.
You need to ensure that the data is copied to the new storage account.
What should you do?

A. Publish from the collaboration branch.

B. Create a pull request.

C. Modify the schedule trigger.

D. Configure the change feed of the new storage account.

 


Suggested Answer: A

CI/CD lifecycle –
1. A development data factory is created and configured with Azure Repos Git. All developers should have permission to author Data Factory resources like pipelines and datasets.
2. A developer creates a feature branch to make a change. They debug their pipeline runs with their most recent changes
3. After a developer is satisfied with their changes, they create a pull request from their feature branch to the main or collaboration branch to get their changes reviewed by peers.
4. After a pull request is approved and changes are merged in the main branch, the changes get published to the development factory.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-delivery

Question 44

You have an Azure subscription that contains an Azure Synapse Analytics workspace and a user named User1.
You need to ensure that User1 can review the Azure Synapse Analytics database templates from the gallery. The solution must follow the principle of least privilege.
Which role should you assign to User1?

A. Storage Blob Data Contributor.

B. Synapse Administrator

C. Synapse Contributor

D. Synapse User

 


Suggested Answer: C

 

Question 45

You are designing a highly available Azure Data Lake Storage solution that will include geo-zone-redundant storage (GZRS).
You need to monitor for replication delays that can affect the recovery point objective (RPO).
What should you include in the monitoring solution?

A. 5xx: Server Error errors

B. Average Success E2E Latency

C. availability

D. Last Sync Time

 


Suggested Answer: D

Because geo-replication is asynchronous, it is possible that data written to the primary region has not yet been written to the secondary region at the time an outage occurs. The Last Sync Time property indicates the last time that data from the primary region was written successfully to the secondary region. All writes made to the primary region before the last sync time are available to be read from the secondary location. Writes made to the primary region after the last sync time property may or may not be available for reads yet.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/last-sync-time-get

Question 46

You use Azure Stream Analytics to receive data from Azure Event Hubs and to output the data to an Azure Blob Storage account.
You need to output the count of records received from the last five minutes every minute.
Which windowing function should you use?

A. Session

B. Tumbling

C. Sliding

D. Hopping

 


Suggested Answer: D

Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap and be emitted more often than the window size. Events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
Reference Image
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
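
A hedged sketch of option D; the input and output names are placeholders. A five-minute window that hops every minute emits the rolling five-minute count once per minute, which is exactly the reporting cadence the question asks for.

SELECT
    COUNT(*) AS RecordCount,
    System.Timestamp() AS WindowEnd
INTO [blob-output]
FROM [eventhub-input]
GROUP BY HoppingWindow(minute, 5, 1)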

Question 47

You create an Azure Databricks cluster and specify an additional library to install.
When you attempt to load the library in a notebook, the library is not found.
You need to identify the cause of the issue.
What should you review?

A. notebook logs

B. cluster event logs

C. global init scripts logs

D. workspace logs

 


Suggested Answer: C

Cluster-scoped Init Scripts: Init scripts are shell scripts that run during the startup of each cluster node before the Spark driver or worker JVM starts. Databricks customers use init scripts for various purposes such as installing custom libraries, launching background processes, or applying enterprise security policies.
Logs for Cluster-scoped init scripts are now more consistent with Cluster Log Delivery and can be found in the same root folder as driver and executor logs for the cluster.
Reference:
https://databricks.com/blog/2018/08/30/introducing-cluster-scoped-init-scripts.html

Question 48

DRAG DROP -
You are designing an Azure Data Lake Storage Gen2 structure for telemetry data from 25 million devices distributed across seven key geographical regions. Each minute, the devices will send a JSON payload of metrics to Azure Event Hubs.
You need to recommend a folder structure for the data. The solution must meet the following requirements:
✑ Data engineers from each region must be able to build their own pipelines for the data of their respective region only.
✑ The data must be processed at least once every 15 minutes for inclusion in Azure Synapse Analytics serverless SQL pools.
How should you recommend completing the structure? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: {raw/regionID}
Box 2: {YYYY}/{MM}/{DD}/{HH}/{mm}
Box 3: {deviceID}
Reference:
https://github.com/paolosalvatori/StreamAnalyticsAzureDataLakeStore/blob/master/README.md

Question 49

HOTSPOT -
You build an Azure Data Factory pipeline to move data from an Azure Data Lake Storage Gen2 container to a database in an Azure Synapse Analytics dedicated
SQL pool.
Data in the container is stored in the following folder structure.
/in/{YYYY}/{MM}/{DD}/{HH}/{mm}
The earliest folder is /in/2021/01/01/00/00. The latest folder is /in/2021/01/15/01/45.
You need to configure a pipeline trigger to meet the following requirements:
✑ Existing data must be loaded.
✑ Data must be loaded every 30 minutes.
✑ Late-arriving data of up to two minutes must be included in the load for the time at which the data should have arrived.
How should you configure the pipeline trigger? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: Tumbling window –
To be able to use the Delay parameter we select Tumbling window.
Box 2:
Recurrence: 30 minutes, not 32 minutes
Delay: 2 minutes.
The amount of time to delay the start of data processing for the window. The pipeline run is started after the expected execution time plus the amount of delay.
The delay defines how long the trigger waits past the due time before triggering a new run. The delay doesn’t alter the window startTime.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger

Question 50

HOTSPOT -
You develop a dataset named DBTBL1 by using Azure Databricks.
DBTBL1 contains the following columns:
✑ SensorTypeID
✑ GeographyRegionID
✑ Year
✑ Month
✑ Day
✑ Hour
✑ Minute
✑ Temperature
✑ WindSpeed
✑ Other
You need to store the data to support daily incremental load pipelines that vary for each GeographyRegionID. The solution must minimize storage costs.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
 Image

 


Suggested Answer:
Correct Answer Image

Box 1: .partitionBy –
Incorrect Answers:
✑ .format:
Method: format():
Arguments: “parquet”, “csv”, “txt”, “json”, “jdbc”, “orc”, “avro”, etc.
✑ .bucketBy:
Method: bucketBy()
Arguments: (numBuckets, col, col…, coln)
The number of buckets and names of columns to bucket by. Uses Hive’s bucketing scheme on a filesystem.
Box 2: (“Year”, “Month”, “Day”, “GeographyRegionID”)
Specify the columns on which to do the partition. Use the date columns followed by the GeographyRegionID column.
Box 3: .saveAsTable(“/DBTBL1”)
Method: saveAsTable()
Argument: “table_name”
The table to save to.
Reference:
https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/ch04.html
https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch

Access Full DP-203 Dump Free

Looking for even more practice questions? Click here to access the complete DP-203 Dump Free collection, offering hundreds of questions across all exam objectives.

We regularly update our content to ensure accuracy and relevance—so be sure to check back for new material.

Begin your certification journey today with our DP-203 dump free questions — and get one step closer to exam success!
