DP-201 Dump Free – 50 Practice Questions to Sharpen Your Exam Readiness.
Looking for a reliable way to prepare for your DP-201 certification? Our DP-201 Dump Free includes 50 exam-style practice questions designed to reflect real test scenarios—helping you study smarter and pass with confidence.
Using a DP-201 Dump Free set of questions can give you an edge in your exam prep by helping you:
Understand the format and types of questions you’ll face
Pinpoint weak areas and focus your study efforts
Boost your confidence with realistic question practice
Below, you will find 50 free questions from our DP-201 Dump Free collection. These cover key topics and are structured to simulate the difficulty level of the real exam, making them a valuable tool for review or final prep.
You have a C# application that processes data from an Azure IoT hub and performs complex transformations.
You need to replace the application with a real-time solution. The solution must reuse as much code as possible from the existing application.
A. Azure Databricks
B. Azure Event Grid
C. Azure Stream Analytics
D. Azure Data Factory
Suggested Answer: C
Azure Stream Analytics on IoT Edge empowers developers to deploy near-real-time analytical intelligence closer to IoT devices so that they can unlock the full value of device-generated data. UDFs are available in C# for IoT Edge jobs, which lets you reuse existing C# transformation code.
Azure Stream Analytics on IoT Edge runs within the Azure IoT Edge framework. Once the job is created in Stream Analytics, you can deploy and manage it using IoT Hub.
References: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-edge
HOTSPOT -
A company has locations in North America and Europe. The company uses Azure SQL Database to support business apps.
Employees must be able to access the app data in case of a region-wide outage. A multi-region availability solution is needed with the following requirements:
✑ Read-access to data in a secondary region must be available only in case of an outage of the primary region.
✑ The Azure SQL Database compute and storage layers must be integrated and replicated together.
You need to design the multi-region high availability solution.
What should you recommend? To answer, select the appropriate values in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Suggested Answer:
Box 1: Standard –
The following table describes the types of storage accounts and their capabilities:
Box 2: Geo-redundant storage –
If your storage account has GRS enabled, then your data is durable even in the case of a complete regional outage or a disaster in which the primary region isn’t recoverable.
Note: If you opt for GRS, you have two related options to choose from:
GRS replicates your data to another data center in a secondary region, but that data is available to be read only if Microsoft initiates a failover from the primary to secondary region.
Read-access geo-redundant storage (RA-GRS) is based on GRS. RA-GRS replicates your data to another data center in a secondary region, and also provides you with the option to read from the secondary region. With RA-GRS, you can read from the secondary region regardless of whether Microsoft initiates a failover from the primary to secondary region.
Reference: https://docs.microsoft.com/en-us/azure/storage/common/storage-introduction https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an HDInsight/Hadoop cluster solution that uses Azure Data Lake Gen1 Storage.
The solution requires POSIX permissions and enables diagnostics logging for auditing.
You need to recommend solutions that optimize storage.
Proposed Solution: Ensure that files stored are smaller than 250MB.
Does the solution meet the goal?
A. Yes
B. No
Suggested Answer: B
Instead, ensure that stored files are larger than 250 MB, not smaller.
You can have a separate compaction job that combines these files into larger ones.
Note: POSIX permissions and auditing in Data Lake Storage Gen1 come with an overhead that becomes apparent when working with numerous small files. As a best practice, batch your data into larger files rather than writing thousands or millions of small files to Data Lake Storage Gen1. Avoiding small file sizes can have multiple benefits, such as:
✑ Lowering the authentication checks across multiple files
✑ Reduced open file connections
✑ Faster copying/replication
✑ Fewer files to process when updating Data Lake Storage Gen1 POSIX permissions
Reference: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices
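As a rough illustration of such a compaction job, the PySpark sketch below reads a day's worth of small CSV files and rewrites them as a handful of larger Parquet files. The store name, paths, and partition count are placeholders, not values from the question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("small-file-compaction").getOrCreate()

# Read the many small CSV files produced during the day (hypothetical paths).
df = (
    spark.read.option("header", "true")
    .csv("adl://mydatalake.azuredatalakestore.net/raw/2021/06/01/*.csv")
)

# Rewrite as a small number of larger files so each lands near the ~256 MB
# guidance, keeping POSIX permission checks and audit overhead manageable.
(
    df.repartition(8)
    .write.mode("overwrite")
    .parquet("adl://mydatalake.azuredatalakestore.net/compacted/2021/06/01/")
)
```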
You plan to implement an Azure Data Lake Gen2 storage account.
You need to ensure that the data lake will remain available if a data center fails in the primary Azure region. The solution must minimize costs.
Which type of replication should you use for the storage account?
A. geo-redundant storage (GRS)
B. zone-redundant storage (ZRS)
C. locally-redundant storage (LRS)
D. geo-zone-redundant storage (GZRS)
Suggested Answer: A
Geo-redundant storage (GRS) copies your data synchronously three times within a single physical location in the primary region using LRS. It then copies your data asynchronously to a single physical location in the secondary region.
Incorrect Answers:
B: Zone-redundant storage (ZRS) copies your data synchronously across three Azure availability zones in the primary region. For applications requiring high availability, Microsoft recommends using ZRS in the primary region, and also replicating to a secondary region.
C: Locally redundant storage (LRS) copies your data synchronously three times within a single physical location in the primary region. LRS is the least expensive replication option, but is not recommended for applications requiring high availability.
D: GZRS is more expensive compared to GRS.
Reference: https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
HOTSPOT -
You are designing an enterprise data warehouse in Azure Synapse Analytics that will store website traffic analytics in a star schema.
You plan to have a fact table for website visits. The table will be approximately 5 GB.
You need to recommend which distribution type and index type to use for the table. The solution must provide the fastest query performance.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
What should you recommend as a batch processing solution for Health Interface?
A. Azure CycleCloud
B. Azure Stream Analytics
C. Azure Data Factory
D. Azure Databricks
Suggested Answer: B
Scenario: ADatum identifies the following requirements for the Health Interface application:
Support a more scalable batch processing solution in Azure.
Reduce the amount of time it takes to add data from new hospitals to Health Interface.
Data Factory integrates with the Azure Cosmos DB bulk executor library to provide the best performance when you write to Azure Cosmos DB.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db
You need to design a backup solution for the processed customer data.
What should you include in the design?
A. AzCopy
B. AdlCopy
C. Geo-Redundancy
D. Geo-Replication
Suggested Answer: C
Scenario: All data must be backed up in case disaster recovery is required.
Geo-redundant storage (GRS) is designed to provide at least 99.99999999999999% (16 9’s) durability of objects over a given year by replicating your data to a secondary region that is hundreds of miles away from the primary region. If your storage account has GRS enabled, then your data is durable even in the case of a complete regional outage or a disaster in which the primary region isn’t recoverable.
Reference: https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs
A company purchases IoT devices to monitor manufacturing machinery. The company uses an IoT appliance to communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?
A. Azure Data Factory instance using the Azure portal
B. Azure Analysis Services using Microsoft Visual Studio
C. Azure Stream Analytics Edge application using Microsoft Visual Studio
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure SQL Database that will use elastic pools. You plan to store data about customers in a table. Each record uses a value for
CustomerID.
You need to recommend a strategy to partition data based on values in CustomerID.
Proposed Solution: Separate data into customer regions by using vertical partitioning.
Does the solution meet the goal?
What should you do to improve high availability of the real-time data processing solution?
A. Deploy identical Azure Stream Analytics jobs to paired regions in Azure.
B. Deploy a High Concurrency Databricks cluster.
C. Deploy an Azure Stream Analytics job and use an Azure Automation runbook to check the status of the job and to start the job if it stops.
D. Set Data Lake Storage to use geo-redundant storage (GRS).
Suggested Answer: A
Guarantee Stream Analytics job reliability during service updates
Part of being a fully managed service is the capability to introduce new service functionality and improvements at a rapid pace. As a result, Stream Analytics can have a service update deploy on a weekly (or more frequent) basis. No matter how much testing is done there is still a risk that an existing, running job may break due to the introduction of a bug. If you are running mission critical jobs, these risks need to be avoided. You can reduce this risk by following Azure’s paired region model.
Scenario: The application development team will create an Azure event hub to receive real-time sales data, including store number, date, time, product ID, customer loyalty number, price, and discount amount, from the point of sale (POS) system and output the data to data storage in Azure
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-job-reliability
Design for high availability and disaster recovery
DRAG DROP -
You discover that the highest chance of corruption or bad data occurs during nightly inventory loads.
You need to ensure that you can quickly restore the data to its state before the nightly load and avoid missing any streaming data.
Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Suggested Answer:
Scenario: Daily inventory data comes from a Microsoft SQL server located on a private network.
Step 1: Before the nightly load, create a user-defined restore point
SQL Data Warehouse performs a geo-backup once per day to a paired data center. The RPO for a geo-restore is 24 hours. If you require a shorter RPO for geo-backups, you can create a user-defined restore point and restore from the newly created restore point to a new data warehouse in a different region.
Step 2: Restore the data warehouse to a new name on the same server.
Step 3: Swap the restored database warehouse name.
Reference: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/backup-and-restore
Which Azure service should you recommend for the analytical data store so that the business analysts and data scientists can execute ad hoc queries as quickly as possible?
A. Azure Data Lake Storage Gen2
B. Azure Cosmos DB
C. Azure Stream Analytics
D. Azure Synapse Analytics
Suggested Answer: A
There are several differences between a data lake and a data warehouse. Data structure, ideal users, processing methods, and the overall purpose of the data are the key differentiators.
Scenario: Litware employs business analysts who prefer to analyze data by using Microsoft Power BI, and data scientists who prefer analyzing data in Azure
Databricks notebooks.
Note: Azure Synapse Analytics formerly known as Azure SQL Data Warehouse.
Design Azure data storage solutions
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure SQL Database that will use elastic pools. You plan to store data about customers in a table. Each record uses a value for
CustomerID.
You need to recommend a strategy to partition data based on values in CustomerID.
Proposed Solution: Separate data into shards by using horizontal partitioning.
Does the solution meet the goal?
A. Yes
B. No
Suggested Answer: A
Horizontal Partitioning – Sharding: Data is partitioned horizontally to distribute rows across a scaled-out data tier. With this approach, the schema is identical on all participating databases. This approach is also called "sharding". Sharding can be performed and managed using (1) the elastic database tools libraries or (2) self-sharding. An elastic query is used to query or compile reports across many shards.
Reference: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-query-overview
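To make the idea of horizontal partitioning concrete, here is a minimal, illustrative sketch of routing a CustomerID to one of several shard databases. The connection strings and the hash-based shard map are assumptions for illustration only; in practice the elastic database tools' shard map manager performs this lookup.

```python
import hashlib

# Hypothetical shard map: each shard is a separate database with an identical schema.
SHARD_CONNECTION_STRINGS = [
    "Server=tcp:shard0.database.windows.net;Database=customers0",
    "Server=tcp:shard1.database.windows.net;Database=customers1",
    "Server=tcp:shard2.database.windows.net;Database=customers2",
]

def shard_for_customer(customer_id: int) -> str:
    """Map a CustomerID to the shard that stores its rows."""
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return SHARD_CONNECTION_STRINGS[int(digest, 16) % len(SHARD_CONNECTION_STRINGS)]

print(shard_for_customer(41925))
```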
You are designing an Azure Databricks interactive cluster.
You need to ensure that the cluster meets the following requirements:
✑ Enable auto-termination
✑ Retain cluster configuration indefinitely after cluster termination.
What should you recommend?
A. Start the cluster after it is terminated.
B. Pin the cluster
C. Clone the cluster after it is terminated.
D. Terminate the cluster manually at process completion.
You are designing a solution for a company. The solution will use model training for objective classification.
You need to design the solution.
What should you recommend?
A. an Azure Cognitive Services application
B. a Spark Streaming job
C. interactive Spark queries
D. Power BI models
E. a Spark application that uses Spark MLlib
Suggested Answer: E
Spark in SQL Server big data cluster enables AI and machine learning.
You can use Apache Spark MLlib to create a machine learning application to do simple predictive analysis on an open dataset.
MLlib is a core Spark library that provides many utilities useful for machine learning tasks, including utilities that are suitable for:
✑ Classification
✑ Regression
✑ Clustering
✑ Topic modeling
✑ Singular value decomposition (SVD) and principal component analysis (PCA)
✑ Hypothesis testing and calculating sample statistics
Reference: https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-machine-learning-mllib-ipython
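For a feel of what an MLlib classification job looks like, here is a minimal PySpark sketch that trains a logistic regression classifier on a tiny in-memory dataset. The feature names and values are made up for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-classification").getOrCreate()

# Toy training data: two numeric features and a binary label (hypothetical values).
train = spark.createDataFrame(
    [(0.0, 1.1, 0), (2.0, 1.0, 1), (2.1, 3.3, 1), (0.3, 0.4, 0)],
    ["f1", "f2", "label"],
)

# Assemble the feature columns into the single vector column MLlib expects.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
model = LogisticRegression(maxIter=20).fit(assembler.transform(train))

model.transform(assembler.transform(train)).select("label", "prediction").show()
```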
You need to implement an Azure Storage account that will use a Blob service endpoint that uses zone-redundant storage (ZRS).
The storage account must only accept connections from a virtual network over Azure Private Link.
What should you include in the implementation?
A. a private endpoint for Azure Blob storage
B. a customer-managed key
C. a shared access signature (SAS)
D. a firewall rule to allow traffic from the virtual network
Suggested Answer: A
You can use private endpoints for your Azure Storage accounts to allow clients on a virtual network (VNet) to securely access data over a Private Link.
When creating the private endpoint, you must specify the storage account and the storage service to which it connects. You need a separate private endpoint for each storage service in a storage account that you need to access, namely Blobs, Data Lake Storage Gen2, Files, Queues, Tables, or Static Websites.
Note: The private endpoint uses an IP address from the VNet address space for your storage account service. Network traffic between the clients on the VNet and the storage account traverses over the VNet and a private link on the Microsoft backbone network, eliminating exposure from the public internet.
Reference: https://docs.microsoft.com/en-us/azure/storage/common/storage-private-endpoints
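A rough sketch of creating such a private endpoint with the Python management SDK is shown below. The subscription, resource group, subnet, and storage account IDs are placeholders, and the exact parameter shapes may vary slightly between azure-mgmt-network versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

network_client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = network_client.private_endpoints.begin_create_or_update(
    "my-rg",
    "pe-blob",
    {
        "location": "westeurope",
        # Subnet in the VNet that is allowed to reach the storage account.
        "subnet": {"id": "<subnet-resource-id>"},
        "private_link_service_connections": [
            {
                "name": "blob-connection",
                "private_link_service_id": "<storage-account-resource-id>",
                # group_ids selects the Blob sub-resource; Files, Queues, etc.
                # would each need their own private endpoint.
                "group_ids": ["blob"],
            }
        ],
    },
)
print(poller.result().provisioning_state)
```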
HOTSPOT -
You are designing a solution for a company. You plan to use Azure Databricks.
You need to recommend workloads and tiers to meet the following requirements:
✑ Provide managed clusters for running production jobs.
✑ Provide persistent clusters that support auto-scaling for analytics processes.
✑ Provide role-based access control (RBAC) support for Notebooks.
What should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Suggested Answer:
Box 1: Data Engineering Only –
Box 2: Data Engineering and Data Analytics
Box 3: Standard –
Box 4: Data Analytics only –
Box 5: Premium –
Premium is required for RBAC. The Data Analytics Premium tier provides interactive workloads to analyze data collaboratively with notebooks.
Reference: https://azure.microsoft.com/en-us/pricing/details/databricks/
You need to design the storage for the telemetry capture system.
What storage solution should you use in the design?
A. Azure Synapse Analytics
B. Azure Databricks
C. Azure Cosmos DB
Suggested Answer: C
Azure Cosmos DB is a globally distributed database service. You can associate any number of Azure regions with your Azure Cosmos account and your data is automatically and transparently replicated.
Scenario:
Telemetry Capture –
The telemetry capture system records each time a vehicle passes in front of a sensor. The sensors run on a custom embedded operating system and record the following telemetry data:
✑ Time
✑ Location in latitude and longitude
✑ Speed in kilometers per hour (kmph)
✑ Length of vehicle in meters
You must write all telemetry data to the closest Azure region. The sensors used for the telemetry capture system have a small amount of memory available and so must write data as quickly as possible to avoid losing telemetry data.
Reference: https://docs.microsoft.com/en-us/azure/cosmos-db/high-availability
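A minimal sketch of writing telemetry with the azure-cosmos Python SDK follows. The account URL, key, database, container, partition key layout, and region names are assumptions; the preferred_locations hint simply asks the SDK to favor the closest region.

```python
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    credential="<account-key>",
    # Favor the regions closest to the sensors; with multi-region writes
    # enabled, writes go to the nearest writable region.
    preferred_locations=["West Europe", "North Europe"],
)

container = client.get_database_client("telemetry").get_container_client("vehicle-events")

# Assumes the container is partitioned on a field present in the document.
container.upsert_item({
    "id": "sensor42-2021-06-01T08:15:00Z",
    "sensorId": "sensor42",
    "time": "2021-06-01T08:15:00Z",
    "lat": 47.64,
    "lon": -122.13,
    "speedKmph": 63.5,
    "vehicleLengthM": 4.2,
})
```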
HOTSPOT -
You have an Azure Storage account that generates 200,000 new files daily. The file names have a format of {YYYY}/{MM}/{DD}/{HH}/{CustomerID}.csv.
You need to design an Azure Data Factory solution that will load new data from the storage account to an Azure Data Lake once hourly. The solution must minimize load times and costs.
How should you configure the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point
Hot Area:
Suggested Answer:
Box 1: Incremental load –
When you start to build the end to end data integration flow the first challenge is to extract data from different data stores, where incrementally (or delta) loading data after an initial full load is widely used at this stage. Now, ADF provides a new capability for you to incrementally copy new or changed files only by
LastModifiedDate from a file-based store. By using this new feature, you do not need to partition the data by time-based folder or file name. The new or changed file will be automatically selected by its metadata LastModifiedDate and copied to the destination store.
Box 2: Tumbling window –
Tumbling window triggers are a type of trigger that fires at a periodic time interval from a specified start time, while retaining state. Tumbling windows are a series of fixed-sized, non-overlapping, and contiguous time intervals. A tumbling window trigger has a one-to-one relationship with a pipeline and can only reference a singular pipeline.
Reference: https://azure.microsoft.com/en-us/blog/incrementally-copy-new-files-by-lastmodifieddate-with-azure-data-factory/ https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger
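As an illustration, the dictionary below mirrors the JSON body of a tumbling window trigger that runs a copy pipeline once per hour and passes the window boundaries to it; the trigger, pipeline, and parameter names are placeholders.

```python
hourly_trigger = {
    "name": "HourlyLoadTrigger",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 1,
            "startTime": "2021-06-01T00:00:00Z",
            "maxConcurrency": 1,
        },
        "pipeline": {
            "pipelineReference": {
                "referenceName": "CopyNewFilesPipeline",
                "type": "PipelineReference",
            },
            # The window boundaries can drive the incremental (LastModifiedDate) copy.
            "parameters": {
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime",
            },
        },
    },
}
```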
HOTSPOT -
You need to design the data loading pipeline for Planning Assistance.
What should you recommend? To answer, drag the appropriate technologies to the correct locations. Each technology may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Hot Area:
Suggested Answer:
Box 1: SqlSink Table –
Sensor data must be stored in a Cosmos DB named treydata in a collection named SensorData
Box 2: Cosmos Bulk Loading –
Use Copy Activity in Azure Data Factory to copy data from and to Azure Cosmos DB (SQL API).
Scenario: Data from the Sensor Data collection will automatically be loaded into the Planning Assistance database once a week by using Azure Data Factory. You must be able to manually trigger the data load process.
Data used for Planning Assistance must be stored in a sharded Azure SQL Database.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db
You manage a solution that uses Azure HDInsight clusters.
You need to implement a solution to monitor cluster performance and status.
Which technology should you use?
A. Azure HDInsight.NET SDK
B. Azure HDInsight REST API
C. Ambari REST API
D. Azure Log Analytics
E. Ambari Web UI
Suggested Answer: E
Ambari is the recommended tool for monitoring utilization across the whole cluster. The Ambari dashboard shows easily glanceable widgets that display metrics such as CPU, network, YARN memory, and HDFS disk usage. The specific metrics shown depend on cluster type. The “Hosts” tab shows metrics for individual nodes so you can ensure the load on your cluster is evenly distributed. The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs.
Reference: https://azure.microsoft.com/en-us/blog/monitoring-on-hdinsight-part-1-an-overview/ https://ambari.apache.org/
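Although the suggested answer is the Ambari Web UI, the same cluster metrics are reachable programmatically through the Ambari REST API mentioned in option C. The sketch below calls that API on an HDInsight cluster; the cluster name and credentials are placeholders.

```python
import requests

cluster = "mycluster"  # hypothetical HDInsight cluster name
base = f"https://{cluster}.azurehdinsight.net/api/v1/clusters/{cluster}"

# Ambari on HDInsight is protected by the cluster login (HTTP basic auth).
resp = requests.get(f"{base}/hosts", auth=("admin", "<cluster-password>"))
resp.raise_for_status()

for item in resp.json()["items"]:
    print(item["Hosts"]["host_name"])
```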
You are planning a big data solution in Azure.
You need to recommend a technology that meets the following requirements:
✑ Be optimized for batch processing.
✑ Support autoscaling.
✑ Support per-cluster scaling.
Which technology should you recommend?
A. Azure Synapse Analytics
B. Azure HDInsight with Spark
C. Azure Analysis Services
D. Azure Databricks
Suggested Answer: D
Azure Databricks is an Apache Spark-based analytics platform. Azure Databricks supports autoscaling and manages the Spark cluster for you.
Incorrect Answers:
A, B:
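For context, the payload below shows how autoscaling is expressed when creating a Databricks cluster through the Clusters REST API; the workspace URL, token, node type, runtime version, and worker counts are assumptions for illustration.

```python
import requests

cluster_spec = {
    "cluster_name": "batch-processing",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    # Databricks scales the worker count between these bounds based on load.
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 30,
}

resp = requests.post(
    "https://<workspace>.azuredatabricks.net/api/2.0/clusters/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=cluster_spec,
)
print(resp.json())
```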
DRAG DROP -
Which three actions should you perform in sequence to allow FoodPrep access to the analytical data store? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Suggested Answer:
Scenario: Litware will build a custom application named FoodPrep to provide store employees with the calculation results of how many prepared food items to produce every four hours.
Step 1: Register the FoodPrep application in Azure AD
You create your Azure AD application and service principal.
Step 2: Create a login for the service principal on the Azure SQL Server
Step 3: Create a user-defined database role that grant access.
To access resources in your subscription, you must assign the application to a role.
You can then assign the required permissions to the service principal.
Reference: https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal
You are designing a log storage solution that will use Azure Blob storage containers.
CSV log files will be generated by a multi-tenant application. The log files will be generated for each customer at five-minute intervals. There will be more than
5,000 customers. Typically, the customers will query data generated on the day the data was created.
You need to recommend a naming convention for the virtual directories and files. The solution must minimize the time it takes for the customers to query the log files.
What naming convention should you recommend?
A. {year}/{month}/{day}/{hour}/{minute}/{CustomerID}.csv
B. {year}/{month}/{day}/{CustomerID}/{hour}/{minute}.csv
C. {minute}/{hour}/{day}/{month}/{year}/{CustomerID}.csv
D. {CustomerID}/{year}/{month}/{day}/{hour}/{minute}.csv
You are designing an Azure Synapse solution that will provide a query interface for the data stored in an Azure Storage account. The storage account is only accessible from a virtual network.
You need to recommend an authentication mechanism to ensure that the solution can access the source data.
What should you recommend?
A. a shared key
B. an Azure Active Directory (Azure AD) service principal
HOTSPOT -
You are designing a solution that will use Azure Table storage. The solution will log records in the following entity.
You are evaluating which partition key to use based on the following two scenarios:
✑ Scenario1: Minimize hotspots under heavy write workloads.
✑ Scenario2: Ensure that date lookups are as efficient as possible for read workloads.
Which partition key should you use for each scenario? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
HOTSPOT -
You are designing a solution to store flat files.
You need to recommend a storage solution that meets the following requirements:
✑ Supports automatically moving files that have a modified date that is older than one year to an archive in the data store
✑ Minimizes costs
A higher latency is acceptable for the archived files.
Which storage location and archiving method should you recommend? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Suggested Answer:
Storage location: Azure Blob Storage
Archiving method: A lifecycle management policy
Azure Blob storage lifecycle management offers a rich, rule-based policy for GPv2 and Blob storage accounts. Use the policy to transition your data to the appropriate access tiers or expire at the end of the data’s lifecycle.
The lifecycle management policy lets you:
✑ Transition blobs to a cooler storage tier (hot to cool, hot to archive, or cool to archive) to optimize for performance and cost
✑ Delete blobs at the end of their lifecycles
✑ Define rules to be run once per day at the storage account level
✑ Apply rules to containers or a subset of blobs
Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts
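The rule below sketches how the one-year archive requirement could be expressed in the lifecycle management policy schema (shown here as a Python dictionary); the rule name and prefix are placeholders. The policy can then be applied through the portal, the Azure CLI, or the storage management SDK.

```python
lifecycle_policy = {
    "rules": [
        {
            "name": "archiveAfterOneYear",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["files/"]},
                "actions": {
                    "baseBlob": {
                        # Move blobs untouched for a year to the Archive tier.
                        "tierToArchive": {"daysAfterModificationGreaterThan": 365}
                    }
                },
            },
        }
    ]
}
```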
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure SQL database that has columns. The columns contain sensitive Personally Identifiable Information (PII) data.
You need to design a solution that tracks and stores all the queries executed against the PII data. You must be able to review the data in Azure Monitor, and the data must be available for at least 45 days.
Solution: You add classifications to the columns that contain sensitive data. You turn on Auditing and set the audit log destination to use Azure Blob storage.
Does this meet the goal?
A. Yes
B. No
Suggested Answer: A
Auditing has been enhanced to log sensitivity classifications or labels of the actual data that were returned by the query. This would enable you to gain insights on who is accessing sensitive data.
Note: You now have multiple options for configuring where audit logs will be written. You can write logs to an Azure storage account, to a Log Analytics workspace for consumption by Azure Monitor logs, or to event hub for consumption using event hub. You can configure any combination of these options, and audit logs will be written to each.
Reference: https://azure.microsoft.com/en-us/blog/announcing-public-preview-of-data-discovery-classification-for-microsoft-azure-sql-data-warehouse/
You plan to use Azure SQL Database to support a line of business app.
You need to identify sensitive data that is stored in the database and monitor access to the data.
Which three actions should you recommend? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Configure Data Discovery and Classification.
B. Implement Transparent Data Encryption (TDE).
C. Enable Auditing.
D. Run Vulnerability Assessment.
E. Use Advanced Threat Protection.
Suggested Answer: ACE
A: Data Discovery & Classification is built into Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics. It provides advanced capabilities for discovering, classifying, labeling, and reporting the sensitive data in your databases.
C: An important aspect of the information-protection paradigm is the ability to monitor access to sensitive data. Azure SQL Auditing has been enhanced to include a new field in the audit log called data_sensitivity_information. This field logs the sensitivity classifications (labels) of the data that was returned by a query.
E: Data Discovery & Classification is part of the Advanced Data Security offering, which is a unified package for advanced Azure SQL security capabilities. You can access and manage Data Discovery & Classification via the central SQL Advanced Data Security section of the Azure portal.
Reference: https://docs.microsoft.com/en-us/azure/azure-sql/database/data-discovery-and-classification-overview
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an HDInsight/Hadoop cluster solution that uses Azure Data Lake Gen1 Storage.
The solution requires POSIX permissions and enables diagnostics logging for auditing.
You need to recommend solutions that optimize storage.
Proposed Solution: Ensure that files stored are larger than 250MB.
Does the solution meet the goal?
A. Yes
B. No
Suggested Answer: A
Depending on what services and workloads are using the data, a good size to consider for files is 256 MB or greater. If the file sizes cannot be batched when landing in Data Lake Storage Gen1, you can have a separate compaction job that combines these files into larger ones.
Note: POSIX permissions and auditing in Data Lake Storage Gen1 come with an overhead that becomes apparent when working with numerous small files. As a best practice, batch your data into larger files rather than writing thousands or millions of small files to Data Lake Storage Gen1. Avoiding small file sizes can have multiple benefits, such as:
✑ Lowering the authentication checks across multiple files
✑ Reduced open file connections
✑ Faster copying/replication
✑ Fewer files to process when updating Data Lake Storage Gen1 POSIX permissions
Reference: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices
You are designing a solution for the ad hoc analysis of data in Azure Databricks notebooks. The data will be stored in Azure Blob storage.
You need to ensure that Blob storage will support the recovery of the data if the data is overwritten accidentally.
What should you recommend?
A. Enable soft delete.
B. Add a resource lock.
C. Enable diagnostics logging.
D. Use read-access geo-redundant storage (RA-GRS).
Suggested Answer: A
Soft delete protects blob data from being accidentally or erroneously modified or deleted. When soft delete is enabled for a storage account, blobs, blob versions
(preview), and snapshots in that storage account may be recovered after they are deleted, within a retention period that you specify.
Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-overview
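A minimal sketch of enabling blob soft delete with the azure-storage-blob SDK is shown below; the connection string and the 14-day retention window are placeholders.

```python
from azure.storage.blob import BlobServiceClient, RetentionPolicy

service = BlobServiceClient.from_connection_string("<connection-string>")

# Keep deleted or overwritten blob data recoverable for 14 days.
service.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=14)
)
```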
A company is designing a solution that uses Azure Databricks.
The solution must be resilient to regional Azure datacenter outages.
You need to recommend the redundancy type for the solution.
What should you recommend?
A company manufactures automobile parts. The company installs IoT sensors on manufacturing machinery.
You must design a solution that analyzes data from the sensors.
You need to recommend a solution that meets the following requirements:
✑ Data must be analyzed in real-time.
✑ Data queries must be deployed using continuous integration.
✑ Data must be visualized by using charts and graphs.
✑ Data must be available for ETL operations in the future.
✑ The solution must support high-volume data ingestion.
Which three actions should you recommend? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Use Azure Analysis Services to query the data. Output query results to Power BI.
B. Configure an Azure Event Hub to capture data to Azure Data Lake Storage.
C. Develop an Azure Stream Analytics application that queries the data and outputs to Power BI. Use Azure Data Factory to deploy the Azure Stream Analytics application.
D. Develop an application that sends the IoT data to an Azure Event Hub.
E. Develop an Azure Stream Analytics application that queries the data and outputs to Power BI. Use Azure Pipelines to deploy the Azure Stream Analytics application.
F. Develop an application that sends the IoT data to an Azure Data Lake Storage container.
You are designing an Azure Databricks cluster that runs user-defined local processes.
You need to recommend a cluster configuration that meets the following requirements:
✑ Minimize query latency.
✑ Reduce overall costs without compromising other requirements.
✑ Maximize the number of users that can run queries on the cluster at the same time.
Which cluster type should you recommend?
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an Azure SQL Database that will use elastic pools. You plan to store data about customers in a table. Each record uses a value for
CustomerID.
You need to recommend a strategy to partition data based on values in CustomerID.
Proposed Solution: Separate data into customer regions by using horizontal partitioning.
Does the solution meet the goal?
A. Yes
B. No
Suggested Answer: B
We should use Horizontal Partitioning through Sharding, not divide through regions.
Note: Horizontal Partitioning – Sharding: Data is partitioned horizontally to distribute rows across a scaled-out data tier. With this approach, the schema is identical on all participating databases. This approach is also called "sharding". Sharding can be performed and managed using (1) the elastic database tools libraries or (2) self-sharding. An elastic query is used to query or compile reports across many shards.
Reference: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-query-overview
HOTSPOT -
Which Azure Data Factory components should you recommend using together to import the customer data from Salesforce to Data Lake Storage? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Suggested Answer:
Box 1: Self-hosted integration runtime
A self-hosted IR is capable of running copy activities between a cloud data store and a data store in a private network.
Box 2: Schedule trigger –
Schedule every 8 hours –
Box 3: Copy activity –
Scenario:
✑ Customer data, including name, contact information, and loyalty number, comes from Salesforce and can be imported into Azure once every eight hours. Row modified dates are not trusted in the source table.
✑ Product data, including product ID, name, and category, comes from Salesforce and can be imported into Azure once every eight hours. Row modified dates are not trusted in the source table.
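For reference, the schedule trigger described above could be defined with a body along these lines (shown as a Python dictionary mirroring the trigger JSON); the trigger and pipeline names are placeholders.

```python
salesforce_trigger = {
    "name": "SalesforceEvery8Hours",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Hour",
                "interval": 8,
                "startTime": "2021-06-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopySalesforceToDataLake",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}
```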
HOTSPOT -
You have an Azure Data Lake Storage Gen2 account named account1 that stores logs as shown in the following table.
You do not expect that the logs will be accessed during the retention periods.
You need to recommend a solution for account1 that meets the following requirements:
✑ Automatically deletes the logs at the end of each retention period
✑ Minimizes storage costs
What should you include in the recommendation? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Suggested Answer:
Box 1: Store the infrastructure logs in the Cool access tier and the application logs in the Archive access tier.
Cool – Optimized for storing data that is infrequently accessed and stored for at least 30 days.
Archive – Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements, on the order of hours.
Box 2: Azure Blob storage lifecycle management rules
Blob storage lifecycle management offers a rich, rule-based policy that you can use to transition your data to the best access tier and to expire data at the end of its lifecycle.
Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers
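To illustrate the second box, the rules below sketch per-prefix delete actions in the lifecycle policy schema; the prefixes and day counts are placeholders standing in for the retention periods in the question's table.

```python
retention_rules = {
    "rules": [
        {
            "name": "purgeInfrastructureLogs",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["infrastructure/"]},
                # Delete the blob once its retention period has passed.
                "actions": {"baseBlob": {"delete": {"daysAfterModificationGreaterThan": 60}}},
            },
        },
        {
            "name": "purgeApplicationLogs",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["application/"]},
                "actions": {"baseBlob": {"delete": {"daysAfterModificationGreaterThan": 360}}},
            },
        },
    ]
}
```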
You have an Azure subscription that contains an Azure virtual machine and an Azure Storage account. The virtual machine will access the storage account.
You are planning the security design for the storage account.
You need to ensure that only the virtual machine can access the storage account.
Which two actions should you include in the design? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Select Allow trusted Microsoft services to access this storage account.
B. Select Allow read access to storage logging from any network.
C. Enable a virtual network service endpoint.
D. Set the Allow access from setting to Selected networks.
Suggested Answer: AC
C: Virtual Network (VNet) service endpoint provides secure and direct connectivity to Azure services over an optimized route over the Azure backbone network.
Endpoints allow you to secure your critical Azure service resources to only your virtual networks. Service Endpoints enables private IP addresses in the VNet to reach the endpoint of an Azure service without needing a public IP address on the VNet.
A: You must have Allow trusted Microsoft services to access this storage account turned on under the Azure Storage account Firewalls and Virtual networks settings menu.
Incorrect Answers:
D: Virtual Network (VNet) service endpoint policies allow you to filter egress virtual network traffic to Azure Storage accounts over service endpoint, and allow data exfiltration to only specific Azure Storage accounts. Endpoint policies provide granular access control for virtual network traffic to Azure Storage when connecting over service endpoint.
Reference: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview
You are designing an anomaly detection solution for streaming data from an Azure IoT hub. The solution must meet the following requirements:
✑ Send the output to Azure Synapse.
✑ Identify spikes and dips in time series data.
✑ Minimize development and configuration effort
Which should you include in the solution?
You are designing a real-time stream processing solution in Azure Stream Analytics. The solution must read data from a blob container in an Azure Storage account via a service endpoint.
You need to recommend an authentication mechanism for the solution.
What should you recommend?
A. a managed identity
B. a storage access signature (SAS)
C. a user-assigned managed identity
D. an account key
Suggested Answer: A
Azure Stream Analytics supports managed identity authentication for both Azure Event Hubs input and output.
Note: First, you create a managed identity for your Azure Stream Analytics job.
1. In the Azure portal, open your Azure Stream Analytics job.
2. From the left navigation menu, select Managed Identity located under Configure. Then, check the box next to Use System-assigned Managed Identity and select Save.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/event-hubs-managed-identity
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are designing an HDInsight/Hadoop cluster solution that uses Azure Data Lake Gen1 Storage.
The solution requires POSIX permissions and enables diagnostics logging for auditing.
You need to recommend solutions that optimize storage.
Proposed Solution: Implement compaction jobs to combine small files into larger files.
Does the solution meet the goal?
A. Yes
B. No
Suggested Answer: A
Depending on what services and workloads are using the data, a good size to consider for files is 256 MB or greater. If the file sizes cannot be batched when landing in Data Lake Storage Gen1, you can have a separate compaction job that combines these files into larger ones.
Note: POSIX permissions and auditing in Data Lake Storage Gen1 come with an overhead that becomes apparent when working with numerous small files. As a best practice, batch your data into larger files rather than writing thousands or millions of small files to Data Lake Storage Gen1. Avoiding small file sizes can have multiple benefits, such as:
✑ Lowering the authentication checks across multiple files
✑ Reduced open file connections
✑ Faster copying/replication
✑ Fewer files to process when updating Data Lake Storage Gen1 POSIX permissions
Reference: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-best-practices
HOTSPOT -
You manage an on-premises server named Server1 that has a database named Database1. The company purchases a new application that can access data from
Azure SQL Database.
You recommend a solution to migrate Database1 to an Azure SQL Database instance.
What should you recommend? To answer, select the appropriate configuration in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the data by executing an R script, and then insert the transformed data into a data warehouse in Azure Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that copies the data to a staging table in the data warehouse, and then uses a stored procedure to execute the R script.
Does this meet the goal?
A. Yes
B. No
Suggested Answer: A
If you need to transform data in a way that is not supported by Data Factory, you can create a custom activity with your own data processing logic and use the activity in the pipeline. You can create a custom activity to run R scripts on your HDInsight cluster with R installed.
Reference: https://docs.microsoft.com/en-US/azure/data-factory/transform-data
You need to recommend an Azure Cosmos DB solution that meets the following requirements:
✑ All data that was NOT modified during the last 30 days must be purged automatically.
✑ The solution must NOT affect ongoing user requests.
What should you recommend using to purge the data?
A. an Azure Cosmos DB stored procedure executed by an Azure logic app
B. an Azure Cosmos DB REST API Delete Document operation called by an Azure function
C. Time To Live (TTL) setting in Azure Cosmos DB
D. an Azure Cosmos DB change feed queried by an Azure function
A company purchases IoT devices to monitor manufacturing machinery. The company uses an Azure IoT Hub to communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?
A. Azure Data Factory instance using Azure Portal
B. Azure Analysis Services using Microsoft Visual Studio
C. Azure Stream Analytics Edge application using Microsoft Visual Studio
D. Azure Data Factory instance using Microsoft Visual Studio
Suggested Answer: C
Azure Stream Analytics (ASA) on IoT Edge empowers developers to deploy near-real-time analytical intelligence closer to IoT devices so that they can unlock the full value of device-generated data.
You can use Visual Studio plugin to create an ASA Edge job.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-edge
You are planning a streaming data solution that will use Azure Databricks. The solution will stream sales transaction data from an online store. The solution has the following specifications:
✑ The output data will contain items purchased, quantity, line total sales amount, and line total tax amount.
✑ Line total sales amount and line total tax amount will be aggregated in Databricks.
✑ Sales transactions will never be updated. Instead, new rows will be added to adjust a sale.
You need to recommend an output mode for the dataset that will be processed by using Structured Streaming. The solution must minimize duplicate data.
What should you recommend?
A. Append
B. Complete
C. Update
Suggested Answer: A
Append Mode: Only new rows appended in the result table since the last trigger are written to external storage. This is applicable only for the queries where existing rows in the Result Table are not expected to change.
Incorrect Answers:
B: Complete Mode: The entire updated result table is written to external storage. It is up to the storage connector to decide how to handle the writing of the entire table.
C: Update Mode: Only the rows that were updated in the result table since the last trigger are written to external storage. This is different from Complete Mode in that Update Mode outputs only the rows that have changed since the last trigger. If the query doesn’t contain aggregations, it is equivalent to Append mode.
Reference: https://docs.microsoft.com/en-us/azure/databricks/getting-started/spark/streaming
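A minimal Structured Streaming sketch in append mode follows; the source path, column names, window size, and watermark are assumptions. Note that append mode with an aggregation requires a watermark on an event-time column so Spark knows when a window is final.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical streaming source of sales transaction lines.
sales = spark.readStream.format("delta").load("/mnt/delta/raw_sales")

aggregated = (
    sales
    .withWatermark("eventTime", "10 minutes")
    .groupBy(F.window("eventTime", "5 minutes"), "storeNumber")
    .agg(
        F.sum("lineTotalSales").alias("totalSales"),
        F.sum("lineTotalTax").alias("totalTax"),
    )
)

# Append mode emits each window once, after the watermark passes, so finalized
# aggregates are written without re-emitting (and duplicating) earlier results.
query = (
    aggregated.writeStream
    .outputMode("append")
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/sales")
    .start("/mnt/delta/sales_aggregates")
)
```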
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure SQL database that has columns. The columns contain sensitive Personally Identifiable Information (PII) data.
You need to design a solution that tracks and stores all the queries executed against the PII data. You must be able to review the data in Azure Monitor, and the data must be available for at least 45 days.
Solution: You add classifications to the columns that contain sensitive data. You turn on Auditing and set the audit log destination to use Azure Log Analytics.
Does this meet the goal?
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure SQL database that has columns. The columns contain sensitive Personally Identifiable Information (PII) data.
You need to design a solution that tracks and stores all the queries executed against the PII data. You must be able to review the data in Azure Monitor, and the data must be available for at least 45 days.
Solution: You create a SELECT trigger on the table in SQL Database that writes the query to a new table in the database, and then executes a stored procedure that looks up the column classifications and joins to the query text.
Does this meet the goal?
You work for a finance company.
You need to design a business network analysis solution that meets the following requirements:
✑ Analyzes the flow of transactions between the Azure environments of the company's various partner organizations
✑ Supports Gremlin (graph) queries
What should you include in the solution?
A. Azure Cosmos DB
B. Azure Synapse
C. Azure Analysis Services
D. Azure Data Lake Storage Gen2
Suggested Answer: A
Gremlin is one of the most popular query languages for exploring and analyzing data modeled as property graphs. There are many graph-database vendors out there that support Gremlin as their query language, in particular Azure Cosmos DB which is one of the world’s first self-managed, geo-distributed, multi-master capable graph databases.
Azure Synapse Link for Azure Cosmos DB is a cloud native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data. Synapse Link creates a tight seamless integration between Azure Cosmos DB and Azure Synapse Analytics.
Reference: https://jayanta-mondal.medium.com/analyzing-and-improving-the-performance-azure-cosmos-db-gremlin-queries-7f68bbbac2c https://docs.microsoft.com/en-us/azure/cosmos-db/synapse-link-use-cases
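As a rough sketch of querying such a graph from Python, the example below uses the gremlinpython driver against a Cosmos DB Gremlin API account; the account name, database, graph, key, and the 'partner'/'paid' labels are assumptions.

```python
from gremlin_python.driver import client, serializer

gremlin_client = client.Client(
    "wss://<account>.gremlin.cosmos.azure.com:443/",
    "g",
    username="/dbs/finance/colls/transactions",
    password="<primary-key>",
    # Cosmos DB's Gremlin API expects GraphSON v2 serialization.
    message_serializer=serializer.GraphSONSerializersV2d0(),
)

# Count outgoing transaction edges from each partner organization vertex.
results = gremlin_client.submit(
    "g.V().hasLabel('partner')"
    ".project('name', 'outgoingTransactions')"
    ".by('name').by(outE('paid').count())"
).all().result()

print(results)
```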
Access Full DP-201 Dump Free
Looking for even more practice questions? Click here to access the complete DP-201 Dump Free collection, offering hundreds of questions across all exam objectives.
We regularly update our content to ensure accuracy and relevance—so be sure to check back for new material.
Begin your certification journey today with our DP-201 dump free questions — and get one step closer to exam success!