BDS-C00 Practice Exam Free

Table of Contents

BDS-C00 Practice Exam Free – 50 Questions to Simulate the Real Exam

Are you getting ready for the BDS-C00 certification? Take your preparation to the next level with our BDS-C00 Practice Exam Free – a carefully designed set of 50 realistic exam-style questions to help you evaluate your knowledge and boost your confidence.

Using a BDS-C00 practice exam free is one of the best ways to:

Experience the format and difficulty of the real exam
Identify your strengths and focus on weak areas
Improve your test-taking speed and accuracy

Below, you will find 50 realistic BDS-C00 practice exam free questions covering key exam topics. Each question reflects the structure and challenge of the actual exam.

Question 1

An advertising organization uses an application to process a stream of events that are received from clients in multiple unstructured formats.
The application does the following:
✑ Transforms the events into a single structured format and streams them to Amazon Kinesis for real-time analysis.
✑ Stores the unstructured raw events from the log files on local hard drivers that are rotated and uploaded to Amazon S3.
The organization wants to extract campaign performance reporting using an existing Amazon redshift cluster.
Which solution will provide the performance data with the LEAST number of operations?

A. Install the Amazon Kinesis Data Firehose agent on the application servers and use it to stream the log files directly to Amazon Redshift.

B. Create an external table in Amazon Redshift and point it to the S3 bucket where the unstructured raw events are stored.

C. Write an AWS Lambda function that triggers every hour to load the new log files already in S3 to Amazon redshift.

D. Connect Amazon Kinesis Data Firehose to the existing Amazon Kinesis stream and use it to stream the event directly to Amazon Redshift.

Question 2

An organization has 10,000 devices that generate 100 GB of telemetry data per day, with each record size around 10 KB. Each record has 100 fields, and one field consists of unstructured log data with a "String" data type in the English language. Some fields are required for the real-time dashboard, but all fields must be available for long-term generation.
The organization also has 10 PB of previously cleaned and structured data, partitioned by Date, in a SAN that must be migrated to AWS within one month.
Currently, the organization does not have any real-time capabilities in their solution. Because of storage limitations in the on-premises data warehouse, selective data is loaded while generating the long-term trend with ANSI SQL queries through JDBC for visualization. In addition to the one-time data loading, the organization needs a cost-effective and real-time solution.
How can these requirements be met? (Choose two.)

A. use AWS IoT to send data from devices to an Amazon SQS queue, create a set of workers in an Auto Scaling group and read records in batch from the queue to process and save the data. Fan out to an Amazon SNS queue attached with an AWS Lambda function to filter the request dataset and save it to Amazon Elasticsearch Service for real-time analytics.

B. Create a Direct Connect connection between AWS and the on-premises data center and copy the data to Amazon S3 using S3 Acceleration. Use Amazon Athena to query the data.

C. Use AWS IoT to send the data from devices to Amazon Kinesis Data Streams with the IoT rules engine. Use one Kinesis Data Firehose stream attached to a Kinesis stream to batch and stream the data partitioned by date. Use another Kinesis Firehose stream attached to the same Kinesis stream to filter out the required fields to ingest into Elasticsearch for real-time analytics.

D. Use AWS IoT to send the data from devices to Amazon Kinesis Data Streams with the IoT rules engine. Use one Kinesis Data Firehose stream attached to a Kinesis stream to stream the data into an Amazon S3 bucket partitioned by date. Attach an AWS Lambda function with the same Kinesis stream to filter out the required fields for ingestion into Amazon DynamoDB for real-time analytics.

E. use multiple AWS Snowball Edge devices to transfer data to Amazon S3, and use Amazon Athena to query the data.

Question 3

An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region.
Which three steps should the data engineer take to accomplish this task? (Choose three.)

A. Create a new KMS key in the destination region.

B. Copy the existing KMS key to the destination region.

C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region.

D. In the source region, enable cross-region replication and specify the name of the copy grant created.

E. In the destination region, enable cross-region replication and specify the name of the copy grant created.

F. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key created in the destination region. ADF

Question 4

How should an Administrator BEST architect a large multi-layer Long Short-Term Memory (LSTM) recurrent neural network (RNN) running with MXNET on
Amazon EC2? (Choose two.)

A. Use data parallelism to partition the workload over multiple devices and balance the workload within the GPUs.

B. Use compute-optimized EC2 instances with an attached elastic GPU.

C. Use general purpose GPU computing instances such as G3 and P3.

D. Use processing parallelism to partition the workload over multiple storage devices and balance the workload within the GPUs.

Question 5

An organization is soliciting public feedback through a web portal that has been deployed to track the number of requests and other important data. As part of reporting and visualization, AmazonQuickSight connects to an Amazon RDS database to virtualize data. Management wants to understand some important metrics about feedback and how the feedback has changed over the last four weeks in a visual representation.
What would be the MOST effective way to represent multiple iterations of an analysis in Amazon QuickSight that would show how the data has changed over the last four weeks?

A. Use the analysis option for data captured in each week and view the data by a date range.

B. Use a pivot table as a visual option to display measured values and weekly aggregate data as a row dimension.

C. Use a dashboard option to create an analysis of the data for each week and apply filters to visualize the data change.

D. Use a story option to preserve multiple iterations of an analysis and play the iterations sequentially.

Question 6

Company A operates in Country X. Company A maintains a large dataset of historical purchase orders that contains personal data of their customers in the form of full names and telephone numbers. The dataset consists of 5 text files, 1TB each. Currently the dataset resides on-premises due to legal requirements of storing personal data in-country. The research and development department needs to run a clustering algorithm on the dataset and wants to use Elastic Map Reduce service in the closest AWS region. Due to geographic distance, the minimum latency between the on-premises system and the closet AWS region is 200 ms.
Which option allows Company A to do clustering in the AWS Cloud and meet the legal requirement of maintaining personal data in-country?

A. Anonymize the personal data portions of the dataset and transfer the data files into Amazon S3 in the AWS region. Have the EMR cluster read the dataset using EMRFS.

B. Establish a Direct Connect link between the on-premises system and the AWS region to reduce latency. Have the EMR cluster read the data directly from the on-premises storage system over Direct Connect.

C. Encrypt the data files according to encryption standards of Country X and store them on AWS region in Amazon S3. Have the EMR cluster read the dataset using EMRFS.

D. Use AWS Import/Export Snowball device to securely transfer the data to the AWS region and copy the files onto an EBS volume. Have the EMR cluster read the dataset using EMRFS.

Question 7

A company uses Amazon Redshift for its enterprise data warehouse. A new on-premises PostgreSQL OLTP
DB must be integrated into the data warehouse. Each table in the PostgreSQL DB has an indexed timestamp column. The data warehouse has a staging layer to load source data into the data warehouse environment for further processing.
The data lag between the source PostgreSQL DB and the Amazon Redshift staging layer should NOT exceed four hours.
What is the most efficient technique to meet these requirements?

A. Create a DBLINK on the source DB to connect to Amazon Redshift. Use a PostgreSQL trigger on the source table to capture the new insert/update/delete event and execute the event on the Amazon Redshift staging table.

B. Use a PostgreSQL trigger on the source table to capture the new insert/update/delete event and write it to Amazon Kinesis Streams. Use a KCL application to execute the event on the Amazon Redshift staging table.

C. Extract the incremental changes periodically using a SQL query. Upload the changes to multiple Amazon Simple Storage Service (S3) objects, and run the COPY command to load to the Amazon Redshift staging layer.

D. Extract the incremental changes periodically using a SQL query. Upload the changes to a single Amazon Simple Storage Service (S3) object, and run the COPY command to load to the Amazon Redshift staging layer.

Question 8

An Operations team continuously monitors the number of visitors to a website to identify any potential system problems. The number of website visitors varies throughout the day. The site is more popular in the middle of the day and less popular at night.
Which type of dashboard display would be the MOST useful to allow staff to quickly and correctly identify system problems?

A. A vertical stacked bar chart showing today’s website visitors and the historical average number of website visitors.

B. An overlay line chart showing today’s website visitors at one-minute intervals and also the historical average number of website visitors.

C. A single KPI metric showing the statistical variance between the current number of website visitors and the historical number of website visitors for the current time of day.

D. A scatter plot showing today’s website visitors on the X-axis and the historical average number of website visitors on the Y-axis.

Question 9

A clinical trial will rely on medical sensors to remotely assess patient health. Each physician who participates in the trial requires visual reports each morning. The reports are built from aggregations of all the sensor data taken each minute.
What is the most cost-effective solution for creating this visualization each day?

A. Use Kinesis Aggregators Library to generate reports for reviewing the patient sensor data and generate a QuickSight visualization on the new data each morning for the physician to review.

B. Use a transient EMR cluster that shuts down after use to aggregate the patient sensor data each night and generate a QuickSight visualization on the new data each morning for the physician to review.

C. Use Spark streaming on EMR to aggregate the patient sensor data in every 15 minutes and generate a QuickSight visualization on the new data each morning for the physician to review.

D. Use an EMR cluster to aggregate the patient sensor data each night and provide Zeppelin notebooks that look at the new data residing on the cluster each morning for the physician to review.

Question 10

A company that manufactures and sells smart air conditioning units also offers add-on services so that customers can see real-time dashboards in a mobile application or a web browser. Each unit sends its sensor information in JSON format every two seconds for processing and analysis. The company also needs to consume this data to predict possible equipment problems before they occur. A few thousand pre-purchased units will be delivered in the next couple of months. The company expects high market growth in the next year and needs to handle a massive amount of data and scale without interruption.
Which ingestion solution should the company use?

A. Write sensor data records to Amazon Kinesis Streams. Process the data using KCL applications for the end-consumer dashboard and anomaly detection workflows.

B. Batch sensor data to Amazon Simple Storage Service (S3) every 15 minutes. Flow the data downstream to the end-consumer dashboard and to the anomaly detection application.

C. Write sensor data records to Amazon Kinesis Firehose with Amazon Simple Storage Service (S3) as the destination. Consume the data with a KCL application for the end-consumer dashboard and anomaly detection.

D. Write sensor data records to Amazon Relational Database Service (RDS). Build both the end-consumer dashboard and anomaly detection application on top of Amazon RDS.

Question 11

An organization is designing an application architecture. The application will have over 100 TB of data and will support transactions that arrive at rates from hundreds per second to tens of thousands per second, depending on the day of the week and time of the day. All transaction data, must be durably and reliably stored. Certain read operations must be performed with strong consistency.
Which solution meets these requirements?

A. Use Amazon DynamoDB as the data store and use strongly consistent reads when necessary.

B. Use an Amazon Relational Database Service (RDS) instance sized to meet the maximum anticipated transaction rate and with the High Availability option enabled.

C. Deploy a NoSQL data store on top of an Amazon Elastic MapReduce (EMR) cluster, and select the HDFS High Durability option.

D. Use Amazon Redshift with synchronous replication to Amazon Simple Storage Service (S3) and row-level locking for strong consistency.

Question 12

An administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting and analytics before being archived.
How should the administrator recommend storing the log data?

A. Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on the device folders.

B. Create an Amazon DynamoDB table partitioned on the device and sorted on date, write log data to table. Execute the EMR job on the Amazon DynamoDB table.

C. Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the daily folder.

D. Create an Amazon DynamoDB table partitioned on EventID, write log data to table. Execute the EMR job on the table.

Question 13

A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 M files per hour. Each source file is automatically deleted from the local server after an hour.
What is the most cost-efficient option to meet these requirements?

A. Write file contents to an Amazon DynamoDB table.

B. Copy files to Amazon S3 Standard Storage.

C. Write file contents to Amazon ElastiCache.

D. Copy files to Amazon S3 infrequent Access Storage.

Question 14

An organization is using Amazon Kinesis Data Streams to collect data generated from thousands of temperature devices and is using AWS Lambda to process the data. Devices generate 10 to 12 million records every day, but Lambda is processing only around 450 thousand records. Amazon CloudWatch indicates that throttling on Lambda is not occurring.
What should be done to ensure that all data is processed? (Choose two.)

A. Increase the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

B. Decrease the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

C. Create multiple Lambda functions that will consume the same Amazon Kinesis stream.

D. Increase the number of vCores allocated for the Lambda function.

E. Increase the number of shards on the Amazon Kinesis stream.

Question 15

An organization needs to store sensitive information on Amazon S3 and process it through Amazon EMR. Data must be encrypted on Amazon S3 and Amazon
EMR at rest and in transit. Using Thrift Server, the Data Analysis team users HIVE to interact with this data. The organization would like to grant access to only specific databases and tables, giving permission only to the SELECT statement.
Which solution will protect the data and limit user access to the SELECT statement on a specific portion of data?

A. Configure Transparent Data Encryption on Amazon EMR. Create an Amazon EC2 instance and install Apache Ranger. Configure the authorization on the cluster to use Apache Ranger.

B. Configure data encryption at rest for EMR File System (EMRFS) on Amazon S3. Configure data encryption in transit for traffic between Amazon S3 and EMRFS. Configure storage and SQL base authorization on HiveServer2.

C. Use AWS KMS for encryption of data. Configure and attach multiple roles with different permissions based on the different user needs.

D. Configure Security Group on Amazon EMR. Create an Amazon VPC endpoint for Amazon S3. Configure HiveServer2 to use Kerberos authentication on the cluster.

Question 16

A city has been collecting data on its public bicycle share program for the past three years. The 5PB dataset currently resides on Amazon S3. The data contains the following datapoints:
✑ Bicycle origination points
✑ Bicycle destination points
✑ Mileage between the points
✑ Number of bicycle slots available at the station (which is variable based on the station location)
✑ Number of slots available and taken at a given time
The program has received additional funds to increase the number of bicycle stations available. All data is regularly archived to Amazon Glacier.
The new bicycle stations must be located to provide the most riders access to bicycles.
How should this task be performed?

A. Move the data from Amazon S3 into Amazon EBS-backed volumes and use an EC-2 based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization.

B. Use the Amazon Redshift COPY command to move the data from Amazon S3 into Redshift and perform a SQL query that outputs the most popular bicycle stations.

C. Persist the data on Amazon S3 and use a transient EMR cluster with spot instances to run a Spark streaming job that will move the data into Amazon Kinesis.

D. Keep the data on Amazon S3 and use an Amazon EMR-based Hadoop cluster with spot instances to run a Spark job that performs a stochastic gradient descent optimization over EMRFS.

Question 17

An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region.
Which three steps should the data engineer take to accomplish this task? (Choose three.)

A. Create a new KMS key in the destination region.

B. Copy the existing KMS key to the destination region.

C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region.

D. In the source region, enable cross-region replication and specify the name of the copy grant created.

E. In the destination region, enable cross-region replication and specify the name of the copy grant created.

Question 18

An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some of the data sets to be able to perform interactive joins and then display results quickly.
Which technology is most appropriate to enable this capability?

A. Presto

B. MicroStrategy

C. Pig

D. R Studio

Question 19

The department of transportation for a major metropolitan area has placed sensors on roads at key locations around the city. The goal is to analyze the flow of traffic and notifications from emergency services to identify potential issues and to help planners correct trouble spots.
A data engineer needs a scalable and fault-tolerant solution that allows planners to respond to issues within
30 seconds of their occurrence.
Which solution should the data engineer choose?

A. Collect the sensor data with Amazon Kinesis Firehose and store it in Amazon Redshift for analysis. Collect emergency services events with Amazon SQS and store in Amazon DynampDB for analysis.

B. Collect the sensor data with Amazon SQS and store in Amazon DynamoDB for analysis. Collect emergency services events with Amazon Kinesis Firehose and store in Amazon Redshift for analysis.

C. Collect both sensor data and emergency services events with Amazon Kinesis Streams and use DynamoDB for analysis.

D. Collect both sensor data and emergency services events with Amazon Kinesis Firehose and use Amazon Redshift for analysis.

Question 20

A large oil and gas company needs to provide near real-time alerts when peak thresholds are exceeded in its pipeline system. The company has developed a system to capture pipeline metrics such as flow rate, pressure, and temperature using millions of sensors. The sensors deliver to AWS IoT.
What is a cost-effective way to provide near real-time alerts on the pipeline metrics?

A. Create an AWS IoT rule to generate an Amazon SNS notification.

B. Store the data points in an Amazon DynamoDB table and poll if for peak metrics data from an Amazon EC2 application.

C. Create an Amazon Machine Learning model and invoke it with AWS Lambda.

D. Use Amazon Kinesis Streams and a KCL-based application deployed on AWS Elastic Beanstalk.

Question 21

A medical record filing system for a government medical fund is using an Amazon S3 bucket to archive documents related to patients. Every patient visit to a physician creates a new file, which can add up millions of files each month. Collection of these files from each physician is handled via a batch process that runs ever night using AWS Data Pipeline. This is sensitive data, so the data and any associated metadata must be encrypted at rest.
Auditors review some files on a quarterly basis to see whether the records are maintained according to regulations. Auditors must be able to locate any physical file in the S3 bucket for a given date, patient, or physician. Auditors spend a significant amount of time location such files.
What is the most cost- and time-efficient collection methodology in this situation?

A. Use Amazon Kinesis to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.

B. Use Amazon API Gateway to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.

C. Use Amazon S3 event notification to populate an Amazon DynamoDB table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.

D. Use Amazon S3 event notification to populate an Amazon Redshift table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.

Question 22

An organization's data warehouse contains sales data for reporting purposes. data governance policies prohibit staff from accessing the customers' credit card numbers.
How can these policies be adhered to and still allow a Data Scientist to group transactions that use the same credit card number?

A. Store a cryptographic hash of the credit card number.

B. Encrypt the credit card number with a symmetric encryption key, and give the key only to the authorized Data Scientist.

C. Mask the credit card numbers to only show the last four digits of the credit card number.

D. Encrypt the credit card number with an asymmetric encryption key and give the decryption key only to the authorized Data Scientist.

Question 23

A customer is collecting clickstream data using Amazon Kinesis and is grouping the events by IP address into
5-minute chunks stored in Amazon S3.
Many analysts in the company use Hive on Amazon EMR to analyze this data. Their queries always reference a single IP address. Data must be optimized for querying based on IP address using Hive running on Amazon
EMR.
What is the most efficient method to query the data with Hive?

A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.

B. Store the Amazon S3 objects with the following naming scheme: bucket_name/source=ip_address/ year=yy/month=mm/day=dd/hour=hh/filename.

C. Store the data in an HBase table with the IP address as the row key.

D. Store the events for an IP address as a single file in Amazon S3 and add metadata with keys: Hive_Partitioned_IPAddress.

Question 24

A solutions architect for a logistics organization ships packages from thousands of suppliers to end customers.
The architect is building a platform where suppliers can view the status of one or more of their shipments.
Each supplier can have multiple roles that will only allow access to specific fields in the resulting information.
Which strategy allows the appropriate level of access control and requires the LEAST amount of management work?

A. Send the tracking data to Amazon Kinesis Streams. Use AWS Lambda to store the data in an Amazon DynamoDB Table. Generate temporary AWS credentials for the suppliers users with AWS STS, specifying fine-grained security policies to limit access only to their applicable data.

B. Send the tracking data to Amazon Kinesis Firehose. Use Amazon S3 notifications and AWS Lambda to prepare files in Amazon S3 with appropriate data for each suppliers roles. Generate temporary AWS credentials for the suppliers users with AWS STS. Limit access to the appropriate files through security policies.

C. Send the tracking data to Amazon Kinesis Streams. Use Amazon EMR with Spark Streaming to store the data in HBase. Create one table per supplier. Use HBase Kerberos integration with the suppliers users. Use HBase ACL-based security to limit access for the roles to their specific table and columns.

D. Send the tracking data to Amazon Kinesis Firehose. Store the data in an Amazon Redshift cluster. Create views for the suppliers users and roles. Allow suppliers access to the Amazon Redshift cluster using a user limited to the applicable view. B

Question 25

A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage.
Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.

B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.

C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.

D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Question 26

An online gaming company uses DynamoDB to store user activity logs and is experiencing throttled writes on the companys DynamoDB table. The company is NOT consuming close to the provisioned capacity. The table contains a large number of items and is partitioned on user and sorted by date. The table is 200GB and is currently provisioned at 10K WCU and 20K RCU.
Which two additional pieces of information are required to determine the cause of the throttling? (Choose two.)

A. The structure of any GSIs that have been defined on the table

B. CloudWatch data showing consumed and provisioned write capacity when writes are being throttled

C. Application-level metrics showing the average item size and peak update rates for each attribute

D. The structure of any LSIs that have been defined on the table

E. The maximum historical WCU and RCU for the table

Question 27

An online retailer is using Amazon DynamoDB to store data related to customer transactions. The items in the table contains several string attributes describing the transaction as well as a JSON attribute containing the shopping cart and other details corresponding to the transaction. Average item size is  250KB, most of which is associated with the JSON attribute. The average customer generates  3GB of data per month.
Customers access the table to display their transaction history and review transaction details as needed.
Ninety percent of the queries against the table are executed when building the transaction history view, with the other 10% retrieving transaction details. The table is partitioned on CustomerID and sorted on transaction date.
The client has very high read capacity provisioned for the table and experiences very even utilization, but complains about the cost of Amazon DynamoDB compared to other NoSQL solutions.
Which strategy will reduce the cost associated with the clients read queries while not degrading quality?

A. Modify all database calls to use eventually consistent reads and advise customers that transaction history may be one second out-of-date.

B. Change the primary table to partition on TransactionID, create a GSI partitioned on customer and sorted on date, project small attributes into GSI, and then query GSI for summary data and the primary table for JSON details.

C. Vertically partition the table, store base attributes on the primary table, and create a foreign key reference to a secondary table containing the JSON data. Query the primary table for summary data and the secondary table for JSON details.

D. Create an LSI sorted on date, project the JSON attribute into the index, and then query the primary table for summary data and the LSI for JSON details.

Question 28

A real-time bidding company is rebuilding their monolithic application and is focusing on serving real-time data. A large number of reads and writes are generated from thousands of concurrent users who follow items and bid on the company's sale offers.
The company is experiencing high latency during special event spikes, with millions of concurrent users.
The company needs to analyze and aggregate a part of the data in near real time to feed an internal dashboard.
What is the BEST approach for serving and analyzing data, considering the constraint of the row latency on the highly demanded data?

A. Use Amazon Aurora with Multi Availability Zone and read replicas. Use Amazon ElastiCache in front of the read replicas to serve read-only content quickly. Use the same database as datasource for the dashboard.

B. Use Amazon DynamoDB to store real-time data with Amazon DynamoDB. Accelerator to serve content quickly. use Amazon DynamoDB Streams to replay all changes to the table, process and stream to Amazon Elasti search Service with AWS Lambda.

C. Use Amazon RDS with Multi Availability Zone. Provisioned IOPS EBS volume for storage. Enable up to five read replicas to serve read-only content quickly. Use Amazon EMR with Sqoop to import Amazon RDS data into HDFS for analysis.

D. Use Amazon Redshift with a DC2 node type and a multi-mode cluster. Create an Amazon EC2 instance with pgpoo1 installed. Create an Amazon ElastiCache cluster and route read requests through pgpoo1, and use Amazon Redshift for analysis. D

Question 29

A media advertising company handles a large number of real-time messages sourced from over 200 websites.
The companys data engineer needs to collect and process records in real time for analysis using Spark
Streaming on Amazon Elastic MapReduce (EMR). The data engineer needs to fulfill a corporate mandate to keep ALL raw messages as they are received as a top priority.
Which Amazon Kinesis configuration meets these requirements?

A. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Pull messages off Firehose with Spark Streaming in parallel to persistence to Amazon S3.

B. Publish messages to Amazon Kinesis Streams. Pull messages off Streams with Spark Streaming in parallel to AWS Lambda pushing messages from Streams to Firehose backed by Amazon Simple Storage Service (S3).

C. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Use AWS Lambda to pull messages from Firehose to Streams for processing with Spark Streaming.

D. Publish messages to Amazon Kinesis Streams, pull messages off with Spark Streaming, and write row data to Amazon Simple Storage Service (S3) before and after processing.

Question 30

An administrator needs to design a distribution strategy for a star schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema.
In which three circumstances would choosing Key-based distribution be most appropriate? (Select three.)

A. When the administrator needs to optimize a large, slowly changing dimension table.

B. When the administrator needs to reduce cross-node traffic.

C. When the administrator needs to optimize the fact table for parity with the number of slices.

D. When the administrator needs to balance data distribution and collocation data.

E. When the administrator needs to take advantage of data locality on a local node for joins and aggregates.

Question 31

An organization would like to run analytics on their Elastic Load Balancing logs stored in Amazon S3 and join this data with other tables in Amazon S3. The users are currently using a BI tool connecting with JDBC and would like to keep using this BI tool.
Which solution would result in the LEAST operational overhead?

A. Trigger a Lambda function when a new log file is added to the bucket to transform and load it into Amazon Redshift. Run the VACUUM command on the Amazon Redshift cluster every night.

B. Launch a long-running Amazon EMR cluster that continuously downloads and transforms new files from Amazon S3 into its HDFS storage. Use Presto to expose the data through JDBC.

C. Trigger a Lambda function when a new log file is added to the bucket to transform and move it to another bucket with an optimized data structure. Use Amazon Athena to query the optimized bucket.

D. Launch a transient Amazon EMR cluster every night that transforms new log files and loads them into Amazon Redshift.

Question 32

An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema.
In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)

A. When the tables are highly denormalized and do NOT participate in frequent joins.

B. When data must be grouped based on a specific key on a defined slice.

C. When data transfer between nodes must be eliminated.

D. When a new table has been loaded and it is unclear how it will be joined to dimension.

Question 33

An organization is currently using an Amazon EMR long-running cluster with the latest Amazon EMR release for analytic jobs and is storing data as external tables on Amazon S3.
The company needs to launch multiple transient EMR clusters to access the same tables concurrently, but the metadata about the Amazon S3 external tables are defined and stored on the long-running cluster.
Which solution will expose the Hive metastore with the LEAST operational effort?

A. Export Hive metastore information to Amazon DynamoDB hive-site classification to point to the Amazon DynamoDB table.

B. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon EMR hive-site classification to point to the Amazon RDS database.

C. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive metastore information to derby.

D. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.

Question 34

A web-hosting company is building a web analytics tool to capture clickstream data from all of the websites hosted within its platform and to provide near-real-time business intelligence. This entire system is built on
AWS services. The web-hosting company is interested in using Amazon Kinesis to collect this data and perform sliding window analytics.
What is the most reliable and fault-tolerant technique to get each website to send data to Amazon Kinesis with every click?

A. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the sessionID as a partition key and set up a loop to retry until a success response is received.

B. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis Producer Library .addRecords method.

C. Each web server buffers the requests until the count reaches 500 and sends them to Amazon Kinesis using the Amazon Kinesis PutRecord API.

D. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the exponential back-off algorithm for retries until a successful response is received.

Question 35

There are thousands of text files on Amazon S3. The total size of the files is 1 PB. The files contain retail order information for the past 2 years. A data engineer needs to run multiple interactive queries to manipulate the data. The Data Engineer has AWS access to spin up an Amazon EMR cluster. The data engineer needs to use an application on the cluster to process this data and return the results in interactive time frame.
Which application on the cluster should the data engineer use?

A. Oozie

B. Apache Pig with Tachyon

C. Apache Hive

D. Presto

Question 36

A companys social media manager requests more staff on the weekends to handle an increase in customer contacts from a particular region. The company needs a report to visualize the trends on weekends over the past 6 months using QuickSight.
How should the data be represented?

A. A line graph plotting customer contacts vs. time, with a line for each region

B. A pie chart per region plotting customer contacts per day of week

C. A map of regions with a heatmap overlay to show the volume of customer contacts

D. A bar graph plotting region vs. volume of social media contacts

Question 37

An administrator receives about 100 files per hour into Amazon S3 and will be loading the files into Amazon
Redshift. Customers who analyze the data within Redshift gain significant value when they receive data as quickly as possible. The customers have agreed to a maximum loading interval of 5 minutes.
Which loading approach should the administrator use to meet this objective?

A. Load each file as it arrives because getting data into the cluster as quickly as possibly is the priority.

B. Load the cluster as soon as the administrator has the same number of files as nodes in the cluster.

C. Load the cluster when the administrator has an event multiple of files relative to Cluster Slice Count, or 5 minutes, whichever comes first.

D. Load the cluster when the number of files is less than the Cluster Slice Count.

Question 38

An Amazon Kinesis stream needs to be encrypted.
Which approach should be used to accomplish this task?

A. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the producer.

B. Use a partition key to segment the data by MD5 hash function, which makes it undecipherable while in transit.

C. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the consumer.

D. Use a shard to segment the data, which has built-in functionality to make it indecipherable while in transit.

Question 39

A data engineer chooses Amazon DynamoDB as a data store for a regulated application. This application must be submitted to regulators for review. The data engineer needs to provide a control framework that lists the security controls from the process to follow to add new users down to the physical controls of the data center, including items like security guards and cameras.
How should this control mapping be achieved using AWS?

A. Request AWS third-party audit reports and/or the AWS quality addendum and map the AWS responsibilities to the controls that must be provided.

B. Request data center Temporary Auditor access to an AWS data center to verify the control mapping.

C. Request relevant SLAs and security guidelines for Amazon DynamoDB and define these guidelines within the applications architecture to map to the control framework.

D. Request Amazon DynamoDB system architecture designs to determine how to map the AWS responsibilities to the control that must be provided.

Question 40

Managers in a company need access to the human resources database that runs on Amazon Redshift, to run reports about their employees. Managers must only see information about their direct reports.
Which technique should be used to address this requirement with Amazon Redshift?

A. Define an IAM group for each manager with each employee as an IAM user in that group, and use that to limit the access.

B. Use Amazon Redshift snapshot to create one cluster per manager. Allow the manager to access only their designated clusters.

C. Define a key for each manager in AWS KMS and encrypt the data for their employees with their private keys.

D. Define a view that uses the employee’s manager name to filter the records based on current user names.

Question 41

A social media customer has data from different data sources including RDS running MySQL, Redshift, and
Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results.
What is the most cost-effective solution to meet these requirements?

A. Load all data from a different database/warehouse to S3. Use Redshift COPY command to copy data to Redshift for analysis.

B. Install Presto on the EMR cluster where Hive sits. Configure MySQL and PostgreSQL connector to select from different data sources in a single query.

C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze.

D. Write a program running on a separate EC2 instance to run queries to three different systems. Aggregate the results after getting the responses from all three systems.

Question 42

A gaming organization is developing a new game and would like to offer real-time competition to their users. The data architecture has the following characteristics:
✑ The game application is writing events directly to Amazon DynamoDB from the user's mobile device.
✑ Users from the website can access their statistics directly from DynamoDB.
✑ The game servers are accessing DynamoDB to update the user's information.
✑ The data science team extracts data from DynamoDB for various applications.
The engineering team has already agreed to the IAM roles and policies to use for the data science team and the application.
Which actions will provide the MOST security, while maintaining the necessary access to the website and game application? (Choose two.)

A. Use Amazon Cognito user pool to authenticate to both the website and the game application.

B. Use IAM identity federation to authenticate to both the website and the game application.

C. Create an IAM policy with PUT permission for both the website and the game application.

D. Create an IAM policy with fine-grained permission for both the website and the game application.

E. Create an IAM policy with PUT permission for the game application and an IAM policy with GET permission for the website.

Question 43

A company is centralizing a large number of unencrypted small files from multiple Amazon S3 buckets. The company needs to verify that the files contain the same data after centralization.
Which method meets the requirements?

A. Compare the S3 Etags from the source and destination objects.

B. Call the S3 CompareObjects API for the source and destination objects.

C. Place a HEAD request against the source and destination objects comparing SIG v4.

D. Compare the size of the source and destination objects.

Question 44

An organization is setting up a data catalog and metadata management environment for their numerous data stores currently running on AWS. The data catalog will be used to determine the structure and other attributes of data in the data stores. The data stores are composed of Amazon RDS databases, Amazon
Redshift, and CSV files residing on Amazon S3. The catalog should be populated on a scheduled basis, and minimal administration is required to manage the catalog.
How can this be accomplished?

A. Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the DynamoDB table.

B. Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.

C. Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data sources to populate the catalog.

D. Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that connects to data sources to populate the metastore.

Question 45

An organization needs to design and deploy a large-scale data storage solution that will be highly durable and highly flexible with respect to the type and structure of data being stored. The data to be stored will be sent or generated from a variety of sources and must be persistently available for access and processing by multiple applications.
What is the most cost-effective technique to meet these requirements?

A. Use Amazon Simple Storage Service (S3) as the actual data storage system, coupled with appropriate tools for ingestion/acquisition of data and for subsequent processing and querying.

B. Deploy a long-running Amazon Elastic MapReduce (EMR) cluster with Amazon Elastic Block Store (EBS) volumes for persistent HDFS storage and appropriate Hadoop ecosystem tools for processing and querying.

C. Use Amazon Redshift with data replication to Amazon Simple Storage Service (S3) for comprehensive durable data storage, processing, and querying.

D. Launch an Amazon Relational Database Service (RDS), and use the enterprise grade and capacity of the Amazon Aurora engine for storage, processing, and querying.

Question 46

A customer has an Amazon S3 bucket. Objects are uploaded simultaneously by a cluster of servers from multiple streams of data. The customer maintains a catalog of objects uploaded in Amazon S3 using an
Amazon DynamoDB table. This catalog has the following fileds: StreamName, TimeStamp, and ServerName, from which ObjectName can be obtained.
The customer needs to define the catalog to support querying for a given stream or server within a defined time range.
Which DynamoDB table scheme is most efficient to support these queries?

A. Define a Primary Key with ServerName as Partition Key and TimeStamp as Sort Key. Do NOT define a Local Secondary Index or Global Secondary Index.

B. Define a Primary Key with StreamName as Partition Key and TimeStamp followed by ServerName as Sort Key. Define a Global Secondary Index with ServerName as partition key and TimeStamp followed by StreamName.

C. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with StreamName as Partition Key. Define a Global Secondary Index with TimeStamp as Partition Key.

D. Define a Primary Key with ServerName as Partition Key. Define a Local Secondary Index with TimeStamp as Partition Key. Define a Global Secondary Index with StreamName as Partition Key and TimeStamp as Sort Key.

Question 47

A company is building a new application in AWS. The architect needs to design a system to collect application log events. The design should be a repeatable pattern that minimizes data loss if an application instance fails, and keeps a durable copy of a log data for at least 30 days.
What is the simplest architecture that will allow the architect to analyze the logs?

A. Write them directly to a Kinesis Firehose. Configure Kinesis Firehose to load the events into an Amazon Redshift cluster for analysis.

B. Write them to a file on Amazon Simple Storage Service (S3). Write an AWS Lambda function that runs in response to the S3 event to load the events into Amazon Elasticsearch Service for analysis.

C. Write them to the local disk and configure the Amazon CloudWatch Logs agent to load the data into CloudWatch Logs and subsequently into Amazon Elasticsearch Service.

D. Write them to CloudWatch Logs and use an AWS Lambda function to load them into HDFS on an Amazon Elastic MapReduce (EMR) cluster for analysis.

Question 48

A large grocery distributor receives daily depletion reports from the field in the form of gzip archives od CSV files uploaded to Amazon S3. The files range from 500MB to 5GB. These files are processed daily by an EMR job.
Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the
EMR job.
Which recommendation should an administrator provide?

A. Reduce the HDFS block size to increase the number of task processors.

B. Use bzip2 or Snappy rather than gzip for the archives.

C. Decompress the gzip archives and store the data as CSV files.

D. Use Avro rather than gzip for the archives.

Question 49

A gas company needs to monitor gas pressure in their pipelines. Pressure data is streamed from sensors placed throughout the pipelines to monitor the data in real time. When an anomaly is detected, the system must send a notification to open valve. An Amazon Kinesis stream collects the data from the sensors and an anomaly Kinesis stream triggers an AWS Lambda function to open the appropriate valve.
Which solution is the MOST cost-effective for responding to anomalies in real time?

A. Attach a Kinesis Firehose to the stream and persist the sensor data in an Amazon S3 bucket. Schedule an AWS Lambda function to run a query in Amazon Athena against the data in Amazon S3 to identify anomalies. When a change is detected, the Lambda function sends a message to the anomaly stream to open the valve.

B. Launch an Amazon EMR cluster that uses Spark Streaming to connect to the Kinesis stream and Spark machine learning to detect anomalies. When a change is detected, the Spark application sends a message to the anomaly stream to open the valve.

C. Launch a fleet of Amazon EC2 instances with a Kinesis Client Library application that consumes the stream and aggregates sensor data over time to identify anomalies. When an anomaly is detected, the application sends a message to the anomaly stream to open the valve.

D. Create a Kinesis Analytics application by using the RANDOM_CUT_FOREST function to detect an anomaly. When the anomaly score that is returned from the function is outside of an acceptable range, a message is sent to the anomaly stream to open the valve.

Question 50

A customer has a machine learning workflow that consists of multiple quick cycles of reads-writes-reads on
Amazon S3. The customer needs to run the workflow on EMR but is concerned that the reads in subsequent cycles will miss new data critical to the machine learning from the prior cycles.
How should the customer accomplish this?

A. Turn on EMRFS consistent view when configuring the EMR cluster.

B. Use AWS Data Pipeline to orchestrate the data processing cycles.

C. Set hadoop.data.consistency = true in the core-site.xml file.

D. Set hadoop.s3.consistency = true in the core-site.xml file.

Free Access Full BDS-C00 Practice Exam Free

Looking for additional practice? Click here to access a full set of BDS-C00 practice exam free questions and continue building your skills across all exam domains.

Our question sets are updated regularly to ensure they stay aligned with the latest exam objectives—so be sure to visit often!

Good luck with your BDS-C00 certification journey!

BDS-C00 Practice Exam Free

AZ-900 Practice Exam Free

CAS-003 Practice Exam Free

CAS-003 Practice Exam Free

CAS-004 Practice Exam Free

CCAK Practice Exam Free

Leave a Reply Cancel reply

Recommended

Network+ Practice Test

Comptia Security+ Practice Test

A+ Certification Practice Test

Aws Cloud Practitioner Exam Questions

Aws Cloud Practitioner Practice Exam

Comptia A+ Practice Test

Welcome Back!

Create New Account!

Retrieve your password