BDS-C00 Practice Questions Free

Table of Contents

BDS-C00 Practice Questions Free – 50 Exam-Style Questions to Sharpen Your Skills

Are you preparing for the BDS-C00 certification exam? Kickstart your success with our BDS-C00 Practice Questions Free – a carefully selected set of 50 real exam-style questions to help you test your knowledge and identify areas for improvement.

Practicing with BDS-C00 practice questions free gives you a powerful edge by allowing you to:

Understand the exam structure and question formats
Discover your strong and weak areas
Build the confidence you need for test day success

Below, you will find 50 free BDS-C00 practice questions designed to match the real exam in both difficulty and topic coverage. They’re ideal for self-assessment or final review. You can click on each Question to explore the details.

Question 1

An organization is developing a mobile social application and needs to collect logs from all devices on which it is installed. The organization is evaluating the
Amazon Kinesis Data Streams to push logs and Amazon EMR to process data. They want to store data on HDFS using the default replication factor to replicate data among the cluster, but they are concerned about the durability of the data. Currently, they are producing 300 GB of raw data daily, with additional spikes during special events. They will need to scale out the Amazon EMR cluster to match the increase in streamed data.
Which solution prevents data loss and matches compute demand?

A. Use multiple Amazon EBS volumes on Amazon EMR to store processed data and scale out the Amazon EMR cluster as needed.

B. Use the EMR File System and Amazon S3 to store processed data and scale out the Amazon EMR cluster as needed.

C. Use Amazon DynamoDB to store processed data and scale out the Amazon EMR cluster as needed.

D. use Amazon Kinesis Data Firehose and, instead of using Amazon EMR, stream logs directly into Amazon Elasticsearch Service.

Question 2

An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region.
Which three steps should the data engineer take to accomplish this task? (Choose three.)

A. Create a new KMS key in the destination region.

B. Copy the existing KMS key to the destination region.

C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region.

D. In the source region, enable cross-region replication and specify the name of the copy grant created.

E. In the destination region, enable cross-region replication and specify the name of the copy grant created.

F. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key created in the destination region. ADF

Question 3

A media advertising company handles a large number of real-time messages sourced from over 200 websites.
The companys data engineer needs to collect and process records in real time for analysis using Spark
Streaming on Amazon Elastic MapReduce (EMR). The data engineer needs to fulfill a corporate mandate to keep ALL raw messages as they are received as a top priority.
Which Amazon Kinesis configuration meets these requirements?

A. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Pull messages off Firehose with Spark Streaming in parallel to persistence to Amazon S3.

B. Publish messages to Amazon Kinesis Streams. Pull messages off Streams with Spark Streaming in parallel to AWS Lambda pushing messages from Streams to Firehose backed by Amazon Simple Storage Service (S3).

C. Publish messages to Amazon Kinesis Firehose backed by Amazon Simple Storage Service (S3). Use AWS Lambda to pull messages from Firehose to Streams for processing with Spark Streaming.

D. Publish messages to Amazon Kinesis Streams, pull messages off with Spark Streaming, and write row data to Amazon Simple Storage Service (S3) before and after processing.

Question 4

A customer is collecting clickstream data using Amazon Kinesis and is grouping the events by IP address into
5-minute chunks stored in Amazon S3.
Many analysts in the company use Hive on Amazon EMR to analyze this data. Their queries always reference a single IP address. Data must be optimized for querying based on IP address using Hive running on Amazon
EMR.
What is the most efficient method to query the data with Hive?

A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.

B. Store the Amazon S3 objects with the following naming scheme: bucket_name/source=ip_address/ year=yy/month=mm/day=dd/hour=hh/filename.

C. Store the data in an HBase table with the IP address as the row key.

D. Store the events for an IP address as a single file in Amazon S3 and add metadata with keys: Hive_Partitioned_IPAddress.

Question 5

An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some of the data sets to be able to perform interactive joins and then display results quickly.
Which technology is most appropriate to enable this capability?

A. Presto

B. MicroStrategy

C. Pig

D. R Studio

Question 6

A company that provides economics data dashboards needs to be able to develop software to display rich, interactive, data-driven graphics that run in web browsers and leverages the full stack of web standards
(HTML, SVG, and CSS).
Which technology provides the most appropriate support for this requirements?

A. D3.js

B. IPython/Jupyter

C. R Studio

D. Hue

Question 7

A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its Redshift schema. The ORDERS table has foreign key relationships with multiple dimension tables in this schema.
How should the company determine the most appropriate distribution key for the ORDERS table?

A. Identify the largest and most frequently joined dimension table and ensure that it and the ORDERS table both have EVEN distribution.

B. Identify the largest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.

C. Identify the smallest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.

D. Identify the largest and the most frequently joined dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.

Question 8

An organization is setting up a data catalog and metadata management environment for their numerous data stores currently running on AWS. The data catalog will be used to determine the structure and other attributes of data in the data stores. The data stores are composed of Amazon RDS databases, Amazon
Redshift, and CSV files residing on Amazon S3. The catalog should be populated on a scheduled basis, and minimal administration is required to manage the catalog.
How can this be accomplished?

A. Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the DynamoDB table.

B. Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.

C. Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data sources to populate the catalog.

D. Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that connects to data sources to populate the metastore.

Question 9

An organization currently runs a large Hadoop environment in their data center and is in the process of creating an alternative Hadoop environment on AWS, using Amazon EMR.
They generate around 20 TB of data on a monthly basis. Also on a monthly basis, files need to be grouped and copied to Amazon S3 to be used for the Amazon
EMR environment. They have multiple S3 buckets across AWS accounts to which data needs to be copied. There is a 10G AWS Direct Connect setup between their data center and AWS, and the network team has agreed to allocate 50% of AWS Direct Connect bandwidth to data transfer. The data transfer cannot take more than two days.
What would be the MOST efficient approach to transfer data to AWS on a monthly basis?

A. Use an offline copy method, such as an AWS Snowball device, to copy and transfer data to Amazon S3.

B. Configure a multipart upload for Amazon S3 on AWS Java SDK to transfer data over AWS Direct Connect.

C. Use Amazon S3 transfer acceleration capability to transfer data to Amazon S3 over AWS Direct Connect.

D. Setup S3DistCop tool on the on-premises Hadoop environment to transfer data to Amazon S3 over AWS Direct Connect.

Question 10

A medical record filing system for a government medical fund is using an Amazon S3 bucket to archive documents related to patients. Every patient visit to a physician creates a new file, which can add up millions of files each month. Collection of these files from each physician is handled via a batch process that runs ever night using AWS Data Pipeline. This is sensitive data, so the data and any associated metadata must be encrypted at rest.
Auditors review some files on a quarterly basis to see whether the records are maintained according to regulations. Auditors must be able to locate any physical file in the S3 bucket for a given date, patient, or physician. Auditors spend a significant amount of time location such files.
What is the most cost- and time-efficient collection methodology in this situation?

A. Use Amazon Kinesis to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.

B. Use Amazon API Gateway to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.

C. Use Amazon S3 event notification to populate an Amazon DynamoDB table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.

D. Use Amazon S3 event notification to populate an Amazon Redshift table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.

Question 11

A company uses Amazon Redshift for its enterprise data warehouse. A new on-premises PostgreSQL OLTP
DB must be integrated into the data warehouse. Each table in the PostgreSQL DB has an indexed timestamp column. The data warehouse has a staging layer to load source data into the data warehouse environment for further processing.
The data lag between the source PostgreSQL DB and the Amazon Redshift staging layer should NOT exceed four hours.
What is the most efficient technique to meet these requirements?

A. Create a DBLINK on the source DB to connect to Amazon Redshift. Use a PostgreSQL trigger on the source table to capture the new insert/update/delete event and execute the event on the Amazon Redshift staging table.

B. Use a PostgreSQL trigger on the source table to capture the new insert/update/delete event and write it to Amazon Kinesis Streams. Use a KCL application to execute the event on the Amazon Redshift staging table.

C. Extract the incremental changes periodically using a SQL query. Upload the changes to multiple Amazon Simple Storage Service (S3) objects, and run the COPY command to load to the Amazon Redshift staging layer.

D. Extract the incremental changes periodically using a SQL query. Upload the changes to a single Amazon Simple Storage Service (S3) object, and run the COPY command to load to the Amazon Redshift staging layer.

Question 12

An organization needs a data store to handle the following data types and access patterns:
✑ Faceting
✑ Search
✑ Flexible schema (JSON) and fixed schema
✑ Noise word elimination
Which data store should the organization choose?

A. Amazon Relational Database Service (RDS)

B. Amazon Redshift

C. Amazon DynamoDB

D. Amazon Elasticsearch Service

Question 13

A gas company needs to monitor gas pressure in their pipelines. Pressure data is streamed from sensors placed throughout the pipelines to monitor the data in real time. When an anomaly is detected, the system must send a notification to open valve. An Amazon Kinesis stream collects the data from the sensors and an anomaly Kinesis stream triggers an AWS Lambda function to open the appropriate valve.
Which solution is the MOST cost-effective for responding to anomalies in real time?

A. Attach a Kinesis Firehose to the stream and persist the sensor data in an Amazon S3 bucket. Schedule an AWS Lambda function to run a query in Amazon Athena against the data in Amazon S3 to identify anomalies. When a change is detected, the Lambda function sends a message to the anomaly stream to open the valve.

B. Launch an Amazon EMR cluster that uses Spark Streaming to connect to the Kinesis stream and Spark machine learning to detect anomalies. When a change is detected, the Spark application sends a message to the anomaly stream to open the valve.

C. Launch a fleet of Amazon EC2 instances with a Kinesis Client Library application that consumes the stream and aggregates sensor data over time to identify anomalies. When an anomaly is detected, the application sends a message to the anomaly stream to open the valve.

D. Create a Kinesis Analytics application by using the RANDOM_CUT_FOREST function to detect an anomaly. When the anomaly score that is returned from the function is outside of an acceptable range, a message is sent to the anomaly stream to open the valve.

Question 14

A system engineer for a company proposes digitalization and backup of large archives for customers. The systems engineer needs to provide users with a secure storage that makes sure that data will never be tampered with once it has been uploaded.
How should this be accomplished?

A. Create an Amazon Glacier Vault. Specify a “Deny” Vault Lock policy on this Vault to block “glacier:DeleteArchive”.

B. Create an Amazon S3 bucket. Specify a “Deny” bucket policy on this bucket to block “s3:DeleteObject”.

C. Create an Amazon Glacier Vault. Specify a “Deny” vault access policy on this Vault to block “glacier:DeleteArchive”.

D. Create secondary AWS Account containing an Amazon S3 bucket. Grant “s3:PutObject” to the primary account.

Question 15

An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema.
In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)

A. When the tables are highly denormalized and do NOT participate in frequent joins.

B. When data must be grouped based on a specific key on a defined slice.

C. When data transfer between nodes must be eliminated.

D. When a new table has been loaded and it is unclear how it will be joined to dimension.

Question 16

An online photo album app has a key design feature to support multiple screens (e.g, desktop, mobile phone, and tablet) with high-quality displays. Multiple versions of the image must be saved in different resolutions and layouts.
The image-processing Java program takes an average of five seconds per upload, depending on the image size and format. Each image upload captures the following image metadata: user, album, photo label, upload timestamp.
The app should support the following requirements:
✑ Hundreds of user image uploads per second
✑ Maximum image upload size of 10 MB
✑ Maximum image metadata size of 1 KB
✑ Image displayed in optimized resolution in all supported screens no later than one minute after image upload
Which strategy should be used to meet these requirements?

A. Write images and metadata to Amazon Kinesis. Use a Kinesis Client Library (KCL) application to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB.

B. Write image and metadata RDS with BLOB data type. Use AWS Data Pipeline to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB.

C. Upload image with metadata to Amazon S3, use Lambda function to run the image processing and save the images output to Amazon S3 and metadata to the app repository DB.

D. Write image and metadata to Amazon Kinesis. Use Amazon Elastic MapReduce (EMR) with Spark Streaming to run image processing and save the images output to Amazon S3 and metadata to app repository DB.

Question 17

A company receives data sets coming from external providers on Amazon S3. Data sets from different providers are dependent on one another. Data sets will arrive at different times and in no particular order.
A data architect needs to design a solution that enables the company to do the following:
✑ Rapidly perform cross data set analysis as soon as the data becomes available
✑ Manage dependencies between data sets that arrive at different times
Which architecture strategy offers a scalable and cost-effective solution that meets these requirements?

A. Maintain data dependency information in Amazon RDS for MySQL. Use an AWS Data Pipeline job to load an Amazon EMR Hive table based on task dependencies and event notification triggers in Amazon S3.

B. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon SNS and event notifications to publish data to fleet of Amazon EC2 workers. Once the task dependencies have been resolved, process the data with Amazon EMR.

C. Maintain data dependency information in an Amazon ElastiCache Redis cluster. Use Amazon S3 event notifications to trigger an AWS Lambda function that maps the S3 object to Redis. Once the task dependencies have been resolved, process the data with Amazon EMR.

D. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon S3 event notifications to trigger an AWS Lambda function that maps the S3 object to the task associated with it in DynamoDB. Once all task dependencies have been resolved, process the data with Amazon EMR.

Question 18

A Redshift data warehouse has different user teams that need to query the same table with very different query types. These user teams are experiencing poor performance.
Which action improves performance for the user teams in this situation?

A. Create custom table views.

B. Add interleaved sort keys per team.

C. Maintain team-specific copies of the table.

D. Add support for workload management queue hopping.

Question 19

A travel website needs to present a graphical quantitative summary of its daily bookings to website visitors for marketing purposes. The website has millions of visitors per day, but wants to control costs by implementing the least-expensive solution for this visualization.
What is the most cost-effective solution?

A. Generate a static graph with a transient EMR cluster daily, and store it an Amazon S3.

B. Generate a graph using MicroStrategy backed by a transient EMR cluster.

C. Implement a Jupyter front-end provided by a continuously running EMR cluster leveraging spot instances for task nodes.

D. Implement a Zeppelin application that runs on a long-running EMR cluster.

Question 20

An organization is using Amazon Kinesis Data Streams to collect data generated from thousands of temperature devices and is using AWS Lambda to process the data. Devices generate 10 to 12 million records every day, but Lambda is processing only around 450 thousand records. Amazon CloudWatch indicates that throttling on Lambda is not occurring.
What should be done to ensure that all data is processed? (Choose two.)

A. Increase the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

B. Decrease the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.

C. Create multiple Lambda functions that will consume the same Amazon Kinesis stream.

D. Increase the number of vCores allocated for the Lambda function.

E. Increase the number of shards on the Amazon Kinesis stream.

Question 21

An organization needs to store sensitive information on Amazon S3 and process it through Amazon EMR. Data must be encrypted on Amazon S3 and Amazon
EMR at rest and in transit. Using Thrift Server, the Data Analysis team users HIVE to interact with this data. The organization would like to grant access to only specific databases and tables, giving permission only to the SELECT statement.
Which solution will protect the data and limit user access to the SELECT statement on a specific portion of data?

A. Configure Transparent Data Encryption on Amazon EMR. Create an Amazon EC2 instance and install Apache Ranger. Configure the authorization on the cluster to use Apache Ranger.

B. Configure data encryption at rest for EMR File System (EMRFS) on Amazon S3. Configure data encryption in transit for traffic between Amazon S3 and EMRFS. Configure storage and SQL base authorization on HiveServer2.

C. Use AWS KMS for encryption of data. Configure and attach multiple roles with different permissions based on the different user needs.

D. Configure Security Group on Amazon EMR. Create an Amazon VPC endpoint for Amazon S3. Configure HiveServer2 to use Kerberos authentication on the cluster.

Question 22

An organization's data warehouse contains sales data for reporting purposes. data governance policies prohibit staff from accessing the customers' credit card numbers.
How can these policies be adhered to and still allow a Data Scientist to group transactions that use the same credit card number?

A. Store a cryptographic hash of the credit card number.

B. Encrypt the credit card number with a symmetric encryption key, and give the key only to the authorized Data Scientist.

C. Mask the credit card numbers to only show the last four digits of the credit card number.

D. Encrypt the credit card number with an asymmetric encryption key and give the decryption key only to the authorized Data Scientist.

Question 23

An organization is soliciting public feedback through a web portal that has been deployed to track the number of requests and other important data. As part of reporting and visualization, AmazonQuickSight connects to an Amazon RDS database to virtualize data. Management wants to understand some important metrics about feedback and how the feedback has changed over the last four weeks in a visual representation.
What would be the MOST effective way to represent multiple iterations of an analysis in Amazon QuickSight that would show how the data has changed over the last four weeks?

A. Use the analysis option for data captured in each week and view the data by a date range.

B. Use a pivot table as a visual option to display measured values and weekly aggregate data as a row dimension.

C. Use a dashboard option to create an analysis of the data for each week and apply filters to visualize the data change.

D. Use a story option to preserve multiple iterations of an analysis and play the iterations sequentially.

Question 24

A clinical trial will rely on medical sensors to remotely assess patient health. Each physician who participates in the trial requires visual reports each morning. The reports are built from aggregations of all the sensor data taken each minute.
What is the most cost-effective solution for creating this visualization each day?

A. Use Kinesis Aggregators Library to generate reports for reviewing the patient sensor data and generate a QuickSight visualization on the new data each morning for the physician to review.

B. Use a transient EMR cluster that shuts down after use to aggregate the patient sensor data each night and generate a QuickSight visualization on the new data each morning for the physician to review.

C. Use Spark streaming on EMR to aggregate the patient sensor data in every 15 minutes and generate a QuickSight visualization on the new data each morning for the physician to review.

D. Use an EMR cluster to aggregate the patient sensor data each night and provide Zeppelin notebooks that look at the new data residing on the cluster each morning for the physician to review.

Question 25

A solutions architect for a logistics organization ships packages from thousands of suppliers to end customers.
The architect is building a platform where suppliers can view the status of one or more of their shipments.
Each supplier can have multiple roles that will only allow access to specific fields in the resulting information.
Which strategy allows the appropriate level of access control and requires the LEAST amount of management work?

A. Send the tracking data to Amazon Kinesis Streams. Use AWS Lambda to store the data in an Amazon DynamoDB Table. Generate temporary AWS credentials for the suppliers users with AWS STS, specifying fine-grained security policies to limit access only to their applicable data.

B. Send the tracking data to Amazon Kinesis Firehose. Use Amazon S3 notifications and AWS Lambda to prepare files in Amazon S3 with appropriate data for each suppliers roles. Generate temporary AWS credentials for the suppliers users with AWS STS. Limit access to the appropriate files through security policies.

C. Send the tracking data to Amazon Kinesis Streams. Use Amazon EMR with Spark Streaming to store the data in HBase. Create one table per supplier. Use HBase Kerberos integration with the suppliers users. Use HBase ACL-based security to limit access for the roles to their specific table and columns.

D. Send the tracking data to Amazon Kinesis Firehose. Store the data in an Amazon Redshift cluster. Create views for the suppliers users and roles. Allow suppliers access to the Amazon Redshift cluster using a user limited to the applicable view. B

Question 26

A company needs a churn prevention model to predict which customers will NOT renew their yearly subscription to the companys service. The company plans to provide these customers with a promotional offer. A binary classification model that uses Amazon Machine Learning is required.
On which basis should this binary classification model be built?

A. User profiles (age, gender, income, occupation)

B. Last user session

C. Each user time series events in the past 3 months

D. Quarterly results

Question 27

A data engineer wants to use an Amazon Elastic Map Reduce for an application. The data engineer needs to make sure it complies with regulatory requirements. The auditor must be able to confirm at any point which servers are running and which network access controls are deployed.
Which action should the data engineer take to meet this requirement?

A. Provide the auditor IAM accounts with the SecurityAudit policy attached to their group.

B. Provide the auditor with SSH keys for access to the Amazon EMR cluster.

C. Provide the auditor with CloudFormation templates.

D. Provide the auditor with access to AWS DirectConnect to use their existing tools.

Question 28

A solutions architect works for a company that has a data lake based on a central Amazon S3 bucket. The data contains sensitive information. The architect must be able to specify exactly which files each user can access. Users access the platform through a SAML federation Single Sign On platform.
The architect needs to build a solution that allows fine grained access control, traceability of access to the objects, and usage of the standard tools (AWS Console, AWS CLI) to access the data.
Which solution should the architect build?

A. Use Amazon S3 Server-Side Encryption with AWS KMS-Managed Keys for storing data. Use AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for auditing.

B. Use Amazon S3 Server-Side Encryption with Amazon S3-Managed Keys. Set Amazon S3 ACLs to allow access to specific elements of the platform. Use Amazon S3 to access logs for auditing.

C. Use Amazon S3 Client-Side Encryption with Client-Side Master Key. Set Amazon S3 ACLs to allow access to specific elements of the platform. Use Amazon S3 to access logs for auditing.

D. Use Amazon S3 Client-Side Encryption with AWS KMS-Managed Keys for storing data. Use AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for auditing.

Question 29

A company has several teams of analysts. Each team of analysts has their own cluster. The teams need to run
SQL queries using Hive, Spark-SQL, and Presto with Amazon EMR. The company needs to enable a centralized metadata layer to expose the Amazon S3 objects as tables to the analysts.
Which approach meets the requirement for a centralized metadata layer?

A. EMRFS consistent view with a common Amazon DynamoDB table

B. Bootstrap action to change the Hive Metastore to an Amazon RDS database

C. s3distcp with the outputManifest option to generate RDS DDL

D. Naming scheme support with automatic partition discovery from Amazon S3

Question 30

How should an Administrator BEST architect a large multi-layer Long Short-Term Memory (LSTM) recurrent neural network (RNN) running with MXNET on
Amazon EC2? (Choose two.)

A. Use data parallelism to partition the workload over multiple devices and balance the workload within the GPUs.

B. Use compute-optimized EC2 instances with an attached elastic GPU.

C. Use general purpose GPU computing instances such as G3 and P3.

D. Use processing parallelism to partition the workload over multiple storage devices and balance the workload within the GPUs.

Question 31

A company hosts a portfolio of e-commerce websites across the Oregon, N. Virginia, Ireland, and Sydney
AWS regions. Each site keeps log files that capture user behavior. The company has built an application that generates batches of product recommendations with collaborative filtering in Oregon. Oregon was selected because the flagship site is hosted there and provides the largest collection of data to train machine learning models against. The other regions do NOT have enough historic data to train accurate machine learning models.
Which set of data processing steps improves recommendations for each region?

A. Use the e-commerce application in Oregon to write replica log files in each other region.

B. Use Amazon S3 bucket replication to consolidate log entries and build a single model in Oregon.

C. Use Kinesis as a buffer for web logs and replicate logs to the Kinesis stream of a neighboring region.

D. Use the CloudWatch Logs agent to consolidate logs into a single CloudWatch Logs group.

Question 32

Multiple rows in an Amazon Redshift table were accidentally deleted. A System Administrator is restoring the table from the most recent snapshot. The snapshot contains all rows that were in the table before the deletion.
What is the SIMPLEST solution to restore the table without impacting users?

A. Restore the snapshot to a new Amazon Redshift cluster, then UNLOAD the table to Amazon S3. In the original cluster, TRUNCATE the table, then load the data from Amazon S3 by using a COPY command.

B. Use the Restore Table from a Snapshot command and specify a new table name DROP the original table, then RENAME the new table to the original table name.

C. Restore the snapshot to a new Amazon Redshift cluster. Create a DBLINK between the two clusters in the original cluster, TRUNCATE the destination table, then use an INSERT command to copy the data from the new cluster.

D. Use the ALTER TABLE REVERT command and specify a time stamp of immediately before the data deletion. Specify the Amazon Resource Name of the snapshot as the SOURCE and use the OVERWRITE REPLACE option.

Question 33

An administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting and analytics before being archived.
How should the administrator recommend storing the log data?

A. Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on the device folders.

B. Create an Amazon DynamoDB table partitioned on the device and sorted on date, write log data to table. Execute the EMR job on the Amazon DynamoDB table.

C. Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the daily folder.

D. Create an Amazon DynamoDB table partitioned on EventID, write log data to table. Execute the EMR job on the table.

Question 34

An enterprise customer is migrating to Redshift and is considering using dense storage nodes in its Redshift cluster. The customer wants to migrate 50 TB of data. The customers query patterns involve performing many joins with thousands of rows.
The customer needs to know how many nodes are needed in its target Redshift cluster. The customer has a limited budget and needs to avoid performing tests unless absolutely needed.
Which approach should this customer use?

A. Start with many small nodes.

B. Start with fewer large nodes.

C. Have two separate clusters with a mix of a small and large nodes.

D. Insist on performing multiple tests to determine the optimal configuration.

Question 35

A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon
Redshift has the past two years of historical data. Game traffic varies throughout the year based on various factors such as season, movie release, and holiday season. An administrator needs to calculate how much read and write throughput should be provisioned for DynamoDB table for each week in advance.
How should the administrator accomplish this task?

A. Feed the data into Amazon Machine Learning and build a regression model.

B. Feed the data into Spark Mlib and build a random forest modest.

C. Feed the data into Apache Mahout and build a multi-classification model.

D. Feed the data into Amazon Machine Learning and build a binary classification model.

Question 36

An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region.
Which three steps should the data engineer take to accomplish this task? (Choose three.)

A. Create a new KMS key in the destination region.

B. Copy the existing KMS key to the destination region.

C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region.

D. In the source region, enable cross-region replication and specify the name of the copy grant created.

E. In the destination region, enable cross-region replication and specify the name of the copy grant created.

Question 37

A company is using Amazon Machine Learning as part of a medical software application. The application will predict the most likely blood type for a patient based on a variety of other clinical tests that are available when blood type knowledge is unavailable.
What is the appropriate model choice and target attribute combination for this problem?

A. Multi-class classification model with a categorical target attribute.

B. Regression model with a numeric target attribute.

C. Binary Classification with a categorical target attribute.

D. K-Nearest Neighbors model with a multi-class target attribute.

Question 38

The department of transportation for a major metropolitan area has placed sensors on roads at key locations around the city. The goal is to analyze the flow of traffic and notifications from emergency services to identify potential issues and to help planners correct trouble spots.
A data engineer needs a scalable and fault-tolerant solution that allows planners to respond to issues within
30 seconds of their occurrence.
Which solution should the data engineer choose?

A. Collect the sensor data with Amazon Kinesis Firehose and store it in Amazon Redshift for analysis. Collect emergency services events with Amazon SQS and store in Amazon DynampDB for analysis.

B. Collect the sensor data with Amazon SQS and store in Amazon DynamoDB for analysis. Collect emergency services events with Amazon Kinesis Firehose and store in Amazon Redshift for analysis.

C. Collect both sensor data and emergency services events with Amazon Kinesis Streams and use DynamoDB for analysis.

D. Collect both sensor data and emergency services events with Amazon Kinesis Firehose and use Amazon Redshift for analysis.

Question 39

An organization uses a custom map reduce application to build monthly reports based on many small data files in an Amazon S3 bucket. The data is submitted from various business units on a frequent but unpredictable schedule. As the dataset continues to grow, it becomes increasingly difficult to process all of the data in one day. The organization has scaled up its Amazon EMR cluster, but other optimizations could improve performance.
The organization needs to improve performance with minimal changes to existing processes and applications.
What action should the organization take?

A. Use Amazon S3 Event Notifications and AWS Lambda to create a quick search file index in DynamoDB.

B. Add Spark to the Amazon EMR cluster and utilize Resilient Distributed Datasets in-memory.

C. Use Amazon S3 Event Notifications and AWS Lambda to index each file into an Amazon Elasticsearch Service cluster.

D. Schedule a daily AWS Data Pipeline process that aggregates content into larger files using S3DistCp.

E. Have business units submit data via Amazon Kinesis Firehose to aggregate data hourly into Amazon S3.

Question 40

An online gaming company uses DynamoDB to store user activity logs and is experiencing throttled writes on the companys DynamoDB table. The company is NOT consuming close to the provisioned capacity. The table contains a large number of items and is partitioned on user and sorted by date. The table is 200GB and is currently provisioned at 10K WCU and 20K RCU.
Which two additional pieces of information are required to determine the cause of the throttling? (Choose two.)

A. The structure of any GSIs that have been defined on the table

B. CloudWatch data showing consumed and provisioned write capacity when writes are being throttled

C. Application-level metrics showing the average item size and peak update rates for each attribute

D. The structure of any LSIs that have been defined on the table

E. The maximum historical WCU and RCU for the table

Question 41

An Operations team continuously monitors the number of visitors to a website to identify any potential system problems. The number of website visitors varies throughout the day. The site is more popular in the middle of the day and less popular at night.
Which type of dashboard display would be the MOST useful to allow staff to quickly and correctly identify system problems?

A. A vertical stacked bar chart showing today’s website visitors and the historical average number of website visitors.

B. An overlay line chart showing today’s website visitors at one-minute intervals and also the historical average number of website visitors.

C. A single KPI metric showing the statistical variance between the current number of website visitors and the historical number of website visitors for the current time of day.

D. A scatter plot showing today’s website visitors on the X-axis and the historical average number of website visitors on the Y-axis.

Question 42

A company generates a large number of files each month and needs to use AWS import/export to move these files into Amazon S3 storage. To satisfy the auditors, the company needs to keep a record of which files were imported into Amazon S3.
What is a low-cost way to create a unique log for each import job?

A. Use the same log file prefix in the import/export manifest files to create a versioned log file in Amazon S3 for all imports.

B. Use the log file prefix in the import/export manifest files to create a unique log file in Amazon S3 for each import.

C. Use the log file checksum in the import/export manifest files to create a unique log file in Amazon S3 for each import.

D. Use a script to iterate over files in Amazon S3 to generate a log after each import/export job.

Question 43

An organization would like to run analytics on their Elastic Load Balancing logs stored in Amazon S3 and join this data with other tables in Amazon S3. The users are currently using a BI tool connecting with JDBC and would like to keep using this BI tool.
Which solution would result in the LEAST operational overhead?

A. Trigger a Lambda function when a new log file is added to the bucket to transform and load it into Amazon Redshift. Run the VACUUM command on the Amazon Redshift cluster every night.

B. Launch a long-running Amazon EMR cluster that continuously downloads and transforms new files from Amazon S3 into its HDFS storage. Use Presto to expose the data through JDBC.

C. Trigger a Lambda function when a new log file is added to the bucket to transform and move it to another bucket with an optimized data structure. Use Amazon Athena to query the optimized bucket.

D. Launch a transient Amazon EMR cluster every night that transforms new log files and loads them into Amazon Redshift.

Question 44

A social media customer has data from different data sources including RDS running MySQL, Redshift, and
Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results.
What is the most cost-effective solution to meet these requirements?

A. Load all data from a different database/warehouse to S3. Use Redshift COPY command to copy data to Redshift for analysis.

B. Install Presto on the EMR cluster where Hive sits. Configure MySQL and PostgreSQL connector to select from different data sources in a single query.

C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze.

D. Write a program running on a separate EC2 instance to run queries to three different systems. Aggregate the results after getting the responses from all three systems.

Question 45

A companys social media manager requests more staff on the weekends to handle an increase in customer contacts from a particular region. The company needs a report to visualize the trends on weekends over the past 6 months using QuickSight.
How should the data be represented?

A. A line graph plotting customer contacts vs. time, with a line for each region

B. A pie chart per region plotting customer contacts per day of week

C. A map of regions with a heatmap overlay to show the volume of customer contacts

D. A bar graph plotting region vs. volume of social media contacts

Question 46

An organization is designing an application architecture. The application will have over 100 TB of data and will support transactions that arrive at rates from hundreds per second to tens of thousands per second, depending on the day of the week and time of the day. All transaction data, must be durably and reliably stored. Certain read operations must be performed with strong consistency.
Which solution meets these requirements?

A. Use Amazon DynamoDB as the data store and use strongly consistent reads when necessary.

B. Use an Amazon Relational Database Service (RDS) instance sized to meet the maximum anticipated transaction rate and with the High Availability option enabled.

C. Deploy a NoSQL data store on top of an Amazon Elastic MapReduce (EMR) cluster, and select the HDFS High Durability option.

D. Use Amazon Redshift with synchronous replication to Amazon Simple Storage Service (S3) and row-level locking for strong consistency.

Question 47

An administrator needs to manage a large catalog of items from various external sellers. The administrator needs to determine if the items should be identified as minimally dangerous, dangerous, or highly dangerous based on their textual descriptions. The administrator already has some items with the danger attribute, but receives hundreds of new item descriptions every day without such classification.
The administrator has a system that captures dangerous goods reports from customer support team of from user feedback.
What is a cost-effective architecture to solve this issue?

A. Build a set of regular expression rules that are based on the existing examples, and run them on the DynamoDB Streams as every new item description is added to the system.

B. Build a Kinesis Streams process that captures and marks the relevant items in the dangerous goods reports using a Lambda function once more than two reports have been filed.

C. Build a machine learning model to properly classify dangerous goods and run it on the DynamoDB Streams as every new item description is added to the system.

D. Build a machine learning model with binary classification for dangerous goods and run it on the DynamoDB Streams as every new item description is added to the system.

Question 48

A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage.
Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.

B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.

C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.

D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Question 49

A large grocery distributor receives daily depletion reports from the field in the form of gzip archives od CSV files uploaded to Amazon S3. The files range from 500MB to 5GB. These files are processed daily by an EMR job.
Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the
EMR job.
Which recommendation should an administrator provide?

A. Reduce the HDFS block size to increase the number of task processors.

B. Use bzip2 or Snappy rather than gzip for the archives.

C. Decompress the gzip archives and store the data as CSV files.

D. Use Avro rather than gzip for the archives.

Question 50

A company with a support organization needs support engineers to be able to search historic cases to provide fast responses on new issues raised. The company has forwarded all support messages into an Amazon
Kinesis Stream. This meets a company objective of using only managed services to reduce operational overhead.
The company needs an appropriate architecture that allows support engineers to search on historic cases and find similar issues and their associated responses.
Which AWS Lambda action is most appropriate?

A. Ingest and index the content into an Amazon Elasticsearch domain.

B. Stem and tokenize the input and store the results into Amazon ElastiCache.

C. Write data as JSON into Amazon DynamoDB with primary and secondary indexes.

D. Aggregate feedback in Amazon S3 using a columnar format with partitioning.

Free Access Full BDS-C00 Practice Questions Free

Want more hands-on practice? Click here to access the full bank of BDS-C00 practice questions free and reinforce your understanding of all exam objectives.

We update our question sets regularly, so check back often for new and relevant content.

Good luck with your BDS-C00 certification journey!

BDS-C00 Practice Questions Free

AZ-900 Practice Questions Free

CAS-003 Practice Questions Free

CAS-003 Practice Questions Free

CAS-004 Practice Questions Free

CCAK Practice Questions Free

Leave a Reply Cancel reply

Recommended

Network+ Practice Test

Comptia Security+ Practice Test

A+ Certification Practice Test

Aws Cloud Practitioner Exam Questions

Aws Cloud Practitioner Practice Exam

Comptia A+ Practice Test

Welcome Back!

Create New Account!

Retrieve your password