BDS-C00 Mock Test Free – 50 Realistic Questions to Prepare with Confidence.
Getting ready for your BDS-C00 certification exam? Start your preparation the smart way with our BDS-C00 Mock Test Free – a carefully crafted set of 50 realistic, exam-style questions to help you practice effectively and boost your confidence.
Using a mock test free for the BDS-C00 exam is one of the best ways to:
- Familiarize yourself with the actual exam format and question style
- Identify areas where you need more review
- Strengthen your time management and test-taking strategy
Below, you will find 50 free questions from our BDS-C00 Mock Test Free resource. These questions are structured to reflect the real exam’s difficulty and content areas, helping you assess your readiness accurately.
An organization is currently using an Amazon EMR long-running cluster with the latest Amazon EMR release for analytic jobs and is storing data as external tables on Amazon S3. The company needs to launch multiple transient EMR clusters to access the same tables concurrently, but the metadata about the Amazon S3 external tables is defined and stored on the long-running cluster. Which solution will expose the Hive metastore with the LEAST operational effort?
A. Export Hive metastore information to Amazon DynamoDB and configure the Amazon EMR hive-site classification to point to the Amazon DynamoDB table.
B. Export Hive metastore information to a MySQL table on Amazon RDS and configure the Amazon EMR hive-site classification to point to the Amazon RDS database.
C. Launch an Amazon EC2 instance, install and configure Apache Derby, and export the Hive metastore information to Derby.
D. Create and configure an AWS Glue Data Catalog as a Hive metastore for Amazon EMR.
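For readers who want to see what option D looks like in practice, here is a minimal boto3 sketch of launching a transient EMR cluster whose hive-site classification points Hive at the AWS Glue Data Catalog. The cluster name, release label, roles, and instance settings are illustrative placeholders.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a transient cluster that uses the Glue Data Catalog as its Hive metastore.
response = emr.run_job_flow(
    Name="transient-analytics-cluster",           # placeholder name
    ReleaseLabel="emr-5.36.0",                    # any recent EMR release
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
    Configurations=[
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.metastore.client.factory.class":
                    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
            },
        }
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,     # transient: terminate when steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```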
A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift has the past two years of historical data. Game traffic varies throughout the year based on various factors such as season, movie releases, and holidays. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance. How should the administrator accomplish this task?
A. Feed the data into Amazon Machine Learning and build a regression model.
B. Feed the data into Spark MLlib and build a random forest model.
C. Feed the data into Apache Mahout and build a multi-classification model.
D. Feed the data into Amazon Machine Learning and build a binary classification model.
An administrator needs to design a strategy for the schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which two circumstances would choosing EVEN distribution be most appropriate? (Choose two.)
A. When the tables are highly denormalized and do NOT participate in frequent joins.
B. When data must be grouped based on a specific key on a defined slice.
C. When data transfer between nodes must be eliminated.
D. When a new table has been loaded and it is unclear how it will be joined to dimensions.
An administrator needs to design the event log storage architecture for events from mobile devices. The event data will be processed by an Amazon EMR cluster daily for aggregated reporting and analytics before being archived. How should the administrator recommend storing the log data?
A. Create an Amazon S3 bucket and write log data into folders by device. Execute the EMR job on the device folders.
B. Create an Amazon DynamoDB table partitioned on the device and sorted on date, and write log data to the table. Execute the EMR job on the Amazon DynamoDB table.
C. Create an Amazon S3 bucket and write data into folders by day. Execute the EMR job on the daily folder.
D. Create an Amazon DynamoDB table partitioned on EventID, and write log data to the table. Execute the EMR job on the table.
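As an aside, writing events under a date-based prefix (as in options A and C) is simply a matter of how the S3 object key is constructed. A minimal boto3 sketch, with a hypothetical bucket name and payload:

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

def write_event(event: dict, bucket: str = "mobile-event-logs") -> None:
    """Write one event under a day-based prefix so a daily EMR job can read dt=YYYY-MM-DD/."""
    now = datetime.now(timezone.utc)
    key = f"events/dt={now:%Y-%m-%d}/{now:%H%M%S%f}-{event['device_id']}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(event).encode("utf-8"))

write_event({"device_id": "abc123", "type": "app_open", "ts": "2024-01-01T00:00:00Z"})
```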
A system needs to collect on-premises application spool files into a persistent storage layer in AWS. Each spool file is 2 KB. The application generates 1 million files per hour. Each source file is automatically deleted from the local server after an hour. What is the most cost-efficient option to meet these requirements?
A. Write file contents to an Amazon DynamoDB table.
B. Copy files to Amazon S3 Standard Storage.
C. Write file contents to Amazon ElastiCache.
D. Copy files to Amazon S3 Infrequent Access storage.
A social media customer has data from different data sources including RDS running MySQL, Redshift, and Hive on EMR. To support better analysis, the customer needs to be able to analyze data from different data sources and to combine the results. What is the most cost-effective solution to meet these requirements?
A. Load all data from a different database/warehouse to S3. Use Redshift COPY command to copy data to Redshift for analysis.
B. Install Presto on the EMR cluster where Hive sits. Configure the MySQL and PostgreSQL connectors to select from different data sources in a single query.
C. Spin up an Elasticsearch cluster. Load data from all three data sources and use Kibana to analyze.
D. Write a program running on a separate EC2 instance to run queries to three different systems. Aggregate the results after getting the responses from all three systems.
A real-time bidding company is rebuilding their monolithic application and is focusing on serving real-time data. A large number of reads and writes are generated from thousands of concurrent users who follow items and bid on the company's sale offers. The company is experiencing high latency during special event spikes, with millions of concurrent users. The company needs to analyze and aggregate a part of the data in near real time to feed an internal dashboard. What is the BEST approach for serving and analyzing data, considering the constraint of low latency on the highly demanded data?
A. Use Amazon Aurora with Multi-AZ and read replicas. Use Amazon ElastiCache in front of the read replicas to serve read-only content quickly. Use the same database as the data source for the dashboard.
B. Use Amazon DynamoDB to store real-time data, with Amazon DynamoDB Accelerator (DAX) to serve content quickly. Use Amazon DynamoDB Streams to replay all changes to the table, then process and stream them to Amazon Elasticsearch Service with AWS Lambda.
C. Use Amazon RDS with Multi-AZ and a Provisioned IOPS EBS volume for storage. Enable up to five read replicas to serve read-only content quickly. Use Amazon EMR with Sqoop to import Amazon RDS data into HDFS for analysis.
D. Use Amazon Redshift with a DC2 node type and a multi-node cluster. Create an Amazon EC2 instance with pgpool installed. Create an Amazon ElastiCache cluster and route read requests through pgpool, and use Amazon Redshift for analysis.
A company is building a new application in AWS. The architect needs to design a system to collect application log events. The design should be a repeatable pattern that minimizes data loss if an application instance fails, and keeps a durable copy of the log data for at least 30 days. What is the simplest architecture that will allow the architect to analyze the logs?
A. Write them directly to a Kinesis Firehose. Configure Kinesis Firehose to load the events into an Amazon Redshift cluster for analysis.
B. Write them to a file on Amazon Simple Storage Service (S3). Write an AWS Lambda function that runs in response to the S3 event to load the events into Amazon Elasticsearch Service for analysis.
C. Write them to the local disk and configure the Amazon CloudWatch Logs agent to load the data into CloudWatch Logs and subsequently into Amazon Elasticsearch Service.
D. Write them to CloudWatch Logs and use an AWS Lambda function to load them into HDFS on an Amazon Elastic MapReduce (EMR) cluster for analysis.
An online retailer is using Amazon DynamoDB to store data related to customer transactions. The items in the table contain several string attributes describing the transaction as well as a JSON attribute containing the shopping cart and other details corresponding to the transaction. Average item size is 250 KB, most of which is associated with the JSON attribute. The average customer generates 3 GB of data per month. Customers access the table to display their transaction history and review transaction details as needed. Ninety percent of the queries against the table are executed when building the transaction history view, with the other 10% retrieving transaction details. The table is partitioned on CustomerID and sorted on transaction date. The client has very high read capacity provisioned for the table and experiences very even utilization, but complains about the cost of Amazon DynamoDB compared to other NoSQL solutions. Which strategy will reduce the cost associated with the client's read queries while not degrading quality?
A. Modify all database calls to use eventually consistent reads and advise customers that transaction history may be one second out-of-date.
B. Change the primary table to partition on TransactionID, create a GSI partitioned on customer and sorted on date, project small attributes into GSI, and then query GSI for summary data and the primary table for JSON details.
C. Vertically partition the table, store base attributes on the primary table, and create a foreign key reference to a secondary table containing the JSON data. Query the primary table for summary data and the secondary table for JSON details.
D. Create an LSI sorted on date, project the JSON attribute into the index, and then query the primary table for summary data and the LSI for JSON details.
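To make option B concrete, here is a hedged boto3 sketch of a table keyed on TransactionID with a GSI keyed on CustomerID and sorted on date that projects only the small summary attributes. All table, index, and attribute names and throughput figures are illustrative assumptions.

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Transactions",                       # illustrative name
    AttributeDefinitions=[
        {"AttributeName": "TransactionID", "AttributeType": "S"},
        {"AttributeName": "CustomerID", "AttributeType": "S"},
        {"AttributeName": "TransactionDate", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "TransactionID", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "CustomerHistory",
            "KeySchema": [
                {"AttributeName": "CustomerID", "KeyType": "HASH"},
                {"AttributeName": "TransactionDate", "KeyType": "RANGE"},
            ],
            # Project only the small summary attributes, not the large JSON attribute.
            "Projection": {"ProjectionType": "INCLUDE",
                           "NonKeyAttributes": ["Total", "Status", "ItemCount"]},
            "ProvisionedThroughput": {"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
)
```

History-view queries would then hit the small-item GSI, while the primary table is read only when a customer drills into a single transaction.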
A large oil and gas company needs to provide near real-time alerts when peak thresholds are exceeded in its pipeline system. The company has developed a system to capture pipeline metrics such as flow rate, pressure, and temperature using millions of sensors. The sensors deliver to AWS IoT. What is a cost-effective way to provide near real-time alerts on the pipeline metrics?
A. Create an AWS IoT rule to generate an Amazon SNS notification.
B. Store the data points in an Amazon DynamoDB table and poll it for peak metrics data from an Amazon EC2 application.
C. Create an Amazon Machine Learning model and invoke it with AWS Lambda.
D. Use Amazon Kinesis Streams and a KCL-based application deployed on AWS Elastic Beanstalk.
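For context, here is a minimal sketch of an AWS IoT rule (option A) that publishes an Amazon SNS notification when a pressure reading crosses a threshold. The topic names, ARNs, and the threshold value are assumptions for illustration.

```python
import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="pipeline_pressure_alert",
    topicRulePayload={
        # Select only readings that exceed the (assumed) peak threshold.
        "sql": "SELECT * FROM 'pipeline/metrics' WHERE pressure > 1200",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [
            {
                "sns": {
                    "targetArn": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
                    "roleArn": "arn:aws:iam::123456789012:role/iot-sns-publish",
                    "messageFormat": "JSON",
                }
            }
        ],
    },
)
```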
An advertising organization uses an application to process a stream of events that are received from clients in multiple unstructured formats. The application does the following:
- Transforms the events into a single structured format and streams them to Amazon Kinesis for real-time analysis.
- Stores the unstructured raw events from the log files on local hard drives that are rotated and uploaded to Amazon S3.
The organization wants to extract campaign performance reporting using an existing Amazon Redshift cluster. Which solution will provide the performance data with the LEAST number of operations?
A. Install the Amazon Kinesis Data Firehose agent on the application servers and use it to stream the log files directly to Amazon Redshift.
B. Create an external table in Amazon Redshift and point it to the S3 bucket where the unstructured raw events are stored.
C. Write an AWS Lambda function that triggers every hour to load the new log files already in S3 to Amazon Redshift.
D. Connect Amazon Kinesis Data Firehose to the existing Amazon Kinesis stream and use it to stream the events directly to Amazon Redshift.
A company that manufactures and sells smart air conditioning units also offers add-on services so that customers can see real-time dashboards in a mobile application or a web browser. Each unit sends its sensor information in JSON format every two seconds for processing and analysis. The company also needs to consume this data to predict possible equipment problems before they occur. A few thousand pre-purchased units will be delivered in the next couple of months. The company expects high market growth in the next year and needs to handle a massive amount of data and scale without interruption. Which ingestion solution should the company use?
A. Write sensor data records to Amazon Kinesis Streams. Process the data using KCL applications for the end-consumer dashboard and anomaly detection workflows.
B. Batch sensor data to Amazon Simple Storage Service (S3) every 15 minutes. Flow the data downstream to the end-consumer dashboard and to the anomaly detection application.
C. Write sensor data records to Amazon Kinesis Firehose with Amazon Simple Storage Service (S3) as the destination. Consume the data with a KCL application for the end-consumer dashboard and anomaly detection.
D. Write sensor data records to Amazon Relational Database Service (RDS). Build both the end-consumer dashboard and anomaly detection application on top of Amazon RDS.
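To illustrate the ingestion side of option A, here is a minimal producer sketch that writes a JSON sensor reading to an Amazon Kinesis stream. The stream name and payload fields are placeholders.

```python
import json

import boto3

kinesis = boto3.client("kinesis")

def publish_reading(unit_id: str, reading: dict, stream: str = "ac-sensor-stream") -> None:
    """Send one JSON sensor record; the unit ID is used as the partition key."""
    kinesis.put_record(
        StreamName=stream,
        Data=json.dumps({"unit_id": unit_id, **reading}).encode("utf-8"),
        PartitionKey=unit_id,
    )

publish_reading("unit-0001", {"temp_c": 21.4, "power_w": 950, "ts": "2024-01-01T00:00:02Z"})
```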
An administrator receives about 100 files per hour into Amazon S3 and will be loading the files into Amazon Redshift. Customers who analyze the data within Redshift gain significant value when they receive data as quickly as possible. The customers have agreed to a maximum loading interval of 5 minutes. Which loading approach should the administrator use to meet this objective?
A. Load each file as it arrives because getting data into the cluster as quickly as possible is the priority.
B. Load the cluster as soon as the administrator has the same number of files as nodes in the cluster.
C. Load the cluster when the administrator has an even multiple of files relative to the Cluster Slice Count, or 5 minutes, whichever comes first.
D. Load the cluster when the number of files is less than the Cluster Slice Count.
A data engineer wants to use Amazon Elastic MapReduce (EMR) for an application. The data engineer needs to make sure it complies with regulatory requirements. The auditor must be able to confirm at any point which servers are running and which network access controls are deployed. Which action should the data engineer take to meet this requirement?
A. Provide the auditor IAM accounts with the SecurityAudit policy attached to their group.
B. Provide the auditor with SSH keys for access to the Amazon EMR cluster.
C. Provide the auditor with CloudFormation templates.
D. Provide the auditor with access to AWS Direct Connect to use their existing tools.
A company generates a large number of files each month and needs to use AWS Import/Export to move these files into Amazon S3 storage. To satisfy the auditors, the company needs to keep a record of which files were imported into Amazon S3. What is a low-cost way to create a unique log for each import job?
A. Use the same log file prefix in the import/export manifest files to create a versioned log file in Amazon S3 for all imports.
B. Use the log file prefix in the import/export manifest files to create a unique log file in Amazon S3 for each import.
C. Use the log file checksum in the import/export manifest files to create a unique log file in Amazon S3 for each import.
D. Use a script to iterate over files in Amazon S3 to generate a log after each import/export job.
An organization uses Amazon Elastic MapReduce (EMR) to process a series of extract-transform-load (ETL) steps that run in sequence. The output of each step must be fully processed in subsequent steps but will not be retained. Which of the following techniques will meet this requirement most efficiently?
A. Use the EMR File System (EMRFS) to store the outputs from each step as objects in Amazon Simple Storage Service (S3).
B. Use the s3n URI to store the data to be processed as objects in Amazon S3.
C. Define the ETL steps as separate AWS Data Pipeline activities.
D. Load the data to be processed into HDFS, and then write the final output to Amazon S3.
A company receives data sets coming from external providers on Amazon S3. Data sets from different providers are dependent on one another. Data sets will arrive at different times and in no particular order. A data architect needs to design a solution that enables the company to do the following:
- Rapidly perform cross-data-set analysis as soon as the data becomes available
- Manage dependencies between data sets that arrive at different times
Which architecture strategy offers a scalable and cost-effective solution that meets these requirements?
A. Maintain data dependency information in Amazon RDS for MySQL. Use an AWS Data Pipeline job to load an Amazon EMR Hive table based on task dependencies and event notification triggers in Amazon S3.
B. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon SNS and event notifications to publish data to a fleet of Amazon EC2 workers. Once the task dependencies have been resolved, process the data with Amazon EMR.
C. Maintain data dependency information in an Amazon ElastiCache Redis cluster. Use Amazon S3 event notifications to trigger an AWS Lambda function that maps the S3 object to Redis. Once the task dependencies have been resolved, process the data with Amazon EMR.
D. Maintain data dependency information in an Amazon DynamoDB table. Use Amazon S3 event notifications to trigger an AWS Lambda function that maps the S3 object to the task associated with it in DynamoDB. Once all task dependencies have been resolved, process the data with Amazon EMR.
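As a rough sketch of option D, an AWS Lambda handler triggered by S3 event notifications could mark the arriving data set as available in a DynamoDB dependency table. The table name, attribute names, and prefix convention are assumptions.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("DatasetDependencies")    # hypothetical dependency table

def handler(event, context):
    """Mark each arriving S3 object as available; downstream logic checks dependencies."""
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        dataset_id = key.split("/")[0]           # assume the key prefix identifies the data set
        table.update_item(
            Key={"dataset_id": dataset_id},
            UpdateExpression="SET arrived = :t, s3_key = :k",
            ExpressionAttributeValues={":t": True, ":k": key},
        )
    # A separate check (not shown) would start the EMR job once all dependencies are marked.
```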
A company operates an international business served from a single AWS region. The company wants to expand into a new country. The regulator for that country requires the Data Architect to maintain a log of financial transactions in the country within 24 hours of the product transaction. The production application is latency insensitive. The new country contains another AWS region. What is the most cost-effective way to meet this requirement?
A. Use CloudFormation to replicate the production application to the new region.
B. Use Amazon CloudFront to serve application content locally in the country; Amazon CloudFront logs will satisfy the requirement.
C. Continue to serve customers from the existing region while using Amazon Kinesis to stream transaction data to the regulator.
D. Use Amazon S3 cross-region replication to copy and persist production transaction logs to a bucket in the new country's region.
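For reference, option D amounts to a replication configuration on the source bucket. A minimal boto3 sketch with placeholder bucket names, prefix, and role ARN (versioning must already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="prod-transaction-logs",               # source bucket (placeholder)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-transaction-logs",
                "Status": "Enabled",
                "Prefix": "transactions/",        # replicate only the transaction logs
                "Destination": {
                    # Bucket in the new country's region.
                    "Bucket": "arn:aws:s3:::regulator-transaction-logs"
                },
            }
        ],
    },
)
```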
A company with a support organization needs support engineers to be able to search historic cases to provide fast responses on new issues raised. The company has forwarded all support messages into an Amazon Kinesis Stream. This meets a company objective of using only managed services to reduce operational overhead. The company needs an appropriate architecture that allows support engineers to search on historic cases and find similar issues and their associated responses. Which AWS Lambda action is most appropriate?
A. Ingest and index the content into an Amazon Elasticsearch domain.
B. Stem and tokenize the input and store the results into Amazon ElastiCache.
C. Write data as JSON into Amazon DynamoDB with primary and secondary indexes.
D. Aggregate feedback in Amazon S3 using a columnar format with partitioning.
A medical record filing system for a government medical fund is using an Amazon S3 bucket to archive documents related to patients. Every patient visit to a physician creates a new file, which can add up to millions of files each month. Collection of these files from each physician is handled via a batch process that runs every night using AWS Data Pipeline. This is sensitive data, so the data and any associated metadata must be encrypted at rest. Auditors review some files on a quarterly basis to see whether the records are maintained according to regulations. Auditors must be able to locate any physical file in the S3 bucket for a given date, patient, or physician. Auditors spend a significant amount of time locating such files. What is the most cost- and time-efficient collection methodology in this situation?
A. Use Amazon Kinesis to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.
B. Use Amazon API Gateway to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.
C. Use Amazon S3 event notification to populate an Amazon DynamoDB table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.
D. Use Amazon S3 event notification to populate an Amazon Redshift table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.
A company that provides economics data dashboards needs to be able to develop software to display rich, interactive, data-driven graphics that run in web browsers and leverage the full stack of web standards (HTML, SVG, and CSS). Which technology provides the most appropriate support for these requirements?
A. D3.js
B. IPython/Jupyter
C. R Studio
D. Hue
An administrator tries to use the Amazon Machine Learning service to classify social media posts that mention the administrator's company into posts that require a response and posts that do not. The training dataset of 10,000 posts contains the details of each post including the timestamp, author, and full text of the post. The administrator is missing the target labels that are required for training. Which Amazon Machine Learning model is the most appropriate for the task?
A. Binary classification model, where the target class is the require-response post
B. Binary classification model, where the two classes are the require-response post and does-not-require- response
C. Multi-class prediction model, with two classes: require-response post and does-not-require-response
D. Regression model where the predicted value is the probability that the post requires a response
An online gaming company uses DynamoDB to store user activity logs and is experiencing throttled writes on the company's DynamoDB table. The company is NOT consuming close to the provisioned capacity. The table contains a large number of items and is partitioned on user and sorted by date. The table is 200 GB and is currently provisioned at 10K WCU and 20K RCU. Which two additional pieces of information are required to determine the cause of the throttling? (Choose two.)
A. The structure of any GSIs that have been defined on the table
B. CloudWatch data showing consumed and provisioned write capacity when writes are being throttled
C. Application-level metrics showing the average item size and peak update rates for each attribute
D. The structure of any LSIs that have been defined on the table
E. The maximum historical WCU and RCU for the table
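For choice B, the CloudWatch data in question can be pulled programmatically. A hedged sketch comparing consumed write capacity and throttle events for the table, where the table name and time window are placeholders:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

def table_metric(metric_name: str) -> list:
    """Fetch one DynamoDB table-level metric at 1-minute resolution."""
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/DynamoDB",
        MetricName=metric_name,
        Dimensions=[{"Name": "TableName", "Value": "user-activity-logs"}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Sum"],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

consumed = table_metric("ConsumedWriteCapacityUnits")
throttled = table_metric("WriteThrottleEvents")
```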
A solutions architect works for a company that has a data lake based on a central Amazon S3 bucket. The data contains sensitive information. The architect must be able to specify exactly which files each user can access. Users access the platform through a SAML federation single sign-on platform. The architect needs to build a solution that allows fine-grained access control, traceability of access to the objects, and usage of the standard tools (AWS Console, AWS CLI) to access the data. Which solution should the architect build?
A. Use Amazon S3 Server-Side Encryption with AWS KMS-Managed Keys for storing data. Use AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for auditing.
B. Use Amazon S3 Server-Side Encryption with Amazon S3-Managed Keys. Set Amazon S3 ACLs to allow access to specific elements of the platform. Use Amazon S3 access logs for auditing.
C. Use Amazon S3 Client-Side Encryption with a Client-Side Master Key. Set Amazon S3 ACLs to allow access to specific elements of the platform. Use Amazon S3 access logs for auditing.
D. Use Amazon S3 Client-Side Encryption with AWS KMS-Managed Keys for storing data. Use AWS KMS Grants to allow access to specific elements of the platform. Use AWS CloudTrail for auditing.
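To show the storage side of options A and D, here is a minimal sketch of writing an object with SSE-KMS, which keeps AWS KMS (and therefore AWS CloudTrail) in the access path. The bucket, key ID, and file name are placeholders.

```python
import boto3

s3 = boto3.client("s3")

with open("report.parquet", "rb") as f:
    s3.put_object(
        Bucket="central-data-lake",                               # placeholder bucket
        Key="finance/2024/report.parquet",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="arn:aws:kms:us-east-1:123456789012:key/placeholder-key-id",
    )
# Reading the object back requires kms:Decrypt on this key, and every use of the key
# is recorded by AWS CloudTrail, which provides the traceability requirement.
```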
An organization is setting up a data catalog and metadata management environment for their numerous data stores currently running on AWS. The data catalog will be used to determine the structure and other attributes of data in the data stores. The data stores are composed of Amazon RDS databases, Amazon Redshift, and CSV files residing on Amazon S3. The catalog should be populated on a scheduled basis, and minimal administration is required to manage the catalog. How can this be accomplished?
A. Set up Amazon DynamoDB as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the DynamoDB table.
B. Use an Amazon database as the data catalog and run a scheduled AWS Lambda function that connects to data sources to populate the database.
C. Use AWS Glue Data Catalog as the data catalog and schedule crawlers that connect to data sources to populate the catalog.
D. Set up Apache Hive metastore on an Amazon EC2 instance and run a scheduled bash script that connects to data sources to populate the metastore.
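Option C can be set up with very little code. A hedged sketch of a scheduled AWS Glue crawler over an S3 prefix, where the crawler name, role, database, path, and schedule are placeholders:

```python
import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="s3-csv-crawler",                                    # placeholder name
    Role="arn:aws:iam::123456789012:role/glue-crawler-role",  # needs S3 read + Glue permissions
    DatabaseName="data_catalog",
    Targets={"S3Targets": [{"Path": "s3://company-data/csv/"}]},
    Schedule="cron(0 2 * * ? *)",                             # run nightly at 02:00 UTC
)
# Similar crawlers can target JDBC sources (Amazon RDS, Amazon Redshift)
# via Targets["JdbcTargets"].
```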
A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its Redshift schema. The ORDERS table has foreign key relationships with multiple dimension tables in this schema. How should the company determine the most appropriate distribution key for the ORDERS table?
A. Identify the largest and most frequently joined dimension table and ensure that it and the ORDERS table both have EVEN distribution.
B. Identify the largest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.
C. Identify the smallest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.
D. Identify the largest and the most frequently joined dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.
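For illustration, choosing a distribution key is expressed in the table DDL. A sketch using psycopg2 against a Redshift cluster; the connection details, table layout, and column names are assumptions.

```python
import psycopg2

# Connection parameters are placeholders for an existing Redshift cluster.
conn = psycopg2.connect(
    host="analytics-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="sales", user="admin", password="example-password",
)

ddl = """
CREATE TABLE orders (
    order_id     BIGINT,
    customer_id  BIGINT,   -- key of the large, frequently joined dimension
    order_date   DATE,
    amount       DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id)      -- collocate ORDERS rows with the matching dimension rows
SORTKEY (order_date);
"""

with conn, conn.cursor() as cur:
    cur.execute(ddl)
```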
An administrator is deploying Spark on Amazon EMR for two distinct use cases: machine learning algorithms and ad-hoc querying. All data will be stored in Amazon S3. Two separate clusters for each use case will be deployed. The data volumes on Amazon S3 are less than 10 GB. How should the administrator align instance types with each cluster's purpose?
A. Machine Learning on C instance types and ad-hoc queries on R instance types
B. Machine Learning on R instance types and ad-hoc queries on G2 instance types
C. Machine Learning on T instance types and ad-hoc queries on M instance types
D. Machine Learning on D instance types and ad-hoc queries on I instance types
A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500 MB to 5 GB. These files are processed daily by an EMR job. Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job. Which recommendation should an administrator provide?
A. Reduce the HDFS block size to increase the number of task processors.
B. Use bzip2 or Snappy rather than gzip for the archives.
C. Decompress the gzip archives and store the data as CSV files.
D. Use Avro rather than gzip for the archives.
An administrator needs to design a distribution strategy for a star schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which three circumstances would choosing Key-based distribution be most appropriate? (Select three.)
A. When the administrator needs to optimize a large, slowly changing dimension table.
B. When the administrator needs to reduce cross-node traffic.
C. When the administrator needs to optimize the fact table for parity with the number of slices.
D. When the administrator needs to balance data distribution and data collocation.
E. When the administrator needs to take advantage of data locality on a local node for joins and aggregates.
An enterprise customer is migrating to Redshift and is considering using dense storage nodes in its Redshift cluster. The customer wants to migrate 50 TB of data. The customer's query patterns involve performing many joins with thousands of rows. The customer needs to know how many nodes are needed in its target Redshift cluster. The customer has a limited budget and needs to avoid performing tests unless absolutely needed. Which approach should this customer use?
A. Start with many small nodes.
B. Start with fewer large nodes.
C. Have two separate clusters with a mix of a small and large nodes.
D. Insist on performing multiple tests to determine the optimal configuration.
There are thousands of text files on Amazon S3. The total size of the files is 1 PB. The files contain retail order information for the past 2 years. A data engineer needs to run multiple interactive queries to manipulate the data and has AWS access to spin up an Amazon EMR cluster. The data engineer needs to use an application on the cluster to process this data and return the results in an interactive time frame. Which application on the cluster should the data engineer use?
A. Oozie
B. Apache Pig with Tachyon
C. Apache Hive
D. Presto
A company is using Amazon Machine Learning as part of a medical software application. The application will predict the most likely blood type for a patient based on a variety of other clinical tests that are available when blood type knowledge is unavailable. What is the appropriate model choice and target attribute combination for this problem?
A. Multi-class classification model with a categorical target attribute.
B. Regression model with a numeric target attribute.
C. Binary Classification with a categorical target attribute.
D. K-Nearest Neighbors model with a multi-class target attribute.
A media advertising company handles a large number of real-time messages sourced from over 200 websites. Processing latency must be kept low. Based on calculations, a 60-shard Amazon Kinesis stream is more than sufficient to handle the maximum data throughput, even with traffic spikes. The company also uses an Amazon Kinesis Client Library (KCL) application running on Amazon Elastic Compute Cloud (EC2) managed by an Auto Scaling group. Amazon CloudWatch indicates an average of 25% CPU and a modest level of network traffic across all running servers. The company reports a 150% to 200% increase in latency of processing messages from Amazon Kinesis during peak times. There are NO reports of delay from the sites publishing to Amazon Kinesis. What is the appropriate solution to address the latency?
A. Increase the number of shards in the Amazon Kinesis stream to 80 for greater concurrency.
B. Increase the size of the Amazon EC2 instances to increase network throughput.
C. Increase the minimum number of instances in the Auto Scaling group.
D. Increase Amazon DynamoDB throughput on the checkpoint table.
An organization uses a custom map reduce application to build monthly reports based on many small data files in an Amazon S3 bucket. The data is submitted from various business units on a frequent but unpredictable schedule. As the dataset continues to grow, it becomes increasingly difficult to process all of the data in one day. The organization has scaled up its Amazon EMR cluster, but other optimizations could improve performance. The organization needs to improve performance with minimal changes to existing processes and applications. What action should the organization take?
A. Use Amazon S3 Event Notifications and AWS Lambda to create a quick search file index in DynamoDB.
B. Add Spark to the Amazon EMR cluster and utilize Resilient Distributed Datasets in-memory.
C. Use Amazon S3 Event Notifications and AWS Lambda to index each file into an Amazon Elasticsearch Service cluster.
D. Schedule a daily AWS Data Pipeline process that aggregates content into larger files using S3DistCp.
E. Have business units submit data via Amazon Kinesis Firehose to aggregate data hourly into Amazon S3.
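As a sketch of what option D could look like, an S3DistCp step can be added to an existing EMR cluster to combine many small files into larger ones before the monthly report runs. The cluster ID, bucket paths, grouping pattern, and target size are assumptions.

```python
import boto3

emr = boto3.client("emr")

# Add an S3DistCp step to an existing cluster to combine many small files into larger ones.
emr.add_job_flow_steps(
    JobFlowId="j-EXAMPLECLUSTERID",               # placeholder cluster ID
    Steps=[
        {
            "Name": "aggregate-small-files",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "s3-dist-cp",
                    "--src", "s3://business-unit-uploads/raw/",
                    "--dest", "s3://business-unit-uploads/aggregated/",
                    "--groupBy", r".*(\d{4}-\d{2}-\d{2}).*",   # group files by date in the name
                    "--targetSize", "256",                     # target ~256 MB output files
                ],
            },
        }
    ],
)
```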
A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift. What is the most efficient architecture strategy for this purpose?
A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.
B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.
C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.
D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.
A solutions architect for a logistics organization ships packages from thousands of suppliers to end customers. The architect is building a platform where suppliers can view the status of one or more of their shipments. Each supplier can have multiple roles that will only allow access to specific fields in the resulting information. Which strategy allows the appropriate level of access control and requires the LEAST amount of management work?
A. Send the tracking data to Amazon Kinesis Streams. Use AWS Lambda to store the data in an Amazon DynamoDB table. Generate temporary AWS credentials for the suppliers' users with AWS STS, specifying fine-grained security policies to limit access only to their applicable data.
B. Send the tracking data to Amazon Kinesis Firehose. Use Amazon S3 notifications and AWS Lambda to prepare files in Amazon S3 with appropriate data for each supplier's roles. Generate temporary AWS credentials for the suppliers' users with AWS STS. Limit access to the appropriate files through security policies.
C. Send the tracking data to Amazon Kinesis Streams. Use Amazon EMR with Spark Streaming to store the data in HBase. Create one table per supplier. Use HBase Kerberos integration with the suppliers users. Use HBase ACL-based security to limit access for the roles to their specific table and columns.
D. Send the tracking data to Amazon Kinesis Firehose. Store the data in an Amazon Redshift cluster. Create views for the suppliers' users and roles. Allow suppliers access to the Amazon Redshift cluster using a user limited to the applicable view.
An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region. Which three steps should the data engineer take to accomplish this task? (Choose three.)
A. Create a new KMS key in the destination region.
B. Copy the existing KMS key to the destination region.
C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region.
D. In the source region, enable cross-region replication and specify the name of the copy grant created.
E. In the destination region, enable cross-region replication and specify the name of the copy grant created.
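As background, here is a rough boto3 sketch of the general cross-region snapshot copy mechanism for a KMS-encrypted Redshift cluster. The region names, key description, grant name, and cluster identifier are assumptions; the copy grant is created in the destination region against a key in that region, and copying is then enabled from the source region.

```python
import boto3

# Destination- and source-region clients (region names are placeholders).
kms_dest = boto3.client("kms", region_name="eu-west-1")
redshift_dest = boto3.client("redshift", region_name="eu-west-1")
redshift_src = boto3.client("redshift", region_name="us-east-1")

# 1. A KMS key in the destination region will encrypt the copied snapshots.
key = kms_dest.create_key(Description="Redshift cross-region snapshot copies")

# 2. A snapshot copy grant lets Amazon Redshift use that key in the destination region.
redshift_dest.create_snapshot_copy_grant(
    SnapshotCopyGrantName="xregion-copy-grant",
    KmsKeyId=key["KeyMetadata"]["KeyId"],
)

# 3. In the source region, enable cross-region snapshot copy and name the grant.
redshift_src.enable_snapshot_copy(
    ClusterIdentifier="analytics-cluster",        # placeholder cluster identifier
    DestinationRegion="eu-west-1",
    SnapshotCopyGrantName="xregion-copy-grant",
    RetentionPeriod=7,
)
```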
An organization needs to design and deploy a large-scale data storage solution that will be highly durable and highly flexible with respect to the type and structure of data being stored. The data to be stored will be sent or generated from a variety of sources and must be persistently available for access and processing by multiple applications. What is the most cost-effective technique to meet these requirements?
A. Use Amazon Simple Storage Service (S3) as the actual data storage system, coupled with appropriate tools for ingestion/acquisition of data and for subsequent processing and querying.
B. Deploy a long-running Amazon Elastic MapReduce (EMR) cluster with Amazon Elastic Block Store (EBS) volumes for persistent HDFS storage and appropriate Hadoop ecosystem tools for processing and querying.
C. Use Amazon Redshift with data replication to Amazon Simple Storage Service (S3) for comprehensive durable data storage, processing, and querying.
D. Launch an Amazon Relational Database Service (RDS), and use the enterprise grade and capacity of the Amazon Aurora engine for storage, processing, and querying.
An organization's data warehouse contains sales data for reporting purposes. Data governance policies prohibit staff from accessing the customers' credit card numbers. How can these policies be adhered to while still allowing a Data Scientist to group transactions that use the same credit card number?
A. Store a cryptographic hash of the credit card number.
B. Encrypt the credit card number with a symmetric encryption key, and give the key only to the authorized Data Scientist.
C. Mask the credit card numbers to only show the last four digits of the credit card number.
D. Encrypt the credit card number with an asymmetric encryption key and give the decryption key only to the authorized Data Scientist.
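Option A relies on the fact that the same input always produces the same digest, so equal card numbers can still be grouped without being readable. A small illustration follows; in practice a keyed scheme such as HMAC (shown here) or another salted construction would be preferable to a bare hash, and the key shown is a placeholder.

```python
import hashlib
import hmac

SECRET_KEY = b"placeholder-pepper"   # assumption: kept outside the warehouse

def card_token(card_number: str) -> str:
    """Deterministic token: identical card numbers map to the same value,
    but the original number cannot be read back from the token."""
    return hmac.new(SECRET_KEY, card_number.encode("utf-8"), hashlib.sha256).hexdigest()

# The same card yields the same token, so transactions can be grouped on it.
assert card_token("4111111111111111") == card_token("4111111111111111")
```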
A gas company needs to monitor gas pressure in their pipelines. Pressure data is streamed from sensors placed throughout the pipelines to monitor the data in real time. When an anomaly is detected, the system must send a notification to open a valve. An Amazon Kinesis stream collects the data from the sensors, and an anomaly Kinesis stream triggers an AWS Lambda function to open the appropriate valve. Which solution is the MOST cost-effective for responding to anomalies in real time?
A. Attach a Kinesis Firehose to the stream and persist the sensor data in an Amazon S3 bucket. Schedule an AWS Lambda function to run a query in Amazon Athena against the data in Amazon S3 to identify anomalies. When a change is detected, the Lambda function sends a message to the anomaly stream to open the valve.
B. Launch an Amazon EMR cluster that uses Spark Streaming to connect to the Kinesis stream and Spark machine learning to detect anomalies. When a change is detected, the Spark application sends a message to the anomaly stream to open the valve.
C. Launch a fleet of Amazon EC2 instances with a Kinesis Client Library application that consumes the stream and aggregates sensor data over time to identify anomalies. When an anomaly is detected, the application sends a message to the anomaly stream to open the valve.
D. Create a Kinesis Analytics application by using the RANDOM_CUT_FOREST function to detect an anomaly. When the anomaly score that is returned from the function is outside of an acceptable range, a message is sent to the anomaly stream to open the valve.
An online photo album app has a key design feature to support multiple screens (e.g., desktop, mobile phone, and tablet) with high-quality displays. Multiple versions of the image must be saved in different resolutions and layouts. The image-processing Java program takes an average of five seconds per upload, depending on the image size and format. Each image upload captures the following image metadata: user, album, photo label, upload timestamp. The app should support the following requirements:
- Hundreds of user image uploads per second
- Maximum image upload size of 10 MB
- Maximum image metadata size of 1 KB
- Image displayed in optimized resolution on all supported screens no later than one minute after image upload
Which strategy should be used to meet these requirements?
A. Write images and metadata to Amazon Kinesis. Use a Kinesis Client Library (KCL) application to run the image processing and save the image output to Amazon S3 and metadata to the app repository DB.
B. Write the image and metadata to Amazon RDS with a BLOB data type. Use AWS Data Pipeline to run the image processing and save the image output to Amazon S3 and the metadata to the app repository DB.
C. Upload the image with metadata to Amazon S3, use an AWS Lambda function to run the image processing, and save the image output to Amazon S3 and the metadata to the app repository DB.
D. Write image and metadata to Amazon Kinesis. Use Amazon Elastic MapReduce (EMR) with Spark Streaming to run image processing and save the images output to Amazon S3 and metadata to app repository DB.
An administrator is processing events in near real-time using Kinesis streams and Lambda. Lambda intermittently fails to process batches from one of the shards due to a 5-minute time limit. What is a possible solution for this problem?
A. Add more Lambda functions to improve concurrent batch processing.
B. Reduce the batch size that Lambda is reading from the stream.
C. Ignore and skip events that are older than 5 minutes and send them to a Dead Letter Queue (DLQ).
D. Configure Lambda to read from fewer shards in parallel.
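Option B corresponds to a small change on the Lambda event source mapping. A hedged sketch, where the function name is a placeholder and the first mapping is assumed to be the Kinesis one:

```python
import boto3

lambda_client = boto3.client("lambda")

# Find the event source mapping that connects the Kinesis stream to the function,
# then lower its batch size so each invocation finishes well inside the time limit.
mappings = lambda_client.list_event_source_mappings(FunctionName="process-events")
uuid = mappings["EventSourceMappings"][0]["UUID"]

lambda_client.update_event_source_mapping(UUID=uuid, BatchSize=100)
```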
An organization is soliciting public feedback through a web portal that has been deployed to track the number of requests and other important data. As part of reporting and visualization, Amazon QuickSight connects to an Amazon RDS database to visualize data. Management wants to understand some important metrics about feedback and how the feedback has changed over the last four weeks in a visual representation. What would be the MOST effective way to represent multiple iterations of an analysis in Amazon QuickSight that would show how the data has changed over the last four weeks?
A. Use the analysis option for data captured in each week and view the data by a date range.
B. Use a pivot table as a visual option to display measured values and weekly aggregate data as a row dimension.
C. Use a dashboard option to create an analysis of the data for each week and apply filters to visualize the data change.
D. Use a story option to preserve multiple iterations of an analysis and play the iterations sequentially.
An organization is designing a public web application and has a requirement that states all application users must be centrally authenticated before any operations are permitted. The organization will need to create a user table with fast data lookup for the application in which a user can read only his or her own data. All users already have an account with amazon.com. How can these requirements be met?
A. Create an Amazon RDS Aurora table, with Amazon_ID as the primary key. The application uses amazon.com web identity federation to get a token that is used to assume an IAM role from AWS STS. Use IAM database authentication by using the rds:db-tag IAM authentication policy and GRANT Amazon RDS row-level read permission per user.
B. Create an Amazon RDS Aurora table, with Amazon_ID as the primary key for each user. The application uses amazon.com web identity federation to get a token that is used to assume an IAM role. Use IAM database authentication by using the rds:db-tag IAM authentication policy and GRANT Amazon RDS row-level read permission per user.
C. Create an Amazon DynamoDB table, with Amazon_ID as the partition key. The application uses amazon.com web identity federation to get a token that is used to assume an IAM role from AWS STS. In the role, use the IAM condition context key dynamodb:LeadingKeys with the IAM substitution variable ${www.amazon.com:user_id} and allow the required DynamoDB API operations in the IAM JSON policy Action element for reading the records.
D. Create an Amazon DynamoDB table, with Amazon_ID as the partition key. The application uses amazon.com web identity federation to assume an IAM role from AWS STS. In the role, use the IAM condition context key dynamodb:LeadingKeys with the IAM substitution variable ${www.amazon.com:user_id} and allow the required DynamoDB API operations in the IAM JSON policy Action element for reading the records.
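The dynamodb:LeadingKeys condition referenced in options C and D restricts queries to items whose partition key equals the caller's federated identity. A sketch of such a policy expressed as a Python dictionary, with a placeholder table ARN:

```python
# IAM policy document (as a Python dict) limiting reads to the caller's own items.
read_own_rows_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/Users",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Partition key must match the amazon.com federated user id.
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
                }
            },
        }
    ],
}
```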
A systems engineer for a company proposes digitalization and backup of large archives for customers. The systems engineer needs to provide users with secure storage that ensures data can never be tampered with once it has been uploaded. How should this be accomplished?
A. Create an Amazon Glacier Vault. Specify a “Deny” Vault Lock policy on this Vault to block “glacier:DeleteArchive”.
B. Create an Amazon S3 bucket. Specify a “Deny” bucket policy on this bucket to block “s3:DeleteObject”.
C. Create an Amazon Glacier Vault. Specify a “Deny” vault access policy on this Vault to block “glacier:DeleteArchive”.
D. Create secondary AWS Account containing an Amazon S3 bucket. Grant “s3:PutObject” to the primary account.
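For option A, a vault lock policy is attached and then locked so it can no longer be changed. A minimal boto3 sketch, where the account ID, vault name, and policy ARN are placeholders:

```python
import json

import boto3

glacier = boto3.client("glacier")

deny_delete_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "deny-archive-deletion",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "glacier:DeleteArchive",
            "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/customer-archives",
        }
    ],
}

# Initiate the lock; it must be completed with the returned lock ID within 24 hours,
# after which the policy becomes immutable.
resp = glacier.initiate_vault_lock(
    accountId="-",                       # "-" means the credentials' own account
    vaultName="customer-archives",
    policy={"Policy": json.dumps(deny_delete_policy)},
)
glacier.complete_vault_lock(accountId="-", vaultName="customer-archives",
                            lockId=resp["lockId"])
```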
An organization currently runs a large Hadoop environment in their data center and is in the process of creating an alternative Hadoop environment on AWS, using Amazon EMR. They generate around 20 TB of data on a monthly basis. Also on a monthly basis, files need to be grouped and copied to Amazon S3 to be used for the Amazon EMR environment. They have multiple S3 buckets across AWS accounts to which data needs to be copied. There is a 10 Gbps AWS Direct Connect connection between their data center and AWS, and the network team has agreed to allocate 50% of the AWS Direct Connect bandwidth to data transfer. The data transfer cannot take more than two days. What would be the MOST efficient approach to transfer data to AWS on a monthly basis?
A. Use an offline copy method, such as an AWS Snowball device, to copy and transfer data to Amazon S3.
B. Configure a multipart upload for Amazon S3 on AWS Java SDK to transfer data over AWS Direct Connect.
C. Use Amazon S3 transfer acceleration capability to transfer data to Amazon S3 over AWS Direct Connect.
D. Set up the S3DistCp tool on the on-premises Hadoop environment to transfer data to Amazon S3 over AWS Direct Connect.
An administrator needs to manage a large catalog of items from various external sellers. The administrator needs to determine if the items should be identified as minimally dangerous, dangerous, or highly dangerous based on their textual descriptions. The administrator already has some items with the danger attribute, but receives hundreds of new item descriptions every day without such classification. The administrator has a system that captures dangerous goods reports from the customer support team or from user feedback. What is a cost-effective architecture to solve this issue?
A. Build a set of regular expression rules that are based on the existing examples, and run them on the DynamoDB Streams as every new item description is added to the system.
B. Build a Kinesis Streams process that captures and marks the relevant items in the dangerous goods reports using a Lambda function once more than two reports have been filed.
C. Build a machine learning model to properly classify dangerous goods and run it on the DynamoDB Streams as every new item description is added to the system.
D. Build a machine learning model with binary classification for dangerous goods and run it on the DynamoDB Streams as every new item description is added to the system.
A company has several teams of analysts. Each team of analysts has their own cluster. The teams need to run SQL queries using Hive, Spark-SQL, and Presto with Amazon EMR. The company needs to enable a centralized metadata layer to expose the Amazon S3 objects as tables to the analysts. Which approach meets the requirement for a centralized metadata layer?
A. EMRFS consistent view with a common Amazon DynamoDB table
B. Bootstrap action to change the Hive Metastore to an Amazon RDS database
C. s3distcp with the outputManifest option to generate RDS DDL
D. Naming scheme support with automatic partition discovery from Amazon S3
An Amazon Kinesis stream needs to be encrypted. Which approach should be used to accomplish this task?
A. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the producer.
B. Use a partition key to segment the data by MD5 hash function, which makes it undecipherable while in transit.
C. Perform a client-side encryption of the data before it enters the Amazon Kinesis stream on the consumer.
D. Use a shard to segment the data, which has built-in functionality to make it indecipherable while in transit.
A clinical trial will rely on medical sensors to remotely assess patient health. Each physician who participates in the trial requires visual reports each morning. The reports are built from aggregations of all the sensor data taken each minute. What is the most cost-effective solution for creating this visualization each day?
A. Use Kinesis Aggregators Library to generate reports for reviewing the patient sensor data and generate a QuickSight visualization on the new data each morning for the physician to review.
B. Use a transient EMR cluster that shuts down after use to aggregate the patient sensor data each night and generate a QuickSight visualization on the new data each morning for the physician to review.
C. Use Spark Streaming on EMR to aggregate the patient sensor data every 15 minutes and generate a QuickSight visualization on the new data each morning for the physician to review.
D. Use an EMR cluster to aggregate the patient sensor data each night and provide Zeppelin notebooks that look at the new data residing on the cluster each morning for the physician to review.
Access Full BDS-C00 Mock Test Free
Want a full-length mock test experience? Click here to unlock the complete BDS-C00 Mock Test Free set and get access to hundreds of additional practice questions covering all key topics.
We regularly update our question sets to stay aligned with the latest exam objectives—so check back often for fresh content!
Start practicing with our BDS-C00 mock test free today—and take a major step toward exam success!