1.
You are designing a solution that will use Apache HBase on Microsoft Azure HDInsight. You need to design the row keys for the database to ensure that client traffic is directed over all of the nodes in the cluster. What are two possible techniques that you can use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.
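For illustration only, one way to spread lexicographically adjacent row keys across regions is to prefix each key with a small, deterministic bucket derived from a hash of the key. The sketch below is a minimal example under stated assumptions: the bucket count, the `|` separator, and the function name `salted_row_key` are hypothetical and do not come from the question or from any HBase API.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count; chosen only for this sketch


def salted_row_key(natural_key: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix a natural key with a deterministic bucket taken from its hash.

    The same natural key always maps to the same bucket, so a reader can
    recompute the prefix, while adjacent keys tend to land in different buckets.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}|{natural_key}"


if __name__ == "__main__":
    # Sequential, timestamp-like keys would otherwise sort next to each other.
    for i in range(5):
        print(salted_row_key(f"2024-01-15T10:00:{i:02d}"))
```

The trade-off in a scheme like this is that range scans over the natural key must fan out across every bucket prefix.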
2.
A company named Fabrikam, Inc. has a Microsoft Azure web app. Billions of users visit the app daily. The web app logs all user activity by using text files in Azure Blob storage. Each day, approximately 200 GB of text files are created. Fabrikam uses the log files from an Apache Hadoop cluster on Azure HDInsight. You need to recommend a solution to optimize the storage of the log files for later Hive use. What is the best property to recommend adding to the Hive table definition to achieve the goal? More than one answer choice may achieve the goal. Select the BEST answer.
3.
You have structured data that resides in Microsoft Azure Blob storage. You need to perform a rapid interactive analysis of the data and to generate visualizations of the data. What is the best type of Azure HDInsight cluster to use to achieve the goal? More than one answer choice may achieve the goal. Choose the BEST answer.
4.
You are designing a solution based on the lambda architecture. You need to recommend which technology to use for the serving layer. What should you recommend?
5.
Your company has thousands of Internet-connected sensors. You need to recommend a computing solution to perform a real-time analysis of the data generated by the sensors. Which computing solution should you include in the recommendation?
6.
You are designing an Internet of Things (IoT) solution intended to identify trends. The solution requires the real-time analysis of data originating from sensors. The results of the analysis will be stored in a SQL database. You need to recommend a data processing solution that uses the Transact-SQL language. Which data processing solution should you recommend?
7.
You have an Apache Storm cluster. The cluster will ingest data from a Microsoft Azure event hub. The event hub has the characteristics described in the following table. You are designing the Storm application topology. You need to ingest data from all of the partitions. The solution must maximize the throughput of the data ingestion. Which setting should you use?
8.

A company named Fabrikam, Inc. has a web app. Millions of users visit the app daily. Fabrikam performs a daily analysis of the previous day's logs by scheduling the following Hive query.
CREATE EXTERNAL TABLE IF NOT EXISTS UserActivity (...)
PARTITIONED BY (LogDate string)
LOCATION ...;
MSCK REPAIR TABLE UserActivity;
SELECT ... FROM UserActivity WHERE LogDate = "{date}";
You need to recommend a solution to gather the log collections from the web app. What should you recommend?
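As background for the query above: Hive lays out partitioned data in directories named `column=value`, and `MSCK REPAIR TABLE` registers partition directories that exist in storage but not yet in the metastore. The sketch below shows how a daily job might build such a directory name and fill the `{date}` placeholder. The base path, the ISO date format, and both function names are assumptions made for illustration, since the question elides the table location.

```python
from datetime import date


def partition_dir(base: str, log_date: date) -> str:
    """Build a Hive-style partition directory for one day of logs.

    The directory key is lowercase because Hive stores column names in
    lowercase; the ISO date format is an assumption of this sketch.
    """
    return f"{base.rstrip('/')}/logdate={log_date.isoformat()}"


def daily_query(log_date: date) -> str:
    """Fill the {date} placeholder in the scheduled query text."""
    template = 'SELECT ... FROM UserActivity WHERE LogDate = "{date}";'
    return template.format(date=log_date.isoformat())


if __name__ == "__main__":
    day = date(2024, 1, 15)
    # "/logs/useractivity" is a hypothetical base path.
    print(partition_dir("/logs/useractivity", day))
    print(daily_query(day))
```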
9.
You have a Microsoft Azure Data Factory pipeline. You discover that the pipeline fails to execute because data is missing. You need to rerun the failure in the pipeline. Which cmdlet should you use?
10.
You need to recommend a platform architecture for a big data solution that meets the following requirements: supports batch processing; provides a holding area for a 3-petabyte (PB) dataset; minimizes the development effort to implement the solution; provides near real-time relational querying across a multi-terabyte (TB) dataset. Which two platform architectures should you include in the recommendation? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.