Question # 1

You support a high-traffic web application and want to ensure that the home page loads in a timely manner. As a first step, you decide to implement a Service Level Indicator (SLI) to represent home page request latency with an acceptable page load time set to 100 ms. What is the Google-recommended way of calculating this SLI? 

A. Bucketize the request latencies into ranges, and then compute the percentile at 100 ms. 
B. Bucketize the request latencies into ranges, and then compute the median and 90th percentiles. 
C. Count the number of home page requests that load in under 100 ms, and then divide by the total number of home page requests. 
D. Count the number of home page requests that load in under 100 ms, and then divide by the total number of all web application requests. 
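
The two counting options differ only in their denominator. As a rough illustration, the sketch below computes both ratios over a small made-up sample. The `Request` tuple, the `fast_request_ratio` function, the "/" home page path, and the sample latencies are all hypothetical and are not taken from any Google library.

```python
from typing import NamedTuple, Optional, Sequence


class Request(NamedTuple):
    """Hypothetical record of one served request."""
    path: str
    latency_ms: float


def fast_request_ratio(
    requests: Sequence[Request],
    home_path: str = "/",
    threshold_ms: float = 100.0,
    home_page_only: bool = True,
) -> Optional[float]:
    """Return (home page requests under threshold) / (chosen denominator).

    home_page_only=True divides by home page requests only;
    home_page_only=False divides by every request in the sample.
    """
    home = [r for r in requests if r.path == home_path]
    good = sum(1 for r in home if r.latency_ms < threshold_ms)
    total = len(home) if home_page_only else len(requests)
    if total == 0:
        return None  # no traffic, so the ratio is undefined
    return good / total


# Made-up sample: four home page requests and one request to another page.
sample = [
    Request("/", 80),
    Request("/", 120),
    Request("/", 95),
    Request("/cart", 300),
    Request("/", 60),
]

ratio_home_only = fast_request_ratio(sample, home_page_only=True)
ratio_all_requests = fast_request_ratio(sample, home_page_only=False)
```

With this sample, three of the four home page requests finish in under 100 ms, so the first ratio is 0.75, while dividing by all five requests gives 0.6. Traffic to unrelated pages moves the second number even when home page latency does not change.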

Question # 2

You are managing the production deployment to a set of Google Kubernetes Engine (GKE) clusters. You want to make sure only images which are successfully built by your trusted CI/CD pipeline are deployed to production. What should you do? 

A. Enable Cloud Security Scanner on the clusters. 
B. Enable Vulnerability Analysis on the Container Registry. 
C. Set up the Kubernetes Engine clusters as private clusters. 
D. Set up the Kubernetes Engine clusters with Binary Authorization.

Question # 3

You are on-call for an infrastructure service that has a large number of dependent systems. You receive an alert indicating that the service is failing to serve most of its requests and all of its dependent systems with hundreds of thousands of users are affected. As part of your Site Reliability Engineering (SRE) incident management protocol, you declare yourself Incident Commander (IC) and pull in two experienced people from your team as Operations Lead (OL) and Communications Lead (CL). What should you do next? 

A. Look for ways to mitigate user impact and deploy the mitigations to production. 
B. Contact the affected service owners and update them on the status of the incident. 
C. Establish a communication channel where incident responders and leads can communicate with each other.
D. Start a postmortem, add incident information, circulate the draft internally, and ask internal stakeholders for input. 

Question # 4

You have a CI/CD pipeline that uses Cloud Build to build new Docker images and push them to Docker Hub. You use Git for code versioning. After making a change in the Cloud Build YAML configuration, you notice that no new artifacts are being built by the pipeline. You need to resolve the issue following Site Reliability Engineering practices. What should you do? 

A. Disable the CI pipeline and revert to manually building and pushing the artifacts. 
B. Change the CI pipeline to push the artifacts to Container Registry instead of Docker Hub. 
C. Upload the configuration YAML file to Cloud Storage and use Error Reporting to identify and fix the issue.
D. Run a Git compare between the previous and current Cloud Build configuration files to find and fix the bug. 

Question # 5

You support an application running on App Engine. The application is used globally and accessed from various device types. You want to know the number of connections. You are using Stackdriver Monitoring for App Engine. What metric should you use? 

A. flex/connections/current 
B. tcp_ssl_proxy/new_connections 
C. tcp_ssl_proxy/open_connections 
D. flex/instance/connections/current 

Question # 6

You support a multi-region web service running on Google Kubernetes Engine (GKE) behind a Global HTTP(S) Cloud Load Balancer (CLB). For legacy reasons, user requests first go through a third-party Content Delivery Network (CDN), which then routes traffic to the CLB. You have already implemented an availability Service Level Indicator (SLI) at the CLB level. However, you want to increase coverage in case of a potential load balancer misconfiguration, CDN failure, or other global networking catastrophe. Where should you measure this new SLI? Choose 2 answers 

A. Your application servers' logs 
B. Instrumentation coded directly in the client
C. Metrics exported from the application servers 
D. GKE health checks for your application servers 
E. A synthetic client that periodically sends simulated user requests 

Question # 7

You need to run a business-critical workload on a fixed set of Compute Engine instances for several months. The workload is stable with the exact amount of resources allocated to it. You want to lower the costs for this workload without any performance implications. What should you do? 

A. Purchase Committed Use Discounts. 
B. Migrate the instances to a Managed Instance Group. 
C. Convert the instances to preemptible virtual machines. 
D. Create an Unmanaged Instance Group for the instances used to run the workload. 

Question # 8

You support an application running on GCP and want to configure SMS notifications to your team for the most critical alerts in Stackdriver Monitoring. You have already identified the alerting policies you want to configure this for. What should you do?

A. Download and configure a third-party integration between Stackdriver Monitoring and an SMS gateway. Ensure that your team members add their SMS/phone numbers to the external tool. 
B. Select the Webhook notifications option for each alerting policy, and configure it to use a third-party integration tool. Ensure that your team members add their SMS/phone numbers to the external tool. 
C. Ensure that your team members set their SMS/phone numbers in their Stackdriver Profile. Select the SMS notification option for each alerting policy and then select the appropriate SMS/phone numbers from the list.
D. Configure a Slack notification for each alerting policy. Set up a Slack-to-SMS integration to send SMS messages when Slack messages are received. Ensure that your team members add their SMS/phone numbers to the external integration. 

Question # 9

You support an e-commerce application that runs on a large Google Kubernetes Engine (GKE) cluster deployed on-premises and on Google Cloud Platform. The application consists of microservices that run in containers. You want to identify containers that are using the most CPU and memory. What should you do? 

A. Use Stackdriver Kubernetes Engine Monitoring. 
B. Use Prometheus to collect and aggregate logs per container, and then analyze the results in Grafana.
C. Use the Stackdriver Monitoring API to create custom metrics, and then organize your containers using groups. 
D. Use Stackdriver Logging to export application logs to BigQuery, aggregate logs per container, and then analyze CPU and memory consumption.

Question # 10

You encountered a major service outage that affected all users of the service for multiple hours. After several hours of incident management, the service returned to normal, and user access was restored. You need to provide an incident summary to relevant stakeholders following the Site Reliability Engineering recommended practices. What should you do first? 

A. Call individual stakeholders to explain what happened. 
B. Develop a postmortem to be distributed to stakeholders.
C. Send the Incident State Document to all the stakeholders. 
D. Require the engineer responsible to write an apology email to all stakeholders. 

Question # 11

You use Cloud Build to build your application. You want to reduce the build time while minimizing cost and development effort. What should you do? 

A. Use Cloud Storage to cache intermediate artifacts. 
B. Run multiple Jenkins agents to parallelize the build. 
C. Use multiple smaller build steps to minimize execution time. 
D. Use larger Cloud Build virtual machines (VMs) by using the machine-type option. 

Question # 12

You have an application running in Google Kubernetes Engine. The application invokes multiple services per request but responds too slowly. You need to identify which downstream service or services are causing the delay. What should you do? 

A. Analyze VPC flow logs along the path of the request. 
B. Investigate the Liveness and Readiness probes for each service. 
C. Create a Dataflow pipeline to analyze service metrics in real time. 
D. Use a distributed tracing framework such as OpenTelemetry or Stackdriver Trace. 

Question # 13

Your application images are built and pushed to Google Container Registry (GCR). You want to build an automated pipeline that deploys the application when the image is updated while minimizing the development effort. What should you do? 

A. Use Cloud Build to trigger a Spinnaker pipeline. 
B. Use Cloud Pub/Sub to trigger a Spinnaker pipeline. 
C. Use a custom builder in Cloud Build to trigger a Jenkins pipeline. 
D. Use Cloud Pub/Sub to trigger a custom deployment service running in Google Kubernetes Engine (GKE). 

Question # 14

You are running a real-time gaming application on Compute Engine that has a production and a testing environment. Each environment has its own Virtual Private Cloud (VPC) network. The application frontend and backend servers are located on different subnets in the environment's VPC. You suspect there is a malicious process communicating intermittently in your production frontend servers. You want to ensure that network traffic is captured for analysis. What should you do?

A. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only with a sample volume scale of 0.5. 
B. Enable VPC Flow Logs on the production VPC network frontend and backend subnets only with a sample volume scale of 1.0. 
C. Enable VPC Flow Logs on the testing and production VPC network frontend and backend subnets with a volume scale of 0.5. Apply changes in testing before production. 
D. Enable VPC Flow Logs on the testing and production VPC network frontend and backend subnets with a volume scale of 1.0. Apply changes in testing before production. 

Question # 15

Your company follows Site Reliability Engineering practices. You are the Incident Commander for a new, customer-impacting incident. You need to immediately assign two incident management roles to assist you in an effective incident response. What roles should you assign? Choose 2 answers 

A. Operations Lead 
B. Engineering Lead 
C. Communications Lead 
D. Customer Impact Assessor 
E. External Customer Communications Lead 

Question # 16

You are managing an application that exposes an HTTP endpoint without using a load balancer. The latency of the HTTP responses is important for the user experience. You want to understand what HTTP latencies all of your users are experiencing. You use Stackdriver Monitoring. What should you do? 

A. • In your application, create a metric with a metricKind set to DELTA and a valueType set to DOUBLE. • In Stackdriver's Metrics Explorer, use a Stacked Bar graph to visualize the metric. 
B. • In your application, create a metric with a metricKind set to CUMULATIVE and a valueType set to DOUBLE. • In Stackdriver's Metrics Explorer, use a Line graph to visualize the metric. 
C. • In your application, create a metric with a metricKind set to GAUGE and a valueType set to DISTRIBUTION. • In Stackdriver's Metrics Explorer, use a Heatmap graph to visualize the metric. 
D. • In your application, create a metric with a metricKind set to METRIC_KIND_UNSPECIFIED and a valueType set to INT64. • In Stackdriver's Metrics Explorer, use a Stacked Area graph to visualize the metric. 

Question # 17

You need to deploy a new service to production. The service needs to automatically scale using a Managed Instance Group (MIG) and should be deployed over multiple regions. The service needs a large number of resources for each instance and you need to plan for capacity. What should you do? 

A. Use the n1-highcpu-96 machine type in the configuration of the MIG. 
B. Monitor results of Stackdriver Trace to determine the required amount of resources.
C. Validate that the resource requirements are within the available quota limits of each region. 
D. Deploy the service in one region and use a global load balancer to route traffic to this region.

Question # 18

Some of your production services are running in Google Kubernetes Engine (GKE) in the eu-west-1 region. Your build system runs in the us-west-1 region. You want to push the container images from your build system to a scalable registry to maximize the bandwidth for transferring the images to the cluster. What should you do? 

A. Push the images to Google Container Registry (GCR) using the gcr.io hostname. 
B. Push the images to Google Container Registry (GCR) using the us.gcr.io hostname. 
C. Push the images to Google Container Registry (GCR) using the eu.gcr.io hostname. 
D. Push the images to a private image registry running on a Compute Engine instance in the eu-west-1 region. 

Question # 19

You support a trading application written in Python and hosted on App Engine flexible environment. You want to customize the error information being sent to Stackdriver Error Reporting. What should you do? 

A. Install the Stackdriver Error Reporting library for Python, and then run your code on a Compute Engine VM. 
B. Install the Stackdriver Error Reporting library for Python, and then run your code on Google Kubernetes Engine. 
C. Install the Stackdriver Error Reporting library for Python, and then run your code on App Engine flexible environment. 
D. Use the Stackdriver Error Reporting API to write errors from your application to ReportedErrorEvent, and then generate log entries with properly formatted error messages in Stackdriver Logging. 

Question # 20

You are performing a semiannual capacity planning exercise for your flagship service. You expect a service user growth rate of 10% month-over-month over the next six months. Your service is fully containerized and runs on Google Cloud Platform (GCP), using a Google Kubernetes Engine (GKE) Standard regional cluster on three zones with cluster autoscaler enabled. You currently consume about 30% of your total deployed CPU capacity, and you require resilience against the failure of a zone. You want to ensure that your users experience minimal negative impact as a result of this growth or as a result of zone failure, while avoiding unnecessary costs. How should you prepare to handle the predicted growth?

A. Verify the maximum node pool size, enable a horizontal pod autoscaler, and then perform a load test to verify your expected resource needs. 
B. Because you are deployed on GKE and are using a cluster autoscaler, your GKE cluster will scale automatically, regardless of growth rate. 
C. Because you are at only 30% utilization, you have significant headroom and you won't need to add any additional capacity for this rate of growth.
D. Proactively add 60% more node capacity to account for six months of 10% growth rate, and then perform a load test to make sure you have enough capacity. 
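
The figures in this scenario compound, so a quick calculation helps when weighing the options. The sketch below is only a back-of-the-envelope estimate. It assumes CPU demand grows in proportion to the user count, that today's deployed capacity stays fixed, and that capacity is spread evenly across the three zones. None of these assumptions is stated in the question, and the variable names are illustrative.

```python
# Figures taken from the scenario.
monthly_growth = 0.10        # 10% month-over-month user growth
months = 6                   # planning horizon
current_utilization = 0.30   # share of deployed CPU capacity in use today
zones = 3                    # regional cluster spread over three zones

# Month-over-month growth compounds rather than adding up linearly.
growth_factor = (1 + monthly_growth) ** months

# Assumption: CPU demand scales linearly with the number of users.
projected_utilization = current_utilization * growth_factor

# Assumption: capacity is evenly spread, so losing one zone leaves 2/3 of it.
surviving_fraction = (zones - 1) / zones
utilization_after_zone_loss = projected_utilization / surviving_fraction

print(f"growth factor over {months} months: {growth_factor:.2f}")
print(f"projected utilization of today's capacity: {projected_utilization:.0%}")
print(f"same load with one zone down: {utilization_after_zone_loss:.0%}")
```

Under these assumptions, six months of 10% growth is a factor of about 1.77 rather than 1.6, which would put today's capacity at roughly 53% utilization, or about 80% if one zone were lost. A load test is still needed to check whether the linear-scaling assumption holds.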

Question # 21

You are running an application on Compute Engine and collecting logs through Stackdriver. You discover that some personally identifiable information (PII) is leaking into certain log entry fields. You want to prevent these fields from being written in new log entries as quickly as possible. What should you do? 

A. Use the filter-record-transformer Fluentd filter plugin to remove the fields from the log entries in flight. 
B. Use the fluent-plugin-record-reformer Fluentd output plugin to remove the fields from the log entries in flight. 
C. Wait for the application developers to patch the application, and then verify that the log entries are no longer exposing PII. 
D. Stage log entries to Cloud Storage, and then trigger a Cloud Function to remove the fields and write the entries to Stackdriver via the Stackdriver Logging API. 

Question # 22

You manage several production systems that run on Compute Engine in the same Google Cloud Platform (GCP) project. Each system has its own set of dedicated Compute Engine instances. You want to know how much it costs to run each of the systems. What should you do? 

A. In the Google Cloud Platform Console, use the Cost Breakdown section to visualize the costs per system.
B. Assign all instances a label specific to the system they run. Configure BigQuery billing export and query costs per label. 
C. Enrich all instances with metadata specific to the system they run. Configure Stackdriver Logging to export to BigQuery, and query costs based on the metadata. 
D. Name each virtual machine (VM) after the system it runs. Set up a usage report export to a Cloud Storage bucket. Configure the bucket as a source in BigQuery to query costs based on VM name. 

Question # 23

Your organization recently adopted a container-based workflow for application development. Your team develops numerous applications that are deployed continuously through an automated build pipeline to a Kubernetes cluster in the production environment. The security auditor is concerned that developers or operators could circumvent automated testing and push code changes to production without approval. What should you do to enforce approvals? 

A. Configure the build system with protected branches that require pull request approval. 
B. Use an Admission Controller to verify that incoming requests originate from approved sources. 
C. Leverage Kubernetes Role-Based Access Control (RBAC) to restrict access to only approved users. 
D. Enable binary authorization inside the Kubernetes cluster and configure the build pipeline as an attestor. 

Question # 24

Your organization recently adopted a container-based workflow for application development. Your team develops numerous applications that are deployed continuously through an automated build pipeline to the production environment. A recent security audit alerted your team that the code pushed to production could contain vulnerabilities and that the existing tooling around virtual machine (VM) vulnerabilities no longer applies to the containerized environment. You need to ensure the security and patch level of all code running through the pipeline. What should you do? 

A. Set up Container Analysis to scan and report Common Vulnerabilities and Exposures. 
B. Configure the containers in the build pipeline to always update themselves before release.
C. Reconfigure the existing operating system vulnerability software to exist inside the container. 
D. Implement static code analysis tooling against the Docker files used to create the containers. 

Question # 25

You are creating and assigning action items in a postmortem for an outage. The outage is over, but you need to address the root causes. You want to ensure that your team handles the action items quickly and efficiently. How should you assign owners and collaborators to action items? 

A. Assign one owner for each action item and any necessary collaborators. 
B. Assign multiple owners for each item to guarantee that the team addresses items quickly. 
C. Assign collaborators but no individual owners to the items to keep the postmortem blameless.
D. Assign the team lead as the owner for all action items because they are in charge of the SRE team. 

Question # 26

You are ready to deploy a new feature of a web-based application to production. You want to use Google Kubernetes Engine (GKE) to perform a phased rollout to half of the web server pods. What should you do? 

A. Use a partitioned rolling update. 
B. Use Node taints with NoExecute. 
C. Use a replica set in the deployment specification. 
D. Use a stateful set with parallel pod management policy. 

Question # 27

You need to reduce the cost of virtual machines (VMs) for your organization. After reviewing different options, you decide to leverage preemptible VM instances. Which application is suitable for preemptible VMs? 

A. A scalable in-memory caching system 
B. The organization's public-facing website 
C. A distributed, eventually consistent NoSQL database cluster with sufficient quorum 
D. A GPU-accelerated video rendering platform that retrieves and stores videos in a storage bucket 

Question # 28

You have a pool of application servers running on Compute Engine. You need to provide a secure solution that requires the least amount of configuration and allows developers to easily access application logs for troubleshooting. How would you implement the solution on GCP?

A. • Deploy the Stackdriver logging agent to the application servers. • Give the developers the IAM Logs Viewer role to access Stackdriver and view logs. 
B. • Deploy the Stackdriver logging agent to the application servers. • Give the developers the IAM Private Logs Viewer role to access Stackdriver and view logs. 
C. • Deploy the Stackdriver monitoring agent to the application servers. • Give the developers the IAM Monitoring Viewer role to access Stackdriver and view metrics. 
D. • Install the gsutil command line tool on your application servers. • Write a script using gsutil to upload your application log to a Cloud Storage bucket, and then schedule it to run via cron every 5 minutes. • Give the developers IAM Object Viewer access to view the logs in the specified bucket. 

Question # 29

You support a service with a well-defined Service Level Objective (SLO). Over the previous 6 months, your service has consistently met its SLO and customer satisfaction has been consistently high. Most of your service’s operations tasks are automated and few repetitive tasks occur frequently. You want to optimize the balance between reliability and deployment velocity while following site reliability engineering best practices. What should you do? (Choose two.) 

A. Make the service’s SLO more strict. 
B. Increase the service’s deployment velocity and/or risk. 
C. Shift engineering time to other services that need more reliability. 
D. Get the product team to prioritize reliability work over new features. 
E. Change the implementation of your Service Level Indicators (SLIs) to increase coverage. 

Question # 30

You are running an experiment to see whether your users like a new feature of a web application. Shortly after deploying the feature as a canary release, you receive a spike in the number of 500 errors sent to users, and your monitoring reports show increased latency. You want to quickly minimize the negative impact on users. What should you do first? 

A. Roll back the experimental canary release. 
B. Start monitoring latency, traffic, errors, and saturation. 
C. Record data for the postmortem document of the incident. 
D. Trace the origin of 500 errors and the root cause of increased latency. 

Question # 31

You support a production service that runs on a single Compute Engine instance. You regularly need to spend time on recreating the service by deleting the crashing instance and creating a new instance based on the relevant image. You want to reduce the time spent performing manual operations while following Site Reliability Engineering principles. What should you do? 

A. File a bug with the development team so they can find the root cause of the crashing instance. 
B. Create a Managed Instance Group with a single instance and use health checks to determine the system status.
C. Add a Load Balancer in front of the Compute Engine instance and use health checks to determine the system status. 
D. Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start recreating the crashed instance promptly after it has crashed.

Question # 32

You have migrated an e-commerce application to Google Cloud Platform (GCP). You want to prepare the application for the upcoming busy season. What should you do first to prepare for the busy season? 

A. Load test the application to profile its performance for scaling. 
B. Enable AutoScaling on the production clusters, in case there is growth. 
C. Pre-provision double the compute power used last season, expecting growth. 
D. Create a runbook on inflating the disaster recovery (DR) environment if there is growth. 

