|
| 1 | +# Anomaly Detection |
| 2 | +**GA4** has built-in anomaly detection, but it also reports anomalies that are caused by changes in traffic. The anomaly detection described here tries to avoid reporting anomalies caused by session fluctuations. |
| 3 | + |
| 4 | +* This functionality is in **BETA** |
| 5 | +* Anomaly detection flags Events or Parameters with significant spikes or drops that aren’t explained by session fluctuations. |
| 6 | + * **Event** anomalies are detected independently across platforms. |
| 7 | + * **Parameter** anomalies are detected independently across scopes, platforms and events. |
| 8 | + * Parameter anomalies are only flagged if they aren’t explained by an Event anomaly. |
| 9 | +* New Events and Parameters detected are flagged independently. |
| 10 | + |
| 11 | +This helps you identify if something is potentially broken or has changed. |
| 12 | + |
| 13 | +* Anomalies are detected using **Standard Deviation** |
| 14 | + - You can choose between **adjusted for day of week**, and **not adjusted for day of week** |
| 15 | + |
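As a rough illustration of the difference between the two models, the sketch below computes an expected count and bounds from a list of daily counts. It is not the logic of the SQL query itself: the `computeBounds` function, the shape of `history`, and the use of the population standard deviation are all assumptions made for this example.

```javascript
// Sketch only: the real detection runs in the scheduled SQL query and may differ.
// history is a hypothetical array of { date: 'YYYY-MM-DD', count: number }.
function computeBounds(history, targetDate, model = 'standard', stddevMultiplier = 3) {
  const weekday = (iso) => new Date(`${iso}T00:00:00Z`).getUTCDay();

  // 'dayofweek' keeps only past days that fall on the same weekday as the target date.
  const sample = model === 'dayofweek'
    ? history.filter((row) => weekday(row.date) === weekday(targetDate))
    : history;

  const counts = sample.map((row) => row.count);
  const mean = counts.reduce((sum, c) => sum + c, 0) / counts.length;

  // Population standard deviation (assumption; the query may use the sample version).
  const variance = counts.reduce((sum, c) => sum + (c - mean) ** 2, 0) / counts.length;
  const stddev = Math.sqrt(variance);

  return {
    expected: mean,
    lowerBound: Math.max(0, mean - stddevMultiplier * stddev),
    upperBound: mean + stddevMultiplier * stddev,
  };
}
```

With strong weekday and weekend differences, the `standard` model produces wide bounds because it mixes both kinds of days, while `dayofweek` compares a Monday only with earlier Mondays. That is also why the day-of-week model needs a longer history before it has enough samples per weekday.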
| 16 | +## Anomaly Detection Setup |
| 17 | + |
| 18 | +1. Create 1 [**Scheduled Query**](#scheduled-queries-settings) |
| 19 | +2. Create 1 [**Logs Router**](#create-the-logs-router) |
| 20 | +3. Create 1 [**Cloud Function**](#google-cloud-functions) |
| 21 | + |
| 22 | +### Anomaly Settings |
| 23 | + |
| 24 | +* Anomaly settings can be adjusted in the **Google Sheet**, in the **Advanced Settings** sheet. |
| 25 | + |
| 26 | +#### Anomaly Query Periods settings |
| 27 | + |
| 28 | +| Declaration | Default | Comment | |
| 29 | +| ------------- | ------------- | ------------- | |
| 30 | +| Day Interval Short | 1 | Number of days to check for anomalies (e.g., last 1 day). Declared in query as **day_interval_short**. | |
| 31 | +| Day Interval Extended | 28 | Number of days to query the first time to get some event & parameter count data. If you have lots of data, querying a long period may incur cost. For anomaly detection you need at least 28 days of data. Declared in query as **day_interval_extended**. | |
| 32 | +| Minimum Number of Days before Anomaly Detection | 28 | Minimum number of days of data collected before running anomaly detection. With **standard deviation model**, **28** days (as minimum) is recommended. With **day of week** adjustment, **56** or **84** is recommended. Declared in query as **days_before_anomaly_detection**. | |
| 33 | +| Rolling Statistics Interval | 90 | Number of days for rolling statistics (e.g., last 90 days). Declared in query as **day_interval_large**. | |
| 34 | + |
| 35 | +#### Anomaly Detection Settings |
| 36 | + |
| 37 | +| Declaration | Default | Comment | |
| 38 | +| ------------- | ------------- | ------------- | |
| 39 | +| Minimum Expected Count Threshold | 10 | Minimum expected count threshold for anomaly detection. If expected count is equal to or lower than this number, no anomaly detection will be run. Declared in query as **min_expected_count**. | |
| 40 | +| Standard Deviation Multiplier | 3 | Multiplier applied to the standard deviation for events and parameters. Scale goes from 1 to 3. The default is 3, which gives lower sensitivity and fewer false positives. Declared in query as **stddev_multiplier**. | |
| 41 | +| Events Explained by Sessions Threshold | 0.2 | If an event anomaly is reported, and should have been explained by changes in sessions, increase the number. Decrease the number for the opposite scenario. Declared in query as **events_explained_by_sessions_threshold**. | |
| 42 | +| Parameters Explained by Sessions Threshold | 0.2 | If a parameter anomaly is reported, and should have been explained by changes in sessions, increase the number. Decrease the number for the opposite scenario. Declared in query as **parameters_explained_by_sessions_threshold**. | |
| 43 | +| Standard Deviation Model Setting | standard | Standard Deviation model can either be **standard** or **dayofweek**. dayofweek = adjusted for day of week. standard = not adjusted for day of week. <br /><br /> **Day of Week:** More accurate in detecting true anomalies by considering natural day-of-week fluctuations, but may fail if the day-of-week effect varies seasonally or due to external factors. <br /><br /> **Standard:** Works well for detecting overall trends and anomalies unrelated to weekly patterns. May work better with seasonal changes. <br /><br/>Declared in query as **stddev_model_setting**. | |
| 44 | + |
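To make the interaction between these settings easier to reason about, here is a minimal sketch of how one day's count might be classified. The `classify` function and its input shape are hypothetical, and the "explained by sessions" rule in particular is an assumption: it treats an anomaly as explained when the relative change in the count is within the threshold of the relative change in sessions. The SQL query remains the source of truth.

```javascript
// Sketch only: setting names mirror the table above, the decision logic is assumed.
function classify(row, settings = {}) {
  const { actual, expected, lowerBound, upperBound, sessionChange } = row;
  const { minExpectedCount = 10, explainedBySessionsThreshold = 0.2 } = settings;

  // No detection when the expected count is at or below the minimum threshold.
  if (expected <= minExpectedCount) return 'skipped';

  // Inside the standard deviation bounds: nothing to report.
  if (actual >= lowerBound && actual <= upperBound) return 'normal';

  // Relative change in the same format as net_change_percentage (0.1 = 10%).
  const netChange = (actual - expected) / expected;

  // Assumption: a similar relative change in sessions "explains" the anomaly.
  if (Math.abs(netChange - sessionChange) <= explainedBySessionsThreshold) {
    return 'explained_by_sessions';
  }
  return 'anomaly';
}
```

Under this reading, raising the threshold suppresses more anomalies that move together with sessions, which matches the guidance in the table to increase the number when a reported anomaly should have been explained by sessions.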
| 45 | + |
| 46 | +### Scheduled queries logic |
| 47 | +All scheduled queries have this logic: |
| 48 | + |
| 49 | +* If the **events_fresh_** table exists (GA 360 only), query **only** this table including **today**. |
| 50 | + * If **events_fresh_** doesn't exist, query the **events_** table until **yesterday**. |
| 51 | + * If **yesterday** doesn't exist in the **events_** table, query **events_intraday_** (if the table exists), between **yesterday** and **today**. |
| 52 | + * Else query **events_intraday_** only for **today**. |
| 53 | + |
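The table selection above can be sketched as a small decision function. This is only one reading of the bullets: the `planSources` function and its boolean inputs are hypothetical, and the real queries determine which tables exist inside SQL.

```javascript
// Sketch of the table selection described above. The boolean inputs are
// hypothetical; the real query checks which tables exist in the dataset.
function planSources({ hasFresh, eventsHasYesterday, hasIntraday }) {
  if (hasFresh) {
    // GA 360 only: events_fresh_ alone covers everything, including today.
    return [{ table: 'events_fresh_', through: 'today' }];
  }

  // Without events_fresh_, the daily export is read until yesterday.
  const sources = [{ table: 'events_', through: 'yesterday' }];

  if (hasIntraday) {
    sources.push({
      table: 'events_intraday_',
      // Cover yesterday as well when the daily table for yesterday is missing.
      from: eventsHasYesterday ? 'today' : 'yesterday',
      through: 'today',
    });
  }
  return sources;
}

console.log(planSources({ hasFresh: false, eventsHasYesterday: false, hasIntraday: true }));
```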
| 54 | +### Scheduled queries settings |
| 55 | +* Replace **your-project.analytics_XXX** with your project and dataset. |
| 56 | +* Settings can be edited in **Google Sheet** in the **Advanced Settings** sheet. |
| 57 | + |
| 58 | +| Scheduled query | Comment | |
| 59 | +| ------------- | ------------- | |
| 60 | +| [ga4_documentation_anomaly_detection](ga4_documentation_anomaly_detection.sql) | This query creates 2 tables: <ol> <li>[ga4_documentation_anomaly_detection](#table-ga4_documentation_anomaly_detection)</li> <li>[ga4_documentation_anomaly_detection_session_counts](#table-ga4_documentation_anomaly_detection_session_counts)</li></ol> | |
| 61 | + |
| 62 | +* Scheduled queries should use **On-demand Repeat frequency** |
| 63 | +* Do NOT tick the checkbox **Set a destination table for query results**. That logic is handled within the SQL query |
| 64 | +* Decide if you need to specify **Location type** (Ex. Multi-region and EU) |
| 65 | +* Click **Save** |
| 66 | + |
| 67 | +**The complete setup works like this:** |
| 68 | +When an _INSERT_ is made to the **ga4_documentation_parameters_daily_counts** table, a **Cloud Run Function** will automatically run the scheduled query. |
| 69 | + |
| 70 | +## Create the Logs Router |
| 71 | +* Go to [**Logs Router**](https://console.cloud.google.com/logs/router), and click the **CREATE SINK** button. |
| 72 | + * Give the sink a name, ex. **ga4_documentation_anomaly_update**. |
| 73 | + * Choose **Google Cloud Pub/Sub topic** as the destination. |
| 74 | + * From the list of available Pub/Sub topics, click to **create a new topic**. |
| 75 | + * Create a Topic ID, ex. **ga4_documentation_anomaly_update**. |
| 76 | + * In the **Build inclusion filter**, copy the filter below, but replace **analytics_XXX** with your **Dataset ID**. |
| 77 | + |
| 78 | + ### Build inclusion filter |
| 79 | + |
| 80 | +```sql |
| 81 | + |
| 82 | +protoPayload.methodName="jobservice.jobcompleted" |
| 83 | +protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId="analytics_XXX" |
| 84 | +protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.statementType="INSERT" |
| 85 | +protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId="ga4_documentation_parameters_daily_counts" |
| 86 | + |
| 87 | +``` |
| 88 | + |
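If you maintain several datasets, substituting the Dataset ID by hand is easy to get wrong. The helper below is a small sketch that generates the same filter text for a given dataset. `buildInclusionFilter` is a hypothetical name and `analytics_123456` is a placeholder, not a real dataset.

```javascript
// Sketch: builds the inclusion filter shown above for a given dataset ID.
// buildInclusionFilter is a hypothetical helper, not part of this solution.
function buildInclusionFilter(datasetId, tableId = 'ga4_documentation_parameters_daily_counts') {
  const query = 'protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query';
  return [
    'protoPayload.methodName="jobservice.jobcompleted"',
    `${query}.destinationTable.datasetId="${datasetId}"`,
    `${query}.statementType="INSERT"`,
    `${query}.destinationTable.tableId="${tableId}"`,
  ].join('\n');
}

// Placeholder dataset ID; paste the printed text into the inclusion filter field.
console.log(buildInclusionFilter('analytics_123456'));
```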
| 89 | +## Google Cloud Functions |
| 90 | +You have to create **1** Cloud Function. |
| 91 | + |
| 92 | +Go to [**Cloud Functions**](https://console.cloud.google.com/functions/list), and click **Create function**. |
| 93 | + |
| 94 | +* **Configuration page** |
| 95 | + * Keep environment as **1st gen**. |
| 96 | + * Give the function a descriptive name, ex. "ga4_documentation_anomaly_update". |
| 97 | + * Choose **Cloud Pub/Sub** as the trigger type. |
| 98 | + * Select the **Pub/Sub topic** you created in the previous section as the trigger. |
| 99 | + * Check the box **Retry on failure**. |
| 100 | + * Click **Save** to save the trigger settings. |
| 101 | + * Ignore the Runtime, build, connections and security settings accordion and click **Next** to continue. |
| 102 | +* **Code page** |
| 103 | + * Keep Node.js as the runtime (choose the latest non-Preview version). |
| 104 | + * Click **package.json** to edit its contents. |
| 105 | + * Add the following line in the “dependencies” property: |
| 106 | + * _"@google-cloud/bigquery-data-transfer": "^3.1.3"_ |
| 107 | + * Select **index.js** from the list of files to activate the code editor. |
| 108 | + * Edit the Entry point field on top of the editor to be **runScheduledQuery**. |
| 109 | + * Copy-paste the **index.js** code below into the editor. |
| 110 | + * Change the **projectId** value to match your **Google Cloud Platform project ID**. |
| 111 | + * To get values for **region** and **configId**, browse to [**scheduled queries**](https://console.cloud.google.com/bigquery/scheduled-queries), open your scheduled query, and click the **Configuration tab** to view its details. |
| 112 | + * **region** value should be the Google Cloud region of the Destination dataset. Click through to the dataset if you need to check it. |
| 113 | + * **configId** is the **GUID** at the end of the **Resource name** of the scheduled query. |
| 114 | + |
| 115 | +### index.js |
| 116 | +```javascript |
| 117 | + |
| 118 | +const bigqueryDataTransfer = require('@google-cloud/bigquery-data-transfer'); |
| 119 | + |
| 120 | +exports.runScheduledQuery = async (event, context) => { |
| 121 | + // Update configuration options |
| 122 | + const projectId = 'REPLACE-THIS'; |
| 123 | + const configId = 'REPLACE-THIS'; |
| 124 | + const region = 'REPLACE-THIS'; |
| 125 | + |
| 126 | + const d = new Date(); |
| 127 | + const year = d.getUTCFullYear(), |
| 128 | + month = d.getUTCMonth(), |
| 129 | + day = d.getUTCDate(); |
| 130 | + // Request the run for 12:00 UTC on the current UTC date |
| 131 | + const runTime = new Date(Date.UTC(year, month, day, 12)); |
| 132 | + // Create a proto-buffer Timestamp object from this |
| 133 | + const requestedRunTime = bigqueryDataTransfer.protos.google.protobuf.Timestamp.fromObject({ |
| 134 | + seconds: runTime / 1000, |
| 135 | + nanos: (runTime % 1000) * 1e6 |
| 136 | + }); |
| 137 | + |
| 138 | + const client = new bigqueryDataTransfer.v1.DataTransferServiceClient(); |
| 139 | + const parent = client.projectLocationTransferConfigPath(projectId, region, configId); |
| 140 | + |
| 141 | + const request = { |
| 142 | + parent, |
| 143 | + requestedRunTime |
| 144 | + }; |
| 145 | + |
| 146 | + const response = await client.startManualTransferRuns(request); |
| 147 | + return response; |
| 148 | +}; |
| 149 | + |
| 150 | +``` |
| 151 | + |
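The function above does not inspect the Pub/Sub message, because the sink filter already restricts which log entries reach the topic. For debugging it can still help to see which table triggered a run. The sketch below assumes a 1st gen Pub/Sub-triggered function, where `event.data` holds the base64-encoded JSON log entry, and reads the same fields the inclusion filter matches on. `getDestinationTable` is a hypothetical helper name.

```javascript
// Sketch: decode the Pub/Sub message passed to a 1st gen background function.
function getDestinationTable(event) {
  // event.data is the base64-encoded JSON log entry published by the sink.
  const logEntry = JSON.parse(Buffer.from(event.data, 'base64').toString('utf8'));

  // Same path the inclusion filter matches on.
  const query = logEntry?.protoPayload?.serviceData?.jobCompletedEvent?.job
    ?.jobConfiguration?.query;
  const table = query?.destinationTable;

  // Return null when the entry does not describe a query with a destination table.
  return table ? { datasetId: table.datasetId, tableId: table.tableId } : null;
}
```

Inside `runScheduledQuery`, a call such as `console.log(getDestinationTable(event))` would write the dataset and table to the function logs before the transfer run is started.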
| 152 | +## Testing scheduled queries setup |
| 153 | +To test the setup, simply go to the **Google Sheet**, select the **📈 GA4 Documentation Menu** at the top of the sheet, and select **BigQuery -> Export Event & Parameter Documentation**. |
| 154 | + |
| 155 | +This will run a BigQuery query using Apps Script. If this is completed without errors, you should now see 2 anomaly tables in BigQuery. |
| 156 | + |
| 157 | +## Overview of tables created in BigQuery |
| 158 | +**ga4_documentation_anomaly_detection** is the BigQuery table that you will use in Looker Studio. |
| 159 | + |
| 160 | +1. [ga4_documentation_anomaly_detection](#table-ga4_documentation_anomaly_detection) |
| 161 | +2. [ga4_documentation_anomaly_detection_session_counts](#table-ga4_documentation_anomaly_detection_session_counts) |
| 162 | + |
| 163 | + |
| 164 | +### Table: ga4_documentation_anomaly_detection |
| 165 | + |
| 166 | +| Field name | Type | Comment | |
| 167 | +| ------------- | ------------- | ------------- | |
| 168 | +| event_date | DATE | Event Date | |
| 169 | +| platform | STRING | Platform can be WEB, IOS or ANDROID | |
| 170 | +| event_or_parameter_name | STRING | **event_name** or **parameter_name** | |
| 171 | +| event_or_parameter_type | STRING | Can be either **event** or **parameter** | |
| 172 | +| actual_count | INTEGER | Actual Count for the Event or Parameter | |
| 173 | +| expected_count | FLOAT | Expected count from the standard deviation model | |
| 174 | +| anomaly_description | STRING | Anomaly described as text | |
| 175 | +| net_change_percentage | FLOAT | Anomaly change expressed as percent in the format 0.1 = 10%, 1 = 100% etc. | |
| 176 | +| parameter_scope | STRING | Parameter Scope if the anomaly is for a parameter | |
| 177 | +| event_name | STRING | Event Name. Relevant for parameter anomaly | |
| 178 | +| upper_bound | FLOAT64 | Upper Bound deviation from expected value. This can help with post-analysis, debugging, and tuning the detection sensitivity. | |
| 179 | +| lower_bound | FLOAT64 | Lower Bound deviation from expected value. This can help with post-analysis, debugging, and tuning the detection sensitivity. | |
| 180 | + |
| 181 | +### Table: ga4_documentation_anomaly_detection_session_counts |
| 182 | + |
| 183 | +| Field name | Type | Comment | |
| 184 | +| ------------- | ------------- | ------------- | |
| 185 | +| event_date | DATE | Event Date| |
| 186 | +| platform | STRING | Platform can be WEB, IOS or ANDROID | |
| 187 | +| session_count_total | INTEGER | Total count of sessions for platform | |