Commit 96b9b16

Merge pull request #10 from Knowit-Experience-MarTech/version-3.0
Version 3.0
2 parents a05ccde + 0dfac62 commit 96b9b16

76 files changed

Lines changed: 6227 additions & 2715 deletions

Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
# Anomaly Detection
**GA4** has built-in anomaly detection, but it also reports anomalies that are caused only by changes in traffic. This anomaly detection tries to avoid reporting anomalies caused by session fluctuations.

* This functionality is in **BETA**.
* Anomaly detection flags Events or Parameters with significant spikes or drops that aren’t explained by session fluctuations.
* **Event** anomalies are detected independently across platforms.
* **Parameter** anomalies are detected independently across scopes, platforms and events.
* A Parameter anomaly is only flagged if it isn’t explained by an Event.
* Newly detected Events and Parameters are flagged independently.

This helps you identify if something is potentially broken or has changed.

* Anomalies are detected using **Standard Deviation**.
    - You can choose between **adjusted for day of week** and **not adjusted for day of week**.

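As a rough illustration of the standard deviation approach described above, the sketch below computes an expected range from historical daily counts and checks whether a new count falls outside it. The function names (`meanAndStddev`, `expectedRange`, `isAnomaly`) and the use of the sample standard deviation are assumptions made for this example; the scheduled query may calculate its statistics differently.

```javascript
// Hypothetical helper: mean and sample standard deviation of daily counts.
function meanAndStddev(values) {
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
  return { mean, stddev: Math.sqrt(variance) };
}

// history: [{ date: 'YYYY-MM-DD', count: Number }]
// model: 'standard' (all days) or 'dayofweek' (only days with the same weekday)
function expectedRange(history, targetDate, multiplier, model) {
  const weekday = new Date(`${targetDate}T00:00:00Z`).getUTCDay();
  const sample = history
    .filter((row) =>
      model === 'dayofweek'
        ? new Date(`${row.date}T00:00:00Z`).getUTCDay() === weekday
        : true)
    .map((row) => row.count);
  const { mean, stddev } = meanAndStddev(sample);
  return {
    expected: mean,
    lowerBound: mean - multiplier * stddev,
    upperBound: mean + multiplier * stddev,
  };
}

// A count is treated as anomalous when it falls outside the expected range.
function isAnomaly(actual, range) {
  return actual < range.lowerBound || actual > range.upperBound;
}
```

With the `dayofweek` model, only earlier days with the same weekday feed the statistics, which is why more history is needed before the range becomes meaningful.
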
## Anomaly Detection Setup

1. Create 1 [**Scheduled Query**](#scheduled-queries-settings)
2. Create 1 [**Logs Router**](#create-the-logs-router)
3. Create 1 [**Cloud Function**](#google-cloud-functions)

### Anomaly Settings

* Anomaly settings can be adjusted in the **Google Sheet**, in the **Advanced Settings** sheet.

#### Anomaly Query Periods settings

| Declaration | Default | Comment |
| ------------- | ------------- | ------------- |
| Day Interval Short | 1 | Number of days to check for anomalies (e.g., last 1 day). Declared in query as **day_interval_short**. |
| Day Interval Extended | 28 | Number of days to query on the first run to collect initial event and parameter count data. If you have lots of data, selecting a long period may incur query costs. Anomaly detection needs at least 28 days of data. Declared in query as **day_interval_extended**. |
| Minimum Number of Days before Anomaly Detection | 28 | Minimum number of days of data collected before anomaly detection runs. With the **standard** model, at least **28** days is recommended. With **day of week** adjustment, **56** or **84** days is recommended. Declared in query as **days_before_anomaly_detection**. |
| Rolling Statistics Interval | 90 | Number of days for rolling statistics (e.g., last 90 days). Declared in query as **day_interval_large**. |

#### Anomaly Detection Settings

| Declaration | Default | Comment |
| ------------- | ------------- | ------------- |
| Minimum Expected Count Threshold | 10 | Minimum expected count threshold for anomaly detection. If the expected count is equal to or lower than this number, anomaly detection is not run. Declared in query as **min_expected_count**. |
| Standard Deviation Multiplier | 3 | Multiplier for the standard deviation, applied to both events and parameters. The scale goes from 1 to 3. The default of 3 gives lower sensitivity and fewer false positives. Declared in query as **stddev_multiplier**. |
| Events Explained by Sessions Threshold | 0.2 | If an event anomaly is reported but should have been explained by changes in sessions, increase the number. Decrease the number for the opposite scenario. Declared in query as **events_explained_by_sessions_threshold**. |
| Parameters Explained by Sessions Threshold | 0.2 | If a parameter anomaly is reported but should have been explained by changes in sessions, increase the number. Decrease the number for the opposite scenario. Declared in query as **parameters_explained_by_sessions_threshold**. |
| Standard Deviation Model Setting | standard | The Standard Deviation model can be either **standard** or **dayofweek**. dayofweek = adjusted for day of week. standard = not adjusted for day of week. <br /><br /> **Day of Week:** More accurate in detecting true anomalies by considering natural day-of-week fluctuations, but may fail if the day-of-week effect varies seasonally or due to external factors. <br /><br /> **Standard:** Works well for detecting overall trends and anomalies unrelated to weekly patterns. May work better with seasonal changes. <br /><br/>Declared in query as **stddev_model_setting**. |


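The sketch below shows one way the settings in this table could interact. It is only an interpretation: the exact rule the query uses to decide that an anomaly is "explained by sessions" is not documented here, so the comparison in `classify` (event change versus session change within the threshold) is an assumption, as are the function and property names.

```javascript
// Relative change as a fraction: 0.1 = 10%, 1 = 100%.
function relativeChange(actual, expected) {
  return (actual - expected) / expected;
}

// row: { actual, expected, lowerBound, upperBound, sessionChange }
// settings: { minExpectedCount, explainedBySessionsThreshold }
function classify(row, settings) {
  // Assumption: low-volume items are skipped entirely.
  if (row.expected <= settings.minExpectedCount) return 'skipped';

  // Inside the standard deviation bounds: nothing to report.
  if (row.actual >= row.lowerBound && row.actual <= row.upperBound) {
    return 'normal';
  }

  // Assumption: the anomaly counts as explained when the change in the
  // count is close enough to the change in sessions.
  const change = relativeChange(row.actual, row.expected);
  if (Math.abs(change - row.sessionChange) <= settings.explainedBySessionsThreshold) {
    return 'explained_by_sessions';
  }
  return 'anomaly';
}
```

Under this reading, a higher threshold lets more deviations count as explained by sessions, which matches the advice to increase the number when too many session-driven anomalies are reported.
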
### Scheduled queries logic
All scheduled queries have this logic:

* If the **events_fresh_** table exists (GA 360 only), query **only** this table, including **today**.
* If **events_fresh_** doesn't exist, query the **events_** table until **yesterday**.
* If **yesterday** doesn't exist in the **events_** table, query **events_intraday_** (if the table exists) between **yesterday** and **today**.
* Else query **events_intraday_** only for **today**.

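The table selection rules above can be read as a small decision function. The sketch below is one reading of those rules, not the SQL used by the scheduled query; the function name `chooseSources` and the shape of its input and output are hypothetical.

```javascript
// availability: {
//   hasFresh:           events_fresh_ tables exist (GA 360 only)
//   hasIntraday:        events_intraday_ tables exist
//   eventsHasYesterday: events_ already contains yesterday's table
// }
function chooseSources(availability) {
  // events_fresh_ replaces every other source and includes today.
  if (availability.hasFresh) {
    return [{ table: 'events_fresh_', through: 'today' }];
  }

  // Otherwise the daily export covers everything up to yesterday.
  const sources = [{ table: 'events_', through: 'yesterday' }];

  if (availability.hasIntraday) {
    // Fill the gap from intraday data: yesterday and today if the daily
    // table for yesterday is missing, otherwise only today.
    const from = availability.eventsHasYesterday ? 'today' : 'yesterday';
    sources.push({ table: 'events_intraday_', from, through: 'today' });
  }
  return sources;
}
```
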
### Scheduled queries settings
* Replace **your-project.analytics_XXX** with your project and dataset.
* Settings can be edited in the **Google Sheet**, in the **Advanced Settings** sheet.

| Scheduled query | Comment |
| ------------- | ------------- |
| [ga4_documentation_anomaly_detection](ga4_documentation_anomaly_detection.sql) | This query creates 2 tables: <ol> <li>[ga4_documentation_anomaly_detection](#table-ga4_documentation_anomaly_detection)</li> <li>[ga4_documentation_anomaly_detection_session_counts](#table-ga4_documentation_anomaly_detection_session_counts)</li></ol> |

* Scheduled queries should use **On-demand** as the **Repeat frequency**.
* Do NOT tick the checkbox **Set a destination table for query results**. That logic is handled within the SQL query.
* Decide if you need to specify **Location type** (e.g. Multi-region and EU).
* Click **Save**.

**The complete setup works like this:**
When an _INSERT_ is made to the **ga4_documentation_parameters_daily_counts** table, a **Cloud Run Function** will automatically run the scheduled query.

## Create the Logs Router
* Go to [**Logs Router**](https://console.cloud.google.com/logs/router), and click the **CREATE SINK** button.
* Give the sink a name, e.g. **ga4_documentation_anomaly_update**.
* Choose **Google Cloud Pub/Sub topic** as the destination.
* From the list of available Pub/Sub topics, click to **create a new topic**.
* Create a Topic ID, e.g. **ga4_documentation_anomaly_update**.
* In the **Build inclusion filter**, copy the filter below, but replace **analytics_XXX** with your **Dataset ID**.

### Build inclusion filter

```sql

protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId="analytics_XXX"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.statementType="INSERT"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId="ga4_documentation_parameters_daily_counts"

```

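To make the inclusion filter more concrete, the sketch below applies the same four conditions to a log entry in Node.js. It assumes the sink delivers each log entry to Pub/Sub as JSON and that a Pub/Sub-triggered function receives it base64-encoded in `event.data`. The helper names `decodeLogEntry` and `matchesInclusionFilter` are hypothetical and are not part of the setup itself.

```javascript
// Decode the base64-encoded Pub/Sub message body into a log entry object.
function decodeLogEntry(event) {
  return JSON.parse(Buffer.from(event.data, 'base64').toString('utf8'));
}

// Check the same four conditions as the inclusion filter.
function matchesInclusionFilter(entry, datasetId) {
  const payload = entry.protoPayload || {};
  const query =
    payload.serviceData?.jobCompletedEvent?.job?.jobConfiguration?.query || {};
  const destination = query.destinationTable || {};
  return (
    payload.methodName === 'jobservice.jobcompleted' &&
    destination.datasetId === datasetId &&
    query.statementType === 'INSERT' &&
    destination.tableId === 'ga4_documentation_parameters_daily_counts'
  );
}
```

Because the sink already filters the log entries, the Cloud Function does not need this check. It can still be useful when debugging which entries reach the topic.
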
## Google Cloud Functions
You have to create **1** Cloud Function.

Go to [**Cloud Functions**](https://console.cloud.google.com/functions/list), and click **Create function**.

* **Configuration page**
    * Keep environment as **1st gen**.
    * Give the function a descriptive name, e.g. "ga4_documentation_anomaly_update".
    * Choose **Cloud Pub/Sub** as the trigger type.
    * Select the **Pub/Sub topic** you created in the previous section as the trigger.
    * Check the box **Retry on failure**.
    * Click **Save** to save the trigger settings.
    * Ignore the Runtime, build, connections and security settings accordion and click **Next** to continue.
* **Code page**
    * Keep Node.js as the runtime (choose the latest non-Preview version).
    * Click **package.json** to edit its contents.
    * Add the following line in the “dependencies” property:
        * _"@google-cloud/bigquery-data-transfer": "^3.1.3"_
    * Select **index.js** from the list of files to activate the code editor.
    * Edit the Entry point field on top of the editor to be **runScheduledQuery**.
    * Copy-paste the **index.js** code below into the editor.
    * Change the **projectId** value to match your **Google Cloud Platform project ID**.
    * To get values for **region** and **configId**, browse to [**scheduled queries**](https://console.cloud.google.com/bigquery/scheduled-queries), open your scheduled query, and click the **Configuration tab** to view its details.
        * **region** should be the Google Cloud region of the Destination dataset, so click through to the dataset to check if you don’t remember what it was.
        * **configId** is the **GUID** at the end of the **Resource name** of the scheduled query.

### index.js
```javascript

const bigqueryDataTransfer = require('@google-cloud/bigquery-data-transfer');

exports.runScheduledQuery = async (event, context) => {
  // Update configuration options
  const projectId = 'REPLACE-THIS';
  const configId = 'REPLACE-THIS';
  const region = 'REPLACE-THIS';

  // Request a run for today at 12:00 UTC. UTC getters are used so the
  // date does not depend on the time zone of the runtime.
  const d = new Date();
  const runTime = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), 12));

  // Create a protobuf Timestamp object from the run time
  const requestedRunTime = bigqueryDataTransfer.protos.google.protobuf.Timestamp.fromObject({
    seconds: Math.floor(runTime.getTime() / 1000),
    nanos: (runTime.getTime() % 1000) * 1e6
  });

  const client = new bigqueryDataTransfer.v1.DataTransferServiceClient();
  const parent = client.projectLocationTransferConfigPath(projectId, region, configId);

  // Start a manual run of the scheduled query
  const request = {
    parent,
    requestedRunTime
  };

  const response = await client.startManualTransferRuns(request);
  return response;
};

```

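The function above requests a run time of 12:00 UTC on the current day and passes it as a protobuf-style timestamp with `seconds` and `nanos` fields. The sketch below isolates that calculation so it can be checked without calling BigQuery. `noonUtcTimestamp` is a hypothetical helper name, and it uses the UTC getters so the result does not depend on the time zone of the machine running it.

```javascript
// Build a { seconds, nanos } pair for 12:00 UTC on the UTC date of `now`.
function noonUtcTimestamp(now) {
  const runTimeMs = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate(),
    12
  );
  return {
    seconds: Math.floor(runTimeMs / 1000), // whole seconds since the epoch
    nanos: (runTimeMs % 1000) * 1e6,       // remaining milliseconds as nanoseconds
  };
}

// Example: any moment on 10 March 2024 (UTC) maps to noon UTC that day.
const example = noonUtcTimestamp(new Date('2024-03-10T23:30:00Z'));
console.log(new Date(example.seconds * 1000).toISOString());
```
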
## Testing scheduled queries setup
To test the setup, go to the **Google Sheet**, select the **📈 GA4 Documentation Menu** at the top of the sheet, and select **BigQuery -> Export Event & Parameter Documentation**.

This will run a BigQuery query using Apps Script. If it completes without errors, you should now see 2 anomaly tables in BigQuery.

## Overview of tables created in BigQuery
**ga4_documentation_anomaly_detection** is the BigQuery table that you will use in Looker Studio.

1. [ga4_documentation_anomaly_detection](#table-ga4_documentation_anomaly_detection)
2. [ga4_documentation_anomaly_detection_session_counts](#table-ga4_documentation_anomaly_detection_session_counts)


### Table: ga4_documentation_anomaly_detection

| Field name | Type | Comment |
| ------------- | ------------- | ------------- |
| event_date | DATE | Event Date |
| platform | STRING | Platform can be WEB, IOS or ANDROID |
| event_or_parameter_name | STRING | **event_name** or **parameter_name** |
| event_or_parameter_type | STRING | Can be either **event** or **parameter** |
| actual_count | INTEGER | Actual Count for the Event or Parameter |
| expected_count | FLOAT | Standard Deviation Expected Count |
| anomaly_description | STRING | Anomaly described as text |
| net_change_percentage | FLOAT | Anomaly change expressed as percent in the format 0.1 = 10%, 1 = 100% etc. |
| parameter_scope | STRING | Parameter Scope if the anomaly is for a parameter |
| event_name | STRING | Event Name. Relevant for parameter anomaly |
| upper_bound | FLOAT64 | Upper Bound deviation from expected value. This can help with post-analysis, debugging, and tuning the detection sensitivity. |
| lower_bound | FLOAT64 | Lower Bound deviation from expected value. This can help with post-analysis, debugging, and tuning the detection sensitivity. |

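As a small example of reading this table, the sketch below turns one row into a readable summary and converts **net_change_percentage** from its stored fraction (0.1 = 10%) to a percentage string. The `describeRow` function and the sample row are hypothetical. Only the field names come from the table above.

```javascript
// Format one row of ga4_documentation_anomaly_detection as a short summary.
function describeRow(row) {
  // net_change_percentage is stored as a fraction: 0.1 = 10%, 1 = 100%.
  const percent = (row.net_change_percentage * 100).toFixed(1);
  return (
    `${row.platform} ${row.event_or_parameter_name} ` +
    `(${row.event_or_parameter_type}): ` +
    `actual ${row.actual_count}, expected ${row.expected_count.toFixed(1)}, ` +
    `change ${percent}%`
  );
}

// Made-up sample row for illustration only.
const sampleRow = {
  platform: 'WEB',
  event_or_parameter_name: 'purchase',
  event_or_parameter_type: 'event',
  actual_count: 50,
  expected_count: 100,
  net_change_percentage: -0.5,
};
console.log(describeRow(sampleRow));
```
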
### Table: ga4_documentation_anomaly_detection_session_counts

| Field name | Type | Comment |
| ------------- | ------------- | ------------- |
| event_date | DATE | Event Date |
| platform | STRING | Platform can be WEB, IOS or ANDROID |
| session_count_total | INTEGER | Total count of sessions for platform |
