Anomaly detection
One of the key features of Waaila is the ability to evaluate data using AI algorithms to determine whether there has been any non-standard increase or decrease in the data. Compared to tests based on a simple threshold, tests based on anomaly detection have the advantage that the algorithm takes into account the periodicity of the data and its recent evolution. Users therefore do not need to adjust test thresholds from season to season.
In Waaila there are two ways to evaluate anomalies in your data. The first expects weekly periodicity and compares the current values to a moving average from recent weeks. The weekly pattern does not have to be strong: even slightly different values during weekends are enough to benefit from this feature.
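The idea of comparing a value with the same weekday in recent weeks can be sketched as follows. This is only an illustration under simple assumptions (one row per day, a plain average of the same weekday); it is not Waaila's internal implementation, and the name sameWeekdayAverage is hypothetical.

```javascript
// Illustrative only: not Waaila's internal implementation.
// series: array of { date: 'YYYY-MM-DD', value: number }, one row per day, sorted ascending.
function sameWeekdayAverage(series, numberOfWeeks) {
  const latestIndex = series.length - 1;
  const previous = [];
  // Step back 7 days at a time to collect the same weekday from earlier weeks.
  for (let week = 1; week <= numberOfWeeks; week++) {
    const index = latestIndex - 7 * week;
    if (index < 0) break; // not enough history
    previous.push(series[index].value);
  }
  if (previous.length === 0) return null;
  const sum = previous.reduce((total, value) => total + value, 0);
  return sum / previous.length;
}

// Synthetic data: 100 on weekdays, 40 on weekends, 29 days starting on a Monday.
const series = [];
const start = Date.UTC(2024, 0, 1); // Monday, 1 January 2024
for (let day = 0; day < 29; day++) {
  const date = new Date(start + day * 24 * 60 * 60 * 1000);
  const isWeekend = date.getUTCDay() === 0 || date.getUTCDay() === 6;
  series.push({ date: date.toISOString().slice(0, 10), value: isWeekend ? 40 : 100 });
}

// The latest day is a Monday, so only the previous Mondays enter the average.
const expectedForLatestDay = sameWeekdayAverage(series, 4);
console.log(series[series.length - 1].date, expectedForLatestDay);
```

Because only the same weekday is used, the lower weekend values do not pull the expected value for a Monday down.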
The second approach is more flexible regarding the periodicity of the data, but it requires a connection to Azure AI Services. It is built on the AI Anomaly Detector resource and uses its existing AI models. The Anomaly Detector allows you to set the expected cycle length, or the length can be deduced from the data.
waaila.functions.isDayOfWeekAnomaly()
The isDayOfWeekAnomaly() function compares the latest value of the selected metric with its values in previous weeks. You can test a single aggregate metric, or you can split the metric by one or more dimensions and test each combination of dimensions for an anomaly separately. This can be useful, for example, to observe the behavior of visitors from different sources or on different devices. If you find an anomaly, you can directly see which group (if any) contributed to it and investigate the cause.
In the test you can adjust the anomaly configuration parameters in the Test logic. The main one is the sensitivity parameter, which sets how strict the test is when evaluating deviations from the expected value. It ranges from 0 to 99 (where 99 is the strictest) and corresponds to the percentage deviation from the expected value.
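To get a feeling for how sensitivity and percentage deviation relate, consider the sketch below. The mapping used here (allowed deviation in percent = 100 - sensitivity) is an assumption made only for illustration; the exact formula Waaila applies is not stated in this documentation, and the function names are hypothetical.

```javascript
// Assumption for illustration: allowed deviation (%) = 100 - sensitivity.
// The exact formula used by Waaila is not documented here.
function toleranceBounds(expectedValue, sensitivity) {
  if (!Number.isInteger(sensitivity) || sensitivity < 0 || sensitivity > 99) {
    throw new RangeError('sensitivity must be a whole number from 0 to 99');
  }
  const allowedShare = (100 - sensitivity) / 100;
  return {
    lowerBound: expectedValue * (1 - allowedShare),
    upperBound: expectedValue * (1 + allowedShare),
  };
}

// A value is flagged when it falls outside the band around the expected value.
function isOutsideBounds(actualValue, expectedValue, sensitivity) {
  const { lowerBound, upperBound } = toleranceBounds(expectedValue, sensitivity);
  return actualValue < lowerBound || actualValue > upperBound;
}

console.log(isOutsideBounds(130, 100, 80)); // band of 20 % around 100
console.log(isOutsideBounds(130, 100, 60)); // band of 40 % around 100
```

Under this assumption, lowering the sensitivity widens the band, so the same value of 130 is flagged at sensitivity 80 but not at sensitivity 60.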
isDayOfWeekAnomaly example (Google Analytics 4)
- Go to the Templates Gallery and select General Daily Checks
- Select Use tests and follow the dialog to save the selected tests to a depot and connect them to data
- Open the test W4130 - Hostname traffic anomaly
- Run the test
This test has two parts. The first part tests the total number of sessions across all websites; the second part checks each hostname separately. You can thus see whether sessions changed significantly overall and which hostnames contributed to the change. At the top of the Test logic you can find the configuration in the variables anomalyDetectionConfigAgg (for anomaly across all websites) and anomalyDetectionConfigHostname (for anomaly by hostname). The anomaly check across all websites has a higher sensitivity because overall sessions should be more stable than the sessions of individual hostnames.
- Adjust the sensitivity parameter to suit your data: if the test failed but the change in sessions does not indicate an issue for your website, decrease it; if there was a significant change in your data but the test did not flag it, increase it
- Run the test again and compare the results
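The two configuration variables mentioned above might look roughly like the sketch below. The field names are borrowed from the Piano Analytics example later on this page; the column names and numbers are assumptions for illustration and are not the actual contents of test W4130.

```javascript
// Hypothetical values: the column names and numbers below are assumptions,
// not the actual contents of test W4130.
const anomalyDetectionConfigAgg = {
  valueColumn: 'sessions',
  numberOfWeeks: 6,
  sensitivity: 90, // stricter: total sessions should be fairly stable
  dimensions: []
};

const anomalyDetectionConfigHostname = {
  valueColumn: 'sessions',
  numberOfWeeks: 6,
  sensitivity: 80, // more tolerant: individual hostnames fluctuate more
  dimensions: ['hostName']
};

// The aggregate check is configured to be stricter than the per-hostname check.
console.log(anomalyDetectionConfigAgg.sensitivity > anomalyDetectionConfigHostname.sensitivity);
```

When tuning, it is usually easier to change one of the two sensitivities at a time and rerun the test, so you can tell which part of the test reacted.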
isDayOfWeekAnomaly example (Piano Analytics)
Anomaly tests are currently prepared in templates only for the GA4 data provider but you can use the functionality with other providers as well. The steps below describe how to create anomaly detection on your Piano data.
Replace the content of the Query logic with the following JSON. You can adjust it for your needs using the steps below. If you need further help with the Query logic, follow the documentation on Query logic.
- The period is set to get data from the current month and the past 2 months
- You need enough data beforehand: you can select in the test how many weeks to take into account, with the trade-off that more weeks make the result more robust but slower to adapt when the trend changes (we recommend selecting 6 weeks)
- Unfortunately, the API for Piano Analytics data does not allow selecting the last N days (for example the last 42 days), so it is necessary to combine the data from multiple monthly queries
- In columns you specify the values that will be available for the anomaly analysis
- must include “date”
- add a metric that you want to analyze, e.g. m_visits, m_prod_purchased, m_unique_visitors, m_events, m_page_loads
- additionally you can specify one or more other properties to search for anomalies across different characteristics of the website traffic (e.g. device_type, page_chapter1, src)
- don't forget to adjust the columns in each of the three queries
{
  "queries": [
    {
      "columns": [
        "date",
        "device_type",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": 0
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "device_type",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -1
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "device_type",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -2
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    }
  ]
}
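Since the three queries differ only in the month offset, you may prefer to generate the JSON and paste the output into the Query logic, so that the columns are defined in one place. The helper below is a hypothetical sketch, not part of Waaila; it only reproduces the structure of the query shown above.

```javascript
// Hypothetical helper: builds the Query logic object for the current month
// plus a given number of past months, so that the columns are defined only once.
function buildMonthlyQueries(columns, siteCode, pastMonths) {
  const queries = [];
  for (let offset = 0; offset >= -pastMonths; offset--) {
    queries.push({
      columns: [...columns],
      sort: ['date'],
      space: { s: [siteCode] },
      // Relative period ('R') with monthly granularity ('M'), as in the query above.
      period: { p1: [{ type: 'R', granularity: 'M', offset }] },
      'max-results': 1000,
      'page-num': 1,
    });
  }
  return { queries };
}

// '[code]' is the same placeholder as in the query above.
const queryLogic = buildMonthlyQueries(['date', 'device_type', 'm_visits'], '[code]', 2);
console.log(JSON.stringify(queryLogic, null, 2));
```

Increasing the last argument adds further months, which is what you need if you raise the number of weeks used in the test.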
Replace the Test logic content with the content below and configure the anomaly detection parameters to match any adjustments you made to the Query logic. If you need any help regarding the parameters in the Test logic, you can find more information in the documentation on Test logic.
- In the parameter valueColumn fill in the name of the column that you want to analyze for anomalies
- The name corresponds to the official Piano Analytics label of the metric (typically the name without the “m_” prefix, though some labels differ more), e.g. Visits (for m_visits), Purchases (for m_prod_purchased), Visitors (for m_unique_visitors), Events (for m_events), Page_loads (for m_page_loads)
- Optionally, adjust the number of weeks (if you want to use more weeks of data, adjust the Query logic accordingly)
- Optionally, change the sensitivity parameter: a whole number from 0 to 99 (99 means most sensitive to anomalies); the default is 80
- If you included additional properties in the Query logic, you need to list them all in the dimensions array (the data is loaded at the granularity of all dimensions included in the Query logic)
- Save and run the test
If an anomaly was found, the test ends in Failed status (red cross symbol) and reports that there has been an anomaly. If no anomaly was found, the test status is Passed (green tick symbol). The test always outputs a table with the actual value, the value the algorithm expected, and the bounds around the expected value within which values are not statistically different from it. The column isAnomaly is conditionally formatted for a better overview of the results.
(results, waaila) => {
  // < Test configuration starts >
  const anomalyConfig = {
    valueColumn: 'Visits',
    numberOfWeeks: 4,
    sensitivity: 80,
    dimensions: ['Device - Type']
  };
  // < Test configuration ends >

  // Transform the data: normalize each monthly result and join them, oldest month first
  const inputThisMonth = waaila.functions.normalizeAtiResult(results[0]);
  const inputLastMonth = waaila.functions.normalizeAtiResult(results[1]);
  const input2ndToLastMonth = waaila.functions.normalizeAtiResult(results[2]);
  const inputAll = input2ndToLastMonth.concat(inputLastMonth).concat(inputThisMonth);

  const processedData = waaila.functions.isDayOfWeekAnomaly(anomalyConfig, inputAll);
  if (typeof processedData[0] === 'undefined') {
    // There is no first row to evaluate: output the returned data as it is
    waaila.table(processedData);
  } else {
    const anomalies = processedData.filter(row => row['isAnomaly'] == true);
    const assert_pass_message = 'No anomalies detected in the ' + anomalyConfig['valueColumn'];
    const assert_fail_message = 'There was an anomaly in the ' + anomalyConfig['valueColumn'];
    // The test passes only if no row was flagged as an anomaly
    waaila.assert(typeof anomalies[0] === 'undefined', 100)
      .pass.message(assert_pass_message).fail.message(assert_fail_message);
    // Show only the rows that were evaluated (isAnomaly is filled in)
    const processedDataLastDay = processedData.filter(row => row['isAnomaly'] != null);
    waaila.table(processedDataLastDay.order(['expectedValue'], true), [{'column': 'isAnomaly', 'condition': {'EQUAL': false}}]);
  }
}
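The pass/fail decision in the test logic above can be tried outside Waaila on mock rows. The row shape below is an assumption for illustration; the real output of isDayOfWeekAnomaly may contain more columns (for example the bounds).

```javascript
// Mock rows in an assumed shape; real output of isDayOfWeekAnomaly may contain more columns.
const mockProcessedData = [
  { Date: '2024-01-22', 'Device - Type': 'Desktop', Visits: 100, expectedValue: null, isAnomaly: null },
  { Date: '2024-01-29', 'Device - Type': 'Desktop', Visits: 104, expectedValue: 100, isAnomaly: false },
  { Date: '2024-01-29', 'Device - Type': 'Mobile', Visits: 310, expectedValue: 180, isAnomaly: true },
];

// Rows without an evaluation (isAnomaly is null) are history, not results.
const evaluatedRows = mockProcessedData.filter(row => row.isAnomaly != null);

// The test fails as soon as one evaluated row is flagged.
const anomalousRows = evaluatedRows.filter(row => row.isAnomaly === true);
const testPassed = anomalousRows.length === 0;

console.log(testPassed ? 'Passed' : 'Failed', anomalousRows.map(row => row['Device - Type']));
```

With several dimensions, the list of flagged rows is what tells you which group contributed to the anomaly.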
waaila.functions.isAnomaly()
The second anomaly feature follows the dynamics of your analytics data and informs you when the values jump significantly compared to their trends and cyclical patterns. This function is based on the Azure AI Anomaly Detector, so it benefits from existing AI models and can use arbitrary periodicity in the data for its predictions. You can set the length of the cycles observed in your data. While weekly cycles (length 7) are most common, in some cases it can be more meaningful to use 2-week cycles (length 14) or even monthly cycles. You just need to provide enough data for the analysis.
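As a rule of thumb, the parameter notes further down ask for enough data to cover two periods. A quick check could look like the sketch below; the function name is hypothetical and the rule is taken from those notes, not from the Azure service itself.

```javascript
// Assumption taken from the parameter notes in this document: the input should
// cover at least two full cycles of the chosen period.
function hasEnoughData(numberOfDays, period) {
  return numberOfDays >= 2 * period;
}

console.log(hasEnoughData(63, 7));  // 63 days of data, weekly cycle
console.log(hasEnoughData(63, 14)); // 63 days of data, 2-week cycle
console.log(hasEnoughData(42, 30)); // 42 days of data, monthly cycle
```

For monthly cycles this means querying noticeably more history than for weekly ones.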
Prerequisites
- Azure subscription (unless you have premium access to Waaila); you can set one up in the Azure portal
- Create an Anomaly Detector resource in Azure (unless you have premium access to Waaila)
- go to the Waaila GitHub
- click on the “Deploy to Azure” button
- adjust the Resource group name if you want
- fill in the workspace ID with some unique identifier (e.g. a GUID)
- The free tier of the Anomaly Detector allows up to 20000 detection transactions per month
- Note the name of the resource and the resource key; you will need to configure them in the Waaila test (unless you have premium access to Waaila)
- To find the resource, open the newly created Resource group in the Azure portal (search for its name among the resource groups under your subscription)
- You will find the name in the upper left corner when you open the resource; it consists of “waaila-” + the workspace ID you entered
- You will find the key in the Keys and Endpoint tab
- Return to Waaila and open an existing depot or create a new depot
- To create a depot, select the button "Create a new depot", choose a name for it and optionally write a description of the purpose of this depot
- Open an existing dataset or create a new dataset
- Use either GA or Piano Analytics as the data provider
- To create a new dataset with no pre-specified tests follow the steps in this documentation
- Create a new test: select a name and fill in 100 for the score; there is no need to select any checkbox
isAnomaly example (Google Analytics 4)
Currently there is no prepared template for this functionality. The steps below describe how to create a new test yourself.
Replace the content of the Query logic with the following JSON. Follow the steps below to adjust the query for your needs. If you need further help adjusting the Query logic, follow the documentation on Query logic.
- In the dateRanges you select the dates which you want to use in the analysis
- The last value is the one that is evaluated for an anomaly
- You need enough data beforehand: if your data follows a weekly pattern, it is advised to use at least 4 weeks; you cannot use less than 2 weeks; the more data you use, the more precise the result
- Dates can be entered as relative dates, e.g. “yesterday” or “28daysAgo”, or as absolute dates in the format YYYY-MM-DD, e.g. “2020-12-30”
- In metrics you specify the values that will be available for the anomaly analysis
- you can query multiple metrics because you specify later which one to use in the analysis, e.g. ga:sessions, ga:pageviews, ga:transactions, ga:transactionRevenue
- a list of all GA metrics and dimensions can be found on the official page
- Dimensions must include only “ga:date”
- at this moment it is not possible to extend this analysis by including more dimensions
{
  "requests": [
    {
      "id": "anomalyInput",
      "queries": [
        {
          "dateRanges": [
            {
              "startDate": "63daysAgo",
              "endDate": "yesterday"
            }
          ],
          "metrics": [
            {
              "expression": "ga:pageviews"
            }
          ],
          "dimensions": [
            {
              "name": "ga:date"
            }
          ],
          "orderBys": [
            {
              "fieldName": "ga:date",
              "sortOrder": "ASCENDING"
            }
          ],
          "includeEmptyRows": true
        }
      ]
    }
  ]
}
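The query above uses "63daysAgo" to "yesterday", which is exactly 9 whole weeks. If you want a different number of weeks, the relative start date can be derived as in this small sketch (the helper name is hypothetical; the minimum of 2 weeks follows the note above).

```javascript
// Hypothetical helper: relative date range covering a whole number of weeks ending yesterday.
function dateRangeForWeeks(numberOfWeeks) {
  if (!Number.isInteger(numberOfWeeks) || numberOfWeeks < 2) {
    throw new RangeError('use at least 2 whole weeks of data');
  }
  // "NdaysAgo" through "yesterday" spans N days, so N = weeks * 7.
  return { startDate: `${numberOfWeeks * 7}daysAgo`, endDate: 'yesterday' };
}

console.log(dateRangeForWeeks(9)); // the range used in the query above
console.log(dateRangeForWeeks(4)); // the advised minimum for weekly patterns
```

Using whole weeks keeps every weekday equally represented in the input.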
Replace the Test logic content with the content below and configure the anomaly detection parameters. If you need any help regarding the parameters in the Test logic, you can find more information in the documentation on Test logic.
- Fill in the configuration parameters resourceName and resourceKey with the name and key of the Azure Anomaly Detector resource that you noted previously
- In the parameter valueColumn fill in the name of the column that you want to analyze for anomalies
- The name corresponds to the name of the Google Analytics metric without the “ga:” prefix, e.g. sessions, pageviews, transactions, transactionRevenue
- Optionally, change the sensitivity parameter: a whole number from 0 to 99 (99 means most sensitive to anomalies); the default is 95
- Optionally, change the period parameter, which represents the length of the cycle: a whole number small enough that your data covers at least two periods; the default is 7 days (i.e. weekly periodicity); leave it out for automatic detection
- Save and run the test
If an anomaly was found, the test ends in Failed status (red cross symbol) and reports that there has been an anomaly. If no anomaly was found, the test status is Passed (green tick symbol). The test always outputs a table with the actual value, the value the algorithm expected, and the bounds around the expected value within which values are not statistically different from it.
async (results, waaila, done) => {
  // < Test configuration starts >
  const anomalyDetectionConfig = {
    resourceName: '<here fill in the name of Azure Anomaly Detector resource>',
    resourceKey: '<here fill in the key for Azure Anomaly Detector resource>',
    timeColumn: 'date',
    valueColumn: 'pageviews',
    sensitivity: 95,
    period: 7
  };
  // < Test configuration ends >

  const inputData = waaila.functions.normalizeGaResult(results['anomalyInput'][0]);
  const processedData = await waaila.functions.isAnomaly(anomalyDetectionConfig, inputData);
  if (typeof processedData[0] === 'undefined') {
    // There is no first row to evaluate: output the returned data as it is
    waaila.table(processedData);
  } else {
    const anomalies = processedData.filter(row => row['isAnomaly'] == true);
    const assert_pass_message = 'No anomalies detected in the ' + anomalyDetectionConfig['valueColumn'];
    const assert_fail_message = 'There was an anomaly in the ' + anomalyDetectionConfig['valueColumn'];
    // The test passes only if no row was flagged as an anomaly
    waaila.assert(typeof anomalies[0] === 'undefined', 100)
      .pass.message(assert_pass_message).fail.message(assert_fail_message);
    // Show only the rows that were evaluated (isAnomaly is filled in)
    const processedDataLastDay = processedData.filter(row => row['isAnomaly'] != null);
    waaila.table(processedDataLastDay.order(['expectedValue'], true), [{'column': 'isAnomaly', 'condition': {'EQUAL': false}}]);
  }
  done();
}
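The parameter notes above say that period can be left out for automatic detection. One way to keep that choice explicit is to build the configuration object with a small helper, as sketched below. The helper is hypothetical; the field names and the default sensitivity of 95 are the ones used in the test logic above.

```javascript
// Hypothetical helper: builds the configuration object used in the test logic above.
// When period is not given, the key is omitted so that the cycle length can be detected automatically.
function buildAnomalyDetectionConfig({ resourceName, resourceKey, timeColumn, valueColumn, sensitivity = 95, period }) {
  const config = { resourceName, resourceKey, timeColumn, valueColumn, sensitivity };
  if (period !== undefined) {
    if (!Number.isInteger(period) || period < 1) {
      throw new RangeError('period must be a positive whole number');
    }
    config.period = period;
  }
  return config;
}

const base = {
  resourceName: '<resource name>',
  resourceKey: '<resource key>',
  timeColumn: 'date',
  valueColumn: 'pageviews',
};

const automaticPeriod = buildAnomalyDetectionConfig(base);
const twoWeekCycles = buildAnomalyDetectionConfig({ ...base, period: 14 });

console.log('period' in automaticPeriod, twoWeekCycles.period);
```

Omitting the key, rather than setting it to null or 0, matches the "leave out" wording of the notes.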
isAnomaly example (Piano Analytics)
Currently there is no prepared template for this functionality. The steps below describe how to create a new test yourself.
Replace the content of the Query logic with the following JSON. Follow the steps below to adjust the query for your needs. If you need further help adjusting the Query logic, follow the documentation on Query logic.
- The period is set to get data from the current month and the past 2 months
- You need enough data beforehand: you can select in the test how many weeks to take into account, with the trade-off that more weeks make the result more robust but slower to adapt when the trend changes (we recommend selecting 6 weeks)
- Unfortunately, the API for Piano Analytics data does not allow selecting the last N days (for example the last 42 days), so it is necessary to combine the data from multiple monthly queries
- In columns you specify the values that will be available for the anomaly analysis
- must include “date”
- add a metric that you want to analyze, e.g. m_visits, m_prod_purchased, m_unique_visitors, m_events, m_page_loads
- do not add any further properties; at this moment it is not possible to extend this analysis by including more dimensions
- don't forget to adjust the columns in each of the three queries
{
  "queries": [
    {
      "columns": [
        "date",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": 0
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -1
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -2
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    }
  ]
}
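Because the columns have to be kept identical in all three queries, a quick check before pasting the JSON into the Query logic can catch a forgotten edit. The function below is a hypothetical sketch that you could run locally on the JSON; it is not a Waaila feature.

```javascript
// Hypothetical check: every query in the Query logic requests the same columns in the same order.
function queriesShareColumns(queryLogic) {
  const [first, ...rest] = queryLogic.queries.map(query => JSON.stringify(query.columns));
  return rest.every(columns => columns === first);
}

// Reduced examples that contain only the part relevant for the check.
const consistent = {
  queries: [{ columns: ['date', 'm_visits'] }, { columns: ['date', 'm_visits'] }],
};
const inconsistent = {
  queries: [{ columns: ['date', 'm_visits'] }, { columns: ['date', 'm_events'] }],
};

console.log(queriesShareColumns(consistent), queriesShareColumns(inconsistent));
```

If the check fails, the concatenated monthly data would mix different metrics in one series.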
Replace the Test logic content with the content below and configure the anomaly detection parameters. If you need any help regarding the parameters in the Test logic, you can find more information in the documentation on Test logic.
- Fill in the configuration parameters resourceName and resourceKey with the name and key of the Azure Anomaly Detector resource that you noted previously
- In the parameter valueColumn fill in the name of the column that you want to analyze for anomalies
- The name corresponds to the official Piano Analytics label of the metric (typically the name without the “m_” prefix, though some labels differ more), e.g. Visits (for m_visits), Purchases (for m_prod_purchased), Visitors (for m_unique_visitors), Events (for m_events), Page_loads (for m_page_loads)
- Optionally, change the sensitivity parameter: a whole number from 0 to 99 (99 means most sensitive to anomalies); the default is 95
- Optionally, change the period parameter, which represents the length of the cycle: a whole number small enough that your data covers at least two periods; the default is 7 days (i.e. weekly periodicity); leave it out for automatic detection
- Save and run the test
If an anomaly was found, the test ends in Failed status (red cross symbol) and reports that there has been an anomaly. If no anomaly was found, the test status is Passed (green tick symbol). The test always outputs a table with the actual value, the value the algorithm expected, and the bounds around the expected value within which values are not statistically different from it.
async (results, waaila, done) => {
  // < Test configuration starts >
  const anomalyDetectionConfig = {
    resourceName: '<here fill in the name of Azure Anomaly Detector resource>',
    resourceKey: '<here fill in the key for Azure Anomaly Detector resource>',
    timeColumn: 'Date',
    valueColumn: 'Visits', // label of the m_visits metric requested in the Query logic
    sensitivity: 95,
    period: 7
  };
  // < Test configuration ends >

  // Normalize each monthly result and join them, oldest month first
  const inputThisMonth = waaila.functions.normalizeAtiResult(results[0]);
  const inputLastMonth = waaila.functions.normalizeAtiResult(results[1]);
  const input2ndToLastMonth = waaila.functions.normalizeAtiResult(results[2]);
  const inputAll = input2ndToLastMonth.concat(inputLastMonth).concat(inputThisMonth);

  const processedData = await waaila.functions.isAnomaly(anomalyDetectionConfig, inputAll);
  if (typeof processedData[0] === 'undefined') {
    // There is no first row to evaluate: output the returned data as it is
    waaila.table(processedData);
  } else {
    const anomalies = processedData.filter(row => row['isAnomaly'] == true);
    const assert_pass_message = 'No anomalies detected in the ' + anomalyDetectionConfig['valueColumn'];
    const assert_fail_message = 'There was an anomaly in the ' + anomalyDetectionConfig['valueColumn'];
    // The test passes only if no row was flagged as an anomaly
    waaila.assert(typeof anomalies[0] === 'undefined', 100)
      .pass.message(assert_pass_message).fail.message(assert_fail_message);
    // Show only the rows that were evaluated (isAnomaly is filled in)
    const processedDataLastDay = processedData.filter(row => row['isAnomaly'] != null);
    waaila.table(processedDataLastDay.order(['expectedValue'], true), [{'column': 'isAnomaly', 'condition': {'EQUAL': false}}]);
  }
  done();
}
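The order in which the three monthly results are joined matters, because the series should run from the oldest to the newest day. The sketch below shows the same concatenation on mock rows; the row shape and the ISO date format are assumptions for illustration, and real normalized Piano Analytics rows may look different.

```javascript
// Mock normalized rows in an assumed shape ({ Date, Visits }).
const thisMonth = [{ Date: '2024-03-01', Visits: 120 }, { Date: '2024-03-02', Visits: 80 }];
const lastMonth = [{ Date: '2024-02-28', Visits: 110 }, { Date: '2024-02-29', Visits: 115 }];
const secondToLastMonth = [{ Date: '2024-01-30', Visits: 100 }, { Date: '2024-01-31', Visits: 105 }];

// Oldest month first, as in the test logic above.
const allRows = secondToLastMonth.concat(lastMonth).concat(thisMonth);

// ISO dates (YYYY-MM-DD) sort chronologically when compared as strings.
const isChronological = allRows.every(
  (row, index) => index === 0 || allRows[index - 1].Date < row.Date
);

console.log(isChronological, allRows[allRows.length - 1]);
```

If you add a fourth month to the Query logic, remember to put its result at the front of the chain.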