Anomaly detection

One of the key features of Waaila is the ability to evaluate data using AI algorithms to determine whether there has been any non-standard increase or decrease in the data. Compared to tests based on a simple threshold, tests based on anomaly detection have the advantage that the algorithm takes into account the periodicity of the data as well as both its longer-term and its most recent evolution. Users can therefore skip the period of experimentation otherwise needed to find an appropriate threshold.

Waaila offers two options for evaluating anomalies in your data. The first one expects weekly periodicity in the data and compares the current values to a moving average from recent weeks. The weekly pattern does not have to be strong, but the algorithm cannot extract information from cycles of any other length (such as monthly cycles).

The second approach is more flexible regarding the periodicity of the data, but it requires a connection to Azure Cognitive Services. It is built on the Anomaly Detector resource and benefits from existing AI models. The Anomaly Detector lets you set the expected length of the cycles, or it can deduce this parameter from the data.

waaila.functions.isDayOfWeekAnomaly()

The isDayOfWeekAnomaly() function compares the latest value of the selected metric with its values in previous weeks. The basic test, in which you observe an aggregate metric over a selected period of time, can be extended by dimension: you can analyze the evolution of the metric separately for each value of a dimension or for each combination of dimensions. This is useful, for example, for observing the behavior of visitors from different sources or on different devices. If you find an anomaly, you can see right away which group (if any specific one) contributed to it and investigate the cause directly.
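The idea of comparing the latest value with the same weekday in previous weeks can be illustrated with a minimal sketch. This is not Waaila's actual implementation: the function name sameWeekdayCheck, the tolerance expressed in standard deviations, and the lowerBound and upperBound fields are assumptions made for illustration only.

```javascript
// Hypothetical illustration only: NOT Waaila's actual implementation.
// Compares the latest daily value with the mean of the values observed on
// the same weekday in the preceding weeks.
function sameWeekdayCheck(dailyValues, numberOfWeeks, toleranceInStdDevs) {
  const lastIndex = dailyValues.length - 1;
  if (lastIndex < 7 * numberOfWeeks) {
    throw new Error('Not enough history for ' + numberOfWeeks + ' weeks');
  }
  // Collect the values from 7, 14, 21, ... days before the latest value.
  const sameWeekdayValues = [];
  for (let week = 1; week <= numberOfWeeks; week++) {
    sameWeekdayValues.push(dailyValues[lastIndex - 7 * week]);
  }
  const mean =
    sameWeekdayValues.reduce((sum, v) => sum + v, 0) / sameWeekdayValues.length;
  const variance =
    sameWeekdayValues.reduce((sum, v) => sum + (v - mean) ** 2, 0) /
    sameWeekdayValues.length;
  const margin = toleranceInStdDevs * Math.sqrt(variance);
  const value = dailyValues[lastIndex];
  return {
    value: value,
    expectedValue: mean,
    lowerBound: mean - margin, // assumed field name
    upperBound: mean + margin, // assumed field name
    isAnomaly: value < mean - margin || value > mean + margin
  };
}

// Three weeks of made-up history with a weekly pattern and some variation.
const week1 = [100, 110, 120, 130, 140, 60, 50];
const week2 = week1.map(v => v + 5);
const week3 = week1.map(v => v - 5);
const threeWeeks = week1.concat(week2, week3);

const normalDay = sameWeekdayCheck(threeWeeks.concat([102]), 3, 3);
const spikeDay = sameWeekdayCheck(threeWeeks.concat([400]), 3, 3);
console.log(normalDay.isAnomaly, spikeDay.isAnomaly);
```

With a dimension such as device category, the same check would simply be run once per group, which is why an anomaly can be traced to the group that caused it.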

In the test, you select the data to load and adjust the anomaly configuration parameters in the test logic. To proceed, create a new test in a dataset that contains the data you want to analyze, and then adjust the test parameters and settings according to the instructions below (choose the instructions for your data provider).

  • If you need to create a depot, select the button "Create a new depot", choose a name for it and optionally write a description of the purpose of this depot

  • Use a dataset with either GA or Piano Analytics (AT Internet) as the data provider

  • To create a new dataset with no pre-specified tests follow the steps in this documentation

  • When creating a new test, select a name and fill in 100 for the score; there is no need to select any checkbox

isDayOfWeekAnomaly example (Google Analytics)

Replace the content of the Query logic with the following JSON. Follow the steps below to adjust the query for your needs. If you need further help adjusting the Query logic, follow the documentation on Query logic.

  1. In dateRanges, select the dates that you want to use in the analysis
  • The last value in the range is the one checked for an anomaly

  • You need enough preceding data. You can choose how many weeks to take into account: using more weeks is more robust, but the test adapts more slowly when the trend changes (we recommend 6 weeks)

  • Dates can be entered as relative dates, e.g. “yesterday” or “28daysAgo”, or as absolute dates in the format YYYY-MM-DD, e.g. “2020-12-30”

  2. In metrics, specify the values that will be available for the anomaly analysis
  • Select a metric that you want to analyze, e.g. ga:sessions, ga:pageviews, ga:transactions, ga:transactionRevenue
  3. Dimensions must include “ga:date”; additionally, you can specify other dimension(s) in order to search for anomalies across different characteristics of the website traffic

A list of all GA metrics and dimensions can be found on the official page.

{
  "requests": [
    {
      "id": "anomalyInput",
      "queries": [
        {
          "dateRanges": [
            {
              "startDate": "42daysAgo",
              "endDate": "yesterday"
            }
          ],
          "metrics": [
            {
              "expression": "ga:pageviews"
            }
          ],
          "dimensions": [
            {
              "name": "ga:date"
            },
            {
              "name": "ga:deviceCategory"
            }
          ],
          "orderBys": [
            {
              "fieldName": "ga:date",
              "sortOrder": "ASCENDING"
            }
          ],
          "includeEmptyRows": true
        }
      ]
    }
  ]
}
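The query above pairs a start date of “42daysAgo” with the recommended 6 weeks of data. If you change the number of weeks, the start date has to change with it. A minimal sketch of that arithmetic follows; buildDateRanges is a hypothetical helper for illustration and is not part of Waaila.

```javascript
// Hypothetical helper, not part of Waaila: builds the dateRanges entry for a
// given number of weeks, mirroring the example above (6 weeks -> "42daysAgo").
function buildDateRanges(numberOfWeeks) {
  if (!Number.isInteger(numberOfWeeks) || numberOfWeeks < 1) {
    throw new Error('numberOfWeeks must be a positive whole number');
  }
  return [{ startDate: 7 * numberOfWeeks + 'daysAgo', endDate: 'yesterday' }];
}

const sixWeeks = buildDateRanges(6);
const eightWeeks = buildDateRanges(8);
console.log(JSON.stringify(sixWeeks));
console.log(JSON.stringify(eightWeeks));
```

The resulting values can be copied into the startDate and endDate fields of the Query logic.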

Replace the Test logic content with the content below and configure the anomaly detection parameters. If you need any help regarding the parameters in the Test logic, you can find more information in the documentation on Test logic.

  1. In the parameter valueColumn, fill in the name of the column that you want to analyze for anomalies
  • The name corresponds to the name of the Google Analytics metric without the “ga:” prefix, e.g. sessions, pageviews, transactions, transactionRevenue
  2. Optionally, adjust the number of weeks in the parameter numberOfWeeks (if you want to use more weeks of data, adjust the Query logic accordingly)

  3. Optionally, change the parameter sensitivity: a whole number from 0 to 99 (99 is the most sensitive to anomalies); the default is 80

  4. If you included additional dimensions in the Query logic, you must list all of them in the dimensions array (because the data are loaded at the granularity given by all dimensions included in the Query logic)

  5. Save and run the test

If an anomaly is found, the test ends with the Failed status (red cross symbol) and reports that there has been an anomaly. If no anomaly is found, the test status is Passed (green tick symbol). In both cases the test outputs a table with the actual value, the value the algorithm expected, and the bounds around the expected value within which a value is not statistically different from it. The column isAnomaly is conditionally formatted for a better overview of the results.

(results, waaila) => {
   // < Test configuration starts >
   const anomalyConfig = {
      valueColumn: 'pageviews',
      numberOfWeeks: 6,
      sensitivity: 80,
      dimensions: ['deviceCategory']
   }
   // < Test configuration ends >
   // Transform the data and evaluate the anomalies
   const inputData = waaila.functions.normalizeGaResult(results['anomalyInput'][0]);
   const processedData = waaila.functions.isDayOfWeekAnomaly(anomalyConfig, inputData);

   if (typeof processedData[0] === 'undefined') {
      // There is no first row to evaluate, so only output what was returned
      waaila.table(processedData);
   } else {
      // The test passes only if no row is flagged as an anomaly
      const anomalies = processedData.filter(row => row['isAnomaly'] == true);
      const assert_pass_message = 'No anomalies detected in the ' + anomalyConfig['valueColumn'];
      const assert_fail_message = 'There was an anomaly in the ' + anomalyConfig['valueColumn'];
      waaila.assert(typeof anomalies[0] === 'undefined', 100)
         .pass.message(assert_pass_message).fail.message(assert_fail_message);

      // Show only the evaluated rows (those with a non-null isAnomaly value)
      const processedDataLastDay = processedData.filter(row => row['isAnomaly'] != null);
      waaila.table(processedDataLastDay.order(['expectedValue'], true), [{'column': 'isAnomaly', 'condition': {'EQUAL': false}}]);
   }
}
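To make the filtering in the test logic easier to follow, the sketch below runs the same two filters on a few made-up rows. The row shape is an assumption: only the isAnomaly and expectedValue columns appear in the test logic above, and the remaining columns and all values are invented for illustration.

```javascript
// Hypothetical rows: the exact columns returned by isDayOfWeekAnomaly() are an
// assumption; only isAnomaly and expectedValue appear in the test logic above.
const processedData = [
  { date: '20201228', deviceCategory: 'desktop', pageviews: 510, expectedValue: null, isAnomaly: null },
  { date: '20201230', deviceCategory: 'desktop', pageviews: 495, expectedValue: 500, isAnomaly: false },
  { date: '20201230', deviceCategory: 'mobile', pageviews: 1200, expectedValue: 640, isAnomaly: true }
];

// Rows that were evaluated (isAnomaly is neither null nor undefined).
const evaluatedRows = processedData.filter(row => row['isAnomaly'] != null);

// Rows flagged as anomalous; the test fails when this list is not empty.
const anomalies = processedData.filter(row => row['isAnomaly'] == true);
const testPasses = typeof anomalies[0] === 'undefined';

console.log(evaluatedRows.length, anomalies.length, testPasses);
```

In this made-up data the anomaly comes from the mobile group only, which is the kind of breakdown the dimensions array is meant to provide.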

isDayOfWeekAnomaly example (Piano Analytics, formerly AT Internet)

Replace the content of the Query logic with the following JSON. Follow the steps below to adjust the query for your needs. If you need further help adjusting the Query logic, follow the documentation on Query logic.

  1. The period is set to load data from the current month and the two preceding months
  • You need enough preceding data. In the test you can choose how many weeks to take into account: using more weeks is more robust, but the test adapts more slowly when the trend changes (we recommend 6 weeks)
  • The API for Piano Analytics data does not allow selecting the last N days (for example, the last 42 days), so it is necessary to combine the data from multiple queries of monthly data
  2. In columns, specify the values that will be available for the anomaly analysis
  • Must include “date”
  • Add a metric that you want to analyze, e.g. m_visits, m_prod_purchased, m_unique_visitors, m_events, m_page_loads
  • Additionally, you can specify other properties in order to search for anomalies across different characteristics of the website traffic (e.g. device_type, page_chapter1, src)
  • Remember to adjust the columns in each of the three queries
{
  "queries": [
    {
      "columns": [
        "date",
        "device_type",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": 0
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "device_type",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -1
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "device_type",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -2
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    }
  ]
}

Replace the Test logic content with the content below and configure the anomaly detection parameters. If you need any help regarding the parameters in the Test logic, you can find more information in the documentation on Test logic.

  1. In the parameter valueColumn, fill in the name of the column that you want to analyze for anomalies
  • The name corresponds to the official Piano Analytics label of the metric (typically the metric name without the “m_” prefix, though some labels differ more), e.g. Visits (for m_visits), Purchases (for m_prod_purchased), Visitors (for m_unique_visitors), Events (for m_events), Page_loads (for m_page_loads)
  2. Optionally, adjust the number of weeks in the parameter numberOfWeeks (if you want to use more weeks of data, adjust the Query logic accordingly)

  3. Optionally, change the parameter sensitivity: a whole number from 0 to 99 (99 is the most sensitive to anomalies); the default is 80

  4. If you included additional properties in the Query logic, you must list all of them in the dimensions array (because the data are loaded at the granularity given by all properties included in the Query logic)

  5. Save and run the test

If an anomaly is found, the test ends with the Failed status (red cross symbol) and reports that there has been an anomaly. If no anomaly is found, the test status is Passed (green tick symbol). In both cases the test outputs a table with the actual value, the value the algorithm expected, and the bounds around the expected value within which a value is not statistically different from it. The column isAnomaly is conditionally formatted for a better overview of the results.

(results, waaila) => {
   // < Test configuration starts > 
   const anomalyConfig = {
      valueColumn: 'Visits',
      numberOfWeeks: 4,
      sensitivity: 80,
      dimensions: ['Device - Type']
   }
   // < Test configuration ends > 
   // Transform the data
   const inputThisMonth = waaila.functions.normalizeAtiResult(results[0]);
   const inputLastMonth = waaila.functions.normalizeAtiResult(results[1]);
   const input2ndToLastMonth = waaila.functions.normalizeAtiResult(results[2]);
   const inputAll = input2ndToLastMonth.concat(inputLastMonth).concat(inputThisMonth);
   const processedData = waaila.functions.isDayOfWeekAnomaly(anomalyConfig, inputAll);
   if (typeof processedData[0] === 'undefined') {
      waaila.table(processedData);
   } else {
      const anomalies = processedData.filter(row => row['isAnomaly'] == true);
      const assert_pass_message = 'No anomalies detected in the ' + anomalyConfig['valueColumn'];
      const assert_fail_message = 'There was an anomaly in the ' + anomalyConfig['valueColumn'];
      waaila.assert(typeof anomalies[0] === 'undefined', 100)
         .pass.message(assert_pass_message).fail.message(assert_fail_message);
      
      const processedDataLastDay = processedData.filter(row => row['isAnomaly'] != null);
      waaila.table(processedDataLastDay.order(['expectedValue'], true), [{'column': 'isAnomaly', 'condition': {'EQUAL': false}}]);
   }
}
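Because the monthly queries return whole months, the combined input can contain more days than you intend to analyze. The sketch below shows one way the combined rows could be trimmed to the most recent N days. It is an illustration under assumptions: keepLastDays is a hypothetical helper, the rows are assumed to carry a 'Date' column with ISO date strings (YYYY-MM-DD), and this page does not say whether such trimming is needed before calling isDayOfWeekAnomaly().

```javascript
// Hypothetical sketch: assumes each normalized row has a 'Date' column holding
// an ISO date string (YYYY-MM-DD), so that string order equals date order.
function keepLastDays(rows, numberOfDays) {
  if (!Number.isInteger(numberOfDays) || numberOfDays < 1) {
    throw new Error('numberOfDays must be a positive whole number');
  }
  // Distinct dates in ascending order; keep only the most recent ones.
  const dates = Array.from(new Set(rows.map(row => row['Date']))).sort();
  const kept = new Set(dates.slice(-numberOfDays));
  return rows
    .filter(row => kept.has(row['Date']))
    .sort((a, b) => a['Date'].localeCompare(b['Date']));
}

// Made-up monthly results, combined oldest month first as in the test logic.
const input2ndToLastMonth = [{ Date: '2020-10-31', Visits: 90 }];
const inputLastMonth = [
  { Date: '2020-11-29', Visits: 95 },
  { Date: '2020-11-30', Visits: 97 }
];
const inputThisMonth = [{ Date: '2020-12-01', Visits: 101 }];
const inputAll = input2ndToLastMonth.concat(inputLastMonth).concat(inputThisMonth);

const lastThreeDays = keepLastDays(inputAll, 3);
console.log(lastThreeDays.map(row => row['Date']));
```

Filtering by distinct dates rather than by row count keeps all rows of a day together when the data are split by a property such as device type.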

waaila.functions.isAnomaly()

The isAnomaly() function lets you follow the dynamics of your analytics data and informs you when the values jump significantly compared to their trends and cyclical patterns. It is based on the Azure Anomaly Detector, so it benefits from existing artificial intelligence models and can exploit arbitrary periodicity in the data when making predictions. You can set the length of the cycles observed in your data. Weekly cycles (length 7) are the most common, but in some cases it is more meaningful to use 2-week cycles (length 14) or even monthly cycles. Just make sure that you provide enough data for the chosen cycle length.
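The relationship between the cycle length and the amount of data can be checked with simple arithmetic. The sketch below uses the rule given in the parameter description on this page, namely that the data should cover at least two periods. The helper hasEnoughData is hypothetical and is not part of Waaila.

```javascript
// Hypothetical pre-check, not part of Waaila: applies the "at least two
// periods of data" rule from the period parameter description on this page.
function hasEnoughData(numberOfDays, period) {
  return numberOfDays >= 2 * period;
}

const weeklyWith63Days = hasEnoughData(63, 7);
const monthlyWith63Days = hasEnoughData(63, 30);
const monthlyWith42Days = hasEnoughData(42, 30);
console.log(weeklyWith63Days, monthlyWith63Days, monthlyWith42Days);
```

For example, 42 days of data are sufficient for weekly or 2-week cycles but fall short of two 30-day cycles.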

What you need for isAnomaly detection in Waaila

  1. Azure subscription (unless you have premium access to Waaila): you can set one up in the Azure portal

  2. Create an Anomaly Detector resource in Azure (unless you have premium access to Waaila)

  • Go to the Waaila GitHub

  • Click the “Deploy to Azure” button

  • Adjust the Resource group name if you want

  • Fill in the workspace ID with a unique identifier (e.g. a GUID)

  • The free tier of the Anomaly Detector allows up to 20000 detection transactions per month

  3. Note the name of the resource and the resource key; you will need to configure them in the Waaila test (unless you have premium access to Waaila)
  • To find the resource, open the newly created Resource group in the Azure portal (search for its name among the resource groups under your subscription)

  • You will find the name in the upper left corner when you open the resource; it consists of “waaila-” followed by the workspace ID that you entered

  • You will find the key in the Keys and Endpoint tab

  4. Return to Waaila and open an existing depot or create a new one
  • To create a depot, select the button "Create a new depot", choose a name for it and optionally write a description of the purpose of this depot
  5. Open an existing dataset or create a new one
  • Use either GA or Piano Analytics (AT Internet) as the data provider

  • To create a new dataset with no pre-specified tests follow the steps in this documentation

  6. Create a new test: select a name and fill in 100 for the score; there is no need to select any checkbox

isAnomaly example (GA Reporting data)

Replace the content of the Query logic with the following JSON. Follow the steps below to adjust the query for your needs. If you need further help adjusting the Query logic, follow the documentation on Query logic.

  1. In dateRanges, select the dates that you want to use in the analysis
  • The last value in the range is the one checked for an anomaly

  • You need enough preceding data. If your data follow a weekly pattern, use at least 4 weeks; you cannot use less than 2 weeks. The more data you use, the more precise the result

  • Dates can be entered as relative dates, e.g. “yesterday” or “28daysAgo”, or as absolute dates in the format YYYY-MM-DD, e.g. “2020-12-30”

  2. In metrics, specify the values that will be available for the anomaly analysis
  • You can query multiple metrics, e.g. ga:sessions, ga:pageviews, ga:transactions, ga:transactionRevenue, because you specify later which one to use in the analysis

  • A list of all GA metrics and dimensions can be found on the official page

  3. Dimensions must include only “ga:date”
  • At the moment it is not possible to extend this analysis with more dimensions
{
 "requests": [
  {
   "id": "anomalyInput",
   "queries": [
    {
     "dateRanges": [
      {
       "startDate": "63daysAgo",
       "endDate": "yesterday"
      }
     ],
     "metrics": [
      {
       "expression": "ga:pageviews"
      }
     ],
     "dimensions": [
      {
       "name": "ga:date"
      }
     ],
     "orderBys": [
      {
       "fieldName": "ga:date",
       "sortOrder": "ASCENDING"
      }
     ],
     "includeEmptyRows": true
    }
   ]
  }
 ]
}

Replace the Test logic content with the content below and configure the anomaly detection parameters. If you need any help regarding the parameters in the Test logic, you can find more information in the documentation on Test logic.

  1. Set the configuration parameters resourceName and resourceKey to the name and key of the Azure Anomaly Detector resource that you noted earlier

  2. In the parameter valueColumn, fill in the name of the column that you want to analyze for anomalies

  • The name corresponds to the name of the Google Analytics metric without the “ga:” prefix, e.g. sessions, pageviews, transactions, transactionRevenue
  3. Optionally, change the parameter sensitivity: a whole number from 0 to 99 (99 is the most sensitive to anomalies); the default is 95

  4. Optionally, change the parameter period, which represents the length of the cycle: a whole number, small enough that your data cover at least two periods; the default is 7 days (i.e. weekly periodicity of the data); leave the parameter out for automatic detection

  5. Save and run the test

If an anomaly is found, the test ends with the Failed status (red cross symbol) and reports that there has been an anomaly. If no anomaly is found, the test status is Passed (green tick symbol). In both cases the test outputs a table with the actual value, the value the algorithm expected, and the bounds around the expected value within which a value is not statistically different from it.

async (results, waaila, done) => {
 // < Test configuration starts >
 const anomalyDetectionConfig = {
    resourceName: '<here fill in the name of Azure Anomaly Detector resource>',
    resourceKey: '<here fill in the key for Azure Anomaly Detector resource>',
    timeColumn: 'date',
    valueColumn: 'pageviews',
    sensitivity: 95,
    period: 7
 }
 // < Test configuration ends >

 const inputData = waaila.functions.normalizeGaResult(results['anomalyInput'][0]);
 const processedData = await waaila.functions.isAnomaly(anomalyDetectionConfig, inputData);
 
 if (typeof processedData[0] === 'undefined'){
    waaila.table(processedData);
 } else {
    const anomalies = processedData.filter(row => row['isAnomaly'] == true);      
    const assert_pass_message = 'No anomalies detected in the ' + anomalyDetectionConfig['valueColumn'];
    const assert_fail_message = 'There was an anomaly in the ' + anomalyDetectionConfig['valueColumn'];
    waaila.assert(typeof anomalies[0] === 'undefined', 100)
       .pass.message(assert_pass_message).fail.message(assert_fail_message);
     
    const processedDataLastDay = processedData.filter(row => row['isAnomaly'] != null);
    waaila.table(processedDataLastDay.order(['expectedValue'], true), [{'column': 'isAnomaly', 'condition': {'EQUAL': false}}]);
 }
 done();
}
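The parameter description says that period can be left out for automatic detection. The sketch below shows the two resulting shapes of the configuration object. The helper buildIsAnomalyConfig is hypothetical and not part of Waaila, and the resource name and key are placeholders.

```javascript
// Hypothetical helper, not part of Waaila: builds the configuration object used
// in the test logic above and omits period when no cycle length is given.
function buildIsAnomalyConfig(resourceName, resourceKey, valueColumn, period) {
  const config = {
    resourceName: resourceName,
    resourceKey: resourceKey,
    timeColumn: 'date',
    valueColumn: valueColumn,
    sensitivity: 95
  };
  if (period !== undefined) {
    if (!Number.isInteger(period) || period < 1) {
      throw new Error('period must be a positive whole number');
    }
    config.period = period;
  }
  return config;
}

// Fixed 2-week cycle versus automatic detection of the cycle length.
const twoWeekCycle = buildIsAnomalyConfig('<resource name>', '<resource key>', 'pageviews', 14);
const automaticCycle = buildIsAnomalyConfig('<resource name>', '<resource key>', 'pageviews');
console.log(Object.keys(twoWeekCycle), Object.keys(automaticCycle));
```

Either object has the same shape as anomalyDetectionConfig in the test logic above, with or without the period key.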

isAnomaly example (Piano Analytics, formerly AT Internet)

Replace the content of the Query logic with the following JSON. Follow the steps below to adjust the query for your needs. If you need further help adjusting the Query logic, follow the documentation on Query logic.

  1. The period is set to load data from the current month and the two preceding months
  • You need enough preceding data. If your data follow a weekly pattern, use at least 4 weeks; you cannot use less than 2 weeks. The more data you use, the more precise the result
  • The API for Piano Analytics data does not allow selecting the last N days (for example, the last 42 days), so it is necessary to combine the data from multiple queries of monthly data
  2. In columns, specify the values that will be available for the anomaly analysis
  • Must include “date”
  • Add a metric that you want to analyze, e.g. m_visits, m_prod_purchased, m_unique_visitors, m_events, m_page_loads
  • Do not add any further properties; at the moment it is not possible to extend this analysis with more dimensions
  • Remember to adjust the columns in each of the three queries
{
  "queries": [
    {
      "columns": [
        "date",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": 0
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -1
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    },
    {
      "columns": [
        "date",
        "m_visits"
      ],
      "sort": [
        "date"
      ],
      "space": {
        "s": [
          "[code]"
        ]
      },
      "period": {
        "p1": [
          {
            "type": "R",
            "granularity": "M",
            "offset": -2
          }
        ]
      },
      "max-results": 1000,
      "page-num": 1
    }
  ]
}

Replace the Test logic content with the content below and configure the anomaly detection parameters. If you need any help regarding the parameters in the Test logic, you can find more information in the documentation on Test logic.

  1. Set the configuration parameters resourceName and resourceKey to the name and key of the Azure Anomaly Detector resource that you noted earlier

  2. In the parameter valueColumn, fill in the name of the column that you want to analyze for anomalies

  • The name corresponds to the official Piano Analytics label of the metric (typically the metric name without the “m_” prefix, though some labels differ more), e.g. Visits (for m_visits), Purchases (for m_prod_purchased), Visitors (for m_unique_visitors), Events (for m_events), Page_loads (for m_page_loads)
  3. Optionally, change the parameter sensitivity: a whole number from 0 to 99 (99 is the most sensitive to anomalies); the default is 95

  4. Optionally, change the parameter period, which represents the length of the cycle: a whole number, small enough that your data cover at least two periods; the default is 7 days (i.e. weekly periodicity of the data); leave the parameter out for automatic detection

  5. Save and run the test

If an anomaly is found, the test ends with the Failed status (red cross symbol) and reports that there has been an anomaly. If no anomaly is found, the test status is Passed (green tick symbol). In both cases the test outputs a table with the actual value, the value the algorithm expected, and the bounds around the expected value within which a value is not statistically different from it.

async (results, waaila, done) => {
   // < Test configuration starts >
   const anomalyDetectionConfig = {
      resourceName: '<here fill in the name of Azure Anomaly Detector resource>',
      resourceKey: '<here fill in the key for Azure Anomaly Detector resource>',
      timeColumn: 'Date',
      // Label of the m_visits metric requested in the Query logic
      valueColumn: 'Visits',
      sensitivity: 95,
      period: 7
   }
   // < Test configuration ends >
   // Transform the data: combine the three monthly results, oldest month first
   const inputThisMonth = waaila.functions.normalizeAtiResult(results[0]);
   const inputLastMonth = waaila.functions.normalizeAtiResult(results[1]);
   const input2ndToLastMonth = waaila.functions.normalizeAtiResult(results[2]);
   const inputAll = input2ndToLastMonth.concat(inputLastMonth).concat(inputThisMonth);
   const processedData = await waaila.functions.isAnomaly(anomalyDetectionConfig, inputAll);

   if (typeof processedData[0] === 'undefined') {
      // There is no first row to evaluate, so only output what was returned
      waaila.table(processedData);
   } else {
      // The test passes only if no row is flagged as an anomaly
      const anomalies = processedData.filter(row => row['isAnomaly'] == true);
      const assert_pass_message = 'No anomalies detected in the ' + anomalyDetectionConfig['valueColumn'];
      const assert_fail_message = 'There was an anomaly in the ' + anomalyDetectionConfig['valueColumn'];
      waaila.assert(typeof anomalies[0] === 'undefined', 100)
         .pass.message(assert_pass_message).fail.message(assert_fail_message);

      // Show only the evaluated rows (those with a non-null isAnomaly value)
      const processedDataLastDay = processedData.filter(row => row['isAnomaly'] != null);
      waaila.table(processedDataLastDay.order(['expectedValue'], true), [{'column': 'isAnomaly', 'condition': {'EQUAL': false}}]);
   }
   done();
}