AWS vs Azure Anomaly Detector Process

Narendra Reddy
4 min read · Jul 3, 2021

This document compares Azure and AWS for anomaly detection: the overall process, the tools each platform offers, and their cost and usage.

What is Anomaly Detection?

Anomaly detection (aka outlier analysis) is a step in data mining that identifies data points, events, and/or observations that deviate from a dataset’s normal behavior. Anomalous data can indicate critical incidents, such as a technical glitch, or potential opportunities, for instance, a change in consumer behavior.

What is Time Series Data anomaly detection?

In time-series data, an anomaly or outlier is a data point that does not follow the common collective trend or the seasonal or cyclic pattern of the entire data and is significantly distinct from the rest of the data. By significance, most data scientists mean statistical significance, which, in other words, means that the statistical properties of the data point are not in alignment with the rest of the series.

Source: https://towardsdatascience.com/effective-approaches-for-time-series-anomaly-detection-9485b40077f1
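
To make the idea concrete, here is a minimal sketch (not tied to either cloud provider's service) that flags points whose rolling z-score exceeds a threshold; the window size and threshold are arbitrary values chosen for illustration.

```python
import numpy as np
import pandas as pd

def detect_anomalies(series: pd.Series, window: int = 24, threshold: float = 3.0) -> pd.Series:
    """Flag points that deviate more than `threshold` standard deviations
    from the rolling mean/std of the preceding `window` observations."""
    rolling_mean = series.rolling(window).mean()
    rolling_std = series.rolling(window).std()
    z_score = (series - rolling_mean) / rolling_std
    return z_score.abs() > threshold

# Example: an hourly metric with one injected spike
idx = pd.date_range("2021-07-01", periods=200, freq="H")
values = np.sin(np.linspace(0, 20, 200)) + np.random.normal(0, 0.1, 200)
values[150] += 5  # anomalous spike
metric = pd.Series(values, index=idx)

print(metric[detect_anomalies(metric)])  # prints the flagged spike
```

The managed services below use far more sophisticated models than a rolling z-score, but the goal is the same: separate points that break the series' learned pattern from normal seasonal variation.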

Architecture Comparison:

Microsoft Azure:

Data flow

  1. Ingest data from the various stores that contain the raw data to be monitored by Anomaly Detector.
  2. Aggregate, sample, and compute the raw data to generate the time series, or call the Anomaly Detector API directly if the time series is already prepared, and receive the detection results (see the example call after this list).
  3. Queue the anomaly-related metadata.
  4. A serverless app picks up the message from the message queue based on the anomaly-related metadata and sends an alert about the anomaly.
  5. Store the anomaly detection metadata.
  6. Visualize the results of the time series anomaly detection.
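
To illustrate step 2, the snippet below sketches a direct call to the Anomaly Detector batch (entire-series) REST endpoint, assuming the v1.0 API; the endpoint URL, key, and series values are placeholders you would replace with your own resource and prepared time series.

```python
from datetime import datetime, timedelta
import requests

ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
API_KEY = "<your-anomaly-detector-key>"                            # placeholder

# Build a small daily series (the batch endpoint expects roughly a dozen points
# at minimum); in the architecture above this series comes from the aggregation step.
start = datetime(2021, 6, 1)
values = [32.0, 31.5, 33.1, 32.8, 31.9, 33.4, 32.2, 31.7, 33.0, 32.5, 95.0, 32.1]
series = [
    {"timestamp": (start + timedelta(days=i)).strftime("%Y-%m-%dT%H:%M:%SZ"), "value": v}
    for i, v in enumerate(values)
]

response = requests.post(
    f"{ENDPOINT}/anomalydetector/v1.0/timeseries/entire/detect",
    headers={"Ocp-Apim-Subscription-Key": API_KEY},
    json={"granularity": "daily", "series": series},
)
response.raise_for_status()
print(response.json()["isAnomaly"])  # one boolean per point; the 95.0 spike should be flagged
```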

Components

Key technologies used to implement this architecture:

Service Bus: Reliable cloud messaging as a service (MaaS) and simple hybrid integration.

Azure Databricks: Fast, easy, and collaborative Apache Spark-based analytics service.

Power BI: Interactive data visualization BI tools.

Storage Accounts: Durable, highly available, and massively scalable cloud storage.

Cognitive Services: Cloud-based services with REST APIs and client library SDKs available to help you build cognitive intelligence into your applications.

Logic Apps: Serverless platform for building enterprise workflows that integrate applications, data, and services. In this architecture, the logic apps are triggered by HTTP requests.
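
As a rough sketch of how steps 3 and 4 of the data flow could look with these components, the snippet below queues anomaly metadata on Service Bus using the azure-servicebus Python SDK; the connection string, queue name, and message fields are illustrative placeholders, and the serverless app (for example a Logic App) would consume the message and raise the alert.

```python
import json
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONNECTION_STR = "<your-service-bus-connection-string>"  # placeholder
QUEUE_NAME = "anomaly-alerts"                            # hypothetical queue name

# Metadata describing the detected anomaly; the field names are illustrative.
anomaly_metadata = {
    "metric": "orders_per_hour",
    "timestamp": "2021-06-11T00:00:00Z",
    "value": 95.0,
    "expectedValue": 32.4,
    "isAnomaly": True,
}

with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
    with client.get_queue_sender(QUEUE_NAME) as sender:
        # The serverless app (step 4) consumes this message and sends the alert.
        sender.send_messages(ServiceBusMessage(json.dumps(anomaly_metadata)))
```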

Alternatives

Event Hubs with Kafka: An alternative to running your own Kafka cluster. This Event Hubs feature provides an endpoint that is compatible with Kafka APIs.

Azure Synapse Analytics: Analytics service that brings together enterprise data warehousing and Big Data analytics.

Azure Machine Learning: Lets you build, train, deploy, and manage custom machine learning/anomaly detection models in a cloud-based environment.

AWS MLOps Framework

Data flow

  1. The Orchestrator (solution owner or DevOps engineer) launches the solution in the AWS account and selects the desired options (for example, using Amazon SageMaker Registry, or providing an existing S3 bucket).
  2. The Orchestrator uploads the required assets for the target pipeline (for example, model artifact, training data, and/or custom algorithm zip file) into the Assets S3 bucket. If Amazon SageMaker Model Registry is used, the Orchestrator (or an automated pipeline) must register the model with the Model Registry.
  3. A single-account AWS CodePipeline instance is provisioned by either sending an API call to the API Gateway or by committing the mlops-config.json file to the Git repository. Depending on the pipeline type, the Orchestrator AWS Lambda function packages the target AWS CloudFormation template and its parameters/configurations using the body of the API call or the mlops-config.json file and uses it as the source stage for the AWS CodePipeline instance (a rough sketch of steps 2 and 3 follows this list).
  4. The DeployPipeline stage takes the packaged CloudFormation template and its parameters/configurations and deploys the target pipeline into the same account.
  5. After the target pipeline is provisioned, users can access its functionalities. An Amazon Simple Notification Service (Amazon SNS) notification is sent to the email provided in the solution’s launch parameters.
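
Steps 2 and 3 might look roughly like the sketch below, which uploads assets to the Assets S3 bucket with boto3 and then requests a pipeline through the solution's API Gateway endpoint; the bucket name, API URL, and payload fields are hypothetical placeholders, and a real call would also need to be signed with the caller's IAM (SigV4) credentials.

```python
import json
import boto3
import requests

ASSETS_BUCKET = "my-mlops-assets-bucket"  # placeholder for the Assets S3 bucket
PROVISION_API = "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/provisionpipeline"  # placeholder

# Step 2: upload the model artifact and training data to the Assets bucket.
s3 = boto3.client("s3")
s3.upload_file("model.tar.gz", ASSETS_BUCKET, "assets/model.tar.gz")
s3.upload_file("training_data.csv", ASSETS_BUCKET, "assets/training_data.csv")

# Step 3: request a single-account pipeline via the API Gateway endpoint.
# The payload mirrors the kind of settings that would otherwise live in
# mlops-config.json; the exact field names are illustrative, not authoritative.
payload = {
    "pipeline_type": "byom_realtime_builtin",
    "model_name": "anomaly-detection-model",
    "model_artifact_location": "assets/model.tar.gz",
    "inference_instance": "ml.m5.2xlarge",
}
response = requests.post(PROVISION_API, data=json.dumps(payload))  # add SigV4 auth in practice
print(response.status_code, response.text)
```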

Components

Key technologies used to implement this architecture:

Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily.

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information.

AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes.

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.
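
Since the subject here is anomaly detection, a natural SageMaker example is its built-in Random Cut Forest algorithm. The sketch below assumes the SageMaker Python SDK v2 plus an execution role and S3 bucket you already have; the instance type and hyperparameters are illustrative, not a recommendation.

```python
import numpy as np
import sagemaker
from sagemaker import RandomCutForest

session = sagemaker.Session()
role = "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>"  # placeholder
bucket = session.default_bucket()

# Single-feature series reshaped into the (num_records, num_features) layout RCF expects.
values = np.sin(np.linspace(0, 50, 1000)) + np.random.normal(0, 0.1, 1000)
values[700] += 5  # injected anomaly
train_data = values.reshape(-1, 1).astype("float32")

rcf = RandomCutForest(
    role=role,
    instance_count=1,
    instance_type="ml.m4.xlarge",              # illustrative training instance
    data_location=f"s3://{bucket}/rcf/input",
    output_path=f"s3://{bucket}/rcf/output",
    num_samples_per_tree=512,
    num_trees=50,
)

# fit() uploads the record set to S3 and launches a training job.
rcf.fit(rcf.record_set(train_data))
```

After training, the model could be deployed to a real-time endpoint and scored against data streamed in through Kinesis, which is where the other components listed above come into play.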

Basic pricing comparison between Azure and AWS:

Azure: $300/month

For 100 GB of total storage with a DS3 v2 instance (4 cores, 14 GiB RAM).

This estimate assumes the Databricks job runs every hour throughout the month.

https://azure.com/e/477bf65237eb43bfb2fdffdfc3b0cd61

AWS: $250/month

For 100 GB of total storage with an ml.m5.2xlarge instance (8 vCPUs, 32 GiB RAM).
