Data Streaming

Data Streaming is an Observe product that lets you feed your stream processing, SIEM, and big data platforms with the event logs from your applications on Azion in real time. To maintain performance, it uses ASCII encoding, which avoids parser issues and problems in data interpretation.

You can choose which of the available Azion products and domains you want to collect logs from and connect them to the endpoint of your data analysis platform. You can also decide which data to use in your analysis by choosing among the available variables.

By creating a data streaming, you can:

  • Have an organized set of logs.
  • Connect your data streamings to endpoints.
  • Understand the behavior of your users.
  • Analyze the performance of your content and applications.
  • Identify data on security threats.
  • Make informed decisions.
  • Improve your applications and your business through reliable observability practices.

After configuring your data streaming, you can use Real-Time Events to check that your logs are being sent successfully.

See the Data Streaming first steps.

Task | Guide
Configure data streaming | How to use Data Streaming
Associate domains | How to associate domains on Data Streaming
Create a custom template | How to create a custom template on Data Streaming

By default, Data Streaming sends your event logs when the block reaches 2,000 records, every 60 seconds, or when the packet size reaches the value set in the payload Max Size field, whichever occurs first. However, if you’re using the AWS Kinesis Data Firehose endpoint, Data Streaming sends your event logs when the block reaches 500 records or every 60 seconds.

For example, if a block reaches 2,000 records in 13 seconds, it’s sent right away, without waiting for the 60-second interval.
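
The sending rule above can be sketched as a small predicate. This is only an illustration of the documented thresholds, not Azion’s implementation: the function name should_flush, its parameters, and the constant names are hypothetical. The text doesn’t describe a size trigger for the Kinesis Data Firehose case, so the sketch leaves it out there.

```python
from typing import Optional

DEFAULT_MAX_RECORDS = 2000   # default record limit stated in the text
KINESIS_MAX_RECORDS = 500    # limit for the AWS Kinesis Data Firehose endpoint
FLUSH_INTERVAL_SECONDS = 60  # time limit in both cases


def should_flush(records: int, elapsed_seconds: float, payload_bytes: int,
                 max_size_bytes: Optional[int], kinesis: bool = False) -> bool:
    """Return True as soon as any documented threshold is reached."""
    limit = KINESIS_MAX_RECORDS if kinesis else DEFAULT_MAX_RECORDS
    if records >= limit or elapsed_seconds >= FLUSH_INTERVAL_SECONDS:
        return True
    # Assumption: the Max Size trigger applies only to the default case.
    if not kinesis and max_size_bytes is not None:
        return payload_bytes >= max_size_bytes
    return False
```

With these assumptions, a block that reaches 2,000 records after 13 seconds is flushed immediately, while a block with 1,999 records waits for the 60-second mark or the size limit.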

Once you activate a data streaming, there’s a propagation time until the logs become available for consultation in tools or products such as Real-Time Events.

For a detailed reference on each part of a data streaming configuration, see the following subsections.


A Data Source represents the application at Azion that generates the event logs you want to use. By selecting one, you decide where your data will be collected from and the remaining product settings are configured according to your choice.

Selecting a Data Source in the dropdown list is mandatory. You can choose between:

  • Activity History
  • Edge Applications
  • Edge Functions
  • WAF Events

Each data source has a preset of variables, combined in a template, representing the specific information you can receive from your event logs. The following sections describe each data source’s prerequisites, its variables, and the data they provide.

The Activity History data source provides the activity logs for your account on Azion Console. The following variables are available for this option:

Variable | Description
$author_email | Email address of the Azion Console user who performed the action.
$author_name | Name of the Azion Console user who performed the action.
$client | Unique Azion customer identifier. Example: 4529r
$comment | Editable space available for users to add comments when performing changes.
$time | Request date and time. Example: Oct. 31st, 2022 - 19:30:41
$title | Title of the activity, composed of: model name, name, and type of activity. Example: Pathorigin Default Origin was changed
$type | Type of performed action on Azion Console: CREATED, CHANGED, DELETED, or SIGNED UP.

You can’t associate domains if you use the Activity History data source.


The Edge Applications data source provides the data from requests made to your edge applications at Azion. The following variables are available for this option:

Variable | Description
$asn | Autonomous System Number (ASN) Allocation, which are IP address networks managed by one or more network operators that have a clear and unique routing policy. Example: AS52580
$bytes_sent | Number of bytes sent to a client. Example: 191
$client | Unique Azion customer identifier. Example: 4529r
$configuration | Unique Azion configuration identifier set on virtual host configuration file. Example: 1595368520
$country | Client’s country detected via IP address geolocation. Example: United States
$host | Host information sent on the request line. Stores: host name from the request line, or host name from the “Host” request header field, or the server name matching a request.
$http_referrer | Address of the page the user made the request from. Value of the Referer header. Example: https://example.com
$http_user_agent | End user’s application, operating system, vendor, and/or version. Value of the User-Agent header. Example: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
$proxy_status | HTTP error status code or origin when no response is obtained from the upstream. Example: 520. In case of cache, the response is -.
$remote_addr | IP address of the origin that generated the request.
$remote_port | Remote port of the origin that generated the request.
$request_id | Unique request identifier. Example: 5f222ae5938482c32a822dbf15e19f0f
$request_length | Request length, including request line, headers, and body.
$request_method | HTTP request method. Example: GET or POST.
$request_time | Request processing time elapsed since the first bytes were read from the client with resolution in milliseconds. Example: 1.19
$request_uri | URI of the request made by the end user, without the host and protocol information and with arguments. Example: /v1?v=bo%20dim
$requestPath | Request URI without Query String, Host, and Protocol information. Example: if request_uri: /jira/plans/48/scenarios/27?vop=320#plan/backlog, then requestPath: /jira/plans/48/scenarios/27
$requestQuery | URI parameters of the request. Example: requestQuery: vid=320#plan/backlog
$scheme | Request scheme. Example: HTTP or HTTPS.
$sent_http_content_type | Content-Type header sent in the origin’s response. Example: text/html; charset=UTF-8.
$sent_http_x_original_image_size | “X-Original-Image-Size” header sent in the origin’s response. Used by IMS to inform original image size. Example: 987390
$server_addr | IP address of the server that received the request.
$server_port | Remote port of the server that received the request. Example: 443
$server_protocol | Request protocol. Example: HTTP/1.1, HTTP/2.0, HTTP/3.0
$session_id | Identification of the session.
$ssl_cipher | Cipher string used to establish TLS connection. Example: TLS_AES_256_GCM_SHA384
$ssl_protocol | Protocol for an established TLS connection. Example: TLS v1.2
$ssl_server_name | Server name informed by the client that is trying to connect. Example: www.example.com
$ssl_session_reused | Returns "r" if the TLS session was reused; otherwise, returns ".".
$state | Client’s state detected via IP address geolocation. Example: CA
$status | HTTP status code of the request. Example: 200
$stream | ID set through virtual host configuration based on location directive. Set on virtual host configuration file.
$tcpinfo_rtt | Round-Trip Time (RTT) measured by the edge for the user. Available on systems that support the TCP_INFO socket option.
$time | Request date and time. Example: Oct. 31st, 2022 - 19:30:41
$traceback | Provides the names of the Rules Engine from your Edge Application and your Edge Firewall that are run by the request.
$upstream_addr | Client IP address and port. Can also store multiple servers or server groups. Example: 192.168.1.1:80. When the response is 127.0.0.1:1666, the upstream is Azion Cells Runtime.
$upstream_bytes_received | Number of bytes received by the origin’s edge if the content isn’t cached. Example: 8304
$upstream_bytes_sent | Number of bytes sent to the origin. Example: 2733
$upstream_cache_status | Status of the local edge cache. Example: MISS, BYPASS, EXPIRED, STALE, UPDATING, REVALIDATED, or HIT
$upstream_connect_time | Time it takes for the edge to establish a connection with the origin in milliseconds. In the case of TLS, it includes time spent on handshake. Example: 0.123. 0 in case of KeepAlive and - in case of cache.
$upstream_header_time | Time it takes for the edge to receive the response header from the origin in milliseconds. Example: 0.345. In case of cache, the response is -.
$upstream_response_time | Time it takes for the edge to receive a default response from the origin in milliseconds, including headers and body. Example: 0.876. In case of cache, the response is -.
$upstream_status | HTTP status code of the origin. If a server cannot be selected, the variable keeps the 502 (Bad Gateway) status code. Example: 200. In case of cache, the response is -.
$waf_attack_action | Reports the WAF’s action regarding the request. Can be: $BLOCK, $PASS, $LEARNING_BLOCK, or $LEARNING_PASS.
$waf_attack_family | Informs the classification of the WAF infraction detected in the request. Example: SQL, XSS, TRAVERSAL, among others.
$waf_block | Informs whether the WAF blocked the action or not. 0 when action wasn’t blocked and 1 when action was blocked. When in Learning Mode, it won’t be blocked regardless of the return.
$waf_headers | When the request headers sent by the user are analyzed by the WAF module and tagged as blocked with $waf_block = 1, it contains a base64 encoded string. Otherwise, it contains a dash character -. It applies to both WAF Learning or Blocking modes.
$waf_learning | Informs if WAF is in Learning mode. Can be 0 or 1.
$waf_match | List of infractions found in the end user’s request. It’s formed by key-value elements: the key refers to the type of violation detected; the value shows the string that generated the infraction.
$waf_score | Reports the score that will be increased in case of a match with the rules set for the WAF.
$waf_total_blocked | Informs the total number of blocked requests.
$waf_total_processed | Informs the total number of processed requests.

The variables: $upstream_bytes_received, $upstream_cache_status, $upstream_connect_time, $upstream_header_time, $upstream_response_time, and $upstream_status can have more than one comma-separated element. When a connection is triggered, either by internal redirection or choice of source with Load Balancer, for example, each value contained in the field represents the respective initiated connection. The field can be separated by:

  • A comma, representing multiple IPs.
  • A colon, representing internal redirection.

If several servers were contacted during the request processing, their addresses are separated by commas. For example: 192.168.1.1:80, 192.168.1.2:80.

If an internal redirect from one server group to another happens, initiated by X-Accel-Redirect or Error Responses, then the server addresses from different groups are separated by colons. For example: 192.168.1.1:80, 192.168.1.2:80, unix:/tmp/sock : 192.168.10.1:80, 192.168.10.2:80.

If a server can’t be selected, the variable keeps the name of the server group.

Considering multiple values as transitions in the connection, the last value tends to be the most important. If you use the Error Responses feature on your edge applications, you’ll see two values on upstream fields that represent the status of the origin and the result of the request that was made to get the content to be delivered instead. In normal cases, you may get 502 : 200.

502 is the HTTP status code returned by the first attempt to get content from the origin server. Because it returned a 502 error, and assuming you have configured Error Responses for status 502, another request is made to get the defined URI. Then, the page is delivered and its HTTP status is added to the upstream fields, in the same position in each of them. In this example, it results in the 502 : 200 composition.
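
A log consumer could split these multi-value upstream fields as sketched below. This is an illustration only: the function names are hypothetical, and the sketch assumes that redirect groups are separated by a colon surrounded by spaces, as in the examples above, so that the colon inside host:port isn’t mistaken for a separator.

```python
from typing import List


def parse_upstream_field(value: str) -> List[List[str]]:
    """Split an upstream field into redirect groups of per-connection values.

    Assumption: groups are separated by " : " (colon with surrounding spaces)
    and the values inside a group are separated by commas.
    """
    groups = value.split(" : ")
    return [[item.strip() for item in group.split(",")] for group in groups]


def final_value(value: str) -> str:
    """Return the last value, which the text describes as usually the most important."""
    return parse_upstream_field(value)[-1][-1]
```

For the Error Responses example, parse_upstream_field("502 : 200") yields two groups with one value each, and final_value returns the status of the page that was actually delivered.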


Requires: Edge Functions

The Edge Functions data source provides the data from requests made to your edge functions at Azion.

The following variables are available for this option:

Variable | Description
$client | Unique Azion customer identifier. Example: 4529r
$edge_function_id | Identification of your Edge Function. Example: 1321
$global_id | Settings identification.
$log_level | Level of the log generator: ERROR, WARN, INFO, DEBUG, or TRACE.
$log_message | Editable message used in the log function. Available for users to identify and report a given behavior.
$message_source | The source of the message. When messages are generated by the Console API: CONSOLE; when it’s related to an error message: RUNTIME.
$request_id | Unique request identifier. Example: 5f222ae5938482c32a822dbf15e19f0f
$time | Request date and time. Example: Oct. 31st, 2022 - 19:30:41

Requires: Web Application Firewall

The WAF Events data source provides the data from requests analyzed by Web Application Firewall (WAF) to allow you to map the score assigned to the request, the WAF rules that matched, and the reason for the block.

The following variables are available for this option:

Variable | Description
$blocked | Informs whether the WAF blocked the action or not. 0 when action wasn’t blocked and 1 when action was blocked. When in Learning Mode, it won’t be blocked regardless of the return.
$client | Unique Azion customer identifier. Example: 4529r
$configuration | Unique Azion configuration identifier set on virtual host configuration file. Example: 1595368520
$country | Client’s country detected via IP address geolocation. Example: United States
$headers | When the request headers sent by the user are analyzed by the WAF module and tagged as blocked with $waf_block = 1, it contains a base64 encoded string. Otherwise, it contains a dash character -. It applies to both WAF Learning or Blocking modes.
$host | Host information sent on the request line. Stores: host name from the request line, or host name from the “Host” request header field, or the server name matching a request.
$remote_addr | IP address of the origin that generated the request.
$requestPath | Request URI without Query String, Host, and Protocol information. Example: if request_uri: /jira/plans/48/scenarios/27?vop=320#plan/backlog, then requestPath: /jira/plans/48/scenarios/27
$requestQuery | URI parameters of the request. Example: requestQuery: vid=320#plan/backlog
$server_protocol | Request protocol. Example: HTTP/1.1, HTTP/2.0, HTTP/3.0
$time | Request date and time. Example: Oct. 31st, 2022 - 19:30:41
$truncated_body | This variable has been deprecated. It won’t have a value assigned to it, only - instead of a value.
$version | The Azion Log version used. Example: v5.
$waf_args | The request arguments.
$waf_attack_action | Reports the WAF’s action regarding the request: $BLOCK, $PASS, $LEARNING_BLOCK, or $LEARNING_PASS.
$waf_attack_family | Informs the classification of the WAF infraction detected in the request. Examples: SQL, XSS, TRAVERSAL, among others.
$waf_learning | Informs if WAF is in Learning mode. Can be 0 or 1.
$waf_match | List of infractions found in the end user’s request. It’s formed by key-value elements: the key refers to the type of violation detected; the value shows the string that generated the infraction.
$waf_score | Reports the score that will be increased in case of a match with the rules set for the WAF.
$waf_server | Hostname used in the WAF request. Example: api-login.azion.com.br
$waf_uri | URI used in the WAF request. Example: /access/v2/after-login
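
On the receiving side, the base64 value of $headers (or $waf_headers in the Edge Applications data source) could be decoded as in the sketch below. The function name is hypothetical, the sketch assumes the standard base64 alphabet, and it makes no assumption about the structure of the decoded content.

```python
import base64
from typing import Optional


def decode_waf_headers(value: str) -> Optional[bytes]:
    """Return the decoded bytes, or None when the field holds the "-" placeholder."""
    if value == "-":
        return None
    # Assumption: standard base64 alphabet, as produced by most encoders.
    return base64.b64decode(value)
```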

A Data Streaming template provides the preset of variables available for each data source in a format suitable to transfer your event logs. After selecting your data source, you can:

  • Select the corresponding template, provided by Azion.
  • Customize your own template, choosing which variables you want to use.

You can find four templates provided by Azion and a Custom Template, which provides you the option to decide which variables to use. Templates are available in the Template dropdown menu and the variables for the templates are shown in the Data Set code field in JSON format.

See which template corresponds to which data source:

Data Source | Template
Activity History | Activity History Collector
Edge Applications | Edge Applications + WAF Event Collector
Edge Functions | Edge Functions Event Collector
WAF Events | WAF Event Collector
All | Custom Template

If you select one of the templates provided by Azion, you can’t modify the variables shown in the Data Set code field. If you select Custom Template, you can choose which variables to use according to your needs.

How to create a custom template on Data Streaming

You can associate your existing domains registered on Azion to your data streaming. If you haven’t registered any domains to your account yet, see the Creating a new domain associated with your edge application documentation.

When you associate domains, the events related to them are collected and sent to the endpoint you configure in the data streaming. You can associate one or more domains, using either the Filter Domains or the All Domains option.

When you select All Domains, the platform automatically selects all current and future domains you have on your Azion Console account.

How to associate domains on Data Streaming

If you select the All Domains option, you can also use the Sampling option to set the percentage of data, selected randomly, that you want to receive from your data streaming. Besides filtering the data, sampling can also reduce the costs of data collection and analysis.

The Sampling (%) field should contain the percentage of data you want to receive. The percentage is applied to the total data from all your domains.

When the Sampling option is enabled, you’re allowed to add only one data streaming on your account. Once this data streaming is disabled, the Add Streaming option will be enabled again on the Data Streaming screen on Azion Console.


The endpoint is the destination to which you want to send the data collected by Azion, usually a stream processing platform or analysis tool you use. The endpoint type represents the method you want to configure as the destination for your data.

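Azion supports the following endpoints:

  • Apache Kafka
  • AWS Kinesis Data Firehose
  • Azure Blob Storage
  • Azure Monitor
  • Datadog
  • Elasticsearch
  • Google BigQuery
  • IBM QRadar
  • S3 (Simple Storage Service)
  • Splunk
  • Standard HTTP/HTTPS POST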

To configure an endpoint, select an Endpoint Type from the dropdown list in Azion Console. Then, fill in the fields presented for the endpoint type you chose.

See more about each available endpoint and their fields next. Fields marked with an asterisk * on Console are mandatory.


To configure the Apache Kafka endpoint, you need access to their platform to get the required information:

  • Bootstrap Servers: the servers—hosts and ports—in the Kafka cluster. You can add one or more Kafka servers by separating them with a comma , and no space. They must be informed in this format: myownhost.com:2021,imaginaryhost.com:4525,anotherhost:4030.

    There’s no need to include all the servers in your cluster in this field, only the few servers that will be used for the initial connection.

  • Kafka Topic: the name of the topic in your Kafka cluster to which Data Streaming should send messages. This field only accepts one topic. Example: azure.analytics.fct.pageviews.0

  • Transport Layer Security (TLS): option to send encrypted data using Transport Layer Security (TLS). It’s a cryptographic protocol designed to provide secure communication on a computer network and is commonly used as an HTTPS security layer. To use it, select Yes.

If you want to use this protocol, make sure the endpoint that will receive the data is properly protected with a digital certificate issued by a globally recognized Certificate Authority (CA), such as IdenTrust, DigiCert, Sectigo, GoDaddy, GlobalSign, or Let’s Encrypt.

The TLS variable use_tls receives either true or false to enable/disable its use.
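
Before saving the configuration, you could check a Bootstrap Servers value against the format described above. The sketch below is a hypothetical helper, not part of any Azion or Kafka tool; it only checks the host:port,host:port shape and rejects spaces.

```python
from typing import List, Tuple


def parse_bootstrap_servers(value: str) -> List[Tuple[str, int]]:
    """Parse "host:port,host:port" into (host, port) pairs, rejecting spaces."""
    if " " in value:
        raise ValueError("servers must be separated by a comma and no space")
    servers = []
    for entry in value.split(","):
        # rpartition splits on the last colon, so the port is always the tail.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        servers.append((host, int(port)))
    return servers
```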


To configure the AWS Kinesis Data Firehose endpoint, you need access to their platform to get the required information:

  • Stream Name: the delivery stream name that the user defined when they created the Kinesis Data Firehose in AWS’s platform. Example: MyKDFConnector
  • Region: the region where your Amazon Kinesis instance is running. Example: us-east-1
  • Access Key: the public key to access the Data Firehose, which is given by AWS. Example: ORIA5ZEH9MW4NL5OITY4
  • Secret Key: the secret key to access the Data Firehose, which is given by AWS. Example: +PLjkUWJyOLth3anuWXcLLVrMLeiiiThIokaPEiw
How to use Amazon Kinesis Data Firehose to receive data

To configure the Azure Blob Storage endpoint, you need access to their platform to get the required information:

  • Storage Account: the storage account name you defined in Blob Storage. Example: mystorageaccount
  • Container Name: the storage container name you defined in Blob Storage. Example: mycontainer
  • Blob SAS Token: the token generated by Blob Storage. It should have create, read, write, and list accesses granted. Example: sp=oiuwdl&st=2022-04-14T18:05:08Z&se=2026-03-02T02:05:08Z&sv=2020-08-04&sr=c&sig=YUi0TBEt7XTlxXex4Jui%2Fc88h6qAgMmCY4XIXeMvxa0%3F
How to use Azure Blob Storage to receive data

To configure the Azure Monitor endpoint, you need access to their platform to get the required information:

  • Log Type: the record type of the data that’s being submitted. It can contain only letters, numbers, the underscore (_) character, and it can’t exceed 100 characters. Example: AzureMonitorTest
  • Shared Key: the Shared Key of the Workspace in Azure Monitor. Example: OiA9AdGr4As5Iujg5FAHsTWfawxOD4
  • Time Generated Field: used for the TimeGenerated field, which specifies how long it’ll take for the log to be available after being collected. If it isn’t specified, it uses the ingestion time. Example: myCustomTimeField
  • Workspace ID: the ID of your Workspace in Azure Monitor. Example: kik73154-0426-464c-aij3-eg6d24u87c50
How to use Azure Monitor to receive data
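
The Log Type constraint above (letters, numbers, underscore, and at most 100 characters) can be checked with a regular expression. This is a minimal sketch with a hypothetical function name; it assumes that "letters" means ASCII letters and that an empty value isn’t accepted.

```python
import re

# Assumption: ASCII letters only, at least 1 and at most 100 characters.
LOG_TYPE_PATTERN = re.compile(r"[A-Za-z0-9_]{1,100}")


def is_valid_log_type(log_type: str) -> bool:
    """Return True when the whole value matches the documented constraint."""
    return LOG_TYPE_PATTERN.fullmatch(log_type) is not None
```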

To configure the Datadog endpoint, you need access to their platform to get the required information:

  • Datadog URL: the URL or URI of your Datadog endpoint. Example: https://http-intake.logs.datadoghq.com/api/v2/logs
  • API Key: the API key generated through the Datadog dashboard. Example: ij9076f1ujik17a81f938yhru5g713422
How to use Datadog to receive data

To configure the Elasticsearch endpoint, you need access to their platform to get the required information:

  • Elasticsearch URL: the URL address + the Elasticsearch index that will receive the collected data. Example: https://elasticsearch-domain.com/myindex
  • API Key: the base64 key provided by Elasticsearch. Example: VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw
How to use Elasticsearch to receive data

To configure the Google BigQuery endpoint, you need access to their platform to get the required information:

  • Project ID: your project ID on Google Cloud. Example: mycustomGBQproject01
  • Dataset ID: the name you gave to your dataset on Google BigQuery. It’s a unique identifier per project and is case sensitive. Example: myGBQdataset
  • Table ID: the name you chose for the table on Google BigQuery. Example: mypagaviewtable01
  • Service Account Key: the JSON file provided by Google Cloud. It has the following format:
{
"type": "service_account",
"project_id": "mycustomGBQproject01",
"private_key_id": "key-id",
"private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
"client_email": "service-account-email",
"client_id": "client-id",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
}
How to use Google BigQuery to receive data
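
Before pasting a Service Account Key, you could verify that the JSON contains the fields shown in the format above. The helper below is a hypothetical sketch; the list of required fields is taken from the sample only and isn’t an official Google Cloud requirement list.

```python
import json
from typing import Set

# Field names copied from the sample key shown above.
EXPECTED_KEYS = {
    "type", "project_id", "private_key_id", "private_key", "client_email",
    "client_id", "auth_uri", "token_uri", "auth_provider_x509_cert_url",
    "client_x509_cert_url",
}


def missing_key_fields(raw_json: str) -> Set[str]:
    """Return the expected fields that are absent from the key file content."""
    data = json.loads(raw_json)
    if data.get("type") != "service_account":
        raise ValueError('expected "type" to be "service_account"')
    return EXPECTED_KEYS - set(data)
```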

To configure the IBM QRadar endpoint, you need access to their platform to get the required information:

  • URL: The URL that will receive the collected data.

You can use any type of S3 (Simple Storage Service) provider of your choice. To configure the S3 endpoint, you need access to the chosen platform to get the required information:

  • Host URL: the URL of your S3 host. Example: https://myownhost.s3.us-east-1.myprovider.com
  • Bucket Name: the name of the Bucket that the object will be sent to. You define the bucket name. Example: mys3bucket
    • The bucket must be created before enabling Data Streaming to send the objects.
  • Region: the region in which your bucket is hosted. Example: us-east-1
  • Access Key: the public key to access your bucket given by your provider. Example: ORIA5ZEH9MW4NL5OITY4
  • Secret Key: the secret key to access your bucket given by your provider. Example: +PLjkUWJyOLth3anuWXcLLVrMLeiiiThIokaPEiw
  • Object Key Prefix: a prefix that you can add to the names of the objects that will be sent. The objects’ names are composed of Prefix + Timestamp + UUID. Example: if you use waf_logs as the prefix, one of the sent objects will be saved as waf_logs_1622575860091_37d66e78-c308-4006-9d4d-1c013ed89276
  • Content Type: the format in which the object will be created in your bucket. You can choose between plain/text or application/gzip.
How to use Amazon S3 to receive data
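
The object naming scheme described for the Object Key Prefix field can be illustrated as follows. This sketch only mirrors the example name shown above: it assumes the parts are joined with underscores and that the timestamp is in epoch milliseconds, which the 13-digit example suggests but the text doesn’t state. The function name is hypothetical.

```python
import time
import uuid
from typing import Optional


def build_object_key(prefix: str, timestamp_ms: Optional[int] = None,
                     object_id: Optional[uuid.UUID] = None) -> str:
    """Compose Prefix + Timestamp + UUID, joined by underscores as in the example."""
    ts = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    oid = uuid.uuid4() if object_id is None else object_id
    return f"{prefix}_{ts}_{oid}"
```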

To configure the Splunk endpoint, you need access to their platform to get the required information:

  • Splunk URL: the URL that will receive the collected data. If you want to point to an alternative index, you can add it at the end of the URL. Example: https://inputs.splunkcloud.com:8080/services/collector?index=myindex
  • API Key: the HTTP Event Collector Token provided during your Splunk installation. Example: crfe25d2-23j8-48gf-a9ks-6b75w3ska674
How to use Splunk to receive data
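
To confirm which index a Splunk URL points to, you could read the index query parameter as sketched below. The function name is hypothetical; the sketch uses only Python’s standard urllib.parse module.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def splunk_index(url: str) -> Optional[str]:
    """Return the value of the index query parameter, or None if it's absent."""
    values = parse_qs(urlsplit(url).query).get("index")
    return values[0] if values else None
```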

To configure the Standard HTTP/HTTPS POST endpoint, you need to get the required information:

  • Endpoint URL: the URL that will receive the collected data. Example: https://app.domain.com/
  • Custom Headers (optional): the names and values for each header to send to the endpoint. You can enter one or more custom headers for your HTTP/HTTPS request. Example: header-name:value
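
A receiving application or a pre-check script could parse header-name:value entries as sketched below. The helper is hypothetical and assumes that only the first colon separates the name from the value, so values such as URLs stay intact.

```python
from typing import Dict, Iterable


def parse_custom_headers(entries: Iterable[str]) -> Dict[str, str]:
    """Turn "header-name:value" entries into a dictionary of headers."""
    headers = {}
    for entry in entries:
        # Split on the first colon only, so values may contain colons.
        name, sep, value = entry.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected header-name:value, got {entry!r}")
        headers[name.strip()] = value.strip()
    return headers
```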

Requires: Standard HTTP/HTTPS POST endpoint

When using the Standard HTTP/HTTPS POST endpoint, the payload isn’t predefined. You can customize the information that will be sent in your data as you see fit.

To customize the payload sent by the connector in Data Streaming, you have the following fields:

  • Max Size (optional): defines the size of data packets that will be sent in bytes. It accepts values starting from 1000000.
  • Log Line Separator (optional): defines what information will be used at the end of each log line. Used to break information into different lines.
  • Payload Format (optional): defines which information will be sent in your data, for each data streaming request.

By default, Data Streaming recommends an NDJSON format with the use of \n as a log line separator and $dataset as a payload format, which uses the information from the Data Set code box, describing the variables chosen as a template.

You can choose other options for both fields. Depending on your logs, you can use , as a log line separator, for example.

An NDJSON payload isn’t wrapped in a JSON array [], and each record is presented on its own line, without a comma separating the records. It’s useful for structured data that is processed one record at a time. A JSON payload, on the other hand, uses the widely known format in which the records are wrapped in an array [] and separated by commas. With it, the payload can be treated as a single record.

For example, if the Log Line Separator receives , and the Payload Format receives [$dataset], and the template has the following variables in the Data Set code box:

$request_method
$host
$status

You get a JSON payload similar to this:

[
{
"request_method": "GET",
"host": "www.onedomain.com",
"status": "200"
},
{
"request_method": "POST",
"host": "www.anotherdomain.com.br",
"status": "200"
}
]

But if the Log Line Separator receives \n and the Payload Format receives $dataset, you get an NDJSON payload similar to this:

{"request_method": "GET", "host": "www.onedomain.com", "status": "200"}
{"request_method": "POST", "host": "www.anotherdomain.com.br", "status": "200"}
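
The way the two fields combine can be sketched as follows. This is an illustration of the behavior described above, not Azion’s implementation; the function name render_payload is hypothetical, and the sketch assumes that $dataset is replaced by the records joined with the Log Line Separator.

```python
import json
from typing import Dict, List


def render_payload(records: List[Dict[str, str]], separator: str,
                   payload_format: str) -> str:
    """Join the serialized records with the separator, then place them in the format."""
    dataset = separator.join(json.dumps(record) for record in records)
    return payload_format.replace("$dataset", dataset)


records = [
    {"request_method": "GET", "host": "www.onedomain.com", "status": "200"},
    {"request_method": "POST", "host": "www.anotherdomain.com.br", "status": "200"},
]
json_payload = render_payload(records, ",", "[$dataset]")   # a single JSON array
ndjson_payload = render_payload(records, "\n", "$dataset")  # one record per line
```

A receiver can parse json_payload with a single json.loads call, while ndjson_payload is parsed line by line, which suits processing one record at a time.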

Customizable payload with Activity History


You can customize the payload with specific information if you’re using the Activity History data source and the Standard HTTP/HTTPS POST endpoint.

The following information must be used in the payload fields to configure it:

  • Log Line Separator: \n
  • Payload Format: 'v1\t$time_iso8601\t$clientid\t$title\t$comment\t$type\t$author_name\t$author_email'
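
On the receiving side, a line produced with this payload format could be parsed as sketched below. The field names come from the Payload Format above, with "version" standing for the literal v1; the function name and the sample values in the usage note are hypothetical, and the sketch assumes field values contain no tab characters.

```python
from typing import Dict

# Order follows the Payload Format; "version" stands for the literal v1.
FIELDS = ["version", "time_iso8601", "clientid", "title", "comment",
          "type", "author_name", "author_email"]


def parse_activity_line(line: str) -> Dict[str, str]:
    """Split one tab-separated Activity History line into named fields."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))
```

For example, a line such as "v1\t2022-10-31T19:30:41+00:00\t4529r\tPathorigin Default Origin was changed\t-\tCHANGED\tJane Doe\tjane@example.com" (made-up values) yields a dictionary whose "type" entry is CHANGED.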

Data Streaming servers work in two steps: they monitor the endpoints once a minute (1x/min) and state whether the endpoint is available or unavailable. The cost of sending messages with an error is very high, so prior monitoring is required.

Data Streaming attempts to send messages only to endpoints that are available. If the endpoint is unavailable, messages aren’t sent and the information is discarded. During the next minute, Data Streaming sends the data again if the endpoint is considered available.

The endpoint must be approved by all Azion servers to be considered as available. If one of the servers indicates that the endpoint is unavailable, messages aren’t sent.
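
The availability rule can be summarized in a short sketch. It’s a simplified model of the behavior described above, with hypothetical function names; it treats each server’s monitoring result as a boolean and assumes that an empty set of reports doesn’t count as available.

```python
from typing import Iterable, List


def endpoint_available(server_reports: Iterable[bool]) -> bool:
    """The endpoint is available only when every server reports it as available."""
    reports = list(server_reports)
    # Assumption: with no reports at all, the endpoint isn't considered available.
    return bool(reports) and all(reports)


def messages_to_send(messages: List[str], server_reports: Iterable[bool]) -> List[str]:
    """Return the messages that are sent; for an unavailable endpoint, none are."""
    return list(messages) if endpoint_available(server_reports) else []
```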

One type of error occurs when the endpoint responds to the test successfully, but doesn’t receive messages within the timeout: 20 seconds for HTTP POST type endpoints.

Another type of error occurs when the endpoint is declared unavailable by endpoint monitoring. In that case, Data Streaming doesn’t attempt to send the messages.

Since the system is distributed, it isn’t possible to know the specific server that sends messages to each endpoint.



