Data Streaming


Data Streaming is an Edge Analytics product that allows you to feed your SIEM, big data, and stream-processing platforms with your content and application data in real time, adding even more intelligence to your business.

This integration allows you to analyze the behavior of your users and the performance of your content and applications, and to troubleshoot issues, in a simple and agile way.

In addition, Data Streaming follows information traffic best practices by using ASCII encoding to avoid parser issues, that is, problems in data interpretation.

  1. Accessing Data Streaming

  2. Selecting Data Sources

  3. Using a Template

  4. Associating Domains

  5. Setting the Endpoint

    Apache Kafka
    AWS Kinesis Data Firehose
    Data Dog
    Elasticsearch
    Google BigQuery
    IBM QRadar
    Simple Storage Service
    Splunk
    Standard HTTP/HTTPS POST

  6. Customizing the Payload

  7. Activating your Settings


1. Accessing Data Streaming

Access Real-Time Manager. Click on Data Streaming in either the Edge Analytics or the Products menu, in the upper left corner.

Configure your Data Streaming settings according to the following possibilities.

Fields marked with an asterisk are required.


2. Selecting Data Sources

The first step is choosing the Data Source, which represents the application at Azion that generated the event logs. In doing so, you must select where your data will be collected from. You have the following options:

Edge Applications

It displays the data from requests made to your Edge Applications at Azion.

Variable Description
$bytes_sent Bytes sent to the user, including header and body.
$client Unique Azion customer identifier.
$configuration Unique Azion configuration identifier.
$country Country name of the remote client, for example “Russian Federation”, “United States”. Geolocation detection by IP address.
$host Host information sent on the request line; or HTTP header Host field.
$http_referrer Information from the last page the user was on before making the request.
$http_user_agent The identification of the application that made the request, for example: Mozilla/5.0 (Windows NT 10.0; Win64; x64).
$proxy_status HTTP Status Code of the origin (“-” in case of cache).
$remote_addr IP address of the request.
$remote_port Remote port of the request.
$request_length Request size, including request line, headers and body.
$request_method Request method; usually “GET” or “POST”.
$request_time Request processing time with resolution in milliseconds.
$request_uri URI of the request made by the user, without the Host and Protocol information.
$requestPath The request URI without Query String, Host and Protocol information.
$requestQuery Only the URI parameters of the request.
$scheme Request scheme “http” or “https”.
$sent_http_content_type “Content-Type” header sent in the origin’s response.
$sent_http_x_original_image_size “X-Original-Image-Size” header sent in the origin’s response (used by IMS to inform original image size).
$server_protocol The protocol of the connection established, usually “HTTP/1.1” or “HTTP/2.0”.
$ssl_cipher Cipher string used to establish SSL connection.
$ssl_protocol The protocol for an established SSL connection, for example “TLS v1.2”.
$state Name of the remote client’s state, for example: “RS”, “SP”. Geolocation detection by IP address.
$status The status code of the request, for example: 200.
$tcpinfo_rtt Round-trip time (RTT) in microseconds, measured by the Edge for the user.
$time Request date and time.
$upstream_bytes_received Number of bytes received by the Edge from the origin, if the content is not cached.
$upstream_cache_status Status of the Edge cache. It can assume the values “MISS”, “BYPASS”, “EXPIRED”, “STALE”, “UPDATING”, “REVALIDATED” or “HIT”.
$upstream_connect_time Time in milliseconds for Edge to establish a connection with the origin (“0” in case of KeepAlive and “-“ in case of cache).
$upstream_header_time Time in milliseconds for Edge to receive the origin’s response headers ( “-“ in case of cache).
$upstream_response_time Time in milliseconds for Edge to receive all of the response from the origin, including headers and body (“-“ in case of cache).
$upstream_status HTTP Status Code of the origin (“-“ in case of cache).
$waf_attack_action It reports the action taken by the WAF ($BLOCK, $PASS, $LEARNING_BLOCK, $LEARNING_PASS).
$waf_attack_family It informs the classification of the WAF infraction detected in the request (SQL, XSS, TRAVERSAL, among others).

Edge Functions

If you have subscribed to Edge Functions, you can use the following variables for this data source.

Variable Description
$time Request date and time.
$global_id Settings ID.
$edge_function_id Edge Function ID.
$request_id Request ID.
$log_level Level of the log generator (ERROR, WARN, INFO, DEBUG, TRACE).
$log_message Message used in the log function.

WAF

If you have subscribed to the Web Application Firewall product, the WAF Events data source will display the requests analyzed by WAF, allowing you to map the score assigned to each request, the WAF rules that matched, the reason for the block, and more.

Variable Description
$blocked It informs whether the WAF blocked the action or not; 0 when not blocked and 1 when blocked. When in “Learning Mode”, it will not be blocked, regardless of the return.
$client Unique Azion customer identifier.
$configuration Unique Azion configuration identifier.
$country Country name of the remote client, for example “Russian Federation”, “United States”. Geolocation detection by IP address.
$headers Request headers analyzed by WAF.
$host Host information sent on the request line; or Host field of the HTTP header.
$remote_addr IP address of the request.
$requestPath The request URI without Query String, Host and Protocol information.
$requestQuery Only the URI parameters of the request.
$server_protocol The protocol of the connection established, usually “HTTP/1.1” or “HTTP/2.0”.
$time Timestamp of the start of the request.
$version The version of Azion Log used.
$waf_args The request arguments.
$waf_attack_action It reports the action taken by the WAF ($BLOCK, $PASS, $LEARNING_BLOCK, $LEARNING_PASS).
$waf_attack_family It informs the classification of the WAF infraction detected in the request (SQL, XSS, TRAVERSAL, among others).
$waf_learning It informs whether WAF is in Learning Mode: 0 when it is not and 1 when it is.
$waf_match List of infractions found in the request. It is formed by key-value elements: the key refers to the type of violation detected; the value shows the string that generated the infraction.
$waf_score It reports the score that will be increased in case of match.
$waf_server Hostname used in the request.
$waf_uri URI used in the request.

3. Using a Template

The template represents a selection of variables to be collected and a format for transfer. You can select templates created and maintained by Azion or customize your own selection.

  • Custom Template:

    Choose the Custom Template option to create your own customized Data Set, in JSON format, and select the variables that best suit your needs.

On the Data Sources list you’ll find a description of all the available variables. Give it a try!

By default, Data Streaming will send your events when the block reaches 2000 records or every 60 seconds, whichever occurs first. However, if you choose the AWS Kinesis Data Firehose endpoint, Data Streaming will send your events when the block reaches 500 records or every 60 seconds, whichever occurs first.

Your records will be separated by the character \n, and sent in the payload to your endpoint.
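
As an illustration, the sketch below (in Python, with a made-up two-record payload) shows how a receiving application could split such a block back into individual records:

# Minimal sketch: splitting a Data Streaming block into individual records.
# The payload below is a made-up example; real records contain the variables
# selected in your template.
import json

sample_payload = (
    '{"host": "example.com", "status": "200"}\n'
    '{"host": "example.com", "status": "404"}'
)
records = [json.loads(line) for line in sample_payload.splitlines() if line]
print(len(records))  # 2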


4. Associating Domains

You can associate Data Streaming with one or more of your domains registered with Azion.

When you associate a domain with Data Streaming, the events related to that domain are collected and sent to the configured endpoint.

You can also set the percentage of data you want to receive from your Data Streaming through the Sampling option. Besides filtering the data, sampling can also reduce the costs of data collection and analysis.

  • Sampling:
  1. Select the All Domains option to enable Sampling.
  2. Enter the percentage of data you want to receive. This percentage applies to the total data from all your domains.

When the Sampling option is enabled, you are allowed to add only one Data Streaming. If this Data Streaming is disabled, the Add Data Streaming option will be enabled again.


5. Setting the Endpoint

The Endpoint is the destination where you want to send the data collected by Azion. The endpoint type represents the method by which your endpoint is configured to receive data from Data Streaming.

See below how to configure each of the endpoints available in Real-Time Manager.

Apache Kafka

When setting up Data Streaming to send data to a Kafka endpoint in your infrastructure, you first need to obtain the following information to fill in the Real-Time Manager fields.

  • Bootstrap Servers: refers to the servers in the Kafka cluster, in the format host1:port1,host2:port2,…

    You can add one or more Kafka servers to receive the data collected by Data Streaming. When adding multiple servers, separate each host:port pair with a comma and no space, as in this example:

    myownhost.com:2021,imaginaryhost.com:4525,anotherhost:4030

    This list should have just a few servers that will be used for the initial connection. There is no need to include all of the servers in your cluster.

    We recommend that you use more than one server to increase redundancy and availability.

  • Kafka Topic: refers to the Topic where Data Streaming should send messages in your cluster. This field only accepts a single Topic.

    In this field, enter the name of the Topic to which the messages will be posted. Even if multiple addresses have been registered in the Bootstrap Servers field, you can register only one destination Topic.

    Example: azure.analytics.fct.pageviews.0

    For more details on Apache Kafka settings, see the Apache Kafka documentation.
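
As a reference, a minimal consumer sketch (assuming the kafka-python package and the example servers and Topic above, which are not real endpoints) could look like this:

# Minimal consumer sketch (kafka-python) for reading the records that
# Data Streaming posts to your Topic. Servers and Topic are the example
# values from this section, not real endpoints.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "azure.analytics.fct.pageviews.0",
    bootstrap_servers=[
        "myownhost.com:2021",
        "imaginaryhost.com:4525",
        "anotherhost:4030",
    ],
    value_deserializer=lambda v: v.decode("utf-8"),
)
for message in consumer:
    print(message.value)  # one Data Streaming record per message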

AWS Kinesis Data Firehose

By using this type of endpoint, the Data Streaming service sends data directly to your AWS Kinesis Data Firehose service.

When using AWS Kinesis Data Firehose, your events will be grouped in blocks of up to 500 records, instead of the default 2000, or every 60 seconds, whichever occurs first.

  • Stream Name: refers to the stream name that the user defined when they created the Kinesis Data Firehose. For example: MyKDFConnector

  • Region: refers to the region to which the data will be sent. For example: us-east-1.

  • Access Key: refers to the public key to access the Data Firehose, which is given by AWS.

    For example: ORIA5ZEH9MW4NL5OITY4

  • Secret Key: refers to the secret key to access the Data Firehose, which is given by AWS.

    For example: +PLjkUWJyOLth3anuWXcLLVrMLeiiiThIokaPEiw
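
To confirm these values before activating the stream, a minimal sketch (assuming boto3 and the example values above) could be:

# Minimal sketch (boto3): check that the delivery stream Data Streaming will
# write to exists and is active. All values are the examples from this section.
import boto3

firehose = boto3.client(
    "firehose",
    region_name="us-east-1",
    aws_access_key_id="ORIA5ZEH9MW4NL5OITY4",
    aws_secret_access_key="+PLjkUWJyOLth3anuWXcLLVrMLeiiiThIokaPEiw",
)
info = firehose.describe_delivery_stream(DeliveryStreamName="MyKDFConnector")
print(info["DeliveryStreamDescription"]["DeliveryStreamStatus"])  # e.g. "ACTIVE"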

Data Dog

By using this type of endpoint, the Data Streaming service sends data to a Data Dog endpoint in your infrastructure.

  • Data Dog URL: (required) refers to the URL that will receive the collected data from Data Streaming.

  • API Key: (required) refers to the key generated through the Data Dog panel, which is unique to your organization. An API Key is required by the Data Dog agent to submit metrics and events to Data Dog.

    For example: ij9076f1ujik17a81f938yhru5g713422
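
To check that the API Key is accepted before pointing Data Streaming at it, a minimal sketch (assuming the requests package and Data Dog's standard HTTP log intake URL) could be:

# Minimal sketch (requests): send a test log to Data Dog's HTTP intake using
# the example API Key above. A 202 response means the key was accepted.
import requests

resp = requests.post(
    "https://http-intake.logs.datadoghq.com/api/v2/logs",
    headers={
        "DD-API-KEY": "ij9076f1ujik17a81f938yhru5g713422",
        "Content-Type": "application/json",
    },
    json=[{"message": "data streaming connectivity test", "service": "azion-test"}],
)
print(resp.status_code)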

Elasticsearch

By using this type of endpoint, the Data Streaming service sends data to an Elasticsearch endpoint in your infrastructure.

  • Elasticsearch URL: (required) refers to the URL plus the Elasticsearch index that will receive the data collected from Data Streaming. Enter the data destination URL followed by the index.

    For example: https://elasticsearch-domain.com/myindex

  • API Key: (required) refers to the base64 key provided by Elasticsearch, which will be used for validation with Elasticsearch. When generated, the key looks like this example:

    VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw
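
To verify the index and the key before activating the stream, a minimal sketch (assuming the requests package and the example URL and key above) could be:

# Minimal sketch (requests): index a test document with the example URL and
# API Key above. Elasticsearch expects "Authorization: ApiKey <base64 of id:key>";
# if your key is already base64-encoded, use it directly instead of encoding it.
import base64
import requests

api_key = base64.b64encode(b"VuaCfGcBCdbkQm-e5aOx:ui2lp2axTNmsyakw9tvNnw").decode()
resp = requests.post(
    "https://elasticsearch-domain.com/myindex/_doc",
    headers={"Authorization": f"ApiKey {api_key}",
             "Content-Type": "application/json"},
    json={"message": "data streaming connectivity test"},
)
print(resp.status_code)  # 201 means the document was created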

Google BigQuery

By using this type of endpoint, the Data Streaming service sends data to a Google BigQuery endpoint in your infrastructure.

  • Project ID: refers to your project ID on Google Cloud. This is a unique identifier and may have been generated by Google or customized by the customer.

    For example: mycustomGBQproject01

  • Dataset ID: refers to your dataset ID on Google BigQuery, that is, the name that you have given to your dataset. It is a unique identifier per project and case sensitive.

    For example: myGBQdataset

  • Table ID: refers to your table ID on Google BigQuery, that is, the name that you choose for the table. Just as the dataset must have a unique name per project, the table name also has to be unique per dataset.

    For example: mypagaviewtable01

  • Service Account Key: refers to the JSON file that is provided by Google Cloud. It has the following format:

{
    "type": "service_account",
    "project_id": "mycustomGBQproject01",
    "private_key_id": "key-id",
    "private_key": "-----BEGIN PRIVATE KEY-----\nprivate-key\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account-email",
    "client_id": "client-id",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://accounts.google.com/o/oauth2/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-email"
}
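
To confirm that the service account can reach the table before activating the stream, a minimal sketch (assuming the google-cloud-bigquery package, with the JSON above saved as a hypothetical service-account.json file) could be:

# Minimal sketch (google-cloud-bigquery): load the service account file and
# check that the example project, dataset and table from this section exist.
# "service-account.json" is a hypothetical local path.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("service-account.json")
table = client.get_table("mycustomGBQproject01.myGBQdataset.mypagaviewtable01")
print(table.full_table_id)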

IBM QRadar

By using this type of endpoint, the Data Streaming service sends data in an HTTP POST payload to be processed on your platform.

  • URL: The URL that will receive the collected data from Data Streaming.

S3 - Simple Storage Service

By using this type of endpoint, the Data Streaming service sends data directly to any storage that works with the S3 protocol, such as Amazon S3, among others.

  • Host URL: refers to the URL of the S3 Host. You can connect with any provider that works with the S3 protocol.

    For example: http://myownhost.s3.us-east-1.myprovider.com

  • Bucket Name: refers to the name of the Bucket that the object will be sent to.

    It is important that the Bucket is already created to enable Data Streaming to send the objects.

    You define the Bucket name. For example: mys3bucket

  • Region: refers to the region in which the Bucket is hosted.

    For example: us-east-1.

  • Access Key: refers to the public key to access the Bucket, which is given by your provider.

    For example: ORIA5ZEH9MW4NL5OITY4

  • Secret Key: refers to the secret key to access the Bucket, which is given by your provider.

    For example: +PLjkUWJyOLth3anuWXcLLVrMLeiiiThIokaPEiw

  • Object Key Prefix: refers to a prefix that you can add to the files that will be sent. The object names are composed of Prefix + Timestamp + UUID.

    For example: if you use waf_logs as the prefix, one of the sent objects will be saved as waf_logs_1622575860091_37d66e78-c308-4006-9d4d-1c013ed89276.

  • Content Type: refers to the format in which the object will be created in the Bucket.

    You can choose between the plain/text and application/gzip options.

Based on these examples, when configuring the Destination section, you’ll get the following:

(Screenshot DS_S3: the Destination section configured with the example values above.)
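
To confirm the credentials and the Bucket before activating the stream, a minimal sketch (assuming boto3 and the example values above) could be:

# Minimal sketch (boto3): write a test object to the bucket Data Streaming
# will use, through the example S3-compatible host, keys and prefix above.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://myownhost.s3.us-east-1.myprovider.com",
    region_name="us-east-1",
    aws_access_key_id="ORIA5ZEH9MW4NL5OITY4",
    aws_secret_access_key="+PLjkUWJyOLth3anuWXcLLVrMLeiiiThIokaPEiw",
)
s3.put_object(Bucket="mys3bucket", Key="waf_logs_connectivity_test", Body=b"test")
print(s3.list_objects_v2(Bucket="mys3bucket", Prefix="waf_logs")["KeyCount"])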

Splunk

By using this type of endpoint, the Data Streaming service sends data to a Splunk endpoint in your infrastructure.

  • Splunk URL: (required) refers to the URL that will receive the collected data from Data Streaming. If you have an alternative index to point to, you can specify it at the end of the URL.

    For example: https://inputs.splunkcloud.com:8080/services/collector?index=myindex

  • API Key: (required) refers to the HTTP Event Collector Token, provided by your Splunk installation. It will look like this example: crfe25d2-23j8-48gf-a9ks-6b75w3ska674
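
To check the collector and the token before activating the stream, a minimal sketch (assuming the requests package and the example URL and token above) could be:

# Minimal sketch (requests): send a test event to the example Splunk HTTP
# Event Collector URL above, authenticating with the "Splunk <token>" scheme.
import requests

resp = requests.post(
    "https://inputs.splunkcloud.com:8080/services/collector?index=myindex",
    headers={"Authorization": "Splunk crfe25d2-23j8-48gf-a9ks-6b75w3ska674"},
    json={"event": "data streaming connectivity test"},
)
print(resp.status_code)  # 200 means the collector accepted the event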

Standard HTTP/HTTPS POST

By using this type of endpoint, the Data Streaming service sends data in an HTTP POST payload to be processed on your platform.

  • Endpoint URL: refers to the URL configured on the platform to receive Data Streaming data.

    Use the format scheme://domain/path.

    For example: https://app.domain.com/

  • Custom Headers: you can enter one or more custom headers for your HTTP/HTTPS request.

    To configure the headers, enter the Name and Value for each one. For example, if you want to send an authorization token and a secret key, your custom headers will look like this:

(Screenshot customheader: custom headers configuration example.)
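
As an illustration, a minimal sketch (in Python, not Azion code) of an endpoint able to receive these POSTs and validate a hypothetical custom header could look like this:

# Minimal sketch: an HTTP endpoint that receives the Data Streaming POST,
# validates a hypothetical "authorization" custom header and splits the
# payload into records on the "\n" separator.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class DataStreamingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.headers.get("authorization") != "expected-token":  # hypothetical value
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        records = [json.loads(line) for line in body.splitlines() if line.strip()]
        print(f"received {len(records)} records")
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), DataStreamingHandler).serve_forever()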


6. Customizing the Payload

You can also customize the Payload sent by the connector according to the following options.

  • Log Line Separator: (optional) defines the separator that will be used at the end of each log line.

  • Payload Format: (optional) defines which information will be at the beginning and at the end of the dataset.

For example, if the Log Line Separator receives “,” and the Payload Format receives “[&dataset]”, you will get the following output:

    [
      {
        "request_method": "GET",
        "host": "statis.exampledomain.com.br",
        "status": "200"
      },
      {
        "request_method": "GET",
        "host": "statis.exampledomain.com.br",
        "status": "200"
      }
    ]

This feature is only available for the HTTP POST endpoint.


7. Activating your Settings

You will find the following buttons at the bottom of the screen:

  • Active: switch this button on to enable your settings in the system.

  • Cancel: with this option, you return to the Data Streaming home page and discard your edits.

  • Save: once your selections are complete, save your settings by clicking the Save button.

