Sending logs to Scalyr using Fluentd

Fluentd is open-source software that unifies log data collection and is designed to scale and simplify log management. You can stream logs to Scalyr with fluent-plugin-scalyr, so you can search logs, set up alerts, and build dashboards from a centralized log repository.

Prerequisites

This document assumes that you already have a machine with Fluentd installed. If you need help, please refer to the instructions to install Fluentd on your machine.

Steps

1. Install fluent-plugin-scalyr

Run the following command to install fluent-plugin-scalyr: `td-agent-gem install fluent-plugin-scalyr`. You can check this page to see whether this is the latest version and to review its dependencies. Depending on your environment, you may also need to install ruby-dev (and possibly make/gcc).

2. Set up the Fluentd configuration file

You can find the Fluentd configuration file at `/etc/td-agent/td-agent.conf`. Here is an example configuration that demonstrates the Fluentd-to-Scalyr ingestion workflow.

```
<source>
  @type http
  port 8888
</source>
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
<source>
  @type tail
  path /var/log/httpd/access_log
  pos_file /var/log/td-agent/httpd.access_log.pos
  <parse>
    @type apache2
  </parse>
  tag scalyr.apache.access
</source>
<filter scalyr.myapp.access>
  @type record_transformer
  <record>
    parser myapp
  </record>
</filter>
# record_modifier is provided by the fluent-plugin-record-modifier gem
<filter scalyr.myapp.access>
  @type record_modifier
  <record>
    cluster "fluentd_test_cluster"
    fluentd_parser_time ${Time.now.utc.iso8601(3)}
  </record>
</filter>
<filter scalyr.apache.access>
  @type record_transformer
  <record>
    parser accessLog
  </record>
</filter>
<match scalyr.**>
  @type scalyr
  api_write_token XXXXX
  scalyr_server "https://upload.scalyr.com"
  server_attributes {
    "serverHost": "fluentd_host",
    "parser": "myapp"
  }
  message_field log
</match>
```

The configuration file consists of a series of directives. To send logs to Scalyr you need at least `source` and `match`
directives; this example also uses `filter` directives to enrich the log events.

Source directives control the input sources. In this example, Fluentd accepts events from three different sources:

* HTTP messages on port `8888`
* TCP messages (Fluentd forward protocol) on port `24224`
* Events read from the tail of the Apache access log file

The `scalyr.apache.access` tag in the access log source directive matches the `filter` and `match` directives
later in the configuration.
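Fluentd matches tags part by part: in a pattern, `*` matches exactly one dot-separated tag part and `**` matches zero or more parts, so `scalyr.**` matches `scalyr.apache.access` while `scalyr.*` does not. The sketch below illustrates only those two wildcards; `tag_match?` and `route` are hypothetical helpers written for this example, not Fluentd's own matcher, and other pattern features such as `{a,b}` alternatives are omitted. `route` reflects Fluentd's rule that an event goes to the first `<match>` block, in file order, whose pattern matches.

```ruby
# Simplified sketch of Fluentd tag matching (hypothetical helpers).
# Supports only whole-part "*" (exactly one tag part) and "**" (zero or more parts).
def tag_match?(pattern, tag)
  match_parts?(pattern.split("."), tag.split("."))
end

def match_parts?(pat, parts)
  return parts.empty? if pat.empty?

  head, *rest = pat
  if head == "**"
    # "**" may consume zero or more tag parts.
    (0..parts.length).any? { |i| match_parts?(rest, parts.drop(i)) }
  elsif parts.empty?
    false
  elsif head == "*" || head == parts.first
    match_parts?(rest, parts.drop(1))
  else
    false
  end
end

# Return the first pattern (in configuration order) that matches the tag, or nil.
def route(tag, patterns)
  patterns.find { |p| tag_match?(p, tag) }
end

puts tag_match?("scalyr.**", "scalyr.apache.access") # prints true
puts tag_match?("scalyr.*", "scalyr.apache.access")  # prints false
puts route("scalyr.myapp.access", ["scalyr.apache.access", "scalyr.**"]) # prints scalyr.**
```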

Filter directives determine the event processing pipelines. For log ingestion to Scalyr,
the filter directives here set the parser (`myapp` or `accessLog`) and append additional fields
(`cluster` and `fluentd_parser_time`) to each log event.
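To make the enrichment concrete, here is a minimal sketch of what the sample filters add to an event tagged `scalyr.myapp.access`. `enrich_myapp_event` is a hypothetical helper; the field names and values come from the sample configuration, and the timestamp uses the same `Time#iso8601(3)` expression as the `fluentd_parser_time` field.

```ruby
require "time"

# Hypothetical helper that mimics the fields the sample filters add.
def enrich_myapp_event(record, now: Time.now)
  record.merge(
    "parser"              => "myapp",                # set by record_transformer
    "cluster"             => "fluentd_test_cluster", # set by record_modifier
    "fluentd_parser_time" => now.utc.iso8601(3)      # set by record_modifier
  )
end

event = enrich_myapp_event({ "message" => '{"msg":"hello scalyr from http"}' },
                           now: Time.utc(2024, 1, 2, 3, 4, 5))
puts event["fluentd_parser_time"] # prints 2024-01-02T03:04:05.000Z
```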

Match directives determine the output destinations.
Copy the Write Logs API key from the Manage API Keys page for your region
([US](https://app.scalyr.com/keys) or [EU](https://app.eu.scalyr.com/keys)) and paste it as the value of `api_write_token`.
You can also set server attributes by adding fields such as `serverHost` and `parser` to `server_attributes`.

You can also add the `message_field` option to the match directive.
It specifies the field that contains the actual log message you want to send to Scalyr, such as **message** or **log**; the default is **message**.
The plugin checks whether the field named by `message_field` exists. If it does, the plugin uses that field; otherwise, it uses **message**.
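A minimal sketch of the `message_field` rule, assuming the plugin renames the configured field to `message` as described under `message_field` in the option reference. `apply_message_field` is a hypothetical helper, not the plugin's source.

```ruby
# Hypothetical helper: if the configured field exists, its value becomes
# "message" (overriding any existing "message"); otherwise nothing changes.
def apply_message_field(record, message_field = "message")
  return record if message_field == "message" || !record.key?(message_field)

  result = record.dup                                 # leave the caller's record untouched
  result["message"] = result.delete(message_field)    # rename the field to "message"
  result
end

# Docker's fluentd log driver puts the container output in "log".
docker_event = { "log" => '{"msg": "hello scalyr from tcp"}', "container_id" => "abc" }
p apply_message_field(docker_event, "log")
```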

3. Start Fluentd

Use `/etc/init.d/td-agent` to start, stop, or restart the td-agent service.

```
$ sudo /etc/init.d/td-agent start
Starting td-agent (via systemctl):                         [OK]
```
If you encounter any issues launching td-agent, check the td-agent log at `/var/log/td-agent/td-agent.log`.

4. Send logs

As mentioned above, we use three methods to send logs to Scalyr:

* Sending logs with HTTP

    In this example, we send an HTTP POST request to port `8888` whose log message is `{"msg": "hello scalyr from http"}`. Appending `scalyr.myapp.access` to the URL path tags the event so that it is routed through the `scalyr.myapp.access` filter:

    `curl -XPOST -d 'json={"message":"{\"msg\":\"hello scalyr from http\"}"}' http://127.0.0.1:8888/scalyr.myapp.access`

* Sending logs with TCP

    Here, we use Docker's Fluentd log driver to send a message from stdout to Scalyr. The actual log message `{"msg": "hello scalyr from tcp"}` is stored in the `log` field of the log event, so it is important to include `message_field log` in your match directive; otherwise, the `myapp` parser will not be applied to the log message.

    ```shell script
    sudo docker run \
        --log-driver=fluentd \
        --log-opt tag=scalyr.myapp.access \
        --log-opt fluentd-address=127.0.0.1:24224 \
        ubuntu echo '{"msg": "hello scalyr from tcp"}'
    ```
* Sending logs from an access log file
    In this example, we send our Apache httpd access log to Scalyr.
    The access log file path (`/var/log/httpd/access_log`) is specified in the source directive,
    and Fluentd tails that file and forwards new entries to Scalyr.
    To generate entries, open a URL served by your web server, refresh an already open page,
    or use curl to send a request to your server.
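If you would rather send the HTTP test event from a Ruby program than with curl, the following sketch builds an equivalent request with the standard `net/http` library: the `json` form field carries the event and the URL path carries the tag. It assumes Fluentd's HTTP source is listening on `127.0.0.1:8888` as in the sample configuration; `build_fluentd_http_request` and `send_to_fluentd` are hypothetical helper names.

```ruby
require "json"
require "net/http"
require "uri"

# Build a POST whose path is the Fluentd tag and whose form field "json"
# holds the event, mirroring the curl example.
def build_fluentd_http_request(tag, message)
  payload = JSON.generate("message" => message)
  request = Net::HTTP::Post.new("/#{tag}")
  request.set_form_data("json" => payload) # application/x-www-form-urlencoded
  request
end

# Actually send the request; only call this while Fluentd is running.
def send_to_fluentd(tag, message, host: "127.0.0.1", port: 8888)
  request = build_fluentd_http_request(tag, message)
  Net::HTTP.start(host, port) { |http| http.request(request) }
end
```

With Fluentd running, `send_to_fluentd("scalyr.myapp.access", '{"msg":"hello scalyr from http"}')` should produce the same event as the curl command. Net::HTTP URL-encodes the form body while curl's `-d` sends it as-is, but both are sent as `application/x-www-form-urlencoded`.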

5. View logs

Go to the Scalyr search page and search for `$serverHost == "fluentd_host"` to find the logs you ingested in the above examples. The `myapp` parser has a single format, `${parse=json}$`, which parses JSON logs; it is applied to the logs ingested over HTTP and TCP. Click "INSPECT FIELDS" on a log event to verify the parsing; you should find a `msg` field with the value "hello scalyr from http". If you used the sample Fluentd configuration file from step 2, you should also see the `fluentd_parser_time` and `cluster` attributes, which were added by the `record_modifier` filter.

More about configuration

The Scalyr output plugin has a number of sensible defaults, so the minimum configuration only requires your Scalyr 'write logs' token.

```
<match scalyr.*>
  @type scalyr
  api_write_token YOUR_SCALYR_WRITE_LOGS_TOKEN
</match>
```

The following configuration options are also supported:

```
<match scalyr.*>
  @type scalyr

  #scalyr specific options
  api_write_token YOUR_SCALYR_WRITE_TOKEN
  compression_type deflate
  compression_level 6
  use_hostname_for_serverhost true
  server_attributes {
    "serverHost": "front-1",
    "serverType": "frontend",
    "region":     "us-east-1"
  }

  scalyr_server https://agent.scalyr.com/
  ssl_ca_bundle_path /etc/ssl/certs/ca-bundle.crt
  ssl_verify_peer true
  ssl_verify_depth 5
  message_field message

  max_request_buffer 5500000

  force_message_encoding nil
  replace_invalid_utf8 false

  #buffered output options
  <buffer>
    retry_max_times 40
    retry_wait 5s
    retry_max_interval 30s
    flush_interval 5s
    flush_thread_count 1
    chunk_limit_size 2.5m
    queue_limit_length 1024
  </buffer>

</match>
```

For some additional examples of configuration for different setups, please refer to the examples/configs/ directory.

Scalyr-specific options

compression_type - compresses the data sent to Scalyr to reduce network traffic. Options are `bz2` and `deflate`. See here for more details. This setting is optional.
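For a rough sense of what `deflate` compression can save, this sketch compresses a repetitive JSON payload with Ruby's built-in `Zlib` at level 6 (the `compression_level` used in the sample configuration). The payload shape is made up for illustration and is not the plugin's request format; `bz2` is not part of Ruby's standard library, so it is not shown.

```ruby
require "json"
require "zlib"

# Made-up, highly repetitive log payload (real log data compresses less predictably).
events = Array.new(200) { |i| { "message" => "GET /index.html 200", "seq" => i } }
body = JSON.generate("events" => events)

# Deflate at level 6 and compare sizes.
compressed = Zlib::Deflate.deflate(body, 6)
puts "raw: #{body.bytesize} bytes, deflated: #{compressed.bytesize} bytes"
```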

api_write_token - your Scalyr write logs token. See here for more details. This value must be specified.

server_attributes - a JSON hash containing custom server attributes you want to include with each log request. This value is optional and defaults to nil.

use_hostname_for_serverhost - if `true`, and `server_attributes` is nil or does not include a `serverHost` field, the plugin adds a `serverHost` field whose value is the hostname of the machine Fluentd is running on. Defaults to `true`.
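The rule above can be sketched as follows. `effective_server_attributes` is a hypothetical helper that mirrors the documented behavior using Ruby's `Socket.gethostname`; the plugin may determine the hostname differently.

```ruby
require "socket"

# Hypothetical helper mirroring the documented use_hostname_for_serverhost rule.
def effective_server_attributes(server_attributes, use_hostname_for_serverhost = true)
  return server_attributes unless use_hostname_for_serverhost

  attrs = (server_attributes || {}).dup
  # Only fill in serverHost when it is missing; an explicit value wins.
  attrs["serverHost"] = Socket.gethostname unless attrs.key?("serverHost")
  attrs
end

p effective_server_attributes({ "serverHost" => "front-1", "region" => "us-east-1" })
p effective_server_attributes(nil) # serverHost becomes this machine's hostname
```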

scalyr_server - the Scalyr server to send API requests to. This value is optional and defaults to agent.scalyr.com/

ssl_ca_bundle_path - a path on your server pointing to a valid certificate bundle. This value is optional and defaults to nil, which means it will look for a valid certificate bundle on its own.

Note: if the certificate bundle does not contain a certificate chain that verifies the Scalyr SSL certificate then all requests to Scalyr will fail unless ssl_verify_peer is set to false. If you suspect logging to Scalyr is failing due to an invalid certificate chain, you can grep through the Fluentd output for warnings that contain the message 'certificate verification failed'. The full text of such warnings will look something like this:

```
2015-04-01 08:47:05 -0400 [warn]: plugin/out_scalyr.rb:85:rescue in write: SSL certificate verification failed. Please make sure your certificate bundle is configured correctly and points to a valid file. You can configure this with the ssl_ca_bundle_path configuration option. The current value of ssl_ca_bundle_path is '/etc/ssl/certs/ca-bundle.crt'
2015-04-01 08:47:05 -0400 [warn]: plugin/out_scalyr.rb:87:rescue in write: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
2015-04-01 08:47:05 -0400 [warn]: plugin/out_scalyr.rb:88:rescue in write: Discarding buffer chunk without retrying or logging to <secondary>
```

The cURL project maintains CA certificate bundles automatically converted from mozilla.org here.

ssl_verify_peer - verify SSL certificates when sending requests to Scalyr. This value is optional, and defaults to true.

ssl_verify_depth - the depth to use when verifying certificates. This value is optional, and defaults to 5.
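When debugging certificate problems outside Fluentd, it can help to express the three SSL options as settings on Ruby's standard `Net::HTTP` client. The mapping below is an assumption for illustration: `scalyr_http_client` is a hypothetical helper, and the plugin's own HTTP client may be configured differently.

```ruby
require "net/http"
require "openssl"

# Hypothetical helper: map the plugin's SSL options onto Net::HTTP settings.
# No connection is opened until http.start is called.
def scalyr_http_client(host: "agent.scalyr.com", ssl_ca_bundle_path: nil,
                       ssl_verify_peer: true, ssl_verify_depth: 5)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  # With no ca_file, Net::HTTP falls back to OpenSSL's default certificate store.
  http.ca_file = ssl_ca_bundle_path if ssl_ca_bundle_path
  http.verify_mode = ssl_verify_peer ? OpenSSL::SSL::VERIFY_PEER : OpenSSL::SSL::VERIFY_NONE
  http.verify_depth = ssl_verify_depth
  http
end

client = scalyr_http_client(ssl_ca_bundle_path: "/etc/ssl/certs/ca-bundle.crt")
puts client.verify_depth # prints 5
```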

message_field - Scalyr expects all log events to have a 'message' field containing the contents of a log message. If your event has the log message stored in another field, you can specify the field name here, and the plugin will rename that field to 'message' before sending the data to Scalyr. Note: this will override any existing 'message' field if the log record contains both a 'message' field and the field specified by this config option.

max_request_buffer - The maximum size in bytes of each request to send to Scalyr. Defaults to 5,500,000 (5.5MB). Fluentd chunks that generate JSON requests larger than the max_request_buffer will be split into multiple separate requests. Note: The maximum size the Scalyr servers accept for this value is 6MB and requests containing data larger than this will be rejected.
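One way to picture the splitting is a greedy batcher that keeps adding events to the current request until the serialized JSON would exceed `max_request_buffer`. The sketch below is a hypothetical illustration with a made-up `{"events": [...]}` body; it is not the plugin's actual request format or splitting algorithm.

```ruby
require "json"

# Hypothetical greedy splitter: each returned string is one request body
# whose size stays within max_request_buffer bytes (unless a single event
# is already larger than the limit, in which case it is sent on its own).
def split_into_requests(events, max_request_buffer)
  return [] if events.empty?

  batches = [[]]
  events.each do |event|
    candidate = batches.last + [event]
    fits = JSON.generate("events" => candidate).bytesize <= max_request_buffer
    if fits || batches.last.empty?
      batches[-1] = candidate
    else
      batches << [event] # start a new request
    end
  end
  batches.map { |batch| JSON.generate("events" => batch) }
end

events = Array.new(100) { |i| { "message" => "x" * 50, "seq" => i } }
puts split_into_requests(events, 1000).length
```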

force_message_encoding - Set a specific encoding for all your log messages (defaults to nil). If your log messages are not in UTF-8, this can cause problems when converting the message to JSON in order to send to the Scalyr server. You can avoid these problems by setting an encoding for your log messages so they can be correctly converted.

replace_invalid_utf8 - If this value is true and force_message_encoding is set to 'UTF-8', then all invalid UTF-8 sequences in log messages will be replaced with `<?>`. Defaults to false. This flag has no effect if force_message_encoding is not set to 'UTF-8'.
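The replacement described above can be sketched with Ruby's `String#force_encoding` and `String#scrub`. `normalize_message` is a hypothetical helper, and the plugin's own implementation may differ.

```ruby
# Hypothetical helper mirroring force_message_encoding / replace_invalid_utf8.
def normalize_message(message, force_message_encoding, replace_invalid_utf8)
  return message if force_message_encoding.nil?

  msg = message.dup.force_encoding(force_message_encoding)
  # Replacement only applies when the forced encoding is UTF-8.
  msg = msg.scrub("<?>") if replace_invalid_utf8 && msg.encoding == Encoding::UTF_8
  msg
end

raw = "caf\xC3\xA9 \xFF end".b # bytes with one invalid UTF-8 byte (\xFF)
puts normalize_message(raw, "UTF-8", true) # prints café <?> end
```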

Buffer options

retry_max_times - the maximum number of times to retry a failed post request before giving up. Defaults to 40.

retry_wait - the initial time to wait before retrying a failed request. Defaults to 5 seconds. Wait times increase up to a maximum of retry_max_interval.

retry_max_interval - the maximum time to wait between retrying failed requests. Defaults to 30 seconds. Note: This is not the total maximum time of all retry waits, but rather the maximum time to wait for a single retry.
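With the defaults above, successive waits grow until they reach the 30-second cap. This sketch assumes Fluentd's default exponential backoff with base 2 and ignores the random jitter Fluentd applies by default; `retry_waits` is a hypothetical helper.

```ruby
# Hypothetical helper: wait before each retry, assuming exponential backoff
# (retry_wait * base**n) capped at retry_max_interval, with no jitter.
def retry_waits(retry_wait:, retry_max_interval:, attempts:, base: 2)
  Array.new(attempts) { |n| [retry_wait * base**n, retry_max_interval].min }
end

p retry_waits(retry_wait: 5, retry_max_interval: 30, attempts: 6) # prints [5, 10, 20, 30, 30, 30]
```

Under these assumptions, the 40 waits allowed by `retry_max_times` add up to 5 + 10 + 20 + 37 × 30 = 1,145 seconds, a little over 19 minutes, not counting the time spent on the requests themselves.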

flush_interval - how often to upload logs to Scalyr. Defaults to 5 seconds.

flush_thread_count - the number of threads to use to upload logs. This is currently fixed to 1; setting it to anything greater will cause Fluentd to fail with a ConfigError.

chunk_limit_size - the maximum amount of log data to send to Scalyr in a single request. Defaults to 2.5MB. Note: if you set this value too large, then Scalyr may reject your requests. Requests smaller than 6 MB will typically be accepted by Scalyr, but note that the 6 MB limit also includes the entire request body and all associated JSON keys and punctuation, which may be considerably larger than the raw log data. This value should be set lower than the `max_request_buffer` option.

queue_limit_length - the maximum number of chunks to buffer before dropping new log requests. Defaults to 1024. Combines with chunk_limit_size to give you the total amount of buffer to use in the event of request failures before dropping requests.
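For example, the total buffer is `chunk_limit_size` multiplied by `queue_limit_length`. This sketch uses the documented 2.5 MB chunk size and the default queue length of 1024, and assumes 1 MB = 1024 × 1024 bytes; `buffer_capacity_bytes` is a hypothetical helper.

```ruby
# Hypothetical helper: total buffer available before new log requests are dropped.
def buffer_capacity_bytes(chunk_limit_size, queue_limit_length)
  chunk_limit_size * queue_limit_length
end

chunk_limit_size = (2.5 * 1024 * 1024).to_i # 2.5 MB, assuming binary megabytes
total = buffer_capacity_bytes(chunk_limit_size, 1024)
puts total                         # prints 2684354560
puts((total / 1024.0**3).round(2)) # prints 2.5 (GB)
```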

Enable multiple workers

For high-traffic setups, the Scalyr plugin also supports Fluentd's multiple workers feature. Here is a simple example:

```
<system>
  workers 4
</system>

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type scalyr
  # Don't forget to change your token!!
  api_write_token <scalyr account api key here>
  #...
  # other options.
  #...
</match>
```

Send logs to Scalyr using multiple accounts

Besides increasing throughput with multiple workers, you can also send logs with particular tags to different Scalyr accounts. Sending logs to multiple Scalyr team accounts can be useful when a customer wants different access levels by team (for example, a person could have full access to the logs for their own team, but read-only access for other teams). Also, for high-volume customers (many TB a day), it is good to split logs into multiple accounts so that users are not constantly searching across a huge data set when they are typically only focused on a subset. This split is usually also done by organizational team.

To do that, create a separate Scalyr output (`<match>` directive) for each Scalyr account.

Here is an example of using multiple Scalyr accounts.

```
<system>
  workers 4
</system>

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# This matches any event whose tag matches myapp.scalyr1.** and sends it using the API key of Scalyr account 1.
<match myapp.scalyr1.**>
  @type scalyr
  # Don't forget to change your token!!
  api_write_token <scalyr account 1 api key here>

  #...
  # other options.
  #...
</match>

# This matches any event whose tag matches myapp.scalyr2.** and sends it using the API key of Scalyr account 2.
<match myapp.scalyr2.**>
  @type scalyr
  # Don't forget to change your token!!
  api_write_token <scalyr account 2 api key here>

  #...
  # other options.
  #...
</match>
```