Archive Logs to Amazon S3

This Solution describes how to archive a copy of your logs to Amazon's S3 storage service for long-term retention. (Note that the archived copy cannot be viewed, searched, or analyzed from within Scalyr.)

This document covers the latest version of Archive to S3, released in 2020.

Prerequisites

1. An Amazon AWS account.

2. An Amazon S3 bucket to hold your archived logs. To create a bucket: log into the AWS console, navigate to the S3 service page, and click Create Bucket.

Steps

1. Give Scalyr permission to add logs to the S3 bucket you'll be using:

  • Log into the AWS console and navigate to the S3 service page.
  • Click on the name of the bucket where archived logs should be written.
  • Click on Permissions.
  • Click Add User.
  • Enter the canonical ID for aws@scalyr.com ( c768943f39940f1a079ee0948ab692883824dcb6049cdf3d7725691bf4f31cbb ), check the Read and Write check boxes under Object Access, and click Save. (If you prefer to script this step, see the sketch below.)

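If you'd rather make this grant programmatically, the following sketch shows one way to do it with boto3 (the AWS SDK for Python). This is an illustration under stated assumptions, not part of the Scalyr product: it assumes boto3 is installed and AWS credentials are configured, and the bucket name is a placeholder. Because put_bucket_acl replaces the entire ACL, the sketch reads the existing grants first and appends to them.

import boto3

SCALYR_CANONICAL_ID = (
    "c768943f39940f1a079ee0948ab692883824dcb6049cdf3d7725691bf4f31cbb"
)
BUCKET = "log-archive.us-east-1.example.com"  # placeholder bucket name

s3 = boto3.client("s3")

# Fetch the current ACL so existing grants (including the owner's) are kept.
acl = s3.get_bucket_acl(Bucket=BUCKET)

grantee = {"Type": "CanonicalUser", "ID": SCALYR_CANONICAL_ID}
acl["Grants"].extend([
    {"Grantee": grantee, "Permission": "WRITE"},  # allows creating objects
    {"Grantee": grantee, "Permission": "READ"},   # allows listing objects
])

s3.put_bucket_acl(
    Bucket=BUCKET,
    AccessControlPolicy={"Grants": acl["Grants"], "Owner": acl["Owner"]},
)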

2. Open the /scalyr/logs configuration file.

Add a logArchiveRules entry like the following. This example archives activity from frontend servers.

logArchiveRules: [
  {
    match: "$serverHost contains 'frontend'",
    includeParsedFields: true,
    omitIfEmpty: true,
    destination: {
      type: "s3",
      bucket: "log-archive.us-east-1.example.com",
      prefix: "frontend/{yyyy}/{mm}/{dd}/{hh}"
    }
  }, {
    ...more rules...
  }
],

This defines a list of archive rules; you can define up to 20. The match expression is a Scalyr filter expression used to organize log messages into archives. Each message is written to the first destination it matches, so the order of these configuration entries matters. To create a catch-all archive that accepts all messages not matched by an earlier rule, omit the match field, as in the sketch below.
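For example, a catch-all rule placed at the end of the list might look like this (the bucket name and prefix are placeholders):

{
  // No match field: accepts every message not captured by an earlier rule.
  omitIfEmpty: true,
  destination: {
    type: "s3",
    bucket: "log-archive.us-east-1.example.com",
    prefix: "catchall/{yyyy}/{mm}/{dd}"
  }
}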

The includeParsedFields entry is optional. If true, then any attributes created by the Scalyr log parser will also be included in the output.

The omitIfEmpty entry is optional and defaults to false. If true, no S3 object is created for a batch in which the rule matched no messages.

The destination entry defines the S3 bucket and S3 object key format for the archive rule. The type is s3 for an S3 archive. The bucket is the S3 bucket name in which to store matching events for the rule.

The prefix defines the first part of the S3 object key. The second part is generated automatically, in the format <yyyy-mm-ddThh-mm-ssZ>.<part-idx>.gz, e.g. 2019-05-23T10-20-00Z.10019.gz. With the example rule above, a complete object key might look like frontend/2019/05/23/10/2019-05-23T10-20-00Z.10019.gz.

Click Update File to save the change.

3. The first batch of logs should be uploaded within a few minutes. To check, go back to the S3 service page in the AWS console and click on the bucket name. If you don't see any logs, wait a few more minutes and then click the refresh icon in the upper-right corner of the S3 console.
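If you'd rather verify from a script than from the console, a minimal boto3 sketch like the following lists the most recently written objects (the bucket and prefix are placeholders matching the example rule above):

import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(
    Bucket="log-archive.us-east-1.example.com",  # placeholder
    Prefix="frontend/",                          # matches the example rule
    MaxKeys=25,
)
for obj in resp.get("Contents", []):
    print(obj["LastModified"], obj["Key"], obj["Size"])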

Archive Format

Each file is gzip-compressed, newline-delimited JSON, with one JSON record per log message. An individual record looks like this:

{"message":"...","timestamp":"1970-01-01T00:01:00.000Z","serverHost":"xxx","serverIP":"1.2.3.4"}

The message and timestamp fields are always present, holding the raw log text and the logical timestamp of the message (in ISO 8601 format). The remaining fields, serverHost and serverIP in this example, are whichever source-level attributes were specified when the message was imported. For messages imported via the Scalyr Agent, this will include the name of the log file.

If the archive rule specifies includeParsedFields: true, then any attributes created by the Scalyr log parser will also be included in the JSON record.

If an event does not have a message field, then all fields of the event are included in the JSON record. Typically this occurs for events that are inserted as structured data via our /addEvents API, and for which no raw message was specified.
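As an illustration of the format, here is a minimal Python sketch that downloads one archived object and prints each record. It assumes boto3 is installed with credentials configured; the bucket and key are hypothetical placeholders:

import gzip
import io
import json

import boto3

s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="log-archive.us-east-1.example.com",  # placeholder
    Key="frontend/2019/05/23/10/2019-05-23T10-20-00Z.10019.gz",  # placeholder
)

# Each archive file is gzip-compressed, newline-delimited JSON.
with gzip.open(io.BytesIO(obj["Body"].read()), mode="rt") as lines:
    for line in lines:
        record = json.loads(line)
        # Raw log messages carry a message field; structured events
        # inserted via /addEvents may carry other fields instead.
        print(record.get("timestamp"), record.get("message", record))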

Organizing Logs by Day or Month

You can use the prefix setting to group logs into multiple directories, avoiding one gigantic directory in S3. For instance, if you specify {yyyy}/{mm}/{dd}, then a directory will be created for each year, month, and day. All of these directories reside in the same S3 bucket.

You can use the following substitution tokens in prefix:

Token   Replacement
{yyyy}  The four-digit year, e.g. 2019
{yy}    The two-digit year, e.g. 19
{mm}    The two-digit month, e.g. 05 for May
{dd}    The two-digit day of the month, e.g. 23 for May 23
{hh}    The two-digit hour, e.g. 10 (as used in the prefix example in step 2)
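
For instance, to group archives by month rather than by day, you might use a prefix such as:

prefix: "backend/{yyyy}/{mm}"

(The "backend" path segment here is just an illustrative placeholder.)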

If you would like to use a different format for log archives, please let us know.

Troubleshooting

Scalyr logs a brief message each time it uploads a batch of logs to S3. You can view these messages to confirm that everything is working correctly, or to troubleshoot problems. To view these messages, go to the Search page and search for tag='archive'.

If you haven't properly configured S3 permissions, you'll see an error like this:

errorMessage=Amazon S3 error: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXXXXXXX, AWS Error Code: AccessDenied, AWS Error Message: Access Denied

Review step 1 above ("Give Scalyr permission to add logs to the S3 bucket..."), and make sure that the bucket name you've specified in the configuration file is correct.

After correcting a configuration error, you will usually have to wait an hour or so until the next batch of logs is uploaded.

If you don't see any records at all with tag='archive', check your usage plan (https://app.scalyr.com/plan). S3 log archiving is not enabled for accounts on the Startup plan.

Turning off S3 Archives

To turn off S3 archives, return to the /scalyr/logs configuration file and remove the logArchiveRules entries. This will not affect existing archives in S3. You can re-enable archiving at any time by adding the logArchiveRules entries again.

Further Reading

If you would like to delete your archived logs after some period of time, you can use S3's Object Lifecycle Management feature. See the S3 documentation for details.
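As a sketch of what that can look like with boto3 (the bucket name, rule ID, prefix, and retention period below are all placeholder assumptions), the following expires archived objects one year after creation:

import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="log-archive.us-east-1.example.com",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-archived-logs",       # placeholder rule name
                "Filter": {"Prefix": "frontend/"},  # limit to archive keys
                "Status": "Enabled",
                "Expiration": {"Days": 365},        # delete after one year
            }
        ]
    },
)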