Archive Logs to Amazon S3

This Solution describes how to archive a copy of your logs to Amazon's S3 storage service for long-term storage. (Note that the archived copy cannot be viewed, searched or analyzed from within Scalyr.)

This document covers the latest version of Archive to S3, released in 2020.


Prerequisites

1. An Amazon AWS account.

2. An Amazon S3 bucket to hold your archived logs. To create a bucket: log into the AWS console, navigate to the S3 service page, and click Create Bucket.
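You can also create the bucket from the AWS CLI. A minimal sketch (the bucket name my-scalyr-archive is a placeholder; S3 bucket names are globally unique, so choose your own):

aws s3 mb s3://my-scalyr-archive --region us-east-1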


Steps

1. Give Scalyr permission to add logs to the S3 bucket you'll be using:

  • Log into the AWS console and navigate to the S3 service page.
  • Click on the name of the bucket where archived logs should be written.
  • Click on Permissions.
  • Click Add User.
  • Enter Scalyr's canonical user ID ( c768943f39940f1a079ee0948ab692883824dcb6049cdf3d7725691bf4f31cbb ), check the Object Access Read and Write permission checkboxes, and click Save.

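Alternatively, you can grant these permissions from the AWS CLI. A sketch, assuming the placeholder bucket name my-scalyr-archive; note that put-bucket-acl replaces the bucket's entire ACL, so the owner's full-control grant must be restated:

# Look up your own canonical ID so the owner's grant can be preserved.
OWNER_ID=$(aws s3api list-buckets --query Owner.ID --output text)

# Scalyr's canonical ID, from the console step above.
SCALYR_ID=c768943f39940f1a079ee0948ab692883824dcb6049cdf3d7725691bf4f31cbb

aws s3api put-bucket-acl \
  --bucket my-scalyr-archive \
  --grant-full-control "id=$OWNER_ID" \
  --grant-read "id=$SCALYR_ID" \
  --grant-write "id=$SCALYR_ID"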

2. Open the /scalyr/logs configuration file.

Add a logArchiveRules entry like the following. This example archives activity from frontend servers.

logArchiveRules: [
  {
    match: "serverHost contains 'frontend'",
    includeParsedFields: true,
    destination: {
      type: "s3",
      bucket: "",
      prefix: "frontend/{yyyy}/{mm}/{dd}/{hh}"
    }
  }, {
    ...more rules...
  }
]

This defines a list of archive rules; you can define up to 20. The match expression is a Scalyr filter expression used to route log messages into archives. Each message is written to the first destination it matches, so the order of these configuration entries matters. To create a catchall archive that accepts all messages not matched by an earlier rule, omit the match field, as in the sketch below.
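For instance, here is a sketch of a two-rule configuration that routes frontend traffic to one archive and everything else to a catchall (the bucket names are placeholders):

logArchiveRules: [
  {
    match: "serverHost contains 'frontend'",
    destination: {
      type: "s3",
      bucket: "my-frontend-archive",
      prefix: "frontend/{yyyy}/{mm}"
    }
  },
  {
    // No match field: this rule receives every message not matched above.
    destination: {
      type: "s3",
      bucket: "my-catchall-archive",
      prefix: "other/{yyyy}/{mm}"
    }
  }
]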

The includeParsedFields entry is optional. If true, then any attributes created by the Scalyr log parser will also be included in the output.

The destination entry defines the S3 bucket and S3 object key format for the archive rule. The type is s3 for an S3 archive. The bucket is the S3 bucket name in which to store matching events for the rule.

The prefix defines the first part of the S3 object key. The second part is generated automatically, in the format <yyyy-mm-ddThh-mm-ssZ>.<part-idx>.gz, e.g. 2019-05-23T10-20-00Z.10019.gz. With the prefix shown above, a complete object key might look like frontend/2019/05/23/10/2019-05-23T10-20-00Z.10019.gz.

Click Update File to save the change.

3. The first batch of logs should be uploaded within a few minutes. To check, go back to the S3 service page in the AWS console and click on the bucket name. If you don't see any logs, wait a few more minutes and then click the refresh icon in the upper-right corner of the S3 console.
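You can also check from the AWS CLI; for instance, with the placeholder bucket name used above:

aws s3 ls s3://my-scalyr-archive --recursive | head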

Archive Format

Each file is gzipped, line-delimited JSON: one JSON record per log message. An individual message looks like this:

{"message":"...","timestamp":"1970-01-01T00:01:00.000Z","serverHost":"xxx", "serverIP":""}

The message and timestamp fields are always present, holding the raw log text and the logical timestamp of the message (in ISO 8601 format). The remaining fields, serverHost and serverIP in this example, are whichever source-level attributes were specified when the message was imported. For messages imported via the Scalyr Agent, this will include the name of the log file.

If the archive rule specifies includeParsedFields: true, then any attributes created by the Scalyr log parser will also be included in the JSON record.
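For example, a parsed web access log line might produce a record like the following (the parsed attribute names status and uriPath are illustrative; the actual attributes depend on your parser):

{"message":"GET /index.html HTTP/1.1 200","timestamp":"2020-01-23T18:03:51.000Z","serverHost":"web-7","logfile":"/var/log/nginx/access.log","status":200,"uriPath":"/index.html"}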

If an event does not have a message field, then all fields of the event are included in the JSON record. Typically this occurs for events that are inserted as structured data via our /addEvents API, and for which no raw message was specified.

Organizing Logs by Day or Month

You can use the prefix setting to group logs into multiple directories, avoiding one gigantic directory in S3. For instance, if you specify {yyyy}/{mm}/{dd}, then a directory is created for each year, month, and day. All of these directories reside in the same S3 bucket.
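This also makes it easy to work with one day's logs at a time. For instance, to list a single day's archives with the AWS CLI (the bucket name is again a placeholder):

aws s3 ls s3://my-scalyr-archive/2019/05/23/ --recursive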

You can use the following substitution tokens in prefix:

Token   Replacement
{yyyy}  The four-digit year, e.g. 2019
{yy}    The two-digit year, e.g. 19
{mm}    The two-digit month, e.g. 05 for May
{dd}    The two-digit day of the month, e.g. 23 for May 23
{hh}    The two-digit hour of the day, e.g. 10 (as used in the example prefix above)

If you would like to use a different format for log archives, please let us know.


Troubleshooting

If an error occurs when Scalyr attempts to upload a batch of logs to S3, a message is written to your Scalyr account. You can view these messages to troubleshoot problems: go to the Search page and search for tag='archive'.

If you haven't properly configured S3 permissions, you'll see an error like this:

errorMessage=Amazon S3 error: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: XXXXXXXXXXXXXXXX, AWS Error Code: AccessDenied, AWS Error Message: Access Denied

Review step 1 above ("Give Scalyr permission to add logs to the S3 bucket..."), and make sure that the bucket name you've specified in the configuration file is correct.

After correcting a configuration error, you will usually have to wait an hour or so until the next batch of logs is uploaded.

If you don't see any records at all with tag='archive', check your usage plan (S3 log archiving is not enabled for accounts on the Startup plan).

Turning off S3 Archives

To turn off S3 archives, return to the /scalyr/logs configuration file and remove the logArchiveRules entries. This will not affect existing archives in S3. You can re-enable archiving at any time by adding the logArchiveRules entries again.

Using S3 Select to Retrieve Archived Data

Amazon S3 Select allows you to retrieve data from your S3 archive using SQL expressions. Only the matched data within the archived object is returned, as opposed to the entire object. S3 Select thus allows you to retrieve only the data you need.

When invoking select-object-content from the command line, you must specify the following parameters:

Parameter                      Description
--bucket <value>               The S3 bucket you are querying.
--key <value>                  The object key.
--expression <value>           A SQL expression to query the object.
--expression-type <value>      The expression type. At present, 'SQL' is the only supported value.
--input-serialization <value>  The format of the data in the object being queried. For Scalyr archives this is '{"CompressionType": "GZIP", "JSON": {"Type": "LINES"}}'.
--output-serialization <value> The format of the returned data. A value of '{"JSON": {}}' returns JSON.
<outfile>                      The file where the returned records will be saved.

You can consult the S3 Select command line documentation for additional information, including a discussion of optional parameters.

For example, let's say you have archived data to the s3-export-scalyr bucket, with a key of scalyr/2020-01-23T18-00-00Z.52.gz, and you wish to extract all log lines with:

  • A logfile equal to '/var/log/nginx/access.log' (case insensitive)
  • A serverHost value of 'web-7' or 'web-5' (case insensitive)
  • A timestamp later than 2020-01-23 18:03:50

The following command retrieves these log lines and saves them to the output.txt file. Note that timestamp is a reserved keyword in S3 Select, so it must be quoted in the expression; the LOWER() calls make the string comparisons case insensitive.

aws s3api select-object-content \
  --bucket s3-export-scalyr \
  --key scalyr/2020-01-23T18-00-00Z.52.gz \
  --expression-type SQL \
  --input-serialization '{"CompressionType": "GZIP", "JSON": {"Type": "LINES"}}' \
  --output-serialization '{"JSON": {}}' \
  --expression "SELECT * FROM S3Object s WHERE LOWER(s.serverHost) IN ('web-7','web-5') AND CAST(s.\"timestamp\" AS TIMESTAMP) > TO_TIMESTAMP('2020-01-23T18:03:50Z') AND LOWER(s.logfile) = '/var/log/nginx/access.log'" \
  output.txt

Further Reading

You can automatically remove archived logs after a specified period of time by configuring S3's Object Lifecycle Management feature.
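For instance, here is a sketch using the AWS CLI that expires archived objects after one year (the bucket name and rule ID are placeholders):

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-scalyr-archive \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-scalyr-archives",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 365}
    }]
  }'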