Increase Scalyr Agent upload throughput, using multiple Scalyr team accounts. (BETA)

Note: the use of multiple sessions to upload logs to Scalyr is currently in beta. Be sure to test this feature before deploying it widely in your production environment.

Multi-session configuration

Use multiple sessions.

In a default configuration, the Scalyr Agent creates a single session with the Scalyr servers to upload all collected logs. This typically performs well enough for most users, but at some point, the log generation rate can exceed a single session's capacity (typically 2 to 8 MB/s depending on line size and other factors). You may increase overall throughput by increasing the number of sessions using the default_sessions_per_worker option in the agent configuration. By default, the option value is 1.

The main idea of the multi-session configuration is to spread the produced load among multiple upload sessions.

For example:

{
    "default_sessions_per_worker": 3,
    "api_key": "<your_key>",
    "logs": [
      {
        "path": "/var/log/app/*.log",
      },
    ]
}

Here all the matched log files will be distributed among three independent upload sessions. Each of them will upload only a subset of the log files, thereby reducing the load on a particular session.
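As a rough sizing aid for default_sessions_per_worker, the per-session capacity quoted above can be turned into a back-of-the-envelope estimate. This is only a sketch: the function name sessions_needed and its parameters are hypothetical and not part of the agent, and the real capacity of a session depends on line size and other factors, so measure before relying on the result.

```python
import math


def sessions_needed(log_rate_mb_s: float, per_session_mb_s: float = 2.0) -> int:
    """Estimate how many upload sessions cover a given log generation rate.

    per_session_mb_s defaults to 2.0, the low end of the 2 to 8 MB/s range
    quoted above, so the estimate is deliberately conservative.
    """
    if log_rate_mb_s < 0 or per_session_mb_s <= 0:
        raise ValueError("log rate must be >= 0 and session capacity must be > 0")
    # There is always at least one session (the option's default value is 1).
    return max(1, math.ceil(log_rate_mb_s / per_session_mb_s))


# A host producing about 5 MB/s of logs, assuming 2 MB/s per session.
estimate = sessions_needed(5.0)
```

With the conservative default, a host producing about 5 MB/s would need three sessions, which matches the value used in the example configuration.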

Use multiprocess workers.

Even when multiple sessions are created, by default they all run within the same Python process, which limits them to a single CPU core. This can become a problem when there are too many log files for a single CPU core to handle, especially if additional features (such as sampling and redaction rules) are enabled.

This limitation can be addressed using the use_multiprocess_workers option in the agent configuration.

A worker is responsible for uploading a subset of the log files using one or more sessions. To learn more about workers, please see the Multiple workers section. If use_multiprocess_workers is false (the default value), the worker executes all of its sessions in a single process. Setting it to true changes that:

{
    "use_multiprocess_workers": true,
    "default_sessions_per_worker": 3,
    "api_key": "<your_key>",
    "logs": [
      {
        "path": "/var/log/app/*.log",
      },
    ]
}

Now the worker runs each session in its own process, so sessions no longer share a single CPU core.

Multiple workers.

As mentioned above, a worker is responsible for uploading a subset of the log files. By default, the agent uses a single worker, the default worker, which is responsible for uploading all log files. However, you can increase the number of workers and control which log files each one uploads.

Each worker can be configured to use its own API key, so all sessions within the worker upload logs using that API key. The default worker uses the API key defined in the api_key field of the configuration.

For example, the multiple workers feature is useful when it is necessary to upload particular logs to different Scalyr team accounts, both to split the logs logically between teams and to improve access control.

Adding new workers.

Each additional worker must be defined as an entry in the workers list.

"workers": [
     {
         "api_key": "<your_second_key>",
         "id": "frontend_team",
     },
     {
         "api_key": "<your_third_key>",
         "id": "backend_team",
     }
]

Possible options for each element of the workers list:

Field     Required  Description
api_key   yes       The API key token.
id        yes       The identifier of the worker. Must be unique. String.
sessions  no        The number of sessions for the worker. By default it has the same value as the default_sessions_per_worker option.

You may also split the definition of the workers field across multiple configuration files in the agent.d directory. The agent will join together all entries in all workers fields defined.
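The joining behaviour can be pictured with a small sketch. It assumes each configuration file has already been parsed into a dictionary and that the lists are simply concatenated in the order the files are given; the function name join_workers is hypothetical, and the agent's actual file ordering and handling of duplicate entries are not described here.

```python
from typing import Any, Dict, List


def join_workers(configs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Concatenate the "workers" lists of several parsed configuration files."""
    joined: List[Dict[str, Any]] = []
    for config in configs:
        # A file without a "workers" field contributes nothing.
        joined.extend(config.get("workers", []))
    return joined


# The main configuration file plus one fragment from the agent.d directory.
main_config = {
    "api_key": "<main_key>",
    "workers": [{"api_key": "<your_second_key>", "id": "frontend_team"}],
}
fragment = {
    "workers": [{"api_key": "<your_third_key>", "id": "backend_team"}],
}

all_workers = join_workers([main_config, fragment])
```

Here all_workers holds both the "frontend_team" and "backend_team" entries, as if they had been written in a single workers list.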

NOTE: The default worker, which is created automatically using the api_key, also has its own implicit identifier - "default".

The identifier "default" is reserved for the default worker and cannot be used by other workers. For example, the following configuration is invalid:

{
    "api_key": "<main_key>",
    "workers": [
         {
             "api_key": "<second_key>",
             "id": "default",  // WRONG: the default worker, which uses the <main_key> API key, already has the identifier "default".
         },
    ]
}

As an exception, an entry with the "default" identifier and no api_key of its own can be used to override the settings of the default worker:

{
    "api_key": "<main_key>",
    "workers": [
         {
             "id": "default",
             "sessions": 3 // change the number of sessions from the default (1) to 3.
         },
         {
             "api_key": "<second_key>",
             "id": "second",
         },
    ]
}
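The identifier rules above can be checked before deploying a configuration. The following is a minimal sketch, not the agent's own validation: check_workers is a hypothetical helper, it expects the configuration already parsed into a dictionary, and it assumes (based only on the two examples above) that a "default" entry is invalid when it carries an API key different from the top-level api_key.

```python
from typing import Any, Dict, List


def check_workers(config: Dict[str, Any]) -> List[str]:
    """Return the problems found in the "workers" list (empty if none)."""
    problems: List[str] = []
    seen = set()
    for worker in config.get("workers", []):
        worker_id = worker.get("id")
        if worker_id is None:
            problems.append("worker entry without an id")
            continue
        if worker_id in seen:
            problems.append(f"duplicate worker id: {worker_id}")
        seen.add(worker_id)
        if worker_id == "default":
            # Assumption: the override entry only tunes the default worker,
            # so it must not bring a different API key.
            if worker.get("api_key", config.get("api_key")) != config.get("api_key"):
                problems.append('id "default" is reserved for the default worker')
        elif "api_key" not in worker:
            problems.append(f"worker {worker_id} has no api_key")
    return problems


invalid = {
    "api_key": "<main_key>",
    "workers": [{"api_key": "<second_key>", "id": "default"}],
}
valid = {
    "api_key": "<main_key>",
    "workers": [
        {"id": "default", "sessions": 3},
        {"api_key": "<second_key>", "id": "second"},
    ],
}

invalid_problems = check_workers(invalid)
valid_problems = check_workers(valid)
```

The first configuration is reported as a misuse of the reserved identifier, while the second, which only overrides the session count of the default worker, passes without problems.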

Associating logs with workers.

Each element of the logs section can be associated with a particular worker by specifying the worker_id field.

NOTE: If the worker_id is omitted, then the default worker is used.

"logs": [
    {
       "path": "/var/log/frontend/*.log",
       "worker_id": "frontend_team" // the worker with the identifier "frontend_team" is used.
    },
    {
       "path": "/var/log/backend/*.log"
       // the "worker_id" is omitted, the worker with "default" identifier is used.
    }
]
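To see which worker each logs entry ends up with, the rule can be sketched as a short function. This is an illustration under stated assumptions rather than the agent's implementation: assign_logs is a hypothetical name, the configuration is assumed to be parsed into a dictionary already, and rejecting an unknown worker_id with an error is a choice made for this sketch.

```python
from typing import Any, Dict, List


def assign_logs(config: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group log paths by the id of the worker that would upload them."""
    # The default worker always exists, even if it is not listed explicitly.
    known_ids = {"default"} | {w["id"] for w in config.get("workers", [])}
    assignment: Dict[str, List[str]] = {}
    for entry in config.get("logs", []):
        # An omitted worker_id means the default worker is used.
        worker_id = entry.get("worker_id", "default")
        if worker_id not in known_ids:
            raise ValueError(
                f"log {entry['path']} refers to unknown worker {worker_id}"
            )
        assignment.setdefault(worker_id, []).append(entry["path"])
    return assignment


config = {
    "api_key": "<main_key>",
    "workers": [
        {"api_key": "<your_second_key>", "id": "frontend_team"},
        {"api_key": "<your_third_key>", "id": "backend_team"},
    ],
    "logs": [
        {"path": "/var/log/frontend/*.log", "worker_id": "frontend_team"},
        {"path": "/var/log/backend/*.log"},
    ],
}

assignment = assign_logs(config)
```

In this sketch the frontend logs are grouped under "frontend_team" and the backend logs, which omit worker_id, fall to "default". A worker_id that matches no defined worker raises an error, which is a cheap way to catch a mistyped identifier.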

Full configuration example.

Suppose there are two teams, "messaging" and "queue", and the agent should upload some logs using the API key of the "messaging" team and others using the API key of the "queue" team.

The resulting configuration:

{
    "api_key": "<messaging_team_api_key>", // the default worker uploads with the API key of the "messaging" team.
    "workers": [
        {
            // Because the id is "default", no new worker is created; this entry changes the settings of the default worker.
            "id": "default",
            "sessions": 4 // increase the number of sessions from 1 to 4.
        },
        {
            // Create a new worker for the "queue" team, which uses another API key.
            "api_key": "<queue_team_api_key>",
            "id": "queue",
            "sessions": 2 // set the number of sessions for the API key of the "queue" team.
        }
    ],
    "logs": [
      {
        "path": "/var/log/app/messaging/*.log", // these logs are uploaded to the "messaging" team's account by the default worker.
      },
      {
        "path": "/var/log/app/queue/*.log", // these logs are uploaded to the "queue" team's account.
        "worker_id": "queue" // refers to the worker with the id "queue", which uses <queue_team_api_key>.
      }
    ]
}