Built-In Parsers

Scalyr includes a suite of standard log parsers. If your log matches one of the following formats, use the corresponding parser name when Uploading Logs, and your logs will be parsed automatically.

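For example, if you use the Scalyr Agent, you can assign a parser to each log file in the agent configuration. A minimal sketch (the log path here is hypothetical):

  logs: [
    {
      path: "/var/log/httpd/access.log",
      attributes: {parser: "accessLog"}
    }
  ]
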
You can view the built-in configuration files below: click a parser name in the table, or scroll down the page. For more configuration files, consult our Pre-Built Parsers.

Please note that regex elements must be double-escaped almost everywhere at Scalyr, including when specifying a parser: for example, a literal "[" in a format string is written \\[, as in the configurations below. See Regex for more information.

Parser                  Log format
accessLog               Standard web access logs ("extended" Apache format)
mysqlGeneralQueryLog    MySQL "general" query log
mysqlSlowQueryLog       MySQL slow query log
postgresLog             Postgres log
json or dottedJson      Logs where each line is a JSON object
keyValue                Logs containing fields in the form key=value or key="value"
systemLog               Supports several common system or error log formats
leveldbLog              LevelDB LOG files

Standard Web Access Logs ("Extended" Apache Format)

// "accessLog" built-in parser for standard web access logs ("extended" Apache format)

{
  attributes: {
    // Tag all events parsed with this parser so we can easily select them in queries.
    dataset: "accesslog"
  },

  formats: [
    // Extended format including referrer, user-agent, and response time.
    {
      format: "$ip$ $user$ $authUser$ \\[$timestamp$\\] \"$method$ $uri{parse=uri}$ $protocol$\" $status$ $bytes$ $referrer=quotable$ $agent=quotable$ $time=number$",
      halt: true
    },

    // Format including referrer and user-agent (but no response time)
    {
      format: "$ip$ $user$ $authUser$ \\[$timestamp$\\] \"$method$ $uri{parse=uri}$ $protocol$\" $status$ $bytes$ $referrer=quotable$ $agent=quotable$",
      halt: true
    },

    // Including referrer and user-agent, but with no separate method, uri, and protocol. Sometimes
    // observed for invalid or incomplete requests.
    {
      format: "$ip$ $user$ $authUser$ \\[$timestamp$\\] \"$header$\" $status$ $bytes$ $referrer=quotable$ $agent=quotable$",
      halt: true
    },

    // Basic format with no referrer or user-agent
    {
      format: "$ip$ $user$ $authUser$ \\[$timestamp$\\] \"$method$ $uri{parse=uri}$ $protocol$\" $status$ $bytes$",
      halt: true
    },

    // Basic format, with no separate method, uri, and protocol.
    {
      format: "$ip$ $user$ $authUser$ \\[$timestamp$\\] \"$header$\" $status$ $bytes$",
      halt: true
    }
  ]
}
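
As an illustration, here is a sample log line in the second format above (referrer and user-agent, but no response time), together with the fields the parser would extract; the values are hypothetical:

  127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08"

  => ip="127.0.0.1", user="-", authUser="frank", method="GET", uri="/apache_pb.gif" (further broken down by {parse=uri}), protocol="HTTP/1.0", status=200, bytes=2326, referrer="http://www.example.com/start.html", agent="Mozilla/4.08"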

MySQL "General" Query Logs

// "mysqlGeneralQueryLog" built-in parser for MySQL "general" query logs.
// See http://dba.stackexchange.com/questions/42532/format-of-mysql-query-log
// for a brief discussion of the log format.

{
  attributes: {
    // Tag all events parsed with this parser so we can easily select them in queries.
    dataset: "mysqlGeneralQueryLog"
  },

  // MySQL takes an unusual approach to timestamps in the log. Timestamps appear only when time advances to
  // the next full second. Thus, many log entries don't contain a timestamp. This flag tells the parser to
  // apply each timestamp to all subsequent log entries (until the next timestamp).
  intermittentTimestamps: true,

  patterns: {
    mySqlTimestamp: "[0-9]+ [0-9:]+"
  },

  formats: [
    // Format for a line which contains a timestamp.
    "$timestamp=mySqlTimestamp$ $thread_id=number$ $command$ $body{parse=sqlToSignature}$",

    // Format for a line which does not contain a timestamp.
    " $thread_id=number$ $command$ $body{parse=sqlToSignature}$"
  ]
}
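
To illustrate intermittentTimestamps, here are two hypothetical entries (whitespace in real logs varies); only the first line carries a timestamp, which the parser also applies to the second:

  140410 13:55:36 3 Connect appuser@localhost on orders
                  3 Query SELECT * FROM orders WHERE id = 17

The first format extracts timestamp, thread_id=3, and command="Connect"; the second format handles the timestamp-less line, and {parse=sqlToSignature} normalizes the query text so that similar queries can be grouped.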

MySQL Slow Query Logs

// "mysqlSlowQueryLog" built-in parser, for MySQL slow query logs.

{
  attributes: {
    // Tag all events parsed with this parser so we can easily select them in queries.
    dataset: "mysqlSlowQueryLog"
  },

  // MySQL takes an unusual approach to timestamps in the log. Timestamps appear on a line of their own,
  // and only when time advances to the next full second. Thus, some log entries don't contain a timestamp.
  // This flag tells the parser to apply each timestamp to all subsequent log entries (until the next timestamp).
  intermittentTimestamps: true,

  // For each query, MySQL places the user/host information, execution statistics, and query text on separate lines.
  // Here, we group them back together. A group starts with a line beginning with "# User", and continues until the
  // next such line or a timestamp line.
  lineGroupers: [
    {
      start: "^# User",
      haltBefore: "^# (User|Time)"
    }
  ],

  formats: [
    // Timestamps appear on a line by themselves.
    "# Time: $timestamp$",

    // Parse the user / host line.
    "# User@Host: $user$\\[$user2$\\] @ $host$ ",

    // Parse the execution statistics line. Note that we begin the format with ".*" to skip over the user/host data,
    // which is logically treated as part of the same log message.
    ".*# Query_time: $queryTime$ Lock_time: $lock_time$ Rows_sent: $rowsSent$ Rows_examined: $rowsExamined=digits$",

    // Parse the query text, which is everything after the Rows_examined statistic.
    ".*Rows_examined: [0-9]+[\\s]*$query$",

    // Parse the query text again, this time applying a normalizing filter to extract a signature. This allows similar
    // queries (differing only in constant values) to be grouped for analysis.
    ".*Rows_examined: [0-9]+[\\s]*$signature{parse=sqlToSignature}$"
  ]
}
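
A hypothetical slow query log entry, as the parser sees it: the "# Time" line stands alone, while the lineGroupers section joins the lines from "# User@Host" through the query text into a single event:

  # Time: 140410 13:55:36
  # User@Host: appuser[appuser] @ localhost []
  # Query_time: 6.399563 Lock_time: 0.000154 Rows_sent: 1 Rows_examined: 40320
  SELECT * FROM orders WHERE customer_id = 17;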

Postgres Logs

//"postgresLog" built-in parser for Postgres logs.

{
  lineGroupers: [
    {
      // Group between timestamps; captures SQL queries.
      start: "^[\\d-]+\\s+[\\d:]+\\s+UTC",
      haltBefore: "^[\\d-]+\\s+[\\d:]+\\s+UTC"
    },
    {
      // Groups the lines used to display stats.
      start: "^[^\\s]",
      continueThrough: "^[\\s]+!"
    }
  ],

  patterns: {
    // Timestamp pattern as logged: "2014-04-10 02:14:35 UTC". Contact support@scalyr.com if you want to use other timezones.
    ts_pattern: "\\d+-\\d+-\\d+\\s+\\d+:\\d+:\\d+ UTC",

    // The default prefix used by RDS is "%t:%r:%u@%d:[%p]:".
    // This pattern is not referenced by the formats below; it is included for reference only.
    prefix_pattern: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:",

    stats_pattern: "!\\s+system usage stats:\\s+!.+$"
  },
  formats: [
    {
      id: "statement",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:STATEMENT: $query{parse=sqlToSignature}$",
      halt: true
    },
    {
      id: "statement_log",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+statement: $query{parse=sqlToSignature}$",
      halt: true
    },
    {
      // enabled via log_duration or log_min_duration_statement
      // we correlate on the process_id since session id cannot be logged as of this writing
      // 2014-04-10 02:14:35 UTC:hostname(34025):user@database:[32414]:LOG:  duration: 6.399 ms
      id: "duration",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+duration: $duration_ms$ ms",
      halt: true
    },
    {
      // enabled via log_connections
      id: "connection_received",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+connection received: host=$connection_host$ port=$connection_port$",
      halt: true
    },
    {
      // enabled via log_connections
      id: "connection_authorized",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+connection authorized: user=$connection_user$ database=$connection_database$",
      halt: true
    },
    {
      // enabled via log_disconnections
      // session_time is formatted as H:mm:ss.sss, for example 0:06:36.161
      id: "disconnection",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+disconnection: session time:$session_time$ user=$connection_user$ database=$connection_database$ host=$connection_host$ port=$connection_port$",
      halt: true
    },
    {
      // enabled via log_parser_stats, log_planner_stats, log_statement_stats
      // this log line will appear before the stats details (below)
      id: "statistics_start",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+$stats_type$ STATISTICS",
      association: {tag: "statistics", keys: ["process_id"], store: ["stats_type"]}
    },
    {
      // statistics details - see above
      id: "statistics_end",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+$stats=stats_pattern$",
      association: {tag: "statistics", keys: ["process_id"], fetch: ["stats_type"]},
      halt: true
    },
    {
      // all the session-related logs which haven't been matched above
      id: "session",
      format: "$timestamp=ts_pattern$:$remote_hostname$:$user$@$db_name$:\\[$process_id$\\]:$severity$:\\s+$details$",
      halt: true
    },
    {
      // maintenance operations such as checkpoints; typically:
      // 2014-04-10 01:32:38 UTC::@:[23929]:LOG:  checkpoint starting: time
      id: "default",
      format: "$timestamp=ts_pattern$::@:\\[$process_id$\\]:$severity$: $details$"
    }
  ]
}
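
The formats above assume the RDS-style line prefix shown in prefix_pattern. If you administer Postgres yourself, a minimal sketch for producing a matching prefix is to set the following in postgresql.conf (log_timezone ensures timestamps match ts_pattern):

  # %t=timestamp, %r=remote host and port, %u=user, %d=database, %p=process ID
  log_line_prefix = '%t:%r:%u@%d:[%p]:'
  log_timezone = 'UTC'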

JSON

// "json" built-in parser for logs where each line is a JSON object.

{
  attributes: {
    // Tag all events parsed with this parser so we can easily select them in queries.
    dataset: "json"
  },

  formats: [
    {format: "${parse=json}$", repeat: true}
  ]
}
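
For instance, a hypothetical input line of {"status": 200, "path": "/home"} would produce the fields status=200 and path="/home" (plus dataset="json" from the attributes section).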

The dottedJson parser is similar, but uses {parse=dottedJson}. This affects the treatment of nested records; json uses camelCase while dottedJson uses dotted names. For instance, if the input contains {"foo": {"bar": 123}}, the json parser will create a field named fooBar, while dottedJson will use foo.bar.

Simple Key/Value Pairs

// "keyValue" built-in parser for logs containing fields in the form key=value or key="value"

{
  attributes: {
    // Tag all events parsed with this parser so we can easily select them in queries.
    dataset: "keyValue"
  },

  formats: [
    // Look for an optional timestamp at the beginning of the line, enclosed in square brackets.
    "\\[$timestamp$\\]",

    // Look for entries of the form key=value. This pattern can match multiple key/value pairs per line. The
    // value can optionally be enclosed in double-quotes; if it is not quoted, then it runs until the next whitespace
    // character (or the end of the line).
    {format: ".*$_=identifier$=$_=quoteOrSpace$", repeat: true}
  ]
}
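
A hypothetical line this parser handles, and the resulting fields:

  [2014-04-10 02:14:35] user=alice action="log in" status=ok

  => timestamp from the bracketed prefix, plus user="alice", action="log in", status="ok"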

Standard System Logs

// "systemLog" built-in parser. Supports several common system or error log formats.

{
  attributes: {
    // Tag all events parsed with this parser so we can easily select them in queries.
    dataset: "systemlog"
  },

  patterns: {
    timestamp: "([a-z]+\\s+[0-9]+\\s+[0-9:]+)|"    // e.g. Feb  3 03:47:01
             + "(\\d{4}-\\d{2}-\\d+T\\d{2}:\\d{2}:\\d{2}.\\d+\\+\\d+:\\d{2})"  // e.g. 2013-03-20T00:53:26.942569+00:00
  },

  formats: [
    // Process name plus ID. Examples:
    // 2013-03-19T12:25:16.267245+00:00 ip-10-11-222-111 auditd[14957]: Audit daemon rotating log files
    // Feb  3 13:17:00 host-1 dhclient[1576]: DHCPREQUEST on eth0 to 169.108.1.0 port 67 (xid=0x323f0123)
    {
      format: "$timestamp=timestamp$ $host$ $process$\\[$procid$\\]: $text$",
      halt: true
    },

    // Process name with no ID. Examples:
    // Feb  3 03:47:01 host-1 rsyslogd: [origin software="rsyslogd" swVersion="5.8.03" x-pid="1631" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
    // Mar 17 04:34:12 li58-102 dhclient: DHCPREQUEST on eth0 to 206.192.11.29 port 67
    {
      format: "$timestamp=timestamp$ $host$ $process$: $text$",
      halt: true
    }
  ]
}
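
Taking the first example line above, the parser would extract host="ip-10-11-222-111", process="auditd", procid=14957, and text="Audit daemon rotating log files".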

LevelDB LOG Files

// "leveldbLog" built-in parser for LevelDB LOG files.
//
// See db/db_impl.cc and db/version_set.cc in the LevelDB source for much of the logging code whose output we parse here.
{
  attributes: {
    // Tag all events parsed with this parser so we can easily select them in queries.
    dataset: "leveldb"
  },

  formats: [
    // Extract the timestamp and session identifier from each line. Place the remaining text in a field named "details".
    "$timestamp$ $session$ $details$",

    // Parse "Generated table" messages — useful for graphing # of keys, bytes.
    ".*Generated table [^:]+: $newTableKeys$ keys, $newTableBytes$ bytes",

    // Parse "compacting" messages. We put the number of input files into a variable named
    // compact0Files, compact1Files, etc. depending on which level it came from. We do this
    // so that we can then generate graphs of compactions at each level.
    ".*Compacting $compact0Files=digits$@0",
    ".*Compacting $compact1Files=digits$@1",
    ".*Compacting $compact2Files=digits$@2",
    ".*Compacting $compact3Files=digits$@3",
    ".*Compacting $compact4Files=digits$@4",
    ".*Compacting $compact5Files=digits$@5",
    ".*Compacting $compact6Files=digits$@6",
    ".*Compacting $compact7Files=digits$@7",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact0Files$@0 files",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact1Files$@1 files",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact2Files$@2 files",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact3Files$@3 files",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact4Files$@4 files",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact5Files$@5 files",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact6Files$@6 files",
    ".*Compacting [0-9]+@[0-9]+ \\+ $compact7Files$@7 files",

    // Parse "compacted" messages. These contain the number of output bytes, useful for graphing
    // compaction rates by level.
    { format: ".*Compacted $compactFiles1$@$level1$ \\+ $compactFiles2$@$level2$ files => $compactionSizeBytes$ bytes",
      attributes: { tag: "compactionResult"}
    },

    // Parse "expanding" messages.
    { format: ".*Expanding@$level$ $input1$\\+$input2$ \\($input1Size$\\+$input2Size$ bytes\\) "
            + "to $output1$\\+$output2$ \\($output1Size$\\+$output2Size$ bytes\\)",
      attributes: { tag: "compactionResult"}
    }
  ]
}
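
As an illustration, a hypothetical "Generated table" line and the fields the patterns above would extract:

  2014/04/10-02:14:35.123456 7f8a9c012345 Generated table #12: 4721 keys, 1048576 bytes

  => session="7f8a9c012345", details="Generated table #12: ...", newTableKeys=4721, newTableBytes=1048576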