Regex

Scalyr uses regular expressions to match and extract patterns from your log data. Regular expressions compatible with the traditional Java and Python libraries are supported, although some operations are unsupported (more details below). For more information, please review the following tips and best practices.

Conventions

  • Scalyr uses java.util.regex as its regex library.
  • Scalyr uses the 're' Python library.
  • Scalyr regex is case insensitive, so [A-Z] matches the same characters as [a-z].
  • Group naming is not supported.
  • In Scalyr Agent redaction rules, \\1 and \\2 are the group backreference syntax. More information can be found here: app.scalyr.com/help/scalyr-agent#redaction.
  • In parsers, $1 and $2 are the group backreference syntax. The $0 group is also supported by rewrite rules.
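
The backreference and case-insensitivity conventions above can be tried out locally with Python's re module. This is only a sketch, not Scalyr configuration: the log line, the pattern, and the helper name redact are made up for illustration, and re.IGNORECASE is passed explicitly to imitate case-insensitive matching.

```python
import re

# Hypothetical log line and pattern, used only for illustration.
LINE = "user=Alice token=ABC123"
PATTERN = r"(token=)([a-z0-9]+)"

def redact(line: str) -> str:
    """Keep group 1 and mask group 2 via the \\1 backreference."""
    # re.IGNORECASE imitates case-insensitive matching, so [a-z] also covers "ABC".
    return re.sub(PATTERN, r"\1****", line, flags=re.IGNORECASE)

print(redact(LINE))  # user=Alice token=****
```

In a parser rewrite rule the same idea would be written with $1 instead of \1, as described in the list above.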

Testing Parsers

You can test parsers with free text here:

You can also test parsers using your live data. Although this page looks similar to the "free text" parser above, it allows you to debug the workings of your format statements. Access this editor by:

  • Clicking Account > Parsers and choosing the "Edit" or "Create" buttons, or
  • Selecting a log line in Search view and clicking "Inspect Fields", then "Edit Parser".
  • Video Explanation: https://youtu.be/uNOiu8CVnJU?t=161.

Lookaheads / Lookbehinds / Lookarounds

We restrict certain functions. See list below:

Searches

  • When performing searches, $"regex" searches the message field. For example, $"tomcat" would match all log lines with "tomcat" in the message field. If you are not yet familiar with it, the message field contains the original log line in its entirety.
  • This is shorthand for message matches "regex".
  • Double escaping is required everywhere except in the $"regex" syntax in Search and PowerQueries, and when entering matches for Log Processing from your Account.
  • In the shorthand format, the regex does not need to be escaped. For example, $"\d+\.\d+\.\d+\.\d+"
  • In the full syntax, the regex needs to be escaped. For example, $message matches "\\d+\\.\\d+\\.\\d+\\.\\d+"
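
One way to picture double escaping is as an extra layer of string unescaping that runs before the regex engine sees the pattern. The Python sketch below assumes that this layer simply turns each doubled backslash into a single one. The helper unescape_once is hypothetical and is not how Scalyr actually parses queries.

```python
import re

def unescape_once(text: str) -> str:
    """Collapse each doubled backslash into a single one (assumed behaviour)."""
    return text.replace("\\\\", "\\")

# Raw strings keep the backslashes exactly as they would be typed in a query.
full_syntax = r"\\d+\\.\\d+\\.\\d+\\.\\d+"  # as typed in the full syntax
shorthand = r"\d+\.\d+\.\d+\.\d+"            # as typed in the shorthand

print(unescape_once(full_syntax) == shorthand)  # True
print(re.search(shorthand, "client 10.0.0.12 connected").group())  # 10.0.0.12
```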

Characters

Character  Legend                                                                      Example          Sample Match
\d         One digit from 0 to 9                                                       log_\\d\\d       log_25
\w         "word character": ASCII letter, digit or underscore                         \\w-\\w\\w\\w    A-b_1
\s         "whitespace character": space, tab, newline, carriage return, vertical tab  a\\sb\\sc        a b c
\D         One character that is not a digit as defined by \\d                         \\D\\D\\D        ABC
\W         One character that is not a word character as defined by \\w                \\W\\W\\W\\W\\W  *-+=)
\S         One character that is not a whitespace character as defined by \\s          \\S\\S\\S\\S     Yoyo
\u\X       Match specific Unicode characters or ranges. See chart                      .*[\u00A8]       ‰pò†…¨2020-08-28T20:28:34.343-0500
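
The shorthand classes can be checked quickly with Python's re module. The pairs below are adapted from the table and written with single escaping, because Python raw strings pass the backslash straight to the regex engine. The helper all_match is a name chosen for this sketch.

```python
import re

# (pattern, sample) pairs adapted from the table above.
CHARACTER_EXAMPLES = [
    (r"log_\d\d", "log_25"),
    (r"\w-\w\w\w", "A-b_1"),
    (r"a\sb\sc", "a b c"),
    (r"\D\D\D", "ABC"),
    (r"\W\W\W\W\W", "*-+=)"),
    (r"\S\S\S\S", "Yoyo"),
]

def all_match(examples) -> bool:
    """Return True if every pattern matches its whole sample."""
    return all(re.fullmatch(p, s) is not None for p, s in examples)

print(all_match(CHARACTER_EXAMPLES))  # True
```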

Quantifiers

Quantifier  Legend               Example           Sample Match
+           One or more          Version \\w-\\w+  Version A-b1_1
{3}         Exactly three times  \\D{3}            ABC
{2,4}       Two to four times    \\d{2,4}          156
{3,}        Three or more times  \\w{3,}           regex_tutorial
*           Zero or more times   A*B*C*            AAACC
?           Once or none         plurals?          plural
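
To see how much text each quantifier consumes, it can help to search inside a longer string. This is a small Python sketch. The helper first_match and the sample strings are illustrative assumptions.

```python
import re

def first_match(pattern: str, text: str):
    """Return the first substring matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match(r"\d{2,4}", "id 1234567"))       # 1234 (stops at four digits)
print(first_match(r"\d{2,4}", "id 1"))             # None (needs at least two digits)
print(first_match(r"\w{3,}", "a regex_tutorial"))  # regex_tutorial
print(first_match(r"plurals?", "one plural"))      # plural
print(first_match(r"A*B*C*", "AAACC"))             # AAACC
```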

More Characters

Character  Legend                                                    Example                   Sample Match
.          Any character except line break                           a.c                       abc
.          Any character except line break                           .*                        whatever, man.
\.         A period (special character: needs to be escaped by a \)  a\.c                      a.c
\          Escapes a special character                               \\.\\*\\+\\? \\$\\^\\\\/  .*+? $^\/
\          Escapes a special character                               \\[\\{\\(\\)\\}\\]        [{()}]
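
The difference between a bare dot and an escaped dot is easy to confirm in Python. The sketch below uses single escaping (Python raw strings) and the standard re.escape function. The helper name matches is an assumption of this example.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"a.c", "abc"))   # True: the dot matches any character
print(matches(r"a\.c", "abc"))  # False: the escaped dot only matches "."
print(matches(r"a\.c", "a.c"))  # True

# re.escape adds the backslashes when a text should be matched literally.
literal = "[{()}]"
print(matches(re.escape(literal), literal))  # True
```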

Logic

Logic                  Legend                     Example                                     Sample Match
|                      Alternation / OR operator  22|33                                       33
( … )                  Capturing group            A(nt|pple)                                  Apple (captures "pple")
agent: \1, parser: $1  Contents of Group 1        parser: r(\\w)g$1x, agent: r(\\w)g\\1x      regex
agent: \2, parser: $2  Contents of Group 2        parser: (r)(\\w)g$2x, agent: (r)(\\w)g\\2x  regex
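
Alternation, capturing groups, and in-pattern backreferences can be explored with Python's re module, which uses the \1 and \2 form. This is a sketch only: the helper full and the arithmetic sample string are chosen for illustration.

```python
import re

def full(pattern: str, text: str):
    """Return the match object if pattern matches all of text, else None."""
    return re.fullmatch(pattern, text)

print(full(r"22|33", "33") is not None)         # True: either branch may match
print(full(r"A(nt|pple)", "Apple").group(1))    # pple
# \1 must repeat exactly what group 1 captured.
print(full(r"r(\w)g\1x", "regex") is not None)  # True: group 1 is "e"
print(full(r"r(\w)g\1x", "ragex") is not None)  # False: "a" is not repeated
# \2 and \1 refer to the second and first groups respectively.
print(full(r"(\d\d)\+(\d\d)=\2\+\1", "12+65=65+12") is not None)  # True
```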

More White-Space

Character  Legend    Example             Sample Match
\n         New line  stack trace\ntrace  "stack trace", then "trace" on the next line

More Quantifiers

Quantifier  Legend                            Example   Sample Match
+           The + (one or more) is "greedy"   \d+       12345
?           Makes quantifiers "lazy"          \d+?      1 in 12345
*           The * (zero or more) is "greedy"  A*        AAA
?           Makes quantifiers "lazy"          A*?       empty in AAA
{2,4}       Two to four times, "greedy"       \w{2,4}   abcd
?           Makes quantifiers "lazy"          \w{2,4}?  ab in abcd
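
Greedy and lazy matching are easiest to compare side by side. The Python sketch below reruns the table's examples and adds one practical case. The helper first and the key="value" line are illustrative assumptions.

```python
import re

def first(pattern: str, text: str) -> str:
    """Return the first match of pattern in text (assumes there is one)."""
    return re.search(pattern, text).group()

print(repr(first(r"\d+", "12345")))       # '12345'
print(repr(first(r"\d+?", "12345")))      # '1'
print(repr(first(r"A*", "AAA")))          # 'AAA'
print(repr(first(r"A*?", "AAA")))         # ''
print(repr(first(r"\w{2,4}", "abcd")))    # 'abcd'
print(repr(first(r"\w{2,4}?", "abcd")))   # 'ab'

# A practical case: pulling the first quoted value out of a line.
line = 'a="x" b="y"'
print(first(r'".+"', line))    # "x" b="y"  (greedy runs to the last quote)
print(first(r'".+?"', line))   # "x"        (lazy stops at the first closing quote)
```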

Character Classes

Character  Legend                                              Example     Sample Match
[ … ]      One of the characters in the brackets               [AEIOU]     One uppercase vowel
[ … ]      One of the characters in the brackets               T[ao]p      Tap or Top
-          Range indicator                                     [a-z]       One lowercase letter
[x-y]      One of the characters in the range from x to y      [A-Z]+      GREAT
[ … ]      One of the characters in the brackets               [AB1-5w-z]  One of: A, B, 1, 2, 3, 4, 5, w, x, y, z
[x-y]      One of the characters in the range from x to y      [ -~]+      Characters in the printable section of the ASCII table
[^x]       One character that is not x                         [^a-z]{3}   A1!
[^x-y]     One of the characters not in the range from x to y  [^ -~]+     Characters that are not in the printable section of the ASCII table
[\d\D]     One character that is a digit or a non-digit        [\\d\\D]+   Any characters, including new lines, which the regular dot doesn't match
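
A few of these classes are shown below in Python, including the [\d\D] trick for matching across line breaks. The sample strings are made up for this sketch, and matching here is case sensitive because no flag is passed.

```python
import re

print(re.findall(r"T[ao]p", "Tap Tip Top"))    # ['Tap', 'Top']
print(re.findall(r"[AB1-5w-z]", "A C 3 9 x"))  # ['A', '3', 'x']

# [^ -~] is everything outside printable ASCII (space through tilde).
non_ascii = re.findall(r"[^ -~]+", "caf\u00e9 \u2713 ok")
print(non_ascii == ["\u00e9", "\u2713"])       # True

text = "line one\nline two"
dot_match = re.match(r".+", text).group()      # stops at the line break
any_match = re.match(r"[\d\D]+", text).group() # crosses the line break
print(repr(dot_match))                         # 'line one'
print(any_match == text)                       # True
```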

Anchors and Boundaries

Anchor                         Legend                                                   Example                           Sample Match
^ or "regex" in parser format  Start of string or start of line depending on multiline  ^start.*end$ or "start.*the end"  start of the end
                               mode. (But when [^inside brackets], it means "not")
$ or "regex" in parser format  End of string or end of line depending on multiline      .*? the end$ or ".*the end"       this is the end
                               mode. Many engine-dependent subtleties.
\b                             Position where one side only is an ASCII letter, digit   Bob.*\bcat\b                      Bob ate the cat
                               or underscore
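
The effect of multiline mode on ^ and $, and of \b on word boundaries, can be seen with a short Python experiment. The two-line log string is hypothetical, and re.MULTILINE is Python's flag for multiline mode. How Scalyr itself sets this mode is not covered here.

```python
import re

log = "start job\nstart of the end"

# Without MULTILINE, ^ and $ anchor to the whole string only.
whole_string = re.findall(r"^start.*end$", log)
# With MULTILINE, they also anchor at each line break.
per_line = re.findall(r"^start.*end$", log, re.MULTILINE)
print(whole_string)  # []
print(per_line)      # ['start of the end']

# \b keeps "cat" from matching inside "catalog".
words = re.findall(r"\bcat\b", "Bob ate the cat, not the catalog")
print(words)         # ['cat']
print(re.search(r"Bob.*\bcat\b", "Bob ate the cat") is not None)  # True
```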