regex

2026-06-15

Basics of regex.

General format:

/pattern/flags
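
The note does not name a language. The /pattern/flags form is JavaScript's regex literal syntax, so this sketch assumes TypeScript/JavaScript semantics; variable names and sample strings are illustrative. It shows that a literal and the RegExp constructor build the same regex. Inside a string, the backslash must be doubled.

```typescript
// Regex literal: the pattern sits between the slashes, the flags follow the closing slash.
const literal = /\d+/g;

// The same regex built at runtime from two strings; "\\d" is the two characters \d.
const constructed = new RegExp("\\d+", "g");

// Both objects expose the pattern text and the flags separately.
const sameSource = literal.source === constructed.source; // true
const flagText = literal.flags; // "g"

// With the g flag, String.prototype.match returns every match, not just the first.
const allNumbers = "a1 b22 c333".match(literal); // ["1", "22", "333"]
```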

Character classes

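Use parentheses () to create groups. Groups are helpful with the logical operator | because they limit how much of the pattern each alternative covers. AND is implicit: tokens written one after another must all match, in order.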
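
A short TypeScript sketch, assuming JavaScript regex semantics, of why grouping matters with |. The alternation operator has the lowest precedence, so without parentheses it splits the entire pattern in two. The words and variable names are illustrative.

```typescript
// Without parentheses this reads as (^cat) OR (dog$).
const ungrouped = /^cat|dog$/;
// With parentheses: start, then (cat OR dog), then end.
const grouped = /^(cat|dog)$/;

const ungroupedCatalog = ungrouped.test("catalog"); // true: "^cat" alone is enough
const ungroupedHotdog = ungrouped.test("hotdog");   // true: "dog$" alone is enough
const groupedCatalog = grouped.test("catalog");     // false: the whole text must be "cat" or "dog"
const groupedDog = grouped.test("dog");             // true
```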

. : Match any single character except line breaks. Also see the /s flag.

\w : Any word character. Same as [A-Za-z0-9_].

\W : Opposite of \w.

\d : Any digit. Same as [0-9].

\D : Opposite of \d.

\s : Match a single whitespace character (space, tab, line break, etc.).

\S : Match anything that is not whitespace. Combined with \s in a set, [\s\S] matches any character, including line breaks.
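
A minimal TypeScript sketch of the shorthand classes, assuming JavaScript regex semantics. The sample string is illustrative. It also shows the [\s\S] trick crossing a line break where . cannot.

```typescript
const sample = "id_42: ok\nnext";

// \w, \d and \s each match a single character; + repeats the preceding token.
const firstWord = sample.match(/\w+/)?.[0];   // "id_42" (the colon is not a word character)
const firstNumber = sample.match(/\d+/)?.[0]; // "42"
const hasSpace = /\s/.test(sample);           // true

// . will not cross the line break, but [\s\S] (whitespace OR non-whitespace) will.
const withDot = sample.match(/ok.+/);           // null: only "\n" follows "ok"
const withSet = sample.match(/ok[\s\S]+/)?.[0]; // "ok\nnext"
```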

[] : Character set. Match any one of the characters inside the brackets. A range can be specified with a - between two characters, e.g. [A-Z].

[^] : Negated character set. Match any one character that is NOT inside the brackets, e.g. [^0-9].
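
A small TypeScript sketch, assuming JavaScript regex semantics, of a set, its negation, and multiple ranges inside one set. The sample strings are illustrative.

```typescript
// [aeiou] matches one vowel; [^aeiou] matches one character that is NOT a vowel.
const vowels = "regex".match(/[aeiou]/g);     // ["e", "e"]
const nonVowels = "regex".match(/[^aeiou]/g); // ["r", "g", "x"]

// Several ranges can share one set: here, any hexadecimal digit.
const hexDigits = /^[0-9A-Fa-f]+$/;
const validHex = hexDigits.test("1aF3");   // true
const invalidHex = hexDigits.test("1aG3"); // false: G is outside every range
```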

() : Capture group. Groups tokens together and remembers the text they matched for later use.
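
A minimal TypeScript sketch, assuming JavaScript regex semantics, of reading capture groups back out. The groups are numbered from 1 in the order of their opening parentheses. The date pattern and names are illustrative.

```typescript
const datePattern = /(\d+)-(\d+)-(\d+)/;
const found = "Released 2026-06-15.".match(datePattern);

// found[0] is the whole match; found[1], found[2], found[3] are the groups.
const whole = found?.[0]; // "2026-06-15"
const parts = found ? [found[1], found[2], found[3]] : []; // ["2026", "06", "15"]

// In a replacement string, $1, $2, ... refer to the captured groups.
const swapped = "2026-06-15".replace(datePattern, "$3/$2/$1"); // "15/06/2026"
```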

^ : Beginning of the text. See also the /m flag.

$ : Match the end of the text. See also the /m flag.
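
A short TypeScript sketch, assuming JavaScript regex semantics (without the m flag), of how ^ and $ change what counts as a match. The sample strings are illustrative.

```typescript
// Unanchored: the digits may appear anywhere in the text.
const anywhere = /\d+/.test("abc123");     // true
// Anchored at both ends: the whole text must consist of digits.
const wholeMixed = /^\d+$/.test("abc123"); // false
const wholeDigits = /^\d+$/.test("123");   // true
// Only the start anchor: the text must begin with digits.
const startsWithDigits = /^\d+/.test("123abc"); // true
```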

* : Match 0 or more of the preceding token.

+ : Match 1 or more of the preceding token.

? : Make the previous token optional.
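
A small TypeScript sketch, assuming JavaScript regex semantics, comparing the three quantifiers above. The words are illustrative.

```typescript
// colou?r: the u is optional (0 or 1 times).
const spelling = /^colou?r$/;
const results = ["color", "colour", "colouur"].map((w) => spelling.test(w)); // [true, true, false]

// a* allows zero a's; a+ requires at least one.
const starAllowsZero = /^ba*$/.test("b"); // true
const plusNeedsOne = /^ba+$/.test("b");   // false
```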

+? / *? : Make the quantifier lazy, so it matches as few characters as possible.
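
A minimal TypeScript sketch, assuming JavaScript regex semantics, of greedy versus lazy matching on an illustrative HTML-like string. It is not meant as a real HTML parser.

```typescript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy .* grabs as much as possible, then backtracks to the LAST ">".
const greedy = html.match(/<.*>/)?.[0]; // the entire string
// Lazy .*? stops at the FIRST ">" that lets the match succeed.
const lazy = html.match(/<.*?>/)?.[0]; // "<b>"
// With the g flag, the lazy pattern finds each tag separately.
const tags = html.match(/<.*?>/g); // ["<b>", "</b>", "<i>", "</i>"]

// +? works the same way: one "a" is the fewest that satisfies a+.
const lazyPlus = "aaa".match(/a+?/)?.[0]; // "a"
```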

| : Boolean OR. Match the expression before or after.

Flags go after the closing slash. Common flags include:

m : Multiline. The anchors ^ and $ match at the start and end of every line instead of only at the start and end of the whole text.
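
A short TypeScript sketch, assuming JavaScript regex semantics, of the multiline flag. The g flag is added only so that match returns every hit. The sample text is illustrative.

```typescript
const lines = "first line\nsecond line\nthird";

// Without m, ^ only matches at the very start of the string.
const noM = lines.match(/^\w+/g); // ["first"]
// With m, ^ also matches right after every line break.
const withM = lines.match(/^\w+/gm); // ["first", "second", "third"]
// $ behaves the same way at the end of each line.
const lastWords = lines.match(/\w+$/gm); // ["line", "line", "third"]
```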

u : Unicode. The regex works on whole Unicode code points, and extended code point escapes are allowed: \u{FFFFF} in JavaScript, \x{FFFFF} in PCRE.
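
A minimal TypeScript sketch, assuming JavaScript semantics (where the extended escape is spelled \u{...}, not \x{...}). It uses an emoji outside the Basic Multilingual Plane as an illustrative character. That character is stored as two UTF-16 code units.

```typescript
const smile = "\u{1F600}"; // one code point, two UTF-16 code units
const codeUnits = smile.length; // 2

// With u, the \u{...} escape accepts any code point.
const escapeWithU = /^\u{1F600}$/u.test(smile); // true
// With u, . matches the whole code point...
const dotWithU = /^.$/u.test(smile); // true
// ...without u, . matches a single code unit, i.e. only half of the emoji.
const dotWithoutU = /^.$/.test(smile); // false
```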

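y : Sticky. Match only at the exact position stored in lastIndex; unlike the global flag, it never searches ahead for a later match.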
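
This entry appears to describe JavaScript's sticky flag y. That reading is an assumption based on its position between the unicode and dotall entries. Under that assumption, a TypeScript sketch contrasts y with g on an illustrative string.

```typescript
const sticky = /\d+/y;

// y only tries to match at exactly sticky.lastIndex.
sticky.lastIndex = 3;
const fromThree = sticky.exec("abc123")?.[0]; // "123"

sticky.lastIndex = 0;
const fromZero = sticky.exec("abc123"); // null: index 0 is "a" and y never scans ahead

// A g regex, by contrast, searches forward from lastIndex until it finds a match.
const searching = /\d+/g;
const scanned = searching.exec("abc123")?.[0]; // "123"
```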

s : Dotall. Dot (.) matches line breaks as well.
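
A minimal TypeScript sketch, assuming JavaScript regex semantics (the s flag needs an ES2018 or later runtime). It shows dot crossing a line break only when the flag is set. The sample text is illustrative.

```typescript
const block = "start\nend";

const withoutS = /start.end/.test(block); // false: . does not match "\n"
const withS = /start.end/s.test(block);   // true: with s, . matches "\n" too
```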

Flags can be combined, e.g. /ms enables both multiline and dotall.
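
A TypeScript sketch, assuming JavaScript regex semantics, of combining flags to pull out multi-line blocks. The BEGIN/END markers and the text are illustrative. The m flag lets ^ and $ anchor the marker lines. The s flag lets the lazy .*? cross line breaks. The g flag collects every block.

```typescript
const doc = "BEGIN\nline 1\nline 2\nEND\nBEGIN\nline 3\nEND";

// ^BEGIN$ and ^END$ must each be a whole line (m); .*? may span lines (s).
const blocks = doc.match(/^BEGIN$.*?^END$/gms);
// blocks[0] is "BEGIN\nline 1\nline 2\nEND", blocks[1] is "BEGIN\nline 3\nEND"
```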