
Back references

Outside a character class, a backslash followed by a digit greater than 0 (and possibly further digits) is a back reference to a capturing subpattern earlier (i.e. to its left) in the pattern, provided there have been that many previous capturing left parentheses.

However, if the decimal number following the backslash is less than 10, it is always taken as a back reference, and causes an error only if there are not that many capturing left parentheses in the entire pattern. In other words, for numbers less than 10 the referenced parentheses need not be to the left of the reference. A "forward back reference" can make sense when a repetition is involved and the subpattern to the right has participated in an earlier iteration. See the section on escape sequences for further details of the handling of digits following a backslash.
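A forward back reference is easiest to see in a small test. The following is a minimal sketch, not taken from the manual: the pattern and the variable names are illustrative assumptions. Group 2 sits to the right of the `\2` that refers to it, and the reference only succeeds once an earlier iteration of the repeated group has set it.

```php
<?php
// Group 2 is defined to the right of the \2 that refers to it.
$pattern = '/^(\2two|(one))+$/';

// Iteration 1: \2 is unset, so the second alternative matches "one".
// Iteration 2: \2 now holds "one", so "\2two" matches "onetwo".
$forwardMatched = preg_match($pattern, 'oneonetwo');

// Here the second iteration would need \2 ("one") to match "two", so it fails.
$forwardFailed = preg_match($pattern, 'onetwo');

var_dump($forwardMatched, $forwardFailed);
```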

A back reference matches whatever actually matched the capturing subpattern in the current subject string, rather than anything matching the subpattern itself. So the pattern (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If case-sensitive (caseful) matching is in force at the time of the back reference, then the case of letters is relevant. For example, ((?i)rah)\s+\1 matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original capturing subpattern is matched case-insensitively (caselessly).
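The two patterns above can be checked directly with preg_match(). This is a small sketch using the manual's own patterns and subjects; only the variable names are made up for illustration.

```php
<?php
$pattern = '/(sens|respons)e and \1ibility/';

$results = [];
foreach ([
    'sense and sensibility',
    'response and responsibility',
    'sense and responsibility',
] as $subject) {
    // 1 if \1 can repeat the text that group 1 actually captured, else 0.
    $results[$subject] = preg_match($pattern, $subject);
}

// The inline (?i) applies only inside group 1; \1 itself is compared casefully.
$rah = '/((?i)rah)\s+\1/';
$rahResults = [];
foreach (['rah rah', 'RAH RAH', 'RAH rah'] as $subject) {
    $rahResults[$subject] = preg_match($rah, $subject);
}

print_r($results);
print_r($rahResults);
```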

There may be more than one back reference to the same subpattern. If a subpattern has not actually been used in a particular match, then any back references to it always fail. For example, the pattern (a|(bc))\2 always fails if it starts to match "a" rather than "bc". Because there may be up to 99 back references, all digits following the backslash are taken as part of a potential back reference number. If the pattern continues with a digit character, then some delimiter must be used to terminate the back reference. If the PCRE_EXTENDED option is set, this can be whitespace. Otherwise an empty comment can be used.
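As a rough illustration of both points, the sketch below first tries the (a|(bc))\2 pattern from the text, then shows the two delimiters mentioned for a back reference that is followed by a literal digit. The subjects and variable names are assumptions chosen for the example.

```php
<?php
// Group 2 only participates when the "bc" alternative is taken.
$unsetGroup = '/(a|(bc))\2/';
$viaBc = preg_match($unsetGroup, 'bcbc'); // group 2 is set to "bc"
$viaA  = preg_match($unsetGroup, 'aa');   // group 2 is never set

// Back reference 1 followed by a literal "0":
// an empty comment (?#) ends the reference number...
$withComment = preg_match('/(a)\1(?#)0/', 'aa0');
// ...or, with the x (PCRE_EXTENDED) modifier, ignored whitespace does.
$withExtended = preg_match('/(a)\1 0/x', 'aa0');

var_dump($viaBc, $viaA, $withComment, $withExtended);
```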

A back reference that occurs inside the parentheses to which it refers fails when the subpattern is first used, so, for example, (a\1) never matches. However, such references can be useful inside repeated subpatterns. For example, the pattern (a|b\1)+ matches any number of "a"s and also "aba", "ababba" etc. At each iteration of the subpattern, the back reference matches the character string corresponding to the previous iteration. In order for this to work, the pattern must be such that the first iteration does not need to match the back reference. This can be done using alternation, as in the example above, or by a quantifier with a minimum of zero.
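The iteration behaviour can be observed with a short script. This sketch anchors the manual's (a|b\1)+ pattern with ^ and $ so the whole subject must be consumed; the anchors, the extra subject "ba", and the variable names are assumptions added for the demonstration.

```php
<?php
$pattern = '/^(a|b\1)+$/';

$selfRef = [];
foreach (['aaa', 'aba', 'ababba', 'ba'] as $subject) {
    // "ba" fails: on the first iteration \1 is still unset.
    $selfRef[$subject] = preg_match($pattern, $subject);
}

// A reference inside its own group with no alternative can never succeed.
$neverMatches = preg_match('/(a\1)/', 'aa');

print_r($selfRef);
var_dump($neverMatches);
```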

The \g escape sequence can be used for absolute and relative referencing of subpatterns. This escape sequence must be followed by an unsigned number or a negative number, optionally enclosed in braces. The sequences \1 , \g1 and \g{1} are synonymous with one another. Using this sequence with an unsigned number can help remove the ambiguity inherent in digits following a backslash. The sequence helps to distinguish back references from octal characters and also makes it easier to have a back reference followed by a literal number, e.g. \g{2}1 .
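A brief sketch of the three synonymous forms, plus a braced reference followed by a literal digit. The patterns and subjects here are illustrative assumptions, not examples from the manual.

```php
<?php
// \1, \g1 and \g{1} all refer to the text captured by group 1.
$synonyms = ['/(a)\1/', '/(a)\g1/', '/(a)\g{1}/'];
$synonymResults = array_map(fn ($p) => preg_match($p, 'aa'), $synonyms);

// The braces end the reference number, so the trailing 1 is a literal digit:
// "a digit, the same digit again, then the character 1".
$digitAfter = '/(\d)\g{1}1/';
$repeatedThenOne = preg_match($digitAfter, '551');
$notRepeated     = preg_match($digitAfter, '515');

print_r($synonymResults);
var_dump($repeatedThenOne, $notRepeated);
```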

The use of the \g sequence with a negative number signifies a relative reference. For example, (foo)(bar)\g{-1} would match the sequence "foobarbar" and (foo)(bar)\g{-2} matches "foobarfoo". This can be useful in long patterns as an alternative to keeping track of the number of subpatterns in order to reference a specific previous subpattern.
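The relative forms can be verified with the manual's own patterns. In this sketch only the variable names are invented.

```php
<?php
// \g{-1} is the most recently opened group before the reference: (bar).
$lastGroup = '/(foo)(bar)\g{-1}/';
// \g{-2} is the group before that: (foo).
$secondLast = '/(foo)(bar)\g{-2}/';

$relative = [
    'last on foobarbar'       => preg_match($lastGroup, 'foobarbar'),
    'last on foobarfoo'       => preg_match($lastGroup, 'foobarfoo'),
    'secondLast on foobarfoo' => preg_match($secondLast, 'foobarfoo'),
];

print_r($relative);
```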

Back references to the named subpatterns can be achieved by (?P=name) , \k<name> , \k'name' , \k{name} , \g{name} , \g<name> or \g'name' .
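The sketch below exercises five of the named forms against a repeated and a non-repeated word. The group name "word", the subjects, and the variable names are assumptions for illustration. The \g<name> and \g'name' forms are left out because a user note on this page reports that they re-run the subpattern rather than matching the captured text.

```php
<?php
// Five ways of referring back to the group named "word".
$references = ['(?P=word)', '\k<word>', "\\k'word'", '\k{word}', '\g{word}'];

$namedResults = [];
foreach ($references as $ref) {
    $pattern = '/^(?<word>\w+) ' . $ref . '$/';
    $namedResults[$ref] = [
        'repeated'  => preg_match($pattern, 'hello hello'),
        'different' => preg_match($pattern, 'hello world'),
    ];
}

print_r($namedResults);
```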


User Contributed Notes 2 notes

mnvx at yandex dot ru
9 years ago
A similar feature is DEFINE.

Example:
    (?(DEFINE)(?<myname>\bvery\b))(?&myname)\p{Pd}(?&myname).

The expression above will match "very-very" in the following sentence:
    Define is very-very handy sometimes.
              ^-------^

How it works: (?(DEFINE)(?<myname>\bvery\b)) defines "myname" as "\bvery\b". So "(?&myname)\p{Pd}(?&myname)" is equivalent to "\bvery\b\p{Pd}\bvery\b".
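A short sketch of the note's pattern in use, followed by a contrast showing that (?&name) re-runs the subpattern rather than matching the captured text. The second pair of patterns, the group name "w", and the variable names are assumptions added for illustration.

```php
<?php
// The DEFINE group never matches by itself; it only names a subpattern.
$pattern = '/(?(DEFINE)(?<myname>\bvery\b))(?&myname)\p{Pd}(?&myname)/';
preg_match($pattern, 'Define is very-very handy sometimes.', $m);
$defined = $m[0];

// (?&w) re-applies the subpattern, so any second word is accepted...
$subroutine = preg_match('/(?<w>\b\w+\b)\p{Pd}(?&w)/', 'very-much');
// ...whereas \k<w> must repeat the exact text captured by the group.
$backref = preg_match('/(?<w>\b\w+\b)\p{Pd}\k<w>/', 'very-much');

var_dump($defined, $subroutine, $backref);
```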

Steve
3 years ago
The escape sequence \g used as a backreference may not always behave as expected.
The following numbered backreferences refer to the text matching the specified capture group, as documented:
\1
\g1
\g{1}
\g-1
\g{-1}

However, the following variants refer to the subpattern code instead of the matched text:
\g<1>
\g'1'
\g<-1>
\g'-1'

With named backreferences, we may also use the \k escape sequence as well as the (?P=...) construct. The following combinations also refer to the text matching the named capture group, as documented:
\g{name}
\k{name}
\k<name>
\k'name'
(?P=name)

However, these refer to the subpattern code instead of the matched text:
\g<name>
\g'name'

In the following example, the capture group searches for a single letter 'a' or 'b', and then the backreference looks for the same letter. Thus, the patterns are expected to match 'aa' and 'bb', but not 'ab' nor 'ba'.

<?php
/* Matches to the following patterns are replaced by 'xx' in the subject string 'aa ab ba bb'. */
$patterns = [
  # numbered backreferences (absolute)
  '/([ab])\1/',      // 'xx ab ba xx'
  '/([ab])\g1/',     // 'xx ab ba xx'
  '/([ab])\g{1}/',   // 'xx ab ba xx'
  '/([ab])\g<1>/',   // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
  "/([ab])\g'1'/",   // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
  '/([ab])\k{1}/',   // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
  '/([ab])\k<1>/',   // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
  "/([ab])\k'1'/",   // 'aa ab ba bb' # No group with name "1", backreference to unset group always fails.
  '/([ab])(?P=1)/',  // NULL # Regex error: "subpattern name must start with a non-digit", (?P=) expects name not number.

  # numbered backreferences (relative)
  '/([ab])\-1/',     // 'aa ab ba bb'
  '/([ab])\g-1/',    // 'xx ab ba xx'
  '/([ab])\g{-1}/',  // 'xx ab ba xx'
  '/([ab])\g<-1>/',  // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
  "/([ab])\g'-1'/",  // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
  '/([ab])\k{-1}/',  // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
  '/([ab])\k<-1>/',  // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
  "/([ab])\k'-1'/",  // 'aa ab ba bb' # No group with name "-1", backreference to unset group always fails.
  '/([ab])(?P=-1)/', // NULL # Regex error: "subpattern name expected", (?P=) expects name not number.

  # named backreferences
  '/(?<name>[ab])\g{name}/',  // 'xx ab ba xx'
  '/(?<name>[ab])\g<name>/',  // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
  "/(?<name>[ab])\g'name'/",  // 'xx xx xx xx' # unexpected behavior, backreference matches both 'a' and 'b'.
  '/(?<name>[ab])\k{name}/',  // 'xx ab ba xx'
  '/(?<name>[ab])\k<name>/',  // 'xx ab ba xx'
  "/(?<name>[ab])\k'name'/",  // 'xx ab ba xx'
  '/(?<name>[ab])(?P=name)/', // 'xx ab ba xx'
];

// Print each pattern with the result of the replacement; @ suppresses compile warnings for invalid patterns.
foreach ($patterns as $pat)
    echo "  '$pat',\t// " . var_export(@preg_replace($pat, 'xx', 'aa ab ba bb'), true) . PHP_EOL;
?>
To Top