Pcre fullinfo

Extracts information about a pattern

int pcre_fullinfo(const pcre *code, const pcre_extra *extra,
	int what, void *where);

This function returns information about a compiled regular expression. It replaces the obsolete pcre_info() function, which is nevertheless retained for backwards compatibility (and is documented below).

The first argument for pcre_fullinfo() is a pointer to the compiled pattern. The second argument is the result of pcre_study(), or NULL if the pattern was not studied. The third argument specifies which piece of information is to be stored in the memory that the fourth argument points to. The function returns 0 on success, or one of the following negative error codes:

PCRE_ERROR_NULL       the argument code was NULL
                      the argument where was NULL
PCRE_ERROR_BADMAGIC   the "magic number" was not found
PCRE_ERROR_BADOPTION  the value of what was invalid

The "magic number" is placed at the start of each compiled pattern as a simple check against passing an arbitrary memory pointer. Here is a typical call of pcre_fullinfo(), to obtain the length of the compiled pattern:

 int rc;
 size_t length;
 rc = pcre_fullinfo(
   re,               /* result of pcre_compile() */
   pe,               /* result of pcre_study(), or NULL */
   PCRE_INFO_SIZE,   /* what is required */
   &length);         /* where to put the data */

The possible values for the third argument are defined in pcre.h, and are as follows:

PCRE_INFO_BACKREFMAX

Return the number of the highest back reference in the pattern. The fourth argument should point to an int variable. Zero is returned if there are no back references.

PCRE_INFO_CAPTURECOUNT

Return the number of capturing subpatterns in the pattern. The fourth argument should point to an int variable.

PCRE_INFO_DEFAULT_TABLES

Return a pointer to the internal default character tables within PCRE. The fourth argument should point to an unsigned char * variable. This information call is provided for internal use by the pcre_study() function. External callers can cause PCRE to use its internal tables by passing a NULL table pointer.

PCRE_INFO_FIRSTBYTE

Return information about the first byte of any matched string, for a non-anchored pattern. The fourth argument should point to an int variable. (This option used to be called PCRE_INFO_FIRSTCHAR; the old name is still recognized for backwards compatibility.)

If there is a fixed first byte, for example, from a pattern such as (cat|cow|coyote), its value is returned. Otherwise, if either

(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch starts with "^", or

(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set (if it were set, the pattern would be anchored),

-1 is returned, indicating that the pattern matches only at the start of a subject string or after any newline within the string. Otherwise -2 is returned. For anchored patterns, -2 is returned.
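The three kinds of result can be told apart by the sign of the value. The following is a minimal sketch, assuming the int has already been filled in by a successful PCRE_INFO_FIRSTBYTE call; the enum and the function name are hypothetical helpers for illustration and are not part of PCRE:

```c
/* Hypothetical classification of a PCRE_INFO_FIRSTBYTE result
   (these names are illustrative, not part of the PCRE API). */
enum first_byte_kind {
    FIRST_BYTE_FIXED,      /* value >= 0: every match starts with this byte */
    FIRST_BYTE_LINE_START, /* value == -1: match only at the start of the
                              subject or after a newline */
    FIRST_BYTE_NONE        /* value == -2: no first-byte information, or
                              the pattern is anchored */
};

enum first_byte_kind classify_first_byte(int value)
{
    if (value >= 0)
        return FIRST_BYTE_FIXED;
    if (value == -1)
        return FIRST_BYTE_LINE_START;
    return FIRST_BYTE_NONE;
}
```

For the pattern (cat|cow|coyote) mentioned above, the value would be the byte 'c', which this helper classifies as FIRST_BYTE_FIXED.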

PCRE_INFO_FIRSTTABLE

If the pattern was studied, and this resulted in the construction of a 256-bit table indicating a fixed set of bytes for the first byte in any matching string, a pointer to the table is returned. Otherwise NULL is returned. The fourth argument should point to an unsigned char * variable.
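A 256-bit table occupies 32 bytes, one bit per possible byte value. The sketch below shows how such a table might be queried. The text above does not specify the bit layout, so the code assumes that byte value c is represented by bit (c & 7) of byte (c / 8); the function name is hypothetical:

```c
#include <stddef.h>

/* Returns non-zero if byte value c may start a match according to the
   32-byte (256-bit) table, or if no table is available.
   Assumption: bit (c & 7) of byte (c / 8) stands for byte value c. */
int byte_can_start_match(const unsigned char *first_table, unsigned char c)
{
    if (first_table == NULL)
        return 1;   /* no table was built: nothing can be ruled out */
    return (first_table[c / 8] & (1 << (c & 7))) != 0;
}
```

A NULL table is treated as "any byte may start a match", which mirrors the case where the pattern was not studied or studying produced no table.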

PCRE_INFO_HASCRORLF

Return 1 if the pattern contains any explicit matches for CR or LF characters, otherwise 0. The fourth argument should point to an int variable. An explicit match is either a literal CR or LF character, or \r or \n.

PCRE_INFO_JCHANGED

Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise 0. The fourth argument should point to an int variable. (?J) and (?-J) set and unset the local PCRE_DUPNAMES option, respectively.

PCRE_INFO_LASTLITERAL

Return the value of the rightmost literal byte that must exist in any matched string, other than at its start, if such a byte has been recorded. The fourth argument should point to an int variable. If there is no such byte, -1 is returned. For anchored patterns, a last literal byte is recorded only if it follows something of variable length. For example, for the pattern /^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value is -1.

PCRE_INFO_MINLENGTH

If the pattern was studied and a minimum length for matching subject strings was computed, its value is returned. Otherwise the returned value is -1. The value is a number of characters, not bytes (this may be relevant in UTF-8 mode). The fourth argument should point to an int variable. A non-negative value is a lower bound to the length of any matching string. There may not be any strings of that length that do actually match, but every string that does match is at least that long.
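Because the value is a lower bound, a caller could use it to reject subjects that are too short before attempting a match. The following is a rough sketch, assuming the minimum length was obtained from a successful PCRE_INFO_MINLENGTH call and that UTF-8 subjects are well formed; both function names are hypothetical:

```c
#include <stddef.h>

/* Count characters in a UTF-8 string by counting every byte that is not
   a continuation byte (10xxxxxx). Assumes the input is valid UTF-8. */
size_t utf8_char_count(const char *s, size_t nbytes)
{
    size_t n = 0, i;
    for (i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Returns 0 if the subject is too short to match, 1 if a match attempt
   is still needed. minlength is the PCRE_INFO_MINLENGTH value; utf8 is
   non-zero when the length must be counted in characters, not bytes. */
int long_enough_to_match(int minlength, const char *subject,
                         size_t nbytes, int utf8)
{
    size_t chars;
    if (minlength < 0)
        return 1;   /* -1: no minimum length was computed */
    chars = utf8 ? utf8_char_count(subject, nbytes) : nbytes;
    return chars >= (size_t)minlength;
}
```

A result of 1 only means the subject is not ruled out; as noted above, there may be no string of the minimum length that actually matches.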

PCRE_INFO_NAMECOUNT
PCRE_INFO_NAMEENTRYSIZE
PCRE_INFO_NAMETABLE

PCRE supports the use of named as well as numbered capturing parentheses. The names are just an additional way of identifying the parentheses, which still acquire numbers. Several convenience functions such as pcre_get_named_substring() are provided for extracting captured substrings by name. It is also possible to extract the data directly, by first converting the name to a number in order to access the correct pointers in the output vector (described with pcre_exec() below). To do the conversion, you need to use the name-to-number map, which is described by these three values.

The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each entry; both of these return an int value. The entry size depends on the length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the first entry of the table (a pointer to char). The first two bytes of each entry are the number of the capturing parenthesis, most significant byte first. The rest of the entry is the corresponding name, zero terminated.

The names are in alphabetical order. Duplicate names may appear if (?| is used to create multiple groups with the same number, as described in the section on duplicate subpattern numbers in the pcrepattern page. Duplicate names for subpatterns with different numbers are permitted only if PCRE_DUPNAMES is set. In all cases of duplicate names, they appear in the table in the order in which they were found in the pattern. In the absence of (?| this is the order of increasing number; when (?| is used this is not necessarily the case because later subpatterns may have lower numbers.

As a simple example of the name/number table, consider the following pattern (assume PCRE_EXTENDED is set, so white space, including newlines, is ignored):

(?<date> (?<year>(\d\d)?\d\d) -
(?<month>\d\d) - (?<day>\d\d) )

There are four named subpatterns, so the table has four entries, and each entry in the table is eight bytes long. The table is as follows, with non-printing bytes shown in hexadecimal, and undefined bytes shown as ??:

 00 01 d  a  t  e  00 ??
 00 05 d  a  y  00 ?? ??
 00 04 m  o  n  t  h  00
 00 02 y  e  a  r  00 ??

When writing code to extract data from named subpatterns using the name-to-number map, remember that the length of the entries is likely to be different for each compiled pattern.
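The lookup described above can be sketched as a linear scan that steps through the table one entry at a time. This is an illustrative sketch, not PCRE's own implementation: name_to_number() and example_table are hypothetical names, and the table is the four-entry example from the text, with the undefined bytes (??) written as zero. In real code the three parameters would come from PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <stddef.h>
#include <string.h>

/* Look up a subpattern name in a name table and return its group number,
   or -1 if the name is not present. With duplicate names, the first
   matching entry in table order is returned. */
int name_to_number(const char *name_table, int name_count,
                   int entry_size, const char *name)
{
    int i;
    for (i = 0; i < name_count; i++) {
        /* Entries are fixed-size, so entry i starts at i * entry_size. */
        const unsigned char *entry =
            (const unsigned char *)name_table + (size_t)i * (size_t)entry_size;
        /* Bytes 0-1: group number, most significant byte first. */
        int number = (entry[0] << 8) | entry[1];
        /* Bytes 2 onwards: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return -1;
}

/* The example table from the text: four entries of eight bytes each.
   Undefined bytes (??) are written as 0 here. */
const char example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0
};
```

Passing the entry size as a parameter, rather than hard-coding eight, keeps the helper usable for other patterns, whose entries are likely to have a different length.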

PCRE_INFO_OKPARTIAL

Return 1 if the pattern can be used for partial matching with pcre_exec(), otherwise 0. The fourth argument should point to an int variable. From release 8.00, this always returns 1, because the restrictions that previously applied to partial matching have been lifted. The pcrepartial documentation gives details of partial matching.

PCRE_INFO_OPTIONS

Return a copy of the options with which the pattern was compiled. The fourth argument should point to an unsigned long int variable. These option bits are those specified in the call to pcre_compile(), modified by any top-level option settings at the start of the pattern itself. In other words, they are the options that will be in force when matching starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED.

A pattern is automatically anchored by PCRE if all of its top-level alternatives begin with one of the following:

^     unless PCRE_MULTILINE is set
\A    always
\G    always
.*    if PCRE_DOTALL is set and there are no back
      references to the subpattern in which .* appears

For such patterns, the PCRE_ANCHORED bit is set in the options returned by pcre_fullinfo().

PCRE_INFO_SIZE

Return the size of the compiled pattern, that is, the value that was passed as the argument to pcre_malloc() when PCRE was getting memory in which to place the compiled data. The fourth argument should point to a size_t variable.

PCRE_INFO_STUDYSIZE

Return the size of the data block pointed to by the study_data field in a pcre_extra block. That is, it is the value that was passed to pcre_malloc() when PCRE was getting memory into which to place the data created by pcre_study(). If pcre_extra is NULL, or there is no study data, zero is returned. The fourth argument should point to a size_t variable.
