My patterns are valid UTF-encoded strings, but the input subjects may or may not be valid UTF. The application works fine for UTF subjects but crashes during matching for Big5 and ISO-8859-1 subjects.

pcre2unicode specification

The page above suggests passing the PCRE2_MATCH_INVALID_UTF flag at compile time, but it didn't work for me.

So does PCRE2 really work without crashing when the pattern is compiled with the PCRE2_MATCH_INVALID_UTF flag?

What I have tried:

My sample code:

#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

/* pattern, subject and length are set elsewhere in the application. */
int rc;
int errornumber;
PCRE2_SIZE erroffset;
pcre2_code *re;
pcre2_match_data *match_data;
pcre2_match_context *mcontext;
pcre2_jit_stack *jit_stack;

/* Compile with support for subjects that contain invalid UTF. */
re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, PCRE2_MATCH_INVALID_UTF,
                   &errornumber, &erroffset, NULL);
rc = pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);
mcontext = pcre2_match_context_create(NULL);
jit_stack = pcre2_jit_stack_create(32*1024, 512*1024, NULL);
pcre2_jit_stack_assign(mcontext, NULL, jit_stack);
match_data = pcre2_match_data_create(re, 10);

/* The application crashes here for invalid UTF subjects. */
rc = pcre2_jit_match(re, subject, length, 0, 0, match_data, mcontext);

pcre2_code_free(re);
pcre2_match_data_free(match_data);
pcre2_match_context_free(mcontext);
pcre2_jit_stack_free(jit_stack);
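
To narrow down which subjects trigger the crash, it may help to check each subject for well-formed UTF-8 before matching. The following is only a sketch, and it assumes the "UTF" in question is UTF-8. The helper name is_valid_utf8 is hypothetical and is not part of PCRE2. It does not replace PCRE2_MATCH_INVALID_UTF; it only reports whether a given byte string is valid, so crashing and non-crashing inputs can be told apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function).
   Returns 1 if the first len bytes of s are well-formed UTF-8, else 0.
   Rejects overlong forms, surrogates (U+D800..U+DFFF) and values
   above U+10FFFF. */
static int is_valid_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;    /* number of continuation bytes expected */
        uint32_t cp;    /* code point being assembled */
        uint32_t min;   /* smallest code point allowed for this length */

        if (c < 0x80) {
            i++;        /* plain ASCII byte */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;   /* sequence is truncated */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;   /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

Calling is_valid_utf8((const unsigned char *)subject, length) just before the match and logging the result would show whether the crash happens only for subjects that fail this check, such as typical Big5 or ISO-8859-1 text containing bytes above 0x7F.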