KSES: Fix tests and detection of HTML Bogus Comment spans.

In [58418] a test was added without the `test_` prefix in its function
name, and because of that, it wasn't run in the test suite.
The prefix has been added to ensure that it runs.

In the original patch, a logic bug meant that the loop which
repeatedly sanitizes the inner contents of a bogus comment never
ran more than once. This has been fixed.
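
The loop bug is easiest to see in isolation. The sketch below is not
the WordPress code: `strip_once()` is a hypothetical stand-in for
`wp_kses()` that needs more than one pass to settle, and `old_loop()`
and `new_loop()` are made-up names that only mirror the shape of the
loops before and after this change.

```php
<?php
// Hypothetical stand-in for wp_kses(): removes every "<b>" found in a
// single scan, but does not rescan its own output.
function strip_once( string $content ): string {
	return str_replace( '<b>', '', $content );
}

// Shape of the loop before this change. After the first pass both
// variables hold the same value, so the condition fails immediately.
function old_loop( string $content ): string {
	$transformed = null;
	while ( $transformed !== $content ) {
		$transformed = strip_once( $content );
		$content     = $transformed;
	}
	return $transformed;
}

// Shape of the loop after this change. The previous value is kept and
// the pass repeats until it leaves the content unchanged.
function new_loop( string $content ): string {
	do {
		$prev    = $content;
		$content = strip_once( $content );
	} while ( $prev !== $content );
	return $content;
}

echo old_loop( '<<b>b>x' ), "\n"; // prints "<b>x" (stopped after one pass)
echo new_loop( '<<b>b>x' ), "\n"; // prints "x" (ran to a fixed point)
```

With the assumed stand-in, the input `<<b>b>x` reassembles a new `<b>`
after the first pass, which only the second loop shape goes on to remove.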

This patch also covers one more case in which `kses` wasn't
properly detecting the bogus comment state, and adds a test case
for it. Only some constructions of invalid markup declarations
are matched, so that the change doesn't conflict with existing
behaviors around those and other kinds of invalid comments.
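
To make the scope of the detection concrete, here is a small sketch
built around the pattern from the `wp_kses_split2()` hunk of this
commit. The helper name `bogus_comment_opener()` and the sample tokens
are assumptions made up for illustration; only the regular expression
comes from the diff.

```php
<?php
// Hypothetical helper: returns the character after "<" ("/" or "!")
// when the token matches the bogus-comment pattern, or null otherwise.
function bogus_comment_opener( string $token ): ?string {
	// Pattern copied from the wp_kses_split2() change in this commit.
	if ( 1 === preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $token ) ) {
		return $token[1];
	}
	return null;
}

// Illustrative inputs only; they are not taken from the test suite.
$samples = array(
	'</3 hearts>',      // closing tag with an invalid name
	'<!doctype html>',  // markup declaration starting with a-z
	'<!-- comment -->', // normative comment, not matched here
	'<![CDATA[x]]>',    // "[" after "<!" is outside [a-z], not matched
	'</div>',           // valid closing tag name, not matched
);

foreach ( $samples as $token ) {
	var_export( bogus_comment_opener( $token ) );
	echo "\t{$token}\n";
}
```

Returning the opener character mirrors how the patched code rebuilds
the token as `<{$opener}{$content}>` after cleaning its contents.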

Props ellatrix, dmsnell.
See #61009.
Follow-up to [58418].

Built from https://develop.svn.wordpress.org/trunk@58424


git-svn-id: http://core.svn.wordpress.org/trunk@57873 1a063a9b-81f0-0310-95a4-ce76da25c4cd
dmsnell 2024-06-17 12:04:12 +00:00
parent 963175f228
commit 9c25b9d9b8
2 changed files with 23 additions and 12 deletions

View File

@@ -988,6 +988,9 @@ function wp_kses_split( $content, $allowed_html, $allowed_protocols ) {
     (<!--.*?(-->|$)) # - Normative HTML comments.
     |
     </[^a-zA-Z][^>]*> # - Closing tags with invalid tag names.
+    |
+    <![^>]*> # - Invalid markup declaration nodes. Not all invalid nodes
+             #   are matched so as to avoid breaking legacy behaviors.
   )
   |
   (<[^>]*(>|$)|>) # Tag-like spans of text.
@@ -1114,22 +1117,30 @@ function wp_kses_split2( $content, $allowed_html, $allowed_protocols ) {
   }
   /*
-   * When a closing tag appears with a name that isn't a valid tag name,
-   * it must be interpreted as an HTML comment. It extends until the
-   * first `>` character after the initial opening `</`.
+   * When certain invalid syntax constructs appear, the HTML parser
+   * shifts into what's called the "bogus comment state." This is a
+   * plaintext state that consumes everything until the nearest `>`
+   * and then transforms the entire span into an HTML comment.
    *
    * Preserve these comments and do not treat them like tags.
    *
    * @see https://html.spec.whatwg.org/#bogus-comment-state
    */
-  if ( 1 === preg_match( '~^</[^a-zA-Z][^>]*>$~', $content ) ) {
-    $content = substr( $content, 2, -1 );
-    $transformed = null;
+  if ( 1 === preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $content ) ) {
+    /**
+     * Since the pattern matches `</…>` and also `<!…>`, this will
+     * preserve the type of the cleaned-up token in the output.
+     */
+    $opener = $content[1];
+    $content = substr( $content, 2, -1 );
-    while ( $transformed !== $content ) {
-      $transformed = wp_kses( $content, $allowed_html, $allowed_protocols );
-      $content = $transformed;
-    }
+    do {
+      $prev = $content;
+      $content = wp_kses( $content, $allowed_html, $allowed_protocols );
+    } while ( $prev !== $content );
-    return "</{$transformed}>";
+    // Recombine the modified inner content with the original token structure.
+    return "<{$opener}{$content}>";
   }
   /*

View File

@@ -16,7 +16,7 @@
  *
  * @global string $wp_version
  */
-$wp_version = '6.6-beta2-58423';
+$wp_version = '6.6-beta2-58424';
 /**
  * Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.