HTML API: Avoid calling subclass method while internally scanning in Tag Processor.

After modifying tags in the HTML API, the Tag Processor backs up to before the tag being modified and then re-parses its attributes. This saves on the code complexity involved in applying updates, which have already been transformed to “lexical updates” by the time they are applied.

In order to do that, `::get_updated_html()` called `::next_tag()` to reuse its logic. However, as a public method, subclasses may change the behavior of that method, and the HTML Processor does just this. It maintains an HTML stack of open elements and when the Tag Processor calls this method to re-scan a tag and its attributes, it leads to a broken stack.

This commit replaces the call to `::next_tag()` with a more appropriate reapplication of its internal parsing logic to rescan the tag name and its attributes. Given the limited nature of what's occurring in `::get_updated_html()`, this should bring with it certain guarantees that no HTML structure is being changed (that structure will only be changed by subclasses like the HTML Processor).

Follow-up to [56274], [56702].

Props dmsnell, zieladam, nicolefurlan.
Fixes #59607.
Built from https://develop.svn.wordpress.org/trunk@56941


git-svn-id: http://core.svn.wordpress.org/trunk@56452 1a063a9b-81f0-0310-95a4-ce76da25c4cd
This commit is contained in:
Sergey Biryukov 2023-10-16 14:01:27 +00:00
parent 09554030d4
commit 89a5bbb997
2 changed files with 17 additions and 15 deletions

View File

@ -2270,6 +2270,7 @@ class WP_HTML_Tag_Processor {
*
* @since 6.2.0
* @since 6.2.1 Shifts the internal cursor corresponding to the applied updates.
* @since 6.4.0 No longer calls subclass method `next_tag()` after updating HTML.
*
* @return string The processed HTML.
*/
@ -2303,24 +2304,25 @@ class WP_HTML_Tag_Processor {
* Rewind before the tag name starts so that it's as if the cursor didn't
* move; a call to `next_tag()` will reparse the recently-updated attributes
* and additional calls to modify the attributes will apply at this same
* location.
* location, but in order to avoid issues with subclasses that might add
* behaviors to `next_tag()`, the internal methods should be called here
* instead.
*
* It's important to note that in this specific place there will be no change
* because the processor was already at a tag when this was called and it's
* rewinding only to the beginning of this very tag before reprocessing it
* and its attributes.
*
* <p>Previous HTML<em>More HTML</em></p>
* ^ | back up by the length of the tag name plus the opening <
* \<-/ back up by strlen("em") + 1 ==> 3
* back up by the length of the tag name plus the opening <
* └←─┘ back up by strlen("em") + 1 ==> 3
*/
// Store existing state so it can be restored after reparsing.
$previous_parsed_byte_count = $this->bytes_already_parsed;
$previous_query = $this->last_query;
// Reparse attributes.
$this->bytes_already_parsed = $before_current_tag;
$this->next_tag();
// Restore previous state.
$this->bytes_already_parsed = $previous_parsed_byte_count;
$this->parse_query( $previous_query );
$this->parse_next_tag();
// Reparse the attributes.
while ( $this->parse_next_attribute() ) {
continue;
}
return $this->html;
}

View File

@ -16,7 +16,7 @@
*
* @global string $wp_version
*/
$wp_version = '6.4-beta4-56940';
$wp_version = '6.4-beta4-56941';
/**
* Holds the WordPress DB revision, increments when changes are made to the WordPress DB schema.