Generic: use compat_urllib_parse_unquote to prevent utf8 mangling

of the entire page in python 2.

Requires the fixed compat_urllib_parse_unquote.

Example: the following saves with a mangled playlist title
 instead of the kanji for 'tsunami'. This affects all UTF-8-encoded
 URLs as well.

youtube-dl -f18 -o '%(playlist_title)s-%(title)s.%(ext)s' \
  61c14c1e3a/tsunami.html
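
A minimal sketch of the mangling, written for Python 3 so it runs on a current interpreter. It assumes that Python 2's urllib.unquote, given a unicode string, turns every %XX escape into one code point, which is approximated here with encoding='latin-1'. The names title, encoded, mangled and fixed are illustrative only and do not come from youtube-dl.

```python
from urllib.parse import quote, unquote

# Kanji for "tsunami", as it might appear in a playlist title.
title = u'\u6d25\u6ce2'

# Percent-encode the title the way a UTF-8 page or URL would carry it.
encoded = quote(title)

# Assumption: approximates Python 2's urllib.unquote on a unicode string,
# where each %XX escape becomes a single code point instead of a UTF-8 byte.
mangled = unquote(encoded, encoding='latin-1')

# Decoding the escapes as UTF-8 bytes recovers the original text, which is
# the behaviour the fixed compat_urllib_parse_unquote is meant to provide.
fixed = unquote(encoded, encoding='utf-8')

print(repr(mangled))
print(fixed == title)
```

The mangled string has one character per UTF-8 byte, so re-encoding it as latin-1 and decoding as UTF-8 gets the title back. That is why the damage shows up as mojibake rather than as lost data.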
fnord 2015-07-15 15:30:47 -05:00
parent e37c932fca
commit 45eedbe58c


@@ -1115,7 +1115,7 @@ class GenericIE(InfoExtractor):
         # Sometimes embedded video player is hidden behind percent encoding
         # (e.g. https://github.com/rg3/youtube-dl/issues/2448)
         # Unescaping the whole page allows to handle those cases in a generic way
-        webpage = compat_urllib_parse.unquote(webpage)
+        webpage = compat_urllib_parse_unquote(webpage)
         # it's tempting to parse this further, but you would
         # have to take into account all the variations like