WordPress.org

Make WordPress Core

Opened 4 months ago

Closed 35 hours ago

#28786 closed defect (bug) (fixed)

wp_send_json silently fails on non UTF-8 strings

Reported by: pento Owned by: pento
Milestone: 4.1 Priority: normal
Severity: normal Version: 3.5
Component: Formatting Keywords: has-patch commit
Focuses: Cc:

Description

When wp_send_json() is used to return a JSON-encoded lump of data, it silently fails if the data contains any non UTF-8 characters.

Attachments (17)

28786.diff (1.5 KB) - added by pento 4 months ago.
28786.2.diff (2.7 KB) - added by pento 4 months ago.
28786.3.diff (2.8 KB) - added by pento 4 months ago.
28786.4.diff (2.8 KB) - added by pento 4 months ago.
28786.5.diff (2.8 KB) - added by pento 4 months ago.
28786.6.diff (4.5 KB) - added by pento 4 months ago.
28786.7.diff (4.5 KB) - added by pento 4 months ago.
json_test.php (1.3 KB) - added by pento 4 months ago.
28786.b1.diff (443 bytes) - added by DJPaul 3 months ago.
28786.8.diff (6.8 KB) - added by pento 2 months ago.
28786.9.diff (6.9 KB) - added by pento 7 weeks ago.
28786.10.diff (17.7 KB) - added by pento 2 weeks ago.
28786.11.diff (18.6 KB) - added by pento 2 weeks ago.
28786.12.diff (18.7 KB) - added by pento 3 days ago.
28786.13.diff (668 bytes) - added by JustinSainton 37 hours ago.
28786.docs.diff (4.0 KB) - added by TobiasBg 36 hours ago.
28786.docs.2.diff (4.2 KB) - added by TobiasBg 35 hours ago.

Download all attachments as: .zip

Change History (53)

pento4 months ago

comment:1 pento4 months ago

  • Keywords has-patch needs-unit-tests added

pento4 months ago

pento4 months ago

pento4 months ago

comment:2 pento4 months ago

  • Keywords needs-unit-tests removed

attachment:28786.4.diff

  • Adds unit tests
  • Tries to use mb_convert_encoding(), if available
  • Removes some old code

comment:3 pento4 months ago

  • Keywords commit added

No objections? Cool. Lets commit it and see what happens.

comment:4 ocean904 months ago

Related: #28807

pento4 months ago

comment:5 pento4 months ago

attachment:28786.5.diff falls back to treating the string as UTF-8 if it can't detect the encoding. This is the same behaviour as wp_check_invalid_utf8().

pento4 months ago

pento4 months ago

comment:6 pento4 months ago

attachment:28786.7.diff

  • Adds $options and $depth parameters, to match json_encode()
  • A handful of performance improvements for _wp_json_sanity_check()
  • Tidied up test_wp_json_encode()
  • test_wp_json_encode_depth() added to make sure we obey $depth

pento4 months ago

comment:7 pento4 months ago

attachment:json_test.php does some performance testing. Results show that, for data that works fine with json_encode(), the overhead of wp_json_encode() is minimal. For extremely large data sets that need to be converted, performance can be pretty bad, but I don't think we're going to encounter much in the way of 6D arrays with 1,000,000 items in them. :-)

$ php json_test.php 
json_encode: 0.051064968109131
wp_json_encode overhead: 0.052061080932617
Best case wp_json_encode: 2.9125101566315
Worst case wp_json_encode: 15.276411056519

DJPaul3 months ago

comment:8 DJPaul3 months ago

@pento's patch fixes the situation where an audio file with non UTF-8 in the meta description will cause the Media Library to not load. However, if that audio file is used in the playlist shortcode, it still breaks the player JS on the front-end. 28786.b1.diff patches the relevant part of media.php to use the new encode function.

Last edited 3 months ago by DJPaul (previous) (diff)

comment:9 ircbot3 months ago

This ticket was mentioned in IRC in #wordpress-dev by ocean90. View the logs.

comment:10 ircbot3 months ago

This ticket was mentioned in IRC in #wordpress-dev by DrewAPicture. View the logs.

comment:11 DrewAPicture3 months ago

#28807 was marked as a duplicate.

comment:12 DrewAPicture3 months ago

  • Keywords needs-patch added; has-patch commit removed

Closed #28807 as a duplicate, since both tickets serve to handle non-UTF-8 failures with json_encode(). We'll need to make sure that the plugin API response calls referenced in #28807 are handled here as well.

This will need lead dev support to get in for 4.0.

pento2 months ago

comment:13 pento2 months ago

  • Keywords has-patch commit 4.1-early added; needs-patch removed
  • Milestone changed from 4.0 to Future Release

attachment:28786.8.diff

It's too late for this to go into 4.0, it can go in 4.1 early.

comment:14 ircbot2 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

pento7 weeks ago

comment:15 pento7 weeks ago

  • Keywords 4.1-early removed
  • Milestone changed from Future Release to 4.1

28786.9.diff is a refresh to apply cleanly against trunk.

comment:16 pento2 weeks ago

#15827 was marked as a duplicate.

pento2 weeks ago

comment:17 pento2 weeks ago

28786.10.diff converts all json_encode() calls in WordPress to be wp_json_encode().

comment:18 mdawaffe2 weeks ago

I worry about mb_detect_encoding() but don't have any data or experience to validate or assuage that worry.

I know we're using this code on WordPress.com in places. In other places, we use code that explicitly converts from the blog_charset (no attempt at auto-detection) to UTF-8.

pento2 weeks ago

comment:19 pento2 weeks ago

28786.11.diff now also sanity checks array/object keys.

And because we all like things that go faster, unwinding unnecessary recursion gives us a bit of a boost when we need to sanity check the data.

$ php json_test.php 
json_encode: 0.055270910263062
wp_json_encode overhead: 0.058070182800293
Best case wp_json_encode: 0.98551392555237
Worst case wp_json_encode: 4.3197019100189

comment:20 azaozz2 weeks ago

Looking at 28786.11.diff, couple of nitpicks:

  • _wp_json_convert_string() can be called many times. Instead of doing function_exists( 'mb_convert_encoding' ) every time, can probably set a static after the first check.
  • A user note in the PHP manual mentions that mb_detect_encoding() should be run in strict mode (third arg set to true). Perhaps some "real life" testing for that is worth it.

Also wondering if we should use _wp_json_convert_string() when htmlspecialchars() fails (can probably rename it to something without 'json' and have an arg to convert to the 'blog_charset'). Still hear sometimes that wp_richedit_pre() and wp_htmledit_pre() may return an empty string especially for posts that were imported (so the charset is not the same as get_option( 'blog_charset' )).

Last edited 2 weeks ago by azaozz (previous) (diff)

comment:21 DrewAPicture2 weeks ago

  • Component changed from General to Formatting

pento3 days ago

comment:22 pento3 days ago

28786.12.diff uses a static var instead of function_exists(), and does strict mb_detect_encoding() checks.

I'm inclined to keep _wp_json_convert_string() as a function specifically for JSON. I'm working on a patch for WPDB that will provide more comprehensive character set conversion functionality.

comment:23 slackbot3 days ago

This ticket was mentioned in Slack in #core by pento. View the logs.

comment:24 slackbot3 days ago

This ticket was mentioned in Slack in #core by nacin. View the logs.

comment:25 slackbot3 days ago

This ticket was mentioned in Slack in #core by pento. View the logs.

comment:26 pento38 hours ago

  • Owner set to pento
  • Resolution set to fixed
  • Status changed from new to closed

In 30055:

Add wp_json_encode(), a wrapper for json_encode() that ensures everything is converted to UTF-8.

Change all core calls from json_encode() to wp_json_encode().

Fixes #28786.

comment:27 slackbot37 hours ago

This ticket was mentioned in Slack in #core by justinsainton. View the logs.

JustinSainton37 hours ago

comment:28 JustinSainton37 hours ago

Minor doc fix in 28786.13.diff

comment:29 pento37 hours ago

In 30058:

Fix a PHPDoc typo for wp_json_encode().

Props JustinSainton.

See #28786.

comment:30 TobiasBg36 hours ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

28786.docs.diff adds some documentation to the private functions, as per drew, fixes punctuation in some more docs, and changes else if to elseif, which seems to be preferred in the WP Coding Standards.

Last edited 36 hours ago by TobiasBg (previous) (diff)

TobiasBg36 hours ago

comment:31 slackbot35 hours ago

This ticket was mentioned in Slack in #core by tobiasbg. View the logs.

TobiasBg35 hours ago

comment:32 slackbot35 hours ago

This ticket was mentioned in Slack in #core by tobiasbg. View the logs.

comment:33 markjaquith35 hours ago

In 30075:

Define JSON_PRETTY_PRINT so it can be used with wp_json_encode()

  • JSON_PRETTY_PRINT was introduced in PHP 5.4
  • Now you can use it with lower PHP versions, without a notice

fixes #30139
see #28786

comment:34 DrewAPicture35 hours ago

28786.docs.2.diff looks pretty good to me. Nice work @TobiasBg!

comment:35 slackbot35 hours ago

This ticket was mentioned in Slack in #core by drew. View the logs.

comment:36 markjaquith35 hours ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 30078:

Docs and code standards cleanup for [30055] (wp_json_encode() & friends)

fixes #28786
props TobiasBg

Note: See TracTickets for help on using tickets.