Opened 4 months ago
Closed 35 hours ago
#28786 closed defect (bug) (fixed)
wp_send_json silently fails on non UTF-8 strings
Reported by: | pento | Owned by: | pento |
---|---|---|---|
Milestone: | 4.1 | Priority: | normal |
Severity: | normal | Version: | 3.5 |
Component: | Formatting | Keywords: | has-patch commit |
Focuses: | Cc: |
Description
When wp_send_json() is used to return a JSON-encoded lump of data, it silently fails if the data contains any non UTF-8 characters.
Attachments (17)
Change History (53)
comment:3 pento — 4 months ago
- Keywords commit added
No objections? Cool. Lets commit it and see what happens.
comment:5 pento — 4 months ago
attachment:28786.5.diff falls back to treating the string as UTF-8 if it can't detect the encoding. This is the same behaviour as wp_check_invalid_utf8().
comment:6 pento — 4 months ago
- Adds $options and $depth parameters, to match json_encode()
- A handful of performance improvements for _wp_json_sanity_check()
- Tidied up test_wp_json_encode()
- test_wp_json_encode_depth() added to make sure we obey $depth
comment:7 pento — 4 months ago
attachment:json_test.php does some performance testing. Results show that, for data that works fine with json_encode(), the overhead of wp_json_encode() is minimal. For extremely large data sets that need to be converted, performance can be pretty bad, but I don't think we're going to encounter much in the way of 6D arrays with 1,000,000 items in them. :-)
$ php json_test.php json_encode: 0.051064968109131 wp_json_encode overhead: 0.052061080932617 Best case wp_json_encode: 2.9125101566315 Worst case wp_json_encode: 15.276411056519
comment:8 DJPaul — 3 months ago
@pento's patch fixes the situation where an audio file with non UTF-8 in the meta description will cause the Media Library to not load. However, if that audio file is used in the playlist shortcode, it still breaks the player JS on the front-end. 28786.b1.diff patches the relevant part of media.php to use the new encode function.
comment:9 ircbot — 3 months ago
This ticket was mentioned in IRC in #wordpress-dev by ocean90. View the logs.
comment:10 ircbot — 3 months ago
This ticket was mentioned in IRC in #wordpress-dev by DrewAPicture. View the logs.
comment:11 DrewAPicture — 3 months ago
#28807 was marked as a duplicate.
comment:12 DrewAPicture — 3 months ago
- Keywords needs-patch added; has-patch commit removed
comment:13 pento — 2 months ago
- Keywords has-patch commit 4.1-early added; needs-patch removed
- Milestone changed from 4.0 to Future Release
- Updates the patch to apply cleanly against trunk
- Adds the fix from attachment:28786.b1.diff
- Fixes #28807
It's too late for this to go into 4.0, it can go in 4.1 early.
comment:14 ircbot — 2 months ago
This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.
comment:15 pento — 7 weeks ago
- Keywords 4.1-early removed
- Milestone changed from Future Release to 4.1
28786.9.diff is a refresh to apply cleanly against trunk.
comment:16 pento — 2 weeks ago
#15827 was marked as a duplicate.
comment:17 pento — 2 weeks ago
28786.10.diff converts all json_encode() calls in WordPress to be wp_json_encode().
comment:18 mdawaffe — 2 weeks ago
I worry about mb_detect_encoding() but don't have any data or experience to validate or assuage that worry.
I know we're using this code on WordPress.com in places. In other places, we use code that explicitly converts from the blog_charset (no attempt at auto-detection) to UTF-8.
comment:19 pento — 2 weeks ago
28786.11.diff now also sanity checks array/object keys.
And because we all like things that go faster, unwinding unnecessary recursion gives us a bit of a boost when we need to sanity check the data.
$ php json_test.php json_encode: 0.055270910263062 wp_json_encode overhead: 0.058070182800293 Best case wp_json_encode: 0.98551392555237 Worst case wp_json_encode: 4.3197019100189
comment:20 azaozz — 2 weeks ago
Looking at 28786.11.diff, couple of nitpicks:
- _wp_json_convert_string() can be called many times. Instead of doing function_exists( 'mb_convert_encoding' ) every time, can probably set a static after the first check.
- A user note in the PHP manual mentions that mb_detect_encoding() should be run in strict mode (third arg set to true). Perhaps some "real life" testing for that is worth it.
Also wondering if we should use _wp_json_convert_string() when htmlspecialchars() fails (can probably rename it to something without 'json' and have an arg to convert to the 'blog_charset'). Still hear sometimes that wp_richedit_pre() and wp_htmledit_pre() may return an empty string especially for posts that were imported (so the charset is not the same as get_option( 'blog_charset' )).
comment:21 DrewAPicture — 2 weeks ago
- Component changed from General to Formatting
comment:22 pento — 3 days ago
28786.12.diff uses a static var instead of function_exists(), and does strict mb_detect_encoding() checks.
I'm inclined to keep _wp_json_convert_string() as a function specifically for JSON. I'm working on a patch for WPDB that will provide more comprehensive character set conversion functionality.
comment:23 slackbot — 3 days ago
This ticket was mentioned in Slack in #core by pento. View the logs.
comment:24 slackbot — 3 days ago
This ticket was mentioned in Slack in #core by nacin. View the logs.
comment:25 slackbot — 3 days ago
This ticket was mentioned in Slack in #core by pento. View the logs.
comment:26 pento — 38 hours ago
- Owner set to pento
- Resolution set to fixed
- Status changed from new to closed
In 30055:
comment:27 slackbot — 37 hours ago
This ticket was mentioned in Slack in #core by justinsainton. View the logs.
JustinSainton — 37 hours ago
comment:28 JustinSainton — 37 hours ago
Minor doc fix in 28786.13.diff
comment:29 pento — 37 hours ago
In 30058:
comment:30 TobiasBg — 36 hours ago
- Resolution fixed deleted
- Status changed from closed to reopened
28786.docs.diff adds some documentation to the private functions, as per drew, fixes punctuation in some more docs, and changes else if to elseif, which seems to be preferred in the WP Coding Standards.
comment:31 slackbot — 35 hours ago
This ticket was mentioned in Slack in #core by tobiasbg. View the logs.
comment:32 slackbot — 35 hours ago
This ticket was mentioned in Slack in #core by tobiasbg. View the logs.
comment:33 markjaquith — 35 hours ago
In 30075:
comment:34 DrewAPicture — 35 hours ago
28786.docs.2.diff looks pretty good to me. Nice work @TobiasBg!
comment:35 slackbot — 35 hours ago
This ticket was mentioned in Slack in #core by drew. View the logs.
comment:36 markjaquith — 35 hours ago
- Resolution set to fixed
- Status changed from reopened to closed
In 30078:
attachment:28786.4.diff