Criteria for judging "full-width characters" and "half-width characters" on Twitter

Asked 2 years ago, Updated 2 years ago, 265 views

As background on Twitter: how does it decide whether a character in a tweet counts as a "full-width character" (limit of 140 characters) or as any other character (limit of 280 characters; so to speak, a "half-width character")?

If you know of a source, such as a URL where the regular expression pattern itself can be seen, or a page of the English specification where the rule is clearly stated, could you please let me know?

(However, I have never used the Twitter API, so if the answer is that the exact judgment algorithm is not published because the browser merely queries an API and receives the result, I'll give up gracefully.)

By the way, the code I currently use, purely for my own purposes, does the following:
U+2000 to U+10FFFF: full-width
Everything else: half-width
It is a very crude implementation.

(I mostly write only CJK and Latin characters, so it is good enough about 90% of the time.)
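
As a rough illustration, the crude rule described above could be written in Python as follows. The function names and the 280 limit parameter are hypothetical, chosen only for this sketch; the weights of 2 and 1 are an assumption derived from the 140/280 limits mentioned in the question.

```python
def naive_weighted_length(text: str) -> int:
    """Crude rule: code points U+2000..U+10FFFF count as 2, everything else as 1."""
    return sum(2 if ord(ch) >= 0x2000 else 1 for ch in text)


def naive_fits_in_tweet(text: str, limit: int = 280) -> bool:
    """Return True if the weighted length does not exceed the limit (assumed 280)."""
    return naive_weighted_length(text) <= limit
```

Under this rule a tweet of 140 CJK characters has weight 280 and just fits, while 141 such characters do not.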

twitter unicode

2022-09-30 22:03

1 Answer

This is described in "Counting characters". Basically, a character is treated as a so-called full-width character, and the following ranges appear to be treated as half-width characters.

  • the Latin-1 code pages (U+0000-U+10FF).
  • general punctuation up to and including the Zero Width Joiner (used to combine emoji and other glyphs) (U+2000-U+200D).
  • general punctuation excluding U+200E and U+200F, which are Unicode directional marks (U+2010-U+201F).
  • quotation marks (U+2032-U+2037).
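
The ranges listed above can be turned into a small counting routine. This is only a minimal sketch, not the twitter-text implementation: the names `HALF_WIDTH_RANGES`, `char_weight`, and `weighted_length` are hypothetical, the weights 1 and 2 are an assumption based on the 280/140 limits, and any special handling of URLs, emoji sequences, or text normalization is deliberately ignored.

```python
# Ranges quoted in the list above; code points inside them count as 1.
HALF_WIDTH_RANGES = [
    (0x0000, 0x10FF),  # Latin-1 code pages
    (0x2000, 0x200D),  # general punctuation up to the Zero Width Joiner
    (0x2010, 0x201F),  # general punctuation, excluding U+200E and U+200F
    (0x2032, 0x2037),  # quotation marks
]


def char_weight(ch: str) -> int:
    """Return 1 for a code point in a half-width range, otherwise 2 (assumed weights)."""
    cp = ord(ch)
    for start, end in HALF_WIDTH_RANGES:
        if start <= cp <= end:
            return 1
    return 2


def weighted_length(text: str) -> int:
    """Sum the per-code-point weights; compare the result against 280."""
    return sum(char_weight(ch) for ch in text)
```

For example, the right single quotation mark U+2019 falls in the third range and counts as 1 here, whereas a rule that treats everything from U+2000 upward as full-width would count it as 2.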

More precisely, the ranges are defined in the JSON files at https://github.com/twitter/twitter-text/tree/master/config.

Apart from this, it seems that every emoji is treated as a full-width character.

I wonder whether a variant form of a character also counts as just one full-width character?


2022-09-30 22:03



© 2024 OneMinuteCode. All rights reserved.