Remove all spaces between Chinese words with regex












12















I would like to remove all spaces among Chinese text only.



My text: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"



Ideal output: "請把這裡的 10 多個字合併. Can you help me?"



var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace("/ /", "");


I have studied a similar question for Python but it seems not to work in my situation so I brought my question here for some help.


























  • 1





    Are your spaces actually &nbsp;, or did you just guess at that?

    – Justinas
    8 hours ago











  • .replace(/ /g,'')

    – Nitesh Virani
    8 hours ago






  • 2





    Using the latest ECMAScript 2018 regex syntax, you may use s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

    – Wiktor Stribiżew
    8 hours ago











  • Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.

    – Wiktor Stribiżew
    7 hours ago
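To illustrate why the attempt in the question leaves the string unchanged, here is a minimal sketch. The variable names are illustrative, and the BMP-only range \u4E00-\u9FFF is an assumed, simplified stand-in for "Chinese character". A string pattern passed to replace is searched for literally, a global regex on a bare space removes every space, and a lookahead restricts the removal to whitespace between two Chinese characters.

```javascript
const input = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// A string pattern is searched for literally: the text "/ /" never occurs, so nothing changes.
const literalAttempt = input.replace('/ /', '');

// A regex literal with the g flag removes every space, English ones included.
const allSpacesGone = input.replace(/ /g, '');

// Assumption: [\u4E00-\u9FFF] is a simplified stand-in for "Chinese character".
const han = '[\\u4E00-\\u9FFF]';
const betweenHan = new RegExp('(' + han + ')\\s+(?=' + han + ')', 'g');
const onlyHanSpacesGone = input.replace(betweenHan, '$1');

console.log(literalAttempt);
console.log(allSpacesGone);
console.log(onlyHanSpacesGone);
```

Only the last variant keeps the spaces around "10" and inside the English sentence.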
















javascript regex






asked 8 hours ago by Needa Hell, edited 3 hours ago by Boann
















6 Answers


















11














Using @Brett Zamir's solution for matching Chinese characters in a regex:

Javascript unicode string, chinese character but no punctuation








const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// One Chinese character: BMP ranges plus surrogate pairs for the supplementary planes
const han = '[\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]';

// A Chinese character, a space, an optional Chinese character, and another space
const regex = new RegExp('(' + han + ') (' + han + ')* ', 'g');

const ret = str.replace(regex, '$1$2');

console.log(ret);







Schematically, the pattern looks like this (note the trailing space):

([foo chinese chars]) ([foo chinese chars])* 





answered 8 hours ago by Grégory NEUT, edited 1 hour ago





















  • 1





    The output here doesn't match with the ideal output. Notice the space in front of the 10.

    – holydragon
    8 hours ago











  • You lose the space before the 10 in the middle of the Chinese text, but you still found the right way to select Chinese characters :p

    – jonatjano
    8 hours ago











  • I've edited my post to match the desired output.

    – Grégory NEUT
    8 hours ago






  • 1





    What about eg 請 的 10 多 個 a

    – bobble bubble
    7 hours ago






  • 1





    @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

    – Aaron
    1 hour ago
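The edge case raised in the comments can be reproduced with a short sketch. It assumes a simplified BMP-only range in place of the full character class, and the variable names are illustrative. Because the pattern in this answer consumes the second space, a space before a following non-Chinese token can be lost, whereas a lookahead variant checks the next character without consuming it.

```javascript
// Assumption: a simplified BMP-only range stands in for the full character class.
const hanRange = '[\\u4E00-\\u9FCC]';

// Shape used in this answer: both spaces are part of the match and are dropped.
const consuming = new RegExp('(' + hanRange + ') (' + hanRange + ')* ', 'g');

// Lookahead variant: the following character is checked but not consumed.
const lookahead = new RegExp('(' + hanRange + ')\\s+(?=' + hanRange + ')', 'g');

const sample = '請 的 10 多 個 a';
const consumingResult = sample.replace(consuming, '$1$2');
const lookaheadResult = sample.replace(lookahead, '$1');

console.log(consumingResult); // 請的10 多個a
console.log(lookaheadResult); // 請的 10 多個 a
```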



















7














Getting to the Chinese char matching pattern



Using the Unicode Tools, the \p{Han} Unicode property class that matches any Chinese char can be translated into

[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]

In ES6, to match a single Chinese char, it can be written as

/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u

Transpiling it to ES5 with the ES2015 Unicode regular expression transpiler gives

(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])

which is a pattern that matches any Chinese char with a JS RegExp.



So, you may use

s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant, you may use the shorter

s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')


Pattern details





  • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


  • \s+ - one or more whitespace chars (any Unicode whitespace)


  • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


JS demo:






var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
// Pattern source for one Chinese char; backslashes are doubled because this is a string literal
var HanChr = "[\\u2E80-\\u2E99\\u2E9B-\\u2EF3\\u2F00-\\u2FD5\\u3005\\u3007\\u3021-\\u3029\\u3038-\\u303B\\u3400-\\u4DB5\\u4E00-\\u9FEF\\uF900-\\uFA6D\\uFA70-\\uFAD9]|[\\uD840-\\uD868\\uD86A-\\uD86C\\uD86F-\\uD872\\uD874-\\uD879][\\uDC00-\\uDFFF]|\\uD869[\\uDC00-\\uDED6\\uDF00-\\uDFFF]|\\uD86D[\\uDC00-\\uDF34\\uDF40-\\uDFFF]|\\uD86E[\\uDC00-\\uDC1D\\uDC20-\\uDFFF]|\\uD873[\\uDC00-\\uDEA1\\uDEB0-\\uDFFF]|\\uD87A[\\uDC00-\\uDFE0]|\\uD87E[\\uDC00-\\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
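The two variants can also be combined behind one function. The following is only a sketch: the helper names and the BMP-only fallback range are assumptions made for illustration, not part of the answer. It tries the ES2018 property escape first and falls back to a plain range when the engine rejects the pattern.

```javascript
// Hypothetical helper; the name and the fallback range are assumptions for illustration.
function buildHanSpaceRegex() {
  try {
    // ES2018 Unicode property escape; engines without support throw a SyntaxError here.
    return new RegExp('(\\p{Script=Hani})\\s+(?=\\p{Script=Hani})', 'gu');
  } catch (e) {
    // Fallback covers only the main BMP blocks, not supplementary-plane ideographs.
    return /([\u3400-\u4DBF\u4E00-\u9FFF])\s+(?=[\u3400-\u4DBF\u4E00-\u9FFF])/g;
  }
}

// Removes whitespace only where both neighbours are Han characters.
function removeSpacesBetweenHan(text) {
  return text.replace(buildHanSpaceRegex(), '$1');
}

console.log(removeSpacesBetweenHan('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
```

Building the regex through the RegExp constructor keeps the unsupported syntax out of the source text, so older engines can still parse the file and reach the fallback.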


































  • FYI: if only one whitespace is expected between Chinese chars, remove + after \s.

    – Wiktor Stribiżew
    8 hours ago
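The effect of the quantifier can be seen in a small sketch. It assumes a simplified BMP-only range for brevity, and the names are illustrative. With \s+ a run of several whitespace characters between two Chinese characters is removed; with a single \s only an isolated whitespace character is removed and longer runs are left alone.

```javascript
// Assumption: simplified BMP-only range for brevity.
const hanClass = '[\\u4E00-\\u9FFF]';
const oneOrMore = new RegExp('(' + hanClass + ')\\s+(?=' + hanClass + ')', 'g');
const exactlyOne = new RegExp('(' + hanClass + ')\\s(?=' + hanClass + ')', 'g');

// Two spaces between the first pair, one space between the second pair.
const doubled = '請' + ' '.repeat(2) + '把 這';

const withPlus = doubled.replace(oneOrMore, '$1');
const withoutPlus = doubled.replace(exactlyOne, '$1');

console.log(JSON.stringify(withPlus));    // "請把這"
console.log(JSON.stringify(withoutPlus)); // the double space survives, the single one is removed
```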



















3














The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use the regex below. It matches one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that the whitespace is followed by a Chinese character:

([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)

Replace each match with $1.

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));









































    2














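    Try this

    str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

    I took the range \u4E00-\u9FCC from here; it contains about 20,000 characters (enough for daily usage).

    var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
    // Drop a space before a Chinese character; the second alternative consumes digits plus the following space so that space is kept
    str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

    console.log(str);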
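As a caveat to the "enough for daily usage" remark, here is a hedged sketch of what a BMP-only range misses. It reuses the \u4E00-\u9FCC range from this answer inside a lookahead-style pattern (an assumption for illustration, not this answer's exact regex) and compares it with the ES2018 property escape on two rare ideographs from CJK Extension B.

```javascript
// U+20000 and U+20001 are CJK Extension B ideographs, outside the Basic Multilingual Plane.
const rare = '\u{20000} \u{20001}';

// BMP-only range: the surrogate pairs are not in the class, so nothing matches.
const bmpOnly = /([\u4E00-\u9FCC])\s+(?=[\u4E00-\u9FCC])/g;

// ES2018 property escape with the u flag matches whole code points in the Han script.
const anyHan = /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu;

const bmpResult = rare.replace(bmpOnly, '$1');
const hanResult = rare.replace(anyHan, '$1');

console.log(bmpResult === rare); // true: the space is still there
console.log(hanResult.length);   // 4: two surrogate pairs and no space
```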





























    • 3





      The space in front of the 10 is missing.

      – holydragon
      8 hours ago











    • @holydragon it's fixed now

      – Kamil Kiełczewski
      8 hours ago





















    0

















    var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

    // Code point ranges treated as Chinese (CJK ideographs and compatibility blocks)
    var chineseRange = [[0x4e00, 0x9fff], [0x3400, 0x4dbf], [0x20000, 0x2a6df], [0x2a700, 0x2b73f], [0x2b740, 0x2b81f], [0x2b820, 0x2ceaf], [0xf900, 0xfaff], [0x3300, 0x33ff], [0xfe30, 0xfe4f], [0x2f800, 0x2fa1f]];

    // Returns true only if every code point in str falls inside one of the ranges
    var isChinese = function (str) {
      var charCode;
      var flag;
      var range;
      for (var i = 0; i < str.length;) {
        charCode = str.codePointAt(i);
        flag = false;
        for (var j = 0; j < chineseRange.length; j++) {
          range = chineseRange[j];
          if (charCode >= range[0] && charCode <= range[1]) {
            flag = true;
            break;
          }
        }
        if (!flag) {
          return false;
        }
        // Code points above 0xffff occupy two UTF-16 code units
        i += charCode <= 0xffff ? 1 : 2;
      }
      return true;
    };
    // For more information, see ischinese.js on GitHub
    // credit https://github.com/alsotang/is-chinese/blob/master/ischinese.js

    // I wrote this loop to remove the spaces between Chinese words
    var spl = chine.trim().split(/\s+/);
    var text = '';
    for (var i = 0; i < spl.length; i++) {
      var next = spl[i + 1];
      // Join two tokens directly only when both are Chinese; otherwise keep one space
      if (next !== undefined && isChinese(spl[i]) && isChinese(next)) {
        text += spl[i];
      } else {
        text += spl[i] + ' ';
      }
    }
    // trim() drops the space appended after the last token
    console.log(text.trim());









































      0














      This might be useful in your scenario. (?<![ -~]) (?![ -~])
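A minimal sketch of how this pattern might be applied, assuming an engine with lookbehind support (ES2018); the variable names are illustrative. The class [ -~] covers printable ASCII (U+0020 to U+007E), so the pattern matches a space whose neighbours on both sides are outside that range. That works for the sample text, but it is broader than "Chinese only", as the second call shows.

```javascript
// Requires lookbehind support (ES2018). [ -~] is the printable ASCII range U+0020 to U+007E.
const nonAsciiGap = /(?<![ -~]) (?![ -~])/g;

const text = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const merged = text.replace(nonAsciiGap, '');
console.log(merged);

// Caveat: any non-ASCII neighbours qualify, not just Chinese characters.
const accented = '\u00E9 \u00E0'.replace(nonAsciiGap, '');
console.log(accented);
```

Because a space is itself printable ASCII, runs of two or more spaces are left untouched by this pattern.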






      share|improve this answer

























        Your Answer






        StackExchange.ifUsing("editor", function () {
        StackExchange.using("externalEditor", function () {
        StackExchange.using("snippets", function () {
        StackExchange.snippets.init();
        });
        });
        }, "code-snippets");

        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "1"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });






        Needa Hell is a new contributor. Be nice, and check out our Code of Conduct.










        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54179179%2fremove-all-spaces-between-chinese-words-with-regex%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        6 Answers
        6






        active

        oldest

        votes








        6 Answers
        6






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        11














        Using @Brett Zamir soluce on how to match chinese character in regex



        Javascript unicode string, chinese character but no punctuation








        const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

        const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');

        const ret = str.replace(regex, '$1$2');

        console.log(ret);







        It looks like :



        ([foo chinese chars]) ([foo chinese chars])*





        share|improve this answer





















        • 1





          The output here doesn't match with the ideal output. Notice the space in front of the 10.

          – holydragon
          8 hours ago











        • you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

          – jonatjano
          8 hours ago











        • I've edited my post to match your desire

          – Grégory NEUT
          8 hours ago






        • 1





          What about eg 請 的 10 多 個 a

          – bobble bubble
          7 hours ago






        • 1





          @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

          – Aaron
          1 hour ago
















        11














        Using @Brett Zamir soluce on how to match chinese character in regex



        Javascript unicode string, chinese character but no punctuation








        const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

        const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');

        const ret = str.replace(regex, '$1$2');

        console.log(ret);







        It looks like :



        ([foo chinese chars]) ([foo chinese chars])*





        share|improve this answer





















        • 1





          The output here doesn't match with the ideal output. Notice the space in front of the 10.

          – holydragon
          8 hours ago











        • you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

          – jonatjano
          8 hours ago











        • I've edited my post to match your desire

          – Grégory NEUT
          8 hours ago






        • 1





          What about eg 請 的 10 多 個 a

          – bobble bubble
          7 hours ago






        • 1





          @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

          – Aaron
          1 hour ago














        11












        11








        11







        Using @Brett Zamir soluce on how to match chinese character in regex



        Javascript unicode string, chinese character but no punctuation








        const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

        const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');

        const ret = str.replace(regex, '$1$2');

        console.log(ret);







        It looks like :



        ([foo chinese chars]) ([foo chinese chars])*





        share|improve this answer















        Using @Brett Zamir soluce on how to match chinese character in regex



        Javascript unicode string, chinese character but no punctuation








        const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

        const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');

        const ret = str.replace(regex, '$1$2');

        console.log(ret);







        It looks like :



        ([foo chinese chars]) ([foo chinese chars])*





        const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

        const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');

        const ret = str.replace(regex, '$1$2');

        console.log(ret);





        const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

        const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');

        const ret = str.replace(regex, '$1$2');

        console.log(ret);






        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited 1 hour ago

























        answered 8 hours ago









        Grégory NEUTGrégory NEUT

        8,73621538




        8,73621538








        • 1





          The output here doesn't match with the ideal output. Notice the space in front of the 10.

          – holydragon
          8 hours ago











        • you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

          – jonatjano
          8 hours ago











        • I've edited my post to match your desire

          – Grégory NEUT
          8 hours ago






        • 1





          What about eg 請 的 10 多 個 a

          – bobble bubble
          7 hours ago






        • 1





          @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

          – Aaron
          1 hour ago














        • 1





          The output here doesn't match with the ideal output. Notice the space in front of the 10.

          – holydragon
          8 hours ago











        • you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

          – jonatjano
          8 hours ago











        • I've edited my post to match your desire

          – Grégory NEUT
          8 hours ago






        • 1





          What about eg 請 的 10 多 個 a

          – bobble bubble
          7 hours ago






        • 1





          @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

          – Aaron
          1 hour ago








        1




        1





        The output here doesn't match with the ideal output. Notice the space in front of the 10.

        – holydragon
        8 hours ago





        The output here doesn't match with the ideal output. Notice the space in front of the 10.

        – holydragon
        8 hours ago













        you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

        – jonatjano
        8 hours ago





        you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

        – jonatjano
        8 hours ago













        I've edited my post to match your desire

        – Grégory NEUT
        8 hours ago





        I've edited my post to match your desire

        – Grégory NEUT
        8 hours ago




        1




        1





        What about eg 請 的 10 多 個 a

        – bobble bubble
        7 hours ago





        What about eg 請 的 10 多 個 a

        – bobble bubble
        7 hours ago




        1




        1





        @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

        – Aaron
        1 hour ago





        @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

        – Aaron
        1 hour ago













        7














        Getting to the Chinese char matching pattern



        Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into



        [u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]


        In ES6, to match a single Chinese char, it can be used as



        /[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u


        Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get



        (?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])


        pattern to match any Chinese char using JS RegExp.



        So, you may use



        s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')


        See the regex demo.



        If your JS environment is ECMAScript 2018 compliant you may use a shorter



        s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')


        Pattern details





        • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


        • s+ - any 1+ whitespaces (any Unicode whitespace)


        • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


        JS demo:






        var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
        var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
        console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
        // ECMAScript 2018 only
        console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));








        share|improve this answer


























        • FYI: if only one whitespace is expected between Chinese chars, remove + after s.

          – Wiktor Stribiżew
          8 hours ago
















        7














        Getting to the Chinese char matching pattern



        Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into



        [u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]


        In ES6, to match a single Chinese char, it can be used as



        /[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u


        Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get



        (?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])


        pattern to match any Chinese char using JS RegExp.



        So, you may use



        s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')


        See the regex demo.



        If your JS environment is ECMAScript 2018 compliant, you may use the shorter



        s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')


        Pattern details





        • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


        • \s+ - one or more whitespace characters (any Unicode whitespace)


        • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.
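The pattern details above can be sketched as a small function. The wrapper name removeSpacesBetweenHan is an assumption made for this example, and the code relies on ES2018 Unicode property escapes being available.

```javascript
// Hypothetical wrapper around the ES2018 pattern.
function removeSpacesBetweenHan(text) {
  // Group 1 keeps the left Han char; the lookahead checks the right one without
  // consuming it, so that char can act as the left side of the next match.
  return text.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1');
}

console.log(removeSpacesBetweenHan('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
```

Using a lookahead instead of a second capturing group is what lets runs of single, space-separated characters collapse in one pass.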


        JS demo:






        var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
        var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
        // \\s is doubled because the pattern is built from a string literal
        console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
        // ECMAScript 2018 only
        console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));


































        • FYI: if only one whitespace is expected between Chinese chars, remove + after \s.

          – Wiktor Stribiżew
          8 hours ago
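To illustrate the comment above, here is a small comparison of the \s+ and \s variants on input that contains a double space. The sample string and variable names are made up for this sketch.

```javascript
const oneOrMore = /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu;
const exactlyOne = /(\p{Script=Hani})\s(?=\p{Script=Hani})/gu;

// Two spaces after the first character, one space after the second.
const sample = '請  把 這';

console.log(sample.replace(oneOrMore, '$1'));  // 請把這
console.log(sample.replace(exactlyOne, '$1')); // 請  把這
```

With a single \s, the double space survives because the lookahead sees a space rather than a Chinese char.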






















        edited 7 hours ago

























        answered 8 hours ago









        Wiktor Stribiżew
























        3














        The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use the regex below. It matches one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that the whitespace is followed by another Chinese character:



        ([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)


        Then replace each match with $1.



        Demo






        var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
        console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
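One thing worth checking with a character class made only of \uXXXX ranges is how it treats characters outside the Basic Multilingual Plane. The following sketch compares it with the ES2018 \p{Script=Hani} property; the test character U+20000 and the variable names are chosen for illustration.

```javascript
const bmpOnly = /[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]/;
const anyHan = /\p{Script=Hani}/u;

// U+20000 lies outside the BMP, so it is stored as a surrogate pair.
const astralHan = '\u{20000}';

console.log(bmpOnly.test('請'), anyHan.test('請'));             // true true
console.log(bmpOnly.test(astralHan), anyHan.test(astralHan)); // false true
```

For text limited to common characters the shorter class is likely sufficient; rare characters may need the property-based pattern.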














































            edited 8 hours ago

























            answered 8 hours ago









            Pushpesh Kumar Rajwanshi























                2














                Try this



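                str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');


                I took the range \u4E00-\u9FCC from here. It contains ~20000 chars (enough for daily usage). The second alternative, ([0-9]+ ), matches a number together with the space after it and puts it back unchanged, so that space is kept.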






                var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
                str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

                console.log(str);
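A caveat that may matter for mixed text: this pattern removes a space in front of any Chinese char, whatever precedes that space. The sketch below contrasts it with a pattern that requires a Chinese char on both sides; the sample sentence and variable names are invented for this comparison.

```javascript
const spaceBeforeHan = / ([\u4E00-\u9FCC])|([0-9]+ )/g;
const betweenHan = /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu;

const mixed = 'Hello 世界 and 你 好';

// An unmatched group is replaced by an empty string, so '$1$2' is safe here.
console.log(mixed.replace(spaceBeforeHan, '$1$2')); // Hello世界 and你好
console.log(mixed.replace(betweenHan, '$1'));       // Hello 世界 and 你好
```

Which behaviour is wanted depends on whether spaces between Latin words and Chinese text should survive.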





























                • 3





                  The space in front of the 10 is missing.

                  – holydragon
                  8 hours ago











                • @holydragon it's fixed now

                  – Kamil Kiełczewski
                  8 hours ago


























                edited 7 hours ago

























                answered 8 hours ago









                Kamil Kiełczewski





















                0

















                var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

                // Inclusive code point ranges that are treated as Chinese
                var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

                // Returns true only if every code point of str falls into one of the ranges
                var isChinese = function (str) {
                    var charCode;
                    var flag;
                    var range;
                    for (var i = 0; i < str.length;) {
                        charCode = str.codePointAt(i);
                        flag = false;
                        for (var j = 0; j < chineseRange.length; j++) {
                            range = chineseRange[j];
                            if (charCode >= range[0] && charCode <= range[1]) {
                                flag = true;
                                break;
                            }
                        }
                        if (!flag) {
                            return false;
                        }
                        // Code points above 0xffff take two UTF-16 code units
                        if (charCode <= 0xffff) {
                            i++;
                        } else {
                            i += 2;
                        }
                    }
                    return true;
                };
                // For more information about is-chinese, see the project on GitHub
                // credit https://github.com/alsotang/is-chinese/blob/master/ischinese.js

                // I wrote this loop to remove the spaces between Chinese words

                var spl = chine.trim().split(/\s+/);
                var text = '';
                for (var i = 0; i < spl.length; i++) {
                    // Join only when this token and the next one are both Chinese;
                    // the bounds check avoids calling isChinese(undefined) on the last token
                    if (isChinese(spl[i]) && i + 1 < spl.length && isChinese(spl[i + 1])) {
                        text += spl[i];
                    } else {
                        text += spl[i] + ' ';
                    }
                }
                console.log(text.trim());
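The manual i++ / i += 2 stepping in isChinese can be avoided, because iterating a string with for...of yields whole code points. A possible variant is sketched below; the names ranges and isChineseToken are hypothetical, and the range list is copied from the code above with the repeated entry dropped.

```javascript
// Inclusive code point ranges, copied from the answer's chineseRange.
const ranges = [
  [0x4e00, 0x9fff], [0x3400, 0x4dbf], [0x20000, 0x2a6df], [0x2a700, 0x2b73f],
  [0x2b740, 0x2b81f], [0x2b820, 0x2ceaf], [0xf900, 0xfaff], [0x3300, 0x33ff],
  [0xfe30, 0xfe4f], [0x2f800, 0x2fa1f]
];

// Hypothetical name: true if every code point of str lies in one of the ranges.
function isChineseToken(str) {
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (!ranges.some(([lo, hi]) => cp >= lo && cp <= hi)) {
      return false;
    }
  }
  return true;
}

console.log(isChineseToken('請把'), isChineseToken('10'), isChineseToken('\u{20000}'));
// true false true
```

Like the original, this returns true for an empty string, so callers may want to guard against empty tokens.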














































                    edited 7 hours ago

























                    answered 8 hours ago









                    Younes Zaidi























                        0














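                        This might be useful in your scenario: (?<![ -~]) (?![ -~]). It matches a space that is neither preceded nor followed by a printable ASCII character. Replace each match with an empty string.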
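A short sketch of how that pattern could be applied, assuming an engine with lookbehind support (ES2018). The variable names and the second sample string are made up for this example.

```javascript
// A space with no printable ASCII character (U+0020 to U+007E) on either side.
const spaceBetweenNonAscii = /(?<![ -~]) (?![ -~])/g;

const input = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
console.log(input.replace(spaceBetweenNonAscii, ''));
// 請把這裡的 10 多個字合併. Can you help me?

// The pattern is not specific to Chinese: any non-ASCII neighbours qualify.
console.log('é è'.replace(spaceBetweenNonAscii, '')); // éè
```

Since the class [ -~] includes the space itself, runs of two or more spaces between Chinese chars are left untouched by this pattern.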












































                            edited 6 hours ago









                            Sebastian Hofmann











                            answered 6 hours ago









                            Shantanu Patwardhan






















