Remove all spaces between Chinese words with regex

I would like to remove all spaces among Chinese text only.

My text: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"

Ideal output: "請把這裡的 10 多個字合併. Can you help me?"

var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

str = str.replace("/&nbsp;/", "");

I have studied a similar question for Python but it seems not to work in my situation so I brought my question here for some help.

asked 8 hours ago by Needa Hell (new contributor), edited 3 hours ago by Boann

Are your spaces actually &nbsp; entities, or did you just guess that?

– Justinas
8 hours ago

.replace(/ /g,'')

– Nitesh Virani
8 hours ago

Using the latest ECMAScript 2018 regex syntax you may use s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

– Wiktor Stribiżew
8 hours ago

Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.

– Wiktor Stribiżew
7 hours ago


javascript regex


6 Answers

Using @Brett Zamir's solution for matching Chinese characters in a regex, from the question "Javascript unicode string, chinese character but no punctuation":

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');

const ret = str.replace(regex, '$1$2');

console.log(ret);

In short, the pattern has this shape:

([foo chinese chars]) ([foo chinese chars])*

answered 8 hours ago by Grégory NEUT, edited 1 hour ago

The output here doesn't match the ideal output. Notice the space in front of the 10.

– holydragon
8 hours ago

You lose the space before the 10 in the middle of the Chinese text, but you still found the right way to select Chinese characters :p

– jonatjano
8 hours ago

I've edited my post to produce the output you wanted.

– Grégory NEUT
8 hours ago

What about e.g. 請的 10 多個 a

– bobble bubble
7 hours ago

@GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

– Aaron
1 hour ago

Getting to the Chinese char matching pattern

Using the Unicode Tools, the \p{Han} Unicode property class, which matches any Chinese char, can be translated into

[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u

Transpiling it to ES5 with the ES2015 Unicode regular expression transpiler gives the following pattern, which matches any Chinese char in a plain JS RegExp:

(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])

So, you may use

s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant, you may use the shorter

s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

\s+ - 1 or more whitespace chars (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.
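The three parts listed above can be seen working together in a minimal sketch. It assumes an engine that supports ECMAScript 2018 Unicode property escapes; the function name removeHanSpaces and the two extra sample strings are illustrative and not part of the answer.

```javascript
// Hypothetical helper name; assumes ES2018 \p{...} support with the u flag.
function removeHanSpaces(text) {
  // Group 1 keeps the Han char on the left, \s+ is dropped,
  // and the lookahead requires (without consuming) a Han char on the right.
  return text.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1');
}

// The sample string from the question.
console.log(removeHanSpaces('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// Spaces next to digits or Latin letters are left alone.
console.log(removeHanSpaces('的 10 多'));
// A run of several whitespace chars between two Han chars is removed as a whole.
console.log(removeHanSpaces('字 \t 合'));
```

Because the right-hand character is only looked at and not consumed, it can serve as the left-hand character of the next match, which is why every space in a run of single characters is removed.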

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]"; 

console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

// ECMAScript 2018 only

console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));

answered 8 hours ago by Wiktor Stribiżew, edited 7 hours ago

FYI: if only one whitespace is expected between Chinese chars, remove the + after \s.

– Wiktor Stribiżew
8 hours ago

The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use the regex below. It matches one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that a Chinese character follows:

([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)

Then replace each match with $1.

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));

answered 8 hours ago by Pushpesh Kumar Rajwanshi, edited 8 hours ago

Try this

str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

I took the range \u4E00-\u9FCC from here; it contains roughly 20,000 characters (enough for daily usage).
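To illustrate what "enough for daily usage" leaves out, here is a small sketch that tests single characters against the range \u4E00-\u9FCC. The helper name inBasicRange and the sample characters are my own choices: U+3400 (CJK Extension A) and U+20000 (CJK Extension B) are examples of Han characters that fall outside it.

```javascript
// Hypothetical helper: true if ch is exactly one UTF-16 code unit inside \u4E00-\u9FCC.
function inBasicRange(ch) {
  return /^[\u4E00-\u9FCC]$/.test(ch);
}

console.log(inBasicRange('請'));         // common ideograph from the question's sample
console.log(inBasicRange('\u3400'));     // Extension A, below U+4E00
console.log(inBasicRange('\u{20000}'));  // Extension B, stored as a surrogate pair
```

Text that uses rare or historical characters may therefore need one of the wider patterns shown in the other answers.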

var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');



console.log(str);

answered 8 hours ago by Kamil Kiełczewski, edited 7 hours ago

The space in front of the 10 is missing.

– holydragon
8 hours ago

@holydragon it's fixed now

– Kamil Kiełczewski
8 hours ago

var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

// Code point ranges that are treated as Chinese.
var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

// Returns true only if every code point in str falls inside one of the ranges above.
var isChinese = function (str) {
  var charCode;
  var flag;
  var range;
  for (var i = 0; i < str.length;) {
    charCode = str.codePointAt(i);
    flag = false;
    for (var j = 0; j < chineseRange.length; j++) {
      range = chineseRange[j];
      if (charCode >= range[0] && charCode <= range[1]) {
        flag = true;
        break;
      }
    }
    if (!flag) {
      return false;
    }
    // Code points above 0xffff occupy two UTF-16 code units.
    if (charCode <= 0xffff) {
      i++;
    } else {
      i += 2;
    }
  }
  return true;
};

// For more information about isChinese, see the source on GitHub.
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js

// I wrote this loop to remove the spaces between Chinese words.
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
  if (isChinese(spl[i])) {
    // Keep the space unless the next token exists and is also Chinese.
    if (i + 1 >= spl.length || !isChinese(spl[i + 1])) {
      text += spl[i] + ' ';
    } else {
      text += spl[i];
    }
  } else {
    text += spl[i] + ' ';
  }
}
console.log(text);

answered 8 hours ago by Younes Zaidi, edited 7 hours ago

This might be useful in your scenario. (?<![ -~]) (?![ -~])
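As a hedged illustration of how this pattern could be applied, the sketch below wraps it in a function. The name removeNonAsciiSpaces is hypothetical, and the code assumes an engine with lookbehind support (added to the language in ECMAScript 2018).

```javascript
// Hypothetical helper; needs an engine that supports lookbehind assertions.
function removeNonAsciiSpaces(text) {
  // [ -~] is the printable ASCII range (space through tilde).
  // A space is removed only if neither neighbour is printable ASCII.
  return text.replace(/(?<![ -~]) (?![ -~])/g, '');
}

// The sample string from the question.
console.log(removeNonAsciiSpaces('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// Caveat: any non-ASCII neighbours qualify, not only Chinese ones.
console.log(removeNonAsciiSpaces('é è'));
```

The pattern does not test for Chinese at all, so it is short but also affects other non-ASCII text such as accented Latin letters.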

answered 6 hours ago by Shantanu Patwardhan, edited 6 hours ago by Sebastian Hofmann


6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([foo chinese chars]) ([foo chinese chars])*

edited 1 hour ago

answered 8 hours ago

Grégory NEUT

8,73621538

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
8 hours ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
8 hours ago

I've edited my post to match your desire

– Grégory NEUT
8 hours ago

1

What about eg 請的 10 多個 a

– bobble bubble
7 hours ago

1

@GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

– Aaron
1 hour ago

|
show 3 more comments

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([foo chinese chars]) ([foo chinese chars])*

edited 1 hour ago

answered 8 hours ago

Grégory NEUT

8,73621538

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
8 hours ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
8 hours ago

I've edited my post to match your desire

– Grégory NEUT
8 hours ago

1

What about eg 請的 10 多個 a

– bobble bubble
7 hours ago

1

@GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

– Aaron
1 hour ago

|
show 3 more comments

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([foo chinese chars]) ([foo chinese chars])*

edited 1 hour ago

answered 8 hours ago

Grégory NEUT

8,73621538

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([foo chinese chars]) ([foo chinese chars])*

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

edited 1 hour ago

answered 8 hours ago

Grégory NEUT

8,73621538

edited 1 hour ago

answered 8 hours ago

Grégory NEUT

8,73621538

answered 8 hours ago

Grégory NEUT

8,73621538

answered 8 hours ago

Grégory NEUT

8,73621538

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
8 hours ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
8 hours ago

I've edited my post to match your desire

– Grégory NEUT
8 hours ago

1

What about eg 請的 10 多個 a

– bobble bubble
7 hours ago

1

@GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

– Aaron
1 hour ago

|
show 3 more comments

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
8 hours ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
8 hours ago

I've edited my post to match your desire

– Grégory NEUT
8 hours ago

1

What about eg 請的 10 多個 a

– bobble bubble
7 hours ago

1

@GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

– Aaron
1 hour ago

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
8 hours ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
8 hours ago

I've edited my post to match your desire

– Grégory NEUT
8 hours ago

What about eg 請的 10 多個 a

– bobble bubble
7 hours ago

@GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

– Aaron
1 hour ago

|
show 3 more comments

Getting to the Chinese char matching pattern

Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into

[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u

Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get

(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])

pattern to match any Chinese char using JS RegExp.

So, you may use

s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant you may use a shorter

s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

s+ - any 1+ whitespaces (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]"; 

console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

// ECMAScript 2018 only

console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));

edited 7 hours ago

answered 8 hours ago

Wiktor Stribiżew

310k16131206

FYI: if only one whitespace is expected between Chinese chars, remove + after s.

– Wiktor Stribiżew
8 hours ago

add a comment |

Getting to the Chinese char matching pattern

Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into

[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u

Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get

(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])

pattern to match any Chinese char using JS RegExp.

So, you may use

s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant you may use a shorter

s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

s+ - any 1+ whitespaces (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]"; 

console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

// ECMAScript 2018 only

console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));

edited 7 hours ago

answered 8 hours ago

Wiktor Stribiżew

310k16131206

FYI: if only one whitespace is expected between Chinese chars, remove + after s.

– Wiktor Stribiżew
8 hours ago

add a comment |

Getting to the Chinese char matching pattern

Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into

[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u

Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get

(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])

pattern to match any Chinese char using JS RegExp.

So, you may use

s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant you may use a shorter

s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

s+ - any 1+ whitespaces (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]"; 

console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

// ECMAScript 2018 only

console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));

edited 7 hours ago

answered 8 hours ago

Wiktor Stribiżew

310k16131206

FYI: if only one whitespace is expected between Chinese chars, remove the + after \s.

– Wiktor Stribiżew
8 hours ago
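A small sketch of what that comment implies, assuming an ES2018 engine. The variable names are illustrative only. Without the +, a run of two spaces between Chinese chars is left untouched, because the lookahead then sees a space instead of a Chinese char:

```javascript
// Illustrative only; requires ES2018 Unicode property escapes.
const single = /(\p{Script=Hani})\s(?=\p{Script=Hani})/gu;     // exactly one whitespace
const oneOrMore = /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu; // one or more whitespaces

const oneSpace = '字 合';
const twoSpaces = '字  合';

console.log(oneSpace.replace(single, '$1'));     // 字合
console.log(twoSpaces.replace(single, '$1'));    // 字  合 (unchanged)
console.log(twoSpaces.replace(oneOrMore, '$1')); // 字合
```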

([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)

Replace each match with $1.

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
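One limitation worth knowing: the ranges above only cover the Basic Multilingual Plane, so rarer characters that JavaScript stores as surrogate pairs are not matched. The sketch below compares the range-based pattern with a property-based one. The names bmpOnly and byScript are made up for this example, and the second pattern assumes an ES2018 engine:

```javascript
// Hypothetical comparison. `bmpOnly` copies the ranges from the answer above;
// `byScript` assumes ES2018 Unicode property escapes.
const bmpOnly = /([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g;
const byScript = /(\p{Script=Hani}+)\s+(?=\p{Script=Hani})/gu;

const common = '合 併';
// U+20000 and U+20001 (CJK Extension B), written as escapes to keep the source ASCII-safe.
const rare = '\u{20000} \u{20001}';

console.log(common.replace(bmpOnly, '$1') === '合併');               // true
console.log(rare.replace(bmpOnly, '$1') === rare);                   // true: the space is kept
console.log(rare.replace(byScript, '$1') === '\u{20000}\u{20001}');  // true: the space is removed
```

For everyday text the range-based pattern is usually enough; the property-based one is simply less likely to miss a character.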

edited 8 hours ago

answered 8 hours ago

Pushpesh Kumar Rajwanshi

5,7422827

Try this

str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

I took the \u4E00-\u9FCC code range from here; it covers roughly 20,000 characters (enough for everyday use).

var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

console.log(str);
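To make the two alternatives in that regex easier to follow, here is a small tracing sketch. The function name traceBranches is hypothetical; the regex is the one from the answer. It records which branch produced each match:

```javascript
// Illustrative tracer (hypothetical name); the regex is the one from the answer above.
function traceBranches(input) {
  const hits = [];
  const output = input.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, (match, han, digits) => {
    // Exactly one of the two groups participates in each match.
    hits.push(han !== undefined ? 'han:' + han : 'digits:' + digits);
    // Same effect as the '$1$2' replacement string: an unmatched group contributes nothing.
    return (han || '') + (digits || '');
  });
  return { output, hits };
}

const traced = traceBranches('的 10 多 個');
console.log(traced.output); // 的 10 多個
console.log(traced.hits);   // [ 'digits:10 ', 'han:個' ]
```

The digits branch consumes the space after the number and puts it back, so the first branch never gets the chance to delete it. The space before the number is never matched at all, because the character after it is not in the Chinese range.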

edited 7 hours ago

answered 8 hours ago

Kamil Kiełczewski

9,27685892

3

The space in front of the 10 is missing.

– holydragon
8 hours ago

@holydragon it's fixed now

– Kamil Kiełczewski
8 hours ago

var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';



var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];



var isChinese = function (str) {

  var charCode;

  var flag;

  var range;

  for (var i = 0; i < str.length;) {

    charCode = str.codePointAt(i);

    flag = false;

    for (var j = 0; j < chineseRange.length; j++) {

      range = chineseRange[j];

      if (charCode >= range[0] && charCode <= range[1]) {

        flag = true;

        break;

      }

    }

    if (!flag) {

      return false;

    }

    if (charCode <= 0xffff) {

      i++

    } else {

      i += 2

    }

  }

  return true;

}

// For more information about this check, visit the project on GitHub.

// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js

// I wrote the code below to remove the spaces between Chinese words.



var spl = chine.trim().split(/\s+/);

var text = '';

for (var i = 0; i < spl.length; i++) {

  if (isChinese(spl[i])) {

    if (i + 1 >= spl.length || !isChinese(spl[i + 1])) { // guard: the last token has no successor

      text += spl[i] + ' ';

    } else {

      text += spl[i];

    }

  } else {

    text += spl[i] + ' ';

  }

}

console.log(text);
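The same token-based idea can be sketched more compactly. This is only an illustration under stated assumptions: the names ranges, isChineseToken and joinChineseTokens are made up here, and the range list is an abbreviated subset of the one above. A space is emitted between two neighbouring tokens unless both are entirely Chinese:

```javascript
// Hypothetical compact variant; `ranges` is an abbreviated subset of chineseRange above.
const ranges = [[0x4e00, 0x9fff], [0x3400, 0x4dbf], [0x20000, 0x2a6df], [0xf900, 0xfaff]];

// True when the token is non-empty and every code point falls inside one of the ranges.
function isChineseToken(token) {
  if (token.length === 0) return false;
  for (const ch of token) { // for...of iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (!ranges.some(([lo, hi]) => cp >= lo && cp <= hi)) return false;
  }
  return true;
}

// Split on whitespace, then re-join with a space except between two Chinese tokens.
function joinChineseTokens(input) {
  const tokens = input.trim().split(/\s+/);
  let out = '';
  tokens.forEach((tok, i) => {
    out += tok;
    const next = tokens[i + 1];
    if (next !== undefined && !(isChineseToken(tok) && isChineseToken(next))) {
      out += ' ';
    }
  });
  return out;
}

console.log(joinChineseTokens('請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?'));
// 請把這裡的 10 多個字合併 . Can you help me?
```

Checking for a next token before adding the separator also avoids a trailing space at the end of the result.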

edited 7 hours ago

answered 8 hours ago

Younes Zaidi

4771415

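This might be useful in your scenario: (?<![ -~]) (?![ -~]). It matches a single space that is neither preceded nor followed by a printable ASCII character, so replacing every match with an empty string removes the spaces between Chinese characters. Note that lookbehind requires an ECMAScript 2018 engine.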
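A minimal sketch of how that pattern could be applied. The variable name nonAsciiGap is made up for this example, and an engine with lookbehind support is assumed:

```javascript
// Illustrative only; lookbehind (?<!...) requires an ES2018 engine.
// [ -~] is the printable ASCII range, from space (U+0020) to tilde (U+007E).
const nonAsciiGap = /(?<![ -~]) (?![ -~])/g;

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
console.log(str.replace(nonAsciiGap, '')); // 請把這裡的 10 多個字合併. Can you help me?

// Caveat: the test is "not printable ASCII", not "Chinese", so other scripts are affected too.
const accented = '\u00e9 \u00e8'; // "é è" written with escapes
console.log(accented.replace(nonAsciiGap, '') === '\u00e9\u00e8'); // true
```

This keeps the regex short, at the cost of treating every non-ASCII character as if it were Chinese.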

edited 6 hours ago

Sebastian Hofmann

1,3214818

answered 6 hours ago

Shantanu Patwardhan
