Remove all spaces between Chinese words with regex
I would like to remove all spaces among Chinese text only.
My text: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"
Ideal output: "請把這裡的 10 多個字合併. Can you help me?"
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace("/ /", "");
I have studied a similar question for Python but it seems not to work in my situation so I brought my question here for some help.
javascript regex
Does your spaces actually are
or you just used it guessing?
– Justinas
8 hours ago
.replace(/ /g,'')
– Nitesh Virani
8 hours ago
Using the latest ECMAScript 2018 regex syntax you may use
s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')
– Wiktor Stribiżew
8 hours ago
Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.
– Wiktor Stribiżew
7 hours ago
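For reference, the attempt in the question and the plain-space regex suggested in the comments behave differently, and neither produces the ideal output. A minimal sketch (the variable names are illustrative only):

```javascript
const text = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// A string pattern is matched literally: this searches for the three
// characters slash, space, slash, which the text does not contain.
const unchanged = text.replace("/ /", "");

// A regex literal with the g flag matches every space, so the spaces
// around "10" and inside the English sentence are removed as well.
const allRemoved = text.replace(/ /g, '');

console.log(unchanged === text); // true
console.log(allRemoved);         // 請把這裡的10多個字合併.Canyouhelpme?
```

This is why the answers restrict the match to spaces that sit between two Chinese characters.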
asked 8 hours ago by Needa Hell
edited 3 hours ago by Boann
6 Answers
Using @Brett Zamir's solution for matching Chinese characters in a regex, from
Javascript unicode string, chinese character but no punctuation
const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');
const ret = str.replace(regex, '$1$2');
console.log(ret);
The pattern has this shape:
([foo chinese chars]) ([foo chinese chars])*
The output here doesn't match the ideal output. Notice the space in front of the 10.
– holydragon
8 hours ago
You lose the space before the 10 in the middle of the Chinese words, but you still found the right way to select Chinese characters :p
– jonatjano
8 hours ago
I've edited my post to match your desired output.
– Grégory NEUT
8 hours ago
What about e.g.
請 的 10 多 個 a
– bobble bubble
7 hours ago
@GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)
– Aaron
1 hour ago
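The input raised in bobble bubble's comment can be checked directly. The sketch below assumes a simplified Han range, \u4E00-\u9FFF, purely for brevity (it is not the full pattern from the answer), and compares the answer's pair-shaped pattern with the lookahead shape suggested in the question comments:

```javascript
// Simplified Han range, for illustration only; the answer's pattern covers more blocks.
const HAN = '[\\u4E00-\\u9FFF]';

// Same shape as the answer: (han) space (han)* space
const pairPattern = new RegExp('(' + HAN + ') (' + HAN + ')* ', 'g');

// Lookahead shape: the following character is checked but not consumed.
const lookaheadPattern = new RegExp('(' + HAN + ')\\s+(?=' + HAN + ')', 'g');

const edgeCase = '請 的 10 多 個 a';
const pairResult = edgeCase.replace(pairPattern, '$1$2');
const lookaheadResult = edgeCase.replace(lookaheadPattern, '$1');

console.log(pairResult);      // 請的10 多個a
console.log(lookaheadResult); // 請的 10 多個 a
```

The pair-shaped pattern consumes the space that follows the second Chinese character, so it also removes a space that sits in front of a digit or a Latin letter. The lookahead leaves that space untouched.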
Getting to the Chinese char matching pattern
Using the Unicode Tools, the \p{Han} Unicode property class, which matches any Chinese char, can be translated into
[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u
Transpiling it to ES5 with the ES2015 Unicode regular expression transpiler gives the following pattern, which matches any Chinese char using JS RegExp:
(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])
So, you may use
s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant, you may use the shorter
s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char
\s+ - 1 or more whitespace characters (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
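// Inside a string literal the backslash must be doubled so that the regex engine receives \s+.
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));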
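The ECMAScript 2018 form can be wrapped in a small helper to try the cases mentioned in the pattern details and comments. This is a sketch only: removeSpacesBetweenHan is a hypothetical name, and it assumes an engine that supports Unicode property escapes.

```javascript
// Hypothetical helper; requires ES2018 Unicode property escapes (the u flag).
function removeSpacesBetweenHan(text) {
  // Group 1 keeps the Han character; the lookahead checks the next one without consuming it.
  return text.replace(/(\p{Script=Han})\s+(?=\p{Script=Han})/gu, '$1');
}

const sampleResult = removeSpacesBetweenHan('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?');
console.log(sampleResult); // 請把這裡的 10 多個字合併. Can you help me?

// \s also matches the ideographic space U+3000.
console.log(removeSpacesBetweenHan('請\u3000把')); // 請把

// Whitespace between a Han character and digits is left alone, even when doubled.
console.log(removeSpacesBetweenHan('的  10') === '的  10'); // true
```

Because the second Chinese character is only looked at, not consumed, it can serve as group 1 of the next match, so a run of any length collapses in one pass.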
FYI: if only one whitespace is expected between Chinese chars, remove + after \s.
– Wiktor Stribiżew
8 hours ago
The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC],
so you can use the regex below. It captures one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that a Chinese character comes next:
([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)
Then replace each match with $1.
Demo
var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
Try this
str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
I took the range \u4E00-\u9FCC from here; it contains ~20000 chars (enough for daily usage).
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
console.log(str);
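To see why the second alternative protects the spaces around the number, the same pattern can be run with a replacement callback that records which branch matched. This is a sketch; the names input, branches and output are illustrative and not part of the answer.

```javascript
const input = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const pattern = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

const branches = [];
const output = input.replace(pattern, (match, han, digits) => {
  // Exactly one of the two groups participates in each match.
  branches.push(han !== undefined ? 'han' : 'digits');
  return (han || '') + (digits || '');
});

console.log(output);             // 請把這裡的 10 多個字合併. Can you help me?
console.log(branches.join(',')); // han,han,han,han,digits,han,han,han,han

// The first branch only checks the character after the space, so a space
// between Latin text and a Chinese character is removed as well.
console.log('abc 請'.replace(pattern, '$1$2')); // abc請
```

The digits branch matches "10 " and puts it back unchanged. Because that match consumes the trailing space, the first branch can no longer remove it in front of the next Chinese character.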
The space in front of the 10 is missing.
– holydragon
8 hours ago
@holydragon it's fixed now
– Kamil Kiełczewski
8 hours ago
var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';
var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];
// Returns true only when every code point of str lies in one of the ranges above.
var isChinese = function (str) {
    var charCode;
    var flag;
    var range;
    for (var i = 0; i < str.length;) {
        charCode = str.codePointAt(i);
        flag = false;
        for (var j = 0; j < chineseRange.length; j++) {
            range = chineseRange[j];
            if (charCode >= range[0] && charCode <= range[1]) {
                flag = true;
                break;
            }
        }
        if (!flag) {
            return false;
        }
        // Code points above 0xffff occupy two UTF-16 code units.
        if (charCode <= 0xffff) {
            i++;
        } else {
            i += 2;
        }
    }
    return true;
};
// For more information about ischinese.js, visit the project on GitHub.
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js
// I wrote this loop to remove the spaces between Chinese words.
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
    // Drop the separating space only when this token and the next one are both Chinese.
    // The bounds check avoids calling isChinese(undefined) on the last token.
    if (isChinese(spl[i]) && i + 1 < spl.length && isChinese(spl[i + 1])) {
        text += spl[i];
    } else {
        text += spl[i] + ' ';
    }
}
console.log(text);
This might be useful in your scenario. (?<![ -~]) (?![ -~])
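A short sketch of how this pattern could be applied, assuming an engine with lookbehind support (ES2018). The class [ -~] covers printable ASCII, so the pattern matches a space that has no printable ASCII character on either side. The variable names are illustrative.

```javascript
const sample = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// A space that is neither preceded nor followed by a printable ASCII character.
const notNextToAscii = /(?<![ -~]) (?![ -~])/g;

const merged = sample.replace(notNextToAscii, '');
console.log(merged); // 請把這裡的 10 多個字合併. Can you help me?

// The test is "not printable ASCII" rather than "Chinese",
// so text in other non-ASCII scripts is affected too.
console.log('é è'.replace(notNextToAscii, '')); // éè
```

Spaces next to digits, Latin letters or ASCII punctuation are kept, which matches the ideal output for the sample text. Whether the wider reach over other scripts is acceptable depends on the input.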
Answer using @Brett Zamir's pattern: answered 8 hours ago by Grégory NEUT, edited 1 hour ago
1
The output here doesn't match with the ideal output. Notice the space in front of the 10.
– holydragon
8 hours ago
you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p
– jonatjano
8 hours ago
I've edited my post to match your desire
– Grégory NEUT
8 hours ago
1
What about eg請 的 10 多 個 a
– bobble bubble
7 hours ago
1
@GrégoryNEUTblabla
isn't a common metasyntactic variable in English, you might want to usefoo
instead ;)
– Aaron
1 hour ago
|
show 3 more comments
1
The output here doesn't match with the ideal output. Notice the space in front of the 10.
– holydragon
8 hours ago
you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p
– jonatjano
8 hours ago
I've edited my post to match your desire
– Grégory NEUT
8 hours ago
1
What about eg請 的 10 多 個 a
– bobble bubble
7 hours ago
1
@GrégoryNEUTblabla
isn't a common metasyntactic variable in English, you might want to usefoo
instead ;)
– Aaron
1 hour ago
1
1
The output here doesn't match with the ideal output. Notice the space in front of the 10.
– holydragon
8 hours ago
The output here doesn't match with the ideal output. Notice the space in front of the 10.
– holydragon
8 hours ago
you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p
– jonatjano
8 hours ago
you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p
– jonatjano
8 hours ago
I've edited my post to match your desire
– Grégory NEUT
8 hours ago
I've edited my post to match your desire
– Grégory NEUT
8 hours ago
1
1
What about eg
請 的 10 多 個 a
– bobble bubble
7 hours ago
What about eg
請 的 10 多 個 a
– bobble bubble
7 hours ago
1
1
@GrégoryNEUT
blabla
isn't a common metasyntactic variable in English, you might want to use foo
instead ;)– Aaron
1 hour ago
@GrégoryNEUT
blabla
isn't a common metasyntactic variable in English, you might want to use foo
instead ;)– Aaron
1 hour ago
|
show 3 more comments
Getting to the Chinese char matching pattern
Using the Unicode Tools, the p{Han}
Unicode property class that matches any Chinese char can be translated into
[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u
Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get
(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])
pattern to match any Chinese char using JS RegExp
.
So, you may use
s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant you may use a shorter
s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN)
- Capturing group 1 ($1
in the replacement pattern): any Chinese char
s+
- any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN)
- there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));
FYI: if only one whitespace is expected between Chinese chars, remove+
afters
.
– Wiktor Stribiżew
8 hours ago
add a comment |
Getting to the Chinese char matching pattern
Using the Unicode Tools, the p{Han}
Unicode property class that matches any Chinese char can be translated into
[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u
Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get
(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])
pattern to match any Chinese char using JS RegExp
.
So, you may use
s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant you may use a shorter
s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN)
- Capturing group 1 ($1
in the replacement pattern): any Chinese char
s+
- any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN)
- there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));
FYI: if only one whitespace is expected between Chinese chars, remove+
afters
.
– Wiktor Stribiżew
8 hours ago
add a comment |
Getting to the Chinese char matching pattern
Using the Unicode Tools, the p{Han}
Unicode property class that matches any Chinese char can be translated into
[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u
Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get
(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])
pattern to match any Chinese char using JS RegExp
.
So, you may use
s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant you may use a shorter
s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN)
- Capturing group 1 ($1
in the replacement pattern): any Chinese char
s+
- any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN)
- there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));
Getting to the Chinese char matching pattern
Using the Unicode Tools, the p{Han}
Unicode property class that matches any Chinese char can be translated into
[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u
Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get
(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])
pattern to match any Chinese char using JS RegExp
.
So, you may use
s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant you may use a shorter
s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN)
- Capturing group 1 ($1
in the replacement pattern): any Chinese char
s+
- any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN)
- there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
// The backslash in \s must be doubled inside a string literal passed to RegExp
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
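If the same code has to run on both old and new engines, one option is to pick the pattern at run time. The following is only a sketch: the names buildHanSpaceRegex and removeSpacesBetweenHan are made up for this illustration, and the fallback range (CJK Unified Ideographs plus Extension A) is an assumption that is deliberately narrower than the transpiled pattern above.

```javascript
// Hypothetical helper: returns the property-escape regex when the engine
// supports it, otherwise a BMP-only fallback (an assumed, narrower range).
function buildHanSpaceRegex() {
  try {
    // Throws a SyntaxError in engines without the u flag or \p{...} support.
    return new RegExp('(\\p{Script=Hani})\\s+(?=\\p{Script=Hani})', 'gu');
  } catch (e) {
    var han = '[\\u3400-\\u4DBF\\u4E00-\\u9FFF]';
    return new RegExp('(' + han + ')\\s+(?=' + han + ')', 'g');
  }
}

// Hypothetical wrapper: removes whitespace only when it sits between two Han chars.
function removeSpacesBetweenHan(text) {
  return text.replace(buildHanSpaceRegex(), '$1');
}

console.log(removeSpacesBetweenHan('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
```

The pattern is built from a string so that a failed compile can be caught. A regex literal using \p{...} would make the whole script fail to parse on engines that lack support.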
edited 7 hours ago
answered 8 hours ago
Wiktor Stribiżew
FYI: if only one whitespace is expected between Chinese chars, remove + after \s.
– Wiktor Stribiżew
8 hours ago
The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use the regex below. It captures one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that the whitespace is followed by a Chinese character:
([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)
Replace each match with $1.
Demo
var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
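One caveat with a range made only of \uXXXX escapes is that it covers the Basic Multilingual Plane only. The sketch below compares it with the ES2018 \p{Script=Hani} form on two ideographs from CJK Extension B (U+20000 and U+20001). The variable names are made up for this illustration, and it assumes an engine with ES2018 property escapes.

```javascript
// BMP-only range from the answer above.
var bmpOnly = /([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g;
// Script-aware alternative (assumes ES2018 support).
var scriptAware = /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu;

// U+20000 and U+20001 written as surrogate pairs, separated by one space.
var rare = '\uD840\uDC00 \uD840\uDC01';

var bmpResult = rare.replace(bmpOnly, '$1');
var scriptResult = rare.replace(scriptAware, '$1');

console.log(bmpResult === rare);   // true: the space is left in place
console.log(scriptResult.length);  // 4: two surrogate pairs, space removed
```

For everyday text the BMP range is usually sufficient. The difference only shows up with rare characters outside it.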
edited 8 hours ago
answered 8 hours ago
Pushpesh Kumar Rajwanshi
Try this
str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
I got the code range \u4E00-\u9FCC from here; it contains about 20,000 chars (enough for daily usage). The second alternative, ([0-9]+ ), consumes a number together with the space after it, so that space is kept.
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
console.log(str);
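A small check of how this pattern behaves may help when deciding whether it fits. The sketch below wraps it in a function (the name joinHan is made up for this illustration) and runs it on the question's sentence and on a mixed Latin/Chinese string. Whether the second result is acceptable depends on the use case.

```javascript
// Pattern from the answer above: drop a space that precedes a Chinese char,
// but keep "digits + space" intact via the second alternative.
var pattern = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

// Hypothetical wrapper name; an unmatched group expands to an empty string.
function joinHan(text) {
  return text.replace(pattern, '$1$2');
}

console.log(joinHan('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?

console.log(joinHan('hello 你好'));
// hello你好
```

Because the pattern does not look at what comes before the space, a space between a Latin word and a Chinese character is removed as well.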
The space in front of the 10 is missing.
– holydragon
8 hours ago
@holydragon it's fixed now
– Kamil Kiełczewski
8 hours ago
edited 7 hours ago
answered 8 hours ago
Kamil Kiełczewski
var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';
// Code point ranges treated as Chinese
var chineseRange = [[0x4e00, 0x9fff], [0x3400, 0x4dbf], [0x20000, 0x2a6df], [0x2a700, 0x2b73f], [0x2b740, 0x2b81f], [0x2b820, 0x2ceaf], [0xf900, 0xfaff], [0x3300, 0x33ff], [0xfe30, 0xfe4f], [0x2f800, 0x2fa1f]];
// Returns true only if every code point of str falls inside one of the ranges
var isChinese = function (str) {
  var charCode;
  var flag;
  var range;
  for (var i = 0; i < str.length;) {
    charCode = str.codePointAt(i);
    flag = false;
    for (var j = 0; j < chineseRange.length; j++) {
      range = chineseRange[j];
      if (charCode >= range[0] && charCode <= range[1]) {
        flag = true;
        break;
      }
    }
    if (!flag) {
      return false;
    }
    // Code points above 0xffff occupy two UTF-16 code units
    if (charCode <= 0xffff) {
      i++;
    } else {
      i += 2;
    }
  }
  return true;
};
// For more information about is-chinese, see the project on GitHub
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js
// I wrote this loop to remove the spaces between Chinese words
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
  // Drop the separating space only when this token and the next one are both Chinese;
  // the bounds check avoids calling isChinese(undefined) on the last token
  if (isChinese(spl[i]) && i + 1 < spl.length && isChinese(spl[i + 1])) {
    text += spl[i];
  } else {
    text += spl[i] + ' ';
  }
}
console.log(text);
edited 7 hours ago
answered 8 hours ago
Younes Zaidi
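This might be useful in your scenario: (?<![ -~]) (?![ -~])
It matches a space that is neither preceded nor followed by a printable ASCII character (the range from space to ~).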
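As a rough sketch of how that pattern could be applied in JavaScript: the function name stripInnerSpaces is made up for this illustration, and lookbehind requires an ES2018-capable engine.

```javascript
// A space with no printable ASCII character on either side.
// Lookbehind (?<!...) is an ES2018 feature.
var betweenNonAscii = /(?<![ -~]) (?![ -~])/g;

// Hypothetical wrapper name.
function stripInnerSpaces(text) {
  return text.replace(betweenNonAscii, '');
}

console.log(stripInnerSpaces('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?

// Not specific to Chinese: any two non-ASCII neighbours are joined.
console.log(stripInnerSpaces('\u00e9 \u00e8'));
// éè
```

Two things to keep in mind: the pattern is not limited to Chinese, and because the class [ -~] includes the space itself, a run of two or more spaces between Chinese characters is left untouched.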
edited 6 hours ago by Sebastian Hofmann
answered 6 hours ago
Shantanu Patwardhan