extract characters between two commas?


I have a file with ~3 million rows. Here are the first few lines of my file:

head out.txt

    NA
    NA
    NA
    NA
    NA
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753,gene85754
    gene85752,gene85753,gene85754
    gene85752,gene85753,gene85754
    gene85752,gene85753,gene85754
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752,gene85753
    gene85752
    gene85752

For rows that contain commas, I want to keep only the text after the first comma and before the second comma (or up to the end of the line if there is no second comma). Rows without a comma should stay as they are.
This is my desired output:

outgood.txt

NA

NA

NA

NA

NA

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85752

gene85752

asked 2 days ago

Anna1364

text-processing awk
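To make the rule concrete, here is a minimal sketch using only POSIX shell parameter expansion. The function name second_field is hypothetical, and a per-line shell loop built on it would be far too slow for 3 million rows; it only illustrates what the commands in the answers are expected to produce for each kind of line.

```shell
#!/bin/sh
# second_field: hypothetical helper illustrating the rule from the question.
second_field() {
    # Remove the shortest prefix ending in a comma: "a,b,c" -> "b,c".
    # A line with no comma (e.g. "NA") does not match and is left unchanged.
    rest=${1#*,}
    # Remove the longest suffix starting with a comma: "b,c" -> "b".
    printf '%s\n' "${rest%%,*}"
}

second_field 'NA'                              # NA
second_field 'gene85752,gene85753'             # gene85753
second_field 'gene85752,gene85753,gene85754'   # gene85753
second_field 'gene85752'                       # gene85752
```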


4 Answers

Since cut prints lines that contain no delimiter unchanged by default, the following works:

cut -f2 -d, file

answered 2 days ago

iruvar


It's nice when someone remembers the little quirks of standard tools.

– Kusalananda♦
2 days ago
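The quirk mentioned above can be seen by comparing cut with and without -s, which suppresses lines that contain no delimiter character. A small sketch on two of the question's lines:

```shell
#!/bin/sh
# Two lines from the question's input: one without a comma, one with two.
sample='NA
gene85752,gene85753,gene85754'

# Default behaviour: a line with no delimiter is printed unchanged.
printf '%s\n' "$sample" | cut -d, -f2
# NA
# gene85753

# With -s, lines that contain no delimiter are dropped instead.
printf '%s\n' "$sample" | cut -s -d, -f2
# gene85753
```

So for this question the default (no -s) is exactly what keeps the NA and single-gene rows.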


awk -F, 'NF > 1 { $1 = $2 } { print $1 }' file

This uses awk to parse the file as lines consisting of comma-delimited fields.

The code detects when there is more than a single field on a line, and when there is, the first field is replaced by the second field. The first field, either unmodified or modified by the conditional code, is then printed.

answered 2 days ago

Kusalananda♦


With a big file, this would probably be faster, since it doesn't have to rebuild $0: awk -F, '{print(NF>1 ? $2 : $1)}'

– glenn jackman
2 days ago

@glennjackman Well, the cut solution would be even faster in any case.

– Kusalananda♦
2 days ago
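A quick sketch to check that the answer's awk program and the variant suggested in the comment agree on a few of the question's lines. Whether the variant is actually faster on a 3-million-row file is not measured here; the idea is only that it avoids assigning to $1, which makes awk rebuild $0.

```shell
#!/bin/sh
# A few representative lines from the question's input.
sample='NA
gene85752,gene85753
gene85752,gene85753,gene85754
gene85752'

# Answer's version: assigning to $1 forces awk to rebuild $0.
printf '%s\n' "$sample" | awk -F, 'NF > 1 { $1 = $2 } { print $1 }'

# Comment's version: pick the field without modifying the record.
printf '%s\n' "$sample" | awk -F, '{ print (NF > 1 ? $2 : $1) }'

# Both print: NA, gene85753, gene85753, gene85752 (one per line).
```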


awk -F, 'NF == 1 { print $1 }
         NF > 1  { print $2 }' filename

This prints the first field if the line contains no comma, and the second field if it contains one or more commas.

answered 2 days ago

unxnut
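One behavioural difference worth knowing, assuming the real file might contain empty lines (the sample in the question has none): on an empty line NF is 0, so neither rule in this program fires and the line disappears from the output, whereas cut passes it through. A sketch:

```shell
#!/bin/sh
# Hypothetical input with an empty second line.
sample='NA

gene85752,gene85753'

# The answer's program: NF is 0 on the empty line, so nothing is printed for it.
printf '%s\n' "$sample" | awk -F, 'NF == 1 { print $1 }
NF > 1  { print $2 }'
# NA
# gene85753

# For comparison, cut prints the empty line unchanged (it has no delimiter).
printf '%s\n' "$sample" | cut -d, -f2
# NA
#
# gene85753
```

If the output must stay line-aligned with the input, the single-rule awk forms above keep empty lines as empty output lines.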


You can do this with Perl as follows.

Command-line:

$ perl -F, -pale '$_ = $F[1] // $_' out.txt

Explanation:

-p reads the input line by line and prints the current record before reading the next one (or reaching EOF).

-l strips the trailing newline from each input record and appends "\n" to each output record (so IRS = ORS = "\n").

-F, sets the field separator to a comma.

-a splits each record $_ on the field separator (here a comma) and stores the resulting fields in the zero-indexed array @F.

-e introduces the Perl code that is applied to each record.

The expression $_ = $F[1] // $_ reads as follows: if the second field $F[1] is defined, use it; otherwise use the current record $_. The result is then assigned back to $_.

Because of the -p switch, the current record is printed to stdout before the next record is read.

Result:

NA

NA

NA

NA

NA

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85753

gene85752

gene85752

You can also do it with GNU sed (the \n in the replacement text is a GNU extension):

$ sed -ne '
    s/,/\n/
    s/.*\n//
    s/,/\n/
    P
' out.txt

answered 2 days ago

Rakesh Sharma
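If GNU sed is not available, a sketch of a portable alternative (not from the original answer) uses two plain substitutions and needs no \n in the replacement:

```shell
#!/bin/sh
# A few representative lines from the question's input.
sample='NA
gene85752,gene85753
gene85752,gene85753,gene85754
gene85752'

# 1st command: delete everything up to and including the first comma.
#              Lines without a comma do not match and are left alone.
# 2nd command: delete the next comma and everything after it, if present.
printf '%s\n' "$sample" | sed 's/^[^,]*,//; s/,.*//'
# NA
# gene85753
# gene85753
# gene85752
```

On the real file this would be run as sed 's/^[^,]*,//; s/,.*//' out.txt.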
