How to delete random lines from a file?
I have a file which contains 10000 lines and I want to delete 5 randomly determined lines from it. How can I do that?
command-line text-processing
asked 2 days ago by Pravin Gaddam, edited 9 hours ago by dessert
2
related: Python: Choose random line from file, then delete that line
– jfs
2 days ago
To close-voters: text-processing questions are perfectly on topic, this is part of using and administering an Ubuntu system as defined on askubuntu.com/help/on-topic.
– dessert
9 hours ago
@jfs You’re welcome to add a python answer here as well, even if you just copy it that’s very helpful! Or do you want me to add another copypasta answer? ;)
– dessert
3 hours ago
5 Answers
You can probably solve it more efficiently than with a for-loop that needs to process the whole file once per line to remove.
filename="/PATH/TO/FILE"
number=5
line_count="$(wc -l < "$filename")"
line_nums_to_delete="$(shuf -i "1-$line_count" -n "$number")"
sed_script="$(printf '%dd;' $line_nums_to_delete)"
sed -i.bak -e "$sed_script" "$filename"
Or in one line (after defining the filename and number variables or replacing them manually):
sed -i.bak -e "$(printf '%dd;' $(shuf -i "1-$(wc -l < "$filename")" -n "$number"))" "$filename"
The -i.bak switch tells sed to edit the input file in place, but keep a backup copy of the original data, named like the input file but with .bak appended to the file name. If you don't want it to make a copy, just write -i.
Btw, you don't have to use variables as I did. You can also directly replace "$number" and both occurrences of "$filename" with the appropriate values. I just did it this way for clarity.
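To see what -i.bak leaves behind, here is a minimal sketch on a throwaway file. It assumes GNU sed; the demo_file name and the three sample lines are made up for illustration.

```shell
set -eu
demo_file="$(mktemp)"                      # temporary stand-in, not your real file
trap 'rm -f "$demo_file" "$demo_file.bak"' EXIT
printf '%s\n' one two three > "$demo_file"

# Delete line 2 in place; the untouched original is kept as "$demo_file.bak".
sed -i.bak -e '2d' "$demo_file"

edited_lines="$(wc -l < "$demo_file")"
backup_lines="$(wc -l < "$demo_file.bak")"
echo "edited: $edited_lines lines, backup: $backup_lines lines"

# To undo the edit, move the backup over the edited file:
# mv -- "$demo_file.bak" "$demo_file"
```

The backup is a plain copy next to the original, so it is up to you to remove it once you are happy with the result.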
To break down and explain the rest of the command:
sed -e "SCRIPT" "$filename"
runs the text processing tool sed on the file specified by the filename variable, applying the instructions given as SCRIPT argument.
Our SCRIPT is dynamically generated in the lines above it, which run commands and assign their outputs to variables. Here we use these commands:
- wc -l < "$filename" reads in the file specified by the filename variable and outputs the number of lines this file contains.
  - In your case, this should return roughly 10000 according to the size you mentioned in the question.
- shuf -i "1-$line_count" -n "$number" returns as many unique random numbers as specified by the number variable in the range 1 to $line_count (both boundaries inclusive).
  - For example, shuf -i 1-6 -n 2 would output two distinct random numbers between 1 and 6, like throwing two regular six-sided dice that never show the same face.
- printf '%dd;' ARGUMENTS returns a formatted string, taking in all ARGUMENTS (not quoted this time to treat each random number as a separate argument). The format string %dd; will be repeated while there are arguments left, and %d will be replaced with the argument represented as a decimal number.
  - Therefore, e.g. an input of 1 7 42 would result in an output of 1d;7d;42d;.
The resulting $sed_script is finally our SCRIPT for sed. A plain number is treated as an address, i.e. the line number on which to apply an action, starting at 1 for the first line of the input file. d is the command to delete the specified line, and ; separates multiple sed script commands.
All together, the whole command first examines your input file as specified in the filename variable and counts its lines. Then it generates as many unique random numbers as the number variable specifies, in the range 1 to the line count, and constructs a sed script out of these to delete each of the randomly chosen lines. Finally, sed runs that script on the file, modifying it.
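The whole pipeline can be tried safely on a generated file first. The following is only a sketch under a few assumptions: GNU shuf and GNU sed are available, the 100-line test file produced by seq stands in for your real data, and the guard against asking for more lines than exist is my own addition.

```shell
set -eu
filename="$(mktemp)"       # temporary stand-in for /PATH/TO/FILE
trap 'rm -f "$filename" "$filename.bak"' EXIT
number=5
seq 1 100 > "$filename"    # 100 lines containing the numbers 1..100

line_count="$(wc -l < "$filename")"
if [ "$number" -gt "$line_count" ]; then
    echo "cannot delete $number lines from a file with $line_count lines" >&2
    exit 1
fi

# shuf prints $number distinct line numbers; printf turns each into "Nd;".
# The inner command substitution is left unquoted on purpose, so that every
# number becomes a separate printf argument.
sed_script="$(printf '%dd;' $(shuf -i "1-$line_count" -n "$number"))"
echo "generated sed script: $sed_script"

sed -i.bak -e "$sed_script" "$filename"
remaining="$(wc -l < "$filename")"
echo "lines before: $line_count, after: $remaining"
```

Because shuf never repeats a number within one run, the line count always drops by exactly $number here.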
You can use a for loop to get a random number and the sed command to delete the corresponding line.
for i in {0..5};
do sed -i "$((1 + RANDOM % 10000))d" filename;
done
{0..5} expands to 0 1 2 3 4 5, so this deletes six lines, you probably mean {1..5}. More importantly: What if it tries to delete e.g. line 10000 as the second one, or 9999 as the third… ?
– dessert
yesterday
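One possible way around both points raised in this comment is to count the deletions explicitly and to recount the file before every pick. This is a sketch, not the answerer's code: it assumes GNU shuf and GNU sed, and it uses a generated 100-line file in place of the real one.

```shell
set -eu
filename="$(mktemp)"    # temporary stand-in for your real file
trap 'rm -f "$filename"' EXIT
seq 1 100 > "$filename"

deleted=0
while [ "$deleted" -lt 5 ]; do
    # Recount on every pass: the file is one line shorter after each deletion,
    # so the random line number must never exceed the current length.
    line_count="$(wc -l < "$filename")"
    target="$(shuf -i "1-$line_count" -n 1)"
    sed -i -e "${target}d" "$filename"
    deleted=$((deleted + 1))
done

remaining="$(wc -l < "$filename")"
echo "remaining lines: $remaining"
```

This still rewrites the whole file once per deleted line, so it stays slower than building a single sed script.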
Similar to Shivaditya's answer, but without a loop, and it deletes lines from the whole file, not just the first 10 lines:
sed -i "$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d" filename
This selects five random numbers between 1 and 10000 and deletes those lines in a single operation.
2
What if two or more of these random numbers are the same?
– dessert
yesterday
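The effect the comment asks about is easy to reproduce with fixed numbers instead of random ones: a repeated address deletes its line only once, so fewer lines than intended disappear. With five uniformly drawn numbers out of 10000, a repeat happens in roughly 0.1% of runs. The sketch below assumes nothing beyond seq, sed and wc.

```shell
set -eu
# Same address twice: line 3 is deleted by the first "3d", the second one
# never gets a chance to match anything.
dup_result="$(seq 1 10 | sed -e '3d;3d' | wc -l | tr -d ' ')"

# Two different addresses: two lines are deleted.
distinct_result="$(seq 1 10 | sed -e '3d;7d' | wc -l | tr -d ' ')"

echo "duplicate addresses leave $dup_result lines, distinct ones leave $distinct_result"
# prints: duplicate addresses leave 9 lines, distinct ones leave 8
```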
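With gawk, drop the following code into a file (called, say, del_random):
# Return a random integer between 1 and n (inclusive).
function randint(n)
{
    return int(n * rand()) + 1
}
# Runs before each input file: pick the line numbers to drop from that file.
BEGINFILE {
    command = sprintf("wc -l <\"%s\"", FILENAME)
    command | getline total_lines
    close(command)
    srand()
    delete arr
    # Assumes lines_to_del <= total_lines; otherwise this loop never ends.
    while (length(arr) < lines_to_del)
    {
        val = randint(total_lines)
        if (val in arr)
            continue
        arr[val] = 1
    }
}
# Print only the lines whose number was not picked.
!(FNR in arr)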
and then execute it as
gawk -i inplace -f del_random lines_to_del=5 file1 lines_to_del=20 file2
Any number of files can be passed (file1, file2, ...), and the number of lines to be deleted can be specified on a per-file basis via the lines_to_del parameter, as shown.
The -i inplace option is the gawk equivalent of sed's -i.
On the other hand, if the same number of lines needs to be deleted from each file, you can set lines_to_del once as follows:
gawk -i inplace -v lines_to_del=5 -f del_random file1 file2
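The per-file form relies on awk treating a var=value operand between file names as an assignment that takes effect just before the next file is read. A small sketch of only that mechanism, using plain awk, a made-up variable k and two one-line temporary files:

```shell
set -eu
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
printf 'a\n' > "$tmpdir/f1"
printf 'b\n' > "$tmpdir/f2"

# k is set to 5 before f1 is read and changed to 20 before f2 is read.
out="$(awk '{ print k, $0 }' k=5 "$tmpdir/f1" k=20 "$tmpdir/f2")"
printf '%s\n' "$out"
# prints:
# 5 a
# 20 b
```

With -v, by contrast, the variable is set once before any input is read, which is why the second gawk command above applies the same value to every file.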
1
@dessert, fair enough, fixed
– iruvar
9 hours ago
1
+1 Nice, thank you! I added a different awk approach as an answer.
– dessert
3 hours ago
An answer on U&L has this nice awk solution for the problem:
<file awk -v p=5 -v n=$(<file wc -l) '
BEGIN {srand()}
rand() * n-- < p {p--; next}
{print}'
Explanation
-v p=5 – set variable p holding the number of lines to delete
-v n=$(<file wc -l) – set variable n holding the line count of the file
BEGIN {srand()} – before processing the file, set the seed for generating random numbers; without it, awk implementations such as gawk would let rand() produce the same sequence of numbers on every run
rand() * n-- < p {…} – a conditional expression running the part in braces if it is true. rand() creates a random number between 0 (inclusive) and 1 (exclusive); this is multiplied by n, the number of lines not yet processed, and n is then decreased by 1 (the value before the decrement is used in the multiplication). If the result is smaller than p, the expression is true.
p--; next – decrease p by 1 and proceed to the next line, ignoring subsequent commands
print – print the currently processed line
The second and last line of the awk script are run for every line of the input file, so on every line there’s a chance of p / n for the line to be skipped and not printed, while the default action is to just print the line. Because p only decreases when a line is skipped while n decreases on every line, the chance reaches 1 as soon as the remaining lines are all needed, so exactly the requested number of lines is removed (as long as it does not exceed the line count).
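Note that the awk command above only prints the reduced text; it does not change the file. One way to write the result back is to go through a temporary file, as in this sketch. The 20-line test file and the tmp scratch file are assumptions for the demo, and the awk program is the one from the answer.

```shell
set -eu
file="$(mktemp)"    # temporary stand-in for your real file
tmp="$(mktemp)"     # scratch file receiving the filtered output
trap 'rm -f "$file" "$tmp"' EXIT
seq 1 20 > "$file"

awk -v p=5 -v n="$(wc -l < "$file")" '
BEGIN { srand() }
rand() * n-- < p { p--; next }   # skip this line with probability p/n
{ print }' "$file" > "$tmp"

# Replace the original only after awk has finished reading it.
# The replaced file takes on the permissions of the mktemp scratch file.
mv -- "$tmp" "$file"

remaining="$(wc -l < "$file")"
echo "remaining lines: $remaining"
```

The count always comes out as 15 here, because the skip probability reaches 1 whenever the number of remaining lines equals the number still to be deleted.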
Example run
I created a file with the letters a–e, each on its own line, with
printf '%s\n' {a..e} >file
and set p=1 to delete one line randomly. I changed the code to also print the values of n and p for each line before either of them is decreased.
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
n=5 p=1
n=4 p=0 b
n=3 p=0 c
n=2 p=0 d
n=1 p=0 e
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
n=5 p=1 a
n=4 p=1 b
n=3 p=1
n=2 p=0 d
n=1 p=0 e
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
n=5 p=1 a
n=4 p=1 b
n=3 p=1 c
n=2 p=1 d
n=1 p=1
Further reading
- The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions
The sed and shuf answer: answered 2 days ago by Byte Commander, edited yesterday
The for-loop answer: answered 2 days ago by Shivaditya, edited 3 hours ago by dessert
The single-command sed answer: answered 2 days ago by Jesse_b (new contributor), edited 3 hours ago by dessert
edited 3 hours ago by dessert
answered 2 days ago by iruvar
An answer on U&L has this nice awk solution for the problem:
<file awk -v p=5 -v n=$(<file wc -l) '
BEGIN {srand()}
rand() * n-- < p {p--; next}
{print}'
Explanation
-v p=5 – set variable p, holding the number of lines to delete
-v n=$(<file wc -l) – set variable n, holding the line count of the file
BEGIN {srand()} – before processing the file, seed the random number generator with the current time; without a seed, many awk implementations (gawk among them) make rand() return the same sequence on every run
rand() * n-- < p {…} – a conditional expression that runs the part in braces if it is true. rand() returns a random number from 0 (inclusive) to 1 (exclusive), which is multiplied by the current value of n; n is then decreased by 1. If the product is smaller than p, the expression is true.
p--; next – decrease p by 1 and proceed to the next line, ignoring subsequent commands
print – print the currently processed line
The second and third lines of the awk script run for every line of the input file: each input line is skipped (not printed) with probability p / n, where p is the number of deletions still to be made and n is the number of lines not yet processed; otherwise the line is simply printed.
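A quick sanity check on why this is fair (my own working, not part of the linked answer): write p and n for the starting values. The first line is skipped with probability p / n. The second line is skipped with probability (p - 1) / (n - 1) if the first one was skipped and p / (n - 1) otherwise, which again comes out as p / n overall:

```latex
\begin{align*}
P(\text{line 1 deleted}) &= \frac{p}{n} \\
P(\text{line 2 deleted}) &= \frac{p}{n}\cdot\frac{p-1}{n-1}
                          + \left(1-\frac{p}{n}\right)\cdot\frac{p}{n-1} \\
                         &= \frac{p\,(p-1) + (n-p)\,p}{n\,(n-1)}
                          = \frac{p\,(n-1)}{n\,(n-1)}
                          = \frac{p}{n}
\end{align*}
```

The same argument carries on down the file, so every line has the same overall chance p / n of being deleted, and exactly p lines go (provided p is not larger than the line count). This scheme is commonly known as selection sampling.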
Example run
I created a file with the letters a–e, each on its own line, with
printf '%s\n' {a..e} >file
and set p=1 to delete one line randomly. I changed the code to also print the values of n and p for each line before any of them is decreased.
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
n=5 p=1
n=4 p=0 b
n=3 p=0 c
n=2 p=0 d
n=1 p=0 e
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
n=5 p=1 a
n=4 p=1 b
n=3 p=1
n=2 p=0 d
n=1 p=0 e
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
n=5 p=1 a
n=4 p=1 b
n=3 p=1 c
n=2 p=1 d
n=1 p=1
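Note that the awk command only prints the result; the file itself is left untouched. A minimal sketch of wrapping it so that the file is rewritten, assuming a POSIX shell with mktemp available. The function name delete_random_lines and the file name demo.txt are made up for illustration and are not part of the linked answer:

```shell
# Hypothetical helper: delete COUNT random lines from FILE in place,
# using the awk program from above and a temporary file.
#
# Example:
#   awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' >demo.txt
#   delete_random_lines 5 demo.txt
#   wc -l <demo.txt        # 95 lines remain
delete_random_lines() {
    count=$1
    file=$2
    total=$(wc -l <"$file") || return 1
    tmp=$(mktemp) || return 1
    # Write the surviving lines to the temporary file first, then copy
    # them back; cat (rather than mv) keeps the owner and permissions
    # of the original file.
    awk -v p="$count" -v n="$total" '
        BEGIN { srand() }
        rand() * n-- < p { p--; next }   # skip (delete) this line
        { print }                        # keep every other line
    ' "$file" >"$tmp" && cat "$tmp" >"$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

The original file is only overwritten if awk succeeded, so a typo in the arguments should not leave you with an empty file. If you use gawk anyway, its -i inplace option does the same job without the temporary-file dance.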
Further reading
- The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions
edited 2 hours ago
answered 3 hours ago by dessert