How to delete random lines from a file?












5















I have a file which contains 10000 lines and I want to delete 5 randomly determined lines from it. How can I do that?






























  • 2





    related: Python: Choose random line from file, then delete that line

    – jfs
    2 days ago











  • To close-voters: text-processing questions are perfectly on topic, this is part of using and administering an Ubuntu system as defined on askubuntu.com/help/on-topic.

    – dessert
    9 hours ago













  • @jfs You’re welcome to add a python answer here as well, even if you just copy it that’s very helpful! Or do you want me to add another copypasta answer? ;)

    – dessert
    3 hours ago


















command-line text-processing














asked 2 days ago by Pravin Gaddam, edited 9 hours ago by dessert


















5 Answers


















14














You can probably solve it more efficiently than with a for-loop that needs to process the whole file once per line to remove.



filename="/PATH/TO/FILE"
number=5

line_count="$(wc -l < "$filename")"
line_nums_to_delete="$(shuf -i "1-$line_count" -n "$number")"
sed_script="$(printf '%dd;' $line_nums_to_delete)"

sed -i.bak -e "$sed_script" "$filename"


Or in one line (after defining the filename and number variables or replacing them manually):



sed -i.bak -e "$(printf '%dd;' $(shuf -i "1-$(wc -l < "$filename")" -n "$number"))" "$filename"


The -i.bak switch tells sed to edit the input file in place, but keep a backup copy of the original data, named like the input file with .bak appended to the file name. If you don't want it to make a copy, just write -i.



Btw, you don't have to use variables as I did. You can also directly replace "$number" and both occurrences of "$filename" with the appropriate values. I just did it this way for clarity.





To break down and explain the rest of the command:



sed -e "SCRIPT" "$filename"


runs the text processing tool sed on the file specified by the filename variable, applying the instructions given as SCRIPT argument.



Our SCRIPT is dynamically generated in the lines above it, which run commands and assign their outputs to variables. Here we use these commands:





  • wc -l < "$filename" reads in the file specified by the filename variable and outputs the number of lines this file contains.




    • In your case, this should return roughly 10000 according to the size you mentioned in the question.




  • shuf -i "1-$line_count" -n "$number" returns as many unique random numbers as specified by the number variable in the range 1 to $line_count (both boundaries inclusive).

    • For example, shuf -i 1-6 -n 2 would emulate throwing two regular six-sided dice, except that the same number can never come up twice.




  • printf '%dd;' ARGUMENTS returns a formatted string, taking in all ARGUMENTS (not quoted this time to treat each random number as a separate argument). The format string %dd; will be repeated while there are arguments left, and %d will be replaced with the argument represented as a decimal number.




    • Therefore, e.g. an input of 1 7 42 would result in an output of 1d;7d;42d;.
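To see how these pieces fit together, here is a small sketch that builds such a sed script from fixed line numbers and applies it to generated input instead of a real file. The function name build_sed_script is made up for this illustration, and seq only serves to produce ten sample lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn line numbers into a sed delete script.
build_sed_script() {
    # printf reuses the format once per argument: 1 7 42 -> 1d;7d;42d;
    printf '%dd;' "$@"
}

script="$(build_sed_script 1 7 42)"
echo "$script"                 # prints 1d;7d;42d;

# Apply it to ten numbered lines: lines 1 and 7 vanish,
# and address 42 matches nothing because the input is too short.
seq 10 | sed -e "$script"
```

The last command prints the numbers 2 to 6 and 8 to 10, one per line.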




The resulting $sed_script is finally our SCRIPT for sed. A plain number is treated as an address, i.e. the line number on which to apply an action, starting at 1 for the first line of the input file. d is the command to delete the specified line, and ; separates multiple sed script commands.

All together, the whole command first examines your input file as specified in the filename variable and counts its lines. Then it generates as many unique random numbers as the number variable specifies, in the range 1 to the number of lines, and constructs a sed script out of these to delete each of the chosen lines. Finally sed runs that script on the file, modifying it.
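If you want to reuse this, the steps could be wrapped in a function roughly like the sketch below. The name delete_random_lines and the range check are my own additions, not part of the commands above; the sketch assumes GNU shuf and sed, and that the count is a plain integer.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: delete COUNT random lines from FILE in place.
delete_random_lines() {
    local file=$1 count=$2 total script
    total=$(( $(wc -l < "$file") ))   # arithmetic strips any padding from wc
    # Assumption: refuse impossible or empty requests instead of guessing.
    if [ "$count" -lt 1 ] || [ "$count" -gt "$total" ]; then
        echo "count must be between 1 and $total" >&2
        return 1
    fi
    # shuf -i picks COUNT distinct numbers; printf turns each into "Nd;".
    script=$(printf '%dd;' $(shuf -i "1-$total" -n "$count"))
    sed -i.bak -e "$script" "$file"
}

# Demo on a throwaway file of 100 numbered lines.
demo=$(mktemp)
seq 100 > "$demo"
delete_random_lines "$demo" 5
wc -l < "$demo"                       # 95 lines remain
rm -f "$demo" "$demo.bak"
```

The check matters because shuf simply returns fewer numbers when asked for more than the range contains, so an oversized count would otherwise pass unnoticed.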







































    6














    You can use a for loop to generate a random line number and sed to delete that line:

    for i in {1..5}; do
        # recount on every pass, because the file shrinks by one line each time
        sed -i "$((1 + RANDOM % $(wc -l < filename)))d" filename
    done































    • {0..5} expands to 0 1 2 3 4 5, so this deletes six lines, you probably mean {1..5}. More importantly: What if it tries to delete e.g. line 10000 as the second one, or 9999 as the third… ?

      – dessert
      yesterday
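The pitfall raised in the comment above can be checked without touching a real file. A minimal sketch using generated sample input: an address beyond the last line matches nothing, so sed deletes nothing and reports no error.

```shell
#!/usr/bin/env bash
# Three lines of sample input; line 5 does not exist.
printf '%s\n' a b c | sed '5d'     # all three lines are printed unchanged

# Counting the surviving lines makes the no-op visible.
remaining=$(printf '%s\n' a b c | sed '5d' | wc -l)
echo "lines left: $((remaining))"  # lines left: 3
```

So a loop that draws from a fixed range of 10000 can end up deleting fewer lines than intended once the file has become shorter.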



















    4














    Similar to Shivaditya's answer, but without a loop, and it deletes lines from anywhere in the file, not just the first 10 lines:



    sed -i "$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d" filename


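    This selects five random numbers between 1 and 10000 and deletes those lines in a single operation. The numbers are drawn independently, so if two of them happen to be equal, fewer than five lines are deleted.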






    Jesse_b is a new contributor to this site.
















    • 2





      What if two or more of these random numbers are the same?

      – dessert
      yesterday
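One way to address the question in the comment above, if shuf is not available, is to redraw until all numbers differ. The sketch below is only an illustration: distinct_random is a made-up name, it needs bash for $RANDOM, and it assumes 1 <= COUNT <= MAX <= 32767 because $RANDOM only yields values from 0 to 32767.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print COUNT distinct numbers from 1..MAX using $RANDOM.
distinct_random() {
    local count=$1 max=$2 picked=' ' have=0 n
    while [ "$have" -lt "$count" ]; do
        n=$(( 1 + RANDOM % max ))
        case $picked in
            *" $n "*) continue ;;     # already drawn, try again
        esac
        picked="$picked$n "
        have=$(( have + 1 ))
        echo "$n"
    done
}

# Build the sed script from five distinct line numbers and apply it
# to generated input rather than a real file.
script=$(printf '%dd;' $(distinct_random 5 10000))
seq 10000 | sed -e "$script" | wc -l    # 9995 lines remain
```

Because every drawn number is unique, exactly five lines are removed on every run.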



















    2














    With gawk, drop the following code into a file (called, say, del_random):



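    function randint(n)
    {
        # random integer in the range 1..n
        return int(n * rand()) + 1
    }

    BEGINFILE {
        # count the lines of the file that is about to be processed
        command = sprintf("wc -l <\"%s\"", FILENAME)
        command | getline total_lines
        close(command)
        srand()
        delete arr
        # draw distinct line numbers until enough have been collected
        while (length(arr) < lines_to_del)
        {
            val = randint(total_lines)
            if (val in arr)
                continue
            arr[val] = 1
        }
    }
    # print only the lines whose number was not drawn
    !(FNR in arr)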


    and then execute it as



    gawk -i inplace -f del_random lines_to_del=5 file1 lines_to_del=20 file2


    Any number of files can be passed (file1, file2, ...) and the number of lines to be deleted can be specified on a per-file basis via the lines_to_del parameter as shown.
    The -i inplace option is the gawk equivalent of sed's -i.



    On the other hand, if the same number of lines needs to be deleted from each file, you can set lines_to_del once as follows:



    gawk -i inplace -v lines_to_del=5 -f del_random file1 file2


























    • 1





      @dessert, fair enough, fixed

      – iruvar
      9 hours ago






    • 1





      +1 Nice, thank you! I added a different awk approach as an answer.

      – dessert
      3 hours ago



















    2














    An answer on Unix & Linux (U&L) has this nice awk solution for the problem:





    <file awk -v p=5 -v n=$(<file wc -l) '
    BEGIN {srand()}
    rand() * n-- < p {p--; next}
    {print}'


    Explanation





    • -v p=5 – set variable p holding the number of lines to delete


    • -v n=$(<file wc -l) – set variable n holding the line count of the file


    • BEGIN {srand()} – before processing the file, seed the random number generator; without this, rand() returns the same sequence of numbers on every run in most awk implementations


    • rand() * n-- < p {…} – a conditional expression that runs the part in braces if it is true. rand() returns a random number from 0 (inclusive) to 1 (exclusive), which is multiplied by n, the number of lines not yet processed; n is then decreased by 1. If the product is smaller than p, the expression is true.


    • p--; next – decrease p by 1 and proceed to the next line ignoring subsequent commands


    • print – print the currently processed line


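    The last two lines of the awk script are run for every line of the input file: each line has a chance of p / n (using the current values of p and n) of being skipped and not printed, while the default action is to just print the line. Because p only drops when a line is skipped and n drops on every line, exactly p lines are removed as long as the file has at least p lines.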
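Since this awk command only writes to standard output, replacing the original file takes one more step. The following is a sketch under my own assumptions: drop_random_lines is a made-up wrapper name, and the temporary file plus mv is just one common way to write the result back.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk script: print FILE minus COUNT random lines.
drop_random_lines() {
    local file=$1 count=$2 total
    total=$(( $(wc -l < "$file") ))   # arithmetic strips any padding from wc
    awk -v p="$count" -v n="$total" '
        BEGIN { srand() }
        rand() * n-- < p { p--; next }   # skip this line with probability p/n
        { print }' "$file"
}

# Demo: save the result to a temporary file first and only then
# replace the original, so the input is never read and written at once.
demo=$(mktemp)
seq 20 > "$demo"
drop_random_lines "$demo" 3 > "$demo.new" && mv "$demo.new" "$demo"
wc -l < "$demo"                       # 17 lines remain
rm -f "$demo"
```

Redirecting straight back into the input file would truncate it before awk reads it, which is why the sketch goes through a second file.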



    Example run



    I created a file with the letters a–e, each on its own line, with



    printf '%s\n' {a..e} >file


    and set p=1 to delete one line randomly. I changed the code to also print the values of n and p for each line before any of them is decreased.



    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
    n=5 p=1
    n=4 p=0 b
    n=3 p=0 c
    n=2 p=0 d
    n=1 p=0 e
    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
    n=5 p=1 a
    n=4 p=1 b
    n=3 p=1
    n=2 p=0 d
    n=1 p=0 e
    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
    n=5 p=1 a
    n=4 p=1 b
    n=3 p=1 c
    n=2 p=1 d
    n=1 p=1


    Further reading




    • The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions










































































          answered 2 days ago by Byte Commander (the shuf and sed answer), edited yesterday

































              answered 2 days ago by Shivaditya (the for-loop answer), edited 3 hours ago by dessert
































              edited 3 hours ago by dessert
              answered 2 days ago by Jesse_b












              • 2





                What if two or more of these random numbers are the same?

                – dessert
                yesterday
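One possible way to address the duplicate-number concern raised in this comment is to let `shuf` draw the line numbers, since `shuf -i` picks from a range without repetition. This is only a sketch and not part of the original answer; it assumes GNU coreutils (`shuf`, `seq`, `mktemp`) and GNU sed, and the temporary file and its 100 lines are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: delete 5 distinct random lines from a file.
# The temporary file and its size are illustrative assumptions.
tmp=$(mktemp)
seq 1 100 > "$tmp"                 # sample file with 100 lines

total=$(wc -l < "$tmp")            # count lines instead of hardcoding 10000

# shuf -i draws from the range without repetition, so the 5 numbers differ.
# Appending "d" to each number turns the list into a sed script (one command per line).
script=$(shuf -i "1-$total" -n 5 | sed 's/$/d/')

sed -i "$script" "$tmp"            # all deletions happen in one pass

remaining=$(wc -l < "$tmp")
echo "$remaining"                  # 95 lines are left
rm -f "$tmp"
```

Because sed numbers lines as it reads the input, deleting one line does not shift the numbers of the others, so all five draws refer to the original file.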

























              2














              With gawk, put the following code into a file (called, say, del_random):



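              # Return a random integer between 1 and n (inclusive).
              function randint(n)
              {
                  return int(n * rand()) + 1
              }

              # Runs before each input file: choose which line numbers to drop.
              BEGINFILE {
                  command = sprintf("wc -l <\"%s\"", FILENAME)
                  command | getline total_lines
                  close(command)
                  srand()
                  delete arr
                  # Draw until lines_to_del distinct numbers are collected
                  # (assumes lines_to_del does not exceed the file's line count).
                  while (length(arr) < lines_to_del)
                  {
                      val = randint(total_lines)
                      if (val in arr)
                          continue
                      arr[val] = 1
                  }
              }
              # Print every line whose number was not selected.
              !(FNR in arr)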


              and then execute it as



              gawk -i inplace -f del_random lines_to_del=5 file1 lines_to_del=20 file2


              Any number of files can be passed (file1, file2, ...), and the number of lines to be deleted can be specified on a per-file basis via the lines_to_del parameter, as shown.
              -i inplace is gawk's equivalent of sed's -i.



              On the other hand, if the same number of lines needs to be deleted from each file, you can set lines_to_del once as follows:



              gawk -i inplace -v lines_to_del=5 -f del_random file1 file2
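The core of the script is the loop that keeps drawing line numbers until enough distinct ones have been collected. As a rough sketch of that idea without the gawk-specific features (no BEGINFILE, no -i inplace), the following uses plain awk on a single file and replaces the file via a temporary copy. The variable names `k`, `picked` and `count`, the sample file and its 50 lines are assumptions made for the demonstration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch: pick k distinct random line numbers, then print all other lines.
# The temporary file and its size are illustrative assumptions.
tmp=$(mktemp)
seq 1 50 > "$tmp"                  # sample file with 50 lines

total=$(wc -l < "$tmp")

awk -v k=5 -v total="$total" '
BEGIN {
    srand()
    # Redraw on collisions until k distinct line numbers are stored.
    # This loops forever if k is larger than total.
    while (count < k) {
        val = int(total * rand()) + 1
        if (!(val in picked)) {
            picked[val] = 1
            count++
        }
    }
}
!(NR in picked)                    # print lines that were not picked
' "$tmp" > "$tmp.new" && mv "$tmp.new" "$tmp"

remaining=$(wc -l < "$tmp")
echo "$remaining"                  # 45 lines are left
rm -f "$tmp"
```

A separate counter is used instead of `length(arr)` because not every awk implementation accepts an array as the argument of `length`.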





              edited 3 hours ago by dessert
              answered 2 days ago by iruvar








              • 1





                @dessert, fair enough, fixed

                – iruvar
                9 hours ago






              • 1





                +1 Nice, thank you! I added a different awk approach as an answer.

                – dessert
                3 hours ago

























              2














              An answer on U&L has this nice awk solution for the problem:





              <file awk -v p=5 -v n=$(<file wc -l) '
              BEGIN {srand()}
              rand() * n-- < p {p--; next}
              {print}'


              Explanation





              • -v p=5 – set variable p holding the number of lines to delete


              • -v n=$(<file wc -l) – set variable n holding the line count of the file


              • BEGIN {srand()} – before processing the file, seed the random number generator (srand() without an argument uses the current time). In gawk and many other awk implementations, rand() would otherwise return the same sequence on every run; the numbers are pseudo-random, not truly random.


              • rand() * n-- < p {…} – a condition; the part in braces runs if it is true. rand() returns a random number from 0 (inclusive) to 1 (exclusive), which is multiplied by n, the number of lines not yet processed; n is then decreased by 1. If the product is smaller than p, the condition is true.


              • p--; next – decrease p by 1 and proceed to the next line ignoring subsequent commands


              • print – print the currently processed line


              The second and last lines of the awk script are run for every line of the input file, so each line has a chance of p / n of being skipped and not printed, where p is the number of lines still to be deleted and n is the number of lines not yet processed (both taken before they are decreased). Otherwise the line is simply printed.



              Example run



              I created a file with the letters a–e, each on its own line, with



              printf '%s\n' {a..e} >file


              and set p=1 to delete one line randomly. I changed the code to also print the values of n and p for each line before any of them is decreased.



              $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
              n=5 p=1
              n=4 p=0 b
              n=3 p=0 c
              n=2 p=0 d
              n=1 p=0 e
              $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
              n=5 p=1 a
              n=4 p=1 b
              n=3 p=1
              n=2 p=0 d
              n=1 p=0 e
              $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
              n=5 p=1 a
              n=4 p=1 b
              n=3 p=1 c
              n=2 p=1 d
              n=1 p=1
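Each run above drops exactly one line. The same holds for larger p: once n has fallen to p, rand() * n is always below p, so every remaining line is dropped, and once p reaches 0 nothing more is dropped. The following sketch checks this on a 20-line file with p=5. The file, its size and the fixed seeds passed through the `seed` variable are assumptions made for the demonstration (fixed seeds are used because srand() without an argument would give the same seed to runs started within the same second).

```shell
#!/usr/bin/env bash
# Sketch: confirm that the awk program removes exactly p lines on every run.
# The temporary file, its size and the seeds are illustrative assumptions.
tmp=$(mktemp)
seq 1 20 > "$tmp"                  # sample file with 20 lines

ok=yes
for seed in 1 2 3 4 5; do
    kept=$(awk -v p=5 -v n="$(wc -l < "$tmp")" -v seed="$seed" '
        BEGIN {srand(seed)}
        rand() * n-- < p {p--; next}
        {print}' "$tmp" | wc -l)
    # 20 lines minus 5 deleted lines must leave 15, whatever the seed.
    [ "$kept" -eq 15 ] || ok=no
done

echo "$ok"
rm -f "$tmp"
```

Note that the awk command only writes the surviving lines to standard output. To change the file itself, the output can be redirected to a temporary file that is then moved over the original.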


              Further reading




              • The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions






              edited 2 hours ago
              answered 3 hours ago by dessert





























