bash - How to select text in a file until a certain string using grep, sed or awk?


I have a huge file (sample below). I want to select the lines with "ph_gufac1083" and everything after them, until reaching a line that doesn't have that code (ph_gufac1139 in the example).

>uce_353_ph_gufac1083 |uce_353 tttagccatagaaatgcagaaataattagaagtgccattgtgtacagtgccttctggact gggctgaaggtgaaggagaaagtatcatactatccttgtcagctgcaagggtaattactg ctggctgaaattactcaacatttgtttataagctccccagagcatgctgtaaatagattg tctgttatagtccaatcacattaaaacgctgctccttgcaaactgctacctcctgttttc tgtaagctagacagagaaagcctgctgctcacttactgagcaccaagcactgaagagcta tgtttaatgtgattgttttcattagctcttctctgtctgatattacatttataatttgct gggcttgaagactggcatgttgcattgctttcatttactgtagtaagagtgaatagctct @ >uce_101_ph_gufac1083 |uce_101 ttgggctttatttccaccttaaaatctttacctggccgtgatctgttgttccattactgg agggcaaaaatgggaggaattgtctgggctaaattgcaattaggcagccctgagagaggc tggcaccagttaacttgggatattggagtgaaaaggcccgtaatcagccttcggtcatgt agaacaatgcataaaattaaattgacattaatgaataattgtgtaatgaaaatggaagag gagagttaattgcatgttacagtgagtgtaatgcctagataaccttgcatttaatgctat tcttagccctgctgccaagacttctacagagcctctctctgcaggaagtcattaaagctg tgagtagataatgcaggctcagtgaaacctaagtggcaacaatata >uce_171_ph_gufac1083 |uce_171 catggaaaacgaggaaaagccatatcttccaggccattaatattactacggagacgtctt catatcgccgtaattacagcagatctcaaagtggcacaaccaagaccagcaccaaagcta aaataactcgcaggagcaggcgagctgcttttgcagccctcagtcccagaaatgctcggt agcttttcttaaaatagacagcctgtaaataaggtctgtgaactcaattgaaggtggctg tttctgaattagtcagccctcacaaggctctcggcctacatgctagtacataaattgtcc actttaccaccagacaagaaagattagagtaataaacacggggcattagctcagctagag aaacacaccagccgttacgcacacgcgggattgccaagaactgttaaccccactctccag aaacgcacacaaaaaaacaagttaaagccatgacatcatgggaa  >uce_4300_ph_gufac1139 |uce_4300 attaaaaatacaatcctcatgtttgcattttgcagtcgtcaacaagaaattgaagagaaa ctcatagaggaagaaactgctcgaagggtggaagaacttgtagctaaacgcgtggaagaa gagctggagaaaagaaaggatgagattgagcgagaggttctccgcagggtggaggaggct aagcgcatcatggaaaaacagttgctcgaagaactcgagcgacagcgacaagctgaactt gcagcacaaaaagccagagaggtaacgctcggtcgtttggaaagtagagacagtccatgg caaaactttcagtgtcggtttgtgcctcctgttcggttcagaaagagatggaatacagca aatctaattcccttctcatataaacttgcattgctgcgaaacttaatttctagcctattc agaggagctcactgatatttaaacagttactctcctaaaacctgaacaaggatacttgat tcttaatggaactgacctacatatttcagaattgtttgaaacttttgccatggctgcagg 
attattcagcagtcctttcatttt >uce_1039_ph_gufac1139 |uce_1039 attagtggaatacaaatatgcaaaaaccaaacagtttggtgctataatgtgaaaagaaat ttacaccaatcttatttttaatttgtatgggaacatttttaccacaaattccatatttta ataatactatcccaactctattttttagactcattttgtcactgttttgtaacagaaaca ctgtaaatattatagatgtggtaaactattatacttgttttcttataaatgaaatgatct gtgccaacactgacaaaatgaattaatgtgttactaaggcaacagtcacattatatgctt tctctttcacagtatgcggtagagcatatggtttactcttaatggaacactagcttctca ttaacataccagtagcaatgtcagaacttacaaaccagcataacagagaaatggaaaaac ttataaattagaccctttcagtattattgagtagaaaatgactgatgttccaaggtacaa tatttagctaatacagtgcccttttctgcatctttcttctcaaaggaaaaaaaaatcctc aaaaaaaaccagagcaagaaacctaactttttcttgt 

I tried several alternatives without success. The closest I got was:

sed -n '/ph_gufac1083/, />/p' file.txt 

which gave me this:

>uce_2347_ph_gufac1083 |uce_2347 gcttttctatgcagattttttctaattctctccctccccttgcttctgtcagtgtgaagc ccacactaagcattaacagtattaaaaagagtgttatctattagttcaattagacatcag acatttactttccaatgtatttgaagactgatttgatttgggtccaatcatttaaaaata agagagcagaactgtgtacagagctgtgtacagatatctgtagctctgaagtcttaattg caaattcagataaggattagaaggggctgtatctctgtagaccaaaggtatttgctaata cctgagatataaaagtggttaaattcaatatttactaatttaggatttccactttggatt ttgattaagctttttggttgaaaaccccacattattaagctgtgatgagggaaaaagcaa ctctttcataagcctcactttaacgctttatttcaaataatttattttggaccttctaaa g >uce_353_ph_gufac1083 |uce_353  >uce_101_ph_gufac1083 |uce_101 ttgggctttatttccaccttaaaatctttacctggccgtgatctgttgttccattactgg agggcaaaaatgggaggaattgtctgggctaaattgcaattaggcagccctgagagaggc tggcaccagttaacttgggatattggagtgaaaaggcccgtaatcagccttcggtcatgt agaacaatgcataaaattaaattgacattaatgaataattgtgtaatgaaaatggaagag gagagttaattgcatgttacagtgagtgtaatgcctagataaccttgcatttaatgctat tcttagccctgctgccaagacttctacagagcctctctctgcaggaagtcattaaagctg tgagtagataatgcaggctcagtgaaacctaagtggcaacaatata >uce_171_ph_gufac1083 |uce_171 

Does anyone know how to do this using grep, sed or awk?

Thanks

$ awk '/^>/{if(match($0,"ph_gufac1083")){s=1} else s=0}s' file 

I used a simple criterion:

  • If a line starts with >, check whether it contains "ph_gufac1083": set s=1 if it does, s=0 otherwise.
  • For a line that doesn't start with >, the value of s is retained.
  • The final s in the awk command decides whether the line is printed (s=1) or not (s=0).
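
To see the flag logic on something small, here is a minimal sketch that wraps the same idea in a shell function. The function name select_records and the shortened sequences are made up for illustration. It assumes every header sits on its own line starting with >, as in a regular FASTA file, and it uses index() instead of match() so the code is compared as a plain string rather than as a regular expression.

```shell
# select_records CODE [FILE...]
# Hypothetical helper: prints every record whose header line contains CODE.
# Reads standard input when no FILE is given.
select_records() {
    code=$1
    shift
    # On a header line, set s to 1 if the header contains the code, else 0.
    # Non-header lines leave s unchanged; the trailing "s" prints while s is 1.
    awk -v code="$code" '/^>/ { s = (index($0, code) > 0) } s' "$@"
}

# Made-up miniature sample (sequences shortened for illustration).
select_records ph_gufac1083 <<'EOF'
>uce_353_ph_gufac1083 |uce_353
tttagccatag
gggctgaaggt
>uce_4300_ph_gufac1139 |uce_4300
attaaaaatac
>uce_101_ph_gufac1083 |uce_101
ttgggctttat
EOF
```

With this input, both ph_gufac1083 records are printed with their sequence lines and the ph_gufac1139 record is skipped. On a real file the call would be select_records ph_gufac1083 file.txt.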
