bash - How to select text in a file until a certain string using grep, sed or awk?
I have a huge file (this is a sample). I want to select the lines containing "ph_gufac1083" and the lines that follow them, until reaching one that doesn't have that code (in the example, ph_gufac1139):
>uce_353_ph_gufac1083 |uce_353 tttagccatagaaatgcagaaataattagaagtgccattgtgtacagtgccttctggact gggctgaaggtgaaggagaaagtatcatactatccttgtcagctgcaagggtaattactg ctggctgaaattactcaacatttgtttataagctccccagagcatgctgtaaatagattg tctgttatagtccaatcacattaaaacgctgctccttgcaaactgctacctcctgttttc tgtaagctagacagagaaagcctgctgctcacttactgagcaccaagcactgaagagcta tgtttaatgtgattgttttcattagctcttctctgtctgatattacatttataatttgct gggcttgaagactggcatgttgcattgctttcatttactgtagtaagagtgaatagctct @ >uce_101_ph_gufac1083 |uce_101 ttgggctttatttccaccttaaaatctttacctggccgtgatctgttgttccattactgg agggcaaaaatgggaggaattgtctgggctaaattgcaattaggcagccctgagagaggc tggcaccagttaacttgggatattggagtgaaaaggcccgtaatcagccttcggtcatgt agaacaatgcataaaattaaattgacattaatgaataattgtgtaatgaaaatggaagag gagagttaattgcatgttacagtgagtgtaatgcctagataaccttgcatttaatgctat tcttagccctgctgccaagacttctacagagcctctctctgcaggaagtcattaaagctg tgagtagataatgcaggctcagtgaaacctaagtggcaacaatata >uce_171_ph_gufac1083 |uce_171 catggaaaacgaggaaaagccatatcttccaggccattaatattactacggagacgtctt catatcgccgtaattacagcagatctcaaagtggcacaaccaagaccagcaccaaagcta aaataactcgcaggagcaggcgagctgcttttgcagccctcagtcccagaaatgctcggt agcttttcttaaaatagacagcctgtaaataaggtctgtgaactcaattgaaggtggctg tttctgaattagtcagccctcacaaggctctcggcctacatgctagtacataaattgtcc actttaccaccagacaagaaagattagagtaataaacacggggcattagctcagctagag aaacacaccagccgttacgcacacgcgggattgccaagaactgttaaccccactctccag aaacgcacacaaaaaaacaagttaaagccatgacatcatgggaa >uce_4300_ph_gufac1139 |uce_4300 attaaaaatacaatcctcatgtttgcattttgcagtcgtcaacaagaaattgaagagaaa ctcatagaggaagaaactgctcgaagggtggaagaacttgtagctaaacgcgtggaagaa gagctggagaaaagaaaggatgagattgagcgagaggttctccgcagggtggaggaggct aagcgcatcatggaaaaacagttgctcgaagaactcgagcgacagcgacaagctgaactt gcagcacaaaaagccagagaggtaacgctcggtcgtttggaaagtagagacagtccatgg caaaactttcagtgtcggtttgtgcctcctgttcggttcagaaagagatggaatacagca aatctaattcccttctcatataaacttgcattgctgcgaaacttaatttctagcctattc agaggagctcactgatatttaaacagttactctcctaaaacctgaacaaggatacttgat tcttaatggaactgacctacatatttcagaattgtttgaaacttttgccatggctgcagg 
attattcagcagtcctttcatttt >uce_1039_ph_gufac1139 |uce_1039 attagtggaatacaaatatgcaaaaaccaaacagtttggtgctataatgtgaaaagaaat ttacaccaatcttatttttaatttgtatgggaacatttttaccacaaattccatatttta ataatactatcccaactctattttttagactcattttgtcactgttttgtaacagaaaca ctgtaaatattatagatgtggtaaactattatacttgttttcttataaatgaaatgatct gtgccaacactgacaaaatgaattaatgtgttactaaggcaacagtcacattatatgctt tctctttcacagtatgcggtagagcatatggtttactcttaatggaacactagcttctca ttaacataccagtagcaatgtcagaacttacaaaccagcataacagagaaatggaaaaac ttataaattagaccctttcagtattattgagtagaaaatgactgatgttccaaggtacaa tatttagctaatacagtgcccttttctgcatctttcttctcaaaggaaaaaaaaatcctc aaaaaaaaccagagcaagaaacctaactttttcttgt
I tried several alternatives without success; the closest I got was:
sed -n '/ph_gufac1083/, />/p' file.txt
which gave me this:
>uce_2347_ph_gufac1083 |uce_2347 gcttttctatgcagattttttctaattctctccctccccttgcttctgtcagtgtgaagc ccacactaagcattaacagtattaaaaagagtgttatctattagttcaattagacatcag acatttactttccaatgtatttgaagactgatttgatttgggtccaatcatttaaaaata agagagcagaactgtgtacagagctgtgtacagatatctgtagctctgaagtcttaattg caaattcagataaggattagaaggggctgtatctctgtagaccaaaggtatttgctaata cctgagatataaaagtggttaaattcaatatttactaatttaggatttccactttggatt ttgattaagctttttggttgaaaaccccacattattaagctgtgatgagggaaaaagcaa ctctttcataagcctcactttaacgctttatttcaaataatttattttggaccttctaaa g >uce_353_ph_gufac1083 |uce_353 >uce_101_ph_gufac1083 |uce_101 ttgggctttatttccaccttaaaatctttacctggccgtgatctgttgttccattactgg agggcaaaaatgggaggaattgtctgggctaaattgcaattaggcagccctgagagaggc tggcaccagttaacttgggatattggagtgaaaaggcccgtaatcagccttcggtcatgt agaacaatgcataaaattaaattgacattaatgaataattgtgtaatgaaaatggaagag gagagttaattgcatgttacagtgagtgtaatgcctagataaccttgcatttaatgctat tcttagccctgctgccaagacttctacagagcctctctctgcaggaagtcattaaagctg tgagtagataatgcaggctcagtgaaacctaagtggcaacaatata >uce_171_ph_gufac1083 |uce_171
Does anyone know how to do this using grep, sed or awk?
Thanks.
$ awk '/^>/{if(match($0,"ph_gufac1083")){s=1} else s=0}s' file
I used a simple criterion:
- If the line starts with >, we check whether "ph_gufac1083" occurs in it: if yes, set s=1; otherwise set s=0.
- For a line that doesn't start with >, the value of s is retained.
- The final s in the awk command decides whether the line is printed (s=1) or not (s=0).
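The flag logic above can be tried on a tiny input. This is only a sketch: the three-record sample and the names sample, code, result and first_run are made up for illustration and are not from the question. It assumes each header and each sequence line is on its own line. The first command keeps every record whose header contains the code; the second is one possible variant that stops at the first non-matching header after a match, if only the first run of records is wanted.

```shell
# Hypothetical miniature input: header lines start with ">", sequence lines follow.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>uce_353_ph_gufac1083 |uce_353
tttagcc
gggctga
>uce_4300_ph_gufac1139 |uce_4300
attaaaa
>uce_101_ph_gufac1083 |uce_101
ttgggct
EOF

# On each header line, set s to 1 if the header contains the code, else 0.
# The bare pattern "s" prints the current line whenever s is non-zero.
result=$(awk -v code="ph_gufac1083" '/^>/ { s = (index($0, code) > 0) } s' "$sample")
printf '%s\n' "$result"

# Variant (an assumption about the intent): quit at the first header
# without the code once a matching record has already been seen.
first_run=$(awk -v code="ph_gufac1083" '
/^>/ { if (index($0, code)) { s = 1 } else if (s) { exit } }
s' "$sample")
printf '%s\n' "$first_run"

rm -f "$sample"
```

With this sample, the first command prints the uce_353 and uce_101 records and skips uce_4300. The variant prints only the uce_353 record.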