bash - How to select text in a file until a certain string using grep, sed or awk?
I have a huge file (sample below). I want to select the records whose header contains "ph_gufac1083", continuing until the first record that doesn't have that code (in the example, ph_gufac1139).
>uce_353_ph_gufac1083 |uce_353
tttagccatagaaatgcagaaataattagaagtgccattgtgtacagtgccttctggact
gggctgaaggtgaaggagaaagtatcatactatccttgtcagctgcaagggtaattactg
ctggctgaaattactcaacatttgtttataagctccccagagcatgctgtaaatagattg
tctgttatagtccaatcacattaaaacgctgctccttgcaaactgctacctcctgttttc
tgtaagctagacagagaaagcctgctgctcacttactgagcaccaagcactgaagagcta
tgtttaatgtgattgttttcattagctcttctctgtctgatattacatttataatttgct
gggcttgaagactggcatgttgcattgctttcatttactgtagtaagagtgaatagctct
@
>uce_101_ph_gufac1083 |uce_101
ttgggctttatttccaccttaaaatctttacctggccgtgatctgttgttccattactgg
agggcaaaaatgggaggaattgtctgggctaaattgcaattaggcagccctgagagaggc
tggcaccagttaacttgggatattggagtgaaaaggcccgtaatcagccttcggtcatgt
agaacaatgcataaaattaaattgacattaatgaataattgtgtaatgaaaatggaagag
gagagttaattgcatgttacagtgagtgtaatgcctagataaccttgcatttaatgctat
tcttagccctgctgccaagacttctacagagcctctctctgcaggaagtcattaaagctg
tgagtagataatgcaggctcagtgaaacctaagtggcaacaatata
>uce_171_ph_gufac1083 |uce_171
catggaaaacgaggaaaagccatatcttccaggccattaatattactacggagacgtctt
catatcgccgtaattacagcagatctcaaagtggcacaaccaagaccagcaccaaagcta
aaataactcgcaggagcaggcgagctgcttttgcagccctcagtcccagaaatgctcggt
agcttttcttaaaatagacagcctgtaaataaggtctgtgaactcaattgaaggtggctg
tttctgaattagtcagccctcacaaggctctcggcctacatgctagtacataaattgtcc
actttaccaccagacaagaaagattagagtaataaacacggggcattagctcagctagag
aaacacaccagccgttacgcacacgcgggattgccaagaactgttaaccccactctccag
aaacgcacacaaaaaaacaagttaaagccatgacatcatgggaa
>uce_4300_ph_gufac1139 |uce_4300
attaaaaatacaatcctcatgtttgcattttgcagtcgtcaacaagaaattgaagagaaa
ctcatagaggaagaaactgctcgaagggtggaagaacttgtagctaaacgcgtggaagaa
gagctggagaaaagaaaggatgagattgagcgagaggttctccgcagggtggaggaggct
aagcgcatcatggaaaaacagttgctcgaagaactcgagcgacagcgacaagctgaactt
gcagcacaaaaagccagagaggtaacgctcggtcgtttggaaagtagagacagtccatgg
caaaactttcagtgtcggtttgtgcctcctgttcggttcagaaagagatggaatacagca
aatctaattcccttctcatataaacttgcattgctgcgaaacttaatttctagcctattc
agaggagctcactgatatttaaacagttactctcctaaaacctgaacaaggatacttgat
tcttaatggaactgacctacatatttcagaattgtttgaaacttttgccatggctgcagg
attattcagcagtcctttcatttt
>uce_1039_ph_gufac1139 |uce_1039
attagtggaatacaaatatgcaaaaaccaaacagtttggtgctataatgtgaaaagaaat
ttacaccaatcttatttttaatttgtatgggaacatttttaccacaaattccatatttta
ataatactatcccaactctattttttagactcattttgtcactgttttgtaacagaaaca
ctgtaaatattatagatgtggtaaactattatacttgttttcttataaatgaaatgatct
gtgccaacactgacaaaatgaattaatgtgttactaaggcaacagtcacattatatgctt
tctctttcacagtatgcggtagagcatatggtttactcttaatggaacactagcttctca
ttaacataccagtagcaatgtcagaacttacaaaccagcataacagagaaatggaaaaac
ttataaattagaccctttcagtattattgagtagaaaatgactgatgttccaaggtacaa
tatttagctaatacagtgcccttttctgcatctttcttctcaaaggaaaaaaaaatcctc
aaaaaaaaccagagcaagaaacctaactttttcttgt

I tried several alternatives without success. The closest I got was:
sed -n '/ph_gufac1083/, />/p' file.txt

which gave me this:
>uce_2347_ph_gufac1083 |uce_2347
gcttttctatgcagattttttctaattctctccctccccttgcttctgtcagtgtgaagc
ccacactaagcattaacagtattaaaaagagtgttatctattagttcaattagacatcag
acatttactttccaatgtatttgaagactgatttgatttgggtccaatcatttaaaaata
agagagcagaactgtgtacagagctgtgtacagatatctgtagctctgaagtcttaattg
caaattcagataaggattagaaggggctgtatctctgtagaccaaaggtatttgctaata
cctgagatataaaagtggttaaattcaatatttactaatttaggatttccactttggatt
ttgattaagctttttggttgaaaaccccacattattaagctgtgatgagggaaaaagcaa
ctctttcataagcctcactttaacgctttatttcaaataatttattttggaccttctaaa
g
>uce_353_ph_gufac1083 |uce_353
>uce_101_ph_gufac1083 |uce_101
ttgggctttatttccaccttaaaatctttacctggccgtgatctgttgttccattactgg
agggcaaaaatgggaggaattgtctgggctaaattgcaattaggcagccctgagagaggc
tggcaccagttaacttgggatattggagtgaaaaggcccgtaatcagccttcggtcatgt
agaacaatgcataaaattaaattgacattaatgaataattgtgtaatgaaaatggaagag
gagagttaattgcatgttacagtgagtgtaatgcctagataaccttgcatttaatgctat
tcttagccctgctgccaagacttctacagagcctctctctgcaggaagtcattaaagctg
tgagtagataatgcaggctcagtgaaacctaagtggcaacaatata
>uce_171_ph_gufac1083 |uce_171

Does anyone know how to do this using grep, sed or awk?
thx
$ awk '/^>/{if(match($0,"ph_gufac1083")){s=1} else s=0}s' file

This uses a simple flag:
- if a line starts with >, check whether it contains "ph_gufac1083"; if so, set s=1, otherwise set s=0.
- for a line that doesn't start with >, the value of s is retained.
- the final s in the awk command decides whether the line is printed (s=1) or not (s=0).
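
The flag logic of the awk one-liner above can be tried out on a small input. The following is only a sketch: the three-record sample, the temporary file and the variable names tmp, id and result are made up for illustration, and index() is swapped in for match() so the ID is treated as a plain string rather than a regular expression.

```shell
# Hypothetical miniature sample; the IDs and sequences are invented for this demo.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>uce_1_ph_gufac1083 |uce_1
aaaa
cccc
>uce_2_ph_gufac1139 |uce_2
gggg
>uce_3_ph_gufac1083 |uce_3
tttt
EOF

# On every header line, s becomes 1 if the header contains the ID, else 0.
# Non-header lines leave s unchanged; the trailing pattern "s" prints the
# current line whenever s is non-zero.
result=$(awk -v id="ph_gufac1083" '/^>/ { s = (index($0, id) > 0) } s' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With this input, the two ph_gufac1083 records should be printed together with their sequence lines, and the ph_gufac1139 record should be skipped. Passing the ID through -v keeps the awk program itself unchanged when a different code is needed.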