Regex to read URL from ASPX File PowerShell
I'm writing a PowerShell script that extracts URLs from ASPX files and tests whether the HTTP status code equals 200.
I found the following regex for URLs:
$regex = "(http[s]?|[s]?ftp[s]?)(:\/\/)([^\s,]+)"
Select-String -Path $path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }
But the output looks like this:
https://code.jquery.com/ui/1.9.0/themes/base/jquery-ui.css"/>
https://code.jquery.com/ui/1.11.4/jquery-ui.min.js"></script>
As you can see, it doesn't trim the HTML tag remnants from the end of each match.
How can I edit the regex to get the URL without the HTML tags at the end?
If you have a look at the [^\s,] negated character class, you will see it matches any char except whitespace and a comma. If you look at the input you have, you'll notice that ", <, and > can all be matched by [^\s,].
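To make the over-matching concrete, here is a minimal sketch that runs the original pattern against a single line. The `<link ...>` markup in `$line` is an assumption reconstructed from the output shown in the question, not the asker's actual file.

```powershell
# Hypothetical input line, reconstructed from the question's output
$line = '<link rel="stylesheet" href="https://code.jquery.com/ui/1.9.0/themes/base/jquery-ui.css"/>'

# The original pattern: [^\s,]+ stops only at whitespace or a comma
$old = '(http[s]?|[s]?ftp[s]?)(:\/\/)([^\s,]+)'

# Greedy matching runs past the closing quote and into the tag delimiters
$oldMatch = [regex]::Match($line, $old).Value
$oldMatch
# https://code.jquery.com/ui/1.9.0/themes/base/jquery-ui.css"/>
```

Because there is no whitespace or comma between the end of the URL and the end of the tag, nothing tells the engine to stop at the closing quote.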
A fix for the current situation is to add the <, >, and " chars to the negated character class, which makes the regex engine "stop" when it comes across any of them.
Note that since you extract whole matches, you may refactor the pattern a bit: remove the unnecessary groupings and turn the first one into a non-capturing group:
$regex = '(?:http|s?ftp)s?://[^\s,<>"]+'
Mind that in .NET patterns, / does not need to be escaped (it is not a special regex metacharacter/operator).
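As a quick check, the refactored pattern can be tried on an in-memory sample. This is only a sketch: the two HTML lines in `$html` are assumed markup built around the URLs from the question, and `[regex]::Matches` stands in for `Select-String -AllMatches` so that no file is needed.

```powershell
# Refactored pattern: <, > and " now terminate the match
$regex = '(?:http|s?ftp)s?://[^\s,<>"]+'

# Hypothetical sample markup (not the asker's real ASPX content)
$html = @'
<link rel="stylesheet" href="https://code.jquery.com/ui/1.9.0/themes/base/jquery-ui.css"/>
<script src="https://code.jquery.com/ui/1.11.4/jquery-ui.min.js"></script>
'@

# Collect the whole-match values, as the original pipeline does with $_.Value
$urls = [regex]::Matches($html, $regex) | ForEach-Object { $_.Value }
$urls
# https://code.jquery.com/ui/1.9.0/themes/base/jquery-ui.css
# https://code.jquery.com/ui/1.11.4/jquery-ui.min.js
```

One caveat with this approach: attributes quoted with single quotes would still leave a trailing ' on the match, since that character is not in the negated class. If your files use that style, adding ' to the class is a possible extension.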