sed, awk, perl or lex: find strings by prefix+regex, ignoring the rest of the input [closed]
Viewed 467 times
-3
I need to find strings with a certain prefix, followed by a regexp, in a bunch of files, but ignore the rest of the input (including the content of the line before the prefix, and after the end of the matching regexp).
What's the best tool for the job? grep finds complete lines; sed is usually used just for editing and search-and-replace; awk? perl? I also thought about lex, but am I really after a compiler compiler?!
Edit: the input is several thousand HTML files; the prefix + regular expression would be https://([-.0-9A-Za-z]+\.[A-Za-z]{2,}) (of which I want $1), and the rest of the input should be ignored.
1 Answer
1
If you won't have more than one match for the pattern on a single line, I'd probably use sed:
sed -n -e 's%.*https://\([-.0-9A-Za-z]\{1,\}\.[A-Za-z]\{2,\}\).*%\1%p'
Given the data file:
Nothing here
Before https://example.com after
https://example.com and after
Before you get to https://www.example.com
And double your https://example.com for fun and happiness https://www.example.com in triplicate https://a.bb
and nothing here
The sed script produces one entry per line, showing the last entry when there's more than one on the line:
example.com
example.com
www.example.com
a.bb
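Only the last URL on a line survives because the leading .* in the sed pattern is greedy and swallows any earlier matches. One possible way around that while still using sed is to let grep isolate each match first and have sed strip the prefix afterwards. This is only a sketch: it assumes a grep that supports the -o and -h options (GNU and BSD grep do, but -o is not in POSIX), and the function name extract_hosts is made up for illustration.

```shell
# Print every host that follows "https://", one per line, even when a
# line holds several URLs. Reads the named files, or stdin if none.
# Assumption: grep supports -o (print only the matched text) and
# -h (suppress file name prefixes); both are extensions to POSIX grep.
extract_hosts() {
    grep -ohE 'https://[-.0-9A-Za-z]+\.[A-Za-z]{2,}' "$@" |
        sed 's%^https://%%'    # drop the prefix, keep the host
}
```

It could be called as extract_hosts data, or fed from a pipe.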
A Perl script can be used for multiple entries per line:
$ perl -nle 'print $1 while (m%https://([-.0-9A-Za-z]+\.[A-Za-z]{2,})%g);' data
example.com
example.com
www.example.com
example.com
www.example.com
a.bb
$
answered Nov 27, 13:05
I have no clue what the html input is; I'd like to be able to find more than one occurrence of my pattern on any given line. - cnst
Example please. What do you mean by "prefix"? - Jim Garrison
"https://" would be the prefix. - cnst
What do you mean by "regexp?" Examples of the strings would help. - Kenosis
Why do you think grep is not a solution? Sounds like it will work just fine with the right expression, but without more details and input samples we're all just guessing. - Jim Garrison
Amended the question. - cnst