The Linux sed Command

2017-07-14|Categories: awk-sed-grep, External cmd, Linux|

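Regex matching has no "lazy mode"

sed supports POSIX-style regular expressions (BRE and ERE) but not Perl-style ones (PCRE), so it also lacks the lazy (non-greedy) quantifiers that are specific to Perl regular expressions. For details, see my notes on regular expressions (《正则表达式学习笔记》).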

The following example comes from http://coolshell.cn/articles/9104.html

Example: strip the tags from an HTML file. Here html.txt contains this line:

<b>This</b> is what <span style="text-decoration: underline;">I</span> meant. Understand?

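If you try it the following way, you run into trouble, because .* matches greedily: it runs from the first < to the last > on the line.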

$ sed 's/<.*>//g' html.txt
 meant. Understand?

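To fix this, use <[^>]*> instead: any character other than the greater-than sign >, repeated zero or more times. In other words, each match stops at the first > it meets, and the replacement is applied there: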

$ sed 's/<[^>]*>//g' html.txt
This is what I meant. Understand?

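This removes the HTML tags from the line and leaves only the plain text. It relies on every tag opening and closing on the same line and on no > appearing inside an attribute value.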
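The contrast between the two patterns can be reproduced without creating html.txt. The snippet below is a minimal sketch that feeds the sample line through both commands; the variable names (line, greedy, stripped) are only illustrative and do not come from the original article.

```shell
# Sample line from the example above, held in a variable instead of html.txt.
line='<b>This</b> is what <span style="text-decoration: underline;">I</span> meant. Understand?'

# Greedy: <.*> spans from the first '<' to the last '>' on the line.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//g')

# Negated bracket expression: each match ends at the nearest '>'.
stripped=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

# Brackets make the leading space in the greedy result visible.
printf '[%s]\n[%s]\n' "$greedy" "$stripped"
```

The bracket expression [^>] is plain POSIX syntax, so the same command should behave identically with and without -E.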

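The substitute command accepts other delimiters

When I first started using sed, I did not know that the s command can take a delimiter other than the slash. I escaped everything with backslashes \ and ended up dizzy from the resulting "sawtooth"!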

Quoted from http://backreference.org/2010/02/20/using-different-delimiters-in-sed/

What if, in sed, you have lots of slashes in the pattern and/or replacement?

One solution is to escape them all (the so-called sawtooth effect):

sed 's/\/a\/b\/c\//\/d\/e\/f\//'    # change "a/b/c/" to "d/e/f/"

But that is ugly and unreadable. It's a little-known fact that sed can use almost any character as the separator for the "s" command (anything other than a backslash or a newline). Basically, sed takes whatever follows the "s" as the separator. So, our code above can be rewritten, for example, in one of the following ways:

sed 's_/a/b/c/_/d/e/f/_'
sed 's;/a/b/c/;/d/e/f/;'
sed 's#/a/b/c/#/d/e/f/#'
sed 's|/a/b/c/|/d/e/f/|'
sed 's /a/b/c/ /d/e/f/ '       # yes, even space
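As a concrete illustration, here is a small sketch that applies an alternative delimiter to a real-looking path. The path /usr/local/bin/tool and the variable names are made up for the example. It also shows the one catch: if the chosen delimiter occurs inside the pattern, it has to be escaped with a backslash.

```shell
# Hypothetical path used only for illustration.
path='/usr/local/bin/tool'

# '#' as the delimiter: the slashes in the pattern need no escaping.
moved=$(printf '%s\n' "$path" | sed 's#/usr/local#/opt#')

# When the delimiter itself appears in the pattern, escape it as \#.
label=$(printf 'issue #42\n' | sed 's#\#42#no. 42#')

printf '%s\n%s\n' "$moved" "$label"
```

Choosing a delimiter that does not occur in either the pattern or the replacement avoids the escaping altogether.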

An even less-known fact is that you can use a different delimiter even for patterns used in addresses, using a special syntax:

# do this (ugly)...
sed '/\/a\/b\/c\//{do something;}'

# ...or these (better)
sed '\#/a/b/c/#{do something;}'
sed '\_/a/b/c/_{do something;}'
sed '\%/a/b/c/%{do something;}'
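The "do something" placeholder above can be filled in with any sed command. The following sketch, with invented input lines and variable names, uses the \# address form first with p to print matching lines, and then together with an s command that uses a different delimiter of its own.

```shell
# Three made-up input lines; two of them start with /a/b/c/.
input='/a/b/c/ first
/x/y/z/ second
/a/b/c/ third'

# \# opens a regex address delimited by '#'; with -n, p prints only matches.
matched=$(printf '%s\n' "$input" | sed -n '\#/a/b/c/#p')

# The address and the s command can each use their own delimiter.
renamed=$(printf '%s\n' "$input" | sed '\#^/a/b/c/#s|/a/b/c/|/d/e/f/|')

printf '%s\n---\n%s\n' "$matched" "$renamed"
```

The leading backslash is required only for the address; the s command picks up its delimiter from the character that follows it.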
