Guangcun's Blog http://blog.sciencenet.cn/u/gcshan Visiting scholar at Columbia University; board member, City University of Hong Kong (2011-2012)

awk or gawk command syntax note

3167 views | 2014-2-9 04:44 | Personal category: Science notes | System category: Research notes | awk

awk or gawk (GNU awk)

Find and Replace text, database sort/validate/index

Syntax

      awk <options> 'Program' Input-File1 Input-File2 ...

      awk -f PROGRAM-FILE <options> Input-File1 Input-File2 ...

Key
 -F FS
 --field-separator FS
      Use FS for the input field separator (the value of the `FS'
      predefined variable).

 -f PROGRAM-FILE
 --file PROGRAM-FILE
      Read the awk program source from the file PROGRAM-FILE, instead
      of from the first command line argument.

 -mf NNN
 -mr NNN
      The `f' flag sets the maximum number of fields, and the `r'
      flag sets the maximum record size. These options are ignored by
      `gawk', since `gawk' has no predefined limits; they are only for
      compatibility with the Bell Labs research version of Unix awk.

 -v VAR=VAL
 --assign VAR=VAL
      Assign the variable VAR the value VAL before program execution
      begins.

 -W traditional
 -W compat
 --traditional
 --compat
      Use compatibility mode, in which `gawk' extensions are turned off.

 -W lint
 --lint
      Give warnings about dubious or non-portable awk constructs.

 -W lint-old
 --lint-old
      Warn about constructs that are not available in the original
      Version 7 Unix version of awk.

 -W posix
 --posix
      Use POSIX compatibility mode, in which `gawk' extensions are
      turned off and additional restrictions apply.

 -W re-interval
 --re-interval
      Allow interval expressions in regexps.

 -W source=PROGRAM-TEXT
 --source PROGRAM-TEXT
      Use PROGRAM-TEXT as awk program source code. This option allows
      mixing command line source code with source code from files, and
      is particularly useful for mixing command line programs with
      library functions.

 --
      Signal the end of options. This is useful to allow further
      arguments to the awk program itself to start with a `-'.
      This is mainly for consistency with POSIX argument parsing
      conventions.

 'Program'
      A series of patterns and actions: see below.

 Input-File
      If no Input-File is specified then awk applies the Program to
      "standard input" (piped output of some other command, or the
      terminal). Typed input will continue until end-of-file
      (typing `Control-d').
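As a small illustration of the -F and -v options listed above, here is a sketch that reads from standard input. The colon-separated sample records and the variable name `min` are made up for this example; they do not come from any real file.

```shell
# Sample colon-separated records are piped in, so no input file is needed.
# -F: makes ":" the field separator; -v min=28 assigns the variable
# before the program starts running.
printf 'alice:30\nbob:25\n' |
  awk -F: -v min=28 '$2 >= min { print $1 }'
```

Only the record whose second field is at least `min` is printed, so this should print `alice`.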

Basic functions

The basic function of awk is to search files for lines (or other units of text) that contain a pattern. When a line matches, awk performs a specific action on that line.

The Program tells awk what to do; it consists of a series of "rules". Each rule specifies one pattern to search for and one action to perform when that pattern is found.

For ease of reading, each rule in an awk program is normally written on its own line, like this:

pattern { action }
pattern { action }
...
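Either half of a rule may be left out: a pattern with no action prints the matching record, and an action with no pattern runs for every record. A minimal sketch, using invented sample lines:

```shell
# Two rules, one per line. The sample input is made up for illustration.
printf 'apple 3\nbanana 7\n' |
  awk '
    /banana/                 # pattern only: the default action prints the record
    { print "seen:", $1 }    # action only: runs for every record
  '
```

Rules are tried in order for each record, so the `banana 7` line is printed by the first rule before the second rule reports it.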

e.g. Display lines from samplefile containing the string "123" or "abc" or "some text":

awk '/123/ { print $0 } /abc/ { print $0 } /some text/ { print $0 }' samplefile

A regular expression enclosed in slashes (/) is an awk pattern that matches every input record whose text belongs to that set. e.g. the pattern /foo/ matches any input record containing the three characters `foo', *anywhere* in the record.
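Note that with three separate rules, a line matching more than one pattern is printed once per matching rule. If each line should appear only once, one option is a single regular expression with alternation; this is a sketch on made-up input, not a replacement for the example above:

```shell
# "abc 123" matches two of the alternatives but is printed only once,
# because there is a single rule. The sample lines are invented.
printf 'abc 123\nplain\nsome text\n' |
  awk '/123|abc|some text/ { print $0 }'
```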

awk patterns may be one of the following:

      /Regular Expression/        - Match
      Pattern && Pattern          - AND
      Pattern || Pattern          - OR
      ! Pattern                   - NOT
      Pattern ? Pattern : Pattern - If, Then, Else
      Pattern1, Pattern2          - Range Start - end
      BEGIN                       - Perform action BEFORE input file is read
      END                         - Perform action AFTER input file is read
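A short sketch of how a range pattern and the AND / NOT operators behave, run on four invented input lines (the labels `range:` and `and-not:` are only there to show which rule fired):

```shell
printf '1 start\n2 mid\n3 end\n4 tail\n' |
  awk '
    /start/, /end/        { print "range:", $0 }    # from a "start" line through an "end" line
    $1 > 1 && !/tail/     { print "and-not:", $0 }  # first field above 1, and no "tail"
  '
```

The range includes both its first and last line, so lines 1 to 3 are reported by the first rule, while the second rule fires only for lines 2 and 3.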

The special patterns BEGIN and END may be used to capture control before the first input line is read and after the last. BEGIN and END do not combine with other patterns.
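A typical use of BEGIN and END is to print a header and a summary around per-line processing. A minimal sketch that sums a column of made-up numbers (the variable name `total` is arbitrary):

```shell
printf '10\n20\n30\n' |
  awk '
    BEGIN { print "summing" }          # runs once, before any input is read
          { total += $1 }              # runs for every input line
    END   { print "total =", total }   # runs once, after the last line
  '
```

This should print `summing` followed by `total = 60`.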

Variable names with special meanings:

      CONVFMT   conversion format used when converting numbers (default %.6g)
      FS        regular expression used to separate fields; also settable by option -Ffs.
      NF        number of fields in the current record
      NR        ordinal number of the current record
      FNR       ordinal number of the current record in the current file
      FILENAME  the name of the current input file
      RS        input record separator (default newline)
      OFS       output field separator (default blank)
      ORS       output record separator (default newline)
      OFMT      output format for numbers (default %.6g)
      SUBSEP    separates multiple subscripts (default 034)
      ARGC      argument count, assignable
      ARGV      argument array, assignable; non-null members are taken as filenames
      ENVIRON   array of environment variables; subscripts are names.
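To see a few of these variables at work, the sketch below sets FS and OFS in a BEGIN rule and prints NR, NF, the first field and the last field of each record. The comma-separated input is invented for the example.

```shell
printf 'a,b,c\nd,e\n' |
  awk 'BEGIN { FS = ","; OFS = "-" } { print NR, NF, $1, $NF }'
```

Because the items in `print` are separated by commas, they are joined with OFS, so the output should be `1-3-a-c` and `2-2-d-e`.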

In addition to simple pattern matching, awk has a wide range of text and arithmetic functions, variables and operators.
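As a taste of the built-in string functions, here is a sketch using gsub, toupper and length, which are part of POSIX awk. The input line is made up, and the variable `n` is just a name chosen for the example.

```shell
# gsub replaces every "o" in the record and returns the number of
# replacements; the fields are then re-split from the modified record.
printf 'hello world\n' |
  awk '{ n = gsub(/o/, "0"); print toupper($1), length($2), n }'
```

This should print `HELL0 5 2`: the upper-cased first field, the length of the second field, and the number of substitutions made.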




https://blog.sciencenet.cn/blog-417402-765785.html
