AWK is a text-processing, data-extraction, and pattern-scanning programming language that is standard on Unix and Unix-like operating systems such as Linux. It reads input line by line, tests each line against pattern rules, and runs the action attached to every rule that matches.
AWK was designed and implemented in 1977 by three legendary computer scientists: Alfred Aho, Peter Weinberger, and Brian Kernighan. The name of the language is an acronym derived from the first letters of their last names.
The standard structure consists of a pattern definition followed by an action enclosed inside curly braces:
awk 'pattern { action }' file
A pattern is a filtering condition. AWK loops through the input lines and runs the action only for lines that match the pattern (e.g., a regular expression, a numeric comparison, or a target word).
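As a minimal sketch of the pattern types mentioned above, the commands below apply a regular-expression pattern and a numeric comparison to three made-up log lines. The sample data and the shell variable names are illustrative assumptions, not part of any real file.

```shell
# Hypothetical sample input: level, service, response time in ms
sample='error db 120
info web 45
error web 300'

# Regex pattern with no action: matching lines are printed unchanged
regex_hits=$(printf '%s\n' "$sample" | awk '/error/')

# Numeric comparison as the pattern, with an explicit action
slow_services=$(printf '%s\n' "$sample" | awk '$3 > 100 { print $2 }')

printf '%s\n' "$regex_hits" "$slow_services"
```

The first command prints the two lines containing "error"; the second prints only the service name from lines whose third field exceeds 100.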
An action is a sequence of statements enclosed in curly brackets { ... }. When a pattern is matched, AWK runs these statements (e.g., calculations, custom formatting, or text outputs).
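To illustrate an action that holds more than one statement, here is a small sketch that computes a value and then prints it. The order data and variable names are invented for the example.

```shell
# Hypothetical sample input: item name, quantity, unit price
orders='pen 3 2
book 1 15'

# The action contains two statements separated by a semicolon:
# compute a line total, then print a formatted result
totals=$(printf '%s\n' "$orders" | awk '{ total = $2 * $3; print $1 " costs " total }')

printf '%s\n' "$totals"
```

Because no pattern is given, the action runs for every input line.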
In AWK, $0 is a special variable that holds the entire current input record (line) in its original, unsplit form:
awk '{ print $0 }' file.txt
AWK splits each record into fields (columns) automatically. $1 holds the first field, $2 the second, and so on:
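awk '{ print $1, $2 }' file.txt
FS stands for Field Separator. It defines how each incoming record is split into fields. The default FS is a single space, which AWK treats specially: fields are separated by runs of spaces and tabs, and leading and trailing whitespace is ignored: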
awk 'BEGIN { FS=":" } { print $1 }' file.txt
OFS stands for Output Field Separator. It specifies the delimiter printed between the comma-separated arguments of a print statement:
awk 'BEGIN { OFS=" | " } { print $1, $2 }' file.txt
NR stands for Number of Records. It holds the count of records read so far, so it works like a running line counter:
awk '{ print NR, $0 }' file.txt
NF is a built-in variable containing the number of fields in the current record. It is useful for checking line structure and for reaching the last field:
awk '{ print NF }' file.txt
The BEGIN block represents an initialization phase. Statements placed inside a BEGIN block execute exactly once before AWK reads any lines of the input file:
awk 'BEGIN { print "Processing Start..." } { print $1 }' file.txt
The END block runs exactly once after AWK has finished processing all input records. It is typically used for printing final summaries, calculated totals, or averages:
awk '{ print $1 } END { print "Total Lines Processed:", NR }' file.txt
By default, calling print with no arguments outputs the entire current record:
awk '{ print }' file.txt
You can print specific fields by passing field references such as $1 or $2 to print:
awk '{ print $2 }' file.txt
Use the -F command-line option to specify a custom field separator:
awk -F',' '{ print $1, $3 }' file.csv
Place the word between regular-expression slashes (/pattern/). When no action block follows the pattern, AWK prints every matching line:
awk '/error/' file.txt
Use comparison operators (>, <, ==) to test field values:
awk '$3 > 100' file.txt
Printing NR inside an END block gives the total number of lines processed:
awk 'END { print NR }' file.txt
Accumulate the field values in a variable, then print the total in the END block:
awk '{ sum += $2 } END { print sum }' file.txt
Printing NR alongside $0 prefixes each record with its line number:
awk '{ print NR, $0 }' file.txt
Access the last field with $NF. Because NF holds the number of fields, $NF refers to the field at that position:
awk '{ print $NF }' file.txt
Use the built-in gsub() function to replace every match on each line:
awk '{ gsub(/old/, "new"); print }' file.txt
Check whether the record number NR is divisible by 2 using the modulus operator, which selects the even-numbered lines:
awk 'NR % 2 == 0' file.txt
Select the odd-numbered lines by testing for a remainder of 1 when NR is divided by 2:
awk 'NR % 2 == 1' file.txt
RS is the Record Separator. It controls how input is divided into records (the default is a newline, \n). Setting RS to the empty string ("") enables paragraph mode, in which blank lines separate records:
awk 'BEGIN { RS="" } { print $1 }' file.txt
Place string or numeric comparisons as patterns before action blocks to process lines selectively:
awk '$2 == "Admin"' file.txt
Use standard if-else statements inside action blocks to create branching logic:
awk '{ if ($3 > 50) print $1, "Passed"; else print $1, "Failed" }' file.txt
Use standard loops (for, while) inside action blocks to walk through the fields of a record:
awk '{ for (i = 1; i <= NF; i++) print "Field", i, "is", $i }' file.txt
Use printf to control column widths, padding, alignment, and custom layouts:
awk '{ printf "%-10s %5d\n", $1, $2 }' file.txt
An associative array is a map indexed by string keys rather than numeric positions. It lets you build lookup tables keyed on terms or identifiers and store counts or other values against them.
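The lookup-table use of an associative array can be sketched as follows. The status codes, labels, and the array name label are hypothetical values chosen for illustration.

```shell
# Hypothetical lookup table mapping status codes to labels
labels=$(printf '%s\n' 200 404 500 | awk '
BEGIN {
    label["200"] = "OK"          # keys are strings, not positions
    label["404"] = "Not Found"
}
{
    # The in operator tests for a key without creating it as a side effect
    if ($1 in label) print $1, label[$1]
    else             print $1, "unknown"
}')

printf '%s\n' "$labels"
```

Testing membership with in, rather than reading label[$1] directly, avoids adding an empty entry for every key that is looked up.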
By using field values as keys in an associative array, you can count how often each value occurs:
awk '{ count[$1]++ } END { for (val in count) print val, count[val] }' file.txt
The getline function reads the next line of input immediately, which lets you control line iteration manually inside your own control logic.
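A minimal sketch of manual iteration with getline, assuming a made-up input in which every key line is followed by its value line. The variable names key and value are illustrative.

```shell
# Hypothetical input where each key line is followed by its value line
pairs=$(printf '%s\n' name alice city paris | awk '{
    key = $0
    # "getline value" reads the next record into value and returns 1 on success
    if ((getline value) > 0) print key "=" value
}')

printf '%s\n' "$pairs"
```

Checking the return value matters: getline returns 0 at end of input, so an unpaired final line is skipped instead of being printed with a stale value.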
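In GNU awk (gawk), set IGNORECASE = 1 in a BEGIN block to make regular expressions case-insensitive. IGNORECASE is a gawk extension; other AWK implementations treat it as an ordinary variable with no special effect: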
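awk 'BEGIN { IGNORECASE = 1 } /error/' file.txt
Test NR == 1 to catch the first line, then use the END block to print the last record. This relies on $0 still holding the last record inside END, which is the case in gawk, mawk, and other current implementations but not in some older ones: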
awk 'NR == 1 { print "First:", $0 } END { print "Last:", $0 }' file.txt
Use the -v option to assign an AWK variable before the program starts running. This is the usual way to pass a shell value into a script:
awk -v threshold=100 '$3 > threshold' file.txt
Use the condition NR == FNR, which is true only while the first file is being read, to cache that file in an array, then look up each line of the second file against it:
awk 'NR == FNR { a[$1] = $2; next } { print $0, a[$1] }' file1 file2
The gsub() function replaces every occurrence of a pattern in the target string. It modifies the target variable in place and returns the number of replacements made.
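The in-place behavior of gsub() can be seen in a short sketch. The input string and the variable name n are assumptions made for the example; the optional third argument selects a target other than $0.

```shell
# gsub() returns the number of replacements; the optional third argument
# names the target (here field 2) instead of the default $0
gsub_out=$(echo 'a-b a-b-c' | awk '{ n = gsub(/-/, "+", $2); print n, $0 }')

printf '%s\n' "$gsub_out"
```

Only the second field is changed, and the printed count reflects the two hyphens replaced there; the first field keeps its hyphen.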
Unlike gsub(), the sub() function is non-global. It replaces only the first occurrence of the matched pattern in the target string.
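A side-by-side sketch of the difference, run on the same made-up input line:

```shell
# sub() stops after the first match; gsub() replaces every match
first_only=$(echo 'old old old' | awk '{ sub(/old/, "new"); print }')
every_match=$(echo 'old old old' | awk '{ gsub(/old/, "new"); print }')

printf '%s\n' "$first_only" "$every_match"
```

With sub() only the leading "old" changes, while gsub() rewrites all three.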
Using NF as a pattern skips blank lines. A blank line has zero fields, so NF evaluates as false and the line is not printed:
awk 'NF' file.txt
Extract a column with AWK, then pipe the output into the sort utility:
awk '{ print $2 }' file.txt | sort
Find the maximum of a column by comparing each value against the largest one seen so far:
awk 'NR == 1 || $3 > max { max = $3 } END { print "Max:", max }' file.txt
Store every line in an array, then iterate backwards over the array in the END block:
awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }' file.txt
Remove duplicate lines with an associative array. The pattern is true only the first time a given line is seen:
awk '!seen[$0]++' file.txt
Count words by accumulating the NF value of each input record:
awk '{ total += NF } END { print "Words:", total }' file.txt
Print the records in a range, from a line matching the starting pattern through the next line matching the ending pattern:
awk '/start/,/end/' file.txt
Split simple comma-separated input by setting the field separator to a comma. This does not handle quoted fields that themselves contain commas:
awk -F',' '{ print $2 }' file.csv
Convert delimiters by setting FS and OFS in a BEGIN block. The assignment $1=$1 forces AWK to rebuild the record with the new OFS:
awk 'BEGIN { FS=","; OFS="|" } { $1=$1; print }' file.csv
Loop over the fields of every line and count the ones that exactly match the target word:
awk '{ for (i=1; i<=NF; i++) if ($i == "target") count++ } END { print "Occurrences:", count + 0 }' file.txt
Print each distinct value of a column once, skipping repeated entries:
awk '!seen[$1]++ { print $1 }' file.txt