Common Log Format
March 2004 | Fredrik Lundh
Here’s a simple regular expression that can be used to parse server log files written in the Common Log Format.
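import re

p = re.compile(
    # raw string, so that \[ and \] reach the regex engine unchanged
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
)
for line in file.readlines():  # file is an open log file object
    m = p.match(line)
    if not m:
        continue  # skip lines that don't match the format
    host, ignore, user, date, request, status, size = m.groups()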
    ...
Here’s a variation that parses the Extended Common Log Format (called the combined log format in Apache), which contains additional referrer and user-agent fields.
p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    r' "([^"]*)" "([^"]*)"'  # extensions
)
for line in file.readlines():
    m = p.match(line)
    if not m:
        continue
    # parentheses let the unpacking target span two lines
    (host, ignore, user, date, request, status, size,
     referer, agent) = m.groups()
    ...
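All the groups come back as strings, so a script usually converts a few of them before use. Here’s a minimal sketch of one way to do that for the extended pattern. The `parse_line` helper, the dictionary keys and the sample line are illustrative assumptions, not part of the original recipe. It also assumes the usual `day/month/year:hour:minute:second zone` timestamp layout, and that a `-` in the size field means no bytes were sent.

```python
import re
from datetime import datetime

# Same extended pattern as above.
p = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^"]*)" ([^ ]*) ([^ ]*)'
    r' "([^"]*)" "([^"]*)"'
)

# Illustrative sample line (not taken from a real log).
SAMPLE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

def parse_line(line):
    """Hypothetical helper: return converted fields, or None if no match."""
    m = p.match(line)
    if not m:
        return None
    (host, ignore, user, date, request, status, size,
     referer, agent) = m.groups()
    return {
        "host": host,
        "user": None if user == "-" else user,
        # %b uses locale month names; this assumes English abbreviations.
        "time": datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z"),
        "request": request,
        "status": int(status),
        # Assumption: "-" in the size field means no bytes were sent.
        "size": 0 if size == "-" else int(size),
        "referer": referer,
        "agent": agent,
    }

record = parse_line(SAMPLE)
print(record["host"], record["status"], record["size"])  # 127.0.0.1 200 2326
```

Returning `None` for a line that doesn’t match mirrors the `continue` in the loops above, so the caller decides whether to skip or report malformed lines.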