warc ls
List WARC record fields
Synopsis
List information about WARC records
warc ls FILE/DIR ... [flags]
Options
-c, --concurrency int number of input files to process simultaneously. (default 6)
-d, --delimiter string field delimiter (default " ")
-F, --fields string which fields to include in the output
Field specification letters are mostly the same as the fields in the CDX file specification (https://iipc.github.io/warc-specifications/specifications/cdx-format/cdx-2015/).
The following fields are supported:
a - original URL
b - date in 14-digit format
B - date in RFC3339 format
e - IP address
g - filename
h - original host
i - record id
k - checksum
m - document mime type
s - http response code
S - record size
T - record type
V - offset
A number after the field letter restricts the field length. A + or - sign before the number pads the field to exactly that length: + right-aligns the value and - left-aligns it.
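The width and alignment rules can be illustrated with the shell's own printf, which has comparable precision and padding semantics. This is only a sketch of the described behaviour, not the tool's implementation: the helper name show_field is hypothetical, and it assumes that a signed width also truncates values longer than the given length.

```shell
# show_field SPEC VALUE
# SPEC is the part that follows a field letter: "8", "+8" or "-8".
# Hypothetical helper that mimics the documented rules with printf.
show_field() {
  spec=$1
  value=$2
  case $spec in
    +*) n=${spec#+}; printf "%${n}.${n}s\n" "$value" ;;   # right-aligned, exact length
    -*) n=${spec#-}; printf "%-${n}.${n}s\n" "$value" ;;  # left-aligned, exact length
    *)  printf "%.${spec}s\n" "$value" ;;                 # only restricts the length
  esac
}
```

Under these assumptions, show_field 8 example.com cuts the value to its first 8 characters, while show_field +8 abc and show_field -8 abc pad abc with five spaces on the left and on the right respectively.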
--ftp-pool-size int32 size of the ftp pool (default 1)
-h, --help help for ls
--id strings filter by record ID. For more than one, repeat the flag or use a comma-separated list.
-i, --input-file string input file (system). Defaults to the OS file system.
Legal values:
/path/to/archive.( tar | tar.gz | tgz | zip | wacz )
ftp://user/pass@host:port
--json output as JSON lines
-l, --limit int the maximum number of records to show. Defaults to showing all records.
If -o or -n is set, the limit is set to 1.
-m, --mime-type strings filter records by MIME type. For more than one, repeat the flag or use a comma-separated list.
-n, --num int print the nth record. Only records that are not filtered out by other options are counted.
-o, --offset int record offset
-t, --record-type strings filter records by type. For more than one, repeat the flag or use a comma-separated list.
Legal values:
warcinfo, request, response, metadata, revisit, resource, continuation and conversion
-r, --recursive walk directories recursively
-S, --response-code string filter records by HTTP response code
Examples:
200 - only records with a 200 response
200-300 - records with response codes between 200 (inclusive) and 300 (exclusive)
500- - response codes from 500 and above
-400 - all response codes below 400
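The range syntax reads as a half-open interval: the lower bound is included, the upper bound is not, and either bound may be omitted. The function below is a sketch of that reading in plain shell, not the tool's implementation; the name matches_code is hypothetical.

```shell
# matches_code SPEC CODE
# Succeeds (exit status 0) when CODE satisfies SPEC, where SPEC is one of
# "200", "200-300", "500-" or "-400" as in the examples above.
matches_code() {
  spec=$1
  code=$2
  case $spec in
    *-*)
      lo=${spec%%-*}   # text before the dash, empty for "-400"
      hi=${spec#*-}    # text after the dash, empty for "500-"
      [ -z "$lo" ] || [ "$code" -ge "$lo" ] || return 1   # lower bound is inclusive
      [ -z "$hi" ] || [ "$code" -lt "$hi" ] || return 1   # upper bound is exclusive
      return 0
      ;;
    *)
      [ "$code" -eq "$spec" ]   # a single code must match exactly
      ;;
  esac
}
```

With this reading, matches_code 200-300 299 succeeds and matches_code 200-300 300 fails, which matches the inclusive and exclusive bounds stated in the examples.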
--source-file-list string a file containing a list of files to process, one file per line
--strict strict parsing
--suffixes strings filter files by suffix (default [.warc,.warc.gz])
-s, --symlinks follow symlinks
--tmpdir string directory to use for temporary files (default "/tmp")
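Several filters can be given in one invocation. As a sketch that uses only the flags documented above, the hypothetical wrapper list_ok_html lists HTML response records with a 2xx status as JSON lines. It assumes that the filters are combined so that a record must pass all of them; crawl.warc.gz in the usage note is a placeholder file name.

```shell
# Hypothetical wrapper around `warc ls`.
# Arguments are passed through as the FILE/DIR operands (or extra flags).
list_ok_html() {
  warc ls --json \
    --record-type response \
    --mime-type text/html \
    --response-code 200-300 \
    "$@"
}
```

It would be called as list_ok_html crawl.warc.gz, or with -r and a directory to walk the directory recursively.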
Options inherited from parent commands
--config string config file. If not set, $XDG_CONFIG_DIRS, /etc/xdg/warc, $XDG_CONFIG_HOME/warc and the current directory will be searched for a file named 'config.yaml'
-O, --log-file string log to file (default "-")
--log-format string log format. Valid values: text, json (default "text")
--log-level string log level. Valid values: debug, info, warn, error (default "info")
SEE ALSO
- warc - A tool for handling warc files