warc cat

Concatenate and print warc files

warc cat FILE/DIR ... [flags]

Examples

Print all content from a WARC file
warc cat file1.warc.gz

# Pipe payload from record #4 into the image viewer feh
warc cat -n4 -P file1.warc.gz | feh -

Options

  -z, --compress                  output is compressed (per record)
      --ftp-pool-size int32       size of the ftp pool (default 1)
  -w, --header                    show WARC header
  -h, --help                      help for cat
      --id strings                filter record ID's. For more than one, repeat flag or comma separated list.
  -i, --input-file string         input file (system). Default is to use OS file system.
                                  Legal values:
                                  	/path/to/archive.( tar | tar.gz | tgz | zip | wacz )
                                  	ftp://user/pass@host:port
                                  
  -l, --limit int                 The maximum number of records to show. Defaults to show all records.
                                  If -o or -n option is set limit is set to 1.
  -m, --mime-type strings         filter records with given mime-types. For more than one, repeat flag or comma separated list.
  -n, --num int                   print the n'th record. Only records that are not filtered out by other options are counted.
  -o, --offset int                record offset
  -P, --payload                   show payload
  -p, --protocol-header           show protocol header
  -t, --record-type strings       filter records by type. For more than one, repeat the flag or use a comma separated list.
                                  Legal values:
                                  	warcinfo, request, response, metadata, revisit, resource, continuation and conversion
  -r, --recursive                 walk directories recursively
  -S, --response-code string      filter records by http response code
                                  Example:
                                  	200	- only records with a 200 response
                                  	200-300	- records with response codes between 200 (inclusive) and 300 (exclusive)
                                  	500-	- response codes from 500 and above
                                  	-400	- all response codes below 400
      --source-file-list string   a file containing a list of files to process, one file per line
      --suffixes strings          filter files by suffix (default [.warc,.warc.gz])
  -s, --symlinks                  follow symlinks
      --tmpdir string             directory to use for temporary files (default "/tmp")

Options inherited from parent commands

      --config string       config file. If not set, $XDG_CONFIG_DIRS, /etc/xdg/warc $XDG_CONFIG_HOME/warc and the current directory will be searched for a file named 'config.yaml'
  -O, --log-file string     log to file (default "-")
      --log-format string   log format. Valid values: text, json (default "text")
      --log-level string    log level. Valid values: debug, info, warn, error (default "info")

SEE ALSO

  • warc - A tool for handling warc files