s3log v0.2.0

Utilities for the S3 access log files

Usage

Usage: s3log [options] <command>

command:
  json    convert the log file to json
  sample  show sample data
  schema  show field names in json
  
options:
  --clickhouse  ClickHouse mode.
  --fail-fast   Abort the run on first failure.
  --help        Show this help.
  --version     Print the version and exit.

JSONify

Convert a log file to JSON, where LOG_FILE is an S3 access log file named like 2018-08-29-16-33-....

$ cat LOG_FILE
abc backet [29/Aug/2018:15:00:36 +0000] 0.0.0.0 arn:aws:sts::1:role 1 REST.HEAD.OBJECT key "HEAD /path HTTP/1.1" 200 - - 0 10 - "-" "agent" -
...

$ s3log json LOG_FILE
{"bucketOwner":"abc","bucket":"backet","timestamp":"29/Aug/20...

$ s3log json LOG_FILE | jq '.requestScheme'
"HEAD"
"GET"
"HEAD"
...
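To make the conversion concrete, here is a minimal Python sketch of turning one access log line into a JSON object. It assumes the classic S3 server access log layout, where fields are separated by spaces, the time is wrapped in [brackets], and the Request-URI, Referer and User-Agent are double-quoted. Only bucketOwner, bucket, timestamp and requestScheme come from the s3log output above; the other field names, the helper functions, and the idea that requestScheme is the HTTP method taken from the Request-URI are assumptions for illustration, not s3log's actual implementation.

```python
import json
import re

# A token is a [bracketed timestamp], a "double-quoted string",
# or a run of non-space characters.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# First 18 fields of the S3 server access log format, in order.
# bucketOwner, bucket and timestamp appear in s3log's output;
# the remaining names are hypothetical and may differ from s3log's schema.
FIELDS = [
    "bucketOwner", "bucket", "timestamp", "remoteIp", "requester",
    "requestId", "operation", "key", "requestUri", "httpStatus",
    "errorCode", "bytesSent", "objectSize", "totalTime",
    "turnAroundTime", "referrer", "userAgent", "versionId",
]


def unwrap(token):
    """Strip the surrounding [] or "" from a bracketed or quoted token."""
    if len(token) >= 2 and token[0] + token[-1] in ("[]", '""'):
        return token[1:-1]
    return token


def parse_line(line):
    """Map one access log line to a dict keyed by FIELDS."""
    values = [unwrap(t) for t in TOKEN.findall(line)]
    record = dict(zip(FIELDS, values))  # extra trailing fields are ignored
    # Assumption: requestScheme is the HTTP method at the start of the Request-URI.
    record["requestScheme"] = record.get("requestUri", "").split(" ")[0]
    return record


def to_json_lines(lines):
    """Yield one compact JSON object per non-blank line (JSONEachRow style)."""
    for line in lines:
        if line.strip():
            yield json.dumps(parse_line(line), separators=(",", ":"))
```

For example, `for row in to_json_lines(open(path)): print(row)` prints one object per log line, which is the shape ClickHouse's JSONEachRow input format expects. Quoted values that themselves contain escaped quotes are not handled by this simple tokenizer.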

Play with ClickHouse

### create table named "s3logs"
$ s3log schema --clickhouse | clickhouse-client

### insert
$ s3log json --clickhouse LOG_FILE | clickhouse-client --query="INSERT INTO s3logs FORMAT JSONEachRow"

### play
:) select requestScheme,count(*) from s3logs group by requestScheme;
┌─requestScheme─┬─count()─┐
│ GET           │    2284 │
│ PUT           │       1 │
│ HEAD          │    1070 │
│ DELETE        │       1 │
└───────────────┴─────────┘
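The query above counts rows per HTTP method. As a hedged cross-check on a small file, the same GROUP BY can be computed directly over the JSON lines that `s3log json` prints, without ClickHouse. The function name `count_by_scheme` is hypothetical; it only relies on each line being one JSON object with a requestScheme key, as in the jq example earlier.

```python
import json
from collections import Counter


def count_by_scheme(json_lines):
    """Count JSON records per requestScheme, like
    SELECT requestScheme, count(*) ... GROUP BY requestScheme."""
    counts = Counter()
    for line in json_lines:
        if line.strip():  # skip blank lines
            counts[json.loads(line).get("requestScheme")] += 1
    return counts
```

Piping `s3log json LOG_FILE` into a script that feeds stdin to `count_by_scheme` should give the same per-method counts as the ClickHouse query for that file.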

Production use with ClickHouse

In a production environment, it is more convenient to split the logs into smaller tables than to keep them all in a single table, because a smaller table can be replaced on its own when its data needs to be updated.

For example, suppose that the 2018-08-29/ directory contains the log files for 2018-08-29. We load them into the table s3logs_20180829 and then query them through the merge table s3logs.
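A ClickHouse table with the Merge engine stores no data of its own; it reads from every table in a database whose name matches a regular expression, declared like `ENGINE = Merge(currentDatabase(), '^s3logs_')`. The pattern that `s3log schema --merge` actually generates is not shown in this README. Assuming a pattern like `^s3logs_` (a hypothetical value, as is the helper name `merged_tables`), this Python sketch checks which table names it would pick up, and shows why the `tmp_` staging tables used during import stay out of query results.

```python
import re

# Hypothetical pattern: assumes the merge table selects tables whose names
# start with "s3logs_". Check the DDL printed by `s3log schema` for the real one.
MERGE_PATTERN = re.compile(r"^s3logs_")


def merged_tables(table_names):
    """Return the tables a Merge table using MERGE_PATTERN would read from."""
    return [name for name in table_names if MERGE_PATTERN.search(name)]
```

With this pattern, the merge table s3logs itself and any tmp_s3logs_* tables are not matched, so only the finished daily tables are visible through s3logs.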

### create merge table (first time only)
$ s3log schema --clickhouse --table "s3logs" --merge | clickhouse-client

### create current date table (ex. 20180829)
$ s3log schema --clickhouse --table "s3logs_20180829" | clickhouse-client

### import with idempotency (replace the table)
$ s3log schema --clickhouse --table "tmp_s3logs_20180829" | clickhouse-client
$ cat 2018-08-29/* | s3log json --clickhouse | clickhouse-client --query="INSERT INTO tmp_s3logs_20180829 FORMAT JSONEachRow"
$ clickhouse-client --query="RENAME TABLE s3logs_20180829 TO tmp_s3logs_20180829_old, tmp_s3logs_20180829 TO s3logs_20180829"
$ clickhouse-client --query="DROP TABLE tmp_s3logs_20180829_old"

### play with merge table "s3logs"
$ clickhouse-client --query="show tables"
s3logs
s3logs_20180828
s3logs_20180829
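The idempotent import above follows a fixed naming scheme: load into a tmp_ table, then swap it in with one RENAME TABLE statement and drop the old copy, so rerunning the import for a day replaces that day's rows instead of appending duplicates. This Python sketch derives the names and statements for a given date; the function `import_plan` and its `prefix` parameter are hypothetical, and the generated strings simply mirror the commands shown for 2018-08-29.

```python
from datetime import date


def import_plan(day, prefix="s3logs"):
    """Build the names and SQL used to reload one day of logs.

    New rows go into a staging table first; the live daily table is
    replaced only after the load has finished.
    """
    table = f"{prefix}_{day:%Y%m%d}"  # e.g. s3logs_20180829
    tmp = f"tmp_{table}"              # staging table for the fresh load
    old = f"{tmp}_old"                # previous copy, dropped after the swap
    return {
        "source_glob": f"{day.isoformat()}/*",
        "insert": f"INSERT INTO {tmp} FORMAT JSONEachRow",
        "swap": f"RENAME TABLE {table} TO {old}, {tmp} TO {table}",
        "cleanup": f"DROP TABLE {old}",
    }
```

For `import_plan(date(2018, 8, 29))`, the "swap" and "cleanup" entries match the RENAME and DROP commands above. Note that the swap assumes the daily table already exists, which is why it is created once before the first import.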

Compile

make build

Contributing

  1. Fork it (https://github.com/maiha/s3log/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors

  • maiha maiha - creator, maintainer
License

MIT License
