AWS Kinesis Agent configuration to process and parse multi line logs.


Within the Kinesis Agent configuration, we can use the dataProcessingOptions setting to preprocess data or logs before the agent sends them to a Kinesis stream or directly to Firehose.

Below are the three configuration options available for now.
  • SINGLELINE
  • CSVTOJSON
  • LOGTOJSON
Kinesis Agent already understands several standard log formats, such as Apache logs, so those need no custom parsing rules. Many logs are not in a predefined format, though: Wildfly logs, custom data, and stack traces are common examples. For these we use Kinesis Agent's dataProcessingOptions to parse each record into a JSON value.

Here we will use the "SINGLELINE" and "CSVTOJSON" options to parse our custom logs.

Here is what our sample log looks like. It contains three complete log records: the first two fit on one line each, and the third spans several lines.
14:36:21,753 | INFO  --- xyzignorelogs-1842 | orrelationIdGeneratorInterceptor | 1135 - com.xyz.ppp.def-orchestration-api-impl-bundle-v1 - 0.0.2211 | Execution started in CorrelationIdGeneratorInterceptor : somerandomnumber09897w9w7w
14:36:21,753 | INFO  --- xyzignorelogs-1842 | LogInInterceptor                 | 1135 - com.xyz.ppp.def-orchestration-api-impl-bundle-v1 - 0.0.2211 | Execution started in LogInInterceptor : somerandomnumber09897w9w7w
14:36:21,759 | INFO  ---  xyzignorelogs-1842 | Util                             | 1134 - com.xyz.ppp.def-orchestration- 0.0.2211 | {
  "consumer-workflow-DATA" : {
    "correlationId" : "somerandomnumber09897w9w7w",
    "flowName" : "carate",
    "Url" : "www.cyberkeeda.com",
    "Request/Response Type" : "tracecustomer",
    "API request" : {"fromAddress": "LA", "country" : "US", "DOB": "19-12-1977"}
  }
}
Before anything else, we tell the agent to read our logs as multi-line records. For this we define a regex pattern that matches the start of each record.

The agent starts a new record only when it finds the pattern again; every line that does not match is treated as a continuation of the current record.
In the logs above, every new record starts with a time string like the ones below.
14:36:21,753 | INFO  
14:36:21,753 | INFO
14:36:21,759 | INFO 
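
As a quick sanity check, the rule can be sketched in Python. This is only an illustration: the agent itself does the matching, not this script, and the pattern used here (two digits, a colon, two digits, a colon, two digits) is an assumption derived from the time strings above. The sample lines are shortened versions of the third log record.

```python
import re

# Assumed start-of-record pattern, derived from time strings such as 14:36:21
START_PATTERN = re.compile(r"^[0-9]{2}:[0-9]{2}:[0-9]{2}")

# Shortened lines from the sample log: one record start followed by
# continuation lines that belong to the same record.
sample_lines = [
    "14:36:21,759 | INFO  ---  xyzignorelogs-1842 | Util | {",
    '  "consumer-workflow-DATA" : {',
    '    "correlationId" : "somerandomnumber09897w9w7w",',
    "  }",
    "}",
]

# True where a line would open a new record, False where it continues one.
starts_new_record = [bool(START_PATTERN.match(line)) for line in sample_lines]

for line, is_start in zip(sample_lines, starts_new_record):
    print("START" if is_start else "  ...", line)
```

Only the first line is flagged as a record start, so all five lines would end up in a single record.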
    
To convert the above log records to JSON before they are sent to the stream, we will use the configuration below.
{
    "flows": [
        {
            "filePattern": "/tmp/app.log*",
            "kinesisStream": "myapplogstream",
            "multiLineStartPattern": "^[0-9]{2}:[0-9]{2}:[0-9]{2}",
            "dataProcessingOptions": [
                {
                    "optionName": "SINGLELINE"
                },
                {
                    "optionName": "CSVTOJSON",
                    "customFieldNames": [ "timeframe", "message" ],
                    "delimiter": "---"
                }
            ]
        }
    ]
}

The regex "^[0-9]{2}:[0-9]{2}:[0-9]{2}" tells the Kinesis agent where each multi-line record starts: at any line that begins with a time string such as 14:36:21. Note that the separators are colons, because that is how the timestamps in our log are written.
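
Before restarting the agent, it can be worth checking that the start pattern really matches a line from the log. The sketch below is one possible pre-flight check, not part of the agent: `check_flow` is a hypothetical helper, the sample line is shortened, and Python's `re` module only approximates the Java regex engine the agent uses (for a simple pattern like this one the behaviour should be the same).

```python
import json
import re


def check_flow(flow, sample_line):
    """Return a list of problems found in one flow of the agent configuration."""
    problems = []
    if "filePattern" not in flow:
        problems.append("missing filePattern")
    # A flow sends either to a Kinesis stream or to a Firehose delivery stream.
    if "kinesisStream" not in flow and "deliveryStream" not in flow:
        problems.append("missing kinesisStream or deliveryStream")
    pattern = flow.get("multiLineStartPattern")
    if pattern is not None and re.search(pattern, sample_line) is None:
        problems.append("multiLineStartPattern does not match the sample line")
    return problems


config_text = """
{
    "flows": [
        {
            "filePattern": "/tmp/app.log*",
            "kinesisStream": "myapplogstream",
            "multiLineStartPattern": "^[0-9]{2}:[0-9]{2}:[0-9]{2}"
        }
    ]
}
"""
sample_line = "14:36:21,753 | INFO  --- xyzignorelogs-1842 | LogInInterceptor"

config = json.loads(config_text)  # raises ValueError if the JSON is malformed
results = [check_flow(flow, sample_line) for flow in config["flows"]]

# The same flow with hyphens instead of colons in the pattern, for comparison.
hyphen_flow = dict(config["flows"][0],
                   multiLineStartPattern="^[0-9]{2}-[0-9]{2}-[0-9]{2}")
hyphen_problems = check_flow(hyphen_flow, sample_line)

print(results)
print(hyphen_problems)
```

A pattern written with hyphens is reported as not matching, because the timestamps in this log are separated by colons. With such a pattern the agent would never see a record start in this file.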

Next, the SINGLELINE option collapses each multi-line record into a single line, and the CSVTOJSON option splits that line on the delimiter.
CSVTOJSON splits on a comma by default; here we set the delimiter to "---", so each record in our sample is divided into two parts.
"customFieldNames": [ "timeframe", "message" ],
Since each record is split into only two values, we name the fields "timeframe" and "message".
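
The whole flow can be imitated in a few lines of Python to see what each step contributes. This is a rough emulation under stated assumptions, not the agent's code: the function names are made up for illustration, the sample lines are shortened, joining without a separator follows the documented description of SINGLELINE (newlines plus leading and trailing spaces are removed), and trimming each split value is an assumption made to match the output shown in this post.

```python
import json
import re

START_PATTERN = re.compile(r"^[0-9]{2}:[0-9]{2}:[0-9]{2}")  # assumed pattern


def group_records(lines):
    """Group physical lines into records; a matching line opens a new record."""
    records = []
    for line in lines:
        if START_PATTERN.match(line) or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return records


def to_single_line(record):
    """Stand-in for SINGLELINE: drop newlines and surrounding spaces."""
    return "".join(part.strip() for part in record)


def to_json(line, field_names, delimiter):
    """Stand-in for CSVTOJSON: split on the delimiter and name the values."""
    values = [value.strip() for value in line.split(delimiter)]
    return json.dumps(dict(zip(field_names, values)))


# Shortened versions of the sample log lines.
log_lines = [
    "14:36:21,753 | INFO  --- xyzignorelogs-1842 | LogInInterceptor | Execution started",
    "14:36:21,759 | INFO  ---  xyzignorelogs-1842 | Util | {",
    '  "Url" : "www.cyberkeeda.com"',
    "}",
]

events = [
    to_json(to_single_line(record), ["timeframe", "message"], "---")
    for record in group_records(log_lines)
]

for event in events:
    print(event)
```

Four physical lines become two JSON events, and the embedded JSON payload of the second record ends up inside its "message" value as plain text.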

The final processed records are sent to the stream as shown below. SINGLELINE collapses the third record into one line; it is wrapped here only for readability.
"timeframe": "14:36:21,753 | INFO", "message": "xyzignorelogs-1842 | orrelationIdGeneratorInterceptor | 1135 - com.xyz.ppp.def-orchestration-api-impl-bundle-v1 - 0.0.2211 | Execution started in CorrelationIdGeneratorInterceptor : somerandomnumber09897w9w7w"

"timeframe": "14:36:21,753 | INFO", "message": "xyzignorelogs-1842 | LogInInterceptor                 | 1135 - com.xyz.ppp.def-orchestration-api-impl-bundle-v1 - 0.0.2211 | Execution started in LogInInterceptor : somerandomnumber09897w9w7w"

"timeframe": "14:36:21,759 | INFO", "message": "xyzignorelogs-1842 | Util                             | 1134 - com.xyz.ppp.def-orchestration- 0.0.2211 | {
  "consumer-workflow-DATA" : {
    "correlationId" : "somerandomnumber09897w9w7w",
    "flowName" : "carate",
    "Url" : "www.cyberkeeda.com",
    "Request/Response Type" : "tracecustomer",
    "API request" : {"fromAddress": "LA", "country" : "US", "DOB": "19-12-1977"}
  }
}"

    
Do let me know whether it works for you.


Designed By Jackuna