Custom Log Collection
Running machregex
[mach@localhost ~/mach_collector_home/bin]$ ./machregex
=================================================================
     Machbase Collector Regex Utility
     Release Version
     Copyright 2015, Machbase Inc. or its subsidiaries.
     All Rights Reserved.
=================================================================
Usage> ./machregex Pattern NewlinePattern
Result file : machregex.ok machregex.err
<< APACHE access log >>
 => machregex "^([0-9.:]+)\\s([\\w.-]+)\\s([\\w.-]+)\\s(\\[[^\\[\\]]+\\])\\s\"((?:[^\"]|\")+)\"\\s(\\d{3})\\s(\\d+|-)\\s\"((?:[^\"]|\")*)\"\\s\" ((?:[^\"]|\")*)\"$" "^([0-9.:]+)\s" < DATA.LOG
<< MACH trace log >>
 => machregex "^\\[(\\d+[-]\\d+[-]\\d+\\s\\d+[:]\\d+[:]\\d+)+\\s([P][-]\\d+)+\\s([T][-]\\d+)+\\]\\s((?:[^\\0])*)$" "^\\[" < DATA.LOG
<< syslog >>
 => machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" < DATA.LOG
This is the usage screen that machregex prints when it is run without arguments.
Testing machregex
The following run tests machregex by parsing syslog data with a regular expression.
[mach@localhost bin]$ machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" </var/log/syslog
machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" </var/log/syslog
Pattern => (^(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S*)\s+((?:[^\0])*)$)
========================================================================
.............
========================================================================
SUCCESS[107] (rc=7)(Aug 19 18:17:01 localhost CRON[6553]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
)
ALL (0:110) => [Aug 19 18:17:01 localhost CRON[6553]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
]
0 (0:15) => [Aug 19 18:17:01]
1 (0:3) => [Aug]
2 (4:6) => [19]
3 (7:15) => [18:17:01]
4 (16:37) => [localhost]
5 (38:110) => [CRON[6553]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
]
=======================================================================
SUCCESS[107] (rc=7)(Aug 19 18:39:01 localhost CRON[6616]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
)
ALL (0:232) => [Aug 19 18:39:01 localhost CRON[6616]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sephp5/maxlifetime))
]
0 (0:15) => [Aug 19 18:39:01]
1 (0:3) => [Aug]
2 (4:6) => [19]
3 (7:15) => [18:39:01]
4 (16:37) => [localhost]
5 (38:232) => [CRON[6616]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sphp5/maxlifetime))
]
Summary : Success(107), Failure(0) <== It shows that all of them were successfully completed.
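In the run above, machregex parsed the syslog text file with the given regular expression and split each entry into six tokens (0 through 5). To load tokens 0, 4, and 5 into the database, use the COL_LIST variable of the regex (rgx) file to bind tokens to database columns.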
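As a rough illustration of how token numbers relate to capture groups, the following Python sketch applies the same syslog pattern with Python's `re` module (not machregex itself) and keeps tokens 0, 4, and 5. The column names `tm`, `host`, and `msg` and the function `pick_tokens` are hypothetical, chosen only for this example.

```python
import re

# The pattern passed to machregex, written as a Python raw string
# (no shell escaping is needed here).
SYSLOG_RE = re.compile(
    r"^(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S*)\s+((?:[^\0])*)$"
)

# Hypothetical column names. machregex token N corresponds to group N + 1,
# because Python reserves group 0 for the whole match.
WANTED = {0: "tm", 4: "host", 5: "msg"}


def pick_tokens(entry):
    """Return {column: text} for the wanted tokens, or None if no match."""
    m = SYSLOG_RE.match(entry)
    if m is None:
        return None
    return {name: m.group(no + 1) for no, name in WANTED.items()}


line = (
    "Aug 19 18:17:01 localhost CRON[6553]: (root) CMD "
    "(cd / && run-parts --report /etc/cron.hourly)\n"
)
row = pick_tokens(line)
print(row["tm"])    # Aug 19 18:17:01
print(row["host"])  # localhost
```

Tokens 1 to 3 (month, day, time) are still captured by the pattern; they are simply not bound to any column.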
User-Defined Template Example
This chapter takes a sample text log file and creates a collector template that collects data from it.
The sample input text file is as follows.
[2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
[2014-08-18 13:51:19] superman message-2 : This is the best machine data DBMS ever.
[2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
[2014-08-18 13:51:33] superman message-4 : This is the best machine data DBMS ever.
[2014-08-18 13:51:34] batman message-5 : This is the best machine data DBMS ever.
[2014-08-18 13:52:34] superman message-6 : This is the best machine data DBMS ever.
[2014-08-18 13:53:34] batman message-7 : This is the best machine data DBMS ever.
[2014-08-18 13:54:31] superman message-8 : This is the best machine data DBMS ever.
[2014-08-18 13:55:30] batman message-9 : This is the best machine data DBMS ever.
[2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
[2014-08-18 13:57:59] superman message-11 : This is the best machine data DBMS ever.
The sample file above can be converted into three columns: tm, user, and msg. Their data types can be set to datetime, varchar(16), and varchar(512), respectively.
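Regular Expression Example
Writing the regular expression
\[([0-9-: ]+)\]
: First comes the date, enclosed in square brackets. This expression captures only the value inside (digits, hyphens, colons, and spaces) and leaves the brackets out of the token.
(\S+)
: Second comes the user name, captured as a run of non-whitespace characters.
([^\0]*)
: Third, everything up to the end of the entry is captured.
\[([0-9-: ]+)\]\s(\S+)\s+([^\0]*)
: The three tokens are combined, with whitespace matched between them.
"\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]*)"
: To pass the expression as a string in the shell, each backslash is doubled.
"\\["
: The newline pattern is the opening square bracket that begins each timestamp.
The verification run and the test.rgx file that follow end the pattern with ([^\0]+) instead of ([^\0]*). Both forms match this sample, because no message is empty.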
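To see what the doubled backslashes turn into, the sketch below uses Python's `shlex` module, which follows POSIX quoting rules, to split the command line roughly the way a shell would. It then tries the resulting pattern on one sample line with Python's `re` module. This is an approximation for illustration: machregex's own regex engine may differ from Python's in details.

```python
import re
import shlex

# The command line as it would be typed in the shell.
command = r'machregex "\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]*)" "\\["'

# Inside double quotes, POSIX rules turn \\ into a single backslash.
program, pattern, newline_pattern = shlex.split(command)
print(pattern)          # \[([0-9-: ]+)\]\s(\S+)\s+([^\0]*)
print(newline_pattern)  # \[

sample = (
    "[2014-08-18 13:51:19] spiderman "
    "message-1 : This is the best machine data DBMS ever."
)
tm, user, msg = re.match(pattern, sample).groups()
print(tm)    # 2014-08-18 13:51:19
print(user)  # spiderman
```

The pattern that reaches the program is therefore the single-backslash form, which is also the form written into the rgx file.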
Verifying the regular expression
[mach@localhost ~/mach_collector_home/bin]$ machregex "\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]+)" "\\[" <test.log
Pattern => (\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+))
============================================================================
SUCCESS[2] (rc=4)([2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:51:19]
1 (22:31) => [spiderman]
2 (32:85) => [message-1 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[3] (rc=4)([2014-08-18 13:51:19] superman message-2 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:51:19] superman message-2 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:51:19]
1 (22:30) => [superman]
2 (32:85) => [message-2 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[4] (rc=4)([2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:51:33]
1 (22:31) => [spiderman]
2 (32:85) => [message-3 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[5] (rc=4)([2014-08-18 13:51:33] superman message-4 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:51:33] superman message-4 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:51:33]
1 (22:30) => [superman]
2 (32:85) => [message-4 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[6] (rc=4)([2014-08-18 13:51:34] batman message-5 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:51:34] batman message-5 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:51:34]
1 (22:28) => [batman]
2 (32:85) => [message-5 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[7] (rc=4)([2014-08-18 13:52:34] superman message-6 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:52:34] superman message-6 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:52:34]
1 (22:30) => [superman]
2 (32:85) => [message-6 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[8] (rc=4)([2014-08-18 13:53:34] batman message-7 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:53:34] batman message-7 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:53:34]
1 (22:28) => [batman]
2 (32:85) => [message-7 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[9] (rc=4)([2014-08-18 13:54:31] superman message-8 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:54:31] superman message-8 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:54:31]
1 (22:30) => [superman]
2 (32:85) => [message-8 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[10] (rc=4)([2014-08-18 13:55:30] batman message-9 : This is the best machine data DBMS ever.
)
ALL (0:85) => [[2014-08-18 13:55:30] batman message-9 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:55:30]
1 (22:28) => [batman]
2 (32:85) => [message-9 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[11] (rc=4)([2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
)
ALL (0:86) => [[2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
]
0 (1:20) => [2014-08-18 13:56:44]
1 (22:31) => [spiderman]
2 (32:86) => [message-10 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[11] (rc=4)([2014-08-18 13:57:59] superman message-11 : This is the best machine data DBMS ever.)
ALL (0:85) => [[2014-08-18 13:57:59] superman message-11 : This is the best machine data DBMS ever.]
0 (1:20) => [2014-08-18 13:57:59]
1 (22:30) => [superman]
2 (32:85) => [message-11 : This is the best machine data DBMS ever.]
Summary : Success(11), Failure(0)
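The Success/Failure summary can be approximated outside machregex for a quick sanity check. The Python sketch below rests on one stated assumption: a new record starts at every line that begins with the newline pattern `\[`. How machregex itself applies the newline pattern is not described in this section, so the splitting logic is illustrative only. The function name `summarize` and the deliberately malformed third record are hypothetical.

```python
import re

PATTERN = re.compile(r"\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+)")
NEWLINE_PATTERN = r"\["


def summarize(text):
    """Split text into records and return (success, failure) counts."""
    # Assumption: a record begins at each line start where NEWLINE_PATTERN
    # matches. Lines that do not start a record stay with the previous one.
    boundary = re.compile(r"(?m)^(?=" + NEWLINE_PATTERN + ")")
    starts = [m.start() for m in boundary.finditer(text)]
    ends = starts[1:] + [len(text)]
    records = [text[a:b] for a, b in zip(starts, ends)]
    success = sum(1 for record in records if PATTERN.match(record))
    return success, len(records) - success


log = (
    "[2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.\n"
    "[2014-08-18 13:51:19] superman message-2 : This is the best machine data DBMS ever.\n"
    "  a continuation line that belongs to message-2\n"
    "[malformed entry without a closing bracket\n"
)
print(summarize(log))  # (2, 1)
```

The continuation line is kept inside the second record, which is the reason a separate newline pattern is useful for multi-line log entries.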
Creating test.rgx
Once the steps above confirm that the regular expression parses the input file correctly, write an rgx file as shown below to hold the regular expression and the column bindings. Create this file as $MACHBASE_HOME/collector/samples/test.rgx.
###############################################################################
# Copyright of this product 2013-2023,
# Machbase Corporation (Incorporation) or its subsidiaries.
# All Rights reserved
###############################################################################

#
# This file is for Machbase trace collector regex file.
#

LOG_TYPE=custom

COL_LIST= (
    (
        REGEX_NO = 0
        NAME = tm
        TYPE = datetime
        SIZE = 8
        DATE_FORMAT="%Y-%m-%d %H:%M:%S"
    ),
    (
        REGEX_NO = 1
        NAME = user
        TYPE = varchar
        SIZE = 16
        USE_INDEX = 1
    ),
    (
        REGEX_NO = 2
        NAME = msg
        TYPE = varchar
        SIZE = 512
        USE_INDEX = 1
    )
)

REGEX="\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+)"
END_REGEX="\["
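To make the role of each COL_LIST entry concrete, here is a small Python sketch (not collector code) that converts the three tokens of one sample line in the way the entries describe. Token 0 is parsed with the same DATE_FORMAT string, and tokens 1 and 2 are checked against the declared varchar sizes. The `COLUMNS` structure and the `bind` function are hypothetical names. What the collector actually does with over-long values is not stated in this section, so the sketch simply raises an error, and it counts characters rather than bytes.

```python
from datetime import datetime

# Mirrors COL_LIST: (REGEX_NO, NAME, TYPE, SIZE, DATE_FORMAT or None).
COLUMNS = [
    (0, "tm", "datetime", 8, "%Y-%m-%d %H:%M:%S"),
    (1, "user", "varchar", 16, None),
    (2, "msg", "varchar", 512, None),
]


def bind(tokens):
    """Convert a list of regex tokens into a {column name: value} row."""
    row = {}
    for regex_no, name, col_type, size, date_format in COLUMNS:
        value = tokens[regex_no]
        if col_type == "datetime":
            row[name] = datetime.strptime(value, date_format)
        else:
            # Assumption: reject values longer than the declared size.
            if len(value) > size:
                raise ValueError(f"{name}: {len(value)} characters exceeds {size}")
            row[name] = value
    return row


tokens = [
    "2014-08-18 13:51:19",
    "spiderman",
    "message-1 : This is the best machine data DBMS ever.",
]
row = bind(tokens)
print(row["tm"].isoformat())  # 2014-08-18T13:51:19
print(row["user"])            # spiderman
```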
Creating test.tpl
Copy a sample template file to $MACHBASE_HOME/collector/test.tpl and edit the copy as follows.
###############################################################################
# Copyright of this product 2013-2023,
# Machbase Corporation(Incorporation) or its subsidiaries.
# All Rights reserved
###############################################################################

#
# This file is for Machbase collector template file.
#

###################################################################
# Collect setting
###################################################################

COLLECT_TYPE=FILE
LOG_SOURCE=/home/mach/machbase_home/collector/samples/test.log

###################################################################
# Process setting
###################################################################

REGEX_PATH=/home/mach/machbase_home/collector/samples/test.rgx

###################################################################
# Output setting
###################################################################

DB_TABLE_NAME = "custom_table"
DB_ADDR = ""
DB_PORT = 5656
DB_USER = "SYS"
DB_PASS = "MANAGER"

# 0: Direct insert
# 1: Prepared insert
# 2: Append
APPEND_MODE=2

# 0: None, just append.
# 1: Truncate.
# 2: Try to create table. If table already exists, warn it and proceed.
# 3: Drop and create.
CREATE_TABLE_MODE=2
Creating and Running the Collector
Create the "myclt" collector and start it.
Mach> create collector localhost.myclt from "/home/mach/mach_collector_home/collector/samples/test.tpl";
Created successfully.
Elapsed Time : 0.106
Mach>
Mach> alter collector localhost.myclt start;
Altered successfully.
Debugging the Collector
The table custom_table, which should hold the input data, was not created.
Mach> select * from custom_table;
[ERR-02025 : Table CUSTOM_TABLE does not exist.]
To resolve the problem, have the collector record its errors in a trace file. Run the following commands to restart the collector with tracing enabled.
Mach> alter collector localhost.myclt stop;
Altered successfully.
Mach> alter collector localhost.myclt start trace;
Altered successfully.
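Finding and Fixing the Problem with the Trace Log
When an error occurs while the collector is running, database execution errors can be found by examining the $MACHBASE_HOME/trc/machbase.trc file. When the execution error occurs in the collector itself, the collector must be run in TRACE mode.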
[2016-03-13 23:44:35 P-29741 T-139982693979904][INFO] PREPARE Error [create table custom_table ( collector_type varchar(32), collector_addr ipv4, collector_origin varchar(512), collector_offset long, tm datetime, user varchar(16), msg varchar(512))] (100007DA:Error in parse (syntax): near token (user varchar(16), msg varchar(512))).)
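When the trace file is long, filtering it can save time. The following Python sketch assumes that each entry starts with a header of the form `[date time P-<pid> T-<tid>][LEVEL]`, as in the line above, and keeps only entries whose text contains the word `Error`. The function name `find_errors` is hypothetical, the first sample line is abbreviated from the entry above, and the second sample line is made up for the example.

```python
import re

# Assumed header layout, inferred from the single trace line shown above.
HEADER_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) P-(\d+) T-(\d+)\]\[(\w+)\]\s*(.*)$"
)


def find_errors(lines):
    """Return (timestamp, level, text) for entries that mention 'Error'."""
    hits = []
    for line in lines:
        m = HEADER_RE.match(line.rstrip("\n"))
        if m and "Error" in m.group(5):
            hits.append((m.group(1), m.group(4), m.group(5)))
    return hits


sample_lines = [
    # Abbreviated from the trace entry above.
    "[2016-03-13 23:44:35 P-29741 T-139982693979904][INFO] "
    "PREPARE Error [create table custom_table ( ... )]\n",
    # Made-up entry without an error, for contrast.
    "[2016-03-13 23:44:36 P-29741 T-139982693979904][INFO] made-up message\n",
]
for timestamp, level, text in find_errors(sample_lines):
    print(timestamp, level, text)
```

For a real file, the list could be replaced by an open file object for machbase.trc, since iterating over a file yields its lines.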
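The message above shows that the CREATE TABLE statement failed. The cause is that user, which was set as a column name, is a built-in keyword and cannot be used as a column name. Change the user column to myuser in the COL_LIST section of the rgx file and run the collector again.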
A partial excerpt from "test.rgx"
...........
COL_LIST= (
    (
        REGEX_NO = 0
        NAME = tm
        TYPE = datetime
        SIZE = 8
        DATE_FORMAT="%Y-%m-%d %H:%M:%S"
    ),
    (
        REGEX_NO = 1
        NAME = myuser <== modified
        TYPE = varchar
        SIZE = 16
        USE_INDEX = 1
    ),
    (
        REGEX_NO = 2
        NAME = msg
        TYPE = varchar
        SIZE = 512
        USE_INDEX = 1
    )
)
..................
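A simple check before creating the collector can catch this kind of naming problem early. The Python sketch below compares column names against a set of reserved words. The set contains only `user`, the one keyword confirmed by the trace in this example, so it is illustrative and not the database's actual keyword list. The function name `safe_column_name` is hypothetical, and the `my` prefix simply mirrors the rename used here.

```python
# Illustrative only: the single keyword confirmed by the trace message.
RESERVED = {"user"}


def safe_column_name(name, prefix="my"):
    """Prefix a column name if it collides with a reserved word."""
    return prefix + name if name.lower() in RESERVED else name


columns = ["tm", "user", "msg"]
renamed = [safe_column_name(c) for c in columns]
print(renamed)  # ['tm', 'myuser', 'msg']
```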
Running and Checking the Result
Restart the collector with the modified rgx file.
Mach> alter collector localhost.myclt stop; <== Stop the TRACE mode.
Altered successfully.
Mach> alter collector localhost.myclt start; <== Execute it again in a normal mode after the modification
Altered successfully.
If the collector ran normally, the table in which it stored the data can now be queried.
Mach> select tm, myuser, msg from custom_table;
tm                              myuser
-----------------------------------------------------
msg
------------------------------------------------------------------------------------
2014-08-18 13:57:59 000:000:000 superman
message-11 : This is the best machine data DBMS ever.
2014-08-18 13:56:44 000:000:000 spiderman
message-10 : This is the best machine data DBMS ever.
2014-08-18 13:55:30 000:000:000 batman
message-9 : This is the best machine data DBMS ever.
2014-08-18 13:54:31 000:000:000 superman
message-8 : This is the best machine data DBMS ever.
2014-08-18 13:53:34 000:000:000 batman
message-7 : This is the best machine data DBMS ever.
2014-08-18 13:52:34 000:000:000 superman
message-6 : This is the best machine data DBMS ever.
2014-08-18 13:51:34 000:000:000 batman
message-5 : This is the best machine data DBMS ever.
2014-08-18 13:51:33 000:000:000 superman
message-4 : This is the best machine data DBMS ever.
2014-08-18 13:51:33 000:000:000 spiderman
message-3 : This is the best machine data DBMS ever.
2014-08-18 13:51:19 000:000:000 superman
message-2 : This is the best machine data DBMS ever.
2014-08-18 13:51:19 000:000:000 spiderman
message-1 : This is the best machine data DBMS ever.
[11] row(s) selected.