Creating Collector

Let's look at an example of collecting data in real time from syslog on a Linux system.

Normally, the collector and the Machbase server run on different machines, but in this example we assume that they run on the same machine for convenience.

Run Collector Manager


The collector manager controls the collectors, so it must be running before any collector can be controlled. Start the collector manager with the following command:

[mach@localhost ~]$ machcollectoradmin --startup
Waiting for collector manager starts.
Collector Manager started successfully.


Run the following netstat command to see if the collector manager is running:

[mach@localhost ~]$ netstat -anp | grep "LISTEN "
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
...
tcp 0 0 0.0.0.0:9999 0.0.0.0:* LISTEN 21163/machcollecto
...
[mach@localhost ~]$
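
The same check can be scripted. The following is a minimal sketch, not part of the Machbase tooling: the helper name `is_listening` is hypothetical, and it only tests whether something accepts TCP connections on the port (9999 in the netstat output above), not that the listener is actually the collector manager.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timeout, unreachable)
        # when nothing accepts connections on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9999 is the port shown in the netstat output of this example.
    print(is_listening("127.0.0.1", 9999))
```

A check like this is convenient in a startup script that should wait for the manager before registering it with the server.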


Register Collector Manager


To connect the collector manager with the Machbase server, register the collector manager with the Machbase server. Execute the following command using machsql.

CREATE COLLECTORMANAGER manager_name AT "host_addr:host_port";
  • manager_name  : The name of the collector manager. Duplicate values are not allowed.
  • host_addr: The IP address of the server where the collector manager is running.
  • host_port: The port number on which the collector manager is listening.
[mach@localhost ~/mach]$ machsql
=================================================================
    Machbase Client Query Utility
    Release Version x.x.x.official
    Copyright 2015, Machbase Inc. or its subsidiaries.
    All Rights Reserved.
=================================================================
Machbase server address (Default:127.0.0.1):
Machbase user ID (Default:SYS)
Machbase user password:
MACHBASE_CONNECT_MODE=INET, PORT=5656
mach>CREATE COLLECTORMANAGER LOCALHOST AT "127.0.0.1:9999";
Created successfully.
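
If the statement is generated by a script, it helps to validate the "host_addr:host_port" part before sending it to machsql. The sketch below is an illustration only: `create_manager_sql` is a hypothetical helper, and it assumes host_addr is a literal IP address, as described in the parameter list above.

```python
import ipaddress


def create_manager_sql(manager_name: str, host_addr: str, host_port: int) -> str:
    """Build a CREATE COLLECTORMANAGER statement after basic validation."""
    # Raises ValueError if host_addr is not a valid IPv4/IPv6 literal.
    ipaddress.ip_address(host_addr)
    if not 1 <= host_port <= 65535:
        raise ValueError(f"port out of range: {host_port}")
    return f'CREATE COLLECTORMANAGER {manager_name} AT "{host_addr}:{host_port}";'


if __name__ == "__main__":
    print(create_manager_sql("LOCALHOST", "127.0.0.1", 9999))
```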


After registering a collector manager on the Machbase server, you can query the status in the m$sys_collectormanagers table.

mach> SELECT * FROM m$sys_collectormanagers;
MANAGER_ID MANAGER_NAME MANAGER_HOST MANAGER_PORT MANAGER_STATE
------------------------------------------------------------------------------------------------------------------------------
1 LOCALHOST 127.0.0.1 9999 1
[1] row(s) selected.


The table shows the identifier, name, host address, port number, and execution state of each collector manager.

Create Collector


After registering the collector manager, create the collector object through the collector manager.

Information about the Collector is stored in the Machbase server and can be retrieved. Execute the following command through machsql to create a collector.

CREATE COLLECTOR manager_name.collector_name FROM "path_for_template.tpl";
  • manager_name  : The name of the collector manager that runs the collector.
  • collector_name: The name of the collector object.
  • path_for_template.tpl: The path to the collector's configuration (template) file. Sample configuration files are located in the "$MACHBASE_COLLECTOR_HOME/collector" directory. It is recommended to pick the sample closest to your needs, modify it, and save it under a different name.

Prepare Template File

The template file is a text file that describes the Collector's data source, processing method, and storage method. Sample files are provided in the $MACHBASE_COLLECTOR_HOME/collector directory.

Template File Structure

The template file has a structure of "variable name = value" similar to the Machbase property file. Detailed information of each setting variable is shown in the following table.

Configuration files after Machbase version 3.5 are not backward compatible.



Variable Name

Description

Remarks

COLLECT_TYPE

Data collection method

Sets the data collection method. The available methods are as follows. FILE (default): a specific file on the device where the collector is installed. SFTP: a remote file path accessed over SFTP. SOCKET: data received through a socket. ODBC: data read from a database configured through ODBC.

LOG_SOURCE

Location of log file to be read

The location of the data file to be read. In SFTP mode, you must specify the absolute path on the remote host. Not used in SOCKET and ODBC modes. It is also possible to set multiple source files or to specify them with regular expressions.

SFTP_HOST

Host IP address


SFTP_PORT

SFTP port number

Is set to 22 by default if not set.

SFTP_USER

SFTP username

Is set to anonymous by default.

SFTP_PASS

SFTP password

Is set to anonymous by default.

SOCKET_PORT

Socket port number on which the collector receives data


SOCKET_PROTOCOL

Collector socket protocol type

Possible values are TCP and UDP. The default value is TCP.

ODBC_DSN

ODBC mode DSN

DSN name defined in ".odbc.ini"

ODBC_QUERY

ODBC mode query

Query string executed to obtain input data from an ODBC data source

ODBC_SEQ_COLUMN

Name of the incrementing column in ODBC mode

Only numeric columns are allowed.

LIB_NAME

External link library path

Not used yet.

REGEX_PATH

Regular expression file for analyzing input data

Not used in ODBC mode.

PREPROCESS_PATH

Location of Python script files for data preprocessing


SLEEP_TIME

Wait time after inputting data

In milliseconds, with a default of 1000.

DB_TABLE_NAME

Name of the table to enter data into


DB_ADDR

IP address of the database to enter data into


DB_PORT

Database port number


DB_USER

Database username


DB_PASS

Database password


APPEND_MODE

Data input method configuration

Not used; the value is kept only for compatibility with past versions.

AUTO_ADD_COLUMN

Whether to automatically generate a table column if it does not exist

If 0, the column is not generated. If 1, it is generated automatically.

Default value is 1.

CREATE_TABLE_MODE

Sets the operation performed on the input table. 0: do nothing. 1: truncate the existing table. 2: create the table; if an error occurs, write the error to the trc file and continue. 3: drop the table and recreate it.

Generally recommended to set to 2.

LANG

Specifies the encoding of the input data file.

Available values are UTF-8 (default), CP949 (MS949), KSC5601, EUCJP, SHIFTJIS, BIG5 and BG231280.

REGEX_SORT

Determines the order of the input files.

Default value is ASC and DESC is also possible.

ROTATE_FILE_PATH

Rotation file path configuration


ROTATE_FILE_COUNT

Rotation file number configuration


ROTATE_REGEX_SORT

Rotation file order configuration

Default value is ASC. DESC is also possible.
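
The defaults and allowed values listed in the table can be collected into a small checker. This is a sketch under stated assumptions: `resolve_settings` is a hypothetical helper, not a Machbase API, all values are kept as strings, and the defaults are simply transcribed from the table above (including FILE as the default COLLECT_TYPE).

```python
# Defaults as read from the table above.
DEFAULTS = {
    "COLLECT_TYPE": "FILE",
    "SFTP_PORT": "22",
    "SFTP_USER": "anonymous",
    "SFTP_PASS": "anonymous",
    "SOCKET_PROTOCOL": "TCP",
    "SLEEP_TIME": "1000",
    "AUTO_ADD_COLUMN": "1",
    "LANG": "UTF-8",
    "REGEX_SORT": "ASC",
    "ROTATE_REGEX_SORT": "ASC",
}

# Variables whose table entry enumerates the permitted values.
ALLOWED = {
    "COLLECT_TYPE": {"FILE", "SFTP", "SOCKET", "ODBC"},
    "SOCKET_PROTOCOL": {"TCP", "UDP"},
    "AUTO_ADD_COLUMN": {"0", "1"},
    "CREATE_TABLE_MODE": {"0", "1", "2", "3"},
    "REGEX_SORT": {"ASC", "DESC"},
    "ROTATE_REGEX_SORT": {"ASC", "DESC"},
}


def resolve_settings(user_settings):
    """Merge user settings over the defaults and list invalid values."""
    settings = {**DEFAULTS, **user_settings}
    problems = [
        f"{name}={settings[name]!r} is not one of {sorted(allowed)}"
        for name, allowed in ALLOWED.items()
        if name in settings and settings[name] not in allowed
    ]
    return settings, problems


if __name__ == "__main__":
    settings, problems = resolve_settings({"COLLECT_TYPE": "SFTP"})
    print(settings["SFTP_PORT"], problems)
```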


REGEX_PATH and PREPROCESS_PATH point to files that the collector refers to at run time. Below is a description of the rgx file set in REGEX_PATH.

Variable Name

Description

Remarks

LOG_TYPE

Regular expression name

The value can be modified, but it is better to keep it unchanged because it is stored in the database together with the data.

COL_LIST

List of columns in the table

Information on the columns belonging to the table

REGEX

Regular expressions for data analysis


END_REGEX

Regular expression that signifies the end of a record

Regular expression to separate each record. If not set, "\n" line break is used as default.


COL_LIST describes how the contents of the log file map to database columns. For each column, you set which regular expression result it receives together with the column's attributes. Using COL_LIST, complex log data can be entered into structured table columns.

Variable Name

Description

Remarks

NAME

Column name

String that does not contain spaces.

TYPE

Column data type

Name of the type.

SIZE

Column size

Refers to the actual specified size of the column. For string types, specify the size to be created; the other types have the following sizes: short (6), int (11), long (20), float (17), double (17), datetime (-defined), ipv4 (15), ipv6 (45), text (64MB), binary (64MB).

DATE_FORMAT

Datetime data format when type is datetime

Internally parses the value using the "strptime" function.

e.g. 'Aug 19 07:56:16' has the format 'month day hour:minute:second'. Therefore, the format string used is "%b %d %H:%M:%S".
Additional specifiers are %0, %1, and %2; each takes three digits and represents milliseconds, microseconds, and nanoseconds, respectively.

USE_INDEX

Whether to create index

Creates LSM or KEYWORD LSM index based on type.

0: Do not create. / 1: Create.

REGEX_NO

Token number within regular expression

Within the REGEX expression specified in the regular expression file, each area enclosed in parentheses "()" is a token. 0 means the entire record data. After that, tokens are numbered in order, with the first token starting at the first parenthesis.
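
To see how a DATE_FORMAT string maps onto a syslog timestamp, the format can be tried out with Python's `datetime.strptime`, which accepts the same `%b %d %H:%M:%S` directives as the C `strptime` function. This is only an illustration: the `%0`, `%1`, and `%2` specifiers described above are collector extensions and are not understood by Python, and because syslog timestamps carry no year, the hypothetical helper `parse_syslog_time` takes the year as an explicit argument.

```python
from datetime import datetime


def parse_syslog_time(stamp: str, year: int, fmt: str = "%b %d %H:%M:%S") -> datetime:
    """Parse a year-less syslog timestamp using a strptime-style format."""
    # The year is prepended so the result is a complete, unambiguous date.
    return datetime.strptime(f"{year} {stamp}", "%Y " + fmt)


if __name__ == "__main__":
    print(parse_syslog_time("Aug 19 07:56:16", 2016))  # 2016-08-19 07:56:16
```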


syslog.tpl Example

Below is an example of a syslog.tpl file. The file is provided as a sample in $MACHBASE_COLLECTOR_HOME/collector/syslog.tpl.

###############################################################################
# Copyright of this product 2013-2023,
# Machbase Inc. or its subsidiaries.
# All Rights reserved
###############################################################################

#
#  This file is for Machbase collector template file.
#

###################################################################
# Input setting
###################################################################

COLLECT_TYPE=FILE <== It specifies a method to collect local data.
LOG_SOURCE=/var/log/syslog <== It specifies a location of source file.

###################################################################
# Process setting
###################################################################

REGEX_PATH=syslog.rgx        <== Regular expression file location.
                                 Relative paths are resolved from $MACHBASE_HOME/collector/regex/

###################################################################
# Output setting
###################################################################

DB_TABLE_NAME = "syslogtable" <== Table name: Data entered here
DB_ADDR       = "127.0.0.1"   <== Running Machbase server IP/PORT
DB_PORT       = 5656
DB_USER       = "SYS"
DB_PASS       = "MANAGER"

# 0: Direct insert
# 1: Prepared insert
# 2: Append
APPEND_MODE=2 <== Data insertion in APPEND mode.

# 0: None, just append.
# 1: Truncate.
# 2: Try to create table. If table already exists, warn it and proceed.
# 3: Drop and create.
CREATE_TABLE_MODE=2 <== Create a table if there is none.
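
Because the template is a plain "variable name = value" file, it is easy to inspect with a few lines of script. The following is a minimal sketch, not the collector's own parser: `parse_template` is a hypothetical helper, it assumes `#` starts a comment line and that values may optionally be wrapped in double quotes as in the sample, and it treats the `<==` annotations in the listing above as documentation rather than file content.

```python
def parse_template(text: str) -> dict:
    """Parse 'NAME = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME=value line
        settings[name.strip()] = value.strip().strip('"')
    return settings


SAMPLE = """\
# Input setting
COLLECT_TYPE=FILE
LOG_SOURCE=/var/log/syslog
REGEX_PATH=syslog.rgx
DB_TABLE_NAME = "syslogtable"
DB_PORT       = 5656
CREATE_TABLE_MODE=2
"""

if __name__ == "__main__":
    for name, value in parse_template(SAMPLE).items():
        print(f"{name} -> {value}")
```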


The syslog.rgx file is a regular expression file set in the syslog.tpl file. When setting up an rgx file, you can either set it to an absolute path or relative path based on $MACHBASE_COLLECTOR_HOME/collector/regex.

###############################################################################
# Copyright of this product 2013-2023,
# Machbase Corporation (Incorporation) or its subsidiaries.
# All Rights reserved
###############################################################################

#
#  This file is for Machbase collector regex file.
#

LOG_TYPE=syslog

COL_LIST= (
     (
        REGEX_NO = 0 <== Regular expression token number
        NAME = tm
        TYPE = datetime
        SIZE = 8
        DATE_FORMAT="%b %d %H:%M:%S" <== datetime format used by strptime function
         ),
     (
        REGEX_NO = 4
        NAME = host
        TYPE = varchar
        SIZE = 128
        USE_INDEX = 1 <== Whether index is in use
         ),
     (
        REGEX_NO = 5
        NAME = msg
        TYPE = varchar
        SIZE = 512
        USE_INDEX = 1
         )
)

# Below is the regular expression to parse syslog data. It may not work properly if it is modified.
REGEX="(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S+)\s+([^\n]+)"

END_REGEX="\n"
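
To check what each parenthesized token of this REGEX captures, the expression can be tried against a sample line with Python's `re` module. This is a sketch for experimenting only: Python uses index 0 for the whole match and numbers groups from 1, which is Python's convention and may not correspond one-to-one to the collector's REGEX_NO values, so compare the printed tokens with the COL_LIST entries before relying on an index. The helper names `split_records` and `tokens` are hypothetical.

```python
import re

REGEX = r"(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S+)\s+([^\n]+)"
END_REGEX = r"\n"


def split_records(text: str) -> list:
    """Split raw input into records using END_REGEX, dropping empty ones."""
    return [record for record in re.split(END_REGEX, text) if record]


def tokens(record: str):
    """Return (whole match, group 1, ..., group 6), or None if no match."""
    match = re.match(REGEX, record)
    if match is None:
        return None
    return (match.group(0),) + match.groups()


if __name__ == "__main__":
    raw = (
        "Jun 28 21:25:43 localhost su: pam_unix(su:session): "
        "session opened for user root by mach(uid=506)\n"
        "Jun 28 21:26:02 localhost su: pam_unix(su:session): "
        "session closed for user root\n"
    )
    for record in split_records(raw):
        for index, token in enumerate(tokens(record)):
            print(index, repr(token))
```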


Create Collector


Create the collector "syslog_test" as shown below.

mach> CREATE COLLECTOR localhost.syslog_test FROM "/home/mach/mach_collector_home/collector/syslog.tpl";
Created successfully.


Check Collector


The M$SYS_COLLECTORS table contains information about the registered collectors. A collector whose RUN_FLAG column is 1 is running; a value of 0 means it is stopped.

mach> SELECT collector_name, run_flag FROM m$sys_collectors;
collector_name                            run_flag    
---------------------------------------------------------
SYSLOG_TEST                               0           
[1] row(s) selected.
mach> SELECT * FROM m$sys_collectors;
COLLECTOR_ID MANAGER_NAME                                 COLLECTOR_NAME                            
-----------------------------------------------------------------------------------------------------
LOG_TYPE                                  TABLE_NAME                                
---------------------------------------------------------------------------------------
TEMPLATE_NAME                                                                     COLLECT_TYPE                              
-------------------------------------------------------------------------------------------------------------------------------
COLLECTOR_SOURCE                                                                  
------------------------------------------------------------------------------------
COLLECTOR_LIB                                                                     COL_COUNT   
-------------------------------------------------------------------------------------------------
PREPROCESS_PATH                                                                   
------------------------------------------------------------------------------------
REGEX_PATH                                                                        
------------------------------------------------------------------------------------
REGEX                                                                             
------------------------------------------------------------------------------------
END_REGEX                                                                         
------------------------------------------------------------------------------------
DEFAULT_ADDR                                                                      LANGUAGE                          
-----------------------------------------------------------------------------------------------------------------------
SLEEP_TIME  DB_ADDR                                   DB_PORT     DB_USER                                   
-----------------------------------------------------------------------------------------------------------------
DB_PASS                                   RUN_FLAG    
---------------------------------------------------------
1           LOCALHOST                                 SYSLOG_TEST                               
syslog                                    syslogtable                               
/home/mach/mach_collector_home/collector/syslog.tpl                             FILE                                      
/var/log/syslog                                                                   
NULL                                                                              7           
NULL                                                                              
syslog.rgx                                                                        
(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S+)\s+([^\n]+)                            
\n                                                                                
192.168.122.1                                                                     UTF-8                             
1000        127.0.0.1                                 5656        SYS                                       
MANAGER                                   0           
[1] row(s) selected.


Run Collector


ALTER COLLECTOR manager_name.collector_name START [TRACE];

To start the registered collector, use the ALTER COLLECTOR statement.

  • manager_name  : Name of the registered collector manager
  • collector_name: The name of the collector to execute.

If an error occurs when executing Collector, you can refer to $MACHBASE_COLLECTOR_HOME/trc/machcollector.trc file for troubleshooting.

mach> ALTER COLLECTOR localhost.syslog_test START;
Altered successfully.
mach> SELECT collector_name, run_flag FROM m$sys_collectors;
collector_name                            run_flag    
---------------------------------------------------------
SYSLOG_TEST                               1           
[1] row(s) selected.

When you start the collector with the ALTER COLLECTOR statement, you can see that the value of the RUN_FLAG column has changed to 1.

When you start the collector, a log table that stores the collected data is created on the database server. The values of collector_type, collector_addr, collector_origin, and collector_offset are set to default values. The tm, host, and msg columns defined in the syslog.rgx file, which syslog.tpl refers to, are also created.


When you execute a query using machsql, make sure that machsql is connected to the Machbase server and that the server is running. If the Machbase server and the collector are installed on different machines, the query may not execute normally when machsql is connected to the collector's machine instead of the Machbase server.
When the collector is started, it reads the position of the last data entered and resumes collection from that point.
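
The idea of resuming from the last position can be illustrated with a generic offset-based file reader. This is not the collector's implementation, which is not described here; `read_new_lines` is a hypothetical helper that shows the general technique of remembering a byte offset and consuming only complete lines on each pass.

```python
import os
import tempfile


def read_new_lines(path: str, offset: int):
    """Return (complete lines added since offset, new offset)."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Consume only up to the last newline; a partially written final line
    # is left in place and picked up on the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "sample.log")
        with open(log, "wb") as handle:
            handle.write(b"first\nsecond\n")
        lines, position = read_new_lines(log, 0)
        print(lines, position)
        with open(log, "ab") as handle:
            handle.write(b"third\npartial")
        print(read_new_lines(log, position))
```

Storing the offset next to each record, as the collector_offset column name suggests, would let a restart continue without re-reading earlier data; that reading of the column name is an assumption.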

Data Check

Below, the last 10 lines of the syslog file are compared with the data entered into the database.

[mach@localhost ~/mach]$ tail -n 10 /var/log/syslog
Jun 28 21:05:01 localhost CROND[12285]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Jun 28 21:10:01 localhost CROND[12442]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Jun 28 21:10:01 localhost CROND[12443]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 28 21:15:01 localhost CROND[12527]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Jun 28 21:20:01 localhost CROND[12609]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jun 28 21:20:01 localhost CROND[12608]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Jun 28 21:25:01 localhost CROND[12707]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Jun 28 21:25:01 localhost CROND[12708]: (pcp) CMD ( /usr/libexec/pcp/bin/pmlogger_check -C)
Jun 28 21:25:43 localhost su: pam_unix(su:session): session opened for user root by mach(uid=506)
Jun 28 21:26:02 localhost su: pam_unix(su:session): session closed for user root


The following are the last 10 records entered into the Machbase server.

mach> SELECT tm, msg FROM syslogtable LIMIT 10;
tm                              msg                                                                               
---------------------------------------------------------------------------------------------------------------------
2016-06-28 21:26:02 000:000:000 su: pam_unix(su:session): session closed for user root                            
2016-06-28 21:25:43 000:000:000 su: pam_unix(su:session): session opened for user root by mach(uid=506)          
2016-06-28 21:25:01 000:000:000 CROND[12708]: (pcp) CMD ( /usr/libexec/pcp/bin/pmlogger_check -C)                 
2016-06-28 21:25:01 000:000:000 CROND[12707]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --loc  
                                k-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)              
2016-06-28 21:20:01 000:000:000 CROND[12608]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --loc  
                                k-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)              
2016-06-28 21:20:01 000:000:000 CROND[12609]: (root) CMD (/usr/lib64/sa/sa1 1 1)                                  
2016-06-28 21:15:01 000:000:000 CROND[12527]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --loc  
                                k-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)              
2016-06-28 21:10:01 000:000:000 CROND[12443]: (root) CMD (/usr/lib64/sa/sa1 1 1)                                  
2016-06-28 21:10:01 000:000:000 CROND[12442]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --loc  
                                k-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)              
2016-06-28 21:05:01 000:000:000 CROND[12285]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --loc  
                                k-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)              
[10] row(s) selected.


You can check whether the collector is running with the following query.

mach> SELECT collector_name, run_flag FROM m$sys_collectors;
collector_name                            run_flag    
---------------------------------------------------------
SYSLOG_TEST                               1           
[1] row(s) selected.

Stop Collector


ALTER COLLECTOR manager_name.collector_name STOP;

You can stop the collector with the following command:

mach> ALTER COLLECTOR localhost.syslog_test STOP;
Altered successfully.


Drop Collector


DROP COLLECTOR manager_name.collector_name;
mach> DROP COLLECTOR localhost.syslog_test;
Dropped successfully.


Whether the collector is dropped can be confirmed by the following query.

mach> SELECT collector_name, run_flag FROM m$sys_collectors;
collector_name                            run_flag    
---------------------------------------------------------
[0] row(s) selected.


Update Collector


ALTER COLLECTOR manager_name.collector_name RELOAD;

Use this statement to apply changes made to the template file after the collector was created. The contents of the template file as they exist when the statement is executed are applied. The following example changes the input table from its original value to "anothertable".

mach> ALTER COLLECTOR localhost.custom RELOAD;
Altered successfully.

mach> SELECT * FROM m$sys_collectors;
COLLECTOR_ID MANAGER_NAME                                 COLLECTOR_NAME                            
-----------------------------------------------------------------------------------------------------
LOG_TYPE                                  TABLE_NAME                                
---------------------------------------------------------------------------------------
TEMPLATE_NAME                                                                     COLLECT_TYPE                              
-------------------------------------------------------------------------------------------------------------------------------
COLLECTOR_SOURCE                                                                  
------------------------------------------------------------------------------------
COLLECTOR_LIB                                                                     COL_COUNT   
-------------------------------------------------------------------------------------------------
PREPROCESS_PATH                                                                   
------------------------------------------------------------------------------------
REGEX_PATH                                                                        
------------------------------------------------------------------------------------
REGEX                                                                             
------------------------------------------------------------------------------------
END_REGEX                                                                         LANGUAGE                          
-----------------------------------------------------------------------------------------------------------------------
SLEEP_TIME  DB_ADDR                                   DB_PORT     DB_USER                                   
-----------------------------------------------------------------------------------------------------------------
DB_PASS                                   PROCESS_BYTE         PROCESS_RECORD       RUN_FLAG    
-----------------------------------------------------------------------------------------------------
4           LOCALHOST                                 CUSTOM                                    
syslog                                    anothertable                              
syslog.tpl                                                                        FILE                                      
/var/log/syslog                                                                   
NULL                                                                              7           
NULL                                                                              
syslog.rgx                                                                        
(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S+)\s+([^\n]+)                            
\n                                                                                UTF-8                             
1000        127.0.0.1                                 5656        SYS                                       
MANAGER                                   0                    0                    0           
[1] row(s) selected.

When you look up the meta table, you can see that the input table has been changed to anothertable.