Elasticsearch – the river with jdbc (for mysql).

First of all you cannot block automapping, so:

# elasticsearch.yml
index.mapper.dynamic: true

Pre-mapping of your river doesn’t resolve the problem.

# part of the river configuration:
"type_mapping": {
                "mytype" : {
                    "properties" : {
                        "location" : {
                            "type" : "geo_point"
                        }
                    }
                }

Tip: If your node does not work after installation of the plugin you can try to install the plugin in latest beta version.

/usr/share/elasticsearch/bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.3.Beta1/elasticsearch-river-jdbc-1.4.0.3.Beta1-plugin.zip

Next, you have to install the driver for mysql:

wget "http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.34.tar.gz"
tar -xf ./mysql-connector-java-5.1.34.tar.gz
mv -v ./mysql-connector-java-5.1.34/mysql-connector-java-5.1.34-bin.jar /usr/share/elasticsearch/plugins/jdbc/

Create river:

curl -XPUT 'http://localhost:9200/_river/my_personal_river/_meta' -d '
{
        "type" : "jdbc",
        "jdbc" : {
                "url" : "jdbc:mysql://10.0.25.203:3306/db_fixed",
                "user" : "elasticsearchtst",
                "password" : "************",
                "sql" : "SELECT my_id_column _id, some_title_column header, some_text_column content FROM my_table WHERE _id < 18201014 ;",
                "index" : "clipping",
                "type" : "detail",
                "index_settings" : {
                "index" : {
                        "number_of_shards" : 1,
                        "number_of_replicas" : 0
                        }
                }
        }
}
'
# Check status
curl -XGET 'localhost:9200/_river/my_personal_river/_status'
# Delete
curl -XDELETE 'localhost:9200/_river/clipping_detail'


Cautions, conlusions & misconseption:
To understand the instance of the river you should treat the river as other index. This special index is flexible - you can modify it conform to your needs (shards/replicas/types). Elasticsearch internally transport all data from the river into the index which is dedicated.
You can disable the river on some of your nodes with parameter:
node.river: _none_ # disabling any river allocation to it

RECOMENDED:
It is recommended to use timestamps in UTC

    {
        "type" : "jdbc",
        "jdbc" : {
                "url" : "jdbc:mysql://localhost:3306/test",
                "user" : "",
                "password" : "",
            "sql" : [
                {
                    "statement" : "select * from \"products\" where \"mytimestamp\" > ?",
                    "parameter" : [ "$river.state.last_active_begin" ]
                }
            ],
            "index" : "my_jdbc_river_index",
            "type" : "my_jdbc_river_type"
        }
    }

Problem and solutions:
In case your river doesn't work properly...
Check which node is responsible for the river:

curl -XGET '10.0.25.197:9200/_river/clipping_detail/_status?pretty'
{
  "_index" : "_river",
  "_type" : "details",
  "_id" : "_status",
  "_version" : 1,
  "found" : true,
  "_source":{"node":{"id":"KTLya3PxQ_Supn19MswMRA",
  "name":"es-lab-201",
  "transport_address":"inet[/10.0.25.180:9300]"}}
}

Then, you can log into 10.0.25.180 box and check log to solve potential problems:

tail  /var/log/elasticsearch/elastic_lab.log -n 100 | grep jdbc
[2014-11-18 13:30:49,023][ERROR][river.jdbc.RiverPipeline ] com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ZELECT clde_id _id, clde_t header, clde_c content FROM the_tab' at line 1
  1. River is to be depracated 🙂

  1. No trackbacks yet.