Elasticsearch – the river with jdbc (for mysql).
First of all you cannot block automapping, so:
# elasticsearch.yml index.mapper.dynamic: true
Pre-mapping of your river doesn’t resolve the problem.
# part of the river configuration:
"type_mapping": {
"mytype" : {
"properties" : {
"location" : {
"type" : "geo_point"
}
}
}
Tip: If your node does not work after installation of the plugin you can try to install the plugin in latest beta version.
/usr/share/elasticsearch/bin/plugin --install jdbc --url http://xbib.org/repository/org/xbib/elasticsearch/plugin/elasticsearch-river-jdbc/1.4.0.3.Beta1/elasticsearch-river-jdbc-1.4.0.3.Beta1-plugin.zip
Next, you have to install the driver for mysql:
wget "http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.34.tar.gz"
tar -xf ./mysql-connector-java-5.1.34.tar.gz
mv -v ./mysql-connector-java-5.1.34/mysql-connector-java-5.1.34-bin.jar /usr/share/elasticsearch/plugins/jdbc/
Create river:
curl -XPUT 'http://localhost:9200/_river/my_personal_river/_meta' -d '
{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://10.0.25.203:3306/db_fixed",
"user" : "elasticsearchtst",
"password" : "************",
"sql" : "SELECT my_id_column _id, some_title_column header, some_text_column content FROM my_table WHERE _id < 18201014 ;",
"index" : "clipping",
"type" : "detail",
"index_settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
}
}
}
'
# Check status
curl -XGET 'localhost:9200/_river/my_personal_river/_status'
# Delete
curl -XDELETE 'localhost:9200/_river/clipping_detail'
Cautions, conlusions & misconseption:
To understand the instance of the river you should treat the river as other index. This special index is flexible - you can modify it conform to your needs (shards/replicas/types). Elasticsearch internally transport all data from the river into the index which is dedicated.
You can disable the river on some of your nodes with parameter:
node.river: _none_ # disabling any river allocation to it
RECOMENDED:
It is recommended to use timestamps in UTC
{
"type" : "jdbc",
"jdbc" : {
"url" : "jdbc:mysql://localhost:3306/test",
"user" : "",
"password" : "",
"sql" : [
{
"statement" : "select * from \"products\" where \"mytimestamp\" > ?",
"parameter" : [ "$river.state.last_active_begin" ]
}
],
"index" : "my_jdbc_river_index",
"type" : "my_jdbc_river_type"
}
}
Problem and solutions:
In case your river doesn't work properly...
Check which node is responsible for the river:
curl -XGET '10.0.25.197:9200/_river/clipping_detail/_status?pretty'
{
"_index" : "_river",
"_type" : "details",
"_id" : "_status",
"_version" : 1,
"found" : true,
"_source":{"node":{"id":"KTLya3PxQ_Supn19MswMRA",
"name":"es-lab-201",
"transport_address":"inet[/10.0.25.180:9300]"}}
}
Then, you can log into 10.0.25.180 box and check log to solve potential problems:
tail /var/log/elasticsearch/elastic_lab.log -n 100 | grep jdbc [2014-11-18 13:30:49,023][ERROR][river.jdbc.RiverPipeline ] com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ZELECT clde_id _id, clde_t header, clde_c content FROM the_tab' at line 1
River is to be depracated 🙂