{"id":513,"date":"2012-05-02T11:36:23","date_gmt":"2012-05-02T11:36:23","guid":{"rendered":"http:\/\/10sa.com\/sql_stories\/?p=513"},"modified":"2012-05-02T11:36:23","modified_gmt":"2012-05-02T11:36:23","slug":"when-sphinx-search-v-0-9-9-ends","status":"publish","type":"post","link":"http:\/\/10sa.com\/sql_stories\/?p=513","title":{"rendered":"When Sphinx Search v. 0.9.9 ends"},"content":{"rendered":"<p>My source data grew and grew until I ran into problems with version 0.9.9.<\/p>\n<p>1. <strong>INDEXER<\/strong><\/p>\n<p>While building the index I received the following warning:<br \/>\nWARNING: duplicate document ids found<\/p>\n<p>I checked twice, and the source data contained no duplicates.<\/p>\n<p>This is a sample of the indexer output from the run in which the above warning occurred.<br \/>\ncollected 37556684 docs, 88183.8 MB<br \/>\ncollected 203872754 attr values<br \/>\nsorted 203.9 Mvalues, 100.0<br \/>\nsorted 53924.6 Mhits, 100.0<\/p>\n<p>2. <strong>SEARCHD<\/strong><\/p>\n<p>The next problem was that searchd was unable to bring this index up (I received the message &#8220;Segmentation fault&#8221;).<br \/>\nI observed the following errors in \/var\/log\/messages:<br \/>\nFeb 15 15:54:57 shp01 kernel: searchd[4860]: segfault at 0000000000000000 rip 000000000048291e rsp 00007ffff30ebba0 error 4<br \/>\nFeb 17 08:32:12 shp01 kernel: searchd[2692]: segfault at 0000000000000020 rip 00000000004d627b rsp 00007ffff30eabf0 error 6<\/p>\n<p>After examining the sources, I think the problem in the indexer was in this loop:<br \/>\nwhile ( qDocinfo.GetLength() )<br \/>\nWith the following check I noticed that many of the ids had the value 0:<br \/>\nbool bPBiszero = DOCINFO2ID ( pEntry ) ==0;<\/p>\n<p>When searchd started, it ran into the following problems:<\/p>\n<pre lang=\"cplus\">\r\n\/\/ in the method:\r\nbool CSphIndex_VLN::Preread ()  \r\n\/\/ in the loop:\r\nfor ( DWORD i=1; i<m_uDocinfo; i++ )\r\n\r\n\/\/ I introduced the following code in order to avoid the segmentation fault\r\nif 
( uHash==uLastHash )\r\n    continue;\r\nif (uHash  < 0 )\r\n    continue;\r\nif (DOCINFO2ID ( &#038;m_pDocinfo[i*iStride] )==0)\r\n    continue;\r\n\/\/ Here I tried to catch the exception\r\nwhile ( uLastHash<uHash ) {\r\n    pHash [ ++uLastHash ] = i;\r\n}   \r\n\r\n\/\/ Next, in the MVA loop:\r\nARRAY_FOREACH ( i, dMvaAttrs )\r\n\/\/ I kept the process from stopping by adding the following instructions:\r\nif ( uOff>=m_pMva.GetNumEntries() )\r\n    \/\/ printf(\"broken index: mva docid verification failed [...]\r\n    continue;\r\nif ( i==0 && DOCINFO2ID(pMva-DOCINFO_IDSIZE)!=uDocID )\r\n    \/\/ printf(\"broken index: mva docid verification failed, (docidfmt: [...]\r\n    if (uDocID == 0)\r\n        continue;\r\n<\/pre>\n<p>And the index worked (but without any consistency guarantee).<\/p>\n<p>The solution is to use the latest version (2.0.4).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My source data grew and grew until I ran into problems with version 0.9.9. 1. INDEXER While building the index I received the following warning: WARNING: duplicate document ids found I checked twice, and the source data contained no duplicates. This is a sample of the indexer output from the run in which the above warning occurred. 
collected [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,12],"tags":[],"_links":{"self":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts\/513"}],"collection":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=513"}],"version-history":[{"count":12,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts\/513\/revisions"}],"predecessor-version":[{"id":525,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=\/wp\/v2\/posts\/513\/revisions\/525"}],"wp:attachment":[{"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=513"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=513"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/10sa.com\/sql_stories\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=513"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}