Overview

The index web application is POSTed data containing a group of related items, i.e. for example a purchase containing several items.

curl -H"Content-Type:text/json" -XPOST -v http://localhost:8080/indexing/index -d '
{
   "channel":"de",
   "site":"amazon",
   "items":[
      {
         "id":"1",
         "type":"map"
      },
      {
         "id":"2",
         "type":"compass"
      },
      {
         "id":"3",
         "type":"torch"
      },
      {
         "id":"4",
         "type":"torch",
         "channel":"uk"
      }
   ]
}'

The indexing application returns a 202 accepted status code to the client that issued the indexing request. This indicates to the client that the request has been accepted for processing and indexing. Please note that the returning of 202 doesn’t mean that the related items have been stored in Elasticsearch. It means that the item is current held within the disruptor ring buffer pending processing and storage. The indexing application can return the following status codes:

  • 400 : The indexing json contains no products, or is invalid jason
  • 503 : The ring buffer is full, and cannot accept any more items for processing.
  • 202 : Item is being processed

It is at this point the POSTed json data is assembled into several documents. For example, for the above request; 3 JSON documents will be assembled and submitted to elasticsearch for storage and indexing:

{
    "id": "1" ,
    "date": "2013-12-24T17:44:41.943Z",
    "related-with": [ "2","3","4"],
    "type": "map",
    "site": "amazon",
    "channel": "de"
}

{
    "id": "2" ,
    "date": "2013-12-24T17:44:41.943Z",
    "related-with": [ "1","3","4"],
    "type": "compass",
    "site": "amazon",
    "channel": "de"
}

{
    "id": "3" ,
    "date": "2013-12-24T17:44:41.943Z",
    "related-with": [ "1","2","4"],
    "type": "torch",
    "site": "amazon",
    "channel": "de"
}

{
    "id": "4" ,
    "date": "2013-12-24T17:44:41.943Z",
    "related-with": [ "1","2","3"],
    "type": "torch",
    "site": "amazon",
    "channel": "uk"
}

It is the json element “related-with”, that will be used during search, to provide faceting information. It is this faceting functionality provided by Elasticsearch that allows us to say:

  • For this id, what are the 5 top most frequently related-with ids.