Document Format

The related items document that is POST’ed to the indexing web application has a couple of keys that it refers to:

  • items”: An array of items that are related, i.e. just been purchased together
  • id”: The id of the item, this is your identify for the item, for example the Product Id.
  • date”: The date at which the related items were created, i.e. the purchase time

Example:

{
    "date": "2013-12-24T09:44:41.943+10:00", 
    "items": [
        {
            "id": "1"      
        }, 
        {
            "id": "2" 
        } 
    ] 
}

Or in short form:

{
    "date": "2013-12-24T09:44:41.943+10:00", 
    "items": [ "1","2" ]
}

The short form of the post makes the assumption that the strings in the ”items” array are the ”id“‘s

As previously seen the indexing POST document can contain extra keys and values that are associated to either the related items POST in it’s entirety, or can be specific to a particular item in the related items document.

For example:

{
    "date": "2013-12-24T09:44:41.943+10:00", 
    "site": "amazon", 
    "channel" : "uk",       
    "items": [
        {
            "department" : "electronics",
            "category" : "storage",
            "type" : "hard disk",
            "id": "1",      
        }, 
        {
            "department" : "electronics",
            "category" : "notebooks",
            "type" : "macbook pro",
            "memory" : "8gb",
            "channel" : "de",
            "id": "2", 
        } 
    ] 
}

Given the above the key/value: “site”: “amazon”, will apply to all the related item documents that are indexed in elasticsearch (2 documents in the above). The key/value pairs within the “items” array will apply just to that item’s document that is indexed.

Document with “id”: “1” will have the key/values:

"date": "2013-12-23T23:44:41.943"
"site": "amazon", 
"channel" : "uk",    
"department" : "electronics",
"category" : "storage",
"type" : "hard disk",    

Document with “id” : “2” will have the key/values:

"date": "2013-12-23T23:44:41.943"
"site": "amazon", 
"channel" : "de",    
"department" : "electronics",
"category" : "notebooks",
"type" : "macbook pro",
"memory" : "8gb"

If an element exist in the item’s property that was defined in the parent enclosing document, then the item’s value takes precedence and overrides that of the enclosing document’s setting.

With the above you can see that the second item has an extra field “memory”, than that of the first document. Whilst it does not make much difference to the backend (elasticsearch) or the web applications (searching or indexing), for consistencies sake you shouldn’t really have key existing in one related item that do not appear in the. However, it is entirely up to you.