This topic contains 6 replies, has 0 voices, and was last updated by CREECE 7 years, 7 months ago.

  • #1294

    CREECE

    If I use the cache module within a map reduce script and use the cache to store a counter that each queue accesses and increments, is this a safe operation? I want to take some metrics throughout the script and use them in my summary phase. Global objects won’t work as they aren’t available in the summary phase but the cache is.

    Thanks!

  • #1295

    david.smith

    I haven’t done much with cache but in this case I’m not sure I’d trust it. I guess you could try testing it on a small scale and see. Are you trying to read and write to it from all the queues?

  • #1296

    CREECE

    I have tested it with single-queue phases like getInputData, where I write how many of a certain record I retrieved along with the search filters/columns. I can easily store that in the cache and access it in the summary phase. For multi-queue phases, I am not certain it will be valid, as I’m not sure how it behaves behind the scenes. I can run tests, but with the data I have available, I am not sure I’d even get a large enough sample to be sure it behaves as I expect. I know each summary phase has key/values, but that’s not necessarily how many records were created from said data should there be errors. From what I have seen so far, I believe I can only give accurate data from certain phases and leave the rest up to some saved searches on the customer end.

  • #1297

    chanarbon

    From my experience using it with an M/R script, it makes things easier when retrieving data that will be used in the parallel phases. It can also store information from a RESTlet scheduling an M/R script when that information is too big or too complicated to be stored in parameter fields.

  • #1298

    david.smith

    I finished up a script that uses all queues and I used cache to store some values. You have to be careful where your puts and gets are within your code but I have not been seeing any issues with the way I’ve done this so far. I’m not only keeping a count but also keeping an array of errors that occur during the map phase. Then in the summary I’m writing out the cache in an email notice.


    CREECE replied on 04/05/2017, 12:17 AM: I too would like to keep count of some things, but I am not sure what happens if queue 1 is accessing the cache and incrementing a counter while queue 2 or 3 comes in and simultaneously wants the cache resource as well. Is there a deadlock? Who wins? A deadlock seems totally possible unless the resource is locked and processed in order of access (so queue 1 goes, then 2, then 3).
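
To make the "who wins?" question concrete, here is a minimal sketch of what can go wrong when two queues each read a counter and then write it back. It uses a plain in-memory stand-in, not the real cache module, and it assumes that a separate get followed by a put is not treated as one atomic step; nothing in this thread confirms how the real cache behaves. Under that assumption the risk is less a deadlock than a lost update. The per-writer keys (`created_q1`, `created_q2`) are hypothetical names showing one possible way to sidestep the problem.

```javascript
// Stand-in for a shared key/value cache. This is NOT the N/cache module,
// only a plain in-memory map used to show the ordering problem.
class StandInCache {
  constructor() {
    this.store = new Map();
  }
  get(key) {
    return this.store.has(key) ? this.store.get(key) : null;
  }
  put(key, value) {
    this.store.set(key, String(value)); // values are kept as strings
  }
}

// Two queues each "increment" the same key, but both read before either writes.
function interleavedIncrements(cache, key) {
  const seenByQueue1 = Number(cache.get(key) || 0);
  const seenByQueue2 = Number(cache.get(key) || 0); // same stale value
  cache.put(key, seenByQueue1 + 1);
  cache.put(key, seenByQueue2 + 1); // overwrites queue 1's update
  return Number(cache.get(key));
}

// Hypothetical workaround: each writer owns its own key, and a single
// reader adds the keys up later (for example in the summary phase).
function incrementOwnKey(cache, writerId) {
  const key = 'created_' + writerId;
  cache.put(key, Number(cache.get(key) || 0) + 1);
}

function sumOwnKeys(cache, writerIds) {
  return writerIds.reduce(
    (total, id) => total + Number(cache.get('created_' + id) || 0),
    0
  );
}

// Shared key: two increments happen, but only one survives.
const sharedResult = interleavedIncrements(new StandInCache(), 'created');

// Separate keys: both increments are still there when summed.
const splitCache = new StandInCache();
incrementOwnKey(splitCache, 'q1');
incrementOwnKey(splitCache, 'q2');
const splitResult = sumOwnKeys(splitCache, ['q1', 'q2']);

console.log(sharedResult, splitResult);
```

The shared-key version records fewer increments than were made, while the split-key version keeps them all. Whether a real map/reduce run interleaves this way is something only a test in the account can show.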

  • #1299

    chanarbon

    In my case, I normally write to the cache in either getInputData or map, then retrieve the values in reduce (if I write in getInputData) or in summarize (if I write to the cache from the parallel phases). I also want to consolidate things in summarize, where I do the post-run reports of the M/R scripts, and so far it has been pretty useful to me.


    CREECE replied on 04/05/2017, 12:16 AM: But how are you sure that those values you are caching will be available if the TTL isn’t promised as per the documentation? If it’s not something you can specify a loader for, such as “I created x amount of records in my map phase across all queues”, then couldn’t that data potentially be lost?

  • #1300

    CREECE

    I also use it like that, chanarbon. My main concern with the cache at this point is that there is a disclaimer stating that you aren’t promised the data will be there when you need it, since the TTL (time to live) may or may not stick. I don’t specify a value for TTL, so I expect the cache to be available for the life of the map/reduce run, but since I am not promised that, I am a bit worried about relying on it. Do you have any insight? It seems like using the runtime session is the better option if it’s contained within the same script.

    Update:

    Looks like the runtime session doesn’t go across map/reduce phases. Will have to just revert to cache w/ the loader function.
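
For anyone following along, the "cache w/ the loader function" idea can be sketched roughly like this. The class below is a stand-in that only imitates the general shape of a cache `get` taking a `key` and a `loader`; it is not the real module, and `evict`, `createdRecords`, and `countCreatedRecords` are made-up names for illustration. The assumption is that the metric can be recomputed from some source of truth (a saved search, for instance) if the cached copy has disappeared.

```javascript
// Stand-in cache that imitates a get({ key, loader }) style API.
// NOT the real N/cache module; evict() exists only to simulate an
// entry disappearing before its expected time to live.
class LoaderCache {
  constructor() {
    this.store = new Map();
  }
  put({ key, value }) {
    this.store.set(key, String(value));
  }
  get({ key, loader }) {
    if (!this.store.has(key) && typeof loader === 'function') {
      const loaded = loader(); // recompute the value on a miss
      if (loaded !== null && loaded !== undefined) {
        this.store.set(key, String(loaded)); // keep it for later reads
      }
    }
    return this.store.has(key) ? this.store.get(key) : null;
  }
  evict(key) {
    this.store.delete(key);
  }
}

// Hypothetical source of truth, standing in for whatever a saved search
// on the created records would return.
const createdRecords = [{ id: 101 }, { id: 102 }, { id: 103 }];
const countCreatedRecords = () => createdRecords.length;

const metrics = new LoaderCache();
metrics.put({ key: 'created_count', value: countCreatedRecords() });

// Simulate the entry being dropped before the summary phase reads it.
metrics.evict('created_count');

// Without a loader the value is simply gone.
const withoutLoader = metrics.get({ key: 'created_count' });

// With a loader the value is rebuilt from the source of truth.
const withLoader = metrics.get({
  key: 'created_count',
  loader: countCreatedRecords
});

console.log(withoutLoader, withLoader);
```

This only helps for values that can be recomputed. A running tally such as "x records created across all queues" has no loader to fall back on unless the records themselves can be counted afterwards, which is the gap raised earlier in the thread.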
