This topic contains 4 replies, has 0 voices, and was last updated by dc92974 6 years, 8 months ago.

  • Author
    Posts
  • #18003

    dc92974

    What is the most efficient way to process a large query with a map/reduce script? I have a query that returns 100,000-150,000 rows. That consolidates down to 10,000-12,000 rows. Then some calculations run which result in usually less than 100 updates that need to happen.

    The piece I would like to improve is processing the initial 100-150,000 rows. Here are the methods I’ve tried…

    Try 1: Loop through the results in the input phase and create an array.

    Result: I didn’t keep this code, but it hit a script execution limit looping through the results. Not really surprising.
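    For reference, that approach would look roughly like the sketch below (not the original code, which wasn’t kept; ‘customsearch_mysearch’ is a stand-in for the real search id, and search is the N/search module from the script’s define block):

    function getInputData() {
        // Page through the entire search inside getInputData and build one big array.
        // Doing all of this in a single invocation is what blew the execution limit.
        var pagedData = search.load({ id: 'customsearch_mysearch' }).runPaged({
            pageSize: 1000
        });
        var allRows = [];
        pagedData.pageRanges.forEach(function(pageRange) {
            pagedData.fetch({ index: pageRange.index }).data.forEach(function(result) {
                allRows.push({
                    id: result.id,
                    type: result.getValue({ name: 'type' }),
                    upccode: result.getValue({ name: 'upccode' })
                    /* rest of columns go here */
                });
            });
        });
        return allRows; // one element per row, handed to map()
    }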

    Try 2: Just return a search reference in the input phase and let the map phase process the results.

    return {
        type: 'search',
        id: 'customsearch_oc_inventory_calc_feed'
    };

    Result: Hit the 50MB memory limit.
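    With that style of input, each map invocation gets one search result serialized as JSON in context.value. For a plain (non-grouped) search it looks roughly like the sketch below (field names are just examples):

    function map(context) {
        // context.value is one serialized search result, roughly:
        // {"recordType":"...","id":"123","values":{"upccode":"0123...","type":{"value":"InvtPart","text":"Inventory Item"}}}
        var result = JSON.parse(context.value);
        var upc = result.values.upccode;                            // plain columns come through as strings
        var type = result.values.type && result.values.type.value; // select columns as { value, text } pairs
        log.debug({ title: 'Row ' + result.id, details: upc + ' / ' + type });
    }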

    Try 3: Run the saved search in the input phase and pass the page ranges along to the map phase. In the map phase, reload the search and process whichever page index was passed in.

    Result: Takes about 11 minutes to get through 135,000 records.

    function getInputData() {
        var searchObj = search.load({ id: 'customsearch_mysearch' });
        var pagedSearch = searchObj.runPaged({
            pageSize: 1000
        });
        if (pagedSearch.pageRanges.length === 1800) {
            log.error("Search returned more than the max of 1800 pages.");
            return;
        }
        return pagedSearch.pageRanges;
    }

    function map(context) {
        var pageIdx = JSON.parse(context.value);
        log.debug({ title: 'Page ' + pageIdx.index + ' - Start' });
        var searchObj = search.load({ id: 'customsearch_mysearch' });
        var pagedSearch = searchObj.runPaged({
            pageSize: 1000
        });
        var data = [];
        var page = pagedSearch.fetch({ index: pageIdx.index });
        page.data.forEach(function(result) {
            data.push({
                id: result.id,
                type: result.getValue({ name: 'type' }),
                upccode: result.getValue({ name: 'upccode' }),
                /* rest of columns go here, 18 in total */
            });
        });
        log.debug({ title: 'Page ' + pageIdx.index + ' - End', details: data.length + ' records' });
    }

    Try 4: Same as 3, but I used an object declared at module scope inside the define callback to try to maintain the state of the paged search from the input phase to the map phase.

    Result: The state was reset in the map phase, which is pretty much expected behavior.
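    A minimal sketch of why that fails (variable names are made up): the entry points run as separate invocations, so anything set at module scope in getInputData has been re-initialized by the time map runs.

    define(['N/log'], function(log) {
        var sharedState = { pageCount: 0 }; // re-created for every invocation

        function getInputData() {
            sharedState.pageCount = 135;    // set during the input phase...
            return [1];
        }

        function map(context) {
            // ...but map runs in a fresh invocation, so this logs 0, not 135
            log.debug({ title: 'pageCount', details: sharedState.pageCount });
        }

        return { getInputData: getInputData, map: map };
    });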

    So, with Try 3 I was definitely getting “creative”, but it’s actually the only attempt that worked. This seems like such a hack, but I guess it did technically get all the results. Any better ideas? I haven’t written any code past this; once I have the results, the rest is pretty easy.

  • #18004

    MChammaTX

    Interesting approaches. I will say I’m surprised you hit a script execution limit in Try 1, as I’ve done that with over 150k rows before (although I think I did max out pretty close to that point).

    Try 3 may be your best bet, or find a way to filter your data down more.

  • #18005

    dc92974

    In this case I was able to use formulas, grouping and summary functions to bring my saved search results down to 35,000.

    Try 1: Loop through the results in the input phase and create an array.

    Result: This worked and allowed me to create an array to pass to the map phase. The size ended up just over 10MB, so I have a comfortable gap under the 50MB limit.
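    A rough sketch of that version of getInputData, assuming the slimmed-down saved search already defines the grouping, formula, and summary columns (ids and labels are placeholders):

    function getInputData() {
        var pagedData = search.load({ id: 'customsearch_oc_inventory_calc_feed' }).runPaged({
            pageSize: 1000
        });
        var rows = [];
        pagedData.pageRanges.forEach(function(pageRange) {
            pagedData.fetch({ index: pageRange.index }).data.forEach(function(result) {
                // read each summary/formula column straight off the loaded search definition
                var row = {};
                result.columns.forEach(function(col) {
                    row[col.label || col.name] = result.getValue(col);
                });
                rows.push(row);
            });
        });
        return rows; // ~35,000 small objects, comfortably under the 50MB limit
    }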

    Try 2: Just return a search reference in the input phase and let the map phase process the results.

    return {
        type: 'search',
        id: 'customsearch_oc_inventory_calc_feed'
    };

    Result: Allows the data to get to the map phase, but because of the grouping and formulas, using the results after they’re converted to JSON was a mess… although technically possible if you hate someone on the support staff.
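    If it helps anyone, the serialized values for a grouped search carry the summary function in the key, which is what makes them awkward to consume; roughly along these lines (field names changed):

    function map(context) {
        var result = JSON.parse(context.value);
        // With grouping/summary columns the keys look something like:
        //   result.values['GROUP(upccode)']       -> '0123456789'
        //   result.values['SUM(formulanumeric)']  -> '42'
        //   result.values['GROUP(type)']          -> { value: 'InvtPart', text: 'Inventory Item' }
        var upc = result.values['GROUP(upccode)'];
        log.debug({ title: 'Group ' + upc });
    }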

    Try 3 & 4: Not necessary with the slimmed search results

    So it looks like I’m going with a modified saved search and Try 1. It’s interesting trying to find the line between the work you do in the saved search and the work you do in the Map/Reduce script. At this point I’m a little disappointed with the size of the data you can feed the Map/Reduce script, but from the map phase down it actually does seem to work well.

    Hoping there are some good sessions at SuiteWorld this year where I can pick up a few more tricks for working with Map/Reduce scripts. 🙂

  • #18006

    MChammaTX

    For what it’s worth, NetSuite is increasing the data size limit from 50MB to 200MB in 2018.1.

  • #18007

    dc92974

    Nice, I missed that in the release notes. That will help; 50MB disappears pretty quickly.
