Monday, March 12, 2018

Application Engine Parallel Processing (batch)

Parallel processing comes into the picture when a large number of rows has to be processed without the performance hit that non-parallel processing would otherwise cause. I am sharing the concept here through an actual example where I used it.

I had written an Application Engine program which read data via a SQL object, performed some computations and finally wrote the result to an extract file which was sent to the vendor over SFTP. Pretty much run-of-the-mill stuff. The AE was reading from PS_JOB, PS_PERSONAL_DATA, PS_EMAIL_ADDRESSES and other tables via a single SQL object in a PeopleCode step, performing the data manipulation as required and writing to a file. It selected one employee at a time, performed the data manipulations and wrote to the file. Processing around 3K employees took a little over an hour: not a lot of data, but far too much time for a data set of that size.

To solve this performance issue, this is what I did.
  1. I created a single TAO (temporary) table. The fields in this table are essentially the unique list of fields or values that the single SQL object was returning. I did not build the TAO table yet; that happens in a later step.
  2. The TAO table has two keys, PROCESS_INSTANCE and EMPLID.
  3. Then, ahead of the PeopleCode step, I added a new step to select from the various HR tables and write to this newly created TAO table.
  4. The SQL step looks something like this.
         INSERT INTO %Table(N_MY_TAO)
         SELECT %Bind(PROCESS_INSTANCE)
         , B.EMPLID
         .....
         .....
         .....
         FROM PS_JOB A, PS_PERSONAL_DATA B .....
         WHERE ....

  5. After this I added a new step to update statistics on the newly populated TAO table.
         %UpdateStats(N_MY_TAO)
  6. In the PeopleCode step I replaced the SQL fetch as follows.
         &EESQL = CreateSQL(FetchSQL(SQL.N_MY_SQL), N_MY_AET.PROCESS_INSTANCE);
  7. I had already defined a state record in my AE with a PROCESS_INSTANCE field in it, so the state record needed no changes. In the N_MY_SQL SQL object I am selecting all fields FROM %Table(N_MY_TAO) WHERE PROCESS_INSTANCE = :1 ORDER BY EMPLID. I have some SQL CASE statements and formatting rules defined in the SQL itself.
  8. I added the newly created TAO table under Temp Tables in the program properties and provided an instance count of 1. The instance count has to be 1 or more. If it is set to 0, you will see the following message in your AE log and there won't be much of a performance improvement.

WARNING: NO DEDICATED INSTANCES AVAILABLE FOR N_MY_TAO - USING BASE TABLE. (108,544)

  9. Under PeopleTools > Utilities > Administration > PeopleTools Options, in my case Temp Table Instances (Total) and Temp Table Instances (Online) are both set to 3. So when I add 1 as the instance count in my AE and then build the TAO table, it creates PS_N_MY_TAO, PS_N_MY_TAO1, PS_N_MY_TAO2, PS_N_MY_TAO3 and PS_N_MY_TAO4. When the process runs it uses PS_N_MY_TAO4, as the first three numbered instances are used for online processing.

  10. You can add a step at the beginning or end to purge the TAO table, such as %TruncateTable(%Table(N_MY_TAO)), or better yet use the "Use Delete for Truncate Table" check-box under program properties. With this option the TAO table is purged at the beginning of each run.
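
The instance numbering from steps 8 and 9 can be spelled out in a few lines of Python. This is only a sketch of the arithmetic as I described it above, not PeopleSoft code; the function name and its parameters are hypothetical names I made up for illustration.

```python
def temp_table_names(record, online_instances, program_instances):
    """List the physical tables a build would create, per the numbering in steps 8 and 9.

    record            -- record name without the PS_ prefix (e.g. "N_MY_TAO")
    online_instances  -- Temp Table Instances (Online) from PeopleTools Options
    program_instances -- instance count entered in the AE program properties
    """
    base = f"PS_{record}"
    total = online_instances + program_instances
    # The base table comes first, then numbered instances 1..total.
    names = [base] + [f"{base}{n}" for n in range(1, total + 1)]
    # Numbered instances after the online ones are the ones left for the batch run.
    batch = names[online_instances + 1:]
    return names, batch


names, batch = temp_table_names("N_MY_TAO", online_instances=3, program_instances=1)
print(names)
print(batch)
```

With the values from my environment (3 online instances, 1 program instance) this lists five tables and leaves PS_N_MY_TAO4 for the batch process, which matches what I saw after the build.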

Adding a single TAO table brought the run time down from a little over an hour to 7 minutes. A massive improvement.
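
For anyone who wants to play with the pattern outside PeopleSoft, here is a rough Python and SQLite analogue: stage everything once with a set-based INSERT ... SELECT keyed by process instance, then read the staged rows back in EMPLID order and build the extract lines. All table names, columns and sample rows below are made up for the illustration, and SQLite obviously has no %Table or %Bind meta-SQL, so treat it as a sketch of the idea rather than the actual program.

```python
import sqlite3

PROCESS_INSTANCE = 1001  # hypothetical run identifier

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for the HR source tables (made-up columns and rows).
    CREATE TABLE job (emplid TEXT, deptid TEXT);
    CREATE TABLE personal_data (emplid TEXT, name TEXT);
    -- Stand-in for the TAO table, keyed by process instance and EMPLID.
    CREATE TABLE my_tao (
        process_instance INTEGER, emplid TEXT, name TEXT, deptid TEXT,
        PRIMARY KEY (process_instance, emplid)
    );
    INSERT INTO job VALUES ('E002', 'HR'), ('E001', 'IT');
    INSERT INTO personal_data VALUES ('E001', 'Ada'), ('E002', 'Grace');
""")

# Staging step: one set-based INSERT ... SELECT loads every row for this run.
conn.execute(
    """INSERT INTO my_tao
       SELECT ?, b.emplid, b.name, a.deptid
       FROM job a JOIN personal_data b ON b.emplid = a.emplid""",
    (PROCESS_INSTANCE,),
)

# Read step: fetch only this run's rows, ordered by EMPLID, and format them.
rows = conn.execute(
    "SELECT emplid, name, deptid FROM my_tao "
    "WHERE process_instance = ? ORDER BY emplid",
    (PROCESS_INSTANCE,),
).fetchall()
lines = [",".join(row) for row in rows]
print("\n".join(lines))
```

The point of the design is that the expensive joins against the source tables run once, and the per-employee loop only touches a small, already-shaped table filtered by the run's own process instance.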