Tuesday, December 18, 2012

Mass Data Processing in an SAP ABAP Environment - Update Function Modules

Background

For an SAP ABAP project I am currently working on, I faced the challenge of looping through a database table with 50+ million rows, doing some processing with each line and, after successful processing, writing some data to another table in the database.

Note: throughout this post I use the notation "<some text>" as a placeholder for concrete names and values.

First Attempt: Standard Approach

My first attempt looked roughly like this:

SELECT * FROM <DB table1 (50+ million)> INTO <wa1>.
  " process the data and put the processed data into internal table <itab1>
ENDSELECT.

LOOP AT <itab1> INTO <wa2>.
  INSERT <DB table2> FROM <wa2>.
  COMMIT WORK.
ENDLOOP.
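
Just to make the placeholder notation concrete: the data objects behind <wa1>, <wa2> and <itab1> could be declared roughly as follows. ZSOURCE and ZTARGET are invented table names for illustration only; they are not the tables from the actual project.

DATA: wa1   TYPE zsource,                    " work area for <DB table1>
      wa2   TYPE ztarget,                    " work area for <DB table2>
      itab1 TYPE STANDARD TABLE OF ztarget.  " collects the processed lines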

Unfortunately, due to the size of <DB table1>, this ran into a timeout error both in dialog processing and in batch/background processing.

Second Attempt: Write Each Line Within SELECT ... ENDSELECT

My second attempt looked like this:

SELECT * FROM <DB table1 (50+ million)> INTO <wa1>.
  " process the data and put the processed data into work area <wa2>
  INSERT <DB table2> FROM <wa2>.
  COMMIT WORK.
ENDSELECT.

Unfortunately, the COMMIT WORK within the SELECT ... ENDSELECT causes a short dump, as the COMMIT WORK invalidates the open database cursor that is needed to fetch the next rows of the SELECT loop.

Third Attempt: Using the Update Function Module

After a bit of reading, I figured out that the problems above could be resolved by encapsulating the processing logic in an update function module and putting the COMMIT WORK outside of the SELECT ... ENDSELECT. For the update module I used the option "Start Immediately". Please note as well that no COMMIT WORK may be called inside the update module.
Additionally, I used the option PACKAGE SIZE n for the SELECT to do packaged (bunched) processing. In pseudocode this looks like this:

SELECT * FROM <large DB table> INTO TABLE <itab> PACKAGE SIZE <package size>.
  CALL FUNCTION 'Update_Module_With_Processing_Logic'
    IN UPDATE TASK
    EXPORTING
      itab_in = <itab>.
ENDSELECT.
COMMIT WORK.
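
The update function module itself is not shown in the snippet above, so here is a rough sketch of how it could look. The interface and the table type ty_source_tab are assumptions on my side, and the actual processing is of course project-specific; the module is created in SE37 with the processing type "Start Immediately" and, as said, must not contain a COMMIT WORK:

FUNCTION update_module_with_processing_logic.
*"  IMPORTING
*"    VALUE(itab_in) TYPE ty_source_tab        " assumed table type over the source lines
  DATA: ls_source TYPE zsource,                " invented source structure
        ls_target TYPE ztarget.                " invented target structure

  LOOP AT itab_in INTO ls_source.
    " <project-specific processing of ls_source into ls_target>
    INSERT ztarget FROM ls_target.             " DB write, executed later in the update task
  ENDLOOP.
  " No COMMIT WORK here - the update task commits when it is processed.
ENDFUNCTION.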

Unfortunately, this approach ran into issues as well: it created too many update task entries, and the update queue crashed before the COMMIT WORK was issued.

Fourth and Final Attempt: Putting it All Together

Finally, after some more reading and a helpful blog post, I found a working solution. This solution uses

  • Database cursor and FETCH instead of SELECT
  • Packaged processing via PACKAGE SIZE n
  • Decoupling of the DB data fetch from COMMIT by using the option WITH HOLD
  • Decoupling of the work processes via asynchronous RFC calls
  • Queued DB updates via update function module

The corresponding main program logic looks like this:

OPEN CURSOR WITH HOLD <the cursor> FOR SELECT * FROM <large DB table>.
DO.
  FETCH NEXT CURSOR <the cursor> INTO TABLE <itab> PACKAGE SIZE <package size>.
  IF sy-subrc <> 0.
    EXIT.
  ENDIF.
  CALL FUNCTION 'RFC_Module_Calling_Update_Module'
    STARTING NEW TASK <the task>
    DESTINATION IN GROUP DEFAULT
    EXPORTING
      itab_in = <itab>.
ENDDO.
CLOSE CURSOR <the cursor>.
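
One remark that goes beyond the pseudocode above: when many packages are dispatched in a tight loop, the asynchronous RFC call can fail because no free work process is available. A possible refinement, not shown in the pseudocode above, is to catch the standard aRFC exceptions and wait briefly before retrying the package, for example:

CALL FUNCTION 'RFC_Module_Calling_Update_Module'
  STARTING NEW TASK <the task>
  DESTINATION IN GROUP DEFAULT
  EXPORTING
    itab_in = <itab>
  EXCEPTIONS
    system_failure        = 1
    communication_failure = 2
    resource_failure      = 3.
IF sy-subrc = 3.
  WAIT UP TO 5 SECONDS.   " no free work process right now - wait and retry this package
ENDIF.
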
The RFC-module logic looks like this:

CALL FUNCTION 'Update_Module_With_Processing_Logic'
  IN UPDATE TASK
  EXPORTING
    itab_in = <itab>.

COMMIT WORK AND WAIT.
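
For completeness, the frame of this RFC module could look roughly like this; the interface is again an assumption, and the important point is only that the function module is flagged as remote-enabled so that it can be started as a new task:

FUNCTION rfc_module_calling_update_module.
*"  IMPORTING
*"    VALUE(itab_in) TYPE ty_source_tab        " same assumed table type as above
  " Remote-enabled wrapper: registers the update module for this package
  " and commits, so every package gets its own update request.
  CALL FUNCTION 'Update_Module_With_Processing_Logic'
    IN UPDATE TASK
    EXPORTING
      itab_in = itab_in.

  COMMIT WORK AND WAIT.
ENDFUNCTION.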

Furthermore, I changed the processing type of the update module to "Start Delayed".

With this approach I was able to process more than 50 million lines. It is perhaps worth noting that working through the update queue took about one day (depending on the system). The update queue can be monitored with transaction SM13.

I hope you can get something out of this and that you enjoyed reading. All the best,
Wolfi