Abaqus user subroutines matrix generation

SP 2020-01-31

Hi, I'm working with UEL and UMAT user subroutines and have a question about the interaction between the solver and the Fortran subroutines. I ran some benchmark tests and found that only 15-20% of the computation time between the first call of my UEL (or UMAT) and the last call of that same UEL (or UMAT) is spent in my code (see the example below). This holds for both wall and CPU time, and for single-threaded as well as multi-threaded runs. What is Abaqus doing in between? Does the assembly really take that long? Of course, for the UMAT additional time is needed to calculate the residuals, but for the UELs I already return those values. Could some of the return values from a UEL or UMAT cause such an effect?

Example:
0. Init Timer A and Timer B
1. Element 1 enters UEL (start recording to A, start recording to B)
2. Element 1 exits UEL (stop recording to B)
3. Element 2 enters UEL (start recording to B)
4. Element 2 exits UEL (stop recording to B)
....
N. Last Element enters UEL (start recording to B)
N+1. Last Element exits UEL (stop recording to B, stop recording to A)
After this, the total time recorded in B divided by the time recorded in A is around 0.15 to 0.20.
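
To make the bookkeeping in the steps above concrete, here is a minimal sketch of the two-timer scheme. It is written in Python only for illustration and is not the Fortran code used in the actual UEL. The class name `TwoTimerProfile`, its methods, and the fake clock values are all hypothetical; the real measurement would call a timer at the entry and exit of the subroutine.

```python
import time


class TwoTimerProfile:
    """Timer A: span from first entry to last exit.
    Timer B: accumulated time spent inside the subroutine."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock        # injectable so the sketch can be checked deterministically
        self.first_entry = None   # start of timer A
        self.last_exit = None     # current end of timer A
        self.entered_at = None    # start of the running B interval
        self.time_b = 0.0         # sum of all B intervals

    def enter(self):
        now = self.clock()
        if self.first_entry is None:
            self.first_entry = now  # timer A starts with the first element only
        self.entered_at = now       # timer B starts on every entry

    def exit(self):
        now = self.clock()
        self.time_b += now - self.entered_at  # stop recording to B
        self.entered_at = None
        self.last_exit = now  # after the last element this is the end of A

    @property
    def time_a(self):
        return self.last_exit - self.first_entry

    def fraction_in_user_code(self):
        return self.time_b / self.time_a


# Made-up clock readings: three elements, each spending 1 s inside the
# subroutine, with solver time in between (assumed numbers, not measurements).
ticks = iter([0.0, 1.0, 5.0, 6.0, 9.0, 10.0])
profile = TwoTimerProfile(clock=lambda: next(ticks))
for _ in range(3):
    profile.enter()
    profile.exit()

fraction = profile.fraction_in_user_code()  # B / A
```

With these made-up readings B is 3 s and A is 10 s, so the ratio is 0.3; the remaining 70% is time spent outside the subroutine between calls. In the Fortran subroutine the same counters would have to persist between calls (for example via saved variables), and in a thread-parallel run that shared state would need care, which this sketch does not address.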

Maybe to put the question in a more general context: what does Abaqus do when it builds the global stiffness matrix and residuals in a model that uses user subroutines?

I know this is quite a specific question, and that it may be hard to answer in such general terms.