History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: AMATH-746
Type: Bug Bug
Status: Closed Closed
Resolution: Won't Fix
Priority: Critical Critical
Assignee: Unassigned
Reporter: Stefan Winterstein
Votes: 1
Watchers: 1
Operations

Link this issue to another issue
If you were logged in you would be able to see more operations.
ActiveMath

XLM slows down application over time

Created: 2006-04-26 15:04   Updated: 2006-11-09 16:11
Component/s: Server: Scalability, XLM: Learner Model
Affects Version/s: None
Fix Version/s: 1.0

Time Tracking:
Not Specified

File Attachments: 1. Text File activemath-timing.log (757 kb)
2. Zip Archive activemath-with-xlm.zip (342 kb)
3. Zip Archive activemath-without-xlm.zip (6 kb)
4. Text File single-request-with-xlm.log (72 kb)
5. Java Source File XlmModelImpl.java (33 kb)



 Description  « Hide
When using XLM over time, crucial operations of the application get slower and slower. This esp. affects viewing books and putting together a book.

With a newly created user, everything is fast as ever, but when there is more and more of XLM data for a user, things get noticeably slower over time.

The reason seems to be fetching mastery/knowledge values (for toc, page, groupings). This is with batch operations to get values for a list of ids already in place.

Need to dig deeper...



 All   Comments   Work Log   Change History   Version Control   FishEye   Crucible   Related Builds      Sort Order: Ascending order - Click to sort in descending order
Rafael Morales - 2006-04-27 11:04
Have you tried with a new created user at the same time as an old user? If so, is access to the new user model still fast or becomes slow as access to the old model?

Rafael


Carsten Ullrich - 2006-04-27 11:04
I tried it the following way: First, the old user. Slow. Then logout and creation of a new user. Fast.

Rafael Morales - 2006-04-27 14:04
Without deleting the previous user learner model?

Carsten Ullrich (JIRA) wrote:

> [ http://jira.activemath.org//browse/AMATH-746?page=comments#action_12416 ]
>
>Carsten Ullrich commented on AMATH-746:
>---------------------------------------
>
>I tried it the following way: First, the old user. Slow. Then logout and creation of a new user. Fast.
>
>
>
>>XLM slows down application over time
>>------------------------------------
>>
>> Key: AMATH-746
>> URL: http://jira.activemath.org//browse/AMATH-746
>> Project: ActiveMath
>> Type: Bug
>> Components: Scalability, XLM: Learner Model
>> Reporter: Stefan Winterstein
>> Fix For: 1.0-radar
>>
>>
>
>
>
>>When using XLM over time, crucial operations of the application get slower and slower. This esp. affects viewing books and putting together a book.
>>With a newly created user, everything is fast as ever, but when there is more and more of XLM data for a user, things get noticeably slower over time.
>>The reason seems to be fetching mastery/knowledge values (for toc, page, groupings). This is with batch operations to get values for a list of ids already in place.
>>Need to dig deeper...
>>
>>
>
>
>


Stefan Winterstein - 2006-04-27 16:04
Here's a logfile with timing data for requests, and DEBUG only enabled for XLM.

Stefan Winterstein - 2006-04-27 16:04

The log contains a test with 3 users: 1 old user with lots of XLM data, 1 fresh user who has only solved 1 exercise, and one new user.

The script for the log (you can use the times to find the relevant places in the log):

login winter 1 (old user)

15:45:04 go to Up&Down p1, p2
Main Menu
go to Examples p1, p2
logout,
login winter 2 (new user, 1 exercise solved):

15:47:17 go to Up&Down p1, p2
Main Menu
go to Examples p1, p2
logout,
create a new user winter3

15:48:48 go to Up&Down p1, p2, p1, p2
Main Menu
go to Examples p1, p2

The times for "action()" denote the time spent in the controller, without the view rendering.

We find that a page view takes 2-6 seconds, but independant of the user (this contradicts what I said in the description, that a new user is "fast as ever").

Doing the same test without XLM enabled for knowledge values (config: xlm.user-mastery.enabled=false") yields controller page view times <500ms.

More to come.


Stefan Winterstein - 2006-04-27 16:04
Here's the log when XLM is disabled for the knowledge values.

Rafael Morales - 2006-04-27 17:04
Why there are more times taken in the first run than in the second? I guess it is because in the second only one of the three examples (winter 1, 2 and 3) is run.

Some statistics:

$ stats -v n min max mean sd < with-xlm-times.dat
n = 78
min = 15
max = 6813
mean = 2310.53
sd = 1992.32

$ stats -v n min max mean sd < without-xlm-times.dat
n = 21
min = 15
max = 6171
mean = 604.19
sd = 1565.68

The difference is, on average, between 0.6 and 2.3 seconds, yet xLM is not much slower in the worse case. I am wondering what the results would be for without-xlm with more data.


Rafael Morales - 2006-04-27 17:04
When a request is made to xLM on a summary belief on content the following process take place:

content-id => top-content-ids => topic-ids => belief-descs => beliefs => single belief => summary.

My guess is that the most time consuming task in the process of recovering a belief on content from the learner model is content-id => top-content-ids, because it traverses the 'for' relations up to the top. We could cache this information, yet I think it would be cleaner if the cache is closer to the content; i.e. moving getTopContentIds from XlmModelImpl to somewhere outside xLM (e.g. ContentManager).

What do you think?


Carsten Ullrich - 2006-04-27 18:04
I can definitely reproduce the difference between a new learner and an old learner (without deleting the learner model). You can see it best if you use the first steps of course generation (without generating a course). Simply enter the menu, select any scenario, select the LeAM grouping. For a new learner, the next step (displaying the content) takes place in about 30 seconds, for an old learner it takes more than a minute. I can confirm it when switching between pages, too. In the book up&down, switching between pages (for both learner the same pages) with a new learner takes three seconds, with the old learner about six seconds. I tried this several times, to rule out effects of the cache.

Stefan Winterstein - 2006-04-28 09:04
Hi Rafael, basically I think caching is a good idea, but I' not sure that content-id => top-content-ids really is the bottleneck. For-Relations are instanciated lazyly in Item, but once they are fetched, they are stored in Item.

Why not store content-id -> summary once calculated?

For quick tests on what takes time, you can use org.activemath.webapp.util.StopWatch:

StopWatch sw = new StopWatch("Transformer");

sw.start("Building JDOM document");
...
sw.stop();

sw.start("Loading stylesheet");
...
sw.stop();

System.out.println(sw.prettyPrint());


Stefan Winterstein - 2006-04-28 09:04
Regarding Rafaels comment on statistics AMATH-746#action_12421: the logfile with XLM contains many other actions, whereas for the short one without XLM, I only did the test script.

For your statistics, you should only consider the requests to view.cmd.


Stefan Winterstein - 2006-04-28 10:04
Attaching single-request-with-xlm.log, which is an extract of a single page view from the big log, for further analysis.

Stefan Winterstein - 2006-04-28 10:04
Looking at single-request-with-xlm.log, we get 19x "Recovering summary beliefs":

2006-04-27 15:43:11,069 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,225 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,303 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,350 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,491 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,553 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,772 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,819 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:11,866 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,006 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,147 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,210 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,335 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,397 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,522 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,600 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,756 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,819 DEBUG model.XlmSession - Gone out (winter, 1)!
2006-04-27 15:43:12,960 DEBUG model.XlmSession - Gone out (winter, 1)!

Each recovering takes in the order 50-200ms (which sums up to our 2000ms request time).

Having a look at one recovery, we see that the bulk of time is spent between the last 2 lines, nowhere else:

2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Recovering belief on 'winter' on 'mbase://ActiveMath_Content_Examples/example_examples_theory/example_1'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Recovering top content for item 'mbase://ActiveMath_Content_Examples/example_examples_theory/example_1'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Revising 'mbase://ActiveMath_Content_Examples/example_examples_theory/example_1'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Revising 'mbase://ActiveMath_Content_Examples/example_examples_theory/def_example'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Revising 'mbase://ActiveMath_Content_Examples/example_examples_theory/example'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Top content for exercise 'mbase://ActiveMath_Content_Examples/example_examples_theory/example_1':[mbase://ActiveMath_Content_Examples/example_examples_theory/example]
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Recovering top content for item 'mbase://ActiveMath_Content_Examples/example_examples_theory/def_example'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Revising 'mbase://ActiveMath_Content_Examples/example_examples_theory/def_example'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Revising 'mbase://ActiveMath_Content_Examples/example_examples_theory/example'.
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Top content for exercise 'mbase://ActiveMath_Content_Examples/example_examples_theory/def_example':[mbase://ActiveMath_Content_Examples/example_examples_theory/example]
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Focus topics: []
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Related topics: []
2006-04-27 15:43:11,491 DEBUG model.XlmModelImpl - Belief( winter, mbase://ActiveMath_Content_Examples/example_examples_theory/example_1) = [: 0.0, I: 0.0, II: 0.0, III: 0.0, IV: 0.0, I,II: 0.0, II,III: 0.0, III,IV: 0.0, I,II,III: 0.0, II,III,IV: 0.0, I,II,III,IV: 1.0]
2006-04-27 15:43:11,553 DEBUG model.XlmSession - Gone out (winter, 1)!

So I think getTopContentIds() is not the culprit. Rafael?


Rafael Morales - 2006-04-28 16:04
OK. I have run a LeActiveMath with stopwatches inside getSummaryBelief, decomposing the time into

(1) recovering root of competency map,
(2) creating an xLM session,
(2) recovering domain topics associated to content,
(3) calculating the summary belief, and
(5) releasing the xLM session

I played with LeAM Calculus RecBook, chapter Items for differentiation. I did the first eleven exercises three times, including self report on affect. Then I logged out and logged in again, and open book again. I measured the time it took for LeActiveMath to display the book (on Items for differentiation, being the last chapter I had seen, and the times for recovering summary beliefs. This is a summary of what happened.

  • It took 5.5 minutes to display the book on that chapter, of which 3.7 where spent recovering summary beliefs.
  • The statistics for the times in getting summary beliefs are as follows (in milliseconds):

TASK MIN MAX MEAN STDEV SUM
Get compet. id 0 1 0.00943 0.09713 1
Get xLM session 0 43281 1245.76 6135.71 132051
Recover topics 0 375 4.66038 36.6915 494
Calc. belief 0 97 2.72642 11.0475 289
Releasing session 0 3202 851.547 636.269 90264
TOTAL 0 45009 2104.71 6145.8 223099

  • Basically, they say most of the time is consumed at creating/recovering and releasing xLM sessions, which are there for synchronising access to the learner models. It amounts to more than 99% of the time consumed at recovering a belief.
  • Notice that on average it takes xLM 2.1 seconds to produce a summary belief and 80% of the times it takes it less than 8.2 seconds.
  • It is very important to notice that the time consuming task has almost nothing to do with learner modelling and more to do with concurrent programming in Java and synchronised access to Hibernate. Hence, in the worst case, the problem could be fixed by anyone with good knowledge of Java and concurrent programming, no requiring learner modelling expertise.

Rafael Morales - 2006-04-28 16:04
Forgot to attache log file.

Rafael Morales - 2006-04-28 17:04
This is the log file with stopwatches on getSummaryBelief.

Rafael Morales - 2006-04-28 17:04
This is XlmModelImpl with the stopwatches in.

Stefan Winterstein - 2006-05-03 17:05
Thanks for your analysis, Rafael.

> It is very important to notice that the time consuming task has
> almost nothing to do with learner modelling and more to do with
> concurrent programming in Java and synchronised access to
> Hibernate. Hence, in the worst case, the problem could be fixed by
> anyone with good knowledge of Java and concurrent programming, no
> requiring learner modelling expertise.

This is true for almost all performance problems...


Stefan Winterstein - 2006-05-03 17:05
Maybe Hibernate caching would be a solution?

Carsten Ullrich - 2006-07-14 13:07
This still happens despite all optimisation efforts BUT ONLY with users I have been using for a long time. The differences are striking: opening the LeAM TOC for course generation happens instantly for a new users, but takes 30 seconds for my old user.

Stefan Winterstein - 2006-07-20 15:07
Yes, XLM is still behaving strangely. There can be long delays, up to a minute.
See also the deadlocks in AMATH-863 .

Things are starting to look bad for the evaluation.


Stefan Winterstein - 2006-11-09 16:11
Seems like we have to live with it.