Reduce load on runs db #105

JelleAalbers · 2017-05-28T07:05:21Z

I went through cax looking for code that heavily loads the runs db. Probably one of these three is most significant:

~~massive_ruciax, massive_tsmclient and~~ cax_tape_log_file basically does while True: fetch most of runs db:
https://github.com/XENON1T/cax/blob/8f2a99450cc79a3db7ccec0294ead7a6952122fd/cax/main.py#L1146

~~The last one literally does this, the others have a bit of a query restricting it. massive_cax doesn't have this problem, as it uses a projection for the relevant query:~~

cax/cax/main.py

Line 289 in 8f2a994

projection=['start', 'number','name',

Every correction downloads the latest correction doc every time we check a run. One doc in particular is huge -- the electron lifetime beast. Perhaps we should just query the version string first.

Every cax task gets the full run doc of every run it runs on:

cax/cax/task.py

Line 48 in 291ab95

self.run_doc = self.collection.find_one({'_id': id})

That includes runs it's just checking to e.g. see if corrections are up to date.
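The "query the version string first" idea for corrections could look roughly like the sketch below. It is only an illustration, not cax's actual code: the helper name `get_correction_if_changed`, the `cache` dict, and the field names `version` and `calculation_time` are assumptions, and `collection` is expected to behave like a pymongo `Collection`.

```python
def get_correction_if_changed(collection, cache):
    """Return the latest correction doc, downloading it only when its version changed.

    `collection` should behave like a pymongo Collection. The field names
    'version' and 'calculation_time' are assumptions, not taken from cax.
    """
    newest_first = [('calculation_time', -1)]

    # Cheap query: fetch only the version string (plus _id) of the newest doc.
    head = collection.find_one({}, projection=['version'], sort=newest_first)
    if head is None:
        return None

    # Expensive query: fetch the full doc only if the version is new to us.
    if cache.get('version') != head['version']:
        cache['doc'] = collection.find_one({'_id': head['_id']})
        cache['version'] = head['version']
    return cache['doc']
```

With something like this, a long-running process would pay for the large electron lifetime doc once per new version instead of once per run checked.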

Then there are some more minor things:

During '_process' there's also a full run doc query, when we're just checking if we need to process a run or not:

cax/cax/tasks/process.py

Line 68 in c3686e3

doc = collection.find_one(query) # Query DB

Massive-ruciax and massive-tsmclient fetch most of the docs in the full runs db when starting up (but only once)
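For the run doc lookups in `task.py` and `process.py`, a projection would keep the transfer small when a task only needs to check a few fields. A minimal sketch, assuming a pymongo-like `collection`; the helper names `fetch_run_fields` and `run_matches` are hypothetical and do not exist in cax.

```python
def fetch_run_fields(collection, run_id, fields):
    """Fetch only `fields` (plus _id) of one run doc instead of the whole document."""
    return collection.find_one({'_id': run_id}, projection=list(fields))


def run_matches(collection, query):
    """Check whether any run matches `query`, transferring only its _id."""
    return collection.find_one(query, projection=['_id']) is not None
```

A task that really needs the whole doc could still fetch it afterwards, once the cheap check says the run has to be worked on.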


XeBoris · 2017-05-29T13:17:27Z

Hello,

Regarding 1:
Taking this link as example:

cax/cax/main.py

Line 723 in 8f2a994

docs = list(collection.find(query,

Maybe I don't understand the concept fully, but line 716 selects the columns that are read from the database, and the lines above it (698-713) limit the run numbers according to the input to massive-ruciax. The idea is to pull the necessary information from the run database once, to know which runs need to be uploaded. I don't see how to reduce this request to the runDB.

JelleAalbers · 2017-05-29T13:24:10Z

Ah, you're right, I was mistaken due to the comment "Select specific data sets" with the selections variable. In fact what you call selections is a projection, so you don't get all the runs info. In massive_cax the projection is called projection (

cax/cax/main.py

Line 289 in 8f2a994

projection=['start', 'number','name',

), so when I didn't see it in the other caxes, I thought it wasn't there.

Still, I think cax_tape_log_file doesn't have a projection, here:

cax/cax/main.py

Line 1146 in 8f2a994

while True: # yeah yeah

while True:
    query = {}
    docs = list(collection.find(query))

that looks like a full db dump, right?
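If this loop were ever adjusted, one option would be to stream the cursor with a projection instead of building a list of full docs. A rough sketch only: `iter_runs` is a hypothetical helper, `collection` is assumed to behave like a pymongo `Collection`, and which fields cax_tape_log_file actually needs is not stated here.

```python
def iter_runs(collection, fields, query=None):
    """Yield run docs one at a time, restricted to `fields` (plus _id)."""
    # The cursor is consumed lazily, so the full result set is never
    # materialised in memory the way list(collection.find(query)) does.
    cursor = collection.find(query or {}, projection=list(fields))
    for doc in cursor:
        yield doc
```

Usage would be along the lines of `for doc in iter_runs(collection, ['number', 'name']): ...`.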

XeBoris · 2017-05-29T13:37:54Z

Yes, it does. This function is older and was written when I was less experienced with the selection process. On the other hand, this function is only called once or twice a week to test database entries. I can adjust it at some point in the future, but given the low usage I don't think it is necessary.

JelleAalbers · 2017-05-29T13:45:54Z

Ok, I thought it was a program that was left running (since it has a while True that only breaks if --run_once is passed). I guess you always run it with run_once then.

pdeperio mentioned this issue Jun 1, 2017

cax all-run mode skips corrections (and slow/stalls) #108

Open
