Question - Working with, exporting, and deleting LOTS of OAs and OA items?

JD_TXST · December 2, 2020, 5:26pm

We’ve had some performance issues with Eramba when working with large sets of data on the Online Assessments module (context below). I’d like to see if anyone could help me find answers to some more specific questions:

Are there any config tweaks that can be made to improve queries made against 13k+ OA Items?
Are there any ways to programmatically export answered OA questionnaires as user-readable PDFs (e.g., creating a PDF of an OA’s questions, answers, comments, findings, attachments, etc. generated for each OA that meets X criteria)?
Other than running a query using a filter and using the “export as CSV” options, are there any better ways to export large sets of OAs and OA items (including comments, findings, and attachments) for external processing or archival purposes?
Assuming we can get all the info we need copied out of Eramba for OAs, can OAs and related OA items be readily purged en masse to reduce query time and improve performance?
Is it correct to assume that other actions, such as running Eramba updates, is directly tied to the amount of data in the system’s database?

Context:

We have been using Eramba to distribute and track risk assessment questionnaires sent to system owners in our tenant*. As a result, we have hundreds of Online Assessments (OAs) with over 13,900 OA items (anywhere from 12 to 70 questions in each OA + comments/findings). To meet some reporting needs, we have to export detailed CSV files of both OAs and OA Items to do more detailed analysis on the data to generate some metrics.

These activities have led to VERY long query times when trying to search for and export data; times are especially long when querying against OA items. For example, it took ~17 minutes to return 13,972 OA items this morning. The amount of data in the system also appears to correlate to how long it takes updates to run.

I’ve watched our production instance’s CPU usage when a query is made, and I’ve seen that only 1 of the 8 CPUs allocated to the VM has ~91% usage attributable to mysql. The DB is installed on the same host with more than adequate disk I/O and 16GB RAM. Ordinary usage, even with multiple simulations users in the OA module, is okay. Running similar queries against much smaller data sets in our non-production instance is much faster despite it having fewer resource (4 vCPUs, 4GB RAM, similar disk I/O capabilities) and being otherwise configured the same (same OS, packages, and just about the same in-app settings).

*My org’s current requirements have us distribute recurring risk assessment questionnaires to internal stakeholders instead of having security team members be the ones to shoulder all of the assessment work. The process in general isn’t great, but it largely comes from requirements out of our control. We DO want to explore other ways to support the “business requirement” of running recurring assessments, but that’s a bit out of scope for this thread and short-term issues we’re facing.

eramba · December 2, 2020, 5:40pm

not frontend , please contact support and send them the output of System / Settings / Error Log / Usage Diagnostics + Download All Log Files

this is a starting point on debugging what could be done on the backend (code) … we might need a Zoom call so bear that in mind. no need to explain the whole thing again, just reference this post (copy the url) on the ticket.

are you after a one report that shows all that? can probably be done with item reports … except when you mention “the criteria” bit, item reports must be “clicked” in order to obtain them…

nope , perhaps if you help us understand what is to be improved on that export other than the obvious slowness you have?

i know this wont sound very logic, but the slowness is not (or better: should not) be related to that volume, the problem is somewhere else and that somewhere else will be hopefully be fully improved on the new cake migration project: Feature - CakePHP upgrade (massive, complex and uncertain) … so while “deleting” things will do the trick for you is not a tool we want to use to deal with this type of problems.

updates on the database will be slower because each update does a full backup of the database, yours must be large and that is why you feel that slowness. the actual update process does not care how much data you have unless a data migration takes place on your large tables … but its very unlikely.

yes - makes sense mysql is the bottleneck because of the amount of short queries needed to build the pages you see in eramba when you click things on that section.

JD_TXST · December 2, 2020, 6:30pm

Wow, thanks for the quick and thorough reply!

I’ll see what I can do to export logs and get them submitted. The address support@eramba.com is where they’ll need to go, correct?

I’m mostly looking for a way to export a user-readable/sharable PDF report of an OA with the listed criteria for batches of OAs. Or at least some more guidance on generating reports through the Online Assessments feature set to export the above criteria for each OA. Having more advanced stats options features (e.g., for each question in X templates, what’s the average score; what questionnaires have questions with answers or sum totals below Y). Admittedly, the report builder hasn’t worked well for me in the past, so to get reports and more detailed analyses, we’ve taken to exporting CSVs, doing some enrichment locally in excel, and uploading them to other systems to run more in-depth queries. Since we’re working with more than a dozen lengthy questionnaire templates, having to build reports for each questionnaire template would be challenging.

Somewhat related being unable to modify an OA’s template after it’s created contributes to the number of OAs we have - having to make a new OA anytime there’s a typo fix or new question that needs to be added means we can’t rely on the recurrence functionality that’d reduce the count of OAs and allow our users to have previous answers auto-imported. I understand the logic behind locking questionnaires for integrity purposes, but it would be nice to be able to administratively disable that on a per-tenant basis.

To elaborate further,

Beyond slowness, I’ve had issues using the “export all pages” when viewing OAs returned through filters. The export all pages feature returns a mostly blank CSV instead of returned filter data. Workaround is showing 100 OAs per page, exporting each page separately, and using local software similar to “stitch” the CSVs together.
For filtering/querying/exporting OA Items, being able to natively get more data on each OA item’s parent OA would be stellar (including things like OA ID, Title, Date, OA Score, Tags).
For OA items exports, it’d also be able to get the points earned from an answer (e.g., answer of “A” on one question is 0 points, “C” on another is 3 points).
More generally, it seems some of the filter options for OAs and OA items don’t display all the data that could be accessible. I suspect some of it may be tied to the questionnaire template used for the OA, which causes some of it to not be currently available in existing filter options.
For full disclosure, I haven’t yet explored some filters’ option to “Store full filter results in a daily log”

I’d also rather not purge data to try to reduce performance issues either. Knowing that a fix should be on the way soon and that the severe performance issue shouldn’t be related to data volume helps. Being able to remove OAs in large groups might be a nice-to-have later on if/when we would want to either clean up some groups of “test” OAs created for demos and workshops or for other, more reasonable needs like resetting a non-production environment.

This is pretty anecdotal, but I’ve seen updates complete on our mostly empty testing install finish in under a few minutes, while updates to prod have taken 10+ minutes on recent releases (enough to start making me sweat). This is a pretty limited issue though, so I’m not going to worry about it as much.

Between these replies and me not being a developer, I’m tempted to ask: Would future updates to Eramba possibly split up queries to mysql to allow for something similar to multi-threading?

eramba · December 3, 2020, 3:41pm

support@eramba.ORG

yes - reports in particular charts are hardcoded, so anything other than what is there must be done using CSV and the Spreadsheets. I dont see this changing (having dynamically generated reports) anytime soon.

yes, like you said we block edition on questionnaires which have been used, if you delete the associated OAs then you can edit it (i guess) … duplicating it and adjusting is the way to go. this does not cause db slowness for sure…well…sure you know what i mean.

this is a bug … i suggest that on the email you did to support you schedule a call to review this in particular. @sam will look into it.

this is doable, int ref: https://github.com/eramba/eramba_v2/issues/2830

i see these fields on the filter:

am i missing something?

capitals needed here …

its very long , 10 minutes means php will hit the environmental variable timeout … we’ll see the issue when we meet on Zoom

thanks !