Support - Awareness Program Issue (release 47)

Hi All,

We have finished the pilot program for our security awareness training. The issues we had were all sorted out, so we have rolled out the training to all our users.

In order to do this we increased the number of people in the security group to include all staff. Since we did this, the awareness program no longer works.

Here are the symptoms we are now experiencing:

  • Emails are being sent to users

  • The user accepts the invite and logs in and is presented with the following screen

  • The total number of people in the training group is not being updated: it is still at 16, although there are now a lot more than 16 people in the security group

  • Cron job ran with no issues last night

I have done some basic troubleshooting so far, as follows:

  • Test Getting Members of a Group returns all the members of the group which are supposed to be there

  • Test Getting List of Groups works as it should

  • At one point in debug mode we got the following:

Database Error

Error: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

SQL Query: UPDATE eramba_enterprise.awareness_programs SET workflow_status = 4, modified = ‘2017-02-07 09:42:56’ WHERE = ‘8’

Notice: If you want to customize this error message, create app/View/Errors/pdo_error.ctp
Stack Trace

CORE/Cake/Model/Datasource/DboSource.php line 468 → PDOStatement->execute(array)
CORE/Cake/Model/Datasource/DboSource.php line 434 → DboSource->_execute(string, array)
CORE/Cake/Model/Datasource/Database/Mysql.php line 415 → DboSource->execute(string)
CORE/Cake/Model/Model.php line 1924 → Mysql->update(AwarenessProgram, array, array)
CORE/Cake/Model/Model.php line 1758 → Model->_doSave(array, array)
APP/Controller/WorkflowsController.php line 1054 → Model->save(array, boolean)
APP/Controller/WorkflowsController.php line 1361 → WorkflowsController->saveData(string, string, integer)
APP/Controller/WorkflowsController.php line 1130 → WorkflowsController->beforeRequestApproval(string, string, string)
APP/Controller/WorkflowsController.php line 1094 → WorkflowsController->beforeRequestValidation(string, string)
[internal function] → WorkflowsController->requestValidation(string, string)
CORE/Cake/Controller/Controller.php line 491 → ReflectionMethod->invokeArgs(WorkflowsController, array)
CORE/Cake/Routing/Dispatcher.php line 193 → Controller->invokeAction(CakeRequest)
CORE/Cake/Routing/Dispatcher.php line 167 → Dispatcher->_invoke(WorkflowsController, CakeRequest)
APP/webroot/index.php line 110 → Dispatcher->dispatch(CakeRequest, CakeResponse)

Starting and stopping the program makes no difference either, apart from sending multiple mails out to people. This in itself causes issues, as Outlook sees the volume of mails coming in and marks them as spam.

Could someone get in touch with me please, as this is becoming a bit of a disaster.

A bit of an update here - we have been working with John in the background on this. Some conclusions (@john.oneill let me know if this is right):

  • Timeouts played a role here: a reverse proxy timed out the cron, which was taking a long time due to the high number of emails required.
  • PHP execution timeouts suffered the same fate: they were too short (60 seconds) for this long cron process. Remember to set this to at least 300 seconds in your php.ini (this is also a health check in eramba).
  • Some accounts in the AD did not have email addresses, and when eramba got an account without one … kaput. We have fixed that now and will be testing tomorrow, Thursday 9th Feb.
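
The last point suggests a defensive pattern: set aside directory accounts that have no email address instead of letting one bad record stop the whole run. Below is a minimal Python sketch of that idea. It is not eramba's actual code; the `Account` type, the `split_by_email` helper and the sample logins are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    """Hypothetical stand-in for a directory (AD) account."""
    login: str
    email: Optional[str]


def split_by_email(accounts):
    """Separate accounts that can receive mail from those that cannot."""
    deliverable, skipped = [], []
    for account in accounts:
        # Treat None, empty and whitespace-only addresses as missing.
        if account.email and account.email.strip():
            deliverable.append(account)
        else:
            skipped.append(account)
    return deliverable, skipped


# Made-up sample data: one normal user and two accounts without an address.
accounts = [
    Account("user.one", "user.one@example.com"),
    Account("svc.backup", None),
    Account("user.two", "   "),
]
deliverable, skipped = split_by_email(accounts)
print("will be mailed:", [a.login for a in deliverable])
print("skipped, no email:", [a.login for a in skipped])
```

Reporting the skipped accounts, rather than silently dropping them, makes it easier to clean up the directory afterwards.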

We are also looking at a separate cron to handle emails, to make sure the email process does not mess up the rest of the cron. An email takes some 5 to 10 seconds to send (if using an external server), so 150 emails would take up to 25 minutes! Sending them all out at once won’t scale well. For this reason we plan to add an hourly cron that will flush a mail queue. We’ll tell you more about it once it is done.
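
To make the arithmetic concrete, here is a small Python sketch using the figures quoted in this thread (150 emails, 5 to 10 seconds each, and the 60 and 300 second PHP limits). The function names are made up for illustration, and the sketch assumes emails are sent strictly one after another.

```python
def total_send_minutes(email_count, seconds_per_email):
    """Minutes needed to send all emails one after another."""
    return email_count * seconds_per_email / 60


def emails_within(timeout_seconds, seconds_per_email):
    """How many sequential sends fit inside a given timeout."""
    return timeout_seconds // seconds_per_email


best = total_send_minutes(150, 5)     # fastest case: 5 s per email
worst = total_send_minutes(150, 10)   # slowest case: 10 s per email
print(f"150 emails take {best:.1f} to {worst:.1f} minutes")

# Even with the recommended 300 second limit, only a few dozen emails fit.
for timeout in (60, 300):
    print(timeout, "s limit fits", emails_within(timeout, 10), "emails at 10 s each")
```

Under these assumptions a single request cannot send 150 emails even with the longer timeout, which is why batching the sends looks attractive.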

Thanks John for your help debugging all this!

Update here - we have completed a fix for this item and tested with our AD (thousands of groups and users). There will be a few changes:

  • All emails eramba sends out now go to a queue
  • This queue is flushed by your daily cron and by an additional hourly cron we have configured. You need to add a new cron entry on your system (we’ll give details once we make this public).
  • Each cron call sends out up to 15 emails, so the hourly cron alone delivers up to 24 x 15 = 360 emails per day. We will raise the limit from 15 to 50 in a few weeks if all goes well.
  • We created a page under settings that lists the emails and shows whether each one was sent or not (due to wrong configs, etc.).
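
The queue behaviour described above can be sketched roughly as follows. This is a hedged Python illustration, not the eramba implementation: `flush_queue`, `fake_send` and the sample addresses are hypothetical, and the sketch assumes a failed email is logged and not retried, which the post does not specify.

```python
from collections import deque

BATCH_LIMIT = 15  # per-run cap mentioned in the post


def flush_queue(queue, send, limit=BATCH_LIMIT):
    """Send up to `limit` queued emails; return a log of (recipient, ok)."""
    log = []
    for _ in range(min(limit, len(queue))):
        recipient = queue.popleft()
        try:
            send(recipient)
            log.append((recipient, True))
        except Exception:
            # Assumption: failures are recorded for a status page, not retried.
            log.append((recipient, False))
    return log


def fake_send(recipient):
    """Hypothetical sender that fails for one address."""
    if recipient == "user7@example.com":
        raise RuntimeError("delivery failed")


queue = deque(f"user{i}@example.com" for i in range(40))
first_run = flush_queue(queue, fake_send)
print(len(first_run), "processed,", len(queue), "still queued")
print("failed:", [r for r, ok in first_run if not ok])
print("hourly capacity per day:", 24 * BATCH_LIMIT)
```

Capping each run keeps every cron call short enough to finish well inside the timeouts discussed earlier, at the cost of a longer delivery time for large groups.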

We are making this update public next Tuesday.


Just wanted to share that today we concluded testing and it all works well. Our testing is attached. We’ll update the documentation on queues tonight and package the fix; it will be available on Tuesday.

DAY 1:

1- running an awareness, checking emails are sent and not duplicated (sent mails with hourly, step 1.1.sql)
2- records make sense (filters are ok, they list reminders are being sent, queue is also ok)
3- take one user and complete training (completed training with goran.galic)
4- check filters make sense for that user (makes sense, goran has two trainings…demo and non demo)

Update clock to one day ahead: (20.2)
DAY 2:

5- run daily cron (dump is step2.sql)
6- expect dashboard updates correctly
7- mails for reminders should be on the queue (they get on the queue and also sent -up to 15 users-)
8- run hourly cron, no duplicated emails … emails should go out (no emails left, the daily cron sent them already)
9- take a user and complete training (ava.bailey completed it) … dump step2.8.sql

Update clock to one day ahead: (21.2)
DAY 3:

10- run cron daily again (dump step 3.10.sql)
11- expect dashboard update again (now two users have completed trainings) -> worked well
12- expect reminders to go out again to everyone except the two compliant users // I spoke to Martin, the email reminder does not go out here because eramba counts the invite as one mail … so invite + reminder (of day 2) = 2 …
13- include on the training 1 user in AD, remove 1 user from the group (removed ava.grant , included carol.gibson)

Update clock to one day ahead: (22.2)
DAY 4:

14- run cron daily again (dump dev4.14.sql)
15- expect dashboard update again (reflecting the user added, the user deleted and the two compliant users) -> was ok
16- new user should be sent an email to do the awareness (carol.gibson got the email so is ok)
17- no reminders should be sent (no reminders sent , and is ok)

Update clock to one day ahead: (23.2)
DAY 5:

18- run daily cron (dump dev5.18.sql)
19- emails for new awareness training should be sent to everyone (not ok … no emails on the queue)
20- user included in day3, step 13 must receive a reminder (worked well)

Update clock to one day ahead: (24.2)
DAY 6:

21- (dump dev6.21.sql)
22- sends email to everyone again (everyone was sent emails) -> worked well
23- I’ll make one training, so 1 should be compliant (not two as we had so far on the statistics) -> worked well, i did Ava.Rutherford
24- makes everyone not compliant -> not ok, stats still shows 2 compliant

Update clock to one day ahead: (25.2)
DAY 7:

25- run daily cron
26- reminders sent
27- stats updated to one compliant user