We have experienced a number of issues that the Member Service and support teams have been working on over the last week and a half. Whilst we have implemented some improvements, we are still manually monitoring and processing files to ensure that SIMs are distributed and can be activated.
We made a deployment two weeks ago, on Thursday 19th, that changed the state in which we send out our SIMs. This meant that the SIMs would register on the network when they were put in the handset, rather than only after they had been topped up at giffgaff.com. This was so Members could check the network coverage before paying.
Sending SIMs in this state meant we had to set up a number of new interactions between our Prepay billing system and the network. When the SIM was ordered, we added a service that would divert any outgoing call made by the SIM, before top up, to a giffgaff announcement. This would give the Member more information about giffgaff. When the SIM was topped up, we would remove that service.
It was in this area that we eventually found the main issue: the new interactions, combined with an increase in demand for giffgaff SIMs, meant that the volume of traffic on our internal systems increased by about 250% in the last two weeks, and the build of our system at that point struggled to cope.
We saw the firewall across this interface fail at about midday on Tuesday 24th, and whilst we brought the interface back up later that day, the queue had continued to grow. It was only on Wednesday 25th that we identified that the queues were increasing due to the poor performance of the interface, and at that point we started to proactively manage the backlog.
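To illustrate why the queue kept growing even after the interface was back up, here is a minimal sketch of a queue where orders arrive faster than the interface can process them. All the figures and names are made up for the example and are not measurements from our systems.

```python
def backlog_over_time(arrivals_per_hour, capacity_per_hour, hours, start_backlog=0):
    """Return the queue length at the end of each hour.

    Each hour, new orders join the queue and the interface clears as many
    as its capacity allows. The queue can never go below zero.
    """
    backlog = start_backlog
    history = []
    for _ in range(hours):
        backlog = max(0, backlog + arrivals_per_hour - capacity_per_hour)
        history.append(backlog)
    return history


# Hypothetical figures: a slow interface clears fewer orders than arrive.
degraded = backlog_over_time(arrivals_per_hour=100, capacity_per_hour=80, hours=3)

# Hypothetical figures: a healthy interface works through an existing backlog.
healthy = backlog_over_time(
    arrivals_per_hour=100, capacity_per_hour=150, hours=3, start_backlog=60
)

print(degraded)  # [20, 40, 60]
print(healthy)   # [10, 0, 0]
```

The point of the sketch is only that bringing the interface back up is not enough on its own: while it processes fewer orders than arrive, the queue keeps getting longer.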
The main problem we face is that this interface processes all orders that make changes to the network. These orders include SIM Orders, port ins, adding bars and activations (top ups). All of these orders have to pass through this interface - and because of the new interactions there were more individual transactions per order.
On Thursday 26th we implemented a change to our code to reduce the number of transactions through the interface (by not adding and then removing the divert service mentioned above). On top of this, we have been running additional monitoring and manual batch processes across the weekend and into this week.
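As a rough sketch of why dropping the divert service reduces the load, the example below counts the transactions one SIM generates across its order and top up, with and without the divert steps. The step names and counts are hypothetical and only illustrate the idea; the real orders involve different and more numerous transactions.

```python
# Hypothetical step names, for illustration only.
STEPS_WITH_DIVERT = {
    "sim_order": ["register_sim", "add_divert"],
    "top_up": ["remove_divert", "activate_service"],
}

STEPS_WITHOUT_DIVERT = {
    "sim_order": ["register_sim"],
    "top_up": ["activate_service"],
}


def transactions_per_sim(steps):
    """Count every transaction a single SIM sends through the interface."""
    return sum(len(transactions) for transactions in steps.values())


before = transactions_per_sim(STEPS_WITH_DIVERT)
after = transactions_per_sim(STEPS_WITHOUT_DIVERT)

print(before, after)  # 4 2
```

Because every order type passes through the same interface, removing transactions from SIM orders and top ups also frees up capacity for port ins and bars.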
Clearing the backlog in a batch process has also meant that some of the usual steps in the processes have been upset, so the impact on Members has varied:
- Delays to port ins
- Members not being provisioned on the network or delays in completion of top up / activation
- Members getting diverted to pre-recorded messages
- Members able to use some elements of the service but not others (e.g. see their balance but not have goodybags, or be able to use data but not make calls)
We now plan to make some changes to this key interface, and hope that these can be built, tested and deployed in the next 10 days. We are working closely with our suppliers to make that happen. But we still have a backlog of SIM orders that are due to go out, and we are managing these in detail on calls 4 times a day.
I am extremely sorry for the impact on our Members. We are doing everything we can to put long term fixes in place, as well as giving you the best service we can.
I do have more details on this and some other issues that I am happy to share in the next couple of days, if people would like.