FlexiCapture Cloud (EU): Issue with processing server
Incident Report for ABBYY
Postmortem

Dear Customer,

On July 26, 2024, ABBYY FlexiCapture Cloud EU experienced severe service interruptions. We are pleased to confirm that the issues have been mitigated and the service is now fully functional again. Please review the following incident Root Cause Analysis (RCA) information:

Cloud instance

  • Europe

Incident timeframe

  • July 26, 2024

    • 03:18 – 16:30 UTC

Incident status

  • Fully mitigated

Customer impact

  • Document import, processing, and export were interrupted or experienced significant delays.
  • Most processing tasks created via the REST API timed out during the incident.
  • The Web Verification Station could not open verification tasks and displayed the message “No License. Cannot get license ticket”.
  • The Remote Verification Station could not open verification tasks and displayed the message “Failed to get licensing ticket”.
  • The Administration and Monitoring Console displayed the error message “License has expired or has been deleted”.

Incident history

  • 03:18 UTC: The Processing Server stopped after multiple unsuccessful attempts to restart and automatically resume task processing.
  • 03:22 UTC: The service health monitoring system was triggered, notifying the on-duty team of a service interruption.
  • 03:25 UTC: The team began identifying the root cause of the service interruption according to internal playbooks.
  • 06:00 UTC: After multiple unsuccessful attempts to recover the service, the team started reconfiguring the system to switch to the backup Processing Server.
  • 06:26 UTC: The switch to the backup Processing Server was completed, heartbeats were restored, and the service returned to normal operation.
  • 06:37 UTC: Task processing stopped again, triggering another alert from the service health monitoring system.
  • 08:00 UTC: The team discovered that the Processing Server was stopping due to the licensing service's inability to access the Azure storage holding customer licenses. A shared backup license was used to restore task processing for some customer tenants.
  • 12:00 UTC: Further analysis confirmed an issue with the Azure storage holding customer licenses. The team created new storage and began transferring customer licenses to it, restoring service operations for the remaining customer tenants.
  • 16:30 UTC: The license transfer was completed for all customer tenants, fully restoring service operations. The incident was considered fully mitigated.
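
For reference, the durations implied by the timeline above can be worked out as follows. This is a minimal sketch that uses only the timestamps listed in the incident history; the event labels and helper names are illustrative assumptions, not terms from ABBYY tooling.

```python
from datetime import datetime, timedelta

# Timestamps (UTC) copied from the incident history; labels are illustrative.
DAY = "2024-07-26"
EVENTS = {
    "processing_stopped": "03:18",
    "alert_raised": "03:22",
    "backup_server_active": "06:26",
    "processing_stopped_again": "06:37",
    "shared_backup_license_applied": "08:00",
    "fully_mitigated": "16:30",
}


def at(label: str) -> datetime:
    """Parse the timestamp recorded for an event label."""
    return datetime.strptime(f"{DAY} {EVENTS[label]}", "%Y-%m-%d %H:%M")


def elapsed(start: str, end: str) -> timedelta:
    """Time between two events on the incident day."""
    return at(end) - at(start)


total = elapsed("processing_stopped", "fully_mitigated")
detection = elapsed("processing_stopped", "alert_raised")
backup_uptime = elapsed("backup_server_active", "processing_stopped_again")

print("Total incident duration:", total)          # 13:12:00
print("Time to first alert:", detection)          # 0:04:00
print("Backup server ran for:", backup_uptime)    # 0:11:00
```

The 11-minute run of the backup Processing Server is consistent with the root cause below: the server stopped again after several minutes because the license storage was still unreachable.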

Root cause

  • An outage of the Azure storage holding customer licenses led to the Processing Server stopping after several minutes of operation.

Mitigation measures

  • New Azure storage was created, and customer licenses were transferred to this new storage, restoring service operations.

Prevention measures

  • Extend the logging capabilities of the licensing service to enable immediate identification of license access issues, as outlined in internal playbooks.
  • Enhance internal infrastructure to facilitate an immediate switch to a storage-independent backup license in the event of future outages of the Azure storage used for holding customer licenses.
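
The two prevention measures above can be illustrated with a minimal sketch. It is not ABBYY's implementation: the function `get_license`, the exception `LicenseStoreUnavailable`, and the loader callbacks are hypothetical names chosen for this example. It only shows the general pattern of logging a license access failure explicitly and falling back to a storage-independent backup license.

```python
from typing import Callable


class LicenseStoreUnavailable(Exception):
    """Raised by a (hypothetical) license loader when its storage is unreachable."""


def get_license(
    tenant_id: str,
    load_from_storage: Callable[[str], str],
    backup_license: str,
    log: Callable[[str], None] = print,
) -> str:
    """Return the tenant's license, or the backup license if storage is down."""
    try:
        return load_from_storage(tenant_id)
    except LicenseStoreUnavailable as exc:
        # Log the access failure explicitly so it can be identified right away.
        log(f"license storage unavailable for tenant {tenant_id}: {exc}")
        # The backup license is passed in directly, so it does not depend on storage.
        return backup_license


def failing_loader(tenant_id: str) -> str:
    """Simulates an outage of the license storage."""
    raise LicenseStoreUnavailable("storage timeout")


messages: list[str] = []
result = get_license("tenant-a", failing_loader, "BACKUP-LICENSE", messages.append)
healthy = get_license("tenant-b", lambda t: f"LICENSE-{t}", "BACKUP-LICENSE", messages.append)
```

In this sketch the outage produces exactly one log entry and processing continues on the backup license, while a healthy storage lookup is returned unchanged.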

We apologize for any inconvenience and, most of all, for the potential impact on your business. We are committed to preventing a recurrence of this issue and will continue to improve our infrastructure and monitoring solutions.

Thank you for using ABBYY FlexiCapture Cloud!

If you have any questions or feedback, please feel free to contact our support team via the Help Center portal.

Yours faithfully,
ABBYY FlexiCapture Cloud Team

Posted Aug 28, 2024 - 07:55 UTC

Resolved
This incident has been resolved.
Posted Jul 26, 2024 - 16:37 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 26, 2024 - 14:07 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 26, 2024 - 13:37 UTC
Update
Processing was partially restored. Full recovery is expected soon.
Posted Jul 26, 2024 - 12:59 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 26, 2024 - 11:09 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 26, 2024 - 07:20 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 26, 2024 - 07:19 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Jul 26, 2024 - 07:10 UTC
This incident affected: FlexiCapture Cloud EU and FlexiCapture Cloud EU2.