Azure Update – 4th July 2025
July 5, 2025Lesson Learned #525: Tracking Command Timeouts in Azure SQL: Beyond Query Store with Extended Events
July 5, 2025A few days ago, we were working on a support case where our customer was intermittently experiencing command timeouts. What made the case interesting was that queries which usually completed in under one second suddenly started taking more than 10 seconds to execute.
Since the application — developed in Python using the ODBC Driver 18 for SQL Server — had a command timeout set to 5 seconds, the following error was triggered every time the threshold was exceeded: Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: (‘HYT00’, ‘[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)’)
The application had built-in retry logic, dynamically increasing the timeout in each of the three retry attempts, to allow time for the query to complete and to log enough data for post-error analysis.
Example logs from the retry logic:
(RunCommandTimeout) – Thread: 39808 – Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: (‘HYT00’, ‘[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)’)
INFO:root:Connecting to the DB jmjuradotestdb1 – Thread id 39808 – (Attempt 1/3)
INFO:root:Connected to the Database in jmjuradotestdb1 – Thread id 39808 – 0.0445 seconds —
(RunCommandTimeout) – Thread: 39808 – Error executing command, retrying in 9 seconds. Attempt 2 of 3 with new timeout 9. Error: (‘HYT00’, ‘[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)’)
INFO:root:Connecting to the DB jmjuradotestdb1 – Thread id 39808 – (Attempt 1/3)
INFO:root:Connected to the Database in jmjuradotestdb1 – Thread id 39808 – 0.0532 seconds —
(RunCommandTimeout) – Thread: 39808 – Error executing command, retrying in 13 seconds. Attempt 3 of 3 with new timeout 13. Error: (‘HYT00’, ‘[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)’)
(RunCommandTimeout) – Thread: 39808 – Loop:2/5 Execution Time: 9.7537 seconds
My first prompt using SSMS Copilot was this “Review the queries that experienced a command timeout or were aborted in the last 30 minutes. Include query text, queryid, duration, and the reason and code for the abort if available.” and I got the following results. So, all points that the query 216 got command timeouts.
My next question, was, for query ID 216, show the number of total executions reporting that is 28 executions. The response showed 28 executions, but this number didn’t match the number of aborted and non-aborted executions observed in the application logs, why this difference?
Checking the table sys.query_store_runtime_stats I found 10 rows all having execution_type = 3, and total executions 28. So, that’s mean that Query Store aggregates query execution data over a fixed interval. So, the execution_type is an indicator that at least an execution during this runtime interval was aborted. So, at least several of them were aborted and other not.
To obtain a more granular and accurate picture, I created an Extended Events session to capture these events using ring_buffer target.
CREATE EVENT SESSION [CommandAborted]
ON DATABASE
ADD EVENT sqlserver.attention(
ACTION (
sqlserver.client_app_name,
sqlserver.client_hostname,
sqlserver.username,
sqlserver.database_name,
sqlserver.sql_text
)
)
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY = 4096KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS);
GO
ALTER EVENT SESSION [CommandAborted] ON DATABASE STATE = START;
after reproducing the command timeout scenario again, I was able to see only the aborted executions. So, in this case, 28 executions were executed and 7 executions were aborted.
WITH RingBufferXML AS (
SELECT
CAST(t.target_data AS XML) AS target_data
FROM sys.dm_xe_database_session_targets t
JOIN sys.dm_xe_database_sessions s
ON t.event_session_address = s.address
WHERE t.target_name = ‘ring_buffer’
AND s.name = ‘CommandAborted’
)
SELECT
x.value(‘@name’, ‘varchar(50)’) AS event_name,
x.value(‘@timestamp’, ‘datetime2’) AS event_time,
x.value(‘(action[@name=”client_app_name”]/value)[1]’, ‘nvarchar(256)’) AS client_app_name,
x.value(‘(action[@name=”sql_text”]/value)[1]’, ‘nvarchar(max)’) AS sql_text,
x.value(‘(data[@name=”duration”]/value)[1]’, ‘bigint’) AS duration_microseconds,
CAST(x.value(‘(data[@name=”duration”]/value)[1]’, ‘bigint’) / 1000000.0 AS decimal(10,3)) AS duration_seconds
FROM RingBufferXML
CROSS APPLY target_data.nodes(‘//event’) AS tab(x)
WHERE x.value(‘@name’, ‘varchar(50)’) = ‘attention’
and x.value(‘(action[@name=”client_app_name”]/value)[1]’, ‘nvarchar(256)’) = ‘TEST-DataCon’
ORDER BY event_time DESC;