Reverse Engineering an Amazon Prime Error
I like to screenshot software error messages and speculate about what the cause or bug might have been. It lets me put myself in the shoes of the engineer who wrote the software and think on their behalf. Sometimes you can tell a lot from an error code. I’m wrong most of the time in my guesses, but I find the process of troubleshooting enjoyable.
In this post I attempt to reverse engineer an error message (DECRYPTION_FAILURE) I got while watching a TV show on Amazon Prime Video. Please note that the goal of this post is learning and curiosity; bugs exist in all software.
How much can an error code tell us?
This message popped up while I was watching Three Pines, a crime TV show, on Prime Video using the Amazon Prime Video app on an Amazon Fire Stick. Native app on native hardware. This is the message:
Error code: DECRYPTION_FAILURE
There’s a problem with this video which might be solved automatically by restarting your device. If this problem continues, please contact Amazon Customer Service at https://www.amazon.com/ww-av-android-contactus.
From a quick Google search, this error seems to be common.
It is interesting that the error message includes the resolution: restart your device and try again. This tells us a few things:
It is a known issue, hence the restart workaround included in the message.
It is not consistently reproducible, otherwise it would likely have been fixed.
It is probably a client-side state bug that goes away with a device restart.
Assuming the error code is accurate and not a catch-all, I take it to mean that the failure happened in TLS while decrypting the video stream.
But how exactly could this happen? How can a valid session key no longer work? Let me explore this.
Cost of Connection Establishment
The Fire Stick is an affordable device with limited hardware that lets you stream content. The limited resources on the stick mean the software must be optimized to run smoothly, so Amazon and other developers would try to minimize CPU and memory usage as much as possible.
With that knowledge (and the error message) in mind, one of the expensive and repetitive tasks the stick must do is establish an encrypted connection to Amazon's servers to stream data. This is mostly TLS over TCP; the layer 7 protocol doesn’t matter much, though I suspect it is HTTP.
Establishing a connection has a cost: the TCP three-way handshake for the connection and the TLS handshake for encryption. TLS 1.3 saves a round trip over TLS 1.2, so I would assume Amazon uses it. It is also critical to pick symmetric ciphers with low CPU consumption. Authentication also happens as part of the TLS handshake, and it transfers a large payload containing the server certificate, which is not compressed.
Because the certificate chain is not compressed, the server's handshake response is large and takes its toll on the client, which has to download and verify it. RFC 8879 defines certificate compression for TLS 1.3; I talked about it more in a backend engineering episode here if you are interested to learn more.
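To put the round-trip cost described above into rough numbers, here is a minimal sketch. It only counts network round trips before the client can send its first encrypted request and ignores CPU time, certificate size, and packet loss. The function name and the 50 ms round-trip time are made up for illustration.

```python
# Round trips needed by each handshake before application data can flow.
TCP_ROUND_TRIPS = 1  # SYN / SYN-ACK, the ACK rides along with the next packet
TLS_ROUND_TRIPS = {
    "TLS 1.2 full handshake": 2,
    "TLS 1.3 full handshake": 1,
}


def setup_latency_ms(rtt_ms: float, tls_round_trips: int) -> float:
    """Network time spent on TCP + TLS setup for a given round-trip time."""
    return (TCP_ROUND_TRIPS + tls_round_trips) * rtt_ms


def compare(rtt_ms: float) -> dict:
    """Return the setup latency for every handshake variant in the table."""
    return {
        name: setup_latency_ms(rtt_ms, trips)
        for name, trips in TLS_ROUND_TRIPS.items()
    }


if __name__ == "__main__":
    for name, latency in compare(rtt_ms=50.0).items():
        print(f"{name}: {latency} ms before the first request")
```

With an assumed 50 ms round trip, that is 150 ms for TLS 1.2 versus 100 ms for TLS 1.3, and the cost is paid every time a connection has to be re-established.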
The TLS pre-shared-key extension lets the server issue a ticket that the client can present in future TLS sessions. Future handshakes that present the ticket skip the authentication phase because the new session is tied to the original one. This way, even if the TCP connection gets dropped (a very common thing on the Internet), the Fire Stick can use the pre-shared key to skip authentication and save precious CPU cycles. We still get a new session key (reusing the same session key for encryption would be a bad idea), but we have skipped a step.
In some cases the same pre-shared-key extension can be used for something called 0-RTT, where the client uses a key derived from the pre-shared key to encrypt application data sent along with the initial handshake message. The server decrypts that data and generates the session key that will be used for the rest of the connection.
A diagram from IBM explains this nicely.
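To illustrate why resuming with the same pre-shared key still produces a new session key, here is a simplified sketch built on HKDF (RFC 5869) from the Python standard library. It is not the real TLS 1.3 key schedule, which also mixes in the (EC)DHE shared secret and a transcript hash across several labeled stages. The function derive_session_key and the label string are hypothetical names for this example.

```python
import hashlib
import hmac
import os


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract from RFC 5869 using SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand from RFC 5869 using SHA-256."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_key(psk: bytes, client_random: bytes, server_random: bytes) -> bytes:
    # Simplified stand-in for the TLS 1.3 key schedule (an assumption of this
    # sketch): the long-lived PSK is combined with per-handshake randomness,
    # so every resumption ends up with a different traffic key.
    prk = hkdf_extract(salt=b"\x00" * 32, ikm=psk)
    return hkdf_expand(prk, b"toy traffic key" + client_random + server_random, 32)


if __name__ == "__main__":
    psk = os.urandom(32)  # handed out by the server after the first full handshake
    first = derive_session_key(psk, os.urandom(32), os.urandom(32))
    second = derive_session_key(psk, os.urandom(32), os.urandom(32))
    print("same PSK, different session keys:", first != second)
```

The point of the sketch is that the pre-shared key is an input to the session key, not the session key itself, so both sides have to run the derivation again on every resumed connection.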
What went wrong?
Here is where I speculate. I think the bug might be in the library implementing the pre-shared-key TLS extension, or in a client above it that uses it. Failing to decrypt tells us we are using the wrong session key for the incoming data. How the client knows which session key to use for an incoming encrypted packet is interesting; I’m pretty sure there is a one-to-one relation between a connection and its session key.
Creating a brand new connection does not pass in the pre-shared key, which means the full TLS handshake is performed. However, if an existing connection with a valid session key is lost, the Fire Stick recreates the connection by doing a new TCP three-way handshake followed by a TLS handshake that passes the pre-shared-key extension. This is when things go wrong.
The server receives the client hello with its key share and pre-shared key, verifies it, generates the new session key, encrypts the extensions, and replies with its parameters and the pre-shared key again. The server does not send the certificate because authentication was already performed in an older session. The client gets the server hello, and here is what I think goes wrong. There are two possibilities:
The client receives the server parameters and generates a new session key, but a bug prevents it from updating the existing session key in memory, so the old session key is used for decryption, triggering the message.
OR the client receives the server hello, sees that it carries the same pre-shared key it already has, and skips generating a new session key altogether. The old session key is then used, triggering the error.
In both cases, restarting the device clears all session keys and forces a new connection. In my case, navigating back and forth fixed the problem, which I suppose is a workaround put in on the client side to create a new connection when this error is encountered.
The reason the bug is inconsistent and not always reproducible might be how often pre-shared-key session resumption is triggered. Does it trigger when the TCP connection is lost but the client is still running? Does it trigger when the server forcefully closes the connection?
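The stale-key scenario described above can be mimicked with a toy model. This is only a sketch of my guess, not how the Prime Video client or any TLS library actually works. The "cipher" is a made-up keystream plus an HMAC tag built from the standard library, and Client, seal, and open_record are hypothetical names.

```python
import hashlib
import hmac
import os


class DecryptionFailure(Exception):
    """Raised when a record's integrity tag does not match the key in use."""


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode); not a real cipher.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and tag a record the way the server side would."""
    nonce = os.urandom(12)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def open_record(key: bytes, record: bytes) -> bytes:
    """Verify the tag, then decrypt; a wrong key fails the tag check."""
    nonce, body, tag = record[:12], record[12:-32], record[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise DecryptionFailure("DECRYPTION_FAILURE")
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class Client:
    def __init__(self, session_key: bytes) -> None:
        self.session_key = session_key

    def resume(self, new_session_key: bytes, buggy: bool) -> None:
        # The hypothesized bug: after resumption the new key is never stored.
        if not buggy:
            self.session_key = new_session_key

    def receive(self, record: bytes) -> bytes:
        return open_record(self.session_key, record)


if __name__ == "__main__":
    old_key, new_key = os.urandom(32), os.urandom(32)
    client = Client(old_key)
    client.resume(new_key, buggy=True)      # connection resumed, key not updated
    record = seal(new_key, b"video chunk")  # server already uses the new key
    try:
        client.receive(record)
    except DecryptionFailure as err:
        print("client error:", err)
```

Calling resume with buggy=False stores the new key and the same record decrypts fine, which is roughly what a restart or a forced reconnect would achieve.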
Again, I could be way off. But I had fun with this one.
Here is a diagram, roughly sketched on my iPad, summarizing this.




