01 April 2020 hacking web scraping
Hack the WhatsApp status to track contacts
WhatsApp shares your contacts’ status with you.
TL;DR: You can protect yourself from this hack by changing your account privacy settings. By default, WhatsApp shares your status with others.
Since hardly anybody changes the default settings, this hack works almost all the time.
DISCLAIMER: This is a proof of concept to raise awareness and a bit of a technical challenge before anything else. Don't use the source code I provide to track someone, don't be a dick ❤️.
WhatsApp in Android
Exploit the feature
I want to exploit this feature to track users (for science). My first question is: how does this feature work?
To make things easier, I am using https://web.whatsapp.com/ in my laptop's web browser instead of my Android smartphone. So I'll deal with regular web reverse engineering to get this exploit done, and leave Android app reverse engineering for another time.
I pick up a friend's phone and look at how his status behaves on my side.
Initially, the status is offline, and in this case, WhatsApp gives you an absolute date: last seen 16/03/2020 at 15:40.
I unlock the friend's phone and open an app (not WhatsApp). After a minute of that, nothing changes on my side.
OK, now switching to WhatsApp. The status changes to online 10 seconds later. On the phone, I deliberately did not open the conversation I share with this contact, to verify that the status is transmitted without this condition.
The online status remains until you leave WhatsApp or turn off the screen on the targeted phone.
And then, it reverts to a new last seen date once offline again.
So to summarize the thing:
- We won’t be able to track someone using his phone globally (hopefully!)
- But we can track the WhatsApp usage of anyone we have in our contacts
- The information leaked is the last seen date and the live online status per contact
- We can expect at least one-minute accuracy for the last seen date
- And the online status shows up once WhatsApp has been in the foreground for at least 5-10 seconds
Technical analysis
I open the Firefox debugger (proudly using Firefox again!) to see how the WhatsApp Web front end fetches the coveted data.
The front end uses a web socket connection to gather the data in real time, roughly every 10-15 seconds.
If we look carefully, the front end seems to poke the server every ~15 seconds with "?,," and, most of the time, a reply of the form !{timestamp} follows. Some kind of keep-alive. Not interesting for us.
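To filter this noise out while reading the frames, a tiny classifier is enough. This is only a sketch based on what I observed above: the function name `classifyFrame` is made up, and I am assuming the timestamp in the reply is a plain run of digits.

```javascript
// Hypothetical helper: sorts raw web socket text frames into three buckets.
// Assumption: the reply looks like "!" followed by digits only.
function classifyFrame(frame) {
  if (frame === '?,,') return 'keep-alive-request';
  if (/^!\d+$/.test(frame)) return 'keep-alive-reply';
  return 'other'; // anything else deserves a closer look
}

console.log(classifyFrame('?,,'));
console.log(classifyFrame('!1584373200000'));
```

Anything that lands in the `other` bucket is a candidate for the presence messages we are after.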
The server pushes another kind of message to the front when the status of the contact changes.
The id value, which I partially covered in black, is the phone number; type is the available/unavailable flag; t is the timestamp of our last seen date. The whole payload is encapsulated in a Presence object, easy to recognize.
The timestamp matches what we read in the UI (checked using https://www.epochconverter.com/).
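The same conversion can be done in a few lines of Node.js. This is a minimal sketch, assuming the payload is available as a plain object with the id, type and t fields described above, and that t is a Unix timestamp in seconds. The function name `describePresence` and the sample id are placeholders, not something taken from WhatsApp.

```javascript
// Hypothetical decoder for a Presence-like object { id, type, t }.
// Assumption: `t` is a Unix timestamp in seconds; the real wire format may differ.
function describePresence(presence) {
  const online = presence.type === 'available';
  // No last seen date while the contact is online or when `t` is missing.
  const lastSeen = online || presence.t == null ? null : new Date(presence.t * 1000);
  return { contact: presence.id, online, lastSeen };
}

// Placeholder id; 1584373200 is 16/03/2020 at 15:40 UTC.
const sample = { id: '0000000000', type: 'unavailable', t: 1584373200 };
console.log(describePresence(sample));
```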
Limitations
To receive the presence events from the server over the web socket, we (the front end) subscribe to a specific phone number (id). The subscription is triggered when we select another conversation/contact in the web interface.
So, by design, we only receive the active contact's presence events.
In other words, we can only track one contact at a time in the web socket connection. Too bad for us!
WhatsApp also prevents us from opening several concurrent instances (with the same cookies). So we can't open two web socket channels at the same time. It would have been too easy!
And finally, this one-WhatsApp-web-session-at-a-time behavior still applies when trying two independent sessions (not the same cookies). A new session causes the older one to close, including at the web socket layer.
Another expected limitation: the validity of our session is limited in time. Mine will expire on 22/10/2020, more than 6 months from now. It's odd to be able to retrieve this info on the front-end side like this. I might be misinterpreting this one.
Naïve implementation
Now that we've defined what the status feature of WhatsApp is and how it could be misused to track users, it is time to code something.
We also looked at the technical implementation in search of an easy security flaw.
We could re-code the web socket exchange to retrieve the status data, but this would be complex. Too complex if we can only track one contact at a time anyway. I will start with high-level tooling, accept the currently known limitations, and see where it goes.
My idea is to see how far we can go with cheap hacking work before doing advanced things.
I’ll decompose the proof of concept into three steps:
- Gather the data
- Store the data (easy)
- Visualize the data (easy, but challenging for me)
I'll scrape the data using Node.js and Puppeteer. Puppeteer allows us to control a browser and interact with it the same way a user would with the mouse and keyboard. It avoids doing complex reverse engineering at the web socket level, and that's why I picked it. I'm more used to Selenium + C#. This is my first Puppeteer experiment, so be kind to me.
We got the core stuff in 38 lines of code.
To continue, we need to parse the last seen today at 13:15 format into a proper date format.
To do that, I'm using the so-wonderful chrono-node npm package.
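To give an idea of what this parsing step involves, here is a hand-rolled stand-in that runs without any dependency. It is not chrono-node and not what the proof of concept uses: it only understands the two formats seen above plus a "yesterday" variant that I am assuming exists, and `parseLastSeen` is a made-up name. Dates are interpreted in the local time zone.

```javascript
// Hypothetical, simplified replacement for the chrono-node parsing step.
// Handles: "last seen today at HH:MM", "last seen yesterday at HH:MM" (assumed)
// and "last seen DD/MM/YYYY at HH:MM". Returns null for anything else.
function parseLastSeen(text, now = new Date()) {
  const pattern = /last seen (today|yesterday|(\d{1,2})\/(\d{1,2})\/(\d{4})) at (\d{1,2}):(\d{2})/i;
  const m = pattern.exec(text);
  if (!m) return null;
  const [, day, dd, mm, yyyy, hh, min] = m;

  let year = now.getFullYear();
  let month = now.getMonth();
  let date = now.getDate();
  if (day.toLowerCase() === 'yesterday') {
    date -= 1; // Date normalizes day 0 to the last day of the previous month
  } else if (dd) {
    year = Number(yyyy);
    month = Number(mm) - 1; // JavaScript months are 0-based
    date = Number(dd);
  }
  return new Date(year, month, date, Number(hh), Number(min));
}

console.log(parseLastSeen('last seen today at 13:15'));
console.log(parseLastSeen('last seen 16/03/2020 at 15:40'));
```

A real parser has to cope with locales and many more phrasings, which is exactly why a library is the better choice here.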
Finally, I implement a loop in the code to scan the status constantly and store it into InfluxDB 2.0.
InfluxDB is a time-series database. That’s perfect for our use case.
I will derive the last seen date into an offline since UInteger: the number of seconds elapsed since the last seen date.
The offline since value will be 0 when the status is online.
Deriving our data this way turns our event-based data into time-series data.
This design is a better fit for InfluxDB, and especially for Grafana, which will display our data. And it's stateless; I like that.
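The derivation itself is tiny. Here is a minimal sketch, assuming each scan gives us an object with an `online` boolean and a `lastSeen` Date (both names are mine, for illustration only).

```javascript
// Hypothetical helper: seconds elapsed since the last seen date, 0 when online.
// Assumption: `status` looks like { online: boolean, lastSeen: Date }.
function offlineSince(status, now = new Date()) {
  if (status.online) return 0;
  const seconds = Math.floor((now.getTime() - status.lastSeen.getTime()) / 1000);
  // The field is unsigned, so clamp negative values caused by clock skew.
  return Math.max(0, seconds);
}

const lastSeen = new Date(Date.now() - 90 * 1000);
console.log(offlineSince({ online: false, lastSeen })); // about 90
console.log(offlineSince({ online: true }));
```

Because the value is recomputed from scratch at every scan, the scanner does not need to remember anything between two scans.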
To store the data into InfluxDB 2.0, I'm using the Node.js client with the line protocol format of InfluxDB.
measurementName,tagKey=tagValue fieldKey="fieldValue" 1465839830100400200
--------------- --------------- --------------------- -------------------
       |               |                  |                    |
  Measurement       Tag set           Field set            Timestamp
The data stored looks like this:
status,contactName=Toto offlineSince=8275u 1465839830100400200
status,contactName=Toto offlineSince=8280u 1465839830100400200
status,contactName=Toto offlineSince=0u 1465839830100400200
status,contactName=Tata offlineSince=0u 1465839830100400200
------ ---------------- ------------------ -------------------
  |            |                |                   |
Measurement Tag set         Field set           Timestamp
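For illustration, building such a line by hand could look like the sketch below. It is not the code of the proof of concept: the helper names `escapeTag` and `toLine` are made up, and the official client normally does this formatting for you. It relies on three line protocol rules: commas, equals signs and spaces in tags are backslash-escaped, unsigned integers carry a `u` suffix, and the default timestamp precision is nanoseconds.

```javascript
// Hypothetical helpers that format one unsigned-integer field as line protocol.
function escapeTag(value) {
  // Commas, equals signs and spaces must be escaped in tag keys and values.
  return String(value).replace(/[,= ]/g, (c) => '\\' + c);
}

function toLine(measurement, tags, field, value, date) {
  const tagSet = Object.entries(tags)
    .map(([key, val]) => `,${escapeTag(key)}=${escapeTag(val)}`)
    .join('');
  // Date only has millisecond precision; BigInt avoids overflow when scaling to ns.
  const ns = BigInt(date.getTime()) * 1000000n;
  return `${measurement}${tagSet} ${field}=${value}u ${ns}`;
}

console.log(toLine('status', { contactName: 'Toto' }, 'offlineSince', 8275, new Date(1465839830100)));
// prints: status,contactName=Toto offlineSince=8275u 1465839830100000000
```

The escaping matters as soon as a contact name contains a space, which is the common case for real names.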
The code implementation:
There is an edge case I want to handle: sometimes, the status does not display at all in WhatsApp.
In this case, we won't write an offlineSince measure into the database because we don't have one.
Instead, we will log a statusAvailable measure (either 0 or 1) each time we scan the status.
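The rule above can be sketched as a function that decides which points to write for one scan. The names are assumptions for illustration: `pointsForScan` is made up, `status` is null when nothing is displayed, and I am guessing that statusAvailable is stored as a field of the same status measurement.

```javascript
// Hypothetical helper: lists the points to write for one scan of one contact.
// Assumption: `status` is null when WhatsApp shows nothing, otherwise
// { online: boolean, lastSeen: Date }.
function pointsForScan(contactName, status, now = new Date()) {
  const tags = { contactName };
  // statusAvailable is always logged, so gaps in the data stay visible.
  const points = [
    { measurement: 'status', tags, field: 'statusAvailable', value: status ? 1 : 0 },
  ];
  if (status) {
    const value = status.online
      ? 0
      : Math.max(0, Math.floor((now.getTime() - status.lastSeen.getTime()) / 1000));
    points.push({ measurement: 'status', tags, field: 'offlineSince', value });
  }
  return points;
}

console.log(pointsForScan('Toto', null));
console.log(pointsForScan('Tata', { online: true }));
```

Logging statusAvailable on every scan makes it possible to tell "the contact was offline" apart from "we had no data" on the dashboard.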
We now connect Grafana to InfluxDB and create a dashboard to monitor our acquisition. And voilà, let it run.
You can find the source code of this proof of concept here.
We will try to improve this hack later, for another blog entry someday!
Update
You can find the next episode here, in which we scale this work to track 5000 smartphones.