Keylogger

Published on May 16, 2024

#embedded
#arduino

Challenge

I want to create a real-time heatmap of my keyboard usage (just for fun! 🔥). The microcontroller will log each keypress and send it to a backend computer, which will store it and serve it to a frontend.

How does a keyboard work?

I think these videos are quite good at explaining it:

Decipher USB message

The first step is to intercept the messages sent from the keyboard to the computer. They travel over the wires of the USB cable, so we need to check the USB 2.0 specification. Each message will contain the keypress value.

Hardware

I stripped the cable and found 4 wires inside:

  • Black: ground
  • Red: 5V
  • Green: D+
  • White: D-
USB cable wiring colors

I connected the ground, D+ and D- wires to my digital analyzer.

Digital analyzer wiring

Signal capturing

Once the wires were connected, I checked the signal in idle (keyboard plugged in, no keys pressed):

Signals in idle

D+ stays low (0) and D- stays high (1). What does that mean? Looking at Table 7-2 of the USB spec, we can see that a low-speed device idles with D+ low and D- high. So this is a low-speed device.

Signaling levels

Importantly, two signals (D+ and D-) are used to transmit data, which gives 4 possible combinations (the J and K names below apply to low speed; they are swapped for full speed):

D+   D-   Name in table           State
1    0    Differential 1          K
0    1    Differential 0          J
0    0    Single-ended 0 (SE0)    SE0
1    1    Single-ended 1 (SE1)    SE1
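The table translates directly into a lookup. This is a minimal sketch, assuming each analyzer sample is available as a pair of 0/1 values for D+ and D-; the function name `line_state` is made up for illustration.

```python
def line_state(d_plus: int, d_minus: int) -> str:
    """Map one (D+, D-) sample to its low-speed line state name."""
    table = {
        (1, 0): "K",    # differential 1
        (0, 1): "J",    # differential 0, also the idle state
        (0, 0): "SE0",  # both lines low
        (1, 1): "SE1",  # both lines high, should not occur in normal traffic
    }
    return table[(d_plus, d_minus)]


# A few hand-written samples: idle, one K, then both lines low, then idle again.
samples = [(0, 1), (1, 0), (0, 0), (0, 1)]
states = [line_state(dp, dm) for dp, dm in samples]
print(states)
```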

Sometimes another signal is captured even though I did not press anything; it could be a ping message or something like that. That signal is shown below; each bit time takes 0.08 μs (12.5 Mbit/s).

Signaling levels

The signaling levels table describes the End of Packet (EOP) signal: D+ and D- both stay low (SE0) for 2 bit times, followed by a J state (differential 0 for low speed) for 1 bit time. The picture above shows 5 packets.
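Spotting the EOP is a simple way to find where each packet ends. A minimal sketch, assuming the capture has already been reduced to one state name per bit time (the helper name `find_eops` is hypothetical):

```python
def find_eops(states):
    """Return the indices where an EOP (SE0, SE0, J) starts.

    states is a list with one state name per bit time.
    """
    pattern = ["SE0", "SE0", "J"]
    return [i for i in range(len(states) - 2) if states[i:i + 3] == pattern]


# Hand-written example: some traffic, an EOP, then idle.
example = ["J", "K", "J", "K", "SE0", "SE0", "J", "J"]
print(find_eops(example))
```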

But what happens when I press a key?

Capturing that packet was more difficult than I thought: the keypress gets lost among the other messages shown above. I decided to record a few seconds and post-process the capture afterwards.

The digital analyzer is set up to record 20M samples at 100 MHz:

Analog discovery setup for recording

After pressing start, I started pressing the ‘w’ key. The recorded file was ~300 MB!

USB protocol

Now I have a big file with two signals of 0s and 1s, time to decode it! There are several types of packets, including token, data and handshake. I am interested in the data packets, which have the following structure:

Data packet format

PID types

Data packet: all bits are sent LSb first.

  • PID (Packet Identifier): consists of a 4-bit packet type followed by a 4-bit check field (the ones’ complement of the type). The bits arrive LSb first, so we need to reverse them to read the type in its usual order. For a low-speed device, data packets use PID 0011 (DATA0) or 1011 (DATA1).
  • DATA: the maximum payload size for low-speed devices is 8 bytes
  • CRC16: a 16-bit cyclic redundancy check computed over the data field
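The PID and CRC16 fields can both be checked in a few lines. This is a rough sketch, not the code used for this post: `decode_pid` and `crc16_usb` are illustrative names, and the CRC uses the usual bit-reflected form of the USB polynomial x^16 + x^15 + x^2 + 1 (initial value 0xFFFF, result inverted). How the two received CRC bytes combine into one 16-bit value is best verified against a real capture.

```python
PID_NAMES = {0b0011: "DATA0", 0b1011: "DATA1"}


def decode_pid(bits):
    """Decode 8 PID bits given in wire order (LSb first).

    Returns the 4-bit packet type, or raises ValueError when the check
    field is not the ones' complement of the type.
    """
    value = sum(bit << i for i, bit in enumerate(bits))
    pid, check = value & 0xF, value >> 4
    if pid ^ check != 0xF:
        raise ValueError("PID check field does not match")
    return pid


def crc16_usb(data: bytes) -> int:
    """CRC16 over the data field (reflected polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


# DATA0 is type 0011 with check 1100; on the wire that is 1,1,0,0,0,0,1,1.
pid = decode_pid([1, 1, 0, 0, 0, 0, 1, 1])
print(PID_NAMES[pid])
```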

📚 LSb first (least significant bit first) refers to the order of the bits on the wire: the lowest bit comes first, so we need to reverse each field to get its real value.
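In code, "reversing" just means giving the first received bit the weight 1, the second the weight 2, and so on. A small sketch with a hypothetical helper `bits_to_bytes`, assuming the bits are already decoded into a list of 0/1 values:

```python
def bits_to_bytes(bits):
    """Pack a list of bits (LSb first on the wire) into bytes.

    Trailing bits that do not fill a whole byte are ignored.
    """
    out = bytearray()
    whole = len(bits) - len(bits) % 8
    for start in range(0, whole, 8):
        chunk = bits[start:start + 8]
        # The first bit on the wire is bit 0 of the byte.
        out.append(sum(bit << position for position, bit in enumerate(chunk)))
    return bytes(out)


# 0x1A is 00011010 in binary, so on the wire it arrives as 0,1,0,1,1,0,0,0.
print(bits_to_bytes([0, 1, 0, 1, 1, 0, 0, 0]).hex())
```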

All packets begin with a synchronization (SYNC) field. For low speed, the SYNC pattern is required to be 3 KJ pairs followed by 2 K’s (8 states in total). It lets the receiver work out how long a bit time is.
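Since the six alternating states at the start of SYNC each last exactly one bit time, their run lengths in the raw capture give the number of samples per bit. A minimal sketch under that assumption; `samples_per_bit` is an illustrative name and the input is one state name per analyzer sample, starting in idle:

```python
from itertools import groupby
from statistics import median


def samples_per_bit(raw_states):
    """Estimate samples per bit from the K/J runs that open SYNC."""
    runs = [(state, len(list(group))) for state, group in groupby(raw_states)]
    # The first K run after idle (J) is the first bit of SYNC.
    first_k = next(i for i, (state, _) in enumerate(runs) if state == "K")
    # K J K J K J: six runs, each one bit time long.
    return median(length for _, length in runs[first_k:first_k + 6])


# Synthetic capture: idle, then SYNC sampled at 8 samples per bit.
raw = ["J"] * 20 + (["K"] * 8 + ["J"] * 8) * 3 + ["K"] * 16
print(samples_per_bit(raw))
```

Taking the median rather than the mean makes the estimate less sensitive to a run that comes out one sample short or long.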

How do we get the bits of a data packet? Remember that the two signals give 4 states (K, J, SE0, SE1), which we have to convert to bits. How? USB uses NRZI data encoding when transmitting packets. In NRZI encoding:

  • No change in state -> 1
  • Change in state -> 0
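These two rules can be sketched as follows, assuming one state name per bit time and an idle (J) line before the packet. The function name `nrzi_decode` is made up for this example.

```python
def nrzi_decode(states, previous="J"):
    """Turn a sequence of J/K states into bits.

    No change from the previous state gives 1, a change gives 0.
    The line is assumed to be idle (J) before the first state.
    """
    bits = []
    for state in states:
        bits.append(1 if state == previous else 0)
        previous = state
    return bits


# The SYNC field KJKJKJKK decodes to seven 0s followed by a 1.
print(nrzi_decode(list("KJKJKJKK")))
```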

A zero is inserted after every six consecutive ones in the data stream before the data is NRZI encoded (bit stuffing). This guarantees enough transitions for the receiver to stay locked to the data and clock.
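On the receiving side the stuffed zeros have to be dropped again after NRZI decoding. A minimal sketch of that step; `unstuff` is a hypothetical helper that works on a list of decoded bits:

```python
def unstuff(bits):
    """Remove the zero inserted after every six consecutive ones."""
    out = []
    ones = 0
    expect_stuffed_zero = False
    for bit in bits:
        if expect_stuffed_zero:
            if bit != 0:
                raise ValueError("bit stuffing error: seventh consecutive 1")
            # Drop the stuffed zero and start counting again.
            expect_stuffed_zero = False
            ones = 0
            continue
        out.append(bit)
        ones = ones + 1 if bit else 0
        if ones == 6:
            expect_stuffed_zero = True
    return out


# Six ones, the stuffed zero, then two real bits (1, 0).
print(unstuff([1, 1, 1, 1, 1, 1, 0, 1, 0]))
```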

With all this information on the USB protocol and signaling levels, I started writing Python code to decode the capture, and I finally 🔥 got the data packets containing the ‘w’ keypresses. The Python code was useful for testing the decoding algorithm.

#TODO: show data packet decoded 0&1 of w keypress to SHOW data payload and see that letter is in one byte, and explain that modifier keys (shift, ctrl) can be another byte and other keys pressed at same time can be there as well.
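Until the real capture is shown, here is a sketch of how such a payload could be read. It assumes the keyboard sends the standard 8-byte HID boot keyboard report (byte 0: modifier bits, byte 1: reserved, bytes 2 to 7: up to six key usage IDs); the actual keyboard may use a different layout. The report below is hand-built for illustration and is not taken from the capture. In the HID usage tables, ‘a’ is usage ID 0x04, which puts ‘w’ at 0x1A.

```python
MODIFIERS = ["LCtrl", "LShift", "LAlt", "LGui", "RCtrl", "RShift", "RAlt", "RGui"]


def parse_boot_report(report: bytes):
    """Split an 8-byte boot keyboard report into modifiers and keys.

    Assumption: boot-protocol layout. Letters a-z are usage IDs 0x04-0x1D;
    any other usage ID is returned as a hex string.
    """
    if len(report) != 8:
        raise ValueError("expected an 8-byte report")
    modifiers = [name for bit, name in enumerate(MODIFIERS) if (report[0] >> bit) & 1]
    keys = []
    for code in report[2:]:
        if code == 0:  # empty slot
            continue
        if 0x04 <= code <= 0x1D:
            keys.append(chr(ord("a") + code - 0x04))
        else:
            keys.append(hex(code))
    return modifiers, keys


# Hand-built example: left shift held while 'w' is pressed.
print(parse_boot_report(bytes([0x02, 0x00, 0x1A, 0, 0, 0, 0, 0])))
```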