[RFC] Stop parsing the packet payload #485

atenart · 2025-01-29T11:22:17Z

This is an attempt at not parsing the payload in Retis and only provide the raw data as part of the event. I decided to keep the existing skb event sections (IP, ARP, etc) for backward compatibility but not to generate nor use them anymore. A middle ground could be to still generate them (but not use them) or to let this be controllable by a flag. The question is then if this is actually needed or just hypothetical; as this will impact maintenance.

One major benefit of this is the instant support for stacked VLANs, tunnels, etc. We can also remove the pnet_packet dependency.

Retis used to parse the payload data itself but this has major disadvantages:

It's extremely challenging and time consuming to support all fields and all protocols.
It requires more time when collecting data (and not displaying it).
It goes against the initial goal: extracting information from targets. While metadata has to be manually parsed and while it's OK to sometimes post-process some fields to add value, the payload itself is already self-contained. It makes sense to provide it as a single entity.
This kinda reinvents the wheel.

Instead we can reuse existing tools and libraries:

tcpdump for displaying the packet part of the events.
Existing libraries such as Scapy when post-processing in Python.

There are a few drawbacks:

No easy JSON access to fields like IP. But this wasn't working well and is actually challenging when e.g. thinking about tunnels. Python can be used instead.
No easy way to access fields in Rust. Note that we don't do this ATM.

Overall IMO the benefits outweigh the drawbacks.

If this is wanted, TODO:

Investigate performances (excluding startup time of tcpdump).
Update the output of all examples in the doc, including the gif.
Document using scapy in Python. Provide an example[1]?
Fix out tests.

[1]

from scapy.layers.l2 import Ether
from scapy.all import *

for e in reader.events():
    if 'skb' in e:
        p = Ether(e['skb'].packet.packet)
        print(p.summary())
        if IP in p:
            print("src: " + p[IP].src)

Instead of relying on our custom implementation and requiring a lot of work to support all the various protocols, use tcpdump instead and capture its output. Signed-off-by: Antoine Tenart <[email protected]>

Signed-off-by: Antoine Tenart <[email protected]>

Previous logic relied on whether or not a LL sub-section is present. But we know always have this information in the raw packet and we're moving away from parsing the packet ourselves. This is better as it allows to decide if the LL information is important at post-processing time. The one drawback is the Python `show()` helper which isn't configurable yet in this regards (it always prints LL info). Signed-off-by: Antoine Tenart <[email protected]>

…ad info Retis used to parse the payload data itself but this has major disadvantages: - It's extremely challenging and time consuming to support all fields and all protocols. - It requires more time when collecting data (and not displaying it). - It goes against the initial goal: extracting information from targets. While metadata has to be manually parsed and while it's OK to sometimes post-process some fields to add value, the payload itself is already self-contained. It makes sense to provide it as a single entity. - This kinda reinvents the wheel. Instead we can reuse existing tools and libraries: - tcpdump for display the packet part of the events. - Existing libraries such as Scapy when post-processing in Python. There are a few drawbacks: - No easy JSON access to fields like IP. But this wasn't working well and is actually challenging when e.g. thinking about tunnels. - No easy way to access fields in Rust. Note that we don't do this ATM. Overall IMO the benefits outweigh the drawbacks. Signed-off-by: Antoine Tenart <[email protected]>

The outer VLAN header can be "accelerated" in Linux, aka. part of the skb metadata and not the payload. Only report those as part of the skb event as for all other cases the VLAN information will be part of the payload and displayed correctly. Signed-off-by: Antoine Tenart <[email protected]>

Signed-off-by: Antoine Tenart <[email protected]>

This was kept to 3 to fix an incompatibility with c8s, but this should be fixed at the packaging level with an out-of-tree patch. Signed-off-by: Antoine Tenart <[email protected]>

atenart · 2025-02-05T10:47:09Z

Adding one additional possibility here:

If not parsing the payload to make fields part of the event is agreed on (not saying is is right now nor that it'll be the case), and if concerns are only directed towards calling an external binary to format part of the events, we could keep parsing the packet with pnet but at display time and implement our own display logic only. This could be done in addition to what this PR does, and be either enabled by a flag, be the default behavior, or used as a fallback when the external binary does not exist / respond / work.

This would allow supporting all protocols right now (in the display) while still giving us a way forward, long term, for handling this ourselves. Note that it'll be far easier to parse a packet and display it than converting it to an event; also this should help for performances (-o), etc. We can even make this as a separate lib.

amorenoz · 2025-02-05T20:53:49Z

I generally find the idea quite interesting but the implications are very big. My biggest concern is calling an external tool, the dependency it introduces.

In the ideal world, we would use tcpdump without having to start it. Using it as a library but it seems not possible atm.
@vlrpl did you look at tshark/wireshark?

If not parsing the payload to make fields part of the event is agreed on (not saying is is right now nor that it'll be the case), and if concerns are only directed towards calling an external binary to format part of the events, we could keep parsing the packet with pnet but at display time and implement our own display logic only. This could be done in addition to what this PR does, and be either enabled by a flag, be the default behavior, or used as a fallback when the external binary does not exist / respond / work.

This would help: keep the somewhat limited display logic while removing the need for header info in the event file.
Maybe we should even disable printing by default?

I am also concerned if we loose control of how the packet is displayed, things like the sorted output can get really messy really quickly.

atenart · 2025-02-06T10:38:25Z

I generally find the idea quite interesting but the implications are very big. My biggest concern is calling an external tool, the dependency it introduces.

In the ideal world, we would use tcpdump without having to start it.

+1

If not parsing the payload to make fields part of the event is agreed on (not saying is is right now nor that it'll be the case), and if concerns are only directed towards calling an external binary to format part of the events, we could keep parsing the packet with pnet but at display time and implement our own display logic only. This could be done in addition to what this PR does, and be either enabled by a flag, be the default behavior, or used as a fallback when the external binary does not exist / respond / work.

This would help: keep the somewhat limited display logic while removing the need for header info in the event file. Maybe we should even disable printing by default?

Meaning -o would be the default option?

I am also concerned if we loose control of how the packet is displayed, things like the sorted output can get really messy really quickly.

Any example of what could go wrong specifically for the sorted output?

vlrpl · 2025-02-06T16:06:42Z

I generally find the idea quite interesting but the implications are very big. My biggest concern is calling an external tool, the dependency it introduces.

In the ideal world, we would use tcpdump without having to start it. Using it as a library but it seems not possible atm. @vlrpl did you look at tshark/wireshark?

I took a (real) quick look and there seems to be libwireshark that could be interesting, but I guess it's not conceived as a public library (API stability and so forth).
Of course, it requires further checks, if we end up finding this interesting.

If not parsing the payload to make fields part of the event is agreed on (not saying is is right now nor that it'll be the case), and if concerns are only directed towards calling an external binary to format part of the events, we could keep parsing the packet with pnet but at display time and implement our own display logic only. This could be done in addition to what this PR does, and be either enabled by a flag, be the default behavior, or used as a fallback when the external binary does not exist / respond / work.

This would help: keep the somewhat limited display logic while removing the need for header info in the event file. Maybe we should even disable printing by default?

I am also concerned if we loose control of how the packet is displayed, things like the sorted output can get really messy really quickly.

amorenoz · 2025-02-10T21:23:50Z

If not parsing the payload to make fields part of the event is agreed on (not saying is is right now nor that it'll be the case), and if concerns are only directed towards calling an external binary to format part of the events, we could keep parsing the packet with pnet but at display time and implement our own display logic only. This could be done in addition to what this PR does, and be either enabled by a flag, be the default behavior, or used as a fallback when the external binary does not exist / respond / work.

This would help: keep the somewhat limited display logic while removing the need for header info in the event file. Maybe we should even disable printing by default?

Meaning -o would be the default option?

Yes

I am also concerned if we loose control of how the packet is displayed, things like the sorted output can get really messy really quickly.

Any example of what could go wrong specifically for the sorted output?

It depends on what tcpdump / tshark prints but what I mean is that we no longer control it.

I think we could consider combining this feature with a binary format more fit for storing packets than base64 inside a JSON. Maybe retis could generate a pcap file with some unique packet IDS in the comments and a matching json file with the metadata?

atenart · 2025-02-11T09:30:56Z

If not parsing the payload to make fields part of the event is agreed on (not saying is is right now nor that it'll be the case), and if concerns are only directed towards calling an external binary to format part of the events, we could keep parsing the packet with pnet but at display time and implement our own display logic only. This could be done in addition to what this PR does, and be either enabled by a flag, be the default behavior, or used as a fallback when the external binary does not exist / respond / work.

This would help: keep the somewhat limited display logic while removing the need for header info in the event file. Maybe we should even disable printing by default?

Meaning -o would be the default option?

Yes

I'm not sure about that, adding -o is quite easy and aligns with many other utilities. Running live collection is used in many cases.

I am also concerned if we loose control of how the packet is displayed, things like the sorted output can get really messy really quickly.

Any example of what could go wrong specifically for the sorted output?

It depends on what tcpdump / tshark prints but what I mean is that we no longer control it.

OK, so that wouldn't be limited to the sorted output. I guess that's a trade-off between parsing more protocols and fields and controlling 100% what gets displayed. Using the above tools at least would produce a known format for many users. We can also add safeguards, e.g. controlling there is no newline in the output if that's a concern (although that adds processing). Let's also note if we go with using an external tool optional, we could still be in controlled when configured and in the mid/long-term.

I think we could consider combining this feature with a binary format more fit for storing packets than base64 inside a JSON. Maybe retis could generate a pcap file with some unique packet IDS in the comments and a matching json file with the metadata?

I'm not sure I see the link between that PR and this proposal.

This seems more about performances and storing less data on disk, while the PR is about supporting more protocols and fields. I'm also not convinced the above would help, as we can chose the format we want to store the packet in our event. Splitting it would make things more complex and wouldn't help to use the pcap file directly (which is larger than our current format too), we would still need retis pcap. I'm not really getting what would be the benefits?

amorenoz · 2025-02-28T19:19:14Z

I'm really divided by this PR.
On the one hand, I very much understand the motivation and the great weight we take from our shoulders if we let tcpdump do what it does best.

On the other hand, it just seems too much of a change:

Making retis directly depend on having tcpdump installed
Output might change depending on what tcpdump version is installed
More blocking I/O in the collection event loop
Basic header info no longer available in the event (e.g: quick python exploration now is more complicated)

Could we at least ease the transition somehow? Thinking out loud here:

Make it an opt-in instead of the default behavior
Keep super-basic packet data in the event and clearly document that this is best-effort and that for full information, tcpdump or scapy need to be used to parse the packet
Have a builtin python helper that parses the packet using scapy and returns a scapy object that is easy to use

atenart · 2025-03-03T09:35:13Z

I'm really divided by this PR. On the one hand, I very much understand the motivation and the great weight we take from our shoulders if we let tcpdump do what it does best.

On the other hand, it just seems too much of a change:

Making retis directly depend on having tcpdump installed

Output might change depending on what tcpdump version is installed

More blocking I/O in the collection event loop

Basic header info no longer available in the event (e.g: quick python exploration now is more complicated)

Also note I chose an opinionated solution in this PR to start discussing the matter, but of course I'm expecting this to be more nuanced (if applied at all).

Could we at least ease the transition somehow? Thinking out loud here:

Make it an opt-in instead of the default behavior

Yes, I think that would make perfect sense. Default parser would still be implemented in retis-event (or in a new crate), while it could be overridden if the user chooses to.

We can also not use third party tools to display the data at all if we want, but IMO that only works in the short term if we're not parsing the raw packet to issue event sections.

Keep super-basic packet data in the event and clearly document that this is best-effort and that for full information, tcpdump or scapy need to be used to parse the packet

I don't think providing un-trusted partial data is helpful as it can't really be used. It also can be a chicken and egg issue: you need to know what should be the right data to use it, which conflicts with the purpose of the tool. Also note this would require to parse the raw packet twice (once to provide best effort sections, once to print it in a reliable way).

I'm not necessarily saying we should not provide parsed packet sections (although it's my opinion), but if we decide to keep sections we should make then reliable and complete so they can be used.

Have a builtin python helper that parses the packet using scapy and returns a scapy object that is easy to use

Making consumption easier is always nice. I would just avoid to make scapy a hard dependence.

atenart force-pushed the at/tcpdump branch 3 times, most recently from 67713e7 to b711d6f Compare January 29, 2025 12:56

atenart mentioned this pull request Jan 30, 2025

collector: skb: handle VLAN in the payload #481

Open

atenart added 8 commits February 3, 2025 18:12

events: skb: use tcpdump to format the packet when displaying

bb973f3

Instead of relying on our custom implementation and requiring a lot of work to support all the various protocols, use tcpdump instead and capture its output. Signed-off-by: Antoine Tenart <[email protected]>

events: skb: print the packet on a dedicated line

08afbd5

Signed-off-by: Antoine Tenart <[email protected]>

events: ct: fix single-line formatting

6710a14

Signed-off-by: Antoine Tenart <[email protected]>

docs: reflect latest changes in the skb collector output

c82a82d

Signed-off-by: Antoine Tenart <[email protected]>

Cargo.lock: update to version 4

e7c9f7b

This was kept to 3 to fix an incompatibility with c8s, but this should be fixed at the packaging level with an out-of-tree patch. Signed-off-by: Antoine Tenart <[email protected]>

atenart force-pushed the at/tcpdump branch from b711d6f to e7c9f7b Compare February 3, 2025 17:15

atenart mentioned this pull request Feb 5, 2025

Add basic tests for VLAN #482

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] Stop parsing the packet payload #485

[RFC] Stop parsing the packet payload #485

atenart commented Jan 29, 2025 •

edited

Loading

atenart commented Feb 5, 2025

amorenoz commented Feb 5, 2025

atenart commented Feb 6, 2025

vlrpl commented Feb 6, 2025 •

edited

Loading

amorenoz commented Feb 10, 2025

atenart commented Feb 11, 2025

amorenoz commented Feb 28, 2025

atenart commented Mar 3, 2025

[RFC] Stop parsing the packet payload #485

Are you sure you want to change the base?

[RFC] Stop parsing the packet payload #485

Conversation

atenart commented Jan 29, 2025 • edited Loading

atenart commented Feb 5, 2025

amorenoz commented Feb 5, 2025

atenart commented Feb 6, 2025

vlrpl commented Feb 6, 2025 • edited Loading

amorenoz commented Feb 10, 2025

atenart commented Feb 11, 2025

amorenoz commented Feb 28, 2025

atenart commented Mar 3, 2025

atenart commented Jan 29, 2025 •

edited

Loading

vlrpl commented Feb 6, 2025 •

edited

Loading