[RFC] Stop parsing the packet payload #485
Instead of relying on our custom implementation and requiring a lot of work to support all the various protocols, use tcpdump instead and capture its output. Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: Antoine Tenart <[email protected]>
Previous logic relied on whether or not a LL sub-section is present. But we now always have this information in the raw packet and we're moving away from parsing the packet ourselves. This is better as it allows deciding whether the LL information is important at post-processing time. The one drawback is the Python `show()` helper, which isn't configurable yet in this regard (it always prints LL info). Signed-off-by: Antoine Tenart <[email protected]>
…ad info

Retis used to parse the payload data itself but this has major disadvantages:
- It's extremely challenging and time consuming to support all fields and all protocols.
- It requires more time when collecting data (and not displaying it).
- It goes against the initial goal: extracting information from targets. While metadata has to be manually parsed and while it's OK to sometimes post-process some fields to add value, the payload itself is already self-contained. It makes sense to provide it as a single entity.
- This kinda reinvents the wheel.

Instead we can reuse existing tools and libraries:
- tcpdump for displaying the packet part of the events.
- Existing libraries such as Scapy when post-processing in Python.

There are a few drawbacks:
- No easy JSON access to fields like IP. But this wasn't working well and is actually challenging when e.g. thinking about tunnels.
- No easy way to access fields in Rust. Note that we don't do this ATM.

Overall IMO the benefits outweigh the drawbacks.

Signed-off-by: Antoine Tenart <[email protected]>
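As an illustration of the "reuse tcpdump for display" idea, here is a minimal Python sketch, not the code from this PR. It assumes the event carries the raw frame as a base64 string and that the frame starts at the Ethernet header; `to_pcap` and `format_packet` are hypothetical helper names. The frame is wrapped in a single-record pcap file and handed to tcpdump on stdin.

```python
import base64
import struct
import subprocess

LINKTYPE_ETHERNET = 1  # pcap link-layer type for Ethernet frames


def to_pcap(raw: bytes, ts_sec: int = 0, ts_usec: int = 0) -> bytes:
    """Wrap one raw frame in a minimal single-record pcap file."""
    # Global header: magic, version 2.4, tz offset, sigfigs, snaplen, linktype.
    global_hdr = struct.pack(
        "<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, LINKTYPE_ETHERNET
    )
    # Record header: timestamp, captured length, original length.
    record_hdr = struct.pack("<IIII", ts_sec, ts_usec, len(raw), len(raw))
    return global_hdr + record_hdr + raw


def format_packet(raw_b64: str) -> str:
    """Return tcpdump's description of the frame (needs tcpdump in PATH)."""
    pcap = to_pcap(base64.b64decode(raw_b64))
    # -r - reads the capture from stdin, -n disables name resolution,
    # -e prints the link-level header, -t omits the timestamp.
    out = subprocess.run(
        ["tcpdump", "-r", "-", "-n", "-e", "-t"],
        input=pcap,
        capture_output=True,
        check=True,
    )
    return out.stdout.decode(errors="replace").strip()
```

Spawning one process per packet would be slow; a real implementation would more likely keep a single tcpdump process running and stream records to it.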
The outer VLAN header can be "accelerated" in Linux, aka. part of the skb metadata and not the payload. Only report those as part of the skb event as for all other cases the VLAN information will be part of the payload and displayed correctly. Signed-off-by: Antoine Tenart <[email protected]>
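For reference, an accelerated tag is kept by the kernel as a 16-bit TCI value in the skb metadata rather than in the packet bytes. The sketch below shows how such a value splits into its 802.1Q fields; `decode_vlan_tci` is a hypothetical helper, and how Retis actually names or reports these fields is not shown here.

```python
def decode_vlan_tci(tci: int) -> dict:
    """Split a 16-bit 802.1Q TCI into its three fields."""
    return {
        "pcp": (tci >> 13) & 0x7,  # priority code point (3 bits)
        "dei": (tci >> 12) & 0x1,  # drop eligible indicator (1 bit)
        "vid": tci & 0x0FFF,       # VLAN identifier (12 bits)
    }
```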
Signed-off-by: Antoine Tenart <[email protected]>
This was kept to 3 to fix an incompatibility with c8s, but this should be fixed at the packaging level with an out-of-tree patch. Signed-off-by: Antoine Tenart <[email protected]>
Adding one additional possibility here: If not parsing the payload to make fields part of the event is agreed on (not saying it is right now nor that it'll be the case), and if concerns are only directed towards calling an external binary to format part of the events, we could keep parsing the packet with This would allow supporting all protocols right now (in the display) while still giving us a way forward, long term, for handling this ourselves. Note that it'll be far easier to parse a packet and display it than converting it to an event; also this should help for performance (
I generally find the idea quite interesting but the implications are very big. My biggest concern is calling an external tool and the dependency it introduces. In an ideal world, we would use tcpdump without having to start it, i.e. as a library, but that doesn't seem possible atm.
This would help: keep the somewhat limited display logic while removing the need for header info in the event file. I am also concerned that if we lose control of how the packet is displayed, things like the sorted output can get really messy really quickly.
+1
Meaning
Any example of what could go wrong specifically for the sorted output?
I took a (real) quick look and there seems to be libwireshark that could be interesting, but I guess it's not conceived as a public library (API stability and so forth).
Yes
It depends on what tcpdump / tshark prints but what I mean is that we no longer control it. I think we could consider combining this feature with a binary format better suited to storing packets than base64 inside a JSON. Maybe retis could generate a pcap file with some unique packet IDs in the comments and a matching json file with the metadata?
I'm not sure about that, adding
OK, so that wouldn't be limited to the sorted output. I guess that's a trade-off between parsing more protocols and fields and controlling 100% what gets displayed. Using the above tools would at least produce a format known to many users. We can also add safeguards, e.g. checking there is no newline in the output if that's a concern (although that adds processing). Let's also note that if we make using an external tool optional, we could still be in control when configured that way, and in the mid/long-term.
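The newline safeguard mentioned above could be as small as the following Python sketch; `single_line` is a hypothetical helper and the sample string in the usage note is made up for illustration, not real tcpdump output.

```python
def single_line(output: str) -> str:
    """Collapse external tool output into one line, dropping blank lines."""
    parts = (part.strip() for part in output.splitlines())
    return " ".join(part for part in parts if part)
```

For example, `single_line("IP 10.0.0.1 > 10.0.0.2\n  tcp 0\n")` returns `"IP 10.0.0.1 > 10.0.0.2 tcp 0"`, so one event still maps to one line in the sorted output.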
I'm not sure I see the link between that PR and this proposal. That one seems to be more about performance and storing less data on disk, while this PR is about supporting more protocols and fields. I'm also not convinced the above would help, as we can choose the format we want to store the packet in our event. Splitting it would make things more complex and wouldn't help to use the pcap file directly (which is larger than our current format too), we would still need
I'm really divided by this PR. On the other hand, it just seems too much of a change:
Could we at least ease the transition somehow? Thinking out loud here:
Also note I chose an opinionated solution in this PR to start discussing the matter, but of course I'm expecting this to be more nuanced (if applied at all).
Yes, I think that would make perfect sense. Default parser would still be implemented in We can also not use third party tools to display the data at all if we want, but IMO that only works in the short term if we're not parsing the raw packet to issue event sections.
I don't think providing untrusted partial data is helpful as it can't really be used. It can also be a chicken and egg issue: you need to know what the right data should be to use it, which conflicts with the purpose of the tool. Also note this would require parsing the raw packet twice (once to provide best effort sections, once to print it in a reliable way). I'm not necessarily saying we should not provide parsed packet sections (although it's my opinion), but if we decide to keep sections we should make them reliable and complete so they can be used.
Making consumption easier is always nice. I would just avoid making Scapy a hard dependency.
This is an attempt at not parsing the payload in Retis and only providing the raw data as part of the event. I decided to keep the existing skb event sections (IP, ARP, etc) for backward compatibility but not to generate nor use them anymore. A middle ground could be to still generate them (but not use them) or to let this be controllable by a flag. The question is then whether this is actually needed or just hypothetical, as it will impact maintenance.
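One major benefit of this is the instant support for stacked VLANs, tunnels, etc. We can also remove the `pnet_packet` dependency.

Retis used to parse the payload data itself but this has major disadvantages:
- It's extremely challenging and time consuming to support all fields and all protocols.
- It requires more time when collecting data (and not displaying it).
- It goes against the initial goal: extracting information from targets. While metadata has to be manually parsed and while it's OK to sometimes post-process some fields to add value, the payload itself is already self-contained. It makes sense to provide it as a single entity.
- This kinda reinvents the wheel.

Instead we can reuse existing tools and libraries:
- tcpdump for displaying the packet part of the events.
- Existing libraries such as Scapy when post-processing in Python.

There are a few drawbacks:
- No easy JSON access to fields like IP. But this wasn't working well and is actually challenging when e.g. thinking about tunnels.
- No easy way to access fields in Rust. Note that we don't do this ATM.

Overall IMO the benefits outweigh the drawbacks.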
If this is wanted, TODO:
[1]