Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-448 Improve NWPathMonitor thread safety #109

Merged
merged 1 commit into from
May 18, 2020

Conversation

ncreated
Copy link
Member

@ncreated ncreated commented May 15, 2020

What and why?

🔬 This PR attempts to fix issues with NWPathMonitor that were reported by several people. Unfortunately, I wasn't able to reproduce any of reports, so all the changes are based on my research, investigation and intuition.

GH Issue #110

Provided crash logs clearly says we have a problem with some parts of NWPath being destroyed while we're accessing it inside NWPathMonitor.current:

0  libswiftCore.dylib             0x195690314 _swift_release_dealloc + 16
1  libswiftNetwork.dylib          0x1bd5d4d88 destroy for NWPath + 44
2  libswiftNetwork.dylib          0x1bd5d4d88 destroy for NWPath + 44
3  Datadog                        0x1052e79fc NWPathMonitor.current.getter + 22 (NetworkPathMonitor.swift:22)
4  Datadog                        0x1052c2234 DataUploadConditions.canPerformUpload() + 48 (NetworkConnectionInfoProvider.swift:48)

Even when we capture the value of struct NWPath on local property, this code seems to not be thread safe:

let info: NWPath = currentPath
_ = info.supportsIPv4
_ = info.supportsIPv6

My suspicion is that the NWPath internals are based on mutable state, thus it crashes when the NWPath is updated by the OS.

Issue reported by @jegnux:

swift_unknownObjectRelease crash during background → foreground transition (and probably network connection status change - not sure):

image

GH Issue #70:

This is clearly the same problem:

EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000020

Datadog
$s7Datadog29NetworkConnectionInfoProviderC7currentAA0bcD0Vvg + 152
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000020

Datadog
$s7Network13NWPathMonitorC7DatadogE15currentPathInfoAD09NWCurrentfG0VyF + 132

Other issues

There is a small chance that the NWPathMonitor might be the root cause of #106. In this case, we don't have a symbolicated crashlog, so we can't really tell if dyld_stub_binder attempts to evaluate a symbol coming from DarwinNetwork image, but this is IMO possible, as NWPathMonitor's start() is invoked at the SDK initialization time.

How?

Debugging

To reproduce the issue, I was trying different combinations of following:

  • Using DispatchQueue.concurrentPerform(iterations: 10_000) to stress test nwPathMonitor.currentPath access from multiple threads.
  • Transitioning the app between background and foreground while running tests on a real device.
  • Switching between wifi, cellular and no Internet connection.

After many runs, I still didn't catch the crash.

The fix

First, I got rid of fetch {} and cancel {} closures, which were the only things retaining NWPathMonitor instance. Instead, the monitor is now retained on the property explicitly. Even if it doesn't help, it simplifies the code and should result with cleaner / different stack trace if the issue is back.

Second, looking at the same problem solved in rwbutler/Connectivity project, I decided to update our way of handling the NWPathMonitor. This monitor provides to ways of accessing the NWPath:

  • pulling: monitor.currentPath;
  • pushing: monitor.pathUpdateHandler = { path in _ }.

The pulling model is very convenient for other parts of the SDK, but its documentation doesn't give any guidelines on how to deal with a thread safety (same, when we look at this implementation, we can only guess what happens internally).

The pushing model is more explicit - it promises to deliver updates on the queue passed to monitor.start(queue:). To leverage both, the thread safety of the pushing model, and keep the convenience of the pulling model, I created a wrapper which exposes .current value while using .pathUpdateHandler = { _ in } internally.

Because rwbutler/Connectivity successfully uses monitor.pathUpdateHandler = { _ in } to read network info, I hope this will ultimately solve also our problem.

In addition to this PR, I opened a thread on Apple Developer Forum, to hopefully get some more insights on NWPath thread safety:

https://forums.developer.apple.com/message/420389#420389

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • I point this PR to tracing branch, as we have better setup for running unit tests on device. When approved, I will rebase it onto master. ⚠️

@ncreated ncreated added this to the next-version milestone May 15, 2020
@ncreated ncreated requested a review from a team as a code owner May 15, 2020 10:52
@ncreated ncreated self-assigned this May 15, 2020
@ncreated ncreated force-pushed the ncreated/RUMM-448-improve-nwpath-monitor-safety branch from 9bc5322 to 87ee84b Compare May 15, 2020 10:57
Copy link
Contributor

@buranmert buranmert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very useful insight from #110

  1. iOS 11 is okay: proves NWPathMonitor is the root cause
  2. Crash rate 500/~3M: this rate seems consistent to me with thread safety of read/write of currentPath hypothesis

@buranmert buranmert dismissed their stale review May 16, 2020 00:39

One shouldn't review thread-safety at 2 AM

@ncreated ncreated marked this pull request as draft May 18, 2020 07:21
@ncreated ncreated marked this pull request as ready for review May 18, 2020 08:13
@ncreated
Copy link
Member Author

@buranmert Yes, #110 brought some new light to the problem. We should avoid NWPath processing by our own, and consider this to be not thread safe:

let info: NWPath = currentPath
_ = info.supportsIPv4
_ = info.supportsIPv6

I updated the PR and description accordingly.

@ncreated ncreated force-pushed the ncreated/RUMM-448-improve-nwpath-monitor-safety branch from ab6bb81 to 23305e1 Compare May 18, 2020 13:03
@ncreated ncreated changed the base branch from tracing to master May 18, 2020 13:03
@ncreated ncreated force-pushed the ncreated/RUMM-448-improve-nwpath-monitor-safety branch from 23305e1 to 3da2053 Compare May 18, 2020 13:10
@ncreated ncreated merged commit 4022bd2 into master May 18, 2020
@ncreated ncreated deleted the ncreated/RUMM-448-improve-nwpath-monitor-safety branch May 18, 2020 13:23
@ncreated ncreated removed this from the next-version milestone May 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants