Improve extraction for non-defanged URLs #61
Comments
Thanks for taking my comment into account! Hopefully this can be fixed (:
Hi, @luis261! I finally got a second to look over the issue. Your comment was absolutely valuable, but time is unfortunately limited, so I wasn't able to really look into it until now. A solution is currently in testing and will be available in the next release. I've included a few examples with comments below. You may notice a new parameter:
import iocextract

data = [
    "1.1.1.1",
    "1[.]1[.]1[.]1",
    "domain.com",
    "domain[.]com"
]

for d in data:
    # Everything should be refanged
    print(list(iocextract.extract_urls(d, refang=True, no_scheme=True)))
    # Half should be defanged, half should be normal (defang_data defaults to false)
    print(list(iocextract.extract_urls(d, refang=False, no_scheme=True)))
    # Everything should be defanged
    print(list(iocextract.extract_urls(d, refang=False, no_scheme=True, defang_data=True)))
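For readers unfamiliar with the terms, here is a minimal, library-independent sketch of what "refanged" and "defanged" mean for the four sample inputs above. The helpers `refang` and `defang` are hypothetical names written for this illustration; they are not iocextract's implementation and only handle the `[.]` style, whereas the real library recognizes many more defang patterns.

```python
def refang(value: str) -> str:
    """Turn a defanged indicator such as 1[.]1[.]1[.]1 back into 1.1.1.1."""
    # Assumption: only the "[.]" defang style is handled in this sketch.
    return value.replace("[.]", ".")


def defang(value: str) -> str:
    """Bracket every dot; refang first so already-defanged input is unchanged."""
    return refang(value).replace(".", "[.]")


data = ["1.1.1.1", "1[.]1[.]1[.]1", "domain.com", "domain[.]com"]

# Every entry ends up in its plain form.
refanged = [refang(d) for d in data]
# Every entry ends up in its bracketed form.
defanged = [defang(d) for d in data]

print(refanged)
print(defanged)
```

With these helpers, both spellings of an indicator normalize to the same value, which is the consistency the three `extract_urls` calls above are meant to demonstrate.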
@azazelm3dj3d Alright, thanks for keeping me updated! Once the new release is out I will check out the new behavior of
The new version is now available: https://pypi.org/project/iocextract/1.14.1/
Alright, I verified the behavior you wrote about in your comment. However, the fundamental issue of
Definitely a good note for the future. Because the repository doesn't have many outstanding issues compared to other open-source projects, I haven't spent much time reviewing the documentation or how thorough (or accurate) it is. It was on my backlog but had no issue assigned, so I just took care of that. Thank you for bringing that to my attention. Issue: #65
"while it seems like the bug originally referenced in this issue is fixed in the new version, the one I commented above still exists. Defanged IPs still get extracted by
extract_urls while their non-defanged counterparts don't"
Issue comment: #34 (comment)