[BUG] Incompatibility with contacts that have eight digits in their cell phone number in Brazil #127
Comments
Hi, could you provide an example of a phone number with both 8 and 9 digits for better understanding? Also, just to clarify, are you saying that WhatsApp still registers some users with the 8-digit format, even though your contacts.vcf file has already been updated to the 9-digit format? |
Hello! Yes, here is an example. The number 557312345678 is saved with eight digits in wa.db, and it also appears this way on the contact information screen in WhatsApp. And to answer your question: yes, WhatsApp still records some numbers with 8 digits in its database, even if the contacts.vcf file has already been updated to 9 digits. I strongly believe that the 8-digit number comes directly from the WhatsApp server, which did not update some numbers during the migration from 8 digits to 9 digits here in Brazil. |
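To make the two formats concrete, here is a rough sketch that derives both the 8-digit and the 9-digit form from a digits-only number with the country code, as in the example above. `brazil_variants` is a hypothetical helper, not part of the exporter, and it assumes a two-digit area code and that the extra digit is always a leading 9.

```python
def brazil_variants(number: str) -> set[str]:
    """Return the 8-digit and 9-digit forms of a Brazilian number.

    `number` may contain separators; only its digits are used, and it is
    expected to include the country code, e.g. "557312345678".
    Anything that does not look like a Brazilian number is returned as-is.
    """
    digits = "".join(ch for ch in number if ch.isdigit())
    if not digits.startswith("55"):
        return {digits}
    area, subscriber = digits[2:4], digits[4:]
    if len(subscriber) == 8:
        # Old format: build the 9-digit form by adding the leading 9.
        return {digits, f"55{area}9{subscriber}"}
    if len(subscriber) == 9 and subscriber[0] == "9":
        # New format: build the 8-digit form by dropping the leading 9.
        return {digits, f"55{area}{subscriber[1:]}"}
    return {digits}


# Both spellings of the same contact map to the same pair of candidates.
print(sorted(brazil_variants("557312345678")))
print(sorted(brazil_variants("5573912345678")))
```

With a set like this, a lookup could try every candidate instead of a single exact string, whichever format WhatsApp happened to store.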
This exporter is designed to be compliant with WhatsApp. If WhatsApp still uses eight-digit registered numbers, the exporter should continue matching contacts based on that format, so I don't believe any modifications are necessary for the exporter itself. However, if there is a consistent pattern in the change of Brazilian phone numbers, I may be able to create a separate script to modify the VCF file before it gets consumed by the exporter. |
You may try the script below and see if it correctly processes your vCard files.

import re
import argparse


def remove_ninth_digit(phone_number):
    # Remove non-numeric characters (such as +, spaces, etc.)
    phone_number = re.sub(r'\D', '', phone_number)
    # If the phone number has 13 digits (with the country code), or 11 digits (without it),
    # drop the ninth digit, but only when that digit is actually a 9
    if len(phone_number) == 13 and phone_number[4] == '9':
        phone_number = phone_number[:4] + phone_number[5:]
    elif len(phone_number) == 11 and phone_number[2] == '9':
        phone_number = phone_number[:2] + phone_number[3:]
    return phone_number


def process_vcard(input_vcard, output_vcard):
    with open(input_vcard, 'r', encoding='utf-8') as file:
        vcard_data = file.read()

    phone_pattern = re.compile(r'TEL[:;](\+?\d{1,2} ?\d{1,2}\d{4}\d{4,5}|\d{1,2}\d{4}\d{4,5}|\d{9})')

    # Replace phone numbers by removing the 9th digit if necessary
    def replace_phone(match):
        original_phone = match.group(1)
        modified_phone = remove_ninth_digit(original_phone)
        return f'TEL:{modified_phone}'

    # Replace all phone numbers in the VCARD data
    updated_vcard = re.sub(phone_pattern, replace_phone, vcard_data)

    with open(output_vcard, 'w', encoding='utf-8') as file:
        file.write(updated_vcard)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process a VCARD file and remove the 9th digit from phone numbers.')
    parser.add_argument('input_vcard', type=str, help='The input VCARD file name')
    parser.add_argument('output_vcard', type=str, help='The output VCARD file name')
    args = parser.parse_args()
    process_vcard(args.input_vcard, args.output_vcard)
    print(f"VCARD has been processed and saved to {args.output_vcard}")

|
Hello, greetings! The code will create a .vcf file with the contacts in the following structure:
See how the code looks after my modifications:

import re
import argparse


def process_phone_number(raw_phone):
    """
    Process the raw phone string from the VCARD and return two formatted numbers:
    - The original formatted number, and
    - A modified formatted number with the extra (ninth) digit removed, if applicable.

    Desired output:
    For a number with a 9-digit subscriber:
        Original: "+55 {area} {first 5 of subscriber}-{last 4 of subscriber}"
        Modified: "+55 {area} {subscriber[1:5]}-{subscriber[5:]}"
    For example, for an input that should represent "027912345678", the outputs are:
        "+55 27 91234-5678" and "+55 27 1234-5678"

    This function handles numbers that may already include a "+55" prefix.
    It expects that after cleaning, a valid number (without the country code) should have either 10 digits
    (2 for area + 8 for subscriber) or 11 digits (2 for area + 9 for subscriber).
    If extra digits are present, it takes the last 11 (or 10) digits.
    """
    # If the number starts with '+55', remove it for processing.
    number_to_process = raw_phone.strip()
    if number_to_process.startswith("+55"):
        number_to_process = number_to_process[3:].strip()

    # Remove all non-digit characters.
    digits = re.sub(r'\D', '', number_to_process)

    # Remove trunk zero if present.
    if digits.startswith("0"):
        digits = digits[1:]

    # After cleaning, we expect a valid number to have either 10 or 11 digits.
    # If there are extra digits, use the last 10 (for an 8-digit subscriber) or last 11 (for a 9-digit subscriber).
    # The 12-digit case is checked first; behind a "> 11" test it could never be reached.
    if len(digits) == 12:
        # In some cases with an 8-digit subscriber, take the last 10 digits.
        digits = digits[-10:]
    elif len(digits) > 12:
        # Here, we assume the valid number is the last 11 digits.
        digits = digits[-11:]

    if len(digits) not in (10, 11):
        return None, None

    area = digits[:2]
    subscriber = digits[2:]

    if len(subscriber) == 9:
        # Format the original number (5-4 split, e.g., "91234-5678")
        orig_subscriber = f"{subscriber[:5]}-{subscriber[5:]}"
        # Create a modified version: drop the first digit of the subscriber to form an 8-digit subscriber (4-4 split)
        mod_subscriber = f"{subscriber[1:5]}-{subscriber[5:]}"
        original_formatted = f"+55 {area} {orig_subscriber}"
        modified_formatted = f"+55 {area} {mod_subscriber}"
    elif len(subscriber) == 8:
        original_formatted = f"+55 {area} {subscriber[:4]}-{subscriber[4:]}"
        modified_formatted = None

    return original_formatted, modified_formatted


def process_vcard(input_vcard, output_vcard):
    with open(input_vcard, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    output_lines = []

    # Regex to capture any telephone line.
    # It matches lines starting with "TEL:" or "TEL;TYPE=..." or with prefixes like "item1.TEL:".
    phone_pattern = re.compile(r'^(?P<prefix>(?:TEL(?:;TYPE=[^:]+)?|(?:.*\.)?TEL)):(?P<number>.*)$')

    for line in lines:
        stripped_line = line.rstrip("\n")
        match = phone_pattern.match(stripped_line)
        if match:
            raw_phone = match.group("number").strip()
            orig_formatted, mod_formatted = process_phone_number(raw_phone)
            if orig_formatted:
                # Always output using the standardized prefix.
                output_lines.append(f"TEL;TYPE=CELL:{orig_formatted}\n")
            else:
                # Keep lines that could not be parsed unchanged.
                output_lines.append(line)
            if mod_formatted:
                output_lines.append(f"TEL;TYPE=CELL:{mod_formatted}\n")
        else:
            output_lines.append(line)

    with open(output_vcard, 'w', encoding='utf-8') as file:
        file.writelines(output_lines)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Process a VCARD file to standardize telephone entries and add a second TEL line with the modified number (removing the extra ninth digit) for contacts with 9-digit subscribers."
    )
    parser.add_argument('input_vcard', type=str, help='Input VCARD file')
    parser.add_argument('output_vcard', type=str, help='Output VCARD file')
    args = parser.parse_args()
    process_vcard(args.input_vcard, args.output_vcard)
    print(f"VCARD processed and saved to {args.output_vcard}")

I chose to build the vCard like this because I noticed that WhatsApp Chat Exporter recognizes the numbers when they are written this way in the .vcf file.

Notes:
|
I'm glad you took the initiative to make it work for you, and I also appreciate you documenting the code! I've reviewed it and will be adding it to the repository. However, I believe a few changes are necessary before it can be released (for example, what if the number contains a country code but without the |
Must have
Describe the bug
Here in Brazil, a ninth digit was added to cell phone numbers. The ninth digit goes after the DDI (country code) and DDD (area code) and before the contact's original number, so a number looks like +55 DDD NINTH_DIGIT CONTACT_NUMBER. On WhatsApp, however, some numbers still have only 8 digits instead of 9. As a result, the exporter fails to recognize some contacts that are saved on WhatsApp with only eight digits in their number, because it expects the contact to have nine digits.
To extract the contacts from the .vcf file, I used the command
--enrich-from-vcard contacts.vcf --default-country-code 55
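To illustrate why the match fails, here is a minimal sketch of how a vCard number might be reduced to a digits-only form with a default country code. This is an assumption for illustration only: `normalize` is a hypothetical function and not the exporter's actual code, and the 9-digit number is made up by adding a 9 to the example from this thread.

```python
import re


def normalize(raw: str, default_country_code: str = "55") -> str:
    """Reduce a vCard phone string to digits only (hypothetical helper).

    Numbers written with a leading "+" are assumed to carry their own
    country code; anything else gets the default one prepended.
    Limitation: a number that already starts with the country code but
    has no "+" would end up with the code twice.
    """
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    if raw.startswith("+"):
        return digits
    # Drop a leading trunk zero before adding the country code.
    return default_country_code + digits.lstrip("0")


stored_in_wa_db = "557312345678"            # 8-digit form kept by WhatsApp
from_vcard = normalize("(73) 91234-5678")   # 9-digit form from contacts.vcf

print(from_vcard)                     # 5573912345678
print(from_vcard == stored_in_wa_db)  # False
```

The two strings differ only by the extra 9 after the area code, which is enough for an exact comparison to miss the contact.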