[BUG] Incompatibility with contacts that have eight digits in their cell phone number in Brazil #127

Open
magpires opened this issue Dec 26, 2024 · 6 comments

@magpires

magpires commented Dec 26, 2024

Must have

  • WhatsApp version: 2.24.25.77
  • OS: Android - 14
  • Platform: Windows
  • Exporter's branch and version: main - Major Release (2024/10/24)

Describe the bug
In Brazil, a ninth digit was added to cell phone numbers. It goes after the DDI (country code) and DDD (area code) and before the contact's original number, so a number looks like +55 DDD NINTH_DIGIT CONTACT_NUMBER. On WhatsApp, however, some numbers still have only eight digits instead of nine. As a result, the exporter fails to recognize contacts that are saved on WhatsApp with an eight-digit number, because it expects the contact to have nine digits.

To extract the contacts from the .vcf file, I ran the exporter with the options --enrich-from-vcard contacts.vcf --default-country-code 55
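
The number layout described above (+55, then the DDD, then an optional ninth digit, then the eight-digit number) can be sketched as a small parser. This is only an illustration: `split_br_mobile` and `BrazilianMobile` are hypothetical names, not part of the exporter, and the sketch assumes the number is written with the 55 country code.

```python
import re
from typing import NamedTuple, Optional


class BrazilianMobile(NamedTuple):
    country_code: str  # DDI, "55" for Brazil
    area_code: str     # DDD, two digits
    ninth_digit: str   # "9" when present, "" for the older 8-digit form
    subscriber: str    # the remaining eight digits


def split_br_mobile(number: str) -> Optional[BrazilianMobile]:
    """Split a number that includes the 55 country code into its parts."""
    digits = re.sub(r"\D", "", number)
    # 55 + DDD (2 digits) + optional 9 + 8 subscriber digits
    match = re.fullmatch(r"(55)(\d{2})(9?)(\d{8})", digits)
    if match is None:
        return None
    return BrazilianMobile(*match.groups())
```

With this sketch, `split_br_mobile("+55 73 91234-5678")` and `split_br_mobile("557312345678")` yield the same area code and subscriber, and differ only in the `ninth_digit` field.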

@KnugiHK
Owner

KnugiHK commented Jan 2, 2025

Hi, could you provide an example of a phone number with both 8 and 9 digits for better understanding? Also, just to clarify, are you saying that WhatsApp still registers some users with the 8-digit format, even though your contacts.vcf file has already been updated to the 9-digit format?

@magpires
Author

magpires commented Jan 2, 2025

Hello! Yes, here is an example of the same phone number with 8 and with 9 digits:

557312345678 - It is saved like this with eight digits in wa.db and you can also see it like this on the contact information screen in WhatsApp
5573912345678 - It is saved like this with nine digits in the contacts.vcf file

And to answer your question, yes! WhatsApp still records some numbers with 8 digits in its database, even if the contacts.vcf file is already updated with 9 digits. I strongly believe that the 8-digit number comes directly from the WhatsApp server, which did not update some numbers during the migration from 8 to 9 digits here in Brazil.
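
The mismatch in the example above can be checked mechanically by reducing both numbers to the 8-digit form before comparing them. The sketch below is hypothetical (`legacy_key` and `same_contact` are illustrative names, not exporter functions) and assumes both inputs include the 55 country code.

```python
import re


def legacy_key(number: str) -> str:
    """Reduce a Brazilian number with country code to its 8-digit form.

    Assumption: a 13-digit number that starts with 55 and has a 9 right
    after the DDD carries the extra ninth digit. Anything else is
    returned as plain digits, unchanged.
    """
    digits = re.sub(r"\D", "", number)
    if len(digits) == 13 and digits.startswith("55") and digits[4] == "9":
        return digits[:4] + digits[5:]
    return digits


def same_contact(wa_number: str, vcf_number: str) -> bool:
    """Compare a wa.db number and a vCard number, ignoring the ninth digit."""
    return legacy_key(wa_number) == legacy_key(vcf_number)
```

Under these assumptions, `same_contact("557312345678", "5573912345678")` is true, which is the match the exporter currently misses.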

@KnugiHK
Owner

KnugiHK commented Jan 4, 2025

This exporter is designed to be compliant with WhatsApp. If WhatsApp still uses eight-digit registered numbers, the exporter should continue matching contacts based on that format, so I don't believe any modifications are necessary for the exporter itself.

However, if there is a consistent pattern in the change of Brazilian phone numbers, I may be able to create a separate script to modify the VCF file before it gets consumed by the exporter.

@KnugiHK
Owner

KnugiHK commented Feb 9, 2025

You may try the script below and see if it processes your vCard files correctly.

import re
import argparse


def remove_ninth_digit(phone_number):
    # Remove non-numeric characters (such as +, spaces, etc.)
    phone_number = re.sub(r'\D', '', phone_number)
    
    # If the phone number has 13 digits (with the country code), or 11 digits (without it)
    if len(phone_number) == 13:
        phone_number = phone_number[:4] + phone_number[5:]
    elif len(phone_number) == 11:
        phone_number = phone_number[:2] + phone_number[3:]
    
    return phone_number


def process_vcard(input_vcard, output_vcard):
    with open(input_vcard, 'r', encoding='utf-8') as file:
        vcard_data = file.read()

    phone_pattern = re.compile(r'TEL[:;](\+?\d{1,2} ?\d{1,2}\d{4}\d{4,5}|\d{1,2}\d{4}\d{4,5}|\d{9})')
    
    # Replace phone numbers by removing the 9th digit if necessary
    def replace_phone(match):
        original_phone = match.group(1)
        modified_phone = remove_ninth_digit(original_phone)
        return f'TEL:{modified_phone}'
    
    # Replace all phone numbers in the VCARD data
    updated_vcard = re.sub(phone_pattern, replace_phone, vcard_data)
    
    with open(output_vcard, 'w', encoding='utf-8') as file:
        file.write(updated_vcard)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process a VCARD file and remove the 9th digit from phone numbers.')
    parser.add_argument('input_vcard', type=str, help='The input VCARD file name')
    parser.add_argument('output_vcard', type=str, help='The output VCARD file name')
    args = parser.parse_args()

    process_vcard(args.input_vcard, args.output_vcard)

    print(f"VCARD has been processed and saved to {args.output_vcard}")

@KnugiHK KnugiHK self-assigned this Feb 9, 2025
@KnugiHK KnugiHK added the enhancement New feature or request label Feb 9, 2025
@magpires
Author

magpires commented Feb 9, 2025

Hello, greetings!
The code you provided here did not work as expected. I took the liberty of editing it and would like to share it with you. Python is not my everyday language, so if the code seems messy, feel free to modify it.

The code will create a .vcf file with the contacts in the following structure:

BEGIN:VCARD
VERSION:3.0
FN:Contact Name
N:Name;Contact;;;
TEL;TYPE=CELL:+55 DD 91234-5678
TEL;TYPE=CELL:+55 DD 1234-5678
CATEGORIES:myContacts
END:VCARD

Here is the code after my modifications:

import re
import argparse

def process_phone_number(raw_phone):
    """
    Process the raw phone string from the VCARD and return two formatted numbers:
      - The original formatted number, and
      - A modified formatted number with the extra (ninth) digit removed, if applicable.
      
    Desired output:
      For a number with a 9-digit subscriber:
         Original: "+55 {area} {first 5 of subscriber}-{last 4 of subscriber}"
         Modified: "+55 {area} {subscriber[1:5]}-{subscriber[5:]}" 
      For example, for an input that should represent "027912345678", the outputs are:
         "+55 27 91234-5678"  and  "+55 27 1234-5678"
    
    This function handles numbers that may already include a "+55" prefix.
    It expects that after cleaning, a valid number (without the country code) should have either 10 digits 
    (2 for area + 8 for subscriber) or 11 digits (2 for area + 9 for subscriber).
    If extra digits are present, it takes the last 11 (or 10) digits.
    """
    # If the number starts with '+55', remove it for processing.
    number_to_process = raw_phone.strip()
    if number_to_process.startswith("+55"):
        number_to_process = number_to_process[3:].strip()
    
    # Remove all non-digit characters.
    digits = re.sub(r'\D', '', number_to_process)
    
    # Remove trunk zero if present.
    if digits.startswith("0"):
        digits = digits[1:]
    
    # After cleaning, we expect a valid number to have either 10 or 11 digits.
    # If there are extra digits, use the last 10 (for an 8-digit subscriber) or last 11 (for a 9-digit subscriber).
    if len(digits) == 12:
        # Exactly 12 digits: two extra digits plus an 8-digit subscriber, so take the last 10 digits.
        # This check must come first; the general "> 11" case below would otherwise shadow it.
        digits = digits[-10:]
    elif len(digits) > 11:
        # Here, we assume the valid number is the last 11 digits.
        digits = digits[-11:]
    
    if len(digits) not in (10, 11):
        return None, None

    area = digits[:2]
    subscriber = digits[2:]

    if len(subscriber) == 9:
        # Format the original number (5-4 split, e.g., "91234-5678")
        orig_subscriber = f"{subscriber[:5]}-{subscriber[5:]}"
        # Create a modified version: drop the first digit of the subscriber to form an 8-digit subscriber (4-4 split)
        mod_subscriber = f"{subscriber[1:5]}-{subscriber[5:]}"
        original_formatted = f"+55 {area} {orig_subscriber}"
        modified_formatted = f"+55 {area} {mod_subscriber}"
    elif len(subscriber) == 8:
        original_formatted = f"+55 {area} {subscriber[:4]}-{subscriber[4:]}"
        modified_formatted = None

    return original_formatted, modified_formatted

def process_vcard(input_vcard, output_vcard):
    with open(input_vcard, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    
    output_lines = []
    
    # Regex to capture any telephone line.
    # It matches lines starting with "TEL:" or "TEL;TYPE=..." or with prefixes like "item1.TEL:".
    phone_pattern = re.compile(r'^(?P<prefix>(?:TEL(?:;TYPE=[^:]+)?|(?:.*\.)?TEL)):(?P<number>.*)$')
    
    for line in lines:
        stripped_line = line.rstrip("\n")
        match = phone_pattern.match(stripped_line)
        if match:
            raw_phone = match.group("number").strip()
            orig_formatted, mod_formatted = process_phone_number(raw_phone)
            if orig_formatted:
                # Always output using the standardized prefix.
                output_lines.append(f"TEL;TYPE=CELL:{orig_formatted}\n")
            else:
                output_lines.append(line)
            if mod_formatted:
                output_lines.append(f"TEL;TYPE=CELL:{mod_formatted}\n")
        else:
            output_lines.append(line)
    
    with open(output_vcard, 'w', encoding='utf-8') as file:
        file.writelines(output_lines)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Process a VCARD file to standardize telephone entries and add a second TEL line with the modified number (removing the extra ninth digit) for contacts with 9-digit subscribers."
    )
    parser.add_argument('input_vcard', type=str, help='Input VCARD file')
    parser.add_argument('output_vcard', type=str, help='Output VCARD file')
    args = parser.parse_args()
    
    process_vcard(args.input_vcard, args.output_vcard)
    print(f"VCARD processed and saved to {args.output_vcard}")

I chose to build the vCard this way because I noticed that WhatsApp Chat Exporter recognizes numbers from the .vcf file when they are in this format.

Notes:

  1. The code handles several of the ways Brazilian numbers are saved in the .vcf file.
  2. This may not work well for numbers from other countries, so when integrating it into the project, run it only if --default-country-code is 55.
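
The second note can be sketched as a guard that only produces the 8-digit variant when the default country code is 55. This is a hypothetical helper, not exporter code: `candidate_numbers` is an illustrative name, and it assumes `digits` is a digits-only number that already includes the country code.

```python
from typing import List


def candidate_numbers(digits: str, default_country_code: str) -> List[str]:
    """Return the number and, for Brazil only, its 8-digit variant."""
    candidates = [digits]
    # Leave numbers untouched unless the user asked for Brazil (55).
    if default_country_code != "55":
        return candidates
    # 55 + DDD + 9 + eight digits: also offer the form without the ninth digit.
    if len(digits) == 13 and digits.startswith("55") and digits[4] == "9":
        candidates.append(digits[:4] + digits[5:])
    return candidates
```

Keeping both variants mirrors the two TEL lines in the vCard above, so a contact can match whichever form WhatsApp stored.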

@KnugiHK
Owner

KnugiHK commented Feb 10, 2025

I'm glad you took the initiative to make it work for you, and I also appreciate you documenting the code! I've reviewed it and will be adding it to the repository. However, I believe a few changes are necessary before it can be released (for example, handling a number that contains the country code but no + sign).
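
One way the case mentioned above might be handled is to decide by length whether a leading 55 is the country code, instead of relying on the + sign. The following is a minimal sketch under that assumption. `national_digits` is a hypothetical name, and this is not the change that was eventually made to the repository.

```python
import re
from typing import Optional


def national_digits(raw_phone: str) -> Optional[str]:
    """Return DDD + subscriber (10 or 11 digits), or None for other shapes."""
    digits = re.sub(r"\D", "", raw_phone)
    # Treat a leading 55 as the country code only when enough digits remain
    # for a full national number; a shorter number starting with 55 is
    # assumed to begin with a two-digit area code instead.
    if len(digits) >= 12 and digits.startswith("55"):
        digits = digits[2:]
    # Drop a single trunk zero, as in "073 91234-5678".
    if len(digits) in (11, 12) and digits.startswith("0"):
        digits = digits[1:]
    return digits if len(digits) in (10, 11) else None
```

With this approach `"+55 73 91234-5678"`, `"5573912345678"` and `"073 91234-5678"` all reduce to `"73912345678"`, while `"55 73 1234-5678"` reduces to `"7312345678"`.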
