[BUG] Incompatibility with contacts that have eight digits in their cell phone number in Brazil #127

magpires · 2024-12-26T21:57:09Z

Must have

WhatsApp version: 2.24.25.77
OS: Android - 14
Platform: Windows
Exporter's branch and version: main - Major Release (2024/10/24)

Describe the bug
Here in Brazil, a ninth digit was added to cell phone numbers. It goes after the DDI (country code) and DDD (area code) and before the contact's original number, like this: +55 DDD NINTH_DIGIT CONTACT_NUMBER. However, WhatsApp still stores some numbers with only 8 digits instead of 9. As a result, the exporter fails to recognize contacts that WhatsApp stores with eight digits, because it expects those contacts to have nine digits.

To extract the contacts from the .vcf file, I used the options --enrich-from-vcard contacts.vcf --default-country-code 55.


KnugiHK · 2025-01-02T08:17:10Z

Hi, could you provide an example of a phone number with both 8 and 9 digits for better understanding? Also, just to clarify, are you saying that WhatsApp still registers some users with the 8-digit format, even though your contacts.vcf file has already been updated to the 9-digit format?

magpires · 2025-01-02T18:09:25Z

Hello! Yes, here is an example of the same phone number with 8 and with 9 digits:

557312345678 - This is how it is saved in wa.db, with eight digits. The contact information screen in WhatsApp shows it the same way.
5573912345678 - This is how it is saved in the contacts.vcf file, with nine digits.

And to answer your question: yes! WhatsApp still records some numbers with 8 digits in its database, even though the contacts.vcf file is already updated to 9 digits. I strongly believe that the 8-digit number comes directly from the WhatsApp server, which did not update some numbers during the migration from 8 to 9 digits here in Brazil.
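
The mismatch between the two forms can be illustrated with a minimal sketch. The helper name `canonical_br_mobile` is hypothetical and not part of the exporter. It assumes a 13-digit number starting with 55 whose fifth digit is 9 carries the added ninth digit, and it reduces both forms to the same 12-digit string so they compare equal.

```python
import re


def canonical_br_mobile(number: str) -> str:
    """Reduce a Brazilian number to 55 + DDD + 8-digit subscriber.

    Hypothetical helper for illustration only; not part of the exporter.
    """
    digits = re.sub(r"\D", "", number)
    # 13 digits = "55" + 2-digit DDD + 9-digit subscriber starting with 9.
    if len(digits) == 13 and digits.startswith("55") and digits[4] == "9":
        return digits[:4] + digits[5:]
    return digits


wa_db_number = "557312345678"   # as stored in wa.db
vcf_number = "5573912345678"    # as stored in contacts.vcf

# Both forms reduce to the same string, so they can be matched.
print(canonical_br_mobile(wa_db_number) == canonical_br_mobile(vcf_number))
```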

KnugiHK · 2025-01-04T07:16:12Z

This exporter is designed to be compliant with WhatsApp. If WhatsApp still uses eight-digit registered numbers, the exporter should continue matching contacts based on that format, so I don't believe any modifications are necessary for the exporter itself.

However, if there is a consistent pattern in the change of Brazilian phone numbers, I may be able to create a separate script to modify the VCF file before it gets consumed by the exporter.

KnugiHK · 2025-02-09T07:16:14Z

You may try the script below and see if it correctly processes your vCard files.

import re
import argparse


def remove_ninth_digit(phone_number):
    # Remove non-numeric characters (such as +, spaces, etc.)
    phone_number = re.sub(r'\D', '', phone_number)
    
    # If the phone number has 13 digits (with the country code), or 11 digits (without it)
    if len(phone_number) == 13:
        phone_number = phone_number[:4] + phone_number[5:]
    elif len(phone_number) == 11:
        phone_number = phone_number[:2] + phone_number[3:]
    
    return phone_number


def process_vcard(input_vcard, output_vcard):
    with open(input_vcard, 'r', encoding='utf-8') as file:
        vcard_data = file.read()

    phone_pattern = re.compile(r'TEL[:;](\+?\d{1,2} ?\d{1,2}\d{4}\d{4,5}|\d{1,2}\d{4}\d{4,5}|\d{9})')
    
    # Replace phone numbers by removing the 9th digit if necessary
    def replace_phone(match):
        original_phone = match.group(1)
        modified_phone = remove_ninth_digit(original_phone)
        return f'TEL:{modified_phone}'
    
    # Replace all phone numbers in the VCARD data
    updated_vcard = re.sub(phone_pattern, replace_phone, vcard_data)
    
    with open(output_vcard, 'w', encoding='utf-8') as file:
        file.write(updated_vcard)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process a VCARD file and remove the 9th digit from phone numbers.')
    parser.add_argument('input_vcard', type=str, help='The input VCARD file name')
    parser.add_argument('output_vcard', type=str, help='The output VCARD file name')
    args = parser.parse_args()

    process_vcard(args.input_vcard, args.output_vcard)

    print(f"VCARD has been processed and saved to {args.output_vcard}")
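
As a rough check of which TEL lines the pattern above picks up, the sketch below runs the same regular expression against three sample lines. The sample lines are hypothetical and not taken from a real contacts.vcf. It suggests that numbers written with separators, or TEL lines carrying a TYPE parameter, may be skipped.

```python
import re

# Same pattern as in the script above.
phone_pattern = re.compile(
    r'TEL[:;](\+?\d{1,2} ?\d{1,2}\d{4}\d{4,5}|\d{1,2}\d{4}\d{4,5}|\d{9})'
)

# Hypothetical sample lines for illustration.
samples = [
    "TEL:+5573912345678",
    "TEL:+55 73 91234-5678",
    "TEL;TYPE=CELL:+55 73 91234-5678",
]

# Record whether the pattern finds a phone number in each line.
matched = {line: phone_pattern.search(line) is not None for line in samples}

for line, ok in matched.items():
    print(f"{line!r}: {'matched' if ok else 'not matched'}")
```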

magpires · 2025-02-09T23:50:01Z

Hello, greetings!
The code you provided did not work as expected. I took the liberty of editing it and would like to share the result with you. Python is not my everyday language, so if the code looks messy, feel free to modify it.

The code will create a .vcf file with the contacts in the following structure:

VERSION:3.0
FN:Contact Name
N:Name;Contact;;;
TEL;TYPE=CELL:+55 DD 91234-5678
TEL;TYPE=CELL:+55 DD 1234-5678
CATEGORIES:myContacts
END:VCARD

Here is the code after my modifications:

import re
import argparse

def process_phone_number(raw_phone):
    """
    Process the raw phone string from the VCARD and return two formatted numbers:
      - The original formatted number, and
      - A modified formatted number with the extra (ninth) digit removed, if applicable.
      
    Desired output:
      For a number with a 9-digit subscriber:
         Original: "+55 {area} {first 5 of subscriber}-{last 4 of subscriber}"
         Modified: "+55 {area} {subscriber[1:5]}-{subscriber[5:]}" 
      For example, for an input that should represent "027912345678", the outputs are:
         "+55 27 91234-5678"  and  "+55 27 1234-5678"
    
    This function handles numbers that may already include a "+55" prefix.
    It expects that after cleaning, a valid number (without the country code) should have either 10 digits 
    (2 for area + 8 for subscriber) or 11 digits (2 for area + 9 for subscriber).
    If extra digits are present, it takes the last 11 (or 10) digits.
    """
    # If the number starts with '+55', remove it for processing.
    number_to_process = raw_phone.strip()
    if number_to_process.startswith("+55"):
        number_to_process = number_to_process[3:].strip()
    
    # Remove all non-digit characters.
    digits = re.sub(r'\D', '', number_to_process)
    
    # Remove trunk zero if present.
    if digits.startswith("0"):
        digits = digits[1:]
    
    # After cleaning, we expect a valid number to have either 10 or 11 digits.
    # If there are extra digits, use the last 10 (for an 8-digit subscriber) or last 11 (for a 9-digit subscriber).
    # The 12-digit case must be tested before the general "> 11" case.
    if len(digits) == 12:
        # Likely "55" + area + 8-digit subscriber written without "+": take the last 10 digits.
        digits = digits[-10:]
    elif len(digits) > 11:
        # Here, we assume the valid number is the last 11 digits.
        digits = digits[-11:]
    
    if len(digits) not in (10, 11):
        return None, None

    area = digits[:2]
    subscriber = digits[2:]

    if len(subscriber) == 9:
        # Format the original number (5-4 split, e.g., "91234-5678")
        orig_subscriber = f"{subscriber[:5]}-{subscriber[5:]}"
        # Create a modified version: drop the first digit of the subscriber to form an 8-digit subscriber (4-4 split)
        mod_subscriber = f"{subscriber[1:5]}-{subscriber[5:]}"
        original_formatted = f"+55 {area} {orig_subscriber}"
        modified_formatted = f"+55 {area} {mod_subscriber}"
    elif len(subscriber) == 8:
        original_formatted = f"+55 {area} {subscriber[:4]}-{subscriber[4:]}"
        modified_formatted = None

    return original_formatted, modified_formatted

def process_vcard(input_vcard, output_vcard):
    with open(input_vcard, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    
    output_lines = []
    
    # Regex to capture any telephone line.
    # It matches lines starting with "TEL:" or "TEL;TYPE=..." or with prefixes like "item1.TEL:".
    phone_pattern = re.compile(r'^(?P<prefix>(?:TEL(?:;TYPE=[^:]+)?|(?:.*\.)?TEL)):(?P<number>.*)$')
    
    for line in lines:
        stripped_line = line.rstrip("\n")
        match = phone_pattern.match(stripped_line)
        if match:
            raw_phone = match.group("number").strip()
            orig_formatted, mod_formatted = process_phone_number(raw_phone)
            if orig_formatted:
                # Always output using the standardized prefix.
                output_lines.append(f"TEL;TYPE=CELL:{orig_formatted}\n")
            else:
                output_lines.append(line)
            if mod_formatted:
                output_lines.append(f"TEL;TYPE=CELL:{mod_formatted}\n")
        else:
            output_lines.append(line)
    
    with open(output_vcard, 'w', encoding='utf-8') as file:
        file.writelines(output_lines)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Process a VCARD file to standardize telephone entries and add a second TEL line with the modified number (removing the extra ninth digit) for contacts with 9-digit subscribers."
    )
    parser.add_argument('input_vcard', type=str, help='Input VCARD file')
    parser.add_argument('output_vcard', type=str, help='Output VCARD file')
    args = parser.parse_args()
    
    process_vcard(args.input_vcard, args.output_vcard)
    print(f"VCARD processed and saved to {args.output_vcard}")

I chose to build the vCard this way because I noticed that WhatsApp Chat Exporter recognizes the numbers in the .vcf file when they are written like this.

Notes:

The code handles several of the ways Brazilian numbers are saved in the .vcf file.
It may not work well for numbers from other countries, so when adding it to the project, run it only if default-country-code is 55.
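
The second note could be sketched as a simple guard. The function `is_brazil` and the stand-alone parser below are hypothetical. Only the `--default-country-code` option name comes from the command quoted earlier in this thread, and how the exporter actually wires this in is not shown here.

```python
import argparse


def is_brazil(default_country_code: str) -> bool:
    # Accept "55" or "+55"; any other value skips the Brazilian preprocessing.
    return default_country_code.strip().lstrip("+") == "55"


# Stand-alone parser for illustration; the exporter's real parser is not reproduced.
parser = argparse.ArgumentParser()
parser.add_argument("--default-country-code", default="")
args = parser.parse_args(["--default-country-code", "55"])

if is_brazil(args.default_country_code):
    print("Country code is 55: run the Brazilian vCard preprocessing.")
else:
    print("Other country code: leave the vCard untouched.")
```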

KnugiHK · 2025-02-10T16:49:32Z

I'm glad you took the initiative to make it work for you, and I also appreciate you documenting the code! I've reviewed it and will be adding it to the repository. However, I believe a few changes are necessary before it can be released (for example, what if the number contains a country code but without the + sign).
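
One way the case mentioned above (a country code written without the + sign) might be handled is sketched below. The helper `split_br_number` is hypothetical and is not the code that was committed. It relies on the assumption that digit count alone disambiguates: 12 or 13 digits starting with 55 include the country code, while an 11-digit number starting with 55 is DDD 55 plus a 9-digit subscriber.

```python
import re


def split_br_number(raw: str):
    """Return (area, subscriber), or None if the length is unexpected.

    Hypothetical helper for illustration only.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) in (12, 13) and digits.startswith("55"):
        # Country code present, with or without a leading "+".
        digits = digits[2:]
    elif len(digits) in (11, 12) and digits.startswith("0"):
        # National format with a trunk zero, e.g. 0 73 ...
        digits = digits[1:]
    if len(digits) not in (10, 11):
        return None
    return digits[:2], digits[2:]


for sample in ("557312345678", "+55 73 91234-5678", "55912345678"):
    print(sample, "->", split_br_number(sample))
```

Checking the length before stripping 55 keeps numbers from area code 55 intact when they are written without a country code.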

KnugiHK self-assigned this Feb 9, 2025

KnugiHK added the enhancement label Feb 9, 2025

KnugiHK added a commit that referenced this issue Feb 10, 2025

Create a script to process Brazilian numbers in vcards #127

0cbae4d
