How updated is PHEN column in the result sample.RARE_PASS_GENE.xlsx file? #79

nswh · 2024-11-20T02:22:44Z

The PHEN column in the result file sample.RARE_PASS_GENE.xlsx combines OMIM disease-gene information with the Orphanet nomenclature of rare diseases. Here is an example:

GENES	PHEN
MPZ,SDHC	MPZ: MIM -  ROUSSY-LEVY HEREDITARY AREFLEXIC DYSTASIA;  CHARCOT-MARIE-TOOTH DISEASE, AXONAL, TYPE 2I; CMT2I;  CHARCOT-MARIE-TOOTH DISEASE, DEMYELINATING, TYPE 1B; CMT1B;  NEUROPATHY, CONGENITAL HYPOMYELINATING OR AMYELINATING, AUTOSOMAL;  CHARCOT-MARIE-TOOTH DISEASE, AXONAL, TYPE 2J; CMT2J;  HYPERTROPHIC NEUROPATHY OF DEJERINE-SOTTAS;  CHARCOT-MARIE-TOOTH DISEASE, DOMINANT INTERMEDIATE D; CMTDID;  ADIE PUPIL|SDHC: MIM -  PARAGANGLIOMAS 3; PGL3;  PARAGANGLIOMA AND GASTRIC STROMAL SARCOMA
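To make the layout of a PHEN cell concrete, here is a minimal Python sketch that splits one cell into per-gene term lists. It assumes only the layout visible in the example row above (genes separated by `|`, a `GENE:` prefix, a source label such as `MIM -`, then `;`-separated terms); the function name `parse_phen` is hypothetical, and entries with other source labels or layouts may need different handling.

```python
def parse_phen(phen: str) -> dict[str, list[str]]:
    """Split a PHEN cell into {gene: [terms]}.

    Assumed layout (inferred from the example row only):
    "GENE: SOURCE -  TERM; TERM|GENE: SOURCE -  TERM"
    """
    result: dict[str, list[str]] = {}
    for entry in phen.split("|"):
        gene, sep, rest = entry.partition(":")
        if not sep:
            continue  # no "GENE:" prefix, skip this fragment
        # Drop the source label ("MIM -"); keep everything if no label is found.
        _, dash, terms = rest.partition(" - ")
        if not dash:
            terms = rest
        # Note: abbreviations such as "PGL3" are separated by ";" too,
        # so they come out as their own list items.
        result[gene.strip()] = [t.strip() for t in terms.split(";") if t.strip()]
    return result


if __name__ == "__main__":
    cell = "MPZ: MIM -  ADIE PUPIL|SDHC: MIM -  PARAGANGLIOMAS 3; PGL3"
    for gene, terms in parse_phen(cell).items():
        print(gene, terms)
```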

As far as I can see, the combined information in the PHEN column is derived from the ClinSV reference data, specifically the file ensemble_GRCh37_2_phen.txt shown below. Its date is Nov 7 2019.

├── refdata-b38
│   ├── annotation
│   │   ├── 1kG_estd219.bed.gz
│   │   ├── 1kG_estd219.bed.gz.tbi
│   │   ├── DGV_GRCh38_hg38_variants_2020-02-25.bed.gz
│   │   ├── DGV_GRCh38_hg38_variants_2020-02-25.bed.gz.tbi
│   │   ├── ensemble_GRCh37_2_phen.txt
│   │   ├── Homo_sapiens.GRCh38.99.gff.gz
│   │   ├── Homo_sapiens.GRCh38.99.gff.gz.tbi
│   │   ├── Hs-gene-labels.txt
│   │   ├── Hs-gene-to-phenotype.txt
│   │   ├── MGRB-SV.bed.gz
│   │   └── MGRB-SV.bed.gz.tbi

Could you clarify whether Nov 7 2019 is when the OMIM and Orphanet information was last updated? Why is the file named GRCh37 rather than GRCh38? Is it because the transcripts and genes do not change between genome builds? Also, do you have an estimate for the next update of ensemble_GRCh37_2_phen.txt? If no update is planned, could you describe which resource files you downloaded and what procedure you followed to create ensemble_GRCh37_2_phen.txt?

J-Bradlee · 2024-12-09T01:02:53Z

Hi @nswh

It looks like "ensemble_GRCh37_2_phen.txt" was generated from BioMart on the Ensembl website.

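It selects these attributes (the screenshot is not preserved here; the same attributes appear as addAttribute calls in the Perl script later in this thread):

Then click Results and download the TSV version.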

Two of the column names have changed since ensemble_GRCh37_2_phen.txt was originally created (probably around Nov 7, 2019, as you mentioned):

Ensembl Gene ID -> Gene stable ID
Associated Gene Name -> Gene Name

I have attached the TSV I made on 2/12/24 following the procedure above. I have also mapped the changed column names back to the old column names. Google Drive link to file.
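For anyone repeating the download, the header re-mapping could be done along these lines. This is only a sketch: the two name pairs are copied from the list in this comment, the function name `restore_old_header` and the file names in the usage note are hypothetical, and only the first line of the file is touched so the data rows are copied through unchanged.

```python
import io

# New BioMart header -> header used in the original file (pairs as listed in this thread).
OLD_NAMES = {
    "Gene stable ID": "Ensembl Gene ID",
    "Gene Name": "Associated Gene Name",
}


def restore_old_header(src, dst, mapping=OLD_NAMES):
    """Copy a TSV from src to dst, renaming header columns found in mapping."""
    header = src.readline().rstrip("\r\n").split("\t")
    dst.write("\t".join(mapping.get(name, name) for name in header) + "\n")
    for line in src:  # data rows are copied verbatim
        dst.write(line)


if __name__ == "__main__":
    demo = io.StringIO("Gene stable ID\tGene Name\nENSG00000158887\tMPZ\n")
    out = io.StringIO()
    restore_old_header(demo, out)
    print(out.getvalue(), end="")
```

With real files this would be called as `restore_old_header(open("mart_export.txt"), open("ensemble_GRCh37_2_phen.txt", "w"))`, where both paths are placeholders. It is worth checking the exact capitalisation of the downloaded header first, since a name that does not match the mapping is left as-is.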

This one has significantly more annotations: 177k vs 32k.
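A quick way to reproduce this kind of comparison is to count the data rows in each file. The sketch below assumes a tab-separated file with one header line and the gene ID in the first column (an assumption about the column order); `count_rows` is a hypothetical helper, not part of ClinSV.

```python
def count_rows(lines, gene_column=0):
    """Return (data_rows, distinct_gene_ids) for TSV lines with one header line."""
    it = iter(lines)
    next(it, None)  # skip the header line
    rows = 0
    genes = set()
    for line in it:
        if not line.strip():
            continue  # ignore blank lines
        rows += 1
        genes.add(line.rstrip("\r\n").split("\t")[gene_column])
    return rows, len(genes)


if __name__ == "__main__":
    sample = ["Ensembl Gene ID\tPhenotype description\n", "G1\tA\n", "G1\tB\n", "G2\tC\n"]
    print(count_rows(sample))
```

Passing an open file handle for the old and the new annotation file in turn gives the two totals to compare.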

ClinSV needs to update its annotation resource files. I can write a script that pulls this annotation on the fly from BioMart.

J-Bradlee · 2024-12-11T04:52:24Z

BioMart also provides the Perl script that generates this TSV using their API (just click the Perl tab).

Note: it doesn't do any of the mapping mentioned above, which is required to keep the annotation file headers consistent:

Ensembl Gene ID -> Gene stable ID
Associated Gene Name -> Gene Name

```perl
# An example script demonstrating the use of BioMart API.
# This perl API representation is only available for configuration versions >= 0.5
use strict;
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

my $confFile = "PATH TO YOUR REGISTRY FILE UNDER biomart-perl/conf/. For Biomart Central Registry navigate to http://www.biomart.org/biomart/martservice?type=registry";
#
# NB: change action to 'clean' if you wish to start a fresh configuration
# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
#

my $action = 'cached';
my $initializer = BioMart::Initializer->new('registryFile' => $confFile, 'action' => $action);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'default');

$query->setDataset("hsapiens_gene_ensembl");
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("phenotype_description");
$query->addAttribute("source_name");
$query->addAttribute("external_gene_name");
$query->addAttribute("study_external_id");
$query->addAttribute("mim_gene_accession");
$query->addAttribute("mim_morbid_description");
$query->addAttribute("mim_gene_description");

$query->formatter("TSV");

my $query_runner = BioMart::QueryRunner->new();
############################## GET COUNT ############################
# $query->count(1);
# $query_runner->execute($query);
# print $query_runner->getCount();
#####################################################################


############################## GET RESULTS ##########################
# to obtain unique rows only
# $query_runner->uniqueRowsOnly(1);

$query_runner->execute($query);
$query_runner->printHeader();
$query_runner->printResults();
$query_runner->printFooter();
#####################################################################
```
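If installing the BioMart Perl API is inconvenient, the same query can probably be sent to BioMart's `martservice` endpoint as an XML document (the BioMart web page shows this under its XML tab). The following Python sketch builds that XML from the dataset and attribute names used in the Perl script above. The endpoint URL in `MART_URL`, the `datasetConfigVersion` value, and the function names are assumptions for illustration and have not been tested against ClinSV; the download step is defined but not called here.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed endpoint for the current Ensembl release; adjust to the mirror or archive you need.
MART_URL = "https://www.ensembl.org/biomart/martservice"

# Attribute names copied from the Perl script in this thread.
ATTRIBUTES = [
    "ensembl_gene_id",
    "phenotype_description",
    "source_name",
    "external_gene_name",
    "study_external_id",
    "mim_gene_accession",
    "mim_morbid_description",
    "mim_gene_description",
]


def build_query_xml(dataset="hsapiens_gene_ensembl", attributes=ATTRIBUTES):
    """Build a BioMart XML query requesting a TSV with a header row."""
    attrs = "".join(f"<Attribute name={quoteattr(a)}/>" for a in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="0" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">{attrs}</Dataset>'
        "</Query>"
    )


def fetch_tsv(xml, base_url=MART_URL, timeout=300):
    """Send the query and return the response body as text (needs network access)."""
    url = base_url + "?" + urllib.parse.urlencode({"query": xml})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(build_query_xml())
```

As with the Perl script, the returned header uses the current BioMart column names, so the re-mapping to the old names still has to be applied afterwards.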

drmjc · 2025-01-20T07:23:00Z

Did this new file work for you, @nswh?
