How up to date is the PHEN column in the result file sample.RARE_PASS_GENE.xlsx? #79
Hi @nswh, it looks like ensemble_GRCh37_2_phen.txt was generated with BioMart on the Ensembl website: set up the query, then click Results and download the TSV version. Two of the column names have changed since ensemble_GRCh37_2_phen.txt was originally created (probably on Nov 7, 2019, as you mentioned):
Attached is the TSV I made on 2/12/24 following the procedure above; I have also remapped the changed column names back to the old ones. Google Drive link to file. This one has significantly more annotations than the original (177k vs 32k). ClinSV needs to update its annotation resource files. I can write a script that pulls this annotation from BioMart on the fly.
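A minimal Python sketch of the header-remapping step, assuming the export is a tab-separated file whose first line holds the column names. `LEGACY_COLUMN_NAMES` and `remap_header` are hypothetical names; the two renamed columns are not recorded in this thread, so the mapping is left empty to be filled in. Only the header line is rewritten, so the data rows pass through unchanged.

```python
# HYPOTHETICAL mapping: the renamed BioMart columns are not listed in this thread.
# Fill in {"<current BioMart header>": "<header used in ensemble_GRCh37_2_phen.txt>"}.
LEGACY_COLUMN_NAMES: dict[str, str] = {}


def remap_header(tsv_text: str, mapping: dict[str, str]) -> str:
    """Rename columns in the first (header) line of a TSV; leave data rows as-is."""
    header, newline, body = tsv_text.partition("\n")
    stripped = header.rstrip("\r")
    trailing = header[len(stripped):]  # preserve a CRLF header ending, if any
    renamed = [mapping.get(col, col) for col in stripped.split("\t")]
    return "\t".join(renamed) + trailing + newline + body
```

To apply it to a downloaded export, read the file as text, pass it through `remap_header(text, LEGACY_COLUMN_NAMES)`, and write the result to the path ClinSV expects.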
BioMart also provides a Perl script that generates this TSV through its API (just click the Perl tab). Note: the script does not do any of the column mapping mentioned above, which is required to keep the annotation file headers consistent:
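To pull the annotation on the fly without Perl, BioMart's `martservice` REST endpoint accepts the same XML query that the web interface can export. The rough Python sketch below assumes the GRCh37 archive host `grch37.ensembl.org` and the `hsapiens_gene_ensembl` dataset; the attribute list and all function names are hypothetical and should be replaced with the attributes from the query's own XML export. The downloaded header would still need the column mapping mentioned above.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: BioMart REST service on the Ensembl GRCh37 archive.
MARTSERVICE_URL = "https://grch37.ensembl.org/biomart/martservice"

# HYPOTHETICAL attribute list: copy the real one from the web UI's XML export
# so the columns match ensemble_GRCh37_2_phen.txt.
ATTRIBUTES = ["ensembl_gene_id", "external_gene_name", "phenotype_description"]


def build_query_xml(attributes, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query requesting TSV output with a header row."""
    attrs = "".join(f'<Attribute name="{name}"/>' for name in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">'
        f'<Dataset name="{dataset}" interface="default">{attrs}</Dataset>'
        "</Query>"
    )


def build_request_url(query_xml, base_url=MARTSERVICE_URL):
    """URL-encode the XML query as the `query` parameter of a GET request."""
    return base_url + "?" + urllib.parse.urlencode({"query": query_xml})


def fetch_tsv(attributes=ATTRIBUTES, timeout=600):
    """Download the TSV text; its header still needs remapping to the old names."""
    url = build_request_url(build_query_xml(attributes))
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the query and URL separately from the download keeps the request inspectable (it can be pasted into a browser) and lets the attribute list be versioned alongside ClinSV's reference data.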
Did this new file work for you, @nswh?
The PHEN column in the result file sample.RARE_PASS_GENE.xlsx combines OMIM disease-gene information with the Orphanet nomenclature of rare diseases. Here is an example:
I can see that this combined information in the PHEN column is derived from the ClinSV reference data, specifically a file named ensemble_GRCh37_2_phen.txt (shown below), dated Nov 7 2019.
Could you clarify whether Nov 7 2019 is when the OMIM and Orphanet information was last updated? Why is the file named GRCh37 rather than GRCh38? Is it because the transcripts and genes do not change between genome builds? Also, do you have an estimate of when ensemble_GRCh37_2_phen.txt will next be updated? If there is no plan to update it, could you describe which resource files you downloaded when you created ensemble_GRCh37_2_phen.txt, and what procedure you followed to build it?