Script: Add convert .gt file format to .csv.gz + docs #62
…script for graph conversion
Script ran for me once I got deps installed! I wasn't able to unzip the gz with built-in Extract in Ubuntu, but I guess you've tested that these files work with cugraph!
Couple of little docs suggestions.
README.md
Outdated
- The host machine has a GPU and the NVIDIA drivers and cuda-toolkit are installed with correct versions. [Check the NVIDIA documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux)
- The necessary libraries for GPU support are installed in your environment.[Check RAPIDS documentation](https://rapids.ai/)
- Ensure any new databases have *graph.csv.gz* file. If not run script in scripts folder: `python gt-to-csv-gz.py` with
- Ensure any new databases have *graph.csv.gz* file. If not run script in scripts folder: `python gt-to-csv-gz.py` with
- Ensure any new PopPUNK databases have *graph.csv.gz* file. If not run script in scripts folder: `python gt-to-csv-gz.py` with
README.md
Outdated
- `--input` the path to the graph .gt file
- `--output` the path to the output csv.gz file
- In `args.json` set the `gpu_graph` and `gpu_dist` to `True` for both `assign` and `visualise` fields.
*Note: may need to reinstall and **pp-sketchlib, PopPUNK and mandrake** to ensure CUDA enabled versions are installed*
*Note: may need to reinstall and **pp-sketchlib, PopPUNK and mandrake** to ensure CUDA enabled versions are installed*
*Note: may need to reinstall **pp-sketchlib, PopPUNK and mandrake** to ensure CUDA enabled versions are installed*
Would it be easy to say here what the minimum GPU-enabled versions of these libraries are?
README.md
Outdated
- The necessary libraries for GPU support are installed in your environment.[Check RAPIDS documentation](https://rapids.ai/)
- Ensure any new databases have *graph.csv.gz* file. If not run script in scripts folder: `python gt-to-csv-gz.py` with
- `--input` the path to the graph .gt file
- `--output` the path to the output csv.gz file
How does PopPUNK know where to find this? Should it always have the same name as the .gt file, and be output to the same location? If so, maybe the script can assume that and it shouldn't be a parameter, or it should be the default, or the docs should at least say that this is what the `--output` value should be for normal operation.
oh yup ive just added a note to say what it should be
actually yeah im stupid have removed --output completely now
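For illustration, here is a minimal sketch of how a script could derive the output path from the input once `--output` is gone. It assumes the .csv.gz should sit next to the .gt file and share its name, as suggested in the comment above. The function name `default_output_path` is hypothetical and not taken from the actual script.

```python
from pathlib import Path


def default_output_path(gt_path):
    """Return the .csv.gz path that sits next to the given .gt file.

    Hypothetical helper: assumes the output shares the input's directory
    and file name, with only the extension changed.
    """
    path = Path(gt_path)
    if path.suffix != ".gt":
        raise ValueError(f"expected a .gt file, got: {path.name}")
    # with_suffix replaces the final ".gt" with ".csv.gz"
    return path.with_suffix(".csv.gz")
```

With this approach the user only passes `--input`, and the output always lands where the name-based lookup discussed above would expect it.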
This PR adds a script that converts a .gt (graph-tool) file to a .csv.gz file. This is needed because the GPU graph library cugraph cannot read the .gt format, so the graph has to be converted.
The conversion has already been done for all databases, and the results are already in mrcdata/beebop.
There are also some docs updates on running beebop with a GPU.
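As a rough sketch of what such a conversion can look like (not the script in this PR), the code below writes an edge list to a gzip-compressed CSV using only the standard library. The `source` and `destination` column names are an assumption and would need to match whatever PopPUNK and cugraph expect. `edges_from_gt` and `write_edges_csv_gz` are hypothetical names; the graph-tool import is done lazily so the writer can be run without graph-tool installed.

```python
import csv
import gzip


def write_edges_csv_gz(edges, out_path):
    """Write (source, destination) pairs to a gzip-compressed CSV.

    Returns the number of edges written. The header names are assumed,
    not confirmed against PopPUNK or cugraph.
    """
    count = 0
    with gzip.open(out_path, "wt", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["source", "destination"])  # assumed column names
        for src, dst in edges:
            writer.writerow([src, dst])
            count += 1
    return count


def edges_from_gt(gt_path):
    """Yield (source, destination) vertex indices from a graph-tool .gt file."""
    # Imported here so the rest of the sketch works without graph-tool.
    from graph_tool import load_graph

    graph = load_graph(gt_path)
    for edge in graph.edges():
        yield int(edge.source()), int(edge.target())
```

A call such as `write_edges_csv_gz(edges_from_gt(in_path), out_path)` would then stream the edges straight from the .gt file into the compressed CSV. Edge weights or other properties, if the real graphs carry any, are not handled here.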
Testing:
Run the Python script and check that the .csv.gz file shows up, e.g.:
python gt-to-csv-gz.py -i /home/$USER/code/beebop_py/storage/dbs/gas_database/gas_database_graph.gt -o /home/$USER/code/beebop_py/storage/dbs/gas_database/gas_database_graph.csv.gz
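Beyond checking that the file exists, a quick way to confirm it is a readable gzip CSV is to open it with Python's `gzip` module and look at the first few rows. This is a minimal sketch; `preview_csv_gz` is a hypothetical helper and makes no assumption about the column layout.

```python
import csv
import gzip
from itertools import islice


def preview_csv_gz(path, n=5):
    """Return the first n rows of a gzip-compressed CSV as lists of strings.

    Raises OSError if the file is not valid gzip data.
    """
    with gzip.open(path, "rt", newline="") as fh:
        return list(islice(csv.reader(fh), n))
```

If the call returns rows without raising, the file decompresses correctly from Python, independently of any desktop archive tool.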