
RECOVAR


Overview

RECOVAR is an unsupervised machine learning framework for detecting seismic signals in continuous waveform data. By learning representations with deep auto-encoders, the method aims to distinguish seismic signals from noise without supervision, offering performance competitive with many state-of-the-art supervised methods in cross-dataset scenarios.
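
A minimal sketch of the kind of time-axis-preserving convolutional auto-encoder the framework builds on is shown below. The input shape (3000-sample, three-component windows), layer counts, and filter sizes are illustrative assumptions, not the architecture used in this repository.

    # Illustrative only: a small 1-D convolutional auto-encoder that keeps the time
    # axis intact, so latent features stay aligned with the input waveform.
    import tensorflow as tf
    from tensorflow.keras import layers

    def build_autoencoder(input_length=3000, n_channels=3):
        inputs = tf.keras.Input(shape=(input_length, n_channels))

        # Encoder: strided convolutions downsample time while widening the feature axis.
        x = layers.Conv1D(16, 7, strides=2, padding="same", activation="relu")(inputs)
        x = layers.Conv1D(32, 7, strides=2, padding="same", activation="relu")(x)
        latent = layers.Conv1D(64, 7, strides=2, padding="same", activation="relu", name="latent")(x)

        # Decoder: transposed convolutions restore the original time resolution.
        x = layers.Conv1DTranspose(32, 7, strides=2, padding="same", activation="relu")(latent)
        x = layers.Conv1DTranspose(16, 7, strides=2, padding="same", activation="relu")(x)
        outputs = layers.Conv1DTranspose(n_channels, 7, strides=2, padding="same")(x)

        model = tf.keras.Model(inputs, outputs)
        model.compile(optimizer="adam", loss="mse")  # reconstruction objective, no labels
        return model

    model = build_autoencoder()
    model.summary()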

Features

  • Unsupervised Learning: Utilizes deep auto-encoders to learn compressed representations of seismic waveforms, requiring only raw waveforms and no labels.
  • Robust Performance: Demonstrates superior detection capabilities compared to existing supervised methods, with strong cross-dataset generalization.
  • Scalability: Designed to handle large-scale time-series data, making it applicable to various signal detection tasks beyond seismology.
  • Intuitive Design: Employs a time-axis-preserving approach and a straightforward triggering mechanism to differentiate noise from signals.
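
The triggering mechanism is not spelled out in this README, so the snippet below is only one plausible illustration: summarize the encoder's time-aligned latent features by their covariance and threshold a scalar statistic of it. The function names, the norm-based score, and the fixed threshold are assumptions, not the scoring rule from the paper.

    # Illustrative guess, not repository code: a covariance-based characteristic value
    # computed from encoder features, followed by a fixed-threshold trigger.
    import numpy as np

    def covariance_score(latent):
        # latent: encoder output of shape (time_steps, features).
        centered = latent - latent.mean(axis=0, keepdims=True)
        cov = centered.T @ centered / max(len(latent) - 1, 1)
        return float(np.linalg.norm(cov))  # one possible scalar summary

    def trigger(scores, threshold):
        # Flag windows whose score exceeds the threshold as candidate detections.
        return np.asarray(scores) > threshold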

Table of Contents

  • Overview
  • Features
  • Installation
  • Reproducing the Results
  • Experimenting with Custom Data
  • License
  • Contact

Installation

Prerequisites

  • Python Version: Ensure you are using Python 3.10.
  • NVIDIA GPU Drivers (If using GPU): Required for GPU support.
  • CUDA and cuDNN Libraries (If using GPU): Compatible versions for TensorFlow 2.14.0.

Using conda (Without GPU)

  • Create and activate a conda environment
    conda create -n <environment_name> python=3.10
    conda activate <environment_name>
  • Install the SeismicPurifier package
    git clone git@github.com:onurefe/SeismicPurifier.git
    cd SeismicPurifier
    python setup.py install

Using conda (With GPU)

  • Create and activate a conda environment

    conda create -n <environment_name> python=3.10
    conda activate <environment_name>
  • Install TensorFlow with the CUDA/cuDNN libraries

    pip install tensorflow[and-cuda]==2.14
  • Install the SeismicPurifier package

    git clone git@github.com:onurefe/SeismicPurifier.git
    cd SeismicPurifier
    python setup.py install

Direct installation (Without GPU)

  • Install the SeismicPurifier package
    git clone git@github.com:onurefe/SeismicPurifier.git
    cd SeismicPurifier
    python setup.py install

Direct installation (With GPU)

  • Install TensorFlow with the CUDA/cuDNN libraries

    pip install tensorflow[and-cuda]==2.14
  • Install the SeismicPurifier package

    git clone git@github.com:onurefe/SeismicPurifier.git
    cd SeismicPurifier
    python setup.py install
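
For either GPU installation, a quick sanity check with standard TensorFlow calls confirms that the expected version is installed and that the GPU is visible before running any notebooks:

    # Verify the TensorFlow installation and GPU visibility.
    import tensorflow as tf

    print(tf.__version__)                          # expect 2.14.x
    print(tf.config.list_physical_devices("GPU"))  # non-empty list if CUDA/cuDNN are set up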

Reproducing the Results

Downloading dataset

To reproduce the results reported in the paper, you first need to download the STEAD and INSTANCE datasets. Visit STEAD and INSTANCE for download instructions.

Adjusting paths and settings

After downloading the datasets, configure the settings.json file in the SeismicPurifier/reproducibility folder. This file defines the path variables and several experimentation options that you may want to adjust for different purposes; a sketch of the file's overall layout is given after the parameter descriptions below.

Configurable parameters

To reproduce the published results, keep these values at their defaults.

  • CONFIG
    • SUBSAMPLING_FACTOR: Default value 1.0. Change this factor if you want to use less data for training, testing and validation.

    • TRAIN_VALIDATION_SPLIT: Default value 0.75. Controls the training/validation split; with the default, 75% of the data is used for training and the remainder for validation.

    • KFOLD_SPLITS: Default value 5. The dataset is split into equal parts to obtain statistics on model performance; this parameter controls the number of splits.

    • DATASET_CHUNKS: Default value 20. The dataset is split into smaller chunks before k-fold validation, which improves training performance.

    • PHASE_PICK_ENSURED_CROP_RATIO: Default value 0.666666. During training, if the dataset window is longer than 30 s, the waveform is cropped at a random position, so the crop may or may not include the P arrival. This ratio constrains the choice of crop positions so that the phase arrival is guaranteed to be included in 66% (for the default value) of the waveform samples; see the short sketch after this list.

    • PHASE_ENSURING_MARGIN: The margin used when deciding whether the P arrival is included in a cropped window.
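
The snippet below illustrates one way to read these two crop parameters. The function name and logic are hypothetical and only demonstrate the constraint described above, not the repository's implementation.

    # Hypothetical illustration of PHASE_PICK_ENSURED_CROP_RATIO and PHASE_ENSURING_MARGIN.
    import numpy as np

    def choose_crop_start(n_samples, crop_len, p_index, ratio=0.666666, margin=100, rng=np.random):
        if rng.random() < ratio:
            # Restrict the start so [p_index - margin, p_index + margin] stays inside the crop.
            lo = max(0, p_index + margin - crop_len)
            hi = min(n_samples - crop_len, p_index - margin)
            if hi >= lo:
                return rng.randint(lo, hi + 1)
        # Otherwise (or if the constraint cannot be met), crop uniformly at random.
        return rng.randint(0, n_samples - crop_len + 1)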

Directories

Adjust the dataset paths after downloading the data to your system.

  • DATASET_DIRECTORIES

    • STEAD_WAVEFORMS_HDF5_PATH: Path to the STEAD dataset's waveforms stored in HDF5 format. Ensure this path points to the correct location of your STEAD waveforms file.

    • STEAD_METADATA_CSV_PATH: Path to the STEAD dataset's metadata stored in CSV format. This file contains essential metadata associated with the STEAD waveforms.

    • INSTANCE_NOISE_WAVEFORMS_HDF5_PATH: Path to the INSTANCE dataset's noise waveforms stored in HDF5 format. Ensure this path points to the correct location of your INSTANCE noise waveforms file.

    • INSTANCE_EQ_WAVEFORMS_HDF5_PATH: Path to the INSTANCE dataset's earthquake waveforms stored in HDF5 format. Ensure this path points to the correct location of your INSTANCE earthquake waveforms file.

    • INSTANCE_NOISE_METADATA_CSV_PATH: Path to the INSTANCE dataset's noise metadata stored in CSV format. This file contains essential metadata associated with the INSTANCE noise waveforms.

    • INSTANCE_EQ_METADATA_CSV_PATH: Path to the INSTANCE dataset's earthquake metadata stored in CSV format. This file contains essential metadata associated with the INSTANCE earthquake waveforms.

  • PREPROCESSED_DATASET_DIRECTORY: Directory where preprocessed datasets will be stored. Ensure this directory exists or the application has permissions to create it.

  • TRAINED_MODELS_DIR: Directory to store trained machine learning models. Ensure this directory exists or the application has permissions to create it.

  • RESULTS_DIR: Directory to store the classification results.
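
Putting the parameters above together, the overall shape of settings.json looks roughly like the sketch below. The nesting and the example values are assumptions assembled from the descriptions in this section; compare against the settings.json shipped in SeismicPurifier/reproducibility before relying on it.

    # Writes a template settings.json; the key names come from the list above,
    # while the path values and the PHASE_ENSURING_MARGIN value are placeholders.
    import json

    settings = {
        "CONFIG": {
            "SUBSAMPLING_FACTOR": 1.0,
            "TRAIN_VALIDATION_SPLIT": 0.75,
            "KFOLD_SPLITS": 5,
            "DATASET_CHUNKS": 20,
            "PHASE_PICK_ENSURED_CROP_RATIO": 0.666666,
            "PHASE_ENSURING_MARGIN": 0.5,  # placeholder; use the repository default
        },
        "DATASET_DIRECTORIES": {
            "STEAD_WAVEFORMS_HDF5_PATH": "/data/stead/waveforms.hdf5",
            "STEAD_METADATA_CSV_PATH": "/data/stead/metadata.csv",
            "INSTANCE_NOISE_WAVEFORMS_HDF5_PATH": "/data/instance/noise_waveforms.hdf5",
            "INSTANCE_EQ_WAVEFORMS_HDF5_PATH": "/data/instance/eq_waveforms.hdf5",
            "INSTANCE_NOISE_METADATA_CSV_PATH": "/data/instance/noise_metadata.csv",
            "INSTANCE_EQ_METADATA_CSV_PATH": "/data/instance/eq_metadata.csv",
        },
        "PREPROCESSED_DATASET_DIRECTORY": "/data/preprocessed",
        "TRAINED_MODELS_DIR": "/data/models",
        "RESULTS_DIR": "/data/results",
    }

    with open("settings.json", "w") as f:
        json.dump(settings, f, indent=2)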

Training the models

After completing the download and settings adjustments, go to the reproducibility folder inside the SeismicPurifier directory.

Using training.ipynb you can train all available models on both datasets. In the initial phase of training, the data preprocessing step may take a while (approximately a couple of hours), but it speeds up the rest of the training significantly. On an NVIDIA RTX 3090 Ti, the full training procedure (5-fold, three models, both complete datasets, 20 epochs) takes approximately a day.

Testing the models

After training the models, you can compare the performance of the different methods using kfold_tester (which produces unnormalized earthquake probabilities) and evaluator. testing.ipynb provides a template for the testing procedure.

Experimenting with Custom Data

If your dataset is compatible with the structure of either the INSTANCE or STEAD dataset, you can use all of the machinery under the reproducibility folder. Alternatively, you can convert your data into the STEAD format using QuakeLabeler or SeisBench.

For other types of data, you can feed numpy arrays directly for training; SeismicPurifier/model_train.ipynb provides an example of this case, and SeismicPurifier/model_test.ipynb shows how to test on your own data. Pretrained models for training and testing are stored in the SeismicPurifier/models folder, and SeismicPurifier/data contains a small dataset for experimenting.
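
As a rough starting point, the sketch below shows one way such arrays might be prepared. The (n_windows, n_samples, 3) layout and the normalization are assumptions; model_train.ipynb remains the authoritative example.

    # Hypothetical data-preparation sketch for feeding numpy arrays to the models.
    import numpy as np

    def prepare_windows(waveforms):
        # waveforms: float array of shape (n_windows, n_samples, 3).
        x = np.asarray(waveforms, dtype="float32")
        x -= x.mean(axis=1, keepdims=True)                  # remove the per-trace mean
        peak = np.abs(x).max(axis=(1, 2), keepdims=True)
        return x / np.maximum(peak, 1e-8)                   # scale each window to [-1, 1]

    windows = prepare_windows(np.random.randn(8, 3000, 3))  # toy stand-in for real data
    print(windows.shape, windows.dtype)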

License

This project is licensed under the MIT License.

Contact

For any questions, issues, or feature requests, please open an issue on the GitHub repository or contact [email protected].

About

Code repository for the Representation Covariances (RECOVAR) method.
