Event grid updates #366
base: master
Conversation
Great work, Jinyan, I think we are very close! I added a few comments in the code and also have a few higher-level concerns/suggestions. I'll list those numbered below so that you can easily reference them in your response using the numbers.
1. Currently, we have two methods to provide filtering functionality in NNE.py: we allow users to set a filter_label and also allow them to add columns with the "filter_" prefix. I like what you did to make the latter approach more robust and I think it is in every way superior to the former one. I suggest dropping the filter_label approach and converting our example to use the "filter_" prefix. A minor related suggestion is to generalize our comments to discuss filtering for any feature, not only soil types or some other geotechnical characteristic. What we have is agnostic to the feature as far as I understand.
2. I think mixing file input/output operations with the core sampling functionality in NNE is a big step backwards in code quality. Those were intentionally separated in the earlier version of the code and I strongly recommend keeping them separate to follow best practices. (I am talking about lines 188-331 in NNE.py.)
3. I think it makes sense to keep supporting the old CSV format for EventGrids, but I don't think this should lead to if statements all over our applications. Instead, I would take care of this in the first tool that uses the event grids: NNE.py. Within that tool, we should start with parsing whatever event information is provided and storing the data in an internal object - whatever we find convenient, probably a dataframe or geodataframe. After that point, we should not need to worry about what kind of input was provided by the user and the scripts should work just fine.
4. A related, but smaller, point concerns the other GIS inputs. These should be supported the same way. Make sure we can parse them; if they have missing or incorrect data, throw a meaningful error; otherwise, they should just work with the rest of the codebase.
5. When dealing with IMs, I thought we were replacing the multinomial random sampling of neighbors with a nonlinear interpolation using IMs and weights from neighbors. This will always provide a smooth IM field, which is almost surely the preferred outcome over the noisy, jagged IM field that comes out of NNE.py right now. The difference is especially pronounced when we have large grid spacing with substantial changes in IM values assigned to nearby grid points -> note that this is always the case when we are dealing with buildings close to the rupture surface, so it happens in many examples. This should be a relatively simple change in NNE.py (see the sketch after this list) - let me know if you need help with it, we can discuss further. Just to be clear: we cannot interpolate time histories, so we will need to continue sampling them like we do currently. That should be fine considering the amount of noise in such analyses.
6. I suggest considering storing the Event sample in the EVENT.json file instead of carrying it in the AIM.json file. Currently, the SimCenterEvent application doesn't do much, and the IMasEDP application is supposed to know that the event data is in the AIM file. I think this is a good example of the implicit assumptions that make our workflow difficult to understand and work with. It would be much more intuitive to find the event sample in the EVENT file. We used to store it there, but we had to put it in the randomVariables part of the file earlier. Now, we could store it in the Events part. I am not sure why we wouldn't want to do that.
7. I did not check the commented-out parts of the code that will be used for the multiple grid options. I will return to those once we have discussed and agreed on the above points.
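Regarding point 5: a minimal sketch of the kind of interpolation I have in mind, assuming inverse-distance weighting. The names (interpolate_IM, im_values, dist_list) are placeholders rather than existing variables in NNE.py; the essential change is that the multinomial draw of a single neighbor is replaced by a weighted combination of the IM values of all neighbors.

```python
# Minimal sketch only - names are placeholders, not the actual NNE.py variables.
import numpy as np


def interpolate_IM(im_values, dist_list, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation of neighbor IM values.

    im_values : (n_neighbors, n_IM) array of IM values at the neighboring grid points
    dist_list : (n_neighbors,) array of distances from the asset to those grid points
    Returns a (n_IM,) array with a smooth IM estimate at the asset location.
    """
    dist = np.asarray(dist_list, dtype=float)
    weights = 1.0 / (dist**power + eps)  # eps avoids division by zero on a grid point
    weights /= weights.sum()             # normalize the weights to sum to 1
    return weights @ np.asarray(im_values, dtype=float)


# Example: four neighbors, two IM types (say PGA and SA(1.0))
im_values = [[0.35, 0.20], [0.42, 0.25], [0.30, 0.18], [0.55, 0.33]]
dist_list = [120.0, 300.0, 450.0, 800.0]
print(interpolate_IM(im_values, dist_list))
```

Any smooth weighting scheme would do; the inverse-distance weights above are just one option we could discuss.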
if filter_label == '':
    grid_extra_keys = list(
        gdf.drop(['geometry'], axis=1).columns
    )

else:
    # Else assume GIS files - works will all gis files that geopandas supports
typo: will -> with
if filter_label == '':
    grid_extra_keys = list(
        gdf.drop(['geometry'], axis=1).columns
    )

else:
The treatment of the generic GIS file option seems very similar to that of the GeoJSON. It seems to me that they could be combined: we treat all GIS files the same way, and the only meaningful difference is how we read the file, where we could put an if statement to handle .geojson separately from everything else.
What do you think?
Yes. They are the same. I was trying to avoid breaking compatibility for this symposium release. I can modify this for the next iteration when we deal with multiple event types and regional event sampling.
@@ -312,6 +335,47 @@ def find_neighbors( # noqa: C901, D103
    ] * e

    scale_list = np.ones(len(event_list))
    if file_extension == '.geojson':
Again, this should be elif - you will probably fix it in a future commit.
It is also strange that we need to worry about the file_extension at this point. Ideally, we would load and parse the information from the files we support, store it in a data structure we can work with, and then we wouldn't have to worry about the original file format any longer.
Let's discuss if we can merge the various options here into a single flow so that we don't need these conditional statements to separate csv and geojson and all other files.
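To make the single-flow idea concrete, here is a sketch of a loader that hides the input format. The function name load_event_grid and the CSV column names are assumptions, not existing code; the point is only the shape of the flow.

```python
# Sketch only: function name and CSV column names are assumptions.
from pathlib import Path

import geopandas as gpd
import pandas as pd


def load_event_grid(grid_path):
    """Parse any supported event grid file into a GeoDataFrame in EPSG:4326."""
    grid_path = Path(grid_path)

    if grid_path.suffix.lower() == '.csv':
        # legacy CSV format, assumed to carry Longitude/Latitude columns
        df = pd.read_csv(grid_path)
        gdf = gpd.GeoDataFrame(
            df,
            geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']),
            crs='EPSG:4326',
        )
    else:
        # GeoJSON, shapefile, geopackage, ... anything geopandas can read
        gdf = gpd.read_file(grid_path).to_crs(epsg=4326)

    if gdf.empty:
        raise ValueError(f'Event grid file {grid_path} contains no features.')

    return gdf
```

After a call like this, the rest of the script would only see a GeoDataFrame and would never need to check the file extension again.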
nbr_index = ind_list[nbr]

# if the grid has ground motion records...
if event_type == 'timeHistory':
Where do we pick up the event_type? That part seems to be missing from this clause.
# if the grid has ground motion records...
if event_type == 'timeHistory':
    # load the file for the selected grid point
    event_collection_file = grid_df.iloc[nbr_index]['GP_file']
I assume this will change in a future commit
# get the index of the nth neighbor
nbr_index = ind_list[nbr]
if sample_j >= len(values_list):
    values_list.append([])
Organizing Values to have the sample as the first dimension and the IM types as the second does make sense when we process them later in the workflow, but it makes adding values quite complicated.
It would be worth discussing the transposed option: IM types first and samples second. That way, we could simply append the new IM types when processing multiple grids.
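To make the trade-off concrete, a toy illustration of the two layouts; the variable names here are made up, not the ones used in NNE.py.

```python
# Current layout: values[sample][IM type]
values_by_sample = [
    [0.35, 0.20],  # sample 0: e.g., PGA, SA(1.0)
    [0.42, 0.25],  # sample 1
]
# Adding IM types from another grid means extending every inner list:
for sample, extra in zip(values_by_sample, [[0.11], [0.14]]):
    sample.extend(extra)

# Transposed layout: values[IM type][sample]
values_by_im = [
    [0.35, 0.42],  # PGA for all samples
    [0.20, 0.25],  # SA(1.0) for all samples
]
# Adding IM types from another grid is a single append per new IM type:
values_by_im.append([0.11, 0.14])
```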
I think when running simulations, a realization index will be picked, and all IMs in that realization need to be accessed. This access is easier with my current ordering.
Adding new IM types only happens n times when there are n different grids, and n should be much smaller than the number of accesses to individual realizations. So, I would suggest not using the transposed option.
    }
)

# the event should be sampled during hazard to asset mapping, so not random
typo: not -> no
corrected
# the event should be sampled during hazard to asset mapping, so not random
# variables are needed
event_file['Events'].append({
    "data_dir": str(data_dir),
I suggest using 'dataDir' instead of 'data_dir' in the event_file, to be consistent with our camelCase convention in JSON files.
Also consider making dataDir optional since it is only needed for timeHistory events
Changed to dataDir.
I think in the future, by default, the format should support both IM and timeHistory. As a result, by default, the dataDir should be required. I intended to avoid too many "if" conditions when different types of event formats are used.
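For reference, a sketch of what an appended entry might look like after the rename; everything besides the dataDir key is an assumption with dummy values, shown only to illustrate the shape.

```python
# Illustrative only: keys other than 'dataDir' and all values are made up.
import json
from pathlib import Path

data_dir = Path('input_data')
event_file = {'Events': []}

event_file['Events'].append({
    'type': 'intensityMeasure',  # or 'timeHistory'
    'dataDir': str(data_dir),    # camelCase, per the JSON convention
    'data': {'PGA': 0.35, 'SA(1.0)': 0.20},  # sampled IMs for one asset
})

print(json.dumps(event_file, indent=2))
```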
gdf = gdf.to_crs(epsg=4326)

lat_e = gdf['geometry'].apply(lambda pt: pt.y)
lon_e = gdf['geometry'].apply(lambda pt: pt.x)
What happens if a user sends in a GeoJSON with geometries other than points? It is invalid, but we wouldn't catch that now. I suggest adding a check somewhere to make sure all geometries are indeed points. If that check fails, we should also provide a helpful error message for the user.
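Something along these lines is what I would suggest; the function name and the message are only an example, not prescriptive.

```python
# Example check only; names are placeholders.
import geopandas as gpd


def check_point_geometries(gdf, grid_path):
    """Raise a clear error if the event grid contains non-point geometries."""
    non_points = gdf[gdf.geometry.geom_type != 'Point']
    if not non_points.empty:
        raise ValueError(
            f'Event grid file {grid_path} must contain only Point geometries; '
            f'found {sorted(non_points.geom_type.unique())} in '
            f'{len(non_points)} feature(s).'
        )


# usage sketch:
# gdf = gpd.read_file(grid_path)
# check_point_geometries(gdf, grid_path)
```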
Added in "load_sc_geojson" function
value_list = AIM_in['Events'][0]['SimCenterEvent']['Values']
label_list = AIM_in['Events'][0]['SimCenterEvent']['Labels']
evt = EVENT_in['Events'][0]
data_dir = Path(evt['data_dir'])
We do not seem to need the data dir when using the SimCenterEvent event type with IMs only.
Removed this
I created this pull request to be able to comment on and discuss changes made to accommodate the new EventGrid format.
Note that additional commits and changes are possible while keeping this pull request open. We should merge this PR once we are all happy with the new functionality.