Event grid updates #366
base: master
Conversation
Great work, Jinyan, I think we are very close! I added a few comments in the code and also have a few higher-level concerns/suggestions. I'll list those numbered below so that you can easily reference them in your response using the numbers.
1. Currently, we have two methods to provide filtering functionality in NNE.py: we allow users to set a filter_label and also allow them to add columns with the "filter_" prefix. I like what you did to make the latter approach more robust and I think it is in every way superior to the former one. I suggest dropping the filter_label approach and converting our example to use the "filter_" prefix. A minor related suggestion is to generalize our comments to discuss filtering for any feature, not only soil types or some other geotechnical characteristic. What we have is agnostic to the feature as far as I understand.
2. I think mixing file input/output operations with the core sampling functionality in NNE is a big step backwards in code quality. Those were intentionally separated in the earlier version of the code and I strongly recommend keeping them separate to follow best practices. (I am talking about lines 188-331 in NNE.py.)
3. I think it makes sense to keep supporting the old CSV format for EventGrids, but I don't think this should lead to if statements all over our applications. Instead, I would take care of this in the first tool that uses the event grids: NNE.py. Within that tool, we should start with parsing whatever event information is provided and storing the data in an internal object - whatever we find convenient, probably a dataframe or geodataframe. After that point, we should not need to worry about what kind of input was provided by the user and the scripts should work just fine.
4. A related, but smaller, point concerns the other GIS inputs. These should be supported the same way. Make sure we can parse them; if they have missing or incorrect data, throw a meaningful error; otherwise, they should just work with the rest of the codebase.
5. When dealing with IMs, I thought we were replacing the multinomial random sampling of neighbors with a nonlinear interpolation using IMs and weights from neighbors. This will always provide a smooth IM field, which is almost surely the preferred outcome over the noisy, jagged IM field that comes out of NNE.py right now. The difference is especially pronounced when we have large grid spacing with substantial changes in IM values assigned to nearby grid points -> note that this is always the case when we are dealing with buildings close to the rupture surface, so it happens in many examples. This should be a relatively simple change in NNE.py (see the sketch after this list) - let me know if you need help with it, we can discuss further. Just to be clear: we cannot interpolate time histories, so we will need to continue sampling them like we do currently. That should be fine considering the amount of noise in such analyses.
6. I suggest considering storing the Event sample in the EVENT.json file instead of carrying it in the AIM.json file. Currently, the SimCenterEvent application doesn't do much, and the IMasEDP application is supposed to know that the event data is in the AIM file. I think this is a good example of the implicit assumptions that make our workflow difficult to understand and work with. It would be much more intuitive to find the event sample in the EVENT file. We used to store it there, but we had to put it in the randomVariables part of the file earlier. Now, we could store it in the Events part. I am not sure why we wouldn't want to do that.
7. I did not check the commented-out parts of the code that will be used for the multiple grid options. I will return to those once we have discussed and agreed on the above points.
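Regarding point 5: a minimal sketch of the kind of interpolation I have in mind, assuming inverse-distance weighting. The names (interpolate_IM, im_values, dist_list) are placeholders rather than existing variables in NNE.py; the essential change is that the multinomial draw of a single neighbor is replaced by a weighted combination of the IM values of all neighbors.

```python
# Minimal sketch only - names are placeholders, not the actual NNE.py variables.
import numpy as np


def interpolate_IM(im_values, dist_list, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation of neighbor IM values.

    im_values : (n_neighbors, n_IM) array of IM values at the neighboring grid points
    dist_list : (n_neighbors,) array of distances from the asset to those grid points
    Returns a (n_IM,) array with a smooth IM estimate at the asset location.
    """
    dist = np.asarray(dist_list, dtype=float)
    weights = 1.0 / (dist**power + eps)  # eps avoids division by zero on a grid point
    weights /= weights.sum()             # normalize the weights to sum to 1
    return weights @ np.asarray(im_values, dtype=float)


# Example: four neighbors, two IM types (say PGA and SA(1.0))
im_values = [[0.35, 0.20], [0.42, 0.25], [0.30, 0.18], [0.55, 0.33]]
dist_list = [120.0, 300.0, 450.0, 800.0]
print(interpolate_IM(im_values, dist_list))
```

Any smooth weighting scheme would do; the inverse-distance weights above are just one option we could discuss.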
if filter_label == '':
    grid_extra_keys = list(
        gdf.drop(['geometry'], axis=1).columns
    )

else:
    # Else assume GIS files - works will all gis files that geopandas supports
typo: will -> with
if filter_label == '':
    grid_extra_keys = list(
        gdf.drop(['geometry'], axis=1).columns
    )

else:
The treatment of the generic GIS file option seems very similar to that of the GeoJSON. It seems to me that they could be combined: we treat all GIS files the same way, and the only meaningful difference is how we read the file, where we could put an if statement to handle .geojson separately from everything else.
What do you think?
Yes. They are the same. I was trying to avoid breaking compatibility for this symposium release. I can modify this for the next iteration when we deal with multiple event types and regional event sampling.
@@ -312,6 +335,47 @@ def find_neighbors( # noqa: C901, D103
    ] * e

    scale_list = np.ones(len(event_list))
    if file_extension == '.geojson':
Again, this should be elif - you will probably fix it in a future commit.
It is also strange that we need to worry about the file_extension at this point. Ideally, we would load and parse the information from the files we support, store it in a data structure we can work with, and then we wouldn't have to worry about the original file format any longer.
Let's discuss if we can merge the various options here into a single flow so that we don't need these conditional statements to separate csv and geojson and all other files.
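To make the single-flow idea concrete, here is a sketch of a loader that hides the input format. The function name load_event_grid and the CSV column names are assumptions, not existing code; the point is only the shape of the flow.

```python
# Sketch only: function name and CSV column names are assumptions.
from pathlib import Path

import geopandas as gpd
import pandas as pd


def load_event_grid(grid_path):
    """Parse any supported event grid file into a GeoDataFrame in EPSG:4326."""
    grid_path = Path(grid_path)

    if grid_path.suffix.lower() == '.csv':
        # legacy CSV format, assumed to carry Longitude/Latitude columns
        df = pd.read_csv(grid_path)
        gdf = gpd.GeoDataFrame(
            df,
            geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']),
            crs='EPSG:4326',
        )
    else:
        # GeoJSON, shapefile, geopackage, ... anything geopandas can read
        gdf = gpd.read_file(grid_path).to_crs(epsg=4326)

    if gdf.empty:
        raise ValueError(f'Event grid file {grid_path} contains no features.')

    return gdf
```

After a call like this, the rest of the script would only see a GeoDataFrame and would never need to check the file extension again.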
nbr_index = ind_list[nbr]

# if the grid has ground motion records...
if event_type == 'timeHistory':
Where do we pick up the event_type? That part seems to be missing from this clause.
# if the grid has ground motion records...
if event_type == 'timeHistory':
    # load the file for the selected grid point
    event_collection_file = grid_df.iloc[nbr_index]['GP_file']
I assume this will change in a future commit
# get the index of the nth neighbor
nbr_index = ind_list[nbr]
if sample_j >= len(values_list):
    values_list.append([])
Organizing Values to have the sample as the first dimension and the IM types as the second does make sense when we process them later in the workflow, but it makes adding values quite complicated.
It would be worth discussing the transposed option: IM types first and samples second. That way, we could simply append the new IM types when processing multiple grids.
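To make the trade-off concrete, a toy illustration of the two layouts; the variable names here are made up, not the ones used in NNE.py.

```python
# Current layout: values[sample][IM type]
values_by_sample = [
    [0.35, 0.20],  # sample 0: e.g., PGA, SA(1.0)
    [0.42, 0.25],  # sample 1
]
# Adding IM types from another grid means extending every inner list:
for sample, extra in zip(values_by_sample, [[0.11], [0.14]]):
    sample.extend(extra)

# Transposed layout: values[IM type][sample]
values_by_im = [
    [0.35, 0.42],  # PGA for all samples
    [0.20, 0.25],  # SA(1.0) for all samples
]
# Adding IM types from another grid is a single append per new IM type:
values_by_im.append([0.11, 0.14])
```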
I think when running simulations, a realization index will be picked, and all IMs in that realization need to be accessed. This access is easier with my current ordering.
Adding new IM types only happens n times when there are n different grids, and n should be much smaller than the number of accesses to individual realizations. So, I would suggest not using the transposed option.
    }
)

# the event should be sampled during hazard to asset mapping, so not random
typo: not -> no
corrected
# the event should be sampled during hazard to asset mapping, so not random
# variables are needed
event_file['Events'].append({
    "data_dir": str(data_dir),
I suggest using 'dataDir' instead of 'data_dir' in the event_file, to be consistent with our camelCase convention in JSON files.
Also consider making dataDir optional since it is only needed for timeHistory events
Changed to dataDir.
I think in the future, by default, the format should support both IM and timeHistory. As a result, by default, the dataDir should be required. I intended to avoid too many "if" conditions when different types of event formats are used.
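For reference, a sketch of what an appended entry might look like after the rename; everything besides the dataDir key is an assumption with dummy values, shown only to illustrate the shape.

```python
# Illustrative only: keys other than 'dataDir' and all values are made up.
import json
from pathlib import Path

data_dir = Path('input_data')
event_file = {'Events': []}

event_file['Events'].append({
    'type': 'intensityMeasure',  # or 'timeHistory'
    'dataDir': str(data_dir),    # camelCase, per the JSON convention
    'data': {'PGA': 0.35, 'SA(1.0)': 0.20},  # sampled IMs for one asset
})

print(json.dumps(event_file, indent=2))
```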
gdf = gdf.to_crs(epsg=4326)

lat_e = gdf['geometry'].apply(lambda pt: pt.y)
lon_e = gdf['geometry'].apply(lambda pt: pt.x)
What happens if a user sends in a GeoJSON with geometries other than points? It is invalid, but we wouldn't catch that now. I suggest adding a check somewhere to make sure all geometries are indeed points. If that check fails, we should also provide a helpful error message for the user.
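Something along these lines is what I would suggest; the function name and the message are only an example, not prescriptive.

```python
# Example check only; names are placeholders.
import geopandas as gpd


def check_point_geometries(gdf, grid_path):
    """Raise a clear error if the event grid contains non-point geometries."""
    non_points = gdf[gdf.geometry.geom_type != 'Point']
    if not non_points.empty:
        raise ValueError(
            f'Event grid file {grid_path} must contain only Point geometries; '
            f'found {sorted(non_points.geom_type.unique())} in '
            f'{len(non_points)} feature(s).'
        )


# usage sketch:
# gdf = gpd.read_file(grid_path)
# check_point_geometries(gdf, grid_path)
```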
Added in "load_sc_geojson" function
value_list = AIM_in['Events'][0]['SimCenterEvent']['Values']
label_list = AIM_in['Events'][0]['SimCenterEvent']['Labels']
evt = EVENT_in['Events'][0]
data_dir = Path(evt['data_dir'])
We do not seem to need the data dir when using the SimCenterEvent event type with IMs only.
Removed this
I created this pull request to be able to comment on and discuss changes made to accommodate the new EventGrid format.
Note that additional commits and changes are possible while keeping this pull request open. We should merge this PR once we are all happy with the new functionality.