Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RBD mirroring: rbd.getLastSyncTime panics with "index out of range [1] with length 1" #3388

Closed
ShyamsundarR opened this issue Sep 13, 2022 · 1 comment · Fixed by #3392
Closed
Assignees

Comments

@ShyamsundarR
Copy link
Contributor

Running a simple VolumeRelication resource creation for a PVC backed by RBD, causes the following message/stack in the csi-addons container in the rbd provisioner plugin:

W0913 15:39:11.158185       1 driver.go:162] replication service running on controller server is deprecated and replaced by CSI-Addons, see https://github.com/ceph/ceph-csi/issues/3314 for more details
W0913 15:49:09.668614       1 replicationcontrollerserver.go:123] ID: 20 Req-ID: 0001-0009-rook-ceph-0000000000000001-97db3529-42ca-4119-883c-e8b0760eeaa1 mirroringMode is not set in parameters, setting to mirroringMode to default (snapshot)
W0913 15:49:10.041624       1 replicationcontrollerserver.go:123] ID: 21 Req-ID: 0001-0009-rook-ceph-0000000000000001-97db3529-42ca-4119-883c-e8b0760eeaa1 mirroringMode is not set in parameters, setting to mirroringMode to default (snapshot)
E0913 15:49:10.942540       1 utils.go:227] panic occurred: runtime error: index out of range [1] with length 1
goroutine 131 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x19
github.com/ceph/ceph-csi/internal/csi-common.panicHandler.func1()
	/go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:228 +0xdd
panic({0x1d918a0, 0xc0005c2450})
	/usr/local/go/src/runtime/panic.go:838 +0x207
github.com/ceph/ceph-csi/internal/rbd.getLastSyncTime({0xc0005c2390?, 0xc000a2eab0?})
	/go/src/github.com/ceph/ceph-csi/internal/rbd/replicationcontrollerserver.go:938 +0x14d
github.com/ceph/ceph-csi/internal/rbd.(*ReplicationServer).GetVolumeReplicationInfo(0xc0003c0aa0, {0x2338320, 0xc000a2eab0}, 0xc0007445f0)
	/go/src/github.com/ceph/ceph-csi/internal/rbd/replicationcontrollerserver.go:909 +0x1098
github.com/csi-addons/spec/lib/go/replication._Controller_GetVolumeReplicationInfo_Handler.func1({0x2338320, 0xc000a2eab0}, {0x1d59d00?, 0xc0007445f0})
	/go/src/github.com/ceph/ceph-csi/vendor/github.com/csi-addons/spec/lib/go/replication/replication_grpc.pb.go:256 +0x78
github.com/ceph/ceph-csi/internal/csi-common.panicHandler({0x2338320?, 0xc000a2eab0?}, {0x1d59d00?, 0xc0007445f0?}, 0xc000398120?, 0x1?)
	/go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:233 +0x82
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x2338320?, 0xc000a2eab0?}, {0x1d59d00?, 0xc0007445f0?})
	/go/src/github.com/ceph/ceph-csi/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a
github.com/ceph/ceph-csi/internal/csi-common.logGRPC({0x2338320, 0xc000a2eab0}, {0x1d59d00?, 0xc0007445f0}, 0x1b21100?, 0xc0003ee140)
	/go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:208 +0x372
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x2338320?, 0xc000a2eab0?}, {0x1d59d00?, 0xc0007445f0?})
	/go/src/github.com/ceph/ceph-csi/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a
github.com/ceph/ceph-csi/internal/csi-common.contextIDInjector({0x2338320, 0xc000a2e2d0}, {0x1d59d00, 0xc0007445f0}, 0x7fd97cda58d0?, 0xc0003ee160)
	/go/src/github.com/ceph/ceph-csi/internal/csi-common/utils.go:186 +0x13f
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x2338320?, 0xc000a2e2d0?}, {0x1d59d00?, 0xc0007445f0?})
	/go/src/github.com/ceph/ceph-csi/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x2338320, 0xc000a2e2d0}, {0x1d59d00, 0xc0007445f0}, 0xc000087ac8?, 0x1c0fd60?)
	/go/src/github.com/ceph/ceph-csi/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0xbf
github.com/csi-addons/spec/lib/go/replication._Controller_GetVolumeReplicationInfo_Handler({0x1e57200?, 0xc0003c0aa0}, {0x2338320, 0xc000a2e2d0}, 0xc00037c230, 0xc0009356b0)
	/go/src/github.com/ceph/ceph-csi/vendor/github.com/csi-addons/spec/lib/go/replication/replication_grpc.pb.go:258 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0006d61e0, {0x233db88, 0xc0001de9c0}, 0xc0009cc5a0, 0xc0009359b0, 0x33a51d8, 0x0)
	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:1301 +0xb0b
google.golang.org/grpc.(*Server).handleStream(0xc0006d61e0, {0x233db88, 0xc0001de9c0}, 0xc0009cc5a0, 0x0)
	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:1642 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:938 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/src/github.com/ceph/ceph-csi/vendor/google.golang.org/grpc/server.go:936 +0x28a
E0913 15:49:10.944029       1 utils.go:210] ID: 22 GRPC error: rpc error: code = Internal desc = panic runtime error: index out of range [1] with length 1

The issue seems to stem from the description string read from rbd mirror image status <image-spec> where the post replaying description string is absent.

This possibly requires a fix to check if there is a further JSON blob in the second array index before proceeding with the unmarshal.

@ShyamsundarR
Copy link
Contributor Author

Issue stems from processing the data in local status, rather than peer status. Local status further does not have any , causing the array index issue.

A test patch that I used to overcome the problem as below:

diff --git a/internal/rbd/replicationcontrollerserver.go b/internal/rbd/replicationcontrollerserver.go
index e7d359a55..777934de0 100644
--- a/internal/rbd/replicationcontrollerserver.go
+++ b/internal/rbd/replicationcontrollerserver.go
@@ -836,6 +836,24 @@ func (rs *ReplicationServer) ResyncVolume(ctx context.Context,
        return resp, nil
 }
 
+// RemoteStatus returns one SiteMirrorImageStatus item from the SiteStatuses
+// slice that corresponds to the remote site's status. If the remote status
+// is not found than the error ErrNotExist will be returned.
+func RemoteStatus(gmis *librbd.GlobalMirrorImageStatus) (librbd.SiteMirrorImageStatus, error) {
+       var (
+               ss  librbd.SiteMirrorImageStatus
+               err error = librbd.ErrNotExist
+       )
+       for i := range gmis.SiteStatuses {
+               if gmis.SiteStatuses[i].MirrorUUID != "" {
+                       ss = gmis.SiteStatuses[i]
+                       err = nil
+                       break
+               }
+       }
+       return ss, err
+}
+
 // GetVolumeReplicationInfo extracts the RBD volume information from the volumeID, If the
 // image is present, mirroring is enabled and the image is in primary state.
 func (rs *ReplicationServer) GetVolumeReplicationInfo(ctx context.Context,
@@ -897,15 +915,18 @@ func (rs *ReplicationServer) GetVolumeReplicationInfo(ctx context.Context,
 
                return nil, status.Error(codes.Internal, err.Error())
        }
+       log.ErrorLog(ctx, "mirrorStatus: %v", mirrorStatus)
 
-       localStatus, err := mirrorStatus.LocalStatus()
+       remoteStatus, err := RemoteStatus(mirrorStatus)
        if err != nil {
                log.ErrorLog(ctx, err.Error())
 
                return nil, fmt.Errorf("failed to get local status: %w", err)
        }
+       log.ErrorLog(ctx, "remoteStatus: %v", remoteStatus)
 
-       description := localStatus.Description
+       description := remoteStatus.Description
+       log.ErrorLog(ctx, "Description: %s %v", description, description)
        lastSyncTime, err := getLastSyncTime(description)
        if err != nil {
                return nil, fmt.Errorf("failed to get last sync time: %w", err)

yati1998 added a commit to yati1998/ceph-csi that referenced this issue Sep 14, 2022
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: ceph#3388

Signed-off-by: Yati Padia <[email protected]>
Co-authored-by: shyam Ranganathan <[email protected]>
yati1998 added a commit to yati1998/ceph-csi that referenced this issue Sep 14, 2022
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: ceph#3388

Signed-off-by: Yati Padia <[email protected]>
Co-authored-by: shyam Ranganathan <[email protected]>
yati1998 added a commit to yati1998/ceph-csi that referenced this issue Sep 14, 2022
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: ceph#3388

Signed-off-by: Yati Padia <[email protected]>
Co-authored-by: shyam Ranganathan <[email protected]>
@mergify mergify bot closed this as completed in #3392 Sep 14, 2022
mergify bot pushed a commit that referenced this issue Sep 14, 2022
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: #3388

Signed-off-by: Yati Padia <[email protected]>
Co-authored-by: shyam Ranganathan <[email protected]>
yati1998 added a commit to yati1998/ceph-csi that referenced this issue Sep 14, 2022
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: ceph#3388

Signed-off-by: Yati Padia <[email protected]>
Co-authored-by: shyam Ranganathan <[email protected]>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/ceph-csi that referenced this issue Sep 14, 2022
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: ceph#3388

Signed-off-by: Yati Padia <[email protected]>
Co-authored-by: shyam Ranganathan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants