Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition with memory cache? #2560

Closed
Tolriq opened this issue Nov 3, 2017 · 19 comments
Closed

Race condition with memory cache? #2560

Tolriq opened this issue Nov 3, 2017 · 19 comments

Comments

@Tolriq
Copy link
Contributor

Tolriq commented Nov 3, 2017

Glide Version:
4.2, 4.3

Integration libraries:

Device/Android Version:

Issue details / Repro steps / Use case background:

When starting at the same time the same request to the same model (With correct hashCode/Model implementation) with different submit(x,y) values, you most of the time get
Finished loading Bitmap from RESOURCE_DISK_CACHE
and not
Finished loading Bitmap from MEMORY_CACHE

Meaning image is loaded from disk and not memory cache.

When targeting the same size 90% of the time requests are from memory.
When adding a delay between the calls, 100% of the time the image is correctly returned from memory cache.

It is important to note that it's not a problem with purging of the memory cache, as all other images are still correctly returned from memory cache.

Glide load line / GlideModule (if any) / list Adapter code (if any):

GlideApp.with(appcontext).load(custommodel).buildAsBitmap()
.diskCacheStrategy(DiskCacheStrategy.RESOURCE)
.submit(width, height).get(10, TimeUnit.SECONDS)

Stack trace / LogCat:
One of the request is sometimes from disk when using same size

11-03 12:41:41.663 32580-32580/V/Engine: Started new load in 0.591834ms, key: EngineKey{model=ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F}, width=256, height=256, resourceClass=class android.graphics.Bitmap, transcodeClass=class android.graphics.Bitmap, signature=EmptySignature, hashCode=-1873111732, transformations={}, options=Options{values={Option{key='com.bumtpech.glide.load.resource.bitmap.Downsampler.AllowHardwareDecode'}=false, Option{key='com.bumptech.glide.load.resource.bitmap.Downsampler.DecodeFormat'}=PREFER_ARGB_8888, Option{key='com.bumptech.glide.load.resource.gif.GifOptions.DecodeFormat'}=PREFER_ARGB_8888}}}
11-03 12:41:41.691 32580-32580/D/Glide: Finished loading Bitmap from RESOURCE_DISK_CACHE for ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F} with size [256x256] in 28.292082999999998 ms
11-03 12:41:41.734 32580-32580/D/Glide: Finished loading Bitmap from MEMORY_CACHE for ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F} with size [256x256] in 0.40262499999999996 ms
11-03 12:41:41.735 32580-32580/V/Engine: Loaded resource from active resources in 0.4775ms, key: EngineKey{model=ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F}, width=256, height=256, resourceClass=class android.graphics.Bitmap, transcodeClass=class android.graphics.Bitmap, signature=EmptySignature, hashCode=-1873111732, transformations={}, options=Options{values={Option{key='com.bumtpech.glide.load.resource.bitmap.Downsampler.AllowHardwareDecode'}=false, Option{key='com.bumptech.glide.load.resource.bitmap.Downsampler.DecodeFormat'}=PREFER_ARGB_8888, Option{key='com.bumptech.glide.load.resource.gif.GifOptions.DecodeFormat'}=PREFER_ARGB_8888}}}

When using different target size all requests are from disk.

11-03 12:45:25.078 4034-4034/V/Engine: Started new load in 0.316042ms, key: EngineKey{model=ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F}, width=256, height=256, resourceClass=class android.graphics.Bitmap, transcodeClass=class android.graphics.Bitmap, signature=EmptySignature, hashCode=1444532587, transformations={}, options=Options{values={Option{key='com.bumtpech.glide.load.resource.bitmap.Downsampler.AllowHardwareDecode'}=false, Option{key='com.bumptech.glide.load.resource.bitmap.Downsampler.DecodeFormat'}=PREFER_ARGB_8888, Option{key='com.bumptech.glide.load.resource.gif.GifOptions.DecodeFormat'}=PREFER_ARGB_8888}}}
11-03 12:45:25.099 4034-4034/D/Glide: Finished loading Bitmap from RESOURCE_DISK_CACHE for ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F} with size [256x256] in 21.784125 ms
11-03 12:45:25.134 4034-4034/V/Engine: Started new load in 0.644042ms, key: EngineKey{model=ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F}, width=-2147483648, height=-2147483648, resourceClass=class android.graphics.Bitmap, transcodeClass=class android.graphics.Bitmap, signature=EmptySignature, hashCode=-1825984149, transformations={}, options=Options{values={Option{key='com.bumtpech.glide.load.resource.bitmap.Downsampler.AllowHardwareDecode'}=false, Option{key='com.bumptech.glide.load.resource.bitmap.Downsampler.DecodeFormat'}=PREFER_ARGB_8888, Option{key='com.bumptech.glide.load.resource.gif.GifOptions.DecodeFormat'}=PREFER_ARGB_8888}}}
11-03 12:45:25.167 4034-4034/D/Glide: Finished loading Bitmap from RESOURCE_DISK_CACHE for ImageRequest{url='http://192.168.1.80:8080/image/image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F', onlyCached=false, keepTransparency=true, cacheKey=image%3A%2F%2Fmusic%40smb%253a%252f%252fdiskstation%252fred%252fmusic%252f%255bDivers%255d%252fCraig%2520David%2520-%25207%2520Days.mp3%2F} with size [-2147483648x-2147483648] in 33.877917 ms
@sjudd
Copy link
Collaborator

sjudd commented Nov 3, 2017

I'm not totally sure this is the issue you're describing, but, if you start two requests simultaneously for an image that's in the disk cache, but not in the memory cache, I'd expect both of those requests to complete with RESOURCE_DISK_CACHE as their source.

Glide will start the first request asynchronously, then attach the second request as a listener to the first. When the load completes, both the listener for the first request and the second request are notified with the source where the image was loaded from.

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 3, 2017

@sjudd The images are in memory or should be. The result is consistent for every calls.

I found that when changing to support Android Auto and lacked a check, this resulted in updating both notification and MediaSession at the same time every second with those 2 nearly simultaneous calls.

This leaded to load from disk cache every seconds for the 2 calls. I then tried some tests to have more data (The test with same size, then test with delay). I can understand the first calls are from disk cache, but all future should not be.

That's why I suspect a race condition.

@sjudd
Copy link
Collaborator

sjudd commented Nov 4, 2017

Are you sure the items were in the memory cache? It's a little hard to measure, but custom logging or patience with the debugger might eventually work.

The memory cache is managed entirely on the main thread, so there shouldn't be race conditions. It's possible there's some place where that condition doesn't hold, but that would be easy to verify by creating a subclass of the LruCache an adding a main thread assertion to every call.

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 4, 2017

Well race condition are complicated to debug without understanding all the library internals :(

Logs does not tell if the memory cache is tested at all and why it fails. And adding breakpoints would create the delay and remove the issue. And the issue could be reading or writing to it after.
Maybe there's a log tag to enable to have more info to give you?

Maybe I'm not clear but the problem is that

At T=0 I do the 2 calls, both are returned from disk (could be OK)
at T=1s I do the 2 calls, both are returned from disk (They should be in memory and returned from it now)
at T=2s I do the 2 calls, both are returned from disk (They should be in memory and returned from it now)
....

With same size in request at T=1 or 2 or 3, 90% both are from memory, 10% 1 or 2 are from disk.

When adding a delay between the calls at T=1, 2, 3 ... 100% are from memory.

Since with a delay all are in memory I can assume that putting to memory cache works correctly for the model and query types so that it should work in the other cases too.

From the engine logs the query are the exact same and all parameters in the logs confirms that including hashcode.

@sjudd
Copy link
Collaborator

sjudd commented Nov 6, 2017

I made an attempt to write a test case for what you described, let me know what else is missing from there: sjudd@6849da3.

I don't think there are any logs, but you could try to debug evictions from the memory cache:

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 6, 2017

I do not use override but into(x,y) and I use a custom model that implement hashcode/equals and the Model interface don't know if any of those have impacts.
I also use Priority.HIGH and disallowHardwareConfig()

I'll see if I can get more logs today.

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 6, 2017

@sjudd I tried to log the cache eviction but it seems it does not work.

In custom GlideModule:

   @Override
    public void applyOptions(Context context, GlideBuilder builder) {
        LruResourceCache memcache = new LruResourceCache(new MemorySizeCalculator.Builder(context).build().getMemoryCacheSize());
        memcache.setResourceRemovedListener(removed -> Logger.logError("Glide", "Resource removed: %s", removed));
        builder.setMemoryCache(memcache);

And the listener is never called (But I do see ressources properly switching from loaded from memory to disk when scrolling in heavy image recyclerview so there is correct eviction.)

Trying to debug it seems the engine will override the listener :(

So I went to a child class and overriding and then I can see that there's no evict.
But I can see puts from normal usage when the fragments are unloaded, no puts for those the into() and I can't get any get to be logged. So I'm completely lost :(

    private static class MemCache extends LruResourceCache {
        public MemCache(int size) {
            super(size);
        }

        @Override
        protected void onItemEvicted(Key key, Resource<?> item) {
            Logger.logError("Glide", "onItemEvicted: %s / %s", key, item);
            super.onItemEvicted(key, item);
        }

        @Override
        public synchronized Resource<?> put(Key key, Resource<?> item) {
            Logger.logError("Glide", "put: %s / %s", key, item);
            return super.put(key, item);
        }

        @Nullable
        @Override
        public synchronized Resource<?> get(Key key) {
            Logger.logError("Glide", "get: %s", key);
            return super.get(key);
        }
    }

@sjudd
Copy link
Collaborator

sjudd commented Nov 6, 2017

Try logging remove() in the LruCache instead of get(). The memory cache in Glide is actually composed of two parts, the LruCache and active resources. Active resources in Engine keeps track of the set of images that someone is currently using. It's not size bounded because the contents is already in use, so adding an additional consumer isn't expensive. Only when items are no longer in use by anyone are they added to the size bounded LruCache. If someone acquires a resource from the LruCache, it gets moved back into actives resources. As a result, we never get ()from the LruCache, we only ever put() or remove().

If we know the cache its self isn't involved, the next thing to look at is active resources.

Active resources uses weak values, so it's possible that the values are being gced. That could happen if the Target returned by .submit() isn't referenced. In the foreground typically Targets are referenced by Views so this would happen infrequently.

Looking at the code again, I'd bet that's it. You can test this by just retaining a reference to the Future returned by submit() and seeing if that fixes the issue. I may also be able to reproduce this in a test.

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 6, 2017

@sjudd ok so after logging remove I see the calls and they always return null for the bitmaps (All works as you explained for the Drawable targets)

So it seems asBitmap().into(x,y).get(10,SECONDS) never put back the images in memory cache and it only works when get from active ressources?

But that does not explain why it only occurs with simultaneous calls :(

Not sure how I can keep a ref to the future as those calls are made from static helper functions.

@sjudd
Copy link
Collaborator

sjudd commented Nov 6, 2017

For testing you could probably just use a random static collection.

I wasn't able to reproduce the scenario I described in an emulator, but I think that's more due to WeakReference/gc issues.

I was able to reproduce the scenario in a unit test: cdb21f6.

If I'm right, it's basically a matter of getting the system to collect the weak references. That may be more frequent under more memory pressure, or it may be more or less random.

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 6, 2017

Adding ref to FutureTarget makes the query always return from MEMORY with "Loaded resource from active resources"

But then I'm not sure to fully understand the issue and the possible solution.
Keeping ref makes the ressources stay active, but it's a ref to a future and not the actual usage of the bitmap and returns image from active bitmaps.

Not keeping ref, does not keep image in active but do not put it in memory cache either. (This is the part I do not understand if there's no ref then the image should return to memory cache no?)

Solution seems to keep a ref to last futuretarget but does calling GlideApp.clear on that futuretarget when I suppose no more used works and properly release everything ?

@sjudd
Copy link
Collaborator

sjudd commented Nov 6, 2017

Yes the long term solution in Glide is to make sure that when active resources is cleared because a reference is removed, the image is still added to the memory cache. There is a caveat that without an explicit clear it won't be safe for us to re-use the Bitmap associated with the resource, even if we can add it to the memory cache.

Yes you should be able to solve this for now by retaining a reference to the FutureTarget.

Yes you can clear the FutureTarget whenever you're done with the Bitmap/Drawable to allow Glide to re-use the image as expected. In fact it's the recommended way of dealing with Glide's Futures on background threads, especially for cases where you're loading lots of images in a row.

Does that help?

@sjudd sjudd added this to the 4.4 milestone Nov 6, 2017
@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 6, 2017

Well still unsure how to deal with that correctly :(

Currently for MediaSession meta data I do get the bitmap each time to avoid caching it myself.

So every time there's a change I call Glide to get the bitmap as I assumed it would always come from memory cache if the same image is request multiple times in a row. (All that in a static method that returns a MediaMetadataCompat but it's not important)

But since it's a future I'd need to keep previous + new and only clear previous when new is actually sent to MediaSession.

But doing so the clear happens too late and new will still load from disk. If I clear too early I guess it could lead to problem in MediaSession as glide could reuse / recycle. And if I reuse the same future target, there's still chance GC remove the image before new call actually ends.

So not really sure what's the correct approach except caching the bitmap myself.

@sjudd
Copy link
Collaborator

sjudd commented Nov 6, 2017

Can you clarify what you mean by:

But doing so the clear happens too late and new will still load from disk.

If the clear happens late that's ok from the caching perspective. Clear isn't required for the memory cache to be used, it's only required to allow Glide to re-use the resources (Bitmap in this case) for a subsequent load.

You don't actually have to clear at all, it's just a performance optimization. The only work around you need to get memory caching working is to retain a reference to the FutureTarget until you're done with the corresponding Bitmap.

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 7, 2017

I was thinking about future fix and using memory cache, was not thinking about your workaround that is to use the active ressources. I'll try with the workaround for now thanks.

@sjudd sjudd added bug and removed enhancement labels Nov 8, 2017
sjudd added a commit to sjudd/glide that referenced this issue Nov 8, 2017
Previously resources that were dereferenced but not released vanished 
into the ether. As a result you could load a Bitmap and then immediately 
after load the same Bitmap again and get a memory cache miss on the 
second load. This pattern is particularly common with the submit() APIs 
that load images on background threads where the Target is neither 
referenced by a view nor referenced directly.

Fixes bumptech#2560
sjudd added a commit to sjudd/glide that referenced this issue Nov 10, 2017
Previously resources that were dereferenced but not released vanished 
into the ether. As a result you could load a Bitmap and then immediately 
after load the same Bitmap again and get a memory cache miss on the 
second load. This pattern is particularly common with the submit() APIs 
that load images on background threads where the Target is neither 
referenced by a view nor referenced directly.

Fixes bumptech#2560
sjudd added a commit to sjudd/glide that referenced this issue Nov 11, 2017
Previously resources that were dereferenced but not released vanished
into the ether. As a result you could load a Bitmap and then immediately
after load the same Bitmap again and get a memory cache miss on the
second load. This pattern is particularly common with the submit() APIs
that load images on background threads where the Target is neither
referenced by a view nor referenced directly.

Fixes bumptech#2560
@sjudd
Copy link
Collaborator

sjudd commented Nov 13, 2017

So I have a fix for this, the tradeoff is that Glide will retain some extra memory.

We'll keep a soft reference to the resource we pass out to consumers, but a hard reference to the actual data. If the resource we pass out is GCed, we release the data back into the memory cache. We won't be able to re-use any resources returned to Glide like this because we can't know if the caller is still using the actual Bitmap.

The hard reference Glide will retain may transiently increase memory usage because there's not a defined time between when the resource we pass out is GCed and when our weak reference is cleared.

It's always going to be good practice to retain a reference to the FutureTarget returned from submit() and to call clear() on it when you're done with the resource. It will allow Glide to re-use the resource (Bitmap) and it will also free memory faster. I've updated some documentation to indicate this (http://bumptech.github.io/glide/doc/getting-started.html#background-threads) and I'll try and go through the rest in the near future.

@Tolriq
Copy link
Contributor Author

Tolriq commented Nov 13, 2017

Thanks can you still confirm when to use the clear call?

When I know that I will not need the image called again or when I know that the bitmap is no more used anywhere (This one is kinda impossible)

@sjudd sjudd closed this as completed in 0209662 Nov 13, 2017
@sjudd
Copy link
Collaborator

sjudd commented Nov 13, 2017

clear() should be used the same on background threads as it is on foreground threads.

In particular, call clear() whenever you're done with this particular usage of the image. For example, in RecyclerView you would call clear() in onViewRecycled() or in onBindView() when you're going to load a new image.

Glide uses reference counting, so each load increments the reference count and each clear() decrements the reference count. You want to decrement the reference count whenever you're done with the resource in the scope of your particular request.

If you have many places across your application using the resource with different load calls, each one can call clear() independently when they're done with the resource and Glide will avoid recycling or re-using the Bitmap until all areas of the app do so.

@sjudd
Copy link
Collaborator

sjudd commented Nov 13, 2017

Also I should note that Glide is permissive. We won't re-use or recycle the Bitmap if you never call clear() or just allow the Target/View to be garbage collected. Using clear() is an optimization and only required for correctness if you want Glide not to finish an in progress load.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants