
Bump jmespath from 0.10.0 to 1.0.1 #8

Closed
wants to merge 99 commits
Commits
c97d0eb
update
jameslahm Dec 16, 2024
4490528
add clip
jameslahm Dec 16, 2024
c85abf5
update clip
jameslahm Dec 16, 2024
58d893d
add lvis
jameslahm Dec 16, 2024
2ca984c
add data and install
jameslahm Dec 16, 2024
efbe0b7
add cache
jameslahm Dec 16, 2024
5dc4b93
update install.sh
jameslahm Dec 16, 2024
88662eb
add setup.py
jameslahm Dec 16, 2024
b771f0b
add grounding and label embedding cache
jameslahm Dec 21, 2024
899c26b
add grounding neg padding
jameslahm Dec 21, 2024
4ee2f11
add val interval
jameslahm Dec 21, 2024
4fd2deb
update
jameslahm Dec 21, 2024
0a86414
add mobileclip
jameslahm Dec 21, 2024
0e30c36
remove threshold in global neg padding
jameslahm Dec 21, 2024
8e77cd1
Merge branch 'main' of wa24.git.tsinghua.edu.cn:wanga24/yolo-world
jameslahm Dec 21, 2024
a9fe5e9
sync main
jameslahm Dec 21, 2024
9bb0ea3
use mobileclips0
jameslahm Dec 24, 2024
e218a59
add text model check
jameslahm Dec 27, 2024
f65bc72
add pan
jameslahm Dec 27, 2024
ec96fa4
use actual imgsz for flops
jameslahm Dec 27, 2024
4081372
update
jameslahm Dec 27, 2024
55ff9a3
add speed bench
jameslahm Dec 27, 2024
7406a7d
use clip as default
jameslahm Dec 27, 2024
9e50fcf
update
jameslahm Dec 28, 2024
881c7e2
Merge branch 'main' into mobileclip
leonnil Dec 28, 2024
81bb834
update
jameslahm Dec 27, 2024
41d6709
fix attn block fuse
leonnil Dec 28, 2024
ce54d23
Merge branch 'main' into mobileclip
leonnil Dec 28, 2024
333d9a5
Merge branch 'mobileclip' into vl-head
leonnil Dec 28, 2024
0ac3137
update vlhead
leonnil Dec 28, 2024
dcc1690
update scale args
jameslahm Dec 29, 2024
abfed1a
update vlhead
jameslahm Dec 29, 2024
5b2caca
fix fuse
jameslahm Dec 29, 2024
ba7cd9c
Merge branch 'mobileclip' into vl-head
leonnil Dec 30, 2024
7bdd5dc
add 11
jameslahm Dec 31, 2024
345cca8
update
leonnil Dec 31, 2024
1bf5bd0
update
jameslahm Jan 3, 2025
afe5599
rename vldetect
jameslahm Jan 4, 2025
9cf4ed9
update
jameslahm Jan 7, 2025
ef5a343
update
jameslahm Jan 7, 2025
870ce93
add imgsz 800
jameslahm Jan 10, 2025
678c702
update
jameslahm Jan 15, 2025
1d7b19c
update
jameslahm Feb 12, 2025
5ca53b3
update
jameslahm Feb 12, 2025
ee43de6
update
jameslahm Feb 12, 2025
25cccaf
update
jameslahm Feb 12, 2025
e05a3a4
update
jameslahm Feb 12, 2025
16707dd
update
jameslahm Feb 12, 2025
c03b4f3
update
jameslahm Feb 12, 2025
3b98889
update
jameslahm Feb 12, 2025
f6ea7d1
update
jameslahm Feb 12, 2025
8f55402
update
jameslahm Feb 12, 2025
f9e41b8
update
jameslahm Feb 10, 2025
8128586
add llm
jameslahm Feb 12, 2025
199659e
update
jameslahm Feb 12, 2025
fdd59a5
update
jameslahm Feb 13, 2025
c7fa175
update
jameslahm Feb 13, 2025
01a8bb6
update
jameslahm Feb 13, 2025
09145e3
add single cls
jameslahm Feb 13, 2025
40b72c1
update
jameslahm Feb 13, 2025
a5c4386
update
jameslahm Feb 13, 2025
e34995d
update
jameslahm Feb 14, 2025
bfa6758
update
leonnil Feb 15, 2025
0b7efc4
update
leonnil Feb 15, 2025
1ceed66
update
leonnil Feb 15, 2025
4891335
update
leonnil Feb 15, 2025
d0d5149
update
jameslahm Feb 15, 2025
f9c1ace
update
jameslahm Feb 16, 2025
c83b575
update
jameslahm Feb 17, 2025
da2477e
update
jameslahm Feb 17, 2025
18b3b5e
update
jameslahm Feb 17, 2025
79573a8
update
jameslahm Feb 18, 2025
dc00db9
update
Even-ok Feb 18, 2025
eeeb734
update
jameslahm Feb 18, 2025
8fc6a13
update
Even-ok Feb 18, 2025
f0e7cce
update
jameslahm Feb 19, 2025
b52f8ba
update
Even-ok Feb 22, 2025
7744a8b
update
jameslahm Feb 24, 2025
e4785c5
update
jameslahm Feb 26, 2025
2f167b4
update
leonnil Feb 24, 2025
e365098
update
jameslahm Feb 23, 2025
767dc75
update
leonnil Feb 24, 2025
1701b49
Merge branch 'test-coco' into yoloe
jameslahm Mar 9, 2025
4ab179c
Merge branch 'test-pf-oe' into yoloe
jameslahm Mar 9, 2025
f72b6cb
update
jameslahm Mar 9, 2025
d174aec
update
jameslahm Mar 9, 2025
941c205
update
jameslahm Mar 9, 2025
316ba46
update
jameslahm Mar 9, 2025
7615860
update
jameslahm Mar 9, 2025
fe198eb
update
jameslahm Mar 9, 2025
8e3904b
update
jameslahm Mar 9, 2025
0306a72
update
jameslahm Mar 9, 2025
cf1d9b2
update gradio
jameslahm Mar 9, 2025
fb809cd
update
jameslahm Mar 9, 2025
d73e9a4
update
jameslahm Mar 10, 2025
381babf
update
jameslahm Mar 10, 2025
e68f236
update
jameslahm Mar 10, 2025
18ae42c
update
jameslahm Mar 10, 2025
1e0d191
Bump jmespath from 0.10.0 to 1.0.1
dependabot[bot] Mar 10, 2025
6 changes: 4 additions & 2 deletions .gitignore
@@ -26,8 +26,6 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
-requirements.txt
-setup.py
 ultralytics.egg-info

 # PyInstaller
@@ -171,3 +169,7 @@ pnnx*

 # calibration image
 calibration_*.npy
+
+ignore
+exports
+pretrain
28 changes: 28 additions & 0 deletions CLIP/.github/workflows/format.yml
@@ -0,0 +1,28 @@
# Ultralytics 🚀 - AGPL-3.0 License https://ultralytics.com/license
# Ultralytics Actions https://github.com/ultralytics/actions
# This workflow automatically formats code and documentation in PRs to official Ultralytics standards

name: Ultralytics Actions

on:
  issues:
    types: [opened]
  pull_request:
    branches: [main]
    types: [opened, closed, synchronize, review_requested]

jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - name: Run Ultralytics Formatting
        uses: ultralytics/actions@main
        with:
          token: ${{ secrets._GITHUB_TOKEN }} # note GITHUB_TOKEN automatically generated
          labels: true # autolabel issues and PRs
          python: true # format Python code and docstrings
          prettier: true # format YAML, JSON, Markdown and CSS
          spelling: true # check spelling
          links: false # check broken links
          summary: true # print PR summary with GPT4o (requires 'openai_api_key')
          openai_api_key: ${{ secrets.OPENAI_API_KEY }}
33 changes: 33 additions & 0 deletions CLIP/.github/workflows/test.yml
@@ -0,0 +1,33 @@
name: test
on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main
jobs:
  CLIP-test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.8]
        pytorch-version: [1.7.1, 1.9.1, 1.10.1]
        include:
          - python-version: 3.8
            pytorch-version: 1.7.1
            torchvision-version: 0.8.2
          - python-version: 3.8
            pytorch-version: 1.9.1
            torchvision-version: 0.10.1
          - python-version: 3.8
            pytorch-version: 1.10.1
            torchvision-version: 0.11.2
    steps:
      - uses: conda-incubator/setup-miniconda@v2
      - run: conda install -n test python=${{ matrix.python-version }} pytorch=${{ matrix.pytorch-version }} torchvision=${{ matrix.torchvision-version }} cpuonly -c pytorch
      - uses: actions/checkout@v2
      - run: echo "$CONDA/envs/test/bin" >> $GITHUB_PATH
      - run: pip install pytest
      - run: pip install .
      - run: pytest
10 changes: 10 additions & 0 deletions CLIP/.gitignore
@@ -0,0 +1,10 @@
__pycache__/
*.py[cod]
*$py.class
*.egg-info
.pytest_cache
.ipynb_checkpoints

thumbs.db
.DS_Store
.idea
Binary file added CLIP/CLIP.png
661 changes: 661 additions & 0 deletions CLIP/LICENSE

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions CLIP/MANIFEST.in
@@ -0,0 +1 @@
include clip/bpe_simple_vocab_16e6.txt.gz
195 changes: 195 additions & 0 deletions CLIP/README.md
@@ -0,0 +1,195 @@
# CLIP

[\[Blog\]](https://openai.com/blog/clip/) [\[Paper\]](https://arxiv.org/abs/2103.00020) [\[Model Card\]](model-card.md) [\[Colab\]](https://colab.research.google.com/github/openai/clip/blob/master/notebooks/Interacting_with_CLIP.ipynb)

CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. We found CLIP matches the performance of the original ResNet50 on ImageNet “zero-shot” without using any of the original 1.28M labeled examples, overcoming several major challenges in computer vision.

## Approach

![CLIP](CLIP.png)

## Usage

First, [install PyTorch 1.7.1](https://pytorch.org/get-started/locally/) (or later) and torchvision, as well as small additional dependencies, and then install this repo as a Python package. On a CUDA GPU machine, the following will do the trick:

```bash
$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install ftfy regex tqdm
$ pip install git+https://github.com/openai/CLIP.git
```

Replace `cudatoolkit=11.0` above with the appropriate CUDA version on your machine or `cpuonly` when installing on a machine without a GPU.

```python
import torch
from PIL import Image

import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs) # prints: [[0.9927937 0.00421068 0.00299572]]
```

## API

The CLIP module `clip` provides the following methods:

#### `clip.available_models()`

Returns the names of the available CLIP models.

#### `clip.load(name, device=..., jit=False)`

Returns the model and the TorchVision transform needed by the model, specified by the model name returned by `clip.available_models()`. It will download the model as necessary. The `name` argument can also be a path to a local checkpoint.

The device to run the model can be optionally specified, and the default is to use the first CUDA device if there is any, otherwise the CPU. When `jit` is `False`, a non-JIT version of the model will be loaded.

#### `clip.tokenize(text: Union[str, List[str]], context_length=77)`

Returns a LongTensor containing tokenized sequences of given text input(s). This can be used as the input to the model
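
As a quick illustration, here is a minimal sketch of these helpers (the exact model list depends on the installed package version):

```python
import clip

# List the released model names that clip.load() accepts.
print(clip.available_models())  # e.g. ['RN50', ..., 'ViT-B/32', 'ViT-L/14']

# Tokenize two prompts; results are padded to context_length (77 by default).
tokens = clip.tokenize(["a diagram", "a dog"])
print(tokens.shape)  # torch.Size([2, 77])
print(tokens.dtype)  # torch.int64 (a LongTensor)
```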

---

The model returned by `clip.load()` supports the following methods:

#### `model.encode_image(image: Tensor)`

Given a batch of images, returns the image features encoded by the vision portion of the CLIP model.

#### `model.encode_text(text: Tensor)`

Given a batch of text tokens, returns the text features encoded by the language portion of the CLIP model.

#### `model(image: Tensor, text: Tensor)`

Given a batch of images and a batch of text tokens, returns two Tensors, containing the logit scores corresponding to each image and text input. The values are cosine similarities between the corresponding image and text features, times 100.
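
To make that relationship concrete, the minimal sketch below recomputes the logits by hand from the encoded features, assuming a released checkpoint whose learned temperature `model.logit_scale.exp()` is roughly 100:

```python
import torch
from PIL import Image

import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("CLIP.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)

    # Recompute the same scores manually: cosine similarity of the
    # L2-normalized features, scaled by the learned temperature.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    manual_logits = model.logit_scale.exp() * image_features @ text_features.T

print(logits_per_image)  # should match manual_logits up to numerical precision
print(manual_logits)
```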

## More Examples

### Zero-Shot Prediction

The code below performs zero-shot prediction using CLIP, as shown in Appendix B in the paper. This example takes an image from the [CIFAR-100 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), and predicts the most likely labels among the 100 textual labels from the dataset.

```python
import os

import torch
from torchvision.datasets import CIFAR100

import clip

# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device)

# Download the dataset
cifar100 = CIFAR100(root=os.path.expanduser("~/.cache"), download=True, train=False)

# Prepare the inputs
image, class_id = cifar100[3637]
image_input = preprocess(image).unsqueeze(0).to(device)
text_inputs = torch.cat([clip.tokenize(f"a photo of a {c}") for c in cifar100.classes]).to(device)

# Calculate features
with torch.no_grad():
    image_features = model.encode_image(image_input)
    text_features = model.encode_text(text_inputs)

# Pick the top 5 most similar labels for the image
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarity = (100.0 * image_features @ text_features.T).softmax(dim=-1)
values, indices = similarity[0].topk(5)

# Print the result
print("\nTop predictions:\n")
for value, index in zip(values, indices):
    print(f"{cifar100.classes[index]:>16s}: {100 * value.item():.2f}%")
```

The output will look like the following (the exact numbers may be slightly different depending on the compute device):

```
Top predictions:

           snake: 65.31%
          turtle: 12.29%
    sweet_pepper: 3.83%
          lizard: 1.88%
       crocodile: 1.75%
```

Note that this example uses the `encode_image()` and `encode_text()` methods that return the encoded features of given inputs.
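
As a quick sanity check, a sketch of the feature shapes in the example above (the 512-dimensional embedding width is specific to the ViT-B/32 checkpoint):

```python
# Continuing from the zero-shot example above.
print(image_features.shape)  # torch.Size([1, 512])    one image
print(text_features.shape)   # torch.Size([100, 512])  one row per CIFAR-100 class
```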

### Linear-probe evaluation

The example below uses [scikit-learn](https://scikit-learn.org/) to perform logistic regression on image features.

```python
import os

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR100
from tqdm import tqdm

import clip

# Load the model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device)

# Load the dataset
root = os.path.expanduser("~/.cache")
train = CIFAR100(root, download=True, train=True, transform=preprocess)
test = CIFAR100(root, download=True, train=False, transform=preprocess)


def get_features(dataset):
    all_features = []
    all_labels = []

    with torch.no_grad():
        for images, labels in tqdm(DataLoader(dataset, batch_size=100)):
            features = model.encode_image(images.to(device))

            all_features.append(features)
            all_labels.append(labels)

    return torch.cat(all_features).cpu().numpy(), torch.cat(all_labels).cpu().numpy()


# Calculate the image features
train_features, train_labels = get_features(train)
test_features, test_labels = get_features(test)

# Perform logistic regression
classifier = LogisticRegression(random_state=0, C=0.316, max_iter=1000, verbose=1)
classifier.fit(train_features, train_labels)

# Evaluate using the logistic regression classifier
predictions = classifier.predict(test_features)
accuracy = np.mean((test_labels == predictions).astype(float)) * 100.0
print(f"Accuracy = {accuracy:.3f}")
```

Note that the `C` value should be determined via a hyperparameter sweep using a validation split.
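
One way to run that sweep, sketched here as a rough illustration (the candidate `C` values and the 90/10 split are arbitrary choices), is to reuse the `train_features` and `train_labels` computed above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hold out 10% of the training features as a validation split.
sub_x, val_x, sub_y, val_y = train_test_split(
    train_features, train_labels, test_size=0.1, random_state=0
)

best_c, best_acc = None, -1.0
for c in [0.001, 0.01, 0.1, 0.316, 1.0, 10.0]:  # illustrative log-spaced candidates
    clf = LogisticRegression(random_state=0, C=c, max_iter=1000)
    clf.fit(sub_x, sub_y)
    acc = np.mean(clf.predict(val_x) == val_y)
    if acc > best_acc:
        best_c, best_acc = c, acc

print(f"Best C = {best_c} (validation accuracy {100 * best_acc:.2f}%)")
```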

## See Also

- [OpenCLIP](https://github.com/mlfoundations/open_clip): includes larger and independently trained CLIP models up to ViT-G/14
- [Hugging Face implementation of CLIP](https://huggingface.co/docs/transformers/model_doc/clip): for easier integration with the HF ecosystem
2 changes: 2 additions & 0 deletions CLIP/VERSION
@@ -0,0 +1,2 @@
https://github.com/ultralytics/CLIP.git
61a901215e3e7c706a2f32b5f9969f9dec2beaf9
1 change: 1 addition & 0 deletions CLIP/clip/__init__.py
@@ -0,0 +1 @@
from .clip import *
Binary file added CLIP/clip/bpe_simple_vocab_16e6.txt.gz