This is the JAX/FLAX implementation of the ICLR 2023 paper "F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models". The model is also available on Google Cloud Vertex AI, where you can train and predict with it using the Vertex AI Training and Prediction services and the notebook provided at the top of the model card.
We use Python's built-in venv module to set up the environment. Run the following commands:
svn export https://github.com/google-research/google-research/trunk/fvlm
PATH_TO_VENV=/path/to/your/venv
python3 -m venv ${PATH_TO_VENV}
source ${PATH_TO_VENV}/bin/activate
Install the requirements from the root fvlm directory.
pip install -r requirements.txt
For GPU training, please refer to this GitHub page for installation instructions.
To download the tf.SavedModel checkpoints, run the following commands from the root fvlm directory.
cd ./checkpoints
./download.sh
For users who want to use the FLAX checkpoints directly rather than the tf.SavedModel exports, we have prepared checkpoints you can download with the commands below:
MODEL="r50" # Supported model: r50, r50x4, r50x16
wget "https://storage.googleapis.com/cloud-tpu-checkpoints/detection/projects/fvlm/jax_checkpoints/${MODEL}_checkpoint_184000"
We recommend running the above commands from the checkpoints directory.
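If you want to sanity-check a downloaded FLAX checkpoint, a minimal sketch like the one below should work, assuming the file was written with flax.training.checkpoints (the exact pytree layout depends on the training code):

from flax.training import checkpoints

# target=None restores the stored state as a nested dict of raw arrays
# rather than into a concrete train-state object.
state = checkpoints.restore_checkpoint(
    "./checkpoints/r50_checkpoint_184000", target=None)
print(list(state.keys()))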
Run the following command from the root fvlm directory. This runs the F-VLM demo with a ResNet-50 backbone.
python3 demo.py --model=resnet_50
You can set the model size, demo image, category string, and visualization options via command-line flags. Please refer to demo.py for documentation of the flags.
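For example, a run on your own image might look like the following; everything except --model=resnet_50 is a hypothetical stand-in, so check the flag definitions in demo.py for the real flag names and supported values:

# Hypothetical flags for illustration only; see demo.py for the actual names.
python3 demo.py --model=resnet_50 --image=./my_photo.jpg --categories="lemon,lime"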
Note that the demo models are trained on a mixture of COCO, Objects365, and the full LVIS dataset to increase base-category coverage. They differ from the models used for the LVIS/COCO benchmarks in the paper, which are trained on subsets of the LVIS/COCO vocabularies.
To export a trained JAX checkpoint to a tf.SavedModel, run the following commands from the root fvlm directory.
INPUT_DIR="/your/input/jax_checkpoint_dir"
OUTPUT_DIR="/your/output/savedmodel/dir"
./scripts/export_model.sh "${INPUT_DIR}" "${OUTPUT_DIR}"
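To verify the export, you can load the SavedModel back with TensorFlow. This is a minimal sketch; the serving signature names and tensor specs depend on the export script:

import tensorflow as tf

# Load the exported model and list its serving signatures.
model = tf.saved_model.load("/your/output/savedmodel/dir")
print(list(model.signatures.keys()))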
Here we describe the steps to use the COCO dataset for training and evaluation as an example. To use a custom dataset, follow the same setup.
- Follow the steps here to set up the COCO dataset and move it to datasets/coco. The coco directory should contain the train*.tfrecord and val*.tfrecord files, plus instances_val2017.json (the standard COCO evaluation file).
- Download and store the precomputed COCO category embeddings by running the script below (a sketch for inspecting them follows this list):
./scripts/download_precomputed_embeddings.sh
- Run the following command:
OUTPUT_DIR="/your/output/dir"
./train_and_eval.sh "${OUTPUT_DIR}"
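As referenced above, you can inspect the downloaded category embeddings with numpy. The path below is a placeholder for wherever the download script stores the file; the array is expected to hold one embedding per category:

import numpy as np

# Placeholder path: point this at the .npy fetched by the download script.
embeddings = np.load("datasets/coco/embedding.npy")
print(embeddings.shape)  # e.g. (num_categories, embed_dim)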
Here we describe the specific changes needed in ./scripts/fvlm_train_and_eval.gin to set up training and evaluation with custom datasets; an example set of bindings follows the list.
- Update TRAIN_FILE_PATTERN and EVAL_FILE_PATTERN to point to your dataset.
- Update EMBED_PATH to point to your cached embedding.npy.
- Update CATG_PAD_SIZE to the number of your training categories.
- Update EVAL_STEPS to match the size of your validation set.
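Putting these together, the edited bindings might look like the following, where every path and number is a placeholder for your own dataset:

TRAIN_FILE_PATTERN = 'datasets/my_dataset/train*.tfrecord'
EVAL_FILE_PATTERN = 'datasets/my_dataset/val*.tfrecord'
EMBED_PATH = 'datasets/my_dataset/embedding.npy'
CATG_PAD_SIZE = 80  # number of training categories
EVAL_STEPS = 5000  # validation set size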
@inproceedings{kuo2023openvocabulary,
  title={Open-Vocabulary Object Detection upon Frozen Vision and Language Models},
  author={Weicheng Kuo and Yin Cui and Xiuye Gu and AJ Piergiovanni and Anelia Angelova},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=MIMwy4kh9lf}
}
Demo image source: https://pixabay.com/nl/photos/het-fruit-eten-citroen-limoen-3134631/
Creative Commons License: https://pixabay.com/nl/service/terms/
This is not an officially supported Google product.