I've been toying around with memchr and memmap2, trying to see if I could match ripgrep's performance at counting lines in a file. Here's my test setup; beware that generating the file takes a few minutes, and the result is 68GB.
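(A minimal sketch of the kind of generator involved, assuming fixed-width ASCII lines; this is an illustration, not necessarily the exact script:)

```rust
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    // Illustrative generator, not necessarily the exact setup script:
    // repeatedly write a fixed 100-byte ASCII line until the target size
    // is reached. Shrink `target` for a quicker benchmark.
    let target: u64 = 68 * 1024 * 1024 * 1024; // ~68GB
    let line = format!("{}\n", "x".repeat(99)); // 99 bytes + newline
    let mut out = BufWriter::new(std::fs::File::create("lines.txt")?);
    let mut written: u64 = 0;
    while written < target {
        out.write_all(line.as_bytes())?;
        written += line.len() as u64;
    }
    out.flush()
}
```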
Here's my attempt:

main.rs

```rust
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let args: Vec<_> = std::env::args().collect();
    let input = &args[1];
    // Optional second arg: thread count. Optional third arg: any combination
    // of 'a' (madvise sequential), 'p' (populate), 'h' (huge pages).
    let jobs: usize = args.get(2).and_then(|a| a.parse().ok()).unwrap_or(1);
    let advise: bool = args.get(3).map(|a| a.contains('a')).unwrap_or(false);
    let populate: bool = args.get(3).map(|a| a.contains('p')).unwrap_or(false);
    let huge: bool = args.get(3).map(|a| a.contains('h')).unwrap_or(false);
    let file = std::fs::File::open(input)?;
    let file_size = file.metadata()?.len() as usize;
    let mut options = memmap2::MmapOptions::new();
    if populate {
        options.populate();
    }
    if huge {
        // 2^21 = 2MiB huge pages.
        options.huge(Some(21));
    }
    let mmap = unsafe { options.len(file_size).map(&file)? };
    if advise {
        mmap.advise(memmap2::Advice::Sequential)?;
    }
    let count = if jobs > 1 {
        use rayon::prelude::*;
        // Split the map into `jobs` roughly equal chunks and count the
        // newlines in each chunk in parallel.
        mmap.par_chunks(file_size.div_ceil(jobs))
            .map(|chunk| memchr::memchr_iter(b'\n', chunk).count())
            .sum()
    } else {
        memchr::memchr_iter(b'\n', &mmap[..file_size]).count()
    };
    println!("{}", count);
    Ok(())
}
```

Cargo.toml

```toml
[package]
name = "linecount"
version = "0.1.0"
edition = "2021"

[dependencies]
memchr = "2.7.4"
memmap2 = "0.9.5"
rayon = "1.10.0"
```

Running it on my 14-core M4 MacBook:
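(For reference, an invocation under the argument scheme above looks something like `./target/release/linecount lines.txt 14 ap` after a `cargo build --release`; the path and flag string here are illustrative.)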
I know ripgrep is single-threaded for single-file searches, and it's perplexing me how it manages to attain such incredible performance. When I single-thread my naive implementation, it takes 6x longer than ripgrep. I've been reading the ripgrep source code, and I seem to be doing most things kind of similarly, I believe. And even with multithreading up to my core count, and various configurations, I still don't get within 2x of ripgrep's performance! Any idea what I'm doing wrong?
-
First thing to note here is that your benchmark is kinda brutal. It takes a very long time to run. I'd suggest smaller inputs. You still want something big enough that it's easy to measure throughput, and while 68GB might be the size of the real file you ultimately want to search, it's totally fine to decrease that for a benchmark. Like... 10GB maybe? I ended up just removing a zero from the count in the generation script.
With that I get these timings:
So your program is, as I would expect, quite a bit faster than ripgrep! Parallelism helps a bit, but overall isn't that much faster than just running single threaded. However, I am on Linux, not macOS, and I was running out of a ramdisk; I don't know what the macOS equivalent of that is. I do have an M2 mac mini, though:
And here are my timings on my M2:
That also matches what I'd expect. I think the data above suggests that something else is going wrong here. Maybe since your input is so big, it can't fit into memory and is never cached, which could mean that all you're really measuring is disk read time. However, if that were true, it should in theory impact ripgrep just as much as your program, so that is a bit of a mystery to me. It's possible something else is going wrong with your measurement process, but it's unclear to me what it could be. It might be worth rebooting and trying to re-create your measurements step by step.
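One cheap way to test the disk-bound theory, sketched here as a hypothetical diagnostic (the name `readbench` is made up): time a plain sequential read of the same file and compare its throughput to both programs. If the bare read is just as slow, you're measuring the disk, not the line counting.

```rust
use std::io::Read;
use std::time::Instant;

// Hypothetical diagnostic: measure raw sequential read throughput of a
// file, to check whether a benchmark on it is actually disk-bound.
fn main() -> std::io::Result<()> {
    let path = std::env::args().nth(1).expect("usage: readbench <file>");
    let mut file = std::fs::File::open(&path)?;
    let mut buf = vec![0u8; 1 << 20]; // 1MiB read buffer
    let mut total: u64 = 0;
    let start = Instant::now();
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        total += n as u64;
    }
    let secs = start.elapsed().as_secs_f64();
    println!("read {} bytes at {:.2} GB/s", total, total as f64 / 1e9 / secs);
    Ok(())
}
```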
Oh. Well that answers everything