[JRuby] Optimize scan(): Use strBL.getBegin() + curr instead of currPtr() #109

Merged
merged 1 commit into from
Oct 16, 2024

@naitoh naitoh commented Oct 15, 2024

Why?

Because the two expressions are identical: `currPtr()` simply returns `str.getByteList().getBegin() + curr`.

```Java
ByteList strBL = str.getByteList();
int strBeg = strBL.getBegin();

private int currPtr() {
    return str.getByteList().getBegin() + curr;
}
```
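The equivalence can be checked in isolation. The following is a minimal sketch, not the JRuby source: `ByteListStub` and `ScannerSketch` are hypothetical stand-ins that model only `getBegin()` and `curr`, and `inlinedPtr()` is an illustrative name for the inlined expression.

```java
// Hypothetical stand-in for org.jruby.util.ByteList; only getBegin() is modeled.
final class ByteListStub {
    private final int begin;

    ByteListStub(int begin) {
        this.begin = begin;
    }

    int getBegin() {
        return begin;
    }
}

// Hypothetical stand-in for the scanner; holds the byte list and the scan offset.
final class ScannerSketch {
    private final ByteListStub strBL;
    private final int curr;

    ScannerSketch(ByteListStub strBL, int curr) {
        this.strBL = strBL;
        this.curr = curr;
    }

    // Mirrors the helper quoted above: looks up the begin offset on every call.
    int currPtr() {
        return strBL.getBegin() + curr;
    }

    // Mirrors the inlined form: reuses a begin offset that was already read.
    int inlinedPtr() {
        int strBeg = strBL.getBegin();
        return strBeg + curr;
    }

    public static void main(String[] args) {
        ScannerSketch scanner = new ScannerSketch(new ByteListStub(4), 3);
        // Both expressions evaluate to 4 + 3.
        System.out.println(scanner.currPtr() == scanner.inlinedPtr());
    }
}
```

Since both forms read the same begin offset and add the same `curr`, the inlined form only avoids the extra method call and repeated lookup.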

Benchmark

In the `check_until` benchmark, a String pattern (`string_var`) runs 2.33x faster than a Regexp pattern (`regexp_var`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.421M i/s -      7.378M times in 0.994235s (134.75ns/i)
          regexp_var     7.302M i/s -      7.307M times in 1.000706s (136.95ns/i)
              string    12.715M i/s -     12.707M times in 0.999388s (78.65ns/i)
          string_var    13.575M i/s -     13.533M times in 0.996914s (73.66ns/i)
Calculating -------------------------------------
              regexp     8.287M i/s -     22.263M times in 2.686415s (120.67ns/i)
          regexp_var    10.180M i/s -     21.905M times in 2.151779s (98.23ns/i)
              string    20.148M i/s -     38.144M times in 1.893226s (49.63ns/i)
          string_var    23.695M i/s -     40.726M times in 1.718753s (42.20ns/i)

Comparison:
          string_var:  23694846.7 i/s
              string:  20147598.6 i/s - 1.18x  slower
          regexp_var:  10180018.3 i/s - 2.33x  slower
              regexp:   8287384.8 i/s - 2.86x  slower
```

…ead of `currPtr()`

## Why?

Because they are identical.

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L267-L268

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L359-L361

@naitoh naitoh force-pushed the optimize_scan_currPtr branch from 8636749 to d0671e1 Compare October 15, 2024 14:47
@naitoh naitoh changed the title [JRuby] Optimize scan() method: Use strBL.getBegin(); + curr instead of currPtr() [JRuby] Optimize scan() method: Use strBL.getBegin() + curr instead of currPtr() Oct 15, 2024
@kou kou changed the title [JRuby] Optimize scan() method: Use strBL.getBegin() + curr instead of currPtr() [JRuby] Optimize scan(): Use strBL.getBegin() + curr instead of currPtr() Oct 16, 2024
@kou kou merged commit e73a154 into ruby:master Oct 16, 2024
37 checks passed

kou commented Oct 16, 2024

Thanks.

@naitoh naitoh deleted the optimize_scan_currPtr branch October 16, 2024 01:07
kou pushed a commit that referenced this pull request Oct 19, 2024
…ize()) return context.nil;` checks in `!headonly`. (#110)

- before: #109

## Why?


https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This is equivalent to:

`if (str.size() - curr < pattern.size()) return context.nil;`

In the `!headonly` branch, `StringSupport#index()` performs a similar check.


https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This is equivalent to:

`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return -1;`

Both checks are the same.
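
The comparison above can be tried with plain integers. This is a minimal sketch, not the JRuby code: `GuardSketch` and its method names are hypothetical, and it assumes `strBeg` is 0 so that `str.size()` and `strBL.realSize()` describe the same bytes. It does not model a non-zero begin offset.

```java
// Hypothetical helper class; the two methods restate the guards quoted above.
final class GuardSketch {
    // Scanner-side guard: if (str.size() - curr < pattern.size()) return context.nil;
    static boolean scannerGuardRejects(int strSize, int curr, int patternSize) {
        return strSize - curr < patternSize;
    }

    // Guard inside StringSupport.index(): if (sourceLen - offset < otherLen) return -1;
    static boolean indexGuardRejects(int sourceLen, int offset, int otherLen) {
        return sourceLen - offset < otherLen;
    }

    public static void main(String[] args) {
        int strBeg = 0; // assumption of this sketch
        int strSize = 10;
        int patternSize = 3;
        for (int curr = 0; curr <= strSize; curr++) {
            boolean scanner = scannerGuardRejects(strSize, curr, patternSize);
            boolean index = indexGuardRejects(strSize, strBeg + curr, patternSize);
            if (scanner != index) {
                throw new AssertionError("guards disagree at curr=" + curr);
            }
        }
        System.out.println("guards agree for every curr in 0.." + strSize);
    }
}
```

Under that assumption, a position rejected by one guard is rejected by the other, which is why running both adds no information.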

## Benchmark

In the `check_until` benchmark, a String pattern (`string_var`) runs 2.40x faster than a Regexp pattern (`regexp` and `regexp_var`).

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```