Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JRuby] Optimize scan(): Remove duplicate if (restLen() < patternsize()) return context.nil; checks in !headonly. #110

Merged
merged 1 commit into from
Oct 19, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Oct 18, 2024

Why?

private int restLen() {
return str.size() - curr;
}

This means the following :

if (str.size() - curr < pattern.size()) return context.nil;

A similar check is made within StringSupport#index() within !headonly.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
  • source = strBL
  • other = patternBL
  • offset = strBeg + curr

This means the following :
if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return -1;

Both checks are the same.

Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower

…size()) return context.nil;` checks in `!headonly`.

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following :

`if (str.size() - curr < pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within `!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following :
`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return -1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```
@naitoh naitoh marked this pull request as ready for review October 18, 2024 15:09
@kou kou merged commit 843e931 into ruby:master Oct 19, 2024
37 checks passed
@kou
Copy link
Member

kou commented Oct 19, 2024

Thanks.

It's better that we compare performance change between string/string_var instead of string/regexp with/without this change.

@naitoh naitoh deleted the optimize_scan_restLen branch October 19, 2024 06:25
hsbt pushed a commit to hsbt/ruby that referenced this pull request Oct 25, 2024
(restLen() < patternsize()) return context.nil;` checks in
`!headonly`.
(ruby/strscan#110)

- before: ruby#109

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following :

`if (str.size() - curr < pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within
`!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following :
`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return
-1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```

ruby/strscan@843e931d13
hsbt pushed a commit to hsbt/ruby that referenced this pull request Oct 25, 2024
(restLen() < patternsize()) return context.nil;` checks in
`!headonly`.
(ruby/strscan#110)

- before: ruby#109

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following :

`if (str.size() - curr < pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within
`!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following :
`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return
-1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```

ruby/strscan@843e931d13
hsbt pushed a commit to hsbt/ruby that referenced this pull request Oct 26, 2024
(restLen() < patternsize()) return context.nil;` checks in
`!headonly`.
(ruby/strscan#110)

- before: ruby#109

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following :

`if (str.size() - curr < pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within
`!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following :
`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return
-1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```

ruby/strscan@843e931d13
hsbt pushed a commit to ruby/ruby that referenced this pull request Oct 26, 2024
(restLen() < patternsize()) return context.nil;` checks in
`!headonly`.
(ruby/strscan#110)

- before: #109

## Why?

https://github.com/ruby/strscan/blob/d31274f41b7c1e28f23d58cf7bfea03baa818cb7/ext/jruby/org/jruby/ext/strscan/RubyStringScanner.java#L371-L373

This means the following :

`if (str.size() - curr < pattern.size()) return context.nil;`

A similar check is made within `StringSupport#index()` within
`!headonly`.

https://github.com/jruby/jruby/blob/be7815ec02356a58891c8727bb448f0c6a826d96/core/src/main/java/org/jruby/util/StringSupport.java#L1706-L1720

```Java
    public static int index(ByteList source, ByteList other, int offset, Encoding enc) {
        int sourceLen = source.realSize();
        int sourceBegin = source.begin();
        int otherLen = other.realSize();

        if (otherLen == 0) return offset;
        if (sourceLen - offset < otherLen) return -1;
```

- source = `strBL`
- other = `patternBL`
- offset = `strBeg + curr`

This means the following :
`if (strBL.realSize() - (strBeg + curr) < patternBL.realSize()) return
-1;`

Both checks are the same.

## Benchmark

It shows String as a pattern is 2.40x faster than Regexp as a pattern.

```
$ benchmark-driver benchmark/check_until.yaml
Warming up --------------------------------------
              regexp     7.613M i/s -      7.593M times in 0.997350s (131.35ns/i)
          regexp_var     7.793M i/s -      7.772M times in 0.997364s (128.32ns/i)
              string    13.222M i/s -     13.199M times in 0.998297s (75.63ns/i)
          string_var    15.283M i/s -     15.216M times in 0.995667s (65.43ns/i)
Calculating -------------------------------------
              regexp    10.003M i/s -     22.840M times in 2.283361s (99.97ns/i)
          regexp_var     9.991M i/s -     23.378M times in 2.340019s (100.09ns/i)
              string    23.454M i/s -     39.666M times in 1.691221s (42.64ns/i)
          string_var    23.998M i/s -     45.848M times in 1.910447s (41.67ns/i)

Comparison:
          string_var:  23998466.3 i/s
              string:  23453777.5 i/s - 1.02x  slower
              regexp:  10002809.4 i/s - 2.40x  slower
          regexp_var:   9990580.1 i/s - 2.40x  slower
```

ruby/strscan@843e931d13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants