Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use clucore #995

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Use clucore #995

wants to merge 5 commits into from

Conversation

zupon
Copy link
Contributor

@zupon zupon commented Mar 29, 2021

This PR addresses #991.

So far, this PR...

  • Adds CluCoreProcesor to EidosEnglishProcessor
  • Removes EidosCluProcessor
  • Removes CLU language from Language.scala and NegationHandler.scala

After these first changes, I tried running sbt test, but most of them aborted and I got an error about how I need to initialize dynet first:

 Attempting to define parameters before initializing DyNet. Be sure to call dynet::initialize() before defining your model.

There's also a good chance that I just plugged the CluCoreProcessor in the wrong place.

@zupon zupon requested review from BeckySharp and kwalcock March 29, 2021 03:43
@kwalcock
Copy link
Member

If the addition of org.clulab.dynet.Utils.initializeDyNet didn't help, then holler. Sorry it is buried so deeply. It's part of processors.

@zupon
Copy link
Contributor Author

zupon commented Apr 4, 2021

It does seem to work now. Thanks! I might not have placed the initialization call in the most ideal place, but we can move it around if needed.

I also reran sbt test in both this branch and master to compare which tests are failing.

Here's the end result of running sbt test in my local master branch:

[info] Run completed in 38 minutes, 12 seconds.
[info] Total number of tests run: 654
[info] Suites: completed 81, aborted 0
[info] Tests: succeeded 612, failed 42, canceled 7, ignored 254, pending 0
[info] *** 42 TESTS FAILED ***
[error] Failed tests:
[error]         org.clulab.wm.eidos.system.TestCrLf
[error]         org.clulab.wm.eidos.groundings.TestVersioner
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc3

And here's the same output in the use-clucore branch:

[info] Run completed in 24 minutes, 39 seconds.
[info] Total number of tests run: 654
[info] Suites: completed 81, aborted 0
[info] Tests: succeeded 498, failed 156, canceled 7, ignored 254, pending 0
[info] *** 156 TESTS FAILED ***
[error] Failed tests:
[error]         org.clulab.wm.eidos.text.english.raps.TestRaps
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc5
[error]         org.clulab.wm.eidos.serialization.jsonld.TestJLDSerializer
[error]         org.clulab.wm.eidos.text.english.raps.TestRaps1
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc8
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP1
[error]         org.clulab.wm.eidos.text.english.cag.TestExtraText
[error]         org.clulab.wm.eidos.serialization.TestDocSerialization
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc2
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP4
[error]         org.clulab.wm.eidos.system.TestCrLf
[error]         org.clulab.wm.eidos.groundings.TestVersioner
[error]         org.clulab.wm.eidos.serialization.jsonld.TestJLDDeserializer
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc3
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc6
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP3
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc1
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc4
[error]         org.clulab.wm.eidos.system.TestEidosMention
[error]         org.clulab.wm.eidos.text.english.cag.TestCagP2
[error]         org.clulab.wm.eidos.utils.TestMentionUtils
[error]         org.clulab.wm.eidos.document.TestSentenceClassifier
[error]         org.clulab.wm.eidos.text.english.eval6.TestDoc7

At least for master, I'm pretty sure the TestCrLf failure is just because I have some local resources that are included in the test, but aren't actually pushed to remote. Not sure about TestVersioner or TestDoc3 though. I also don't know what the 7 cancelled tests are, but I assume (for now) that they are the same 7 in both branches, so no difference there.

@kwalcock
Copy link
Member

kwalcock commented Apr 4, 2021

You don't need to worry about TestCrLf and TestVersioner and probably not TestDoc3. TestCrLf is probably because you are using Windows, TestVersioner is maybe because the git plugin can't version something locally (is git available on the command line?), and I think that TestDoc3 might be the unstable one. I'll definitely check before it becomes critical.

@MihaiSurdeanu
Copy link
Contributor

MihaiSurdeanu commented Apr 4, 2021 via email

@zupon
Copy link
Contributor Author

zupon commented Apr 4, 2021

I'm not completely sure what you mean. Do you just mean to get the specific outputs of each failing test so we can see which parts are failing? E.g. for the TestRaps test, the output that includes this:

...
[info] Raps_sent8
[info] - should have correct node
[info] Raps_sent9
[info] - should have correct edge !!! IGNORED !!!
[info] Raps_sent10
[info] - should have correct edge 1 !!! IGNORED !!!
[info] - should have correct edge 2
[info] - should have correct edge 3 !!! IGNORED !!!
[info] Raps_sent11
[info] - should have correct node 2 *** FAILED ***
[info]   List("
[info]   Errors:
[info]          Could not find NodeSpec [Use of improved cultivars and mechanization|+POS(improved)]
...

@MihaiSurdeanu
Copy link
Contributor

MihaiSurdeanu commented Apr 4, 2021 via email

@zupon
Copy link
Contributor Author

zupon commented Apr 5, 2021

Ok I think I understand. The CluCoreProcessor uses the new corenlp relations (e.g. obj instead of dobj) for some things, correct? Changing to CluCoreProcessor is likely to break some tests that used the old relations. Now, we want to find out which relations are missing, and compare the parse trees and SRLs for those sentences that are different. Does that sound accurate?

Is there a specific test or a particular document/text to check all the relations?

@BeckySharp
Copy link
Contributor

@zupon what's the status of this?

@zupon
Copy link
Contributor Author

zupon commented Jun 8, 2021

@BeckySharp There have been some other updates that haven't made it into this PR yet. Mihai adjusted some rules for expanding on contractions based on an error analysis I did. Now I am still in the process of looking at the failing tests when using CluCore and comparing those with the outputs from master using FastNLP.

There haven't been any more updates to this branch yet though, and I'm not sure when it'll be ready. If you want we can close this PR for now (without merging) until this is further along.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants