Some solution for avoiding the lowercasing #861

domenic · 2016-11-03T16:59:05Z

I thought there was an open issue for this but I didn't see it...

When you reference "event handler IDL attributes", or anything with the word "ECMAScript", the bibliography entry gets lowercased.

Last time we talked about allowing dfns to declare some kind of canonical casing. Maybe just as simple as data-dfn-preserve-case or something.

tabatkins · 2016-11-03T18:26:56Z

A boolean isn't good, as illustrated by your first example - if you happen to start your sentence with that definition, it'll preserve the fact that E is capitalized, which isn't what you want.

I have two possible ideas:

1. Have an attribute that takes one or more comma-separated phrases. It ensures these phrases appear in the definition text. When printing in indexes/etc, it ensures those phrases have the correct (original) capitalization. So for your two examples, you'd have preserve-case='IDL' and preserve-case='ECMAScript'.
2. Have a capitalization mask attribute. The value must be a sequence of periods, spaces, and upper/lowercase letters; periods indicate caseless letters, upper/lowercase indicate case-important letters, and spaces indicate spaces. The mask must match the definition text exactly. So for your two examples you'd have capitalization-mask="..... ....... IDL .........." and capitalization-mask="ECMAScript".

First is definitely easier, but potentially has issues if the word appears more than once, and only one has an important case. That seems pretty rare, tho. Second is fully unambiguous, but kinda annoying to write out.
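
A rough sketch of how the two ideas above could behave, in Python. This is not Bikeshed's code: the function names `apply_preserve_case` and `apply_capitalization_mask` are hypothetical, and the sketch assumes ASCII text so that lowercasing never changes a string's length. Note that the first helper only restores the first occurrence of each phrase, which is exactly the ambiguity mentioned above.

```python
def apply_preserve_case(term: str, phrases: list[str]) -> str:
    """Idea 1 (hypothetical): lowercase `term`, then restore the casing
    of each listed phrase at its first occurrence."""
    haystack = term.lower()
    chars = list(haystack)
    for phrase in phrases:
        start = haystack.find(phrase.lower())
        if start == -1:
            # The attribute is supposed to ensure the phrase is present.
            raise ValueError(f"{phrase!r} does not appear in {term!r}")
        chars[start:start + len(phrase)] = phrase
    return "".join(chars)


def apply_capitalization_mask(term: str, mask: str) -> str:
    """Idea 2 (hypothetical): '.' marks a caseless letter, a letter in
    the mask fixes the casing, and a space must line up with a space."""
    if len(mask) != len(term):
        raise ValueError("mask must match the definition text exactly")
    out = []
    for ch, m in zip(term, mask):
        if m == ".":
            out.append(ch.lower())
        elif m == " ":
            if ch != " ":
                raise ValueError("mask space does not line up with a space")
            out.append(" ")
        else:
            if m.lower() != ch.lower():
                raise ValueError(f"mask letter {m!r} does not match {ch!r}")
            out.append(m)
    return "".join(out)
```

With either helper, a definition written as "Event handler IDL attributes" at the start of a sentence would be indexed as "event handler IDL attributes" without the author supplying an lt.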

Opinions?

domenic · 2016-11-03T18:33:30Z

The idea would be you'd use the boolean along with lt="". That way you'd only need the boolean unless you're starting a sentence.

So, comparing:

My idea: <dfn preserve-case>event handler IDL attribute</dfn> or <dfn lt="event handler IDL attribute" preserve-case>Event handler IDL attribute</dfn>
Your idea (1): <dfn preserve-case="IDL">event handler IDL attributes</dfn> or <dfn preserve-case="IDL">Event handler IDL attributes</dfn>
Your idea (2): <dfn capitalization-mask="..... ....... IDL ..........">event handler IDL attributes</dfn> or <dfn capitalization-mask="..... ....... IDL ..........">Event handler IDL attributes</dfn>

I like my idea the most here, but your idea (1) is usable too. I really don't want to have to count the periods for your idea (2).

tabatkins · 2016-11-03T20:17:09Z

> The idea would be you'd use the boolean along with lt="". That way you'd only need the boolean unless you're starting a sentence.

The problem with this is that you have to be aware of the problem every time you define a term, so you can catch yourself and provide an lt with the correct casing when necessary. That's a footgun. Both of my proposed solutions, on the other hand, Just Work© when you use them, regardless of whether the dfn is starting the sentence or not.

I agree that the masking idea is annoying. I can accept the very slim possibility of a capitalization misfire from my first idea instead. ^_^

domenic · 2016-11-03T20:23:30Z

> The problem with this is that you have to be aware of the problem every time you define a term, so you can catch yourself and provide an lt with the correct casing when necessary. That's a footgun.

I don't understand why this is more of a problem than having to be aware of the problem when you define the term, and thus using preserve-case="IDL"?

tabatkins · 2016-11-03T20:45:46Z

Because it's a non-obvious edge-case. You do have to be aware of the capitalization issue in the first place to know that you should use the attribute, but you can get that from things looking weird in the index. But then, once you learn about the concept, and start applying it on your own by reflex, having to additionally remember that if you have some sort of unimportant capitalization in the phrase, you have to normalize it with an lt, is something that will probably often be skipped (and it's not quite as obvious of an error in the index).

Versus my suggestion, which still requires you to remember to apply a caps attribute when necessary, but then automatically ignores unimportant capitalization without you having to do anything else.

domenic · 2016-11-03T20:46:35Z

I see. Sounds good!

annevk · 2016-11-18T10:21:56Z

How about we consider what's inside the <dfn> as canonical casing? If you happen to start a sentence with the <dfn>, you can use <dfn lt="event handler IDL attribute">Event handler IDL attributes</dfn>. That is, we'd use the lt attribute for the canonical form just as we already do. Matching would continue to be ASCII case-insensitive (or you store the lowercased form as well and only use it for matching purposes).

annevk · 2016-11-18T10:25:01Z

I think my solution would be much more natural going forward and require less overall work too (especially considering that going forward we wouldn't need special annotations each time we introduce an uppercased term).

tabatkins · 2016-11-19T00:42:20Z

I already explained why "use the linking text as canonical casing by default" doesn't work well - it requires people to remember to do something when casing isn't important for their definition (which is the vast majority of cases) and the definition happens to be in a particular part of the sentence. This is much much harder than having to remember to do something when capitalization is important, not to mention that capitalization being important is much rarer than it being unimportant.

As I've mentioned in other issues, I try to optimize toward stability and predictability, even if it means a little more work than other solutions (such as having to mark up every cap-important definition, rather than just the subset of cap-unimportant definitions that start a sentence).

annevk · 2016-11-19T06:35:16Z

I guess I'm questioning how common that really is. I don't recall a single instance of that myself.

tabatkins · 2016-11-21T23:13:03Z

While I don't recall where they were (and am not going to spend 20 minutes trying to track them down), I've definitely put definitions at the start of a sentence before.

annevk · 2016-11-22T07:54:45Z

Sure, I'm just saying they're far less common than definitions where case matters and that I think that will also be true going forward.

tabatkins · 2016-11-28T23:34:41Z

Yeah, I don't disagree with you on that. My argument against is that if I do what you ask, then the case where you have to correct things is a very rare positional thing where the problem is non-obvious (casing doesn't matter for links in most circumstances, but you have to recognize this one spot where casing does matter but shouldn't, and give the correct casing which, again, doesn't actually matter). Requiring case-matters definitions to mark themselves as such means somewhat more definitions have to do something, but the situation where you have to do it is where casing matters, and you're providing the correct case. This is way easier to recognize and think about.

annevk · 2016-11-29T07:58:13Z

So I think you are proposing something different. All I want is to preserve the case of the definition, but matching for that definition should still happen using ASCII case-insensitive matching as before.

So that whenever Bikeshed presents the definition, it uses "Canonical CASE", but I can still write "canonical case", "canonical Case", etc. So the only impact this has on existing usage is that some terms end up being correctly cased (ASCII case-insensitive, ASCII whitespace, etc.) and others end up with an initial uppercased character due to being defined at the start of a sentence. No matching would be impacted.

The fix for existing usage where the initial uppercased character is not warranted would be to add an lt attribute where it's not uppercased. Similarly how we sometimes have to add an lt attribute to avoid defining a plural.

I think that kind of change is much more in line with expectations about how things intuitively work too. I don't think we should ever have case-sensitive matching for terms. That would be too confusing for readers.

tabatkins · 2016-11-29T21:54:49Z

No, we're definitely still talking about the same thing. That's precisely why I'm being more careful about which case has to trigger this - because the casing only matters for the purpose of printing, having to always keep in mind "does this term, for which casing doesn't matter, happen to have the wrong casing for printing in an index, where casing matters?" is bad and people will miss it a lot. Vs having to remember "this is a term that has important casing", which you can remember to mark with the "preserve this casing" attribute because, well, casing matters for that term. The mental work required to remember the tag and notice that you need to apply it is a lot less in the latter case.

tabatkins · 2017-01-25T23:50:44Z

I'm not sure why I was so opinionated about a boolean being insufficient. A boolean + requiring lt to be specified seems like more than enough. I'm pinging @plinss right now about starting to track this in Shepherd, and then I'll make Bikeshed start honoring it.

frivoal · 2023-08-04T15:40:36Z

I'd suggest there's an easy subcase: when the entire term in the <dfn> is uppercase, then it's not a matter of being at the start of the sentence or not, and it's 99% sure that it's an acronym of some kind.

Further, the only case that is really ambiguous is when the first letter is uppercase, because then you don't know if it is because it starts a sentence, or because it's supposed to be anyways.

So, regardless of overrides, I'd suggest a better default behavior would be:

1. If the <dfn> doesn't start with an uppercase letter, or if all letters in the <dfn> are uppercase, treat the <dfn> as canonical and don't fiddle with the casing
2. else (i.e. the first letter is uppercase, but not all letters are), if at least one other letter within the first word (i.e. before the first space, or before the end of string) is also uppercase, treat the <dfn> as canonical and don't fiddle with the casing (this catches cases like "ECMAScript", or "WebAssembly" or "DOM Event")
3. else, switch the case of the first letter from upper to lower.

Now, sure, these are heuristics, and they could occasionally go wrong (and so we can still want some method to override). But I think this would be far more likely to do what authors and readers expect. At least cases 1 and 2 are almost certainly correct. Case 3 could go either way, but it will only be wrong in the cases where the current approach is wrong, and will be right far more often.
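
The three rules above can be sketched in Python as follows. This is an illustration of the proposal, not the code that eventually landed in Bikeshed; the function name `canonical_casing` is hypothetical, and "first word" is taken to mean everything before the first space.

```python
def canonical_casing(term: str) -> str:
    """Apply the proposed default casing heuristics to a <dfn>'s text."""
    letters = [c for c in term if c.isalpha()]

    # Rule 1: no initial capital, or every letter is uppercase (likely
    # an acronym): keep the text as written.
    if not term or not term[0].isupper():
        return term
    if all(c.isupper() for c in letters):
        return term

    # Rule 2: another capital inside the first word (e.g. "ECMAScript",
    # "WebAssembly", "DOM Event"): keep the text as written.
    first_word = term.split(" ", 1)[0]
    if any(c.isupper() for c in first_word[1:]):
        return term

    # Rule 3: otherwise assume the initial capital only marks the start
    # of a sentence, and lowercase it.
    return term[0].lower() + term[1:]
```

For example, "Event handler IDL attributes" becomes "event handler IDL attributes", while "ECMAScript" and "DOM Event" are left alone. A term such as "Canonical CASE" falls under rule 3 and loses its initial capital, which is the kind of case an override would still be needed for.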

tabatkins · 2023-08-04T18:41:22Z

These heuristics seem pretty reasonable. I'm not fussed about having to necessarily provide an override immediately; in the worst case we're no worse off than we are today.

woutermont · 2023-08-04T18:58:38Z

Not sure that I like the prospects of this. It's much easier to remember a rule like "what I write between dfn tags is canonical unless I use lt" than this arbitrary combination of heuristics.

tabatkins · 2023-08-04T20:24:53Z

It will almost always be "what you write between the tags", except if you start a sentence with a definition and aren't using any other capitals (then it'll lowercase, which you will likely expect anyway).

The only problematic case is starting a sentence with a definition that also uses several capital letters, in which case the initial cap from starting a sentence will be taken as canonical.

I think that, with these heuristics for the body text, I can just take lt text as completely canonical, tho.

woutermont · 2023-08-05T05:48:18Z

What about definition lists? Would you make an exception for those?

annevk mentioned this issue Nov 18, 2016

Reconsider lowercasing definitions #881

Closed

domenic mentioned this issue Jan 9, 2017

Various "terms defined by this specification" are miscapitalized whatwg/webidl#172

Open

tabatkins added the enhancement label May 22, 2017

TimothyGu mentioned this issue Sep 9, 2017

[=ref=] syntax should work on abstract-op type definitions too #809

Closed

domenic mentioned this issue Oct 23, 2017

WebAssembly JS and Web integration spec in Bikeshed WebAssembly/spec#591

Merged

sidvishnoi mentioned this issue Jun 12, 2020

Access to originally defined external terms #1705

Open

dontcallmedom mentioned this issue Jun 25, 2020

Refine Specifications Definitions data model w3c/reffy#336

Closed

dontcallmedom mentioned this issue Jun 21, 2021

Document case-sensitivity for definition types speced/spec-dfn-contract#3

Open

tabatkins mentioned this issue Oct 31, 2024

Provided custom definitions are all lowercase #2941

Closed

tabatkins closed this as completed in cce96c8 Nov 4, 2024

tabatkins mentioned this issue Dec 6, 2024

Release Notes #1773

Open
