Sandcastle Spell Checking?

Jan 16, 2009 at 7:57 PM
Is there a spellchecker that I can use with Sandcastle?  I am especially interested in spell checking my code comments.  Obviously the spell checker would need to include the ability to add words.

Pat O
Jan 16, 2009 at 8:23 PM
See this thread for information
http://www.codeplex.com/Sandcastle/Thread/View.aspx?ThreadId=32096

Best regards,
Paul.
Jan 16, 2009 at 9:49 PM
Thanks, I had seen that discussion.  First off, I have no idea what MAML is, but the real issue is that this is not the kind of spell checker I am looking for.  Many people work on this SDK.  I just want to run a spell check each time I produce the help file to verify that no spelling errors have snuck in.  In short, I was hoping that Sandcastle could do the spell checking as it ran.  Is there anything like that?

Pat O
Jan 16, 2009 at 10:31 PM
>>Pat O: Is there anything like that?
The answer is in my post in that thread; it has nothing to do with MAML.

BTW, you may need MAML if you have to write tutorials and how-tos for the SDK using Sandcastle.

Best regards,
Paul.
Jan 17, 2009 at 7:02 PM
Thanks, we are looking at MAML and I do want to include code examples eventually.  Getting back to my original question, that conversation has links to Visual Studio add-ons that allow for interactive spell checking.  What I am looking for is a final check at build time, or as part of creating the documentation.  Obviously I expect people to spell words correctly, but I would like to check.  Think of it as QA.
I assume that Sandcastle does not have any spell checking support.

Pat O
Editor
Jan 17, 2009 at 9:55 PM
Sandcastle doesn't have any means of spell checking content right now but that isn't to say you couldn't create a build component that does that.  I have a spell checking library I use in other projects that I got off of Code Project.  I'll see if it's possible to use it in a build component to at least get a list of misspelled words that can be logged for review.

Eric
Jan 18, 2009 at 3:18 AM
Hi Eric,

A spell-checking build component could be useful.  If in your investigation you find that it's worth the effort to write one, please consider supporting a simple design that only outputs a warning message upon the discovery of a misspelled word, including in the warning the line number and column (if possible) and the misspelled word itself.  Please also add support for specifying the dictionary as an XML file (I have no requirements on the schema at the moment :) so that business terms and words that have special meaning in the context of a custom program can be included manually (in the same way that Code Analysis supports an XML dictionary of new words).

Ultimately, I'd like to be able to run an automatic build in the background using a Sandcastle config file that contains only this build component (and any required plumbing) to report a list of spelling errors before the documentation is generated (during the design and authoring phase), and to use the information about the location of each misspelled word (file, line number and column) to highlight the word in an editor.  Please consider this usage in your design so that I don't have to redo all of your hard work in a custom spell-checking build component of my own :)
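
For illustration only, the two requests above (an XML word list and a warning that carries file, line, column, and word) could look something like the following Python sketch. The dictionary layout, the function names, and the warning text are all assumptions made up for this example; the thread sets no schema, and this is not Sandcastle or NetSpell code.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary layout; the thread does not define a schema.
DICTIONARY_XML = """
<Dictionary>
  <Words>
    <Recognized>
      <Word>Sandcastle</Word>
      <Word>MAML</Word>
      <Word>DocProject</Word>
    </Recognized>
  </Words>
</Dictionary>
"""


def load_custom_words(xml_text):
    """Return the lower-cased set of words listed in <Word> elements."""
    root = ET.fromstring(xml_text)
    return {
        word.text.strip().lower()
        for word in root.iter("Word")
        if word.text and word.text.strip()
    }


def format_warning(source, line, column, word):
    """Build one warning line in an assumed file(line,column) style."""
    return f"{source}({line},{column}): warning: possible misspelling '{word}'"


if __name__ == "__main__":
    print(sorted(load_custom_words(DICTIONARY_XML)))
    print(format_warning("topic.aml", 3, 14, "teh"))
```

A file(line,column) prefix is used here because many build tools and editors can turn that shape into a clickable location, which is the usage described above.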

Thanks, 
Dave
Jan 18, 2009 at 9:14 PM
>>Pat O: Obviously I expect people to spell words correctly, but I would like to check.  Think of it as QA.
Okay, I now understand what you mean, thanks for the clarifications. I will look into it.

Best regards,
Paul.
Editor
Jan 18, 2009 at 9:41 PM
Edited Jan 18, 2009 at 9:46 PM
Hi Dave,

I've got the core processing done in a test application and it looks like it'll work fine.  It works with XML comments and MAML topics and spell checks the content of text nodes and certain attributes that can contain text (i.e. title, lead, and altText).  It uses an XML text reader on the fragments and I've got it set up to ignore certain elements like c, code, codeEntityReference, etc. that most likely do not contain anything worth spell checking, which cuts down on false reports.  The elements to ignore as well as the attributes to spell check will be part of the configuration.  A list of words to ignore will be supported too.  It'll be just that though, a simple list of words, as that's what the library (NetSpell) takes.
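
As a rough sketch of the approach described above (not the actual component, which is a .NET build component using NetSpell), the same idea can be shown in Python: walk the XML, skip the ignored elements, and check text nodes plus selected attributes. The tiny `KNOWN_WORDS` set is a stand-in for a real dictionary, and all names here are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Configuration values named in the post; the real component reads them from config.
IGNORED_ELEMENTS = {"c", "code", "codeEntityReference"}
CHECKED_ATTRIBUTES = {"title", "lead", "altText"}

# Stand-ins for a real dictionary and the user's ignore list (assumptions).
KNOWN_WORDS = {"returns", "the", "of", "a", "setting"}
IGNORED_WORDS = {"sandcastle"}

WORD_RE = re.compile(r"[A-Za-z']+")


def misspelled_in(text):
    """Return words from text that are in neither word set."""
    return [
        word
        for word in WORD_RE.findall(text or "")
        if word.lower() not in KNOWN_WORDS and word.lower() not in IGNORED_WORDS
    ]


def check_element(elem, found):
    """Collect misspellings from elem, skipping ignored elements entirely."""
    if elem.tag in IGNORED_ELEMENTS:
        return
    found.extend(misspelled_in(elem.text))
    for name, value in elem.attrib.items():
        if name in CHECKED_ATTRIBUTES:
            found.extend(misspelled_in(value))
    for child in elem:
        check_element(child, found)
        # Tail text follows the child but belongs to this element, so it is
        # checked even when the child itself was ignored.
        found.extend(misspelled_in(child.tail))


def spell_check(xml_text):
    found = []
    check_element(ET.fromstring(xml_text), found)
    return found


if __name__ == "__main__":
    sample = '<summary title="Setings">Returns the <c>GetValu</c> valeu of a setting.</summary>'
    print(spell_check(sample))
```

In the sample, `GetValu` is not reported because it sits inside an ignored `c` element, which is the false-report reduction mentioned above.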

Right now, it outputs an XPath expression that indicates the element containing the misspelled and/or doubled words and a list of the words and the number of times they occur within the element.  Adding line and column info is possible but just be aware that it may not be exact.  For XML comments, line numbers will be relative to the start of the member, not the XML comments file itself, nor will they match up with the comments in the code.  Line numbers in MAML topics will probably be off by at least one as well.  CR/LFs within text nodes would also have to be taken into account when figuring out the actual location of the word.  It may be close enough to get you in the vicinity, but not always spot on.  Spell checking within an editor is probably best left to a dedicated feature within the editor rather than hacking something together with the output of build assembler.  It would probably be much faster too and could be done in real time as the document is edited for highlighting purposes.
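
To make the output shape described above concrete, here is a small Python sketch that reports, per element, a positional XPath-style path with counts of misspelled and doubled words. The path format, the result structure, and the stand-in word set are assumptions for illustration; the real component's output may differ.

```python
import re
from collections import Counter
import xml.etree.ElementTree as ET

# Stand-in for a real dictionary (assumption).
KNOWN_WORDS = {"this", "is", "a", "another", "all", "good", "here"}
WORD_RE = re.compile(r"[A-Za-z']+")


def direct_text(elem):
    """Text directly inside elem: its own text plus the tails of its children."""
    parts = [elem.text or ""]
    parts.extend(child.tail or "" for child in elem)
    # Joining with a space can make words on either side of an inline child
    # look adjacent; a fuller version would track that boundary.
    return " ".join(parts)


def report(elem, path, results):
    """Record misspelled and doubled word counts for elem under its path."""
    words = WORD_RE.findall(direct_text(elem))
    misspelled = Counter(w for w in words if w.lower() not in KNOWN_WORDS)
    doubled = Counter(
        a.lower() for a, b in zip(words, words[1:]) if a.lower() == b.lower()
    )
    if misspelled or doubled:
        results[path] = {"misspelled": dict(misspelled), "doubled": dict(doubled)}
    seen = Counter()
    for child in elem:
        seen[child.tag] += 1
        report(child, f"{path}/{child.tag}[{seen[child.tag]}]", results)


def build_report(xml_text):
    root = ET.fromstring(xml_text)
    results = {}
    report(root, f"/{root.tag}", results)
    return results


if __name__ == "__main__":
    sample = "<topic><para>This is is a tset. Another tset.</para><para>All good here.</para></topic>"
    for path, info in build_report(sample).items():
        print(path, info)
```

Reporting by element rather than by line sidesteps the position problems noted above, at the cost of leaving the reader to find the word within the element.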

Eric
Jan 19, 2009 at 12:11 AM
Hi Eric,

It sounds like you've hit the target.

Note that I agree with your suggestion to do spell-checking in the editor itself, but my point was that I'd like to have DocProject add a list of misspelled words to Visual Studio's task list and make each word clickable, which would open the correct editor and highlight the word independent of the editor's internal spell-checking functionality.  This has been on my to-do list for a while but it wasn't a high priority.  Seeing as you've found time to work on the build component, maybe I'll find the time to integrate it :)

Thanks, 
Dave
Editor
Jan 19, 2009 at 5:22 AM
It would appear that line and column info may not be that useful.  Internally, the BuildAssembler components used to load the files use XPathDocument which destroys any layout from the original file.  The line numbers and columns reported usually don't line up with the original file, especially if you spread longer elements with attributes across multiple lines for readability (they end up as one long line in the navigator object).  I'll probably leave them in since it's already coded but make it a configuration option as to whether or not they are reported.

I'm not sure it's that important anyway.  Opening the file with the misspelled words should be sufficient, as you can just invoke the Spell Check feature and have it do the whole document, taking care of all the misspellings at once rather than one at a time.  It might be nice to be able to go to the word if there are only one or two, or if it's just a transposed letter.  To do that, though, it looks like you'd have to parse the files using an XmlReader to preserve the layout, which kind of defeats the purpose of using a build component where you're given the information already.  You might as well do it as a standalone tool or a background thread in the editor or project system that can add the info to the task window with accurate positions.
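
The standalone-tool idea above depends on a streaming parser that still knows where it is in the original file. As a loose Python analog (standing in for a .NET XmlReader, which is what the post refers to), the sketch below uses the standard `xml.parsers.expat` module, whose parser exposes `CurrentLineNumber` and `CurrentColumnNumber` during callbacks. The word set and function name are hypothetical, and the column arithmetic assumes ASCII content.

```python
import re
import xml.parsers.expat

# Stand-in for a real dictionary (assumption).
KNOWN_WORDS = {"some", "here"}
WORD_RE = re.compile(r"[A-Za-z']+")


def locate_misspellings(xml_text):
    """Return (word, line, column) tuples with 1-based positions in the source."""
    hits = []
    parser = xml.parsers.expat.ParserCreate()

    def on_text(data):
        # Position of the first character of this text chunk in the input.
        line = parser.CurrentLineNumber    # 1-based
        col = parser.CurrentColumnNumber   # 0-based
        for match in WORD_RE.finditer(data):
            if match.group().lower() in KNOWN_WORDS:
                continue
            before = data[: match.start()]
            newlines = before.count("\n")
            if newlines:
                word_line = line + newlines
                word_col = len(before) - before.rfind("\n") - 1
            else:
                word_line = line
                word_col = col + match.start()
            hits.append((match.group(), word_line, word_col + 1))

    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return hits


if __name__ == "__main__":
    sample = "<doc>\n  <para>Some tset here</para>\n</doc>"
    print(locate_misspellings(sample))
```

Because the positions come from the raw file rather than from a loaded document object, they match what an editor would show, which is the accuracy that the build component cannot offer. A word split across two text chunks would be missed by this sketch.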

Eric
Jan 19, 2009 at 6:21 AM
Hi Eric,

The idea was that XML documentation comments in source code and XML in VS's XML editor (for MAML) aren't spell-checked, so having a build component output spelling errors for all documents at once seemed like a natural fit to me.

You make a good point though about how changes to insignificant whitespace may present a problem.  Obviously I never put much thought into the implementation because I wasn't planning on implementing this yet :)   I just figured that since somebody else brought it up, including line number and column information would be easy enough to do.

But thanks for helping me to work out the problems.

- Dave