I've used Slate in a production application for about 2 years now, as part of a CMS for medical professionals to write rich content for other medical professionals and patients.
They tend to write the initial content in MS Word, which they are comfortable with, and then paste it into the editor, which has to reconcile it with the site's structural elements. For subsequent editing, Slate works fantastically well, and it allows more complex widgets and unusual design elements to be embedded and edited in the WYSIWYG context of the editor.
When the system was first being built, we trialed 10-20 other solutions in this space, many with the same design concepts as Slate, but something about Slate stuck. Just brilliant.
I have been dealing with paste-from-Word issues as well. The biggest issue is ordered lists. Word is very smart about keeping the ol going even if there are extra line breaks between list items, but most rich text editors mess up the numbering when content is pasted from Word. One thing I noticed about slate.js is that it simply converts the ol into plain text, which preserves the correct numbering. It's a workable solution.
And my question to HN is: why do our web app users keep typing in Word first? I save drafts in local storage, but they still start in Word. Is it familiarity with the formatting? Is it a Start Menu shortcut? What is it? And what can we do to get users to start in our web apps instead?
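To make the numbering point concrete, here is a minimal TypeScript sketch (not Slate's actual code) of what baking the numbers into text amounts to. The `Block` model and `flattenOrderedList` are hypothetical, and the sketch assumes that empty paragraphs between items do not end the list while a paragraph with real content does; Word's own continuation rules depend on its internal list definitions and may differ.

```typescript
// Hypothetical block model for pasted content (not Slate's actual schema).
type Block =
  | { kind: "listItem"; text: string }
  | { kind: "paragraph"; text: string };

// Turns ordered-list items into plain-text lines with the number baked in.
// Assumption: empty paragraphs (stray line breaks) do not end the list,
// but a paragraph with real content does, and the count restarts after it.
function flattenOrderedList(blocks: Block[]): string[] {
  const lines: string[] = [];
  let counter = 0;
  for (const block of blocks) {
    if (block.kind === "listItem") {
      counter += 1;
      lines.push(`${counter}. ${block.text}`);
    } else {
      if (block.text.trim() !== "") counter = 0; // real content ends the list
      lines.push(block.text);
    }
  }
  return lines;
}

// Example: the empty paragraph between items does not break the numbering.
const lines = flattenOrderedList([
  { kind: "listItem", text: "Assess airway" },
  { kind: "paragraph", text: "" },
  { kind: "listItem", text: "Check bleeding" },
]);
// lines is ["1. Assess airway", "", "2. Check bleeding"]
```

The trade-off of this design: the numbers survive the paste intact, but because they are now literal text, inserting or deleting an item later won't renumber the rest.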
I think they are comfortable with it from using it for a long time.
In fact, I find people do everything in MS Office products, often in ways you wouldn't expect the software to be used. E.g. PowerPoint is the de facto tool for designing conference posters. To create 'versions' of a document, people will copy the file and add a number to the end of its name.
I think we often get so myopic about our perspective on technology that we forget that non-technologists have a vastly different relationship with it.
(CKEditor team member speaking) Properly handling pasting from Word really takes a lot of time, and it's hardly possible to provide a high-quality solution alone. We will deliver basic (by our standards) support for pasting from Office in our most recent editor (https://ckeditor.com/ckeditor-5/) in 2-3 weeks, and it has already taken many weeks of development (see https://github.com/ckeditor/ckeditor5-paste-from-office/issu...), even with our years of experience in rich text editing and with this particular feature.
If pasting from Word is an absolutely critical feature for you, you may want to check older editors on the market, like CKEditor 4 or even... TinyMCE, one of our competitors. These editors have been on the market for 6+ years and have had enough time and people to deal with the crap that MS Word produces: correctly preserving as much formatting as possible, without wasting your end users' time recreating the same content in an online rich text editor.
It seems to somewhat work in the Paste HTML example [0], for which the source is available [1]. I just did a cursory test with some simple formatting, so YMMV, but it may be a good starting point.
My issue is mainly that Word doesn't really maintain the same structured hierarchy in its XML that HTML would; it's more of a sequential format. The users wanted a way to indicate certain types of content or annotations, and they do so by coloring text in certain ways. In practice, though, I found this very hard to reconcile, because there is a lot of invisible formatting in Word: a colored element may terminate and start again, with a new invisible element in between. Invisible to the user, but very visible to the parser.
Essentially it's a matter of removing all the spurious elements (`<o:p>`, invisible empty formatting, etc.) and then reasoning about what remains. Much of that involves walking the tree to inspect neighbouring nodes, because the fact that they sit next to each other can indicate something.
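A rough TypeScript sketch of that two-step approach, using a hypothetical, simplified node model rather than real DOM nodes. First walk the tree, dropping spurious elements (here just `o:p`) and empty text while collecting text runs with their inherited colour; then inspect neighbours and merge runs of the same colour, so an annotation split by an invisible element becomes one run again. The `PNode`/`Run` types, `SPURIOUS_TAGS` and `normalise` are illustrative names, not anyone's actual implementation.

```typescript
// Hypothetical, simplified node model; a real version would walk DOM nodes.
type PText = { type: "text"; value: string };
type PElement = { type: "element"; tag: string; color?: string; children: PNode[] };
type PNode = PText | PElement;

// A flat run of text together with the colour it effectively inherits.
type Run = { text: string; color: string | null };

// Elements that carry no meaning for the site's schema (assumed; grow from real samples).
const SPURIOUS_TAGS = new Set<string>(["o:p"]);

// Step 1: depth-first walk that skips spurious elements and empty text,
// recording each text node with its nearest coloured ancestor's colour.
function collectRuns(node: PNode, inherited: string | null, out: Run[]): void {
  if (node.type === "text") {
    if (node.value !== "") out.push({ text: node.value, color: inherited });
    return;
  }
  if (SPURIOUS_TAGS.has(node.tag)) return;
  const color = node.color ?? inherited;
  for (const child of node.children) collectRuns(child, color, out);
}

// Step 2: compare each run with its previous neighbour and merge runs that
// share a colour, undoing splits caused by invisible elements in between.
function mergeRuns(runs: Run[]): Run[] {
  const merged: Run[] = [];
  for (const run of runs) {
    const last = merged[merged.length - 1];
    if (last !== undefined && last.color === run.color) {
      last.text += run.text;
    } else {
      merged.push({ ...run }); // copy so the input runs are not mutated
    }
  }
  return merged;
}

function normalise(root: PNode): Run[] {
  const runs: Run[] = [];
  collectRuns(root, null, runs);
  return mergeRuns(runs);
}

// Example: a red annotation split by an empty span, plus Word's <o:p> filler.
const pasted: PNode = {
  type: "element",
  tag: "p",
  children: [
    { type: "element", tag: "span", color: "red", children: [{ type: "text", value: "Key " }] },
    { type: "element", tag: "span", children: [] },
    { type: "element", tag: "span", color: "red", children: [{ type: "text", value: "finding" }] },
    { type: "text", value: " plain" },
    { type: "element", tag: "o:p", children: [{ type: "text", value: "\u00a0" }] },
  ],
};
const runs = normalise(pasted);
// runs is [{ text: "Key finding", color: "red" }, { text: " plain", color: null }]
```

Flattening to runs first makes the neighbour check trivial; the cost is that any structure you do want to keep (tables, lists) has to be handled before this step.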
Look, you may be recoiling in horror by now; it sounds horrific. Actually, what we have is a remarkably stable system, all things considered, but it was built up over time. I think the only approach you can take is to write a large number of unit tests for the schema normaliser, using real MS Word samples and expected outputs, and then really put the system through its paces. Every time you find an example that breaks your model, add a unit test for that snippet, and evolve.
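A minimal TypeScript sketch of that regression-suite idea. `runCases`, `stripWordDebris` and the fixture snippets are hypothetical and simplified: in practice the inputs would be real HTML copied from Word, the normaliser would work on a parsed tree rather than regexes, and you would probably use your usual test framework instead of a hand-rolled runner.

```typescript
// A regression case: a pasted snippet plus the output we expect from the normaliser.
interface Case {
  name: string;
  input: string;
  expected: string;
}

// Runs every case and collects all failures instead of stopping at the first,
// so a newly discovered Word quirk is reported alongside any regressions.
function runCases(normalise: (html: string) => string, cases: Case[]): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = normalise(c.input);
    if (actual !== c.expected) {
      failures.push(`${c.name}: expected ${JSON.stringify(c.expected)}, got ${JSON.stringify(actual)}`);
    }
  }
  return failures;
}

// Stand-in normaliser for illustration only: strips <o:p> elements and empty
// <span>s with regexes. A real normaliser should operate on a parsed tree.
function stripWordDebris(html: string): string {
  return html
    .replace(/<o:p>[\s\S]*?<\/o:p>/g, "")
    .replace(/<span[^>]*>\s*<\/span>/g, "");
}

// Simplified fixtures; each snippet that breaks the model gets appended here.
const cases: Case[] = [
  { name: "trailing o:p", input: "<p>Dose<o:p>&nbsp;</o:p></p>", expected: "<p>Dose</p>" },
  { name: "empty span between runs", input: "<p>a<span style='color:red'></span>b</p>", expected: "<p>ab</p>" },
];

const failures = runCases(stripWordDebris, cases);
if (failures.length > 0) throw new Error(failures.join("\n"));
```

Keeping the expected outputs next to the inputs means a fix for one quirk can't silently break another: the whole table runs on every change.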
God forbid a Word update ever introduces a new format.
While it looks pretty simple, it cleans MS Word and browser artifacts out of pasted markup pretty well.
But I must admit that such simplicity is possible only with sciter (which html-notepad is based on). E.g. canonicalizeDOM gets called before the content appears in the target document, so none of this affects the undo/redo stack, etc.
It was an attempt to provide a comprehensive, in-depth, accessible information resource for maxillofacial / head & neck patients and practitioners. It was part of the movement to empower patients by giving them information traditionally held by clinicians, but also to help clinicians become aware of scientific research from adjacent disciplines that might be useful.
The CMS design comes into it because when you have a larger group of non-technical authors, you need an accessible way to manage the content. That was how it was initially conceived, anyway; in practice, more development time had to go towards handling the users' existing workflow (MS Word) than towards migrating them into the CMS directly. A real eye-opener for me about user-centred design.
Very cool project - thanks for giving me the background. Just clicked around and enjoyed checking it out. Sounds like you learned a lot and built a valuable tool, well done!