Show HN: Fixmydocuments.com – Transform any document into an optimized version
fixmydocuments.com
A few months ago, I submitted a webapp that lets you take any YouTube video and turn it into a polished written document in markdown format. I got feedback from people that they wanted something that could work for any audio file. Separately from that, I submitted an open-source project (llm_aided_ocr) a few months back that lets you "upgrade" the output of tesseract OCR, using an LLM to correct transcription errors and also to convert the formatting to use markdown. Well, I decided to combine all those features and more in my newest app, called FixMyDocuments.com.
You can submit any kind of document: PDFs (including scanned PDFs that require OCR), MS Word and PowerPoint files, images, and audio files (mp3, m4a, etc.). Each one gets turned into a highly optimized version in nice markdown formatting, from which HTML and PDF versions are automatically generated. Once converted, you can also edit it directly on the site using the built-in markdown editor, which saves a running revision history and regenerates the PDF/HTML versions.
In addition to just getting the optimized version of the document, you can also generate many other kinds of "derived documents" from the original: interactive multiple choice quizzes that you can actually take and get graded on; slick looking presentation slides as PDF or HTML (using LaTeX and Reveal.js); an in-depth summary; a concept mind map (using Mermaid diagrams) and outline; custom lesson plans where you can select your target audience; a readability analysis and grade-level versions of your original document (good for simplifying concepts for students); Anki flashcards that you can import directly into the Anki app or use on the site in a nice interface; and more.
For any HTML generated content, you can also host it with one click and you get a unique URL that you can distribute to anyone for viewing, and they don't need to have an account to see it.
This has been a lot more challenging to make than I originally guessed it would be, but I'm pretty pleased with the final output quality, which was a result of tons of prompt engineering and iteration and chaining together different prompts in pipelines. The mind map generation in particular is ~2,700 lines of Python code and involves many dozens if not hundreds of separate LLM inference calls to generate a single mind map from a source document. What I think is interesting about this is that, even though one theoretically could do many of these things using ChatGPT manually, it wouldn't be practical because of the many stages of complex logic involved in combining and transforming the LLM outputs.
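To give a rough idea of what "chaining prompts in pipelines" can look like, here is a minimal sketch in Python. It is not the site's actual code: the function names, the chunk size, and the `llm` callable (anything that takes a prompt string and returns text) are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns the model's text.
LLM = Callable[[str], str]


def split_into_chunks(text: str, max_chars: int = 2000) -> List[str]:
    """Group paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def run_pipeline(text: str, stage_prompts: List[str], llm: LLM,
                 max_chars: int = 2000) -> str:
    """Run every chunk through each prompt stage in order, then rejoin."""
    results = []
    for chunk in split_into_chunks(text, max_chars):
        out = chunk
        for prompt in stage_prompts:
            # The output of one stage becomes the input of the next.
            out = llm(f"{prompt}\n\n---\n{out}")
        results.append(out)
    return "\n\n".join(results)
```

Even this toy version makes one call per chunk per stage, so a long document with a handful of stages quickly reaches dozens of inference calls, which is the part that is impractical to do by hand in a chat window.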
There was also a lot of manual "quality control" filtering/processing involved to remove any traces of the LLM inserting irrelevant text, such as preambles/introductory comments (which it sometimes adds even when explicitly prompted not to).
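As an illustration of that kind of filtering, here is a small Python sketch that drops chatty lead-in lines from the top of a model response. The patterns are assumptions picked for the example, not the list the site uses; a real filter would need to be tuned against the output of the specific model.

```python
import re

# Hypothetical patterns for typical chatty openers (illustrative only).
PREAMBLE_PATTERNS = [
    r"^(?:sure|certainly|of course|okay)[,!.]",   # "Sure, ..." / "Certainly!"
    r"^here(?:['’]s| is| are)\b.*:$",             # "Here is the text:"
]
_PREAMBLE_RE = re.compile(
    "|".join(f"(?:{p})" for p in PREAMBLE_PATTERNS), re.IGNORECASE
)


def strip_preamble(text: str) -> str:
    """Remove blank lines and preamble-looking lines from the start of text.

    Only leading lines are examined; the rest of the response is left alone.
    """
    lines = text.strip().splitlines()
    while lines and (
        not lines[0].strip() or _PREAMBLE_RE.match(lines[0].strip())
    ):
        lines.pop(0)
    return "\n".join(lines)
```

For example, `strip_preamble("Sure, here is the corrected text:\n\n# Title")` returns `"# Title"`, while a response that already starts with real content is returned unchanged.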
Anyway, happy to answer any questions people have about it.
For a real-world test, I just tried it with the document featured in another submission today about OCR:
https://news.ycombinator.com/item?id=42443022
My message from that discussion:
Out of curiosity, I tried submitting the first 200 pages of the PDF he used to my new tool that I also submitted today to Show HN (fixmydocuments.com), and it generated the following without any further interaction besides submitting the PDF file:
https://fixmydocuments.com/api/hosted/m-moires-de-saint-simo...
I think it's not a bad result, and any minor imperfections could be revised easily in the markdown. My feature to turn the document into presentation slides got a bit confused because of the French language, so some slides ended up getting translated into English. But again, it wouldn't be hard to revise the slide contents using ChatGPT or Claude to make them all either French or English:
https://fixmydocuments.com/api/hosted/m-moires-de-saint-simo...
Looks solid, going to try it out.
I'm going to be __that guy__ and just ask: is the feature set similar to llamaparse, or is this llamaparse + llm?
This does a lot more… the optimization stage is incredibly elaborate. And some of the derived document type generation is complex enough to be a standalone app in my opinion. It’s really a suite of tools for generating new types of documents from your original document without any additional input required from the user besides the document itself.