Introduction
I was talking to my father about the Lewis and Clark expedition of 1804.
I found the original document (in the public domain) on the gutenberg project’s webpage. I translated a couple of entries for him and he looked interested. Unfortunately I couldn’t find any French translation of these work for him to read.
Given the simple nature of the medium (cleanly separated entries with a title), we wondered if we could use ChatGPT to perform the translation. A couple of manual tests showed that the result was very good. We were really surprised by the quality of the translation.
I decided to build a simple script (Python / Async / openai lib) to translate every single entries (~ 1600) and generate a file in markdown. pandoc was then used to convert it into epub and pdf. It is really more of an afternoon hack than a proper “project”.
The model used is GPT4-Turbo (gpt-4-1106-preview
). It was a fun afternoon project, and who knows, maybe someone else will be able to use it in the future.
Github repository: https://github.com/Blizarre/lewis_clark_journal_french_translation
Original and translation quality
The original document is not perfect, with plenty of abbreviations, old-fashioned words, truncated phrases and OCR errors. But ChatGPT4 handled them pretty well.
[Clark, May 29, 1804]
Tuesday 29th May Sent out hunters, got a morning obsvtn and one at 12 oClock, rained last night, the river rises fast The Musquetors are verry bad, Load the pierogue
Was translated into
Clark, May 29, 1804
Mardi 29 mai, envoyé des chasseurs, pris une observation le matin et une à 12 heures, il a plu la nuit dernière, la rivière monte rapidement. Les moustiques sont très mauvais, chargez la pirogue.
The only caveat is that the tables were not translated well. They were not formatted very well in the original document, so the output is not very good:
F Inch Length from nose to tail 5 2 Circumpherence in largest part– 41/2 Number of scuta on belly–221 Do. on Tale–53
was translated into:
F Pouces Longueur du nez à la queue 5 2 Circonférence à la partie la plus large– 4 1/2 Nombre de scutelles sur le ventre–221 Idem sur la queue–53
We decided that for our purpose this wasn’t an issue, but that would be a potential improvement.