How I reduced text formatting errors by 90%

I currently work as a Channel Manager and Automation Engineer for a company that provides quality control and compliance for TV series and movies due to air across Africa and Europe.

One of my team’s responsibilities is to ensure that data entries are cleared before they are sent to the programming team. That covers the audio and visual quality of the file, its content (e.g. does it need a specific age rating, or do we need to remove some scenes to comply with the current age rating), and the formatting of the information linked to the file, typically the cast list or synopsis.

Any formatting issue will prevent the programming team from effectively doing their part of the job, so it is of the utmost importance that my team doesn’t miss anything.

We noticed that we occasionally received files with badly formatted entries, which we corrected by hand. The problem with this approach is that it still leaves room for human error.

For example, we would receive a cast list like this:

michelle Yeoh,Stephanie Hsu,Jamie Lee

curtis, Ke huy quan, James Hong

that we would manually change to look like this:

Michelle Yeoh, Stephanie Hsu, Jamie Lee Curtis, Ke Huy Quan, James Hong

or a film synopsis that looks like this:

a middle-aged Chinese immigrant is swept up into an insane adventure in

which she alone can save existence by exploring other

universes and connecting with the lives she

could have led.

that we would manually change to look like this:

A middle-aged Chinese immigrant is swept up into an insane adventure in which she alone can save existence by exploring other universes and connecting with the lives she could have led.

Now, these aren’t huge issues and they don’t happen all the time, but they do make things less efficient and introduce the risk of missing a formatting issue that makes it impossible for the programming team to schedule content.

In order to reduce these errors and keep the programming team happy, I put myself forward to write some custom software that our team could use that would standardise the metadata that we input into the program entries. The only restriction was that I needed to avoid making it overly complicated as my manager didn’t want this to take up too much of my time.

With that in mind, I decided to build a very straightforward GUI with Tkinter, Python’s standard GUI toolkit, which ended up looking like this:

The button on the left has three options: “Cast”, “Synopsis” and “Other”. By picking the relevant option, pasting the text into the box at the top then clicking on convert, you get the following:

On top of removing line breaks, the cast mode makes sure there is exactly one space after every comma, that all names are capitalised, and that any unnecessary commas or unwanted characters are removed.
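For readers who want to see the idea in code, here is a minimal sketch of what a cast mode like this could look like. It is not the code from the repository: the function name `format_cast` and the list of "unwanted" characters are assumptions made for illustration.

```python
import re

# Hypothetical set of "unwanted" characters; the real tool may use a different list.
UNWANTED = re.compile(r"[*#|~^<>{}\[\]\\;:]")


def format_cast(raw: str) -> str:
    """Return a cast list as 'Name One, Name Two, ...' on a single line."""
    text = UNWANTED.sub("", raw)
    names = []
    for part in text.split(","):
        # str.split() with no argument splits on any whitespace,
        # so line breaks inside a name are dropped here as well.
        words = part.split()
        if not words:
            continue  # skip empty entries left behind by stray commas
        # Capitalise the first letter of each word and leave the rest untouched,
        # so a name like "McGregor" keeps its inner capital.
        names.append(" ".join(w[0].upper() + w[1:] for w in words))
    return ", ".join(names)


if __name__ == "__main__":
    print(format_cast("michelle Yeoh,Stephanie Hsu,Jamie Lee\n\ncurtis, Ke huy quan, James Hong"))
```

Run on the cast list from earlier, this prints the cleaned, single-line version shown above.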

The synopsis mode also removes line breaks, extra spaces, and unwanted characters, but it doesn’t add a space after commas or capitalise every word. Instead, it capitalises the first word and ensures there is a full stop at the end of the last sentence.
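A synopsis mode along these lines could be sketched as follows. Again, this is an illustration rather than the repository code: `format_synopsis` and the set of "unwanted" characters are assumed names and values.

```python
import re

# Hypothetical set of "unwanted" characters; the real tool may use a different list.
UNWANTED = re.compile(r"[*#|~^<>{}\[\]\\]")


def format_synopsis(raw: str) -> str:
    """Return the synopsis as one line, capitalised and ending with punctuation."""
    text = UNWANTED.sub("", raw)
    # Splitting on whitespace and re-joining removes line breaks and extra spaces.
    text = " ".join(text.split())
    if not text:
        return ""
    # Capitalise only the first character; the rest of the text is left as written.
    text = text[0].upper() + text[1:]
    # Add a full stop unless the text already ends a sentence.
    if not text.endswith((".", "!", "?")):
        text += "."
    return text


if __name__ == "__main__":
    print(format_synopsis("a middle-aged Chinese immigrant is swept up\n\ninto an insane adventure"))
```

Unlike the cast mode, this leaves commas and the capitalisation of individual words alone.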

Finally, the other mode is there for more general use cases where you might want to remove just line breaks and special characters without making any other formatting changes to the text.
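The other mode is the simplest of the three. A rough sketch, assuming a hypothetical function name `strip_line_breaks` and an assumed list of special characters, might look like this:

```python
import re

# Hypothetical set of special characters to drop; the real tool may differ.
SPECIAL = re.compile(r"[*#|~^<>{}\[\]\\]")
# One or more line breaks, together with any whitespace directly around them.
LINE_BREAKS = re.compile(r"\s*[\r\n]+\s*")


def strip_line_breaks(raw: str) -> str:
    """Remove special characters and line breaks, leaving the text otherwise as is."""
    text = SPECIAL.sub("", raw)
    # Each run of line breaks becomes a single space so words do not run together.
    text = LINE_BREAKS.sub(" ", text)
    return text.strip()


if __name__ == "__main__":
    print(strip_line_breaks("Season 1\nEpisode 2 *HD*"))
```

Capitalisation, commas and spacing within a line are left exactly as they were entered.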

After the whole team started using this tool regularly, we saw the number of formatting errors drop by over 90%.

If you’d like to look at the source code, click here: github.com/InguzL/Text_reformatting

P.S. Yes, that is the cast list and synopsis for “Everything Everywhere All at Once”, which is one of my favourite films. I’m so glad you noticed!
