
One of the perks of being an indie publisher is that you can do everything your way, from manuscript formatting to cover design and marketing approach. But it’s also one of the downsides of being an indie publisher. Perhaps you can afford to hire an intern, but chances are that you’ll have to keep your nose to the grindstone for now.

Therefore, every technical detail about your indie publisher existence that can be automated should be automated so you can save your precious brain juices for the important stuff.

When it comes to manuscript editing and formatting, there are (at least) two major ways in which you can outsource your energies to the machine in front of you:

  • Styles
  • Regular Expressions

I’ve already written about document Styles at great length in this tutorial series about creating ebooks with open source tools. Today I’d like to give you a very brief introduction to the topic of regular expressions, and why it might be worthwhile to learn more about it.

Note: I’ll focus on Open Office’s Writer here, but regular expressions are available in a number of text processing tools. While the specific application varies, the principle is always the same.

Regular What? Is that Something to Eat?

What are regular expressions? A quick look on Wikipedia and you’ll know:

a regular expression (abbreviated regex or regexp) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations.

All clarity eradicated? Seriously! What are regular expressions and why should indie publishers care?

Regex is like a turbo-charged Search & Replace function. Instead of just being able to find single words across a document and replace them with another word, regular expressions allow you to find patterns across documents and replace them with different patterns.
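To make the difference concrete outside of a word processor, here is a minimal sketch using Python's built-in re module. The sample sentence is made up for illustration. A plain search can only look for one fixed string, while the pattern below matches any run of two or more spaces.

```python
import re

# Hypothetical sample sentence with irregular spacing.
text = "It was  late   and quiet."

# A plain search & replace only handles one fixed string at a time:
# this fixes the double space but turns the triple space into a double one.
plain = text.replace("  ", " ")

# The pattern " {2,}" means "two or more spaces in a row",
# so a single pass collapses every run, whatever its length.
collapsed = re.sub(r" {2,}", " ", text)

print(plain)
print(collapsed)
```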

Regular expressions can be applied in myriad ways. Today I’d like to show you a few examples.

From Straight Quotes To Curly Quotes

Straight quotes are the two standard vertical quotation marks. There’s the single straight quote (') and the straight double quote ("). These straight quotes are a remnant from the typewriting age. They aren’t considered good typography. Read more about it here.

It’s always recommended to use curly quotes, because they come in pairs and help the reader to better parse a text. Here’s the single opening curly quote (‘) and here’s the closing curly quote (’). And here’s the double opening curly quote (“) and the double closing curly quote (”).

Straight and curly quotes are also sometimes called dumb and smart quotes, in case you’ve seen those terms before.

[Image: curly quotes and straight quotes compared]

Now, let’s say you have a 350-page novel full of witty dialogue, but you’ve used straight quotes everywhere, and now you want to change them all into curly quotes. Nobody in their right mind would try to change them manually (although I did these things manually far too often before I found out about alternatives).

You could just go ahead and run a search for " and replace it with “, but that would just turn all your quotes into opening quotes.

Here’s a very simple example of how you can use regular expressions to help you out. First of all, open your search & replace dialogue, either from the menu or by hitting CTRL+F (Command-F on a Mac).

[Screenshot: location of the “Regular expressions” tick box]

Before we can start with our regular expressions magic in Open Office’s Writer, we’ll have to click on “More Options” and then tick the field “Regular expressions”.

What we’re going to do now is to find all the double quotes at the beginning of a paragraph and turn them into double curly quotes.

To illustrate this process, let’s apply our function to an excerpt from the Hemingway story A Clean, Well-Lighted Place.

[Screenshot: dialogue excerpt from A Clean, Well-Lighted Place with straight quotes]

So, all these straight quotes at the beginning of a paragraph should be turned into curly opening quotes. To accomplish this, enter the following into your search box:

^"

This regular expression looks for the beginning of a paragraph (^) followed by a double quote.

In your replace box, enter:

“

That’s the opening curly quote. Once you click Replace All, the above text will look like this:

[Screenshot: the excerpt with curly opening quotes at the start of each paragraph]
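The same idea can be tried outside of Writer. Here is a minimal sketch in Python, assuming that each line of a plain-text string stands in for one paragraph. The two lines of dialogue are only sample text. Note that Python needs the re.MULTILINE flag before ^ will match at the start of every line rather than only at the start of the whole text.

```python
import re

# Sample dialogue; each line plays the role of one paragraph.
text = '"Why?"\n"He was in despair."'

# ^" finds a straight double quote at the very start of a line.
# "\u201c" is the opening curly double quote.
opened = re.sub(r'^"', "\u201c", text, flags=re.MULTILINE)

print(opened)
```

The quotes at the end of each line are left untouched, just as in the Writer example.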

Next, let’s take care of the quotes at the end of a paragraph. In your search box, enter:

"$

The dollar sign stands for the end of a paragraph, so this regular expression looks for a double quote before the end of a paragraph.

In your replace box, enter:

”

And the result:

[Screenshot: the excerpt with curly quotes at the start and end of each paragraph]
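For comparison, here is a rough Python equivalent of the end-of-paragraph search, again assuming one paragraph per line and using made-up sample text. With re.MULTILINE, the $ sign matches at the end of every line.

```python
import re

# Sample text: the second line has a quote in the middle of the paragraph.
text = '"Why?"\n"Nothing," he said.'

# "$ finds a straight double quote right before the end of a line.
# "\u201d" is the closing curly double quote.
closed = re.sub(r'"$', "\u201d", text, flags=re.MULTILINE)

print(closed)
```

Only the quote after the question mark is converted. The quote after the comma sits in the middle of its line, so this pattern does not touch it.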

Now all straight quotes at the beginning and end of a paragraph are converted to curly quotes. As you can see, the two simple regular expressions discussed above do not catch every case: a quote that is not at the beginning or end of a paragraph will not be replaced.

To solve this, you could come up with more advanced regular expressions to find all these quotes inside paragraphs, or you could just use a plain old search & replace, looking for (,") and replacing it with (,”) to deal with this specific situation.
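As a sketch of what such a more advanced approach might look like, here is a small Python function. It is only a heuristic of my own, not a complete solution: it assumes that a double quote at the start of the text, or right after whitespace or an opening bracket, is an opening quote, and that every other double quote is a closing one. The function name curl_double_quotes is made up for this example.

```python
import re

def curl_double_quotes(text: str) -> str:
    """Convert straight double quotes to curly ones using a simple heuristic."""
    # Opening quote: at the start of the text, or preceded by
    # whitespace (including a line break) or an opening bracket.
    text = re.sub(r'(?:^|(?<=[\s(\[{]))"', "\u201c", text)
    # Anything still straight is assumed to be a closing quote.
    return text.replace('"', "\u201d")

print(curl_double_quotes('"Nothing," he said. "Why?"'))
```

Cases such as nested quotations or a quote directly after a dash would need extra rules, so it is worth checking the result by eye.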

Search & Destroy Empty Paragraphs With Regular Expressions

Another very simple use of regular expressions is to delete empty paragraphs. Sometimes empty paragraphs are automatically created, sometimes you use them while composing a text, but you want to get rid of them later.

Here’s how.

In your search box, enter:

^$

You know both of these signs already. The sign ^ stands for the beginning of a paragraph and $ for the end. So if the end comes directly after the beginning, we’ve got an empty paragraph!

In your replace box, write nothing this time, because we want to replace these empty paragraphs with sweet nothingness:

[Screenshot: the text with empty paragraphs removed]
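Here is how the same clean-up might look in Python, assuming a plain-text string in which an empty line plays the role of an empty paragraph. One difference from Writer: in plain text the line break itself has to be removed too, so the pattern is ^$ followed by \n.

```python
import re

# Sample text with one and then two empty lines between paragraphs.
text = "First.\n\nSecond.\n\n\nThird."

# ^$ matches an empty line; the trailing \n removes its line break as well.
cleaned = re.sub(r"^$\n", "", text, flags=re.MULTILINE)

print(cleaned)
```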

Note: If you want some more space between your lines, it’s far better to use the line-spacing function of your text processing software than empty paragraphs.

Further Study

The above examples are very simple regular expressions. Regular expressions are much more powerful than that, allowing you to search for complex patterns and replace (parts of) them. Here’s an example from OOOninja:

[Diagram: anatomy of a regular expression, by OOOninja]

As you can see, we can use regex to manipulate any kind of text or numbers, and to disassemble and reassemble them. For example, the red part looks for a four-digit number sequence. By wrapping part of a search in round brackets, you can retrieve whatever it matched in the replace box through $1 (contents of the first bracket), $2 (contents of the second bracket), etc. But these things can be scary, so it’s good to take small steps at first.
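To give a feel for this bracket trick, here is a small Python sketch. The date pattern is a made-up example of mine, not the one from the diagram. Also note that Python writes the references as \1, \2 and \3 where Writer uses $1, $2 and $3.

```python
import re

# Hypothetical sample: a date written as year-month-day.
text = "Published 2007-12-05."

# Three bracketed groups capture the year, the month and the day.
pattern = r"(\d{4})-(\d{2})-(\d{2})"

# The replacement puts the captured pieces back together in a new order.
reordered = re.sub(pattern, r"\3.\2.\1", text)

print(reordered)
```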

To find out more about regular expressions in Open Office, I recommend this official documentation. For specific uses, it can also be helpful to google and read through forums.

I hope you’ve found this short introduction helpful and got a bit interested in the bewildering world of regular expressions. If you have any questions, don’t hesitate to ask — that’s what the comments are for!

img: "Search & Destroy" poster mashup / diagram: oooninja / shirt: by Lasse Havelund via Flickr