Well, I recently converted fifteen of the sixteen posts I had previously made into an ebook. This reminded me of something that is very important to do: back up data, especially the blogs that people run. Today, I will be talking about a few ways to back up a blog.
Why are backups important? In this day and age of reliance on computer technology, we are bound to lose data at some point, whether to file corruption or a hard drive failure. We can even lose data when a hard drive is formatted. The data may still exist on the hard drive or other secondary storage medium, but recovering it is not an easy task, and not every computer technician or expert has the tools or knowledge to do it. With a backup, the data can be easily restored, no matter what happens to the computer or server. I know all of this because I graduated with a degree in computers and I have been helping people solve computer problems as a side job, while my main income comes from writing books. Backups make things easier for the end user when problems arise.
Backups are not just important for regular data, but also for the data we put into websites like this one, commonly referred to as weblogs or blogs. Many people, such as myself, use blogs to keep a record of their lives, although books written by an author could very well replace the need for such things. However, like all websites, a blog can lose data when its server crashes or when the network the server resides on fails. This applies to blogs that are publicly viewable, such as this one, as well as private ones, which are mainly hosted on servers that are only available over a LAN. The only difference between the two, in this case, is that for a private blog the ports needed for web hosting are not opened or forwarded between the computer and the modem (or between the computer, router, and modem, if the network uses a wireless router).
However, backing up a blog is not always exactly like backing up a website, at least not a website produced as static pages with a text editor or with something like MS FrontPage or Adobe Dreamweaver (formerly a Macromedia product). Some blogware, or blogging software, is based on a flat-file system, where each page is its own file, but most relies on database software, such as MySQL, MS SQL, PostgreSQL, or SQLite. SQLite differs from the others: instead of all databases residing in one area managed by a database server, each SQLite database is its own file. To back up a blog, one needs to back up both the static files, such as images and themes/templates, and the database. The database is where all of the blog's content resides, but it also holds user names, passwords, publication dates, comments, and so on. This is the normal way to back up a blog, at least a self-hosted one. For a service like Blogger (owned by Google at the time of writing) or WordPress, one could just export a small XML file. However, that XML file will not carry over images; at best it holds posts, comments, and categories/tags.
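As a rough illustration of the two-part backup described above, here is a minimal Python sketch for a self-hosted blog that happens to keep its content in SQLite. The function name `backup_blog` and all paths are hypothetical placeholders; a blog on MySQL or PostgreSQL would replace the database step with that server's own dump tool (such as `mysqldump` or `pg_dump`).

```python
import sqlite3
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def backup_blog(blog_dir: Path, db_path: Path, dest_dir: Path) -> tuple[Path, Path]:
    """Archive a blog's static files and copy its SQLite database (hypothetical helper).

    Returns the paths of the created archive and database copy.
    dest_dir should not be inside blog_dir, or the archive would include itself.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

    # 1. Static files (images, themes/templates, uploads) go into a gzipped tarball.
    archive_path = dest_dir / f"blog-files-{stamp}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the blog's folder
        tar.add(blog_dir, arcname=blog_dir.name)

    # 2. The database holds posts, comments, users, and publication dates.
    #    sqlite3's backup API takes a consistent snapshot of the whole database.
    db_copy_path = dest_dir / f"blog-db-{stamp}.sqlite3"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(db_copy_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    return archive_path, db_copy_path
```

Keeping both files from the same run (they share a timestamp) makes it easier to restore a matching pair later.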
That sounds like a lot of work. Is there another way to back up a blog? Yes, there is one other way: make an electronic book, or ebook for short. However, it requires a few things:
- retrieval program (e.g. wget or curl)
- software to create the ebook (e.g. text editor, archival software, an application that makes PDFs)
The first thing to do is retrieve the posts and save them as HTML files. The easiest way to do this is with a program like wget or curl. All that needs to be done with either one is to specify the options and a web address. Both even work with the localhost address often used for a local server, which is a subset of the private server model that does not even require a network connection, since the server runs on the same machine the user is currently using. Programs like Adobe Acrobat Pro can also retrieve web pages, in case one is not comfortable with a CLI, but they are not as convenient as wget and curl, since they require more setup, whereas curl and wget work with few changes, if any. Posts must be retrieved before they can be turned into an ebook.
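For wget, the retrieval step might look like the following Python sketch, which builds the command and optionally runs it. The flag combination is one reasonable choice for saving posts along with their images and CSS, not necessarily the exact command used for the ebook mentioned earlier; the helper names and the `http://localhost/blog/` address are hypothetical. curl, by contrast, fetches the URLs it is given and does not crawl a site recursively, so wget is the more convenient of the two for grabbing a whole blog.

```python
import shutil
import subprocess


def build_wget_command(url: str, dest: str = "blog-mirror") -> list[str]:
    """Build a wget command that saves a blog's pages plus what they need to render."""
    return [
        "wget",
        "--mirror",            # recurse with infinite depth and timestamping
        "--page-requisites",   # also fetch the images, CSS, etc. each page uses
        "--convert-links",     # rewrite links so the saved copy works offline
        "--adjust-extension",  # save pages with an .html extension
        "--no-parent",         # don't climb above the starting directory
        f"--directory-prefix={dest}",
        url,
    ]


def retrieve_blog(url: str, dest: str = "blog-mirror") -> int:
    """Run wget if it is installed and return its exit status."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    return subprocess.run(build_wget_command(url, dest)).returncode
```

Calling `retrieve_blog("http://localhost/blog/")` would leave the saved pages under `blog-mirror/`, ready for the ebook step.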
Once the posts are retrieved, they can be put together into an ebook. Acrobat can do this automatically, since it creates a PDF of the site it is asked to retrieve. With either curl or wget, however, users need to create the ebook themselves. Since the retrieved files are HTML, CSS, and images (provided you tell the program to get everything the posts need; otherwise you only get HTML), the easiest formats to create will be EPUB and Kindle. The actual process of creating those is beyond the scope of this post. However, both are based on HTML and the files it needs. The EPUB standard wraps that content with XML files that describe it, and the Kindle edition can be produced using one of those XML files as a reference. The next easiest would be PDF, but it does not guarantee that the formatting of each post is kept, and unlike EPUB, it does not provide easy access to the images used in the posts. After the files are retrieved, steps must be taken to make the actual ebook.
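Although the full process is beyond the scope of this post, the packaging side of EPUB is small enough to sketch. The Python below zips posts into a bare-bones EPUB 3 file: a `mimetype` entry, `META-INF/container.xml`, a package (`.opf`) file listing the posts, and a navigation document. It assumes each post has already been cleaned up into well-formed XHTML and leaves images out; a real book would add a manifest entry for each image and should be checked with a validator such as EPUBCheck. The function name `build_epub` and the `OEBPS/` layout are illustrative choices.

```python
import zipfile
from datetime import datetime, timezone
from html import escape

# META-INF/container.xml tells reading systems where the package (.opf) file is.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(path: str, title: str, posts: list[tuple[str, str]], book_id: str) -> None:
    """Write a minimal EPUB 3 file.

    posts: (post_title, xhtml_document) pairs in reading order. Each document
    must already be well-formed XHTML; images are not handled here.
    """
    names = [f"post{i}.xhtml" for i in range(1, len(posts) + 1)]
    manifest = "\n".join(
        f'    <item id="p{i}" href="{n}" media-type="application/xhtml+xml"/>'
        for i, n in enumerate(names, start=1)
    )
    spine = "\n".join(f'    <itemref idref="p{i}"/>' for i in range(1, len(names) + 1))
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{manifest}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""
    # EPUB 3 requires a navigation document containing a table of contents.
    toc = "\n".join(
        f'      <li><a href="{n}">{escape(t)}</a></li>' for n, (t, _) in zip(names, posts)
    )
    nav = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc">
    <ol>
{toc}
    </ol>
  </nav>
</body>
</html>
"""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry must come first and be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        z.writestr("OEBPS/content.opf", opf)
        z.writestr("OEBPS/nav.xhtml", nav)
        for name, (_, xhtml) in zip(names, posts):
            z.writestr(f"OEBPS/{name}", xhtml)
```

Because an EPUB is just a ZIP archive with these conventions, the same standard-library module handles both building the book and, later, taking it apart again.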
Okay, I produced my ebook. How do I import my posts from it? If you created an EPUB, this is a trivial matter, since all that needs to be done is decompress the book. An EPUB is just HTML, the files it needs to render properly, and some XML, all wrapped up in a ZIP archive. Decompressing, or unarchiving, the file gives access to the HTML files, which is where the posts are contained. The images used in those posts are stored as separate files. After restoring the blog platform, if necessary, the first thing that needs to be uploaded is the images; if they are not uploaded, they will not load in their posts. After that, just copy and paste from the HTML files (preferably from code view, so that images stay associated with their posts, adjusting the image paths to match where the images were uploaded, if needed). Once that is done, everything should be up and running.
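The decompressing step can also be scripted. This hedged Python sketch extracts an EPUB, then reads `META-INF/container.xml` and the package file to list the post files in reading order along with the images, which, per the steps above, should be uploaded first. The function name `unpack_epub` is hypothetical, and it assumes the manifest `href` values are plain relative paths with no URL-encoded characters.

```python
import os
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the EPUB container file and package (.opf) file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def unpack_epub(epub_path: str, dest: str) -> tuple[list[str], list[str]]:
    """Extract an EPUB into dest; return (post files in reading order, image files)."""
    with zipfile.ZipFile(epub_path) as z:
        z.extractall(dest)
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))

    base = posixpath.dirname(opf_path)
    items = {item.get("id"): item for item in opf.findall("opf:manifest/opf:item", NS)}

    def path_of(item: ET.Element) -> str:
        # Manifest hrefs are relative to the package file, using "/" separators.
        return os.path.join(dest, *posixpath.join(base, item.get("href")).split("/"))

    # The spine lists content documents in reading order by manifest id.
    posts = [path_of(items[ref.get("idref")]) for ref in opf.findall("opf:spine/opf:itemref", NS)]
    images = [path_of(i) for i in items.values() if i.get("media-type", "").startswith("image/")]
    return posts, images
```

The returned image paths are what would go up to the blog first; the post files can then be opened in code view for copying.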
So far, ebooks, XML exports, and database backups have been mentioned. Which one is the best method? Ebooks and XML take the most time to restore, since with neither one are images uploaded along with the posts; they must be uploaded separately. Also, XML is the only one of the two that preserves the publication dates of posts, so restoring posts from an ebook will be like creating new entries. Because of this, neither of those two is the best method for backing up a blog. The best method is backing up the blog's directory (i.e., its static files) and its database, which preserves everything about the blog, such as images, publication dates, etc. The database method also takes the least time to restore, since one only needs to import the data from the previous database and upload the folder that holds the blog's structure. While I recommend the database backup method, it is best that users decide on the best method for themselves.
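To show why the database method restores quickly and keeps publication dates, here is a minimal Python sketch of restoring a static-files-plus-SQLite backup. It assumes a tarball of the blog's folder and a SQLite copy of its database; `restore_blog` and the paths are hypothetical, and a MySQL or PostgreSQL blog would import its dump with that server's client tools instead.

```python
import sqlite3
import tarfile
from pathlib import Path


def restore_blog(archive_path: Path, db_backup: Path, install_root: Path, live_db: Path) -> None:
    """Put a blog's static files and SQLite database back in place (hypothetical helper)."""
    install_root.mkdir(parents=True, exist_ok=True)

    # Upload step: unpack the folder that holds the blog's structure and images.
    # Use the safer "data" extraction filter on Python versions that provide it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(install_root, **kwargs)

    # Import step: copy every table (posts, comments, users, dates) into the live
    # database. The backup API replaces live_db's contents with the backup's.
    src = sqlite3.connect(db_backup)
    dst = sqlite3.connect(live_db)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

Nothing is retyped or re-dated: the posts come back exactly as they were stored, which is the advantage the database method has over the ebook and XML routes.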
Blogs are just like any other piece of computer data. They can be lost in a network or hardware failure and must be backed up to ensure that the data can be accessed and restored. The backup process involves retrieving the necessary files and either putting them into an ebook or backing up the individual files (including the database).
What are your opinions on backing up blogs? Do you have any other options? Feel free to comment.