Thursday 15 June 2017

Thinking about how I can save all these blog posts as a book or an InDesign document....

I've been looking at ways to convert the blog posts I've written over the last year into a more recognisable format that can be imported into MS Word, Adobe InDesign and similar applications.

I use Google's 'Blogger' and Blogspot for my regular updates, and it seems that some time ago Google removed the "export to MS Word" function from the Blogger tools. The only option I've been able to find is 'Backup' within Blogger, which simply dumps a full XML file to disk.
The World Wide Web Consortium (W3C), an international community that develops open standards to ensure the long-term growth of the Web, keeps a HUGE on-line repository of Web format standards at W3C.org. Reading through it helped me work out what I needed.

So, the things I need to consider are:

  • The 'Atom' format is used by Google.
  • A 'Transformation' file is required by MS Word.
  • A Book-and-Word style sheet needs to be created in Cascading Style Sheets (CSS) format.

Special thanks to Alex Milowski, who wrote the original Atom use case on the W3C wiki at https://www.w3.org/wiki/Atom, which allowed the AtomTransform [XSL] to be created, deciphered and checked, and to the original Atom Syndication Format, written way back in 2005, at https://tools.ietf.org/html/rfc4287

The simple code below was originally written by the brilliant CSS & Web design 'guru' David Kutcher (based just west of Boston, in Easthampton, Massachusetts).

AtomTransform.xsl
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:atom="http://www.w3.org/2005/Atom"
  exclude-result-prefixes="atom"
>
  <!-- Build the HTML page that Word will open -->
  <xsl:template match="/">
    <html>
      <head>
        <link rel="stylesheet" href="style.css" />
      </head>
      <body>
        <!-- Select only the blog posts, not the other kinds of entry in the backup -->
        <xsl:apply-templates select="atom:feed/atom:entry[atom:category/@term='http://schemas.google.com/blogger/2008/kind#post']">
          <!-- Reverse the order in which the posts appear in the backup file -->
          <xsl:sort select="position()" data-type="number" order="descending"/>
        </xsl:apply-templates>
      </body>
    </html>
  </xsl:template>

  <!-- Each post: title, author, publication date, then the post body -->
  <xsl:template match="atom:entry[atom:category/@term='http://schemas.google.com/blogger/2008/kind#post']">
    <h1><xsl:value-of select="atom:title"/></h1>
    <h2><xsl:value-of select="atom:author/atom:name"/></h2>
    <!-- XPath strings are 1-indexed: keep the first 10 characters (YYYY-MM-DD) -->
    <h3><xsl:value-of select="substring(atom:published, 1, 10)" /></h3>
    <!-- The post body is stored as escaped HTML, so write it out unescaped -->
    <xsl:value-of select="atom:content" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>

This lets Word import the XML file by picking out specific fields (title, author, date and content), much as it would from a database, and formatting them as headings and body text.
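For reference, here is a minimal sketch of the kind of entry the stylesheet picks out. The element layout follows the Atom format (RFC 4287). The two category schemes are an assumption about how a Blogger backup marks posts and labels, and the id, title, author, date and label values are invented placeholders, so check them against a real backup file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical, cut-down backup file: all values are placeholders -->
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:blogger.com,1999:blog-0.post-0</id>
    <published>2017-06-15T09:00:00.000+01:00</published>
    <!-- Assumed: this category marks the entry as a post -->
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#post"/>
    <!-- Assumed: each label is a further category with its own scheme -->
    <category scheme="http://www.blogger.com/atom/ns#" term="Research"/>
    <title type="text">Example post</title>
    <!-- The body is HTML with its markup escaped -->
    <content type="html">&lt;p&gt;Example body text.&lt;/p&gt;</content>
    <author>
      <name>Example Author</name>
    </author>
  </entry>
</feed>
```

Run through the stylesheet, this entry should come out as an h1 reading "Example post", an h2 reading "Example Author", an h3 reading "2017-06-15", and then the paragraph of body text.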

Once the Blogger archive .xml file has been saved using 'Backup', start MS Word, choose "Open" [Browse] and select the Blogger .xml file, but make sure you open it using the drop-down arrow on the [Open] dialogue button. This allows you to open the file with an XML transform file, which is simply the formatting and data-selection code above, written in an editor or in Notepad and saved with a .xsl suffix, e.g. AtomTransform.xsl. MS Word may find its own style sheet called style.css from the myriad files with that name already [in a registry somewhere!]; I didn't need to do anything more, as it sorted itself out. If that doesn't work for you, and because it's good practice anyway, create a .css style sheet like the one below. (Thanks to David again!)

body {
  font-size: 1.0em;
  font-weight: normal;
}

table {
  display: table;
}

tr {
  border: 1px solid red;
  display: table-row;
}

th {
  font-weight: bold;
  text-decoration: underline;
  padding: 0.5em 1.0em 1.0em 1.0em;
}

td {
  padding: 1.0em;
  vertical-align: top;
}

h1 {
  font-weight: bold;
  font-size: 3.0em;
}

h2 {
  font-weight: bold;
  font-size: 2.0em;
}

h3 {
  text-transform: uppercase;
  font-size: 1.5em;
}

So the way I did this was to create the above in Notepad and save it with the .css suffix, e.g. style.css, which is the name the transform file looks for.

Then (again, only if necessary) you may have to redirect a pointer within the original XML file. Edit the [backup-archive].xml in Notepad and change the following line:

<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
to be
<?xml-stylesheet href="AtomTransform.xsl" type="text/xsl"?>

This makes sure that the xml-stylesheet instruction points to "AtomTransform.xsl", or whatever else we've decided to call it.

Conclusions & Next steps:


  1. The outcome is a perfectly formatted Word file which incorporates the date of each blog post, AND the pictures too! - Brilliant!
  2. The next plan is to create a document with Chapters, which I can do with some slight hacking to the .xsl code above, to recognise and then sort by my 'Labels' tags, such as Book Review, Creative Innovation, Digital Media Concepts, Digital Media Processes, Drawings, Gallery Visit, Guest Lecture, Ideas, Lecture, Lecture Notes, Major Project, Practice, Reflections, Research, Rotor Exhibition, Symposium, TMA1401, TMA1402, TMA1403, TMA1404, TMA1407, Tutorial, Workshop and so on.
  3. The only issue I see here is that if I add to these labels, I will need to add code each time, so I'm thinking about how to do this automatically (I'm sure it's possible, but I'm very rusty when it comes to coding)...
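One possible way to handle new labels without adding code each time is a standard XSLT 1.0 technique called Muenchian grouping, which builds the list of chapters from whatever labels are found in the file. The following is only a sketch and has not been tried in Word: it assumes that Blogger stores each label as a category element with the scheme http://www.blogger.com/atom/ns#, and that Word's XSLT processor supports xsl:key. The key name "labels" is just a name chosen for this example.

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:atom="http://www.w3.org/2005/Atom"
  exclude-result-prefixes="atom">

  <!-- Index the label categories of every post by label name.
       Assumption: labels use the scheme http://www.blogger.com/atom/ns# -->
  <xsl:key name="labels"
    match="atom:entry[atom:category/@term='http://schemas.google.com/blogger/2008/kind#post']/atom:category[@scheme='http://www.blogger.com/atom/ns#']"
    use="@term"/>

  <xsl:template match="/">
    <html>
      <head>
        <link rel="stylesheet" href="style.css"/>
      </head>
      <body>
        <!-- Keep only the first category element found for each distinct
             label, which gives one chapter per label -->
        <xsl:for-each select="atom:feed/atom:entry/atom:category[generate-id() = generate-id(key('labels', @term)[1])]">
          <!-- Chapters in alphabetical order of label name -->
          <xsl:sort select="@term"/>
          <h1><xsl:value-of select="@term"/></h1>
          <!-- All posts carrying this label, in the order of the backup file -->
          <xsl:apply-templates select="key('labels', @term)/parent::atom:entry"/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <!-- Each post sits one heading level below its chapter title -->
  <xsl:template match="atom:entry">
    <h2><xsl:value-of select="atom:title"/></h2>
    <h3><xsl:value-of select="substring(atom:published, 1, 10)"/></h3>
    <xsl:value-of select="atom:content" disable-output-escaping="yes"/>
  </xsl:template>
</xsl:stylesheet>
```

With this approach a post that carries several labels is repeated in each matching chapter, and a post with no labels is left out altogether, so it may need adjusting depending on how the book should read.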

References:


  • Atom Syndication Format (2005). RFC 4287. https://tools.ietf.org/html/rfc4287
  • Kutcher, D. (2013). http://www.blogxpertise.com/ (Easthampton, Massachusetts). Retrieved 15/6/2017.
  • Milowski, A. (no published date). Atom use case on the W3C wiki. https://www.w3.org/wiki/Atom Retrieved 15/6/2017.