EricN

 

Welcome to DOGland

 

DOGs

 

Technical Manuals on a Leash


 

When I was first hired by the US Forest Service, the first acronym I had to figure out was “DOG”, short for Daily Operations Guide. Large enterprises use DOGs to bring consistency to knowledge, training and operations. Airlines, for example, are probably the tour-de-force in DOGland, down to the torque level on the last screw. I noticed that the IBM technicians who came into the computer center did the same thing. The error codes appearing on the tiny LED panels on the server or tape library were matched in their manuals, and the manuals walked them step by step through the diagnostic procedures, removal and replacement procedures, and validation steps. Not much was left to the imagination.

I learned to respect the IBM way. The IBM consultants who helped us on our AIX systems had a method for everything. I began to take notes, and soon I was producing fresh material for operations in our computer center, as well as augmenting the procedures we used in AIX administration and tape backup management. But the original source material I let be.

When I transferred to the Linux team we lost many of our IBM consultants at the same time. The DOGs they left us needed updating. Before I knew it, I was the senior editor of the Linux Daily Operations Guide. Many other DOGs have since followed. I have a kennel full of them now.

Where Do I Begin?

Begin with a simple approach. Consider operations. System administrators get onto servers and find something wrong. What they need to know is very simple.

  1. Get the facts

  2. How do you stop a process

  3. How do you start a process

In many of the DOGs I write, I title this section “The Basics”. The Linux DOG is very complex, but each section of the Linux DOG begins with those first three steps. What is the name of the process? Where is the log file? What is the best way to determine if a process is failing or not performing to specification?

If you are familiar with knowledgebase applications, this section can easily be cut and pasted into a knowledgebase article.
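
As a rough illustration, a “Basics” entry can be thought of as a small record that answers the same questions every time. The sketch below is in Python. The class name `BasicsEntry`, its fields, and the example service, log path and commands are all hypothetical placeholders, not taken from the Linux DOG.

```python
from dataclasses import dataclass


@dataclass
class BasicsEntry:
    """One 'The Basics' entry: the facts, then how to stop and start."""
    process_name: str   # What is the name of the process?
    log_file: str       # Where is the log file?
    health_check: str   # How do you tell whether it is failing?
    stop_command: str   # How do you stop the process?
    start_command: str  # How do you start the process?

    def render(self) -> str:
        """Return the entry as short plain text, ready to paste elsewhere."""
        return "\n".join([
            f"Process:   {self.process_name}",
            f"Log file:  {self.log_file}",
            f"Check:     {self.health_check}",
            f"Stop:      {self.stop_command}",
            f"Start:     {self.start_command}",
        ])


# Hypothetical example values, for illustration only.
entry = BasicsEntry(
    process_name="exampled",
    log_file="/var/log/exampled.log",
    health_check="ps aux | grep exampled",
    stop_command="/etc/init.d/exampled stop",
    start_command="/etc/init.d/exampled start",
)
print(entry.render())
```

Keeping every entry in the same shape is what makes it easy to lift into a knowledgebase article without rewriting it.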

But things are not all that simple

The second section of the DOG I title “Technical Notes” or “Troubleshooting”. This is where the DOG becomes really geeky. Consider “The Basics” Linux 101, and the “Technical Notes” Advanced Linux 240. This is the part of the DOG where technical issues are discussed: material reserved for experts, or topics that are not “mainstream”. Some of the topics are open-ended, i.e. there is no definite procedure to recommend.

Even in the simple how-to DOGs I have written, a few issues remain that need further examination. It is nice to have those issues discussed in a separate location, so the rest of the DOG can be published with confidence that the material is meaningful and easily understood.

Give credit where credit is due

Each section of my DOG has a sub-section for Contributors. Since DOGs are internal documents and never sold to the public, copyright infringement is not really a problem. But it is a standard of professional ethics that you give credit to those who provide the knowledge. It also increases interest and participation in DOG contributions. I often include web sites where I found useful information. This can also make it much easier to identify Subject Matter Experts (SMEs).

Team Notes

The Linux DOG is also a guide for the conduct of operations for the team. So there are sections in the DOG that enable team members to literally read off the same page. They cover such things as how to get administrative rights to a system or domain, prerequisites for doing the job, recommended software for workstations, and monitoring tools.

I wrote up a DOG for how to handle tape backups in the regional computer center because the main document used by the USFS was purely technical operations. It did not contain the particulars about doing the job. What account IDs do you use? Who do you ask for one? What are the protocols for tape handling? Who are the SMEs? I was new to the USFS at that time, still trying to figure out the system. But today I would probably recommend that they add to the tape backup DOG a section that deals with how the team operates. This is the sort of material you would hand to your colleague when you went on vacation.

Editing Rules

I remember my reaction when I received an updated tape backup DOG. At the beginning it stated which sections had changed, but I had no clue what had changed within them. I ran into the same problem later when the AIX DOGs were updated: such-and-such section was listed as changed, with no indication of what was different. (Sorry, I don’t memorize DOGs.)

To solve that problem, I use a set of rules for governing how DOGs are changed. Consider the following steps.

  • Have your working copy of the DOG at a central site where team members can contribute their material.

  • Delegate one person to be the senior editor.

  • Discuss the proposed changes to assure there is consensus.

  • Publish the DOG to designated read-only sites.

  • Update knowledgebase entries if necessary.

The important thing is to distinguish a working copy from a published copy. Nothing can be more confusing than a published DOG with edit marks and several debated points. Save that for the working copy. The working copy is where you can freely add content, recommend deletions, insert questions or concerns. Have at it. The only people that see this stage are the folks on the team.

Once a month or so, discuss the changes and come to a consensus. The senior editor can clean up the document and then publish it. And, again, if some topics are not resolved, you can reserve that for the technical notes of the DOG.

Use Colors

I like colors. To clearly show what has changed, use colors. Here is an example.

Linux Operations Team

Version 3.51

March 1, 2016

Prepared for

USDA Forest Service


 

This shows that all changes for the month of February will be highlighted in magenta. The document is scheduled for publication on March 1st. So anybody who reads this document will readily see that all changes made during the past month are highlighted in magenta.

I adopted IBM’s way of providing a synopsis of changes for the month. Now you can see how changes in the past six months are distinguished, based on color. Any item highlighted in yellow was introduced to the DOG during the month of August, published September 1st.


 

Revision Table

Version   Date         Sections Changed                       Editor
3.46      07/01/2015   Sections 2,3,9,11                      Eric Niewoehner
3.47      08/01/2015   Sections 2,3,6,7,19,21,22, Appendix    Eric Niewoehner
3.48      09/01/2015   Sections 2,7,15,19                     Eric Niewoehner
3.49      10/01/2015   Sections 2,7,10,14,15,22               Eric Niewoehner
3.50      02/01/2016   Sections 4,7,8,11,14,15,19             Eric Niewoehner
3.51      03/01/2016
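
One way to picture the color cycle is as a fixed list that each new revision steps through. The Python sketch below assumes a six-color cycle that starts with red and ends with magenta, with yellow third and light blue fifth, which is consistent with the examples in this article. The positions of green and orange are assumptions made for illustration, and the function name `color_for` is hypothetical.

```python
# Six highlight colors, reused in order. Red first, yellow third,
# light blue fifth and magenta last follow the article's examples;
# the slots for green and orange are assumed for illustration.
COLOR_CYCLE = ["red", "green", "yellow", "orange", "light blue", "magenta"]

# Versions from the revision table, oldest first.
REVISIONS = ["3.46", "3.47", "3.48", "3.49", "3.50", "3.51"]


def color_for(position: int) -> str:
    """Return the highlight color for the revision at this position."""
    return COLOR_CYCLE[position % len(COLOR_CYCLE)]


for position, version in enumerate(REVISIONS):
    print(version, color_for(position))

# The revision after 3.51 wraps around to the start of the cycle.
print("next", color_for(len(REVISIONS)))
```

Because the color is tied to the position in the cycle rather than to the calendar, a gap between publications does not disturb the sequence.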

 

 


 

If this looks a bit too complicated, try using just one color to highlight changes for the month. At least everyone on the review panel will know what has changed.

Highlighting headers in Microsoft Word causes the Table of Contents to adopt the coloration. In this example, advancements were made in measuring system performance.

 

 

Sample Table of Contents

14 Monitoring and Performance

14.1 CPU Utilization 

14.1.1 Gathering the Process Data 

14.1.2 Zombies 

14.1.3 Core Dumps 

14.1.4 Wait States 

14.2 I/O Utilization

14.3 Network Utilization

14.4 Troubleshooting

14.4.1 Integration of CPU Alerts

14.4.2 Integration of Memory Alerts

14.5 Training Path

14.6 Resources


 

Anyone who views the Table of Contents can readily see which sections have been affected. In this example, changes introduced during the current month (February) are highlighted in magenta, and changes made to the “Wait States” section the previous month are highlighted in light blue. If you keep up with changes to DOGs, and you are a busy person, you can click on the sections highlighted in magenta and immediately learn what has changed.

Using section 14.4 as an example, you can see that this section has been very active in the past two months, with contributed material in two colors.

Sample Text

So, at this point, this figure is used for informational purposes only. When an alert is issued, you will need to balance this with other information to gauge whether it is a problem or not.


 

  • Is Swap Memory being utilized?

  • Are there higher than normal CPU wait states (>5%)?

  • What are the primary processes running on the system (Oracle, Java, etc.)?

  • Are there any Incidents in ServiceNow reflecting complaints over performance?

  • Check with developers to see if they have observed any performance issues

Suggested commands:

ps aux --sort pmem

For the next month, April, I will go to the beginning of the color cycle, returning to using red to highlight all changes.

  • First, be sure to publish the latest version of the DOG.

  • Then, from the working copy of the DOG, remove all red highlighting

  • Add the next expected revision date to the Revision table.

  • Mark it in red

  • Update the title page and highlight the expected date and version number in red.

After that is completed, simply wait for the next change to emerge for the DOG, and highlight the changes in red.
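
The bookkeeping in that rollover (the next version number and the next expected publication date) can be sketched in a few lines of Python. This assumes versions increase by one hundredth, as they do in the revision table, and that the expected publication date is the first day of the following month. The function names are hypothetical.

```python
from datetime import date
from decimal import Decimal


def next_version(version: str) -> str:
    """Bump a version such as '3.51' by one hundredth."""
    # Decimal keeps trailing zeros, so '3.49' becomes '3.50', not '3.5'.
    return str(Decimal(version) + Decimal("0.01"))


def next_publication_date(published: date) -> date:
    """Return the first day of the month after the given date."""
    if published.month == 12:
        return date(published.year + 1, 1, 1)
    return date(published.year, published.month + 1, 1)


# Version 3.51 was published on March 1, 2016.
print(next_version("3.51"), next_publication_date(date(2016, 3, 1)))
```

The date is only the expected one. As the revision table shows, a DOG may go several months between published versions.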

Removing Content

The last thing to do in DOG editing is manage deletions. One thing we do on the Linux team is not jump the gun on deletions. I can’t count how many times somebody has said that such-and-such application is history, only for us to be informed two months later that the application is still being used.

To manage deleted content, follow these steps:

  • Create an archival folder for older DOGs

  • Strikethrough material that is planned for deletion

  • Highlight it with the current month’s color.

  • Leave it that way for six months (a full cycle of colors)

  • When it comes time to re-use a color (in this example, green), note if there are any planned deletions highlighted in green.

  • If so, save the DOG in the Archive, titled with the file name followed by the version number.

  • Return to the working copy of the DOG

  • Delete crossed-out content highlighted in green

  • Then remove all green highlighting from the rest of the DOG.
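
The steps above can be sketched in Python by treating the working copy as a list of text blocks, each with an optional highlight color and a strike-through flag. This is only a model of the workflow, not a tool for editing Word documents. The names `Block`, `archive_name` and `recycle_color` are hypothetical, and the archive file name format is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    text: str
    color: Optional[str] = None  # highlight color, or None
    struck: bool = False         # True if marked for deletion


def archive_name(file_name: str, version: str) -> str:
    """File name followed by the version number (exact format assumed)."""
    return f"{file_name}_{version}"


def recycle_color(
    blocks: List[Block], color: str
) -> Tuple[Optional[List[Block]], List[Block]]:
    """Return (archive_copy, cleaned_working_copy) for a color being re-used."""
    has_deletions = any(b.struck and b.color == color for b in blocks)
    # Archive only when something is about to be deleted for good.
    archive_copy = (
        [Block(b.text, b.color, b.struck) for b in blocks]
        if has_deletions else None
    )
    cleaned = []
    for b in blocks:
        if b.struck and b.color == color:
            continue  # planned deletion has waited a full cycle: drop it
        if b.color == color:
            b = Block(b.text, None, b.struck)  # old change: clear highlight
        cleaned.append(b)
    return archive_copy, cleaned


working = [
    Block("Current procedure"),
    Block("Obsolete procedure", color="green", struck=True),
    Block("Change from the last green month", color="green"),
]
archive, working = recycle_color(working, "green")
print(archive_name("Linux_DOG", "3.51"), [b.text for b in working])
```

Taking the archive copy before anything is removed is the point of the ordering: the struck-out text stays recoverable even after it leaves the working copy.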

The example below is from an obsolete process. It was crossed out and highlighted in green, and it remains that way until the color green is re-used for a current month’s changes.

Pam_tally RN does not install


 

Refer to RN-2013-242


 

Symptom – RN is applied but the pam_tally errors still pour into the system log.

Now suppose someone comes along and says, “Oh, by the way, for SuSE 9 systems we will still need to use this application.” I have two options for recovery.

  • If it is still in the current DOG, simply remove the highlighting and the strike-through

  • If it was deleted, you can search your archive, locate the text, and copy and paste it into the working copy of the DOG.

Knowledgebase Integration

In this day and age, knowledgebases are the thing. Large enterprises will usually have incident management software that includes a knowledgebase. But not everything fits nicely into a knowledgebase. I laughed when a data center director asked if our Linux DOG could be integrated into the knowledgebase. My reply was, “All 194 pages?” Both of us agreed that knowledgebase articles are usually short and concise, designed to solve a problem. DOGs, on the other hand, can be quite comprehensive. Sections I described above, like “Troubleshooting,” don’t fit nicely into the knowledgebase concept.

Knowledgebases also have issues with formatting. A nice, trim table you constructed in a Word document may not look so readable in a knowledgebase format. Links, anchors and bookmarks may not translate. Images will not transfer. In essence, there is a lot more you can do with a Word document.

To solve this riddle, consider the following:

  • Have knowledgebase articles derive from “The Basics”. They need to be short and concise “how-to” articles.

  • For the section in your DOG that will be copied to a knowledgebase article, attach a footnote that provides the corresponding knowledgebase number.

  • Whenever a change occurs to that section, copy and paste the contents to that knowledgebase article.
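
The footnote is effectively a lookup from DOG section to knowledgebase article. As a minimal Python sketch of that idea, the mapping below uses made-up section numbers and article numbers, and the function name `kb_articles_to_update` is hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from DOG section to knowledgebase article number.
KB_FOR_SECTION: Dict[str, str] = {
    "7": "KB0000007",
    "14": "KB0000014",
}


def kb_articles_to_update(
    changed_sections: Iterable[str], kb_for_section: Dict[str, str]
) -> List[str]:
    """Return the knowledgebase numbers tied to the changed sections."""
    return [kb_for_section[s] for s in changed_sections if s in kb_for_section]


# Sections changed in one revision; only some have knowledgebase articles.
print(kb_articles_to_update(["4", "7", "8", "14"], KB_FOR_SECTION))
```

Fed with the sections listed in the revision table, a lookup like this gives the editor a checklist of articles to refresh at publication time.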

In the example below, a section on “ZENWorks” is footnoted. At the bottom of the page is the corresponding knowledgebase number. Whenever changes are made to this section, the editor knows which knowledgebase article to update.

ZENWorks (SMT/ZMD) [1]

ZENWorks is recognizable on SuSE 10 systems through a couple of acronyms: SMT and ZMD. It is an automated solution for patch management where approved updates to the operating system and related ….

Notice the footnote appears below. It is also hyperlinked so a technician can easily go from the DOG to the corresponding knowledgebase article.


 

Conclusion

Now you have the nuts and bolts of a Daily Operations Guide. Hopefully these pointers make it easier to construct a DOG.

  • Begin with the Basics

  • Add a technical section for open discussions

  • Give credit where credit is due

  • Distinguish between a working copy and a published finished document

  • Add material that affects the whole team

  • Use colors to distinguish recent changes

  • Archive DOGs as they change to prevent accidental loss of content.