A Highly Reliable Enterprise System for NASA's Mars Rover Mission > Reliability

20.5. Reliability

Reliable code continues to perform well without problems. It rarely crashes, if ever. As you can imagine, code that is on board the Mars rovers must be extremely reliable because making on-site maintenance calls is somewhat difficult. But the MER mission wanted earthbound software used by mission control to be reliable, too. Once the mission was underway, no one wanted software problems to disrupt operations.

As noted earlier, the CIP project took several measures to ensure the intrinsic reliability of the system:

We further enhanced reliability with extra nails: service logging and monitoring. While these features can be useful for debugging even small programs, they become essential for keeping track of the runtime behavior of large applications.

20.5.1. Logging

During development, we used the open source Apache Log4J Java package to log nearly everything that occurred in the middleware services. It was certainly useful for debugging during development. Logging enabled us to write code that was more reliable. Whenever there was a bug, the logs told us what was going on just prior to the problem, and so we were better able to fix the bug.

We originally intended to reduce the logging only to serious messages before CIP went into operation. But we ended up leaving most of the logging on since it had a negligible impact on overall performance. Then we discovered that the logs gave us much useful information not only about what was going on with each service, but also how client applications were using the services. By analyzing the logs (which we called "log mining"), we were able to tune the services for better performance based on empirical data (see "Dynamic Reconfiguration," later in this chapter).

Here are some code snippets from the streamer service provider bean that show how we did logging for file downloading. The getDataFile( ) method processes each "Get Data File" request (via web services) from the client applications. The method immediately logs the request (lines 15–17), including the user ID of the requester and the filepath of the desired source file:

1 public class StreamerServiceBean implements SessionBean
2 {
3     static {
4         Globals.loadResources("Streamer");
5      };
6
7      private static Hashtable readerTable = new Hashtable( );
8      private static Hashtable writerTable = new Hashtable( );
9
10     private static BeanCacheStats cacheStats = Globals.queryStats;
11
12     public FileToken getDataFile(AccessToken accessToken, String filePath)
13         throws MiddlewareException
14     {
15         Globals.streamerLogger.info(accessToken.userId( ) +
16                                     ": Streamer.getDataFile("
17                                     + filePath + ")");
18         long startTime = System.currentTimeMillis( );
19
20         UserSessionObject.validateToken(accessToken);
21         FileToken fileToken = doFileDownload(accessToken, filePath);
22         cacheStats.incrementTotalServerResponseTime(startTime);
23         return fileToken;
24     }
25


					    

The doFileDownload( ) method creates a new file token (line 30) and file reader bean (line 41), and then calls the reader bean's getDataFile( ) method (line 42). The cacheStats field deals with runtime monitoring, which is described later:

26     private static FileToken doFileDownload(AccessToken accessToken,
27                                             String filePath)
28         throws MiddlewareException
29     {
30         FileToken fileToken = new FileToken(accessToken, filePath);
31         String    key       = fileToken.getKey( );
32
33         FileReaderLocal reader = null;
34         synchronized (readerTable) {
35             reader = (FileReaderLocal) readerTable.get(key);
36         }
37
38         // Create a file reader bean to start the download.
39         if (reader == null) {
40             try {
41                 reader = registerNewReader(key);
42                 reader.getDataFile(filePath);
43
44                 return fileToken;
45             }
46             catch(Exception ex) {
47                 Globals.streamerLogger.warn("Streamer.doFileDownload("
48                                             + filePath + "): " +
49                                             ex.getMessage( ));
50                 cacheStats.incrementFileErrorCount( );
51                 removeReader(key, reader);
52                 throw new MiddlewareException(ex);
53             }
54         }
55         else {
56             throw new MiddlewareException("File already being downloaded: " +
57                                           filePath);
58         }
59     }
60


					    

The readDataBlock( ) method processes each "Read Data Block" request from the client applications. It looks up the correct file reader bean (line 71) and calls the reader bean's readDataBlock( ) method (line 79). At the end of the source file, it removes the file reader bean (line 91):

61     public DataBlock readDataBlock(AccessToken accessToken, FileToken fileToken)
62         throws MiddlewareException
63     {
64         long startTime = System.currentTimeMillis( );
65         UserSessionObject.validateToken(accessToken);
66
67         String key = fileToken.getKey( );
68
69         FileReaderLocal reader = null;
70         synchronized (readerTable) {
71             reader = (FileReaderLocal) readerTable.get(key);
72         }
73
74         // Use the reader bean to download the next data block.
75         if (reader != null) {
76             DataBlock block = null;
77
78             try {
79                 block = reader.readDataBlock( );
80             }
81             catch(MiddlewareException ex) {
82                 Globals.streamerLogger.error("Streamer.readDataBlock("
83                                              + key + ")", ex);
84                 cacheStats.incrementFileErrorCount( );
85                 removeReader(key, reader);
86                 throw ex;
87             }
88
89             // End of file?
90             if (block == null) {
91                 removeReader(key, reader);
92             }
93
94             cacheStats.incrementTotalServerResponseTime(startTime);
95             return block;
96         }
97         else {
98             throw new MiddlewareException(
99                 "Download source file not opened: " +
100                fileToken.getFilePath( ));
101        }
102    }
103


					    

The registerNewReader( ) and removeReader( ) methods create and destroy the stateful file reader beans, respectively:

104    private static FileReaderLocal registerNewReader(String key)
105        throws Exception
106    {
107        Context context = MiddlewareUtility.getInitialContext( );
108        Object queryRef = context.lookup("FileReaderLocal");
109
110        // Create the reader service bean and register it.
111        FileReaderLocalHome home = (FileReaderLocalHome)
112            PortableRemoteObject.narrow(queryRef, FileReaderLocalHome.class);
113        FileReaderLocal reader = home.create( );
114
115        synchronized (readerTable) {
116            readerTable.put(key, reader);
117        }
118
119        return reader;
120    }
121
122    private static void removeReader(String key, FileReaderLocal reader)
123    {
124        synchronized (readerTable) {
125            readerTable.remove(key);
126        }
127
128        if (reader != null) {
129            try {
130                reader.remove( );
131            }
132            catch(javax.ejb.NoSuchObjectLocalException ex) {
133                // ignore
134            }
135            catch(Exception ex) {
136                Globals.streamerLogger.error("Streamer.removeReader("
137                                             + key + ")", ex);
138                cacheStats.incrementFileErrorCount( );
139            }
140        }
141    }
142 }


					    

Now, here are code snippets from the file reader bean. The cacheStats and fileStats fields are for runtime monitoring, as described later. The getDataFile( ) method logs the start of the file download (lines 160–161):

143 public class FileReaderBean implements SessionBean
144 {
145     private static final String FILE = "file";
146
147     private transient static BeanCacheStats cacheStats = Globals.queryStats;
148     private transient static FileStats       fileStats = Globals.fileStats;
149
150     private transient int                  totalSize;
151     private transient String               type;
152     private transient String               name;
153     private transient FileInputStream      fileInputStream;
154     private transient BufferedInputStream  inputStream;
155     private transient boolean              sawEnd;
156
157     public void getDataFile(String filePath)
158         throws MiddlewareException
159     {
160         Globals.streamerLogger.debug("Begin download of file '"
161                                      + filePath + "'");
162         this.type = FILE;
163         this.name = filePath;
164         this.sawEnd = false;
165
166         try {
167
168             // Create an input stream from the data file.
169             fileInputStream = new FileInputStream(new File(filePath));
170             inputStream     = new BufferedInputStream(fileInputStream);
171
172             fileStats.startDownload(this, FILE, name);
173         }
174         catch(Exception ex) {
175             close( );
176             throw new MiddlewareException(ex);
177         }
178     }
179


					    

The readDataBlock( ) method reads each data block from the source file. When it has read the entire source file, it logs the completion (lines 191–193):

180     public DataBlock readDataBlock( )
181         throws MiddlewareException
182     {
183         byte buffer[] = new byte[Globals.streamerBlockSize];
184
185         try {
186             int size = inputStream.read(buffer);
187
188             if (size == -1) {
189                 close( );
190
191                 Globals.streamerLogger.debug("Completed download of " +
192                                              type + " '" + name + "': " +
193                                              totalSize + " bytes");
194
195                 cacheStats.incrementFileDownloadedCount( );
196                 cacheStats.incrementFileByteCount(totalSize);
197                 fileStats.endDownload(this, totalSize);
198
199                 sawEnd = true;
200                 return null;
201             }
202             else {
203                 DataBlock block = new DataBlock(size, buffer);
204                 totalSize += size;
205                 return block;
206             }
207         }
208         catch(Exception ex) {
209             close( );
210             throw new MiddlewareException(ex);
211         }
212     }
213 }


					    

Here are some sample streamer service log entries:

    2004-12-21 19:17:43,320 INFO : jqpublic:
    Streamer.getDataFile(/surface/tactical/sol/120/jpeg/1P138831013ETH2809P2845L2M1.JPG)
    2004-12-21 19:17:43,324 DEBUG: Begin download of file '/surface/tactical/sol/120/
        jpeg/1P138831013ETH2809P2845L2M1JPG'
    2004-12-21 19:17:44,584 DEBUG: Completed download of file '/surface/tactical/sol/120/
        jpeg/1P138831013ETH2809P2845L2M1.JPG': 1876 bytes


					    

Figure 20-6 shows a useful graph of information we can glean from log mining. The graph shows the trend in the amount of downloading (the number of files and the number of bytes downloaded) over a period of several months during the mission. Over shorter periods of time, the graph can show spikes whenever one of the rovers makes an interesting discovery.

Figure 20-6. A graph generated from "mining" the CIP streamer service logs


20.5.2. Monitoring

Logging enables us to analyze the performance of the services by examining what they have been doing over a period of time. Unlike log entries, which are most helpful in pinpointing problems and their causes, runtime monitoring helps us see how well the services are currently performing. It gives us a chance to make dynamic adjustments to improve performance or to head off any potential problems. As mentioned earlier, the ability to monitor operational behavior is often critical to the success of any large application.

The code listings previously shown included statements that update the performance data kept by global static objects referenced by the fields cacheStats and fileStats. A middle-ware monitoring service probes this performance data upon request. The global objects to which these fields refer aren't shown, but you should be able to imagine what they contain. The key point is that it's not complicated to gather useful runtime performance data.

We wrote the CIP Middleware Monitor Utility as a client application that periodically sends requests to the middleware monitoring service to obtain the current performance data. Figure 20-7 shows a screenshot of the utility's Statistics tab that displays, among other runtime statistics, the number of files and bytes that have been downloaded and uploaded by the streamer service, and the number of file errors (such as an invalid filename specified by a client application) that have occurred.

Figure 20-7. Screenshot of the Statistics tab of the Middleware Monitor Utility


The streamer service provider bean's doFileDownload( ) and readDataBlock( ) methods both update the global file error count (lines 50 and 84 in the code shown earlier in the section "Logging"). The getDataFile( ) and readDataBlock( ) methods increment the global total service response time (lines 22 and 94). As seen in Figure 20-7, the middleware monitor utility displays average response times under the "Total Server Response" label.

The file reader bean's getDataFile( ) method records the start of each file download (line 172). The readDataBlock( ) method increments the global total file and byte counts (lines 195 and 196) and records the completion of a download (line 197). Figure 20-8 shows a screenshot of the Files tab of the monitor utility, which displays current and recent file downloading and uploading activity.

Figure 20-8. Screenshot of the Files tab of the Middleware Monitor Utility