Reliable code continues to perform well without problems. It rarely crashes, if ever. As you can imagine, code that is on board the Mars rovers must be extremely reliable because making on-site maintenance calls is somewhat difficult. But the MER mission wanted earthbound software used by mission control to be reliable, too. Once the mission was underway, no one wanted software problems to disrupt operations.
As noted earlier, the CIP project took several measures to ensure the intrinsic reliability of the system:
Adhering to industry standards and best practices, including J2EE
Using proven COTS software wherever practicable, including a commercial application server from an established middleware vendor
Implementing simple, straightforward middleware services
We further enhanced reliability with extra nails: service logging and monitoring. While these features can be useful for debugging even small programs, they become essential for keeping track of the runtime behavior of large applications.
During development, we used the open source Apache Log4J Java package to log nearly everything that occurred in the middleware services. It was certainly useful for debugging during development. Logging enabled us to write code that was more reliable. Whenever there was a bug, the logs told us what was going on just prior to the problem, and so we were better able to fix the bug.
We originally intended to reduce the logging only to serious messages before CIP went into operation. But we ended up leaving most of the logging on since it had a negligible impact on overall performance. Then we discovered that the logs gave us much useful information not only about what was going on with each service, but also how client applications were using the services. By analyzing the logs (which we called "log mining"), we were able to tune the services for better performance based on empirical data (see "Dynamic Reconfiguration," later in this chapter).
Here are some code snippets from the streamer service provider bean that show how we did logging for file downloading. The getDataFile( ) method processes each "Get Data File" request (via web services) from the client applications. The method immediately logs the request (lines 15–17), including the user ID of the requester and the filepath of the desired source file:
1 public class StreamerServiceBean implements SessionBean
2 {
3 static {
4 Globals.loadResources("Streamer");
5 };
6
7 private static Hashtable readerTable = new Hashtable( );
8 private static Hashtable writerTable = new Hashtable( );
9
10 private static BeanCacheStats cacheStats = Globals.queryStats;
11
12 public FileToken getDataFile(AccessToken accessToken, String filePath)
13 throws MiddlewareException
14 {
15 Globals.streamerLogger.info(accessToken.userId( ) +
16 ": Streamer.getDataFile("
17 + filePath + ")");
18 long startTime = System.currentTimeMillis( );
19
20 UserSessionObject.validateToken(accessToken);
21 FileToken fileToken = doFileDownload(accessToken, filePath);
22 cacheStats.incrementTotalServerResponseTime(startTime);
23 return fileToken;
24 }
25
The doFileDownload( ) method creates a new file token (line 30) and file reader bean (line 41), and then calls the reader bean's getDataFile( ) method (line 42). The cacheStats field deals with runtime monitoring, which is described later:
26 private static FileToken doFileDownload(AccessToken accessToken,
27 String filePath)
28 throws MiddlewareException
29 {
30 FileToken fileToken = new FileToken(accessToken, filePath);
31 String key = fileToken.getKey( );
32
33 FileReaderLocal reader = null;
34 synchronized (readerTable) {
35 reader = (FileReaderLocal) readerTable.get(key);
36 }
37
38 // Create a file reader bean to start the download.
39 if (reader == null) {
40 try {
41 reader = registerNewReader(key);
42 reader.getDataFile(filePath);
43
44 return fileToken;
45 }
46 catch(Exception ex) {
47 Globals.streamerLogger.warn("Streamer.doFileDownload("
48 + filePath + "): " +
49 ex.getMessage( ));
50 cacheStats.incrementFileErrorCount( );
51 removeReader(key, reader);
52 throw new MiddlewareException(ex);
53 }
54 }
55 else {
56 throw new MiddlewareException("File already being downloaded: " +
57 filePath);
58 }
59 }
60
The readDataBlock( ) method processes each "Read Data Block" request from the client applications. It looks up the correct file reader bean (line 71) and calls the reader bean's readDataBlock( ) method (line 79). At the end of the source file, it removes the file reader bean (line 91):
61 public DataBlock readDataBlock(AccessToken accessToken, FileToken fileToken)
62 throws MiddlewareException
63 {
64 long startTime = System.currentTimeMillis( );
65 UserSessionObject.validateToken(accessToken);
66
67 String key = fileToken.getKey( );
68
69 FileReaderLocal reader = null;
70 synchronized (readerTable) {
71 reader = (FileReaderLocal) readerTable.get(key);
72 }
73
74 // Use the reader bean to download the next data block.
75 if (reader != null) {
76 DataBlock block = null;
77
78 try {
79 block = reader.readDataBlock( );
80 }
81 catch(MiddlewareException ex) {
82 Globals.streamerLogger.error("Streamer.readDataBlock("
83 + key + ")", ex);
84 cacheStats.incrementFileErrorCount( );
85 removeReader(key, reader);
86 throw ex;
87 }
88
89 // End of file?
90 if (block == null) {
91 removeReader(key, reader);
92 }
93
94 cacheStats.incrementTotalServerResponseTime(startTime);
95 return block;
96 }
97 else {
98 throw new MiddlewareException(
99 "Download source file not opened: " +
100 fileToken.getFilePath( ));
101 }
102 }
103
The registerNewReader( ) and removeReader( ) methods create and destroy the stateful file reader beans, respectively:
104 private static FileReaderLocal registerNewReader(String key)
105 throws Exception
106 {
107 Context context = MiddlewareUtility.getInitialContext( );
108 Object queryRef = context.lookup("FileReaderLocal");
109
110 // Create the reader service bean and register it.
111 FileReaderLocalHome home = (FileReaderLocalHome)
112 PortableRemoteObject.narrow(queryRef, FileReaderLocalHome.class);
113 FileReaderLocal reader = home.create( );
114
115 synchronized (readerTable) {
116 readerTable.put(key, reader);
117 }
118
119 return reader;
120 }
121
122 private static void removeReader(String key, FileReaderLocal reader)
123 {
124 synchronized (readerTable) {
125 readerTable.remove(key);
126 }
127
128 if (reader != null) {
129 try {
130 reader.remove( );
131 }
132 catch(javax.ejb.NoSuchObjectLocalException ex) {
133 // ignore
134 }
135 catch(Exception ex) {
136 Globals.streamerLogger.error("Streamer.removeReader("
137 + key + ")", ex);
138 cacheStats.incrementFileErrorCount( );
139 }
140 }
141 }
142 }
Now, here are code snippets from the file reader bean. The cacheStats and fileStats fields are for runtime monitoring, as described later. The getDataFile( ) method logs the start of the file download (lines 160–161):
143 public class FileReaderBean implements SessionBean
144 {
145 private static final String FILE = "file";
146
147 private transient static BeanCacheStats cacheStats = Globals.queryStats;
148 private transient static FileStats fileStats = Globals.fileStats;
149
150 private transient int totalSize;
151 private transient String type;
152 private transient String name;
153 private transient FileInputStream fileInputStream;
154 private transient BufferedInputStream inputStream;
155 private transient boolean sawEnd;
156
157 public void getDataFile(String filePath)
158 throws MiddlewareException
159 {
160 Globals.streamerLogger.debug("Begin download of file '"
161 + filePath + "'");
162 this.type = FILE;
163 this.name = filePath;
164 this.sawEnd = false;
165
166 try {
167
168 // Create an input stream from the data file.
169 fileInputStream = new FileInputStream(new File(filePath));
170 inputStream = new BufferedInputStream(fileInputStream);
171
172 fileStats.startDownload(this, FILE, name);
173 }
174 catch(Exception ex) {
175 close( );
176 throw new MiddlewareException(ex);
177 }
178 }
179
The readDataBlock( ) method reads each data block from the source file. When it has read the entire source file, it logs the completion (lines 191–193):
180 public DataBlock readDataBlock( )
181 throws MiddlewareException
182 {
183 byte buffer[] = new byte[Globals.streamerBlockSize];
184
185 try {
186 int size = inputStream.read(buffer);
187
188 if (size == -1) {
189 close( );
190
191 Globals.streamerLogger.debug("Completed download of " +
192 type + " '" + name + "': " +
193 totalSize + " bytes");
194
195 cacheStats.incrementFileDownloadedCount( );
196 cacheStats.incrementFileByteCount(totalSize);
197 fileStats.endDownload(this, totalSize);
198
199 sawEnd = true;
200 return null;
201 }
202 else {
203 DataBlock block = new DataBlock(size, buffer);
204 totalSize += size;
205 return block;
206 }
207 }
208 catch(Exception ex) {
209 close( );
210 throw new MiddlewareException(ex);
211 }
212 }
213 }
Here are some sample streamer service log entries:
2004-12-21 19:17:43,320 INFO : jqpublic:
Streamer.getDataFile(/surface/tactical/sol/120/jpeg/1P138831013ETH2809P2845L2M1.JPG)
2004-12-21 19:17:43,324 DEBUG: Begin download of file '/surface/tactical/sol/120/
jpeg/1P138831013ETH2809P2845L2M1JPG'
2004-12-21 19:17:44,584 DEBUG: Completed download of file '/surface/tactical/sol/120/
jpeg/1P138831013ETH2809P2845L2M1.JPG': 1876 bytes
Figure 20-6 shows a useful graph of information we can glean from log mining. The graph shows the trend in the amount of downloading (the number of files and the number of bytes downloaded) over a period of several months during the mission. Over shorter periods of time, the graph can show spikes whenever one of the rovers makes an interesting discovery.
Logging enables us to analyze the performance of the services by examining what they have been doing over a period of time. Unlike log entries, which are most helpful in pinpointing problems and their causes, runtime monitoring helps us see how well the services are currently performing. It gives us a chance to make dynamic adjustments to improve performance or to head off any potential problems. As mentioned earlier, the ability to monitor operational behavior is often critical to the success of any large application.
The code listings previously shown included statements that update the performance data kept by global static objects referenced by the fields cacheStats and fileStats. A middle-ware monitoring service probes this performance data upon request. The global objects to which these fields refer aren't shown, but you should be able to imagine what they contain. The key point is that it's not complicated to gather useful runtime performance data.
We wrote the CIP Middleware Monitor Utility as a client application that periodically sends requests to the middleware monitoring service to obtain the current performance data. Figure 20-7 shows a screenshot of the utility's Statistics tab that displays, among other runtime statistics, the number of files and bytes that have been downloaded and uploaded by the streamer service, and the number of file errors (such as an invalid filename specified by a client application) that have occurred.
The streamer service provider bean's doFileDownload( ) and readDataBlock( ) methods both update the global file error count (lines 50 and 84 in the code shown earlier in the section "Logging"). The getDataFile( ) and readDataBlock( ) methods increment the global total service response time (lines 22 and 94). As seen in Figure 20-7, the middleware monitor utility displays average response times under the "Total Server Response" label.
The file reader bean's getDataFile( ) method records the start of each file download (line 172). The readDataBlock( ) method increments the global total file and byte counts (lines 195 and 196) and records the completion of a download (line 197). Figure 20-8 shows a screenshot of the Files tab of the monitor utility, which displays current and recent file downloading and uploading activity.