Architectural Recommendations
April 16, 1997 Prepared by AMG - WWW Applications Working Group
STANDARDS FOR WORLD WIDE WEB-BASED APPLICATIONS AT NIH
THE NEED
The NIH needs a consistent infrastructure to efficiently support development of
NIH-wide Web applications. Unfortunately, this infrastructure does not
currently exist.
Mandate: To address this need, a Web sub-committee of the AMG was
established to formulate a set of World Wide Web standards and features that
will enable the development of Web applications across the NIH.
Executive Summary
The AMG Web sub-committee, chaired by Fernando Burbano, included members from a
wide cross section of NIH (see Attachment D for a list of committee members and
support staff). The committee examined several aspects of Web technology --
application development, Web servers, and other crosscutting Web issues. Input
from previous groups (the AMG Web Technologies Subcommittee as well as the NIH
Home page Committee and its Search Engine Sub-committee) was extremely useful
as the AMG Web sub-committee developed its recommendations. These
recommendations include both strategic and tactical actions.
Adopt Web technologies as the "client/server architecture of choice" for
new NIH applications (Tactical and Strategic)
The NIH should embrace and exploit Web technologies for research,
administrative, and management purposes, as has the computer industry at large.
We expect that new vendor products will incorporate the look and feel of Web
browsers for their user interface. This should provide a high level of
consistency across COTS applications -- thus reducing training and support
requirements while increasing user productivity. NIH should join this rapidly
growing movement to maximize its IT effectiveness. It would be extremely
inefficient for the NIH to ignore industry trends.
Act to ensure that on and after June 1998 all NIHnet-connected desktop
systems include a fully functional, standards compliant Web browser
(Tactical and Strategic)
It is important to provide NIH-wide consistency for desktop clients (i.e., Web
browsers). Funding is needed to build this level playing field to avoid
creating pockets of "have nots" that would compromise the ubiquity and
effectiveness of Web-based applications. This central funding will promote a
controlled implementation -- and the future cost of not funding it may far
outweigh the initial investment.
Act to provide adequate employee training and support for the use of Web
browsers no later than June 1998 (Tactical)
The opportunity to have maximum impact on NIH's use of Web technologies is
now, before each organizational entity has become committed to its own
implementation standards.
Create a standing AMG Web Committee (Tactical)
Web technologies are evolving at an incredibly rapid rate. Thus, an algorithm
was developed to identify appropriate application functionality and Web
products such as browsers. A standing AMG Web Committee should apply this
algorithm every six months in order to provide NIH with a current set of
browsers that adhere to the NIH standards.
I. Overview
The World Wide Web is a technology that NIH should embrace and exploit for
research, administrative, and management purposes. The scope and rate of change
in Web technologies and associated products is so rapid that it is not
practical to establish static standards for the NIH. Rather, it is the
consensus of this sub-committee that specific Web standards for NIH should
follow industry trends in a well-defined, controlled manner; thus, a process
for identifying these trends should be adopted. Toward this goal, the
sub-committee recommendations are in three areas:
* Application development issues, including browser/application compatibility,
HTML syntax, applet usage, and database access.
* Server issues, including security, robustness, backup/recovery, scalability,
and capacity management.
* Crosscutting issues, including choice of search engines, the use of Web
robots, and applications with high bandwidth requirements.
As with many aspects of the web, the rapid pace of technological change makes
it difficult to develop a set of rigid guidelines and "best-practices" as they
are often out-of-date by the time they are completed. This problem is best
addressed through the creation of a standing AMG Web Committee to deal with Web
standards and recommendations. This Web committee should update specific Web
standards recommendations for NIH Web application development and review the
currency and applicability of the server and ancillary issues included in this
paper. The sub-committee recommends that this process take place no less
frequently than every six months.
In addition to creating a Web committee, we feel that it is vital for the AMG
to recommend NIH action to ensure that every NIHnet-connected desktop system
have an appropriate (i.e., standards compliant) Web browser installed and
functional. Further, it should recommend that training be available so that
all NIH staff are able to use their browsers effectively. This sub-committee
recommends that these actions have a target completion date of June
1998.
To restate, this sub-committee feels that the NIH should:
* Adopt the use of Web technologies as the "client/server architecture of
choice" for new NIH applications
* Act to ensure that on and after June 1998 all NIHnet-connected desktop
systems include a fully functional, standards compliant Web browser
* Act to provide adequate employee training and support for the
use of Web browsers no later than June 1998
* Create a standing AMG Web Committee
This proposal further describes each of these recommendations and provides the
rationale for them.
We feel that acting now will create NIH-wide architectural consistency
in the early stages of the new Web computing technology -- when it is most
efficient (and possible) to do so. It is rare to have the opportunity to have
a major impact throughout NIH's IT community. The specific actions envisioned
will, of course, require funding. We estimate that the objectives can be
reached at a cost of $550,000 during FY98 with an annual expenditure of about
$230,000 thereafter. Attachment B provides the basis for this cost estimate.
In addition to the recommendations above, the sub-committee suggests that the
AMG Web Committee create a monitored LISTSERV List and Web page to distribute
information and promote collaboration among Web Masters and Web application
developers.
II. Web Policy Considerations
It has been suggested that this AMG sub-committee review other policy issues
found in the current NIH WWW Guidelines document to determine
whether selected items need to be updated or made consistent with the HHS
WWW Guidelines and Best Practices. The majority of topics covered in
these existing documents are not related to the technical aspects of making
applications compatible across the Internet. Instead, they concern broader
agency Web issues such as: appropriate use of federal ADP resources; adherence
to existing federal laws and policies such as the Privacy Act; editorial and
artistic quality control; use of standard design elements; effective design
practices (such as the economic use of images and the design of fast-loading
graphics); marketing; user feedback; the need for text-only pages or other
techniques that provide for the accessibility of on-line information for the
disabled; and content issues such as copyrights, disclaimers of endorsement,
and the relevancy, accuracy and timeliness of posted data.
We examined these NIH and DHHS policies and felt comfortable with their
recommendations. We do, however, consider it highly desirable to identify the
roles of the various groups and committees that currently deal with NIH Web
policy and to clearly define an overall centralized structure to provide
coordination and leadership for their efforts.
Two aspects of Web policy have special bearing on other architectural issues.
For this reason, we feel that it is important to re-affirm our support of these
policies for NIH Web documents:
a. Web content must meet DHHS standards on proper HTML, headers and
footers, use of logos, etc.
b. Documents should contain descriptive, meaningful titles (commonly
displayed on search engine document hit lists).
III. Application Development Issues
The scope and rate of change in Web technologies and associated products is so
rapid that it is not practical to establish static standards for application
development at NIH. Further, Web technologies are still in their infancy, thus
industry directions have not yet been determined. With this in mind, we feel
that NIH will be best served by following de facto industry standards; thus, a
process for identifying these standards should be adopted. The process will
be based on:
* An algorithm to identify a "standard" set of browsers based on market share.
* An identified information authority used to determine market share.
* A periodic review by an AMG Web Committee that will apply the algorithm
to obtain a list of products (i.e., browsers) that make up the "standard"
set.
This approach will permit application developers to utilize relatively new Web
technologies while still achieving the efficiency benefits of NIH-wide
architectural consistency.
The sub-committee believes that different considerations are appropriate for
applications that have different intended audiences. With this in mind, the
sub-committee has addressed two types of application: trans-NIH applications
and general public applications. Trans-NIH applications are those whose
intended audience is anyone involved in the direct business of NIH (research,
administration, or management). Thus, the audience is some unnamed group of
NIH-related individuals whose desktop software includes Web products that
adhere to the standards as defined below. General public applications are
those intended for use by the public at large.
"Standard" Browsers
a. The standard for Web browsers is any one of the market share-leading
browsers that together constitute 85% of the market at the time the AMG's Web
committee convenes.
b. The Browser Statistics Usage Page of the University of Illinois at
Urbana-Champaign should be used to identify browser market share. (See
Attachment A for a discussion of this information authority.)
c. From the set of browsers identified above, an application developer may
assume the following level of currency:
* Trans-NIH applications -- all production releases during the past six
months.
* General Public applications -- all production releases during the past twelve
months.
d. If the set of browsers and releases identified above does not address all
desktop systems that must be supported by NIH policy, then additional
browsers/releases will be added to accommodate this requirement.
e. The standard browsers are assumed to be in their off-the-shelf state (i.e.,
no additional plug-ins or extensions).
An application of this process, and a set of "standard" browsers as of March
20, 1997, is included in Attachment A. We recommend that these be used as the
standard browsers until the AMG Web Committee provides a more current
recommendation.
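The selection rule in item (a) can be sketched in Python. This is only an illustration, assuming the rule means "take browser families in descending order of usage share until their combined share reaches 85 percent." The function name standard_browsers and the dictionary layout are hypothetical; the shares are the figures quoted in Attachment A for the week ending March 16, 1997. Item (d), adding browsers required by NIH policy, is not modeled.

```python
def standard_browsers(shares, threshold=85.0):
    """Return the leading browsers whose combined share first reaches threshold.

    shares maps a browser family name to its usage share in percent.
    If the threshold is never reached, every family is returned.
    """
    selected = []
    total = 0.0
    # Walk the families from most used to least used.
    for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        if total >= threshold:
            break
        selected.append(name)
        total += share
    return selected, round(total, 1)


# Usage shares quoted in Attachment A (week ending March 16, 1997).
march_1997 = {"Netscape": 74.2, "MSIE": 23.0, "Lynx": 0.5, "Mosaic": 0.4}

browsers, combined = standard_browsers(march_1997)
print(browsers, combined)
```

With these figures the sketch selects Netscape and MSIE, which matches the conclusion reached in Attachment A.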
Software Design
a. Applications should be designed to use HTML, applets, and security features
that are functional with the complete set of browsers identified above (for
either trans-NIH or general public audiences).
b. Database access should be compatible with the Web browsers identified
above.
c. Applications should be tested with all of the standard Web browsers. Note:
the AMG Web Committee should consider developing a "test bed" that will allow
software developers to try new applications with all of the browsers (and
versions) that constitute the standard browser set.
Robot Considerations
Application developers and Web page owners should ensure that their pages
provide for robot-friendly navigation. This requires that the responsible
individuals:
a. Use robots.txt to exclude sensitive documents (i.e., Intranet documents)
(See Attachment C.)
b. Consider robots when designing a CGI interface, and test prior to
implementation.
IV. Server Issues
Web-based applications are quickly becoming critical components of the work of
NIH. Thus, it is essential for the servers that run these applications to
receive the same considerations that are appropriate for other production
computing applications. The DHHS Automated Information Systems Security Program
(AISSP) defines the security levels for application systems and data files --
and specifies the actions that are required for the computing system (platform
and environment) used. These actions address the many areas that must be
considered to ensure that the NIH business process has total integrity and
reliability. Servers should adhere to the DHHS standards for all aspects of
security, robustness, and operational management that are appropriate for the
applications and data that depend upon them.
V. Crosscutting Issues
Three areas of Web technology that merit a great deal of consideration at NIH
are:
* Robots
* Search Engines
* High Bandwidth Applications
Robots
Robots are a new breed of web-based applications that can have a direct effect
on other NIH services and resources (see
http://info.webcrawler.com/mak/projects/robots/faq.html). Unlike traditional
applications that operate within a well-defined domain, usually a single
computer or interconnected group of related computer systems, robots traverse
across Web space examining/collecting data from systems far beyond the
application developer's control. Spiders, used by search engines to index Web
document collections, are a common example of a Web robot. Agents, programs
that crawl Web space searching for specific information for a particular user,
are another growing class of robot applications.
Robots are extremely useful in organizing and sifting through the vast amount
of data available across NIH Web space. However, they can cause some unwanted
side effects:
a. Rapid-fire HTTP requests, made by a robot, can overwhelm a small Web
server.
b. A robot can also overwhelm Web servers that have a poorly designed CGI
interface. This is particularly true of servers that generate documents
on-the-fly, where multiple links to the same "page" have different uniquely
generated URLs.
c. A robot that downloads a Web server's complete document collection can
greatly skew that Web server's access statistics.
d. A robot, collecting documents within NIH Web space for distribution to the
general public, could circumvent NIH IP based security.
Releasing Web Robots
a. At least 30 days prior to deploying a robot based application the
application developer must post an announcement to the LISTSERV list created
and managed by the AMG Web Committee. This announcement should include:
i. Brief description of the application.
ii. Deployment date.
iii. Contact information, including name(s), phone number and e-mail
address.
iv. Instructions on how to identify and exclude the application (e.g.,
robots.txt).
b. All robots must adhere to the Robot Exclusion Standard (see Attachment
C).
c. Robots should contain a built-in delay of at least 5 seconds between
successive HTTP requests to the same server. If possible, the robot should
time the requests to match the performance of the server (i.e., longer delays
between requests to a slower server).
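The delay rule in item (c) might be implemented along the following lines. This is a minimal sketch, not a prescribed design: the 5-second floor comes from the recommendation, while the scaling factor of 2.0, the function names next_delay and crawl, and the caller-supplied fetch function are all assumptions made for illustration.

```python
import time


def next_delay(last_response_seconds, minimum=5.0, factor=2.0):
    """Seconds to wait before the next request to the same server."""
    # Never go below the 5-second floor; wait longer for slower servers.
    return max(minimum, factor * last_response_seconds)


def crawl(urls, fetch, sleep=time.sleep, clock=time.monotonic):
    """Fetch a list of URLs from one server, pausing between requests.

    fetch is a caller-supplied (hypothetical) function that retrieves one URL.
    sleep and clock are injectable so the pacing can be tested without waiting.
    """
    results = []
    for position, url in enumerate(urls):
        started = clock()
        results.append(fetch(url))
        elapsed = clock() - started
        # No pause is needed after the final request.
        if position < len(urls) - 1:
            sleep(next_delay(elapsed))
    return results
```

Timing each request and feeding the elapsed time back into the delay is one simple way to give a slower server proportionally more breathing room.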
Search Engines
A sub-committee of the NIH Home page Committee recently completed an evaluation
of search engines. We endorse this group's approach and current recommendations
that can be found at
http://bigblue.od.nih.gov/websearch/report.htm
The AMG Web Committee should periodically review and update these
recommendations.
High Bandwidth Applications
Web based applications that transmit/receive huge quantities of data (e.g.,
PointCast, real audio, video streaming, MBONE) can also have an adverse impact
on the NIH Web community. Since NIH shares a common network, an application
that has sustained high bandwidth requirements could impede access to other NIH
systems.
Whenever an application that has high bandwidth requirements is being
considered, the responsible individual should first identify the expected
network resources likely to be impacted, then consult with those responsible
for those resources. This will permit arrangements to be made for additional
network capacity, or will permit re-examination of the technical approach prior
to significant resource investments.
Standards And Features For World Wide Web-Based Applications
Working Group Recommendations

| Recommendations | Initial Step(s) | Proposed Agent |
| 1. Adopt Web technology as the "client/server architecture of choice." | Draft policy; Publish and mandate compliance | AMG; CIO |
| 2. Every NIHnet-connected desktop system should have a standards compliant browser. | Draft policy; Publish and mandate compliance | AMG; CIO |
| 3. A training program in browser use should be implemented. | Develop Training Program; Provide Resources | AMG; CIO |
| 4. Create a standing AMG Web committee to deal with Web standards. | Draft committee charter for review and approval; Form initial committee membership | AMG, CIO; AMG |
| 5. Provide funding to create and maintain a Web infrastructure for NIH. | Review resource estimates; Provide funds | AMG; CIO |
| 6. NIH should designate as its standard those off-the-shelf browsers that constitute an 85% market share. For Trans-NIH applications, all production releases within the past 6 months. For general public applications, all production releases within the past 12 months. | Draft policy; Publish and mandate compliance | AMG; CIO |
| 7. Use only HTML and applets that are functional with the standard browsers. | Draft policy; Publish and mandate compliance | AMG; CIO |
| 8. Use only database access methods compatible with the standard browsers. | Draft policy; Publish and mandate compliance | AMG; CIO |
| 9. Use only those security measures compatible with the standard browsers. | Draft policy; Publish and mandate compliance | AMG; CIO |
| 10. Applications should be thoroughly tested with the standard browsers. | Draft policy; Publish and mandate compliance | AMG; CIO |
| 11. Ensure a Robot-friendly environment by adopting the Robot Exclusion Standard. | Draft policy; Publish and mandate compliance | AMG; CIO |
| 12. The AMG Web committee should review these recommendations on a periodic basis. | Draft policy | AMG, CIO |
Attachment A
Standard Web Browsers
An Application of the Recommended Algorithm
(As of March 20, 1997)
As described in the report above, the process for establishing Web application
standards is based on identifying those families of Web browsers that make up
85 percent of browser usage market share at any specific time. This in turn
depends on the availability of authoritative statistics for Web browser usage.
For this purpose, and until an alternative source of browser statistics can be
identified and justified, such statistics will be derived from information
obtained from the Browser Statistics Usage Page of the University of Illinois
at Urbana-Champaign (UIUC):
http://www.cen.uiuc.edu/bstats/latest.html
This site publishes browser usage statistics for "hits" to the UIUC Engineering
Workstations WWW Servers, and is currently (as of March 1997) basing its
information on over 1 million "hits" per week from over 83 thousand different
browser hosts. Statistics are provided for browser operating system, browser
vendor, and browser version. Available time increments are daily, weekly, and
monthly, with historical data archives starting from April 1996.
The process by which NIH Web application developers might ensure that they
are adhering to the compatibility standards set forth here is as follows:
1) Suppose that developers are preparing to implement an application for
production in March 1997 and that the client audience for that application will
be trans-NIH. In order to ensure that the application functionality will be
synchronized with the anticipated level of NIH Web browser functionality, the
developers would monitor the UIUC statistics site described above. From that
site they would determine which families of Web browsers make up 85 percent (or
more) of general browser usage.
For example, based on statistics from the week ending with March 16, 1997
(83,819 hosts in 102 countries making 1,046,435 accesses), they would find
that:
Netscape (all OSs and versions) = 74.2 percent
Microsoft Internet Explorer (MSIE)(all OSs and versions) = 23.0 percent
Lynx = 0.5 percent
Mosaic = 0.4 percent
Therefore, since Netscape and MSIE currently account for 97.2 percent of the
current general browser usage, confining compatibility testing to Netscape and
MSIE is acceptable in the March 1997 time frame for application testing and
deployment.
2) For a trans-NIH client audience, the lag period for browser implementation
is stipulated to be six months. Therefore the developers must test their
applications with all production version releases (by version number and not by
platform) of the browsers from Netscape and MSIE starting with October 1996.
This would then indicate that the application should be tested with Netscape
versions 3+ and MSIE 3+. If the client audience were the general public, then
the lag period would be 12 months, and the developers would be requested to
test their application with Netscape version 2+ and all versions of MSIE.
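The start of the lag window in step 2 can be computed mechanically. The sketch below assumes the window counts whole calendar months and includes the deployment month itself, which is the reading that reproduces the October 1996 start given above for a March 1997 deployment. The function name window_start is hypothetical.

```python
from datetime import date


def window_start(deploy, lag_months):
    """First day of the earliest month inside the lag window.

    The deployment month counts as the last month of the window.
    """
    # Number months consecutively so that year boundaries need no special case.
    index = deploy.year * 12 + (deploy.month - 1) - (lag_months - 1)
    return date(index // 12, index % 12 + 1, 1)


deployment = date(1997, 3, 20)
print(window_start(deployment, 6))    # trans-NIH audience
print(window_start(deployment, 12))   # general public audience
```

Under this assumption the trans-NIH window opens in October 1996 and the general public window in April 1996; mapping those dates to specific browser versions still requires the vendors' release histories.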
Attachment B
Cost Basis
The $550,000 FY98 and additional annual expenditure of $230,000/year are
specifically for maintaining currency in NIH browser support. Costs associated
with hardware, operating systems, and other applications are not included in
these estimates.
NIH browser costs are derived from the following assumptions:
* There are approximately 16,000 NIHnet-connected desktop computers. We
estimate that 40% of the people responsible for these computers will acquire
their own browsers, even if NIH provides a "free" one.
Thus, about 10,000 browsers will need to be acquired
* Netscape browsers cost about $37/copy when purchased singly
* Large volume, site licenses, or right-to-buy agreements can significantly
reduce the cost of browser purchase
* The Internet Explorer browser is included in the cost of Windows 95 and
Windows NT
Thus, the cost for providing 10,000 additional browsers for NIH use will be
approximately $200,000 ($20/copy) + 15% annual maintenance ($30,000)
* Since there is already an infrastructure for software support and training,
we feel that the additional resources needed for browser support services could
be acquired for approximately $350,000 during FY98 (the "ramp up" period), then
about $200,000 annually for future years. This annual cost might become lower
once the current rate of technology upgrades diminishes and new browser
versions aren't released so frequently.
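The arithmetic behind the two headline figures can be checked with a few lines of Python. The grouping used here (first-year cost as purchase plus support, later years as maintenance plus support) is an inference that reproduces the stated totals; the report does not spell it out, and the variable names are illustrative only.

```python
desktops = 16_000
self_supplied_percent = 40   # share expected to acquire their own browser
estimated_need = desktops * (100 - self_supplied_percent) // 100

browsers_budgeted = 10_000   # the report rounds the need to about 10,000
price_per_copy = 20          # dollars per copy, as assumed in the report
purchase = browsers_budgeted * price_per_copy
maintenance = purchase * 15 // 100   # 15% annual maintenance

support_fy98 = 350_000       # support services during the "ramp up" year
support_annual = 200_000     # support services in later years

fy98_total = purchase + support_fy98
annual_total = maintenance + support_annual

print(estimated_need, purchase, maintenance)
print(fy98_total, annual_total)
```

The computed need is 9,600 browsers, which the report rounds to about 10,000, and the totals come to $550,000 for FY98 and $230,000 per year thereafter.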
Attachment C
Robot Exclusion Standard
This attachment has been adapted from:
http://info.webcrawler.com/mak/projects/robots/robots.html.
To exclude robots from a server or specify an access policy for robots, the
webmaster must create a robots.txt file accessible via HTTP as
http://server.name/robots.txt.
The robots.txt file consists of one or more records separated by one or more
blank lines. Each record contains lines of the form:
<field>:<optional space><value>
The field name is case insensitive. The '#' character is used to indicate that
the remainder of the line is a comment and can be discarded.
The record starts with one or more User-agent lines, followed by one or more
Disallow lines, as detailed below.
User-agent
The value of this field is the name of the robot the record is describing
access policy for.
If more than one User-agent field is present the record describes an identical
access policy for more than one robot. At least one field needs to be present
per record. If the value is '*', the record describes the default access
policy for any robot that has not matched any of the other records.
Disallow
The value of this field specifies a partial URL that is not to be visited. This
can be a full path, or a partial path; any URL that starts with this value will
not be retrieved. For example, Disallow: /help disallows both /help.html and
/help/index.html, whereas Disallow: /help/ would disallow /help/index.html but
allow /help.html. An empty value indicates that all URLs can be retrieved. At
least one Disallow field needs to be present in a record.
Example
The following example "/robots.txt" file specifies that no robots should visit
any URL starting with "/nih/intranet/" or "/cgi-bin/":
User-agent: * # Applies to all robots
Disallow: /nih/intranet/ # Internal documents
Disallow: /cgi-bin/ # CGI applications
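One way to see how a robot would interpret the example file is to feed it to an existing parser. The sketch below uses Python's standard-library urllib.robotparser module as a stand-in for whatever parser a given robot actually uses; the robot name "ExampleBot", the helper function allowed, and the sample paths are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *                 # Applies to all robots
Disallow: /nih/intranet/      # Internal documents
Disallow: /cgi-bin/           # CGI applications
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, robot="ExampleBot"):
    """Report whether the named robot may retrieve the given path."""
    # "ExampleBot" and server.name are placeholder names.
    return parser.can_fetch(robot, "http://server.name" + path)


print(allowed("/index.html"))
print(allowed("/nih/intranet/memo.html"))
print(allowed("/cgi-bin/search"))
```

The first path is permitted, while the two paths that begin with a disallowed prefix are refused.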
Attachment D
Working Group Membership:
Dennis Burns (ORS/OAM)
Ron Edwards (NCRR)
Dr. Robert Goldschmidt (OD/OER)
Paul Logan (NHLBI)
Pete Morton (DCRT)
Dennis Rodrigues (OD/OC)
Mark Silverman (NLM)
Susan Teper (NIAAA)
Roy Standing (NLM) Vice-Chair
Fernando Burbano (NLM) Chair