Note that although we wrote the following instructions while installing 1.0.1,
we believe that because most installation problems relate not to DSpace itself,
but rather to other factors, such as mod_webapp, these tips should be useful
for fresh installations of DSpace 1.1.1 or later.
SunSITE wanted to launch a DSpace testbed, but we found that consistent, reliable
installation and configuration information was difficult to find. Eventually, we
were able to install DSpace successfully, but we believe that the sharing of our
installation and configuration experience can only assist others as they attempt
the setup. Please note that this document is not meant to be a step-by-step
manual covering every detail of installation; rather, we hope that it will shed some
light on the more complex and difficult parts of the DSpace setup.
First, a few notes. General information about DSpace can be found
at the main MIT project page.
MIT's installation instructions - which are an excellent starting point, and
to which we will refer often - are located
here,
with specific emphasis on the
installation section.
Due to the generous support of the DSpace community, we have some documents that
have been contributed to us covering other aspects of DSpace installation and
configuration. Also, in this list are links to external pages or sites that cover
tips and techniques for related technologies. So far, we have documents exploring these issues:
We are running DSpace under Solaris 9 on Sparc hardware. With that in mind, here are
the software packages that we used, with specific version numbers (unless linked, all
software can be found by following the instructions and links within the DSpace installation
instructions, as linked above):
- DSpace 1.0.1 (currently we are running 1.1)
- JavaBeans Activation Framework 1.0.2
- Java Servlet 2.3 and JSP 1.2
- JavaMail API 1.3
- Tomcat 4.0.6 (binary version)
- Apache 1.3.27
- Ant 1.5.2
- PostgreSQL 7.3.2
In order to build mod_webapp, we used the following software:
For the most part, the DSpace install documentation is fine, especially for the few JAR files
that need to be put into the DSpace source tree, etc. Even for PostgreSQL, the installation
instructions are easy to follow, although we used a later version of the database than they
suggested (just remember, if you do use a later version, to add the --with-java to
the configure parameters, and then place the postgresql.jar file in
the correct place as indicated). Update: the new installation instructions for 1.1 and later
suggest that you must configure PostgreSQL with the --with-java option.
The main difficulty that people encounter when installing DSpace revolves around the proper
configuration of both Apache and Tomcat, as well as the bridging of the two with mod_webapp.
Let's look at the Apache install first, then Tomcat, and then finally mod_webapp.
Apache
While it is possible to run newer versions of Tomcat under SSL as a standalone server, we
still believe that using Apache as a sort of proxy server for handling SSL is the better
method. However, here is a link describing how
to run Tomcat as a standalone SSL server, should you be interested in going that route.
Since this document tries to focus as much as possible on DSpace-related
issues, we will not describe how to set up mod_ssl, for instance. To progress beyond this point,
it is assumed that you have a working Apache server with SSL capabilities.
We had problems using the default dspace-httpd.conf file that comes with DSpace.
So, we extracted what we needed from it and just placed the directives inside the regular
httpd.conf. This makes it pointless to let DSpace create the
dspace-httpd.conf file, but that's something we can live with. Check out
our sample httpd.conf file incorporating the changes
described below. The file has all of the DSpace-specific additions and changes we made
to get a working DSpace Apache build talking over mod_webapp to Tomcat.
The RedirectMatch lines went in right above our
<VirtualHost _default_:443> line.
Make certain you define a real ServerName in the top-level ServerName
directive; otherwise, you may encounter an error complaining that you have an invalid
virtual host name. In other words, we have a top-level ServerName dspace.sunsite.utk.edu
as well as another ServerName dspace.sunsite.utk.edu within the default
SSL virtual host entry.
Within the included dspace-httpd.conf, they have included several directives
dealing with SSL. We actually added none of these to our configuration file, finding that
the way we had it set up worked just fine; your experience, however, may be completely
different. Some paths they include in the dspace-httpd.conf are completely
inaccurate, such as the SSLCertificateFile and the SSLCACertificateFile
values. This must be for the MIT-specific implementation, because by default
DSpace does not even come with an etc directory at the top level (e.g.,
/dspace/etc doesn't exist). Therefore, use the correct paths to your
certificates instead, but you should probably already have these entered correctly if
you have a working, SSL-enabled Apache install to begin with.
We changed the user and group that Apache runs as from nobody to
dspace, which is our system's DSpace user, as well as the user that
Tomcat runs under.
Other than the necessary mod_webapp changes - which will be discussed later - our Apache
installation remained fairly unchanged.
Tomcat
Our Tomcat installation was fairly painless as well. In fact, we only changed one thing.
The DSpace docs will instruct you to delete some extraneous lines from the server.xml
Tomcat configuration file. While you can do that, it is not necessary. Our only
change was to change the name attribute within the Tomcat-Apache
Service from localhost to dspace.sunsite.utk.edu - or in other words,
exactly what we set the ServerName directive in httpd.conf to.
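Because the two names have to match exactly, a quick check can catch a mismatch before either server is started. Here is a minimal sketch, assuming a POSIX shell such as ksh or bash. The function name check_server_name is our own invention, and the check is deliberately loose: it only looks for a name="..." attribute anywhere in server.xml, not specifically inside the Tomcat-Apache Service.

```shell
# Rough consistency check (a sketch; not part of DSpace, Apache, or Tomcat).
# Usage: check_server_name HTTPD_CONF SERVER_XML
check_server_name() {
    httpd_conf=$1
    server_xml=$2
    # Take the first ServerName directive in httpd.conf (the top-level one).
    server_name=$(awk '$1 == "ServerName" { print $2; exit }' "$httpd_conf")
    if [ -z "$server_name" ]; then
        echo "no ServerName directive found in $httpd_conf"
        return 2
    fi
    # -F treats the name as a fixed string, so dots are not wildcards.
    if grep -F -- "name=\"$server_name\"" "$server_xml" >/dev/null; then
        echo "server.xml has a name attribute matching $server_name"
        return 0
    fi
    echo "server.xml has no name attribute matching $server_name"
    return 1
}
```

On a layout like ours, it might be called as check_server_name /usr/local/apache/conf/httpd.conf /usr/local/tomcat/conf/server.xml; both conf paths are assumptions, so substitute your own.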
mod_webapp
By far, the most common headache when installing DSpace is the mod_webapp component.
However, if you follow these steps, you will have a better than average chance of ending
up with a stable binary.
Before we talk about the steps used to compile mod_webapp, I want to address the issue
of the lack of binaries for mod_webapp. I do not know why people are so against giving
away their binaries. So, here is our mod_webapp binary, compiled
for Solaris 9 Sparc. Have fun.
Alright... Here are the instructions to compile mod_webapp (note that much of this is
repeated from the README.txt file found within the
jakarta-tomcat-connectors-4.0.6-src/webapp directory):
- Start out by unpacking the
jakarta-tomcat-connectors-4.0.6-src.tar.gz
file, probably in some temp directory (we use /scratch).
This will create a jakarta-tomcat-connectors-4.0.6-src directory.
- Within that new directory, there will be a
webapp directory. Inside
of the webapp directory, unpack the apr_APACHE_2_0_35.tar.gz
file. This will create an apr directory within the webapp
directory.
- From within the
webapp directory, run ./support/buildconf.sh.
This should produce a configure script within the webapp
directory.
- Run
./configure --with-apxs=PATH_TO_APXS --with-apr=PATH_TO_APR_SRC
--enable-java=PATH_TO_TOMCAT. The PATH_TO_APXS will likely be similar to
/usr/local/apache/bin/apxs, the PATH_TO_APR_SRC should be right
within your current directory, such as
/scratch/jakarta-tomcat-connectors-4.0.6-src/webapp/apr, and the
PATH_TO_TOMCAT should be obvious, such as /usr/local/tomcat.
- If everything went well (no errors, etc.), run
make and hope for
the best.
At this point, you should have the file mod_webapp.so inside an
apache-1.3 subdirectory within webapp, as well as the
tomcat-warp.jar file inside the build subdirectory. This is
the same information as is found within the README.txt file, and if
you do have both of these files, congratulations, and installation and configuration
can proceed as detailed within INSTALL.txt and the other DSpace
instructions.
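For those who prefer to script the build, the steps above can be sketched roughly as follows. This assumes a POSIX shell, that the connectors tarball sits in the work directory, and that the apr tarball has already been copied into the webapp directory. The function names and the DRY_RUN switch are our own and appear nowhere in the README; only the script name and configure options come from the steps above.

```shell
# Sketch of the build steps; nothing runs until build_mod_webapp is called.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unpack a .tar.gz without relying on GNU tar's z option.
unpack() {
    gzip -dc "$1" | tar xf -
}

# Usage: build_mod_webapp WORK_DIR PATH_TO_APXS PATH_TO_TOMCAT
# The body runs in a subshell, so the caller's directory is left unchanged.
build_mod_webapp() (
    work=$1 apxs=$2 tomcat=$3
    src=$work/jakarta-tomcat-connectors-4.0.6-src
    run cd "$work" &&
    run unpack jakarta-tomcat-connectors-4.0.6-src.tar.gz &&
    run cd "$src/webapp" &&
    run unpack apr_APACHE_2_0_35.tar.gz &&
    run ./support/buildconf.sh &&
    run ./configure --with-apxs="$apxs" --with-apr="$src/webapp/apr" \
        --enable-java="$tomcat" &&
    run make
)

# Usage: check_webapp_artifacts WEBAPP_DIR
# Reports whether the two files a good build should leave behind exist.
check_webapp_artifacts() {
    status=0
    for f in apache-1.3/mod_webapp.so build/tomcat-warp.jar; do
        if [ -f "$1/$f" ]; then
            echo "found:   $1/$f"
        else
            echo "missing: $1/$f"
            status=1
        fi
    done
    return $status
}
```

With our paths, the call would be build_mod_webapp /scratch /usr/local/apache/bin/apxs /usr/local/tomcat, followed by check_webapp_artifacts /scratch/jakarta-tomcat-connectors-4.0.6-src/webapp. Running once with DRY_RUN=1 first is a cheap way to confirm the paths before anything is unpacked. Keep in mind that finding both files only shows the build finished, not that the module is stable.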
But, as many people know, most of the time the building of mod_webapp does not proceed
smoothly. While we cannot address every problem people face, we can discuss some of
the problems we had, as well as the steps we took to fix them.
Under Solaris, we ran into a problem while trying to run the buildconf.sh
script. An error would appear saying essentially that autom4te was not
found, even though clearly it was right in the same directory with automake,
autoconf, etc. This is actually not some strange system problem!
In our case, we simply edited the autom4te script and changed the
path to Perl on the "shebang" line at the top of it. That's right - just make certain
that that line actually points to your Perl executable, and that should solve that problem.
We had this exact same problem with another script later in the buildconf.sh
process, so if you see the script complaining that it cannot find a file that you know
is there, check to see whether it is using the correct path to perl.
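A check like this is easy to automate. The sketch below, which assumes a POSIX shell, reads the first line of a script and reports whether the interpreter named there actually exists. The function name check_shebang is ours and not part of any autotools package.

```shell
# Usage: check_shebang SCRIPT
# Returns 0 if the interpreter named on the first line exists and is
# executable, 1 if it does not, and 2 if there is no shebang line at all.
check_shebang() {
    first_line=$(sed 1q "$1")
    case $first_line in
        '#!'*) ;;
        *) echo "$1: no shebang line"; return 2 ;;
    esac
    # Drop "#!", any blanks after it, and any interpreter arguments.
    interp=$(printf '%s\n' "$first_line" |
        sed -e 's/^#![[:space:]]*//' -e 's/[[:space:]].*$//')
    if [ -x "$interp" ]; then
        echo "$1: ok ($interp)"
        return 0
    fi
    echo "$1: interpreter not found: $interp"
    return 1
}
```

For example, check_shebang "$(command -v autom4te)" would test the script that tripped us up, assuming autom4te is on your PATH.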
Perhaps more disturbing, even after we ended up with a "clean" configure
run, we encountered problems late in the running of make. For some unknown
reason, we were given an error saying that the path
/scratch/jakarta-tomcat-connectors-4.0.6-src/webapp/apr/apr was invalid.
Well, yes, that certainly is invalid. There is no apr subdirectory
within the webapp/apr directory. And apparently the make
script was trying to copy a file into or out of this directory. Our first
thought was to try and fake the script out by making a symlink named apr
within the "real" apr directory pointing back to its parent. This is
quite the kludge, but it has worked on a few other occasions with testy scripts. But
what we ended up doing was simply creating a real, empty apr directory
within apr, and at that point, the make completed without
problem, and produced a working, stable mod_webapp.
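In shell terms the workaround is a single mkdir. The sketch below wraps it in a small guard so that it refuses to run before the apr tarball has been unpacked. The function name ensure_nested_apr is our own, and a POSIX shell is assumed.

```shell
# Usage: ensure_nested_apr WEBAPP_DIR
# Creates the empty WEBAPP_DIR/apr/apr directory that make expected on our
# system; does nothing harmful if the directory already exists.
ensure_nested_apr() {
    if [ ! -d "$1/apr" ]; then
        echo "$1/apr not found; unpack the apr tarball first"
        return 1
    fi
    mkdir -p "$1/apr/apr"
}
```

On our machine this would be ensure_nested_apr /scratch/jakarta-tomcat-connectors-4.0.6-src/webapp, followed by rerunning make from the webapp directory. Whether your build needs it at all depends on whether make complains about the same path.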
Please note that it is possible, in some crazy way, to actually compile a
mod_webapp.so file even though you have had many errors, especially
during the running of buildconf.sh! It is imperative that you make certain to
fix all errors during that process! For instance, under Solaris we had to install
a couple of packages, such as m4, to meet the requirements. We
produced a couple of "successful" builds of mod_webapp without satisfying each
error, and our Apache servers would simply segfault and die on each connection.
The configuration of getting Apache to "talk" with Tomcat using mod_webapp is relatively
easy. In fact, we just added the WebAppConnection conn warp localhost:8008
and WebAppDeploy dspace-oai conn /oai lines directly above
the <VirtualHost _default_:443> section, and the
WebAppDeploy dspace conn / line within the VirtualHost
section, just as outlined in the dspace-httpd.conf file. Also,
here is our sample httpd.conf file, with
all the additions and changes needed to configure Apache for mod_webapp and Tomcat
that were just mentioned.
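To confirm that all three directives made it into httpd.conf, something like the following sketch can help. It assumes a POSIX shell and that each directive is written with single spaces, exactly as quoted above. It checks only that the lines are present, not that they sit in the right place relative to the VirtualHost section. The function name check_webapp_directives is ours.

```shell
# Usage: check_webapp_directives HTTPD_CONF
# Reports which of the three mod_webapp directives are present.
check_webapp_directives() {
    status=0
    for directive in \
        "WebAppConnection conn warp localhost:8008" \
        "WebAppDeploy dspace-oai conn /oai" \
        "WebAppDeploy dspace conn /"
    do
        # Strip leading and trailing blanks, then match whole lines only
        # (-x), so the "/oai" line cannot satisfy the check for "/".
        if sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
            grep -Fx -- "$directive" >/dev/null
        then
            echo "present: $directive"
        else
            echo "missing: $directive"
            status=1
        fi
    done
    return $status
}
```

Running apachectl configtest afterwards is a sensible second step, since it lets Apache itself validate the syntax of the whole file.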
Other people have reported good success using mod_jk, or even mod_proxy, to bridge
the proverbial gap between Apache and Tomcat. I would love to see any information
about this, including sample configuration files, etc. If there is enough interest,
I would gladly add a section dedicated to those or other methods of making Apache
work with DSpace. Also, if anyone knows of good web resources for the installation
and configuration of mod_webapp, please let me know as well. I, however, read through
what seemed like every page remotely related to mod_webapp, and much of the information
was incorrect, incomplete, confusing, or otherwise less than helpful.
Handle Server
Perhaps one of the most difficult components of DSpace to set up is the Handle server.
This is mostly due to exceedingly poor documentation. The good news is that setting
up the server is actually fairly easy, but poor wording and conflicting reports make
the process seem more difficult than it really is.
Here are the steps that we went through to set up our server:
- Run
/dspace/bin/dsrun net.handle.server.SimpleSetup /dspace/handle-server.
Follow the instructions, answering relatively simple questions. Another option
might be to run /dspace/bin/make-handle-config after configuring
dspace.cfg, but we did not do this.
- Mail the file
/dspace/handle-server/sitebndl.zip to
hdladmin@cnri.reston.va.us. They will quickly create your global
identifier and email you with the appropriate information.
- Once you receive the email, you must edit the Handle configuration file
(
/dspace/handle-server/config.dct). Just as
the DSpace instructions say, make the "storage_type" = "CUSTOM"
change, as well as the "storage_class" = "org.dspace.handle.HandlePlugin"
change.
- While still in
config.dct, update any lines that say something
like YOUR_NAMING_AUTHORITY with the appropriate number as sent to
you in the email from the Handle admin people. For instance, here at UTK
we have a line that reads "server_admins" = ( "300:0.NA/1785" ).
We made this change in three places, under server_admins,
backup_admins, and replication_admins.
- Edit
/dspace/config/dspace.cfg, changing handle.prefix
to whatever number you were assigned (in our case, the line reads
handle.prefix = 1785), and changing if necessary handle.dir
to point to the right place, usually /dspace/handle-server.
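Since the same number has to appear in two different files, a quick cross-check can save a confusing debugging session. The following is a minimal sketch, assuming a POSIX shell and the file layouts quoted in the steps above. The function name check_handle_config is our own and is not a DSpace or Handle tool.

```shell
# Usage: check_handle_config DSPACE_CFG CONFIG_DCT
# Returns 0 if config.dct mentions 0.NA/<handle.prefix> and no placeholder
# text remains, 1 if something looks wrong, 2 if handle.prefix is not set.
check_handle_config() {
    # Read the value of handle.prefix, ignoring blanks around "=".
    prefix=$(sed -n \
        's/^[[:space:]]*handle\.prefix[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' \
        "$1" | sed 1q)
    if [ -z "$prefix" ]; then
        echo "handle.prefix is not set in $1"
        return 2
    fi
    if grep -F "YOUR_NAMING_AUTHORITY" "$2" >/dev/null; then
        echo "placeholder YOUR_NAMING_AUTHORITY is still present in $2"
        return 1
    fi
    # The closing quote keeps a prefix of 1785 from matching 17851.
    if grep -F "0.NA/$prefix\"" "$2" >/dev/null; then
        count=$(grep -F -c "0.NA/$prefix\"" "$2")
        echo "handle.prefix = $prefix; $count line(s) in $2 reference it"
        return 0
    fi
    echo "no reference to 0.NA/$prefix found in $2"
    return 1
}
```

With the default locations, the call would be check_handle_config /dspace/config/dspace.cfg /dspace/handle-server/config.dct. We would expect the reported count to be three, one each for server_admins, backup_admins, and replication_admins.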
You should be finished at this point, and the server should start after running
/dspace/bin/start-handle-server. Since this is not a perfect world,
you may encounter a problem when attempting to start the server, or you may
notice a problem while trying to resolve handles through the hdl.handle.net
server. Here are some of the problems we had, and how to fix them:
Do not follow the instructions on the Handle site to home your server.
While it is true that the official DSpace docs do not explicitly tell you to home
your new server, they also do not tell you that it is not necessary to do so.
Apparently, this is a common mistake, and while it does not hurt you to do so,
certainly it does not help. In fact, ignore any further instructions on the Handle
site; even though the DSpace docs point you there for guidance, ignore anything
else about getting your server up and running.
Another thing that is not explicitly spelled out is that you do not actually
create your own handles; rather, DSpace takes care of handle creation whenever a
new collection is created, a new item is added, and so forth. Again, the DSpace
docs do not tell you to use the Handle admin interface to create handles, but
the official Handle docs may lead you to think that you should. Going down that
road will produce many errors and headaches...
Pay close attention to /dspace/handle-server/error.log. We could not
figure out why our Handle server was not resolving handles, but an examination of
the error log showed a message saying that the TCP port was already taken, and thus
it could not bind. Well, a quick look at the running processes showed that we had
several running zombie processes relating to Java and DSpace, even though everything
was completely shut down. After killing those processes, the port was open and the
Handle server could bind to its port. Problem solved.
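The same diagnosis can be done from the command line. The sketch below, assuming a POSIX shell, filters netstat output for a listener on a given port. It reads from standard input so that it handles both the Solaris address style (*.2641) and the Linux style (0.0.0.0:2641). The function name port_is_listening is ours, and the port number 2641 is an assumption based on the usual Handle default; check your own config.dct for the port actually configured.

```shell
# Usage: netstat -an | port_is_listening PORT
# Succeeds if any line on stdin shows a local address ending in PORT
# that is in the LISTEN state.
port_is_listening() {
    grep -E "[.:]$1[[:space:]].*LISTEN" >/dev/null
}
```

A typical session might be: netstat -an | port_is_listening 2641 && echo "port is taken", followed by ps -ef | grep '[j]ava' to list the leftover Java processes that may be holding it.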
Even though you may not choose to do this, we went ahead and deleted all useless
stuff in our config.dct file, including everything related to HTTP
and UDP. To the best of our knowledge, the DSpace implementation of the Handle server
does not use these; instead, it uses only the TCP interface. (In fact, it seems
that the included Handle server is kind of watered-down in several respects - or at
least its implementation within DSpace is - but that is another conversation.) For
example, here is our production config.dct file,
showing what we left in, and our server works just fine. Indeed, if I had to wager
a guess, I would say that even more stuff could be pruned out, although in reality
the excess is probably not hurting anything or wasting resources.
Apparently, the coders of DSpace found an error in the source that prevents handles
from resolving globally (e.g., when linked through hdl.handle.net as
opposed to locally). Anyway, they released a code fix for this. What you need is the
new HandlePlugin.java source file. Download
it, and place it in your DSpace source tree in the right place, which for us is
/scratch/dspace-1.0.1/src/org/dspace/handle, obviously overwriting the
older copy that is there. When that is done, run ant, then ant
update from within your main source dir (again, for us,
/scratch/dspace-1.0.1). This will create and install a new copy of
dspace.jar within /dspace/lib. Your handle problems should
go away, provided everything else seems in place. However, note that they are
including code in the mid-April 2003 release of DSpace that will render this fix
unnecessary. Update: I am certain this is no longer an issue in any new release of
DSpace, but I will leave the instructions here just in case.
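For completeness, here is how those steps might look as a script. This is a sketch only, assuming a POSIX shell, that ant is on your PATH, and that the new HandlePlugin.java has already been downloaded. The function names and the DRY_RUN switch are our own.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: apply_handle_fix NEW_HANDLEPLUGIN_JAVA DSPACE_SOURCE_DIR
# The body runs in a subshell, so the caller's directory is left unchanged.
apply_handle_fix() (
    new_file=$1 src=$2
    # Overwrite the old source file, then rebuild and reinstall dspace.jar.
    run cp "$new_file" "$src/src/org/dspace/handle/HandlePlugin.java" &&
    run cd "$src" &&
    run ant &&
    run ant update
)
```

For us the call would be apply_handle_fix /tmp/HandlePlugin.java /scratch/dspace-1.0.1, where /tmp/HandlePlugin.java is just a stand-in for wherever you saved the download.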
Conclusion
In retrospect, our installation of DSpace actually went relatively well. Aside from
the mod_webapp component, and the lack of usefulness of the dspace-httpd.conf
file, everything went according to spec, as outlined in the DSpace docs. If
you have any questions, comments, or additions to this page, please email me at
jsimms@utk.edu.