Michael van Eeden on Mon, 2 Feb 1998 09:04:15 +0100 (MET)



<nettime> Linux Helps Bring Titanic to Life (Fwd)



Hi nettime,

here is a nice article about how the film Titanic was rendered on 100 Linux
machines. I feel there is some change in the air: the Linux/Apache/PGP policy
of cooperation, free software and available source code is becoming
mainstream, with Netscape 5 now also becoming free and with source code...

Linux Helps Bring Titanic to Life
http://www.ssc.com/lj/issue46/2494.html

A Titanic challenge to Microsoft
http://www.msnbc.com/news/139296.asp

The first article follows here in ASCII; see the web page for the other one.

--------------------

Linux Helps Bring Titanic to Life

Digital Domain uses Linux to create high-tech visual effects for the
movies.

by Daryll Strauss

Digital Domain is an advanced full-service production studio located in
Venice, California. There, we generate visual effects for feature films and
commercials as well as new media applications. Our feature film credits
include Interview with the Vampire, True Lies, Apollo 13, Dante's Peak and
The Fifth Element. Our commercial credits are challenging to count, much
less list here (see the web site at http://www.d2.com/). While we are best
known for the excellent technical quality of our work, we are also well
respected for our creative contributions to our assignments.

The film Titanic (written and directed by James Cameron) opened in
theaters December 19, 1997. Set on the Titanic during its first and final
voyage across the Atlantic ocean, this tale had to be recreated on the
screen in all the splendor and drama of both the ship and the tragedy.
Digital Domain was selected to produce a large number of extraordinarily
challenging visual effects for this demanding film.

The Task

Digital visual effects are a large portion of our work. For many digital
effects shots, original photographic images are first shot on film (using
conventional cinematic methods) and then scanned into the computer. Each
"cut" or "scene" is set up as a collection of directories, with an "element"
directory for all the photographic passes that contribute to the final
scene. Each frame of film is stored as a separate file on a central file
server.

A digital artist then begins working on the shot. The work may involve
creating whole new elements, such as animating and rendering 3D models, or
modifying existing elements, such as painting out a wire or isolating the
areas of interest in the original film. This work is done at the artist's
desktop (often on an SGI or NT workstation). Once the setup for this work is
done, the process is repeated for each frame of the shot. This batch
processing is done on all the available CPUs in the facility, often in
parallel, and requires a distributed file system and a uniform view of the
data. A goal of this processing is to remain platform independent whenever
possible.

Finally, once all the elements are created, the final image is
"composited". During this step the individual elements are color corrected
to match the original photography, spatially coordinated and layered to
create the final image. Again, the setup for compositing work is usually
done on a desktop SGI, and the batch processing is done throughout the
facility.
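The per-frame batch step described above can be sketched as follows. This is a minimal illustration, not Digital Domain's pipeline: the directory layout (`elements/<element>/`), the `.rgb` frame naming, the shot path and the host names are all hypothetical. It lists one file per frame and deals the frames out round-robin to batch hosts so that each frame runs as an independent job.

```python
from pathlib import PurePosixPath


# Hypothetical layout: <shot_root>/elements/<element>/<element>.<frame>.rgb
# The article only says each frame is a separate file in a per-shot directory tree.
def frame_paths(shot_root, element, first, last):
    """Return one file path per film frame of an element, first..last inclusive."""
    element_dir = PurePosixPath(shot_root) / "elements" / element
    return [str(element_dir / f"{element}.{frame:04d}.rgb")
            for frame in range(first, last + 1)]


def assign_frames(frames, hosts):
    """Deal per-frame tasks round-robin across batch hosts so they can run in parallel."""
    plan = {host: [] for host in hosts}
    for i, frame in enumerate(frames):
        plan[hosts[i % len(hosts)]].append(frame)
    return plan


# Hypothetical shot and host names.
frames = frame_paths("/shows/titanic/ts101", "water", 1, 10)
plan = assign_frames(frames, ["alpha001", "alpha002", "alpha003"])
for host, work in plan.items():
    print(host, [p.rsplit("/", 1)[-1] for p in work])
```

A production queue would more likely hand out frames dynamically as machines become free, which copes better with uneven frame times; static round-robin is simply the shortest way to show that every frame is a separate unit of work.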

Since building a full-scale model of the Titanic would have been
prohibitively expensive, only a portion of the ship was built full size (by
the production staff), and miniatures were used for the rest of the scenes.
To these models we added other elements of the scene, such as the ocean,
people, birds, smoke and other details that make the model appear to be
docked, sailing or sunk in the ocean. To this end, we built a 3D model and
photographed 2D elements to simulate underwater, airborne and land-based
photography.

During the work on Titanic the facility had approximately 350 SGI CPUs,
200 DEC Alpha CPUs and 5 terabytes of disk all connected by a 100Mbps or
faster network.

The Analysis

Our objective is always to create the highest quality images within
financial and schedule constraints. Image creation is accomplished in two
phases. In the first phase, the digital artist works at an interactive
workstation using specific, sophisticated software packages and specific
high-performance hardware. During the second phase, the work is processed in
batch mode on as many CPUs as possible, regardless of vintage, location or
features to enhance interactive performance.

It is difficult to improve on that first, interactive phase. The digital
artists require certain packages that are not always available on other
platforms. Even if similar packages are available, there is a significant
cost associated with interoperating between them. Another problem is that
some of the packages require certain high-end (often 3D) hardware
acceleration, and the same quality and performance of 3D acceleration may
not be available on other platforms.

In the batch-processing phase, improvements are more easily found, since the
basic requirements are high-bandwidth computation, access to large storage
and a fast network. If the appropriate applications are available, we can
improve that part of the process. Even where only a subset of the
applications is available on a particular platform, using that platform
lets us partition the work flow and improve access to resources in general.

We rapidly concluded that DEC Alpha-based systems served our
batch-processing needs very well. They provide extremely high floating-point
performance in commodity packaging, and we were able to identify certain
floating-point-intensive applications as port targets. The Alpha systems
could be configured with large amounts of memory and fast networking at
extremely attractive price points. Overall, the DEC Alpha had the best
price/performance match for our needs.

The next question was which operating system to use. We had the usual
choices: Windows NT, Digital UNIX and Linux. We knew which programs we
needed to run on the systems, so we assembled systems of each type and
evaluated their suitability for the various tasks this production required.

Windows NT had several shortfalls. First, our standard applications, which
normally run on SGI hardware, were not available under NT. Our software
staff could port the tools, but that would be quite expensive. NT also had
several other limitations: it didn't support an automounter, NFS or
symbolic links, all of which are critical to our distributed storage
architecture. Third-party applications were available to fill some of these
holes, but they added to the cost and, in many cases, did not perform well
in handling our general computing needs.

Digital UNIX performed very well and integrated nicely into our
environment. Its biggest limitations were cost and lack of flexibility. We
would be purchasing and reconfiguring a large number of systems, and
separately purchasing Digital UNIX for each one would have been time
consuming and expensive. Digital UNIX also lacked certain extensions we
required and could not provide them in an acceptable time frame. For
example, we needed to communicate with our NT-based file servers, connect
two unusual varieties of tape drives and allow large numbers of users on a
single system; Digital UNIX supported none of these.

Linux fulfilled the task very well. It handled every job we threw at it.
During our testing phase, we used its ability to emulate Digital UNIX
applications to benchmark standard applications and show that its
performance would meet our needs. The flexibility of the existing device
support and the availability of source code gave Linux a definitive
advantage.

The downside of Linux was the engineering effort required to support it. We
knew that we would need to dedicate one engineer to support these systems
during their setup. Fortunately, we had engineers with significant previous
experience with Linux on Intel systems (the author and other members of the
system-administration staff) and enough Unix-system experience to make any
required modifications. We carefully tested a variety of hardware to make
sure it was completely compatible with Linux.

Numerology and Fate

The Linux distribution used was Red Hat 4.1. At that time Red Hat was
shipping Linux 2.0.18, which didn't support the PC164 mainboard, so the
first thing we had to do was upgrade the kernel. During our testing we
tracked down a number of problems with devices and kept up with both the 2.0
and 2.1 series of kernels. We ended up sticking with 2.1.42 plus a few
patches. We also settled on the NCR 810 SCSI card with the BSD-based driver
and the SMC 100Mbps Ethernet card with the de4x5 driver.

This turned out to be a very stable configuration, but there was one serious
floating-point problem that caused our water-rendering software to die with
an unexpected floating-point exception. It was a tricky problem to fix, and
the fix didn't make it into the kernel sources until 2.0.31-pre5 and 2.1.43.
The Alpha kernel contains code to catch floating-point exceptions and handle
them according to the IEEE standard. That code failed to handle one of the
floating-point instructions that could generate an exception, so when that
case occurred, the application exited with a floating-point exception. Once
this was fixed, our applications ran quite smoothly on the Alpha systems.
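To make "handle them according to the IEEE standard" concrete, the sketch below shows the default, non-trapping IEEE 754 results for three exceptional cases: overflow gives infinity, an invalid operation gives NaN, and gradual underflow gives a subnormal number rather than zero. The Alpha hardware traps on some of these cases (underflow to a subnormal result, for example) and relies on the kernel's software completion to produce these values. This does not reproduce the kernel bug; it only shows what a correct completion path is supposed to hand back to the application instead of killing it.

```python
import math
import sys

# IEEE 754 default (non-trapping) results for three exceptional cases.
# Python floats are the host's IEEE doubles. Division by zero is avoided
# because Python raises ZeroDivisionError for it instead of returning inf.
overflow = sys.float_info.max * 2.0   # overflow: result is +inf
invalid = math.inf - math.inf          # invalid operation: result is NaN
underflow = 1e-300 / 1e10              # normal / normal: subnormal result, not 0.0

print(overflow, invalid, underflow)
print(math.isinf(overflow), math.isnan(invalid),
      0.0 < underflow < sys.float_info.min)
```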

The Implementation

At this point the decision was made to purchase 160 433MHz DEC Alpha
systems from Carrera Computers of Newport Beach, California. Of those 160
machines, 105 run Linux and the other 55 run NT. The machines are connected
with 100Mbps Ethernet to each other and to the rest of our facility.

The staff at Carrera was extraordinarily helpful and provided inestimable
support for our project. This support began at the factory and continued
through delivery and repair. We created a master disk, which we provided to
Carrera, along with a single initialization script that would configure the
generic master disk to one of the 160 unique personalities by setting
parameters such as the system name and IP address. Carrera built, configured
and burned in each machine, then logged in as a special user, which caused
the setup script to execute. When the script completed, the machine
automatically shut down. This process made configuring the machines easy
for both Carrera and us. When the hosts arrived, we just plugged them in and
flipped the switch, and they came up on the network. All 160 machines are
housed in a small room at Digital Domain in ten 19-inch racks. They are all
connected to a central screen, keyboard and mouse through a switching system
that allows an operator to sit in the middle of the room and work on the
console of any machine.
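The "one master disk, 160 personalities" step can be sketched as a function that maps a machine number to a hostname and IP address and renders the per-machine network files. Everything concrete here is hypothetical: the `alphaNNN` naming scheme, the 10.1.0.0/24 subnet and its offset are invented, and the file names follow later Red Hat conventions that are only assumed to match Red Hat 4.1. The article does not say how the script learned which machine it was on; here the index is simply passed in.

```python
import ipaddress

# Hypothetical values: the article gives neither the naming scheme nor the subnet.
SUBNET = ipaddress.ip_network("10.1.0.0/24")
FIRST_HOST_OFFSET = 10      # leave .1-.9 free for routers and servers
MACHINES = 160


def personality(index):
    """Return (hostname, ip) for machine number `index`, 1..MACHINES."""
    if not 1 <= index <= MACHINES:
        raise ValueError(f"machine index {index} out of range")
    hostname = f"alpha{index:03d}"
    ip = SUBNET.network_address + (FIRST_HOST_OFFSET + index - 1)
    return hostname, str(ip)


def network_config(index):
    """Render per-machine network settings as {path: contents} (Red Hat-style, assumed)."""
    hostname, ip = personality(index)
    return {
        "/etc/sysconfig/network": f"NETWORKING=yes\nHOSTNAME={hostname}\n",
        "/etc/sysconfig/network-scripts/ifcfg-eth0": (
            f"DEVICE=eth0\nIPADDR={ip}\nNETMASK={SUBNET.netmask}\nONBOOT=yes\n"
        ),
    }


# A real setup script would write these files onto the cloned disk, then shut down.
for path, text in network_config(7).items():
    print(f"--- {path}\n{text}", end="")
```

Keeping all per-machine differences in one small, data-driven function is what lets a single disk image be copied unchanged onto every machine.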


[Figure: Digital Domain Computer Room]

The room was assembled in two weeks, including installation of the
electrical, computing and networking infrastructure. The time spent creating
the initialization script was extremely well spent, as it allowed the
machines to be dropped in place with relatively little trouble. At that
point we began running the Titanic work through the "Render Ranch" of
Alphas. The first part of this work was to simulate and render the water
elements. We knew that the water elements were computationally very
expensive, so this process was one of the major reasons for purchasing the
Alphas. These jobs computed for approximately 45 minutes each and then
generated several hundred megabytes of image data to be stored on central
storage servers; intermediate data was stored on the Alpha's local SCSI
disk. The floating-point power of the DEC Alpha made these jobs run about
3.5 times faster than on our old SGI systems.

As the water rendering completed, the task load switched to compositing.
These jobs were more I/O bound, because they had to read elements from
disks on servers spread around the facility and combine them into frames to
be stored centrally. Even so, we still saw improvements of a factor of two
for these tasks. We were extremely pleased with the results. Between the
beginning of June and the end of August, the Alpha Linux systems processed
over three hundred thousand frames. The systems were up and running 24 hours
a day, seven days a week. There were no extended downtimes, and many of the
machines stayed up for more than a month at a time.
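A quick back-of-the-envelope calculation puts the frame count in perspective. It uses only the figures above (300,000 frames, 105 Linux machines) and assumes the window ran from 1 June to 31 August 1997. Since the article says "over" three hundred thousand, and the total mixes 45-minute water renders with shorter compositing jobs, the result is a rough lower-bound average rather than a measured rate.

```python
from datetime import date

# Figures from the article; the exact date window is an assumption.
frames = 300_000          # "over three hundred thousand frames" (lower bound)
machines = 105            # Alphas running Linux
days = (date(1997, 8, 31) - date(1997, 6, 1)).days + 1   # inclusive window

frames_per_day = frames / days
frames_per_machine_day = frames_per_day / machines
print(f"{days} days, {frames_per_day:.0f} frames/day, "
      f"{frames_per_machine_day:.1f} frames per machine per day")
# prints: 92 days, 3261 frames/day, 31.1 frames per machine per day
```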

The Problems

We addressed a number of different problems using a variety of techniques.
Some of the problems were Alpha specific, and some were issues for the Linux
community at large. We hope that describing them will help others in the
same position and provide feedback for the Linux community.

Hardware compatibility, particularly with Alpha Linux, is still a problem.
Carrera was very cooperative about sending us multiple card varieties so
that we could do extensive testing. The range of choices was large enough
that we were able to find a combination that worked. We had to pay careful
attention to which products we were using, as the particular chip revision
made a difference in one case.

The floating-point problem (discussed above) was the toughest problem we had
to address. We didn't expect to find this kind of problem when we started
the project. It was a long-standing bug that had never been tracked down,
which we attribute to the relatively small Alpha Linux community. Linux
software for the Alpha seems to be less tested than the equivalent software
for Intel processors, again a function of the size of the user base. This
was exacerbated by the fact that Alpha Linux uses glibc instead of libc5,
which introduced problems in our code and, we suspect, in other packages.

We had a number of small configuration issues related to the size of our
facility. Most of these were just parameter changes in the kernel, but they
took some effort to track down. For example, we had to increase the number
of simultaneously mounted file systems (64 was not sufficient). Also, NFS
directory reads were expected to fit within one page (4K on Intel, 8K on
Alpha); we had to double this size to support the average number of frames
stored in a single directory.

Boot management under Alpha Linux was more difficult than we would have
liked. We felt the documentation needed improvement to be more useful, and
boot management required extensive knowledge of ARC, MILO and Linux to make
it work.

ARC requires entering a reasonably large amount of data to get MILO to
boot. MILO worked well and provided a good set of options, but we never
managed to get soft reboots to work correctly. We've been working with the
engineers at DEC to improve some of these issues.

The weakest link in the current Linux kernel appeared to be the NFS
implementation, which caused most of our system crashes. We generally had a
large number of file systems mounted simultaneously, and those file systems
were often under heavy load. When central servers died or had problems, the
Linux systems didn't recover; the common symptoms were stale NFS handles and
kernel hangs. When all the servers were running, the Linux boxes worked
correctly. Overall, the NFS implementation worked, but it should be more
robust.
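The one-page limit on NFS directory reads mentioned earlier is easier to appreciate with numbers. The sketch below estimates how many directory entries fit in one READDIR reply of a given size. It assumes the NFS version 2 XDR encoding from RFC 1094, ignores the RPC header, and uses a hypothetical 20-character frame file name; the 1997 kernel may have budgeted the page differently, so treat the counts as rough.

```python
# Rough estimate of directory entries per NFS READDIR reply, assuming the
# NFS version 2 XDR layout (RFC 1094) and ignoring the RPC header.

def xdr_pad(n):
    """XDR pads string data to a multiple of 4 bytes."""
    return (n + 3) // 4 * 4


def entries_per_buffer(buffer_bytes, name_len):
    # Per entry: "value follows" flag + fileid + name length + padded name + cookie.
    per_entry = 4 + 4 + 4 + xdr_pad(name_len) + 4
    # Per reply: status + final "no more entries" flag + eof flag.
    overhead = 4 + 4 + 4
    return (buffer_bytes - overhead) // per_entry


name = "ts101_water.0001.rgb"   # hypothetical frame file name (20 characters)
for size in (4096, 8192, 16384):   # Intel page, Alpha page, doubled Alpha page
    print(size, entries_per_buffer(size, len(name)))
```

Under these assumptions an 8K Alpha page holds only a couple of hundred frame names, so a shot directory with more frames than that needs the larger buffer.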

Conclusion

The Linux systems worked incredibly well for our problems. The cost benefit
was overwhelmingly positive, even including the engineering resources we
devoted to the problems. Alpha Linux turned out to be slightly more
difficult than we first expected, but its state is improving very rapidly
and should be substantially better now. Digital Domain will continue to
improve and expand the tools available on these systems. We are encouraging
the development of more commercial and in-house applications for Linux, and
we are asking vendors to port their applications and libraries. At this
time, the Linux systems are used only for batch processing, but we expect
our compositing software to be used interactively by our digital artists.
This software does not require dedicated acceleration hardware, and the
speed of the Alpha processor is a great benefit to productivity.

Feature film and television visual effects development has provided a
high-performance, cost-sensitive proving ground for Linux. We believe that
the general-purpose nature of the platform, coupled with commodity pricing,
gives it wide application in areas outside our industry. The low entry
cost, versatility and interoperability of Linux are sufficiently attractive
to warrant more extensive investigation, experimentation and deployment. We
are currently at the forefront of that development within our industry and
hope to be joined shortly by our peers.

Why Risk Linux? A Production Perspective

   by Wook

Currently, Digital Domain's core business is as a premier provider of
visual effects creativity and services to the feature film and commercial
production industries. As such, we often take a conservative approach to
changes in infrastructure and methodologies in order to meet aggressive
delivery schedules and the most demanding standards of product quality.

During the course of work on several recent feature film productions, we
encountered situations where our installed base of equipment was not
adequate to meet changing production schedules and dynamic visual effects
requirements (in terms of increasing magnitude of effort and complexity). We
needed to meet these challenges head on without impacting the existing
pipeline and without creating new methodologies or systems which would
require re-engineering or re-training. Linux Alpha helped us overcome these
challenges both cost effectively and quickly (a rare combination).

Selecting Linux as part of the production pipeline for the film Titanic
required several goals to be met. Had we not met them, it is unlikely we
would have been able to deliver sufficient computing resources to the
production in a timely fashion.

We needed interoperability and, to a certain degree, compatibility with our
SGI/Irix-based systems. Interoperability and compatibility with Linux had
been demonstrated during a previous effort (Dante's Peak). We ported
critical infrastructure elements (to support distributed processing) to the
Linux environment in days, not weeks, using existing staff. The developers
of these tools were able to deploy rapidly to the Linux environment,
demonstrating that we could leverage that environment in short order.

We needed performance, as the production schedule and the magnitude of the
work implied an increase of 100% or more in studio processing capacity.
Since we had shown that Alpha Linux delivered a speedup of three to four
times over our SGI systems (see main article), it was possible to deliver
that increased level of performance while staying within the physical
constraints (air, power and floor space) of our current facility.

As to cost effectiveness, we would have needed more than twice as many
Intel machines as Alphas to meet our performance goals. SGI was a valid
contender but could not compete on a price-per-CPU basis. We also needed a
viable structure for delivery, installation and support. Carrera Computers
had proven its ability to supply and support us in a timely and
cost-effective manner prior to this order, and the company continued to
provide an extraordinary level of service throughout the Titanic project.
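The capacity argument can be checked roughly with the articles' own numbers: about 350 SGI CPUs in the facility, 105 Linux Alphas, a per-CPU speedup of 3 to 4 for rendering and about 2 for I/O-bound compositing. Treating one SGI CPU as one unit of capacity and counting only the Linux Alphas are simplifying assumptions made for this sketch, not statements from the article.

```python
# Back-of-the-envelope capacity check; the capacity model is an assumption.
SGI_CPUS = 350        # approximate SGI CPU count from the main article
LINUX_ALPHAS = 105    # Alphas running Linux


def capacity_increase(speedup):
    """Capacity added by the Linux Alphas, as a fraction of the SGI CPUs alone."""
    return LINUX_ALPHAS * speedup / SGI_CPUS


for label, speedup in [("compositing", 2.0),
                       ("rendering, low", 3.0),
                       ("rendering, high", 4.0)]:
    print(f"{label:16s} +{capacity_increase(speedup):.0%}")
```

Under this model the rendering speedups land close to the 100% increase the schedule called for, while I/O-bound compositing gains less.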

All things considered, this risk paid substantial dividends in project
quality and time. Because the urgency of the situation demanded that we
think "outside the box", we were able to deliver a superior solution within
a framework that was entirely compatible with our normal operating models
and that doubled the productivity of our previous infrastructure. The
satisfaction of this success actually made up for the stress of risking
one's job and career.

------

Daryll Strauss is a software engineer at Digital Domain. He has been
hacking on Unix systems of one variety or another for the last 15 years, but
he gets the most enjoyment out of computer graphics. He has spent the last
five years working in the film industry doing visual effects for film. He
can be reached at daryll@d2.com.

Wook has been a software engineer for over 20 years, having discovered
computers and become a complete geek at the age of 14. He has worked for
many companies over those years, finally coming to rest at Digital Domain,
where he was considered unfit for the task of software engineering and 
has been relegated to the position of Director of (Digital) Engineering.

---
#  distributed via nettime-l : no commercial use without permission
#  <nettime> is a closed moderated mailinglist for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: majordomo@icf.de and "info nettime" in the msg body
#  URL: http://www.desk.nl/~nettime/  contact: nettime-owner@icf.de