Leveraging Content Management Systems for e-Discovery
By Brad Harris
Over the last few years, we’ve seen the emergence of enterprise-class content management and e-mail archiving as the grand solution to controlling the ever-burgeoning information explosion. Content management and e-mail archiving applications have clear and compelling value propositions for consolidating data management practices, optimizing storage and facilitating more effective use of information assets. In many cases, such tools also enable records and information management, as well as enhanced compliance and risk management. But are they the panacea for legal discovery readiness?
To determine the right mix of people, process and technology, IT should work proactively with their legal counterparts, ask questions and review current methodologies for accessing content management and e-mail archives. In doing so, IT can be much better prepared to identify, preserve and collect the electronic evidence required as a part of commercial litigation or governmental investigation.
Evaluating the Systems
The first step in responding to a preservation or production obligation is to identify which files in the repositories are potentially relevant. It is useful to know how the files are logically organized in a given repository, such as by folders or classification, and how consistently this organization is applied. It’s important to know how file ownership is defined from an end-user perspective, where the files are physically stored and whether custodians can be tracked back to the files. Counsel will also need to know:
When identifying, collecting and preserving evidence for discovery, the search protocol needs to be clearly documented and include such details as:
Once specific files or documents have been identified, the discovery team, composed of both legal and IT staff, must then be able to tag and categorize the files for the particular discovery matter. Can this be done by adding a new attribute to the file or by creating a new record in the database? In some systems, this may be as simple as a “drag-and-drop” into a special folder, where a pointer links back to the original file. In other instances, an actual copy of the file might be created. Whichever approach is used, it is imperative to prevent spoliation by ensuring that the original metadata associated with the file is preserved. For example, will the tagging action change the last modified date? When an item is copied to a new folder, will the original path be lost?
Preserving the Evidence
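One way to picture the "new record in the database" approach to tagging is the minimal sketch below. It is an illustration only, not a description of any particular content management product: the `matter_tags` table, the `tag_for_matter` function and the matter identifier are all hypothetical names. The sketch records a pointer to the file (original path, hash and last modified time) in a separate database and then checks that the file's own last modified time was left untouched.

```python
import hashlib
import os
import sqlite3
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tag_for_matter(db, matter_id, path):
    """Record a pointer to the file for a matter without modifying the file.

    Returns True if the file's last modified time is unchanged afterwards.
    """
    before = os.stat(path)
    db.execute(
        "INSERT INTO matter_tags (matter_id, original_path, sha256, mtime_ns) "
        "VALUES (?, ?, ?, ?)",
        (matter_id, os.path.abspath(path), sha256_of(path), before.st_mtime_ns),
    )
    db.commit()
    after = os.stat(path)
    # Reading and hashing the file should not alter its modification time.
    return before.st_mtime_ns == after.st_mtime_ns


def demo():
    """Tag one temporary file and return (unchanged_flag, stored_rows)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE matter_tags ("
        "matter_id TEXT, original_path TEXT, sha256 TEXT, mtime_ns INTEGER)"
    )
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "contract.txt")
        with open(path, "w") as handle:
            handle.write("draft agreement")
        unchanged = tag_for_matter(db, "matter-001", path)
        rows = db.execute(
            "SELECT matter_id, original_path, sha256 FROM matter_tags"
        ).fetchall()
    db.close()
    return unchanged, rows
```

Because the tag lives in its own record, the original file, its path and its timestamps are never rewritten. A real repository would need the same guarantee from whatever tagging mechanism it provides.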
When responding to a request for production, or preemptively collecting potentially relevant evidence to ensure compliance with pending litigation, content must be exported from the repository in a legally defensible manner. To be defensible, a process needs to be predictable (producing repeatable and testable results), transparent (well understood and articulated), and trusted (non-repudiation of the end result). The process needs to:
Other questions to ask include:
Understanding the Context
Content management systems can also add a whole new dimension to discovery if document versioning is enabled. For example, when exporting a file for discovery, will the most current version be exported or will it include all previous versions and history? If a matter concerns the state of the repository at some time in the past, will the content management system enable the retrieval of a particular version of the file from a specific point in time, including deleted or archived files? Do the search and retrieval processes account for deleted or archived history? If legally obligated, will the system be able to produce such metadata, including the version history, which custodians accessed the files and what changes were made?
Similarly, e-mail archiving systems also add new questions. Many archiving systems add value by actively managing storage parameters. For instance, single-instance storage offers a compelling advantage by enabling a single version of duplicate e-mails or attachments to be stored, using links and pointers to keep everything transparent to the end-user.4 Some e-mail management systems take this a step further by allowing e-mail senders to avoid attaching a document altogether, instead embedding only a link back to the repository where the “attached” file is actually stored. When exporting e-mails for discovery, understanding what happens to these links is critical. For example, does the export process replace logical links with actual file attachments?
File Migration Before or After a Duty to Preserve Exists
However, typical migration utilities do little to retain the original metadata from the source repository, such as file create date, last modified date or original file path. Therefore, it is not uncommon to see hundreds or thousands of files in the repository all showing the same “create date,” since the repository typically records the date the file was added to the repository as the original date.
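The metadata loss described above can be reduced by capturing the source metadata before a file is moved. The following is a minimal sketch under stated assumptions, not the behavior of any actual migration utility: `capture_metadata`, `migrate_with_manifest` and the manifest layout (one JSON object per line) are hypothetical. Creation time is deliberately left out because the field that holds it differs by platform.

```python
import hashlib
import json
import os
import shutil


def capture_metadata(path):
    """Collect source-side metadata that a migration might otherwise lose."""
    info = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return {
        "original_path": os.path.abspath(path),
        "size_bytes": info.st_size,
        "modified_ns": info.st_mtime_ns,
        # Creation time is platform-dependent and is omitted from this sketch.
        "sha256": digest,
    }


def migrate_with_manifest(source_dir, target_dir, manifest_path):
    """Copy files to target_dir and write one JSON line of metadata per file."""
    records = []
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            source = os.path.join(root, name)
            record = capture_metadata(source)  # captured before the copy
            relative = os.path.relpath(source, source_dir)
            target = os.path.join(target_dir, relative)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source, target)  # copy2 also tries to keep timestamps
            record["migrated_path"] = os.path.abspath(target)
            records.append(record)
    with open(manifest_path, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
    return records
```

The manifest keeps the original path and last modified time alongside a hash, so the source values remain available even if the destination repository stamps every file with the date it was ingested.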
If a data migration is being done absent an active matter or preservation obligation, such alteration or loss of the original metadata may not be an issue. But if information that existed in the original source repository remains relevant, the organization could become exposed to serious sanctions for metadata spoliation. It is because of this need to preserve metadata that most content repositories or archiving systems cannot typically be used as a repository for preserving potentially relevant evidence stored “in the wild” once a duty to preserve has arisen. Such use should be considered only once the legal implications have been fully vetted and only if special migration methodologies are in place to ensure legal defensibility.
Brad Harris is the Director of the Discovery Center of Excellence for Fios, Inc. www.FiosInc.com
1 Most search engines rely on indexes to perform keyword searching, where the content of a file is accessed to extract its textual content. A search index, or more precisely a full-text index, is the database which a full-text search engine uses to respond to the query issued by the user.
2 Retention schedules are a key component of Records Management or Information Lifecycle Management (ILM) systems, where a document is assigned a file plan which articulates how long the record is to be retained and where. A typical lifecycle may be triggered from the date a file is declared a record, defining how long it is then retained in active storage, when it should be moved to archive storage, and ultimately when it should be disposed of.
3 HSM, or hierarchical storage management, allows for optimizing storage of electronic documents based on use and access needs. As a file matures, it is oftentimes accessed far less frequently than a brand new record. Thus, more sophisticated DR systems allow for offloading older files to less accessible (and therefore less costly) storage mediums.
4 When several files in a repository contain exactly the same data, single instance storage (SIS) can replace the references to these identical files by references to a single stored copy of the file. This can potentially save large amounts of disk space in systems with many copies of the same file.
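Single instance storage, as described in note 4, can be sketched in a few lines. This is a simplified illustration that assumes identical content is detected by a SHA-256 hash; the `SingleInstanceStore` class and its method names are hypothetical and do not reflect how any specific archiving product is built.

```python
import hashlib


class SingleInstanceStore:
    """Toy store that keeps one copy of identical content plus references."""

    def __init__(self):
        self._blobs = {}       # digest -> content, stored only once
        self._references = {}  # item name -> digest (the "link" or "pointer")

    def add(self, name, content):
        """Store content under a name, reusing an existing identical copy."""
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)
        self._references[name] = digest
        return digest

    def export(self, name):
        """Resolve the reference back to the actual bytes for production."""
        return self._blobs[self._references[name]]

    def stored_copies(self):
        """Number of distinct contents physically held."""
        return len(self._blobs)


store = SingleInstanceStore()
attachment = b"Q3 forecast spreadsheet"
store.add("alice/inbox/forecast.xls", attachment)
store.add("bob/inbox/forecast.xls", attachment)
```

Here two mailbox items point at one stored copy. The `export` step is where the discovery question arises: each exported item has to come out with the full content, not a dangling pointer into the archive.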