William Binney: Thin Thread – Signals Intelligence Within the Rule of Law

General Observation: I assume you knew something about Thinthread from your work as a government [employee or] contractor, but you probably did not get the truth about the TT Program. I have attached an article written about TT by Diane Roark. Also, TT was not abandoned back then. The software is what they are using today to spy on the whole world. This is clear from the Ed Snowden material plus the NSA IG report from 2009. Also, they have not changed the names of some of the TT-related programs. They adopted these TT programs because they had nothing else that could handle massive data and data flows. They removed three parts of TT back in 2001: first, the filtering software that did smart selection of relevant data only; second, the encryption of identifying attributes to mask individuals’ identities; and, finally, the monitoring software that looked at everyone on the network and what they were doing as they did it. I would also point out that not only did they adopt all the software to manage data, but they also adopted our (SARC) design for networks that connected multiple parties (1st, 2nd and 3rd); see Rampart A on the WWW. Further, Kirk Wiebe and I are working with partners in Amsterdam to implement a TT-type program, but on a much more comprehensive scale.
Bill

ROBERT STEELE: Thin Thread will be the primary technical tool for the Judicial Commission of Inquiry into Human Trafficking and Child Sex Abuse, along with Steve Arnold's deanonymization packages, and the two in combination will be taught to and offered to any law enforcement agency that wishes to do what the secret intelligence world is incapable of doing or refuses to do. An Alternative Intelligence Community is now under development, along with an Autonomous Internet.

– o –

May, 2002

THIN THREAD

Diane Roark

General Description

THIN THREAD is a revolutionary system designed under U.S. government aegis for rapid discovery of useful knowledge from enormous accumulated and/or streaming data volumes, with the goal of optimizing timely decisions and effective management. It has a myriad of government, law enforcement and commercial applications, and its effectiveness in handling large volumes of data has been demonstrated by a prototype that has been operational since late 2000. THIN THREAD extracts and maintains high-integrity data, and from this data automates event recognition, activity (pattern) recognition and information/knowledge inferencing.

The system or parts thereof may be applied against any transactions recorded within formal information systems. A few examples include: tracking the movement of goods and managing the supply chain (including inventory control and re-ordering); records management and market/trends analysis, with deference to privacy rights or sensitivities; analyzing financial activities (by institution, community, location, person, amount, date/time, etc.); tracking/profiling analyst activities (e.g. financial analysts); correlating information on travel and border crossings or criminal activities; re-tracing and timelining an individual’s movements and/or other activities, or matching individuals to an established profile; tracking outbreaks of disease; detecting electronic attacks; identifying fraud, such as illegal use of services, Social Security Numbers, etc.; and organizing and retrieving information on alien visits and immigrants. In addition to its many potential commercial and domestic government applications, the system has obvious applications to the recognized need for accumulating, organizing, providing and sharing unclassified and classified national security-related information very quickly and effectively. Thus, it could help implement the Department of Defense vision for Network Centric Warfare, notably key portions relating to infrastructure, sensors, data fusion, information management, improved and shared awareness, virtual collaboration and organization, and increased tempo/responsiveness.

THIN THREAD processes data; selects items of interest; identifies and filters out or removes protected entities; stores data and metadata; correlates massive volumes of metadata and other information or events identified by the analyst to be of interest; and allows easy analyst retrieval of information in its intended appearance and format. Presently, it captures items of interest to analysts, and it can be developed further to generate some reports automatically. It is eminently capable of handling the problems of volume, velocity and variety, especially if planned spiral/progressive development proceeds further. THIN THREAD abandons legacy approaches and procedures to address digital and packetized data in increments from relatively small volume to the highest of volumes (e.g., fiber optic rates). “Stovepiped” analog data may also be digitized, processed through the system and correlated with data from all sources. THIN THREAD allows rapid knowledge discovery for massive information environments, because its four functional components all accommodate terabytes of information and it facilitates analyst validation of information. It can examine all available data and applies analyst-developed scoring algorithms to select information of interest within two seconds of receipt for processing.

Data is processed and stored in “distributed” fashion, at the various points of origin, although the metadata initially may be transmitted to a central location if desired. This minimizes transmission loads and costs, which otherwise may constitute a significant bottleneck and financial drain, especially with high-volume systems; only content that the analyst deems potentially useful is forwarded to the analyst. In this way, the system also considerably limits vulnerability to accidents (such as IT failure) or disaster/attacks (e.g. terrorism), since data is stored at many locations and analysts worldwide can access it from many locations. The need to build expensive duplicative facilities and systems in order to address vulnerability could be minimized by using such a distributed architecture. Data retrieval may be both automated and manual; in the latter case, an analyst gathers data from all relevant database sites using a single query.
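The single-query retrieval described above can be sketched as a fan-out over per-site stores. This is only an illustration, not the THIN THREAD implementation: the site names, record layout and the functions `query_site` and `federated_query` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site stores: site name -> records held at that site.
SITES = {
    "site_a": [{"id": 1, "topic": "shipping"}, {"id": 2, "topic": "finance"}],
    "site_b": [{"id": 3, "topic": "finance"}],
    "site_c": [],
}

def query_site(site, topic):
    """Run the query where the data lives; return only matching records."""
    return [dict(rec, site=site) for rec in SITES[site] if rec["topic"] == topic]

def federated_query(topic):
    """Fan one analyst query out to every site and merge the results."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of the input sites.
        parts = list(pool.map(lambda s: query_site(s, topic), sorted(SITES)))
    return [rec for part in parts for rec in part]

results = federated_query("finance")
```

Only the matching records travel back to the analyst, which is the point the text makes about minimizing transmission loads.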

THIN THREAD is an end-to-end (access processing to analyst retrieval and potentially to partially automated reporting) solution. With all parts working together, it can acquire and sustain high-quality data, in contrast with some current systems in which over half the data becomes unusable/unrecoverable. The highest volume feeds can be handled effectively only when THIN THREAD is deployed as a complete, interactive system, including critical analyst input; in this mode, it can scale up to “N,” or any number, the only limitations being space, power and money.

Further, accumulated knowledge and future feedback mechanisms, coupled with additional automation, will allow use not only for data selection and refinement, but also for automatically grouping and storing data related to items of interest, as well as for automated and other tasking to find additional relevant data to update or to fill identified gaps. Thus, through sensible spiral development, THIN THREAD could progressively and relatively quickly evolve to 1) a “big awesome graph” of comprehensive, matrixed intra- and inter-organization data on items of interest, plus 2) the “mission management” tasking capability that is so desperately needed, both within and across organizations — at far lower cost and risk than current initiatives foresee.

Since THIN THREAD produces and displays data in its intended native format, it eliminates much of the need for specialized analysts who spend half their time converting data rather than evaluating content. This would allow more efficient, pooled, and even co-located use of scarce analytic resources. The four parts of the system, working together synergistically, provide analysts with a comprehensive ability to map all media contacts, regardless of constraints and challenges posed by stovepiped processing systems focused on individual types of media.

Development and Deployment History

Between July, 2000 and November, 2000, seven programmers, working closely with a few highly experienced system engineers and information analysts, assembled and deployed the THIN THREAD prototype for a specific government application. By January, 2001, some minor latent software “bugs” were eliminated and the system was ready for full deployment. Existing operational deployments against data streams include full processing of STM-1 (155 Mbps), STM-4 (622 Mbps) and STM-16 (2.488 Gbps) rates, with no known theoretical limitation.

As resources permitted, THIN THREAD technology was documented overall at the system/sub-system level, with some documentation available on the program level. The technology was reviewed in concert with legal authorities and found to be unclassified, before key designers and developers departed the parent agency in late 2001. The company originally contracted by the government to develop THIN THREAD initiated the process of trademarking component names and pursuing intellectual property rights, after government legal authorities ruled that the company owned commercial patent rights to three of its four functional components. Under this ruling, the U.S. government may, at no cost, use the technology previously developed with public funds for the three components, but would have to pay royalties to exploit alterations, additions and upgrades to it. The designers, former government employees, recently formed a limited partnership, and, in cooperation with the company that initially developed THIN THREAD, are exploring additional government and commercial applications.

Cost

The development and initial deployment of the THIN THREAD prototype cost $2.5 million, including $1.5 million for developing a component to graph metadata relationships. (However, THIN THREAD also built upon technology used and paid for in two prior programs.) The great majority of funding for these systems was provided by Congress. THIN THREAD is extremely cost effective, with complete processing, selection, filtering and storage costs for an STM-1 currently running at $45,000 (including recent reductions in hardware prices). Higher volumes would be priced at relevant multiples of that speed/cost, employing an STM-1 configuration as the basic building block for high-volume traffic. Up to $30,000 more would be required for a dedicated metadata graphing server at the content storage site, while distributing among sites (rather than centralizing) the automated data correlation process. Very little space and money are required, relative to competitors. Although specific data on the full processing (and thus storage and correlation) requirement for parent agency modernization is not readily available, it is estimated informally that complete modernization of the parent agency could be accomplished for $60 million to $100 million, with fuller capabilities at the higher end.
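As a rough worked example of the per-link pricing above, the sketch below assumes strictly linear scaling (one STM-1 building block per STM-1 equivalent) and at most one graphing server per content storage site. Both assumptions are mine, as is the function name `site_cost`; the dollar figures are the ones quoted in the text.

```python
STM1_COST_USD = 45_000         # from the text: full processing for one STM-1
GRAPH_SERVER_MAX_USD = 30_000  # from the text: upper bound for a graphing server

# Standard SDH multiples: an STM-N link carries N STM-1 equivalents.
STM1_EQUIVALENTS = {"STM-1": 1, "STM-4": 4, "STM-16": 16}

def site_cost(link, with_graph_server=True):
    """Upper-bound hardware cost for one site, assuming linear scaling."""
    cost = STM1_EQUIVALENTS[link] * STM1_COST_USD
    if with_graph_server:
        cost += GRAPH_SERVER_MAX_USD
    return cost

for link in STM1_EQUIVALENTS:
    print(link, site_cost(link))
```

Under these assumptions an STM-16 site would come to at most $750,000, including the graphing server.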

Further Development

The current model is highly capable, e.g., able to deal with nearly 100% of the data recently accessible from the Internet. Analog data can be digitized and then processed through the system. Additional planned development would process more proprietary communications protocols transiting networks, as well as add more foreign languages, automate database indexing, and combine the graphing database portion of THIN THREAD with other programs to create a multi-source database. Thereafter, development would proceed to other domains and other currently less urgent requirements. Lower-level improvements to keep abreast of industry developments (e.g., new commercial attachments to documents) would be required regularly, as with most such systems. There exist many potential government and commercial uses for the system’s data management capabilities, some of which could require adaptations.

Overall System Description

THIN THREAD is a software-based system to produce and enable rapid discovery of useful and actionable knowledge from huge data volumes. It uses commercial hardware processors that regularly will become ever more powerful (new deployments would move from an INTEL-based Dell 1550 to a faster Dell 1650, at reduced overall cost), with software being easily and remotely upgradeable. Many of its software components are commercial products (the browser is Netscape Navigator and the remote search engine uses the freeware product WAIS), making it extremely inexpensive and quickly and easily upgradeable. Overall, 95% of system software is based on commercial technology. Rather than developing a unique “architecture,” developers made the system web-based, using LINUX software and employing ANSI C Library, Command Line, HTTP/XML, HTTP/HTML and COM/CORBA/DCE. COM/CORBA/DCE provides distributed API access for legacy systems. Use of the other standards allows the system to continue leveraging enormous commercial investment in the Internet and, more specifically and respectively, to: provide brute-force automation processing efficiently by using a local API for large-scale automated mapping; ease shell programming and testing; use a distributed API for manual mapping on small data sets and to allow distributed use of commercial tools, including easy support; and provide a simple interface that can be accessed from any browser.

The development philosophy was to begin with a thorough understanding of the holistic, end-to-end problem (thereby making the business case for changes), then attack the problem with a strong core capability that demonstrated a positive return on investment, and thereafter insert incremental improvements based upon analyst feedback and direction, through spiral development. The vision of ultimate system capabilities is described above and below. The architecture is inherently agile, enabling new implementations without disturbing the existing infrastructure. The approach is inclusive and holistic, providing the business enterprise a total, end-to-end view and solution, from input through output, including steps to advance the enterprise to its business objectives. This design was accomplished through considerable effort and built on many years of experience and prior work, using intense collaboration between substantive analytic experts, engineers, computer scientists and customers to ensure a system that optimizes use of human cognitive powers and automates as many processes as possible.

In the application implemented for the original prototype, material of interest is processed in sequence by sessionizing (putting all packets back together to form the original, complete transaction), automatically scoring the content as a means of selection, generating non-identifying metadata, finding and discarding or categorizing protected data as appropriate, and then generating additional, identifying metadata. Metadata is stored in either centralized or decentralized graphing server databases, while scored content resides for a time at the data access points in its original format, accessible by an analyst using a commercial web browser. These functions are performed by four THIN THREAD components, discussed in more detail below.

Component One – P25v2

The P25 version 2 (P25v2) processor is arguably the most important THIN THREAD breakthrough. It sessionizes (puts all packets back together to form the original complete transaction) all data and accomplishes follow-on processing of nearly 100% of this data. There is no known theoretical limit to its capability against high volumes. This capability is accomplished through a “division of labor” or “divide and conquer” attack that, e.g., can parcel out portions of an STM-1 transmission link for processing in a distributed fashion (processing of lower increments/speeds also is available). A 155-megabits-per-second STM-1 requires about two feet of rack space, in which commercial processors (see above) less than two inches high are stacked on top of each other, so very little floor space need be devoted to the system. Higher-speed lines incorporate multiples of STM-1s, so processors are added accordingly. Scalability is a fundamental strength of the approach. The equivalent of 4 and 16 STM-1s already can be processed completely and routinely.
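Sessionizing and the “divide and conquer” idea can be illustrated with a minimal sketch. It is not P25v2 code: the packet layout `(flow_id, seq, payload)` and the functions `sessionize` and `shard` are hypothetical, and real traffic would need protocol-aware reassembly.

```python
import zlib
from collections import defaultdict

def sessionize(packets):
    """Group packets by flow and rebuild each payload in sequence order.

    Each packet is a hypothetical (flow_id, seq, payload_bytes) tuple.
    """
    flows = defaultdict(list)
    for flow_id, seq, payload in packets:
        flows[flow_id].append((seq, payload))
    return {
        flow_id: b"".join(p for _, p in sorted(parts, key=lambda part: part[0]))
        for flow_id, parts in flows.items()
    }

def shard(flow_id, workers):
    """Pick a worker deterministically so one flow always lands on one processor."""
    return zlib.crc32(flow_id.encode()) % workers

packets = [("a", 2, b"lo"), ("b", 1, b"x"), ("a", 1, b"hel")]
sessions = sessionize(packets)
```

Sharding by flow rather than by packet keeps every session whole on a single processor, so adding processors adds capacity without splitting transactions.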

The distinction and importance of one hundred percent processing must be stressed. Other systems claiming the ability to handle high volume in fact only process a small sample of that volume, as little as 3%. Thus, they are forced to focus on items already known to be of interest and have little or no real capability to discover items or patterns not yet identified or found. Complete processing provides a high volume of traffic that, when fed to other components of THIN THREAD, can be filtered and correlated with previously received data and known entities of interest to produce new information and associations. The high volume feeds ever more accurate, automated traffic analysis (for example, analysis of transaction externals — such as names, Social Security, policy or order numbers, and subject lines, as well as other data such as timing and behavior characteristics). This nourishes a “self-learning system” that generates its own tasking based upon new associations created – through the application of analyst-defined hypotheses about relationships. Over time, this enables progressive reduction to an ever smaller and more accurate number of targets, markets, etc. It also permits growing refinement of traffic analysis templates useful, e.g., to generate targeted commercial sales or informative notices, or to produce manual or (eventually) automated reports providing indications and warning based on projected or ongoing activities (examples: changes in consumption patterns, terrorist preparation for attack).

P25 also logs continuous signal quality measurements. This supports performance analyses of reception and domain characteristics. Continuous signal quality measurements increase the probability of high bit integrity in the data collection process by setting off alarms when errors occur. This allows for both local and remote quality control on signal reception, thus preventing the occurrence of dropped packets and broken, incomplete sessions (i.e. data loss). Therefore, the scoring and retrieval process is performed on complete and accurately recorded traffic.

The original version of P25, P25v1, produced for a predecessor program, was very fast and highly capable for its time, but version 2 is far more capable. This is partly because the initial speed of version 1 was reduced after it was adapted for other systems, when a lot more code was added to implement other features. Various names have been given to the altered version 1.

P25v2 output is displayed using the commercial Netscape Navigator browser, although Microsoft’s Internet Explorer may be employed as well. It is important to note that the remaining three downstream components of THIN THREAD are able to accommodate the high volumes that can be processed by P25v2.

Component Two – P26

Another major THIN THREAD breakthrough is P26, an advanced selection process developed by U.S. government civilians, with intellectual property rights owned by the U.S. government. P26 is based on the integration of several commercial products and leverages analyst-defined scoring criteria to find targets and information of interest, at extremely high throughput rates (it is able to keep up with P25v2 processing). P26 extracts relevant information loosely grouped under “who” (entity or community of interest), “what” (topic of discussion) and “why” (the reason for interest, from transaction content and attachments). P26 is unique in its ability to select from the content of text attachments. A hypothetical illustration of selection criteria would be Dr. Wu Chang (who) at the Institute for Nuclear Studies (what) regarding nuclear inspections (why). Selectors can be altered according to changing analyst interest and knowledge level or as the subject of interest is refined. P26 also generates some metadata, such as the type of attachment being used and what terms are scored.

P26 scores resulting hits by evaluating them against analyst-entered selection criteria. Initially, only items hitting against all three columns (who, what, why), i.e. those scoring 100%, were retrieved. Modifications can occur to allow variable scores based on the respective values assigned to the “who, what, why” categories by the individual analyst, but further advances require funding and analyst assistance in the development program. The result must be retrieval of only the most relevant items, so as to avoid overwhelming limited analyst capability; this is necessary in order to deal effectively with high-volume data. The P26 scoring process also provides analysts with a means to collaborate on selection criteria (tacit knowledge) when engaged in target development activities. The objective is to enable analysts to test those hypotheses having the greatest potential for success (highest “hit” rates), especially on nascent items of interest.
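The who/what/why scoring can be sketched as follows, using the hypothetical selection criteria given earlier. Plain substring matching and the `score` function are assumptions for illustration; the text does not describe how P26 actually matches terms.

```python
def score(text, criteria, weights=None):
    """Return a 0-100 score for the who/what/why columns that the text hits."""
    weights = weights or {col: 1 for col in criteria}
    lowered = text.lower()
    hit = sum(
        weights[col]
        for col, terms in criteria.items()
        if any(term.lower() in lowered for term in terms)
    )
    return 100 * hit / sum(weights.values())

criteria = {
    "who": ["Wu Chang"],
    "what": ["Institute for Nuclear Studies"],
    "why": ["nuclear inspections"],
}

full = score(
    "Dr. Wu Chang of the Institute for Nuclear Studies discussed nuclear inspections.",
    criteria,
)
partial = score("Wu Chang sent a postcard.", criteria, {"who": 2, "what": 1, "why": 1})
```

With equal weights, only an item hitting all three columns reaches 100, which matches the initial retrieval rule. Analyst-assigned weights give the variable scores the text anticipates.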

P26 has previously unavailable multilingual scoring capability for all text and text attachments. Capabilities presently support criteria entered by the analyst in English terms and in one alternate language. For the alternate language, text segmentation (or word isolation within a string of text characters) is supported. At present, only one non-English language is incorporated, although others can be added as needed.

Lastly, P26 incorporates an easily used database supporting an auditing process for scoring (selector arguments) that identifies the responsible analyst or organization. This feature also keeps track of analyst activities (inputs, queries, searches, etc.), providing a record to verify adherence to guidelines or legal standards, evaluate analyst performance, and help ascertain the most successful analytic methods. This capability can be extended to capture and maintain analyst profiles to provide insights into the analysis process, and it can be leveraged as an important teaching tool for analysts.

Since P26 depends on analyst input regarding targets and selectors of interest, analyst cooperation and competence are an important factor in degree of success. Analyst training on system use and initial loading of selectors for established targets should take only 2 to 3 days. However, although it is far easier to use than prior systems and in its initial deployment area became more successful, the prototype implementation of P26 and THIN THREAD met some analyst resistance. The reasons for this appear to be:

objection to automated tracking of data pulls and other analyst activities, which was seen as a threat to career and an invasion of privacy;
management refusal to allow concentrated training time (10 minutes per week was permitted);
a priori internal political objections to and preconceptions about the system;
absence of good corporate engineering and fact-based decision-making processes for quickly leveraging innovation;
absence of key leadership (required in any change initiative) and a preference for the familiar, traditional way of doing things, under which analyst value is in large part based on facility with esoteric programming and processing functions (about half of analyst time currently spent on these);
analyst fear that P26 was too restrictive, not permitting them to “wild card” to find potentially valuable hidden data as much as under current selection methods, with which they are familiar and comfortable;
a management culture more concerned with preservation of existing interests than continuous improvement and adaptation to meet mission objectives; and
comfort with lower-level repetition and reporting of data.

It is worth dwelling on the last two of these factors, which cut to the very reason why the THIN THREAD system as a whole was developed — inability to deal with higher volumes, the increasingly distributed nature of network-based information sources, and consequent decline in the value of analytic product. The growing dysfunction under legacy approaches highlighted the need to expand the universe of source data and automate as many analytic steps as possible: in short, to change analytic practices. To do so requires strong leadership, integrity and discipline in planning and decision-making practices, and a revival of traditional traffic analysis as a core function. The entire analytic effort must move to more of a “sense and respond” model, wherein continuous feedback through research, innovation and metrics analysis allows rapid improvements to move forward with clarity and purpose over time. Analysts need to rise to a higher, all-seeing level of target assessment, to be able to assess the target intent and capability, and project what the entity of interest is going to do before the intended action takes place. The innovations embodied in THIN THREAD are technology breakthroughs in selection methods capable against much higher volumes of fully processed data in native, intended format. These innovations are meant to free up analyst time for higher-level analysis on a continuing basis. Among those unfamiliar and uncomfortable with the more intensive research and thought required, or disquieted over the greater risks and uncertainties of such analysis, a backlash can develop. However, as pointed out above, development of the system was motivated by the need for cultural change in order to achieve enterprise mission objectives and to make timely decisions based on the best projections possible.

Because a central problem has been the inability to select data of interest even at low volumes, but especially at high volumes/speeds, failure to use THIN THREAD types of capabilities, including P26 and graphing capabilities, widely and holistically will mean overall failure. The P26 selection system may be used only on processed/sessionized data, so only full processing will allow the system to be used truly effectively and at its full potential. Another government-sponsored selection system, which has been under development for about three years with a staff ten times larger, contemplates using an approach with logic generically similar to that of THIN THREAD, plus additional statistical procedures; it is not now deployable, and would have to be refined for two years after deployment to ascertain value.

Component Three – GAMBIT-LITE

GAMBIT-LITE, the third THIN THREAD component, performs automated filtering of protected entities at throughput rates consistent with P25v2 and P26 processing/scoring capabilities. It currently generates metadata on approximately 13 data items, collects metadata generated by P26 and forwards all the metadata to a central storage system for statistics generation and for traditional traffic analysis through graphing. GAMBIT-LITE also provides distributed local site storage of scored content that is automatically aged off (dumped/eliminated) at variable rates dependent upon scoring results. Content data that does not score is not retained at present, but under plans for the future, it would be kept as long as extra storage space is available. Local data storage capability is .5 terabyte, and after one year of prototype operation, only a small percentage had been consumed. Metadata is retained indefinitely, currently at a central repository. Data is discarded to minimize storage costs and to maximize efficient use of the IT infrastructure, focusing IT services on the most relevant information; it is a policy decision based on financial and cost-effectiveness criteria, not a technology limitation. Finally, GAMBIT-LITE supports remote content display using the commercial Netscape Navigator (Microsoft’s Internet Explorer can also be used) and the shareware WAIS search engine.
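Score-dependent age-off can be pictured with the sketch below. The retention tiers and function names are invented for illustration; the text says only that rates vary with scoring results and that unscored content is not retained at present.

```python
from datetime import datetime, timedelta

# Hypothetical retention tiers: (minimum score, days to keep).
RETENTION_TIERS = [(100, 365), (50, 90), (1, 30)]

def retention_days(score):
    """Map a content score to a retention period in days."""
    for min_score, days in RETENTION_TIERS:
        if score >= min_score:
            return days
    return 0  # content that does not score is not retained

def age_off(items, now):
    """Keep only items whose retention window has not yet expired."""
    return [
        item for item in items
        if now - item["stored"] < timedelta(days=retention_days(item["score"]))
    ]

now = datetime(2002, 5, 1)
items = [
    {"id": "a", "score": 100, "stored": datetime(2001, 6, 1)},
    {"id": "b", "score": 50, "stored": datetime(2002, 1, 1)},
    {"id": "c", "score": 0, "stored": now},
]
kept = age_off(items, now)
```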

GAMBIT-LITE automatically filters out (eliminates/erases) entities protected under its initial application by searching for them in metadata and scored content. Under the prototype application, the data and metadata are automatically discarded or retained according to guidelines for four categories of material, in a reliable manner. If protected data is retained because content may be of legitimate interest, identifying information can be encrypted in order to protect privacy while pursuing legitimate information.
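One way to picture masking of identifying attributes is the sketch below. Note the substitution: it uses keyed hashing (HMAC-SHA256) rather than the reversible encryption the text mentions, because that can be shown with the standard library alone. The field names, key and functions are hypothetical.

```python
import hashlib
import hmac

def mask(identifier, key):
    """Replace an identifier with a stable keyed token (not reversible)."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()[:16]

def mask_record(record, key, fields=("sender", "recipient")):
    """Mask only the identifying fields; leave other metadata untouched."""
    return {k: mask(v, key) if k in fields else v for k, v in record.items()}

key = b"hypothetical-site-key"
record = {"sender": "alice@example.com", "recipient": "bob@example.com", "size": 512}
masked = mask_record(record, key)
```

Because the same identifier always maps to the same token, relationships can still be correlated without exposing who the parties are.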

In contrast, current systems “filter and select” by rejecting or throwing out (filtering) well over 90% of data early on, then choosing or selecting items of potential interest from a far smaller pool. Within THIN THREAD, the only filtering, or elimination, of data is that accomplished by GAMBIT-LITE for protection of privacy and adherence to legal standards. This limitation of filtering to a “minimization” or protection function, combined with full processing under P25v2, allows maximum use of all unprotected data. Thus, as discussed below, the various parts of THIN THREAD, acting together, permit search for items, patterns and trends of which the analyst is not currently aware, and allow the use of processed high-volume material to build and refine target templates and traffic analysis approaches, eventually permitting generation of automated cues, warnings and (possibly, with additional development) reports.

The GAMBIT predecessor to GAMBIT-LITE has been available for years, and was originally intended to work with a subsequently cancelled system. However, its deployment was blocked through imposition of various legal hurdles and criteria to which far inferior systems were not subjected. In 1998, a painstaking study of GAMBIT was required, using time-consuming manual verification of a high-volume filtering test; the study concluded that the system was 94% accurate for the test case; of the remaining 6%, 4% were protected unnecessarily and only 2% were left without proper protections. However, although the conditions for the test case were established by the authority imposing the requirement, this authority then opined that perhaps the test case was flawed in conception, and in any case that this success rate might be insufficient against extremely high volumes, since even a very small failure rate could still allow a cumulatively high number of protected items to slip through the filter. (As under guidelines for existing systems, if any such unfiltered, protected data were accidentally accessed by an analyst, it would be discarded or the identity protected, depending upon the category into which it fell.) The needed success rate was never established, and an overt policy decision apparently was not made.
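The concern about cumulative numbers can be made concrete with simple arithmetic. The sketch assumes the 1998 test-case percentages carry over unchanged to other volumes, which the text does not establish; the function name and the item count are illustrative.

```python
# Percentages reported for the 1998 GAMBIT test case.
ACCURATE_PCT = 94
OVER_PROTECTED_PCT = 4   # protected unnecessarily
UNDER_PROTECTED_PCT = 2  # left without proper protections

def expected_counts(items):
    """Project the test-case percentages onto a given number of items."""
    return {
        "handled_correctly": items * ACCURATE_PCT // 100,
        "over_protected": items * OVER_PROTECTED_PCT // 100,
        "under_protected": items * UNDER_PROTECTED_PCT // 100,
    }

projection = expected_counts(1_000_000)
```

At one million filtered items, a 2% miss rate would leave 20,000 items without proper protection, which is the scale effect the reviewing authority raised.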

However, it was subsequently mandated that additional filtering and selection techniques be layered onto the system. Since existing techniques were inadequate, this gave rise to the effort to improve selection by developing P26. When THIN THREAD development began in July, 2000, it included an altered GAMBIT-LITE, as well as P26. The GAMBIT-LITE alteration to GAMBIT involved reducing retained metadata items from nearly 200 to about 13 generally associated with classic traffic analysis support, mainly because it was found that, in practice, analysts seldom used the other metadata, but secondarily to maximize use of existing storage space and minimize storage costs, and to avoid unnecessarily slowing system performance. Alternative development efforts still focus on nearly 200 metadata items. Metadata elements could be added back to GAMBIT-LITE as justified. After recent questioning from Congress, it was decided that the GAMBIT-LITE system could be deployed, but at an unspecified future date following compilation of operational guidelines.

In the meantime, legal rights could have been far better protected over the past three years if GAMBIT-LITE had been used. The GAMBIT-LITE test demonstrated a success rate superior, by multiples, to existing filtering technology and manual procedures designed to ensure legal protections. Even had there been doubts about some high-volume processing applications, the GAMBIT system could have been used for far more capable filtering of low-volume [e.g., 2 to 8 Mbps] data and for high-volume areas with relatively few protected entities. This history also betrays a reluctance to face the legal, policy and technology issues accompanying modern operations, a typical avoidance/reactive, rather than proactive, approach, as well as a lack of institutional acknowledgement of the reality that high-volume processing is necessary for success.

Component Four – GRAPHS

The fourth component of the THIN THREAD system is graphing. This B+ Tree Index flat file database provides an analytical system/aid that automatically chains and maps relationships or transactions among the community of interest, arrays events of interest according to timelines in which they occurred, and visually displays contact events for such transactions. It also provides a matrixed method for organizing data from multiple sources about a particular entity of interest.

Graphing maps all metadata and selected content inputs to reconstruct relationships among entities of interest by performing automated, continuous relationship chaining. To keep content mining manageable, chaining currently starts only from items of known interest. The continuous and cumulative tracking of transactions, versus discrete “snapshots,” and the time lining of these contacts, distinguish this relationship chaining from all other known systems and approaches. Over time, these capabilities permit continual improvement of information on relationships and the production of ever more refined templates for communities of specific interest to the analyst and for traffic analysis. The graphing mantra became “volume is our friend,” not our enemy, as with other approaches. It is this set of unique capabilities that enables the rapid discovery of patterns and inferences that, in turn, enable automation of parts of the analytic process. Used with the rest of THIN THREAD, graphing can map relationships in all media even if they are currently processed by differing systems, thereby breaking down legacy “stovepipes.” This allows greater knowledge creation by making relevant data and patterns available to human analysts.
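The general idea of cumulative contact chaining with time lining can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the THIN THREAD implementation: it uses in-memory dictionaries rather than a B+ tree indexed flat file, and the class name, event format and entity names are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical contact events: (timestamp, entity_a, entity_b).
events = [
    (3, "seed", "x"),
    (1, "x", "y"),
    (2, "y", "z"),
    (5, "seed", "x"),
    (4, "p", "q"),  # unrelated to the seed; never reached by chaining
]


class ContactGraph:
    def __init__(self):
        self.neighbors = defaultdict(set)   # entity -> entities it contacted
        self.timelines = defaultdict(list)  # unordered pair -> timestamps

    def add_event(self, timestamp, a, b):
        """Fold one transaction into the cumulative graph."""
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)
        self.timelines[frozenset((a, b))].append(timestamp)

    def chain(self, seed, max_hops):
        """Breadth-first chaining outward from an item of known interest."""
        hops = {seed: 0}
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            if hops[current] == max_hops:
                continue  # do not expand beyond the hop limit
            for other in self.neighbors.get(current, ()):
                if other not in hops:
                    hops[other] = hops[current] + 1
                    queue.append(other)
        return hops

    def timeline(self, a, b):
        """Contact times between two entities, in chronological order."""
        return sorted(self.timelines.get(frozenset((a, b)), []))


graph = ContactGraph()
for event in events:
    graph.add_event(*event)

reach = graph.chain("seed", max_hops=2)
seed_x_contacts = graph.timeline("seed", "x")
```

Because events are folded into the graph as they arrive, each new transaction enriches the existing picture instead of replacing a snapshot. Starting the traversal from a seed keeps unrelated entities, such as `p` and `q` here, out of the result.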

Graphing can organize and correlate virtually limitless amounts of metadata and of “selected” data by linking it to items of interest via matrices. The combination of standard processing and formatting under P25v2, P26 data selection, and storage of data from diverse sources within graphs according to matrixed categories and relationships, is powerful. It permits a breakdown of the “stovepipes” and reveals previously unknown information residing in diverse databases, removing conditions that inhibit data sharing. Fused information allows comprehensive and efficient analysis. Graphing has many potential applications wherever analysis of large amounts of data is needed, both for most government agencies and for multitudinous commercial uses. However, its potential to correlate massive amounts of information requires careful and sensitive use according to strict guidelines, in order to avoid intrusive “big brother” types of abuses.

To achieve increased efficiency and better use of scarce and expensive analytic talent, the ultimate goal should be the automation of as much data input, cueing (through graphing and P26), correlation, reporting and tasking as possible. In this way, data and information can be produced in a more timely manner, and human analysts are freed for more focused research and for in-depth reporting. A conservative estimate is that it should be possible to improve analyst productivity by five to six orders of magnitude from today’s baseline – some believe by 10 to even 20 orders of magnitude, assuming receptive analysts and management support. What is needed, therefore, is to grow from automation of the initial step of contact or transaction chaining to full traffic analysis, including automated reporting on changes or tip-offs in patterns of activities. While analysts report mainly on content, rules applied to graphs could report on classic traffic analysis externals contained in the metadata.
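As an illustration of what a rule applied to graph metadata might look like, the sketch below compares contact counts between a baseline period and a current period and reports new contacts and sharp increases. The rule, the threshold, the function names and the event format are all assumptions made for this example; the source does not specify any particular rule.

```python
from collections import Counter


def pair_counts(events):
    """Count contacts per unordered entity pair; events are (timestamp, a, b)."""
    return Counter(frozenset((a, b)) for _, a, b in events)


def pattern_changes(baseline, current, spike_factor=3):
    """Report pairs that are new, or whose count rose by spike_factor or more."""
    before, now = pair_counts(baseline), pair_counts(current)
    reports = []
    for pair, count in now.items():
        label = " <-> ".join(sorted(pair))
        if pair not in before:
            reports.append(f"NEW CONTACT: {label} ({count} events)")
        elif count >= spike_factor * before[pair]:
            reports.append(f"SPIKE: {label} ({before[pair]} -> {count} events)")
    return sorted(reports)


# Hypothetical metadata-only events; no content is inspected.
baseline = [(1, "a", "b"), (2, "b", "c")]
current = [(10, "a", "b"), (11, "b", "a"), (12, "a", "b"), (13, "c", "d")]

reports = pattern_changes(baseline, current)
```

The rule works only on externals (who contacted whom, and how often), which is the kind of traffic analysis reporting that could run without analyst involvement until something changes.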

The system already has progressed from contact chaining to automation of continuous contact chaining, contact time lining, relationship mapping, data indexing, correlation and retrieval, and other information management functions. Further refinement of entity profiles could also be automated; in this respect, the best practices of commercial firms (ISPs, supermarkets, internet marketers) could be adopted. After finding and refining entity profiles, there should be a progression to the forming and encoding of business rules, in order to coordinate various processes from end to end and to improve mission management — e.g., automating as much as possible of tasking, data collection and routing, processing, analyst cueing and issuance of event and warning reports.

Automation of warning reports might allow sufficient time for company or government officials to take action to avert undesirable developments. Further, the system is designed to generate metrics so that each function can be monitored and measured, providing a basis for near-real-time systems management and decision-making for the enterprise.

Conclusion

THIN THREAD integrates mostly commercial technology in a unique, inexpensive, and revolutionary approach to process, assess and manage transactions captured in formal information systems. It effectively exploits massive volumes of currently stovepiped data flows to allow fact-based enterprise or inter-enterprise analysis, projections and anticipatory decisions. THIN THREAD may also be used to automate the management of enterprise or inter-enterprise processes, and it provides a way to automatically and continuously monitor return on investment for all parts of the enterprise.