Information on any given topic is often scattered across the Web. Previously, this scatter has been characterized by how unequally facts (i.e., pieces of information) are distributed across webpages. Such an approach conceals how specific facts (e.g., rare facts) occur in specific types of pages (e.g., fact-rich pages). To reveal such regularities, we construct bipartite networks consisting of two types of vertices: the facts contained in webpages and the webpages themselves. This representation enables the application of a series of network analysis techniques, revealing structural features such as connectivity, robustness, and clustering. Not only does network analysis yield new insights into information scatter, but we also illustrate the benefit of applying new and existing analysis techniques directly to a bipartite network rather than to its one-mode projection. We discuss the implications of each network feature for users' ability to find comprehensive information online. Finally, we compare the bipartite graph structure of webpages and facts with the hyperlink structure between the webpages.
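As an illustration of the representation described above, the following is a minimal sketch of a page-fact bipartite network and its one-mode projection onto pages. The toy data and the names `page_facts`, `fact_degrees`, and `page_projection` are hypothetical and are not taken from the paper; the projection weighting (number of shared facts) is one common choice, assumed here for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: the set of facts each webpage contains.
# Each (page, fact) pair is one edge of the bipartite network.
page_facts = {
    "page_a": {"f1", "f2", "f3"},  # a fact-rich page
    "page_b": {"f1", "f2"},
    "page_c": {"f1"},
}

def fact_degrees(page_facts):
    """Degree of each fact vertex: the number of pages that contain it."""
    degrees = defaultdict(int)
    for facts in page_facts.values():
        for fact in facts:
            degrees[fact] += 1
    return dict(degrees)

def page_projection(page_facts):
    """One-mode projection onto pages.

    Two pages are linked if they share at least one fact; the edge
    weight is the number of facts they share (an assumed weighting).
    """
    weights = {}
    for p, q in combinations(sorted(page_facts), 2):
        shared = len(page_facts[p] & page_facts[q])
        if shared:
            weights[(p, q)] = shared
    return weights

degrees = fact_degrees(page_facts)
projection = page_projection(page_facts)
```

In this toy example the projection records only how many facts two pages share, not which ones. That the rare fact `f3` appears solely in the fact-rich `page_a` can be read from the bipartite form but not from the projected page network.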
ASJC Scopus subject areas
- Physics and Astronomy(all)