
This blog was created for the education of engineering students. All engineering students can get free downloadable books here, and a range of software is available as well.

MINING THE WEB DISCOVERING KNOWLEDGE FROM HYPERTEXT DATA

by Soumen Chakrabarti

 

This book is about finding significant statistical patterns relating hypertext documents, topics, hyperlinks, and queries and using these patterns to connect users to information they seek. The Web has become a vast storehouse of knowledge, built in a decentralized yet collaborative manner. It is a living, growing, populist, and participatory medium of expression with no central editorship. This has positive and negative implications. On the positive side, there is widespread participation in authoring content. Compared to print or broadcast media, the ratio of content creators to the audience is more equitable. On the negative side, the heterogeneity and lack of structure make it hard to frame queries and satisfy information needs.

For many queries posed with the help of words and phrases, there are thousands of apparently relevant responses, but on closer inspection these turn out to be disappointing for all but the simplest queries. Queries involving nouns and noun phrases, where the information need is to find out about the named entity, are the simplest sort of information-hunting tasks. Only sophisticated users succeed with more complex queries, for instance those that involve articles and prepositions to relate named objects, actions, and agents. If you are a regular seeker and user of Web information, this state of affairs needs no further description.

Detecting and exploiting statistical dependencies between terms, Web pages, and hyperlinks will be the central theme in this book. Such dependencies are also called patterns, and the act of searching for such patterns is called machine learning, or data mining. Here are some examples of machine learning for Web applications. Given a crawl of a substantial portion of the Web, we may be interested in constructing a topic directory like Yahoo!, perhaps detecting the emergence and decline of prominent topics with passing time.

Once a topic directory is available, we may wish to assign freshly crawled pages and sites to suitable positions in the directory.
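
To make the idea of assigning a new page to a directory topic concrete, here is a minimal sketch in Python. It assumes a multinomial naive Bayes classifier with add-one smoothing, which is a common baseline for text classification and not necessarily the method the book uses. The tiny `directory` of labeled pages, the topic names, and the helper functions `tokenize`, `train`, and `classify` are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_pages):
    """Collect per-topic word counts from (topic, text) pairs."""
    word_counts = defaultdict(Counter)  # topic -> word -> count
    page_counts = Counter()             # topic -> number of training pages
    vocabulary = set()
    for topic, text in labeled_pages:
        words = tokenize(text)
        word_counts[topic].update(words)
        page_counts[topic] += 1
        vocabulary.update(words)
    return word_counts, page_counts, vocabulary


def classify(text, model):
    """Return the topic with the highest log-posterior score."""
    word_counts, page_counts, vocabulary = model
    total_pages = sum(page_counts.values())
    best_topic, best_score = None, float("-inf")
    for topic in page_counts:
        # Start from the log prior: the fraction of training pages in this topic.
        score = math.log(page_counts[topic] / total_pages)
        total_words = sum(word_counts[topic].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            smoothed = (word_counts[topic][word] + 1) / (total_words + len(vocabulary))
            score += math.log(smoothed)
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Hypothetical, hand-made "topic directory" used only for this example.
directory = [
    ("sports", "football match score goal league"),
    ("sports", "tennis match tournament player"),
    ("computing", "python code compiler software"),
    ("computing", "software bug code release"),
]
model = train(directory)
print(classify("new compiler release fixes software bug", model))  # prints "computing"
```

A real system would train on thousands of pages per topic and would likely use hyperlinks and markup as well as words, but the basic step is the same: count how terms co-occur with topics, then score a fresh page against each topic.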

In this book, the data that we will “mine” will be very rich, comprising text, hypertext markup, hyperlinks, sites, and topic directories. This distinguishes the area of Web mining as a new and exciting field, although it also borrows liberally from traditional data analysis. As we shall see, useful information on the Web is accompanied by incredible levels of noise, but thankfully, the law of large numbers kicks in often enough that statistical analysis can make sense of the confusion.

This book is available HERE
Don't forget to say thanks
