This paper presents a metadata extraction technique for email documents. Emails are characterized by keywords that are extracted from the body of the mail using frequency, average similarity and term discrimination value measures. The email metadata is defined as a document type definition (DTD) in extensible markup language (XML) that captures the structure of the email as well as its content-characterizing keywords and their attributes (weights). A Perl application has been designed and implemented to extract the keywords with their attributes (weights) and generate an XML document containing the email metadata. Practical applications of the metadata extraction technique are also discussed briefly.
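The keyword-weighting and XML-generation steps can be illustrated with a small sketch. It is written in Python rather than the paper's Perl and is not the authors' implementation. It assumes a hypothetical stop-word list, raw term counts as the frequency measure, and Salton's centroid-based form of term discrimination value: the average similarity (density) Q is the mean cosine similarity of each email to the collection centroid, and DV_k = Q_k - Q, where Q_k is the density recomputed with term k removed. The function names and the element and attribute names (`email`, `keywords`, `keyword`, `freq`, `dv`) are hypothetical, because the paper's DTD is not reproduced here.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stop-word list; the paper does not specify one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
              "on", "with", "at", "by", "this", "that", "it", "be", "are"}


def tokenize(text):
    """Lower-case alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def density(vectors, excluded=None):
    """Average similarity Q of each email to the collection centroid,
    optionally with one term dropped from every vector (assumes len(vectors) > 0)."""
    if excluded is not None:
        vectors = [{t: c for t, c in v.items() if t != excluded} for v in vectors]
    total = Counter()
    for v in vectors:
        total.update(v)  # sums term weights across the collection
    n = len(vectors)
    centroid = {t: c / n for t, c in total.items()}
    return sum(cosine(centroid, v) for v in vectors) / n


def discrimination_values(vectors):
    """DV_k = Q_k - Q; positive values mark terms that spread emails apart."""
    base = density(vectors)
    vocab = set().union(*vectors)
    return {t: density(vectors, excluded=t) - base for t in vocab}


def email_metadata_xml(body, collection_bodies, top_k=5):
    """Emit hypothetical XML metadata for one email; `body` should also
    appear in `collection_bodies` so that its terms receive a DV."""
    vectors = [Counter(tokenize(b)) for b in collection_bodies]
    dv = discrimination_values(vectors)
    freq = Counter(tokenize(body))
    root = ET.Element("email")
    keywords = ET.SubElement(root, "keywords")
    for term, tf in freq.most_common(top_k):
        kw = ET.SubElement(keywords, "keyword", freq=str(tf),
                           dv=f"{dv.get(term, 0.0):.4f}")
        kw.text = term
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    emails = [
        "Meeting moved to Friday. Please confirm the meeting agenda.",
        "Invoice attached. Payment due Friday.",
        "Agenda for project meeting and budget review.",
    ]
    print(email_metadata_xml(emails[0], emails, top_k=3))
```

Under this formulation a term shared by every email raises their mutual similarity, so removing it lowers Q and gives it a negative DV, while a term confined to one email receives a positive DV. Keywords are selected here by frequency alone and the DV is carried as a separate attribute; how the paper combines the three measures is not stated in this abstract.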