Using weight-controlled token matching to extract data from HTML files | IEEE Conference Publication | IEEE Xplore