`
木木三个竖
  • 浏览: 11444 次
  • 性别: Icon_minigender_1
  • 来自: 沈阳
社区版块
存档分类
最新评论

从html网页找email地址的蜘蛛程序(马士兵尚学堂)

阅读更多
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class EmailSpider {

public static void main(String[] args) {
try {
BufferedReader br = new BufferedReader(new FileReader("D:\\share\\courseware\\1043633.html"));
String line = "";
while((line=br.readLine()) != null) {
parse(line);
}
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}

private static void parse(String line) {
Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
Matcher m = p.matcher(line);
while(m.find()) {
System.out.println(m.group());
}
}

}
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics