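When crawling with Java, multithreading is hard to avoid. We need to crawl many sites at the same time, and using a single thread per site is simply too slow. A lone thread also spends most of its time waiting on the network while the CPU sits idle. Once you use multiple threads, you also need a way to control them, which usually means a thread pool. When I built the crawling framework we use now, I started with java.util.concurrent.ExecutorService as the thread pool. Using it looks roughly like this: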
The java.util.concurrent.Executors class provides many static factory methods for creating thread pools:
1. Fixed-size thread pool:
package BackStage;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class JavaThreadPool {
    public static void main(String[] args) throws InterruptedException {
        // Create a thread pool that reuses a fixed number of threads (2)
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Tasks only need to implement Runnable. Passing Thread objects also works,
        // since Thread implements Runnable, but the pool never starts them as threads:
        // it just calls their run() on its own worker threads.
        for (int i = 0; i < 5; i++) {
            // Submit the task to the pool for execution
            pool.execute(new MyTask());
        }
        // Stop accepting new tasks; tasks already submitted still run
        pool.shutdown();
        // Wait for the submitted tasks to finish
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}

class MyTask implements Runnable {
    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + " is running...");
    }
}
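Besides newFixedThreadPool, Executors offers other factory methods. The sketch below is only an illustration (the class name OtherPools and the trivial task are made up): it creates a cached pool, a single-thread executor and a scheduled pool, runs one task on each, and counts how many tasks completed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OtherPools {
    // Runs one trivial task on each kind of pool; returns how many tasks completed
    static int runOnEachPool() throws InterruptedException {
        // Creates threads on demand and reuses idle ones; threads idle for 60 s are removed
        ExecutorService cached = Executors.newCachedThreadPool();
        // One worker thread: tasks run one at a time, in submission order
        ExecutorService single = Executors.newSingleThreadExecutor();
        // Fixed number of threads that can run tasks after a delay or periodically
        ScheduledExecutorService scheduled = Executors.newScheduledThreadPool(2);

        AtomicInteger done = new AtomicInteger();
        Runnable task = () -> {
            System.out.println(Thread.currentThread().getName() + " is running...");
            done.incrementAndGet();
        };
        cached.execute(task);
        single.execute(task);
        // Run the task once, 100 ms from now
        scheduled.schedule(task, 100, TimeUnit.MILLISECONDS);

        // shutdown() stops new submissions; queued tasks (and, by default,
        // already-scheduled delayed tasks) still run before the pool terminates
        for (ExecutorService pool : new ExecutorService[] {cached, single, scheduled}) {
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnEachPool() + " tasks finished");
    }
}
```

All three pools share the same shutdown/awaitTermination pattern, so switching pool types only changes the factory call.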
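Later I found that ExecutorService was less useful than I had expected: for my purposes it was little more than a container for threads. So I switched to java.lang.ThreadGroup. ThreadGroup has several advantages, the most important being that you can enumerate its live threads and so tell which threads are still running and which have already finished. Here is how ThreadGroup is used: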
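class MyThread extends Thread {
    // Set by myStop(); guarded by this object's lock
    boolean stopped;

    MyThread(ThreadGroup tg, String name) {
        super(tg, name); // register this thread in the given group
        stopped = false;
    }

    @Override
    public void run() {
        System.out.println(Thread.currentThread().getName() + " starting.");
        try {
            for (int i = 1; i < 1000; i++) {
                System.out.print(".");
                Thread.sleep(250);
                // Check the stop flag under the same lock that myStop() uses
                synchronized (this) {
                    if (stopped)
                        break;
                }
            }
        } catch (InterruptedException exc) {
            // Thrown by sleep() when ThreadGroup.interrupt() reaches this thread
            System.out.println(Thread.currentThread().getName() + " interrupted.");
        }
        System.out.println(Thread.currentThread().getName() + " exiting.");
    }

    // Ask the thread to stop the next time it checks the flag
    synchronized void myStop() {
        stopped = true;
    }
}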
public class Main {
    public static void main(String[] args) throws Exception {
        ThreadGroup tg = new ThreadGroup("My Group");
        MyThread thrd = new MyThread(tg, "MyThread #1");
        MyThread thrd2 = new MyThread(tg, "MyThread #2");
        MyThread thrd3 = new MyThread(tg, "MyThread #3");
        thrd.start();
        thrd2.start();
        thrd3.start();
        Thread.sleep(1000);
        System.out.println(tg.activeCount() + " threads in thread group.");
        // activeCount() is only an estimate, so loop up to the count that
        // enumerate() actually returns (unused array slots stay null)
        Thread[] thrds = new Thread[tg.activeCount()];
        int n = tg.enumerate(thrds);
        for (int i = 0; i < n; i++)
            System.out.println(thrds[i].getName());
        // Stop one thread, then check how many are still alive
        thrd.myStop();
        Thread.sleep(1000);
        System.out.println(tg.activeCount() + " threads in tg.");
        // Interrupt every thread still in the group
        tg.interrupt();
    }
}
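The code above shows the following advantages ThreadGroup has over ExecutorService:
1. A ThreadGroup can enumerate its threads, so you can see which are still running and which have finished.
2. ThreadGroup.activeCount() tells you (as an estimate, per its Javadoc) how many threads are alive, so you can control how many new threads to add.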
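Point 2 can be sketched roughly as follows. This is a hypothetical example, not code from our framework: the class CrawlerLauncher, the MAX_ACTIVE limit and the site names are made up, and each "crawl" just sleeps. A new thread is started only while tg.activeCount() reports fewer than MAX_ACTIVE live threads. Because activeCount() is only an estimate, treat the limit as a soft one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlerLauncher {
    // Hypothetical cap on simultaneously running crawler threads
    static final int MAX_ACTIVE = 3;

    // Starts one thread per site, but only while the group has fewer than
    // MAX_ACTIVE live threads; returns how many crawls completed.
    static int crawlAll(List<String> sites) throws InterruptedException {
        ThreadGroup tg = new ThreadGroup("crawlers");
        AtomicInteger finished = new AtomicInteger();
        List<Thread> started = new ArrayList<>();
        for (String site : sites) {
            // activeCount() is an estimate, so MAX_ACTIVE is a soft limit
            while (tg.activeCount() >= MAX_ACTIVE) {
                Thread.sleep(10); // wait for a crawler to finish instead of spinning
            }
            Thread t = new Thread(tg, () -> {
                try {
                    Thread.sleep(50); // stand-in for fetching and parsing a page
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "crawler-" + site);
            t.start();
            started.add(t);
        }
        // Wait for every crawler, including those still running
        for (Thread t : started) {
            t.join();
        }
        return finished.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sites = Arrays.asList("site-a", "site-b", "site-c", "site-d", "site-e");
        System.out.println(crawlAll(sites) + " of " + sites.size() + " sites crawled");
    }
}
```

The launcher keeps its own list of started threads and joins them at the end, because a thread that has finished simply disappears from the group and can no longer be enumerated.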