For a university study, I am collecting various data with a Java program that scrapes the web.
The program accesses the data through several forward proxy servers (Apache).
To keep requests from being concentrated on a single proxy server,
I wrote the following code, which switches proxy servers in round-robin order.
The output I print to the command line with System.out.println suggests that the proxy servers are used in round-robin order,
but the proxy server logs show that the requests are concentrated on one of them.
(At one request every 20 seconds the requests seem to be distributed; at intervals of about 5 seconds they seem to be concentrated.)
Is it impossible, with HttpURLConnection or the Proxy class, to switch the proxy server on every request?
If you have any information, I would appreciate your advice.
Thank you in advance.
import java.io.DataInputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Method of the scraper class (which also provides getCookie()).
// Fetches page_url through the HTTP proxy proxy_url on port 80.
private DataInputStream dataAccess(String proxy_url, String page_url) throws Exception {
    URL url = new URL(page_url); // web page to fetch
    String proxy_port = "80";
    Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxy_url, Integer.parseInt(proxy_port)));
    HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
    connection.setAllowUserInteraction(false);
    connection.setInstanceFollowRedirects(true);
    connection.setRequestMethod("GET");
    connection.addRequestProperty("Cookie", this.getCookie());
    connection.connect();
    System.out.println(proxy_url + ":" + page_url);
    int httpStatusCode = connection.getResponseCode();
    if (httpStatusCode != HttpURLConnection.HTTP_OK) {
        System.err.println("File Not Found:" + page_url);
        throw new Exception("HTTP " + httpStatusCode + ": " + page_url);
    }
    // Return the retrieved data as a DataInputStream
    return new DataInputStream(connection.getInputStream());
}
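The code above does not show how proxy_url is chosen for each call. One alternative to passing a Proxy into every openConnection(proxy) call is to register a java.net.ProxySelector, which HttpURLConnection consults when a connection opened with the no-argument openConnection() connects. The following is a minimal sketch under assumptions: the class name RoundRobinProxySelector, the host list, and the port are hypothetical, and it only shows the rotation logic, not a fix for the log behavior described above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical selector: hands out the configured HTTP proxies in round-robin
// order, one proxy per call to select().
public class RoundRobinProxySelector extends ProxySelector {
    private final List<Proxy> proxies = new ArrayList<>();
    private final AtomicInteger next = new AtomicInteger(); // counter shared safely across threads

    public RoundRobinProxySelector(List<String> hosts, int port) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy host is required");
        }
        for (String host : hosts) {
            proxies.add(new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port)));
        }
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        String scheme = uri.getScheme();
        // Only web requests rotate; anything else (e.g. "socket://" URIs from
        // plain sockets) goes direct and does not use up a turn.
        if (!"http".equalsIgnoreCase(scheme) && !"https".equalsIgnoreCase(scheme)) {
            return Collections.singletonList(Proxy.NO_PROXY);
        }
        // floorMod keeps the index non-negative even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), proxies.size());
        return Collections.singletonList(proxies.get(i));
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // Called by the JDK when connecting through the chosen proxy fails
        System.err.println("Proxy " + sa + " failed for " + uri + ": " + ioe);
    }
}
```

Hypothetical usage: call ProxySelector.setDefault(new RoundRobinProxySelector(Arrays.asList("192.168.1.1", "192.168.1.2", "192.168.1.3"), 80)) once at startup, then open each page with url.openConnection() and no Proxy argument. Because the default selector applies to the whole JVM, the sketch sends non-HTTP(S) URIs direct so they do not consume a round-robin slot.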
I experimented with Proxy using the following code, and each connection went to the specified proxy host as expected.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class Sample {
    // Two rounds through the three proxy hosts, in round-robin order
    private static final List<Proxy> proxies = Arrays.asList(
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.1", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.2", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.3", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.1", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.2", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.3", 8080)));

    public static void main(String... args) {
        try {
            URL url = new URL("https://www.example.com");
            for (Proxy proxy : proxies) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
                try {
                    /* Just connect through the proxy host.
                     * (No GET request is sent to the destination URL.)
                     */
                    System.out.println("Proxy=" + proxy);
                    conn.connect();
                    System.out.println("done.");
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    conn.disconnect();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
(The proxy hosts are 192.168.1.1, 192.168.1.2, and 192.168.1.3, and the URL accessed is https://www.example.com. Please change them when you run the experiment.)
Is proxy_url in the dataAccess method what you expect?
Also check that the Proxy is constructed correctly (Proxy#toString will show you).
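A minimal sketch of that check, assuming the Proxy is built the same way as in dataAccess: ProxyCheck and describe are hypothetical names. It prints proxy_url in brackets (so stray whitespace is visible) next to Proxy#toString, and flags addresses that could not be resolved.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Arrays;
import java.util.List;

// Hypothetical check: build each Proxy exactly as dataAccess() does and show
// Proxy#toString, so the proxy actually used can be compared with proxy_url.
public class ProxyCheck {
    static String describe(String proxy_url, int port) {
        InetSocketAddress addr = new InetSocketAddress(proxy_url, port);
        Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
        // isUnresolved() is true when the host could not be looked up,
        // for example if proxy_url is not a plain host name or IP address
        String status = addr.isUnresolved() ? " (UNRESOLVED)" : "";
        return "proxy_url=[" + proxy_url + "] -> " + proxy + status;
    }

    public static void main(String... args) {
        List<String> hosts = Arrays.asList("192.168.1.1", "192.168.1.2", "192.168.1.3");
        for (String host : hosts) {
            System.out.println(describe(host, 8080));
        }
    }
}
```

Calling describe with each proxy_url right before dataAccess runs would show whether the values reaching the method really rotate.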