Changing the proxy server on every web access


For a university research project, I have a Java program that scrapes various data from the web.
The requests go through multiple forward proxy servers (Apache).

To keep the traffic from concentrating on a single proxy server,
I wrote the code below, which switches the proxy server in round-robin fashion on each access.

According to what I print with System.out on the command line,
the proxy servers are being used in round-robin order, but
the proxy server logs show that the requests are concentrated on one server.
(At about one access every 20 seconds the requests seem to be distributed; at intervals of around 5 seconds they seem to become concentrated.)

Is it impossible, with HttpURLConnection or the Proxy class,
to change the proxy server on every access?

If you have any information, I would appreciate your advice.
Thank you in advance.

import java.io.DataInputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

private DataInputStream dataAccess(String proxy_url, String page_url) throws Exception {
    URL url = new URL(page_url); // web page to fetch
    String proxy_port = "80";

    // Build a Proxy for the proxy host chosen by the caller (round robin)
    Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxy_url, Integer.parseInt(proxy_port)));
    HttpURLConnection connection = (HttpURLConnection) url.openConnection(proxy);
    connection.setAllowUserInteraction(false);
    connection.setInstanceFollowRedirects(true);
    connection.setRequestMethod("GET");
    connection.addRequestProperty("Cookie", this.getCookie());

    connection.connect();
    System.out.println(proxy_url + ":" + page_url);

    int httpStatusCode = connection.getResponseCode();
    if (httpStatusCode != HttpURLConnection.HTTP_OK) {
        System.err.println("File Not Found:" + page_url);
        throw new Exception("HTTP status " + httpStatusCode + " for " + page_url);
    }

    // Return the retrieved data as a DataInputStream
    return new DataInputStream(connection.getInputStream());
}
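The post does not show the caller that picks proxy_url for each request. Purely as a point of reference, a round-robin picker could look like the sketch below. The class name ProxyRotator and the host names proxy1.example etc. are hypothetical placeholders, and the AtomicInteger counter assumes the picker might be called from several threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not from the original post):
// returns proxy host names in round-robin order.
class ProxyRotator {
    private final List<String> hosts;
    private final AtomicInteger counter = new AtomicInteger();

    public ProxyRotator(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy host is required");
        }
        this.hosts = hosts;
    }

    // Returns the next host; safe to call from several threads at once.
    public String next() {
        // floorMod keeps the index non-negative even after the int counter overflows
        int i = Math.floorMod(counter.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }

    public static void main(String[] args) {
        // Placeholder host names; replace them with the real proxy servers
        ProxyRotator rotator = new ProxyRotator(
                Arrays.asList("proxy1.example", "proxy2.example", "proxy3.example"));
        for (int n = 0; n < 6; n++) {
            System.out.println(rotator.next());
        }
    }
}
```

With such a helper, each call would look like `dataAccess(rotator.next(), page_url)`, so consecutive calls are handed different proxy hosts.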


2022-09-30 21:24

1 Answer

I tested Proxy with the following code, and each connection was made to the specified proxy host as expected.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class Sample {
    // Three proxy hosts, listed twice so that each one is used twice in turn
    private static final List<Proxy> proxies = Arrays.asList(
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.1", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.2", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.3", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.1", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.2", 8080)),
            new Proxy(Proxy.Type.HTTP, new InetSocketAddress("192.168.1.3", 8080)));

    public static void main(String... args) {
        try {
            URL url = new URL("https://www.example.com");
            for (Proxy proxy : proxies) {
                HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
                try {
                    /* Just connect to the proxy host.
                     * (No HTTP request is sent to the destination URL.)
                     */
                    System.out.println("Proxy=" + proxy);
                    conn.connect();
                    System.out.println("done.");
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    conn.disconnect();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

  • OS: CentOS 6.4 (x86_64)
  • Java: 1.8.0_91
  • Note: The code above uses three proxy hosts (192.168.1.1, 192.168.1.2, and 192.168.1.3) and accesses https://www.example.com. Change these values when you run the experiment yourself.
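
As an alternative that the code above does not use, the standard java.net.ProxySelector class can also choose a proxy per request: once a selector is installed with ProxySelector.setDefault, url.openConnection() called without a Proxy argument asks the selector which proxy to use. The following is only a sketch under that assumption; the class name RoundRobinProxySelector and the host names proxy1.example and proxy2.example are hypothetical placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a ProxySelector that hands out a different proxy for each request.
// Class name and proxy host names are placeholders, not from the original thread.
class RoundRobinProxySelector extends ProxySelector {
    private final List<Proxy> proxies;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinProxySelector(List<Proxy> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("at least one proxy is required");
        }
        this.proxies = proxies;
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        int i = Math.floorMod(counter.getAndIncrement(), proxies.size());
        return Collections.singletonList(proxies.get(i));
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // Only reports the failure; a real program might skip that proxy for a while
        System.err.println("Proxy " + sa + " failed for " + uri + ": " + ioe);
    }

    public static void main(String[] args) {
        RoundRobinProxySelector selector = new RoundRobinProxySelector(Arrays.asList(
                new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("proxy1.example", 80)),
                new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("proxy2.example", 80))));
        // After this, url.openConnection() with no Proxy argument consults the selector
        ProxySelector.setDefault(selector);

        // select() is called directly here only to show the rotation;
        // HttpURLConnection calls it internally for real requests.
        URI target = URI.create("https://www.example.com/");
        for (int n = 0; n < 4; n++) {
            InetSocketAddress addr = (InetSocketAddress) selector.select(target).get(0).address();
            System.out.println(addr.getHostString() + ":" + addr.getPort());
        }
    }
}
```

InetSocketAddress.createUnresolved is used so the sketch runs without a DNS lookup; the host name is then resolved when a connection is actually made.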

Please check whether the proxy_url passed to the dataAccess method is the value you expect,
and also confirm that the Proxy object is created correctly.
(Proxy#toString will show you.)
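
A minimal sketch of that check, assuming the Proxy is built the same way as in dataAccess: the hypothetical buildProxy helper prints Proxy#toString and warns when the host string has stray whitespace or could not be resolved. The IP addresses and port are the ones from the sample above.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Sketch of a check to run before each request (hypothetical helper, not from the thread):
// build the Proxy the same way dataAccess does and print what it actually points at.
class ProxyCheck {
    public static Proxy buildProxy(String proxyUrl, int port) {
        // Leading/trailing whitespace in proxyUrl could make the host lookup fail
        if (!proxyUrl.equals(proxyUrl.trim())) {
            System.err.println("Warning: proxy host has surrounding whitespace: [" + proxyUrl + "]");
        }
        InetSocketAddress address = new InetSocketAddress(proxyUrl, port);
        if (address.isUnresolved()) {
            System.err.println("Warning: could not resolve proxy host: " + proxyUrl);
        }
        Proxy proxy = new Proxy(Proxy.Type.HTTP, address);
        // Proxy#toString prints the proxy type and its socket address
        System.out.println("Proxy=" + proxy);
        return proxy;
    }

    public static void main(String[] args) {
        for (String host : new String[] {"192.168.1.1", "192.168.1.2", "192.168.1.3"}) {
            buildProxy(host, 8080);
        }
    }
}
```

If the printed addresses differ from what the round-robin logic is supposed to produce, the problem lies in how proxy_url is chosen rather than in HttpURLConnection itself.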


2022-09-30 21:24



© 2024 OneMinuteCode. All rights reserved.