As the title says, this code uses Java to scrape information from the Billboard chart website.
I'm a beginner, so I'm not sure, but I believe I create a URL object with the URL class and read from it through a BufferedReader.
Java told me that the URL class must handle exceptions, so I wrapped it in a try/catch. The result is the "Billboard Parsing error!!!" message from my own exception handler. Why can't I read the page?
package s89;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class BillboardMain3 {
    public static void main(String[] args) {
        String newUrls = "https://www.billboard.com/charts/hot-100/";
        URL url = null;
        try {
            url = new URL(newUrls); // Build the URL object from the address
            // Open a stream on the address and wrap it in a reader
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), "euc-kr"), 8);
            String line = null;
            while ((line = reader.readLine()) != null) { // Read one line at a time
                if (!line.trim().equals("")) { // Print the line if it is not blank
                    System.out.println(line.trim());
                }
            }
        } catch (Exception e) {
            System.out.println("Billboard Parsing error!!!");
        }
    }
}
You must be working through the Java 200 book. I'm doing that too, lol.
I think the Billboard chart site is blocking the request.
It works fine if you put in the address of another site.
If you tweak the code a little, the rest of it seems to work well.
I think it's because it has been a while since the book was published.
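One way to tweak the code is sketched below. It is a guess at the cause, not a confirmed fix: it assumes the site rejects Java's default User-Agent, so it sends a browser-like one (the value "Mozilla/5.0" is only an illustration). It also reads the page as UTF-8 instead of euc-kr, and prints the real exception instead of a fixed message so the actual reason is visible. The class and method names (BillboardFetchSketch, nonBlankLines, fetch) are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

class BillboardFetchSketch {

    // Trim every line from the reader and drop the blank ones.
    static List<String> nonBlankLines(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        return reader.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // Download the page at the given address and return its non-blank lines.
    static List<String> fetch(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(address).openConnection();
        // Assumption: the server may refuse the default "Java/<version>" agent.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (Reader in = new InputStreamReader(
                conn.getInputStream(), StandardCharsets.UTF_8)) {
            return nonBlankLines(in);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        try {
            for (String line : fetch("https://www.billboard.com/charts/hot-100/")) {
                System.out.println(line);
            }
        } catch (Exception e) {
            // Show the real cause rather than hiding it behind a fixed message.
            e.printStackTrace();
        }
    }
}
```

If the stack trace mentions an HTTP response code such as 403, the server refused the request, and the header guess above may or may not be enough. If it shows something else (for example an SSL or DNS error), the problem is on the connection side rather than in the reading loop.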
Here is a Groovy (groovysh) sample for reference. The request returns 200 OK, and the response header says the page is UTF-8.
groovy:000> @Grab(group='org.apache.httpcomponents', module='httpclient', version='4.4')
groovy:001> go
===> null
groovy:000> import org.apache.http.impl.client.*
groovy:000> import org.apache.http.client.methods.*
groovy:000> import org.apache.http.util.*
groovy:000> httpClient = HttpClients.createDefault()
groovy:000> httpGet = new HttpGet("https://www.billboard.com/charts/hot-100/")
groovy:000> response = httpClient.execute(httpGet)
===> HttpResponseProxy{HTTP/1.1 200 OK [Date: Sat, 28 Sep 2019 17:51:51 GMT, Content-Type: text/html; charset=UTF-8, Transfer-Encoding: chunked, Connection: keep-alive, CF-Cache-Status: HIT, Cache-Control: max-age=1, public, s-maxage=300, CF-Ray: 51d7915b2b73a261-ICN, Age: 275, Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct", Last-Modified: Fri, 27 Sep 2019 19:50:29 GMT, Set-Cookie: PGMINFO=cc:kr-ip:116.37.93.4; Max-Age=3600; Path=/; Domain=.billboard.com, Vary: Accept-Encoding, Via: 1.1 varnish (Varnish/5.2), X-Cache-Hits: HIT (25), X-Debug-Cookies: , X-Debug-Log: Removed cookies, X-NX-Host: www.billboard.com, X-Varnish: 917770121 932119920, Server: cloudflare] org.apache.http.client.entity.DecompressingEntity@60d6fdd4}
groovy:000> contents = EntityUtils.toString(response.getEntity(), "UTF-8")
===> <!doctype html>
<html class="" lang="">
<head>
<script>
_udn = "billboard.com";
</script>
<script>function utmx_section(){}function utmx(){}(function(){var
k='67942495-39',d=document,l=d.location,c=d.cookie;
if(l.search.indexOf('utm_expid='+k)>0)return;
function f(n){if(c){var i=c.indexOf(n+'=');if(i>-1){var j=c.
indexOf(';',i);return escape(c.substring(i+n.length+1,j<0?c.
length:j))}}}var x=f('__utmx'),xx=f('__utmxx'),h=l.hash;d.write(