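In day-to-day work you sometimes need to obtain or generate data quickly but have no large dataset on hand, so today I'll walk through a small example of scraping data from a website.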

Crawl target: an online bookstore

Here I scrape each book's image path and title, store them in a database (MySQL), and save the images locally (on the D: drive).
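
The record stored for each book needs only two fields, a title and an image path. The post does not show the `Goods` entity or the `GoodsMapper` it uses, so the following is only a minimal sketch of what they might look like, assuming nothing beyond the `setName`, `setPicture` and `insert` calls that appear in the crawler. The getters, the `int` return type and `InMemoryGoodsMapper` are hypothetical; the last one exists only so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity: one row per book, holding the title and the cover image URL.
class Goods {
    private String name;
    private String picture;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getPicture() { return picture; }
    public void setPicture(String picture) { this.picture = picture; }
}

// Hypothetical mapper contract; in the real project a persistence framework
// would supply the implementation that writes to MySQL.
interface GoodsMapper {
    int insert(Goods goods); // assumed to return the number of rows written
}

// In-memory stand-in (an assumption for illustration) so the sketch runs without a database.
class InMemoryGoodsMapper implements GoodsMapper {
    private final List<Goods> rows = new ArrayList<>();

    @Override
    public int insert(Goods goods) {
        rows.add(goods);
        return 1;
    }

    public List<Goods> rows() { return rows; }
}
```

Keeping the mapper behind an interface like this also makes it possible to test the parsing logic without touching MySQL.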

Main dependency (the code also uses Jsoup to parse the HTML, so Jsoup must be on the classpath as well):

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.14</version>
</dependency>

Code:


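import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

// Goods and GoodsMapper are the project's own entity and mapper; import them from your package.
// @Component registers the class as a Spring bean so that @Autowired takes effect.
@Component
public class Spide {
    @Autowired
    private GoodsMapper goodsMapper;

    // Fetch one list page, then save each book's cover image URL and title.
    public void getHTML(String url) {
        // try-with-resources closes the client and the response even when an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {
            String content = EntityUtils.toString(response.getEntity());
            // Passing the page URL as base URI lets absUrl() resolve relative image links
            Document document = Jsoup.parse(content, url);

            for (Element element : document.select(".tushu")) {
                Element img = element.select(".cover > a > img").first();
                Element name = element.select(".name").first();
                if (img == null || name == null) {
                    continue; // skip entries that lack a cover image or a title
                }
                String imageUrl = img.absUrl("src");
                getImage(imageUrl);

                Goods goods = new Goods();
                goods.setPicture(imageUrl);
                goods.setName(name.text());
                goodsMapper.insert(goods);
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Save a scraped image to the D: drive under a random file name.
    public void getImage(String imageUrl) {
        Path imageDir = Paths.get("D:/spideimages");
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(new HttpGet(imageUrl));
             InputStream is = response.getEntity().getContent()) {
            Files.createDirectories(imageDir); // create the target folder if it does not exist yet
            String newFileName = UUID.randomUUID().toString().replace("-", "");
            String suffix = imageUrl.substring(imageUrl.lastIndexOf('.'));
            Files.copy(is, imageDir.resolve(newFileName + suffix), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}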
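
The file suffix in `getImage` is taken from the last `.` in the whole URL, which gives an unusable file name when the URL carries a query string or has no extension at all. A minimal sketch of a more defensive variant, using only the JDK, might look like the following; `ImageNames`, `suffixOf` and the fallback parameter are hypothetical names introduced for illustration, not part of the original code.

```java
import java.net.URI;

final class ImageNames {
    private ImageNames() {}

    // Returns the extension of the URL's path (including the dot), or the
    // fallback when the last path segment has no usable extension.
    // URI.create throws IllegalArgumentException if the URL is not valid URI syntax.
    static String suffixOf(String imageUrl, String fallback) {
        String path = URI.create(imageUrl).getPath(); // excludes ?query and #fragment
        if (path == null) {
            return fallback;
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // The dot must sit inside the last segment and must not be the final character.
        if (dot <= slash || dot == path.length() - 1) {
            return fallback;
        }
        return path.substring(dot);
    }
}
```

In `getImage` this could replace the `substring` call, for example `String suffix = ImageNames.suffixOf(imageUrl, ".jpg");`, where the `.jpg` fallback is itself an assumption about the images being downloaded.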

Test code:

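// The Spide bean is injected into the test class by Spring.
@Autowired
private Spide spide;

// Crawl list pages 1 through 15.
@Test
public void test() {
    for (int i = 1; i <= 15; i++) {
        spide.getHTML("https://book.dangdang.com/list/newRelease_C01.03_P" + i + ".htm");
    }
}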
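
Because `getHTML` rethrows any failure as a `RuntimeException`, one bad page in the loop above aborts the remaining pages. A small sketch of a loop that keeps going and reports which pages failed could look like this; `PageCrawl`, `pageUrls` and `crawlAll` are hypothetical helpers, and only the URL pattern comes from the test above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class PageCrawl {
    private PageCrawl() {}

    // Builds the list-page URLs for pages firstPage..lastPage (inclusive).
    static List<String> pageUrls(int firstPage, int lastPage) {
        List<String> urls = new ArrayList<>();
        for (int i = firstPage; i <= lastPage; i++) {
            urls.add("https://book.dangdang.com/list/newRelease_C01.03_P" + i + ".htm");
        }
        return urls;
    }

    // Calls fetch for every URL and returns the URLs whose fetch threw,
    // so a single failing page does not stop the rest of the crawl.
    static List<String> crawlAll(List<String> urls, Consumer<String> fetch) {
        List<String> failed = new ArrayList<>();
        for (String url : urls) {
            try {
                fetch.accept(url);
            } catch (RuntimeException e) {
                failed.add(url);
            }
        }
        return failed;
    }
}
```

With the crawler from this post, the call would be `PageCrawl.crawlAll(PageCrawl.pageUrls(1, 15), spide::getHTML)`, and the returned list can be retried or logged.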

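Result: after the test runs, the book titles and image paths should be in the MySQL table, and the downloaded image files should be in D:/spideimages.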
