Coding With Fun
Home Docker Django Node.js Articles Python pip guide FAQ Policy

EasyExcel Alibaba open source Excel operation artifact!


May 31, 2021 Article blog


Table of contents


This article is from the public number: Throwable, author Trowable

precondition

Exporting data to Excel is one of the most common back-end requirements, and today we recommend an Excel operating artifact from Ali: EasyExcel EasyExcel is a package of apache-poi from its dependent tree, and the author chose EasyExcel from the beginning of contact with Excel processing, avoiding the memory leak caused by the widely circulated apache-poi

Introduce EasyExcel dependency

Maven which introduced EasyExcel is as follows:



    com.alibaba
    easyexcel
    ${easyexcel.version}

The latest version 2020-09 is 2.2.6

Introduction to the API

Excel files are primarily processed around read and write operations, and EasyExcel API are designed around both. First look at the relevant API for read operations:

// 新建一个ExcelReaderBuilder实例
ExcelReaderBuilder readerBuilder = EasyExcel.read();
// 读取的文件对象,可以是File、路径(字符串)或者InputStream实例
readerBuilder.file("");
// 文件的密码
readerBuilder.password("");
// 指定sheet,可以是数字序号sheetNo或者字符串sheetName,若不指定则会读取所有的sheet
readerBuilder.sheet("");
// 是否自动关闭输入流
readerBuilder.autoCloseStream(true);
// Excel文件格式,包括ExcelTypeEnum.XLSX和ExcelTypeEnum.XLS
readerBuilder.excelType(ExcelTypeEnum.XLSX);
// 指定文件的标题行,可以是Class对象(结合@ExcelProperty注解使用),或者List实例
readerBuilder.head(Collections.singletonList(Collections.singletonList("head")));
// 注册读取事件的监听器,默认的数据类型为Map,第一列的元素的下标从0开始
readerBuilder.registerReadListener(new AnalysisEventListener() {


    @Override
    public void invokeHeadMap(Map headMap, AnalysisContext context) {
        // 这里会回调标题行,文件内容的首行会认为是标题行
    }


    @Override
    public void invoke(Object o, AnalysisContext analysisContext) {
        // 这里会回调每行的数据
    }


    @Override
    public void doAfterAllAnalysed(AnalysisContext analysisContext) {


    }
});
// 构建读取器
ExcelReader excelReader = readerBuilder.build();
// 读取数据
excelReader.readAll();
excelReader.finish();

As you can see, read operations are primarily designed using Builder mode and event listening (or, as "observer mode"). In general, the above code can be simplified as follows:

Map head = new HashMap();
List data = new LinkedList();
EasyExcel.read("文件的绝对路径").sheet()
        .registerReadListener(new AnalysisEventListener() {


            @Override
            public void invokeHeadMap(Map headMap, AnalysisContext context) {
                head.putAll(headMap);
            }


            @Override
            public void invoke(Map row, AnalysisContext analysisContext) {
                data.add(row);
            }


            @Override
            public void doAfterAllAnalysed(AnalysisContext analysisContext) {
                    // 这里可以打印日志告知所有行读取完毕
            }
        }).doRead();

If you need to read the data and convert it to a corresponding list of objects, you need to specify Class for the title line, which is used in conjunction with the annotation @ExcelProperty

文件内容:


|订单编号|手机号|
|ORDER_ID_1|112222|
|ORDER_ID_2|334455|


@Data
private static class OrderDTO {


    @ExcelProperty(value = "订单编号")
    private String orderId;


    @ExcelProperty(value = "手机号")
    private String phone;
}


Map head = new HashMap();
List data = new LinkedList();
EasyExcel.read("文件的绝对路径").head(OrderDTO.class).sheet()
        .registerReadListener(new AnalysisEventListener() {


            @Override
            public void invokeHeadMap(Map headMap, AnalysisContext context) {
                head.putAll(headMap);
            }


            @Override
            public void invoke(OrderDTO row, AnalysisContext analysisContext) {
                data.add(row);
            }


            @Override
            public void doAfterAllAnalysed(AnalysisContext analysisContext) {
                // 这里可以打印日志告知所有行读取完毕
            }
        }).doRead();

"If the amount of data is large, it is recommended to use Map type to read and manipulate data objects, otherwise a large number of reflection operations can greatly increase the time it takes to read the data, and in extreme cases, such as when the property is large, the reflection operation may take longer than reading and traversing."

Then look at the API for the write:

// 新建一个ExcelWriterBuilder实例
ExcelWriterBuilder writerBuilder = EasyExcel.write();
// 输出的文件对象,可以是File、路径(字符串)或者OutputStream实例
writerBuilder.file("");
// 指定sheet,可以是数字序号sheetNo或者字符串sheetName,可以不设置,由下面提到的WriteSheet覆盖
writerBuilder.sheet("");
// 文件的密码
writerBuilder.password("");
// Excel文件格式,包括ExcelTypeEnum.XLSX和ExcelTypeEnum.XLS
writerBuilder.excelType(ExcelTypeEnum.XLSX);
// 是否自动关闭输出流
writerBuilder.autoCloseStream(true);
// 指定文件的标题行,可以是Class对象(结合@ExcelProperty注解使用),或者List实例
writerBuilder.head(Collections.singletonList(Collections.singletonList("head")));
// 构建ExcelWriter实例
ExcelWriter excelWriter = writerBuilder.build();
List data = new ArrayList();
// 构建输出的sheet
WriteSheet writeSheet = new WriteSheet();
writeSheet.setSheetName("target");
excelWriter.write(data, writeSheet);
// 这一步一定要调用,否则输出的文件有可能不完整
excelWriter.finish();

ExcelWriterBuilder there are many styles, row processors, converter settings and other methods, the author feel not commonly used, here do not make an example, the content of the style is usually after the output file processing again will be easier to operate. Write operations can generally be simplified as follows:

List head = new ArrayList();
List data = new LinkedList();
EasyExcel.write("输出文件绝对路径")
        .head(head)
        .excelType(ExcelTypeEnum.XLSX)
        .sheet("target")
        .doWrite(data);

Practical tips

Here's a brief introduction to the practical techniques used in production.

Multithreaded reading

Using EasyExcel multithreaded reading recommendations is used under qualified prerequisites:

  • The source file has been split into several small files, and each small file has the same number of heading rows and columns.
  • The machine memory should be sufficient because the results of concurrent reads eventually need to be combined into a large result set, with all the data stored in memory.

Multiple files that often encounter external feedback require urgent data analysis or cross-proofing, and in order to speed up file reading, the author usually uses this method to read Excel files in a consistent format in bulk

A simple example is as follows:

@Slf4j
public class EasyExcelConcurrentRead {


    static final int N_CPU = Runtime.getRuntime().availableProcessors();


    public static void main(String[] args) throws Exception {
        // 假设I盘的temp目录下有一堆同格式的Excel文件
        String dir = "I:\\temp";
        List mergeResult = Lists.newLinkedList();
        ThreadPoolExecutor executor = new ThreadPoolExecutor(N_CPU, N_CPU * 2, 0, TimeUnit.SECONDS,
                new LinkedBlockingQueue(), new ThreadFactory() {


            private final AtomicInteger counter = new AtomicInteger();


            @Override
            public Thread newThread(@NotNull Runnable r) {
                Thread thread = new Thread(r);
                thread.setDaemon(true);
                thread.setName("ExcelReadWorker-" + counter.getAndIncrement());
                return thread;
            }
        });
        Path dirPath = Paths.get(dir);
        if (Files.isDirectory(dirPath)) {
            List futures = Files.list(dirPath)
                    .map(path -> path.toAbsolutePath().toString())
                    .filter(absolutePath -> absolutePath.endsWith(".xls") || absolutePath.endsWith(".xlsx"))
                    .map(absolutePath -> executor.submit(new ReadTask(absolutePath)))
                    .collect(Collectors.toList());
            for (Future future : futures) {
                mergeResult.addAll(future.get());
            }
        }
        log.info("读取[{}]目录下的文件成功,一共加载:{}行数据", dir, mergeResult.size());
        // 其他业务逻辑.....
    }


    @RequiredArgsConstructor
    private static class ReadTask implements Callable {


        private final String location;


        @Override
        public List call() throws Exception {
            List data = Lists.newLinkedList();
            EasyExcel.read(location).sheet()
                    .registerReadListener(new AnalysisEventListener() {


                        @Override
                        public void invoke(Map row, AnalysisContext analysisContext) {
                            data.add(row);
                        }


                        @Override
                        public void doAfterAllAnalysed(AnalysisContext analysisContext) {
                            log.info("读取路径[{}]文件成功,一共[{}]行", location, data.size());
                        }
                    }).doRead();
            return data;
        }
    }
}

Here you submit a concurrent read task using ThreadPoolExecutor#submit() and then use Future#get() to wait for all tasks to complete before merging the final read results.

Note that writes to general files cannot be performed concurrently, otherwise there is a high probability that the data will be confused

Multiple Sheet writes

Multiple Sheet writes are actually written to multiple WriteSheet instances using the same ExcelWriter instance, and each Sheet title line can be overwritten by the configuration properties in the WriteSheet instance, as follows:

public class EasyExcelMultiSheetWrite {


    public static void main(String[] args) throws Exception {
        ExcelWriterBuilder writerBuilder = EasyExcel.write();
        writerBuilder.excelType(ExcelTypeEnum.XLSX);
        writerBuilder.autoCloseStream(true);
        writerBuilder.file("I:\\temp\\temp.xlsx");
        ExcelWriter excelWriter = writerBuilder.build();
        WriteSheet firstSheet = new WriteSheet();
        firstSheet.setSheetName("first");
        firstSheet.setHead(Collections.singletonList(Collections.singletonList("第一个Sheet的Head")));
        // 写入第一个命名为first的Sheet
        excelWriter.write(Collections.singletonList(Collections.singletonList("第一个Sheet的数据")), firstSheet);
        WriteSheet secondSheet = new WriteSheet();
        secondSheet.setSheetName("second");
        secondSheet.setHead(Collections.singletonList(Collections.singletonList("第二个Sheet的Head")));
        // 写入第二个命名为second的Sheet
        excelWriter.write(Collections.singletonList(Collections.singletonList("第二个Sheet的数据")), secondSheet);
        excelWriter.finish();
    }
}

The effect is as follows:

 EasyExcel Alibaba open source Excel operation artifact!1

Paginated queries and bulk writing

In some cases where the amount of data is relatively large, consider pagination queries and bulk writing, which is in fact 分页查询原始数据 -> 数据聚合或者转换 -> 写目标数据 -> 下一页查询.... In fact, in the case of small amount of data, one-time full-scale query and full-scale writing is only a special case of paginated queries and bulk writing, so you can abstract queries, transformations, and write operations into a reusable template method:

int batchSize = 定义每篇查询的条数;
OutputStream outputStream = 定义写到何处;
ExcelWriter writer = new ExcelWriterBuilder()
        .autoCloseStream(true)
        .file(outputStream)
        .excelType(ExcelTypeEnum.XLSX)
        .head(ExcelModel.class);
for (;;){
    List list = originModelRepository.分页查询();
    if (list.isEmpty()){
        writer.finish();
        break;
    }else {
        list 转换-> List excelModelList;
        writer.write(excelModelList);
    }
}

See the author's previous non-titled party production application article "Million-level data Excel export optimization", suitable for big data export scenarios, the code is as follows:

 EasyExcel Alibaba open source Excel operation artifact!2

Excel uploads and downloads

The following example applies to Servlet containers, commonly used in Tomcat, for spring-boot-starter-web

Excel file uploads are similar to normal file uploads, and then use EasyExcel ExcelReader to read the request object MultipartHttpServletRequest Chinese partially abstract InputStream instance:

@PostMapping(path = "/upload")
public ResponseEntity upload(MultipartHttpServletRequest request) throws Exception {
    Map fileMap = request.getFileMap();
    for (Map.Entry part : fileMap.entrySet()) {
        InputStream inputStream = part.getValue().getInputStream();
        Map head = new HashMap();
        List data = new LinkedList();
        EasyExcel.read(inputStream).sheet()
                .registerReadListener(new AnalysisEventListener() {


                    @Override
                    public void invokeHeadMap(Map headMap, AnalysisContext context) {
                        head.putAll(headMap);
                    }


                    @Override
                    public void invoke(Map row, AnalysisContext analysisContext) {
                        data.add(row);
                    }


                    @Override
                    public void doAfterAllAnalysed(AnalysisContext analysisContext) {
                        log.info("读取文件[{}]成功,一共:{}行......", part.getKey(), data.size());
                    }
                }).doRead();
        // 其他业务逻辑
    }
    return ResponseEntity.ok("success");
}

The Postman request is as follows:

 EasyExcel Alibaba open source Excel operation artifact!3

Using EasyExcel for Excel file export is also simple, simply attaching the OutputStream object carried in the response object HttpServletResponse to the ExcelWriter instance of EasyExcel

@GetMapping(path = "/download")
public void download(HttpServletResponse response) throws Exception {
    // 这里文件名如果涉及中文一定要使用URL编码,否则会乱码
    String fileName = URLEncoder.encode("文件名.xlsx", StandardCharsets.UTF_8.toString());
    // 封装标题行
    List head = new ArrayList();
    // 封装数据
    List data = new LinkedList();
    response.setContentType("application/force-download");
    response.setHeader("Content-Disposition", "attachment;filename=" + fileName);
    EasyExcel.write(response.getOutputStream())
            .head(head)
            .autoCloseStream(true)
            .excelType(ExcelTypeEnum.XLSX)
            .sheet("Sheet名字")
            .doWrite(data);
}

Here's a look:

  • If the file name contains Chinese, it needs to be URL encoded, otherwise it will be garbled.
  • Whether imported or exported, if the amount of data is time-consuming, remember to adjust the upper limit configuration of the connection, read and write timeout time in Nginx if you Nginx Nginx
  • Using SpringBoot requires adjusting the configuration values spring.servlet.multipart.max-request-size and spring.servlet.multipart.max-file-size to avoid exceptions to over-uploaded files.

brief summary

EasyExcel API design is easy to use and you can use him to quickly develop scenes with Excel data import or export, which is a boon to the vast majority of Javar people.

Above is W3Cschool编程狮 About EasyExcel Alibaba Open Source Excel Operating Artifact! Related to the introduction, I hope to help you.