
Cloud development: Excel document processing


May 22, 2021 · Mini Program Cloud Development Advanced



Excel is one of the most common formats for storing data: it carries day-to-day office data and is a transfer format that many non-technical people use constantly. So how to import Excel (CSV) data into the database, and how to export data from the database to Excel (CSV), is an important topic to study. Besides importing and exporting csv files in the cloud development console, we can also use some Node.js modules to work with Excel documents in cloud functions.

First, read the Excel file stored in the cloud

We can search GitHub for the keyword "node excel" and filter by stars; among the well-maintained candidates, node-xlsx is recommended here. GitHub address: node-xlsx.

Use the developer tool to create a new cloud function, for example node-excel, add the latest version of node-xlsx to package.json, then right-click the cloud function directory, choose "Open in Terminal", and run npm install to install the dependencies:

  1. "dependencies": {
  2. "wx-server-sdk": "latest",
  3. "node-xlsx": "latest"
  4. }

Then enter the following code in index.js. A few points to note:

  • The Excel files processed by the cloud function come from your cloud storage, so you need to upload the csv data file to cloud storage in advance and replace the fileID in the code below with the address of your own csv file;

  • The cloud function downloads the csv file from cloud storage, parses it with node-xlsx, and writes each row to the database; the example uses the China economic data described earlier, and only some of the fields are written here;

  • The code below reads every row and adds it to the database inside a loop, i.e. it issues one database add request per iteration. In general this is strongly discouraged; if you do it anyway, the main thing is to raise the cloud function timeout, for example from 20s to 60s, so the function can finish, otherwise only part of the data will be written.

  const cloud = require('wx-server-sdk')
  cloud.init({
    env: cloud.DYNAMIC_CURRENT_ENV
  })
  const xlsx = require('node-xlsx');
  const db = cloud.database()
  exports.main = async (event, context) => {
    const fileID = 'cloud://xly-xrlur.786c-xly-xrlur-1300446086/china.csv' // replace this with the address of the csv file in your own cloud storage
    const res = await cloud.downloadFile({
      fileID: fileID,
    })
    const buffer = res.fileContent
    const sheets = xlsx.parse(buffer); // parse the downloaded Excel buffer; sheets is an array with one entry per sheet, and each entry's data array holds one sub-array per Excel row
    const sheet = sheets[0].data // the rows of the first sheet; note that sheet is an array
    const tasks = []
    for (let rowIndex in sheet) { // if the first row of your Excel file holds the field names, start from the second row
      let row = sheet[rowIndex];
      const task = await db.collection('chinaexcel')
        .add({
          data: {
            city: row[0],
            province: row[1],
            city_area: row[2],
            builtup_area: row[3],
            reg_pop: row[4],
            resident_pop: row[5],
            gdp: row[6]
          }
        })
      tasks.push(task) // task is the value returned by the add request: the _id of the new record and whether the add succeeded
    }
    return tasks;
  }

The data obtained by parsing the Excel file with xlsx.parse is an array: the sheets mentioned above. Each element of the array is one sheet of the Excel file, sheets[0].data holds the data of the first sheet, and that data is again an array with one entry per row of the sheet.

In other words, in the value returned by the parse, each inner array is one row of Excel data:

  [
    {
      name: 'Sheet1',
      data: [
        [Array], [Array],
        ... 233 more items
      ]
    }
  ]
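
If the first row of your sheet holds the column names rather than data, skip it before writing rows, as the comment in the cloud function above suggests. A minimal sketch, reusing the sheet variable from that function:

  // row 0 holds the column names; the remaining entries are the data rows
  const [header, ...rows] = sheet
  for (const row of rows) {
    // add `row` to the database as in the loop above
  }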

Many people use cloud functions to import large amounts of data into the database with the Promise.all() method. This approach runs into connection problems and reports [LimitExceeded.NoValidConnection] Connection num overrun, because the number of simultaneous database connections is limited and differs by pricing plan; the free plan allows 20, for example. There are other solutions to this problem, which are not covered here. And even if you set the cloud function timeout to 60s, it can still happen that the data is not fully imported; in that case your Excel file is simply too large, or too much data is being imported at once, beyond what one cloud function invocation can handle. It is recommended to split the processing; the method above only suits a few hundred records.
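
As one possible workaround (the original leaves these solutions out), here is a minimal sketch that batches the add requests so only a few run concurrently. BATCH_SIZE is an assumed value, and rows stands for the parsed sheet rows from the code above:

  // a sketch only: issue add() requests in small batches so the number of
  // concurrent database connections stays below the plan limit (e.g. 20 on the free plan)
  const BATCH_SIZE = 10 // assumed value, comfortably below the free plan's 20 connections
  for (let i = 0; i < rows.length; i += BATCH_SIZE) {
    const batch = rows.slice(i, i + BATCH_SIZE)
    // wait for the whole batch to settle before starting the next one
    await Promise.all(batch.map(row => db.collection('chinaexcel').add({
      data: { city: row[0], province: row[1], gdp: row[6] } // subset of fields, as above
    })))
  }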

Second, save the data in the database as CSV

node-xlsx can not only parse Excel files to get the data out of them; it can also generate Excel files. So we can take data out of the cloud database, save it as an Excel file, and upload the generated file to cloud storage.

We can simply replace the code of the node-excel cloud function with the following and redeploy it (the dependencies are the same, so there is no need to reinstall them):

  • This cloud function first fetches the data from the database; you can also filter the data according to your own needs. Note that a cloud function can get at most 1000 records per request; if there are more than 1000 you need to page through them yourself (see the paging sketch after the code below);

  • dataList.data is an array whose elements are key-value objects. We read a value as dataList.data[index].key, and in the same way dataList.data[index].key.subkey reads values out of nested sub-documents;

  • The cloud function pushes the field values of each row (equivalent to each Excel cell) into a row array, pushes each row array into the sheet, builds an xlsx buffer from the sheet, and uploads it to cloud storage.

  const cloud = require('wx-server-sdk')
  cloud.init({
    env: 'xly-xrlur'
  })
  const xlsx = require('node-xlsx');
  const db = cloud.database()
  const _ = db.command
  exports.main = async (event, context) => {
    const dataList = await db.collection("chinaexcel").where({
      _id: _.exists(true)
    }).limit(1000).get()
    const data = dataList.data // the fetched data: an array of key:value objects
    let sheet = [] // in the end this whole array is what gets written to the Excel file
    let title = ['id', 'builtup_area', 'city', 'city_area', 'gdp', 'province', 'reg_pop', 'resident_pop'] // the first row: the column names
    sheet.push(title) // after the column names comes the actual content
    for (let rowIndex in data) {
      let rowcontent = [] // the data of one row
      rowcontent.push(data[rowIndex]._id) // note: keep the same order as the values in title
      rowcontent.push(data[rowIndex].builtup_area)
      rowcontent.push(data[rowIndex].city)
      rowcontent.push(data[rowIndex].city_area)
      rowcontent.push(data[rowIndex].gdp)
      rowcontent.push(data[rowIndex].province)
      rowcontent.push(data[rowIndex].reg_pop)
      rowcontent.push(data[rowIndex].resident_pop)
      sheet.push(rowcontent) // append the row to the sheet
    }
    const buffer = xlsx.build([{ name: "china", data: sheet }])
    return await cloud.uploadFile({
      cloudPath: 'china.xlsx',
      fileContent: buffer,
    })
  }
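
As noted in the list above, get() in a cloud function returns at most 1000 records per request. A minimal paging sketch (an illustration under that assumption, not part of the original function): count the records first, then fetch them page by page with skip() and limit():

  // fetch every record in pages of 1000, the per-request maximum in a cloud function
  const countResult = await db.collection('chinaexcel').count()
  const total = countResult.total
  let all = []
  for (let i = 0; i < total; i += 1000) {
    const page = await db.collection('chinaexcel').skip(i).limit(1000).get()
    all = all.concat(page.data)
  }
  // `all` can now replace dataList.data in the function above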

Third, a solution for importing larger amounts of Excel data

As we saw earlier, importing Excel data into the database by calling add in a loop is very inefficient; even with the cloud function timeout set to 60s, only a small amount of data can be imported. If your business frequently needs to import data into the database, how should we handle it? We can use a design based on embedded sub-documents.

Each add request to the database inserts one record: however many records you have, that is how many requests you make. Database requests cost time, resources, and performance, and with larger data volumes the success rate is hard to control. But if you store all the data you want to add as one whole array in the value of a single field, only one database request is needed. For example, a collection can be designed as:

  {
    china: [{ ... // data for several hundred cities
    }]
  }

Since the data is the value of a field in a record, we can use the update instruction to push the array into the field's array, which greatly improves the performance of the data import:

  db.collection('china').doc(id).update({
    data: {
      china: _.push([/* the array of data */])
    }
  })
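
Putting the pieces together, a minimal sketch that combines the node-xlsx parsing from section one with a single push. It assumes buffer holds the downloaded file as before, a record with _id id already exists in the china collection, and only a subset of the fields is kept:

  // map each Excel row to an object, then append them all in one request
  const sheet = xlsx.parse(buffer)[0].data
  const rows = sheet.map(row => ({
    city: row[0],
    province: row[1],
    gdp: row[6] // subset of fields, as in section one
  }))
  await db.collection('china').doc(id).update({
    data: {
      china: _.push(rows) // one update request instead of one add per row
    }
  })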

Fourth, turning an Excel file into a cloud database JSON file in one click

Here's a script that runs locally on your computer, not in the cloud. All it does is turn Excel files into the JSON format the cloud development database needs, so its usefulness is limited.

When importing Excel data into the cloud development database, problems appear once the data volume is relatively large. Instead, we can convert the Excel file to a CSV file whose first row holds the field names (in English), and then use the code below to turn the CSV file into a JSON file.

  • First, install the Node.js environment, then use VS Code to create a new file csv2json.js and copy the code below into it;

  • Second, right-click csv2json.js in VS Code's explorer, open it in the terminal, and run the command npm install csvtojson replace-in-file;

  • Third, put the csv file to convert in the same directory, and change china.csv in the code below to the name of your own csv file;

  • Fourth, the rest of the code needs no changes. Open the VS Code terminal and run node csv2json.js; two files will be generated: output.json, and data.json, which can be imported into the cloud development database.

  // after opening the file in vscode, run: npm install csvtojson replace-in-file
  const csv = require('csvtojson')
  const replace = require('replace-in-file');
  const fs = require('fs')
  const csvFilePath = 'china.csv' // put the csv file to convert in the same directory and change this to your own file
  // none of the code below needs changes; running the script generates two files: output.json, and data.json, which can be imported into the cloud development database
  csv()
    .fromFile(csvFilePath)
    .then((jsonObj) => {
      // console.log(jsonObj);
      var jsonContent = JSON.stringify(jsonObj);
      console.log(jsonContent);
      fs.writeFile("output.json", jsonContent, 'utf8', function (err) {
        if (err) {
          console.log("Error while saving the JSON file.");
          return console.log(err);
        }
        console.log("The JSON file has been saved as output.json.");
        fs.readFile('output.json', 'utf8', function (err, data) {
          if (err) {
            return console.log(err);
          }
          var result = data.replace(/},/g, '}\n').replace(/\[/, '').replace(/\]/, '')
          fs.writeFile('data.json', result, 'utf8', function (err) {
            if (err) return console.log(err);
          });
        });
      });
    })
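
The chained replace calls above strip the surrounding array brackets and put each record on its own line, because the cloud development console imports JSON in a line-delimited form (one JSON object per line) rather than as a single array. Illustratively, with placeholder values rather than real data, data.json ends up looking like:

  {"city":"cityA","province":"provinceA","gdp":"100"}
  {"city":"cityB","province":"provinceB","gdp":"200"}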