May 22, 2021 Mini Program Cloud Development Advanced
Excel is one of the most common formats for storing data: it carries day-to-day operational data and is also the format many non-technical people reach for when exchanging data. How to import Excel (CSV) data into the database, and how to export database data back to Excel (CSV), is therefore an important topic. Besides importing and exporting CSV files in the cloud development console, we can also process Excel documents inside cloud functions using Node.js modules.
Searching GitHub for the keyword "node excel" and filtering by stars turns up plenty of candidates; node-xlsx is recommended here, GitHub address: node-xlsx.
Use the developer tool to create a new cloud function, for example node-excel, add the latest version of node-xlsx to package.json, then right-click the cloud function directory, choose "Open in Terminal", and run npm install to install the dependencies:
"dependencies": {
"wx-server-sdk": "latest",
"node-xlsx": "latest"
}
Then enter the following code in index.js; there are a few points to note:
const cloud = require('wx-server-sdk')
cloud.init({
  env: cloud.DYNAMIC_CURRENT_ENV
})
const xlsx = require('node-xlsx');
const db = cloud.database()
exports.main = async (event, context) => {
  const fileID = 'cloud://xly-xrlur.786c-xly-xrlur-1300446086/china.csv' // replace this with the File ID of the CSV in your own cloud storage
  const res = await cloud.downloadFile({
    fileID: fileID,
  })
  const buffer = res.fileContent
  const sheets = xlsx.parse(buffer); // parse the downloaded Excel buffer; sheets is an array with one object per worksheet, and each worksheet's data array has one entry per Excel row
  const sheet = sheets[0].data // the rows of the first worksheet; note that sheet is an array
  const tasks = []
  for (let rowIndex in sheet) { // if the first row of your Excel holds field names, start from row 2
    let row = sheet[rowIndex];
    const task = await db.collection('chinaexcel')
      .add({
        data: {
          city: row[0],
          province: row[1],
          city_area: row[2],
          builtup_area: row[3],
          reg_pop: row[4],
          resident_pop: row[5],
          gdp: row[6]
        }
      })
    tasks.push(task) // task is the result of the add request: the _id of the new record and whether the add succeeded
  }
  return tasks;
}
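If the first row of the sheet holds column names, drop it before inserting. A minimal sketch, using a hand-written stand-in for the sheets[0].data array from the code above (the values are made up):

```javascript
// Stand-in for sheets[0].data: the first row is the header
const sheet = [
  ['city', 'province', 'gdp'],
  ['Shenzhen', 'Guangdong', 27670],
  ['Hangzhou', 'Zhejiang', 16106]
];

// slice(1) drops the header row, leaving only data rows
const rows = sheet.slice(1);
console.log(rows.length); // 2
console.log(rows[0][0]);  // 'Shenzhen'
```

You can then loop over `rows` instead of `sheet` in the cloud function above.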
The value returned by xlsx.parse is an array, the sheets above; each element of the array corresponds to one worksheet of the Excel file. sheets[0].data holds the data of the first worksheet and is itself an array whose elements are the rows of the table. In other words, in the parsed result each inner array is one row of Excel data.
[
  {
    name: 'Sheet1',
    data: [
      [Array], [Array],
      ... 233 more items
    ]
  }
]
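To see that shape concretely, here is a small hand-written stand-in for a parse result and how to reach the rows (a sketch with made-up values, not real parse output):

```javascript
// Hand-written stand-in for the result of xlsx.parse
const sheets = [
  {
    name: 'Sheet1',
    data: [
      ['Shenzhen', 'Guangdong', 27670],
      ['Hangzhou', 'Zhejiang', 16106]
    ]
  }
];

const firstSheetRows = sheets[0].data; // rows of the first worksheet
console.log(sheets[0].name);           // 'Sheet1'
console.log(firstSheetRows[1][0]);     // 'Hangzhou'
```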
Many people use cloud functions to import large amounts of data into the database with the Promise.all() method. This method runs into connection-exhaustion problems and reports
[LimitExceeded.NoValidConnection] Connection num overrun
because the number of simultaneous database connections is limited and varies by plan; on the free plan, for example, it is 20. There are other solutions to this problem that are not covered here. Also, even if you have set the cloud function timeout to 60s, the import may still stop before all the data is written: your Excel file is simply too large, or too much data is being imported at once, beyond what this cloud function can handle. It is recommended to split the work into batches; the method above only suits a few hundred records.
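One way to split the work is to insert the rows in small sequential batches, so that only a few connections are open at any moment. The chunk helper below is hypothetical, not part of wx-server-sdk, and the database call in the trailing comment is only a sketch:

```javascript
// Hypothetical helper: split an array into groups of at most `size` items
function chunk(rows, size) {
  const out = [];
  for (let i = 0; i < rows.length; i += size) {
    out.push(rows.slice(i, i + size));
  }
  return out;
}

const groups = chunk([1, 2, 3, 4, 5], 2);
console.log(groups); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]

// Inside the cloud function you could then insert batch by batch:
// for (const group of chunk(sheet, 10)) {
//   await Promise.all(group.map(row =>
//     db.collection('chinaexcel').add({ data: { city: row[0] /* ... */ } })))
// }
```

Each batch waits for the previous one to finish, so at most ten add requests are in flight at a time.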
node-xlsx can not only parse Excel files to extract their data, but also generate Excel files, so we can read data out of the cloud database, save it as an Excel file, and then upload that file to cloud storage.
We can modify the node-excel cloud function and replace its code with the version below (the dependencies are the same, so there is no need to reinstall them). Each field of a record can be read as dataList.data[index].key, and in the same way dataList.data[index].key.subkey reads values out of nested sub-documents:
const cloud = require('wx-server-sdk')
cloud.init({
  env: 'xly-xrlur'
})
const xlsx = require('node-xlsx');
const db = cloud.database()
const _ = db.command
exports.main = async (event, context) => {
  const dataList = await db.collection("chinaexcel").where({
    _id: _.exists(true)
  }).limit(1000).get()
  const data = dataList.data // the array of fetched records; each element is a key:value object
  let sheet = [] // this array is what ultimately gets written to Excel
  let title = ['id','builtup_area','city','city_area','gdp','province','reg_pop','resident_pop'] // the first row: the column names
  sheet.push(title) // with the column names added, the real content follows
  for (let rowIndex in data) {
    let rowcontent = [] // holds the data of one row
    rowcontent.push(data[rowIndex]._id) // note: the pushes below must match the order of the values in title
    rowcontent.push(data[rowIndex].builtup_area)
    rowcontent.push(data[rowIndex].city)
    rowcontent.push(data[rowIndex].city_area)
    rowcontent.push(data[rowIndex].gdp)
    rowcontent.push(data[rowIndex].province)
    rowcontent.push(data[rowIndex].reg_pop)
    rowcontent.push(data[rowIndex].resident_pop)
    sheet.push(rowcontent) // add the finished row to the sheet
  }
  const buffer = xlsx.build([{ name: "china", data: sheet }])
  return await cloud.uploadFile({
    cloudPath: 'china.xlsx',
    fileContent: buffer,
  })
}
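The eight push calls above can also be written generically: since title lists the field names, each row can be produced by mapping the names over a record. A sketch with made-up records (the first column here uses the _id key directly):

```javascript
const title = ['_id', 'city', 'gdp'];
const records = [
  { _id: 'a1', city: 'Shenzhen', gdp: 27670 },
  { _id: 'a2', city: 'Hangzhou', gdp: 16106 }
];

// One row per record, columns in the order given by title
const sheet = [title, ...records.map(rec => title.map(key => rec[key]))];
console.log(sheet[1]); // [ 'a1', 'Shenzhen', 27670 ]
```

With this form, adding a column means editing only the title array.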
As we saw earlier, importing Excel data into the database means calling add in a loop, which is very inefficient; even with the cloud function timeout set to 60s, only a small amount of data can be imported. If your business often needs to import data into the database, how should we deal with it? We can use an embedded sub-document design.
Each add request to the database inserts one record, so there are as many requests as there are records, and database requests are time-consuming, resource-consuming, and hard to keep reliable at larger volumes. But if you put all the data to be added into one array as the value of a single field, only one database request is needed. For example, a collection can be designed as:
{
  china: [{
    ... // data for several hundred cities
  }]
}
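Turning the parsed rows into that single-field shape is a simple map. A sketch, with the field names assumed from the import example earlier:

```javascript
const rows = [
  ['Shenzhen', 'Guangdong', 27670],
  ['Hangzhou', 'Zhejiang', 16106]
];

// Each row becomes one object; the whole import is then a single array value
const china = rows.map(([city, province, gdp]) => ({ city, province, gdp }));
console.log(china[0]); // { city: 'Shenzhen', province: 'Guangdong', gdp: 27670 }
```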
Since the data is the value of a field in a record, we can use the update instruction _.push to append the array to it, which greatly improves import performance.
db.collection('china').doc(id).update({
  data: {
    china: _.push([/* the array of data */])
  }
})
Below is a script that runs locally on your computer rather than in the cloud. It simply converts an Excel file into the JSON format the cloud database needs, so its usefulness is limited.
When the amount of data is relatively large, importing Excel into the cloud development database runs into problems. We can convert the Excel file to CSV, make the first row of the CSV the field names (in English), and then use the following code to turn the CSV file into a JSON file.
// after opening the folder in VS Code, run: npm install csvtojson
const csv = require('csvtojson')
const fs = require('fs')
const csvFilePath = 'china.csv' // put the CSV to convert in the same directory and replace this with your file name
// the rest of the code can stay as is; run the script in the VS Code terminal and two files are generated: output.json, and data.json, which can be imported into the cloud database
csv()
  .fromFile(csvFilePath)
  .then((jsonObj) => {
    // console.log(jsonObj);
    var jsonContent = JSON.stringify(jsonObj);
    console.log(jsonContent);
    fs.writeFile("output.json", jsonContent, 'utf8', function (err) {
      if (err) {
        console.log("Error saving the JSON file.");
        return console.log(err);
      }
      console.log("The JSON has been saved as output.json.");
      fs.readFile('output.json', 'utf8', function (err, data) {
        if (err) {
          return console.log(err);
        }
        var result = data.replace(/},/g, '}\n').replace(/\[/, '').replace(/\]/, '')
        fs.writeFile('data.json', result, 'utf8', function (err) {
          if (err) return console.log(err);
        });
      });
    });
  })
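The effect of the replace chain at the end can be checked on a tiny input: it strips the surrounding brackets and puts each JSON object on its own line, which is the shape data.json ends up with:

```javascript
const data = '[{"a":1},{"a":2}]';

// Same chain as in the script: newline between objects, brackets removed
const result = data.replace(/},/g, '}\n').replace(/\[/, '').replace(/\]/, '');
console.log(result); // '{"a":1}\n{"a":2}'
```

Note that the bracket replaces each fire only on the first match, which is fine here since a stringified array has exactly one pair of outer brackets (assuming the values themselves contain no nested arrays).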