
3 ways to deduplicate a List collection


Jun 01, 2021 Article blog




Where the problem comes from

In real-world development we often run into this difficulty: a collection contains many duplicate objects, the objects have no primary key, and yet the business requires us to filter the collection down to objects that are unique according to some criteria.

A brute-force approach is to use a nested loop that applies the business rule: elements that are not yet present are added to a new collection, and elements that already exist in the new collection are skipped.

Here's an example. First, create an entity class PenBean with the following code:

/**
 * Pen entity
 */
public class PenBean {


    /** Type */
    private String type;


    /** Color */
    private String color;


    //... setters and getters omitted


    public PenBean(String type, String color) {
        this.type = type;
        this.color = color;
    }


    @Override
    public String toString() {
        return "PenBean{" +
                "type='" + type + '\'' +
                ", color='" + color + '\'' +
                '}';
    }
}

Test demo as follows:

public static void main(String[] args) {
    // Add the data; PenBean has no primary key
    List<PenBean> penBeanList = new ArrayList<PenBean>();
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("铅笔","white"));
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("中性笔","white"));
    penBeanList.add(new PenBean("中性笔","white"));


    // The new, deduplicated data
    List<PenBean> newPenBeanList = new ArrayList<PenBean>();
    // Traditional duplicate check
    for (PenBean penBean : penBeanList) {
        if(newPenBeanList.isEmpty()){
            newPenBeanList.add(penBean);
        }else{
            boolean isSame = false;
            for (PenBean newPenBean : newPenBeanList) {
                // Use type and color to decide whether the element is a duplicate
                // If the new collection already contains it, skip it
                if(penBean.getType().equals(newPenBean.getType()) && penBean.getColor().equals(newPenBean.getColor())){
                    isSame = true;
                    break;
                }
            }
            if(!isSame){
                newPenBeanList.add(penBean);
            }
        }
    }
}

In general, when working with a list of objects, you can use this approach to deduplicate the elements and obtain a list that contains no duplicates.

Is there a more concise way to write this?

The answer is yes: the contains() method of List!


1, deduplicating with the contains() method of List

Before you can use contains(), you have to override the equals() method in the PenBean class. Why is that necessary? We'll explain it in detail below!

Let's first override the equals() method in the PenBean class, as follows:

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    PenBean penBean = (PenBean) o;
    // Return true only when both type and color are equal
    return Objects.equals(type, penBean.type) &&
            Objects.equals(color, penBean.color);
}

Modify the test demo as follows:

public static void main(String[] args) {
    // Add the data
    List<PenBean> penBeanList = new ArrayList<PenBean>();
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("铅笔","white"));
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("中性笔","white"));
    penBeanList.add(new PenBean("中性笔","white"));


    // The new, deduplicated data
    List<PenBean> newPenBeanList = new ArrayList<PenBean>();
    // Use contains() to check whether an equal element is already present
    for (PenBean penBean : penBeanList) {
        if(!newPenBeanList.contains(penBean)){
            newPenBeanList.add(penBean);
        }
    }


    // Print the result
    System.out.println("=========新数据======");
    for (PenBean penBean : newPenBeanList) {
        System.out.println(penBean.toString());
    }
}

The output is as follows:

=========新数据======
PenBean{type='铅笔', color='black'}
PenBean{type='铅笔', color='white'}
PenBean{type='中性笔', color='white'}

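If the PenBean class does not override equals(), contains() returns false for every element here, because the default Object.equals() compares references and each element is a distinct instance. The new list then ends up identical to the source list, so the duplicates are not removed.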
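
To make this concrete, here is a minimal sketch of what happens without the override. PlainPen and ContainsWithoutEquals are hypothetical names introduced only for this illustration, and the English string values stand in for the article's sample data.

```java
import java.util.ArrayList;
import java.util.List;

class ContainsWithoutEquals {
    // Does a list holding one pen "contain" a second pen with identical field values?
    static boolean containsEqualLookingPen() {
        List<PlainPen> pens = new ArrayList<>();
        pens.add(new PlainPen("pencil", "black"));
        // Object.equals() compares references, and this is a different instance.
        return pens.contains(new PlainPen("pencil", "black"));
    }

    // Does the list contain the very same instance that was added?
    static boolean containsSameInstance() {
        List<PlainPen> pens = new ArrayList<>();
        PlainPen pen = new PlainPen("pencil", "black");
        pens.add(pen);
        return pens.contains(pen);
    }

    public static void main(String[] args) {
        System.out.println(containsEqualLookingPen()); // false
        System.out.println(containsSameInstance());    // true
    }
}

// Hypothetical class: same fields as PenBean, but equals() is NOT overridden.
class PlainPen {
    private final String type;
    private final String color;

    PlainPen(String type, String color) {
        this.type = type;
        this.color = color;
    }
}
```

Only the identical instance is found, which is why the loop based on contains() would copy every element of the source list.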

So how does contains() determine whether a collection already holds an equal element?

Let's open the contains() method in ArrayList; the source code is as follows:

public boolean contains(Object o) {
    return indexOf(o) >= 0;
}

Find the indexOf(o) method and keep reading; the source code is as follows:

public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            // Objects are compared with the equals() method
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

At this point it is very clear: if the object passed in is null, the for loop checks whether the array holds a null element and returns its index if so. If the object passed in is not null, the for loop calls the object's equals() method against each element and returns the index of the first match!

If a matching element is found, the returned index is greater than or equal to 0; otherwise the method returns -1!

That's why objects need to override the equals() method before you use the contains() method of List!
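
The return values of indexOf() can be checked with a small sketch like the following. IndexOfDemo and sample() are hypothetical names used only for this illustration; String elements are used because String already provides equals().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexOfDemo {
    // A small list that includes a null element.
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("black", null, "white"));
    }

    public static void main(String[] args) {
        List<String> colors = sample();
        System.out.println(colors.indexOf("black"));  // 0: found at the first position
        System.out.println(colors.indexOf(null));     // 1: null is matched with ==
        System.out.println(colors.indexOf("red"));    // -1: no equal element
        System.out.println(colors.contains("black")); // true, because 0 >= 0
    }
}
```

The first case is the one to watch: a match at position 0 still counts as found, which is why contains() tests for >= 0.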

2, deduplicating with Java 8 streams

Of course, some readers will think of the Stream API in JDK 1.8. Deduplicating collection elements with a stream looks like this:

public static void main(String[] args) {
    // Add the data
    List<PenBean> penBeanList = new ArrayList<PenBean>();
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("铅笔","white"));
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("中性笔","white"));
    penBeanList.add(new PenBean("中性笔","white"));


    // Deduplicate the List with the Java 8 stream feature
    List<PenBean> newPenBeanList = penBeanList.stream().distinct().collect(Collectors.toList());


    // Print the result
    System.out.println("=========新数据======");
    for (PenBean penBean : newPenBeanList) {
        System.out.println(penBean.toString());
    }
}

Here the list is deduplicated with Stream.distinct(), available since JDK 1.8. Stream.distinct() relies on hashCode() and equals() to tell elements apart, so with this approach the class needs to override both hashCode() and equals().

Override the hashCode() method in the PenBean class, as follows:

@Override
public int hashCode() {
    return Objects.hash(type, color);
}

Running the test demo gives the following result:

=========新数据======
PenBean{type='铅笔', color='black'}
PenBean{type='铅笔', color='white'}
PenBean{type='中性笔', color='white'}

The collection elements have been deduplicated!
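
For readers who want to run the whole thing in one file, here is a self-contained sketch of the same idea. Pen, DistinctDemo and distinctLabels() are hypothetical names; Pen is a stand-in for PenBean with both overrides in place, and the English strings replace the article's sample values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctDemo {
    // Hypothetical stand-in for PenBean with equals() and hashCode() overridden.
    static final class Pen {
        private final String type;
        private final String color;

        Pen(String type, String color) {
            this.type = type;
            this.color = color;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Pen other = (Pen) o;
            return Objects.equals(type, other.type) && Objects.equals(color, other.color);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, color);
        }

        @Override
        public String toString() {
            return type + "/" + color;
        }
    }

    // Deduplicates five pens and returns their labels in encounter order.
    static List<String> distinctLabels() {
        List<Pen> pens = Arrays.asList(
                new Pen("pencil", "black"),
                new Pen("pencil", "white"),
                new Pen("pencil", "black"),
                new Pen("gel pen", "white"),
                new Pen("gel pen", "white"));
        return pens.stream()
                .distinct()
                .map(Pen::toString)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctLabels()); // [pencil/black, pencil/white, gel pen/white]
    }
}
```

For an ordered stream such as one created from a List, distinct() keeps the first occurrence of each element, so the original order is preserved.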

So why don't we need to override anything when the collection elements are String objects?

Because the built-in String class in Java already overrides both methods; the source code is as follows:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {

    @Override
    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
}
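
As a quick check of this point, the sketch below deduplicates String elements without writing any equals() or hashCode() code. StringDistinctDemo and distinctColors() are hypothetical names chosen for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StringDistinctDemo {
    // Deduplicates strings using the equals()/hashCode() that String already provides.
    static List<String> distinctColors(List<String> colors) {
        return colors.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // new String(...) forces separate instances, so any match is by content, not by reference.
        List<String> colors = Arrays.asList(new String("black"), "white", new String("black"));
        System.out.println(distinctColors(colors)); // [black, white]
    }
}
```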


3, deduplicating with HashSet

In the sections above, we covered deduplicating a List directly! Some readers have also pointed out that a HashSet can deduplicate elements!

Indeed, a HashSet by its nature never holds duplicate elements!

The code in practice is as follows!

Again, create a PenBean class and override the equals() and hashCode() methods inherited from Object, as follows:

/**
 * Pen entity
 */
public class PenBean {
    /** Type */
    private String type;
    /** Color */
    private String color;
    //... setters and getters omitted
    public PenBean(String type, String color) {
        this.type = type;
        this.color = color;
    }

    @Override
    public String toString() {
        return "PenBean{" +
                "type='" + type + '\'' +
                ", color='" + color + '\'' +
                '}';
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        PenBean penBean = (PenBean) o;
        // Return true only when both type and color are equal
        return Objects.equals(type, penBean.type) &&
                Objects.equals(color, penBean.color);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, color);
    }
}

Create a test demo as follows:

public static void main(String[] args) {
    // Add the data
    List<PenBean> penBeanList = new ArrayList<PenBean>();
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("铅笔","white"));
    penBeanList.add(new PenBean("铅笔","black"));
    penBeanList.add(new PenBean("中性笔","white"));
    penBeanList.add(new PenBean("中性笔","white"));


    // The new, deduplicated data
    List<PenBean> newPenBeanList = new ArrayList<PenBean>();
    // Deduplicate with a Set
    HashSet<PenBean> set = new HashSet<>(penBeanList);
    newPenBeanList.addAll(set);


    // Print the result
    System.out.println("=========新数据======");
    for (PenBean penBean : newPenBeanList) {
        System.out.println(penBean.toString());
    }
}

The output is as follows:

=========新数据======
PenBean{type='铅笔', color='white'}
PenBean{type='铅笔', color='black'}
PenBean{type='中性笔', color='white'}

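As you can see, the new collection contains no duplicate elements! Note that HashSet does not preserve insertion order, which is why the elements above come out in a different order than in the source list.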
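
If the original order matters, one possible variant (not part of the original article) is to swap in LinkedHashSet, which iterates in insertion order. The sketch below uses plain strings to stay short; OrderedDedupDemo and dedupKeepingOrder() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class OrderedDedupDemo {
    // Removes duplicates while keeping the first occurrence of each element in place.
    static <T> List<T> dedupKeepingOrder(List<T> source) {
        return new ArrayList<>(new LinkedHashSet<>(source));
    }

    public static void main(String[] args) {
        List<String> pens = Arrays.asList(
                "pencil/black", "pencil/white", "pencil/black", "gel pen/white", "gel pen/white");
        System.out.println(dedupKeepingOrder(pens)); // [pencil/black, pencil/white, gel pen/white]
    }
}
```

As with HashSet, the element type still has to provide consistent equals() and hashCode() implementations for this to work on custom classes.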

So how does HashSet do it?

Open the HashSet source code and look at the constructor we called:

public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}

Clearly, it first creates a HashMap object and then calls the addAll() method. Let's move on to that method!

public boolean addAll(Collection<? extends E> c) {
    boolean modified = false;
    for (E e : c)
        if (add(e))
            modified = true;
    return modified;
}

It iterates over the elements of the List and calls add() for each one; the source code of add() is as follows:

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

In effect, it inserts the element into the HashMap object as a key, where PRESENT is a constant created with new Object()!

private static final Object PRESENT = new Object();

It's basically clear at this point: adding an element to a HashSet is effectively equivalent to:

Map<Object,Object> map = new HashMap<Object,Object>();
map.put(e, new Object()); // e is the element being inserted

The inserted element e is the key in the HashMap.

We know that HashMap uses equals() and hashCode() to decide whether a key being inserted is the same as an existing key. So once PenBean overrides equals() and hashCode(), two objects that are judged to be the same key are stored only once, which is what deduplicates the elements.
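
The link between add() and put() can be observed directly, since HashMap.put() returns null only when the key was not present before. The following is a small sketch under that reading; PutReturnDemo, addLikeHashSet() and addTwice() are hypothetical names, and the local PRESENT constant only mirrors the one in the HashSet source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class PutReturnDemo {
    // Dummy value, mirroring the PRESENT constant in the HashSet source.
    private static final Object PRESENT = new Object();

    // Same expression as HashSet.add(): true only when the key was not already in the map.
    static boolean addLikeHashSet(Map<String, Object> map, String key) {
        return map.put(key, PRESENT) == null;
    }

    // Inserts the same key twice and reports both results.
    static boolean[] addTwice() {
        Map<String, Object> map = new HashMap<>();
        return new boolean[] { addLikeHashSet(map, "black"), addLikeHashSet(map, "black") };
    }

    public static void main(String[] args) {
        boolean[] results = addTwice();
        System.out.println(results[0] + " " + results[1]); // true false

        // The real HashSet behaves the same way.
        Set<String> set = new HashSet<>();
        System.out.println(set.add("black")); // true
        System.out.println(set.add("black")); // false
    }
}
```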


Finally, pass the already-deduplicated HashSet to the addAll() method of ArrayList, and you get the data we want, with no duplicate elements!

Article source: mp.weixin.qq.com/s/LBjfarlK5Qv-A85Hey0btA
Author: 鸭血粉丝

That concludes this introduction to 3 ways of deduplicating a List collection; hopefully it helps.